linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] stop using ->read and ->write for kernel access
@ 2020-06-24 16:28 Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 01/11] uptr: add a new "universal pointer" type Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Hi Al and Linus,

as part of removing set_fs entirely (for which I have a working
prototype), we need to stop calling ->read and ->write with kernel
pointers under set_fs.

My previous "clean up kernel_{read,write} & friends v5" series, on which
this one builds, consolidate those calls into the __ḵernel_{read,write}
helpers.  This series goes further and removes the option to call
->read and ->write with kernel pointers entirely.  The replacement
options are either the existing ->read_iter and ->write_iter methods
which cope with kvecs and kernel pointers just fine and are used by
many instances including all the "real" file systems.  For for other
files like devices they are not suitable.  Fortunately we don't really
do pure kernel reads and writes to any of those.  Traditionally the only
real exception has been splice, which could be used every file for the
last few years, although initially it was optional.

So in addition to some more cleanup like moving the seq_file code to
the iter interface this series introduces a new uptr_t type for a
"universal" pointer that can point to either user or kernel memory
in a properly tagged way, and then adds new read_uptr an write_uptr
ops based on that.  It then converts over the sysctl code as 5.8 added
new code that actually does a kernel_write to it.  Any driver that really
wants splice to work can also be converted over to it, although I haven't
run into one yet.

A git branch is available here:

    git://git.infradead.org/users/hch/misc.git uptr

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/uptr

Diffstat:
 Documentation/filesystems/seq_file.rst                            |    2 
 Documentation/process/clang-format.rst                            |    4 
 Documentation/translations/it_IT/process/clang-format.rst         |    4 
 arch/alpha/kernel/srm_env.c                                       |    2 
 arch/arm/mm/alignment.c                                           |    2 
 arch/arm/mm/ptdump_debugfs.c                                      |    2 
 arch/arm64/kvm/vgic/vgic-debug.c                                  |    2 
 arch/c6x/platforms/pll.c                                          |    2 
 arch/mips/cavium-octeon/oct_ilm.c                                 |    2 
 arch/mips/kernel/segment.c                                        |    2 
 arch/mips/ralink/bootrom.c                                        |    2 
 arch/powerpc/kernel/rtas-proc.c                                   |   10 -
 arch/powerpc/kvm/book3s_xive_native.c                             |    2 
 arch/powerpc/kvm/timing.c                                         |    2 
 arch/powerpc/mm/numa.c                                            |    2 
 arch/powerpc/mm/ptdump/bats.c                                     |    2 
 arch/powerpc/mm/ptdump/hashpagetable.c                            |    2 
 arch/powerpc/mm/ptdump/ptdump.c                                   |    2 
 arch/powerpc/mm/ptdump/segment_regs.c                             |    2 
 arch/powerpc/platforms/cell/spufs/file.c                          |    8 -
 arch/powerpc/platforms/pseries/hvCall_inst.c                      |    2 
 arch/powerpc/platforms/pseries/lpar.c                             |    4 
 arch/powerpc/platforms/pseries/lparcfg.c                          |    2 
 arch/s390/kernel/diag.c                                           |    2 
 arch/s390/mm/dump_pagetables.c                                    |    2 
 arch/s390/pci/pci_debug.c                                         |    2 
 arch/sh/mm/alignment.c                                            |    2 
 arch/sh/mm/asids-debugfs.c                                        |    2 
 arch/sh/mm/cache-debugfs.c                                        |    2 
 arch/sh/mm/pmb.c                                                  |    2 
 arch/sh/mm/tlb-debugfs.c                                          |    2 
 arch/sparc/kernel/led.c                                           |    2 
 arch/um/kernel/exitcode.c                                         |    2 
 arch/um/kernel/process.c                                          |    2 
 arch/x86/kernel/cpu/mce/severity.c                                |    2 
 arch/x86/kernel/cpu/mtrr/if.c                                     |    2 
 arch/x86/mm/pat/memtype.c                                         |    2 
 arch/x86/mm/pat/set_memory.c                                      |    2 
 arch/x86/platform/uv/tlb_uv.c                                     |    2 
 arch/x86/xen/p2m.c                                                |    2 
 block/blk-mq-debugfs.c                                            |    2 
 drivers/acpi/battery.c                                            |    2 
 drivers/acpi/proc.c                                               |    2 
 drivers/base/power/wakeup.c                                       |    2 
 drivers/block/aoe/aoeblk.c                                        |    2 
 drivers/block/drbd/drbd_debugfs.c                                 |   10 -
 drivers/block/nbd.c                                               |    4 
 drivers/block/pktcdvd.c                                           |    2 
 drivers/block/rsxx/core.c                                         |    4 
 drivers/bus/mvebu-mbus.c                                          |    4 
 drivers/char/tpm/eventlog/common.c                                |    2 
 drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c                 |    2 
 drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c                 |    2 
 drivers/crypto/amlogic/amlogic-gxl-core.c                         |    2 
 drivers/crypto/caam/dpseci-debugfs.c                              |    2 
 drivers/crypto/cavium/zip/zip_main.c                              |    6 
 drivers/crypto/hisilicon/qm.c                                     |    2 
 drivers/crypto/qat/qat_common/adf_cfg.c                           |    2 
 drivers/crypto/qat/qat_common/adf_transport_debug.c               |    4 
 drivers/firmware/tegra/bpmp-debugfs.c                             |    2 
 drivers/gpio/gpiolib.c                                            |    2 
 drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c                          |    4 
 drivers/gpu/drm/arm/display/komeda/komeda_dev.c                   |    2 
 drivers/gpu/drm/arm/malidp_drv.c                                  |    2 
 drivers/gpu/drm/armada/armada_debugfs.c                           |    2 
 drivers/gpu/drm/drm_debugfs.c                                     |    6 
 drivers/gpu/drm/drm_debugfs_crc.c                                 |    2 
 drivers/gpu/drm/drm_mipi_dbi.c                                    |    2 
 drivers/gpu/drm/i915/display/intel_display_debugfs.c              |   16 +-
 drivers/gpu/drm/i915/gt/debugfs_gt.h                              |    2 
 drivers/gpu/drm/i915/i915_debugfs_params.c                        |   12 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c                      |    2 
 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c                          |    4 
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c                       |    2 
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c                           |    4 
 drivers/gpu/drm/msm/msm_debugfs.c                                 |    2 
 drivers/gpu/drm/nouveau/nouveau_debugfs.c                         |    2 
 drivers/gpu/drm/omapdrm/dss/dss.c                                 |    2 
 drivers/gpu/host1x/debug.c                                        |    4 
 drivers/gpu/vga/vga_switcheroo.c                                  |    2 
 drivers/hid/hid-picolcd_debugfs.c                                 |    2 
 drivers/hid/hid-wiimote-debug.c                                   |    2 
 drivers/hwmon/dell-smm-hwmon.c                                    |    2 
 drivers/ide/ide-proc.c                                            |    4 
 drivers/infiniband/hw/cxgb4/device.c                              |    4 
 drivers/infiniband/hw/qib/qib_debugfs.c                           |    2 
 drivers/infiniband/ulp/ipoib/ipoib_fs.c                           |    4 
 drivers/input/input.c                                             |    4 
 drivers/macintosh/via-pmu.c                                       |    2 
 drivers/md/bcache/closure.c                                       |    2 
 drivers/md/md.c                                                   |    2 
 drivers/media/cec/core/cec-core.c                                 |    2 
 drivers/media/pci/saa7164/saa7164-core.c                          |    2 
 drivers/memory/emif.c                                             |    4 
 drivers/memory/tegra/tegra124-emc.c                               |    2 
 drivers/memory/tegra/tegra186-emc.c                               |    2 
 drivers/memory/tegra/tegra20-emc.c                                |    2 
 drivers/memory/tegra/tegra30-emc.c                                |    2 
 drivers/mfd/ab3100-core.c                                         |    2 
 drivers/mfd/ab3100-otp.c                                          |    2 
 drivers/mfd/ab8500-debugfs.c                                      |   14 -
 drivers/mfd/tps65010.c                                            |    2 
 drivers/misc/habanalabs/debugfs.c                                 |    2 
 drivers/misc/sgi-gru/gruprocfs.c                                  |    6 
 drivers/mmc/core/mmc_test.c                                       |    2 
 drivers/mtd/mtdcore.c                                             |    4 
 drivers/mtd/ubi/debug.c                                           |    2 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c                |   38 ++---
 drivers/net/ethernet/chelsio/cxgb4/l2t.c                          |    2 
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c               |    8 -
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-debugfs.c          |    6 
 drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c                  |    2 
 drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c           |    2 
 drivers/net/wireless/ath/ath5k/debug.c                            |    2 
 drivers/net/wireless/ath/wil6210/debugfs.c                        |   14 -
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/debug.c          |    2 
 drivers/net/wireless/intel/ipw2x00/libipw_module.c                |    2 
 drivers/net/wireless/intel/iwlwifi/fw/debugfs.c                   |    2 
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c                   |    2 
 drivers/net/wireless/intersil/hostap/hostap_download.c            |    2 
 drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c               |    2 
 drivers/net/wireless/mediatek/mt76/mt7615/debugfs.c               |    2 
 drivers/net/wireless/mediatek/mt76/mt76x02_debugfs.c              |    4 
 drivers/net/wireless/mediatek/mt76/mt7915/debugfs.c               |    4 
 drivers/net/wireless/mediatek/mt7601u/debugfs.c                   |    4 
 drivers/net/wireless/realtek/rtlwifi/debug.c                      |    2 
 drivers/net/wireless/realtek/rtw88/debug.c                        |    4 
 drivers/net/wireless/rsi/rsi_91x_debugfs.c                        |    4 
 drivers/net/xen-netback/xenbus.c                                  |    2 
 drivers/nvme/host/fabrics.c                                       |    2 
 drivers/parisc/led.c                                              |    2 
 drivers/pci/controller/pci-tegra.c                                |    2 
 drivers/platform/x86/asus-wmi.c                                   |    2 
 drivers/platform/x86/intel_pmc_core.c                             |    2 
 drivers/platform/x86/intel_telemetry_debugfs.c                    |    4 
 drivers/platform/x86/thinkpad_acpi.c                              |    2 
 drivers/platform/x86/toshiba_acpi.c                               |    8 -
 drivers/pnp/pnpbios/proc.c                                        |    2 
 drivers/power/supply/da9030_battery.c                             |    2 
 drivers/pwm/core.c                                                |    2 
 drivers/ras/cec.c                                                 |    2 
 drivers/ras/debugfs.c                                             |    2 
 drivers/s390/block/dasd.c                                         |    2 
 drivers/s390/block/dasd_proc.c                                    |    2 
 drivers/s390/cio/blacklist.c                                      |    2 
 drivers/s390/cio/qdio_debug.c                                     |    2 
 drivers/scsi/hisi_sas/hisi_sas_main.c                             |   32 ++--
 drivers/scsi/qedf/qedf_dbg.h                                      |    2 
 drivers/scsi/qedi/qedi_dbg.h                                      |    2 
 drivers/scsi/qla2xxx/qla_dfs.c                                    |   12 -
 drivers/scsi/scsi_devinfo.c                                       |    2 
 drivers/scsi/scsi_proc.c                                          |    4 
 drivers/scsi/sg.c                                                 |    4 
 drivers/scsi/snic/snic_debugfs.c                                  |    4 
 drivers/sh/intc/virq-debugfs.c                                    |    2 
 drivers/soc/qcom/cmd-db.c                                         |    2 
 drivers/soc/qcom/socinfo.c                                        |    4 
 drivers/soc/ti/knav_dma.c                                         |    2 
 drivers/soc/ti/knav_qmss_queue.c                                  |    2 
 drivers/staging/rtl8192u/ieee80211/ieee80211_module.c             |    2 
 drivers/staging/vc04_services/interface/vchiq_arm/vchiq_debugfs.c |    4 
 drivers/usb/chipidea/debug.c                                      |    4 
 drivers/usb/dwc2/debugfs.c                                        |    2 
 drivers/usb/dwc3/debugfs.c                                        |    8 -
 drivers/usb/gadget/function/rndis.c                               |    2 
 drivers/usb/gadget/udc/lpc32xx_udc.c                              |    2 
 drivers/usb/gadget/udc/renesas_usb3.c                             |    2 
 drivers/usb/host/xhci-debugfs.c                                   |    6 
 drivers/usb/mtu3/mtu3_debugfs.c                                   |    8 -
 drivers/usb/musb/musb_debugfs.c                                   |    4 
 drivers/video/fbdev/via/viafbdev.c                                |   14 -
 drivers/visorbus/visorbus_main.c                                  |    2 
 drivers/xen/xenfs/xensyms.c                                       |    2 
 fs/cifs/cifs_debug.c                                              |   14 -
 fs/cifs/dfs_cache.c                                               |    2 
 fs/debugfs/file.c                                                 |    4 
 fs/dlm/debug_fs.c                                                 |    8 -
 fs/file_table.c                                                   |    7 
 fs/fscache/object-list.c                                          |    2 
 fs/gfs2/glock.c                                                   |    6 
 fs/internal.h                                                     |   10 +
 fs/jbd2/journal.c                                                 |    2 
 fs/jfs/jfs_debug.c                                                |    2 
 fs/nfsd/nfs4state.c                                               |    4 
 fs/nfsd/nfsctl.c                                                  |   12 -
 fs/nfsd/stats.c                                                   |    2 
 fs/ocfs2/cluster/netdebug.c                                       |    6 
 fs/ocfs2/dlm/dlmdebug.c                                           |    2 
 fs/ocfs2/dlmglue.c                                                |    2 
 fs/open.c                                                         |    8 -
 fs/openpromfs/inode.c                                             |    2 
 fs/orangefs/orangefs-debugfs.c                                    |    2 
 fs/proc/array.c                                                   |    2 
 fs/proc/base.c                                                    |   24 +--
 fs/proc/cpuinfo.c                                                 |    2 
 fs/proc/fd.c                                                      |    2 
 fs/proc/generic.c                                                 |    4 
 fs/proc/inode.c                                                   |   28 +++
 fs/proc/proc_net.c                                                |    4 
 fs/proc/proc_sysctl.c                                             |   41 +++--
 fs/proc/stat.c                                                    |    2 
 fs/proc/task_mmu.c                                                |    8 -
 fs/proc/task_nommu.c                                              |    2 
 fs/proc_namespace.c                                               |    6 
 fs/read_write.c                                                   |   49 ++++--
 fs/seq_file.c                                                     |   34 ++--
 fs/splice.c                                                       |   40 +++--
 include/linux/fs.h                                                |   10 +
 include/linux/proc_fs.h                                           |    1 
 include/linux/seq_file.h                                          |    7 
 include/linux/uptr.h                                              |   72 ++++++++++
 ipc/util.c                                                        |    2 
 kernel/bpf/inode.c                                                |    2 
 kernel/fail_function.c                                            |    2 
 kernel/gcov/fs.c                                                  |    2 
 kernel/irq/debugfs.c                                              |    2 
 kernel/irq/proc.c                                                 |    6 
 kernel/kallsyms.c                                                 |    2 
 kernel/kcsan/debugfs.c                                            |    2 
 kernel/latencytop.c                                               |    2 
 kernel/locking/lockdep_proc.c                                     |    2 
 kernel/module.c                                                   |    2 
 kernel/profile.c                                                  |    2 
 kernel/sched/debug.c                                              |    2 
 kernel/sched/psi.c                                                |    6 
 kernel/time/test_udelay.c                                         |    2 
 kernel/trace/ftrace.c                                             |   16 +-
 kernel/trace/trace.c                                              |   20 +-
 kernel/trace/trace_dynevent.c                                     |    2 
 kernel/trace/trace_events.c                                       |   10 -
 kernel/trace/trace_events_hist.c                                  |    4 
 kernel/trace/trace_events_synth.c                                 |    2 
 kernel/trace/trace_events_trigger.c                               |    2 
 kernel/trace/trace_kprobe.c                                       |    4 
 kernel/trace/trace_printk.c                                       |    2 
 kernel/trace/trace_stack.c                                        |    4 
 kernel/trace/trace_stat.c                                         |    2 
 kernel/trace/trace_uprobe.c                                       |    4 
 lib/debugobjects.c                                                |    2 
 lib/dynamic_debug.c                                               |    4 
 lib/error-inject.c                                                |    2 
 lib/kunit/debugfs.c                                               |    2 
 mm/kmemleak.c                                                     |    2 
 mm/slab_common.c                                                  |    2 
 mm/swapfile.c                                                     |    2 
 net/6lowpan/debugfs.c                                             |    2 
 net/atm/mpoa_proc.c                                               |    2 
 net/batman-adv/debugfs.c                                          |    4 
 net/bluetooth/6lowpan.c                                           |    2 
 net/core/pktgen.c                                                 |    6 
 net/hsr/hsr_debugfs.c                                             |    2 
 net/ipv4/netfilter/ipt_CLUSTERIP.c                                |    2 
 net/ipv4/route.c                                                  |    4 
 net/l2tp/l2tp_debugfs.c                                           |    2 
 net/netfilter/xt_recent.c                                         |    2 
 net/sunrpc/cache.c                                                |    4 
 net/sunrpc/debugfs.c                                              |    4 
 net/sunrpc/rpc_pipe.c                                             |    2 
 net/sunrpc/stats.c                                                |    2 
 security/apparmor/apparmorfs.c                                    |   10 -
 security/integrity/ima/ima_fs.c                                   |    6 
 security/selinux/selinuxfs.c                                      |    2 
 security/smack/smackfs.c                                          |   20 +-
 sound/core/info.c                                                 |    2 
 264 files changed, 700 insertions(+), 549 deletions(-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/11] uptr: add a new "universal pointer" type
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 02/11] fs: factor out a set_fmode_can_read_write helper Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Add a uptr_t type that can hold a pointer to either a user or kernel
memory region, and simply helpers to copy to and from it.  For
architectures like x86 that have non-overlapping user and kernel
address space it just is a union and uses a TASK_SIZE check to
select the proper copy routine.  For architectures with overlapping
address spaces a flag to indicate the address space is used instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uptr.h | 72 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 include/linux/uptr.h

diff --git a/include/linux/uptr.h b/include/linux/uptr.h
new file mode 100644
index 00000000000000..1373511f9897b4
--- /dev/null
+++ b/include/linux/uptr.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020 Christoph Hellwig.
+ *
+ * Support for "universal" pointers that can point to either kernel or userspace
+ * memory.
+ */
+#ifndef _LINUX_UPTR_H
+#define _LINUX_UPTR_H
+
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+typedef union {
+	void		*kernel;
+	void __user	*user;
+} uptr_t;
+
+static inline uptr_t USER_UPTR(void __user *p)
+{
+	return (uptr_t) { .user = p };
+}
+
+static inline uptr_t KERNEL_UPTR(void *p)
+{
+	return (uptr_t) { .kernel = p };
+}
+
+static inline bool uptr_is_kernel(uptr_t uptr)
+{
+	return (unsigned long)uptr.kernel >= TASK_SIZE;
+}
+#else /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */
+typedef struct {
+	union {
+		void		*kernel;
+		void __user	*user;
+	};
+	bool		is_kernel : 1;
+} uptr_t;
+
+static inline uptr_t USER_UPTR(void __user *p)
+{
+	return (uptr_t) { .user = p };
+}
+
+static inline uptr_t KERNEL_UPTR(void *p)
+{
+	return (uptr_t) { .kernel = p, .is_kernel = true };
+}
+
+static inline bool uptr_is_kernel(uptr_t uptr)
+{
+	return uptr.is_kernel;
+}
+#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */
+
+static inline int copy_from_uptr(void *dst, uptr_t src, size_t size)
+{
+	if (!uptr_is_kernel(src))
+		return copy_from_user(dst, src.user, size);
+	memcpy(dst, src.kernel, size);
+	return 0;
+}
+
+static inline int copy_to_uptr(uptr_t dst, const void *src, size_t size)
+{
+	if (!uptr_is_kernel(dst))
+		return copy_to_user(dst.user, src, size);
+	memcpy(dst.kernel, src, size);
+	return 0;
+}
+
+#endif /* _LINUX_UPTR_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 02/11] fs: factor out a set_fmode_can_read_write helper
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 01/11] uptr: add a new "universal pointer" type Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 03/11] fs: add new read_uptr and write_uptr file operations Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Add a helper to set the FMODE_CAN_READ and FMODE_CAN_WRITE logic
instead of duplicating it in two places.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/file_table.c |  7 +------
 fs/internal.h   | 10 ++++++++++
 fs/open.c       |  8 +-------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 656647f9575a7c..646b83f07a9589 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -199,16 +199,11 @@ static struct file *alloc_file(const struct path *path, int flags,
 	file->f_mapping = path->dentry->d_inode->i_mapping;
 	file->f_wb_err = filemap_sample_wb_err(file->f_mapping);
 	file->f_sb_err = file_sample_sb_err(file);
-	if ((file->f_mode & FMODE_READ) &&
-	     likely(fop->read || fop->read_iter))
-		file->f_mode |= FMODE_CAN_READ;
-	if ((file->f_mode & FMODE_WRITE) &&
-	     likely(fop->write || fop->write_iter))
-		file->f_mode |= FMODE_CAN_WRITE;
 	file->f_mode |= FMODE_OPENED;
 	file->f_op = fop;
 	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
 		i_readcount_inc(path->dentry->d_inode);
+	set_fmode_can_read_write(file);
 	return file;
 }
 
diff --git a/fs/internal.h b/fs/internal.h
index 9b863a7bd70892..242f2845b3428b 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -185,3 +185,13 @@ int sb_init_dio_done_wq(struct super_block *sb);
  */
 int do_statx(int dfd, const char __user *filename, unsigned flags,
 	     unsigned int mask, struct statx __user *buffer);
+
+static inline void set_fmode_can_read_write(struct file *f)
+{
+	if ((f->f_mode & FMODE_READ) &&
+	    (f->f_op->read || f->f_op->read_iter))
+		f->f_mode |= FMODE_CAN_READ;
+	if ((f->f_mode & FMODE_WRITE) &&
+	    (f->f_op->write || f->f_op->write_iter))
+		f->f_mode |= FMODE_CAN_WRITE;
+}
diff --git a/fs/open.c b/fs/open.c
index 6cd48a61cda3b9..01f2de93c91710 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -832,13 +832,7 @@ static int do_dentry_open(struct file *f,
 	f->f_mode |= FMODE_OPENED;
 	if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
 		i_readcount_inc(inode);
-	if ((f->f_mode & FMODE_READ) &&
-	     likely(f->f_op->read || f->f_op->read_iter))
-		f->f_mode |= FMODE_CAN_READ;
-	if ((f->f_mode & FMODE_WRITE) &&
-	     likely(f->f_op->write || f->f_op->write_iter))
-		f->f_mode |= FMODE_CAN_WRITE;
-
+	set_fmode_can_read_write(f);
 	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 01/11] uptr: add a new "universal pointer" type Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 02/11] fs: factor out a set_fmode_can_read_write helper Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 17:19   ` Linus Torvalds
  2020-06-24 16:28 ` [PATCH 04/11] sysctl: switch to ->{read,write}_uptr Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Add two new file operations that are identical to ->read and ->write
except that they can also safely take kernel pointers using the uptr_t
type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/internal.h      |  4 ++--
 fs/read_write.c    | 18 ++++++++++++++----
 include/linux/fs.h |  3 +++
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 242f2845b3428b..b6777a47b05163 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -189,9 +189,9 @@ int do_statx(int dfd, const char __user *filename, unsigned flags,
 static inline void set_fmode_can_read_write(struct file *f)
 {
 	if ((f->f_mode & FMODE_READ) &&
-	    (f->f_op->read || f->f_op->read_iter))
+	    (f->f_op->read || f->f_op->read_uptr || f->f_op->read_iter))
 		f->f_mode |= FMODE_CAN_READ;
 	if ((f->f_mode & FMODE_WRITE) &&
-	    (f->f_op->write || f->f_op->write_iter))
+	    (f->f_op->write || f->f_op->write_uptr || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
 }
diff --git a/fs/read_write.c b/fs/read_write.c
index e7f36b15683049..24ffbf3cbda243 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -430,7 +430,9 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	if (file->f_op->read) {
+	if (file->f_op->read_uptr) {
+		ret = file->f_op->read_uptr(file, KERNEL_UPTR(buf), count, pos);
+	} else if (file->f_op->read) {
 		mm_segment_t old_fs = get_fs();
 
 		set_fs(KERNEL_DS);
@@ -485,7 +487,9 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 
-	if (file->f_op->read)
+	if (file->f_op->read_uptr)
+		ret = file->f_op->read_uptr(file, USER_UPTR(buf), count, pos);
+	else if (file->f_op->read)
 		ret = file->f_op->read(file, buf, count, pos);
 	else if (file->f_op->read_iter)
 		ret = new_sync_read(file, buf, count, pos);
@@ -530,7 +534,10 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	if (file->f_op->write) {
+	if (file->f_op->write_uptr) {
+		ret = file->f_op->write_uptr(file, KERNEL_UPTR((void *)buf),
+				count, pos);
+	} else if (file->f_op->write) {
 		mm_segment_t old_fs = get_fs();
 
 		set_fs(KERNEL_DS);
@@ -592,7 +599,10 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 	file_start_write(file);
-	if (file->f_op->write)
+	if (file->f_op->write_uptr)
+		ret = file->f_op->write_uptr(file,
+				USER_UPTR((char __user *)buf), count, pos);
+	else if (file->f_op->write)
 		ret = file->f_op->write(file, buf, count, pos);
 	else if (file->f_op->write_iter)
 		ret = new_sync_write(file, buf, count, pos);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fac6aead402a98..d8fc3015f5a197 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -39,6 +39,7 @@
 #include <linux/fs_types.h>
 #include <linux/build_bug.h>
 #include <linux/stddef.h>
+#include <linux/uptr.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1830,6 +1831,8 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	ssize_t (*read_uptr) (struct file *, uptr_t, size_t, loff_t *);
+	ssize_t (*write_uptr) (struct file *, uptr_t, size_t, loff_t *);
 	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 04/11] sysctl: switch to ->{read,write}_uptr
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-06-24 16:28 ` [PATCH 03/11] fs: add new read_uptr and write_uptr file operations Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 05/11] fs: refactor new_sync_read Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

This allows the sysctl handler to safely take kernel space pointers,
as required for the new code to set sysctl parameters at boot time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/proc/proc_sysctl.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 42c5128c7d1c76..dd5eb693bd00df 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -540,7 +540,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
+static ssize_t proc_sys_call_handler(struct file *filp, uptr_t buf,
 		size_t count, loff_t *ppos, int write)
 {
 	struct inode *inode = file_inode(filp);
@@ -566,20 +566,21 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 		goto out;
 
 	/* don't even try if the size is too large */
-	if (count > KMALLOC_MAX_SIZE)
+	if (count >= KMALLOC_MAX_SIZE)
 		return -ENOMEM;
 
+	error = -ENOMEM;
+
+	/* allow for NULL termination on writes */
+	kbuf = kzalloc(count + (write ? 1 : 0), GFP_KERNEL);
+	if (!kbuf)
+		goto out;
+
 	if (write) {
-		kbuf = memdup_user_nul(ubuf, count);
-		if (IS_ERR(kbuf)) {
-			error = PTR_ERR(kbuf);
-			goto out;
-		}
-	} else {
-		error = -ENOMEM;
-		kbuf = kzalloc(count, GFP_KERNEL);
-		if (!kbuf)
-			goto out;
+		error = -EFAULT;
+		if (copy_from_uptr(kbuf, buf, count))
+			goto out_free_buf;
+		((char *)kbuf)[count] = '\0';
 	}
 
 	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
@@ -594,7 +595,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 
 	if (!write) {
 		error = -EFAULT;
-		if (copy_to_user(ubuf, kbuf, count))
+		if (copy_to_uptr(buf, kbuf, count))
 			goto out_free_buf;
 	}
 
@@ -607,16 +608,16 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 	return error;
 }
 
-static ssize_t proc_sys_read(struct file *filp, char __user *buf,
+static ssize_t proc_sys_read(struct file *filp, uptr_t buf,
 				size_t count, loff_t *ppos)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0);
+	return proc_sys_call_handler(filp, buf, count, ppos, 0);
 }
 
-static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
+static ssize_t proc_sys_write(struct file *filp, uptr_t buf,
 				size_t count, loff_t *ppos)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1);
+	return proc_sys_call_handler(filp, buf, count, ppos, 1);
 }
 
 static int proc_sys_open(struct inode *inode, struct file *filp)
@@ -853,8 +854,8 @@ static int proc_sys_getattr(const struct path *path, struct kstat *stat,
 static const struct file_operations proc_sys_file_operations = {
 	.open		= proc_sys_open,
 	.poll		= proc_sys_poll,
-	.read		= proc_sys_read,
-	.write		= proc_sys_write,
+	.read_uptr	= proc_sys_read,
+	.write_uptr	= proc_sys_write,
 	.llseek		= default_llseek,
 };
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 05/11] fs: refactor new_sync_read
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-06-24 16:28 ` [PATCH 04/11] sysctl: switch to ->{read,write}_uptr Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 06/11] proc: add a read_iter method to proc proc_ops Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Pass the read_iter method as a parameter and mark the function
non-static.  This allows reusing it for additional callsites e.g.
in procfs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    | 8 +++++---
 include/linux/fs.h | 2 ++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 24ffbf3cbda243..b92c222ca886ca 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -401,7 +401,8 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 				read_write == READ ? MAY_READ : MAY_WRITE);
 }
 
-static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+ssize_t iter_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos,
+		ssize_t (*cb)(struct kiocb *iocb, struct iov_iter *iter))
 {
 	struct iovec iov = { .iov_base = buf, .iov_len = len };
 	struct kiocb kiocb;
@@ -412,7 +413,7 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 	kiocb.ki_pos = (ppos ? *ppos : 0);
 	iov_iter_init(&iter, READ, &iov, 1, len);
 
-	ret = call_read_iter(filp, &kiocb, &iter);
+	ret = cb(&kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
 	if (ppos)
 		*ppos = kiocb.ki_pos;
@@ -492,7 +493,8 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 	else if (file->f_op->read)
 		ret = file->f_op->read(file, buf, count, pos);
 	else if (file->f_op->read_iter)
-		ret = new_sync_read(file, buf, count, pos);
+		ret = iter_read(file, buf, count, pos,
+				file->f_op->read_iter);
 	else
 		ret = -EINVAL;
 	if (ret > 0) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d8fc3015f5a197..d0fea0281ef29b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1919,6 +1919,8 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 			      unsigned long nr_segs, unsigned long fast_segs,
 			      struct iovec *fast_pointer,
 			      struct iovec **ret_pointer);
+ssize_t iter_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos,
+		ssize_t (*cb)(struct kiocb *iocb, struct iov_iter *iter));
 
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 06/11] proc: add a read_iter method to proc proc_ops
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-06-24 16:28 ` [PATCH 05/11] fs: refactor new_sync_read Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 07/11] seq_file: add seq_read_iter Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

This will allow proc files to implement iter read semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/proc/inode.c         | 28 ++++++++++++++++++++++++++++
 include/linux/proc_fs.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 28d6105e908e4c..fa86619cebc2be 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -297,6 +297,29 @@ static loff_t proc_reg_llseek(struct file *file, loff_t offset, int whence)
 	return rv;
 }
 
+static ssize_t pde_read_iter(struct proc_dir_entry *pde, struct kiocb *iocb,
+		struct iov_iter *iter)
+{
+	if (!pde->proc_ops->proc_read_iter)
+		return -EINVAL;
+	return pde->proc_ops->proc_read_iter(iocb, iter);
+}
+
+static ssize_t proc_reg_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct proc_dir_entry *pde = PDE(file_inode(iocb->ki_filp));
+	ssize_t ret;
+
+	if (pde_is_permanent(pde))
+		return pde_read_iter(pde, iocb, iter);
+
+	if (!use_pde(pde))
+		return -EIO;
+	ret = pde_read_iter(pde, iocb, iter);
+	unuse_pde(pde);
+	return ret;
+}
+
 static ssize_t pde_read(struct proc_dir_entry *pde, struct file *file, char __user *buf, size_t count, loff_t *ppos)
 {
 	typeof_member(struct proc_ops, proc_read) read;
@@ -312,6 +335,9 @@ static ssize_t proc_reg_read(struct file *file, char __user *buf, size_t count,
 	struct proc_dir_entry *pde = PDE(file_inode(file));
 	ssize_t rv = -EIO;
 
+	if (pde->proc_ops->proc_read_iter)
+		return iter_read(file, buf, count, ppos, proc_reg_read_iter);
+
 	if (pde_is_permanent(pde)) {
 		return pde_read(pde, file, buf, count, ppos);
 	} else if (use_pde(pde)) {
@@ -569,6 +595,7 @@ static int proc_reg_release(struct inode *inode, struct file *file)
 static const struct file_operations proc_reg_file_ops = {
 	.llseek		= proc_reg_llseek,
 	.read		= proc_reg_read,
+	.read_iter	= proc_reg_read_iter,
 	.write		= proc_reg_write,
 	.poll		= proc_reg_poll,
 	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
@@ -585,6 +612,7 @@ static const struct file_operations proc_reg_file_ops = {
 static const struct file_operations proc_reg_file_ops_no_compat = {
 	.llseek		= proc_reg_llseek,
 	.read		= proc_reg_read,
+	.read_iter	= proc_reg_read_iter,
 	.write		= proc_reg_write,
 	.poll		= proc_reg_poll,
 	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index d1eed1b4365172..97b3f5f06db9d8 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -30,6 +30,7 @@ struct proc_ops {
 	unsigned int proc_flags;
 	int	(*proc_open)(struct inode *, struct file *);
 	ssize_t	(*proc_read)(struct file *, char __user *, size_t, loff_t *);
+	ssize_t (*proc_read_iter)(struct kiocb *, struct iov_iter *);
 	ssize_t	(*proc_write)(struct file *, const char __user *, size_t, loff_t *);
 	loff_t	(*proc_lseek)(struct file *, loff_t, int);
 	int	(*proc_release)(struct inode *, struct file *);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 07/11] seq_file: add seq_read_iter
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-06-24 16:28 ` [PATCH 06/11] proc: add a read_iter method to proc proc_ops Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:28 ` [PATCH 09/11] proc: switch over direct seq_read method calls to seq_read_iter Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

iov_iter based variant for reading a seq_file.  seq_read is
reimplemented on top of the iter variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/seq_file.c            | 34 +++++++++++++++++++++-------------
 include/linux/seq_file.h |  1 +
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 4e6239f33c066a..159a3e12cadd30 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/printk.h>
 #include <linux/string_helpers.h>
+#include <linux/uio.h>
 
 #include <linux/uaccess.h>
 #include <asm/page.h>
@@ -146,7 +147,17 @@ static int traverse(struct seq_file *m, loff_t offset)
  */
 ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 {
-	struct seq_file *m = file->private_data;
+	return iter_read(file, buf, size, ppos, seq_read_iter);
+}
+EXPORT_SYMBOL(seq_read);
+
+/*
+ * Ready-made ->f_op->read_iter()
+ */
+ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct seq_file *m = iocb->ki_filp->private_data;
+	size_t size = iov_iter_count(iter);
 	size_t copied = 0;
 	size_t n;
 	void *p;
@@ -158,14 +169,14 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	 * if request is to read from zero offset, reset iterator to first
 	 * record as it might have been already advanced by previous requests
 	 */
-	if (*ppos == 0) {
+	if (iocb->ki_pos == 0) {
 		m->index = 0;
 		m->count = 0;
 	}
 
-	/* Don't assume *ppos is where we left it */
-	if (unlikely(*ppos != m->read_pos)) {
-		while ((err = traverse(m, *ppos)) == -EAGAIN)
+	/* Don't assume ki_pos is where we left it */
+	if (unlikely(iocb->ki_pos != m->read_pos)) {
+		while ((err = traverse(m, iocb->ki_pos)) == -EAGAIN)
 			;
 		if (err) {
 			/* With prejudice... */
@@ -174,7 +185,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 			m->count = 0;
 			goto Done;
 		} else {
-			m->read_pos = *ppos;
+			m->read_pos = iocb->ki_pos;
 		}
 	}
 
@@ -187,13 +198,11 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	/* if not empty - flush it first */
 	if (m->count) {
 		n = min(m->count, size);
-		err = copy_to_user(buf, m->buf + m->from, n);
-		if (err)
+		if (copy_to_iter(m->buf + m->from, n, iter) != n)
 			goto Efault;
 		m->count -= n;
 		m->from += n;
 		size -= n;
-		buf += n;
 		copied += n;
 		if (!size)
 			goto Done;
@@ -254,8 +263,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	}
 	m->op->stop(m, p);
 	n = min(m->count, size);
-	err = copy_to_user(buf, m->buf, n);
-	if (err)
+	if (copy_to_iter(m->buf, n, iter) != n)
 		goto Efault;
 	copied += n;
 	m->count -= n;
@@ -264,7 +272,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	if (!copied)
 		copied = err;
 	else {
-		*ppos += copied;
+		iocb->ki_pos += copied;
 		m->read_pos += copied;
 	}
 	mutex_unlock(&m->lock);
@@ -276,7 +284,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	err = -EFAULT;
 	goto Done;
 }
-EXPORT_SYMBOL(seq_read);
+EXPORT_SYMBOL(seq_read_iter);
 
 /**
  *	seq_lseek -	->llseek() method for sequential files.
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index 813614d4b71fbc..b83b3ae3c877f3 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -107,6 +107,7 @@ void seq_pad(struct seq_file *m, char c);
 char *mangle_path(char *s, const char *p, const char *esc);
 int seq_open(struct file *, const struct seq_operations *);
 ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
+ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter);
 loff_t seq_lseek(struct file *, loff_t, int);
 int seq_release(struct inode *, struct file *);
 int seq_write(struct seq_file *seq, const void *data, size_t len);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/11] proc: switch over direct seq_read method calls to seq_read_iter
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-06-24 16:28 ` [PATCH 07/11] seq_file: add seq_read_iter Christoph Hellwig
@ 2020-06-24 16:28 ` Christoph Hellwig
  2020-06-24 16:29 ` [PATCH 10/11] fs: don't allow kernel reads and writes using ->read and ->write Christoph Hellwig
  2020-06-24 16:29 ` [PATCH 11/11] fs: don't allow splice read/write without explicit ops Christoph Hellwig
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:28 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Switch over all instances used directly as methods using these sed
expressions:

sed -i -e 's/\.proc_read\(\s*=\s*\)seq_read/\.proc_read_iter\1seq_read_iter/g'

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/alpha/kernel/srm_env.c                        |  2 +-
 arch/arm/mm/alignment.c                            |  2 +-
 arch/powerpc/kernel/rtas-proc.c                    | 10 +++++-----
 arch/powerpc/mm/numa.c                             |  2 +-
 arch/powerpc/platforms/pseries/lpar.c              |  4 ++--
 arch/powerpc/platforms/pseries/lparcfg.c           |  2 +-
 arch/sh/mm/alignment.c                             |  2 +-
 arch/sparc/kernel/led.c                            |  2 +-
 arch/um/kernel/exitcode.c                          |  2 +-
 arch/um/kernel/process.c                           |  2 +-
 arch/x86/kernel/cpu/mtrr/if.c                      |  2 +-
 arch/x86/platform/uv/tlb_uv.c                      |  2 +-
 drivers/acpi/battery.c                             |  2 +-
 drivers/acpi/proc.c                                |  2 +-
 drivers/hwmon/dell-smm-hwmon.c                     |  2 +-
 drivers/ide/ide-proc.c                             |  2 +-
 drivers/input/input.c                              |  4 ++--
 drivers/macintosh/via-pmu.c                        |  2 +-
 drivers/md/md.c                                    |  2 +-
 drivers/misc/sgi-gru/gruprocfs.c                   |  6 +++---
 drivers/net/wireless/intel/ipw2x00/libipw_module.c |  2 +-
 .../net/wireless/intersil/hostap/hostap_download.c |  2 +-
 drivers/parisc/led.c                               |  2 +-
 drivers/platform/x86/thinkpad_acpi.c               |  2 +-
 drivers/platform/x86/toshiba_acpi.c                |  8 ++++----
 drivers/pnp/pnpbios/proc.c                         |  2 +-
 drivers/s390/block/dasd_proc.c                     |  2 +-
 drivers/s390/cio/blacklist.c                       |  2 +-
 drivers/scsi/scsi_devinfo.c                        |  2 +-
 drivers/scsi/scsi_proc.c                           |  4 ++--
 drivers/scsi/sg.c                                  |  4 ++--
 .../staging/rtl8192u/ieee80211/ieee80211_module.c  |  2 +-
 drivers/usb/gadget/function/rndis.c                |  2 +-
 drivers/video/fbdev/via/viafbdev.c                 | 14 +++++++-------
 fs/cifs/cifs_debug.c                               | 14 +++++++-------
 fs/cifs/dfs_cache.c                                |  2 +-
 fs/fscache/object-list.c                           |  2 +-
 fs/jbd2/journal.c                                  |  2 +-
 fs/jfs/jfs_debug.c                                 |  2 +-
 fs/nfsd/nfsctl.c                                   |  2 +-
 fs/nfsd/stats.c                                    |  2 +-
 fs/proc/cpuinfo.c                                  |  2 +-
 fs/proc/generic.c                                  |  4 ++--
 fs/proc/proc_net.c                                 |  4 ++--
 fs/proc/stat.c                                     |  2 +-
 include/linux/seq_file.h                           |  2 +-
 ipc/util.c                                         |  2 +-
 kernel/irq/proc.c                                  |  6 +++---
 kernel/kallsyms.c                                  |  2 +-
 kernel/latencytop.c                                |  2 +-
 kernel/locking/lockdep_proc.c                      |  2 +-
 kernel/module.c                                    |  2 +-
 kernel/profile.c                                   |  2 +-
 kernel/sched/psi.c                                 |  6 +++---
 lib/dynamic_debug.c                                |  2 +-
 mm/slab_common.c                                   |  2 +-
 mm/swapfile.c                                      |  2 +-
 net/atm/mpoa_proc.c                                |  2 +-
 net/core/pktgen.c                                  |  6 +++---
 net/ipv4/netfilter/ipt_CLUSTERIP.c                 |  2 +-
 net/ipv4/route.c                                   |  4 ++--
 net/netfilter/xt_recent.c                          |  2 +-
 net/sunrpc/cache.c                                 |  2 +-
 net/sunrpc/stats.c                                 |  2 +-
 sound/core/info.c                                  |  2 +-
 65 files changed, 99 insertions(+), 99 deletions(-)

diff --git a/arch/alpha/kernel/srm_env.c b/arch/alpha/kernel/srm_env.c
index 528d2be5818298..8ad9c100ef7612 100644
--- a/arch/alpha/kernel/srm_env.c
+++ b/arch/alpha/kernel/srm_env.c
@@ -121,7 +121,7 @@ static ssize_t srm_env_proc_write(struct file *file, const char __user *buffer,
 
 static const struct proc_ops srm_env_proc_ops = {
 	.proc_open	= srm_env_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= srm_env_proc_write,
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 81a627e6e1c599..412cab88402acd 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -164,7 +164,7 @@ static ssize_t alignment_proc_write(struct file *file, const char __user *buffer
 
 static const struct proc_ops alignment_proc_ops = {
 	.proc_open	= alignment_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= alignment_proc_write,
diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index 2d33f342a29307..3aace56aacc1df 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -161,7 +161,7 @@ static int poweron_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops ppc_rtas_poweron_proc_ops = {
 	.proc_open	= poweron_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= ppc_rtas_poweron_write,
 	.proc_release	= single_release,
@@ -174,7 +174,7 @@ static int progress_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops ppc_rtas_progress_proc_ops = {
 	.proc_open	= progress_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= ppc_rtas_progress_write,
 	.proc_release	= single_release,
@@ -187,7 +187,7 @@ static int clock_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops ppc_rtas_clock_proc_ops = {
 	.proc_open	= clock_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= ppc_rtas_clock_write,
 	.proc_release	= single_release,
@@ -200,7 +200,7 @@ static int tone_freq_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops ppc_rtas_tone_freq_proc_ops = {
 	.proc_open	= tone_freq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= ppc_rtas_tone_freq_write,
 	.proc_release	= single_release,
@@ -213,7 +213,7 @@ static int tone_volume_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops ppc_rtas_tone_volume_proc_ops = {
 	.proc_open	= tone_volume_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= ppc_rtas_tone_volume_write,
 	.proc_release	= single_release,
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d19583004..2f3aef6c0b513b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1672,7 +1672,7 @@ static ssize_t topology_write(struct file *file, const char __user *buf,
 }
 
 static const struct proc_ops topology_proc_ops = {
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= topology_write,
 	.proc_open	= topology_open,
 	.proc_release	= single_release,
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index fd26f3d21d7b4b..2b13a67d60e206 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -584,7 +584,7 @@ static int vcpudispatch_stats_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops vcpudispatch_stats_proc_ops = {
 	.proc_open	= vcpudispatch_stats_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= vcpudispatch_stats_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
@@ -628,7 +628,7 @@ static int vcpudispatch_stats_freq_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops vcpudispatch_stats_freq_proc_ops = {
 	.proc_open	= vcpudispatch_stats_freq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= vcpudispatch_stats_freq_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index b8d28ab881789d..35eb0e4b8fd31a 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -699,7 +699,7 @@ static int lparcfg_open(struct inode *inode, struct file *file)
 }
 
 static const struct proc_ops lparcfg_proc_ops = {
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= lparcfg_write,
 	.proc_open	= lparcfg_open,
 	.proc_release	= single_release,
diff --git a/arch/sh/mm/alignment.c b/arch/sh/mm/alignment.c
index fb517b82a87b10..66115241ad93a4 100644
--- a/arch/sh/mm/alignment.c
+++ b/arch/sh/mm/alignment.c
@@ -154,7 +154,7 @@ static ssize_t alignment_proc_write(struct file *file,
 
 static const struct proc_ops alignment_proc_ops = {
 	.proc_open	= alignment_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= alignment_proc_write,
diff --git a/arch/sparc/kernel/led.c b/arch/sparc/kernel/led.c
index bd48575172c323..a0b893d216c443 100644
--- a/arch/sparc/kernel/led.c
+++ b/arch/sparc/kernel/led.c
@@ -106,7 +106,7 @@ static ssize_t led_proc_write(struct file *file, const char __user *buffer,
 
 static const struct proc_ops led_proc_ops = {
 	.proc_open	= led_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= led_proc_write,
diff --git a/arch/um/kernel/exitcode.c b/arch/um/kernel/exitcode.c
index 43edc2aa57e4fb..95184d271a47cf 100644
--- a/arch/um/kernel/exitcode.c
+++ b/arch/um/kernel/exitcode.c
@@ -57,7 +57,7 @@ static ssize_t exitcode_proc_write(struct file *file,
 
 static const struct proc_ops exitcode_proc_ops = {
 	.proc_open	= exitcode_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= exitcode_proc_write,
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index e3a2cf92a3738b..f3e4bd48f6d5b6 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -312,7 +312,7 @@ static ssize_t sysemu_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops sysemu_proc_ops = {
 	.proc_open	= sysemu_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= sysemu_proc_write,
diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
index a5c506f6da7fa1..f5743b5ecaf232 100644
--- a/arch/x86/kernel/cpu/mtrr/if.c
+++ b/arch/x86/kernel/cpu/mtrr/if.c
@@ -398,7 +398,7 @@ static int mtrr_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops mtrr_proc_ops = {
 	.proc_open		= mtrr_open,
-	.proc_read		= seq_read,
+	.proc_read_iter		= seq_read_iter,
 	.proc_lseek		= seq_lseek,
 	.proc_write		= mtrr_write,
 	.proc_ioctl		= mtrr_ioctl,
diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c
index 0ac96ca304c7b5..cd6ee86283e6e1 100644
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -1670,7 +1670,7 @@ static int tunables_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops uv_ptc_proc_ops = {
 	.proc_open	= ptc_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= ptc_proc_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index 366c389175d844..c8e5972ad6c952 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -1204,7 +1204,7 @@ static int acpi_battery_alarm_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops acpi_battery_alarm_proc_ops = {
 	.proc_open	= acpi_battery_alarm_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= acpi_battery_write_alarm,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
diff --git a/drivers/acpi/proc.c b/drivers/acpi/proc.c
index 7892980b3ce4d3..774da498c77265 100644
--- a/drivers/acpi/proc.c
+++ b/drivers/acpi/proc.c
@@ -136,7 +136,7 @@ acpi_system_wakeup_device_open_fs(struct inode *inode, struct file *file)
 
 static const struct proc_ops acpi_system_wakeup_device_proc_ops = {
 	.proc_open	= acpi_system_wakeup_device_open_fs,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= acpi_system_write_wakeup_device,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
index 16be012a95ed84..29ac388429620f 100644
--- a/drivers/hwmon/dell-smm-hwmon.c
+++ b/drivers/hwmon/dell-smm-hwmon.c
@@ -597,7 +597,7 @@ static int i8k_open_fs(struct inode *inode, struct file *file)
 
 static const struct proc_ops i8k_proc_ops = {
 	.proc_open	= i8k_open_fs,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_ioctl	= i8k_ioctl,
diff --git a/drivers/ide/ide-proc.c b/drivers/ide/ide-proc.c
index 8ea282a3a19f0d..a0b56737152fb5 100644
--- a/drivers/ide/ide-proc.c
+++ b/drivers/ide/ide-proc.c
@@ -383,7 +383,7 @@ static ssize_t ide_settings_proc_write(struct file *file, const char __user *buf
 
 static const struct proc_ops ide_settings_proc_ops = {
 	.proc_open	= ide_settings_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= ide_settings_proc_write,
diff --git a/drivers/input/input.c b/drivers/input/input.c
index 3cfd2c18eebd9d..c8180d7f92d576 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -1220,7 +1220,7 @@ static int input_proc_devices_open(struct inode *inode, struct file *file)
 static const struct proc_ops input_devices_proc_ops = {
 	.proc_open	= input_proc_devices_open,
 	.proc_poll	= input_proc_devices_poll,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
@@ -1282,7 +1282,7 @@ static int input_proc_handlers_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops input_handlers_proc_ops = {
 	.proc_open	= input_proc_handlers_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
index 73e6ae88fafd4e..9415eddb419402 100644
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -973,7 +973,7 @@ static ssize_t pmu_options_proc_write(struct file *file,
 
 static const struct proc_ops pmu_options_proc_ops = {
 	.proc_open	= pmu_options_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= pmu_options_proc_write,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f567f536b529bd..0bae6c1523cec6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8301,7 +8301,7 @@ static __poll_t mdstat_poll(struct file *filp, poll_table *wait)
 
 static const struct proc_ops mdstat_proc_ops = {
 	.proc_open	= md_seq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 	.proc_poll	= mdstat_poll,
diff --git a/drivers/misc/sgi-gru/gruprocfs.c b/drivers/misc/sgi-gru/gruprocfs.c
index 97b8b38ab47dfd..fc9498ec797762 100644
--- a/drivers/misc/sgi-gru/gruprocfs.c
+++ b/drivers/misc/sgi-gru/gruprocfs.c
@@ -257,7 +257,7 @@ static int options_open(struct inode *inode, struct file *file)
 /* *INDENT-OFF* */
 static const struct proc_ops statistics_proc_ops = {
 	.proc_open	= statistics_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= statistics_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
@@ -265,7 +265,7 @@ static const struct proc_ops statistics_proc_ops = {
 
 static const struct proc_ops mcs_statistics_proc_ops = {
 	.proc_open	= mcs_statistics_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= mcs_statistics_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
@@ -273,7 +273,7 @@ static const struct proc_ops mcs_statistics_proc_ops = {
 
 static const struct proc_ops options_proc_ops = {
 	.proc_open	= options_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= options_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
diff --git a/drivers/net/wireless/intel/ipw2x00/libipw_module.c b/drivers/net/wireless/intel/ipw2x00/libipw_module.c
index 43bab92a4148f2..1929db6921d7e0 100644
--- a/drivers/net/wireless/intel/ipw2x00/libipw_module.c
+++ b/drivers/net/wireless/intel/ipw2x00/libipw_module.c
@@ -242,7 +242,7 @@ static ssize_t debug_level_proc_write(struct file *file,
 
 static const struct proc_ops debug_level_proc_ops = {
 	.proc_open	= debug_level_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= debug_level_proc_write,
diff --git a/drivers/net/wireless/intersil/hostap/hostap_download.c b/drivers/net/wireless/intersil/hostap/hostap_download.c
index 7c6a5a6d1d45d8..8980fd57b2eda4 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_download.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_download.c
@@ -234,7 +234,7 @@ static int prism2_download_aux_dump_proc_open(struct inode *inode, struct file *
 
 static const struct proc_ops prism2_download_aux_dump_proc_ops = {
 	.proc_open		= prism2_download_aux_dump_proc_open,
-	.proc_read		= seq_read,
+	.proc_read_iter		= seq_read_iter,
 	.proc_lseek		= seq_lseek,
 	.proc_release		= seq_release_private,
 };
diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
index 36c6613f7a36b7..d75df3977926b3 100644
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -232,7 +232,7 @@ static ssize_t led_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops led_proc_ops = {
 	.proc_open	= led_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= led_proc_write,
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index ff7f0a4f247563..f571d6254e7c34 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -901,7 +901,7 @@ static ssize_t dispatch_proc_write(struct file *file,
 
 static const struct proc_ops dispatch_proc_ops = {
 	.proc_open	= dispatch_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= dispatch_proc_write,
diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
index 1ddab5a6dead6d..770477bb407d49 100644
--- a/drivers/platform/x86/toshiba_acpi.c
+++ b/drivers/platform/x86/toshiba_acpi.c
@@ -1428,7 +1428,7 @@ static ssize_t lcd_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops lcd_proc_ops = {
 	.proc_open	= lcd_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= lcd_proc_write,
@@ -1534,7 +1534,7 @@ static ssize_t video_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops video_proc_ops = {
 	.proc_open	= video_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= video_proc_write,
@@ -1611,7 +1611,7 @@ static ssize_t fan_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops fan_proc_ops = {
 	.proc_open	= fan_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= fan_proc_write,
@@ -1655,7 +1655,7 @@ static ssize_t keys_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops keys_proc_ops = {
 	.proc_open	= keys_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= keys_proc_write,
diff --git a/drivers/pnp/pnpbios/proc.c b/drivers/pnp/pnpbios/proc.c
index a806830e3a407f..10d0181c4430ab 100644
--- a/drivers/pnp/pnpbios/proc.c
+++ b/drivers/pnp/pnpbios/proc.c
@@ -212,7 +212,7 @@ static ssize_t pnpbios_proc_write(struct file *file, const char __user *buf,
 
 static const struct proc_ops pnpbios_proc_ops = {
 	.proc_open	= pnpbios_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= pnpbios_proc_write,
diff --git a/drivers/s390/block/dasd_proc.c b/drivers/s390/block/dasd_proc.c
index 62a859ea67f893..278f0dccc85ff1 100644
--- a/drivers/s390/block/dasd_proc.c
+++ b/drivers/s390/block/dasd_proc.c
@@ -322,7 +322,7 @@ static ssize_t dasd_stats_proc_write(struct file *file,
 
 static const struct proc_ops dasd_stats_proc_ops = {
 	.proc_open	= dasd_stats_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= dasd_stats_proc_write,
diff --git a/drivers/s390/cio/blacklist.c b/drivers/s390/cio/blacklist.c
index 4dd2eb63485699..05f58c453b060c 100644
--- a/drivers/s390/cio/blacklist.c
+++ b/drivers/s390/cio/blacklist.c
@@ -401,7 +401,7 @@ cio_ignore_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops cio_ignore_proc_ops = {
 	.proc_open	= cio_ignore_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release_private,
 	.proc_write	= cio_ignore_write,
diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index eed31021e7885c..87fb440ddfc5d8 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -738,7 +738,7 @@ static ssize_t proc_scsi_devinfo_write(struct file *file,
 
 static const struct proc_ops scsi_devinfo_proc_ops = {
 	.proc_open	= proc_scsi_devinfo_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= proc_scsi_devinfo_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
diff --git a/drivers/scsi/scsi_proc.c b/drivers/scsi/scsi_proc.c
index d6982d3557396b..81601a9e79c4db 100644
--- a/drivers/scsi/scsi_proc.c
+++ b/drivers/scsi/scsi_proc.c
@@ -86,7 +86,7 @@ static int proc_scsi_host_open(struct inode *inode, struct file *file)
 static const struct proc_ops proc_scsi_ops = {
 	.proc_open	= proc_scsi_host_open,
 	.proc_release	= single_release,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= proc_scsi_host_write
 };
@@ -438,7 +438,7 @@ static int proc_scsi_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops scsi_scsi_proc_ops = {
 	.proc_open	= proc_scsi_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= proc_scsi_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..c5d482190066bc 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2328,7 +2328,7 @@ static ssize_t sg_proc_write_adio(struct file *filp, const char __user *buffer,
 			          size_t count, loff_t *off);
 static const struct proc_ops adio_proc_ops = {
 	.proc_open	= sg_proc_single_open_adio,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= sg_proc_write_adio,
 	.proc_release	= single_release,
@@ -2339,7 +2339,7 @@ static ssize_t sg_proc_write_dressz(struct file *filp,
 		const char __user *buffer, size_t count, loff_t *off);
 static const struct proc_ops dressz_proc_ops = {
 	.proc_open	= sg_proc_single_open_dressz,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= sg_proc_write_dressz,
 	.proc_release	= single_release,
diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_module.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_module.c
index a5a1b14f5a40c5..e198779db6fc2e 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211_module.c
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_module.c
@@ -265,7 +265,7 @@ static int open_debug_level(struct inode *inode, struct file *file)
 
 static const struct proc_ops debug_level_proc_ops = {
 	.proc_open	= open_debug_level,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= write_debug_level,
 	.proc_release	= single_release,
diff --git a/drivers/usb/gadget/function/rndis.c b/drivers/usb/gadget/function/rndis.c
index 64de9f1b874c55..562781b95101d3 100644
--- a/drivers/usb/gadget/function/rndis.c
+++ b/drivers/usb/gadget/function/rndis.c
@@ -1166,7 +1166,7 @@ static int rndis_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops rndis_proc_ops = {
 	.proc_open	= rndis_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= rndis_proc_write,
diff --git a/drivers/video/fbdev/via/viafbdev.c b/drivers/video/fbdev/via/viafbdev.c
index 22deb340a0484f..6cf91191de7f15 100644
--- a/drivers/video/fbdev/via/viafbdev.c
+++ b/drivers/video/fbdev/via/viafbdev.c
@@ -1175,7 +1175,7 @@ static ssize_t viafb_dvp0_proc_write(struct file *file,
 
 static const struct proc_ops viafb_dvp0_proc_ops = {
 	.proc_open	= viafb_dvp0_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_dvp0_proc_write,
@@ -1239,7 +1239,7 @@ static ssize_t viafb_dvp1_proc_write(struct file *file,
 
 static const struct proc_ops viafb_dvp1_proc_ops = {
 	.proc_open	= viafb_dvp1_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_dvp1_proc_write,
@@ -1273,7 +1273,7 @@ static ssize_t viafb_dfph_proc_write(struct file *file,
 
 static const struct proc_ops viafb_dfph_proc_ops = {
 	.proc_open	= viafb_dfph_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_dfph_proc_write,
@@ -1307,7 +1307,7 @@ static ssize_t viafb_dfpl_proc_write(struct file *file,
 
 static const struct proc_ops viafb_dfpl_proc_ops = {
 	.proc_open	= viafb_dfpl_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_dfpl_proc_write,
@@ -1442,7 +1442,7 @@ static ssize_t viafb_vt1636_proc_write(struct file *file,
 
 static const struct proc_ops viafb_vt1636_proc_ops = {
 	.proc_open	= viafb_vt1636_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_vt1636_proc_write,
@@ -1519,7 +1519,7 @@ static ssize_t viafb_iga1_odev_proc_write(struct file *file,
 
 static const struct proc_ops viafb_iga1_odev_proc_ops = {
 	.proc_open	= viafb_iga1_odev_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_iga1_odev_proc_write,
@@ -1558,7 +1558,7 @@ static ssize_t viafb_iga2_odev_proc_write(struct file *file,
 
 static const struct proc_ops viafb_iga2_odev_proc_ops = {
 	.proc_open	= viafb_iga2_odev_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= viafb_iga2_odev_proc_write,
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index fc98b97b396a44..65eb425cc3adf0 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -615,7 +615,7 @@ static int cifs_stats_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops cifs_stats_proc_ops = {
 	.proc_open	= cifs_stats_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= cifs_stats_proc_write,
@@ -644,7 +644,7 @@ static int name##_open(struct inode *inode, struct file *file) \
 \
 static const struct proc_ops cifs_##name##_proc_fops = { \
 	.proc_open	= name##_open, \
-	.proc_read	= seq_read, \
+	.proc_read_iter	= seq_read_iter, \
 	.proc_lseek	= seq_lseek, \
 	.proc_release	= single_release, \
 	.proc_write	= name##_write, \
@@ -778,7 +778,7 @@ static ssize_t cifsFYI_proc_write(struct file *file, const char __user *buffer,
 
 static const struct proc_ops cifsFYI_proc_ops = {
 	.proc_open	= cifsFYI_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= cifsFYI_proc_write,
@@ -809,7 +809,7 @@ static ssize_t cifs_linux_ext_proc_write(struct file *file,
 
 static const struct proc_ops cifs_linux_ext_proc_ops = {
 	.proc_open	= cifs_linux_ext_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= cifs_linux_ext_proc_write,
@@ -840,7 +840,7 @@ static ssize_t cifs_lookup_cache_proc_write(struct file *file,
 
 static const struct proc_ops cifs_lookup_cache_proc_ops = {
 	.proc_open	= cifs_lookup_cache_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= cifs_lookup_cache_proc_write,
@@ -871,7 +871,7 @@ static ssize_t traceSMB_proc_write(struct file *file, const char __user *buffer,
 
 static const struct proc_ops traceSMB_proc_ops = {
 	.proc_open	= traceSMB_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= traceSMB_proc_write,
@@ -982,7 +982,7 @@ static ssize_t cifs_security_flags_proc_write(struct file *file,
 
 static const struct proc_ops cifs_security_flags_proc_ops = {
 	.proc_open	= cifs_security_flags_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= cifs_security_flags_proc_write,
diff --git a/fs/cifs/dfs_cache.c b/fs/cifs/dfs_cache.c
index df81c718d2faec..a4fe155fc92a7a 100644
--- a/fs/cifs/dfs_cache.c
+++ b/fs/cifs/dfs_cache.c
@@ -214,7 +214,7 @@ static int dfscache_proc_open(struct inode *inode, struct file *file)
 
 const struct proc_ops dfscache_proc_ops = {
 	.proc_open	= dfscache_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= dfscache_proc_write,
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index e106a1a1600d82..fab5a4197f50c3 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -408,7 +408,7 @@ static int fscache_objlist_release(struct inode *inode, struct file *file)
 
 const struct proc_ops fscache_objlist_proc_ops = {
 	.proc_open	= fscache_objlist_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= fscache_objlist_release,
 };
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e4944436e733d0..db661b953c4378 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1080,7 +1080,7 @@ static int jbd2_seq_info_release(struct inode *inode, struct file *file)
 
 static const struct proc_ops jbd2_info_proc_ops = {
 	.proc_open	= jbd2_seq_info_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= jbd2_seq_info_release,
 };
diff --git a/fs/jfs/jfs_debug.c b/fs/jfs/jfs_debug.c
index 44b62b3c322e9a..235df6bac1d71a 100644
--- a/fs/jfs/jfs_debug.c
+++ b/fs/jfs/jfs_debug.c
@@ -45,7 +45,7 @@ static ssize_t jfs_loglevel_proc_write(struct file *file,
 
 static const struct proc_ops jfs_loglevel_proc_ops = {
 	.proc_open	= jfs_loglevel_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= jfs_loglevel_proc_write,
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 1b36a8ba82d204..a73899dfc09b9d 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -159,7 +159,7 @@ static int exports_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops exports_proc_ops = {
 	.proc_open	= exports_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
index b1bc582b0493e4..1076f87715ad88 100644
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -86,7 +86,7 @@ static int nfsd_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops nfsd_proc_ops = {
 	.proc_open	= nfsd_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 };
diff --git a/fs/proc/cpuinfo.c b/fs/proc/cpuinfo.c
index d0989a443c77df..419760fd77bdd8 100644
--- a/fs/proc/cpuinfo.c
+++ b/fs/proc/cpuinfo.c
@@ -19,7 +19,7 @@ static int cpuinfo_open(struct inode *inode, struct file *file)
 static const struct proc_ops cpuinfo_proc_ops = {
 	.proc_flags	= PROC_ENTRY_PERMANENT,
 	.proc_open	= cpuinfo_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 2f9fa179194d72..4323b28db5643a 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -590,7 +590,7 @@ static int proc_seq_release(struct inode *inode, struct file *file)
 static const struct proc_ops proc_seq_ops = {
 	/* not permanent -- can call into arbitrary seq_operations */
 	.proc_open	= proc_seq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= proc_seq_release,
 };
@@ -621,7 +621,7 @@ static int proc_single_open(struct inode *inode, struct file *file)
 static const struct proc_ops proc_single_ops = {
 	/* not permanent -- can call into arbitrary ->single_show */
 	.proc_open	= proc_single_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 };
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index dba63b2429f05f..8274f4fc4d4338 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -92,7 +92,7 @@ static int seq_release_net(struct inode *ino, struct file *f)
 
 static const struct proc_ops proc_net_seq_ops = {
 	.proc_open	= seq_open_net,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= proc_simple_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release_net,
@@ -204,7 +204,7 @@ static int single_release_net(struct inode *ino, struct file *f)
 
 static const struct proc_ops proc_net_single_ops = {
 	.proc_open	= single_open_net,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= proc_simple_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release_net,
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 46b3293015fe61..4695b6de315129 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -226,7 +226,7 @@ static int stat_open(struct inode *inode, struct file *file)
 static const struct proc_ops stat_proc_ops = {
 	.proc_flags	= PROC_ENTRY_PERMANENT,
 	.proc_open	= stat_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 };
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index 0c9e9c8607e788..04f342179f5572 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -187,7 +187,7 @@ static int __name ## _open(struct inode *inode, struct file *file)	\
 									\
 static const struct proc_ops __name ## _proc_ops = {			\
 	.proc_open	= __name ## _open,				\
-	.proc_read	= seq_read,					\
+	.proc_read_iter	= seq_read_iter,					\
 	.proc_lseek	= seq_lseek,					\
 	.proc_release	= single_release,				\
 }
diff --git a/ipc/util.c b/ipc/util.c
index cfa0045e748d55..189c835108afc8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -887,7 +887,7 @@ static int sysvipc_proc_release(struct inode *inode, struct file *file)
 static const struct proc_ops sysvipc_proc_ops = {
 	.proc_flags	= PROC_ENTRY_PERMANENT,
 	.proc_open	= sysvipc_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= sysvipc_proc_release,
 };
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 32c071d7bc0338..6c541898f614c4 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -200,7 +200,7 @@ static int irq_affinity_list_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops irq_affinity_proc_ops = {
 	.proc_open	= irq_affinity_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= irq_affinity_proc_write,
@@ -208,7 +208,7 @@ static const struct proc_ops irq_affinity_proc_ops = {
 
 static const struct proc_ops irq_affinity_list_proc_ops = {
 	.proc_open	= irq_affinity_list_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= irq_affinity_list_proc_write,
@@ -270,7 +270,7 @@ static int default_affinity_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops default_affinity_proc_ops = {
 	.proc_open	= default_affinity_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= default_affinity_write,
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 16c8c605f4b0fa..90facda3b723ab 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -699,7 +699,7 @@ const char *kdb_walk_kallsyms(loff_t *pos)
 
 static const struct proc_ops kallsyms_proc_ops = {
 	.proc_open	= kallsyms_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release_private,
 };
diff --git a/kernel/latencytop.c b/kernel/latencytop.c
index 166d7bf49666b0..543c7f552c45ce 100644
--- a/kernel/latencytop.c
+++ b/kernel/latencytop.c
@@ -257,7 +257,7 @@ static int lstats_open(struct inode *inode, struct file *filp)
 
 static const struct proc_ops lstats_proc_ops = {
 	.proc_open	= lstats_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= lstats_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 5525cd3ba0c83c..60c725f53c26d7 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -669,7 +669,7 @@ static int lock_stat_release(struct inode *inode, struct file *file)
 static const struct proc_ops lock_stat_proc_ops = {
 	.proc_open	= lock_stat_open,
 	.proc_write	= lock_stat_write,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= lock_stat_release,
 };
diff --git a/kernel/module.c b/kernel/module.c
index e8a198588f26ee..019e7b21dbbf3c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4386,7 +4386,7 @@ static int modules_open(struct inode *inode, struct file *file)
 static const struct proc_ops modules_proc_ops = {
 	.proc_flags	= PROC_ENTRY_PERMANENT,
 	.proc_open	= modules_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
diff --git a/kernel/profile.c b/kernel/profile.c
index 6f69a4195d5630..101090397235ae 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -444,7 +444,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file,
 
 static const struct proc_ops prof_cpu_mask_proc_ops = {
 	.proc_open	= prof_cpu_mask_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 	.proc_write	= prof_cpu_mask_proc_write,
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 8f45cdb6463b88..6795170140a031 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1305,7 +1305,7 @@ static int psi_fop_release(struct inode *inode, struct file *file)
 
 static const struct proc_ops psi_io_proc_ops = {
 	.proc_open	= psi_io_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= psi_io_write,
 	.proc_poll	= psi_fop_poll,
@@ -1314,7 +1314,7 @@ static const struct proc_ops psi_io_proc_ops = {
 
 static const struct proc_ops psi_memory_proc_ops = {
 	.proc_open	= psi_memory_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= psi_memory_write,
 	.proc_poll	= psi_fop_poll,
@@ -1323,7 +1323,7 @@ static const struct proc_ops psi_memory_proc_ops = {
 
 static const struct proc_ops psi_cpu_proc_ops = {
 	.proc_open	= psi_cpu_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= psi_cpu_write,
 	.proc_poll	= psi_fop_poll,
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 76750e73dcaf9d..2317d29e16def9 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -878,7 +878,7 @@ static const struct file_operations ddebug_proc_fops = {
 
 static const struct proc_ops proc_fops = {
 	.proc_open = ddebug_proc_open,
-	.proc_read = seq_read,
+	.proc_read_iter = seq_read_iter,
 	.proc_lseek = seq_lseek,
 	.proc_release = seq_release_private,
 	.proc_write = ddebug_proc_write
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9e72ba2241750c..8497bdf79a770f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1584,7 +1584,7 @@ static int slabinfo_open(struct inode *inode, struct file *file)
 static const struct proc_ops slabinfo_proc_ops = {
 	.proc_flags	= PROC_ENTRY_PERMANENT,
 	.proc_open	= slabinfo_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= slabinfo_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 987276c557d1f1..2e50f52a14c7c2 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2835,7 +2835,7 @@ static int swaps_open(struct inode *inode, struct file *file)
 static const struct proc_ops swaps_proc_ops = {
 	.proc_flags	= PROC_ENTRY_PERMANENT,
 	.proc_open	= swaps_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 	.proc_poll	= swaps_poll,
diff --git a/net/atm/mpoa_proc.c b/net/atm/mpoa_proc.c
index 829db9eba0cb95..fe8f822c7750a6 100644
--- a/net/atm/mpoa_proc.c
+++ b/net/atm/mpoa_proc.c
@@ -55,7 +55,7 @@ static int parse_qos(const char *buff);
 
 static const struct proc_ops mpc_proc_ops = {
 	.proc_open	= proc_mpc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= proc_mpc_write,
 	.proc_release	= seq_release,
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index b53b6d38c4dff8..200b976202d0d1 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -537,7 +537,7 @@ static int pgctrl_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops pktgen_proc_ops = {
 	.proc_open	= pgctrl_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= pgctrl_write,
 	.proc_release	= single_release,
@@ -1709,7 +1709,7 @@ static int pktgen_if_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops pktgen_if_proc_ops = {
 	.proc_open	= pktgen_if_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= pktgen_if_write,
 	.proc_release	= single_release,
@@ -1846,7 +1846,7 @@ static int pktgen_thread_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops pktgen_thread_proc_ops = {
 	.proc_open	= pktgen_thread_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_write	= pktgen_thread_write,
 	.proc_release	= single_release,
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index f8755a4ae9d4bd..67472389c9c395 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -806,7 +806,7 @@ static ssize_t clusterip_proc_write(struct file *file, const char __user *input,
 
 static const struct proc_ops clusterip_proc_ops = {
 	.proc_open	= clusterip_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= clusterip_proc_write,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= clusterip_proc_release,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1d7076b78e630b..4d5f26478f1001 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -239,7 +239,7 @@ static int rt_cache_seq_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops rt_cache_proc_ops = {
 	.proc_open	= rt_cache_seq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
@@ -330,7 +330,7 @@ static int rt_cpu_seq_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops rt_cpu_proc_ops = {
 	.proc_open	= rt_cpu_seq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= seq_release,
 };
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 19bef176145eb9..f9cc00f7486058 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -618,7 +618,7 @@ recent_mt_proc_write(struct file *file, const char __user *input,
 
 static const struct proc_ops recent_mt_proc_ops = {
 	.proc_open	= recent_seq_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_write	= recent_mt_proc_write,
 	.proc_release	= seq_release_private,
 	.proc_lseek	= seq_lseek,
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index e5c01697c3f1d6..3671c464e0d30f 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1623,7 +1623,7 @@ static int content_release_procfs(struct inode *inode, struct file *filp)
 
 static const struct proc_ops content_proc_ops = {
 	.proc_open	= content_open_procfs,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= content_release_procfs,
 };
diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index c964b48eaabae4..95b56f0e5a01e8 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -71,7 +71,7 @@ static int rpc_proc_open(struct inode *inode, struct file *file)
 
 static const struct proc_ops rpc_proc_ops = {
 	.proc_open	= rpc_proc_open,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 	.proc_lseek	= seq_lseek,
 	.proc_release	= single_release,
 };
diff --git a/sound/core/info.c b/sound/core/info.c
index 8c6bc5241df50c..6e2a35a37b5e6b 100644
--- a/sound/core/info.c
+++ b/sound/core/info.c
@@ -426,7 +426,7 @@ static const struct proc_ops snd_info_text_entry_ops =
 	.proc_release	= snd_info_text_entry_release,
 	.proc_write	= snd_info_text_entry_write,
 	.proc_lseek	= seq_lseek,
-	.proc_read	= seq_read,
+	.proc_read_iter	= seq_read_iter,
 };
 
 static struct snd_info_entry *create_subdir(struct module *mod,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 10/11] fs: don't allow kernel reads and writes using ->read and ->write
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (7 preceding siblings ...)
  2020-06-24 16:28 ` [PATCH 09/11] proc: switch over direct seq_read method calls to seq_read_iter Christoph Hellwig
@ 2020-06-24 16:29 ` Christoph Hellwig
  2020-06-24 16:29 ` [PATCH 11/11] fs: don't allow splice read/write without explicit ops Christoph Hellwig
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:29 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Don't allow calling ->read or ->write with set_fs as a preparation for
killing off set_fs.  While I've not triggered any of these cases in my
setups as all the usual suspect (file systems, pipes, sockets, block
devices, system character devices) use the iter ops this is almost
going to be guaranteed to eventuall break something, so print a detailed
error message helping to debug such cases.  The fix will be to switch the
affected driver to use ->read_uptr / ->write_uptr or ->read_iter /
->write_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index b92c222ca886ca..1b813d9bcf08b7 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -420,6 +420,18 @@ ssize_t iter_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos,
 	return ret;
 }
 
+static void warn_unsupported(struct file *file, const char *op)
+{
+	char pathname[128], *path;
+
+	path = file_path(file, pathname, sizeof(pathname));
+	if (IS_ERR(path))
+		path = "(unknown)";
+	pr_warn_ratelimited(
+		"kernel space %s not supported for file %s (pid: %d comm: %.20s)\n",
+		op, path, current->pid, current->comm);
+}
+
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	ssize_t ret;
@@ -433,12 +445,6 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 		count =  MAX_RW_COUNT;
 	if (file->f_op->read_uptr) {
 		ret = file->f_op->read_uptr(file, KERNEL_UPTR(buf), count, pos);
-	} else if (file->f_op->read) {
-		mm_segment_t old_fs = get_fs();
-
-		set_fs(KERNEL_DS);
-		ret = file->f_op->read(file, (void __user *)buf, count, pos);
-		set_fs(old_fs);
 	} else if (file->f_op->read_iter) {
 		struct kvec iov = { .iov_base = buf, .iov_len = count };
 		struct kiocb kiocb;
@@ -450,6 +456,8 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 		ret = file->f_op->read_iter(&kiocb, &iter);
 		*pos = kiocb.ki_pos;
 	} else {
+		if (file->f_op->read)
+			warn_unsupported(file, "read");
 		ret = -EINVAL;
 	}
 	if (ret > 0) {
@@ -539,13 +547,6 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 	if (file->f_op->write_uptr) {
 		ret = file->f_op->write_uptr(file, KERNEL_UPTR((void *)buf),
 				count, pos);
-	} else if (file->f_op->write) {
-		mm_segment_t old_fs = get_fs();
-
-		set_fs(KERNEL_DS);
-		ret = file->f_op->write(file, (__force const char __user *)buf,
-				count, pos);
-		set_fs(old_fs);
 	} else if (file->f_op->write_iter) {
 		struct kvec iov = { .iov_base = (void *)buf, .iov_len = count };
 		struct kiocb kiocb;
@@ -558,6 +559,8 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 		if (ret > 0)
 			*pos = kiocb.ki_pos;
 	} else {
+		if (file->f_op->write)
+			warn_unsupported(file, "write");
 		ret = -EINVAL;
 	}
 	if (ret > 0) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 11/11] fs: don't allow splice read/write without explicit ops
  2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
                   ` (8 preceding siblings ...)
  2020-06-24 16:29 ` [PATCH 10/11] fs: don't allow kernel reads and writes using ->read and ->write Christoph Hellwig
@ 2020-06-24 16:29 ` Christoph Hellwig
  9 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:29 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-fsdevel

Now that __kernel_write or __kernel_write don't just work on all
file operations instances there is not much of a point of providing
default splice methods.  Renamed the existing default ones to
simple_ and wire them up for the few instancas actually implementing
->read_uptr and ->write_uptr.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/proc/proc_sysctl.c |  2 ++
 fs/splice.c           | 40 ++++++++++++++++++++++++++++------------
 include/linux/fs.h    |  5 ++++-
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index dd5eb693bd00df..5adbc1c8f899cd 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -856,6 +856,8 @@ static const struct file_operations proc_sys_file_operations = {
 	.poll		= proc_sys_poll,
 	.read_uptr	= proc_sys_read,
 	.write_uptr	= proc_sys_write,
+	.splice_read	= simple_splice_read,
+	.splice_write	= simple_splice_write,
 	.llseek		= default_llseek,
 };
 
diff --git a/fs/splice.c b/fs/splice.c
index d1efc53875bd93..d5bba2cd695b5d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -342,9 +342,8 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
 };
 EXPORT_SYMBOL(nosteal_pipe_buf_ops);
 
-static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
-				 struct pipe_inode_info *pipe, size_t len,
-				 unsigned int flags)
+ssize_t simple_splice_read(struct file *in, loff_t *ppos,
+		struct pipe_inode_info *pipe, size_t len, unsigned int flags)
 {
 	struct iov_iter to;
 	struct page **pages;
@@ -779,9 +778,8 @@ static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 	return ret;
 }
 
-static ssize_t default_file_splice_write(struct pipe_inode_info *pipe,
-					 struct file *out, loff_t *ppos,
-					 size_t len, unsigned int flags)
+ssize_t simple_splice_write(struct pipe_inode_info *pipe, struct file *out,
+		loff_t *ppos, size_t len, unsigned int flags)
 {
 	ssize_t ret;
 
@@ -813,15 +811,30 @@ ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
 
 EXPORT_SYMBOL(generic_splice_sendpage);
 
+static void warn_unsupported(struct file *file, const char *op)
+{
+	char pathname[128], *path;
+
+	path = file_path(file, pathname, sizeof(pathname));
+	if (IS_ERR(path))
+		path = "(unknown)";
+	pr_debug_ratelimited(
+		"splice %s not supported for file %s (pid: %d comm: %.20s)\n",
+		op, path, current->pid, current->comm);
+}
+
 /*
  * Attempt to initiate a splice from pipe to file.
  */
 static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
 			   loff_t *ppos, size_t len, unsigned int flags)
 {
-	if (out->f_op->splice_write)
-		return out->f_op->splice_write(pipe, out, ppos, len, flags);
-	return default_file_splice_write(pipe, out, ppos, len, flags);
+	if (!out->f_op->splice_write) {
+		warn_unsupported(out, "write");
+		return -EINVAL;
+	}
+
+	return out->f_op->splice_write(pipe, out, ppos, len, flags);
 }
 
 /*
@@ -843,9 +856,12 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 	if (unlikely(len > MAX_RW_COUNT))
 		len = MAX_RW_COUNT;
 
-	if (in->f_op->splice_read)
-		return in->f_op->splice_read(in, ppos, pipe, len, flags);
-	return default_file_splice_read(in, ppos, pipe, len, flags);
+	if (!in->f_op->splice_read) {
+		warn_unsupported(in, "read");
+		return -EINVAL;
+	}
+
+	return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d0fea0281ef29b..64a6506cba0446 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3175,7 +3175,10 @@ extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
 		struct file *out, loff_t *, size_t len, unsigned int flags);
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 		loff_t *opos, size_t len, unsigned int flags);
-
+ssize_t simple_splice_read(struct file *in, loff_t *ppos,
+		struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+ssize_t simple_splice_write(struct pipe_inode_info *pipe, struct file *out,
+		loff_t *ppos, size_t len, unsigned int flags);
 
 extern void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 16:28 ` [PATCH 03/11] fs: add new read_uptr and write_uptr file operations Christoph Hellwig
@ 2020-06-24 17:19   ` Linus Torvalds
  2020-06-24 17:55     ` Christoph Hellwig
  2020-06-24 17:56     ` Matthew Wilcox
  0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2020-06-24 17:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 9:29 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Add two new file operations that are identical to ->read and ->write
> except that they can also safely take kernel pointers using the uptr_t
> type.

Honestly, I think this is the wrong way to go.

All of this new complexity and messiness, just to remove a few
unimportant final cases?

If somebody can't be bothered to convert a driver to
iter_read/iter_write, why would they be bothered to convert it to
read_uptr/write_uptr?

And this messiness will stay around for decades.

So let's not go down that path.

If you want to do "splice() and kernel_read() requires read_iter"
(with a warning so that we find any cases), then that's fine. But
let's not add yet _another_ read type.

Why did you care so much about sysctl, and why couldn't they use the iter ops?

                    Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 17:19   ` Linus Torvalds
@ 2020-06-24 17:55     ` Christoph Hellwig
  2020-06-24 18:11       ` Linus Torvalds
  2020-06-24 17:56     ` Matthew Wilcox
  1 sibling, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 17:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 10:19:16AM -0700, Linus Torvalds wrote:
> Honestly, I think this is the wrong way to go.
> 
> All of this new complexity and messiness, just to remove a few
> unimportant final cases?
> 
> If somebody can't be bothered to convert a driver to
> iter_read/iter_write, why would they be bothered to convert it to
> read_uptr/write_uptr?
> 
> And this messiness will stay around for decades.
> 
> So let's not go down that path.
> 
> If you want to do "splice() and kernel_read() requires read_iter"
> (with a warning so that we find any cases), then that's fine. But
> let's not add yet _another_ read type.
> 
> Why did you care so much about sysctl, and why couldn't they use the iter ops?

I don't care at all.  Based on our previous chat I assumed you
wanted something like this.  We might still need the uptr_t for
setsockopt, though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 17:19   ` Linus Torvalds
  2020-06-24 17:55     ` Christoph Hellwig
@ 2020-06-24 17:56     ` Matthew Wilcox
  2020-06-24 17:59       ` Christoph Hellwig
  1 sibling, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2020-06-24 17:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 10:19:16AM -0700, Linus Torvalds wrote:
> On Wed, Jun 24, 2020 at 9:29 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Add two new file operations that are identical to ->read and ->write
> > except that they can also safely take kernel pointers using the uptr_t
> > type.
> 
> Honestly, I think this is the wrong way to go.
> 
> All of this new complexity and messiness, just to remove a few
> unimportant final cases?
> 
> If somebody can't be bothered to convert a driver to
> iter_read/iter_write, why would they be bothered to convert it to
> read_uptr/write_uptr?
> 
> And this messiness will stay around for decades.
> 
> So let's not go down that path.
> 
> If you want to do "splice() and kernel_read() requires read_iter"
> (with a warning so that we find any cases), then that's fine. But
> let's not add yet _another_ read type.
> 
> Why did you care so much about sysctl, and why couldn't they use the iter ops?

Heh, when I saw patch 4, I started working on that.  It doesn't seem all
that bad, except I've never used the iov_iter before, so I have no idea
if I did this right.  Also, this fixes a bug if 'count' is too large,
which I should split out and send separately.

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 42c5128c7d1c..7a8c474bc196 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -12,6 +12,7 @@
 #include <linux/cred.h>
 #include <linux/namei.h>
 #include <linux/mm.h>
+#include <linux/uio.h>
 #include <linux/module.h>
 #include <linux/bpf-cgroup.h>
 #include <linux/mount.h>
@@ -540,12 +541,13 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
-		size_t count, loff_t *ppos, int write)
+static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
+		int write)
 {
-	struct inode *inode = file_inode(filp);
+	struct inode *inode = file_inode(iocb->ki_filp);
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
+	size_t count = iov_iter_count(iter);
 	void *kbuf;
 	ssize_t error;
 
@@ -566,35 +568,32 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 		goto out;
 
 	/* don't even try if the size is too large */
+	error = -ENOMEM;
 	if (count > KMALLOC_MAX_SIZE)
-		return -ENOMEM;
+		goto out;
+	kbuf = kzalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		goto out;
 
 	if (write) {
-		kbuf = memdup_user_nul(ubuf, count);
-		if (IS_ERR(kbuf)) {
-			error = PTR_ERR(kbuf);
-			goto out;
-		}
-	} else {
-		error = -ENOMEM;
-		kbuf = kzalloc(count, GFP_KERNEL);
-		if (!kbuf)
+		error = -EFAULT;
+		if (!copy_from_iter_full(kbuf, count, iter))
 			goto out;
 	}
 
 	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
-					   ppos);
+					   &iocb->ki_pos);
 	if (error)
 		goto out_free_buf;
 
 	/* careful: calling conventions are nasty here */
-	error = table->proc_handler(table, write, kbuf, &count, ppos);
+	error = table->proc_handler(table, write, kbuf, &count, &iocb->ki_pos);
 	if (error)
 		goto out_free_buf;
 
 	if (!write) {
 		error = -EFAULT;
-		if (copy_to_user(ubuf, kbuf, count))
+		if (copy_to_iter(kbuf, count, iter) < count)
 			goto out_free_buf;
 	}
 
@@ -607,16 +606,14 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 	return error;
 }
 
-static ssize_t proc_sys_read(struct file *filp, char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t proc_sys_read(struct kiocb *iocb, struct iov_iter *iter)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0);
+	return proc_sys_call_handler(iocb, iter, 0);
 }
 
-static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t proc_sys_write(struct kiocb *iocb, struct iov_iter *iter)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1);
+	return proc_sys_call_handler(iocb, iter, 1);
 }
 
 static int proc_sys_open(struct inode *inode, struct file *filp)
@@ -853,8 +850,8 @@ static int proc_sys_getattr(const struct path *path, struct kstat *stat,
 static const struct file_operations proc_sys_file_operations = {
 	.open		= proc_sys_open,
 	.poll		= proc_sys_poll,
-	.read		= proc_sys_read,
-	.write		= proc_sys_write,
+	.read_iter	= proc_sys_read,
+	.write_iter	= proc_sys_write,
 	.llseek		= default_llseek,
 };
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 17:56     ` Matthew Wilcox
@ 2020-06-24 17:59       ` Christoph Hellwig
  2020-06-24 18:37         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 17:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Christoph Hellwig, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 06:56:44PM +0100, Matthew Wilcox wrote:
>  	/* don't even try if the size is too large */
> +	error = -ENOMEM;
>  	if (count > KMALLOC_MAX_SIZE)
> -		return -ENOMEM;
> +		goto out;
> +	kbuf = kzalloc(count, GFP_KERNEL);
> +	if (!kbuf)
> +		goto out;
>  
>  	if (write) {
> +		error = -EFAULT;
> +		if (!copy_from_iter_full(kbuf, count, iter))
>  			goto out;
>  	}

The nul-termination for the write cases seems to be lost here.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 17:55     ` Christoph Hellwig
@ 2020-06-24 18:11       ` Linus Torvalds
  2020-06-24 18:14         ` Christoph Hellwig
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Linus Torvalds @ 2020-06-24 18:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 10:55 AM Christoph Hellwig <hch@lst.de> wrote:
>
> I don't care at all.  Based on our previous chat I assumed you
> wanted something like this.  We might still need the uptr_t for
> setsockopt, though.

No.

What I mean was *not* something like uptr_t.

Just keep the existing "set_fs()". It's not harmful if it's only used
occasionally. We should rename it once it's rare enough, though.

Then, make the following changes:

 - all the normal user access functions stop caring. They use
TASK_SIZE_MAX and are done with it. They basically stop reacting to
set_fs().

 - then, we can have a few *very* specific cases (like setsockopt,
maybe some random read/write) that we teach to use the new set_fs()
thing.

So in *those* cases, we'd basically just do "oh, ok, we are supposed
to use a kernel pointer" based on the setfs value.

IOW, I mean tto do something much more gradual. No new interfaces, no
new types, just a couple of (very clearly marked!) cases of the legacy
set_fs() behavior.

                Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:11       ` Linus Torvalds
@ 2020-06-24 18:14         ` Christoph Hellwig
  2020-06-24 18:20           ` Linus Torvalds
  2020-06-24 18:15         ` Linus Torvalds
  2020-06-27 10:49         ` David Laight
  2 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 18:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 11:11:50AM -0700, Linus Torvalds wrote:
> What I mean was *not* something like uptr_t.
> 
> Just keep the existing "set_fs()". It's not harmful if it's only used
> occasionally. We should rename it once it's rare enough, though.
> 
> Then, make the following changes:
> 
>  - all the normal user access functions stop caring. They use
> TASK_SIZE_MAX and are done with it. They basically stop reacting to
> set_fs().
> 
>  - then, we can have a few *very* specific cases (like setsockopt,
> maybe some random read/write) that we teach to use the new set_fs()
> thing.
> 
> So in *those* cases, we'd basically just do "oh, ok, we are supposed
> to use a kernel pointer" based on the setfs value.
> 
> IOW, I mean tto do something much more gradual. No new interfaces, no
> new types, just a couple of (very clearly marked!) cases of the legacy
> set_fs() behavior.

So we'd need new user copy functions for just those cases, and make
sure everything below the potential get_fs-NG uses them.  But without
any kind of tape safety to easily validate all users below actually
use them?  I just don't see how that makes sense.

FYI, I think the only users where we really need it are setsockopt
and a s390-specific driver from my audits so far.  Everything else
shouldn't need anything like that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:11       ` Linus Torvalds
  2020-06-24 18:14         ` Christoph Hellwig
@ 2020-06-24 18:15         ` Linus Torvalds
  2020-06-27 10:49         ` David Laight
  2 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2020-06-24 18:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 11:11 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So in *those* cases, we'd basically just do "oh, ok, we are supposed
> to use a kernel pointer" based on the setfs value.

The important part here si that we don't need to change any
interfaces, because we don't add that whole "carry the bit around with
the pointer".

I agree that that would be the nice clean interface _if_ we intended
for this to be something we care about. But we already _have_ that
interface in the "iter" code, I absolutely do not think we should
create a new one.

So that's why the (admittedly hacky) "just support the old model for
special cases" kind of approach. Make it really easy to convert some
very specific individual places that might care, and make it very
obvious what those places are.

Maybe in a year or two, there's only a couple such places, and we can
see if we can clean those up separately. But make it easy do the
transition to the new model by _not_ changing the basic logic for now.

See what I'm aiming for?

In particular, for architectures that haven't been modified, this
results in zero changes what-so-ever.

            Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:14         ` Christoph Hellwig
@ 2020-06-24 18:20           ` Linus Torvalds
  2020-06-24 18:24             ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2020-06-24 18:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 11:14 AM Christoph Hellwig <hch@lst.de> wrote:
>
> So we'd need new user copy functions for just those cases

No. We'd open-code them. They'd look at "oh, I'm supposed to use a
kernel pointer" and just use those.

IOW, basically IN THE CODE that cares (and the whole argument is that
this code is one or two special cases) you do

    /* This has not been converted to the new world order */
    if (get_fs() == KERNEL_DS) memcpy(..) else copy_from_user();

You're overdesigning things. You're making them more complex than they
need to be.

Basically, I do *NOT* want to pollute the VFS layer with new
interfaces that shouldn't exist in the long run. I'd much rather make
the eventual goal be to get rid of 'read/write' entirely in favour of
the 'iter' things, but what I absolutely do *NOT* want to see is to
make a _third_ interface for reading and writing. Quite the reverse.
We should strive to make it a _single_ interface, not add a new one.

And I'd rather have a couple of ugly code details in odd places (that
we can hopefully fix up later) than have new VFS infrastructure that
will then hang around forever more.

                Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:20           ` Linus Torvalds
@ 2020-06-24 18:24             ` Christoph Hellwig
  2020-06-24 18:29               ` Matthew Wilcox
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 18:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 11:20:26AM -0700, Linus Torvalds wrote:
> On Wed, Jun 24, 2020 at 11:14 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > So we'd need new user copy functions for just those cases
> 
> No. We'd open-code them. They'd look at "oh, I'm supposed to use a
> kernel pointer" and just use those.
> 
> IOW, basically IN THE CODE that cares (and the whole argument is that
> this code is one or two special cases) you do
> 
>     /* This has not been converted to the new world order */
>     if (get_fs() == KERNEL_DS) memcpy(..) else copy_from_user();
> 
> You're overdesigning things. You're making them more complex than they
> need to be.

I wish it was so simple.  I really don't like overdesigns, trust me.

But please take a look at setsockopt and all the different instances
(count 90 .setsockopt wireups, and they then branch out into
various subroutines as well).  I really don't want to open code that
there, but we could do helper specific to setsockopt.

Honestly my preference would be to say that no eBPF isn't actually
a user API and just rip out the crap added to it, but I fear that
is not an option.  Because in that case we'd basically be done.

> Basically, I do *NOT* want to pollute the VFS layer with new
> interfaces that shouldn't exist in the long run. I'd much rather make
> the eventual goal be to get rid of 'read/write' entirely in favour of
> the 'iter' things, but what I absolutely do *NOT* want to see is to
> make a _third_ interface for reading and writing. Quite the reverse.
> We should strive to make it a _single_ interface, not add a new one.

Completele agreement on this.  I actually hate the new fops, and only
added them reluctantly as I mis-interpreted what you said.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:24             ` Christoph Hellwig
@ 2020-06-24 18:29               ` Matthew Wilcox
  2020-06-24 18:31                 ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2020-06-24 18:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 08:24:37PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 24, 2020 at 11:20:26AM -0700, Linus Torvalds wrote:
> > On Wed, Jun 24, 2020 at 11:14 AM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > So we'd need new user copy functions for just those cases
> > 
> > No. We'd open-code them. They'd look at "oh, I'm supposed to use a
> > kernel pointer" and just use those.
> > 
> > IOW, basically IN THE CODE that cares (and the whole argument is that
> > this code is one or two special cases) you do
> > 
> >     /* This has not been converted to the new world order */
> >     if (get_fs() == KERNEL_DS) memcpy(..) else copy_from_user();
> > 
> > You're overdesigning things. You're making them more complex than they
> > need to be.
> 
> I wish it was so simple.  I really don't like overdesigns, trust me.
> 
> But please take a look at setsockopt and all the different instances
> (count 90 .setsockopt wireups, and they then branch out into
> various subroutines as well).  I really don't want to open code that
> there, but we could do helper specific to setsockopt.

Can we do a setsockopt_iter() which replaces optval/optlen with an iov_iter?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:29               ` Matthew Wilcox
@ 2020-06-24 18:31                 ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 18:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Linus Torvalds, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 07:29:44PM +0100, Matthew Wilcox wrote:
> On Wed, Jun 24, 2020 at 08:24:37PM +0200, Christoph Hellwig wrote:
> > On Wed, Jun 24, 2020 at 11:20:26AM -0700, Linus Torvalds wrote:
> > > On Wed, Jun 24, 2020 at 11:14 AM Christoph Hellwig <hch@lst.de> wrote:
> > > >
> > > > So we'd need new user copy functions for just those cases
> > > 
> > > No. We'd open-code them. They'd look at "oh, I'm supposed to use a
> > > kernel pointer" and just use those.
> > > 
> > > IOW, basically IN THE CODE that cares (and the whole argument is that
> > > this code is one or two special cases) you do
> > > 
> > >     /* This has not been converted to the new world order */
> > >     if (get_fs() == KERNEL_DS) memcpy(..) else copy_from_user();
> > > 
> > > You're overdesigning things. You're making them more complex than they
> > > need to be.
> > 
> > I wish it was so simple.  I really don't like overdesigns, trust me.
> > 
> > But please take a look at setsockopt and all the different instances
> > (count 90 .setsockopt wireups, and they then branch out into
> > various subroutines as well).  I really don't want to open code that
> > there, but we could do helper specific to setsockopt.
> 
> Can we do a setsockopt_iter() which replaces optval/optlen with an iov_iter?

We could.  The only downside is int-sized sockopts are common, and used
in the fast path of networking applications (e.g. cork,uncork) and this
might introduce enough overhead to be noticable.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 17:59       ` Christoph Hellwig
@ 2020-06-24 18:37         ` Christoph Hellwig
  2020-06-24 18:43           ` Matthew Wilcox
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-24 18:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Christoph Hellwig, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Wed, Jun 24, 2020 at 07:59:05PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 24, 2020 at 06:56:44PM +0100, Matthew Wilcox wrote:
> >  	/* don't even try if the size is too large */
> > +	error = -ENOMEM;
> >  	if (count > KMALLOC_MAX_SIZE)
> > -		return -ENOMEM;
> > +		goto out;
> > +	kbuf = kzalloc(count, GFP_KERNEL);
> > +	if (!kbuf)
> > +		goto out;
> >  
> >  	if (write) {
> > +		error = -EFAULT;
> > +		if (!copy_from_iter_full(kbuf, count, iter))
> >  			goto out;
> >  	}
> 
> The nul-termination for the write cases seems to be lost here.

Version with the count and termination fixed below.  Can I get your
signoff?  If testing passes that means I can go back to my
kernel_read/write version from the set_fs removal tree with it.

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 42c5128c7d1c76..36ac7b0e4ba80d 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -12,6 +12,7 @@
 #include <linux/cred.h>
 #include <linux/namei.h>
 #include <linux/mm.h>
+#include <linux/uio.h>
 #include <linux/module.h>
 #include <linux/bpf-cgroup.h>
 #include <linux/mount.h>
@@ -540,12 +541,13 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
-		size_t count, loff_t *ppos, int write)
+static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
+		int write)
 {
-	struct inode *inode = file_inode(filp);
+	struct inode *inode = file_inode(iocb->ki_filp);
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
+	size_t count = iov_iter_count(iter);
 	void *kbuf;
 	ssize_t error;
 
@@ -566,35 +568,33 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 		goto out;
 
 	/* don't even try if the size is too large */
-	if (count > KMALLOC_MAX_SIZE)
-		return -ENOMEM;
+	error = -ENOMEM;
+	if (count + !!write > KMALLOC_MAX_SIZE)
+		goto out;
+	kbuf = kzalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		goto out;
 
 	if (write) {
-		kbuf = memdup_user_nul(ubuf, count);
-		if (IS_ERR(kbuf)) {
-			error = PTR_ERR(kbuf);
-			goto out;
-		}
-	} else {
-		error = -ENOMEM;
-		kbuf = kzalloc(count, GFP_KERNEL);
-		if (!kbuf)
+		error = -EFAULT;
+		if (!copy_from_iter_full(kbuf, count, iter))
 			goto out;
+		((char *)kbuf)[count] = '\0';
 	}
 
 	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
-					   ppos);
+					   &iocb->ki_pos);
 	if (error)
 		goto out_free_buf;
 
 	/* careful: calling conventions are nasty here */
-	error = table->proc_handler(table, write, kbuf, &count, ppos);
+	error = table->proc_handler(table, write, kbuf, &count, &iocb->ki_pos);
 	if (error)
 		goto out_free_buf;
 
 	if (!write) {
 		error = -EFAULT;
-		if (copy_to_user(ubuf, kbuf, count))
+		if (copy_to_iter(kbuf, count, iter) < count)
 			goto out_free_buf;
 	}
 
@@ -607,16 +607,14 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 	return error;
 }
 
-static ssize_t proc_sys_read(struct file *filp, char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t proc_sys_read(struct kiocb *iocb, struct iov_iter *iter)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0);
+	return proc_sys_call_handler(iocb, iter, 0);
 }
 
-static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t proc_sys_write(struct kiocb *iocb, struct iov_iter *iter)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1);
+	return proc_sys_call_handler(iocb, iter, 1);
 }
 
 static int proc_sys_open(struct inode *inode, struct file *filp)
@@ -853,8 +851,8 @@ static int proc_sys_getattr(const struct path *path, struct kstat *stat,
 static const struct file_operations proc_sys_file_operations = {
 	.open		= proc_sys_open,
 	.poll		= proc_sys_poll,
-	.read		= proc_sys_read,
-	.write		= proc_sys_write,
+	.read_iter	= proc_sys_read,
+	.write_iter	= proc_sys_write,
 	.llseek		= default_llseek,
 };
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:37         ` Christoph Hellwig
@ 2020-06-24 18:43           ` Matthew Wilcox
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-06-24 18:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On Wed, Jun 24, 2020 at 08:37:43PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 24, 2020 at 07:59:05PM +0200, Christoph Hellwig wrote:
> > On Wed, Jun 24, 2020 at 06:56:44PM +0100, Matthew Wilcox wrote:
> > >  	/* don't even try if the size is too large */
> > > +	error = -ENOMEM;
> > >  	if (count > KMALLOC_MAX_SIZE)
> > > -		return -ENOMEM;
> > > +		goto out;
> > > +	kbuf = kzalloc(count, GFP_KERNEL);
> > > +	if (!kbuf)
> > > +		goto out;
> > >  
> > >  	if (write) {
> > > +		error = -EFAULT;
> > > +		if (!copy_from_iter_full(kbuf, count, iter))
> > >  			goto out;
> > >  	}
> > 
> > The nul-termination for the write cases seems to be lost here.
> 
> Version with the count and termination fixed below.  Can I get your
> signoff?  If testing passes that means I can go back to my
> kernel_read/write version from the set_fs removal tree with it.

Sure!

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

I went with some slightly different fixes, but I'm having trouble
sending email (the new infradead.org is not quite set up right yet).
I've attached the two fixes to this email in case you want to
incorporate them in some way.

[-- Attachment #2: 0001-sysctl-Call-sysctl_head_finish-on-error.patch --]
[-- Type: text/plain, Size: 1251 bytes --]

From 3c68e4efcc50192962a8cd18c67fb6fad2493713 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Wed, 24 Jun 2020 14:12:02 -0400
Subject: [PATCH 1/2] sysctl: Call sysctl_head_finish on error
To: hch@lst.de

This error path returned directly instead of calling sysctl_head_finish().

Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/proc/proc_sysctl.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 42c5128c7d1c..6c1166ccdaea 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -566,8 +566,9 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 		goto out;
 
 	/* don't even try if the size is too large */
-	if (count > KMALLOC_MAX_SIZE)
-		return -ENOMEM;
+	error = -ENOMEM;
+	if (count >= KMALLOC_MAX_SIZE)
+		goto out;
 
 	if (write) {
 		kbuf = memdup_user_nul(ubuf, count);
@@ -576,7 +577,6 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 			goto out;
 		}
 	} else {
-		error = -ENOMEM;
 		kbuf = kzalloc(count, GFP_KERNEL);
 		if (!kbuf)
 			goto out;
-- 
2.27.0


[-- Attachment #3: 0002-sysctl-Convert-to-iter-interfaces.patch --]
[-- Type: text/plain, Size: 5022 bytes --]

From 93775b67bbc3e4f761808b216bc76127f89f9ad7 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Wed, 24 Jun 2020 14:31:08 -0400
Subject: [PATCH 2/2] sysctl: Convert to iter interfaces
To: hch@lst.de

Using the read_iter/write_iter interfaces allows for in-kernel users
to set sysctls without using set_fs().  Also, the buffer is a string,
so give it the real type of 'char *', not void *.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/proc/proc_sysctl.c      | 44 ++++++++++++++++++--------------------
 include/linux/bpf-cgroup.h |  2 +-
 kernel/bpf/cgroup.c        |  2 +-
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 6c1166ccdaea..9f6b9c3e3fda 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -12,6 +12,7 @@
 #include <linux/cred.h>
 #include <linux/namei.h>
 #include <linux/mm.h>
+#include <linux/uio.h>
 #include <linux/module.h>
 #include <linux/bpf-cgroup.h>
 #include <linux/mount.h>
@@ -540,13 +541,14 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
-		size_t count, loff_t *ppos, int write)
+static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
+		int write)
 {
-	struct inode *inode = file_inode(filp);
+	struct inode *inode = file_inode(iocb->ki_filp);
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
-	void *kbuf;
+	size_t count = iov_iter_count(iter);
+	char *kbuf;
 	ssize_t error;
 
 	if (IS_ERR(head))
@@ -569,32 +571,30 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 	error = -ENOMEM;
 	if (count >= KMALLOC_MAX_SIZE)
 		goto out;
+	kbuf = kzalloc(count + 1, GFP_KERNEL);
+	if (!kbuf)
+		goto out;
 
 	if (write) {
-		kbuf = memdup_user_nul(ubuf, count);
-		if (IS_ERR(kbuf)) {
-			error = PTR_ERR(kbuf);
-			goto out;
-		}
-	} else {
-		kbuf = kzalloc(count, GFP_KERNEL);
-		if (!kbuf)
+		error = -EFAULT;
+		if (!copy_from_iter_full(kbuf, count, iter))
 			goto out;
+		kbuf[count] = '\0';
 	}
 
 	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
-					   ppos);
+					   &iocb->ki_pos);
 	if (error)
 		goto out_free_buf;
 
 	/* careful: calling conventions are nasty here */
-	error = table->proc_handler(table, write, kbuf, &count, ppos);
+	error = table->proc_handler(table, write, kbuf, &count, &iocb->ki_pos);
 	if (error)
 		goto out_free_buf;
 
 	if (!write) {
 		error = -EFAULT;
-		if (copy_to_user(ubuf, kbuf, count))
+		if (copy_to_iter(kbuf, count, iter) < count)
 			goto out_free_buf;
 	}
 
@@ -607,16 +607,14 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 	return error;
 }
 
-static ssize_t proc_sys_read(struct file *filp, char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t proc_sys_read(struct kiocb *iocb, struct iov_iter *iter)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0);
+	return proc_sys_call_handler(iocb, iter, 0);
 }
 
-static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t proc_sys_write(struct kiocb *iocb, struct iov_iter *iter)
 {
-	return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1);
+	return proc_sys_call_handler(iocb, iter, 1);
 }
 
 static int proc_sys_open(struct inode *inode, struct file *filp)
@@ -853,8 +851,8 @@ static int proc_sys_getattr(const struct path *path, struct kstat *stat,
 static const struct file_operations proc_sys_file_operations = {
 	.open		= proc_sys_open,
 	.poll		= proc_sys_poll,
-	.read		= proc_sys_read,
-	.write		= proc_sys_write,
+	.read_iter	= proc_sys_read,
+	.write_iter	= proc_sys_write,
 	.llseek		= default_llseek,
 };
 
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c66c545e161a..f81d3b3752f9 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -132,7 +132,7 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   struct ctl_table *table, int write,
-				   void **buf, size_t *pcount, loff_t *ppos,
+				   char **buf, size_t *pcount, loff_t *ppos,
 				   enum bpf_attach_type type);
 
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4d76f16524cc..45a918fc573d 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1202,7 +1202,7 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
  */
 int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   struct ctl_table *table, int write,
-				   void **buf, size_t *pcount, loff_t *ppos,
+				   char **buf, size_t *pcount, loff_t *ppos,
 				   enum bpf_attach_type type)
 {
 	struct bpf_sysctl_kern ctx = {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-24 18:11       ` Linus Torvalds
  2020-06-24 18:14         ` Christoph Hellwig
  2020-06-24 18:15         ` Linus Torvalds
@ 2020-06-27 10:49         ` David Laight
  2020-06-27 16:33           ` Linus Torvalds
  2 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2020-06-27 10:49 UTC (permalink / raw)
  To: 'Linus Torvalds', Christoph Hellwig
  Cc: Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

From: Linus Torvalds
> Sent: 24 June 2020 19:12
> On Wed, Jun 24, 2020 at 10:55 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > I don't care at all.  Based on our previous chat I assumed you
> > wanted something like this.  We might still need the uptr_t for
> > setsockopt, though.
> 
> No.
> 
> What I mean was *not* something like uptr_t.
> 
> Just keep the existing "set_fs()". It's not harmful if it's only used
> occasionally. We should rename it once it's rare enough, though.

Am I right in thinking that it just sets a flag in 'current' ?
Although I don't remember access_ok() doing a suitable check
(would need to be (address - base) < limit).

> Then, make the following changes:
> 
>  - all the normal user access functions stop caring. They use
> TASK_SIZE_MAX and are done with it. They basically stop reacting to
> set_fs().
> 
>  - then, we can have a few *very* specific cases (like setsockopt,
> maybe some random read/write) that we teach to use the new set_fs()
> thing.

Certainly there is a 'BPF' hook in the setsockopt() syscall handler
that can substitute a kernel buffer for any setsockopt() request.

If that is needed (I presume it was added for a purpose) then all
the socket option code needs to be able to handle kernel buffers.
(Actually given what some getsockopt() do, if there was a
requirement to 'adjust' setsockopt() then there should be a hook
in the getsockopt() code as well.)

If you are going to go through all the socket option code to change
the name of all the buffer access functions then it is probably
almost as easy to move the usercopies out into the wrappers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-27 10:49         ` David Laight
@ 2020-06-27 16:33           ` Linus Torvalds
  2020-06-29  8:21             ` David Laight
                               ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Linus Torvalds @ 2020-06-27 16:33 UTC (permalink / raw)
  To: David Laight
  Cc: Christoph Hellwig, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Sat, Jun 27, 2020 at 3:49 AM David Laight <David.Laight@aculab.com> wrote:
>
> > Just keep the existing "set_fs()". It's not harmful if it's only used
> > occasionally. We should rename it once it's rare enough, though.
>
> Am I right in thinking that it just sets a flag in 'current' ?

Basically, yes. That's what it has always done.

Well "always" is not true - it used to set the %fs segment register
originally (thus the name), but _conceptually_ it sets a flag for
"should user accesses be kernel accesses instead".

On x86 - and most other architectures where user space and kernel
space are in the same address space and accessed with the same
instructions, that has then been implemented as just a "what is the
limit for an access".

On other architectures - architectures that need different access
methods (or different flags to the load/store instruction) - it's an
actual flag that changes which access method you use.

> Although I don't remember access_ok() doing a suitable check
> (would need to be (address - base) < limit).

So again, on the architectures with a unified address space,
access_ok() is exactly that "address + access_size <= limit", although
often done with some inline asm just to get the overflow case done
efficiently.

On other architectures, there's no limit check, because _all_
addresses are either user space or kernel space addresses, and what
changes isn't the address limit, but the access itself.

So what I was suggesting is literally

 - keep this flag around as a flag

 - but make all _normal_ user accesses ignore it, and always do user
accesses (so on a unified address space architecture like x86 it
always checks the _fixed_ limit, and on something like sparc32 which
has separate kernel and user address spaces, it just always does a
user access with no conditionals at all)

 - then make the really odd and hopefully very rare cases check that
flag explicitly and manually, and do

        if (current->legacy_uptr_is_kernel)
                memcpy(...);
        else
                copy_to/from_user(...);

and my hope is that we'd have only a handful of cases (like the
setsockopt thing: one for each protocol or whatever) that actually
want this.

Note that the legacy behavior would still remain in architectures that
haven't been modified to remove the use of set_fs(), so I would
further suggest that the two approaches live side-by-side for at least
a while. But _generic_ code (and with Christoph's patches at least
x86) would make set_fs() cause a build error.

So we'd have a new

     set_force_kernel_pointers();
     ....
     clear_force_kernel_pointers();

that would set/clear that 'current->legacy_uptr_is_kernel' variable,
and we'd have a handful of places that would check it.

The naming above is all random, and I'm not claiming that any of this
is particularly _clean_. I'm also not claiming that it's really any
better than our current "set_fs()" mess conceptually.

The only thing that makes it better than our current "set_fs()" is

 - there would hopefully be very few cases of this

 - it would *not* affect random incidental user accesses that just
happen to be in the shadow of this thing.

That second point is the important one, I feel. The real problem with
"set_fs()" has been that we've occasionally had bugs where we ended up
running odd paths that we really didn't _intend_ to run with kernel
pointers. The classic example is the SCSI "write as ioctl" example,
where a write to a SCSI generic device would do various odd things and
follow pointers and what-not. Then you get into real trouble when
"splice()" ends up truiong to write a kernel buffer, and because of
"set_fs()" suddenly the sg code started accessing kernel memory
willy-nilly.

So my suggestion was basically a new version of set_fs(), but one that
is just much more targeted, and doesn't affect all random user
accesses, only those very special ones that are then very *explicitly*
aware of the fact that "hey, I might be called in this situation where
I'm going to get a kernel address instead".

> If that is needed (I presume it was added for a purpose) then all
> the socket option code needs to be able to handle kernel buffers.

So that's very much what I'd like to avoid.

The plan would be that all the *normal* stuff would be handled by
either (a) always having the data come from user space, or (b) the
data has a known size (either fixed, or "optlen" or whatever) and then
being copied to a kernel buffer and then always handled as a kernel
field that bpf can then call with kernel data.

I thought there was just one very specific case of "oh, in certain
cases of setsockopt we don't know what size this address is and optlen
is ignored", so we have to just pass the pointer down to the protocol,
which is the point that knows how much of an address it wants..

Was that a misunderstanding on my part?

Because if there are tons and tons of places that want this "either
kernel or user" then we could still have a helper function for it, but
it means that the whole "limit the cases" advantage to some degree
goes away.

It would still fix the 99% of normal "copy/from/to_user()" cases,
though. They'd be fixed and "safe" and coule never ever touch kernel
memory even if there was some confusion about things. So it would be
an improvement, but I was really hoping that the cases where there can
be confusion would be pretty rare.

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-27 16:33           ` Linus Torvalds
@ 2020-06-29  8:21             ` David Laight
  2020-06-29 15:29             ` Christoph Hellwig
  2020-07-08  5:14             ` Luis Chamberlain
  2 siblings, 0 replies; 36+ messages in thread
From: David Laight @ 2020-06-29  8:21 UTC (permalink / raw)
  To: 'Linus Torvalds'
  Cc: Christoph Hellwig, Al Viro, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

From: Linus Torvalds
> Sent: 27 June 2020 17:33
> On Sat, Jun 27, 2020 at 3:49 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > > Just keep the existing "set_fs()". It's not harmful if it's only used
> > > occasionally. We should rename it once it's rare enough, though.
> >
> > Am I right in thinking that it just sets a flag in 'current' ?
> 
> Basically, yes. That's what it has always done.

I could check, but I suspect it sets what TASK_SIZE uses to ~0u
so that access_ok() can't fail.

> Well "always" is not true - it used to set the %fs segment register
> originally (thus the name), but _conceptually_ it sets a flag for
> "should user accesses be kernel accesses instead".
> 
> On x86 - and most other architectures where user space and kernel
> space are in the same address space and accessed with the same
> instructions, that has then been implemented as just a "what is the
> limit for an access".
> 
> On other architectures - architectures that need different access
> methods (or different flags to the load/store instruction) - it's an
> actual flag that changes which access method you use.
> 
> > Although I don't remember access_ok() doing a suitable check
> > (would need to be (address - base) < limit).
> 
> So again, on the architectures with a unified address space,
> access_ok() is exactly that "address + access_size <= limit", although
> often done with some inline asm just to get the overflow case done
> efficiently.

I realised afterwards that the 'kernel address is actually user'
check isn't really done on architectures like x86 until stac/clac.

I had another thought.
While setting up a full-blown scatter-gather 'iter' structure for
functions like [gs]etsockopt, ioctl and fcntl is OTT and probably
measurably expensive a lightweight 'buffer' structure that just
contained address, length and user/kernel flag could be used.

Although the uses would need an extra level of indirection this
would be offset by reducing the number of parameters passed
through all the layers.

...
> I thought there was just one very specific case of "oh, in certain
> cases of setsockopt we don't know what size this address is and optlen
> is ignored", so we have to just pass the pointer down to the protocol,
> which is the point that knows how much of an address it wants..

I can't help feeling that userspace passes a suitable length but
the kernel doesn't verify it.

It is worse than that, one of the SCTP getsockopt() calls has to return
a length that is shorter than the buffer it wrote.

So any buffer descriptor length would have to be advisory.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-27 16:33           ` Linus Torvalds
  2020-06-29  8:21             ` David Laight
@ 2020-06-29 15:29             ` Christoph Hellwig
  2020-06-29 17:02               ` Linus Torvalds
  2020-07-08  5:14             ` Luis Chamberlain
  2 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-29 15:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Laight, Christoph Hellwig, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Sat, Jun 27, 2020 at 09:33:03AM -0700, Linus Torvalds wrote:
> I thought there was just one very specific case of "oh, in certain
> cases of setsockopt we don't know what size this address is and optlen
> is ignored", so we have to just pass the pointer down to the protocol,
> which is the point that knows how much of an address it wants..

The setsock issue is a little more complicated.  Let me try to summarize
it:

 - setsock takes a (user) pointer and len
 - unfortunately while the designed of the BSD socket API designed the
   len to be correct some protocol implementations have been sloppy
   and just use a hardcoded len for the value plus some other funnies
 - unfortunately there is some BPF magic that can attach to a socket
   and be run, and that (and only that in the latest kernel) can cause
   a setsockopt to take a kernel buffer.  One that was copied from
   userspace earlier and had the BPF program run on it.
 - unfortunately we have about 90 ->setsockopt instances, and the BPF
   hook is not specific to one particular of them.  In fact the
   BPF program can run for options that don't even exist, and based on
   my previous dicussion Facebook has setups that rely on that.

> Was that a misunderstanding on my part?
> 
> Because if there are tons and tons of places that want this "either
> kernel or user" then we could still have a helper function for it, but
> it means that the whole "limit the cases" advantage to some degree
> goes away.

But except for setsockopt we don't really have anything like that left.
There is some alpha arch code that would need to be duplicated for
user vs kernel pointers, but I suspect it will get cleaner by that,
and the messy s390 crypto driver whіch will be a bit of work, but all
internal to that driver.

So based on that I'd rather get away without our flag and tag the
kernel pointer case in setsockopt explicitly.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 15:29             ` Christoph Hellwig
@ 2020-06-29 17:02               ` Linus Torvalds
  2020-06-29 18:07                 ` Christoph Hellwig
  2020-06-30  7:51                 ` David Laight
  0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2020-06-29 17:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Laight, Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1781 bytes --]

On Mon, Jun 29, 2020 at 8:29 AM Christoph Hellwig <hch@lst.de> wrote:
>
> So based on that I'd rather get away without our flag and tag the
> kernel pointer case in setsockopt explicitly.

Yeah, I'd be ok to pass that kind of flag around for setsockopt, in
ways I _don't_ want to do for some very core vfs thing like 'read()'.

That said, is there no practical limit on how big "optlen" can be?
Sure, I realize that a lot of setsockopt users may not use all of the
data, but let's say that "optlen" is 128, but the actual low-level
setsockopt operation only uses the first 16 bytes, maybe we could
always just copy the 128 bytes from user space into kernel space, and
just say "setsockopt() always gets a kernel pointer".

Then the bpf use is even simpler. It would just pass the kernel
pointer natively.

Because that seems to be what the BPF code really wants to do: it
takes the user optval, and munges it into a kernel optval, and then
(if that has been done) runs the low-level sock_setsockopt() under
KERNEL_DS.

Couldn't we switch things around instead, and just *always* copy
things from user space, and sock_setsockopt (and
sock->ops->setsockopt) _always_ get a kernel buffer?

And avoid the set_fs(KERNEL_DS) games entirely that way?

Attached it a RFC patch just for __sys_setsockopt() - note that it
does *not* change all the low-level setsockopt callers to just do the
kernel access instead, so this is completely broken, but you can kind
of see what I mean.

Wouldn't this work? In fact, wouldn't this simplify all the setsockopt
places that now don't need to do "get_user()" etc any more?

It would be better if we could limit "optlen" to something sane, but
right now it just does a kmalloc() of whatever the user claims the opt
len is..

                    Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 2079 bytes --]

 net/socket.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 976426d03f09..39060a5f0caa 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2086,25 +2086,37 @@ SYSCALL_DEFINE4(recv, int, fd, void __user *, ubuf, size_t, size,
  */
 
 static int __sys_setsockopt(int fd, int level, int optname,
-			    char __user *optval, int optlen)
+			    char __user *u_optval, int optlen)
 {
-	mm_segment_t oldfs = get_fs();
-	char *kernel_optval = NULL;
 	int err, fput_needed;
 	struct socket *sock;
+	char *optval;
 
 	if (optlen < 0)
 		return -EINVAL;
 
+	/* Can we have some upper limit on optlen */
+	optval = kmalloc(optlen, GFP_KERNEL | __GFP_NOWARN);
+	if (!optval)
+		return -ENOMEM;
+
+	if (copy_from_user(optval, u_optval, optlen)) {
+		kfree(optval);
+		return -EFAULT;
+	}
+
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (sock != NULL) {
+		char *bpf_optval = NULL;
+		char *use_optval;
+
 		err = security_socket_setsockopt(sock, level, optname);
 		if (err)
 			goto out_put;
 
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level,
 						     &optname, optval, &optlen,
-						     &kernel_optval);
+						     &bpf_optval);
 
 		if (err < 0) {
 			goto out_put;
@@ -2113,27 +2125,23 @@ static int __sys_setsockopt(int fd, int level, int optname,
 			goto out_put;
 		}
 
-		if (kernel_optval) {
-			set_fs(KERNEL_DS);
-			optval = (char __user __force *)kernel_optval;
-		}
+		use_optval = bpf_optval ? bpf_optval : optval;
 
 		if (level == SOL_SOCKET)
 			err =
-			    sock_setsockopt(sock, level, optname, optval,
+			    sock_setsockopt(sock, level, optname, use_optval,
 					    optlen);
 		else
 			err =
-			    sock->ops->setsockopt(sock, level, optname, optval,
+			    sock->ops->setsockopt(sock, level, optname, use_optval,
 						  optlen);
 
-		if (kernel_optval) {
-			set_fs(oldfs);
-			kfree(kernel_optval);
-		}
+		if (bpf_optval)
+			kfree(bpf_optval);
 out_put:
 		fput_light(sock->file, fput_needed);
 	}
+	kfree(optval);
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 17:02               ` Linus Torvalds
@ 2020-06-29 18:07                 ` Christoph Hellwig
  2020-06-29 18:29                   ` Linus Torvalds
  2020-06-30  7:51                 ` David Laight
  1 sibling, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-29 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, David Laight, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Mon, Jun 29, 2020 at 10:02:48AM -0700, Linus Torvalds wrote:
> That said, is there no practical limit on how big "optlen" can be?

There are some pretty huge ones, like the sctp one that can take
a basically unlimited list of sockaddr structures.

> Sure, I realize that a lot of setsockopt users may not use all of the
> data, but let's say that "optlen" is 128, but the actual low-level
> setsockopt operation only uses the first 16 bytes, maybe we could
> always just copy the 128 bytes from user space into kernel space, and
> just say "setsockopt() always gets a kernel pointer".

One issue is that a lot setsockopt calls are in the fast path, and
even have micro-optimizations like putting an int on stack for the
fast path to avoid the memory allocation.  While I don't know for
sure I fear that always doing a large allocation could end up having
a performance impact.  But otherwise I like that idea, and did in
fact start some prep work until I realized what I did was futile.

> Then the bpf use is even simpler. It would just pass the kernel
> pointer natively.
> 
> Because that seems to be what the BPF code really wants to do: it
> takes the user optval, and munges it into a kernel optval, and then
> (if that has been done) runs the low-level sock_setsockopt() under
> KERNEL_DS.
> 
> Couldn't we switch things around instead, and just *always* copy
> things from user space, and sock_setsockopt (and
> sock->ops->setsockopt) _always_ get a kernel buffer?
> 
> And avoid the set_fs(KERNEL_DS) games entirely that way?

I'd love to be able to do that.  And now that we want through this
whole mess than Nth time I have another idea:

 - we assume optlen is correct, which should cover about 90% of
   the protocols
 - but to override that a new setsockopt_len method is added that
   returns the correct length for all the messy ones.

Let me try if that works out.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 18:07                 ` Christoph Hellwig
@ 2020-06-29 18:29                   ` Linus Torvalds
  2020-06-29 18:36                     ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2020-06-29 18:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Laight, Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Mon, Jun 29, 2020 at 11:07 AM Christoph Hellwig <hch@lst.de> wrote:
>
> One issue is that a lot setsockopt calls are in the fast path, and
> even have micro-optimizations like putting an int on stack for the
> fast path to avoid the memory allocation.

Yeah., An the RFC patch I posted could easily be updated to do exactly
that for small optlen values (say, avoid the kmalloc and use a stack
buffer for oplen smaller than 16 bytes or whatever).

Most of the setsockopt's I'm aware of are just a single integer, so if
that's the bulk of them, then we'd never actually need to do the
kmalloc() in those cases, and only fall back to the kmalloc for the
(hopefully quite unusual) bigger options..

> I'd love to be able to do that.  And now that we want through this
> whole mess than Nth time I have another idea:
>
>  - we assume optlen is correct, which should cover about 90% of
>    the protocols
>  - but to override that a new setsockopt_len method is added that
>    returns the correct length for all the messy ones.
>
> Let me try if that works out.

Doing a quick grep, there's about 100 different ".setsockopt" function
initializers, but a quarter of them are just setting it to
'sock_no_setsockopt'.

A number of others are using 'sock_common_setsockopt'.

Which leaves something like 50 different implementations of the
.setsockopt functions.  But I didn't go any deeper than that - maybe
they then have hundreds of different option cases each and this is all
a nightmare.

Looking at a couple of them, the "int val" situation does seem to be
the most common one by _far_, and is often handled by a common
"get_user()" thing, so converting them to just getting the thing as a
kernel pointer doesn't look _too_ nasty, because even when they have a
lot of subcases, the actual optval accesses are much fewer.

Which is not to say that it looks all that much fun, but it doesn't
look entirely undoable either.

The good news (I guess) is that any missed transformation will be
fairly obvious (ie somebody uses a "get_user()" on what is now a
kernel pointer, and returns -EFAULT. So it shouldn't cause any subtle
failures, and it shouldn't cause any security issues.

I didn't look at the compat cases, but if anything I'd expect those to
become simpler by having kernel pointers. And there doesn't actually
seem to be that many of them (possibly because the "int" case si so
common that it all ends up being the same?)

              Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 18:29                   ` Linus Torvalds
@ 2020-06-29 18:36                     ` Christoph Hellwig
  2020-06-29 19:10                       ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-29 18:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, David Laight, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

On Mon, Jun 29, 2020 at 11:29:22AM -0700, Linus Torvalds wrote:
> I didn't look at the compat cases, but if anything I'd expect those to
> become simpler by having kernel pointers. And there doesn't actually
> seem to be that many of them (possibly because the "int" case si so
> common that it all ends up being the same?)

Having resurrect my work there really are tons of int cases.  Which
makes me thing that splitting out a setsockopt_int method which gets
passed value instead of a pointer, then converting all the simple cases
to that first and then doing the real shit later sounds like a promіsing
idea.  Let me think a bit more about that.

And yes, a lot of the common methods have tons of cases and
sub-dispatchers and everything else you'd expect from an ioctl-like
interface..

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 18:36                     ` Christoph Hellwig
@ 2020-06-29 19:10                       ` Linus Torvalds
  2020-06-30  7:04                         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2020-06-29 19:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Laight, Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Mon, Jun 29, 2020 at 11:36 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Having resurrect my work there really are tons of int cases.  Which
> makes me thing that splitting out a setsockopt_int method which gets
> passed value instead of a pointer, then converting all the simple cases
> to that first and then doing the real shit later sounds like a promіsing
> idea.

Try my hacky patch first, and just change the code that does

                if (get_user(val, (int __user *)optval)) {
                        err = -EFAULT;

to do

                val = *(int *)optval;

In fact, that pattern seems to be so common that you can probably
almost do it with a sed-script or something.

                   Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 19:10                       ` Linus Torvalds
@ 2020-06-30  7:04                         ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2020-06-30  7:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, David Laight, Al Viro, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Linux Kernel Mailing List, linux-fsdevel

Next fund one, in net/ipv6/ip6_flowlabel.c:ipv6_flowlabel_opt() we
have this gem toward the end:

		if (!freq->flr_label) {
			if (copy_to_user(&((struct in6_flowlabel_req __user *)optval)->flr_label,
					 &fl->label, sizeof(fl->label))) {
				/* Intentionally ignore fault. */

so it writes back to what was supposed to be the input parameter,
and only does it for a partial region.  Not sure how we could handle
that with any kind of copy to kernel in the caller scheme?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-29 17:02               ` Linus Torvalds
  2020-06-29 18:07                 ` Christoph Hellwig
@ 2020-06-30  7:51                 ` David Laight
  1 sibling, 0 replies; 36+ messages in thread
From: David Laight @ 2020-06-30  7:51 UTC (permalink / raw)
  To: 'Linus Torvalds', Christoph Hellwig
  Cc: Al Viro, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

From: Linus Torvalds
> Sent: 29 June 2020 18:03
> On Mon, Jun 29, 2020 at 8:29 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > So based on that I'd rather get away without our flag and tag the
> > kernel pointer case in setsockopt explicitly.
> 
> Yeah, I'd be ok to pass that kind of flag around for setsockopt, in
> ways I _don't_ want to do for some very core vfs thing like 'read()'.
> 
> That said, is there no practical limit on how big "optlen" can be?
> Sure, I realize that a lot of setsockopt users may not use all of the
> data, but let's say that "optlen" is 128, but the actual low-level
> setsockopt operation only uses the first 16 bytes, maybe we could
> always just copy the 128 bytes from user space into kernel space, and
> just say "setsockopt() always gets a kernel pointer".
> 
> Then the bpf use is even simpler. It would just pass the kernel
> pointer natively.
> 
> Because that seems to be what the BPF code really wants to do: it
> takes the user optval, and munges it into a kernel optval, and then
> (if that has been done) runs the low-level sock_setsockopt() under
> KERNEL_DS.
> 
> Couldn't we switch things around instead, and just *always* copy
> things from user space, and sock_setsockopt (and
> sock->ops->setsockopt) _always_ get a kernel buffer?
> 
> And avoid the set_fs(KERNEL_DS) games entirely that way?

I did a patch for SCTP to do the copies in the protocol wrapper.
Apart from the issue of bad applications providing overlarge
buffers and effecting a local DoS attack there were some odd issues:

1) SCTP completely abuses both setsockopt and getsockopt
   to perform additional socket operations.
   I suspect the original implementation didn't want to
   add new system calls.
2) SCTP treats getsockopt as RMW on the user buffer.
   Mostly it only needs 4 bytes, but it can in include
   a sockaddr_storage.
3) SCTP has one getsockopt that is really a setsockopt
   (ie changes things) but is implemented using getsockopt
   so that it can return a value.
4) One of the SCTP getsockopt calls has to return the
   'wrong' value to userspace (ie not the length of the
   transferred data) for compatibility with the orginal
   broken code.

I'm wondering if the [sg]etsockopt wrapper should actually
pass through a structure containing:
	Kernel buffer address (on stack if short)
	User buffer address (may be NULL)
	Length of buffer
	copy_to_user length (normally zero)
	flag: embedded pointers are user/kernel

Most code will just use the kernel buffer and return length/error.

Code that knows the supplied length is invalid can use the
user pointer - but only support direct user requests.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations
  2020-06-27 16:33           ` Linus Torvalds
  2020-06-29  8:21             ` David Laight
  2020-06-29 15:29             ` Christoph Hellwig
@ 2020-07-08  5:14             ` Luis Chamberlain
  2 siblings, 0 replies; 36+ messages in thread
From: Luis Chamberlain @ 2020-07-08  5:14 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig
  Cc: David Laight, Al Viro, Kees Cook, Iurii Zaikin,
	Linux Kernel Mailing List, linux-fsdevel

On Sat, Jun 27, 2020 at 09:33:03AM -0700, Linus Torvalds wrote:
> The real problem with
> "set_fs()" has been that we've occasionally had bugs where we ended up
> running odd paths that we really didn't _intend_ to run with kernel
> pointers. The classic example is the SCSI "write as ioctl" example,
> where a write to a SCSI generic device would do various odd things and
> follow pointers and what-not. Then you get into real trouble when
> "splice()" ends up truiong to write a kernel buffer, and because of
> "set_fs()" suddenly the sg code started accessing kernel memory
> willy-nilly.

So the semantics of this interface can create chaos fast if not used
carefully and conservatively.

Christoph, it would be great if you're future series can include some
version of a verbiage for the motivation for the culling of set_fs().
Maybe it was just me, but the original motivation wasn't clear at first
and took some thread digging to get it.

  Luis

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2020-07-08  5:15 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-24 16:28 [RFC] stop using ->read and ->write for kernel access Christoph Hellwig
2020-06-24 16:28 ` [PATCH 01/11] uptr: add a new "universal pointer" type Christoph Hellwig
2020-06-24 16:28 ` [PATCH 02/11] fs: factor out a set_fmode_can_read_write helper Christoph Hellwig
2020-06-24 16:28 ` [PATCH 03/11] fs: add new read_uptr and write_uptr file operations Christoph Hellwig
2020-06-24 17:19   ` Linus Torvalds
2020-06-24 17:55     ` Christoph Hellwig
2020-06-24 18:11       ` Linus Torvalds
2020-06-24 18:14         ` Christoph Hellwig
2020-06-24 18:20           ` Linus Torvalds
2020-06-24 18:24             ` Christoph Hellwig
2020-06-24 18:29               ` Matthew Wilcox
2020-06-24 18:31                 ` Christoph Hellwig
2020-06-24 18:15         ` Linus Torvalds
2020-06-27 10:49         ` David Laight
2020-06-27 16:33           ` Linus Torvalds
2020-06-29  8:21             ` David Laight
2020-06-29 15:29             ` Christoph Hellwig
2020-06-29 17:02               ` Linus Torvalds
2020-06-29 18:07                 ` Christoph Hellwig
2020-06-29 18:29                   ` Linus Torvalds
2020-06-29 18:36                     ` Christoph Hellwig
2020-06-29 19:10                       ` Linus Torvalds
2020-06-30  7:04                         ` Christoph Hellwig
2020-06-30  7:51                 ` David Laight
2020-07-08  5:14             ` Luis Chamberlain
2020-06-24 17:56     ` Matthew Wilcox
2020-06-24 17:59       ` Christoph Hellwig
2020-06-24 18:37         ` Christoph Hellwig
2020-06-24 18:43           ` Matthew Wilcox
2020-06-24 16:28 ` [PATCH 04/11] sysctl: switch to ->{read,write}_uptr Christoph Hellwig
2020-06-24 16:28 ` [PATCH 05/11] fs: refactor new_sync_read Christoph Hellwig
2020-06-24 16:28 ` [PATCH 06/11] proc: add a read_iter method to proc proc_ops Christoph Hellwig
2020-06-24 16:28 ` [PATCH 07/11] seq_file: add seq_read_iter Christoph Hellwig
2020-06-24 16:28 ` [PATCH 09/11] proc: switch over direct seq_read method calls to seq_read_iter Christoph Hellwig
2020-06-24 16:29 ` [PATCH 10/11] fs: don't allow kernel reads and writes using ->read and ->write Christoph Hellwig
2020-06-24 16:29 ` [PATCH 11/11] fs: don't allow splice read/write without explicit ops Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).