From: wangtao
Subject: [PATCH v4 1/4] fs: allow cross-FS copy_file_range for memory file with direct I/O
Date: Tue, 3 Jun 2025 17:52:42 +0800
Message-ID: <20250603095245.17478-2-tao.wangtao@honor.com>
In-Reply-To: <20250603095245.17478-1-tao.wangtao@honor.com>
References: <20250603095245.17478-1-tao.wangtao@honor.com>

Memory files can optimize copy performance via copy_file_range callbacks:
 - Compared to mmap&read: reduces GUP (get_user_pages) overhead
 - Compared to sendfile/splice: eliminates one memory copy
 - Supports dma-buf direct I/O zero-copy implementation

Suggested-by: Christian König
Suggested-by: Amir Goldstein
Signed-off-by: wangtao
---
 fs/read_write.c    | 64 +++++++++++++++++++++++++++++++++++++---------
 include/linux/fs.h |  2 ++
 2 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index bb0ed26a0b3a..ecb4f753c632 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1469,6 +1469,31 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 }
 #endif
 
+static const struct file_operations *memory_copy_file_ops(
+			struct file *file_in, struct file *file_out)
+{
+	if ((file_in->f_op->fop_flags & FOP_MEMORY_FILE) &&
+	    (file_in->f_mode & FMODE_CAN_ODIRECT) &&
+	    file_in->f_op->copy_file_range && file_out->f_op->write_iter)
+		return file_in->f_op;
+	else if ((file_out->f_op->fop_flags & FOP_MEMORY_FILE) &&
+		 (file_out->f_mode & FMODE_CAN_ODIRECT) &&
+		 file_in->f_op->read_iter && file_out->f_op->copy_file_range)
+		return file_out->f_op;
+	else
+		return NULL;
+}
+
+static int essential_file_rw_checks(struct file *file_in, struct file *file_out)
+{
+	if (!(file_in->f_mode & FMODE_READ) ||
+	    !(file_out->f_mode & FMODE_WRITE) ||
+	    (file_out->f_flags & O_APPEND))
+		return -EBADF;
+
+	return 0;
+}
+
 /*
  * Performs necessary checks before doing a file copy
  *
@@ -1484,9 +1509,16 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	uint64_t count = *req_count;
 	loff_t size_in;
+	bool splice = flags & COPY_FILE_SPLICE;
+	const struct file_operations *mem_fops;
 	int ret;
 
-	ret = generic_file_rw_checks(file_in, file_out);
+	/* The dma-buf file is not a regular file. */
+	mem_fops = memory_copy_file_ops(file_in, file_out);
+	if (splice || mem_fops == NULL)
+		ret = generic_file_rw_checks(file_in, file_out);
+	else
+		ret = essential_file_rw_checks(file_in, file_out);
 	if (ret)
 		return ret;
 
@@ -1500,8 +1532,10 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 	 * and several different sets of file_operations, but they all end up
 	 * using the same ->copy_file_range() function pointer.
 	 */
-	if (flags & COPY_FILE_SPLICE) {
+	if (splice) {
 		/* cross sb splice is allowed */
+	} else if (mem_fops != NULL) {
+		/* cross-fs copy is allowed for memory file. */
 	} else if (file_out->f_op->copy_file_range) {
 		if (file_in->f_op->copy_file_range !=
 		    file_out->f_op->copy_file_range)
@@ -1554,6 +1588,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	ssize_t ret;
 	bool splice = flags & COPY_FILE_SPLICE;
 	bool samesb = file_inode(file_in)->i_sb == file_inode(file_out)->i_sb;
+	const struct file_operations *mem_fops;
 
 	if (flags & ~COPY_FILE_SPLICE)
 		return -EINVAL;
@@ -1574,18 +1609,27 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (len == 0)
 		return 0;
 
+	if (splice)
+		goto do_splice;
+
 	file_start_write(file_out);
 
 	/*
 	 * Cloning is supported by more file systems, so we implement copy on
 	 * same sb using clone, but for filesystems where both clone and copy
 	 * are supported (e.g. nfs,cifs), we only call the copy method.
+	 * For copy to/from memory file, we always call the copy method of the
+	 * memory file.
 	 */
-	if (!splice && file_out->f_op->copy_file_range) {
+	mem_fops = memory_copy_file_ops(file_in, file_out);
+	if (mem_fops) {
+		ret = mem_fops->copy_file_range(file_in, pos_in,
+						file_out, pos_out, len, flags);
+	} else if (file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in,
-						      file_out, pos_out,
-						      len, flags);
-	} else if (!splice && file_in->f_op->remap_file_range && samesb) {
+						file_out, pos_out,
+						len, flags);
+	} else if (file_in->f_op->remap_file_range && samesb) {
 		ret = file_in->f_op->remap_file_range(file_in, pos_in,
 				file_out, pos_out,
 				min_t(loff_t, MAX_RW_COUNT, len),
@@ -1603,6 +1647,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!splice)
 		goto done;
 
+do_splice:
 	/*
 	 * We can get here for same sb copy of filesystems that do not implement
 	 * ->copy_file_range() in case filesystem does not support clone or in
@@ -1786,12 +1831,7 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
 		return -EINVAL;
 
-	if (!(file_in->f_mode & FMODE_READ) ||
-	    !(file_out->f_mode & FMODE_WRITE) ||
-	    (file_out->f_flags & O_APPEND))
-		return -EBADF;
-
-	return 0;
+	return essential_file_rw_checks(file_in, file_out);
 }
 
 int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..37df1b497418 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2187,6 +2187,8 @@ struct file_operations {
 #define FOP_ASYNC_LOCK		((__force fop_flags_t)(1 << 6))
 /* File system supports uncached read/write buffered IO */
 #define FOP_DONTCACHE		((__force fop_flags_t)(1 << 7))
+/* Supports cross-FS copy_file_range for memory file */
+#define FOP_MEMORY_FILE		((__force fop_flags_t)(1 << 8))
 
 /* Wrap a directory iterator that needs exclusive inode access */
 int wrap_directory_iterator(struct file *, struct dir_context *,
-- 
2.17.1
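
For reference, userspace reaches the code path changed above through copy_file_range(2). The following is only a minimal sketch of a caller, not part of the patch: the helper names copy_range_with_fallback() and copy_by_read_write() are hypothetical, and the set of errno values treated as "fall back to read/write" is an assumption about what a cautious caller might handle. Whether a given pair of descriptors takes the memory-file path depends on the driver setting FOP_MEMORY_FILE, which this sketch cannot observe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): plain read()/write() copy loop. */
static ssize_t copy_by_read_write(int fd_in, int fd_out, size_t len)
{
	const size_t bufsz = 64 * 1024;
	char *buf = malloc(bufsz);
	size_t done = 0;

	if (!buf)
		return -1;

	while (done < len) {
		size_t want = len - done < bufsz ? len - done : bufsz;
		ssize_t n = read(fd_in, buf, want);
		ssize_t off = 0;

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0) {
			free(buf);
			return -1;
		}
		if (n == 0)
			break;		/* EOF on the source */

		/* write() may be short, so loop until all n bytes are out. */
		while (off < n) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0 && errno == EINTR)
				continue;
			if (w < 0) {
				free(buf);
				return -1;
			}
			off += w;
		}
		done += (size_t)n;
	}

	free(buf);
	return (ssize_t)done;
}

/*
 * Hypothetical helper (not from the patch): try copy_file_range() first and
 * fall back to read()/write() when the kernel refuses the file pair.
 * Both descriptors are used at their current file offsets.
 */
static ssize_t copy_range_with_fallback(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	assert(fd_in >= 0 && fd_out >= 0);

	while (done < len) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;		/* EOF on the source */
		if (errno == EINTR)
			continue;
		/* Assumed set of "not supported for this pair" errors. */
		if (errno == EXDEV || errno == ENOSYS ||
		    errno == EINVAL || errno == EOPNOTSUPP) {
			ssize_t rest = copy_by_read_write(fd_in, fd_out,
							  len - done);

			return rest < 0 ? -1 : (ssize_t)(done + (size_t)rest);
		}
		return -1;
	}

	return (ssize_t)done;
}
```

A caller would open the source and destination as usual (for the case this patch targets, one side would be a memory file and the other a file opened with O_DIRECT) and call copy_range_with_fallback(fd_in, fd_out, len). The fallback keeps the caller working on kernels that return -EXDEV for cross-filesystem copies.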