From: David Howells <dhowells@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>
Cc: David Howells <dhowells@redhat.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Hillf Danton <hdanton@sina.com>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com,
	Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>
Subject: [PATCH v13 01/12] splice: Fix O_DIRECT file read splice to avoid reversion of ITER_PIPE
Date: Thu,  9 Feb 2023 10:29:43 +0000	[thread overview]
Message-ID: <20230209102954.528942-2-dhowells@redhat.com> (raw)
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>

With the upcoming iov_iter_extract_pages() function, pages extracted from a
non-user-backed iterator such as ITER_PIPE aren't pinned.
__iomap_dio_rw(), however, calls iov_iter_revert() to shorten the iterator
to just the bufferage it is going to use - which has the side-effect of
freeing the excess pipe buffers, even though they're attached to a bio and
may get written to by DMA (thanks to Hillf Danton for spotting this[1]).
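
In outline, the hazard is something like the following (heavily condensed
pseudo-code, not the literal __iomap_dio_rw() flow; "excess" stands in for
the unused byte count):

	bio_iov_iter_get_pages(bio, iter); /* Pipe pages attached to the bio */
	submit_bio(bio);		   /* DMA into those pages may start */
	/* The read turns out shorter than requested, so trim the iterator: */
	iov_iter_revert(iter, excess);	   /* Frees the excess ITER_PIPE
					    * buffers that the in-flight DMA
					    * can still write to */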

This then causes memory corruption that is particularly noticeable when the
syzbot test[2] is run.  The test boils down to:

	out = creat(argv[1], 0666);
	ftruncate(out, 0x800);
	lseek(out, 0x200, SEEK_SET);
	in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
	sendfile(out, in, NULL, 0x1dd00);

run repeatedly in parallel.  What I think is happening is that, by reducing
i_size, a concurrent ftruncate() occasionally shortens the DIO read that
sendfile's splice core is about to make.
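
For reference, a fleshed-out version of the reproducer might look something
like the following (hypothetical scaffolding around the quoted calls; error
checking omitted):

	#define _GNU_SOURCE		/* for O_DIRECT */
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/sendfile.h>
	#include <sys/wait.h>

	static void hammer(const char *path)
	{
		for (;;) {
			int out = creat(path, 0666);
			int in;

			ftruncate(out, 0x800);
			lseek(out, 0x200, SEEK_SET);
			in = open(path, O_RDONLY | O_DIRECT | O_NOFOLLOW);
			sendfile(out, in, NULL, 0x1dd00);
			close(in);
			close(out);
		}
	}

	int main(int argc, char *argv[])
	{
		int i;

		/* Run several copies in parallel to provoke the race. */
		for (i = 0; i < 4; i++)
			if (fork() == 0)
				hammer(argv[1]);
		wait(NULL);
		return 0;
	}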

Fix this by splitting the handling of a splice from an O_DIRECT file fd off
from that of a non-DIO fd and, in the O_DIRECT case, replacing the use of an
ITER_PIPE iterator with an ITER_BVEC iterator, for which reversion won't free
the buffers.  The DIO-specific code bulk-allocates all the buffers it thinks
it is going to use in advance, does the read synchronously and only then
trims the buffer down.  The pages that were used get pushed into the pipe.

This should be more efficient for DIO reads by virtue of doing a bulk page
allocation, but slightly less efficient because any partial page already in
the pipe is ignored: the read always starts on a fresh page rather than
filling out the pipe's last, partially-used buffer.

Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1]
Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
---

Notes:
    ver #13)
     - Don't completely replace generic_file_splice_read(), but rather only use
       this if we're splicing from an O_DIRECT file fd.

 fs/splice.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index 5969b7a1d353..b4be6fc314a1 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -282,6 +282,99 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
 	kfree(spd->partial);
 }
 
+/*
+ * Splice data from an O_DIRECT file into pages and then add them to the output
+ * pipe.
+ */
+static ssize_t generic_file_direct_splice_read(struct file *in, loff_t *ppos,
+					       struct pipe_inode_info *pipe,
+					       size_t len, unsigned int flags)
+{
+	LIST_HEAD(pages);
+	struct iov_iter to;
+	struct bio_vec *bv;
+	struct kiocb kiocb;
+	struct page *page;
+	unsigned int head;
+	ssize_t ret;
+	size_t used, npages, chunk, remain, reclaim;
+	int i;
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+	npages = DIV_ROUND_UP(len, PAGE_SIZE);
+
+	bv = kmalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	npages = alloc_pages_bulk_list(GFP_USER, npages, &pages);
+	if (!npages) {
+		kfree(bv);
+		return -ENOMEM;
+	}
+
+	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	for (i = 0; i < npages; i++) {
+		chunk = min_t(size_t, PAGE_SIZE, remain);
+		page = list_first_entry(&pages, struct page, lru);
+		list_del_init(&page->lru);
+		bv[i].bv_page = page;
+		bv[i].bv_offset = 0;
+		bv[i].bv_len = chunk;
+		remain -= chunk;
+	}
+
+	/* Do the I/O */
+	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
+	init_sync_kiocb(&kiocb, in);
+	kiocb.ki_pos = *ppos;
+	ret = call_read_iter(in, &kiocb, &to);
+
+	reclaim = npages * PAGE_SIZE;
+	remain = 0;
+	if (ret > 0) {
+		reclaim -= ret;
+		remain = ret;
+		*ppos = kiocb.ki_pos;
+		file_accessed(in);
+	} else if (ret < 0) {
+		/*
+		 * callers of ->splice_read() expect -EAGAIN on
+		 * "can't put anything in there", rather than -EFAULT.
+		 */
+		if (ret == -EFAULT)
+			ret = -EAGAIN;
+	}
+
+	/* Free any pages that didn't get touched at all. */
+	for (; reclaim >= PAGE_SIZE; reclaim -= PAGE_SIZE)
+		__free_page(bv[--npages].bv_page);
+
+	/* Push the remaining pages into the pipe. */
+	head = pipe->head;
+	for (i = 0; i < npages; i++) {
+		struct pipe_buffer *buf = &pipe->bufs[head & (pipe->ring_size - 1)];
+
+		chunk = min_t(size_t, remain, PAGE_SIZE);
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= bv[i].bv_page,
+			.offset	= 0,
+			.len	= chunk,
+		};
+		head++;
+		remain -= chunk;
+	}
+	pipe->head = head;
+
+	kfree(bv);
+	return ret;
+}
+
 /**
  * generic_file_splice_read - splice data from file to a pipe
  * @in:		file to splice from
@@ -303,6 +396,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 	struct kiocb kiocb;
 	int ret;
 
+	if (in->f_flags & O_DIRECT)
+		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
+
 	iov_iter_pipe(&to, ITER_DEST, pipe, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;



Thread overview: 35+ messages
2023-02-09 10:29 [PATCH v13 00/12] iov_iter: Improve page extraction (pin or just list) David Howells
2023-02-09 10:29 ` David Howells [this message]
2023-02-09 14:53   ` [PATCH v13 01/12] splice: Fix O_DIRECT file read splice to avoid reversion of ITER_PIPE Matthew Wilcox
2023-02-09 15:06   ` David Howells
2023-02-09 16:15   ` [PATCH v14 " David Howells
2023-02-13  8:22     ` Christoph Hellwig
2023-02-15 13:17     ` David Howells
2023-02-15 14:24       ` Christoph Hellwig
2023-02-15 15:56       ` David Howells
2023-02-15 13:42     ` [PATCH] splice: Clean up direct_splice_read() a bit David Howells
2023-02-15 13:47     ` David Howells
2023-02-09 10:29 ` [PATCH v13 02/12] mm: Pass info, not iter, into filemap_get_pages() and unstatic it David Howells
2023-02-13  8:22   ` Christoph Hellwig
2023-02-09 10:29 ` [PATCH v13 03/12] splice: Do splice read from a buffered file without using ITER_PIPE David Howells
2023-02-13  8:28   ` Christoph Hellwig
2023-02-13 10:11   ` David Howells
2023-02-13 10:18     ` Christoph Hellwig
2023-02-13 11:15     ` David Howells
2023-02-13 14:44       ` Christoph Hellwig
2023-02-13 18:06   ` Guenter Roeck
2023-02-13 22:43   ` David Howells
2023-02-13 22:51     ` Guenter Roeck
2023-02-13 23:12     ` David Howells
2023-02-13 23:25       ` Guenter Roeck
2023-02-09 10:29 ` [PATCH v13 04/12] iov_iter: Kill ITER_PIPE David Howells
2023-02-13  8:28   ` Christoph Hellwig
2023-02-09 10:29 ` [PATCH v13 05/12] iov_iter: Define flags to qualify page extraction David Howells
2023-02-09 10:29 ` [PATCH v13 06/12] iov_iter: Add a function to extract a page list from an iterator David Howells
2023-02-09 10:29 ` [PATCH v13 07/12] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
2023-02-09 10:29 ` [PATCH v13 08/12] block: Fix bio_flagged() so that gcc can better optimise it David Howells
2023-02-09 10:29 ` [PATCH v13 09/12] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic David Howells
2023-02-09 10:29 ` [PATCH v13 10/12] block: Add BIO_PAGE_PINNED and associated infrastructure David Howells
2023-02-09 10:29 ` [PATCH v13 11/12] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
2023-02-09 10:29 ` [PATCH v13 12/12] block: convert bio_map_user_iov " David Howells
2023-02-10 22:31 ` [PATCH v13 00/12] iov_iter: Improve page extraction (pin or just list) Jens Axboe
