public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Read O_DIRECT regression in 3.7-rc8 (bisected)
@ 2012-12-07 23:55 Milan Broz
  2012-12-08  0:48 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Milan Broz @ 2012-12-07 23:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

Hi Linus,

seems this commit in 3.7-rc8 caused regression for O_DIRECT
read near the end of the device.

bbec0270bdd887f96377065ee38b8848b5afa395 is the first bad commit
commit bbec0270bdd887f96377065ee38b8848b5afa395
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Nov 29 12:31:52 2012 -0800

    blkdev_max_block: make private to fs/buffer.c


With reproducer below (tested on i386), read should return
half of the buffer (8192 bytes), with patch above it fails
completely.

Milan

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define BLOCK 8192
int main (int argc, char *argv[])
{
        char *buf;
        int fd, r;

        if (posix_memalign((void*)&buf, 4096, 2 * BLOCK)) {
                printf("alloc fail\n");
                return 1;
        }

        fd = open("/dev/sdb", O_RDONLY|O_DIRECT);
        if (fd == -1) {
                printf("open fail\n");
                return 1;
        }

        if (lseek(fd, -BLOCK, SEEK_END) < 0) {
                printf("seek fail\n");
                close(fd);
                return 2;
        }

        r = read(fd, buf, 2 * BLOCK);
        printf("Read returned %d.\n", r);

        close(fd);
        return 0;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Read O_DIRECT regression in 3.7-rc8 (bisected)
  2012-12-07 23:55 Read O_DIRECT regression in 3.7-rc8 (bisected) Milan Broz
@ 2012-12-08  0:48 ` Linus Torvalds
  2012-12-08  0:57   ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2012-12-08  0:48 UTC (permalink / raw)
  To: Milan Broz; +Cc: Linux Kernel Mailing List



On Sat, 8 Dec 2012, Milan Broz wrote:
> 
> seems this commit in 3.7-rc8 caused regression for O_DIRECT
> read near the end of the device.

Oh, good find, and thanks for the test-case. I had looked at the O_DIRECT 
side, and convinced myself that it already truncates to i_size_read(), but 
it looks like that actually only happens for the *write* side for some 
reason.

So apparently the read side doesn't have anything like that.

This (TOTALLY UNTESTED) patch adds it the same iov_shorten() logic that 
the write side has. It does it differently (in fs/block_dev.c rather than 
in mm/filemap.c), but I actually suspect this is a nicer way to do it, and 
maybe we should do the write side truncation this way too.

But as mentioned, it's untested.. Does it work for you? I'll reboot and 
test myself, but I'm on my laptop right now, so it's easier to send it out 
before the compile has even finished..

		Linus

---

 fs/block_dev.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index a1e09b4fe1ba..ab3a456f6650 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1544,6 +1544,22 @@ ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
 }
 EXPORT_SYMBOL_GPL(blkdev_aio_write);
 
+static ssize_t blkdev_aio_read(struct kiocb *iocb, const struct iovec *iov,
+			 unsigned long nr_segs, loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *bd_inode = file->f_mapping->host;
+	loff_t size = i_size_read(bd_inode);
+
+	if (pos >= size)
+		return 0;
+
+	size -= pos;
+	if (size < INT_MAX)
+		nr_segs = iov_shorten((struct iovec *)iov, nr_segs, size);
+	return generic_file_aio_read(iocb, iov, nr_segs, pos);
+}
+
 /*
  * Try to release a page associated with block device when the system
  * is under memory pressure.
@@ -1574,7 +1590,7 @@ const struct file_operations def_blk_fops = {
 	.llseek		= block_llseek,
 	.read		= do_sync_read,
 	.write		= do_sync_write,
-	.aio_read	= generic_file_aio_read,
+	.aio_read	= blkdev_aio_read,
 	.aio_write	= blkdev_aio_write,
 	.mmap		= generic_file_mmap,
 	.fsync		= blkdev_fsync,

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Read O_DIRECT regression in 3.7-rc8 (bisected)
  2012-12-08  0:48 ` Linus Torvalds
@ 2012-12-08  0:57   ` Linus Torvalds
  2012-12-08  8:53     ` Milan Broz
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2012-12-08  0:57 UTC (permalink / raw)
  To: Milan Broz; +Cc: Linux Kernel Mailing List



On Fri, 7 Dec 2012, Linus Torvalds wrote:
> 
> This (TOTALLY UNTESTED) patch adds it the same iov_shorten() logic that 
> the write side has. It does it differently (in fs/block_dev.c rather than 
> in mm/filemap.c), but I actually suspect this is a nicer way to do it, and 
> maybe we should do the write side truncation this way too.

Ok, rebooted and tested with your test-case, and it seems to work fine. 
But I presume that you actually found the problem some other way, so you'd 
want to verify that it fixes whatever original bigger issue you hit.

Thanks,
            Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Read O_DIRECT regression in 3.7-rc8 (bisected)
  2012-12-08  0:57   ` Linus Torvalds
@ 2012-12-08  8:53     ` Milan Broz
  0 siblings, 0 replies; 4+ messages in thread
From: Milan Broz @ 2012-12-08  8:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On 12/08/2012 01:57 AM, Linus Torvalds wrote:
> 
> On Fri, 7 Dec 2012, Linus Torvalds wrote:
>>
>> This (TOTALLY UNTESTED) patch adds it the same iov_shorten() logic that 
>> the write side has. It does it differently (in fs/block_dev.c rather than 
>> in mm/filemap.c), but I actually suspect this is a nicer way to do it, and 
>> maybe we should do the write side truncation this way too.
> 
> Ok, rebooted and tested with your test-case, and it seems to work fine. 
> But I presume that you actually found the problem some other way, so you'd 
> want to verify that it fixes whatever original bigger issue you hit.

Yes, initially it crashed regression test for cryptsetup reencrypt tool
(direct-io is optional there) and that device reencryption case never finished.
(Which means in fact lost of access to the encrypted device for user
without some manual tweaking, so quite serious problem.)

Anyway, it works now with your patch, so

Reported-and-tested-by: Milan Broz <mbroz@redhat.com>

Thanks,
Milan

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-12-08  8:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-07 23:55 Read O_DIRECT regression in 3.7-rc8 (bisected) Milan Broz
2012-12-08  0:48 ` Linus Torvalds
2012-12-08  0:57   ` Linus Torvalds
2012-12-08  8:53     ` Milan Broz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox