From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: "Seymour, Shane M" <shane.seymour@hpe.com>,
Christoph Hellwig <hch@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
Jeff Layton <jlayton@poochiereds.net>,
"J. Bruce Fields" <bfields@fieldses.org>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH] block: create ioctl to discard-or-zeroout a range of blocks
Date: Fri, 13 Nov 2015 13:36:04 -0800 [thread overview]
Message-ID: <20151113213604.GD2217@birch.djwong.org> (raw)
In-Reply-To: <56460608.2070107@kernel.dk>
On Fri, Nov 13, 2015 at 08:47:20AM -0700, Jens Axboe wrote:
> On 11/10/2015 11:14 PM, Darrick J. Wong wrote:
> >On Wed, Nov 11, 2015 at 05:30:07AM +0000, Seymour, Shane M wrote:
> >>A quick question about this part of the patch:
> >>
> >>>+ uint64_t end = start + len - 1;
> >>
> >>>+ if (end >= i_size_read(bdev->bd_inode))
> >> return -EINVAL;
> >>
> >>>+ /* Invalidate the page cache, including dirty pages */
> >>>+ mapping = bdev->bd_inode->i_mapping;
> >>>+ truncate_inode_pages_range(mapping, start, end);
> >>
> >>blk_ioctl_zeroout accepts unsigned values for start and end (uint64_t) but
> >>loff_t types are turned from i_size_read() and passed as the 2nd and 3rd
> >>values to truncate_inode_pages_range() and loff_t is a signed value. It
> >>should be possible to pass in some values would overflow the calculation of
> >>end causing the test on the value of end and the result of i_size_read to
> >>pass but then end up passing a large unsigned value for in start that would
> >>be implicitly converted to signed in truncate_inode_pages_range. I was
> >>wondering if you'd tested passing in data that would cause sign conversion
> >>issues when passed into truncate_inode_pages_range (does it handle it
> >>gracefully?) or should this code:
> >>
> >> if (start & 511)
> >> return -EINVAL;
> >> if (len & 511)
> >> return -EINVAL;
> >>
> >>be something more like this (for better sanity checking of your arguments)
> >>which will ensure that you don't have implicit conversion issues from
> >>unsigned to signed and ensure that the result of adding them together won't
> >>either:
> >>
> >> if ((start & 511) || (start > (uint64_t)LLONG_MAX))
> >> return -EINVAL;
> >> if ((len & 511) ) || (len > (uint64_t)LLONG_MAX))
> >> return -EINVAL;
> >> if (end > (uint64_t)LLONG_MAX)
> >> return -EINVAL;
> >>
> >>My apologies in advance if I've made a mistake when looking at this and my
> >>concerns about unsigned values being implicitly converted to signed are
> >>unfounded (I would have hoped for compiler warnings about any implicit
> >>conversions though).
> >
> >I don't have a device large enough to test for signedness errors, since passing
> >huge values for start and len never make it past the i_size_read check.
> >However, I do see that I forgot to check the padding values, so I'll update
> >that.
>
> modprobe null_blk nr_devices=1 gb=512000
I tried doing that for some huge device sizes and this is what I saw:
# modprobe null_blk nr_devices=1 gb=17179869184
modprobe: ERROR: could not insert 'null_blk': Numerical result out of range
# modprobe null_blk nr_devices=1 gb=8589934591
modprobe: ERROR: could not insert 'null_blk': Numerical result out of range
# modprobe null_blk nr_devices=1 gb=8589934592
modprobe: ERROR: could not insert 'null_blk': Numerical result out of range
# modprobe null_blk nr_devices=1 gb=4294967295
modprobe: ERROR: could not insert 'null_blk': Numerical result out of range
# modprobe null_blk nr_devices=1 gb=4294967294
modprobe: ERROR: could not insert 'null_blk': Numerical result out of range
# modprobe null_blk nr_devices=1 gb=2147483647
# lsblk /dev/nullb0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nullb0 249:0 0 2E 0 disk
Looks like the biggest nullblk device I can create is 2E?
(Same result with "modprobe scsi-debug virtual_gb=...")
> will get you a /dev/nullb0 that is 500TB. Adjust 'gb' at will. Or
> use loop with a big ass sparse file.
I tried that, too. XFS only wanted to let me create an 8E file. So
did btrfs. Eventually, I figured out the following:
# echo '0 18446744073709551616 zero' | dmsetup create MOO-backing
# echo '0 18446744073709551616 snapshot /dev/mapper/MOO-backing /dev/sda N 512' | dmsetup create MOO
# lsblk /dev/mapper/MOO
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
MOO (dm-1) 251:1 0 16E 0 dm
Well, now we're getting somewhere. It's a little funny that we asked
for 2^64 sectors, the dm table says 2^64, but the device claims a size
of 2^55... but it's 16E which exhausts our loff_t. Let's try wrapping
around the end:
# /djwong/t/blkzero/test /dev/mapper/MOO 18446744073709543424 4096
OFFSET=-8192 BUFSZ=4096 USER_COHERENCE=0 DIRECTIO=0 EXCLUSIVE=0
# /djwong/t/blkzero/test /dev/mapper/MOO 18446744073709543424 8192
OFFSET=-8192 BUFSZ=8192 USER_COHERENCE=0 DIRECTIO=0 EXCLUSIVE=0
/dev/mapper/MOO: Invalid argument
# /djwong/t/blkzero/test /dev/mapper/MOO 18446744073709543424 12288
OFFSET=-8192 BUFSZ=12288 USER_COHERENCE=0 DIRECTIO=0 EXCLUSIVE=0
#
Hmm. The third case should fail, since that clearly goes off the end
of the device. However, it's not correct to compare start or end
against LLONG_MAX because the block layer clearly supports devices
that are larger than LLONG_MAX bytes. However, the case where the
"end" calculation overflows should be easy to spot with a quick
"if (end < start) return -EINVAL;"
Curiously, if I create the DM device with 2^55 sectors, the dm
snapshot errors out:
# echo '0 36028797018963967 zero' | dmsetup create MOO-backing
# echo '0 36028797018963967 snapshot /dev/mapper/MOO-backing /dev/sda N 512' | dmsetup create MOO
# /djwong/t/blkzero/test /dev/mapper/MOO 18446744073709543424 4096
OFFSET=-8192 BUFSZ=4096 USER_COHERENCE=0 DIRECTIO=0 EXCLUSIVE=0
/dev/mapper/MOO: Input/output error
and dmesg spits out:
"device-mapper: snapshots: Invalidating snapshot: Error reading/writing."
FWIW, truncate_inode_pages_range takes the loff_t and converts that
into a pgoff_t, which is unsigned long. If the start pgoff_t > the
end pgoff_t, the function does nothing. On the off chance the
blkdev_issue_zeroout call doesn't fail when it's given weird
arguments, invalidate_inode_pages2_range seems to do roughly the same
thing with pgoff_t. Will add the one check and resend.
--D
next prev parent reply other threads:[~2015-11-13 21:36 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-10 5:15 [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
2015-11-11 5:30 ` Seymour, Shane M
[not found] ` <DDB9C85B850785449757F9914A034FCB444D1676-4I1V4pQFGigSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
2015-11-11 6:14 ` Darrick J. Wong
2015-11-11 23:30 ` Seymour, Shane M
2015-11-12 0:56 ` Seymour, Shane M
2015-11-13 3:36 ` Seymour, Shane M
[not found] ` <20151111061435.GA32272-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2015-11-13 15:47 ` Jens Axboe
2015-11-13 21:36 ` Darrick J. Wong [this message]
[not found] <1415336894-15327-1-git-send-email-martin.petersen@oracle.com>
[not found] ` <1415336894-15327-4-git-send-email-martin.petersen@oracle.com>
2014-11-11 0:04 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Darrick J. Wong
2014-11-17 19:28 ` [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151113213604.GD2217@birch.djwong.org \
--to=darrick.wong@oracle.com \
--cc=axboe@kernel.dk \
--cc=bfields@fieldses.org \
--cc=hch@infradead.org \
--cc=jlayton@poochiereds.net \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=shane.seymour@hpe.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).