* Re: [PATCHv6 07/11] block/bounce: count bytes instead of sectors [not found] ` <20220610195830.3574005-8-kbusch@fb.com> @ 2022-06-13 14:22 ` Christoph Hellwig 2022-07-22 22:01 ` Bart Van Assche 1 sibling, 0 replies; 30+ messages in thread From: Christoph Hellwig @ 2022-06-13 14:22 UTC (permalink / raw) To: Keith Busch Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch, Pankaj Raghav On Fri, Jun 10, 2022 at 12:58:26PM -0700, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > Individual bv_len's may not be a sector size. > > Signed-off-by: Keith Busch <kbusch@kernel.org> > Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 07/11] block/bounce: count bytes instead of sectors [not found] ` <20220610195830.3574005-8-kbusch@fb.com> 2022-06-13 14:22 ` [PATCHv6 07/11] block/bounce: count bytes instead of sectors Christoph Hellwig @ 2022-07-22 22:01 ` Bart Van Assche 2022-07-25 14:46 ` Keith Busch 1 sibling, 1 reply; 30+ messages in thread From: Bart Van Assche @ 2022-07-22 22:01 UTC (permalink / raw) To: Keith Busch, linux-fsdevel, linux-block, linux-nvme Cc: axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8, Keith Busch, Pankaj Raghav On 6/10/22 12:58, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > Individual bv_len's may not be a sector size. > > Signed-off-by: Keith Busch <kbusch@kernel.org> > Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> > --- > block/bounce.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/block/bounce.c b/block/bounce.c > index 8f7b6fe3b4db..fbadf179601f 100644 > --- a/block/bounce.c > +++ b/block/bounce.c > @@ -205,19 +205,26 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig) > int rw = bio_data_dir(*bio_orig); > struct bio_vec *to, from; > struct bvec_iter iter; > - unsigned i = 0; > + unsigned i = 0, bytes = 0; > bool bounce = false; > - int sectors = 0; > + int sectors; > > bio_for_each_segment(from, *bio_orig, iter) { > if (i++ < BIO_MAX_VECS) > - sectors += from.bv_len >> 9; > + bytes += from.bv_len; > if (PageHighMem(from.bv_page)) > bounce = true; > } > if (!bounce) > return; > > + /* > + * Individual bvecs might not be logical block aligned. Round down > + * the split size so that each bio is properly block size aligned, > + * even if we do not use the full hardware limits. > + */ > + sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >> > + SECTOR_SHIFT; > if (sectors < bio_sectors(*bio_orig)) { > bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split); > bio_chain(bio, *bio_orig); Do I see correctly that there are two changes in this patch: counting bytes instead of sectors and also splitting at logical block boundaries instead of a 512-byte boundary? Should this patch perhaps be split? Thanks, Bart. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 07/11] block/bounce: count bytes instead of sectors 2022-07-22 22:01 ` Bart Van Assche @ 2022-07-25 14:46 ` Keith Busch 0 siblings, 0 replies; 30+ messages in thread From: Keith Busch @ 2022-07-25 14:46 UTC (permalink / raw) To: Bart Van Assche Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8, Pankaj Raghav On Fri, Jul 22, 2022 at 03:01:06PM -0700, Bart Van Assche wrote: > On 6/10/22 12:58, Keith Busch wrote: > > From: Keith Busch <kbusch@kernel.org> > > > > Individual bv_len's may not be a sector size. > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > > Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> > > --- > > block/bounce.c | 13 ++++++++++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > diff --git a/block/bounce.c b/block/bounce.c > > index 8f7b6fe3b4db..fbadf179601f 100644 > > --- a/block/bounce.c > > +++ b/block/bounce.c > > @@ -205,19 +205,26 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig) > > int rw = bio_data_dir(*bio_orig); > > struct bio_vec *to, from; > > struct bvec_iter iter; > > - unsigned i = 0; > > + unsigned i = 0, bytes = 0; > > bool bounce = false; > > - int sectors = 0; > > + int sectors; > > bio_for_each_segment(from, *bio_orig, iter) { > > if (i++ < BIO_MAX_VECS) > > - sectors += from.bv_len >> 9; > > + bytes += from.bv_len; > > if (PageHighMem(from.bv_page)) > > bounce = true; > > } > > if (!bounce) > > return; > > + /* > > + * Individual bvecs might not be logical block aligned. Round down > > + * the split size so that each bio is properly block size aligned, > > + * even if we do not use the full hardware limits. > > + */ > > + sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >> > > + SECTOR_SHIFT; > > if (sectors < bio_sectors(*bio_orig)) { > > bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split); > > bio_chain(bio, *bio_orig); > > Do I see correctly that there are two changes in this patch: counting bytes > instead of sectors and also splitting at logical block boundaries instead of > a 512-byte boundary? Should this patch perhaps be split? The code previously would only split on a bvec boundary. All bvecs were logical block sized, so that part is not changing. We just needed to be able to split mid-bvec since the series enables unaligned offsets to match hardware capabilities. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 00/11] direct-io dma alignment [not found] <20220610195830.3574005-1-kbusch@fb.com> [not found] ` <20220610195830.3574005-8-kbusch@fb.com> @ 2022-06-13 21:22 ` Jens Axboe [not found] ` <20220610195830.3574005-12-kbusch@fb.com> ` (2 subsequent siblings) 4 siblings, 0 replies; 30+ messages in thread From: Jens Axboe @ 2022-06-13 21:22 UTC (permalink / raw) To: kbusch, linux-block, linux-nvme, linux-fsdevel Cc: bvanassche, ebiggers, damien.lemoal, Kernel-team, pankydev8, kbusch, Christoph Hellwig On Fri, 10 Jun 2022 12:58:19 -0700, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > The previous version is available here: > > https://lore.kernel.org/linux-block/Yp4qQRI5awiycml1@kbusch-mbp.dhcp.thefacebook.com/T/#m0a93b6392038aad6144e066fb5ada2cbf316f78e > > Changes from the previous are all trivial changes: > > [...] Applied, thanks! [01/11] block: fix infinite loop for invalid zone append commit: 1180b55c93f6b060ad930db151fe6d2b425f9215 [02/11] block/bio: remove duplicate append pages code commit: 7a2b81b95a89e578343b1c944ddd64d1b14ee49a [03/11] block: export dma_alignment attribute commit: 5f507439f051daaa1e3273ff536afda3ad1f1505 [04/11] block: introduce bdev_dma_alignment helper commit: 24b10a6e0bc22619535b0ed982b7735910981661 [05/11] block: add a helper function for dio alignment commit: 8a39418810a65f0bcbe559261ef011fe0e298eeb [06/11] block/merge: count bytes instead of sectors commit: 4ff782f24a4cad4b033d0f4f6e38cd50e0d463b0 [07/11] block/bounce: count bytes instead of sectors commit: 4b5310470e72d77c9b52f8544b98aa8cf77d956f [08/11] iov: introduce iov_iter_aligned commit: ab7c0c3abb2e5fa15655e4b87bb7b937ca7e18c3 [09/11] block: introduce bdev_iter_is_aligned helper commit: 72230944b7a53280c1f351a0d5cafed12732ec21 [10/11] block: relax direct io memory alignment commit: 84f970d415ef4d048e664ac308792eb93d0152fc [11/11] iomap: add support for dma aligned direct-io commit: 40e11e7a6cc74f11b5ca23ceefec7c84af5c4c73 Best regards, -- Jens Axboe ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <20220610195830.3574005-12-kbusch@fb.com>]
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io [not found] ` <20220610195830.3574005-12-kbusch@fb.com> @ 2022-06-23 18:29 ` Eric Farman 2022-06-23 18:51 ` Keith Busch 2022-07-22 7:36 ` Eric Biggers 1 sibling, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-23 18:29 UTC (permalink / raw) To: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > Use the address alignment requirements from the block_device for > direct > io instead of requiring addresses be aligned to the block size. Hi Keith, Our s390 PV guests recently started failing to boot from a -next host, and git blame brought me here. As near as I have been able to tell, we start tripping up on this code from patch 9 [1] that gets invoked with this patch: > for (k = 0; k < i->nr_segs; k++, skip = 0) { > size_t len = i->iov[k].iov_len - skip; > > if (len > size) > len = size; > if (len & len_mask) > return false; The iovec we're failing on has two segments, one with a len of x200 (and base of x...000) and another with a len of xe00 (and a base of x...200), while len_mask is of course xfff. So before I go any further on what we might have broken, do you happen to have any suggestions what might be going on here, or something I should try? Thanks, Eric [1] https://lore.kernel.org/r/20220610195830.3574005-9-kbusch@fb.com/ > > Signed-off-by: Keith Busch <kbusch@kernel.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > --- > fs/iomap/direct-io.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index 370c3241618a..5d098adba443 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct > iomap_iter *iter, > struct inode *inode = iter->inode; > unsigned int blkbits = > blksize_bits(bdev_logical_block_size(iomap->bdev)); > unsigned int fs_block_size = i_blocksize(inode), pad; > - unsigned int align = iov_iter_alignment(dio->submit.iter); > loff_t length = iomap_length(iter); > loff_t pos = iter->pos; > unsigned int bio_opf; > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct > iomap_iter *iter, > size_t copied = 0; > size_t orig_count; > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > + if ((pos | length) & ((1 << blkbits) - 1) || > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > return -EINVAL; > > if (iomap->type == IOMAP_UNWRITTEN) { ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-23 18:29 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Eric Farman @ 2022-06-23 18:51 ` Keith Busch 2022-06-23 19:11 ` Keith Busch 0 siblings, 1 reply; 30+ messages in thread From: Keith Busch @ 2022-06-23 18:51 UTC (permalink / raw) To: Eric Farman Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote: > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote: > > From: Keith Busch <kbusch@kernel.org> > > > > Use the address alignment requirements from the block_device for > > direct > > io instead of requiring addresses be aligned to the block size. > > Hi Keith, > > Our s390 PV guests recently started failing to boot from a -next host, > and git blame brought me here. > > As near as I have been able to tell, we start tripping up on this code > from patch 9 [1] that gets invoked with this patch: > > > for (k = 0; k < i->nr_segs; k++, skip = 0) { > > size_t len = i->iov[k].iov_len - skip; > > > > if (len > size) > > len = size; > > if (len & len_mask) > > return false; > > The iovec we're failing on has two segments, one with a len of x200 > (and base of x...000) and another with a len of xe00 (and a base of > x...200), while len_mask is of course xfff. > > So before I go any further on what we might have broken, do you happen > to have any suggestions what might be going on here, or something I > should try? Thanks for the notice, sorry for the trouble. This check wasn't intended to have any difference from the previous code with respect to the vector lengths. Could you tell me if you're accessing this through the block device direct-io, or through iomap filesystem? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-23 18:51 ` Keith Busch @ 2022-06-23 19:11 ` Keith Busch 2022-06-23 20:32 ` Eric Farman 0 siblings, 1 reply; 30+ messages in thread From: Keith Busch @ 2022-06-23 19:11 UTC (permalink / raw) To: Eric Farman Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote: > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote: > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote: > > > From: Keith Busch <kbusch@kernel.org> > > > > > > Use the address alignment requirements from the block_device for > > > direct > > > io instead of requiring addresses be aligned to the block size. > > > > Hi Keith, > > > > Our s390 PV guests recently started failing to boot from a -next host, > > and git blame brought me here. > > > > As near as I have been able to tell, we start tripping up on this code > > from patch 9 [1] that gets invoked with this patch: > > > > > for (k = 0; k < i->nr_segs; k++, skip = 0) { > > > size_t len = i->iov[k].iov_len - skip; > > > > > > if (len > size) > > > len = size; > > > if (len & len_mask) > > > return false; > > > > The iovec we're failing on has two segments, one with a len of x200 > > (and base of x...000) and another with a len of xe00 (and a base of > > x...200), while len_mask is of course xfff. > > > > So before I go any further on what we might have broken, do you happen > > to have any suggestions what might be going on here, or something I > > should try? > > Thanks for the notice, sorry for the trouble. This check wasn't intended to > have any difference from the previous code with respect to the vector lengths. > > Could you tell me if you're accessing this through the block device direct-io, > or through iomap filesystem? If using iomap, the previous check was this: unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); unsigned int align = iov_iter_alignment(dio->submit.iter); ... if ((pos | length | align) & ((1 << blkbits) - 1)) return -EINVAL; And if raw block device, it was this: if ((pos | iov_iter_alignment(iter)) & (bdev_logical_block_size(bdev) - 1)) return -EINVAL; The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in your example, and checked against 0xfff should have been failing prior to this patch. Unless I'm missing something... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-23 19:11 ` Keith Busch @ 2022-06-23 20:32 ` Eric Farman 2022-06-23 21:34 ` Eric Farman 0 siblings, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-23 20:32 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote: > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote: > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote: > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote: > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > Use the address alignment requirements from the block_device > > > > for > > > > direct > > > > io instead of requiring addresses be aligned to the block size. > > > > > > Hi Keith, > > > > > > Our s390 PV guests recently started failing to boot from a -next > > > host, > > > and git blame brought me here. > > > > > > As near as I have been able to tell, we start tripping up on this > > > code > > > from patch 9 [1] that gets invoked with this patch: > > > > > > > for (k = 0; k < i->nr_segs; k++, skip = 0) { > > > > size_t len = i->iov[k].iov_len - skip; > > > > > > > > if (len > size) > > > > len = size; > > > > if (len & len_mask) > > > > return false; > > > > > > The iovec we're failing on has two segments, one with a len of > > > x200 > > > (and base of x...000) and another with a len of xe00 (and a base > > > of > > > x...200), while len_mask is of course xfff. > > > > > > So before I go any further on what we might have broken, do you > > > happen > > > to have any suggestions what might be going on here, or something > > > I > > > should try? > > > > Thanks for the notice, sorry for the trouble. This check wasn't > > intended to > > have any difference from the previous code with respect to the > > vector lengths. > > > > Could you tell me if you're accessing this through the block device > > direct-io, > > or through iomap filesystem? Reasonably certain the failure's on iomap. I'd reverted the subject patch from next-20220622 and got things in working order. > > If using iomap, the previous check was this: > > unsigned int blkbits = > blksize_bits(bdev_logical_block_size(iomap->bdev)); > unsigned int align = iov_iter_alignment(dio->submit.iter); > ... > if ((pos | length | align) & ((1 << blkbits) - 1)) > return -EINVAL; > > ... > The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in > your > example, and checked against 0xfff should have been failing prior to > this > patch. Unless I'm missing something... Nope, you're not. I didn't look back at what the old check was doing, just saw "0xe00 and 0x200" and thought "oh there's one page" instead of noting the code was or'ing them. My bad. That was the last entry in my trace before the guest gave up, as everything else through this code up to that point seemed okay. I'll pick up the working case and see if I can get a clearer picture between the two. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-23 20:32 ` Eric Farman @ 2022-06-23 21:34 ` Eric Farman 2022-06-27 15:21 ` Eric Farman 0 siblings, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-23 21:34 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote: > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote: > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote: > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote: > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote: > > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > > > Use the address alignment requirements from the block_device > > > > > for > > > > > direct > > > > > io instead of requiring addresses be aligned to the block > > > > > size. > > > > > > > > Hi Keith, > > > > > > > > Our s390 PV guests recently started failing to boot from a > > > > -next > > > > host, > > > > and git blame brought me here. > > > > > > > > As near as I have been able to tell, we start tripping up on > > > > this > > > > code > > > > from patch 9 [1] that gets invoked with this patch: > > > > > > > > > for (k = 0; k < i->nr_segs; k++, skip = 0) { > > > > > size_t len = i->iov[k].iov_len - skip; > > > > > > > > > > if (len > size) > > > > > len = size; > > > > > if (len & len_mask) > > > > > return false; > > > > > > > > The iovec we're failing on has two segments, one with a len of > > > > x200 > > > > (and base of x...000) and another with a len of xe00 (and a > > > > base > > > > of > > > > x...200), while len_mask is of course xfff. > > > > > > > > So before I go any further on what we might have broken, do you > > > > happen > > > > to have any suggestions what might be going on here, or > > > > something > > > > I > > > > should try? > > > > > > Thanks for the notice, sorry for the trouble. This check wasn't > > > intended to > > > have any difference from the previous code with respect to the > > > vector lengths. > > > > > > Could you tell me if you're accessing this through the block > > > device > > > direct-io, > > > or through iomap filesystem? > > Reasonably certain the failure's on iomap. I'd reverted the subject > patch from next-20220622 and got things in working order. > > > If using iomap, the previous check was this: > > > > unsigned int blkbits = > > blksize_bits(bdev_logical_block_size(iomap->bdev)); > > unsigned int align = iov_iter_alignment(dio->submit.iter); > > ... > > if ((pos | length | align) & ((1 << blkbits) - 1)) > > return -EINVAL; > > > > > ... > > The result of "iov_iter_alignment()" would include "0xe00 | 0x200" > > in > > your > > example, and checked against 0xfff should have been failing prior > > to > > this > > patch. Unless I'm missing something... > > Nope, you're not. I didn't look back at what the old check was doing, > just saw "0xe00 and 0x200" and thought "oh there's one page" instead > of > noting the code was or'ing them. My bad. > > That was the last entry in my trace before the guest gave up, as > everything else through this code up to that point seemed okay. I'll > pick up the working case and see if I can get a clearer picture > between > the two. Looking over the trace again, I realize I did dump iov_iter_alignment() as a comparator, and I see one pass through that had a non-zero response but bdev_iter_is_aligned() returned true... count = x1000 iov_offset = x0 nr_segs = 1 iov_len = x1000 (len_mask = xfff) iov_base = x...200 (addr_mask = x1ff) That particular pass through is in the middle of the stuff it tried to do, so I don't know if that's the cause or not but it strikes me as unusual. Will look into that tomorrow and report back. > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-23 21:34 ` Eric Farman @ 2022-06-27 15:21 ` Eric Farman 2022-06-27 15:36 ` Keith Busch 0 siblings, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-27 15:21 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8, Halil Pasic On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote: > On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote: > > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote: > > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote: > > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote: > > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote: > > > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > > > > > Use the address alignment requirements from the > > > > > > block_device > > > > > > for > > > > > > direct > > > > > > io instead of requiring addresses be aligned to the block > > > > > > size. > > > > > > > > > > Hi Keith, > > > > > > > > > > Our s390 PV guests recently started failing to boot from a > > > > > -next > > > > > host, > > > > > and git blame brought me here. > > > > > > > > > > As near as I have been able to tell, we start tripping up on > > > > > this > > > > > code > > > > > from patch 9 [1] that gets invoked with this patch: > > > > > > > > > > > for (k = 0; k < i->nr_segs; k++, skip = 0) { > > > > > > size_t len = i->iov[k].iov_len - skip; > > > > > > > > > > > > if (len > size) > > > > > > len = size; > > > > > > if (len & len_mask) > > > > > > return false; > > > > > > > > > > The iovec we're failing on has two segments, one with a len > > > > > of > > > > > x200 > > > > > (and base of x...000) and another with a len of xe00 (and a > > > > > base > > > > > of > > > > > x...200), while len_mask is of course xfff. > > > > > > > > > > So before I go any further on what we might have broken, do > > > > > you > > > > > happen > > > > > to have any suggestions what might be going on here, or > > > > > something > > > > > I > > > > > should try? > > > > > > > > Thanks for the notice, sorry for the trouble. This check wasn't > > > > intended to > > > > have any difference from the previous code with respect to the > > > > vector lengths. > > > > > > > > Could you tell me if you're accessing this through the block > > > > device > > > > direct-io, > > > > or through iomap filesystem? > > > > Reasonably certain the failure's on iomap. I'd reverted the subject > > patch from next-20220622 and got things in working order. > > > > > If using iomap, the previous check was this: > > > > > > unsigned int blkbits = > > > blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > unsigned int align = iov_iter_alignment(dio->submit.iter); > > > ... > > > if ((pos | length | align) & ((1 << blkbits) - 1)) > > > return -EINVAL; > > > > > > > > ... > > > The result of "iov_iter_alignment()" would include "0xe00 | > > > 0x200" > > > in > > > your > > > example, and checked against 0xfff should have been failing prior > > > to > > > this > > > patch. Unless I'm missing something... > > > > Nope, you're not. I didn't look back at what the old check was > > doing, > > just saw "0xe00 and 0x200" and thought "oh there's one page" > > instead > > of > > noting the code was or'ing them. My bad. > > > > That was the last entry in my trace before the guest gave up, as > > everything else through this code up to that point seemed okay. > > I'll > > pick up the working case and see if I can get a clearer picture > > between > > the two. > > Looking over the trace again, I realize I did dump > iov_iter_alignment() > as a comparator, and I see one pass through that had a non-zero > response but bdev_iter_is_aligned() returned true... > > count = x1000 > iov_offset = x0 > nr_segs = 1 > iov_len = x1000 (len_mask = xfff) > iov_base = x...200 (addr_mask = x1ff) > > That particular pass through is in the middle of the stuff it tried > to > do, so I don't know if that's the cause or not but it strikes me as > unusual. Will look into that tomorrow and report back. > Apologies, it took me an extra day to get back to this, but it is indeed this pass through that's causing our boot failures. I note that the old code (in iomap_dio_bio_iter), did: if ((pos | length | align) & ((1 << blkbits) - 1)) return -EINVAL; With blkbits equal to 12, the resulting mask was 0x0fff against an align value (from iov_iter_alignment) of x200 kicks us out. The new code (in iov_iter_aligned_iovec), meanwhile, compares this: if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask) return false; iov_base (and the output of the old iov_iter_aligned_iovec() routine) is x200, but since addr_mask is x1ff this check provides a different response than it used to. To check this, I changed the comparator to len_mask (almost certainly not the right answer since addr_mask is then unused, but it was good for a quick test), and our PV guests are able to boot again with -next running in the host. Thanks, Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-27 15:21 ` Eric Farman @ 2022-06-27 15:36 ` Keith Busch 2022-06-28 9:00 ` Halil Pasic 0 siblings, 1 reply; 30+ messages in thread From: Keith Busch @ 2022-06-27 15:36 UTC (permalink / raw) To: Eric Farman Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8, Halil Pasic On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote: > > Apologies, it took me an extra day to get back to this, but it is > indeed this pass through that's causing our boot failures. I note that > the old code (in iomap_dio_bio_iter), did: > > if ((pos | length | align) & ((1 << blkbits) - 1)) > return -EINVAL; > > With blkbits equal to 12, the resulting mask was 0x0fff against an > align value (from iov_iter_alignment) of x200 kicks us out. > > The new code (in iov_iter_aligned_iovec), meanwhile, compares this: > > if ((unsigned long)(i->iov[k].iov_base + skip) & > addr_mask) > return false; > > iov_base (and the output of the old iov_iter_aligned_iovec() routine) > is x200, but since addr_mask is x1ff this check provides a different > response than it used to. > > To check this, I changed the comparator to len_mask (almost certainly > not the right answer since addr_mask is then unused, but it was good > for a quick test), and our PV guests are able to boot again with -next > running in the host. This raises more questions for me. It sounds like your process used to get an EINVAL error, and it wants to continue getting an EINVAL error instead of letting the direct-io request proceed. Is that correct? If so, could you provide more details on what issue occurs with dispatching this request? If you really need to restrict address' alignment to the storage's logical block size, I think your storage driver needs to set the dma_alignment queue limit to that value. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-27 15:36 ` Keith Busch @ 2022-06-28 9:00 ` Halil Pasic 2022-06-28 15:20 ` Eric Farman 0 siblings, 1 reply; 30+ messages in thread From: Halil Pasic @ 2022-06-28 9:00 UTC (permalink / raw) To: Keith Busch Cc: Eric Farman, Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8, Halil Pasic On Mon, 27 Jun 2022 09:36:56 -0600 Keith Busch <kbusch@kernel.org> wrote: > On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote: > > > > Apologies, it took me an extra day to get back to this, but it is > > indeed this pass through that's causing our boot failures. I note that > > the old code (in iomap_dio_bio_iter), did: > > > > if ((pos | length | align) & ((1 << blkbits) - 1)) > > return -EINVAL; > > > > With blkbits equal to 12, the resulting mask was 0x0fff against an > > align value (from iov_iter_alignment) of x200 kicks us out. > > > > The new code (in iov_iter_aligned_iovec), meanwhile, compares this: > > > > if ((unsigned long)(i->iov[k].iov_base + skip) & > > addr_mask) > > return false; > > > > iov_base (and the output of the old iov_iter_aligned_iovec() routine) > > is x200, but since addr_mask is x1ff this check provides a different > > response than it used to. > > > > To check this, I changed the comparator to len_mask (almost certainly > > not the right answer since addr_mask is then unused, but it was good > > for a quick test), and our PV guests are able to boot again with -next > > running in the host. > > This raises more questions for me. It sounds like your process used to get an > EINVAL error, and it wants to continue getting an EINVAL error instead of > letting the direct-io request proceed. Is that correct? Is my understanding as well. But I'm not familiar enough with the code to tell where and how that -EINVAL gets handled. BTW let me just point out that the bounce buffering via swiotlb needed for PV is not unlikely to mess up the alignment of things. But I'm not sure if that is relevant here. Regards, Halil > If so, could you > provide more details on what issue occurs with dispatching this request? > > If you really need to restrict address' alignment to the storage's logical > block size, I think your storage driver needs to set the dma_alignment queue > limit to that value. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-28 9:00 ` Halil Pasic @ 2022-06-28 15:20 ` Eric Farman 2022-06-29 3:18 ` Eric Farman 0 siblings, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-28 15:20 UTC (permalink / raw) To: Halil Pasic, Keith Busch Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Tue, 2022-06-28 at 11:00 +0200, Halil Pasic wrote: > On Mon, 27 Jun 2022 09:36:56 -0600 > Keith Busch <kbusch@kernel.org> wrote: > > > On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote: > > > Apologies, it took me an extra day to get back to this, but it is > > > indeed this pass through that's causing our boot failures. I note > > > that > > > the old code (in iomap_dio_bio_iter), did: > > > > > > if ((pos | length | align) & ((1 << blkbits) - 1)) > > > return -EINVAL; > > > > > > With blkbits equal to 12, the resulting mask was 0x0fff against > > > an > > > align value (from iov_iter_alignment) of x200 kicks us out. > > > > > > The new code (in iov_iter_aligned_iovec), meanwhile, compares > > > this: > > > > > > if ((unsigned long)(i->iov[k].iov_base + skip) & > > > addr_mask) > > > return false; > > > > > > iov_base (and the output of the old iov_iter_aligned_iovec() > > > routine) > > > is x200, but since addr_mask is x1ff this check provides a > > > different > > > response than it used to. > > > > > > To check this, I changed the comparator to len_mask (almost > > > certainly > > > not the right answer since addr_mask is then unused, but it was > > > good > > > for a quick test), and our PV guests are able to boot again with > > > -next > > > running in the host. > > > > This raises more questions for me. It sounds like your process used > > to get an > > EINVAL error, and it wants to continue getting an EINVAL error > > instead of > > letting the direct-io request proceed. Is that correct? > > Is my understanding as well. But I'm not familiar enough with the > code to > tell where and how that -EINVAL gets handled. > > BTW let me just point out that the bounce buffering via swiotlb > needed > for PV is not unlikely to mess up the alignment of things. But I'm > not > sure if that is relevant here. > > Regards, > Halil > > > If so, could you > > provide more details on what issue occurs with dispatching this > > request? This error occurs reading the initial boot record for a guest, stating QEMU was unable to read block zero from the device. The code that complains doesn't appear to have anything that says "oh, got EINVAL, try it this other way" but I haven't chased down if/where something in between is expecting that and handling it in some unique way. I -think- I have an easier reproducer now, so maybe I'd be able to get a better answer to this question. > > > > If you really need to restrict address' alignment to the storage's > > logical > > block size, I think your storage driver needs to set the > > dma_alignment queue > > limit to that value. It's possible that there's a problem in the virtio stack here, but the failing configuration is a qcow image on the host rootfs, so it's not using any distinct driver. The bdev request queue that ends up being used is the same allocated out of blk_alloc_queue, so changing dma_alignment there wouldn't work. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-28 15:20 ` Eric Farman @ 2022-06-29 3:18 ` Eric Farman 2022-06-29 3:52 ` Keith Busch 0 siblings, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-29 3:18 UTC (permalink / raw) To: Halil Pasic, Keith Busch Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Tue, 2022-06-28 at 11:20 -0400, Eric Farman wrote: > On Tue, 2022-06-28 at 11:00 +0200, Halil Pasic wrote: > > On Mon, 27 Jun 2022 09:36:56 -0600 > > Keith Busch <kbusch@kernel.org> wrote: > > > > > On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote: > > > > Apologies, it took me an extra day to get back to this, but it > > > > is > > > > indeed this pass through that's causing our boot failures. I > > > > note > > > > that > > > > the old code (in iomap_dio_bio_iter), did: > > > > > > > > if ((pos | length | align) & ((1 << blkbits) - 1)) > > > > return -EINVAL; > > > > > > > > With blkbits equal to 12, the resulting mask was 0x0fff against > > > > an > > > > align value (from iov_iter_alignment) of x200 kicks us out. > > > > > > > > The new code (in iov_iter_aligned_iovec), meanwhile, compares > > > > this: > > > > > > > > if ((unsigned long)(i->iov[k].iov_base + skip) > > > > & > > > > addr_mask) > > > > return false; > > > > > > > > iov_base (and the output of the old iov_iter_aligned_iovec() > > > > routine) > > > > is x200, but since addr_mask is x1ff this check provides a > > > > different > > > > response than it used to. > > > > > > > > To check this, I changed the comparator to len_mask (almost > > > > certainly > > > > not the right answer since addr_mask is then unused, but it was > > > > good > > > > for a quick test), and our PV guests are able to boot again > > > > with > > > > -next > > > > running in the host. > > > > > > This raises more questions for me. It sounds like your process > > > used > > > to get an > > > EINVAL error, and it wants to continue getting an EINVAL error > > > instead of > > > letting the direct-io request proceed. Is that correct? Sort of. In the working case, I see a set of iovecs come through with different counts: base count 0000 0001 0000 0200 0000 0400 0000 0800 0000 1000 0001 1000 0200 1000 << Change occurs here 0400 1000 0800 1000 1000 1000 EINVAL was being returned for any of these iovecs except the page- aligned ones. Once the x200 request returns 0, the remainder of the above list was skipped and the requests continue elsewhere on the file. Still not sure how our request is getting us into this process. We're simply asking to read a single block, but that's somewhere within an image file. > > > > Is my understanding as well. But I'm not familiar enough with the > > code to > > tell where and how that -EINVAL gets handled. > > > > BTW let me just point out that the bounce buffering via swiotlb > > needed > > for PV is not unlikely to mess up the alignment of things. But I'm > > not > > sure if that is relevant here. It's true that PV guests were the first to trip over this, but I've since been able to reproduce this with a normal guest. So long as the image file is connected with cache.direct=true, it's unbootable. That should absolve the swiotlb bits from being at fault here. > > > > Regards, > > Halil > > > > > If so, could you > > > provide more details on what issue occurs with dispatching this > > > request? > > This error occurs reading the initial boot record for a guest, > stating > QEMU was unable to read block zero from the device. The code that > complains doesn't appear to have anything that says "oh, got EINVAL, > try it this other way" but I haven't chased down if/where something > in > between is expecting that and handling it in some unique way. I > -think- > I have an easier reproducer now, so maybe I'd be able to get a > better > answer to this question. > > > > If you really need to restrict address' alignment to the > > > storage's > > > logical > > > block size, I think your storage driver needs to set the > > > dma_alignment queue > > > limit to that value. > > It's possible that there's a problem in the virtio stack here, but > the > failing configuration is a qcow image on the host rootfs (on an ext4 filesystem) > , so it's not > using any distinct driver. The bdev request queue that ends up being > used is the same allocated out of blk_alloc_queue, so changing > dma_alignment there wouldn't work. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-29 3:18 ` Eric Farman @ 2022-06-29 3:52 ` Keith Busch 2022-06-29 18:04 ` Eric Farman 0 siblings, 1 reply; 30+ messages in thread From: Keith Busch @ 2022-06-29 3:52 UTC (permalink / raw) To: Eric Farman Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Tue, Jun 28, 2022 at 11:18:34PM -0400, Eric Farman wrote: > Sort of. In the working case, I see a set of iovecs come through with > different counts: > > base count > 0000 0001 > 0000 0200 > 0000 0400 > 0000 0800 > 0000 1000 > 0001 1000 > 0200 1000 << Change occurs here > 0400 1000 > 0800 1000 > 1000 1000 > > EINVAL was being returned for any of these iovecs except the page- > aligned ones. Once the x200 request returns 0, the remainder of the > above list was skipped and the requests continue elsewhere on the file. > > Still not sure how our request is getting us into this process. We're > simply asking to read a single block, but that's somewhere within an > image file. I thought this was sounding like some kind of corruption. I tested ext4 on various qemu devices with 4k logical block sizes, and it all looks okay there. What block driver are you observing this with? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-29 3:52 ` Keith Busch @ 2022-06-29 18:04 ` Eric Farman 2022-06-29 19:07 ` Keith Busch 0 siblings, 1 reply; 30+ messages in thread From: Eric Farman @ 2022-06-29 18:04 UTC (permalink / raw) To: Keith Busch Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Tue, 2022-06-28 at 21:52 -0600, Keith Busch wrote: > On Tue, Jun 28, 2022 at 11:18:34PM -0400, Eric Farman wrote: > > Sort of. In the working case, I see a set of iovecs come through > > with > > different counts: > > > > base count > > 0000 0001 > > 0000 0200 > > 0000 0400 > > 0000 0800 > > 0000 1000 > > 0001 1000 > > 0200 1000 << Change occurs here > > 0400 1000 > > 0800 1000 > > 1000 1000 > > > > EINVAL was being returned for any of these iovecs except the page- > > aligned ones. Once the x200 request returns 0, the remainder of the > > above list was skipped and the requests continue elsewhere on the > > file. > > > > Still not sure how our request is getting us into this process. > > We're > > simply asking to read a single block, but that's somewhere within > > an > > image file. > > I thought this was sounding like some kind of corruption. I tested > ext4 on > various qemu devices with 4k logical block sizes, and it all looks > okay there. > > What block driver are you observing this with? s390 dasd This made me think to change my rootfs, and of course the problem goes away once on something like a SCSI volume. So crawling through the dasd (instead of virtio) driver and I finally find the point where a change to dma_alignment (which you mentioned earlier) would actually fit. Such a change fixes this for me, so I'll run it by our DASD guys. Thanks for your help and patience. Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-29 18:04 ` Eric Farman @ 2022-06-29 19:07 ` Keith Busch 2022-06-29 19:28 ` Eric Farman 2022-06-30 5:45 ` Christian Borntraeger 0 siblings, 2 replies; 30+ messages in thread From: Keith Busch @ 2022-06-29 19:07 UTC (permalink / raw) To: Eric Farman Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Wed, Jun 29, 2022 at 02:04:47PM -0400, Eric Farman wrote: > s390 dasd > > This made me think to change my rootfs, and of course the problem goes > away once on something like a SCSI volume. > > So crawling through the dasd (instead of virtio) driver and I finally > find the point where a change to dma_alignment (which you mentioned > earlier) would actually fit. > > Such a change fixes this for me, so I'll run it by our DASD guys. > Thanks for your help and patience. I'm assuming there's some driver or device requirement that's making this necessary. Is the below driver change what you're looking for? If so, I think you might want this regardless of this direct-io patch just because other interfaces like blk_rq_map_user_iov() and blk_rq_aligned() align to it. --- diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c index 60be7f7bf2d1..5c79fb02cded 100644 --- a/drivers/s390/block/dasd_fba.c +++ b/drivers/s390/block/dasd_fba.c @@ -780,6 +780,7 @@ static void dasd_fba_setup_blk_queue(struct dasd_block *block) /* With page sized segments each segment can be translated into one idaw/tidaw */ blk_queue_max_segment_size(q, PAGE_SIZE); blk_queue_segment_boundary(q, PAGE_SIZE - 1); + blk_queue_dma_alignment(q, PAGE_SIZE - 1); q->limits.discard_granularity = logical_block_size; -- ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-29 19:07 ` Keith Busch @ 2022-06-29 19:28 ` Eric Farman 2022-06-30 5:45 ` Christian Borntraeger 1 sibling, 0 replies; 30+ messages in thread From: Eric Farman @ 2022-06-29 19:28 UTC (permalink / raw) To: Keith Busch Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme, Christian Borntraeger, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8 On Wed, 2022-06-29 at 13:07 -0600, Keith Busch wrote: > On Wed, Jun 29, 2022 at 02:04:47PM -0400, Eric Farman wrote: > > s390 dasd > > > > This made me think to change my rootfs, and of course the problem > > goes > > away once on something like a SCSI volume. > > > > So crawling through the dasd (instead of virtio) driver and I > > finally > > find the point where a change to dma_alignment (which you mentioned > > earlier) would actually fit. > > > > Such a change fixes this for me, so I'll run it by our DASD guys. > > Thanks for your help and patience. > > I'm assuming there's some driver or device requirement that's making > this > necessary. Is the below driver change what you're looking for? Yup, that's exactly what I have (in dasd_eckd.c) and indeed gets things working again. Need to scrounge up some FBA volumes to test that configuration and the change there. > If so, I think > you might want this regardless of this direct-io patch just because > other > interfaces like blk_rq_map_user_iov() and blk_rq_aligned() align to > it. Good point. > > --- > diff --git a/drivers/s390/block/dasd_fba.c > b/drivers/s390/block/dasd_fba.c > index 60be7f7bf2d1..5c79fb02cded 100644 > --- a/drivers/s390/block/dasd_fba.c > +++ b/drivers/s390/block/dasd_fba.c > @@ -780,6 +780,7 @@ static void dasd_fba_setup_blk_queue(struct > dasd_block *block) > /* With page sized segments each segment can be translated into > one idaw/tidaw */ > blk_queue_max_segment_size(q, PAGE_SIZE); > blk_queue_segment_boundary(q, PAGE_SIZE - 1); > + blk_queue_dma_alignment(q, PAGE_SIZE - 1); > > q->limits.discard_granularity = logical_block_size; > > -- ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-06-29 19:07 ` Keith Busch 2022-06-29 19:28 ` Eric Farman @ 2022-06-30 5:45 ` Christian Borntraeger 1 sibling, 0 replies; 30+ messages in thread From: Christian Borntraeger @ 2022-06-30 5:45 UTC (permalink / raw) To: Keith Busch, Eric Farman Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers, pankydev8, Stefan Haberland, Jan Höppner CCing Stefan, Jan. Below is necessary (in dasd-eckd.c) to make kvm boot working again (and this seems to be the right thing anyway). Am 29.06.22 um 21:07 schrieb Keith Busch: > On Wed, Jun 29, 2022 at 02:04:47PM -0400, Eric Farman wrote: >> s390 dasd >> >> This made me think to change my rootfs, and of course the problem goes >> away once on something like a SCSI volume. >> >> So crawling through the dasd (instead of virtio) driver and I finally >> find the point where a change to dma_alignment (which you mentioned >> earlier) would actually fit. >> >> Such a change fixes this for me, so I'll run it by our DASD guys. >> Thanks for your help and patience. > > I'm assuming there's some driver or device requirement that's making this > necessary. Is the below driver change what you're looking for? If so, I think > you might want this regardless of this direct-io patch just because other > interfaces like blk_rq_map_user_iov() and blk_rq_aligned() align to it. > > --- > diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c > index 60be7f7bf2d1..5c79fb02cded 100644 > --- a/drivers/s390/block/dasd_fba.c > +++ b/drivers/s390/block/dasd_fba.c > @@ -780,6 +780,7 @@ static void dasd_fba_setup_blk_queue(struct dasd_block *block) > /* With page sized segments each segment can be translated into one idaw/tidaw */ > blk_queue_max_segment_size(q, PAGE_SIZE); > blk_queue_segment_boundary(q, PAGE_SIZE - 1); > + blk_queue_dma_alignment(q, PAGE_SIZE - 1); > > q->limits.discard_granularity = logical_block_size; > > -- ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io [not found] ` <20220610195830.3574005-12-kbusch@fb.com> 2022-06-23 18:29 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Eric Farman @ 2022-07-22 7:36 ` Eric Biggers 2022-07-22 14:43 ` Keith Busch 2022-07-22 17:53 ` Darrick J. Wong 1 sibling, 2 replies; 30+ messages in thread From: Eric Biggers @ 2022-07-22 7:36 UTC (permalink / raw) To: Keith Busch, Jaegeuk Kim, Chao Yu Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, Keith Busch, linux-f2fs-devel [+f2fs list and maintainers] On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > Use the address alignment requirements from the block_device for direct > io instead of requiring addresses be aligned to the block size. > > Signed-off-by: Keith Busch <kbusch@kernel.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > --- > fs/iomap/direct-io.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index 370c3241618a..5d098adba443 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > struct inode *inode = iter->inode; > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > unsigned int fs_block_size = i_blocksize(inode), pad; > - unsigned int align = iov_iter_alignment(dio->submit.iter); > loff_t length = iomap_length(iter); > loff_t pos = iter->pos; > unsigned int bio_opf; > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > size_t copied = 0; > size_t orig_count; > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > + if ((pos | length) & ((1 << blkbits) - 1) || > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > return -EINVAL; > > if (iomap->type == IOMAP_UNWRITTEN) { I noticed that this patch is going to break the following logic in f2fs_should_use_dio() in fs/f2fs/file.c: /* * Direct I/O not aligned to the disk's logical_block_size will be * attempted, but will fail with -EINVAL. * * f2fs additionally requires that direct I/O be aligned to the * filesystem block size, which is often a stricter requirement. * However, f2fs traditionally falls back to buffered I/O on requests * that are logical_block_size-aligned but not fs-block aligned. * * The below logic implements this behavior. */ align = iocb->ki_pos | iov_iter_alignment(iter); if (!IS_ALIGNED(align, i_blocksize(inode)) && IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) return false; return true; So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical block aligned. This patch changes that. The result is that DIO will sometimes proceed in cases where the I/O doesn't have the fs block alignment required by f2fs for all DIO. Does anyone have any thoughts about what f2fs should be doing here? I think it's weird that f2fs has different behaviors for different degrees of misalignment: fail with EINVAL if not logical block aligned, else fallback to buffered I/O if not fs block aligned. I think it should be one convention or the other. Any opinions about which one it should be? (Note: if you blame the above code, it was written by me. But I was just preserving the existing behavior; I don't know the original motivation.) - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 7:36 ` Eric Biggers @ 2022-07-22 14:43 ` Keith Busch 2022-07-22 18:01 ` Eric Biggers 2022-07-22 17:53 ` Darrick J. Wong 1 sibling, 1 reply; 30+ messages in thread From: Keith Busch @ 2022-07-22 14:43 UTC (permalink / raw) To: Eric Biggers Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, linux-f2fs-devel On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > [+f2fs list and maintainers] > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > From: Keith Busch <kbusch@kernel.org> > > > > Use the address alignment requirements from the block_device for direct > > io instead of requiring addresses be aligned to the block size. > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > --- > > fs/iomap/direct-io.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > index 370c3241618a..5d098adba443 100644 > > --- a/fs/iomap/direct-io.c > > +++ b/fs/iomap/direct-io.c > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > struct inode *inode = iter->inode; > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > unsigned int fs_block_size = i_blocksize(inode), pad; > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > loff_t length = iomap_length(iter); > > loff_t pos = iter->pos; > > unsigned int bio_opf; > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > size_t copied = 0; > > size_t orig_count; > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > + if ((pos | length) & ((1 << blkbits) - 1) || > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > return -EINVAL; > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > I noticed that this patch is going to break the following logic in > f2fs_should_use_dio() in fs/f2fs/file.c: > > /* > * Direct I/O not aligned to the disk's logical_block_size will be > * attempted, but will fail with -EINVAL. > * > * f2fs additionally requires that direct I/O be aligned to the > * filesystem block size, which is often a stricter requirement. > * However, f2fs traditionally falls back to buffered I/O on requests > * that are logical_block_size-aligned but not fs-block aligned. > * > * The below logic implements this behavior. > */ > align = iocb->ki_pos | iov_iter_alignment(iter); > if (!IS_ALIGNED(align, i_blocksize(inode)) && > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > return false; > > return true; > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > block aligned. This patch changes that. The result is that DIO will sometimes > proceed in cases where the I/O doesn't have the fs block alignment required by > f2fs for all DIO. > > Does anyone have any thoughts about what f2fs should be doing here? I think > it's weird that f2fs has different behaviors for different degrees of > misalignment: fail with EINVAL if not logical block aligned, else fallback to > buffered I/O if not fs block aligned. I think it should be one convention or > the other. Any opinions about which one it should be? It looks like f2fs just falls back to buffered IO for this condition without reaching the new code in iomap_dio_bio_iter(). btrfs does the same thing (check_direct_IO()). Is that a problem? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 14:43 ` Keith Busch @ 2022-07-22 18:01 ` Eric Biggers 2022-07-22 20:26 ` Keith Busch 2022-07-24 2:13 ` Jaegeuk Kim 0 siblings, 2 replies; 30+ messages in thread From: Eric Biggers @ 2022-07-22 18:01 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, linux-f2fs-devel On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote: > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > [+f2fs list and maintainers] > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > From: Keith Busch <kbusch@kernel.org> > > > > > > Use the address alignment requirements from the block_device for direct > > > io instead of requiring addresses be aligned to the block size. > > > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > --- > > > fs/iomap/direct-io.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > index 370c3241618a..5d098adba443 100644 > > > --- a/fs/iomap/direct-io.c > > > +++ b/fs/iomap/direct-io.c > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > struct inode *inode = iter->inode; > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > loff_t length = iomap_length(iter); > > > loff_t pos = iter->pos; > > > unsigned int bio_opf; > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > size_t copied = 0; > > > size_t orig_count; > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > > return -EINVAL; > > > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > > > I noticed that this patch is going to break the following logic in > > f2fs_should_use_dio() in fs/f2fs/file.c: > > > > /* > > * Direct I/O not aligned to the disk's logical_block_size will be > > * attempted, but will fail with -EINVAL. > > * > > * f2fs additionally requires that direct I/O be aligned to the > > * filesystem block size, which is often a stricter requirement. > > * However, f2fs traditionally falls back to buffered I/O on requests > > * that are logical_block_size-aligned but not fs-block aligned. > > * > > * The below logic implements this behavior. > > */ > > align = iocb->ki_pos | iov_iter_alignment(iter); > > if (!IS_ALIGNED(align, i_blocksize(inode)) && > > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > > return false; > > > > return true; > > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > > block aligned. This patch changes that. The result is that DIO will sometimes > > proceed in cases where the I/O doesn't have the fs block alignment required by > > f2fs for all DIO. > > > > Does anyone have any thoughts about what f2fs should be doing here? I think > > it's weird that f2fs has different behaviors for different degrees of > > misalignment: fail with EINVAL if not logical block aligned, else fallback to > > buffered I/O if not fs block aligned. I think it should be one convention or > > the other. Any opinions about which one it should be? > > It looks like f2fs just falls back to buffered IO for this condition without > reaching the new code in iomap_dio_bio_iter(). No. It's a bit subtle, so read the code and what I'm saying carefully. f2fs only supports 4K aligned DIO and normally falls back to buffered I/O; however, for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL instead. And it relies on __iomap_dio_rw() returning that EINVAL. Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's part (and I messed that up when switching f2fs from fs/direct-io.c to iomap). The obvious fix is to just have f2fs do the LBS alignment check itself. But I think that f2fs shouldn't have different behavior for different levels of misalignment in the first place, so I was wondering if anyone had any thoughts on which behavior (EINVAL or fallback to buffered I/O) should be standardized on in all cases, at least for f2fs. There was some discussion about this sort of thing for ext4 several years ago on the thread https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u, but it didn't really reach a conclusion. I'm wondering if the f2fs maintainers have any thoughts about why the f2fs behavior is as it is. I.e. is it just accidental, or are there specific reasons... - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 18:01 ` Eric Biggers @ 2022-07-22 20:26 ` Keith Busch 2022-07-25 18:19 ` Eric Biggers 2022-07-24 2:13 ` Jaegeuk Kim 1 sibling, 1 reply; 30+ messages in thread From: Keith Busch @ 2022-07-22 20:26 UTC (permalink / raw) To: Eric Biggers Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, linux-f2fs-devel On Fri, Jul 22, 2022 at 06:01:35PM +0000, Eric Biggers wrote: > On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote: > > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > > [+f2fs list and maintainers] > > > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > Use the address alignment requirements from the block_device for direct > > > > io instead of requiring addresses be aligned to the block size. > > > > > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > > --- > > > > fs/iomap/direct-io.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > > index 370c3241618a..5d098adba443 100644 > > > > --- a/fs/iomap/direct-io.c > > > > +++ b/fs/iomap/direct-io.c > > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > struct inode *inode = iter->inode; > > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > > loff_t length = iomap_length(iter); > > > > loff_t pos = iter->pos; > > > > unsigned int bio_opf; > > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > size_t copied = 0; > > > > size_t orig_count; > > > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > > > return -EINVAL; > > > > > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > > > > > I noticed that this patch is going to break the following logic in > > > f2fs_should_use_dio() in fs/f2fs/file.c: > > > > > > /* > > > * Direct I/O not aligned to the disk's logical_block_size will be > > > * attempted, but will fail with -EINVAL. > > > * > > > * f2fs additionally requires that direct I/O be aligned to the > > > * filesystem block size, which is often a stricter requirement. > > > * However, f2fs traditionally falls back to buffered I/O on requests > > > * that are logical_block_size-aligned but not fs-block aligned. > > > * > > > * The below logic implements this behavior. > > > */ > > > align = iocb->ki_pos | iov_iter_alignment(iter); > > > if (!IS_ALIGNED(align, i_blocksize(inode)) && > > > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > > > return false; > > > > > > return true; > > > > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > > > block aligned. This patch changes that. The result is that DIO will sometimes > > > proceed in cases where the I/O doesn't have the fs block alignment required by > > > f2fs for all DIO. > > > > > > Does anyone have any thoughts about what f2fs should be doing here? I think > > > it's weird that f2fs has different behaviors for different degrees of > > > misalignment: fail with EINVAL if not logical block aligned, else fallback to > > > buffered I/O if not fs block aligned. I think it should be one convention or > > > the other. Any opinions about which one it should be? > > > > It looks like f2fs just falls back to buffered IO for this condition without > > reaching the new code in iomap_dio_bio_iter(). > > No. It's a bit subtle, so read the code and what I'm saying carefully. f2fs > only supports 4K aligned DIO and normally falls back to buffered I/O; however, > for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL > instead. And it relies on __iomap_dio_rw() returning that EINVAL. Okay, I understand the code flow now. I tested f2fs direct io with every possible alignment, and it is successful for all hardware dma supported alignments and continues to return EINVAL for unaligned. Is the concern that it's now returning success in scenarios that used to fail? Or is there some other 4k f2fs constraint that I haven't found yet? > Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's > part (and I messed that up when switching f2fs from fs/direct-io.c to iomap). > The obvious fix is to just have f2fs do the LBS alignment check itself. > > But I think that f2fs shouldn't have different behavior for different levels of > misalignment in the first place, so I was wondering if anyone had any thoughts > on which behavior (EINVAL or fallback to buffered I/O) should be standardized on > in all cases, at least for f2fs. There was some discussion about this sort of > thing for ext4 several years ago on the thread > https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u, > but it didn't really reach a conclusion. I'm wondering if the f2fs maintainers > have any thoughts about why the f2fs behavior is as it is. I.e. is it just > accidental, or are there specific reasons... > > - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 20:26 ` Keith Busch @ 2022-07-25 18:19 ` Eric Biggers 0 siblings, 0 replies; 30+ messages in thread From: Eric Biggers @ 2022-07-25 18:19 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, linux-f2fs-devel On Fri, Jul 22, 2022 at 02:26:05PM -0600, Keith Busch wrote: > On Fri, Jul 22, 2022 at 06:01:35PM +0000, Eric Biggers wrote: > > On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote: > > > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > > > [+f2fs list and maintainers] > > > > > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > > > Use the address alignment requirements from the block_device for direct > > > > > io instead of requiring addresses be aligned to the block size. > > > > > > > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > > > --- > > > > > fs/iomap/direct-io.c | 4 ++-- > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > > > index 370c3241618a..5d098adba443 100644 > > > > > --- a/fs/iomap/direct-io.c > > > > > +++ b/fs/iomap/direct-io.c > > > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > > struct inode *inode = iter->inode; > > > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > > > loff_t length = iomap_length(iter); > > > > > loff_t pos = iter->pos; > > > > > unsigned int bio_opf; > > > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > > size_t copied = 0; > > > > > size_t orig_count; > > > > > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > > > > return -EINVAL; > > > > > > > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > > > > > > > I noticed that this patch is going to break the following logic in > > > > f2fs_should_use_dio() in fs/f2fs/file.c: > > > > > > > > /* > > > > * Direct I/O not aligned to the disk's logical_block_size will be > > > > * attempted, but will fail with -EINVAL. > > > > * > > > > * f2fs additionally requires that direct I/O be aligned to the > > > > * filesystem block size, which is often a stricter requirement. > > > > * However, f2fs traditionally falls back to buffered I/O on requests > > > > * that are logical_block_size-aligned but not fs-block aligned. > > > > * > > > > * The below logic implements this behavior. > > > > */ > > > > align = iocb->ki_pos | iov_iter_alignment(iter); > > > > if (!IS_ALIGNED(align, i_blocksize(inode)) && > > > > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > > > > return false; > > > > > > > > return true; > > > > > > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > > > > block aligned. This patch changes that. The result is that DIO will sometimes > > > > proceed in cases where the I/O doesn't have the fs block alignment required by > > > > f2fs for all DIO. > > > > > > > > Does anyone have any thoughts about what f2fs should be doing here? I think > > > > it's weird that f2fs has different behaviors for different degrees of > > > > misalignment: fail with EINVAL if not logical block aligned, else fallback to > > > > buffered I/O if not fs block aligned. I think it should be one convention or > > > > the other. Any opinions about which one it should be? > > > > > > It looks like f2fs just falls back to buffered IO for this condition without > > > reaching the new code in iomap_dio_bio_iter(). > > > > No. It's a bit subtle, so read the code and what I'm saying carefully. f2fs > > only supports 4K aligned DIO and normally falls back to buffered I/O; however, > > for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL > > instead. And it relies on __iomap_dio_rw() returning that EINVAL. > > Okay, I understand the code flow now. > > I tested f2fs direct io with every possible alignment, and it is successful for > all hardware dma supported alignments and continues to return EINVAL for > unaligned. Is the concern that it's now returning success in scenarios that > used to fail? Or is there some other 4k f2fs constraint that I haven't found > yet? f2fs has a lot of different options, and edge cases that only occur occasionally such as data blocks being moved by garbage collection. So just because misaligned DIO appears to work doesn't necessarily mean that it actually works. Maybe the f2fs developers can elaborate on the reason that f2fs always requires 4K alignment for DIO. - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 18:01 ` Eric Biggers 2022-07-22 20:26 ` Keith Busch @ 2022-07-24 2:13 ` Jaegeuk Kim 1 sibling, 0 replies; 30+ messages in thread From: Jaegeuk Kim @ 2022-07-24 2:13 UTC (permalink / raw) To: Eric Biggers Cc: Keith Busch, Keith Busch, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, linux-f2fs-devel On 07/22, Eric Biggers wrote: > On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote: > > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > > [+f2fs list and maintainers] > > > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > Use the address alignment requirements from the block_device for direct > > > > io instead of requiring addresses be aligned to the block size. > > > > > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > > --- > > > > fs/iomap/direct-io.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > > index 370c3241618a..5d098adba443 100644 > > > > --- a/fs/iomap/direct-io.c > > > > +++ b/fs/iomap/direct-io.c > > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > struct inode *inode = iter->inode; > > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > > loff_t length = iomap_length(iter); > > > > loff_t pos = iter->pos; > > > > unsigned int bio_opf; > > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > size_t copied = 0; > > > > size_t orig_count; > > > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > > > return -EINVAL; > > > > > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > > > > > I noticed that this patch is going to break the following logic in > > > f2fs_should_use_dio() in fs/f2fs/file.c: > > > > > > /* > > > * Direct I/O not aligned to the disk's logical_block_size will be > > > * attempted, but will fail with -EINVAL. > > > * > > > * f2fs additionally requires that direct I/O be aligned to the > > > * filesystem block size, which is often a stricter requirement. > > > * However, f2fs traditionally falls back to buffered I/O on requests > > > * that are logical_block_size-aligned but not fs-block aligned. > > > * > > > * The below logic implements this behavior. > > > */ > > > align = iocb->ki_pos | iov_iter_alignment(iter); > > > if (!IS_ALIGNED(align, i_blocksize(inode)) && > > > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > > > return false; > > > > > > return true; > > > > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > > > block aligned. This patch changes that. The result is that DIO will sometimes > > > proceed in cases where the I/O doesn't have the fs block alignment required by > > > f2fs for all DIO. > > > > > > Does anyone have any thoughts about what f2fs should be doing here? I think > > > it's weird that f2fs has different behaviors for different degrees of > > > misalignment: fail with EINVAL if not logical block aligned, else fallback to > > > buffered I/O if not fs block aligned. I think it should be one convention or > > > the other. Any opinions about which one it should be? > > > > It looks like f2fs just falls back to buffered IO for this condition without > > reaching the new code in iomap_dio_bio_iter(). > > No. It's a bit subtle, so read the code and what I'm saying carefully. f2fs > only supports 4K aligned DIO and normally falls back to buffered I/O; however, > for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL > instead. And it relies on __iomap_dio_rw() returning that EINVAL. > > Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's > part (and I messed that up when switching f2fs from fs/direct-io.c to iomap). > The obvious fix is to just have f2fs do the LBS alignment check itself. > > But I think that f2fs shouldn't have different behavior for different levels of > misalignment in the first place, so I was wondering if anyone had any thoughts > on which behavior (EINVAL or fallback to buffered I/O) should be standardized on > in all cases, at least for f2fs. There was some discussion about this sort of > thing for ext4 several years ago on the thread > https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u, > but it didn't really reach a conclusion. I'm wondering if the f2fs maintainers > have any thoughts about why the f2fs behavior is as it is. I.e. is it just > accidental, or are there specific reasons... If there's a generic way to deal with this, I have no objection to follow it. Initially, I remember I was trying to match the ext4 rule, but at some point, I lost the track. > > - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 7:36 ` Eric Biggers 2022-07-22 14:43 ` Keith Busch @ 2022-07-22 17:53 ` Darrick J. Wong 2022-07-22 18:12 ` Eric Biggers 1 sibling, 1 reply; 30+ messages in thread From: Darrick J. Wong @ 2022-07-22 17:53 UTC (permalink / raw) To: Eric Biggers Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, Keith Busch, linux-f2fs-devel On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > [+f2fs list and maintainers] > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > From: Keith Busch <kbusch@kernel.org> > > > > Use the address alignment requirements from the block_device for direct > > io instead of requiring addresses be aligned to the block size. > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > --- > > fs/iomap/direct-io.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > index 370c3241618a..5d098adba443 100644 > > --- a/fs/iomap/direct-io.c > > +++ b/fs/iomap/direct-io.c > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > struct inode *inode = iter->inode; > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > unsigned int fs_block_size = i_blocksize(inode), pad; > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > loff_t length = iomap_length(iter); > > loff_t pos = iter->pos; > > unsigned int bio_opf; > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > size_t copied = 0; > > size_t orig_count; > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > + if ((pos | length) & ((1 << blkbits) - 1) || > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) How does this change intersect with "make statx() return DIO alignment information" ? Will the new STATX_DIOALIGN implementations have to be adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)? I'm guessing the answer is yes, but I haven't seen any patches on the list to do that, but more and more these days email behaves like a flood of UDP traffic... :( --D > > return -EINVAL; > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > I noticed that this patch is going to break the following logic in > f2fs_should_use_dio() in fs/f2fs/file.c: > > /* > * Direct I/O not aligned to the disk's logical_block_size will be > * attempted, but will fail with -EINVAL. > * > * f2fs additionally requires that direct I/O be aligned to the > * filesystem block size, which is often a stricter requirement. > * However, f2fs traditionally falls back to buffered I/O on requests > * that are logical_block_size-aligned but not fs-block aligned. > * > * The below logic implements this behavior. > */ > align = iocb->ki_pos | iov_iter_alignment(iter); > if (!IS_ALIGNED(align, i_blocksize(inode)) && > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > return false; > > return true; > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > block aligned. This patch changes that. The result is that DIO will sometimes > proceed in cases where the I/O doesn't have the fs block alignment required by > f2fs for all DIO. > > Does anyone have any thoughts about what f2fs should be doing here? I think > it's weird that f2fs has different behaviors for different degrees of > misalignment: fail with EINVAL if not logical block aligned, else fallback to > buffered I/O if not fs block aligned. I think it should be one convention or > the other. Any opinions about which one it should be? > > (Note: if you blame the above code, it was written by me. But I was just > preserving the existing behavior; I don't know the original motivation.) > > - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 17:53 ` Darrick J. Wong @ 2022-07-22 18:12 ` Eric Biggers 2022-07-23 5:03 ` Darrick J. Wong 0 siblings, 1 reply; 30+ messages in thread From: Eric Biggers @ 2022-07-22 18:12 UTC (permalink / raw) To: Darrick J. Wong Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, Keith Busch, linux-f2fs-devel On Fri, Jul 22, 2022 at 10:53:42AM -0700, Darrick J. Wong wrote: > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > [+f2fs list and maintainers] > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > From: Keith Busch <kbusch@kernel.org> > > > > > > Use the address alignment requirements from the block_device for direct > > > io instead of requiring addresses be aligned to the block size. > > > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > --- > > > fs/iomap/direct-io.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > index 370c3241618a..5d098adba443 100644 > > > --- a/fs/iomap/direct-io.c > > > +++ b/fs/iomap/direct-io.c > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > struct inode *inode = iter->inode; > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > loff_t length = iomap_length(iter); > > > loff_t pos = iter->pos; > > > unsigned int bio_opf; > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > size_t copied = 0; > > > size_t orig_count; > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > How does this change intersect with "make statx() return DIO alignment > information" ? Will the new STATX_DIOALIGN implementations have to be > adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)? > > I'm guessing the answer is yes, but I haven't seen any patches on the > list to do that, but more and more these days email behaves like a flood > of UDP traffic... :( > Yes. I haven't done that in the STATX_DIOALIGN patchset yet because I've been basing it on upstream, which doesn't yet have this iomap patch. I haven't been expecting STATX_DIOALIGN to make 5.20, given that it's a new UAPI that needs time to be properly reviewed, plus I've just been busy with other things. So I've been planning to make the above change after this patch lands upstream. - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io 2022-07-22 18:12 ` Eric Biggers @ 2022-07-23 5:03 ` Darrick J. Wong 0 siblings, 0 replies; 30+ messages in thread From: Darrick J. Wong @ 2022-07-23 5:03 UTC (permalink / raw) To: Eric Biggers Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal, pankydev8, Keith Busch, linux-f2fs-devel On Fri, Jul 22, 2022 at 06:12:40PM +0000, Eric Biggers wrote: > On Fri, Jul 22, 2022 at 10:53:42AM -0700, Darrick J. Wong wrote: > > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > > [+f2fs list and maintainers] > > > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > > From: Keith Busch <kbusch@kernel.org> > > > > > > > > Use the address alignment requirements from the block_device for direct > > > > io instead of requiring addresses be aligned to the block size. > > > > > > > > Signed-off-by: Keith Busch <kbusch@kernel.org> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > > --- > > > > fs/iomap/direct-io.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > > index 370c3241618a..5d098adba443 100644 > > > > --- a/fs/iomap/direct-io.c > > > > +++ b/fs/iomap/direct-io.c > > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > struct inode *inode = iter->inode; > > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > > loff_t length = iomap_length(iter); > > > > loff_t pos = iter->pos; > > > > unsigned int bio_opf; > > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > > size_t copied = 0; > > > > size_t orig_count; > > > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > > > How does this change intersect with "make statx() return DIO alignment > > information" ? Will the new STATX_DIOALIGN implementations have to be > > adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)? > > > > I'm guessing the answer is yes, but I haven't seen any patches on the > > list to do that, but more and more these days email behaves like a flood > > of UDP traffic... :( > > > > Yes. I haven't done that in the STATX_DIOALIGN patchset yet because I've been > basing it on upstream, which doesn't yet have this iomap patch. I haven't been > expecting STATX_DIOALIGN to make 5.20, given that it's a new UAPI that needs > time to be properly reviewed, plus I've just been busy with other things. So > I've been planning to make the above change after this patch lands upstream. <nod> Ok, I'm looking forward to it. Thank you for your work on statx! :) --D > - Eric ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <20220610195830.3574005-6-kbusch@fb.com>]
* Re: [PATCHv6 05/11] block: add a helper function for dio alignment [not found] ` <20220610195830.3574005-6-kbusch@fb.com> @ 2022-07-22 21:53 ` Bart Van Assche 0 siblings, 0 replies; 30+ messages in thread From: Bart Van Assche @ 2022-07-22 21:53 UTC (permalink / raw) To: Keith Busch, linux-fsdevel, linux-block, linux-nvme Cc: axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8, Keith Busch, Johannes Thumshirn On 6/10/22 12:58, Keith Busch wrote: > +static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos, > + struct iov_iter *iter) > +{ > + return ((pos | iov_iter_alignment(iter)) & > + (bdev_logical_block_size(bdev) - 1)); > +} A nit: if you have to repost this patch series, please leave out the outer parentheses from the return statement. Otherwise this patch looks good to me. Thanks, Bart. ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <20220610195830.3574005-7-kbusch@fb.com>]
* Re: [PATCHv6 06/11] block/merge: count bytes instead of sectors [not found] ` <20220610195830.3574005-7-kbusch@fb.com> @ 2022-07-22 21:57 ` Bart Van Assche 0 siblings, 0 replies; 30+ messages in thread From: Bart Van Assche @ 2022-07-22 21:57 UTC (permalink / raw) To: Keith Busch, linux-fsdevel, linux-block, linux-nvme Cc: axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8, Keith Busch, Johannes Thumshirn On 6/10/22 12:58, Keith Busch wrote: > @@ -269,8 +269,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, > { > struct bio_vec bv, bvprv, *bvprvp = NULL; > struct bvec_iter iter; > - unsigned nsegs = 0, sectors = 0; > - const unsigned max_sectors = get_max_io_size(q, bio); > + unsigned nsegs = 0, bytes = 0; > + const unsigned max_bytes = get_max_io_size(q, bio) << 9; How about using SECTOR_SHIFT instead of 9? Thanks, Bart. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2022-07-25 18:19 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20220610195830.3574005-1-kbusch@fb.com>
     [not found] ` <20220610195830.3574005-8-kbusch@fb.com>
2022-06-13 14:22   ` [PATCHv6 07/11] block/bounce: count bytes instead of sectors Christoph Hellwig
2022-07-22 22:01   ` Bart Van Assche
2022-07-25 14:46     ` Keith Busch
2022-06-13 21:22 ` [PATCHv6 00/11] direct-io dma alignment Jens Axboe
     [not found] ` <20220610195830.3574005-12-kbusch@fb.com>
2022-06-23 18:29   ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Eric Farman
2022-06-23 18:51     ` Keith Busch
2022-06-23 19:11       ` Keith Busch
2022-06-23 20:32         ` Eric Farman
2022-06-23 21:34           ` Eric Farman
2022-06-27 15:21             ` Eric Farman
2022-06-27 15:36               ` Keith Busch
2022-06-28  9:00                 ` Halil Pasic
2022-06-28 15:20                   ` Eric Farman
2022-06-29  3:18                     ` Eric Farman
2022-06-29  3:52                       ` Keith Busch
2022-06-29 18:04                         ` Eric Farman
2022-06-29 19:07                           ` Keith Busch
2022-06-29 19:28                             ` Eric Farman
2022-06-30  5:45                             ` Christian Borntraeger
2022-07-22  7:36   ` Eric Biggers
2022-07-22 14:43     ` Keith Busch
2022-07-22 18:01       ` Eric Biggers
2022-07-22 20:26         ` Keith Busch
2022-07-25 18:19           ` Eric Biggers
2022-07-24  2:13         ` Jaegeuk Kim
2022-07-22 17:53     ` Darrick J. Wong
2022-07-22 18:12       ` Eric Biggers
2022-07-23  5:03         ` Darrick J. Wong
     [not found] ` <20220610195830.3574005-6-kbusch@fb.com>
2022-07-22 21:53   ` [PATCHv6 05/11] block: add a helper function for dio alignment Bart Van Assche
     [not found] ` <20220610195830.3574005-7-kbusch@fb.com>
2022-07-22 21:57   ` [PATCHv6 06/11] block/merge: count bytes instead of sectors Bart Van Assche
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).