linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Hongbo Li <lihongbo22@huawei.com>,
	linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, axboe@kernel.dk, hch@lst.de
Subject: Re: bvec_iter.bi_sector -> loff_t? (was: Re: [PATCH] bcachefs: allow direct io fallback to buffer io for) unaligned length or offset
Date: Thu, 20 Jun 2024 14:54:09 +0100
Message-ID: <ZnQ0gdpcplp_-aw7@casper.infradead.org>
In-Reply-To: <bbf7lnl2d5sxdzqbv3jcn6gxmtnsnscakqmfdf6vj4fcs3nasx@zvjsxfwkavgm>

On Thu, Jun 20, 2024 at 09:36:42AM -0400, Kent Overstreet wrote:
> On Thu, Jun 20, 2024 at 09:21:57PM +0800, Hongbo Li wrote:
> > Support falling back to buffered I/O when the operation is performed
> > with an unaligned length or offset. This may change the behavior of
> > direct I/O in some cases.
> > 
> > [Before]
> > A read whose length is aligned to 256 bytes (but not to the sector
> > size) fails under direct I/O.
> > 
> > [After]
> > A read whose length is aligned to 256 bytes (but not to the sector
> > size) succeeds under direct I/O, because it falls back to buffered
> > I/O.
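To make the [Before]/[After] difference concrete, here is a minimal sketch of the kind of check a fallback path could perform. It assumes that a single power-of-two alignment (for example a 512-byte logical block size) applies to the file offset, the length, and the user buffer address; real filesystems may use separate memory and offset alignments. dio_needs_fallback is a hypothetical name, and this is not the code from the bcachefs patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not bcachefs code): returns true when a direct
 * I/O request cannot be sent to the device as-is and would need to
 * fall back to buffered I/O. Assumes one alignment, `align`, in bytes,
 * applies to offset, length and buffer address; it must be a power of two. */
static bool dio_needs_fallback(uint64_t offset, uint64_t len,
                               uintptr_t buf, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uint64_t mask = align - 1;

    /* Any misaligned component means the request cannot be expressed
     * in whole logical blocks. */
    return (offset & mask) || (len & mask) || (buf & mask);
}
```

With align = 512, a 256-byte read at offset 0 from a 4096-aligned buffer returns true (the [After] case would take the buffered path), while a 1024-byte read at offset 4096 returns false and can stay direct.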

This is against the O_DIRECT requirements.

   O_DIRECT
       The O_DIRECT flag may impose alignment restrictions on  the  length  and
       address  of  user-space  buffers  and the file offset of I/Os.  In Linux
       alignment restrictions vary by filesystem and kernel version  and  might
       be  absent  entirely.   The  handling  of  misaligned O_DIRECT I/Os also
       varies; they can either fail with EINVAL or fall back to buffered I/O.

       Since Linux 6.1, O_DIRECT support and alignment restrictions for a  file
       can  be  queried using statx(2), using the STATX_DIOALIGN flag.  Support
       for STATX_DIOALIGN varies by filesystem; see statx(2).

       Some filesystems provide their own interfaces for querying O_DIRECT
       alignment restrictions, for example the XFS_IOC_DIOINFO operation in
       xfsctl(3). STATX_DIOALIGN should be used instead when it is available.

       If none of the above is available, then direct I/O support and alignment
       restrictions can only be assumed from known characteristics of the
       filesystem, the individual file, the underlying storage device(s), and
       the kernel version. In Linux 2.4, most filesystems based on block
       devices require that the file offset and the length and memory address
       of all I/O segments be multiples of the filesystem block size (typically
       4096 bytes). In Linux 2.6.0, this was relaxed to the logical block size
       of the block device (typically 512 bytes). A block device's logical
       block size can be determined using the ioctl(2) BLKSSZGET operation or
       from the shell using the command:

           blockdev --getss
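For reference, a minimal sketch of the STATX_DIOALIGN query that the man page excerpt recommends, assuming glibc 2.28 or later (for the statx() wrapper) and Linux 6.1 or later kernel headers (for STATX_DIOALIGN and the stx_dio_* fields). query_dio_align is a hypothetical helper name; per statx(2), the filesystem may leave STATX_DIOALIGN out of stx_mask, and both alignment fields are 0 when the file does not support O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* statx() is a GNU extension in glibc */
#endif
#include <fcntl.h>             /* AT_FDCWD */
#include <sys/stat.h>          /* statx(), struct statx, STATX_* */

/* Hypothetical helper: ask the kernel for the O_DIRECT alignment of
 * `path`. Returns 1 and fills both outputs when direct I/O is
 * supported, 0 when it is not (or when the headers or filesystem do
 * not report STATX_DIOALIGN), and -1 if statx() itself failed. */
static int query_dio_align(const char *path, unsigned *mem_align,
                           unsigned *offset_align)
{
#ifdef STATX_DIOALIGN
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
        return -1;
    /* The filesystem may not support the request; check the mask. */
    if (!(stx.stx_mask & STATX_DIOALIGN))
        return 0;
    /* Both fields are 0 when O_DIRECT is unsupported for this file. */
    if (stx.stx_dio_mem_align == 0)
        return 0;
    *mem_align = stx.stx_dio_mem_align;       /* user buffer address */
    *offset_align = stx.stx_dio_offset_align; /* file offset and length */
    return 1;
#else
    (void)path;
    (void)mem_align;
    (void)offset_align;
    return 0;                  /* headers predate STATX_DIOALIGN */
#endif
}
```

A caller can compare its file offset and I/O length against offset_align, and its buffer address against mem_align, before deciding to use O_DIRECT at all.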

> The catch is that struct bio - bvec_iter - represents addresses with a
> sector_t, and we'd want that to be a loff_t.
> 
> That's something we should do anyways; everything else in struct bio can
> represent a byte-aligned io, bvec_iter.bi_sector is the only exception
> and fixing that would help in consolidating our various scatter-gather
> list data structures - but we'd need buy-in from Jens and Christoph
> before doing that.

I'm against it.  Block devices only do sector-aligned IO and we should
not pretend otherwise.
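A small userspace illustration of the bi_sector point: bvec_iter.bi_sector counts 512-byte sectors (the kernel's SECTOR_SHIFT is 9), so only sector-aligned byte offsets survive the conversion from a byte position. sector_t, SECTOR_SHIFT and SECTOR_SIZE are redefined here to mirror the kernel names; byte_to_sector and sector_to_byte are hypothetical helpers, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins mirroring the kernel's definitions. */
#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)   /* 512 bytes */
typedef uint64_t sector_t;

/* Hypothetical helper: convert a byte position (what a loff_t holds)
 * to a sector number. Fails when the low 9 bits are non-zero, because
 * a sector count has nowhere to store a sub-sector remainder. */
static bool byte_to_sector(int64_t pos, sector_t *sector)
{
    if (pos < 0 || (pos & (SECTOR_SIZE - 1)) != 0)
        return false;
    *sector = (sector_t)pos >> SECTOR_SHIFT;
    return true;
}

/* Hypothetical helper: the reverse mapping is always exact. */
static int64_t sector_to_byte(sector_t sector)
{
    return (int64_t)(sector << SECTOR_SHIFT);
}
```

Offset 4096 maps to sector 8 and back, while offset 256 has no sector_t representation. Storing a loff_t in bi_sector would make 256 representable, which is Kent's proposal; Willy's objection is that the block device underneath still transfers only whole sectors.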

