public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: Christoph Hellwig <hch@infradead.org>,
	Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH RFC] xfs: set block device logical sector size on xfs_buftarg
Date: Thu, 14 Nov 2013 08:26:58 +1100
Message-ID: <20131113212658.GJ6188@dastard>
In-Reply-To: <5283CE2E.2070702@sandeen.net>

On Wed, Nov 13, 2013 at 01:08:30PM -0600, Eric Sandeen wrote:
> On 11/13/13, 12:56 PM, Christoph Hellwig wrote:
> > On Wed, Nov 13, 2013 at 12:25:33PM -0600, Eric Sandeen wrote:
> >> Pure RFC; this might be crazy.  Here's the problem I'm trying to solve:
> >>
> >> Today, mkfs.xfs will select a 4k sector size for a 4k physical / 512 logical
> >> drive.  (that change was done by me).  The thought was that it'd be an
> >> efficiency gain to not make the drive do the (possible) RMW cycles on
> >> 512-byte log IO, primarily.
> >>
> >> However, now this restricts all DIO to 4k alignment, not the otherwise-
> >> possible 512.
> >>
> >> This came up when qemu-kvm, in cache=none mode, tries to boot off an
> >> image hosted on such a filesystem, and its bios wants to do a 512 byte
> >> direct IO read off the disk - it fails.
> >>
> >> But I'm wondering - the buftarg's bt_sshift and bt_smask are only used
> >> in a few places.  
> > 
> > No need to mess with kernel code IFF we want to change that, just keep
> > the sector size at 512 bytes and set a log stripe unit at mkfs time.
> > 
> > I have to admit that I'm not really sure if that's what we really want,
> > though.  A drive that has a larger physical block size will need
> > read-modify-write cycles internally, which we try to avoid.
> 
> Yeah, the problem comes up when it is 100% impossible to boot a
> qemu-kvm guest hosted on such a filesystem/drive.  :(

No it's not. Just use cache=writethrough and the page cache will
take care of the mismatch when it occurs.

> (of course I guess that means it fails on a hard 4k drive too)

And on any other filesystem that thinks it has sectors larger than
512 bytes underlying it (e.g. cdrom has a 2k sector size).

> I don't know what the guest sees for logical/physical on its
> file-backed block device in these cases.

That seems like the avenue for improvement here to me, i.e. expose
the correct values to the guest so its mkfs does the right thing.
Or, alternatively, make qemu buffer non-aligned/sized IOs itself
internally.
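
As a rough illustration of the second option, here is a minimal sketch of
serving an unaligned read using only sector-aligned I/O. It is not qemu
code: the name aligned_read, the 4096-byte default, and the use of an
in-memory file object in place of a real O_DIRECT descriptor are all
assumptions made for the example.

```python
import io


def aligned_read(backing, offset, length, sector_size=4096):
    """Serve an arbitrary (offset, length) read using one sector-aligned read.

    `backing` is any seekable binary file object; it stands in for a
    device or file that only accepts sector-aligned requests.
    """
    # Round the start down and the end up to sector boundaries.
    start = (offset // sector_size) * sector_size
    end = -(-(offset + length) // sector_size) * sector_size

    # The only I/O issued to the backing store is aligned in offset and size.
    backing.seek(start)
    chunk = backing.read(end - start)

    # Hand back just the bytes the caller asked for.
    skip = offset - start
    return chunk[skip:skip + length]


if __name__ == "__main__":
    image = io.BytesIO(bytes(range(256)) * 32)  # 8192-byte stand-in image
    sector = aligned_read(image, 512, 512)      # a 512-byte read at offset 512
    print(len(sector))
```

The cost is reading a full 4k sector to return 512 bytes, which is the
same trade the page cache makes in the buffered case.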

After all, it has been told to use direct IO, and when that happens
it is the application's responsibility to ensure IO alignment
requirements are met...
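
To make the alignment rule concrete, a small sketch of the check an
application could run before issuing a direct IO request. The function
name dio_request_ok is hypothetical, and this is the general rule rather
than the kernel's actual code; real O_DIRECT IO also constrains the
alignment of the memory buffer, which is not modelled here.

```python
def dio_request_ok(offset, length, sector_size):
    """Return True if a direct IO request is aligned to sector_size.

    Both the file offset and the transfer length must be whole multiples
    of the sector size the filesystem enforces.
    """
    if length <= 0:
        return False
    return offset % sector_size == 0 and length % sector_size == 0


if __name__ == "__main__":
    # A 512-byte read at offset 0 is fine when the enforced size is 512 ...
    print(dio_request_ok(0, 512, 512))
    # ... but is rejected once the filesystem sector size is 4096.
    print(dio_request_ok(0, 512, 4096))
```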

> Anyway, if we took your suggestion, normal internal fs operations
> (log IO) wouldn't RMW.  But we'd still presumably advertise and allow
> smaller DIO sizes, which are inefficient.  We could advertise 4k, but
> still allow 512 for less-smart apps, maybe?

I'd say such a problem is a matter of user education and making qemu
aware of logical/physical differences - hacking weird corner cases
into what a sector size means is only going to lead to confusion and
bite us in unexpected ways...
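
For reference, a sketch of how a userspace program on Linux might look
up the two values for a host block device via sysfs. The helper names
block_sizes and has_emulated_sectors are made up for this example, and
the sysfs_root parameter exists only so the function can be pointed at a
test directory; how qemu would consume or expose the values is not
covered.

```python
from pathlib import Path


def block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes for a block device.

    Reads the queue/logical_block_size and queue/physical_block_size
    attributes that Linux exports under /sys/block/<device>/.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def has_emulated_sectors(logical, physical):
    """True for drives such as 512 logical / 4k physical."""
    return logical < physical
```

For example, block_sizes("sda") on a 512 logical / 4k physical drive
would be expected to return (512, 4096); "sda" is only a placeholder
device name.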

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

