Re: regression introduced by "block: Add support for DAX reads/writes to block devices"

From: Linda Knippers <linda.knippers@hp.com>
To: Dave Chinner <david@fromorbit.com>, Jeff Moyer <jmoyer@redhat.com>
Cc: "matthew r. wilcox" <matthew.r.wilcox@intel.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
Date: Wed, 05 Aug 2015 21:42:54 -0400
Message-ID: <55C2BB9E.3040709@hp.com>
In-Reply-To: <20150805220113.GC3902@dastard>

On 08/05/2015 06:01 PM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>> Hi, Matthew,
>>
>> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
>>
>> # mkfs -t xfs -f /dev/pmem0
>> meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=2097152, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> mkfs.xfs: read failed: Numerical result out of range
>>
>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>> from the last sector of the device.  This results in dax_io trying to do
>> a page-sized I/O at 512 bytes from the end of the device.
> 
> Right - we have to be able to do IO to that last sector, so this is
> a sanity check to tell if the block dev is large enough. The XFS
> kernel code does the same end-of-device sector read when the
> filesystem is mounted, too.
> 
>> bdev_direct_access, receiving this bogus pos/size combo, returns
>> -ERANGE:
>>
>> 	if ((sector + DIV_ROUND_UP(size, 512)) >
>> 					part_nr_sects_read(bdev->bd_part))
>> 		return -ERANGE;
>>
>> Given that file systems supporting dax refuse to mount with a blocksize
>> != page size, I'm guessing this is sort of expected behavior.  However,
>> we really shouldn't be breaking direct I/O on pmem devices.
> 
> If the device is advertising 512 byte sector size support, then this
> needs to work, especially as DAX is completely transparent on the
> block device. Remember that DAX through a filesystem works on
> filesystem data block size boundaries, so a 512 byte sector/4k block
> size filesystem will be able to use DAX for mmapped files just fine.
> 
>> So, what do you want to do?  We could make the pmem device's logical
>> block size fixed at the system page size.  Or, we could modify the dax
>> code to work with blocksize < pagesize.  Or, we could continue using the
>> direct I/O codepath for direct block device access.  What do you think?
> 
> I don't know how the pmem device sets up its limits. Can you post
> the output of:
> 
> 	/sys/block/pmem0/queue/logical_block_size
512

> 	/sys/block/pmem0/queue/physical_block_size
512

> 	/sys/block/pmem0/queue/hw_sector_size
512

> 	/sys/block/pmem0/queue/minimum_io_size
512

> 	/sys/block/pmem0/queue/optimal_io_size
0

Let me know if you need anything else.

-- ljk


> As these all affect how mkfs.xfs configures the filesystem being
> made and so influence the size and alignment of the IO it does....
> 
> Cheers,
> 
> Dave.
>
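
For illustration only, and not taken from the thread: the arithmetic
behind the -ERANGE discussed above can be reproduced in user space. This
is a minimal sketch under stated assumptions. range_check(),
pmem0_sectors() and last_sector_demo() are hypothetical names, the device
is assumed to be exactly the 2097152 x 4096-byte blocks shown in the mkfs
output, and the page size is assumed to be 4096 bytes. Only the
comparison itself mirrors the quoted bdev_direct_access() snippet.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Round-up division, as used by the quoted kernel check. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical user-space stand-in for the quoted range test.
 * Returns 0 if an I/O of `size` bytes starting at 512-byte `sector`
 * fits in a device of `nr_sects` sectors, -ERANGE otherwise.
 */
int range_check(uint64_t sector, uint64_t size, uint64_t nr_sects)
{
	if (sector + DIV_ROUND_UP(size, 512) > nr_sects)
		return -ERANGE;
	return 0;
}

/* Assumed device size: 2097152 blocks of 4096 bytes, in 512-byte sectors. */
uint64_t pmem0_sectors(void)
{
	return 2097152ULL * 4096 / 512;
}

/* Compare a 512-byte and a page-sized I/O aimed at the last sector. */
void last_sector_demo(void)
{
	uint64_t nr = pmem0_sectors();
	uint64_t last = nr - 1;

	/* A single-sector read of the last sector fits exactly. */
	assert(range_check(last, 512, nr) == 0);

	/* A 4096-byte (assumed page-sized) I/O there overruns by 7 sectors. */
	assert(range_check(last, 4096, nr) == -ERANGE);
}
```

On Linux, strerror(ERANGE) is "Numerical result out of range", which
matches the message mkfs.xfs printed.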
