linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Jiang <dave.jiang@intel.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, ross.zwisler@linux.intel.com,
	linux-ext4@vger.kernel.org, dan.j.williams@intel.com,
	linux-nvdimm@lists.01.org
Subject: Re: [PATCH 2 2/2] xfs: fix rt_dev usage for DAX
Date: Tue, 6 Feb 2018 15:32:00 -0700	[thread overview]
Message-ID: <d267ed33-950e-a5c5-73ce-36da6c940513@intel.com> (raw)
In-Reply-To: <20180202004332.GZ4849@magnolia>

On 02/01/2018 05:43 PM, Darrick J. Wong wrote:
> On Fri, Feb 02, 2018 at 10:44:13AM +1100, Dave Chinner wrote:
>> On Thu, Feb 01, 2018 at 01:33:05PM -0700, Dave Jiang wrote:
>>> When using realtime device (rtdev) with xfs where the data device is not
>>> DAX capable, two issues arise. One is when data device is not DAX but the
>>> realtime device is DAX capable, we currently disable DAX.
>>> After passing this check, we are also not marking the inode as DAX capable.
>>> This change will allow DAX enabled if the data device or the realtime
>>> device is DAX capable. S_DAX will be marked for the inode if the file is
>>> residing on a DAX capable device. This will prevent the case of rtdev is not
>>> DAX and data device is DAX to create realtime files.
>>
>> I'm confused by this description. I'm not sure what is broken, nor
>> what you are trying to fix.
>>
>> I think what you want to do is enable DAX on RT devices separately
>> to the data device and vice versa?
>>
>> i.e. is this what you are trying to acheive?
>>
>> datadev dax	rtdev dax		DAX enabled on
>> -----------     ---------		--------------
>> no		no			neither
>> yes		no			datadev
>> no		yes			rtdev
>> yes		yes			both
>>
>>
>>>
>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>> Reported-by: Darrick Wong <darrick.wong@oracle.com>
>>> ---
>>>  fs/xfs/xfs_iops.c  |    3 ++-
>>>  fs/xfs/xfs_super.c |    9 ++++++++-
>>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>> index 56475fcd76f2..ab352c325301 100644
>>> --- a/fs/xfs/xfs_iops.c
>>> +++ b/fs/xfs/xfs_iops.c
>>> @@ -1204,7 +1204,8 @@ xfs_diflags_to_iflags(
>>>  	    ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
>>>  	    !xfs_is_reflink_inode(ip) &&
>>>  	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
>>> -	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
>>> +	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX) &&
>>> +	    blk_queue_dax(bdev_get_queue(inode->i_sb->s_bdev)))
>>
>> This does not discriminate between the rtdev or the data dev. This
>> needs to call xfs_find_bdev_for_inode() to get the right device
>> for the inode config.
>>
>> Further, if we add or remove the RT flag to the inode at a later
>> point in time (e.g. via ioctl) we also need to re-evaluate the S_DAX
>> flag at that point in time.
> 
> Ah, right, I'd missed that subtlety in my earlier replies.  Ok, add
> another patch to this series to reevaluate S_DAX when we change the RT
> flag.
> 
>> Which brings me to the real problem here: dynamically changing the
>> S_DAX flag is racy, dangerous and broken. It's not clear that this
>> should be allowed at all as the inode may have already been mmap()d
>> by the time the ioctl is called to set/clear the rt file state.
> 
> Agreed that this is a mess.  Either this needs to get fixed in the dax
> code, or we need to decide that we're not going to support reconfiguring
> the dax flag at all, except possibly for empty files (similar to how we
> restrict changes to the rt flag).

Does this mean we should add a check in xfs_ioctl_setattr_xflags() to
reject removing of realtime flag if S_DAX is set on the inode until the
dynamic change issue is sorted out?

> 
>> IOWs, right now we cannot support mixed DAX mode filesystems because
>> the generic DAX code does not support dynamic changing of the DAX
>> flag on an inode and so checking the block device state here is
>> irrelevant....
>>
>>>  		inode->i_flags |= S_DAX;
>>>  }
>>>  
>>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>>> index e8a687232614..5ac478924dce 100644
>>> --- a/fs/xfs/xfs_super.c
>>> +++ b/fs/xfs/xfs_super.c
>>> @@ -1649,11 +1649,18 @@ xfs_fs_fill_super(
>>>  		sb->s_flags |= SB_I_VERSION;
>>>  
>>>  	if (mp->m_flags & XFS_MOUNT_DAX) {
>>> +		bool rtdev_is_dax = false;
>>> +
>>>  		xfs_warn(mp,
>>>  		"DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
>>>  
>>> +		if (mp->m_rtdev_targp->bt_daxdev)
>>> +			if (bdev_dax_supported(mp->m_rtdev_targp->bt_bdev,
>>> +					      sb->s_blocksize) == 0)
>>> +				rtdev_is_dax = true;
>>
>> .... as this code here needs to turn off DAX here if any device
>> in the filesystem doesn't support DAX....
> 
> I think it'd be useful to be able to have a pmem rt device even if the
> data device doesn't support it.  Or rather, I have a few clients who
> have expressed interest in this sort of configuration.
> 
> --D
> 
>>
>>
>> FWIW, the logic in the code is terrible (not your fault, Dave).
>> The logic reads
>>
>> 	if (NOT bdev_dax_supported(rtdev)) then
>> 		rtdev supports DAX
>>
>> That also needs fixing - we're checking something that has a boolean
>> return state (yes or no) and so it should define them in a way that
>> makes the caller logic read cleanly....
>>
>> Cheers,
>>
>> Dave.
>> -- 
>> Dave Chinner
>> david@fromorbit.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2018-02-06 22:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-01 20:32 [PATCH 2 1/2] dax: change bdev_dax_supported() to take a block_device as input Dave Jiang
2018-02-01 20:33 ` [PATCH 2 2/2] xfs: fix rt_dev usage for DAX Dave Jiang
2018-02-01 23:28   ` Darrick J. Wong
2018-02-02  0:08     ` Dave Jiang
2018-02-02  0:38       ` Darrick J. Wong
2018-02-01 23:44   ` Dave Chinner
2018-02-02  0:13     ` Dave Jiang
2018-02-02  3:20       ` Dave Chinner
2018-02-02  0:43     ` Darrick J. Wong
2018-02-02  3:36       ` Dave Chinner
2018-02-06 22:32       ` Dave Jiang [this message]
2018-02-06 23:19         ` Darrick J. Wong
2018-02-07  0:19           ` Dan Williams
2018-03-06  0:06           ` Ross Zwisler
2018-02-01 22:46 ` [PATCH 2 1/2] dax: change bdev_dax_supported() to take a block_device as input Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d267ed33-950e-a5c5-73ce-36da6c940513@intel.com \
    --to=dave.jiang@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ross.zwisler@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).