public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: chandan.babu@oracle.com, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: don't extend the FITRIM range if the rt device does not support discard
Date: Tue, 13 Aug 2024 22:41:18 -0700	[thread overview]
Message-ID: <20240814054118.GE865349@frogsfrogsfrogs> (raw)
In-Reply-To: <20240814042358.19297-2-hch@lst.de>

On Wed, Aug 14, 2024 at 06:23:58AM +0200, Christoph Hellwig wrote:
> Fix the newly added discard support to only offer a FITRIM range that
> spans the RT device in addition to the main device if the RT device
> actually supports discard.  Without this we'll incorrectly accept
> a larger range than actually supported and confuse user space if the
> RT device does not support discard.  This can easily happen when the
> main device is a SSD but the RT device is a hard driver.
> 
> Move the code around a bit to keep the max_blocks and granularity
> assignments together and to handle the case where only the RT device
> supports discard as well.
> 
> Fixes: 3ba3ab1f6719 ("xfs: enable FITRIM on the realtime device")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_discard.c | 37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index 6516afecce0979..46e28499a7d3ce 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -654,9 +654,9 @@ xfs_ioc_trim(
>  	struct xfs_mount		*mp,
>  	struct fstrim_range __user	*urange)
>  {
> -	unsigned int		granularity =
> -		bdev_discard_granularity(mp->m_ddev_targp->bt_bdev);
> +	struct block_device	*bdev = mp->m_ddev_targp->bt_bdev;
>  	struct block_device	*rt_bdev = NULL;
> +	unsigned int		granularity = 0;
>  	struct fstrim_range	range;
>  	xfs_daddr_t		start, end;
>  	xfs_extlen_t		minlen;
> @@ -666,15 +666,6 @@ xfs_ioc_trim(
>  
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
> -	if (mp->m_rtdev_targp &&
> -	    bdev_max_discard_sectors(mp->m_rtdev_targp->bt_bdev))
> -		rt_bdev = mp->m_rtdev_targp->bt_bdev;
> -	if (!bdev_max_discard_sectors(mp->m_ddev_targp->bt_bdev) && !rt_bdev)
> -		return -EOPNOTSUPP;

Does this still return EOPNOTSUPP if there's no rt device and the data
device doesn't support discard?  I only see an EOPNOTSUPP return if
there is a rt device and both devices don't support discard.

> -
> -	if (rt_bdev)
> -		granularity = max(granularity,
> -				  bdev_discard_granularity(rt_bdev));
>  
>  	/*
>  	 * We haven't recovered the log, so we cannot use our bnobt-guided
> @@ -683,13 +674,33 @@ xfs_ioc_trim(
>  	if (xfs_has_norecovery(mp))
>  		return -EROFS;
>  
> +	if (bdev_max_discard_sectors(bdev)) {
> +		max_blocks = mp->m_sb.sb_dblocks;
> +		granularity = bdev_discard_granularity(bdev);
> +	} else {
> +		bdev = NULL;
> +	}
> +
> +	if (mp->m_rtdev_targp) {
> +		rt_bdev = mp->m_rtdev_targp->bt_bdev;
> +		if (!bdev_max_discard_sectors(rt_bdev)) {
> +			if (!bdev)
> +				return -EOPNOTSUPP;
> +			rt_bdev = NULL;
> +		}
> +	}
> +	if (rt_bdev) {
> +		max_blocks += mp->m_sb.sb_rblocks;

I think this will break xfs_scrub, which (unlike fstrim) breaks up its
FITRIM requests into smaller pieces.  The (afwul) FITRIM interface says
that [0, dblocks) trims the data device, and [dblocks, dblocks +
rblocks) trims the realtime device.

If the data device doesn't support discard, max_blocks will be rblocks,
and that's what we use to validate the @start parameter.  For example,
if the data device is 10T spinning rust and the rt device is a 10G SSD,
max_blocks will be 10G.  A FITRIM request for just the rt device will be
[10T, 10G), which now fails with EINVAL.

I don't have a fix to suggest for this yet, but let me play around with
this tomorrow and see if I can come up with something better, or figure
out how I'm being thick. ;)

My guess is that what we really want is if either device supports
discard we allow the full range, but if a specific device doesn't
support discard then we skip it and don't add anything to the outgoing
range.len.  But that's what I thought the current code does. <shrug>

--D

> +		granularity =
> +			max(granularity, bdev_discard_granularity(rt_bdev));
> +	}
> +
>  	if (copy_from_user(&range, urange, sizeof(range)))
>  		return -EFAULT;
>  
>  	range.minlen = max_t(u64, granularity, range.minlen);
>  	minlen = XFS_B_TO_FSB(mp, range.minlen);
>  
> -	max_blocks = mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks;
>  	if (range.start >= XFS_FSB_TO_B(mp, max_blocks) ||
>  	    range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
>  	    range.len < mp->m_sb.sb_blocksize)
> @@ -698,7 +709,7 @@ xfs_ioc_trim(
>  	start = BTOBB(range.start);
>  	end = start + BTOBBT(range.len) - 1;
>  
> -	if (bdev_max_discard_sectors(mp->m_ddev_targp->bt_bdev)) {
> +	if (bdev) {
>  		error = xfs_trim_datadev_extents(mp, start, end, minlen,
>  				&blocks_trimmed);
>  		if (error)
> -- 
> 2.43.0
> 
> 

  reply	other threads:[~2024-08-14  5:41 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-14  4:23 [PATCH 1/2] xfs: remove a stale comment in xfs_ioc_trim Christoph Hellwig
2024-08-14  4:23 ` [PATCH 2/2] xfs: don't extend the FITRIM range if the rt device does not support discard Christoph Hellwig
2024-08-14  5:41   ` Darrick J. Wong [this message]
2024-08-14  5:48     ` Christoph Hellwig
2024-08-14  6:16       ` Darrick J. Wong
2024-08-14  5:12 ` [PATCH 1/2] xfs: remove a stale comment in xfs_ioc_trim Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2024-08-16  8:18 fix FITRIM with non-discard capable RT device v2 Christoph Hellwig
2024-08-16  8:18 ` [PATCH 2/2] xfs: don't extend the FITRIM range if the rt device does not support discard Christoph Hellwig
2024-08-16 21:50   ` Darrick J. Wong
2024-08-19 12:44     ` Christoph Hellwig
2024-08-19 15:00       ` Darrick J. Wong
2024-08-19 15:08         ` Christoph Hellwig
2024-08-20 16:19           ` Darrick J. Wong
2024-08-20 16:35             ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240814054118.GE865349@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox