All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: John Garry <john.g.garry@oracle.com>,
	chandan.babu@oracle.com, djwong@kernel.org, dchinner@redhat.com,
	hch@lst.de, viro@zeniv.linux.org.uk, brauner@kernel.org,
	jack@suse.cz, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	catherine.hoang@oracle.com, martin.petersen@oracle.com
Subject: Re: [PATCH v4 00/14] forcealign for xfs
Date: Tue, 10 Sep 2024 08:21:55 +0530	[thread overview]
Message-ID: <877cbkgr04.fsf@gmail.com> (raw)
In-Reply-To: <ZtlQt/7VHbOtQ+gY@dread.disaster.area>

Dave Chinner <david@fromorbit.com> writes:

> On Thu, Sep 05, 2024 at 09:26:25AM +0530, Ritesh Harjani wrote:
>> Dave Chinner <david@fromorbit.com> writes:
>> > On Wed, Sep 04, 2024 at 11:44:29PM +0530, Ritesh Harjani wrote:
>> >> 3. It is the FORCEALIGN feature which _mandates_ both allocation
>> >> (by using extsize hint) and de-allocation to happen _only_ in
>> >> extsize chunks.
>> >>
>> >>    i.e. forcealign mandates -
>> >>    - the logical and physical start offset should be aligned as
>> >>    per args->alignment
>> >>    - extent length be aligned as per args->prod/mod.
>> >>      If above two cannot be satisfied then return -ENOSPC.
>> >
>> > Yes.
>> >
>> >> 
>> >>    - Does the unmapping of extents also only happens in extsize
>> >>    chunks (with forcealign)?
>> >
>> > Yes, via use of xfs_inode_alloc_unitsize() in the high level code
>> > aligning the fsbno ranges to be unmapped.
>> >
>> > Remember, force align requires both logical file offset and
>> > physical block number to be correctly aligned,
>> 
>> This is where I would like to double confirm it again. Even the
>> extsize hint feature (w/o FORCEALIGN) will try to allocate aligned
>> physical start and logical start file offset and length right?
>
> No.
>
>> (Or does extsize hint only restricts alignment to logical start file
>> offset + length and not the physical start?)
>
> Neither.
>

Yes, thanks for the correction. Indeed extsize hint does not take care
of the physical start alignment at all.

> extsize hint by itself (i.e. existing behaviour) has no alignment
> effect at all. All it affects is -size- of the extent. i.e. once
> the extent start is chosen, extent size hints will trim the length
> of the extent to a multiple of the extent size hint. Alignment is
> not considered at all.
>

Please correct me I wrong here... but XFS considers aligning the logical
start and the length of the allocated extent (for extsize) as per below
code right? 

i.e.
1) xfs_direct_write_iomap_begin()
{
    <...>
    if (offset + length > XFS_ISIZE(ip))
		end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
                  => xfs_fileoff_t aligned_end_fsb = roundup_64(end_fsb, align);
                     return aligned_end_fsb
}

2) xfs_bmap_compute_alignments()
{
    <...>
    	else if (ap->datatype & XFS_ALLOC_USERDATA)
		     align = xfs_get_extsz_hint(ap->ip);

        if (align) {
            if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
                        ap->eof, 0, ap->conv, &ap->offset,
                        &ap->length))
                ASSERT(0);
            ASSERT(ap->length);

            args->prod = align;
            div_u64_rem(ap->offset, args->prod, &args->mod);
            if (args->mod)
                args->mod = args->prod - args->mod;
        }
        <...>
}

So args->prod and args->mod... aren't they use to align the logical
start and the length of the extent?


However, I do notice that when the file is closed XFS trims the length
allocated beyond EOF boundary (for extsize but not for forcealign from
the new forcealign series) i.e.

xfs_file_release() -> xfs_release() -> xfs_free_eofblocks()

I guess that is because xfs_can_free_eofblocks() does not consider
alignment for extsize in this function 

xfs_can_free_eofblocks()
{
<...>
	end_fsb = xfs_inode_roundup_alloc_unit(ip,
			XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)));
<...>
}




>> Also it looks like there is no difference with ATOMIC_WRITE AND
>> FORCEALIGN feature with XFS, correct? (except that ATOMIC_WRITE is
>> adding additional natural alignment restrictions on pos and len). 
>
> Atomic write requires additional hardware support, and it restricts
> the valid sizes of extent size hints that can be set. Only atomic
> writes can be done on files marked as configured for atomic writes;
> force alignment can be done on any file...
>
>> So why maintain 2 separate on disk inode flags for FORCEALIGN AND
>> ATOMIC_WRITE?
>
> the atomic write flag indicates that a file has been set up
> correctly for atomic writes to be able to issues reliably. force
> alignment doesn't guarantee that - it's just a mechanism that tells
> the allocator to behave a specific way.
>
>> - Do you foresee FORCEALIGN to be also used at other places w/o
>> ATOMIC_WRITE where feature differentiation between the two on an
>> inode is required?
>
> The already exist. For example, reliably allocating huge page
> mappings on DAX filesystems requires 2MB forced alignment. 
>
>> - Does the same reasoning will hold for XFS_SB_FEAT_RO_COMPAT_FORCEALIGN
>> & XFS_SB_FEAT_RO_COMPAT_ATOMICWRITES too?
>
> Same as above.
>
>> - But why ro_compact for ATOMICWRITES? There aren't any on disk metadata
>> changes within XFS filesystem to support atomic writes, right? 
>
> Because if you downgrade the kernel to something that doesn't
> support atomic writes, then non-atomic sized/aligned data can be
> written to the file and/or torn writes can occur.
>
> Worse, extent size hints that don't match the underlying hardware
> support could be set up for inodes, and when the kernel is upgraded
> again then atomic writes will fail on inodes that have atomic write
> flags set on them....
>
>> Is it something to just prevent users from destroying their own data
>> by not allowing a rw mount from an older kernel where users could do
>> unaligned writes to files marked for atomic writes?
>> Or is there any other reasoning to prevent XFS filesystem from becoming
>> inconsistent if an older kernel does a rw mount here.
>
> The older kernel does not know what the unknown inode flag means
> (i.e. atomic writes) and so, by definition, we cannot allow it to
> modify metadata or file data because it may not modify it in the
> correct way for that flag being set on the inode.
>
> Kernels that don't understand feature flags need to treat the
> filesystem as read-only, no matter how trivial the feature addition
> might seem.
>
>> > so unmap alignment
>> > has to be set up correctly at file offset level before we even know
>> > what extents underly the file range we need to unmap....
>> >
>> >>      If the start or end of the extent which needs unmapping is
>> >>      unaligned then we convert that extent to unwritten and skip,
>> >>      is it? (__xfs_bunmapi())
>> >
>> > The high level code should be aligning the start and end of the
>> > file range to be removed via xfs_inode_alloc_unitsize(). Hence 
>> > the low level __xfs_bunmapi() code shouldn't ever be encountering
>> > unaligned unmaps on force-aligned inodes.
>> >
>> 
>> Yes, but isn't this code snippet trying to handle a case when it finds an
>> unaligned extent during unmap?
>
> It does exactly that.
>
>> And what we are essentially trying to 
>> do here is leave the unwritten extent as is and if the found extent is
>> written then convert to unwritten and skip it (goto nodelete). This
>> means with forcealign if we encounter an unaligned extent then the file
>> will have this space reserved as is with extent marked unwritten. 
>> 
>> Is this understanding correct?
>
> Yes, except for the fact that this code is not triggered by force
> alignment.
>
> This code is preexisting realtime file functionality. It is used
> when the rtextent size is larger than a single filesytem block.
>
> In these configurations, we do allocation in rtextsize units, but we
> track written/unwritten extents in the BMBT on filesystem block
> granularity. Hence we can have unaligned written/unwritten extent
> boundaries, and if we aren't unmapping a whole rtextent then we
> simply mark the unused portion of it as unwritten in the BMBT to
> indicate it contains zeroes.
>
> IOWs, existing RT files have different IO alignment and
> written/unwritten extent conversion behaviour to the new forced
> alignment feature. The implementation code is shared in many places,
> but the higher level forced alignment functionality ensures there
> are never unaligned unwritten extents created or unaligned
> unmappings asked for. Hence this code does not trigger for the
> forced alignment cases.
>
> i.e. we have multiple seperate allocation alignment behaviours that
> share an implementation, but they do not all behave exactly the same
> way or provide the same alignment guarantees....
>

Thanks for taking time and explaining this. 

-ritesh

  reply	other threads:[~2024-09-10  3:16 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-13 16:36 [PATCH v4 00/14] forcealign for xfs John Garry
2024-08-13 16:36 ` [PATCH v4 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-08-23 16:28   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 02/14] xfs: always tail align maxlen allocations John Garry
2024-08-23 16:31   ` Darrick J. Wong
2024-08-29 17:58     ` John Garry
2024-08-29 21:34       ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 03/14] xfs: simplify extent allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 04/14] xfs: make EOF allocation simpler John Garry
2024-09-04 18:25   ` Ritesh Harjani
2024-09-05  7:51     ` John Garry
2024-08-13 16:36 ` [PATCH v4 05/14] xfs: introduce forced allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 06/14] xfs: align args->minlen for " John Garry
2024-08-13 16:36 ` [PATCH v4 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
2024-08-13 16:36 ` [PATCH v4 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 09/14] xfs: Update xfs_setattr_size() " John Garry
2024-08-13 16:36 ` [PATCH v4 10/14] xfs: Do not free EOF blocks " John Garry
2024-08-13 16:36 ` [PATCH v4 11/14] xfs: Only free full extents " John Garry
2024-08-13 16:36 ` [PATCH v4 12/14] xfs: Unmap blocks according to forcealign John Garry
2024-08-23 16:35   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 13/14] xfs: Don't revert allocated offset for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 14/14] xfs: Enable file data forcealign feature John Garry
2024-09-04 18:14 ` [PATCH v4 00/14] forcealign for xfs Ritesh Harjani
2024-09-04 23:20   ` Dave Chinner
2024-09-05  3:56     ` Ritesh Harjani
2024-09-05  6:33       ` Dave Chinner
2024-09-10  2:51         ` Ritesh Harjani [this message]
2024-09-16  6:33           ` Dave Chinner
2024-09-10 12:33         ` Ritesh Harjani
2024-09-16  7:03           ` Dave Chinner
2024-09-16 10:24             ` John Garry
2024-09-17 20:54               ` Darrick J. Wong
2024-09-17 23:34                 ` Dave Chinner
2024-09-17 22:12               ` Dave Chinner
2024-09-18  7:59                 ` John Garry
2024-09-23  2:57                   ` Dave Chinner
2024-09-23  3:33                     ` Christoph Hellwig
2024-09-23  8:16                       ` John Garry
2024-09-23 12:07                         ` Christoph Hellwig
2024-09-23 12:33                           ` John Garry
2024-09-24  6:17                             ` Christoph Hellwig
2024-09-24  9:48                               ` John Garry
2024-11-29 11:36                                 ` John Garry
2024-09-23  8:00                     ` John Garry
2024-09-05 10:15     ` John Garry
2024-09-05 21:47       ` Dave Chinner
2024-09-06 14:31         ` John Garry
2024-09-08 22:49           ` Dave Chinner
2024-09-09 16:18             ` John Garry
2024-09-16  5:25               ` Dave Chinner
2024-09-16  9:44                 ` John Garry
2024-09-17 22:27                   ` Dave Chinner
2024-09-18 10:12                     ` John Garry
2024-11-14 12:48                       ` Long Li
2024-11-14 16:22                         ` John Garry
2024-11-14 20:07                         ` Dave Chinner
2024-11-15  8:14                           ` John Garry
2024-11-15 11:20                           ` Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877cbkgr04.fsf@gmail.com \
    --to=ritesh.list@gmail.com \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.