public inbox for linux-xfs@vger.kernel.org
* [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups
@ 2024-09-02 18:16 Darrick J. Wong
  2024-09-02 18:21 ` [PATCHSET v31.1 1/8] xfs: atomic file content commits Darrick J. Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:16 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, Christoph Hellwig

Hi everyone,

6.12 is (allegedly) an LTS release, and as it's end of summer vacation
time in the northern hemisphere, the most that I'm going to get done for
this cycle is bug fixes and cleanups in preparation for metadata
directories and realtime allocation groups.

Christoph and I have finished reviewing this big batch of changes and I
think they're ready to be merged.  I'm resending the entire series so
that the patches are recorded in the list archives, and will follow it
with a pile of pull requests for actual merging.

--D



* [PATCHSET v31.1 1/8] xfs: atomic file content commits
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
@ 2024-09-02 18:21 ` Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 1/1] xfs: introduce new file range commit ioctls Darrick J. Wong
  2024-09-02 18:21 ` [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories Darrick J. Wong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:21 UTC (permalink / raw)
  To: chandanbabu, djwong
  Cc: Jeff Layton, Christoph Hellwig, linux-fsdevel, linux-xfs

Hi all,

This series creates XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE ioctls
to perform the exchange only if the target file has not been changed
since a given sampling point.

This new functionality uses the mechanism underlying EXCHANGE_RANGE to
stage and commit file updates such that reader programs will see either
the old contents or the new contents in their entirety, with no chance
of torn writes.  A successful call completion guarantees that the new
contents will be seen even if the system fails.  The pair of ioctls
allows userspace to perform what amounts to a compare and exchange
operation on entire file contents.
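
To illustrate the compare-and-exchange pattern described above, here is a minimal sketch of a retry loop in plain C. It is not part of the patchset. The callback type and the names commit_with_retry, demo_ok, and demo_busy_countdown are hypothetical, and the callbacks are assumed to return 0 or a negative errno rather than following the ioctl(2) convention of -1 plus errno. Retrying on EBUSY is only one possible policy; a caller may prefer to give up and discard the staged update.

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*commit_step_fn)(void *ctx);

/*
 * Repeat a sample/stage/commit cycle while the commit step reports -EBUSY,
 * i.e. while the target file keeps changing underneath the caller.
 */
static int commit_with_retry(commit_step_fn sample, commit_step_fn stage,
			     commit_step_fn commit, void *ctx, int max_tries)
{
	int ret = -EBUSY;
	int i;

	for (i = 0; i < max_tries; i++) {
		ret = sample(ctx);	/* would wrap XFS_IOC_START_COMMIT */
		if (ret)
			return ret;
		ret = stage(ctx);	/* would write new contents to a temp file */
		if (ret)
			return ret;
		ret = commit(ctx);	/* would wrap XFS_IOC_COMMIT_RANGE */
		if (ret != -EBUSY)
			return ret;
	}
	return ret;
}

/* Demo stub: always succeeds. */
static int demo_ok(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Demo stub: reports -EBUSY until the counter in ctx reaches zero. */
static int demo_busy_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```

In a real program the three callbacks would wrap the START_COMMIT ioctl, the staging writes, and the COMMIT_RANGE ioctl respectively, and max_tries bounds how long a busy file can stall the caller.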

Note that there are ongoing arguments in the community about how best to
implement some sort of file data write counter that nfsd could also use
to signal invalidations to clients.  Until such a thing is implemented,
this patch will rely on ctime/mtime updates.

Here are the proposed manual pages:

IOCTL-XFS-COMMIT-RANGE(2)    System Calls Manual    IOCTL-XFS-COMMIT-RANGE(2)

NAME
       ioctl_xfs_start_commit - prepare to exchange the contents of
       two files
       ioctl_xfs_commit_range - conditionally exchange the contents
       of parts of two files

SYNOPSIS
       #include <sys/ioctl.h>
       #include <xfs/xfs_fs.h>

       int ioctl(int file2_fd, XFS_IOC_START_COMMIT,
                 struct xfs_commit_range *arg);

       int ioctl(int file2_fd, XFS_IOC_COMMIT_RANGE,
                 struct xfs_commit_range *arg);

DESCRIPTION
       Given  a  range  of bytes in a first file file1_fd and a second
       range of bytes in a second file  file2_fd,  this  ioctl(2)  ex‐
       changes  the contents of the two ranges if file2_fd passes cer‐
       tain freshness criteria.

       Before exchanging the  contents,  the  program  must  call  the
       XFS_IOC_START_COMMIT   ioctl   to  sample  freshness  data  for
       file2_fd.  If the sampled metadata  does  not  match  the  file
       metadata  at  commit  time,  XFS_IOC_COMMIT_RANGE  will  return
       EBUSY.

       Exchanges are atomic with regards  to  concurrent  file  opera‐
       tions.   Implementations must guarantee that readers see either
       the old contents or the new contents in their entirety, even if
       the system fails.

       The  system  call  parameters are conveyed in structures of the
       following form:

           struct xfs_commit_range {
               __s32    file1_fd;
               __u32    pad;
               __u64    file1_offset;
               __u64    file2_offset;
               __u64    length;
               __u64    flags;
               __u64    file2_freshness[6];
           };

       The field pad must be zero.

       The fields file1_fd, file1_offset, and length define the  first
       range of bytes to be exchanged.

       The file descriptor file2_fd passed to ioctl(2), together with
       the fields file2_offset and length, defines the second range of
       bytes to be exchanged.

       The field file2_freshness is an opaque field whose contents are
       determined by the kernel.  These file attributes are used to
       confirm that file2_fd has not been changed by another thread
       since the current thread began staging its own update.

       Both  files must be from the same filesystem mount.  If the two
       file descriptors represent the same file, the byte ranges  must
       not  overlap.   Most  disk-based  filesystems  require that the
       starts of both ranges must be aligned to the file  block  size.
       If  this  is  the  case, the ends of the ranges must also be so
       aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.

       The field flags controls the behavior of the exchange operation.

           XFS_EXCHANGE_RANGE_TO_EOF
                  Ignore the length parameter.  All bytes in  file1_fd
                  from  file1_offset to EOF are moved to file2_fd, and
                  file2's size is set to  (file2_offset+(file1_length-
                  file1_offset)).   Meanwhile, all bytes in file2 from
                  file2_offset to EOF are moved to file1  and  file1's
                  size    is   set   to   (file1_offset+(file2_length-
                  file2_offset)).

           XFS_EXCHANGE_RANGE_DSYNC
                  Ensure that all modified in-core data in  both  file
                  ranges  and  all  metadata updates pertaining to the
                  exchange operation are flushed to persistent storage
                  before  the  call  returns.  Opening either file de‐
                  scriptor with O_SYNC or O_DSYNC will have  the  same
                  effect.

           XFS_EXCHANGE_RANGE_FILE1_WRITTEN
                  Only  exchange sub-ranges of file1_fd that are known
                  to contain data  written  by  application  software.
                  Each  sub-range  may  be  expanded (both upwards and
                  downwards) to align with the file  allocation  unit.
                  For files on the data device, this is one filesystem
                  block.  For files on the realtime  device,  this  is
                  the realtime extent size.  This facility can be used
                  to implement fast atomic  scatter-gather  writes  of
                  any  complexity for software-defined storage targets
                  if all writes are aligned  to  the  file  allocation
                  unit.

           XFS_EXCHANGE_RANGE_DRY_RUN
                  Check  the parameters and the feasibility of the op‐
                  eration, but do not change anything.

RETURN VALUE
       On error, -1 is returned, and errno is set to indicate the  er‐
       ror.

ERRORS
       Error  codes can be one of, but are not limited to, the follow‐
       ing:

       EBADF  file1_fd is not open for reading and writing or is  open
              for  append-only  writes;  or  file2_fd  is not open for
              reading and writing or is open for append-only writes.

       EBUSY  The file2 inode number and timestamps  supplied  do  not
              match file2_fd.

       EINVAL The  parameters  are  not correct for these files.  This
              error can also appear if either file  descriptor  repre‐
              sents  a device, FIFO, or socket.  Disk filesystems gen‐
              erally require the offset and  length  arguments  to  be
              aligned to the fundamental block sizes of both files.

       EIO    An I/O error occurred.

       EISDIR One of the files is a directory.

       ENOMEM The  kernel  was unable to allocate sufficient memory to
              perform the operation.

       ENOSPC There is not enough free space in the filesystem to
              exchange the contents safely.

       EOPNOTSUPP
              The filesystem does not support exchanging bytes between
              the two files.

       EPERM  file1_fd or file2_fd are immutable.

       ETXTBSY
              One of the files is a swap file.

       EUCLEAN
              The filesystem is corrupt.

       EXDEV  file1_fd and  file2_fd  are  not  on  the  same  mounted
              filesystem.

CONFORMING TO
       This API is XFS-specific.

USE CASES
       Several use cases are imagined for this system call.  Coordina‐
       tion between multiple threads is performed by the kernel.

       The first is a filesystem defragmenter, which copies  the  con‐
       tents  of  a  file into another file and wishes to exchange the
       space mappings of the two files,  provided  that  the  original
       file has not changed.

       An example program might look like this:

           int fd = open("/some/file", O_RDWR);
           int temp_fd = open("/some", O_TMPFILE | O_RDWR);
           struct stat sb;
           struct xfs_commit_range args = {
               .flags = XFS_EXCHANGE_RANGE_TO_EOF,
           };

           /* gather file2's freshness information */
           ioctl(fd, XFS_IOC_START_COMMIT, &args);
           fstat(fd, &sb);

           /* make a fresh copy of the file with terrible alignment to avoid reflink */
           copy_file_range(fd, NULL, temp_fd, NULL, 1, 0);
           copy_file_range(fd, NULL, temp_fd, NULL, sb.st_size - 1, 0);

           /* commit the entire update */
           args.file1_fd = temp_fd;
           int ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
           if (ret && errno == EBUSY)
               printf("file changed while defrag was underway\n");

       The second is a data storage program that wants to commit
       non-contiguous updates to a file atomically.  This program
       cannot coordinate updates to the file and therefore relies on
       the kernel to reject the COMMIT_RANGE command if the file has
       been updated by someone else.  This can be done by creating a
       temporary file, calling FICLONE(2) to share the contents, and
       staging the updates into the temporary file.  The
       XFS_EXCHANGE_RANGE_TO_EOF flag is recommended for this purpose.
       The temporary file can be deleted or punched out afterwards.

       An example program might look like this:

           int fd = open("/some/file", O_RDWR);
           int temp_fd = open("/some", O_TMPFILE | O_RDWR);
           struct xfs_commit_range args = {
               .flags = XFS_EXCHANGE_RANGE_TO_EOF,
           };

           /* gather file2's freshness information */
           ioctl(fd, XFS_IOC_START_COMMIT, &args);

           ioctl(temp_fd, FICLONE, fd);

           /* append 1MB of records */
           lseek(temp_fd, 0, SEEK_END);
           write(temp_fd, data1, 1000000);

           /* update record index */
           pwrite(temp_fd, data1, 600, 98765);
           pwrite(temp_fd, data2, 320, 54321);
           pwrite(temp_fd, data2, 15, 0);

           /* commit the entire update */
           args.file1_fd = temp_fd;
           int ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
           if (ret && errno == EBUSY)
               printf("file changed before commit; will roll back\n");

NOTES
       Some  filesystems may limit the amount of data or the number of
       extents that can be exchanged in a single call.

SEE ALSO
       ioctl(2)

XFS                           2024-02-18     IOCTL-XFS-COMMIT-RANGE(2)
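
As a worked illustration of the XFS_EXCHANGE_RANGE_TO_EOF size arithmetic in the manual page above, here is a minimal sketch in plain C. The names eof_sizes and exchange_to_eof_sizes are hypothetical and not part of the XFS API. The code only models the two size formulas and performs no I/O. Rejecting an offset beyond EOF is an assumption of this sketch, not documented kernel behavior.

```c
#include <stdint.h>

/* Hypothetical helper type: file sizes after a TO_EOF exchange. */
struct eof_sizes {
	uint64_t file1_size;
	uint64_t file2_size;
};

/*
 * Model of the size arithmetic from the manual page:
 *   new file2 size = file2_offset + (file1_length - file1_offset)
 *   new file1 size = file1_offset + (file2_length - file2_offset)
 * Returns 0 on success, or -1 if an offset lies beyond its file's EOF
 * (an assumption made here to keep the subtraction from wrapping).
 */
static int exchange_to_eof_sizes(uint64_t file1_length, uint64_t file1_offset,
				 uint64_t file2_length, uint64_t file2_offset,
				 struct eof_sizes *out)
{
	if (file1_offset > file1_length || file2_offset > file2_length)
		return -1;

	out->file2_size = file2_offset + (file1_length - file1_offset);
	out->file1_size = file1_offset + (file2_length - file2_offset);
	return 0;
}
```

With both offsets set to zero the two files simply trade sizes, which is the whole-file replacement case used in the example programs above.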

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-commits-6.12
---
Commits in this patchset:
 * xfs: introduce new file range commit ioctls
---
 fs/xfs/libxfs/xfs_fs.h |   26 +++++++++
 fs/xfs/xfs_exchrange.c |  143 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_exchrange.h |   16 +++++
 fs/xfs/xfs_ioctl.c     |    4 +
 fs/xfs/xfs_trace.h     |   57 +++++++++++++++++++
 5 files changed, 243 insertions(+), 3 deletions(-)



* [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
  2024-09-02 18:21 ` [PATCHSET v31.1 1/8] xfs: atomic file content commits Darrick J. Wong
@ 2024-09-02 18:21 ` Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 1/3] xfs: validate inumber in xfs_iget Darrick J. Wong
                     ` (2 more replies)
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:21 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs

Hi all,

Before we start adding code for metadata directory trees, let's clean up
some warts in the realtime bitmap code and the inode allocator code.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadir-cleanups-6.12
---
Commits in this patchset:
 * xfs: validate inumber in xfs_iget
 * xfs: match on the global RT inode numbers in xfs_is_metadata_inode
 * xfs: pass the icreate args object to xfs_dialloc
---
 fs/xfs/libxfs/xfs_ialloc.c |    5 +++--
 fs/xfs/libxfs/xfs_ialloc.h |    4 +++-
 fs/xfs/scrub/tempfile.c    |    2 +-
 fs/xfs/xfs_icache.c        |    2 +-
 fs/xfs/xfs_inode.c         |    4 ++--
 fs/xfs/xfs_inode.h         |    7 ++++---
 fs/xfs/xfs_qm.c            |    2 +-
 fs/xfs/xfs_symlink.c       |    2 +-
 8 files changed, 16 insertions(+), 12 deletions(-)



* [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
  2024-09-02 18:21 ` [PATCHSET v31.1 1/8] xfs: atomic file content commits Darrick J. Wong
  2024-09-02 18:21 ` [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories Darrick J. Wong
@ 2024-09-02 18:21 ` Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 01/12] xfs: remove xfs_validate_rtextents Darrick J. Wong
                     ` (11 more replies)
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                   ` (4 subsequent siblings)
  7 siblings, 12 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:21 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

Hi all,

Here are some cleanups and reorganization of the realtime bitmap code to share
more of that code between userspace and the kernel.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=rtbitmap-cleanups-6.12
---
Commits in this patchset:
 * xfs: remove xfs_validate_rtextents
 * xfs: factor out a xfs_validate_rt_geometry helper
 * xfs: make the RT rsum_cache mandatory
 * xfs: remove the limit argument to xfs_rtfind_back
 * xfs: assert a valid limit in xfs_rtfind_forw
 * xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf
 * xfs: cleanup the calling convention for xfs_rtpick_extent
 * xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc
 * xfs: factor out a xfs_growfs_rt_bmblock helper
 * xfs: factor out a xfs_last_rt_bmblock helper
 * xfs: factor out rtbitmap/summary initialization helpers
 * xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
---
 fs/xfs/libxfs/xfs_bmap.c     |    3 
 fs/xfs/libxfs/xfs_rtbitmap.c |  192 ++++++++++++++-
 fs/xfs/libxfs/xfs_rtbitmap.h |   33 +--
 fs/xfs/libxfs/xfs_sb.c       |   64 +++--
 fs/xfs/libxfs/xfs_sb.h       |    1 
 fs/xfs/libxfs/xfs_types.h    |   12 -
 fs/xfs/xfs_rtalloc.c         |  535 +++++++++++++++++-------------------------
 7 files changed, 438 insertions(+), 402 deletions(-)



* [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
                   ` (2 preceding siblings ...)
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
@ 2024-09-02 18:21 ` Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 01/10] xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock Darrick J. Wong
                     ` (9 more replies)
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                   ` (3 subsequent siblings)
  7 siblings, 10 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:21 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

Hi all,

While I was reviewing how to integrate realtime allocation groups with
the rt allocator, I noticed several bugs in the existing allocation code
with regards to calculating the maximum range of rtx to scan for free
space.  This series fixes those range bugs and cleans up a few things
too.

I also added a few cleanups from Christoph.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=rtalloc-fixes-6.12
---
Commits in this patchset:
 * xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
 * xfs: ensure rtx mask/shift are correct after growfs
 * xfs: don't return too-short extents from xfs_rtallocate_extent_block
 * xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
 * xfs: refactor aligning bestlen to prod
 * xfs: clean up xfs_rtallocate_extent_exact a bit
 * xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
 * xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
 * xfs: remove xfs_rtb_to_rtxrem
 * xfs: simplify xfs_rtalloc_query_range
---
 fs/xfs/libxfs/xfs_rtbitmap.c |   51 ++++++---------
 fs/xfs/libxfs/xfs_rtbitmap.h |   21 ------
 fs/xfs/libxfs/xfs_sb.c       |   12 +++
 fs/xfs/libxfs/xfs_sb.h       |    2 +
 fs/xfs/xfs_discard.c         |   15 ++--
 fs/xfs/xfs_fsmap.c           |   11 +--
 fs/xfs/xfs_rtalloc.c         |  145 +++++++++++++++++++++++-------------------
 7 files changed, 124 insertions(+), 133 deletions(-)



* [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
                   ` (3 preceding siblings ...)
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
@ 2024-09-02 18:22 ` Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 01/10] xfs: clean up the ISVALID macro in xfs_bmap_adjacent Darrick J. Wong
                     ` (9 more replies)
  2024-09-02 18:22 ` [PATCHSET v4.2 6/8] xfs: cleanups for quota mount Darrick J. Wong
                   ` (2 subsequent siblings)
  7 siblings, 10 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:22 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

Hi all,

This third series cleans up the realtime allocator code so that it'll be
somewhat less difficult to figure out what on earth it's doing.  We also
rearrange the fsmap code a bit.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=rtalloc-cleanups-6.12
---
Commits in this patchset:
 * xfs: clean up the ISVALID macro in xfs_bmap_adjacent
 * xfs: factor out a xfs_rtallocate helper
 * xfs: rework the rtalloc fallback handling
 * xfs: factor out a xfs_rtallocate_align helper
 * xfs: make the rtalloc start hint a xfs_rtblock_t
 * xfs: add xchk_setup_nothing and xchk_nothing helpers
 * xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
 * xfs: replace m_rsumsize with m_rsumblocks
 * xfs: rearrange xfs_fsmap.c a little bit
 * xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c
---
 fs/xfs/libxfs/xfs_bmap.c        |   55 +++--
 fs/xfs/libxfs/xfs_rtbitmap.c    |   33 ---
 fs/xfs/libxfs/xfs_rtbitmap.h    |    7 -
 fs/xfs/libxfs/xfs_trans_resv.c  |    2 
 fs/xfs/scrub/common.h           |   29 +--
 fs/xfs/scrub/rtsummary.c        |   11 -
 fs/xfs/scrub/rtsummary.h        |    2 
 fs/xfs/scrub/rtsummary_repair.c |   12 -
 fs/xfs/scrub/scrub.h            |   29 +--
 fs/xfs/xfs_fsmap.c              |  402 ++++++++++++++++++++++++++-------------
 fs/xfs/xfs_fsmap.h              |    6 -
 fs/xfs/xfs_ioctl.c              |  130 -------------
 fs/xfs/xfs_mount.h              |    2 
 fs/xfs/xfs_rtalloc.c            |  246 ++++++++++++++----------
 14 files changed, 477 insertions(+), 489 deletions(-)



* [PATCHSET v4.2 6/8] xfs: cleanups for quota mount
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
                   ` (4 preceding siblings ...)
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
@ 2024-09-02 18:22 ` Darrick J. Wong
  2024-09-02 18:32   ` [PATCH 1/1] xfs: refactor loading quota inodes in the regular case Darrick J. Wong
  2024-09-02 18:22 ` [PATCHSET 7/8] xfs: various bug fixes for 6.12 Darrick J. Wong
  2024-09-02 18:22 ` [PATCHSET v4.2 8/8] xfs: cleanups for inode rooted btree code Darrick J. Wong
  7 siblings, 1 reply; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:22 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

Hi all,

Refactor the quota file loading code in preparation for adding metadata
directory trees.  Did you know that quotarm works even when quota isn't active?

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=quota-cleanups-6.12
---
Commits in this patchset:
 * xfs: refactor loading quota inodes in the regular case
---
 fs/xfs/xfs_qm.c          |   46 +++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_qm.h          |    3 +++
 fs/xfs/xfs_qm_syscalls.c |   13 +++++------
 fs/xfs/xfs_quotaops.c    |   53 +++++++++++++++++++++++++++-------------------
 4 files changed, 80 insertions(+), 35 deletions(-)



* [PATCHSET 7/8] xfs: various bug fixes for 6.12
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
                   ` (5 preceding siblings ...)
  2024-09-02 18:22 ` [PATCHSET v4.2 6/8] xfs: cleanups for quota mount Darrick J. Wong
@ 2024-09-02 18:22 ` Darrick J. Wong
  2024-09-02 18:32   ` [PATCH 1/3] xfs: fix C++ compilation errors in xfs_fs.h Darrick J. Wong
                     ` (2 more replies)
  2024-09-02 18:22 ` [PATCHSET v4.2 8/8] xfs: cleanups for inode rooted btree code Darrick J. Wong
  7 siblings, 3 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:22 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, kernel, sam, linux-xfs

Hi all,

Various bug fixes for 6.12.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-fixes-6.12
---
Commits in this patchset:
 * xfs: fix C++ compilation errors in xfs_fs.h
 * xfs: fix FITRIM reporting again
 * xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
---
 fs/xfs/libxfs/xfs_fs.h         |    5 +++--
 fs/xfs/libxfs/xfs_inode_fork.c |   10 +++++-----
 fs/xfs/xfs_discard.c           |    2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)



* [PATCHSET v4.2 8/8] xfs: cleanups for inode rooted btree code
  2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
                   ` (6 preceding siblings ...)
  2024-09-02 18:22 ` [PATCHSET 7/8] xfs: various bug fixes for 6.12 Darrick J. Wong
@ 2024-09-02 18:22 ` Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 1/2] xfs: replace shouty XFS_BM{BT,DR} macros Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 2/2] xfs: standardize the btree maxrecs function parameters Darrick J. Wong
  7 siblings, 2 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:22 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

Hi all,

This series prepares the btree code to support realtime reverse mapping btrees
by refactoring xfs_ifork_realloc to be fed a per-btree ops structure so that it
can handle multiple types of inode-rooted btrees.  It moves on to refactoring
the btree code to use the new realloc routines.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=btree-cleanups-6.12
---
Commits in this patchset:
 * xfs: replace shouty XFS_BM{BT,DR} macros
 * xfs: standardize the btree maxrecs function parameters
---
 fs/xfs/libxfs/xfs_alloc_btree.c    |    6 +
 fs/xfs/libxfs/xfs_alloc_btree.h    |    3 -
 fs/xfs/libxfs/xfs_attr_leaf.c      |    8 +
 fs/xfs/libxfs/xfs_bmap.c           |   42 ++++---
 fs/xfs/libxfs/xfs_bmap_btree.c     |   24 ++--
 fs/xfs/libxfs/xfs_bmap_btree.h     |  207 +++++++++++++++++++++++++-----------
 fs/xfs/libxfs/xfs_ialloc.c         |    4 -
 fs/xfs/libxfs/xfs_ialloc_btree.c   |    6 +
 fs/xfs/libxfs/xfs_ialloc_btree.h   |    3 -
 fs/xfs/libxfs/xfs_inode_fork.c     |   34 +++---
 fs/xfs/libxfs/xfs_refcount_btree.c |    5 +
 fs/xfs/libxfs/xfs_refcount_btree.h |    3 -
 fs/xfs/libxfs/xfs_rmap_btree.c     |    7 +
 fs/xfs/libxfs/xfs_rmap_btree.h     |    3 -
 fs/xfs/libxfs/xfs_sb.c             |   16 +--
 fs/xfs/libxfs/xfs_trans_resv.c     |    2 
 fs/xfs/scrub/bmap_repair.c         |    2 
 fs/xfs/scrub/inode_repair.c        |   12 +-
 fs/xfs/xfs_bmap_util.c             |    4 -
 19 files changed, 237 insertions(+), 154 deletions(-)



* [PATCH 1/1] xfs: introduce new file range commit ioctls
  2024-09-02 18:21 ` [PATCHSET v31.1 1/8] xfs: atomic file content commits Darrick J. Wong
@ 2024-09-02 18:23   ` Darrick J. Wong
  2024-09-03  7:52     ` Christian Brauner
  0 siblings, 1 reply; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:23 UTC (permalink / raw)
  To: chandanbabu, djwong
  Cc: Jeff Layton, Christoph Hellwig, linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This patch introduces two more new ioctls to manage atomic updates to
file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE.  The
commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE
does, but with the additional requirement that file2 cannot have changed
since some sampling point.  The start-commit ioctl performs the sampling
of file attributes.

Note: This patch currently samples i_ctime during START_COMMIT and
checks that it hasn't changed during COMMIT_RANGE.  This isn't entirely
safe in kernels prior to 6.12 because ctime only had coarse grained
granularity and very fast updates could collide with a COMMIT_RANGE.
With the multi-granularity ctime introduced by Jeff Layton, it's now
possible to update ctime such that this does not happen.

It is critical, then, that this patch must not be backported to any
kernel that does not support fine-grained file change timestamps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_fs.h |   26 +++++++++
 fs/xfs/xfs_exchrange.c |  143 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_exchrange.h |   16 +++++
 fs/xfs/xfs_ioctl.c     |    4 +
 fs/xfs/xfs_trace.h     |   57 +++++++++++++++++++
 5 files changed, 243 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 454b63ef7201..c85c8077fac3 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -825,6 +825,30 @@ struct xfs_exchange_range {
 	__u64		flags;		/* see XFS_EXCHANGE_RANGE_* below */
 };
 
+/*
+ * Using the same definition of file2 as struct xfs_exchange_range, commit the
+ * contents of file1 into file2 if file2 has the same inode number, mtime, and
+ * ctime as the arguments provided to the call.  The old contents of file2 will
+ * be moved to file1.
+ *
+ * Returns -EBUSY if there isn't an exact match for the file2 fields.
+ *
+ * Filesystems must be able to restart and complete the operation even after
+ * the system goes down.
+ */
+struct xfs_commit_range {
+	__s32		file1_fd;
+	__u32		pad;		/* must be zeroes */
+	__u64		file1_offset;	/* file1 offset, bytes */
+	__u64		file2_offset;	/* file2 offset, bytes */
+	__u64		length;		/* bytes to exchange */
+
+	__u64		flags;		/* see XFS_EXCHANGE_RANGE_* below */
+
+	/* opaque file2 metadata for freshness checks */
+	__u64		file2_freshness[6];
+};
+
 /*
  * Exchange file data all the way to the ends of both files, and then exchange
  * the file sizes.  This flag can be used to replace a file's contents with a
@@ -997,6 +1021,8 @@ struct xfs_getparents_by_handle {
 #define XFS_IOC_BULKSTAT	     _IOR ('X', 127, struct xfs_bulkstat_req)
 #define XFS_IOC_INUMBERS	     _IOR ('X', 128, struct xfs_inumbers_req)
 #define XFS_IOC_EXCHANGE_RANGE	     _IOW ('X', 129, struct xfs_exchange_range)
+#define XFS_IOC_START_COMMIT	     _IOR ('X', 130, struct xfs_commit_range)
+#define XFS_IOC_COMMIT_RANGE	     _IOW ('X', 131, struct xfs_commit_range)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
 
diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
index c8a655c92c92..d0889190ab7f 100644
--- a/fs/xfs/xfs_exchrange.c
+++ b/fs/xfs/xfs_exchrange.c
@@ -72,6 +72,34 @@ xfs_exchrange_estimate(
 	return error;
 }
 
+/*
+ * Check that file2's metadata agree with the snapshot that we took for the
+ * range commit request.
+ *
+ * This should be called after the filesystem has locked /all/ inode metadata
+ * against modification.
+ */
+STATIC int
+xfs_exchrange_check_freshness(
+	const struct xfs_exchrange	*fxr,
+	struct xfs_inode		*ip2)
+{
+	struct inode			*inode2 = VFS_I(ip2);
+	struct timespec64		ctime = inode_get_ctime(inode2);
+	struct timespec64		mtime = inode_get_mtime(inode2);
+
+	trace_xfs_exchrange_freshness(fxr, ip2);
+
+	/* Check that file2 hasn't otherwise been modified. */
+	if (fxr->file2_ino != ip2->i_ino ||
+	    fxr->file2_gen != inode2->i_generation ||
+	    !timespec64_equal(&fxr->file2_ctime, &ctime) ||
+	    !timespec64_equal(&fxr->file2_mtime, &mtime))
+		return -EBUSY;
+
+	return 0;
+}
+
 #define QRETRY_IP1	(0x1)
 #define QRETRY_IP2	(0x2)
 
@@ -607,6 +635,12 @@ xfs_exchrange_prep(
 	if (error || fxr->length == 0)
 		return error;
 
+	if (fxr->flags & __XFS_EXCHANGE_RANGE_CHECK_FRESH2) {
+		error = xfs_exchrange_check_freshness(fxr, ip2);
+		if (error)
+			return error;
+	}
+
 	/* Attach dquots to both inodes before changing block maps. */
 	error = xfs_qm_dqattach(ip2);
 	if (error)
@@ -719,7 +753,8 @@ xfs_exchange_range(
 	if (fxr->file1->f_path.mnt != fxr->file2->f_path.mnt)
 		return -EXDEV;
 
-	if (fxr->flags & ~XFS_EXCHANGE_RANGE_ALL_FLAGS)
+	if (fxr->flags & ~(XFS_EXCHANGE_RANGE_ALL_FLAGS |
+			 __XFS_EXCHANGE_RANGE_CHECK_FRESH2))
 		return -EINVAL;
 
 	/* Userspace requests only honored for regular files. */
@@ -802,3 +837,109 @@ xfs_ioc_exchange_range(
 	fdput(file1);
 	return error;
 }
+
+/* Opaque freshness blob for XFS_IOC_COMMIT_RANGE */
+struct xfs_commit_range_fresh {
+	xfs_fsid_t	fsid;		/* m_fixedfsid */
+	__u64		file2_ino;	/* inode number */
+	__s64		file2_mtime;	/* modification time */
+	__s64		file2_ctime;	/* change time */
+	__s32		file2_mtime_nsec; /* mod time, nsec */
+	__s32		file2_ctime_nsec; /* change time, nsec */
+	__u32		file2_gen;	/* inode generation */
+	__u32		magic;		/* XCR_FRESH_MAGIC */
+};
+#define XCR_FRESH_MAGIC	0x444F524B	/* DORK */
+
+/* Set up a commitrange operation by sampling file2's write-related attrs */
+long
+xfs_ioc_start_commit(
+	struct file			*file,
+	struct xfs_commit_range __user	*argp)
+{
+	struct xfs_commit_range		args = { };
+	struct timespec64		ts;
+	struct xfs_commit_range_fresh	*kern_f;
+	struct xfs_commit_range_fresh	__user *user_f;
+	struct inode			*inode2 = file_inode(file);
+	struct xfs_inode		*ip2 = XFS_I(inode2);
+	const unsigned int		lockflags = XFS_IOLOCK_SHARED |
+						    XFS_MMAPLOCK_SHARED |
+						    XFS_ILOCK_SHARED;
+
+	BUILD_BUG_ON(sizeof(struct xfs_commit_range_fresh) !=
+		     sizeof(args.file2_freshness));
+
+	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
+
+	memcpy(&kern_f->fsid, ip2->i_mount->m_fixedfsid, sizeof(xfs_fsid_t));
+
+	xfs_ilock(ip2, lockflags);
+	ts = inode_get_ctime(inode2);
+	kern_f->file2_ctime		= ts.tv_sec;
+	kern_f->file2_ctime_nsec	= ts.tv_nsec;
+	ts = inode_get_mtime(inode2);
+	kern_f->file2_mtime		= ts.tv_sec;
+	kern_f->file2_mtime_nsec	= ts.tv_nsec;
+	kern_f->file2_ino		= ip2->i_ino;
+	kern_f->file2_gen		= inode2->i_generation;
+	kern_f->magic			= XCR_FRESH_MAGIC;
+	xfs_iunlock(ip2, lockflags);
+
+	user_f = (struct xfs_commit_range_fresh __user *)&argp->file2_freshness;
+	if (copy_to_user(user_f, kern_f, sizeof(*kern_f)))
+		return -EFAULT;
+
+	return 0;
+}
+
+/*
+ * Exchange file1 and file2 contents if file2 has not been written since the
+ * start commit operation.
+ */
+long
+xfs_ioc_commit_range(
+	struct file			*file,
+	struct xfs_commit_range __user	*argp)
+{
+	struct xfs_exchrange		fxr = {
+		.file2			= file,
+	};
+	struct xfs_commit_range		args;
+	struct xfs_commit_range_fresh	*kern_f;
+	struct xfs_inode		*ip2 = XFS_I(file_inode(file));
+	struct xfs_mount		*mp = ip2->i_mount;
+	struct fd			file1;
+	int				error;
+
+	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+	if (args.flags & ~XFS_EXCHANGE_RANGE_ALL_FLAGS)
+		return -EINVAL;
+	if (kern_f->magic != XCR_FRESH_MAGIC)
+		return -EBUSY;
+	if (memcmp(&kern_f->fsid, mp->m_fixedfsid, sizeof(xfs_fsid_t)))
+		return -EBUSY;
+
+	fxr.file1_offset	= args.file1_offset;
+	fxr.file2_offset	= args.file2_offset;
+	fxr.length		= args.length;
+	fxr.flags		= args.flags | __XFS_EXCHANGE_RANGE_CHECK_FRESH2;
+	fxr.file2_ino		= kern_f->file2_ino;
+	fxr.file2_gen		= kern_f->file2_gen;
+	fxr.file2_mtime.tv_sec	= kern_f->file2_mtime;
+	fxr.file2_mtime.tv_nsec	= kern_f->file2_mtime_nsec;
+	fxr.file2_ctime.tv_sec	= kern_f->file2_ctime;
+	fxr.file2_ctime.tv_nsec	= kern_f->file2_ctime_nsec;
+
+	file1 = fdget(args.file1_fd);
+	if (!file1.file)
+		return -EBADF;
+	fxr.file1 = file1.file;
+
+	error = xfs_exchange_range(&fxr);
+	fdput(file1);
+	return error;
+}
diff --git a/fs/xfs/xfs_exchrange.h b/fs/xfs/xfs_exchrange.h
index 039abcca546e..bc1298aba806 100644
--- a/fs/xfs/xfs_exchrange.h
+++ b/fs/xfs/xfs_exchrange.h
@@ -10,8 +10,12 @@
 #define __XFS_EXCHANGE_RANGE_UPD_CMTIME1	(1ULL << 63)
 #define __XFS_EXCHANGE_RANGE_UPD_CMTIME2	(1ULL << 62)
 
+/* Freshness check required */
+#define __XFS_EXCHANGE_RANGE_CHECK_FRESH2	(1ULL << 61)
+
 #define XFS_EXCHANGE_RANGE_PRIV_FLAGS	(__XFS_EXCHANGE_RANGE_UPD_CMTIME1 | \
-					 __XFS_EXCHANGE_RANGE_UPD_CMTIME2)
+					 __XFS_EXCHANGE_RANGE_UPD_CMTIME2 | \
+					 __XFS_EXCHANGE_RANGE_CHECK_FRESH2)
 
 struct xfs_exchrange {
 	struct file		*file1;
@@ -22,10 +26,20 @@ struct xfs_exchrange {
 	u64			length;
 
 	u64			flags;	/* XFS_EXCHANGE_RANGE flags */
+
+	/* file2 metadata for freshness checks */
+	u64			file2_ino;
+	struct timespec64	file2_mtime;
+	struct timespec64	file2_ctime;
+	u32			file2_gen;
 };
 
 long xfs_ioc_exchange_range(struct file *file,
 		struct xfs_exchange_range __user *argp);
+long xfs_ioc_start_commit(struct file *file,
+		struct xfs_commit_range __user *argp);
+long xfs_ioc_commit_range(struct file *file,
+		struct xfs_commit_range __user *argp);
 
 struct xfs_exchmaps_req;
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6b13666d4e96..90b3ee21e7fe 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1518,6 +1518,10 @@ xfs_file_ioctl(
 
 	case XFS_IOC_EXCHANGE_RANGE:
 		return xfs_ioc_exchange_range(filp, arg);
+	case XFS_IOC_START_COMMIT:
+		return xfs_ioc_start_commit(filp, arg);
+	case XFS_IOC_COMMIT_RANGE:
+		return xfs_ioc_commit_range(filp, arg);
 
 	default:
 		return -ENOTTY;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 180ce697305a..4cf0fa71ba9c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4926,7 +4926,8 @@ DEFINE_INODE_ERROR_EVENT(xfs_exchrange_error);
 	{ XFS_EXCHANGE_RANGE_DRY_RUN,		"DRY_RUN" }, \
 	{ XFS_EXCHANGE_RANGE_FILE1_WRITTEN,	"F1_WRITTEN" }, \
 	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME1,	"CMTIME1" }, \
-	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME2,	"CMTIME2" }
+	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME2,	"CMTIME2" }, \
+	{ __XFS_EXCHANGE_RANGE_CHECK_FRESH2,	"FRESH2" }
 
 /* file exchange-range tracepoint class */
 DECLARE_EVENT_CLASS(xfs_exchrange_class,
@@ -4986,6 +4987,60 @@ DEFINE_EXCHRANGE_EVENT(xfs_exchrange_prep);
 DEFINE_EXCHRANGE_EVENT(xfs_exchrange_flush);
 DEFINE_EXCHRANGE_EVENT(xfs_exchrange_mappings);
 
+TRACE_EVENT(xfs_exchrange_freshness,
+	TP_PROTO(const struct xfs_exchrange *fxr, struct xfs_inode *ip2),
+	TP_ARGS(fxr, ip2),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ip2_ino)
+		__field(long long, ip2_mtime)
+		__field(long long, ip2_ctime)
+		__field(int, ip2_mtime_nsec)
+		__field(int, ip2_ctime_nsec)
+
+		__field(xfs_ino_t, file2_ino)
+		__field(long long, file2_mtime)
+		__field(long long, file2_ctime)
+		__field(int, file2_mtime_nsec)
+		__field(int, file2_ctime_nsec)
+	),
+	TP_fast_assign(
+		struct timespec64	ts64;
+		struct inode		*inode2 = VFS_I(ip2);
+
+		__entry->dev = inode2->i_sb->s_dev;
+		__entry->ip2_ino = ip2->i_ino;
+
+		ts64 = inode_get_ctime(inode2);
+		__entry->ip2_ctime = ts64.tv_sec;
+		__entry->ip2_ctime_nsec = ts64.tv_nsec;
+
+		ts64 = inode_get_mtime(inode2);
+		__entry->ip2_mtime = ts64.tv_sec;
+		__entry->ip2_mtime_nsec = ts64.tv_nsec;
+
+		__entry->file2_ino = fxr->file2_ino;
+		__entry->file2_mtime = fxr->file2_mtime.tv_sec;
+		__entry->file2_ctime = fxr->file2_ctime.tv_sec;
+		__entry->file2_mtime_nsec = fxr->file2_mtime.tv_nsec;
+		__entry->file2_ctime_nsec = fxr->file2_ctime.tv_nsec;
+	),
+	TP_printk("dev %d:%d "
+		  "ino 0x%llx mtime %lld:%d ctime %lld:%d -> "
+		  "file 0x%llx mtime %lld:%d ctime %lld:%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ip2_ino,
+		  __entry->ip2_mtime,
+		  __entry->ip2_mtime_nsec,
+		  __entry->ip2_ctime,
+		  __entry->ip2_ctime_nsec,
+		  __entry->file2_ino,
+		  __entry->file2_mtime,
+		  __entry->file2_mtime_nsec,
+		  __entry->file2_ctime,
+		  __entry->file2_ctime_nsec)
+);
+
 TRACE_EVENT(xfs_exchmaps_overhead,
 	TP_PROTO(struct xfs_mount *mp, unsigned long long bmbt_blocks,
 		 unsigned long long rmapbt_blocks),


^ permalink raw reply related	[flat|nested] 53+ messages in thread
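
For anyone reading the commit-range patch above without a kernel tree at hand, the freshness comparison can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct snap and snap_check_fresh() are hypothetical names, not kernel symbols, and the real xfs_exchrange_check_freshness() runs with the inode locked against modification, which this model does not attempt to show.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the file2 attributes sampled at START_COMMIT. */
struct snap {
	uint64_t	ino;		/* inode number */
	uint32_t	gen;		/* inode generation */
	int64_t		mtime_sec;	/* modification time */
	int32_t		mtime_nsec;
	int64_t		ctime_sec;	/* change time */
	int32_t		ctime_nsec;
};

/* Compare one (seconds, nanoseconds) pair for exact equality. */
static bool
snap_time_equal(int64_t sec1, int32_t nsec1, int64_t sec2, int32_t nsec2)
{
	return sec1 == sec2 && nsec1 == nsec2;
}

/*
 * Return 0 if the current attributes still match the sample exactly,
 * or -EBUSY if anything differs.
 */
static int
snap_check_fresh(const struct snap *sampled, const struct snap *now)
{
	if (sampled->ino != now->ino ||
	    sampled->gen != now->gen ||
	    !snap_time_equal(sampled->ctime_sec, sampled->ctime_nsec,
			     now->ctime_sec, now->ctime_nsec) ||
	    !snap_time_equal(sampled->mtime_sec, sampled->mtime_nsec,
			     now->mtime_sec, now->mtime_nsec))
		return -EBUSY;

	return 0;
}
```

The comparison is deliberately exact: any change to ctime or mtime, even by one nanosecond, is treated as "file2 was modified" and the commit is refused.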

* [PATCH 1/3] xfs: validate inumber in xfs_iget
  2024-09-02 18:21 ` [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories Darrick J. Wong
@ 2024-09-02 18:23   ` Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 2/3] xfs: match on the global RT inode numbers in xfs_is_metadata_inode Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 3/3] xfs: pass the icreate args object to xfs_dialloc Darrick J. Wong
  2 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:23 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, Dave Chinner, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Actually use the inumber validator to check the argument passed in here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index cf629302d48e..887d2a01161e 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -755,7 +755,7 @@ xfs_iget(
 	ASSERT((lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) == 0);
 
 	/* reject inode numbers outside existing AGs */
-	if (!ino || XFS_INO_TO_AGNO(mp, ino) >= mp->m_sb.sb_agcount)
+	if (!xfs_verify_ino(mp, ino))
 		return -EINVAL;
 
 	XFS_STATS_INC(mp, xs_ig_attempts);


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/3] xfs: match on the global RT inode numbers in xfs_is_metadata_inode
  2024-09-02 18:21 ` [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 1/3] xfs: validate inumber in xfs_iget Darrick J. Wong
@ 2024-09-02 18:23   ` Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 3/3] xfs: pass the icreate args object to xfs_dialloc Darrick J. Wong
  2 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:23 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Match the inode number instead of the inode pointers, as the inode
pointers in the superblock will go away soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: port to my tree, make the parameter a const pointer]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 51defdebef30..1908409968db 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -276,12 +276,13 @@ static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
 	return ip->i_diflags2 & XFS_DIFLAG2_REFLINK;
 }
 
-static inline bool xfs_is_metadata_inode(struct xfs_inode *ip)
+static inline bool xfs_is_metadata_inode(const struct xfs_inode *ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 
-	return ip == mp->m_rbmip || ip == mp->m_rsumip ||
-		xfs_is_quota_inode(&mp->m_sb, ip->i_ino);
+	return ip->i_ino == mp->m_sb.sb_rbmino ||
+	       ip->i_ino == mp->m_sb.sb_rsumino ||
+	       xfs_is_quota_inode(&mp->m_sb, ip->i_ino);
 }
 
 bool xfs_is_always_cow_inode(struct xfs_inode *ip);


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 3/3] xfs: pass the icreate args object to xfs_dialloc
  2024-09-02 18:21 ` [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 1/3] xfs: validate inumber in xfs_iget Darrick J. Wong
  2024-09-02 18:23   ` [PATCH 2/3] xfs: match on the global RT inode numbers in xfs_is_metadata_inode Darrick J. Wong
@ 2024-09-02 18:23   ` Darrick J. Wong
  2 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:23 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass the xfs_icreate_args object to xfs_dialloc since we can extract the
relevant mode (really just the file type) and parent inumber from there.
This simplifies the calling convention in preparation for the next
patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
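
Note for readers: the two values that xfs_dialloc now derives from the args object can be sketched in userspace as below. The struct and function names are hypothetical stand-ins, and the MODEL_S_* constants are assumed to mirror the usual Linux S_IFMT, S_IFDIR and S_IFREG values; none of this is the kernel code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the usual Linux file type bits. */
#define MODEL_S_IFMT	0170000u
#define MODEL_S_IFDIR	0040000u
#define MODEL_S_IFREG	0100000u

/* Hypothetical stand-ins for struct xfs_inode and struct xfs_icreate_args. */
struct model_inode {
	uint64_t			ino;
};

struct model_icreate_args {
	const struct model_inode	*pip;	/* parent, may be NULL */
	unsigned int			mode;	/* type and permission bits */
};

/* Parent inode number, or 0 when there is no parent (e.g. quota inodes). */
static uint64_t
model_dialloc_parent(const struct model_icreate_args *args)
{
	return args->pip ? args->pip->ino : 0;
}

/* Only the file type bits matter to the allocator. */
static unsigned int
model_dialloc_type(const struct model_icreate_args *args)
{
	return args->mode & MODEL_S_IFMT;
}
```

A caller that previously passed (0, S_IFREG) corresponds here to an args object with a NULL parent pointer and a regular-file mode.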
 fs/xfs/libxfs/xfs_ialloc.c |    5 +++--
 fs/xfs/libxfs/xfs_ialloc.h |    4 +++-
 fs/xfs/scrub/tempfile.c    |    2 +-
 fs/xfs/xfs_inode.c         |    4 ++--
 fs/xfs/xfs_qm.c            |    2 +-
 fs/xfs/xfs_symlink.c       |    2 +-
 6 files changed, 11 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 0af5b7a33d05..fc70601e8d8e 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1855,11 +1855,12 @@ xfs_dialloc_try_ag(
 int
 xfs_dialloc(
 	struct xfs_trans	**tpp,
-	xfs_ino_t		parent,
-	umode_t			mode,
+	const struct xfs_icreate_args *args,
 	xfs_ino_t		*new_ino)
 {
 	struct xfs_mount	*mp = (*tpp)->t_mountp;
+	xfs_ino_t		parent = args->pip ? args->pip->i_ino : 0;
+	umode_t			mode = args->mode & S_IFMT;
 	xfs_agnumber_t		agno;
 	int			error = 0;
 	xfs_agnumber_t		start_agno;
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index b549627e3a61..3a1323155a45 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -33,11 +33,13 @@ xfs_make_iptr(struct xfs_mount *mp, struct xfs_buf *b, int o)
 	return xfs_buf_offset(b, o << (mp)->m_sb.sb_inodelog);
 }
 
+struct xfs_icreate_args;
+
 /*
  * Allocate an inode on disk.  Mode is used to tell whether the new inode will
  * need space, and whether it is a directory.
  */
-int xfs_dialloc(struct xfs_trans **tpp, xfs_ino_t parent, umode_t mode,
+int xfs_dialloc(struct xfs_trans **tpp, const struct xfs_icreate_args *args,
 		xfs_ino_t *new_ino);
 
 int xfs_difree(struct xfs_trans *tp, struct xfs_perag *pag,
diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
index d390d56cd875..177f922acfaf 100644
--- a/fs/xfs/scrub/tempfile.c
+++ b/fs/xfs/scrub/tempfile.c
@@ -88,7 +88,7 @@ xrep_tempfile_create(
 		goto out_release_dquots;
 
 	/* Allocate inode, set up directory. */
-	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
+	error = xfs_dialloc(&tp, &args, &ino);
 	if (error)
 		goto out_trans_cancel;
 	error = xfs_icreate(tp, ino, &args, &sc->tempip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7dc6f326936c..9ea7a18f5da1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -704,7 +704,7 @@ xfs_create(
 	 * entry pointing to them, but a directory also the "." entry
 	 * pointing to itself.
 	 */
-	error = xfs_dialloc(&tp, dp->i_ino, args->mode, &ino);
+	error = xfs_dialloc(&tp, args, &ino);
 	if (!error)
 		error = xfs_icreate(tp, ino, args, &du.ip);
 	if (error)
@@ -812,7 +812,7 @@ xfs_create_tmpfile(
 	if (error)
 		goto out_release_dquots;
 
-	error = xfs_dialloc(&tp, dp->i_ino, args->mode, &ino);
+	error = xfs_dialloc(&tp, args, &ino);
 	if (!error)
 		error = xfs_icreate(tp, ino, args, &ip);
 	if (error)
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 9490b913a4ab..63f6ca2db251 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -799,7 +799,7 @@ xfs_qm_qino_alloc(
 		};
 		xfs_ino_t	ino;
 
-		error = xfs_dialloc(&tp, 0, S_IFREG, &ino);
+		error = xfs_dialloc(&tp, &args, &ino);
 		if (!error)
 			error = xfs_icreate(tp, ino, &args, ipp);
 		if (error) {
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 77f19e2f66e0..4252b07cd251 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -165,7 +165,7 @@ xfs_symlink(
 	/*
 	 * Allocate an inode for the symlink.
 	 */
-	error = xfs_dialloc(&tp, dp->i_ino, S_IFLNK, &ino);
+	error = xfs_dialloc(&tp, &args, &ino);
 	if (!error)
 		error = xfs_icreate(tp, ino, &args, &du.ip);
 	if (error)


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 01/12] xfs: remove xfs_validate_rtextents
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
@ 2024-09-02 18:24   ` Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 02/12] xfs: factor out a xfs_validate_rt_geometry helper Darrick J. Wong
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:24 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Replace xfs_validate_rtextents with an open coded check for 0
rtextents.  The function name implies that it does a lot more than a
zero check, so the check is more obvious when open coded.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_sb.c    |    2 +-
 fs/xfs/libxfs/xfs_types.h |   12 ------------
 fs/xfs/xfs_rtalloc.c      |    2 +-
 3 files changed, 2 insertions(+), 14 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 6b56f0f6d4c1..f2fb6035fd21 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -514,7 +514,7 @@ xfs_validate_sb_common(
 		rbmblocks = howmany_64(sbp->sb_rextents,
 				       NBBY * sbp->sb_blocksize);
 
-		if (!xfs_validate_rtextents(rexts) ||
+		if (sbp->sb_rextents == 0 ||
 		    sbp->sb_rextents != rexts ||
 		    sbp->sb_rextslog != xfs_compute_rextslog(rexts) ||
 		    sbp->sb_rbmblocks != rbmblocks) {
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 76eb9e328835..a8cd44d03ef6 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -235,16 +235,4 @@ bool xfs_verify_fileoff(struct xfs_mount *mp, xfs_fileoff_t off);
 bool xfs_verify_fileext(struct xfs_mount *mp, xfs_fileoff_t off,
 		xfs_fileoff_t len);
 
-/* Do we support an rt volume having this number of rtextents? */
-static inline bool
-xfs_validate_rtextents(
-	xfs_rtbxlen_t		rtextents)
-{
-	/* No runt rt volumes */
-	if (rtextents == 0)
-		return false;
-
-	return true;
-}
-
 #endif	/* __XFS_TYPES_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index ebeab8e4dab1..d28395abdd02 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -903,7 +903,7 @@ xfs_growfs_rt(
 	 */
 	nrextents = nrblocks;
 	do_div(nrextents, in->extsize);
-	if (!xfs_validate_rtextents(nrextents)) {
+	if (nrextents == 0) {
 		error = -EINVAL;
 		goto out_unlock;
 	}


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/12] xfs: factor out a xfs_validate_rt_geometry helper
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 01/12] xfs: remove xfs_validate_rtextents Darrick J. Wong
@ 2024-09-02 18:24   ` Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 03/12] xfs: make the RT rsum_cache mandatory Darrick J. Wong
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:24 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Split the RT geometry validation in the early mount code into a
helper that can be reused by repair (from which this code was
apparently originally stolen anyway).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: u64 return value for calc_rbmblocks]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
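
Note for readers: the rules that the new helper enforces can be exercised outside the kernel with a small model such as the one below. Everything prefixed model_ or MODEL_ is hypothetical. The extent size limits are assumed to be 4 KiB and 1 GiB, the rextslog computation is assumed to be floor(log2(rextents)) with 0 for no extents, and the byte size is widened to 64 bits here, which the kernel expression does not do.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_RTEXTSIZE	(1024ULL * 1024 * 1024)	/* assumed 1 GiB */
#define MODEL_MIN_RTEXTSIZE	(4ULL * 1024)		/* assumed 4 KiB */
#define MODEL_NBBY		8			/* bits per byte */

/* Hypothetical subset of the superblock's realtime fields. */
struct model_sb {
	uint32_t	blocksize;	/* bytes per fs block */
	uint32_t	rextsize;	/* fs blocks per rt extent */
	uint64_t	rblocks;	/* rt volume size, in fs blocks */
	uint64_t	rextents;	/* rt volume size, in rt extents */
	uint64_t	frextents;	/* free rt extents */
	uint32_t	rbmblocks;	/* rt bitmap size, in fs blocks */
	uint8_t		rextslog;	/* log2 of rextents */
};

/* Assumed behaviour of xfs_compute_rextslog: floor(log2(n)), 0 for n == 0. */
static uint8_t
model_rextslog(uint64_t rextents)
{
	uint8_t		log = 0;

	while (rextents >>= 1)
		log++;
	return log;
}

/* One bitmap bit per rt extent, rounded up to whole fs blocks. */
static uint64_t
model_rbmblocks(const struct model_sb *sb)
{
	uint64_t	bits_per_block = (uint64_t)MODEL_NBBY * sb->blocksize;

	return (sb->rextents + bits_per_block - 1) / bits_per_block;
}

static bool
model_validate_rt_geometry(const struct model_sb *sb)
{
	uint64_t	extbytes = (uint64_t)sb->rextsize * sb->blocksize;

	/* A zero rextsize or blocksize fails here, before any division. */
	if (extbytes > MODEL_MAX_RTEXTSIZE || extbytes < MODEL_MIN_RTEXTSIZE)
		return false;

	/* No rt volume: every derived field must be zero as well. */
	if (sb->rblocks == 0)
		return sb->rextents == 0 && sb->rbmblocks == 0 &&
		       sb->rextslog == 0 && sb->frextents == 0;

	return sb->rextents != 0 &&
	       sb->rextents == sb->rblocks / sb->rextsize &&
	       sb->rextslog == model_rextslog(sb->rextents) &&
	       sb->rbmblocks == model_rbmblocks(sb);
}
```

For example, 1024 rt blocks of 4096 bytes with one block per extent give 1024 extents, an rextslog of 10 and a single bitmap block; changing any one of those derived values makes the model reject the geometry.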
 fs/xfs/libxfs/xfs_sb.c |   64 ++++++++++++++++++++++++++----------------------
 fs/xfs/libxfs/xfs_sb.h |    1 +
 2 files changed, 36 insertions(+), 29 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f2fb6035fd21..f9c3045f71e0 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -232,6 +232,38 @@ xfs_validate_sb_read(
 	return 0;
 }
 
+static uint64_t
+xfs_sb_calc_rbmblocks(
+	struct xfs_sb		*sbp)
+{
+	return howmany_64(sbp->sb_rextents, NBBY * sbp->sb_blocksize);
+}
+
+/* Validate the realtime geometry */
+bool
+xfs_validate_rt_geometry(
+	struct xfs_sb		*sbp)
+{
+	if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
+	    sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE)
+		return false;
+
+	if (sbp->sb_rblocks == 0) {
+		if (sbp->sb_rextents != 0 || sbp->sb_rbmblocks != 0 ||
+		    sbp->sb_rextslog != 0 || sbp->sb_frextents != 0)
+			return false;
+		return true;
+	}
+
+	if (sbp->sb_rextents == 0 ||
+	    sbp->sb_rextents != div_u64(sbp->sb_rblocks, sbp->sb_rextsize) ||
+	    sbp->sb_rextslog != xfs_compute_rextslog(sbp->sb_rextents) ||
+	    sbp->sb_rbmblocks != xfs_sb_calc_rbmblocks(sbp))
+		return false;
+
+	return true;
+}
+
 /* Check all the superblock fields we care about when writing one out. */
 STATIC int
 xfs_validate_sb_write(
@@ -491,39 +523,13 @@ xfs_validate_sb_common(
 		}
 	}
 
-	/* Validate the realtime geometry; stolen from xfs_repair */
-	if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
-	    sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) {
+	if (!xfs_validate_rt_geometry(sbp)) {
 		xfs_notice(mp,
-			"realtime extent sanity check failed");
+			"realtime %sgeometry check failed",
+			sbp->sb_rblocks ? "" : "zeroed ");
 		return -EFSCORRUPTED;
 	}
 
-	if (sbp->sb_rblocks == 0) {
-		if (sbp->sb_rextents != 0 || sbp->sb_rbmblocks != 0 ||
-		    sbp->sb_rextslog != 0 || sbp->sb_frextents != 0) {
-			xfs_notice(mp,
-				"realtime zeroed geometry check failed");
-			return -EFSCORRUPTED;
-		}
-	} else {
-		uint64_t	rexts;
-		uint64_t	rbmblocks;
-
-		rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize);
-		rbmblocks = howmany_64(sbp->sb_rextents,
-				       NBBY * sbp->sb_blocksize);
-
-		if (sbp->sb_rextents == 0 ||
-		    sbp->sb_rextents != rexts ||
-		    sbp->sb_rextslog != xfs_compute_rextslog(rexts) ||
-		    sbp->sb_rbmblocks != rbmblocks) {
-			xfs_notice(mp,
-				"realtime geometry sanity check failed");
-			return -EFSCORRUPTED;
-		}
-	}
-
 	/*
 	 * Either (sb_unit and !hasdalign) or (!sb_unit and hasdalign)
 	 * would imply the image is corrupted.
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index 37b1ed1bc209..796f02191dfd 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -38,6 +38,7 @@ extern int	xfs_sb_get_secondary(struct xfs_mount *mp,
 bool	xfs_validate_stripe_geometry(struct xfs_mount *mp,
 		__s64 sunit, __s64 swidth, int sectorsize, bool may_repair,
 		bool silent);
+bool	xfs_validate_rt_geometry(struct xfs_sb *sbp);
 
 uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/12] xfs: make the RT rsum_cache mandatory
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 01/12] xfs: remove xfs_validate_rtextents Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 02/12] xfs: factor out a xfs_validate_rt_geometry helper Darrick J. Wong
@ 2024-09-02 18:24   ` Darrick J. Wong
  2024-09-02 18:24   ` [PATCH 04/12] xfs: remove the limit argument to xfs_rtfind_back Darrick J. Wong
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:24 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Currently the RT mount code simply ignores an allocation failure for the
rsum_cache.  The code mostly works fine with it, but not having it leads
to nasty corner cases in the growfs code that we don't really handle
well.  Switch to failing the mount if we can't allocate the memory; the
file system would not exactly be useful in such a constrained environment
to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
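
Note for readers: the change in calling convention amounts to the pattern below, shown as a userspace sketch. struct model_mount and model_alloc_rsum_cache() are hypothetical names, and malloc() stands in for kvmalloc(); the point is only that the allocator now reports failure and every caller has to propagate it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the one struct xfs_mount field used here. */
struct model_mount {
	uint8_t		*rsum_cache;
};

/*
 * Allocate one cache byte per rt bitmap block and set each to the
 * maximum value.  Returns 0 on success or -ENOMEM on failure.
 */
static int
model_alloc_rsum_cache(struct model_mount *mp, size_t rbmblocks)
{
	mp->rsum_cache = malloc(rbmblocks);
	if (!mp->rsum_cache)
		return -ENOMEM;
	memset(mp->rsum_cache, -1, rbmblocks);
	return 0;
}
```

A caller checks the return value and unwinds on error rather than carrying on with a NULL cache pointer.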
 fs/xfs/xfs_rtalloc.c |   26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index d28395abdd02..26eab1b408c8 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -767,21 +767,20 @@ xfs_growfs_rt_alloc(
 	return error;
 }
 
-static void
+static int
 xfs_alloc_rsum_cache(
-	xfs_mount_t	*mp,		/* file system mount structure */
-	xfs_extlen_t	rbmblocks)	/* number of rt bitmap blocks */
+	struct xfs_mount	*mp,
+	xfs_extlen_t		rbmblocks)
 {
 	/*
 	 * The rsum cache is initialized to the maximum value, which is
 	 * trivially an upper bound on the maximum level with any free extents.
-	 * We can continue without the cache if it couldn't be allocated.
 	 */
 	mp->m_rsum_cache = kvmalloc(rbmblocks, GFP_KERNEL);
-	if (mp->m_rsum_cache)
-		memset(mp->m_rsum_cache, -1, rbmblocks);
-	else
-		xfs_warn(mp, "could not allocate realtime summary cache");
+	if (!mp->m_rsum_cache)
+		return -ENOMEM;
+	memset(mp->m_rsum_cache, -1, rbmblocks);
+	return 0;
 }
 
 /*
@@ -939,8 +938,11 @@ xfs_growfs_rt(
 		goto out_unlock;
 
 	rsum_cache = mp->m_rsum_cache;
-	if (nrbmblocks != sbp->sb_rbmblocks)
-		xfs_alloc_rsum_cache(mp, nrbmblocks);
+	if (nrbmblocks != sbp->sb_rbmblocks) {
+		error = xfs_alloc_rsum_cache(mp, nrbmblocks);
+		if (error)
+			goto out_unlock;
+	}
 
 	/*
 	 * Allocate a new (fake) mount/sb.
@@ -1268,7 +1270,9 @@ xfs_rtmount_inodes(
 	if (error)
 		goto out_rele_summary;
 
-	xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks);
+	error = xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks);
+	if (error)
+		goto out_rele_summary;
 	return 0;
 
 out_rele_summary:


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/12] xfs: remove the limit argument to xfs_rtfind_back
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-09-02 18:24   ` [PATCH 03/12] xfs: make the RT rsum_cache mandatory Darrick J. Wong
@ 2024-09-02 18:24   ` Darrick J. Wong
  2024-09-02 18:25   ` [PATCH 05/12] xfs: assert a valid limit in xfs_rtfind_forw Darrick J. Wong
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:24 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

All callers pass a 0 limit to xfs_rtfind_back, so remove the argument
and hard code it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |    9 ++++-----
 fs/xfs/libxfs/xfs_rtbitmap.h |    2 +-
 fs/xfs/xfs_rtalloc.c         |    2 +-
 3 files changed, 6 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 386b672c5058..9feeefe53948 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -139,14 +139,13 @@ xfs_rtbuf_get(
 }
 
 /*
- * Searching backward from start to limit, find the first block whose
- * allocated/free state is different from start's.
+ * Searching backward from start find the first block whose allocated/free state
+ * is different from start's.
  */
 int
 xfs_rtfind_back(
 	struct xfs_rtalloc_args	*args,
 	xfs_rtxnum_t		start,	/* starting rtext to look at */
-	xfs_rtxnum_t		limit,	/* last rtext to look at */
 	xfs_rtxnum_t		*rtx)	/* out: start rtext found */
 {
 	struct xfs_mount	*mp = args->mp;
@@ -175,7 +174,7 @@ xfs_rtfind_back(
 	 */
 	word = xfs_rtx_to_rbmword(mp, start);
 	bit = (int)(start & (XFS_NBWORD - 1));
-	len = start - limit + 1;
+	len = start + 1;
 	/*
 	 * Compute match value, based on the bit at start: if 1 (free)
 	 * then all-ones, else all-zeroes.
@@ -698,7 +697,7 @@ xfs_rtfree_range(
 	 * We need to find the beginning and end of the extent so we can
 	 * properly update the summary.
 	 */
-	error = xfs_rtfind_back(args, start, 0, &preblock);
+	error = xfs_rtfind_back(args, start, &preblock);
 	if (error) {
 		return error;
 	}
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 6186585f2c37..1e04f0954a0f 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -316,7 +316,7 @@ xfs_rtsummary_read_buf(
 int xfs_rtcheck_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
 		xfs_rtxlen_t len, int val, xfs_rtxnum_t *new, int *stat);
 int xfs_rtfind_back(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
-		xfs_rtxnum_t limit, xfs_rtxnum_t *rtblock);
+		xfs_rtxnum_t *rtblock);
 int xfs_rtfind_forw(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
 		xfs_rtxnum_t limit, xfs_rtxnum_t *rtblock);
 int xfs_rtmodify_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 26eab1b408c8..3728445b0b1c 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -142,7 +142,7 @@ xfs_rtallocate_range(
 	 * We need to find the beginning and end of the extent so we can
 	 * properly update the summary.
 	 */
-	error = xfs_rtfind_back(args, start, 0, &preblock);
+	error = xfs_rtfind_back(args, start, &preblock);
 	if (error)
 		return error;
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/12] xfs: assert a valid limit in xfs_rtfind_forw
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-09-02 18:24   ` [PATCH 04/12] xfs: remove the limit argument to xfs_rtfind_back Darrick J. Wong
@ 2024-09-02 18:25   ` Darrick J. Wong
  2024-09-02 18:25   ` [PATCH 06/12] xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf Darrick J. Wong
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:25 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Protect against developers passing stupid limits when refactoring the
RT code once again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 9feeefe53948..4de97c4e8ebd 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -315,6 +315,8 @@ xfs_rtfind_forw(
 	xfs_rtword_t		incore;
 	unsigned int		word;	/* word number in the buffer */
 
+	ASSERT(start <= limit);
+
 	/*
 	 * Compute and read in starting bitmap block for starting block.
 	 */


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/12] xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-09-02 18:25   ` [PATCH 05/12] xfs: assert a valid limit in xfs_rtfind_forw Darrick J. Wong
@ 2024-09-02 18:25   ` Darrick J. Wong
  2024-09-02 18:25   ` [PATCH 07/12] xfs: cleanup the calling convention for xfs_rtpick_extent Darrick J. Wong
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:25 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Add a corruption check for passing an invalid block number, which is a
lot easier to understand than the xfs_bmapi_read failure later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |   31 ++++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_rtbitmap.h |   22 ++--------------------
 2 files changed, 32 insertions(+), 21 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 4de97c4e8ebd..02d6668d860f 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -69,7 +69,7 @@ xfs_rtbuf_cache_relse(
  * Get a buffer for the bitmap or summary file block specified.
  * The buffer is returned read and locked.
  */
-int
+static int
 xfs_rtbuf_get(
 	struct xfs_rtalloc_args	*args,
 	xfs_fileoff_t		block,	/* block number in bitmap or summary */
@@ -138,6 +138,35 @@ xfs_rtbuf_get(
 	return 0;
 }
 
+int
+xfs_rtbitmap_read_buf(
+	struct xfs_rtalloc_args		*args,
+	xfs_fileoff_t			block)
+{
+	struct xfs_mount		*mp = args->mp;
+
+	if (XFS_IS_CORRUPT(mp, block >= mp->m_sb.sb_rbmblocks)) {
+		xfs_rt_mark_sick(mp, XFS_SICK_RT_BITMAP);
+		return -EFSCORRUPTED;
+	}
+
+	return xfs_rtbuf_get(args, block, 0);
+}
+
+int
+xfs_rtsummary_read_buf(
+	struct xfs_rtalloc_args		*args,
+	xfs_fileoff_t			block)
+{
+	struct xfs_mount		*mp = args->mp;
+
+	if (XFS_IS_CORRUPT(mp, block >= XFS_B_TO_FSB(mp, mp->m_rsumsize))) {
+		xfs_rt_mark_sick(args->mp, XFS_SICK_RT_SUMMARY);
+		return -EFSCORRUPTED;
+	}
+	return xfs_rtbuf_get(args, block, 1);
+}
+
 /*
  * Searching backward from start find the first block whose allocated/free state
  * is different from start's.
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 1e04f0954a0f..e87e2099cff5 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -293,26 +293,8 @@ typedef int (*xfs_rtalloc_query_range_fn)(
 
 #ifdef CONFIG_XFS_RT
 void xfs_rtbuf_cache_relse(struct xfs_rtalloc_args *args);
-
-int xfs_rtbuf_get(struct xfs_rtalloc_args *args, xfs_fileoff_t block,
-		int issum);
-
-static inline int
-xfs_rtbitmap_read_buf(
-	struct xfs_rtalloc_args		*args,
-	xfs_fileoff_t			block)
-{
-	return xfs_rtbuf_get(args, block, 0);
-}
-
-static inline int
-xfs_rtsummary_read_buf(
-	struct xfs_rtalloc_args		*args,
-	xfs_fileoff_t			block)
-{
-	return xfs_rtbuf_get(args, block, 1);
-}
-
+int xfs_rtbitmap_read_buf(struct xfs_rtalloc_args *args, xfs_fileoff_t block);
+int xfs_rtsummary_read_buf(struct xfs_rtalloc_args *args, xfs_fileoff_t block);
 int xfs_rtcheck_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
 		xfs_rtxlen_t len, int val, xfs_rtxnum_t *new, int *stat);
 int xfs_rtfind_back(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,


^ permalink raw reply related	[flat|nested] 53+ messages in thread
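
For readers following the archive, the bounds check that patch 06 adds can
be modelled in user space roughly as follows.  This is only a sketch under
stated assumptions: struct sketch_mount, SKETCH_ECORRUPT and all sketch_*
functions are hypothetical stand-ins, not kernel interfaces, and the real
code reports corruption through XFS_IS_CORRUPT and xfs_rt_mark_sick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's EFSCORRUPTED error code. */
#define SKETCH_ECORRUPT 1000

/* Hypothetical, much-reduced model of the fields the checks consult. */
struct sketch_mount {
	uint64_t rbmblocks;	/* blocks in the bitmap file */
	uint64_t rsumblocks;	/* blocks in the summary file */
	int	 bitmap_sick;	/* set when a bad bitmap block is requested */
	int	 summary_sick;	/* set when a bad summary block is requested */
};

/* Stand-in for the real buffer read; in this sketch it always succeeds. */
static int sketch_buf_get(struct sketch_mount *mp, uint64_t block, int issum)
{
	(void)mp;
	(void)block;
	(void)issum;
	return 0;
}

/* Reject out-of-range bitmap blocks before attempting the read. */
int sketch_bitmap_read_buf(struct sketch_mount *mp, uint64_t block)
{
	if (block >= mp->rbmblocks) {
		mp->bitmap_sick = 1;
		return -SKETCH_ECORRUPT;
	}
	return sketch_buf_get(mp, block, 0);
}

/* Same idea for the summary file, with its own limit and sick flag. */
int sketch_summary_read_buf(struct sketch_mount *mp, uint64_t block)
{
	if (block >= mp->rsumblocks) {
		mp->summary_sick = 1;
		return -SKETCH_ECORRUPT;
	}
	return sketch_buf_get(mp, block, 1);
}
```

The point of checking in the wrapper is that the caller sees a corruption
error tied to the file it asked about, rather than a less specific mapping
failure further down the read path.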

* [PATCH 07/12] xfs: cleanup the calling convention for xfs_rtpick_extent
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-09-02 18:25   ` [PATCH 06/12] xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf Darrick J. Wong
@ 2024-09-02 18:25   ` Darrick J. Wong
  2024-09-02 18:25   ` [PATCH 08/12] xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc Darrick J. Wong
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:25 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

xfs_rtpick_extent never returns an error.  Do away with the error return
and return the picked extent directly instead of passing it back through
a call-by-reference argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 3728445b0b1c..64ba4bcf6e29 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1300,12 +1300,11 @@ xfs_rtunmount_inodes(
  * of rtextents and the fraction.
  * The fraction sequence is 0, 1/2, 1/4, 3/4, 1/8, ..., 7/8, 1/16, ...
  */
-static int
+static xfs_rtxnum_t
 xfs_rtpick_extent(
 	xfs_mount_t		*mp,		/* file system mount point */
 	xfs_trans_t		*tp,		/* transaction pointer */
-	xfs_rtxlen_t		len,		/* allocation length (rtextents) */
-	xfs_rtxnum_t		*pick)		/* result rt extent */
+	xfs_rtxlen_t		len)		/* allocation length (rtextents) */
 {
 	xfs_rtxnum_t		b;		/* result rtext */
 	int			log2;		/* log of sequence number */
@@ -1336,8 +1335,7 @@ xfs_rtpick_extent(
 	ts.tv_sec = seq + 1;
 	inode_set_atime_to_ts(VFS_I(mp->m_rbmip), ts);
 	xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
-	*pick = b;
-	return 0;
+	return b;
 }
 
 static void
@@ -1444,9 +1442,7 @@ xfs_bmap_rtalloc(
 		 * If it's an allocation to an empty file at offset 0, pick an
 		 * extent that will space things out in the rt area.
 		 */
-		error = xfs_rtpick_extent(mp, ap->tp, ralen, &start);
-		if (error)
-			return error;
+		start = xfs_rtpick_extent(mp, ap->tp, ralen);
 	} else {
 		start = 0;
 	}


^ permalink raw reply related	[flat|nested] 53+ messages in thread
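
As an illustration of the calling convention patch 07 moves to, here is a
user-space sketch of a picker that returns its result directly.  The
arithmetic is my own reconstruction of the "0, 1/2, 1/4, 3/4, 1/8, ..."
sequence named in the function's comment, so treat it as an assumption
rather than the kernel's exact code; the sketch_* names are hypothetical
and the caller is assumed to pass len <= rextents.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, or -1 if v is zero. */
static int sketch_highbit64(uint64_t v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Return the starting extent for the seq'th pick.  There is no failure
 * case, so the result is the return value and no out-parameter is needed.
 */
uint64_t sketch_pick_extent(uint64_t rextents, uint64_t seq, uint64_t len)
{
	uint64_t b = 0;
	int log2 = sketch_highbit64(seq);

	if (log2 >= 0) {
		uint64_t resid = seq - (1ULL << log2);

		/* fraction (2 * resid + 1) / 2^(log2 + 1) of the rt area */
		b = (rextents * ((resid << 1) + 1)) >> (log2 + 1);
		/* keep the whole extent inside the rt area */
		if (b + len > rextents)
			b = rextents - len;
	}
	return b;
}
```

With the result as the return value, a caller can write
"start = sketch_pick_extent(...)" without an error branch that can never
be taken.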

* [PATCH 08/12] xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-09-02 18:25   ` [PATCH 07/12] xfs: cleanup the calling convention for xfs_rtpick_extent Darrick J. Wong
@ 2024-09-02 18:25   ` Darrick J. Wong
  2024-09-02 18:26   ` [PATCH 09/12] xfs: factor out a xfs_growfs_rt_bmblock helper Darrick J. Wong
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:25 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Currently the various low-level RT allocator functions call into
xfs_rtallocate_range directly, which ties them into the locking protocol
for the RT bitmap.  As these helpers already return the allocated range,
lift the call to xfs_rtallocate_range into xfs_bmap_rtalloc so that it
happens as high as possible in the stack, which will simplify future
changes to the locking protocol.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 64ba4bcf6e29..59e599af74f4 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -259,9 +259,9 @@ xfs_rtallocate_extent_block(
 			/*
 			 * i for maxlen is all free, allocate and return that.
 			 */
-			bestlen = maxlen;
-			besti = i;
-			goto allocate;
+			*len = maxlen;
+			*rtx = i;
+			return 0;
 		}
 
 		/*
@@ -312,12 +312,8 @@ xfs_rtallocate_extent_block(
 	}
 
 	/*
-	 * Allocate besti for bestlen & return that.
+	 * Pick besti for bestlen & return that.
 	 */
-allocate:
-	error = xfs_rtallocate_range(args, besti, bestlen);
-	if (error)
-		return error;
 	*len = bestlen;
 	*rtx = besti;
 	return 0;
@@ -371,12 +367,6 @@ xfs_rtallocate_extent_exact(
 		}
 	}
 
-	/*
-	 * Allocate what we can and return it.
-	 */
-	error = xfs_rtallocate_range(args, start, maxlen);
-	if (error)
-		return error;
 	*len = maxlen;
 	*rtx = start;
 	return 0;
@@ -429,7 +419,6 @@ xfs_rtallocate_extent_near(
 	if (error != -ENOSPC)
 		return error;
 
-
 	bbno = xfs_rtx_to_rbmblock(mp, start);
 	i = 0;
 	j = -1;
@@ -552,11 +541,11 @@ xfs_rtalloc_sumlevel(
 	xfs_rtxnum_t		*rtx)	/* out: start rtext allocated */
 {
 	xfs_fileoff_t		i;	/* bitmap block number */
+	int			error;
 
 	for (i = 0; i < args->mp->m_sb.sb_rbmblocks; i++) {
 		xfs_suminfo_t	sum;	/* summary information for extents */
 		xfs_rtxnum_t	n;	/* next rtext to be tried */
-		int		error;
 
 		error = xfs_rtget_summary(args, l, i, &sum);
 		if (error)
@@ -1467,9 +1456,12 @@ xfs_bmap_rtalloc(
 		error = xfs_rtallocate_extent_size(&args, raminlen,
 				ralen, &ralen, prod, &rtx);
 	}
-	xfs_rtbuf_cache_relse(&args);
 
-	if (error == -ENOSPC) {
+	if (error) {
+		xfs_rtbuf_cache_relse(&args);
+		if (error != -ENOSPC)
+			return error;
+
 		if (align > mp->m_sb.sb_rextsize) {
 			/*
 			 * We previously enlarged the request length to try to
@@ -1497,14 +1489,20 @@ xfs_bmap_rtalloc(
 		ap->length = 0;
 		return 0;
 	}
+
+	error = xfs_rtallocate_range(&args, rtx, ralen);
 	if (error)
-		return error;
+		goto out_release;
 
 	xfs_trans_mod_sb(ap->tp, ap->wasdel ?
 			XFS_TRANS_SB_RES_FREXTENTS : XFS_TRANS_SB_FREXTENTS,
 			-(long)ralen);
+
 	ap->blkno = xfs_rtx_to_rtb(mp, rtx);
 	ap->length = xfs_rtxlen_to_extlen(mp, ralen);
 	xfs_bmap_alloc_account(ap);
-	return 0;
+
+out_release:
+	xfs_rtbuf_cache_relse(&args);
+	return error;
 }


^ permalink raw reply related	[flat|nested] 53+ messages in thread
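
The restructuring in patch 08 can be pictured with a small user-space
model: the low-level helper only finds a free range, and the top-level
caller is the one place that marks it allocated.  Everything here
(struct sketch_rt, SKETCH_NEXT, the sketch_* functions) is hypothetical
and far simpler than the real bitmap code; len is assumed to be at least 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NEXT 16	/* number of extents in the toy rt area */

struct sketch_rt {
	bool used[SKETCH_NEXT];	/* true = extent is allocated */
};

/* Low-level helper: only finds a free run, never modifies the bitmap. */
int sketch_find_free(const struct sketch_rt *rt, size_t len, size_t *start)
{
	size_t run = 0;

	for (size_t i = 0; i < SKETCH_NEXT; i++) {
		run = rt->used[i] ? 0 : run + 1;
		if (run == len) {
			*start = i + 1 - len;
			return 0;
		}
	}
	return -ENOSPC;
}

/* The single place that marks extents as allocated. */
void sketch_allocate_range(struct sketch_rt *rt, size_t start, size_t len)
{
	for (size_t i = start; i < start + len; i++)
		rt->used[i] = true;
}

/* Top-level caller: pick a range first, then commit the allocation. */
int sketch_alloc(struct sketch_rt *rt, size_t len, size_t *start)
{
	int error = sketch_find_free(rt, len, start);

	if (error)
		return error;
	sketch_allocate_range(rt, *start, len);
	return 0;
}
```

Because only sketch_alloc touches the bitmap, any locking around the
modification would have to change in one function instead of in every
search helper, which is the simplification the commit message is after.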

* [PATCH 09/12] xfs: factor out a xfs_growfs_rt_bmblock helper
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-09-02 18:25   ` [PATCH 08/12] xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc Darrick J. Wong
@ 2024-09-02 18:26   ` Darrick J. Wong
  2024-09-02 18:26   ` [PATCH 10/12] xfs: factor out a xfs_last_rt_bmblock helper Darrick J. Wong
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:26 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Add a helper to contain the per-rtbitmap block logic in xfs_growfs_rt.

Note that this helper now allocates a new fake mount structure for
each rtbitmap block iteration instead of reusing the memory for an
entire growfs call.  Compared to all the other work done when freeing
the blocks, the overhead of this is in the noise, and it keeps the code
nicely modular.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |  317 +++++++++++++++++++++++++-------------------------
 1 file changed, 158 insertions(+), 159 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 59e599af74f4..febd039718ee 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -805,9 +805,148 @@ xfs_growfs_rt_fixup_extsize(
 	return error;
 }
 
-/*
- * Visible (exported) functions.
- */
+static int
+xfs_growfs_rt_bmblock(
+	struct xfs_mount	*mp,
+	xfs_rfsblock_t		nrblocks,
+	xfs_agblock_t		rextsize,
+	xfs_fileoff_t		bmbno)
+{
+	struct xfs_inode	*rbmip = mp->m_rbmip;
+	struct xfs_inode	*rsumip = mp->m_rsumip;
+	struct xfs_rtalloc_args	args = {
+		.mp		= mp,
+	};
+	struct xfs_rtalloc_args	nargs = {
+	};
+	struct xfs_mount	*nmp;
+	xfs_rfsblock_t		nrblocks_step;
+	xfs_rtbxlen_t		freed_rtx;
+	int			error;
+
+
+	nrblocks_step = (bmbno + 1) * NBBY * mp->m_sb.sb_blocksize * rextsize;
+
+	nmp = nargs.mp = kmemdup(mp, sizeof(*mp), GFP_KERNEL);
+	if (!nmp)
+		return -ENOMEM;
+
+	/*
+	 * Calculate new sb and mount fields for this round.
+	 */
+	nmp->m_rtxblklog = -1; /* don't use shift or masking */
+	nmp->m_sb.sb_rextsize = rextsize;
+	nmp->m_sb.sb_rbmblocks = bmbno + 1;
+	nmp->m_sb.sb_rblocks = min(nrblocks, nrblocks_step);
+	nmp->m_sb.sb_rextents = xfs_rtb_to_rtx(nmp, nmp->m_sb.sb_rblocks);
+	nmp->m_sb.sb_rextslog = xfs_compute_rextslog(nmp->m_sb.sb_rextents);
+	nmp->m_rsumlevels = nmp->m_sb.sb_rextslog + 1;
+	nmp->m_rsumsize = XFS_FSB_TO_B(mp,
+		xfs_rtsummary_blockcount(mp, nmp->m_rsumlevels,
+			nmp->m_sb.sb_rbmblocks));
+
+	/* recompute growfsrt reservation from new rsumsize */
+	xfs_trans_resv_calc(nmp, &nmp->m_resv);
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtfree, 0, 0, 0,
+			&args.tp);
+	if (error)
+		goto out_free;
+	nargs.tp = args.tp;
+
+	xfs_rtbitmap_lock(args.tp, mp);
+
+	/*
+	 * Update the bitmap inode's size ondisk and incore.  We need to update
+	 * the incore size so that inode inactivation won't punch what it thinks
+	 * are "posteof" blocks.
+	 */
+	rbmip->i_disk_size = nmp->m_sb.sb_rbmblocks * nmp->m_sb.sb_blocksize;
+	i_size_write(VFS_I(rbmip), rbmip->i_disk_size);
+	xfs_trans_log_inode(args.tp, rbmip, XFS_ILOG_CORE);
+
+	/*
+	 * Update the summary inode's size.  We need to update the incore size
+	 * so that inode inactivation won't punch what it thinks are "posteof"
+	 * blocks.
+	 */
+	rsumip->i_disk_size = nmp->m_rsumsize;
+	i_size_write(VFS_I(rsumip), rsumip->i_disk_size);
+	xfs_trans_log_inode(args.tp, rsumip, XFS_ILOG_CORE);
+
+	/*
+	 * Copy summary data from old to new sizes when the real size (not
+	 * block-aligned) changes.
+	 */
+	if (mp->m_sb.sb_rbmblocks != nmp->m_sb.sb_rbmblocks ||
+	    mp->m_rsumlevels != nmp->m_rsumlevels) {
+		error = xfs_rtcopy_summary(&args, &nargs);
+		if (error)
+			goto out_cancel;
+	}
+
+	/*
+	 * Update superblock fields.
+	 */
+	if (nmp->m_sb.sb_rextsize != mp->m_sb.sb_rextsize)
+		xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_REXTSIZE,
+			nmp->m_sb.sb_rextsize - mp->m_sb.sb_rextsize);
+	if (nmp->m_sb.sb_rbmblocks != mp->m_sb.sb_rbmblocks)
+		xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_RBMBLOCKS,
+			nmp->m_sb.sb_rbmblocks - mp->m_sb.sb_rbmblocks);
+	if (nmp->m_sb.sb_rblocks != mp->m_sb.sb_rblocks)
+		xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_RBLOCKS,
+			nmp->m_sb.sb_rblocks - mp->m_sb.sb_rblocks);
+	if (nmp->m_sb.sb_rextents != mp->m_sb.sb_rextents)
+		xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_REXTENTS,
+			nmp->m_sb.sb_rextents - mp->m_sb.sb_rextents);
+	if (nmp->m_sb.sb_rextslog != mp->m_sb.sb_rextslog)
+		xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_REXTSLOG,
+			nmp->m_sb.sb_rextslog - mp->m_sb.sb_rextslog);
+
+	/*
+	 * Free the new extent.
+	 */
+	freed_rtx = nmp->m_sb.sb_rextents - mp->m_sb.sb_rextents;
+	error = xfs_rtfree_range(&nargs, mp->m_sb.sb_rextents, freed_rtx);
+	xfs_rtbuf_cache_relse(&nargs);
+	if (error)
+		goto out_cancel;
+
+	/*
+	 * Mark more blocks free in the superblock.
+	 */
+	xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_FREXTENTS, freed_rtx);
+
+	/*
+	 * Update mp values into the real mp structure.
+	 */
+	mp->m_rsumlevels = nmp->m_rsumlevels;
+	mp->m_rsumsize = nmp->m_rsumsize;
+
+	/*
+	 * Recompute the growfsrt reservation from the new rsumsize.
+	 */
+	xfs_trans_resv_calc(mp, &mp->m_resv);
+
+	error = xfs_trans_commit(args.tp);
+	if (error)
+		goto out_free;
+
+	/*
+	 * Ensure the mount RT feature flag is now set.
+	 */
+	mp->m_features |= XFS_FEAT_REALTIME;
+
+	kfree(nmp);
+	return 0;
+
+out_cancel:
+	xfs_trans_cancel(args.tp);
+out_free:
+	kfree(nmp);
+	return error;
+}
 
 /*
  * Grow the realtime area of the filesystem.
@@ -820,23 +959,14 @@ xfs_growfs_rt(
 	xfs_fileoff_t	bmbno;		/* bitmap block number */
 	struct xfs_buf	*bp;		/* temporary buffer */
 	int		error;		/* error return value */
-	xfs_mount_t	*nmp;		/* new (fake) mount structure */
-	xfs_rfsblock_t	nrblocks;	/* new number of realtime blocks */
 	xfs_extlen_t	nrbmblocks;	/* new number of rt bitmap blocks */
 	xfs_rtxnum_t	nrextents;	/* new number of realtime extents */
-	uint8_t		nrextslog;	/* new log2 of sb_rextents */
 	xfs_extlen_t	nrsumblocks;	/* new number of summary blocks */
-	uint		nrsumlevels;	/* new rt summary levels */
-	uint		nrsumsize;	/* new size of rt summary, bytes */
-	xfs_sb_t	*nsbp;		/* new superblock */
 	xfs_extlen_t	rbmblocks;	/* current number of rt bitmap blocks */
 	xfs_extlen_t	rsumblocks;	/* current number of rt summary blks */
-	xfs_sb_t	*sbp;		/* old superblock */
 	uint8_t		*rsum_cache;	/* old summary cache */
 	xfs_agblock_t	old_rextsize = mp->m_sb.sb_rextsize;
 
-	sbp = &mp->m_sb;
-
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -855,11 +985,10 @@ xfs_growfs_rt(
 		goto out_unlock;
 
 	/* Shrink not supported. */
-	if (in->newblocks <= sbp->sb_rblocks)
+	if (in->newblocks <= mp->m_sb.sb_rblocks)
 		goto out_unlock;
-
 	/* Can only change rt extent size when adding rt volume. */
-	if (sbp->sb_rblocks > 0 && in->extsize != sbp->sb_rextsize)
+	if (mp->m_sb.sb_rblocks > 0 && in->extsize != mp->m_sb.sb_rextsize)
 		goto out_unlock;
 
 	/* Range check the extent size. */
@@ -872,15 +1001,14 @@ xfs_growfs_rt(
 	if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp))
 		goto out_unlock;
 
-	nrblocks = in->newblocks;
-	error = xfs_sb_validate_fsb_count(sbp, nrblocks);
+	error = xfs_sb_validate_fsb_count(&mp->m_sb, in->newblocks);
 	if (error)
 		goto out_unlock;
 	/*
 	 * Read in the last block of the device, make sure it exists.
 	 */
 	error = xfs_buf_read_uncached(mp->m_rtdev_targp,
-				XFS_FSB_TO_BB(mp, nrblocks - 1),
+				XFS_FSB_TO_BB(mp, in->newblocks - 1),
 				XFS_FSB_TO_BB(mp, 1), 0, &bp, NULL);
 	if (error)
 		goto out_unlock;
@@ -889,17 +1017,15 @@ xfs_growfs_rt(
 	/*
 	 * Calculate new parameters.  These are the final values to be reached.
 	 */
-	nrextents = nrblocks;
-	do_div(nrextents, in->extsize);
+	nrextents = div_u64(in->newblocks, in->extsize);
 	if (nrextents == 0) {
 		error = -EINVAL;
 		goto out_unlock;
 	}
 	nrbmblocks = xfs_rtbitmap_blockcount(mp, nrextents);
-	nrextslog = xfs_compute_rextslog(nrextents);
-	nrsumlevels = nrextslog + 1;
-	nrsumblocks = xfs_rtsummary_blockcount(mp, nrsumlevels, nrbmblocks);
-	nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks);
+	nrsumblocks = xfs_rtsummary_blockcount(mp,
+			xfs_compute_rextslog(nrextents) + 1, nrbmblocks);
+
 	/*
 	 * New summary size can't be more than half the size of
 	 * the log.  This prevents us from getting a log overflow,
@@ -927,149 +1053,27 @@ xfs_growfs_rt(
 		goto out_unlock;
 
 	rsum_cache = mp->m_rsum_cache;
-	if (nrbmblocks != sbp->sb_rbmblocks) {
+	if (nrbmblocks != mp->m_sb.sb_rbmblocks) {
 		error = xfs_alloc_rsum_cache(mp, nrbmblocks);
 		if (error)
 			goto out_unlock;
 	}
 
-	/*
-	 * Allocate a new (fake) mount/sb.
-	 */
-	nmp = kmalloc(sizeof(*nmp), GFP_KERNEL | __GFP_NOFAIL);
 	/*
 	 * Loop over the bitmap blocks.
 	 * We will do everything one bitmap block at a time.
 	 * Skip the current block if it is exactly full.
 	 * This also deals with the case where there were no rtextents before.
 	 */
-	for (bmbno = sbp->sb_rbmblocks -
-		     ((sbp->sb_rextents & ((1 << mp->m_blkbit_log) - 1)) != 0);
-	     bmbno < nrbmblocks;
-	     bmbno++) {
-		struct xfs_rtalloc_args	args = {
-			.mp		= mp,
-		};
-		struct xfs_rtalloc_args	nargs = {
-			.mp		= nmp,
-		};
-		struct xfs_trans	*tp;
-		xfs_rfsblock_t		nrblocks_step;
-
-		*nmp = *mp;
-		nsbp = &nmp->m_sb;
-		/*
-		 * Calculate new sb and mount fields for this round.
-		 */
-		nsbp->sb_rextsize = in->extsize;
-		nmp->m_rtxblklog = -1; /* don't use shift or masking */
-		nsbp->sb_rbmblocks = bmbno + 1;
-		nrblocks_step = (bmbno + 1) * NBBY * nsbp->sb_blocksize *
-				nsbp->sb_rextsize;
-		nsbp->sb_rblocks = min(nrblocks, nrblocks_step);
-		nsbp->sb_rextents = xfs_rtb_to_rtx(nmp, nsbp->sb_rblocks);
-		ASSERT(nsbp->sb_rextents != 0);
-		nsbp->sb_rextslog = xfs_compute_rextslog(nsbp->sb_rextents);
-		nrsumlevels = nmp->m_rsumlevels = nsbp->sb_rextslog + 1;
-		nrsumblocks = xfs_rtsummary_blockcount(mp, nrsumlevels,
-				nsbp->sb_rbmblocks);
-		nmp->m_rsumsize = nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks);
-		/* recompute growfsrt reservation from new rsumsize */
-		xfs_trans_resv_calc(nmp, &nmp->m_resv);
-
-		/*
-		 * Start a transaction, get the log reservation.
-		 */
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtfree, 0, 0, 0,
-				&tp);
+	bmbno = mp->m_sb.sb_rbmblocks;
+	if (xfs_rtx_to_rbmword(mp, mp->m_sb.sb_rextents) != 0)
+		bmbno--;
+	for (; bmbno < nrbmblocks; bmbno++) {
+		error = xfs_growfs_rt_bmblock(mp, in->newblocks, in->extsize,
+				bmbno);
 		if (error)
-			break;
-		args.tp = tp;
-		nargs.tp = tp;
-
-		/*
-		 * Lock out other callers by grabbing the bitmap and summary
-		 * inode locks and joining them to the transaction.
-		 */
-		xfs_rtbitmap_lock(tp, mp);
-		/*
-		 * Update the bitmap inode's size ondisk and incore.  We need
-		 * to update the incore size so that inode inactivation won't
-		 * punch what it thinks are "posteof" blocks.
-		 */
-		mp->m_rbmip->i_disk_size =
-			nsbp->sb_rbmblocks * nsbp->sb_blocksize;
-		i_size_write(VFS_I(mp->m_rbmip), mp->m_rbmip->i_disk_size);
-		xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
-		/*
-		 * Update the summary inode's size.  We need to update the
-		 * incore size so that inode inactivation won't punch what it
-		 * thinks are "posteof" blocks.
-		 */
-		mp->m_rsumip->i_disk_size = nmp->m_rsumsize;
-		i_size_write(VFS_I(mp->m_rsumip), mp->m_rsumip->i_disk_size);
-		xfs_trans_log_inode(tp, mp->m_rsumip, XFS_ILOG_CORE);
-		/*
-		 * Copy summary data from old to new sizes.
-		 * Do this when the real size (not block-aligned) changes.
-		 */
-		if (sbp->sb_rbmblocks != nsbp->sb_rbmblocks ||
-		    mp->m_rsumlevels != nmp->m_rsumlevels) {
-			error = xfs_rtcopy_summary(&args, &nargs);
-			if (error)
-				goto error_cancel;
-		}
-		/*
-		 * Update superblock fields.
-		 */
-		if (nsbp->sb_rextsize != sbp->sb_rextsize)
-			xfs_trans_mod_sb(tp, XFS_TRANS_SB_REXTSIZE,
-				nsbp->sb_rextsize - sbp->sb_rextsize);
-		if (nsbp->sb_rbmblocks != sbp->sb_rbmblocks)
-			xfs_trans_mod_sb(tp, XFS_TRANS_SB_RBMBLOCKS,
-				nsbp->sb_rbmblocks - sbp->sb_rbmblocks);
-		if (nsbp->sb_rblocks != sbp->sb_rblocks)
-			xfs_trans_mod_sb(tp, XFS_TRANS_SB_RBLOCKS,
-				nsbp->sb_rblocks - sbp->sb_rblocks);
-		if (nsbp->sb_rextents != sbp->sb_rextents)
-			xfs_trans_mod_sb(tp, XFS_TRANS_SB_REXTENTS,
-				nsbp->sb_rextents - sbp->sb_rextents);
-		if (nsbp->sb_rextslog != sbp->sb_rextslog)
-			xfs_trans_mod_sb(tp, XFS_TRANS_SB_REXTSLOG,
-				nsbp->sb_rextslog - sbp->sb_rextslog);
-		/*
-		 * Free new extent.
-		 */
-		error = xfs_rtfree_range(&nargs, sbp->sb_rextents,
-				nsbp->sb_rextents - sbp->sb_rextents);
-		xfs_rtbuf_cache_relse(&nargs);
-		if (error) {
-error_cancel:
-			xfs_trans_cancel(tp);
-			break;
-		}
-		/*
-		 * Mark more blocks free in the superblock.
-		 */
-		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FREXTENTS,
-			nsbp->sb_rextents - sbp->sb_rextents);
-		/*
-		 * Update mp values into the real mp structure.
-		 */
-		mp->m_rsumlevels = nrsumlevels;
-		mp->m_rsumsize = nrsumsize;
-		/* recompute growfsrt reservation from new rsumsize */
-		xfs_trans_resv_calc(mp, &mp->m_resv);
-
-		error = xfs_trans_commit(tp);
-		if (error)
-			break;
-
-		/* Ensure the mount RT feature flag is now set. */
-		mp->m_features |= XFS_FEAT_REALTIME;
+			goto out_free;
 	}
-	if (error)
-		goto out_free;
 
 	if (old_rextsize != in->extsize) {
 		error = xfs_growfs_rt_fixup_extsize(mp);
@@ -1081,11 +1085,6 @@ xfs_growfs_rt(
 	error = xfs_update_secondary_sbs(mp);
 
 out_free:
-	/*
-	 * Free the fake mp structure.
-	 */
-	kfree(nmp);
-
 	/*
 	 * If we had to allocate a new rsum_cache, we either need to free the
 	 * old one (if we succeeded) or free the new one and restore the old one


^ permalink raw reply related	[flat|nested] 53+ messages in thread
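
To make the per-step sizing in patch 09 concrete, the nrblocks_step
calculation and the min() against the final size can be tried out in user
space.  This is a sketch only: sketch_rblocks_for_step and SKETCH_NBBY are
hypothetical names, and the example figures (4096-byte blocks, growing to
100000 rt blocks) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NBBY 8	/* bits per byte */

/*
 * Number of rt blocks the filesystem covers once bitmap block bmbno is
 * in use: every bit in bitmap blocks 0..bmbno describes one rt extent of
 * rextsize blocks, capped at the final size requested by growfs.
 */
uint64_t sketch_rblocks_for_step(uint64_t bmbno, uint64_t blocksize,
				 uint64_t rextsize, uint64_t nrblocks)
{
	uint64_t step = (bmbno + 1) * SKETCH_NBBY * blocksize * rextsize;

	return nrblocks < step ? nrblocks : step;
}
```

With 4096-byte blocks and an extent size of one block, each bitmap block
covers 32768 rt blocks, so growing to 100000 blocks takes four rounds
ending at 32768, 65536, 98304 and 100000 blocks.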

* [PATCH 10/12] xfs: factor out a xfs_last_rt_bmblock helper
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-09-02 18:26   ` [PATCH 09/12] xfs: factor out a xfs_growfs_rt_bmblock helper Darrick J. Wong
@ 2024-09-02 18:26   ` Darrick J. Wong
  2024-09-02 18:26   ` [PATCH 11/12] xfs: factor out rtbitmap/summary initialization helpers Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 12/12] xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock Darrick J. Wong
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:26 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Add a helper to calculate the last currently used rt bitmap block to
better structure the growfs code and prepare for future changes to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index febd039718ee..45a0d29949ea 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -948,6 +948,23 @@ xfs_growfs_rt_bmblock(
 	return error;
 }
 
+/*
+ * Calculate the last rbmblock currently used.
+ *
+ * This also deals with the case where there were no rtextents before.
+ */
+static xfs_fileoff_t
+xfs_last_rt_bmblock(
+	struct xfs_mount	*mp)
+{
+	xfs_fileoff_t		bmbno = mp->m_sb.sb_rbmblocks;
+
+	/* Skip the current block if it is exactly full. */
+	if (xfs_rtx_to_rbmword(mp, mp->m_sb.sb_rextents) != 0)
+		bmbno--;
+	return bmbno;
+}
+
 /*
  * Grow the realtime area of the filesystem.
  */
@@ -1059,16 +1076,8 @@ xfs_growfs_rt(
 			goto out_unlock;
 	}
 
-	/*
-	 * Loop over the bitmap blocks.
-	 * We will do everything one bitmap block at a time.
-	 * Skip the current block if it is exactly full.
-	 * This also deals with the case where there were no rtextents before.
-	 */
-	bmbno = mp->m_sb.sb_rbmblocks;
-	if (xfs_rtx_to_rbmword(mp, mp->m_sb.sb_rextents) != 0)
-		bmbno--;
-	for (; bmbno < nrbmblocks; bmbno++) {
+	/* Initialize the free space bitmap one bitmap block at a time. */
+	for (bmbno = xfs_last_rt_bmblock(mp); bmbno < nrbmblocks; bmbno++) {
 		error = xfs_growfs_rt_bmblock(mp, in->newblocks, in->extsize,
 				bmbno);
 		if (error)


^ permalink raw reply related	[flat|nested] 53+ messages in thread
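
The starting-block rule that patch 10 wraps in a helper can be sketched in
user space as below.  Note the assumption: this version tests the extent
count with a plain remainder against the number of bits per bitmap block,
modelled on the mask test in the loop header that patch 09 removed.  It
makes no claim about what xfs_rtx_to_rbmword computes, and the names are
hypothetical.  The inputs are assumed consistent, i.e. rbmblocks is
nonzero whenever rextents is.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the bitmap block growfs should start from: the last block in
 * use if it still has unused bits, otherwise the block after it.
 */
uint64_t sketch_last_rt_bmblock(uint64_t rbmblocks, uint64_t rextents,
				uint64_t bits_per_block)
{
	uint64_t bmbno = rbmblocks;

	/* A partially used last block must be revisited. */
	if (rextents % bits_per_block != 0)
		bmbno--;
	return bmbno;
}
```

An empty rt volume (no bitmap blocks, no extents) yields block 0, which is
how the same rule covers the case of adding an rt volume for the first
time.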

* [PATCH 11/12] xfs: factor out rtbitmap/summary initialization helpers
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-09-02 18:26   ` [PATCH 10/12] xfs: factor out a xfs_last_rt_bmblock helper Darrick J. Wong
@ 2024-09-02 18:26   ` Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 12/12] xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock Darrick J. Wong
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:26 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Add helpers to libxfs that growfs and mkfs can share for initializing
the rtbitmap and summary files.  By passing the optional data pointer,
repair can also use them to rebuild those files.  This will become even
more useful when the rtgroups feature adds a metadata header to each
block, which means even more shared code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor documentation and data advance tweaks]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |  126 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtbitmap.h |    3 +
 fs/xfs/xfs_rtalloc.c         |  121 +---------------------------------------
 3 files changed, 133 insertions(+), 117 deletions(-)
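
The "optional data pointer" convention described above can be illustrated
with a user-space sketch: a NULL pointer means "zero every block", and a
non-NULL pointer is consumed one block at a time.  The function name and
the flat in-memory "file" are hypothetical; the real helper allocates
blocks and logs each buffer in its own transaction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill nblocks blocks of blocksize bytes each.  If data is NULL the
 * blocks are zeroed; otherwise data must hold nblocks * blocksize bytes
 * and is copied in block by block.
 */
void sketch_initialize_blocks(unsigned char *file, size_t nblocks,
			      size_t blocksize, const unsigned char *data)
{
	for (size_t i = 0; i < nblocks; i++) {
		unsigned char *dst = file + i * blocksize;

		if (data) {
			memcpy(dst, data, blocksize);
			data += blocksize;	/* advance to the next block */
		} else {
			memset(dst, 0, blocksize);
		}
	}
}
```

A growfs-style caller would pass NULL, while a repair-style caller would
pass a buffer holding the rebuilt contents.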


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 02d6668d860f..715d2c54ce02 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -13,6 +13,8 @@
 #include "xfs_mount.h"
 #include "xfs_inode.h"
 #include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
 #include "xfs_trans.h"
 #include "xfs_rtalloc.h"
 #include "xfs_error.h"
@@ -1255,3 +1257,127 @@ xfs_rtbitmap_unlock_shared(
 	if (rbmlock_flags & XFS_RBMLOCK_BITMAP)
 		xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
 }
+
+static int
+xfs_rtfile_alloc_blocks(
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		offset_fsb,
+	xfs_filblks_t		count_fsb,
+	struct xfs_bmbt_irec	*map)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	int			nmap = 1;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtalloc,
+			XFS_GROWFSRT_SPACE_RES(mp, count_fsb), 0, 0, &tp);
+	if (error)
+		return error;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+	error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
+				XFS_IEXT_ADD_NOSPLIT_CNT);
+	if (error)
+		goto out_trans_cancel;
+
+	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
+			XFS_BMAPI_METADATA, 0, map, &nmap);
+	if (error)
+		goto out_trans_cancel;
+
+	return xfs_trans_commit(tp);
+
+out_trans_cancel:
+	xfs_trans_cancel(tp);
+	return error;
+}
+
+/* Get a buffer for the block. */
+static int
+xfs_rtfile_initialize_block(
+	struct xfs_inode	*ip,
+	xfs_fsblock_t		fsbno,
+	void			*data)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	struct xfs_buf		*bp;
+	const size_t		copylen = mp->m_blockwsize << XFS_WORDLOG;
+	enum xfs_blft		buf_type;
+	int			error;
+
+	if (ip == mp->m_rsumip)
+		buf_type = XFS_BLFT_RTSUMMARY_BUF;
+	else
+		buf_type = XFS_BLFT_RTBITMAP_BUF;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtzero, 0, 0, 0, &tp);
+	if (error)
+		return error;
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+	error = xfs_trans_get_buf(tp, mp->m_ddev_targp,
+			XFS_FSB_TO_DADDR(mp, fsbno), mp->m_bsize, 0, &bp);
+	if (error) {
+		xfs_trans_cancel(tp);
+		return error;
+	}
+
+	xfs_trans_buf_set_type(tp, bp, buf_type);
+	bp->b_ops = &xfs_rtbuf_ops;
+	if (data)
+		memcpy(bp->b_addr, data, copylen);
+	else
+		memset(bp->b_addr, 0, copylen);
+	xfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1);
+	return xfs_trans_commit(tp);
+}
+
+/*
+ * Allocate space to the bitmap or summary file, and zero it, for growfs.
+ * @data must be a contiguous buffer large enough to fill all blocks in the
+ * file; or NULL to initialize the contents to zeroes.
+ */
+int
+xfs_rtfile_initialize_blocks(
+	struct xfs_inode	*ip,		/* inode (bitmap/summary) */
+	xfs_fileoff_t		offset_fsb,	/* offset to start from */
+	xfs_fileoff_t		end_fsb,	/* offset to allocate to */
+	void			*data)		/* data to fill the blocks */
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	const size_t		copylen = mp->m_blockwsize << XFS_WORDLOG;
+
+	while (offset_fsb < end_fsb) {
+		struct xfs_bmbt_irec	map;
+		xfs_filblks_t		i;
+		int			error;
+
+		error = xfs_rtfile_alloc_blocks(ip, offset_fsb,
+				end_fsb - offset_fsb, &map);
+		if (error)
+			return error;
+
+		/*
+		 * Now we need to clear the allocated blocks.
+		 *
+		 * Do this one block per transaction, to keep it simple.
+		 */
+		for (i = 0; i < map.br_blockcount; i++) {
+			error = xfs_rtfile_initialize_block(ip,
+					map.br_startblock + i, data);
+			if (error)
+				return error;
+			if (data)
+				data += copylen;
+		}
+
+		offset_fsb = map.br_startoff + map.br_blockcount;
+	}
+
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index e87e2099cff5..0d5ab5e2cb6a 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -343,6 +343,9 @@ xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
 unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp,
 		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
 
+int xfs_rtfile_initialize_blocks(struct xfs_inode *ip,
+		xfs_fileoff_t offset_fsb, xfs_fileoff_t end_fsb, void *data);
+
 void xfs_rtbitmap_lock(struct xfs_trans *tp, struct xfs_mount *mp);
 void xfs_rtbitmap_unlock(struct xfs_mount *mp);
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 45a0d29949ea..114807cd80ba 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -641,121 +641,6 @@ xfs_rtallocate_extent_size(
 	return -ENOSPC;
 }
 
-/*
- * Allocate space to the bitmap or summary file, and zero it, for growfs.
- */
-STATIC int
-xfs_growfs_rt_alloc(
-	struct xfs_mount	*mp,		/* file system mount point */
-	xfs_extlen_t		oblocks,	/* old count of blocks */
-	xfs_extlen_t		nblocks,	/* new count of blocks */
-	struct xfs_inode	*ip)		/* inode (bitmap/summary) */
-{
-	xfs_fileoff_t		bno;		/* block number in file */
-	struct xfs_buf		*bp;	/* temporary buffer for zeroing */
-	xfs_daddr_t		d;		/* disk block address */
-	int			error;		/* error return value */
-	xfs_fsblock_t		fsbno;		/* filesystem block for bno */
-	struct xfs_bmbt_irec	map;		/* block map output */
-	int			nmap;		/* number of block maps */
-	int			resblks;	/* space reservation */
-	enum xfs_blft		buf_type;
-	struct xfs_trans	*tp;
-
-	if (ip == mp->m_rsumip)
-		buf_type = XFS_BLFT_RTSUMMARY_BUF;
-	else
-		buf_type = XFS_BLFT_RTBITMAP_BUF;
-
-	/*
-	 * Allocate space to the file, as necessary.
-	 */
-	while (oblocks < nblocks) {
-		resblks = XFS_GROWFSRT_SPACE_RES(mp, nblocks - oblocks);
-		/*
-		 * Reserve space & log for one extent added to the file.
-		 */
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtalloc, resblks,
-				0, 0, &tp);
-		if (error)
-			return error;
-		/*
-		 * Lock the inode.
-		 */
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-
-		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
-		if (error)
-			goto out_trans_cancel;
-
-		/*
-		 * Allocate blocks to the bitmap file.
-		 */
-		nmap = 1;
-		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
-					XFS_BMAPI_METADATA, 0, &map, &nmap);
-		if (error)
-			goto out_trans_cancel;
-		/*
-		 * Free any blocks freed up in the transaction, then commit.
-		 */
-		error = xfs_trans_commit(tp);
-		if (error)
-			return error;
-		/*
-		 * Now we need to clear the allocated blocks.
-		 * Do this one block per transaction, to keep it simple.
-		 */
-		for (bno = map.br_startoff, fsbno = map.br_startblock;
-		     bno < map.br_startoff + map.br_blockcount;
-		     bno++, fsbno++) {
-			/*
-			 * Reserve log for one block zeroing.
-			 */
-			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtzero,
-					0, 0, 0, &tp);
-			if (error)
-				return error;
-			/*
-			 * Lock the bitmap inode.
-			 */
-			xfs_ilock(ip, XFS_ILOCK_EXCL);
-			xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-			/*
-			 * Get a buffer for the block.
-			 */
-			d = XFS_FSB_TO_DADDR(mp, fsbno);
-			error = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
-					mp->m_bsize, 0, &bp);
-			if (error)
-				goto out_trans_cancel;
-
-			xfs_trans_buf_set_type(tp, bp, buf_type);
-			bp->b_ops = &xfs_rtbuf_ops;
-			memset(bp->b_addr, 0, mp->m_sb.sb_blocksize);
-			xfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1);
-			/*
-			 * Commit the transaction.
-			 */
-			error = xfs_trans_commit(tp);
-			if (error)
-				return error;
-		}
-		/*
-		 * Go on to the next extent, if any.
-		 */
-		oblocks = map.br_startoff + map.br_blockcount;
-	}
-
-	return 0;
-
-out_trans_cancel:
-	xfs_trans_cancel(tp);
-	return error;
-}
-
 static int
 xfs_alloc_rsum_cache(
 	struct xfs_mount	*mp,
@@ -1062,10 +947,12 @@ xfs_growfs_rt(
 	/*
 	 * Allocate space to the bitmap and summary files, as necessary.
 	 */
-	error = xfs_growfs_rt_alloc(mp, rbmblocks, nrbmblocks, mp->m_rbmip);
+	error = xfs_rtfile_initialize_blocks(mp->m_rbmip, rbmblocks,
+			nrbmblocks, NULL);
 	if (error)
 		goto out_unlock;
-	error = xfs_growfs_rt_alloc(mp, rsumblocks, nrsumblocks, mp->m_rsumip);
+	error = xfs_rtfile_initialize_blocks(mp->m_rsumip, rsumblocks,
+			nrsumblocks, NULL);
 	if (error)
 		goto out_unlock;
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 12/12] xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock
  2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-09-02 18:26   ` [PATCH 11/12] xfs: factor out rtbitmap/summary initialization helpers Darrick J. Wong
@ 2024-09-02 18:27   ` Darrick J. Wong
  11 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:27 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

To prepare for being able to join an already locked rtbitmap inode to a
transaction, split out separate helpers for joining the transaction from
the locking helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c     |    3 ++-
 fs/xfs/libxfs/xfs_rtbitmap.c |   24 +++++++++++++-----------
 fs/xfs/libxfs/xfs_rtbitmap.h |    6 ++++--
 fs/xfs/xfs_rtalloc.c         |    6 ++++--
 4 files changed, 23 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7df74c35d9f9..112c7ee2d493 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5376,7 +5376,8 @@ xfs_bmap_del_extent_real(
 			 */
 			if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
 				tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
-				xfs_rtbitmap_lock(tp, mp);
+				xfs_rtbitmap_lock(mp);
+				xfs_rtbitmap_trans_join(tp);
 			}
 			error = xfs_rtfree_blocks(tp, del->br_startblock,
 					del->br_blockcount);
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 715d2c54ce02..d7c731aeee12 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1201,23 +1201,25 @@ xfs_rtsummary_wordcount(
 	return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG;
 }
 
-/*
- * Lock both realtime free space metadata inodes for a freespace update.  If a
- * transaction is given, the inodes will be joined to the transaction and the
- * ILOCKs will be released on transaction commit.
- */
+/* Lock both realtime free space metadata inodes for a freespace update. */
 void
 xfs_rtbitmap_lock(
-	struct xfs_trans	*tp,
 	struct xfs_mount	*mp)
 {
 	xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
-	if (tp)
-		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
-
 	xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
-	if (tp)
-		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
+}
+
+/*
+ * Join both realtime free space metadata inodes to the transaction.  The
+ * ILOCKs will be released on transaction commit.
+ */
+void
+xfs_rtbitmap_trans_join(
+	struct xfs_trans	*tp)
+{
+	xfs_trans_ijoin(tp, tp->t_mountp->m_rbmip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, tp->t_mountp->m_rsumip, XFS_ILOCK_EXCL);
 }
 
 /* Unlock both realtime free space metadata inodes after a freespace update. */
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 0d5ab5e2cb6a..523d3d3c12c6 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -346,8 +346,9 @@ unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp,
 int xfs_rtfile_initialize_blocks(struct xfs_inode *ip,
 		xfs_fileoff_t offset_fsb, xfs_fileoff_t end_fsb, void *data);
 
-void xfs_rtbitmap_lock(struct xfs_trans *tp, struct xfs_mount *mp);
+void xfs_rtbitmap_lock(struct xfs_mount *mp);
 void xfs_rtbitmap_unlock(struct xfs_mount *mp);
+void xfs_rtbitmap_trans_join(struct xfs_trans *tp);
 
 /* Lock the rt bitmap inode in shared mode */
 #define XFS_RBMLOCK_BITMAP	(1U << 0)
@@ -376,7 +377,8 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 # define xfs_rtbitmap_wordcount(mp, r)			(0)
 # define xfs_rtsummary_blockcount(mp, l, b)		(0)
 # define xfs_rtsummary_wordcount(mp, l, b)		(0)
-# define xfs_rtbitmap_lock(tp, mp)		do { } while (0)
+# define xfs_rtbitmap_lock(mp)			do { } while (0)
+# define xfs_rtbitmap_trans_join(tp)		do { } while (0)
 # define xfs_rtbitmap_unlock(mp)		do { } while (0)
 # define xfs_rtbitmap_lock_shared(mp, lf)	do { } while (0)
 # define xfs_rtbitmap_unlock_shared(mp, lf)	do { } while (0)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 114807cd80ba..d290749b0304 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -739,7 +739,8 @@ xfs_growfs_rt_bmblock(
 		goto out_free;
 	nargs.tp = args.tp;
 
-	xfs_rtbitmap_lock(args.tp, mp);
+	xfs_rtbitmap_lock(mp);
+	xfs_rtbitmap_trans_join(args.tp);
 
 	/*
 	 * Update the bitmap inode's size ondisk and incore.  We need to update
@@ -1313,7 +1314,8 @@ xfs_bmap_rtalloc(
 	 * Lock out modifications to both the RT bitmap and summary inodes
 	 */
 	if (!rtlocked) {
-		xfs_rtbitmap_lock(ap->tp, mp);
+		xfs_rtbitmap_lock(mp);
+		xfs_rtbitmap_trans_join(ap->tp);
 		rtlocked = true;
 	}
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 01/10] xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
@ 2024-09-02 18:27   ` Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 02/10] xfs: ensure rtx mask/shift are correct after growfs Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:27 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

After going to great lengths to calculate the transaction reservation
for the new geometry, we should also use it to allocate the transaction
it was calculated for.

Fixes: 578bd4ce7100 ("xfs: recompute growfsrtfree transaction reservation while growing rt volume")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index d290749b0304..a9f08d96f1fe 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -730,10 +730,12 @@ xfs_growfs_rt_bmblock(
 		xfs_rtsummary_blockcount(mp, nmp->m_rsumlevels,
 			nmp->m_sb.sb_rbmblocks));
 
-	/* recompute growfsrt reservation from new rsumsize */
+	/*
+	 * Recompute the growfsrt reservation from the new rsumsize, so that the
+	 * transaction below uses the new, potentially larger value.
+	 */
 	xfs_trans_resv_calc(nmp, &nmp->m_resv);
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtfree, 0, 0, 0,
+	error = xfs_trans_alloc(mp, &M_RES(nmp)->tr_growrtfree, 0, 0, 0,
 			&args.tp);
 	if (error)
 		goto out_free;


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/10] xfs: ensure rtx mask/shift are correct after growfs
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 01/10] xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock Darrick J. Wong
@ 2024-09-02 18:27   ` Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 03/10] xfs: don't return too-short extents from xfs_rtallocate_extent_block Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:27 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

When growfs sets an extent size, it doesn't update the m_rtxblklog and
m_rtxblkmask values, which could lead to incorrect usage of them if they
were set before and can't be used for the new extent size.

Add a xfs_mount_sb_set_rextsize helper that updates the two fields, and
also use it when calculating the new RT geometry instead of disabling
the optimization there.
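
As a rough userspace illustration of why the cached shift and mask have
to follow sb_rextsize, here is a minimal sketch.  The struct and function
names are stand-ins invented for this example, not the kernel's
definitions, and the "0 means not a power of two" convention is an
assumption of the sketch:

```c
#include <stdint.h>

/* Hypothetical stand-in for the cached per-mount shift and mask. */
struct rtx_geom {
	uint32_t	rextsize;	/* blocks per rt extent, must be > 0 */
	uint8_t		blklog;		/* log2(rextsize), or 0 if not a power of 2 */
	uint64_t	blkmask;	/* rextsize - 1, or 0 if not a power of 2 */
};

static int is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Recompute the derived fields whenever rextsize changes. */
static void rtx_geom_set_rextsize(struct rtx_geom *g, uint32_t rextsize)
{
	g->rextsize = rextsize;
	g->blklog = 0;
	g->blkmask = 0;
	if (is_pow2(rextsize)) {
		while ((1U << g->blklog) < rextsize)
			g->blklog++;
		g->blkmask = (uint64_t)rextsize - 1;
	}
}

/* Convert a block number to an rt extent number using the cached values. */
static uint64_t rtb_to_rtx(const struct rtx_geom *g, uint64_t rtbno)
{
	/* Shift only when rextsize is a power of two (1 has mask 0). */
	if (g->blkmask || g->rextsize == 1)
		return rtbno >> g->blklog;
	return rtbno / g->rextsize;
}
```

If the derived fields were left at their old values after rextsize went
from 4 to 3, the conversion would keep shifting by 2 and return the wrong
rt extent number.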

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_sb.c |   12 ++++++++++--
 fs/xfs/libxfs/xfs_sb.h |    2 ++
 fs/xfs/xfs_rtalloc.c   |    5 +++--
 3 files changed, 15 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f9c3045f71e0..a6fa9aedb28b 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -965,6 +965,15 @@ const struct xfs_buf_ops xfs_sb_quiet_buf_ops = {
 	.verify_write = xfs_sb_write_verify,
 };
 
+void
+xfs_mount_sb_set_rextsize(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*sbp)
+{
+	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
+	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
+}
+
 /*
  * xfs_mount_common
  *
@@ -989,8 +998,7 @@ xfs_sb_mount_common(
 	mp->m_blockmask = sbp->sb_blocksize - 1;
 	mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
 	mp->m_blockwmask = mp->m_blockwsize - 1;
-	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
-	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
+	xfs_mount_sb_set_rextsize(mp, sbp);
 
 	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1);
 	mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0);
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index 796f02191dfd..885c83755991 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -17,6 +17,8 @@ extern void	xfs_log_sb(struct xfs_trans *tp);
 extern int	xfs_sync_sb(struct xfs_mount *mp, bool wait);
 extern int	xfs_sync_sb_buf(struct xfs_mount *mp);
 extern void	xfs_sb_mount_common(struct xfs_mount *mp, struct xfs_sb *sbp);
+void		xfs_mount_sb_set_rextsize(struct xfs_mount *mp,
+			struct xfs_sb *sbp);
 extern void	xfs_sb_from_disk(struct xfs_sb *to, struct xfs_dsb *from);
 extern void	xfs_sb_to_disk(struct xfs_dsb *to, struct xfs_sb *from);
 extern void	xfs_sb_quota_from_disk(struct xfs_sb *sbp);
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index a9f08d96f1fe..4bbb50d5a4b7 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -719,8 +719,8 @@ xfs_growfs_rt_bmblock(
 	/*
 	 * Calculate new sb and mount fields for this round.
 	 */
-	nmp->m_rtxblklog = -1; /* don't use shift or masking */
 	nmp->m_sb.sb_rextsize = rextsize;
+	xfs_mount_sb_set_rextsize(nmp, &nmp->m_sb);
 	nmp->m_sb.sb_rbmblocks = bmbno + 1;
 	nmp->m_sb.sb_rblocks = min(nrblocks, nrblocks_step);
 	nmp->m_sb.sb_rextents = xfs_rtb_to_rtx(nmp, nmp->m_sb.sb_rblocks);
@@ -807,10 +807,11 @@ xfs_growfs_rt_bmblock(
 	xfs_trans_mod_sb(args.tp, XFS_TRANS_SB_FREXTENTS, freed_rtx);
 
 	/*
-	 * Update mp values into the real mp structure.
+	 * Update the calculated values in the real mount structure.
 	 */
 	mp->m_rsumlevels = nmp->m_rsumlevels;
 	mp->m_rsumsize = nmp->m_rsumsize;
+	xfs_mount_sb_set_rextsize(mp, &mp->m_sb);
 
 	/*
 	 * Recompute the growfsrt reservation from the new rsumsize.


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/10] xfs: don't return too-short extents from xfs_rtallocate_extent_block
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 01/10] xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock Darrick J. Wong
  2024-09-02 18:27   ` [PATCH 02/10] xfs: ensure rtx mask/shift are correct after growfs Darrick J. Wong
@ 2024-09-02 18:27   ` Darrick J. Wong
  2024-09-02 18:28   ` [PATCH 04/10] xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:27 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If xfs_rtallocate_extent_block is asked for a variable-sized allocation,
it will try to return the best-sized free extent, which is apparently
the largest one that it finds starting in this rtbitmap block.  It will
then trim the size of the extent as needed to align it with prod.

However, it misses one thing -- rounding down the best-fit candidate to
the required alignment could make the extent shorter than minlen.  In
the case where minlen > 1, we'd rather the caller relaxed its alignment
requirements and tried again, as the allocator already supports that.

Returning a too-short extent causes xfs_bmapi_write to return ENOSR if
there aren't enough nmaps to handle multiple new allocations, which can
then cause filesystem shutdowns.
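
The rounding problem can be shown with a small userspace sketch.  The
function name is made up for this example, and returning 0 for "no space"
is an assumption of the sketch rather than what the kernel function does:

```c
#include <stdint.h>

/*
 * Round bestlen down to a multiple of prod, then reject the result if it
 * fell below minlen.  Returns the usable length, or 0 for "no space here".
 * Assumes minlen >= 1 so that 0 is never a valid length.
 */
static uint32_t trim_candidate(uint32_t bestlen, uint32_t minlen, uint32_t prod)
{
	if (prod > 1)
		bestlen -= bestlen % prod;
	if (bestlen < minlen)
		return 0;
	return bestlen;
}
```

With bestlen = 7, minlen = 5 and prod = 4, the candidate is rounded down
to 4, which is shorter than minlen and so must be rejected.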

I haven't seen this happen on any production systems, but then I don't
think it's very common to set a per-file extent size hint on realtime
files.  I tripped it while working on the rtgroups feature and pounding
on the realtime allocator enthusiastically.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 4bbb50d5a4b7..c65ee8d1d38d 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -289,16 +289,9 @@ xfs_rtallocate_extent_block(
 			return error;
 	}
 
-	/*
-	 * Searched the whole thing & didn't find a maxlen free extent.
-	 */
-	if (minlen > maxlen || besti == -1) {
-		/*
-		 * Allocation failed.  Set *nextp to the next block to try.
-		 */
-		*nextp = next;
-		return -ENOSPC;
-	}
+	/* Searched the whole thing & didn't find a maxlen free extent. */
+	if (minlen > maxlen || besti == -1)
+		goto nospace;
 
 	/*
 	 * If size should be a multiple of prod, make that so.
@@ -311,12 +304,20 @@ xfs_rtallocate_extent_block(
 			bestlen -= p;
 	}
 
+	/* Don't return a too-short extent. */
+	if (bestlen < minlen)
+		goto nospace;
+
 	/*
 	 * Pick besti for bestlen & return that.
 	 */
 	*len = bestlen;
 	*rtx = besti;
 	return 0;
+nospace:
+	/* Allocation failed.  Set *nextp to the next block to try. */
+	*nextp = next;
+	return -ENOSPC;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/10] xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-09-02 18:27   ` [PATCH 03/10] xfs: don't return too-short extents from xfs_rtallocate_extent_block Darrick J. Wong
@ 2024-09-02 18:28   ` Darrick J. Wong
  2024-09-02 18:28   ` [PATCH 05/10] xfs: refactor aligning bestlen to prod Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:28 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The loop conditional here is not quite correct because an rtbitmap block
can represent rtextents beyond the end of the rt volume.  There's no way
that it makes sense to scan for free space beyond EOFS, so don't do it.
This overrun has been present since v2.6.0.
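
A minimal sketch of the corrected loop bound, written as standalone C
with a hypothetical helper name and plain integer types in place of the
kernel's:

```c
#include <stdint.h>

/*
 * Return the last rt extent to scan for one bitmap block: either the last
 * extent the block describes, or the last extent of the volume, whichever
 * comes first.  Assumes rextents > 0 and next_block_rtx > 0.
 */
static uint64_t scan_end(uint64_t rextents, uint64_t next_block_rtx)
{
	uint64_t limit = rextents < next_block_rtx ? rextents : next_block_rtx;

	return limit - 1;
}
```

For a volume of 100 rtextents whose last bitmap block could describe
extents up to 127, the scan stops at extent 99.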

Also fix the type of bestlen, which was incorrectly converted.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index c65ee8d1d38d..58081ce5247b 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -229,22 +229,20 @@ xfs_rtallocate_extent_block(
 	xfs_rtxnum_t		*rtx)	/* out: start rtext allocated */
 {
 	struct xfs_mount	*mp = args->mp;
-	xfs_rtxnum_t		besti;	/* best rtext found so far */
-	xfs_rtxnum_t		bestlen;/* best length found so far */
+	xfs_rtxnum_t		besti = -1; /* best rtext found so far */
 	xfs_rtxnum_t		end;	/* last rtext in chunk */
-	int			error;
 	xfs_rtxnum_t		i;	/* current rtext trying */
 	xfs_rtxnum_t		next;	/* next rtext to try */
+	xfs_rtxlen_t		bestlen = 0; /* best length found so far */
 	int			stat;	/* status from internal calls */
+	int			error;
 
 	/*
-	 * Loop over all the extents starting in this bitmap block,
-	 * looking for one that's long enough.
+	 * Loop over all the extents starting in this bitmap block up to the
+	 * end of the rt volume, looking for one that's long enough.
 	 */
-	for (i = xfs_rbmblock_to_rtx(mp, bbno), besti = -1, bestlen = 0,
-		end = xfs_rbmblock_to_rtx(mp, bbno + 1) - 1;
-	     i <= end;
-	     i++) {
+	end = min(mp->m_sb.sb_rextents, xfs_rbmblock_to_rtx(mp, bbno + 1)) - 1;
+	for (i = xfs_rbmblock_to_rtx(mp, bbno); i <= end; i++) {
 		/* Make sure we don't scan off the end of the rt volume. */
 		maxlen = xfs_rtallocate_clamp_len(mp, i, maxlen, prod);
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/10] xfs: refactor aligning bestlen to prod
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-09-02 18:28   ` [PATCH 04/10] xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block Darrick J. Wong
@ 2024-09-02 18:28   ` Darrick J. Wong
  2024-09-02 18:28   ` [PATCH 06/10] xfs: clean up xfs_rtallocate_extent_exact a bit Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:28 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

There are two places in xfs_rtalloc.c where we want to make sure that a
count of rt extents is aligned with a particular prod(uct) factor.  In
one spot, we actually use rounddown(), albeit unnecessarily if prod < 2.
In the other case, we open-code this rounding inefficiently by promoting
the 32-bit length value to a 64-bit value and then performing a 64-bit
division to figure out the subtraction.

Refactor this into a single helper that uses the correct types and
division method for the type, and skips the division entirely unless
prod is large enough to make a difference.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |   26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 58081ce5247b..11c58f12bcb2 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -194,6 +194,17 @@ xfs_rtallocate_range(
 	return xfs_rtmodify_range(args, start, len, 0);
 }
 
+/* Reduce @rtxlen until it is a multiple of @prod. */
+static inline xfs_rtxlen_t
+xfs_rtalloc_align_len(
+	xfs_rtxlen_t	rtxlen,
+	xfs_rtxlen_t	prod)
+{
+	if (unlikely(prod > 1))
+		return rounddown(rtxlen, prod);
+	return rtxlen;
+}
+
 /*
  * Make sure we don't run off the end of the rt volume.  Be careful that
  * adjusting maxlen downwards doesn't cause us to fail the alignment checks.
@@ -208,7 +219,7 @@ xfs_rtallocate_clamp_len(
 	xfs_rtxlen_t		ret;
 
 	ret = min(mp->m_sb.sb_rextents, startrtx + rtxlen) - startrtx;
-	return rounddown(ret, prod);
+	return xfs_rtalloc_align_len(ret, prod);
 }
 
 /*
@@ -292,17 +303,10 @@ xfs_rtallocate_extent_block(
 		goto nospace;
 
 	/*
-	 * If size should be a multiple of prod, make that so.
+	 * Ensure bestlen is a multiple of prod, but don't return a too-short
+	 * extent.
 	 */
-	if (prod > 1) {
-		xfs_rtxlen_t	p;	/* amount to trim length by */
-
-		div_u64_rem(bestlen, prod, &p);
-		if (p)
-			bestlen -= p;
-	}
-
-	/* Don't return a too-short extent. */
+	bestlen = xfs_rtalloc_align_len(bestlen, prod);
 	if (bestlen < minlen)
 		goto nospace;
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/10] xfs: clean up xfs_rtallocate_extent_exact a bit
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-09-02 18:28   ` [PATCH 05/10] xfs: refactor aligning bestlen to prod Darrick J. Wong
@ 2024-09-02 18:28   ` Darrick J. Wong
  2024-09-02 18:28   ` [PATCH 07/10] xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:28 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Before we start doing more surgery on the rt allocator, let's clean up
the exact allocator so that it doesn't change its arguments and uses the
helper introduced in the previous patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |   41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 11c58f12bcb2..af357704895d 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -338,10 +338,10 @@ xfs_rtallocate_extent_exact(
 	xfs_rtxlen_t		prod,	/* extent product factor */
 	xfs_rtxnum_t		*rtx)	/* out: start rtext allocated */
 {
-	int			error;
-	xfs_rtxlen_t		i;	/* extent length trimmed due to prod */
-	int			isfree;	/* extent is free */
 	xfs_rtxnum_t		next;	/* next rtext to try (dummy) */
+	xfs_rtxlen_t		alloclen; /* candidate length */
+	int			isfree;	/* extent is free */
+	int			error;
 
 	ASSERT(minlen % prod == 0);
 	ASSERT(maxlen % prod == 0);
@@ -352,25 +352,26 @@ xfs_rtallocate_extent_exact(
 	if (error)
 		return error;
 
-	if (!isfree) {
-		/*
-		 * If not, allocate what there is, if it's at least minlen.
-		 */
-		maxlen = next - start;
-		if (maxlen < minlen)
-			return -ENOSPC;
-
-		/*
-		 * Trim off tail of extent, if prod is specified.
-		 */
-		if (prod > 1 && (i = maxlen % prod)) {
-			maxlen -= i;
-			if (maxlen < minlen)
-				return -ENOSPC;
-		}
+	if (isfree) {
+		/* start to maxlen is all free; allocate it. */
+		*len = maxlen;
+		*rtx = start;
+		return 0;
 	}
 
-	*len = maxlen;
+	/*
+	 * If not, allocate what there is, if it's at least minlen.
+	 */
+	alloclen = next - start;
+	if (alloclen < minlen)
+		return -ENOSPC;
+
+	/* Ensure alloclen is a multiple of prod. */
+	alloclen = xfs_rtalloc_align_len(alloclen, prod);
+	if (alloclen < minlen)
+		return -ENOSPC;
+
+	*len = alloclen;
 	*rtx = start;
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 07/10] xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-09-02 18:28   ` [PATCH 06/10] xfs: clean up xfs_rtallocate_extent_exact a bit Darrick J. Wong
@ 2024-09-02 18:28   ` Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 08/10] xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:28 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The near rt allocator employs two allocation strategies -- first it
tries to allocate at exactly @start.  If that fails, it will pivot back
and forth around that starting point looking for an appropriately sized
free space.

However, I clamped maxlen ages ago to prevent the exact allocation scan
from running off the end of the rt volume.  This, I realize, was
excessive.  If the allocation request is (say) for 32 rtx but the start
position is 5 rtx from the end of the volume, we clamp maxlen to 5.  If
the exact allocation fails, we then pivot back and forth looking for 5
rtx, even though the original intent was to try to get 32 rtx.

If we then find 5 rtx when we could have gotten 32 rtx, we've not done
as well as we could have.  This may be moot if the caller immediately
comes back for more space, but it might not be.  Either way, we can do
better here.
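
The numbers above can be checked with a small userspace sketch of the
clamping step.  The function name and the plain integer types are
assumptions made for this example:

```c
#include <stdint.h>

/*
 * Clamp a request of rtxlen extents starting at startrtx so that it does
 * not run past rextents, then round down to a multiple of prod.
 * Assumes startrtx < rextents and prod >= 1.
 */
static uint32_t clamp_len(uint64_t rextents, uint64_t startrtx,
			  uint32_t rtxlen, uint32_t prod)
{
	uint64_t end = startrtx + rtxlen;
	uint32_t ret;

	if (end > rextents)
		end = rextents;
	ret = (uint32_t)(end - startrtx);
	return ret - ret % prod;
}
```

On a 1000 rtx volume, a 32 rtx request at start 995 clamps to 5, while
the same request at start 100 keeps all 32.  Clamping only inside the
exact allocation attempt leaves the caller's maxlen of 32 intact for the
search that follows.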

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index af357704895d..d27bfec08ef8 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -338,23 +338,29 @@ xfs_rtallocate_extent_exact(
 	xfs_rtxlen_t		prod,	/* extent product factor */
 	xfs_rtxnum_t		*rtx)	/* out: start rtext allocated */
 {
+	struct xfs_mount	*mp = args->mp;
 	xfs_rtxnum_t		next;	/* next rtext to try (dummy) */
 	xfs_rtxlen_t		alloclen; /* candidate length */
+	xfs_rtxlen_t		scanlen; /* number of free rtx to look for */
 	int			isfree;	/* extent is free */
 	int			error;
 
 	ASSERT(minlen % prod == 0);
 	ASSERT(maxlen % prod == 0);
-	/*
-	 * Check if the range in question (for maxlen) is free.
-	 */
-	error = xfs_rtcheck_range(args, start, maxlen, 1, &next, &isfree);
+
+	/* Make sure we don't run off the end of the rt volume. */
+	scanlen = xfs_rtallocate_clamp_len(mp, start, maxlen, prod);
+	if (scanlen < minlen)
+		return -ENOSPC;
+
+	/* Check if the range in question (for scanlen) is free. */
+	error = xfs_rtcheck_range(args, start, scanlen, 1, &next, &isfree);
 	if (error)
 		return error;
 
 	if (isfree) {
-		/* start to maxlen is all free; allocate it. */
-		*len = maxlen;
+		/* start to scanlen is all free; allocate it. */
+		*len = scanlen;
 		*rtx = start;
 		return 0;
 	}
@@ -410,11 +416,6 @@ xfs_rtallocate_extent_near(
 	if (start >= mp->m_sb.sb_rextents)
 		start = mp->m_sb.sb_rextents - 1;
 
-	/* Make sure we don't run off the end of the rt volume. */
-	maxlen = xfs_rtallocate_clamp_len(mp, start, maxlen, prod);
-	if (maxlen < minlen)
-		return -ENOSPC;
-
 	/*
 	 * Try the exact allocation first.
 	 */


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 08/10] xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-09-02 18:28   ` [PATCH 07/10] xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near Darrick J. Wong
@ 2024-09-02 18:29   ` Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 09/10] xfs: remove xfs_rtb_to_rtxrem Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 10/10] xfs: simplify xfs_rtalloc_query_range Darrick J. Wong
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:29 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This function tries to find a suitable free space extent starting from
a particular rtbitmap block.  Some time ago, I added a clamping function
to prevent the free space scans from running off the end of the bitmap,
but I didn't quite get the logic right.

Let's say there's an allocation request with a minlen of 5 and a maxlen
of 32 and we're scanning the last rtbitmap block.  If we come within 4
rtx of the end of the rt volume, maxlen will get clamped to 4.  If the
next 3 rtx are free, we could have satisfied the allocation, but the
code setting partial besti/bestlen for "minlen < maxlen" will think that
we're doing a non-variable allocation and ignore it.

The root of this problem is overwriting maxlen; I should have stuffed
the results in a different variable, which would not have introduced
this bug.
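
To make the difference concrete, here is a simplified userspace sketch of
the two patterns.  All names are hypothetical, prod alignment is left out,
and the "minlen < maxlen" test stands in for the variable-sized check
mentioned above:

```c
#include <stdint.h>

/* Limit len so that a range starting at i stays inside the volume. */
static uint32_t clamp_to_volume(uint64_t rextents, uint64_t i, uint32_t len)
{
	uint64_t room = rextents - i;	/* assumes i < rextents */

	return len < room ? len : (uint32_t)room;
}

/* Old pattern: the clamped value replaces the caller's maxlen. */
static int variable_after_overwrite(uint64_t rextents, uint64_t i,
				    uint32_t minlen, uint32_t maxlen)
{
	maxlen = clamp_to_volume(rextents, i, maxlen);
	return minlen < maxlen;
}

/* New pattern: clamp into scanlen and leave maxlen untouched. */
static int variable_with_scanlen(uint64_t rextents, uint64_t i,
				 uint32_t minlen, uint32_t maxlen)
{
	uint32_t scanlen = clamp_to_volume(rextents, i, maxlen);

	(void)scanlen;	/* a real loop would use this for the range check */
	return minlen < maxlen;
}
```

With minlen = 5 and maxlen = 32 at 4 rtx from the end of a 1000 rtx
volume, the first pattern stops treating the request as variable-sized,
while the second still does.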

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index d27bfec08ef8..72123e2337d8 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -244,6 +244,7 @@ xfs_rtallocate_extent_block(
 	xfs_rtxnum_t		end;	/* last rtext in chunk */
 	xfs_rtxnum_t		i;	/* current rtext trying */
 	xfs_rtxnum_t		next;	/* next rtext to try */
+	xfs_rtxlen_t		scanlen; /* number of free rtx to look for */
 	xfs_rtxlen_t		bestlen = 0; /* best length found so far */
 	int			stat;	/* status from internal calls */
 	int			error;
@@ -255,20 +256,22 @@ xfs_rtallocate_extent_block(
 	end = min(mp->m_sb.sb_rextents, xfs_rbmblock_to_rtx(mp, bbno + 1)) - 1;
 	for (i = xfs_rbmblock_to_rtx(mp, bbno); i <= end; i++) {
 		/* Make sure we don't scan off the end of the rt volume. */
-		maxlen = xfs_rtallocate_clamp_len(mp, i, maxlen, prod);
+		scanlen = xfs_rtallocate_clamp_len(mp, i, maxlen, prod);
+		if (scanlen < minlen)
+			break;
 
 		/*
-		 * See if there's a free extent of maxlen starting at i.
+		 * See if there's a free extent of scanlen starting at i.
 		 * If it's not so then next will contain the first non-free.
 		 */
-		error = xfs_rtcheck_range(args, i, maxlen, 1, &next, &stat);
+		error = xfs_rtcheck_range(args, i, scanlen, 1, &next, &stat);
 		if (error)
 			return error;
 		if (stat) {
 			/*
-			 * i for maxlen is all free, allocate and return that.
+			 * i to scanlen is all free, allocate and return that.
 			 */
-			*len = maxlen;
+			*len = scanlen;
 			*rtx = i;
 			return 0;
 		}
@@ -299,7 +302,7 @@ xfs_rtallocate_extent_block(
 	}
 
 	/* Searched the whole thing & didn't find a maxlen free extent. */
-	if (minlen > maxlen || besti == -1)
+	if (besti == -1)
 		goto nospace;
 
 	/*


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 09/10] xfs: remove xfs_rtb_to_rtxrem
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-09-02 18:29   ` [PATCH 08/10] xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block Darrick J. Wong
@ 2024-09-02 18:29   ` Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 10/10] xfs: simplify xfs_rtalloc_query_range Darrick J. Wong
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:29 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Reduce the number of block number conversion helpers by removing
xfs_rtb_to_rtxrem.  Any recent compiler is smart enough to eliminate
the double division when separate xfs_rtb_to_rtx and
xfs_rtb_to_rtxoff calls are used.
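
The resulting call pattern can be sketched in plain userspace C as
follows.  The helper names (blk_to_ext, blk_to_extoff, free_blocks) and
the -1 error return are assumptions made for this sketch, not the
kernel's helpers; the sketch only shows the quotient and the remainder
being requested through two separate calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: block number -> extent number (quotient). */
static uint64_t blk_to_ext(uint64_t bno, uint32_t extsize)
{
	return bno / extsize;
}

/* Hypothetical helper: block number -> offset within extent (remainder). */
static uint32_t blk_to_extoff(uint64_t bno, uint32_t extsize)
{
	return (uint32_t)(bno % extsize);
}

/*
 * Reject a misaligned start or length first, then convert both to
 * extent units.  Returns 0 on success and -1 on misalignment.
 */
static int free_blocks(uint64_t bno, uint64_t len, uint32_t extsize,
		uint64_t *start_ext, uint64_t *ext_count)
{
	if (blk_to_extoff(len, extsize) != 0)
		return -1;
	if (blk_to_extoff(bno, extsize) != 0)
		return -1;

	*start_ext = blk_to_ext(bno, extsize);
	*ext_count = blk_to_ext(len, extsize);
	return 0;
}
```

With optimization enabled, a compiler is free to merge the division and
the modulo of the same operands into one operation, which is why the
combined helper is not expected to be faster.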

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |    9 ++++-----
 fs/xfs/libxfs/xfs_rtbitmap.h |   18 ------------------
 2 files changed, 4 insertions(+), 23 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index d7c731aeee12..431ef62939ca 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1022,25 +1022,24 @@ xfs_rtfree_blocks(
 	xfs_filblks_t		rtlen)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	xfs_rtxnum_t		start;
-	xfs_filblks_t		len;
 	xfs_extlen_t		mod;
 
 	ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN);
 
-	len = xfs_rtb_to_rtxrem(mp, rtlen, &mod);
+	mod = xfs_rtb_to_rtxoff(mp, rtlen);
 	if (mod) {
 		ASSERT(mod == 0);
 		return -EIO;
 	}
 
-	start = xfs_rtb_to_rtxrem(mp, rtbno, &mod);
+	mod = xfs_rtb_to_rtxoff(mp, rtbno);
 	if (mod) {
 		ASSERT(mod == 0);
 		return -EIO;
 	}
 
-	return xfs_rtfree_extent(tp, start, len);
+	return xfs_rtfree_extent(tp, xfs_rtb_to_rtx(mp, rtbno),
+			xfs_rtb_to_rtx(mp, rtlen));
 }
 
 /* Find all the free records within a given range. */
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 523d3d3c12c6..69ddacd4b01e 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -86,24 +86,6 @@ xfs_rtb_to_rtxoff(
 	return do_div(rtbno, mp->m_sb.sb_rextsize);
 }
 
-/*
- * Crack an rt block number into an rt extent number and an offset within that
- * rt extent.  Returns the rt extent number directly and the offset in @off.
- */
-static inline xfs_rtxnum_t
-xfs_rtb_to_rtxrem(
-	struct xfs_mount	*mp,
-	xfs_rtblock_t		rtbno,
-	xfs_extlen_t		*off)
-{
-	if (likely(mp->m_rtxblklog >= 0)) {
-		*off = rtbno & mp->m_rtxblkmask;
-		return rtbno >> mp->m_rtxblklog;
-	}
-
-	return div_u64_rem(rtbno, mp->m_sb.sb_rextsize, off);
-}
-
 /*
  * Convert an rt block number into an rt extent number, rounding up to the next
  * rt extent if the rt block is not aligned to an rt extent boundary.


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 10/10] xfs: simplify xfs_rtalloc_query_range
  2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-09-02 18:29   ` [PATCH 09/10] xfs: remove xfs_rtb_to_rtxrem Darrick J. Wong
@ 2024-09-02 18:29   ` Darrick J. Wong
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:29 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

There isn't much of a good reason to pass the xfs_rtalloc_rec structures
that describe extents to xfs_rtalloc_query_range as we really just want
a lower and upper bound xfs_rtxnum_t.  Pass the rtxnum directly and
simplify the interface.
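
For illustration, a scalar-bounds interface of this shape can be
sketched in userspace C.  Everything here is hypothetical (struct
free_rec, query_fn, query_range, count_free, a bool array standing in
for the bitmap, -1 standing in for -EINVAL); it only mirrors the idea
that callers pass two numbers while the callback still receives a
record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Record handed to the callback for each free run. */
struct free_rec {
	uint64_t	startext;
	uint64_t	extcount;
};

typedef int (*query_fn)(const struct free_rec *rec, void *priv);

/* Walk [start, end] and call fn for every run of free extents. */
static int query_range(const bool *is_free, uint64_t nr_ext,
		uint64_t start, uint64_t end, query_fn fn, void *priv)
{
	if (start > end)
		return -1;
	if (start == end || start >= nr_ext)
		return 0;
	if (end > nr_ext - 1)
		end = nr_ext - 1;

	while (start <= end) {
		uint64_t last = start;

		/* Extend the run while the next extent has the same state. */
		while (last < end && is_free[last + 1] == is_free[start])
			last++;

		if (is_free[start]) {
			struct free_rec rec = {
				.startext = start,
				.extcount = last - start + 1,
			};
			int error = fn(&rec, priv);

			if (error)
				return error;
		}
		start = last + 1;
	}
	return 0;
}

/* Example callback: add up the number of free extents seen. */
static int count_free(const struct free_rec *rec, void *priv)
{
	*(uint64_t *)priv += rec->extcount;
	return 0;
}
```

A caller that wants the whole device passes 0 and nr_ext - 1 directly,
with no temporary key records to set up.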

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |   42 +++++++++++++++++-------------------------
 fs/xfs/libxfs/xfs_rtbitmap.h |    3 +--
 fs/xfs/xfs_discard.c         |   15 +++++++--------
 fs/xfs/xfs_fsmap.c           |   11 +++++------
 4 files changed, 30 insertions(+), 41 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 431ef62939ca..c58eb75ef0fa 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1047,8 +1047,8 @@ int
 xfs_rtalloc_query_range(
 	struct xfs_mount		*mp,
 	struct xfs_trans		*tp,
-	const struct xfs_rtalloc_rec	*low_rec,
-	const struct xfs_rtalloc_rec	*high_rec,
+	xfs_rtxnum_t			start,
+	xfs_rtxnum_t			end,
 	xfs_rtalloc_query_range_fn	fn,
 	void				*priv)
 {
@@ -1056,45 +1056,42 @@ xfs_rtalloc_query_range(
 		.mp			= mp,
 		.tp			= tp,
 	};
-	struct xfs_rtalloc_rec		rec;
-	xfs_rtxnum_t			rtstart;
-	xfs_rtxnum_t			rtend;
-	xfs_rtxnum_t			high_key;
-	int				is_free;
 	int				error = 0;
 
-	if (low_rec->ar_startext > high_rec->ar_startext)
+	if (start > end)
 		return -EINVAL;
-	if (low_rec->ar_startext >= mp->m_sb.sb_rextents ||
-	    low_rec->ar_startext == high_rec->ar_startext)
+	if (start == end || start >= mp->m_sb.sb_rextents)
 		return 0;
 
-	high_key = min(high_rec->ar_startext, mp->m_sb.sb_rextents - 1);
+	end = min(end, mp->m_sb.sb_rextents - 1);
 
 	/* Iterate the bitmap, looking for discrepancies. */
-	rtstart = low_rec->ar_startext;
-	while (rtstart <= high_key) {
+	while (start <= end) {
+		struct xfs_rtalloc_rec	rec;
+		int			is_free;
+		xfs_rtxnum_t		rtend;
+
 		/* Is the first block free? */
-		error = xfs_rtcheck_range(&args, rtstart, 1, 1, &rtend,
+		error = xfs_rtcheck_range(&args, start, 1, 1, &rtend,
 				&is_free);
 		if (error)
 			break;
 
 		/* How long does the extent go for? */
-		error = xfs_rtfind_forw(&args, rtstart, high_key, &rtend);
+		error = xfs_rtfind_forw(&args, start, end, &rtend);
 		if (error)
 			break;
 
 		if (is_free) {
-			rec.ar_startext = rtstart;
-			rec.ar_extcount = rtend - rtstart + 1;
+			rec.ar_startext = start;
+			rec.ar_extcount = rtend - start + 1;
 
 			error = fn(mp, tp, &rec, priv);
 			if (error)
 				break;
 		}
 
-		rtstart = rtend + 1;
+		start = rtend + 1;
 	}
 
 	xfs_rtbuf_cache_relse(&args);
@@ -1109,13 +1106,8 @@ xfs_rtalloc_query_all(
 	xfs_rtalloc_query_range_fn	fn,
 	void				*priv)
 {
-	struct xfs_rtalloc_rec		keys[2];
-
-	keys[0].ar_startext = 0;
-	keys[1].ar_startext = mp->m_sb.sb_rextents - 1;
-	keys[0].ar_extcount = keys[1].ar_extcount = 0;
-
-	return xfs_rtalloc_query_range(mp, tp, &keys[0], &keys[1], fn, priv);
+	return xfs_rtalloc_query_range(mp, tp, 0, mp->m_sb.sb_rextents - 1, fn,
+			priv);
 }
 
 /* Is the given extent all free? */
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 69ddacd4b01e..0dbc9bb40668 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -292,8 +292,7 @@ int xfs_rtmodify_summary(struct xfs_rtalloc_args *args, int log,
 int xfs_rtfree_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
 		xfs_rtxlen_t len);
 int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp,
-		const struct xfs_rtalloc_rec *low_rec,
-		const struct xfs_rtalloc_rec *high_rec,
+		xfs_rtxnum_t start, xfs_rtxnum_t end,
 		xfs_rtalloc_query_range_fn fn, void *priv);
 int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp,
 			  xfs_rtalloc_query_range_fn fn,
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 25f5dffeab2a..bf1e3f330018 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -554,11 +554,10 @@ xfs_trim_rtdev_extents(
 	xfs_daddr_t		end,
 	xfs_daddr_t		minlen)
 {
-	struct xfs_rtalloc_rec	low = { };
-	struct xfs_rtalloc_rec	high = { };
 	struct xfs_trim_rtdev	tr = {
 		.minlen_fsb	= XFS_BB_TO_FSB(mp, minlen),
 	};
+	xfs_rtxnum_t		low, high;
 	struct xfs_trans	*tp;
 	xfs_daddr_t		rtdev_daddr;
 	int			error;
@@ -584,17 +583,17 @@ xfs_trim_rtdev_extents(
 			XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks) - 1);
 
 	/* Convert the rt blocks to rt extents */
-	low.ar_startext = xfs_rtb_to_rtxup(mp, XFS_BB_TO_FSB(mp, start));
-	high.ar_startext = xfs_rtb_to_rtx(mp, XFS_BB_TO_FSBT(mp, end));
+	low = xfs_rtb_to_rtxup(mp, XFS_BB_TO_FSB(mp, start));
+	high = xfs_rtb_to_rtx(mp, XFS_BB_TO_FSBT(mp, end));
 
 	/*
 	 * Walk the free ranges between low and high.  The query_range function
 	 * trims the extents returned.
 	 */
 	do {
-		tr.stop_rtx = low.ar_startext + (mp->m_sb.sb_blocksize * NBBY);
+		tr.stop_rtx = low + (mp->m_sb.sb_blocksize * NBBY);
 		xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP);
-		error = xfs_rtalloc_query_range(mp, tp, &low, &high,
+		error = xfs_rtalloc_query_range(mp, tp, low, high,
 				xfs_trim_gather_rtextent, &tr);
 
 		if (error == -ECANCELED)
@@ -615,8 +614,8 @@ xfs_trim_rtdev_extents(
 		if (error)
 			break;
 
-		low.ar_startext = tr.restart_rtx;
-	} while (!xfs_trim_should_stop() && low.ar_startext <= high.ar_startext);
+		low = tr.restart_rtx;
+	} while (!xfs_trim_should_stop() && low <= high);
 
 	xfs_trans_cancel(tp);
 	return error;
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 71f32354944e..e15446626875 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -520,11 +520,11 @@ xfs_getfsmap_rtdev_rtbitmap(
 	struct xfs_getfsmap_info	*info)
 {
 
-	struct xfs_rtalloc_rec		alow = { 0 };
 	struct xfs_rtalloc_rec		ahigh = { 0 };
 	struct xfs_mount		*mp = tp->t_mountp;
 	xfs_rtblock_t			start_rtb;
 	xfs_rtblock_t			end_rtb;
+	xfs_rtxnum_t			high;
 	uint64_t			eofs;
 	int				error;
 
@@ -553,10 +553,9 @@ xfs_getfsmap_rtdev_rtbitmap(
 	 * Set up query parameters to return free rtextents covering the range
 	 * we want.
 	 */
-	alow.ar_startext = xfs_rtb_to_rtx(mp, start_rtb);
-	ahigh.ar_startext = xfs_rtb_to_rtxup(mp, end_rtb);
-	error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh,
-			xfs_getfsmap_rtdev_rtbitmap_helper, info);
+	high = xfs_rtb_to_rtxup(mp, end_rtb);
+	error = xfs_rtalloc_query_range(mp, tp, xfs_rtb_to_rtx(mp, start_rtb),
+			high, xfs_getfsmap_rtdev_rtbitmap_helper, info);
 	if (error)
 		goto err;
 
@@ -565,7 +564,7 @@ xfs_getfsmap_rtdev_rtbitmap(
 	 * rmap starting at the block after the end of the query range.
 	 */
 	info->last = true;
-	ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext);
+	ahigh.ar_startext = min(mp->m_sb.sb_rextents, high);
 
 	error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info);
 	if (error)


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 01/10] xfs: clean up the ISVALID macro in xfs_bmap_adjacent
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
@ 2024-09-02 18:29   ` Darrick J. Wong
  2024-09-02 18:30   ` [PATCH 02/10] xfs: factor out a xfs_rtallocate helper Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:29 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Turn the ISVALID macro, which is defined and used inside
xfs_bmap_adjacent and relies on implicit context, into a proper inline
function.
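
A small userspace sketch of the same kind of conversion, under assumed
names (struct volume, block_pair_valid) that do not exist in the
kernel: the macro form would silently pick up local variables of the
enclosing function, while the inline function names every input in its
parameter list and gets normal type checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: total blocks, split into equally sized groups. */
struct volume {
	uint64_t	nr_blocks;
	uint32_t	group_size;
};

/*
 * Macro form, for comparison only.  It depends on a variable called
 * "vol" being in scope wherever it is expanded:
 *
 *   #define PAIR_VALID(x, y) \
 *	((x) < vol->nr_blocks && \
 *	 (x) / vol->group_size == (y) / vol->group_size)
 */

/* Function form: everything the check depends on is a parameter. */
static inline bool block_pair_valid(const struct volume *vol,
		uint64_t x, uint64_t y)
{
	return x < vol->nr_blocks &&
	       x / vol->group_size == y / vol->group_size;
}
```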

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c |   55 +++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 23 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 112c7ee2d493..434433ed29dc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3112,6 +3112,23 @@ xfs_bmap_extsize_align(
 	return 0;
 }
 
+static inline bool
+xfs_bmap_adjacent_valid(
+	struct xfs_bmalloca	*ap,
+	xfs_fsblock_t		x,
+	xfs_fsblock_t		y)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ap->ip) &&
+	    (ap->datatype & XFS_ALLOC_USERDATA))
+		return x < mp->m_sb.sb_rblocks;
+
+	return XFS_FSB_TO_AGNO(mp, x) == XFS_FSB_TO_AGNO(mp, y) &&
+		XFS_FSB_TO_AGNO(mp, x) < mp->m_sb.sb_agcount &&
+		XFS_FSB_TO_AGBNO(mp, x) < mp->m_sb.sb_agblocks;
+}
+
 #define XFS_ALLOC_GAP_UNITS	4
 
 /* returns true if ap->blkno was modified */
@@ -3119,36 +3136,25 @@ bool
 xfs_bmap_adjacent(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
 {
-	xfs_fsblock_t	adjust;		/* adjustment to block numbers */
-	xfs_mount_t	*mp;		/* mount point structure */
-	int		rt;		/* true if inode is realtime */
+	xfs_fsblock_t		adjust;		/* adjustment to block numbers */
 
-#define	ISVALID(x,y)	\
-	(rt ? \
-		(x) < mp->m_sb.sb_rblocks : \
-		XFS_FSB_TO_AGNO(mp, x) == XFS_FSB_TO_AGNO(mp, y) && \
-		XFS_FSB_TO_AGNO(mp, x) < mp->m_sb.sb_agcount && \
-		XFS_FSB_TO_AGBNO(mp, x) < mp->m_sb.sb_agblocks)
-
-	mp = ap->ip->i_mount;
-	rt = XFS_IS_REALTIME_INODE(ap->ip) &&
-		(ap->datatype & XFS_ALLOC_USERDATA);
 	/*
 	 * If allocating at eof, and there's a previous real block,
 	 * try to use its last block as our starting point.
 	 */
 	if (ap->eof && ap->prev.br_startoff != NULLFILEOFF &&
 	    !isnullstartblock(ap->prev.br_startblock) &&
-	    ISVALID(ap->prev.br_startblock + ap->prev.br_blockcount,
-		    ap->prev.br_startblock)) {
+	    xfs_bmap_adjacent_valid(ap,
+			ap->prev.br_startblock + ap->prev.br_blockcount,
+			ap->prev.br_startblock)) {
 		ap->blkno = ap->prev.br_startblock + ap->prev.br_blockcount;
 		/*
 		 * Adjust for the gap between prevp and us.
 		 */
 		adjust = ap->offset -
 			(ap->prev.br_startoff + ap->prev.br_blockcount);
-		if (adjust &&
-		    ISVALID(ap->blkno + adjust, ap->prev.br_startblock))
+		if (adjust && xfs_bmap_adjacent_valid(ap, ap->blkno + adjust,
+				ap->prev.br_startblock))
 			ap->blkno += adjust;
 		return true;
 	}
@@ -3171,7 +3177,8 @@ xfs_bmap_adjacent(
 		    !isnullstartblock(ap->prev.br_startblock) &&
 		    (prevbno = ap->prev.br_startblock +
 			       ap->prev.br_blockcount) &&
-		    ISVALID(prevbno, ap->prev.br_startblock)) {
+		    xfs_bmap_adjacent_valid(ap, prevbno,
+				ap->prev.br_startblock)) {
 			/*
 			 * Calculate gap to end of previous block.
 			 */
@@ -3187,8 +3194,8 @@ xfs_bmap_adjacent(
 			 * number, then just use the end of the previous block.
 			 */
 			if (prevdiff <= XFS_ALLOC_GAP_UNITS * ap->length &&
-			    ISVALID(prevbno + prevdiff,
-				    ap->prev.br_startblock))
+			    xfs_bmap_adjacent_valid(ap, prevbno + prevdiff,
+					ap->prev.br_startblock))
 				prevbno += adjust;
 			else
 				prevdiff += adjust;
@@ -3220,9 +3227,11 @@ xfs_bmap_adjacent(
 			 * offset by our length.
 			 */
 			if (gotdiff <= XFS_ALLOC_GAP_UNITS * ap->length &&
-			    ISVALID(gotbno - gotdiff, gotbno))
+			    xfs_bmap_adjacent_valid(ap, gotbno - gotdiff,
+					gotbno))
 				gotbno -= adjust;
-			else if (ISVALID(gotbno - ap->length, gotbno)) {
+			else if (xfs_bmap_adjacent_valid(ap, gotbno - ap->length,
+					gotbno)) {
 				gotbno -= ap->length;
 				gotdiff += adjust - ap->length;
 			} else
@@ -3250,7 +3259,7 @@ xfs_bmap_adjacent(
 			return true;
 		}
 	}
-#undef ISVALID
+
 	return false;
 }
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/10] xfs: factor out a xfs_rtallocate helper
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 01/10] xfs: clean up the ISVALID macro in xfs_bmap_adjacent Darrick J. Wong
@ 2024-09-02 18:30   ` Darrick J. Wong
  2024-09-02 18:30   ` [PATCH 03/10] xfs: rework the rtalloc fallback handling Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:30 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Split a helper named xfs_rtallocate out of xfs_bmap_rtalloc that
performs the actual allocation.  This keeps the scope of the
xfs_rtalloc_args structure contained, and prepares for rtgroups support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   81 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 31 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 72123e2337d8..12cf7cb3c02c 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1263,6 +1263,51 @@ xfs_rtalloc_align_minmax(
 	*raminlen = newminlen;
 }
 
+static int
+xfs_rtallocate(
+	struct xfs_trans	*tp,
+	xfs_rtxnum_t		start,
+	xfs_rtxlen_t		minlen,
+	xfs_rtxlen_t		maxlen,
+	xfs_rtxlen_t		prod,
+	bool			wasdel,
+	xfs_rtblock_t		*bno,
+	xfs_extlen_t		*blen)
+{
+	struct xfs_rtalloc_args	args = {
+		.mp		= tp->t_mountp,
+		.tp		= tp,
+	};
+	xfs_rtxnum_t		rtx;
+	xfs_rtxlen_t		len = 0;
+	int			error;
+
+	if (start) {
+		error = xfs_rtallocate_extent_near(&args, start, minlen, maxlen,
+				&len, prod, &rtx);
+	} else {
+		error = xfs_rtallocate_extent_size(&args, minlen, maxlen, &len,
+				prod, &rtx);
+	}
+
+	if (error)
+		goto out_release;
+
+	error = xfs_rtallocate_range(&args, rtx, len);
+	if (error)
+		goto out_release;
+
+	xfs_trans_mod_sb(tp, wasdel ?
+			XFS_TRANS_SB_RES_FREXTENTS : XFS_TRANS_SB_FREXTENTS,
+			-(long)len);
+	*bno = xfs_rtx_to_rtb(args.mp, rtx);
+	*blen = xfs_rtxlen_to_extlen(args.mp, len);
+
+out_release:
+	xfs_rtbuf_cache_relse(&args);
+	return error;
+}
+
 int
 xfs_bmap_rtalloc(
 	struct xfs_bmalloca	*ap)
@@ -1270,7 +1315,6 @@ xfs_bmap_rtalloc(
 	struct xfs_mount	*mp = ap->ip->i_mount;
 	xfs_fileoff_t		orig_offset = ap->offset;
 	xfs_rtxnum_t		start;	   /* allocation hint rtextent no */
-	xfs_rtxnum_t		rtx;	   /* actually allocated rtextent no */
 	xfs_rtxlen_t		prod = 0;  /* product factor for allocators */
 	xfs_extlen_t		mod = 0;   /* product factor for allocators */
 	xfs_rtxlen_t		ralen = 0; /* realtime allocation length */
@@ -1280,10 +1324,6 @@ xfs_bmap_rtalloc(
 	xfs_rtxlen_t		raminlen;
 	bool			rtlocked = false;
 	bool			ignore_locality = false;
-	struct xfs_rtalloc_args	args = {
-		.mp		= mp,
-		.tp		= ap->tp,
-	};
 	int			error;
 
 	align = xfs_get_extsz_hint(ap->ip);
@@ -1357,19 +1397,9 @@ xfs_bmap_rtalloc(
 			xfs_rtalloc_align_minmax(&raminlen, &ralen, &prod);
 	}
 
-	if (start) {
-		error = xfs_rtallocate_extent_near(&args, start, raminlen,
-				ralen, &ralen, prod, &rtx);
-	} else {
-		error = xfs_rtallocate_extent_size(&args, raminlen,
-				ralen, &ralen, prod, &rtx);
-	}
-
-	if (error) {
-		xfs_rtbuf_cache_relse(&args);
-		if (error != -ENOSPC)
-			return error;
-
+	error = xfs_rtallocate(ap->tp, start, raminlen, ralen, prod, ap->wasdel,
+			       &ap->blkno, &ap->length);
+	if (error == -ENOSPC) {
 		if (align > mp->m_sb.sb_rextsize) {
 			/*
 			 * We previously enlarged the request length to try to
@@ -1397,20 +1427,9 @@ xfs_bmap_rtalloc(
 		ap->length = 0;
 		return 0;
 	}
-
-	error = xfs_rtallocate_range(&args, rtx, ralen);
 	if (error)
-		goto out_release;
+		return error;
 
-	xfs_trans_mod_sb(ap->tp, ap->wasdel ?
-			XFS_TRANS_SB_RES_FREXTENTS : XFS_TRANS_SB_FREXTENTS,
-			-(long)ralen);
-
-	ap->blkno = xfs_rtx_to_rtb(mp, rtx);
-	ap->length = xfs_rtxlen_to_extlen(mp, ralen);
 	xfs_bmap_alloc_account(ap);
-
-out_release:
-	xfs_rtbuf_cache_relse(&args);
-	return error;
+	return 0;
 }


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/10] xfs: rework the rtalloc fallback handling
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
  2024-09-02 18:29   ` [PATCH 01/10] xfs: clean up the ISVALID macro in xfs_bmap_adjacent Darrick J. Wong
  2024-09-02 18:30   ` [PATCH 02/10] xfs: factor out a xfs_rtallocate helper Darrick J. Wong
@ 2024-09-02 18:30   ` Darrick J. Wong
  2024-09-02 18:30   ` [PATCH 04/10] xfs: factor out a xfs_rtallocate_align helper Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:30 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

xfs_bmap_rtalloc currently has two fallbacks when an allocation fails:

 1) drop the requested extent size alignment, if any, and retry
 2) ignore the locality hint

Oddly enough it does them in that order, even though trying a different
location first is more in line with what the user asked for, and it
does so in a very unstructured way.

Lift the fallback to try to allocate without the locality hint into
xfs_rtallocate to both perform them in a more sensible order and to
clean up the code.
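
The intended ordering can be sketched with a toy model.  This is an
assumption-laden illustration, not the patch itself: struct toy_alloc
and all four functions are hypothetical, and the "allocators" merely
report success or -ENOSPC from preset flags.  The inner helper handles
the locality fallback, and only the outer caller gives up on alignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state: which kinds of request can succeed. */
struct toy_alloc {
	bool	near_ok;	/* space exists near the hint */
	bool	anywhere_ok;	/* space exists somewhere else */
	int	near_calls;
	int	anywhere_calls;
};

static int alloc_near(struct toy_alloc *a)
{
	a->near_calls++;
	return a->near_ok ? 0 : -ENOSPC;
}

static int alloc_anywhere(struct toy_alloc *a)
{
	a->anywhere_calls++;
	return a->anywhere_ok ? 0 : -ENOSPC;
}

/* Inner helper: try near the hint, then retry without locality. */
static int allocate(struct toy_alloc *a, bool have_hint)
{
	if (have_hint) {
		int error = alloc_near(a);

		if (error != -ENOSPC)
			return error;	/* success or a hard error */
	}
	return alloc_anywhere(a);
}

/* Outer caller: drop the alignment only after both placements failed. */
static int allocate_with_fallbacks(struct toy_alloc *aligned,
		struct toy_alloc *unaligned, bool have_hint)
{
	int error = allocate(aligned, have_hint);

	if (error == -ENOSPC)
		error = allocate(unaligned, have_hint);
	return error;
}
```

In this model the aligned request is tried at every location before the
alignment is given up, which is the order the message argues for.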

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   69 +++++++++++++++++++++++++-------------------------
 1 file changed, 34 insertions(+), 35 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 12cf7cb3c02c..a6b9ba572cdc 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1271,6 +1271,8 @@ xfs_rtallocate(
 	xfs_rtxlen_t		maxlen,
 	xfs_rtxlen_t		prod,
 	bool			wasdel,
+	bool			initial_user_data,
+	bool			*rtlocked,
 	xfs_rtblock_t		*bno,
 	xfs_extlen_t		*blen)
 {
@@ -1280,12 +1282,38 @@ xfs_rtallocate(
 	};
 	xfs_rtxnum_t		rtx;
 	xfs_rtxlen_t		len = 0;
-	int			error;
+	int			error = 0;
+
+	/*
+	 * Lock out modifications to both the RT bitmap and summary inodes.
+	 */
+	if (!*rtlocked) {
+		xfs_rtbitmap_lock(args.mp);
+		xfs_rtbitmap_trans_join(tp);
+		*rtlocked = true;
+	}
+
+	/*
+	 * For an allocation to an empty file at offset 0, pick an extent that
+	 * will space things out in the rt area.
+	 */
+	if (!start && initial_user_data)
+		start = xfs_rtpick_extent(args.mp, tp, maxlen);
 
 	if (start) {
 		error = xfs_rtallocate_extent_near(&args, start, minlen, maxlen,
 				&len, prod, &rtx);
-	} else {
+		/*
+		 * If we can't allocate near a specific rt extent, try again
+		 * without locality criteria.
+		 */
+		if (error == -ENOSPC) {
+			xfs_rtbuf_cache_relse(&args);
+			error = 0;
+		}
+	}
+
+	if (!error) {
 		error = xfs_rtallocate_extent_size(&args, minlen, maxlen, &len,
 				prod, &rtx);
 	}
@@ -1314,7 +1342,7 @@ xfs_bmap_rtalloc(
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
 	xfs_fileoff_t		orig_offset = ap->offset;
-	xfs_rtxnum_t		start;	   /* allocation hint rtextent no */
+	xfs_rtxnum_t		start = 0;   /* allocation hint rtextent no */
 	xfs_rtxlen_t		prod = 0;  /* product factor for allocators */
 	xfs_extlen_t		mod = 0;   /* product factor for allocators */
 	xfs_rtxlen_t		ralen = 0; /* realtime allocation length */
@@ -1323,7 +1351,6 @@ xfs_bmap_rtalloc(
 	xfs_extlen_t		minlen = mp->m_sb.sb_rextsize;
 	xfs_rtxlen_t		raminlen;
 	bool			rtlocked = false;
-	bool			ignore_locality = false;
 	int			error;
 
 	align = xfs_get_extsz_hint(ap->ip);
@@ -1361,28 +1388,8 @@ xfs_bmap_rtalloc(
 	ASSERT(raminlen > 0);
 	ASSERT(raminlen <= ralen);
 
-	/*
-	 * Lock out modifications to both the RT bitmap and summary inodes
-	 */
-	if (!rtlocked) {
-		xfs_rtbitmap_lock(mp);
-		xfs_rtbitmap_trans_join(ap->tp);
-		rtlocked = true;
-	}
-
-	if (ignore_locality) {
-		start = 0;
-	} else if (xfs_bmap_adjacent(ap)) {
+	if (xfs_bmap_adjacent(ap))
 		start = xfs_rtb_to_rtx(mp, ap->blkno);
-	} else if (ap->datatype & XFS_ALLOC_INITIAL_USER_DATA) {
-		/*
-		 * If it's an allocation to an empty file at offset 0, pick an
-		 * extent that will space things out in the rt area.
-		 */
-		start = xfs_rtpick_extent(mp, ap->tp, ralen);
-	} else {
-		start = 0;
-	}
 
 	/*
 	 * Only bother calculating a real prod factor if offset & length are
@@ -1398,7 +1405,8 @@ xfs_bmap_rtalloc(
 	}
 
 	error = xfs_rtallocate(ap->tp, start, raminlen, ralen, prod, ap->wasdel,
-			       &ap->blkno, &ap->length);
+			ap->datatype & XFS_ALLOC_INITIAL_USER_DATA, &rtlocked,
+			&ap->blkno, &ap->length);
 	if (error == -ENOSPC) {
 		if (align > mp->m_sb.sb_rextsize) {
 			/*
@@ -1414,15 +1422,6 @@ xfs_bmap_rtalloc(
 			goto retry;
 		}
 
-		if (!ignore_locality && start != 0) {
-			/*
-			 * If we can't allocate near a specific rt extent, try
-			 * again without locality criteria.
-			 */
-			ignore_locality = true;
-			goto retry;
-		}
-
 		ap->blkno = NULLFSBLOCK;
 		ap->length = 0;
 		return 0;


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/10] xfs: factor out a xfs_rtallocate_align helper
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-09-02 18:30   ` [PATCH 03/10] xfs: rework the rtalloc fallback handling Darrick J. Wong
@ 2024-09-02 18:30   ` Darrick J. Wong
  2024-09-02 18:30   ` [PATCH 05/10] xfs: make the rtalloc start hint a xfs_rtblock_t Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:30 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Split the code to calculate the aligned allocation request from
xfs_bmap_rtalloc into a separate self-contained helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   93 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 34 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index a6b9ba572cdc..61e0c5b7a327 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1336,30 +1336,33 @@ xfs_rtallocate(
 	return error;
 }
 
-int
-xfs_bmap_rtalloc(
-	struct xfs_bmalloca	*ap)
+static int
+xfs_rtallocate_align(
+	struct xfs_bmalloca	*ap,
+	xfs_rtxlen_t		*ralen,
+	xfs_rtxlen_t		*raminlen,
+	xfs_rtxlen_t		*prod,
+	bool			*noalign)
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
 	xfs_fileoff_t		orig_offset = ap->offset;
-	xfs_rtxnum_t		start = 0;   /* allocation hint rtextent no */
-	xfs_rtxlen_t		prod = 0;  /* product factor for allocators */
-	xfs_extlen_t		mod = 0;   /* product factor for allocators */
-	xfs_rtxlen_t		ralen = 0; /* realtime allocation length */
-	xfs_extlen_t		align;     /* minimum allocation alignment */
-	xfs_extlen_t		orig_length = ap->length;
 	xfs_extlen_t		minlen = mp->m_sb.sb_rextsize;
-	xfs_rtxlen_t		raminlen;
-	bool			rtlocked = false;
+	xfs_extlen_t            align;	/* minimum allocation alignment */
+	xfs_extlen_t		mod;	/* product factor for allocators */
 	int			error;
 
-	align = xfs_get_extsz_hint(ap->ip);
-	if (!align)
-		align = 1;
-retry:
-	error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
-					align, 1, ap->eof, 0,
-					ap->conv, &ap->offset, &ap->length);
+	if (*noalign) {
+		align = mp->m_sb.sb_rextsize;
+	} else {
+		align = xfs_get_extsz_hint(ap->ip);
+		if (!align)
+			align = 1;
+		if (align == mp->m_sb.sb_rextsize)
+			*noalign = true;
+	}
+
+	error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 1,
+			ap->eof, 0, ap->conv, &ap->offset, &ap->length);
 	if (error)
 		return error;
 	ASSERT(ap->length);
@@ -1383,32 +1386,54 @@ xfs_bmap_rtalloc(
 	 * XFS_BMBT_MAX_EXTLEN), we don't hear about that number, and can't
 	 * adjust the starting point to match it.
 	 */
-	ralen = xfs_extlen_to_rtxlen(mp, min(ap->length, XFS_MAX_BMBT_EXTLEN));
-	raminlen = max_t(xfs_rtxlen_t, 1, xfs_extlen_to_rtxlen(mp, minlen));
-	ASSERT(raminlen > 0);
-	ASSERT(raminlen <= ralen);
-
-	if (xfs_bmap_adjacent(ap))
-		start = xfs_rtb_to_rtx(mp, ap->blkno);
+	*ralen = xfs_extlen_to_rtxlen(mp, min(ap->length, XFS_MAX_BMBT_EXTLEN));
+	*raminlen = max_t(xfs_rtxlen_t, 1, xfs_extlen_to_rtxlen(mp, minlen));
+	ASSERT(*raminlen > 0);
+	ASSERT(*raminlen <= *ralen);
 
 	/*
 	 * Only bother calculating a real prod factor if offset & length are
 	 * perfectly aligned, otherwise it will just get us in trouble.
 	 */
 	div_u64_rem(ap->offset, align, &mod);
-	if (mod || ap->length % align) {
-		prod = 1;
-	} else {
-		prod = xfs_extlen_to_rtxlen(mp, align);
-		if (prod > 1)
-			xfs_rtalloc_align_minmax(&raminlen, &ralen, &prod);
-	}
+	if (mod || ap->length % align)
+		*prod = 1;
+	else
+		*prod = xfs_extlen_to_rtxlen(mp, align);
+
+	if (*prod > 1)
+		xfs_rtalloc_align_minmax(raminlen, ralen, prod);
+	return 0;
+}
+
+int
+xfs_bmap_rtalloc(
+	struct xfs_bmalloca	*ap)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	xfs_fileoff_t		orig_offset = ap->offset;
+	xfs_rtxnum_t		start = 0; /* allocation hint rtextent no */
+	xfs_rtxlen_t		prod = 0;  /* product factor for allocators */
+	xfs_rtxlen_t		ralen = 0; /* realtime allocation length */
+	xfs_extlen_t		orig_length = ap->length;
+	xfs_rtxlen_t		raminlen;
+	bool			rtlocked = false;
+	bool			noalign = false;
+	int			error;
+
+retry:
+	error = xfs_rtallocate_align(ap, &ralen, &raminlen, &prod, &noalign);
+	if (error)
+		return error;
+
+	if (xfs_bmap_adjacent(ap))
+		start = xfs_rtb_to_rtx(mp, ap->blkno);
 
 	error = xfs_rtallocate(ap->tp, start, raminlen, ralen, prod, ap->wasdel,
 			ap->datatype & XFS_ALLOC_INITIAL_USER_DATA, &rtlocked,
 			&ap->blkno, &ap->length);
 	if (error == -ENOSPC) {
-		if (align > mp->m_sb.sb_rextsize) {
+		if (!noalign) {
 			/*
 			 * We previously enlarged the request length to try to
 			 * satisfy an extent size hint.  The allocator didn't
@@ -1418,7 +1443,7 @@ xfs_bmap_rtalloc(
 			 */
 			ap->offset = orig_offset;
 			ap->length = orig_length;
-			minlen = align = mp->m_sb.sb_rextsize;
+			noalign = true;
 			goto retry;
 		}
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/10] xfs: make the rtalloc start hint a xfs_rtblock_t
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-09-02 18:30   ` [PATCH 04/10] xfs: factor out a xfs_rtallocate_align helper Darrick J. Wong
@ 2024-09-02 18:30   ` Darrick J. Wong
  2024-09-02 18:31   ` [PATCH 06/10] xfs: add xchk_setup_nothing and xchk_nothing helpers Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:30 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

0 is a valid start RT extent, and with pending changes it will become
both more common and non-unique.  Switch to passing an xfs_rtblock_t
instead so that we can use NULLRTBLOCK to determine if a hint was set
or not.
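
The sentinel idea in general terms, as a hedged userspace sketch: the
names blockno_t, NO_BLOCK_HINT and pick_start are made up for this
example, and the code does not mirror the diff below line for line.  An
all-ones value marks "no hint", so block 0 can be passed as a real
hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t blockno_t;

/* Hypothetical sentinel: all ones means "the caller gave no hint". */
#define NO_BLOCK_HINT	((blockno_t)-1)

/*
 * Convert a block hint into a starting extent.  Returns false when no
 * hint was supplied, so that 0 remains a usable starting extent.
 */
static bool pick_start(blockno_t hint, uint32_t extsize,
		uint64_t *start_ext)
{
	if (hint == NO_BLOCK_HINT)
		return false;

	*start_ext = hint / extsize;
	return true;
}
```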

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 61e0c5b7a327..29edb8044b00 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1266,7 +1266,7 @@ xfs_rtalloc_align_minmax(
 static int
 xfs_rtallocate(
 	struct xfs_trans	*tp,
-	xfs_rtxnum_t		start,
+	xfs_rtblock_t		bno_hint,
 	xfs_rtxlen_t		minlen,
 	xfs_rtxlen_t		maxlen,
 	xfs_rtxlen_t		prod,
@@ -1280,6 +1280,7 @@ xfs_rtallocate(
 		.mp		= tp->t_mountp,
 		.tp		= tp,
 	};
+	xfs_rtxnum_t		start = 0;
 	xfs_rtxnum_t		rtx;
 	xfs_rtxlen_t		len = 0;
 	int			error = 0;
@@ -1297,7 +1298,9 @@ xfs_rtallocate(
 	 * For an allocation to an empty file at offset 0, pick an extent that
 	 * will space things out in the rt area.
 	 */
-	if (!start && initial_user_data)
+	if (bno_hint)
+		start = xfs_rtb_to_rtx(args.mp, bno_hint);
+	else if (initial_user_data)
 		start = xfs_rtpick_extent(args.mp, tp, maxlen);
 
 	if (start) {
@@ -1410,15 +1413,16 @@ int
 xfs_bmap_rtalloc(
 	struct xfs_bmalloca	*ap)
 {
-	struct xfs_mount	*mp = ap->ip->i_mount;
 	xfs_fileoff_t		orig_offset = ap->offset;
-	xfs_rtxnum_t		start = 0; /* allocation hint rtextent no */
 	xfs_rtxlen_t		prod = 0;  /* product factor for allocators */
 	xfs_rtxlen_t		ralen = 0; /* realtime allocation length */
+	xfs_rtblock_t		bno_hint = NULLRTBLOCK;
 	xfs_extlen_t		orig_length = ap->length;
 	xfs_rtxlen_t		raminlen;
 	bool			rtlocked = false;
 	bool			noalign = false;
+	bool			initial_user_data =
+		ap->datatype & XFS_ALLOC_INITIAL_USER_DATA;
 	int			error;
 
 retry:
@@ -1427,10 +1431,10 @@ xfs_bmap_rtalloc(
 		return error;
 
 	if (xfs_bmap_adjacent(ap))
-		start = xfs_rtb_to_rtx(mp, ap->blkno);
+		bno_hint = ap->blkno;
 
-	error = xfs_rtallocate(ap->tp, start, raminlen, ralen, prod, ap->wasdel,
-			ap->datatype & XFS_ALLOC_INITIAL_USER_DATA, &rtlocked,
+	error = xfs_rtallocate(ap->tp, bno_hint, raminlen, ralen, prod,
+			ap->wasdel, initial_user_data, &rtlocked,
 			&ap->blkno, &ap->length);
 	if (error == -ENOSPC) {
 		if (!noalign) {



* [PATCH 06/10] xfs: add xchk_setup_nothing and xchk_nothing helpers
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-09-02 18:30   ` [PATCH 05/10] xfs: make the rtalloc start hint a xfs_rtblock_t Darrick J. Wong
@ 2024-09-02 18:31   ` Darrick J. Wong
  2024-09-02 18:31   ` [PATCH 07/10] xfs: remove xfs_{rtbitmap,rtsummary}_wordcount Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:31 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add common helpers for no-op scrubbing methods.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
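For readers unfamiliar with the pattern, here is a minimal standalone
sketch of one shared stub plus preprocessor aliases.  The names
struct scrub_ctx, check_nothing, check_quota, check_table and HAVE_QUOTA
are hypothetical and exist only for this illustration; they are not the
scrub code's identifiers.

```c
#include <errno.h>

/* Hypothetical stand-in for the scrub context structure. */
struct scrub_ctx {
	int	unused;
};

/* One shared stub for every check that is compiled out. */
static inline int check_nothing(struct scrub_ctx *sc)
{
	(void)sc;
	return -ENOENT;
}

#ifdef HAVE_QUOTA	/* hypothetical config switch, not defined here */
int check_quota(struct scrub_ctx *sc);
#else
# define check_quota	check_nothing
#endif

/* The alias expands to a function name, so it can fill a dispatch table. */
typedef int (*check_fn)(struct scrub_ctx *sc);
static const check_fn check_table[] = { check_quota };
```

Because the alias is just another name for the stub, callers and
function-pointer tables need no #ifdef of their own, and each disabled
feature costs one line instead of a six-line inline function.
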
 fs/xfs/scrub/common.h |   29 +++++++++--------------------
 fs/xfs/scrub/scrub.h  |   29 +++++++++--------------------
 2 files changed, 18 insertions(+), 40 deletions(-)


diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 3d5f1f6b4b7b..47148cc4a833 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -53,6 +53,11 @@ int xchk_checkpoint_log(struct xfs_mount *mp);
 bool xchk_should_check_xref(struct xfs_scrub *sc, int *error,
 			   struct xfs_btree_cur **curpp);
 
+static inline int xchk_setup_nothing(struct xfs_scrub *sc)
+{
+	return -ENOENT;
+}
+
 /* Setup functions */
 int xchk_setup_agheader(struct xfs_scrub *sc);
 int xchk_setup_fs(struct xfs_scrub *sc);
@@ -72,16 +77,8 @@ int xchk_setup_dirtree(struct xfs_scrub *sc);
 int xchk_setup_rtbitmap(struct xfs_scrub *sc);
 int xchk_setup_rtsummary(struct xfs_scrub *sc);
 #else
-static inline int
-xchk_setup_rtbitmap(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
-static inline int
-xchk_setup_rtsummary(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
+# define xchk_setup_rtbitmap		xchk_setup_nothing
+# define xchk_setup_rtsummary		xchk_setup_nothing
 #endif
 #ifdef CONFIG_XFS_QUOTA
 int xchk_ino_dqattach(struct xfs_scrub *sc);
@@ -93,16 +90,8 @@ xchk_ino_dqattach(struct xfs_scrub *sc)
 {
 	return 0;
 }
-static inline int
-xchk_setup_quota(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
-static inline int
-xchk_setup_quotacheck(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
+# define xchk_setup_quota		xchk_setup_nothing
+# define xchk_setup_quotacheck		xchk_setup_nothing
 #endif
 int xchk_setup_fscounters(struct xfs_scrub *sc);
 int xchk_setup_nlinks(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 1bc33f010d0e..5993fcaffb2c 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -231,6 +231,11 @@ xchk_should_terminate(
 	return false;
 }
 
+static inline int xchk_nothing(struct xfs_scrub *sc)
+{
+	return -ENOENT;
+}
+
 /* Metadata scrubbers */
 int xchk_tester(struct xfs_scrub *sc);
 int xchk_superblock(struct xfs_scrub *sc);
@@ -254,31 +259,15 @@ int xchk_dirtree(struct xfs_scrub *sc);
 int xchk_rtbitmap(struct xfs_scrub *sc);
 int xchk_rtsummary(struct xfs_scrub *sc);
 #else
-static inline int
-xchk_rtbitmap(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
-static inline int
-xchk_rtsummary(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
+# define xchk_rtbitmap		xchk_nothing
+# define xchk_rtsummary		xchk_nothing
 #endif
 #ifdef CONFIG_XFS_QUOTA
 int xchk_quota(struct xfs_scrub *sc);
 int xchk_quotacheck(struct xfs_scrub *sc);
 #else
-static inline int
-xchk_quota(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
-static inline int
-xchk_quotacheck(struct xfs_scrub *sc)
-{
-	return -ENOENT;
-}
+# define xchk_quota		xchk_nothing
+# define xchk_quotacheck	xchk_nothing
 #endif
 int xchk_fscounters(struct xfs_scrub *sc);
 int xchk_nlinks(struct xfs_scrub *sc);



* [PATCH 07/10] xfs: remove xfs_{rtbitmap,rtsummary}_wordcount
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-09-02 18:31   ` [PATCH 06/10] xfs: add xchk_setup_nothing and xchk_nothing helpers Darrick J. Wong
@ 2024-09-02 18:31   ` Darrick J. Wong
  2024-09-02 18:31   ` [PATCH 08/10] xfs: replace m_rsumsize with m_rsumblocks Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:31 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

xfs_rtbitmap_wordcount and xfs_rtsummary_wordcount are currently unused,
so remove them to simplify refactoring other rtbitmap helpers.  They
can be added back or simply open coded when actually needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c |   31 -------------------------------
 fs/xfs/libxfs/xfs_rtbitmap.h |    7 -------
 2 files changed, 38 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index c58eb75ef0fa..76706e8bbc4e 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1148,21 +1148,6 @@ xfs_rtbitmap_blockcount(
 	return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
 }
 
-/*
- * Compute the number of rtbitmap words needed to populate every block of a
- * bitmap that is large enough to track the given number of rt extents.
- */
-unsigned long long
-xfs_rtbitmap_wordcount(
-	struct xfs_mount	*mp,
-	xfs_rtbxlen_t		rtextents)
-{
-	xfs_filblks_t		blocks;
-
-	blocks = xfs_rtbitmap_blockcount(mp, rtextents);
-	return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG;
-}
-
 /* Compute the number of rtsummary blocks needed to track the given rt space. */
 xfs_filblks_t
 xfs_rtsummary_blockcount(
@@ -1176,22 +1161,6 @@ xfs_rtsummary_blockcount(
 	return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG);
 }
 
-/*
- * Compute the number of rtsummary info words needed to populate every block of
- * a summary file that is large enough to track the given rt space.
- */
-unsigned long long
-xfs_rtsummary_wordcount(
-	struct xfs_mount	*mp,
-	unsigned int		rsumlevels,
-	xfs_extlen_t		rbmblocks)
-{
-	xfs_filblks_t		blocks;
-
-	blocks = xfs_rtsummary_blockcount(mp, rsumlevels, rbmblocks);
-	return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG;
-}
-
 /* Lock both realtime free space metadata inodes for a freespace update. */
 void
 xfs_rtbitmap_lock(
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 0dbc9bb40668..140513d1d6bc 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -316,13 +316,8 @@ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
 
 xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t
 		rtextents);
-unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp,
-		xfs_rtbxlen_t rtextents);
-
 xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
 		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
-unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp,
-		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
 
 int xfs_rtfile_initialize_blocks(struct xfs_inode *ip,
 		xfs_fileoff_t offset_fsb, xfs_fileoff_t end_fsb, void *data);
@@ -355,9 +350,7 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 	/* shut up gcc */
 	return 0;
 }
-# define xfs_rtbitmap_wordcount(mp, r)			(0)
 # define xfs_rtsummary_blockcount(mp, l, b)		(0)
-# define xfs_rtsummary_wordcount(mp, l, b)		(0)
 # define xfs_rtbitmap_lock(mp)			do { } while (0)
 # define xfs_rtbitmap_trans_join(tp)		do { } while (0)
 # define xfs_rtbitmap_unlock(mp)		do { } while (0)



* [PATCH 08/10] xfs: replace m_rsumsize with m_rsumblocks
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-09-02 18:31   ` [PATCH 07/10] xfs: remove xfs_{rtbitmap,rtsummary}_wordcount Darrick J. Wong
@ 2024-09-02 18:31   ` Darrick J. Wong
  2024-09-02 18:31   ` [PATCH 09/10] xfs: rearrange xfs_fsmap.c a little bit Darrick J. Wong
  2024-09-02 18:32   ` [PATCH 10/10] xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c Darrick J. Wong
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:31 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Track the RT summary file size in blocks, just like the RT bitmap
file.  While we have users of both units, blocks are used slightly
more often and this matches the bitmap file for consistency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
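The unit change can be illustrated with a small standalone sketch.  It
assumes a power-of-two block size described by its log2, and every name
in it (struct mount_geom, fsb_to_b, b_to_fsb, block_in_range) is
hypothetical rather than taken from XFS.

```c
#include <stdint.h>

/* Hypothetical geometry: the summary size is kept in blocks, not bytes. */
struct mount_geom {
	uint8_t		blocklog;	/* log2 of the block size in bytes */
	uint64_t	rsumblocks;	/* summary file size, in blocks */
};

/* Blocks to bytes: exact, a single shift. */
static uint64_t fsb_to_b(const struct mount_geom *g, uint64_t blocks)
{
	return blocks << g->blocklog;
}

/* Bytes to blocks: rounds up to cover a partial trailing block. */
static uint64_t b_to_fsb(const struct mount_geom *g, uint64_t bytes)
{
	uint64_t	blockmask = (UINT64_C(1) << g->blocklog) - 1;

	return (bytes + blockmask) >> g->blocklog;
}

/* A bounds check on a block index needs no conversion at all. */
static int block_in_range(const struct mount_geom *g, uint64_t block)
{
	return block < g->rsumblocks;
}
```

Callers that index by block compare directly against the stored count,
while the few that need a byte length derive it with one shift.
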
 fs/xfs/libxfs/xfs_rtbitmap.c    |    2 +-
 fs/xfs/libxfs/xfs_trans_resv.c  |    2 +-
 fs/xfs/scrub/rtsummary.c        |   11 +++++------
 fs/xfs/scrub/rtsummary.h        |    2 +-
 fs/xfs/scrub/rtsummary_repair.c |   12 +++++-------
 fs/xfs/xfs_mount.h              |    2 +-
 fs/xfs/xfs_rtalloc.c            |   13 +++++--------
 7 files changed, 19 insertions(+), 25 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 76706e8bbc4e..27a4472402ba 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -162,7 +162,7 @@ xfs_rtsummary_read_buf(
 {
 	struct xfs_mount		*mp = args->mp;
 
-	if (XFS_IS_CORRUPT(mp, block >= XFS_B_TO_FSB(mp, mp->m_rsumsize))) {
+	if (XFS_IS_CORRUPT(mp, block >= mp->m_rsumblocks)) {
 		xfs_rt_mark_sick(args->mp, XFS_SICK_RT_SUMMARY);
 		return -EFSCORRUPTED;
 	}
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 45aaf169806a..2e6d7bb3b5a2 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -918,7 +918,7 @@ xfs_calc_growrtfree_reservation(
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
 		xfs_calc_inode_res(mp, 2) +
 		xfs_calc_buf_res(1, mp->m_sb.sb_blocksize) +
-		xfs_calc_buf_res(1, mp->m_rsumsize);
+		xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, mp->m_rsumblocks));
 }
 
 /*
diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c
index 3fee603f5244..7c7366c98338 100644
--- a/fs/xfs/scrub/rtsummary.c
+++ b/fs/xfs/scrub/rtsummary.c
@@ -63,7 +63,8 @@ xchk_setup_rtsummary(
 	 * us to avoid pinning kernel memory for this purpose.
 	 */
 	descr = xchk_xfile_descr(sc, "realtime summary file");
-	error = xfile_create(descr, mp->m_rsumsize, &sc->xfile);
+	error = xfile_create(descr, XFS_FSB_TO_B(mp, mp->m_rsumblocks),
+			&sc->xfile);
 	kfree(descr);
 	if (error)
 		return error;
@@ -95,16 +96,14 @@ xchk_setup_rtsummary(
 	 * volume.  Hence it is safe to compute and check the geometry values.
 	 */
 	if (mp->m_sb.sb_rblocks) {
-		xfs_filblks_t	rsumblocks;
 		int		rextslog;
 
 		rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
 		rextslog = xfs_compute_rextslog(rts->rextents);
 		rts->rsumlevels = rextslog + 1;
 		rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
-		rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
+		rts->rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
 				rts->rbmblocks);
-		rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
 	}
 	return 0;
 }
@@ -316,7 +315,7 @@ xchk_rtsummary(
 	}
 
 	/* Is m_rsumsize correct? */
-	if (mp->m_rsumsize != rts->rsumsize) {
+	if (mp->m_rsumblocks != rts->rsumblocks) {
 		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
 		goto out_rbm;
 	}
@@ -332,7 +331,7 @@ xchk_rtsummary(
 	 * growfsrt expands the summary file before updating sb_rextents, so
 	 * the file can be larger than rsumsize.
 	 */
-	if (mp->m_rsumip->i_disk_size < rts->rsumsize) {
+	if (mp->m_rsumip->i_disk_size < XFS_FSB_TO_B(mp, rts->rsumblocks)) {
 		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
 		goto out_rbm;
 	}
diff --git a/fs/xfs/scrub/rtsummary.h b/fs/xfs/scrub/rtsummary.h
index e1d50304d8d4..e44b04cb6e2d 100644
--- a/fs/xfs/scrub/rtsummary.h
+++ b/fs/xfs/scrub/rtsummary.h
@@ -14,7 +14,7 @@ struct xchk_rtsummary {
 
 	uint64_t		rextents;
 	uint64_t		rbmblocks;
-	uint64_t		rsumsize;
+	xfs_filblks_t		rsumblocks;
 	unsigned int		rsumlevels;
 	unsigned int		resblks;
 
diff --git a/fs/xfs/scrub/rtsummary_repair.c b/fs/xfs/scrub/rtsummary_repair.c
index d9e971c4c79f..7deeb948cb70 100644
--- a/fs/xfs/scrub/rtsummary_repair.c
+++ b/fs/xfs/scrub/rtsummary_repair.c
@@ -56,7 +56,7 @@ xrep_setup_rtsummary(
 	 * transaction (which we cannot drop because we cannot drop the
 	 * rtsummary ILOCK) and cannot ask for more reservation.
 	 */
-	blocks = XFS_B_TO_FSB(mp, mp->m_rsumsize);
+	blocks = mp->m_rsumblocks;
 	blocks += xfs_bmbt_calc_size(mp, blocks) * 2;
 	if (blocks > UINT_MAX)
 		return -EOPNOTSUPP;
@@ -100,7 +100,6 @@ xrep_rtsummary(
 {
 	struct xchk_rtsummary	*rts = sc->buf;
 	struct xfs_mount	*mp = sc->mp;
-	xfs_filblks_t		rsumblocks;
 	int			error;
 
 	/* We require the rmapbt to rebuild anything. */
@@ -131,10 +130,9 @@ xrep_rtsummary(
 	}
 
 	/* Make sure we have space allocated for the entire summary file. */
-	rsumblocks = XFS_B_TO_FSB(mp, rts->rsumsize);
 	xfs_trans_ijoin(sc->tp, sc->ip, 0);
 	xfs_trans_ijoin(sc->tp, sc->tempip, 0);
-	error = xrep_tempfile_prealloc(sc, 0, rsumblocks);
+	error = xrep_tempfile_prealloc(sc, 0, rts->rsumblocks);
 	if (error)
 		return error;
 
@@ -143,11 +141,11 @@ xrep_rtsummary(
 		return error;
 
 	/* Copy the rtsummary file that we generated. */
-	error = xrep_tempfile_copyin(sc, 0, rsumblocks,
+	error = xrep_tempfile_copyin(sc, 0, rts->rsumblocks,
 			xrep_rtsummary_prep_buf, rts);
 	if (error)
 		return error;
-	error = xrep_tempfile_set_isize(sc, rts->rsumsize);
+	error = xrep_tempfile_set_isize(sc, XFS_FSB_TO_B(mp, rts->rsumblocks));
 	if (error)
 		return error;
 
@@ -168,7 +166,7 @@ xrep_rtsummary(
 		memset(mp->m_rsum_cache, 0xFF, mp->m_sb.sb_rbmblocks);
 
 	mp->m_rsumlevels = rts->rsumlevels;
-	mp->m_rsumsize = rts->rsumsize;
+	mp->m_rsumblocks = rts->rsumblocks;
 
 	/* Free the old rtsummary blocks if they're not in use. */
 	return xrep_reap_ifork(sc, sc->tempip, XFS_DATA_FORK);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d0567dfbc036..7bf635cccaa1 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -147,7 +147,7 @@ typedef struct xfs_mount {
 	int			m_logbufs;	/* number of log buffers */
 	int			m_logbsize;	/* size of each log buffer */
 	uint			m_rsumlevels;	/* rt summary levels */
-	uint			m_rsumsize;	/* size of rt summary, bytes */
+	xfs_filblks_t		m_rsumblocks;	/* size of rt summary, FSBs */
 	int			m_fixedfsid[2];	/* unchanged for life of FS */
 	uint			m_qflags;	/* quota status flags */
 	uint64_t		m_features;	/* active filesystem features */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 29edb8044b00..3a2005a1e673 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -734,9 +734,8 @@ xfs_growfs_rt_bmblock(
 	nmp->m_sb.sb_rextents = xfs_rtb_to_rtx(nmp, nmp->m_sb.sb_rblocks);
 	nmp->m_sb.sb_rextslog = xfs_compute_rextslog(nmp->m_sb.sb_rextents);
 	nmp->m_rsumlevels = nmp->m_sb.sb_rextslog + 1;
-	nmp->m_rsumsize = XFS_FSB_TO_B(mp,
-		xfs_rtsummary_blockcount(mp, nmp->m_rsumlevels,
-			nmp->m_sb.sb_rbmblocks));
+	nmp->m_rsumblocks = xfs_rtsummary_blockcount(mp, nmp->m_rsumlevels,
+			nmp->m_sb.sb_rbmblocks);
 
 	/*
 	 * Recompute the growfsrt reservation from the new rsumsize, so that the
@@ -766,7 +765,7 @@ xfs_growfs_rt_bmblock(
 	 * so that inode inactivation won't punch what it thinks are "posteof"
 	 * blocks.
 	 */
-	rsumip->i_disk_size = nmp->m_rsumsize;
+	rsumip->i_disk_size = nmp->m_rsumblocks * nmp->m_sb.sb_blocksize;
 	i_size_write(VFS_I(rsumip), rsumip->i_disk_size);
 	xfs_trans_log_inode(args.tp, rsumip, XFS_ILOG_CORE);
 
@@ -818,7 +817,7 @@ xfs_growfs_rt_bmblock(
 	 * Update the calculated values in the real mount structure.
 	 */
 	mp->m_rsumlevels = nmp->m_rsumlevels;
-	mp->m_rsumsize = nmp->m_rsumsize;
+	mp->m_rsumblocks = nmp->m_rsumblocks;
 	xfs_mount_sb_set_rextsize(mp, &mp->m_sb);
 
 	/*
@@ -1022,7 +1021,6 @@ xfs_rtmount_init(
 	struct xfs_buf		*bp;	/* buffer for last block of subvolume */
 	struct xfs_sb		*sbp;	/* filesystem superblock copy in mount */
 	xfs_daddr_t		d;	/* address of last block of subvolume */
-	unsigned int		rsumblocks;
 	int			error;
 
 	sbp = &mp->m_sb;
@@ -1034,9 +1032,8 @@ xfs_rtmount_init(
 		return -ENODEV;
 	}
 	mp->m_rsumlevels = sbp->sb_rextslog + 1;
-	rsumblocks = xfs_rtsummary_blockcount(mp, mp->m_rsumlevels,
+	mp->m_rsumblocks = xfs_rtsummary_blockcount(mp, mp->m_rsumlevels,
 			mp->m_sb.sb_rbmblocks);
-	mp->m_rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
 	mp->m_rbmip = mp->m_rsumip = NULL;
 	/*
 	 * Check that the realtime section is an ok size.



* [PATCH 09/10] xfs: rearrange xfs_fsmap.c a little bit
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-09-02 18:31   ` [PATCH 08/10] xfs: replace m_rsumsize with m_rsumblocks Darrick J. Wong
@ 2024-09-02 18:31   ` Darrick J. Wong
  2024-09-02 18:32   ` [PATCH 10/10] xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c Darrick J. Wong
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:31 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The order of the functions in this file has gotten a little confusing
over the years.  Specifically, the two data device implementations
(bnobt and rmapbt) could be adjacent in the source code instead of split
in two by the logdev and rtdev fsmap implementations.  We're about to
add more functionality to this file, so rearrange things now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_fsmap.c |  268 ++++++++++++++++++++++++++--------------------------
 1 file changed, 134 insertions(+), 134 deletions(-)


diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index e15446626875..615253406fde 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -441,140 +441,6 @@ xfs_getfsmap_set_irec_flags(
 		irec->rm_flags |= XFS_RMAP_UNWRITTEN;
 }
 
-/* Execute a getfsmap query against the log device. */
-STATIC int
-xfs_getfsmap_logdev(
-	struct xfs_trans		*tp,
-	const struct xfs_fsmap		*keys,
-	struct xfs_getfsmap_info	*info)
-{
-	struct xfs_mount		*mp = tp->t_mountp;
-	struct xfs_rmap_irec		rmap;
-	xfs_daddr_t			rec_daddr, len_daddr;
-	xfs_fsblock_t			start_fsb, end_fsb;
-	uint64_t			eofs;
-
-	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
-	if (keys[0].fmr_physical >= eofs)
-		return 0;
-	start_fsb = XFS_BB_TO_FSBT(mp,
-				keys[0].fmr_physical + keys[0].fmr_length);
-	end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
-
-	/* Adjust the low key if we are continuing from where we left off. */
-	if (keys[0].fmr_length > 0)
-		info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb);
-
-	trace_xfs_fsmap_low_key_linear(mp, info->dev, start_fsb);
-	trace_xfs_fsmap_high_key_linear(mp, info->dev, end_fsb);
-
-	if (start_fsb > 0)
-		return 0;
-
-	/* Fabricate an rmap entry for the external log device. */
-	rmap.rm_startblock = 0;
-	rmap.rm_blockcount = mp->m_sb.sb_logblocks;
-	rmap.rm_owner = XFS_RMAP_OWN_LOG;
-	rmap.rm_offset = 0;
-	rmap.rm_flags = 0;
-
-	rec_daddr = XFS_FSB_TO_BB(mp, rmap.rm_startblock);
-	len_daddr = XFS_FSB_TO_BB(mp, rmap.rm_blockcount);
-	return xfs_getfsmap_helper(tp, info, &rmap, rec_daddr, len_daddr);
-}
-
-#ifdef CONFIG_XFS_RT
-/* Transform a rtbitmap "record" into a fsmap */
-STATIC int
-xfs_getfsmap_rtdev_rtbitmap_helper(
-	struct xfs_mount		*mp,
-	struct xfs_trans		*tp,
-	const struct xfs_rtalloc_rec	*rec,
-	void				*priv)
-{
-	struct xfs_getfsmap_info	*info = priv;
-	struct xfs_rmap_irec		irec;
-	xfs_rtblock_t			rtbno;
-	xfs_daddr_t			rec_daddr, len_daddr;
-
-	rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext);
-	rec_daddr = XFS_FSB_TO_BB(mp, rtbno);
-	irec.rm_startblock = rtbno;
-
-	rtbno = xfs_rtx_to_rtb(mp, rec->ar_extcount);
-	len_daddr = XFS_FSB_TO_BB(mp, rtbno);
-	irec.rm_blockcount = rtbno;
-
-	irec.rm_owner = XFS_RMAP_OWN_NULL;	/* "free" */
-	irec.rm_offset = 0;
-	irec.rm_flags = 0;
-
-	return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr);
-}
-
-/* Execute a getfsmap query against the realtime device rtbitmap. */
-STATIC int
-xfs_getfsmap_rtdev_rtbitmap(
-	struct xfs_trans		*tp,
-	const struct xfs_fsmap		*keys,
-	struct xfs_getfsmap_info	*info)
-{
-
-	struct xfs_rtalloc_rec		ahigh = { 0 };
-	struct xfs_mount		*mp = tp->t_mountp;
-	xfs_rtblock_t			start_rtb;
-	xfs_rtblock_t			end_rtb;
-	xfs_rtxnum_t			high;
-	uint64_t			eofs;
-	int				error;
-
-	eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents));
-	if (keys[0].fmr_physical >= eofs)
-		return 0;
-	start_rtb = XFS_BB_TO_FSBT(mp,
-				keys[0].fmr_physical + keys[0].fmr_length);
-	end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
-
-	info->missing_owner = XFS_FMR_OWN_UNKNOWN;
-
-	/* Adjust the low key if we are continuing from where we left off. */
-	if (keys[0].fmr_length > 0) {
-		info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb);
-		if (info->low_daddr >= eofs)
-			return 0;
-	}
-
-	trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb);
-	trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb);
-
-	xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP);
-
-	/*
-	 * Set up query parameters to return free rtextents covering the range
-	 * we want.
-	 */
-	high = xfs_rtb_to_rtxup(mp, end_rtb);
-	error = xfs_rtalloc_query_range(mp, tp, xfs_rtb_to_rtx(mp, start_rtb),
-			high, xfs_getfsmap_rtdev_rtbitmap_helper, info);
-	if (error)
-		goto err;
-
-	/*
-	 * Report any gaps at the end of the rtbitmap by simulating a null
-	 * rmap starting at the block after the end of the query range.
-	 */
-	info->last = true;
-	ahigh.ar_startext = min(mp->m_sb.sb_rextents, high);
-
-	error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info);
-	if (error)
-		goto err;
-err:
-	xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP);
-	return error;
-}
-#endif /* CONFIG_XFS_RT */
-
 static inline bool
 rmap_not_shareable(struct xfs_mount *mp, const struct xfs_rmap_irec *r)
 {
@@ -799,6 +665,140 @@ xfs_getfsmap_datadev_bnobt(
 			xfs_getfsmap_datadev_bnobt_query, &akeys[0]);
 }
 
+/* Execute a getfsmap query against the log device. */
+STATIC int
+xfs_getfsmap_logdev(
+	struct xfs_trans		*tp,
+	const struct xfs_fsmap		*keys,
+	struct xfs_getfsmap_info	*info)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_rmap_irec		rmap;
+	xfs_daddr_t			rec_daddr, len_daddr;
+	xfs_fsblock_t			start_fsb, end_fsb;
+	uint64_t			eofs;
+
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
+	if (keys[0].fmr_physical >= eofs)
+		return 0;
+	start_fsb = XFS_BB_TO_FSBT(mp,
+				keys[0].fmr_physical + keys[0].fmr_length);
+	end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
+
+	/* Adjust the low key if we are continuing from where we left off. */
+	if (keys[0].fmr_length > 0)
+		info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb);
+
+	trace_xfs_fsmap_low_key_linear(mp, info->dev, start_fsb);
+	trace_xfs_fsmap_high_key_linear(mp, info->dev, end_fsb);
+
+	if (start_fsb > 0)
+		return 0;
+
+	/* Fabricate an rmap entry for the external log device. */
+	rmap.rm_startblock = 0;
+	rmap.rm_blockcount = mp->m_sb.sb_logblocks;
+	rmap.rm_owner = XFS_RMAP_OWN_LOG;
+	rmap.rm_offset = 0;
+	rmap.rm_flags = 0;
+
+	rec_daddr = XFS_FSB_TO_BB(mp, rmap.rm_startblock);
+	len_daddr = XFS_FSB_TO_BB(mp, rmap.rm_blockcount);
+	return xfs_getfsmap_helper(tp, info, &rmap, rec_daddr, len_daddr);
+}
+
+#ifdef CONFIG_XFS_RT
+/* Transform a rtbitmap "record" into a fsmap */
+STATIC int
+xfs_getfsmap_rtdev_rtbitmap_helper(
+	struct xfs_mount		*mp,
+	struct xfs_trans		*tp,
+	const struct xfs_rtalloc_rec	*rec,
+	void				*priv)
+{
+	struct xfs_getfsmap_info	*info = priv;
+	struct xfs_rmap_irec		irec;
+	xfs_rtblock_t			rtbno;
+	xfs_daddr_t			rec_daddr, len_daddr;
+
+	rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext);
+	rec_daddr = XFS_FSB_TO_BB(mp, rtbno);
+	irec.rm_startblock = rtbno;
+
+	rtbno = xfs_rtx_to_rtb(mp, rec->ar_extcount);
+	len_daddr = XFS_FSB_TO_BB(mp, rtbno);
+	irec.rm_blockcount = rtbno;
+
+	irec.rm_owner = XFS_RMAP_OWN_NULL;	/* "free" */
+	irec.rm_offset = 0;
+	irec.rm_flags = 0;
+
+	return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr);
+}
+
+/* Execute a getfsmap query against the realtime device rtbitmap. */
+STATIC int
+xfs_getfsmap_rtdev_rtbitmap(
+	struct xfs_trans		*tp,
+	const struct xfs_fsmap		*keys,
+	struct xfs_getfsmap_info	*info)
+{
+
+	struct xfs_rtalloc_rec		ahigh = { 0 };
+	struct xfs_mount		*mp = tp->t_mountp;
+	xfs_rtblock_t			start_rtb;
+	xfs_rtblock_t			end_rtb;
+	xfs_rtxnum_t			high;
+	uint64_t			eofs;
+	int				error;
+
+	eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents));
+	if (keys[0].fmr_physical >= eofs)
+		return 0;
+	start_rtb = XFS_BB_TO_FSBT(mp,
+				keys[0].fmr_physical + keys[0].fmr_length);
+	end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
+
+	info->missing_owner = XFS_FMR_OWN_UNKNOWN;
+
+	/* Adjust the low key if we are continuing from where we left off. */
+	if (keys[0].fmr_length > 0) {
+		info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb);
+		if (info->low_daddr >= eofs)
+			return 0;
+	}
+
+	trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb);
+	trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb);
+
+	xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP);
+
+	/*
+	 * Set up query parameters to return free rtextents covering the range
+	 * we want.
+	 */
+	high = xfs_rtb_to_rtxup(mp, end_rtb);
+	error = xfs_rtalloc_query_range(mp, tp, xfs_rtb_to_rtx(mp, start_rtb),
+			high, xfs_getfsmap_rtdev_rtbitmap_helper, info);
+	if (error)
+		goto err;
+
+	/*
+	 * Report any gaps at the end of the rtbitmap by simulating a null
+	 * rmap starting at the block after the end of the query range.
+	 */
+	info->last = true;
+	ahigh.ar_startext = min(mp->m_sb.sb_rextents, high);
+
+	error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info);
+	if (error)
+		goto err;
+err:
+	xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP);
+	return error;
+}
+#endif /* CONFIG_XFS_RT */
+
 /* Do we recognize the device? */
 STATIC bool
 xfs_getfsmap_is_valid_device(



* [PATCH 10/10] xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c
  2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-09-02 18:31   ` [PATCH 09/10] xfs: rearrange xfs_fsmap.c a little bit Darrick J. Wong
@ 2024-09-02 18:32   ` Darrick J. Wong
  9 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:32 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move this function out of xfs_ioctl.c to reduce the clutter in there,
and make the entire getfsmap implementation self-contained in a single
file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_fsmap.c |  134 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_fsmap.h |    6 +-
 fs/xfs/xfs_ioctl.c |  130 --------------------------------------------------
 3 files changed, 134 insertions(+), 136 deletions(-)


diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 615253406fde..ae18ab86e608 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -44,7 +44,7 @@ xfs_fsmap_from_internal(
 }
 
 /* Convert an fsmap to an xfs_fsmap. */
-void
+static void
 xfs_fsmap_to_internal(
 	struct xfs_fsmap	*dest,
 	struct fsmap		*src)
@@ -889,7 +889,7 @@ xfs_getfsmap_check_keys(
  * xfs_getfsmap_info.low/high	-- per-AG low/high keys computed from
  *				   dkeys; used to query the metadata.
  */
-int
+STATIC int
 xfs_getfsmap(
 	struct xfs_mount		*mp,
 	struct xfs_fsmap_head		*head,
@@ -1019,3 +1019,133 @@ xfs_getfsmap(
 	head->fmh_oflags = FMH_OF_DEV_T;
 	return error;
 }
+
+int
+xfs_ioc_getfsmap(
+	struct xfs_inode	*ip,
+	struct fsmap_head	__user *arg)
+{
+	struct xfs_fsmap_head	xhead = {0};
+	struct fsmap_head	head;
+	struct fsmap		*recs;
+	unsigned int		count;
+	__u32			last_flags = 0;
+	bool			done = false;
+	int			error;
+
+	if (copy_from_user(&head, arg, sizeof(struct fsmap_head)))
+		return -EFAULT;
+	if (memchr_inv(head.fmh_reserved, 0, sizeof(head.fmh_reserved)) ||
+	    memchr_inv(head.fmh_keys[0].fmr_reserved, 0,
+		       sizeof(head.fmh_keys[0].fmr_reserved)) ||
+	    memchr_inv(head.fmh_keys[1].fmr_reserved, 0,
+		       sizeof(head.fmh_keys[1].fmr_reserved)))
+		return -EINVAL;
+
+	/*
+	 * Use an internal memory buffer so that we don't have to copy fsmap
+	 * data to userspace while holding locks.  Start by trying to allocate
+	 * up to 128k for the buffer, but fall back to a single page if needed.
+	 */
+	count = min_t(unsigned int, head.fmh_count,
+			131072 / sizeof(struct fsmap));
+	recs = kvcalloc(count, sizeof(struct fsmap), GFP_KERNEL);
+	if (!recs) {
+		count = min_t(unsigned int, head.fmh_count,
+				PAGE_SIZE / sizeof(struct fsmap));
+		recs = kvcalloc(count, sizeof(struct fsmap), GFP_KERNEL);
+		if (!recs)
+			return -ENOMEM;
+	}
+
+	xhead.fmh_iflags = head.fmh_iflags;
+	xfs_fsmap_to_internal(&xhead.fmh_keys[0], &head.fmh_keys[0]);
+	xfs_fsmap_to_internal(&xhead.fmh_keys[1], &head.fmh_keys[1]);
+
+	trace_xfs_getfsmap_low_key(ip->i_mount, &xhead.fmh_keys[0]);
+	trace_xfs_getfsmap_high_key(ip->i_mount, &xhead.fmh_keys[1]);
+
+	head.fmh_entries = 0;
+	do {
+		struct fsmap __user	*user_recs;
+		struct fsmap		*last_rec;
+
+		user_recs = &arg->fmh_recs[head.fmh_entries];
+		xhead.fmh_entries = 0;
+		xhead.fmh_count = min_t(unsigned int, count,
+					head.fmh_count - head.fmh_entries);
+
+		/* Run query, record how many entries we got. */
+		error = xfs_getfsmap(ip->i_mount, &xhead, recs);
+		switch (error) {
+		case 0:
+			/*
+			 * There are no more records in the result set.  Copy
+			 * whatever we got to userspace and break out.
+			 */
+			done = true;
+			break;
+		case -ECANCELED:
+			/*
+			 * The internal memory buffer is full.  Copy whatever
+			 * records we got to userspace and go again if we have
+			 * not yet filled the userspace buffer.
+			 */
+			error = 0;
+			break;
+		default:
+			goto out_free;
+		}
+		head.fmh_entries += xhead.fmh_entries;
+		head.fmh_oflags = xhead.fmh_oflags;
+
+		/*
+		 * If the caller wanted a record count or there aren't any
+		 * new records to return, we're done.
+		 */
+		if (head.fmh_count == 0 || xhead.fmh_entries == 0)
+			break;
+
+		/* Copy all the records we got out to userspace. */
+		if (copy_to_user(user_recs, recs,
+				 xhead.fmh_entries * sizeof(struct fsmap))) {
+			error = -EFAULT;
+			goto out_free;
+		}
+
+		/* Remember the last record flags we copied to userspace. */
+		last_rec = &recs[xhead.fmh_entries - 1];
+		last_flags = last_rec->fmr_flags;
+
+		/* Set up the low key for the next iteration. */
+		xfs_fsmap_to_internal(&xhead.fmh_keys[0], last_rec);
+		trace_xfs_getfsmap_low_key(ip->i_mount, &xhead.fmh_keys[0]);
+	} while (!done && head.fmh_entries < head.fmh_count);
+
+	/*
+	 * If there are no more records in the query result set and we're not
+	 * in counting mode, mark the last record returned with the LAST flag.
+	 */
+	if (done && head.fmh_count > 0 && head.fmh_entries > 0) {
+		struct fsmap __user	*user_rec;
+
+		last_flags |= FMR_OF_LAST;
+		user_rec = &arg->fmh_recs[head.fmh_entries - 1];
+
+		if (copy_to_user(&user_rec->fmr_flags, &last_flags,
+					sizeof(last_flags))) {
+			error = -EFAULT;
+			goto out_free;
+		}
+	}
+
+	/* copy back header */
+	if (copy_to_user(arg, &head, sizeof(struct fsmap_head))) {
+		error = -EFAULT;
+		goto out_free;
+	}
+
+out_free:
+	kvfree(recs);
+	return error;
+}
diff --git a/fs/xfs/xfs_fsmap.h b/fs/xfs/xfs_fsmap.h
index a0775788e7b1..a0bcc38486a5 100644
--- a/fs/xfs/xfs_fsmap.h
+++ b/fs/xfs/xfs_fsmap.h
@@ -7,6 +7,7 @@
 #define __XFS_FSMAP_H__
 
 struct fsmap;
+struct fsmap_head;
 
 /* internal fsmap representation */
 struct xfs_fsmap {
@@ -27,9 +28,6 @@ struct xfs_fsmap_head {
 	struct xfs_fsmap fmh_keys[2];	/* low and high keys */
 };
 
-void xfs_fsmap_to_internal(struct xfs_fsmap *dest, struct fsmap *src);
-
-int xfs_getfsmap(struct xfs_mount *mp, struct xfs_fsmap_head *head,
-		struct fsmap *out_recs);
+int xfs_ioc_getfsmap(struct xfs_inode *ip, struct fsmap_head __user *arg);
 
 #endif /* __XFS_FSMAP_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 90b3ee21e7fe..7226d27e8afc 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -876,136 +876,6 @@ xfs_ioc_getbmap(
 	return error;
 }
 
-STATIC int
-xfs_ioc_getfsmap(
-	struct xfs_inode	*ip,
-	struct fsmap_head	__user *arg)
-{
-	struct xfs_fsmap_head	xhead = {0};
-	struct fsmap_head	head;
-	struct fsmap		*recs;
-	unsigned int		count;
-	__u32			last_flags = 0;
-	bool			done = false;
-	int			error;
-
-	if (copy_from_user(&head, arg, sizeof(struct fsmap_head)))
-		return -EFAULT;
-	if (memchr_inv(head.fmh_reserved, 0, sizeof(head.fmh_reserved)) ||
-	    memchr_inv(head.fmh_keys[0].fmr_reserved, 0,
-		       sizeof(head.fmh_keys[0].fmr_reserved)) ||
-	    memchr_inv(head.fmh_keys[1].fmr_reserved, 0,
-		       sizeof(head.fmh_keys[1].fmr_reserved)))
-		return -EINVAL;
-
-	/*
-	 * Use an internal memory buffer so that we don't have to copy fsmap
-	 * data to userspace while holding locks.  Start by trying to allocate
-	 * up to 128k for the buffer, but fall back to a single page if needed.
-	 */
-	count = min_t(unsigned int, head.fmh_count,
-			131072 / sizeof(struct fsmap));
-	recs = kvcalloc(count, sizeof(struct fsmap), GFP_KERNEL);
-	if (!recs) {
-		count = min_t(unsigned int, head.fmh_count,
-				PAGE_SIZE / sizeof(struct fsmap));
-		recs = kvcalloc(count, sizeof(struct fsmap), GFP_KERNEL);
-		if (!recs)
-			return -ENOMEM;
-	}
-
-	xhead.fmh_iflags = head.fmh_iflags;
-	xfs_fsmap_to_internal(&xhead.fmh_keys[0], &head.fmh_keys[0]);
-	xfs_fsmap_to_internal(&xhead.fmh_keys[1], &head.fmh_keys[1]);
-
-	trace_xfs_getfsmap_low_key(ip->i_mount, &xhead.fmh_keys[0]);
-	trace_xfs_getfsmap_high_key(ip->i_mount, &xhead.fmh_keys[1]);
-
-	head.fmh_entries = 0;
-	do {
-		struct fsmap __user	*user_recs;
-		struct fsmap		*last_rec;
-
-		user_recs = &arg->fmh_recs[head.fmh_entries];
-		xhead.fmh_entries = 0;
-		xhead.fmh_count = min_t(unsigned int, count,
-					head.fmh_count - head.fmh_entries);
-
-		/* Run query, record how many entries we got. */
-		error = xfs_getfsmap(ip->i_mount, &xhead, recs);
-		switch (error) {
-		case 0:
-			/*
-			 * There are no more records in the result set.  Copy
-			 * whatever we got to userspace and break out.
-			 */
-			done = true;
-			break;
-		case -ECANCELED:
-			/*
-			 * The internal memory buffer is full.  Copy whatever
-			 * records we got to userspace and go again if we have
-			 * not yet filled the userspace buffer.
-			 */
-			error = 0;
-			break;
-		default:
-			goto out_free;
-		}
-		head.fmh_entries += xhead.fmh_entries;
-		head.fmh_oflags = xhead.fmh_oflags;
-
-		/*
-		 * If the caller wanted a record count or there aren't any
-		 * new records to return, we're done.
-		 */
-		if (head.fmh_count == 0 || xhead.fmh_entries == 0)
-			break;
-
-		/* Copy all the records we got out to userspace. */
-		if (copy_to_user(user_recs, recs,
-				 xhead.fmh_entries * sizeof(struct fsmap))) {
-			error = -EFAULT;
-			goto out_free;
-		}
-
-		/* Remember the last record flags we copied to userspace. */
-		last_rec = &recs[xhead.fmh_entries - 1];
-		last_flags = last_rec->fmr_flags;
-
-		/* Set up the low key for the next iteration. */
-		xfs_fsmap_to_internal(&xhead.fmh_keys[0], last_rec);
-		trace_xfs_getfsmap_low_key(ip->i_mount, &xhead.fmh_keys[0]);
-	} while (!done && head.fmh_entries < head.fmh_count);
-
-	/*
-	 * If there are no more records in the query result set and we're not
-	 * in counting mode, mark the last record returned with the LAST flag.
-	 */
-	if (done && head.fmh_count > 0 && head.fmh_entries > 0) {
-		struct fsmap __user	*user_rec;
-
-		last_flags |= FMR_OF_LAST;
-		user_rec = &arg->fmh_recs[head.fmh_entries - 1];
-
-		if (copy_to_user(&user_rec->fmr_flags, &last_flags,
-					sizeof(last_flags))) {
-			error = -EFAULT;
-			goto out_free;
-		}
-	}
-
-	/* copy back header */
-	if (copy_to_user(arg, &head, sizeof(struct fsmap_head))) {
-		error = -EFAULT;
-		goto out_free;
-	}
-
-out_free:
-	kvfree(recs);
-	return error;
-}
-
 int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)



* [PATCH 1/1] xfs: refactor loading quota inodes in the regular case
  2024-09-02 18:22 ` [PATCHSET v4.2 6/8] xfs: cleanups for quota mount Darrick J. Wong
@ 2024-09-02 18:32   ` Darrick J. Wong
  0 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:32 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a helper function to load quota inodes in the case where the
dqtype and the sb quota inode fields correspond.  This is true for
nearly all the iget callsites in the quota code, except for when we're
switching the group and project quota inodes.  We'll need this in
subsequent patches to make the metadir handling less convoluted.
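
As a rough userspace sketch of the pattern described above (not the kernel code itself), the helper below maps a quota type to the matching superblock field and returns -ENOENT when that field is unset, so each caller can decide whether a missing quota inode is an error.  The demo_* names, the DEMO_NULLINO sentinel and the -EINVAL default are stand-ins invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for NULLFSINO, xfs_dqtype_t and the sb fields. */
#define DEMO_NULLINO	((uint64_t)-1)

enum demo_dqtype { DEMO_USER, DEMO_GROUP, DEMO_PROJ };

struct demo_sb {
	uint64_t	uquotino;
	uint64_t	gquotino;
	uint64_t	pquotino;
};

/*
 * Look up the quota inode number for @type.  Returns 0 and sets *inop on
 * success, -ENOENT if the field is unset, or -EINVAL for an unknown type.
 */
static inline int
demo_qino_lookup(const struct demo_sb *sb, enum demo_dqtype type,
		 uint64_t *inop)
{
	uint64_t	ino;

	switch (type) {
	case DEMO_USER:
		ino = sb->uquotino;
		break;
	case DEMO_GROUP:
		ino = sb->gquotino;
		break;
	case DEMO_PROJ:
		ino = sb->pquotino;
		break;
	default:
		return -EINVAL;
	}

	if (ino == DEMO_NULLINO)
		return -ENOENT;

	*inop = ino;
	return 0;
}
```

A caller that considers "no quota file" to be fine can map -ENOENT to 0 and pass every other error up, which is what the truncate and get-state paths in this patch do.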

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_qm.c          |   46 +++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_qm.h          |    3 +++
 fs/xfs/xfs_qm_syscalls.c |   13 +++++------
 fs/xfs/xfs_quotaops.c    |   53 +++++++++++++++++++++++++++-------------------
 4 files changed, 80 insertions(+), 35 deletions(-)


diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 63f6ca2db251..7e2307921deb 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1538,6 +1538,43 @@ xfs_qm_mount_quotas(
 	}
 }
 
+/*
+ * Load the inode for a given type of quota, assuming that the sb fields have
+ * been sorted out.  This is not true when switching quota types on a V4
+ * filesystem, so do not use this function for that.
+ *
+ * Returns -ENOENT if the quota inode field is NULLFSINO; 0 and an inode on
+ * success; or a negative errno.
+ */
+int
+xfs_qm_qino_load(
+	struct xfs_mount	*mp,
+	xfs_dqtype_t		type,
+	struct xfs_inode	**ipp)
+{
+	xfs_ino_t		ino = NULLFSINO;
+
+	switch (type) {
+	case XFS_DQTYPE_USER:
+		ino = mp->m_sb.sb_uquotino;
+		break;
+	case XFS_DQTYPE_GROUP:
+		ino = mp->m_sb.sb_gquotino;
+		break;
+	case XFS_DQTYPE_PROJ:
+		ino = mp->m_sb.sb_pquotino;
+		break;
+	default:
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
+	if (ino == NULLFSINO)
+		return -ENOENT;
+
+	return xfs_iget(mp, NULL, ino, 0, 0, ipp);
+}
+
 /*
  * This is called after the superblock has been read in and we're ready to
  * iget the quota inodes.
@@ -1561,24 +1598,21 @@ xfs_qm_init_quotainos(
 		if (XFS_IS_UQUOTA_ON(mp) &&
 		    mp->m_sb.sb_uquotino != NULLFSINO) {
 			ASSERT(mp->m_sb.sb_uquotino > 0);
-			error = xfs_iget(mp, NULL, mp->m_sb.sb_uquotino,
-					     0, 0, &uip);
+			error = xfs_qm_qino_load(mp, XFS_DQTYPE_USER, &uip);
 			if (error)
 				return error;
 		}
 		if (XFS_IS_GQUOTA_ON(mp) &&
 		    mp->m_sb.sb_gquotino != NULLFSINO) {
 			ASSERT(mp->m_sb.sb_gquotino > 0);
-			error = xfs_iget(mp, NULL, mp->m_sb.sb_gquotino,
-					     0, 0, &gip);
+			error = xfs_qm_qino_load(mp, XFS_DQTYPE_GROUP, &gip);
 			if (error)
 				goto error_rele;
 		}
 		if (XFS_IS_PQUOTA_ON(mp) &&
 		    mp->m_sb.sb_pquotino != NULLFSINO) {
 			ASSERT(mp->m_sb.sb_pquotino > 0);
-			error = xfs_iget(mp, NULL, mp->m_sb.sb_pquotino,
-					     0, 0, &pip);
+			error = xfs_qm_qino_load(mp, XFS_DQTYPE_PROJ, &pip);
 			if (error)
 				goto error_rele;
 		}
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index 6e09dfcd13e2..e919c7f62f57 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -184,4 +184,7 @@ xfs_get_defquota(struct xfs_quotainfo *qi, xfs_dqtype_t type)
 	}
 }
 
+int xfs_qm_qino_load(struct xfs_mount *mp, xfs_dqtype_t type,
+		struct xfs_inode **ipp);
+
 #endif /* __XFS_QM_H__ */
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 392cb39cc10c..4eda50ae2d1c 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -53,16 +53,15 @@ xfs_qm_scall_quotaoff(
 STATIC int
 xfs_qm_scall_trunc_qfile(
 	struct xfs_mount	*mp,
-	xfs_ino_t		ino)
+	xfs_dqtype_t		type)
 {
 	struct xfs_inode	*ip;
 	struct xfs_trans	*tp;
 	int			error;
 
-	if (ino == NULLFSINO)
+	error = xfs_qm_qino_load(mp, type, &ip);
+	if (error == -ENOENT)
 		return 0;
-
-	error = xfs_iget(mp, NULL, ino, 0, 0, &ip);
 	if (error)
 		return error;
 
@@ -113,17 +112,17 @@ xfs_qm_scall_trunc_qfiles(
 	}
 
 	if (flags & XFS_QMOPT_UQUOTA) {
-		error = xfs_qm_scall_trunc_qfile(mp, mp->m_sb.sb_uquotino);
+		error = xfs_qm_scall_trunc_qfile(mp, XFS_DQTYPE_USER);
 		if (error)
 			return error;
 	}
 	if (flags & XFS_QMOPT_GQUOTA) {
-		error = xfs_qm_scall_trunc_qfile(mp, mp->m_sb.sb_gquotino);
+		error = xfs_qm_scall_trunc_qfile(mp, XFS_DQTYPE_GROUP);
 		if (error)
 			return error;
 	}
 	if (flags & XFS_QMOPT_PQUOTA)
-		error = xfs_qm_scall_trunc_qfile(mp, mp->m_sb.sb_pquotino);
+		error = xfs_qm_scall_trunc_qfile(mp, XFS_DQTYPE_PROJ);
 
 	return error;
 }
diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
index 9c162e69976b..4c7f7ce4fd2f 100644
--- a/fs/xfs/xfs_quotaops.c
+++ b/fs/xfs/xfs_quotaops.c
@@ -16,24 +16,25 @@
 #include "xfs_qm.h"
 
 
-static void
+static int
 xfs_qm_fill_state(
 	struct qc_type_state	*tstate,
 	struct xfs_mount	*mp,
-	struct xfs_inode	*ip,
-	xfs_ino_t		ino,
-	struct xfs_def_quota	*defq)
+	xfs_dqtype_t		type)
 {
-	bool			tempqip = false;
+	struct xfs_inode	*ip;
+	struct xfs_def_quota	*defq;
+	int			error;
 
-	tstate->ino = ino;
-	if (!ip && ino == NULLFSINO)
-		return;
-	if (!ip) {
-		if (xfs_iget(mp, NULL, ino, 0, 0, &ip))
-			return;
-		tempqip = true;
+	error = xfs_qm_qino_load(mp, type, &ip);
+	if (error) {
+		tstate->ino = NULLFSINO;
+		return error != -ENOENT ? error : 0;
 	}
+
+	defq = xfs_get_defquota(mp->m_quotainfo, type);
+
+	tstate->ino = ip->i_ino;
 	tstate->flags |= QCI_SYSFILE;
 	tstate->blocks = ip->i_nblocks;
 	tstate->nextents = ip->i_df.if_nextents;
@@ -43,8 +44,9 @@ xfs_qm_fill_state(
 	tstate->spc_warnlimit = 0;
 	tstate->ino_warnlimit = 0;
 	tstate->rt_spc_warnlimit = 0;
-	if (tempqip)
-		xfs_irele(ip);
+	xfs_irele(ip);
+
+	return 0;
 }
 
 /*
@@ -56,8 +58,9 @@ xfs_fs_get_quota_state(
 	struct super_block	*sb,
 	struct qc_state		*state)
 {
-	struct xfs_mount *mp = XFS_M(sb);
-	struct xfs_quotainfo *q = mp->m_quotainfo;
+	struct xfs_mount	*mp = XFS_M(sb);
+	struct xfs_quotainfo	*q = mp->m_quotainfo;
+	int			error;
 
 	memset(state, 0, sizeof(*state));
 	if (!XFS_IS_QUOTA_ON(mp))
@@ -76,12 +79,18 @@ xfs_fs_get_quota_state(
 	if (XFS_IS_PQUOTA_ENFORCED(mp))
 		state->s_state[PRJQUOTA].flags |= QCI_LIMITS_ENFORCED;
 
-	xfs_qm_fill_state(&state->s_state[USRQUOTA], mp, q->qi_uquotaip,
-			  mp->m_sb.sb_uquotino, &q->qi_usr_default);
-	xfs_qm_fill_state(&state->s_state[GRPQUOTA], mp, q->qi_gquotaip,
-			  mp->m_sb.sb_gquotino, &q->qi_grp_default);
-	xfs_qm_fill_state(&state->s_state[PRJQUOTA], mp, q->qi_pquotaip,
-			  mp->m_sb.sb_pquotino, &q->qi_prj_default);
+	error = xfs_qm_fill_state(&state->s_state[USRQUOTA], mp,
+			XFS_DQTYPE_USER);
+	if (error)
+		return error;
+	error = xfs_qm_fill_state(&state->s_state[GRPQUOTA], mp,
+			XFS_DQTYPE_GROUP);
+	if (error)
+		return error;
+	error = xfs_qm_fill_state(&state->s_state[PRJQUOTA], mp,
+			XFS_DQTYPE_PROJ);
+	if (error)
+		return error;
 	return 0;
 }
 



* [PATCH 1/3] xfs: fix C++ compilation errors in xfs_fs.h
  2024-09-02 18:22 ` [PATCHSET 7/8] xfs: various bug fixes for 6.12 Darrick J. Wong
@ 2024-09-02 18:32   ` Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 2/3] xfs: fix FITRIM reporting again Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 3/3] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc Darrick J. Wong
  2 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:32 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: kernel, sam, Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Several people reported C++ compilation errors in xfs_fs.h caused by
constructs that C compilers allow but C++ compilers do not: arithmetic
on a void pointer, and an implicit conversion from void * to a struct
pointer in a return statement.  Fix both of these problems, and hope
there aren't more of these brown paper bags in 2 months when we finally
get these fixes through the process into a released xfsprogs.
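
For readers who don't hit this often, here is a minimal sketch of the two constructs involved, written so that it should build as both C and C++.  The demo_rec structure and demo_next_rec() are made up for illustration and are not the real xfs_getparents_rec interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variable-length record; not the real xfs_getparents_rec. */
struct demo_rec {
	uint32_t	reclen;		/* total length of this record, in bytes */
	uint32_t	value;
};

/*
 * Return the record that follows @rec, or NULL if it would start at or
 * past @end.
 */
static inline struct demo_rec *
demo_next_rec(struct demo_rec *rec, void *end)
{
	/* C++ rejects arithmetic on void *, so step through char * instead. */
	void *next = (char *)rec + rec->reclen;

	if (next >= end)
		return NULL;

	/* C++ has no implicit void * to object pointer conversion. */
	return (struct demo_rec *)next;
}
```

Both casts are no-ops as far as generated C code is concerned; they only spell out what the C compiler was already doing implicitly (the void * arithmetic being a GNU extension).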

NOTE: I am submitting this bugfix over the objections of a former
maintainer, who insists that we should remove this function from the
published userspace ABI instead of fixing the C++ compilation errors.
No deprecation period, no discussion, just a hard drop of an already
provided and correct C function, which would be in contravention of
Linus' rules.  IOWs, removing ABI that has already shipped in a
released kernel requires a careful deprecation period, so I will let
that maintainer run that process.

Reported-by: kernel@mattwhitlock.name
Reported-by: sam@gentoo.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219203
Fixes: 233f4e12bbb2c ("xfs: add parent pointer ioctls")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_fs.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index c85c8077fac3..860284064c5a 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -8,6 +8,7 @@
 
 /*
  * SGI's XFS filesystem's major stuff (constants, structures)
+ * NOTE: This file must be compile-able with C++ compilers.
  */
 
 /*
@@ -930,13 +931,13 @@ static inline struct xfs_getparents_rec *
 xfs_getparents_next_rec(struct xfs_getparents *gp,
 			struct xfs_getparents_rec *gpr)
 {
-	void *next = ((void *)gpr + gpr->gpr_reclen);
+	void *next = ((char *)gpr + gpr->gpr_reclen);
 	void *end = (void *)(uintptr_t)(gp->gp_buffer + gp->gp_bufsize);
 
 	if (next >= end)
 		return NULL;
 
-	return next;
+	return (struct xfs_getparents_rec *)next;
 }
 
 /* Iterate through this file handle's directory parent pointers. */



* [PATCH 2/3] xfs: fix FITRIM reporting again
  2024-09-02 18:22 ` [PATCHSET 7/8] xfs: various bug fixes for 6.12 Darrick J. Wong
  2024-09-02 18:32   ` [PATCH 1/3] xfs: fix C++ compilation errors in xfs_fs.h Darrick J. Wong
@ 2024-09-02 18:33   ` Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 3/3] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc Darrick J. Wong
  2 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:33 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't report FITRIMming more bytes than can possibly exist between the
start of the requested range and the end of the filesystem.
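
The arithmetic is simple enough to sketch in userspace.  The helper name below is hypothetical, and it assumes the caller has already rejected a start offset beyond the end of the filesystem.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp the reported trim length so that start + len never extends past
 * the end of a filesystem that is fs_bytes long.
 *
 * Assumption for this sketch: start <= fs_bytes was validated earlier.
 */
static inline uint64_t
demo_clamp_trim_len(uint64_t fs_bytes, uint64_t start, uint64_t len)
{
	uint64_t	avail;

	assert(start <= fs_bytes);

	/* Bytes that exist between the start offset and the end of the fs. */
	avail = fs_bytes - start;

	return len < avail ? len : avail;
}
```

Clamping against fs_bytes alone, without subtracting the start offset, would let a request that begins partway through the filesystem report a length that runs off the end.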

Fixes: 410e8a18f8e93 ("xfs: don't bother reporting blocks trimmed via FITRIM")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_discard.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index bf1e3f330018..d8c4a5dcca7a 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -707,7 +707,7 @@ xfs_ioc_trim(
 		return last_error;
 
 	range.len = min_t(unsigned long long, range.len,
-			  XFS_FSB_TO_B(mp, max_blocks));
+			  XFS_FSB_TO_B(mp, max_blocks) - range.start);
 	if (copy_to_user(urange, &range, sizeof(range)))
 		return -EFAULT;
 	return 0;



* [PATCH 3/3] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
  2024-09-02 18:22 ` [PATCHSET 7/8] xfs: various bug fixes for 6.12 Darrick J. Wong
  2024-09-02 18:32   ` [PATCH 1/3] xfs: fix C++ compilation errors in xfs_fs.h Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 2/3] xfs: fix FITRIM reporting again Darrick J. Wong
@ 2024-09-02 18:33   ` Darrick J. Wong
  2 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:33 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block.  However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records.  We /should/ be copying keys.

Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)).  However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.
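
To illustrate the size coincidence, here is a small userspace sketch with stand-in types (the demo_* names are hypothetical; the block header is left out and a node block is modelled as an array of keys followed by an array of pointers).  It copies keys and pointers separately, which is what the fixed code does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the on-disk bmbt record, key and pointer. */
struct demo_rec { uint64_t l0, l1; };		/* leaf record: 16 bytes */
struct demo_key { uint64_t startoff; };		/* node key: 8 bytes */
typedef uint64_t demo_ptr;			/* node pointer: 8 bytes */

/* The coincidence that kept the old record copy from overrunning. */
_Static_assert(sizeof(struct demo_rec) ==
	       sizeof(struct demo_key) + sizeof(demo_ptr),
	       "rec size must equal key size plus ptr size");

/*
 * Shrink a node block from old_max to new_max key/ptr pairs.  Both blocks
 * are laid out as [keys x maxrecs][ptrs x maxrecs].
 */
static inline void
demo_shrink_node(uint64_t *dst, unsigned int new_max,
		 const uint64_t *src, unsigned int old_max)
{
	/* First copy the keys, which start at the front of the block. */
	memcpy(dst, src, new_max * sizeof(struct demo_key));

	/* Then copy the pointers, which start after maxrecs keys. */
	memcpy(dst + new_max, src + old_max, new_max * sizeof(demo_ptr));
}
```

With the old record-sized copy, the first memcpy would move twice as many bytes as there are keys, which happens to exactly fill the new block; the pointer copy then overwrites the second half with the right data.  Once a key and a pointer no longer add up to the record size, that stops being harmless.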

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_inode_fork.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 9d11ae015909..622382300904 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -463,15 +463,15 @@ xfs_iroot_realloc(
 	}
 
 	/*
-	 * Only copy the records and pointers if there are any.
+	 * Only copy the keys and pointers if there are any.
 	 */
 	if (new_max > 0) {
 		/*
-		 * First copy the records.
+		 * First copy the keys.
 		 */
-		op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1);
-		np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1);
-		memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t));
+		op = (char *)XFS_BMBT_KEY_ADDR(mp, ifp->if_broot, 1);
+		np = (char *)XFS_BMBT_KEY_ADDR(mp, new_broot, 1);
+		memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t));
 
 		/*
 		 * Then copy the pointers.



* [PATCH 1/2] xfs: replace shouty XFS_BM{BT,DR} macros
  2024-09-02 18:22 ` [PATCHSET v4.2 8/8] xfs: cleanups for inode rooted btree code Darrick J. Wong
@ 2024-09-02 18:33   ` Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 2/2] xfs: standardize the btree maxrecs function parameters Darrick J. Wong
  1 sibling, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:33 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Replace all the shouty bmap btree and bmap disk root macros with actual
functions.

sed \
 -e 's/XFS_BMBT_BLOCK_LEN/xfs_bmbt_block_len/g' \
 -e 's/XFS_BMBT_REC_ADDR/xfs_bmbt_rec_addr/g' \
 -e 's/XFS_BMBT_KEY_ADDR/xfs_bmbt_key_addr/g' \
 -e 's/XFS_BMBT_PTR_ADDR/xfs_bmbt_ptr_addr/g' \
 -e 's/XFS_BMDR_REC_ADDR/xfs_bmdr_rec_addr/g' \
 -e 's/XFS_BMDR_KEY_ADDR/xfs_bmdr_key_addr/g' \
 -e 's/XFS_BMDR_PTR_ADDR/xfs_bmdr_ptr_addr/g' \
 -e 's/XFS_BMAP_BROOT_PTR_ADDR/xfs_bmap_broot_ptr_addr/g' \
 -e 's/XFS_BMAP_BROOT_SPACE_CALC/xfs_bmap_broot_space_calc/g' \
 -e 's/XFS_BMAP_BROOT_SPACE/xfs_bmap_broot_space/g' \
 -e 's/XFS_BMDR_SPACE_CALC/xfs_bmdr_space_calc/g' \
 -e 's/XFS_BMAP_BMDR_SPACE/xfs_bmap_bmdr_space/g' \
 -i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])
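
As a sketch of what one such conversion looks like, here is a macro and its function equivalent side by side, using made-up types (demo_block and demo_key are not the real XFS structures).  The function form gets real parameter types, and the element stride is the size of the key structure itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block header and key; not the real XFS definitions. */
struct demo_block {
	uint32_t	magic;
	uint16_t	level;
	uint16_t	numrecs;
};

struct demo_key {
	uint64_t	startoff;
};

/* Shouty macro form: arguments are untyped and need extra parentheses. */
#define DEMO_KEY_ADDR(block, index) \
	((struct demo_key *) \
		((char *)(block) + \
		 sizeof(struct demo_block) + \
		 ((index) - 1) * sizeof(struct demo_key)))

/* Function form: same address computation, with type-checked arguments. */
static inline struct demo_key *
demo_key_addr(struct demo_block *block, unsigned int index)
{
	return (struct demo_key *)
		((char *)block + sizeof(struct demo_block) +
		 (index - 1) * sizeof(struct demo_key));
}
```

Indexes are 1-based in both forms, matching the convention used by the macros being replaced.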

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_attr_leaf.c  |    8 +-
 fs/xfs/libxfs/xfs_bmap.c       |   40 ++++----
 fs/xfs/libxfs/xfs_bmap_btree.c |   18 ++--
 fs/xfs/libxfs/xfs_bmap_btree.h |  204 +++++++++++++++++++++++++++-------------
 fs/xfs/libxfs/xfs_inode_fork.c |   30 +++---
 fs/xfs/libxfs/xfs_trans_resv.c |    2 
 fs/xfs/scrub/bmap_repair.c     |    2 
 fs/xfs/scrub/inode_repair.c    |   12 +-
 fs/xfs/xfs_bmap_util.c         |    4 -
 9 files changed, 198 insertions(+), 122 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index b9e98950eb3d..6aaec1246c95 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -686,7 +686,7 @@ xfs_attr_shortform_bytesfit(
 		 */
 		if (!dp->i_forkoff && dp->i_df.if_bytes >
 		    xfs_default_attroffset(dp))
-			dsize = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
+			dsize = xfs_bmdr_space_calc(MINDBTPTRS);
 		break;
 	case XFS_DINODE_FMT_BTREE:
 		/*
@@ -700,7 +700,7 @@ xfs_attr_shortform_bytesfit(
 				return 0;
 			return dp->i_forkoff;
 		}
-		dsize = XFS_BMAP_BROOT_SPACE(mp, dp->i_df.if_broot);
+		dsize = xfs_bmap_bmdr_space(dp->i_df.if_broot);
 		break;
 	}
 
@@ -708,11 +708,11 @@ xfs_attr_shortform_bytesfit(
 	 * A data fork btree root must have space for at least
 	 * MINDBTPTRS key/ptr pairs if the data fork is small or empty.
 	 */
-	minforkoff = max_t(int64_t, dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
+	minforkoff = max_t(int64_t, dsize, xfs_bmdr_space_calc(MINDBTPTRS));
 	minforkoff = roundup(minforkoff, 8) >> 3;
 
 	/* attr fork btree root can have at least this many key/ptr pairs */
-	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
+	maxforkoff = XFS_LITINO(mp) - xfs_bmdr_space_calc(MINABTPTRS);
 	maxforkoff = maxforkoff >> 3;	/* rounded down */
 
 	if (offset >= maxforkoff)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 434433ed29dc..00cac756c956 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -79,9 +79,9 @@ xfs_bmap_compute_maxlevels(
 	maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp),
 				whichfork);
 	if (whichfork == XFS_DATA_FORK)
-		sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
+		sz = xfs_bmdr_space_calc(MINDBTPTRS);
 	else
-		sz = XFS_BMDR_SPACE_CALC(MINABTPTRS);
+		sz = xfs_bmdr_space_calc(MINABTPTRS);
 
 	maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
 	minleafrecs = mp->m_bmap_dmnr[0];
@@ -102,8 +102,8 @@ xfs_bmap_compute_attr_offset(
 	struct xfs_mount	*mp)
 {
 	if (mp->m_sb.sb_inodesize == 256)
-		return XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
-	return XFS_BMDR_SPACE_CALC(6 * MINABTPTRS);
+		return XFS_LITINO(mp) - xfs_bmdr_space_calc(MINABTPTRS);
+	return xfs_bmdr_space_calc(6 * MINABTPTRS);
 }
 
 STATIC int				/* error */
@@ -298,7 +298,7 @@ xfs_check_block(
 	prevp = NULL;
 	for( i = 1; i <= xfs_btree_get_numrecs(block); i++) {
 		dmxr = mp->m_bmap_dmxr[0];
-		keyp = XFS_BMBT_KEY_ADDR(mp, block, i);
+		keyp = xfs_bmbt_key_addr(mp, block, i);
 
 		if (prevp) {
 			ASSERT(be64_to_cpu(prevp->br_startoff) <
@@ -310,15 +310,15 @@ xfs_check_block(
 		 * Compare the block numbers to see if there are dups.
 		 */
 		if (root)
-			pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, i, sz);
+			pp = xfs_bmap_broot_ptr_addr(mp, block, i, sz);
 		else
-			pp = XFS_BMBT_PTR_ADDR(mp, block, i, dmxr);
+			pp = xfs_bmbt_ptr_addr(mp, block, i, dmxr);
 
 		for (j = i+1; j <= be16_to_cpu(block->bb_numrecs); j++) {
 			if (root)
-				thispa = XFS_BMAP_BROOT_PTR_ADDR(mp, block, j, sz);
+				thispa = xfs_bmap_broot_ptr_addr(mp, block, j, sz);
 			else
-				thispa = XFS_BMBT_PTR_ADDR(mp, block, j, dmxr);
+				thispa = xfs_bmbt_ptr_addr(mp, block, j, dmxr);
 			if (*thispa == *pp) {
 				xfs_warn(mp, "%s: thispa(%d) == pp(%d) %lld",
 					__func__, j, i,
@@ -373,7 +373,7 @@ xfs_bmap_check_leaf_extents(
 	level = be16_to_cpu(block->bb_level);
 	ASSERT(level > 0);
 	xfs_check_block(block, mp, 1, ifp->if_broot_bytes);
-	pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes);
+	pp = xfs_bmap_broot_ptr_addr(mp, block, 1, ifp->if_broot_bytes);
 	bno = be64_to_cpu(*pp);
 
 	ASSERT(bno != NULLFSBLOCK);
@@ -406,7 +406,7 @@ xfs_bmap_check_leaf_extents(
 		 */
 
 		xfs_check_block(block, mp, 0, 0);
-		pp = XFS_BMBT_PTR_ADDR(mp, block, 1, mp->m_bmap_dmxr[1]);
+		pp = xfs_bmbt_ptr_addr(mp, block, 1, mp->m_bmap_dmxr[1]);
 		bno = be64_to_cpu(*pp);
 		if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, bno))) {
 			xfs_btree_mark_sick(cur);
@@ -446,14 +446,14 @@ xfs_bmap_check_leaf_extents(
 		 * conform with the first entry in this one.
 		 */
 
-		ep = XFS_BMBT_REC_ADDR(mp, block, 1);
+		ep = xfs_bmbt_rec_addr(mp, block, 1);
 		if (i) {
 			ASSERT(xfs_bmbt_disk_get_startoff(&last) +
 			       xfs_bmbt_disk_get_blockcount(&last) <=
 			       xfs_bmbt_disk_get_startoff(ep));
 		}
 		for (j = 1; j < num_recs; j++) {
-			nextp = XFS_BMBT_REC_ADDR(mp, block, j + 1);
+			nextp = xfs_bmbt_rec_addr(mp, block, j + 1);
 			ASSERT(xfs_bmbt_disk_get_startoff(ep) +
 			       xfs_bmbt_disk_get_blockcount(ep) <=
 			       xfs_bmbt_disk_get_startoff(nextp));
@@ -586,7 +586,7 @@ xfs_bmap_btree_to_extents(
 	ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1);
 	ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0) == 1);
 
-	pp = XFS_BMAP_BROOT_PTR_ADDR(mp, rblock, 1, ifp->if_broot_bytes);
+	pp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, ifp->if_broot_bytes);
 	cbno = be64_to_cpu(*pp);
 #ifdef DEBUG
 	if (XFS_IS_CORRUPT(cur->bc_mp, !xfs_verify_fsbno(mp, cbno))) {
@@ -714,7 +714,7 @@ xfs_bmap_extents_to_btree(
 	for_each_xfs_iext(ifp, &icur, &rec) {
 		if (isnullstartblock(rec.br_startblock))
 			continue;
-		arp = XFS_BMBT_REC_ADDR(mp, ablock, 1 + cnt);
+		arp = xfs_bmbt_rec_addr(mp, ablock, 1 + cnt);
 		xfs_bmbt_disk_set_all(arp, &rec);
 		cnt++;
 	}
@@ -724,10 +724,10 @@ xfs_bmap_extents_to_btree(
 	/*
 	 * Fill in the root key and pointer.
 	 */
-	kp = XFS_BMBT_KEY_ADDR(mp, block, 1);
-	arp = XFS_BMBT_REC_ADDR(mp, ablock, 1);
+	kp = xfs_bmbt_key_addr(mp, block, 1);
+	arp = xfs_bmbt_rec_addr(mp, ablock, 1);
 	kp->br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(arp));
-	pp = XFS_BMBT_PTR_ADDR(mp, block, 1, xfs_bmbt_get_maxrecs(cur,
+	pp = xfs_bmbt_ptr_addr(mp, block, 1, xfs_bmbt_get_maxrecs(cur,
 						be16_to_cpu(block->bb_level)));
 	*pp = cpu_to_be64(args.fsbno);
 
@@ -896,7 +896,7 @@ xfs_bmap_add_attrfork_btree(
 
 	mp = ip->i_mount;
 
-	if (XFS_BMAP_BMDR_SPACE(block) <= xfs_inode_data_fork_size(ip))
+	if (xfs_bmap_bmdr_space(block) <= xfs_inode_data_fork_size(ip))
 		*flags |= XFS_ILOG_DBROOT;
 	else {
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, XFS_DATA_FORK);
@@ -1160,7 +1160,7 @@ xfs_iread_bmbt_block(
 	}
 
 	/* Copy records into the incore cache. */
-	frp = XFS_BMBT_REC_ADDR(mp, block, 1);
+	frp = xfs_bmbt_rec_addr(mp, block, 1);
 	for (j = 0; j < num_recs; j++, frp++, ir->loaded++) {
 		struct xfs_bmbt_irec	new;
 		xfs_failaddr_t		fa;
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index d1b06ccde19e..3695b3ad07d4 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -65,10 +65,10 @@ xfs_bmdr_to_bmbt(
 	ASSERT(be16_to_cpu(rblock->bb_level) > 0);
 	rblock->bb_numrecs = dblock->bb_numrecs;
 	dmxr = xfs_bmdr_maxrecs(dblocklen, 0);
-	fkp = XFS_BMDR_KEY_ADDR(dblock, 1);
-	tkp = XFS_BMBT_KEY_ADDR(mp, rblock, 1);
-	fpp = XFS_BMDR_PTR_ADDR(dblock, 1, dmxr);
-	tpp = XFS_BMAP_BROOT_PTR_ADDR(mp, rblock, 1, rblocklen);
+	fkp = xfs_bmdr_key_addr(dblock, 1);
+	tkp = xfs_bmbt_key_addr(mp, rblock, 1);
+	fpp = xfs_bmdr_ptr_addr(dblock, 1, dmxr);
+	tpp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, rblocklen);
 	dmxr = be16_to_cpu(dblock->bb_numrecs);
 	memcpy(tkp, fkp, sizeof(*fkp) * dmxr);
 	memcpy(tpp, fpp, sizeof(*fpp) * dmxr);
@@ -168,10 +168,10 @@ xfs_bmbt_to_bmdr(
 	dblock->bb_level = rblock->bb_level;
 	dblock->bb_numrecs = rblock->bb_numrecs;
 	dmxr = xfs_bmdr_maxrecs(dblocklen, 0);
-	fkp = XFS_BMBT_KEY_ADDR(mp, rblock, 1);
-	tkp = XFS_BMDR_KEY_ADDR(dblock, 1);
-	fpp = XFS_BMAP_BROOT_PTR_ADDR(mp, rblock, 1, rblocklen);
-	tpp = XFS_BMDR_PTR_ADDR(dblock, 1, dmxr);
+	fkp = xfs_bmbt_key_addr(mp, rblock, 1);
+	tkp = xfs_bmdr_key_addr(dblock, 1);
+	fpp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, rblocklen);
+	tpp = xfs_bmdr_ptr_addr(dblock, 1, dmxr);
 	dmxr = be16_to_cpu(dblock->bb_numrecs);
 	memcpy(tkp, fkp, sizeof(*fkp) * dmxr);
 	memcpy(tpp, fpp, sizeof(*fpp) * dmxr);
@@ -651,7 +651,7 @@ xfs_bmbt_maxrecs(
 	int			blocklen,
 	int			leaf)
 {
-	blocklen -= XFS_BMBT_BLOCK_LEN(mp);
+	blocklen -= xfs_bmbt_block_len(mp);
 	return xfs_bmbt_block_maxrecs(blocklen, leaf);
 }
 
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h
index de1b73f1225c..d006798d591b 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.h
+++ b/fs/xfs/libxfs/xfs_bmap_btree.h
@@ -13,70 +13,6 @@ struct xfs_inode;
 struct xfs_trans;
 struct xbtree_ifakeroot;
 
-/*
- * Btree block header size depends on a superblock flag.
- */
-#define XFS_BMBT_BLOCK_LEN(mp) \
-	(xfs_has_crc(((mp))) ? \
-		XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN)
-
-#define XFS_BMBT_REC_ADDR(mp, block, index) \
-	((xfs_bmbt_rec_t *) \
-		((char *)(block) + \
-		 XFS_BMBT_BLOCK_LEN(mp) + \
-		 ((index) - 1) * sizeof(xfs_bmbt_rec_t)))
-
-#define XFS_BMBT_KEY_ADDR(mp, block, index) \
-	((xfs_bmbt_key_t *) \
-		((char *)(block) + \
-		 XFS_BMBT_BLOCK_LEN(mp) + \
-		 ((index) - 1) * sizeof(xfs_bmbt_key_t)))
-
-#define XFS_BMBT_PTR_ADDR(mp, block, index, maxrecs) \
-	((xfs_bmbt_ptr_t *) \
-		((char *)(block) + \
-		 XFS_BMBT_BLOCK_LEN(mp) + \
-		 (maxrecs) * sizeof(xfs_bmbt_key_t) + \
-		 ((index) - 1) * sizeof(xfs_bmbt_ptr_t)))
-
-#define XFS_BMDR_REC_ADDR(block, index) \
-	((xfs_bmdr_rec_t *) \
-		((char *)(block) + \
-		 sizeof(struct xfs_bmdr_block) + \
-	         ((index) - 1) * sizeof(xfs_bmdr_rec_t)))
-
-#define XFS_BMDR_KEY_ADDR(block, index) \
-	((xfs_bmdr_key_t *) \
-		((char *)(block) + \
-		 sizeof(struct xfs_bmdr_block) + \
-		 ((index) - 1) * sizeof(xfs_bmdr_key_t)))
-
-#define XFS_BMDR_PTR_ADDR(block, index, maxrecs) \
-	((xfs_bmdr_ptr_t *) \
-		((char *)(block) + \
-		 sizeof(struct xfs_bmdr_block) + \
-		 (maxrecs) * sizeof(xfs_bmdr_key_t) + \
-		 ((index) - 1) * sizeof(xfs_bmdr_ptr_t)))
-
-/*
- * These are to be used when we know the size of the block and
- * we don't have a cursor.
- */
-#define XFS_BMAP_BROOT_PTR_ADDR(mp, bb, i, sz) \
-	XFS_BMBT_PTR_ADDR(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0))
-
-#define XFS_BMAP_BROOT_SPACE_CALC(mp, nrecs) \
-	(int)(XFS_BMBT_BLOCK_LEN(mp) + \
-	       ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
-
-#define XFS_BMAP_BROOT_SPACE(mp, bb) \
-	(XFS_BMAP_BROOT_SPACE_CALC(mp, be16_to_cpu((bb)->bb_numrecs)))
-#define XFS_BMDR_SPACE_CALC(nrecs) \
-	(int)(sizeof(xfs_bmdr_block_t) + \
-	       ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
-#define XFS_BMAP_BMDR_SPACE(bb) \
-	(XFS_BMDR_SPACE_CALC(be16_to_cpu((bb)->bb_numrecs)))
-
 /*
  * Maximum number of bmap btree levels.
  */
@@ -121,4 +57,144 @@ void xfs_bmbt_destroy_cur_cache(void);
 void xfs_bmbt_init_block(struct xfs_inode *ip, struct xfs_btree_block *buf,
 		struct xfs_buf *bp, __u16 level, __u16 numrecs);
 
+/*
+ * Btree block header size depends on a superblock flag.
+ */
+static inline size_t
+xfs_bmbt_block_len(struct xfs_mount *mp)
+{
+	return xfs_has_crc(mp) ?
+			XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN;
+}
+
+/* Addresses of keys, pointers, and records within an incore bmbt block. */
+
+static inline struct xfs_bmbt_rec *
+xfs_bmbt_rec_addr(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_bmbt_rec *)
+		((char *)block + xfs_bmbt_block_len(mp) +
+		 (index - 1) * sizeof(struct xfs_bmbt_rec));
+}
+
+static inline struct xfs_bmbt_key *
+xfs_bmbt_key_addr(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_bmbt_key *)
+		((char *)block + xfs_bmbt_block_len(mp) +
+		 (index - 1) * sizeof(struct xfs_bmbt_key));
+}
+
+static inline xfs_bmbt_ptr_t *
+xfs_bmbt_ptr_addr(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*block,
+	unsigned int		index,
+	unsigned int		maxrecs)
+{
+	return (xfs_bmbt_ptr_t *)
+		((char *)block + xfs_bmbt_block_len(mp) +
+		 maxrecs * sizeof(struct xfs_bmbt_key) +
+		 (index - 1) * sizeof(xfs_bmbt_ptr_t));
+}
+
+/* Addresses of keys, pointers, and records within an ondisk bmbt block. */
+
+static inline struct xfs_bmbt_rec *
+xfs_bmdr_rec_addr(
+	struct xfs_bmdr_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_bmbt_rec *)
+		((char *)(block + 1) +
+		 (index - 1) * sizeof(struct xfs_bmbt_rec));
+}
+
+static inline struct xfs_bmbt_key *
+xfs_bmdr_key_addr(
+	struct xfs_bmdr_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_bmbt_key *)
+		((char *)(block + 1) +
+		 (index - 1) * sizeof(struct xfs_bmbt_key));
+}
+
+static inline xfs_bmbt_ptr_t *
+xfs_bmdr_ptr_addr(
+	struct xfs_bmdr_block	*block,
+	unsigned int		index,
+	unsigned int		maxrecs)
+{
+	return (xfs_bmbt_ptr_t *)
+		((char *)(block + 1) +
+		 maxrecs * sizeof(struct xfs_bmbt_key) +
+		 (index - 1) * sizeof(xfs_bmbt_ptr_t));
+}
+
+/*
+ * Address of pointers within the incore btree root.
+ *
+ * These are to be used when we know the size of the block and
+ * we don't have a cursor.
+ */
+static inline xfs_bmbt_ptr_t *
+xfs_bmap_broot_ptr_addr(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*bb,
+	unsigned int		i,
+	unsigned int		sz)
+{
+	return xfs_bmbt_ptr_addr(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0));
+}
+
+/*
+ * Compute the space required for the incore btree root containing the given
+ * number of records.
+ */
+static inline size_t
+xfs_bmap_broot_space_calc(
+	struct xfs_mount	*mp,
+	unsigned int		nrecs)
+{
+	return xfs_bmbt_block_len(mp) +
+	       (nrecs * (sizeof(struct xfs_bmbt_key) + sizeof(xfs_bmbt_ptr_t)));
+}
+
+/*
+ * Compute the space required for the incore btree root given the ondisk
+ * btree root block.
+ */
+static inline size_t
+xfs_bmap_broot_space(
+	struct xfs_mount	*mp,
+	struct xfs_bmdr_block	*bb)
+{
+	return xfs_bmap_broot_space_calc(mp, be16_to_cpu(bb->bb_numrecs));
+}
+
+/* Compute the space required for the ondisk root block. */
+static inline size_t
+xfs_bmdr_space_calc(unsigned int nrecs)
+{
+	return sizeof(struct xfs_bmdr_block) +
+	       (nrecs * (sizeof(struct xfs_bmbt_key) + sizeof(xfs_bmbt_ptr_t)));
+}
+
+/*
+ * Compute the space required for the ondisk root block given an incore root
+ * block.
+ */
+static inline size_t
+xfs_bmap_bmdr_space(struct xfs_btree_block *bb)
+{
+	return xfs_bmdr_space_calc(be16_to_cpu(bb->bb_numrecs));
+}
+
 #endif	/* __XFS_BMAP_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 622382300904..973e027e3d88 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -185,7 +185,7 @@ xfs_iformat_btree(
 
 	ifp = xfs_ifork_ptr(ip, whichfork);
 	dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
-	size = XFS_BMAP_BROOT_SPACE(mp, dfp);
+	size = xfs_bmap_broot_space(mp, dfp);
 	nrecs = be16_to_cpu(dfp->bb_numrecs);
 	level = be16_to_cpu(dfp->bb_level);
 
@@ -198,7 +198,7 @@ xfs_iformat_btree(
 	 */
 	if (unlikely(ifp->if_nextents <= XFS_IFORK_MAXEXT(ip, whichfork) ||
 		     nrecs == 0 ||
-		     XFS_BMDR_SPACE_CALC(nrecs) >
+		     xfs_bmdr_space_calc(nrecs) >
 					XFS_DFORK_SIZE(dip, mp, whichfork) ||
 		     ifp->if_nextents > ip->i_nblocks) ||
 		     level == 0 || level > XFS_BM_MAXLEVELS(mp, whichfork)) {
@@ -409,7 +409,7 @@ xfs_iroot_realloc(
 		 * allocate it now and get out.
 		 */
 		if (ifp->if_broot_bytes == 0) {
-			new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
+			new_size = xfs_bmap_broot_space_calc(mp, rec_diff);
 			ifp->if_broot = kmalloc(new_size,
 						GFP_KERNEL | __GFP_NOFAIL);
 			ifp->if_broot_bytes = (int)new_size;
@@ -424,15 +424,15 @@ xfs_iroot_realloc(
 		 */
 		cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
 		new_max = cur_max + rec_diff;
-		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
+		new_size = xfs_bmap_broot_space_calc(mp, new_max);
 		ifp->if_broot = krealloc(ifp->if_broot, new_size,
 					 GFP_KERNEL | __GFP_NOFAIL);
-		op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     ifp->if_broot_bytes);
-		np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+		np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     (int)new_size);
 		ifp->if_broot_bytes = (int)new_size;
-		ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <=
+		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 			xfs_inode_fork_size(ip, whichfork));
 		memmove(np, op, cur_max * (uint)sizeof(xfs_fsblock_t));
 		return;
@@ -448,7 +448,7 @@ xfs_iroot_realloc(
 	new_max = cur_max + rec_diff;
 	ASSERT(new_max >= 0);
 	if (new_max > 0)
-		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
+		new_size = xfs_bmap_broot_space_calc(mp, new_max);
 	else
 		new_size = 0;
 	if (new_size > 0) {
@@ -457,7 +457,7 @@ xfs_iroot_realloc(
 		 * First copy over the btree block header.
 		 */
 		memcpy(new_broot, ifp->if_broot,
-			XFS_BMBT_BLOCK_LEN(ip->i_mount));
+			xfs_bmbt_block_len(ip->i_mount));
 	} else {
 		new_broot = NULL;
 	}
@@ -469,16 +469,16 @@ xfs_iroot_realloc(
 		/*
 		 * First copy the keys.
 		 */
-		op = (char *)XFS_BMBT_KEY_ADDR(mp, ifp->if_broot, 1);
-		np = (char *)XFS_BMBT_KEY_ADDR(mp, new_broot, 1);
+		op = (char *)xfs_bmbt_key_addr(mp, ifp->if_broot, 1);
+		np = (char *)xfs_bmbt_key_addr(mp, new_broot, 1);
 		memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t));
 
 		/*
 		 * Then copy the pointers.
 		 */
-		op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
+		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     ifp->if_broot_bytes);
-		np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, new_broot, 1,
+		np = (char *)xfs_bmap_broot_ptr_addr(mp, new_broot, 1,
 						     (int)new_size);
 		memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
 	}
@@ -486,7 +486,7 @@ xfs_iroot_realloc(
 	ifp->if_broot = new_broot;
 	ifp->if_broot_bytes = (int)new_size;
 	if (ifp->if_broot)
-		ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <=
+		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 			xfs_inode_fork_size(ip, whichfork));
 	return;
 }
@@ -655,7 +655,7 @@ xfs_iflush_fork(
 		if ((iip->ili_fields & brootflag[whichfork]) &&
 		    (ifp->if_broot_bytes > 0)) {
 			ASSERT(ifp->if_broot != NULL);
-			ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <=
+			ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 			        xfs_inode_fork_size(ip, whichfork));
 			xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes,
 				(xfs_bmdr_block_t *)cp,
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 2e6d7bb3b5a2..1a7f95bcf069 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -130,7 +130,7 @@ xfs_calc_inode_res(
 		(4 * sizeof(struct xlog_op_header) +
 		 sizeof(struct xfs_inode_log_format) +
 		 mp->m_sb.sb_inodesize +
-		 2 * XFS_BMBT_BLOCK_LEN(mp));
+		 2 * xfs_bmbt_block_len(mp));
 }
 
 /*
diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c
index 1e656fab5e41..49dc38acc66b 100644
--- a/fs/xfs/scrub/bmap_repair.c
+++ b/fs/xfs/scrub/bmap_repair.c
@@ -480,7 +480,7 @@ xrep_bmap_iroot_size(
 {
 	ASSERT(level > 0);
 
-	return XFS_BMAP_BROOT_SPACE_CALC(cur->bc_mp, nr_this_level);
+	return xfs_bmap_broot_space_calc(cur->bc_mp, nr_this_level);
 }
 
 /* Update the inode counters. */
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index daf9f1ee7c2c..3e45b9b72312 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -846,7 +846,7 @@ xrep_dinode_bad_bmbt_fork(
 	nrecs = be16_to_cpu(dfp->bb_numrecs);
 	level = be16_to_cpu(dfp->bb_level);
 
-	if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size)
+	if (nrecs == 0 || xfs_bmdr_space_calc(nrecs) > dfork_size)
 		return true;
 	if (level == 0 || level >= XFS_BM_MAXLEVELS(sc->mp, whichfork))
 		return true;
@@ -858,12 +858,12 @@ xrep_dinode_bad_bmbt_fork(
 		xfs_fileoff_t		fileoff;
 		xfs_fsblock_t		fsbno;
 
-		fkp = XFS_BMDR_KEY_ADDR(dfp, i);
+		fkp = xfs_bmdr_key_addr(dfp, i);
 		fileoff = be64_to_cpu(fkp->br_startoff);
 		if (!xfs_verify_fileoff(sc->mp, fileoff))
 			return true;
 
-		fpp = XFS_BMDR_PTR_ADDR(dfp, i, dmxr);
+		fpp = xfs_bmdr_ptr_addr(dfp, i, dmxr);
 		fsbno = be64_to_cpu(*fpp);
 		if (!xfs_verify_fsbno(sc->mp, fsbno))
 			return true;
@@ -1121,7 +1121,7 @@ xrep_dinode_ensure_forkoff(
 	struct xfs_bmdr_block	*bmdr;
 	struct xfs_scrub	*sc = ri->sc;
 	xfs_extnum_t		attr_extents, data_extents;
-	size_t			bmdr_minsz = XFS_BMDR_SPACE_CALC(1);
+	size_t			bmdr_minsz = xfs_bmdr_space_calc(1);
 	unsigned int		lit_sz = XFS_LITINO(sc->mp);
 	unsigned int		afork_min, dfork_min;
 
@@ -1173,7 +1173,7 @@ xrep_dinode_ensure_forkoff(
 	case XFS_DINODE_FMT_BTREE:
 		/* Must have space for btree header and key/pointers. */
 		bmdr = XFS_DFORK_PTR(dip, XFS_ATTR_FORK);
-		afork_min = XFS_BMAP_BROOT_SPACE(sc->mp, bmdr);
+		afork_min = xfs_bmap_broot_space(sc->mp, bmdr);
 		break;
 	default:
 		/* We should never see any other formats. */
@@ -1223,7 +1223,7 @@ xrep_dinode_ensure_forkoff(
 	case XFS_DINODE_FMT_BTREE:
 		/* Must have space for btree header and key/pointers. */
 		bmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
-		dfork_min = XFS_BMAP_BROOT_SPACE(sc->mp, bmdr);
+		dfork_min = xfs_bmap_broot_space(sc->mp, bmdr);
 		break;
 	default:
 		dfork_min = 0;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index fe2e2c930975..a2c8f0dd85d0 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1184,7 +1184,7 @@ xfs_swap_extents_check_format(
 	 */
 	if (tifp->if_format == XFS_DINODE_FMT_BTREE) {
 		if (xfs_inode_has_attr_fork(ip) &&
-		    XFS_BMAP_BMDR_SPACE(tifp->if_broot) > xfs_inode_fork_boff(ip))
+		    xfs_bmap_bmdr_space(tifp->if_broot) > xfs_inode_fork_boff(ip))
 			return -EINVAL;
 		if (tifp->if_nextents <= XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
 			return -EINVAL;
@@ -1193,7 +1193,7 @@ xfs_swap_extents_check_format(
 	/* Reciprocal target->temp btree format checks */
 	if (ifp->if_format == XFS_DINODE_FMT_BTREE) {
 		if (xfs_inode_has_attr_fork(tip) &&
-		    XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > xfs_inode_fork_boff(tip))
+		    xfs_bmap_bmdr_space(ip->i_df.if_broot) > xfs_inode_fork_boff(tip))
 			return -EINVAL;
 		if (ifp->if_nextents <= XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
 			return -EINVAL;
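
For readers following the conversion above, here is a minimal sketch of the
layout arithmetic that the new key and pointer helpers encode: keys follow the
block header, and pointers follow the space reserved for maxrecs keys.  The
demo_* names and the 72-byte header size are assumptions made for
illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real sizes come from the XFS ondisk headers. */
#define DEMO_HDR_LEN	72u			/* assumed CRC-enabled long block header */
#define DEMO_KEY_LEN	sizeof(uint64_t)	/* one big-endian file offset */
#define DEMO_PTR_LEN	sizeof(uint64_t)	/* one big-endian block number */

/* Byte offset of the index'th key; keys are numbered from 1. */
static inline size_t
demo_key_offset(unsigned int index)
{
	assert(index >= 1);
	return DEMO_HDR_LEN + (index - 1) * DEMO_KEY_LEN;
}

/* Byte offset of the index'th pointer; pointers start after maxrecs keys. */
static inline size_t
demo_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	assert(index >= 1 && index <= maxrecs);
	return DEMO_HDR_LEN + maxrecs * DEMO_KEY_LEN +
	       (index - 1) * DEMO_PTR_LEN;
}
```

With those assumed sizes, the third key sits at byte 88, and the first pointer
of a block sized for 10 records sits at byte 152.  Note that the stride is the
size of the key itself, not the size of a pointer to it.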


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/2] xfs: standardize the btree maxrecs function parameters
  2024-09-02 18:22 ` [PATCHSET v4.2 8/8] xfs: cleanups for inode rooted btree code Darrick J. Wong
  2024-09-02 18:33   ` [PATCH 1/2] xfs: replace shouty XFS_BM{BT,DR} macros Darrick J. Wong
@ 2024-09-02 18:33   ` Darrick J. Wong
  1 sibling, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:33 UTC (permalink / raw)
  To: chandanbabu, djwong; +Cc: Christoph Hellwig, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions.  This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_alloc_btree.c    |    6 +++---
 fs/xfs/libxfs/xfs_alloc_btree.h    |    3 ++-
 fs/xfs/libxfs/xfs_bmap.c           |    2 +-
 fs/xfs/libxfs/xfs_bmap_btree.c     |    6 +++---
 fs/xfs/libxfs/xfs_bmap_btree.h     |    5 +++--
 fs/xfs/libxfs/xfs_ialloc.c         |    4 ++--
 fs/xfs/libxfs/xfs_ialloc_btree.c   |    6 +++---
 fs/xfs/libxfs/xfs_ialloc_btree.h   |    3 ++-
 fs/xfs/libxfs/xfs_inode_fork.c     |    4 ++--
 fs/xfs/libxfs/xfs_refcount_btree.c |    5 +++--
 fs/xfs/libxfs/xfs_refcount_btree.h |    3 ++-
 fs/xfs/libxfs/xfs_rmap_btree.c     |    7 ++++---
 fs/xfs/libxfs/xfs_rmap_btree.h     |    3 ++-
 fs/xfs/libxfs/xfs_sb.c             |   16 ++++++++--------
 14 files changed, 40 insertions(+), 33 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 585e98e87ef9..aada676eee51 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -569,11 +569,11 @@ xfs_allocbt_block_maxrecs(
 /*
  * Calculate number of records in an alloc btree block.
  */
-int
+unsigned int
 xfs_allocbt_maxrecs(
 	struct xfs_mount	*mp,
-	int			blocklen,
-	int			leaf)
+	unsigned int		blocklen,
+	bool			leaf)
 {
 	blocklen -= XFS_ALLOC_BLOCK_LEN(mp);
 	return xfs_allocbt_block_maxrecs(blocklen, leaf);
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
index 155b47f231ab..12647f9aaa6d 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.h
+++ b/fs/xfs/libxfs/xfs_alloc_btree.h
@@ -53,7 +53,8 @@ struct xfs_btree_cur *xfs_bnobt_init_cursor(struct xfs_mount *mp,
 struct xfs_btree_cur *xfs_cntbt_init_cursor(struct xfs_mount *mp,
 		struct xfs_trans *tp, struct xfs_buf *bp,
 		struct xfs_perag *pag);
-extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
+unsigned int xfs_allocbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
+		bool leaf);
 extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 00cac756c956..28473b6a95cc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -584,7 +584,7 @@ xfs_bmap_btree_to_extents(
 	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
 	ASSERT(be16_to_cpu(rblock->bb_level) == 1);
 	ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1);
-	ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0) == 1);
+	ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false) == 1);
 
 	pp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, ifp->if_broot_bytes);
 	cbno = be64_to_cpu(*pp);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 3695b3ad07d4..3464be771f95 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -645,11 +645,11 @@ xfs_bmbt_commit_staged_btree(
 /*
  * Calculate number of records in a bmap btree block.
  */
-int
+unsigned int
 xfs_bmbt_maxrecs(
 	struct xfs_mount	*mp,
-	int			blocklen,
-	int			leaf)
+	unsigned int		blocklen,
+	bool			leaf)
 {
 	blocklen -= xfs_bmbt_block_len(mp);
 	return xfs_bmbt_block_maxrecs(blocklen, leaf);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h
index d006798d591b..49a3bae3f6ec 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.h
+++ b/fs/xfs/libxfs/xfs_bmap_btree.h
@@ -35,7 +35,8 @@ extern void xfs_bmbt_to_bmdr(struct xfs_mount *, struct xfs_btree_block *, int,
 
 extern int xfs_bmbt_get_maxrecs(struct xfs_btree_cur *, int level);
 extern int xfs_bmdr_maxrecs(int blocklen, int leaf);
-extern int xfs_bmbt_maxrecs(struct xfs_mount *, int blocklen, int leaf);
+unsigned int xfs_bmbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
+		bool leaf);
 
 extern int xfs_bmbt_change_owner(struct xfs_trans *tp, struct xfs_inode *ip,
 				 int whichfork, xfs_ino_t new_owner,
@@ -151,7 +152,7 @@ xfs_bmap_broot_ptr_addr(
 	unsigned int		i,
 	unsigned int		sz)
 {
-	return xfs_bmbt_ptr_addr(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0));
+	return xfs_bmbt_ptr_addr(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, false));
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index fc70601e8d8e..20bb5ce38134 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2948,8 +2948,8 @@ xfs_ialloc_setup_geometry(
 
 	/* Compute inode btree geometry. */
 	igeo->agino_log = sbp->sb_inopblog + sbp->sb_agblklog;
-	igeo->inobt_mxr[0] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 1);
-	igeo->inobt_mxr[1] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 0);
+	igeo->inobt_mxr[0] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, true);
+	igeo->inobt_mxr[1] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, false);
 	igeo->inobt_mnr[0] = igeo->inobt_mxr[0] / 2;
 	igeo->inobt_mnr[1] = igeo->inobt_mxr[1] / 2;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 797d5b5f7b72..401b42d52af6 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -572,11 +572,11 @@ xfs_inobt_block_maxrecs(
 /*
  * Calculate number of records in an inobt btree block.
  */
-int
+unsigned int
 xfs_inobt_maxrecs(
 	struct xfs_mount	*mp,
-	int			blocklen,
-	int			leaf)
+	unsigned int		blocklen,
+	bool			leaf)
 {
 	blocklen -= XFS_INOBT_BLOCK_LEN(mp);
 	return xfs_inobt_block_maxrecs(blocklen, leaf);
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index 6472ec1ecbb4..300edf5bc009 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -50,7 +50,8 @@ struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_perag *pag,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
 struct xfs_btree_cur *xfs_finobt_init_cursor(struct xfs_perag *pag,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
-extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
+unsigned int xfs_inobt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
+		bool leaf);
 
 /* ir_holemask to inode allocation bitmap conversion */
 uint64_t xfs_inobt_irec_to_allocmask(const struct xfs_inobt_rec_incore *irec);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 973e027e3d88..1158ca48626b 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -422,7 +422,7 @@ xfs_iroot_realloc(
 		 * location.  The records don't change location because
 		 * they are kept butted up against the btree block header.
 		 */
-		cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
+		cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false);
 		new_max = cur_max + rec_diff;
 		new_size = xfs_bmap_broot_space_calc(mp, new_max);
 		ifp->if_broot = krealloc(ifp->if_broot, new_size,
@@ -444,7 +444,7 @@ xfs_iroot_realloc(
 	 * records, just get rid of the root and clear the status bit.
 	 */
 	ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0));
-	cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0);
+	cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false);
 	new_max = cur_max + rec_diff;
 	ASSERT(new_max >= 0);
 	if (new_max > 0)
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index cb3b1d42ae9a..795928d1a66d 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -417,9 +417,10 @@ xfs_refcountbt_block_maxrecs(
 /*
  * Calculate the number of records in a refcount btree block.
  */
-int
+unsigned int
 xfs_refcountbt_maxrecs(
-	int			blocklen,
+	struct xfs_mount	*mp,
+	unsigned int		blocklen,
 	bool			leaf)
 {
 	blocklen -= XFS_REFCOUNT_BLOCK_LEN;
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
index 1e0ab25f6c68..beb93bef6a81 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.h
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -48,7 +48,8 @@ struct xbtree_afakeroot;
 extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
 		struct xfs_trans *tp, struct xfs_buf *agbp,
 		struct xfs_perag *pag);
-extern int xfs_refcountbt_maxrecs(int blocklen, bool leaf);
+unsigned int xfs_refcountbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
+		bool leaf);
 extern void xfs_refcountbt_compute_maxlevels(struct xfs_mount *mp);
 
 extern xfs_extlen_t xfs_refcountbt_calc_size(struct xfs_mount *mp,
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 56fd6c4bd8b4..ac2f1f499b76 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -731,10 +731,11 @@ xfs_rmapbt_block_maxrecs(
 /*
  * Calculate number of records in an rmap btree block.
  */
-int
+unsigned int
 xfs_rmapbt_maxrecs(
-	int			blocklen,
-	int			leaf)
+	struct xfs_mount	*mp,
+	unsigned int		blocklen,
+	bool			leaf)
 {
 	blocklen -= XFS_RMAP_BLOCK_LEN;
 	return xfs_rmapbt_block_maxrecs(blocklen, leaf);
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index eb90d89e8086..119b1567cd0e 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -47,7 +47,8 @@ struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
 				struct xfs_perag *pag);
 void xfs_rmapbt_commit_staged_btree(struct xfs_btree_cur *cur,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
-int xfs_rmapbt_maxrecs(int blocklen, int leaf);
+unsigned int xfs_rmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
+		bool leaf);
 extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
 
 extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index a6fa9aedb28b..d95409f3cba6 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1000,23 +1000,23 @@ xfs_sb_mount_common(
 	mp->m_blockwmask = mp->m_blockwsize - 1;
 	xfs_mount_sb_set_rextsize(mp, sbp);
 
-	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1);
-	mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0);
+	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, true);
+	mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, false);
 	mp->m_alloc_mnr[0] = mp->m_alloc_mxr[0] / 2;
 	mp->m_alloc_mnr[1] = mp->m_alloc_mxr[1] / 2;
 
-	mp->m_bmap_dmxr[0] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 1);
-	mp->m_bmap_dmxr[1] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 0);
+	mp->m_bmap_dmxr[0] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, true);
+	mp->m_bmap_dmxr[1] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, false);
 	mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
 	mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
 
-	mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(sbp->sb_blocksize, 1);
-	mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(sbp->sb_blocksize, 0);
+	mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, true);
+	mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, false);
 	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
 	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
 
-	mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(sbp->sb_blocksize, true);
-	mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(sbp->sb_blocksize, false);
+	mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true);
+	mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false);
 	mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
 	mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2;
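
As a rough illustration of the calling convention this patch settles on
(mount, block length, and leaf flag in; unsigned record count out), the sketch
below models a bmap-style maxrecs calculation.  The demo_* names, the header
and record sizes, and the early return for undersized blocks are assumptions
for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_mount. */
struct demo_mount {
	bool	has_crc;	/* models xfs_has_crc(mp) */
};

#define DEMO_HDR_LEN		24u	/* assumed long block header, no CRC */
#define DEMO_HDR_CRC_LEN	72u	/* assumed long block header, with CRC */
#define DEMO_REC_LEN		16u	/* assumed leaf record size */
#define DEMO_KEY_LEN		8u	/* assumed node key size */
#define DEMO_PTR_LEN		8u	/* assumed node pointer size */

/* Number of records (leaf) or key/pointer pairs (node) that fit in a block. */
static inline unsigned int
demo_bt_maxrecs(struct demo_mount *mp, unsigned int blocklen, bool leaf)
{
	unsigned int	hdr;

	assert(mp != NULL);
	hdr = mp->has_crc ? DEMO_HDR_CRC_LEN : DEMO_HDR_LEN;

	/* blocklen is unsigned, so guard the subtraction against wrapping. */
	if (blocklen < hdr)
		return 0;
	blocklen -= hdr;

	if (leaf)
		return blocklen / DEMO_REC_LEN;
	return blocklen / (DEMO_KEY_LEN + DEMO_PTR_LEN);
}
```

Under these assumptions a 4096-byte block holds 251 entries with CRCs enabled
and 254 without.  The wraparound guard is specific to this sketch; it is there
because an unsigned blocklen cannot go negative.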
 


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/1] xfs: introduce new file range commit ioctls
  2024-09-02 18:23   ` [PATCH 1/1] xfs: introduce new file range commit ioctls Darrick J. Wong
@ 2024-09-03  7:52     ` Christian Brauner
  2024-10-25 21:23       ` Darrick J. Wong
  0 siblings, 1 reply; 53+ messages in thread
From: Christian Brauner @ 2024-09-03  7:52 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandanbabu, Jeff Layton, Christoph Hellwig, linux-fsdevel,
	linux-xfs

On Mon, Sep 02, 2024 at 11:23:07AM GMT, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This patch introduces two more new ioctls to manage atomic updates to
> file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE.  The
> commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE
> does, but with the additional requirement that file2 cannot have changed
> since some sampling point.  The start-commit ioctl performs the sampling
> of file attributes.
> 
> Note: This patch currently samples i_ctime during START_COMMIT and
> checks that it hasn't changed during COMMIT_RANGE.  This isn't entirely
> safe in kernels prior to 6.12 because ctime had only coarse
> granularity, so very fast updates could collide with a COMMIT_RANGE.
> With the multi-granularity ctime introduced by Jeff Layton, it's now
> possible to update ctime such that this does not happen.
> 
> It is critical, then, that this patch not be backported to any
> kernel that does not support fine-grained file change timestamps.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Acked-by: Jeff Layton <jlayton@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_fs.h |   26 +++++++++
>  fs/xfs/xfs_exchrange.c |  143 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_exchrange.h |   16 +++++
>  fs/xfs/xfs_ioctl.c     |    4 +
>  fs/xfs/xfs_trace.h     |   57 +++++++++++++++++++
>  5 files changed, 243 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 454b63ef7201..c85c8077fac3 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -825,6 +825,30 @@ struct xfs_exchange_range {
>  	__u64		flags;		/* see XFS_EXCHANGE_RANGE_* below */
>  };
>  
> +/*
> + * Using the same definition of file2 as struct xfs_exchange_range, commit the
> + * contents of file1 into file2 if file2 has the same inode number, mtime, and
> + * ctime as the arguments provided to the call.  The old contents of file2 will
> + * be moved to file1.
> + *
> + * Returns -EBUSY if there isn't an exact match for the file2 fields.
> + *
> + * Filesystems must be able to restart and complete the operation even after
> + * the system goes down.
> + */
> +struct xfs_commit_range {
> +	__s32		file1_fd;
> +	__u32		pad;		/* must be zeroes */
> +	__u64		file1_offset;	/* file1 offset, bytes */
> +	__u64		file2_offset;	/* file2 offset, bytes */
> +	__u64		length;		/* bytes to exchange */
> +
> +	__u64		flags;		/* see XFS_EXCHANGE_RANGE_* below */
> +
> +	/* opaque file2 metadata for freshness checks */
> +	__u64		file2_freshness[6];
> +};
> +
>  /*
>   * Exchange file data all the way to the ends of both files, and then exchange
>   * the file sizes.  This flag can be used to replace a file's contents with a
> @@ -997,6 +1021,8 @@ struct xfs_getparents_by_handle {
>  #define XFS_IOC_BULKSTAT	     _IOR ('X', 127, struct xfs_bulkstat_req)
>  #define XFS_IOC_INUMBERS	     _IOR ('X', 128, struct xfs_inumbers_req)
>  #define XFS_IOC_EXCHANGE_RANGE	     _IOW ('X', 129, struct xfs_exchange_range)
> +#define XFS_IOC_START_COMMIT	     _IOR ('X', 130, struct xfs_commit_range)
> +#define XFS_IOC_COMMIT_RANGE	     _IOW ('X', 131, struct xfs_commit_range)
>  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
>  
>  
> diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
> index c8a655c92c92..d0889190ab7f 100644
> --- a/fs/xfs/xfs_exchrange.c
> +++ b/fs/xfs/xfs_exchrange.c
> @@ -72,6 +72,34 @@ xfs_exchrange_estimate(
>  	return error;
>  }
>  
> +/*
> + * Check that file2's metadata agree with the snapshot that we took for the
> + * range commit request.
> + *
> + * This should be called after the filesystem has locked /all/ inode metadata
> + * against modification.
> + */
> +STATIC int
> +xfs_exchrange_check_freshness(
> +	const struct xfs_exchrange	*fxr,
> +	struct xfs_inode		*ip2)
> +{
> +	struct inode			*inode2 = VFS_I(ip2);
> +	struct timespec64		ctime = inode_get_ctime(inode2);
> +	struct timespec64		mtime = inode_get_mtime(inode2);
> +
> +	trace_xfs_exchrange_freshness(fxr, ip2);
> +
> +	/* Check that file2 hasn't otherwise been modified. */
> +	if (fxr->file2_ino != ip2->i_ino ||
> +	    fxr->file2_gen != inode2->i_generation ||
> +	    !timespec64_equal(&fxr->file2_ctime, &ctime) ||
> +	    !timespec64_equal(&fxr->file2_mtime, &mtime))
> +		return -EBUSY;
> +
> +	return 0;
> +}
> +
>  #define QRETRY_IP1	(0x1)
>  #define QRETRY_IP2	(0x2)
>  
> @@ -607,6 +635,12 @@ xfs_exchrange_prep(
>  	if (error || fxr->length == 0)
>  		return error;
>  
> +	if (fxr->flags & __XFS_EXCHANGE_RANGE_CHECK_FRESH2) {
> +		error = xfs_exchrange_check_freshness(fxr, ip2);
> +		if (error)
> +			return error;
> +	}
> +
>  	/* Attach dquots to both inodes before changing block maps. */
>  	error = xfs_qm_dqattach(ip2);
>  	if (error)
> @@ -719,7 +753,8 @@ xfs_exchange_range(
>  	if (fxr->file1->f_path.mnt != fxr->file2->f_path.mnt)
>  		return -EXDEV;
>  
> -	if (fxr->flags & ~XFS_EXCHANGE_RANGE_ALL_FLAGS)
> +	if (fxr->flags & ~(XFS_EXCHANGE_RANGE_ALL_FLAGS |
> +			 __XFS_EXCHANGE_RANGE_CHECK_FRESH2))
>  		return -EINVAL;
>  
>  	/* Userspace requests only honored for regular files. */
> @@ -802,3 +837,109 @@ xfs_ioc_exchange_range(
>  	fdput(file1);
>  	return error;
>  }
> +
> +/* Opaque freshness blob for XFS_IOC_COMMIT_RANGE */
> +struct xfs_commit_range_fresh {
> +	xfs_fsid_t	fsid;		/* m_fixedfsid */
> +	__u64		file2_ino;	/* inode number */
> +	__s64		file2_mtime;	/* modification time */
> +	__s64		file2_ctime;	/* change time */
> +	__s32		file2_mtime_nsec; /* mod time, nsec */
> +	__s32		file2_ctime_nsec; /* change time, nsec */
> +	__u32		file2_gen;	/* inode generation */
> +	__u32		magic;		/* XCR_FRESH_MAGIC */
> +};
> +#define XCR_FRESH_MAGIC	0x444F524B	/* DORK */
> +
> +/* Set up a commitrange operation by sampling file2's write-related attrs */
> +long
> +xfs_ioc_start_commit(
> +	struct file			*file,
> +	struct xfs_commit_range __user	*argp)
> +{
> +	struct xfs_commit_range		args = { };
> +	struct timespec64		ts;
> +	struct xfs_commit_range_fresh	*kern_f;
> +	struct xfs_commit_range_fresh	__user *user_f;
> +	struct inode			*inode2 = file_inode(file);
> +	struct xfs_inode		*ip2 = XFS_I(inode2);
> +	const unsigned int		lockflags = XFS_IOLOCK_SHARED |
> +						    XFS_MMAPLOCK_SHARED |
> +						    XFS_ILOCK_SHARED;
> +
> +	BUILD_BUG_ON(sizeof(struct xfs_commit_range_fresh) !=
> +		     sizeof(args.file2_freshness));
> +
> +	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
> +
> +	memcpy(&kern_f->fsid, ip2->i_mount->m_fixedfsid, sizeof(xfs_fsid_t));
> +
> +	xfs_ilock(ip2, lockflags);
> +	ts = inode_get_ctime(inode2);
> +	kern_f->file2_ctime		= ts.tv_sec;
> +	kern_f->file2_ctime_nsec	= ts.tv_nsec;
> +	ts = inode_get_mtime(inode2);
> +	kern_f->file2_mtime		= ts.tv_sec;
> +	kern_f->file2_mtime_nsec	= ts.tv_nsec;
> +	kern_f->file2_ino		= ip2->i_ino;
> +	kern_f->file2_gen		= inode2->i_generation;
> +	kern_f->magic			= XCR_FRESH_MAGIC;
> +	xfs_iunlock(ip2, lockflags);
> +
> +	user_f = (struct xfs_commit_range_fresh __user *)&argp->file2_freshness;
> +	if (copy_to_user(user_f, kern_f, sizeof(*kern_f)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +/*
> + * Exchange file1 and file2 contents if file2 has not been written since the
> + * start commit operation.
> + */
> +long
> +xfs_ioc_commit_range(
> +	struct file			*file,
> +	struct xfs_commit_range __user	*argp)
> +{
> +	struct xfs_exchrange		fxr = {
> +		.file2			= file,
> +	};
> +	struct xfs_commit_range		args;
> +	struct xfs_commit_range_fresh	*kern_f;
> +	struct xfs_inode		*ip2 = XFS_I(file_inode(file));
> +	struct xfs_mount		*mp = ip2->i_mount;
> +	struct fd			file1;
> +	int				error;
> +
> +	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
> +
> +	if (copy_from_user(&args, argp, sizeof(args)))
> +		return -EFAULT;
> +	if (args.flags & ~XFS_EXCHANGE_RANGE_ALL_FLAGS)
> +		return -EINVAL;
> +	if (kern_f->magic != XCR_FRESH_MAGIC)
> +		return -EBUSY;
> +	if (memcmp(&kern_f->fsid, mp->m_fixedfsid, sizeof(xfs_fsid_t)))
> +		return -EBUSY;

So, I mentioned this before in another mail a few months ago and I think
you liked the idea so just as a reminder in case you forgot:

Ioctls are extensible if done correctly:

switch (_IOC_NR(ioctl)) {
	case _IOC_NR(XFS_IOC_START_COMMIT): {
		size_t usize = _IOC_SIZE(ioctl);
		struct xfs_commit_range	args;

		if (usize < XFS_IOC_START_COMMIT_SIZE_VER0)
			return -EINVAL;

		if (copy_struct_from_user(&args, sizeof(args), argp, usize))
			return -EFAULT;
	}
}

If you code it this way, relying on copy_struct_from_user() right from
the start, you can easily extend your struct in a backward and forward
compatible manner.
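
To make the compatibility claim concrete, here is a minimal userspace
sketch of the size rules that copy_struct_from_user() applies.  It is a
model only: demo_copy_struct() and the demo_args_v0/v1 structs are
hypothetical names invented for illustration, and plain memcpy() stands
in for the user copy, so the -EFAULT path of the real helper is not
modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical v0 and v1 layouts of an extensible ioctl argument. */
struct demo_args_v0 {
	unsigned long long	offset;
	unsigned long long	length;
};

struct demo_args_v1 {
	unsigned long long	offset;
	unsigned long long	length;
	unsigned long long	flags;	/* field added in v1 */
};

/*
 * Userspace model of the copy_struct_from_user() size rules.  ksize is
 * the struct size the "kernel" knows, usize is the size the caller
 * passed in.
 */
static int
demo_copy_struct(
	void		*dst,
	size_t		ksize,
	const void	*src,
	size_t		usize)
{
	const unsigned char	*p = src;
	size_t			size = usize < ksize ? usize : ksize;
	size_t			i;

	/* Newer caller: bytes we do not understand must all be zero. */
	for (i = ksize; i < usize; i++)
		if (p[i] != 0)
			return -E2BIG;

	/* Older caller: fields it does not know about read back as zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, size);
	return 0;
}
```

An old caller passing a v0 struct to code that knows v1 gets flags == 0;
a new caller passing v1 to code that only knows v0 succeeds as long as
the extra field is zero and gets -E2BIG otherwise.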

> +
> +	fxr.file1_offset	= args.file1_offset;
> +	fxr.file2_offset	= args.file2_offset;
> +	fxr.length		= args.length;
> +	fxr.flags		= args.flags | __XFS_EXCHANGE_RANGE_CHECK_FRESH2;
> +	fxr.file2_ino		= kern_f->file2_ino;
> +	fxr.file2_gen		= kern_f->file2_gen;
> +	fxr.file2_mtime.tv_sec	= kern_f->file2_mtime;
> +	fxr.file2_mtime.tv_nsec	= kern_f->file2_mtime_nsec;
> +	fxr.file2_ctime.tv_sec	= kern_f->file2_ctime;
> +	fxr.file2_ctime.tv_nsec	= kern_f->file2_ctime_nsec;
> +
> +	file1 = fdget(args.file1_fd);
> +	if (!file1.file)
> +		return -EBADF;

Please use CLASS(fd, f)(args.file1_fd) :)

> +	fxr.file1 = file1.file;
> +
> +	error = xfs_exchange_range(&fxr);
> +	fdput(file1);
> +	return error;
> +}
> diff --git a/fs/xfs/xfs_exchrange.h b/fs/xfs/xfs_exchrange.h
> index 039abcca546e..bc1298aba806 100644
> --- a/fs/xfs/xfs_exchrange.h
> +++ b/fs/xfs/xfs_exchrange.h
> @@ -10,8 +10,12 @@
>  #define __XFS_EXCHANGE_RANGE_UPD_CMTIME1	(1ULL << 63)
>  #define __XFS_EXCHANGE_RANGE_UPD_CMTIME2	(1ULL << 62)
>  
> +/* Freshness check required */
> +#define __XFS_EXCHANGE_RANGE_CHECK_FRESH2	(1ULL << 61)
> +
>  #define XFS_EXCHANGE_RANGE_PRIV_FLAGS	(__XFS_EXCHANGE_RANGE_UPD_CMTIME1 | \
> -					 __XFS_EXCHANGE_RANGE_UPD_CMTIME2)
> +					 __XFS_EXCHANGE_RANGE_UPD_CMTIME2 | \
> +					 __XFS_EXCHANGE_RANGE_CHECK_FRESH2)
>  
>  struct xfs_exchrange {
>  	struct file		*file1;
> @@ -22,10 +26,20 @@ struct xfs_exchrange {
>  	u64			length;
>  
>  	u64			flags;	/* XFS_EXCHANGE_RANGE flags */
> +
> +	/* file2 metadata for freshness checks */
> +	u64			file2_ino;
> +	struct timespec64	file2_mtime;
> +	struct timespec64	file2_ctime;
> +	u32			file2_gen;
>  };
>  
>  long xfs_ioc_exchange_range(struct file *file,
>  		struct xfs_exchange_range __user *argp);
> +long xfs_ioc_start_commit(struct file *file,
> +		struct xfs_commit_range __user *argp);
> +long xfs_ioc_commit_range(struct file *file,
> +		struct xfs_commit_range __user	*argp);
>  
>  struct xfs_exchmaps_req;
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6b13666d4e96..90b3ee21e7fe 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1518,6 +1518,10 @@ xfs_file_ioctl(
>  
>  	case XFS_IOC_EXCHANGE_RANGE:
>  		return xfs_ioc_exchange_range(filp, arg);
> +	case XFS_IOC_START_COMMIT:
> +		return xfs_ioc_start_commit(filp, arg);
> +	case XFS_IOC_COMMIT_RANGE:
> +		return xfs_ioc_commit_range(filp, arg);
>  
>  	default:
>  		return -ENOTTY;
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 180ce697305a..4cf0fa71ba9c 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4926,7 +4926,8 @@ DEFINE_INODE_ERROR_EVENT(xfs_exchrange_error);
>  	{ XFS_EXCHANGE_RANGE_DRY_RUN,		"DRY_RUN" }, \
>  	{ XFS_EXCHANGE_RANGE_FILE1_WRITTEN,	"F1_WRITTEN" }, \
>  	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME1,	"CMTIME1" }, \
> -	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME2,	"CMTIME2" }
> +	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME2,	"CMTIME2" }, \
> +	{ __XFS_EXCHANGE_RANGE_CHECK_FRESH2,	"FRESH2" }
>  
>  /* file exchange-range tracepoint class */
>  DECLARE_EVENT_CLASS(xfs_exchrange_class,
> @@ -4986,6 +4987,60 @@ DEFINE_EXCHRANGE_EVENT(xfs_exchrange_prep);
>  DEFINE_EXCHRANGE_EVENT(xfs_exchrange_flush);
>  DEFINE_EXCHRANGE_EVENT(xfs_exchrange_mappings);
>  
> +TRACE_EVENT(xfs_exchrange_freshness,
> +	TP_PROTO(const struct xfs_exchrange *fxr, struct xfs_inode *ip2),
> +	TP_ARGS(fxr, ip2),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ip2_ino)
> +		__field(long long, ip2_mtime)
> +		__field(long long, ip2_ctime)
> +		__field(int, ip2_mtime_nsec)
> +		__field(int, ip2_ctime_nsec)
> +
> +		__field(xfs_ino_t, file2_ino)
> +		__field(long long, file2_mtime)
> +		__field(long long, file2_ctime)
> +		__field(int, file2_mtime_nsec)
> +		__field(int, file2_ctime_nsec)
> +	),
> +	TP_fast_assign(
> +		struct timespec64	ts64;
> +		struct inode		*inode2 = VFS_I(ip2);
> +
> +		__entry->dev = inode2->i_sb->s_dev;
> +		__entry->ip2_ino = ip2->i_ino;
> +
> +		ts64 = inode_get_ctime(inode2);
> +		__entry->ip2_ctime = ts64.tv_sec;
> +		__entry->ip2_ctime_nsec = ts64.tv_nsec;
> +
> +		ts64 = inode_get_mtime(inode2);
> +		__entry->ip2_mtime = ts64.tv_sec;
> +		__entry->ip2_mtime_nsec = ts64.tv_nsec;
> +
> +		__entry->file2_ino = fxr->file2_ino;
> +		__entry->file2_mtime = fxr->file2_mtime.tv_sec;
> +		__entry->file2_ctime = fxr->file2_ctime.tv_sec;
> +		__entry->file2_mtime_nsec = fxr->file2_mtime.tv_nsec;
> +		__entry->file2_ctime_nsec = fxr->file2_ctime.tv_nsec;
> +	),
> +	TP_printk("dev %d:%d "
> +		  "ino 0x%llx mtime %lld:%d ctime %lld:%d -> "
> +		  "file 0x%llx mtime %lld:%d ctime %lld:%d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ip2_ino,
> +		  __entry->ip2_mtime,
> +		  __entry->ip2_mtime_nsec,
> +		  __entry->ip2_ctime,
> +		  __entry->ip2_ctime_nsec,
> +		  __entry->file2_ino,
> +		  __entry->file2_mtime,
> +		  __entry->file2_mtime_nsec,
> +		  __entry->file2_ctime,
> +		  __entry->file2_ctime_nsec)
> +);
> +
>  TRACE_EVENT(xfs_exchmaps_overhead,
>  	TP_PROTO(struct xfs_mount *mp, unsigned long long bmbt_blocks,
>  		 unsigned long long rmapbt_blocks),
> 


* Re: [PATCH 1/1] xfs: introduce new file range commit ioctls
  2024-09-03  7:52     ` Christian Brauner
@ 2024-10-25 21:23       ` Darrick J. Wong
  0 siblings, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2024-10-25 21:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: chandanbabu, Jeff Layton, Christoph Hellwig, linux-fsdevel,
	linux-xfs

On Tue, Sep 03, 2024 at 09:52:43AM +0200, Christian Brauner wrote:
> On Mon, Sep 02, 2024 at 11:23:07AM GMT, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This patch introduces two more new ioctls to manage atomic updates to
> > file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE.  The
> > commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE
> > does, but with the additional requirement that file2 cannot have changed
> > since some sampling point.  The start-commit ioctl performs the sampling
> > of file attributes.
> > 
> > Note: This patch currently samples i_ctime during START_COMMIT and
> > checks that it hasn't changed during COMMIT_RANGE.  This isn't entirely
> > safe in kernels prior to 6.12 because ctime only had coarse grained
> > granularity and very fast updates could collide with a COMMIT_RANGE.
> > With the multi-granularity ctime introduced by Jeff Layton, it's now
> > possible to update ctime such that this does not happen.
> > 
> > It is critical, then, that this patch must not be backported to any
> > kernel that does not support fine-grained file change timestamps.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Acked-by: Jeff Layton <jlayton@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/libxfs/xfs_fs.h |   26 +++++++++
> >  fs/xfs/xfs_exchrange.c |  143 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_exchrange.h |   16 +++++
> >  fs/xfs/xfs_ioctl.c     |    4 +
> >  fs/xfs/xfs_trace.h     |   57 +++++++++++++++++++
> >  5 files changed, 243 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 454b63ef7201..c85c8077fac3 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -825,6 +825,30 @@ struct xfs_exchange_range {
> >  	__u64		flags;		/* see XFS_EXCHANGE_RANGE_* below */
> >  };
> >  
> > +/*
> > + * Using the same definition of file2 as struct xfs_exchange_range, commit the
> > + * contents of file1 into file2 if file2 has the same inode number, mtime, and
> > + * ctime as the arguments provided to the call.  The old contents of file2 will
> > + * be moved to file1.
> > + *
> > + * Returns -EBUSY if there isn't an exact match for the file2 fields.
> > + *
> > + * Filesystems must be able to restart and complete the operation even after
> > + * the system goes down.
> > + */
> > +struct xfs_commit_range {
> > +	__s32		file1_fd;
> > +	__u32		pad;		/* must be zeroes */
> > +	__u64		file1_offset;	/* file1 offset, bytes */
> > +	__u64		file2_offset;	/* file2 offset, bytes */
> > +	__u64		length;		/* bytes to exchange */
> > +
> > +	__u64		flags;		/* see XFS_EXCHANGE_RANGE_* below */
> > +
> > +	/* opaque file2 metadata for freshness checks */
> > +	__u64		file2_freshness[6];
> > +};
> > +
> >  /*
> >   * Exchange file data all the way to the ends of both files, and then exchange
> >   * the file sizes.  This flag can be used to replace a file's contents with a
> > @@ -997,6 +1021,8 @@ struct xfs_getparents_by_handle {
> >  #define XFS_IOC_BULKSTAT	     _IOR ('X', 127, struct xfs_bulkstat_req)
> >  #define XFS_IOC_INUMBERS	     _IOR ('X', 128, struct xfs_inumbers_req)
> >  #define XFS_IOC_EXCHANGE_RANGE	     _IOW ('X', 129, struct xfs_exchange_range)
> > +#define XFS_IOC_START_COMMIT	     _IOR ('X', 130, struct xfs_commit_range)
> > +#define XFS_IOC_COMMIT_RANGE	     _IOW ('X', 131, struct xfs_commit_range)
> >  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
> >  
> >  
> > diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
> > index c8a655c92c92..d0889190ab7f 100644
> > --- a/fs/xfs/xfs_exchrange.c
> > +++ b/fs/xfs/xfs_exchrange.c
> > @@ -72,6 +72,34 @@ xfs_exchrange_estimate(
> >  	return error;
> >  }
> >  
> > +/*
> > + * Check that file2's metadata agree with the snapshot that we took for the
> > + * range commit request.
> > + *
> > + * This should be called after the filesystem has locked /all/ inode metadata
> > + * against modification.
> > + */
> > +STATIC int
> > +xfs_exchrange_check_freshness(
> > +	const struct xfs_exchrange	*fxr,
> > +	struct xfs_inode		*ip2)
> > +{
> > +	struct inode			*inode2 = VFS_I(ip2);
> > +	struct timespec64		ctime = inode_get_ctime(inode2);
> > +	struct timespec64		mtime = inode_get_mtime(inode2);
> > +
> > +	trace_xfs_exchrange_freshness(fxr, ip2);
> > +
> > +	/* Check that file2 hasn't otherwise been modified. */
> > +	if (fxr->file2_ino != ip2->i_ino ||
> > +	    fxr->file2_gen != inode2->i_generation ||
> > +	    !timespec64_equal(&fxr->file2_ctime, &ctime) ||
> > +	    !timespec64_equal(&fxr->file2_mtime, &mtime))
> > +		return -EBUSY;
> > +
> > +	return 0;
> > +}
> > +
> >  #define QRETRY_IP1	(0x1)
> >  #define QRETRY_IP2	(0x2)
> >  
> > @@ -607,6 +635,12 @@ xfs_exchrange_prep(
> >  	if (error || fxr->length == 0)
> >  		return error;
> >  
> > +	if (fxr->flags & __XFS_EXCHANGE_RANGE_CHECK_FRESH2) {
> > +		error = xfs_exchrange_check_freshness(fxr, ip2);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> >  	/* Attach dquots to both inodes before changing block maps. */
> >  	error = xfs_qm_dqattach(ip2);
> >  	if (error)
> > @@ -719,7 +753,8 @@ xfs_exchange_range(
> >  	if (fxr->file1->f_path.mnt != fxr->file2->f_path.mnt)
> >  		return -EXDEV;
> >  
> > -	if (fxr->flags & ~XFS_EXCHANGE_RANGE_ALL_FLAGS)
> > +	if (fxr->flags & ~(XFS_EXCHANGE_RANGE_ALL_FLAGS |
> > +			 __XFS_EXCHANGE_RANGE_CHECK_FRESH2))
> >  		return -EINVAL;
> >  
> >  	/* Userspace requests only honored for regular files. */
> > @@ -802,3 +837,109 @@ xfs_ioc_exchange_range(
> >  	fdput(file1);
> >  	return error;
> >  }
> > +
> > +/* Opaque freshness blob for XFS_IOC_COMMIT_RANGE */
> > +struct xfs_commit_range_fresh {
> > +	xfs_fsid_t	fsid;		/* m_fixedfsid */
> > +	__u64		file2_ino;	/* inode number */
> > +	__s64		file2_mtime;	/* modification time */
> > +	__s64		file2_ctime;	/* change time */
> > +	__s32		file2_mtime_nsec; /* mod time, nsec */
> > +	__s32		file2_ctime_nsec; /* change time, nsec */
> > +	__u32		file2_gen;	/* inode generation */
> > +	__u32		magic;		/* XCR_FRESH_MAGIC */
> > +};
> > +#define XCR_FRESH_MAGIC	0x444F524B	/* DORK */
> > +
> > +/* Set up a commitrange operation by sampling file2's write-related attrs */
> > +long
> > +xfs_ioc_start_commit(
> > +	struct file			*file,
> > +	struct xfs_commit_range __user	*argp)
> > +{
> > +	struct xfs_commit_range		args = { };
> > +	struct timespec64		ts;
> > +	struct xfs_commit_range_fresh	*kern_f;
> > +	struct xfs_commit_range_fresh	__user *user_f;
> > +	struct inode			*inode2 = file_inode(file);
> > +	struct xfs_inode		*ip2 = XFS_I(inode2);
> > +	const unsigned int		lockflags = XFS_IOLOCK_SHARED |
> > +						    XFS_MMAPLOCK_SHARED |
> > +						    XFS_ILOCK_SHARED;
> > +
> > +	BUILD_BUG_ON(sizeof(struct xfs_commit_range_fresh) !=
> > +		     sizeof(args.file2_freshness));
> > +
> > +	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
> > +
> > +	memcpy(&kern_f->fsid, ip2->i_mount->m_fixedfsid, sizeof(xfs_fsid_t));
> > +
> > +	xfs_ilock(ip2, lockflags);
> > +	ts = inode_get_ctime(inode2);
> > +	kern_f->file2_ctime		= ts.tv_sec;
> > +	kern_f->file2_ctime_nsec	= ts.tv_nsec;
> > +	ts = inode_get_mtime(inode2);
> > +	kern_f->file2_mtime		= ts.tv_sec;
> > +	kern_f->file2_mtime_nsec	= ts.tv_nsec;
> > +	kern_f->file2_ino		= ip2->i_ino;
> > +	kern_f->file2_gen		= inode2->i_generation;
> > +	kern_f->magic			= XCR_FRESH_MAGIC;
> > +	xfs_iunlock(ip2, lockflags);
> > +
> > +	user_f = (struct xfs_commit_range_fresh __user *)&argp->file2_freshness;
> > +	if (copy_to_user(user_f, kern_f, sizeof(*kern_f)))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Exchange file1 and file2 contents if file2 has not been written since the
> > + * start commit operation.
> > + */
> > +long
> > +xfs_ioc_commit_range(
> > +	struct file			*file,
> > +	struct xfs_commit_range __user	*argp)
> > +{
> > +	struct xfs_exchrange		fxr = {
> > +		.file2			= file,
> > +	};
> > +	struct xfs_commit_range		args;
> > +	struct xfs_commit_range_fresh	*kern_f;
> > +	struct xfs_inode		*ip2 = XFS_I(file_inode(file));
> > +	struct xfs_mount		*mp = ip2->i_mount;
> > +	struct fd			file1;
> > +	int				error;
> > +
> > +	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
> > +
> > +	if (copy_from_user(&args, argp, sizeof(args)))
> > +		return -EFAULT;
> > +	if (args.flags & ~XFS_EXCHANGE_RANGE_ALL_FLAGS)
> > +		return -EINVAL;
> > +	if (kern_f->magic != XCR_FRESH_MAGIC)
> > +		return -EBUSY;
> > +	if (memcmp(&kern_f->fsid, mp->m_fixedfsid, sizeof(xfs_fsid_t)))
> > +		return -EBUSY;
> 
> So, I mentioned this before in another mail a few months ago and I think
> you liked the idea so just as a reminder in case you forgot:
> 
> Ioctls are extensible if done correctly:
> 
> switch (_IOC_NR(ioctl)) {
> 	case _IOC_NR(XFS_IOC_START_COMMIT): {
> 		size_t usize = _IOC_SIZE(ioctl);
> 		struct xfs_commit_range	args;
> 
> 		if (usize < XFS_IOC_START_COMMIT_SIZE_VER0)
> 			return -EINVAL;
> 
> 		if (copy_struct_from_user(&args, sizeof(args), argp, usize))
> 			return -EFAULT;
> 	}
> 
> If you code it this way, relying on copy_struct_from_user() right from
> the start you can easily extend your struct in a backward and forward
> compatible manner.

I don't know that we'd really need it for commitrange since there's
plenty of space (~40 bytes) in the "opaque" freshness blob.  I suspect
that if we ever add subvolumes to XFS then we might want to take over
the 12 bytes used by mtime* for the subvolume id.
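
For reference, the byte accounting of the freshness blob can be checked
with a userspace mirror of the struct from the patch.  This is a sketch
under one assumption: that xfs_fsid_t is two 32-bit words (8 bytes).
The demo_* names are made up for illustration.  With that assumption the
blob is 48 bytes, matching __u64 file2_freshness[6], and the two mtime
fields together account for 12 of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: xfs_fsid_t is two 32-bit words, i.e. 8 bytes. */
struct demo_fsid {
	uint32_t	val[2];
};

/* Userspace mirror of struct xfs_commit_range_fresh from the patch. */
struct demo_commit_range_fresh {
	struct demo_fsid fsid;		/* m_fixedfsid */
	uint64_t	file2_ino;	/* inode number */
	int64_t		file2_mtime;	/* modification time */
	int64_t		file2_ctime;	/* change time */
	int32_t		file2_mtime_nsec; /* mod time, nsec */
	int32_t		file2_ctime_nsec; /* change time, nsec */
	uint32_t	file2_gen;	/* inode generation */
	uint32_t	magic;		/* magic number */
};

/* The blob must fill the __u64 file2_freshness[6] array exactly. */
static_assert(sizeof(struct demo_commit_range_fresh) == 6 * sizeof(uint64_t),
	      "freshness blob must be 48 bytes");

/* Bytes occupied by the mtime seconds and nanoseconds fields together. */
static inline size_t
demo_mtime_bytes(void)
{
	return sizeof(((struct demo_commit_range_fresh *)0)->file2_mtime) +
	       sizeof(((struct demo_commit_range_fresh *)0)->file2_mtime_nsec);
}
```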

That said, I also think we could convert to this format pretty easily.
Also, copy_struct_from_user can return -E2BIG so I think that needs to
be:

	ret = copy_struct_from_user(&args, sizeof(args), argp, usize);
	if (ret)
		return ret;

Though the overriding reason for not writing the _IOC_NR dispatch code
this way is that every time I've tried to extend an xfs ioctl in this
manner, Dave says no because (I think) he doesn't trust how the struct
size is opaquely encoded in the ioctl number /and/ doesn't trust the
BUILD_BUG_ONs I put in the code to guarantee uniqueness.  So I pick a
new number so that I can complete the review instead of starting over
with a different reviewer who doesn't have that particular preference.
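
As background on what "encoded in the ioctl number" means, the sketch
below packs and unpacks a command value using the asm-generic field
widths (8 bits of nr, 8 bits of type, 14 bits of size, then the
direction bits).  Those widths are an assumption here, since a few
architectures use different ones, and the demo_* helpers are
hypothetical stand-ins for the real _IOC(), _IOC_NR() and _IOC_SIZE()
macros.

```c
#include <stdint.h>

/* Field widths of the asm-generic ioctl layout (assumed, see above). */
#define DEMO_NRBITS	8
#define DEMO_TYPEBITS	8
#define DEMO_SIZEBITS	14

#define DEMO_NRSHIFT	0
#define DEMO_TYPESHIFT	(DEMO_NRSHIFT + DEMO_NRBITS)
#define DEMO_SIZESHIFT	(DEMO_TYPESHIFT + DEMO_TYPEBITS)
#define DEMO_DIRSHIFT	(DEMO_SIZESHIFT + DEMO_SIZEBITS)

#define DEMO_WRITE	1U	/* userspace writes, kernel reads */

/* Pack direction, type, number and argument size into one command. */
static inline uint32_t
demo_ioc(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
	return (dir << DEMO_DIRSHIFT) | (type << DEMO_TYPESHIFT) |
	       (nr << DEMO_NRSHIFT) | (size << DEMO_SIZESHIFT);
}

/* Extract the command number, ignoring the encoded size. */
static inline uint32_t
demo_ioc_nr(uint32_t cmd)
{
	return (cmd >> DEMO_NRSHIFT) & ((1U << DEMO_NRBITS) - 1);
}

/* Extract the argument struct size that the caller was built against. */
static inline uint32_t
demo_ioc_size(uint32_t cmd)
{
	return (cmd >> DEMO_SIZESHIFT) & ((1U << DEMO_SIZEBITS) - 1);
}
```

Two callers built against different struct sizes produce different
command values that share the same nr, so a dispatcher that switches on
the full command value will not match the extended struct, while one
that switches on the nr alone must validate the size itself.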

> }
> 
> > +
> > +	fxr.file1_offset	= args.file1_offset;
> > +	fxr.file2_offset	= args.file2_offset;
> > +	fxr.length		= args.length;
> > +	fxr.flags		= args.flags | __XFS_EXCHANGE_RANGE_CHECK_FRESH2;
> > +	fxr.file2_ino		= kern_f->file2_ino;
> > +	fxr.file2_gen		= kern_f->file2_gen;
> > +	fxr.file2_mtime.tv_sec	= kern_f->file2_mtime;
> > +	fxr.file2_mtime.tv_nsec	= kern_f->file2_mtime_nsec;
> > +	fxr.file2_ctime.tv_sec	= kern_f->file2_ctime;
> > +	fxr.file2_ctime.tv_nsec	= kern_f->file2_ctime_nsec;
> > +
> > +	file1 = fdget(args.file1_fd);
> > +	if (!file1.file)
> > +		return -EBADF;
> 
> Please use CLASS(fd, f)(args.file1_fd) :)

Yeah, I saw that the fd cleanups collided with xfs in for-next, thanks
for the heads up.

--D

> > +	fxr.file1 = file1.file;
> > +
> > +	error = xfs_exchange_range(&fxr);
> > +	fdput(file1);
> > +	return error;
> > +}
> > diff --git a/fs/xfs/xfs_exchrange.h b/fs/xfs/xfs_exchrange.h
> > index 039abcca546e..bc1298aba806 100644
> > --- a/fs/xfs/xfs_exchrange.h
> > +++ b/fs/xfs/xfs_exchrange.h
> > @@ -10,8 +10,12 @@
> >  #define __XFS_EXCHANGE_RANGE_UPD_CMTIME1	(1ULL << 63)
> >  #define __XFS_EXCHANGE_RANGE_UPD_CMTIME2	(1ULL << 62)
> >  
> > +/* Freshness check required */
> > +#define __XFS_EXCHANGE_RANGE_CHECK_FRESH2	(1ULL << 61)
> > +
> >  #define XFS_EXCHANGE_RANGE_PRIV_FLAGS	(__XFS_EXCHANGE_RANGE_UPD_CMTIME1 | \
> > -					 __XFS_EXCHANGE_RANGE_UPD_CMTIME2)
> > +					 __XFS_EXCHANGE_RANGE_UPD_CMTIME2 | \
> > +					 __XFS_EXCHANGE_RANGE_CHECK_FRESH2)
> >  
> >  struct xfs_exchrange {
> >  	struct file		*file1;
> > @@ -22,10 +26,20 @@ struct xfs_exchrange {
> >  	u64			length;
> >  
> >  	u64			flags;	/* XFS_EXCHANGE_RANGE flags */
> > +
> > +	/* file2 metadata for freshness checks */
> > +	u64			file2_ino;
> > +	struct timespec64	file2_mtime;
> > +	struct timespec64	file2_ctime;
> > +	u32			file2_gen;
> >  };
> >  
> >  long xfs_ioc_exchange_range(struct file *file,
> >  		struct xfs_exchange_range __user *argp);
> > +long xfs_ioc_start_commit(struct file *file,
> > +		struct xfs_commit_range __user *argp);
> > +long xfs_ioc_commit_range(struct file *file,
> > +		struct xfs_commit_range __user	*argp);
> >  
> >  struct xfs_exchmaps_req;
> >  
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 6b13666d4e96..90b3ee21e7fe 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1518,6 +1518,10 @@ xfs_file_ioctl(
> >  
> >  	case XFS_IOC_EXCHANGE_RANGE:
> >  		return xfs_ioc_exchange_range(filp, arg);
> > +	case XFS_IOC_START_COMMIT:
> > +		return xfs_ioc_start_commit(filp, arg);
> > +	case XFS_IOC_COMMIT_RANGE:
> > +		return xfs_ioc_commit_range(filp, arg);
> >  
> >  	default:
> >  		return -ENOTTY;
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 180ce697305a..4cf0fa71ba9c 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -4926,7 +4926,8 @@ DEFINE_INODE_ERROR_EVENT(xfs_exchrange_error);
> >  	{ XFS_EXCHANGE_RANGE_DRY_RUN,		"DRY_RUN" }, \
> >  	{ XFS_EXCHANGE_RANGE_FILE1_WRITTEN,	"F1_WRITTEN" }, \
> >  	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME1,	"CMTIME1" }, \
> > -	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME2,	"CMTIME2" }
> > +	{ __XFS_EXCHANGE_RANGE_UPD_CMTIME2,	"CMTIME2" }, \
> > +	{ __XFS_EXCHANGE_RANGE_CHECK_FRESH2,	"FRESH2" }
> >  
> >  /* file exchange-range tracepoint class */
> >  DECLARE_EVENT_CLASS(xfs_exchrange_class,
> > @@ -4986,6 +4987,60 @@ DEFINE_EXCHRANGE_EVENT(xfs_exchrange_prep);
> >  DEFINE_EXCHRANGE_EVENT(xfs_exchrange_flush);
> >  DEFINE_EXCHRANGE_EVENT(xfs_exchrange_mappings);
> >  
> > +TRACE_EVENT(xfs_exchrange_freshness,
> > +	TP_PROTO(const struct xfs_exchrange *fxr, struct xfs_inode *ip2),
> > +	TP_ARGS(fxr, ip2),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_ino_t, ip2_ino)
> > +		__field(long long, ip2_mtime)
> > +		__field(long long, ip2_ctime)
> > +		__field(int, ip2_mtime_nsec)
> > +		__field(int, ip2_ctime_nsec)
> > +
> > +		__field(xfs_ino_t, file2_ino)
> > +		__field(long long, file2_mtime)
> > +		__field(long long, file2_ctime)
> > +		__field(int, file2_mtime_nsec)
> > +		__field(int, file2_ctime_nsec)
> > +	),
> > +	TP_fast_assign(
> > +		struct timespec64	ts64;
> > +		struct inode		*inode2 = VFS_I(ip2);
> > +
> > +		__entry->dev = inode2->i_sb->s_dev;
> > +		__entry->ip2_ino = ip2->i_ino;
> > +
> > +		ts64 = inode_get_ctime(inode2);
> > +		__entry->ip2_ctime = ts64.tv_sec;
> > +		__entry->ip2_ctime_nsec = ts64.tv_nsec;
> > +
> > +		ts64 = inode_get_mtime(inode2);
> > +		__entry->ip2_mtime = ts64.tv_sec;
> > +		__entry->ip2_mtime_nsec = ts64.tv_nsec;
> > +
> > +		__entry->file2_ino = fxr->file2_ino;
> > +		__entry->file2_mtime = fxr->file2_mtime.tv_sec;
> > +		__entry->file2_ctime = fxr->file2_ctime.tv_sec;
> > +		__entry->file2_mtime_nsec = fxr->file2_mtime.tv_nsec;
> > +		__entry->file2_ctime_nsec = fxr->file2_ctime.tv_nsec;
> > +	),
> > +	TP_printk("dev %d:%d "
> > +		  "ino 0x%llx mtime %lld:%d ctime %lld:%d -> "
> > +		  "file 0x%llx mtime %lld:%d ctime %lld:%d",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->ip2_ino,
> > +		  __entry->ip2_mtime,
> > +		  __entry->ip2_mtime_nsec,
> > +		  __entry->ip2_ctime,
> > +		  __entry->ip2_ctime_nsec,
> > +		  __entry->file2_ino,
> > +		  __entry->file2_mtime,
> > +		  __entry->file2_mtime_nsec,
> > +		  __entry->file2_ctime,
> > +		  __entry->file2_ctime_nsec)
> > +);
> > +
> >  TRACE_EVENT(xfs_exchmaps_overhead,
> >  	TP_PROTO(struct xfs_mount *mp, unsigned long long bmbt_blocks,
> >  		 unsigned long long rmapbt_blocks),
> > 
> 


Thread overview: 53+ messages
2024-09-02 18:16 [PATCHBOMB 6.12] xfs: a ton of bugfixes and cleanups Darrick J. Wong
2024-09-02 18:21 ` [PATCHSET v31.1 1/8] xfs: atomic file content commits Darrick J. Wong
2024-09-02 18:23   ` [PATCH 1/1] xfs: introduce new file range commit ioctls Darrick J. Wong
2024-09-03  7:52     ` Christian Brauner
2024-10-25 21:23       ` Darrick J. Wong
2024-09-02 18:21 ` [PATCHSET v4.2 2/8] xfs: cleanups before adding metadata directories Darrick J. Wong
2024-09-02 18:23   ` [PATCH 1/3] xfs: validate inumber in xfs_iget Darrick J. Wong
2024-09-02 18:23   ` [PATCH 2/3] xfs: match on the global RT inode numbers in xfs_is_metadata_inode Darrick J. Wong
2024-09-02 18:23   ` [PATCH 3/3] xfs: pass the icreate args object to xfs_dialloc Darrick J. Wong
2024-09-02 18:21 ` [PATCHSET v4.2 3/8] xfs: clean up the rtbitmap code Darrick J. Wong
2024-09-02 18:24   ` [PATCH 01/12] xfs: remove xfs_validate_rtextents Darrick J. Wong
2024-09-02 18:24   ` [PATCH 02/12] xfs: factor out a xfs_validate_rt_geometry helper Darrick J. Wong
2024-09-02 18:24   ` [PATCH 03/12] xfs: make the RT rsum_cache mandatory Darrick J. Wong
2024-09-02 18:24   ` [PATCH 04/12] xfs: remove the limit argument to xfs_rtfind_back Darrick J. Wong
2024-09-02 18:25   ` [PATCH 05/12] xfs: assert a valid limit in xfs_rtfind_forw Darrick J. Wong
2024-09-02 18:25   ` [PATCH 06/12] xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf Darrick J. Wong
2024-09-02 18:25   ` [PATCH 07/12] xfs: cleanup the calling convention for xfs_rtpick_extent Darrick J. Wong
2024-09-02 18:25   ` [PATCH 08/12] xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc Darrick J. Wong
2024-09-02 18:26   ` [PATCH 09/12] xfs: factor out a xfs_growfs_rt_bmblock helper Darrick J. Wong
2024-09-02 18:26   ` [PATCH 10/12] xfs: factor out a xfs_last_rt_bmblock helper Darrick J. Wong
2024-09-02 18:26   ` [PATCH 11/12] xfs: factor out rtbitmap/summary initialization helpers Darrick J. Wong
2024-09-02 18:27   ` [PATCH 12/12] xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock Darrick J. Wong
2024-09-02 18:21 ` [PATCHSET v4.2 4/8] xfs: fixes for the realtime allocator Darrick J. Wong
2024-09-02 18:27   ` [PATCH 01/10] xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock Darrick J. Wong
2024-09-02 18:27   ` [PATCH 02/10] xfs: ensure rtx mask/shift are correct after growfs Darrick J. Wong
2024-09-02 18:27   ` [PATCH 03/10] xfs: don't return too-short extents from xfs_rtallocate_extent_block Darrick J. Wong
2024-09-02 18:28   ` [PATCH 04/10] xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block Darrick J. Wong
2024-09-02 18:28   ` [PATCH 05/10] xfs: refactor aligning bestlen to prod Darrick J. Wong
2024-09-02 18:28   ` [PATCH 06/10] xfs: clean up xfs_rtallocate_extent_exact a bit Darrick J. Wong
2024-09-02 18:28   ` [PATCH 07/10] xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near Darrick J. Wong
2024-09-02 18:29   ` [PATCH 08/10] xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block Darrick J. Wong
2024-09-02 18:29   ` [PATCH 09/10] xfs: remove xfs_rtb_to_rtxrem Darrick J. Wong
2024-09-02 18:29   ` [PATCH 10/10] xfs: simplify xfs_rtalloc_query_range Darrick J. Wong
2024-09-02 18:22 ` [PATCHSET v4.2 5/8] xfs: cleanups for the realtime allocator Darrick J. Wong
2024-09-02 18:29   ` [PATCH 01/10] xfs: clean up the ISVALID macro in xfs_bmap_adjacent Darrick J. Wong
2024-09-02 18:30   ` [PATCH 02/10] xfs: factor out a xfs_rtallocate helper Darrick J. Wong
2024-09-02 18:30   ` [PATCH 03/10] xfs: rework the rtalloc fallback handling Darrick J. Wong
2024-09-02 18:30   ` [PATCH 04/10] xfs: factor out a xfs_rtallocate_align helper Darrick J. Wong
2024-09-02 18:30   ` [PATCH 05/10] xfs: make the rtalloc start hint a xfs_rtblock_t Darrick J. Wong
2024-09-02 18:31   ` [PATCH 06/10] xfs: add xchk_setup_nothing and xchk_nothing helpers Darrick J. Wong
2024-09-02 18:31   ` [PATCH 07/10] xfs: remove xfs_{rtbitmap,rtsummary}_wordcount Darrick J. Wong
2024-09-02 18:31   ` [PATCH 08/10] xfs: replace m_rsumsize with m_rsumblocks Darrick J. Wong
2024-09-02 18:31   ` [PATCH 09/10] xfs: rearrange xfs_fsmap.c a little bit Darrick J. Wong
2024-09-02 18:32   ` [PATCH 10/10] xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c Darrick J. Wong
2024-09-02 18:22 ` [PATCHSET v4.2 6/8] xfs: cleanups for quota mount Darrick J. Wong
2024-09-02 18:32   ` [PATCH 1/1] xfs: refactor loading quota inodes in the regular case Darrick J. Wong
2024-09-02 18:22 ` [PATCHSET 7/8] xfs: various bug fixes for 6.12 Darrick J. Wong
2024-09-02 18:32   ` [PATCH 1/3] xfs: fix C++ compilation errors in xfs_fs.h Darrick J. Wong
2024-09-02 18:33   ` [PATCH 2/3] xfs: fix FITRIM reporting again Darrick J. Wong
2024-09-02 18:33   ` [PATCH 3/3] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc Darrick J. Wong
2024-09-02 18:22 ` [PATCHSET v4.2 8/8] xfs: cleanups for inode rooted btree code Darrick J. Wong
2024-09-02 18:33   ` [PATCH 1/2] xfs: replace shouty XFS_BM{BT,DR} macros Darrick J. Wong
2024-09-02 18:33   ` [PATCH 2/2] xfs: standardize the btree maxrecs function parameters Darrick J. Wong
