public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Emmanouil Vamvakopoulos <emmanouil.vamvakopoulos@ijclab.in2p3.fr>
Cc: Carlos Maiolino <cem@kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: s_bmap and flags explanation
Date: Fri, 5 Aug 2022 08:55:54 +1000	[thread overview]
Message-ID: <20220804225554.GD3600936@dread.disaster.area> (raw)
In-Reply-To: <789765075.71120211.1659608731638.JavaMail.zimbra@ijclab.in2p3.fr>

On Thu, Aug 04, 2022 at 12:25:31PM +0200, Emmanouil Vamvakopoulos wrote:
> hello Carlos and Dave 
> 
> thank you for the replies
> 
> a) for the mismatch in alignment between xfs and the underlying raid volume I have to re-check,
> but from preliminary tests, when I mount the partition with a static allocsize (e.g. allocsize=256k)
> we get large files with a large number of extents (up to 40), but the sizes from du were comparable.

As expected - fixing the post-EOF speculative preallocation to
256kB means almost no consumed space beyond EOF, so the two will
always be close (but not identical) for a non-sparse, non-shared
file.

But that raises the question: why are you concerned about large files
consuming slightly more space than expected for a short period of
time?

We've been doing this since commit 055388a3188f ("xfs: dynamic
speculative EOF preallocation"), which was committed in January 2011
- over a decade ago - and it had been well known for a couple of
decades before that that ls and du cannot be relied upon to match
on any filesystem that supports sparse files.

And these days with deduplication/reflink that share extents between
files, it's even less useful because du can be correct for every
individual file, but then still report that more blocks are being
used than the filesystem has capacity to store because it reports
shared blocks multiple times...

So why do you care that du and ls are different?
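
The gap is easy to demonstrate with a sparse file on any filesystem
that supports them - a minimal sketch, not XFS-specific:

```shell
# Create a 1GiB sparse file: large apparent size, almost nothing allocated.
f=$(mktemp)
truncate -s 1G "$f"

# Apparent size in bytes - this is what ls -l reports.
stat -c '%s' "$f"     # 1073741824

# Allocated blocks times block size - this is what du counts.
# For a freshly truncated sparse file this is (near) zero.
echo $(( $(stat -c '%b' "$f") * $(stat -c '%B' "$f") ))

rm -f "$f"
```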

> b) for the speculative preallocation beyond EOF of my files, as I understood it I have to run xfs_fsr to get the space back.

No, you don't need to do anything, and you *most definitely* do
*not* want to run xfs_fsr to remove it. If you really must remove
speculative prealloc, then run:

# xfs_spaceman -c "prealloc -m 0" <mntpt>

That will remove all speculative preallocation currently held on
all in-memory inodes via an immediate blockgc pass.

If you just want to remove post-eof blocks on a single file, then
find out the file size with stat and truncate it to the same size.
The truncate won't change the file size, but it will remove all
blocks beyond EOF.
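
As a two-line sketch (the path is illustrative):

```shell
f=/some/large/file   # illustrative path
# Truncating to the current size leaves the data untouched, but
# tells the filesystem to drop any blocks allocated beyond EOF.
truncate -s "$(stat -c '%s' "$f")" "$f"
```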

*However*

You should never need to do this, as there are several automated
triggers to remove it, all firing when the filesystem detects that
the file is no longer being actively modified. One
trigger is the last close of a file descriptor, another is the
periodic background blockgc worker, and another is memory reclaim
removing the inode from memory.

In all cases, these are triggers that indicate that the file is not
currently being written to, and hence the speculative prealloc is
not needed anymore and so can be removed.

So you should never have to remove it manually.
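
If you're merely curious whether a file is currently carrying
post-EOF allocation, comparing allocated space against apparent
size is usually enough - a sketch, and note that a small excess
can also be plain block-size rounding:

```shell
f=/path/to/file   # illustrative path
size=$(stat -c '%s' "$f")                                   # apparent size, bytes
alloc=$(( $(stat -c '%b' "$f") * $(stat -c '%B' "$f") ))    # allocated, bytes
if [ "$alloc" -gt "$size" ]; then
    echo "allocation beyond apparent size: $(( alloc - size )) bytes"
fi
```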

> but why do the inodes of those files remain dirty for at least 300 seconds after the file is closed, and miss the automatic removal of the preallocation?

What do you mean by "dirty"? A file with post-eof preallocation is
not dirty in any way once the data in the file has been written
back (usually within 30s).

> we are running on CentOS Stream release 8 with 4.18.0-383.el8.x86_64
> 
> but we never saw anything similar on CentOS Linux release 7.9.2009 (Core) with 3.10.0-1160.45.1.el7.x86_64
> (for a similar pattern of file sizes, but admittedly with a different distributed storage application)

RHEL 7/CentOS 7 had this same behaviour - it was introduced in
2.6.38. All your observation means is that the application running
on RHEL 7 was writing the files in a way that didn't trigger
speculative prealloc beyond EOF, not that speculative prealloc
beyond EOF didn't exist....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 6+ messages
     [not found] <1586129076.70820212.1659538177737.JavaMail.zimbra@ijclab.in2p3.fr>
2022-08-03 14:56 ` s_bmap and flags explanation Emmanouil Vamvakopoulos
2022-08-03 15:54   ` Carlos Maiolino
2022-08-03 21:59   ` Dave Chinner
2022-08-04 10:25     ` Emmanouil Vamvakopoulos
2022-08-04 13:30       ` Carlos Maiolino
2022-08-04 22:55       ` Dave Chinner [this message]
