From: Dave Chinner <david@fromorbit.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs: Temporary extra disk space consumption?
Date: Thu, 24 Mar 2022 06:16:47 +1100
Message-ID: <20220323191647.GT1544202@dread.disaster.area>
In-Reply-To: <26806b4a-5953-e45e-3f89-cff2020309b6@I-love.SAKURA.ne.jp>
On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
>
> I found that running a sample program shown below on xfs filesystem
> results in consuming extra disk space until close() is called.
> Is this expected result?
Yes. It's an anti-fragmentation mechanism that is intended to
prevent excessive fragmentation when many files are being written at
once.
> I don't care if temporarily consumed extra disk space is trivial. But since
> this amount as of returning from fsync() is as much as amount of written data,
> I worry that there might be some bug.
>
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> static char buffer[1048576];
> const char *filename = "my_testfile";
> const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> int i;
Truncate to zero length - all writes will be sequential extending
EOF.
>
> if (fd == EOF)
> return 1;
> printf("Before write().\n");
> system("/bin/df -m .");
> for (i = 0; i < 1024; i++)
> if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> return 1;
And then wrote 1GB of sequential data. Without looking yet at your
results, I would expect between about 1.5 and 2GB of space was
allocated.
> if (fsync(fd))
> return 1;
This will allocate it all as a single unwritten extent if possible,
then write the 1GB of data to it, converting that range to written.
Check your file size here - it will be 1GB. You can't read beyond
EOF, so the extra allocation is not accessible. It's also unwritten,
so even if you could read beyond EOF, you couldn't read any data from
the range because reads of unwritten extents return zeros.
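
One way to see this from inside the test program is to compare the
file size with the space backing it, using fstat() on the open
descriptor. The following is only a minimal sketch: it assumes Linux,
where st_blocks is counted in 512-byte units, and the helper names
(allocated_bytes, excess_beyond_eof, report_allocation) are made up for
illustration. Note that the excess also includes ordinary rounding up
to the filesystem block size, so a small non-zero value does not by
itself indicate preallocation.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Assumption: Linux reports st_blocks in 512-byte units. */
#define ST_BLOCK_UNIT 512LL

/* Bytes of disk space backing the file, derived from st_blocks. */
long long allocated_bytes(long long st_blocks)
{
	assert(st_blocks >= 0);
	return st_blocks * ST_BLOCK_UNIT;
}

/* Allocation in excess of the file size; 0 if the file is sparse or exact. */
long long excess_beyond_eof(long long st_size, long long st_blocks)
{
	long long alloc = allocated_bytes(st_blocks);

	return alloc > st_size ? alloc - st_size : 0;
}

/* Hypothetical helper: print size vs. allocation for an open descriptor. */
int report_allocation(int fd, const char *label)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	printf("%s: size=%lld allocated=%lld excess=%lld\n", label,
	       (long long)st.st_size,
	       allocated_bytes((long long)st.st_blocks),
	       excess_beyond_eof((long long)st.st_size,
				 (long long)st.st_blocks));
	return 0;
}
```

Calling report_allocation(fd, "after fsync") right after the fsync()
should show size staying at 1GB while the allocated figure is larger,
if speculative preallocation is in effect on that file.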
> printf("Before close().\n");
> system("/bin/df -m .");
> if (close(fd))
> return 1;
This will run ->release() which will remove any extra allocation
we do at write() and result in just the written data up to EOF
remaining allocated on disk.
> printf("Before unlink().\n");
> system("/bin/df -m .");
> if (unlink(filename))
> return 1;
> printf("After unlink().\n");
> system("/bin/df -m .");
> return 0;
> }
> ---------- my_write_unlink.c ----------
>
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 130392 125483 51% /
> Before close().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 132443 123432 52% /
Yup, 2GB of space allocated.
> Before unlink().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 131416 124459 52% /
and ->release trims extra allocation beyond EOF and now you are
back to just the 1GB the file consumes.
> After unlink().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 130392 125483 51% /
And now it's all gone.
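
For reference, the arithmetic behind those three observations can be
spelled out from the "Used" column of the df -m output above. This is
just a sketch: the constant and function names are made up for
illustration, and it assumes nothing else was writing to the
filesystem while the test ran.

```c
#include <assert.h>
#include <stdio.h>

/* "Used" column (1M blocks) copied from the df -m output in this thread. */
enum {
	USED_BEFORE_WRITE  = 130392,
	USED_BEFORE_CLOSE  = 132443,
	USED_BEFORE_UNLINK = 131416,
	USED_AFTER_UNLINK  = 130392,
};

/* Space attributable to the test file, relative to the starting point. */
long used_delta_mb(long used_now, long used_baseline)
{
	/* Assumption: no other activity freed space in the meantime. */
	assert(used_now >= used_baseline);
	return used_now - used_baseline;
}

/* Print the three deltas discussed above. */
void print_df_deltas(void)
{
	printf("after fsync:  %ld MB\n",
	       used_delta_mb(USED_BEFORE_CLOSE, USED_BEFORE_WRITE));
	printf("after close:  %ld MB\n",
	       used_delta_mb(USED_BEFORE_UNLINK, USED_BEFORE_WRITE));
	printf("after unlink: %ld MB\n",
	       used_delta_mb(USED_AFTER_UNLINK, USED_BEFORE_WRITE));
}
```

That gives 2051MB in use after fsync(), 1024MB after close() and 0MB
after unlink(), i.e. roughly 1GB of preallocation beyond EOF that
goes away at close().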
> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
>
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64
Same.
> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64
Same.
Looks like speculative preallocation for sequential writes is
behaving exactly as designed....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com