public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs: Temporary extra disk space consumption?
Date: Thu, 24 Mar 2022 06:16:47 +1100	[thread overview]
Message-ID: <20220323191647.GT1544202@dread.disaster.area> (raw)
In-Reply-To: <26806b4a-5953-e45e-3f89-cff2020309b6@I-love.SAKURA.ne.jp>

On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I found that running a sample program shown below on xfs filesystem
> results in consuming extra disk space until close() is called.
> Is this expected result?

Yes. It's an anti-fragmentation mechanism that is intended to
prevent ecessive fragmentation when many files are being written at
once.

> I don't care if temporarily consumed extra disk space is trivial. But since
> this amount as of returning from fsync() is as much as amount of written data,
> I worry that there might be some bug.
> 
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
> 	static char buffer[1048576];
> 	const char *filename = "my_testfile";
> 	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> 	int i;

Truncate to zero length - all writes will be sequential extending
EOF.

> 
> 	if (fd == EOF)
> 		return 1;
> 	printf("Before write().\n");
> 	system("/bin/df -m .");
> 	for (i = 0; i < 1024; i++)
> 		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> 			return 1;

And then wrote 1GB of sequential data. Without looking yet at your
results, I would expect between about 1.5 and 2GB of space was
allocated.

> 	if (fsync(fd))
> 		return 1;

This will allocate it all as a single unwritten extent if possible,
then write the 1GB of data to it converting that range to written.

Check your file size here - it will be 1GB. You can't read beyond
EOF, so the extra allocation in not accesible. It's also unwritten,
so even if you could read beyond EOF, you can't read any data from
the range because reads of unwritten extents return zeros.

> 	printf("Before close().\n");
> 	system("/bin/df -m .");
> 	if (close(fd))
> 		return 1;

This will run ->release() which will remove any extra allocation
we do at write() and result in just the written data up to EOF
remaining allocated on disk.

> 	printf("Before unlink().\n");
> 	system("/bin/df -m .");
> 	if (unlink(filename))
> 		return 1;
> 	printf("After unlink().\n");
> 	system("/bin/df -m .");
> 	return 0;
> }
> ---------- my_write_unlink.c ----------
> 
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /
> Before close().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 132443    123432  52% /

Yup, 2GB of space allocated.

> Before unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 131416    124459  52% /

and ->release trims extra allocation beyond EOF and now you are
back to just the 1GB the file consumes.

> After unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /

And now it's all gone.

> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
> 
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64

Same.

> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64

Same.

Looks like specualtive preallocation for sequential writes is
behaving exactly as designed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2022-03-23 19:16 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 11:21 xfs: Temporary extra disk space consumption? Tetsuo Handa
2022-03-23 16:51 ` Darrick J. Wong
2022-03-23 19:16 ` Dave Chinner [this message]
2022-03-23 23:28   ` Tetsuo Handa
2022-03-24  1:13     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220323191647.GT1544202@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox