From: Matt Bobrowski <repnop@google.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Subject: Re: General Filesystem Question - Interesting Unexplainable Observation
Date: Wed, 2 Nov 2022 03:07:55 +0000	[thread overview]
Message-ID: <Y2HfC3VmWB/iadLU@google.com> (raw)
In-Reply-To: <20221031112237.kgr64levqo3dxoj5@quack3>

Hey Jan,

Thanks for getting back to me.

On Mon, Oct 31, 2022 at 12:22:37PM +0100, Jan Kara wrote:
> Hi Matthew!
> 
> [added ext4 mailing list to CC, maybe others have more ideas]
> 
> On Fri 28-10-22 23:23:14, Matt Bobrowski wrote:
> > Just had a general question regarding some filesystem (ext4) behaviour
> > I've recently observed, which made me raise my eyebrows a little, and I
> > wanted to understand why it was happening.
> > 
> > We have an application (a single-threaded process) that performs the
> > following sequence of filesystem operations using buffered I/O:
> > 
> > ---
> > fd = open("dir/tmp/filename.new", O_WRONLY | O_CREAT | O_TRUNC, 0400);
> > ...
> > write(fd, buf, sizeof(buf));
> > ...
> > rename("dir/tmp/filename.new", "dir/new/filename");
> > ---
> > 
> > At times, I see "dir/new/filename" reporting a file size of 0 bytes,
> > even though sizeof(buf), the amount written to "dir/tmp/filename.new",
> > is always guaranteed to be > 0 and the write is reported as successful.
> > This is the part I cannot come up with a valid explanation for (yet).
> 
> So by "file size reporting 0 bytes" do you mean that
> stat("dir/new/filename") from a concurrent process returns file size 0
> sometimes?

Not quite. I mean that stat("dir/new/filename") reports 0 bytes long
after the write(2) operation has occurred, i.e. when I manually stat the
files in a shell. IOW, I'm seeing 0-byte files lying around when they
well and truly should have had bytes written out to them (before a
write(2) is issued, we check that the supplied buffer actually has
something in it).
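The distinction Jan draws can be checked with a minimal sketch. It assumes a POSIX system, and the helper name and file paths are hypothetical. The sketch repeats the open/write/rename sequence with no fsync and then calls stat(2) on the destination. The reported st_size reflects the write(2) as soon as the call returns, whether or not writeback has happened. On a running system, a successful write followed by stat(2) is therefore not expected to show 0 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to tmp_path, rename it to
 * dst_path WITHOUT any fsync, then return the size stat(2) reports for
 * dst_path (or -1 on error). Mirrors the open/write/rename sequence above. */
static off_t size_seen_after_rename(const char *tmp_path, const char *dst_path,
                                    const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0400);
    if (fd < 0)
        return -1;

    /* write(2) may transfer fewer bytes than requested; loop until done. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    if (close(fd) < 0)
        return -1;

    /* No fsync: the data may still live only in the page cache here. */
    if (rename(tmp_path, dst_path) < 0)
        return -1;

    /* st_size comes from the in-memory inode, which write(2) has updated. */
    struct stat st;
    if (stat(dst_path, &st) < 0)
        return -1;
    return st.st_size;
}
```

For example, size_seen_after_rename("dir/tmp/filename.new", "dir/new/filename", buf, len) should return len, provided both directories already exist.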

> Or do you refer to a situation after an unclean filesystem
> shutdown?

It could very well be from an unclean shutdown, but it's really hard
to say whether this is the culprit or not.

> > Understandably,
> > there's currently no fsync performed after calling write, which I think
> > needs to be corrected, but I also can't see how skipping fsync after the
> > write would result in the file size for "dir/new/filename" being reported
> > as 0 bytes. One of the things that crossed my mind was that the rename
> > operation was possibly being committed before the dirty pages in the
> > page cache were flushed, but even so, I don't see how a rename would
> > result in the data blocks associated with the write never being committed
> > for the same underlying inode.
> > 
> > What are your thoughts? Any plausible explanation why I might be seeing
> > this odd behaviour?
> 
> Ext4 uses delayed allocation. That means that write(2) just stores data in
> the page cache but no blocks are allocated yet. So indeed rename(2) can be
> fully committed in the journal before any of the data gets to persistent
> storage. That being said, ext4 has a workaround for buggy applications (it
> can be disabled with the "noauto_da_alloc" mount option) that starts data
> writeback before the rename is done, so at least in data=ordered mode you
> should not see 0-length files after a crash with the above scheme.
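Jan's point about delayed allocation is why the usual remedy is to fsync(2) the temporary file before the rename(2) and then fsync(2) the containing directory, rather than relying on the ext4-specific workaround. The following is a minimal sketch of that pattern. It assumes a POSIX system, and the helper name and parameters are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace dst_path with len bytes from buf, syncing the
 * data before the rename and the directory after it. Returns 0 on success. */
static int replace_durably(const char *dir_path, const char *tmp_path,
                           const char *dst_path, const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0400);
    if (fd < 0)
        return -1;

    /* Handle short writes and EINTR. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Flush the file's data and metadata BEFORE the rename, so a crash after
     * the rename should not expose the new name with missing data. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    if (rename(tmp_path, dst_path) < 0)
        return -1;

    /* Persist the directory entry change made by the rename. */
    int dfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Only the destination directory is synced here. A cross-directory rename such as "dir/tmp/..." to "dir/new/..." modifies both directories, so a cautious caller may sync the source directory as well.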

Right, we are using buffered I/O after all... However, even if the
rename(2) operation took place and was fully committed to the journal
before the dirty pages associated with the prior write(2) had been
written back, I wouldn't expect the data to go missing. IOW, the
write(2) and rename(2) operations take effect on the same backing
inode, no?
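On the same-inode point, the following minimal POSIX sketch confirms that rename(2) only moves the directory entry. The helper name and file names are hypothetical. The inode behind a descriptor opened before the rename is the one the new name resolves to afterwards, and a write through that descriptor is visible via the new name. The sketch says nothing about what reaches disk before a crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if dst_path names the same inode as the
 * descriptor opened on tmp_path before the rename (and shows the write made
 * through that descriptor), 0 if not, -1 on error. */
static int rename_keeps_inode(const char *tmp_path, const char *dst_path)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0400);
    if (fd < 0)
        return -1;

    struct stat before, after;
    int ok = fstat(fd, &before) == 0 &&
             rename(tmp_path, dst_path) == 0 &&
             /* Write through the old descriptor AFTER the rename. */
             write(fd, "x", 1) == 1 &&
             stat(dst_path, &after) == 0;
    close(fd);
    if (!ok)
        return -1;

    /* Same device and inode number means the same file. */
    return before.st_dev == after.st_dev && before.st_ino == after.st_ino &&
           after.st_size == 1;
}
```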

/M


Thread overview: 3+ messages
     [not found] <CAJBvgGfv9zsE4PEnuuVqKhiKfpbrxk=kXG4pp5AAMOXyVc5-bQ@mail.gmail.com>
2022-10-31 11:22 ` General Filesystem Question - Interesting Unexplainable Observation Jan Kara
2022-11-02  3:07   ` Matt Bobrowski [this message]
2022-11-02 14:22     ` Jan Kara
