* Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
@ 2026-02-18 13:29 Vyacheslav Kovalevsky
2026-02-18 21:55 ` Andreas Dilger
2026-02-24 14:47 ` Christoph Hellwig
0 siblings, 2 replies; 10+ messages in thread
From: Vyacheslav Kovalevsky @ 2026-02-18 13:29 UTC (permalink / raw)
To: tytso, adilger.kernel; +Cc: linux-ext4, linux-kernel
Detailed description
====================
Hello, there seems to be an issue with ext4 crash behavior:
1. Create and sync a new file.
2. Open the file and write some data (must be more than 4096 bytes).
3. Close the file.
4. Open the file with O_SYNC flag and write some data.
After a system crash the file will have the wrong size and some previously
written data will be lost.
According to the Linux manual
<https://man7.org/linux/man-pages/man2/open.2.html>, O_SYNC can be replaced
with an fsync() call after each write operation:
```
By the time write(2) (or similar) returns, the output data
and associated file metadata have been transferred to the
underlying hardware (i.e., as though each write(2) was
followed by a call to fsync(2)).
```
In this case that is not true: using O_SYNC does not persist the data the way
fsync() does (see test below).
System info
===========
Linux version 6.19.2
How to reproduce
================
```
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFFER_LEN 5000 // must exceed 4096 (at least 4096+1)

int main() {
    int status;
    int file_fd0;
    int file_fd1;
    int file_fd2;

    // Fill the buffer with a recognizable byte pattern (0x00, 0x01, ...).
    char buffer[BUFFER_LEN + 1] = {};
    for (int i = 0; i <= BUFFER_LEN; ++i) {
        buffer[i] = (char)i;
    }

    // Step 1: create an empty file and sync so that its creation is durable.
    status = creat("file", S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
    printf("CREAT: %d\n", status);
    file_fd0 = status;

    status = close(file_fd0);
    printf("CLOSE: %d\n", status);

    sync();

    // Steps 2-3: buffered write of BUFFER_LEN bytes through a plain
    // (non-O_SYNC) descriptor, then close it without fsync().
    status = open("file", O_WRONLY);
    printf("OPEN: %d\n", status);
    file_fd1 = status;

    status = write(file_fd1, buffer, BUFFER_LEN);
    printf("WRITE: %d\n", status);

    status = close(file_fd1);
    printf("CLOSE: %d\n", status);

    // Step 4: reopen with O_SYNC and overwrite the first 10 bytes.
    status = open("file", O_WRONLY | O_SYNC);
    printf("OPEN: %d\n", status);
    file_fd2 = status;

    status = write(file_fd2, "Test data!", 10);
    printf("WRITE: %d\n", status);

    status = close(file_fd2);
    printf("CLOSE: %d\n", status);
}
// after crash file size is 4096 instead of 5000
```
Output:
```
CREAT: 3
CLOSE: 0
OPEN: 3
WRITE: 5000
CLOSE: 0
OPEN: 3
WRITE: 10
CLOSE: 0
```
File content after crash:
```
$ xxd file
00000000: 5465 7374 2064 6174 6121 0a0b 0c0d 0e0f Test data!......
00000010: 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f ................
00000020: 2021 2223 2425 2627 2829 2a2b 2c2d 2e2f !"#$%&'()*+,-./
.........
00000ff0: f0f1 f2f3 f4f5 f6f7 f8f9 fafb fcfd feff ................
```
Steps:
1. Create and mount a new ext4 file system in the default configuration.
2. Change directory to the root of the file system and run the compiled test.
3. Cause a hard system crash (e.g. QEMU `system_reset` command).
4. Remount the file system after the crash.
5. Observe that the file size is 4096 instead of 5000.
Notes:
- This also seems to affect XFS in the same way.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-18 13:29 Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes Vyacheslav Kovalevsky
@ 2026-02-18 21:55 ` Andreas Dilger
2026-02-19 13:32 ` Theodore Tso
2026-02-24 14:47 ` Christoph Hellwig
1 sibling, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2026-02-18 21:55 UTC (permalink / raw)
To: Vyacheslav Kovalevsky; +Cc: tytso, linux-ext4, linux-kernel
On Feb 18, 2026, at 06:29, Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com> wrote:
>
> Detailed description
> ====================
>
> Hello, there seems to be an issue with ext4 crash behavior:
>
> 1. Create and sync a new file.
> 2. Open the file and write some data (must be more than 4096 bytes).
> 3. Close the file.
> 4. Open the file with O_SYNC flag and write some data.
>
> After system crash the file will have the wrong size and some previously written data will be lost.
>
> According to Linux manual <https://man7.org/linux/man-pages/man2/open.2.html> O_SYNC can be replaced with fsync() call after each write operation:
>
> ```
> By the time write(2) (or similar) returns, the output data
> and associated file metadata have been transferred to the
> underlying hardware (i.e., as though each write(2) was
> followed by a call to fsync(2)).
> ```
>
> In this case it is not true, using O_SYNC does not persist the data like fsync() does (see test below).
>
> Notes:
> - This also seems to affect XFS in the same way.
Well, the O_SYNC flag has to be on the file descriptor where writes are done.
In your case, the "write some data" at the start is done on a file descriptor
that does *not* have O_SYNC, so the semantics of that flag do not apply to
those initial writes. It is the same as O_TRUNC or O_DIRECT or other flags
only affecting the file descriptor where it is used, not some earlier or later
file descriptor.
Either the "write some data" phase must also use O_SYNC, or call fsync() on
that file descriptor before closing it, or call fsync() on the later file
descriptor (assuming persistence of the initial writes does not matter until
the later writes are done).
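The second option can be sketched as follows. This is a minimal sketch, not code from the report: the helper name write_and_fsync() is made up for illustration, and it assumes a POSIX system where fsync() is supported on regular files.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the original test program): write `len`
 * bytes to `path` through a plain, non-O_SYNC descriptor and make them
 * durable with fsync() before closing. Returns 0 on success, -1 on error.
 */
static int write_and_fsync(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is handed over. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* fsync() covers all dirty data of the file, not only the last write. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Using a helper like this for the initial 5000-byte write would make that data durable before the later O_SYNC write is issued, so the later write no longer has to cover it.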
If anything, the man page should be updated to be more precise, like:
"the *just written* output data *on that file descriptor* and associated
file metadata have been transferred to the underlying hardware (i.e.
as though each write(2) was followed by a call to sync_file_range(2)
for the corresponding file offset(s))"
Cheers, Andreas
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-18 21:55 ` Andreas Dilger
@ 2026-02-19 13:32 ` Theodore Tso
2026-02-23 12:46 ` Alejandro Colomar
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Tso @ 2026-02-19 13:32 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Vyacheslav Kovalevsky, linux-ext4, linux-kernel, linux-man
+linux-man
On Wed, Feb 18, 2026 at 02:55:13PM -0700, Andreas Dilger wrote:
> If anything, the man page should be updated to be more precise, like:
>
> "the *just written* output data *on that file descriptor* and associated
> file metadata have been transferred to the underlying hardware (i.e.
> as though each write(2) was followed by a call to sync_file_range(2)
> for the corresponding file offset(s))"
Yeah, this is an inaccuracy in the man page; the definition of O_SYNC
from the Single Unix Specification states:
O_SYNC Write I/O operations on the file descriptor shall complete
^^^^^^^^^^^^^^^^^^^^^^
as defined by synchronized I/O file integrity completion.
Compare and contrast this to what's in the Linux manpage:
O_SYNC Write operations on the file will complete according to the re‐
quirements of synchronized I/O file integrity completion (by con‐
trast with the synchronized I/O data integrity completion pro‐
vided by O_DSYNC.)
By the time write(2) (or similar) returns, the output data and
associated file metadata have been transferred to the underlying
hardware (i.e., as though each write(2) was followed by a call to
fsync(2)). See VERSIONS.
The parenthetical comment in the second paragraph needs to be removed,
since fsync specifies that all dirty information in the page cache
will be flushed out. From the fsync man page:
fsync() transfers ("flushes") all modified in-core data of (i.e., modi‐
fied buffer cache pages for) the file referred to by the file descriptor
fd to the disk device (or other permanent storage device) so that all
changed information can be retrieved even if the system crashes or is
rebooted. This includes writing through or flushing a disk cache if
present. The call blocks until the device reports that the transfer has
completed.
I'll also mention that the fsync man page doesn't really talk about
its interaction with O_DIRECT writes. This is mentioned in the
open(2) man page, and people who use O_DIRECT are generally expected
to know what they are doing. But in the context of
O_DIRECT writes, the fsync(2) call is also used to make sure that a
CACHE FLUSH or equivalent command is sent to the storage device, such
that the O_DIRECT write is guaranteed to persist after a power
failure.
Cheers,
- Ted
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-19 13:32 ` Theodore Tso
@ 2026-02-23 12:46 ` Alejandro Colomar
2026-02-23 19:32 ` Theodore Tso
0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2026-02-23 12:46 UTC (permalink / raw)
To: Theodore Tso
Cc: Andreas Dilger, Vyacheslav Kovalevsky, linux-ext4, linux-kernel,
linux-man
Hi Ted, Andreas,
On 2026-02-19T08:32:44-0500, Theodore Tso wrote:
> +linux-man
>
> On Wed, Feb 18, 2026 at 02:55:13PM -0700, Andreas Dilger wrote:
> > If anything, the man page should be updated to be more precise, like:
> >
> > "the *just written* output data *on that file descriptor* and associated
> > file metadata have been transferred to the underlying hardware (i.e.
> > as though each write(2) was followed by a call to sync_file_range(2)
> > for the corresponding file offset(s))"
>
> Yeah, this is an inaccuracy in the man page; the definition of O_SYNC
> from the Single Unix Specification states:
>
> O_SYNC Write I/O operations on the file descriptor shall complete
> ^^^^^^^^^^^^^^^^^^^^^^
> as defined by synchronized I/O file integrity completion.
>
> Compare and contrast this to what's in the Linux manpage:
>
> O_SYNC Write operations on the file will complete according to the re‐
> quirements of synchronized I/O file integrity completion (by con‐
> trast with the synchronized I/O data integrity completion pro‐
> vided by O_DSYNC.)
>
> By the time write(2) (or similar) returns, the output data and
> associated file metadata have been transferred to the underlying
> hardware (i.e., as though each write(2) was followed by a call to
> fsync(2)). See VERSIONS.
>
> The parenthetical comment in the second paragraph needs to be removed,
> since fsync specifies that all dirty information in the page cache
> will be flushed out.
Would you mind checking the text in VERSIONS (since there's a reference
to it right next to the text you're proposing to remove)? I suspect it
will also need to be updated accordingly. I don't feel qualified to
touch that text by myself.
If you'd write a patch, I'd appreciate that.
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-23 12:46 ` Alejandro Colomar
@ 2026-02-23 19:32 ` Theodore Tso
2026-02-24 1:21 ` Andreas Dilger
2026-03-03 13:19 ` Alejandro Colomar
0 siblings, 2 replies; 10+ messages in thread
From: Theodore Tso @ 2026-02-23 19:32 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Andreas Dilger, Vyacheslav Kovalevsky, linux-ext4, linux-kernel,
linux-man
On Mon, Feb 23, 2026 at 01:46:54PM +0100, Alejandro Colomar wrote:
> Hi Ted, Andreas,
>
> > The parenthetical comment in the second paragraph needs to be removed,
> > since fsync specifies that all dirty information in the page cache
> > will be flushed out.
>
> Would you mind checking the text in VERSIONS (since there's a reference
> to it right next to the text you're proposing to remove)? I suspect it
> will also need to be updated accordingly. I don't feel qualified to
> touch that text by myself.
The text in VERSIONS is not incorrect, in that it is talking about the
distinction of O_SYNC and O_DSYNC in terms of which kinds of metadata
will be persisted.
However, the reason all of this information regarding Synchronized
I/O is in VERSIONS is to describe the historic behaviour of Linux
version 2.6.33 versus more modern versions of Linux. But 2.6.33 dates
from February 24, 2010 --- 16 years ago. So it might be simpler if we
simply dropped this kind of historical information. But if you do
want to keep it, we should move the bulk of that information into
O_SYNC and O_DSYNC.
So maybe:
O_DSYNC
Write operations on the file will complete according to the re‐
quirements of synchronized I/O data integrity completion.
By the time write(2) (and similar) return, the output data has
been transferred to the underlying hardware, along with any file
metadata that would be required to retrieve that data.
See VERSIONS for a description of how historical versions
of the Linux kernel from 2010 behaved.
O_SYNC Write operations on the file will complete according to the re‐
quirements of synchronized I/O file integrity completion (by con‐
trast with the synchronized I/O data integrity completion pro‐
vided by O_DSYNC.)
By the time write(2) (or similar) returns, the output
data and all file metadata associated with the inode for the
opened file have been transferred to the underlying
hardware.
See VERSIONS for a description of how historical versions
of the Linux kernel from 2010 behaved.
VERSIONS
Before Linux 2.6.33, Linux implemented only the O_SYNC flag for
open(). However, when that flag was specified, most
filesystems actually provided the equivalent of synchronized
I/O data integrity completion (i.e., O_SYNC was actually
implemented as the equivalent of O_DSYNC).
I'd suggest dropping everything else in VERSIONS, including the
discussion of O_RSYNC. All of that is much more appropriate for a
tutorial.
If you really want to keep all of that text, perhaps it could be moved
into a synchronized-io man page in section 7. There we can talk
about the difference between fsync() and fdatasync(), which is interesting
as a conceptual model and is conceptually similar to the difference
between O_SYNC and O_DSYNC. What differs is which data will be written
back (the data that was written via the file descriptor where the
O_SYNC/O_DSYNC flag was set, either via open or fcntl, versus all
buffered data in the buffer cache). The synchronized-io man page
could also have more of the information around O_DIRECT in one place.
> If you'd write a patch, I'd appreciate that.
Well, there's the question of what minimal change is needed
to fix out-and-out inaccuracies; for that, we can just delete some
parenthetical comments.
BTW, if we want to delete inaccurate information, I'd also suggest
deleting the following text in the O_DIRECT section of the man page:
A semantically similar (but deprecated) interface for block
devices is described in raw(8).
----
Then there's trying to rearrange the tutorial-style information for
people who want to implement code which needs data persistence
guarantees. That's quite a lot more work, and while I'm happy to
review or assist someone to write that more expansive tutorial
material, it's not something I'm willing to sign up to do.
----
Finally, there are some philosophical questions about the goals
of the Linux kernel man pages --- how important is having historical
information (for example, O_DIRECT has a "since 2.4.10", which is 25
years ago --- really)? And how important is it to have tutorial
information, and where should that information be organized in
the man pages?
My personal opinion is that the primary priority of the Linux man pages
is to document the specification of the kernel interfaces that we
expose to user space. Things like tutorial material and descriptions
of historical versions are of secondary importance.
I'd also advocate dropping historical information for kernel versions
which are older than, say, 7 years. Currently the oldest LTS kernel
which is supported upstream is 5.10, which was originally released in
2020, and will EOL by end of 2026. The Linux kernel 5.0 was released
on March 3, 2019, so using a 7 year lookback means that explanations
of how the Linux kernel behaved in 2.4.x, 2.6.y, 3.x, 4.x, etc. can be
dropped from the man pages, since IMHO that will reduce a lot of noise
that will likely confuse readers.
But that's a call for Alex and the man pages project to make.
Cheers,
- Ted
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-23 19:32 ` Theodore Tso
@ 2026-02-24 1:21 ` Andreas Dilger
2026-03-03 13:19 ` Alejandro Colomar
1 sibling, 0 replies; 10+ messages in thread
From: Andreas Dilger @ 2026-02-24 1:21 UTC (permalink / raw)
To: Theodore Tso
Cc: Alejandro Colomar, Vyacheslav Kovalevsky, linux-ext4,
linux-kernel, linux-man
On Feb 23, 2026, at 12:32, Theodore Tso <tytso@mit.edu> wrote:
>
> On Mon, Feb 23, 2026 at 01:46:54PM +0100, Alejandro Colomar wrote:
>> Hi Ted, Andreas,
>>
>>> The parenthetical comment in the second paragraph needs to be removed,
>>> since fsync specifies that all dirty information in the page cache
>>> will be flushed out.
>>
>> Would you mind checking the text in VERSIONS (since there's a reference
>> to it right next to the text you're proposing to remove)? I suspect it
>> will also need to be updated accordingly. I don't feel qualified to
>> touch that text by myself.
>
> The text in VERSIONS is not incorrect, in that it is talking about the
> distinction of O_SYNC and O_DSYNC in terms of which kinds of metadata
> will be persisted.
>
> However, the reason why all of this information regarding Synchronized
> I/O is in VERSIONS is describing the historic behaviour of Linux
> version 2.6.33 versus more modern versions of Linux. But 2.6.33 dates
> from February 24, 2010 --- 16 years ago. So it might be simpler if we
> simply dropped this kind of historical information. But if you do
> want to keep it, we should move the bulk of that information into
> O_SYNC and O_DSYNC.
>
> So maybe:
>
> O_DSYNC
> Write operations on the file will complete according to the
> requirements of synchronized I/O data integrity completion.
Should this be more specific to say "on a file descriptor opened with this flag" or "on this file descriptor", since the original thread was about whether *any* data written to the "file" would also be persisted...
> By the time write(2) (and similar) return, the output data has
> been transferred to the underlying hardware, along with any file
> metadata that would be required to retrieve that data.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
>
> O_SYNC Write operations on the file will complete according to the re‐
> quirements of synchronized I/O file integrity completion (by con‐
> trast with the synchronized I/O data integrity completion pro‐
> vided by O_DSYNC.)
Same, "on this file descriptor" or similar.
> By the time write(2) (or similar) returns, the output
> data and all file metadata associated with the inode for the
> opened file have been transferred to the underlying
> hardware.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
>
> VERSIONS
> Before Linux 2.6.33, Linux implemented only the O_SYNC flag for
> open(). However, when that flag was specified, most
> filesystems actually provided the equivalent of synchronized
> I/O data integrity completion (i.e., O_SYNC was actually
> implemented as the equivalent of O_DSYNC).
>
> I'd suggest dropping everything else in VERSIONS, including the
> discussion of O_RSYNC. All of that is much more appropriate for a
> tutorial.
IMHO, agreed. If users are running really old versions of Linux then it
is likely they will have suitably old versions of the man pages as well.
There has to be some balance between highlighting potential interop issues
that an application developer might see vs. cluttering the text so that
readers are not clear _what_ the right semantics are.
Cheers, Andreas
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-18 13:29 Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes Vyacheslav Kovalevsky
2026-02-18 21:55 ` Andreas Dilger
@ 2026-02-24 14:47 ` Christoph Hellwig
2026-02-24 22:23 ` Darrick J. Wong
1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2026-02-24 14:47 UTC (permalink / raw)
To: Vyacheslav Kovalevsky; +Cc: tytso, adilger.kernel, linux-ext4, linux-kernel
A lot of folks have already explained the O_SYNC semantics correctly,
but I have another major question about your test case.
On Wed, Feb 18, 2026 at 04:29:30PM +0300, Vyacheslav Kovalevsky wrote:
> Detailed description
> ====================
>
> Hello, there seems to be an issue with ext4 crash behavior:
>
> 1. Create and sync a new file.
> 2. Open the file and write some data (must be more than 4096 bytes).
> 3. Close the file.
> 4. Open the file with O_SYNC flag and write some data.
>
> After system crash the file will have the wrong size and some previously
> written data will be lost.
The wrong size here seems incorrect. Even if the old data written
through the non-O_SYNC fd wasn't written out I absolutely can't see how
the file would have an incorrect size here. Can you please share your
test case?
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-24 14:47 ` Christoph Hellwig
@ 2026-02-24 22:23 ` Darrick J. Wong
2026-02-25 14:20 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2026-02-24 22:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Vyacheslav Kovalevsky, tytso, adilger.kernel, linux-ext4,
linux-kernel
On Tue, Feb 24, 2026 at 06:47:19AM -0800, Christoph Hellwig wrote:
> A lot of folks have already explained the O_SYNC semantics correctly,
> but I have another major question about your test case.
>
> On Wed, Feb 18, 2026 at 04:29:30PM +0300, Vyacheslav Kovalevsky wrote:
> > Detailed description
> > ====================
> >
> > Hello, there seems to be an issue with ext4 crash behavior:
> >
> > 1. Create and sync a new file.
> > 2. Open the file and write some data (must be more than 4096 bytes).
> > 3. Close the file.
> > 4. Open the file with O_SYNC flag and write some data.
> >
> > After system crash the file will have the wrong size and some previously
> > written data will be lost.
>
> The wrong size here seems incorrect. Even if the old data written
> through the non-O_SYNC fd wasn't written out I absolutely can't see how
> the file would have an incorrect size here. Can you please share your
> test case?
He did, way at the beginning: open a file, write 5000 bytes, close it,
open again with O_SYNC, write 300 bytes, close it, force-reboot, and
watch the file come back up with only 4096 bytes written.
I /think/ that's because generic_write_sync only flushes the range that
was dirtied by the write() call, so only the first 4k gets written back
to disk. xfs and ext4 exhibit this behavior; vfat and btrfs persist all
50000 bytes.
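The page arithmetic behind that guess can be sketched as follows. This is only an illustration, not kernel code: PAGE_SZ and the helper names are hypothetical, and a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size, matching the 4096 seen in the report */

/* Hypothetical helper: index of the page that contains byte offset `off`. */
static uint64_t page_of(uint64_t off)
{
    return off / PAGE_SZ;
}

/* Hypothetical helper: number of pages touched by `len` bytes written at `off`. */
static uint64_t pages_touched(uint64_t off, uint64_t len)
{
    assert(len > 0);
    return page_of(off + len - 1) - page_of(off) + 1;
}
```

Under these assumptions, the 300-byte O_SYNC write at offset 10 touches only page 0, while the 50000 bytes written earlier span pages 0 through 12. A flush limited to the range of the O_SYNC write would therefore leave pages 1 through 12 dirty in memory.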
--D
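#!/bin/bash -x
# Let's see if a small O_SYNC write flushes the rest of the file?
dev="${1:-/dev/sda}"
mnt="${2:-/mnt}"
fstyp="${3:-xfs}"
# Device size in 512-byte sectors, as used in device-mapper tables
devsz=$(blockdev --getsz $dev)
test -z "$devsz" && exit 1
# Clean up leftovers from a previous run
umount $dev $mnt
dmsetup remove crap
# Put a linear device-mapper target on top of the device
dmsetup create crap --table "0 $devsz linear $dev 0"
dmdev=/dev/mapper/crap
test -b "$dmdev" || exit 1
# Start from a freshly formatted, freshly mounted filesystem
rmmod $fstyp
wipefs -a $dmdev
mkfs.$fstyp $dmdev
mount $dmdev $mnt
# Buffered write of 50000 bytes, then a 300-byte O_SYNC write (-s) at offset 10
xfs_io -f -c 'pwrite -S 0x58 0 50000' $mnt/a
xfs_io -s -c 'pwrite -S 0x42 10 300' $mnt/a
# Simulate a crash: swap in the error target so that all further I/O fails
dmsetup suspend crap --noflush
dmsetup load crap --table "0 $devsz error"
dmsetup resume crap
dmsetup table
umount $mnt
# Restore the linear target, remount, and inspect what survived
dmsetup suspend crap
dmsetup load crap --table "0 $devsz linear $dev 0"
dmsetup resume crap
mount $dmdev $mnt
od -tx1 -Ad -c $mnt/a
stat $mnt/a
umount $mnt
dmsetup remove crap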
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-24 22:23 ` Darrick J. Wong
@ 2026-02-25 14:20 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2026-02-25 14:20 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Vyacheslav Kovalevsky, tytso, adilger.kernel,
linux-ext4, linux-kernel
On Tue, Feb 24, 2026 at 02:23:39PM -0800, Darrick J. Wong wrote:
> He did, way at the beginning: open a file, write 5000 bytes, close it,
> open again with O_SYNC, write 300 bytes, close it, force-reboot, and
> watch the file come back up with only 4096 bytes written.
Oh, I misunderstood the load and thought it would write the 300 bytes
after the previous 5000 bytes. If it overwrites the result is totally
expected.
* Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
2026-02-23 19:32 ` Theodore Tso
2026-02-24 1:21 ` Andreas Dilger
@ 2026-03-03 13:19 ` Alejandro Colomar
1 sibling, 0 replies; 10+ messages in thread
From: Alejandro Colomar @ 2026-03-03 13:19 UTC (permalink / raw)
To: Theodore Tso
Cc: Andreas Dilger, Vyacheslav Kovalevsky, linux-ext4, linux-kernel,
linux-man
Hi Ted,
On 2026-02-23T14:32:38-0500, Theodore Tso wrote:
[...]
> The text in VERSIONS is not incorrect, in that it is talking about the
> distinction of O_SYNC and O_DSYNC in terms of which kinds of metadata
> will be persisted.
>
> However, the reason why all of this information regarding Synchronized
> I/O is in VERSIONS is describing the historic behaviour of Linux
> version 2.6.33 versus more modern versions of Linux. But 2.6.33 dates
> from February 24, 2010 --- 16 years ago. So it might be simpler if we
> simply dropped this kind of historical information.
I prefer keeping it, but I agree with moving it to a place where it
doesn't distract (maybe even a separate page).
> But if you do
> want to keep it, we should move the bulk of that information into
> O_SYNC and O_DSYNC.
>
> So maybe:
>
> O_DSYNC
> Write operations on the file will complete according to the re‐
> quirements of synchronized I/O data integrity completion.
>
> By the time write(2) (and similar) return, the output data has
> been transferred to the underlying hardware, along with any file
> metadata that would be required to retrieve that data.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
>
> O_SYNC Write operations on the file will complete according to the re‐
> quirements of synchronized I/O file integrity completion (by con‐
> trast with the synchronized I/O data integrity completion pro‐
> vided by O_DSYNC.)
>
> By the time write(2) (or similar) returns, the output
> data and all file metadata associated with the inode for the
> opened file have been transferred to the underlying
> hardware.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
LGTM.
>
> VERSIONS
> Before Linux 2.6.33, Linux implemented only the O_SYNC flag for
> open(). However, when that flag was specified, most
> filesystems actually provided the equivalent of synchronized
> I/O data integrity completion (i.e., O_SYNC was actually
> implemented as the equivalent of O_DSYNC).
>
> I'd suggest dropping everything else in VERSIONS, including the
> discussion of O_RSYNC. All of that is much more appropriate for a
> tutorial.
How about having an O_RSYNC(2const) manual page that talks in detail
about it?
>
> If you really want to keep all of that text, perhaps it could be moved
> into a synchronized-io man page in section 7.
Yes, a synchronized-io(7) page would make sense.
> In that we can talk
> about the difference between fsync() and fdatasync(), which is interesting
> as a conceptual model, and conceptually it is similar to that of O_SYNC
> and O_DSYNC. But there is a difference in what data will be written back
> (the data that was written in the file descriptor where the
> O_SYNC/O_DSYNC flag was set, either via open or fcntl, versus all
> buffered data in the buffer cache). The synchronized-io man page
> could also have more of the information around O_DIRECT in one place.
I like the idea of a chapter 7 manual page, or separate 2const pages for
each different macro. Whatever you consider more useful/readable.
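A minimal sketch of the fsync() side of that distinction, for illustration
only. write_then_fsync() is a hypothetical helper, not something proposed in
this thread. It assumes the caller wants every dirty page of the file on
stable storage, including data written earlier through other descriptors, and
not only the bytes passed in this call:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, then call fsync(), which
 * flushes the file's dirty data and metadata regardless of which descriptor
 * wrote it. Returns 0 on success, -1 on error (errno is set by the failing
 * call). */
static int write_then_fsync(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);

    /* Loop because write(2) may transfer fewer bytes than requested. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return fsync(fd);
}
```

Replacing fsync() with fdatasync() here would give the data-integrity
variant, which may skip metadata that is not needed to read the data back.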
>
> > If you'd write a patch, I'd appreciate that.
>
> Well, there's a question of what's the minimal change that is needed
> to fix out-and-out inaccuracies, and we can just delete some
> parenthetical comments.
Yup; I strongly prefer many minimal patches. If you (or anyone) start
by removing parentheticals that are unnecessary or incorrect, that'd be
good.
I would do that, but I wouldn't be able to write the commit messages, or
decide how to group them. I'd need someone expert in those APIs to
write the patches. I can then amend them editorially if they have any
minor issues.
> BTW, if we want to delete inaccurate information, I'd also suggest
> deleting the following text in the O_DIRECT section of the man page:
>
> A semantically similar (but deprecated) interface for block
> devices is described in raw(8).
>
> ----
>
> Then there's trying to rearrange the tutorial-style information for
> people who want to implement code which needs data persistence
> guarantees. That's quite a lot more work, and while I'm happy to
> review or assist someone to write that more expansive tutorial
> material, it's not something I'm willing to sign up to do.
Okay. While I can't do the removal of inaccurate text, I can reorganize
correct text. If you do the former, I can do this afterwards. I'll CC
you in such patches.
> ----
>
> Finally, there are some philosophical questions about what the goals
> of the Linux kernel man pages are --- how important is having historical
> information (for example, O_DIRECT has a "since 2.4.10", which is 25
> years ago --- really)? And how important is it to have tutorial
> information, and where should that information be organized in
> the man page?
Michael Kerrisk wanted to keep everything after Linux 2.6. Moving it to
HISTORY, and reducing less important details, is appropriate, but
removing it all is not so much.
I more or less keep that guideline, although I'm slightly more prone to
removals, but not too much.
> My personal opinion is that the primary priority of the Linux man page
> is to document the specification of the kernel interfaces that we
> expose to user space. Things like tutorial material and a description
> of historical versions are of secondary importance.
Yup. I've been moving a lot of text to separate pages or HISTORY
sections, or removing unnecessary details.
> I'd also advocate dropping historical information for kernel versions
> which are older than, say, 7 years. Currently the oldest LTS kernel
> which is supported upstream is 5.10, which was originally released in
> 2020, and will EOL by end of 2026. The Linux kernel 5.0 was released
> on March 3, 2019, so using a 7 year lookback means that explanations
> about how the Linux kernel behaved in 2.4.x, 2.6.y, 3.x, 4.x, etc. can be
> dropped from the man pages, since IMHO it will reduce a lot of noise
> that will likely confuse readers.
>
> But that's a call for Alex and the man pages project to make.
Have a lovely day!
Alex
>
> Cheers,
>
> - Ted
--
<https://www.alejandro-colomar.es>
Thread overview: 10+ messages
2026-02-18 13:29 Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes Vyacheslav Kovalevsky
2026-02-18 21:55 ` Andreas Dilger
2026-02-19 13:32 ` Theodore Tso
2026-02-23 12:46 ` Alejandro Colomar
2026-02-23 19:32 ` Theodore Tso
2026-02-24 1:21 ` Andreas Dilger
2026-03-03 13:19 ` Alejandro Colomar
2026-02-24 14:47 ` Christoph Hellwig
2026-02-24 22:23 ` Darrick J. Wong
2026-02-25 14:20 ` Christoph Hellwig