linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [bug report] ext4 misses final i_size meta sync under O_DIRECT | O_SYNC semantics after iomap DIO conversion
@ 2023-09-19  6:00 Gao Xiang
  2023-09-19 12:05 ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Gao Xiang @ 2023-09-19  6:00 UTC (permalink / raw)
  To: linux-ext4
  Cc: Theodore Ts'o, Jan Kara, Matthew Bobrowski, Christoph Hellwig,
	Joseph Qi

[-- Attachment #1: Type: text/plain, Size: 1765 bytes --]

Hi folks,

Our consumer reports a behavior change between pre-iomap and iomap
direct io conversion:

If the system crashes after an appending write to a file open with
O_DIRECT | O_SYNC flag set, file i_size won't be updated even if
O_SYNC was marked before.

It can be reproduced by a test program in the attachment with
gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger

After some analysis, we found that before iomap direct I/O conversion,
the timing was roughly (taking Linux 3.10 codebase as an example):

	..
	- ext4_file_dio_write
	  - __generic_file_aio_write
	      ..
	    - ext4_direct_IO  # generic_file_direct_write
	      - ext4_ext_direct_IO
	        - ext4_ind_direct_IO  # final_size > inode->i_size
	          - ..
	          - ret = blockdev_direct_IO()
	          - i_size_write(inode, end) # orphan && ret > 0 &&
	                                   # end > inode->i_size
	          - ext4_mark_inode_dirty()
	          - ...
	  - generic_write_sync  # handling O_SYNC

So the dirty inode meta will be committed into journal immediately
if O_SYNC is set.  However, After commit 569342dc2485 ("ext4: move
inode extension/truncate code out from ->iomap_end() callback"),
the new behavior seems as below:

	..
	- ext4_dio_write_iter
	  - ext4_dio_write_checks  # extend = 1
	  - iomap_dio_rw
	      - __iomap_dio_rw
	      - iomap_dio_complete
	        - generic_write_sync
	  - ext4_handle_inode_extension  # extend = 1

So that i_size will be recorded only after generic_write_sync() is
called.  So O_SYNC won't flush the update i_size to the disk.

On the other side, after a quick look of XFS side, it will record
i_size changes in xfs_dio_write_end_io() so it seems that it doesn't
have this problem.

Thanks,
Gao Xiang

[-- Attachment #2: repro.c --]
[-- Type: text/plain, Size: 1282 bytes --]

#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fs.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

#define PAGE_SIZE 4096

int test(char* file)
{
    char* buf = NULL;
    int ret = 0;
    int i = 0;
    posix_memalign((void**)(&buf), PAGE_SIZE, PAGE_SIZE);
    memset(buf, 0, PAGE_SIZE);
    for (i = 0 ; i < PAGE_SIZE ; ++i) buf[i] = i;
    int fd = open(file, O_WRONLY|O_CREAT|O_DIRECT|O_SYNC);
    if (fd == -1)
    {
        fprintf(stderr, "%s: %s\n", file, strerror(errno));
        exit(1);
    }
    struct stat st;
    ret = fstat(fd, &st);
    if (ret != 0)
    {
        fprintf(stderr, "%s: %s\n", file, strerror(errno));
        exit(1);
    }
    int offset = st.st_size;
    ret = pwrite(fd, buf, PAGE_SIZE, offset);
    if (ret != PAGE_SIZE)
    {
        fprintf(stderr, "write fail: ret %d %s\n", ret, strerror(errno));
    }
    close(fd);
    return 0;
}

int main(int argc, char ** argv)
{
    int ret = 0;
    char file[1024] = {};
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s path-to-write\n", argv[0]);
        exit(2);
    }
    snprintf(file, sizeof(file), "%s_%d", argv[1], 0);
    test(file);
    return 0;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-09-28  4:51 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-19  6:00 [bug report] ext4 misses final i_size meta sync under O_DIRECT | O_SYNC semantics after iomap DIO conversion Gao Xiang
2023-09-19 12:05 ` Jan Kara
2023-09-19 13:47   ` Gao Xiang
2023-09-20  7:29     ` Dave Chinner
2023-09-20  7:36       ` Gao Xiang
2023-09-20 11:38   ` Ritesh Harjani
2023-09-20 15:20     ` Jan Kara
2023-09-22 12:03       ` Ritesh Harjani
2023-09-22 12:10         ` [PATCH] generic: Add integrity tests with synchronous directio Ritesh Harjani (IBM)
2023-09-22 13:29           ` Gao Xiang
2023-09-22 16:09             ` Ritesh Harjani
2023-09-22 15:21           ` Darrick J. Wong
2023-09-22 16:13             ` Ritesh Harjani
2023-09-22 17:06           ` Zorro Lang
2023-09-23 10:25             ` Ritesh Harjani
2023-09-23 12:00         ` [PATCHv2 1/2] aio-dio-write-verify: Add sync and noverify option Ritesh Harjani (IBM)
2023-09-23 12:00           ` [PATCHv2 2/2] generic: Add integrity tests with synchronous directio Ritesh Harjani (IBM)
2023-09-28  3:42             ` Zorro Lang
2023-09-28  4:51               ` Ritesh Harjani
2023-09-25 12:08         ` [bug report] ext4 misses final i_size meta sync under O_DIRECT | O_SYNC semantics after iomap DIO conversion Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).