* make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) [not found] <alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com> @ 2020-09-04 16:21 ` Mikulas Patocka 2020-09-05 12:11 ` Mikulas Patocka 0 siblings, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2020-09-04 16:21 UTC (permalink / raw) To: Linus Torvalds Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4 On Thu, 3 Sep 2020, Mikulas Patocka wrote: > Hi > > There's a bug when you run strace from dax-based filesystem. Hmm, so I've found another bug in dax mode. If you extract the Linux kernel tree on dax-based ext2 filesystem (use the real ext2 driver, not ext4), and then you run make twice, the second invocation will rebuild everything. It seems like a problem with timestamps. mount -t ext2 -o dax /dev/pmem0 /mnt/ext2/ cd /mnt/ext2/usr/src/git/linux-2.6 make clean make -j12 make -j12 <--- this rebuilds the whole tree, althought it shouldn't I wasn't able to bisect it because this bug seems to be present in every kernel I tried (back to 4.16.0). Ext4 doesn't seem to have this bug. Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) 2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka @ 2020-09-05 12:11 ` Mikulas Patocka 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 0 siblings, 2 replies; 14+ messages in thread From: Mikulas Patocka @ 2020-09-05 12:11 UTC (permalink / raw) To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Fri, 4 Sep 2020, Mikulas Patocka wrote: > Hmm, so I've found another bug in dax mode. > > If you extract the Linux kernel tree on dax-based ext2 filesystem (use the > real ext2 driver, not ext4), and then you run make twice, the second > invocation will rebuild everything. It seems like a problem with > timestamps. > > mount -t ext2 -o dax /dev/pmem0 /mnt/ext2/ > cd /mnt/ext2/usr/src/git/linux-2.6 > make clean > make -j12 > make -j12 <--- this rebuilds the whole tree, althought it shouldn't > > I wasn't able to bisect it because this bug seems to be present in every > kernel I tried (back to 4.16.0). Ext4 doesn't seem to have this bug. > > Mikulas I've found out the root cause for this bug (XFS has it too) and I'm sending patches. Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] ext2: don't update mtime on COW faults 2020-09-05 12:11 ` Mikulas Patocka @ 2020-09-05 12:12 ` Mikulas Patocka 2020-09-07 9:00 ` Jan Kara 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 1 sibling, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2020-09-05 12:12 UTC (permalink / raw) To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted ext2 filesystem 2. run make clean 3. run make -j12 4. run make -j12 - at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org --- fs/ext2/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) Index: linux-2.6/fs/ext2/file.c =================================================================== --- linux-2.6.orig/fs/ext2/file.c 2020-09-05 10:01:41.000000000 +0200 +++ linux-2.6/fs/ext2/file.c 2020-09-05 13:09:50.000000000 +0200 @@ -93,8 +93,10 @@ static vm_fault_t ext2_dax_fault(struct struct inode *inode = file_inode(vmf->vma->vm_file); struct ext2_inode_info *ei = EXT2_I(inode); vm_fault_t ret; + bool write = (vmf->flags & FAULT_FLAG_WRITE) && + (vmf->vma->vm_flags & VM_SHARED); - if (vmf->flags & FAULT_FLAG_WRITE) { + if (write) { sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); } @@ -103,7 +105,7 @@ static vm_fault_t ext2_dax_fault(struct ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); up_read(&ei->dax_sem); - if (vmf->flags & FAULT_FLAG_WRITE) + if (write) sb_end_pagefault(inode->i_sb); return ret; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] ext2: don't update mtime on COW faults 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka @ 2020-09-07 9:00 ` Jan Kara 0 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2020-09-07 9:00 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Sat 05-09-20 08:12:01, Mikulas Patocka wrote: > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime > when the user hits a COW fault. > > This breaks building of the Linux kernel. > How to reproduce: > 1. extract the Linux kernel tree on dax-mounted ext2 filesystem > 2. run make clean > 3. run make -j12 > 4. run make -j12 > - at step 4, make would incorrectly rebuild the whole kernel (although it > was already built in step 3). > > The reason for the breakage is that almost all object files depend on > objtool. When we run objtool, it takes COW page fault on its .data > section, and these faults will incorrectly update the timestamp of the > objtool binary. The updated timestamp causes make to rebuild the whole > tree. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Cc: stable@vger.kernel.org Thanks. Good spotting! Linus has already merged this so nothing more to do here. Honza > > --- > fs/ext2/file.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > Index: linux-2.6/fs/ext2/file.c > =================================================================== > --- linux-2.6.orig/fs/ext2/file.c 2020-09-05 10:01:41.000000000 +0200 > +++ linux-2.6/fs/ext2/file.c 2020-09-05 13:09:50.000000000 +0200 > @@ -93,8 +93,10 @@ static vm_fault_t ext2_dax_fault(struct > struct inode *inode = file_inode(vmf->vma->vm_file); > struct ext2_inode_info *ei = EXT2_I(inode); > vm_fault_t ret; > + bool write = (vmf->flags & FAULT_FLAG_WRITE) && > + (vmf->vma->vm_flags & VM_SHARED); > > - if (vmf->flags & FAULT_FLAG_WRITE) { > + if (write) { > sb_start_pagefault(inode->i_sb); > file_update_time(vmf->vma->vm_file); > } > @@ -103,7 +105,7 @@ static vm_fault_t ext2_dax_fault(struct > ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); > > up_read(&ei->dax_sem); > - if (vmf->flags & FAULT_FLAG_WRITE) > + if (write) > sb_end_pagefault(inode->i_sb); > return ret; > } > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:11 ` Mikulas Patocka 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka @ 2020-09-05 12:13 ` Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong ` (2 more replies) 1 sibling, 3 replies; 14+ messages in thread From: Mikulas Patocka @ 2020-09-05 12:13 UTC (permalink / raw) To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted xfs filesystem 2. run make clean 3. run make -j12 4. run make -j12 - at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org --- fs/xfs/xfs_file.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) Index: linux-2.6/fs/xfs/xfs_file.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( return ret; } +static bool +xfs_is_write_fault( + struct vm_fault *vmf) +{ + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; +} + static vm_fault_t xfs_filemap_fault( struct vm_fault *vmf) @@ -1230,7 +1237,7 @@ xfs_filemap_fault( /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE, IS_DAX(file_inode(vmf->vma->vm_file)) && - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_write_fault(vmf)); } static vm_fault_t @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault( /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, pe_size, - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_write_fault(vmf)); } static vm_fault_t ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka @ 2020-09-05 15:36 ` Darrick J. Wong 2020-09-05 17:02 ` Mikulas Patocka 2020-09-05 16:47 ` Linus Torvalds 2020-09-07 6:47 ` [PATCH 2/2] " Christoph Hellwig 2 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2020-09-05 15:36 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote: > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > when the user hits a COW fault. > > This breaks building of the Linux kernel. > How to reproduce: > 1. extract the Linux kernel tree on dax-mounted xfs filesystem > 2. run make clean > 3. run make -j12 > 4. run make -j12 > - at step 4, make would incorrectly rebuild the whole kernel (although it > was already built in step 3). > > The reason for the breakage is that almost all object files depend on > objtool. When we run objtool, it takes COW page fault on its .data > section, and these faults will incorrectly update the timestamp of the > objtool binary. The updated timestamp causes make to rebuild the whole > tree. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Cc: stable@vger.kernel.org > > --- > fs/xfs/xfs_file.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > Index: linux-2.6/fs/xfs/xfs_file.c > =================================================================== > --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 > +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( > return ret; > } > > +static bool > +xfs_is_write_fault( Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test? You might as well make it a static inline. > + struct vm_fault *vmf) > +{ > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; Also, is "shortcutting the normal fault path" the reason for ext2 and xfs both being broken? /me puzzles over why write_fault is always true for page_mkwrite and pfn_mkwrite, but not for fault and huge_fault... Also: Can you please turn this (checking for timestamp update behavior wrt shared and private mapping write faults) into an fstest so we don't mess this up again? --D > +} > + > static vm_fault_t > xfs_filemap_fault( > struct vm_fault *vmf) > @@ -1230,7 +1237,7 @@ xfs_filemap_fault( > /* DAX can shortcut the normal fault path on write faults! */ > return __xfs_filemap_fault(vmf, PE_SIZE_PTE, > IS_DAX(file_inode(vmf->vma->vm_file)) && > - (vmf->flags & FAULT_FLAG_WRITE)); > + xfs_is_write_fault(vmf)); > } > > static vm_fault_t > @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault( > > /* DAX can shortcut the normal fault path on write faults! */ > return __xfs_filemap_fault(vmf, pe_size, > - (vmf->flags & FAULT_FLAG_WRITE)); > + xfs_is_write_fault(vmf)); > } > > static vm_fault_t > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 15:36 ` Darrick J. Wong @ 2020-09-05 17:02 ` Mikulas Patocka 2020-09-10 6:06 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2020-09-05 17:02 UTC (permalink / raw) To: Darrick J. Wong Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Sat, 5 Sep 2020, Darrick J. Wong wrote: > On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote: > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > > when the user hits a COW fault. > > > > This breaks building of the Linux kernel. > > How to reproduce: > > 1. extract the Linux kernel tree on dax-mounted xfs filesystem > > 2. run make clean > > 3. run make -j12 > > 4. run make -j12 > > - at step 4, make would incorrectly rebuild the whole kernel (although it > > was already built in step 3). > > > > The reason for the breakage is that almost all object files depend on > > objtool. When we run objtool, it takes COW page fault on its .data > > section, and these faults will incorrectly update the timestamp of the > > objtool binary. The updated timestamp causes make to rebuild the whole > > tree. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Cc: stable@vger.kernel.org > > > > --- > > fs/xfs/xfs_file.c | 11 +++++++++-- > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > Index: linux-2.6/fs/xfs/xfs_file.c > > =================================================================== > > --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 > > +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 > > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( > > return ret; > > } > > > > +static bool > > +xfs_is_write_fault( > > Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test? > > You might as well make it a static inline. Yes, it is possible. I'll send a second version. > > + struct vm_fault *vmf) > > +{ > > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; > > Also, is "shortcutting the normal fault path" the reason for ext2 and > xfs both being broken? > > /me puzzles over why write_fault is always true for page_mkwrite and > pfn_mkwrite, but not for fault and huge_fault... > > Also: Can you please turn this (checking for timestamp update behavior > wrt shared and private mapping write faults) into an fstest so we don't > mess this up again? I've written this program that tests it - you can integrate it into your testsuite. Mikulas #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #define FILE_NAME "test.txt" static struct stat st1, st2; int main(void) { int h, r; char *map; unlink(FILE_NAME); h = creat(FILE_NAME, 0600); if (h == -1) perror("creat"), exit(1); r = write(h, "x", 1); if (r != 1) perror("write"), exit(1); if (close(h)) perror("close"), exit(1); h = open(FILE_NAME, O_RDWR); if (h == -1) perror("open"), exit(1); map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0); if (map == MAP_FAILED) perror("mmap"), exit(1); if (fstat(h, &st1)) perror("fstat"), exit(1); sleep(2); *map = 'y'; if (fstat(h, &st2)) perror("fstat"), exit(1); if (memcmp(&st1, &st2, sizeof(struct stat))) fprintf(stderr, "BUG: COW fault changed time!\n"), exit(1); if (munmap(map, 4096)) perror("munmap"), exit(1); map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0); if (map == MAP_FAILED) perror("mmap"), exit(1); if (fstat(h, &st1)) perror("fstat"), exit(1); sleep(2); *map = 'z'; if (fstat(h, &st2)) perror("fstat"), exit(1); if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1); if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1); if (munmap(map, 4096)) perror("munmap"), exit(1); if (close(h)) perror("close"), exit(1); if (unlink(FILE_NAME)) perror("unlink"), exit(1); return 0; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 17:02 ` Mikulas Patocka @ 2020-09-10 6:06 ` Darrick J. Wong 2020-09-11 16:41 ` Mikulas Patocka 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2020-09-10 6:06 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs, Eric Sandeen On Sat, Sep 05, 2020 at 01:02:33PM -0400, Mikulas Patocka wrote: > > > On Sat, 5 Sep 2020, Darrick J. Wong wrote: > > > On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote: > > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > > > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > > > when the user hits a COW fault. > > > > > > This breaks building of the Linux kernel. > > > How to reproduce: > > > 1. extract the Linux kernel tree on dax-mounted xfs filesystem > > > 2. run make clean > > > 3. run make -j12 > > > 4. run make -j12 > > > - at step 4, make would incorrectly rebuild the whole kernel (although it > > > was already built in step 3). > > > > > > The reason for the breakage is that almost all object files depend on > > > objtool. When we run objtool, it takes COW page fault on its .data > > > section, and these faults will incorrectly update the timestamp of the > > > objtool binary. The updated timestamp causes make to rebuild the whole > > > tree. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > Cc: stable@vger.kernel.org > > > > > > --- > > > fs/xfs/xfs_file.c | 11 +++++++++-- > > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > > > Index: linux-2.6/fs/xfs/xfs_file.c > > > =================================================================== > > > --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 > > > +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 > > > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( > > > return ret; > > > } > > > > > > +static bool > > > +xfs_is_write_fault( > > > > Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test? > > > > You might as well make it a static inline. > > Yes, it is possible. I'll send a second version. > > > > + struct vm_fault *vmf) > > > +{ > > > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; > > > > Also, is "shortcutting the normal fault path" the reason for ext2 and > > xfs both being broken? > > > > /me puzzles over why write_fault is always true for page_mkwrite and > > pfn_mkwrite, but not for fault and huge_fault... > > > > Also: Can you please turn this (checking for timestamp update behavior > > wrt shared and private mapping write faults) into an fstest so we don't > > mess this up again? > > I've written this program that tests it - you can integrate it into your > testsuite. I don't get it. You're a filesystem maintainer too, which means you're a regular contributor. Do you: (a) not use fstests? If you don't, I really hope you use something else to QA hpfs. (b) really think that it's my problem to integrate and submit your regression tests for you? > Mikulas > > > #include <stdio.h> and (c) what do you want me to do with a piece of code that has no signoff tag, no copyright, and no license? This is your patch, and therefore your responsibility to develop enough of an appropriate regression test in a proper form that the rest of us can easily determine we have the rights to contribute to it. I don't have a problem with helping to tweak a properly licensed and tagged test program into fstests, but this is a non-starter. --D > #include <stdlib.h> > #include <unistd.h> > #include <fcntl.h> > #include <string.h> > #include <sys/mman.h> > #include <sys/stat.h> > > #define FILE_NAME "test.txt" > > static struct stat st1, st2; > > int main(void) > { > int h, r; > char *map; > unlink(FILE_NAME); > h = creat(FILE_NAME, 0600); > if (h == -1) perror("creat"), exit(1); > r = write(h, "x", 1); > if (r != 1) perror("write"), exit(1); > if (close(h)) perror("close"), exit(1); > h = open(FILE_NAME, O_RDWR); > if (h == -1) perror("open"), exit(1); > > map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0); > if (map == MAP_FAILED) perror("mmap"), exit(1); > if (fstat(h, &st1)) perror("fstat"), exit(1); > sleep(2); > *map = 'y'; > if (fstat(h, &st2)) perror("fstat"), exit(1); > if (memcmp(&st1, &st2, sizeof(struct stat))) fprintf(stderr, "BUG: COW fault changed time!\n"), exit(1); > if (munmap(map, 4096)) perror("munmap"), exit(1); > > map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0); > if (map == MAP_FAILED) perror("mmap"), exit(1); > if (fstat(h, &st1)) perror("fstat"), exit(1); > sleep(2); > *map = 'z'; > if (fstat(h, &st2)) perror("fstat"), exit(1); > if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1); > if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1); > if (munmap(map, 4096)) perror("munmap"), exit(1); > > if (close(h)) perror("close"), exit(1); > if (unlink(FILE_NAME)) perror("unlink"), exit(1); > return 0; > } > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-10 6:06 ` Darrick J. Wong @ 2020-09-11 16:41 ` Mikulas Patocka 0 siblings, 0 replies; 14+ messages in thread From: Mikulas Patocka @ 2020-09-11 16:41 UTC (permalink / raw) To: Darrick J. Wong Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs, Eric Sandeen On Wed, 9 Sep 2020, Darrick J. Wong wrote: > On Sat, Sep 05, 2020 at 01:02:33PM -0400, Mikulas Patocka wrote: > > > > > > > > I've written this program that tests it - you can integrate it into your > > testsuite. > > I don't get it. You're a filesystem maintainer too, which means you're > a regular contributor. Do you: > > (a) not use fstests? If you don't, I really hope you use something else > to QA hpfs. I don't use xfstests on HPFS. I was testing it just by using it. Now I use it just a little, but I don't modify it much. > (b) really think that it's my problem to integrate and submit your > regression tests for you? > > and (c) what do you want me to do with a piece of code that has no > signoff tag, no copyright, and no license? This is your patch, and > therefore your responsibility to develop enough of an appropriate > regression test in a proper form that the rest of us can easily > determine we have the rights to contribute to it. If you want a full patch (I copied the script from test 313), I send it here. Mikulas From: Mikulas Patocka <mpatocka@redhat.com> Subject: [PATCH] check ctime and mtime vs mmap Check ctime and mtime are not updated on COW faults and that they are updated on shared faults Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- src/Makefile | 3 +- src/mmap-timestamp.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++ tests/generic/609 | 40 +++++++++++++++++++++++++++++++++++++ tests/generic/609.out | 2 + tests/generic/group | 1 5 files changed, 98 insertions(+), 1 deletion(-) Index: xfstests-dev/src/Makefile =================================================================== --- xfstests-dev.orig/src/Makefile 2020-09-06 12:38:40.000000000 +0200 +++ xfstests-dev/src/Makefile 2020-09-11 17:39:04.000000000 +0200 @@ -17,7 +17,8 @@ TARGETS = dirstress fill fill2 getpagesi t_mmap_cow_race t_mmap_fallocate fsync-err t_mmap_write_ro \ t_ext4_dax_journal_corruption t_ext4_dax_inline_corruption \ t_ofd_locks t_mmap_collision mmap-write-concurrent \ - t_get_file_time t_create_short_dirs t_create_long_dirs + t_get_file_time t_create_short_dirs t_create_long_dirs \ + mmap-timestamp LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \ preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \ Index: xfstests-dev/src/mmap-timestamp.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/src/mmap-timestamp.c 2020-09-11 18:21:40.000000000 +0200 @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2020 Red Hat, Inc. All Rights reserved. + +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <fcntl.h> +#include <string.h> +#include <sys/mman.h> +#include <sys/stat.h> + +#define FILE_NAME argv[1] + +static struct stat st1, st2; + +int main(int argc, char *argv[]) +{ + int h, r; + char *map; + unlink(FILE_NAME); + h = creat(FILE_NAME, 0600); + if (h == -1) perror("creat"), exit(1); + r = write(h, "x", 1); + if (r != 1) perror("write"), exit(1); + if (close(h)) perror("close"), exit(1); + h = open(FILE_NAME, O_RDWR); + if (h == -1) perror("open"), exit(1); + + map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0); + if (map == MAP_FAILED) perror("mmap"), exit(1); + if (fstat(h, &st1)) perror("fstat"), exit(1); + sleep(2); + *map = 'y'; + if (fstat(h, &st2)) perror("fstat"), exit(1); + if (st1.st_mtime != st2.st_mtime) fprintf(stderr, "BUG: COW fault changed mtime!\n"), exit(1); + if (st1.st_ctime != st2.st_ctime) fprintf(stderr, "BUG: COW fault changed ctime!\n"), exit(1); + if (munmap(map, 4096)) perror("munmap"), exit(1); + + map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0); + if (map == MAP_FAILED) perror("mmap"), exit(1); + if (fstat(h, &st1)) perror("fstat"), exit(1); + sleep(2); + *map = 'z'; + if (msync(map, 4096, MS_SYNC)) perror("msync"), exit(1); + if (fstat(h, &st2)) perror("fstat"), exit(1); + if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1); + if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1); + if (munmap(map, 4096)) perror("munmap"), exit(1); + + if (close(h)) perror("close"), exit(1); + if (unlink(FILE_NAME)) perror("unlink"), exit(1); + return 0; +} Index: xfstests-dev/tests/generic/609 =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/tests/generic/609 2020-09-11 18:30:30.000000000 +0200 @@ -0,0 +1,40 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2020 Red Hat, Inc. All Rights Reserved. +# +# FS QA Test No. 609 +# +# Check ctime and mtime are not updated on COW faults +# and that they are updated on shared faults +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $testfile +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_test + +testfile=$TEST_DIR/testfile.$seq + +echo "Silence is golden" + +$here/src/mmap-timestamp $testfile 2>&1 + +status=0 +exit Index: xfstests-dev/tests/generic/609.out =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/tests/generic/609.out 2020-09-11 18:24:24.000000000 +0200 @@ -0,0 +1,2 @@ +QA output created by 609 +Silence is golden Index: xfstests-dev/tests/generic/group =================================================================== --- xfstests-dev.orig/tests/generic/group 2020-09-06 12:38:40.000000000 +0200 +++ xfstests-dev/tests/generic/group 2020-09-11 18:25:09.000000000 +0200 @@ -611,3 +611,4 @@ 606 auto attr quick dax 607 auto attr quick dax 608 auto attr quick dax +609 auto quick ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong @ 2020-09-05 16:47 ` Linus Torvalds 2020-09-05 17:03 ` Linus Torvalds 2020-09-05 17:04 ` [PATCH 2/2 v2] " Mikulas Patocka 2020-09-07 6:47 ` [PATCH 2/2] " Christoph Hellwig 2 siblings, 2 replies; 14+ messages in thread From: Linus Torvalds @ 2020-09-05 16:47 UTC (permalink / raw) To: Mikulas Patocka Cc: Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs On Sat, Sep 5, 2020 at 5:13 AM Mikulas Patocka <mpatocka@redhat.com> wrote: > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > when the user hits a COW fault. So your patch is obviously correct, but at the same time I look at the (buggy) ext2/xfs code you fixed, and I go "well, that was a really natural mistake to make". So I get the feeling that "yes, this was an ext2 and xfs bug, but we kind of set those filesystems up to fail". Could this possibly have been avoided by having nicer interfaces? Grepping around, and doing a bit of "git blame", I note that ext4 used to have this exact same bug too, but it was fixed three years ago in commit fd96b8da68d3 ("ext4: fix fault handling when mounted with -o dax,ro") and nobody at the time clearly realized it might be a pattern. And honestly, it's possible that the pattern came from cut-and-paste errors, but it's equally likely that the pattern was there simply because it was such a natural pattern and such an easy and natural mistake to make. Maybe it's inevitable. Some people do want (and need) the information whether it was a write just because they care about the page table issues (ie marking the pte dirty etc). To that kind of situation, whether it's shared or not might not matter all that much. But to a filesystem, a private write vs a shared write are quite different things. So I don't really have any suggestions, and maybe it's just what it is, but maybe somebody has an idea for how to make it slightly less natural to make this mistake.. But maybe just a test-case is all it takes, like Darrick suggests. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 16:47 ` Linus Torvalds @ 2020-09-05 17:03 ` Linus Torvalds 2020-09-07 8:59 ` Jan Kara 2020-09-05 17:04 ` [PATCH 2/2 v2] " Mikulas Patocka 1 sibling, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2020-09-05 17:03 UTC (permalink / raw) To: Mikulas Patocka Cc: Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs On Sat, Sep 5, 2020 at 9:47 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So your patch is obviously correct, [..] Oh, and I had a xfs pull request in my inbox already, so rather than expect Darrick to do another one just for this and have Jan do one for ext2, I just applied these two directly as "ObviouslyCorrect(tm)". I added the "inline" as suggested by Darrick, and I also added parenthesis around the bit tests. Yes, I know the C precedence rules, but I just personally find the code easier to read if I don't even have to think about it and the different subexpressions of a logical operation are just visually very clear. And as I was editing the patch anyway... So that xfs helper function now looks like this +static inline bool +xfs_is_write_fault( + struct vm_fault *vmf) +{ + return (vmf->flags & FAULT_FLAG_WRITE) && + (vmf->vma->vm_flags & VM_SHARED); +} instead. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 17:03 ` Linus Torvalds @ 2020-09-07 8:59 ` Jan Kara 0 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2020-09-07 8:59 UTC (permalink / raw) To: Linus Torvalds Cc: Mikulas Patocka, Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs On Sat 05-09-20 10:03:20, Linus Torvalds wrote: > On Sat, Sep 5, 2020 at 9:47 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > So your patch is obviously correct, [..] > > Oh, and I had a xfs pull request in my inbox already, so rather than > expect Darrick to do another one just for this and have Jan do one for > ext2, I just applied these two directly as "ObviouslyCorrect(tm)". OK, thanks! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2 v2] xfs: don't update mtime on COW faults 2020-09-05 16:47 ` Linus Torvalds 2020-09-05 17:03 ` Linus Torvalds @ 2020-09-05 17:04 ` Mikulas Patocka 1 sibling, 0 replies; 14+ messages in thread From: Mikulas Patocka @ 2020-09-05 17:04 UTC (permalink / raw) To: Darrick J. Wong Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted xfs filesystem 2. run make clean 3. run make -j12 4. run make -j12 - at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org --- fs/xfs/xfs_file.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) Index: linux-2.6/fs/xfs/xfs_file.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 18:48:54.000000000 +0200 +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 18:56:48.000000000 +0200 @@ -1223,14 +1223,21 @@ __xfs_filemap_fault( return ret; } +static inline bool +xfs_is_shared_dax_write_fault( + struct vm_fault *vmf) +{ + return IS_DAX(file_inode(vmf->vma->vm_file)) && + vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; +} + static vm_fault_t xfs_filemap_fault( struct vm_fault *vmf) { /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE, - IS_DAX(file_inode(vmf->vma->vm_file)) && - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_shared_dax_write_fault(vmf)); } static vm_fault_t @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault( /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, pe_size, - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_shared_dax_write_fault(vmf)); } static vm_fault_t ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong 2020-09-05 16:47 ` Linus Torvalds @ 2020-09-07 6:47 ` Christoph Hellwig 2 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2020-09-07 6:47 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Dan Williams, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs > +static bool > +xfs_is_write_fault( > + struct vm_fault *vmf) > +{ > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; > +} This function does not look xfs specific at all. Why isn't it it in fs.h? While we're at it the name sounds rather generic, and there are no good comments. Maybe we just need to split FAULT_FLAG_WRITE into two and check those instead of such crazy workarounds? ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-09-11 16:42 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com>
2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka
2020-09-05 12:11 ` Mikulas Patocka
2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka
2020-09-07 9:00 ` Jan Kara
2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka
2020-09-05 15:36 ` Darrick J. Wong
2020-09-05 17:02 ` Mikulas Patocka
2020-09-10 6:06 ` Darrick J. Wong
2020-09-11 16:41 ` Mikulas Patocka
2020-09-05 16:47 ` Linus Torvalds
2020-09-05 17:03 ` Linus Torvalds
2020-09-07 8:59 ` Jan Kara
2020-09-05 17:04 ` [PATCH 2/2 v2] " Mikulas Patocka
2020-09-07 6:47 ` [PATCH 2/2] " Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).