* ext4_da_write_end() and copied == 0 @ 2011-12-12 2:27 Andrea Arcangeli 2011-12-12 2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli 2011-12-12 2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli 0 siblings, 2 replies; 13+ messages in thread From: Andrea Arcangeli @ 2011-12-12 2:27 UTC (permalink / raw) To: linux-ext4; +Cc: Theodore Tso, Jan Kara Hi, I've been triggering ext4 troubles out of some experimental autonuma code if I set knumad frequency rates very high. The first patch looks fairly safe, but the second patch I've no idea if it's correct and if damages stuff instead of fixing it, but clearly it fixes the -EINVAL problem for me and it looks like to work fine now. [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters If you make alternate fixes for this let me know and I can test them. I can reproduce both issues instantly. Thanks, Andrea ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() 2011-12-12 2:27 ext4_da_write_end() and copied == 0 Andrea Arcangeli @ 2011-12-12 2:27 ` Andrea Arcangeli 2011-12-12 3:28 ` Yongqiang Yang ` (2 more replies) 2011-12-12 2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli 1 sibling, 3 replies; 13+ messages in thread From: Andrea Arcangeli @ 2011-12-12 2:27 UTC (permalink / raw) To: linux-ext4; +Cc: Theodore Tso, Jan Kara If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- fs/ext4/inode.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index fffec40..63f9541 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2508,7 +2508,7 @@ static int ext4_da_write_end(struct file *file, */ new_i_size = pos + copied; - if (new_i_size > EXT4_I(inode)->i_disksize) { + if (copied && new_i_size > EXT4_I(inode)->i_disksize) { if (ext4_da_should_update_i_disksize(page, end)) { down_write(&EXT4_I(inode)->i_data_sem); if (new_i_size > EXT4_I(inode)->i_disksize) { ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() 2011-12-12 2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli @ 2011-12-12 3:28 ` Yongqiang Yang 2011-12-12 16:57 ` Andrea Arcangeli 2011-12-12 20:27 ` Jan Kara 2011-12-14 2:44 ` Ted Ts'o 2 siblings, 1 reply; 13+ messages in thread From: Yongqiang Yang @ 2011-12-12 3:28 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara Hi Andrea, I can not figure out why ext4 hangs. Could you explain more on hanging? Thanks, Yongqiang. On Mon, Dec 12, 2011 at 10:27 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > If the pte mapping in generic_perform_write() is unmapped between > iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the > "copied" parameter to ->end_write can be zero. ext4 couldn't cope with > it with delayed allocations enabled. This skips the i_disksize > enlargement logic if copied is zero and no new data was appeneded to > the inode. > > gdb> bt > #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ > 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 > #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ > xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 > #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ > ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 > #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ > os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 > #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ > xffff88001e26be40) at mm/filemap.c:2600 > #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ > zed out>, pos=<value optimized out>) at mm/filemap.c:2632 > #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ > t fs/ext4/file.c:136 > #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ > ppos=0xffff88001e26bf48) at fs/read_write.c:406 > #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ > 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 > #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ > 4000) at fs/read_write.c:487 > #10 <signal handler called> > #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () > #12 0x0000000000000000 in ?? () > gdb> print offset > $22 = 0xffffffffffffffff > gdb> print idx > $23 = 0xffffffff > gdb> print inode->i_blkbits > $24 = 0xc > gdb> up > #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ > xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 > 2512 if (ext4_da_should_update_i_disksize(page, end)) { > gdb> print start > $25 = 0x0 > gdb> print end > $26 = 0xffffffffffffffff > gdb> print pos > $27 = 0x108000 > gdb> print new_i_size > $28 = 0x108000 > gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize > $29 = 0xd9000 > gdb> down > 2467 for (i = 0; i < idx; i++) > gdb> print i > $30 = 0xd44acbee > > This is 100% reproducible with some autonuma development code tuned in > a very aggressive manner (not normal way even for knumad) which does > "exotic" changes to the ptes. It wouldn't normally trigger but I don't > see why it can't happen normally if the page is added to swap cache in > between the two faults leading to "copied" being zero (which then > hangs in ext4). So it should be fixed. Especially possible with lumpy > reclaim (albeit disabled if compaction is enabled) as that would > ignore the young bits in the ptes. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > fs/ext4/inode.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index fffec40..63f9541 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2508,7 +2508,7 @@ static int ext4_da_write_end(struct file *file, > */ > > new_i_size = pos + copied; > - if (new_i_size > EXT4_I(inode)->i_disksize) { > + if (copied && new_i_size > EXT4_I(inode)->i_disksize) { > if (ext4_da_should_update_i_disksize(page, end)) { > down_write(&EXT4_I(inode)->i_data_sem); > if (new_i_size > EXT4_I(inode)->i_disksize) { > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() 2011-12-12 3:28 ` Yongqiang Yang @ 2011-12-12 16:57 ` Andrea Arcangeli 2011-12-13 1:05 ` Yongqiang Yang 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2011-12-12 16:57 UTC (permalink / raw) To: Yongqiang Yang; +Cc: linux-ext4, Theodore Tso, Jan Kara On Mon, Dec 12, 2011 at 11:28:21AM +0800, Yongqiang Yang wrote: > Hi Andrea, > > I can not figure out why ext4 hangs. Could you explain more on hanging? Well I tried to explain it in the changeset comment. If "copied" is zero, end/offset/idx becomes -1, and this loops: for (i = 0; i < idx; i++) definitely a bug and checking ext4 git I don't see anything fixing it. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() 2011-12-12 16:57 ` Andrea Arcangeli @ 2011-12-13 1:05 ` Yongqiang Yang 0 siblings, 0 replies; 13+ messages in thread From: Yongqiang Yang @ 2011-12-13 1:05 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara On Tue, Dec 13, 2011 at 12:57 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Mon, Dec 12, 2011 at 11:28:21AM +0800, Yongqiang Yang wrote: >> Hi Andrea, >> >> I can not figure out why ext4 hangs. Could you explain more on hanging? > > Well I tried to explain it in the changeset comment. > > If "copied" is zero, end/offset/idx becomes -1, and this loops: > > for (i = 0; i < idx; i++) > > definitely a bug and checking ext4 git I don't see anything fixing it. Got it. Thanks. Looks good to me. -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() 2011-12-12 2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli 2011-12-12 3:28 ` Yongqiang Yang @ 2011-12-12 20:27 ` Jan Kara 2011-12-14 2:44 ` Ted Ts'o 2 siblings, 0 replies; 13+ messages in thread From: Jan Kara @ 2011-12-12 20:27 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara On Mon 12-12-11 03:27:07, Andrea Arcangeli wrote: > If the pte mapping in generic_perform_write() is unmapped between > iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the > "copied" parameter to ->end_write can be zero. ext4 couldn't cope with > it with delayed allocations enabled. I'd expand this sentence with: because ext4_da_should_update_i_disksize() is passed -1 as an offset and loops (almost) infinitely. I know it's also in the gdb dump below but it takes a while to notice it if you don't know what you are looking for. > This skips the i_disksize > enlargement logic if copied is zero and no new data was appeneded to > the inode. > > gdb> bt > #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ > 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 > #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ > xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 > #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ > ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 > #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ > os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 > #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ > xffff88001e26be40) at mm/filemap.c:2600 > #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ > zed out>, pos=<value optimized out>) at mm/filemap.c:2632 > #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ > t fs/ext4/file.c:136 > #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ > ppos=0xffff88001e26bf48) at fs/read_write.c:406 > #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ > 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 > #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ > 4000) at fs/read_write.c:487 > #10 <signal handler called> > #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () > #12 0x0000000000000000 in ?? () > gdb> print offset > $22 = 0xffffffffffffffff > gdb> print idx > $23 = 0xffffffff > gdb> print inode->i_blkbits > $24 = 0xc > gdb> up > #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ > xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 > 2512 if (ext4_da_should_update_i_disksize(page, end)) { > gdb> print start > $25 = 0x0 > gdb> print end > $26 = 0xffffffffffffffff > gdb> print pos > $27 = 0x108000 > gdb> print new_i_size > $28 = 0x108000 > gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize > $29 = 0xd9000 > gdb> down > 2467 for (i = 0; i < idx; i++) > gdb> print i > $30 = 0xd44acbee > > This is 100% reproducible with some autonuma development code tuned in > a very aggressive manner (not normal way even for knumad) which does > "exotic" changes to the ptes. It wouldn't normally trigger but I don't > see why it can't happen normally if the page is added to swap cache in > between the two faults leading to "copied" being zero (which then > hangs in ext4). So it should be fixed. Especially possible with lumpy > reclaim (albeit disabled if compaction is enabled) as that would > ignore the young bits in the ptes. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> You can add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/inode.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index fffec40..63f9541 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2508,7 +2508,7 @@ static int ext4_da_write_end(struct file *file, > */ > > new_i_size = pos + copied; > - if (new_i_size > EXT4_I(inode)->i_disksize) { > + if (copied && new_i_size > EXT4_I(inode)->i_disksize) { > if (ext4_da_should_update_i_disksize(page, end)) { > down_write(&EXT4_I(inode)->i_data_sem); > if (new_i_size > EXT4_I(inode)->i_disksize) { -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() 2011-12-12 2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli 2011-12-12 3:28 ` Yongqiang Yang 2011-12-12 20:27 ` Jan Kara @ 2011-12-14 2:44 ` Ted Ts'o 2 siblings, 0 replies; 13+ messages in thread From: Ted Ts'o @ 2011-12-14 2:44 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-ext4, Jan Kara On Mon, Dec 12, 2011 at 03:27:07AM +0100, Andrea Arcangeli wrote: > If the pte mapping in generic_perform_write() is unmapped between > iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the > "copied" parameter to ->end_write can be zero. ext4 couldn't cope with > it with delayed allocations enabled. This skips the i_disksize > enlargement logic if copied is zero and no new data was appeneded to > the inode. ... > > This is 100% reproducible with some autonuma development code tuned in > a very aggressive manner (not normal way even for knumad) which does > "exotic" changes to the ptes. It wouldn't normally trigger but I don't > see why it can't happen normally if the page is added to swap cache in > between the two faults leading to "copied" being zero (which then > hangs in ext4). So it should be fixed. Especially possible with lumpy > reclaim (albeit disabled if compaction is enabled) as that would > ignore the young bits in the ptes. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Thanks, applied. - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters 2011-12-12 2:27 ext4_da_write_end() and copied == 0 Andrea Arcangeli 2011-12-12 2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli @ 2011-12-12 2:27 ` Andrea Arcangeli 2011-12-12 3:17 ` Yongqiang Yang 1 sibling, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2011-12-12 2:27 UTC (permalink / raw) To: linux-ext4; +Cc: Theodore Tso, Jan Kara If "copied" is zero (it can happen if the pte is unmapped before the atomic copy_user that does the data copy runs) the "from" passed to ext4_discard_partial_page_buffers_no_lock() points to pos-1, which would correspond to a logical page index before the page->index leading to ext4_discard_partial_page_buffers_no_lock() returning -EINVAL (because index != page->index). In such a case write() returns -EINVAL and userland gets a failure and filemap.c doesn't retry the copy_user anymore. I'm not certain of why exactly ext4_discard_partial_page_buffers_no_lock() is run here, so it's hard to tell if this is the correct fix. But it that functions clears data starting from the "from" parameter, so regardless of the -EINVAL retval, the right "from" to start clearing data should be pos+copied, not pos+copied-1. If this assumption is correct, it could mean that this bug in addition to the -EINVAL error, could also zero out 1 byte by mistake. I'm not sure what the implications for that are (not sure if data corruption is possible in some circumstances because of that). I guess normally this functions runs on unmapped buffers and the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED makes it a noop on those. After fixing the hang in ext4_da_should_update_i_disksize, this write -EINVAL error becomes trivially reproducible with experimental knumad autonuma code running at heavy frequency (not the normal case). But like for the ext4_da_should_update_i_disksize hang, it should not be impossible to reproduce it with legacy swapping behavior. After dropping all caches a md5sum is successful and I can't find errors anymore with this patch, the -EINVAL stops, but it's not conclusive (and I haven't run e2fsck -f yet but I doubt this affects metadata coherency, seems more like a delayalloc data issue). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- fs/ext4/inode.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 63f9541..528c4c5 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2534,11 +2534,11 @@ static int ext4_da_write_end(struct file *file, page, fsdata); page_len = PAGE_CACHE_SIZE - - ((pos + copied - 1) & (PAGE_CACHE_SIZE - 1)); + ((pos + copied) & (PAGE_CACHE_SIZE - 1)); - if (page_len > 0) { + if (page_len < PAGE_CACHE_SIZE) { ret = ext4_discard_partial_page_buffers_no_lock(handle, - inode, page, pos + copied - 1, page_len, + inode, page, pos + copied, page_len, EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED); } ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters 2011-12-12 2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli @ 2011-12-12 3:17 ` Yongqiang Yang 2011-12-12 16:55 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Yongqiang Yang @ 2011-12-12 3:17 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara Hi Andrea, The code you are testing are removed in recent patches, the patches have not been merged. Please try following patches: [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without buffers correctly and [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly Yongqiang. On Mon, Dec 12, 2011 at 10:27 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > If "copied" is zero (it can happen if the pte is unmapped before the > atomic copy_user that does the data copy runs) the "from" passed to > ext4_discard_partial_page_buffers_no_lock() points to pos-1, which > would correspond to a logical page index before the page->index > leading to ext4_discard_partial_page_buffers_no_lock() returning > -EINVAL (because index != page->index). In such a case write() returns > -EINVAL and userland gets a failure and filemap.c doesn't retry the > copy_user anymore. > > I'm not certain of why exactly > ext4_discard_partial_page_buffers_no_lock() is run here, so it's hard > to tell if this is the correct fix. But it that functions clears data > starting from the "from" parameter, so regardless of the -EINVAL > retval, the right "from" to start clearing data should be pos+copied, > not pos+copied-1. If this assumption is correct, it could mean that > this bug in addition to the -EINVAL error, could also zero out 1 byte > by mistake. I'm not sure what the implications for that are (not sure > if data corruption is possible in some circumstances because of > that). I guess normally this functions runs on unmapped buffers and > the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED makes it a noop on those. > > After fixing the hang in ext4_da_should_update_i_disksize, this write > -EINVAL error becomes trivially reproducible with experimental knumad > autonuma code running at heavy frequency (not the normal case). But > like for the ext4_da_should_update_i_disksize hang, it should not be > impossible to reproduce it with legacy swapping behavior. > > After dropping all caches a md5sum is successful and I can't find > errors anymore with this patch, the -EINVAL stops, but it's not > conclusive (and I haven't run e2fsck -f yet but I doubt this affects > metadata coherency, seems more like a delayalloc data issue). > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > fs/ext4/inode.c | 6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 63f9541..528c4c5 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2534,11 +2534,11 @@ static int ext4_da_write_end(struct file *file, > page, fsdata); > > page_len = PAGE_CACHE_SIZE - > - ((pos + copied - 1) & (PAGE_CACHE_SIZE - 1)); > + ((pos + copied) & (PAGE_CACHE_SIZE - 1)); > > - if (page_len > 0) { > + if (page_len < PAGE_CACHE_SIZE) { > ret = ext4_discard_partial_page_buffers_no_lock(handle, > - inode, page, pos + copied - 1, page_len, > + inode, page, pos + copied, page_len, > EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED); > } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters 2011-12-12 3:17 ` Yongqiang Yang @ 2011-12-12 16:55 ` Andrea Arcangeli 2011-12-12 20:07 ` Allison Henderson 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2011-12-12 16:55 UTC (permalink / raw) To: Yongqiang Yang; +Cc: linux-ext4, Theodore Tso, Jan Kara On Mon, Dec 12, 2011 at 11:17:54AM +0800, Yongqiang Yang wrote: > Hi Andrea, > > The code you are testing are removed in recent patches, the patches > have not been merged. I checked out the git ext4 tree and the two patches applies clean to it. So I'm unsure why you say it was removed. origin/dev origin/master are equal too. > > Please try following patches: > [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize > [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages > without buffers correctly > > and > > [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized > [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly Is there a git tree with these patches? I'm not subscribed to the list. Thanks, Andrea ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters 2011-12-12 16:55 ` Andrea Arcangeli @ 2011-12-12 20:07 ` Allison Henderson 2011-12-13 0:40 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Allison Henderson @ 2011-12-12 20:07 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Yongqiang Yang, linux-ext4, Theodore Tso, Jan Kara On 12/12/2011 09:55 AM, Andrea Arcangeli wrote: > On Mon, Dec 12, 2011 at 11:17:54AM +0800, Yongqiang Yang wrote: >> Hi Andrea, >> >> The code you are testing are removed in recent patches, the patches >> have not been merged. > > I checked out the git ext4 tree and the two patches applies clean to > it. So I'm unsure why you say it was removed. origin/dev origin/master > are equal too. > >> >> Please try following patches: >> [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize< pagesize >> [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages >> without buffers correctly >> >> and >> >> [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized >> [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly > > Is there a git tree with these patches? I'm not subscribed to the list. > > Thanks, > Andrea Hi Andrea, I think what Yongqiang means is that the code you are modifying is being removed in the patches listed above and replaced with a different solution. The patches have not yet been merged so you will not see them in a git tree yet, so you will need to apply them yourself. If you are not subscribed to the list, you can still find the patches here: http://www.spinics.net/lists/linux-ext4/msg29109.html http://www.spinics.net/lists/linux-ext4/msg29110.html http://www.spinics.net/lists/linux-ext4/msg29375.html http://www.spinics.net/lists/linux-ext4/msg29376.html We want to make sure the solution works for everyone, so please apply these patches and let us know if it corrects the issues you are seeing. Thx! Allison Henderson > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters 2011-12-12 20:07 ` Allison Henderson @ 2011-12-13 0:40 ` Andrea Arcangeli 2011-12-13 1:16 ` Yongqiang Yang 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2011-12-13 0:40 UTC (permalink / raw) To: Allison Henderson; +Cc: Yongqiang Yang, linux-ext4, Theodore Tso, Jan Kara Hi Allison, On Mon, Dec 12, 2011 at 01:07:47PM -0700, Allison Henderson wrote: > Hi Andrea, > > I think what Yongqiang means is that the code you are modifying is being > removed in the patches listed above and replaced with a different > solution. The patches have not yet been merged so you will not see them > in a git tree yet, so you will need to apply them yourself. If you are > not subscribed to the list, you can still find the patches here: > > http://www.spinics.net/lists/linux-ext4/msg29109.html > http://www.spinics.net/lists/linux-ext4/msg29110.html > > http://www.spinics.net/lists/linux-ext4/msg29375.html > http://www.spinics.net/lists/linux-ext4/msg29376.html Thanks for the links. Applied all 4. > We want to make sure the solution works for everyone, so please apply > these patches and let us know if it corrects the issues you are seeing. > Thx! With all 4 patches applied, my 1/2 patch is still needed to avoid the hang and writes hangs all the time (but now they don't return -EINVAL anymore). My patch 2/2 is not needed with the above 4 patches applied so that can be discared if you apply the above 4 patches (but I suggest you check my patch 2/2 too because if it happens to be correct it could mean there's the potential of 1 byte data corruption in current upstream kernels in some circumstances, in addition to the -EINVAL failure in write()). Let me know if you need me to test me anything else in relation to the copied=0 case. Thanks, Andrea ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters 2011-12-13 0:40 ` Andrea Arcangeli @ 2011-12-13 1:16 ` Yongqiang Yang 0 siblings, 0 replies; 13+ messages in thread From: Yongqiang Yang @ 2011-12-13 1:16 UTC (permalink / raw) To: Andrea Arcangeli Cc: Allison Henderson, linux-ext4, Theodore Tso, Jan Kara, Hugh Dickins, Andy Whitcroft On Tue, Dec 13, 2011 at 8:40 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > Hi Allison, > > On Mon, Dec 12, 2011 at 01:07:47PM -0700, Allison Henderson wrote: >> Hi Andrea, >> >> I think what Yongqiang means is that the code you are modifying is being >> removed in the patches listed above and replaced with a different >> solution. The patches have not yet been merged so you will not see them >> in a git tree yet, so you will need to apply them yourself. If you are >> not subscribed to the list, you can still find the patches here: >> >> http://www.spinics.net/lists/linux-ext4/msg29109.html >> http://www.spinics.net/lists/linux-ext4/msg29110.html >> >> http://www.spinics.net/lists/linux-ext4/msg29375.html >> http://www.spinics.net/lists/linux-ext4/msg29376.html > > Thanks for the links. Applied all 4. > >> We want to make sure the solution works for everyone, so please apply >> these patches and let us know if it corrects the issues you are seeing. >> Thx! > > With all 4 patches applied, my 1/2 patch is still needed to avoid the > hang and writes hangs all the time (but now they don't return -EINVAL > anymore). > > My patch 2/2 is not needed with the above 4 patches applied so that > can be discared if you apply the above 4 patches (but I suggest you > check my patch 2/2 too because if it happens to be correct it could > mean there's the potential of 1 byte data corruption in current > upstream kernels in some circumstances, in addition to the -EINVAL > failure in write()). page_len is removed in the 2nd patch, so there is no problem related to it any more. Hi Andy and Hugh, Andrea found the reason of EINVAL, and the 2nd patch we have tested can fix it. Many thanks to Andrea. Yongqiang. > > Let me know if you need me to test me anything else in relation to the > copied=0 case. > > Thanks, > Andrea -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2011-12-14 2:44 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-12-12 2:27 ext4_da_write_end() and copied == 0 Andrea Arcangeli 2011-12-12 2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli 2011-12-12 3:28 ` Yongqiang Yang 2011-12-12 16:57 ` Andrea Arcangeli 2011-12-13 1:05 ` Yongqiang Yang 2011-12-12 20:27 ` Jan Kara 2011-12-14 2:44 ` Ted Ts'o 2011-12-12 2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli 2011-12-12 3:17 ` Yongqiang Yang 2011-12-12 16:55 ` Andrea Arcangeli 2011-12-12 20:07 ` Allison Henderson 2011-12-13 0:40 ` Andrea Arcangeli 2011-12-13 1:16 ` Yongqiang Yang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).