ext4_da_write_end() and copied == 0

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* ext4_da_write_end() and copied == 0
@ 2011-12-12  2:27 Andrea Arcangeli
  2011-12-12  2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli
  2011-12-12  2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli
  0 siblings, 2 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2011-12-12  2:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: Theodore Tso, Jan Kara

Hi,

I've been triggering ext4 troubles out of some experimental autonuma
code if I set knumad frequency rates very high.

The first patch looks fairly safe, but the second patch I've no idea
if it's correct and if damages stuff instead of fixing it, but clearly
it fixes the -EINVAL problem for me and it looks like to work fine now.

[PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
[PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters

If you make alternate fixes for this let me know and I can test
them. I can reproduce both issues instantly.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
  2011-12-12  2:27 ext4_da_write_end() and copied == 0 Andrea Arcangeli
@ 2011-12-12  2:27 ` Andrea Arcangeli
  2011-12-12  3:28   ` Yongqiang Yang
                     ` (2 more replies)
  2011-12-12  2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli
  1 sibling, 3 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2011-12-12  2:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: Theodore Tso, Jan Kara

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/ext4/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fffec40..63f9541 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2508,7 +2508,7 @@ static int ext4_da_write_end(struct file *file,
 	 */
 
 	new_i_size = pos + copied;
-	if (new_i_size > EXT4_I(inode)->i_disksize) {
+	if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
 		if (ext4_da_should_update_i_disksize(page, end)) {
 			down_write(&EXT4_I(inode)->i_data_sem);
 			if (new_i_size > EXT4_I(inode)->i_disksize) {

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
  2011-12-12  2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli
@ 2011-12-12  3:28   ` Yongqiang Yang
  2011-12-12 16:57     ` Andrea Arcangeli
  2011-12-12 20:27   ` Jan Kara
  2011-12-14  2:44   ` Ted Ts'o
  2 siblings, 1 reply; 13+ messages in thread
From: Yongqiang Yang @ 2011-12-12  3:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara

Hi Andrea,

I can not figure out why ext4 hangs.  Could you explain more on hanging?

Thanks,
Yongqiang.

On Mon, Dec 12, 2011 at 10:27 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> If the pte mapping in generic_perform_write() is unmapped between
> iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
> "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
> it with delayed allocations enabled. This skips the i_disksize
> enlargement logic if copied is zero and no new data was appeneded to
> the inode.
>
>  gdb> bt
>  #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
>  08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
>  #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
>  xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
>  #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
>  ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
>  #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
>  os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
>  #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
>  xffff88001e26be40) at mm/filemap.c:2600
>  #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
>  zed out>, pos=<value optimized out>) at mm/filemap.c:2632
>  #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
>  t fs/ext4/file.c:136
>  #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
>  ppos=0xffff88001e26bf48) at fs/read_write.c:406
>  #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
>  000, pos=0xffff88001e26bf48) at fs/read_write.c:435
>  #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
>  4000) at fs/read_write.c:487
>  #10 <signal handler called>
>  #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
>  #12 0x0000000000000000 in ?? ()
>  gdb> print offset
>  $22 = 0xffffffffffffffff
>  gdb> print idx
>  $23 = 0xffffffff
>  gdb> print inode->i_blkbits
>  $24 = 0xc
>  gdb> up
>  #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
>  xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
>  2512                    if (ext4_da_should_update_i_disksize(page, end)) {
>  gdb> print start
>  $25 = 0x0
>  gdb> print end
>  $26 = 0xffffffffffffffff
>  gdb> print pos
>  $27 = 0x108000
>  gdb> print new_i_size
>  $28 = 0x108000
>  gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
>  $29 = 0xd9000
>  gdb> down
>  2467            for (i = 0; i < idx; i++)
>  gdb> print i
>  $30 = 0xd44acbee
>
> This is 100% reproducible with some autonuma development code tuned in
> a very aggressive manner (not normal way even for knumad) which does
> "exotic" changes to the ptes. It wouldn't normally trigger but I don't
> see why it can't happen normally if the page is added to swap cache in
> between the two faults leading to "copied" being zero (which then
> hangs in ext4). So it should be fixed. Especially possible with lumpy
> reclaim (albeit disabled if compaction is enabled) as that would
> ignore the young bits in the ptes.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  fs/ext4/inode.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fffec40..63f9541 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2508,7 +2508,7 @@ static int ext4_da_write_end(struct file *file,
>         */
>
>        new_i_size = pos + copied;
> -       if (new_i_size > EXT4_I(inode)->i_disksize) {
> +       if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
>                if (ext4_da_should_update_i_disksize(page, end)) {
>                        down_write(&EXT4_I(inode)->i_data_sem);
>                        if (new_i_size > EXT4_I(inode)->i_disksize) {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
  2011-12-12  3:28   ` Yongqiang Yang
@ 2011-12-12 16:57     ` Andrea Arcangeli
  2011-12-13  1:05       ` Yongqiang Yang
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2011-12-12 16:57 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: linux-ext4, Theodore Tso, Jan Kara

On Mon, Dec 12, 2011 at 11:28:21AM +0800, Yongqiang Yang wrote:
> Hi Andrea,
> 
> I can not figure out why ext4 hangs.  Could you explain more on hanging?

Well I tried to explain it in the changeset comment.

If "copied" is zero, end/offset/idx becomes -1, and this loops:

	for (i = 0; i < idx; i++)

definitely a bug and checking ext4 git I don't see anything fixing it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
  2011-12-12 16:57     ` Andrea Arcangeli
@ 2011-12-13  1:05       ` Yongqiang Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Yongqiang Yang @ 2011-12-13  1:05 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara

On Tue, Dec 13, 2011 at 12:57 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Mon, Dec 12, 2011 at 11:28:21AM +0800, Yongqiang Yang wrote:
>> Hi Andrea,
>>
>> I can not figure out why ext4 hangs.  Could you explain more on hanging?
>
> Well I tried to explain it in the changeset comment.
>
> If "copied" is zero, end/offset/idx becomes -1, and this loops:
>
>        for (i = 0; i < idx; i++)
>
> definitely a bug and checking ext4 git I don't see anything fixing it.
Got it.  Thanks.

Looks good to me.


-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
  2011-12-12  2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli
  2011-12-12  3:28   ` Yongqiang Yang
@ 2011-12-12 20:27   ` Jan Kara
  2011-12-14  2:44   ` Ted Ts'o
  2 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2011-12-12 20:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara

On Mon 12-12-11 03:27:07, Andrea Arcangeli wrote:
> If the pte mapping in generic_perform_write() is unmapped between
> iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
> "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
> it with delayed allocations enabled.
  I'd expand this sentence with: because ext4_da_should_update_i_disksize()
is passed -1 as an offset and loops (almost) infinitely.
  I know it's also in the gdb dump below but it takes a while to notice it
if you don't know what you are looking for.

> This skips the i_disksize
> enlargement logic if copied is zero and no new data was appeneded to
> the inode.
> 
>  gdb> bt
>  #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
>  08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
>  #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
>  xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
>  #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
>  ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
>  #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
>  os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
>  #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
>  xffff88001e26be40) at mm/filemap.c:2600
>  #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
>  zed out>, pos=<value optimized out>) at mm/filemap.c:2632
>  #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
>  t fs/ext4/file.c:136
>  #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
>  ppos=0xffff88001e26bf48) at fs/read_write.c:406
>  #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
>  000, pos=0xffff88001e26bf48) at fs/read_write.c:435
>  #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
>  4000) at fs/read_write.c:487
>  #10 <signal handler called>
>  #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
>  #12 0x0000000000000000 in ?? ()
>  gdb> print offset
>  $22 = 0xffffffffffffffff
>  gdb> print idx
>  $23 = 0xffffffff
>  gdb> print inode->i_blkbits
>  $24 = 0xc
>  gdb> up
>  #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
>  xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
>  2512                    if (ext4_da_should_update_i_disksize(page, end)) {
>  gdb> print start
>  $25 = 0x0
>  gdb> print end
>  $26 = 0xffffffffffffffff
>  gdb> print pos
>  $27 = 0x108000
>  gdb> print new_i_size
>  $28 = 0x108000
>  gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
>  $29 = 0xd9000
>  gdb> down
>  2467            for (i = 0; i < idx; i++)
>  gdb> print i
>  $30 = 0xd44acbee
> 
> This is 100% reproducible with some autonuma development code tuned in
> a very aggressive manner (not normal way even for knumad) which does
> "exotic" changes to the ptes. It wouldn't normally trigger but I don't
> see why it can't happen normally if the page is added to swap cache in
> between the two faults leading to "copied" being zero (which then
> hangs in ext4). So it should be fixed. Especially possible with lumpy
> reclaim (albeit disabled if compaction is enabled) as that would
> ignore the young bits in the ptes.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
  You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/ext4/inode.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fffec40..63f9541 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2508,7 +2508,7 @@ static int ext4_da_write_end(struct file *file,
>  	 */
>  
>  	new_i_size = pos + copied;
> -	if (new_i_size > EXT4_I(inode)->i_disksize) {
> +	if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
>  		if (ext4_da_should_update_i_disksize(page, end)) {
>  			down_write(&EXT4_I(inode)->i_data_sem);
>  			if (new_i_size > EXT4_I(inode)->i_disksize) {
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize()
  2011-12-12  2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli
  2011-12-12  3:28   ` Yongqiang Yang
  2011-12-12 20:27   ` Jan Kara
@ 2011-12-14  2:44   ` Ted Ts'o
  2 siblings, 0 replies; 13+ messages in thread
From: Ted Ts'o @ 2011-12-14  2:44 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-ext4, Jan Kara

On Mon, Dec 12, 2011 at 03:27:07AM +0100, Andrea Arcangeli wrote:
> If the pte mapping in generic_perform_write() is unmapped between
> iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
> "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
> it with delayed allocations enabled. This skips the i_disksize
> enlargement logic if copied is zero and no new data was appeneded to
> the inode.
      ...
> 
> This is 100% reproducible with some autonuma development code tuned in
> a very aggressive manner (not normal way even for knumad) which does
> "exotic" changes to the ptes. It wouldn't normally trigger but I don't
> see why it can't happen normally if the page is added to swap cache in
> between the two faults leading to "copied" being zero (which then
> hangs in ext4). So it should be fixed. Especially possible with lumpy
> reclaim (albeit disabled if compaction is enabled) as that would
> ignore the young bits in the ptes.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
  2011-12-12  2:27 ext4_da_write_end() and copied == 0 Andrea Arcangeli
  2011-12-12  2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli
@ 2011-12-12  2:27 ` Andrea Arcangeli
  2011-12-12  3:17   ` Yongqiang Yang
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2011-12-12  2:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: Theodore Tso, Jan Kara

If "copied" is zero (it can happen if the pte is unmapped before the
atomic copy_user that does the data copy runs) the "from" passed to
ext4_discard_partial_page_buffers_no_lock() points to pos-1, which
would correspond to a logical page index before the page->index
leading to ext4_discard_partial_page_buffers_no_lock() returning
-EINVAL (because index != page->index). In such a case write() returns
-EINVAL and userland gets a failure and filemap.c doesn't retry the
copy_user anymore.

I'm not certain of why exactly
ext4_discard_partial_page_buffers_no_lock() is run here, so it's hard
to tell if this is the correct fix. But it that functions clears data
starting from the "from" parameter, so regardless of the -EINVAL
retval, the right "from" to start clearing data should be pos+copied,
not pos+copied-1. If this assumption is correct, it could mean that
this bug in addition to the -EINVAL error, could also zero out 1 byte
by mistake. I'm not sure what the implications for that are (not sure
if data corruption is possible in some circumstances because of
that). I guess normally this functions runs on unmapped buffers and
the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED makes it a noop on those.

After fixing the hang in ext4_da_should_update_i_disksize, this write
-EINVAL error becomes trivially reproducible with experimental knumad
autonuma code running at heavy frequency (not the normal case). But
like for the ext4_da_should_update_i_disksize hang, it should not be
impossible to reproduce it with legacy swapping behavior.

After dropping all caches a md5sum is successful and I can't find
errors anymore with this patch, the -EINVAL stops, but it's not
conclusive (and I haven't run e2fsck -f yet but I doubt this affects
metadata coherency, seems more like a delayalloc data issue).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/ext4/inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 63f9541..528c4c5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2534,11 +2534,11 @@ static int ext4_da_write_end(struct file *file,
 							page, fsdata);

 	page_len = PAGE_CACHE_SIZE -
-			((pos + copied - 1) & (PAGE_CACHE_SIZE - 1));
+			((pos + copied) & (PAGE_CACHE_SIZE - 1));

-	if (page_len > 0) {
+	if (page_len < PAGE_CACHE_SIZE) {
 		ret = ext4_discard_partial_page_buffers_no_lock(handle,
-			inode, page, pos + copied - 1, page_len,
+			inode, page, pos + copied, page_len,
 			EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
 	}

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
  2011-12-12  2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli
@ 2011-12-12  3:17   ` Yongqiang Yang
  2011-12-12 16:55     ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: Yongqiang Yang @ 2011-12-12  3:17 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-ext4, Theodore Tso, Jan Kara

Hi Andrea,

The code you are testing are removed in recent patches,  the patches
have not been merged.

Please try following patches:
[PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
[PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages
without buffers correctly

and

[PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
[PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly

Yongqiang.

On Mon, Dec 12, 2011 at 10:27 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> If "copied" is zero (it can happen if the pte is unmapped before the
> atomic copy_user that does the data copy runs) the "from" passed to
> ext4_discard_partial_page_buffers_no_lock() points to pos-1, which
> would correspond to a logical page index before the page->index
> leading to ext4_discard_partial_page_buffers_no_lock() returning
> -EINVAL (because index != page->index). In such a case write() returns
> -EINVAL and userland gets a failure and filemap.c doesn't retry the
> copy_user anymore.
>
> I'm not certain of why exactly
> ext4_discard_partial_page_buffers_no_lock() is run here, so it's hard
> to tell if this is the correct fix. But it that functions clears data
> starting from the "from" parameter, so regardless of the -EINVAL
> retval, the right "from" to start clearing data should be pos+copied,
> not pos+copied-1. If this assumption is correct, it could mean that
> this bug in addition to the -EINVAL error, could also zero out 1 byte
> by mistake. I'm not sure what the implications for that are (not sure
> if data corruption is possible in some circumstances because of
> that). I guess normally this functions runs on unmapped buffers and
> the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED makes it a noop on those.
>
> After fixing the hang in ext4_da_should_update_i_disksize, this write
> -EINVAL error becomes trivially reproducible with experimental knumad
> autonuma code running at heavy frequency (not the normal case). But
> like for the ext4_da_should_update_i_disksize hang, it should not be
> impossible to reproduce it with legacy swapping behavior.
>
> After dropping all caches a md5sum is successful and I can't find
> errors anymore with this patch, the -EINVAL stops, but it's not
> conclusive (and I haven't run e2fsck -f yet but I doubt this affects
> metadata coherency, seems more like a delayalloc data issue).
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  fs/ext4/inode.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 63f9541..528c4c5 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2534,11 +2534,11 @@ static int ext4_da_write_end(struct file *file,
>                                                        page, fsdata);
>
>        page_len = PAGE_CACHE_SIZE -
> -                       ((pos + copied - 1) & (PAGE_CACHE_SIZE - 1));
> +                       ((pos + copied) & (PAGE_CACHE_SIZE - 1));
>
> -       if (page_len > 0) {
> +       if (page_len < PAGE_CACHE_SIZE) {
>                ret = ext4_discard_partial_page_buffers_no_lock(handle,
> -                       inode, page, pos + copied - 1, page_len,
> +                       inode, page, pos + copied, page_len,
>                        EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
>        }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
  2011-12-12  3:17   ` Yongqiang Yang
@ 2011-12-12 16:55     ` Andrea Arcangeli
  2011-12-12 20:07       ` Allison Henderson
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2011-12-12 16:55 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: linux-ext4, Theodore Tso, Jan Kara

On Mon, Dec 12, 2011 at 11:17:54AM +0800, Yongqiang Yang wrote:
> Hi Andrea,
> 
> The code you are testing are removed in recent patches,  the patches
> have not been merged.

I checked out the git ext4 tree and the two patches applies clean to
it. So I'm unsure why you say it was removed. origin/dev origin/master
are equal too.

> 
> Please try following patches:
> [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
> [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages
> without buffers correctly
> 
> and
> 
> [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
> [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly

Is there a git tree with these patches? I'm not subscribed to the list.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
  2011-12-12 16:55     ` Andrea Arcangeli
@ 2011-12-12 20:07       ` Allison Henderson
  2011-12-13  0:40         ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: Allison Henderson @ 2011-12-12 20:07 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Yongqiang Yang, linux-ext4, Theodore Tso, Jan Kara

On 12/12/2011 09:55 AM, Andrea Arcangeli wrote:
> On Mon, Dec 12, 2011 at 11:17:54AM +0800, Yongqiang Yang wrote:
>> Hi Andrea,
>>
>> The code you are testing are removed in recent patches,  the patches
>> have not been merged.
>
> I checked out the git ext4 tree and the two patches applies clean to
> it. So I'm unsure why you say it was removed. origin/dev origin/master
> are equal too.
>
>>
>> Please try following patches:
>> [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize<  pagesize
>> [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages
>> without buffers correctly
>>
>> and
>>
>> [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
>> [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly
>
> Is there a git tree with these patches? I'm not subscribed to the list.
>
> Thanks,
> Andrea

Hi Andrea,

I think what Yongqiang means is that the code you are modifying is being 
removed in the patches listed above and replaced with a different 
solution.  The patches have not yet been merged so you will not see them 
in a git tree yet, so you will need to apply them yourself.  If you are 
not subscribed to the list, you can still find the patches here:

http://www.spinics.net/lists/linux-ext4/msg29109.html
http://www.spinics.net/lists/linux-ext4/msg29110.html

http://www.spinics.net/lists/linux-ext4/msg29375.html
http://www.spinics.net/lists/linux-ext4/msg29376.html

We want to make sure the solution works for everyone, so please apply 
these patches and let us know if it corrects the issues you are seeing. 
  Thx!

Allison Henderson

> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
  2011-12-12 20:07       ` Allison Henderson
@ 2011-12-13  0:40         ` Andrea Arcangeli
  2011-12-13  1:16           ` Yongqiang Yang
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2011-12-13  0:40 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Yongqiang Yang, linux-ext4, Theodore Tso, Jan Kara

Hi Allison,

On Mon, Dec 12, 2011 at 01:07:47PM -0700, Allison Henderson wrote:
> Hi Andrea,
> 
> I think what Yongqiang means is that the code you are modifying is being 
> removed in the patches listed above and replaced with a different 
> solution.  The patches have not yet been merged so you will not see them 
> in a git tree yet, so you will need to apply them yourself.  If you are 
> not subscribed to the list, you can still find the patches here:
> 
> http://www.spinics.net/lists/linux-ext4/msg29109.html
> http://www.spinics.net/lists/linux-ext4/msg29110.html
> 
> http://www.spinics.net/lists/linux-ext4/msg29375.html
> http://www.spinics.net/lists/linux-ext4/msg29376.html

Thanks for the links. Applied all 4.

> We want to make sure the solution works for everyone, so please apply 
> these patches and let us know if it corrects the issues you are seeing. 
>   Thx!

With all 4 patches applied, my 1/2 patch is still needed to avoid the
hang and writes hangs all the time (but now they don't return -EINVAL
anymore).

My patch 2/2 is not needed with the above 4 patches applied so that
can be discared if you apply the above 4 patches (but I suggest you
check my patch 2/2 too because if it happens to be correct it could
mean there's the potential of 1 byte data corruption in current
upstream kernels in some circumstances, in addition to the -EINVAL
failure in write()).

Let me know if you need me to test me anything else in relation to the
copied=0 case.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
  2011-12-13  0:40         ` Andrea Arcangeli
@ 2011-12-13  1:16           ` Yongqiang Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Yongqiang Yang @ 2011-12-13  1:16 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Allison Henderson, linux-ext4, Theodore Tso, Jan Kara,
	Hugh Dickins, Andy Whitcroft

On Tue, Dec 13, 2011 at 8:40 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> Hi Allison,
>
> On Mon, Dec 12, 2011 at 01:07:47PM -0700, Allison Henderson wrote:
>> Hi Andrea,
>>
>> I think what Yongqiang means is that the code you are modifying is being
>> removed in the patches listed above and replaced with a different
>> solution.  The patches have not yet been merged so you will not see them
>> in a git tree yet, so you will need to apply them yourself.  If you are
>> not subscribed to the list, you can still find the patches here:
>>
>> http://www.spinics.net/lists/linux-ext4/msg29109.html
>> http://www.spinics.net/lists/linux-ext4/msg29110.html
>>
>> http://www.spinics.net/lists/linux-ext4/msg29375.html
>> http://www.spinics.net/lists/linux-ext4/msg29376.html
>
> Thanks for the links. Applied all 4.
>
>> We want to make sure the solution works for everyone, so please apply
>> these patches and let us know if it corrects the issues you are seeing.
>>   Thx!
>
> With all 4 patches applied, my 1/2 patch is still needed to avoid the
> hang and writes hangs all the time (but now they don't return -EINVAL
> anymore).
>
> My patch 2/2 is not needed with the above 4 patches applied so that
> can be discared if you apply the above 4 patches (but I suggest you
> check my patch 2/2 too because if it happens to be correct it could
> mean there's the potential of 1 byte data corruption in current
> upstream kernels in some circumstances, in addition to the -EINVAL
> failure in write()).
page_len is removed in the 2nd patch, so there is no problem related
to it any more.

Hi Andy and Hugh,

Andrea found the reason of EINVAL,  and the 2nd patch we have tested
can fix it.  Many thanks to Andrea.


Yongqiang.
>
> Let me know if you need me to test me anything else in relation to the
> copied=0 case.
>
> Thanks,
> Andrea



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-12-14  2:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-12  2:27 ext4_da_write_end() and copied == 0 Andrea Arcangeli
2011-12-12  2:27 ` [PATCH 1/2] ext4: avoid hangs in ext4_da_should_update_i_disksize() Andrea Arcangeli
2011-12-12  3:28   ` Yongqiang Yang
2011-12-12 16:57     ` Andrea Arcangeli
2011-12-13  1:05       ` Yongqiang Yang
2011-12-12 20:27   ` Jan Kara
2011-12-14  2:44   ` Ted Ts'o
2011-12-12  2:27 ` [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters Andrea Arcangeli
2011-12-12  3:17   ` Yongqiang Yang
2011-12-12 16:55     ` Andrea Arcangeli
2011-12-12 20:07       ` Allison Henderson
2011-12-13  0:40         ` Andrea Arcangeli
2011-12-13  1:16           ` Yongqiang Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).