[PATCH 1/2] btrfs: submit a normal bio for the mirror read

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] btrfs: submit a normal bio for the mirror read
@ 2017-11-29  7:11 Anand Jain
  2017-11-29  7:11 ` [PATCH 2/2] btrfs: submit a normal bio for the mirror dio read Anand Jain
  2017-11-29 19:26 ` [PATCH 1/2] btrfs: submit a normal bio for the mirror read Liu Bo
  0 siblings, 2 replies; 3+ messages in thread
From: Anand Jain @ 2017-11-29  7:11 UTC (permalink / raw)
  To: linux-btrfs

When the fist mirror failed to read we submit bio again to read from the
2nd mirror, however during this, we also set the flag REQ_FAILFAST_DEV
which indicates to the underlying block device driver not to perform the
default IO retry (sd, 5 counts), and thus command will be failed and
returned at the first failed attempt.

On the contrary, in fact, it should have been other way around that is,
as 2nd mirror being the last copy of the data, it should rather try
equally hard to make this read cmd successful and at the same time with
or without REQ_FAILFAST_DEV there is no performance benefits if the
command is successful in the first attempt itself.

The only benefit which would be achieved with REQ_FAILFAST_DEV is that
when both the copies encounter failed read, then for the applications,
the EIO will be reported roughly 40% faster (since it saves 4 retries).
But when first mirror has failed whats more important will be to make
a successful read from the 2nd mirror.

And for the DUP case where both the copies are on the same disk, this
makes to retry 5 times on 2 different LBA/sector but on the same disk,
which probably is not a good idea if your test case involves pulling
out the disk. But as of now we don't have a way to tell the block layer
how critical a read is, so that block layer can accommodate the retry
dynamically.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/extent_io.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d8eb457f2a70..29c03fb4d5af 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2388,9 +2388,6 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 		return -EIO;
 	}

-	if (failed_bio->bi_vcnt > 1)
-		read_mode |= REQ_FAILFAST_DEV;
-
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
 	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
 				      start - page_offset(page),
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] btrfs: submit a normal bio for the mirror dio read
  2017-11-29  7:11 [PATCH 1/2] btrfs: submit a normal bio for the mirror read Anand Jain
@ 2017-11-29  7:11 ` Anand Jain
  2017-11-29 19:26 ` [PATCH 1/2] btrfs: submit a normal bio for the mirror read Liu Bo
  1 sibling, 0 replies; 3+ messages in thread
From: Anand Jain @ 2017-11-29  7:11 UTC (permalink / raw)
  To: linux-btrfs

Similarly do not set the REQ_FAILFAST_DEV for mirror read bio during dio.

[Taken from patch in ml
 btrfs: submit a normal bio for the mirror read]
[When the fist mirror failed to read we submit bio again to read from the
2nd mirror, however during this, we also set the flag REQ_FAILFAST_DEV
which indicates to the underlying block device driver not to perform the
default IO retry (sd, 5 counts), and thus command will be failed and
returned at the first failed attempt.

On the contrary, in fact, it should have been other way around that is,
as 2nd mirror being the last copy of the data, it should rather try
equally hard to make this read cmd successful and at the same time with
or without REQ_FAILFAST_DEV there is no performance benefits if the
command is successful in the first attempt itself.

The only benefit which would be achieved with REQ_FAILFAST_DEV is that
when both the copies encounter failed read, then for the applications,
the EIO will be reported roughly 40% faster (since it saves 4 retries).
But when first mirror has failed whats more important will be to make
a successful read from the 2nd mirror.

And for the DUP case where both the copies are on the same disk, this
makes to retry 5 times on 2 different LBA/sector but on the same disk,
which probably is not a good idea if your test case involves pulling
out the disk. But as of now we don't have a way to tell the block layer
how critical a read is, so that block layer can accommodate the retry
dynamically].

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/inode.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b93fe05a39c7..9a539cf82e51 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8014,9 +8014,6 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
 	}

 	segs = bio_segments(failed_bio);
-	if (segs > 1 ||
-	    (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
-		read_mode |= REQ_FAILFAST_DEV;

 	isector = start - btrfs_io_bio(failed_bio)->logical;
 	isector >>= inode->i_sb->s_blocksize_bits;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/2] btrfs: submit a normal bio for the mirror read
  2017-11-29  7:11 [PATCH 1/2] btrfs: submit a normal bio for the mirror read Anand Jain
  2017-11-29  7:11 ` [PATCH 2/2] btrfs: submit a normal bio for the mirror dio read Anand Jain
@ 2017-11-29 19:26 ` Liu Bo
  1 sibling, 0 replies; 3+ messages in thread
From: Liu Bo @ 2017-11-29 19:26 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, Nov 29, 2017 at 03:11:14PM +0800, Anand Jain wrote:
> When the fist mirror failed to read we submit bio again to read from the
> 2nd mirror, however during this, we also set the flag REQ_FAILFAST_DEV
> which indicates to the underlying block device driver not to perform the
> default IO retry (sd, 5 counts), and thus command will be failed and
> returned at the first failed attempt.
>
> On the contrary, in fact, it should have been other way around that is,
> as 2nd mirror being the last copy of the data, it should rather try
> equally hard to make this read cmd successful and at the same time with
> or without REQ_FAILFAST_DEV there is no performance benefits if the
> command is successful in the first attempt itself.
> 

There are some misunderstanding in the above commit log.

FAILFAST is only set for the validation phrase, i.e. retry read on the
same failed mirror.  When it comes to the 2nd mirror,
failed_bio->bi_vcnt becomes 1 so FAILFAST is not set any more.  IOW,
FAILFAST is only set when narrowing read failures to each 4K block, so
a quick retry is reasonable because anyway we have another mirror to
read.  You may want to check the comments in btrfs_check_repairable().

thanks,

-liubo

> The only benefit which would be achieved with REQ_FAILFAST_DEV is that
> when both the copies encounter failed read, then for the applications,
> the EIO will be reported roughly 40% faster (since it saves 4 retries).
> But when first mirror has failed whats more important will be to make
> a successful read from the 2nd mirror.
> 
> And for the DUP case where both the copies are on the same disk, this
> makes to retry 5 times on 2 different LBA/sector but on the same disk,
> which probably is not a good idea if your test case involves pulling
> out the disk. But as of now we don't have a way to tell the block layer
> how critical a read is, so that block layer can accommodate the retry
> dynamically.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/extent_io.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d8eb457f2a70..29c03fb4d5af 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2388,9 +2388,6 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
>  		return -EIO;
>  	}
>  
> -	if (failed_bio->bi_vcnt > 1)
> -		read_mode |= REQ_FAILFAST_DEV;
> -
>  	phy_offset >>= inode->i_sb->s_blocksize_bits;
>  	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
>  				      start - page_offset(page),
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-11-29 19:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-29  7:11 [PATCH 1/2] btrfs: submit a normal bio for the mirror read Anand Jain
2017-11-29  7:11 ` [PATCH 2/2] btrfs: submit a normal bio for the mirror dio read Anand Jain
2017-11-29 19:26 ` [PATCH 1/2] btrfs: submit a normal bio for the mirror read Liu Bo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).