* [PATCH] btrfs: fix false EIO for missing device
@ 2017-10-13 13:42 Anand Jain
2017-10-13 18:01 ` Liu Bo
2017-10-14 0:34 ` [PATCH v2] " Anand Jain
0 siblings, 2 replies; 5+ messages in thread
From: Anand Jain @ 2017-10-13 13:42 UTC (permalink / raw)
To: linux-btrfs
When one of the devices is missing, bbio_error() takes care
of setting the error status. If it is the only IO still
pending in that stripe, it fails to check the status of the
other IO recorded in %bbio->error before setting the error
%bi_status for the %orig_bio. Fix this by checking whether
%bbio->error has crossed %bbio->max_errors. Thanks.
With the reproducer below, the fdatasync error is seen intermittently.
mount -o degraded /dev/sdc /btrfs
dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
The problem is intermittent because the following conditions
have to be met, which depends on timing.
In btrfs_map_bio()
. For RAID1, the missing device has to be at %dev_nr = 1
In bbio_error()
. Before bbio_error() is called, the bio of the not-missing
device at %dev_nr = 0 must have completed, so that the
condition below is true
if (atomic_dec_and_test(&bbio->stripes_pending)) {
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9af633dcf015..efd502176915 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 		btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
 		bio->bi_iter.bi_sector = logical >> 9;
-		bio->bi_status = BLK_STS_IOERR;
+		if (atomic_read(&bbio->error) > bbio->max_errors)
+			bio->bi_status = BLK_STS_IOERR;
+		else
+			bio->bi_status = 0;
 		btrfs_end_bbio(bbio, bio);
 	}
 }
--
2.13.1
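
The decision made by the patched bbio_error() can be sketched in
userspace C11. This is only a model under stated assumptions, not the
kernel code: struct model_bbio, enum model_status and model_bbio_error()
are hypothetical names, the struct keeps only the three fields the patch
touches, and the sketch assumes the error counter is incremented before
the pending count is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's BLK_STS_OK / BLK_STS_IOERR. */
enum model_status { MODEL_STS_OK = 0, MODEL_STS_IOERR = 10 };

/* Hypothetical, trimmed-down stand-in for struct btrfs_bio. */
struct model_bbio {
	atomic_int stripes_pending;	/* stripe bios not yet completed */
	atomic_int error;		/* stripe bios that have failed */
	int max_errors;			/* failures the profile tolerates */
};

/*
 * Model of bbio_error() with the patch applied (assumed ordering: count
 * the error first, then drop the pending count).
 * Returns true and sets *status only when this call completed the last
 * pending stripe; otherwise *status is left untouched.
 */
static inline bool model_bbio_error(struct model_bbio *bbio,
				    enum model_status *status)
{
	assert(bbio->max_errors >= 0);

	atomic_fetch_add(&bbio->error, 1);

	/* atomic_fetch_sub() returns the old value; 1 means we were last. */
	if (atomic_fetch_sub(&bbio->stripes_pending, 1) != 1)
		return false;

	/* Patched behaviour: fail only if too many stripes failed. */
	if (atomic_load(&bbio->error) > bbio->max_errors)
		*status = MODEL_STS_IOERR;
	else
		*status = MODEL_STS_OK;
	return true;
}
```

With max_errors = 1 and a single failed stripe (the degraded RAID1 case
from the reproducer), the model reports MODEL_STS_OK where the unpatched
code would have reported an I/O error unconditionally.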
* Re: [PATCH] btrfs: fix false EIO for missing device
2017-10-13 13:42 [PATCH] btrfs: fix false EIO for missing device Anand Jain
@ 2017-10-13 18:01 ` Liu Bo
2017-10-14 0:33 ` Anand Jain
2017-10-14 0:34 ` [PATCH v2] " Anand Jain
1 sibling, 1 reply; 5+ messages in thread
From: Liu Bo @ 2017-10-13 18:01 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Fri, Oct 13, 2017 at 09:42:18PM +0800, Anand Jain wrote:
> When one of the devices is missing, bbio_error() takes care
> of setting the error status. If it is the only IO still
> pending in that stripe, it fails to check the status of the
> other IO recorded in %bbio->error before setting the error
> %bi_status for the %orig_bio. Fix this by checking whether
> %bbio->error has crossed %bbio->max_errors. Thanks.
>
> With the reproducer below, the fdatasync error is seen intermittently.
>
> mount -o degraded /dev/sdc /btrfs
> dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
>
> dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
>
> The problem is intermittent because the following conditions
> have to be met, which depends on timing.
> In btrfs_map_bio()
> . For RAID1, the missing device has to be at %dev_nr = 1
> In bbio_error()
> . Before bbio_error() is called, the bio of the not-missing
> device at %dev_nr = 0 must have completed, so that the
> condition below is true
> if (atomic_dec_and_test(&bbio->stripes_pending)) {
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/volumes.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9af633dcf015..efd502176915 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
>
> btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
> bio->bi_iter.bi_sector = logical >> 9;
> - bio->bi_status = BLK_STS_IOERR;
> + if (atomic_read(&bbio->error) > bbio->max_errors)
> + bio->bi_status = BLK_STS_IOERR;
> + else
> + bio->bi_status = 0;
Thanks for the fix, I'd prefer BLK_STS_OK rather than 0.
With that,
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
-liubo
> btrfs_end_bbio(bbio, bio);
> }
> }
> --
> 2.13.1
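
The review comment is about readability rather than behaviour. A minimal
sketch of why, assuming BLK_STS_OK is defined as 0 and BLK_STS_IOERR as a
non-zero code as in the kernel's blk_types.h; demo_blk_status_t, the
DEMO_* macros and demo_pick_status() are hypothetical userspace stand-ins,
not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are assumed to mirror the kernel's. */
typedef uint8_t demo_blk_status_t;
#define DEMO_BLK_STS_OK		((demo_blk_status_t)0)
#define DEMO_BLK_STS_IOERR	((demo_blk_status_t)10)

/*
 * Same threshold test as in the patch, written with the named constant.
 * Returning a literal 0 instead of DEMO_BLK_STS_OK would give the same
 * value; the named constant only makes the intent explicit.
 */
static inline demo_blk_status_t demo_pick_status(int errors, int max_errors)
{
	assert(max_errors >= 0);
	return errors > max_errors ? DEMO_BLK_STS_IOERR : DEMO_BLK_STS_OK;
}
```

Under that assumption, switching from 0 to the named constant does not
change what the patch does at run time.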
* Re: [PATCH] btrfs: fix false EIO for missing device
2017-10-13 18:01 ` Liu Bo
@ 2017-10-14 0:33 ` Anand Jain
0 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2017-10-14 0:33 UTC (permalink / raw)
To: bo.li.liu; +Cc: linux-btrfs
>> - bio->bi_status = BLK_STS_IOERR;
>> + if (atomic_read(&bbio->error) > bbio->max_errors)
>> + bio->bi_status = BLK_STS_IOERR;
>> + else
>> + bio->bi_status = 0;
>
> Thanks for the fix, I'd prefer BLK_STS_OK rather than 0.
>
> With that,
>
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Thanks for the review, will fix it.
-Anand
> -liubo
* [PATCH v2] btrfs: fix false EIO for missing device
2017-10-13 13:42 [PATCH] btrfs: fix false EIO for missing device Anand Jain
2017-10-13 18:01 ` Liu Bo
@ 2017-10-14 0:34 ` Anand Jain
2017-10-16 14:29 ` David Sterba
1 sibling, 1 reply; 5+ messages in thread
From: Anand Jain @ 2017-10-14 0:34 UTC (permalink / raw)
To: linux-btrfs
When one of the devices is missing, bbio_error() takes care
of setting the error status. If it is the only IO still
pending in that stripe, it fails to check the status of the
other IO recorded in %bbio->error before setting the error
%bi_status for the %orig_bio. Fix this by checking whether
%bbio->error has crossed %bbio->max_errors. Thanks.
With the reproducer below, the fdatasync error is seen intermittently.
mount -o degraded /dev/sdc /btrfs
dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
The problem is intermittent because the following conditions
have to be met, which depends on timing.
In btrfs_map_bio()
. For RAID1, the missing device has to be at %dev_nr = 1
In bbio_error()
. Before bbio_error() is called, the bio of the not-missing
device at %dev_nr = 0 must have completed, so that the
condition below is true
if (atomic_dec_and_test(&bbio->stripes_pending)) {
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: Use BLK_STS_OK instead of 0.
fs/btrfs/volumes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 41c02a3ffc78..15e017af756c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 		btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
 		bio->bi_iter.bi_sector = logical >> 9;
-		bio->bi_status = BLK_STS_IOERR;
+		if (atomic_read(&bbio->error) > bbio->max_errors)
+			bio->bi_status = BLK_STS_IOERR;
+		else
+			bio->bi_status = BLK_STS_OK;
 		btrfs_end_bbio(bbio, bio);
 	}
 }
--
2.13.1
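
Why the failure depends on completion order can be replayed in a small
userspace simulation. This is a sketch under assumptions, not kernel
code: all sim_* names are hypothetical, a two-stripe RAID1 write is
assumed to tolerate one failure (max_errors = 1), and the normal end-io
path is assumed to already apply the %bbio->error versus
%bbio->max_errors test when it completes the last stripe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_status { SIM_STS_OK = 0, SIM_STS_IOERR = 10 };
enum sim_event { SIM_GOOD_BIO_DONE, SIM_MISSING_DEV_ERROR };

/* Hypothetical, trimmed-down stand-in for struct btrfs_bio. */
struct sim_bbio {
	atomic_int stripes_pending;
	atomic_int error;
	int max_errors;
};

/* Threshold test: fail only when more stripes failed than tolerated. */
static inline enum sim_status sim_threshold(struct sim_bbio *bbio)
{
	return atomic_load(&bbio->error) > bbio->max_errors ?
		SIM_STS_IOERR : SIM_STS_OK;
}

/*
 * Replay the two completions of a degraded RAID1 write in the given order
 * and return the status the original bio would end up with.
 * patched == false models the old bbio_error(), which sets an I/O error
 * unconditionally when it happens to complete the last pending stripe.
 */
static inline enum sim_status sim_raid1_write(const enum sim_event order[2],
					      bool patched)
{
	struct sim_bbio bbio = {
		.stripes_pending = 2, .error = 0, .max_errors = 1,
	};
	enum sim_status final = SIM_STS_OK;

	assert(order != NULL);

	for (int i = 0; i < 2; i++) {
		bool is_error = (order[i] == SIM_MISSING_DEV_ERROR);

		if (is_error)
			atomic_fetch_add(&bbio.error, 1);

		/* Old value 1 means this completion was the last one. */
		if (atomic_fetch_sub(&bbio.stripes_pending, 1) != 1)
			continue;

		if (is_error && !patched)
			final = SIM_STS_IOERR;	/* old: unconditional EIO */
		else
			final = sim_threshold(&bbio);
	}
	return final;
}
```

In this model only one ordering fails without the patch: the good
device's bio completes first and the missing-device error arrives last,
which matches the two conditions listed in the commit message. With
patched set to true, both orderings end with SIM_STS_OK.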
* Re: [PATCH v2] btrfs: fix false EIO for missing device
2017-10-14 0:34 ` [PATCH v2] " Anand Jain
@ 2017-10-16 14:29 ` David Sterba
0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2017-10-16 14:29 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Sat, Oct 14, 2017 at 08:34:02AM +0800, Anand Jain wrote:
> When one of the devices is missing, bbio_error() takes care
> of setting the error status. If it is the only IO still
> pending in that stripe, it fails to check the status of the
> other IO recorded in %bbio->error before setting the error
> %bi_status for the %orig_bio. Fix this by checking whether
> %bbio->error has crossed %bbio->max_errors. Thanks.
>
> With the reproducer below, the fdatasync error is seen intermittently.
>
> mount -o degraded /dev/sdc /btrfs
> dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
>
> dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
>
> The problem is intermittent because the following conditions
> have to be met, which depends on timing.
> In btrfs_map_bio()
> . For RAID1, the missing device has to be at %dev_nr = 1
> In bbio_error()
> . Before bbio_error() is called, the bio of the not-missing
> device at %dev_nr = 0 must have completed, so that the
> condition below is true
> if (atomic_dec_and_test(&bbio->stripes_pending)) {
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Added to queue, thanks.