* [PATCH 0/8 v1] utilize bio_clone_fast to clean up
@ 2017-05-17 22:22 Liu Bo
2017-05-17 22:22 ` [PATCH 1/8] Btrfs: use bio_clone_fast to clone our bio Liu Bo
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
v1: - Drop the RFC tag.
- Update to use bio_segments accordingly as __bio_segments is removed.
- Remove if (!bio) since bio_clone_fast with bioset and GFP_NOFS will
never fail.
This series uses bio_clone_fast() in the places where we clone a bio,
such as when a bio is cloned for multiple disks and when a bio is split
during dio submit.
One benefit is that dio submit becomes simpler, since it no longer has to
call bio_add_page() page by page.
Another benefit is that, compared to bio_clone_bioset, bio_clone_fast is
faster because it copies the vector pointer directly. However,
bio_clone_fast doesn't modify bi_vcnt, so the extra work is to convert our
current bi_vcnt users to iterate over bvecs with bi_iter.
Here are some numbers collected with the script [1]. Note that most
performance tests issue bs=4k dio writes/reads, so our directIO split code
is not exercised, as it requires bs > stripe_len (64K); thus I made this
simple script, which writes 2G with bs=128K.
- vanilla:
real 0m10.265s
user 0m0.005s
sys 0m9.164s
- patched:
real 0m8.973s
user 0m0.006s
sys 0m7.804s
[1]:
#!/bin/bash
M=/mnt/btrfs
D1=/dev/pmem0p1
D2=/dev/pmem0p2
umount $M
mkfs.btrfs -f $D1 $D2 >/dev/null || exit
mount $D1 $M -onodatasum || exit
xfs_io -f -c "falloc 0 2G" $M/foo
time xfs_io -d -c "pwrite -b 128K 0 2G" $M/foo
Liu Bo (8):
Btrfs: use bio_clone_fast to clone our bio
Btrfs: new helper btrfs_bio_clone_partial
Btrfs: use bio_clone_bioset_partial to simplify DIO submit
Btrfs: change how we iterate bios in endio
Btrfs: record error if one block has failed to retry
Btrfs: make check-integrity use bvec_iter
Btrfs: unify naming of btrfs_io_bio
Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial
fs/btrfs/check-integrity.c | 27 +++---
fs/btrfs/extent_io.c | 20 ++++-
fs/btrfs/extent_io.h | 1 +
fs/btrfs/file-item.c | 31 ++++---
fs/btrfs/inode.c | 204 ++++++++++++++++++++-------------------------
fs/btrfs/volumes.h | 1 +
6 files changed, 142 insertions(+), 142 deletions(-)
--
2.9.4
* [PATCH 1/8] Btrfs: use bio_clone_fast to clone our bio
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-17 22:22 ` [PATCH 2/8] Btrfs: new helper btrfs_bio_clone_partial Liu Bo
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
For raid1 and raid10, we clone the original bio into several bios, which
are then sent to different disks.
Right now we use bio_clone_bioset, which creates the clone by iterating
over bi_io_vec and copying each entry. This changes it to use
bio_clone_fast(), which creates the clone by copying only the bi_io_vec
pointer instead of iterating over bi_io_vec.
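For illustration only (not part of the patch): a minimal userspace sketch of the difference between the two cloning strategies. The model_* types and functions are hypothetical stand-ins for struct bio, struct bio_vec, bio_clone_bioset and bio_clone_fast; they ignore partially consumed vectors and all of the real allocation machinery.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bio. */
struct model_vec {
	unsigned int len;
	unsigned int offset;
};

struct model_bio {
	struct model_vec *vecs;	/* role of bi_io_vec */
	unsigned int vcnt;	/* role of bi_vcnt */
	unsigned int idx;	/* role of bi_iter.bi_idx */
	unsigned int size;	/* role of bi_iter.bi_size, in bytes */
};

/* Copying clone: allocate a new vector and copy every remaining entry. */
int model_clone_copy(const struct model_bio *src, struct model_bio *dst)
{
	unsigned int left = src->size;
	unsigned int i = src->idx;

	assert(src->vcnt > 0);
	dst->vecs = malloc(src->vcnt * sizeof(*dst->vecs));
	if (!dst->vecs)
		return -1;

	dst->vcnt = 0;
	while (left > 0 && i < src->vcnt) {
		unsigned int len = src->vecs[i].len;

		dst->vecs[dst->vcnt++] = src->vecs[i++];
		left -= (len < left) ? len : left;
	}
	dst->idx = 0;
	dst->size = src->size;
	return 0;
}

/* Fast clone: share the vector, copy only the iterator, leave vcnt at 0. */
void model_clone_fast(const struct model_bio *src, struct model_bio *dst)
{
	dst->vecs = src->vecs;
	dst->vcnt = 0;
	dst->idx = src->idx;
	dst->size = src->size;
}
```

In this model the fast clone does constant work regardless of the number of vectors, but its vcnt says nothing about the data it covers, so any walk over it has to go through idx and size.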
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/extent_io.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 27fdb25..0d4aea4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2700,7 +2700,7 @@ struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask)
struct btrfs_io_bio *btrfs_bio;
struct bio *new;
- new = bio_clone_bioset(bio, gfp_mask, btrfs_bioset);
+ new = bio_clone_fast(bio, gfp_mask, btrfs_bioset);
if (new) {
btrfs_bio = btrfs_io_bio(new);
btrfs_bio->csum = NULL;
--
2.9.4
* [PATCH 2/8] Btrfs: new helper btrfs_bio_clone_partial
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
2017-05-17 22:22 ` [PATCH 1/8] Btrfs: use bio_clone_fast to clone our bio Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-17 22:22 ` [PATCH 3/8] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
This adds a new helper, btrfs_bio_clone_partial, which allocates a cloned
bio that covers only a part of the original bio's data.
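For illustration only (not part of the patch): a userspace sketch of what trimming a clone to a byte range amounts to. The model_* names are hypothetical; model_advance loosely follows the way the kernel advances a bvec_iter, and model_clone_partial assumes, as the helper's ">> 9" conversions imply, that offset and size are multiples of 512 bytes.

```c
#include <assert.h>

/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct model_vec {
	unsigned int len;
};

/* Hypothetical stand-in for struct bvec_iter. */
struct model_iter {
	unsigned long long sector;	/* role of bi_sector, 512-byte units */
	unsigned int size;		/* role of bi_size, bytes remaining */
	unsigned int idx;		/* role of bi_idx */
	unsigned int done;		/* role of bi_bvec_done */
};

/* Move the iterator forward by @bytes without touching the vector. */
void model_advance(const struct model_vec *vecs, struct model_iter *it,
		   unsigned int bytes)
{
	assert(bytes <= it->size);
	it->sector += bytes >> 9;
	it->size -= bytes;

	bytes += it->done;
	while (bytes && bytes >= vecs[it->idx].len) {
		bytes -= vecs[it->idx].len;
		it->idx++;
	}
	it->done = bytes;
}

/*
 * Return an iterator covering @size bytes starting @offset bytes into the
 * range described by @orig. The caller's iterator is passed by value and
 * is therefore left untouched.
 */
struct model_iter model_clone_partial(const struct model_vec *vecs,
				      struct model_iter orig,
				      unsigned int offset, unsigned int size)
{
	assert((offset & 511) == 0 && (size & 511) == 0);
	assert(offset <= orig.size && size <= orig.size - offset);

	model_advance(vecs, &orig, offset);
	orig.size = size;
	return orig;
}
```

Only the iterator changes; the vector itself is shared and never modified, which is what makes the partial clone cheap.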
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/extent_io.c | 18 ++++++++++++++++++
fs/btrfs/extent_io.h | 2 ++
2 files changed, 20 insertions(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0d4aea4..47c0ee2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2726,6 +2726,24 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
return bio;
}
+struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask,
+ int offset, int size)
+{
+ struct bio *bio;
+ struct btrfs_io_bio *btrfs_bio;
+
+ /* this will never fail when it's backed by a bioset */
+ bio = bio_clone_fast(orig, gfp_mask, btrfs_bioset);
+ ASSERT(bio);
+
+ btrfs_bio = btrfs_io_bio(bio);
+ btrfs_bio->csum = NULL;
+ btrfs_bio->csum_allocated = NULL;
+ btrfs_bio->end_io = NULL;
+
+ bio_trim(bio, offset >> 9, size >> 9);
+ return bio;
+}
static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
unsigned long bio_flags)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3e4fad4..b2235b5 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -460,6 +460,8 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
gfp_t gfp_flags);
struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs);
struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask);
+struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask,
+ int offset, int size);
struct btrfs_fs_info;
struct btrfs_inode;
--
2.9.4
* [PATCH 3/8] Btrfs: use bio_clone_bioset_partial to simplify DIO submit
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
2017-05-17 22:22 ` [PATCH 1/8] Btrfs: use bio_clone_fast to clone our bio Liu Bo
2017-05-17 22:22 ` [PATCH 2/8] Btrfs: new helper btrfs_bio_clone_partial Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-17 22:22 ` [PATCH 4/8] Btrfs: change how we iterate bios in endio Liu Bo
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
Currently, when mapping a bio to limit it to a single stripe length, we
split the bio by adding pages to a new bio one by one. However, we never
modify the bio's vector afterwards, so we can use bio_clone_fast and
share the original bio vector directly.
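For illustration only (not part of the patch): a userspace sketch of the offset/length bookkeeping in the new split loop. model_map_length is a hypothetical stand-in for btrfs_map_block() that simply caps the length at the next stripe boundary; the real mapping depends on the chunk layout.

```c
#include <assert.h>
#include <stddef.h>

/* One piece of the original bio: byte offset into it and byte length. */
struct model_chunk {
	unsigned long long offset;
	unsigned long long len;
};

/*
 * Hypothetical stand-in for btrfs_map_block(): how many of @want bytes
 * starting at @logical fit before the next stripe boundary.
 */
unsigned long long model_map_length(unsigned long long logical,
				    unsigned long long want,
				    unsigned long long stripe_len)
{
	unsigned long long to_boundary = stripe_len - (logical % stripe_len);

	return want < to_boundary ? want : to_boundary;
}

/* Split @submit_len bytes at @logical into per-stripe chunks. */
size_t model_split(unsigned long long logical, unsigned long long submit_len,
		   unsigned long long stripe_len,
		   struct model_chunk *out, size_t max_out)
{
	unsigned long long clone_offset = 0;
	size_t n = 0;

	assert(stripe_len > 0);
	while (submit_len > 0) {
		unsigned long long clone_len =
			model_map_length(logical, submit_len, stripe_len);

		assert(n < max_out);
		out[n].offset = clone_offset;
		out[n].len = clone_len;
		n++;

		clone_offset += clone_len;
		logical += clone_len;
		submit_len -= clone_len;
	}
	return n;
}
```

With a 64K stripe, a 128K write starting 32K into a stripe yields three chunks (32K, 64K, 32K), each of which would correspond to one partial clone of the original bio.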
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/inode.c | 121 +++++++++++++++++++++----------------------------------
1 file changed, 46 insertions(+), 75 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 134471b..b478ee0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8234,16 +8234,6 @@ static void btrfs_end_dio_bio(struct bio *bio)
bio_put(bio);
}
-static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
- u64 first_sector, gfp_t gfp_flags)
-{
- struct bio *bio;
- bio = btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
- if (bio)
- bio_associate_current(bio);
- return bio;
-}
-
static inline int btrfs_lookup_and_bind_dio_csum(struct inode *inode,
struct btrfs_dio_private *dip,
struct bio *bio,
@@ -8331,26 +8321,25 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
struct inode *inode = dip->inode;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_root *root = BTRFS_I(inode)->root;
- struct bio *bio;
+ struct bio *bio = NULL;
struct bio *orig_bio = dip->orig_bio;
- struct bio_vec *bvec;
u64 start_sector = orig_bio->bi_iter.bi_sector;
u64 file_offset = dip->logical_offset;
- u64 submit_len = 0;
u64 map_length;
- u32 blocksize = fs_info->sectorsize;
int async_submit = 0;
- int nr_sectors;
+ u64 submit_len;
+ int clone_offset = 0;
+ int clone_len;
int ret;
- int i, j;
map_length = orig_bio->bi_iter.bi_size;
+ submit_len = map_length;
ret = btrfs_map_block(fs_info, btrfs_op(orig_bio), start_sector << 9,
&map_length, NULL, 0);
if (ret)
return -EIO;
- if (map_length >= orig_bio->bi_iter.bi_size) {
+ if (map_length >= submit_len) {
bio = orig_bio;
dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED;
goto submit;
@@ -8362,70 +8351,52 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
else
async_submit = 1;
- bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
- if (!bio)
- return -ENOMEM;
-
- bio->bi_opf = orig_bio->bi_opf;
- bio->bi_private = dip;
- bio->bi_end_io = btrfs_end_dio_bio;
- btrfs_io_bio(bio)->logical = file_offset;
+ /* bio split */
+ ASSERT(map_length <= INT_MAX);
atomic_inc(&dip->pending_bios);
+ while (submit_len > 0) {
+ clone_len = min_t(int, submit_len, map_length);
- bio_for_each_segment_all(bvec, orig_bio, j) {
- nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
- i = 0;
-next_block:
- if (unlikely(map_length < submit_len + blocksize ||
- bio_add_page(bio, bvec->bv_page, blocksize,
- bvec->bv_offset + (i * blocksize)) < blocksize)) {
- /*
- * inc the count before we submit the bio so
- * we know the end IO handler won't happen before
- * we inc the count. Otherwise, the dip might get freed
- * before we're done setting it up
- */
- atomic_inc(&dip->pending_bios);
- ret = __btrfs_submit_dio_bio(bio, inode,
- file_offset, skip_sum,
- async_submit);
- if (ret) {
- bio_put(bio);
- atomic_dec(&dip->pending_bios);
- goto out_err;
- }
+ /*
+ * This will never fail as it's passing GPF_NOFS and
+ * the allocation is backed by btrfs_bioset.
+ */
+ bio = btrfs_bio_clone_partial(orig_bio, GFP_NOFS, clone_offset,
+ clone_len);
+ bio->bi_private = dip;
+ bio->bi_end_io = btrfs_end_dio_bio;
+ btrfs_io_bio(bio)->logical = file_offset;
+
+ ASSERT(submit_len >= clone_len);
+ submit_len -= clone_len;
+ if (submit_len == 0)
+ break;
- start_sector += submit_len >> 9;
- file_offset += submit_len;
+ /*
+ * Increase the count before we submit the bio so we know
+ * the end IO handler won't happen before we increase the
+ * count. Otherwise, the dip might get freed before we're
+ * done setting it up.
+ */
+ atomic_inc(&dip->pending_bios);
- submit_len = 0;
+ ret = __btrfs_submit_dio_bio(bio, inode, file_offset, skip_sum,
+ async_submit);
+ if (ret) {
+ bio_put(bio);
+ atomic_dec(&dip->pending_bios);
+ goto out_err;
+ }
- bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
- start_sector, GFP_NOFS);
- if (!bio)
- goto out_err;
- bio->bi_opf = orig_bio->bi_opf;
- bio->bi_private = dip;
- bio->bi_end_io = btrfs_end_dio_bio;
- btrfs_io_bio(bio)->logical = file_offset;
+ clone_offset += clone_len;
+ start_sector += clone_len >> 9;
+ file_offset += clone_len;
- map_length = orig_bio->bi_iter.bi_size;
- ret = btrfs_map_block(fs_info, btrfs_op(orig_bio),
- start_sector << 9,
- &map_length, NULL, 0);
- if (ret) {
- bio_put(bio);
- goto out_err;
- }
-
- goto next_block;
- } else {
- submit_len += blocksize;
- if (--nr_sectors) {
- i++;
- goto next_block;
- }
- }
+ map_length = submit_len;
+ ret = btrfs_map_block(fs_info, btrfs_op(orig_bio),
+ start_sector << 9, &map_length, NULL, 0);
+ if (ret)
+ goto out_err;
}
submit:
--
2.9.4
* [PATCH 4/8] Btrfs: change how we iterate bios in endio
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
` (2 preceding siblings ...)
2017-05-17 22:22 ` [PATCH 3/8] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-06-12 17:05 ` [PATCH v3] " Liu Bo
2017-05-17 22:22 ` [PATCH 5/8] Btrfs: record error if one block has failed to retry Liu Bo
` (4 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
Since dio submit now uses bio_clone_fast, the submitted bio may not have a
reliable bi_vcnt. For the bio vector iterations in the checksum-related
functions, bio->bi_iter has not been modified yet, so it is safe to use
bio_for_each_segment. For the bio vector iterations in dio's read endio,
we now save a copy of bvec_iter in struct btrfs_io_bio when cloning bios
and use the helper __bio_for_each_segment with the saved bvec_iter to
access each bvec.
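For illustration only (not part of the patch): a userspace sketch of why a saved iterator is needed at endio time. It assumes that by then the bio's live iterator has already been advanced to the end, so only the snapshot taken at clone time still describes the data. All model_* names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct model_vec {
	unsigned int len;
};

/* Hypothetical stand-in for struct bvec_iter. */
struct model_iter {
	unsigned int size;	/* role of bi_size, bytes remaining */
	unsigned int idx;	/* role of bi_idx */
	unsigned int done;	/* role of bi_bvec_done */
};

/*
 * Walk the segments described by @it, which is passed by value so the
 * caller's copy is not consumed. Returns the number of segments and
 * stores the number of bytes covered in @bytes_seen.
 */
unsigned int model_for_each_segment(const struct model_vec *vecs,
				    struct model_iter it,
				    unsigned int *bytes_seen)
{
	unsigned int n = 0;

	*bytes_seen = 0;
	while (it.size) {
		unsigned int seg = vecs[it.idx].len - it.done;

		if (seg > it.size)
			seg = it.size;

		*bytes_seen += seg;
		n++;

		it.size -= seg;
		it.done += seg;
		if (it.done == vecs[it.idx].len) {
			it.idx++;
			it.done = 0;
		}
	}
	return n;
}
```

Walking from an exhausted live iterator visits nothing, while walking from the snapshot visits exactly the segments the clone was created with, including one that starts partway into a vector.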
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/extent_io.c | 1 +
fs/btrfs/file-item.c | 31 +++++++++++++++----------------
fs/btrfs/inode.c | 35 +++++++++++++++++++----------------
fs/btrfs/volumes.h | 1 +
4 files changed, 36 insertions(+), 32 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 47c0ee2..eb5229e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2742,6 +2742,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask,
btrfs_bio->end_io = NULL;
bio_trim(bio, offset >> 9, size >> 9);
+ btrfs_bio->iter = bio->bi_iter;
return bio;
}
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 64fcb31..9f6062c 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -164,7 +164,8 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
u64 logical_offset, u32 *dst, int dio)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
struct btrfs_csum_item *item = NULL;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
@@ -177,7 +178,7 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
u64 page_bytes_left;
u32 diff;
int nblocks;
- int count = 0, i;
+ int count = 0;
u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
path = btrfs_alloc_path();
@@ -206,8 +207,6 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
if (bio->bi_iter.bi_size > PAGE_SIZE * 8)
path->reada = READA_FORWARD;
- WARN_ON(bio->bi_vcnt <= 0);
-
/*
* the free space stuff is only read when it hasn't been
* updated in the current transaction. So, we can safely
@@ -223,13 +222,13 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
if (dio)
offset = logical_offset;
- bio_for_each_segment_all(bvec, bio, i) {
- page_bytes_left = bvec->bv_len;
+ bio_for_each_segment(bvec, bio, iter) {
+ page_bytes_left = bvec.bv_len;
if (count)
goto next;
if (!dio)
- offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+ offset = page_offset(bvec.bv_page) + bvec.bv_offset;
count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
(u32 *)csum, nblocks);
if (count)
@@ -440,15 +439,15 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
struct btrfs_ordered_sum *sums;
struct btrfs_ordered_extent *ordered = NULL;
char *data;
- struct bio_vec *bvec;
+ struct bvec_iter iter;
+ struct bio_vec bvec;
int index;
int nr_sectors;
- int i, j;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
+ int i;
u64 offset;
- WARN_ON(bio->bi_vcnt <= 0);
sums = kzalloc(btrfs_ordered_sum_size(fs_info, bio->bi_iter.bi_size),
GFP_NOFS);
if (!sums)
@@ -465,19 +464,19 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
index = 0;
- bio_for_each_segment_all(bvec, bio, j) {
+ bio_for_each_segment(bvec, bio, iter) {
if (!contig)
- offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+ offset = page_offset(bvec.bv_page) + bvec.bv_offset;
if (!ordered) {
ordered = btrfs_lookup_ordered_extent(inode, offset);
BUG_ON(!ordered); /* Logic error */
}
- data = kmap_atomic(bvec->bv_page);
+ data = kmap_atomic(bvec.bv_page);
nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
- bvec->bv_len + fs_info->sectorsize
+ bvec.bv_len + fs_info->sectorsize
- 1);
for (i = 0; i < nr_sectors; i++) {
@@ -504,12 +503,12 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
+ total_bytes;
index = 0;
- data = kmap_atomic(bvec->bv_page);
+ data = kmap_atomic(bvec.bv_page);
}
sums->sums[index] = ~(u32)0;
sums->sums[index]
- = btrfs_csum_data(data + bvec->bv_offset
+ = btrfs_csum_data(data + bvec.bv_offset
+ (i * fs_info->sectorsize),
sums->sums[index],
fs_info->sectorsize);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b478ee0..997ee7d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7857,6 +7857,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
struct bio *bio;
int isector;
int read_mode = 0;
+ int segs;
int ret;
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -7872,9 +7873,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
return -EIO;
}
- if ((failed_bio->bi_vcnt > 1)
- || (failed_bio->bi_io_vec->bv_len
- > btrfs_inode_sectorsize(inode)))
+ segs = bio_segments(failed_bio);
+ if (segs > 1 ||
+ (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
read_mode |= REQ_FAILFAST_DEV;
isector = start - btrfs_io_bio(failed_bio)->logical;
@@ -7932,13 +7933,13 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
struct btrfs_io_bio *io_bio)
{
struct btrfs_fs_info *fs_info;
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
struct btrfs_retry_complete done;
u64 start;
unsigned int pgoff;
u32 sectorsize;
int nr_sectors;
- int i;
int ret;
fs_info = BTRFS_I(inode)->root->fs_info;
@@ -7946,17 +7947,18 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
start = io_bio->logical;
done.inode = inode;
+ io_bio->bio.bi_iter = io_bio->iter;
- bio_for_each_segment_all(bvec, &io_bio->bio, i) {
- nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
- pgoff = bvec->bv_offset;
+ bio_for_each_segment(bvec, &io_bio->bio, iter) {
+ nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
+ pgoff = bvec.bv_offset;
next_block_or_try_again:
done.uptodate = 0;
done.start = start;
init_completion(&done.done);
- ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+ ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
pgoff, start, start + sectorsize - 1,
io_bio->mirror_num,
btrfs_retry_endio_nocsum, &done);
@@ -8021,7 +8023,8 @@ static int __btrfs_subio_endio_read(struct inode *inode,
struct btrfs_io_bio *io_bio, int err)
{
struct btrfs_fs_info *fs_info;
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
struct btrfs_retry_complete done;
u64 start;
u64 offset = 0;
@@ -8029,7 +8032,6 @@ static int __btrfs_subio_endio_read(struct inode *inode,
int nr_sectors;
unsigned int pgoff;
int csum_pos;
- int i;
int uptodate = !!(err == 0);
int ret;
@@ -8039,16 +8041,17 @@ static int __btrfs_subio_endio_read(struct inode *inode,
err = 0;
start = io_bio->logical;
done.inode = inode;
+ io_bio->bio.bi_iter = io_bio->iter;
- bio_for_each_segment_all(bvec, &io_bio->bio, i) {
- nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
+ bio_for_each_segment(bvec, &io_bio->bio, iter) {
+ nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
- pgoff = bvec->bv_offset;
+ pgoff = bvec.bv_offset;
next_block:
if (uptodate) {
csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
ret = __readpage_endio_check(inode, io_bio, csum_pos,
- bvec->bv_page, pgoff,
+ bvec.bv_page, pgoff,
start, sectorsize);
if (likely(!ret))
goto next;
@@ -8058,7 +8061,7 @@ static int __btrfs_subio_endio_read(struct inode *inode,
done.start = start;
init_completion(&done.done);
- ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+ ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
pgoff, start, start + sectorsize - 1,
io_bio->mirror_num,
btrfs_retry_endio, &done);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59be812..558d73c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -280,6 +280,7 @@ struct btrfs_io_bio {
u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
u8 *csum_allocated;
btrfs_io_bio_end_io_t *end_io;
+ struct bvec_iter iter;
struct bio bio;
};
--
2.9.4
* [PATCH 5/8] Btrfs: record error if one block has failed to retry
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
` (3 preceding siblings ...)
2017-05-17 22:22 ` [PATCH 4/8] Btrfs: change how we iterate bios in endio Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-17 22:22 ` [PATCH 6/8] Btrfs: make check-integrity use bvec_iter Liu Bo
` (3 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
In the nocsum case of dio read endio, we return immediately if repairing
a block fails, which leaves the remaining blocks unrepaired. This
behavior differs from how buffered read endio works in the same case.
This changes it to only record the error and go on repairing the
remaining blocks.
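For illustration only (not part of the patch): a userspace sketch of the record-and-continue pattern. model_repair_all and model_fail_odd are hypothetical names; as in the patch, a later failure overwrites an earlier one, so the last error is what gets returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-block repair callback: returns 0 or a negative errno. */
typedef int (*model_repair_fn)(size_t block, void *ctx);

/*
 * Try to repair every block. A failure is recorded but does not stop the
 * loop, so the remaining blocks still get their repair attempt.
 */
int model_repair_all(size_t nr_blocks, model_repair_fn repair, void *ctx)
{
	int err = 0;

	for (size_t i = 0; i < nr_blocks; i++) {
		int ret = repair(i, ctx);

		if (ret)
			err = ret;	/* record and continue */
	}
	return err;
}

/* Example callback: counts attempts and fails on odd-numbered blocks. */
int model_fail_odd(size_t block, void *ctx)
{
	size_t *attempts = ctx;

	(*attempts)++;
	return (block & 1) ? -EIO : 0;
}
```

With four blocks and this callback, all four blocks are attempted even though blocks 1 and 3 fail, and the caller still sees -EIO.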
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/inode.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 997ee7d..3060c66 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7941,6 +7941,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
u32 sectorsize;
int nr_sectors;
int ret;
+ int err = 0;
fs_info = BTRFS_I(inode)->root->fs_info;
sectorsize = fs_info->sectorsize;
@@ -7962,8 +7963,10 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
pgoff, start, start + sectorsize - 1,
io_bio->mirror_num,
btrfs_retry_endio_nocsum, &done);
- if (ret)
- return ret;
+ if (ret) {
+ err = ret;
+ goto next;
+ }
wait_for_completion(&done.done);
@@ -7972,6 +7975,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
goto next_block_or_try_again;
}
+next:
start += sectorsize;
nr_sectors--;
@@ -7982,7 +7986,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
}
}
- return 0;
+ return err;
}
static void btrfs_retry_endio(struct bio *bio)
--
2.9.4
* [PATCH 6/8] Btrfs: make check-integrity use bvec_iter
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
` (4 preceding siblings ...)
2017-05-17 22:22 ` [PATCH 5/8] Btrfs: record error if one block has failed to retry Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-17 22:22 ` [PATCH 7/8] Btrfs: unify naming of btrfs_io_bio Liu Bo
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
Some check-integrity code depends on bio->bi_vcnt. This changes it to
count and iterate over bio segments instead, because some bios passed
here may not have a reliable bi_vcnt.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/check-integrity.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index ab14c2e..6b080f1 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2822,44 +2822,47 @@ static void __btrfsic_submit_bio(struct bio *bio)
dev_state = btrfsic_dev_state_lookup(bio->bi_bdev);
if (NULL != dev_state &&
(bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
- unsigned int i;
+ unsigned int i = 0;
u64 dev_bytenr;
u64 cur_bytenr;
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
int bio_is_patched;
char **mapped_datav;
+ unsigned int segs = bio_segments(bio);
dev_bytenr = 512 * bio->bi_iter.bi_sector;
bio_is_patched = 0;
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
pr_info("submit_bio(rw=%d,0x%x, bi_vcnt=%u, bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
- bio_op(bio), bio->bi_opf, bio->bi_vcnt,
+ bio_op(bio), bio->bi_opf, segs,
(unsigned long long)bio->bi_iter.bi_sector,
dev_bytenr, bio->bi_bdev);
- mapped_datav = kmalloc_array(bio->bi_vcnt,
+ mapped_datav = kmalloc_array(segs,
sizeof(*mapped_datav), GFP_NOFS);
if (!mapped_datav)
goto leave;
cur_bytenr = dev_bytenr;
- bio_for_each_segment_all(bvec, bio, i) {
- BUG_ON(bvec->bv_len != PAGE_SIZE);
- mapped_datav[i] = kmap(bvec->bv_page);
+ bio_for_each_segment(bvec, bio, iter) {
+ BUG_ON(bvec.bv_len != PAGE_SIZE);
+ mapped_datav[i] = kmap(bvec.bv_page);
+ i++;
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH_VERBOSE)
pr_info("#%u: bytenr=%llu, len=%u, offset=%u\n",
- i, cur_bytenr, bvec->bv_len, bvec->bv_offset);
- cur_bytenr += bvec->bv_len;
+ i, cur_bytenr, bvec.bv_len, bvec.bv_offset);
+ cur_bytenr += bvec.bv_len;
}
btrfsic_process_written_block(dev_state, dev_bytenr,
- mapped_datav, bio->bi_vcnt,
+ mapped_datav, segs,
bio, &bio_is_patched,
NULL, bio->bi_opf);
- bio_for_each_segment_all(bvec, bio, i)
- kunmap(bvec->bv_page);
+ bio_for_each_segment(bvec, bio, iter)
+ kunmap(bvec.bv_page);
kfree(mapped_datav);
} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
if (dev_state->state->print_mask &
--
2.9.4
* [PATCH 7/8] Btrfs: unify naming of btrfs_io_bio
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
` (5 preceding siblings ...)
2017-05-17 22:22 ` [PATCH 6/8] Btrfs: make check-integrity use bvec_iter Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-17 22:22 ` [PATCH 8/8] Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial Liu Bo
2017-05-19 17:28 ` [PATCH 0/8 v1] utilize bio_clone_fast to clean up David Sterba
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
All dio endio functions use the name io_bio for struct btrfs_io_bio. This
makes btrfs_submit_direct follow the same convention.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/inode.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3060c66..042a13e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8431,16 +8431,16 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
loff_t file_offset)
{
struct btrfs_dio_private *dip = NULL;
- struct bio *io_bio = NULL;
- struct btrfs_io_bio *btrfs_bio;
+ struct bio *bio = NULL;
+ struct btrfs_io_bio *io_bio;
int skip_sum;
bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
int ret = 0;
skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
- io_bio = btrfs_bio_clone(dio_bio, GFP_NOFS);
- if (!io_bio) {
+ bio = btrfs_bio_clone(dio_bio, GFP_NOFS);
+ if (!bio) {
ret = -ENOMEM;
goto free_ordered;
}
@@ -8456,17 +8456,17 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
dip->logical_offset = file_offset;
dip->bytes = dio_bio->bi_iter.bi_size;
dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
- io_bio->bi_private = dip;
- dip->orig_bio = io_bio;
+ bio->bi_private = dip;
+ dip->orig_bio = bio;
dip->dio_bio = dio_bio;
atomic_set(&dip->pending_bios, 0);
- btrfs_bio = btrfs_io_bio(io_bio);
- btrfs_bio->logical = file_offset;
+ io_bio = btrfs_io_bio(bio);
+ io_bio->logical = file_offset;
if (write) {
- io_bio->bi_end_io = btrfs_endio_direct_write;
+ bio->bi_end_io = btrfs_endio_direct_write;
} else {
- io_bio->bi_end_io = btrfs_endio_direct_read;
+ bio->bi_end_io = btrfs_endio_direct_read;
dip->subio_endio = btrfs_subio_endio_read;
}
@@ -8489,8 +8489,8 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
if (!ret)
return;
- if (btrfs_bio->end_io)
- btrfs_bio->end_io(btrfs_bio, ret);
+ if (io_bio->end_io)
+ io_bio->end_io(io_bio, ret);
free_ordered:
/*
@@ -8502,16 +8502,16 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
* same as btrfs_endio_direct_[write|read] because we can't call these
* callbacks - they require an allocated dip and a clone of dio_bio.
*/
- if (io_bio && dip) {
- io_bio->bi_error = -EIO;
- bio_endio(io_bio);
+ if (bio && dip) {
+ bio->bi_error = -EIO;
+ bio_endio(bio);
/*
- * The end io callbacks free our dip, do the final put on io_bio
+ * The end io callbacks free our dip, do the final put on bio
* and all the cleanup and final put for dio_bio (through
* dio_end_io()).
*/
dip = NULL;
- io_bio = NULL;
+ bio = NULL;
} else {
if (write)
btrfs_endio_direct_write_update_ordered(inode,
@@ -8529,8 +8529,8 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
*/
dio_end_io(dio_bio, ret);
}
- if (io_bio)
- bio_put(io_bio);
+ if (bio)
+ bio_put(bio);
kfree(dip);
}
--
2.9.4
* [PATCH 8/8] Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
` (6 preceding siblings ...)
2017-05-17 22:22 ` [PATCH 7/8] Btrfs: unify naming of btrfs_io_bio Liu Bo
@ 2017-05-17 22:22 ` Liu Bo
2017-05-19 17:28 ` [PATCH 0/8 v1] utilize bio_clone_fast to clean up David Sterba
8 siblings, 0 replies; 12+ messages in thread
From: Liu Bo @ 2017-05-17 22:22 UTC
To: linux-btrfs
We only ever pass GFP_NOFS to btrfs_bio_clone_partial, so let's hardcode it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/extent_io.c | 5 ++---
fs/btrfs/extent_io.h | 3 +--
fs/btrfs/inode.c | 2 +-
3 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index eb5229e..f70d9f88 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2726,14 +2726,13 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
return bio;
}
-struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask,
- int offset, int size)
+struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
{
struct bio *bio;
struct btrfs_io_bio *btrfs_bio;
/* this will never fail when it's backed by a bioset */
- bio = bio_clone_fast(orig, gfp_mask, btrfs_bioset);
+ bio = bio_clone_fast(orig, GFP_NOFS, btrfs_bioset);
ASSERT(bio);
btrfs_bio = btrfs_io_bio(bio);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b2235b5..512918c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -460,8 +460,7 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
gfp_t gfp_flags);
struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs);
struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask);
-struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask,
- int offset, int size);
+struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size);
struct btrfs_fs_info;
struct btrfs_inode;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 042a13e..f8590c9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8368,7 +8368,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
* This will never fail as it's passing GPF_NOFS and
* the allocation is backed by btrfs_bioset.
*/
- bio = btrfs_bio_clone_partial(orig_bio, GFP_NOFS, clone_offset,
+ bio = btrfs_bio_clone_partial(orig_bio, clone_offset,
clone_len);
bio->bi_private = dip;
bio->bi_end_io = btrfs_end_dio_bio;
--
2.9.4
* Re: [PATCH 0/8 v1] utilize bio_clone_fast to clean up
2017-05-17 22:22 [PATCH 0/8 v1] utilize bio_clone_fast to clean up Liu Bo
` (7 preceding siblings ...)
2017-05-17 22:22 ` [PATCH 8/8] Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial Liu Bo
@ 2017-05-19 17:28 ` David Sterba
8 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2017-05-19 17:28 UTC
To: Liu Bo; +Cc: linux-btrfs
On Wed, May 17, 2017 at 04:22:44PM -0600, Liu Bo wrote:
> v1: - Drop the RFC tag.
> - Update to use bio_segments accordingly as __bio_segments is removed.
> - Remove if (!bio) since bio_clone_fast with bioset and GFP_NOFS will
> never fail.
>
> This attempts to use bio_clone_fast() in the places where we clone bio,
> such as when bio got cloned for multiple disks and when bio got split
> during dio submit.
>
> One benefit is to simplify dio submit to avoid calling bio_add_page one by
> one.
>
> Another benefit is that comparing to bio_clone_bioset, bio_clone_fast is
> faster because of copying the vector pointer directly, and bio_clone_fast
> doesn't modify bi_vcnt, so the extra work is to fix up bi_vcnt usage we
> currently have to use bi_iter to iterate bvec.
>
> Here are some numbers collected with the script [1], note that most of
> performance tests usually issue bs=4k dio write/read so our directIO split code
> is not tested as it requires bs > stripe_len(64K), thus I made this simple
> script which writes 2G with bs=128K.
>
> - vanilla:
> real 0m10.265s
> user 0m0.005s
> sys 0m9.164s
>
> - patched:
> real 0m8.973s
> user 0m0.006s
> sys 0m7.804s
>
> [1]:
> #!/bin/bash
>
> M=/mnt/btrfs
> D1=/dev/pmem0p1
> D2=/dev/pmem0p2
>
> umount $M
> mkfs.btrfs -f $D1 $D2 >/dev/null || exit
>
> mount $D1 $M -onodatasum || exit
>
> xfs_io -f -c "falloc 0 2G" $M/foo
>
> time xfs_io -d -c "pwrite -b 128K 0 2G" $M/foo
>
> Liu Bo (8):
> Btrfs: use bio_clone_fast to clone our bio
> Btrfs: new helper btrfs_bio_clone_partial
> Btrfs: use bio_clone_bioset_partial to simplify DIO submit
> Btrfs: change how we iterate bios in endio
> Btrfs: record error if one block has failed to retry
> Btrfs: make check-integrity use bvec_iter
> Btrfs: unify naming of btrfs_io_bio
> Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial
Added to 4.13 queue, thanks.
* [PATCH v3] Btrfs: change how we iterate bios in endio
2017-05-17 22:22 ` [PATCH 4/8] Btrfs: change how we iterate bios in endio Liu Bo
@ 2017-06-12 17:05 ` Liu Bo
2017-06-13 13:29 ` David Sterba
0 siblings, 1 reply; 12+ messages in thread
From: Liu Bo @ 2017-06-12 17:05 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
Since dio submit now uses bio_clone_fast, the submitted bio may not have a
reliable bi_vcnt. For the bio vector iterations in the checksum-related
functions, bio->bi_iter has not been modified yet, so it is safe to use
bio_for_each_segment. For the bio vector iterations in dio read's endio,
we now save a copy of bvec_iter in struct btrfs_io_bio when cloning bios
and use the helper __bio_for_each_segment with the saved bvec_iter to
access each bvec.
Dio reads which don't get split also need a saved copy of the bio iterator,
taken in btrfs_bio_clone, so that __bio_for_each_segment can access each
bvec in dio read's endio. Note that this doesn't affect other callers of
btrfs_bio_clone() because they don't use this iterator.
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: Fix null pointer crash in the case of non-split dio reads.
fs/btrfs/extent_io.c | 2 ++
fs/btrfs/file-item.c | 31 +++++++++++++++----------------
fs/btrfs/inode.c | 35 +++++++++++++++++++----------------
fs/btrfs/volumes.h | 1 +
4 files changed, 37 insertions(+), 32 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 89b824d..79be8c2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2719,6 +2719,7 @@ struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask)
btrfs_bio->csum = NULL;
btrfs_bio->csum_allocated = NULL;
btrfs_bio->end_io = NULL;
+ btrfs_bio->iter = bio->bi_iter;
}
return new;
}
@@ -2755,6 +2756,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask,
btrfs_bio->end_io = NULL;
bio_trim(bio, offset >> 9, size >> 9);
+ btrfs_bio->iter = bio->bi_iter;
return bio;
}
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 64fcb31..9f6062c 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -164,7 +164,8 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
u64 logical_offset, u32 *dst, int dio)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
struct btrfs_csum_item *item = NULL;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
@@ -177,7 +178,7 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
u64 page_bytes_left;
u32 diff;
int nblocks;
- int count = 0, i;
+ int count = 0;
u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
path = btrfs_alloc_path();
@@ -206,8 +207,6 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
if (bio->bi_iter.bi_size > PAGE_SIZE * 8)
path->reada = READA_FORWARD;
- WARN_ON(bio->bi_vcnt <= 0);
-
/*
* the free space stuff is only read when it hasn't been
* updated in the current transaction. So, we can safely
@@ -223,13 +222,13 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
if (dio)
offset = logical_offset;
- bio_for_each_segment_all(bvec, bio, i) {
- page_bytes_left = bvec->bv_len;
+ bio_for_each_segment(bvec, bio, iter) {
+ page_bytes_left = bvec.bv_len;
if (count)
goto next;
if (!dio)
- offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+ offset = page_offset(bvec.bv_page) + bvec.bv_offset;
count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
(u32 *)csum, nblocks);
if (count)
@@ -440,15 +439,15 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
struct btrfs_ordered_sum *sums;
struct btrfs_ordered_extent *ordered = NULL;
char *data;
- struct bio_vec *bvec;
+ struct bvec_iter iter;
+ struct bio_vec bvec;
int index;
int nr_sectors;
- int i, j;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
+ int i;
u64 offset;
- WARN_ON(bio->bi_vcnt <= 0);
sums = kzalloc(btrfs_ordered_sum_size(fs_info, bio->bi_iter.bi_size),
GFP_NOFS);
if (!sums)
@@ -465,19 +464,19 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
index = 0;
- bio_for_each_segment_all(bvec, bio, j) {
+ bio_for_each_segment(bvec, bio, iter) {
if (!contig)
- offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+ offset = page_offset(bvec.bv_page) + bvec.bv_offset;
if (!ordered) {
ordered = btrfs_lookup_ordered_extent(inode, offset);
BUG_ON(!ordered); /* Logic error */
}
- data = kmap_atomic(bvec->bv_page);
+ data = kmap_atomic(bvec.bv_page);
nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
- bvec->bv_len + fs_info->sectorsize
+ bvec.bv_len + fs_info->sectorsize
- 1);
for (i = 0; i < nr_sectors; i++) {
@@ -504,12 +503,12 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
+ total_bytes;
index = 0;
- data = kmap_atomic(bvec->bv_page);
+ data = kmap_atomic(bvec.bv_page);
}
sums->sums[index] = ~(u32)0;
sums->sums[index]
- = btrfs_csum_data(data + bvec->bv_offset
+ = btrfs_csum_data(data + bvec.bv_offset
+ (i * fs_info->sectorsize),
sums->sums[index],
fs_info->sectorsize);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 58ff494..160a25b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7981,6 +7981,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
struct bio *bio;
int isector;
int read_mode = 0;
+ int segs;
int ret;
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -7996,9 +7997,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
return -EIO;
}
- if ((failed_bio->bi_vcnt > 1)
- || (failed_bio->bi_io_vec->bv_len
- > btrfs_inode_sectorsize(inode)))
+ segs = bio_segments(failed_bio);
+ if (segs > 1 ||
+ (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
read_mode |= REQ_FAILFAST_DEV;
isector = start - btrfs_io_bio(failed_bio)->logical;
@@ -8056,13 +8057,13 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
struct btrfs_io_bio *io_bio)
{
struct btrfs_fs_info *fs_info;
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
struct btrfs_retry_complete done;
u64 start;
unsigned int pgoff;
u32 sectorsize;
int nr_sectors;
- int i;
int ret;
fs_info = BTRFS_I(inode)->root->fs_info;
@@ -8070,17 +8071,18 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
start = io_bio->logical;
done.inode = inode;
+ io_bio->bio.bi_iter = io_bio->iter;
- bio_for_each_segment_all(bvec, &io_bio->bio, i) {
- nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
- pgoff = bvec->bv_offset;
+ bio_for_each_segment(bvec, &io_bio->bio, iter) {
+ nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
+ pgoff = bvec.bv_offset;
next_block_or_try_again:
done.uptodate = 0;
done.start = start;
init_completion(&done.done);
- ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+ ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
pgoff, start, start + sectorsize - 1,
io_bio->mirror_num,
btrfs_retry_endio_nocsum, &done);
@@ -8145,7 +8147,8 @@ static int __btrfs_subio_endio_read(struct inode *inode,
struct btrfs_io_bio *io_bio, int err)
{
struct btrfs_fs_info *fs_info;
- struct bio_vec *bvec;
+ struct bio_vec bvec;
+ struct bvec_iter iter;
struct btrfs_retry_complete done;
u64 start;
u64 offset = 0;
@@ -8153,7 +8156,6 @@ static int __btrfs_subio_endio_read(struct inode *inode,
int nr_sectors;
unsigned int pgoff;
int csum_pos;
- int i;
int uptodate = !!(err == 0);
int ret;
@@ -8163,16 +8165,17 @@ static int __btrfs_subio_endio_read(struct inode *inode,
err = 0;
start = io_bio->logical;
done.inode = inode;
+ io_bio->bio.bi_iter = io_bio->iter;
- bio_for_each_segment_all(bvec, &io_bio->bio, i) {
- nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
+ bio_for_each_segment(bvec, &io_bio->bio, iter) {
+ nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
- pgoff = bvec->bv_offset;
+ pgoff = bvec.bv_offset;
next_block:
if (uptodate) {
csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
ret = __readpage_endio_check(inode, io_bio, csum_pos,
- bvec->bv_page, pgoff,
+ bvec.bv_page, pgoff,
start, sectorsize);
if (likely(!ret))
goto next;
@@ -8182,7 +8185,7 @@ static int __btrfs_subio_endio_read(struct inode *inode,
done.start = start;
init_completion(&done.done);
- ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+ ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
pgoff, start, start + sectorsize - 1,
io_bio->mirror_num,
btrfs_retry_endio, &done);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c7d0fbc..7a5a17c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -279,6 +279,7 @@ struct btrfs_io_bio {
u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
u8 *csum_allocated;
btrfs_io_bio_end_io_t *end_io;
+ struct bvec_iter iter;
struct bio bio;
};
--
2.9.4
* Re: [PATCH v3] Btrfs: change how we iterate bios in endio
2017-06-12 17:05 ` [PATCH v3] " Liu Bo
@ 2017-06-13 13:29 ` David Sterba
0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2017-06-13 13:29 UTC (permalink / raw)
To: Liu Bo; +Cc: linux-btrfs, David Sterba
On Mon, Jun 12, 2017 at 11:05:12AM -0600, Liu Bo wrote:
> Since dio submit now uses bio_clone_fast, the submitted bio may not have a
> reliable bi_vcnt. For the bio vector iterations in the checksum-related
> functions, bio->bi_iter has not been modified yet, so it is safe to use
> bio_for_each_segment. For the bio vector iterations in dio read's endio,
> we now save a copy of bvec_iter in struct btrfs_io_bio when cloning bios
> and use the helper __bio_for_each_segment with the saved bvec_iter to
> access each bvec.
>
> Dio reads which don't get split also need a saved copy of the bio iterator,
> taken in btrfs_bio_clone, so that __bio_for_each_segment can access each
> bvec in dio read's endio. Note that this doesn't affect other callers of
> btrfs_bio_clone() because they don't use this iterator.
>
> Cc: David Sterba <dsterba@suse.cz>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> v2: Fix null pointer crash in the case of non-split dio reads.
Perfect, thanks. Patch replaced.
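For readers following the v3 patch, here is a minimal user-space sketch of why the clone's iterator is saved. It is not kernel code: the structs are cut-down stand-ins for struct bio_vec, struct bvec_iter and struct bio, and struct model_io_bio and all model_* helpers are hypothetical names made up for this illustration. The assumption being modelled is that a fast clone shares the original's bvec array and leaves bi_vcnt unset, and that bi_iter is drained by the time endio runs, so endio has to walk the segments from a copy taken at clone time.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel types (illustrative only). */
struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned int bi_size;      /* bytes left to process */
	unsigned int bi_idx;       /* current index into bi_io_vec */
	unsigned int bi_bvec_done; /* bytes already done in the current bvec */
};

struct bio {
	struct bvec_iter bi_iter;
	unsigned short bi_vcnt;    /* left at 0 on a fast clone in this model */
	struct bio_vec *bi_io_vec; /* shared with the original on a fast clone */
};

/* Hypothetical model of btrfs_io_bio: only the fields this sketch needs. */
struct model_io_bio {
	struct bvec_iter iter;     /* iterator saved at clone time */
	struct bio bio;
};

/* Advance an iterator by 'bytes', in the spirit of bvec_iter_advance(). */
static void model_iter_advance(const struct bio_vec *bv, struct bvec_iter *it,
			       unsigned int bytes)
{
	assert(bytes <= it->bi_size);
	while (bytes) {
		unsigned int left = bv[it->bi_idx].bv_len - it->bi_bvec_done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->bi_size -= step;
		it->bi_bvec_done += step;
		if (it->bi_bvec_done == bv[it->bi_idx].bv_len) {
			it->bi_bvec_done = 0;
			it->bi_idx++;
		}
	}
}

/* Clone without copying bvecs: share the array, narrow only the iterator. */
static void model_clone_partial(const struct bio *orig, unsigned int offset,
				unsigned int size, struct model_io_bio *out)
{
	out->bio.bi_io_vec = orig->bi_io_vec;
	out->bio.bi_iter = orig->bi_iter;
	out->bio.bi_vcnt = 0;
	model_iter_advance(out->bio.bi_io_vec, &out->bio.bi_iter, offset);
	assert(size <= out->bio.bi_iter.bi_size);
	out->bio.bi_iter.bi_size = size;
	out->iter = out->bio.bi_iter; /* save a copy before submission */
}

/* Walk segments from 'start'; return the count and sum bytes into *bytes. */
static unsigned int model_walk(const struct bio *bio, struct bvec_iter start,
			       unsigned int *bytes)
{
	unsigned int segs = 0;

	*bytes = 0;
	while (start.bi_size) {
		unsigned int left = bio->bi_io_vec[start.bi_idx].bv_len -
				    start.bi_bvec_done;
		unsigned int len = start.bi_size < left ? start.bi_size : left;

		*bytes += len;
		segs++;
		model_iter_advance(bio->bi_io_vec, &start, len);
	}
	return segs;
}

/* Model the lower layers consuming the bio: bi_iter ends up drained. */
static void model_complete(struct bio *bio)
{
	model_iter_advance(bio->bi_io_vec, &bio->bi_iter, bio->bi_iter.bi_size);
}
```

In this model, walking a completed clone from bio.bi_iter visits nothing, while walking from the saved iter still visits exactly the range the clone was trimmed to. That mirrors the `io_bio->bio.bi_iter = io_bio->iter;` assignments the patch adds before the endio loops.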