* [PATCH 02/19] bcache: fix sequential large write IO bypass
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
@ 2017-06-30 20:42 ` bcache
2017-07-05 18:25 ` Christoph Hellwig
2017-06-30 20:42 ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
` (9 subsequent siblings)
10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
From: Tang Junhui <tang.junhui@zte.com.cn>
Sequential write IOs were tested with bs=1M by FIO in writeback cache
mode, these IOs were expected to be bypassed, but actually they did not.
We debug the code, and find in check_should_bypass():
if (!congested &&
mode == CACHE_MODE_WRITEBACK &&
op_is_write(bio_op(bio)) &&
(bio->bi_opf & REQ_SYNC))
goto rescale
that means, If in writeback mode, a write IO with REQ_SYNC flag will not
be bypassed though it is a sequential large IO, It's not a correct thing
to do actually, so this patch remove these codes.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/request.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 019b3df..958072a 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -400,12 +400,6 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
if (!congested && !dc->sequential_cutoff)
goto rescale;
- if (!congested &&
- mode == CACHE_MODE_WRITEBACK &&
- op_is_write(bio->bi_opf) &&
- op_is_sync(bio->bi_opf))
- goto rescale;
-
spin_lock(&dc->io_lock);
hlist_for_each_entry(i, iohash(dc, bio->bi_iter.bi_sector), hash)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
2017-06-30 20:42 ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
@ 2017-06-30 20:42 ` bcache
2017-07-01 17:26 ` Coly Li
2017-07-05 18:25 ` Christoph Hellwig
2017-06-30 20:42 ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
` (8 subsequent siblings)
10 siblings, 2 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
From: Tang Junhui <tang.junhui@zte.com.cn>
Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to
trigger gc thread.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/request.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 958072a..4b413db 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -196,12 +196,12 @@ static void bch_data_insert_start(struct closure *cl)
struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
struct bio *bio = op->bio, *n;
- if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
- wake_up_gc(op->c);
-
if (op->bypass)
return bch_data_invalidate(cl);
+ if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
+ wake_up_gc(op->c);
+
/*
* Journal writes are marked REQ_PREFLUSH; if the original write was a
* flush, it'll wait on the journal write.
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO
2017-06-30 20:42 ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
@ 2017-07-01 17:26 ` Coly Li
2017-07-05 18:25 ` Christoph Hellwig
1 sibling, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 17:26 UTC (permalink / raw)
To: bcache, linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to
> trigger gc thread.
>
> Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
> Cc: stable@vger.kernel.org
Acked-by: Coly Li <colyli@suse.de>
Thanks.
> ---
> drivers/md/bcache/request.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 958072a..4b413db 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -196,12 +196,12 @@ static void bch_data_insert_start(struct closure *cl)
> struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
> struct bio *bio = op->bio, *n;
>
> - if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
> - wake_up_gc(op->c);
> -
> if (op->bypass)
> return bch_data_invalidate(cl);
>
> + if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
> + wake_up_gc(op->c);
> +
> /*
> * Journal writes are marked REQ_PREFLUSH; if the original write was a
> * flush, it'll wait on the journal write.
>
--
Coly Li
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO
2017-06-30 20:42 ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
2017-07-01 17:26 ` Coly Li
@ 2017-07-05 18:25 ` Christoph Hellwig
1 sibling, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:25 UTC (permalink / raw)
To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Tang Junhui, stable
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 04/19] bcache: fix wrong cache_misses statistics
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
2017-06-30 20:42 ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
2017-06-30 20:42 ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
@ 2017-06-30 20:42 ` bcache
2017-07-01 17:58 ` Coly Li
2017-06-30 20:42 ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
` (7 subsequent siblings)
10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
From: Tang Junhui <tang.junhui@zte.com.cn>
Some missed IOs are not counted into cache_misses, this patch fix this
issue.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/request.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 4b413db..d27707d 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -462,6 +462,7 @@ struct search {
unsigned recoverable:1;
unsigned write:1;
unsigned read_dirty_data:1;
+ unsigned cache_missed:1;
unsigned long start_time;
@@ -647,6 +648,7 @@ static inline struct search *search_alloc(struct bio *bio,
s->orig_bio = bio;
s->cache_miss = NULL;
+ s->cache_missed = 0;
s->d = d;
s->recoverable = 1;
s->write = op_is_write(bio_op(bio));
@@ -758,7 +760,7 @@ static void cached_dev_read_done_bh(struct closure *cl)
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
bch_mark_cache_accounting(s->iop.c, s->d,
- !s->cache_miss, s->iop.bypass);
+ !s->cache_missed, s->iop.bypass);
trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass);
if (s->iop.status)
@@ -777,6 +779,8 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
struct bio *miss, *cache_bio;
+ s->cache_missed = 1; /* true */
+
if (s->cache_miss || s->iop.bypass) {
miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
ret = miss == bio ? MAP_DONE : MAP_CONTINUE;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 04/19] bcache: fix wrong cache_misses statistics
2017-06-30 20:42 ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
@ 2017-07-01 17:58 ` Coly Li
0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 17:58 UTC (permalink / raw)
To: Tang Junhui; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable
On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> Some missed IOs are not counted into cache_misses, this patch fix this
> issue.
Could you please explain more about,
- which kind of missed I/O are mot counted
- where cache_missed is located
This will help the patch to be more understandable.
>
> Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
> Cc: stable@vger.kernel.org
[snip]
> @@ -758,7 +760,7 @@ static void cached_dev_read_done_bh(struct closure *cl)
> struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
>
> bch_mark_cache_accounting(s->iop.c, s->d,
> - !s->cache_miss, s->iop.bypass);
> + !s->cache_missed, s->iop.bypass);
> trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass);
Should the above line be changed to,
trace_bcache_read(s->orig_bio, !s->cache_missed, s->iop.bypass);
as well ?
[snip]
Thanks.
--
Coly Li
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 06/19] bcache: explicitly destory mutex while exiting
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (2 preceding siblings ...)
2017-06-30 20:42 ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
@ 2017-06-30 20:42 ` bcache
2017-07-01 18:43 ` Coly Li
2017-07-05 18:27 ` Christoph Hellwig
2017-06-30 20:42 ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
` (6 subsequent siblings)
10 siblings, 2 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Liang Chen, stable
From: Liang Chen <liangchen.linux@gmail.com>
mutex_destroy does nothing most of time, but it's better to call
it to make the code future proof and it also has some meaning
for like mutex debug.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 48b8c20..1f84791 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2089,6 +2089,7 @@ static void bcache_exit(void)
if (bcache_major)
unregister_blkdev(bcache_major, "bcache");
unregister_reboot_notifier(&reboot);
+ mutex_destroy(&bch_register_lock);
}
static int __init bcache_init(void)
@@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
bcache_major = register_blkdev(0, "bcache");
if (bcache_major < 0) {
+ mutex_destroy(&bch_register_lock);
unregister_reboot_notifier(&reboot);
return bcache_major;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
2017-06-30 20:42 ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
@ 2017-07-01 18:43 ` Coly Li
2017-07-05 11:58 ` Liang Chen
2017-07-05 18:27 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Coly Li @ 2017-07-01 18:43 UTC (permalink / raw)
To: Liang Chen; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable
On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Liang Chen <liangchen.linux@gmail.com>
>
> mutex_destroy does nothing most of time, but it's better to call
> it to make the code future proof and it also has some meaning
> for like mutex debug.
>
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
> Cc: stable@vger.kernel.org
> ---
> drivers/md/bcache/super.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 48b8c20..1f84791 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2089,6 +2089,7 @@ static void bcache_exit(void)
> if (bcache_major)
> unregister_blkdev(bcache_major, "bcache");
> unregister_reboot_notifier(&reboot);
> + mutex_destroy(&bch_register_lock> }
>
> static int __init bcache_init(void)
> @@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
>
> bcache_major = register_blkdev(0, "bcache");
> if (bcache_major < 0) {
> + mutex_destroy(&bch_register_lock);
> unregister_reboot_notifier(&reboot);
> return bcache_major;
> }
>
Hi Liang,
Current code might have a potential race in a very corner case, see,
2084 static int __init bcache_init(void)
2085 {
2086 static const struct attribute *files[] = {
2087 &ksysfs_register.attr,
2088 &ksysfs_register_quiet.attr,
2089 NULL
2090 };
2091
2092 mutex_init(&bch_register_lock);
2093 init_waitqueue_head(&unregister_wait);
2094 register_reboot_notifier(&reboot);
2095 closure_debug_init();
2096
2097 bcache_major = register_blkdev(0, "bcache");
2098 if (bcache_major < 0) {
2099 unregister_reboot_notifier(&reboot);
2100 return bcache_major;
2101 }
2102
2103 if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM,
0)) ||
2104 !(bcache_kobj = kobject_create_and_add("bcache",
fs_kobj)) ||
2105 sysfs_create_files(bcache_kobj, files) ||
2106 bch_request_init() ||
2107 bch_debug_init(bcache_kobj))
2108 goto err;
2109
2110 return 0;
2111 err:
2112 bcache_exit();
2113 return -ENOMEM;
2114 }
At line 2107, most of bache stuffs are ready to work, only a debugfs
entry not created yet. If in the time gap between line 2106 and line
2017, another user space tool just registers cache and backing devices.
Then bch_debug_init() failed, and bcache_exit() gets called. In this
case, I doubt bcache_exit() can handle all the references correctly.
The race is very rare, and almost won't happen in real life. So, if you
don't care about it, the patch can be simpler like this,
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e57353e39168..fb5453a46a03 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2070,6 +2070,7 @@ static struct notifier_block reboot = {
static void bcache_exit(void)
{
+ mutex_destroy(&bch_register_lock);
bch_debug_exit();
bch_request_exit();
if (bcache_kobj)
@@ -2089,7 +2090,6 @@ static int __init bcache_init(void)
NULL
};
- mutex_init(&bch_register_lock);
init_waitqueue_head(&unregister_wait);
register_reboot_notifier(&reboot);
closure_debug_init();
@@ -2107,6 +2107,7 @@ static int __init bcache_init(void)
bch_debug_init(bcache_kobj))
goto err;
+ mutex_init(&bch_register_lock);
return 0;
err:
bcache_exit();
---
And if you do care about the race, maybe you should do something like this,
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e57353e39168..ca1d8b7a7815 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2079,6 +2079,7 @@ static void bcache_exit(void)
if (bcache_major)
unregister_blkdev(bcache_major, "bcache");
unregister_reboot_notifier(&reboot);
+ mutex_unlock(&bch_register_lock);
}
static int __init bcache_init(void)
@@ -2090,6 +2091,7 @@ static int __init bcache_init(void)
};
mutex_init(&bch_register_lock);
+ mutex_lock(&bch_register_lock);
init_waitqueue_head(&unregister_wait);
register_reboot_notifier(&reboot);
closure_debug_init();
@@ -2097,6 +2099,8 @@ static int __init bcache_init(void)
bcache_major = register_blkdev(0, "bcache");
if (bcache_major < 0) {
unregister_reboot_notifier(&reboot);
+ mutex_unlock(&bch_register_lock);
+ mutex_destroy(&bch_register_lock);
return bcache_major;
}
@@ -2104,9 +2108,12 @@ static int __init bcache_init(void)
!(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
sysfs_create_files(bcache_kobj, files) ||
bch_request_init() ||
- bch_debug_init(bcache_kobj))
+ bch_debug_init(bcache_kobj)) {
+ mutex_unlock(&bch_register_lock);
goto err;
+ }
+ mutex_unlock(&bch_register_lock);
return 0;
err:
bcache_exit();
---
Personally I think the first approach with only one new line code added,
your original version will add two new lines of code.
Just FYI. Thanks.
--
Coly Li
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
2017-07-01 18:43 ` Coly Li
@ 2017-07-05 11:58 ` Liang Chen
2017-07-11 7:22 ` Coly Li
0 siblings, 1 reply; 41+ messages in thread
From: Liang Chen @ 2017-07-05 11:58 UTC (permalink / raw)
To: Coly Li; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable
Hi Coly,
Thanks for reviewing the patch! You raised a good point about the race. I also
think it should be addressed. Even though the time window is small, it will
still happen sooner or later.
I would like to keep this "destory mutex" patch unchanged, and send another
patch to fix the issue based on your approach. Please take a look. Thanks!
Thanks,
Liang
On Sun, Jul 2, 2017 at 2:43 AM, Coly Li <i@coly.li> wrote:
> On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
>> From: Liang Chen <liangchen.linux@gmail.com>
>>
>> mutex_destroy does nothing most of time, but it's better to call
>> it to make the code future proof and it also has some meaning
>> for like mutex debug.
>>
>> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
>> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
>> Cc: stable@vger.kernel.org
>> ---
>> drivers/md/bcache/super.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index 48b8c20..1f84791 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2089,6 +2089,7 @@ static void bcache_exit(void)
>> if (bcache_major)
>> unregister_blkdev(bcache_major, "bcache");
>> unregister_reboot_notifier(&reboot);
>> + mutex_destroy(&bch_register_lock> }
>>
>> static int __init bcache_init(void)
>> @@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
>>
>> bcache_major = register_blkdev(0, "bcache");
>> if (bcache_major < 0) {
>> + mutex_destroy(&bch_register_lock);
>> unregister_reboot_notifier(&reboot);
>> return bcache_major;
>> }
>>
>
> Hi Liang,
>
> Current code might have a potential race in a very corner case, see,
> 2084 static int __init bcache_init(void)
> 2085 {
> 2086 static const struct attribute *files[] = {
> 2087 &ksysfs_register.attr,
> 2088 &ksysfs_register_quiet.attr,
> 2089 NULL
> 2090 };
> 2091
> 2092 mutex_init(&bch_register_lock);
> 2093 init_waitqueue_head(&unregister_wait);
> 2094 register_reboot_notifier(&reboot);
> 2095 closure_debug_init();
> 2096
> 2097 bcache_major = register_blkdev(0, "bcache");
> 2098 if (bcache_major < 0) {
> 2099 unregister_reboot_notifier(&reboot);
> 2100 return bcache_major;
> 2101 }
> 2102
> 2103 if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM,
> 0)) ||
> 2104 !(bcache_kobj = kobject_create_and_add("bcache",
> fs_kobj)) ||
> 2105 sysfs_create_files(bcache_kobj, files) ||
> 2106 bch_request_init() ||
> 2107 bch_debug_init(bcache_kobj))
> 2108 goto err;
> 2109
> 2110 return 0;
> 2111 err:
> 2112 bcache_exit();
> 2113 return -ENOMEM;
> 2114 }
>
> At line 2107, most of bache stuffs are ready to work, only a debugfs
> entry not created yet. If in the time gap between line 2106 and line
> 2017, another user space tool just registers cache and backing devices.
> Then bch_debug_init() failed, and bcache_exit() gets called. In this
> case, I doubt bcache_exit() can handle all the references correctly.
>
> The race is very rare, and almost won't happen in real life. So, if you
> don't care about it, the patch can be simpler like this,
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e57353e39168..fb5453a46a03 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2070,6 +2070,7 @@ static struct notifier_block reboot = {
>
> static void bcache_exit(void)
> {
> + mutex_destroy(&bch_register_lock);
> bch_debug_exit();
> bch_request_exit();
> if (bcache_kobj)
> @@ -2089,7 +2090,6 @@ static int __init bcache_init(void)
> NULL
> };
>
> - mutex_init(&bch_register_lock);
> init_waitqueue_head(&unregister_wait);
> register_reboot_notifier(&reboot);
> closure_debug_init();
> @@ -2107,6 +2107,7 @@ static int __init bcache_init(void)
> bch_debug_init(bcache_kobj))
> goto err;
>
> + mutex_init(&bch_register_lock);
> return 0;
> err:
> bcache_exit();
> ---
> And if you do care about the race, maybe you should do something like this,
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e57353e39168..ca1d8b7a7815 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2079,6 +2079,7 @@ static void bcache_exit(void)
> if (bcache_major)
> unregister_blkdev(bcache_major, "bcache");
> unregister_reboot_notifier(&reboot);
> + mutex_unlock(&bch_register_lock);
> }
>
> static int __init bcache_init(void)
> @@ -2090,6 +2091,7 @@ static int __init bcache_init(void)
> };
>
> mutex_init(&bch_register_lock);
> + mutex_lock(&bch_register_lock);
> init_waitqueue_head(&unregister_wait);
> register_reboot_notifier(&reboot);
> closure_debug_init();
> @@ -2097,6 +2099,8 @@ static int __init bcache_init(void)
> bcache_major = register_blkdev(0, "bcache");
> if (bcache_major < 0) {
> unregister_reboot_notifier(&reboot);
> + mutex_unlock(&bch_register_lock);
> + mutex_destroy(&bch_register_lock);
> return bcache_major;
> }
>
> @@ -2104,9 +2108,12 @@ static int __init bcache_init(void)
> !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
> sysfs_create_files(bcache_kobj, files) ||
> bch_request_init() ||
> - bch_debug_init(bcache_kobj))
> + bch_debug_init(bcache_kobj)) {
> + mutex_unlock(&bch_register_lock);
> goto err;
> + }
>
> + mutex_unlock(&bch_register_lock);
> return 0;
> err:
> bcache_exit();
> ---
>
> Personally I think the first approach with only one new line code added,
> your original version will add two new lines of code.
>
> Just FYI. Thanks.
>
> --
> Coly Li
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
2017-07-05 11:58 ` Liang Chen
@ 2017-07-11 7:22 ` Coly Li
0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-11 7:22 UTC (permalink / raw)
To: Liang Chen; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable
On 2017/7/5 下午7:58, Liang Chen wrote:
> Hi Coly,
> Thanks for reviewing the patch! You raised a good point about the race. I also
> think it should be addressed. Even though the time window is small, it will
> still happen sooner or later.
>
> I would like to keep this "destory mutex" patch unchanged, and send another
> patch to fix the issue based on your approach. Please take a look. Thanks!
>
Sure, good idea. I'd like to review the next fix, and provide my feed
back together. Thanks.
Coly
> Thanks,
> Liang
>
> On Sun, Jul 2, 2017 at 2:43 AM, Coly Li <i@coly.li> wrote:
>> On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
>>> From: Liang Chen <liangchen.linux@gmail.com>
>>>
>>> mutex_destroy does nothing most of time, but it's better to call
>>> it to make the code future proof and it also has some meaning
>>> for like mutex debug.
>>>
>>> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
>>> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
>>> Cc: stable@vger.kernel.org
>>> ---
>>> drivers/md/bcache/super.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>> index 48b8c20..1f84791 100644
>>> --- a/drivers/md/bcache/super.c
>>> +++ b/drivers/md/bcache/super.c
>>> @@ -2089,6 +2089,7 @@ static void bcache_exit(void)
>>> if (bcache_major)
>>> unregister_blkdev(bcache_major, "bcache");
>>> unregister_reboot_notifier(&reboot);
>>> + mutex_destroy(&bch_register_lock> }
>>>
>>> static int __init bcache_init(void)
>>> @@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
>>>
>>> bcache_major = register_blkdev(0, "bcache");
>>> if (bcache_major < 0) {
>>> + mutex_destroy(&bch_register_lock);
>>> unregister_reboot_notifier(&reboot);
>>> return bcache_major;
>>> }
>>>
>>
>> Hi Liang,
>>
>> Current code might have a potential race in a very corner case, see,
>> 2084 static int __init bcache_init(void)
>> 2085 {
>> 2086 static const struct attribute *files[] = {
>> 2087 &ksysfs_register.attr,
>> 2088 &ksysfs_register_quiet.attr,
>> 2089 NULL
>> 2090 };
>> 2091
>> 2092 mutex_init(&bch_register_lock);
>> 2093 init_waitqueue_head(&unregister_wait);
>> 2094 register_reboot_notifier(&reboot);
>> 2095 closure_debug_init();
>> 2096
>> 2097 bcache_major = register_blkdev(0, "bcache");
>> 2098 if (bcache_major < 0) {
>> 2099 unregister_reboot_notifier(&reboot);
>> 2100 return bcache_major;
>> 2101 }
>> 2102
>> 2103 if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM,
>> 0)) ||
>> 2104 !(bcache_kobj = kobject_create_and_add("bcache",
>> fs_kobj)) ||
>> 2105 sysfs_create_files(bcache_kobj, files) ||
>> 2106 bch_request_init() ||
>> 2107 bch_debug_init(bcache_kobj))
>> 2108 goto err;
>> 2109
>> 2110 return 0;
>> 2111 err:
>> 2112 bcache_exit();
>> 2113 return -ENOMEM;
>> 2114 }
>>
>> At line 2107, most of bache stuffs are ready to work, only a debugfs
>> entry not created yet. If in the time gap between line 2106 and line
>> 2017, another user space tool just registers cache and backing devices.
>> Then bch_debug_init() failed, and bcache_exit() gets called. In this
>> case, I doubt bcache_exit() can handle all the references correctly.
>>
>> The race is very rare, and almost won't happen in real life. So, if you
>> don't care about it, the patch can be simpler like this,
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index e57353e39168..fb5453a46a03 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2070,6 +2070,7 @@ static struct notifier_block reboot = {
>>
>> static void bcache_exit(void)
>> {
>> + mutex_destroy(&bch_register_lock);
>> bch_debug_exit();
>> bch_request_exit();
>> if (bcache_kobj)
>> @@ -2089,7 +2090,6 @@ static int __init bcache_init(void)
>> NULL
>> };
>>
>> - mutex_init(&bch_register_lock);
>> init_waitqueue_head(&unregister_wait);
>> register_reboot_notifier(&reboot);
>> closure_debug_init();
>> @@ -2107,6 +2107,7 @@ static int __init bcache_init(void)
>> bch_debug_init(bcache_kobj))
>> goto err;
>>
>> + mutex_init(&bch_register_lock);
>> return 0;
>> err:
>> bcache_exit();
>> ---
>> And if you do care about the race, maybe you should do something like this,
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index e57353e39168..ca1d8b7a7815 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2079,6 +2079,7 @@ static void bcache_exit(void)
>> if (bcache_major)
>> unregister_blkdev(bcache_major, "bcache");
>> unregister_reboot_notifier(&reboot);
>> + mutex_unlock(&bch_register_lock);
>> }
>>
>> static int __init bcache_init(void)
>> @@ -2090,6 +2091,7 @@ static int __init bcache_init(void)
>> };
>>
>> mutex_init(&bch_register_lock);
>> + mutex_lock(&bch_register_lock);
>> init_waitqueue_head(&unregister_wait);
>> register_reboot_notifier(&reboot);
>> closure_debug_init();
>> @@ -2097,6 +2099,8 @@ static int __init bcache_init(void)
>> bcache_major = register_blkdev(0, "bcache");
>> if (bcache_major < 0) {
>> unregister_reboot_notifier(&reboot);
>> + mutex_unlock(&bch_register_lock);
>> + mutex_destroy(&bch_register_lock);
>> return bcache_major;
>> }
>>
>> @@ -2104,9 +2108,12 @@ static int __init bcache_init(void)
>> !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
>> sysfs_create_files(bcache_kobj, files) ||
>> bch_request_init() ||
>> - bch_debug_init(bcache_kobj))
>> + bch_debug_init(bcache_kobj)) {
>> + mutex_unlock(&bch_register_lock);
>> goto err;
>> + }
>>
>> + mutex_unlock(&bch_register_lock);
>> return 0;
>> err:
>> bcache_exit();
>> ---
>>
>> Personally I think the first approach with only one new line code added,
>> your original version will add two new lines of code.
>>
>> Just FYI. Thanks.
>>
>> --
>> Coly Li
--
Coly Li
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
2017-06-30 20:42 ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
2017-07-01 18:43 ` Coly Li
@ 2017-07-05 18:27 ` Christoph Hellwig
2017-07-06 1:56 ` Liang Chen
1 sibling, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:27 UTC (permalink / raw)
To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Liang Chen, stable
On Fri, Jun 30, 2017 at 01:42:55PM -0700, bcache@lists.ewheeler.net wrote:
> From: Liang Chen <liangchen.linux@gmail.com>
>
> mutex_destroy does nothing most of time, but it's better to call
> it to make the code future proof and it also has some meaning
> for like mutex debug.
It shouldn't really - we should get the destroy behavior for free
when doing a slab free of the area.
What issue are you trying to solve?
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
2017-07-05 18:27 ` Christoph Hellwig
@ 2017-07-06 1:56 ` Liang Chen
0 siblings, 0 replies; 41+ messages in thread
From: Liang Chen @ 2017-07-06 1:56 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: bcache, linux-block, linux-bcache, axboe, stable
mutex_destroy does nothing normally (may not be true in the future), but
when debug mutex is turned on it helps with debugging - mutex_destroy
in mutex-debug.c.
It's not about freeing of the memory. It's more about consistency of the
use of mutex and making the code future proof.
Thanks,
Linag
On Thu, Jul 6, 2017 at 2:27 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Jun 30, 2017 at 01:42:55PM -0700, bcache@lists.ewheeler.net wrote:
>> From: Liang Chen <liangchen.linux@gmail.com>
>>
>> mutex_destroy does nothing most of time, but it's better to call
>> it to make the code future proof and it also has some meaning
>> for like mutex debug.
>
> It shouldn't really - we should get the destroy behavior for free
> when doing a slab free of the area.
>
> What issue are you trying to solve?
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (3 preceding siblings ...)
2017-06-30 20:42 ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
@ 2017-06-30 20:42 ` bcache
2017-07-01 18:52 ` Coly Li
2017-06-30 20:43 ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
` (5 subsequent siblings)
10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
From: Tang Junhui <tang.junhui@zte.com.cn>
Thin flash device does not initialize stripe_sectors_dirty correctly, this
patch fixes this issue.
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/super.c | 3 ++-
drivers/md/bcache/writeback.c | 8 ++++----
drivers/md/bcache/writeback.h | 2 +-
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1f84791..e06641e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1030,7 +1030,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
}
if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
- bch_sectors_dirty_init(dc);
+ bch_sectors_dirty_init(&dc->disk);
atomic_set(&dc->has_dirty, 1);
atomic_inc(&dc->count);
bch_writeback_queue(dc);
@@ -1232,6 +1232,7 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
goto err;
bcache_device_attach(d, c, u - c->uuids);
+ bch_sectors_dirty_init(d);
bch_flash_dev_request_init(d);
add_disk(d->disk);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 3d463f0..4ac8b13 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -482,17 +482,17 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
return MAP_CONTINUE;
}
-void bch_sectors_dirty_init(struct cached_dev *dc)
+void bch_sectors_dirty_init(struct bcache_device *d)
{
struct sectors_dirty_init op;
bch_btree_op_init(&op.op, -1);
- op.inode = dc->disk.id;
+ op.inode = d->id;
- bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0),
+ bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0),
sectors_dirty_init_fn, 0);
- dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk);
+ d->sectors_dirty_last = bcache_dev_sectors_dirty(d);
}
void bch_cached_dev_writeback_init(struct cached_dev *dc)
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index ea2f92e..c2ab4b4 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -109,7 +109,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
void bcache_dev_sectors_dirty_add(struct cache_set *, unsigned, uint64_t, int);
-void bch_sectors_dirty_init(struct cached_dev *dc);
+void bch_sectors_dirty_init(struct bcache_device *);
void bch_cached_dev_writeback_init(struct cached_dev *);
int bch_cached_dev_writeback_start(struct cached_dev *);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device
2017-06-30 20:42 ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
@ 2017-07-01 18:52 ` Coly Li
0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 18:52 UTC (permalink / raw)
To: Tang Junhui; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable
On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> Thin flash device does not initialize stripe_sectors_dirty correctly, this
> patch fixes this issue.
Hi Junhui,
Could you please explain why stripe_sectors_ditry is not correctly
initialized and how about its negative result ?
Thanks.
Coly
>
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org
> ---
> drivers/md/bcache/super.c | 3 ++-
> drivers/md/bcache/writeback.c | 8 ++++----
> drivers/md/bcache/writeback.h | 2 +-
> 3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 1f84791..e06641e 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1030,7 +1030,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
> }
>
> if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
> - bch_sectors_dirty_init(dc);
> + bch_sectors_dirty_init(&dc->disk);
> atomic_set(&dc->has_dirty, 1);
> atomic_inc(&dc->count);
> bch_writeback_queue(dc);
> @@ -1232,6 +1232,7 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
> goto err;
>
> bcache_device_attach(d, c, u - c->uuids);
> + bch_sectors_dirty_init(d);
> bch_flash_dev_request_init(d);
> add_disk(d->disk);
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 3d463f0..4ac8b13 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -482,17 +482,17 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
> return MAP_CONTINUE;
> }
>
> -void bch_sectors_dirty_init(struct cached_dev *dc)
> +void bch_sectors_dirty_init(struct bcache_device *d)
> {
> struct sectors_dirty_init op;
>
> bch_btree_op_init(&op.op, -1);
> - op.inode = dc->disk.id;
> + op.inode = d->id;
>
> - bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0),
> + bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0),
> sectors_dirty_init_fn, 0);
>
> - dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk);
> + d->sectors_dirty_last = bcache_dev_sectors_dirty(d);
> }
>
> void bch_cached_dev_writeback_init(struct cached_dev *dc)
> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> index ea2f92e..c2ab4b4 100644
> --- a/drivers/md/bcache/writeback.h
> +++ b/drivers/md/bcache/writeback.h
> @@ -109,7 +109,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
>
> void bcache_dev_sectors_dirty_add(struct cache_set *, unsigned, uint64_t, int);
>
> -void bch_sectors_dirty_init(struct cached_dev *dc);
> +void bch_sectors_dirty_init(struct bcache_device *);
> void bch_cached_dev_writeback_init(struct cached_dev *);
> int bch_cached_dev_writeback_start(struct cached_dev *);
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (4 preceding siblings ...)
2017-06-30 20:42 ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
@ 2017-06-30 20:43 ` bcache
2017-07-10 18:11 ` Coly Li
2017-06-30 20:43 ` [PATCH 14/19] bcache: Correct return value for sysfs attach errors bcache
` (4 subsequent siblings)
10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:43 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
From: Tang Junhui <tang.junhui@zte.com.cn>
Since dirty sectors of thin flash cannot be used to cache data for backend
device, so we should subtract it in calculating writeback rate.
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/writeback.c | 2 +-
drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4ac8b13..25289e4 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -21,7 +21,7 @@
static void __update_writeback_rate(struct cached_dev *dc)
{
struct cache_set *c = dc->disk.c;
- uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
+ uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size - bcache_flash_devs_sectors_dirty(c);
uint64_t cache_dirty_target =
div_u64(cache_sectors * dc->writeback_percent, 100);
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index c2ab4b4..24ff589 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
return ret;
}
+static inline uint64_t bcache_flash_devs_sectors_dirty(struct cache_set *c)
+{
+ uint64_t i, ret = 0;
+
+ mutex_lock(&bch_register_lock);
+
+ for (i = 0; i < c->nr_uuids; i++) {
+ struct bcache_device *d = c->devices[i];
+
+ if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+ ret += bcache_dev_sectors_dirty(d);
+ }
+
+ mutex_unlock(&bch_register_lock);
+
+ return ret;
+}
+
static inline unsigned offset_to_stripe(struct bcache_device *d,
uint64_t offset)
{
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
2017-06-30 20:43 ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
@ 2017-07-10 18:11 ` Coly Li
[not found] ` <OF92BDA950.86AA00FA-ON4825815A.001F33D9-4825815A.001F5C89@zte.com.cn>
0 siblings, 1 reply; 41+ messages in thread
From: Coly Li @ 2017-07-10 18:11 UTC (permalink / raw)
To: Tang Junhui; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable
On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> Since dirty sectors of thin flash cannot be used to cache data for backend
> device, so we should subtract it in calculating writeback rate.
>
I see you want to get ride of the noise of flash only cache device for
writeback rate calculation. It makes sense, because flash only cache
device won't have write back happen at all.
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org
> ---
> drivers/md/bcache/writeback.c | 2 +-
> drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4ac8b13..25289e4 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -21,7 +21,7 @@
> static void __update_writeback_rate(struct cached_dev *dc)
> {
> struct cache_set *c = dc->disk.c;
> - uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
> + uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size - bcache_flash_devs_sectors_dirty(c);
See flash_dev_run(), the flash volume is created per struct
bcache_device of a cache set. That means, all data allocated for the
flash volume will be from a flash only bcache device. Regular dirty data
won't mixed allocating with flash volume dirty data on identical struct
bcache device.
Based on the above implementation, non-dirty space from flash only
bcache device will mislead writeback rate calculation too. So I suggest
to subtract all buckets size from all flash only bcache devices. Then it
might be something like,
uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
bcache_flash_devs_nbuckets(c);
Just FYI. Thanks.
Coly
> uint64_t cache_dirty_target =
> div_u64(cache_sectors * dc->writeback_percent, 100);
>
> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> index c2ab4b4..24ff589 100644
> --- a/drivers/md/bcache/writeback.h
> +++ b/drivers/md/bcache/writeback.h
> @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
> return ret;
> }
>
> +static inline uint64_t bcache_flash_devs_sectors_dirty(struct cache_set *c)
> +{
> + uint64_t i, ret = 0;
> +
> + mutex_lock(&bch_register_lock);
> +
> + for (i = 0; i < c->nr_uuids; i++) {
> + struct bcache_device *d = c->devices[i];
> +
> + if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
> + continue;
> + ret += bcache_dev_sectors_dirty(d);
> + }
> +
> + mutex_unlock(&bch_register_lock);
> +
> + return ret;
> +}
> +
> static inline unsigned offset_to_stripe(struct bcache_device *d,
> uint64_t offset)
> {
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 14/19] bcache: Correct return value for sysfs attach errors
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (5 preceding siblings ...)
2017-06-30 20:43 ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
@ 2017-06-30 20:43 ` bcache
2017-06-30 20:43 ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
` (3 subsequent siblings)
10 siblings, 0 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:43 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tony Asleson, stable
From: Tony Asleson <tasleson@redhat.com>
If you encounter any errors in bch_cached_dev_attach it will return a negative
error code. The variable 'v' which stores the result is unsigned, thus user
space sees a very large value returned for bytes written which can cause
incorrect user space behavior. Utilize 1 signed variable to use throughout
the function to preserve error return capability.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Acked-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/sysfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index cc0076d..7579ca6 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -206,7 +206,7 @@ STORE(__cached_dev)
{
struct cached_dev *dc = container_of(kobj, struct cached_dev,
disk.kobj);
- unsigned v = size;
+ ssize_t v = size;
struct cache_set *c;
struct kobj_uevent_env *env;
unsigned ioprio_class = 0; /* invalid initial ioprio values */
@@ -245,7 +245,7 @@ STORE(__cached_dev)
bch_cached_dev_run(dc);
if (attr == &sysfs_cache_mode) {
- ssize_t v = bch_read_string_list(buf, bch_cache_modes + 1);
+ v = bch_read_string_list(buf, bch_cache_modes + 1);
if (v < 0)
return v;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 17/19] bcache: fix for gc and write-back race
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (6 preceding siblings ...)
2017-06-30 20:43 ` [PATCH 14/19] bcache: Correct return value for sysfs attach errors bcache
@ 2017-06-30 20:43 ` bcache
2017-08-03 16:20 ` Coly Li
2017-07-01 16:55 ` [PATCH 01/19] bcache: Fix leak of bdev reference Coly Li
` (2 subsequent siblings)
10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:43 UTC (permalink / raw)
To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable
From: Tang Junhui <tang.junhui@zte.com.cn>
gc and write-back get raced (see the email "bcache get stucked" I sended
before):
gc thread write-back thread
| |bch_writeback_thread()
|bch_gc_thread() |
| |==>read_dirty()
|==>bch_btree_gc() |
|==>btree_root() //get btree root |
| node write locker |
|==>bch_btree_gc_root() |
| |==>read_dirty_submit()
| |==>write_dirty()
| |==>continue_at(cl, write_dirty_finish, system_wq);
| |==>write_dirty_finish()//excute in system_wq
| |==>bch_btree_insert()
| |==>bch_btree_map_leaf_nodes()
| |==>__bch_btree_map_nodes()
| |==>btree_root //try to get btree root node read lock
| |-----stuck here
|==>bch_btree_set_root() |
|==>bch_journal_meta() |
|==>bch_journal() |
|==>journal_try_write() |
|==>journal_write_unlocked() //journal_full(&c->journal) condition satisfied
|==>continue_at(cl, journal_write, system_wq); //try to excute journal_write in system_wq
| //but work queue is excuting write_dirty_finish()
|==>closure_sync(); //wait journal_write execute over and wake up gc,
| --stuck here
|==>release root node write locker
This patch alloc a separate work-queue for write-back thread to avoid such
race.
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
drivers/md/bcache/bcache.h | 1 +
drivers/md/bcache/super.c | 2 ++
drivers/md/bcache/writeback.c | 8 ++++++--
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 44123e4..deb0a6c 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -333,6 +333,7 @@ struct cached_dev {
/* Limit number of writeback bios in flight */
struct semaphore in_flight;
struct task_struct *writeback_thread;
+ struct workqueue_struct *writeback_write_wq;
struct keybuf writeback_keys;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e06641e..24cb9b7 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1063,6 +1063,8 @@ static void cached_dev_free(struct closure *cl)
cancel_delayed_work_sync(&dc->writeback_rate_update);
if (!IS_ERR_OR_NULL(dc->writeback_thread))
kthread_stop(dc->writeback_thread);
+ if (dc->writeback_write_wq)
+ destroy_workqueue(dc->writeback_write_wq);
mutex_lock(&bch_register_lock);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4104eaa..4bc5daa 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -189,7 +189,7 @@ static void write_dirty(struct closure *cl)
closure_bio_submit(&io->bio, cl);
- continue_at(cl, write_dirty_finish, system_wq);
+ continue_at(cl, write_dirty_finish, io->dc->writeback_write_wq);
}
static void read_dirty_endio(struct bio *bio)
@@ -209,7 +209,7 @@ static void read_dirty_submit(struct closure *cl)
closure_bio_submit(&io->bio, cl);
- continue_at(cl, write_dirty, system_wq);
+ continue_at(cl, write_dirty, io->dc->writeback_write_wq);
}
static void read_dirty(struct cached_dev *dc)
@@ -527,6 +527,10 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
int bch_cached_dev_writeback_start(struct cached_dev *dc)
{
+ dc->writeback_write_wq = alloc_workqueue("bcache_writeback_wq", WQ_MEM_RECLAIM, 0);
+ if (!dc->writeback_write_wq)
+ return -ENOMEM;
+
dc->writeback_thread = kthread_create(bch_writeback_thread, dc,
"bcache_writeback");
if (IS_ERR(dc->writeback_thread))
--
1.8.3.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 17/19] bcache: fix for gc and write-back race
2017-06-30 20:43 ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
@ 2017-08-03 16:20 ` Coly Li
0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-08-03 16:20 UTC (permalink / raw)
To: bcache, Tang Junhui; +Cc: linux-block, linux-bcache, hch, axboe, stable
On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> gc and write-back get raced (see the email "bcache get stucked" I sended
> before):
> gc thread write-back thread
> | |bch_writeback_thread()
> |bch_gc_thread() |
> | |==>read_dirty()
> |==>bch_btree_gc() |
> |==>btree_root() //get btree root |
> | node write locker |
> |==>bch_btree_gc_root() |
> | |==>read_dirty_submit()
> | |==>write_dirty()
> | |==>continue_at(cl, write_dirty_finish, system_wq);
> | |==>write_dirty_finish()//excute in system_wq
> | |==>bch_btree_insert()
> | |==>bch_btree_map_leaf_nodes()
> | |==>__bch_btree_map_nodes()
> | |==>btree_root //try to get btree root node read lock
> | |-----stuck here
> |==>bch_btree_set_root() |
> |==>bch_journal_meta() |
> |==>bch_journal() |
> |==>journal_try_write() |
> |==>journal_write_unlocked() //journal_full(&c->journal) condition satisfied
> |==>continue_at(cl, journal_write, system_wq); //try to excute journal_write in system_wq
> | //but work queue is excuting write_dirty_finish()
> |==>closure_sync(); //wait journal_write execute over and wake up gc,
> | --stuck here
> |==>release root node write locker
>
> This patch alloc a separate work-queue for write-back thread to avoid such
> race.
>
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org
Add a per-cached device work queue is a good idea, it's OK to me.
Acked-by: Coly Li <colyli@suse.de>
Thansk.
Coly
> ---
> drivers/md/bcache/bcache.h | 1 +
> drivers/md/bcache/super.c | 2 ++
> drivers/md/bcache/writeback.c | 8 ++++++--
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 44123e4..deb0a6c 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -333,6 +333,7 @@ struct cached_dev {
> /* Limit number of writeback bios in flight */
> struct semaphore in_flight;
> struct task_struct *writeback_thread;
> + struct workqueue_struct *writeback_write_wq;
>
> struct keybuf writeback_keys;
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e06641e..24cb9b7 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1063,6 +1063,8 @@ static void cached_dev_free(struct closure *cl)
> cancel_delayed_work_sync(&dc->writeback_rate_update);
> if (!IS_ERR_OR_NULL(dc->writeback_thread))
> kthread_stop(dc->writeback_thread);
> + if (dc->writeback_write_wq)
> + destroy_workqueue(dc->writeback_write_wq);
>
> mutex_lock(&bch_register_lock);
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4104eaa..4bc5daa 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -189,7 +189,7 @@ static void write_dirty(struct closure *cl)
>
> closure_bio_submit(&io->bio, cl);
>
> - continue_at(cl, write_dirty_finish, system_wq);
> + continue_at(cl, write_dirty_finish, io->dc->writeback_write_wq);
> }
>
> static void read_dirty_endio(struct bio *bio)
> @@ -209,7 +209,7 @@ static void read_dirty_submit(struct closure *cl)
>
> closure_bio_submit(&io->bio, cl);
>
> - continue_at(cl, write_dirty, system_wq);
> + continue_at(cl, write_dirty, io->dc->writeback_write_wq);
> }
>
> static void read_dirty(struct cached_dev *dc)
> @@ -527,6 +527,10 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
>
> int bch_cached_dev_writeback_start(struct cached_dev *dc)
> {
> + dc->writeback_write_wq = alloc_workqueue("bcache_writeback_wq", WQ_MEM_RECLAIM, 0);
> + if (!dc->writeback_write_wq)
> + return -ENOMEM;
> +
> dc->writeback_thread = kthread_create(bch_writeback_thread, dc,
> "bcache_writeback");
> if (IS_ERR(dc->writeback_thread))
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 01/19] bcache: Fix leak of bdev reference
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (7 preceding siblings ...)
2017-06-30 20:43 ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
@ 2017-07-01 16:55 ` Coly Li
2017-07-05 18:24 ` Christoph Hellwig
[not found] ` <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>
10 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 16:55 UTC (permalink / raw)
To: bcache, linux-block; +Cc: linux-bcache, hch, axboe, Jan Kara, stable
On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Jan Kara <jack@suse.cz>
>
> If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
> block device using lookup_bdev() to detect which situation we are in to
> properly report error. However we never drop the reference returned to
> us from lookup_bdev(). Fix that.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org
Acked-by: Coly Li <colyli@suse.de>
Thanks.
> ---
> drivers/md/bcache/super.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 8352fad..9a2c190 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1964,6 +1964,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
> else
> err = "device busy";
> mutex_unlock(&bch_register_lock);
> + if (!IS_ERR(bdev))
> + bdput(bdev);
> if (attr == &ksysfs_register_quiet)
> goto out;
> }
>
--
Coly Li
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 01/19] bcache: Fix leak of bdev reference
2017-06-30 20:42 ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
` (8 preceding siblings ...)
2017-07-01 16:55 ` [PATCH 01/19] bcache: Fix leak of bdev reference Coly Li
@ 2017-07-05 18:24 ` Christoph Hellwig
[not found] ` <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>
10 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:24 UTC (permalink / raw)
To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Jan Kara, stable
On Fri, Jun 30, 2017 at 01:42:50PM -0700, bcache@lists.ewheeler.net wrote:
> From: Jan Kara <jack@suse.cz>
>
> If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
> block device using lookup_bdev() to detect which situation we are in to
> properly report error. However we never drop the reference returned to
> us from lookup_bdev(). Fix that.
This look ok, but I think that whole chunk of code should just go
away - adding a lookup_bdev and resulting mess just for a slightly
different error message is just insane.
^ permalink raw reply [flat|nested] 41+ messages in thread[parent not found: <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>]