public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
@ 2017-05-25 19:10 Eric Wheeler
  2017-06-28 23:06 ` [PULL] bcache fixes and updates for-4.13 Eric Wheeler
  2017-07-14 11:40 ` [PULL] bcache updates based on git.kernel.dk/linux-block:for-next Eddie Chapman
  0 siblings, 2 replies; 41+ messages in thread
From: Eric Wheeler @ 2017-05-25 19:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-bcache, Jan Kara, stable, tang.junhui, Kent Overstreet,
	Coly Li, Stefan Bader, Liang Chen, nix

Hi Jens,

Please pull these updates and bugfixes from the bcache community when you 
have a minute.  If you need a rebase against something else then please 
let me know and I would be happy to update for you.

Thank you for your help!



The following changes since commit 0789bd7bdb5bd036fe3df96c19528f46127a0160:

  Merge branch 'for-linus' into for-next (2017-05-10 07:40:47 -0600)

are available in the git repository at:

  https://bitbucket.org/ewheelerinc/linux.git bcache-updates-linux-block-for-next

for you to fetch changes up to 8a02c3d571b895931db7a2700f05d8c70a7c6cb2:

  bcache: update bio->bi_opf bypass/writeback REQ_ flag hints (2017-05-11 12:29:08 -0700)

----------------------------------------------------------------
Jan Kara (1):
      bcache: Fix leak of bdev reference

Liang Chen (1):
      bcache: explicitly destory mutex while exiting

tang.junhui (4):
      bcache: fix sequential large write IO bypass
      bcache: do not subtract sectors_to_gc for bypassed IO
      bcache: fix wrong cache_misses statistics
      bcache: fix calling ida_simple_remove() with incorrect minor

Eric Wheeler (3):
      bcache: introduce bcache sysfs entries for ioprio-based bypass/writeback hints
      bcache: documentation for sysfs entries describing bcache cache hinting
      bcache: update bio->bi_opf bypass/writeback REQ_ flag hints


 Documentation/bcache.txt      | 80 ++++++++++++++++++++++++++++++++++++++++
 drivers/md/bcache/bcache.h    |  3 ++
 drivers/md/bcache/request.c   | 45 +++++++++++++++++-----
 drivers/md/bcache/super.c     | 18 ++++++---
 drivers/md/bcache/sysfs.c     | 71 +++++++++++++++++++++++++++++++++++
 drivers/md/bcache/writeback.c |  8 ++++
 drivers/md/bcache/writeback.h | 27 +++++++++++++-
 7 files changed, 236 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PULL] bcache fixes and updates for-4.13
  2017-05-25 19:10 [PULL] bcache updates based on git.kernel.dk/linux-block:for-next Eric Wheeler
@ 2017-06-28 23:06 ` Eric Wheeler
  2017-06-29 13:45   ` Christoph Hellwig
  2017-07-14 11:40 ` [PULL] bcache updates based on git.kernel.dk/linux-block:for-next Eddie Chapman
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Wheeler @ 2017-06-28 23:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: dan.carpenter, tasleson, linux-bcache, linux-block, Jan Kara,
	stable, tang.junhui, Kent Overstreet, Coly Li, Stefan Bader,
	Liang Chen, nix

Hi Jens,

It looks like you're getting ready for 4.13, so here are the bcache 
updates rebased and ready from git.kernel.dk/linux-block for-4.13/block 
This tree builds, boots, and works as expected.  Please pull the new fixes 
and features below.

Thank you for your help!


The following changes since commit f1d4ef7d88832444e8dfeb0e85e19d3b6ecb5011:

  nvmet-rdma: register ib_client to not deadlock in device removal (2017-06-28 08:14:13 -0600)

are available in the git repository at:

  https://bitbucket.org/ewheelerinc/linux.git bcache-updates-linux-block-for-4.13

for you to fetch changes up to 960ce81bdbd46bc7faaeb2f9fcddfea1e48ee388:

  bcache: Update continue_at() documentation (2017-06-28 15:18:02 -0700)

----------------------------------------------------------------
Dan Carpenter (2):
      bcache: silence static checker warning
      bcache: Update continue_at() documentation

Eric Wheeler (3):
      bcache: introduce bcache sysfs entries for ioprio-based bypass/writeback hints
      bcache: documentation for sysfs entries describing bcache cache hinting
      bcache: update bio->bi_opf bypass/writeback REQ_ flag hints

Jan Kara (1):
      bcache: Fix leak of bdev reference

Liang Chen (1):
      bcache: explicitly destory mutex while exiting

Tang Junhui (11):
      bcache: fix sequential large write IO bypass
      bcache: do not subtract sectors_to_gc for bypassed IO
      bcache: fix wrong cache_misses statistics
      bcache: fix calling ida_simple_remove() with incorrect minor
      bcache: initialize stripe_sectors_dirty correctly for thin flash device
      bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
      bcache: update bucket_in_use periodically
      bcache: delete redundant calling set_gc_sectors()
      bcache: fix issue of writeback rate at minimum 1 key per second
      bcache: increase the number of open buckets
      bcache: fix for gc and write-back race

Tony Asleson (1):
      bcache: Correct return value for sysfs attach errors

 Documentation/bcache.txt      | 80 ++++++++++++++++++++++++++++++++++++++++++
 drivers/md/bcache/alloc.c     |  4 ++-
 drivers/md/bcache/bcache.h    |  4 +++
 drivers/md/bcache/btree.c     | 30 ++++++++++++++--
 drivers/md/bcache/closure.h   |  4 ---
 drivers/md/bcache/request.c   | 45 ++++++++++++++++++------
 drivers/md/bcache/super.c     | 26 +++++++++-----
 drivers/md/bcache/sysfs.c     | 75 +++++++++++++++++++++++++++++++++++++--
 drivers/md/bcache/util.c      |  9 ++++-
 drivers/md/bcache/writeback.c | 37 +++++++++++++------
 drivers/md/bcache/writeback.h | 48 +++++++++++++++++++++++--
 11 files changed, 319 insertions(+), 43 deletions(-)



--
Eric Wheeler

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache fixes and updates for-4.13
  2017-06-28 23:06 ` [PULL] bcache fixes and updates for-4.13 Eric Wheeler
@ 2017-06-29 13:45   ` Christoph Hellwig
  2017-06-29 16:19     ` Coly Li
                       ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:45 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: Jens Axboe, dan.carpenter, tasleson, linux-bcache, linux-block,
	Jan Kara, stable, tang.junhui, Kent Overstreet, Coly Li,
	Stefan Bader, Liang Chen, nix

On Wed, Jun 28, 2017 at 11:06:46PM +0000, Eric Wheeler wrote:
> Hi Jens,
> 
> It looks like you're getting ready for 4.13, so here are the bcache 
> updates rebased and ready from git.kernel.dk/linux-block for-4.13/block 
> This tree builds, boots, and works as expected.  Please pull the new fixes 
> and features below.
> 
> Thank you for your help!

Can you please send all the patches to linux-block for review as
a series?  Some of these subjects sound a little suspect.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache fixes and updates for-4.13
  2017-06-29 13:45   ` Christoph Hellwig
@ 2017-06-29 16:19     ` Coly Li
  2017-06-29 22:12     ` Eric Wheeler
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
  2 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-06-29 16:19 UTC (permalink / raw)
  To: Christoph Hellwig, Eric Wheeler
  Cc: Jens Axboe, dan.carpenter, tasleson, linux-bcache, linux-block,
	Jan Kara, stable, tang.junhui, Kent Overstreet, Coly Li,
	Stefan Bader, Liang Chen, nix

On 2017/6/29 下午9:45, Christoph Hellwig wrote:
> On Wed, Jun 28, 2017 at 11:06:46PM +0000, Eric Wheeler wrote:
>> Hi Jens,
>>
>> It looks like you're getting ready for 4.13, so here are the bcache 
>> updates rebased and ready from git.kernel.dk/linux-block for-4.13/block 
>> This tree builds, boots, and works as expected.  Please pull the new fixes 
>> and features below.
>>
>> Thank you for your help!
> 
> Can you please send all the patches to linux-block for review as
> a series?  Some of these subjects sound a little suspect.
> 

We need more eyes to look at bcache code, maybe we can always CC bcache
patches to linux-block for more review.

-- 
Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache fixes and updates for-4.13
  2017-06-29 13:45   ` Christoph Hellwig
  2017-06-29 16:19     ` Coly Li
@ 2017-06-29 22:12     ` Eric Wheeler
  2017-06-29 22:25       ` Eric Wheeler
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
  2 siblings, 1 reply; 41+ messages in thread
From: Eric Wheeler @ 2017-06-29 22:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, dan.carpenter, tasleson, linux-bcache, linux-block,
	Jan Kara, stable, tang.junhui, Kent Overstreet, Coly Li,
	Stefan Bader, Liang Chen, nix


On Thu, 29 Jun 2017, Christoph Hellwig wrote:

> On Wed, Jun 28, 2017 at 11:06:46PM +0000, Eric Wheeler wrote:
> > Hi Jens,
> > 
> > It looks like you're getting ready for 4.13, so here are the bcache 
> > updates rebased and ready from git.kernel.dk/linux-block for-4.13/block 
> > This tree builds, boots, and works as expected.  Please pull the new fixes 
> > and features below.
> > 
> > Thank you for your help!
> 
> Can you please send all the patches to linux-block for review as
> a series?  Some of these subjects sound a little suspect.


On their way, IRT this thread.

--
Eric Wheeler



> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache fixes and updates for-4.13
  2017-06-29 22:12     ` Eric Wheeler
@ 2017-06-29 22:25       ` Eric Wheeler
  2017-06-29 23:28         ` Nick Alcock
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Wheeler @ 2017-06-29 22:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, dan.carpenter, tasleson, linux-bcache, linux-block,
	Jan Kara, stable, tang.junhui, Kent Overstreet, Coly Li,
	Stefan Bader, Liang Chen, nix

On Thu, 29 Jun 2017, Eric Wheeler wrote:

> 
> On Thu, 29 Jun 2017, Christoph Hellwig wrote:
> 
> > On Wed, Jun 28, 2017 at 11:06:46PM +0000, Eric Wheeler wrote:
> > > Hi Jens,
> > > 
> > > It looks like you're getting ready for 4.13, so here are the bcache 
> > > updates rebased and ready from git.kernel.dk/linux-block for-4.13/block 
> > > This tree builds, boots, and works as expected.  Please pull the new fixes 
> > > and features below.
> > > 
> > > Thank you for your help!
> > 
> > Can you please send all the patches to linux-block for review as
> > a series?  Some of these subjects sound a little suspect.
> 
> 
> On their way, IRT this thread.

Hmm, I think vger might not be letting them in because the From: header 
differs from the original sender (ie, I am not @oracle.com) so 
DNS/SPF/DKIM and such are wrong.

What do you do in these cases?  Should I rewrite the From: header?  
Somehow the original author should remain.

--
Eric Wheeler



> 
> --
> Eric Wheeler
> 
> 
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache fixes and updates for-4.13
  2017-06-29 22:25       ` Eric Wheeler
@ 2017-06-29 23:28         ` Nick Alcock
  0 siblings, 0 replies; 41+ messages in thread
From: Nick Alcock @ 2017-06-29 23:28 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: Christoph Hellwig, Jens Axboe, dan.carpenter, tasleson,
	linux-bcache, linux-block, Jan Kara, stable, tang.junhui,
	Kent Overstreet, Coly Li, Stefan Bader, Liang Chen

On 29 Jun 2017, Eric Wheeler verbalised:
> Hmm, I think vger might not be letting them in because the From: header 
> differs from the original sender (ie, I am not @oracle.com) so 
> DNS/SPF/DKIM and such are wrong.
>
> What do you do in these cases?  Should I rewrite the From: header?  
> Somehow the original author should remain.

If these are patches from 'git format-patch', you do it by putting your
name in the email From: and the author you want in the commit as a From
header in the very first line of the email. 'git am' will then ignore
the header line in favour of the first-line override. I believe 'git
send-email' should be doing this automatically if your email address is
not the same as the patch author, but I could be wrong...

See the DISCUSSION section in 'man git-am'.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/19] bcache: Fix leak of bdev reference
  2017-06-29 13:45   ` Christoph Hellwig
  2017-06-29 16:19     ` Coly Li
  2017-06-29 22:12     ` Eric Wheeler
@ 2017-06-30 20:42     ` bcache
  2017-06-30 20:42       ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
                         ` (10 more replies)
  2 siblings, 11 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Jan Kara, stable

From: Jan Kara <jack@suse.cz>

If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
block device using lookup_bdev() to detect which situation we are in to
properly report error. However we never drop the reference returned to
us from lookup_bdev(). Fix that.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8352fad..9a2c190 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1964,6 +1964,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 			else
 				err = "device busy";
 			mutex_unlock(&bch_register_lock);
+			if (!IS_ERR(bdev))
+				bdput(bdev);
 			if (attr == &ksysfs_register_quiet)
 				goto out;
 		}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 02/19] bcache: fix sequential large write IO bypass
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
@ 2017-06-30 20:42       ` bcache
  2017-07-05 18:25         ` Christoph Hellwig
  2017-06-30 20:42       ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
                         ` (9 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

From: Tang Junhui <tang.junhui@zte.com.cn>

Sequential write IOs were tested with bs=1M by FIO in writeback cache
mode, these IOs were expected to be bypassed, but actually they did not.
We debug the code, and find in check_should_bypass():
    if (!congested &&
        mode == CACHE_MODE_WRITEBACK &&
        op_is_write(bio_op(bio)) &&
        (bio->bi_opf & REQ_SYNC))
        goto rescale
that means, If in writeback mode, a write IO with REQ_SYNC flag will not
be bypassed though it is a sequential large IO, It's not a correct thing
to do actually, so this patch remove these codes.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/request.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 019b3df..958072a 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -400,12 +400,6 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 	if (!congested && !dc->sequential_cutoff)
 		goto rescale;
 
-	if (!congested &&
-	    mode == CACHE_MODE_WRITEBACK &&
-	    op_is_write(bio->bi_opf) &&
-	    op_is_sync(bio->bi_opf))
-		goto rescale;
-
 	spin_lock(&dc->io_lock);
 
 	hlist_for_each_entry(i, iohash(dc, bio->bi_iter.bi_sector), hash)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
  2017-06-30 20:42       ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
@ 2017-06-30 20:42       ` bcache
  2017-07-01 17:26         ` Coly Li
  2017-07-05 18:25         ` Christoph Hellwig
  2017-06-30 20:42       ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
                         ` (8 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

From: Tang Junhui <tang.junhui@zte.com.cn>

Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to
trigger gc thread.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/request.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 958072a..4b413db 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -196,12 +196,12 @@ static void bch_data_insert_start(struct closure *cl)
 	struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
 	struct bio *bio = op->bio, *n;
 
-	if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
-		wake_up_gc(op->c);
-
 	if (op->bypass)
 		return bch_data_invalidate(cl);
 
+	if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
+		wake_up_gc(op->c);
+
 	/*
 	 * Journal writes are marked REQ_PREFLUSH; if the original write was a
 	 * flush, it'll wait on the journal write.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 04/19] bcache: fix wrong cache_misses statistics
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
  2017-06-30 20:42       ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
  2017-06-30 20:42       ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
@ 2017-06-30 20:42       ` bcache
  2017-07-01 17:58         ` Coly Li
  2017-06-30 20:42       ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
                         ` (7 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

From: Tang Junhui <tang.junhui@zte.com.cn>

Some missed IOs are not counted into cache_misses, this patch fix this
issue.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/request.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 4b413db..d27707d 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -462,6 +462,7 @@ struct search {
 	unsigned		recoverable:1;
 	unsigned		write:1;
 	unsigned		read_dirty_data:1;
+	unsigned		cache_missed:1;
 
 	unsigned long		start_time;
 
@@ -647,6 +648,7 @@ static inline struct search *search_alloc(struct bio *bio,
 
 	s->orig_bio		= bio;
 	s->cache_miss		= NULL;
+	s->cache_missed		= 0;
 	s->d			= d;
 	s->recoverable		= 1;
 	s->write		= op_is_write(bio_op(bio));
@@ -758,7 +760,7 @@ static void cached_dev_read_done_bh(struct closure *cl)
 	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 
 	bch_mark_cache_accounting(s->iop.c, s->d,
-				  !s->cache_miss, s->iop.bypass);
+				  !s->cache_missed, s->iop.bypass);
 	trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass);
 
 	if (s->iop.status)
@@ -777,6 +779,8 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 	struct bio *miss, *cache_bio;
 
+	s->cache_missed = 1; /* true */
+
 	if (s->cache_miss || s->iop.bypass) {
 		miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
 		ret = miss == bio ? MAP_DONE : MAP_CONTINUE;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 06/19] bcache: explicitly destory mutex while exiting
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (2 preceding siblings ...)
  2017-06-30 20:42       ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
@ 2017-06-30 20:42       ` bcache
  2017-07-01 18:43         ` Coly Li
  2017-07-05 18:27         ` Christoph Hellwig
  2017-06-30 20:42       ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
                         ` (6 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Liang Chen, stable

From: Liang Chen <liangchen.linux@gmail.com>

mutex_destroy does nothing most of time, but it's better to call
it to make the code future proof and it also has some meaning
for like mutex debug.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 48b8c20..1f84791 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2089,6 +2089,7 @@ static void bcache_exit(void)
 	if (bcache_major)
 		unregister_blkdev(bcache_major, "bcache");
 	unregister_reboot_notifier(&reboot);
+	mutex_destroy(&bch_register_lock);
 }
 
 static int __init bcache_init(void)
@@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
 
 	bcache_major = register_blkdev(0, "bcache");
 	if (bcache_major < 0) {
+		mutex_destroy(&bch_register_lock);
 		unregister_reboot_notifier(&reboot);
 		return bcache_major;
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (3 preceding siblings ...)
  2017-06-30 20:42       ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
@ 2017-06-30 20:42       ` bcache
  2017-07-01 18:52         ` Coly Li
  2017-06-30 20:43       ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
                         ` (5 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:42 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

From: Tang Junhui <tang.junhui@zte.com.cn>

Thin flash device does not initialize stripe_sectors_dirty correctly, this
patch fixes this issue.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/super.c     | 3 ++-
 drivers/md/bcache/writeback.c | 8 ++++----
 drivers/md/bcache/writeback.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1f84791..e06641e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1030,7 +1030,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
 	}
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
-		bch_sectors_dirty_init(dc);
+		bch_sectors_dirty_init(&dc->disk);
 		atomic_set(&dc->has_dirty, 1);
 		atomic_inc(&dc->count);
 		bch_writeback_queue(dc);
@@ -1232,6 +1232,7 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
 		goto err;
 
 	bcache_device_attach(d, c, u - c->uuids);
+	bch_sectors_dirty_init(d);
 	bch_flash_dev_request_init(d);
 	add_disk(d->disk);
 
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 3d463f0..4ac8b13 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -482,17 +482,17 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
 	return MAP_CONTINUE;
 }
 
-void bch_sectors_dirty_init(struct cached_dev *dc)
+void bch_sectors_dirty_init(struct bcache_device *d)
 {
 	struct sectors_dirty_init op;
 
 	bch_btree_op_init(&op.op, -1);
-	op.inode = dc->disk.id;
+	op.inode = d->id;
 
-	bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0),
+	bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0),
 			   sectors_dirty_init_fn, 0);
 
-	dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk);
+	d->sectors_dirty_last = bcache_dev_sectors_dirty(d);
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index ea2f92e..c2ab4b4 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -109,7 +109,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
 
 void bcache_dev_sectors_dirty_add(struct cache_set *, unsigned, uint64_t, int);
 
-void bch_sectors_dirty_init(struct cached_dev *dc);
+void bch_sectors_dirty_init(struct bcache_device *);
 void bch_cached_dev_writeback_init(struct cached_dev *);
 int bch_cached_dev_writeback_start(struct cached_dev *);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (4 preceding siblings ...)
  2017-06-30 20:42       ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
@ 2017-06-30 20:43       ` bcache
  2017-07-10 18:11         ` Coly Li
  2017-06-30 20:43       ` [PATCH 14/19] bcache: Correct return value for sysfs attach errors bcache
                         ` (4 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:43 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

From: Tang Junhui <tang.junhui@zte.com.cn>

Since dirty sectors of thin flash cannot be used to cache data for backend
device, so we should subtract it in calculating writeback rate.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/writeback.c |  2 +-
 drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4ac8b13..25289e4 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -21,7 +21,7 @@
 static void __update_writeback_rate(struct cached_dev *dc)
 {
 	struct cache_set *c = dc->disk.c;
-	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
+	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size - bcache_flash_devs_sectors_dirty(c);
 	uint64_t cache_dirty_target =
 		div_u64(cache_sectors * dc->writeback_percent, 100);
 
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index c2ab4b4..24ff589 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
 	return ret;
 }
 
+static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)
+{
+	uint64_t i, ret = 0;
+
+	mutex_lock(&bch_register_lock);
+
+	for (i = 0; i < c->nr_uuids; i++) {
+		struct bcache_device *d = c->devices[i];
+
+		if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
+			continue;
+	   ret += bcache_dev_sectors_dirty(d);
+	}
+
+	mutex_unlock(&bch_register_lock);
+
+	return ret;
+}
+
 static inline unsigned offset_to_stripe(struct bcache_device *d,
 					uint64_t offset)
 {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 14/19] bcache: Correct return value for sysfs attach errors
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (5 preceding siblings ...)
  2017-06-30 20:43       ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
@ 2017-06-30 20:43       ` bcache
  2017-06-30 20:43       ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 41+ messages in thread
From: bcache @ 2017-06-30 20:43 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tony Asleson, stable

From: Tony Asleson <tasleson@redhat.com>

If you encounter any errors in bch_cached_dev_attach it will return a negative
error code.  The variable 'v' which stores the result is unsigned, thus user
space sees a very large value returned for bytes written which can cause
incorrect user space behavior.  Utilize 1 signed variable to use throughout
the function to preserve error return capability.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Acked-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index cc0076d..7579ca6 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -206,7 +206,7 @@ STORE(__cached_dev)
 {
 	struct cached_dev *dc = container_of(kobj, struct cached_dev,
 					     disk.kobj);
-	unsigned v = size;
+	ssize_t v = size;
 	struct cache_set *c;
 	struct kobj_uevent_env *env;
 	unsigned ioprio_class = 0; /* invalid initial ioprio values */
@@ -245,7 +245,7 @@ STORE(__cached_dev)
 		bch_cached_dev_run(dc);
 
 	if (attr == &sysfs_cache_mode) {
-		ssize_t v = bch_read_string_list(buf, bch_cache_modes + 1);
+		v = bch_read_string_list(buf, bch_cache_modes + 1);
 
 		if (v < 0)
 			return v;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 17/19] bcache: fix for gc and write-back race
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (6 preceding siblings ...)
  2017-06-30 20:43       ` [PATCH 14/19] bcache: Correct return value for sysfs attach errors bcache
@ 2017-06-30 20:43       ` bcache
  2017-08-03 16:20         ` Coly Li
  2017-07-01 16:55       ` [PATCH 01/19] bcache: Fix leak of bdev reference Coly Li
                         ` (2 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: bcache @ 2017-06-30 20:43 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

From: Tang Junhui <tang.junhui@zte.com.cn>

gc and write-back get raced (see the email "bcache get stucked" I sended
before):
gc thread						write-back thread
|							|bch_writeback_thread()
|bch_gc_thread()					|
|							|==>read_dirty()
|==>bch_btree_gc()					|
|==>btree_root() //get btree root			|
|			node write locker		|
|==>bch_btree_gc_root()					|
|							|==>read_dirty_submit()
|							|==>write_dirty()
|							|==>continue_at(cl, write_dirty_finish, system_wq);
|							|==>write_dirty_finish()//excute in system_wq
|							|==>bch_btree_insert()
|							|==>bch_btree_map_leaf_nodes()
|							|==>__bch_btree_map_nodes()
|							|==>btree_root //try to get btree root node read lock
|							|-----stuck here
|==>bch_btree_set_root()				|
|==>bch_journal_meta()					|
|==>bch_journal()					|
|==>journal_try_write()					|
|==>journal_write_unlocked() //journal_full(&c->journal) condition satisfied
|==>continue_at(cl, journal_write, system_wq); //try to excute journal_write in system_wq
|					//but work queue is excuting write_dirty_finish()
|==>closure_sync(); //wait journal_write execute over and wake up gc,
|			--stuck here
|==>release root node write locker

This patch alloc a separate work-queue for write-back thread to avoid such
race.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/bcache.h    | 1 +
 drivers/md/bcache/super.c     | 2 ++
 drivers/md/bcache/writeback.c | 8 ++++++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 44123e4..deb0a6c 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -333,6 +333,7 @@ struct cached_dev {
 	/* Limit number of writeback bios in flight */
 	struct semaphore	in_flight;
 	struct task_struct	*writeback_thread;
+	struct workqueue_struct	*writeback_write_wq;
 
 	struct keybuf		writeback_keys;
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e06641e..24cb9b7 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1063,6 +1063,8 @@ static void cached_dev_free(struct closure *cl)
 	cancel_delayed_work_sync(&dc->writeback_rate_update);
 	if (!IS_ERR_OR_NULL(dc->writeback_thread))
 		kthread_stop(dc->writeback_thread);
+	if (dc->writeback_write_wq)
+		destroy_workqueue(dc->writeback_write_wq);
 
 	mutex_lock(&bch_register_lock);
 
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4104eaa..4bc5daa 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -189,7 +189,7 @@ static void write_dirty(struct closure *cl)
 
 	closure_bio_submit(&io->bio, cl);
 
-	continue_at(cl, write_dirty_finish, system_wq);
+	continue_at(cl, write_dirty_finish, io->dc->writeback_write_wq);
 }
 
 static void read_dirty_endio(struct bio *bio)
@@ -209,7 +209,7 @@ static void read_dirty_submit(struct closure *cl)
 
 	closure_bio_submit(&io->bio, cl);
 
-	continue_at(cl, write_dirty, system_wq);
+	continue_at(cl, write_dirty, io->dc->writeback_write_wq);
 }
 
 static void read_dirty(struct cached_dev *dc)
@@ -527,6 +527,10 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
 
 int bch_cached_dev_writeback_start(struct cached_dev *dc)
 {
+	dc->writeback_write_wq = alloc_workqueue("bcache_writeback_wq", WQ_MEM_RECLAIM, 0);
+	if (!dc->writeback_write_wq)
+		return -ENOMEM;
+
 	dc->writeback_thread = kthread_create(bch_writeback_thread, dc,
 					      "bcache_writeback");
 	if (IS_ERR(dc->writeback_thread))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/19] bcache: Fix leak of bdev reference
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (7 preceding siblings ...)
  2017-06-30 20:43       ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
@ 2017-07-01 16:55       ` Coly Li
  2017-07-05 18:24       ` Christoph Hellwig
       [not found]       ` <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>
  10 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 16:55 UTC (permalink / raw)
  To: bcache, linux-block; +Cc: linux-bcache, hch, axboe, Jan Kara, stable

On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Jan Kara <jack@suse.cz>
> 
> If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
> block device using lookup_bdev() to detect which situation we are in to
> properly report error. However we never drop the reference returned to
> us from lookup_bdev(). Fix that.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org

Acked-by: Coly Li <colyli@suse.de>

Thanks.

> ---
>  drivers/md/bcache/super.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 8352fad..9a2c190 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1964,6 +1964,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>  			else
>  				err = "device busy";
>  			mutex_unlock(&bch_register_lock);
> +			if (!IS_ERR(bdev))
> +				bdput(bdev);
>  			if (attr == &ksysfs_register_quiet)
>  				goto out;
>  		}
> 


-- 
Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO
  2017-06-30 20:42       ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
@ 2017-07-01 17:26         ` Coly Li
  2017-07-05 18:25         ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 17:26 UTC (permalink / raw)
  To: bcache, linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable

On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
> 
> Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to
> trigger gc thread.
> 
> Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
> Cc: stable@vger.kernel.org

Acked-by: Coly Li <colyli@suse.de>

Thanks.

> ---
>  drivers/md/bcache/request.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 958072a..4b413db 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -196,12 +196,12 @@ static void bch_data_insert_start(struct closure *cl)
>  	struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
>  	struct bio *bio = op->bio, *n;
>  
> -	if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
> -		wake_up_gc(op->c);
> -
>  	if (op->bypass)
>  		return bch_data_invalidate(cl);
>  
> +	if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
> +		wake_up_gc(op->c);
> +
>  	/*
>  	 * Journal writes are marked REQ_PREFLUSH; if the original write was a
>  	 * flush, it'll wait on the journal write.
> 


-- 
Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/19] bcache: fix wrong cache_misses statistics
  2017-06-30 20:42       ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
@ 2017-07-01 17:58         ` Coly Li
  0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 17:58 UTC (permalink / raw)
  To: Tang Junhui; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable

On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
> 
> Some missed IOs are not counted into cache_misses, this patch fix this
> issue.

Could you please explain more about,
- which kind of missed I/O are mot counted
- where cache_missed is located

This will help the patch to be more understandable.

> 
> Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
> Cc: stable@vger.kernel.org

[snip]

> @@ -758,7 +760,7 @@ static void cached_dev_read_done_bh(struct closure *cl)
>  	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
>  
>  	bch_mark_cache_accounting(s->iop.c, s->d,
> -				  !s->cache_miss, s->iop.bypass);
> +				  !s->cache_missed, s->iop.bypass);
>  	trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass);


Should the above line be changed to,
	trace_bcache_read(s->orig_bio, !s->cache_missed, s->iop.bypass);
as well ?


[snip]

Thanks.

-- 
Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
  2017-06-30 20:42       ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
@ 2017-07-01 18:43         ` Coly Li
  2017-07-05 11:58           ` Liang Chen
  2017-07-05 18:27         ` Christoph Hellwig
  1 sibling, 1 reply; 41+ messages in thread
From: Coly Li @ 2017-07-01 18:43 UTC (permalink / raw)
  To: Liang Chen; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable

On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Liang Chen <liangchen.linux@gmail.com>
> 
> mutex_destroy does nothing most of time, but it's better to call
> it to make the code future proof and it also has some meaning
> for like mutex debug.
> 
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
> Cc: stable@vger.kernel.org
> ---
>  drivers/md/bcache/super.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 48b8c20..1f84791 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2089,6 +2089,7 @@ static void bcache_exit(void)
>  	if (bcache_major)
>  		unregister_blkdev(bcache_major, "bcache");
>  	unregister_reboot_notifier(&reboot);
> +	mutex_destroy(&bch_register_lock>  }
>  
>  static int __init bcache_init(void)
> @@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
>  
>  	bcache_major = register_blkdev(0, "bcache");
>  	if (bcache_major < 0) {
> +		mutex_destroy(&bch_register_lock);
>  		unregister_reboot_notifier(&reboot);
>  		return bcache_major;
>  	}
> 

Hi Liang,

Current code might have a potential race in a very corner case, see,
2084 static int __init bcache_init(void)
2085 {
2086         static const struct attribute *files[] = {
2087                 &ksysfs_register.attr,
2088                 &ksysfs_register_quiet.attr,
2089                 NULL
2090         };
2091
2092         mutex_init(&bch_register_lock);
2093         init_waitqueue_head(&unregister_wait);
2094         register_reboot_notifier(&reboot);
2095         closure_debug_init();
2096
2097         bcache_major = register_blkdev(0, "bcache");
2098         if (bcache_major < 0) {
2099                 unregister_reboot_notifier(&reboot);
2100                 return bcache_major;
2101         }
2102
2103         if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM,
0)) ||
2104             !(bcache_kobj = kobject_create_and_add("bcache",
fs_kobj)) ||
2105             sysfs_create_files(bcache_kobj, files) ||
2106             bch_request_init() ||
2107             bch_debug_init(bcache_kobj))
2108                 goto err;
2109
2110         return 0;
2111 err:
2112         bcache_exit();
2113         return -ENOMEM;
2114 }

At line 2107, most of bache stuffs are ready to work, only a debugfs
entry not created yet. If in the time gap between line 2106 and line
2017, another user space tool just registers cache and backing devices.
Then bch_debug_init() failed, and bcache_exit() gets called. In this
case, I doubt bcache_exit() can handle all the references correctly.

The race is very rare, and almost won't happen in real life. So, if you
don't care about it, the patch can be simpler like this,
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e57353e39168..fb5453a46a03 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2070,6 +2070,7 @@ static struct notifier_block reboot = {

 static void bcache_exit(void)
 {
+       mutex_destroy(&bch_register_lock);
        bch_debug_exit();
        bch_request_exit();
        if (bcache_kobj)
@@ -2089,7 +2090,6 @@ static int __init bcache_init(void)
                NULL
        };

-       mutex_init(&bch_register_lock);
        init_waitqueue_head(&unregister_wait);
        register_reboot_notifier(&reboot);
        closure_debug_init();
@@ -2107,6 +2107,7 @@ static int __init bcache_init(void)
            bch_debug_init(bcache_kobj))
                goto err;

+       mutex_init(&bch_register_lock);
        return 0;
 err:
        bcache_exit();
---
And if you do care about the race, maybe you should do something like this,
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e57353e39168..ca1d8b7a7815 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2079,6 +2079,7 @@ static void bcache_exit(void)
        if (bcache_major)
                unregister_blkdev(bcache_major, "bcache");
        unregister_reboot_notifier(&reboot);
+       mutex_unlock(&bch_register_lock);
 }

 static int __init bcache_init(void)
@@ -2090,6 +2091,7 @@ static int __init bcache_init(void)
        };

        mutex_init(&bch_register_lock);
+       mutex_lock(&bch_register_lock);
        init_waitqueue_head(&unregister_wait);
        register_reboot_notifier(&reboot);
        closure_debug_init();
@@ -2097,6 +2099,8 @@ static int __init bcache_init(void)
        bcache_major = register_blkdev(0, "bcache");
        if (bcache_major < 0) {
                unregister_reboot_notifier(&reboot);
+               mutex_unlock(&bch_register_lock);
+               mutex_destroy(&bch_register_lock);
                return bcache_major;
        }

@@ -2104,9 +2108,12 @@ static int __init bcache_init(void)
            !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
            sysfs_create_files(bcache_kobj, files) ||
            bch_request_init() ||
-           bch_debug_init(bcache_kobj))
+           bch_debug_init(bcache_kobj)) {
+               mutex_unlock(&bch_register_lock);
                goto err;
+       }

+       mutex_unlock(&bch_register_lock);
        return 0;
 err:
        bcache_exit();
---

Personally I think the first approach with only one new line code added,
your original version will add two new lines of code.

Just FYI. Thanks.

-- 
Coly Li

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device
  2017-06-30 20:42       ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
@ 2017-07-01 18:52         ` Coly Li
  0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-01 18:52 UTC (permalink / raw)
  To: Tang Junhui; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable

On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
> 
> Thin flash device does not initialize stripe_sectors_dirty correctly, this
> patch fixes this issue.

Hi Junhui,

Could you please explain why stripe_sectors_ditry is not correctly
initialized and how about its negative result ?

Thanks.

Coly

> 
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org
> ---
>  drivers/md/bcache/super.c     | 3 ++-
>  drivers/md/bcache/writeback.c | 8 ++++----
>  drivers/md/bcache/writeback.h | 2 +-
>  3 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 1f84791..e06641e 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1030,7 +1030,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
>  	}
>  
>  	if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
> -		bch_sectors_dirty_init(dc);
> +		bch_sectors_dirty_init(&dc->disk);
>  		atomic_set(&dc->has_dirty, 1);
>  		atomic_inc(&dc->count);
>  		bch_writeback_queue(dc);
> @@ -1232,6 +1232,7 @@ static int flash_dev_run(struct cache_set *c, struct uuid_entry *u)
>  		goto err;
>  
>  	bcache_device_attach(d, c, u - c->uuids);
> +	bch_sectors_dirty_init(d);
>  	bch_flash_dev_request_init(d);
>  	add_disk(d->disk);
>  
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 3d463f0..4ac8b13 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -482,17 +482,17 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
>  	return MAP_CONTINUE;
>  }
>  
> -void bch_sectors_dirty_init(struct cached_dev *dc)
> +void bch_sectors_dirty_init(struct bcache_device *d)
>  {
>  	struct sectors_dirty_init op;
>  
>  	bch_btree_op_init(&op.op, -1);
> -	op.inode = dc->disk.id;
> +	op.inode = d->id;
>  
> -	bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0),
> +	bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0),
>  			   sectors_dirty_init_fn, 0);
>  
> -	dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk);
> +	d->sectors_dirty_last = bcache_dev_sectors_dirty(d);
>  }
>  
>  void bch_cached_dev_writeback_init(struct cached_dev *dc)
> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> index ea2f92e..c2ab4b4 100644
> --- a/drivers/md/bcache/writeback.h
> +++ b/drivers/md/bcache/writeback.h
> @@ -109,7 +109,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
>  
>  void bcache_dev_sectors_dirty_add(struct cache_set *, unsigned, uint64_t, int);
>  
> -void bch_sectors_dirty_init(struct cached_dev *dc);
> +void bch_sectors_dirty_init(struct bcache_device *);
>  void bch_cached_dev_writeback_init(struct cached_dev *);
>  int bch_cached_dev_writeback_start(struct cached_dev *);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 05/19] bcache: fix calling ida_simple_remove() with incorrect minor
@ 2017-07-01 19:25 Eric Wheeler
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Wheeler @ 2017-07-01 19:25 UTC (permalink / raw)
  To: linux-block; +Cc: linux-bcache, hch, axboe, Tang Junhui, stable, Stefan Bader

From: Tang Junhui <tang.junhui@zte.com.cn>

bcache called ida_simple_remove() with minor which have multiplied by
BCACHE_MINORS, it would cause minor wrong release and leakage.

In addition, when adding partition support to bcache, the name assignment
was not updated, resulting in numbers jumping (bcache0, bcache16,
bcache32...). This has been fixed implicitly by the rework.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org # 4.10
Cc: Stefan Bader <stefan.bader@canonical.com>
Fixes: b8c0d91 (bcache: partition support: add 16 minors per bcacheN device)
BugLink: https://bugs.launchpad.net/bugs/1667078
---
 drivers/md/bcache/super.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 9a2c190..48b8c20 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -58,7 +58,10 @@ static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
 
 #define BTREE_MAX_PAGES		(256 * 1024 / PAGE_SIZE)
-#define BCACHE_MINORS		16 /* partition support */
+#define BCACHE_MINORS_BITS                4 /* bcache partition support */
+#define BCACHE_MINORS                     (1 << BCACHE_MINORS_BITS)
+#define BCACHE_TO_IDA_MINORS(first_minor) ((first_minor) >> BCACHE_MINORS_BITS)
+#define IDA_TO_BCACHE_MINORS(minor)       ((minor) << BCACHE_MINORS_BITS)
 
 /* Superblock */
 
@@ -734,7 +737,8 @@ static void bcache_device_free(struct bcache_device *d)
 	if (d->disk && d->disk->queue)
 		blk_cleanup_queue(d->disk->queue);
 	if (d->disk) {
-		ida_simple_remove(&bcache_minor, d->disk->first_minor);
+		ida_simple_remove(&bcache_minor,
+			BCACHE_TO_IDA_MINORS(d->disk->first_minor));
 		put_disk(d->disk);
 	}
 
@@ -776,11 +780,11 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 	if (!d->full_dirty_stripes)
 		return -ENOMEM;
 
-	minor = ida_simple_get(&bcache_minor, 0, MINORMASK + 1, GFP_KERNEL);
+	minor = ida_simple_get(&bcache_minor, 0,
+		BCACHE_TO_IDA_MINORS(MINORMASK) + 1, GFP_KERNEL);
 	if (minor < 0)
 		return minor;
 
-	minor *= BCACHE_MINORS;
 
 	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio),
 					   BIOSET_NEED_BVECS |
@@ -794,7 +798,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 	snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i", minor);
 
 	d->disk->major		= bcache_major;
-	d->disk->first_minor	= minor;
+	d->disk->first_minor	= IDA_TO_BCACHE_MINORS(minor);
 	d->disk->fops		= &bcache_ops;
 	d->disk->private_data	= d;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
  2017-07-01 18:43         ` Coly Li
@ 2017-07-05 11:58           ` Liang Chen
  2017-07-11  7:22             ` Coly Li
  0 siblings, 1 reply; 41+ messages in thread
From: Liang Chen @ 2017-07-05 11:58 UTC (permalink / raw)
  To: Coly Li; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable

Hi Coly,
Thanks for reviewing the patch! You raised a good point about the race. I also
think it should be addressed. Even though the time window is small, it will
still happen sooner or later.

I would like to keep this "destory mutex" patch unchanged, and send another
patch to fix the issue based on your approach. Please take a look. Thanks!

Thanks,
Liang

On Sun, Jul 2, 2017 at 2:43 AM, Coly Li <i@coly.li> wrote:
> On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
>> From: Liang Chen <liangchen.linux@gmail.com>
>>
>> mutex_destroy does nothing most of time, but it's better to call
>> it to make the code future proof and it also has some meaning
>> for like mutex debug.
>>
>> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
>> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
>> Cc: stable@vger.kernel.org
>> ---
>>  drivers/md/bcache/super.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index 48b8c20..1f84791 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2089,6 +2089,7 @@ static void bcache_exit(void)
>>       if (bcache_major)
>>               unregister_blkdev(bcache_major, "bcache");
>>       unregister_reboot_notifier(&reboot);
>> +     mutex_destroy(&bch_register_lock>  }
>>
>>  static int __init bcache_init(void)
>> @@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
>>
>>       bcache_major = register_blkdev(0, "bcache");
>>       if (bcache_major < 0) {
>> +             mutex_destroy(&bch_register_lock);
>>               unregister_reboot_notifier(&reboot);
>>               return bcache_major;
>>       }
>>
>
> Hi Liang,
>
> Current code might have a potential race in a very corner case, see,
> 2084 static int __init bcache_init(void)
> 2085 {
> 2086         static const struct attribute *files[] = {
> 2087                 &ksysfs_register.attr,
> 2088                 &ksysfs_register_quiet.attr,
> 2089                 NULL
> 2090         };
> 2091
> 2092         mutex_init(&bch_register_lock);
> 2093         init_waitqueue_head(&unregister_wait);
> 2094         register_reboot_notifier(&reboot);
> 2095         closure_debug_init();
> 2096
> 2097         bcache_major = register_blkdev(0, "bcache");
> 2098         if (bcache_major < 0) {
> 2099                 unregister_reboot_notifier(&reboot);
> 2100                 return bcache_major;
> 2101         }
> 2102
> 2103         if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM,
> 0)) ||
> 2104             !(bcache_kobj = kobject_create_and_add("bcache",
> fs_kobj)) ||
> 2105             sysfs_create_files(bcache_kobj, files) ||
> 2106             bch_request_init() ||
> 2107             bch_debug_init(bcache_kobj))
> 2108                 goto err;
> 2109
> 2110         return 0;
> 2111 err:
> 2112         bcache_exit();
> 2113         return -ENOMEM;
> 2114 }
>
> At line 2107, most of bache stuffs are ready to work, only a debugfs
> entry not created yet. If in the time gap between line 2106 and line
> 2017, another user space tool just registers cache and backing devices.
> Then bch_debug_init() failed, and bcache_exit() gets called. In this
> case, I doubt bcache_exit() can handle all the references correctly.
>
> The race is very rare, and almost won't happen in real life. So, if you
> don't care about it, the patch can be simpler like this,
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e57353e39168..fb5453a46a03 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2070,6 +2070,7 @@ static struct notifier_block reboot = {
>
>  static void bcache_exit(void)
>  {
> +       mutex_destroy(&bch_register_lock);
>         bch_debug_exit();
>         bch_request_exit();
>         if (bcache_kobj)
> @@ -2089,7 +2090,6 @@ static int __init bcache_init(void)
>                 NULL
>         };
>
> -       mutex_init(&bch_register_lock);
>         init_waitqueue_head(&unregister_wait);
>         register_reboot_notifier(&reboot);
>         closure_debug_init();
> @@ -2107,6 +2107,7 @@ static int __init bcache_init(void)
>             bch_debug_init(bcache_kobj))
>                 goto err;
>
> +       mutex_init(&bch_register_lock);
>         return 0;
>  err:
>         bcache_exit();
> ---
> And if you do care about the race, maybe you should do something like this,
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e57353e39168..ca1d8b7a7815 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2079,6 +2079,7 @@ static void bcache_exit(void)
>         if (bcache_major)
>                 unregister_blkdev(bcache_major, "bcache");
>         unregister_reboot_notifier(&reboot);
> +       mutex_unlock(&bch_register_lock);
>  }
>
>  static int __init bcache_init(void)
> @@ -2090,6 +2091,7 @@ static int __init bcache_init(void)
>         };
>
>         mutex_init(&bch_register_lock);
> +       mutex_lock(&bch_register_lock);
>         init_waitqueue_head(&unregister_wait);
>         register_reboot_notifier(&reboot);
>         closure_debug_init();
> @@ -2097,6 +2099,8 @@ static int __init bcache_init(void)
>         bcache_major = register_blkdev(0, "bcache");
>         if (bcache_major < 0) {
>                 unregister_reboot_notifier(&reboot);
> +               mutex_unlock(&bch_register_lock);
> +               mutex_destroy(&bch_register_lock);
>                 return bcache_major;
>         }
>
> @@ -2104,9 +2108,12 @@ static int __init bcache_init(void)
>             !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
>             sysfs_create_files(bcache_kobj, files) ||
>             bch_request_init() ||
> -           bch_debug_init(bcache_kobj))
> +           bch_debug_init(bcache_kobj)) {
> +               mutex_unlock(&bch_register_lock);
>                 goto err;
> +       }
>
> +       mutex_unlock(&bch_register_lock);
>         return 0;
>  err:
>         bcache_exit();
> ---
>
> Personally I think the first approach with only one new line code added,
> your original version will add two new lines of code.
>
> Just FYI. Thanks.
>
> --
> Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/19] bcache: Fix leak of bdev reference
  2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
                         ` (8 preceding siblings ...)
  2017-07-01 16:55       ` [PATCH 01/19] bcache: Fix leak of bdev reference Coly Li
@ 2017-07-05 18:24       ` Christoph Hellwig
       [not found]       ` <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>
  10 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:24 UTC (permalink / raw)
  To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Jan Kara, stable

On Fri, Jun 30, 2017 at 01:42:50PM -0700, bcache@lists.ewheeler.net wrote:
> From: Jan Kara <jack@suse.cz>
> 
> If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
> block device using lookup_bdev() to detect which situation we are in to
> properly report error. However we never drop the reference returned to
> us from lookup_bdev(). Fix that.

This look ok, but I think that whole chunk of code should just go
away - adding a lookup_bdev and resulting mess just for a slightly
different error message is just insane.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/19] bcache: fix sequential large write IO bypass
  2017-06-30 20:42       ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
@ 2017-07-05 18:25         ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:25 UTC (permalink / raw)
  To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Tang Junhui, stable

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO
  2017-06-30 20:42       ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
  2017-07-01 17:26         ` Coly Li
@ 2017-07-05 18:25         ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:25 UTC (permalink / raw)
  To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Tang Junhui, stable

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/19] bcache: fix calling ida_simple_remove() with incorrect minor
       [not found]       ` <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>
@ 2017-07-05 18:26         ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:26 UTC (permalink / raw)
  To: bcache
  Cc: linux-block, linux-bcache, hch, axboe, Tang Junhui, stable,
	Stefan Bader

> +#define BCACHE_TO_IDA_MINORS(first_minor) ((first_minor) >> BCACHE_MINORS_BITS)
> +#define IDA_TO_BCACHE_MINORS(minor)       ((minor) << BCACHE_MINORS_BITS)

Please use inline functions and lower case for these.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
  2017-06-30 20:42       ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
  2017-07-01 18:43         ` Coly Li
@ 2017-07-05 18:27         ` Christoph Hellwig
  2017-07-06  1:56           ` Liang Chen
  1 sibling, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:27 UTC (permalink / raw)
  To: bcache; +Cc: linux-block, linux-bcache, hch, axboe, Liang Chen, stable

On Fri, Jun 30, 2017 at 01:42:55PM -0700, bcache@lists.ewheeler.net wrote:
> From: Liang Chen <liangchen.linux@gmail.com>
> 
> mutex_destroy does nothing most of time, but it's better to call
> it to make the code future proof and it also has some meaning
> for like mutex debug.

It shouldn't really - we should get the destroy behavior for free
when doing a slab free of the area.

What issue are you trying to solve?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
  2017-07-05 18:27         ` Christoph Hellwig
@ 2017-07-06  1:56           ` Liang Chen
  0 siblings, 0 replies; 41+ messages in thread
From: Liang Chen @ 2017-07-06  1:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: bcache, linux-block, linux-bcache, axboe, stable

mutex_destroy does nothing normally (may not be true in the future), but
when debug mutex is turned on it helps with debugging - mutex_destroy
in mutex-debug.c.
It's not about freeing of the memory. It's more about consistency of the
use of mutex and making the code future proof.

Thanks,
Linag

On Thu, Jul 6, 2017 at 2:27 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Jun 30, 2017 at 01:42:55PM -0700, bcache@lists.ewheeler.net wrote:
>> From: Liang Chen <liangchen.linux@gmail.com>
>>
>> mutex_destroy does nothing most of time, but it's better to call
>> it to make the code future proof and it also has some meaning
>> for like mutex debug.
>
> It shouldn't really - we should get the destroy behavior for free
> when doing a slab free of the area.
>
> What issue are you trying to solve?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
  2017-06-30 20:43       ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
@ 2017-07-10 18:11         ` Coly Li
       [not found]           ` <OF92BDA950.86AA00FA-ON4825815A.001F33D9-4825815A.001F5C89@zte.com.cn>
  0 siblings, 1 reply; 41+ messages in thread
From: Coly Li @ 2017-07-10 18:11 UTC (permalink / raw)
  To: Tang Junhui; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable

On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
> 
> Since dirty sectors of thin flash cannot be used to cache data for backend
> device, so we should subtract it in calculating writeback rate.
> 

I see you want to get ride of the noise of flash only cache device for
writeback rate calculation. It makes sense, because flash only cache
device won't have write back happen at all.


> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org
> ---
>  drivers/md/bcache/writeback.c |  2 +-
>  drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4ac8b13..25289e4 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -21,7 +21,7 @@
>  static void __update_writeback_rate(struct cached_dev *dc)
>  {
>  	struct cache_set *c = dc->disk.c;
> -	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
> +	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size - bcache_flash_devs_sectors_dirty(c);

See flash_dev_run(), the flash volume is created per struct
bcache_device of a cache set. That means, all data allocated for the
flash volume will be from a flash only bcache device. Regular dirty data
won't mixed allocating with flash volume dirty data on identical struct
bcache device.

Based on the above implementation, non-dirty space from flash only
bcache device will mislead writeback rate calculation too. So I suggest
to subtract all buckets size from all flash only bcache devices. Then it
might be something like,

uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
			bcache_flash_devs_nbuckets(c);



Just FYI. Thanks.

Coly

>  	uint64_t cache_dirty_target =
>  		div_u64(cache_sectors * dc->writeback_percent, 100);
>  
> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> index c2ab4b4..24ff589 100644
> --- a/drivers/md/bcache/writeback.h
> +++ b/drivers/md/bcache/writeback.h
> @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
>  	return ret;
>  }
>  
> +static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)
> +{
> +	uint64_t i, ret = 0;
> +
> +	mutex_lock(&bch_register_lock);
> +
> +	for (i = 0; i < c->nr_uuids; i++) {
> +		struct bcache_device *d = c->devices[i];
> +
> +		if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
> +			continue;
> +	   ret += bcache_dev_sectors_dirty(d);
> +	}
> +
> +	mutex_unlock(&bch_register_lock);
> +
> +	return ret;
> +}
> +
>  static inline unsigned offset_to_stripe(struct bcache_device *d,
>  					uint64_t offset)
>  {
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/19] bcache: explicitly destory mutex while exiting
  2017-07-05 11:58           ` Liang Chen
@ 2017-07-11  7:22             ` Coly Li
  0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-07-11  7:22 UTC (permalink / raw)
  To: Liang Chen; +Cc: bcache, linux-block, linux-bcache, hch, axboe, stable

On 2017/7/5 下午7:58, Liang Chen wrote:
> Hi Coly,
> Thanks for reviewing the patch! You raised a good point about the race. I also
> think it should be addressed. Even though the time window is small, it will
> still happen sooner or later.
> 
> I would like to keep this "destory mutex" patch unchanged, and send another
> patch to fix the issue based on your approach. Please take a look. Thanks!
> 

Sure, good idea. I'd like to review the next fix, and provide my feed
back together. Thanks.

Coly



> Thanks,
> Liang
> 
> On Sun, Jul 2, 2017 at 2:43 AM, Coly Li <i@coly.li> wrote:
>> On 2017/7/1 上午4:42, bcache@lists.ewheeler.net wrote:
>>> From: Liang Chen <liangchen.linux@gmail.com>
>>>
>>> mutex_destroy does nothing most of time, but it's better to call
>>> it to make the code future proof and it also has some meaning
>>> for like mutex debug.
>>>
>>> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
>>> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
>>> Cc: stable@vger.kernel.org
>>> ---
>>>  drivers/md/bcache/super.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>> index 48b8c20..1f84791 100644
>>> --- a/drivers/md/bcache/super.c
>>> +++ b/drivers/md/bcache/super.c
>>> @@ -2089,6 +2089,7 @@ static void bcache_exit(void)
>>>       if (bcache_major)
>>>               unregister_blkdev(bcache_major, "bcache");
>>>       unregister_reboot_notifier(&reboot);
>>> +     mutex_destroy(&bch_register_lock>  }
>>>
>>>  static int __init bcache_init(void)
>>> @@ -2106,6 +2107,7 @@ static int __init bcache_init(void)
>>>
>>>       bcache_major = register_blkdev(0, "bcache");
>>>       if (bcache_major < 0) {
>>> +             mutex_destroy(&bch_register_lock);
>>>               unregister_reboot_notifier(&reboot);
>>>               return bcache_major;
>>>       }
>>>
>>
>> Hi Liang,
>>
>> Current code might have a potential race in a very corner case, see,
>> 2084 static int __init bcache_init(void)
>> 2085 {
>> 2086         static const struct attribute *files[] = {
>> 2087                 &ksysfs_register.attr,
>> 2088                 &ksysfs_register_quiet.attr,
>> 2089                 NULL
>> 2090         };
>> 2091
>> 2092         mutex_init(&bch_register_lock);
>> 2093         init_waitqueue_head(&unregister_wait);
>> 2094         register_reboot_notifier(&reboot);
>> 2095         closure_debug_init();
>> 2096
>> 2097         bcache_major = register_blkdev(0, "bcache");
>> 2098         if (bcache_major < 0) {
>> 2099                 unregister_reboot_notifier(&reboot);
>> 2100                 return bcache_major;
>> 2101         }
>> 2102
>> 2103         if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM,
>> 0)) ||
>> 2104             !(bcache_kobj = kobject_create_and_add("bcache",
>> fs_kobj)) ||
>> 2105             sysfs_create_files(bcache_kobj, files) ||
>> 2106             bch_request_init() ||
>> 2107             bch_debug_init(bcache_kobj))
>> 2108                 goto err;
>> 2109
>> 2110         return 0;
>> 2111 err:
>> 2112         bcache_exit();
>> 2113         return -ENOMEM;
>> 2114 }
>>
>> At line 2107, most of bache stuffs are ready to work, only a debugfs
>> entry not created yet. If in the time gap between line 2106 and line
>> 2017, another user space tool just registers cache and backing devices.
>> Then bch_debug_init() failed, and bcache_exit() gets called. In this
>> case, I doubt bcache_exit() can handle all the references correctly.
>>
>> The race is very rare, and almost won't happen in real life. So, if you
>> don't care about it, the patch can be simpler like this,
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index e57353e39168..fb5453a46a03 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2070,6 +2070,7 @@ static struct notifier_block reboot = {
>>
>>  static void bcache_exit(void)
>>  {
>> +       mutex_destroy(&bch_register_lock);
>>         bch_debug_exit();
>>         bch_request_exit();
>>         if (bcache_kobj)
>> @@ -2089,7 +2090,6 @@ static int __init bcache_init(void)
>>                 NULL
>>         };
>>
>> -       mutex_init(&bch_register_lock);
>>         init_waitqueue_head(&unregister_wait);
>>         register_reboot_notifier(&reboot);
>>         closure_debug_init();
>> @@ -2107,6 +2107,7 @@ static int __init bcache_init(void)
>>             bch_debug_init(bcache_kobj))
>>                 goto err;
>>
>> +       mutex_init(&bch_register_lock);
>>         return 0;
>>  err:
>>         bcache_exit();
>> ---
>> And if you do care about the race, maybe you should do something like this,
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index e57353e39168..ca1d8b7a7815 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2079,6 +2079,7 @@ static void bcache_exit(void)
>>         if (bcache_major)
>>                 unregister_blkdev(bcache_major, "bcache");
>>         unregister_reboot_notifier(&reboot);
>> +       mutex_unlock(&bch_register_lock);
>>  }
>>
>>  static int __init bcache_init(void)
>> @@ -2090,6 +2091,7 @@ static int __init bcache_init(void)
>>         };
>>
>>         mutex_init(&bch_register_lock);
>> +       mutex_lock(&bch_register_lock);
>>         init_waitqueue_head(&unregister_wait);
>>         register_reboot_notifier(&reboot);
>>         closure_debug_init();
>> @@ -2097,6 +2099,8 @@ static int __init bcache_init(void)
>>         bcache_major = register_blkdev(0, "bcache");
>>         if (bcache_major < 0) {
>>                 unregister_reboot_notifier(&reboot);
>> +               mutex_unlock(&bch_register_lock);
>> +               mutex_destroy(&bch_register_lock);
>>                 return bcache_major;
>>         }
>>
>> @@ -2104,9 +2108,12 @@ static int __init bcache_init(void)
>>             !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
>>             sysfs_create_files(bcache_kobj, files) ||
>>             bch_request_init() ||
>> -           bch_debug_init(bcache_kobj))
>> +           bch_debug_init(bcache_kobj)) {
>> +               mutex_unlock(&bch_register_lock);
>>                 goto err;
>> +       }
>>
>> +       mutex_unlock(&bch_register_lock);
>>         return 0;
>>  err:
>>         bcache_exit();
>> ---
>>
>> Personally I think the first approach with only one new line code added,
>> your original version will add two new lines of code.
>>
>> Just FYI. Thanks.
>>
>> --
>> Coly Li


-- 
Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
       [not found]           ` <OF92BDA950.86AA00FA-ON4825815A.001F33D9-4825815A.001F5C89@zte.com.cn>
@ 2017-07-13  4:12             ` Eric Wheeler
  2017-07-13  4:15               ` Coly Li
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Wheeler @ 2017-07-13  4:12 UTC (permalink / raw)
  To: tang.junhui; +Cc: Coly Li, linux-bcache, linux-block, stable

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5081 bytes --]

On Tue, 11 Jul 2017, tang.junhui@zte.com.cn wrote:

> > Based on the above implementation, non-dirty space from flash only
> > bcache device will mislead writeback rate calculation too. So I suggest
> > to subtract all buckets size from all flash only bcache devices. Then it
> > might be something like,
> 
> what is non-dirty space from flash only bcache device?
> Where is non-dirty space from flash only bcache device?

Hi Tang, Coly:
   
Waas there more discussion on this thread, or is the patch good to go?  

Please send your ack if you're happy with it so I can queue it up.

--
Eric Wheeler

> 
> 
> 
> 发件人:         Coly Li <i@coly.li>
> 收件人:         Tang Junhui <tang.junhui@zte.com.cn>,
> 抄送:        bcache@lists.ewheeler.net, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org,
> hch@infradead.org, axboe@kernel.dk, stable@vger.kernel.org
> 日期:         2017/07/11 02:11
> 主题:        Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating
> writeback rate
> 发件人:        linux-bcache-owner@vger.kernel.org
> 
> _________________________________________________________________________________________________________________
> 
> 
> 
> On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
> > From: Tang Junhui <tang.junhui@zte.com.cn>
> >
> > Since dirty sectors of thin flash cannot be used to cache data for backend
> > device, so we should subtract it in calculating writeback rate.
> >
> 
> I see you want to get ride of the noise of flash only cache device for
> writeback rate calculation. It makes sense, because flash only cache
> device won't have write back happen at all.
> 
> 
> > Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/md/bcache/writeback.c |  2 +-
> >  drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4ac8b13..25289e4 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -21,7 +21,7 @@
> >  static void __update_writeback_rate(struct cached_dev *dc)
> >  {
> >                   struct cache_set *c = dc->disk.c;
> > -                 uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
> > +                 uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
> bcache_flash_devs_sectors_dirty(c);
> 
> See flash_dev_run(), the flash volume is created per struct
> bcache_device of a cache set. That means, all data allocated for the
> flash volume will be from a flash only bcache device. Regular dirty data
> won't mixed allocating with flash volume dirty data on identical struct
> bcache device.
> 
> Based on the above implementation, non-dirty space from flash only
> bcache device will mislead writeback rate calculation too. So I suggest
> to subtract all buckets size from all flash only bcache devices. Then it
> might be something like,
> 
> uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
>                                                   bcache_flash_devs_nbuckets(c);
> 
> 
> 
> Just FYI. Thanks.
> 
> Coly
> 
> >                   uint64_t cache_dirty_target =
> >                                    div_u64(cache_sectors * dc->writeback_percent, 100);
> >  
> > diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> > index c2ab4b4..24ff589 100644
> > --- a/drivers/md/bcache/writeback.h
> > +++ b/drivers/md/bcache/writeback.h
> > @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
> >                   return ret;
> >  }
> >  
> > +static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)
> > +{
> > +                 uint64_t i, ret = 0;
> > +
> > +                 mutex_lock(&bch_register_lock);
> > +
> > +                 for (i = 0; i < c->nr_uuids; i++) {
> > +                                  struct bcache_device *d = c->devices[i];
> > +
> > +                                  if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
> > +                                                   continue;
> > +                    ret += bcache_dev_sectors_dirty(d);
> > +                 }
> > +
> > +                 mutex_unlock(&bch_register_lock);
> > +
> > +                 return ret;
> > +}
> > +
> >  static inline unsigned offset_to_stripe(struct bcache_device *d,
> >                                                                                       uint64_t offset)
> >  {
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
  2017-07-13  4:12             ` Eric Wheeler
@ 2017-07-13  4:15               ` Coly Li
  2017-10-27 19:12                 ` Eric Wheeler
  0 siblings, 1 reply; 41+ messages in thread
From: Coly Li @ 2017-07-13  4:15 UTC (permalink / raw)
  To: Eric Wheeler, tang.junhui; +Cc: linux-bcache, linux-block, stable

On 2017/7/13 下午12:12, Eric Wheeler wrote:
> On Tue, 11 Jul 2017, tang.junhui@zte.com.cn wrote:
> 
>>> Based on the above implementation, non-dirty space from flash only
>>> bcache device will mislead writeback rate calculation too. So I suggest
>>> to subtract all buckets size from all flash only bcache devices. Then it
>>> might be something like,
>>
>> what is non-dirty space from flash only bcache device?
>> Where is non-dirty space from flash only bcache device?
> 
> Hi Tang, Coly:
>    
> Waas there more discussion on this thread, or is the patch good to go?  
> 
> Please send your ack if you're happy with it so I can queue it up.

I discussed with Tang offline, this patch is correct. But the patch
commit log should be improved. Now I help to work on it, should be done
quite soon.

Coly

>>
>>
>> 发件人:         Coly Li <i@coly.li>
>> 收件人:         Tang Junhui <tang.junhui@zte.com.cn>,
>> 抄送:        bcache@lists.ewheeler.net, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org,
>> hch@infradead.org, axboe@kernel.dk, stable@vger.kernel.org
>> 日期:         2017/07/11 02:11
>> 主题:        Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating
>> writeback rate
>> 发件人:        linux-bcache-owner@vger.kernel.org
>>
>> _________________________________________________________________________________________________________________
>>
>>
>>
>> On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>
>>> Since dirty sectors of thin flash cannot be used to cache data for backend
>>> device, so we should subtract it in calculating writeback rate.
>>>
>>
>> I see you want to get ride of the noise of flash only cache device for
>> writeback rate calculation. It makes sense, because flash only cache
>> device won't have write back happen at all.
>>
>>
>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>> Cc: stable@vger.kernel.org
>>> ---
>>>  drivers/md/bcache/writeback.c |  2 +-
>>>  drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
>>>  2 files changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 4ac8b13..25289e4 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -21,7 +21,7 @@
>>>  static void __update_writeback_rate(struct cached_dev *dc)
>>>  {
>>>                   struct cache_set *c = dc->disk.c;
>>> -                 uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
>>> +                 uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
>> bcache_flash_devs_sectors_dirty(c);
>>
>> See flash_dev_run(), the flash volume is created per struct
>> bcache_device of a cache set. That means, all data allocated for the
>> flash volume will be from a flash only bcache device. Regular dirty data
>> won't mixed allocating with flash volume dirty data on identical struct
>> bcache device.
>>
>> Based on the above implementation, non-dirty space from flash only
>> bcache device will mislead writeback rate calculation too. So I suggest
>> to subtract all buckets size from all flash only bcache devices. Then it
>> might be something like,
>>
>> uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
>>                                                   bcache_flash_devs_nbuckets(c);
>>
>>
>>
>> Just FYI. Thanks.
>>
>> Coly
>>
>>>                   uint64_t cache_dirty_target =
>>>                                    div_u64(cache_sectors * dc->writeback_percent, 100);
>>>  
>>> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
>>> index c2ab4b4..24ff589 100644
>>> --- a/drivers/md/bcache/writeback.h
>>> +++ b/drivers/md/bcache/writeback.h
>>> @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
>>>                   return ret;
>>>  }
>>>  
>>> +static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)
>>> +{
>>> +                 uint64_t i, ret = 0;
>>> +
>>> +                 mutex_lock(&bch_register_lock);
>>> +
>>> +                 for (i = 0; i < c->nr_uuids; i++) {
>>> +                                  struct bcache_device *d = c->devices[i];
>>> +
>>> +                                  if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
>>> +                                                   continue;
>>> +                    ret += bcache_dev_sectors_dirty(d);
>>> +                 }
>>> +
>>> +                 mutex_unlock(&bch_register_lock);
>>> +
>>> +                 return ret;
>>> +}
>>> +
>>>  static inline unsigned offset_to_stripe(struct bcache_device *d,
>>>                                                                                       uint64_t offset)
>>>  {
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>


-- 
Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
  2017-05-25 19:10 [PULL] bcache updates based on git.kernel.dk/linux-block:for-next Eric Wheeler
  2017-06-28 23:06 ` [PULL] bcache fixes and updates for-4.13 Eric Wheeler
@ 2017-07-14 11:40 ` Eddie Chapman
       [not found]   ` <7040453d-2272-d376-710b-6ed07527a98e@coly.li>
  1 sibling, 1 reply; 41+ messages in thread
From: Eddie Chapman @ 2017-07-14 11:40 UTC (permalink / raw)
  To: Eric Wheeler, Jens Axboe
  Cc: linux-bcache, Jan Kara, stable, tang.junhui, Kent Overstreet,
	Coly Li, Stefan Bader, Liang Chen, nix

On 25/05/17 20:10, Eric Wheeler wrote:
> Hi Jens,
> 
> Please pull these updates and bugfixes from the bcache community when you
> have a minute.  If you need a rebase against something else then please
> let me know and I would be happy to update for you.
> 
> Thank you for your help!

<snip>

Hi all,

(replying to all but as a subscriber to stable only)

I'm not a kernel coder but have several 4.4 (kernel.org stable series) 
servers using bcache, so I'd love to see (possibly some of?) these in 
4.4 if they are relevant and apply without any significant work needed.

This series was CC'd to stable but I don't see any info of how far back 
any of them might be applicable, if at all.

If any of you guys are able to give a hint with this series just along 
the lines of "this one is/is not applicable to 4.4" then I'm happy to 
apply them, resolve any simple context issues, use, and report back with 
clean patches.

thanks,
Eddie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
       [not found]   ` <7040453d-2272-d376-710b-6ed07527a98e@coly.li>
@ 2017-07-14 17:33     ` Eddie Chapman
       [not found]       ` <OF92BA0158.87BDF9E3-ON4825815E.000736BF-4825815E.000833F7@zte.com.cn>
  0 siblings, 1 reply; 41+ messages in thread
From: Eddie Chapman @ 2017-07-14 17:33 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, linux-bcache, stable, tang.junhui

On 14/07/17 16:07, Coly Li wrote:
> On 2017/7/14 下午7:40, Eddie Chapman wrote:
>> On 25/05/17 20:10, Eric Wheeler wrote:
>>> Hi Jens,
>>>
>>> Please pull these updates and bugfixes from the bcache community when you
>>> have a minute.  If you need a rebase against something else then please
>>> let me know and I would be happy to update for you.
>>>
>>> Thank you for your help!
>>
>> <snip>
>>
>> Hi all,
>>
>> (replying to all but as a subscriber to stable only)
>>
>> I'm not a kernel coder but have several 4.4 (kernel.org stable series)
>> servers using bcache, so I'd love to see (possibly some of?) these in
>> 4.4 if they are relevant and apply without any significant work needed.
>>
>> This series was CC'd to stable but I don't see any info of how far back
>> any of them might be applicable, if at all.
>>
>> If any of you guys are able to give a hint with this series just along
>> the lines of "this one is/is not applicable to 4.4" then I'm happy to
>> apply them, resolve any simple context issues, use, and report back with
>> clean patches.
> 
> (remove many unnecessary email receivers)

I'm re-adding the stable list to CC as we're discussing a stable kernel. 
Hope that's OK.

> Hi Eddie,
> 
> I think some patches from Junhui Tang are important stable fixes.
> After all the patches get reviewed, and accepted in mainline kernel, you
> may find them in 4.4 stable tree (any way it won't be very soon for
> these fixes show up in stable tree).
> 
> Thanks.

Thanks for your reply Coly.  You're right, forgot about that. Before 
they can go in 4.4 or any other stable kernel they must be in Linus' tree.

Of the 9 patches CC'd to stable, it looks to me that so far these 2 have 
subsequently received approval by you plus at least 1 other person other 
than the author (e.g. Christoph Hellwig):

- fix sequential large write IO bypass
- do not subtract sectors_to_gc for bypassed IO

The first one looks particularly important to me and Kent himself has 
also reviewed it.

This one also has not received any objections yet and you mentioned you 
discussed with the author and both concluded it is correct:

- Subtract dirty sectors of thin flash from cache_sectors in calculating 
writeback rate

So for me these 3 by Junhui Tang seem "safe" enough that I will take a 
little risk and try them already on 4.4 on my own machines (I'm guessing 
they are likely relevant to 4.4 but of course I'll check if they apply). 
I'll report back, FWIW.

If I'm brave (foolish) enough I might go through mainline bcache commits 
since 4.4 and see if there are any other goodies to try out with 4.4. Of 
course if anyone has any in particular to suggest to me to try, please 
do! If I do, I'll report back anything that seems to have worked.

Thanks again!
Eddie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
       [not found]       ` <OF92BA0158.87BDF9E3-ON4825815E.000736BF-4825815E.000833F7@zte.com.cn>
@ 2017-07-18 18:24         ` Eddie Chapman
  2017-07-18 18:31           ` Eddie Chapman
  0 siblings, 1 reply; 41+ messages in thread
From: Eddie Chapman @ 2017-07-18 18:24 UTC (permalink / raw)
  To: tang.junhui; +Cc: Eric Wheeler, Coly Li, linux-bcache, stable

On 15/07/17 02:29, tang.junhui@zte.com.cn wrote:
> Hello Eddie
> 
> One thing, please see below:
> 
> 
>>On 14/07/17 16:07, Coly Li wrote:
>>> On 2017/7/14 下午7:40, Eddie Chapman wrote:
>>>> On 25/05/17 20:10, Eric Wheeler wrote:
>>>>> Hi Jens,
>>>>>
>>>>> Please pull these updates and bugfixes from the bcache community when you
>>>>> have a minute.  If you need a rebase against something else then please
>>>>> let me know and I would be happy to update for you.
>>>>>
>>>>> Thank you for your help!
>>>>
>>>> <snip>
>>>>
>>>> Hi all,
>>>>
>>>> (replying to all but as a subscriber to stable only)
>>>>
>>>> I'm not a kernel coder but have several 4.4 (kernel.org stable series)
>>>> servers using bcache, so I'd love to see (possibly some of?) these in
>>>> 4.4 if they are relevant and apply without any significant work needed.
>>>>
>>>> This series was CC'd to stable but I don't see any info of how far back
>>>> any of them might be applicable, if at all.
>>>>
>>>> If any of you guys are able to give a hint with this series just along
>>>> the lines of "this one is/is not applicable to 4.4" then I'm happy to
>>>> apply them, resolve any simple context issues, use, and report back with
>>>> clean patches.
>>>
>>> (remove many unnecessary email receivers)
>>
>>I'm re-adding the stable list to CC as we're discussing a stable kernel.
>>Hope that's OK.
>>
>>> Hi Eddie,
>>>
>>> I think some patches from Junhui Tang are important stable fixes.
>>> After all the patches get reviewed, and accepted in mainline kernel, you
>>> may find them in 4.4 stable tree (any way it won't be very soon for
>>> these fixes show up in stable tree).
>>>
>>> Thanks.
>>
>>Thanks for your reply Coly.  You're right, forgot about that. Before
>>they can go in 4.4 or any other stable kernel they must be in Linus' tree.
>>
>>Of the 9 patches CC'd to stable, it looks to me that so far these 2 have
>>subsequently received approval by you plus at least 1 other person other
>>than the author (e.g. Christoph Hellwig):
>>
>>- fix sequential large write IO bypass
>>- do not subtract sectors_to_gc for bypassed IO
>>
>>The first one looks particularly important to me and Kent himself has
>>also reviewed it.
>>
>>This one also has not received any objections yet and you mentioned you
>>discussed with the author and both concluded it is correct:
>>
>>- Subtract dirty sectors of thin flash from cache_sectors in calculating
>>writeback rate
> 
> If you want to apply this patch, please also apply this patch to make
> dirty sectors of thin flashInitializing correctly. Otherwise
> it will subtract a wrong dirty sectors of thin flash.
> 
> -bcache: initialize stripe_sectors_dirty correctly for thin flash device

Thank you very much Tang for pointing this out.

I applied all 4 patches on top of vanilla kernel.org 4.4.77 . So they are:

- bcache: fix sequential large write IO bypass
- bcache: do not subtract sectors_to_gc for bypassed IO
- bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
- bcache: initialize stripe_sectors_dirty correctly for thin flash device

They applied without any issues just some fuzz.
There were no issues during build, and I've been running 4.4.77 with 
these 4 patches on one server using bcache for 3 days now without any 
problems or any unusual logs from bcache in dmesg.

I will send another mail separately, as a reply to this one, with the 4 
patches with context adjusted, generated from patched 4.4.77 using diff -up, 
in case they are any use to anyone. Not sure if they are any use to you guys 
though as they are not generated with git-send-email, but anyway, FWIW.

Thanks,
Eddie

>>So for me these 3 by Junhui Tang seem "safe" enough that I will take a
>>little risk and try them already on 4.4 on my own machines (I'm guessing
>>they are likely relevant to 4.4 but of course I'll check if they apply).
>>I'll report back, FWIW.
>>
>>If I'm brave (foolish) enough I might go through mainline bcache commits
>>since 4.4 and see if there are any other goodies to try out with 4.4. Of
>>course if anyone has any in particular to suggest to me to try, please
>>do! If I do, I'll report back anything that seems to have worked.
>>
>>Thanks again!
>>Eddie
>>--
> 
> Thanks to use it.
> 
> Tang Junhui
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
  2017-07-18 18:24         ` Eddie Chapman
@ 2017-07-18 18:31           ` Eddie Chapman
  2017-07-18 20:06             ` Greg KH
  0 siblings, 1 reply; 41+ messages in thread
From: Eddie Chapman @ 2017-07-18 18:31 UTC (permalink / raw)
  To: tang.junhui; +Cc: Eric Wheeler, Coly Li, linux-bcache, stable

As per previous mail, below are the 4 patches discussed, generated with 
diff -up, from kernel.org 4.4.77, so with correct context for 4.4. Just 
in case they are useful for anyone.

---
From: Tang Junhui <tang.junhui@zte.com.cn>

Sequential write IOs were tested with bs=1M by FIO in writeback cache
mode, these IOs were expected to be bypassed, but actually they did not.
We debug the code, and find in check_should_bypass():
    if (!congested &&
        mode == CACHE_MODE_WRITEBACK &&
        op_is_write(bio_op(bio)) &&
        (bio->bi_opf & REQ_SYNC))
        goto rescale
that means, If in writeback mode, a write IO with REQ_SYNC flag will not
be bypassed though it is a sequential large IO, It's not a correct thing
to do actually, so this patch remove these codes.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---

--- a/drivers/md/bcache/request.c	2017-07-18 18:09:38.686156583 +0100
+++ b/drivers/md/bcache/request.c	2017-07-18 18:09:44.596167542 +0100
@@ -400,12 +400,6 @@ static bool check_should_bypass(struct c
 	if (!congested && !dc->sequential_cutoff)
 		goto rescale;
 
-	if (!congested &&
-	    mode == CACHE_MODE_WRITEBACK &&
-	    (bio->bi_rw & REQ_WRITE) &&
-	    (bio->bi_rw & REQ_SYNC))
-		goto rescale;
-
 	spin_lock(&dc->io_lock);
 
 	hlist_for_each_entry(i, iohash(dc, bio->bi_iter.bi_sector), hash)


From: Tang Junhui <tang.junhui@zte.com.cn>

Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to
trigger gc thread.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
---

--- a/drivers/md/bcache/request.c	2017-07-18 18:18:43.937169337 +0100
+++ b/drivers/md/bcache/request.c	2017-07-18 18:21:45.637507148 +0100
@@ -196,12 +196,12 @@ static void bch_data_insert_start(struct
 	struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
 	struct bio *bio = op->bio, *n;
 
-	if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
-		wake_up_gc(op->c);
-
 	if (op->bypass)
 		return bch_data_invalidate(cl);
 
+	if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
+		wake_up_gc(op->c);
+
 	/*
 	 * Journal writes are marked REQ_FLUSH; if the original write was a
 	 * flush, it'll wait on the journal write.


From: Tang Junhui <tang.junhui@zte.com.cn>

Thin flash device does not initialize stripe_sectors_dirty correctly, this
patch fixes this issue.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---

--- a/drivers/md/bcache/super.c	2017-07-18 18:31:38.968611871 +0100
+++ b/drivers/md/bcache/super.c	2017-07-18 18:32:36.078718382 +0100
@@ -1023,7 +1023,7 @@ int bch_cached_dev_attach(struct cached_
 	}
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
-		bch_sectors_dirty_init(dc);
+		bch_sectors_dirty_init(&dc->disk);
 		atomic_set(&dc->has_dirty, 1);
 		atomic_inc(&dc->count);
 		bch_writeback_queue(dc);
@@ -1227,6 +1227,7 @@ static int flash_dev_run(struct cache_se
 		goto err;
 
 	bcache_device_attach(d, c, u - c->uuids);
+	bch_sectors_dirty_init(d);
 	bch_flash_dev_request_init(d);
 	add_disk(d->disk);
 
--- a/drivers/md/bcache/writeback.c	2017-07-18 18:31:50.718633782 +0100
+++ b/drivers/md/bcache/writeback.c	2017-07-18 18:32:36.078718382 +0100
@@ -488,17 +488,17 @@ static int sectors_dirty_init_fn(struct
 	return MAP_CONTINUE;
 }
 
-void bch_sectors_dirty_init(struct cached_dev *dc)
+void bch_sectors_dirty_init(struct bcache_device *d)
 {
 	struct sectors_dirty_init op;
 
 	bch_btree_op_init(&op.op, -1);
-	op.inode = dc->disk.id;
+	op.inode = d->id;
 
-	bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0),
+	bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0),
 			   sectors_dirty_init_fn, 0);
 
-	dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk);
+	d->sectors_dirty_last = bcache_dev_sectors_dirty(d);
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)
--- a/drivers/md/bcache/writeback.h	2017-07-18 18:32:00.588652189 +0100
+++ b/drivers/md/bcache/writeback.h	2017-07-18 18:32:36.078718382 +0100
@@ -85,7 +85,7 @@ static inline void bch_writeback_add(str
 
 void bcache_dev_sectors_dirty_add(struct cache_set *, unsigned, uint64_t, int);
 
-void bch_sectors_dirty_init(struct cached_dev *dc);
+void bch_sectors_dirty_init(struct bcache_device *);
 void bch_cached_dev_writeback_init(struct cached_dev *);
 int bch_cached_dev_writeback_start(struct cached_dev *);


From: Tang Junhui <tang.junhui@zte.com.cn>

Since dirty sectors of thin flash cannot be used to cache data for backend
device, so we should subtract it in calculating writeback rate.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---

--- a/drivers/md/bcache/writeback.c	2017-07-18 18:38:46.929410077 +0100
+++ b/drivers/md/bcache/writeback.c	2017-07-18 18:39:00.979436233 +0100
@@ -21,7 +21,7 @@
 static void __update_writeback_rate(struct cached_dev *dc)
 {
 	struct cache_set *c = dc->disk.c;
-	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
+	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size - bcache_flash_devs_sectors_dirty(c);
 	uint64_t cache_dirty_target =
 		div_u64(cache_sectors * dc->writeback_percent, 100);
 
--- a/drivers/md/bcache/writeback.h	2017-07-18 18:38:50.489416705 +0100
+++ b/drivers/md/bcache/writeback.h	2017-07-18 18:39:00.979436233 +0100
@@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sector
 	return ret;
 }
 
+static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)
+{
+	uint64_t i, ret = 0;
+
+	mutex_lock(&bch_register_lock);
+
+	for (i = 0; i < c->nr_uuids; i++) {
+		struct bcache_device *d = c->devices[i];
+
+		if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
+			continue;
+	   ret += bcache_dev_sectors_dirty(d);
+	}
+
+	mutex_unlock(&bch_register_lock);
+
+	return ret;
+}
+
 static inline unsigned offset_to_stripe(struct bcache_device *d,
 					uint64_t offset)
 {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
  2017-07-18 18:31           ` Eddie Chapman
@ 2017-07-18 20:06             ` Greg KH
  2017-07-18 20:36               ` Eddie Chapman
  0 siblings, 1 reply; 41+ messages in thread
From: Greg KH @ 2017-07-18 20:06 UTC (permalink / raw)
  To: Eddie Chapman; +Cc: tang.junhui, Eric Wheeler, Coly Li, linux-bcache, stable

On Tue, Jul 18, 2017 at 07:31:35PM +0100, Eddie Chapman wrote:
> As per previous mail, below are the 4 patches discussed, generated with 
> diff -up, from kernel.org 4.4.77, so with correct context for 4.4. Just 
> in case they are useful for anyone.

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

Note, I can't do anything with patches until they are in Linus's tree.
Please work to get them accepted there, then we can worry about
backporting them to older kernels if they are needed.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PULL] bcache updates based on git.kernel.dk/linux-block:for-next
  2017-07-18 20:06             ` Greg KH
@ 2017-07-18 20:36               ` Eddie Chapman
  0 siblings, 0 replies; 41+ messages in thread
From: Eddie Chapman @ 2017-07-18 20:36 UTC (permalink / raw)
  To: Greg KH; +Cc: tang.junhui, Eric Wheeler, Coly Li, linux-bcache, stable

Hi Greg,

On 18/07/17 21:06, Greg KH wrote:
> On Tue, Jul 18, 2017 at 07:31:35PM +0100, Eddie Chapman wrote:
>> As per previous mail, below are the 4 patches discussed, generated with 
>> diff -up, from kernel.org 4.4.77, so with correct context for 4.4. Just 
>> in case they are useful for anyone.
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
>     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
> 
> </formletter>

I know, I certainly did not want to submit these patches for inclusion in the stable kernel yet! As I said, I only sent them to Tang and others in case they were useful. I'm just trying to help with testing.

> Note, I can't do anything with patches until they are in Linus's tree.
> Please work to get them accepted there, then we can worry about
> backporting them to older kernels if they are needed.

Hold your fire! :-) I know, I said exactly the same myself earlier in this thread, that they must be in Linus' tree first.

Sorry if I gave the wrong idea. Maybe I should have removed stable@ from CC? But I thought that would not be good in case someone found the thread in future and wanted to know.

Eddie

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 17/19] bcache: fix for gc and write-back race
  2017-06-30 20:43       ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
@ 2017-08-03 16:20         ` Coly Li
  0 siblings, 0 replies; 41+ messages in thread
From: Coly Li @ 2017-08-03 16:20 UTC (permalink / raw)
  To: bcache, Tang Junhui; +Cc: linux-block, linux-bcache, hch, axboe, stable

On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
> 
> gc and write-back get raced (see the email "bcache get stucked" I sended
> before):
> gc thread						write-back thread
> |							|bch_writeback_thread()
> |bch_gc_thread()					|
> |							|==>read_dirty()
> |==>bch_btree_gc()					|
> |==>btree_root() //get btree root			|
> |			node write locker		|
> |==>bch_btree_gc_root()					|
> |							|==>read_dirty_submit()
> |							|==>write_dirty()
> |							|==>continue_at(cl, write_dirty_finish, system_wq);
> |							|==>write_dirty_finish()//excute in system_wq
> |							|==>bch_btree_insert()
> |							|==>bch_btree_map_leaf_nodes()
> |							|==>__bch_btree_map_nodes()
> |							|==>btree_root //try to get btree root node read lock
> |							|-----stuck here
> |==>bch_btree_set_root()				|
> |==>bch_journal_meta()					|
> |==>bch_journal()					|
> |==>journal_try_write()					|
> |==>journal_write_unlocked() //journal_full(&c->journal) condition satisfied
> |==>continue_at(cl, journal_write, system_wq); //try to excute journal_write in system_wq
> |					//but work queue is excuting write_dirty_finish()
> |==>closure_sync(); //wait journal_write execute over and wake up gc,
> |			--stuck here
> |==>release root node write locker
> 
> This patch alloc a separate work-queue for write-back thread to avoid such
> race.
> 
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org

Add a per-cached device work queue is a good idea, it's OK to me.

Acked-by: Coly Li <colyli@suse.de>

Thansk.

Coly

> ---
>  drivers/md/bcache/bcache.h    | 1 +
>  drivers/md/bcache/super.c     | 2 ++
>  drivers/md/bcache/writeback.c | 8 ++++++--
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 44123e4..deb0a6c 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -333,6 +333,7 @@ struct cached_dev {
>  	/* Limit number of writeback bios in flight */
>  	struct semaphore	in_flight;
>  	struct task_struct	*writeback_thread;
> +	struct workqueue_struct	*writeback_write_wq;
>  
>  	struct keybuf		writeback_keys;
>  
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index e06641e..24cb9b7 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1063,6 +1063,8 @@ static void cached_dev_free(struct closure *cl)
>  	cancel_delayed_work_sync(&dc->writeback_rate_update);
>  	if (!IS_ERR_OR_NULL(dc->writeback_thread))
>  		kthread_stop(dc->writeback_thread);
> +	if (dc->writeback_write_wq)
> +		destroy_workqueue(dc->writeback_write_wq);
>  
>  	mutex_lock(&bch_register_lock);
>  
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4104eaa..4bc5daa 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -189,7 +189,7 @@ static void write_dirty(struct closure *cl)
>  
>  	closure_bio_submit(&io->bio, cl);
>  
> -	continue_at(cl, write_dirty_finish, system_wq);
> +	continue_at(cl, write_dirty_finish, io->dc->writeback_write_wq);
>  }
>  
>  static void read_dirty_endio(struct bio *bio)
> @@ -209,7 +209,7 @@ static void read_dirty_submit(struct closure *cl)
>  
>  	closure_bio_submit(&io->bio, cl);
>  
> -	continue_at(cl, write_dirty, system_wq);
> +	continue_at(cl, write_dirty, io->dc->writeback_write_wq);
>  }
>  
>  static void read_dirty(struct cached_dev *dc)
> @@ -527,6 +527,10 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
>  
>  int bch_cached_dev_writeback_start(struct cached_dev *dc)
>  {
> +	dc->writeback_write_wq = alloc_workqueue("bcache_writeback_wq", WQ_MEM_RECLAIM, 0);
> +	if (!dc->writeback_write_wq)
> +		return -ENOMEM;
> +
>  	dc->writeback_thread = kthread_create(bch_writeback_thread, dc,
>  					      "bcache_writeback");
>  	if (IS_ERR(dc->writeback_thread))
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate
  2017-07-13  4:15               ` Coly Li
@ 2017-10-27 19:12                 ` Eric Wheeler
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Wheeler @ 2017-10-27 19:12 UTC (permalink / raw)
  To: Coly Li; +Cc: Michael Lyle, tang.junhui, linux-bcache, linux-block, stable

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5660 bytes --]

On Thu, 13 Jul 2017, Coly Li wrote:

> On 2017/7/13 下午12:12, Eric Wheeler wrote:
> > On Tue, 11 Jul 2017, tang.junhui@zte.com.cn wrote:
> > 
> >>> Based on the above implementation, non-dirty space from flash only
> >>> bcache device will mislead writeback rate calculation too. So I suggest
> >>> to subtract all buckets size from all flash only bcache devices. Then it
> >>> might be something like,
> >>
> >> what is non-dirty space from flash only bcache device?
> >> Where is non-dirty space from flash only bcache device?
> > 
> > Hi Tang, Coly:
> >    
> > Waas there more discussion on this thread, or is the patch good to go?  
> > 
> > Please send your ack if you're happy with it so I can queue it up.
> 
> I discussed with Tang offline, this patch is correct. But the patch
> commit log should be improved. Now I help to work on it, should be done
> quite soon.

Has an updated commit log been made?  I've not seen this in the commit 
stream yet.

--
Eric Wheeler



> 
> Coly
> 
> >>
> >>
> >> 发件人:         Coly Li <i@coly.li>
> >> 收件人:         Tang Junhui <tang.junhui@zte.com.cn>,
> >> 抄送:        bcache@lists.ewheeler.net, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org,
> >> hch@infradead.org, axboe@kernel.dk, stable@vger.kernel.org
> >> 日期:         2017/07/11 02:11
> >> 主题:        Re: [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating
> >> writeback rate
> >> 发件人:        linux-bcache-owner@vger.kernel.org
> >>
> >> _________________________________________________________________________________________________________________
> >>
> >>
> >>
> >> On 2017/7/1 上午4:43, bcache@lists.ewheeler.net wrote:
> >>> From: Tang Junhui <tang.junhui@zte.com.cn>
> >>>
> >>> Since dirty sectors of thin flash cannot be used to cache data for backend
> >>> device, so we should subtract it in calculating writeback rate.
> >>>
> >>
> >> I see you want to get ride of the noise of flash only cache device for
> >> writeback rate calculation. It makes sense, because flash only cache
> >> device won't have write back happen at all.
> >>
> >>
> >>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> >>> Cc: stable@vger.kernel.org
> >>> ---
> >>>  drivers/md/bcache/writeback.c |  2 +-
> >>>  drivers/md/bcache/writeback.h | 19 +++++++++++++++++++
> >>>  2 files changed, 20 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> >>> index 4ac8b13..25289e4 100644
> >>> --- a/drivers/md/bcache/writeback.c
> >>> +++ b/drivers/md/bcache/writeback.c
> >>> @@ -21,7 +21,7 @@
> >>>  static void __update_writeback_rate(struct cached_dev *dc)
> >>>  {
> >>>                   struct cache_set *c = dc->disk.c;
> >>> -                 uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size;
> >>> +                 uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
> >> bcache_flash_devs_sectors_dirty(c);
> >>
> >> See flash_dev_run(), the flash volume is created per struct
> >> bcache_device of a cache set. That means, all data allocated for the
> >> flash volume will be from a flash only bcache device. Regular dirty data
> >> won't mixed allocating with flash volume dirty data on identical struct
> >> bcache device.
> >>
> >> Based on the above implementation, non-dirty space from flash only
> >> bcache device will mislead writeback rate calculation too. So I suggest
> >> to subtract all buckets size from all flash only bcache devices. Then it
> >> might be something like,
> >>
> >> uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
> >>                                                   bcache_flash_devs_nbuckets(c);
> >>
> >>
> >>
> >> Just FYI. Thanks.
> >>
> >> Coly
> >>
> >>>                   uint64_t cache_dirty_target =
> >>>                                    div_u64(cache_sectors * dc->writeback_percent, 100);
> >>>  
> >>> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
> >>> index c2ab4b4..24ff589 100644
> >>> --- a/drivers/md/bcache/writeback.h
> >>> +++ b/drivers/md/bcache/writeback.h
> >>> @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
> >>>                   return ret;
> >>>  }
> >>>  
> >>> +static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)
> >>> +{
> >>> +                 uint64_t i, ret = 0;
> >>> +
> >>> +                 mutex_lock(&bch_register_lock);
> >>> +
> >>> +                 for (i = 0; i < c->nr_uuids; i++) {
> >>> +                                  struct bcache_device *d = c->devices[i];
> >>> +
> >>> +                                  if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))
> >>> +                                                   continue;
> >>> +                    ret += bcache_dev_sectors_dirty(d);
> >>> +                 }
> >>> +
> >>> +                 mutex_unlock(&bch_register_lock);
> >>> +
> >>> +                 return ret;
> >>> +}
> >>> +
> >>>  static inline unsigned offset_to_stripe(struct bcache_device *d,
> >>>                                                                                       uint64_t offset)
> >>>  {
> >>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> >>
> 
> 
> -- 
> Coly Li
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2017-10-27 19:17 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-25 19:10 [PULL] bcache updates based on git.kernel.dk/linux-block:for-next Eric Wheeler
2017-06-28 23:06 ` [PULL] bcache fixes and updates for-4.13 Eric Wheeler
2017-06-29 13:45   ` Christoph Hellwig
2017-06-29 16:19     ` Coly Li
2017-06-29 22:12     ` Eric Wheeler
2017-06-29 22:25       ` Eric Wheeler
2017-06-29 23:28         ` Nick Alcock
2017-06-30 20:42     ` [PATCH 01/19] bcache: Fix leak of bdev reference bcache
2017-06-30 20:42       ` [PATCH 02/19] bcache: fix sequential large write IO bypass bcache
2017-07-05 18:25         ` Christoph Hellwig
2017-06-30 20:42       ` [PATCH 03/19] bcache: do not subtract sectors_to_gc for bypassed IO bcache
2017-07-01 17:26         ` Coly Li
2017-07-05 18:25         ` Christoph Hellwig
2017-06-30 20:42       ` [PATCH 04/19] bcache: fix wrong cache_misses statistics bcache
2017-07-01 17:58         ` Coly Li
2017-06-30 20:42       ` [PATCH 06/19] bcache: explicitly destory mutex while exiting bcache
2017-07-01 18:43         ` Coly Li
2017-07-05 11:58           ` Liang Chen
2017-07-11  7:22             ` Coly Li
2017-07-05 18:27         ` Christoph Hellwig
2017-07-06  1:56           ` Liang Chen
2017-06-30 20:42       ` [PATCH 10/19] bcache: initialize stripe_sectors_dirty correctly for thin flash device bcache
2017-07-01 18:52         ` Coly Li
2017-06-30 20:43       ` [PATCH 11/19] bcache: Subtract dirty sectors of thin flash from cache_sectors in calculating writeback rate bcache
2017-07-10 18:11         ` Coly Li
     [not found]           ` <OF92BDA950.86AA00FA-ON4825815A.001F33D9-4825815A.001F5C89@zte.com.cn>
2017-07-13  4:12             ` Eric Wheeler
2017-07-13  4:15               ` Coly Li
2017-10-27 19:12                 ` Eric Wheeler
2017-06-30 20:43       ` [PATCH 14/19] bcache: Correct return value for sysfs attach errors bcache
2017-06-30 20:43       ` [PATCH 17/19] bcache: fix for gc and write-back race bcache
2017-08-03 16:20         ` Coly Li
2017-07-01 16:55       ` [PATCH 01/19] bcache: Fix leak of bdev reference Coly Li
2017-07-05 18:24       ` Christoph Hellwig
     [not found]       ` <1498855388-16990-5-git-send-email-bcache@lists.ewheeler.net>
2017-07-05 18:26         ` [PATCH 05/19] bcache: fix calling ida_simple_remove() with incorrect minor Christoph Hellwig
2017-07-14 11:40 ` [PULL] bcache updates based on git.kernel.dk/linux-block:for-next Eddie Chapman
     [not found]   ` <7040453d-2272-d376-710b-6ed07527a98e@coly.li>
2017-07-14 17:33     ` Eddie Chapman
     [not found]       ` <OF92BA0158.87BDF9E3-ON4825815E.000736BF-4825815E.000833F7@zte.com.cn>
2017-07-18 18:24         ` Eddie Chapman
2017-07-18 18:31           ` Eddie Chapman
2017-07-18 20:06             ` Greg KH
2017-07-18 20:36               ` Eddie Chapman
  -- strict thread matches above, loose matches on Subject: below --
2017-07-01 19:25 [PATCH 05/19] bcache: fix calling ida_simple_remove() with incorrect minor Eric Wheeler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox