[PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard
@ 2016-05-05 15:54 Mike Snitzer
  2016-05-05 15:54 ` [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard Mike Snitzer
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-05-05 15:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, dm-devel, Christoph Hellwig, Mike Snitzer

I'd like to see these changes included in 4.7.

Jens, if you (and hch and others) are OK with the first 2 block
patches can you please add them to your 'for-4.7/core'?

Joe Thornber (1):
  dm thin: unroll issue_discard() to create longer discard bio chains

Mike Snitzer (4):
  block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
  block: make bio_inc_remaining() interface accessible again
  dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
  dm thin: use __blkdev_issue_discard for async discard support

 block/bio.c          |  11 ----
 block/blk-lib.c      |   7 ++-
 drivers/md/dm-thin.c | 159 +++++++++++++++++++++++----------------------------
 include/linux/bio.h  |  11 ++++
 4 files changed, 88 insertions(+), 100 deletions(-)

-- 
2.6.4 (Apple Git-63)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
  2016-05-05 15:54 [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard Mike Snitzer
@ 2016-05-05 15:54 ` Mike Snitzer
  2016-05-06 16:04   ` Christoph Hellwig
  2016-05-05 15:54 ` [PATCH 2/5] block: make bio_inc_remaining() interface accessible again Mike Snitzer
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-05-05 15:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, dm-devel, Christoph Hellwig, Mike Snitzer

Commit 38f25255330 ("block: add __blkdev_issue_discard") incorrectly
disallowed the early return of -EOPNOTSUPP if the device doesn't support
discard (or secure discard).  This early return of -EOPNOTSUPP has
always been part of blkdev_issue_discard() interface so there isn't a
good reason to break that behaviour -- especially when it can be easily
reinstated.

The nuance of allowing early return of -EOPNOTSUPP vs disallowing late
return of -EOPNOTSUPP is: if the overall device never advertised support
for discards and one is issued to the device it is beneficial to inform
the caller that discards are not supported via -EOPNOTSUPP.  But if a
device advertises discard support it means that at least a subset of the
device does have discard support -- but it could be that discards issued
to some regions of a stacked device will not be supported.  In that case
the late return of -EOPNOTSUPP must be disallowed.

Fixes: 38f25255330 ("block: add __blkdev_issue_discard")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-lib.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index ccbce2b..23d7f30 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -109,11 +109,14 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
 			&bio);
-	if (!ret && bio)
+	if (!ret && bio) {
 		ret = submit_bio_wait(type, bio);
+		if (ret == -EOPNOTSUPP)
+			ret = 0;
+	}
 	blk_finish_plug(&plug);

-	return ret != -EOPNOTSUPP ? ret : 0;
+	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_discard);

-- 
2.6.4 (Apple Git-63)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
  2016-05-05 15:54 ` [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard Mike Snitzer
@ 2016-05-06 16:04   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 16:04 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

I don't really agree with the rationale, but I'm fine with reverting
to it this late in the cycle, so:

Acked-by: Christoph Hellwig <hch@lst.de>

But looking at your slightly convoluted handling of the errors in
dm-think, what do you think about splitting out these early checks
into a separate helper, at which point __blkdev_issue_discard
doesn't need a return value any more?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-05 15:54 [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard Mike Snitzer
  2016-05-05 15:54 ` [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard Mike Snitzer
@ 2016-05-05 15:54 ` Mike Snitzer
  2016-05-06 15:25   ` Christoph Hellwig
  2016-05-05 15:54 ` [PATCH 3/5] dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining() Mike Snitzer
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-05-05 15:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, dm-devel, Christoph Hellwig, Mike Snitzer

Commit 326e1dbb57 ("block: remove management of bi_remaining when
restoring original bi_end_io") made bio_inc_remaining() private to bio.c
because the only use-case that made sense was confined to the
bio_chain() interface.

Since that time DM thinp went on to use bio_chain() in its relatively
complex implementation of async discard support.  That implementation,
even when converted over to use the new async __blkdev_issue_discard()
interface, depends on deferred completion of the original discard bio --
which is most appropriately implemented using bio_inc_remaining().

DM thinp foolishly duplicated bio_inc_remaining(), local to dm-thin.c as
__bio_inc_remaining(), so re-exporting bio_inc_remaining() allows us to
put an end to that foolishness.

All said, bio_inc_remaining() should really only be used in conjunction
with bio_chain().  It isn't intended for generic bio reference counting.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 block/bio.c         | 11 -----------
 include/linux/bio.h | 11 +++++++++++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 807d25e..0e4aa42 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -311,17 +311,6 @@ static void bio_chain_endio(struct bio *bio)
 	bio_endio(__bio_chain_endio(bio));
 }
 
-/*
- * Increment chain count for the bio. Make sure the CHAIN flag update
- * is visible before the raised count.
- */
-static inline void bio_inc_remaining(struct bio *bio)
-{
-	bio_set_flag(bio, BIO_CHAIN);
-	smp_mb__before_atomic();
-	atomic_inc(&bio->__bi_remaining);
-}
-
 /**
  * bio_chain - chain bio completions
  * @bio: the target bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6b7481f..9faebf7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -703,6 +703,17 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
 }
 
 /*
+ * Increment chain count for the bio. Make sure the CHAIN flag update
+ * is visible before the raised count.
+ */
+static inline void bio_inc_remaining(struct bio *bio)
+{
+	bio_set_flag(bio, BIO_CHAIN);
+	smp_mb__before_atomic();
+	atomic_inc(&bio->__bi_remaining);
+}
+
+/*
  * bio_set is used to allow other portions of the IO system to
  * allocate their own private memory pools for bio and iovec structures.
  * These memory pools in turn all allocate from the bio_slab
-- 
2.6.4 (Apple Git-63)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-05 15:54 ` [PATCH 2/5] block: make bio_inc_remaining() interface accessible again Mike Snitzer
@ 2016-05-06 15:25   ` Christoph Hellwig
  2016-05-06 15:56     ` Christoph Hellwig
  2016-05-06 15:57     ` Mike Snitzer
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 15:25 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

On Thu, May 05, 2016 at 11:54:22AM -0400, Mike Snitzer wrote:
> Commit 326e1dbb57 ("block: remove management of bi_remaining when
> restoring original bi_end_io") made bio_inc_remaining() private to bio.c
> because the only use-case that made sense was confined to the
> bio_chain() interface.
> 
> Since that time DM thinp went on to use bio_chain() in its relatively
> complex implementation of async discard support.  That implementation,
> even when converted over to use the new async __blkdev_issue_discard()
> interface, depends on deferred completion of the original discard bio --
> which is most appropriately implemented using bio_inc_remaining().

Can you explain that code flow to me?  I still fail to why exactly
dm-thinp (and only dm-thinp) needs this.  Maybe start by pointing me
to the additional bio_endio that pairs with the bio_inc_remaining
call.

> All said, bio_inc_remaining() should really only be used in conjunction
> with bio_chain().  It isn't intended for generic bio reference counting.

It's obviously used outside bio_chain in dm-thinp, so this sentence
doesn't make too much sense to me.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 15:25   ` Christoph Hellwig
@ 2016-05-06 15:56     ` Christoph Hellwig
  2016-05-06 16:19       ` Mike Snitzer
  2016-05-06 15:57     ` Mike Snitzer
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 15:56 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

On Fri, May 06, 2016 at 05:25:35PM +0200, Christoph Hellwig wrote:
> Can you explain that code flow to me?  I still fail to why exactly
> dm-thinp (and only dm-thinp) needs this.  Maybe start by pointing me
> to the additional bio_endio that pairs with the bio_inc_remaining
> call.
> 
> > All said, bio_inc_remaining() should really only be used in conjunction
> > with bio_chain().  It isn't intended for generic bio reference counting.
> 
> It's obviously used outside bio_chain in dm-thinp, so this sentence
> doesn't make too much sense to me.

FYI, this untested patch should fix the abuse in dm-think.  Not abusing
bio_inc_remaining actually makes the code a lot simpler and more obvious.
I'm not a 100% sure, but it seems the current pattern can also lead
to a leak of the bi_remaining references when set_pool_mode managed
to set a new process_prepared_discard function pointer after the
references have been grabbed already.

Jens, I noticed you've already applied this patch - I'd really prefer
to see it reverted as using bio_inc_remaining outside bio_chain leads
to this sort of convoluted code.

---
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index e4bb9da..5875749 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -382,7 +382,6 @@ static void end_discard(struct discard_op *op, int r)
 	 */
 	if (r && !op->parent_bio->bi_error)
 		op->parent_bio->bi_error = r;
-	bio_endio(op->parent_bio);
 }
 
 /*----------------------------------------------------------------*/
@@ -1056,11 +1055,9 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
 	r = dm_thin_remove_range(tc->td, m->virt_begin, m->virt_end);
 	if (r) {
 		metadata_operation_failed(pool, "dm_thin_remove_range", r);
-		bio_io_error(m->bio);
-
+		m->bio->bi_error = -EIO;
 	} else if (m->maybe_shared) {
 		passdown_double_checking_shared_status(m);
-
 	} else {
 		struct discard_op op;
 		begin_discard(&op, tc, m->bio);
@@ -1069,6 +1066,7 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
 		end_discard(&op, r);
 	}
 
+	bio_endio(m->bio);
 	cell_defer_no_holder(tc, m->cell);
 	mempool_free(m, pool->mapping_pool);
 }
@@ -1537,15 +1535,6 @@ static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t
 		m->cell = data_cell;
 		m->bio = bio;
 
-		/*
-		 * The parent bio must not complete before sub discard bios are
-		 * chained to it (see end_discard's bio_chain)!
-		 *
-		 * This per-mapping bi_remaining increment is paired with
-		 * the implicit decrement that occurs via bio_endio() in
-		 * end_discard().
-		 */
-		bio_inc_remaining(bio);
 		if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list))
 			pool->process_prepared_discard(m);
 
@@ -1565,13 +1554,6 @@ static void process_discard_cell_passdown(struct thin_c *tc, struct dm_bio_priso
 	 */
 	h->cell = virt_cell;
 	break_up_discard_bio(tc, virt_cell->key.block_begin, virt_cell->key.block_end, bio);
-
-	/*
-	 * We complete the bio now, knowing that the bi_remaining field
-	 * will prevent completion until the sub range discards have
-	 * completed.
-	 */
-	bio_endio(bio);
 }
 
 static void process_discard_bio(struct thin_c *tc, struct bio *bio)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 15:56     ` Christoph Hellwig
@ 2016-05-06 16:19       ` Mike Snitzer
  2016-05-06 16:27         ` Christoph Hellwig
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-05-06 16:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, dm-devel

On Fri, May 06 2016 at 11:56am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Fri, May 06, 2016 at 05:25:35PM +0200, Christoph Hellwig wrote:
> > Can you explain that code flow to me?  I still fail to why exactly
> > dm-thinp (and only dm-thinp) needs this.  Maybe start by pointing me
> > to the additional bio_endio that pairs with the bio_inc_remaining
> > call.
> > 
> > > All said, bio_inc_remaining() should really only be used in conjunction
> > > with bio_chain().  It isn't intended for generic bio reference counting.
> > 
> > It's obviously used outside bio_chain in dm-thinp, so this sentence
> > doesn't make too much sense to me.
> 
> FYI, this untested patch should fix the abuse in dm-think.  Not abusing
> bio_inc_remaining actually makes the code a lot simpler and more obvious.

But sadly I've tried this very same approach and it crashes, as I just
verified again with your patch (nevermind the fugly proprietary fusionio symbols ;):

(this is a direct result of completing the discard bio before all
mappings, and associated sub-discards, have):

73232.788728] BUG: unable to handle kernel paging request at 0000000000619df0
[73232.796519] IP: [<ffffffff810ca68a>] native_queued_spin_lock_slowpath+0x10a/0x1a0
[73232.804876] PGD 2a19e6067 PUD 29d5d0067 PMD 2a1cac067 PTE 0
[73232.811132] Oops: 0002 [#1] SMP
[73232.814750] Modules linked in: dm_thin_pool(O) dm_persistent_data(O) dm_bio_prison(O) dm_bufio(O) ext4 jbd2 mbcache iomemory_vsl(POE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel iTCO_wdt glue_helper iTCO_vendor_support lrw sg i7core_edac gf128mul ablk_helper edac_core ipmi_si pcspkr acpi_power_meter cryptd shpchp ipmi_msghandler lpc_ich i2c_i801 mfd_core acpi_cpufreq ip_tables xfs libcrc32c sr_mod sd_mod cdrom mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_generic ttm pata_acpi ixgbe igb drm mdio ata_piix ptp libata crc32c_intel i2c_algo_bit megaraid_sas pps_core skd i2c_core dca dm_mod [last unloaded: dm_bufio]
[73232.888557] CPU: 1 PID: 29156 Comm: fct0-worker Tainted: P          IOE   4.6.0-rc6.snitm+ #162
[73232.898266] Hardware name: FUJITSU                          PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.10.2619.N1           05/24/2011
[73232.912919] task: ffff880330901900 ti: ffff8802a9b54000 task.ti: ffff8802a9b54000
[73232.921269] RIP: 0010:[<ffffffff810ca68a>]  [<ffffffff810ca68a>] native_queued_spin_lock_slowpath+0x10a/0x1a0
[73232.932347] RSP: 0018:ffff8802a9b57848  EFLAGS: 00010006
[73232.938272] RAX: 0000000000003ffe RBX: 0000000000000086 RCX: 0000000000080000
[73232.946234] RDX: 0000000000619df0 RSI: 00000000ffffc900 RDI: ffff88029fc9a2cc
[73232.954197] RBP: ffff8802a9b57848 R08: ffff880333257900 R09: 0000000000000000
[73232.962161] R10: 000000009fcc5b01 R11: ffff88029fcc5e00 R12: ffff88029fc9a280
[73232.970122] R13: ffff88032f9472a0 R14: ffff88029fc9a2cc R15: 0000000000000000
[73232.978085] FS:  0000000000000000(0000) GS:ffff880333240000(0000) knlGS:0000000000000000
[73232.987115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73232.993524] CR2: 0000000000619df0 CR3: 00000002a1597000 CR4: 00000000000006e0
[73233.001485] Stack:
[73233.003727]  ffff8802a9b57858 ffffffff8117faf1 ffff8802a9b57870 ffffffff81695427
[73233.012017]  0000000000000000 ffff8802a9b578a8 ffffffffa0671788 ffff88029fc9a420
[73233.020307]  0000000000000000 ffff88029fc9a420 ffff88032a6442a0 0000000000000000
[73233.028598] Call Trace:
[73233.031326]  [<ffffffff8117faf1>] queued_spin_lock_slowpath+0xb/0xf
[73233.038321]  [<ffffffff81695427>] _raw_spin_lock_irqsave+0x37/0x40
[73233.045221]  [<ffffffffa0671788>] cell_defer_no_holder+0x28/0x80 [dm_thin_pool]
[73233.053378]  [<ffffffffa0671a4d>] thin_endio+0x14d/0x180 [dm_thin_pool]
[73233.060760]  [<ffffffff811e5e69>] ? kmem_cache_free+0x1c9/0x1e0
[73233.067372]  [<ffffffffa00016aa>] clone_endio+0x3a/0xe0 [dm_mod]
[73233.074076]  [<ffffffff812fb345>] bio_endio+0x55/0x60
[73233.079713]  [<ffffffffa0000d3e>] dec_pending+0x12e/0x280 [dm_mod]
[73233.086611]  [<ffffffffa00016cb>] clone_endio+0x5b/0xe0 [dm_mod]
[73233.093313]  [<ffffffff812fb345>] bio_endio+0x55/0x60
[73233.098951]  [<ffffffffa0000d3e>] dec_pending+0x12e/0x280 [dm_mod]
[73233.105849]  [<ffffffffa00016cb>] clone_endio+0x5b/0xe0 [dm_mod]
[73233.112552]  [<ffffffff812fb345>] bio_endio+0x55/0x60
[73233.118187]  [<ffffffff81303fc7>] blk_update_request+0x87/0x320
[73233.124793]  [<ffffffff8130427c>] blk_update_bidi_request+0x1c/0x70
[73233.131786]  [<ffffffff81304da7>] __blk_end_bidi_request+0x17/0x40
[73233.138684]  [<ffffffff81304de0>] __blk_end_request+0x10/0x20
[73233.145128]  [<ffffffffa0530a68>] complete_list_entries.isra.9+0x38/0x90 [iomemory_vsl]
[73233.154074]  [<ffffffffa0530b66>] kfio_blk_complete_request+0xa6/0x140 [iomemory_vsl]
[73233.162828]  [<ffffffffa0530c2c>] kfio_req_completor+0x2c/0x50 [iomemory_vsl]
[73233.170810]  [<ffffffffa054e282>] ifio_f9142.4bb664afe4728d2c3817c99088f2b5dd004.3.2.9.1461+0x12/0x50 [iomemory_vsl]
[73233.182574]  [<ffffffffa054b62c>] ? fio_device_ptrim_available+0x9c/0x2c0 [iomemory_vsl]
[73233.191606]  [<ffffffff810b57c8>] ? dequeue_entity+0x468/0x900
[73233.198136]  [<ffffffffa05b84ad>] ? ifio_1e5a3.5e62afc90b8d1b25ae145e583e480994677.3.2.9.1461+0xb0d/0x17c0 [iomemory_vsl]
[73233.210390]  [<ffffffffa05b913f>] ? ifio_1e5a3.5e62afc90b8d1b25ae145e583e480994677.3.2.9.1461+0x179f/0x17c0 [iomemory_vsl]
[73233.222740]  [<ffffffffa05bcbd0>] ? ifio_fbeca.b91e0233dee7fc9112bb37f58c4e526bcd8.3.2.9.1461+0x840/0xf20 [iomemory_vsl]
[73233.234889]  [<ffffffffa0532343>] ? __fusion_condvar_timedwait+0xb3/0x120 [iomemory_vsl]
[73233.243941]  [<ffffffffa05bdd78>] ? ifio_5fb65.a9c484151da25a9eb60ef9a6e7309d1a95f.3.2.9.1461+0xf8/0x2a0 [iomemory_vsl]
[73233.255998]  [<ffffffffa05bdc80>] ? ifio_f0cee.039ff5a1c1f314a171e2a8a425361a5875d.3.2.9.1461+0x50/0x50 [iomemory_vsl]
[73233.267960]  [<ffffffffa05bdc80>] ? ifio_f0cee.039ff5a1c1f314a171e2a8a425361a5875d.3.2.9.1461+0x50/0x50 [iomemory_vsl]
[73233.279899]  [<ffffffff8109ffe8>] ? kthread+0xd8/0xf0
[73233.285535]  [<ffffffff81695702>] ? ret_from_fork+0x22/0x40
[73233.291754]  [<ffffffff8109ff10>] ? kthread_park+0x60/0x60
[73233.297873] Code: c1 e0 10 45 31 c9 85 c0 74 46 48 89 c2 c1 e8 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 00 79 01 00 48 03 14 c5 c0 87 d1 81 <4c> 89 02 41 8b 40 08 85 c0 75 0a f3 90 41 8b 40 08 85 c0 74 f6
[73233.319551] RIP  [<ffffffff810ca68a>] native_queued_spin_lock_slowpath+0x10a/0x1a0
[73233.328010]  RSP <ffff8802a9b57848>
[73233.331899] CR2: 0000000000619df0
[73233.341412] ---[ end trace c81cf06a98ad2d88 ]---

> I'm not a 100% sure, but it seems the current pattern can also lead
> to a leak of the bi_remaining references when set_pool_mode managed
> to set a new process_prepared_discard function pointer after the
> references have been grabbed already.

We have quite a few tests in the device-mapper-test-suite that hammer
discards (in conjunction with the pool running out of space or if the
pool fails).  The replacement process_prepared_discard functions will
issue a bio_endio() per mapping anyway.. so there should be no risk of a
bi_remaining reference leaking like you thought.
 
> Jens, I noticed you've already applied this patch - I'd really prefer
> to see it reverted as using bio_inc_remaining outside bio_chain leads
> to this sort of convoluted code.

As you know not all code is simple.  I've looked at this for quite a bit
this week and unfortunately I don't see a way forward (yet) that doesn't
require the use of bio_inc_remaining() to take extra bi_remaining
references.

> ---
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index e4bb9da..5875749 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -382,7 +382,6 @@ static void end_discard(struct discard_op *op, int r)
>  	 */
>  	if (r && !op->parent_bio->bi_error)
>  		op->parent_bio->bi_error = r;
> -	bio_endio(op->parent_bio);
>  }
>  
>  /*----------------------------------------------------------------*/
> @@ -1056,11 +1055,9 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
>  	r = dm_thin_remove_range(tc->td, m->virt_begin, m->virt_end);
>  	if (r) {
>  		metadata_operation_failed(pool, "dm_thin_remove_range", r);
> -		bio_io_error(m->bio);
> -
> +		m->bio->bi_error = -EIO;
>  	} else if (m->maybe_shared) {
>  		passdown_double_checking_shared_status(m);
> -
>  	} else {
>  		struct discard_op op;
>  		begin_discard(&op, tc, m->bio);
> @@ -1069,6 +1066,7 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
>  		end_discard(&op, r);
>  	}
>  
> +	bio_endio(m->bio);
>  	cell_defer_no_holder(tc, m->cell);
>  	mempool_free(m, pool->mapping_pool);
>  }
> @@ -1537,15 +1535,6 @@ static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t
>  		m->cell = data_cell;
>  		m->bio = bio;
>  
> -		/*
> -		 * The parent bio must not complete before sub discard bios are
> -		 * chained to it (see end_discard's bio_chain)!
> -		 *
> -		 * This per-mapping bi_remaining increment is paired with
> -		 * the implicit decrement that occurs via bio_endio() in
> -		 * end_discard().
> -		 */
> -		bio_inc_remaining(bio);
>  		if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list))
>  			pool->process_prepared_discard(m);
>  
> @@ -1565,13 +1554,6 @@ static void process_discard_cell_passdown(struct thin_c *tc, struct dm_bio_priso
>  	 */
>  	h->cell = virt_cell;
>  	break_up_discard_bio(tc, virt_cell->key.block_begin, virt_cell->key.block_end, bio);
> -
> -	/*
> -	 * We complete the bio now, knowing that the bi_remaining field
> -	 * will prevent completion until the sub range discards have
> -	 * completed.
> -	 */
> -	bio_endio(bio);
>  }
>  
>  static void process_discard_bio(struct thin_c *tc, struct bio *bio)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 16:19       ` Mike Snitzer
@ 2016-05-06 16:27         ` Christoph Hellwig
  2016-05-06 16:46           ` Joe Thornber
  2016-05-06 16:30         ` Joe Thornber
  2016-05-06 16:43         ` Christoph Hellwig
  2 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 16:27 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

On Fri, May 06, 2016 at 12:19:16PM -0400, Mike Snitzer wrote:
> (this is a direct result of completing the discard bio before all
> mappings, and associated sub-discards, have):

The completion order really should not matter, we rely on it not
mattering for almost every usage of the bio_chain API.  So there
is something else fishy going on here.  What's your test case?
I'd like to dig a bit deeper into this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 16:27         ` Christoph Hellwig
@ 2016-05-06 16:46           ` Joe Thornber
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Thornber @ 2016-05-06 16:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, dm-devel, Mike Snitzer

On Fri, May 06, 2016 at 06:27:05PM +0200, Christoph Hellwig wrote:
> On Fri, May 06, 2016 at 12:19:16PM -0400, Mike Snitzer wrote:
> > (this is a direct result of completing the discard bio before all
> > mappings, and associated sub-discards, have):
> 
> The completion order really should not matter, we rely on it not
> mattering for almost every usage of the bio_chain API.  So there
> is something else fishy going on here.  What's your test case?
> I'd like to dig a bit deeper into this.

Completion order isn't important.  The issue we have is there are two
levels of fragmentation that we see with thinp.

First a large discard (eg, mkfs) comes in.  We then break this up into
mapped regions.  Then each region is quiesced individually.  Once
quiesced we then can discard the region, with the extra twist that we
can't discard any areas that are shared between snapshots.  The
patches Mike posted switch to use your new bio_chain/blk_issue_thing
(which I like btw) on the bottom level (ie, each region).  So we need
some form of ref counting to work out when all the mid level regions
are complete and hence we can complete the original bio.

It's not rocket science, and is generating far more discussion than is
neccessary.  If I can't piggy back on the bio's ref count, I'll create
my own.

- Joe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 16:19       ` Mike Snitzer
  2016-05-06 16:27         ` Christoph Hellwig
@ 2016-05-06 16:30         ` Joe Thornber
  2016-05-06 16:43         ` Christoph Hellwig
  2 siblings, 0 replies; 20+ messages in thread
From: Joe Thornber @ 2016-05-06 16:30 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

On Fri, May 06, 2016 at 12:19:16PM -0400, Mike Snitzer wrote:
> As you know not all code is simple.  I've looked at this for quite a bit
> this week and unfortunately I don't see a way forward (yet) that doesn't
> require the use of bio_inc_remaining() to take extra bi_remaining
> references.

Well there is a way around it;  we just maintain our own reference
counter and use that to determine when to complete the original bio.

It just seems a shame when there's a ref count in the bio already.
But if Jens and Christophe are adamant that bio_chain is going to be
the one and only way to access that ref count then I'll allocate
another object to hold the ref count.

- Joe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 16:19       ` Mike Snitzer
  2016-05-06 16:27         ` Christoph Hellwig
  2016-05-06 16:30         ` Joe Thornber
@ 2016-05-06 16:43         ` Christoph Hellwig
  2 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 16:43 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

Ok, I think I understand the problem now - the problem is when
break_up_discard_bio does not actually call pool->process_prepared
_discard directly, but instead defers it to a worker thread.  At that
point we might have already completed all the chained bios and
the original one by the time we run another instance of
process_discard_cell_passdown.

So I guess we'll have to live with this for now.  I really don't
like it very much, and the comments could use a massive improvement,
but instead of duplicating the code let's just export the helper:

Acked-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] block: make bio_inc_remaining() interface accessible again
  2016-05-06 15:25   ` Christoph Hellwig
  2016-05-06 15:56     ` Christoph Hellwig
@ 2016-05-06 15:57     ` Mike Snitzer
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-05-06 15:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, dm-devel

On Fri, May 06 2016 at 11:25am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, May 05, 2016 at 11:54:22AM -0400, Mike Snitzer wrote:
> > Commit 326e1dbb57 ("block: remove management of bi_remaining when
> > restoring original bi_end_io") made bio_inc_remaining() private to bio.c
> > because the only use-case that made sense was confined to the
> > bio_chain() interface.
> > 
> > Since that time DM thinp went on to use bio_chain() in its relatively
> > complex implementation of async discard support.  That implementation,
> > even when converted over to use the new async __blkdev_issue_discard()
> > interface, depends on deferred completion of the original discard bio --
> > which is most appropriately implemented using bio_inc_remaining().
> 
> Can you explain that code flow to me?  I still fail to why exactly
> dm-thinp (and only dm-thinp) needs this.  Maybe start by pointing me
> to the additional bio_endio that pairs with the bio_inc_remaining
> call.

If you look at the latest code here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7

dm-thin.c:break_up_discard_bio()'s bio_inc_remaining() is paired with
dm-thin.c:end_discard()'s bio_endio()

But the flow is:

-> process_discard_cell_passdown
   -> break_up_discard_bio (per-mapping reference taken on original discard bio)
   -> bio_endio(original discard bio) -- cleanest place to complete
      it.. but doing so here requires extra references on that discard
      bio because we must wait for the N mappings and associated
      sub-discard bios to be chained, issued, and completed

... wait for associated sub-discard mappings to quiesce and become "prepared"...

-> process_prepared_discard_passdown (bio-chains aren't constructed until here)
   -> passdown_double_checking_shared_status (iterates over each mappings' subset of the original discard)
      -> __blkdev_issue_discard
      -> bio_endio (per-mapping reference, taken in break_up_discard_bio, is dropped)

> > All said, bio_inc_remaining() should really only be used in conjunction
> > with bio_chain().  It isn't intended for generic bio reference counting.
> 
> It's obviously used outside bio_chain in dm-thinp, so this sentence
> doesn't make too much sense to me.

It is used in conjunction with supporting thinp's use of bio_chain(),
via __blkdev_issue_discard, but yeah I can see why you think they aren't
related.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/5] dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
  2016-05-05 15:54 [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard Mike Snitzer
  2016-05-05 15:54 ` [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard Mike Snitzer
  2016-05-05 15:54 ` [PATCH 2/5] block: make bio_inc_remaining() interface accessible again Mike Snitzer
@ 2016-05-05 15:54 ` Mike Snitzer
  2016-05-05 15:54 ` [PATCH 4/5] dm thin: use __blkdev_issue_discard for async discard support Mike Snitzer
  2016-05-05 15:54 ` [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains Mike Snitzer
  4 siblings, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-05-05 15:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, dm-devel, Christoph Hellwig, Mike Snitzer

DM thinp's use of bio_inc_remaining() is critical to ensure the original
parent discard bio isn't completed before sub-discards have.  DM thinp
needs this due to the extra quiescing that occurs, via multiple DM thinp
mappings, while processing large discards.  As such DM thinp must build
the async discard bio chain after some delay -- so bio_inc_remaining()
is used to enable DM thinp to take a reference on the original parent
discard bio for each mapping.  This allows the immediate use of
bio_endio() on that discard bio; but with the understanding that the
actual completion won't occur until each of the sub-discards'
per-mapping references are dropped.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-thin.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 04e7f3b..da42c49 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1494,17 +1494,6 @@ static void process_discard_cell_no_passdown(struct thin_c *tc,
 		pool->process_prepared_discard(m);
 }
 
-/*
- * __bio_inc_remaining() is used to defer parent bios's end_io until
- * we _know_ all chained sub range discard bios have completed.
- */
-static inline void __bio_inc_remaining(struct bio *bio)
-{
-	bio->bi_flags |= (1 << BIO_CHAIN);
-	smp_mb__before_atomic();
-	atomic_inc(&bio->__bi_remaining);
-}
-
 static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t end,
 				 struct bio *bio)
 {
@@ -1560,7 +1549,7 @@ static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t
 		 * the implicit decrement that occurs via bio_endio() in
 		 * process_prepared_discard_{passdown,no_passdown}.
 		 */
-		__bio_inc_remaining(bio);
+		bio_inc_remaining(bio);
 		if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list))
 			pool->process_prepared_discard(m);
 
-- 
2.6.4 (Apple Git-63)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] dm thin: use __blkdev_issue_discard for async discard support
  2016-05-05 15:54 [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard Mike Snitzer
                   ` (2 preceding siblings ...)
  2016-05-05 15:54 ` [PATCH 3/5] dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining() Mike Snitzer
@ 2016-05-05 15:54 ` Mike Snitzer
  2016-05-06 16:05   ` Christoph Hellwig
  2016-05-05 15:54 ` [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains Mike Snitzer
  4 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-05-05 15:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, dm-devel, Christoph Hellwig, Mike Snitzer

With commit 38f25255330 ("block: add __blkdev_issue_discard") DM thinp
no longer needs to carry its own async discard method.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-thin.c | 70 ++++++++++++----------------------------------------
 1 file changed, 16 insertions(+), 54 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index da42c49..598a78b 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -322,56 +322,6 @@ struct thin_c {
 
 /*----------------------------------------------------------------*/
 
-/**
- * __blkdev_issue_discard_async - queue a discard with async completion
- * @bdev:	blockdev to issue discard for
- * @sector:	start sector
- * @nr_sects:	number of sectors to discard
- * @gfp_mask:	memory allocation flags (for bio_alloc)
- * @flags:	BLKDEV_IFL_* flags to control behaviour
- * @parent_bio: parent discard bio that all sub discards get chained to
- *
- * Description:
- *    Asynchronously issue a discard request for the sectors in question.
- */
-static int __blkdev_issue_discard_async(struct block_device *bdev, sector_t sector,
-					sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
-					struct bio *parent_bio)
-{
-	struct request_queue *q = bdev_get_queue(bdev);
-	int type = REQ_WRITE | REQ_DISCARD;
-	struct bio *bio;
-
-	if (!q || !nr_sects)
-		return -ENXIO;
-
-	if (!blk_queue_discard(q))
-		return -EOPNOTSUPP;
-
-	if (flags & BLKDEV_DISCARD_SECURE) {
-		if (!blk_queue_secdiscard(q))
-			return -EOPNOTSUPP;
-		type |= REQ_SECURE;
-	}
-
-	/*
-	 * Required bio_put occurs in bio_endio thanks to bio_chain below
-	 */
-	bio = bio_alloc(gfp_mask, 1);
-	if (!bio)
-		return -ENOMEM;
-
-	bio_chain(bio, parent_bio);
-
-	bio->bi_iter.bi_sector = sector;
-	bio->bi_bdev = bdev;
-	bio->bi_iter.bi_size = nr_sects << 9;
-
-	submit_bio(type, bio);
-
-	return 0;
-}
-
 static bool block_size_is_power_of_two(struct pool *pool)
 {
 	return pool->sectors_per_block_shift >= 0;
@@ -387,11 +337,23 @@ static sector_t block_to_sectors(struct pool *pool, dm_block_t b)
 static int issue_discard(struct thin_c *tc, dm_block_t data_b, dm_block_t data_e,
 			 struct bio *parent_bio)
 {
+	int type = REQ_WRITE | REQ_DISCARD;
 	sector_t s = block_to_sectors(tc->pool, data_b);
 	sector_t len = block_to_sectors(tc->pool, data_e - data_b);
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret;
+
+	blk_start_plug(&plug);
+	ret = __blkdev_issue_discard(tc->pool_dev->bdev, s, len,
+				     GFP_NOWAIT, type, &bio);
+	if (!ret && bio) {
+		bio_chain(bio, parent_bio);
+		submit_bio(type, bio);
+	}
+	blk_finish_plug(&plug);
 
-	return __blkdev_issue_discard_async(tc->pool_dev->bdev, s, len,
-					    GFP_NOWAIT, 0, parent_bio);
+	return ret;
 }
 
 /*----------------------------------------------------------------*/
@@ -1543,11 +1505,11 @@ static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t
 
 		/*
 		 * The parent bio must not complete before sub discard bios are
-		 * chained to it (see __blkdev_issue_discard_async's bio_chain)!
+		 * chained to it (see issue_discard's bio_chain)!
 		 *
 		 * This per-mapping bi_remaining increment is paired with
 		 * the implicit decrement that occurs via bio_endio() in
-		 * process_prepared_discard_{passdown,no_passdown}.
+		 * process_prepared_discard_passdown().
 		 */
 		bio_inc_remaining(bio);
 		if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list))
-- 
2.6.4 (Apple Git-63)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] dm thin: use __blkdev_issue_discard for async discard support
  2016-05-05 15:54 ` [PATCH 4/5] dm thin: use __blkdev_issue_discard for async discard support Mike Snitzer
@ 2016-05-06 16:05   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 16:05 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains
  2016-05-05 15:54 [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard Mike Snitzer
                   ` (3 preceding siblings ...)
  2016-05-05 15:54 ` [PATCH 4/5] dm thin: use __blkdev_issue_discard for async discard support Mike Snitzer
@ 2016-05-05 15:54 ` Mike Snitzer
  2016-05-06 16:12   ` Christoph Hellwig
  4 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-05-05 15:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, dm-devel, Mike Snitzer, Christoph Hellwig,
	Joe Thornber

From: Joe Thornber <ejt@redhat.com>

There is little benefit to doing this but it does structure DM thinp's
code to more cleanly use the __blkdev_issue_discard() interface --
particularly in passdown_double_checking_shared_status().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin.c | 104 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 35 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 598a78b..e4bb9da 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -334,26 +334,55 @@ static sector_t block_to_sectors(struct pool *pool, dm_block_t b)
 		(b * pool->sectors_per_block);
 }
 
-static int issue_discard(struct thin_c *tc, dm_block_t data_b, dm_block_t data_e,
-			 struct bio *parent_bio)
+/*----------------------------------------------------------------*/
+
+struct discard_op {
+	struct thin_c *tc;
+	struct blk_plug plug;
+	struct bio *parent_bio;
+	struct bio *bio;
+};
+
+static void begin_discard(struct discard_op *op, struct thin_c *tc, struct bio *parent)
+{
+	BUG_ON(!parent);
+
+	op->tc = tc;
+	blk_start_plug(&op->plug);
+	op->parent_bio = parent;
+	op->bio = NULL;
+}
+
+static int issue_discard(struct discard_op *op, dm_block_t data_b, dm_block_t data_e)
 {
-	int type = REQ_WRITE | REQ_DISCARD;
+	struct thin_c *tc = op->tc;
 	sector_t s = block_to_sectors(tc->pool, data_b);
 	sector_t len = block_to_sectors(tc->pool, data_e - data_b);
-	struct bio *bio = NULL;
-	struct blk_plug plug;
-	int ret;
 
-	blk_start_plug(&plug);
-	ret = __blkdev_issue_discard(tc->pool_dev->bdev, s, len,
-				     GFP_NOWAIT, type, &bio);
-	if (!ret && bio) {
-		bio_chain(bio, parent_bio);
-		submit_bio(type, bio);
+	return __blkdev_issue_discard(tc->pool_dev->bdev, s, len,
+				      GFP_NOWAIT, REQ_WRITE | REQ_DISCARD, &op->bio);
+}
+
+static void end_discard(struct discard_op *op, int r)
+{
+	if (op->bio) {
+		/*
+		 * Even if one of the calls to issue_discard failed, we
+		 * need to wait for the chain to complete.
+		 */
+		bio_chain(op->bio, op->parent_bio);
+		submit_bio(REQ_WRITE | REQ_DISCARD, op->bio);
 	}
-	blk_finish_plug(&plug);
 
-	return ret;
+	blk_finish_plug(&op->plug);
+
+	/*
+	 * Even if r is set, there could be sub discards in flight that we
+	 * need to wait for.
+	 */
+	if (r && !op->parent_bio->bi_error)
+		op->parent_bio->bi_error = r;
+	bio_endio(op->parent_bio);
 }
 
 /*----------------------------------------------------------------*/
@@ -968,24 +997,28 @@ static void process_prepared_discard_no_passdown(struct dm_thin_new_mapping *m)
 	mempool_free(m, tc->pool->mapping_pool);
 }
 
-static int passdown_double_checking_shared_status(struct dm_thin_new_mapping *m)
+/*----------------------------------------------------------------*/
+
+static void passdown_double_checking_shared_status(struct dm_thin_new_mapping *m)
 {
 	/*
 	 * We've already unmapped this range of blocks, but before we
 	 * passdown we have to check that these blocks are now unused.
 	 */
-	int r;
+	int r = 0;
 	bool used = true;
 	struct thin_c *tc = m->tc;
 	struct pool *pool = tc->pool;
 	dm_block_t b = m->data_block, e, end = m->data_block + m->virt_end - m->virt_begin;
+	struct discard_op op;
 
+	begin_discard(&op, tc, m->bio);
 	while (b != end) {
 		/* find start of unmapped run */
 		for (; b < end; b++) {
 			r = dm_pool_block_is_used(pool->pmd, b, &used);
 			if (r)
-				return r;
+				goto out;
 
 			if (!used)
 				break;
@@ -998,20 +1031,20 @@ static int passdown_double_checking_shared_status(struct dm_thin_new_mapping *m)
 		for (e = b + 1; e != end; e++) {
 			r = dm_pool_block_is_used(pool->pmd, e, &used);
 			if (r)
-				return r;
+				goto out;
 
 			if (used)
 				break;
 		}
 
-		r = issue_discard(tc, b, e, m->bio);
+		r = issue_discard(&op, b, e);
 		if (r)
-			return r;
+			goto out;
 
 		b = e;
 	}
-
-	return 0;
+out:
+	end_discard(&op, r);
 }
 
 static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
@@ -1021,20 +1054,21 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
 	struct pool *pool = tc->pool;
 
 	r = dm_thin_remove_range(tc->td, m->virt_begin, m->virt_end);
-	if (r)
+	if (r) {
 		metadata_operation_failed(pool, "dm_thin_remove_range", r);
+		bio_io_error(m->bio);
 
-	else if (m->maybe_shared)
-		r = passdown_double_checking_shared_status(m);
-	else
-		r = issue_discard(tc, m->data_block, m->data_block + (m->virt_end - m->virt_begin), m->bio);
+	} else if (m->maybe_shared) {
+		passdown_double_checking_shared_status(m);
+
+	} else {
+		struct discard_op op;
+		begin_discard(&op, tc, m->bio);
+		r = issue_discard(&op, m->data_block,
+				  m->data_block + (m->virt_end - m->virt_begin));
+		end_discard(&op, r);
+	}
 
-	/*
-	 * Even if r is set, there could be sub discards in flight that we
-	 * need to wait for.
-	 */
-	m->bio->bi_error = r;
-	bio_endio(m->bio);
 	cell_defer_no_holder(tc, m->cell);
 	mempool_free(m, pool->mapping_pool);
 }
@@ -1505,11 +1539,11 @@ static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t
 
 		/*
 		 * The parent bio must not complete before sub discard bios are
-		 * chained to it (see issue_discard's bio_chain)!
+		 * chained to it (see end_discard's bio_chain)!
 		 *
 		 * This per-mapping bi_remaining increment is paired with
 		 * the implicit decrement that occurs via bio_endio() in
-		 * process_prepared_discard_passdown().
+		 * end_discard().
 		 */
 		bio_inc_remaining(bio);
 		if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list))
-- 
2.6.4 (Apple Git-63)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains
  2016-05-05 15:54 ` [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains Mike Snitzer
@ 2016-05-06 16:12   ` Christoph Hellwig
  2016-05-06 16:36     ` Joe Thornber
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 16:12 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, linux-block, dm-devel, Christoph Hellwig,
	Joe Thornber

On Thu, May 05, 2016 at 11:54:25AM -0400, Mike Snitzer wrote:
> From: Joe Thornber <ejt@redhat.com>
> 
> There is little benefit to doing this but it does structure DM thinp's
> code to more cleanly use the __blkdev_issue_discard() interface --
> particularly in passdown_double_checking_shared_status().

As-is I think it makes the code a whole lot more convoluted, and especially
the structure that's just used to communicated 4 parameters between
functions which don't even use all of them isn't very helpful.

If we can get rid of begin_discard/end_discard and struct discard_op I
think the rest of the changes would still be useful, though.

> +static int issue_discard(struct discard_op *op, dm_block_t data_b, dm_block_t data_e)
>  {
> +	struct thin_c *tc = op->tc;
>  	sector_t s = block_to_sectors(tc->pool, data_b);
>  	sector_t len = block_to_sectors(tc->pool, data_e - data_b);
> +	return __blkdev_issue_discard(tc->pool_dev->bdev, s, len,
> +				      GFP_NOWAIT, REQ_WRITE | REQ_DISCARD, &op->bio);
> +}

This is a useful helper I think, but passing tc and bio directly would
make it even more obvious.

> +static void end_discard(struct discard_op *op, int r)
> +{
> +	if (op->bio) {
> +		/*
> +		 * Even if one of the calls to issue_discard failed, we
> +		 * need to wait for the chain to complete.
> +		 */
> +		bio_chain(op->bio, op->parent_bio);
> +		submit_bio(REQ_WRITE | REQ_DISCARD, op->bio);

This comment is a bit confusing as there is nothing that appears to wait
on any of the bios involved.  Also if we ever were to get an error
return from issue_discard when op->bio is not set it would do the wrong
thing, although with the current interface that can't actually happen.

> +	blk_finish_plug(&op->plug);
> +
> +	/*
> +	 * Even if r is set, there could be sub discards in flight that we
> +	 * need to wait for.
> +	 */
> +	if (r && !op->parent_bio->bi_error)
> +		op->parent_bio->bi_error = r;

Both the plugging and the bi_error handling would benefit from being
moved into process_prepared_discard_passdown and handled in the common
path.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains
  2016-05-06 16:12   ` Christoph Hellwig
@ 2016-05-06 16:36     ` Joe Thornber
  2016-05-06 16:44       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Thornber @ 2016-05-06 16:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, dm-devel, Joe Thornber, Mike Snitzer

On Fri, May 06, 2016 at 06:12:16PM +0200, Christoph Hellwig wrote:
> On Thu, May 05, 2016 at 11:54:25AM -0400, Mike Snitzer wrote:
> > From: Joe Thornber <ejt@redhat.com>
> > 
> > There is little benefit to doing this but it does structure DM thinp's
> > code to more cleanly use the __blkdev_issue_discard() interface --
> > particularly in passdown_double_checking_shared_status().
> 
> As-is I think it makes the code a whole lot more convoluted, and especially
> the structure that's just used to communicated 4 parameters between
> functions which don't even use all of them isn't very helpful.

I started coding it inline, and then factored it out because I thought
it was cleaner.  I'm also likely to lift it into a separate file for
use in dm-cache too.

> This is a useful helper I think, but passing tc and bio directly would
> make it even more obvious.

y, this is exactly what I had.

> Both the plugging and the bi_error handling would benefit from being
> moved into process_prepared_discard_passdown and handled in the common
> path.

But the plugging and error handling are called from two places.
process_prepared_discard_passdown() and
passdown_double_checking_shared_status().

- Joe

(I agree about the confusing comment, we'll remove)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains
  2016-05-06 16:36     ` Joe Thornber
@ 2016-05-06 16:44       ` Christoph Hellwig
  2016-05-06 16:48         ` Joe Thornber
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-05-06 16:44 UTC (permalink / raw)
  To: Christoph Hellwig, Mike Snitzer, Jens Axboe, linux-block,
	dm-devel, Joe Thornber

On Fri, May 06, 2016 at 05:36:58PM +0100, Joe Thornber wrote:
> But the plugging and error handling are called from two places.
> process_prepared_discard_passdown() and
> passdown_double_checking_shared_status().

passdown_double_checking_shared_status is only called from
process_prepared_discard_passdown.  So the blk_start_plug from
begin_discard and the work in end_discard could be moved into
it trivially.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains
  2016-05-06 16:44       ` Christoph Hellwig
@ 2016-05-06 16:48         ` Joe Thornber
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Thornber @ 2016-05-06 16:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, dm-devel, Joe Thornber, Mike Snitzer

On Fri, May 06, 2016 at 06:44:35PM +0200, Christoph Hellwig wrote:
> On Fri, May 06, 2016 at 05:36:58PM +0100, Joe Thornber wrote:
> > But the plugging and error handling are called from two places.
> > process_prepared_discard_passdown() and
> > passdown_double_checking_shared_status().
> 
> passdown_double_checking_shared_status is only called from
> process_prepared_discard_passdown.  So the blk_start_plug from
> begin_discard and the work in end_discard could be moved into
> it trivially.

Good point, will do.  Thx.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-05-06 16:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-05 15:54 [PATCH 0/5] dm thin: adapt code to use new async __blkdev_issue_discard Mike Snitzer
2016-05-05 15:54 ` [PATCH v2 1/5] block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard Mike Snitzer
2016-05-06 16:04   ` Christoph Hellwig
2016-05-05 15:54 ` [PATCH 2/5] block: make bio_inc_remaining() interface accessible again Mike Snitzer
2016-05-06 15:25   ` Christoph Hellwig
2016-05-06 15:56     ` Christoph Hellwig
2016-05-06 16:19       ` Mike Snitzer
2016-05-06 16:27         ` Christoph Hellwig
2016-05-06 16:46           ` Joe Thornber
2016-05-06 16:30         ` Joe Thornber
2016-05-06 16:43         ` Christoph Hellwig
2016-05-06 15:57     ` Mike Snitzer
2016-05-05 15:54 ` [PATCH 3/5] dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining() Mike Snitzer
2016-05-05 15:54 ` [PATCH 4/5] dm thin: use __blkdev_issue_discard for async discard support Mike Snitzer
2016-05-06 16:05   ` Christoph Hellwig
2016-05-05 15:54 ` [PATCH 5/5] dm thin: unroll issue_discard() to create longer discard bio chains Mike Snitzer
2016-05-06 16:12   ` Christoph Hellwig
2016-05-06 16:36     ` Joe Thornber
2016-05-06 16:44       ` Christoph Hellwig
2016-05-06 16:48         ` Joe Thornber

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.