All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Vallish Vaidyeshwara <vallish@amazon.com>
Cc: eduval@amazon.com, ejt@redhat.com, gafton@amazon.com,
	dm-devel@redhat.com, anchalag@amazon.com, agk@redhat.com
Subject: Re: dm thin: do not queue freed thin mapping for next stage processing
Date: Mon, 26 Jun 2017 15:06:38 -0400	[thread overview]
Message-ID: <20170626190638.GA375@redhat.com> (raw)
In-Reply-To: <1498243986-89156-1-git-send-email-vallish@amazon.com>

On Fri, Jun 23 2017 at  2:53pm -0400,
Vallish Vaidyeshwara <vallish@amazon.com> wrote:

> process_prepared_discard_passdown_pt1() should cleanup
> dm_thin_new_mapping in cases of error.
> 
> dm_pool_inc_data_range() can fail trying to get a block reference:
> 
> metadata operation 'dm_pool_inc_data_range' failed: error = -61
> 
> When dm_pool_inc_data_range() fails, dm thin aborts current metadata
> transaction and marks pool as PM_READ_ONLY. Memory for thin mapping
> is released as well. However, current thin mapping will be queued
> onto next stage as part of queue_passdown_pt2() or passdown_endio().
> This dangling thin mapping memory when processed and accessed in
> next stage will lead to device mapper crashing.
> 
> Code flow without fix:
> -> process_prepared_discard_passdown_pt1(m)
>    -> dm_thin_remove_range()
>    -> discard passdown
>       --> passdown_endio(m) queues m onto next stage
>    -> dm_pool_inc_data_range() fails, frees memory m
>             but does not remove it from next stage queue
> 
> -> process_prepared_discard_passdown_pt2(m)
>    -> processes freed memory m and crashes
> 
> One such stack:
> 
> Call Trace:
> [<ffffffffa037a46f>] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
> [<ffffffffa039b6dc>] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
> [<ffffffffa039b88b>] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
> [<ffffffffa0399611>] process_prepared+0x81/0xa0 [dm_thin_pool]
> [<ffffffffa039e735>] do_worker+0xc5/0x820 [dm_thin_pool]
> [<ffffffff8152bf54>] ? __schedule+0x244/0x680
> [<ffffffff81087e72>] ? pwq_activate_delayed_work+0x42/0xb0
> [<ffffffff81089f53>] process_one_work+0x153/0x3f0
> [<ffffffff8108a71b>] worker_thread+0x12b/0x4b0
> [<ffffffff8108a5f0>] ? rescuer_thread+0x350/0x350
> [<ffffffff8108fd6a>] kthread+0xca/0xe0
> [<ffffffff8108fca0>] ? kthread_park+0x60/0x60
> [<ffffffff81530b45>] ret_from_fork+0x25/0x30
> 
> The fix is to first take the block ref count for discarded block and
> then do a passdown discard of this block. If block ref count fails,
> then bail out aborting current metadata transaction, mark pool as
> PM_READ_ONLY and also free current thin mapping memory (existing error
> handling code) without queueing this thin mapping onto next stage of
> processing. If block ref count succeeds, then passdown discard of this
> block. Discard callback of passdown_endio() will queue this thin mapping
> onto next stage of processing.
> 
> Code flow with fix:
> -> process_prepared_discard_passdown_pt1(m)
>    -> dm_thin_remove_range()
>    -> dm_pool_inc_data_range()
>       --> if fails, free memory m and bail out
>    -> discard passdown
>       --> passdown_endio(m) queues m onto next stage
> 
> Cc: stable <stable@vger.kernel.org> # v4.9+
> Reviewed-by: Eduardo Valentin <eduval@amazon.com>
> Reviewed-by: Cristian Gafton <gafton@amazon.com>
> Reviewed-by: Anchal Agarwal <anchalag@amazon.com>
> Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
> ---
>  drivers/md/dm-thin.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 17ad50d..28808e5 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1094,6 +1094,19 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
>  		return;
>  	}
>  
> +	/*
> +	 * Increment the unmapped blocks.  This prevents a race between the
> +	 * passdown io and reallocation of freed blocks.
> +	 */
> +	r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
> +	if (r) {
> +		metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
> +		bio_io_error(m->bio);
> +		cell_defer_no_holder(tc, m->cell);
> +		mempool_free(m, pool->mapping_pool);
> +		return;
> +	}
> +
>  	discard_parent = bio_alloc(GFP_NOIO, 1);
>  	if (!discard_parent) {
>  		DMWARN("%s: unable to allocate top level discard bio for passdown. Skipping passdown.",
> @@ -1114,19 +1127,6 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
>  			end_discard(&op, r);
>  		}
>  	}
> -
> -	/*
> -	 * Increment the unmapped blocks.  This prevents a race between the
> -	 * passdown io and reallocation of freed blocks.
> -	 */
> -	r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
> -	if (r) {
> -		metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
> -		bio_io_error(m->bio);
> -		cell_defer_no_holder(tc, m->cell);
> -		mempool_free(m, pool->mapping_pool);
> -		return;
> -	}
>  }
>  
>  static void process_prepared_discard_passdown_pt2(struct dm_thin_new_mapping *m)

This looks like a good fix.

But I'll wait for Joe's confirmation before staging for 4.12 final
inclusion later this week.

Thanks,
Mike

      reply	other threads:[~2017-06-26 19:06 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-23 18:53 [PATCH] dm thin: do not queue freed thin mapping for next stage processing Vallish Vaidyeshwara
2017-06-26 19:06 ` Mike Snitzer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170626190638.GA375@redhat.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=anchalag@amazon.com \
    --cc=dm-devel@redhat.com \
    --cc=eduval@amazon.com \
    --cc=ejt@redhat.com \
    --cc=gafton@amazon.com \
    --cc=vallish@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.