All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: dm-devel@redhat.com, David Teigland <teigland@redhat.com>
Subject: Re: dm writecache: fix data corruption when reloading the target
Date: Wed, 15 Apr 2020 09:01:54 -0400	[thread overview]
Message-ID: <20200415130153.GB29959@redhat.com> (raw)
In-Reply-To: <alpine.LRH.2.02.2004150405330.3968@file01.intranet.prod.int.rdu2.redhat.com>

On Wed, Apr 15 2020 at  4:14am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Tue, 14 Apr 2020, Mike Snitzer wrote:
> 
> > On Wed, Apr 08 2020 at  3:02pm -0400,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > The dm-writecache reads metadata in the target constructor. However, when 
> > > we reload the target, there could be another active instance running on 
> > > the same device. This is the sequence of operations when doing a reload:
> > > 
> > > 1. construct new target
> > > 2. suspend old target
> > > 3. resume new target
> > > 4. destroy old target
> > > 
> > > Metadata that were written by the old target between steps 1 and 2 would
> > > not be visible by the new target.
> > > 
> > > This patch fixes the data corruption by loading the metadata in the resume
> > > handler.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > Cc: stable@vger.kernel.org	# v4.18+
> > > Fixes: 48debafe4f2f ("dm: add writecache target")
> > > 
> > > ---
> > >  drivers/md/dm-writecache.c |   44 ++++++++++++++++++++++++++++++--------------
> > >  1 file changed, 30 insertions(+), 14 deletions(-)
> > > 
> > > Index: linux-2.6/drivers/md/dm-writecache.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-08 14:47:17.000000000 +0200
> > > +++ linux-2.6/drivers/md/dm-writecache.c	2020-04-08 20:59:15.000000000 +0200
> > > @@ -931,6 +931,24 @@ static int writecache_alloc_entries(stru
> > >  	return 0;
> > >  }
> > >  
> > > +static int writecache_read_metadata(struct dm_writecache *wc, sector_t n_sectors)
> > > +{
> > > +	struct dm_io_region region;
> > > +	struct dm_io_request req;
> > > +
> > > +	region.bdev = wc->ssd_dev->bdev;
> > > +	region.sector = wc->start_sector;
> > > +	region.count = wc->metadata_sectors;
> > > +	req.bi_op = REQ_OP_READ;
> > > +	req.bi_op_flags = REQ_SYNC;
> > > +	req.mem.type = DM_IO_VMA;
> > > +	req.mem.ptr.vma = (char *)wc->memory_map;
> > > +	req.client = wc->dm_io;
> > > +	req.notify.fn = NULL;
> > > +
> > > +	return dm_io(&req, 1, &region, NULL);
> > > +}
> > > +
> > 
> > You aren't using the passed n_sectors (for region.count?)
> > 
> > 
> > >  static void writecache_resume(struct dm_target *ti)
> > >  {
> > >  	struct dm_writecache *wc = ti->private;
> > > @@ -941,8 +959,16 @@ static void writecache_resume(struct dm_
> > >  
> > >  	wc_lock(wc);
> > >  
> > > -	if (WC_MODE_PMEM(wc))
> > > +	if (WC_MODE_PMEM(wc)) {
> > >  		persistent_memory_invalidate_cache(wc->memory_map, wc->memory_map_size);
> > > +	} else {
> > > +		r = writecache_read_metadata(wc, wc->metadata_sectors);
> > > +		if (r) {
> > > +			writecache_error(wc, r, "unable to read metadata: %d", r);
> > > +			memset((char *)wc->memory_map + offsetof(struct wc_memory_superblock, entries), -1,
> > > +			       (wc->metadata_sectors << SECTOR_SHIFT) - offsetof(struct wc_memory_superblock, entries));
> > > +		}
> > > +	}
> > >  
> > >  	wc->tree = RB_ROOT;
> > >  	INIT_LIST_HEAD(&wc->lru);
> > > @@ -2200,8 +2226,6 @@ invalid_optional:
> > >  			goto bad;
> > >  		}
> > >  	} else {
> > > -		struct dm_io_region region;
> > > -		struct dm_io_request req;
> > >  		size_t n_blocks, n_metadata_blocks;
> > >  		uint64_t n_bitmap_bits;
> > >  
> > > @@ -2258,17 +2282,9 @@ invalid_optional:
> > >  			goto bad;
> > >  		}
> > >  
> > > -		region.bdev = wc->ssd_dev->bdev;
> > > -		region.sector = wc->start_sector;
> > > -		region.count = wc->metadata_sectors;
> > > -		req.bi_op = REQ_OP_READ;
> > > -		req.bi_op_flags = REQ_SYNC;
> > > -		req.mem.type = DM_IO_VMA;
> > > -		req.mem.ptr.vma = (char *)wc->memory_map;
> > > -		req.client = wc->dm_io;
> > > -		req.notify.fn = NULL;
> > > -
> > > -		r = dm_io(&req, 1, &region, NULL);
> > > +		r = writecache_read_metadata(wc,
> > > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > +			    (sector_t)wc->metadata_sectors));
> > 
> > Can you explain why this is needed?  Why isn't wc->metadata_sectors
> > already compatible with wc->ssd_dev->bdev ?
> 
> bdev_logical_block_size is the minimum size accepted by the device. If we 
> used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by 
> using extremely small device with large logical_block_size) trigger 
> writing out of the allocated memory.

OK...
 
> > Yet you just use wc->metadata_sectors in the new call to
> > writecache_read_metadata() in writecache_resume()...
> 
> This was my mistake. Change it to "region.count = n_sectors";

sure, that addresses one aspect.  But I'm also asking:
given what yoou said above about reading past end of smaller device, why
is it safe to do this in writecache_resume ?

r = writecache_read_metadata(wc, wc->metadata_sectors);

Shouldn't ctr do extra validation and then all calls to
writecache_read_metadata() use wc->metadata_sectors?  Which would remove
need to pass extra 'n_sectors' arg to writecache_read_metadata()?

Mike

  parent reply	other threads:[~2020-04-15 13:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-08 19:02 [PATCH] dm writecache: fix data corruption when reloading the target Mikulas Patocka
2020-04-14 19:05 ` Mike Snitzer
2020-04-15  8:14   ` Mikulas Patocka
2020-04-15 12:31     ` [PATCH v2] " Mikulas Patocka
2020-04-15 13:01     ` Mike Snitzer [this message]
2020-04-15 14:49       ` Mikulas Patocka
2020-04-15 14:53         ` Mike Snitzer
2020-04-15 15:01         ` Mikulas Patocka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200415130153.GB29959@redhat.com \
    --to=snitzer@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=mpatocka@redhat.com \
    --cc=teigland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.