* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: Mike Snitzer @ 2018-06-19 13:11 UTC
  To: Monty Pavel; +Cc: dm-devel

On Mon, May 21 2018 at  8:53pm -0400,
Monty Pavel <monty_pavel@sina.com> wrote:

> 
> If dm_bufio_write_dirty_buffers() is called by __commit_transaction()
> and power is lost while it is executing, the superblock may, by
> coincidence, be written correctly while some other metadata blocks are
> not. The reason is that all metadata is written in async mode. We can
> guarantee that the superblock is submitted after the other blocks, but
> we cannot guarantee that the other blocks complete before the
> superblock write does.
> So we need to commit the other metadata blocks before changing the
> superblock.
> 
> Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> ---
>  drivers/md/dm-thin-metadata.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> index 36ef284..897d7d6 100644
> --- a/drivers/md/dm-thin-metadata.c
> +++ b/drivers/md/dm-thin-metadata.c
> @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
>  	if (r)
>  		return r;
>  
> +	r = dm_tm_commit(pmd->tm, sblock);
> +	if (r)
> +		return r;
> +
> +	r = superblock_lock(pmd, &sblock);
> +	if (r)
> +		return r;
> +
>  	disk_super = dm_block_data(sblock);
>  	disk_super->time = cpu_to_le32(pmd->time);
>  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);
> -- 
> 1.7.1

Have you actually found this patch to be effective?  It should be
unnecessary.  But I must admit that in looking at the related code I
couldn't convince myself it was.

But then Joe pointed me to this comment block from
dm-transaction-manager.h:

/*
 * We use a 2-phase commit here.
 *
 * i) Make all changes for the transaction *except* for the superblock.
 * Then call dm_tm_pre_commit() to flush them to disk.
 *
 * ii) Lock your superblock.  Update.  Then call dm_tm_commit() which will
 * unlock the superblock and flush it.  No other blocks should be updated
 * during this period.  Care should be taken to never unlock a partially
 * updated superblock; perform any operations that could fail *before* you
 * take the superblock lock.
 */
int dm_tm_pre_commit(struct dm_transaction_manager *tm);
int dm_tm_commit(struct dm_transaction_manager *tm, struct dm_block *superblock);

So given __commit_transaction() is using dm_tm_pre_commit() prior to the
dm_tm_commit() to flush the superblock -- it would seem that there isn't
any conceptual potential for corruption.
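
The ordering that comment describes can be sketched in userspace.  This
is only a toy model, not the kernel code: toy_dev, toy_write(),
toy_flush(), toy_crash() and toy_commit() are hypothetical names, block
0 stands in for the superblock and simply stores the index of the
current root block, and toy_flush() is assumed to make writes durable
the way a correctly behaving device stack would.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BLOCKS 4	/* block 0 plays the role of the superblock */

/* Toy device: writes sit in a volatile cache until toy_flush(). */
struct toy_dev {
	int cache[NR_BLOCKS];	/* what the running system sees */
	int durable[NR_BLOCKS];	/* what survives a power loss */
	bool dirty[NR_BLOCKS];
};

static void toy_init(struct toy_dev *d, int root, int val)
{
	memset(d, 0, sizeof(*d));
	d->durable[0] = root;	/* superblock points at the root block */
	d->durable[root] = val;
	memcpy(d->cache, d->durable, sizeof(d->cache));
}

static void toy_write(struct toy_dev *d, int blk, int val)
{
	d->cache[blk] = val;
	d->dirty[blk] = true;
}

/* Make every dirty block durable; order within one flush is unspecified. */
static void toy_flush(struct toy_dev *d)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (d->dirty[i]) {
			d->durable[i] = d->cache[i];
			d->dirty[i] = false;
		}
	}
}

/* Power loss: anything not yet flushed is gone. */
static void toy_crash(struct toy_dev *d)
{
	memcpy(d->cache, d->durable, sizeof(d->cache));
	memset(d->dirty, 0, sizeof(d->dirty));
}

/* Two-phase commit of a new root; optionally lose power between phases. */
static int toy_commit(struct toy_dev *d, int new_root, int val,
		      bool crash_between)
{
	toy_write(d, new_root, val);	/* i) everything except the superblock */
	toy_flush(d);			/*    (role of dm_tm_pre_commit()) */
	if (crash_between) {
		toy_crash(d);
		return -1;
	}
	toy_write(d, 0, new_root);	/* ii) update the superblock */
	toy_flush(d);			/*     (role of dm_tm_commit()) */
	return 0;
}

/* Value reachable from the durable superblock after a restart. */
static int toy_read_root(const struct toy_dev *d)
{
	return d->durable[d->durable[0]];
}
```

In this model a crash between the two phases leaves the durable
superblock pointing at the old root, whose contents were never
overwritten, so the old transaction is still intact.  That only holds
if the flush in phase i really is durable before phase ii starts.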

If you've found dm_tm_pre_commit() to be lacking (in that not all
metadata gets flushed to disk before the superblock) then please
explain your findings.

Thanks,
Mike


* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: Joe Thornber @ 2018-06-19 14:43 UTC
  To: Mike Snitzer; +Cc: dm-devel, Monty Pavel

I don't believe you've tested this; sblock is passed to dm_tm_commit()
uninitialised, and you didn't even bother to remove the later (and correct)
call to dm_tm_commit().  See dm-transaction-manager.h for an explanation of
how the 2-phase commit works.

What is the issue that started you looking in this area?

- Joe


* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: Mike Snitzer @ 2018-06-19 15:00 UTC
  To: Monty Pavel, dm-devel; +Cc: ejt

On Tue, Jun 19 2018 at 10:43am -0400,
Joe Thornber <thornber@redhat.com> wrote:

> I don't believe you've tested this; sblock is passed to dm_tm_commit()
> uninitialised, and you didn't even bother to remove the later (and correct)
> call to dm_tm_commit().

I pointed out to Joe that the patch, in isolation, is deceiving.  It
_looks_ like sblock may be uninitialized, etc.  But once the patch is
applied and you look at the entirety of __commit_transaction() it is
clear that you're reusing the existing superblock_lock() to safely
accomplish your additional call to dm_tm_commit().
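
To make the call order concrete, here is a small userspace trace of the
flow as I read it with the patch applied.  The toy_* functions are
hypothetical stand-ins, not the kernel API; the only behaviour assumed
is what the dm-transaction-manager.h comment states, namely that the
commit call unlocks the superblock before flushing it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a locked metadata block. */
struct toy_block {
	int locked;
};

static struct toy_block toy_sb;
static char trace[16];	/* 'L' = lock, 'C' = commit, 'U' = update */

static int toy_superblock_lock(struct toy_block **sblock)
{
	toy_sb.locked = 1;
	*sblock = &toy_sb;
	strcat(trace, "L");
	return 0;
}

/* Unlocks and "flushes"; fails on a block that was never locked. */
static int toy_tm_commit(struct toy_block *sblock)
{
	if (!sblock || !sblock->locked)
		return -1;
	sblock->locked = 0;
	strcat(trace, "C");
	return 0;
}

/* What the hunk looks like on its own: commit with no lock taken first. */
static int toy_commit_before_lock(void)
{
	struct toy_block *sblock = NULL;

	trace[0] = '\0';
	return toy_tm_commit(sblock);
}

/* The whole function with the patch applied. */
static int toy_commit_transaction_patched(void)
{
	struct toy_block *sblock = NULL;
	int r;

	trace[0] = '\0';

	r = toy_superblock_lock(&sblock);	/* existing, above the hunk */
	if (r)
		return r;

	r = toy_tm_commit(sblock);		/* added by the patch */
	if (r)
		return r;

	r = toy_superblock_lock(&sblock);	/* added by the patch */
	if (r)
		return r;

	strcat(trace, "U");			/* fill in superblock fields */

	return toy_tm_commit(sblock);		/* existing final commit */
}
```

The patched flow records lock, commit, lock, update, commit, so sblock
is valid at every commit; the price is one extra unlock-and-flush of a
superblock that has not been updated yet.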

> What is the issue that started you looking in this area?

Right, as my previous reply asked: please clarify if you _know_ your
patch fixes an actual problem you've experienced.  The more details the
better.

Thanks,
Mike


* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: Mike Snitzer @ 2018-06-20 14:51 UTC
  To: monty; +Cc: dm-devel, thornber

On Wed, Jun 20 2018 at  1:03pm -0400,
monty <monty_pavel@sina.com> wrote:

> 
> Hi Mike and Joe, thanks for your replies. I have read
> __commit_transaction() many times and did not find any problem with
> the 2-phase commit. I use md-raid1 (PCIe NVMe and md-raid5) in
> write-behind mode to store the dm-thin metadata.
> Test case:
> 1. Run a copy-diff test on the thin device, then reboot the machine.
> 2. Rebuild our device stack and run "vgchange -ay".
> The thin-pool cannot be established (details_root has become a bitmap
> node and the metadata's bitmap_root has become a btree_node).

But are you saying your double commit in __commit_transaction() serves
as a workaround for the corruption you're seeing?

Is it just a case where raid5's writebehind mode is _not_ safe for your
storage config?  By "reboot" do you mean a clean shutdown?  Or a forced
powerfail scenario?

Mike


* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: Joe Thornber @ 2018-06-20 15:00 UTC
  To: monty; +Cc: dm-devel, Mike Snitzer

On Wed, Jun 20, 2018 at 01:03:57PM -0400, monty wrote:
> Hi Mike and Joe, thanks for your replies. I have read
> __commit_transaction() many times and did not find any problem with
> the 2-phase commit. I use md-raid1 (PCIe NVMe and md-raid5) in
> write-behind mode to store the dm-thin metadata.
> Test case:
> 1. Run a copy-diff test on the thin device, then reboot the machine.
> 2. Rebuild our device stack and run "vgchange -ay".
> The thin-pool cannot be established (details_root has become a bitmap
> node and the metadata's bitmap_root has become a btree_node).

As you simplify your setup, does the problem go away?  E.g., turn off
write-behind, use just the NVMe device, etc.

The only effect of your change is to call flush twice rather than once.
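
A toy illustration of that point, under the assumption that the device
stack honours flushes.  toy_dev, toy_write(), toy_flush() and toy_run()
are hypothetical names for this sketch only: a flush writes out
whatever is dirty, so a second flush with nothing dirty writes nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BLOCKS 4

/* Toy device with counters so the cost of each flush is visible. */
struct toy_dev {
	int cache[NR_BLOCKS];
	int durable[NR_BLOCKS];
	bool dirty[NR_BLOCKS];
	int nr_flushes;		/* flush requests issued */
	int nr_blocks_written;	/* blocks actually made durable */
};

static void toy_write(struct toy_dev *d, int blk, int val)
{
	d->cache[blk] = val;
	d->dirty[blk] = true;
}

static void toy_flush(struct toy_dev *d)
{
	d->nr_flushes++;
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (d->dirty[i]) {
			d->durable[i] = d->cache[i];
			d->dirty[i] = false;
			d->nr_blocks_written++;
		}
	}
}

/* Apply the same two updates, then flush the given number of times. */
static void toy_run(struct toy_dev *d, int flushes)
{
	memset(d, 0, sizeof(*d));
	toy_write(d, 1, 11);
	toy_write(d, 2, 22);
	for (int i = 0; i < flushes; i++)
		toy_flush(d);
}
```

Running toy_run() with one flush and with two leaves identical durable
contents and the same number of blocks written.  The model says nothing
about a lower layer that acknowledges a flush before the data is
actually stable, which is why simplifying the stack is the useful
experiment.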

- Joe


* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: monty @ 2018-06-20 17:03 UTC
  To: Mike Snitzer; +Cc: dm-devel, thornber


Hi Mike and Joe, thanks for your replies. I have read
__commit_transaction() many times and did not find any problem with
the 2-phase commit. I use md-raid1 (PCIe NVMe and md-raid5) in
write-behind mode to store the dm-thin metadata.
Test case:
1. Run a copy-diff test on the thin device, then reboot the machine.
2. Rebuild our device stack and run "vgchange -ay".
The thin-pool cannot be established (details_root has become a bitmap
node and the metadata's bitmap_root has become a btree_node).

Thanks,
Monty


* Re: dm thin: superblock may write succeed before other metadata blocks because of writing metadata in async mode.
From: monty @ 2018-06-21 16:54 UTC
  To: Mike Snitzer; +Cc: dm-devel, thornber


On Wed, Jun 20, 2018 at 10:51:17AM -0400, Mike Snitzer wrote:
> 
> But are you saying your double commit in __commit_transaction() serves
> as a workaround for the corruption you're seeing?
> 
> Is it just a case where raid5's writebehind mode is _not_ safe for your
> storage config?  By "reboot" do you mean a clean shutdown?  Or a forced
> powerfail scenario?
> 
> Mike
> 
By "reboot" I mean running the reboot command.
My patch seems unnecessary, because the 2-phase commit already ensures
that the superblock is committed only after the other metadata has been
written to the metadata device.
The problem is hard to reproduce; it has not happened since I applied
the patch above. I will check my device stack to see whether there is
any way the superblock write could complete earlier than the other
metadata blocks.

Monty


