linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Neil Brown <neilb@suse.de>
Cc: "Danecki, Jacek" <jacek.danecki@intel.com>,
	"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: array_state_store: 'inactive' versus 'clean' race
Date: Tue, 28 Apr 2009 11:05:20 -0700	[thread overview]
Message-ID: <49F74560.7030702@intel.com> (raw)
In-Reply-To: <18934.33566.354244.217291@notabene.brown>

Neil Brown wrote:
> So I'm leaning towards:
>   - a kernel change so that you can only set 'inactive' if the
>     array is already clean or readonly.
>     i.e. if (mddev->ro || mddev->in_sync)
>     which possible is the same as simply
>          if (mddev->in_sync)
>   - an mdadm change so that is doesn't just set 'inactive'
>     but rather
>          sysfs_set_str(mdi, NULL, "array_state", "clean");
> 	 ping_monitor(mdi->text_version);
> 	 sysfs_set_str(mdi, NULL, "array_state", "inactive")

I recall that when implementing WaitClean there was a reason not to 
write 'clean' from mdadm... /me looks in the logs.  Yes, from commit 
0dd3ba30:

"--wait-clean: shorten timeout

Set the safemode timeout to a small value to get the array marked clean 
as soon as possible.  We don't write 'clean' directly as it may cause 
mdmon to miss a 'write-pending' event."

So we do not want a thread to hang because we missed its write-pending 
event, or will it receive an error after the device has been torn down?

>   - make sure that if mdmon sees the array as 'clean', it will
>     update it's metadata etc so that it feels no further urge
>     to mark the array clean.
> 
> Also, we need to fix the WARNING, which I think means moving
> 		list_for_each_entry(rdev, &mddev->disks, same_set)
> 			if (rdev->raid_disk >= 0) {
> 				char nm[20];
> 				sprintf(nm, "rd%d", rdev->raid_disk);
> 				sysfs_remove_link(&mddev->kobj, nm);
> 			}

It would also involve a flush_scheduled_work() somewhere to make sure 
the md_redundancy_group has been deleted.  But it occurs to me that 
do_md_run() should still be failing in the case where userspace gets the 
ordering of 'inactive' wrong...

> 
> in do_md_stop up in to the
> 		case 0: /* disassemble */
> 		case 2: /* stop */
> 
> section.
> 
> My one concern about this conclusion is that it seems to sidestep the
> core problem rather than address it.
> The core problem is that setting the state to 'clean' means very
> different thing depending on which side you come from, so an app which
> writes 'clean' might get more than it bargained for.  All I'm doing is
> making that confusion avoidable rather than impossible....
> 
> I guess I could invent a syntax:
>  writing "a->b" to array_state sets the state to 'b' but only if it
>  was already in 'a'.
> But that feels needlessly complex.
> 
> So:  do you agree with my leaning?  or my concern?  or both or neither?

First let me say that I really appreciate when you do these 
implementation contingency brain dumps, it really helps to get all the 
cards on the table (something I would do well to emulate).

I agree with your concern and I am currently more in line with the idea 
that the confusion should be impossible rather than avoidable.  I now 
think it is a bug that the 'inactive' state has two meanings.  A new 
state like 'shutdown' or 'defunct' would imply that 'clear' is the only 
valid next state, all other writes return an error.

So, it is not quite as ugly as randomly preventing some 
'inactive->clean' transitions in that userspace can see why its attempt 
to write 'clean' is returning -EINVAL.

Thoughts?

Thanks,
Dan

  reply	other threads:[~2009-04-28 18:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-28  1:24 array_state_store: 'inactive' versus 'clean' race Dan Williams
2009-04-28  4:16 ` Neil Brown
2009-04-28 18:05   ` Dan Williams [this message]
2009-04-29  1:14     ` Neil Brown
2009-04-29  1:42       ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49F74560.7030702@intel.com \
    --to=dan.j.williams@intel.com \
    --cc=ed.ciechanowski@intel.com \
    --cc=jacek.danecki@intel.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).