From: Neil Brown <neilb@suse.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Danecki, Jacek" <jacek.danecki@intel.com>,
"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: array_state_store: 'inactive' versus 'clean' race
Date: Wed, 29 Apr 2009 11:14:26 +1000 [thread overview]
Message-ID: <18935.43506.654641.312777@notabene.brown> (raw)
In-Reply-To: message from Dan Williams on Tuesday April 28
On Tuesday April 28, dan.j.williams@intel.com wrote:
> Neil Brown wrote:
> > So I'm leaning towards:
> > - a kernel change so that you can only set 'inactive' if the
> > array is already clean or readonly.
> > i.e. if (mddev->ro || mddev->in_sync)
> > which possible is the same as simply
> > if (mddev->in_sync)
> > - an mdadm change so that is doesn't just set 'inactive'
> > but rather
> > sysfs_set_str(mdi, NULL, "array_state", "clean");
> > ping_monitor(mdi->text_version);
> > sysfs_set_str(mdi, NULL, "array_state", "inactive")
>
> I recall that when implementing WaitClean there was a reason not to
> write 'clean' from mdadm... /me looks in the logs. Yes, from commit
> 0dd3ba30:
>
> "--wait-clean: shorten timeout
>
> Set the safemode timeout to a small value to get the array marked clean
> as soon as possible. We don't write 'clean' directly as it may cause
> mdmon to miss a 'write-pending' event."
>
> So we do not want a thread to hang because we missed its write-pending
> event, or will it receive an error after the device has been torn down?
I don't see that writing 'clean' can cause a 'write-pending' state to
be missed.
If the array is in 'write-pending', it is because some thread is in
md_write_start waiting for MD_CHANGE_CLEAN to be cleared. So
->writes_pending is elevated. So an attempt to write 'clean' will
result in -EBUSY.
???
>
> > - make sure that if mdmon sees the array as 'clean', it will
> > update it's metadata etc so that it feels no further urge
> > to mark the array clean.
> >
> > Also, we need to fix the WARNING, which I think means moving
> > list_for_each_entry(rdev, &mddev->disks, same_set)
> > if (rdev->raid_disk >= 0) {
> > char nm[20];
> > sprintf(nm, "rd%d", rdev->raid_disk);
> > sysfs_remove_link(&mddev->kobj, nm);
> > }
>
> It would also involve a flush_scheduled_work() somewhere to make sure
> the md_redundancy_group has been deleted. But it occurs to me that
> do_md_run() should still be failing in the case where userspace gets the
> ordering of 'inactive' wrong...
>
Uhmm..... I think we should split the removal of md_redundancy_group
out in to a separate delayed_work. But yes, we need to deal with that
somehow.
> >
> > in do_md_stop up in to the
> > case 0: /* disassemble */
> > case 2: /* stop */
> >
> > section.
> >
> > My one concern about this conclusion is that it seems to sidestep the
> > core problem rather than address it.
> > The core problem is that setting the state to 'clean' means very
> > different thing depending on which side you come from, so an app which
> > writes 'clean' might get more than it bargained for. All I'm doing is
> > making that confusion avoidable rather than impossible....
> >
> > I guess I could invent a syntax:
> > writing "a->b" to array_state sets the state to 'b' but only if it
> > was already in 'a'.
> > But that feels needlessly complex.
> >
> > So: do you agree with my leaning? or my concern? or both or neither?
>
> First let me say that I really appreciate when you do these
> implementation contingency brain dumps, it really helps to get all the
> cards on the table (something I would do well to emulate).
>
> I agree with your concern and I am currently more in line with the idea
> that the confusion should be impossible rather than avoidable. I now
> think it is a bug that the 'inactive' state has two meanings. A new
> state like 'shutdown' or 'defunct' would imply that 'clear' is the only
> valid next state, all other writes return an error.
>
> So, it is not quite as ugly as randomly preventing some
> 'inactive->clean' transitions in that userspace can see why its attempt
> to write 'clean' is returning -EINVAL.
>
> Thoughts?
I don't like the idea of a trap-door where once you get to 'shutdown'
the only way out is 'clear'. But I could possibly live with a state
like 'shutdown' which can transition to either 'clear' or 'inactive',
but not directly to 'clean'. I cannot think of an immediate use for
the 'shutdown'->'inactive' transition, but I wouldn't want to exclude
it.
But I really think the 'problem' is more around 'clean' then it is
around 'inactive'.
I think I would like writing 'clean' never to start the array.
If we want to start the array in the 'clean' state, then write
something else. I think we can just write 'active'. That will
actually leave the array in a 'clean' state I think. I'd need to
check that...
So now I think we just arrange that 'clean' returns -EINVAL if
mddev->pers == NULL.
We never currently write 'clean' to start an array. We either use
RUN_ARRAY, or mdmon writes either read_auto or active.
So that is safe.
So we need to:
- remove the rd%d links earlier
- remove the redundancy group earlier
- disable starting an array by writing 'clean'.
I think I've convinced myself. Have I convinced you?
NeilBrown
next prev parent reply other threads:[~2009-04-29 1:14 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-28 1:24 array_state_store: 'inactive' versus 'clean' race Dan Williams
2009-04-28 4:16 ` Neil Brown
2009-04-28 18:05 ` Dan Williams
2009-04-29 1:14 ` Neil Brown [this message]
2009-04-29 1:42 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18935.43506.654641.312777@notabene.brown \
--to=neilb@suse.de \
--cc=dan.j.williams@intel.com \
--cc=ed.ciechanowski@intel.com \
--cc=jacek.danecki@intel.com \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).