linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Tejun Heo <teheo@suse.de>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	Doug Ledford <dledford@redhat.com>, Greg KH <greg@kroah.com>,
	Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [PATCH 1/2] md: make devices disappear when they are no longer needed.
Date: Fri, 28 Nov 2008 11:23:14 +1100	[thread overview]
Message-ID: <18735.14834.899120.164099@notabene.brown> (raw)
In-Reply-To: message from Al Viro on Monday November 24

On Monday November 24, viro@ZenIV.linux.org.uk wrote:
> 
> I really think that it's much saner than trying to change the lifetime
> rules for gendisk, etc.

I'm not sure there is anything wrong with the lifetime rules for
gendisk....

In my mind a big issue is:
   What should happen if "open" is racing with "destroy device" ?

For 'sd', if you open before the decision has been made to tear down
the data structures, the open succeeds but you start getting EIO on
any IO.
If you open after the decision, you get ENODEV (or ENXIO?).
If you wait a little longer and some other device gets hot-plugged,
then the open succeeds on a different physical device.

For 'md' the situation is quite different, due to unfortunate legacy
interface design.
An open should never fail with ENODEV.
It might return a device on which read/write always fails and only
various 'ioctls' work, or it might return a device which is fully
working.

So we need to close the window in which ENODEV can be returned.

I think the code I wrote is the best way to close that window.  If we
detect that we are in the window during open, wait for the window to
close and retry, by returning ERESTARTSYS.


I've thought more about the possibility of the disk_release method
which is used to tear down the md data structures but I don't think it
would really answer the need.

A key point in time is when blk_unregister_region is called to
remove the mapping from minor number to the gendisk.
At any time before this, new references can be taken by an open.
At any time after this, any open will be directed through ->probe
which can do any necessary interlocking with ->open.

Currently we need to call blk_unregister_region before the final
put_disk otherwise a new reference could be taken while that put_disk
is running.
blk_unregister_region is normally called by unlink_gendisk called by
del_gendisk.
So we need to call del_gendisk before the final call to put_disk.
And once we have called del_gendisk we may as well clean up the md
specific stuff there and not bother with a ->disk_release method.

So I'm having trouble seeing what is wrong with my approach (other
than the fact it implements an unfortunate choice of user-space
interface, which we agree is a legacy requirement) or how anything
else could provide a better solution.

I can appreciate that there might be other issues with gendisk lifetime
(such as the fact that get_disk takes a reference to the module but
put_disk doesn't) but I suspect they are largely orthogonal to my
current issue.

Finally, I suspect that there is a problem with my patch as it stands
with respect to module reference counting.  I'll have to explore to
be sure exactly what is happening, but I strongly suspect it is
wrong.  Thank you for highlighting that issue for me.

NeilBrown

  parent reply	other threads:[~2008-11-28  0:23 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-24  3:55 [PATCH 0/2] RFC: allow md devices to disappear when not in use NeilBrown
2008-11-24  3:55 ` [PATCH 1/2] md: make devices disappear when they are no longer needed NeilBrown
2008-11-24  4:18   ` Tejun Heo
2008-11-24  5:13     ` Neil Brown
2008-11-24  5:34       ` Tejun Heo
2008-11-24  6:10         ` NeilBrown
2008-11-24  6:12           ` Tejun Heo
2008-11-24  6:24         ` Al Viro
2008-11-24  6:56           ` Tejun Heo
2008-11-24 13:31             ` Al Viro
2008-11-24 14:04               ` Tejun Heo
2008-11-24 14:26                 ` Tejun Heo
2008-11-24 14:48                   ` Al Viro
2008-11-24 16:08                     ` Tejun Heo
2008-11-24 16:42                       ` Al Viro
2008-11-24 17:18                         ` Tejun Heo
2008-11-28  0:23                     ` Neil Brown [this message]
2008-11-24  4:24   ` Al Viro
2008-11-24  4:47     ` Neil Brown
2008-11-24  6:38       ` Al Viro
2008-11-24  3:55 ` [PATCH 2/2] Allow md devices to be created by name NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18735.14834.899120.164099@notabene.brown \
    --to=neilb@suse.de \
    --cc=dledford@redhat.com \
    --cc=greg@kroah.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=teheo@suse.de \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).