From: Guangliang Zhao <lucienchao@gmail.com>
To: Josh Durgin <josh.durgin@inktank.com>
Cc: Alex Elder <alex.elder@linaro.org>,
Guangliang Zhao <guangliang@unitedstack.com>,
ceph-devel@vger.kernel.org, sage@inktank.com,
lucienchao@gmail.com
Subject: Re: [PATCH v3] rbd: add ioctl for rbd
Date: Tue, 24 Sep 2013 11:06:57 +0800 [thread overview]
Message-ID: <20130924030657.GA8545@x230> (raw)
In-Reply-To: <523FF68B.80203@inktank.com>
On Mon, Sep 23, 2013 at 01:06:35AM -0700, Josh Durgin wrote:
> On 09/21/2013 08:53 AM, Alex Elder wrote:
> >On 09/18/2013 01:35 AM, Guangliang Zhao wrote:
> >>When running the following commands:
> >> [root@ceph0 mnt]# blockdev --setro /dev/rbd1
> >> [root@ceph0 mnt]# blockdev --getro /dev/rbd1
> >> 0
> >>
> >>The block setro didn't take effect, it is because
> >>the rbd doesn't support ioctl of block driver.
> >
> >Nicely done.
> >
> >I have a couple small comments below, and one simple
> >change that needs to be made.
> >
> >I also point out another issue about the use of
> >the spinlock to protect against read-only state
> >changes; I'd like Josh to weigh in on how he thinks
> >it might best be handled.
> >
> >Provided you make the simple change, and Josh has no
> >problem with the read-only state thing:
> >
> >Reviewed-by: Alex Elder <elder@linaro.org>
> >
> >The required change is something Josh mentioned. In
> >rbd_request_fn(), the variable read_only is defined
> >and assigned at the top of the function. That line
> >needs to be inside the while loop, to ensure that
> >the most up-to-date value is used (in case it gets
> >changed between requests).
Sorry, forgot it.
> >
> >>The results with this patch:
> >>root@ceph-client01:~# rbd map img
> >>root@ceph-client01:~# blockdev --getro /dev/rbd1
> >>0
> >>root@ceph-client01:~# cat /sys/block/rbd1/ro
> >>0
> >>root@ceph-client01:~# blockdev --setro /dev/rbd1
> >>root@ceph-client01:~# blockdev --getro /dev/rbd1
> >>1
> >>root@ceph-client01:~# cat /sys/block/rbd1/ro
> >>1
> >>root@ceph-client01:~# dd if=/dev/zero of=/dev/rbd1 count=1
> >>dd: opening `/dev/rbd1': Read-only file system`
> >>root@ceph-client01:~# blockdev --setrw /dev/rbd1
> >>root@ceph-client01:~# blockdev --getro /dev/rbd1
> >>0
> >>root@ceph-client01:~# cat /sys/block/rbd1/ro
> >>0
> >>root@ceph-client01:~# dd if=/dev/zero of=/dev/rbd1 count=1
> >>1+0 records in
> >>1+0 records out
> >>512 bytes (512 B) copied, 0.14989 s, 3.4 kB/s
> >
> >I don't necessarily want to see all this testing output
> >in the commit description (a summary would suffice). As
> >Josh suggested, some sort of test script to validate all
> >of this would be much appreciated. It can be very simple:
> >- map rbd device
> >- run "blockdev --getro" on it, and verify it reports 0
> >- run "blockdev --setro" and verify it is now read-only
> >... and so on.
> >
> >Tests we suggested, which I expect will pass but I
> >don't see evidence of it in your description above:
> >- verify changing the state (setro or setrw) fails on
> > a device that's already open (e.g. mounted) (EBUSY)
> >- verify setro on a snapshot that's mounted (EBUSY)
> >- verify setrw on a non-snapshot image that is mapped
> > read-only (EROFS)
All of above have been tested manually, and I also think it should be
better if add these scripts into qa/rbd/rbd.sh
>
> To be clear, you have to use the sysfs interface with the 'ro' option
> to test this case, rather than using the 'rbd map' command, which
> has no way to pass the 'ro' or 'rw' options.
>
> The syntax is documented in Documentation/ABI/testing/sysfs-bus-rbd.
Thanks.
BTW, I have submitted a patch for it, and "rbd map" could map readonly
device soon.
>
> >- verify setrw on a snapshot fails (EROFS)
> >
> >>This resolves:
> >> http://tracker.ceph.com/issues/6265
> >>
> >>Signed-off-by: Guangliang Zhao <guangliang@unitedstack.com>
> >>---
> >> drivers/block/rbd.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 59 insertions(+)
> >>
> >>diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> >>index 2f00778..fea826d 100644
> >>--- a/drivers/block/rbd.c
> >>+++ b/drivers/block/rbd.c
> >>@@ -508,10 +508,69 @@ static void rbd_release(struct gendisk *disk, fmode_t mode)
> >> put_device(&rbd_dev->dev);
> >> }
> >>
> >>+static int rbd_ioctl_set_ro(struct rbd_device *rbd_dev, unsigned long arg)
> >>+{
> >>+ int val;
> >>+ bool ro;
> >>+
> >>+ if (get_user(val, (int __user *)(arg)))
> >>+ return -EFAULT;
> >>+
> >>+ ro = val ? true : false;
> >>+ /* Snapshot doesn't allow to write*/
> >>+ if (rbd_dev->spec->snap_id != CEPH_NOSNAP && !ro)
> >>+ return -EROFS;
> >>+
> >>+ if (rbd_dev->mapping.read_only != ro) {
> >>+ rbd_dev->mapping.read_only = ro;
> >>+ set_disk_ro(rbd_dev->disk, ro ? 1 : 0);
> >
> >This is interesting. Josh mentioned that the set_device_ro()
> >call you had here previously was not needed, because the generic
> >ioctl code handled it.
> >
> >But I believe calling set_disk_ro() (which triggers a udev event)
> >is the right thing to do, even though rbd doesn't support partitions.
> >In fact, it might be more appropriate for the generic code to call
> >that using bdev->bd_disk. Hmmm. (I don't have time to look into
> >this any further right now, maybe if someone thinks it's worth
> >pursuing they can do so.)
> >
> >In the mean time this does more than set_device_ro(), and the
> >redundancy is harmless. HOWEVER I believe that calling
> >this while holding the spinlock will cause locking problems
> >(or may just be a bad idea). I don't know for sure; let
> >lockdep be your guide.
>
> I'm not sure about this either, it'd be good to verify it's
> ok by running with lockdep.
I have turned on the lockdep, and I couldn't see anything related with
rbd in /proc/lockdep and dmesg, so I think that should be OK.
>
> >>+ }
> >>+
> >>+ return 0;
> >>+}
> >>+
> >>+static int rbd_ioctl(struct block_device *bdev, fmode_t mode,
> >>+ unsigned int cmd, unsigned long arg)
> >>+{
> >>+ struct rbd_device *rbd_dev = bdev->bd_disk->private_data;
> >>+ int ret = 0;
> >>+
> >>+ spin_lock_irq(&rbd_dev->lock);
> >>+ /* prevent others open this device */
> >
> >I think indicating something more like "make sure we hold the only
> >reference to this device" would make it clear why you're checking
> >against 1 rather than 0.
> >
> >>+ if (rbd_dev->open_count > 1) {
> >>+ ret = -EBUSY;
> >>+ goto out;
> >>+ }
> >
> >You are adding a new ability to change a fundamental state
> >for an image. You're using the spinlock now to protect
> >the read-only state, where previously it was only used to
> >protect the open count and REMOVING state/flag. This
> >may mean you should check the read_only flag in rbd_open()
> >inside the spinlock there. And you may need to call the
> >set_device_ro() there under protection of that lock.
> >
> >As I mentioned above, holding the lock could also lead
> >to a problem when calling outside code (set_disk_ro()).
> >If that's the case you may need to devise a different
> >(possibly additional) way to protect this.
> >
> >Josh, please offer your insights.
>
> I think with the current structure (rbd_open() calling set_device_ro())
> protecting this with a spinlock won't cause a lock inversion, since
> rbd_open() is not called with the bdev lock held.
>
> With the change to rbd_request_fn() Alex mentioned, could you verify
> it has no problems with lockdep enabled? If so, it looks good to me.
All of the changes and lockdep enabled.
>
> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
>
> >>+
> >>+ switch (cmd) {
> >>+ case BLKROSET:
> >>+ ret = rbd_ioctl_set_ro(rbd_dev, arg);
> >>+ break;
> >>+ default:
> >>+ ret = -ENOTTY;
> >>+ }
> >>+
> >>+out:
> >>+ spin_unlock_irq(&rbd_dev->lock);
> >>+ return ret;
> >>+}
> >>+
> >>+#ifdef CONFIG_COMPAT
> >>+static int rbd_compat_ioctl(struct block_device *bdev, fmode_t mode,
> >>+ unsigned int cmd, unsigned long arg)
> >>+{
> >>+ return rbd_ioctl(bdev, mode, cmd, arg);
> >>+}
> >>+#endif /* CONFIG_COMPAT */
> >>+
> >> static const struct block_device_operations rbd_bd_ops = {
> >> .owner = THIS_MODULE,
> >> .open = rbd_open,
> >> .release = rbd_release,
> >>+ .ioctl = rbd_ioctl,
> >>+#ifdef CONFIG_COMPAT
> >>+ .compat_ioctl = rbd_compat_ioctl,
> >>+#endif
> >> };
> >>
> >> /*
> >>
> >
>
--
Best regards,
Guangliang
prev parent reply other threads:[~2013-09-24 3:07 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-18 6:35 [PATCH v3] rbd: add ioctl for rbd Guangliang Zhao
2013-09-21 15:53 ` Alex Elder
2013-09-23 8:06 ` Josh Durgin
2013-09-24 3:06 ` Guangliang Zhao [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130924030657.GA8545@x230 \
--to=lucienchao@gmail.com \
--cc=alex.elder@linaro.org \
--cc=ceph-devel@vger.kernel.org \
--cc=guangliang@unitedstack.com \
--cc=josh.durgin@inktank.com \
--cc=sage@inktank.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.