All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Elder <alex.elder@linaro.org>
To: Josh Durgin <josh.durgin@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH v3 3/5] rbd: fix use-after free of rbd_dev->disk
Date: Mon, 09 Sep 2013 11:43:26 -0500	[thread overview]
Message-ID: <522DFAAE.8040902@linaro.org> (raw)
In-Reply-To: <522DF405.6080602@inktank.com>

On 09/09/2013 11:15 AM, Josh Durgin wrote:
> On 09/09/2013 06:37 AM, Alex Elder wrote:
>> On 09/09/2013 02:17 AM, Josh Durgin wrote:
>>> Removing a device deallocates the disk, unschedules the watch, and
>>> finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
>>> from the watch callback, updates the disk size and rbd_dev
>>> structure. With no locking between them, rbd_dev_refresh() may use the
>>> device or rbd_dev after they've been freed.

. . .

>>> +    if (!removing) {
>>
>> Here's the problem.  It is the same one faced by the open path.
>>
>> You determined above whether or not the device was getting removed.
>> But you haven't left any state that indicates that the image is
>> getting refreshed (or more specifically, getting its size updated).
>>
>> So, if the device is mapped but isn't actually opened by anybody
>> there's nothing stopping the code in rbd_remove() from going
>> ahead anyway.  So the possibility of the structures getting freed
>> remains (though the window is a little smaller now).
> 
> I think this is safe with the use of ceph_osdc_flush_notifies()
> before rbd_bus_del_dev(). Since the watch is already unregistered,
> flushing the notifies waits for all calls to rbd_watch_cb() to
> complete. This means that rbd_remove() can't actually free
> any of the rbd_dev structures until after the last revalidate_disk()
> etc. is done. Does this make sense, or am I missing something?

Yes, now I remember.  You're right.  That was the point of
doing the flush_notifies call before release anyway...

So my "nothing stopping the code in rbd_remove" was wrong.
In fact, that ceph_osdc_flush_notifies() call is stopping
it from freeing things that may still be in use.

Thanks for explaining it to me (again).

> This should probably go in a comment in rbd_remove(), since
> it's not obvious from the code there why the ordering of
> the last few calls makes sense.

Yes.  I think so.

Reviewed-by: Alex Elder <elder@linaro.org>

> Josh
> 
>> I think you can resolve this problem by using the open count.
>> That is, use the same interlock between the open count and the
>> removing flag that's used for rbd_open() and rbd_remove().
>> The open count in some sense represents someone still needing
>> the data structures for the image.  So treat the refresh activity
>> as one of those "somebodies" by claiming one of those opens
>> while the size update is going on.
>>
>> If you encapsulated some code now in place for this purpose,
>> something like the following might be helpful.  (I've renamed
>> "open_count" to be "inuse_count" here.)
>>
>> static inline int rbd_request_access(struct rbd_device *rbd_dev)
>> {
>>     int ret = 0;
>>
>>     spin_lock_irq(&rbd_dev->lock);
>>     if (!test_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags))
>>         rbd_dev->inuse_count++;
>>     else
>>         ret = -ENOENT;
>>     spin_unlock_irq(&rbd_dev->lock);
>>
>>     return ret;
>> }
>>
>> static inline int rbd_release_access(struct rbd_device *rbd_dev)
>> {
>>     int ret = 0;
>>
>>     spin_lock_irq(&rbd_dev->lock);
>>     if (rbd_dev->inuse_count)
>>         rbd_dev->inuse_count--;
>>     else
>>         ret = -EINVAL;
>>     spin_unlock_irq(&rbd_dev->lock);
>>
>>     return ret;
>> }
>>
>> static inline int rbd_prevent_access(struct rbd_device *rbd_dev)
>> {
>>     int ret = 0;
>>
>>     spin_lock_irq(&rbd_dev->lock);
>>     if (rbd_dev->inuse_count)
>>         ret = -EBUSY;
>>     else if (test_and_set_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags))
>>         ret = -EINPROGRESS;
>>     spin_unlock_irq(&rbd_dev->lock);
>>
>>     return ret;
>> }
>>
>> I'll assume you know what I'm talking about at this point and
>> won't go into exactly where and how you'd use these.
>>
>>> +        size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
>>> +        dout("setting size to %llu sectors", (unsigned long long)size);
>>> +        set_capacity(rbd_dev->disk, size);
>>> +        revalidate_disk(rbd_dev->disk);
>>> +    }
>>> +}+-
>>> +
>>>   static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>>>   {
>>>       u64 mapping_size;
>>
>> Everything below is good.
>>
>>> @@ -3344,12 +3369,7 @@ static int rbd_dev_refresh(struct rbd_device
>>> *rbd_dev)
>>>       up_write(&rbd_dev->header_rwsem);
>>>
>>>       if (mapping_size != rbd_dev->mapping.size) {
>>> -        sector_t size;
>>> -
>>> -        size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
>>> -        dout("setting size to %llu sectors", (unsigned long long)size);
>>> -        set_capacity(rbd_dev->disk, size);
>>> -        revalidate_disk(rbd_dev->disk);
>>> +        rbd_dev_update_size(rbd_dev);
>>>       }
>>>
>>>       return ret;
>>> @@ -5160,7 +5180,6 @@ static ssize_t rbd_remove(struct bus_type *bus,
>>>       if (ret < 0 || already)
>>>           return ret;
>>>
>>> -    rbd_bus_del_dev(rbd_dev);
>>>       ret = rbd_dev_header_watch_sync(rbd_dev, false);
>>>       if (ret)
>>>           rbd_warn(rbd_dev, "failed to cancel watch event (%d)\n", ret);
>>> @@ -5171,6 +5190,7 @@ static ssize_t rbd_remove(struct bus_type *bus,
>>>        */
>>>       dout("%s: flushing notifies", __func__);
>>>       ceph_osdc_flush_notifies(&rbd_dev->rbd_client->client->osdc);
>>> +    rbd_bus_del_dev(rbd_dev);
>>>       rbd_dev_image_release(rbd_dev);
>>>       module_put(THIS_MODULE);
>>>
>>>
>>
> 


  reply	other threads:[~2013-09-09 16:43 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-09  7:16 [PATCH v3 0/5] fix shutdown races and snapshot error handling Josh Durgin
2013-09-09  7:16 ` [PATCH v3 1/5] rbd: complete notifies before cleaning up osd_client and rbd_dev Josh Durgin
2013-09-09  7:16 ` [PATCH v3 2/5] rbd: make rbd_obj_notify_ack() synchronous Josh Durgin
2013-09-09 12:06   ` Alex Elder
2013-09-09  7:17 ` [PATCH v3 3/5] rbd: fix use-after free of rbd_dev->disk Josh Durgin
2013-09-09 13:37   ` Alex Elder
2013-09-09 16:15     ` Josh Durgin
2013-09-09 16:43       ` Alex Elder [this message]
2013-09-09  7:17 ` [PATCH v3 4/5] rbd: ignore unmapped snapshots that no longer exist Josh Durgin
2013-09-09 13:47   ` Alex Elder
2013-09-09  7:17 ` [PATCH v3 5/5] rbd: fix error handling from rbd_snap_name() Josh Durgin
2013-09-09 13:49   ` Alex Elder
2013-09-09 13:50 ` [PATCH v3 0/5] fix shutdown races and snapshot error handling Alex Elder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=522DFAAE.8040902@linaro.org \
    --to=alex.elder@linaro.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=josh.durgin@inktank.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.