From: Alex Elder <elder@ieee.org>
To: Olivier Bonvalet <ceph.list@daevel.fr>,
	Ilya Dryomov <ilya.dryomov@inktank.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Issue #5876 : assertion failure in rbd_img_obj_callback()
Date: Tue, 25 Mar 2014 08:29:19 -0500
Message-ID: <533184AF.9050101@ieee.org>
In-Reply-To: <1395753516.2823.37.camel@localhost>

On 03/25/2014 08:18 AM, Olivier Bonvalet wrote:
>
>
> On Tuesday, 25 March 2014 at 14:57 +0200, Ilya Dryomov wrote:
>> On Tue, Mar 25, 2014 at 2:51 PM, Alex Elder <elder@ieee.org> wrote:
>>> On 03/25/2014 07:34 AM, Ilya Dryomov wrote:
>>>>> On 03/25/2014 04:04 AM, Ilya Dryomov wrote:
>>>>>> On Tue, Mar 25, 2014 at 10:39 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> What can/should I do to help fix that problem?
>>>>>>>
>>>>>>> For now, the RBD kernel client hangs on:
>>>>>>> Assertion failure in rbd_img_obj_callback() at line 2131:
>>>>>>> rbd_assert(which >= img_request->next_completion);
>>>>>
>>>>> If you can build your own kernel as Ilya says I'd like to
>>>>> see the values of which and img_request->next_completion
>>>>> here.
>>>>
>>>> Looks like which was 1, which means that next_completion had to be 2 or
>>>> greater. I miss solaris crash dumps ...
>>>>
>>>> On a different note, why are we asserting next_completion outside of
>>>> a spinlock which is supposed to protect next_completion?
>>>
>>> That's a very good point (which could be easily remedied by moving
>>> the assertion down a couple lines). The image object request (#1)
>>> in this case will have been marked done at this point; it's possible
>>> that request #2 (or later) was concurrently getting handled by the
>>> for_each_obj_request_from() loop below in that same function, but
>>> may not have updated next_completion yet.
>>>
>>> So that *could* explain the tripped assertion. The assertion
>>> should be moved in any case, it's a bug.
>>>
>>> That being said, it doesn't explain the other assertion:
>>> rbd_assert(img_request != NULL);
>>> So there's at least one other thing going on.
>>
>> Yeah, exactly my thoughts.
>>
>> Thanks,
>>
>> Ilya
>
> So, could this patch be a (partial) fix?
>
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -2123,6 +2123,7 @@ static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
>  	rbd_assert(obj_request_img_data_test(obj_request));
>  	img_request = obj_request->img_request;
>
> +	spin_lock_irq(&img_request->completion_lock);
>  	dout("%s: img %p obj %p\n", __func__, img_request, obj_request);
>  	rbd_assert(img_request != NULL);
>  	rbd_assert(img_request->obj_request_count > 0);
> @@ -2130,7 +2131,6 @@ static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
>  	rbd_assert(which < img_request->obj_request_count);
>  	rbd_assert(which >= img_request->next_completion);
>
> -	spin_lock_irq(&img_request->completion_lock);
>  	if (which != img_request->next_completion)
>  		goto out;

Yes, roughly.  I'd do the following instead.  It would be great
to learn whether it eliminates the one form of assertion failure
you were seeing.

					-Alex

--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2128,11 +2128,11 @@ static void rbd_img_obj_callback(struct
 	rbd_assert(img_request->obj_request_count > 0);
 	rbd_assert(which != BAD_WHICH);
 	rbd_assert(which < img_request->obj_request_count);
-	rbd_assert(which >= img_request->next_completion);
 
 	spin_lock_irq(&img_request->completion_lock);
 	if (which != img_request->next_completion)
 		goto out;
+	rbd_assert(which >= img_request->next_completion);
 
 	for_each_obj_request_from(img_request, obj_request) {
 		rbd_assert(more);
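
For illustration only, here is a small userspace sketch of the point
discussed above: a check on next_completion is only meaningful while
the lock that protects it is held.  This is not code from rbd.c.  It
uses a pthread mutex in place of the kernel spinlock, and struct
img_req, struct obj_req, complete_obj() and run_demo() are hypothetical
names made up for the example.  It also marks each request done under
the lock, which is a simplification of what the driver does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define OBJ_COUNT 8

/* Hypothetical stand-in for an image request; not the rbd.c structure. */
struct img_req {
	pthread_mutex_t lock;		/* protects next_completion and done[] */
	unsigned int next_completion;	/* lowest index not yet retired */
	bool done[OBJ_COUNT];
};

/* Hypothetical stand-in for one object request of that image request. */
struct obj_req {
	struct img_req *img;
	unsigned int which;
};

/* Called once per object request, possibly concurrently and out of order. */
static void complete_obj(struct obj_req *obj)
{
	struct img_req *img = obj->img;
	unsigned int which = obj->which;

	pthread_mutex_lock(&img->lock);
	/* Checked under the lock: nobody can advance next_completion now. */
	assert(which >= img->next_completion);
	img->done[which] = true;
	if (which == img->next_completion) {
		/* Retire this request and any later ones already finished. */
		while (img->next_completion < OBJ_COUNT &&
		       img->done[img->next_completion])
			img->next_completion++;
	}
	pthread_mutex_unlock(&img->lock);
}

static void *worker(void *arg)
{
	complete_obj(arg);
	return NULL;
}

/* Completes OBJ_COUNT requests from separate threads; returns the
 * final value of next_completion. */
unsigned int run_demo(void)
{
	struct img_req img = { .next_completion = 0 };
	struct obj_req objs[OBJ_COUNT];
	pthread_t threads[OBJ_COUNT];
	unsigned int i;
	int rc;

	pthread_mutex_init(&img.lock, NULL);
	for (i = 0; i < OBJ_COUNT; i++) {
		objs[i].img = &img;
		objs[i].which = i;
		rc = pthread_create(&threads[i], NULL, worker, &objs[i]);
		assert(rc == 0);
	}
	for (i = 0; i < OBJ_COUNT; i++)
		pthread_join(threads[i], NULL);
	pthread_mutex_destroy(&img.lock);
	return img.next_completion;
}
```

Built with -pthread and called from a main(), run_demo() should return
OBJ_COUNT once every worker has finished, whatever order the threads
ran in.  Moving the assert() above pthread_mutex_lock() would make it a
read of next_completion that races with the loop advancing it.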
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html