From: Alex Elder <elder@ieee.org>
To: Olivier Bonvalet <ceph.list@daevel.fr>, ceph-devel@vger.kernel.org
Subject: Re: Issue #5876 : assertion failure in rbd_img_obj_callback()
Date: Fri, 04 Apr 2014 20:57:02 -0500
Message-ID: <533F62EE.2060701@ieee.org>
In-Reply-To: <1396660579.2130.103.camel@localhost>

On 04/04/2014 08:16 PM, Olivier Bonvalet wrote:
> On Tuesday 25 March 2014 at 09:39 +0100, Olivier Bonvalet wrote:
>> Hi,
>>
>> what can/should I do to help fix that problem?
>>
>> for now, the RBD kernel client hangs on:
>>         Assertion failure in rbd_img_obj_callback() at line 2131:
>>            rbd_assert(which >= img_request->next_completion);
>>
>> or on :
>>         Assertion failure in rbd_img_obj_callback() at line 2127:
>>             rbd_assert(img_request != NULL);
>>
>>
>> I hit both cases at least once per week, on the latest 3.13.5 kernels.
>>
>> It seems that the problem occurs only on more heavily loaded servers (I have 4
>> nearly identical servers, and the crash occurs on two of them. If I move the VM,
>> the crash follows...).
>>
>> Olivier
>>
>> --
> 
> Hi,
> 
> So, after some days without any problems, RBD crashed tonight:

Unfortunately this could be a symptom of the same sort of race.
When an object request is removed from its image request's list,
the request count gets decremented.  To be honest, all of these
assertions in rbd_img_obj_callback() are probably unsafe, at
least until I get the patch that does proper reference counting
implemented:

        rbd_assert(img_request != NULL);
        rbd_assert(img_request->obj_request_count > 0);
        rbd_assert(which != BAD_WHICH);
        rbd_assert(which < img_request->obj_request_count);

Until then I think you can avoid this by commenting out those
assertions.  I'm afraid there will remain a (smaller) window
of opportunity for a problem to occur, but I believe commenting
those out will help for now.
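
For illustration only, the reference-counting idea could look roughly
like the user-space sketch below.  It is not the rbd patch: img_req,
img_req_create(), img_req_get() and img_req_put() are hypothetical
names, and C11 atomics stand in for whatever the kernel code would
use.  The point is that each in-flight object request holds its own
reference on the image request, so a completion callback cannot see
an image request that has already been torn down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an image request; NOT the struct in rbd.c. */
struct img_req {
	atomic_int refs;	/* submitter + one per in-flight object request */
	bool *freed_flag;	/* optional hook: set to true on destruction */
};

/* Allocate an image request holding one reference for the caller. */
static struct img_req *img_req_create(bool *freed_flag)
{
	struct img_req *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	atomic_init(&req->refs, 1);
	req->freed_flag = freed_flag;
	return req;
}

/* Take an extra reference, e.g. when an object request is attached. */
static void img_req_get(struct img_req *req)
{
	atomic_fetch_add(&req->refs, 1);
}

/*
 * Drop one reference.  Only the caller that drops the last reference
 * frees the request; returns true in that case.
 */
static bool img_req_put(struct img_req *req)
{
	int old = atomic_fetch_sub(&req->refs, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	if (old != 1)
		return false;
	if (req->freed_flag)
		*req->freed_flag = true;
	free(req);
	return true;
}
```

With this shape, a callback would call img_req_put() when it is done
with the image request instead of relying on obj_request_count to
tell it whether the request is still alive.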

I'm very sorry you're hitting these.  I'll see if I can get
a comprehensive fix this weekend.

					-Alex



> Apr  5 02:52:24 rurkh kernel: [799426.461742] 
> Apr  5 02:52:24 rurkh kernel: [799426.461742] Assertion failure in rbd_img_obj_callback() at line 2128:
> Apr  5 02:52:24 rurkh kernel: [799426.461742] 
> Apr  5 02:52:24 rurkh kernel: [799426.461742]   rbd_assert(img_request->obj_request_count > 0);
> Apr  5 02:52:24 rurkh kernel: [799426.461742] 
> Apr  5 02:52:24 rurkh kernel: [799426.461958] ------------[ cut here ]------------
> Apr  5 02:52:24 rurkh kernel: [799426.461997] kernel BUG at drivers/block/rbd.c:2128!
> Apr  5 02:52:24 rurkh kernel: [799426.462036] invalid opcode: 0000 [#1] SMP 
> Apr  5 02:52:24 rurkh kernel: [799426.462080] Modules linked in: cbc rbd libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs libcrc32c bridge loop iTCO_wdt gpio_ich iTCO_vendor_support serio_raw sb_edac edac_core evdev i2c_i801 lpc_ich mfd_core ioatdma shpchp ipmi_si ipmi_msghandler wmi ac button dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_common isci ahci ehci_pci libsas libahci megaraid_sas ehci_hcd libata scsi_transport_sas igb usbcore scsi_mod i2c_algo_bit ixgbe i2c_core usb_common dca ptp pps_core mdio
> Apr  5 02:52:24 rurkh kernel: [799426.462579] CPU: 0 PID: 15975 Comm: kworker/0:0 Not tainted 3.13-dae-dom0 #24
> Apr  5 02:52:24 rurkh kernel: [799426.462644] Hardware name: Supermicro X9DRW-7TPF+/X9DRW-7TPF+, BIOS 3.0 07/24/2013
> Apr  5 02:52:24 rurkh kernel: [799426.462717] Workqueue: ceph-msgr con_work [libceph]
> Apr  5 02:52:24 rurkh kernel: [799426.462759] task: ffff88024cd9a8a0 ti: ffff88021a4e4000 task.ti: ffff88021a4e4000
> Apr  5 02:52:24 rurkh kernel: [799426.462825] RIP: e030:[<ffffffffa0305ae8>]  [<ffffffffa0305ae8>] rbd_img_obj_callback+0x91/0x3a2 [rbd]
> Apr  5 02:52:24 rurkh kernel: [799426.462901] RSP: e02b:ffff88021a4e5ce8  EFLAGS: 00010282
> Apr  5 02:52:24 rurkh kernel: [799426.462940] RAX: 000000000000006d RBX: ffff88023f8f6ec8 RCX: 0000000000000000
> Apr  5 02:52:24 rurkh kernel: [799426.463005] RDX: ffff88027fe0eb50 RSI: ffff88027fe0e1a8 RDI: ffff88021a4e02a8
> Apr  5 02:52:24 rurkh kernel: [799426.463069] RBP: ffff88021c90a718 R08: 0000000000000000 R09: 0000000000000000
> Apr  5 02:52:24 rurkh kernel: [799426.463134] R10: 0000000000000000 R11: 000000000000084e R12: 0000000000000001
> Apr  5 02:52:24 rurkh kernel: [799426.463197] R13: 0000000000000000 R14: ffff88025584a130 R15: 0000000000000000
> Apr  5 02:52:24 rurkh kernel: [799426.481060] FS:  00007f1c6138f720(0000) GS:ffff88027fe00000(0000) knlGS:0000000000000000
> Apr  5 02:52:24 rurkh kernel: [799426.481130] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr  5 02:52:24 rurkh kernel: [799426.481170] CR2: 00007f1c6139f000 CR3: 000000023825c000 CR4: 0000000000042660
> Apr  5 02:52:24 rurkh kernel: [799426.481235] Stack:
> Apr  5 02:52:24 rurkh kernel: [799426.481266]  000000000000000d ffff880254da107d ffffffffffffffff ffff880254da1048
> Apr  5 02:52:24 rurkh kernel: [799426.481349]  ffff88025584a128 ffff88026dc59718 0000000000000000 ffff88025584a130
> Apr  5 02:52:24 rurkh kernel: [799426.481429]  0000000000000000 ffffffffa02e4595 0000000000000015 ffff88026dc59770
> Apr  5 02:52:24 rurkh kernel: [799426.481510] Call Trace:
> Apr  5 02:52:24 rurkh kernel: [799426.481554]  [<ffffffffa02e4595>] ? dispatch+0x3e4/0x55e [libceph]
> Apr  5 02:52:24 rurkh kernel: [799426.481600]  [<ffffffffa02df0fc>] ? con_work+0xf6e/0x1a65 [libceph]
> Apr  5 02:52:24 rurkh kernel: [799426.481646]  [<ffffffff81051f83>] ? mmdrop+0xd/0x1c
> Apr  5 02:52:24 rurkh kernel: [799426.481687]  [<ffffffff8105265e>] ? finish_task_switch+0x4d/0x83
> Apr  5 02:52:24 rurkh kernel: [799426.481732]  [<ffffffff810484d7>] ? process_one_work+0x15a/0x214
> Apr  5 02:52:24 rurkh kernel: [799426.481775]  [<ffffffff8104895b>] ? worker_thread+0x139/0x1de
> Apr  5 02:52:24 rurkh kernel: [799426.481817]  [<ffffffff81048822>] ? rescuer_thread+0x26e/0x26e
> Apr  5 02:52:24 rurkh kernel: [799426.481859]  [<ffffffff8104cff6>] ? kthread+0x9e/0xa6
> Apr  5 02:52:24 rurkh kernel: [799426.481900]  [<ffffffff8104cf58>] ? __kthread_parkme+0x55/0x55
> Apr  5 02:52:24 rurkh kernel: [799426.481944]  [<ffffffff8137260c>] ? ret_from_fork+0x7c/0xb0
> Apr  5 02:52:24 rurkh kernel: [799426.481985]  [<ffffffff8104cf58>] ? __kthread_parkme+0x55/0x55
> Apr  5 02:52:24 rurkh kernel: [799426.482025] Code: 26 06 e1 0f 0b 8b 45 5c 85 c0 75 21 48 c7 c1 66 88 30 a0 ba 50 08 00 00 48 c7 c6 50 99 30 a0 48 c7 c7 1f 81 30 a0 e8 5b 26 06 e1 <0f> 0b 41 83 fc ff 75 23 48 c7 c1 f4 8b 30 a0 ba 51 08 00 00 31 
> Apr  5 02:52:24 rurkh kernel: [799426.482413] RIP  [<ffffffffa0305ae8>] rbd_img_obj_callback+0x91/0x3a2 [rbd]
> Apr  5 02:52:24 rurkh kernel: [799426.482462]  RSP <ffff88021a4e5ce8>
> Apr  5 02:52:24 rurkh kernel: [799426.483907] ---[ end trace 4aea8b8c107c24be ]---
> 
> 
> At that time there was a lot of I/O, because of backups running in the VMs
> (but no RBD snapshot creation or removal).
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
