* Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) @ 2015-01-05 10:53 Chaitanya Huilgol 2015-01-05 11:15 ` Wido den Hollander 2015-01-05 15:57 ` Ilya Dryomov 0 siblings, 2 replies; 15+ messages in thread From: Chaitanya Huilgol @ 2015-01-05 10:53 UTC (permalink / raw) To: ceph-devel@vger.kernel.org Hi All, The stock ceph-client modules with Ubuntu 14.04 LTS are quite dated and we are seeing crashes and soft-lockup issues which have been fixed in the current ceph-client code base. What would be recommended ceph-client branch compatible with the Ubuntu 14.04 (3.13.0-x) kernels so that we can get as many fixes as possible? Regards, Chaitanya ________________________________ PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 10:53 Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) Chaitanya Huilgol @ 2015-01-05 11:15 ` Wido den Hollander 2015-01-08 8:49 ` joel.merrick 2015-01-05 15:57 ` Ilya Dryomov 1 sibling, 1 reply; 15+ messages in thread From: Wido den Hollander @ 2015-01-05 11:15 UTC (permalink / raw) To: Chaitanya Huilgol, ceph-devel@vger.kernel.org On 05-01-15 11:53, Chaitanya Huilgol wrote: > Hi All, > > The stock ceph-client modules with Ubuntu 14.04 LTS are quite dated and we are seeing crashes and soft-lockup issues which have been fixed in the current ceph-client code base. > What would be recommended ceph-client branch compatible with the Ubuntu 14.04 (3.13.0-x) kernels so that we can get as many fixes as possible? > I recommend you take a look here: http://kernel.ubuntu.com/~kernel-ppa/mainline/ That should give you some new kernels. Wido > Regards, > Chaitanya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 11:15 ` Wido den Hollander @ 2015-01-08 8:49 ` joel.merrick 0 siblings, 0 replies; 15+ messages in thread From: joel.merrick @ 2015-01-08 8:49 UTC (permalink / raw) To: Wido den Hollander; +Cc: Chaitanya Huilgol, ceph-devel@vger.kernel.org On Mon, Jan 5, 2015 at 11:15 AM, Wido den Hollander <wido@42on.com> wrote: > > > On 05-01-15 11:53, Chaitanya Huilgol wrote: >> >> Hi All, >> >> The stock ceph-client modules with Ubuntu 14.04 LTS are quite dated and we >> are seeing crashes and soft-lockup issues which have been fixed in the >> current ceph-client code base. >> What would be recommended ceph-client branch compatible with the Ubuntu >> 14.04 (3.13.0-x) kernels so that we can get as many fixes as possible? >> > > I recommend you take a look here: > http://kernel.ubuntu.com/~kernel-ppa/mainline/ > > That should give you some new kernels. Just to throw in another method... Ubuntu also ship the kernels for newer non-LTS releases (as well as X components and other cherry picked bits) in their LTS releases too. So linux-generic-lts-utopic currently exists in 14.04. As Ilya mentioned though, the fixes should be marked anyway in the stock ubuntu 14.04 kernels (I use them without issue but use case may vary), but could be useful knowledge for someone. -- $ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge' ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 10:53 Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) Chaitanya Huilgol 2015-01-05 11:15 ` Wido den Hollander @ 2015-01-05 15:57 ` Ilya Dryomov 2015-01-05 17:11 ` Somnath Roy 2015-01-06 2:36 ` Chaitanya Huilgol 1 sibling, 2 replies; 15+ messages in thread From: Ilya Dryomov @ 2015-01-05 15:57 UTC (permalink / raw) To: Chaitanya Huilgol; +Cc: ceph-devel@vger.kernel.org On Mon, Jan 5, 2015 at 1:53 PM, Chaitanya Huilgol <Chaitanya.Huilgol@sandisk.com> wrote: > Hi All, > > The stock ceph-client modules with Ubuntu 14.04 LTS are quite dated and we are seeing crashes and soft-lockup issues which have been fixed in the current ceph-client code base. > What would be recommended ceph-client branch compatible with the Ubuntu 14.04 (3.13.0-x) kernels so that we can get as many fixes as possible? We actively mark rbd (not so much cephfs) fixes for stable and Ubuntu kernel team generally picks them up. 3.13 series should have most of the important fixes, although I haven't counted. What issues in particular you are running into? uname -a? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
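Ilya's request for the `uname -a` output comes down to knowing which kernel series and Ubuntu ABI build a client runs. Below is a minimal Python sketch of reading that from the release string. It assumes an Ubuntu-style release of the form `3.13.0-43-generic`; that value is a made-up example, not taken from the reporters' machines, and `parse_release` and `is_313_series` are illustrative helper names, not part of any Ceph or Ubuntu tool.

```python
import platform
import re
from typing import NamedTuple, Optional


class KernelRelease(NamedTuple):
    major: int
    minor: int
    patch: int
    abi: Optional[int]  # Ubuntu ABI/build number, None if absent
    flavour: str        # e.g. "generic"; empty string if absent


# Assumed layout: <major>.<minor>.<patch>[-<abi>][-<flavour>]
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\d+))?(?:-(.+))?$")


def parse_release(release: str) -> KernelRelease:
    """Split a kernel release string (as printed by `uname -r`) into fields."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch, abi, flavour = match.groups()
    return KernelRelease(
        int(major),
        int(minor),
        int(patch),
        int(abi) if abi is not None else None,
        flavour or "",
    )


def is_313_series(release: str) -> bool:
    """True if the release belongs to the 3.13 series discussed in this thread."""
    parsed = parse_release(release)
    return (parsed.major, parsed.minor) == (3, 13)


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    running = platform.release()
    try:
        print(running, "-> 3.13 series:", is_313_series(running))
    except ValueError as err:
        print(err)
```

Pasting the full `uname -a` line into a reply is still the simplest answer; the sketch is only useful when the same check has to run across many clients.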
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 15:57 ` Ilya Dryomov @ 2015-01-05 17:11 ` Somnath Roy 2015-01-05 18:50 ` Ilya Dryomov 2015-01-06 2:36 ` Chaitanya Huilgol 1 sibling, 1 reply; 15+ messages in thread From: Somnath Roy @ 2015-01-05 17:11 UTC (permalink / raw) To: Ilya Dryomov, Chaitanya Huilgol; +Cc: ceph-devel@vger.kernel.org Ilya, The main issue we are facing the krbd client crash in case of cluster node reboot. Is this fix backported to any 14.04 stable LTS kernel ? If not, please suggest a workaround for this as upgrading kernel may not be an option. Thanks & Regards Somnath -----Original Message----- From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Ilya Dryomov Sent: Monday, January 05, 2015 7:58 AM To: Chaitanya Huilgol Cc: ceph-devel@vger.kernel.org Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) On Mon, Jan 5, 2015 at 1:53 PM, Chaitanya Huilgol <Chaitanya.Huilgol@sandisk.com> wrote: > Hi All, > > The stock ceph-client modules with Ubuntu 14.04 LTS are quite dated and we are seeing crashes and soft-lockup issues which have been fixed in the current ceph-client code base. > What would be recommended ceph-client branch compatible with the Ubuntu 14.04 (3.13.0-x) kernels so that we can get as many fixes as possible? We actively mark rbd (not so much cephfs) fixes for stable and Ubuntu kernel team generally picks them up. 3.13 series should have most of the important fixes, although I haven't counted. What issues in particular you are running into? uname -a? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 17:11 ` Somnath Roy @ 2015-01-05 18:50 ` Ilya Dryomov 2015-01-05 20:01 ` Somnath Roy 0 siblings, 1 reply; 15+ messages in thread From: Ilya Dryomov @ 2015-01-05 18:50 UTC (permalink / raw) To: Somnath Roy; +Cc: Chaitanya Huilgol, ceph-devel@vger.kernel.org On Mon, Jan 5, 2015 at 8:11 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote: > Ilya, > The main issue we are facing the krbd client crash in case of cluster node reboot. Is this fix backported to any 14.04 stable LTS kernel ? I don't recall anything like that or at least phrased that way. Can you give more details - crash traces at least? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 18:50 ` Ilya Dryomov @ 2015-01-05 20:01 ` Somnath Roy 2015-01-05 20:33 ` Ilya Dryomov 0 siblings, 1 reply; 15+ messages in thread From: Somnath Roy @ 2015-01-05 20:01 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Chaitanya Huilgol, ceph-devel@vger.kernel.org Ilya, Here is the steps.. 1. You have a cluster (3 nodes) and replication is 3 2. map krbd image to a client. 3. Reboot or stop ceph services on one or more nodes 4. The client with krbd mapped module crashes Also, if we try to reboot the clients without unmapping the clients, client nodes goes into a loop and requires hard boot. But, we found this issue is fixed in later version of rbd. We are using inbox rbd coming with Ubuntu 14.04 LTS. Let me know if you need further details. Thanks & Regards Somnath -----Original Message----- From: Ilya Dryomov [mailto:ilya.dryomov@inktank.com] Sent: Monday, January 05, 2015 10:50 AM To: Somnath Roy Cc: Chaitanya Huilgol; ceph-devel@vger.kernel.org Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) On Mon, Jan 5, 2015 at 8:11 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote: > Ilya, > The main issue we are facing the krbd client crash in case of cluster node reboot. Is this fix backported to any 14.04 stable LTS kernel ? I don't recall anything like that or at least phrased that way. Can you give more details - crash traces at least? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 20:01 ` Somnath Roy @ 2015-01-05 20:33 ` Ilya Dryomov 2015-01-05 21:08 ` Somnath Roy 2015-01-05 21:54 ` Somnath Roy 0 siblings, 2 replies; 15+ messages in thread From: Ilya Dryomov @ 2015-01-05 20:33 UTC (permalink / raw) To: Somnath Roy; +Cc: Chaitanya Huilgol, ceph-devel@vger.kernel.org On Mon, Jan 5, 2015 at 11:01 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote: > Ilya, > Here is the steps.. > > 1. You have a cluster (3 nodes) and replication is 3 > > 2. map krbd image to a client. > > 3. Reboot or stop ceph services on one or more nodes > > 4. The client with krbd mapped module crashes Is it idle or under load? Do you have a trace of the crash? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 20:33 ` Ilya Dryomov @ 2015-01-05 21:08 ` Somnath Roy 2015-01-06 12:31 ` Chaitanya Huilgol 2015-01-05 21:54 ` Somnath Roy 1 sibling, 1 reply; 15+ messages in thread From: Somnath Roy @ 2015-01-05 21:08 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Chaitanya Huilgol, ceph-devel@vger.kernel.org It's happening both in idle and under load. I don't have the trace right now but will get you one soon. Thanks & Regards Somnath -----Original Message----- From: Ilya Dryomov [mailto:ilya.dryomov@inktank.com] Sent: Monday, January 05, 2015 12:34 PM To: Somnath Roy Cc: Chaitanya Huilgol; ceph-devel@vger.kernel.org Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) On Mon, Jan 5, 2015 at 11:01 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote: > Ilya, > Here is the steps.. > > 1. You have a cluster (3 nodes) and replication is 3 > > 2. map krbd image to a client. > > 3. Reboot or stop ceph services on one or more nodes > > 4. The client with krbd mapped module crashes Is it idle or under load? Do you have a trace of the crash? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels)
  2015-01-05 21:08 ` Somnath Roy
@ 2015-01-06 12:31 ` Chaitanya Huilgol
  2015-01-06 14:19 ` Ilya Dryomov
  0 siblings, 1 reply; 15+ messages in thread
From: Chaitanya Huilgol @ 2015-01-06 12:31 UTC (permalink / raw)
  To: Somnath Roy, Ilya Dryomov; +Cc: ceph-devel@vger.kernel.org

Hi Ilya,

The RBD crash on OSD nodes going away is routinely hit in our setups.
We have not been able to get a good stack trace for this one due to our console capture issues and these don't end up in the syslogs either after the crash. Will get you the traces soon.
Most of the times this happens when all the OSD nodes go away at once. This could have probably been fixed by one of the following commits?

Ilya Dryomov
libceph: change from BUG to WARN for __remove_osd() asserts …
idryomov authored on Nov 5
cc9f1f5
Ilya Dryomov
libceph: clear r_req_lru_item in __unregister_linger_request() …
idryomov authored on Nov 5
ba9d114
Ilya Dryomov
libceph: unlink from o_linger_requests when clearing r_osd …
idryomov authored on Nov 4
a390de0

Also, We have encountered a few other issues listed below

(1) Soft Lockup issue
Dec 10 11:22:28 rack3-client-1 kernel: [661597.506625] BUG: soft lockup - CPU#2 stuck for 22s! [java:29169] --- (vdbench process)
.
.
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.043935] Call Trace:
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.097630] [<ffffffffa062d9e8>] con_work+0x298/0x640 [libceph]
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.152461] [<ffffffff810838a2>] process_one_work+0x182/0x450
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.206653] [<ffffffff81084641>] worker_thread+0x121/0x410
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.259860] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.312023] [<ffffffff8108b312>] kthread+0xd2/0xf0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.362974] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.414058] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.464358] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.514121] Code: ff ff 48 89 df e8 e3 f1 ff ff 48 8b 7d a8 e8 7a 8c 0e e1 48 8b 7d b0 e8 41 d8 a7 e0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 45 b8 49 8b 0e 4c 89 f2 48 c7 c6 d0 76 64 a0 48 c7
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.663443] RIP [<ffffffffa063340e>] osd_reset+0x22e/0x2c0 [libceph]
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.712105] RSP <ffff880a22b8bd80>

(2) Soft lockup when OSDs are flapping

Dec 18 18:25:10 rack3-client-2 kernel: [157126.089489] BUG: soft lockup - CPU#4 stuck for 23s! [kworker/4:0:45012]
.
.
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098648] Call Trace:
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098653] [<ffffffffa030d963>] kick_requests+0x1e3/0x440 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098657] [<ffffffffa030df98>] ceph_osdc_handle_map+0x2a8/0x620 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098662] [<ffffffffa030e55b>] dispatch+0x24b/0xb20 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098665] [<ffffffffa0301c08>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098669] [<ffffffffa030552f>] con_work+0x164f/0x2b60 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098672] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098674] [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098676] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098679] [<ffffffff8109d2d5>] ? sched_clock_cpu+0xb5/0x100
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098681] [<ffffffff8109df6d>] ? vtime_common_task_switch+0x3d/0x40
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098684] [<ffffffff810838a2>] process_one_work+0x182/0x450
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098686] [<ffffffff81084641>] worker_thread+0x121/0x410
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098688] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098690] [<ffffffff8108b312>] kthread+0xd2/0xf0
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098692] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098695] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098697] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0

(3) BUG_ON(!list_empty(&req->r_req_lru_item));

Dec 4 17:14:33 rack6-ramp-4 kernel: [320359.828209] kernel BUG at /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892!
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.043935] Call Trace:
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.097630] [<ffffffffa062d9e8>] con_work+0x298/0x640 [libceph]
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.152461] [<ffffffff810838a2>] process_one_work+0x182/0x450
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.206653] [<ffffffff81084641>] worker_thread+0x121/0x410
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.259860] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.312023] [<ffffffff8108b312>] kthread+0xd2/0xf0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.362974] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.414058] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.464358] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0

(4) img_request null
Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] Assertion failure in rbd_img_obj_callback() at line 2127:
Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865]
Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] rbd_assert(img_request != NULL);

Dec 12 08:07:50 rack1-ram-6 kernel: [251597.257322] [<ffffffffa01a5897>] rbd_obj_request_complete+0x27/0x70 [rbd]
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.268450] [<ffffffffa01a8d4f>] rbd_osd_req_callback+0xdf/0x4e0 [rbd]
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.279182] [<ffffffffa039e262>] dispatch+0x4a2/0x900 [libceph]
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.289159] [<ffffffffa039494b>] try_read+0x4ab/0x10d0 [libceph]
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.299236] [<ffffffffa0396362>] ? try_write+0xa42/0xe30 [libceph]
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.309777] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.318627] [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.332347] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.341095] [<ffffffff8109d2d5>] ? sched_clock_cpu+0xb5/0x100
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.351061] [<ffffffffa0396809>] con_work+0xb9/0x640 [libceph]
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.361003] [<ffffffff810838a2>] process_one_work+0x182/0x450
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.370752] [<ffffffff81084641>] worker_thread+0x121/0x410
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.379816] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.389173] [<ffffffff8108b312>] kthread+0xd2/0xf0
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.396898] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.407506] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
Dec 12 08:07:50 rack1-ram-6 kernel: [251597.416181] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
This is similar to: http://tracker.ceph.com/issues/8378

Saw that the rhel7a branch has many of the latest fixes and is somewhat compatible with 3.13 kernels,
For validation, we have taken the rhel7a ceph-client branch and with minor modification gotten it to compile with 3.13.0 headers. With this we did not hit any issues (expect issue-2).
We understand that is not the right approach for Ubuntu, It would be great if we could get the fixes into Ubuntu 14.04 kernels as well.

Regards,
Chaitanya

-----Original Message-----
From: Somnath Roy
Sent: Tuesday, January 06, 2015 2:38 AM
To: Ilya Dryomov
Cc: Chaitanya Huilgol; ceph-devel@vger.kernel.org
Subject: RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels)

It's happening both in idle and under load.
I don't have the trace right now but will get you one soon.

Thanks & Regards
Somnath

-----Original Message-----
From: Ilya Dryomov [mailto:ilya.dryomov@inktank.com]
Sent: Monday, January 05, 2015 12:34 PM
To: Somnath Roy
Cc: Chaitanya Huilgol; ceph-devel@vger.kernel.org
Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels)

On Mon, Jan 5, 2015 at 11:01 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Ilya,
> Here is the steps..
>
> 1. You have a cluster (3 nodes) and replication is 3
>
> 2. map krbd image to a client.
>
> 3. Reboot or stop ceph services on one or more nodes
>
> 4. The client with krbd mapped module crashes

Is it idle or under load? Do you have a trace of the crash?

Thanks,

Ilya

^ permalink raw reply [flat|nested] 15+ messages in thread
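When comparing splats like the four in the message above across several clients, it can help to reduce each trace to its frames and keep only those from the Ceph modules. The following is a minimal Python sketch, assuming the syslog frame layout seen in this thread (`[<address>] [?] symbol+0xoff/0xsize [module]`). The names `Frame`, `parse_frames`, and `ceph_frames` are illustrative and not part of any Ceph or kernel tool; the sample lines are copied from trace (2).

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    address: str
    reliable: bool  # False when the kernel printed '?' before the symbol
    symbol: str
    offset: int
    size: int
    module: str     # empty string for symbols built into the kernel image


# One stack frame as printed in the traces quoted in this thread.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)


def parse_frames(log_text: str) -> List[Frame]:
    """Extract every stack frame found in a chunk of kernel log text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append(
            Frame(
                address=match.group("addr"),
                reliable=match.group("unsure") is None,
                symbol=match.group("sym"),
                offset=int(match.group("off"), 16),
                size=int(match.group("size"), 16),
                module=match.group("mod") or "",
            )
        )
    return frames


def ceph_frames(frames: List[Frame]) -> List[Frame]:
    """Keep only frames that belong to the Ceph kernel modules."""
    return [f for f in frames if f.module in ("libceph", "rbd", "ceph")]


SAMPLE = """\
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098653] [<ffffffffa030d963>] kick_requests+0x1e3/0x440 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098665] [<ffffffffa0301c08>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Dec 18 18:25:10 rack3-client-2 kernel: [157126.098684] [<ffffffff810838a2>] process_one_work+0x182/0x450
"""

if __name__ == "__main__":
    for frame in ceph_frames(parse_frames(SAMPLE)):
        marker = "" if frame.reliable else "? "
        print("%s%s+%#x [%s]" % (marker, frame.symbol, frame.offset, frame.module))
```

Because the pattern is anchored on `[<`, the bracketed timestamps are skipped, and a `RIP [<...>] osd_reset+0x22e/0x2c0 [libceph]` line would be picked up as a frame too.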
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-06 12:31 ` Chaitanya Huilgol @ 2015-01-06 14:19 ` Ilya Dryomov 2015-01-08 3:30 ` Chaitanya Huilgol 0 siblings, 1 reply; 15+ messages in thread From: Ilya Dryomov @ 2015-01-06 14:19 UTC (permalink / raw) To: Chaitanya Huilgol; +Cc: Somnath Roy, ceph-devel@vger.kernel.org On Tue, Jan 6, 2015 at 3:31 PM, Chaitanya Huilgol <Chaitanya.Huilgol@sandisk.com> wrote: > Hi Ilya, > > The RBD crash on OSD nodes going away is routinely hit in our setups. > We have not been able to get a good stack trace for this one due to our console capture issues and these don't end up in the syslogs either after the crash. Will get you the traces soon. > Most of the times this happens when all the OSD nodes go away at once. This could have probably been fixed by one of the following commits? > > Ilya Dryomov > libceph: change from BUG to WARN for __remove_osd() asserts … > idryomov authored on Nov 5 > cc9f1f5 > Ilya Dryomov > libceph: clear r_req_lru_item in __unregister_linger_request() … > idryomov authored on Nov 5 > ba9d114 > Ilya Dryomov > libceph: unlink from o_linger_requests when clearing r_osd … > idryomov authored on Nov 4 > a390de0 Yes, but probably others as well. > > Also, We have encountered a few other issues listed below > > (1) Soft Lockup issue > Dec 10 11:22:28 rack3-client-1 kernel: [661597.506625] BUG: soft lockup - CPU#2 stuck for 22s! [java:29169] --- (vdbench process) > . > . > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.043935] Call Trace: > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.097630] [<ffffffffa062d9e8>] con_work+0x298/0x640 [libceph] > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.152461] [<ffffffff810838a2>] process_one_work+0x182/0x450 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.206653] [<ffffffff81084641>] worker_thread+0x121/0x410 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.259860] [<ffffffff81084520>] ? 
rescuer_thread+0x3e0/0x3e0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.312023] [<ffffffff8108b312>] kthread+0xd2/0xf0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.362974] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.414058] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.464358] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.514121] Code: ff ff 48 89 df e8 e3 f1 ff ff 48 8b 7d a8 e8 7a 8c 0e e1 48 8b 7d b0 e8 41 d8 a7 e0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 45 b8 49 8b 0e 4c 89 f2 48 c7 c6 d0 76 64 a0 48 c7 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.663443] RIP [<ffffffffa063340e>] osd_reset+0x22e/0x2c0 [libceph] > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.712105] RSP <ffff880a22b8bd80> > > (2) Soft lockup when OSDs are flapping > > Dec 18 18:25:10 rack3-client-2 kernel: [157126.089489] BUG: soft lockup - CPU#4 stuck for 23s! [kworker/4:0:45012] > . > . > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098648] Call Trace: > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098653] [<ffffffffa030d963>] kick_requests+0x1e3/0x440 [libceph] > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098657] [<ffffffffa030df98>] ceph_osdc_handle_map+0x2a8/0x620 [libceph] > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098662] [<ffffffffa030e55b>] dispatch+0x24b/0xb20 [libceph] > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098665] [<ffffffffa0301c08>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph] > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098669] [<ffffffffa030552f>] con_work+0x164f/0x2b60 [libceph] > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098672] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098674] [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098676] [<ffffffff8101b7d9>] ? 
sched_clock+0x9/0x10 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098679] [<ffffffff8109d2d5>] ? sched_clock_cpu+0xb5/0x100 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098681] [<ffffffff8109df6d>] ? vtime_common_task_switch+0x3d/0x40 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098684] [<ffffffff810838a2>] process_one_work+0x182/0x450 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098686] [<ffffffff81084641>] worker_thread+0x121/0x410 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098688] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098690] [<ffffffff8108b312>] kthread+0xd2/0xf0 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098692] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098695] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098697] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 > > (3) BUG_ON(!list_empty(&req->r_req_lru_item)); > > Dec 4 17:14:33 rack6-ramp-4 kernel: [320359.828209] kernel BUG at /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892! > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.043935] Call Trace: > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.097630] [<ffffffffa062d9e8>] con_work+0x298/0x640 [libceph] > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.152461] [<ffffffff810838a2>] process_one_work+0x182/0x450 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.206653] [<ffffffff81084641>] worker_thread+0x121/0x410 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.259860] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.312023] [<ffffffff8108b312>] kthread+0xd2/0xf0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.362974] [<ffffffff8108b240>] ? 
kthread_create_on_node+0x1d0/0x1d0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.414058] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.464358] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 > > (4) img_request null > Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] Assertion failure in rbd_img_obj_callback() at line 2127: > Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] > Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] rbd_assert(img_request != NULL); > > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.257322] [<ffffffffa01a5897>] rbd_obj_request_complete+0x27/0x70 [rbd] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.268450] [<ffffffffa01a8d4f>] rbd_osd_req_callback+0xdf/0x4e0 [rbd] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.279182] [<ffffffffa039e262>] dispatch+0x4a2/0x900 [libceph] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.289159] [<ffffffffa039494b>] try_read+0x4ab/0x10d0 [libceph] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.299236] [<ffffffffa0396362>] ? try_write+0xa42/0xe30 [libceph] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.309777] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.318627] [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.332347] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.341095] [<ffffffff8109d2d5>] ? sched_clock_cpu+0xb5/0x100 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.351061] [<ffffffffa0396809>] con_work+0xb9/0x640 [libceph] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.361003] [<ffffffff810838a2>] process_one_work+0x182/0x450 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.370752] [<ffffffff81084641>] worker_thread+0x121/0x410 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.379816] [<ffffffff81084520>] ? 
rescuer_thread+0x3e0/0x3e0
> Dec 12 08:07:50 rack1-ram-6 kernel: [251597.389173] [<ffffffff8108b312>] kthread+0xd2/0xf0
> Dec 12 08:07:50 rack1-ram-6 kernel: [251597.396898] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
> Dec 12 08:07:50 rack1-ram-6 kernel: [251597.407506] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
> Dec 12 08:07:50 rack1-ram-6 kernel: [251597.416181] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
> This is similar to: http://tracker.ceph.com/issues/8378
>
> Saw that the rhel7a branch has many of the latest fixes and is somewhat compatible with 3.13 kernels,
> For validation, we have taken the rhel7a ceph-client branch and with minor modification gotten it to compile with 3.13.0 headers. With this we did not hit any issues (expect issue-2).

What do you mean by "expect issue-2"? (3) and (4) should be fixed in rhel7-a. Can't say anything about (1) and (2) - please report back if you see any soft lockup splats on rhel7-a.

> We understand that is not the right approach for Ubuntu, It would be great if we could get the fixes into Ubuntu 14.04 kernels as well.

It may not be the right approach, but in many ways it's better than a set of selected backports. While working on another report I found a couple easy-to-backport patches that are missing from Ubuntu 3.13 series and will forward them to stable guys, but, for those who can build their own kernels at least, branches like rhel7-a are best.

Thanks,

Ilya

^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-06 14:19 ` Ilya Dryomov @ 2015-01-08 3:30 ` Chaitanya Huilgol 2015-01-08 8:22 ` Ilya Dryomov 0 siblings, 1 reply; 15+ messages in thread From: Chaitanya Huilgol @ 2015-01-08 3:30 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Somnath Roy, ceph-devel@vger.kernel.org We have hit issue-2 too on the rhel7a code base (soft lockup in ceph_osdc_handle_map, when a large number of OSDs were flapping due to spurious heartbeat failures). We have not been able to reproduce the other issues. On a side-note, are the changes in the ceph-client rhel7a branch being actively pulled into the rhel7/centos7 kernel updates? -----Original Message----- From: Ilya Dryomov [mailto:ilya.dryomov@inktank.com] Sent: Tuesday, January 06, 2015 7:49 PM To: Chaitanya Huilgol Cc: Somnath Roy; ceph-devel@vger.kernel.org Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) On Tue, Jan 6, 2015 at 3:31 PM, Chaitanya Huilgol <Chaitanya.Huilgol@sandisk.com> wrote: > Hi Ilya, > > The RBD crash on OSD nodes going away is routinely hit in our setups. > We have not been able to get a good stack trace for this one due to our console capture issues and these don't end up in the syslogs either after the crash. Will get you the traces soon. > Most of the times this happens when all the OSD nodes go away at once. This could have probably been fixed by one of the following commits? > > Ilya Dryomov > libceph: change from BUG to WARN for __remove_osd() asserts … idryomov > authored on Nov 5 > cc9f1f5 > Ilya Dryomov > libceph: clear r_req_lru_item in __unregister_linger_request() … > idryomov authored on Nov 5 > ba9d114 > Ilya Dryomov > libceph: unlink from o_linger_requests when clearing r_osd … idryomov > authored on Nov 4 > a390de0 Yes, but probably others as well. 
> > Also, We have encountered a few other issues listed below > > (1) Soft Lockup issue > Dec 10 11:22:28 rack3-client-1 kernel: [661597.506625] BUG: soft > lockup - CPU#2 stuck for 22s! [java:29169] --- (vdbench process) . > . > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.043935] Call Trace: > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.097630] > [<ffffffffa062d9e8>] con_work+0x298/0x640 [libceph] Dec 4 17:14:33 > rack6-ramp-4 kernel: [320361.152461] [<ffffffff810838a2>] > process_one_work+0x182/0x450 Dec 4 17:14:33 rack6-ramp-4 kernel: > [320361.206653] [<ffffffff81084641>] worker_thread+0x121/0x410 Dec 4 > 17:14:33 rack6-ramp-4 kernel: [320361.259860] [<ffffffff81084520>] ? > rescuer_thread+0x3e0/0x3e0 Dec 4 17:14:33 rack6-ramp-4 kernel: > [320361.312023] [<ffffffff8108b312>] kthread+0xd2/0xf0 Dec 4 17:14:33 > rack6-ramp-4 kernel: [320361.362974] [<ffffffff8108b240>] ? > kthread_create_on_node+0x1d0/0x1d0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.414058] > [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 Dec 4 17:14:33 > rack6-ramp-4 kernel: [320361.464358] [<ffffffff8108b240>] ? > kthread_create_on_node+0x1d0/0x1d0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.514121] Code: ff ff 48 89 > df e8 e3 f1 ff ff 48 8b 7d a8 e8 7a 8c 0e e1 48 8b 7d b0 e8 41 d8 a7 > e0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 45 b8 49 > 8b 0e 4c 89 f2 48 c7 c6 d0 76 64 a0 48 c7 Dec 4 17:14:33 rack6-ramp-4 > kernel: [320361.663443] RIP [<ffffffffa063340e>] osd_reset+0x22e/0x2c0 > [libceph] Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.712105] RSP > <ffff880a22b8bd80> > > (2) Soft lockup when OSDs are flapping > > Dec 18 18:25:10 rack3-client-2 kernel: [157126.089489] BUG: soft > lockup - CPU#4 stuck for 23s! [kworker/4:0:45012] . > . 
> Dec 18 18:25:10 rack3-client-2 kernel: [157126.098648] Call Trace: > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098653] > [<ffffffffa030d963>] kick_requests+0x1e3/0x440 [libceph] Dec 18 > 18:25:10 rack3-client-2 kernel: [157126.098657] [<ffffffffa030df98>] > ceph_osdc_handle_map+0x2a8/0x620 [libceph] Dec 18 18:25:10 > rack3-client-2 kernel: [157126.098662] [<ffffffffa030e55b>] > dispatch+0x24b/0xb20 [libceph] Dec 18 18:25:10 rack3-client-2 kernel: > [157126.098665] [<ffffffffa0301c08>] ? ceph_tcp_recvmsg+0x48/0x60 > [libceph] Dec 18 18:25:10 rack3-client-2 kernel: [157126.098669] > [<ffffffffa030552f>] con_work+0x164f/0x2b60 [libceph] Dec 18 18:25:10 > rack3-client-2 kernel: [157126.098672] [<ffffffff8101b7d9>] ? > sched_clock+0x9/0x10 Dec 18 18:25:10 rack3-client-2 kernel: > [157126.098674] [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098676] > [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10 Dec 18 18:25:10 > rack3-client-2 kernel: [157126.098679] [<ffffffff8109d2d5>] ? > sched_clock_cpu+0xb5/0x100 Dec 18 18:25:10 rack3-client-2 kernel: > [157126.098681] [<ffffffff8109df6d>] ? > vtime_common_task_switch+0x3d/0x40 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098684] > [<ffffffff810838a2>] process_one_work+0x182/0x450 Dec 18 18:25:10 > rack3-client-2 kernel: [157126.098686] [<ffffffff81084641>] > worker_thread+0x121/0x410 Dec 18 18:25:10 rack3-client-2 kernel: > [157126.098688] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0 Dec > 18 18:25:10 rack3-client-2 kernel: [157126.098690] > [<ffffffff8108b312>] kthread+0xd2/0xf0 Dec 18 18:25:10 rack3-client-2 > kernel: [157126.098692] [<ffffffff8108b240>] ? > kthread_create_on_node+0x1d0/0x1d0 > Dec 18 18:25:10 rack3-client-2 kernel: [157126.098695] > [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 Dec 18 18:25:10 > rack3-client-2 kernel: [157126.098697] [<ffffffff8108b240>] ? 
> kthread_create_on_node+0x1d0/0x1d0 > > (3) BUG_ON(!list_empty(&req->r_req_lru_item)); > > Dec 4 17:14:33 rack6-ramp-4 kernel: [320359.828209] kernel BUG at /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892! > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.043935] Call Trace: > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.097630] > [<ffffffffa062d9e8>] con_work+0x298/0x640 [libceph] Dec 4 17:14:33 > rack6-ramp-4 kernel: [320361.152461] [<ffffffff810838a2>] > process_one_work+0x182/0x450 Dec 4 17:14:33 rack6-ramp-4 kernel: > [320361.206653] [<ffffffff81084641>] worker_thread+0x121/0x410 Dec 4 > 17:14:33 rack6-ramp-4 kernel: [320361.259860] [<ffffffff81084520>] ? > rescuer_thread+0x3e0/0x3e0 Dec 4 17:14:33 rack6-ramp-4 kernel: > [320361.312023] [<ffffffff8108b312>] kthread+0xd2/0xf0 Dec 4 17:14:33 > rack6-ramp-4 kernel: [320361.362974] [<ffffffff8108b240>] ? > kthread_create_on_node+0x1d0/0x1d0 > Dec 4 17:14:33 rack6-ramp-4 kernel: [320361.414058] > [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 Dec 4 17:14:33 > rack6-ramp-4 kernel: [320361.464358] [<ffffffff8108b240>] ? > kthread_create_on_node+0x1d0/0x1d0 > > (4) img_request null > Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] Assertion failure in rbd_img_obj_callback() at line 2127: > Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] > Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] rbd_assert(img_request != NULL); > > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.257322] > [<ffffffffa01a5897>] rbd_obj_request_complete+0x27/0x70 [rbd] Dec 12 > 08:07:50 rack1-ram-6 kernel: [251597.268450] [<ffffffffa01a8d4f>] > rbd_osd_req_callback+0xdf/0x4e0 [rbd] Dec 12 08:07:50 rack1-ram-6 > kernel: [251597.279182] [<ffffffffa039e262>] dispatch+0x4a2/0x900 > [libceph] Dec 12 08:07:50 rack1-ram-6 kernel: [251597.289159] > [<ffffffffa039494b>] try_read+0x4ab/0x10d0 [libceph] Dec 12 08:07:50 > rack1-ram-6 kernel: [251597.299236] [<ffffffffa0396362>] ? 
> try_write+0xa42/0xe30 [libceph] Dec 12 08:07:50 rack1-ram-6 kernel: > [251597.309777] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10 Dec 12 > 08:07:50 rack1-ram-6 kernel: [251597.318627] [<ffffffff8101b763>] ? > native_sched_clock+0x13/0x80 Dec 12 08:07:50 rack1-ram-6 kernel: > [251597.332347] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10 Dec 12 > 08:07:50 rack1-ram-6 kernel: [251597.341095] [<ffffffff8109d2d5>] ? > sched_clock_cpu+0xb5/0x100 Dec 12 08:07:50 rack1-ram-6 kernel: > [251597.351061] [<ffffffffa0396809>] con_work+0xb9/0x640 [libceph] > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.361003] > [<ffffffff810838a2>] process_one_work+0x182/0x450 Dec 12 08:07:50 > rack1-ram-6 kernel: [251597.370752] [<ffffffff81084641>] > worker_thread+0x121/0x410 Dec 12 08:07:50 rack1-ram-6 kernel: > [251597.379816] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0 Dec > 12 08:07:50 rack1-ram-6 kernel: [251597.389173] [<ffffffff8108b312>] > kthread+0xd2/0xf0 Dec 12 08:07:50 rack1-ram-6 kernel: [251597.396898] > [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 > Dec 12 08:07:50 rack1-ram-6 kernel: [251597.407506] > [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 Dec 12 08:07:50 > rack1-ram-6 kernel: [251597.416181] [<ffffffff8108b240>] ? > kthread_create_on_node+0x1d0/0x1d0 > This is similar to: http://tracker.ceph.com/issues/8378 > > Saw that the rhel7a branch has many of the latest fixes and is > somewhat compatible with 3.13 kernels, For validation, we have taken the rhel7a ceph-client branch and with minor modification gotten it to compile with 3.13.0 headers. With this we did not hit any issues (expect issue-2). What do you mean by "expect issue-2"? (3) and (4) should be fixed in rhel7-a. Can't say anything about (1) and (2) - please report back if you see any soft lockup splats on rhel7-a. > We understand that is not the right approach for Ubuntu, It would be great if we could get the fixes into Ubuntu 14.04 kernels as well. 
It may not be the right approach, but in many ways it's better than a set of selected backports. While working on another report I found a couple easy-to-backport patches that are missing from Ubuntu 3.13 series and will forward them to stable guys, but, for those who can build their own kernels at least, branches like rhel7-a are best. Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-08 3:30 ` Chaitanya Huilgol @ 2015-01-08 8:22 ` Ilya Dryomov 0 siblings, 0 replies; 15+ messages in thread From: Ilya Dryomov @ 2015-01-08 8:22 UTC (permalink / raw) To: Chaitanya Huilgol; +Cc: Somnath Roy, ceph-devel@vger.kernel.org On Thu, Jan 8, 2015 at 6:30 AM, Chaitanya Huilgol <Chaitanya.Huilgol@sandisk.com> wrote: > We have hit issue-2 too on the rhel7a code base (soft lockup in ceph_osdc_handle_map, when a large number of OSDs were flapping due to spurious heartbeat failures). We have not been able to reproduce the other issues. Can I see the entire dmesg of a boot when it happened on rhel7-a? > On a side-note, are the changes in the ceph-client rhel7a branch being actively pulled into the rhel7/centos7 kernel updates? All of rhel7-a is on its way to rhel7.1, I think. Not sure about centos. Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
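When collecting logs for a request like the one above, it can help to first pull the relevant kernel lines out of a large syslog. What follows is only a minimal sketch in Python and is not something the posters used: the function names `scan` and `summarize` are hypothetical, and the regular expressions are assumptions modeled on the log excerpts quoted in this thread (soft lockup, kernel BUG, rbd assertion failure). Other kernels or syslog configurations may format these lines differently.

```python
import re
from collections import Counter

# Failure signatures, modeled on the excerpts quoted in this thread.
# These patterns are assumptions, not an exhaustive or official list.
SIGNATURES = {
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s!"),
    "kernel_bug": re.compile(r"kernel BUG at \S+:\d+!"),
    "rbd_assert": re.compile(r"Assertion failure in \w+\(\) at line \d+"),
}

# Syslog prefix as seen above, for example:
#   "Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] <message>"
PREFIX = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"\[\s*(?P<uptime>[\d.]+)\] (?P<msg>.*)$"
)


def scan(lines):
    """Return (host, signature_name, timestamp, message) for matching lines."""
    hits = []
    for line in lines:
        m = PREFIX.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel syslog line in the expected format
        for name, pattern in SIGNATURES.items():
            if pattern.search(m.group("msg")):
                hits.append((m.group("host"), name, m.group("stamp"), m.group("msg")))
    return hits


def summarize(hits):
    """Count hits per (host, signature_name) pair."""
    return Counter((host, name) for host, name, _, _ in hits)


if __name__ == "__main__":
    sample = [
        "Dec 18 18:25:10 rack3-client-2 kernel: [157126.089489] BUG: soft lockup - CPU#4 stuck for 23s! [kworker/4:0:45012]",
        "Dec  4 17:14:33 rack6-ramp-4 kernel: [320359.828209] kernel BUG at /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892!",
        "Dec 12 08:07:48 rack1-ram-6 kernel: [251596.908865] Assertion failure in rbd_img_obj_callback() at line 2127:",
        "Dec  9 01:38:01 rack1-ramp-5 kernel: [1371758.029661] libceph: osd32 down",
    ]
    for (host, name), count in sorted(summarize(scan(sample)).items()):
        print(host, name, count)
```

To run it against a real log, pass an open file object to `scan` in place of the sample list. The summary only shows which hosts hit which signature; the full dmesg is still what a maintainer would want to see.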
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 20:33 ` Ilya Dryomov 2015-01-05 21:08 ` Somnath Roy @ 2015-01-05 21:54 ` Somnath Roy 1 sibling, 0 replies; 15+ messages in thread From: Somnath Roy @ 2015-01-05 21:54 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Chaitanya Huilgol, ceph-devel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 6412 bytes --] Ilya, I can gather the following syslog entries. Attached is the syslog..Please have a look if this is helpful. I can see the following trace.. Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.283268] Workqueue: ceph-msgr con_work [libceph] Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.291641] task: ffff880fb6868000 ti: ffff880ffaa2a000 task.ti: ffff880ffaa2a000 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.304503] RIP: 0010:[<ffffffffa035a40e>] [<ffffffffa035a40e>] osd_reset+0x22e/0x2c0 [libceph] Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.319808] RSP: 0018:ffff880ffaa2bd80 EFLAGS: 00010206 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.328659] RAX: ffff881012fb4ca8 RBX: ffff8810114a9750 RCX: ffff881012790050 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.599331] RDX: ffff881012fb4ca8 RSI: 0000000086588656 RDI: 0000000000000286 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.703539] RBP: ffff880ffaa2bdd8 R08: 0000000000000000 R09: 0000000000000000 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.810053] R10: ffffffff81600edf R11: ffffea003fef7a00 R12: ffff881012fb4c58 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371757.918811] R13: ffff8810114a9810 R14: ffff881012790000 R15: ffff881012790020 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029661] libceph: osd32 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029662] libceph: osd33 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029662] libceph: osd38 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029662] libceph: osd39 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029663] libceph: osd40 down Dec 9 01:38:01 rack1-ramp-5 kernel: 
[1371758.029663] libceph: osd47 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029663] libceph: osd48 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029663] libceph: osd49 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029664] libceph: osd50 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029664] libceph: osd51 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029664] libceph: osd52 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029665] libceph: osd53 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.029665] libceph: osd57 down Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.631655] FS: 0000000000000000(0000) GS:ffff88101f300000(0000) knlGS:0000000000000000 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.700074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.734306] CR2: 00007f0bbad49000 CR3: 0000000001c0e000 CR4: 00000000001407e0 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.800693] Stack: Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.832457] ffff8810114a97a8 ffff8810114a9760 ffff881012fb4800 ffff881012fb4ca8 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.897340] ffff880ffaa2bda0 ffff880ffaa2bda0 ffff881012fb4c10 ffff881012fb4830 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371758.962318] ffff881012fb49b0 ffff881012fb4860 0000000000000011 ffff880ffaa2be20 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.027390] Call Trace: Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.058230] [<ffffffffa03549e8>] con_work+0x298/0x640 [libceph] Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.089619] [<ffffffff810838a2>] process_one_work+0x182/0x450 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.120139] [<ffffffff81084641>] worker_thread+0x121/0x410 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.149533] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.179041] [<ffffffff8108b312>] kthread+0xd2/0xf0 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.209159] [<ffffffff8108b240>] ? 
kthread_create_on_node+0x1d0/0x1d0 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.240921] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.273511] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.307636] Code: ff ff 48 89 df e8 e3 f1 ff ff 48 8b 7d a8 e8 7a 1c 3c e1 48 8b 7d b0 e8 41 68 d5 e0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 45 b8 49 8b 0e 4c 89 f2 48 c7 c6 d0 e6 36 a0 48 c7 Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.421674] RIP [<ffffffffa035a40e>] osd_reset+0x22e/0x2c0 [libceph] Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.462127] RSP <ffff880ffaa2bd80> Dec 9 01:38:01 rack1-ramp-5 kernel: [1371759.567952] ---[ end trace 37d00d439ac66995 ]--- Dec 9 01:38:17 rack1-ramp-5 kernel: [1371759.614230] BUG: unable to handle kernel paging request at ffffffffffffffd8 Dec 9 01:38:17 rack1-ramp-5 kernel: [1371759.659349] IP: [<ffffffff8108b9b0>] kthread_data+0x10/0x20 Thanks & Regards Somnath -----Original Message----- From: Somnath Roy Sent: Monday, January 05, 2015 1:08 PM To: 'Ilya Dryomov' Cc: Chaitanya Huilgol; ceph-devel@vger.kernel.org Subject: RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) It's happening both in idle and under load. I don't have the trace right now but will get you one soon. Thanks & Regards Somnath -----Original Message----- From: Ilya Dryomov [mailto:ilya.dryomov@inktank.com] Sent: Monday, January 05, 2015 12:34 PM To: Somnath Roy Cc: Chaitanya Huilgol; ceph-devel@vger.kernel.org Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) On Mon, Jan 5, 2015 at 11:01 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote: > Ilya, > Here is the steps.. > > 1. You have a cluster (3 nodes) and replication is 3 > > 2. map krbd image to a client. > > 3. Reboot or stop ceph services on one or more nodes > > 4. The client with krbd mapped module crashes Is it idle or under load? Do you have a trace of the crash? 
Thanks, Ilya [-- Attachment #2: syslog.tar.gz --] [-- Type: application/x-gzip, Size: 64086 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) 2015-01-05 15:57 ` Ilya Dryomov 2015-01-05 17:11 ` Somnath Roy @ 2015-01-06 2:36 ` Chaitanya Huilgol 1 sibling, 0 replies; 15+ messages in thread From: Chaitanya Huilgol @ 2015-01-06 2:36 UTC (permalink / raw) To: Ilya Dryomov; +Cc: ceph-devel@vger.kernel.org Hi Ilya, Can you please point us to the sources for the Ubuntu ceph-client with the fixes? The ceph-client code that comes with the linux-source Debian package does not seem to contain many of the fixes, and I did not see any ceph-client patches on top of the 3.13 kernel either. It looks like I might be looking in the wrong place. Regards, Chaitanya -----Original Message----- From: Ilya Dryomov [mailto:ilya.dryomov@inktank.com] Sent: Monday, January 05, 2015 9:28 PM To: Chaitanya Huilgol Cc: ceph-devel@vger.kernel.org Subject: Re: Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) On Mon, Jan 5, 2015 at 1:53 PM, Chaitanya Huilgol <Chaitanya.Huilgol@sandisk.com> wrote: > Hi All, > > The stock ceph-client modules with Ubuntu 14.04 LTS are quite dated and we are seeing crashes and soft-lockup issues which have been fixed in the current ceph-client code base. > What would be recommended ceph-client branch compatible with the Ubuntu 14.04 (3.13.0-x) kernels so that we can get as many fixes as possible? We actively mark rbd (not so much cephfs) fixes for stable and Ubuntu kernel team generally picks them up. 3.13 series should have most of the important fixes, although I haven't counted. What issues in particular you are running into? uname -a? Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-01-08 8:49 UTC | newest] Thread overview: 15+ messages 2015-01-05 10:53 Ceph-client branch for Ubuntu 14.04.1 LTS (3.13.0-x kernels) Chaitanya Huilgol 2015-01-05 11:15 ` Wido den Hollander 2015-01-08 8:49 ` joel.merrick 2015-01-05 15:57 ` Ilya Dryomov 2015-01-05 17:11 ` Somnath Roy 2015-01-05 18:50 ` Ilya Dryomov 2015-01-05 20:01 ` Somnath Roy 2015-01-05 20:33 ` Ilya Dryomov 2015-01-05 21:08 ` Somnath Roy 2015-01-06 12:31 ` Chaitanya Huilgol 2015-01-06 14:19 ` Ilya Dryomov 2015-01-08 3:30 ` Chaitanya Huilgol 2015-01-08 8:22 ` Ilya Dryomov 2015-01-05 21:54 ` Somnath Roy 2015-01-06 2:36 ` Chaitanya Huilgol