From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wido den Hollander
Subject: Re: handling fs errors
Date: Tue, 22 Jan 2013 14:12:23 +0100
Message-ID: <50FE9037.8040501@widodh.nl>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from smtp01.mail.pcextreme.nl ([109.72.87.137]:34682 "EHLO smtp01.mail.pcextreme.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751726Ab3AVNM2 (ORCPT ); Tue, 22 Jan 2013 08:12:28 -0500
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Yehuda Sadeh
Cc: Sage Weil , ceph-devel@vger.kernel.org

On 01/22/2013 07:12 AM, Yehuda Sadeh wrote:
> On Mon, Jan 21, 2013 at 10:05 PM, Sage Weil wrote:
>> We observed an interesting situation over the weekend. The XFS volume
>> ceph-osd locked up (hung in xfs_ilock) for somewhere between 2 and 4
>> minutes. After 3 minutes (180s), ceph-osd gave up waiting and committed
>> suicide. XFS seemed to unwedge itself a bit after that, as the daemon was
>> able to restart and continue.
>>
>> The problem is that during that 180s the OSD was claiming to be alive but
>> not able to do any IO. That heartbeat check is meant as a sanity check
>> against a wedged kernel, but waiting so long meant that the ceph-osd
>> wasn't failed by the cluster quickly enough and client IO stalled.
>>
>> We could simply change that timeout to something close to the heartbeat
>> interval (currently default is 20s). That will make ceph-osd much more
>> sensitive to fs stalls that may be transient (high load, whatever).
>>
>> Another option would be to make the osd heartbeat replies conditional on
>> whether the internal heartbeat is healthy. Then the heartbeat warnings
>> could start at 10-20s, ping replies would pause, but the suicide could
>> still be 180s out. If the stall is short-lived, pings will continue, the
>> osd will mark itself back up (if it was marked down) and continue.
>>
>> Having written that out, the last option sounds like the obvious choice.
>> Any other thoughts?
>>
>
> Another option would be to have the osd reply to the ping with some
> health description.
>

Looking to the future with more monitoring, that might be a good idea.

If an OSD simply stops sending heartbeats when its internal conditions aren't met, you don't know what's going on.

If the heartbeat carried metadata saying "I'm here, but not in such good shape", that could be reported back to the monitors.

Monitoring tools could read this out and send notifications/alerts wherever they want.

Right now we assume I/O stalls completely, but the metadata could also report high latency. If the latency goes over threshold X you can still mark the OSD out temporarily, since it will impact clients, but some information towards the monitor might be useful.

Wido

> Yehuda
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
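
To make the "ping reply with a health description" idea concrete, here is a rough sketch in Python. It is not Ceph code and does not reflect how the OSD heartbeat is actually implemented. Every name in it (Health, PingReply, build_ping_reply, monitor_action) and the latency thresholds are made up for illustration; only the 20s warning figure is borrowed from the numbers mentioned in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    OK = "ok"
    DEGRADED = "degraded"  # "I'm here, but not in such good shape"


@dataclass
class PingReply:
    """Hypothetical heartbeat reply that carries health metadata."""
    osd_id: int
    health: Health
    internal_stall_s: float  # seconds since the internal heartbeat last made progress
    latency_ms: float        # recent I/O latency as seen by the OSD itself


def build_ping_reply(osd_id, internal_stall_s, latency_ms,
                     stall_warn_s=20.0, latency_warn_ms=500.0):
    """Keep answering pings, but flag the reply when the OSD is struggling.

    stall_warn_s follows the 10-20s warning window discussed in the thread;
    latency_warn_ms is an arbitrary value chosen for this sketch.
    """
    degraded = internal_stall_s >= stall_warn_s or latency_ms >= latency_warn_ms
    health = Health.DEGRADED if degraded else Health.OK
    return PingReply(osd_id, health, internal_stall_s, latency_ms)


def monitor_action(reply, latency_out_ms=2000.0):
    """What a monitor or monitoring tool might do with the metadata.

    latency_out_ms stands in for "threshold X" and is an assumed value.
    """
    if reply.latency_ms >= latency_out_ms:
        return "mark_out"  # temporarily, since clients are already affected
    if reply.health is Health.DEGRADED:
        return "alert"     # notify, but leave the OSD in
    return "none"


if __name__ == "__main__":
    for stall_s, lat_ms in [(0.0, 12.0), (25.0, 12.0), (0.0, 2500.0)]:
        reply = build_ping_reply(3, stall_s, lat_ms)
        print(reply.health.value, monitor_action(reply))
```

The point of the sketch is only that the reply keeps flowing while the OSD is degraded, so the monitor side can tell "slow" apart from "gone" and choose between alerting and marking the OSD out.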