From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yann Dupont
Subject: Re: domino-style OSD crash
Date: Mon, 09 Jul 2012 21:05:04 +0200
Message-ID: <4FFB2B60.5000201@univ-nantes.fr>
References: <4FCC7573.3000704@univ-nantes.fr> <4FF2AFEB.1010403@univ-nantes.fr> <4FF35C01.4070400@univ-nantes.fr> <4FF3F98C.30602@univ-nantes.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: ceph-devel-owner@vger.kernel.org
To: Tommi Virtanen
Cc: Sam Just, ceph-devel

On 09/07/2012 19:43, Tommi Virtanen wrote:
> On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont wrote:
>> Well, I probably wasn't clear enough. I talked about crashed FS, but I was
>> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and
>> only one) has PROBABLY crashed in the past, causing corruption in ceph data
>> on this node, and then the subsequent crash of other nodes.
>>
>> RIGHT now btrfs on this node is OK. I can access the filesystem without
>> errors.
> But the LevelDB isn't. Its contents got corrupted, somehow somewhere,
> and it really is up to the LevelDB library to tolerate those errors;
> we have a simple get/put interface we use, and LevelDB is triggering
> an internal error.

Yes, understood.

>> One node had problem with btrfs, leading first to kernel problem, probably
>> corruption (in disk/ in memory maybe ?), and ultimately to a kernel oops.
>> Before that ultimate kernel oops, bad data has been transmitted to other
>> (sane) nodes, leading to ceph-osd crash on those nodes.
> The LevelDB binary contents are not transferred over to other nodes;

OK, thanks for the clarification.

> this kind of corruption would not spread over the Ceph clustering
> mechanisms. It's more likely that you have 4 independently corrupted
> LevelDBs. Something in the workload Ceph runs makes that corruption
> quite likely.

Very likely: since I reformatted my nodes with XFS, I haven't had any
problems so far.

>
> The information here isn't enough to say whether the cause of the
> corruption is btrfs or LevelDB, but the recovery needs to be handled by
> LevelDB -- and upstream is working on making it more robust:
> http://code.google.com/p/leveldb/issues/detail?id=97

Yes, saw this. It's very important. Sometimes, s... happens. Given the
size ceph volumes can reach, having a tool to restart damaged
nodes (for whatever reason) is a must.

Thanks for the time you took to answer. It's much clearer for me now.

Cheers,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html