From: Anton
Subject: Re: "umount" of ceph filesystem that has become unavailable hangs forever
Date: Fri, 23 Jul 2010 16:43:37 +0500
Message-ID: <201007231643.37780.anton.vazir@gmail.com>
To: Sébastien Paolacci
Cc: ceph-devel@vger.kernel.org

Did you try "umount -l" (lazy umount)? It should just
detach the fs. In my experience with other network
filesystems such as NFS or Gluster, you can always run into
difficulties with any of them, so "-l" helps me. Not sure
about Ceph, though.

On Friday 23 July 2010, Sébastien Paolacci wrote:
> Hello Sage,
>
> I would like to emphasize that this issue is somewhat
> annoying, even for experimental purposes: I definitely
> expect my test server to not behave safely, crash, burn
> or whatever, but having a client-side impact so deep
> that a (hard) reboot is needed to recover from a hung
> ceph really prevents me from testing with real-life
> payloads.
>
> I understand that it's not an easy point, but a lot of my
> colleagues are not really willing to sacrifice even
> their dev workstation to play during spare time... sad
> world ;)
>
> Sebastien
>
> On Wed, 16 Jun 2010, Peter Niemayer wrote:
> > Hi,
> >
> > trying to "umount" a formerly mounted ceph filesystem
> > that has become unavailable (osd crashed, then mds/mon
> > were shut down using /etc/init.d/ceph stop) results in
> > "umount" hanging forever in "D" state.
> >
> > Strangely, "umount -f" started from another terminal
> > reports the ceph filesystem as not being mounted
> > anymore, which is consistent with what the mount-table
> > says.
> >
> > The kernel keeps emitting the following messages from
> > time to time:
> > > Jun 16 17:25:29 gitega kernel: ceph: tid 211912 timed out on osd0, will reset osd
> > > Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection failed
> > > Jun 16 17:26:15 gitega last message repeated 4 times
> >
> > I would have expected the "umount" to terminate at
> > least after some generous timeout.
> >
> > Ceph should probably support something like the
> > "soft,intr" options of NFS, because if the only
> > supported way of mounting is one where a client is
> > more or less stuck-until-reboot when the service
> > fails, many potential test-configurations involving
> > Ceph are way too dangerous to try...
>
> Yeah, being able to force it to shut down when servers
> are unresponsive is definitely the intent. 'umount -f'
> should work. It sounds like the problem is related to
> the initial 'umount' (which doesn't time out) followed
> by 'umount -f'.
>
> I'm hesitant to add a blanket umount timeout, as that
> could prevent proper writeout of cached data/metadata in
> some cases. So I think the goal should be that if a
> normal umount hangs for some reason, you should be able
> to intervene to add the 'force' if things don't go well.
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe
> ceph-devel" in the body of a message to
> majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
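
For anyone scripting this on a test client, the escalation discussed in this thread (plain umount, then "umount -f", then "umount -l") could be sketched roughly as below. This is only a sketch under assumptions: the function name escalating_umount, the 30-second default timeout, and the mount point /mnt/ceph are made up for illustration, and whether "-f" or "-l" actually succeeds on Ceph after a plain umount has already hung is exactly what the thread leaves open. The one deliberate design choice is that a timed-out umount is abandoned rather than killed and waited on, since a process in "D" state does not respond to signals.

```python
import subprocess

# Escalation order discussed in the thread: plain, then forced, then lazy.
UMOUNT_STEPS = (
    ["umount"],
    ["umount", "-f"],
    ["umount", "-l"],
)


def escalating_umount(mountpoint, timeout=30.0, spawn=subprocess.Popen):
    """Try each umount variant in turn.

    Returns the argv of the first variant that exits with status 0,
    or None if every variant failed or timed out. `spawn` is injectable
    so the logic can be exercised without touching real mounts.
    """
    for base in UMOUNT_STEPS:
        argv = base + [mountpoint]
        proc = spawn(argv)
        try:
            if proc.wait(timeout=timeout) == 0:
                return argv
        except subprocess.TimeoutExpired:
            # A umount stuck in "D" state cannot be killed, so do not
            # wait on it again; leave it behind and try the next step.
            continue
    return None
```

Usage (needs root, and /mnt/ceph is a hypothetical mount point): escalating_umount("/mnt/ceph") returns the command line that worked, or None if nothing did.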