* "umount" of ceph filesystem that has become unavailable hangs forever
@ 2010-06-16 15:35 Peter Niemayer
  2010-06-16 18:56 ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Niemayer @ 2010-06-16 15:35 UTC
  To: ceph-devel

Hi,

trying to "umount" a formerly mounted ceph filesystem that has become
unavailable (osd crashed, then mds/mon were shut down using
/etc/init.d/ceph stop) results in "umount" hanging forever in "D" state.

Strangely, "umount -f" started from another terminal reports
the ceph filesystem as not being mounted anymore, which is consistent
with what the mount-table says.

The kernel keeps emitting the following messages from time to time:
> Jun 16 17:25:29 gitega kernel: ceph:  tid 211912 timed out on osd0, will reset osd
> Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection failed
> Jun 16 17:26:15 gitega last message repeated 4 times

I would have expected the "umount" to terminate at least after some 
generous timeout.

Ceph should probably support something like the "soft,intr" options
of NFS, because if the only supported way of mounting is one where
a client is more or less stuck-until-reboot when the service fails,
many potential test-configurations involving Ceph are way too dangerous
to try...

Regards,

Peter Niemayer



* Re: "umount" of ceph filesystem that has become unavailable hangs forever
@ 2010-07-23 11:36 Sébastien Paolacci
  2010-07-23 11:42 ` Anton V.G.
  2010-07-23 16:56 ` Sage Weil
  0 siblings, 2 replies; 11+ messages in thread
From: Sébastien Paolacci @ 2010-07-23 11:36 UTC
  To: ceph-devel

Hello Sage,

I would like to emphasize that this issue is quite annoying, even for
experimental purposes: I fully expect my test server to misbehave,
crash, burn or whatever, but a client-side impact so severe that a
hung ceph needs a (hard) reboot to recover really prevents me from
testing with real-life payloads.

I understand that it's not an easy problem, but a lot of my colleagues
are not really willing to sacrifice even their dev workstations to
play with it in their spare time... sad world ;)

Sebastien

On Wed, 16 Jun 2010, Peter Niemayer wrote:
> Hi,
>
> trying to "umount" a formerly mounted ceph filesystem that has become
> unavailable (osd crashed, then mds/mon were shut down using /etc/init.d/ceph
> stop) results in "umount" hanging forever in "D" state.
>
> Strangely, "umount -f" started from another terminal reports
> the ceph filesystem as not being mounted anymore, which is consistent
> with what the mount-table says.
>
> The kernel keeps emitting the following messages from time to time:
> > Jun 16 17:25:29 gitega kernel: ceph:  tid 211912 timed out on osd0, will
> > reset osd
> > Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection
> > failed
> > Jun 16 17:26:15 gitega last message repeated 4 times
>
> I would have expected the "umount" to terminate at least after some generous
> timeout.
>
> Ceph should probably support something like the "soft,intr" options
> of NFS, because if the only supported way of mounting is one where
> a client is more or less stuck-until-reboot when the service fails,
> many potential test-configurations involving Ceph are way too dangerous
> to try...

Yeah, being able to force it to shut down when servers are unresponsive is
definitely the intent.  'umount -f' should work.  It sounds like the
problem is related to the initial 'umount' (which doesn't time out)
followed by 'umount -f'.

I'm hesitant to add a blanket umount timeout, as that could prevent proper
writeout of cached data/metadata in some cases.  So I think the goal
should be that if a normal umount hangs for some reason, you should be
able to intervene to add the 'force' if things don't go well.

sage

* Re: "umount" of ceph filesystem that has become unavailable hangs forever
@ 2010-07-23 11:43 Anton
  2010-07-23 12:08 ` Sébastien Paolacci
  0 siblings, 1 reply; 11+ messages in thread
From: Anton @ 2010-07-23 11:43 UTC
  To: Sébastien Paolacci; +Cc: ceph-devel

Did you try "umount -l" (lazy umount)? It should just detach the
filesystem. In my experience with other network filesystems, like NFS
or Gluster, you can run into difficulties with any of them, and "-l"
helps me. Not sure about Ceph, though.
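As a rough illustration of the two options discussed in this thread (Sage's forced unmount and the lazy unmount suggested here), the sketch below calls the Linux umount2(2) system call first with MNT_FORCE and, if that fails, with MNT_DETACH. These are the flags that "umount -f" and "umount -l" ask the kernel for. The helper name unmount_escalating is hypothetical, and whether the ceph kernel client of that time actually honored MNT_FORCE is exactly what this thread questions. Note also that MNT_DETACH only removes the mount from the namespace; processes already stuck in "D" state on an unreachable server are not released by it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical escalation order: forced unmount first (like "umount -f"),
 * then lazy detach (like "umount -l"). */
static const int escalation_flags[] = { MNT_FORCE, MNT_DETACH };

/* Hypothetical helper. Returns 0 as soon as one attempt succeeds,
 * or -1 with errno set by the last failed umount2() call. */
int unmount_escalating(const char *target)
{
    size_t n = sizeof escalation_flags / sizeof escalation_flags[0];

    for (size_t i = 0; i < n; i++) {
        if (umount2(target, escalation_flags[i]) == 0)
            return 0;

        /* Save errno: fprintf() may overwrite it. */
        int saved_errno = errno;
        fprintf(stderr, "umount2(%s, %s) failed: %s\n", target,
                escalation_flags[i] == MNT_FORCE ? "MNT_FORCE" : "MNT_DETACH",
                strerror(saved_errno));
        errno = saved_errno;
    }
    return -1;
}
```

It would be called from a privileged process (umount2 needs CAP_SYS_ADMIN) as, for example, unmount_escalating("/mnt/ceph"), where the mount point is only a placeholder. The sketch deliberately skips a plain umount2(target, 0), because, as reported above, that call can block indefinitely. Forcing can also discard cached data that has not been written back, which matches Sage's preference for force as a manual intervention rather than an automatic timeout.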

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Thread overview: 11+ messages
2010-06-16 15:35 "umount" of ceph filesystem that has become unavailable hangs forever Peter Niemayer
2010-06-16 18:56 ` Sage Weil
2010-06-17 11:36   ` Thomas Mueller
  -- strict thread matches above, loose matches on Subject: below --
2010-07-23 11:36 Sébastien Paolacci
2010-07-23 11:42 ` Anton V.G.
2010-07-23 16:56 ` Sage Weil
2010-07-24  8:36   ` Sébastien Paolacci
2010-07-27 10:49     ` Anton VG
2010-07-27 17:18       ` Sage Weil
2010-07-23 11:43 Anton
2010-07-23 12:08 ` Sébastien Paolacci
