"umount" of ceph filesystem that has become unavailable hangs forever


* "umount" of ceph filesystem that has become unavailable hangs forever
@ 2010-06-16 15:35 Peter Niemayer
  2010-06-16 18:56 ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Niemayer @ 2010-06-16 15:35 UTC (permalink / raw)
  To: ceph-devel

Hi,

Trying to "umount" a previously mounted ceph filesystem that has become
unavailable (the osd crashed, then mds/mon were shut down using
/etc/init.d/ceph stop) results in "umount" hanging forever in
"D" state.

Strangely, "umount -f" started from another terminal reports
the ceph filesystem as not being mounted anymore, which is consistent
with what the mount-table says.
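
Both observations above (a process stuck in "D" state, and a mount table that no longer lists the filesystem) can be checked from a second terminal. The following is only a minimal sketch, assuming a Linux client with /proc mounted; the function names are made up for illustration and are not part of any Ceph tooling.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


def ceph_mountpoints(mounts_text):
    """Return the mount points of type "ceph" in /proc/mounts-style text."""
    # Fields are: source, mount point, filesystem type, options, dump, pass.
    # Spaces inside paths appear escaped (\040), so a plain split() is enough.
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ceph":
            points.append(fields[1])
    return points
```

On the affected client, `uninterruptible_processes()` would be expected to list the stuck umount, while `ceph_mountpoints(open("/proc/mounts").read())` would come back empty if the mount table has already dropped the entry.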

The kernel keeps emitting the following messages from time to time:
> Jun 16 17:25:29 gitega kernel: ceph:  tid 211912 timed out on osd0, will reset osd
> Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection failed
> Jun 16 17:26:15 gitega last message repeated 4 times

I would have expected the "umount" to terminate at least after some 
generous timeout.

Ceph should probably support something like the "soft,intr" options
of NFS, because if the only supported way of mounting is one where
a client is more or less stuck-until-reboot when the service fails,
many potential test-configurations involving Ceph are way too dangerous
to try...

Regards,

Peter Niemayer


* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-06-16 15:35 "umount" of ceph filesystem that has become unavailable hangs forever Peter Niemayer
@ 2010-06-16 18:56 ` Sage Weil
  2010-06-17 11:36   ` Thomas Mueller
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2010-06-16 18:56 UTC (permalink / raw)
  To: Peter Niemayer; +Cc: ceph-devel

Yeah, being able to force it to shut down when servers are unresponsive is 
definitely the intent.  'umount -f' should work.  It sounds like the 
problem is related to the initial 'umount' (which doesn't time out) 
followed by 'umount -f'.

I'm hesitant to add a blanket umount timeout, as that could prevent proper 
writeout of cached data/metadata in some cases.  So I think the goal 
should be that if a normal umount hangs for some reason, you should be 
able to intervene to add the 'force' if things don't go well.
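
The "normal umount first, force only if it hangs" workflow described above could be scripted roughly as follows. This is a sketch under assumptions, not an existing tool: the helper names, the 60-second default, and the extra fallback to a lazy "umount -l" are illustrative choices. Whether 'umount -f' actually succeeds after a plain umount has already hung is exactly what this thread leaves open.

```python
import subprocess


def run_with_deadline(cmd, timeout):
    """Run cmd; return its exit status, or None if it is still running
    after `timeout` seconds."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately no kill() followed by wait(): a process stuck in
        # state D does not act on SIGKILL until it wakes up, so waiting
        # for it here would hang this script as well.
        return None


def unmount_with_escalation(mountpoint, timeout=60.0, runner=run_with_deadline):
    """Try `umount`, then `umount -f`, then `umount -l`.

    Returns the flag that succeeded ("" for the plain umount), or None
    if every attempt failed or timed out.
    """
    for flag in ("", "-f", "-l"):
        cmd = ["umount"] + ([flag] if flag else []) + [mountpoint]
        if runner(cmd, timeout) == 0:
            return flag
    return None
```

Called as `unmount_with_escalation("/mnt")` by root, this gives the plain umount a chance to write out cached data before anything is forced. A timed-out attempt is left running in the background rather than killed.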

sage


* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-06-16 18:56 ` Sage Weil
@ 2010-06-17 11:36   ` Thomas Mueller
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Mueller @ 2010-06-17 11:36 UTC (permalink / raw)
  To: ceph-devel


> I'm hesitant to add a blanket umount timeout, as that could prevent
> proper writeout of cached data/metadata in some cases.  So I think the
> goal should be that if a normal umount hangs for some reason, you should
> be able to intervene to add the 'force' if things don't go well.

The option to force the umount if the regular umount hangs would be really
cool.

- Thomas



* Re: "umount" of ceph filesystem that has become unavailable hangs forever
@ 2010-07-23 11:36 Sébastien Paolacci
  2010-07-23 11:42 ` Anton V.G.
  2010-07-23 16:56 ` Sage Weil
  0 siblings, 2 replies; 11+ messages in thread
From: Sébastien Paolacci @ 2010-07-23 11:36 UTC (permalink / raw)
  To: ceph-devel

Hello Sage,

I would like to emphasize that this issue is somewhat annoying, even
for experimental purposes: I fully expect my test server to misbehave,
crash, burn or whatever, but a client-side impact so deep that it takes
a (hard) reboot to recover from a hung ceph mount really prevents me
from testing with real-life payloads.

I understand that it's not an easy problem, but a lot of my colleagues
are not really willing to sacrifice even their dev workstation to play
with it in their spare time... sad world ;)

Sebastien


* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-07-23 11:36 Sébastien Paolacci
@ 2010-07-23 11:42 ` Anton V.G.
  2010-07-23 16:56 ` Sage Weil
  1 sibling, 0 replies; 11+ messages in thread
From: Anton V.G. @ 2010-07-23 11:42 UTC (permalink / raw)
  To: Sébastien Paolacci; +Cc: ceph-devel

Did you try "umount -l" (lazy umount)? It should just detach the fs.
In my experience with other network filesystems, such as NFS or
Gluster, you can always run into difficulties like this with any of
them, and "-l" helps me there. Not sure about Ceph, though.


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-07-23 11:36 Sébastien Paolacci
  2010-07-23 11:42 ` Anton V.G.
@ 2010-07-23 16:56 ` Sage Weil
  2010-07-24  8:36   ` Sébastien Paolacci
  1 sibling, 1 reply; 11+ messages in thread
From: Sage Weil @ 2010-07-23 16:56 UTC (permalink / raw)
  To: Sébastien Paolacci; +Cc: ceph-devel

On Fri, 23 Jul 2010, Sébastien Paolacci wrote:
> Hello Sage,
> 
> I would like to emphasize that this issue is somewhat annoying, even
> for experimental purposes: I fully expect my test server to misbehave,
> crash, burn or whatever, but a client-side impact so deep that it takes
> a (hard) reboot to recover from a hung ceph mount really prevents me
> from testing with real-life payloads.

Maybe you can clarify for me exactly where the problem is.  'umount -f' 
should work.  'umount -l' should do a lazy unmount (detach from 
namespace), but the actual unmount code may currently hang.  It's 
debatable how that can/should be solved, since it's the 'sync' stage that 
hangs, and it's not clear we should ever 'give up' on that without an 
administrator telling us to (*).

What problem do you actually see, though?  Why does it matter, or why do 
you care, if the 'umount -l' leaves some kernel threads trying to umount?  
Is it just annoying because it Shouldn't Do That, or does it actually 
cause a problem for you?

It may be that if you try to remount the same fs, the old superblock gets 
reused, and the mount fails somehow... I haven't tried that.  That would 
be an easy fix, though.

Any clarification would be helpful!  Thanks-
sage


* Maybe a hook like /sys/kernel/debug/ceph/.../abort_sync that you can 
echo 1 to would be sufficient to make it give up on a sync (in the umount 
-l case, the sync prior to the actual unmount).
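
To make the footnote concrete: "echo 1" into such a hook is just a one-line write to a control file. The sketch below assumes the hook existed, which the thread only proposes. The real path is elided above, so the caller must supply it, and the demonstration uses a scratch file as a stand-in.

```python
import os
import tempfile


def request_abort_sync(hook_path):
    """Write "1" to a control file, as `echo 1 > hook_path` would.

    hook_path is hypothetical: the abort_sync hook is only a proposal in
    this thread, and its full debugfs path is not given.
    """
    with open(hook_path, "w") as handle:
        handle.write("1\n")


# Demonstration against a scratch file standing in for the proposed hook.
scratch = os.path.join(tempfile.mkdtemp(), "abort_sync")
request_abort_sync(scratch)
with open(scratch) as handle:
    print(repr(handle.read()))
```

Keeping the abort an explicit administrator action matches the concern above about never giving up on a sync silently.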



* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-07-23 16:56 ` Sage Weil
@ 2010-07-24  8:36   ` Sébastien Paolacci
  2010-07-27 10:49     ` Anton VG
  0 siblings, 1 reply; 11+ messages in thread
From: Sébastien Paolacci @ 2010-07-24  8:36 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello Sage,

I was just trying to revive an old thread, but I definitely agree that
I didn't make my point clear enough, sorry for that.

The general idea is that whatever happens server-side, it should be
possible to leave the client in a clean state. By clean I mean that,
except for data explicitly pushed to (or pulled from) the tested ceph
share, no other side effect of the test session should be visible.

The real issue with hung unmounts is obviously not the frozen console
but all the subsequent syncs, which are going to follow the same path
(and syncs do happen in real-life scenarios, e.g. when cleanly
halting/restarting a box).

Explicitly aborting the sync (by whatever means) is indeed an attractive
option that would nearly solve the problem without straying far from
the safe behavior one expects from a sync.

As a matter of convenience, however, if I had a few hundred nodes to
restart, I would expect the sync to abort automatically once a delay
that I take responsibility for has expired and the kclient is still
firmly convinced of ceph's tragic death.

So let's go back to a concrete failure case that can bother a client box ;) :
 - a brand-new, freshly formatted ceph instance is started.
 - the share is mounted on a separate box and one single file is
created (touch /mnt/test).
 - the ceph daemons are hard-killed (pkill -9 on cosd, cmds, cmon) and
the share is unmounted.
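
The steps above can be written down as a dry-run plan. This sketch only prints the commands and executes nothing. The `mount -t ceph <mon>:/ <dir>` form, the /mnt mount point and the monitor address (taken from the log lines in this message) are assumptions about the setup. Formatting and starting the ceph instance is left out because the thread does not say how it was done.

```python
def reproduction_plan(monitor="192.168.0.3:6789", mountpoint="/mnt"):
    """Return (host, command) pairs for the failure scenario, unexecuted."""
    return [
        # Assumed mount syntax; authentication options are omitted.
        ("client", ["mount", "-t", "ceph", monitor + ":/", mountpoint]),
        ("client", ["touch", mountpoint + "/test"]),
        # Daemon names as given in the list above.
        ("server", ["pkill", "-9", "cosd"]),
        ("server", ["pkill", "-9", "cmds"]),
        ("server", ["pkill", "-9", "cmon"]),
        # This is the step that is reported to hang.
        ("client", ["umount", mountpoint]),
    ]


for host, cmd in reproduction_plan():
    print(host + "$ " + " ".join(cmd))
```

Running such a plan for real is only advisable on a disposable client, since the last step is the one that is expected to wedge the box.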

The umount hangs "as expected", and if I wait long enough I eventually get:

Jul 24 09:31:16: [ 1163.642060] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
Jul 24 09:31:16: [ 1163.646098] ceph: client4099 fsid b003239e-a249-7c47-f7ca-a9b75da2a445
Jul 24 09:31:16: [ 1163.646353] ceph: mon0 192.168.0.3:6789 session established
Jul 24 09:32:05: [ 1213.290150] ceph: mon0 192.168.0.3:6789 session lost, hunting for new mon
Jul 24 09:33:01: [ 1269.227827] ceph: mds0 caps stale
Jul 24 09:33:16: [ 1284.219034] ceph: mds0 caps stale
Jul 24 09:35:52: [ 1439.844419] umount        D 0000000000000000     0  2819   2788 0x00000000
Jul 24 09:35:52: [ 1439.844425]  ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
Jul 24 09:35:52: [ 1439.844430]  0000000000015640 0000000000015640 000000000000f8a0 ffff880124ef1fd8
Jul 24 09:35:52: [ 1439.844435]  0000000000015640 0000000000015640 ffff880086c8b170 ffff880086c8b468
Jul 24 09:35:52: [ 1439.844439] Call Trace:
Jul 24 09:35:52: [ 1439.844455]  [<ffffffffa051b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
Jul 24 09:35:52: [ 1439.844462]  [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
Jul 24 09:35:52: [ 1439.844473]  [<ffffffffa05210ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
Jul 24 09:35:52: [ 1439.844479]  [<ffffffffa050931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
Jul 24 09:35:52: [ 1439.844485]  [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
Jul 24 09:35:52: [ 1439.844489]  [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
Jul 24 09:35:52: [ 1439.844494]  [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
Jul 24 09:35:52: [ 1439.844498]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
Jul 24 09:35:52: [ 1439.844505]  [<ffffffffa05082ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
Jul 24 09:35:52: [ 1439.844509]  [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
Jul 24 09:35:52: [ 1439.844514]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
Jul 24 09:35:52: [ 1439.844521]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Jul 24 09:37:06: [ 1514.085107] ceph: mds0 hung
Jul 24 09:37:52: [ 1559.774508] umount        D 0000000000000000     0  2819   2788 0x00000000
Jul 24 09:37:52: [ 1559.774514]  ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
Jul 24 09:37:52: [ 1559.774519]  0000000000015640 0000000000015640 000000000000f8a0 ffff880124ef1fd8
Jul 24 09:37:52: [ 1559.774524]  0000000000015640 0000000000015640 ffff880086c8b170 ffff880086c8b468
Jul 24 09:37:52: [ 1559.774528] Call Trace:
Jul 24 09:37:52: [ 1559.774545]  [<ffffffffa051b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
Jul 24 09:37:52: [ 1559.774552]  [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
Jul 24 09:37:52: [ 1559.774562]  [<ffffffffa05210ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
Jul 24 09:37:52: [ 1559.774569]  [<ffffffffa050931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
Jul 24 09:37:52: [ 1559.774574]  [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
Jul 24 09:37:52: [ 1559.774578]  [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
Jul 24 09:37:52: [ 1559.774584]  [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
Jul 24 09:37:52: [ 1559.774589]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
Jul 24 09:37:52: [ 1559.774595]  [<ffffffffa05082ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
Jul 24 09:37:52: [ 1559.774600]  [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
Jul 24 09:37:52: [ 1559.774604]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
Jul 24 09:37:52: [ 1559.774612]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
(... repeating forever ...)
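
For readers who want the call path without the addresses, the frames in a report like the one above can be pulled out with a few lines of Python. This is only an illustrative sketch: the regular expression is an assumption about this particular log layout, and the sample is a short excerpt of the trace, kept in the line-wrapped form in which it was mailed.

```python
import re

# One frame looks like: "? symbol+0xOFFSET/0xSIZE [module]" (module optional).
FRAME = re.compile(
    r"\?\s+([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def frames(trace_text):
    """Return (symbol, module) pairs, innermost frame first."""
    # Mail clients wrapped the long log lines, so collapse all whitespace
    # before matching.
    flat = " ".join(trace_text.split())
    return [(m.group(1), m.group(2)) for m in FRAME.finditer(flat)]


SAMPLE = """\
Jul 24 09:35:52: [ 1439.844455]  [<ffffffffa051b740>] ?
ceph_mdsc_sync+0x1be/0x1da [ceph]
Jul 24 09:35:52: [ 1439.844485]  [<ffffffff8110b065>] ?
__sync_filesystem+0x5f/0x70
Jul 24 09:35:52: [ 1439.844514]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
"""

for symbol, module in reversed(frames(SAMPLE)):
    print(symbol, "(" + module + ")" if module else "")
```

Printed outermost first, the excerpt reads sys_umount, __sync_filesystem, ceph_mdsc_sync, i.e. the umount is waiting in the sync stage inside the ceph module. The "?" in front of each frame marks an entry the kernel's unwinder could not verify, so individual entries may be stale.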

The box now has to be hard powered off, and an fsck will possibly
follow the restart...

I'm not saying that this situation is unexpected when testing a system
that is not production-ready; I'm just trying to emphasize that client
safety may actually be what keeps more people from giving it a try.

Hope this clarifies,
Sebastien



* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-07-24  8:36   ` Sébastien Paolacci
@ 2010-07-27 10:49     ` Anton VG
  2010-07-27 17:18       ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Anton VG @ 2010-07-27 10:49 UTC (permalink / raw)
  To: Sébastien Paolacci; +Cc: Sage Weil, ceph-devel

Sage, it looks logical that if the user issues "umount -l", the code
should give up syncing and clear the state. Or possibly there should
be a /proc/...whatever or /sys/...whatever setting to define a default
timeout after which to give up syncing.

2010/7/24 Sébastien Paolacci <sebastien.paolacci@gmail.com>:
> Hello Sage,
>
> I was just trying to relive an old thread but I definitely agree that
> I didn't make my point clear enough, sorry for that.
>
> The global idea is that whatever happen server-side, the client should
> be able to be left in a clean state. By clean I mean that, except data
> explicitly pushed to (pulled from) the tested ceph share, no other
> side effect from the test session should be visible.
>
> The real issue with hanged unmounts is obviously not with the console
> been frozen but with all the subsequent syncs that are going to follow
> the same path (and syncs do happen in a real life scenarios, e.g. when
> softly halting/restarting a box).
>
> Explicitly aborting the sync (whatever the way) is indeed a seductive
> option that would almost solve the point without going so far from a
> sync decent safe behavior.
>
> As a matter of convenience, should I just have a few hundred nodes to
> restart, I would however expect the sync to automatically abort
> because a delay I take the responsibility for as expired and the
> kclient is still deeply confident with the ceph tragic dead.
>
> So let's go back to a concrete failure case that can bother a client box ;) :
>  - a fresh new and just formated ceph instance is started.
>  - the share is mounted on a separate box and one single file is
> created (touch /mnt/test).
>  - ceph daemons are hardly killed (pkill -9 on cosd, cmds, cmon) and
> the share is unmonted.
>
> The umount hang "as expected", but If I wait long enough I'll eventually get a
>
> Jul 24 09:31:16: [ 1163.642060] ceph: loaded (mon/mds/osd proto
> 15/32/24, osdmap 5/5 5/5)
> Jul 24 09:31:16: [ 1163.646098] ceph: client4099 fsid
> b003239e-a249-7c47-f7ca-a9b75da2a445
> Jul 24 09:31:16: [ 1163.646353] ceph: mon0 192.168.0.3:6789 session established
> Jul 24 09:32:05: [ 1213.290150] ceph: mon0 192.168.0.3:6789 session
> lost, hunting for new mon
> Jul 24 09:33:01: [ 1269.227827] ceph: mds0 caps stale
> Jul 24 09:33:16: [ 1284.219034] ceph: mds0 caps stale
> Jul 24 09:35:52: [ 1439.844419] umount        D 0000000000000000     0
>  2819   2788 0x00000000
> Jul 24 09:35:52: [ 1439.844425]  ffff880127a5b880 0000000000000086
> 0000000000000000 0000000000015640
> Jul 24 09:35:52: [ 1439.844430]  0000000000015640 0000000000015640
> 000000000000f8a0 ffff880124ef1fd8
> Jul 24 09:35:52: [ 1439.844435]  0000000000015640 0000000000015640
> ffff880086c8b170 ffff880086c8b468
> Jul 24 09:35:52: [ 1439.844439] Call Trace:
> Jul 24 09:35:52: [ 1439.844455]  [<ffffffffa051b740>] ?
> ceph_mdsc_sync+0x1be/0x1da [ceph]
> Jul 24 09:35:52: [ 1439.844462]  [<ffffffff81064afa>] ?
> autoremove_wake_function+0x0/0x2e
> Jul 24 09:35:52: [ 1439.844473]  [<ffffffffa05210ac>] ?
> ceph_osdc_sync+0x1d/0xc1 [ceph]
> Jul 24 09:35:52: [ 1439.844479]  [<ffffffffa050931f>] ?
> ceph_syncfs+0x2a/0x2e [ceph]
> Jul 24 09:35:52: [ 1439.844485]  [<ffffffff8110b065>] ?
> __sync_filesystem+0x5f/0x70
> Jul 24 09:35:52: [ 1439.844489]  [<ffffffff8110b1de>] ?
> sync_filesystem+0x2e/0x44
> Jul 24 09:35:52: [ 1439.844494]  [<ffffffff810efdfa>] ?
> generic_shutdown_super+0x21/0xfa
> Jul 24 09:35:52: [ 1439.844498]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
> Jul 24 09:35:52: [ 1439.844505]  [<ffffffffa05082ab>] ?
> ceph_kill_sb+0x24/0x47 [ceph]
> Jul 24 09:35:52: [ 1439.844509]  [<ffffffff810f05c5>] ?
> deactivate_super+0x60/0x77
> Jul 24 09:35:52: [ 1439.844514]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
> Jul 24 09:35:52: [ 1439.844521]  [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
> Jul 24 09:37:06: [ 1514.085107] ceph: mds0 hung
> Jul 24 09:37:52: [ 1559.774508] umount        D 0000000000000000     0
>  2819   2788 0x00000000
> Jul 24 09:37:52: [ 1559.774514]  ffff880127a5b880 0000000000000086
> 0000000000000000 0000000000015640
> Jul 24 09:37:52: [ 1559.774519]  0000000000015640 0000000000015640
> 000000000000f8a0 ffff880124ef1fd8
> Jul 24 09:37:52: [ 1559.774524]  0000000000015640 0000000000015640
> ffff880086c8b170 ffff880086c8b468
> Jul 24 09:37:52: [ 1559.774528] Call Trace:
> Jul 24 09:37:52: [ 1559.774545]  [<ffffffffa051b740>] ?
> ceph_mdsc_sync+0x1be/0x1da [ceph]
> Jul 24 09:37:52: [ 1559.774552]  [<ffffffff81064afa>] ?
> autoremove_wake_function+0x0/0x2e
> Jul 24 09:37:52: [ 1559.774562]  [<ffffffffa05210ac>] ?
> ceph_osdc_sync+0x1d/0xc1 [ceph]
> Jul 24 09:37:52: [ 1559.774569]  [<ffffffffa050931f>] ?
> ceph_syncfs+0x2a/0x2e [ceph]
> Jul 24 09:37:52: [ 1559.774574]  [<ffffffff8110b065>] ?
> __sync_filesystem+0x5f/0x70
> Jul 24 09:37:52: [ 1559.774578]  [<ffffffff8110b1de>] ?
> sync_filesystem+0x2e/0x44
> Jul 24 09:37:52: [ 1559.774584]  [<ffffffff810efdfa>] ?
> generic_shutdown_super+0x21/0xfa
> Jul 24 09:37:52: [ 1559.774589]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
> Jul 24 09:37:52: [ 1559.774595]  [<ffffffffa05082ab>] ?
> ceph_kill_sb+0x24/0x47 [ceph]
> Jul 24 09:37:52: [ 1559.774600]  [<ffffffff810f05c5>] ?
> deactivate_super+0x60/0x77
> Jul 24 09:37:52: [ 1559.774604]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
> Jul 24 09:37:52: [ 1559.774612]  [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
> (... repeating forever ...)
>
> The box now as to be hardly powered off and a fsck will possibly
> follow the restart...
>
> I'm not saying that this situation is not to be expected when testing
> a not prod ready system, I'm just trying to emphasize that client
> safety may actually be a blocking point for some more people to give a
> try.
>
> Hope this clarifies,
> Sebastien
>
>
> 2010/7/23 Sage Weil <sage@newdream.net>:
>> On Fri, 23 Jul 2010, Sébastien Paolacci wrote:
>>> Hello Sage,
>>>
>>> I would like to emphasize that this issue is somewhat annoying, even
>>> for experiment purpose: I definitely expect my test server to not
>>> behave safely, crash, burn or whatever, but having a client side
>>> impact as deep as needed a (hard) reboot to solved a hanged ceph
>>> really prevent me from testing with real life payloads.
>>
>> Maybe you can clarify for me exactly where the problem is.  'umount -f'
>> should work.  'umount -l' should do a lazy unmount (detach from
>> namespace), but the actual unmount code may currently hang.  It's
>> debateable how that can/should be solved, since it's the 'sync' stage that
>> hangs, and it's not clear we should ever 'give up' on that without an
>> administrator telling us to (*).
>>
>> What problem do you actually see, though?  Why does it matter, or why do
>> you care, if the 'umount -l' leaves some kernel threads trying to umount?
>> Is it just annoying because it Shouldn't Do That, or does it actually
>> cause a problem for you?
>>
>> It may be that if you try to remount the same fs, the old superblock gets
>> reused, and the mount fails somehow... I haven't tried that.  That would
>> be an easy fix, though.
>>
>> Any clarification would be helpful!  Thanks-
>> sage
>>
>>
>> * Maybe a hook like /sys/kernel/debug/ceph/.../abort_sync that you can
>> echo 1 to would be sufficient to make it give up on a sync (in the umount
>> -l case, the sync prior to the actual unmount).
>>
>>
>>>
>>> I understand that it's not an easy point but a lot of my colleagues
>>> are not really whiling to sacrifice even their dev workstation to play
>>> during spare time... sad world ;)
>>>
>>> Sebastien
>>>
>>> On Wed, 16 Jun 2010, Peter Niemayer wrote:
>>> > Hi,
>>> >
>>> > trying to "umount" a formerly mounted ceph filesystem that has become
>>> > unavailable (osd crashed, then msd/mon were shut down using /etc/init.d/ceph
>>> > stop) results in "umount" hanging forever in
>>> > "D" state.
>>> >
>>> > Strangely, "umount -f" started from another terminal reports
>>> > the ceph filesystem as not being mounted anymore, which is consistent
>>> > with what the mount-table says.
>>> >
>>> > The kernel keeps emitting the following messages from time to time:
>>> > > Jun 16 17:25:29 gitega kernel: ceph:  tid 211912 timed out on osd0, will
>>> > > reset osd
>>> > > Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection
>>> > > failed
>>> > > Jun 16 17:26:15 gitega last message repeated 4 times
>>> >
>>> > I would have expected the "umount" to terminate at least after some generous
>>> > timeout.
>>> >
>>> > Ceph should probably support something like the "soft,intr" options
>>> > of NFS, because if the only supported way of mounting is one where
>>> > a client is more or less stuck-until-reboot when the service fails,
>>> > many potential test-configurations involving Ceph are way too dangerous
>>> > to try...
>>>
>>> Yeah, being able to force it to shut down when servers are unresponsive is
>>> definitely the intent.  'umount -f' should work.  It sounds like the
>>> problem is related to the initial 'umount' (which doesn't time out)
>>> followed by 'umount -f'.
>>>
>>> I'm hesitant to add a blanket umount timeout, as that could prevent proper
>>> writeout of cached data/metadata in some cases.  So I think the goal
>>> should be that if a normal umount hangs for some reason, you should be
>>> able to intervene to add the 'force' if things don't go well.
>>>
>>> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-07-27 10:49     ` Anton VG
@ 2010-07-27 17:18       ` Sage Weil
  0 siblings, 0 replies; 11+ messages in thread
From: Sage Weil @ 2010-07-27 17:18 UTC (permalink / raw)
  To: Anton VG; +Cc: Sébastien Paolacci, ceph-devel

On Tue, 27 Jul 2010, Anton VG wrote:
> Sage, it looks logical that if the user issues "umount -l", the code
> should give up syncing and clear the state. Or possibly there should
> be a /proc/...whatever or /sys/...whatever setting to define a default
> timeout to give up syncing.

Yeah, I suspect blanket timeouts are going to be the only way to
really resolve this.  I played around with it a bit yesterday, and the
problem is that even if I make the ceph sync_fs hooks time out (or make
them killable via SIGKILL), a 'sync' still hangs in the generic VFS code
when it tries to write out dirty inodes.

I think a 'soft' mount option that allows any server operation to time
out is the way to go.  Currently we behave like nfs's 'hard':

       soft           If an NFS file operation has a major timeout then report
                      an I/O error to the calling program.  The default is  to
                      continue retrying NFS file operations indefinitely.

       hard           If an NFS file operation has a major timeout then report
                      "server not responding"  on  the  console  and  continue
                      retrying indefinitely.  This is the default.

This is http://tracker.newdream.net/issues/206.
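
The difference between the two behaviours can be sketched in a few lines
of Python.  This only illustrates the retry semantics quoted above; it is
not Ceph or NFS client code.  ServerUnresponsive, call_server, make_flaky
and max_retries are hypothetical names, and a real client would track
elapsed time rather than count attempts.

```python
import errno


class ServerUnresponsive(Exception):
    """Raised by the (hypothetical) transport when one attempt times out."""


def call_server(operation, *, soft, max_retries=3, log=print):
    """Run `operation` with 'hard' or 'soft' retry semantics.

    hard: retry forever, logging "server not responding" on each timeout.
    soft: after `max_retries` timed-out attempts, give up and raise
          OSError(EIO) so the caller sees an I/O error instead of hanging.
    """
    attempts = 0
    while True:
        try:
            return operation()
        except ServerUnresponsive:
            attempts += 1
            log("server not responding (attempt %d)" % attempts)
            if soft and attempts >= max_retries:
                raise OSError(errno.EIO, "server operation timed out")


def make_flaky(failures):
    """Return an operation that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise ServerUnresponsive()
        return "ok"

    return operation
```

With soft=False the loop only ends once the server answers, which is the
behaviour described above as the current one; with soft=True the caller
gets EIO after a bounded number of timeouts and can decide what to do.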

sage


> 
> 2010/7/24 Sébastien Paolacci <sebastien.paolacci@gmail.com>:
> > Hello Sage,
> >
> > I was just trying to revive an old thread, but I definitely agree that
> > I didn't make my point clear enough, sorry for that.
> >
> > The global idea is that whatever happens server-side, the client should
> > be able to be left in a clean state. By clean I mean that, except for
> > data explicitly pushed to (or pulled from) the tested ceph share, no
> > other side effect from the test session should be visible.
> >
> > The real issue with hung unmounts is obviously not the console being
> > frozen but all the subsequent syncs that are going to follow the same
> > path (and syncs do happen in real-life scenarios, e.g. when softly
> > halting/restarting a box).
> >
> > Explicitly aborting the sync (whatever the way) is indeed a seductive
> > option that would almost solve the point without straying too far from
> > decent, safe sync behavior.
> >
> > As a matter of convenience, should I have a few hundred nodes to
> > restart, I would however expect the sync to abort automatically
> > because a delay I take responsibility for has expired and the
> > kclient is still firmly convinced that ceph is tragically dead.
> >
> > So let's go back to a concrete failure case that can bother a client box ;) :
> >  - a fresh, just-formatted ceph instance is started.
> >  - the share is mounted on a separate box and one single file is
> > created (touch /mnt/test).
> >  - the ceph daemons are hard-killed (pkill -9 on cosd, cmds, cmon) and
> > the share is unmounted.
> >
> > The umount hangs "as expected", but if I wait long enough I'll eventually get a
> >
> > Jul 24 09:31:16: [ 1163.642060] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
> > Jul 24 09:31:16: [ 1163.646098] ceph: client4099 fsid b003239e-a249-7c47-f7ca-a9b75da2a445
> > Jul 24 09:31:16: [ 1163.646353] ceph: mon0 192.168.0.3:6789 session established
> > Jul 24 09:32:05: [ 1213.290150] ceph: mon0 192.168.0.3:6789 session lost, hunting for new mon
> > Jul 24 09:33:01: [ 1269.227827] ceph: mds0 caps stale
> > Jul 24 09:33:16: [ 1284.219034] ceph: mds0 caps stale
> > Jul 24 09:35:52: [ 1439.844419] umount        D 0000000000000000     0  2819   2788 0x00000000
> > Jul 24 09:35:52: [ 1439.844425]  ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
> > Jul 24 09:35:52: [ 1439.844430]  0000000000015640 0000000000015640 000000000000f8a0 ffff880124ef1fd8
> > Jul 24 09:35:52: [ 1439.844435]  0000000000015640 0000000000015640 ffff880086c8b170 ffff880086c8b468
> > Jul 24 09:35:52: [ 1439.844439] Call Trace:
> > Jul 24 09:35:52: [ 1439.844455]  [<ffffffffa051b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
> > Jul 24 09:35:52: [ 1439.844462]  [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
> > Jul 24 09:35:52: [ 1439.844473]  [<ffffffffa05210ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
> > Jul 24 09:35:52: [ 1439.844479]  [<ffffffffa050931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
> > Jul 24 09:35:52: [ 1439.844485]  [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
> > Jul 24 09:35:52: [ 1439.844489]  [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
> > Jul 24 09:35:52: [ 1439.844494]  [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
> > Jul 24 09:35:52: [ 1439.844498]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
> > Jul 24 09:35:52: [ 1439.844505]  [<ffffffffa05082ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
> > Jul 24 09:35:52: [ 1439.844509]  [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
> > Jul 24 09:35:52: [ 1439.844514]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
> > Jul 24 09:35:52: [ 1439.844521]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> > Jul 24 09:37:06: [ 1514.085107] ceph: mds0 hung
> > Jul 24 09:37:52: [ 1559.774508] umount        D 0000000000000000     0  2819   2788 0x00000000
> > Jul 24 09:37:52: [ 1559.774514]  ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
> > Jul 24 09:37:52: [ 1559.774519]  0000000000015640 0000000000015640 000000000000f8a0 ffff880124ef1fd8
> > Jul 24 09:37:52: [ 1559.774524]  0000000000015640 0000000000015640 ffff880086c8b170 ffff880086c8b468
> > Jul 24 09:37:52: [ 1559.774528] Call Trace:
> > Jul 24 09:37:52: [ 1559.774545]  [<ffffffffa051b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
> > Jul 24 09:37:52: [ 1559.774552]  [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
> > Jul 24 09:37:52: [ 1559.774562]  [<ffffffffa05210ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
> > Jul 24 09:37:52: [ 1559.774569]  [<ffffffffa050931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
> > Jul 24 09:37:52: [ 1559.774574]  [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
> > Jul 24 09:37:52: [ 1559.774578]  [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
> > Jul 24 09:37:52: [ 1559.774584]  [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
> > Jul 24 09:37:52: [ 1559.774589]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
> > Jul 24 09:37:52: [ 1559.774595]  [<ffffffffa05082ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
> > Jul 24 09:37:52: [ 1559.774600]  [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
> > Jul 24 09:37:52: [ 1559.774604]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
> > Jul 24 09:37:52: [ 1559.774612]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> > (... repeating forever ...)
> >
> > The box now has to be hard powered off, and a fsck will possibly
> > follow the restart...
> >
> > I'm not saying that this situation is not to be expected when testing
> > a system that is not production-ready; I'm just trying to emphasize
> > that client safety may actually be a blocking point for some more
> > people to give it a try.
> >
> > Hope this clarifies,
> > Sebastien
> >
> >
> > 2010/7/23 Sage Weil <sage@newdream.net>:
> >> On Fri, 23 Jul 2010, Sébastien Paolacci wrote:
> >>> Hello Sage,
> >>>
> >>> I would like to emphasize that this issue is somewhat annoying, even
> >>> for experimental purposes: I definitely expect my test server to not
> >>> behave safely, crash, burn or whatever, but a client-side impact so
> >>> deep that it takes a (hard) reboot to recover from a hung ceph
> >>> really prevents me from testing with real-life payloads.
> >>
> >> Maybe you can clarify for me exactly where the problem is.  'umount -f'
> >> should work.  'umount -l' should do a lazy unmount (detach from
> >> namespace), but the actual unmount code may currently hang.  It's
> >> debatable how that can/should be solved, since it's the 'sync' stage that
> >> hangs, and it's not clear we should ever 'give up' on that without an
> >> administrator telling us to (*).
> >>
> >> What problem do you actually see, though?  Why does it matter, or why do
> >> you care, if the 'umount -l' leaves some kernel threads trying to umount?
> >> Is it just annoying because it Shouldn't Do That, or does it actually
> >> cause a problem for you?
> >>
> >> It may be that if you try to remount the same fs, the old superblock gets
> >> reused, and the mount fails somehow... I haven't tried that.  That would
> >> be an easy fix, though.
> >>
> >> Any clarification would be helpful!  Thanks-
> >> sage
> >>
> >>
> >> * Maybe a hook like /sys/kernel/debug/ceph/.../abort_sync that you can
> >> echo 1 to would be sufficient to make it give up on a sync (in the umount
> >> -l case, the sync prior to the actual unmount).
> >>
> >>
> >>>
> >>> I understand that it's not an easy point, but a lot of my colleagues
> >>> are not really willing to sacrifice even their dev workstation to play
> >>> during spare time... sad world ;)
> >>>
> >>> Sebastien
> >>>
> >>> On Wed, 16 Jun 2010, Peter Niemayer wrote:
> >>> > Hi,
> >>> >
> >>> > trying to "umount" a formerly mounted ceph filesystem that has become
> >>> > unavailable (osd crashed, then mds/mon were shut down using /etc/init.d/ceph
> >>> > stop) results in "umount" hanging forever in
> >>> > "D" state.
> >>> >
> >>> > Strangely, "umount -f" started from another terminal reports
> >>> > the ceph filesystem as not being mounted anymore, which is consistent
> >>> > with what the mount-table says.
> >>> >
> >>> > The kernel keeps emitting the following messages from time to time:
> >>> > > Jun 16 17:25:29 gitega kernel: ceph:  tid 211912 timed out on osd0, will
> >>> > > reset osd
> >>> > > Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection
> >>> > > failed
> >>> > > Jun 16 17:26:15 gitega last message repeated 4 times
> >>> >
> >>> > I would have expected the "umount" to terminate at least after some generous
> >>> > timeout.
> >>> >
> >>> > Ceph should probably support something like the "soft,intr" options
> >>> > of NFS, because if the only supported way of mounting is one where
> >>> > a client is more or less stuck-until-reboot when the service fails,
> >>> > many potential test-configurations involving Ceph are way too dangerous
> >>> > to try...
> >>>
> >>> Yeah, being able to force it to shut down when servers are unresponsive is
> >>> definitely the intent.  'umount -f' should work.  It sounds like the
> >>> problem is related to the initial 'umount' (which doesn't time out)
> >>> followed by 'umount -f'.
> >>>
> >>> I'm hesitant to add a blanket umount timeout, as that could prevent proper
> >>> writeout of cached data/metadata in some cases.  So I think the goal
> >>> should be that if a normal umount hangs for some reason, you should be
> >>> able to intervene to add the 'force' if things don't go well.
> >>>
> >>> sage

* Re: "umount" of ceph filesystem that has become unavailable hangs forever
@ 2010-07-23 11:43 Anton
  2010-07-23 12:08 ` Sébastien Paolacci
  0 siblings, 1 reply; 11+ messages in thread
From: Anton @ 2010-07-23 11:43 UTC (permalink / raw)
  To: Sébastien Paolacci; +Cc: ceph-devel

Did you try an umount -l (lazy umount)? It should just
disconnect the fs. In my experience with other network
filesystems like NFS or Gluster, you may always have
difficulties with any of them, so "-l" helps me. Not sure
about Ceph though.
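
A minimal sketch of scripting that fallback, assuming a Linux umount
that accepts -f (force) and -l (lazy).  run_with_deadline and
try_unmount are hypothetical helper names, the 30-second deadline is
arbitrary, and none of this has been checked against a dead Ceph
cluster; as noted elsewhere in this thread, a lazy unmount may still
leave the sync hanging inside the kernel.

```python
import subprocess


def run_with_deadline(cmd, deadline_s):
    """Start `cmd` and wait at most `deadline_s` seconds for it.

    Returns the exit status, or None if the command is still running.
    A process stuck in uninterruptible "D" state cannot be killed, so on
    timeout the child is deliberately left behind instead of waited on.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        return None


def try_unmount(mountpoint, deadline_s=30.0, flags=("-f", "-l"),
                run=run_with_deadline):
    """Try each umount flag in turn; return the first flag whose umount
    exits with status 0, or None if every attempt failed or timed out."""
    for flag in flags:
        if run(["umount", flag, mountpoint], deadline_s) == 0:
            return flag
    return None
```

Calling try_unmount("/mnt/test") as root would first try a forced
unmount and fall back to a lazy one.  The calling script never blocks
for longer than the deadline per attempt, but a hung umount child may
stay around until the box is rebooted.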

On Friday 23 July 2010, Sébastien Paolacci wrote:
> Hello Sage,
> 
> I would like to emphasize that this issue is somewhat
> annoying, even for experimental purposes: I definitely
> expect my test server to not behave safely, crash, burn
> or whatever, but a client-side impact so deep that it
> takes a (hard) reboot to recover from a hung ceph really
> prevents me from testing with real-life payloads.
> 
> I understand that it's not an easy point, but a lot of my
> colleagues are not really willing to sacrifice even
> their dev workstation to play during spare time... sad
> world ;)
> 
> Sebastien
> 
> On Wed, 16 Jun 2010, Peter Niemayer wrote:
> > Hi,
> > 
> > trying to "umount" a formerly mounted ceph filesystem
> > that has become unavailable (osd crashed, then mds/mon
> > were shut down using /etc/init.d/ceph stop) results in
> > "umount" hanging forever in
> > "D" state.
> > 
> > Strangely, "umount -f" started from another terminal
> > reports the ceph filesystem as not being mounted
> > anymore, which is consistent with what the mount-table
> > says.
> > 
> > The kernel keeps emitting the following messages from
> > time to time:
> > > Jun 16 17:25:29 gitega kernel: ceph:  tid 211912
> > > timed out on osd0, will reset osd
> > > Jun 16 17:25:35 gitega kernel: ceph: mon0
> > > 10.166.166.1:6789 connection failed
> > > Jun 16 17:26:15 gitega last message repeated 4 times
> > 
> > I would have expected the "umount" to terminate at
> > least after some generous timeout.
> > 
> > Ceph should probably support something like the
> > "soft,intr" options of NFS, because if the only
> > supported way of mounting is one where a client is
> > more or less stuck-until-reboot when the service
> > fails, many potential test-configurations involving
> > Ceph are way too dangerous to try...
> 
> Yeah, being able to force it to shut down when servers
> are unresponsive is definitely the intent.  'umount -f'
> should work.  It sounds like the problem is related to
> the initial 'umount' (which doesn't time out) followed
> by 'umount -f'.
> 
> I'm hesitant to add a blanket umount timeout, as that
> could prevent proper writeout of cached data/metadata in
> some cases.  So I think the goal should be that if a
> normal umount hangs for some reason, you should be able
> to intervene to add the 'force' if things don't go well.
> 
> sage

* Re: "umount" of ceph filesystem that has become unavailable hangs forever
  2010-07-23 11:43 Anton
@ 2010-07-23 12:08 ` Sébastien Paolacci
  0 siblings, 0 replies; 11+ messages in thread
From: Sébastien Paolacci @ 2010-07-23 12:08 UTC (permalink / raw)
  To: Anton; +Cc: ceph-devel

Hello Anton,

Thanks for the tip, I'll give it a try. I'm however afraid that it won't
solve all hard ceph deaths, since some umounts eventually end with a
weird

Jul 23 13:36:34 kernel: [ 2188.974338] ceph: mds0 caps stale
Jul 23 13:36:49 kernel: [ 2203.969716] ceph: mds0 caps stale
Jul 23 13:38:05 kernel: [ 2279.665552] umount        D ffff88000524f8e0     0  3042   2635 0x00000000
Jul 23 13:38:05 kernel: [ 2279.665558]  ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
Jul 23 13:38:05 kernel: [ 2279.665563]  0000000000015640 0000000000015640 000000000000f8a0 ffff880095e07fd8
Jul 23 13:38:05 kernel: [ 2279.665568]  0000000000015640 0000000000015640 ffff880084c1f810 ffff880084c1fb08
Jul 23 13:38:05 kernel: [ 2279.665572] Call Trace:
Jul 23 13:38:05 kernel: [ 2279.665588]  [<ffffffffa050b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
Jul 23 13:38:05 kernel: [ 2279.665596]  [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
Jul 23 13:38:05 kernel: [ 2279.665606]  [<ffffffffa05110ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
Jul 23 13:38:05 kernel: [ 2279.665613]  [<ffffffffa04f931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
Jul 23 13:38:05 kernel: [ 2279.665618]  [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
Jul 23 13:38:05 kernel: [ 2279.665622]  [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
Jul 23 13:38:05 kernel: [ 2279.665627]  [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
Jul 23 13:38:05 kernel: [ 2279.665631]  [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
Jul 23 13:38:05 kernel: [ 2279.665638]  [<ffffffffa04f82ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
Jul 23 13:38:05 kernel: [ 2279.665642]  [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
Jul 23 13:38:05 kernel: [ 2279.665647]  [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
Jul 23 13:38:05 kernel: [ 2279.665654]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
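
For anyone collecting several of these dumps, the frames can be pulled
out of the log text with a small script.  This is a rough sketch that
assumes the frame layout shown above ("[<address>] ? symbol+0xoff/0xlen
[module]"); FRAME_RE and parse_frames are hypothetical names, and other
kernel versions may print frames differently.

```python
import re

# Matches one stack frame such as
#   [<ffffffffa050b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
# The \s+ after "?" also spans a line break, so frames that a mailer
# wrapped onto two lines are still recognised.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+\?\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs in the order they appear in the log.

    `module` is None for frames that carry no [module] tag.
    """
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(log_text)]
```

Run over the trace above, the result starts with ceph_mdsc_sync in the
ceph module and passes through ceph_syncfs, sync_filesystem and
generic_shutdown_super before reaching sys_umount, i.e. the umount is
sitting in the sync done as part of shutting down the superblock.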

It should, however, possibly help in the clean ceph shutdown case.

Thanks,
Sebastien

2010/7/23 Anton <anton.vazir@gmail.com>:
> Did you try an umount -l (lazy umount)? It should just
> disconnect the fs. In my experience with other network
> filesystems like NFS or Gluster, you may always have
> difficulties with any of them, so "-l" helps me. Not sure
> about Ceph though.
>
> On Friday 23 July 2010, Sébastien Paolacci wrote:
>> Hello Sage,
>>
>> I would like to emphasize that this issue is somewhat
>> annoying, even for experimental purposes: I definitely
>> expect my test server to not behave safely, crash, burn
>> or whatever, but a client-side impact so deep that it
>> takes a (hard) reboot to recover from a hung ceph really
>> prevents me from testing with real-life payloads.
>>
>> I understand that it's not an easy point, but a lot of my
>> colleagues are not really willing to sacrifice even
>> their dev workstation to play during spare time... sad
>> world ;)
>>
>> Sebastien
>>
>> On Wed, 16 Jun 2010, Peter Niemayer wrote:
>> > Hi,
>> >
>> > trying to "umount" a formerly mounted ceph filesystem
>> > that has become unavailable (osd crashed, then mds/mon
>> > were shut down using /etc/init.d/ceph stop) results in
>> > "umount" hanging forever in
>> > "D" state.
>> >
>> > Strangely, "umount -f" started from another terminal
>> > reports the ceph filesystem as not being mounted
>> > anymore, which is consistent with what the mount-table
>> > says.
>> >
>> > The kernel keeps emitting the following messages from
>> > time to time:
>> > > Jun 16 17:25:29 gitega kernel: ceph:  tid 211912
>> > > timed out on osd0, will reset osd
>> > > Jun 16 17:25:35 gitega kernel: ceph: mon0
>> > > 10.166.166.1:6789 connection failed
>> > > Jun 16 17:26:15 gitega last message repeated 4 times
>> >
>> > I would have expected the "umount" to terminate at
>> > least after some generous timeout.
>> >
>> > Ceph should probably support something like the
>> > "soft,intr" options of NFS, because if the only
>> > supported way of mounting is one where a client is
>> > more or less stuck-until-reboot when the service
>> > fails, many potential test-configurations involving
>> > Ceph are way too dangerous to try...
>>
>> Yeah, being able to force it to shut down when servers
>> are unresponsive is definitely the intent.  'umount -f'
>> should work.  It sounds like the problem is related to
>> the initial 'umount' (which doesn't time out) followed
>> by 'umount -f'.
>>
>> I'm hesitant to add a blanket umount timeout, as that
>> could prevent proper writeout of cached data/metadata in
>> some cases.  So I think the goal should be that if a
>> normal umount hangs for some reason, you should be able
>> to intervene to add the 'force' if things don't go well.
>>
>> sage

end of thread, other threads:[~2010-07-27 17:15 UTC | newest]

Thread overview: 11+ messages
2010-06-16 15:35 "umount" of ceph filesystem that has become unavailable hangs forever Peter Niemayer
2010-06-16 18:56 ` Sage Weil
2010-06-17 11:36   ` Thomas Mueller
  -- strict thread matches above, loose matches on Subject: below --
2010-07-23 11:36 Sébastien Paolacci
2010-07-23 11:42 ` Anton V.G.
2010-07-23 16:56 ` Sage Weil
2010-07-24  8:36   ` Sébastien Paolacci
2010-07-27 10:49     ` Anton VG
2010-07-27 17:18       ` Sage Weil
2010-07-23 11:43 Anton
2010-07-23 12:08 ` Sébastien Paolacci
