* Machine hangs while writing to ceph filesystem
From: Bryan K. Wright @ 2012-09-21 20:23 UTC (permalink / raw)
  To: ceph-devel

Hi folks,

	I've just started working with ceph, and I'm finding that
whenever a 32-bit client mounts the ceph filesystem and tries
to copy something into it, the client host hangs after a small,
random amount of data has been copied.  The last error
messages displayed are:

 kernel:Process kworker/0:0 (pid: 4913, ti=f6042000 task=f6008a90 task.ti=f6042000)
 kernel:Stack:
 kernel:Call Trace:
 kernel:Code: 15 48 95 70 c1 81 ea 00 c0 5c 00 81 e2 00 00 e0 ff 29 d0 c1 e8 0c 8b 14 85 a0 82 8e c1 83 ea 01 85 d2 89 14 85 a0 82 8e c1 75 04 <0f> 0b eb fe 31 c0 83 fa 01 75 0f 31 c0 81 3d f0 cc 71 c1 f0 cc
 kernel:EIP: [<c1116fff>] kunmap_high+0x4f/0xa0 SS:ESP 0068:f6043e6c

The client host is running 32-bit CentOS 6.3, with the ELRepo 3.5.4
kernel.  The osd, mon and mds machines are all 64-bit CentOS 6.3, with
the stock CentOS 2.6.32 kernel.  The ceph version in all cases is
0.48.2.  The OSDs are using XFS for their data stores.

	There are no error messages in the ceph logs.

	After rebooting the client machine and re-mounting the
ceph filesystem, I can see that some files were, indeed, copied,
but "du" gives an error message indicating that there are circular
directory references, and that the filesystem is probably corrupt.

	After wiping out the osds and re-creating the ceph cluster,
the same thing happens.

	Any advice about how to debug this would be appreciated.

					Thanks,
					Bryan


-- 
========================================================================
Bryan Wright              |"If you take cranberries and stew them like 
Physics Department        | applesauce, they taste much more like prunes
University of Virginia    | than rhubarb does."  --  Groucho 
Charlottesville, VA  22901|			
(434) 924-7218            |         bryan@virginia.edu
========================================================================
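
A note on reading the oops in the report above: in the "Code:" line, the
byte in angle brackets is the one at EIP.  Here it is 0f, followed by 0b,
which together form the x86 UD2 instruction.  On x86 the kernel's BUG()
macro emits UD2, so the trace looks consistent with a BUG() check firing
inside kunmap_high rather than a stray memory fault.  Below is a minimal
Python sketch for checking this.  The function names (faulting_bytes,
is_ud2) are made up for illustration and are not part of any kernel tooling.

```python
import re

# Opcode bytes copied from the "Code:" line of the oops in the report.
# The token in <..> marks the byte at the faulting EIP.
CODE_LINE = (
    "15 48 95 70 c1 81 ea 00 c0 5c 00 81 e2 00 00 e0 ff 29 d0 c1 e8 0c "
    "8b 14 85 a0 82 8e c1 83 ea 01 85 d2 89 14 85 a0 82 8e c1 75 04 "
    "<0f> 0b eb fe 31 c0 83 fa 01 75 0f 31 c0 81 3d f0 cc 71 c1 f0 cc"
)


def faulting_bytes(code_line, count=2):
    """Return `count` opcode bytes starting at the one marked <..>."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            raw = [m.group(1)] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in raw)
    raise ValueError("no <..> marker found in Code: line")


def is_ud2(code_line):
    """True if execution stopped on x86 UD2 (0f 0b), which BUG() emits."""
    return faulting_bytes(code_line) == b"\x0f\x0b"


print(is_ud2(CODE_LINE))
```

For a full disassembly of the line, the kernel tree's scripts/decodecode
is the better tool; the sketch only checks the two bytes at EIP.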



* Re: Machine hangs while writing to ceph filesystem
From: Alex Elder @ 2012-09-21 20:39 UTC (permalink / raw)
  To: bryan; +Cc: Bryan K. Wright, ceph-devel

On 09/21/2012 03:23 PM, Bryan K. Wright wrote:
> Hi folks,
> 
> 	I've just started working with ceph, and I'm finding that
> whenever a 32-bit client mounts the ceph filesystem and tries
> to copy something into it, the client host hangs after a small,
> random amount of data has been copied.  The last error
> messages displayed are:

I have reproduced this problem myself, while trying to track
down a different problem.  Have you opened a bug for this?
I'll take a look to see if it's been reported before.

					-Alex

>  kernel:Process kworker/0:0 (pid: 4913, ti=f6042000 task=f6008a90 task.ti=f6042000)
>  kernel:Stack:
>  kernel:Call Trace:
>  kernel:Code: 15 48 95 70 c1 81 ea 00 c0 5c 00 81 e2 00 00 e0 ff 29 d0 c1 e8 0c 8b 14 85 a0 82 8e c1 83 ea 01 85 d2 89 14 85 a0 82 8e c1 75 04 <0f> 0b eb fe 31 c0 83 fa 01 75 0f 31 c0 81 3d f0 cc 71 c1 f0 cc
>  kernel:EIP: [<c1116fff>] kunmap_high+0x4f/0xa0 SS:ESP 0068:f6043e6c
> 
> The client host is running 32-bit CentOS 6.3, with the ELRepo 3.5.4
> kernel.  The osd, mon and mds machines are all 64-bit CentOS 6.3, with
> the stock CentOS 2.6.32 kernel.  The ceph version in all cases is
> 0.48.2.  The OSDs are using XFS for their data stores.
> 
> 	There are no error messages in the ceph logs.
> 
> 	After rebooting the client machine and re-mounting the
> ceph filesystem, I can see that some files were, indeed, copied,
> but "du" gives an error message indicating that there are circular
> directory references, and that the filesystem is probably corrupt.
> 
> 	After wiping out the osds and re-creating the ceph cluster,
> the same thing happens.
> 
> 	Any advice about how to debug this would be appreciated.
> 
> 					Thanks,
> 					Bryan
> 
> 



* Re: Machine hangs while writing to ceph filesystem
From: Alex Elder @ 2012-09-21 22:45 UTC (permalink / raw)
  To: bryan; +Cc: Bryan K. Wright, ceph-devel

On 09/21/2012 03:23 PM, Bryan K. Wright wrote:
> Hi folks,
> 
> 	I've just started working with ceph, and I'm finding that
> whenever a 32-bit client mounts the ceph filesystem and tries
> to copy something into it, the client host hangs after a small,
> random amount of data has been copied.  The last error
> messages displayed are:
> 
>  kernel:Process kworker/0:0 (pid: 4913, ti=f6042000 task=f6008a90 task.ti=f6042000)
>  kernel:Stack:
>  kernel:Call Trace:
>  kernel:Code: 15 48 95 70 c1 81 ea 00 c0 5c 00 81 e2 00 00 e0 ff 29 d0 c1 e8 0c 8b 14 85 a0 82 8e c1 83 ea 01 85 d2 89 14 85 a0 82 8e c1 75 04 <0f> 0b eb fe 31 c0 83 fa 01 75 0f 31 c0 81 3d f0 cc 71 c1 f0 cc
>  kernel:EIP: [<c1116fff>] kunmap_high+0x4f/0xa0 SS:ESP 0068:f6043e6c
> 
> The client host is running 32-bit CentOS 6.3, with the ELRepo 3.5.4
> kernel.  The osd, mon and mds machines are all 64-bit CentOS 6.3, with
> the stock CentOS 2.6.32 kernel.  The ceph version in all cases is
> 0.48.2.  The OSDs are using XFS for their data stores.


If you are able to and are comfortable with it, could you please
try to mount your file system with the "nocrc" mount option?

I believe I have found the cause of this problem, but it would
be useful to have you verify that it goes away when this option
is used.

					-Alex
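
For anyone wanting to run the same test: with the kernel client, "nocrc"
is passed like any other entry in the -o option list at mount time.  Below
is a minimal Python sketch that assembles such a command.  The monitor
address, mount point and the name=admin option are placeholders, not values
from this thread, and ceph_mount_command is a hypothetical helper.

```python
import shlex


def ceph_mount_command(monitor, mountpoint, options=("nocrc",)):
    """Build a `mount -t ceph` argument list with the given mount options."""
    argv = ["mount", "-t", "ceph", f"{monitor}:/", mountpoint]
    if options:
        # Mount options are passed as one comma-separated -o argument.
        argv += ["-o", ",".join(options)]
    return argv


# Placeholder monitor address and mount point (assumptions, not from the thread).
cmd = ceph_mount_command("192.168.0.1:6789", "/mnt/ceph", ("name=admin", "nocrc"))
print(shlex.join(cmd))
```

The sketch only prints the command; running it needs root.  Since nocrc
turns off the data CRC on writes, it is probably best treated as a
diagnostic setting rather than a permanent one.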


> 	There are no error messages in the ceph logs.
> 
> 	After rebooting the client machine and re-mounting the
> ceph filesystem, I can see that some files were, indeed, copied,
> but "du" gives an error message indicating that there are circular
> directory references, and that the filesystem is probably corrupt.
> 
> 	After wiping out the osds and re-creating the ceph cluster,
> the same thing happens.
> 
> 	Any advice about how to debug this would be appreciated.
> 
> 					Thanks,
> 					Bryan
> 
> 



* Re: Machine hangs while writing to ceph filesystem
From: Bryan K. Wright @ 2012-09-24 14:54 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Hi Alex,

elder@inktank.com said:
> If you are able to and are comfortable with it, could you please try to mount
> your file system with the "nocrc" mount option?

> I believe I have found the cause of this problem, but it would be useful to
> have you verify that it goes away when this option is used.

> 					-Alex

	Thanks for your help.  I've tried "nocrc", and it does
indeed seem to make the problem go away.

					Bryan




