CephFS hangs when writing 10GB files in loop


* CephFS hangs when writing 10GB files in loop
@ 2014-12-17 16:35 Wido den Hollander
  2014-12-17 16:40 ` Sage Weil
  2014-12-17 18:42 ` Gregory Farnum
  0 siblings, 2 replies; 8+ messages in thread
From: Wido den Hollander @ 2014-12-17 16:35 UTC
  To: ceph-devel

Hi,

Today I've been playing with CephFS and the morning started great with
CephFS playing along just fine.

Some information first:
- Ceph 0.89
- Linux kernel 3.18
- Ceph fuse 0.89
- One Active MDS, one Standby

This morning I could write a 10GB file like this using the kclient:
$ dd if=/dev/zero of=10GB.bin bs=1M count=10240 conv=fsync

That gave me 850MB/sec (all 10G network), and I could read the same file
back at 610MB/sec.

After writing to it multiple times it suddenly started to hang.
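The looping workload can be reproduced with a small script. This is a minimal sketch assuming a Linux client with GNU `dd`; the function names `write_loop` and `is_running`, their arguments, and the polling approach are illustrative and not part of the original report. Rather than killing `dd` after a timeout, it stops waiting and leaves the process alone, because a write blocked in the kernel may ignore signals, and a still-running `dd` is exactly what you want to inspect.

```shell
#!/bin/sh
# Sketch of the reported workload: rewrite FILE with dd ITERATIONS times and
# stop at the first iteration that fails or is still running after LIMIT
# seconds. A blocked dd is deliberately left running so it can be inspected.
# Linux-only: /proc/PID/stat is used to tell a live dd from a finished one.
# Usage: write_loop FILE SIZE_MB ITERATIONS LIMIT   (names are hypothetical)

# Succeeds while PID exists and is not a zombie (exited, not yet reaped).
is_running() {
    [ -r "/proc/$1/stat" ] && ! grep -qF ') Z ' "/proc/$1/stat" 2>/dev/null
}

write_loop() {
    file=$1; size_mb=$2; iterations=$3; limit=$4
    i=1
    while [ "$i" -le "$iterations" ]; do
        # Same dd invocation as in the report; conv=fsync flushes before exit.
        dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fsync 2>/dev/null &
        pid=$!
        waited=0
        while is_running "$pid"; do
            if [ "$waited" -ge "$limit" ]; then
                echo "iteration $i: dd (pid $pid) still running after ${limit}s"
                return 1
            fi
            sleep 1
            waited=$((waited + 1))
        done
        # Collect dd's exit status now that it has finished.
        if ! wait "$pid"; then
            echo "iteration $i: dd exited with an error"
            return 1
        fi
        echo "iteration $i: ok"
        i=$((i + 1))
    done
}
```

For example, `write_loop /mnt/cephfs/10GB.bin 10240 20 600` would rewrite the 10GB file 20 times and report the first iteration still blocked after ten minutes; the mount path is hypothetical.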

No real evidence on the MDS (debug mds set to 20) or anything on the
client. That specific operation just blocked, but I could still 'ls' the
filesystem in a second terminal.

The MDS was showing in its log that it was checking active sessions of
clients. It showed the active session of my single client.

The client renewed its caps and proceeded.

I currently don't have any logs, but I'm just looking for a direction to
be pointed towards.

Any ideas?

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


* Re: CephFS hangs when writing 10GB files in loop
  2014-12-17 16:35 CephFS hangs when writing 10GB files in loop Wido den Hollander
@ 2014-12-17 16:40 ` Sage Weil
  2014-12-17 16:43   ` Wido den Hollander
  2014-12-17 18:42 ` Gregory Farnum
  1 sibling, 1 reply; 8+ messages in thread
From: Sage Weil @ 2014-12-17 16:40 UTC
  To: Wido den Hollander; +Cc: ceph-devel

On Wed, 17 Dec 2014, Wido den Hollander wrote:
> I currently don't have any logs, but I'm just looking for a direction to
> be pointed towards.

Hmm.  Try

 cat /sys/kernel/debug/ceph/*/mdsc
 cat /sys/kernel/debug/ceph/*/osdc

to see requests in flight (you may need to mount -t debugfs none 
/sys/kernel/debug first).  What kernel version?
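Sage's suggestion can be wrapped into a helper that dumps both files for every kernel client instance in one go. This is a sketch: the function name `dump_inflight` and its optional debugfs-root argument are hypothetical (the argument exists only so the function can be pointed at another directory), and reading these files normally requires root.

```shell
# Hypothetical helper: print in-flight MDS (mdsc) and OSD (osdc) requests
# for every kernel CephFS client instance under a debugfs mount point.
# The root defaults to /sys/kernel/debug; debugfs must already be mounted
# there (mount -t debugfs none /sys/kernel/debug), which needs root.
dump_inflight() {
    debugfs=${1:-/sys/kernel/debug}
    found=0
    for f in "$debugfs"/ceph/*/mdsc "$debugfs"/ceph/*/osdc; do
        [ -r "$f" ] || continue   # skip unmatched globs and unreadable files
        found=1
        echo "== $f"
        cat "$f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no ceph client entries under $debugfs/ceph" >&2
        return 1
    fi
}
```

Running it a few times while `dd` is blocked shows whether the same requests stay listed; entries that remain across runs are the stuck ones.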

sage


* Re: CephFS hangs when writing 10GB files in loop
  2014-12-17 16:40 ` Sage Weil
@ 2014-12-17 16:43   ` Wido den Hollander
  0 siblings, 0 replies; 8+ messages in thread
From: Wido den Hollander @ 2014-12-17 16:43 UTC
  To: Sage Weil; +Cc: ceph-devel

On 12/17/2014 05:40 PM, Sage Weil wrote:
> Hmm.  Try
> 
>  cat /sys/kernel/debug/ceph/*/mdsc
>  cat /sys/kernel/debug/ceph/*/osdc
> 

I'll check that, good point.

> to see requests in flight (you may need to mount -t debugfs none 
> /sys/kernel/debug first).  What kernel version?
> 

I tried with 3.18.

I also tried with ceph-fuse 0.89: same result. It is slower, but it also
hangs at some point.

> sage
> 



* Re: CephFS hangs when writing 10GB files in loop
  2014-12-17 16:35 CephFS hangs when writing 10GB files in loop Wido den Hollander
  2014-12-17 16:40 ` Sage Weil
@ 2014-12-17 18:42 ` Gregory Farnum
  2014-12-18 10:13   ` Wido den Hollander
  1 sibling, 1 reply; 8+ messages in thread
From: Gregory Farnum @ 2014-12-17 18:42 UTC
  To: Wido den Hollander; +Cc: ceph-devel

On Wed, Dec 17, 2014 at 8:35 AM, Wido den Hollander <wido@42on.com> wrote:
> The MDS was showing in its log that it was checking active sessions of
> clients. It showed the active session of my single client.
>
> The client renewed its caps and proceeded.

Can you clarify this? I'm not quite sure what you mean.

> I currently don't have any logs, but I'm just looking for a direction to
> be pointed towards.
>
> Any ideas?

Well, now that you're on v0.89 you should explore the admin
socket...there are commands on the MDS to dump ops in flight (and
maybe to look at session states? I don't remember when that merged).
-Greg
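On the MDS side, the admin socket is queried with `ceph daemon <name> <command>`. A hedged sketch: the daemon name (`mds.a` in the usage) is an assumption, and the command names `dump_ops_in_flight`, `session ls` and `objecter_requests` are ones found in later Ceph releases; whether v0.89 provides all of them is not confirmed here, so check `ceph daemon <name> help` first. `objecter_requests` lists the RADOS operations the MDS itself has outstanding against the OSDs. The function name `mds_snapshot` and the `CEPH` override are hypothetical, the latter being only a testing hook and not a Ceph feature.

```shell
# Hypothetical helper: snapshot what an MDS is waiting on via its admin
# socket. Run on the host where the daemon runs. Command availability
# varies by release; "ceph daemon <name> help" lists what exists.
# CEPH may name a different command to run instead of ceph (for testing).
mds_snapshot() {
    name=$1
    ceph_bin=${CEPH:-ceph}
    for cmd in dump_ops_in_flight "session ls" objecter_requests; do
        echo "== $name: $cmd"
        # $cmd is left unquoted on purpose so "session ls" becomes two words.
        "$ceph_bin" daemon "$name" $cmd || echo "(command failed: $cmd)"
    done
}
```

For example, `mds_snapshot mds.a > mds-hang.txt` captured while the write is blocked; requests that show up in `objecter_requests` and never complete point below CephFS, at RADOS.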


* Re: CephFS hangs when writing 10GB files in loop
  2014-12-17 18:42 ` Gregory Farnum
@ 2014-12-18 10:13   ` Wido den Hollander
  2014-12-18 15:54     ` Wido den Hollander
  0 siblings, 1 reply; 8+ messages in thread
From: Wido den Hollander @ 2014-12-18 10:13 UTC
  To: Gregory Farnum; +Cc: ceph-devel

On 12/17/2014 07:42 PM, Gregory Farnum wrote:
> On Wed, Dec 17, 2014 at 8:35 AM, Wido den Hollander <wido@42on.com> wrote:
>> The MDS was showing in its log that it was checking active sessions of
>> clients. It showed the active session of my single client.
>>
>> The client renewed its caps and proceeded.
> 
> Can you clarify this? I'm not quite sure what you mean.
> 

I currently don't have the logs available. That was my problem when
typing the original e-mail.

>> I currently don't have any logs, but I'm just looking for a direction to
>> be pointed towards.
>>
>> Any ideas?
> 
> Well, now that you're on v0.89 you should explore the admin
> socket...there are commands on the MDS to dump ops in flight (and
> maybe to look at session states? I don't remember when that merged).

Sage's pointer towards the kernel debugging and the new admin socket
showed me that it was RADOS calls that were hanging.

I investigated even further and it seems that this is not a CephFS
problem, but a local TCP issue which is only triggered when using CephFS.

At some point, which is still unclear to me, data transfer becomes very
slow. The MDS doesn't seem to be able to update the journal and the
client can't write to the OSDs anymore.

It happened after I did some very basic TCP tuning (timestamp, rmem,
wmem, sack, fastopen).

Reverting to the Ubuntu 14.04 defaults resolved it all, and CephFS
is running happily now.
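To see which of the tuned values are currently active, the relevant sysctls can be read straight from `/proc/sys`. This sketch assumes that "timestamp, rmem, wmem, sack, fastopen" correspond to `net.ipv4.tcp_timestamps`, `tcp_rmem`, `tcp_wmem`, `tcp_sack` and `tcp_fastopen`; rmem/wmem may instead have meant `net.core.rmem_max`/`wmem_max`. The function name `show_tcp_settings` and its `/proc` argument are hypothetical. `sysctl net.ipv4.tcp_sack` prints the same information; reading the files directly just keeps the function testable.

```shell
# Hypothetical helper: print the TCP sysctls assumed to match the tuning
# described above, in "key = value" form. PROC defaults to /proc and is a
# parameter only so the function can be exercised against a fake tree.
show_tcp_settings() {
    proc=${1:-/proc}
    for key in tcp_timestamps tcp_rmem tcp_wmem tcp_sack tcp_fastopen; do
        f="$proc/sys/net/ipv4/$key"
        if [ -r "$f" ]; then
            # tcp_rmem/tcp_wmem hold three tab-separated numbers; squash to spaces.
            printf 'net.ipv4.%s = %s\n' "$key" "$(tr -s '\t ' '  ' < "$f")"
        else
            printf 'net.ipv4.%s = (unavailable)\n' "$key"
        fi
    done
}
```

Saving this output before tuning and diffing it against a later run shows exactly which values changed, which turns "revert to the defaults" into a targeted change.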

I'll dig a bit deeper to see why this system was affected by those
changes. I applied these settings earlier on an RBD-only cluster without
any problems.

> -Greg
> 


* Re: CephFS hangs when writing 10GB files in loop
  2014-12-18 10:13   ` Wido den Hollander
@ 2014-12-18 15:54     ` Wido den Hollander
  2014-12-18 16:32       ` Atchley, Scott
  0 siblings, 1 reply; 8+ messages in thread
From: Wido den Hollander @ 2014-12-18 15:54 UTC
  To: Gregory Farnum; +Cc: ceph-devel

On 12/18/2014 11:13 AM, Wido den Hollander wrote:
> I investigated even further and it seems that this is not a CephFS
> problem, but a local TCP issue which is only triggered when using CephFS.
> 
> At some point, which is still unclear to me, data transfer becomes very
> slow. The MDS doesn't seem to be able to update the journal and the
> client can't write to the OSDs anymore.
> 
> It happened after I did some very basic TCP tuning (timestamp, rmem,
> wmem, sack, fastopen).
> 

So it was tcp_sack. With tcp_sack=0 the MDS has problems talking to
OSDs. Other clients still work fine, but the MDS couldn't replay its
journal and such.

Enabling tcp_sack again resolved the problem. The new admin socket
really helped there!
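Since SACK turned out to be the culprit, a cheap guard is to check it before mounting CephFS. A minimal sketch; the function name `check_sack`, its `/proc` argument and its exit codes are hypothetical. As root, `sysctl -w net.ipv4.tcp_sack=1` turns SACK back on at runtime; persisting it means a `net.ipv4.tcp_sack = 1` line in a file under `/etc/sysctl.d/`.

```shell
# Hypothetical guard: report whether TCP SACK is enabled.
# Exit status: 0 enabled, 1 disabled, 2 value could not be read.
# PROC defaults to /proc and is a parameter only for testing.
check_sack() {
    proc=${1:-/proc}
    value=$(cat "$proc/sys/net/ipv4/tcp_sack" 2>/dev/null) || {
        echo "cannot read tcp_sack under $proc" >&2
        return 2
    }
    if [ "$value" = "0" ]; then
        echo "tcp_sack is disabled; re-enable with: sysctl -w net.ipv4.tcp_sack=1"
        return 1
    fi
    echo "tcp_sack is enabled"
}
```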


* Re: CephFS hangs when writing 10GB files in loop
  2014-12-18 15:54     ` Wido den Hollander
@ 2014-12-18 16:32       ` Atchley, Scott
  2014-12-18 20:50         ` Wido den Hollander
  0 siblings, 1 reply; 8+ messages in thread
From: Atchley, Scott @ 2014-12-18 16:32 UTC
  To: Wido den Hollander; +Cc: Gregory Farnum, ceph-devel

On Dec 18, 2014, at 10:54 AM, Wido den Hollander <wido@42on.com> wrote:

> So it was tcp_sack. With tcp_sack=0 the MDS has problems talking to
> OSDs. Other clients still work fine, but the MDS couldn't replay its
> journal and such.
> 
> Enabling tcp_sack again resolved the problem. The new admin socket
> really helped there!

What was the reasoning behind disabling SACK to begin with? Without it, any drops or reordering may require resending a potentially large amount of data.


* Re: CephFS hangs when writing 10GB files in loop
  2014-12-18 16:32       ` Atchley, Scott
@ 2014-12-18 20:50         ` Wido den Hollander
  0 siblings, 0 replies; 8+ messages in thread
From: Wido den Hollander @ 2014-12-18 20:50 UTC
  To: Atchley, Scott; +Cc: Gregory Farnum, ceph-devel

On 12/18/2014 05:32 PM, Atchley, Scott wrote:
> On Dec 18, 2014, at 10:54 AM, Wido den Hollander <wido@42on.com> wrote:
>> So it was tcp_sack. With tcp_sack=0 the MDS has problems talking to
>> OSDs. Other clients still work fine, but the MDS couldn't replay its
>> journal and such.
>>
>> Enabling tcp_sack again resolved the problem. The new admin socket
>> really helped there!
> 
> What was the reasoning behind disabling SACK to begin with? Without it, any drops or reordering might require resending potentially a lot of data.
> 

I was testing various TCP settings, and SACK was one of them. It didn't
occur to me earlier that it might be the problem.
