Re: CephFS hangs when writing 10GB files in loop

From: Wido den Hollander <wido@42on.com>
To: "Atchley, Scott" <atchleyes@ornl.gov>
Cc: Gregory Farnum <greg@gregs42.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: CephFS hangs when writing 10GB files in loop
Date: Thu, 18 Dec 2014 21:50:50 +0100
Message-ID: <54933E2A.6070106@42on.com>
In-Reply-To: <EE4ECAFA-D981-44A6-B72F-63EC012E71EF@ornl.gov>

On 12/18/2014 05:32 PM, Atchley, Scott wrote:
> On Dec 18, 2014, at 10:54 AM, Wido den Hollander <wido@42on.com> wrote:
> 
>> On 12/18/2014 11:13 AM, Wido den Hollander wrote:
>>> On 12/17/2014 07:42 PM, Gregory Farnum wrote:
>>>> On Wed, Dec 17, 2014 at 8:35 AM, Wido den Hollander <wido@42on.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Today I've been playing with CephFS and the morning started great with
>>>>> CephFS playing along just fine.
>>>>>
>>>>> Some information first:
>>>>> - Ceph 0.89
>>>>> - Linux kernel 3.18
>>>>> - Ceph fuse 0.89
>>>>> - One Active MDS, one Standby
>>>>>
>>>>> This morning I could write a 10GB file like this using the kclient:
>>>>> $ dd if=/dev/zero of=10GB.bin bs=1M count=10240 conv=fsync
>>>>>
>>>>> That gave me 850MB/sec (all 10G network) and I could read the same file
>>>>> again with 610MB/sec.
>>>>>
>>>>> After writing to it multiple times it suddenly started to hang.
>>>>>
>>>>> No real evidence on the MDS (debug mds set to 20) or anything on the
>>>>> client. That specific operation just blocked, but I could still 'ls' the
>>>>> filesystem in a second terminal.
>>>>>
>>>>> The MDS was showing in its log that it was checking active sessions of
>>>>> clients. It showed the active session of my single client.
>>>>>
>>>>> The client renewed its caps and proceeded.
>>>>
>>>> Can you clarify this? I'm not quite sure what you mean.
>>>>
>>>
>>> I currently don't have the logs available. That was my problem when
>>> typing the original e-mail.
>>>
>>>>> I currently don't have any logs, but I'm just looking for a direction to
>>>>> be pointed towards.
>>>>>
>>>>> Any ideas?
>>>>
>>>> Well, now that you're on v0.89 you should explore the admin
>>>> socket...there are commands on the MDS to dump ops in flight (and
>>>> maybe to look at session states? I don't remember when that merged).
>>>
>>> Sage's pointer towards the kernel debugging and the new admin socket
>>> showed me that it was RADOS calls that were hanging.
>>>
>>> I investigated even further and it seems that this is not a CephFS
>>> problem, but a local TCP issue which is only triggered when using CephFS.
>>>
>>> At some point, which is still unclear to me, data transfer becomes very
>>> slow. The MDS doesn't seem to be able to update the journal and the
>>> client can't write to the OSDs anymore.
>>>
>>> It happened after I did some very basic TCP tuning (timestamp, rmem,
>>> wmem, sack, fastopen).
>>>
>>
>> So it was tcp_sack. With tcp_sack=0 the MDS has problems talking to
>> OSDs. Other clients still work fine, but the MDS couldn't replay its
>> journal and such.
>>
>> Enabling tcp_sack again resolved the problem. The new admin socket
>> really helped there!
> 
> What was the reasoning behind disabling SACK to begin with? Without it, any drops or reordering might require resending potentially a lot of data.
> 

I was testing various TCP settings and SACK was one of them. It didn't
occur to me earlier that it might be the problem.
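
For anyone who wants to check their own hosts, here is a minimal sketch
(not the exact commands I ran) that prints a few of the knobs mentioned in
this thread and warns when SACK is off. The function name and the optional
directory argument are hypothetical; the argument only exists so the check
can be pointed at a scratch directory instead of /proc/sys/net/ipv4.

```shell
#!/bin/sh
# Sketch: report some TCP knobs discussed in this thread and flag disabled SACK.
# check_tcp_knobs and its optional directory argument are illustrative names,
# not part of any Ceph or kernel tooling.
check_tcp_knobs() {
    dir="${1:-/proc/sys/net/ipv4}"

    # Print each knob that is readable; skip any the running kernel lacks.
    for knob in tcp_sack tcp_timestamps tcp_fastopen; do
        if [ -r "$dir/$knob" ]; then
            printf 'net.ipv4.%s = %s\n' "$knob" "$(cat "$dir/$knob")"
        fi
    done

    # A value of 0 means SACK is disabled, the setting that broke things here.
    if [ -r "$dir/tcp_sack" ] && [ "$(cat "$dir/tcp_sack")" = "0" ]; then
        echo "WARNING: net.ipv4.tcp_sack is disabled"
        return 1
    fi
    return 0
}
```

Calling check_tcp_knobs without arguments reads the live values. If it
warns, SACK can be switched back on with `sysctl -w net.ipv4.tcp_sack=1`.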

>>
>>> Reverting back to the Ubuntu 14.04 defaults resolved it all and CephFS
>>> is running happily now.
>>>
>>> I'll dig a bit deeper to see why this system was affected by those
>>> changes. I applied these settings earlier on an RBD-only cluster without
>>> any problems.
>>>
>>>> -Greg
>>>>
>>>
>>>
>>
>>
>> -- 
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
