From: Noah Watkins <noahwatkins@gmail.com>
To: Sage Weil <sage@newdream.net>
Cc: Noah Watkins <jayhawk@cs.ucsc.edu>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Client receives 'connection refused' only after heavy use
Date: Sun, 4 Dec 2011 22:22:15 -0800
Message-ID: <5154145039466346698@unknownmsgid>
In-Reply-To: <Pine.LNX.4.64.1112042046280.8414@cobra.newdream.net>
Sent from my iPhone
On Dec 4, 2011, at 20:48, Sage Weil <sage@newdream.net> wrote:
> On Sun, 4 Dec 2011, Noah Watkins wrote:
>> Yikes, I think this was actually the problem. nm
>>
>> # ulimit -n
>> 1024
>
> I'm a little surprised the fd count got that high with a fixed size
> cluster. Were there lots of short-lived clients?
Not a lot. Maybe a hundred total over a few hours.
>
> It would be interesting to see what `ls -al /proc/$pid/fd` looks like after
> the process has been running for a while... there is probably a leak
> somewhere.
I checked this out after the problem became noticeable. There were
significantly fewer than 1024 open fds, but still several hundred with no
active clients. I think this latest fix is masking things. I'll drop
the ulimit backdown and gather some more info.
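
For gathering that info, here is a minimal sketch (not something from this
thread or from Ceph) of how the open fds of a running process could be
counted, grouped by type, and compared against its soft limit. It assumes a
Linux /proc filesystem; the helper names fd_summary and soft_nofile_limit are
made up for illustration. A steadily growing "socket" count with no active
clients would be consistent with a leak.

```python
import os
from collections import Counter


def soft_nofile_limit(pid="self"):
    """Read the soft 'Max open files' limit from /proc/<pid>/limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # fields: Max open files <soft> <hard> files
                return None if soft == "unlimited" else int(soft)
    return None


def fd_summary(pid="self"):
    """Return (open_fd_count, soft_limit, Counter of fd kinds) for a process."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listing and readlink; skip it.
            continue
        # Targets look like "socket:[12345]", "pipe:[678]",
        # "anon_inode:[eventpoll]", or an absolute path for regular files.
        kind = "file" if target.startswith("/") else target.split(":", 1)[0]
        kinds[kind] += 1
    return sum(kinds.values()), soft_nofile_limit(pid), kinds
```

Calling fd_summary(1325) as root (1325 being a daemon pid, assumed here only
as an example) at intervals and logging the result gives roughly the same
information as repeated `ls -al /proc/$pid/fd` runs, in a form that is easier
to compare over time.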
>
> sage
>
>
>>
>> -----
>>
>> root@issdm-23:/var/log/ceph# grep -n "Too many" full_conn_refused.log
>> 2417924:2011-12-04 14:52:15.289873 7f1406ecb700 -- 192.168.141.123:6800/1325
>> accepter no incoming connection? sd = -1 errno 24 Too many open files
>> 2417925:2011-12-04 14:52:15.289923 7f1406ecb700 -- 192.168.141.123:6800/1325
>> accepter no incoming connection? sd = -1 errno 24 Too many open files
>> 2417926:2011-12-04 14:52:15.289952 7f1406ecb700 -- 192.168.141.123:6800/1325
>> accepter no incoming connection? sd = -1 errno 24 Too many open files
>> 2417927:2011-12-04 14:52:15.289970 7f1406ecb700 -- 192.168.141.123:6800/1325
>> accepter no incoming connection? sd = -1 errno 24 Too many open files
>> 2417928:2011-12-04 14:52:15.290002 7f1406ecb700 -- 192.168.141.123:6800/1325
>> accepter no incoming connection? sd = -1 errno 24 Too many open files
>>
>> On 12/04/2011 04:22 PM, Noah Watkins wrote:
>>> We are experiencing client connection problems that occur only after some
>>> period of heavy use. Prior to the 'connection refused' error in the client
>>> log the cluster behaves as normal. Restarting Ceph solves the problem but we
>>> are not able to finish long jobs.
>>>
>>> Logs attached. I have the full 1 GB MDS log if needed, and included only the
>>> portion of the log in which the client had problems plus about 5 seconds
>>> of context on either side of the test.
>>>
>>> Thanks,
>>> Noah
>>>
>>> Client
>>> ====
>>> ...
>>> 2011-12-04 16:07:58.154523 7f4458314700 -- 192.168.141.123:0/1009375 >>
>>> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).connect
>>> 0
>>> 2011-12-04 16:07:58.154562 7f4458314700 -- 192.168.141.123:0/1009375 >>
>>> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0
>>> l=0).connecting to 192.168.141.123:6800/1325
>>> 2011-12-04 16:07:58.154605 7f4458314700 -- 192.168.141.123:0/1009375 >>
>>> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).connect
>>> error 192.168.141.123:6800/1325, 111: Connection refused
>>> 2011-12-04 16:07:58.154620 7f4458314700 -- 192.168.141.123:0/1009375 >>
>>> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).fault
>>> 111: Connection refused
>>> 2011-12-04 16:07:58.154635 7f4458314700 -- 192.168.141.123:0/1009375 >>
>>> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).fault
>>> waiting 3.200000
>>>
>>> Full logs attached.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>