From: Joe Jin <joe.jin@oracle.com>
To: Zheng Li <dev@zheng.li>, Dave Scott <Dave.Scott@citrix.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>,
Luonengjun <luonengjun@huawei.com>,
xen-devel <xen-devel@lists.xen.org>,
Fanhenglong <fanhenglong@huawei.com>,
"Liuqiming (John)" <john.liuqiming@huawei.com>,
Ian Jackson <Ian.Jackson@citrix.com>
Subject: Re: Lots of connections led oxenstored stuck
Date: Wed, 27 Aug 2014 09:59:35 +0800
Message-ID: <53FD3B87.9050609@oracle.com>
In-Reply-To: <53FC4D35.4020209@zheng.li>
On 08/26/14 17:02, Zheng Li wrote:
> Hi Joe,
>
> I read your patch and understand the basic idea behind it. It can mitigate the situation when bad things happen, but it doesn't solve the double limits imposed by both select and NR_OPEN. E.g.
No, this patch is not intended to fix the NR_OPEN limit (that is a system-side
setting, which needs to be raised or set to unlimited before starting the daemon)
or the select limitation.
We did not hit the bug because of those limits. The original issue is that when
many connect requests (e.g. 2000) arrive at the same time, accept() fails because
the number of open fds exceeds SYSCONF.OPEN_MAX; this is expected. The problem is
that when the clients exit, oxenstored should close their sockets as well, but in
our tests it did not: oxenstored kept reporting accept failures, and every new
request hung too.
So my change makes oxenstored check for and delete closed fds first, after which
it is able to handle new requests again.
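To make the ordering argument concrete, here is a toy model of the main loop. It is a sketch, not oxenstored code: every name is hypothetical, a fixed-capacity table stands in for the open-files limit, a new client is assumed to be always waiting, and (as the repeated "caught exception" log line suggests) an accept failure is assumed to abort the rest of the loop iteration. Under those assumptions, accepting first never gets as far as reaping the closed connections, while reaping first frees the table.

```ocaml
(* Toy model of the oxenstored main loop; every name here is hypothetical.
   A fixed-capacity table stands in for the per-process open-files limit. *)
exception Too_many_open_files

type conn = { mutable closed : bool }

type state = {
  capacity : int;            (* stands in for SYSCONF.OPEN_MAX *)
  mutable conns : conn list; (* connection fds currently held open *)
}

(* Special-fd work: accept one pending client; fails like accept()
   with EMFILE when the table is full. *)
let accept_one st =
  if List.length st.conns >= st.capacity then raise Too_many_open_files;
  st.conns <- { closed = false } :: st.conns

(* Connection-fd work: drop connections whose client has exited. *)
let reap_closed st =
  st.conns <- List.filter (fun c -> not c.closed) st.conns

(* One iteration; an exception skips whatever work comes after it. *)
let iteration ~connections_first st =
  if connections_first then begin reap_closed st; accept_one st end
  else begin accept_one st; reap_closed st end

(* Run the loop, counting iterations that die with the accept error
   (the repeated "caught exception" lines in the log). *)
let run ~connections_first ~iterations st =
  let failures = ref 0 in
  for _i = 1 to iterations do
    try iteration ~connections_first st
    with Too_many_open_files -> incr failures
  done;
  !failures

(* A table filled to capacity whose clients have all exited. *)
let stuck_state capacity =
  let st = { capacity; conns = [] } in
  for _i = 1 to capacity do accept_one st done;
  List.iter (fun c -> c.closed <- true) st.conns;
  st

let () =
  (* prints "special fds first: 3 failures" *)
  Printf.printf "special fds first: %d failures\n"
    (run ~connections_first:false ~iterations:3 (stuck_state 4));
  (* prints "connection fds first: 0 failures" *)
  Printf.printf "connection fds first: %d failures\n"
    (run ~connections_first:true ~iterations:3 (stuck_state 4))
```

With connection fds processed first, accept can still fail once the table is genuinely full of live clients, which is the expected behaviour described above; the difference is that it no longer fails forever after those clients have gone.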
During our testing, with xenstored logging enabled, xenstored.log was full of the
error below when the issue happened, and the log kept being rotated:
[20140827T15:48:25.399Z|error|xenstored] caught exception Unix.Unix_error(15, "accept", "")
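For reference, the 15 in `Unix.Unix_error(15, "accept", "")` is how OCaml's generic exception printer (used when no custom printer is registered) shows a constant constructor of `Unix.error`: by its position in the type. Position 15 is `EMFILE` ("Too many open files"), the per-process limit, which matches the accept() failures described above. A sketch of a more readable rendering, where `describe_exn` is a hypothetical helper and not part of oxenstored:

```ocaml
(* Hypothetical helper, not part of oxenstored: render a Unix_error
   readably instead of via the generic printer ("Unix_error(15, ...)"). *)
let describe_exn = function
  | Unix.Unix_error (Unix.EMFILE, "accept", _) ->
      Some ("accept: " ^ Unix.error_message Unix.EMFILE
            ^ " (EMFILE); open fds have reached the per-process limit")
  | Unix.Unix_error (err, fn, arg) ->
      Some (Printf.sprintf "%s(%s): %s" fn arg (Unix.error_message err))
  | _ -> None

let () =
  match describe_exn (Unix.Unix_error (Unix.EMFILE, "accept", "")) with
  | Some msg -> prerr_endline msg
  | None -> ()
```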
>
> * When the number of fds is beyond NR_OPEN, is there any strict order for which fds are chosen to be closed? If not, then the special fds might get closed as well, in which case xenstored might still get stuck.
My change will not delete fds that are still open. Also, I do not think the special
fds will be removed, since no errors are reported on them.
>
> * When select is given 1024 or more fds (more precisely, any fd numbered at or above FD_SETSIZE, which is 1024), which can still happen even with your patch, the behavior is _undefined_. IIRC, some bits in the bitmap might be (wrongly) reused, so the output (fds reported as ready for read/write) might be wrong for some fds, and subsequent reads/writes might block on them.
>
> * Also, we generally prefer to handle special fds first, as the eventchn fd represents all the domain connections.
Wouldn't removing closed fds first also reduce system resource usage?
>
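Zheng's select point concerns fd *numbers* rather than just how many fds are passed: select(2) uses fixed-size fd_set bitmaps that can only represent fds below FD_SETSIZE (1024 with glibc on Linux). Because a process's open fds all have distinct non-negative numbers, holding more than 1024 of them guarantees that some fd is out of range. A minimal guard, sketched with hypothetical names and with fds modelled as plain ints (OCaml's `Unix.file_descr` is abstract); the value 1024 is an assumption matching the usual glibc setting:

```ocaml
(* Sketch with fds modelled as plain ints; fd_setsize = 1024 is an
   assumption matching the usual glibc FD_SETSIZE. *)
let fd_setsize = 1024

(* select(2) can only represent fd numbers in [0, FD_SETSIZE). *)
let select_safe fds =
  List.for_all (fun fd -> fd >= 0 && fd < fd_setsize) fds

(* fd numbers are distinct, so handing select more than FD_SETSIZE fds
   always includes at least one fd numbered FD_SETSIZE or above. *)
let count_forces_overflow fds =
  List.length (List.sort_uniq compare fds) > fd_setsize

let () =
  let small = [0; 1; 2; 1023] and large = List.init 1025 (fun i -> i) in
  (* prints "small: safe=true, large: safe=false overflow=true" *)
  Printf.printf "small: safe=%b, large: safe=%b overflow=%b\n"
    (select_safe small) (select_safe large) (count_forces_overflow large)
```

A check like this only detects the problem; avoiding it requires an interface without the fixed bitmap, such as poll(2).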
> I previously mentioned I've got patches for these. I'm currently testing with 1,000 Windows 7 VMs on a single host (each consuming at least 2 persistent xenstored socket connections). Besides the two limits just mentioned, I've also fixed several bugs and bottlenecks along the way.
>
> I'm going to upstream these patches very soon; just a bit of cleanup and documentation is needed. However, if you (or anyone) need them urgently or are eager to test them, please send me a private email separately. I'm happy to send you the patch in its current form: a single non-disaggregated patch for multiple issues, not very well commented, but it should just work.
Could you please send me a copy of your patch? I'd like to test what happens when
the number of connections exceeds @nfds of poll.
Thanks,
Joe
>
> Cheers,
> Zheng
>
> On 26/08/2014 09:15, Joe Jin wrote:
>> This bug is caused by the way oxenstored handles incoming requests: when lots of
>> connections arrive at the same time, it has no chance to delete closed sockets.
>>
>> I created a patch for this, please review:
>>
>> Thanks,
>> Joe
>>
>> [PATCH] oxenstored: check and delete closed sockets before accepting incoming connections
>>
>> When more than SYSCONF.OPEN_MAX connections arrive at the same time and the
>> connections are closed later, oxenstored has no chance to delete the closed
>> sockets; this leaves oxenstored stuck and unable to handle any further incoming
>> requests. This patch makes oxenstored check for and process closed sockets
>> before handling incoming connections, to avoid getting stuck.
>>
>> Cc: David Scott <dave.scott@eu.citrix.com>
>> Cc: Zheng Li <dev@zheng.li>
>> Cc: Luis R. Rodriguez <mcgrof@suse.com>
>> Cc: Ian Jackson <Ian.Jackson@citrix.com>
>> Signed-off-by: Joe Jin <joe.jin@oracle.com>
>> ---
>> tools/ocaml/xenstored/xenstored.ml | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
>> index 1c02f2f..b142952 100644
>> --- a/tools/ocaml/xenstored/xenstored.ml
>> +++ b/tools/ocaml/xenstored/xenstored.ml
>> @@ -373,10 +373,10 @@ let _ =
>> [], [], [] in
>> let sfds, cfds =
>> List.partition (fun fd -> List.mem fd spec_fds) rset in
>> - if List.length sfds > 0 then
>> - process_special_fds sfds;
>> if List.length cfds > 0 || List.length wset > 0 then
>> process_connection_fds store cons domains cfds wset;
>> + if List.length sfds > 0 then
>> + process_special_fds sfds;
>> process_domains store cons domains
>> in
>>
>>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing