From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Luca Maranzano <liuk001@gmail.com>
Cc: "LinuxVirtualServer.org users mailing list."
<lvs-users@linuxvirtualserver.org>,
netdev@oss.sgi.com
Subject: Re: ipvs_syncmaster brings cpu to 100%
Date: Mon, 26 Sep 2005 10:51:12 -0700
Message-ID: <20050926175112.GF7532@us.ibm.com>
In-Reply-To: <68559cef05092607441dd8e961@mail.gmail.com>

On 26.09.2005 [16:44:09 +0200], Luca Maranzano wrote:
> On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > On 26.09.2005 [15:52:02 +0200], Luca Maranzano wrote:
> > > On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > > > On 26.09.2005 [17:12:32 +0900], Horms wrote:
> > > > > On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
> > > > >
> > > > > [snip]
> > > > >
> > > > > > > > > > Furthermore, if I run "rgrep" over the kernel 2.6.12 source tree,
> > > > > > > > > > schedule_timeout() is used far more often than ssleep() (517
> > > > > > > > > > occurrences vs. 43), so why was this change made in ip_vs_sync.c?
> > > > > > > > > >
> > > > > > > > > > The other oddity is that Horms reported on this list that on a
> > > > > > > > > > non-Xeon CPU the same kernel version as mine does not show the
> > > > > > > > > > problem.
> > > > > > > > > >
> > > > > > > > > > I'm going crazy :-)
> > > > > > > >
> > > > > > > > I've prepared a patch, which reverts the change which was introduced
> > > > > > > > by Nishanth Aravamudan in February.
> > > > > > >
> > > > > > > Was the 100% cpu utilization only occurring on Xeon processors?
> > > > > >
> > > > > > That seems to be the only case where this problem has been
> > > > > > observed. I don't have such a processor myself, so I haven't actually
> > > > > > been able to produce the problem locally.
> > > > > >
> > > > > > One reason I posted this issue to netdev was to get some more
> > > > > > eyes on the problem as it is puzzling to say the least.
> > > > > >
> > > > > > > Care to try to use msleep_interruptible() instead of ssleep(), as
> > > > > > > opposed to schedule_timeout()?
> > > > > >
> > > > > > I will send a version that does that shortly. Luca, can
> > > > > > you please check that too?
> > > > >
> > > > > Here is that version of the patch. Nishanth, I take it that I do not
> > > > > need to set TASK_INTERRUPTIBLE before calling msleep_interruptible();
> > > > > please let me know if I am wrong.
> > > >
> > > > Yes, exactly. I'm just trying to narrow it down to see if it's the task
> > > > state that's causing the issue (which, to be honest, doesn't make a lot
> > > > of sense to me -- with ssleep() your load average will go up as the task
> > > > will be in the UNINTERRUPTIBLE state, but I am not sure why utilisation
> > > > would rise, as you are still sleeping...)
> >
> > [trimmed lvs-users from my reply, as it is a closed list]
> >
> > > Just to add more info, please note the output of "ps":
> > >
> > > debld1:~# ps aux|grep ipvs
> > > root 3748 0.0 0.0 0 0 ? D 12:09 0:00 [ipvs_syncmaster]
> > > root 3757 0.0 0.0 0 0 ? D 12:09 0:00 [ipvs_syncbackup]
> > >
> > > Note the D status, i.e. (from ps(1) man page): Uninterruptible sleep
> > > (usually IO)
> >
> > The msleep_interruptible() change should fix that.
> >
> > But that does not show 100% CPU utilisation at all, it shows 0. Did you
> > mean to say your load increases?
> >
> > I'm still unclear what the problem is. Horms's initial Cc trimmed some
> > important information. It would be very useful to "start over" -- at
> > least from the perspective of what the problem actually is.
> >
> > > I hope to have a Xeon machine to run some more tests in the next few
> > > days; in the meantime I'll try to reproduce my setup on a couple of
> > > VMware Workstation machines.
> >
> > Please don't top-post. It makes it really hard to write sane replies...
>
> [trimmed Cc to avoid spamming...]
>
> Ok, just to summarize the long thread from the beginning:
>
> The goal: setting up a Local Director with IPVS with state
> synchronization, failover and failback.
>
> The hardware: 2 identical HP DL380G4 boxes, each with 1 Intel Xeon 3.4 GHz CPU
>
> The problems (please note that all kernel versions are *Debian* kernels):
> 1. Kernel 2.6.8: the standby node locked up when simulating
> a failover. The load average as reported by "top" or "w" is always
> 0.00.
>
> 2. Kernel 2.6.11 and Kernel 2.6.12: failover and failback work fine,
> but the load average as reported by "top" or "w" is consistently
> at 2.00 or more with both sync threads started
> (ipvs_syncmaster and ipvs_syncbackup). The load average from "top" is
> 1.00 or more with only one thread (i.e. ipvs_syncmaster). Horms reported
> that he was not able to reproduce this on a non-Xeon system.

Ok, so whoever mentioned "CPU utilisation" was mistaken. The load
average being 2 is due to ssleep(): each sync thread sleeps in
TASK_UNINTERRUPTIBLE, and tasks in that state count toward the load
average. The msleep_interruptible() version of the patch should fix
that up. It really doesn't make any difference in the code, except that
your load average will go back to 0.00 and the ipvs threads can be
interrupted by signals.

I would expect the load average to be 2.00 for all systems, not just
Xeon. The system lock has nothing to do with the patch, though.
Something else fixed it.
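
To illustrate why two threads sleeping uninterruptibly hold the load
average near 2.00 without using any CPU, here is a minimal userspace
sketch of the load-average arithmetic. It assumes the fixed-point
constants (FSHIFT, FIXED_1, EXP_1) and the CALC_LOAD-style update match
the 2.6-era include/linux/sched.h, that the update runs every 5 seconds,
and that the kernel counts both runnable and TASK_UNINTERRUPTIBLE tasks.
The function names calc_load_step() and settled_load_x100() are made up
for this example and do not exist in the kernel.

```c
/* Assumed to mirror the 2.6-era fixed-point load-average constants. */
#define FSHIFT   11                /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884UL            /* 1/exp(5s/1min) in fixed point */

/*
 * One update step (hypothetical helper): exponentially decay the old
 * value and mix in the number of counted tasks, in fixed point.
 */
static unsigned long calc_load_step(unsigned long load, unsigned long active)
{
    load *= EXP_1;
    load += active * (FIXED_1 - EXP_1);
    return load >> FSHIFT;
}

/*
 * Hypothetical helper: keep `counted_tasks` tasks in a counted state
 * (running or uninterruptible sleep) for `steps` updates and return the
 * resulting 1-minute load average in hundredths.
 */
static unsigned long settled_load_x100(unsigned long counted_tasks,
                                       unsigned int steps)
{
    unsigned long load = 0;
    unsigned long active = counted_tasks * FIXED_1;
    unsigned int i;

    for (i = 0; i < steps; i++)
        load = calc_load_step(load, active);

    return (load * 100) >> FSHIFT;
}
```

With counted_tasks = 2 (both sync threads in ssleep()) the value settles
just under 200, i.e. a load average of about 2.00; with counted_tasks = 0
(threads sleeping interruptibly) it stays at 0. In neither case is any
CPU time consumed.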

Thanks,
Nish

P.S. Again, please don't top-post, it makes it harder for me to reply
(and disinclines me to do so).