* Re: ipvs_syncmaster brings cpu to 100%
[not found] ` <498263350509230815eb08a73@mail.gmail.com>
@ 2005-09-26 3:28 ` Horms
[not found] ` <20050926032807.GI18357@verge.net.au>
1 sibling, 0 replies; 13+ messages in thread
From: Horms @ 2005-09-26 3:28 UTC (permalink / raw)
To: Roger Tsang, Luca Maranzano,
LinuxVirtualServer.org users mailing list.
Cc: Nishanth Aravamudan, Dave Miller, Wensong Zhang, Julian Anastasov,
netdev
On Fri, Sep 23, 2005 at 11:15:31AM -0400, Roger Tsang wrote:
> As I've said before in this thread, you might want to try changing all the
> ssleep() calls to schedule_timeout().
>
> Roger
>
>
> On 9/22/05, Luca Maranzano <liuk001@gmail.com> wrote:
> >
> > Hello all,
> >
> > here again trying to discover the reason ot the CPU hog for
> > ipvs_sync{master,backup}.
> >
> > I've digged in the sources for ip_vs_sync.c and the main differences
> > between kernel 2.6.8 and 2.6.12 is the use of ssleep() instead of
> > schedule_timeout().
> >
> > The oddity I've seen is that in the header of both files, the version
> > is always like this:
> >
> > * Version: $Id: ip_vs_sync.c,v 1.13 2003/06/08 09:31:19 wensong Exp $
> > *
> > * Authors: Wensong Zhang <wensong@linuxvirtualserver.org>
> >
> > Is Wensong still the maintainer for this code?
Yes, although he is kind of quiet.
> > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > the function schedule_timeout() is more used than the ssleep() (517
> > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> >
> > The other oddity is that Horms reported on this list that on non Xeon
> > CPU the same version of kernel of mine does not present the problem.
> >
> > I'm getting crazy :-)
I've prepared a patch, which reverts the change which was introduced
by Nishanth Aravamudan in February.
I have CCed him, Dave Miller, Wensong Zhang, Julian Anastasov, and the
netdev list for comment.
Could interested parties please test the patch?
Thanks
--
Horms
Use schedule_timeout() instead of ssleep() in ip_vs_sync daemon,
as the latter seems to cause 100% CPU utilisation on HT Xeons.
Discussion:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00031.html
Reverts:
http://www.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=commit;h=f8afb60c7537130448cc479d6d8dc9bf4ee06027
Signed-off-by: Horms <horms@verge.net.au>
diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -655,7 +655,9 @@ static void sync_master_loop(void)
if (stop_master_sync)
break;
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
+ __set_current_state(TASK_RUNNING);
}
/* clean up the sync_buff queue */
@@ -712,7 +714,9 @@ static void sync_backup_loop(void)
if (stop_backup_sync)
break;
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
+ __set_current_state(TASK_RUNNING);
}
/* release the sending multicast socket */
@@ -824,7 +828,9 @@ static int fork_sync_thread(void *startu
if ((pid = kernel_thread(sync_thread, startup, 0)) < 0) {
IP_VS_ERR("could not create sync_thread due to %d... "
"retrying.\n", pid);
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
+ __set_current_state(TASK_RUNNING);
goto repeat;
}
@@ -858,7 +864,9 @@ int start_sync_thread(int state, char *m
if ((pid = kernel_thread(fork_sync_thread, &startup, 0)) < 0) {
IP_VS_ERR("could not create fork_sync_thread due to %d... "
"retrying.\n", pid);
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
+ __set_current_state(TASK_RUNNING);
goto repeat;
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: ipvs_syncmaster brings cpu to 100%
[not found] ` <20050926032807.GI18357@verge.net.au>
@ 2005-09-26 4:34 ` Nishanth Aravamudan
2005-09-26 8:05 ` Horms
[not found] ` <20050926080508.GF11027@verge.net.au>
0 siblings, 2 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2005-09-26 4:34 UTC (permalink / raw)
To: Roger Tsang, Luca Maranzano,
LinuxVirtualServer.org users mailing list., Dave Miller,
Wensong Zhang, Julian Anastasov, netdev
On 26.09.2005 [12:28:08 +0900], Horms wrote:
> On Fri, Sep 23, 2005 at 11:15:31AM -0400, Roger Tsang wrote:
> > As I've said before in this thread, you might want to try changing all the
> > ssleep() calls to schedule_timeout().
> >
> > Roger
> >
> >
> > On 9/22/05, Luca Maranzano <liuk001@gmail.com> wrote:
> > >
> > > Hello all,
> > >
> > > here again trying to discover the reason ot the CPU hog for
> > > ipvs_sync{master,backup}.
> > >
> > > I've digged in the sources for ip_vs_sync.c and the main differences
> > > between kernel 2.6.8 and 2.6.12 is the use of ssleep() instead of
> > > schedule_timeout().
> > >
> > > The oddity I've seen is that in the header of both files, the version
> > > is always like this:
> > >
> > > * Version: $Id: ip_vs_sync.c,v 1.13 2003/06/08 09:31:19 wensong Exp $
> > > *
> > > * Authors: Wensong Zhang <wensong@linuxvirtualserver.org>
> > >
> > > Is Wensong still the maintainer for this code?
>
> Yes, although he is kind of quiet.
>
> > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > the function schedule_timeout() is more used than the ssleep() (517
> > > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> > >
> > > The other oddity is that Horms reported on this list that on non Xeon
> > > CPU the same version of kernel of mine does not present the problem.
> > >
> > > I'm getting crazy :-)
>
> I've prepared a patch, which reverts the change which was introduced
> by Nishanth Aravamudan in February.
Was the 100% cpu utilization only occurring on Xeon processors?
Care to try to use msleep_interruptible() instead of ssleep(), as
opposed to schedule_timeout()?
In your patch, you do not need to set the state back to TASK_RUNNING,
btw.
Thanks,
Nish
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-26 4:34 ` Nishanth Aravamudan
@ 2005-09-26 8:05 ` Horms
[not found] ` <20050926080508.GF11027@verge.net.au>
1 sibling, 0 replies; 13+ messages in thread
From: Horms @ 2005-09-26 8:05 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Roger Tsang, Luca Maranzano,
LinuxVirtualServer.org users mailing list., Dave Miller,
Wensong Zhang, Julian Anastasov, netdev
On Sun, Sep 25, 2005 at 09:34:00PM -0700, Nishanth Aravamudan wrote:
> On 26.09.2005 [12:28:08 +0900], Horms wrote:
> > On Fri, Sep 23, 2005 at 11:15:31AM -0400, Roger Tsang wrote:
> > > As I've said before in this thread, you might want to try changing all the
> > > ssleep() calls to schedule_timeout().
> > >
> > > Roger
> > >
> > >
> > > On 9/22/05, Luca Maranzano <liuk001@gmail.com> wrote:
> > > >
> > > > Hello all,
> > > >
> > > > here again trying to discover the reason ot the CPU hog for
> > > > ipvs_sync{master,backup}.
> > > >
> > > > I've digged in the sources for ip_vs_sync.c and the main differences
> > > > between kernel 2.6.8 and 2.6.12 is the use of ssleep() instead of
> > > > schedule_timeout().
> > > >
> > > > The oddity I've seen is that in the header of both files, the version
> > > > is always like this:
> > > >
> > > > * Version: $Id: ip_vs_sync.c,v 1.13 2003/06/08 09:31:19 wensong Exp $
> > > > *
> > > > * Authors: Wensong Zhang <wensong@linuxvirtualserver.org>
> > > >
> > > > Is Wensong still the maintainer for this code?
> >
> > Yes, although he is kind of quiet.
> >
> > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> > > >
> > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > CPU the same version of kernel of mine does not present the problem.
> > > >
> > > > I'm getting crazy :-)
> >
> > I've prepared a patch, which reverts the change which was introduced
> > by Nishanth Aravamudan in February.
>
> Was the 100% cpu utilization only occurring on Xeon processors?
That seems to be the only case where this problem has been
observed. I don't have such a processor myself, so I haven't actually
been able to reproduce the problem locally.
One reason I posted this issue to netdev was to get some more
eyes on the problem as it is puzzling to say the least.
> Care to try to use msleep_interruptible() instead of ssleep(), as
> opposed to schedule_timeout()?
I will send a version that does that shortly. Luca, can
you please check that too?
> In your patch, you do not need to set the state back to TASK_RUNNING,
> btw.
Thanks, updated patch below.
--
Horms
Use schedule_timeout() instead of ssleep() in ip_vs_sync daemon,
as the latter seems to cause 100% CPU utilisation on HT Xeons.
Discussion:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00031.html
Reverts:
http://www.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=commit;h=f8afb60c7537130448cc479d6d8dc9bf4ee06027
Signed-off-by: Horms <horms@verge.net.au>
diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -655,7 +655,8 @@ static void sync_master_loop(void)
if (stop_master_sync)
break;
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
}
/* clean up the sync_buff queue */
@@ -712,7 +713,8 @@ static void sync_backup_loop(void)
if (stop_backup_sync)
break;
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
}
/* release the sending multicast socket */
@@ -824,7 +826,8 @@ static int fork_sync_thread(void *startu
if ((pid = kernel_thread(sync_thread, startup, 0)) < 0) {
IP_VS_ERR("could not create sync_thread due to %d... "
"retrying.\n", pid);
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
goto repeat;
}
@@ -858,7 +861,8 @@ int start_sync_thread(int state, char *m
if ((pid = kernel_thread(fork_sync_thread, &startup, 0)) < 0) {
IP_VS_ERR("could not create fork_sync_thread due to %d... "
"retrying.\n", pid);
- ssleep(1);
+ __set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(HZ);
goto repeat;
}
* Re: ipvs_syncmaster brings cpu to 100%
[not found] ` <20050926080508.GF11027@verge.net.au>
@ 2005-09-26 8:12 ` Horms
[not found] ` <20050926081229.GA23755@verge.net.au>
1 sibling, 0 replies; 13+ messages in thread
From: Horms @ 2005-09-26 8:12 UTC (permalink / raw)
To: Nishanth Aravamudan, Roger Tsang, Luca Maranzano,
LinuxVirtualServer.org users mailing list., Dave Miller,
Wensong Zhang, Julian Anastasov, netdev
On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
[snip]
> > > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> > > > >
> > > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > > CPU the same version of kernel of mine does not present the problem.
> > > > >
> > > > > I'm getting crazy :-)
> > >
> > > I've prepared a patch, which reverts the change which was introduced
> > > by Nishanth Aravamudan in February.
> >
> > Was the 100% cpu utilization only occurring on Xeon processors?
>
> That seems to be the only case where were this problem has been
> observed. I don't have such a processor myself, so I haven't actually
> been able to produce the problem locally.
>
> One reason I posted this issue to netdev was to get some more
> eyes on the problem as it is puzzling to say the least.
>
> > Care to try to use msleep_interruptible() instead of ssleep(), as
> > opposed to schedule_timeout()?
>
> I will send a version that does that shortly, Luca, can
> you plase check that too?
Here is that version of the patch. Nishanth, I take it that I do not
need to set TASK_INTERRUPTIBLE before calling msleep_interruptible();
please let me know if I am wrong.
Luca, please test.
--
Horms
*UNTESTED*
Use msleep_interruptible() instead of ssleep() in ip_vs_sync daemon,
as the latter seems to cause 100% CPU utilisation on HT Xeons.
Discussion:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00031.html
Reverts:
http://www.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=commit;h=f8afb60c7537130448cc479d6d8dc9bf4ee06027
Signed-off-by: Horms <horms@verge.net.au>
diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -655,7 +655,7 @@ static void sync_master_loop(void)
if (stop_master_sync)
break;
- ssleep(1);
+ msleep_interruptible(1000);
}
/* clean up the sync_buff queue */
@@ -712,7 +712,7 @@ static void sync_backup_loop(void)
if (stop_backup_sync)
break;
- ssleep(1);
+ msleep_interruptible(1000);
}
/* release the sending multicast socket */
@@ -824,7 +824,7 @@ static int fork_sync_thread(void *startu
if ((pid = kernel_thread(sync_thread, startup, 0)) < 0) {
IP_VS_ERR("could not create sync_thread due to %d... "
"retrying.\n", pid);
- ssleep(1);
+ msleep_interruptible(1000);
goto repeat;
}
@@ -858,7 +858,7 @@ int start_sync_thread(int state, char *m
if ((pid = kernel_thread(fork_sync_thread, &startup, 0)) < 0) {
IP_VS_ERR("could not create fork_sync_thread due to %d... "
"retrying.\n", pid);
- ssleep(1);
+ msleep_interruptible(1000);
goto repeat;
}
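
The variants posted in this thread all ask for the same nominal delay:
ssleep(1), schedule_timeout(HZ) and msleep_interruptible(1000) each
sleep for about one second. The user-space sketch below only shows the
tick arithmetic behind that equivalence. It is not kernel code: the
function names are hypothetical, and the tick rate is passed as a
parameter here, whereas in the kernel HZ is fixed at build time. The
kernel's own helpers may add a tick of slack on top of this.

```c
/* User-space sketch: convert a millisecond delay to whole timer ticks,
 * rounding up so that the sleep is never shorter than requested.
 * hz is the tick rate in ticks per second (hypothetical parameter). */
unsigned long toy_ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999UL) / 1000UL;
}

/* One second is exactly hz ticks, which is what schedule_timeout(HZ)
 * requests in the patch above. */
unsigned long toy_seconds_to_ticks(unsigned long seconds, unsigned long hz)
{
    return seconds * hz;
}
```

For any tick rate, toy_ms_to_ticks(1000, hz) equals
toy_seconds_to_ticks(1, hz), so the choice between the three calls
changes the task state during the sleep, not its length.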
* Re: ipvs_syncmaster brings cpu to 100%
[not found] ` <20050926081229.GA23755@verge.net.au>
@ 2005-09-26 13:11 ` Nishanth Aravamudan
2005-09-26 13:52 ` Luca Maranzano
[not found] ` <68559cef05092606521cc13f9a@mail.gmail.com>
0 siblings, 2 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2005-09-26 13:11 UTC (permalink / raw)
To: Roger Tsang, Luca Maranzano,
LinuxVirtualServer.org users mailing list., Dave Miller,
Wensong Zhang, Julian Anastasov, netdev
On 26.09.2005 [17:12:32 +0900], Horms wrote:
> On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
>
> [snip]
>
> > > > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > > > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> > > > > >
> > > > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > > > CPU the same version of kernel of mine does not present the problem.
> > > > > >
> > > > > > I'm getting crazy :-)
> > > >
> > > > I've prepared a patch, which reverts the change which was introduced
> > > > by Nishanth Aravamudan in February.
> > >
> > > Was the 100% cpu utilization only occurring on Xeon processors?
> >
> > That seems to be the only case where were this problem has been
> > observed. I don't have such a processor myself, so I haven't actually
> > been able to produce the problem locally.
> >
> > One reason I posted this issue to netdev was to get some more
> > eyes on the problem as it is puzzling to say the least.
> >
> > > Care to try to use msleep_interruptible() instead of ssleep(), as
> > > opposed to schedule_timeout()?
> >
> > I will send a version that does that shortly, Luca, can
> > you plase check that too?
>
> Here is that version of the patch. Nishanth, I take it that I do not
> need to set TASK_INTERRUPTABLE before calling msleep_interruptible(),
> please let me know if I am wrong.
Yes, exactly. I'm just trying to narrow it down to see if it's the task
state that's causing the issue (which, to be honest, doesn't make a lot
of sense to me -- with ssleep() your load average will go up as the task
will be in UNINTERRUPTIBLE state, but I am not sure why utilisation would
rise, as you are still sleeping...)
Thanks,
Nish
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-26 13:11 ` Nishanth Aravamudan
@ 2005-09-26 13:52 ` Luca Maranzano
[not found] ` <68559cef05092606521cc13f9a@mail.gmail.com>
1 sibling, 0 replies; 13+ messages in thread
From: Luca Maranzano @ 2005-09-26 13:52 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: LinuxVirtualServer.org users mailing list., Dave Miller,
Wensong Zhang, Julian Anastasov, netdev
Just to add more info, please note the output of "ps":
debld1:~# ps aux|grep ipvs
root 3748 0.0 0.0 0 0 ? D 12:09 0:00
[ipvs_syncmaster]
root 3757 0.0 0.0 0 0 ? D 12:09 0:00
[ipvs_syncbackup]
Note the D status, i.e. (from ps(1) man page): Uninterruptible sleep
(usually IO)
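
As a rough illustration of where that D comes from, here is a
user-space sketch that maps the sleep helpers discussed in this thread
to the task state they sleep in, and that state to the letter ps
prints. The numeric state values are as I recall them from the
2.6-era include/linux/sched.h; every toy_ name is hypothetical and
exists only for this sketch.

```c
/* Task states (values assumed from 2.6-era include/linux/sched.h). */
enum toy_task_state {
    TOY_TASK_RUNNING = 0,
    TOY_TASK_INTERRUPTIBLE = 1,
    TOY_TASK_UNINTERRUPTIBLE = 2
};

/* The sleep variants tried in this thread. */
enum toy_sleep_helper {
    TOY_SSLEEP,                          /* ssleep(1) */
    TOY_MSLEEP_INTERRUPTIBLE,            /* msleep_interruptible(1000) */
    TOY_SCHEDULE_TIMEOUT_INTERRUPTIBLE   /* TASK_INTERRUPTIBLE + schedule_timeout(HZ) */
};

/* State the task is in while it sleeps in the given helper. */
enum toy_task_state toy_state_while_sleeping(enum toy_sleep_helper h)
{
    switch (h) {
    case TOY_SSLEEP:
        return TOY_TASK_UNINTERRUPTIBLE;
    case TOY_MSLEEP_INTERRUPTIBLE:
    case TOY_SCHEDULE_TIMEOUT_INTERRUPTIBLE:
        return TOY_TASK_INTERRUPTIBLE;
    }
    return TOY_TASK_RUNNING;
}

/* Letter shown in the STAT column of ps for each state. */
char toy_ps_letter(enum toy_task_state s)
{
    switch (s) {
    case TOY_TASK_RUNNING:
        return 'R';
    case TOY_TASK_INTERRUPTIBLE:
        return 'S';
    case TOY_TASK_UNINTERRUPTIBLE:
        return 'D';
    }
    return '?';
}
```

With ssleep() the sync threads show up as D; with either of the
interruptible variants they would be expected to show up as S.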
I hope to have a Xeon machine to run some more tests in the next
days; in the meantime I'll try to reproduce my setup on a couple of
VMware Workstation machines.
More later.
Thank you all.
Luca
* Re: ipvs_syncmaster brings cpu to 100%
[not found] ` <68559cef05092606521cc13f9a@mail.gmail.com>
@ 2005-09-26 14:21 ` Nishanth Aravamudan
2005-09-26 14:44 ` Luca Maranzano
2005-09-28 2:23 ` Horms
0 siblings, 2 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2005-09-26 14:21 UTC (permalink / raw)
To: Luca Maranzano; +Cc: Dave Miller, Wensong Zhang, Julian Anastasov, netdev
On 26.09.2005 [15:52:02 +0200], Luca Maranzano wrote:
> On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > On 26.09.2005 [17:12:32 +0900], Horms wrote:
> > > On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
> > >
> > > [snip]
> > >
> > > > > > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > > > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > > > > > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> > > > > > > >
> > > > > > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > > > > > CPU the same version of kernel of mine does not present the problem.
> > > > > > > >
> > > > > > > > I'm getting crazy :-)
> > > > > >
> > > > > > I've prepared a patch, which reverts the change which was introduced
> > > > > > by Nishanth Aravamudan in February.
> > > > >
> > > > > Was the 100% cpu utilization only occurring on Xeon processors?
> > > >
> > > > That seems to be the only case where were this problem has been
> > > > observed. I don't have such a processor myself, so I haven't actually
> > > > been able to produce the problem locally.
> > > >
> > > > One reason I posted this issue to netdev was to get some more
> > > > eyes on the problem as it is puzzling to say the least.
> > > >
> > > > > Care to try to use msleep_interruptible() instead of ssleep(), as
> > > > > opposed to schedule_timeout()?
> > > >
> > > > I will send a version that does that shortly, Luca, can
> > > > you plase check that too?
> > >
> > > Here is that version of the patch. Nishanth, I take it that I do not
> > > need to set TASK_INTERRUPTABLE before calling msleep_interruptible(),
> > > please let me know if I am wrong.
> >
> > Yes, exactly. I'm just trying to narrow it down to see if it's the task
> > state that's causing the issue (which, to be honest, doesn't make a lot
> > of sense to me -- with ssleep() your load average will go up as the task
> > will be UNINTERRUPTIBLE state, but I am not sure why utilisation would
> > rise, as you are still sleeping...)
[trimmed lvs-users from my reply, as it is a closed list]
> Just to add more info, please note the output of "ps":
>
> debld1:~# ps aux|grep ipvs
> root 3748 0.0 0.0 0 0 ? D 12:09 0:00
> [ipvs_syncmaster]
> root 3757 0.0 0.0 0 0 ? D 12:09 0:00
> [ipvs_syncbackup]
>
> Note the D status, i.e. (from ps(1) man page): Uninterruptible sleep
> (usually IO)
The msleep_interruptible() change should fix that.
But that does not show 100% CPU utilisation at all, it shows 0. Did you
mean to say your load increases?
I'm still unclear what the problem is. Horms's initial Cc trimmed some
important information. It would be very useful to "start over" -- at
least from the perspective of what the problem actually is.
> I hope to have a Xeon machine to make some more tests in the next
> days, in the mean time I'll try to reproduce my setup on a couple of
> VMWare Workstation machines.
Please don't top-post. It makes it really hard to write sane replies...
Thanks,
Nish
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-26 14:21 ` Nishanth Aravamudan
@ 2005-09-26 14:44 ` Luca Maranzano
2005-09-26 17:51 ` Nishanth Aravamudan
2005-09-28 2:23 ` Horms
1 sibling, 1 reply; 13+ messages in thread
From: Luca Maranzano @ 2005-09-26 14:44 UTC (permalink / raw)
To: Nishanth Aravamudan, LinuxVirtualServer.org users mailing list.; +Cc: netdev
[trimmed Cc to avoid spamming...]
Ok, just to summarize the long thread from the beginning:
The goal: setting up a Local Director with IPVS with state
synchronization, failover and failback.
The hardware: 2 identical HP DL380 G4 boxes, each with 1 Intel Xeon 3.4 GHz CPU
The problems (please note that all kernel versions are *Debian* kernels):
1. Kernel 2.6.8: got a system lock of the standby node when simulating
a failover. The load average as reported by "top" or "w" is always
0.00.
2. Kernel 2.6.11 and Kernel 2.6.12: failover and failback work fine,
but the load average as reported by "top" or "w" is
systematically at 2.00 or more with both sync threads started
(ipvs_syncmaster and ipvs_syncbackup). The load average from top is 1.00
or more with only one thread (i.e. ipvs_syncmaster). Horms reported
that he was not able to reproduce this on a non-Xeon system.
That's all, let me know if you need more info.
Regards,
Luca
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-26 14:44 ` Luca Maranzano
@ 2005-09-26 17:51 ` Nishanth Aravamudan
0 siblings, 0 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2005-09-26 17:51 UTC (permalink / raw)
To: Luca Maranzano; +Cc: LinuxVirtualServer.org users mailing list., netdev
On 26.09.2005 [16:44:09 +0200], Luca Maranzano wrote:
> On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > On 26.09.2005 [15:52:02 +0200], Luca Maranzano wrote:
> > > On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > > > On 26.09.2005 [17:12:32 +0900], Horms wrote:
> > > > > On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
> > > > >
> > > > > [snip]
> > > > >
> > > > > > > > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > > > > > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > > > > > > > occurrencies vs. 43), so why in ip_vs_sync.c there was this change?
> > > > > > > > > >
> > > > > > > > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > > > > > > > CPU the same version of kernel of mine does not present the problem.
> > > > > > > > > >
> > > > > > > > > > I'm getting crazy :-)
> > > > > > > >
> > > > > > > > I've prepared a patch, which reverts the change which was introduced
> > > > > > > > by Nishanth Aravamudan in February.
> > > > > > >
> > > > > > > Was the 100% cpu utilization only occurring on Xeon processors?
> > > > > >
> > > > > > That seems to be the only case where were this problem has been
> > > > > > observed. I don't have such a processor myself, so I haven't actually
> > > > > > been able to produce the problem locally.
> > > > > >
> > > > > > One reason I posted this issue to netdev was to get some more
> > > > > > eyes on the problem as it is puzzling to say the least.
> > > > > >
> > > > > > > Care to try to use msleep_interruptible() instead of ssleep(), as
> > > > > > > opposed to schedule_timeout()?
> > > > > >
> > > > > > I will send a version that does that shortly, Luca, can
> > > > > > you plase check that too?
> > > > >
> > > > > Here is that version of the patch. Nishanth, I take it that I do not
> > > > > need to set TASK_INTERRUPTABLE before calling msleep_interruptible(),
> > > > > please let me know if I am wrong.
> > > >
> > > > Yes, exactly. I'm just trying to narrow it down to see if it's the task
> > > > state that's causing the issue (which, to be honest, doesn't make a lot
> > > > of sense to me -- with ssleep() your load average will go up as the task
> > > > will be UNINTERRUPTIBLE state, but I am not sure why utilisation would
> > > > rise, as you are still sleeping...)
> >
> > [trimmed lvs-users from my reply, as it is a closed list]
> >
> > > Just to add more info, please note the output of "ps":
> > >
> > > debld1:~# ps aux|grep ipvs
> > > root 3748 0.0 0.0 0 0 ? D 12:09 0:00
> > > [ipvs_syncmaster]
> > > root 3757 0.0 0.0 0 0 ? D 12:09 0:00
> > > [ipvs_syncbackup]
> > >
> > > Note the D status, i.e. (from ps(1) man page): Uninterruptible sleep
> > > (usually IO)
> >
> > The msleep_interruptible() change should fix that.
> >
> > But that does not show 100% CPU utilisation at all, it shows 0. Did you
> > mean to say your load increases?
> >
> > I'm still unclear what the problem is. Horms initial Cc trimmed some
> > important information. It would be very useful to "start over" -- at
> > least from the perspective of what the problem actually is.
> >
> > > I hope to have a Xeon machine to make some more tests in the next
> > > days, in the mean time I'll try to reproduce my setup on a couple of
> > > VMWare Workstation machines.
> >
> > Please don't top-most. It makes it really hard to write sane replies...
>
> [trimmed Cc to avoid spamming...]
>
> Ok, just to summarize the long thread from the beginning:
>
> The goal: setting up a Local Director with IPVS with state
> synchronization, failover and failback.
>
> The hardware: 1 CPU Intel Xeon 3,4 Ghz - HP DL380G4 on 2 identical boxes
>
> The problems (please note that all kernel versions are *Debian* kernels):
> 1. Kernel 2.6.8: got a system lock of the standby node when simulating
> a failover. The load average as reported from "top" or "w" is always
> 0.00.
>
> 2. Kernel 2.6.11 and Kernel 2.6.12: failover and failback works fine,
> but the load average as reported from "top" or "w" is always
> systematically at 2.00 or more with both sync thread started
> (ipvs_syncmaster and ipvs_syncbackup). Load average from top is 1.00
> or mroe with only one thread (i.e. ipvs_syncmaster). Horms reported
> that he was not able to reproduce this on a non-Xeon system.
Ok, so whoever mentioned "CPU utilisation" was mistaken. The
load average being 2 is due to ssleep(). The msleep_interruptible()
version of the patch should fix that up. It really doesn't make any
difference in the code, except that your load average will go back to
0.00 and the ipvs threads can be interrupted by signals.
I would expect the load average to be 2.00 for all systems, not just
Xeon. The system lock has nothing to do with the patch, though.
Something else fixed it.
Thanks,
Nish
P.S. Again, please don't top-post, it makes it harder for me to reply
(and disinclines me to do so).
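To make the load-average point above concrete, here is a minimal userspace sketch, not kernel code. It assumes the fixed-point scheme the 2.6 kernels use for the 1-minute average (an FSHIFT of 11, an EXP_1 of 1884, one sample every 5 seconds) and that tasks in uninterruptible sleep are counted as active alongside runnable ones. The names calc_load_sketch() and simulate_load() are made up for illustration.

```c
/*
 * Userspace sketch of the 1-minute load average calculation.
 * Constants are assumed to match the 2.6-era kernel definitions;
 * the function names are hypothetical.
 */
#define FSHIFT   11                 /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* 1/exp(5s/1min) in fixed point */

/* One decay step: blend the old average with the current active count. */
static unsigned long calc_load_sketch(unsigned long load, unsigned long exp,
                                      unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/*
 * Feed a constant number of "active" tasks (runnable plus
 * uninterruptible sleepers) into the average for a number of
 * 5-second samples and return the fixed-point result.
 */
static unsigned long simulate_load(unsigned long nr_active,
                                   unsigned int samples)
{
    unsigned long load = 0;
    unsigned long active = nr_active * FIXED_1;
    unsigned int i;

    for (i = 0; i < samples; i++)
        load = calc_load_sketch(load, EXP_1, active);
    return load;
}
```

With two threads permanently in D state, simulate_load(2, 600) / 2048.0 settles just under 2.00 even though neither thread uses any CPU time; with no such threads the result stays at 0. That is the behaviour reported for ipvs_syncmaster and ipvs_syncbackup under ssleep().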
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-26 14:21 ` Nishanth Aravamudan
2005-09-26 14:44 ` Luca Maranzano
@ 2005-09-28 2:23 ` Horms
2005-09-28 13:26 ` Nishanth Aravamudan
1 sibling, 1 reply; 13+ messages in thread
From: Horms @ 2005-09-28 2:23 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Luca Maranzano, Dave Miller, Wensong Zhang, Julian Anastasov,
netdev
On Mon, Sep 26, 2005 at 07:21:09AM -0700, Nishanth Aravamudan wrote:
> On 26.09.2005 [15:52:02 +0200], Luca Maranzano wrote:
> > On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > > On 26.09.2005 [17:12:32 +0900], Horms wrote:
> > > > On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
> > > >
> > > > [snip]
> > > >
> > > > > > > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > > > > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > > > > > > > occurrences vs. 43), so why in ip_vs_sync.c there was this change?
> > > > > > > > >
> > > > > > > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > > > > > > CPU the same version of kernel of mine does not present the problem.
> > > > > > > > >
> > > > > > > > > I'm getting crazy :-)
> > > > > > >
> > > > > > > I've prepared a patch, which reverts the change which was introduced
> > > > > > > by Nishanth Aravamudan in February.
> > > > > >
> > > > > > Was the 100% cpu utilization only occurring on Xeon processors?
> > > > >
> > > > > > That seems to be the only case where this problem has been
> > > > > observed. I don't have such a processor myself, so I haven't actually
> > > > > been able to produce the problem locally.
> > > > >
> > > > > One reason I posted this issue to netdev was to get some more
> > > > > eyes on the problem as it is puzzling to say the least.
> > > > >
> > > > > > Care to try to use msleep_interruptible() instead of ssleep(), as
> > > > > > opposed to schedule_timeout()?
> > > > >
> > > > > > I will send a version that does that shortly, Luca, can
> > > > > > you please check that too?
> > > > >
> > > > > Here is that version of the patch. Nishanth, I take it that I do not
> > > > > need to set TASK_INTERRUPTIBLE before calling msleep_interruptible(),
> > > > > please let me know if I am wrong.
> > >
> > > Yes, exactly. I'm just trying to narrow it down to see if it's the task
> > > state that's causing the issue (which, to be honest, doesn't make a lot
> > > of sense to me -- with ssleep() your load average will go up as the task
> > > > will be in UNINTERRUPTIBLE state, but I am not sure why utilisation would
> > > rise, as you are still sleeping...)
>
> [trimmed lvs-users from my reply, as it is a closed list]
>
> > Just to add more info, please note the output of "ps":
> >
> > debld1:~# ps aux|grep ipvs
> > root 3748 0.0 0.0 0 0 ? D 12:09 0:00
> > [ipvs_syncmaster]
> > root 3757 0.0 0.0 0 0 ? D 12:09 0:00
> > [ipvs_syncbackup]
> >
> > Note the D status, i.e. (from ps(1) man page): Uninterruptible sleep
> > (usually IO)
>
> The msleep_interruptible() change should fix that.
>
> But that does not show 100% CPU utilisation at all, it shows 0. Did you
> mean to say your load increases?
The full discussion is available online at the following URL.
I can get that information and post it all here if that is
desirable.
http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00031.html
--
Horms
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-28 2:23 ` Horms
@ 2005-09-28 13:26 ` Nishanth Aravamudan
2005-09-29 7:00 ` Julian Anastasov
2005-09-30 15:59 ` Luca Maranzano
0 siblings, 2 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2005-09-28 13:26 UTC (permalink / raw)
To: Luca Maranzano, Dave Miller, Wensong Zhang, Julian Anastasov,
netdev
On 28.09.2005 [11:23:09 +0900], Horms wrote:
> On Mon, Sep 26, 2005 at 07:21:09AM -0700, Nishanth Aravamudan wrote:
> > On 26.09.2005 [15:52:02 +0200], Luca Maranzano wrote:
> > > On 26/09/05, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> > > > On 26.09.2005 [17:12:32 +0900], Horms wrote:
> > > > > On Mon, Sep 26, 2005 at 05:05:10PM +0900, Horms wrote:
> > > > >
> > > > > [snip]
> > > > >
> > > > > > > > > > Furthermore, if I make an "rgrep" in the source tree of kernel 2.6.12
> > > > > > > > > > the function schedule_timeout() is more used than the ssleep() (517
> > > > > > > > > > > occurrences vs. 43), so why in ip_vs_sync.c there was this change?
> > > > > > > > > >
> > > > > > > > > > The other oddity is that Horms reported on this list that on non Xeon
> > > > > > > > > > CPU the same version of kernel of mine does not present the problem.
> > > > > > > > > >
> > > > > > > > > > I'm getting crazy :-)
> > > > > > > >
> > > > > > > > I've prepared a patch, which reverts the change which was introduced
> > > > > > > > by Nishanth Aravamudan in February.
> > > > > > >
> > > > > > > Was the 100% cpu utilization only occurring on Xeon processors?
> > > > > >
> > > > > > > That seems to be the only case where this problem has been
> > > > > > observed. I don't have such a processor myself, so I haven't actually
> > > > > > been able to produce the problem locally.
> > > > > >
> > > > > > One reason I posted this issue to netdev was to get some more
> > > > > > eyes on the problem as it is puzzling to say the least.
> > > > > >
> > > > > > > Care to try to use msleep_interruptible() instead of ssleep(), as
> > > > > > > opposed to schedule_timeout()?
> > > > > >
> > > > > > > I will send a version that does that shortly, Luca, can
> > > > > > > you please check that too?
> > > > > >
> > > > > > Here is that version of the patch. Nishanth, I take it that I do not
> > > > > > need to set TASK_INTERRUPTIBLE before calling msleep_interruptible(),
> > > > > > please let me know if I am wrong.
> > > >
> > > > Yes, exactly. I'm just trying to narrow it down to see if it's the task
> > > > state that's causing the issue (which, to be honest, doesn't make a lot
> > > > of sense to me -- with ssleep() your load average will go up as the task
> > > > > will be in UNINTERRUPTIBLE state, but I am not sure why utilisation would
> > > > rise, as you are still sleeping...)
> >
> > [trimmed lvs-users from my reply, as it is a closed list]
> >
> > > Just to add more info, please note the output of "ps":
> > >
> > > debld1:~# ps aux|grep ipvs
> > > root 3748 0.0 0.0 0 0 ? D 12:09 0:00
> > > [ipvs_syncmaster]
> > > root 3757 0.0 0.0 0 0 ? D 12:09 0:00
> > > [ipvs_syncbackup]
> > >
> > > Note the D status, i.e. (from ps(1) man page): Uninterruptible sleep
> > > (usually IO)
> >
> > The msleep_interruptible() change should fix that.
> >
> > But that does not show 100% CPU utilisation at all, it shows 0. Did you
> > mean to say your load increases?
>
> The full discussion is available online at the following URL.
> I can get that information and post it all here if that is
> desirable.
>
> http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00031.html
Yes, the information in that thread is the same as what Luca said. It's
a load average problem, not a CPU utilisation problem (those threads are
sleeping!) If Luca could test the msleep_interruptible() version of the
patch and it works (like I said, performance should not change, but the
load average will drop by 2), then I will ACK the patch for mainline
acceptance.
Thanks,
Nish
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-28 13:26 ` Nishanth Aravamudan
@ 2005-09-29 7:00 ` Julian Anastasov
2005-09-30 15:59 ` Luca Maranzano
1 sibling, 0 replies; 13+ messages in thread
From: Julian Anastasov @ 2005-09-29 7:00 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Luca Maranzano, Dave Miller, Wensong Zhang, netdev
Hello,
On Wed, 28 Sep 2005, Nishanth Aravamudan wrote:
> Yes, the information in that thread is the same as what Luca said. It's
> a load average problem, not a CPU utilisation problem (those threads are
> sleeping!) If Luca could test the msleep_interruptible() version of the
> patch and it works (like I said, performance should not change, but the
> load average will drop by 2), then I will ACK the patch for mainline
> acceptance.
Agreed. It seems your initial conversion was based on wrong
assumptions, quoting you:
> Description: Use ssleep() instead of schedule_timeout() to guarantee the task
> delays as expected. The first two replacements use TASK_INTERRUPTIBLE but do
> not
> check for signals, so ssleep() should be appropriate.
As all signals are blocked by daemonize(), and again explicitly
later, it was not necessary to convert to the non-interruptible variant.
Regards
--
Julian Anastasov <ja@ssi.bg>
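The point about blocked signals can be illustrated from userspace, with the caveat that this is only an analogy and not the ip_vs_sync code. The sketch below assumes a POSIX system and a single-threaded process; sleep_interrupted() is a hypothetical helper. It arms a timer that raises SIGALRM 100 ms into a 300 ms nanosleep(). With SIGALRM blocked, the interruptible sleep still runs to completion, much as an interruptible sleep would in a kernel thread that has all signals blocked.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: its only job is to let SIGALRM interrupt nanosleep(). */
static void on_alarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: sleep for 300 ms while a SIGALRM is scheduled
 * to arrive after 100 ms. Returns 1 if the sleep was cut short by the
 * signal (EINTR), 0 if it ran for the full duration.
 */
static int sleep_interrupted(int block_sigalrm)
{
    struct sigaction sa = {0};
    sigset_t set, old;
    struct itimerval it = { .it_interval = {0, 0}, .it_value = {0, 100000} };
    struct timespec req = {0, 300 * 1000000L};
    int rc, interrupted;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    /* Block or unblock SIGALRM for the duration of the sleep. */
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(block_sigalrm ? SIG_BLOCK : SIG_UNBLOCK, &set, &old);

    setitimer(ITIMER_REAL, &it, NULL);   /* one-shot timer, 100 ms */

    rc = nanosleep(&req, NULL);
    interrupted = (rc == -1 && errno == EINTR);

    /* Restore the mask; a pending SIGALRM is delivered here, harmlessly. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return interrupted;
}
```

Calling sleep_interrupted(1) is expected to return 0 (the blocked signal stays pending and the sleep completes), while sleep_interrupted(0) is expected to return 1. By the same reasoning, a thread that never receives signals gains nothing from the uninterruptible variant, and the interruptible one keeps it out of D state.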
* Re: ipvs_syncmaster brings cpu to 100%
2005-09-28 13:26 ` Nishanth Aravamudan
2005-09-29 7:00 ` Julian Anastasov
@ 2005-09-30 15:59 ` Luca Maranzano
1 sibling, 0 replies; 13+ messages in thread
From: Luca Maranzano @ 2005-09-30 15:59 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Dave Miller, Wensong Zhang, Julian Anastasov, netdev, horms
First of all thank you all for your precious support! :-)
The two machines on which I discovered the problem are now in
production, so I cannot run tests on them for the moment, but I hope to
have some other hardware to try next week.
I'll let you know ASAP.
Thanks,
Luca
>
> Yes, the information in that thread is the same as what Luca said. It's
> a load average problem, not a CPU utilisation problem (those threads are
> sleeping!) If Luca could test the msleep_interruptible() version of the
> patch and it works (like I said, performance should not change, but the
> load average will drop by 2), then I will ACK the patch for mainline
> acceptance.
>
> Thanks,
> Nish
>
end of thread, other threads:[~2005-09-30 15:59 UTC | newest]
Thread overview: 13+ messages
[not found] <68559cef050908090657fc2599@mail.gmail.com>
[not found] ` <498263350509081605956a771@mail.gmail.com>
[not found] ` <68559cef05092207022f1f0df4@mail.gmail.com>
[not found] ` <498263350509230815eb08a73@mail.gmail.com>
2005-09-26 3:28 ` ipvs_syncmaster brings cpu to 100% Horms
[not found] ` <20050926032807.GI18357@verge.net.au>
2005-09-26 4:34 ` Nishanth Aravamudan
2005-09-26 8:05 ` Horms
[not found] ` <20050926080508.GF11027@verge.net.au>
2005-09-26 8:12 ` Horms
[not found] ` <20050926081229.GA23755@verge.net.au>
2005-09-26 13:11 ` Nishanth Aravamudan
2005-09-26 13:52 ` Luca Maranzano
[not found] ` <68559cef05092606521cc13f9a@mail.gmail.com>
2005-09-26 14:21 ` Nishanth Aravamudan
2005-09-26 14:44 ` Luca Maranzano
2005-09-26 17:51 ` Nishanth Aravamudan
2005-09-28 2:23 ` Horms
2005-09-28 13:26 ` Nishanth Aravamudan
2005-09-29 7:00 ` Julian Anastasov
2005-09-30 15:59 ` Luca Maranzano