From: Jiri Wiesner <jwiesner@suse.de>
To: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>,
lvs-devel@vger.kernel.org,
yunhong-cgl jiang <xintian1976@gmail.com>,
dust.li@linux.alibaba.com
Subject: Re: [RFC PATCHv5 3/6] ipvs: use kthreads for stats estimation
Date: Sat, 15 Oct 2022 11:21:58 +0200
Message-ID: <20221015092158.GA3484@incl>
In-Reply-To: <20221009153710.125919-4-ja@ssi.bg>
On Sun, Oct 09, 2022 at 06:37:07PM +0300, Julian Anastasov wrote:
> +/* Calculate limits for all kthreads */
> +static int ip_vs_est_calc_limits(struct netns_ipvs *ipvs, int *chain_max)
> +{
> +	struct ip_vs_est_kt_data *kd;
> +	struct ip_vs_stats *s;
> +	struct hlist_head chain;
> +	int cache_factor = 4;
> +	int i, loops, ntest;
> +	s32 min_est = 0;
> +	ktime_t t1, t2;
> +	s64 diff, val;
> +	int max = 8;
> +	int ret = 1;
> +
> +	INIT_HLIST_HEAD(&chain);
> +	mutex_lock(&__ip_vs_mutex);
> +	kd = ipvs->est_kt_arr[0];
> +	mutex_unlock(&__ip_vs_mutex);
> +	s = kd ? kd->calc_stats : NULL;
> +	if (!s)
> +		goto out;
> +	hlist_add_head(&s->est.list, &chain);
> +
> +	loops = 1;
> +	/* Get best result from many tests */
> +	for (ntest = 0; ntest < 3; ntest++) {
> +		local_bh_disable();
> +		rcu_read_lock();
> +
> +		/* Put stats in cache */
> +		ip_vs_chain_estimation(&chain);
> +
> +		t1 = ktime_get();
> +		for (i = loops * cache_factor; i > 0; i--)
> +			ip_vs_chain_estimation(&chain);
> +		t2 = ktime_get();
I have tested this. There is one problem: when the calc phase is carried out for the first time after booting the kernel, the diff is several times higher than it should be. It was 7325 ns on my testing machine. The resulting chain_max value is too low, which causes 15 kthreads to be created when 500,000 estimators have been added. That is not abysmal (it is better to underestimate chain_max than to overestimate it), but it is not optimal either. When the ip_vs module is unloaded and a new service is then added again, the diff has the expected value. The commands:
> # ipvsadm -A -t 10.10.10.1:2000
> # ipvsadm -D -t 10.10.10.1:2000; modprobe -r ip_vs_wlc ip_vs
> # ipvsadm -A -t 10.10.10.1:2000
The kernel log:
> [ 200.020287] IPVS: ipvs loaded.
> [ 200.036128] IPVS: starting estimator thread 0...
> [ 200.042213] IPVS: calc: chain_max=12, single est=7319ns, diff=7325, loops=1, ntest=3
> [ 200.051714] IPVS: dequeue: 49ns
> [ 200.056024] IPVS: using max 576 ests per chain, 28800 per kthread
> [ 201.983034] IPVS: tick time: 6057ns for 64 CPUs, 2 ests, 1 chains, chain_max=576
> [ 237.555043] IPVS: stop unused estimator thread 0...
> [ 237.599116] IPVS: ipvs unloaded.
> [ 268.533028] IPVS: ipvs loaded.
> [ 268.548401] IPVS: starting estimator thread 0...
> [ 268.554472] IPVS: calc: chain_max=33, single est=2834ns, diff=2834, loops=1, ntest=3
> [ 268.563972] IPVS: dequeue: 68ns
> [ 268.568292] IPVS: using max 1584 ests per chain, 79200 per kthread
> [ 270.495032] IPVS: tick time: 5761ns for 64 CPUs, 2 ests, 1 chains, chain_max=1584
> [ 307.847045] IPVS: stop unused estimator thread 0...
> [ 307.891101] IPVS: ipvs unloaded.
Loading the module and adding a service a third time gives a diff that is close enough to the expected value:
> [ 312.807107] IPVS: ipvs loaded.
> [ 312.823972] IPVS: starting estimator thread 0...
> [ 312.829967] IPVS: calc: chain_max=38, single est=2444ns, diff=2477, loops=1, ntest=3
> [ 312.839470] IPVS: dequeue: 66ns
> [ 312.843800] IPVS: using max 1824 ests per chain, 91200 per kthread
> [ 314.771028] IPVS: tick time: 5703ns for 64 CPUs, 2 ests, 1 chains, chain_max=1824
Here is a distribution of the time needed to process one estimator - the average value is around 2900 ns (on my testing machine):
> dmesg | awk '/tick time:/ {d = $(NF - 8); sub("ns", "", d); d /= $(NF - 4); d = int(d / 100) * 100; hist[d]++} END {PROCINFO["sorted_in"] = "@ind_num_asc"; for (d in hist) printf "%5d %5d\n", d, hist[d]}'
> 2500 2
> 2700 1
> 2800 243
> 2900 427
> 3000 20
> 3100 1
> 3500 1
> 3600 1
> 3700 1
> 4900 1
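For anyone who prefers C to awk, the per-line arithmetic of the one-liner above can be sketched as follows. This is only an illustration: est_time_bucket() is a hypothetical helper name, and the sscanf() format simply mirrors the "tick time" debug message as it appears in the logs in this mail.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one "tick time" log line and return the
 * time per estimator, rounded down to a multiple of 100 ns.
 * Returns -1 if the line is not a "tick time" line or has no estimators.
 */
static long est_time_bucket(const char *line)
{
	const char *p = strstr(line, "tick time: ");
	long tick_ns;
	int cpus, ests;

	if (!p)
		return -1;
	if (sscanf(p, "tick time: %ldns for %d CPUs, %d ests,",
		   &tick_ns, &cpus, &ests) != 3)
		return -1;
	if (ests <= 0)
		return -1;
	/* Same binning as int(d / 100) * 100 in the awk script */
	return tick_ns / ests / 100 * 100;
}
```

Counting how often each returned bucket occurs over all "tick time" lines gives the two-column histogram shown above.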
I am not sure why the first 3 tests give such a high diff value, but the diff value is much closer to the real average time after the module is loaded a second time.
I ran more tests. All I did was increase the number of tests (the limit of the ntest loop) from 3 to 3000. The diff had a much more realistic value even when the calc phase was carried out for the first time:
> [ 98.804037] IPVS: ipvs loaded.
> [ 98.819451] IPVS: starting estimator thread 0...
> [ 98.834960] IPVS: calc: chain_max=39, single est=2418ns, diff=2464, loops=1, ntest=3000
> [ 98.844775] IPVS: dequeue: 67ns
> [ 98.849091] IPVS: using max 1872 ests per chain, 93600 per kthread
> [ 100.767346] IPVS: tick time: 5895ns for 64 CPUs, 2 ests, 1 chains, chain_max=1872
> [ 107.419344] IPVS: stop unused estimator thread 0...
> [ 107.459423] IPVS: ipvs unloaded.
> [ 114.421324] IPVS: ipvs loaded.
> [ 114.435151] IPVS: starting estimator thread 0...
> [ 114.451304] IPVS: calc: chain_max=36, single est=2627ns, diff=8136, loops=1, ntest=3000
> [ 114.461079] IPVS: dequeue: 77ns
> [ 114.465389] IPVS: using max 1728 ests per chain, 86400 per kthread
> [ 116.388968] IPVS: tick time: 1632749ns for 64 CPUs, 1433 ests, 1 chains, chain_max=1728
> [ 180.387030] IPVS: tick time: 3686870ns for 64 CPUs, 1728 ests, 1 chains, chain_max=1728
> [ 232.507642] IPVS: starting estimator thread 1...
> [ 244.387184] IPVS: tick time: 3846122ns for 64 CPUs, 1728 ests, 1 chains, chain_max=1728
> [ 308.387170] IPVS: tick time: 3835769ns for 64 CPUs, 1728 ests, 1 chains, chain_max=1728
> [ 358.227680] IPVS: starting estimator thread 2...
> [ 372.387177] IPVS: tick time: 3841369ns for 64 CPUs, 1728 ests, 1 chains, chain_max=1728
> [ 436.387204] IPVS: tick time: 3869654ns for 64 CPUs, 1728 ests, 1 chains, chain_max=1728
Setting the number of tests to 3000 is probably overkill. The message is that the number of tests needs to be increased to get a realistic value of the diff. When I added 500,000 estimators, 5 kthreads were created, which I think is reasonable. After adding 500,000 estimators, the time needed to process one estimator decreased from 2900 ns to circa 2200 ns when a kthread is fully loaded, which I do not think is necessarily a problem.
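To illustrate the measurement pattern under discussion outside of the kernel, here is a minimal userspace sketch: warm the caches with an untimed call, time the call, repeat ntest times and keep the smallest value. Everything in it is an assumption made for illustration: best_of_ns(), now_ns(), struct work and sum_work() are hypothetical names, clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get(), and the workload is a trivial array sum rather than ip_vs_chain_estimation().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds (userspace stand-in for ktime_get()) */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: time fn(arg) ntest times and return the smallest
 * elapsed time in ns, or -1 if ntest <= 0. Each timed call is preceded
 * by an untimed call that pulls the data into the cache.
 */
static int64_t best_of_ns(void (*fn)(void *), void *arg, int ntest)
{
	int64_t best = -1, t1, t2;
	int i;

	for (i = 0; i < ntest; i++) {
		fn(arg);		/* warm-up, not timed */
		t1 = now_ns();
		fn(arg);
		t2 = now_ns();
		if (best < 0 || t2 - t1 < best)
			best = t2 - t1;
	}
	return best;
}

/* Hypothetical workload: sum a small array */
struct work {
	long data[1024];
	volatile long sink;
};

static void sum_work(void *arg)
{
	struct work *w = arg;
	long s = 0;
	int i;

	for (i = 0; i < 1024; i++)
		s += w->data[i];
	w->sink = s;
}
```

Calling best_of_ns(sum_work, &w, 3) and best_of_ns(sum_work, &w, 300) on a zero-initialized struct work w lets one compare how stable the minimum is for a small and a large number of tests. The numbers depend on the machine, so none are given here.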
> +
> +		rcu_read_unlock();
> +		local_bh_enable();
> +
> +		if (!ipvs->enable || kthread_should_stop())
> +			goto stop;
> +		cond_resched();
> +
> +		diff = ktime_to_ns(ktime_sub(t2, t1));
> +		if (diff <= 1 * NSEC_PER_USEC) {
> +			/* Do more loops on low resolution */
> +			loops *= 2;
> +			continue;
> +		}
> +		if (diff >= NSEC_PER_SEC)
> +			continue;
> +		val = diff;
> +		do_div(val, loops);
> +		if (!min_est || val < min_est) {
> +			min_est = val;
> +			/* goal: 95usec per chain */
> +			val = 95 * NSEC_PER_USEC;
> +			if (val >= min_est) {
> +				do_div(val, min_est);
> +				max = (int)val;
> +			} else {
> +				max = 1;
> +			}
> +		}
> +	}
> +
> +out:
> +	if (s)
> +		hlist_del_init(&s->est.list);
> +	*chain_max = max;
> +	return ret;
> +
> +stop:
> +	ret = 0;
> +	goto out;
> +}
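The selection logic of the quoted loop can be checked against the logs in this mail with a small userspace model. This is a sketch, not the kernel code: model_chain_max() is a hypothetical name, the measured diffs are passed in as an array instead of being timed, and min_est is kept in 64 bits for simplicity. The model divides each diff by loops only, as the quoted code does.

```c
#include <stdint.h>

#define NSEC_PER_USEC	1000LL
#define NSEC_PER_SEC	1000000000LL

/* Hypothetical userspace model: given the diff (in ns) measured by each
 * test, keep the smallest diff / loops value and derive chain_max from
 * the goal of 95 usec per chain. Returns chain_max and optionally
 * stores the minimum in *min_est_out.
 */
static int model_chain_max(const int64_t *diff, int ntest,
			   int64_t *min_est_out)
{
	int64_t min_est = 0, val;
	int loops = 1, max = 8, i;

	for (i = 0; i < ntest; i++) {
		if (diff[i] <= 1 * NSEC_PER_USEC) {
			/* Do more loops on low resolution */
			loops *= 2;
			continue;
		}
		if (diff[i] >= NSEC_PER_SEC)
			continue;
		val = diff[i] / loops;
		if (!min_est || val < min_est) {
			min_est = val;
			/* goal: 95usec per chain */
			val = 95 * NSEC_PER_USEC;
			max = val >= min_est ? (int)(val / min_est) : 1;
		}
	}
	if (min_est_out)
		*min_est_out = min_est;
	return max;
}
```

Feeding in the "single est" values from the logs above reproduces the logged chain_max values: 7319 ns gives 12, 2834 ns gives 33, 2444 ns gives 38, 2418 ns gives 39 and 2627 ns gives 36. A later, slower test (such as diff=8136 in the second run with ntest=3000) does not change the result because only the minimum is kept.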
--
Jiri Wiesner
SUSE Labs