netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hans Schillstrom <hans@schillstrom.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: horms@verge.net.au, daniel.lezcano@free.fr, wensong@linux-vs.org,
	lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org,
	Hans Schillstrom <hans.schillstrom@ericsson.com>
Subject: Re: [*v3 PATCH 00/22] IPVS Network Name Space aware
Date: Sun, 2 Jan 2011 17:27:42 +0100	[thread overview]
Message-ID: <201101021727.42759.hans@schillstrom.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1101011421410.2559@ja.ssi.bg>

On Saturday, January 01, 2011 13:27:00 Julian Anastasov wrote:
> 
>  	Hello,
> 
> On Thu, 30 Dec 2010, hans@schillstrom.com wrote:
> 
> > From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> >
> > This patch series adds network name space support to the LVS.
> >
> > REVISION
> >
> > This is version 3
> >
> > OVERVIEW
> >
> > The patch doesn't remove or add any functionality except for netns.
> > For users that don't use network name space (netns) this patch is
> > completely transparent.
> >
> > Now it's possible to run LVS in a Linux container (see lxc-tools)
> > i.e.  a light weight visualization. For example it's possible to run
> > one or several lvs on a real server in their own network name spaces.
> >> From the LVS point of view it looks like it runs on it's own machine.
> 
>  	Only some comments for two of the patches:
> 
> v3 PATCH 10/22 use ip_vs_proto_data as param
> 
>  	- Can ip_vs_protocol_timeout_change() walk
>  	proto_data_table instead of ip_vs_proto_table to avoid the
>  	__ipvs_proto_data_get call?
> 

I just forget to do that ...

> v3 PATCH 15/22 ip_vs_stats
> 
>  	- Is ustats_seq allocated with alloc_percpu?
> 
>  	Such reader sections should be changed to use tmp vars because
>  	on retry we risk to add the values multiple times. For example:
> 
>  		do {
>  			start = read_seqcount_begin(seq_count);
>  			ipvs->ctl_stats->ustats.inbytes += u->inbytes;
>  			ipvs->ctl_stats->ustats.outbytes += u->outbytes;
>  		} while (read_seqcount_retry(seq_count, start));
> 
>  	should be changed as follows:
> 
>  		u64 inbytes, outbytes;
> 
>  		do {
>  			start = read_seqcount_begin(seq_count);
>  			inbytes = u->inbytes;
>  			outbytes = u->outbytes;
>  		} while (read_seqcount_retry(seq_count, start));
>  		ipvs->ctl_stats->ustats.inbytes += inbytes;
>  		ipvs->ctl_stats->ustats.outbytes += outbytes;
> 

Ooops, that was a bug :-(

>  	Or it is better to create new struct for percpu stats,
>  	they will have their own syncp, because we can not
>  	change struct ip_vs_stats_user. syncp should be percpu
>  	because we remove locks.
> 
>  	For example:
> 
>  	struct ip_vs_cpu_stats {
>  		struct ip_vs_stats_user	ustats;
>  		struct u64_stats_sync	syncp;
>  	};
> 
>  	Then we can add this in struct netns_ipvs:
> 
>  	struct ip_vs_cpu_stats __percpu *stats;	/* Statistics */
> 
>  	without the seqcount_t * ustats_seq;
> 
>  	Then syncp does not need any initialization, it seems
>  	alloc_percpu returns zeroed area.
> 
>  	When we use percpu stats for all places (dest and svc) we
>  	can create new struct struct ip_vs_counters, so that we
>  	can reduce the memory usage from percpu data. Now stats
>  	include counters and estimated values. The estimated
>  	values should not be percpu. Then ip_vs_cpu_stats
>  	will be shorter (it is not visible to user space):
> 
>  	struct ip_vs_cpu_stats {
>  		struct ip_vs_counters	ustats;
>  		struct u64_stats_sync	syncp;
>  	};
> 
>  	For writer side in softirq context we should protect the whole
>  	section with u64 counters, for example this code:
> 

There is only two u64,  inbytes and outbytes

>  	spin_lock(&ip_vs_stats.lock);
>  	ip_vs_stats.ustats.outpkts++;
>  	ip_vs_stats.ustats.outbytes += skb->len;
>  	spin_unlock(&ip_vs_stats.lock);
> 
>  	should be changed to:
> 
>  	struct ip_vs_cpu_stats *s = this_cpu_ptr(ipvs->stats);
> 
>  	u64_stats_update_begin(&s->syncp);
>  	s->ustats.outpkts++;
>  	s->ustats.outbytes += skb->len;
>  	u64_stats_update_end(&s->syncp);
> 
>  	Readers should look in similar way, only that _bh fetching
>  	should be used only for the u64 counters because writer is in
>  	softirq context:
> 
>  	u64 inbytes, outbytes;
> 
>  	for_each_possible_cpu(i) {
>  		const struct ip_vs_cpu_stats *s = per_cpu_ptr(ipvs->stats, i);
>  		unsigned int start;
> 
>  		...
>  		do {
>  			start = u64_stats_fetch_begin_bh(&s->syncp);
>  			inbytes = s->ustats.inbytes;
>  			outbytes = s->ustats.outbytes;
>  		} while (u64_stats_fetch_retry_bh(&s->syncp, start));
>  		ipvs->ctl_stats->ustats.inbytes += inbytes;
>  		ipvs->ctl_stats->ustats.outbytes += outbytes;
>  	}
> 
>  	Then IPVS_STAT_ADD and IPVS_STAT_INC can not access
>  	the syncp. They do not look generic enough, in case we
>  	decide to use struct ip_vs_cpu_stats for dest and svc stats.
>  	I think, these macros are not needed.
>  	It is possible to have other macros for write side,
>  	for percpu stats:
> 
>  	#define IP_VS_STATS_UPDATE_BEGIN(stats)		\
>  		{ struct ip_vs_cpu_stats *s = this_cpu_ptr(stats); \
>  		u64_stats_update_begin(&s->syncp)
> 
>  	#define IP_VS_STATS_UPDATE_END()		\
>  		u64_stats_update_end(&s->syncp);	\
>  		}
> 
>  	Then usage can be:
> 
>  	IP_VS_STATS_UPDATE_BEGIN(ipvs->stats);
>  	s->ustats.outpkts++;
>  	s->ustats.outbytes += skb->len;
>  	IP_VS_STATS_UPDATE_END();
> 
>  	Then we can rename ctl_stats to total_stats or full_stats.
>  	get_stats() can be changed to ip_vs_read_cpu_stats(), it
>  	will be used to read all percpu stats as sum in the
>  	total_stats/full_stats structure or tmp space:
> 
>  	/* Get sum of percpu stats */
>  	... ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum,
>  		struct ip_vs_cpu_stats *stats)
>  	{
>  		...
>  		for_each_possible_cpu(i) {
>  			const struct ip_vs_cpu_stats *s = per_cpu_ptr(stats, i);
>  			unsigned int start;
> 
>  			if (i) {...
>  			...
>  			do {
>  				start = u64_stats_fetch_begin_bh(&s->syncp);
>  				inbytes = s->ustats.inbytes;
>  				outbytes = s->ustats.outbytes;
>  			} while (u64_stats_fetch_retry_bh(&s->syncp, start));
>  			sum->inbytes += inbytes;
>  			sum->outbytes += outbytes;
>  		}
>  	}
> 
>  	As result, ip_vs_read_cpu_stats() will be used in
>  	estimation_timer() instead of get_stats():
> 
>  	ip_vs_read_cpu_stats(&ipvs->total_stats->ustats, ipvs->stats);
> 
>  	but also in ip_vs_stats_show()
>  	{
>  		// I hope struct ip_vs_stats_user can fit in stack:
>  		struct ip_vs_stats_user sum;
> 
>  		here we do not need to use spin_lock_bh(&ctl_stats->lock);
>  		because now the stats are lockless, estimation_timer
>  		does not use ctl_stats->lock, so we have to call
>  		ip_vs_read_cpu_stats to avoid problems with
>  		reading incorrect u64 values directly from
>  		ipvs->total_stats->ustats.
> 
>  		ip_vs_read_cpu_stats(&sum, ipvs->stats);
>  		seq_printf for sum
>  	}
> 
>  	ip_vs_read_cpu_stats can be used also in ip_vs_copy_stats()
>  	which copies values to user space and needs proper u64 reading.
>  	But it is used only for svc and dest stats which are not
>  	percpu yet.
> 
>  	Now this code does not look ok:
> 
>  	ip_vs_zero_stats(net_ipvs(net)->ctl_stats);
> 
>  	May be we need new func ip_vs_zero_percpu_stats that will
>  	reset stats for all CPUs, i.e. ipvs->stats in our case?
>  	Even if this operation is not very safe if done from
>  	user space while the softirq can modify u64 counters
>  	on another CPU.

OK, I will take a look at this when I'm back from my vacation at 20 Jan.
I guess it might be worth the work to make all the stat counters per_cpu.

My tests shows that the big "CPU cycle bandit" was the one that I made per_cpu.

> 
> 
> Happy New Year!
> 

      reply	other threads:[~2011-01-02 16:27 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-30 10:50 [*v3 PATCH 00/22] IPVS Network Name Space aware hans
2010-12-30 10:50 ` [*v3 PATCH 01/22] IPVS: netns, add basic init per netns hans
2010-12-30 23:58   ` Jan Engelhardt
2010-12-31 15:36     ` Hans Schillstrom
2010-12-30 10:50 ` [*v3 PATCH 02/22] IPVS: netns to services part 1 hans
2011-01-01 14:57   ` Jan Engelhardt
2011-01-03  9:14     ` Hans Schillstrom
2010-12-30 10:50 ` [*v3 PATCH 03/22] IPVS: netns awarness to lblcr sheduler hans
2010-12-30 10:50 ` [*v3 PATCH 04/22] IPVS: netns awarness to lblc sheduler hans
2010-12-30 10:50 ` [*v3 PATCH 05/22] IPVS: netns, prepare protocol hans
2010-12-30 10:50 ` [*v3 PATCH 06/22] IPVS: netns preparation for proto_tcp hans
2010-12-30 10:50 ` [*v3 PATCH 07/22] IPVS: netns preparation for proto_udp hans
2010-12-30 10:50 ` [*v3 PATCH 08/22] IPVS: netns preparation for proto_sctp hans
2010-12-30 10:50 ` [*v3 PATCH 09/22] IPVS: netns preparation for proto_ah_esp hans
2010-12-30 10:50 ` [*v3 PATCH 10/22] IPVS: netns, use ip_vs_proto_data as param hans
2010-12-30 10:50 ` [*v3 PATCH 11/22] IPVS: netns, common protocol changes and use of appcnt hans
2010-12-30 10:50 ` [*v3 PATCH 12/22] IPVS: netns awareness to ip_vs_app hans
2010-12-30 10:50 ` [*v3 PATCH 13/22] IPVS: netns awareness to ip_vs_est hans
2010-12-30 10:50 ` [*v3 PATCH 14/22] IPVS: netns awareness to ip_vs_sync hans
2010-12-31  0:44   ` Simon Horman
2011-01-02  9:47     ` Hans Schillstrom
2010-12-30 10:50 ` [*v3 PATCH 15/22] IPVS: netns, ip_vs_stats and its procfs hans
2010-12-30 10:51 ` [*v3 PATCH 16/22] IPVS: netns, connection hash got net as param hans
2010-12-30 10:51 ` [*v3 PATCH 17/22] IPVS: netns, ip_vs_ctl local vars moved to ipvs struct hans
2010-12-30 10:51 ` [*v3 PATCH 18/22] IPVS: netns, defense work timer hans
2010-12-30 10:51 ` [*v3 PATCH 19/22] IPVS: netns, trash handling hans
2010-12-30 10:51 ` [*v3 PATCH 20/22] IPVS: netns, svc counters moved in ip_vs_ctl,c hans
2010-12-30 10:51 ` [*v3 PATCH 21/22] IPVS: netns, misc init_net removal in core hans
2010-12-30 10:51 ` [*v3 PATCH 22/22] IPVS: netns, final patch enabling network name space hans
2011-01-01 12:27 ` [*v3 PATCH 00/22] IPVS Network Name Space aware Julian Anastasov
2011-01-02 16:27   ` Hans Schillstrom [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201101021727.42759.hans@schillstrom.com \
    --to=hans@schillstrom.com \
    --cc=daniel.lezcano@free.fr \
    --cc=hans.schillstrom@ericsson.com \
    --cc=horms@verge.net.au \
    --cc=ja@ssi.bg \
    --cc=lvs-devel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=wensong@linux-vs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).