All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hans Schillstrom <hans@schillstrom.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: horms@verge.net.au, daniel.lezcano@free.fr, wensong@linux-vs.org,
	lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org,
	Hans Schillstrom <hans.schillstrom@ericsson.com>
Subject: Re: [*v3 PATCH 00/22] IPVS Network Name Space aware
Date: Sun, 2 Jan 2011 17:27:42 +0100	[thread overview]
Message-ID: <201101021727.42759.hans@schillstrom.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1101011421410.2559@ja.ssi.bg>

On Saturday, January 01, 2011 13:27:00 Julian Anastasov wrote:
> 
>  	Hello,
> 
> On Thu, 30 Dec 2010, hans@schillstrom.com wrote:
> 
> > From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> >
> > This patch series adds network name space support to the LVS.
> >
> > REVISION
> >
> > This is version 3
> >
> > OVERVIEW
> >
> > The patch doesn't remove or add any functionality except for netns.
> > For users that don't use network name space (netns) this patch is
> > completely transparent.
> >
> > Now it's possible to run LVS in a Linux container (see lxc-tools)
> > i.e.  a light weight visualization. For example it's possible to run
> > one or several lvs on a real server in their own network name spaces.
> >> From the LVS point of view it looks like it runs on it's own machine.
> 
>  	Only some comments for two of the patches:
> 
> v3 PATCH 10/22 use ip_vs_proto_data as param
> 
>  	- Can ip_vs_protocol_timeout_change() walk
>  	proto_data_table instead of ip_vs_proto_table to avoid the
>  	__ipvs_proto_data_get call?
> 

I just forget to do that ...

> v3 PATCH 15/22 ip_vs_stats
> 
>  	- Is ustats_seq allocated with alloc_percpu?
> 
>  	Such reader sections should be changed to use tmp vars because
>  	on retry we risk to add the values multiple times. For example:
> 
>  		do {
>  			start = read_seqcount_begin(seq_count);
>  			ipvs->ctl_stats->ustats.inbytes += u->inbytes;
>  			ipvs->ctl_stats->ustats.outbytes += u->outbytes;
>  		} while (read_seqcount_retry(seq_count, start));
> 
>  	should be changed as follows:
> 
>  		u64 inbytes, outbytes;
> 
>  		do {
>  			start = read_seqcount_begin(seq_count);
>  			inbytes = u->inbytes;
>  			outbytes = u->outbytes;
>  		} while (read_seqcount_retry(seq_count, start));
>  		ipvs->ctl_stats->ustats.inbytes += inbytes;
>  		ipvs->ctl_stats->ustats.outbytes += outbytes;
> 

Ooops, that was a bug :-(

>  	Or it is better to create new struct for percpu stats,
>  	they will have their own syncp, because we can not
>  	change struct ip_vs_stats_user. syncp should be percpu
>  	because we remove locks.
> 
>  	For example:
> 
>  	struct ip_vs_cpu_stats {
>  		struct ip_vs_stats_user	ustats;
>  		struct u64_stats_sync	syncp;
>  	};
> 
>  	Then we can add this in struct netns_ipvs:
> 
>  	struct ip_vs_cpu_stats __percpu *stats;	/* Statistics */
> 
>  	without the seqcount_t * ustats_seq;
> 
>  	Then syncp does not need any initialization, it seems
>  	alloc_percpu returns zeroed area.
> 
>  	When we use percpu stats for all places (dest and svc) we
>  	can create new struct struct ip_vs_counters, so that we
>  	can reduce the memory usage from percpu data. Now stats
>  	include counters and estimated values. The estimated
>  	values should not be percpu. Then ip_vs_cpu_stats
>  	will be shorter (it is not visible to user space):
> 
>  	struct ip_vs_cpu_stats {
>  		struct ip_vs_counters	ustats;
>  		struct u64_stats_sync	syncp;
>  	};
> 
>  	For writer side in softirq context we should protect the whole
>  	section with u64 counters, for example this code:
> 

There is only two u64,  inbytes and outbytes

>  	spin_lock(&ip_vs_stats.lock);
>  	ip_vs_stats.ustats.outpkts++;
>  	ip_vs_stats.ustats.outbytes += skb->len;
>  	spin_unlock(&ip_vs_stats.lock);
> 
>  	should be changed to:
> 
>  	struct ip_vs_cpu_stats *s = this_cpu_ptr(ipvs->stats);
> 
>  	u64_stats_update_begin(&s->syncp);
>  	s->ustats.outpkts++;
>  	s->ustats.outbytes += skb->len;
>  	u64_stats_update_end(&s->syncp);
> 
>  	Readers should look in similar way, only that _bh fetching
>  	should be used only for the u64 counters because writer is in
>  	softirq context:
> 
>  	u64 inbytes, outbytes;
> 
>  	for_each_possible_cpu(i) {
>  		const struct ip_vs_cpu_stats *s = per_cpu_ptr(ipvs->stats, i);
>  		unsigned int start;
> 
>  		...
>  		do {
>  			start = u64_stats_fetch_begin_bh(&s->syncp);
>  			inbytes = s->ustats.inbytes;
>  			outbytes = s->ustats.outbytes;
>  		} while (u64_stats_fetch_retry_bh(&s->syncp, start));
>  		ipvs->ctl_stats->ustats.inbytes += inbytes;
>  		ipvs->ctl_stats->ustats.outbytes += outbytes;
>  	}
> 
>  	Then IPVS_STAT_ADD and IPVS_STAT_INC can not access
>  	the syncp. They do not look generic enough, in case we
>  	decide to use struct ip_vs_cpu_stats for dest and svc stats.
>  	I think, these macros are not needed.
>  	It is possible to have other macros for write side,
>  	for percpu stats:
> 
>  	#define IP_VS_STATS_UPDATE_BEGIN(stats)		\
>  		{ struct ip_vs_cpu_stats *s = this_cpu_ptr(stats); \
>  		u64_stats_update_begin(&s->syncp)
> 
>  	#define IP_VS_STATS_UPDATE_END()		\
>  		u64_stats_update_end(&s->syncp);	\
>  		}
> 
>  	Then usage can be:
> 
>  	IP_VS_STATS_UPDATE_BEGIN(ipvs->stats);
>  	s->ustats.outpkts++;
>  	s->ustats.outbytes += skb->len;
>  	IP_VS_STATS_UPDATE_END();
> 
>  	Then we can rename ctl_stats to total_stats or full_stats.
>  	get_stats() can be changed to ip_vs_read_cpu_stats(), it
>  	will be used to read all percpu stats as sum in the
>  	total_stats/full_stats structure or tmp space:
> 
>  	/* Get sum of percpu stats */
>  	... ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum,
>  		struct ip_vs_cpu_stats *stats)
>  	{
>  		...
>  		for_each_possible_cpu(i) {
>  			const struct ip_vs_cpu_stats *s = per_cpu_ptr(stats, i);
>  			unsigned int start;
> 
>  			if (i) {...
>  			...
>  			do {
>  				start = u64_stats_fetch_begin_bh(&s->syncp);
>  				inbytes = s->ustats.inbytes;
>  				outbytes = s->ustats.outbytes;
>  			} while (u64_stats_fetch_retry_bh(&s->syncp, start));
>  			sum->inbytes += inbytes;
>  			sum->outbytes += outbytes;
>  		}
>  	}
> 
>  	As result, ip_vs_read_cpu_stats() will be used in
>  	estimation_timer() instead of get_stats():
> 
>  	ip_vs_read_cpu_stats(&ipvs->total_stats->ustats, ipvs->stats);
> 
>  	but also in ip_vs_stats_show()
>  	{
>  		// I hope struct ip_vs_stats_user can fit in stack:
>  		struct ip_vs_stats_user sum;
> 
>  		here we do not need to use spin_lock_bh(&ctl_stats->lock);
>  		because now the stats are lockless, estimation_timer
>  		does not use ctl_stats->lock, so we have to call
>  		ip_vs_read_cpu_stats to avoid problems with
>  		reading incorrect u64 values directly from
>  		ipvs->total_stats->ustats.
> 
>  		ip_vs_read_cpu_stats(&sum, ipvs->stats);
>  		seq_printf for sum
>  	}
> 
>  	ip_vs_read_cpu_stats can be used also in ip_vs_copy_stats()
>  	which copies values to user space and needs proper u64 reading.
>  	But it is used only for svc and dest stats which are not
>  	percpu yet.
> 
>  	Now this code does not look ok:
> 
>  	ip_vs_zero_stats(net_ipvs(net)->ctl_stats);
> 
>  	May be we need new func ip_vs_zero_percpu_stats that will
>  	reset stats for all CPUs, i.e. ipvs->stats in our case?
>  	Even if this operation is not very safe if done from
>  	user space while the softirq can modify u64 counters
>  	on another CPU.

OK, I will take a look at this when I'm back from my vacation at 20 Jan.
I guess it might be worth the work to make all the stat counters per_cpu.

My tests shows that the big "CPU cycle bandit" was the one that I made per_cpu.

> 
> 
> Happy New Year!
> 

      reply	other threads:[~2011-01-02 16:27 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-30 10:50 [*v3 PATCH 00/22] IPVS Network Name Space aware hans
2010-12-30 10:50 ` hans
2010-12-30 10:50 ` [*v3 PATCH 01/22] IPVS: netns, add basic init per netns hans
2010-12-30 23:58   ` Jan Engelhardt
2010-12-31 15:36     ` Hans Schillstrom
2010-12-30 10:50 ` [*v3 PATCH 02/22] IPVS: netns to services part 1 hans
2011-01-01 14:57   ` Jan Engelhardt
2011-01-03  9:14     ` Hans Schillstrom
2010-12-30 10:50 ` [*v3 PATCH 03/22] IPVS: netns awarness to lblcr sheduler hans
2010-12-30 10:50 ` [*v3 PATCH 04/22] IPVS: netns awarness to lblc sheduler hans
2010-12-30 10:50 ` [*v3 PATCH 05/22] IPVS: netns, prepare protocol hans
2010-12-30 10:50 ` [*v3 PATCH 06/22] IPVS: netns preparation for proto_tcp hans
2010-12-30 10:50 ` [*v3 PATCH 07/22] IPVS: netns preparation for proto_udp hans
2010-12-30 10:50 ` [*v3 PATCH 08/22] IPVS: netns preparation for proto_sctp hans
2010-12-30 10:50 ` [*v3 PATCH 09/22] IPVS: netns preparation for proto_ah_esp hans
2010-12-30 10:50 ` [*v3 PATCH 10/22] IPVS: netns, use ip_vs_proto_data as param hans
2010-12-30 10:50 ` [*v3 PATCH 11/22] IPVS: netns, common protocol changes and use of appcnt hans
2010-12-30 10:50 ` [*v3 PATCH 12/22] IPVS: netns awareness to ip_vs_app hans
2010-12-30 10:50 ` [*v3 PATCH 13/22] IPVS: netns awareness to ip_vs_est hans
2010-12-30 10:50 ` [*v3 PATCH 14/22] IPVS: netns awareness to ip_vs_sync hans
2010-12-31  0:44   ` Simon Horman
2011-01-02  9:47     ` Hans Schillstrom
2010-12-30 10:50 ` [*v3 PATCH 15/22] IPVS: netns, ip_vs_stats and its procfs hans
2010-12-30 10:51 ` [*v3 PATCH 16/22] IPVS: netns, connection hash got net as param hans
2010-12-30 10:51 ` [*v3 PATCH 17/22] IPVS: netns, ip_vs_ctl local vars moved to ipvs struct hans
2010-12-30 10:51 ` [*v3 PATCH 18/22] IPVS: netns, defense work timer hans
2010-12-30 10:51 ` [*v3 PATCH 19/22] IPVS: netns, trash handling hans
2010-12-30 10:51 ` [*v3 PATCH 20/22] IPVS: netns, svc counters moved in ip_vs_ctl,c hans
2010-12-30 10:51 ` [*v3 PATCH 21/22] IPVS: netns, misc init_net removal in core hans
2010-12-30 10:51 ` [*v3 PATCH 22/22] IPVS: netns, final patch enabling network name space hans
2011-01-01 12:27 ` [*v3 PATCH 00/22] IPVS Network Name Space aware Julian Anastasov
2011-01-02 16:27   ` Hans Schillstrom [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201101021727.42759.hans@schillstrom.com \
    --to=hans@schillstrom.com \
    --cc=daniel.lezcano@free.fr \
    --cc=hans.schillstrom@ericsson.com \
    --cc=horms@verge.net.au \
    --cc=ja@ssi.bg \
    --cc=lvs-devel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=wensong@linux-vs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.