Re: [PATCH 3/3] IPVS: init and cleanup restructuring.

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Hans Schillstrom <hans@schillstrom.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: horms@verge.net.au, ebiederm@xmission.com,
	lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, hans.schillstrom@ericsson.com
Subject: Re: [PATCH 3/3] IPVS: init and cleanup restructuring.
Date: Wed, 20 Apr 2011 12:41:49 +0200	[thread overview]
Message-ID: <201104201241.50176.hans@schillstrom.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1104200207340.1724@ja.ssi.bg>

On Wednesday, April 20, 2011 01:19:38 Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Tue, 19 Apr 2011, Hans Schillstrom wrote:
> 
> > This patch tries to restore the initial init and cleanup
> > sequences that was before name space patch.
> > 
> > The number of calls to register_pernet_device have been
> > reduced to one for the ip_vs.ko
> > Schedulers still have their own calls.
> > 
> > This patch adds a function __ip_vs_service_cleanup()
> > and a throttle or actually on/off switch for
> > the netfilter hooks.
> > 
> > The nf hooks will be enabled when the first service is loaded
> > and disabled when the last service is removed or when a
> > name space exit starts.
> 
> 	For me using _net suffix is more clear compared
> to __ prefix for the pernet methods.
> 
Sure I'll do that.

> 	For ip_vs_in: may be we can move the check here:
> 
> +	net = skb_net(skb);
> +	if (net->ipvs->throttle)
> +		return NF_ACCEPT;
>  	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
>  
>  	/* Bad... Do not break raw sockets */

Done

> 
> 	It will save such checks later in ICMP funcs. But this
> throttle idea looks dangerous for cleanup. It does not use RCU.
> The readers can cache the 0 in throttle for long time.
> May be by using register_pernet_device we are in list with other
> devices and it is still possible some device used by
> our dst_cache to be unregistered before IPVS or we to be
> unregistered before such device and some race with throttle
> to happen. throttle is good when enabling traffic with
> the first virtual service, later it can slowly stop the traffic
> but we can not rely on it during netns cleanup.

OK , I will revert back to register_pernet_subsys(),
and use one single register_pernet_device()  exit hook for :
 - the throttle to disable traffic
 - stop the kernel threads.

> 
> 	So, there are 2 problems with the devices:
> 
> - if we use _device pernet registration we can see packets
> in our netns during cleanup. I assume this is possible
> when IPVS is unregistered before such devices.
> 
> - dests can cache dst and to hold the device after it is
> unregistered in netns, obviously for very short time until
> IPVS is later unregistered from netns. And for long time
> if device is unregistered but netns remains.
> 
> 	Also, in most of the cases svc->refcnt is above 0 because
> dests can be in trash list. You should be lucky to delete the
> service without any connections, only then ip_vs_svc_unhash can
> see refcnt == 0.
> 
> 	So, may be we have to use register_pernet_subsys (not
> _device). We need just to register notifier with
> register_netdevice_notifier and to catch NETDEV_UNREGISTER,
> so that if any dest uses this device we have to release the dst:
> 
I made a quick test of the concept and it seems to work, 
(A mix of new and old connections at 1GB/s into 4 namespaces when killing them all)

> 	- lock mutex
> 	- for every dest (also in trash):
> 	spin_lock_bh(&dest->dst_lock);
> 	if (dest->dst_cache && dest->dst_cache->dev == unregistered_dev)
> 		ip_vs_dst_reset(dest);
> 	spin_unlock_bh(&dest->dst_lock);


Here is a what i did

static inline void
__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
{
	spin_lock_bh(&dest->dst_lock);
	if (dest->dst_cache && dest->dst_cache->dev == dev)
		ip_vs_dst_reset(dest);
	spin_unlock_bh(&dest->dst_lock);
}

static void ip_vs_svc_reset(struct ip_vs_service *svc, struct net_device *dev)
{
	struct ip_vs_dest *dest;

	list_for_each_entry(dest, &svc->destinations, n_list) {
		__ip_vs_dev_reset(dest, dev);
	}
}

static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
			    void *ptr)
{
	struct net_device *dev = ptr;
	struct net *net = dev_net(dev);
	struct ip_vs_service *svc;
	struct ip_vs_dest *dest;
	unsigned int idx;

	if (event != NETDEV_UNREGISTER)
		return NOTIFY_DONE;

	EnterFunction(2);
	mutex_lock(&__ip_vs_mutex);
	for (idx = 0; idx<IP_VS_SVC_TAB_SIZE; idx++) {
		write_lock_bh(&__ip_vs_svc_lock);
		list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
			if (net_eq(svc->net, net))
				ip_vs_svc_reset(svc, dev);
		}
		list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
			if (net_eq(svc->net, net))
				ip_vs_svc_reset(svc, dev);
		}
		write_unlock_bh(&__ip_vs_svc_lock);
	}

	list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
		__ip_vs_dev_reset(dest, dev);
	}
	mutex_unlock(&__ip_vs_mutex);
	LeaveFunction(2);
	return NOTIFY_DONE;
}

static struct notifier_block ip_vs_dst_notifier = {
	.notifier_call = ip_vs_dst_event,
};


int __init ip_vs_control_init(void)
...
	at the end
	ret = register_netdevice_notifier(&ip_vs_dst_notifier);
...

and an unregister_ of course.


> 
> 	There are many examples for this, eg. net/core/fib_rules.c
> Then we are sure on cleanup we can not see traffic for our net
> because all devices are unregistered before us. We don't have
> to rely on throttle to stop the traffic during cleanup. And
> we do not hold devices after NETDEV_UNREGISTER.
> 
> 	I can prepare such patch but in next days. We need such
> code anyways because the dests can hold such dsts when no
> traffic is present and we can see again this "waiting for %s" ...
> message.
> 
> 	throttle still can be used but now it can not stop
> the traffic if connections exist.
> 
> 	For __ip_vs_service_cleanup: it still has to use mutex.

OK I will try to use unlock methods, if it doesn't work use the mutex.

> Or we can avoid it by introducing ip_vs_unlink_service_nolock:
> ip_vs_flush will look like your __ip_vs_service_cleanup and
> will call ip_vs_unlink_service_nolock. ip_vs_unlink_service_nolock
> will be called by ip_vs_flush and by ip_vs_unlink_service.
> You can try such changes, if not, I'll prepare some patches
> after 2-3 days.
> 

Regards
Hans

next prev parent reply	other threads:[~2011-04-20 10:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-19 15:25 [PATCH 1/3] IPVS: Change of socket usage to enable name space exit Hans Schillstrom
2011-04-19 15:25 ` [PATCH 2/3] IPVS: Change of register_pernet_subsys to register_pernet_device Hans Schillstrom
2011-04-20  9:46   ` Eric W. Biederman
2011-04-19 15:25 ` [PATCH 3/3] IPVS: init and cleanup restructuring Hans Schillstrom
2011-04-19 23:12   ` Simon Horman
2011-04-20 12:00     ` Hans Schillstrom
2011-04-19 23:19   ` Julian Anastasov
2011-04-20  9:56     ` Hans Schillstrom
2011-04-20 10:41     ` Hans Schillstrom [this message]
2011-04-19 22:11 ` [PATCH 1/3] IPVS: Change of socket usage to enable name space exit Simon Horman
2011-04-20 10:00   ` Eric W. Biederman
2011-04-20  9:40 ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201104201241.50176.hans@schillstrom.com \
    --to=hans@schillstrom.com \
    --cc=ebiederm@xmission.com \
    --cc=hans.schillstrom@ericsson.com \
    --cc=horms@verge.net.au \
    --cc=ja@ssi.bg \
    --cc=lvs-devel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).