From: Hans Schillstrom <hans@schillstrom.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: horms@verge.net.au, ebiederm@xmission.com,
lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
netfilter-devel@vger.kernel.org, hans.schillstrom@ericsson.com
Subject: Re: [PATCH 3/3] IPVS: init and cleanup restructuring.
Date: Wed, 20 Apr 2011 12:41:49 +0200 [thread overview]
Message-ID: <201104201241.50176.hans@schillstrom.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1104200207340.1724@ja.ssi.bg>
On Wednesday, April 20, 2011 01:19:38 Julian Anastasov wrote:
>
> Hello,
>
> On Tue, 19 Apr 2011, Hans Schillstrom wrote:
>
> > This patch tries to restore the initial init and cleanup
> > sequences that was before name space patch.
> >
> > The number of calls to register_pernet_device have been
> > reduced to one for the ip_vs.ko
> > Schedulers still have their own calls.
> >
> > This patch adds a function __ip_vs_service_cleanup()
> > and a throttle or actually on/off switch for
> > the netfilter hooks.
> >
> > The nf hooks will be enabled when the first service is loaded
> > and disabled when the last service is removed or when a
> > name space exit starts.
>
> For me using _net suffix is more clear compared
> to __ prefix for the pernet methods.
>
Sure I'll do that.
> For ip_vs_in: may be we can move the check here:
>
> + net = skb_net(skb);
> + if (net->ipvs->throttle)
> + return NF_ACCEPT;
> ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
>
> /* Bad... Do not break raw sockets */
Done
>
> It will save such checks later in ICMP funcs. But this
> throttle idea looks dangerous for cleanup. It does not use RCU.
> The readers can cache the 0 in throttle for long time.
> May be by using register_pernet_device we are in list with other
> devices and it is still possible some device used by
> our dst_cache to be unregistered before IPVS or we to be
> unregistered before such device and some race with throttle
> to happen. throttle is good when enabling traffic with
> the first virtual service, later it can slowly stop the traffic
> but we can not rely on it during netns cleanup.
OK , I will revert back to register_pernet_subsys(),
and use one single register_pernet_device() exit hook for :
- the throttle to disable traffic
- stop the kernel threads.
>
> So, there are 2 problems with the devices:
>
> - if we use _device pernet registration we can see packets
> in our netns during cleanup. I assume this is possible
> when IPVS is unregistered before such devices.
>
> - dests can cache dst and to hold the device after it is
> unregistered in netns, obviously for very short time until
> IPVS is later unregistered from netns. And for long time
> if device is unregistered but netns remains.
>
> Also, in most of the cases svc->refcnt is above 0 because
> dests can be in trash list. You should be lucky to delete the
> service without any connections, only then ip_vs_svc_unhash can
> see refcnt == 0.
>
> So, may be we have to use register_pernet_subsys (not
> _device). We need just to register notifier with
> register_netdevice_notifier and to catch NETDEV_UNREGISTER,
> so that if any dest uses this device we have to release the dst:
>
I made a quick test of the concept and it seems to work,
(A mix of new and old connections at 1GB/s into 4 namespaces when killing them all)
> - lock mutex
> - for every dest (also in trash):
> spin_lock_bh(&dest->dst_lock);
> if (dest->dst_cache && dest->dst_cache->dev == unregistered_dev)
> ip_vs_dst_reset(dest);
> spin_unlock_bh(&dest->dst_lock);
Here is a what i did
static inline void
__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
{
spin_lock_bh(&dest->dst_lock);
if (dest->dst_cache && dest->dst_cache->dev == dev)
ip_vs_dst_reset(dest);
spin_unlock_bh(&dest->dst_lock);
}
static void ip_vs_svc_reset(struct ip_vs_service *svc, struct net_device *dev)
{
struct ip_vs_dest *dest;
list_for_each_entry(dest, &svc->destinations, n_list) {
__ip_vs_dev_reset(dest, dev);
}
}
static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
void *ptr)
{
struct net_device *dev = ptr;
struct net *net = dev_net(dev);
struct ip_vs_service *svc;
struct ip_vs_dest *dest;
unsigned int idx;
if (event != NETDEV_UNREGISTER)
return NOTIFY_DONE;
EnterFunction(2);
mutex_lock(&__ip_vs_mutex);
for (idx = 0; idx<IP_VS_SVC_TAB_SIZE; idx++) {
write_lock_bh(&__ip_vs_svc_lock);
list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
if (net_eq(svc->net, net))
ip_vs_svc_reset(svc, dev);
}
list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
if (net_eq(svc->net, net))
ip_vs_svc_reset(svc, dev);
}
write_unlock_bh(&__ip_vs_svc_lock);
}
list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
__ip_vs_dev_reset(dest, dev);
}
mutex_unlock(&__ip_vs_mutex);
LeaveFunction(2);
return NOTIFY_DONE;
}
static struct notifier_block ip_vs_dst_notifier = {
.notifier_call = ip_vs_dst_event,
};
int __init ip_vs_control_init(void)
...
at the end
ret = register_netdevice_notifier(&ip_vs_dst_notifier);
...
and an unregister_ of course.
>
> There are many examples for this, eg. net/core/fib_rules.c
> Then we are sure on cleanup we can not see traffic for our net
> because all devices are unregistered before us. We don't have
> to rely on throttle to stop the traffic during cleanup. And
> we do not hold devices after NETDEV_UNREGISTER.
>
> I can prepare such patch but in next days. We need such
> code anyways because the dests can hold such dsts when no
> traffic is present and we can see again this "waiting for %s" ...
> message.
>
> throttle still can be used but now it can not stop
> the traffic if connections exist.
>
> For __ip_vs_service_cleanup: it still has to use mutex.
OK I will try to use unlock methods, if it doesn't work use the mutex.
> Or we can avoid it by introducing ip_vs_unlink_service_nolock:
> ip_vs_flush will look like your __ip_vs_service_cleanup and
> will call ip_vs_unlink_service_nolock. ip_vs_unlink_service_nolock
> will be called by ip_vs_flush and by ip_vs_unlink_service.
> You can try such changes, if not, I'll prepare some patches
> after 2-3 days.
>
Regards
Hans
next prev parent reply other threads:[~2011-04-20 10:41 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-19 15:25 [PATCH 1/3] IPVS: Change of socket usage to enable name space exit Hans Schillstrom
2011-04-19 15:25 ` [PATCH 2/3] IPVS: Change of register_pernet_subsys to register_pernet_device Hans Schillstrom
2011-04-20 9:46 ` Eric W. Biederman
2011-04-19 15:25 ` [PATCH 3/3] IPVS: init and cleanup restructuring Hans Schillstrom
2011-04-19 23:12 ` Simon Horman
2011-04-20 12:00 ` Hans Schillstrom
2011-04-19 23:19 ` Julian Anastasov
2011-04-20 9:56 ` Hans Schillstrom
2011-04-20 10:41 ` Hans Schillstrom [this message]
2011-04-19 22:11 ` [PATCH 1/3] IPVS: Change of socket usage to enable name space exit Simon Horman
2011-04-20 10:00 ` Eric W. Biederman
2011-04-20 9:40 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201104201241.50176.hans@schillstrom.com \
--to=hans@schillstrom.com \
--cc=ebiederm@xmission.com \
--cc=hans.schillstrom@ericsson.com \
--cc=horms@verge.net.au \
--cc=ja@ssi.bg \
--cc=lvs-devel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).