From: Alexey Perevalov
Subject: Re: [net-next RFC v2] net_cls: traffic counter based on classification control cgroup
Date: Wed, 28 Nov 2012 09:21:24 +0400
Message-ID: <50B59F54.8080401@samsung.com>
References: <50B49C6C.8030604@samsung.com> <50B49DEA.7010000@parallels.com> <50B4B9E2.4030200@monom.org>
In-reply-to: <50B4B9E2.4030200-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>
To: Daniel Wagner
Cc: Glauber Costa, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 11/27/2012 05:02 PM, Daniel Wagner wrote:
> Hi Alexey,
>
> On 27.11.2012 12:03, Glauber Costa wrote:
>> On 11/27/2012 02:56 PM, Alexey Perevalov wrote:
>>> Hello.
>>>
>>> This is the second version of the patch I already sent to netdev.
>>>
>>> The main goal of this patch is to count traffic (ingress and egress)
>>> for processes placed in a net_cls cgroup.
>>> It is based on res_counters and keeps a counter per network interface.
>>>
>>> Description of the patch:
>>> It handles packets in net/core/dev.c for egress and in
>>> net/ipv4/tcp.c and net/ipv4/udp.c for ingress.
>>> These places were chosen because we also need to know the network
>>> interface.
>>>
>>> The cgroup fs interface provides the following files (one per
>>> interface) in addition to the existing net_cls files:
>>> net_cls.ifacename.usage_in_bytes
>>> Each file contains rcv/snd lines.
>>> This patch also gives net_cls the ability to handle network device
>>> registration.
>>>
>>> It can be included or excluded at compile time.
>>> I moved the menu entry for "Control group classifier" from network/QoS to
>>> General Option/Control Group.
>>>
>>> I'm waiting for your comments.
>>>
>> Daniel Wagner is working on something very similar.
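The following small user-space model makes the accounting in the patch description concrete. It keeps one rx/tx byte pair per cgroup and per network interface and reports them as rcv/snd lines. This is a hedged sketch, not the patch itself. Several details are assumptions made for illustration: the names `cls_iface_stat`, `cls_init`, `cls_lookup`, `cls_account` and `cls_format_usage`, the fixed-size interface table, and the exact `rcv <bytes>` / `snd <bytes>` layout of usage_in_bytes. The sketch uses C11 atomics rather than res_counters, in the spirit of Glauber's remark that res_counters are expensive when only counting is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CLS_MAX_IFACES 8   /* hypothetical fixed table size */
#define CLS_IFNAMSIZ   16  /* same size as the kernel's IFNAMSIZ */

/* Hypothetical per-interface counters for one net_cls cgroup. */
struct cls_iface_stat {
	char name[CLS_IFNAMSIZ];       /* "" marks an unused slot */
	atomic_uint_least64_t rcv;     /* ingress bytes */
	atomic_uint_least64_t snd;     /* egress bytes */
};

struct cls_cgroup_stats {
	struct cls_iface_stat iface[CLS_MAX_IFACES];
};

void cls_init(struct cls_cgroup_stats *st)
{
	for (int i = 0; i < CLS_MAX_IFACES; i++) {
		st->iface[i].name[0] = '\0';
		atomic_init(&st->iface[i].rcv, 0);
		atomic_init(&st->iface[i].snd, 0);
	}
}

/* Find an interface's slot; if create is set, register it on first use
 * (loosely mirroring "handle network device registration").
 * Registration itself is not synchronized in this sketch. */
struct cls_iface_stat *cls_lookup(struct cls_cgroup_stats *st,
				  const char *ifname, int create)
{
	assert(strlen(ifname) < CLS_IFNAMSIZ);
	for (int i = 0; i < CLS_MAX_IFACES; i++) {
		struct cls_iface_stat *s = &st->iface[i];
		if (s->name[0] == '\0') {
			if (!create)
				return NULL;
			strcpy(s->name, ifname);
			return s;
		}
		if (strcmp(s->name, ifname) == 0)
			return s;
	}
	return NULL; /* table full, or not found */
}

/* Charge bytes to one direction; relaxed ordering suffices for a counter. */
void cls_account(struct cls_cgroup_stats *st, const char *ifname,
		 uint64_t bytes, int ingress)
{
	struct cls_iface_stat *s = cls_lookup(st, ifname, 1);
	if (!s)
		return;
	if (ingress)
		atomic_fetch_add_explicit(&s->rcv, bytes, memory_order_relaxed);
	else
		atomic_fetch_add_explicit(&s->snd, bytes, memory_order_relaxed);
}

/* Render the assumed rcv/snd layout of net_cls.<ifname>.usage_in_bytes.
 * Returns snprintf's result, or -1 if the interface was never seen. */
int cls_format_usage(struct cls_cgroup_stats *st, const char *ifname,
		     char *buf, size_t len)
{
	struct cls_iface_stat *s = cls_lookup(st, ifname, 0);
	if (!s)
		return -1;
	return snprintf(buf, len, "rcv %llu\nsnd %llu\n",
			(unsigned long long)atomic_load(&s->rcv),
			(unsigned long long)atomic_load(&s->snd));
}
```

Here is an example on a freshly initialized table. `cls_account(&st, "eth0", 1500, 1)` charges 1500 received bytes to eth0. A following `cls_format_usage(&st, "eth0", buf, sizeof buf)` then yields "rcv 1500\nsnd 0\n". In kernel code the counters would more likely be atomic64_t or per-cpu counters. The sketch only shows that counting alone needs no limit bookkeeping.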
> Yes, basically what I am trying to do is explained in this excellent article:
>
> https://lwn.net/Articles/523058/

I read the article and agree with some of its points. But the problem of
selecting a preferred network for an application can be solved using the
netprio cgroup.

> The short version: per-application routing and statistics.
>
> I have two PoC implementations doing this. Both implementations share the
> same key idea, which is to set SO_MARK per application. The routing and
> statistics would then be done by a bunch of iptables rules.
>
> The first implementation extends net_cls to set SO_MARK:
>
>  void sock_update_classid(struct sock *sk, struct task_struct *task)
>  {
>  	u32 classid;
> +	u32 mark;
>
>  	classid = task_cls_classid(task);
>  	if (classid != sk->sk_classid)
>  		sk->sk_classid = classid;
> +
> +	mark = task_cls_mark(task);
> +	if (mark != sk->sk_mark)
> +		sk->sk_mark = mark;
>  }
>
> The second implementation adds a new iptables matcher which matches
> on LSM contexts. Then you can do something like this:
>
> iptables -t mangle -A OUTPUT -m secmark --secctx unconfined_u:unconfined_r:foo_t:s0-s0:c0.c1023 -j MARK --set-mark 200

As I understand it, matching on the LSM context works for both egress and
ingress.

>> Maybe you should get in touch, in case you are not already.
>>
>> A few general comments:
>> 1) res_counters are incredibly expensive. If you are more interested in
>> counting than you are in limiting, they may not be your best choice.

You're right. I now plan to limit traffic here as well. Alternatively, the
limiting could be done in QoS; in that case plain atomic counters would be
the better choice here.

>> 2) When Daniel explained his use case to me, it gave me the impression
>> that "counting traffic" is something that is totally doable by having a
>> dedicated interface in a separate namespace. Basically, we already count
>> traffic (rx and tx) for all interfaces anyway, so it suggests that it
>> could be an interesting way to look at the problem.
> Moving applications into separate net namespaces is certainly a valid solution.
> There is one drawback to this approach, though. The namespaces need to be
> attached to a bridge, followed by some NATting. That means every application
> would get its own IP address. This might be okay for your particular use
> cases, but I am still trying to work around it. Glauber and I discussed
> this, and he suggested allowing the physical networking device to be
> attached to several namespaces (e.g. via macvlan). Every namespace would
> get the same IP address. Unfortunately, this would result in the same mess
> as when several physical devices on a network are assigned the same IP
> address.

If I understand correctly, to make the statistics work we need to put each
process into a separate namespace? Keeping the counters in the cgroup has no
such side effects, though it has others :)

>
>> AFAIK, Daniel is still measuring this. But it would be great to know if
>> that could work for your use case as well.
> I have not started measuring :(
>
> cheers,
> daniel

Thank you, Daniel and Glauber!

--
BR
Alexey
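Glauber's point 2) above is that the kernel already counts rx and tx for every interface. The following sketch shows what that means for the namespace approach. If an application owns a dedicated interface in its own namespace, its traffic is simply that interface's counters. Those counters can be read from /proc/net/dev inside the namespace. The parser assumes the standard /proc/net/dev line layout: an `iface:` prefix followed by eight receive fields, with receive bytes first, and then eight transmit fields, with transmit bytes first. The function name `parse_netdev_line` and the interface name `veth-app0` used in the comments are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one interface line of /proc/net/dev, e.g.
 *   "  veth-app0: 1500 3 0 0 0 0 0 0 640 2 0 0 0 0 0 0"
 * Field 1 after the colon is receive bytes, field 9 is transmit bytes.
 * Returns 0 and fills name/rx_bytes/tx_bytes on success, -1 otherwise.
 */
int parse_netdev_line(const char *line, char *name, size_t name_len,
		      unsigned long long *rx_bytes, unsigned long long *tx_bytes)
{
	assert(line && name && rx_bytes && tx_bytes);

	const char *colon = strchr(line, ':');
	if (!colon)
		return -1; /* header lines have no ':' */

	/* Interface name: text before ':' with leading spaces skipped. */
	const char *start = line;
	while (*start == ' ')
		start++;
	size_t n = (size_t)(colon - start);
	if (n == 0 || n >= name_len)
		return -1;
	memcpy(name, start, n);
	name[n] = '\0';

	/* %llu skips leading whitespace, so "eth0:99 ..." also parses. */
	unsigned long long f[16];
	int got = sscanf(colon + 1,
			 "%llu %llu %llu %llu %llu %llu %llu %llu "
			 "%llu %llu %llu %llu %llu %llu %llu %llu",
			 &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7],
			 &f[8], &f[9], &f[10], &f[11], &f[12], &f[13], &f[14], &f[15]);
	if (got != 16)
		return -1;

	*rx_bytes = f[0]; /* receive bytes */
	*tx_bytes = f[8]; /* transmit bytes */
	return 0;
}
```

On a running system you would read /proc/net/dev from inside the application's namespace, because that file is per network namespace. You would then fgets() each line and pass it to the parser. The two header lines contain no ':' and are rejected.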