Netdev List
 help / color / mirror / Atom feed
* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-20  5:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

Le jeudi 20 mai 2010 à 13:34 +1000, Herbert Xu a écrit :

> The value is set at socket creation time.  So all sockets created
> via socket(2) automatically gains the ID of the thread creating it.
> Now you may argue that this may not be the same as the thread that
> is sending the packet.  However, we already have a precedence where
> an fd is passed to a different thread, its security property is
> inherited.  In this case, inheriting the classid of the thread
> creating the socket is also the logical thing to do.

I find this very biased, sorry.

In fact, fd passing is just fine today, if we consider that we classify
packets using the identity of the process *using* the fd, not the one
that *created* it.

Now your patch changes this, to the reverse, and you justify the caching
effect on socket. Sorry, this must be too convoluted for me.





^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  5:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <1274331255.2658.24.camel@edumazet-laptop>

On Thu, May 20, 2010 at 06:54:15AM +0200, Eric Dumazet wrote:
>
> > @@ -1064,6 +1091,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
> >  		sock_lock_init(sk);
> >  		sock_net_set(sk, get_net(net));
> >  		atomic_set(&sk->sk_wmem_alloc, 1);
> > +
> > +		if (!in_interrupt())
> > +			sk->sk_classid = task_cls_classid(current);
> 
> It means we cache current classid of this process.
> 
> Can it change later ?

Please read my patchset description.  I explained everything there.

> > @@ -112,6 +108,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
> >  	struct cls_cgroup_head *head = tp->root;
> >  	u32 classid;
> >  
> > +	rcu_read_lock();
> > +	classid = task_cls_state(current)->classid;
> > +	rcu_read_unlock();
> > +
> 
> It would be more readable to use :

Sure, however, I'm just moving the existing code around, so I
wanted it to be exactly the same.  Clean-ups should be done as
a follow-up, feel free to do this if you like.

> > @@ -122,12 +122,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
> >  	 * calls by looking at the number of nested bh disable calls because
> >  	 * softirqs always disables bh.
> >  	 */
> > -	if (softirq_count() != SOFTIRQ_OFFSET)
> > -		return -1;
> > -
> > -	rcu_read_lock();
> > -	classid = task_cls_state(current)->classid;
> > -	rcu_read_unlock();
> > +	if (softirq_count() != SOFTIRQ_OFFSET) {
> 
> Why do we still need this previous test ?

You didn't read my patchset description at all, did you? :)

This patchset is trying to fix the issue at hand with the minimum
amount of changes to current behaviour.  Once we're happy with the
new semantics, we can get rid of the softirq stuff.

That will be done in a later patch to minimise the risks.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-20  4:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

Le jeudi 20 mai 2010 à 13:34 +1000, Herbert Xu a écrit :
> On Wed, May 19, 2010 at 08:05:22PM -0700, David Miller wrote:
> > Ok, looking forward to it.
> 
> Here we go:
> 
> cls_cgroup: Store classid in struct sock
> 
> Up until now cls_cgroup has relied on fetching the classid out of
> the current executing thread.  This runs into trouble when a packet
> processing is delayed in which case it may execute out of another
> thread's context.
> 
> Furthermore, even when a packet is not delayed we may fail to
> classify it if soft IRQs have been disabled, because this scenario
> is indistinguishable from one where a packet unrelated to the
> current thread is processed by a real soft IRQ.
> 
> As we already have a concept of packet ownership for accounting
> purposes in the skb->sk pointer, this is a natural place to store
> the classid in a persistent manner.
> 
> This patch adds the cls_cgroup classid in struct sock, filling up
> an existing hole on 64-bit :)
> 
> The value is set at socket creation time.  So all sockets created
> via socket(2) automatically gains the ID of the thread creating it.
> Now you may argue that this may not be the same as the thread that
> is sending the packet.  However, we already have a precedence where
> an fd is passed to a different thread, its security property is
> inherited.  In this case, inheriting the classid of the thread
> creating the socket is also the logical thing to do.
> 
> For sockets created on inbound connections through accept(2), we
> inherit the classid of the original listening socket through
> sk_clone, possibly preceding the actual accept(2) call.
> 
> In order to minimise risks, I have not made this the authoritative
> classid.  For now it is only used as a backup when we execute
> with soft IRQs disabled.  Once we're completely happy with its
> semantics we can use it as the sole classid.
> 
> Footnote: I have rearranged the error path on cls_group module
> creation.  If we didn't do this, then there is a window where
> someone could create a tc rule using cls_group before the cgroup
> subsystem has been registered.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 8f78073..63c5036 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -410,6 +410,12 @@ struct cgroup_scanner {
>  	void *data;
>  };
>  
> +struct cgroup_cls_state
> +{
> +	struct cgroup_subsys_state css;
> +	u32 classid;
> +};
> +
>  /*
>   * Add a new file to the given cgroup directory. Should only be
>   * called by subsystems from within a populate() method
> @@ -515,6 +521,10 @@ struct cgroup_subsys {
>  	struct module *module;
>  };
>  
> +#ifndef CONFIG_NET_CLS_CGROUP
> +extern int net_cls_subsys_id;
> +#endif
> +
>  #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
>  #include <linux/cgroup_subsys.h>
>  #undef SUBSYS
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 1ad6435..361a5dc 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -307,7 +307,7 @@ struct sock {
>  	void			*sk_security;
>  #endif
>  	__u32			sk_mark;
> -	/* XXX 4 bytes hole on 64 bit */
> +	u32			sk_classid;
>  	void			(*sk_state_change)(struct sock *sk);
>  	void			(*sk_data_ready)(struct sock *sk, int bytes);
>  	void			(*sk_write_space)(struct sock *sk);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index c5812bb..70bc529 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -110,6 +110,7 @@
>  #include <linux/tcp.h>
>  #include <linux/init.h>
>  #include <linux/highmem.h>
> +#include <linux/cgroup.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/system.h>
> @@ -217,6 +218,32 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
>  int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
>  EXPORT_SYMBOL(sysctl_optmem_max);
>  
> +#ifdef CONFIG_NET_CLS_CGROUP
> +static inline u32 task_cls_classid(struct task_struct *p)
> +{
> +	return container_of(task_subsys_state(p, net_cls_subsys_id),
> +			    struct cgroup_cls_state, css).classid;
> +}
> +#else
> +int net_cls_subsys_id = -1;
> +EXPORT_SYMBOL_GPL(net_cls_subsys_id);
> +
> +static inline u32 task_cls_classid(struct task_struct *p)
> +{
> +	int id;
> +	u32 classid;
> +
> +	rcu_read_lock();
> +	id = rcu_dereference(net_cls_subsys_id);
> +	if (id >= 0)
> +		classid = container_of(task_subsys_state(p, id),
> +				       struct cgroup_cls_state, css)->classid;
> +	rcu_read_unlock();
> +
> +	return classid;
> +}
> +#endif
> +
>  static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
>  {
>  	struct timeval tv;
> @@ -1064,6 +1091,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
>  		sock_lock_init(sk);
>  		sock_net_set(sk, get_net(net));
>  		atomic_set(&sk->sk_wmem_alloc, 1);
> +
> +		if (!in_interrupt())
> +			sk->sk_classid = task_cls_classid(current);

It means we cache current classid of this process.

Can it change later ?


>  	}
>  
>  	return sk;
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 2211803..32c1296 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -16,14 +16,10 @@
>  #include <linux/errno.h>
>  #include <linux/skbuff.h>
>  #include <linux/cgroup.h>
> +#include <linux/rcupdate.h>
>  #include <net/rtnetlink.h>
>  #include <net/pkt_cls.h>
> -
> -struct cgroup_cls_state
> -{
> -	struct cgroup_subsys_state css;
> -	u32 classid;
> -};
> +#include <net/sock.h>
>  
>  static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
>  					       struct cgroup *cgrp);
> @@ -112,6 +108,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
>  	struct cls_cgroup_head *head = tp->root;
>  	u32 classid;
>  
> +	rcu_read_lock();
> +	classid = task_cls_state(current)->classid;
> +	rcu_read_unlock();
> +

It would be more readable to use :

static inline int task_cls_state(struct task_struct *p)
{
	int classid;

	rcu_read_lock();
	classid = container_of(task_subsys_state(p, net_cls_subsys_id),
				struct cgroup_cls_state, css)->classid;
	rcu_read_unlock();
	return classid;
}

...

classid = task_cls_classid(current);

>  	/*
>  	 * Due to the nature of the classifier it is required to ignore all
>  	 * packets originating from softirq context as accessing `current'
> @@ -122,12 +122,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
>  	 * calls by looking at the number of nested bh disable calls because
>  	 * softirqs always disables bh.
>  	 */
> -	if (softirq_count() != SOFTIRQ_OFFSET)
> -		return -1;
> -
> -	rcu_read_lock();
> -	classid = task_cls_state(current)->classid;
> -	rcu_read_unlock();
> +	if (softirq_count() != SOFTIRQ_OFFSET) {

Why do we still need this previous test ?

> +	 	/* If there is an sk_classid we'll use that. */
> +		if (!skb->sk)
> +			return -1;
> +		classid = skb->sk->sk_classid;
> +	}
>  
>  	if (!classid)
>  		return -1;
> @@ -289,18 +289,35 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
>  
>  static int __init init_cgroup_cls(void)
>  {
> -	int ret = register_tcf_proto_ops(&cls_cgroup_ops);
> -	if (ret)
> -		return ret;
> +	int ret;
> +
>  	ret = cgroup_load_subsys(&net_cls_subsys);
>  	if (ret)
> -		unregister_tcf_proto_ops(&cls_cgroup_ops);
> +		goto out;
> +
> +#ifndef CONFIG_NET_CLS_CGROUP
> +	/* We can't use rcu_assign_pointer because this is an int. */
> +	smp_wmb();
> +	net_cls_subsys_id = net_cls_subsys.subsys_id;
> +#endif
> +
> +	ret = register_tcf_proto_ops(&cls_cgroup_ops);
> +	if (ret)
> +		cgroup_unload_subsys(&net_cls_subsys);
> +
> +out:
>  	return ret;
>  }
>  
>  static void __exit exit_cgroup_cls(void)
>  {
>  	unregister_tcf_proto_ops(&cls_cgroup_ops);
> +
> +#ifndef CONFIG_NET_CLS_CGROUP
> +	net_cls_subsys_id = -1;
> +	synchronize_rcu();
> +#endif
> +
>  	cgroup_unload_subsys(&net_cls_subsys);
>  }
>  
> 
> Thanks,



^ permalink raw reply

* Re: [PATCH] net: fix problem in dequeuing from input_pkt_queue
From: Eric Dumazet @ 2010-05-20  4:37 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Changli Gao, davem, netdev
In-Reply-To: <AANLkTimv8YJD61uADUrHoeL4xGfE8wPPUwVpwJNZPFpu@mail.gmail.com>

Le mercredi 19 mai 2010 à 19:48 -0700, Tom Herbert a écrit :
> >> It should be okay?  process_backlog only runs in softirq so bottom
> >> halves are already disabled, and I don't think flush_backlog runs out
> >> of an interrupt.
> >>
> >
> > Oh no. It is an IRQ handler.
> >
> Very well, I will fix that.
> 
> Now I'm wondering, though, what the purpose of flush_backlog is...
> since __netif_receive_skb is called with interrupts enabled it's
> obvious flush_backlog won't catch all the skb's that reference the
> device go away.  Is there a reason these packets need to be flushed
> and can't just be processed?

flush_backlog is called when device is dismantled.

No new packets should be generated by the device at this moment.

Could you please split your patch in units, I spent 20 minutes to review
it and come to same conclusion than Changli (need to disable interrupts
as they are currently disabled) and also :

input_queue_head_incr(sd); are _not_ needed in flush_backlog()

We are in the very last moments of the life of the device, in a very
unlikely situation (packets in flight, not already consumed by the cpu),
we are _dropping_ packets, so OOO means nothing at this point. 

In dev_cpu_callback(), you reverse the order of input_pkt_queue /
process_queue.

Thats fine, but should be a single patch, because I am not sure the
input_queue_head_incr() are valid here, since we re-inject these packets
to netif_rx(). Could you clarify this point ?

Thanks !



^ permalink raw reply

* Re: [PATCH] vhost-net: utilize PUBLISH_USED_IDX feature
From: Rusty Russell @ 2010-05-20  4:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, davem, Juan Quintela, Paul E. McKenney, Arnd Bergmann,
	kvm, virtualization, netdev, linux-kernel, alex.williamson,
	amit.shah
In-Reply-To: <20100519222718.GB4111@redhat.com>

On Thu, 20 May 2010 07:57:18 am Michael S. Tsirkin wrote:
> On Wed, May 19, 2010 at 08:04:51PM +0300, Avi Kivity wrote:
> > On 05/18/2010 04:19 AM, Michael S. Tsirkin wrote:
> >> With PUBLISH_USED_IDX, guest tells us which used entries
> >> it has consumed. This can be used to reduce the number
> >> of interrupts: after we write a used entry, if the guest has not yet
> >> consumed the previous entry, or if the guest has already consumed the
> >> new entry, we do not need to interrupt.
> >> This imporves bandwidth by 30% under some workflows.
> >>
> >> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >> ---
> >>
> >> Rusty, Dave, this patch depends on the patch
> >> "virtio: put last seen used index into ring itself"
> >> which is currently destined at Rusty's tree.
> >> Rusty, if you are taking that one for 2.6.35, please
> >> take this one as well.
> >> Dave, any objections?
> >>    
> >
> > I object: I think the index should have its own cacheline,
> 
> The issue here is that host/guest do not know each
> other's cache line size. I guess we could just put it
> at offset 128 or something like that ... Rusty?

I was assuming you'd put it at the end of the padding.

I think it's a silly optimization, but Avi obviously feels strongly about
it and I respect his opinion.

Please resubmit...
Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCH 2/2] net: Add classid to sk_buff.
From: David Miller @ 2010-05-20  3:47 UTC (permalink / raw)
  To: herbert; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520033643.GA6630@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 13:36:43 +1000

> On Wed, May 19, 2010 at 07:55:38PM -0700, David Miller wrote:
>> 
>> We make this zero cost by moving queue_mapping into an existing
>> empty __u16 slot.  Thus making a __u32 available, which we use
>> for the 'classid'.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> You better hide this space before someone steals it :)

Indeed, I didn't even know we had it to begin with. :)

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: David Miller @ 2010-05-20  3:46 UTC (permalink / raw)
  To: herbert; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 13:34:46 +1000

> cls_cgroup: Store classid in struct sock

This looks great to me.

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  3:42 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

On Thu, May 20, 2010 at 01:34:46PM +1000, Herbert Xu wrote:
>
> cls_cgroup: Store classid in struct sock
> 
> Up until now cls_cgroup has relied on fetching the classid out of
> the current executing thread.  This runs into trouble when a packet
> processing is delayed in which case it may execute out of another
> thread's context.

Oh please add this bit that I forgot about:

This also addresses the case where TCP transmits a queued packet
in response to an inbound ACK.  Currently this isn't classified
at all as we would be in softirq context.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 2/2] net: Add classid to sk_buff.
From: Herbert Xu @ 2010-05-20  3:36 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100519.195538.139536660.davem@davemloft.net>

On Wed, May 19, 2010 at 07:55:38PM -0700, David Miller wrote:
> 
> We make this zero cost by moving queue_mapping into an existing
> empty __u16 slot.  Thus making a __u32 available, which we use
> for the 'classid'.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

You better hide this space before someone steals it :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/2] ipv6: Store ndisc_nodeid in IP6CB().
From: Herbert Xu @ 2010-05-20  3:35 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100519.195536.32704411.davem@davemloft.net>

On Wed, May 19, 2010 at 07:55:36PM -0700, David Miller wrote:
> 
> There is no reason we need to use up space in the generic
> SKB area for this.  All packet input paths in ipv6 explicitly
> clear out the IP6CB() area and therefore the default value
> for ndisc_nodeid will be correct.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Looks good to me.  Thanks!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  3:34 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100519.200522.140743640.davem@davemloft.net>

On Wed, May 19, 2010 at 08:05:22PM -0700, David Miller wrote:
> Ok, looking forward to it.

Here we go:

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Now you may argue that this may not be the same as the thread that
is sending the packet.  However, we already have a precedence where
an fd is passed to a different thread, its security property is
inherited.  In this case, inheriting the classid of the thread
creating the socket is also the logical thing to do.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8f78073..63c5036 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -410,6 +410,12 @@ struct cgroup_scanner {
 	void *data;
 };
 
+struct cgroup_cls_state
+{
+	struct cgroup_subsys_state css;
+	u32 classid;
+};
+
 /*
  * Add a new file to the given cgroup directory. Should only be
  * called by subsystems from within a populate() method
@@ -515,6 +521,10 @@ struct cgroup_subsys {
 	struct module *module;
 };
 
+#ifndef CONFIG_NET_CLS_CGROUP
+extern int net_cls_subsys_id;
+#endif
+
 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
 #include <linux/cgroup_subsys.h>
 #undef SUBSYS
diff --git a/include/net/sock.h b/include/net/sock.h
index 1ad6435..361a5dc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -307,7 +307,7 @@ struct sock {
 	void			*sk_security;
 #endif
 	__u32			sk_mark;
-	/* XXX 4 bytes hole on 64 bit */
+	u32			sk_classid;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index c5812bb..70bc529 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -110,6 +110,7 @@
 #include <linux/tcp.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
+#include <linux/cgroup.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
@@ -217,6 +218,32 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
 
+#ifdef CONFIG_NET_CLS_CGROUP
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	return container_of(task_subsys_state(p, net_cls_subsys_id),
+			    struct cgroup_cls_state, css).classid;
+}
+#else
+int net_cls_subsys_id = -1;
+EXPORT_SYMBOL_GPL(net_cls_subsys_id);
+
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	int id;
+	u32 classid;
+
+	rcu_read_lock();
+	id = rcu_dereference(net_cls_subsys_id);
+	if (id >= 0)
+		classid = container_of(task_subsys_state(p, id),
+				       struct cgroup_cls_state, css)->classid;
+	rcu_read_unlock();
+
+	return classid;
+}
+#endif
+
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
 	struct timeval tv;
@@ -1064,6 +1091,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_lock_init(sk);
 		sock_net_set(sk, get_net(net));
 		atomic_set(&sk->sk_wmem_alloc, 1);
+
+		if (!in_interrupt())
+			sk->sk_classid = task_cls_classid(current);
 	}
 
 	return sk;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 2211803..32c1296 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -16,14 +16,10 @@
 #include <linux/errno.h>
 #include <linux/skbuff.h>
 #include <linux/cgroup.h>
+#include <linux/rcupdate.h>
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
-
-struct cgroup_cls_state
-{
-	struct cgroup_subsys_state css;
-	u32 classid;
-};
+#include <net/sock.h>
 
 static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
 					       struct cgroup *cgrp);
@@ -112,6 +108,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	struct cls_cgroup_head *head = tp->root;
 	u32 classid;
 
+	rcu_read_lock();
+	classid = task_cls_state(current)->classid;
+	rcu_read_unlock();
+
 	/*
 	 * Due to the nature of the classifier it is required to ignore all
 	 * packets originating from softirq context as accessing `current'
@@ -122,12 +122,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	 * calls by looking at the number of nested bh disable calls because
 	 * softirqs always disables bh.
 	 */
-	if (softirq_count() != SOFTIRQ_OFFSET)
-		return -1;
-
-	rcu_read_lock();
-	classid = task_cls_state(current)->classid;
-	rcu_read_unlock();
+	if (softirq_count() != SOFTIRQ_OFFSET) {
+	 	/* If there is an sk_classid we'll use that. */
+		if (!skb->sk)
+			return -1;
+		classid = skb->sk->sk_classid;
+	}
 
 	if (!classid)
 		return -1;
@@ -289,18 +289,35 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
 
 static int __init init_cgroup_cls(void)
 {
-	int ret = register_tcf_proto_ops(&cls_cgroup_ops);
-	if (ret)
-		return ret;
+	int ret;
+
 	ret = cgroup_load_subsys(&net_cls_subsys);
 	if (ret)
-		unregister_tcf_proto_ops(&cls_cgroup_ops);
+		goto out;
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	/* We can't use rcu_assign_pointer because this is an int. */
+	smp_wmb();
+	net_cls_subsys_id = net_cls_subsys.subsys_id;
+#endif
+
+	ret = register_tcf_proto_ops(&cls_cgroup_ops);
+	if (ret)
+		cgroup_unload_subsys(&net_cls_subsys);
+
+out:
 	return ret;
 }
 
 static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	net_cls_subsys_id = -1;
+	synchronize_rcu();
+#endif
+
 	cgroup_unload_subsys(&net_cls_subsys);
 }
 

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: David Miller @ 2010-05-20  3:05 UTC (permalink / raw)
  To: herbert; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520025741.GA6129@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 12:57:42 +1000

> On Wed, May 19, 2010 at 07:55:33PM -0700, David Miller wrote:
>> 
>> Since it seems now inevitable to me that we'll need to store the
>> 'classid' in struct sk_buff, I've prepared two patches to do that at
>> zero cost.
> 
> Actually I'm currently working on a patch to store this in struct
> sock.  The reason I prefer struct sock is because we can then
> centralise the place where we set the classid rather than having
> it spread out through every place that creates an skb.
> 
> I'll post it soon.

Ok, looking forward to it.

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  2:57 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100519.195533.22536631.davem@davemloft.net>

On Wed, May 19, 2010 at 07:55:33PM -0700, David Miller wrote:
> 
> Since it seems now inevitable to me that we'll need to store the
> 'classid' in struct sk_buff, I've prepared two patches to do that at
> zero cost.

Actually I'm currently working on a patch to store this in struct
sock.  The reason I prefer struct sock is because we can then
centralise the place where we set the classid rather than having
it spread out through every place that creates an skb.

I'll post it soon.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH 2/2] net: Add classid to sk_buff.
From: David Miller @ 2010-05-20  2:55 UTC (permalink / raw)
  To: bmb; +Cc: tgraf, nhorman, nhorman, eric.dumazet, herbert, netdev


We make this zero cost by moving queue_mapping into an existing
empty __u16 slot.  Thus making a __u32 available, which we use
for the 'classid'.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/skbuff.h |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7c16f24..f847ec2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -367,12 +367,9 @@ struct sk_buff {
 #ifdef CONFIG_NET_CLS_ACT
 	__u16			tc_verd;	/* traffic control verdict */
 #endif
+	__u32			classid;
 #endif
 
-	__u16			queue_mapping;
-
-	/* 16 bit hole */
-
 #ifdef CONFIG_NET_DMA
 	dma_cookie_t		dma_cookie;
 #endif
@@ -385,6 +382,7 @@ struct sk_buff {
 	};
 
 	__u16			vlan_tci;
+	__u16			queue_mapping;
 
 	sk_buff_data_t		transport_header;
 	sk_buff_data_t		network_header;
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 1/2] ipv6: Store ndisc_nodeid in IP6CB().
From: David Miller @ 2010-05-20  2:55 UTC (permalink / raw)
  To: bmb; +Cc: tgraf, nhorman, nhorman, eric.dumazet, herbert, netdev


There is no reason we need to use up space in the generic
SKB area for this.  All packet input paths in ipv6 explicitly
clear out the IP6CB() area and therefore the default value
for ndisc_nodeid will be correct.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/ipv6.h   |    2 +-
 include/linux/skbuff.h |    9 ++-------
 net/ipv6/ndisc.c       |   11 ++++++-----
 net/ipv6/sit.c         |    9 ++++++---
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index e0cc9a7..fc39add 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -247,7 +247,7 @@ struct inet6_skb_parm {
 #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE)
 	__u16			dsthao;
 #endif
-
+	__u8			ndisc_nodetype;
 #define IP6SKB_XFRM_TRANSFORMED	1
 #define IP6SKB_FORWARDED	2
 };
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 124f90c..7c16f24 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -369,14 +369,9 @@ struct sk_buff {
 #endif
 #endif
 
-	kmemcheck_bitfield_begin(flags2);
-	__u16			queue_mapping:16;
-#ifdef CONFIG_IPV6_NDISC_NODETYPE
-	__u8			ndisc_nodetype:2;
-#endif
-	kmemcheck_bitfield_end(flags2);
+	__u16			queue_mapping;
 
-	/* 0/14 bit hole */
+	/* 16 bit hole */
 
 #ifdef CONFIG_NET_DMA
 	dma_cookie_t		dma_cookie;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index da0a4d2..e7c0897 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1134,7 +1134,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 	}
 
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
-	if (skb->ndisc_nodetype == NDISC_NODETYPE_HOST) {
+	if (IP6CB(skb)->ndisc_nodetype == NDISC_NODETYPE_HOST) {
 		ND_PRINTK2(KERN_WARNING
 			   "ICMPv6 RA: from host or unauthorized router\n");
 		return;
@@ -1166,7 +1166,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	/* skip link-specific parameters from interior routers */
-	if (skb->ndisc_nodetype == NDISC_NODETYPE_NODEFAULT)
+	if (IP6CB(skb)->ndisc_nodetype == NDISC_NODETYPE_NODEFAULT)
 		goto skip_linkparms;
 #endif
 
@@ -1323,7 +1323,8 @@ skip_linkparms:
 		     p = ndisc_next_option(p, ndopts.nd_opts_ri_end)) {
 			struct route_info *ri = (struct route_info *)p;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
-			if (skb->ndisc_nodetype == NDISC_NODETYPE_NODEFAULT &&
+			if (IP6CB(skb)->ndisc_nodetype ==
+			    NDISC_NODETYPE_NODEFAULT &&
 			    ri->prefix_len == 0)
 				continue;
 #endif
@@ -1337,7 +1338,7 @@ skip_linkparms:
 
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	/* skip link-specific ndopts from interior routers */
-	if (skb->ndisc_nodetype == NDISC_NODETYPE_NODEFAULT)
+	if (IP6CB(skb)->ndisc_nodetype == NDISC_NODETYPE_NODEFAULT)
 		goto out;
 #endif
 
@@ -1405,7 +1406,7 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 	u8 *lladdr = NULL;
 
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
-	switch (skb->ndisc_nodetype) {
+	switch (IP6CB(skb)->ndisc_nodetype) {
 	case NDISC_NODETYPE_HOST:
 	case NDISC_NODETYPE_NODEFAULT:
 		ND_PRINTK2(KERN_WARNING
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 5abae10..ac014e0 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -427,21 +427,24 @@ static int
 isatap_chksrc(struct sk_buff *skb, struct iphdr *iph, struct ip_tunnel *t)
 {
 	struct ip_tunnel_prl_entry *p;
+	struct inet6_skb_parm *cb;
 	int ok = 1;
 
+	cb = IP6CB(skb);
+
 	rcu_read_lock();
 	p = __ipip6_tunnel_locate_prl(t, iph->saddr);
 	if (p) {
 		if (p->flags & PRL_DEFAULT)
-			skb->ndisc_nodetype = NDISC_NODETYPE_DEFAULT;
+			cb->ndisc_nodetype = NDISC_NODETYPE_DEFAULT;
 		else
-			skb->ndisc_nodetype = NDISC_NODETYPE_NODEFAULT;
+			cb->ndisc_nodetype = NDISC_NODETYPE_NODEFAULT;
 	} else {
 		struct in6_addr *addr6 = &ipv6_hdr(skb)->saddr;
 		if (ipv6_addr_is_isatap(addr6) &&
 		    (addr6->s6_addr32[3] == iph->saddr) &&
 		    ipv6_chk_prefix(addr6, t->dev))
-			skb->ndisc_nodetype = NDISC_NODETYPE_HOST;
+			cb->ndisc_nodetype = NDISC_NODETYPE_HOST;
 		else
 			ok = 0;
 	}
-- 
1.7.0.4


^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: David Miller @ 2010-05-20  2:55 UTC (permalink / raw)
  To: bmb; +Cc: tgraf, nhorman, nhorman, eric.dumazet, herbert, netdev
In-Reply-To: <4BF4517F.1010206@athenacr.com>


Since it seems now inevitable to me that we'll need to store the
'classid' in struct sk_buff, I've prepared two patches to do that at
zero cost.

Someone please go with this.

^ permalink raw reply

* Re: how many msi (msi-x) vectors can be setup?
From: Yinghai @ 2010-05-20  2:53 UTC (permalink / raw)
  To: zhou rui; +Cc: netdev
In-Reply-To: <AANLkTinV8MHMEySnJWzqr0fUdEb_sbJ484y86VpvtlhP@mail.gmail.com>

On 05/19/2010 07:04 PM, zhou rui wrote:
> it is kernel2.6.27, Intel(R) Xeon(R) CPU           E5540,64bit,16
> processors.,so there should be 16 vectors?

are you using sles 11 and x2apic?

YH

^ permalink raw reply

* Re: [PATCH] net: fix problem in dequeuing from input_pkt_queue
From: Tom Herbert @ 2010-05-20  2:48 UTC (permalink / raw)
  To: Changli Gao; +Cc: davem, eric.dumazet, netdev
In-Reply-To: <AANLkTinZHzZykWYWu9vOmK5ydtcB6ToVBKG1RKYSuSn4@mail.gmail.com>

>> It should be okay?  process_backlog only runs in softirq so bottom
>> halves are already disabled, and I don't think flush_backlog runs out
>> of an interrupt.
>>
>
> Oh no. It is an IRQ handler.
>
Very well, I will fix that.

Now I'm wondering, though, what the purpose of flush_backlog is...
since __netif_receive_skb is called with interrupts enabled it's
obvious flush_backlog won't catch all the skb's that reference the
device go away.  Is there a reason these packets need to be flushed
and can't just be processed?

>  on_each_cpu(flush_backlog, dev, 1);
> ...
> int on_each_cpu(void (*func) (void *info), void *info, int wait)
> {
>        int ret = 0;
>
>        preempt_disable();
>        ret = smp_call_function(func, info, wait);
>        local_irq_disable();
>        func(info);
>        local_irq_enable();
>        preempt_enable();
>        return ret;
> }
>
> --
> Regards,
> Changli Gao(xiaosuo@gmail.com)
>

^ permalink raw reply

* Re: [GIT] Networking
From: David Miller @ 2010-05-20  2:34 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel
In-Reply-To: <20100518.233752.256882583.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 18 May 2010 23:37:52 -0700 (PDT)

> Please pull, thanks a lot!
> 
> The following changes since commit 537b60d17894b7c19a6060feae40299d7109d6e7:
>   Linus Torvalds (1):
>         Merge branch 'x86-uv-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip
> 
> are available in the git repository at:
> 
>   master.kernel.org:/pub/scm/linux/kernel/git/davem/net-next-2.6.git master

Ping?

^ permalink raw reply

* Re: how many msi (msi-x) vectors can be setup?
From: zhou rui @ 2010-05-20  2:04 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: netdev
In-Reply-To: <AANLkTimrQPhPomBC-x-E0b9BKDUmO2j2flZp_Qm2aAtK@mail.gmail.com>

it is kernel2.6.27, Intel(R) Xeon(R) CPU           E5540,64bit,16
processors.,so there should be 16 vectors?



On Thu, May 20, 2010 at 4:33 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, May 19, 2010 at 8:18 AM, zhou rui <wirelesser@gmail.com> wrote:
>> hi there:
>> how many msi (msi-x) vectors can be setup?
>> the number is limited by hardware resource(nic), or kernel ?
>> I found that the driver (broadcom 57711 ver 1.5.12) tried to request
>> 16 queues on my kernel2.6.27,but only 2  available
>> will it be increased if I update the driver or kernel?
>> and there is a limitiation in the system? if the other devices have
>> already occupied too many MSI vectors then it is not enough.
>
> from kernel 2.6.19 x86_64, there is per-cpu vector irq support.
>
> depends your system : CPU num? 64bit or 32bit.
>
> YH
>

^ permalink raw reply

* Re: HFSC classes going out of bounds, regression in recent kernels?
From: Denys Fedorysychenko @ 2010-05-20  1:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jeff Garzik, Eric Dumazet
In-Reply-To: <4BBB5543.40607@trash.net>

I am trying to track down HFSC bug.
It seems, most probably it is related to PSCHED_SHIFT at the end, i am doing 
testing again. I will try to do complete clean build, maybe last time some .o 
was left or i forgot to do make clean.

SM_SHIFT in HFSC is calculated as 30 - PSCHED_SHIFT, and it is shifted too 
much (or not enough) with new changes (ISM_SHIFT seems wrong too). So it is 
most probably overflow or not enough resolution.
I will try to change PSCHED_SHIFT back to confirm that, and at least i found 
way to reproduce bug.

Additionally in sch_hfsc.c i notice mentioned that PSCHED_SHIFT 10 is tick per 
1024us, but i try to calculate their table (in source comments), it doesn't 
fit with my calculations based on 1024us/tick, but fits well with 1024 
nanosecond.

Is it was 1024ns per tick and now 64ns per tick? Or it is microseconds(us) ?

^ permalink raw reply

* Re: [RFC] netem: correlated loss generation (v3)
From: Hagen Paul Pfeifer @ 2010-05-20  0:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Stefano Salsano, David Miller, Fabio Ludovici, netdev, netem
In-Reply-To: <20100519171733.71c24539@nehalam>

* Stephen Hemminger | 2010-05-19 17:17:33 [-0700]:

>The old model was useful, but it really didn't do correlated loss.
>For legacy, the old syntax will go through the same code and generate
>the same result.

Is this really necessary? The right thing is to fix the broken behavior! If
the new patch provides this, great.  Imaging a network analysis for a PhD
dissertation based on a broken correlation algorithm - the whole results are
misleading and wrong. No one deserves this ... ;-)

If the current algorithm is broken then the mechanism must be fixed. Preserve
compatibility is counterproductive in this case.

>tc qdisc change dev eth0 root netem 
>      loss 2 10                              # compat syntax
>      loss random 2 10                       # same as above
>      loss deterministic file                # loss model based on bitmap
>      loss state p13 [p31 [p32 [p23 [p14]]]] # 4 state 
>      loss model  p [r [1-h [1-k]]]          # gilbert elliot model
>
>Any suggestions for better syntax are appreciated.

Not at the moment, looks clear and understandable.


Cheers, Hagen

-- 
Hagen Paul Pfeifer <hagen@jauu.net>  ||  http://jauu.net/
Telephone: +49 174 5455209           ||  Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22


^ permalink raw reply

* Re: [RFC] netem: correlated loss generation (v3)
From: Stefano Salsano @ 2010-05-20  0:22 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Stephen Hemminger, David Miller, Fabio Ludovici, netdev, netem
In-Reply-To: <20100519230433.GE5146@nuttenaction>

Hagen Paul Pfeifer wrote:
> * Stefano Salsano | 2010-05-20 00:52:00 [+0200]:
> 
>> So my opinion is that the need to emulate "correlated" loss patterns
>> is not academic, but it is a real need from industry... of course we
>> can debate if it is a "niche" requirement or not
> 
> netem is not in the processing hot path, so there is no issue to add an
> additional component. If there are some[TM] users and it is usable, I am
> fine with this patch!
> 
>> tc qdisc change dev wlan0 root netem loss 2 10
>>
>> because this produces broken results...
> 
> How to model this specific network characteristic (2% loss, correlation 10%)
> with your modifications? Can you give us an example?
> 

The definition of "correlation" for the correlated loss was 
intrinsically broken.

We can now use two models to introduce correlated loss events.

One is called GI (General and Intuitive), where the "burst lenght" of 
consecutive loss events is used to measure correlation, so the second 
(optional) parameter is not the "correlation" but the burst lenght:

tc qdisc add dev wlan0 root netem loss_GI ploss burst_length

for example if ploss = 2% then the burst lenght for uncorrelated loss 
will be 1/(1-ploss) = 1 / 0.98 ~= 1.02

this means that you will have almost always isolated loss events if 
burst_lengh is 1.02

everything greater than 1.02 for burst_lenght will add a correlation in 
the loss patterns, for example:

tc qdisc add dev wlan0 root netem loss_GI 2 3

will mean that the loss events will be grouped in bursts of average 
lenght 3 (to keep the 2% loss this will result in less frequent loss 
bursts, but with more consecutive losses per bursts)

The second model is called Gilbert-Elliot model, you have to input two 
parameters p and r:

tc qdisc add dev eth0 root netem loss_gilb_ell p r

p and r are related to ploss and burst_length in the following way:
ploss = p/(p+r)
burst_length = 1/r

Cheers,
Stefano

PS Thank you for your question! It was important to clarify with such an 
example the new approach. We will soon add this discussion to the 
documentation available at 
http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG


> Cheers, Hagen
> 


-- 
*******************************************************************
Stefano Salsano
Dipartimento Ingegneria Elettronica
Universita' di Roma "Tor Vergata"
Via del Politecnico, 1 - 00133 Roma - ITALY

http://netgroup.uniroma2.it/Stefano_Salsano/

E-mail  : stefano.salsano@uniroma2.it
Cell.   : +39 320 4307310
Office  : (Tel.) +39 06 72597770  (Fax.) +39 06 72597435
*******************************************************************

^ permalink raw reply

* Re: [RFC] netem: correlated loss generation (v3)
From: Stephen Hemminger @ 2010-05-20  0:17 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Stefano Salsano, David Miller, Fabio Ludovici, netdev, netem
In-Reply-To: <20100519230433.GE5146@nuttenaction>

On Thu, 20 May 2010 01:04:33 +0200
Hagen Paul Pfeifer <hagen@jauu.net> wrote:

> * Stefano Salsano | 2010-05-20 00:52:00 [+0200]:
> 
> >So my opinion is that the need to emulate "correlated" loss patterns
> >is not academic, but it is a real need from industry... of course we
> >can debate if it is a "niche" requirement or not
> 
> netem is not in the processing hot path, so there is no issue to add an
> additional component. If there are some[TM] users and it is usable, I am
> fine with this patch!
> 
> >tc qdisc change dev wlan0 root netem loss 2 10
> >
> >because this produces broken results...
> 
> How to model this specific network characteristic (2% loss, correlation 10%)
> with your modifications? Can you give us an example?

The old model was useful, but it really didn't do correlated loss.
For legacy, the old syntax will go through the same code and generate
the same result.

iproute2 syntax is not finalized but, plan is simplified version of
the NetemCLG paper.

tc qdisc change dev eth0 root netem 
      loss 2 10                              # compat syntax
      loss random 2 10                       # same as above
      loss deterministic file                # loss model based on bitmap
      loss state p13 [p31 [p32 [p23 [p14]]]] # 4 state 
      loss model  p [r [1-h [1-k]]]          # gilbert elliot model

Any suggestions for better syntax are appreciated.

^ permalink raw reply

* Re: [PATCH] net: fix problem in dequeuing from input_pkt_queue
From: Changli Gao @ 2010-05-20  0:09 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, eric.dumazet, netdev
In-Reply-To: <AANLkTilxZdJifPtBANrDNdYWxiAMLY6AwA0KYm1yWjle@mail.gmail.com>

On Thu, May 20, 2010 at 7:58 AM, Tom Herbert <therbert@google.com> wrote:
>>>        napi->weight = weight_p;
>>> -       local_irq_disable();
>>>        while (work < quota) {
>>>                struct sk_buff *skb;
>>>                unsigned int qlen;
>>>
>>>                while ((skb = __skb_dequeue(&sd->process_queue))) {
>>> -                       local_irq_enable();
>>
>> we need to keep local irq disabled. If not, flush_backlog may be
>> called, and it will access sd->process_queue.
>>
>
> It should be okay?  process_backlog only runs in softirq so bottom
> halves are already disabled, and I don't think flush_backlog runs out
> of an interrupt.
>

Oh no. It is an IRQ handler.

  on_each_cpu(flush_backlog, dev, 1);
...
int on_each_cpu(void (*func) (void *info), void *info, int wait)
{
        int ret = 0;

        preempt_disable();
        ret = smp_call_function(func, info, wait);
        local_irq_disable();
        func(info);
        local_irq_enable();
        preempt_enable();
        return ret;
}

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox