Netdev List
 help / color / mirror / Atom feed
* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  3:42 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

On Thu, May 20, 2010 at 01:34:46PM +1000, Herbert Xu wrote:
>
> cls_cgroup: Store classid in struct sock
> 
> Up until now cls_cgroup has relied on fetching the classid out of
> the current executing thread.  This runs into trouble when a packet
> processing is delayed in which case it may execute out of another
> thread's context.

Oh please add this bit that I forgot about:

This also addresses the case where TCP transmits a queued packet
in response to an inbound ACK.  Currently this isn't classified
at all as we would be in softirq context.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: David Miller @ 2010-05-20  3:46 UTC (permalink / raw)
  To: herbert; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 13:34:46 +1000

> cls_cgroup: Store classid in struct sock

This looks great to me.

^ permalink raw reply

* Re: [PATCH 2/2] net: Add classid to sk_buff.
From: David Miller @ 2010-05-20  3:47 UTC (permalink / raw)
  To: herbert; +Cc: bmb, tgraf, nhorman, nhorman, eric.dumazet, netdev
In-Reply-To: <20100520033643.GA6630@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 May 2010 13:36:43 +1000

> On Wed, May 19, 2010 at 07:55:38PM -0700, David Miller wrote:
>> 
>> We make this zero cost by moving queue_mapping into an existing
>> empty __u16 slot.  Thus making a __u32 available, which we use
>> for the 'classid'.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> You better hide this space before someone steals it :)

Indeed, I didn't even know we had it to begin with. :)

^ permalink raw reply

* Re: [PATCH] vhost-net: utilize PUBLISH_USED_IDX feature
From: Rusty Russell @ 2010-05-20  4:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, davem, Juan Quintela, Paul E. McKenney, Arnd Bergmann,
	kvm, virtualization, netdev, linux-kernel, alex.williamson,
	amit.shah
In-Reply-To: <20100519222718.GB4111@redhat.com>

On Thu, 20 May 2010 07:57:18 am Michael S. Tsirkin wrote:
> On Wed, May 19, 2010 at 08:04:51PM +0300, Avi Kivity wrote:
> > On 05/18/2010 04:19 AM, Michael S. Tsirkin wrote:
> >> With PUBLISH_USED_IDX, guest tells us which used entries
> >> it has consumed. This can be used to reduce the number
> >> of interrupts: after we write a used entry, if the guest has not yet
> >> consumed the previous entry, or if the guest has already consumed the
> >> new entry, we do not need to interrupt.
> >> This imporves bandwidth by 30% under some workflows.
> >>
> >> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >> ---
> >>
> >> Rusty, Dave, this patch depends on the patch
> >> "virtio: put last seen used index into ring itself"
> >> which is currently destined at Rusty's tree.
> >> Rusty, if you are taking that one for 2.6.35, please
> >> take this one as well.
> >> Dave, any objections?
> >>    
> >
> > I object: I think the index should have its own cacheline,
> 
> The issue here is that host/guest do not know each
> other's cache line size. I guess we could just put it
> at offset 128 or something like that ... Rusty?

I was assuming you'd put it at the end of the padding.

I think it's a silly optimization, but Avi obviously feels strongly about
it and I respect his opinion.

Please resubmit...
Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCH] net: fix problem in dequeuing from input_pkt_queue
From: Eric Dumazet @ 2010-05-20  4:37 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Changli Gao, davem, netdev
In-Reply-To: <AANLkTimv8YJD61uADUrHoeL4xGfE8wPPUwVpwJNZPFpu@mail.gmail.com>

Le mercredi 19 mai 2010 à 19:48 -0700, Tom Herbert a écrit :
> >> It should be okay?  process_backlog only runs in softirq so bottom
> >> halves are already disabled, and I don't think flush_backlog runs out
> >> of an interrupt.
> >>
> >
> > Oh no. It is an IRQ handler.
> >
> Very well, I will fix that.
> 
> Now I'm wondering, though, what the purpose of flush_backlog is...
> since __netif_receive_skb is called with interrupts enabled it's
> obvious flush_backlog won't catch all the skb's that reference the
> device go away.  Is there a reason these packets need to be flushed
> and can't just be processed?

flush_backlog is called when device is dismantled.

No new packets should be generated by the device at this moment.

Could you please split your patch in units, I spent 20 minutes to review
it and come to same conclusion than Changli (need to disable interrupts
as they are currently disabled) and also :

input_queue_head_incr(sd); are _not_ needed in flush_backlog()

We are in the very last moments of the life of the device, in a very
unlikely situation (packets in flight, not already consumed by the cpu),
we are _dropping_ packets, so OOO means nothing at this point. 

In dev_cpu_callback(), you reverse the order of input_pkt_queue /
process_queue.

Thats fine, but should be a single patch, because I am not sure the
input_queue_head_incr() are valid here, since we re-inject these packets
to netif_rx(). Could you clarify this point ?

Thanks !



^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-20  4:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

Le jeudi 20 mai 2010 à 13:34 +1000, Herbert Xu a écrit :
> On Wed, May 19, 2010 at 08:05:22PM -0700, David Miller wrote:
> > Ok, looking forward to it.
> 
> Here we go:
> 
> cls_cgroup: Store classid in struct sock
> 
> Up until now cls_cgroup has relied on fetching the classid out of
> the current executing thread.  This runs into trouble when a packet
> processing is delayed in which case it may execute out of another
> thread's context.
> 
> Furthermore, even when a packet is not delayed we may fail to
> classify it if soft IRQs have been disabled, because this scenario
> is indistinguishable from one where a packet unrelated to the
> current thread is processed by a real soft IRQ.
> 
> As we already have a concept of packet ownership for accounting
> purposes in the skb->sk pointer, this is a natural place to store
> the classid in a persistent manner.
> 
> This patch adds the cls_cgroup classid in struct sock, filling up
> an existing hole on 64-bit :)
> 
> The value is set at socket creation time.  So all sockets created
> via socket(2) automatically gains the ID of the thread creating it.
> Now you may argue that this may not be the same as the thread that
> is sending the packet.  However, we already have a precedence where
> an fd is passed to a different thread, its security property is
> inherited.  In this case, inheriting the classid of the thread
> creating the socket is also the logical thing to do.
> 
> For sockets created on inbound connections through accept(2), we
> inherit the classid of the original listening socket through
> sk_clone, possibly preceding the actual accept(2) call.
> 
> In order to minimise risks, I have not made this the authoritative
> classid.  For now it is only used as a backup when we execute
> with soft IRQs disabled.  Once we're completely happy with its
> semantics we can use it as the sole classid.
> 
> Footnote: I have rearranged the error path on cls_group module
> creation.  If we didn't do this, then there is a window where
> someone could create a tc rule using cls_group before the cgroup
> subsystem has been registered.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 8f78073..63c5036 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -410,6 +410,12 @@ struct cgroup_scanner {
>  	void *data;
>  };
>  
> +struct cgroup_cls_state
> +{
> +	struct cgroup_subsys_state css;
> +	u32 classid;
> +};
> +
>  /*
>   * Add a new file to the given cgroup directory. Should only be
>   * called by subsystems from within a populate() method
> @@ -515,6 +521,10 @@ struct cgroup_subsys {
>  	struct module *module;
>  };
>  
> +#ifndef CONFIG_NET_CLS_CGROUP
> +extern int net_cls_subsys_id;
> +#endif
> +
>  #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
>  #include <linux/cgroup_subsys.h>
>  #undef SUBSYS
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 1ad6435..361a5dc 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -307,7 +307,7 @@ struct sock {
>  	void			*sk_security;
>  #endif
>  	__u32			sk_mark;
> -	/* XXX 4 bytes hole on 64 bit */
> +	u32			sk_classid;
>  	void			(*sk_state_change)(struct sock *sk);
>  	void			(*sk_data_ready)(struct sock *sk, int bytes);
>  	void			(*sk_write_space)(struct sock *sk);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index c5812bb..70bc529 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -110,6 +110,7 @@
>  #include <linux/tcp.h>
>  #include <linux/init.h>
>  #include <linux/highmem.h>
> +#include <linux/cgroup.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/system.h>
> @@ -217,6 +218,32 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
>  int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
>  EXPORT_SYMBOL(sysctl_optmem_max);
>  
> +#ifdef CONFIG_NET_CLS_CGROUP
> +static inline u32 task_cls_classid(struct task_struct *p)
> +{
> +	return container_of(task_subsys_state(p, net_cls_subsys_id),
> +			    struct cgroup_cls_state, css).classid;
> +}
> +#else
> +int net_cls_subsys_id = -1;
> +EXPORT_SYMBOL_GPL(net_cls_subsys_id);
> +
> +static inline u32 task_cls_classid(struct task_struct *p)
> +{
> +	int id;
> +	u32 classid;
> +
> +	rcu_read_lock();
> +	id = rcu_dereference(net_cls_subsys_id);
> +	if (id >= 0)
> +		classid = container_of(task_subsys_state(p, id),
> +				       struct cgroup_cls_state, css)->classid;
> +	rcu_read_unlock();
> +
> +	return classid;
> +}
> +#endif
> +
>  static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
>  {
>  	struct timeval tv;
> @@ -1064,6 +1091,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
>  		sock_lock_init(sk);
>  		sock_net_set(sk, get_net(net));
>  		atomic_set(&sk->sk_wmem_alloc, 1);
> +
> +		if (!in_interrupt())
> +			sk->sk_classid = task_cls_classid(current);

It means we cache current classid of this process.

Can it change later ?


>  	}
>  
>  	return sk;
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 2211803..32c1296 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -16,14 +16,10 @@
>  #include <linux/errno.h>
>  #include <linux/skbuff.h>
>  #include <linux/cgroup.h>
> +#include <linux/rcupdate.h>
>  #include <net/rtnetlink.h>
>  #include <net/pkt_cls.h>
> -
> -struct cgroup_cls_state
> -{
> -	struct cgroup_subsys_state css;
> -	u32 classid;
> -};
> +#include <net/sock.h>
>  
>  static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
>  					       struct cgroup *cgrp);
> @@ -112,6 +108,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
>  	struct cls_cgroup_head *head = tp->root;
>  	u32 classid;
>  
> +	rcu_read_lock();
> +	classid = task_cls_state(current)->classid;
> +	rcu_read_unlock();
> +

It would be more readable to use :

static inline int task_cls_state(struct task_struct *p)
{
	int classid;

	rcu_read_lock();
	classid = container_of(task_subsys_state(p, net_cls_subsys_id),
				struct cgroup_cls_state, css)->classid;
	rcu_read_unlock();
	return classid;
}

...

classid = task_cls_classid(current);

>  	/*
>  	 * Due to the nature of the classifier it is required to ignore all
>  	 * packets originating from softirq context as accessing `current'
> @@ -122,12 +122,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
>  	 * calls by looking at the number of nested bh disable calls because
>  	 * softirqs always disables bh.
>  	 */
> -	if (softirq_count() != SOFTIRQ_OFFSET)
> -		return -1;
> -
> -	rcu_read_lock();
> -	classid = task_cls_state(current)->classid;
> -	rcu_read_unlock();
> +	if (softirq_count() != SOFTIRQ_OFFSET) {

Why do we still need this previous test ?

> +	 	/* If there is an sk_classid we'll use that. */
> +		if (!skb->sk)
> +			return -1;
> +		classid = skb->sk->sk_classid;
> +	}
>  
>  	if (!classid)
>  		return -1;
> @@ -289,18 +289,35 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
>  
>  static int __init init_cgroup_cls(void)
>  {
> -	int ret = register_tcf_proto_ops(&cls_cgroup_ops);
> -	if (ret)
> -		return ret;
> +	int ret;
> +
>  	ret = cgroup_load_subsys(&net_cls_subsys);
>  	if (ret)
> -		unregister_tcf_proto_ops(&cls_cgroup_ops);
> +		goto out;
> +
> +#ifndef CONFIG_NET_CLS_CGROUP
> +	/* We can't use rcu_assign_pointer because this is an int. */
> +	smp_wmb();
> +	net_cls_subsys_id = net_cls_subsys.subsys_id;
> +#endif
> +
> +	ret = register_tcf_proto_ops(&cls_cgroup_ops);
> +	if (ret)
> +		cgroup_unload_subsys(&net_cls_subsys);
> +
> +out:
>  	return ret;
>  }
>  
>  static void __exit exit_cgroup_cls(void)
>  {
>  	unregister_tcf_proto_ops(&cls_cgroup_ops);
> +
> +#ifndef CONFIG_NET_CLS_CGROUP
> +	net_cls_subsys_id = -1;
> +	synchronize_rcu();
> +#endif
> +
>  	cgroup_unload_subsys(&net_cls_subsys);
>  }
>  
> 
> Thanks,



^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  5:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <1274331255.2658.24.camel@edumazet-laptop>

On Thu, May 20, 2010 at 06:54:15AM +0200, Eric Dumazet wrote:
>
> > @@ -1064,6 +1091,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
> >  		sock_lock_init(sk);
> >  		sock_net_set(sk, get_net(net));
> >  		atomic_set(&sk->sk_wmem_alloc, 1);
> > +
> > +		if (!in_interrupt())
> > +			sk->sk_classid = task_cls_classid(current);
> 
> It means we cache current classid of this process.
> 
> Can it change later ?

Please read my patchset description.  I explained everything there.

> > @@ -112,6 +108,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
> >  	struct cls_cgroup_head *head = tp->root;
> >  	u32 classid;
> >  
> > +	rcu_read_lock();
> > +	classid = task_cls_state(current)->classid;
> > +	rcu_read_unlock();
> > +
> 
> It would be more readable to use :

Sure, however, I'm just moving the existing code around, so I
wanted it to be exactly the same.  Clean-ups should be done as
a follow-up, feel free to do this if you like.

> > @@ -122,12 +122,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
> >  	 * calls by looking at the number of nested bh disable calls because
> >  	 * softirqs always disables bh.
> >  	 */
> > -	if (softirq_count() != SOFTIRQ_OFFSET)
> > -		return -1;
> > -
> > -	rcu_read_lock();
> > -	classid = task_cls_state(current)->classid;
> > -	rcu_read_unlock();
> > +	if (softirq_count() != SOFTIRQ_OFFSET) {
> 
> Why do we still need this previous test ?

You didn't read my patchset description at all, did you? :)

This patchset is trying to fix the issue at hand with the minimum
amount of changes to current behaviour.  Once we're happy with the
new semantics, we can get rid of the softirq stuff.

That will be done in a later patch to minimise the risks.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-20  5:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520033446.GA6434@gondor.apana.org.au>

Le jeudi 20 mai 2010 à 13:34 +1000, Herbert Xu a écrit :

> The value is set at socket creation time.  So all sockets created
> via socket(2) automatically gains the ID of the thread creating it.
> Now you may argue that this may not be the same as the thread that
> is sending the packet.  However, we already have a precedence where
> an fd is passed to a different thread, its security property is
> inherited.  In this case, inheriting the classid of the thread
> creating the socket is also the logical thing to do.

I find this very biased, sorry.

In fact, fd passing is just fine today, if we consider that we classify
packets using the identity of the process *using* the fd, not the one
that *created* it.

Now your patch changes this, to the reverse, and you justify the caching
effect on socket. Sorry, this must be too convoluted for me.





^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  5:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <1274332507.2658.31.camel@edumazet-laptop>

On Thu, May 20, 2010 at 07:15:07AM +0200, Eric Dumazet wrote:
> 
> I find this very biased, sorry.
> 
> In fact, fd passing is just fine today, if we consider that we classify
> packets using the identity of the process *using* the fd, not the one
> that *created* it.
> 
> Now your patch changes this, to the reverse, and you justify the caching
> effect on socket. Sorry, this must be too convoluted for me.

I'm sorry you find this convoluted, but using the sending process's
classid is inherently broken.

Here is why: consider a TCP socket shared by two processes with
different classids both writing data to it.  Now suppose further
that each writes just one byte, which is then coalesced into a
single skb.

Whose classid should we use on the resulting skb?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: move slave MTU handling from sysfs V2
From: Jiri Pirko @ 2010-05-20  5:34 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, davem, bonding-devel, monis
In-Reply-To: <628.1274314061@death.nxdomain.ibm.com>

Thu, May 20, 2010 at 02:07:41AM CEST, fubar@us.ibm.com wrote:
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>>V1->V2: corrected res/ret use
>>
>>For some reason, MTU handling (storing, and restoring) is taking  place in
>>bond_sysfs. The correct place for this code is in bond_enslave, bond_release.
>>So move it there.
>
>	In principle this looks ok, as do the other patches, but none of
>them apply to net-next-2.6 for me except for the "optimize
>tlb_get_least_loaded_slave" patch.  It looks like you left out a patch,
>see below.
>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>---
>> drivers/net/bonding/bond_main.c  |   15 ++++++++++++++-
>> drivers/net/bonding/bond_sysfs.c |   22 ++--------------------
>> 2 files changed, 16 insertions(+), 21 deletions(-)
>>
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index 5e12462..2c3f9db 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -1533,6 +1533,14 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>> 	 */
>> 	new_slave->original_flags = slave_dev->flags;
>>
>>+	/* Save slave's original mtu and then set it to match the bond */
>>+	new_slave->original_mtu = slave_dev->mtu;
>>+	res = dev_set_mtu(slave_dev, bond->dev->mtu);
>>+	if (res) {
>>+		pr_debug("Error %d calling dev_set_mtu\n", res);
>>+		goto err_free;
>>+	}
>>+
>> 	/*
>> 	 * Save slave's original ("permanent") mac address for modes
>> 	 * that need it, and for restoring it upon release, and then
>>@@ -1550,7 +1558,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>> 		res = dev_set_mac_address(slave_dev, &addr);
>> 		if (res) {
>> 			pr_debug("Error %d calling set_mac_address\n", res);
>>-			goto err_free;
>>+			goto err_restore_mtu;
>> 		}
>> 	}
>>
>>@@ -1785,6 +1793,9 @@ err_restore_mac:
>> 		dev_set_mac_address(slave_dev, &addr);
>> 	}
>>
>>+err_restore_mtu:
>>+	dev_set_mtu(slave_dev, new_slave->original_mtu);
>>+
>> err_free:
>> 	kfree(new_slave);
>>
>>@@ -1969,6 +1980,8 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
>> 		dev_set_mac_address(slave_dev, &addr);
>> 	}
>>
>>+	dev_set_mtu(slave_dev, slave->original_mtu);
>>+
>> 	slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
>> 				   IFF_SLAVE_INACTIVE | IFF_BONDING |
>> 				   IFF_SLAVE_NEEDARP);
>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>index 392e291..29a7a8a 100644
>>--- a/drivers/net/bonding/bond_sysfs.c
>>+++ b/drivers/net/bonding/bond_sysfs.c
>>@@ -220,7 +220,6 @@ static ssize_t bonding_store_slaves(struct device *d,
>> 	char command[IFNAMSIZ + 1] = { 0, };
>> 	char *ifname;
>> 	int i, res, ret = count;
>>-	u32 original_mtu;
>> 	struct slave *slave;
>> 	struct net_device *dev = NULL;
>> 	struct bonding *bond = to_bond(d);
>
>	This chunk doesn't apply to net-next-2.6 because your context
>doesn't match; it looks like you've removed the variable "found" in your
>"before" source.  On closer inspection, "found" isn't actually used
>meaningfully, so I'm guessing you removed it in a prior patch but didn't
>submit that patch.
>
>	If that's the case, could you repost the whole series, with
>sequence numbers?

I don't think that's necessary for now. The patch removing found was posted as a
first one:
http://patchwork.ozlabs.org/patch/52795/

I tried it several times. Patches are cleanly applicable in order I posted it.

Jirka
>
>	-J
>
>>@@ -281,18 +280,7 @@ static ssize_t bonding_store_slaves(struct device *d,
>> 			memcpy(bond->dev->dev_addr, dev->dev_addr,
>> 			       dev->addr_len);
>>
>>-		/* Set the slave's MTU to match the bond */
>>-		original_mtu = dev->mtu;
>>-		res = dev_set_mtu(dev, bond->dev->mtu);
>>-		if (res) {
>>-			ret = res;
>>-			goto out;
>>-		}
>>-
>> 		res = bond_enslave(bond->dev, dev);
>>-		bond_for_each_slave(bond, slave, i)
>>-			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0)
>>-				slave->original_mtu = original_mtu;
>> 		if (res)
>> 			ret = res;
>>
>>@@ -301,23 +289,17 @@ static ssize_t bonding_store_slaves(struct device *d,
>>
>> 	if (command[0] == '-') {
>> 		dev = NULL;
>>-		original_mtu = 0;
>> 		bond_for_each_slave(bond, slave, i)
>> 			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
>> 				dev = slave->dev;
>>-				original_mtu = slave->original_mtu;
>> 				break;
>> 			}
>> 		if (dev) {
>> 			pr_info("%s: Removing slave %s\n",
>> 				bond->dev->name, dev->name);
>>-				res = bond_release(bond->dev, dev);
>>-			if (res) {
>>+			res = bond_release(bond->dev, dev);
>>+			if (res)
>> 				ret = res;
>>-				goto out;
>>-			}
>>-			/* set the slave MTU to the default */
>>-			dev_set_mtu(dev, original_mtu);
>> 		} else {
>> 			pr_err("unable to remove non-existent slave %s for bond %s.\n",
>> 			       ifname, bond->dev->name);
>>-- 
>>1.6.6.1
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-20  5:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520052059.GC7443@gondor.apana.org.au>

Le jeudi 20 mai 2010 à 15:20 +1000, Herbert Xu a écrit :
> On Thu, May 20, 2010 at 07:15:07AM +0200, Eric Dumazet wrote:
> > 
> > I find this very biased, sorry.
> > 
> > In fact, fd passing is just fine today, if we consider that we classify
> > packets using the identity of the process *using* the fd, not the one
> > that *created* it.
> > 
> > Now your patch changes this, to the reverse, and you justify the caching
> > effect on socket. Sorry, this must be too convoluted for me.
> 
> I'm sorry you find this convoluted, but using the sending process's
> classid is inherently broken.
> 
> Here is why: consider a TCP socket shared by two processes with
> different classids both writing data to it.  Now suppose further
> that each writes just one byte, which is then coalesced into a
> single skb.
> 
> Whose classid should we use on the resulting skb?

I am ok with any kind of clarification, if its really documented as
such, not as indirect effects of changes.

Right now, I am not sure classification is performed by the current
process/cpu. Our queue handling can process packets queued by other cpus
while we own the queue (__QDISC_STATE_RUNNING,) anyway.




^ permalink raw reply

* will 2 cpu simultaneously  process packets which have same hash value on multiqueue nic?
From: Jon Zhou @ 2010-05-20  5:37 UTC (permalink / raw)
  To: netdev@vger.kernel.org

will 2 cpu simultaneously  process packets which have same hash value on multiqueue nic?

let 's take broadcom 57711 bnx2x_main.c as an example:

#1 packet1->queue=1
#2 packet2->queue=1

will cpu1 and cpu2 execute the function " bnx2x_rx_int" in parallel, to receive packet1 & packet2

or it just depend on smp_affinity setting?

thanks
jon

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  5:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <1274333779.2658.43.camel@edumazet-laptop>

On Thu, May 20, 2010 at 07:36:19AM +0200, Eric Dumazet wrote:
> 
> I am ok with any kind of clarification, if its really documented as
> such, not as indirect effects of changes.

But I did document this semantic change, quite verbosely too
at that :)

> Right now, I am not sure classification is performed by the current
> process/cpu. Our queue handling can process packets queued by other cpus
> while we own the queue (__QDISC_STATE_RUNNING,) anyway.

Classification should be done when the packet is enqueued.  So
you shouldn't really see other people's traffic.

The original requeueing would have broken this too, but that is
no longer with us :)

In my original submission I forgot to mention the case of TCP
transmission triggered by an incoming ACK, which is really the
same case as this.

So let me take this opportunity to resubmit with an updated
description:

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Now you may argue that this may not be the same as the thread that
is sending the packet.  However, we already have a precedence where
an fd is passed to a different thread, its security property is
inherited.  In this case, inheriting the classid of the thread
creating the socket is also the logical thing to do.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8f78073..63c5036 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -410,6 +410,12 @@ struct cgroup_scanner {
 	void *data;
 };
 
+struct cgroup_cls_state
+{
+	struct cgroup_subsys_state css;
+	u32 classid;
+};
+
 /*
  * Add a new file to the given cgroup directory. Should only be
  * called by subsystems from within a populate() method
@@ -515,6 +521,10 @@ struct cgroup_subsys {
 	struct module *module;
 };
 
+#ifndef CONFIG_NET_CLS_CGROUP
+extern int net_cls_subsys_id;
+#endif
+
 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
 #include <linux/cgroup_subsys.h>
 #undef SUBSYS
diff --git a/include/net/sock.h b/include/net/sock.h
index 1ad6435..361a5dc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -307,7 +307,7 @@ struct sock {
 	void			*sk_security;
 #endif
 	__u32			sk_mark;
-	/* XXX 4 bytes hole on 64 bit */
+	u32			sk_classid;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index c5812bb..70bc529 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -110,6 +110,7 @@
 #include <linux/tcp.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
+#include <linux/cgroup.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
@@ -217,6 +218,32 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
 
+#ifdef CONFIG_NET_CLS_CGROUP
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	return container_of(task_subsys_state(p, net_cls_subsys_id),
+			    struct cgroup_cls_state, css).classid;
+}
+#else
+int net_cls_subsys_id = -1;
+EXPORT_SYMBOL_GPL(net_cls_subsys_id);
+
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	int id;
+	u32 classid;
+
+	rcu_read_lock();
+	id = rcu_dereference(net_cls_subsys_id);
+	if (id >= 0)
+		classid = container_of(task_subsys_state(p, id),
+				       struct cgroup_cls_state, css)->classid;
+	rcu_read_unlock();
+
+	return classid;
+}
+#endif
+
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
 	struct timeval tv;
@@ -1064,6 +1091,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_lock_init(sk);
 		sock_net_set(sk, get_net(net));
 		atomic_set(&sk->sk_wmem_alloc, 1);
+
+		if (!in_interrupt())
+			sk->sk_classid = task_cls_classid(current);
 	}
 
 	return sk;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 2211803..32c1296 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -16,14 +16,10 @@
 #include <linux/errno.h>
 #include <linux/skbuff.h>
 #include <linux/cgroup.h>
+#include <linux/rcupdate.h>
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
-
-struct cgroup_cls_state
-{
-	struct cgroup_subsys_state css;
-	u32 classid;
-};
+#include <net/sock.h>
 
 static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
 					       struct cgroup *cgrp);
@@ -112,6 +108,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	struct cls_cgroup_head *head = tp->root;
 	u32 classid;
 
+	rcu_read_lock();
+	classid = task_cls_state(current)->classid;
+	rcu_read_unlock();
+
 	/*
 	 * Due to the nature of the classifier it is required to ignore all
 	 * packets originating from softirq context as accessing `current'
@@ -122,12 +122,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	 * calls by looking at the number of nested bh disable calls because
 	 * softirqs always disables bh.
 	 */
-	if (softirq_count() != SOFTIRQ_OFFSET)
-		return -1;
-
-	rcu_read_lock();
-	classid = task_cls_state(current)->classid;
-	rcu_read_unlock();
+	if (softirq_count() != SOFTIRQ_OFFSET) {
+	 	/* If there is an sk_classid we'll use that. */
+		if (!skb->sk)
+			return -1;
+		classid = skb->sk->sk_classid;
+	}
 
 	if (!classid)
 		return -1;
@@ -289,18 +289,35 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
 
 static int __init init_cgroup_cls(void)
 {
-	int ret = register_tcf_proto_ops(&cls_cgroup_ops);
-	if (ret)
-		return ret;
+	int ret;
+
 	ret = cgroup_load_subsys(&net_cls_subsys);
 	if (ret)
-		unregister_tcf_proto_ops(&cls_cgroup_ops);
+		goto out;
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	/* We can't use rcu_assign_pointer because this is an int. */
+	smp_wmb();
+	net_cls_subsys_id = net_cls_subsys.subsys_id;
+#endif
+
+	ret = register_tcf_proto_ops(&cls_cgroup_ops);
+	if (ret)
+		cgroup_unload_subsys(&net_cls_subsys);
+
+out:
 	return ret;
 }
 
 static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	net_cls_subsys_id = -1;
+	synchronize_rcu();
+#endif
+
 	cgroup_unload_subsys(&net_cls_subsys);
 }
 
Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-20  6:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520054642.GA7836@gondor.apana.org.au>

Le jeudi 20 mai 2010 à 15:46 +1000, Herbert Xu a écrit :

> +#ifdef CONFIG_NET_CLS_CGROUP
> +static inline u32 task_cls_classid(struct task_struct *p)
> +{
> +	return container_of(task_subsys_state(p, net_cls_subsys_id),
> +			    struct cgroup_cls_state, css).classid;
> +}
> +#else
> +int net_cls_subsys_id = -1;
> +EXPORT_SYMBOL_GPL(net_cls_subsys_id);
> +
> +static inline u32 task_cls_classid(struct task_struct *p)
> +{
> +	int id;
> +	u32 classid;
> +
> +	rcu_read_lock();
> +	id = rcu_dereference(net_cls_subsys_id);
> +	if (id >= 0)
> +		classid = container_of(task_subsys_state(p, id),
> +				       struct cgroup_cls_state, css)->classid;
> +	rcu_read_unlock();
> +
> +	return classid;
> +}
> +#endif

I still dont understand why you introduce this function and not reuse it
in net/sched/cls_cgroup.c

You told me it needs a separate patch, I dont think so.
Just make it non static and EXPORTed



> +	rcu_read_lock();
> +	classid = task_cls_state(current)->classid;
> +	rcu_read_unlock();
> +




^ permalink raw reply

* Re: [PATCH] net: fix problem in dequeuing from input_pkt_queue
From: Tom Herbert @ 2010-05-20  6:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, davem, netdev
In-Reply-To: <1274330246.2658.16.camel@edumazet-laptop>

On Wed, May 19, 2010 at 9:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 19 mai 2010 à 19:48 -0700, Tom Herbert a écrit :
>> >> It should be okay?  process_backlog only runs in softirq so bottom
>> >> halves are already disabled, and I don't think flush_backlog runs out
>> >> of an interrupt.
>> >>
>> >
>> > Oh no. It is an IRQ handler.
>> >
>> Very well, I will fix that.
>>
>> Now I'm wondering, though, what the purpose of flush_backlog is...
>> since __netif_receive_skb is called with interrupts enabled it's
>> obvious flush_backlog won't catch all the skb's that reference the
>> device go away.  Is there a reason these packets need to be flushed
>> and can't just be processed?
>
> flush_backlog is called when device is dismantled.
>
> No new packets should be generated by the device at this moment.
>
But again since __netif_receive_skb is called with interrupts disabled
there is still a hole that the device could be completely dismantled
but at least one packet from the device still will be processed.  So
it seems like that's a bug, or maybe it's okay to process packets
after flush_backlog-- if the latter case were true why throw out
perfectly good packets?  The only rationale I can think of for
flush_backlog is to eliminate skb's with references to device that has
gone away, but the mechanism does not seem sufficient to cover all
possible skb's with a reference.

> Could you please split your patch in units, I spent 20 minutes to review
> it and come to same conclusion than Changli (need to disable interrupts
> as they are currently disabled) and also :
>
> input_queue_head_incr(sd); are _not_ needed in flush_backlog()
>
I don't see why they wouldn't be needed.  queue tail is incremented
when queuing to the input_pkt_queue, queue head is incremented when
dequeuing (after skb freed or processed).  queue_tail-queue_head ==
input_pkt_queue.len+process_queue.len.  These should be invariants.

> We are in the very last moments of the life of the device, in a very
> unlikely situation (packets in flight, not already consumed by the cpu),
> we are _dropping_ packets, so OOO means nothing at this point.
>
True.  But still seems nice to handle process_queue to be consistent.

> In dev_cpu_callback(), you reverse the order of input_pkt_queue /
> process_queue.
>
> Thats fine, but should be a single patch, because I am not sure the
> input_queue_head_incr() are valid here, since we re-inject these packets
> to netif_rx(). Could you clarify this point ?
>
queue_head advances on every dequeue.  See above...

Thanks for the great comments!

Tom

> Thanks !
>
>
>

^ permalink raw reply

* Re: [PATCH] vhost-net: utilize PUBLISH_USED_IDX feature
From: Avi Kivity @ 2010-05-20  6:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: davem, Juan Quintela, Rusty Russell, Paul E. McKenney,
	Arnd Bergmann, kvm, virtualization, netdev, linux-kernel,
	alex.williamson, amit.shah
In-Reply-To: <20100519222718.GB4111@redhat.com>

On 05/20/2010 01:27 AM, Michael S. Tsirkin wrote:
> On Wed, May 19, 2010 at 08:04:51PM +0300, Avi Kivity wrote:
>    
>> On 05/18/2010 04:19 AM, Michael S. Tsirkin wrote:
>>      
>>> With PUBLISH_USED_IDX, guest tells us which used entries
>>> it has consumed. This can be used to reduce the number
>>> of interrupts: after we write a used entry, if the guest has not yet
>>> consumed the previous entry, or if the guest has already consumed the
>>> new entry, we do not need to interrupt.
>>> This imporves bandwidth by 30% under some workflows.
>>>
>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>> ---
>>>
>>> Rusty, Dave, this patch depends on the patch
>>> "virtio: put last seen used index into ring itself"
>>> which is currently destined at Rusty's tree.
>>> Rusty, if you are taking that one for 2.6.35, please
>>> take this one as well.
>>> Dave, any objections?
>>>
>>>        
>> I object: I think the index should have its own cacheline,
>>      
> The issue here is that host/guest do not know each
> other's cache line size. I guess we could just put it
> at offset 128 or something like that ... Rusty?
>    

Not so pretty, but ok.

>    
>> and that it should be documented before merging.
>>      
> I think you meant to object to the virtio patch, not this one.  This
> patch does not introduce new layout, just implements host support.
> virtio spec patch will follow: it is not part of linux tree so
> there is no patch dependency.
>    

There is a reviewer dependency.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  6:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <1274335415.2658.46.camel@edumazet-laptop>

On Thu, May 20, 2010 at 08:03:35AM +0200, Eric Dumazet wrote:
> Le jeudi 20 mai 2010 à 15:46 +1000, Herbert Xu a écrit :
> 
> > +#ifdef CONFIG_NET_CLS_CGROUP
> > +static inline u32 task_cls_classid(struct task_struct *p)
> > +{
> > +	return container_of(task_subsys_state(p, net_cls_subsys_id),
> > +			    struct cgroup_cls_state, css).classid;
> > +}
> > +#else
> > +int net_cls_subsys_id = -1;
> > +EXPORT_SYMBOL_GPL(net_cls_subsys_id);
> > +
> > +static inline u32 task_cls_classid(struct task_struct *p)
> > +{
> > +	int id;
> > +	u32 classid;
> > +
> > +	rcu_read_lock();
> > +	id = rcu_dereference(net_cls_subsys_id);
> > +	if (id >= 0)
> > +		classid = container_of(task_subsys_state(p, id),
> > +				       struct cgroup_cls_state, css)->classid;
> > +	rcu_read_unlock();
> > +
> > +	return classid;
> > +}
> > +#endif
> 
> I still dont understand why you introduce this function and not reuse it
> in net/sched/cls_cgroup.c
> 
> You told me it needs a separate patch, I dont think so.
> Just make it non static and EXPORTed

Because it doesn't have to go through this RCU crap.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  6:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520061138.GA8183@gondor.apana.org.au>

On Thu, May 20, 2010 at 04:11:38PM +1000, Herbert Xu wrote:
> On Thu, May 20, 2010 at 08:03:35AM +0200, Eric Dumazet wrote:
> > Le jeudi 20 mai 2010 à 15:46 +1000, Herbert Xu a écrit :
> > 
> > > +#ifdef CONFIG_NET_CLS_CGROUP
> > > +static inline u32 task_cls_classid(struct task_struct *p)
> > > +{
> > > +	return container_of(task_subsys_state(p, net_cls_subsys_id),
> > > +			    struct cgroup_cls_state, css).classid;
> > > +}
> > > +#else
> > > +int net_cls_subsys_id = -1;
> > > +EXPORT_SYMBOL_GPL(net_cls_subsys_id);
> > > +
> > > +static inline u32 task_cls_classid(struct task_struct *p)
> > > +{
> > > +	int id;
> > > +	u32 classid;
> > > +
> > > +	rcu_read_lock();
> > > +	id = rcu_dereference(net_cls_subsys_id);
> > > +	if (id >= 0)
> > > +		classid = container_of(task_subsys_state(p, id),
> > > +				       struct cgroup_cls_state, css)->classid;
> > > +	rcu_read_unlock();
> > > +
> > > +	return classid;
> > > +}
> > > +#endif
> > 
> > I still dont understand why you introduce this function and not reuse it
> > in net/sched/cls_cgroup.c
> > 
> > You told me it needs a separate patch, I dont think so.
> > Just make it non static and EXPORTed
> 
> Because it doesn't have to go through this RCU crap.

I'm talking about the rcu_dereference on net_cls_subsys_id of
course, not the rest of it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] fec: add support for Freescale i.MX25 PDK (3DS)
From: Sascha Hauer @ 2010-05-20  6:46 UTC (permalink / raw)
  To: Jean-Christophe Dubois
  Cc: netdev, linux-arm-kernel, Baruch Siach, Greg Ungerer,
	David Miller
In-Reply-To: <201005191715.52905.jcd@tribudubois.net>

On Wed, May 19, 2010 at 05:15:52PM +0200, Jean-Christophe Dubois wrote:
> le lundi 25 janvier 2010 Baruch Siach a écrit
> > Hi Greg, netdev,
> > 
> > On Wed, Dec 16, 2009 at 08:34:06AM +0200, Baruch Siach wrote:
> > > On Wed, Dec 16, 2009 at 10:13:56AM +1000, Greg Ungerer wrote:
> > > > Baruch Siach wrote:
> > > > >On Tue, Dec 15, 2009 at 09:52:24PM +1000, Greg Ungerer wrote:
> > > > >>On 12/15/2009 06:31 PM, Baruch Siach wrote:
> > > > >>>+#ifndef CONFIG_M5272
> > > > >>
> > > > >>I would suggest making this conditional on FEC_MIIGSK_ENR.
> > > > >>Although the CONFIG_M5272 is the only case here currently,
> > > > >>that may change over the years. And using this here may not
> > > > >>be obvious to the causual code reader, since the register
> > > > >>offset definitions don't explicitly key on CONFIG_M5272.
> > > > >
> > > > >OK, I'll change this conditional.
> > > > >
> > > > >Can I take this as an Ack from you?
> > > > 
> > > > With that conditional check changed, sure:
> > > > 
> > > > Acked-by:  Greg Ungerer <gerg@uclinux.org>
> > > 
> > > Thanks. The updated patch below.
> > 
> > I'm really sorry to bug on this again, but since the platform code is
> > already upstream the i.MX25 code doesn't build without this patch
> > (include/linux/fec.h missing). So, someone please pick up this patch,
> > preferably prior to .33.
> > 
> > baruch
> 
> I am just wondering if somebody is going to pick up this patch 
> (http://patchwork.ozlabs.org/patch/41235/) so that it finds its way on 
> mainline.

I'm much in favour of this patch. David?

Sascha


-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH 3/4] fec: add support for Freescale i.MX25 PDK (3DS)
From: David Miller @ 2010-05-20  6:49 UTC (permalink / raw)
  To: s.hauer; +Cc: jcd, netdev, linux-arm-kernel, baruch, gerg
In-Reply-To: <20100520064642.GT31199@pengutronix.de>

From: Sascha Hauer <s.hauer@pengutronix.de>
Date: Thu, 20 May 2010 08:46:43 +0200

> On Wed, May 19, 2010 at 05:15:52PM +0200, Jean-Christophe Dubois wrote:
>> 
>> I am just wondering if somebody is going to pick up this patch 
>> (http://patchwork.ozlabs.org/patch/41235/) so that it finds its way on 
>> mainline.
> 
> I'm much in favour of this patch. David?

I put it back into my queue.

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20  6:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bmb, tgraf, nhorman, nhorman, netdev
In-Reply-To: <20100520054642.GA7836@gondor.apana.org.au>

On Thu, May 20, 2010 at 03:46:42PM +1000, Herbert Xu wrote:
> 
> So let me take this opportunity to resubmit with an updated
> description:

OK Dave has convinced me that inheriting the cgroup of the process
creating a socket is not always the right thing to do.  So here's
a new patch that sets the socket classid to the last process with
a valid classid touching it.

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
new file mode 100644
index 0000000..a4cd15f
--- /dev/null
+++ b/include/net/cls_cgroup.h
@@ -0,0 +1,63 @@
+/*
+ * cls_cgroup.h			Control Group Classifier
+ * 
+ * Authors:	Thomas Graf <tgraf@suug.ch>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option) 
+ * any later version.
+ *
+ */
+
+#ifndef _NET_CLS_CGROUP_H
+#define _NET_CLS_CGROUP_H
+
+#include <linux/cgroup.h>
+#include <linux/hardirq.h>
+#include <linux/rcupdate.h>
+
+struct cgroup_cls_state
+{
+	struct cgroup_subsys_state css;
+	u32 classid;
+};
+
+#ifdef CONFIG_CGROUP
+#ifdef CONFIG_NET_CLS_CGROUP
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	if (in_interrupt())
+		return 0;
+
+	return container_of(task_subsys_state(p, net_cls_subsys_id),
+			    struct cgroup_cls_state, css).classid;
+}
+#else
+extern int net_cls_subsys_id;
+
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	int id;
+	u32 classid;
+
+	if (in_interrupt())
+		return 0;
+
+	rcu_read_lock();
+	id = rcu_dereference(net_cls_subsys_id);
+	if (id >= 0)
+		classid = container_of(task_subsys_state(p, id),
+				       struct cgroup_cls_state, css)->classid;
+	rcu_read_unlock();
+
+	return classid;
+}
+#endif
+#else
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	return 0;
+}
+#endif
+#endif  /* _NET_CLS_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 1ad6435..cf191ea 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -307,7 +307,7 @@ struct sock {
 	void			*sk_security;
 #endif
 	__u32			sk_mark;
-	/* XXX 4 bytes hole on 64 bit */
+	u32			sk_classid;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
@@ -1012,6 +1012,14 @@ extern void *sock_kmalloc(struct sock *sk, int size,
 extern void sock_kfree_s(struct sock *sk, void *mem, int size);
 extern void sk_send_sigurg(struct sock *sk);
 
+#ifdef CONFIG_CGROUP
+extern void sock_update_classid(struct sock *sk);
+#else
+static inline void sock_update_classid(struct sock *sk)
+{
+}
+#endif
+
 /*
  * Functions to fill in entries in struct proto_ops when a protocol
  * does not implement a particular function.
diff --git a/net/core/sock.c b/net/core/sock.c
index c5812bb..8f7fdf8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -123,6 +123,7 @@
 #include <linux/net_tstamp.h>
 #include <net/xfrm.h>
 #include <linux/ipsec.h>
+#include <net/cls_cgroup.h>
 
 #include <linux/filter.h>
 
@@ -217,6 +218,11 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
 
+#if defined(CONFIG_CGROUPS) && !defined(CONFIG_NET_CLS_CGROUP)
+int net_cls_subsys_id = -1;
+EXPORT_SYMBOL_GPL(net_cls_subsys_id);
+#endif
+
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
 	struct timeval tv;
@@ -1041,6 +1047,16 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 	module_put(owner);
 }
 
+#ifdef CONFIG_CGROUP
+void sock_update_classid(struct sock *sk)
+{
+	u32 classid = task_cls_classid(current);
+
+	if (classid && classid != sk->sk_classid)
+		sk->classid = classid;
+}
+#endif
+
 /**
  *	sk_alloc - All socket objects are allocated here
  *	@net: the applicable net namespace
@@ -1064,6 +1080,8 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_lock_init(sk);
 		sock_net_set(sk, get_net(net));
 		atomic_set(&sk->sk_wmem_alloc, 1);
+
+		sock_update_classid(sk);
 	}
 
 	return sk;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 2211803..766124b 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -16,14 +16,11 @@
 #include <linux/errno.h>
 #include <linux/skbuff.h>
 #include <linux/cgroup.h>
+#include <linux/rcupdate.h>
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
-
-struct cgroup_cls_state
-{
-	struct cgroup_subsys_state css;
-	u32 classid;
-};
+#include <net/sock.h>
+#include <net/cls_cgroup.h>
 
 static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
 					       struct cgroup *cgrp);
@@ -112,6 +109,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	struct cls_cgroup_head *head = tp->root;
 	u32 classid;
 
+	rcu_read_lock();
+	classid = task_cls_state(current)->classid;
+	rcu_read_unlock();
+
 	/*
 	 * Due to the nature of the classifier it is required to ignore all
 	 * packets originating from softirq context as accessing `current'
@@ -122,12 +123,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	 * calls by looking at the number of nested bh disable calls because
 	 * softirqs always disables bh.
 	 */
-	if (softirq_count() != SOFTIRQ_OFFSET)
-		return -1;
-
-	rcu_read_lock();
-	classid = task_cls_state(current)->classid;
-	rcu_read_unlock();
+	if (softirq_count() != SOFTIRQ_OFFSET) {
+	 	/* If there is an sk_classid we'll use that. */
+		if (!skb->sk)
+			return -1;
+		classid = skb->sk->sk_classid;
+	}
 
 	if (!classid)
 		return -1;
@@ -289,18 +290,35 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
 
 static int __init init_cgroup_cls(void)
 {
-	int ret = register_tcf_proto_ops(&cls_cgroup_ops);
-	if (ret)
-		return ret;
+	int ret;
+
 	ret = cgroup_load_subsys(&net_cls_subsys);
 	if (ret)
-		unregister_tcf_proto_ops(&cls_cgroup_ops);
+		goto out;
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	/* We can't use rcu_assign_pointer because this is an int. */
+	smp_wmb();
+	net_cls_subsys_id = net_cls_subsys.subsys_id;
+#endif
+
+	ret = register_tcf_proto_ops(&cls_cgroup_ops);
+	if (ret)
+		cgroup_unload_subsys(&net_cls_subsys);
+
+out:
 	return ret;
 }
 
 static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	net_cls_subsys_id = -1;
+	synchronize_rcu();
+#endif
+
 	cgroup_unload_subsys(&net_cls_subsys);
 }
 
diff --git a/net/socket.c b/net/socket.c
index 5e8d0af..6a1970f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -94,6 +94,7 @@
 
 #include <net/compat.h>
 #include <net/wext.h>
+#include <net/cls_cgroup.h>
 
 #include <net/sock.h>
 #include <linux/netfilter.h>
@@ -542,6 +543,8 @@ static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 	int err;
 
+	sock_update_classid(sock->sk);
+
 	si->sock = sock;
 	si->scm = NULL;
 	si->msg = msg;
@@ -669,6 +672,8 @@ static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 
+	sock_update_classid(sock->sk);
+
 	si->sock = sock;
 	si->scm = NULL;
 	si->msg = msg;
@@ -762,6 +767,8 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
 	if (unlikely(!sock->ops->splice_read))
 		return -EINVAL;
 
+	sock_update_classid(sock->sk);
+
 	return sock->ops->splice_read(sock, ppos, pipe, len, flags);
 }
 
@@ -3096,6 +3103,8 @@ int kernel_setsockopt(struct socket *sock, int level, int optname,
 int kernel_sendpage(struct socket *sock, struct page *page, int offset,
 		    size_t size, int flags)
 {
+	sock_update_classid(sock->sk);
+
 	if (sock->ops->sendpage)
 		return sock->ops->sendpage(sock, page, offset, size, flags);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* [PATCH net-next-2.6 1/8] caif: Bugfix wait_ev*_timeout returns long.
From: sjur.brandeland @ 2010-05-20  7:44 UTC (permalink / raw)
  To: netdev, davem
  Cc: marcel, daniel.martensson, linus.walleji, sjurbren,
	Sjur Braendeland
In-Reply-To: <1274341448-14924-1-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>

Discovered problems when testing on 64bit architecture.
Fixed by using long to store result from wait_event_interruptible_timeout.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
---
 net/caif/caif_socket.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index c3a70c5..77e9956 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -920,17 +920,17 @@ wait_connect:
 	timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
 
 	release_sock(sk);
-	err = wait_event_interruptible_timeout(*sk_sleep(sk),
+	err = -ERESTARTSYS;
+	timeo = wait_event_interruptible_timeout(*sk_sleep(sk),
 			sk->sk_state != CAIF_CONNECTING,
 			timeo);
 	lock_sock(sk);
-	if (err < 0)
+	if (timeo < 0)
 		goto out; /* -ERESTARTSYS */
-	if (err == 0 && sk->sk_state != CAIF_CONNECTED) {
-		err = -ETIMEDOUT;
-		goto out;
-	}
 
+	err = -ETIMEDOUT;
+	if (timeo == 0 && sk->sk_state != CAIF_CONNECTED)
+		goto out;
 	if (sk->sk_state != CAIF_CONNECTED) {
 		sock->state = SS_UNCONNECTED;
 		err = sock_error(sk);
@@ -945,7 +945,6 @@ out:
 	return err;
 }
 
-
 /*
  * caif_release() - Disconnect a CAIF Socket
  * Copied and modified af_irda.c:irda_release().
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH net-next-2.6 0/8] CAIF: Bugfixes and updates
From: sjur.brandeland @ 2010-05-20  7:44 UTC (permalink / raw)
  To: netdev, davem
  Cc: marcel, daniel.martensson, linus.walleji, sjurbren,
	Sjur Braendeland

From: Sjur Braendeland <sjur.brandeland@stericsson.com>

Hi Dave,
I have a series of CAIF bug fixes and changes. I realize you sent
the pull request to linux-next yesterday, am I too late for the
merge window with functional changes?

Functional changes:
o Add debug connection type for CAIF - this adds a new CAIF debug protocol.
o Add support for MSG_TRUNC - changes handling in seqpkt_recvmsg when 
  user buffer cannot hold content of SKB. We are now supporting MSG_TRUNC
  properly.
o Use link layer MTU instead of fixed MTU - get mtu and head/tail room from
  CAIF Link Layer when calculating MTU in CAIF IP Interface and
  CAIF Socket Layer.

The rest are bug fixes.

Regards
Sjur Brændeland

Sjur Braendeland (8):
  caif: Bugfix wait_ev*_timeout returns long.
  caif: Bugfix - use standard Linux lists
  caif: Bugfix - handle mem-allocation failures
  caif: Add support for MSG_TRUNC.
  caif: Add debug connection type for CAIF.
  caif: Use link layer MTU instead of fixed MTU
  caif: Bugfix - Poll can't return POLLHUP while connecting.
  caif: Fixed splint warnings

 drivers/net/caif/caif_serial.c   |    2 +-
 include/linux/caif/caif_socket.h |   40 ++++++++++-
 include/net/caif/caif_dev.h      |    8 ++-
 include/net/caif/caif_layer.h    |    6 --
 include/net/caif/cfcnfg.h        |   16 +++-
 include/net/caif/cfctrl.h        |    4 +-
 net/caif/caif_config_util.c      |    5 ++
 net/caif/caif_dev.c              |   12 ++-
 net/caif/caif_socket.c           |  144 ++++++++++++++++++--------------------
 net/caif/cfcnfg.c                |   39 ++++++++++-
 net/caif/cfctrl.c                |   93 +++++++------------------
 net/caif/cfmuxl.c                |    3 +-
 net/caif/cfpkt_skbuff.c          |   29 +++++---
 net/caif/cfrfml.c                |    5 --
 net/caif/cfserl.c                |   10 ++-
 net/caif/cfsrvl.c                |    6 ++
 net/caif/cfutill.c               |    6 --
 net/caif/cfveil.c                |    5 --
 net/caif/chnl_net.c              |   61 +++++++++++++---
 19 files changed, 284 insertions(+), 210 deletions(-)


^ permalink raw reply

* [PATCH net-next-2.6 2/8] caif: Bugfix - use standard Linux lists
From: sjur.brandeland @ 2010-05-20  7:44 UTC (permalink / raw)
  To: netdev, davem
  Cc: marcel, daniel.martensson, linus.walleji, sjurbren,
	Sjur Braendeland
In-Reply-To: <1274341448-14924-2-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>

Use linux/list.h implementation instead of home brewed list.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
---
 include/net/caif/cfctrl.h |    4 +-
 net/caif/cfctrl.c         |   92 +++++++++++++--------------------------------
 2 files changed, 28 insertions(+), 68 deletions(-)

diff --git a/include/net/caif/cfctrl.h b/include/net/caif/cfctrl.h
index 997603f..9402543 100644
--- a/include/net/caif/cfctrl.h
+++ b/include/net/caif/cfctrl.h
@@ -94,8 +94,8 @@ struct cfctrl_request_info {
 	enum cfctrl_cmd cmd;
 	u8 channel_id;
 	struct cfctrl_link_param param;
-	struct cfctrl_request_info *next;
 	struct cflayer *client_layer;
+	struct list_head list;
 };
 
 struct cfctrl {
@@ -103,7 +103,7 @@ struct cfctrl {
 	struct cfctrl_rsp res;
 	atomic_t req_seq_no;
 	atomic_t rsp_seq_no;
-	struct cfctrl_request_info *first_req;
+	struct list_head list;
 	/* Protects from simultaneous access to first_req list */
 	spinlock_t info_list_lock;
 #ifndef CAIF_NO_LOOP
diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
index 0ffe1e1..fcfda98 100644
--- a/net/caif/cfctrl.c
+++ b/net/caif/cfctrl.c
@@ -44,13 +44,14 @@ struct cflayer *cfctrl_create(void)
 	dev_info.id = 0xff;
 	memset(this, 0, sizeof(*this));
 	cfsrvl_init(&this->serv, 0, &dev_info);
-	spin_lock_init(&this->info_list_lock);
 	atomic_set(&this->req_seq_no, 1);
 	atomic_set(&this->rsp_seq_no, 1);
 	this->serv.layer.receive = cfctrl_recv;
 	sprintf(this->serv.layer.name, "ctrl");
 	this->serv.layer.ctrlcmd = cfctrl_ctrlcmd;
 	spin_lock_init(&this->loop_linkid_lock);
+	spin_lock_init(&this->info_list_lock);
+	INIT_LIST_HEAD(&this->list);
 	this->loop_linkid = 1;
 	return &this->serv.layer;
 }
@@ -112,20 +113,10 @@ bool cfctrl_req_eq(struct cfctrl_request_info *r1,
 void cfctrl_insert_req(struct cfctrl *ctrl,
 			      struct cfctrl_request_info *req)
 {
-	struct cfctrl_request_info *p;
 	spin_lock(&ctrl->info_list_lock);
-	req->next = NULL;
 	atomic_inc(&ctrl->req_seq_no);
 	req->sequence_no = atomic_read(&ctrl->req_seq_no);
-	if (ctrl->first_req == NULL) {
-		ctrl->first_req = req;
-		spin_unlock(&ctrl->info_list_lock);
-		return;
-	}
-	p = ctrl->first_req;
-	while (p->next != NULL)
-		p = p->next;
-	p->next = req;
+	list_add_tail(&req->list, &ctrl->list);
 	spin_unlock(&ctrl->info_list_lock);
 }
 
@@ -133,46 +124,28 @@ void cfctrl_insert_req(struct cfctrl *ctrl,
 struct cfctrl_request_info *cfctrl_remove_req(struct cfctrl *ctrl,
 					      struct cfctrl_request_info *req)
 {
-	struct cfctrl_request_info *p;
-	struct cfctrl_request_info *ret;
+	struct cfctrl_request_info *p, *tmp, *first;
 
 	spin_lock(&ctrl->info_list_lock);
-	if (ctrl->first_req == NULL) {
-		spin_unlock(&ctrl->info_list_lock);
-		return NULL;
-	}
-
-	if (cfctrl_req_eq(req, ctrl->first_req)) {
-		ret = ctrl->first_req;
-		caif_assert(ctrl->first_req);
-		atomic_set(&ctrl->rsp_seq_no,
-				 ctrl->first_req->sequence_no);
-		ctrl->first_req = ctrl->first_req->next;
-		spin_unlock(&ctrl->info_list_lock);
-		return ret;
-	}
+	first = list_first_entry(&ctrl->list, struct cfctrl_request_info, list);
 
-	p = ctrl->first_req;
-
-	while (p->next != NULL) {
-		if (cfctrl_req_eq(req, p->next)) {
-			pr_warning("CAIF: %s(): Requests are not "
+	list_for_each_entry_safe(p, tmp, &ctrl->list, list) {
+		if (cfctrl_req_eq(req, p)) {
+			if (p != first)
+				pr_warning("CAIF: %s(): Requests are not "
 					"received in order\n",
 					__func__);
-			ret = p->next;
+
 			atomic_set(&ctrl->rsp_seq_no,
-					p->next->sequence_no);
-			p->next = p->next->next;
-			spin_unlock(&ctrl->info_list_lock);
-			return ret;
+					 p->sequence_no);
+			list_del(&p->list);
+			goto out;
 		}
-		p = p->next;
 	}
+	p = NULL;
+out:
 	spin_unlock(&ctrl->info_list_lock);
-
-	pr_warning("CAIF: %s(): Request does not match\n",
-		   __func__);
-	return NULL;
+	return p;
 }
 
 struct cfctrl_rsp *cfctrl_get_respfuncs(struct cflayer *layer)
@@ -388,31 +361,18 @@ void cfctrl_getstartreason_req(struct cflayer *layer)
 
 void cfctrl_cancel_req(struct cflayer *layr, struct cflayer *adap_layer)
 {
-	struct cfctrl_request_info *p, *req;
+	struct cfctrl_request_info *p, *tmp;
 	struct cfctrl *ctrl = container_obj(layr);
 	spin_lock(&ctrl->info_list_lock);
-
-	if (ctrl->first_req == NULL) {
-		spin_unlock(&ctrl->info_list_lock);
-		return;
-	}
-
-	if (ctrl->first_req->client_layer == adap_layer) {
-
-		req = ctrl->first_req;
-		ctrl->first_req = ctrl->first_req->next;
-		kfree(req);
-	}
-
-	p = ctrl->first_req;
-	while (p != NULL && p->next != NULL) {
-		if (p->next->client_layer == adap_layer) {
-
-			req = p->next;
-			p->next = p->next->next;
-			kfree(p->next);
+	pr_warning("CAIF: %s(): enter\n", __func__);
+
+	list_for_each_entry_safe(p, tmp, &ctrl->list, list) {
+		if (p->client_layer == adap_layer) {
+			pr_warning("CAIF: %s(): cancel req :%d\n", __func__,
+					p->sequence_no);
+			list_del(&p->list);
+			kfree(p);
 		}
-		p = p->next;
 	}
 
 	spin_unlock(&ctrl->info_list_lock);
@@ -634,7 +594,7 @@ static void cfctrl_ctrlcmd(struct cflayer *layr, enum caif_ctrlcmd ctrl,
 	case _CAIF_CTRLCMD_PHYIF_FLOW_OFF_IND:
 	case CAIF_CTRLCMD_FLOW_OFF_IND:
 		spin_lock(&this->info_list_lock);
-		if (this->first_req != NULL) {
+		if (!list_empty(&this->list)) {
 			pr_debug("CAIF: %s(): Received flow off in "
 				   "control layer", __func__);
 		}
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH net-next-2.6 3/8] caif: Bugfix - handle mem-allocation failures
From: sjur.brandeland @ 2010-05-20  7:44 UTC (permalink / raw)
  To: netdev, davem
  Cc: marcel, daniel.martensson, linus.walleji, sjurbren,
	Sjur Braendeland
In-Reply-To: <1274341448-14924-3-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>

Add checks on memory allocation. It seems to be stable running test with
injecting slab allocation failures.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
---
 net/caif/cfpkt_skbuff.c |   25 +++++++++++++++++--------
 net/caif/cfserl.c       |    3 ++-
 net/caif/cfsrvl.c       |    6 ++++++
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/net/caif/cfpkt_skbuff.c b/net/caif/cfpkt_skbuff.c
index 83fff2f..a6fdf89 100644
--- a/net/caif/cfpkt_skbuff.c
+++ b/net/caif/cfpkt_skbuff.c
@@ -238,6 +238,7 @@ int cfpkt_add_head(struct cfpkt *pkt, const void *data2, u16 len)
 	struct sk_buff *lastskb;
 	u8 *to;
 	const u8 *data = data2;
+	int ret;
 	if (unlikely(is_erronous(pkt)))
 		return -EPROTO;
 	if (unlikely(skb_headroom(skb) < len)) {
@@ -246,9 +247,10 @@ int cfpkt_add_head(struct cfpkt *pkt, const void *data2, u16 len)
 	}
 
 	/* Make sure data is writable */
-	if (unlikely(skb_cow_data(skb, 0, &lastskb) < 0)) {
+	ret = skb_cow_data(skb, 0, &lastskb);
+	if (unlikely(ret < 0)) {
 		PKT_ERROR(pkt, "cfpkt_add_head: cow failed\n");
-		return -EPROTO;
+		return ret;
 	}
 
 	to = skb_push(skb, len);
@@ -316,6 +318,8 @@ EXPORT_SYMBOL(cfpkt_setlen);
 struct cfpkt *cfpkt_create_uplink(const unsigned char *data, unsigned int len)
 {
 	struct cfpkt *pkt = cfpkt_create_pfx(len + PKT_POSTFIX, PKT_PREFIX);
+	if (!pkt)
+		return NULL;
 	if (unlikely(data != NULL))
 		cfpkt_add_body(pkt, data, len);
 	return pkt;
@@ -344,12 +348,13 @@ struct cfpkt *cfpkt_append(struct cfpkt *dstpkt,
 
 	if (dst->tail + neededtailspace > dst->end) {
 		/* Create a dumplicate of 'dst' with more tail space */
+		struct cfpkt *tmppkt;
 		dstlen = skb_headlen(dst);
 		createlen = dstlen + neededtailspace;
-		tmp = pkt_to_skb(
-			cfpkt_create(createlen + PKT_PREFIX + PKT_POSTFIX));
-		if (!tmp)
+		tmppkt = cfpkt_create(createlen + PKT_PREFIX + PKT_POSTFIX);
+		if (tmppkt == NULL)
 			return NULL;
+		tmp = pkt_to_skb(tmppkt);
 		skb_set_tail_pointer(tmp, dstlen);
 		tmp->len = dstlen;
 		memcpy(tmp->data, dst->data, dstlen);
@@ -368,6 +373,7 @@ struct cfpkt *cfpkt_split(struct cfpkt *pkt, u16 pos)
 {
 	struct sk_buff *skb2;
 	struct sk_buff *skb = pkt_to_skb(pkt);
+	struct cfpkt *tmppkt;
 	u8 *split = skb->data + pos;
 	u16 len2nd = skb_tail_pointer(skb) - split;
 
@@ -381,9 +387,12 @@ struct cfpkt *cfpkt_split(struct cfpkt *pkt, u16 pos)
 	}
 
 	/* Create a new packet for the second part of the data */
-	skb2 = pkt_to_skb(
-		cfpkt_create_pfx(len2nd + PKT_PREFIX + PKT_POSTFIX,
-				 PKT_PREFIX));
+	tmppkt = cfpkt_create_pfx(len2nd + PKT_PREFIX + PKT_POSTFIX,
+				  PKT_PREFIX);
+	if (tmppkt == NULL)
+		return NULL;
+	skb2 = pkt_to_skb(tmppkt);
+
 
 	if (skb2 == NULL)
 		return NULL;
diff --git a/net/caif/cfserl.c b/net/caif/cfserl.c
index 06029ea..cb4325a 100644
--- a/net/caif/cfserl.c
+++ b/net/caif/cfserl.c
@@ -67,6 +67,8 @@ static int cfserl_receive(struct cflayer *l, struct cfpkt *newpkt)
 		layr->incomplete_frm =
 		    cfpkt_append(layr->incomplete_frm, newpkt, expectlen);
 		pkt = layr->incomplete_frm;
+		if (pkt == NULL)
+			return -ENOMEM;
 	} else {
 		pkt = newpkt;
 	}
@@ -154,7 +156,6 @@ static int cfserl_receive(struct cflayer *l, struct cfpkt *newpkt)
 			if (layr->usestx) {
 				if (tail_pkt != NULL)
 					pkt = cfpkt_append(pkt, tail_pkt, 0);
-
 				/* Start search for next STX if frame failed */
 				continue;
 			} else {
diff --git a/net/caif/cfsrvl.c b/net/caif/cfsrvl.c
index aff31f3..6e5b707 100644
--- a/net/caif/cfsrvl.c
+++ b/net/caif/cfsrvl.c
@@ -123,6 +123,12 @@ static int cfservl_modemcmd(struct cflayer *layr, enum caif_modemcmd ctrl)
 			struct caif_payload_info *info;
 			u8 flow_off = SRVL_FLOW_OFF;
 			pkt = cfpkt_create(SRVL_CTRL_PKT_SIZE);
+			if (!pkt) {
+				pr_warning("CAIF: %s(): Out of memory\n",
+					__func__);
+				return -ENOMEM;
+			}
+
 			if (cfpkt_add_head(pkt, &flow_off, 1) < 0) {
 				pr_err("CAIF: %s(): Packet is erroneous!\n",
 					__func__);
-- 
1.6.3.3


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox