All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nicolas de Pesloüan" <nicolas.2p.debian@gmail.com>
To: Jiri Pirko <jpirko@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>,
	David Miller <davem@davemloft.net>,
	kaber@trash.net, eric.dumazet@gmail.com, netdev@vger.kernel.org,
	shemminger@linux-foundation.org, andy@greyhouse.net
Subject: Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
Date: Sat, 19 Feb 2011 21:27:37 +0100	[thread overview]
Message-ID: <4D6027B9.6050108@gmail.com> (raw)
In-Reply-To: <20110219134645.GF2782@psychotron.redhat.com>

Le 19/02/2011 14:46, Jiri Pirko a écrit :
> Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@gmail.com wrote:
>> Le 19/02/2011 12:28, Jiri Pirko a écrit :
>>> Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>>>> Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>>>> Le 19/02/2011 09:05, Jiri Pirko a écrit :
>>>>>> This patch converts bonding to use rx_handler. Results in cleaner
>>>>>> __netif_receive_skb() with much less exceptions needed. Also
>>>>>> bond-specific work is moved into bond code.
>>>>>>
>>>>>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>>>
>>>>>> v1->v2:
>>>>>>          using skb_iif instead of new input_dev to remember original
>>>>>> 	device
>>>>>> v2->v3:
>>>>>> 	set orig_dev = skb->dev if skb_iif is set
>>>>>>
>>>>>
>>>>> Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>>>
>>>>> Bonding used to be handled with very few overhead, simply replacing
>>>>> skb->dev with skb->dev->master. Time has passed and we eventually
>>>>> added many special processing for bonding into __netif_receive_skb(),
>>>>> but the overhead remained very light.
>>>>>
>>>>> Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>>>
>>>>> Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>> whatever need to be delivered, to whoever need, inside the loop ?
>>>>>
>>>>> rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>> while (rx_handler) {
>>>>> 	/* ...  */
>>>>> 	orig_dev = skb->dev;
>>>>> 	skb = rx_handler(skb);
>>>>> 	/* ... */
>>>>> 	rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>> }
>>>>>
>>>>> This would reduce the overhead, while still allowing nesting: vlan on
>>>>> top on bonding, bridge on top on bonding, ...
>>>>
>>>> I see your point. Makes sense to me. But the loop would have to include
>>>> at least processing of ptype_all too. I'm going to cook a follow-up
>>>> patch.
>>>>
>>>
>>> DRAFT (doesn't modify rx_handlers):
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 4ebf7fe..e5dba47 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   {
>>>   	struct packet_type *ptype, *pt_prev;
>>>   	rx_handler_func_t *rx_handler;
>>> +	struct net_device *dev;
>>>   	struct net_device *orig_dev;
>>>   	struct net_device *null_or_dev;
>>>   	int ret = NET_RX_DROP;
>>> @@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   	if (netpoll_receive_skb(skb))
>>>   		return NET_RX_DROP;
>>>
>>> -	__this_cpu_inc(softnet_data.processed);
>>> +	skb->skb_iif = skb->dev->ifindex;
>>> +	orig_dev = skb->dev;
>>
>> orig_dev should be set inside the loop, to reflect "previously
>> crossed device", while following the path:
>>
>> eth0 ->  bond0 ->  br0.
>>
>> First step inside loop:
>>
>> orig_dev = eth0
>> skb->dev = bond0 (at the end of the loop).
>>
>> Second step inside loop:
>>
>> orig_dev = bond0
>> skb->dev = br0 (et the end of the loop).
>>
>> This would allow for exact match delivery to bond0 if someone bind there.
>>
>>> +
>>>   	skb_reset_network_header(skb);
>>>   	skb_reset_transport_header(skb);
>>>   	skb->mac_len = skb->network_header - skb->mac_header;
>>> @@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>
>>>   	rcu_read_lock();
>>>
>>> -	if (!skb->skb_iif) {
>>> -		skb->skb_iif = skb->dev->ifindex;
>>> -		orig_dev = skb->dev;
>>> -	} else {
>>> -		orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>> -	}
>>
>> I like the fact that it removes the above part.
>>
>>> +another_round:
>>> +	__this_cpu_inc(softnet_data.processed);
>>> +	dev = skb->dev;
>>>
>>>   #ifdef CONFIG_NET_CLS_ACT
>>>   	if (skb->tc_verd&   TC_NCLS) {
>>> @@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   #endif
>>>
>>>   	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>> -		if (!ptype->dev || ptype->dev == skb->dev) {
>>> +		if (!ptype->dev || ptype->dev == dev) {
>>>   			if (pt_prev)
>>>   				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>   			pt_prev = ptype;
>>
>> Inside the loop, we should only do exact match delivery, for
>> &ptype_all and for&ptype_base[ntohs(type)&  PTYPE_HASH_MASK]:
>>
>>         list_for_each_entry_rcu(ptype,&ptype_all, list) {
>> -               if (!ptype->dev || ptype->dev == dev) {
>> +               if (ptype->dev == dev) {
>>                         if (pt_prev)
>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = ptype;
>>                 }
>>         }
>>
>>
>>         list_for_each_entry_rcu(ptype,
>>                         &ptype_base[ntohs(type)&  PTYPE_HASH_MASK], list) {
>>                 if (ptype->type == type&&
>> -                   (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>> +                   (ptype->dev == skb->dev)) {
>>                         if (pt_prev)
>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = ptype;
>>                 }
>>         }
>>
>> After leaving the loop, we can do wilcard delivery, if skb is not NULL.
>>
>>         list_for_each_entry_rcu(ptype,&ptype_all, list) {
>> -               if (!ptype->dev || ptype->dev == dev) {
>> +               if (!ptype->dev) {
>>                         if (pt_prev)
>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = ptype;
>>                }
>>         }
>>
>>
>>         list_for_each_entry_rcu(ptype,
>>                         &ptype_base[ntohs(type)&  PTYPE_HASH_MASK], list) {
>> -               if (ptype->type == type&&
>> -                   (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>> +		if (ptype->type == type&&  !ptype->dev) {
>>                         if (pt_prev)
>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = ptype;
>>                 }
>>         }
>>
>> This would reduce the number of tests inside the
>> list_for_each_entry_rcu() loops. And because we match only ptype->dev
>> == dev inside the loop and !ptype->dev outside the loop, this should
>> avoid duplicate delivery.
>
> Would you care to put this into patch so I can see the whole picture?
> Thanks.

Here is what I have in mind. It is based on your previous DRAFT patch, and don't modify rx_handlers yet.

Only compile tested !!

I don't know if every pieces are at the right place. I wonder what to do with CONFIG_NET_CLS_ACT 
part, that currently is between ptype_all and ptype_base processing.

Anyway, the general idea is there.

	Nicolas.

  net/core/dev.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
  1 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e5dba47..7e007a9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
  	rx_handler_func_t *rx_handler;
  	struct net_device *dev;
  	struct net_device *orig_dev;
-	struct net_device *null_or_dev;
  	int ret = NET_RX_DROP;
  	__be16 type;

@@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
  	if (netpoll_receive_skb(skb))
  		return NET_RX_DROP;

-	skb->skb_iif = skb->dev->ifindex;
-	orig_dev = skb->dev;
-
  	skb_reset_network_header(skb);
  	skb_reset_transport_header(skb);
  	skb->mac_len = skb->network_header - skb->mac_header;
@@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)

  another_round:
  	__this_cpu_inc(softnet_data.processed);
+	skb->skb_iif = skb->dev->ifindex;
+	orig_dev = skb->dev;
  	dev = skb->dev;

  #ifdef CONFIG_NET_CLS_ACT
@@ -3152,8 +3150,13 @@ another_round:
  	}
  #endif

+	/*
+	 * Deliver to ptype_all protocol handlers that match current dev.
+	 * This happens before rx_handler is given a chance to change skb->dev.
+	 */
+
  	list_for_each_entry_rcu(ptype, &ptype_all, list) {
-		if (!ptype->dev || ptype->dev == dev) {
+		if (ptype->dev == dev) {
  			if (pt_prev)
  				ret = deliver_skb(skb, pt_prev, orig_dev);
  			pt_prev = ptype;
@@ -3167,6 +3170,31 @@ another_round:
  ncls:
  #endif

+	/*
+	 * Deliver to ptype_base protocol handlers that match current dev.
+	 * This happens before rx_handler is given a chance to change skb->dev.
+	 */
+
+	type = skb->protocol;
+	list_for_each_entry_rcu(ptype,
+			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
+		if (ptype->type == type && ptype->dev == skb->dev) {
+			if (pt_prev)
+				ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = ptype;
+		}
+	}
+
+	/*
+	 * Call rx_handler for current device.
+	 * If rx_handler return NULL, skip wilcard protocol handler delivery.
+	 * Else, if skb->dev changed, restart the whole delivery process, to
+	 * allow for device nesting.
+	 *
+	 * Warning:
+	 * rx_handlers must kfree_skb(skb) if they return NULL.
+	 */
+
  	rx_handler = rcu_dereference(dev->rx_handler);
  	if (rx_handler) {
  		if (pt_prev) {
@@ -3176,10 +3204,15 @@ ncls:
  		skb = rx_handler(skb);
  		if (!skb)
  			goto out;
-		if (dev != skb->dev)
+		if (skb->dev != dev)
  			goto another_round;
  	}

+	/*
+	 * FIXME: The part below should use rx_handler instead of being hard
+	 * coded here.
+	 */
+
  	if (vlan_tx_tag_present(skb)) {
  		if (pt_prev) {
  			ret = deliver_skb(skb, pt_prev, orig_dev);
@@ -3192,16 +3225,33 @@ ncls:
  			goto out;
  	}

+	/*
+	 * FIXME: Can't this be moved into the rx_handler for bonding,
+	 * or into a futur rx_handler for vlan?
+	 */
+
  	vlan_on_bond_hook(skb);

-	/* deliver only exact match when indicated */
-	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
+	/*
+	 * Deliver to wildcard ptype_all protocol handlers.
+	 */
+
+	list_for_each_entry_rcu(ptype, &ptype_all, list) {
+		if (!ptype->dev) {
+			if (pt_prev)
+				ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = ptype;
+		}
+	}
+
+	/*
+	 * Deliver to wildcard ptype_all protocol handlers.
+	 */

  	type = skb->protocol;
  	list_for_each_entry_rcu(ptype,
  			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
-		if (ptype->type == type &&
-		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
+		if (ptype->type == type && !ptype->dev) {
  			if (pt_prev)
  				ret = deliver_skb(skb, pt_prev, orig_dev);
  			pt_prev = ptype;
-- 
1.7.2.3




  parent reply	other threads:[~2011-02-19 20:27 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-18 13:25 [patch net-next-2.6] net: convert bonding to use rx_handler Jiri Pirko
2011-02-18 13:29 ` Eric Dumazet
2011-02-18 14:14   ` Jiri Pirko
2011-02-18 14:27     ` Eric Dumazet
2011-02-18 14:46       ` Patrick McHardy
2011-02-18 14:58         ` Jiri Pirko
2011-02-18 15:50           ` Patrick McHardy
2011-02-18 16:14             ` Eric Dumazet
2011-02-18 18:47               ` Jiri Pirko
2011-02-18 19:17                 ` Eric Dumazet
2011-02-18 19:28                   ` Jiri Pirko
2011-02-18 19:58                     ` Eric Dumazet
2011-02-18 20:03                       ` Jiri Pirko
2011-02-18 20:06           ` David Miller
2011-02-18 20:13             ` Jiri Pirko
2011-02-18 20:58             ` [patch net-next-2.6 V2] " Jiri Pirko
2011-02-18 23:06               ` Jay Vosburgh
2011-02-19  7:44                 ` Jiri Pirko
2011-02-19  8:05                 ` [patch net-next-2.6 V3] " Jiri Pirko
2011-02-19  8:37                   ` Eric Dumazet
2011-02-19  8:58                     ` Jiri Pirko
2011-02-19  9:22                       ` Eric Dumazet
2011-02-19 10:56                   ` Nicolas de Pesloüan
2011-02-19 11:08                     ` Jiri Pirko
2011-02-19 11:28                       ` Jiri Pirko
2011-02-19 13:18                         ` Nicolas de Pesloüan
2011-02-19 13:46                           ` Jiri Pirko
2011-02-19 14:32                             ` Nicolas de Pesloüan
2011-02-19 20:27                             ` Nicolas de Pesloüan [this message]
2011-02-20 10:36                               ` Jiri Pirko
2011-02-20 12:12                                 ` Nicolas de Pesloüan
2011-02-20 15:07                                   ` Jiri Pirko
2011-02-21 23:20                                     ` Nicolas de Pesloüan
2011-02-26 14:24                                       ` Nicolas de Pesloüan
2011-02-26 19:42                                         ` Jay Vosburgh
2011-02-27 12:58                                           ` Jiri Pirko
2011-02-27 20:44                                             ` Nicolas de Pesloüan
2011-02-27 23:22                                             ` David Miller
2011-02-28  7:07                                               ` Jiri Pirko
2011-02-28  7:30                                                 ` David Miller
2011-02-28  9:22                                                   ` Jiri Pirko
2011-02-28  9:35                                                     ` Eric Dumazet
2011-02-28  9:55                                                       ` [patch net-next-2.6] net: convert bonding to use rx_handler - second part Jiri Pirko
2011-02-28 18:49                                                     ` [patch net-next-2.6 V3] net: convert bonding to use rx_handler David Miller
2011-02-23 19:05               ` Jiri Pirko
2011-02-25 23:46                 ` Nicolas de Pesloüan
2011-02-26  7:14                   ` Jiri Pirko
2011-02-26 11:25                     ` Nicolas de Pesloüan
2011-02-26 14:58                       ` Jiri Pirko
2011-02-27 14:17                 ` Nicolas de Pesloüan
2011-02-27 20:06                   ` Jiri Pirko
2011-02-27 20:59                     ` Nicolas de Pesloüan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D6027B9.6050108@gmail.com \
    --to=nicolas.2p.debian@gmail.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=fubar@us.ibm.com \
    --cc=jpirko@redhat.com \
    --cc=kaber@trash.net \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.