From: Jason Wang <jasowang@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com,
	John Fastabend <john.r.fastabend@intel.com>,
	e1000-devel@lists.sourceforge.net
Subject: Re: [PATCH net V2 2/2] net: core: explicitly select a txq before doing l2 forwarding
Date: Fri, 10 Jan 2014 15:03:01 +0800
Message-ID: <52CF9B25.7020909@redhat.com>
In-Reply-To: <20140109123144.GC16701@hmsreliant.think-freely.org>

On 01/09/2014 08:31 PM, Neil Horman wrote:
> On Thu, Jan 09, 2014 at 05:37:32PM +0800, Jason Wang wrote:
>> Currently, the tx queue is selected implicitly in ndo_dfwd_start_xmit(). This
>> causes several issues:
>>
>> - NETIF_F_LLTX was removed for macvlan, so the txq lock is taken for macvlan
>>   instead of the lower device, which misses the txq synchronization the lower
>>   device needs, such as the txq being stopped or frozen by the dev watchdog or
>>   the control path.
>> - dev_hard_start_xmit() is called with a NULL txq, which bypasses the net
>>   device watchdog.
>> - dev_hard_start_xmit() does not check txq against NULL everywhere, which leads
>>   to a crash when tso is disabled for the lower device.
>>
>> Fix this by adding a new parameter to .ndo_select_queue() that is used only for
>> selecting queues in the case of l2 forwarding offload, and by introducing
>> dfwd_direct_xmit() to do the queue selection, txq locking and transmission for
>> l2 forwarding.
>>
>> With these fixes, NETIF_F_LLTX can be preserved for macvlan and there's no need
>> to check txq against NULL in dev_hard_start_xmit(). There's also no need to keep
>> a dedicated ndo_dfwd_start_xmit().
>>
>> In the future, this will also be required for macvtap l2 forwarding support,
>> since it provides a necessary synchronization method.
>>
>> Cc: John Fastabend <john.r.fastabend@intel.com>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Cc: e1000-devel@lists.sourceforge.net
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>
>> ---
>> Changes from V1:
>> - Adding a new parameter to ndo_select_queue instead of a new method to select
>>   queue for l2 forwarding.
>> - Remove the unnecessary ndo_dfwd_start_xmit() since txq was selected
>>   explicitly.
>> - Keep NETIF_F_LLTX when netdev feature is changed.
>> - Shape the commit log
> A few minor nits inline.
>> <snip>
>>  }
>> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
>> index 5360f73..7eb4c82 100644
>> --- a/drivers/net/macvlan.c
>> +++ b/drivers/net/macvlan.c
>> @@ -299,7 +299,7 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
>>  
>>  	if (vlan->fwd_priv) {
>>  		skb->dev = vlan->lowerdev;
>> -		ret = dev_hard_start_xmit(skb, skb->dev, NULL, vlan->fwd_priv);
>> +		ret = dfwd_direct_xmit(skb, skb->dev, vlan->fwd_priv);
>>  	} else {
>>  		ret = macvlan_queue_xmit(skb, dev);
>>  	}
>> @@ -366,7 +366,6 @@ static int macvlan_open(struct net_device *dev)
>>  		if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
>>  			vlan->fwd_priv = NULL;
>>  		} else {
>> -			dev->features &= ~NETIF_F_LLTX;
>>  			return 0;
>>  		}
> After removing the features flag operation here, you don't need the braces
> around the else statement either.

Ok.
>> <snip>
>> +int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev,
>> +		     void *accel_priv)
>> +{
>> +	struct netdev_queue *txq;
>> +	int ret = NETDEV_TX_BUSY;
>> +	int index;
>> +
>> +	BUG_ON(!dev->netdev_ops->ndo_select_queue);
>> +	index =	dev->netdev_ops->ndo_select_queue(dev, skb, accel_priv);
>> +
>> +	local_bh_disable();
>> +
>> +	skb_set_queue_mapping(skb, index);
>> +	txq = netdev_get_tx_queue(dev, index);
>> +
>> +	HARD_TX_LOCK(dev, txq, smp_processor_id());
>> +	if (!netif_xmit_frozen_or_stopped(txq))
>> +		ret = dev_hard_start_xmit(skb, dev, txq);
>> +	HARD_TX_UNLOCK(dev, txq);
>> +
>> +	local_bh_enable();
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(dfwd_direct_xmit);
>> +
> Now that we're using the common path to select a queue, can we just use
> dev_queue_xmit here instead of creating our own transmit function?  The txq we
> select from the ixgbe card will just have a pfifo_fast queue on it (if not a
> noop queue), so dev_queue_xmit should just fall into the dev_hard_start_xmit
> path, and save us this extra coding.
>
> Neil

True, and then this path will be no different from the case without l2
forwarding. To avoid disturbing other callers, I will keep the current
dev_queue_xmit() API: rename the current dev_queue_xmit() to
__dev_queue_xmit() and make it accept an accel_priv parameter. Then
dev_queue_xmit() will call it with a NULL accel_priv, and I will introduce
a dev_queue_xmit_accel() that accepts an accel_priv parameter.
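
Roughly, the wrapper layout would look like the sketch below. This is a
userspace mock to show the shape only, not the kernel code: the struct
definitions, struct accel_ctx and select_queue() are hypothetical
stand-ins, and the real __dev_queue_xmit() would go through the qdisc and
txq locking instead of just recording the queue index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real layout). */
struct net_device { int num_tx_queues; };
struct sk_buff { struct net_device *dev; int queue_mapping; };

/* Hypothetical accel_priv payload: the queue reserved for the offloaded device. */
struct accel_ctx { int base_queue; };

/* Mock of queue selection: a non-NULL accel_priv means l2 forwarding offload. */
static int select_queue(struct net_device *dev, void *accel_priv)
{
	(void)dev;
	if (accel_priv)
		return ((struct accel_ctx *)accel_priv)->base_queue;
	return 0; /* normal path: the mock always picks queue 0 */
}

/* Common transmit path; accel_priv is NULL for ordinary transmits. */
static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
{
	int index = select_queue(skb->dev, accel_priv);

	assert(index >= 0 && index < skb->dev->num_tx_queues);
	skb->queue_mapping = index; /* stands in for skb_set_queue_mapping() */
	return 0;                   /* stands in for a successful transmit */
}

/* Existing API is unchanged for every current caller. */
int dev_queue_xmit(struct sk_buff *skb)
{
	return __dev_queue_xmit(skb, NULL);
}

/* New entry point for l2 forwarding users such as macvlan. */
int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv)
{
	return __dev_queue_xmit(skb, accel_priv);
}
```

With this layout macvlan would call dev_queue_xmit_accel(skb,
vlan->fwd_priv) and no other caller of dev_queue_xmit() needs to change.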

Thanks

