All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com,
	John Fastabend <john.r.fastabend@intel.com>,
	e1000-devel@lists.sourceforge.net
Subject: Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding
Date: Thu, 09 Jan 2014 16:28:49 +0800	[thread overview]
Message-ID: <52CE5DC1.8070807@redhat.com> (raw)
In-Reply-To: <20140108144025.GA17802@neilslaptop.think-freely.org>

On 01/08/2014 10:40 PM, Neil Horman wrote:
> On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
>> On 01/07/2014 09:17 PM, Neil Horman wrote:
>>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
>>>> On 01/06/2014 08:42 PM, Neil Horman wrote:
>>>>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
>>>>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
>>>>>> will cause several issues:
>>>>>>
>>>>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
>>>>>>   contention.
>>>>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
>>>>>>   watchdog
>>>>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
>>>>>>   when tso is disabled for lower device.
>>>>>>
>>>>>> Fix this by explicitly introducing a select queue method just for l2 forwarding
>>>>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
>>>>>> queue selecting and transmitting for l2 forwarding.
>>>>>>
>>>>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
>>>>>> to check txq against NULL in dev_hard_start_xmit().
>>>>>>
>>>>>> In the future, it was also required for macvtap l2 forwarding support since it
>>>>>> provides a necessary synchronization method.
>>>>>>
>>>>>> Cc: John Fastabend <john.r.fastabend@intel.com>
>>>>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>>>>> Cc: e1000-devel@lists.sourceforge.net
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> Instead of creating another operation here to do special queue selection, why
>>>>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
>>>>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
>>>>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their
>>>>> reserved for the device doing the transmitting).  It seems more clear to me than
>>>>> creating a new netdevice operation.  
>>>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
>>>> tx path"). The point is keep the tx path lockless to be efficient and
>>>> simplicity for management. And macvtap multiqueue was also implemented
>>>> with this assumption. The real contention should be done in the txq of
>>>> lower device instead of macvlan itself. This is also needed for
>>>> multiqueue macvtap.
>>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
>>> really buy us anything that I can see.  If a macvlan is using hardware
>>> acceleration, it needs to arbitrate access to that hardware.  Weather thats done
>>> by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
>>> itself is equivalent.  The decision to use dfwd hardware acceleration is made on
>>> open, so its not like theres any traffic that can avoid the lock, as it all goes
>>> through the hardware.  All I see that this has bought us is an extra net_device
>>> method (which isn't a big deal, but not necessecary as I see it).
>> As I replied to patch 1/2, looking at the code itself again. The locking
>> on the lowerdev's tx queue is really need since we need synchronize with
>> other control path. Two examples are dev watchdog and ixgbe_down() both
>> of which will try to hold tx lock to synchronize the with transmission.
>> Without holding the lowerdev tx lock, we may have more serious issues.
>> Also, it's a little strange for a net device has two modes. Future
>> developers need to care about two different tx lock paths which is sub
>> optimal.
>>
> Ok, having looked at this for a few hours, I agree, locking in the lowerdev has
> some definiate advantages in plugging the holes you've pointed out.
>
>> For the issue of an extra net_device method,  if you don't like we can
>> reuse the ndo_select_queue by also passing the accel_priv to that method.
> I do, that actually simplifies things, since it lets us use the entire
> dev_hard_start_xmit path unmodified, which gives us the locking your looking for
> without having to create a new slimmed down variant of dev_hard_start_xmit.
>
> Regards
> Neil

Right, will post V2.

WARNING: multiple messages have this Message-ID (diff)
From: Jason Wang <jasowang@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: mst@redhat.com, e1000-devel@lists.sourceforge.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Fastabend <john.r.fastabend@intel.com>,
	davem@davemloft.net
Subject: Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding
Date: Thu, 09 Jan 2014 16:28:49 +0800	[thread overview]
Message-ID: <52CE5DC1.8070807@redhat.com> (raw)
In-Reply-To: <20140108144025.GA17802@neilslaptop.think-freely.org>

On 01/08/2014 10:40 PM, Neil Horman wrote:
> On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
>> On 01/07/2014 09:17 PM, Neil Horman wrote:
>>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
>>>> On 01/06/2014 08:42 PM, Neil Horman wrote:
>>>>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
>>>>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
>>>>>> will cause several issues:
>>>>>>
>>>>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
>>>>>>   contention.
>>>>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
>>>>>>   watchdog
>>>>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
>>>>>>   when tso is disabled for lower device.
>>>>>>
>>>>>> Fix this by explicitly introducing a select queue method just for l2 forwarding
>>>>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
>>>>>> queue selecting and transmitting for l2 forwarding.
>>>>>>
>>>>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
>>>>>> to check txq against NULL in dev_hard_start_xmit().
>>>>>>
>>>>>> In the future, it was also required for macvtap l2 forwarding support since it
>>>>>> provides a necessary synchronization method.
>>>>>>
>>>>>> Cc: John Fastabend <john.r.fastabend@intel.com>
>>>>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>>>>> Cc: e1000-devel@lists.sourceforge.net
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> Instead of creating another operation here to do special queue selection, why
>>>>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
>>>>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
>>>>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their
>>>>> reserved for the device doing the transmitting).  It seems more clear to me than
>>>>> creating a new netdevice operation.  
>>>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
>>>> tx path"). The point is keep the tx path lockless to be efficient and
>>>> simplicity for management. And macvtap multiqueue was also implemented
>>>> with this assumption. The real contention should be done in the txq of
>>>> lower device instead of macvlan itself. This is also needed for
>>>> multiqueue macvtap.
>>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
>>> really buy us anything that I can see.  If a macvlan is using hardware
>>> acceleration, it needs to arbitrate access to that hardware.  Weather thats done
>>> by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
>>> itself is equivalent.  The decision to use dfwd hardware acceleration is made on
>>> open, so its not like theres any traffic that can avoid the lock, as it all goes
>>> through the hardware.  All I see that this has bought us is an extra net_device
>>> method (which isn't a big deal, but not necessecary as I see it).
>> As I replied to patch 1/2, looking at the code itself again. The locking
>> on the lowerdev's tx queue is really need since we need synchronize with
>> other control path. Two examples are dev watchdog and ixgbe_down() both
>> of which will try to hold tx lock to synchronize the with transmission.
>> Without holding the lowerdev tx lock, we may have more serious issues.
>> Also, it's a little strange for a net device has two modes. Future
>> developers need to care about two different tx lock paths which is sub
>> optimal.
>>
> Ok, having looked at this for a few hours, I agree, locking in the lowerdev has
> some definiate advantages in plugging the holes you've pointed out.
>
>> For the issue of an extra net_device method,  if you don't like we can
>> reuse the ndo_select_queue by also passing the accel_priv to that method.
> I do, that actually simplifies things, since it lets us use the entire
> dev_hard_start_xmit path unmodified, which gives us the locking your looking for
> without having to create a new slimmed down variant of dev_hard_start_xmit.
>
> Regards
> Neil

Right, will post V2.

------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

  reply	other threads:[~2014-01-09  8:29 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-06  3:21 [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap Jason Wang
2014-01-06  3:21 ` [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding Jason Wang
2014-01-06  3:21   ` Jason Wang
2014-01-06 12:04   ` [E1000-devel] " Jeff Kirsher
2014-01-06 12:04     ` Jeff Kirsher
2014-01-06 12:42   ` Neil Horman
2014-01-06 15:06     ` John Fastabend
2014-01-06 15:06       ` John Fastabend
2014-01-06 15:29       ` Neil Horman
2014-01-06 15:29         ` Neil Horman
2014-01-07  3:42     ` Jason Wang
2014-01-07  3:42       ` Jason Wang
2014-01-07 13:17       ` Neil Horman
2014-01-08  3:21         ` Jason Wang
2014-01-08  3:21           ` Jason Wang
2014-01-08 14:40           ` Neil Horman
2014-01-09  8:28             ` Jason Wang [this message]
2014-01-09  8:28               ` Jason Wang
2014-01-09 11:53               ` Neil Horman
2014-01-09 11:53                 ` Neil Horman
2014-01-07  8:22   ` John Fastabend
2014-01-07  8:37     ` John Fastabend
2014-01-06  7:35 ` [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap John Fastabend
2014-01-06  7:54   ` Jason Wang
2014-01-06 12:26     ` Neil Horman
2014-01-07  3:10       ` Jason Wang
2014-01-07  5:15         ` John Fastabend
2014-01-07  6:22           ` Jason Wang
2014-01-07  7:26             ` John Fastabend
2014-01-07  9:00               ` Jason Wang
2014-01-08 12:55                 ` Michael S. Tsirkin
2014-01-08 19:05                   ` John Fastabend
2014-01-09  7:17                     ` Michael S. Tsirkin
2014-01-09  8:55                       ` Jason Wang
2014-01-09 21:39                         ` Stephen Hemminger
2014-01-09 22:03                           ` Michael S. Tsirkin
2014-01-09 22:20                             ` Stephen Hemminger
2014-01-10  7:06                           ` Jason Wang
2014-01-10 16:40                             ` Vlad Yasevich
2014-01-07  5:16         ` John Fastabend
2014-01-06 20:47 ` David Miller
2014-01-07  3:17   ` Jason Wang
2014-01-07  5:57     ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52CE5DC1.8070807@redhat.com \
    --to=jasowang@redhat.com \
    --cc=davem@davemloft.net \
    --cc=e1000-devel@lists.sourceforge.net \
    --cc=john.r.fastabend@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.