From: Jason Wang <jasowang@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: mst@redhat.com, e1000-devel@lists.sourceforge.net,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
John Fastabend <john.r.fastabend@intel.com>,
davem@davemloft.net
Subject: Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding
Date: Thu, 09 Jan 2014 16:28:49 +0800 [thread overview]
Message-ID: <52CE5DC1.8070807@redhat.com> (raw)
In-Reply-To: <20140108144025.GA17802@neilslaptop.think-freely.org>
On 01/08/2014 10:40 PM, Neil Horman wrote:
> On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
>> On 01/07/2014 09:17 PM, Neil Horman wrote:
>>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
>>>> On 01/06/2014 08:42 PM, Neil Horman wrote:
>>>>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
>>>>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
>>>>>> will cause several issues:
>>>>>>
>>>>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
>>>>>> contention.
>>>>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
>>>>>> watchdog
>>>>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
>>>>>> when tso is disabled for lower device.
>>>>>>
>>>>>> Fix this by explicitly introducing a select queue method just for l2 forwarding
>>>>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
>>>>>> queue selecting and transmitting for l2 forwarding.
>>>>>>
>>>>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
>>>>>> to check txq against NULL in dev_hard_start_xmit().
>>>>>>
>>>>>> In the future, it was also required for macvtap l2 forwarding support since it
>>>>>> provides a necessary synchronization method.
>>>>>>
>>>>>> Cc: John Fastabend <john.r.fastabend@intel.com>
>>>>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>>>>> Cc: e1000-devel@lists.sourceforge.net
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> Instead of creating another operation here to do special queue selection, why
>>>>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
>>>>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
>>>>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their
>>>>> reserved for the device doing the transmitting). It seems more clear to me than
>>>>> creating a new netdevice operation.
>>>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
>>>> tx path"). The point is keep the tx path lockless to be efficient and
>>>> simplicity for management. And macvtap multiqueue was also implemented
>>>> with this assumption. The real contention should be done in the txq of
>>>> lower device instead of macvlan itself. This is also needed for
>>>> multiqueue macvtap.
>>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
>>> really buy us anything that I can see. If a macvlan is using hardware
>>> acceleration, it needs to arbitrate access to that hardware. Weather thats done
>>> by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
>>> itself is equivalent. The decision to use dfwd hardware acceleration is made on
>>> open, so its not like theres any traffic that can avoid the lock, as it all goes
>>> through the hardware. All I see that this has bought us is an extra net_device
>>> method (which isn't a big deal, but not necessecary as I see it).
>> As I replied to patch 1/2, looking at the code itself again. The locking
>> on the lowerdev's tx queue is really need since we need synchronize with
>> other control path. Two examples are dev watchdog and ixgbe_down() both
>> of which will try to hold tx lock to synchronize the with transmission.
>> Without holding the lowerdev tx lock, we may have more serious issues.
>> Also, it's a little strange for a net device has two modes. Future
>> developers need to care about two different tx lock paths which is sub
>> optimal.
>>
> Ok, having looked at this for a few hours, I agree, locking in the lowerdev has
> some definiate advantages in plugging the holes you've pointed out.
>
>> For the issue of an extra net_device method, if you don't like we can
>> reuse the ndo_select_queue by also passing the accel_priv to that method.
> I do, that actually simplifies things, since it lets us use the entire
> dev_hard_start_xmit path unmodified, which gives us the locking your looking for
> without having to create a new slimmed down variant of dev_hard_start_xmit.
>
> Regards
> Neil
Right, will post V2.
------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today.
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
next prev parent reply other threads:[~2014-01-09 8:28 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-06 3:21 [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap Jason Wang
2014-01-06 3:21 ` [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding Jason Wang
2014-01-06 12:04 ` Jeff Kirsher
2014-01-06 12:42 ` Neil Horman
2014-01-06 15:06 ` John Fastabend
2014-01-06 15:29 ` Neil Horman
2014-01-07 3:42 ` Jason Wang
2014-01-07 13:17 ` Neil Horman
2014-01-08 3:21 ` Jason Wang
2014-01-08 14:40 ` Neil Horman
2014-01-09 8:28 ` Jason Wang [this message]
2014-01-09 11:53 ` Neil Horman
2014-01-07 8:22 ` John Fastabend
2014-01-07 8:37 ` John Fastabend
2014-01-06 7:35 ` [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap John Fastabend
2014-01-06 7:54 ` Jason Wang
2014-01-06 12:26 ` Neil Horman
2014-01-07 3:10 ` Jason Wang
2014-01-07 5:15 ` John Fastabend
2014-01-07 6:22 ` Jason Wang
2014-01-07 7:26 ` John Fastabend
2014-01-07 9:00 ` Jason Wang
2014-01-08 12:55 ` Michael S. Tsirkin
2014-01-08 19:05 ` John Fastabend
2014-01-09 7:17 ` Michael S. Tsirkin
2014-01-09 8:55 ` Jason Wang
2014-01-09 21:39 ` Stephen Hemminger
2014-01-09 22:03 ` Michael S. Tsirkin
2014-01-09 22:20 ` Stephen Hemminger
2014-01-10 7:06 ` Jason Wang
2014-01-10 16:40 ` Vlad Yasevich
2014-01-07 5:16 ` John Fastabend
2014-01-06 20:47 ` David Miller
2014-01-07 3:17 ` Jason Wang
2014-01-07 5:57 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52CE5DC1.8070807@redhat.com \
--to=jasowang@redhat.com \
--cc=davem@davemloft.net \
--cc=e1000-devel@lists.sourceforge.net \
--cc=john.r.fastabend@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).