Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget

From: Alexander Duyck <alexander.h.duyck@intel.com>
To: David Miller <davem@davemloft.net>
Cc: bhutchings@solarflare.com, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, gospo@redhat.com
Subject: Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
Date: Mon, 22 Aug 2011 15:57:51 -0700
Message-ID: <4E52DEEF.40504@intel.com>
In-Reply-To: <20110822.135644.683110224886588181.davem@davemloft.net>

On 08/22/2011 01:56 PM, David Miller wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> Date: Mon, 22 Aug 2011 10:29:51 -0700
>
>> The only problem I was seeing with that was that in certain cases it
>> seemed like the TX cleanup could consume enough CPU time to cause
>> pretty significant delays in processing the RX cleanup.  This in turn
>> was causing single queue bi-directional routing tests to come out
>> pretty unbalanced since what seemed to happen is that one CPU's RX work
>> would overwhelm the other CPU with the TX processing resulting in an
>> unbalanced flow that was something like a 60/40 split between the
>> upstream and downstream throughput.
> But the problem is that now you're applying the budget to two operations
> that have much differing costs.  Freeing up a TX ring packet is probably
> on the order of 1/10th the cost of processing an incoming RX ring frame.
>
> I've advocated to not apply the budget at all to TX ring processing.

I fully understand that the TX path is much cheaper than the RX path.
One step I have taken in all of this code is that the TX path only
counts SKBs cleaned; it doesn't count descriptors.  So a single-descriptor
60-byte transmit costs the same as a 64K, 18-descriptor TSO.  All I am
really counting is the number of times I have called
dev_kfree_skb_any().
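
A rough userspace sketch of the per-SKB accounting described above may
help.  It is illustrative only and is not taken from the ixgbe source:
struct tx_packet, struct tx_clean_result and tx_clean_sketch() are
hypothetical names, and the real cleanup path walks hardware descriptors
rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a completed transmit: one skb that
 * occupied some number of descriptors on the TX ring. */
struct tx_packet {
	unsigned int desc_count;	/* descriptors used by this skb */
};

/* Result of one cleanup pass. */
struct tx_clean_result {
	unsigned int skbs_freed;	/* charged against the budget */
	unsigned int descs_reclaimed;	/* ring slots returned, not charged */
};

/*
 * Walk the completed packets and stop once "budget" skbs have been
 * freed.  Descriptors are reclaimed along the way but never counted
 * against the budget, so a 1-descriptor frame and an 18-descriptor
 * TSO each cost exactly one unit.
 */
static struct tx_clean_result
tx_clean_sketch(const struct tx_packet *done, size_t n_done,
		unsigned int budget)
{
	struct tx_clean_result res = { 0, 0 };
	size_t i;

	assert(done != NULL || n_done == 0);

	for (i = 0; i < n_done && res.skbs_freed < budget; i++) {
		res.descs_reclaimed += done[i].desc_count;
		res.skbs_freed++;	/* stands in for one dev_kfree_skb_any() */
	}
	return res;
}
```

With a budget of 2 and completed packets of 1, 18 and 1 descriptors,
this sketch frees two SKBs and reclaims 19 descriptors; the third
packet waits for the next pass.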

> I can see your dilemma with respect to RX ring processing being delayed,
> but if that's really happening you can consider whether the TX ring is
> simply too large.

The problem was occurring even without large rings.  I was seeing issues
with rings just 256 descriptors in size.  The problem seemed to be that
allowing the TX cleanup limit to be a multiple of the budget let one CPU
overwhelm the other, and because the TX work was essentially unbounded
the issue could feed back on itself.

In the routing test case I was actually seeing significant advantages to
this approach, as we were cleaning just about the right number of
buffers to make room for the next set of transmits when the RX cleanup
came through.  Since the RX and TX workloads were balanced, both stayed
locked into polling while the CPU was saturated instead of the TX
becoming interrupt driven.  And since the TX was working on the same
budget as the RX, the number of SKBs freed in the TX path would match
the number consumed when buffers were reallocated on the RX path.

> In any event can you try something like dampening the cost applied to
> budget for TX work (1/2, 1/4, etc.)?  Because as far as I can tell, if
> you are really hitting the budget limit on TX then you won't be doing
> any RX work on that device until a future NAPI round that depletes the
> TX ring work without going over the budget.

The problem seemed to be present as long as I allowed the TX budget to
be a multiple of the RX budget.  The easiest way to keep things balanced,
and to keep the TX from one CPU from overwhelming the RX on another,
was just to keep the budgets equal.
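
For reference, the dampening suggested in the quoted text (charging TX
work at 1/2, 1/4, etc. of a budget unit) could be sketched as below.
This is only an illustration under assumed names: tx_budget_charge()
and tx_skb_limit() are hypothetical helpers, not driver functions, and
the power-of-two shift is just one way to express the fraction.

```c
#include <assert.h>

/*
 * Hypothetical dampened accounting: each freed skb is charged
 * 1/(1 << shift) of a budget unit, rounded up to whole units.
 */
static unsigned int tx_budget_charge(unsigned int skbs_freed,
				     unsigned int shift)
{
	assert(shift < 8);	/* keep the sketch far from overflow */
	return (skbs_freed + (1u << shift) - 1) >> shift;
}

/*
 * Number of skbs the TX cleanup may free before it has been charged
 * "budget" units under the same dampening.
 */
static unsigned int tx_skb_limit(unsigned int budget, unsigned int shift)
{
	assert(shift < 8);
	return budget << shift;
}
```

With an example budget of 64 and a 1/4 cost (shift of 2), the TX
cleanup may free 256 SKBs per poll.  That makes the effective TX limit
a multiple of the RX budget, which is the configuration in which the
imbalance was observed.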

I'm a bit confused by this last comment.  The full budget is used for
both TX and RX; it isn't divided between them.  I do a budget's worth
of TX cleanup and a budget's worth of RX cleanup within the ixgbe_poll
routine, and if either of them consumes its full budget then I return
the budget value as the work done.
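
The poll logic just described can be sketched in userspace roughly as
follows.  This is a simplified model, not the ixgbe_poll code:
struct vector_sketch, clean_ring() and poll_sketch() are hypothetical,
and the value returned when neither ring uses its full budget (the RX
count here) is an assumption, since only the full-budget case is
spelled out above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vector state: work still pending on each ring. */
struct vector_sketch {
	unsigned int tx_pending;	/* skbs waiting to be freed */
	unsigned int rx_pending;	/* frames waiting to be processed */
};

/* Clean up to "budget" items from one ring; return how many were done. */
static unsigned int clean_ring(unsigned int *pending, unsigned int budget)
{
	unsigned int done = *pending < budget ? *pending : budget;

	*pending -= done;
	return done;
}

/*
 * TX and RX each get the full budget.  If either one uses all of it,
 * report the full budget so polling continues.  Otherwise report the
 * RX work done (an assumption made for this sketch).
 */
static unsigned int poll_sketch(struct vector_sketch *v,
				unsigned int budget, bool *keep_polling)
{
	unsigned int tx_done, rx_done;

	assert(budget > 0);

	tx_done = clean_ring(&v->tx_pending, budget);
	rx_done = clean_ring(&v->rx_pending, budget);

	*keep_polling = (tx_done == budget) || (rx_done == budget);
	return *keep_polling ? budget : rx_done;
}
```

In this model a backlog of 100 TX completions and 10 RX frames with a
budget of 64 is handled in two polls: the first cleans 64 TX and all 10
RX and reports the full budget, and the second cleans the remaining 36
TX.  RX is serviced on every poll regardless of how much TX work is
pending.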

If you are referring to the case where two devices are sharing the CPU 
then I would suspect this might lead to faster consumption of the 
netdev_budget, but other than that I don't see any starvation issues for 
RX or TX.
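
To illustrate the netdev_budget point, here is a simplified model of a
softirq pass in which every poll call is charged against one shared
total.  polls_per_pass() is a hypothetical helper; the real loop also
has a time limit, which this sketch ignores.

```c
#include <assert.h>

/*
 * Count how many poll calls fit into one pass when each poll reports
 * "work_per_poll" units against a shared "total_budget".  The pass
 * stops as soon as the shared budget is used up.
 */
static unsigned int polls_per_pass(int total_budget, int work_per_poll)
{
	unsigned int polls = 0;

	assert(work_per_poll > 0);

	while (total_budget > 0) {
		total_budget -= work_per_poll;
		polls++;
	}
	return polls;
}
```

Taking 300 as the shared total and 64 as the per-poll budget (example
values), polls that report the full 64 exhaust the pass after 5 calls,
while polls that report 16 allow 19 calls.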

Thanks,

Alex

Thread overview: 22+ messages
2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
2011-08-21  7:29 ` [net-next 01/10] ixgbe: Simplify transmit cleanup path Jeff Kirsher
2011-08-21  7:29 ` [net-next 02/10] ixgbe: convert rings from q_vector bit indexed array to linked list Jeff Kirsher
2011-08-21  7:29 ` [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget Jeff Kirsher
2011-08-21 14:01   ` Ben Hutchings
2011-08-22 16:30     ` Alexander Duyck
2011-08-22 16:46       ` Ben Hutchings
2011-08-22 17:29         ` Alexander Duyck
2011-08-22 20:56           ` David Miller
2011-08-22 22:57             ` Alexander Duyck [this message]
2011-08-22 23:40               ` David Miller
2011-08-23  4:04                 ` Alexander Duyck
2011-08-23 20:52                   ` Alexander Duyck
2011-08-21  7:29 ` [net-next 04/10] ixgbe: consolidate all MSI-X ring interrupts and poll routines into one Jeff Kirsher
2011-08-21  7:29 ` [net-next 05/10] ixgbe: cleanup allocation and freeing of IRQ affinity hint Jeff Kirsher
2011-08-21  7:29 ` [net-next 06/10] ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA Jeff Kirsher
2011-08-21  7:29 ` [net-next 07/10] ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt types Jeff Kirsher
2011-08-21  7:29 ` [net-next 08/10] ixgbe: Drop unnecessary adapter->hw dereference in loopback test setup Jeff Kirsher
2011-08-21  7:29 ` [net-next 09/10] ixgbe: combine PCI_VDEVICE and board declaration to same line Jeff Kirsher
2011-08-21  7:29 ` [net-next 10/10] ixgbe: Update TXDCTL configuration to correctly handle WTHRESH Jeff Kirsher
2011-08-23 22:45 ` [net-next 00/10][pull request] Intel Wired LAN Driver Update David Miller
2011-08-24  2:34   ` Jeff Kirsher