netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping
@ 2015-01-31 12:28 Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 1/4] enic: implement frag allocator Govindarajulu Varadarajan
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-01-31 12:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, benve, edumazet, ben, Govindarajulu Varadarajan

The following series addresses two problems in rq buff allocation.

* Memory wastage because of large 9k allocations using kmalloc:
  For a 9k mtu buffer, netdev_alloc_skb_ip_align internally calls kmalloc for
  sizes > 4096. For a 9k buff, kmalloc returns an order-2 allocation, i.e. 16k,
  of which we use only ~9k; 7k is wasted per buffer. Using the frag
  allocator in patch 1/4, we can fit three 9k buffs in a 32k allocation.
  A typical enic configuration has 8 rqs, each with a desc ring of size 4096.
  That's 8 * 4096 * (16*1024) = 512 MB. Using this frag allocator:
  8 * 4096 * (32*1024/3) = 341 MB. That's a saving of 171 MB of memory.

* Frequent dma_map() calls:
  We call dma_map() for every buff we allocate. When the iommu is on, this is
  very cpu intensive. In my testing, most of the cpu cycles are wasted
  spinning on spin_lock_irqsave(&iovad->iova_rbtree_lock, flags) in
  intel_map_page() .. -> ..__alloc_and_insert_iova_range()

  With this patch, we call dma_map() once per 32k allocation, i.e. once for
  every three 9k descs, and once for every twenty 1500-byte descs.

Here are the test results with 8 rqs, a 4096-entry ring and 9k mtu. The irq of
each rq is affinitized to a different CPU. Ran iperf with 32 threads over a
10G link, with the iommu on.

                CPU utilization         throughput

without patch   100%                    1.8 Gbps
with patch      13%                     9.8 Gbps

Govindarajulu Varadarajan (4):
  enic: implement frag allocator
  enic: Add rq allocation failure stats
  ethtool: add RX_ALLOC_ORDER to tunable
  enic: add ethtool support for changing alloc order

 drivers/net/ethernet/cisco/enic/enic.h         |  16 +++
 drivers/net/ethernet/cisco/enic/enic_ethtool.c |  17 +++
 drivers/net/ethernet/cisco/enic/enic_main.c    | 177 +++++++++++++++++++++----
 drivers/net/ethernet/cisco/enic/vnic_rq.c      |  13 ++
 drivers/net/ethernet/cisco/enic/vnic_rq.h      |   2 +
 drivers/net/ethernet/cisco/enic/vnic_stats.h   |   2 +
 include/uapi/linux/ethtool.h                   |   1 +
 net/core/ethtool.c                             |   5 +
 8 files changed, 209 insertions(+), 24 deletions(-)

-- 
2.2.2



Thread overview: 10+ messages
2015-01-31 12:28 [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping Govindarajulu Varadarajan
2015-01-31 12:28 ` [PATCH net-next 1/4] enic: implement frag allocator Govindarajulu Varadarajan
2015-01-31 12:28 ` [PATCH net-next 2/4] enic: Add rq allocation failure stats Govindarajulu Varadarajan
2015-01-31 12:28 ` [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable Govindarajulu Varadarajan
2015-02-03  3:21   ` David Miller
2015-02-03  9:49     ` Govindarajulu Varadarajan
2015-02-05 17:28       ` Govindarajulu Varadarajan
2015-01-31 12:28 ` [PATCH net-next 4/4] enic: add ethtool support for changing alloc order Govindarajulu Varadarajan
2015-02-02 15:56 ` [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping David Laight
2015-02-02 17:49   ` Govindarajulu Varadarajan
