From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Marinus, Dennis"
Subject: rte_pktmbuf_clone returning NULL while mempool shouldn't be full
Date: Mon, 13 Jul 2015 23:41:23 +0000
Message-ID: <1436830881421.51375@amazon.com>
To: "dev@dpdk.org"
List-Id: patches and discussions about DPDK

Hey,

I'm having some trouble calculating the right mempool & ring sizes for my
application. I'm getting mempool full errors even though that shouldn't be
possible, so I'm missing something. Can you help me figure out if my math is
correct?

My application is using a nic -> rx_core -> worker_core -> tx_core -> nic
model, with a packet sampling (pcap) core on the side. There are rings between
the worker cores and this pcap core, and the worker cores do an
rte_pktmbuf_clone when they want to sample an mbuf. The pcap core is
responsible for sending these mbufs out of kni interfaces.
Here are some numbers:

  number of nics: 2
  nic rx queue: 1024

  number of rx cores: 2
  rx rte_eth_rx_burst size: 64

  number of rings between rx_core & worker_core: 20
  ring size: 1024

  number of worker cores: 10
  worker_core rte_ring_sc_dequeue_burst size: 128

  number of rings between worker_core & tx_core: 20
  ring size: 512

  number of tx cores: 2
  tx_core rte_ring_sc_dequeue_burst size: 64

  number of nics: 2
  nic tx queue: 512

  number of pcap cores: 1
  number of rings between worker_core & pcap_core: 20
  ring size: 256
  pcap core rte_ring_sc_dequeue_burst size: 64

The per-lcore cache is disabled when creating the mempool.

So adding this all up (working backwards from tx to rx) gives me:

  2 full nic tx queues:                                 1024
  2 full local dequeue buffers in the tx cores:          128
  20 full rings between worker & tx:                   10240
  10 full local dequeue buffers in the worker cores:    1280
  20 full rings between rx & worker:                   20480
  2 full local dequeue buffers in the rx cores:          128
  2 full nic rx queues:                                 2048
  1 full local dequeue buffer in the pcap core:           64
  20 full rings between the worker & pcap core:         5120

Adding this all up gets me to 40512 elements. My mempool is created with
((1 << 16) - 1) or 65535 elements. I only have one mempool and the application
is restricted to a single numa node.

My mempool element size is set to something like 2K + headroom and we don't
have jumbo packets on the network, so each mbuf should only require one
element from the mempool.

What I'm seeing is that rte_pktmbuf_clone on the worker cores sometimes
returns NULL, and if I print rte_mempool_count immediately after that, I get
numbers back in the single or double digits.

Where are the rest of the free elements? According to my math I should have
on the order of 25K free elements in the mempool. What am I missing? What is
the recommended usage of a mempool?

- Dennis
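
For anyone who wants to recheck the tally above, here is a minimal sketch in Python (chosen only because this is plain arithmetic; it does not touch DPDK). The stage labels and variable names are illustrative and do not come from the application. It assumes the worst case, where every queue, ring and local burst buffer is completely full at the same time, with one mempool element per mbuf.

```python
# Worst-case count of mbufs in flight, using the figures from the message.
# Names are illustrative only; nothing here calls into DPDK.

# (label, number of instances, capacity of each instance in mbufs)
STAGES = [
    ("nic tx queues",                    2,   512),
    ("tx core dequeue buffers",          2,    64),
    ("rings between worker & tx",       20,   512),
    ("worker core dequeue buffers",     10,   128),
    ("rings between rx & worker",       20,  1024),
    ("rx core burst buffers",            2,    64),
    ("nic rx queues",                    2,  1024),
    ("pcap core dequeue buffer",         1,    64),
    ("rings between worker & pcap",     20,   256),
]

# Pool size as stated in the message: (1 << 16) - 1 elements.
POOL_SIZE = (1 << 16) - 1


def worst_case_in_flight(stages):
    """Sum instances * capacity over every stage that can hold mbufs."""
    return sum(count * size for _, count, size in stages)


if __name__ == "__main__":
    for label, count, size in STAGES:
        print(f"{label:32s} {count:3d} x {size:5d} = {count * size:6d}")
    total = worst_case_in_flight(STAGES)
    print(f"{'total in flight (worst case)':32s} {total:21d}")
    print(f"{'expected free elements':32s} {POOL_SIZE - total:21d}")
```

Under those assumptions the script reproduces the 40512 total and leaves 25023 elements unaccounted for, so the arithmetic itself holds; it only covers the places listed in the message.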