From: Stephen Hemminger
Subject: Re: IXGBE RX packet loss with 5+ cores
Date: Mon, 12 Oct 2015 22:18:30 -0700
Message-ID: <20151012221830.6f5f42af@xeon-e3>
To: "Sanford, Robert"
Cc: "dev@dpdk.org"

On Tue, 13 Oct 2015 02:57:46 +0000
"Sanford, Robert" wrote:

> I'm hoping that someone (perhaps at Intel) can help us understand
> an IXGBE RX packet loss issue we're able to reproduce with testpmd.
>
> We run testpmd with various numbers of cores. We offer line-rate
> traffic (~14.88 Mpps) to one ethernet port, and forward all received
> packets via the second port.
>
> When we configure 1, 2, 3, or 4 cores (per port, with the same number
> of RX queues per port), there is no RX packet loss. When we configure
> 5 or more cores, we observe the following packet loss (approximate):
> 5 cores - 3% loss
> 6 cores - 7% loss
> 7 cores - 11% loss
> 8 cores - 15% loss
> 9 cores - 18% loss
>
> All of the "lost" packets are accounted for in the device's Rx Missed
> Packets Count register (RXMPC[0]). Quoting the datasheet:
> "Packets are missed when the receive FIFO has insufficient space to
> store the incoming packet. This might be caused due to insufficient
> buffers allocated, or because there is insufficient bandwidth on the
> IO bus."
>
> RXMPC, and our use of the rx_descriptor_done API to verify that we
> don't run out of mbufs (discussed below), lead us to theorize that
> packet loss occurs because the device is unable to DMA all packets
> out of its internal packet buffer (512 KB, reported by register
> RXPBSIZE[0]) before that buffer overruns.
>
> Questions
> =========
> 1. The 82599 device supports up to 128 queues. Why do we see trouble
> with as few as 5 queues? What could prevent the system (and one port
> controlled by 5+ cores) from receiving at line rate without loss?
>
> 2. As far as we can tell, the RX path only touches the device
> registers when it updates a Receive Descriptor Tail register (RDT[n]),
> roughly once every rx_free_thresh packets. Is there a big difference
> between one core doing this and N cores doing it 1/N as often?
>
> 3. Do CPU reads/writes from/to device registers have a higher priority
> than device reads/writes from/to memory? Could the former transactions
> (CPU <-> device) significantly impede the latter (device <-> RAM)?
>
> Thanks in advance for any help you can provide.

As you add cores, there is more traffic on the PCI bus from each core
polling. There is a fixed number of PCI bus transactions per second
possible. Each core is increasing the number of useless (empty)
transactions.

Why do you think adding more cores will help?
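
To make the per-core polling pattern concrete, each forwarding lcore in
a setup like the one described runs something along the lines of the
loop below. This is a minimal sketch in the spirit of testpmd's io
forwarding mode, not its actual code; the function name fwd_loop, the
port numbers, the queue mapping, and the burst size are illustrative.

/*
 * Minimal per-core forwarding loop (sketch). Each lcore polls its own
 * RX queue on port 0 and forwards to the same queue index on port 1.
 */
#include <stdint.h>

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

static int
fwd_loop(void *arg)
{
	uint16_t q = (uint16_t)(uintptr_t)arg;	/* one RX/TX queue per lcore */
	struct rte_mbuf *pkts[BURST];

	for (;;) {
		/*
		 * Inside rx_burst the PMD refills descriptors and writes this
		 * queue's RDT doorbell roughly once every rx_free_thresh freed
		 * descriptors, so each polling core generates its own stream
		 * of doorbell writes to the device.
		 */
		uint16_t nb_rx = rte_eth_rx_burst(0, q, pkts, BURST);
		if (nb_rx == 0)
			continue;

		uint16_t nb_tx = rte_eth_tx_burst(1, q, pkts, nb_rx);
		while (nb_tx < nb_rx)	/* drop whatever TX could not take */
			rte_pktmbuf_free(pkts[nb_tx++]);
	}

	return 0;
}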
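
For reference, the rx_descriptor_done API mentioned in the quoted
report supports the kind of probe sketched below. The helper name and
the lookahead value are invented for this example; in the ixgbe PMD the
offset is measured from the driver's current read position in the ring.

#include <rte_ethdev.h>

/*
 * Returns 1 if the RX descriptor 'lookahead' entries ahead of the
 * driver's current position already has its DD bit set (the NIC has
 * written packets that software has not yet consumed), 0 if it has
 * not, and a negative value if the PMD does not support the check.
 */
static int
rx_ring_has_backlog(uint16_t port, uint16_t queue, uint16_t lookahead)
{
	return rte_eth_rx_descriptor_done(port, queue, lookahead);
}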