From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Greear
Subject: Re: pktgen and spin_lock_bh in xmit path
Date: Mon, 19 Oct 2009 21:52:05 -0700
Message-ID: <4ADD41F5.5080707@candelatech.com>
References: <4ADD309B.1040505@candelatech.com> <4ADD32FA.6030409@gmail.com>
Cc: NetDev
To: Eric Dumazet
In-Reply-To: <4ADD32FA.6030409@gmail.com>

Eric Dumazet wrote:
> Ben Greear a écrit :
>
>> I'm having strange issues when running pktgen on 10G interfaces while
>> also running pktgen on mac-vlans on that interface, when the mac-vlan
>> pktgen threads are on a different CPU.
>>
>> First, lockdep gives up and says that things are not properly
>> annotated. I believe this is because the macvlan tx path will lock its
>> txq and will also lock the lower-dev's txq. To fix this, perhaps we
>> need some new lockdep-aware primitives for netdev txq locking?
>>
>> Second, is using _bh() locking really sufficient if we have pktgen
>> writing to a physical device and also have other pktgen threads
>> writing to that same device through mac-vlans? I'm seeing deadlocks
>> spinning on the _bh() lock in pktgen as well as strange corruptions,
>> so I think there must be *some* problem somewhere, I just don't know
>> quite what it is yet.
>>
>
> Could you please give us a copy of your pktgen scripts?
>

I'm driving it with another program, and my pktgen is a bit hacked, but
the basic idea is:

1 pktgen connection on cpu 0 running as fast as it can (trying for
10Gbps, but getting maybe 3-4), running between two 10G ports (Intel
82599). Multi-pkt is set to 10,000 on each side.
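In stock-pktgen terms, the physical-port half of that setup looks roughly
like the sketch below. The interface name and destination MAC are
placeholders, and the "multi-pkt" knob in my hacked tree corresponds
roughly to clone_skb in mainline pktgen:

```shell
#!/bin/sh
# Rough sketch of one side of the physical-port pktgen setup.
# Assumes pktgen is loaded (modprobe pktgen) and the NIC is "eth2";
# adjust names/MACs for your hardware.

PGDEV=/proc/net/pktgen

pgset() {
    # $1 = pktgen /proc file, $2 = command string
    echo "$2" > "$PGDEV/$1"
}

# Bind the device to the pktgen kernel thread pinned to CPU 0.
pgset kpktgend_0 "rem_device_all"
pgset kpktgend_0 "add_device eth2"

pgset eth2 "pkt_size 1514"       # full-size frames, as in the test
pgset eth2 "count 0"             # 0 = run until stopped
pgset eth2 "clone_skb 10000"     # analogous to multi-pkt 10000
pgset eth2 "dst_mac 00:11:22:33:44:55"  # placeholder: peer port's MAC

# Start all pktgen threads (this write blocks while traffic runs).
pgset pgctrl "start"
```

The mac-vlan side would be the same shape but bound to kpktgend_4 with
clone_skb set to 1, since cloned skbs misbehave on virtual devices.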
3 pairs of mac-vlans on each of the two physical 10G ports. 3 pktgen
'connections' between these.. each running at about 1Gbps. These 3
pktgen connections are on CPU 4. Multi-pkt is set to 1, since multi-pkt
is a very bad idea on virtual devices.

1514 byte pkts. No IPs on the interfaces; using ToS in pktgen, but
nothing else is configured to care.

The two physical ports are cabled together directly with a fibre cable.

All pktgen connections are full duplex (both sides transmitting to each
other.. and I have rx logic to gather stats on received pkts as well).
With no kernel debugging, this can run right at 10Gbps bi-directional;
with lockdep it gets around 5-6Gbps in each direction.

The lockup often occurs near starting/stopping pktgen, but also happens
while just normally running under load, usually within 10 minutes.

I tried and failed to reproduce this on a 1G network, but maybe I'm just
not getting (un)lucky; I didn't try for too long.

Among other things, it appears as if the mac-vlan interfaces sometimes
become locked to transmit by pktgen, but a raw socket in user-space can
send on them fine. I'm going to add some debugging for this particular
issue tomorrow to try to figure out why that happens.

Please note I have the rest of my network patches applied (but not using
any proprietary modules), so it could easily be something I've caused. I
think fixing lockdep to work with the txq _bh locks would be a good
first step to fixing this..

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com