Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/2] Improve sequence number generation.
From: George Spelvin @ 2011-08-20 23:39 UTC (permalink / raw)
  To: davem, linux, mpm; +Cc: dan, gerrit, herbert, linux-kernel, netdev, w
In-Reply-To: <20110816103028.1360.qmail@science.horizon.com>

(Apologies if I'm obverbroad on the Cc: list.)

I've beeen concerned by the recent change to initial sequence number
generation, from a time-varying 24-bit hash of the endpoint addresses
to a fixed 32-bit hash.

First of all, my apologies that I didn't see this when it was posted
for comment August 7; I only noticed when I tried to merge some local
experiments with -stable and found a conflict in drivers/random.c.

My concern primarily is that the local secret used to compute the hashes
is generated very early in the boot sequence, before any significant
amount of entropy is accumulated.  And since it's constant for the uptime
of the machine, an attacker has a considerable length of time to find
and explot the secret value.

While the increase to 32 bits is definitely desirable, and defends
against a much less sophisticated attack, I'm concerned that this is a
case of robbing Peter to pay Paul.

Trying to improve this, I'm working in a few directions:
1) Postpone the seeding as late in the boot process as possible.
   It's quite low-overhead to generate it only when the first TCP
   connection is made, which hopefully is preceded by running
   init and at least a little bit of device driver activity.

2) Do *both*: Use a fixed 32-bit offset *plus* a time-varing one.
   They can be added together and provide the security advantages of
   both.  The only cost is having to compute two hashes per SYN.

   The main problem here is coming up with a hash function fast enough
   that computing both hashes is no slower than one MD5 invocation.

3) Extend the 24-bit time-varying hash to a 28-bit one.
   This can cause the sequence numbers to wrap in 7/8 of the time
   they would with a fixed offset, but that doesn't seem too bad.
   (That's worst case; it's a triangular distribution centered
   on 15/16.)

It's relatively easy to hash quickly with 15 64-bit registers, but doing
it with 7 32-bit registers is decidedly trickier.

I'm currently playing with a 36-round 6x32-bit variant of the SHA-3
candidate Skein.  I haven't run the genetic algorithm to select optimal
rotation constants, but they shouldn't affect the timing.
(I'm also going to ask the Skein team to look over my work.)

So far, it is notably faster than MD5 (89 ns/hash vs. 148 on a 2.5 GHz
Phenom), as well as being much smaller (383 bytes as opposed to 1951 for
the core transform).  One limitation is that it only hashes 6 32-bit
words per transform.  Thus, IPv6 would need to use two iterations,
or go back to MD5.

As mentioned, we can use a different algorithm for 64-bit processors.
Or even 32-bit ones with more registers.  So the speed problem only
exists for IPv6 on 32-bit x86.

(For example, on a 64-bit processor, two parallel MD5 tranforms
can be computed in barely more time than one.)

A few questions, all related to performance requirements:
* Should I worry about 32-bit x86 performance at all, since it's
  pretty unlikely that a 32-bit machine will be running traffic levels
  (1000+ connections/sec) where it matters?
* Should I worry about 32-bit IPv6 performance, since that's even more
  unlikely to be running heavy loads on 32-bit hardware?
* If yes, is this fast enough to be acceptable, or do I need to work
  harder to find more speed?

Willy, apparently you did some benchmarking of various hash functions.
Is that data available somewhere?  Even if not, just a brief description
of the methodology and assumptions would help to make sure I'm measuring
in a reasonable way.

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: David Miller @ 2011-08-20 23:44 UTC (permalink / raw)
  To: linux; +Cc: mpm, dan, gerrit, herbert, linux-kernel, netdev, w
In-Reply-To: <20110820233951.6428.qmail@science.horizon.com>

From: "George Spelvin" <linux@horizon.com>
Date: 20 Aug 2011 19:39:51 -0400

> While the increase to 32 bits is definitely desirable, and defends
> against a much less sophisticated attack, I'm concerned that this is a
> case of robbing Peter to pay Paul.

I disagree, attacking this random number selection is much more theoretical
than the brute force attacks possible on 24-bits of entropy.

Show me a usable attack on a real system, then we can talk.

By comparison, real attacks against the 24-bit value have been
demonstrated.

> 2) Do *both*: Use a fixed 32-bit offset *plus* a time-varing one.
>    They can be added together and provide the security advantages of
>    both.  The only cost is having to compute two hashes per SYN.
> 
>    The main problem here is coming up with a hash function fast enough
>    that computing both hashes is no slower than one MD5 invocation.

Doubling the hashing cost is a non-starter.   Going to MD5 itself was
a huge lose, and was right at the brink of acceptable performance loss.

This whole change was nearly nixed because of the cost introduced
merely by going to MD5.

> 3) Extend the 24-bit time-varying hash to a 28-bit one.
>    This can cause the sequence numbers to wrap in 7/8 of the time
>    they would with a fixed offset, but that doesn't seem too bad.
>    (That's worst case; it's a triangular distribution centered
>    on 15/16.)

I want to stay with a 32-bits of entropy, thank you very much.

^ permalink raw reply

* Re: inetd and Linux kernel 3.0
From: Dâniel Fraga @ 2011-08-20 23:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110820.163312.1686705823210711719.davem@davemloft.net>

On Sat, 20 Aug 2011 16:33:12 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> It is trying to bind to an ipv6 address using an ipv4 socket.

	Thanks David! Exactly. I compiled inetd with ipv6 disabled and
now everything is fine.

	Thank you very much!

-- 

^ permalink raw reply

* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Update
From: David Miller @ 2011-08-21  0:29 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 19 Aug 2011 06:11:20 -0700

> The following series contains updates to e1000e and ixgbe.
> 
> The following are changes since commit ae1511bf769cafeae5ab61aaf9947a16a22cbd10:
>   net: rps: support PPPOE session messages
> and are available in the git repository at:
>   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

I had done a net --> net-next merge right before pulling this so there
was a slight merge conflict, which I think I resolved correctly.

Please double-check my work.

Thanks.

^ permalink raw reply

* Re: [PATCH] net: add APIs for manipulating skb page fragments.
From: David Miller @ 2011-08-21  0:31 UTC (permalink / raw)
  To: ian.campbell; +Cc: netdev, linux-kernel, eric.dumazet, mirq-linux
In-Reply-To: <1313771100-22993-1-git-send-email-ian.campbell@citrix.com>

From: Ian Campbell <ian.campbell@citrix.com>
Date: Fri, 19 Aug 2011 17:25:00 +0100

> The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
> bare get/put_page on SKB pages fragments and to isolate users from subsequent
> changes to the skb_frag_t data structure.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

You're going to have to protect all of the things using the interfaces
from linux/dma-mapping.h with CONFIG_HAS_DMA otherwise it won't build
on platforms like S390.

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: George Spelvin @ 2011-08-21  0:49 UTC (permalink / raw)
  To: davem, linux; +Cc: dan, gerrit, herbert, linux-kernel, mpm, netdev, w
In-Reply-To: <20110820.164436.684817976385970137.davem@davemloft.net>

>> While the increase to 32 bits is definitely desirable, and defends
>> against a much less sophisticated attack, I'm concerned that this is a
>> case of robbing Peter to pay Paul.

> I disagree, attacking this random number selection is much more theoretical
> than the brute force attacks possible on 24-bits of entropy.

Can you explain more precisely what you disagree with?

What you state after the comma appears to be agreeing with what I
wrote (it seems like a restatement of my first two clauses), so I'm
unenlightened as to where the disagreement is.

I'm not saying you didn't address a real problem, just that fixing
one problem exposed another, and it would be nice to address *both*.

> Show me a usable attack on a real system, then we can talk.

If you like.  It's about a week of implementation work.  (And I
don't have 1 week/week of free time, so more than that elapsed.)

> By comparison, real attacks against the 24-bit value have been
> demonstrated.

Anywhere that I can see?

>> 2) Do *both*: Use a fixed 32-bit offset *plus* a time-varing one.
>>    They can be added together and provide the security advantages of
>>    both.  The only cost is having to compute two hashes per SYN.
>> 
>>    The main problem here is coming up with a hash function fast enough
>>    that computing both hashes is no slower than one MD5 invocation.

> Doubling the hashing cost is a non-starter.   Going to MD5 itself was
> a huge lose, and was right at the brink of acceptable performance loss.
> 
> This whole change was nearly nixed because of the cost introduced
> merely by going to MD5.

Okay, I'll make certain a proposed solution is strictly faster than MD5.
I was asking about performance goals, and you've given me an answer.
Thank you very much!

The patch comment was fairly offhand about the performance cost, and
prior discussion was apparently private, so it wasn't clear how much
pain people experienced.

My only other question is whether IPv6 on x86-32 specificaly needs to be
faster than MD5.  Is that negotiable, or is that also a hard limit?
(This is challenging because it's trying to hash 288 bits of address material
in 224 bits of available registers.)

Eureka!  The possible source addresses are very limited.  It's possible to
pre-hash them, then you only have 160 bits of per-connection variability,
which can fit in a second hash block.

This requires finding somewhere in the network stack to store the
pre-hashed IPv6 addresses, as well as a fallback to use when spoofing
other source addresses, but that shouldn't be TOO difficult.

>> 3) Extend the 24-bit time-varying hash to a 28-bit one.
>>    This can cause the sequence numbers to wrap in 7/8 of the time
>>    they would with a fixed offset, but that doesn't seem too bad.
>>    (That's worst case; it's a triangular distribution centered
>>    on 15/16.)

> I want to stay with a 32-bits of entropy, thank you very much.

My goal is to give you *both*.  32 bits fixed + 28 bits time-varying.
An attacker would have to cryptanalyze the 32 bits (which the 28 bits
makes harder) *and* brute-force the 28 bits.

(It's almost certainly simpler to brute-force 32 bits.)


Thank you for your response!

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: Willy Tarreau @ 2011-08-21  1:28 UTC (permalink / raw)
  To: George Spelvin; +Cc: davem, mpm, dan, gerrit, herbert, linux-kernel, netdev
In-Reply-To: <20110820233951.6428.qmail@science.horizon.com>

Hi George,

On Sat, Aug 20, 2011 at 07:39:51PM -0400, George Spelvin wrote:
(...)
> A few questions, all related to performance requirements:
> * Should I worry about 32-bit x86 performance at all, since it's
>   pretty unlikely that a 32-bit machine will be running traffic levels
>   (1000+ connections/sec) where it matters?

1000 connections per second is a moderately low load even for a
32-bit machine. I'm used to play in the 10-100k/s range on 32-bit,
depending on the usage pattern, I even reached 300k/s on an anti-ddos
machine. So yes, performance matters a lot, especially when we risk
to slow down one small operation that is done many times a second.

> * Should I worry about 32-bit IPv6 performance, since that's even more
>   unlikely to be running heavy loads on 32-bit hardware?

On x86 you're probably right, but there are other very fast platforms
such as ARM, which are used to build routers or appliances, and which
are 32-bit and there it may matter.

> * If yes, is this fast enough to be acceptable, or do I need to work
>   harder to find more speed?

I'd suggest that the most important is no performance regression. Probably
that if you can bring something which brings back what we lost with MD5,
your work would gain interest.

> Willy, apparently you did some benchmarking of various hash functions.
> Is that data available somewhere?  Even if not, just a brief description
> of the methodology and assumptions would help to make sure I'm measuring
> in a reasonable way.

I'm copy-pasting here the memo I exchanged in private after my tests, there
is nothing secret in it, so better post the whole explanation :

-------------------------------------------------------------------------
I did an ugly patch which consists in replacing calls to md5_transform()
with sha_transform() in secure_ip_id(), secure_tcp_sequence_number(),
secure_ipv4_port_ephemeral() on top of David's patches. I kept the same
hashing method, without calling sha_init() and by filling the hash with
net_secret, eg :

@@ -104,28 +107,32 @@ __u32 secure_ipv6_id(const __be32 daddr[4])
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
                                 __be16 sport, __be16 dport)
 {
-       u32 hash[MD5_DIGEST_WORDS];
+       u32 hash[SHA_DIGEST_WORDS];
+       u32 workspace[SHA_WORKSPACE_WORDS];

        hash[0] = (__force u32)saddr;
        hash[1] = (__force u32)daddr;
        hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-       hash[3] = net_secret[15];
+       hash[3] = net_secret[14];
+       hash[4] = net_secret[15];

-       md5_transform(hash, net_secret);
+       sha_transform(hash, (const char *)net_secret, workspace);

        return seq_scale(hash[0]);
 }


With this I could run tests on mainline (called "MD4" below), David's code
("MD5") and the transform above ("SHA1"). The tests involved connecting
from the test machine to an external HTTP server and retrieving an empty
object. This test was followed by two other series, one on a server which
immediately resets upon accept (to reproduce the SYN, SYN/ACK, ACK, RST
sequence I'm used to encounter when setting up anti-DDoS filters), and
a SYN, RST sequence caused by sending the traffic to a closed port, in
order to more accurately observe the differences.

I switch the test machine to an Atom N450 running in 64-bit mode in order
to benefit from the SHA1 optimizations.

Numbers are in connections per second.

kernel   http   RST server   closed port
-------+------+------------+------------
 MD4     9610      7840       16950
 MD5     9340      7560       16360
 SHA1    9250      7280       15400

In HTTP, performance drops by 2.8% when switching to MD5, and by 3.75
when using SHA1 instead. With the reset server, MD5 takes a 3.6% hit
and SHA1 7.15%. On the closed port test, which sees only SYN and RST
packets, MD5 takes a 3.5% hit and SHA1 a 9.15% one.

Note that the biggest hit was still the 2.6.35.11 -> 3.0-git upgrade,
because HTTP gives me 10040 cps in 2.6.35.11. I think it's the compiler
and not the kernel : I used to build 2.6.35 with gcc-3.4 but had to
use a more recent toolchain (gcc 4.4) with 3.0 due to cmpxchg16b, and
my experience with gcc has always been a noticeable performance loss
with each new version, so that seems consistent...

All in all, while the SHA1 cost becomes concerning, it could be used
as an alternative to MD5 when we add a sysctl to select between
performance and security.
-------------------------------------------------------------------------

Note that this wasn't the best machine for the test, but it was available
and moreover it required little additional hardware to saturate it ;-)

Best regards,
Willy

^ permalink raw reply

* strange routing issue--packets stop getting forwarded for a live connection
From: Corey Hickey @ 2011-08-21  2:15 UTC (permalink / raw)
  To: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 3053 bytes --]

Hi,

Please forgive me for asking a user question on a dev list; does the
linux-net list no longer exist? Majordomo wouldn't subscribe me and I
see no recent history in the archives. If there's a better place for
this question, please tell me. Anyway:

I have a strange issue where, reliably, certain conditions cause my
Linux router to stop forwarding packets for a connection.

----------------------------------------------------------------------

This is my setup:

client      --> linux router          --> vpn --> work desktop
198.18.0.3      198.18.0.1    (eth0)              192.168.10.88
                192.168.6.230 (tun0)

All hosts are running Debian Sid with the stock Debian 3.0.0-1-amd64
kernel. tun0 is set up by openconnect (open-source client for cisco
anyconnnect), which has been historically reliable for me.

I noticed this problem happening when I replaced the router with a new
host. The old host was 32-bit, running Linux 2.6.38, and configured
identically (I think) with respect to routing and iptables. I didn't
have a problem then.

----------------------------------------------------------------------

I have seen this problem happen with http, sometimes, but the easiest
way to reproduce the issue every time is to use SSH with X11 forwarding
(I have no idea why). I can SSH, through my router and VPN connection,
to my desktop at work. I can log in, poke around, do whatever; as soon
as I run some particular X11 programs, the connection hangs. xlogo and
xeyes are fine, but rxvt and jconsole are not.

So, my baseline test is to run rxvt directly. This command always hangs:

$ ssh -X chickey@192.168.10.88 rxvt

I have run simultaneous tcpdumps on the router: one on eth0 and the
other on tun0. I see the tcp connection and ssh sessions get set up,
then many encrypted packets go back and forth. At a certain, reliably
reproducible point, a 1368 byte packet comes in on eth0 and does not
leave tun0; the retransmissions do not get forwarded either.

I have not been able to figure out the cause of this. Here's what I have
investigated:

1. Number of packets on the connection; doesn't seem to matter, because
I can use SSH for other purposes just fine.

2. Transmission rate; doesn't seem to matter, because I can do
$ ssh -X chickey@192.168.10.88 cat /dev/zero > /dev/null

3. MTU size; 1500 on eth0 and 1406 on tun0. Bigger packets have been
transferred fine.

4. VPN client bug; maybe, but I don't think so yet. I can do the same
thing if I SSH directly from the router. This is fine:
ssh -X 198.18.0.1 "ssh -X chickey@192.168.10.88 rxvt"

5. Connection tracking issue; conntrack shows no change in stage for the
connection when it hangs.

6. Some firewall rule. Stripping down my iptables setup to the minimum
does not help. I have also removed all qdiscs.

----------------------------------------------------------------------

Can anybody please suggest something else I should try here? This is
very confusing to me.

I am attaching a tarball of tcpdumps and other pertinent information.


Thank you,
Corey

[-- Attachment #2: problem.tar.bz2 --]
[-- Type: application/octet-stream, Size: 23175 bytes --]

^ permalink raw reply

* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Update
From: Jeff Kirsher @ 2011-08-21  2:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, gospo@redhat.com
In-Reply-To: <20110820.172919.1482966002228420753.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

On Sat, 2011-08-20 at 17:29 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Fri, 19 Aug 2011 06:11:20 -0700
> 
> > The following series contains updates to e1000e and ixgbe.
> > 
> > The following are changes since commit ae1511bf769cafeae5ab61aaf9947a16a22cbd10:
> >   net: rps: support PPPOE session messages
> > and are available in the git repository at:
> >   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master
> 
> I had done a net --> net-next merge right before pulling this so there
> was a slight merge conflict, which I think I resolved correctly.
> 
> Please double-check my work.
> 
> Thanks.

I knew the e1000e driver version would conflict as soon as a I saw you
updated against net.

e1000e and ixgbe look good, thanks Dave.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: George Spelvin @ 2011-08-21  3:04 UTC (permalink / raw)
  To: linux, w; +Cc: dan, davem, gerrit, herbert, linux-kernel, mpm, netdev
In-Reply-To: <20110821012844.GA15222@1wt.eu>

>> * Should I worry about 32-bit IPv6 performance, since that's even more
>>   unlikely to be running heavy loads on 32-bit hardware?

> On x86 you're probably right, but there are other very fast platforms
> such as ARM, which are used to build routers or appliances, and which
> are 32-bit and there it may matter.

Thanks for the feedback.  Your point about routers is well-taken.
It's particularly x86-32 which gives me fits, but I'll keep ARM
performance in mind, too.

>> * If yes, is this fast enough to be acceptable, or do I need to work
>>   harder to find more speed?

> I'd suggest that the most important is no performance regression. Probably
> that if you can bring something which brings back what we lost with MD5,
> your work would gain interest.

Okay, I'll go back to the drawing board on performance.

Damn, this is going to be tough.

> I'm copy-pasting here the memo I exchanged in private after my tests, there
> is nothing secret in it, so better post the whole explanation :

Thank you very much.  It helps me figure out what the time budget is.

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: Ted Ts'o @ 2011-08-21  3:27 UTC (permalink / raw)
  To: George Spelvin; +Cc: w, dan, davem, gerrit, herbert, linux-kernel, mpm, netdev
In-Reply-To: <20110821030415.18106.qmail@science.horizon.com>

Here's a random thought --- it won't help on anything other than
modern x86's, but who's to say we have to use the same algorithm on
all platforms?  Does the AES-NI facility provide enough of a speedup
that it's worth using it instead of MD5, at least on modern x86
systems which have this support?

					- Ted

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: Herbert Xu @ 2011-08-21  4:02 UTC (permalink / raw)
  To: Ted Ts'o, George Spelvin, w, dan, davem, gerrit, linux-kernel,
	mpm, net
In-Reply-To: <20110821032753.GA8992@thunk.org>

On Sat, Aug 20, 2011 at 11:27:53PM -0400, Ted Ts'o wrote:
> Here's a random thought --- it won't help on anything other than
> modern x86's, but who's to say we have to use the same algorithm on
> all platforms?  Does the AES-NI facility provide enough of a speedup
> that it's worth using it instead of MD5, at least on modern x86
> systems which have this support?

It is fast but it also touches SSE state.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: strange routing issue--packets stop getting forwarded for a live connection
From: Julian Anastasov @ 2011-08-21  6:35 UTC (permalink / raw)
  To: Corey Hickey; +Cc: Linux Netdev List
In-Reply-To: <4E506A46.6060407@fatooh.org>


	Hello,

On Sat, 20 Aug 2011, Corey Hickey wrote:

> Hi,
> 
> Please forgive me for asking a user question on a dev list; does the
> linux-net list no longer exist? Majordomo wouldn't subscribe me and I
> see no recent history in the archives. If there's a better place for
> this question, please tell me. Anyway:
> 
> I have a strange issue where, reliably, certain conditions cause my
> Linux router to stop forwarding packets for a connection.
> 
> ----------------------------------------------------------------------
> 
> This is my setup:
> 
> client      --> linux router          --> vpn --> work desktop
> 198.18.0.3      198.18.0.1    (eth0)              192.168.10.88
>                 192.168.6.230 (tun0)
> 
> All hosts are running Debian Sid with the stock Debian 3.0.0-1-amd64
> kernel. tun0 is set up by openconnect (open-source client for cisco
> anyconnnect), which has been historically reliable for me.
> 
> I noticed this problem happening when I replaced the router with a new
> host. The old host was 32-bit, running Linux 2.6.38, and configured
> identically (I think) with respect to routing and iptables. I didn't
> have a problem then.
> 
> ----------------------------------------------------------------------
> 
> I have seen this problem happen with http, sometimes, but the easiest
> way to reproduce the issue every time is to use SSH with X11 forwarding
> (I have no idea why). I can SSH, through my router and VPN connection,
> to my desktop at work. I can log in, poke around, do whatever; as soon
> as I run some particular X11 programs, the connection hangs. xlogo and
> xeyes are fine, but rxvt and jconsole are not.
> 
> So, my baseline test is to run rxvt directly. This command always hangs:
> 
> $ ssh -X chickey@192.168.10.88 rxvt
> 
> I have run simultaneous tcpdumps on the router: one on eth0 and the
> other on tun0. I see the tcp connection and ssh sessions get set up,
> then many encrypted packets go back and forth. At a certain, reliably
> reproducible point, a 1368 byte packet comes in on eth0 and does not
> leave tun0; the retransmissions do not get forwarded either.
> 
> I have not been able to figure out the cause of this. Here's what I have
> investigated:
> 
> 1. Number of packets on the connection; doesn't seem to matter, because
> I can use SSH for other purposes just fine.
> 
> 2. Transmission rate; doesn't seem to matter, because I can do
> $ ssh -X chickey@192.168.10.88 cat /dev/zero > /dev/null
> 
> 3. MTU size; 1500 on eth0 and 1406 on tun0. Bigger packets have been
> transferred fine.

	Lower MTU, it can be PMTUD problem. At 04:50:24.112658
I see 7801:9169 is 1420 bytes and no ICMP FRAG NEEDED is generated.
May be these two regressions explain it:

http://marc.info/?l=linux-netdev&m=131342172722536&w=2

	There are 2 fixes you can try or more recent kernel
tree, for example 3.1-rc2 has the fixes.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [net-next 00/10][pull request] Intel Wired LAN Driver Update
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo

The following series contains updates to ixgbe.

The following are changes since commit ca1ba7caa68520864e4b9227e67f3bbc6fed373b:
  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next
and are available in the git repository at:
  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (10):
  ixgbe: Simplify transmit cleanup path
  ixgbe: convert rings from q_vector bit indexed array to linked list
  ixgbe: Drop the TX work limit and instead just leave it to budget
  ixgbe: consolidate all MSI-X ring interrupts and poll routines into
    one
  ixgbe: cleanup allocation and freeing of IRQ affinity hint
  ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA
  ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt
    types
  ixgbe: Drop unnecessary adapter->hw dereference in loopback test
    setup
  ixgbe: combine PCI_VDEVICE and board declaration to same line
  ixgbe: Update TXDCTL configuration to correctly handle WTHRESH

 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   11 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   25 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  738 +++++++---------------
 3 files changed, 239 insertions(+), 535 deletions(-)

-- 
1.7.6


^ permalink raw reply

* [net-next 01/10] ixgbe: Simplify transmit cleanup path
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch helps to simplify the work being done by the transmit path by
removing the unnecessary compares between count and the work limit.  Instead
we can simplify this by just adding a budget value that will act as a count
down from the work limit value.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e8aad76..e5a4eb6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -804,13 +804,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
+	u16 budget = q_vector->tx.work_limit;
 	u16 i = tx_ring->next_to_clean;
-	u16 count;
 
 	tx_buffer = &tx_ring->tx_buffer_info[i];
 	tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
 
-	for (count = 0; count < q_vector->tx.work_limit; count++) {
+	for (; budget; budget--) {
 		union ixgbe_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
 
 		/* if next_to_watch is not set then there is no work pending */
@@ -891,11 +891,11 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		ixgbe_tx_timeout_reset(adapter);
 
 		/* the adapter is about to reset, no point in enabling stuff */
-		return true;
+		return budget;
 	}
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(count && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
@@ -908,7 +908,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		}
 	}
 
-	return count < q_vector->tx.work_limit;
+	return budget;
 }
 
 #ifdef CONFIG_IXGBE_DCA
-- 
1.7.6


^ permalink raw reply related

* [net-next 02/10] ixgbe: convert rings from q_vector bit indexed array to linked list
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change converts the current bit array into a linked list so that the
q_vectors can simply go through ring by ring and locate each ring needing
to be cleaned.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  192 ++++++++-----------------
 2 files changed, 60 insertions(+), 139 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 378ce46..dc3b12e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -209,6 +209,7 @@ enum ixbge_ring_state_t {
 #define clear_ring_rsc_enabled(ring) \
 	clear_bit(__IXGBE_RX_RSC_ENABLED, &(ring)->state)
 struct ixgbe_ring {
+	struct ixgbe_ring *next;	/* pointer to next ring in q_vector */
 	void *desc;			/* descriptor ring memory */
 	struct device *dev;             /* device for DMA mapping */
 	struct net_device *netdev;      /* netdev ring belongs to */
@@ -277,11 +278,7 @@ struct ixgbe_ring_feature {
 } ____cacheline_internodealigned_in_smp;
 
 struct ixgbe_ring_container {
-#if MAX_RX_QUEUES > MAX_TX_QUEUES
-	DECLARE_BITMAP(idx, MAX_RX_QUEUES);
-#else
-	DECLARE_BITMAP(idx, MAX_TX_QUEUES);
-#endif
+	struct ixgbe_ring *ring;	/* pointer to linked list of rings */
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
 	u16 work_limit;			/* total work allowed per interrupt */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e5a4eb6..bb54d3d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -974,26 +974,17 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter,
 static void ixgbe_update_dca(struct ixgbe_q_vector *q_vector)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
+	struct ixgbe_ring *ring;
 	int cpu = get_cpu();
-	long r_idx;
-	int i;
 
 	if (q_vector->cpu == cpu)
 		goto out_no_update;
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		ixgbe_update_tx_dca(adapter, adapter->tx_ring[r_idx], cpu);
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
+	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+		ixgbe_update_tx_dca(adapter, ring, cpu);
 
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		ixgbe_update_rx_dca(adapter, adapter->rx_ring[r_idx], cpu);
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
+	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
+		ixgbe_update_rx_dca(adapter, ring, cpu);
 
 	q_vector->cpu = cpu;
 out_no_update:
@@ -1546,7 +1537,7 @@ static int ixgbe_clean_rxonly(struct napi_struct *, int);
 static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_q_vector *q_vector;
-	int i, q_vectors, v_idx, r_idx;
+	int q_vectors, v_idx;
 	u32 mask;
 
 	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
@@ -1556,33 +1547,19 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 	 * corresponding register.
 	 */
 	for (v_idx = 0; v_idx < q_vectors; v_idx++) {
+		struct ixgbe_ring *ring;
 		q_vector = adapter->q_vector[v_idx];
-		/* XXX for_each_set_bit(...) */
-		r_idx = find_first_bit(q_vector->rx.idx,
-				       adapter->num_rx_queues);
-
-		for (i = 0; i < q_vector->rx.count; i++) {
-			u8 reg_idx = adapter->rx_ring[r_idx]->reg_idx;
-			ixgbe_set_ivar(adapter, 0, reg_idx, v_idx);
-			r_idx = find_next_bit(q_vector->rx.idx,
-					      adapter->num_rx_queues,
-					      r_idx + 1);
-		}
-		r_idx = find_first_bit(q_vector->tx.idx,
-				       adapter->num_tx_queues);
-
-		for (i = 0; i < q_vector->tx.count; i++) {
-			u8 reg_idx = adapter->tx_ring[r_idx]->reg_idx;
-			ixgbe_set_ivar(adapter, 1, reg_idx, v_idx);
-			r_idx = find_next_bit(q_vector->tx.idx,
-					      adapter->num_tx_queues,
-					      r_idx + 1);
-		}
 
-		if (q_vector->tx.count && !q_vector->rx.count)
+		for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
+			ixgbe_set_ivar(adapter, 0, ring->reg_idx, v_idx);
+
+		for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+			ixgbe_set_ivar(adapter, 1, ring->reg_idx, v_idx);
+
+		if (q_vector->tx.ring && !q_vector->rx.ring)
 			/* tx only */
 			q_vector->eitr = adapter->tx_eitr_param;
-		else if (q_vector->rx.count)
+		else if (q_vector->rx.ring)
 			/* rx or mixed */
 			q_vector->eitr = adapter->rx_eitr_param;
 
@@ -2006,20 +1983,10 @@ static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
 static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
-	struct ixgbe_adapter  *adapter = q_vector->adapter;
-	struct ixgbe_ring     *tx_ring;
-	int i, r_idx;
 
 	if (!q_vector->tx.count)
 		return IRQ_HANDLED;
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		tx_ring = adapter->tx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
-
 	/* EIAM disabled interrupts (on this vector) for us */
 	napi_schedule(&q_vector->napi);
 
@@ -2034,22 +2001,6 @@ static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
 static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
-	struct ixgbe_adapter  *adapter = q_vector->adapter;
-	struct ixgbe_ring  *rx_ring;
-	int r_idx;
-	int i;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		rx_ring = adapter->rx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
 
 	if (!q_vector->rx.count)
 		return IRQ_HANDLED;
@@ -2063,28 +2014,10 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
 static irqreturn_t ixgbe_msix_clean_many(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
-	struct ixgbe_adapter  *adapter = q_vector->adapter;
-	struct ixgbe_ring  *ring;
-	int r_idx;
-	int i;
 
 	if (!q_vector->tx.count && !q_vector->rx.count)
 		return IRQ_HANDLED;
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		ring = adapter->tx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
-
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		ring = adapter->rx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
-
 	/* EIAM disabled interrupts (on this vector) for us */
 	napi_schedule(&q_vector->napi);
 
@@ -2104,19 +2037,14 @@ static int ixgbe_clean_rxonly(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 			       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *rx_ring = NULL;
 	int work_done = 0;
-	long r_idx;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	rx_ring = adapter->rx_ring[r_idx];
-
-	ixgbe_clean_rx_irq(q_vector, rx_ring, &work_done, budget);
+	ixgbe_clean_rx_irq(q_vector, q_vector->rx.ring, &work_done, budget);
 
 	/* If all Rx work done, exit the polling mode */
 	if (work_done < budget) {
@@ -2144,38 +2072,29 @@ static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 			       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *ring = NULL;
-	int work_done = 0, i;
-	long r_idx;
-	bool tx_clean_complete = true;
+	struct ixgbe_ring *ring;
+	int work_done = 0;
+	bool clean_complete = true;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		ring = adapter->tx_ring[r_idx];
-		tx_clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
+	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
 
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
 	budget /= (q_vector->rx.count ?: 1);
 	budget = max(budget, 1);
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		ring = adapter->rx_ring[r_idx];
+
+	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
 		ixgbe_clean_rx_irq(q_vector, ring, &work_done, budget);
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
 
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	ring = adapter->rx_ring[r_idx];
+	if (!clean_complete)
+		work_done = budget;
+
 	/* If all Rx work done, exit the polling mode */
 	if (work_done < budget) {
 		napi_complete(napi);
@@ -2203,32 +2122,23 @@ static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 			       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *tx_ring = NULL;
-	int work_done = 0;
-	long r_idx;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	tx_ring = adapter->tx_ring[r_idx];
-
-	if (!ixgbe_clean_tx_irq(q_vector, tx_ring))
-		work_done = budget;
+	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring))
+		return budget;
 
 	/* If all Tx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->tx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter,
-						((u64)1 << q_vector->v_idx));
-	}
+	napi_complete(napi);
+	if (adapter->tx_itr_setting & 1)
+		ixgbe_set_itr(q_vector);
+	if (!test_bit(__IXGBE_DOWN, &adapter->state))
+		ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));
 
-	return work_done;
+	return 0;
 }
 
 static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx,
@@ -2237,9 +2147,10 @@ static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx,
 	struct ixgbe_q_vector *q_vector = a->q_vector[v_idx];
 	struct ixgbe_ring *rx_ring = a->rx_ring[r_idx];
 
-	set_bit(r_idx, q_vector->rx.idx);
-	q_vector->rx.count++;
 	rx_ring->q_vector = q_vector;
+	rx_ring->next = q_vector->rx.ring;
+	q_vector->rx.ring = rx_ring;
+	q_vector->rx.count++;
 }
 
 static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
@@ -2248,9 +2159,10 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
 	struct ixgbe_q_vector *q_vector = a->q_vector[v_idx];
 	struct ixgbe_ring *tx_ring = a->tx_ring[t_idx];
 
-	set_bit(t_idx, q_vector->tx.idx);
-	q_vector->tx.count++;
 	tx_ring->q_vector = q_vector;
+	tx_ring->next = q_vector->tx.ring;
+	q_vector->tx.ring = tx_ring;
+	q_vector->tx.count++;
 	q_vector->tx.work_limit = a->tx_work_limit;
 }
 
@@ -2508,14 +2420,26 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 
 static inline void ixgbe_reset_q_vectors(struct ixgbe_adapter *adapter)
 {
-	int i, q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int i;
+
+	/* legacy and MSI only use one vector */
+	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
+		q_vectors = 1;
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		adapter->rx_ring[i]->q_vector = NULL;
+		adapter->rx_ring[i]->next = NULL;
+	}
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		adapter->tx_ring[i]->q_vector = NULL;
+		adapter->tx_ring[i]->next = NULL;
+	}
 
 	for (i = 0; i < q_vectors; i++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
-		bitmap_zero(q_vector->rx.idx, MAX_RX_QUEUES);
-		bitmap_zero(q_vector->tx.idx, MAX_TX_QUEUES);
-		q_vector->rx.count = 0;
-		q_vector->tx.count = 0;
+		memset(&q_vector->rx, 0, sizeof(struct ixgbe_ring_container));
+		memset(&q_vector->tx, 0, sizeof(struct ixgbe_ring_container));
 	}
 }
 
@@ -5923,7 +5847,7 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
 		/* get one bit for every active tx/rx interrupt vector */
 		for (i = 0; i < adapter->num_msix_vectors - NON_Q_VECTORS; i++) {
 			struct ixgbe_q_vector *qv = adapter->q_vector[i];
-			if (qv->rx.count || qv->tx.count)
+			if (qv->rx.ring || qv->tx.ring)
 				eics |= ((u64)1 << i);
 		}
 	}
-- 
1.7.6


^ permalink raw reply related

* [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes it so that the TX work limit is now obsolete.  Instead of
using it we can instead rely on the NAPI budget for the number of packets
we should clean per interrupt.  The advantage to this approach is that it
results in a much more balanced work flow since the same number of RX and
TX packets should be cleaned per interrupts.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    4 ----
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    7 -------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   16 +++++++---------
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index dc3b12e..b85312f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -281,7 +281,6 @@ struct ixgbe_ring_container {
 	struct ixgbe_ring *ring;	/* pointer to linked list of rings */
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
-	u16 work_limit;			/* total work allowed per interrupt */
 	u8 count;			/* total number of rings in vector */
 	u8 itr;				/* current ITR setting for ring */
 };
@@ -416,9 +415,6 @@ struct ixgbe_adapter {
 	u16 eitr_low;
 	u16 eitr_high;
 
-	/* Work limits */
-	u16 tx_work_limit;
-
 	/* TX */
 	struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
 	int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 82d4244..16835cc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2005,8 +2005,6 @@ static int ixgbe_get_coalesce(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-	ec->tx_max_coalesced_frames_irq = adapter->tx_work_limit;
-
 	/* only valid if in constant ITR mode */
 	switch (adapter->rx_itr_setting) {
 	case 0:
@@ -2093,9 +2091,6 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
 	   && ec->tx_coalesce_usecs)
 		return -EINVAL;
 
-	if (ec->tx_max_coalesced_frames_irq)
-		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
-
 	if (ec->rx_coalesce_usecs > 1) {
 		/* check the limits */
 		if ((1000000/ec->rx_coalesce_usecs > IXGBE_MAX_INT_RATE) ||
@@ -2169,14 +2164,12 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
 			else
 				/* rx only or mixed */
 				q_vector->eitr = adapter->rx_eitr_param;
-			q_vector->tx.work_limit = adapter->tx_work_limit;
 			ixgbe_write_eitr(q_vector);
 		}
 	/* Legacy Interrupt Mode */
 	} else {
 		q_vector = adapter->q_vector[0];
 		q_vector->eitr = adapter->rx_eitr_param;
-		q_vector->tx.work_limit = adapter->tx_work_limit;
 		ixgbe_write_eitr(q_vector);
 	}
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bb54d3d..2450279 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -796,15 +796,16 @@ static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
  * @q_vector: structure containing interrupt and ring information
  * @tx_ring: tx ring to clean
+ * @budget: amount of work driver is allowed to do this pass, in packets
  **/
 static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *tx_ring)
+			       struct ixgbe_ring *tx_ring,
+			       int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
-	u16 budget = q_vector->tx.work_limit;
 	u16 i = tx_ring->next_to_clean;
 
 	tx_buffer = &tx_ring->tx_buffer_info[i];
@@ -2082,7 +2083,7 @@ static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
 #endif
 
 	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
-		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
+		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring, budget);
 
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
@@ -2128,7 +2129,7 @@ static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring))
+	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring, budget))
 		return budget;
 
 	/* If all Tx work done, exit the polling mode */
@@ -2163,7 +2164,6 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
 	tx_ring->next = q_vector->tx.ring;
 	q_vector->tx.ring = tx_ring;
 	q_vector->tx.count++;
-	q_vector->tx.work_limit = a->tx_work_limit;
 }
 
 /**
@@ -4155,7 +4155,8 @@ static int ixgbe_poll(struct napi_struct *napi, int budget)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0]);
+	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0],
+					       budget);
 	ixgbe_clean_rx_irq(q_vector, adapter->rx_ring[0], &work_done, budget);
 
 	if (!tx_clean_complete)
@@ -5090,9 +5091,6 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 	adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
 	adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
 
-	/* set default work limits */
-	adapter->tx_work_limit = adapter->tx_ring_count;
-
 	/* initialize eeprom parameters */
 	if (ixgbe_init_eeprom_params_generic(hw)) {
 		e_dev_err("EEPROM initialization failed\n");
-- 
1.7.6


^ permalink raw reply related

* [net-next 05/10] ixgbe: cleanup allocation and freeing of IRQ affinity hint
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

The allocation and freeing of the IRQ affinity hint needs some updates
since there are a number of spots where we run into possible issues with
the hint not being correctly updated.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   76 ++++++++++++-------------
 1 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 03432dc..7d745e2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1566,20 +1566,6 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 			q_vector->eitr = adapter->rx_eitr_param;
 
 		ixgbe_write_eitr(q_vector);
-		/* If ATR is enabled, set interrupt affinity */
-		if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) {
-			/*
-			 * Allocate the affinity_hint cpumask, assign the mask
-			 * for this vector, and set our affinity_hint for
-			 * this irq.
-			 */
-			if (!alloc_cpumask_var(&q_vector->affinity_mask,
-			                       GFP_KERNEL))
-				return;
-			cpumask_set_cpu(v_idx, q_vector->affinity_mask);
-			irq_set_affinity_hint(adapter->msix_entries[v_idx].vector,
-			                      q_vector->affinity_mask);
-		}
 	}
 
 	switch (adapter->hw.mac.type) {
@@ -2093,18 +2079,17 @@ out:
 static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	int i, vector, q_vectors, err;
+	int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int vector, err;
 	int ri = 0, ti = 0;
 
-	/* Decrement for Other and TCP Timer vectors */
-	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
-
 	err = ixgbe_map_rings_to_vectors(adapter);
 	if (err)
 		return err;
 
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[vector];
+		struct msix_entry *entry = &adapter->msix_entries[vector];
 
 		if (q_vector->tx.ring && q_vector->rx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
@@ -2120,14 +2105,19 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 			/* skip this unused q_vector */
 			continue;
 		}
-		err = request_irq(adapter->msix_entries[vector].vector,
-				  &ixgbe_msix_clean_rings, 0, q_vector->name,
-				  q_vector);
+		err = request_irq(entry->vector, &ixgbe_msix_clean_rings, 0,
+				  q_vector->name, q_vector);
 		if (err) {
 			e_err(probe, "request_irq failed for MSIX interrupt "
 			      "Error: %d\n", err);
 			goto free_queue_irqs;
 		}
+		/* If Flow Director is enabled, set interrupt affinity */
+		if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) {
+			/* assign the mask for this irq */
+			irq_set_affinity_hint(entry->vector,
+					      q_vector->affinity_mask);
+		}
 	}
 
 	sprintf(adapter->lsc_int_name, "%s:lsc", netdev->name);
@@ -2141,9 +2131,13 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	return 0;
 
 free_queue_irqs:
-	for (i = vector - 1; i >= 0; i--)
-		free_irq(adapter->msix_entries[--vector].vector,
-			 adapter->q_vector[i]);
+	while (vector) {
+		vector--;
+		irq_set_affinity_hint(adapter->msix_entries[vector].vector,
+				      NULL);
+		free_irq(adapter->msix_entries[vector].vector,
+			 adapter->q_vector[vector]);
+	}
 	adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED;
 	pci_disable_msix(adapter->pdev);
 	kfree(adapter->msix_entries);
@@ -2333,14 +2327,19 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
 			    !adapter->q_vector[i]->tx.ring)
 				continue;
 
+			/* clear the affinity_mask in the IRQ descriptor */
+			irq_set_affinity_hint(adapter->msix_entries[i].vector,
+					      NULL);
+
 			free_irq(adapter->msix_entries[i].vector,
 				 adapter->q_vector[i]);
 		}
-
-		ixgbe_reset_q_vectors(adapter);
 	} else {
 		free_irq(adapter->pdev->irq, adapter);
 	}
+
+	/* clear q_vector state information */
+	ixgbe_reset_q_vectors(adapter);
 }
 
 /**
@@ -3879,7 +3878,6 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 rxctrl;
 	int i;
-	int num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 
 	/* signal that we are down to the interrupt handler */
 	set_bit(__IXGBE_DOWN, &adapter->state);
@@ -3924,15 +3922,6 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 			adapter->vfinfo[i].clear_to_send = 0;
 	}
 
-	/* Cleanup the affinity_hint CPU mask memory and callback */
-	for (i = 0; i < num_q_vectors; i++) {
-		struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
-		/* clear the affinity_mask in the IRQ descriptor */
-		irq_set_affinity_hint(adapter->msix_entries[i]. vector, NULL);
-		/* release the CPU mask memory */
-		free_cpumask_var(q_vector->affinity_mask);
-	}
-
 	/* disable transmits in the hardware now that interrupts are off */
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		u8 reg_idx = adapter->tx_ring[i]->reg_idx;
@@ -4677,6 +4666,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 		q_vector->adapter = adapter;
 		q_vector->v_idx = v_idx;
 
+		/* Allocate the affinity_hint cpumask, configure the mask */
+		if (!alloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL))
+			goto err_out;
+		cpumask_set_cpu(v_idx, q_vector->affinity_mask);
+
 		if (q_vector->tx.count && !q_vector->rx.count)
 			q_vector->eitr = adapter->tx_eitr_param;
 		else
@@ -4694,6 +4688,7 @@ err_out:
 		v_idx--;
 		q_vector = adapter->q_vector[v_idx];
 		netif_napi_del(&q_vector->napi);
+		free_cpumask_var(q_vector->affinity_mask);
 		kfree(q_vector);
 		adapter->q_vector[v_idx] = NULL;
 	}
@@ -4710,17 +4705,18 @@ err_out:
  **/
 static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
 {
-	int q_idx, num_q_vectors;
+	int v_idx, num_q_vectors;
 
 	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
 		num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 	else
 		num_q_vectors = 1;
 
-	for (q_idx = 0; q_idx < num_q_vectors; q_idx++) {
-		struct ixgbe_q_vector *q_vector = adapter->q_vector[q_idx];
-		adapter->q_vector[q_idx] = NULL;
+	for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
+		struct ixgbe_q_vector *q_vector = adapter->q_vector[v_idx];
+		adapter->q_vector[v_idx] = NULL;
 		netif_napi_del(&q_vector->napi);
+		free_cpumask_var(q_vector->affinity_mask);
 		kfree(q_vector);
 	}
 }
-- 
1.7.6


^ permalink raw reply related

* [net-next 04/10] ixgbe: consolidate all MSI-X ring interrupts and poll routines into one
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change consolidates all of the MSI-X interrupt and polling routines
into two single functions.  One for the interrupt and one for the code.
The main advantage to doing this is that the compiler can optimize the
routines into single monolithic functions which should allow all of them
function to occupy a single block of memory and as such avoid jumping
around.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  288 ++++++-------------------
 1 files changed, 67 insertions(+), 221 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2450279..03432dc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -798,9 +798,9 @@ static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * @tx_ring: tx ring to clean
  * @budget: amount of work driver is allowed to do this pass, in packets
  **/
-static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *tx_ring,
-			       int budget)
+static int ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
+			      struct ixgbe_ring *tx_ring,
+			      int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_tx_buffer *tx_buffer;
@@ -1298,9 +1298,9 @@ static inline bool ixgbe_get_rsc_state(union ixgbe_adv_rx_desc *rx_desc)
 		IXGBE_RXDADV_RSCCNT_MASK);
 }
 
-static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *rx_ring,
-			       int *work_done, int work_to_do)
+static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
+			      struct ixgbe_ring *rx_ring,
+			      int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
@@ -1480,11 +1480,11 @@ static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 #endif /* IXGBE_FCOE */
 		ixgbe_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
 
+		budget--;
 next_desc:
 		rx_desc->wb.upper.status_error = 0;
 
-		(*work_done)++;
-		if (*work_done >= work_to_do)
+		if (!budget)
 			break;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -1525,9 +1525,10 @@ next_desc:
 	u64_stats_update_end(&rx_ring->syncp);
 	q_vector->rx.total_packets += total_rx_packets;
 	q_vector->rx.total_bytes += total_rx_bytes;
+
+	return budget;
 }
 
-static int ixgbe_clean_rxonly(struct napi_struct *, int);
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
  * @adapter: board private structure
@@ -1981,167 +1982,18 @@ static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
 	/* skip the flush */
 }
 
-static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
+static irqreturn_t ixgbe_msix_clean_rings(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
 
-	if (!q_vector->tx.count)
-		return IRQ_HANDLED;
-
 	/* EIAM disabled interrupts (on this vector) for us */
-	napi_schedule(&q_vector->napi);
 
-	return IRQ_HANDLED;
-}
-
-/**
- * ixgbe_msix_clean_rx - single unshared vector rx clean (all queues)
- * @irq: unused
- * @data: pointer to our q_vector struct for this interrupt vector
- **/
-static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
-{
-	struct ixgbe_q_vector *q_vector = data;
-
-	if (!q_vector->rx.count)
-		return IRQ_HANDLED;
-
-	/* EIAM disabled interrupts (on this vector) for us */
-	napi_schedule(&q_vector->napi);
+	if (q_vector->rx.ring || q_vector->tx.ring)
+		napi_schedule(&q_vector->napi);
 
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t ixgbe_msix_clean_many(int irq, void *data)
-{
-	struct ixgbe_q_vector *q_vector = data;
-
-	if (!q_vector->tx.count && !q_vector->rx.count)
-		return IRQ_HANDLED;
-
-	/* EIAM disabled interrupts (on this vector) for us */
-	napi_schedule(&q_vector->napi);
-
-	return IRQ_HANDLED;
-}
-
-/**
- * ixgbe_clean_rxonly - msix (aka one shot) rx clean routine
- * @napi: napi struct with our devices info in it
- * @budget: amount of work driver is allowed to do this pass, in packets
- *
- * This function is optimized for cleaning one queue only on a single
- * q_vector!!!
- **/
-static int ixgbe_clean_rxonly(struct napi_struct *napi, int budget)
-{
-	struct ixgbe_q_vector *q_vector =
-			       container_of(napi, struct ixgbe_q_vector, napi);
-	struct ixgbe_adapter *adapter = q_vector->adapter;
-	int work_done = 0;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	ixgbe_clean_rx_irq(q_vector, q_vector->rx.ring, &work_done, budget);
-
-	/* If all Rx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->rx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter,
-						((u64)1 << q_vector->v_idx));
-	}
-
-	return work_done;
-}
-
-/**
- * ixgbe_clean_rxtx_many - msix (aka one shot) rx clean routine
- * @napi: napi struct with our devices info in it
- * @budget: amount of work driver is allowed to do this pass, in packets
- *
- * This function will clean more than one rx queue associated with a
- * q_vector.
- **/
-static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
-{
-	struct ixgbe_q_vector *q_vector =
-			       container_of(napi, struct ixgbe_q_vector, napi);
-	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *ring;
-	int work_done = 0;
-	bool clean_complete = true;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
-		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring, budget);
-
-	/* attempt to distribute budget to each queue fairly, but don't allow
-	 * the budget to go below 1 because we'll exit polling */
-	budget /= (q_vector->rx.count ?: 1);
-	budget = max(budget, 1);
-
-	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
-		ixgbe_clean_rx_irq(q_vector, ring, &work_done, budget);
-
-	if (!clean_complete)
-		work_done = budget;
-
-	/* If all Rx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->rx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter,
-						((u64)1 << q_vector->v_idx));
-		return 0;
-	}
-
-	return work_done;
-}
-
-/**
- * ixgbe_clean_txonly - msix (aka one shot) tx clean routine
- * @napi: napi struct with our devices info in it
- * @budget: amount of work driver is allowed to do this pass, in packets
- *
- * This function is optimized for cleaning one queue only on a single
- * q_vector!!!
- **/
-static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
-{
-	struct ixgbe_q_vector *q_vector =
-			       container_of(napi, struct ixgbe_q_vector, napi);
-	struct ixgbe_adapter *adapter = q_vector->adapter;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring, budget))
-		return budget;
-
-	/* If all Tx work done, exit the polling mode */
-	napi_complete(napi);
-	if (adapter->tx_itr_setting & 1)
-		ixgbe_set_itr(q_vector);
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));
-
-	return 0;
-}
-
 static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx,
 				     int r_idx)
 {
@@ -2241,7 +2093,6 @@ out:
 static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	irqreturn_t (*handler)(int, void *);
 	int i, vector, q_vectors, err;
 	int ri = 0, ti = 0;
 
@@ -2252,31 +2103,25 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	if (err)
 		return err;
 
-#define SET_HANDLER(_v) (((_v)->rx.count && (_v)->tx.count)        \
-					  ? &ixgbe_msix_clean_many : \
-			  (_v)->rx.count ? &ixgbe_msix_clean_rx   : \
-			  (_v)->tx.count ? &ixgbe_msix_clean_tx   : \
-			  NULL)
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[vector];
-		handler = SET_HANDLER(q_vector);
 
-		if (handler == &ixgbe_msix_clean_rx) {
+		if (q_vector->tx.ring && q_vector->rx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-			         "%s-%s-%d", netdev->name, "rx", ri++);
-		} else if (handler == &ixgbe_msix_clean_tx) {
+				 "%s-%s-%d", netdev->name, "TxRx", ri++);
+			ti++;
+		} else if (q_vector->rx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-			         "%s-%s-%d", netdev->name, "tx", ti++);
-		} else if (handler == &ixgbe_msix_clean_many) {
+				 "%s-%s-%d", netdev->name, "rx", ri++);
+		} else if (q_vector->tx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-			         "%s-%s-%d", netdev->name, "TxRx", ri++);
-			ti++;
+				 "%s-%s-%d", netdev->name, "tx", ti++);
 		} else {
 			/* skip this unused q_vector */
 			continue;
 		}
 		err = request_irq(adapter->msix_entries[vector].vector,
-				  handler, 0, q_vector->name,
+				  &ixgbe_msix_clean_rings, 0, q_vector->name,
 				  q_vector);
 		if (err) {
 			e_err(probe, "request_irq failed for MSIX interrupt "
@@ -2484,8 +2329,8 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
 		i--;
 		for (; i >= 0; i--) {
 			/* free only the irqs that were actually requested */
-			if (!adapter->q_vector[i]->rx.count &&
-			    !adapter->q_vector[i]->tx.count)
+			if (!adapter->q_vector[i]->rx.ring &&
+			    !adapter->q_vector[i]->tx.ring)
 				continue;
 
 			free_irq(adapter->msix_entries[i].vector,
@@ -3478,19 +3323,8 @@ static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
 		q_vectors = 1;
 
 	for (q_idx = 0; q_idx < q_vectors; q_idx++) {
-		struct napi_struct *napi;
 		q_vector = adapter->q_vector[q_idx];
-		napi = &q_vector->napi;
-		if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
-			if (!q_vector->rx.count || !q_vector->tx.count) {
-				if (q_vector->tx.count == 1)
-					napi->poll = &ixgbe_clean_txonly;
-				else if (q_vector->rx.count == 1)
-					napi->poll = &ixgbe_clean_rxonly;
-			}
-		}
-
-		napi_enable(napi);
+		napi_enable(&q_vector->napi);
 	}
 }
 
@@ -4148,29 +3982,41 @@ static int ixgbe_poll(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 				container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	int tx_clean_complete, work_done = 0;
+	struct ixgbe_ring *ring;
+	int per_ring_budget;
+	bool clean_complete = true;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0],
-					       budget);
-	ixgbe_clean_rx_irq(q_vector, adapter->rx_ring[0], &work_done, budget);
+	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring, budget);
 
-	if (!tx_clean_complete)
-		work_done = budget;
+	/* attempt to distribute budget to each queue fairly, but don't allow
+	 * the budget to go below 1 because we'll exit polling */
+	if (q_vector->rx.count > 1)
+		per_ring_budget = max(budget/q_vector->rx.count, 1);
+	else
+		per_ring_budget = budget;
 
-	/* If budget not fully consumed, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->rx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter, IXGBE_EIMS_RTX_QUEUE);
-	}
-	return work_done;
+	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
+		clean_complete &= !!ixgbe_clean_rx_irq(q_vector, ring,
+						       per_ring_budget);
+
+	/* If all work not completed, return budget and keep polling */
+	if (!clean_complete)
+		return budget;
+
+	/* all work done, exit the polling mode */
+	napi_complete(napi);
+	if (adapter->rx_itr_setting & 1)
+		ixgbe_set_itr(q_vector);
+	if (!test_bit(__IXGBE_DOWN, &adapter->state))
+		ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));
+
+	return 0;
 }
 
 /**
@@ -4811,19 +4657,15 @@ out:
  **/
 static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 {
-	int q_idx, num_q_vectors;
+	int v_idx, num_q_vectors;
 	struct ixgbe_q_vector *q_vector;
-	int (*poll)(struct napi_struct *, int);
 
-	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
+	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
 		num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
-		poll = &ixgbe_clean_rxtx_many;
-	} else {
+	else
 		num_q_vectors = 1;
-		poll = &ixgbe_poll;
-	}
 
-	for (q_idx = 0; q_idx < num_q_vectors; q_idx++) {
+	for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
 		q_vector = kzalloc_node(sizeof(struct ixgbe_q_vector),
 					GFP_KERNEL, adapter->node);
 		if (!q_vector)
@@ -4831,25 +4673,29 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 					   GFP_KERNEL);
 		if (!q_vector)
 			goto err_out;
+
 		q_vector->adapter = adapter;
+		q_vector->v_idx = v_idx;
+
 		if (q_vector->tx.count && !q_vector->rx.count)
 			q_vector->eitr = adapter->tx_eitr_param;
 		else
 			q_vector->eitr = adapter->rx_eitr_param;
-		q_vector->v_idx = q_idx;
-		netif_napi_add(adapter->netdev, &q_vector->napi, (*poll), 64);
-		adapter->q_vector[q_idx] = q_vector;
+
+		netif_napi_add(adapter->netdev, &q_vector->napi,
+			       ixgbe_poll, 64);
+		adapter->q_vector[v_idx] = q_vector;
 	}
 
 	return 0;
 
 err_out:
-	while (q_idx) {
-		q_idx--;
-		q_vector = adapter->q_vector[q_idx];
+	while (v_idx) {
+		v_idx--;
+		q_vector = adapter->q_vector[v_idx];
 		netif_napi_del(&q_vector->napi);
 		kfree(q_vector);
-		adapter->q_vector[q_idx] = NULL;
+		adapter->q_vector[v_idx] = NULL;
 	}
 	return -ENOMEM;
 }
@@ -6944,7 +6790,7 @@ static void ixgbe_netpoll(struct net_device *netdev)
 		int num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 		for (i = 0; i < num_q_vectors; i++) {
 			struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
-			ixgbe_msix_clean_many(0, q_vector);
+			ixgbe_msix_clean_rings(0, q_vector);
 		}
 	} else {
 		ixgbe_intr(adapter->pdev->irq, netdev);
-- 
1.7.6


^ permalink raw reply related

* [net-next 06/10] ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change switches us over to using the ring->dev pointer instead of
having to use the adapter->pdev->dev reference.  The advantage to this is
that it is a much shorter route to get the to final needed value.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7d745e2..1929d23 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -925,12 +925,12 @@ static void ixgbe_update_rx_dca(struct ixgbe_adapter *adapter,
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
 		rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK;
-		rxctrl |= dca3_get_tag(&adapter->pdev->dev, cpu);
+		rxctrl |= dca3_get_tag(rx_ring->dev, cpu);
 		break;
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK_82599;
-		rxctrl |= (dca3_get_tag(&adapter->pdev->dev, cpu) <<
+		rxctrl |= (dca3_get_tag(rx_ring->dev, cpu) <<
 			   IXGBE_DCA_RXCTRL_CPUID_SHIFT_82599);
 		break;
 	default:
@@ -954,7 +954,7 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter,
 	case ixgbe_mac_82598EB:
 		txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(reg_idx));
 		txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK;
-		txctrl |= dca3_get_tag(&adapter->pdev->dev, cpu);
+		txctrl |= dca3_get_tag(tx_ring->dev, cpu);
 		txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN;
 		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(reg_idx), txctrl);
 		break;
@@ -962,7 +962,7 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter,
 	case ixgbe_mac_X540:
 		txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(reg_idx));
 		txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK_82599;
-		txctrl |= (dca3_get_tag(&adapter->pdev->dev, cpu) <<
+		txctrl |= (dca3_get_tag(tx_ring->dev, cpu) <<
 			   IXGBE_DCA_TXCTRL_CPUID_SHIFT_82599);
 		txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN;
 		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(reg_idx), txctrl);
-- 
1.7.6


^ permalink raw reply related

* [net-next 07/10] ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt types
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch makes it so that the map_rings_to_vectors call will work with
all interrupt types.  The advantage to this is that there will now be a
predictable mapping for all given interrupt types.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   90 ++++++++++---------------
 1 files changed, 35 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1929d23..2879a74 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2014,59 +2014,41 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
  * group the rings as "efficiently" as possible.  You would add new
  * mapping configurations in here.
  **/
-static int ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter)
+static void ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter)
 {
-	int q_vectors;
+	int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int rxr_remaining = adapter->num_rx_queues, rxr_idx = 0;
+	int txr_remaining = adapter->num_tx_queues, txr_idx = 0;
 	int v_start = 0;
-	int rxr_idx = 0, txr_idx = 0;
-	int rxr_remaining = adapter->num_rx_queues;
-	int txr_remaining = adapter->num_tx_queues;
-	int i, j;
-	int rqpv, tqpv;
-	int err = 0;
 
-	/* No mapping required if MSI-X is disabled. */
+	/* only one q_vector if MSI-X is disabled. */
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
-		goto out;
-
-	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+		q_vectors = 1;
 
 	/*
-	 * The ideal configuration...
-	 * We have enough vectors to map one per queue.
+	 * If we don't have enough vectors for a 1-to-1 mapping, we'll have to
+	 * group them so there are multiple queues per vector.
+	 *
+	 * Re-adjusting *qpv takes care of the remainder.
 	 */
-	if (q_vectors == adapter->num_rx_queues + adapter->num_tx_queues) {
-		for (; rxr_idx < rxr_remaining; v_start++, rxr_idx++)
+	for (; v_start < q_vectors && rxr_remaining; v_start++) {
+		int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_start);
+		for (; rqpv; rqpv--, rxr_idx++, rxr_remaining--)
 			map_vector_to_rxq(adapter, v_start, rxr_idx);
-
-		for (; txr_idx < txr_remaining; v_start++, txr_idx++)
-			map_vector_to_txq(adapter, v_start, txr_idx);
-
-		goto out;
 	}
 
 	/*
-	 * If we don't have enough vectors for a 1-to-1
-	 * mapping, we'll have to group them so there are
-	 * multiple queues per vector.
+	 * If there are not enough q_vectors for each ring to have it's own
+	 * vector then we must pair up Rx/Tx on a each vector
 	 */
-	/* Re-adjusting *qpv takes care of the remainder. */
-	for (i = v_start; i < q_vectors; i++) {
-		rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - i);
-		for (j = 0; j < rqpv; j++) {
-			map_vector_to_rxq(adapter, i, rxr_idx);
-			rxr_idx++;
-			rxr_remaining--;
-		}
-		tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - i);
-		for (j = 0; j < tqpv; j++) {
-			map_vector_to_txq(adapter, i, txr_idx);
-			txr_idx++;
-			txr_remaining--;
-		}
+	if ((v_start + txr_remaining) > q_vectors)
+		v_start = 0;
+
+	for (; v_start < q_vectors && txr_remaining; v_start++) {
+		int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_start);
+		for (; tqpv; tqpv--, txr_idx++, txr_remaining--)
+			map_vector_to_txq(adapter, v_start, txr_idx);
 	}
-out:
-	return err;
 }
 
 /**
@@ -2083,10 +2065,6 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	int vector, err;
 	int ri = 0, ti = 0;
 
-	err = ixgbe_map_rings_to_vectors(adapter);
-	if (err)
-		return err;
-
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[vector];
 		struct msix_entry *entry = &adapter->msix_entries[vector];
@@ -2294,19 +2272,25 @@ static int ixgbe_request_irq(struct ixgbe_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int err;
 
-	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
+	/* map all of the rings to the q_vectors */
+	ixgbe_map_rings_to_vectors(adapter);
+
+	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
 		err = ixgbe_request_msix_irqs(adapter);
-	} else if (adapter->flags & IXGBE_FLAG_MSI_ENABLED) {
+	else if (adapter->flags & IXGBE_FLAG_MSI_ENABLED)
 		err = request_irq(adapter->pdev->irq, ixgbe_intr, 0,
 				  netdev->name, adapter);
-	} else {
+	else
 		err = request_irq(adapter->pdev->irq, ixgbe_intr, IRQF_SHARED,
 				  netdev->name, adapter);
-	}
 
-	if (err)
+	if (err) {
 		e_err(probe, "request_irq failed, Error %d\n", err);
 
+		/* place q_vectors and rings back into a known good state */
+		ixgbe_reset_q_vectors(adapter);
+	}
+
 	return err;
 }
 
@@ -2316,11 +2300,10 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
 		int i, q_vectors;
 
 		q_vectors = adapter->num_msix_vectors;
-
 		i = q_vectors - 1;
 		free_irq(adapter->msix_entries[i].vector, adapter);
-
 		i--;
+
 		for (; i >= 0; i--) {
 			/* free only the irqs that were actually requested */
 			if (!adapter->q_vector[i]->rx.ring &&
@@ -2387,9 +2370,6 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 	ixgbe_set_ivar(adapter, 0, 0, 0);
 	ixgbe_set_ivar(adapter, 1, 0, 0);
 
-	map_vector_to_rxq(adapter, 0, 0);
-	map_vector_to_txq(adapter, 0, 0);
-
 	e_info(hw, "Legacy interrupt IVAR setup done\n");
 }
 
-- 
1.7.6


^ permalink raw reply related

* [net-next 08/10] ixgbe: Drop unnecessary adapter->hw dereference in loopback test setup
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch drops a set of unnecessary dereferences to the hardware structure
since we already have a local copy of the hardware pointer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 16835cc..6241e9a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1566,26 +1566,26 @@ static int ixgbe_setup_loopback_test(struct ixgbe_adapter *adapter)
 
 	/* X540 needs to set the MACC.FLU bit to force link up */
 	if (adapter->hw.mac.type == ixgbe_mac_X540) {
-		reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_MACC);
+		reg_data = IXGBE_READ_REG(hw, IXGBE_MACC);
 		reg_data |= IXGBE_MACC_FLU;
-		IXGBE_WRITE_REG(&adapter->hw, IXGBE_MACC, reg_data);
+		IXGBE_WRITE_REG(hw, IXGBE_MACC, reg_data);
 	}
 
 	/* right now we only support MAC loopback in the driver */
-	reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_HLREG0);
+	reg_data = IXGBE_READ_REG(hw, IXGBE_HLREG0);
 	/* Setup MAC loopback */
 	reg_data |= IXGBE_HLREG0_LPBK;
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_HLREG0, reg_data);
+	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, reg_data);
 
-	reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_FCTRL);
+	reg_data = IXGBE_READ_REG(hw, IXGBE_FCTRL);
 	reg_data |= IXGBE_FCTRL_BAM | IXGBE_FCTRL_SBP | IXGBE_FCTRL_MPE;
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_FCTRL, reg_data);
+	IXGBE_WRITE_REG(hw, IXGBE_FCTRL, reg_data);
 
-	reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_AUTOC);
+	reg_data = IXGBE_READ_REG(hw, IXGBE_AUTOC);
 	reg_data &= ~IXGBE_AUTOC_LMS_MASK;
 	reg_data |= IXGBE_AUTOC_LMS_10G_LINK_NO_AN | IXGBE_AUTOC_FLU;
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_AUTOC, reg_data);
-	IXGBE_WRITE_FLUSH(&adapter->hw);
+	IXGBE_WRITE_REG(hw, IXGBE_AUTOC, reg_data);
+	IXGBE_WRITE_FLUSH(hw);
 	usleep_range(10000, 20000);
 
 	/* Disable Atlas Tx lanes; re-enabled in reset path */
-- 
1.7.6


^ permalink raw reply related

* [net-next 09/10] ixgbe: combine PCI_VDEVICE and board declaration to same line
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is a minor whitespace cleanup to compress the device ID
declaration and board type declaration onto the same line.  It seems to
make sense since all of the combinations of the two are less than 80
characters and it makes the overall layout a bit more readable.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   79 ++++++++-----------------
 1 files changed, 26 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2879a74..35f9912 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -79,59 +79,32 @@ static const struct ixgbe_info *ixgbe_info_tbl[] = {
  *   Class, Class Mask, private data (not used) }
  */
 static DEFINE_PCI_DEVICE_TABLE(ixgbe_pci_tbl) = {
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_DUAL_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_SINGLE_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT2),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_CX4),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_CX4_DUAL_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_DA_DUAL_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_XF_LR),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_SFP_LOM),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_BX),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_XAUI_LOM),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KR),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_EM),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4_MEZZ),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_CX4),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_BACKPLANE_FCOE),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_FCOE),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_T3_LOM),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_COMBO_BACKPLANE),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X540T),
-	 board_X540 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_SF2),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_LS),
-	 board_82599 },
-
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_DUAL_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_SINGLE_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT2), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_CX4), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_CX4_DUAL_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_DA_DUAL_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_XF_LR), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_SFP_LOM), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_BX), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_XAUI_LOM), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KR), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_EM), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4_MEZZ), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_CX4), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_BACKPLANE_FCOE), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_FCOE), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_T3_LOM), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_COMBO_BACKPLANE), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X540T), board_X540 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_SF2), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_LS), board_82599 },
 	/* required last entry */
 	{0, }
 };
-- 
1.7.6


^ permalink raw reply related

* [net-next 10/10] ixgbe: Update TXDCTL configuration to correctly handle WTHRESH
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change updated the TXDCTL configuration.  The main goal is to be much
more explicit about the configuration and avoid a possible fake TX hang
when the interrupt throttle rate is set to 0.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   35 +++++++++++++------------
 1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 35f9912..8cf8670 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2359,13 +2359,11 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
-	u32 txdctl;
+	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;
 
 	/* disable queue to avoid issues while updating state */
-	txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(reg_idx));
-	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx),
-			txdctl & ~IXGBE_TXDCTL_ENABLE);
+	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
 	IXGBE_WRITE_FLUSH(hw);
 
 	IXGBE_WRITE_REG(hw, IXGBE_TDBAL(reg_idx),
@@ -2377,18 +2375,22 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_TDT(reg_idx), 0);
 	ring->tail = hw->hw_addr + IXGBE_TDT(reg_idx);
 
-	/* configure fetching thresholds */
-	if (adapter->rx_itr_setting == 0) {
-		/* cannot set wthresh when itr==0 */
-		txdctl &= ~0x007F0000;
-	} else {
-		/* enable WTHRESH=8 descriptors, to encourage burst writeback */
-		txdctl |= (8 << 16);
-	}
-	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
-		/* PThresh workaround for Tx hang with DFP enabled. */
-		txdctl |= 32;
-	}
+	/*
+	 * set WTHRESH to encourage burst writeback, it should not be set
+	 * higher than 1 when ITR is 0 as it could cause false TX hangs
+	 *
+	 * In order to avoid issues WTHRESH + PTHRESH should always be equal
+	 * to or less than the number of on chip descriptors, which is
+	 * currently 40.
+	 */
+	if (!adapter->tx_itr_setting || !adapter->rx_itr_setting)
+		txdctl |= (1 << 16);	/* WTHRESH = 1 */
+	else
+		txdctl |= (8 << 16);	/* WTHRESH = 8 */
+
+	/* PTHRESH=32 is needed to avoid a Tx hang with DFP enabled. */
+	txdctl |= (1 << 8) |	/* HTHRESH = 1 */
+		   32;		/* PTHRESH = 32 */
 
 	/* reinitialize flowdirector state */
 	if ((adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) &&
@@ -2403,7 +2405,6 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	clear_bit(__IXGBE_HANG_CHECK_ARMED, &ring->state);
 
 	/* enable queue */
-	txdctl |= IXGBE_TXDCTL_ENABLE;
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), txdctl);
 
 	/* TXDCTL.EN will return 0 on 82598 if link is down, so skip it */
-- 
1.7.6


^ permalink raw reply related

* Nine Hundred And Fifty Thousand Pounds Was Awarded To Your ID In The Benz Cash Offer, Get Bact To Us With Your:
From: letrevino @ 2011-08-21  6:32 UTC (permalink / raw)




1. Full Name: 
2. Full Address: 
3. Phone number: 
4. Country: 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox