Netdev List


* Re: [PATCH RFC] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:40 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Rick Jones, netdev, ori
In-Reply-To: <20091021082109.GE8704@Chamillionaire.breakpoint.cc>

Hi Florian,

Florian Westphal wrote:

> Gilad Ben-Yossef <gilad@codefidence.com> wrote:
>   
>> The point is that even then you are more than likely to wish to turn
>> off these options for specific destinations and routes (that go over
>> said exotic link) and keep using them over others - e.g. timestamp
>> OK for local LAN, but for default route that goes over exotic TCP/IP
>> over carrier penguins turn it off.
>>     
>
> If you need a band-aid solution, it is possible to replace TCP
> options with NOOPs using netfilter's TCPOPTSTRIP target.
>
> There is also an ECN target to work around ECN blackholes.
>   
Thanks for the tip. It is appreciated.

The band-aid solution that Ori and company found was simply to patch
their local copy of the kernel, but since sitting on a bunch of
"private" patches seems like a lose-lose situation (there is a term
you don't hear much when talking to MBAs :-), I'm now trying to get it
mainlined for them.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


* Re: [PATCH RFC] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:27 UTC (permalink / raw)
  To: Bill Fink; +Cc: netdev, ori
In-Reply-To: <20091020221354.5a714323.billfink@mindspring.com>

Hi,


Bill Fink wrote:

> On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote:
>
>   
>> Turn the global sysctls allowing disabling of TCP SACK, DSACK,
>> time stamp and window scale into per route entry feature options,
>> laying the groundwork for future removal of the relevant global sysctls.
>>
>> You really only want to disable SACK, DSACK, time stamp or window
>> scale if you've got a piece of broken networking equipment somewhere
>> as a stop gap until you can bring a big enough hammer to deal with
>> the broken network equipment. It doesn't make sense to "punish" all
>> the connections going through the machine to destinations not
>> related to the broken equipment.
>>     
>
> For certain test situations, it is sometimes desirable to globally
> disable TCP timestamps.  Although I have not personally wanted to
> globally disable the other mentioned features, I can imagine test
> scenarios where it could be useful.  Admittedly it could also be
> accomplished with per route features, just not as conveniently,
> especially if there are a large number of interfaces and/or routes.
>
>   
I don't feel that strongly about it either way; I was just trying to heed
Linus's "Linux is bloated" message by not fostering duplicate ways to do
the same thing... ;-)

If the consensus is to adopt the route features but leave the global
kill switches, I certainly have no problem with that.

Thanks,
Gilad


* Re: [PATCH RFC] Per route TCP options
From: Florian Westphal @ 2009-10-21  8:21 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Rick Jones, netdev, ori
In-Reply-To: <4ADEC076.2030105@codefidence.com>

Gilad Ben-Yossef <gilad@codefidence.com> wrote:
> The point is that even then you are more than likely to wish to turn
> off these options for specific destinations and routes (that go over
> said exotic link) and keep using them over others - e.g. timestamp
> OK for local LAN, but for default route that goes over exotic TCP/IP
> over carrier penguins turn it off.

If you need a band-aid solution, it is possible to replace TCP
options with NOOPs using netfilter's TCPOPTSTRIP target.

There is also an ECN target to work around ECN blackholes.


* Re: [PATCH RFC] Allow to turn off TCP window scale opt per route
From: Gilad Ben-Yossef @ 2009-10-21  8:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, ori
In-Reply-To: <20091021104024.32adab23@s6510>

Stephen Hemminger wrote:

> On Tue, 20 Oct 2009 17:22:39 +0200
> Gilad Ben-Yossef <gilad@codefidence.com> wrote:
>
>   
>> Add and use no window scale bit in the features field.
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>> Signed-off-by: Ori Finkelman <ori@comsleep.com>
>> Signed-off-by: Yony Amit <yony@comsleep.com>
>>     
>
> The same effect can be had by just using a window limit on the route
>   
Actually, not exactly. There is a subtle but important difference
between "I support window scaling, but choose a scale of 0", which is
what your suggestion implies, and "I don't support window scaling at
all", which is what the suggested feature field does.

To make things more complicated, until the previous patch I sent, when
Linux chose a window scale of zero for any reason, it would also
wrongly report to the other side that it does not support window
scaling at all, which caused some subtle bugs. This is no longer the
behavior, and I believe rightly so.
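
The difference between the two cases shows up directly in the option
bytes of the SYN. As a minimal user-space sketch (not kernel code; the
helper names put_wscale() and has_wscale() are made up for
illustration), the Window Scale option from RFC 1323 is kind 3,
length 3, followed by a shift count. So "scale of 0" is an option that
is present with shift 0, while "no window scaling at all" is the option
being absent:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL      0  /* end of option list */
#define TCPOPT_NOP      1  /* one-byte padding */
#define TCPOPT_WINDOW   3  /* window scale option kind (RFC 1323) */
#define TCPOLEN_WINDOW  3  /* kind + length + shift count */

/*
 * Hypothetical helper: append a NOP-padded window scale option.
 * offer_wscale == 0 models "I don't support window scaling at all":
 * nothing is written. offer_wscale != 0 with shift == 0 models
 * "I support window scaling, but choose a scale of 0".
 * Returns the number of bytes written.
 */
static size_t put_wscale(uint8_t *opts, size_t cap, int offer_wscale,
                         uint8_t shift)
{
    if (!offer_wscale)
        return 0;
    assert(shift <= 14);   /* RFC 1323 limits the shift count to 14 */
    assert(cap >= 4);
    opts[0] = TCPOPT_NOP;
    opts[1] = TCPOPT_WINDOW;
    opts[2] = TCPOLEN_WINDOW;
    opts[3] = shift;
    return 4;
}

/* Walk the option bytes; return 1 if a window scale option is present. */
static int has_wscale(const uint8_t *opts, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opts[i];

        if (kind == TCPOPT_EOL)
            break;
        if (kind == TCPOPT_NOP) {
            i++;
            continue;
        }
        if (i + 1 >= len || opts[i + 1] < 2)
            break;          /* malformed option, stop parsing */
        if (kind == TCPOPT_WINDOW)
            return 1;
        i += opts[i + 1];
    }
    return 0;
}
```

With offer_wscale set and shift 0, the peer still sees the option and
may scale its own window; with offer_wscale clear, nothing is emitted
and scaling is off in both directions.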

Thanks,
Gilad


* Re: [PATCH RFC] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:15 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Eric Dumazet, Netdev, ori
In-Reply-To: <Pine.LNX.4.64.0910202151160.22058@melkinkari.cs.Helsinki.FI>

Hi Ilpo,


Ilpo Järvinen wrote:

>
>> Specifically, I couldn't understand why sysctl_tcp_ecn is documented to be a
>> boolean value, but is initialized to 2 and queried with if (sysctl_tcp_ecn ==
>> 1) so I decided to let it be until I figure it out... ;-)
>>     
>
> Ah, I didn't notice that "- BOOLEAN" there so I modified only the 
> description (some blindness to caps perhaps :-)), did you perhaps read
> it fully?
>
>   
Yes, I did finally. What threw me off was that
TCP_ECN_create_request(), where the value is checked as a boolean, is
inlined in the include file. I had missed that. Thanks for the tip.

At any rate, it means that "noecn" is not a good route option feature, 
since it is not boolean. I plan to handle those in the next patch set, 
if the current one is accepted.
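
To illustrate why a single "noecn" bit does not fit, here is a small
user-space sketch of the three-valued behaviour as I understand it from
this thread: the connect path tests for the value 1 exactly, while the
listen path only tests for non-zero. The enum and function names below
are hypothetical, not the kernel's:

```c
#include <assert.h>

/* Assumed meaning of the three sysctl values, per the discussion above. */
enum ecn_mode {
    ECN_OFF     = 0,  /* never use ECN */
    ECN_ON      = 1,  /* request ECN on connect, accept it on listen */
    ECN_PASSIVE = 2   /* accept ECN on listen, but do not request it */
};

/* Active open: mirrors the "== 1" test mentioned in the thread. */
static int ecn_request_on_connect(int mode)
{
    assert(mode >= ECN_OFF && mode <= ECN_PASSIVE);
    return mode == ECN_ON;
}

/* Passive open: the value is used as a boolean, so 1 and 2 both accept. */
static int ecn_accept_on_listen(int mode, int peer_requested)
{
    assert(mode >= ECN_OFF && mode <= ECN_PASSIVE);
    return mode != ECN_OFF && peer_requested;
}
```

A boolean route feature can express 0 versus 1, but not the
"accept only" value 2, which is the default mentioned above.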

Thanks,
Gilad



* Re: [PATCH RFC] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ori
In-Reply-To: <20091020.173653.193717720.davem@davemloft.net>

David Miller wrote:

> When sending more than one patch in a patch set, you must
> number your patches so that people know in what order your
> patches should be applied.
>
> Mailing lists and other entities reorder list postings
> quite severely, so it's not "obvious" what order your
> postings matter just based upon arrival order.
>
>   
Yes, of course. How silly of me not to notice. My apologies.

git and I currently have a bit of a love-hate thing going on...

I will re-send the patch set in a more favorable form.

Thanks,
Gilad



* Re: [PATCH RFC] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:04 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, ori
In-Reply-To: <4ADDE4C4.5080501@hp.com>

Rick Jones wrote:

> Gilad Ben-Yossef wrote:
>> Turn the global sysctls allowing disabling of TCP SACK, DSACK,
>> time stamp and window scale into per route entry feature options,
>> laying the groundwork for future removal of the relevant global sysctls.
>>
>> You really only want to disable SACK, DSACK, time stamp or window
>> scale if you've got a piece of broken networking equipment somewhere
>> as a stop gap until you can bring a big enough hammer to deal with
>> the broken network equipment. It doesn't make sense to "punish" all
>> the connections going through the machine to destinations not
>> related to the broken equipment.
>
> Is it really only the case that those options get disabled for broken 
> networking equipment?  Does this presage making all TCP options 
> per-route only?

Well, I assume there might be situations where you are trying to
communicate over some exotic link where the networking equipment is not
broken as such, but the unusual properties of the link make one of the
features undesirable. I can't think of such a situation right now off
the top of my head, but maybe they exist.

The point is that even then you are more than likely to wish to turn off
these options for specific destinations and routes (that go over said
exotic link) and keep using them over others - e.g. timestamp OK for
local LAN, but for default route that goes over exotic TCP/IP over
carrier penguins turn it off.

To sum it up, I think making these options per route is a win no matter
the situation. The question I am less certain about is whether it is
also desirable to have a global kill switch in addition to the per route
options. My gut feeling is that this is not needed once you have a per
route option.
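
As a rough user-space sketch of how a per route option and a global
kill switch could combine (all names here, including the
RT_FEATURE_NO_* bits and struct route_entry, are hypothetical and not
taken from the patch set): the option is used only if the global switch
allows it and the route does not carry the matching "no" bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-route "turn this option off" feature bits. */
#define RT_FEATURE_NO_SACK    (1u << 0)
#define RT_FEATURE_NO_TSTAMP  (1u << 1)
#define RT_FEATURE_NO_WSCALE  (1u << 2)
#define RT_FEATURE_NO_DSACK   (1u << 3)
#define RT_FEATURE_ALL        (RT_FEATURE_NO_SACK | RT_FEATURE_NO_TSTAMP | \
                               RT_FEATURE_NO_WSCALE | RT_FEATURE_NO_DSACK)

/* Hypothetical stand-in for a routing entry with a features field. */
struct route_entry {
    uint32_t features;
};

/*
 * Returns 1 if the TCP option may be used on this route.
 * global_sysctl models the existing global switch (non-zero = enabled).
 */
static int tcp_option_enabled(int global_sysctl,
                              const struct route_entry *rt,
                              uint32_t no_bit)
{
    assert((no_bit & ~RT_FEATURE_ALL) == 0);  /* only known bits */
    if (!global_sysctl)
        return 0;                  /* global kill switch wins */
    return !(rt->features & no_bit);
}
```

For example, a LAN route with features == 0 keeps timestamps, while a
default route carrying RT_FEATURE_NO_TSTAMP drops them without touching
SACK or window scaling. Removing the global switch would amount to
always passing 1 as the first argument.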

Cheers,
Gilad


* Re: Enable syn cookies by default
From: Olaf van der Spek @ 2009-10-21  7:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <4ADEB752.50103@gmail.com>

On Wed, Oct 21, 2009 at 9:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Olaf van der Spek a écrit :
>> On Thu, Oct 15, 2009 at 10:59 AM, Olaf van der Spek
>> <olafvdspek@gmail.com> wrote:
>>> On Sat, Oct 10, 2009 at 3:01 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I'm forwarding Debian feature request #520668.
>>>>
>>>> Could syn cookies be enabled by default?
>>>>
>>>> AFAIK syn cookies only get sent when the half-open TCP connection
>>>> queue is full. So stuff like window scaling should work fine in normal
>>>> situations.
>>>>
>>>> Speaking of which:
>>>> When the half-open TCP connection queue is full and syn cookies are
>>>> enabled, you get a message like "kernel: possible SYN flooding on port
>>>> 2710. Sending cookies."
>>>> However when syn cookies are disabled, you don't get any message (in
>>>> kern.log), although connections to your server are timing out.
>>>> Could such a message be added?
>>>> Maybe with a suggestion to increase the size of that queue or to
>>>> enable syn cookies.
>>>>
>>>> Greetings,
>>>>
>>>> Olaf
>>>>
>>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520668
>>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520667
>>>> https://bugs.launchpad.net/ubuntu/+bug/57091
>>>>
>>> Somebody?
>>
>> Anybody?
>
> This is a user selectable setting. What's wrong with /etc/sysctl.conf ?

It requires user action...
Often you notice cookies are disabled only after a service becomes unreachable.
What's wrong with improving defaults?
Don't forget the missing log entries.
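
A small user-space sketch of the behaviour being discussed, under the
assumption stated earlier in this thread that cookies are only sent
once the half-open queue is full. The function names and the text of
the "dropping" message are hypothetical; only the "Sending cookies"
message is taken from the report above:

```c
#include <assert.h>
#include <stddef.h>

enum syn_action {
    SYN_QUEUE,        /* room in the half-open queue: normal handshake */
    SYN_SEND_COOKIE,  /* queue full, cookies enabled */
    SYN_DROP          /* queue full, cookies disabled */
};

/* Decide what to do with an incoming SYN. */
static enum syn_action handle_syn(unsigned int qlen, unsigned int max_qlen,
                                  int cookies_enabled)
{
    assert(max_qlen > 0);
    if (qlen < max_qlen)
        return SYN_QUEUE;
    return cookies_enabled ? SYN_SEND_COOKIE : SYN_DROP;
}

/* Message to log for an action, or NULL if nothing should be logged. */
static const char *syn_log_message(enum syn_action action)
{
    switch (action) {
    case SYN_SEND_COOKIE:
        return "possible SYN flooding. Sending cookies.";
    case SYN_DROP:
        /* Hypothetical wording for the message requested above. */
        return "possible SYN flooding. Dropping request; consider "
               "enabling syn cookies or increasing the queue size.";
    default:
        return NULL;
    }
}
```

While the queue has room, the cookies setting makes no difference,
which is the argument for enabling it by default. The SYN_DROP case is
the one that currently produces no log entry.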

Olaf


* Re: Enable syn cookies by default
From: Eric Dumazet @ 2009-10-21  7:25 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: netdev
In-Reply-To: <b2cc26e40910210017v3885b18dre5021c8a920f30d7@mail.gmail.com>

Olaf van der Spek a écrit :
> On Thu, Oct 15, 2009 at 10:59 AM, Olaf van der Spek
> <olafvdspek@gmail.com> wrote:
>> On Sat, Oct 10, 2009 at 3:01 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
>>> Hi,
>>>
>>> I'm forwarding Debian feature request #520668.
>>>
>>> Could syn cookies be enabled by default?
>>>
>>> AFAIK syn cookies only get sent when the half-open TCP connection
>>> queue is full. So stuff like window scaling should work fine in normal
>>> situations.
>>>
>>> Speaking of which:
>>> When the half-open TCP connection queue is full and syn cookies are
>>> enabled, you get a message like "kernel: possible SYN flooding on port
>>> 2710. Sending cookies."
>>> However when syn cookies are disabled, you don't get any message (in
>>> kern.log), although connections to your server are timing out.
>>> Could such a message be added?
>>> Maybe with a suggestion to increase the size of that queue or to
>>> enable syn cookies.
>>>
>>> Greetings,
>>>
>>> Olaf
>>>
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520668
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520667
>>> https://bugs.launchpad.net/ubuntu/+bug/57091
>>>
>> Somebody?
> 
> Anybody?

This is a user selectable setting. What's wrong with /etc/sysctl.conf ?




* Re: Enable syn cookies by default
From: Olaf van der Spek @ 2009-10-21  7:17 UTC (permalink / raw)
  To: netdev
In-Reply-To: <b2cc26e40910150159q68ce555fs4b1683969d939d25@mail.gmail.com>

On Thu, Oct 15, 2009 at 10:59 AM, Olaf van der Spek
<olafvdspek@gmail.com> wrote:
> On Sat, Oct 10, 2009 at 3:01 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
>> Hi,
>>
>> I'm forwarding Debian feature request #520668.
>>
>> Could syn cookies be enabled by default?
>>
>> AFAIK syn cookies only get sent when the half-open TCP connection
>> queue is full. So stuff like window scaling should work fine in normal
>> situations.
>>
>> Speaking of which:
>> When the half-open TCP connection queue is full and syn cookies are
>> enabled, you get a message like "kernel: possible SYN flooding on port
>> 2710. Sending cookies."
>> However when syn cookies are disabled, you don't get any message (in
>> kern.log), although connections to your server are timing out.
>> Could such a message be added?
>> Maybe with a suggestion to increase the size of that queue or to
>> enable syn cookies.
>>
>> Greetings,
>>
>> Olaf
>>
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520668
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520667
>> https://bugs.launchpad.net/ubuntu/+bug/57091
>>
>
> Somebody?

Anybody?


* Re: [PATCH] Allow renaming of network interfaces that are up.
From: Nix @ 2009-10-21  6:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Matt Mackall, linux-kernel, netdev
In-Reply-To: <20091021103846.2f985ea1@s6510>

[Cc:s adjusted, thanks davem]

On 21 Oct 2009, Stephen Hemminger said:

> On Tue, 20 Oct 2009 19:54:02 +0100
> Nix <nix@esperi.org.uk> wrote:
[...]
>> This makes it much easier to use things like netconsole which bring up a
>> network interface before userspace has started: presently these will cause
>> interface renamings to fail, breaking any userspace that relies on renaming
>> devices to avoid reliance on the potentially-unstable kernel-assigned name.
[...]
> This breaks quagga and other applications that track renames.

So it's only userspace that's the problem? We have a choice of breaking
apps that assume that only downed interfaces can be renamed, and thus
breaking routing while the system is running, or breaking userspaces
that assume that they can rename interfaces, and thus breaking routing
at bootup when netconsole is on? Great :/

(How many systems run things that track renames? Is this, ew, a reason
to make this constraint configurable, maybe even at runtime, so you
could start with interfaces renameable and then lock them down once
static route assignment is up, just before you fire up quagga?)


* Re: pktgen and spin_lock_bh in xmit path
From: Ben Greear @ 2009-10-21  5:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: NetDev, robert, David S. Miller
In-Reply-To: <4ADE989A.90209@gmail.com>

Eric Dumazet wrote:
> Ben Greear a écrit :
>   
>> Eric Dumazet wrote:
>>     
>>> pktgen should not use "clone XXX" pkts if macvlan is used (or any
>>> other driver that ultimately calls dev_queue_xmit() and queues the
>>> packet), since the skb queue anchor is shared and would be overwritten.
>>>   After some thoughts, I believe user is in error :)
>>>       
>> I tried to explain in my original post:  The problem arises when
>> the hard-start-xmit fails with NETDEV_TX_BUSY.  Part of the
>> hard-start-xmit logic for virtual devices can call dev_queue_xmit,
>> which can ultimately change the queue mapping and yet may still
>> return NETDEV_TX_BUSY.
>>
>> pktgen would try to resend this skb next loop, and this is where it would
>> blow up.
>>
>> I have a patched macvlan logic and a patched dev queue xmit logic that
>> allows me to return NETDEV_TX_BUSY when the underlying device fails to
>> transmit.
>>
>> It may be that my hacked macvlan is the only virtual device that could ever
>> return NETDEV_TX_BUSY, and if that is the case, I don't think the bug
>> could ever be hit in official kernel code.  My opinion is that the
>> current pktgen code makes too many assumptions, so unless there is a
>> performance penalty, I still think it should be cleaned up.  But, I may
>> be too paranoid.
>>     
>
> If a virtual device changes skb->queue_map, it must consume skb,
> or it breaks caller.
>   
> Alternative would be to restore queue_map to its initial value in
> your hacked macvlan when it wants to return NETDEV_TX_BUSY status.
>

Yeah, that sounds fine to me, should be easy and cheap.

> We could add a WARN_ON(skb_get_queue_mapping(pkt_dev->skb) != queue_map);
> in pktgen, to catch driver errors but pktgen assumption is right IMHO
>
> @@ -3466,6 +3471,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  		/* fallthru */
>  	case NETDEV_TX_LOCKED:
>  	case NETDEV_TX_BUSY:
> +		WARN_ON(skb_get_queue_mapping(pkt_dev->skb) != queue_map);
>  		/* Retry it next time */
>  		atomic_dec(&(pkt_dev->skb->users));
>  		pkt_dev->last_ok = 0;
>   
That looks good too.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com




* Re: pktgen and spin_lock_bh in xmit path
From: Ben Greear @ 2009-10-21  5:32 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: David S. Miller, Eric Dumazet, NetDev, netdev-owner, robert
In-Reply-To: <OFE19FAAE4.E89690B3-ON65257656.001B6B90-65257656.001CA0C0@in.ibm.com>

Krishna Kumar2 wrote:
> Ben Greear <greearb@candelatech.com>  wrote on 10/21/2009 02:40:13 AM:
>
> Coming back a bit to this post:
>
>   
>>> -   queue_map = skb_get_queue_mapping(pkt_dev->skb);
>>> +   queue_map = pkt_dev->cur_queue_map;
>>> +   /*
>>> +    * tells skb_tx_hash() to use this tx queue.
>>> +    * We should reset skb->mapping before each xmit() because
>>> +    * xmit() might change it.
>>> +    */
>>> +   skb_record_rx_queue(pkt_dev->skb, queue_map);
>>>      txq = netdev_get_tx_queue(odev, queue_map);
>>>       
>> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
>> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
>> skb->queue_map and uses it as an index with no subtraction.
>>     
>
> But that should work fine. record_rx_q sets queue_mapping to +1,
> but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
> value, and updates that value into queue_mapping. Hence it will
> not cross the txq boundary. Drivers can use the queue_map value
> directly without requiring to subtract.
>   
When using pktgen on real physical hardware, none of the skb_tx_hash or
dev_queue_xmit logic is called, just the hard-start-xmit.  That is why it
fails to update the proper queue with his first patch.

On virtual devices like mac-vlans, the logic probably worked ok since it
goes through dev_queue_xmit.
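
The two paths can be modelled in user space roughly as follows. This is
only a sketch of the +1/-1 convention described in this thread; struct
skb_model and the function names are simplified stand-ins, not the
kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the one sk_buff field that matters here. */
struct skb_model {
    uint16_t queue_mapping;
};

/* TX convention: the field holds the queue index itself. */
static void set_queue_mapping(struct skb_model *skb, uint16_t q)
{
    skb->queue_mapping = q;
}

static uint16_t get_queue_mapping(const struct skb_model *skb)
{
    return skb->queue_mapping;
}

/* RX convention: the field holds index + 1, so 0 means "not recorded". */
static void record_rx_queue(struct skb_model *skb, uint16_t q)
{
    assert(q < UINT16_MAX);
    skb->queue_mapping = (uint16_t)(q + 1);
}

static uint16_t get_rx_queue(const struct skb_model *skb)
{
    assert(skb->queue_mapping > 0);
    return (uint16_t)(skb->queue_mapping - 1);
}

/*
 * Path through dev_queue_xmit: the tx hash step converts the recorded
 * rx queue back into a plain tx index before the driver sees it.
 */
static uint16_t xmit_via_dev_queue_xmit(struct skb_model *skb)
{
    set_queue_mapping(skb, get_rx_queue(skb));
    return get_queue_mapping(skb);
}

/*
 * Direct hard-start-xmit, as pktgen does on physical devices: the
 * driver indexes with the raw field and nothing subtracts the 1.
 */
static uint16_t xmit_direct(const struct skb_model *skb)
{
    return get_queue_mapping(skb);
}
```

Recording queue 0 and then transmitting directly makes the driver use
queue 1, while the caller holds the lock for and updates txq 0. Going
through the dev_queue_xmit path yields queue 0 as intended.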

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com




* Re: pktgen and spin_lock_bh in xmit path
From: Krishna Kumar2 @ 2009-10-21  5:12 UTC (permalink / raw)
  To: Ben Greear; +Cc: David S. Miller, Eric Dumazet, NetDev, netdev-owner, robert
In-Reply-To: <4ADE2735.9000807@candelatech.com>

Ben Greear <greearb@candelatech.com>  wrote on 10/21/2009 02:40:13 AM:

Coming back a bit to this post:

> > -   queue_map = skb_get_queue_mapping(pkt_dev->skb);
> > +   queue_map = pkt_dev->cur_queue_map;
> > +   /*
> > +    * tells skb_tx_hash() to use this tx queue.
> > +    * We should reset skb->mapping before each xmit() because
> > +    * xmit() might change it.
> > +    */
> > +   skb_record_rx_queue(pkt_dev->skb, queue_map);
> >      txq = netdev_get_tx_queue(odev, queue_map);
>
> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
> skb->queue_map and uses it as an index with no subtraction.

But that should work fine. record_rx_q sets queue_mapping to +1,
but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
value, and updates that value into queue_mapping. Hence it will
not cross the txq boundary. Drivers can use the queue_map value
directly without requiring to subtract.

Thanks,

- KK

>
> This causes watchdog timeouts because we are calling txq_trans_update in
> pktgen on queue 0, for instance, but sending pkts on queue 1.  If queue 1
> is ever busy when the WD fires, link is reset.


* Re: pktgen and spin_lock_bh in xmit path
From: Eric Dumazet @ 2009-10-21  5:14 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev, robert, David S. Miller
In-Reply-To: <4ADE9560.5050500@candelatech.com>

Ben Greear a écrit :
> Eric Dumazet wrote:
>> pktgen should not use "clone XXX" pkts if macvlan is used (or any
>> other driver that ultimately calls dev_queue_xmit() and queues the
>> packet), since the skb queue anchor is shared and would be overwritten.
>>   After some thoughts, I believe user is in error :)
> I tried to explain in my original post:  The problem arises when
> the hard-start-xmit fails with NETDEV_TX_BUSY.  Part of the
> hard-start-xmit logic for virtual devices can call dev_queue_xmit,
> which can ultimately change the queue mapping and yet may still
> return NETDEV_TX_BUSY.
> 
> pktgen would try to resend this skb next loop, and this is where it would
> blow up.
> 
> I have a patched macvlan logic and a patched dev queue xmit logic that
> allows me to return NETDEV_TX_BUSY when the underlying device fails to
> transmit.
> 
> It may be that my hacked macvlan is the only virtual device that could ever
> return NETDEV_TX_BUSY, and if that is the case, I don't think the bug
> could ever be hit in official kernel code.  My opinion is that the
> current pktgen code makes too many assumptions, so unless there is a
> performance penalty, I still think it should be cleaned up.  But, I may
> be too paranoid.

If a virtual device changes skb->queue_map, it must consume skb,
or it breaks caller.

Alternative would be to restore queue_map to its initial value in
your hacked macvlan when it wants to return NETDEV_TX_BUSY status.


We could add a WARN_ON(skb_get_queue_mapping(pkt_dev->skb) != queue_map);
in pktgen, to catch driver errors but pktgen assumption is right IMHO

@@ -3466,6 +3471,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		/* fallthru */
 	case NETDEV_TX_LOCKED:
 	case NETDEV_TX_BUSY:
+		WARN_ON(skb_get_queue_mapping(pkt_dev->skb) != queue_map);
 		/* Retry it next time */
 		atomic_dec(&(pkt_dev->skb->users));
 		pkt_dev->last_ok = 0;
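
The "consume the skb or leave queue_map alone" rule can be sketched in
user space like this. It is a model only: struct skb_model, virt_xmit()
and retry_invariant_holds() are hypothetical names, and the real
drivers are more involved:

```c
#include <assert.h>
#include <stdint.h>

enum tx_status { TX_OK, TX_BUSY };

/* Simplified stand-in for the one sk_buff field that matters here. */
struct skb_model {
    uint16_t queue_mapping;
};

/*
 * Hypothetical virtual-device transmit. The lower layer may rewrite
 * queue_mapping (lower_queue). If the packet cannot be consumed, the
 * original value is put back before TX_BUSY is returned, so the
 * caller can safely retry with the txq it already selected.
 */
static enum tx_status virt_xmit(struct skb_model *skb, uint16_t lower_queue,
                                int lower_busy)
{
    uint16_t saved;

    assert(skb);
    saved = skb->queue_mapping;
    skb->queue_mapping = lower_queue;   /* lower layer picks its queue */
    if (lower_busy) {
        skb->queue_mapping = saved;     /* restore before handing back */
        return TX_BUSY;
    }
    return TX_OK;                       /* skb consumed */
}

/* Caller-side check, in the spirit of the WARN_ON above. */
static int retry_invariant_holds(const struct skb_model *skb,
                                 uint16_t queue_map)
{
    return skb->queue_mapping == queue_map;
}
```

If the restore step is left out, retry_invariant_holds() fails after a
TX_BUSY, which is the driver error the WARN_ON is meant to catch.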


* [PATCH] pktgen: Dont leak kernel memory
From: Eric Dumazet @ 2009-10-21  5:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List
In-Reply-To: <4ADE8C85.6020809@gmail.com>

Eric Dumazet a écrit :
> While playing with pktgen, I realized IP ID was not filled and a random value
> was taken, possibly leaking 2 bytes of kernel memory.
> 
> We can use an increasing ID, this can help diagnostics anyway.
> 
> 

Here is a more complete version of the patch, since we leak a lot of kernel
memory :(

[PATCH] pktgen: Dont leak kernel memory

While playing with pktgen, I realized IP ID was not filled and a random value
was taken, possibly leaking 2 bytes of kernel memory.
 
We can use an increasing ID, this can help diagnostics anyway.

Also clear packet payload, instead of leaking kernel memory.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 1da0e03..5ce017b 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -335,6 +335,7 @@ struct pktgen_dev {
 	__u32 cur_src_mac_offset;
 	__be32 cur_saddr;
 	__be32 cur_daddr;
+	__u16 ip_id;
 	__u16 cur_udp_dst;
 	__u16 cur_udp_src;
 	__u16 cur_queue_map;
@@ -2630,6 +2631,8 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	iph->protocol = IPPROTO_UDP;	/* UDP */
 	iph->saddr = pkt_dev->cur_saddr;
 	iph->daddr = pkt_dev->cur_daddr;
+	iph->id = htons(pkt_dev->ip_id);
+	pkt_dev->ip_id++;
 	iph->frag_off = 0;
 	iplen = 20 + 8 + datalen;
 	iph->tot_len = htons(iplen);
@@ -2641,24 +2644,26 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	skb->dev = odev;
 	skb->pkt_type = PACKET_HOST;
 
-	if (pkt_dev->nfrags <= 0)
+	if (pkt_dev->nfrags <= 0) {
 		pgh = (struct pktgen_hdr *)skb_put(skb, datalen);
-	else {
+		memset(pgh + 1, 0, datalen - sizeof(struct pktgen_hdr));
+	} else {
 		int frags = pkt_dev->nfrags;
-		int i;
+		int i, len;
 
 		pgh = (struct pktgen_hdr *)(((char *)(udph)) + 8);
 
 		if (frags > MAX_SKB_FRAGS)
 			frags = MAX_SKB_FRAGS;
 		if (datalen > frags * PAGE_SIZE) {
-			skb_put(skb, datalen - frags * PAGE_SIZE);
+			len = datalen - frags * PAGE_SIZE;
+			memset(skb_put(skb, len), 0, len);
 			datalen = frags * PAGE_SIZE;
 		}
 
 		i = 0;
 		while (datalen > 0) {
-			struct page *page = alloc_pages(GFP_KERNEL, 0);
+			struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
 			skb_shinfo(skb)->frags[i].page = page;
 			skb_shinfo(skb)->frags[i].page_offset = 0;
 			skb_shinfo(skb)->frags[i].size =



* Re: pktgen and spin_lock_bh in xmit path
From: Ben Greear @ 2009-10-21  5:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: NetDev, robert, David S. Miller
In-Reply-To: <4ADE873F.3030903@gmail.com>

Eric Dumazet wrote:
> pktgen should not use "clone XXX" pkts if macvlan is used (or any other driver
> that ultimately calls dev_queue_xmit() and queues the packet), since the skb
> queue anchor is shared and would be overwritten.
>   
> After some thoughts, I believe user is in error :)

I tried to explain in my original post:  The problem arises when
the hard-start-xmit fails with NETDEV_TX_BUSY.  Part of the
hard-start-xmit logic for virtual devices can call dev_queue_xmit,
which can ultimately change the queue mapping and yet may still
return NETDEV_TX_BUSY.

pktgen would try to resend this skb next loop, and this is where it would
blow up.

I have a patched macvlan logic and a patched dev queue xmit logic that
allows me to return NETDEV_TX_BUSY when the underlying device fails to
transmit.

It may be that my hacked macvlan is the only virtual device that could ever
return NETDEV_TX_BUSY, and if that is the case, I don't think the bug
could ever be hit in official kernel code.  My opinion is that the
current pktgen code makes too many assumptions, so unless there is a
performance penalty, I still think it should be cleaned up.  But, I may
be too paranoid.

> 1) Only way to use "clone XXXX" pkts is when using real device.

Agreed, and I was not cloning pkts on the mac-vlan interface.

> 2) Also, using macvlan in pktgen is sub-optimal, since you can already put any
> MAC addresses in pktgen pkts, you don't need to go through macvlan layer.

It's sub-optimal for massive pkt pushing, but still useful for sending
multiple distinct flows across a single physical wire.

> 3) If ixgbe overwrites skb->queue_mapping to current cpu, you should set up pktgen
> queue_map_min and queue_map_max to match your cpu number, or use the QUEUE_MAP_CPU pktgen flag.
> Or else, pktgen won't get the appropriate txq (and lock) before calling driver start_xmit()

The hard-start-xmit path doesn't call the driver's queue-mapping logic,
so you only get that fun when transmitting through mac-vlans (or .1q
vlans, etc).  There appears to be no watchdog for virtual devices, and
the dev_queue_xmit path updates the proper txq, so, as long as you
aren't using that +1 variant of the skb set queue map logic in pktgen,
I think you will be fine.  The current code is fine in this manner, but
your patch broke it w/out the second patch to remove the +1 logic.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com




* [PATCH] pktgen: initialize IP ID field
From: Eric Dumazet @ 2009-10-21  4:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

While playing with pktgen, I realized IP ID was not filled and a random value
was taken, possibly leaking 2 bytes of kernel memory.

We can use an increasing ID, this can help diagnostics anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 1da0e03..59396a5 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -335,6 +335,7 @@ struct pktgen_dev {
 	__u32 cur_src_mac_offset;
 	__be32 cur_saddr;
 	__be32 cur_daddr;
+	__u16 ip_id;
 	__u16 cur_udp_dst;
 	__u16 cur_udp_src;
 	__u16 cur_queue_map;
@@ -2630,6 +2631,8 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	iph->protocol = IPPROTO_UDP;	/* UDP */
 	iph->saddr = pkt_dev->cur_saddr;
 	iph->daddr = pkt_dev->cur_daddr;
+	iph->id = htons(pkt_dev->ip_id);
+	pkt_dev->ip_id++;
 	iph->frag_off = 0;
 	iplen = 20 + 8 + datalen;
 	iph->tot_len = htons(iplen);
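
For readers who want to see the effect of the two added lines outside
the kernel, here is a small user-space sketch. It writes a 16-bit
counter into the IPv4 Identification field (bytes 4 and 5 of the
header) in network byte order and then increments it. struct gen_state
and stamp_ip_id() are hypothetical names used only for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_ID_OFFSET  4   /* Identification field: header bytes 4-5 */
#define IPV4_MIN_HDR    20  /* minimum IPv4 header length in bytes */

/* Hypothetical per-device generator state holding the next IP ID. */
struct gen_state {
    uint16_t ip_id;
};

/*
 * Store the current ID big-endian (network order) in the header and
 * advance the counter. The counter is 16 bits wide, so it wraps from
 * 65535 back to 0.
 */
static void stamp_ip_id(uint8_t *iphdr, size_t len, struct gen_state *st)
{
    assert(len >= IPV4_MIN_HDR);
    iphdr[IPV4_ID_OFFSET]     = (uint8_t)(st->ip_id >> 8);
    iphdr[IPV4_ID_OFFSET + 1] = (uint8_t)(st->ip_id & 0xff);
    st->ip_id++;
}
```

Because every packet gets an explicit, increasing value, the field no
longer carries whatever happened to be in memory, and gaps in the
sequence seen at the receiver point to lost packets.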


* Re: pktgen and spin_lock_bh in xmit path
From: Eric Dumazet @ 2009-10-21  3:59 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev, robert, David S. Miller
In-Reply-To: <4ADE7C0D.5070208@gmail.com>

Eric Dumazet a écrit :
> Ben Greear a écrit :
>> Ben Greear wrote:
>>> On 10/20/2009 02:57 PM, Eric Dumazet wrote:
>>>> Ben Greear a écrit :
>>>>> That's definitely a nasty little issue.  Using skb_set_queue_mapping
>>>>> in pktgen makes it run for me, but may just be getting lucky with the
>>>>> mac-vlan interfaces which will do the dev_queue_xmit (but, I don't
>>>>> so much
>>>>> care exactly what queue is used as long as things don't crash and the
>>>>> link doesn't reset).
>>>>>
>>>>> Don't worry about a quick patch on my account.  I seem to have it
>>>>> working
>>>>> to at least some degree (no funny crashes, no link watchdog timeouts).
>>>>>
>>>> Could you try following patch ?
>>>>
>>>> This makes queue_mapping invariant if set in range [0 ...
>>>> real_num_tx_queues-1]
>>> Yes, that runs w/out causing link resets and without crashes (just
>>> tested it for
>>> a few minutes).
>>>
>>> Interestingly, the pkts sent by pktgen on the mac-vlans end up in
>>> tx-queues that match processor ID, even though I'm on .31 where mac-vlans
>>> have only one tx-queue and pktgen is setting the queue to 0 in the skb
>>> (per your previous patch).
>> Ok, this is because ixgbe implements the ndo_select_queue, which is
>> called from
>> dev_pick_tx.
>>
>> So, as far as I can tell, as long as you are using ixgbe with 82599, it
>> doesn't matter what you set
>> for the queue-map in the skb..it's always over-written.
>>
>> I don't know if this is a bug or a feature, but it explains the behavior
>> with tx and rx queues
>> that I saw when using pktgen on mac-vlans...
>>
> 
> We have many bugs in this area :)
> 
> So we probably need to reset skb_set_queue_mapping(skb, queue_map);
> each time skb is transmitted by pktgen.
> 
> Or else, pktgen will break if using bonding driver -> ixgbe, since bonding
> uses only one txqueue (it is not yet multiqueue aware)
> 

After some thought, I believe the user is in error :)

pktgen should not use "clone XXX" pkts if macvlan is used (or any other driver
that ultimately calls dev_queue_xmit() and queues the packet), since the skb
queue anchor is shared and would be overwritten.

1) The only way to use "clone XXXX" pkts is with a real device.

2) Also, using macvlan in pktgen is sub-optimal: since you can already put any
MAC addresses in pktgen pkts, you don't need to go through the macvlan layer.

3) If ixgbe overwrites skb->queue_mapping with the current cpu, you should set up
pktgen queue_map_min and queue_map_max to match your cpu number, or use the
QUEUE_MAP_CPU pktgen flag. Otherwise, pktgen won't get the appropriate txq (and
lock) before calling the driver's start_xmit().

Unfortunately, the (queue_map_min==queue_map_max) case needs a patch that might not be present in 2.6.31
(commit 896a7cf8d846a9e86fb823be16f4f14ffeb7f074 : pktgen: Fix multiqueue handling)
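
To make point 3 concrete, here is a rough userspace model of how a generator
could pick the tx queue for each packet. It is a sketch under assumptions, not
the pktgen implementation: struct qmap_cfg, map_to_cpu and next_queue() are
hypothetical names, with map_to_cpu standing in for the QUEUE_MAP_CPU flag.
The point it illustrates is that with queue_map_min == queue_map_max every
packet lands on one fixed queue, while the CPU mode follows the CPU the
generator thread runs on.

```c
#include <assert.h>

/* Hypothetical model; the field names echo the pktgen parameters. */
struct qmap_cfg {
	unsigned int queue_map_min;
	unsigned int queue_map_max;
	int map_to_cpu;		/* models the QUEUE_MAP_CPU flag */
};

/*
 * Pick the tx queue for the next packet.
 *   cur     - queue used for the previous packet
 *   cpu     - CPU the generator thread runs on
 *   num_txq - number of tx queues the device really has
 */
static unsigned int next_queue(const struct qmap_cfg *cfg, unsigned int cur,
			       unsigned int cpu, unsigned int num_txq)
{
	unsigned int q;

	assert(num_txq > 0);
	assert(cfg->queue_map_min <= cfg->queue_map_max);

	if (cfg->map_to_cpu) {
		/* Follow the CPU, so the queue matches what the driver picks. */
		q = cpu;
	} else {
		/* Walk min..max in order, wrapping back to min. */
		q = cur + 1;
		if (q < cfg->queue_map_min || q > cfg->queue_map_max)
			q = cfg->queue_map_min;
	}
	/* Never return a queue the device does not have. */
	return q % num_txq;
}
```

With min == max == 2 the function always returns queue 2 (on a device with
more than two queues), which is the "match your cpu number" setup for a thread
pinned to CPU 2.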


^ permalink raw reply

* Re: pktgen and spin_lock_bh in xmit path
From: Eric Dumazet @ 2009-10-21  3:12 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev, robert, David S. Miller
In-Reply-To: <4ADE7A63.4090404@candelatech.com>

Ben Greear a écrit :
> Ben Greear wrote:
>> On 10/20/2009 02:57 PM, Eric Dumazet wrote:
>>> Ben Greear a écrit :
>>>> That's definitely a nasty little issue.  Using skb_set_queue_mapping
>>>> in pktgen makes it run for me, but may just be getting lucky with the
>>>> mac-vlan interfaces which will do the dev_queue_xmit (but, I don't
>>>> so much
>>>> care exactly what queue is used as long as things don't crash and the
>>>> link doesn't reset).
>>>>
>>>> Don't worry about a quick patch on my account.  I seem to have it
>>>> working
>>>> to at least some degree (no funny crashes, no link watchdog timeouts).
>>>>
>>>
>>> Could you try following patch ?
>>>
>>> This makes queue_mapping invariant if set in range [0 ...
>>> real_num_tx_queues-1]
>>
>> Yes, that runs w/out causing link resets and without crashes (just
>> tested it for
>> a few minutes).
>>
>> Interestingly, the pkts sent by pktgen on the mac-vlans end up in
>> tx-queues that match processor ID, even though I'm on .31 where mac-vlans
>> have only one tx-queue and pktgen is setting the queue to 0 in the skb
>> (per your previous patch).
> Ok, this is because ixgbe implements the ndo_select_queue, which is
> called from
> dev_pick_tx.
> 
> So, as far as I can tell, as long as you are using ixgbe with 82599, it
> doesn't matter what you set
> for the queue-map in the skb..it's always over-written.
> 
> I don't know if this is a bug or a feature, but it explains the behavior
> with tx and rx queues
> that I saw when using pktgen on mac-vlans...
> 

We have many bugs in this area :)

So we probably need to reset skb_set_queue_mapping(skb, queue_map);
each time skb is transmitted by pktgen.

Or else, pktgen will break if using bonding driver -> ixgbe, since bonding
uses only one txqueue (it is not yet multiqueue aware)

Thanks, I'll take care of official patches submission


^ permalink raw reply

* Re: pktgen and spin_lock_bh in xmit path
From: Ben Greear @ 2009-10-21  3:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: NetDev, robert, David S. Miller
In-Reply-To: <4ADE44FC.4030406@candelatech.com>

Ben Greear wrote:
> On 10/20/2009 02:57 PM, Eric Dumazet wrote:
>> Ben Greear a écrit :
>>> That's definitely a nasty little issue.  Using skb_set_queue_mapping
>>> in pktgen makes it run for me, but may just be getting lucky with the
>>> mac-vlan interfaces which will do the dev_queue_xmit (but, I don't 
>>> so much
>>> care exactly what queue is used as long as things don't crash and the
>>> link doesn't reset).
>>>
>>> Don't worry about a quick patch on my account.  I seem to have it 
>>> working
>>> to at least some degree (no funny crashes, no link watchdog timeouts).
>>>
>>
>> Could you try following patch ?
>>
>> This makes queue_mapping invariant if set in range [0 ... 
>> real_num_tx_queues-1]
>
> Yes, that runs w/out causing link resets and without crashes (just 
> tested it for
> a few minutes).
>
> Interestingly, the pkts sent by pktgen on the mac-vlans end up in
> tx-queues that match processor ID, even though I'm on .31 where mac-vlans
> have only one tx-queue and pktgen is setting the queue to 0 in the skb
> (per your previous patch).
Ok, this is because ixgbe implements ndo_select_queue, which is called from
dev_pick_tx.

So, as far as I can tell, as long as you are using ixgbe with 82599, it
doesn't matter what you set for the queue-map in the skb; it's always
overwritten.

I don't know if this is a bug or a feature, but it explains the behavior
with tx and rx queues that I saw when using pktgen on mac-vlans...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply

* [net-next-2.6 PATCH 3/3] ixgbe: Make queue pairs on single MSI-X interrupts
From: Jeff Kirsher @ 2009-10-21  2:27 UTC (permalink / raw)
  To: davem; +Cc: gospo, netdev, Peter P Waskiewicz Jr, Jeff Kirsher
In-Reply-To: <20091021022626.32449.73883.stgit@localhost.localdomain>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

This patch pairs similar-numbered Rx and Tx queues onto a single
MSI-X vector.  For example, the interrupt for Tx queue 0 and Rx queue 0
will be ethX-TxRx-0.  This allows for more efficient cleanup, since
fewer interrupts will be firing during device operation.  It also
makes it cleaner to set the IRQ affinity of each vector to a CPU.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_main.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index d2280c3..00b6829 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -3579,10 +3579,10 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 	 * It's easy to be greedy for MSI-X vectors, but it really
 	 * doesn't do us much good if we have a lot more vectors
 	 * than CPU's.  So let's be conservative and only ask for
-	 * (roughly) twice the number of vectors as there are CPU's.
+	 * (roughly) the same number of vectors as there are CPU's.
 	 */
 	v_budget = min(adapter->num_rx_queues + adapter->num_tx_queues,
-	               (int)(num_online_cpus() * 2)) + NON_Q_VECTORS;
+	               (int)num_online_cpus()) + NON_Q_VECTORS;
 
 	/*
 	 * At the same time, hardware can only support a maximum of
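
The effect of the one-line change above on the vector budget can be checked
with a small standalone calculation. This is a sketch only:
EXAMPLE_NON_Q_VECTORS is an assumed value chosen for illustration (the real
NON_Q_VECTORS constant is defined in the driver headers), and the helper names
are hypothetical.

```c
#include <assert.h>

/* Assumed value, for illustration only; see the driver for NON_Q_VECTORS. */
#define EXAMPLE_NON_Q_VECTORS 1

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Budget before the patch: up to two queue vectors per online CPU. */
static int v_budget_old(int num_rx, int num_tx, int online_cpus)
{
	assert(num_rx >= 0 && num_tx >= 0 && online_cpus > 0);
	return min_int(num_rx + num_tx, online_cpus * 2) + EXAMPLE_NON_Q_VECTORS;
}

/* Budget after the patch: up to one queue vector per online CPU. */
static int v_budget_new(int num_rx, int num_tx, int online_cpus)
{
	assert(num_rx >= 0 && num_tx >= 0 && online_cpus > 0);
	return min_int(num_rx + num_tx, online_cpus) + EXAMPLE_NON_Q_VECTORS;
}
```

Under that assumption, 8 Rx queues and 8 Tx queues on an 8-CPU machine ask
for 17 vectors before the patch and 9 after it, which is what lets each Rx/Tx
pair share one vector.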


^ permalink raw reply related

* [net-next-2.6 PATCH 2/3] ixgbe: Set MSI-X vectors to NOBALANCING and set affinity
From: Jeff Kirsher @ 2009-10-21  2:27 UTC (permalink / raw)
  To: davem; +Cc: gospo, netdev, Peter P Waskiewicz Jr, Jeff Kirsher
In-Reply-To: <20091021022626.32449.73883.stgit@localhost.localdomain>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

This patch will set each MSI-X vector to IRQF_NOBALANCING to
prevent autobalance of the interrupts, then applies a CPU
affinity.  This will only be done when Flow Director is enabled,
which needs interrupts to be processed on the same CPUs where the
applications are running.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++++++++++++-----
 1 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 4c8a449..d2280c3 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1565,8 +1565,10 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	irqreturn_t (*handler)(int, void *);
-	int i, vector, q_vectors, err;
+	int i, vector, q_vectors, cpu, err;
 	int ri=0, ti=0;
+	u32 intr_flags = 0;
+	u32 num_cpus = num_online_cpus();
 
 	/* Decrement for Other and TCP Timer vectors */
 	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
@@ -1576,17 +1578,22 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	if (err)
 		goto out;
 
+	/* If Flow Director is enabled, we want to affinitize vectors */
+	if ((adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) ||
+	    (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
+		intr_flags = IRQF_NOBALANCING;
+
 #define SET_HANDLER(_v) ((!(_v)->rxr_count) ? &ixgbe_msix_clean_tx : \
                          (!(_v)->txr_count) ? &ixgbe_msix_clean_rx : \
                          &ixgbe_msix_clean_many)
-	for (vector = 0; vector < q_vectors; vector++) {
+	for (vector = 0, cpu = 0; vector < q_vectors; vector++) {
 		handler = SET_HANDLER(adapter->q_vector[vector]);
 
-		if(handler == &ixgbe_msix_clean_rx) {
+		if (handler == &ixgbe_msix_clean_rx) {
 			sprintf(adapter->name[vector], "%s-%s-%d",
 				netdev->name, "rx", ri++);
 		}
-		else if(handler == &ixgbe_msix_clean_tx) {
+		else if (handler == &ixgbe_msix_clean_tx) {
 			sprintf(adapter->name[vector], "%s-%s-%d",
 				netdev->name, "tx", ti++);
 		}
@@ -1595,7 +1602,8 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 				netdev->name, "TxRx", vector);
 
 		err = request_irq(adapter->msix_entries[vector].vector,
-		                  handler, 0, adapter->name[vector],
+		                  handler, intr_flags,
+		                  adapter->name[vector],
 		                  adapter->q_vector[vector]);
 		if (err) {
 			DPRINTK(PROBE, ERR,
@@ -1603,9 +1611,25 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 			        "Error: %d\n", err);
 			goto free_queue_irqs;
 		}
+		if (intr_flags) {
+			/*
+			 * We're not balancing the vector, so affinitize it.
+			 * Best default layout is try and assign one vector
+			 * per CPU.  If we have more vectors than online
+			 * CPUs, then try to first affinitize Rx, then lay
+			 * Tx over the same Rx CPU map.  This can always be
+			 * overridden using smp_affinity in /proc
+			 */
+
+			irq_set_affinity(adapter->msix_entries[vector].vector,
+			                 cpumask_of(cpu));
+			if (++cpu >= num_cpus)
+				cpu = 0;
+		}
 	}
 
 	sprintf(adapter->name[vector], "%s:lsc", netdev->name);
+	/* We don't care if this vector is irqbalanced or not */
 	err = request_irq(adapter->msix_entries[vector].vector,
 	                  &ixgbe_msix_lsc, 0, adapter->name[vector], netdev);
 	if (err) {
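
The layout described in the comment in the patch above (one vector per CPU,
wrapping around when there are more vectors than CPUs) can be modelled in
userspace like this. It is a sketch, not driver code: assign_vectors() is a
hypothetical helper, and it assumes the online CPUs are numbered
0..num_cpus-1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill cpu_of[v] with the CPU chosen for queue vector v, walking the
 * CPUs in order and wrapping back to CPU 0 when they run out.
 */
static void assign_vectors(int *cpu_of, int q_vectors, int num_cpus)
{
	int vector, cpu;

	assert(cpu_of != NULL);
	assert(q_vectors >= 0 && num_cpus > 0);

	for (vector = 0, cpu = 0; vector < q_vectors; vector++) {
		cpu_of[vector] = cpu;
		if (++cpu >= num_cpus)
			cpu = 0;
	}
}
```

For 6 queue vectors on 4 CPUs this yields the CPU sequence 0, 1, 2, 3, 0, 1.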


^ permalink raw reply related

* [net-next-2.6 PATCH 1/3] irq: Export irq_set_affinity() for drivers
From: Jeff Kirsher @ 2009-10-21  2:26 UTC (permalink / raw)
  To: davem; +Cc: gospo, netdev, Peter P Waskiewicz Jr, Jeff Kirsher

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

This patch allows drivers to specify an IRQ affinity mask for
their respective interrupt sources.  This is very useful on
network adapters using MSI-X, where aligning network flows
linearly to CPUs greatly improves efficiency of the network
stack.

Today, users must either hand-set affinity through /proc, or
use a script through the same interface.  This patch will allow
a driver to come completely pre-canned with an optimal
configuration.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 kernel/irq/manage.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index bde4c66..185eb26 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -137,6 +137,7 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 	spin_unlock_irqrestore(&desc->lock, flags);
 	return 0;
 }
+EXPORT_SYMBOL(irq_set_affinity);

 #ifndef CONFIG_AUTO_IRQ_AFFINITY
 /*
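
For comparison, the manual route mentioned in the description above means
writing a hexadecimal CPU mask to /proc/irq/<irq>/smp_affinity. The helper
below is a hypothetical userspace sketch of building such a mask for a single
CPU; format_affinity_mask() is not an existing API, and the sketch is limited
to CPUs 0..31 so that the mask fits in one 32-bit word.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the hexadecimal mask that selects a single CPU, in the form
 * written to /proc/irq/<irq>/smp_affinity.
 * Returns the number of characters written, or -1 on error.
 */
static int format_affinity_mask(char *buf, size_t len, unsigned int cpu)
{
	int n;

	assert(buf != NULL);
	if (cpu > 31)
		return -1;	/* sketch handles one 32-bit word only */

	n = snprintf(buf, len, "%x", 1u << cpu);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or buffer too small */
	return n;
}
```

For CPU 3 the helper produces "8", so writing that string to the
smp_affinity file of an IRQ would pin the IRQ to CPU 3.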

^ permalink raw reply related

* Re: [PATCH RFC] Per route TCP options
From: Bill Fink @ 2009-10-21  2:13 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: netdev, ori
In-Reply-To: <1256052161-14156-1-git-send-email-gilad@codefidence.com>

On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote:

> Turn the global sysctls allowing disabling of TCP SACK, DSACK,
> time stamp and window scale into per route entry feature options,
> laying the ground to future removal of the relevant global sysctls.
> 
> You really only want to disable SACK, DSACK, time stamp or window
> scale if you've got a piece of broken networking equipment somewhere 
> as a stop gap until you can bring a big enough hammer to deal with
> the broken network equipment. It doesn't make sense to "punish" all
> the connections going through the machine to destinations not
> related to the broken equipment.

For certain test situations, it is sometimes desirable to globally
disable TCP timestamps.  Although I have not personally wanted to
globally disable the other mentioned features, I can imagine test
scenarios where it could be useful.  Admittedly it could also be
accomplished with per route features, just not as conveniently,
especially if there are a large number of interfaces and/or routes.

						-Bill

^ permalink raw reply
