Netdev List


* Re: Fwd: Ethtool not displaying ntuple/nfc rule settings
From: Ben Hutchings @ 2012-05-29 17:55 UTC
  To: TJ Johnson; +Cc: netdev
In-Reply-To: <CAFZv1415XPt+7v3xo6BR86cRKPju7QV7NQMAzS-O21TqvGBgBA@mail.gmail.com>

On Tue, 2012-05-29 at 10:55 -0600, TJ Johnson wrote:
> Sorry for the double send. Had to convert to plain text as I received
> failure messages.
> 
> ---------- Forwarded message ----------
> From: TJ Johnson <tjjohnson10200@gmail.com>
> Date: Tue, May 29, 2012 at 10:36 AM
> Subject: Ethtool not displaying ntuple/nfc rule settings
> To: netdev@vger.kernel.org, bhutchings@solarflare.com
> 
> 
> Hi,
> 
> Not sure if this is even the right place to ask this, so feel free to push
> me somewhere else if necessary. I am using ethtool's -U option to set up
> rules for an ixgbe device. That part seems to work great. However I am
> unable to check the currently configured rules once they are in place.
> 
> ethtool -u DEVNAME produces this:
> Cannot get RX rings: Operation not supported
> rxclass: Cannot get RX class rule count: Operation not supported
> RX classification rule retrieval failed
> 
> ethtool -n DEVNAME rx-flow-hash udp4 produces this:
> Cannot get RX network flow hashing options: Operation not supported
[...]
> The OS is SUSE Linux Enterprise Server 11 using a custom 2.6.36 kernel.
> 
> Any ideas as to what the problem might be? Or how I can solve the issue?

This version of the ixgbe driver implemented the n-tuple interface,
which has since been removed in favour of the NFC rules interface.  It
was switched to the new interface in Linux 3.1.

ethtool supports setting rules through either interface (automatically)
but can only read them back through the NFC rules interface.  The
n-tuple interface did support reading rules but the information returned
was unreliable: it would not necessarily match the hardware state.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Fwd: Ethtool not displaying ntuple/nfc rule settings
From: TJ Johnson @ 2012-05-29 16:55 UTC
  To: netdev, bhutchings
In-Reply-To: <CAFZv140-yrrDmD_H3ySEfJOL7Yp3syGt-VCNXaOi64nzucp5xA@mail.gmail.com>

Sorry for the double send. Had to convert to plain text as I received
failure messages.

---------- Forwarded message ----------
From: TJ Johnson <tjjohnson10200@gmail.com>
Date: Tue, May 29, 2012 at 10:36 AM
Subject: Ethtool not displaying ntuple/nfc rule settings
To: netdev@vger.kernel.org, bhutchings@solarflare.com

Hi,

Not sure if this is even the right place to ask this, so feel free to push
me somewhere else if necessary. I am using ethtool's -U option to set up
rules for an ixgbe device. That part seems to work great. However I am
unable to check the currently configured rules once they are in place.

ethtool -u DEVNAME produces this:
Cannot get RX rings: Operation not supported
rxclass: Cannot get RX class rule count: Operation not supported
RX classification rule retrieval failed

ethtool -n DEVNAME rx-flow-hash udp4 produces this:
Cannot get RX network flow hashing options: Operation not supported

The ethtool version I am using is 3.2; I have also tried versions 2.6.36
through 2.6.39. I built the tool with a plain configure; make.

ethtool -i gives this for the device version:

driver: ixgbe
version: 3.3.9-NAPI
firmware-version: 1.0-3
bus-info: 0000:0c:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

The OS is SUSE Linux Enterprise Server 11 using a custom 2.6.36 kernel.

Any ideas as to what the problem might be? Or how I can solve the issue?

Thanks,
TJ


* Re: [PATCH 2/2] tc-drr(8): tab unquoted in a argument to a macro
From: Stephen Hemminger @ 2012-05-29 15:18 UTC
  To: Andreas Henriksson; +Cc: netdev, Bjarni Ingi Gislason
In-Reply-To: <1338205565-11872-2-git-send-email-andreas@fatal.se>

On Mon, 28 May 2012 13:46:05 +0200
Andreas Henriksson <andreas@fatal.se> wrote:

> From: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
> 
> From "man ..." ("groff -ww -mandoc ..."):
> 
> <groff: tc-drr.8>:67: warning: tab character in unquoted macro argument
> <groff: tc-drr.8>:69: warning: tab character in unquoted macro argument
> 
> *********************
> 
> Originally filed at: http://bugs.debian.org/674706
> 
> Signed-off-by: Andreas Henriksson <andreas@fatal.se>
> ---
>  man/man8/tc-drr.8 |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/man/man8/tc-drr.8 b/man/man8/tc-drr.8
> index 16a8ec0..e25d6dd 100644
> --- a/man/man8/tc-drr.8
> +++ b/man/man8/tc-drr.8
> @@ -64,9 +64,9 @@ flow filter:
>  
>  .B for i in .. 1024;do
>  .br
> -.B \ttc class add dev ..  classid $handle:$(print %x $i)
> +.B "\ttc class add dev .. classid $handle:$(print %x $i)"
>  .br
> -.B \ttc qdisc add dev .. fifo limit 16
> +.B "\ttc qdisc add dev .. fifo limit 16"
>  .br
>  .B done
>  

Both applied, thanks.


* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Eric Dumazet @ 2012-05-29 15:11 UTC
  To: Tom Herbert
  Cc: Hiroaki SHIMODA, Denys Fedoryshchenko, netdev, e1000-devel,
	jeffrey.t.kirsher, jesse.brandeburg, davem
In-Reply-To: <CA+mtBx_uYy9XcRvpD2E46FuMBFu38iQvCiwFHWqbhPBmY=JfOg@mail.gmail.com>

On Tue, 2012-05-29 at 07:54 -0700, Tom Herbert wrote:
> Thanks, Hiroaki, for this description; it looks promising.  Denys, can
> you test with his patch?
> 
> Tom

Indeed this sounds good.

Hmm, I guess my e1000e has no FLAG2_DMA_BURST in adapter->flags2


* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Tom Herbert @ 2012-05-29 14:54 UTC
  To: Hiroaki SHIMODA
  Cc: Denys Fedoryshchenko, netdev, e1000-devel, jeffrey.t.kirsher,
	jesse.brandeburg, eric.dumazet, davem
In-Reply-To: <20120529232518.e5b41759.shimoda.hiroaki@gmail.com>

Thanks, Hiroaki, for this description; it looks promising.  Denys, can
you test with his patch?

Tom

On Tue, May 29, 2012 at 7:25 AM, Hiroaki SHIMODA
<shimoda.hiroaki@gmail.com> wrote:
> On Sun, 20 May 2012 10:40:41 -0700
> Tom Herbert <therbert@google.com> wrote:
>
>> Tried to reproduce:
>>
>> May 20 10:08:30 test kernel: [    6.168240] e1000e 0000:06:00.0:
>> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
>> dynamic conservative mode
>> May 20 10:08:30 test kernel: [    6.221591] e1000e 0000:06:00.1:
>> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
>> dynamic conservative mode
>>
>> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
>> Ethernet Controller (Copper) (rev 01)
>> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
>> Ethernet Controller (Copper) (rev 01)
>>
>> Following above instructions to repro gives:
>>
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5875 ttl=64 time=0.358 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5876 ttl=64 time=0.330 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5877 ttl=64 time=0.337 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5878 ttl=64 time=0.375 ms
>> 1480 bytes from test2 (192.168.2.49): icmp_req=5879 ttl=64 time=0.359 ms
>> 1480 bytes from lpb49.prod.google.com (192.168.2.49): icmp_req=5880
>> ttl=64 time=0.380 ms
>>
>> And I didn't see the stalls. This was on an Intel machine.  The limit
>> was stable; it went up to around 28K when I opened a large file and
>> tended to stay between 15-28K.
>>
>> The described problem seems to be characterized by transmit
>> interrupts that are not at all periodic, and it would seem that some
>> are taking hundreds of milliseconds to pop.  I don't see anything in
>> the NIC that would cause that. Is it possible that some activity on
>> the machines is periodically holding off interrupts for long
>> periods of time?  Are there any peculiarities on Sun Fire in interrupt
>> handling?
>>
>> Can you also provide the output of 'ethtool -c eth0'?
>>
>> Thanks,
>> Tom
>
> I also observed similar behaviour in the following environment.
>
> 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>
> [    2.962119] e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k
> [    2.968095] e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
> [    2.974251] e1000e 0000:03:00.0: Disabling ASPM L0s L1
> [    2.979653] e1000e 0000:03:00.0: (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [    2.991599] e1000e 0000:03:00.0: irq 72 for MSI/MSI-X
> [    2.991606] e1000e 0000:03:00.0: irq 73 for MSI/MSI-X
> [    2.991611] e1000e 0000:03:00.0: irq 74 for MSI/MSI-X
> [    3.092768] e1000e 0000:03:00.0: eth0: (PCI Express:2.5GT/s:Width x1) 48:5b:39:75:91:bd
> [    3.100992] e1000e 0000:03:00.0: eth0: Intel(R) PRO/1000 Network Connection
> [    3.108173] e1000e 0000:03:00.0: eth0: MAC: 3, PHY: 8, PBA No: FFFFFF-0FF
>
> I tried several coalesce options with 'ethtool -C eth0', but
> none of them helped.
>
> If I understand the code and spec correctly, TX interrupts are
> generated when TXDCTL.WTHRESH descriptors have been accumulated
> and written back.
>
> I tentatively changed TXDCTL.WTHRESH to 1, and the latency
> spikes seem to disappear.
>
> drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -181,7 +181,7 @@ struct e1000_info;
>  #define E1000_TXDCTL_DMA_BURST_ENABLE                          \
>        (E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
>         E1000_TXDCTL_COUNT_DESC |                             \
> -        (5 << 16) | /* wthresh must be +1 more than desired */\
> +        (1 << 16) | /* wthresh must be +1 more than desired */\
>         (1 << 8)  | /* hthresh */                             \
>         0x1f)       /* pthresh */
>
> (before) $ ping -i0.2 192.168.11.2
> PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> 64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.191 ms
> 64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.179 ms
> 64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.199 ms
> 64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.143 ms
> 64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.150 ms
> 64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.186 ms
> 64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.198 ms
> 64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.195 ms
> 64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
> 64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.200 ms
> 64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=651 ms
> 64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=451 ms
> 64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=241 ms
> 64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=31.3 ms
> 64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.184 ms
> 64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.199 ms
> 64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.197 ms
> 64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=21 ttl=64 time=0.192 ms
> 64 bytes from 192.168.11.2: icmp_req=22 ttl=64 time=0.205 ms
> 64 bytes from 192.168.11.2: icmp_req=23 ttl=64 time=629 ms
> 64 bytes from 192.168.11.2: icmp_req=24 ttl=64 time=419 ms
> 64 bytes from 192.168.11.2: icmp_req=25 ttl=64 time=209 ms
> 64 bytes from 192.168.11.2: icmp_req=26 ttl=64 time=0.280 ms
> 64 bytes from 192.168.11.2: icmp_req=27 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=28 ttl=64 time=0.194 ms
> 64 bytes from 192.168.11.2: icmp_req=29 ttl=64 time=0.143 ms
> 64 bytes from 192.168.11.2: icmp_req=30 ttl=64 time=0.191 ms
> 64 bytes from 192.168.11.2: icmp_req=31 ttl=64 time=0.144 ms
> 64 bytes from 192.168.11.2: icmp_req=32 ttl=64 time=0.192 ms
> 64 bytes from 192.168.11.2: icmp_req=33 ttl=64 time=0.199 ms
> 64 bytes from 192.168.11.2: icmp_req=34 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=35 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=36 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=37 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=38 ttl=64 time=1600 ms
> 64 bytes from 192.168.11.2: icmp_req=39 ttl=64 time=1390 ms
> 64 bytes from 192.168.11.2: icmp_req=40 ttl=64 time=1180 ms
> 64 bytes from 192.168.11.2: icmp_req=41 ttl=64 time=980 ms
> 64 bytes from 192.168.11.2: icmp_req=42 ttl=64 time=780 ms
> 64 bytes from 192.168.11.2: icmp_req=43 ttl=64 time=570 ms
> 64 bytes from 192.168.11.2: icmp_req=44 ttl=64 time=0.151 ms
> 64 bytes from 192.168.11.2: icmp_req=45 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=46 ttl=64 time=0.203 ms
> 64 bytes from 192.168.11.2: icmp_req=47 ttl=64 time=0.185 ms
> 64 bytes from 192.168.11.2: icmp_req=48 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=49 ttl=64 time=0.204 ms
> 64 bytes from 192.168.11.2: icmp_req=50 ttl=64 time=0.198 ms
>
> I think the 1000 ms - 2000 ms delays come from e1000_watchdog_task().
>
> (after) $ ping -i0.2 192.168.11.2
> 64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.175 ms
> 64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.203 ms
> 64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.196 ms
> 64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.197 ms
> 64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.186 ms
> 64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.197 ms
> 64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.146 ms
> 64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
> 64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.195 ms
> 64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.190 ms
> 64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=0.204 ms
> 64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=0.201 ms
> 64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=0.189 ms
> 64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=0.193 ms
> 64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.190 ms
> 64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.143 ms
> 64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.191 ms
> 64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.190 ms


* Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-05-29 14:46 UTC
  To: netdev

Hi list,

I am using an NC553i ethernet card connected to an HP 10GbE Flex-10.
I am sending UDP multicast packets from one blade to another (HP
ProLiant BL460c G7) which has strictly the same HW.

I have lots of packet loss from Tx to Rx, and I can't understand why.
I suspected TX coalescing, but since 3.4 I can't set this parameter
(and adaptive-tx is on by default).
I have tried the same test with Debian lenny (2.6.26 kernel and HP
drivers) and it works very well (adaptive-tx is off).

Here is the netstat (from Tx point of view) :

$> netstat -s eth1 > before ; sleep 10 ; netstat -s eth1 > after
$> beforeafter before after
Ip:
    280769 total packets received
    4 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    275063 incoming packets delivered
    305430 requests sent out
    0 dropped because of missing route
Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 0
        echo requests: 0
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 0
        echo replies: 0
IcmpMsg:
        InType3: 0
        InType8: 0
        OutType0: 0
        OutType3: 0
Tcp:
    18 active connections openings
    18 passive connection openings
    0 failed connection attempts
    0 connection resets received
    0 connections established
    3681 segments received
    3650 segments send out
    0 segments retransmited
    0 bad segments received.
    0 resets sent
Udp:
    12626 packets received
    0 packets to unknown port received.
    0 packet receive errors
    259025 packets sent
UdpLite:
TcpExt:
    0 invalid SYN cookies received
    0 packets pruned from receive queue because of socket buffer overrun
    14 TCP sockets finished time wait in fast timer
    0 packets rejects in established connections because of timestamp
    61 delayed acks sent
    0 delayed acks further delayed because of locked socket
    Quick ack mode was activated 0 times
    2924 packets directly queued to recvmsg prequeue.
    32 bytes directly in process context from backlog
    48684 bytes directly received in process context from prequeue
    232 packet headers predicted
    1991 packets header predicted and directly queued to user
    132 acknowledgments not containing data payload received
    2230 predicted acknowledgments
    0 times recovered from packet loss by selective acknowledgements
    0 congestion windows recovered without slow start after partial ack
    0 TCP data loss events
    0 timeouts after SACK recovery
    0 fast retransmits
    0 forward retransmits
    0 retransmits in slow start
    0 other TCP timeouts
    1 times receiver scheduled too late for direct processing
    0 packets collapsed in receive queue due to low socket buffer
    0 DSACKs sent for old packets
    0 DSACKs received
    0 connections reset due to unexpected data
    0 connections reset due to early user close
    0 connections aborted due to timeout
    0 times unabled to send RST due to no memory
    TCPSackShifted: 0
    TCPSackMerged: 0
    TCPSackShiftFallback: 0
    TCPBacklogDrop: 0
    TCPDeferAcceptDrop: 0
IpExt:
    InMcastPkts: -652745397
    OutMcastPkts: 301498
    InBcastPkts: 13
    InOctets: -2004227752
    OutOctets: -2096666083
    InMcastOctets: 1058181285
    OutMcastOctets: -1510963815
    InBcastOctets: 1014

And ethtool diff :
$> ethtool -S eth1 > before ; sleep 10 ; ethtool -S eth1 > after
$> beforeafter before after
NIC statistics:
     rx_crc_errors: 0
     rx_alignment_symbol_errors: 0
     rx_pause_frames: 0
     rx_control_frames: 0
     rx_in_range_errors: 0
     rx_out_range_errors: 0
     rx_frame_too_long: 0
     rx_address_mismatch_drops: 6
     rx_dropped_too_small: 0
     rx_dropped_too_short: 0
     rx_dropped_header_too_small: 0
     rx_dropped_tcp_length: 0
     rx_dropped_runt: 0
     rxpp_fifo_overflow_drop: 0
     rx_input_fifo_overflow_drop: 0
     rx_ip_checksum_errs: 0
     rx_tcp_checksum_errs: 0
     rx_udp_checksum_errs: 0
     tx_pauseframes: 0
     tx_controlframes: 0
     rx_priority_pause_frames: 0
     pmem_fifo_overflow_drop: 0
     jabber_events: 0
     rx_drops_no_pbuf: 0
     rx_drops_no_erx_descr: 0
     rx_drops_no_tpre_descr: 0
     rx_drops_too_many_frags: 0
     forwarded_packets: 0
     rx_drops_mtu: 0
     eth_red_drops: 0
     be_on_die_temperature: 0
     rxq0: rx_bytes: 0
     rxq0: rx_pkts: 0
     rxq0: rx_compl: 0
     rxq0: rx_mcast_pkts: 0
     rxq0: rx_post_fail: 0
     rxq0: rx_drops_no_skbs: 0
     rxq0: rx_drops_no_frags: 0
     txq0: tx_compl: 257113
     txq0: tx_bytes: 1038623935
     txq0: tx_pkts: 257113
     txq0: tx_reqs: 257113
     txq0: tx_wrbs: 514226
     txq0: tx_stops: 10

As you can see, there are 10 tx_stops in 10 seconds (the count varies
from 3 to 15).
Any thoughts?

Regards,
JM


* RE: [PATCH net-next] iwlwifi: dont pull too much payload in skb head
From: Berg, Johannes @ 2012-05-29 14:45 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev, Guy, Wey-Yi W
In-Reply-To: <1337354484.7029.42.camel@edumazet-glaptop>

> > We may want to move this code into mac80211 later though since it also
> > has an if (pull in everything, even reallocating if necessary, if it's
> > a management frame), but that can wait, I think we're the only driver
> > using paged RX.
> 
> This is OK, these frames wont be injected in linux IP/TCP stack.

Right.

> Or maybe you would like an optimized version of skb_header_pointer(),
> avoiding the copy if the whole blob can be part of _one_ fragment ?

Hmm. I guess that would work, but I'm not sure it's worth the effort since
there typically aren't many management frames. We'd have to replace all
skb->data accesses, since all of mac80211 assumes that management frames
are linear.

johannes


* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Hiroaki SHIMODA @ 2012-05-29 14:25 UTC
  To: Tom Herbert
  Cc: Denys Fedoryshchenko, netdev, e1000-devel, jeffrey.t.kirsher,
	jesse.brandeburg, eric.dumazet, davem
In-Reply-To: <CA+mtBx_sF5GCMRpLQuTruZ=xpFTFpd5z8SZJaG_dBqf4oCXpwg@mail.gmail.com>

On Sun, 20 May 2012 10:40:41 -0700
Tom Herbert <therbert@google.com> wrote:

> Tried to reproduce:
> 
> May 20 10:08:30 test kernel: [    6.168240] e1000e 0000:06:00.0:
> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
> dynamic conservative mode
> May 20 10:08:30 test kernel: [    6.221591] e1000e 0000:06:00.1:
> (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to
> dynamic conservative mode
> 
> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
> Ethernet Controller (Copper) (rev 01)
> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
> Ethernet Controller (Copper) (rev 01)
> 
> Following above instructions to repro gives:
> 
> 1480 bytes from test2 (192.168.2.49): icmp_req=5875 ttl=64 time=0.358 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5876 ttl=64 time=0.330 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5877 ttl=64 time=0.337 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5878 ttl=64 time=0.375 ms
> 1480 bytes from test2 (192.168.2.49): icmp_req=5879 ttl=64 time=0.359 ms
> 1480 bytes from lpb49.prod.google.com (192.168.2.49): icmp_req=5880
> ttl=64 time=0.380 ms
> 
> And I didn't see the stalls. This was on an Intel machine.  The limit
> was stable; it went up to around 28K when I opened a large file and
> tended to stay between 15-28K.
> 
> The described problem seems to be characterized by transmit
> interrupts that are not at all periodic, and it would seem that some
> are taking hundreds of milliseconds to pop.  I don't see anything in
> the NIC that would cause that. Is it possible that some activity on
> the machines is periodically holding off interrupts for long
> periods of time?  Are there any peculiarities on Sun Fire in interrupt
> handling?
> 
> Can you also provide the output of 'ethtool -c eth0'?
> 
> Thanks,
> Tom

I also observed similar behaviour in the following environment.

03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

[    2.962119] e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k
[    2.968095] e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
[    2.974251] e1000e 0000:03:00.0: Disabling ASPM L0s L1
[    2.979653] e1000e 0000:03:00.0: (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.991599] e1000e 0000:03:00.0: irq 72 for MSI/MSI-X
[    2.991606] e1000e 0000:03:00.0: irq 73 for MSI/MSI-X
[    2.991611] e1000e 0000:03:00.0: irq 74 for MSI/MSI-X
[    3.092768] e1000e 0000:03:00.0: eth0: (PCI Express:2.5GT/s:Width x1) 48:5b:39:75:91:bd
[    3.100992] e1000e 0000:03:00.0: eth0: Intel(R) PRO/1000 Network Connection
[    3.108173] e1000e 0000:03:00.0: eth0: MAC: 3, PHY: 8, PBA No: FFFFFF-0FF

I tried several coalesce options with 'ethtool -C eth0', but
none of them helped.

If I understand the code and spec correctly, TX interrupts are
generated when TXDCTL.WTHRESH descriptors have been accumulated
and written back.

I tentatively changed TXDCTL.WTHRESH to 1, and the latency
spikes seem to disappear.

drivers/net/ethernet/intel/e1000e/e1000.h
@@ -181,7 +181,7 @@ struct e1000_info;
 #define E1000_TXDCTL_DMA_BURST_ENABLE                          \
        (E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
         E1000_TXDCTL_COUNT_DESC |                             \
-        (5 << 16) | /* wthresh must be +1 more than desired */\
+        (1 << 16) | /* wthresh must be +1 more than desired */\
         (1 << 8)  | /* hthresh */                             \
         0x1f)       /* pthresh */

(before) $ ping -i0.2 192.168.11.2
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.191 ms
64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.179 ms
64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.199 ms
64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.143 ms
64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.150 ms
64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.186 ms
64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.198 ms
64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.195 ms
64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.200 ms
64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=651 ms
64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=451 ms
64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=241 ms
64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=31.3 ms
64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.184 ms
64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.199 ms
64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.197 ms
64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=21 ttl=64 time=0.192 ms
64 bytes from 192.168.11.2: icmp_req=22 ttl=64 time=0.205 ms
64 bytes from 192.168.11.2: icmp_req=23 ttl=64 time=629 ms
64 bytes from 192.168.11.2: icmp_req=24 ttl=64 time=419 ms
64 bytes from 192.168.11.2: icmp_req=25 ttl=64 time=209 ms
64 bytes from 192.168.11.2: icmp_req=26 ttl=64 time=0.280 ms
64 bytes from 192.168.11.2: icmp_req=27 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=28 ttl=64 time=0.194 ms
64 bytes from 192.168.11.2: icmp_req=29 ttl=64 time=0.143 ms
64 bytes from 192.168.11.2: icmp_req=30 ttl=64 time=0.191 ms
64 bytes from 192.168.11.2: icmp_req=31 ttl=64 time=0.144 ms
64 bytes from 192.168.11.2: icmp_req=32 ttl=64 time=0.192 ms
64 bytes from 192.168.11.2: icmp_req=33 ttl=64 time=0.199 ms
64 bytes from 192.168.11.2: icmp_req=34 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=35 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=36 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=37 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=38 ttl=64 time=1600 ms
64 bytes from 192.168.11.2: icmp_req=39 ttl=64 time=1390 ms
64 bytes from 192.168.11.2: icmp_req=40 ttl=64 time=1180 ms
64 bytes from 192.168.11.2: icmp_req=41 ttl=64 time=980 ms
64 bytes from 192.168.11.2: icmp_req=42 ttl=64 time=780 ms
64 bytes from 192.168.11.2: icmp_req=43 ttl=64 time=570 ms
64 bytes from 192.168.11.2: icmp_req=44 ttl=64 time=0.151 ms
64 bytes from 192.168.11.2: icmp_req=45 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=46 ttl=64 time=0.203 ms
64 bytes from 192.168.11.2: icmp_req=47 ttl=64 time=0.185 ms
64 bytes from 192.168.11.2: icmp_req=48 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=49 ttl=64 time=0.204 ms
64 bytes from 192.168.11.2: icmp_req=50 ttl=64 time=0.198 ms

I think the 1000 ms - 2000 ms delays come from e1000_watchdog_task().

(after) $ ping -i0.2 192.168.11.2
64 bytes from 192.168.11.2: icmp_req=1 ttl=64 time=0.175 ms
64 bytes from 192.168.11.2: icmp_req=2 ttl=64 time=0.203 ms
64 bytes from 192.168.11.2: icmp_req=3 ttl=64 time=0.196 ms
64 bytes from 192.168.11.2: icmp_req=4 ttl=64 time=0.197 ms
64 bytes from 192.168.11.2: icmp_req=5 ttl=64 time=0.186 ms
64 bytes from 192.168.11.2: icmp_req=6 ttl=64 time=0.197 ms
64 bytes from 192.168.11.2: icmp_req=7 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=8 ttl=64 time=0.146 ms
64 bytes from 192.168.11.2: icmp_req=9 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=10 ttl=64 time=0.194 ms
64 bytes from 192.168.11.2: icmp_req=11 ttl=64 time=0.195 ms
64 bytes from 192.168.11.2: icmp_req=12 ttl=64 time=0.190 ms
64 bytes from 192.168.11.2: icmp_req=13 ttl=64 time=0.204 ms
64 bytes from 192.168.11.2: icmp_req=14 ttl=64 time=0.201 ms
64 bytes from 192.168.11.2: icmp_req=15 ttl=64 time=0.189 ms
64 bytes from 192.168.11.2: icmp_req=16 ttl=64 time=0.193 ms
64 bytes from 192.168.11.2: icmp_req=17 ttl=64 time=0.190 ms
64 bytes from 192.168.11.2: icmp_req=18 ttl=64 time=0.143 ms
64 bytes from 192.168.11.2: icmp_req=19 ttl=64 time=0.191 ms
64 bytes from 192.168.11.2: icmp_req=20 ttl=64 time=0.190 ms


* Re: [PATCH] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 13:43 UTC
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <1338298508-40376-1-git-send-email-nbd@openwrt.org>

On Tue, 2012-05-29 at 15:35 +0200, Felix Fietkau wrote:

> 
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> ---
>  include/linux/skbuff.h |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 

 

Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks !


* [PATCH] skb: avoid unnecessary reallocations in __skb_cow
From: Felix Fietkau @ 2012-05-29 13:35 UTC
  To: netdev; +Cc: eric.dumazet

At the beginning of __skb_cow, headroom gets set to a minimum of
NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
cloned and the headroom is just below NET_SKB_PAD, but still more than the
amount requested by the caller.
This was showing up frequently in my tests on VLAN tx, where
vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).

Locally generated packets should have enough headroom, and for forward
paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to
add any extra space here.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
---
 include/linux/skbuff.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e50171..b534a1b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1896,8 +1896,6 @@ static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
 {
 	int delta = 0;

-	if (headroom < NET_SKB_PAD)
-		headroom = NET_SKB_PAD;
 	if (headroom > skb_headroom(skb))
 		delta = headroom - skb_headroom(skb);

-- 
1.7.3.2


* [PATCH] l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case
From: James Chapman @ 2012-05-29 13:30 UTC
  To: netdev; +Cc: levinsasha928, James Chapman

An application may call connect() to disconnect a socket using an
address with family AF_UNSPEC. The L2TP IP sockets were not handling
this case when the socket is not bound and an attempt to connect()
using AF_UNSPEC in such cases would result in an oops. This patch
addresses the problem by protecting the sk_prot->disconnect() call
against trying to unhash the socket before it is bound.

The L2TP IPv4 and IPv6 sockets have the same problem. Both are fixed
by this patch.

The patch also adds more checks that the sockaddr supplied to bind()
and connect() calls is valid.

 RIP: 0010:[<ffffffff82e133b0>]  [<ffffffff82e133b0>] inet_unhash+0x50/0xd0
 RSP: 0018:ffff88001989be28  EFLAGS: 00010293
 Stack:
  ffff8800407a8000 0000000000000000 ffff88001989be78 ffffffff82e3a249
  ffffffff82e3a050 ffff88001989bec8 ffff88001989be88 ffff8800407a8000
  0000000000000010 ffff88001989bec8 ffff88001989bea8 ffffffff82e42639
 Call Trace:
 [<ffffffff82e3a249>] udp_disconnect+0x1f9/0x290
 [<ffffffff82e42639>] inet_dgram_connect+0x29/0x80
 [<ffffffff82d012fc>] sys_connect+0x9c/0x100

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
---
 net/l2tp/l2tp_ip.c  |   24 ++++++++++++++++++++++--
 net/l2tp/l2tp_ip6.c |   18 +++++++++++++++++-
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 889f5d1..70614e7 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -239,9 +239,16 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct sockaddr_l2tpip *addr = (struct sockaddr_l2tpip *) uaddr;
-	int ret = -EINVAL;
+	int ret;
 	int chk_addr_ret;
 
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		return -EINVAL;
+	if (addr_len < sizeof(struct sockaddr_l2tpip))
+		return -EINVAL;
+	if (addr->l2tp_family != AF_INET)
+		return -EINVAL;
+
 	ret = -EADDRINUSE;
 	read_lock_bh(&l2tp_ip_lock);
 	if (__l2tp_ip_bind_lookup(&init_net, addr->l2tp_addr.s_addr, sk->sk_bound_dev_if, addr->l2tp_conn_id))
@@ -272,6 +279,8 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	sk_del_node_init(sk);
 	write_unlock_bh(&l2tp_ip_lock);
 	ret = 0;
+	sock_reset_flag(sk, SOCK_ZAPPED);
+
 out:
 	release_sock(sk);
 
@@ -288,6 +297,9 @@ static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	struct sockaddr_l2tpip *lsa = (struct sockaddr_l2tpip *) uaddr;
 	int rc;
 
+	if (sock_flag(sk, SOCK_ZAPPED)) /* Must bind first - autobinding does not work */
+		return -EINVAL;
+
 	if (addr_len < sizeof(*lsa))
 		return -EINVAL;
 
@@ -311,6 +323,14 @@ static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	return rc;
 }
 
+static int l2tp_ip_disconnect(struct sock *sk, int flags)
+{
+	if (sock_flag(sk, SOCK_ZAPPED))
+		return 0;
+
+	return udp_disconnect(sk, flags);
+}
+
 static int l2tp_ip_getname(struct socket *sock, struct sockaddr *uaddr,
 			   int *uaddr_len, int peer)
 {
@@ -530,7 +550,7 @@ static struct proto l2tp_ip_prot = {
 	.close		   = l2tp_ip_close,
 	.bind		   = l2tp_ip_bind,
 	.connect	   = l2tp_ip_connect,
-	.disconnect	   = udp_disconnect,
+	.disconnect	   = l2tp_ip_disconnect,
 	.ioctl		   = udp_ioctl,
 	.destroy	   = l2tp_ip_destroy_sock,
 	.setsockopt	   = ip_setsockopt,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 0291d8d..35e1e4b 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -258,6 +258,10 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	int addr_type;
 	int err;
 
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		return -EINVAL;
+	if (addr->l2tp_family != AF_INET6)
+		return -EINVAL;
 	if (addr_len < sizeof(*addr))
 		return -EINVAL;
 
@@ -331,6 +335,7 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	sk_del_node_init(sk);
 	write_unlock_bh(&l2tp_ip6_lock);
 
+	sock_reset_flag(sk, SOCK_ZAPPED);
 	release_sock(sk);
 	return 0;
 
@@ -354,6 +359,9 @@ static int l2tp_ip6_connect(struct sock *sk, struct sockaddr *uaddr,
 	int	addr_type;
 	int rc;
 
+	if (sock_flag(sk, SOCK_ZAPPED)) /* Must bind first - autobinding does not work */
+		return -EINVAL;
+
 	if (addr_len < sizeof(*lsa))
 		return -EINVAL;
 
@@ -383,6 +391,14 @@ static int l2tp_ip6_connect(struct sock *sk, struct sockaddr *uaddr,
 	return rc;
 }
 
+static int l2tp_ip6_disconnect(struct sock *sk, int flags)
+{
+	if (sock_flag(sk, SOCK_ZAPPED))
+		return 0;
+
+	return udp_disconnect(sk, flags);
+}
+
 static int l2tp_ip6_getname(struct socket *sock, struct sockaddr *uaddr,
 			    int *uaddr_len, int peer)
 {
@@ -689,7 +705,7 @@ static struct proto l2tp_ip6_prot = {
 	.close		   = l2tp_ip6_close,
 	.bind		   = l2tp_ip6_bind,
 	.connect	   = l2tp_ip6_connect,
-	.disconnect	   = udp_disconnect,
+	.disconnect	   = l2tp_ip6_disconnect,
 	.ioctl		   = udp_ioctl,
 	.destroy	   = l2tp_ip6_destroy_sock,
 	.setsockopt	   = ipv6_setsockopt,
-- 
1.7.0.4

^ permalink raw reply related
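
The guard this patch adds can be pictured with a small user-space model. It is a sketch under assumptions, not the kernel code: struct model_sock and the model_* functions are hypothetical, and the zapped field only stands in for the SOCK_ZAPPED flag. The point is that disconnect returns early on a socket that was never bound, so nothing tries to unhash it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: only the state the guard needs. */
struct model_sock {
	bool zapped;     /* models SOCK_ZAPPED: true until bind() succeeds */
	bool hashed;     /* true once the socket is in the bind table */
	bool connected;
};

static void model_init(struct model_sock *sk)
{
	sk->zapped = true;
	sk->hashed = false;
	sk->connected = false;
}

/* bind(): reject a second bind, otherwise hash the socket and clear the flag. */
static int model_bind(struct model_sock *sk)
{
	if (!sk->zapped)
		return -EINVAL;
	sk->hashed = true;
	sk->zapped = false;
	return 0;
}

/* connect(): refused until the socket is bound, as in the patch. */
static int model_connect(struct model_sock *sk)
{
	if (sk->zapped)
		return -EINVAL;
	sk->connected = true;
	return 0;
}

/* disconnect(): a no-op on an unbound socket, so no unhash is attempted. */
static int model_disconnect(struct model_sock *sk)
{
	if (sk->zapped)
		return 0;
	sk->connected = false;
	return 0;
}
```

In this model a disconnect on a fresh socket returns 0 without touching any state, which corresponds to the connect() with AF_UNSPEC case from the report.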

* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 13:26 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <4FC4CAB1.8010000@openwrt.org>

On Tue, 2012-05-29 at 15:10 +0200, Felix Fietkau wrote:

> I don't have any real use case in mind, but it's not really adding an
> extra NET_SKB_PAD, it simply fills up the headroom to NET_SKB_PAD,

This is not what your patch is doing.

If cloned is true and the current skb headroom is less than 64, you add
an extra 64 bytes of headroom.

Just keep it simple, this is inline code and should be kept as small as
possible.

^ permalink raw reply
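
The objection above can be checked with a worked example of the first RFC patch. This is a sketch under assumptions: NET_SKB_PAD is taken as 64 to match the figure in the reply, ALIGN_UP is a local stand-in for the kernel's ALIGN macro, and rfc_v1_growth is a hypothetical name for the value that would be handed to pskb_expand_head().

```c
/* Assumption: NET_SKB_PAD is 64, matching the figure used in the reply. */
#define NET_SKB_PAD 64u

/* Local stand-in for the kernel's ALIGN(): round x up to a multiple of a. */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Extra headroom requested from pskb_expand_head() under the first RFC patch. */
static unsigned int rfc_v1_growth(unsigned int requested,
				  unsigned int available, int cloned)
{
	unsigned int alloc_headroom = requested;
	unsigned int delta = 0;

	if (requested < NET_SKB_PAD)
		alloc_headroom = NET_SKB_PAD;
	if (requested > available ||
	    (cloned && alloc_headroom > available))
		delta = alloc_headroom - available;

	return ALIGN_UP(delta, NET_SKB_PAD);
}
```

For a cloned skb with 60 bytes of headroom and a 4-byte request, delta is 4 and the alignment rounds it up to 64, so the buffer gains a full extra NET_SKB_PAD. The same call with cloned set to 0 yields 0.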

* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Felix Fietkau @ 2012-05-29 13:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1338296361.2840.23.camel@edumazet-glaptop>

On 2012-05-29 2:59 PM, Eric Dumazet wrote:
> On Tue, 2012-05-29 at 14:41 +0200, Felix Fietkau wrote:
>> On 2012-05-29 2:34 PM, Eric Dumazet wrote:
>> > On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
>> >> At the beginning of __skb_cow, headroom gets set to a minimum of
>> >> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
>> >> cloned and the headroom is just below NET_SKB_PAD, but still more than the
>> >> amount requested by the caller.
>> >> This was showing up frequently in my tests on VLAN tx, where
>> >> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
>> >> 
>> >> Fix this by only setting the headroom delta if either there is less
>> >> headroom than specified by the caller, or if reallocation has to be done
>> >> anyway because the skb was cloned.
>> >> 
>> >> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
>> >> ---
>> >>  include/linux/skbuff.h |    9 ++++++---
>> >>  1 files changed, 6 insertions(+), 3 deletions(-)
>> >> 
>> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> >> index 0e50171..1898471 100644
>> >> --- a/include/linux/skbuff.h
>> >> +++ b/include/linux/skbuff.h
>> >> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
>> >>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
>> >>  			    int cloned)
>> >>  {
>> >> +	unsigned int alloc_headroom = headroom;
>> >>  	int delta = 0;
>> >>  
>> >>  	if (headroom < NET_SKB_PAD)
>> >> -		headroom = NET_SKB_PAD;
>> >> -	if (headroom > skb_headroom(skb))
>> >> -		delta = headroom - skb_headroom(skb);
>> >> +		alloc_headroom = NET_SKB_PAD;
>> >> +	if (headroom > skb_headroom(skb) ||
>> >> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
>> >> +		delta = alloc_headroom - skb_headroom(skb);
>> >> +	}
>> >>  
>> >>  	if (delta || cloned)
>> >>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,
>> > 
>> > Nice catch.
>> > 
>> > Scratching my head on this one. Why not the obvious fix ?
>> If we're reallocating anyway, we might as well put in more headroom than
>> requested, in case something else needs even more than that.
> 
> 
> Locally generated packets should have enough headroom, and for forward
> paths, we already have NET_SKB_PAD bytes of headroom.
> 
> Adding yet another NET_SKB_PAD extra space is overkill, unless you have
> a real use case in mind ?
I don't have any real use case in mind, but it's not really adding an
extra NET_SKB_PAD, it simply fills up the headroom to NET_SKB_PAD, but I
guess that's probably unnecessary as well.
I'll resend the patch without the extra padding later.

- Felix

^ permalink raw reply

* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 12:59 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <4FC4C3E8.6080206@openwrt.org>

On Tue, 2012-05-29 at 14:41 +0200, Felix Fietkau wrote:
> On 2012-05-29 2:34 PM, Eric Dumazet wrote:
> > On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
> >> At the beginning of __skb_cow, headroom gets set to a minimum of
> >> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
> >> cloned and the headroom is just below NET_SKB_PAD, but still more than the
> >> amount requested by the caller.
> >> This was showing up frequently in my tests on VLAN tx, where
> >> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
> >> 
> >> Fix this by only setting the headroom delta if either there is less
> >> headroom than specified by the caller, or if reallocation has to be done
> >> anyway because the skb was cloned.
> >> 
> >> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> >> ---
> >>  include/linux/skbuff.h |    9 ++++++---
> >>  1 files changed, 6 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 0e50171..1898471 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
> >>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
> >>  			    int cloned)
> >>  {
> >> +	unsigned int alloc_headroom = headroom;
> >>  	int delta = 0;
> >>  
> >>  	if (headroom < NET_SKB_PAD)
> >> -		headroom = NET_SKB_PAD;
> >> -	if (headroom > skb_headroom(skb))
> >> -		delta = headroom - skb_headroom(skb);
> >> +		alloc_headroom = NET_SKB_PAD;
> >> +	if (headroom > skb_headroom(skb) ||
> >> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
> >> +		delta = alloc_headroom - skb_headroom(skb);
> >> +	}
> >>  
> >>  	if (delta || cloned)
> >>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,
> > 
> > Nice catch.
> > 
> > Scratching my head on this one. Why not the obvious fix ?
> If we're reallocating anyway, we might as well put in more headroom than
> requested, in case something else needs even more than that.


Locally generated packets should have enough headroom, and for forward
paths, we already have NET_SKB_PAD bytes of headroom.

Adding yet another NET_SKB_PAD extra space is overkill, unless you have
a real use case in mind ?

^ permalink raw reply

* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Felix Fietkau @ 2012-05-29 12:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1338294848.2840.15.camel@edumazet-glaptop>

On 2012-05-29 2:34 PM, Eric Dumazet wrote:
> On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
>> At the beginning of __skb_cow, headroom gets set to a minimum of
>> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
>> cloned and the headroom is just below NET_SKB_PAD, but still more than the
>> amount requested by the caller.
>> This was showing up frequently in my tests on VLAN tx, where
>> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
>> 
>> Fix this by only setting the headroom delta if either there is less
>> headroom than specified by the caller, or if reallocation has to be done
>> anyway because the skb was cloned.
>> 
>> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
>> ---
>>  include/linux/skbuff.h |    9 ++++++---
>>  1 files changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 0e50171..1898471 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
>>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
>>  			    int cloned)
>>  {
>> +	unsigned int alloc_headroom = headroom;
>>  	int delta = 0;
>>  
>>  	if (headroom < NET_SKB_PAD)
>> -		headroom = NET_SKB_PAD;
>> -	if (headroom > skb_headroom(skb))
>> -		delta = headroom - skb_headroom(skb);
>> +		alloc_headroom = NET_SKB_PAD;
>> +	if (headroom > skb_headroom(skb) ||
>> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
>> +		delta = alloc_headroom - skb_headroom(skb);
>> +	}
>>  
>>  	if (delta || cloned)
>>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,
> 
> Nice catch.
> 
> Scratching my head on this one. Why not the obvious fix ?
If we're reallocating anyway, we might as well put in more headroom than
requested, in case something else needs even more than that.

- Felix

^ permalink raw reply

* Re: [RFC] skb: avoid unnecessary reallocations in __skb_cow
From: Eric Dumazet @ 2012-05-29 12:34 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <1338132370-88299-1-git-send-email-nbd@openwrt.org>

On Sun, 2012-05-27 at 17:26 +0200, Felix Fietkau wrote:
> At the beginning of __skb_cow, headroom gets set to a minimum of
> NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
> cloned and the headroom is just below NET_SKB_PAD, but still more than the
> amount requested by the caller.
> This was showing up frequently in my tests on VLAN tx, where
> vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
> 
> Fix this by only setting the headroom delta if either there is less
> headroom than specified by the caller, or if reallocation has to be done
> anyway because the skb was cloned.
> 
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> ---
>  include/linux/skbuff.h |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0e50171..1898471 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1894,12 +1894,15 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
>  static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
>  			    int cloned)
>  {
> +	unsigned int alloc_headroom = headroom;
>  	int delta = 0;
>  
>  	if (headroom < NET_SKB_PAD)
> -		headroom = NET_SKB_PAD;
> -	if (headroom > skb_headroom(skb))
> -		delta = headroom - skb_headroom(skb);
> +		alloc_headroom = NET_SKB_PAD;
> +	if (headroom > skb_headroom(skb) ||
> +	    (cloned && alloc_headroom > skb_headroom(skb))) {
> +		delta = alloc_headroom - skb_headroom(skb);
> +	}
>  
>  	if (delta || cloned)
>  		return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0,

Nice catch.

Scratching my head on this one. Why not the obvious fix ?

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e50171..b534a1b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1896,8 +1896,6 @@ static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
 {
 	int delta = 0;
 
-	if (headroom < NET_SKB_PAD)
-		headroom = NET_SKB_PAD;
 	if (headroom > skb_headroom(skb))
 		delta = headroom - skb_headroom(skb);
 

^ permalink raw reply related

* Re: [PATCH] r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
From: devendra.aaru @ 2012-05-29 10:13 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, linux-kernel
In-Reply-To: <3646407.261CWqact8@flexo>

Hi Florian,

>> On Tue, May 29, 2012 at 2:50 PM, Florian Fainelli <florian@openwrt.org> wrote:
>>
>> Thanks for the Ack.
>> I found one more problem: when mdiobus_alloc fails in
>> r6040_init_one, we need to call netif_napi_del and pass
>> NULL to pci_set_drvdata at err_out_unmap.
>
> Ok, can you please submit a patch to fix this issue as well? Thanks!
> --
> Florian

Ok sure, I will do it shortly.

Thanks,
Devendra.

^ permalink raw reply

* Re: [PATCH] r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
From: Florian Fainelli @ 2012-05-29 10:06 UTC (permalink / raw)
  To: devendra.aaru; +Cc: netdev, linux-kernel
In-Reply-To: <CAHdPZaPWM3MfyARfwOm2KkKPPvm4dEfncF47N2xASyp27HdzsA@mail.gmail.com>

On Tuesday 29 May 2012 15:28:51 devendra.aaru wrote:
> Hello Florian,
> 
> On Tue, May 29, 2012 at 2:50 PM, Florian Fainelli <florian@openwrt.org> wrote:

> 
> Thanks for the Ack.
> I found one more problem: when mdiobus_alloc fails in
> r6040_init_one, we need to call netif_napi_del and pass
> NULL to pci_set_drvdata at err_out_unmap.

Ok, can you please submit a patch to fix this issue as well? Thanks!
--
Florian

^ permalink raw reply

* Re: [PATCH] r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
From: devendra.aaru @ 2012-05-29  9:58 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, linux-kernel
In-Reply-To: <1659482.3g1Zl6FDuM@flexo>

Hello Florian,

On Tue, May 29, 2012 at 2:50 PM, Florian Fainelli <florian@openwrt.org> wrote:
> On Monday 28 May 2012 17:27:03 Devendra Naga wrote:
>> The calls after pci_enable_device may fail and error out without
>> disabling the device. Disable the device on those error paths.
>
> Looks good, thanks Devendra!
>
>>
>> Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
>
> Acked-by: Florian Fainelli <florian@openwrt.org>
>
>> ---
>>  drivers/net/ethernet/rdc/r6040.c |   10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/rdc/r6040.c b/drivers/net/ethernet/rdc/r6040.c
>> index 4de7364..8f5079a 100644
>> --- a/drivers/net/ethernet/rdc/r6040.c
>> +++ b/drivers/net/ethernet/rdc/r6040.c
>> @@ -1096,20 +1096,20 @@ static int __devinit r6040_init_one(struct pci_dev *pdev,
>>       if (err) {
>>               dev_err(&pdev->dev, "32-bit PCI DMA addresses"
>>                               "not supported by the card\n");
>> -             goto err_out;
>> +             goto err_out_disable_dev;
>>       }
>>       err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
>>       if (err) {
>>               dev_err(&pdev->dev, "32-bit PCI DMA addresses"
>>                               "not supported by the card\n");
>> -             goto err_out;
>> +             goto err_out_disable_dev;
>>       }
>>
>>       /* IO Size check */
>>       if (pci_resource_len(pdev, bar) < io_size) {
>>               dev_err(&pdev->dev, "Insufficient PCI resources, aborting\n");
>>               err = -EIO;
>> -             goto err_out;
>> +             goto err_out_disable_dev;
>>       }
>>
>>       pci_set_master(pdev);
>> @@ -1117,7 +1117,7 @@ static int __devinit r6040_init_one(struct pci_dev *pdev,
>>       dev = alloc_etherdev(sizeof(struct r6040_private));
>>       if (!dev) {
>>               err = -ENOMEM;
>> -             goto err_out;
>> +             goto err_out_disable_dev;
>>       }
>>       SET_NETDEV_DEV(dev, &pdev->dev);
>>       lp = netdev_priv(dev);
>> @@ -1238,6 +1238,8 @@ err_out_free_res:
>>       pci_release_regions(pdev);
>>  err_out_free_dev:
>>       free_netdev(dev);
>> +err_out_disable_dev:
>> +     pci_disable_device(dev);
>>  err_out:
>>       return err;
>>  }
>> --
>> 1.7.9.5
>>

Thanks for the Ack.
I found one more problem: when mdiobus_alloc fails in
r6040_init_one, we need to call netif_napi_del and pass
NULL to pci_set_drvdata at err_out_unmap.

Thanks,
Devendra.

^ permalink raw reply

* Re: [PATCH] r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
From: Florian Fainelli @ 2012-05-29  9:20 UTC (permalink / raw)
  To: Devendra Naga; +Cc: netdev, linux-kernel
In-Reply-To: <1338206223-26781-1-git-send-email-devendra.aaru@gmail.com>

On Monday 28 May 2012 17:27:03 Devendra Naga wrote:
> The calls after pci_enable_device may fail and error out without
> disabling the device. Disable the device on those error paths.

Looks good, thanks Devendra!

> 
> Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>

Acked-by: Florian Fainelli <florian@openwrt.org>

> ---
>  drivers/net/ethernet/rdc/r6040.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/rdc/r6040.c b/drivers/net/ethernet/rdc/r6040.c
> index 4de7364..8f5079a 100644
> --- a/drivers/net/ethernet/rdc/r6040.c
> +++ b/drivers/net/ethernet/rdc/r6040.c
> @@ -1096,20 +1096,20 @@ static int __devinit r6040_init_one(struct pci_dev *pdev,
>  	if (err) {
>  		dev_err(&pdev->dev, "32-bit PCI DMA addresses"
>  				"not supported by the card\n");
> -		goto err_out;
> +		goto err_out_disable_dev;
>  	}
>  	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
>  	if (err) {
>  		dev_err(&pdev->dev, "32-bit PCI DMA addresses"
>  				"not supported by the card\n");
> -		goto err_out;
> +		goto err_out_disable_dev;
>  	}
>  
>  	/* IO Size check */
>  	if (pci_resource_len(pdev, bar) < io_size) {
>  		dev_err(&pdev->dev, "Insufficient PCI resources, aborting\n");
>  		err = -EIO;
> -		goto err_out;
> +		goto err_out_disable_dev;
>  	}
>  
>  	pci_set_master(pdev);
> @@ -1117,7 +1117,7 @@ static int __devinit r6040_init_one(struct pci_dev *pdev,
>  	dev = alloc_etherdev(sizeof(struct r6040_private));
>  	if (!dev) {
>  		err = -ENOMEM;
> -		goto err_out;
> +		goto err_out_disable_dev;
>  	}
>  	SET_NETDEV_DEV(dev, &pdev->dev);
>  	lp = netdev_priv(dev);
> @@ -1238,6 +1238,8 @@ err_out_free_res:
>  	pci_release_regions(pdev);
>  err_out_free_dev:
>  	free_netdev(dev);
> +err_out_disable_dev:
> +	pci_disable_device(dev);
>  err_out:
>  	return err;
>  }
> -- 
> 1.7.9.5
> 

^ permalink raw reply
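
The error-path pattern this patch extends (one label per acquired resource, released in reverse order) can be sketched in isolation. Everything here is illustrative: struct probe_state, model_probe and the fail_at parameter are hypothetical, and the booleans only stand in for the PCI device, the net device and the I/O regions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the resources the probe acquires. */
struct probe_state {
	bool device_enabled;
	bool netdev_allocated;
	bool regions_claimed;
};

/* fail_at selects which step fails (0 = none); purely illustrative. */
static int model_probe(struct probe_state *st, int fail_at)
{
	int err;

	st->device_enabled = true;          /* models pci_enable_device() */

	if (fail_at == 1) {                 /* e.g. DMA mask setup fails */
		err = -EIO;
		goto err_out_disable_dev;
	}

	st->netdev_allocated = true;        /* models alloc_etherdev() */

	if (fail_at == 2) {                 /* e.g. region request fails */
		err = -ENOMEM;
		goto err_out_free_dev;
	}

	st->regions_claimed = true;         /* models pci_request_regions() */
	return 0;

err_out_free_dev:
	st->netdev_allocated = false;       /* models free_netdev() */
err_out_disable_dev:
	st->device_enabled = false;         /* models pci_disable_device() */
	return err;
}
```

Because the labels fall through, a failure at any step undoes everything acquired before it, and the device is never left enabled on an error return.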

* [PATCH] net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens
From: Shimoda, Yoshihiro @ 2012-05-29  9:07 UTC (permalink / raw)
  To: netdev; +Cc: SH-Linux

When a Receive Descriptor Empty event occurs, the driver's rxdesc pointer
and the controller's actual next descriptor may no longer match.
This patch resynchronizes them before the Rx engine is restarted.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/net/ethernet/renesas/sh_eth.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index be3c221..667169b 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1101,8 +1101,12 @@ static int sh_eth_rx(struct net_device *ndev)

 	/* Restart Rx engine if stopped. */
 	/* If we don't need to check status, don't. -KDU */
-	if (!(sh_eth_read(ndev, EDRRR) & EDRRR_R))
+	if (!(sh_eth_read(ndev, EDRRR) & EDRRR_R)) {
+		/* fix the values for the next receiving */
+		mdp->cur_rx = mdp->dirty_rx = (sh_eth_read(ndev, RDFAR) -
+					       sh_eth_read(ndev, RDLAR)) >> 4;
 		sh_eth_write(ndev, EDRRR_R, EDRRR);
+	}

 	return 0;
 }
@@ -1199,8 +1203,6 @@ static void sh_eth_error(struct net_device *ndev, int intr_status)
 		/* Receive Descriptor Empty int */
 		ndev->stats.rx_over_errors++;

-		if (sh_eth_read(ndev, EDRRR) ^ EDRRR_R)
-			sh_eth_write(ndev, EDRRR_R, EDRRR);
 		if (netif_msg_rx_err(mdp))
 			dev_err(&ndev->dev, "Receive Descriptor Empty\n");
 	}
-- 
1.7.1

^ permalink raw reply related
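
The index arithmetic in the patch above can be shown on its own. This is a sketch under one assumption taken from the ">> 4" in the diff: each Rx descriptor occupies 16 bytes. The function name rx_index_from_regs and the macro RX_DESC_SHIFT are hypothetical; the arguments stand for the values read from RDFAR and RDLAR.

```c
#include <stdint.h>

/* Assumption: each Rx descriptor is 16 bytes, which is what ">> 4" implies. */
#define RX_DESC_SHIFT 4

/*
 * Ring index of the descriptor the controller will use next.
 * rdfar: address read from RDFAR, rdlar: ring start address read from RDLAR.
 */
static uint32_t rx_index_from_regs(uint32_t rdfar, uint32_t rdlar)
{
	return (rdfar - rdlar) >> RX_DESC_SHIFT;
}
```

For a ring starting at 0x1000, a fetch address of 0x1050 gives index 5, which is the value the patch stores in both cur_rx and dirty_rx.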

* Re: [PATCH v5 1/6] net: sh_eth: remove unnecessary function
From: Shimoda, Yoshihiro @ 2012-05-29  8:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-sh
In-Reply-To: <20120529.041711.866739766279184452.davem@davemloft.net>

2012/05/29 17:17, David Miller wrote:
> 
> It is not appropriate to submit cleanups and feature additions for the
> 'net-next' tree at this time.  That is only valid when I open the
> net-next tree back up which will be shortly after the merge window
> closes.
> 
> If you have bug fixes to submit for the 'net' tree, you must split
> them off from these cleanups and submit them separately.

Understood. "PATCH 3/6" is a bug fix, so I will submit it
first for the "net" tree.
After the merge window closes, I will submit the other patches.

Best regards,
Yoshihiro Shimoda

^ permalink raw reply

* [PATCH] asix: allow full size 8021Q frames to be received
From: Eric Dumazet @ 2012-05-29  8:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Greg Kroah-Hartman, Trond Wuellner, Grant Grundler,
	Paul Stewart, Allan Chou

From: Eric Dumazet <edumazet@google.com>

The asix driver drops full-size 802.1Q frames because it doesn't take
the VLAN header size into account.

Tested on AX88772 adapter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Allan Chou <allan@asix.com.tw>
CC: Trond Wuellner <trond@chromium.org>
CC: Grant Grundler <grundler@chromium.org>
CC: Paul Stewart <pstew@chromium.org>
---
drivers/net/usb/asix.c |    3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 71e2b05..3ae80ec 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -35,6 +35,7 @@
 #include <linux/crc32.h>
 #include <linux/usb/usbnet.h>
 #include <linux/slab.h>
+#include <linux/if_vlan.h>
 
 #define DRIVER_VERSION "22-Dec-2011"
 #define DRIVER_NAME "asix"
@@ -321,7 +322,7 @@ static int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 			return 0;
 		}
 
-		if ((size > dev->net->mtu + ETH_HLEN) ||
+		if ((size > dev->net->mtu + ETH_HLEN + VLAN_HLEN) ||
 		    (size + offset > skb->len)) {
 			netdev_err(dev->net, "asix_rx_fixup() Bad RX Length %d\n",
 				   size);

^ permalink raw reply related
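
The arithmetic behind the asix change can be checked with a short model. It is a sketch covering only the first half of the condition in the diff (the size + offset bound against skb->len is left out); rx_len_ok_old and rx_len_ok_new are hypothetical names, and the header sizes are the standard 14-byte Ethernet header and 4-byte 802.1Q tag.

```c
#include <stdbool.h>

#define ETH_HLEN  14   /* Ethernet header: destination, source, ethertype */
#define VLAN_HLEN  4   /* 802.1Q tag */

/* Length check before the patch: a VLAN tag is not accounted for. */
static bool rx_len_ok_old(unsigned int size, unsigned int mtu)
{
	return size <= mtu + ETH_HLEN;
}

/* Length check after the patch: one VLAN tag is allowed on top. */
static bool rx_len_ok_new(unsigned int size, unsigned int mtu)
{
	return size <= mtu + ETH_HLEN + VLAN_HLEN;
}
```

With an MTU of 1500, a full-size tagged frame is 1500 + 14 + 4 = 1518 bytes: rx_len_ok_old(1518, 1500) is false, so the frame was dropped, while rx_len_ok_new(1518, 1500) is true.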

* Re: [PATCH v5 1/6] net: sh_eth: remove unnecessary function
From: David Miller @ 2012-05-29  8:17 UTC (permalink / raw)
  To: yoshihiro.shimoda.uh; +Cc: netdev, linux-sh
In-Reply-To: <4FC48570.8040204@renesas.com>

It is not appropriate to submit cleanups and feature additions for the
'net-next' tree at this time.  That is only valid when I open the
net-next tree back up which will be shortly after the merge window
closes.

If you have bug fixes to submit for the 'net' tree, you must split
them off from these cleanups and submit them separately.

^ permalink raw reply
