Netdev List
* Re: ixgbe - problem with packet/bytes count on all queues
From: Paweł Staszewski @ 2010-05-11 19:40 UTC
  To: Brandeburg, Jesse; +Cc: Linux Network Development list, e1000-devel
In-Reply-To: <alpine.WNT.2.00.1005111058150.7656@jbrandeb-desk1.amr.corp.intel.com>

On 2010-05-11 20:00, Brandeburg, Jesse wrote:
>
> On Sun, 18 Apr 2010, Paweł Staszewski wrote:
>
>    
>> Hello
>>
>> I want to ask whether this is normal behavior for the ixgbe driver and the 82598EB NIC.
>> Look at the tx_queue_7 stats:
>>      
> Hi, sorry no-one replied.
>
>    
Thanks for the reply :)
>>    ethtool -S eth2
>> NIC statistics:
>>        rx_packets: 35103252
>>        tx_packets: 1770371731
>>        rx_bytes: 3602052416
>>        tx_bytes: 1369778276
>>        rx_pkts_nic: 138121006018
>>        tx_pkts_nic: 122033163226
>>        rx_bytes_nic: 101484528847981
>>        tx_bytes_nic: 92258799092069
>>        lsc_int: 1
>>        tx_busy: 0
>>        non_eop_descs: 0
>>        rx_errors: 0
>>        tx_errors: 0
>>        rx_dropped: 0
>>        tx_dropped: 0
>>        multicast: 490226
>>        broadcast: 124104912
>>        rx_no_buffer_count: 0
>>        collisions: 0
>>        rx_over_errors: 0
>>        rx_crc_errors: 0
>>        rx_frame_errors: 0
>>        hw_rsc_aggregated: 0
>>        hw_rsc_flushed: 0
>>        fdir_match: 0
>>        fdir_miss: 0
>>        rx_fifo_errors: 0
>>        rx_missed_errors: 0
>>        tx_aborted_errors: 0
>>        tx_carrier_errors: 0
>>        tx_fifo_errors: 0
>>        tx_heartbeat_errors: 0
>>        tx_timeout_count: 0
>>        tx_restart_queue: 111130
>>        rx_long_length_errors: 38599
>>        rx_short_length_errors: 0
>>        tx_flow_control_xon: 0
>>        rx_flow_control_xon: 0
>>        tx_flow_control_xoff: 0
>>        rx_flow_control_xoff: 0
>>        rx_csum_offload_errors: 1554191
>>        alloc_rx_page_failed: 0
>>        alloc_rx_buff_failed: 0
>>        rx_no_dma_resources: 0
>>        tx_queue_0_packets: 108685351623
>>        tx_queue_0_bytes: 79701402025544
>>        tx_queue_1_packets: 3988024698
>>        tx_queue_1_bytes: 3353530467775
>>        tx_queue_2_packets: 1893305707
>>        tx_queue_2_bytes: 1705357186034
>>        tx_queue_3_packets: 1787852613
>>        tx_queue_3_bytes: 1518632482370
>>        tx_queue_4_packets: 1843108684
>>        tx_queue_4_bytes: 1641474602504
>>        tx_queue_5_packets: 1882637467
>>        tx_queue_5_bytes: 1629905766993
>>        tx_queue_6_packets: 1952759802
>>        tx_queue_6_bytes: 1680666591771
>>        tx_queue_7_packets: 0
>>        tx_queue_7_bytes: 0
>>        rx_queue_0_packets: 17361735592
>>        rx_queue_0_bytes: 12585728518077
>>        rx_queue_1_packets: 17194262916
>>        rx_queue_1_bytes: 12518731583464
>>        rx_queue_2_packets: 17342312348
>>        rx_queue_2_bytes: 12734959063176
>>        rx_queue_3_packets: 17367632051
>>        rx_queue_3_bytes: 12656219984521
>>        rx_queue_4_packets: 17150307164
>>        rx_queue_4_bytes: 12408526754019
>>        rx_queue_5_packets: 17206721842
>>        rx_queue_5_bytes: 12470666039893
>>        rx_queue_6_packets: 17202210572
>>        rx_queue_6_bytes: 12431429298950
>>        rx_queue_7_packets: 17295822822
>>        rx_queue_7_bytes: 12573299488239
>>
>> and here look at multiq queue number 8:
>> tc -s -d class show dev eth2
>> class multiq 1:1 parent 1:
>>    Sent 6905560675905 bytes 510743840 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:2 parent 1:
>>    Sent 280699743990 bytes 330210442 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:3 parent 1:
>>    Sent 128528666971 bytes 142053106 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:4 parent 1:
>>    Sent 123086710694 bytes 140454119 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:5 parent 1:
>>    Sent 121027779083 bytes 146164066 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:6 parent 1:
>>    Sent 116245520195 bytes 141597610 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:7 parent 1:
>>    Sent 133310553887 bytes 151141714 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>> class multiq 1:8 parent 1:
>>    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>    backlog 0b 0p requeues 0
>>
>> Is it normal that the driver doesn't use queue number 8?
>>      
> This seems extremely unusual, can you tell us what kernel version you're
> using and what kind of test you're running?
>
>    
Kernel 2.6.33.1
Traffic type - normal Internet traffic from many users.
2Gbit/s RX + 2.6Gbit/s TX

tc -s -d qdisc show dev eth2
qdisc mq 0: root
  Sent 71590101434962 bytes 2410582579 pkt (dropped 0, overlimits 0 requeues 199799)
  backlog 0b 0p requeues 199799
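
To make the skew in the per-queue counters easier to see (in the quoted `ethtool -S eth2` output, tx_queue_0 carries about 89% of the TX packets and tx_queue_7 carries none), here is a rough Python sketch that summarises the `*_queue_N_packets` lines. The function names `queue_packets` and `summarize` are made up for illustration and are not part of ethtool or the driver; the parser only assumes the `name: value` layout shown in the output above.

```python
import re

# Matches lines such as "tx_queue_7_packets: 0". Leading whitespace and
# mail quote markers ("> ", ">> ") are tolerated so quoted output also parses.
QUEUE_RE = re.compile(r"^[>\s]*(tx|rx)_queue_(\d+)_packets:\s*(\d+)\s*$")


def queue_packets(ethtool_text, direction="tx"):
    """Return {queue_index: packets} for one direction of `ethtool -S` text."""
    counts = {}
    for line in ethtool_text.splitlines():
        m = QUEUE_RE.match(line)
        if m and m.group(1) == direction:
            counts[int(m.group(2))] = int(m.group(3))
    return counts


def summarize(counts):
    """Return (idle, shares): queues with zero packets, and each queue's
    fraction of the total packet count."""
    total = sum(counts.values())
    idle = sorted(q for q, n in counts.items() if n == 0)
    shares = {q: (n / total if total else 0.0) for q, n in counts.items()}
    return idle, shares


# Three of the counters from the output above, as a small sample.
sample = """\
tx_queue_0_packets: 108685351623
tx_queue_1_packets: 3988024698
tx_queue_7_packets: 0
"""
idle, shares = summarize(queue_packets(sample))
print("idle TX queues:", idle)  # prints: idle TX queues: [7]
```

Feeding it the full counter list instead of the three-line sample gives the share of every queue; the same functions work for RX by passing `direction="rx"`.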

Configuration for this NIC:
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
qlen 10000
8: vlan0100@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
9: vlan0101@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
10: vlan0102@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
11: vlan0103@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
12: vlan0104@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
13: vlan0105@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
14: vlan0106@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
15: vlan0107@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
16: vlan0108@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
17: vlan0109@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
18: vlan0110@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
19: vlan0111@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
20: vlan0112@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
21: vlan0113@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
22: vlan0140@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
23: vlan0141@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
24: vlan0143@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
25: vlan0300@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
26: vlan0114@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
27: vlan0450@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
28: vlan0401@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
29: vlan0402@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
30: vlan0301@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
31: vlan0302@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
32: vlan0303@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
33: vlan0304@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
34: vlan0305@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
35: vlan0306@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
36: vlan0307@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
37: vlan0308@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
38: vlan0309@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
39: vlan0310@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
40: vlan0311@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
41: vlan0312@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
42: vlan0313@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
43: vlan0403@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
44: vlan0314@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
45: vlan0315@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
46: vlan0316@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
47: vlan0317@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
48: vlan0318@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
49: vlan0404@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
50: vlan0405@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
51: vlan0115@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
52: vlan0406@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
53: vlan0116@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
54: vlan0490@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
55: vlan0491@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
56: vlan0319@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
57: vlan0320@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
58: vlan0321@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
59: vlan0322@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
60: vlan0323@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
61: vlan0324@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
62: vlan0325@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
63: vlan0326@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
64: vlan0327@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
65: vlan0328@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
66: vlan0329@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
67: vlan0330@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
68: vlan0331@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
69: vlan0332@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
70: vlan0333@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
71: vlan0334@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
72: vlan0335@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
73: vlan0336@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
74: vlan0337@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
75: vlan0338@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
76: vlan0339@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
77: vlan0340@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
78: vlan0341@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
79: vlan0342@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
80: vlan0343@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
81: vlan0344@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
82: vlan0345@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
83: vlan0117@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
84: vlan0118@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
85: vlan0119@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq 
state UP qlen 100
86: vlan0120@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
87: vlan0121@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
88: vlan0122@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
89: vlan0407@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
90: vlan0408@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
91: vlan0409@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
92: vlan0410@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
93: vlan0411@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
94: vlan0430@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
95: vlan0431@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
96: vlan0432@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
97: vlan0433@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc 
state UP qlen 100
98: vlan0434@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
99: vlan0435@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
100: vlan0436@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
101: vlan0437@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
102: vlan0438@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
103: vlan0439@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
104: vlan0440@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
105: vlan0451@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
106: vlan0452@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
107: vlan0453@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
108: vlan0454@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
109: vlan0455@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
110: vlan0456@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
111: vlan0457@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
112: vlan0458@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
113: vlan0459@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
114: vlan0461@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
115: vlan0202@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
116: vlan0460@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
117: vlan0462@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
118: vlan0463@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
119: vlan0464@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
120: vlan0203@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
121: vlan0503@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
122: vlan0504@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
123: vlan0130@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
124: vlan0131@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
125: vlan0132@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
126: vlan0133@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
127: vlan0134@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
128: vlan0135@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
129: vlan0136@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
130: vlan0137@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
131: vlan0138@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
132: vlan0123@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
133: vlan0124@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
134: vlan0125@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
135: vlan0126@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
136: vlan0127@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
137: vlan0128@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
138: vlan0129@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
139: vlan0139@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
140: vlan0465@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
141: vlan0466@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
142: vlan0467@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
143: vlan0468@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
144: vlan0469@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
145: vlan0470@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
146: vlan0471@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
147: vlan0472@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
148: vlan0473@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
149: vlan0215@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
150: vlan0144@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
151: vlan0145@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
152: vlan0146@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
153: vlan0147@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
154: vlan0148@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
155: vlan0150@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
156: vlan0151@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
157: vlan0152@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
158: vlan0153@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
hfsc state UP qlen 100
159: vlan0412@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
160: vlan0413@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
161: vlan0414@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
162: vlan0415@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
163: vlan0416@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
164: vlan0154@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
165: vlan0155@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
166: vlan0156@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
167: vlan0157@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
168: vlan0158@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
169: vlan0159@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
170: vlan0160@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
171: vlan0161@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
172: vlan0162@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100
173: vlan0163@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP qlen 100

More info about the NIC:
ethtool -i eth2
driver: ixgbe
version: 2.0.44-k2
firmware-version: 1.12-2
bus-info: 0000:01:00.0

ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off



Another weird thing is the output after deleting the root qdisc.
I think this is also not normal.

tc qdisc del dev eth2 root
tc -s -d class show dev eth2
class mq :1 root
  Sent 2608239 bytes 3000 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2 root
  Sent 3831841 bytes 3301 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3 root
  Sent 3518993 bytes 4016 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4 root
  Sent 1750040 bytes 2657 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5 root
  Sent 740596 bytes 1221 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6 root
  Sent 143782921 bytes 210547 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7 root
  Sent 3935866 bytes 5059 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :8 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :9 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :10 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :11 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :12 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :13 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :14 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :15 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :16 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :17 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :18 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :19 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :1a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :1b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :1c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :1d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :1e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :1f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :20 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :21 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :22 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :23 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :24 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :25 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :26 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :27 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :28 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :29 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :2f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :30 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :31 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :32 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :33 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :34 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :35 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :36 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :37 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :38 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :39 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :3f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :40 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :41 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :42 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :43 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :44 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :45 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :46 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :47 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :48 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :49 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :4f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :50 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :51 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :52 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :53 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :54 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :55 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :56 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :57 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :58 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :59 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :5f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :60 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :61 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :62 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :63 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :64 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :65 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :66 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :67 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :68 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :69 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :6f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :70 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :71 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :72 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :73 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :74 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :75 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :76 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :77 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :78 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :79 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7a root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7b root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7c root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7d root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7e root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :7f root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class mq :80 root
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0


Normally I have a multiq qdisc attached to the device, and it makes no
difference whether this is bfifo, pfifo or multiq:
tc qdisc add dev eth2 root handle 1: multiq
then
  tc -s -d class show dev eth2
class multiq 1:1 parent 1:
  Sent 2458266 bytes 3288 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:2 parent 1:
  Sent 6259789 bytes 5390 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:3 parent 1:
  Sent 4451430 bytes 5457 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:4 parent 1:
  Sent 2915648 bytes 3917 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:5 parent 1:
  Sent 1156897 bytes 1761 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:6 parent 1:
  Sent 181776227 bytes 255856 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:7 parent 1:
  Sent 5510686 bytes 6832 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
class multiq 1:8 parent 1:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0



> it almost seems that there is an off by one somewhere, what kind of
> traffic is being transmitted?
>
> Jesse
>
>
>    



* Re: [PATCH net-next-2.6 2/2] bonding: allow user-controlled output slave selection
From: Jay Vosburgh @ 2010-05-11 20:09 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev
In-Reply-To: <20100511003245.GB7497@gospo.rdu.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

>This patch gives the user the ability to control the output slave for
>round-robin and active-backup bonding.  Similar functionality was
>discussed in the past, but Jay Vosburgh indicated he would rather see a
>feature like this added to existing modes rather than creating a
>completely new mode.  Jay's thoughts as well as Neil's input surrounding
>some of the issues with the first implementation pushed us toward a
>design that relied on the queue_mapping rather than skb marks.
>Round-robin and active-backup modes were chosen as the first users of
>this slave selection as they seemed like the most logical choices when
>considering a multi-switch environment.
>
>Round-robin mode works without any modification, but active-backup does
>require inclusion of the first patch in this series and setting
>the 'keep_all' flag.  This will allow reception of unicast traffic on
>any of the backup interfaces.

	Yes, I did think that the mark business fit better into existing
modes (I thought of it as kind of a new hash for xor and 802.3ad modes).
I also didn't expect to see so much new stuff (this, as well as the FCOE
special cases being discussed elsewhere) being shoehorned into the
active-backup mode.  I'm not so sure that adding so many special cases
to active-backup is a good thing.

	Now, I'm starting to wonder if you were right, and it would be
better overall to have a "manual" mode that would hopefully satisfy this
case as well as the FCOE special case.  I don't think either of these is
a bad use case, I'm just not sure the right way to handle them is
another special knob in active-backup mode (either directly, or
implicitly in __netif_receive_skb), which wasn't what I expected to see.

	I presume you're overloading active-backup because it's not
etherchannel, 802.3ad, etc, and just talks right to the switch.  For the
regular load balance modes, I still think overlay into the existing
modes is preferable (more on that later); I'm thinking of "manual"
instead of another tweak to active-backup.

	If users want to have actual hot-standby functionality, then
active-backup would do that, and nothing else (and it can be multi-queue
aware, but only one slave active at a time).

	Users who want the set of bonded slaves to look like a big
multiqueue buffet could use this "manual" mode and set things up however
they want.  One way to set it up is simply that the bond is N queues
wide, where N is the total of the queue counts of all the slaves.  If a
slave fails, N gets smaller, and the user code has to deal with that.
Since the queue count of a device can't change dynamically, the bond
would have to actually be set up with some big number of queues, and
then only a subset is actually active (or there is some sort of wrap).

	In such an implementation, each slave would have a range of
queue IDs, not necessarily just one.  I'm a bit leery of exposing an API
where each slave is one queue ID, as it could make transitioning to real
multi-queue awareness difficult.
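
	To make the range idea concrete, here is a minimal userspace
sketch (not kernel code) of mapping a bond-wide queue ID onto a slave
and one of that slave's own queues.  The struct, the function name and
the 0-based numbering are all assumptions made for illustration; in
particular it ignores the patch's convention that qid 0 is reserved.

```c
#include <stddef.h>

/* Hypothetical model: each slave owns a contiguous range of bond queue IDs. */
struct slave_range {
	const char *name;      /* slave interface name, e.g. "eth0" */
	unsigned int nqueues;  /* number of tx queues on this slave */
	int up;                /* nonzero if the slave link is usable */
};

/*
 * Map a bond-wide queue ID onto (slave, local queue).  IDs are handed out
 * in slave order: slave 0 gets [0, n0), slave 1 gets [n0, n0 + n1), etc.
 * Returns the slave index, or -1 if the ID is out of range or the owning
 * slave is down.  *local_qid is written only on success.
 */
static int bond_qid_to_slave(const struct slave_range *slaves, size_t nslaves,
			     unsigned int bond_qid, unsigned int *local_qid)
{
	unsigned int base = 0;
	size_t i;

	for (i = 0; i < nslaves; i++) {
		if (bond_qid < base + slaves[i].nqueues) {
			if (!slaves[i].up)
				return -1;
			*local_qid = bond_qid - base;
			return (int)i;
		}
		base += slaves[i].nqueues;
	}
	return -1;
}
```

	With slaves of 4, 8 and 2 queues, bond queue 12 lands on the
third slave's local queue 0.  If a slave goes down, its whole range
stops resolving, which is the "N gets smaller" case the user code would
have to deal with.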

	There might also be a way to tie it in to the new RPS code on
the receive side.

	If the slaves all have the same MAC and attach to a single
switch via etherchannel, then it all looks pretty much like a single big
honkin' multiqueue device.  The switch probably won't map the flows back
the same way, though.

	If the slaves are on discrete switches (without etherchannel),
things become more complicated.  If the slaves have the same MAC, then
the switches will be irritated about seeing that same MAC coming in from
multiple places.  If the slaves have different MACs, then ARP has the
same sort of issues.

	In thinking about it, if it's linux bonding at both ends, there
could be any number of discrete switches in the path, and it wouldn't
matter as long as the linux end can work things out, e.g.,

        -- switch 1 --
hostA  /              \  hostB
bond  ---- switch 2 ---- bond
       \              /
        -- switch 3 --

	For something like this, the switches would never share MAC
information for the bonding slaves.  The issue here then becomes more of
detecting link failures (it would require either a "trunk failover" type
of function on the switch, or some kind of active probe between the
bonds).

	Now, I realize that I'm babbling a bit, as from reading your
description, this isn't necessarily your target topology (which sounded
more like a case of slave A can reach only network X, and slave B can
reach anywhere, so sending to network X should use slave A
preferentially), or, as long as I'm doing ASCII-art,

       --- switch 1 ---- network X
hostA /               /
bond  ---- switch 2 -+-- anywhere

	Is that an accurate representation?  Or is it something a bit
different, e.g.,

       --- switch 1 ---- network X -\
hostA /                             /
bond  ---- switch 2 ---- anywhere --

	I.e., the "anywhere" connects back to network X from the
outside, so to speak.  Or, oh, maybe I'm missing it entirely, and you're
thinking of something like this:

       --- switch 1 --- VPN --- web site
hostA /                          /
bond  ---- switch 2 - Internet -/

	Where you prefer to hit "web site" via the VPN (perhaps it's a
more efficient or secure path), but can do it from the public network at
large if necessary.

	Now, regardless of the above, your first patch ("keep_all") is
to deal with the reverse problem, if this is a piggyback on top of
active-backup mode: how to get packets back, when both channels can be
active simultaneously.  That actually dovetails to a degree with work
I've been doing lately, but the solution there probably isn't what
you're looking for (there's a user space daemon to do path finding, and
the "bond IP" address is piggybacked on the slaves' MAC addresses, which
are not changed; the "bond IP" set exists in a separate subnet all its
own).

	As I said, I'm not convinced that the "keep_all" option to
active-backup is really better than just a "manual" mode that lacks the
dup suppression and expects the user to set everything up.

	As for the round-robin change in this patch, if I'm reading it
right, then the way it works is that the packets are round-robined,
unless there's a queue id passed in, in which case it's assigned to the
slave mapped to that queue id.  I'm not entirely sure why you picked
round-robin mode for that over balance-xor; it doesn't seem to fit well
with the description in the documentation.  Or is it just sort of a
demonstrator?

	I do like one other aspect of the patch, and that's the concept
of overlaying the queue map on top of the balance algorithm.  So, e.g.,
balance-xor would do its usual thing, unless the packet is queue mapped,
in which case the packet's assignment is obeyed.  The balance-xor could
even optionally do its xor across the full set of all slaves' output
queues instead of just across the slaves.  Round-robin can operate
similarly.  For those modes, a "balance by queue vs. balance by slave"
seems like a reasonable knob to have.
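
	Roughly, the overlay could look like the following userspace
sketch.  It is only a model under stated assumptions: the struct and
function names are invented here, the hash is a stand-in for the layer2
policy (last byte of source MAC xor last byte of destination MAC, modulo
the slave count), and the fallback path does not check link state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down slave: only what the selection needs. */
struct slave {
	uint16_t queue_id;  /* 0 means "no queue mapped to this slave" */
	int up;             /* nonzero if the slave can transmit */
};

/* Stand-in for the layer2 xor hash over the last MAC bytes. */
static size_t xor_hash(uint8_t src_last, uint8_t dst_last, size_t nslaves)
{
	return (size_t)(src_last ^ dst_last) % nslaves;
}

/*
 * Obey the packet's queue mapping when it names a slave that is up;
 * otherwise fall back to the usual balance algorithm.
 * Assumes nslaves > 0.
 */
static size_t pick_slave(const struct slave *slaves, size_t nslaves,
			 uint16_t queue_mapping,
			 uint8_t src_last, uint8_t dst_last)
{
	size_t i;

	if (queue_mapping) {
		for (i = 0; i < nslaves; i++)
			if (slaves[i].queue_id == queue_mapping && slaves[i].up)
				return i;
	}
	return xor_hash(src_last, dst_last, nslaves);
}
```

	A "balance by queue" variant would hash across the total number
of output queues instead of nslaves and then resolve the result to a
slave.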

	I do understand that you're proposing something relatively
simple, and I'm thinking out loud about alternate or additional
implementation details.  Some of this is "ooh ahh what if", but we also
don't want to end up with something that's forwards incompatible, and
I'm hoping to find one solution to multiple problems.

	Thoughts?

	-J

>This was tested with IPv4-based filters as well as VLAN-based filters
>with good results.
>
>More information as well as a configuration example is available in the
>patch to Documentation/networking/bonding.txt.
>
>Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>---
> Documentation/networking/bonding.txt |   76 +++++++++++++++++++++-
> drivers/net/bonding/bond_main.c      |   77 +++++++++++++++++++++-
> drivers/net/bonding/bond_sysfs.c     |  117 +++++++++++++++++++++++++++++++++-
> drivers/net/bonding/bonding.h        |    5 ++
> include/linux/if_bonding.h           |    1 +
> 5 files changed, 270 insertions(+), 6 deletions(-)
>
>diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>index d64fd2f..fd277c1 100644
>--- a/Documentation/networking/bonding.txt
>+++ b/Documentation/networking/bonding.txt
>@@ -49,6 +49,7 @@ Table of Contents
> 3.3	Configuring Bonding Manually with Ifenslave
> 3.3.1		Configuring Multiple Bonds Manually
> 3.4	Configuring Bonding Manually via Sysfs
>+3.5	Overriding Configuration for Special Cases
>
> 4. Querying Bonding Configuration
> 4.1	Bonding Configuration
>@@ -1333,8 +1334,79 @@ echo 2000 > /sys/class/net/bond1/bonding/arp_interval
> echo +eth2 > /sys/class/net/bond1/bonding/slaves
> echo +eth3 > /sys/class/net/bond1/bonding/slaves
>
>-
>-4. Querying Bonding Configuration 
>+3.5 Overriding Configuration for Special Cases
>+----------------------------------------------
>+Nominally, when using the bonding driver, the physical port which transmits a
>+frame is selected by the bonding driver, and is not relevant to the user or
>+system administrator.  The output port is simply selected using the policies of
>+the selected bonding mode.  On occasion however, it is helpful to direct certain
>+classes of traffic to certain physical interfaces on output to implement
>+slightly more complex policies.  For example, to reach a web server over a
>+bonded interface in which eth0 connects to a private network, while eth1
>+connects via a public network, it may be desirous to bias the bond to send said
>+traffic over eth0 first, using eth1 only as a fall back, while all other traffic
>+can safely be sent over either interface.  Such configurations may be achieved
>+using the traffic control utilities inherent in linux.
>+
>+By default the bonding driver is multiqueue aware and 16 queues are created
>+when the driver initializes (see Documentation/networking/multiqueue.txt
>+for details).  If more or less queues are desired the module parameter
>+tx_queues can be used to change this value.  There is no sysfs parameter
>+available as the allocation is done at module init time.
>+
>+The output of the file /proc/net/bonding/bondX has changed so the output Queue
>+ID is now printed for each slave:
>+
>+Bonding Mode: fault-tolerance (active-backup)
>+Primary Slave: None
>+Currently Active Slave: eth0
>+MII Status: up
>+MII Polling Interval (ms): 0
>+Up Delay (ms): 0
>+Down Delay (ms): 0
>+
>+Slave Interface: eth0
>+MII Status: up
>+Link Failure Count: 0
>+Permanent HW addr: 00:1a:a0:12:8f:cb
>+Slave queue ID: 0
>+
>+Slave Interface: eth1
>+MII Status: up
>+Link Failure Count: 0
>+Permanent HW addr: 00:1a:a0:12:8f:cc
>+Slave queue ID: 2
>+
>+The queue_id for a slave can be set using the command:
>+
>+# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
>+
>+Any interface that needs a queue_id set should set it with multiple calls
>+like the one above until proper priorities are set for all interfaces.  On
>+distributions that allow configuration via initscripts, multiple 'queue_id'
>+arguments can be added to BONDING_OPTS to set all needed slave queues.
>+
>+These queue id's can be used in conjunction with the tc utility to configure
>+a multiqueue qdisc and filters to bias certain traffic to transmit on certain
>+slave devices.  For instance, say we wanted, in the above configuration to
>+force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output
>+device. The following commands would accomplish this:
>+
>+# tc qdisc add dev bond0 handle 1 root multiq
>+
>+# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \
>+	192.168.1.100 action skbedit queue_mapping 2
>+
>+These commands tell the kernel to attach a multiqueue queue discipline to the
>+bond0 interface and filter traffic enqueued to it, such that packets with a dst
>+ip of 192.168.1.100 have their output queue mapping value overwritten to 2.
>+This value is then passed into the driver, causing the normal output path
>+selection policy to be overridden, selecting instead qid 2, which maps to eth1.
>+
>+Note that qid values begin at 1.  qid 0 is reserved to indicate to the driver
>+that normal output policy selection should take place.
>+
>+4 Querying Bonding Configuration
> =================================
>
> 4.1 Bonding Configuration
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index eb86363..aa6a79a 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -90,6 +90,7 @@
> #define BOND_LINK_ARP_INTERV	0
>
> static int max_bonds	= BOND_DEFAULT_MAX_BONDS;
>+static int tx_queues	= BOND_DEFAULT_TX_QUEUES;
> static int num_grat_arp = 1;
> static int num_unsol_na = 1;
> static int miimon	= BOND_LINK_MON_INTERV;
>@@ -111,6 +112,8 @@ static struct bond_params bonding_defaults;
>
> module_param(max_bonds, int, 0);
> MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
>+module_param(tx_queues, int, 0);
>+MODULE_PARM_DESC(tx_queues, "Max number of transmit queues (default = 16)");
> module_param(num_grat_arp, int, 0644);
> MODULE_PARM_DESC(num_grat_arp, "Number of gratuitous ARP packets to send on failover event");
> module_param(num_unsol_na, int, 0644);
>@@ -1532,6 +1535,12 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 		goto err_undo_flags;
> 	}
>
>+	/*
>+	 * Set the new_slave's queue_id to be zero.  Queue ID mapping
>+	 * is set via sysfs or module option if desired.
>+	 */
>+	new_slave->queue_id = 0;
>+
> 	/* save slave's original flags before calling
> 	 * netdev_set_master and dev_open
> 	 */
>@@ -1790,6 +1799,7 @@ err_restore_mac:
> 	}
>
> err_free:
>+	new_slave->queue_id = 0;
> 	kfree(new_slave);
>
> err_undo_flags:
>@@ -1977,6 +1987,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
> 				   IFF_SLAVE_INACTIVE | IFF_BONDING |
> 				   IFF_SLAVE_NEEDARP);
>
>+	slave->queue_id = 0;
> 	kfree(slave);
>
> 	return 0;  /* deletion OK */
>@@ -3269,6 +3280,7 @@ static void bond_info_show_slave(struct seq_file *seq,
> 		else
> 			seq_puts(seq, "Aggregator ID: N/A\n");
> 	}
>+	seq_printf(seq, "Slave queue ID: %d\n", slave->queue_id);
> }
>
> static int bond_info_seq_show(struct seq_file *seq, void *v)
>@@ -4405,9 +4417,59 @@ static void bond_set_xmit_hash_policy(struct bonding *bond)
> 	}
> }
>
>+/*
>+ * Lookup the slave that corresponds to a qid
>+ */
>+static inline int bond_slave_override(struct bonding *bond,
>+				      struct sk_buff *skb)
>+{
>+	int i, res = 1;
>+	struct slave *slave = NULL;
>+	struct slave *check_slave;
>+
>+	read_lock(&bond->lock);
>+
>+	if (!BOND_IS_OK(bond) || !skb->queue_mapping)
>+		goto out;
>+
>+	/* Find out if any slaves have the same mapping as this skb. */
>+	bond_for_each_slave(bond, check_slave, i) {
>+		if (check_slave->queue_id == skb->queue_mapping) {
>+			slave = check_slave;
>+			break;
>+		}
>+	}
>+
>+	/* If the slave isn't UP, use default transmit policy. */
>+	if (slave && slave->queue_id && IS_UP(slave->dev) &&
>+	    (slave->link == BOND_LINK_UP)) {
>+		res = bond_dev_queue_xmit(bond, skb, slave->dev);
>+	}
>+
>+out:
>+	read_unlock(&bond->lock);
>+	return res;
>+}
>+
>+static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb)
>+{
>+	/*
>+	 * This helper function exists to help dev_pick_tx get the correct
>+	 * destination queue.  Using a helper function skips the call to
>+	 * skb_tx_hash and will put the skbs in the queue we expect on their
>+	 * way down to the bonding driver.
>+	 */
>+	return skb->queue_mapping;
>+}
>+
> static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
>-	const struct bonding *bond = netdev_priv(dev);
>+	struct bonding *bond = netdev_priv(dev);
>+
>+	if (TX_QUEUE_OVERRIDE(bond->params.mode)) {
>+		if (!bond_slave_override(bond, skb))
>+			return NETDEV_TX_OK;
>+	}
>
> 	switch (bond->params.mode) {
> 	case BOND_MODE_ROUNDROBIN:
>@@ -4492,6 +4554,7 @@ static const struct net_device_ops bond_netdev_ops = {
> 	.ndo_open		= bond_open,
> 	.ndo_stop		= bond_close,
> 	.ndo_start_xmit		= bond_start_xmit,
>+	.ndo_select_queue	= bond_select_queue,
> 	.ndo_get_stats		= bond_get_stats,
> 	.ndo_do_ioctl		= bond_do_ioctl,
> 	.ndo_set_multicast_list	= bond_set_multicast_list,
>@@ -4763,6 +4826,13 @@ static int bond_check_params(struct bond_params *params)
> 		}
> 	}
>
>+	if (tx_queues < 1 || tx_queues > 255) {
>+		pr_warning("Warning: tx_queues (%d) should be between "
>+			   "1 and 255, resetting to %d\n",
>+			   tx_queues, BOND_DEFAULT_TX_QUEUES);
>+		tx_queues = BOND_DEFAULT_TX_QUEUES;
>+	}
>+
> 	if ((keep_all != 0) && (keep_all != 1)) {
> 		pr_warning("Warning: keep_all module parameter (%d), "
> 			   "not of valid value (0/1), so it was set to "
>@@ -4940,6 +5010,7 @@ static int bond_check_params(struct bond_params *params)
> 	params->primary[0] = 0;
> 	params->primary_reselect = primary_reselect_value;
> 	params->fail_over_mac = fail_over_mac_value;
>+	params->tx_queues = tx_queues;
> 	params->keep_all = keep_all;
>
> 	if (primary) {
>@@ -5027,8 +5098,8 @@ int bond_create(struct net *net, const char *name)
>
> 	rtnl_lock();
>
>-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
>-				bond_setup);
>+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
>+				bond_setup, tx_queues);
> 	if (!bond_dev) {
> 		pr_err("%s: eek! can't alloc netdev!\n", name);
> 		rtnl_unlock();
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 44651ce..87bfcf1 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -1472,6 +1472,121 @@ static ssize_t bonding_show_ad_partner_mac(struct device *d,
> static DEVICE_ATTR(ad_partner_mac, S_IRUGO, bonding_show_ad_partner_mac, NULL);
>
> /*
>+ * Show the queue_ids of the slaves in the current bond.
>+ */
>+static ssize_t bonding_show_queue_id(struct device *d,
>+				     struct device_attribute *attr,
>+				     char *buf)
>+{
>+	struct slave *slave;
>+	int i, res = 0;
>+	struct bonding *bond = to_bond(d);
>+
>+	if (!rtnl_trylock())
>+		return restart_syscall();
>+
>+	read_lock(&bond->lock);
>+	bond_for_each_slave(bond, slave, i) {
>+		if (res > (PAGE_SIZE - 6)) {
>+			/* not enough space for another interface name */
>+			if ((PAGE_SIZE - res) > 10)
>+				res = PAGE_SIZE - 10;
>+			res += sprintf(buf + res, "++more++ ");
>+			break;
>+		}
>+		res += sprintf(buf + res, "%s:%d ",
>+			       slave->dev->name, slave->queue_id);
>+	}
>+	read_unlock(&bond->lock);
>+	if (res)
>+		buf[res-1] = '\n'; /* eat the leftover space */
>+	rtnl_unlock();
>+	return res;
>+}
>+
>+/*
>+ * Set the queue_ids of the  slaves in the current bond.  The bond
>+ * interface must be enslaved for this to work.
>+ */
>+static ssize_t bonding_store_queue_id(struct device *d,
>+				      struct device_attribute *attr,
>+				      const char *buffer, size_t count)
>+{
>+	struct slave *slave, *update_slave;
>+	struct bonding *bond = to_bond(d);
>+	u16 qid;
>+	int i, ret = count;
>+	char *delim;
>+	struct net_device *sdev = NULL;
>+
>+	if (!rtnl_trylock())
>+		return restart_syscall();
>+
>+	/* delim will point to queue id if successful */
>+	delim = strchr(buffer, ':');
>+	if (!delim)
>+		goto err_no_cmd;
>+
>+	/*
>+	 * Terminate string that points to device name and bump it
>+	 * up one, so we can read the queue id there.
>+	 */
>+	*delim = '\0';
>+	if (sscanf(++delim, "%hd\n", &qid) != 1)
>+		goto err_no_cmd;
>+
>+	/* Check buffer length, valid ifname and queue id */
>+	if (strlen(buffer) > IFNAMSIZ ||
>+	    !dev_valid_name(buffer) ||
>+	    qid > bond->params.tx_queues)
>+		goto err_no_cmd;
>+
>+	/* Get the pointer to that interface if it exists */
>+	sdev = __dev_get_by_name(dev_net(bond->dev), buffer);
>+	if (!sdev)
>+		goto err_no_cmd;
>+
>+	read_lock(&bond->lock);
>+
>+	/* Search for the slave and check for duplicate qids */
>+	update_slave = NULL;
>+	bond_for_each_slave(bond, slave, i) {
>+		if (sdev == slave->dev)
>+			/*
>+			 * We don't need to check the matching
>+			 * slave for dups, since we're overwriting it
>+			 */
>+			update_slave = slave;
>+		else if (qid && qid == slave->queue_id) {
>+			goto err_no_cmd_unlock;
>+		}
>+	}
>+
>+	if (!update_slave)
>+		goto err_no_cmd_unlock;
>+
>+	/* Actually set the qids for the slave */
>+	update_slave->queue_id = qid;
>+
>+	read_unlock(&bond->lock);
>+out:
>+	rtnl_unlock();
>+	return ret;
>+
>+err_no_cmd_unlock:
>+	read_unlock(&bond->lock);
>+err_no_cmd:
>+	pr_info("invalid input for queue_id set for %s.\n",
>+		bond->dev->name);
>+	ret = -EPERM;
>+	goto out;
>+}
>+
>+static DEVICE_ATTR(queue_id, S_IRUGO | S_IWUSR, bonding_show_queue_id,
>+		   bonding_store_queue_id);
>+
>+
>+/*
>  * Show and set the keep_all flag.
>  */
> static ssize_t bonding_show_keep(struct device *d,
>@@ -1513,7 +1628,6 @@ static DEVICE_ATTR(keep_all, S_IRUGO | S_IWUSR,
> 		   bonding_show_keep, bonding_store_keep);
>
>
>-
> static struct attribute *per_bond_attrs[] = {
> 	&dev_attr_slaves.attr,
> 	&dev_attr_mode.attr,
>@@ -1539,6 +1653,7 @@ static struct attribute *per_bond_attrs[] = {
> 	&dev_attr_ad_actor_key.attr,
> 	&dev_attr_ad_partner_key.attr,
> 	&dev_attr_ad_partner_mac.attr,
>+	&dev_attr_queue_id.attr,
> 	&dev_attr_keep_all.attr,
> 	NULL,
> };
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 3b7532f..274a3a1 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -60,6 +60,9 @@
> 		 ((mode) == BOND_MODE_TLB)          ||	\
> 		 ((mode) == BOND_MODE_ALB))
>
>+#define TX_QUEUE_OVERRIDE(mode)				\
>+			(((mode) == BOND_MODE_ACTIVEBACKUP) ||	\
>+			 ((mode) == BOND_MODE_ROUNDROBIN))
> /*
>  * Less bad way to call ioctl from within the kernel; this needs to be
>  * done some other way to get the call out of interrupt context.
>@@ -131,6 +134,7 @@ struct bond_params {
> 	char primary[IFNAMSIZ];
> 	int primary_reselect;
> 	__be32 arp_targets[BOND_MAX_ARP_TARGETS];
>+	int tx_queues;
> 	int keep_all;
> };
>
>@@ -166,6 +170,7 @@ struct slave {
> 	u8     perm_hwaddr[ETH_ALEN];
> 	u16    speed;
> 	u8     duplex;
>+	u16    queue_id;
> 	struct ad_slave_info ad_info; /* HUGE - better to dynamically alloc */
> 	struct tlb_slave_info tlb_info;
> };
>diff --git a/include/linux/if_bonding.h b/include/linux/if_bonding.h
>index cd525fa..2c79943 100644
>--- a/include/linux/if_bonding.h
>+++ b/include/linux/if_bonding.h
>@@ -83,6 +83,7 @@
>
> #define BOND_DEFAULT_MAX_BONDS  1   /* Default maximum number of devices to support */
>
>+#define BOND_DEFAULT_TX_QUEUES 16   /* Default number of tx queues per device */
> /* hashing types */
> #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
>-- 
>1.6.2.5

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: pull request: wireless-next-2.6 2010-05-11
From: David Miller @ 2010-05-11 20:32 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev
In-Reply-To: <20100511190510.GD2400@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Tue, 11 May 2010 15:05:10 -0400

> Another round of bits intended for 2.6.35...mostly driver updates this
> time.  The biggest item of note is some continued attention for rt2800
> from the rt2x00 team.

Pulled, thanks John.


* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-11 20:50 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, netdev, Ilpo Järvinen
In-Reply-To: <AAFABD0F-C66F-44C2-8BDC-FB489EA8655F@guavus.com>

On Tuesday, 11 May 2010 at 04:08 +0000, Bijay Singh wrote:
> Hi Eric,
> 
> I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> 
> Thanks


I believe the third problem comes from commit 4957faad
(TCPCT part 1g: Responder Cookie => Initiator), from William Allen
Simpson.

When a SYN-ACK packet is built (in tcp_synack_options()), the code
specifically forbids including a TIMESTAMP option if SACK is
also selected:

doing_ts &= !ireq->sack_ok;

The problem is that this mask is applied to a local variable; the socket
is still marked as being timestamp enabled.


Later, when we build TCP options for data packets, we _include_ a
timestamp, while our SYNACK didn't mention the option.
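
A minimal userspace model of the mismatch, for illustration only. The
struct and function names are invented and are not the kernel's; it
assumes the mask is applied only when an MD5 key is in use (as the
surrounding #ifdef suggests) and that later data packets simply consult
the stored tstamp_ok flag:

```c
#include <stdbool.h>

/* Minimal stand-in for the per-request state; not the kernel's struct. */
struct req_state {
	bool tstamp_ok;  /* peer offered timestamps */
	bool sack_ok;    /* peer offered SACK */
};

/* Before the fix: the mask only touches a local copy. */
static bool synack_sends_ts_old(struct req_state *req, bool md5)
{
	bool doing_ts = req->tstamp_ok;

	if (md5)
		doing_ts &= !req->sack_ok;
	return doing_ts;  /* req->tstamp_ok is left unchanged */
}

/* After the fix: the decision is stored back into the request. */
static bool synack_sends_ts_new(struct req_state *req, bool md5)
{
	if (md5)
		req->tstamp_ok &= !req->sack_ok;
	return req->tstamp_ok;
}

/* Later data packets consult the stored flag, not the SYN-ACK's local. */
static bool data_sends_ts(const struct req_state *req)
{
	return req->tstamp_ok;
}
```

With both tstamp_ok and sack_ok set and MD5 in use, the old path sends
a SYN-ACK without a timestamp but data packets with one; the new path
keeps the two consistent.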

So the following traffic can happen (and fail):

18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428


Could you try the following patch?

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5db3a2c..0be21cd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
 			 xvp->cookie_plus :
 			 0;
-	bool doing_ts = ireq->tstamp_ok;
+	bool doing_ts;
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
@@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
 		 * rather than TS in order to fit in better with old,
 		 * buggy kernels, but that was deemed to be unnecessary.
 		 */
-		doing_ts &= !ireq->sack_ok;
+		ireq->tstamp_ok &= !ireq->sack_ok;
 	}
 #else
 	*md5 = NULL;
 #endif
+	doing_ts = ireq->tstamp_ok;
 
 	/* We always send an MSS option. */
 	opts->mss = mss;







* Re: [PATCH net-next-2.6 1/2] bonding: add keep_all parameter
From: Andy Gospodarek @ 2010-05-11 21:03 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev
In-Reply-To: <7958.1273598301@death.nxdomain.ibm.com>

On Tue, May 11, 2010 at 10:18:21AM -0700, Jay Vosburgh wrote:
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> >
> >In an effort to suppress duplicate frames on certain bonding modes
> >(specifically the modes that do not require additional configuration on
> >the switch or switches connected to the host),
> 
> 	Strictly speaking, the above is incorrect, as the duplicate
> suppression is turned on for the active-backup inactive slaves as well
> as 802.3ad ports that are disabled (any slave that gets the "inactive"
> flag bit set).

It is also effective when using ALB and TLB, right?  I can change the
language if you would like to increase the description's accuracy.

> 
> >[...] code was added in the
> >generic receive patch in 2.6.16.  The current behavior works quite well
> >for most users, but there are some times it would be nice to restore old
> >functionality and allow all frames to make their way up the stack.
> 
> 	Reading netdev lately, it sure looks like everybody wants ways
> to shut off or bypass the duplicate suppression.
> 

I see that too, which was part of the reason to add a configuration
option.  I know many of the people that complained that they were seeing
dups will complain again if they show up in the future, so a config
option seemed like the best way to satisfy both.

> >This patch adds support for a new module option and sysfs file called
> >'keep_all' that will restore pre-2.6.16 functionality if the user
> >desires.  The default value is '0' and retains existing behavior, but
> >the user can set it to '1' and allow all frames up if desired.
> 
> 	Since this is really meant for the queue tagging stuff in the
> next patch, should this really be something that's enabled automatically
> if the queues are configured in such a way that the inactive slave is
> going to receive traffic?
> 

Part of the reason not to have it happen automatically is that the
second patch *should* allow simple pass-through of queue-mapping (though
I didn't mention that specifically) from bond device to underlying
slaves if the user is aware of the number of output queues in their
NIC and doesn't set the queue_ids for any of the slaves.

Another reason not to turn it on automatically is if the network paths
for transmission and reception are actually different.  The 'keep_all=1'
flag might not be needed if transmission is happening on an inactive
interface, but the active interface will receive all responses due to
the way the network is designed.

Again, a big part of the motivation for this patch was bringing back that
old functionality to those that desire it, which is why I split this out
from the next patch.
 
> 	I also wonder if something like this would satisfy the FCOE guys
> without making __netif_receive_skb / skb_bond_should_drop even more
> complicated than they already are.

I'd love to think so, but you never know.

> >Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >---
> > Documentation/networking/bonding.txt |   15 ++++++++++++
> > drivers/net/bonding/bond_main.c      |   15 ++++++++++++
> > drivers/net/bonding/bond_sysfs.c     |   43 +++++++++++++++++++++++++++++++++-
> > drivers/net/bonding/bonding.h        |    1 +
> > include/linux/if.h                   |    1 +
> > net/core/dev.c                       |   26 +++++++++++---------
> > 6 files changed, 88 insertions(+), 13 deletions(-)
> >
> >diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
> >index 61f516b..d64fd2f 100644
> >--- a/Documentation/networking/bonding.txt
> >+++ b/Documentation/networking/bonding.txt
> >@@ -399,6 +399,21 @@ fail_over_mac
> > 	This option was added in bonding version 3.2.0.  The "follow"
> > 	policy was added in bonding version 3.3.0.
> >
> >+keep_all
> >+
> >+	Option to specify whether or not you will keep all frames
> >+	received on an interface that is a member of a bond.  Right
> >+	now checking is done to ensure that most frames ultimately
> >+	classified as duplicates are dropped to keep noise to a
> >+	minimum.  The feature to drop duplicates was added in kernel
> >+	version 2.6.16 (bonding driver version 3.0.2) and this will
> >+	allow that original behavior to be restored if desired.
> >+
> >+	A value of 0 (default) will preserve the current behavior and
> >+	will drop all duplicate frames the bond may receive.  A value
> >+	of 1 will not attempt to avoid duplicate frames and pass all
> >+	of them up the stack.
> 
> 	Two thoughts (presuming for the moment that this doesn't
> change): first, bump the driver version and mention when it was added;
> second, mention that this only applies to active-backup mode.
> 

Happy to update the version.  But shouldn't this impact ALB and TLB
modes too since they have a concept of 'active' slaves?

> > lacp_rate
> >
> > 	Option specifying the rate in which we'll ask our link partner

<snip>

> >--- a/net/core/dev.c
> >+++ b/net/core/dev.c
> >@@ -2758,21 +2758,23 @@ int __skb_bond_should_drop(struct sk_buff *skb, struct net_device *master)
> > 		skb_bond_set_mac_by_master(skb, master);
> > 	}
> >
> >-	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
> >-		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
> >-		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
> >-			return 0;
> >+	if (unlikely(!(master->priv_flags & IFF_BONDING_KEEP_ALL))) {
> 
> 	So it's unlikely that "keep all" will be turned off?
> 

Grrrr.  That should be an if (likely(!(...  Good catch.
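
To make the restructured check easier to reason about, here is a
user-space sketch of the drop decision with the corrected likely() hint.
It is only a model: the flag values, the enum and struct names, and
bond_should_drop() itself are made up for illustration, and likely()
assumes the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branch hint in the same shape the kernel uses (GCC/Clang builtin). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical flag values for this sketch only; not the kernel's. */
enum {
	SLAVE_INACTIVE   = 1 << 0,
	SLAVE_NEEDARP    = 1 << 1,
	MASTER_ALB       = 1 << 2,
	MASTER_8023AD    = 1 << 3,
	BONDING_KEEP_ALL = 1 << 4,
};

enum proto { PROTO_IP, PROTO_ARP, PROTO_SLOW };
enum pkt_type { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST };

/* Stand-in for the two sk_buff fields the check looks at. */
struct frame {
	enum proto protocol;
	enum pkt_type pkt_type;
};

/* Returns true when the frame should be dropped as a likely duplicate. */
bool bond_should_drop(const struct frame *f, unsigned int slave_flags,
		      unsigned int master_flags)
{
	assert(f != NULL);

	/* keep_all defaults to 0, so "flag not set" is the common case. */
	if (likely(!(master_flags & BONDING_KEEP_ALL))) {
		if (slave_flags & SLAVE_INACTIVE) {
			/* ARP is still needed for link monitoring. */
			if ((slave_flags & SLAVE_NEEDARP) &&
			    f->protocol == PROTO_ARP)
				return false;

			/* ALB/TLB: only broadcast/multicast is suppressed. */
			if ((master_flags & MASTER_ALB) &&
			    f->pkt_type != PKT_BROADCAST &&
			    f->pkt_type != PKT_MULTICAST)
				return false;

			/* 802.3ad: slow-protocol frames always pass. */
			if ((master_flags & MASTER_8023AD) &&
			    f->protocol == PROTO_SLOW)
				return false;

			return true;
		}
	}
	return false;
}
```

In this model an inactive active-backup slave still drops ordinary
unicast while keep_all is clear, and nothing is dropped once it is set.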

> >+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
> >+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
> >+			    skb->protocol == __cpu_to_be16(ETH_P_ARP))
> >+				return 0;
> >
> >-		if (master->priv_flags & IFF_MASTER_ALB) {
> >-			if (skb->pkt_type != PACKET_BROADCAST &&
> >-			    skb->pkt_type != PACKET_MULTICAST)
> >+			if (master->priv_flags & IFF_MASTER_ALB) {
> >+				if (skb->pkt_type != PACKET_BROADCAST &&
> >+				    skb->pkt_type != PACKET_MULTICAST)
> >+					return 0;
> >+			}
> >+			if (master->priv_flags & IFF_MASTER_8023AD &&
> >+			    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
> > 				return 0;
> >-		}
> >-		if (master->priv_flags & IFF_MASTER_8023AD &&
> >-		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
> >-			return 0;
> >
> >-		return 1;
> >+			return 1;
> >+		}
> > 	}
> > 	return 0;
> > }
> >-- 
> >1.6.2.5
> 
> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Uclinux-dist-devel] [PATCH 1/9] netdev: bfin_mac: add support for IEEE 1588 PTP
From: Mike Frysinger @ 2010-05-11 21:03 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev, Barry Song, David S. Miller, uclinux-dist-devel
In-Reply-To: <20100511070716.GA3254@riccoc20.at.omicron.at>

On Tue, May 11, 2010 at 03:07, Richard Cochran wrote:
> On Mon, May 10, 2010 at 11:39:06AM -0400, Mike Frysinger wrote:
>> diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
>> index 587f93c..6a9519f 100644
>> --- a/drivers/net/bfin_mac.c
>> +++ b/drivers/net/bfin_mac.c
> ...
>> +#define PTP_CLK 25000000
>> +
>> +static void bfin_mac_hwtstamp_init(struct net_device *netdev)
>> +{
>> +     struct bfin_mac_local *lp = netdev_priv(netdev);
>> +     u64 append;
>> +
>> +     /* Initialize hardware timer */
>> +     append = PTP_CLK * (1ULL << 32);
>> +     do_div(append, get_sclk());
>> +     bfin_write_EMAC_PTP_ADDEND((u32)append);
>
> It appears that one can tune this PTP clock.
>
> I recently posted a suggestion for a PTP clock class driver. Would you
> care to take a look at that and say whether that API would also work
> for the blackfin?

i'm guessing you mean:
http://thread.gmane.org/gmane.linux.network/159179
http://thread.gmane.org/gmane.linux.network/159180
http://thread.gmane.org/gmane.linux.network/159181
http://thread.gmane.org/gmane.linux.network/159182

Barry: could you take a look please ?
-mike

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/2] bonding: add keep_all parameter
From: Jay Vosburgh @ 2010-05-11 22:12 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev
In-Reply-To: <20100511210344.GH7497@gospo.rdu.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

>On Tue, May 11, 2010 at 10:18:21AM -0700, Jay Vosburgh wrote:
>> Andy Gospodarek <andy@greyhouse.net> wrote:
>> 
>> >
>> >In an effort to suppress duplicate frames on certain bonding modes
>> >(specifically the modes that do not require additional configuration on
>> >the switch or switches connected to the host),
>> 
>> 	Strictly speaking, the above is incorrect, as the duplicate
>> suppression is turned on for the active-backup inactive slaves as well
>> as 802.3ad ports that are disabled (any slave that gets the "inactive"
>> flag bit set).
>
>It is also effective when using ALB and TLB, right?  I can change the
>language if you would like to increase the description's accuracy.

	Yah, I forgot about that; the ALB/TLB modes suppress broadcast
and multicast traffic on "inactive" slaves, although that's kind of a
misnomer, since in those modes, "inactive" slaves are active for unicast
traffic.

>> >[...] code was added in the
>> >generic receive patch in 2.6.16.  The current behavior works quite well
>> >for most users, but there are some times it would be nice to restore old
>> >functionality and allow all frames to make their way up the stack.
>> 
>> 	Reading netdev lately, it sure looks like everybody wants ways
>> to shut off or bypass the duplicate suppression.
>> 
>
>I see that too, which was part of the reason to add a configuration
>option.  I know many of the people that complained that they were seeing
>dups will complain again if they show up in the future, so a config
>option seemed like the best way to satisfy both.
>
>> >This patch adds support for a new module option and sysfs file called
>> >'keep_all' that will restore pre-2.6.16 functionality if the user
>> >desires.  The default value is '0' and retains existing behavior, but
>> >the user can set it to '1' and allow all frames up if desired.
>> 
>> 	Since this is really meant for the queue tagging stuff in the
>> next patch, should this really be something that's enabled automatically
>> if the queues are configured in such a way that the inactive slave is
>> going to receive traffic?
>> 
>
>Part of the reason not to have it happen automatically is that the
>second patch *should* allow simple pass-through of queue-mapping (though
>I didn't mention that specifically) from bond device to underlying
>slaves if the user is aware of the number of output queues in their
>NIC and doesn't set the queue_ids for any of the slaves.
>
>Another reason not to turn it on automatically is if the network patch
>for transmission and reception are actually different.  The 'keep_all=1'
>flag might not be needed if transmission is happening on an inactive
>interface, but the active interface will receive all responses due to
>the way the network is designed.
>
>Again, a big part of the motivation patch was bringing back that
>old-functionality to those that desire it and was why I split this out
>from the next patch.

	I think I addressed a lot of this in my big honkin' reply to the
other patch, so I'll forbear further comment until you've read through
all that.

>> 	I also wonder if something like this would satisfy the FCOE guys
>> without making __netif_receive_skb / skb_bond_should_drop even more
>> complicated than they already are.
>
>I'd love to think so, but you never know.
>
>> >Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> >---
>> > Documentation/networking/bonding.txt |   15 ++++++++++++
>> > drivers/net/bonding/bond_main.c      |   15 ++++++++++++
>> > drivers/net/bonding/bond_sysfs.c     |   43 +++++++++++++++++++++++++++++++++-
>> > drivers/net/bonding/bonding.h        |    1 +
>> > include/linux/if.h                   |    1 +
>> > net/core/dev.c                       |   26 +++++++++++---------
>> > 6 files changed, 88 insertions(+), 13 deletions(-)
>> >
>> >diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>> >index 61f516b..d64fd2f 100644
>> >--- a/Documentation/networking/bonding.txt
>> >+++ b/Documentation/networking/bonding.txt
>> >@@ -399,6 +399,21 @@ fail_over_mac
>> > 	This option was added in bonding version 3.2.0.  The "follow"
>> > 	policy was added in bonding version 3.3.0.
>> >
>> >+keep_all
>> >+
>> >+	Option to specify whether or not you will keep all frames
>> >+	received on an interface that is a member of a bond.  Right
>> >+	now checking is done to ensure that most frames ultimately
>> >+	classified as duplicates are dropped to keep noise to a
>> >+	minimum.  The feature to drop duplicates was added in kernel
>> >+	version 2.6.16 (bonding driver version 3.0.2) and this will
>> >+	allow that original behavior to be restored if desired.
>> >+
>> >+	A value of 0 (default) will preserve the current behavior and
>> >+	will drop all duplicate frames the bond may receive.  A value
>> >+	of 1 will not attempt to avoid duplicate frames and pass all
>> >+	of them up the stack.
>> 
>> 	Two thoughts (presuming for the moment that this doesn't
>> change): first, bump the driver version and mention when it was added;
>> second, mention that this only applies to active-backup mode.
>> 
>
>Happy to update the version.  But shouldn't this impact ALB and TLB
>modes too since they have a concept of 'active' slaves?
>
>> > lacp_rate
>> >
>> > 	Option specifying the rate in which we'll ask our link partner
>
><snip>
>
>> >--- a/net/core/dev.c
>> >+++ b/net/core/dev.c
>> >@@ -2758,21 +2758,23 @@ int __skb_bond_should_drop(struct sk_buff *skb, struct net_device *master)
>> > 		skb_bond_set_mac_by_master(skb, master);
>> > 	}
>> >
>> >-	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>> >-		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>> >-		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
>> >-			return 0;
>> >+	if (unlikely(!(master->priv_flags & IFF_BONDING_KEEP_ALL))) {
>> 
>> 	So it's unlikely that "keep all" will be turned off?
>> 
>
>Grrrr.  That should be an if(likely!(....  Good catch.
>
>> >+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>> >+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>> >+			    skb->protocol == __cpu_to_be16(ETH_P_ARP))
>> >+				return 0;
>> >
>> >-		if (master->priv_flags & IFF_MASTER_ALB) {
>> >-			if (skb->pkt_type != PACKET_BROADCAST &&
>> >-			    skb->pkt_type != PACKET_MULTICAST)
>> >+			if (master->priv_flags & IFF_MASTER_ALB) {
>> >+				if (skb->pkt_type != PACKET_BROADCAST &&
>> >+				    skb->pkt_type != PACKET_MULTICAST)
>> >+					return 0;
>> >+			}
>> >+			if (master->priv_flags & IFF_MASTER_8023AD &&
>> >+			    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
>> > 				return 0;
>> >-		}
>> >-		if (master->priv_flags & IFF_MASTER_8023AD &&
>> >-		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
>> >-			return 0;
>> >
>> >-		return 1;
>> >+			return 1;
>> >+		}
>> > 	}
>> > 	return 0;
>> > }
>> >-- 
>> >1.6.2.5

 	-J
 
 ---
 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* [PATCH] net sched: cleanup and rate limit warning
From: Stephen Hemminger @ 2010-05-12  0:24 UTC (permalink / raw)
  To: David Miller, jamal; +Cc: netdev

If the user has a bad classification configuration and gets a packet
that goes through too many steps, chances are more packets will arrive,
and the message spew will overrun syslog because it is not rate limited.
And because it is not tagged with an appropriate priority, it cannot be
screened.

Added the qdisc to the message to try and give some more context when
the message does arrive.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Please consider this for 2.6.34; it could even be -stable material.

--- a/net/sched/sch_api.c	2010-05-11 17:08:42.177374275 -0700
+++ b/net/sched/sch_api.c	2010-05-11 17:16:59.560612078 -0700
@@ -1637,9 +1638,12 @@ reclassify:
 		tp = otp;
 
 		if (verd++ >= MAX_REC_LOOP) {
-			printk("rule prio %u protocol %02x reclassify loop, "
-			       "packet dropped\n",
-			       tp->prio&0xffff, ntohs(tp->protocol));
+			if (net_ratelimit())
+				printk(KERN_NOTICE
+				       "%s: packet reclassify loop"
+					  " rule prio %u protocol %02x\n",
+				       tp->q->ops->id,
+				       tp->prio & 0xffff, ntohs(tp->protocol));
 			return TC_ACT_SHOT;
 		}
 		skb->tc_verd = SET_TC_VERD(skb->tc_verd, verd);
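
For readers unfamiliar with the idea behind the net_ratelimit() guard,
the following is a rough user-space sketch of a windowed rate limiter.
It is not the kernel's implementation: the struct, the helper name
ratelimit_ok(), and the interval and burst values used with it are
assumptions chosen for illustration, and time is passed in explicitly so
the behavior is easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limiter state: at most 'burst' messages per 'interval'. */
struct ratelimit {
	long interval;      /* window length, in caller-defined ticks */
	int burst;          /* messages allowed per window */
	long window_start;  /* tick at which the current window began */
	int printed;        /* messages already allowed in this window */
};

/* Returns true if a message may be emitted at time 'now'. */
bool ratelimit_ok(struct ratelimit *rl, long now)
{
	assert(rl != NULL && rl->interval > 0 && rl->burst > 0);

	/* Start a fresh window once the old one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false;       /* over budget: caller stays silent */
}
```

A caller would wrap its logging in "if (ratelimit_ok(&rl, now))", so a
stream of looping packets yields a bounded number of log lines per
window instead of one line per packet.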

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/2] bonding: allow user-controlled output slave selection
From: Neil Horman @ 2010-05-12  0:27 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Andy Gospodarek, netdev
In-Reply-To: <17897.1273608579@death.nxdomain.ibm.com>

On Tue, May 11, 2010 at 01:09:39PM -0700, Jay Vosburgh wrote:
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> >This patch give the user the ability to control the output slave for
> >round-robin and active-backup bonding.  Similar functionality was
> >discussed in the past, but Jay Vosburgh indicated he would rather see a
> >feature like this added to existing modes rather than creating a
> >completely new mode.  Jay's thoughts as well as Neil's input surrounding
> >some of the issues with the first implementation pushed us toward a
> >design that relied on the queue_mapping rather than skb marks.
> >Round-robin and active-backup modes were chosen as the first users of
> >this slave selection as they seemed like the most logical choices when
> >considering a multi-switch environment.
> >
> >Round-robin mode works without any modification, but active-backup does
> >require inclusion of the first patch in this series and setting
> >the 'keep_all' flag.  This will allow reception of unicast traffic on
> >any of the backup interfaces.
> 
> 	Yes, I did think that the mark business fit better into existing
> modes (I thought of it as kind of a new hash for xor and 802.3ad modes).
> I also didn't expect to see so much new stuff (this, as well as the FCOE
> special cases being discussed elsewhere) being shoehorned into the
> active-backup mode.  I'm not so sure that adding so many special cases
> to active-backup is a good thing.
> 
> 	Now, I'm starting to wonder if you were right, and it would be
> better overall to have a "manual" mode that would hopefully satisfy this
> case as well as the FCOE special case.  I don't think either of these is
> a bad use case, I'm just not sure the right way to handle them is
> another special knob in active-backup mode (either directly, or
> implicitly in __netif_receive_skb), which wasn't what I expected to see.
> 
I honestly don't think a separate mode is warranted here.  While I'm not opposed
to adding a new mode, I really think doing so is no different from overloading
an existing mode.  I say that because adding a new mode in which we explicitly
expect traffic to be directed to various slaves requires that we implement a
policy for frames which have no queue mapping determined on egress.  Any policy
I can think of is really an approximation of an existing policy, so we may as
well reuse the policy code that we already have in place.  About the only way a
separate mode makes sense is in the 'passthrough' queue mode you document below.
In this model, in which queue ids map to slaves in a 1:1 fashion, it doesn't make
sense.


> 	I presume you're overloading active-backup because it's not
> etherchannel, 802.3ad, etc, and just talks right to the switch.  For the
> regular load balance modes, I still think overlay into the existing
> modes is preferable (more on that later); I'm thinking of "manual"
> instead of another tweak to active-backup.
> 
> 	If users want to have actual hot-standby functionality, then
> active-backup would do that, and nothing else (and it can be multi-queue
> aware, but only one slave active at a time).
> 
Yes, but active-backup doesn't provide preferred output path selection in and of
itself.  That's the feature here.

> 	Users who want the set of bonded slaves to look like a big
> multiqueue buffet could use this "manual" mode and set things up however
> they want.  One way to set it up is simply that the bond is N queues
> wide, where N is the total of the queue counts of all the slaves.  If a
> slave fails, N gets smaller, and the user code has to deal with that.
> Since the queue count of a device can't change dynamically, the bond
> would have to actually be set up with some big number of queues, and
> then only a subset is actually active (or there is some sort of wrap).
> 
> 	In such an implementation, each slave would have a range of
> queue IDs, not necessarily just one.  I'm a bit leery of exposing an API
> where each slave is one queue ID, as it could make transitioning to real
> multi-queue awareness difficult.
> 
I'm sorry, what exactly do you mean when you say 'real' multi queue
awareness?  How is this any less real than any other implementation?  The
approach you outline above isn't any more or less valid than this one.

While we're on the subject, Andy and I did discuss a model similar to what you
describe above (what I'll refer to as a queue id passthrough model), in which
you can tell the bonding driver to map a frame to a queue, and the bonding
driver doesn't really do anything with the queue id other than pass it to the
slave device for hardware based multiqueue tx handling.  While we could do that,
it's my feeling such a model isn't the way to go for two primary reasons:

1) Inconsistent behavior.  Such an implementation makes assumptions regarding
queue id specification within a driver.  For example, what if one of the slaves
reserves some fixed number of low order queues for a specific purpose, and as
such general use queues begin at an offset from zero, while other slaves do not?
While it's easy to accommodate such needs when writing the tc filters, if a slave
fails over, such a bias would change output traffic behavior, as the bonding
driver can't be clearly informed of such a bias.  Likewise, what if a slave
driver allocates more queues than it actually supports in hardware (like the
implementation you propose; ixgbe IIRC actually does this)?  If slaves handled
unimplemented tx queues differently (if one wrapped queues, while the other
simply dropped frames to unimplemented queues, for instance), a failover would
change traffic patterns dramatically.

2) Need.  While (1) can pretty easily be managed with a few configuration
guidelines (output queues on slaves have to be configured identically, lest
chaos and madness befall you, etc), there's really no reason to bind users to
such a system.  We're using tc filters to set the queue id on skbs enqueued to
the bonding driver; there's absolutely no reason you can't add additional
filters to the slaves directly.  Since the bonding driver uses dev_queue_xmit to
send a frame to a slave, it has the opportunity to pass through another set of
queuing disciplines and filters that can reset and re-assign the skb's queue
mapping.  So with the approach in this patch you can get direct output control
without sacrificing actual hardware tx output queue control.  With a passthrough
model, you save a bit of filter configuration, but at the expense of having to
be much more careful about how you configure your slave nics, and such
configuration errors would be rather difficult to track down, as detecting them
would require generating traffic that hits the right filter after a failover.
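
The failover concern in (1) can be illustrated with two made-up drivers
that disagree about out-of-range queue ids.  Both helpers and the
QUEUE_DROPPED constant are hypothetical; this is a sketch of the
argument, not a description of how any particular driver behaves.

```c
#include <assert.h>

/* Hypothetical marker for "frame was dropped, no queue selected". */
#define QUEUE_DROPPED (-1)

/* Hypothetical driver A: wraps unimplemented queue ids onto real ones. */
int tx_queue_wrap(unsigned int queue_id, unsigned int real_queues)
{
	assert(real_queues > 0);
	return (int)(queue_id % real_queues);
}

/* Hypothetical driver B: drops frames aimed at unimplemented queues. */
int tx_queue_strict(unsigned int queue_id, unsigned int real_queues)
{
	assert(real_queues > 0);
	return queue_id < real_queues ? (int)queue_id : QUEUE_DROPPED;
}
```

With four real queues, a frame tagged for queue 5 goes out on queue 1
under the first policy and is dropped under the second, so the same
filter setup produces different traffic after a failover between two
such slaves.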


> 	There might also be a way to tie it in to the new RPS code on
> the receive side.
> 
> 	If the slaves all have the same MAC and attach to a single
> switch via etherchannel, then it all looks pretty much like a single big
> honkin' multiqueue device.  The switch probably won't map the flows back
> the same way, though.
> 
I agree, they probably won't.  Receive side handling wasn't really our focus here
though.  That's largely why we chose round robin and active backup as our first
modes to use this with.  They are already written to expect frames on either
interface.

> 	If the slaves are on discrete switches (without etherchannel),
> things become more complicated.  If the slaves have the same MAC, then
> the switches will be irritated about seeing that same MAC coming in from
> multiple places.  If the slaves have different MACs, then ARP has the
> same sort of issues.
> 
> 	In thinking about it, if it's linux bonding at both ends, there
> could be any number of discrete switches in the path, and it wouldn't
> matter as long as the linux end can work things out, e.g.,
> 
>         -- switch 1 --
> hostA  /              \  hostB
> bond  ---- switch 2 ---- bond
>        \              /
>         -- switch 3 --
> 
> 	For something like this, the switches would never share MAC
> information for the bonding slaves.  The issue here then becomes more of
> detecting link failures (it would require either a "trunk failover" type
> of function on the switch, or some kind of active probe between the
> bonds).
> 
> 	Now, I realize that I'm babbling a bit, as from reading your
> description, this isn't necessarily your target topology (which sounded
> more like a case of slave A can reach only network X, and slave B can
> reach anywhere, so sending to network X should use slave A
> preferentially), or, as long as I'm doing ASCII-art,
> 
>        --- switch 1 ---- network X
> hostA /               /
> bond  ---- switch 2 -+-- anywhere
> 
> 	Is that an accurate representation?  Or is it something a bit
> different, e.g.,
> 
>        --- switch 1 ---- network X -\
> hostA /                             /
> bond  ---- switch 2 ---- anywhere --
> 
> 	I.e., the "anywhere" connects back to network X from the
> outside, so to speak.  Or, oh, maybe I'm missing it entirely, and you're
> thinking of something like this:
> 
>        --- switch 1 --- VPN --- web site
> hostA /                          /
> bond  ---- switch 2 - Internet -/
> 
> 	Where you prefer to hit "web site" via the VPN (perhaps it's a
> more efficient or secure path), but can do it from the public network at
> large if necessary.
> 
Yes, this one.  I think the other models are equally interesting, but this model,
in which either path has universal reachability but for some classes of traffic
one path is preferred over the other, is the one we had in mind.

> 	Now, regardless of the above, your first patch ("keep_all") is
> to deal with the reverse problem, if this is a piggyback on top of
> active-backup mode: how to get packets back, when both channels can be
> active simultaneously.  That actually dovetails to a degree with work
> I've been doing lately, but the solution there probably isn't what
> you're looking for (there's a user space daemon to do path finding, and
> the "bond IP" address is piggybacked on the slaves' MAC addresses, which
> are not changed; the "bond IP" set exists in a separate subnet all its
> own).
> 
> 	As I said, I'm not convinced that the "keep_all" option to
> active-backup is really better than just a "manual" mode that lacks the
> dup suppression and expects the user to set everything up.
> 
> 	As for the round-robin change in this patch, if I'm reading it
> right, then the way it works is that the packets are round-robined,
> unless there's a queue id passed in, in which case it's assigned to the
> slave mapped to that queue id.  I'm not entirely sure why you picked
> round-robin mode for that over balance-xor; it doesn't seem to fit well
> with the description in the documentation.  Or is it just sort of a
> demonstrator?
> 
It was selected because round robin allows transmits on any interface already,
and expects frames on any interface, so it was a 'safe' choice.  I would think
balance-xor would also work.  Ideally it would be nice to get more modes
supporting this mechanism.
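
A rough user-space sketch of the selection rule being discussed,
assuming a queue mapping of 0 means "no preference": a non-zero mapping
picks the slave that owns that queue id if it is up, and anything else
falls back to plain round-robin.  The struct layouts and
bond_pick_slave() are hypothetical and are not lifted from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slave: one queue id per slave, as in the model above. */
struct slave {
	const char *name;
	unsigned int queue_id;
	bool up;
};

/* Hypothetical bond: array of slaves plus a round-robin cursor. */
struct bond {
	struct slave *slaves;
	size_t count;
	size_t rr_next;
};

/* Returns the slave to transmit on, or NULL if no slave is up. */
struct slave *bond_pick_slave(struct bond *b, unsigned int queue_mapping)
{
	size_t i;

	assert(b != NULL && b->count > 0);

	/* Explicit mapping: honor it when the mapped slave is usable. */
	if (queue_mapping != 0) {
		for (i = 0; i < b->count; i++) {
			struct slave *s = &b->slaves[i];

			if (s->queue_id == queue_mapping && s->up)
				return s;
		}
	}

	/* No mapping, or mapped slave down: ordinary round-robin. */
	for (i = 0; i < b->count; i++) {
		struct slave *s = &b->slaves[b->rr_next];

		b->rr_next = (b->rr_next + 1) % b->count;
		if (s->up)
			return s;
	}
	return NULL;
}
```

The point of the fallback is that a mapped frame is never stranded when
its preferred slave goes down; it simply follows the mode's normal
policy, which is why reusing an existing mode's policy code is
attractive.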

> 	I do like one other aspect of the patch, and that's the concept
> of overlaying the queue map on top of the balance algorithm.  So, e.g.,
> balance-xor would do its usual thing, unless the packet is queue mapped,
> in which case the packet's assignment is obeyed.  The balance-xor could
> even optionally do its xor across the full set of all slaves output
> queues instead of just across the slaves.  Round-robin can operate
> similarly.  For those modes, a "balance by queue vs. balance by slave"
> seems like a reasonable knob to have.
Not sure what you mean here.  In the model implemented by this patch, there is
one output queue per slave, and as such, balance by queue == balance by slave.
That would make sense in the model you described earlier in this note, but not in
the model presented by this patch.

> 
> 	I do understand that you're proposing something relatively
> simple, and I'm thinking out loud about alternate or additional
> implementation details.  Some of this is "ooh ahh what if", but we also
> don't want to end up with something that's forwards incompatible, and
> I'm hoping to find one solution to multiple problems.
> 
For clarification, can you enumerate what other problems you are trying to
solve with this feature, or features similar to this?  From this email, the one
that I most clearly see is the desire to allow a passthrough mode of queue
selection, which I think I've noted can be done already (even without this
patch), by attaching additional tc filters to the slaves' output queues directly.
What else do you have in mind?

Thanks & Regards
Neil

> 

^ permalink raw reply

* [PATCH net-next 01/16] tipc: Eliminate obsolete port's "congested_link" field
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <1273624218-22514-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Eliminate a field of the TIPC port structure that is populated,
but never referenced.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/link.c |    2 --
 net/tipc/port.c |    1 -
 net/tipc/port.h |    2 --
 3 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index c76e82e..0b86f6a 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -561,7 +561,6 @@ static int link_schedule_port(struct link *l_ptr, u32 origport, u32 sz)
 			goto exit;
 		if (!list_empty(&p_ptr->wait_list))
 			goto exit;
-		p_ptr->congested_link = l_ptr;
 		p_ptr->publ.congested = 1;
 		p_ptr->waiting_pkts = 1 + ((sz - 1) / link_max_pkt(l_ptr));
 		list_add_tail(&p_ptr->wait_list, &l_ptr->waiting_ports);
@@ -592,7 +591,6 @@ void tipc_link_wakeup_ports(struct link *l_ptr, int all)
 		if (win <= 0)
 			break;
 		list_del_init(&p_ptr->wait_list);
-		p_ptr->congested_link = NULL;
 		spin_lock_bh(p_ptr->publ.lock);
 		p_ptr->publ.congested = 0;
 		p_ptr->wakeup(&p_ptr->publ);
diff --git a/net/tipc/port.c b/net/tipc/port.c
index e70d27e..c703ecb 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -247,7 +247,6 @@ struct tipc_port *tipc_createport_raw(void *usr_handle,
 	p_ptr->sent = 1;
 	INIT_LIST_HEAD(&p_ptr->wait_list);
 	INIT_LIST_HEAD(&p_ptr->subscription.nodesub_list);
-	p_ptr->congested_link = NULL;
 	p_ptr->dispatcher = dispatcher;
 	p_ptr->wakeup = wakeup;
 	p_ptr->user_port = NULL;
diff --git a/net/tipc/port.h b/net/tipc/port.h
index ff31ee4..8d1652a 100644
--- a/net/tipc/port.h
+++ b/net/tipc/port.h
@@ -75,7 +75,6 @@ struct user_port {
  * @wakeup: ptr to routine to call when port is no longer congested
  * @user_port: ptr to user port associated with port (if any)
  * @wait_list: adjacent ports in list of ports waiting on link congestion
- * @congested_link: ptr to congested link port is waiting on
  * @waiting_pkts:
  * @sent:
  * @acked:
@@ -95,7 +94,6 @@ struct port {
 	void (*wakeup)(struct tipc_port *);
 	struct user_port *user_port;
 	struct list_head wait_list;
-	struct link *congested_link;
 	u32 waiting_pkts;
 	u32 sent;
 	u32 acked;
-- 
1.7.1.rc2


^ permalink raw reply related

* [PATCH net-next 02/16] tipc: Eliminate unused argument in print statement
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Eliminate an argument in a print statement that has no corresponding
format specification.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
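As background for the change above: in standard C, the printf family
evaluates but otherwise ignores any arguments left over once the format
string is exhausted, so a surplus argument is harmless at runtime yet
misleading to readers. A minimal user-space sketch of the corrected call
shape follows. format_last_seqno() is a hypothetical stand-in, and
snprintf() stands in for tipc_printf(), whose internals are not shown in
this patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats a sequence number the way the patched
 * tipc_printf() call does, using one "%u" conversion and one argument.
 */
static int format_last_seqno(char *out, size_t len, unsigned int last_seqno)
{
	/* The format consumes exactly the arguments supplied. */
	return snprintf(out, len, "%u]", last_seqno);
}
```

For functions that carry a printf format annotation, GCC's
-Wformat-extra-args (part of -Wformat) reports leftover arguments of this
kind at compile time.
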
 net/tipc/link.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 0b86f6a..c95038f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -3328,9 +3328,7 @@ static void link_print(struct link *l_ptr, struct print_buf *buf,
 		if (l_ptr->next_out)
 			tipc_printf(buf, "%u..",
 				    msg_seqno(buf_msg(l_ptr->next_out)));
-		tipc_printf(buf, "%u]",
-			    msg_seqno(buf_msg
-				      (l_ptr->last_out)), l_ptr->out_queue_size);
+		tipc_printf(buf, "%u]", msg_seqno(buf_msg(l_ptr->last_out)));
 		if ((mod(msg_seqno(buf_msg(l_ptr->last_out)) -
 			 msg_seqno(buf_msg(l_ptr->first_out)))
 		     != (l_ptr->out_queue_size - 1)) ||
-- 
1.7.1.rc2



* [PATCH net-next 0/16] tipc: 1st integration of basic changes from sourceforge
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens

The following patches are step one in making use of the changes that
were stored away on sourceforge but not yet integrated into the kernel.

Since I am far from knowledgeable on tipc, this starts by pulling in
the 1st batch of basic/cosmetic changes that will at least start to
reduce the delta between the two.   Hopefully getting all the simple
stuff out of the way 1st will help clarify what is left of interest.

I've also put the same commits on the branch tipc-May11_2010 in:
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6.git
in case that is more convenient for people.

Thanks,
Paul.



* [PATCH net-next 03/16] tipc: Prune unused data structures from configuration service
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Eliminate some unused data structures in the TIPC
configuration service that relate to the handling of link
subscriptions, which were not supported when TIPC 1.5 was
introduced.  If and when support for link subscriptions is
offered in TIPC, these elements may need to be re-introduced.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/config.c |   28 ++++++++++++++--------------
 1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/tipc/config.c b/net/tipc/config.c
index ca3544d..7370241 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -56,9 +56,6 @@ struct subscr_data {
 struct manager {
 	u32 user_ref;
 	u32 port_ref;
-	u32 subscr_ref;
-	u32 link_subscriptions;
-	struct list_head link_subscribers;
 };
 
 static struct manager mng = { 0};
@@ -70,12 +67,6 @@ static int req_tlv_space;		/* request message TLV area size */
 static int rep_headroom;		/* reply message headroom to use */
 
 
-void tipc_cfg_link_event(u32 addr, char *name, int up)
-{
-	/* TIPC DOESN'T HANDLE LINK EVENT SUBSCRIPTIONS AT THE MOMENT */
-}
-
-
 struct sk_buff *tipc_cfg_reply_alloc(int payload_size)
 {
 	struct sk_buff *buf;
@@ -130,12 +121,24 @@ struct sk_buff *tipc_cfg_reply_string_type(u16 tlv_type, char *string)
 }
 
 
-
-
 #if 0
 
 /* Now obsolete code for handling commands not yet implemented the new way */
 
+/*
+ * Some of this code assumed that the manager structure contains two added
+ * fields:
+ *	u32 link_subscriptions;
+ *	struct list_head link_subscribers;
+ * which are currently not present.  These fields may need to be re-introduced
+ * if and when support for link subscriptions is added.
+ */
+
+void tipc_cfg_link_event(u32 addr, char *name, int up)
+{
+	/* TIPC DOESN'T HANDLE LINK EVENT SUBSCRIPTIONS AT THE MOMENT */
+}
+
 int tipc_cfg_cmd(const struct tipc_cmd_msg * msg,
 		 char *data,
 		 u32 sz,
@@ -667,9 +670,6 @@ int tipc_cfg_init(void)
 	struct tipc_name_seq seq;
 	int res;
 
-	memset(&mng, 0, sizeof(mng));
-	INIT_LIST_HEAD(&mng.link_subscribers);
-
 	res = tipc_attach(&mng.user_ref, NULL, NULL);
 	if (res)
 		goto failed;
-- 
1.7.1.rc2



* [PATCH net-next 04/16] tipc: Eliminate unnecessary initialization in native API send routines
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Eliminate a couple of instances where TIPC's native API send routines
were doing pointless initialization of local variables.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
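A small sketch of the pattern being cleaned up, assuming (as the commit
text implies) that the local is always assigned before it is first read.
lookup_port() and resolve_destport() are hypothetical names used only for
illustration; they are not TIPC functions.

```c
#include <stdint.h>

/* Hypothetical lookup standing in for a name-to-port translation. */
static uint32_t lookup_port(uint32_t type, uint32_t instance)
{
	return type * 1000u + instance;
}

static uint32_t resolve_destport(uint32_t type, uint32_t instance)
{
	uint32_t destport;	/* no "= 0": assigned unconditionally below */

	destport = lookup_port(type, instance);
	return destport;
}
```

Leaving the variable uninitialized also lets the compiler warn if a later
change introduces a path that reads it before it is assigned.
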
 net/tipc/port.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index c703ecb..7641db6 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -1452,7 +1452,7 @@ int tipc_forward2name(u32 ref,
 	struct port *p_ptr;
 	struct tipc_msg *msg;
 	u32 destnode = domain;
-	u32 destport = 0;
+	u32 destport;
 	int res;
 
 	p_ptr = tipc_port_deref(ref);
@@ -1524,7 +1524,7 @@ int tipc_forward_buf2name(u32 ref,
 	struct port *p_ptr;
 	struct tipc_msg *msg;
 	u32 destnode = domain;
-	u32 destport = 0;
+	u32 destport;
 	int res;
 
 	p_ptr = (struct port *)tipc_ref_deref(ref);
-- 
1.7.1.rc2



* [PATCH net-next 05/16] tipc: Rename "multicast-link" to "broadcast-link"
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Make a cosmetic change to the name displayed for the broadcast link,
to better reflect its true nature. Since TIPC utilizes this link to
distribute name table information, in addition to multicast messages
sent by user applications, the prior name "multicast-link" is
no longer appropriate.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/bcast.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 90a0519..a18f26d 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -119,7 +119,7 @@ static struct bclink *bclink = NULL;
 static struct link *bcl = NULL;
 static DEFINE_SPINLOCK(bc_lock);
 
-const char tipc_bclink_name[] = "multicast-link";
+const char tipc_bclink_name[] = "broadcast-link";
 
 
 static u32 buf_seqno(struct sk_buff *buf)
-- 
1.7.1.rc2



* [PATCH net-next 06/16] tipc: Add support for "-s" configuration option
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Provide initial support for displaying overall TIPC status/statistics
information at runtime.  Currently, only version info for the TIPC
kernel module is displayed.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
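The reply built by this patch is a single string carried in a
type-length-value (TLV) record. The following user-space sketch shows the
general idea of packing a string into such a record. Everything prefixed
demo_ or DEMO_ is hypothetical; the header layout (16-bit length followed
by 16-bit type, both in network byte order, padded to 4 bytes) is an
assumption for illustration and is not copied from the kernel's TLV macros.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htons() */

/* Simplified stand-in for a TLV descriptor; the layout is an assumption. */
struct demo_tlv {
	uint16_t len;	/* network byte order: header plus payload bytes */
	uint16_t type;	/* network byte order */
};

#define DEMO_TLV_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)
#define DEMO_TLV_LENGTH(n)	(sizeof(struct demo_tlv) + (size_t)(n))
#define DEMO_TLV_SPACE(n)	DEMO_TLV_ALIGN(DEMO_TLV_LENGTH(n))

/*
 * Write one string TLV into buf. Returns the padded space consumed,
 * or 0 if the string does not fit.
 */
static size_t demo_tlv_put_string(void *buf, size_t buf_len,
				  uint16_t type, const char *str)
{
	size_t str_len = strlen(str) + 1;	/* include the NUL */
	size_t space;
	struct demo_tlv hdr;

	if (str_len > 0xffffu - sizeof(hdr))
		return 0;			/* length field would overflow */
	space = DEMO_TLV_SPACE(str_len);
	if (space > buf_len)
		return 0;

	memset(buf, 0, space);			/* zero the padding bytes */
	hdr.len = htons((uint16_t)DEMO_TLV_LENGTH(str_len));
	hdr.type = htons(type);
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), str, str_len);
	return space;
}
```

The length field records the unpadded size while the return value is the
padded size, which is the distinction between "length" and "space" that
the sketch is meant to illustrate.
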
 include/linux/tipc_config.h |    1 +
 net/tipc/config.c           |   40 +++++++++++++++++++++++++++++++++++++++-
 net/tipc/core.c             |    2 --
 net/tipc/core.h             |    3 +++
 4 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/include/linux/tipc_config.h b/include/linux/tipc_config.h
index 2bc6fa4..9cde86c 100644
--- a/include/linux/tipc_config.h
+++ b/include/linux/tipc_config.h
@@ -74,6 +74,7 @@
 #define  TIPC_CMD_SHOW_NAME_TABLE   0x0005    /* tx name_tbl_query, rx ultra_string */
 #define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
 #define  TIPC_CMD_SHOW_LINK_STATS   0x000B    /* tx link_name, rx ultra_string */
+#define  TIPC_CMD_SHOW_STATS        0x000F    /* tx unsigned, rx ultra_string */
 
 #if 0
 #define  TIPC_CMD_SHOW_PORT_STATS   0x0008    /* tx port_ref, rx ultra_string */
diff --git a/net/tipc/config.c b/net/tipc/config.c
index 7370241..961d1b0 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -246,13 +246,48 @@ static void cfg_cmd_event(struct tipc_cmd_msg *msg,
 	default:
 		rv = tipc_cfg_cmd(msg, data, sz, (u32 *)&msg_sect[1].iov_len, orig);
 	}
-	exit:
+exit:
 	rmsg.result_len = htonl(msg_sect[1].iov_len);
 	rmsg.retval = htonl(rv);
 	tipc_cfg_respond(msg_sect, 2u, orig);
 }
 #endif
 
+#define MAX_STATS_INFO 2000
+
+static struct sk_buff *tipc_show_stats(void)
+{
+	struct sk_buff *buf;
+	struct tlv_desc *rep_tlv;
+	struct print_buf pb;
+	int str_len;
+	u32 value;
+
+	if (!TLV_CHECK(req_tlv_area, req_tlv_space, TIPC_TLV_UNSIGNED))
+		return tipc_cfg_reply_error_string(TIPC_CFG_TLV_ERROR);
+
+	value = ntohl(*(u32 *)TLV_DATA(req_tlv_area));
+	if (value != 0)
+		return tipc_cfg_reply_error_string("unsupported argument");
+
+	buf = tipc_cfg_reply_alloc(TLV_SPACE(MAX_STATS_INFO));
+	if (buf == NULL)
+		return NULL;
+
+	rep_tlv = (struct tlv_desc *)buf->data;
+	tipc_printbuf_init(&pb, (char *)TLV_DATA(rep_tlv), MAX_STATS_INFO);
+
+	tipc_printf(&pb, "TIPC version " TIPC_MOD_VER "\n");
+
+	/* Use additional tipc_printf()'s to return more info ... */
+
+	str_len = tipc_printbuf_validate(&pb);
+	skb_put(buf, TLV_SPACE(str_len));
+	TLV_SET(rep_tlv, TIPC_TLV_ULTRA_STRING, NULL, str_len);
+
+	return buf;
+}
+
 static struct sk_buff *cfg_enable_bearer(void)
 {
 	struct tipc_bearer_config *args;
@@ -536,6 +571,9 @@ struct sk_buff *tipc_cfg_do_cmd(u32 orig_node, u16 cmd, const void *request_area
 	case TIPC_CMD_DUMP_LOG:
 		rep_tlv_buf = tipc_log_dump();
 		break;
+	case TIPC_CMD_SHOW_STATS:
+		rep_tlv_buf = tipc_show_stats();
+		break;
 	case TIPC_CMD_SET_LINK_TOL:
 	case TIPC_CMD_SET_LINK_PRI:
 	case TIPC_CMD_SET_LINK_WINDOW:
diff --git a/net/tipc/core.c b/net/tipc/core.c
index 4e84c84..b47d184 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -49,8 +49,6 @@
 #include "config.h"
 
 
-#define TIPC_MOD_VER "2.0.0"
-
 #ifndef CONFIG_TIPC_ZONES
 #define CONFIG_TIPC_ZONES 3
 #endif
diff --git a/net/tipc/core.h b/net/tipc/core.h
index c58a1d1..1e149f5 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -59,6 +59,9 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+
+#define TIPC_MOD_VER "2.0.0"
+
 /*
  * TIPC sanity test macros
  */
-- 
1.7.1.rc2



* [PATCH net-next 08/16] tipc: remove abstraction for link_max_pkt
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

link_max_pkt() is just a straight return of the max_pkt field; there
is no value in hiding it behind a function.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
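One of the call sites touched here computes how many packets a message of
sz bytes needs when each packet holds at most max_pkt bytes, using the
expression 1 + ((sz - 1) / max_pkt). That is a ceiling division. A small
sketch of the arithmetic follows; pkts_needed() is a hypothetical name,
and the sketch assumes sz >= 1 and max_pkt >= 1.

```c
#include <stdint.h>

/*
 * Packets needed to carry sz bytes at max_pkt bytes per packet.
 * Equivalent to ceil(sz / max_pkt) for sz >= 1 and max_pkt >= 1.
 */
static uint32_t pkts_needed(uint32_t sz, uint32_t max_pkt)
{
	return 1 + ((sz - 1) / max_pkt);
}
```

With max_pkt = 1500, a 1500-byte message needs 1 packet and a 1501-byte
message needs 2.
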
 net/tipc/link.c |   27 +++++++++++----------------
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index c95038f..441b26a 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -232,11 +232,6 @@ static int link_congested(struct link *l_ptr)
 	return (l_ptr->out_queue_size >= l_ptr->queue_limit[0]);
 }
 
-static u32 link_max_pkt(struct link *l_ptr)
-{
-	return l_ptr->max_pkt;
-}
-
 static void link_init_max_pkt(struct link *l_ptr)
 {
 	u32 max_pkt;
@@ -562,7 +557,7 @@ static int link_schedule_port(struct link *l_ptr, u32 origport, u32 sz)
 		if (!list_empty(&p_ptr->wait_list))
 			goto exit;
 		p_ptr->publ.congested = 1;
-		p_ptr->waiting_pkts = 1 + ((sz - 1) / link_max_pkt(l_ptr));
+		p_ptr->waiting_pkts = 1 + ((sz - 1) / l_ptr->max_pkt);
 		list_add_tail(&p_ptr->wait_list, &l_ptr->waiting_ports);
 		l_ptr->stats.link_congs++;
 exit:
@@ -1015,7 +1010,7 @@ static int link_bundle_buf(struct link *l_ptr,
 		return 0;
 	if (skb_tailroom(bundler) < (pad + size))
 		return 0;
-	if (link_max_pkt(l_ptr) < (to_pos + size))
+	if (l_ptr->max_pkt < (to_pos + size))
 		return 0;
 
 	skb_put(bundler, pad + size);
@@ -1062,7 +1057,7 @@ int tipc_link_send_buf(struct link *l_ptr, struct sk_buff *buf)
 	u32 queue_size = l_ptr->out_queue_size;
 	u32 imp = msg_tot_importance(msg);
 	u32 queue_limit = l_ptr->queue_limit[imp];
-	u32 max_packet = link_max_pkt(l_ptr);
+	u32 max_packet = l_ptr->max_pkt;
 
 	msg_set_prevnode(msg, tipc_own_addr);	/* If routed message */
 
@@ -1193,7 +1188,7 @@ static int link_send_buf_fast(struct link *l_ptr, struct sk_buff *buf,
 	int res = msg_data_sz(msg);
 
 	if (likely(!link_congested(l_ptr))) {
-		if (likely(msg_size(msg) <= link_max_pkt(l_ptr))) {
+		if (likely(msg_size(msg) <= l_ptr->max_pkt)) {
 			if (likely(list_empty(&l_ptr->b_ptr->cong_links))) {
 				link_add_to_outqueue(l_ptr, buf, msg);
 				if (likely(tipc_bearer_send(l_ptr->b_ptr, buf,
@@ -1210,7 +1205,7 @@ static int link_send_buf_fast(struct link *l_ptr, struct sk_buff *buf,
 			}
 		}
 		else
-			*used_max_pkt = link_max_pkt(l_ptr);
+			*used_max_pkt = l_ptr->max_pkt;
 	}
 	return tipc_link_send_buf(l_ptr, buf);  /* All other cases */
 }
@@ -1317,7 +1312,7 @@ exit:
 			 * then re-try fast path or fragment the message
 			 */
 
-			sender->publ.max_pkt = link_max_pkt(l_ptr);
+			sender->publ.max_pkt = l_ptr->max_pkt;
 			tipc_node_unlock(node);
 			read_unlock_bh(&tipc_net_lock);
 
@@ -1480,8 +1475,8 @@ error:
 			tipc_node_unlock(node);
 			goto reject;
 		}
-		if (link_max_pkt(l_ptr) < max_pkt) {
-			sender->publ.max_pkt = link_max_pkt(l_ptr);
+		if (l_ptr->max_pkt < max_pkt) {
+			sender->publ.max_pkt = l_ptr->max_pkt;
 			tipc_node_unlock(node);
 			for (; buf_chain; buf_chain = buf) {
 				buf = buf_chain->next;
@@ -2679,7 +2674,7 @@ int tipc_link_send_long_buf(struct link *l_ptr, struct sk_buff *buf)
 	u32 dsz = msg_data_sz(inmsg);
 	unchar *crs = buf->data;
 	u32 rest = insize;
-	u32 pack_sz = link_max_pkt(l_ptr);
+	u32 pack_sz = l_ptr->max_pkt;
 	u32 fragm_sz = pack_sz - INT_H_SIZE;
 	u32 fragm_no = 1;
 	u32 destaddr;
@@ -3125,7 +3120,7 @@ static int tipc_link_stats(const char *name, char *buf, const u32 buf_size)
 	tipc_printf(&pb, "Link <%s>\n"
 			 "  %s  MTU:%u  Priority:%u  Tolerance:%u ms"
 			 "  Window:%u packets\n",
-		    l_ptr->name, status, link_max_pkt(l_ptr),
+		    l_ptr->name, status, l_ptr->max_pkt,
 		    l_ptr->priority, l_ptr->tolerance, l_ptr->queue_limit[0]);
 	tipc_printf(&pb, "  RX packets:%u fragments:%u/%u bundles:%u/%u\n",
 		    l_ptr->next_in_no - l_ptr->stats.recv_info,
@@ -3270,7 +3265,7 @@ u32 tipc_link_get_max_pkt(u32 dest, u32 selector)
 		tipc_node_lock(n_ptr);
 		l_ptr = n_ptr->active_links[selector & 1];
 		if (l_ptr)
-			res = link_max_pkt(l_ptr);
+			res = l_ptr->max_pkt;
 		tipc_node_unlock(n_ptr);
 	}
 	read_unlock_bh(&tipc_net_lock);
-- 
1.7.1.rc2



* [PATCH net-next 07/16] tipc: Update commenting in TIPC API
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Eliminate comments in TIPC's main API files that are either obsolete,
incorrect, misleading, or unhelpful.  Also add one new comment.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/tipc.h    |    6 +++---
 include/net/tipc/tipc.h |   16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/tipc.h b/include/linux/tipc.h
index 9536d8a..181c8d0 100644
--- a/include/linux/tipc.h
+++ b/include/linux/tipc.h
@@ -107,7 +107,7 @@ static inline unsigned int tipc_node(__u32 addr)
  * Message importance levels
  */
 
-#define TIPC_LOW_IMPORTANCE		0  /* default */
+#define TIPC_LOW_IMPORTANCE		0
 #define TIPC_MEDIUM_IMPORTANCE		1
 #define TIPC_HIGH_IMPORTANCE		2
 #define TIPC_CRITICAL_IMPORTANCE	3
@@ -182,7 +182,7 @@ struct sockaddr_tipc {
 		struct tipc_name_seq nameseq;
 		struct {
 			struct tipc_name name;
-			__u32 domain; /* 0: own zone */
+			__u32 domain;
 		} name;
 	} addr;
 };
@@ -200,7 +200,7 @@ struct sockaddr_tipc {
  */
 
 #define TIPC_IMPORTANCE		127	/* Default: TIPC_LOW_IMPORTANCE */
-#define TIPC_SRC_DROPPABLE	128	/* Default: 0 (resend congested msg) */
+#define TIPC_SRC_DROPPABLE	128	/* Default: based on socket type */
 #define TIPC_DEST_DROPPABLE	129	/* Default: based on socket type */
 #define TIPC_CONN_TIMEOUT	130	/* Default: 8000 (ms)  */
 #define TIPC_NODE_RECVQ_DEPTH	131	/* Default: none (read only) */
diff --git a/include/net/tipc/tipc.h b/include/net/tipc/tipc.h
index 9566608..15af6dc 100644
--- a/include/net/tipc/tipc.h
+++ b/include/net/tipc/tipc.h
@@ -2,7 +2,7 @@
  * include/net/tipc/tipc.h: Main include file for TIPC users
  * 
  * Copyright (c) 2003-2006, Ericsson AB
- * Copyright (c) 2005, Wind River Systems
+ * Copyright (c) 2005,2010 Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -126,7 +126,7 @@ int tipc_createport(unsigned int tipc_user,
 		    tipc_msg_event message_cb, 
 		    tipc_named_msg_event named_message_cb, 
 		    tipc_conn_msg_event conn_message_cb, 
-		    tipc_continue_event continue_event_cb,/* May be zero */
+		    tipc_continue_event continue_event_cb,
 		    u32 *portref);
 
 int tipc_deleteport(u32 portref);
@@ -145,13 +145,13 @@ int tipc_set_portunreturnable(u32 portref, unsigned int isunreturnable);
 int tipc_publish(u32 portref, unsigned int scope, 
 		 struct tipc_name_seq const *name_seq);
 int tipc_withdraw(u32 portref, unsigned int scope,
-		  struct tipc_name_seq const *name_seq); /* 0: all */
+		  struct tipc_name_seq const *name_seq);
 
 int tipc_connect2port(u32 portref, struct tipc_portid const *port);
 
 int tipc_disconnect(u32 portref);
 
-int tipc_shutdown(u32 ref); /* Sends SHUTDOWN msg */
+int tipc_shutdown(u32 ref);
 
 int tipc_isconnected(u32 portref, int *isconnected);
 
@@ -176,7 +176,7 @@ int tipc_send_buf(u32 portref,
 
 int tipc_send2name(u32 portref, 
 		   struct tipc_name const *name, 
-		   u32 domain,	/* 0:own zone */
+		   u32 domain,
 		   unsigned int num_sect,
 		   struct iovec const *msg_sect);
 
@@ -188,7 +188,7 @@ int tipc_send_buf2name(u32 portref,
 
 int tipc_forward2name(u32 portref, 
 		      struct tipc_name const *name, 
-		      u32 domain,   /*0: own zone */
+		      u32 domain,
 		      unsigned int section_count,
 		      struct iovec const *msg_sect,
 		      struct tipc_portid const *origin,
@@ -228,14 +228,14 @@ int tipc_forward_buf2port(u32 portref,
 
 int tipc_multicast(u32 portref, 
 		   struct tipc_name_seq const *seq, 
-		   u32 domain,	/* 0:own zone */
+		   u32 domain,	/* currently unused */
 		   unsigned int section_count,
 		   struct iovec const *msg);
 
 #if 0
 int tipc_multicast_buf(u32 portref, 
 		       struct tipc_name_seq const *seq, 
-		       u32 domain,	/* 0:own zone */
+		       u32 domain,
 		       void *buf,
 		       unsigned int size);
 #endif
-- 
1.7.1.rc2



* [PATCH net-next 09/16] tipc: Relocate trivial link status functions to header file
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Rather than live in link.c, where they can be used only within that file,
these helper routines are better placed in link.h.

Relocated are the following:

	link_working_working
	link_working_unknown
	link_reset_unknown
	link_reset_reset
	link_blocked
	link_congested

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
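A miniature of the pattern, for readers less familiar with it: small
predicates defined as static inline in a header can be used by every file
that includes it, without a separate out-of-line definition. All names
below are hypothetical (demo_ prefix) and the structure is reduced to the
fields the two predicates need.

```c
/* Hypothetical miniature of a header holding link-state predicates. */
#ifndef DEMO_LINK_H
#define DEMO_LINK_H

enum demo_link_state {
	DEMO_WORKING_WORKING,
	DEMO_WORKING_UNKNOWN,
	DEMO_RESET_UNKNOWN,
	DEMO_RESET_RESET
};

struct demo_link {
	enum demo_link_state state;
	unsigned int out_queue_size;
	unsigned int queue_limit[4];
};

/* True when the link is in the fully working state. */
static inline int demo_link_working_working(const struct demo_link *l)
{
	return l->state == DEMO_WORKING_WORKING;
}

/* True when the send queue has reached the lowest-importance limit. */
static inline int demo_link_congested(const struct demo_link *l)
{
	return l->out_queue_size >= l->queue_limit[0];
}

#endif /* DEMO_LINK_H */
```

A caller in another file can then write demo_link_working_working(link)
instead of comparing the state field directly, which is the substitution
this patch makes in discover.c.
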
 net/tipc/discover.c |    2 +-
 net/tipc/link.c     |   30 ------------------------------
 net/tipc/link.h     |   35 +++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 74b7d1e..ce1390a 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -224,7 +224,7 @@ void tipc_disc_recv_msg(struct sk_buff *buf, struct bearer *b_ptr)
 			memcpy(addr, &media_addr, sizeof(*addr));
 			tipc_link_reset(link);
 		}
-		link_fully_up = (link->state == WORKING_WORKING);
+		link_fully_up = link_working_working(link);
 		spin_unlock_bh(&n_ptr->lock);
 		if ((type == DSC_RESP_MSG) || link_fully_up)
 			return;
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 441b26a..e8320bf 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -202,36 +202,6 @@ static unsigned int align(unsigned int i)
 	return (i + 3) & ~3u;
 }
 
-static int link_working_working(struct link *l_ptr)
-{
-	return (l_ptr->state == WORKING_WORKING);
-}
-
-static int link_working_unknown(struct link *l_ptr)
-{
-	return (l_ptr->state == WORKING_UNKNOWN);
-}
-
-static int link_reset_unknown(struct link *l_ptr)
-{
-	return (l_ptr->state == RESET_UNKNOWN);
-}
-
-static int link_reset_reset(struct link *l_ptr)
-{
-	return (l_ptr->state == RESET_RESET);
-}
-
-static int link_blocked(struct link *l_ptr)
-{
-	return (l_ptr->exp_msg_count || l_ptr->blocked);
-}
-
-static int link_congested(struct link *l_ptr)
-{
-	return (l_ptr->out_queue_size >= l_ptr->queue_limit[0]);
-}
-
 static void link_init_max_pkt(struct link *l_ptr)
 {
 	u32 max_pkt;
diff --git a/net/tipc/link.h b/net/tipc/link.h
index 6a51e38..2e5385c 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -292,4 +292,39 @@ static inline u32 lesser(u32 left, u32 right)
 	return less_eq(left, right) ? left : right;
 }
 
+
+/*
+ * Link status checking routines
+ */
+
+static inline int link_working_working(struct link *l_ptr)
+{
+	return (l_ptr->state == WORKING_WORKING);
+}
+
+static inline int link_working_unknown(struct link *l_ptr)
+{
+	return (l_ptr->state == WORKING_UNKNOWN);
+}
+
+static inline int link_reset_unknown(struct link *l_ptr)
+{
+	return (l_ptr->state == RESET_UNKNOWN);
+}
+
+static inline int link_reset_reset(struct link *l_ptr)
+{
+	return (l_ptr->state == RESET_RESET);
+}
+
+static inline int link_blocked(struct link *l_ptr)
+{
+	return (l_ptr->exp_msg_count || l_ptr->blocked);
+}
+
+static inline int link_congested(struct link *l_ptr)
+{
+	return (l_ptr->out_queue_size >= l_ptr->queue_limit[0]);
+}
+
 #endif
-- 
1.7.1.rc2



* [PATCH net-next 10/16] tipc: add tipc_ prefix to fcns targeted for un-inlining
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

These functions contain enough code that they seem like
sensible targets for un-inlining.  Prior to doing that, this
adds the tipc_ prefix to the functions, so that in the event
of a panic dump or similar, the subsystem from which the
functions come is immediately clear.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/addr.h       |    8 ++++----
 net/tipc/bcast.c      |    2 +-
 net/tipc/bcast.h      |   12 ++++++------
 net/tipc/bearer.c     |    4 ++--
 net/tipc/cluster.c    |    2 +-
 net/tipc/discover.c   |    6 +++---
 net/tipc/link.c       |   20 ++++++++++----------
 net/tipc/msg.h        |   14 +++++++-------
 net/tipc/name_distr.c |    2 +-
 net/tipc/name_table.c |    2 +-
 net/tipc/net.c        |    4 ++--
 net/tipc/node.c       |   12 ++++++------
 net/tipc/port.c       |   22 +++++++++++-----------
 13 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/net/tipc/addr.h b/net/tipc/addr.h
index 3ba67e6..4d4aee0 100644
--- a/net/tipc/addr.h
+++ b/net/tipc/addr.h
@@ -67,7 +67,7 @@ static inline int may_route(u32 addr)
 	return(addr ^ tipc_own_addr) >> 11;
 }
 
-static inline int in_scope(u32 domain, u32 addr)
+static inline int tipc_in_scope(u32 domain, u32 addr)
 {
 	if (!domain || (domain == addr))
 		return 1;
@@ -79,10 +79,10 @@ static inline int in_scope(u32 domain, u32 addr)
 }
 
 /**
- * addr_scope - convert message lookup domain to equivalent 2-bit scope value
+ * tipc_addr_scope - convert message lookup domain to a 2-bit scope value
  */
 
-static inline int addr_scope(u32 domain)
+static inline int tipc_addr_scope(u32 domain)
 {
 	if (likely(!domain))
 		return TIPC_ZONE_SCOPE;
@@ -110,7 +110,7 @@ static inline int addr_domain(int sc)
 	return tipc_addr(tipc_zone(tipc_own_addr), 0, 0);
 }
 
-static inline char *addr_string_fill(char *string, u32 addr)
+static inline char *tipc_addr_string_fill(char *string, u32 addr)
 {
 	snprintf(string, 16, "<%u.%u.%u>",
 		 tipc_zone(addr), tipc_cluster(addr), tipc_node(addr));
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index a18f26d..a8f22e7 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -275,7 +275,7 @@ static void bclink_send_nack(struct tipc_node *n_ptr)
 	buf = buf_acquire(INT_H_SIZE);
 	if (buf) {
 		msg = buf_msg(buf);
-		msg_init(msg, BCAST_PROTOCOL, STATE_MSG,
+		tipc_msg_init(msg, BCAST_PROTOCOL, STATE_MSG,
 			 INT_H_SIZE, n_ptr->addr);
 		msg_set_mc_netid(msg, tipc_net_id);
 		msg_set_bcast_ack(msg, mod(n_ptr->bclink.last_in));
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 4c1771e..2b1c4a7 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -74,7 +74,7 @@ extern const char tipc_bclink_name[];
 
 
 /**
- * nmap_add - add a node to a node map
+ * tipc_nmap_add - add a node to a node map
  */
 
 static inline void tipc_nmap_add(struct tipc_node_map *nm_ptr, u32 node)
@@ -90,7 +90,7 @@ static inline void tipc_nmap_add(struct tipc_node_map *nm_ptr, u32 node)
 }
 
 /**
- * nmap_remove - remove a node from a node map
+ * tipc_nmap_remove - remove a node from a node map
  */
 
 static inline void tipc_nmap_remove(struct tipc_node_map *nm_ptr, u32 node)
@@ -106,7 +106,7 @@ static inline void tipc_nmap_remove(struct tipc_node_map *nm_ptr, u32 node)
 }
 
 /**
- * nmap_equal - test for equality of node maps
+ * tipc_nmap_equal - test for equality of node maps
  */
 
 static inline int tipc_nmap_equal(struct tipc_node_map *nm_a, struct tipc_node_map *nm_b)
@@ -115,7 +115,7 @@ static inline int tipc_nmap_equal(struct tipc_node_map *nm_a, struct tipc_node_m
 }
 
 /**
- * nmap_diff - find differences between node maps
+ * tipc_nmap_diff - find differences between node maps
  * @nm_a: input node map A
  * @nm_b: input node map B
  * @nm_diff: output node map A-B (i.e. nodes of A that are not in B)
@@ -143,7 +143,7 @@ static inline void tipc_nmap_diff(struct tipc_node_map *nm_a, struct tipc_node_m
 }
 
 /**
- * port_list_add - add a port to a port list, ensuring no duplicates
+ * tipc_port_list_add - add a port to a port list, ensuring no duplicates
  */
 
 static inline void tipc_port_list_add(struct port_list *pl_ptr, u32 port)
@@ -176,7 +176,7 @@ static inline void tipc_port_list_add(struct port_list *pl_ptr, u32 port)
 }
 
 /**
- * port_list_free - free dynamically created entries in port_list chain
+ * tipc_port_list_free - free dynamically created entries in port_list chain
  *
  * Note: First item is on stack, so it doesn't need to be released
  */
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 7809137..ccec12f 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -493,7 +493,7 @@ int tipc_enable_bearer(const char *name, u32 bcast_scope, u32 priority)
 		return -EINVAL;
 	}
 	if (!tipc_addr_domain_valid(bcast_scope) ||
-	    !in_scope(bcast_scope, tipc_own_addr)) {
+	    !tipc_in_scope(bcast_scope, tipc_own_addr)) {
 		warn("Bearer <%s> rejected, illegal broadcast scope\n", name);
 		return -EINVAL;
 	}
@@ -571,7 +571,7 @@ restart:
 	spin_lock_init(&b_ptr->publ.lock);
 	write_unlock_bh(&tipc_net_lock);
 	info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
-	     name, addr_string_fill(addr_string, bcast_scope), priority);
+	     name, tipc_addr_string_fill(addr_string, bcast_scope), priority);
 	return 0;
 failed:
 	write_unlock_bh(&tipc_net_lock);
diff --git a/net/tipc/cluster.c b/net/tipc/cluster.c
index a7eac00..e68f705 100644
--- a/net/tipc/cluster.c
+++ b/net/tipc/cluster.c
@@ -238,7 +238,7 @@ static struct sk_buff *tipc_cltr_prepare_routing_msg(u32 data_size, u32 dest)
 	if (buf) {
 		msg = buf_msg(buf);
 		memset((char *)msg, 0, size);
-		msg_init(msg, ROUTE_DISTRIBUTOR, 0, INT_H_SIZE, dest);
+		tipc_msg_init(msg, ROUTE_DISTRIBUTOR, 0, INT_H_SIZE, dest);
 	}
 	return buf;
 }
diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index ce1390a..fc1fcf5 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -120,7 +120,7 @@ static struct sk_buff *tipc_disc_init_msg(u32 type,
 
 	if (buf) {
 		msg = buf_msg(buf);
-		msg_init(msg, LINK_CONFIG, type, DSC_H_SIZE, dest_domain);
+		tipc_msg_init(msg, LINK_CONFIG, type, DSC_H_SIZE, dest_domain);
 		msg_set_non_seq(msg, 1);
 		msg_set_req_links(msg, req_links);
 		msg_set_dest_domain(msg, dest_domain);
@@ -144,7 +144,7 @@ static void disc_dupl_alert(struct bearer *b_ptr, u32 node_addr,
 	char media_addr_str[64];
 	struct print_buf pb;
 
-	addr_string_fill(node_addr_str, node_addr);
+	tipc_addr_string_fill(node_addr_str, node_addr);
 	tipc_printbuf_init(&pb, media_addr_str, sizeof(media_addr_str));
 	tipc_media_addr_printf(&pb, media_addr);
 	tipc_printbuf_validate(&pb);
@@ -183,7 +183,7 @@ void tipc_disc_recv_msg(struct sk_buff *buf, struct bearer *b_ptr)
 			disc_dupl_alert(b_ptr, tipc_own_addr, &media_addr);
 		return;
 	}
-	if (!in_scope(dest, tipc_own_addr))
+	if (!tipc_in_scope(dest, tipc_own_addr))
 		return;
 	if (is_slave(tipc_own_addr) && is_slave(orig))
 		return;
diff --git a/net/tipc/link.c b/net/tipc/link.c
index e8320bf..a3616b9 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -433,7 +433,7 @@ struct link *tipc_link_create(struct bearer *b_ptr, const u32 peer,
 
 	l_ptr->pmsg = (struct tipc_msg *)&l_ptr->proto_msg;
 	msg = l_ptr->pmsg;
-	msg_init(msg, LINK_PROTOCOL, RESET_MSG, INT_H_SIZE, l_ptr->addr);
+	tipc_msg_init(msg, LINK_PROTOCOL, RESET_MSG, INT_H_SIZE, l_ptr->addr);
 	msg_set_size(msg, sizeof(l_ptr->proto_msg));
 	msg_set_session(msg, (tipc_random & 0xffff));
 	msg_set_bearer_id(msg, b_ptr->identity);
@@ -1025,7 +1025,7 @@ int tipc_link_send_buf(struct link *l_ptr, struct sk_buff *buf)
 	u32 size = msg_size(msg);
 	u32 dsz = msg_data_sz(msg);
 	u32 queue_size = l_ptr->out_queue_size;
-	u32 imp = msg_tot_importance(msg);
+	u32 imp = tipc_msg_tot_importance(msg);
 	u32 queue_limit = l_ptr->queue_limit[imp];
 	u32 max_packet = l_ptr->max_pkt;
 
@@ -1090,7 +1090,7 @@ int tipc_link_send_buf(struct link *l_ptr, struct sk_buff *buf)
 			struct tipc_msg bundler_hdr;
 
 			if (bundler) {
-				msg_init(&bundler_hdr, MSG_BUNDLER, OPEN_MSG,
+				tipc_msg_init(&bundler_hdr, MSG_BUNDLER, OPEN_MSG,
 					 INT_H_SIZE, l_ptr->addr);
 				skb_copy_to_linear_data(bundler, &bundler_hdr,
 							INT_H_SIZE);
@@ -1243,7 +1243,7 @@ again:
 	 * (Must not hold any locks while building message.)
 	 */
 
-	res = msg_build(hdr, msg_sect, num_sect, sender->publ.max_pkt,
+	res = tipc_msg_build(hdr, msg_sect, num_sect, sender->publ.max_pkt,
 			!sender->user_port, &buf);
 
 	read_lock_bh(&tipc_net_lock);
@@ -1354,7 +1354,7 @@ again:
 	/* Prepare reusable fragment header: */
 
 	msg_dbg(hdr, ">FRAGMENTING>");
-	msg_init(&fragm_hdr, MSG_FRAGMENTER, FIRST_FRAGMENT,
+	tipc_msg_init(&fragm_hdr, MSG_FRAGMENTER, FIRST_FRAGMENT,
 		 INT_H_SIZE, msg_destnode(hdr));
 	msg_set_link_selector(&fragm_hdr, sender->publ.ref);
 	msg_set_size(&fragm_hdr, max_pkt);
@@ -1613,7 +1613,7 @@ static void link_reset_all(unsigned long addr)
 	tipc_node_lock(n_ptr);
 
 	warn("Resetting all links to %s\n",
-	     addr_string_fill(addr_string, n_ptr->addr));
+	     tipc_addr_string_fill(addr_string, n_ptr->addr));
 
 	for (i = 0; i < MAX_BEARERS; i++) {
 		if (n_ptr->links[i]) {
@@ -1655,7 +1655,7 @@ static void link_retransmit_failure(struct link *l_ptr, struct sk_buff *buf)
 		n_ptr = l_ptr->owner->next;
 		tipc_node_lock(n_ptr);
 
-		addr_string_fill(addr_string, n_ptr->addr);
+		tipc_addr_string_fill(addr_string, n_ptr->addr);
 		tipc_printf(TIPC_OUTPUT, "Multicast link info for %s\n", addr_string);
 		tipc_printf(TIPC_OUTPUT, "Supported: %d,  ", n_ptr->bclink.supported);
 		tipc_printf(TIPC_OUTPUT, "Acked: %u\n", n_ptr->bclink.acked);
@@ -2398,7 +2398,7 @@ void tipc_link_changeover(struct link *l_ptr)
 		return;
 	}
 
-	msg_init(&tunnel_hdr, CHANGEOVER_PROTOCOL,
+	tipc_msg_init(&tunnel_hdr, CHANGEOVER_PROTOCOL,
 		 ORIGINAL_MSG, INT_H_SIZE, l_ptr->addr);
 	msg_set_bearer_id(&tunnel_hdr, l_ptr->peer_bearer_id);
 	msg_set_msgcnt(&tunnel_hdr, msgcount);
@@ -2453,7 +2453,7 @@ void tipc_link_send_duplicate(struct link *l_ptr, struct link *tunnel)
 	struct sk_buff *iter;
 	struct tipc_msg tunnel_hdr;
 
-	msg_init(&tunnel_hdr, CHANGEOVER_PROTOCOL,
+	tipc_msg_init(&tunnel_hdr, CHANGEOVER_PROTOCOL,
 		 DUPLICATE_MSG, INT_H_SIZE, l_ptr->addr);
 	msg_set_msgcnt(&tunnel_hdr, l_ptr->out_queue_size);
 	msg_set_bearer_id(&tunnel_hdr, l_ptr->peer_bearer_id);
@@ -2659,7 +2659,7 @@ int tipc_link_send_long_buf(struct link *l_ptr, struct sk_buff *buf)
 
 	/* Prepare reusable fragment header: */
 
-	msg_init(&fragm_hdr, MSG_FRAGMENTER, FIRST_FRAGMENT,
+	tipc_msg_init(&fragm_hdr, MSG_FRAGMENTER, FIRST_FRAGMENT,
 		 INT_H_SIZE, destaddr);
 	msg_set_link_selector(&fragm_hdr, msg_link_selector(inmsg));
 	msg_set_long_msgno(&fragm_hdr, mod(l_ptr->long_msg_seq_no++));
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index 7ee6ae2..fbcd46f 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -708,7 +708,7 @@ static inline void msg_set_dataoctet(struct tipc_msg *m, u32 pos)
 #define DSC_REQ_MSG          0
 #define DSC_RESP_MSG         1
 
-static inline u32 msg_tot_importance(struct tipc_msg *m)
+static inline u32 tipc_msg_tot_importance(struct tipc_msg *m)
 {
 	if (likely(msg_isdata(m))) {
 		if (likely(msg_orignode(m) == tipc_own_addr))
@@ -722,7 +722,7 @@ static inline u32 msg_tot_importance(struct tipc_msg *m)
 }
 
 
-static inline void msg_init(struct tipc_msg *m, u32 user, u32 type,
+static inline void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type,
 			    u32 hsize, u32 destnode)
 {
 	memset(m, 0, hsize);
@@ -739,10 +739,10 @@ static inline void msg_init(struct tipc_msg *m, u32 user, u32 type,
 }
 
 /**
- * msg_calc_data_size - determine total data size for message
+ * tipc_msg_calc_data_size - determine total data size for message
  */
 
-static inline int msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect)
+static inline int tipc_msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect)
 {
 	int dsz = 0;
 	int i;
@@ -753,20 +753,20 @@ static inline int msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect)
 }
 
 /**
- * msg_build - create message using specified header and data
+ * tipc_msg_build - create message using specified header and data
  *
  * Note: Caller must not hold any locks in case copy_from_user() is interrupted!
  *
  * Returns message data size or errno
  */
 
-static inline int msg_build(struct tipc_msg *hdr,
+static inline int tipc_msg_build(struct tipc_msg *hdr,
 			    struct iovec const *msg_sect, u32 num_sect,
 			    int max_size, int usrmem, struct sk_buff** buf)
 {
 	int dsz, sz, hsz, pos, res, cnt;
 
-	dsz = msg_calc_data_size(msg_sect, num_sect);
+	dsz = tipc_msg_calc_data_size(msg_sect, num_sect);
 	if (unlikely(dsz > TIPC_MAX_USER_MSG_SIZE)) {
 		*buf = NULL;
 		return -EINVAL;
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index 10a6989..6ac3c54 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -103,7 +103,7 @@ static struct sk_buff *named_prepare_buf(u32 type, u32 size, u32 dest)
 
 	if (buf != NULL) {
 		msg = buf_msg(buf);
-		msg_init(msg, NAME_DISTRIBUTOR, type, LONG_H_SIZE, dest);
+		tipc_msg_init(msg, NAME_DISTRIBUTOR, type, LONG_H_SIZE, dest);
 		msg_set_size(msg, LONG_H_SIZE + size);
 	}
 	return buf;
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index acab41a..8ba7962 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -627,7 +627,7 @@ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 	struct name_seq *seq;
 	u32 ref;
 
-	if (!in_scope(*destnode, tipc_own_addr))
+	if (!tipc_in_scope(*destnode, tipc_own_addr))
 		return 0;
 
 	read_lock_bh(&tipc_nametbl_lock);
diff --git a/net/tipc/net.c b/net/tipc/net.c
index d7cd1e0..f61b769 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -219,7 +219,7 @@ void tipc_net_route_msg(struct sk_buff *buf)
 
 	/* Handle message for this node */
 	dnode = msg_short(msg) ? tipc_own_addr : msg_destnode(msg);
-	if (in_scope(dnode, tipc_own_addr)) {
+	if (tipc_in_scope(dnode, tipc_own_addr)) {
 		if (msg_isdata(msg)) {
 			if (msg_mcast(msg))
 				tipc_port_recv_mcast(buf, NULL);
@@ -277,7 +277,7 @@ int tipc_net_start(u32 addr)
 
 	info("Started in network mode\n");
 	info("Own node address %s, network identity %u\n",
-	     addr_string_fill(addr_string, tipc_own_addr), tipc_net_id);
+	     tipc_addr_string_fill(addr_string, tipc_own_addr), tipc_net_id);
 	return 0;
 }
 
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 17cc394..b634942 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -268,7 +268,7 @@ struct tipc_node *tipc_node_attach_link(struct link *l_ptr)
 
 		if (n_ptr->link_cnt >= 2) {
 			err("Attempt to create third link to %s\n",
-			    addr_string_fill(addr_string, n_ptr->addr));
+			    tipc_addr_string_fill(addr_string, n_ptr->addr));
 			return NULL;
 		}
 
@@ -280,7 +280,7 @@ struct tipc_node *tipc_node_attach_link(struct link *l_ptr)
 		}
 		err("Attempt to establish second link on <%s> to %s\n",
 		    l_ptr->b_ptr->publ.name,
-		    addr_string_fill(addr_string, l_ptr->addr));
+		    tipc_addr_string_fill(addr_string, l_ptr->addr));
 	}
 	return NULL;
 }
@@ -439,7 +439,7 @@ static void node_lost_contact(struct tipc_node *n_ptr)
 		return;
 
 	info("Lost contact with %s\n",
-	     addr_string_fill(addr_string, n_ptr->addr));
+	     tipc_addr_string_fill(addr_string, n_ptr->addr));
 
 	/* Abort link changeover */
 	for (i = 0; i < MAX_BEARERS; i++) {
@@ -602,7 +602,7 @@ u32 tipc_available_nodes(const u32 domain)
 
 	read_lock_bh(&tipc_net_lock);
 	for (n_ptr = tipc_nodes; n_ptr; n_ptr = n_ptr->next) {
-		if (!in_scope(domain, n_ptr->addr))
+		if (!tipc_in_scope(domain, n_ptr->addr))
 			continue;
 		if (tipc_node_is_up(n_ptr))
 			cnt++;
@@ -651,7 +651,7 @@ struct sk_buff *tipc_node_get_nodes(const void *req_tlv_area, int req_tlv_space)
 	/* Add TLVs for all nodes in scope */
 
 	for (n_ptr = tipc_nodes; n_ptr; n_ptr = n_ptr->next) {
-		if (!in_scope(domain, n_ptr->addr))
+		if (!tipc_in_scope(domain, n_ptr->addr))
 			continue;
 		node_info.addr = htonl(n_ptr->addr);
 		node_info.up = htonl(tipc_node_is_up(n_ptr));
@@ -711,7 +711,7 @@ struct sk_buff *tipc_node_get_links(const void *req_tlv_area, int req_tlv_space)
 	for (n_ptr = tipc_nodes; n_ptr; n_ptr = n_ptr->next) {
 		u32 i;
 
-		if (!in_scope(domain, n_ptr->addr))
+		if (!tipc_in_scope(domain, n_ptr->addr))
 			continue;
 		tipc_node_lock(n_ptr);
 		for (i = 0; i < MAX_BEARERS; i++) {
diff --git a/net/tipc/port.c b/net/tipc/port.c
index 7641db6..0737680 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -116,7 +116,7 @@ int tipc_multicast(u32 ref, struct tipc_name_seq const *seq, u32 domain,
 	msg_set_namelower(hdr, seq->lower);
 	msg_set_nameupper(hdr, seq->upper);
 	msg_set_hdr_sz(hdr, MCAST_H_SIZE);
-	res = msg_build(hdr, msg_sect, num_sect, MAX_MSG_SIZE,
+	res = tipc_msg_build(hdr, msg_sect, num_sect, MAX_MSG_SIZE,
 			!oport->user_port, &buf);
 	if (unlikely(!buf))
 		return res;
@@ -241,7 +241,7 @@ struct tipc_port *tipc_createport_raw(void *usr_handle,
 	p_ptr->publ.max_pkt = MAX_PKT_DEFAULT;
 	p_ptr->publ.ref = ref;
 	msg = &p_ptr->publ.phdr;
-	msg_init(msg, importance, TIPC_NAMED_MSG, LONG_H_SIZE, 0);
+	tipc_msg_init(msg, importance, TIPC_NAMED_MSG, LONG_H_SIZE, 0);
 	msg_set_origport(msg, ref);
 	p_ptr->last_in_seqno = 41;
 	p_ptr->sent = 1;
@@ -395,7 +395,7 @@ static struct sk_buff *port_build_proto_msg(u32 destport, u32 destnode,
 	buf = buf_acquire(LONG_H_SIZE);
 	if (buf) {
 		msg = buf_msg(buf);
-		msg_init(msg, usr, type, LONG_H_SIZE, destnode);
+		tipc_msg_init(msg, usr, type, LONG_H_SIZE, destnode);
 		msg_set_errcode(msg, err);
 		msg_set_destport(msg, destport);
 		msg_set_origport(msg, origport);
@@ -439,7 +439,7 @@ int tipc_reject_msg(struct sk_buff *buf, u32 err)
 		return data_sz;
 	}
 	rmsg = buf_msg(rbuf);
-	msg_init(rmsg, imp, msg_type(msg), hdr_sz, msg_orignode(msg));
+	tipc_msg_init(rmsg, imp, msg_type(msg), hdr_sz, msg_orignode(msg));
 	msg_set_errcode(rmsg, err);
 	msg_set_destport(rmsg, msg_origport(msg));
 	msg_set_origport(rmsg, msg_destport(msg));
@@ -480,7 +480,7 @@ int tipc_port_reject_sections(struct port *p_ptr, struct tipc_msg *hdr,
 	struct sk_buff *buf;
 	int res;
 
-	res = msg_build(hdr, msg_sect, num_sect, MAX_MSG_SIZE,
+	res = tipc_msg_build(hdr, msg_sect, num_sect, MAX_MSG_SIZE,
 			!p_ptr->user_port, &buf);
 	if (!buf)
 		return res;
@@ -1343,7 +1343,7 @@ int tipc_port_recv_sections(struct port *sender, unsigned int num_sect,
 	struct sk_buff *buf;
 	int res;
 
-	res = msg_build(&sender->publ.phdr, msg_sect, num_sect,
+	res = tipc_msg_build(&sender->publ.phdr, msg_sect, num_sect,
 			MAX_MSG_SIZE, !sender->user_port, &buf);
 	if (likely(buf))
 		tipc_port_recv_msg(buf);
@@ -1383,7 +1383,7 @@ int tipc_send(u32 ref, unsigned int num_sect, struct iovec const *msg_sect)
 	if (port_unreliable(p_ptr)) {
 		p_ptr->publ.congested = 0;
 		/* Just calculate msg length and return */
-		return msg_calc_data_size(msg_sect, num_sect);
+		return tipc_msg_calc_data_size(msg_sect, num_sect);
 	}
 	return -ELINKCONG;
 }
@@ -1466,7 +1466,7 @@ int tipc_forward2name(u32 ref,
 	msg_set_hdr_sz(msg, LONG_H_SIZE);
 	msg_set_nametype(msg, name->type);
 	msg_set_nameinst(msg, name->instance);
-	msg_set_lookup_scope(msg, addr_scope(domain));
+	msg_set_lookup_scope(msg, tipc_addr_scope(domain));
 	if (importance <= TIPC_CRITICAL_IMPORTANCE)
 		msg_set_importance(msg,importance);
 	destport = tipc_nametbl_translate(name->type, name->instance, &destnode);
@@ -1483,7 +1483,7 @@ int tipc_forward2name(u32 ref,
 			return res;
 		if (port_unreliable(p_ptr)) {
 			/* Just calculate msg length and return */
-			return msg_calc_data_size(msg_sect, num_sect);
+			return tipc_msg_calc_data_size(msg_sect, num_sect);
 		}
 		return -ELINKCONG;
 	}
@@ -1539,7 +1539,7 @@ int tipc_forward_buf2name(u32 ref,
 	msg_set_origport(msg, orig->ref);
 	msg_set_nametype(msg, name->type);
 	msg_set_nameinst(msg, name->instance);
-	msg_set_lookup_scope(msg, addr_scope(domain));
+	msg_set_lookup_scope(msg, tipc_addr_scope(domain));
 	msg_set_hdr_sz(msg, LONG_H_SIZE);
 	msg_set_size(msg, LONG_H_SIZE + dsz);
 	destport = tipc_nametbl_translate(name->type, name->instance, &destnode);
@@ -1619,7 +1619,7 @@ int tipc_forward2port(u32 ref,
 		return res;
 	if (port_unreliable(p_ptr)) {
 		/* Just calculate msg length and return */
-		return msg_calc_data_size(msg_sect, num_sect);
+		return tipc_msg_calc_data_size(msg_sect, num_sect);
 	}
 	return -ELINKCONG;
 }
-- 
1.7.1.rc2



* [PATCH net-next 11/16] tipc: Reduce footprint by un-inlining address routines
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Convert address-related inline routines that are more than one
line into standard functions, thereby eliminating a significant
amount of repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/addr.c |   32 ++++++++++++++++++++++++++++++++
 net/tipc/addr.h |   37 +++----------------------------------
 2 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/net/tipc/addr.c b/net/tipc/addr.c
index e5207a1..c048543 100644
--- a/net/tipc/addr.c
+++ b/net/tipc/addr.c
@@ -92,3 +92,35 @@ int tipc_addr_node_valid(u32 addr)
 	return (tipc_addr_domain_valid(addr) && tipc_node(addr));
 }
 
+int tipc_in_scope(u32 domain, u32 addr)
+{
+	if (!domain || (domain == addr))
+		return 1;
+	if (domain == (addr & 0xfffff000u)) /* domain <Z.C.0> */
+		return 1;
+	if (domain == (addr & 0xff000000u)) /* domain <Z.0.0> */
+		return 1;
+	return 0;
+}
+
+/**
+ * tipc_addr_scope - convert message lookup domain to a 2-bit scope value
+ */
+
+int tipc_addr_scope(u32 domain)
+{
+	if (likely(!domain))
+		return TIPC_ZONE_SCOPE;
+	if (tipc_node(domain))
+		return TIPC_NODE_SCOPE;
+	if (tipc_cluster(domain))
+		return TIPC_CLUSTER_SCOPE;
+	return TIPC_ZONE_SCOPE;
+}
+
+char *tipc_addr_string_fill(char *string, u32 addr)
+{
+	snprintf(string, 16, "<%u.%u.%u>",
+		 tipc_zone(addr), tipc_cluster(addr), tipc_node(addr));
+	return string;
+}
diff --git a/net/tipc/addr.h b/net/tipc/addr.h
index 4d4aee0..c1cc572 100644
--- a/net/tipc/addr.h
+++ b/net/tipc/addr.h
@@ -67,32 +67,6 @@ static inline int may_route(u32 addr)
 	return(addr ^ tipc_own_addr) >> 11;
 }
 
-static inline int tipc_in_scope(u32 domain, u32 addr)
-{
-	if (!domain || (domain == addr))
-		return 1;
-	if (domain == (addr & 0xfffff000u)) /* domain <Z.C.0> */
-		return 1;
-	if (domain == (addr & 0xff000000u)) /* domain <Z.0.0> */
-		return 1;
-	return 0;
-}
-
-/**
- * tipc_addr_scope - convert message lookup domain to a 2-bit scope value
- */
-
-static inline int tipc_addr_scope(u32 domain)
-{
-	if (likely(!domain))
-		return TIPC_ZONE_SCOPE;
-	if (tipc_node(domain))
-		return TIPC_NODE_SCOPE;
-	if (tipc_cluster(domain))
-		return TIPC_CLUSTER_SCOPE;
-	return TIPC_ZONE_SCOPE;
-}
-
 /**
  * addr_domain - convert 2-bit scope value to equivalent message lookup domain
  *
@@ -110,14 +84,9 @@ static inline int addr_domain(int sc)
 	return tipc_addr(tipc_zone(tipc_own_addr), 0, 0);
 }
 
-static inline char *tipc_addr_string_fill(char *string, u32 addr)
-{
-	snprintf(string, 16, "<%u.%u.%u>",
-		 tipc_zone(addr), tipc_cluster(addr), tipc_node(addr));
-	return string;
-}
-
 int tipc_addr_domain_valid(u32);
 int tipc_addr_node_valid(u32 addr);
-
+int tipc_in_scope(u32 domain, u32 addr);
+int tipc_addr_scope(u32 domain);
+char *tipc_addr_string_fill(char *string, u32 addr);
 #endif
-- 
1.7.1.rc2
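
For readers who want to try the scope test outside the kernel, here is a
minimal user-space sketch of the address helpers this patch moves into
addr.c. It is not the kernel code. The demo_* names are hypothetical, and
the <Z.C.N> layout (8-bit zone, 12-bit cluster, 12-bit node) is an
assumption inferred from the 0xff000000 and 0xfffff000 masks above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed <Z.C.N> layout: 8-bit zone, 12-bit cluster, 12-bit node. */
uint32_t demo_zone(uint32_t addr)    { return addr >> 24; }
uint32_t demo_cluster(uint32_t addr) { return (addr >> 12) & 0xfffu; }
uint32_t demo_node(uint32_t addr)    { return addr & 0xfffu; }

/* Hypothetical helper: pack zone, cluster and node into one address. */
uint32_t demo_addr(uint32_t z, uint32_t c, uint32_t n)
{
	assert(z <= 0xffu && c <= 0xfffu && n <= 0xfffu);
	return (z << 24) | (c << 12) | n;
}

/* Same three tests as tipc_in_scope() in the patch. */
int demo_in_scope(uint32_t domain, uint32_t addr)
{
	if (!domain || domain == addr)
		return 1;			/* wildcard or exact match */
	if (domain == (addr & 0xfffff000u))
		return 1;			/* domain <Z.C.0> */
	if (domain == (addr & 0xff000000u))
		return 1;			/* domain <Z.0.0> */
	return 0;
}

/*
 * Under the assumed layout the longest string is "<255.4095.4095>",
 * which is 15 characters, so a 16-byte buffer holds it plus the NUL.
 */
char *demo_addr_string_fill(char *string, uint32_t addr)
{
	snprintf(string, 16, "<%u.%u.%u>",
		 (unsigned)demo_zone(addr),
		 (unsigned)demo_cluster(addr),
		 (unsigned)demo_node(addr));
	return string;
}
```

With these definitions, demo_in_scope(demo_addr(1, 1, 0), demo_addr(1, 1, 10))
is true because the node bits are masked off before the comparison, while a
domain of <1.2.0> does not cover <1.1.10>.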



* [PATCH net-next 12/16] tipc: Reduce footprint by un-inlining nmap routines
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Convert nmap (node map) inline routines that are more than one line into
standard functions, thereby eliminating a significant amount of repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/bcast.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++
 net/tipc/bcast.h |   63 +++--------------------------------------------------
 2 files changed, 64 insertions(+), 59 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index a8f22e7..1ee6424 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -822,3 +822,63 @@ void tipc_bclink_stop(void)
 	spin_unlock_bh(&bc_lock);
 }
 
+
+/**
+ * tipc_nmap_add - add a node to a node map
+ */
+
+void tipc_nmap_add(struct tipc_node_map *nm_ptr, u32 node)
+{
+	int n = tipc_node(node);
+	int w = n / WSIZE;
+	u32 mask = (1 << (n % WSIZE));
+
+	if ((nm_ptr->map[w] & mask) == 0) {
+		nm_ptr->count++;
+		nm_ptr->map[w] |= mask;
+	}
+}
+
+/**
+ * tipc_nmap_remove - remove a node from a node map
+ */
+
+void tipc_nmap_remove(struct tipc_node_map *nm_ptr, u32 node)
+{
+	int n = tipc_node(node);
+	int w = n / WSIZE;
+	u32 mask = (1 << (n % WSIZE));
+
+	if ((nm_ptr->map[w] & mask) != 0) {
+		nm_ptr->map[w] &= ~mask;
+		nm_ptr->count--;
+	}
+}
+
+/**
+ * tipc_nmap_diff - find differences between node maps
+ * @nm_a: input node map A
+ * @nm_b: input node map B
+ * @nm_diff: output node map A-B (i.e. nodes of A that are not in B)
+ */
+
+void tipc_nmap_diff(struct tipc_node_map *nm_a, struct tipc_node_map *nm_b,
+				  struct tipc_node_map *nm_diff)
+{
+	int stop = ARRAY_SIZE(nm_a->map);
+	int w;
+	int b;
+	u32 map;
+
+	memset(nm_diff, 0, sizeof(*nm_diff));
+	for (w = 0; w < stop; w++) {
+		map = nm_a->map[w] ^ (nm_a->map[w] & nm_b->map[w]);
+		nm_diff->map[w] = map;
+		if (map != 0) {
+			for (b = 0 ; b < WSIZE; b++) {
+				if (map & (1 << b))
+					nm_diff->count++;
+			}
+		}
+	}
+}
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 2b1c4a7..cd77981 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -72,38 +72,8 @@ struct tipc_node;
 
 extern const char tipc_bclink_name[];
 
-
-/**
- * tipc_nmap_add - add a node to a node map
- */
-
-static inline void tipc_nmap_add(struct tipc_node_map *nm_ptr, u32 node)
-{
-	int n = tipc_node(node);
-	int w = n / WSIZE;
-	u32 mask = (1 << (n % WSIZE));
-
-	if ((nm_ptr->map[w] & mask) == 0) {
-		nm_ptr->count++;
-		nm_ptr->map[w] |= mask;
-	}
-}
-
-/**
- * tipc_nmap_remove - remove a node from a node map
- */
-
-static inline void tipc_nmap_remove(struct tipc_node_map *nm_ptr, u32 node)
-{
-	int n = tipc_node(node);
-	int w = n / WSIZE;
-	u32 mask = (1 << (n % WSIZE));
-
-	if ((nm_ptr->map[w] & mask) != 0) {
-		nm_ptr->map[w] &= ~mask;
-		nm_ptr->count--;
-	}
-}
+void tipc_nmap_add(struct tipc_node_map *nm_ptr, u32 node);
+void tipc_nmap_remove(struct tipc_node_map *nm_ptr, u32 node);
 
 /**
  * tipc_nmap_equal - test for equality of node maps
@@ -114,33 +84,8 @@ static inline int tipc_nmap_equal(struct tipc_node_map *nm_a, struct tipc_node_m
 	return !memcmp(nm_a, nm_b, sizeof(*nm_a));
 }
 
-/**
- * tipc_nmap_diff - find differences between node maps
- * @nm_a: input node map A
- * @nm_b: input node map B
- * @nm_diff: output node map A-B (i.e. nodes of A that are not in B)
- */
-
-static inline void tipc_nmap_diff(struct tipc_node_map *nm_a, struct tipc_node_map *nm_b,
-				  struct tipc_node_map *nm_diff)
-{
-	int stop = ARRAY_SIZE(nm_a->map);
-	int w;
-	int b;
-	u32 map;
-
-	memset(nm_diff, 0, sizeof(*nm_diff));
-	for (w = 0; w < stop; w++) {
-		map = nm_a->map[w] ^ (nm_a->map[w] & nm_b->map[w]);
-		nm_diff->map[w] = map;
-		if (map != 0) {
-			for (b = 0 ; b < WSIZE; b++) {
-				if (map & (1 << b))
-					nm_diff->count++;
-			}
-		}
-	}
-}
+void tipc_nmap_diff(struct tipc_node_map *nm_a, struct tipc_node_map *nm_b,
+				  struct tipc_node_map *nm_diff);
 
 /**
  * tipc_port_list_add - add a port to a port list, ensuring no duplicates
-- 
1.7.1.rc2
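
The node map routines above are a bitmap with a cached population count.
A minimal user-space sketch follows, under stated assumptions: the demo_*
names and the DEMO_MAX_NODES and DEMO_WSIZE values are hypothetical
stand-ins, not taken from the TIPC headers. Two deliberate differences
from the patch: the difference is written as a & ~b, which yields the same
bits as a ^ (a & b), and set bits are counted by clearing the lowest set
bit each pass instead of testing all WSIZE positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_NODES	4096	/* assumed map capacity */
#define DEMO_WSIZE	32	/* bits per map word */

struct demo_node_map {
	uint32_t count;				/* number of bits set */
	uint32_t map[DEMO_MAX_NODES / DEMO_WSIZE];
};

/* Set the bit for 'node'; count changes only if the bit was clear. */
void demo_nmap_add(struct demo_node_map *nm, uint32_t node)
{
	uint32_t w, mask;

	assert(node < DEMO_MAX_NODES);
	w = node / DEMO_WSIZE;
	mask = 1u << (node % DEMO_WSIZE);
	if ((nm->map[w] & mask) == 0) {
		nm->count++;
		nm->map[w] |= mask;
	}
}

/* Clear the bit for 'node'; count changes only if the bit was set. */
void demo_nmap_remove(struct demo_node_map *nm, uint32_t node)
{
	uint32_t w, mask;

	assert(node < DEMO_MAX_NODES);
	w = node / DEMO_WSIZE;
	mask = 1u << (node % DEMO_WSIZE);
	if ((nm->map[w] & mask) != 0) {
		nm->map[w] &= ~mask;
		nm->count--;
	}
}

/* diff = A - B, i.e. nodes of A that are not in B. */
void demo_nmap_diff(const struct demo_node_map *a,
		    const struct demo_node_map *b,
		    struct demo_node_map *diff)
{
	size_t stop = sizeof(a->map) / sizeof(a->map[0]);
	size_t w;
	uint32_t map;

	memset(diff, 0, sizeof(*diff));
	for (w = 0; w < stop; w++) {
		map = a->map[w] & ~b->map[w];
		diff->map[w] = map;
		/* map &= map - 1 clears the lowest set bit */
		for (; map != 0; map &= map - 1)
			diff->count++;
	}
}
```

The sketch uses 1u for the shift so that bit 31 is set without shifting
into the sign bit of a signed int.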



* [PATCH net-next 13/16] tipc: Reduce footprint by un-inlining port list routines
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Convert port list inline routines that are more than one line into
standard functions, thereby eliminating a significant amount of
repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/bcast.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/tipc/bcast.h |   52 ++--------------------------------------------------
 2 files changed, 52 insertions(+), 50 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 1ee6424..a008c66 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -882,3 +882,53 @@ void tipc_nmap_diff(struct tipc_node_map *nm_a, struct tipc_node_map *nm_b,
 		}
 	}
 }
+
+/**
+ * tipc_port_list_add - add a port to a port list, ensuring no duplicates
+ */
+
+void tipc_port_list_add(struct port_list *pl_ptr, u32 port)
+{
+	struct port_list *item = pl_ptr;
+	int i;
+	int item_sz = PLSIZE;
+	int cnt = pl_ptr->count;
+
+	for (; ; cnt -= item_sz, item = item->next) {
+		if (cnt < PLSIZE)
+			item_sz = cnt;
+		for (i = 0; i < item_sz; i++)
+			if (item->ports[i] == port)
+				return;
+		if (i < PLSIZE) {
+			item->ports[i] = port;
+			pl_ptr->count++;
+			return;
+		}
+		if (!item->next) {
+			item->next = kmalloc(sizeof(*item), GFP_ATOMIC);
+			if (!item->next) {
+				warn("Incomplete multicast delivery, no memory\n");
+				return;
+			}
+			item->next->next = NULL;
+		}
+	}
+}
+
+/**
+ * tipc_port_list_free - free dynamically created entries in port_list chain
+ *
+ */
+
+void tipc_port_list_free(struct port_list *pl_ptr)
+{
+	struct port_list *item;
+	struct port_list *next;
+
+	for (item = pl_ptr->next; item; item = next) {
+		next = item->next;
+		kfree(item);
+	}
+}
+
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index cd77981..e8c2b81 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -87,56 +87,8 @@ static inline int tipc_nmap_equal(struct tipc_node_map *nm_a, struct tipc_node_m
 void tipc_nmap_diff(struct tipc_node_map *nm_a, struct tipc_node_map *nm_b,
 				  struct tipc_node_map *nm_diff);
 
-/**
- * tipc_port_list_add - add a port to a port list, ensuring no duplicates
- */
-
-static inline void tipc_port_list_add(struct port_list *pl_ptr, u32 port)
-{
-	struct port_list *item = pl_ptr;
-	int i;
-	int item_sz = PLSIZE;
-	int cnt = pl_ptr->count;
-
-	for (; ; cnt -= item_sz, item = item->next) {
-		if (cnt < PLSIZE)
-			item_sz = cnt;
-		for (i = 0; i < item_sz; i++)
-			if (item->ports[i] == port)
-				return;
-		if (i < PLSIZE) {
-			item->ports[i] = port;
-			pl_ptr->count++;
-			return;
-		}
-		if (!item->next) {
-			item->next = kmalloc(sizeof(*item), GFP_ATOMIC);
-			if (!item->next) {
-				warn("Incomplete multicast delivery, no memory\n");
-				return;
-			}
-			item->next->next = NULL;
-		}
-	}
-}
-
-/**
- * tipc_port_list_free - free dynamically created entries in port_list chain
- *
- * Note: First item is on stack, so it doesn't need to be released
- */
-
-static inline void tipc_port_list_free(struct port_list *pl_ptr)
-{
-	struct port_list *item;
-	struct port_list *next;
-
-	for (item = pl_ptr->next; item; item = next) {
-		next = item->next;
-		kfree(item);
-	}
-}
-
+void tipc_port_list_add(struct port_list *pl_ptr, u32 port);
+void tipc_port_list_free(struct port_list *pl_ptr);
 
 int  tipc_bclink_init(void);
 void tipc_bclink_stop(void);
-- 
1.7.1.rc2
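
The port list above is a chain of fixed-size chunks, with the total count
kept in the head item only. Below is a minimal user-space sketch of the
same walk, under stated assumptions: the demo_* names are hypothetical,
DEMO_PLSIZE is set to 4 only so the chain grows quickly, malloc()/free()
stand in for kmalloc()/kfree(), and the add routine returns -1 on
allocation failure where the kernel routine logs a warning and returns
void.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PLSIZE 4	/* ports per chunk; small on purpose */

struct demo_port_list {
	int count;			/* total ports, valid in head only */
	struct demo_port_list *next;	/* next chunk, or NULL */
	uint32_t ports[DEMO_PLSIZE];
};

/* Add 'port' unless already present. Returns 0, or -1 if out of memory. */
int demo_port_list_add(struct demo_port_list *pl, uint32_t port)
{
	struct demo_port_list *item = pl;
	int item_sz = DEMO_PLSIZE;
	int cnt;
	int i;

	assert(pl != NULL);
	cnt = pl->count;
	for (; ; cnt -= item_sz, item = item->next) {
		/* the last chunk in use may be only partly filled */
		if (cnt < DEMO_PLSIZE)
			item_sz = cnt;
		for (i = 0; i < item_sz; i++)
			if (item->ports[i] == port)
				return 0;	/* duplicate */
		if (i < DEMO_PLSIZE) {
			/* free slot right after the used entries */
			item->ports[i] = port;
			pl->count++;
			return 0;
		}
		if (!item->next) {
			item->next = malloc(sizeof(*item));
			if (!item->next)
				return -1;
			item->next->next = NULL;
		}
	}
}

/* Free every chunk after the head; the head belongs to the caller. */
void demo_port_list_free(struct demo_port_list *pl)
{
	struct demo_port_list *item;
	struct demo_port_list *next;

	for (item = pl->next; item; item = next) {
		next = item->next;
		free(item);
	}
}
```

Like the routine in the patch, the free function leaves the head item
untouched and does not reset its next pointer, so the head should not be
reused without reinitializing it.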



* [PATCH net-next 14/16] tipc: Reduce footprint by un-inlining bearer congestion routine
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Convert bearer congestion inline routine that is more than one line into
a standard function, thereby eliminating some repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/bearer.c |   12 ++++++++++++
 net/tipc/bearer.h |   16 ++--------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index ccec12f..52ae17b 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -467,6 +467,18 @@ int tipc_bearer_resolve_congestion(struct bearer *b_ptr, struct link *l_ptr)
 	return res;
 }
 
+/**
+ * tipc_bearer_congested - determines if bearer is currently congested
+ */
+
+int tipc_bearer_congested(struct bearer *b_ptr, struct link *l_ptr)
+{
+	if (unlikely(b_ptr->publ.blocked))
+		return 1;
+	if (likely(list_empty(&b_ptr->cong_links)))
+		return 0;
+	return !tipc_bearer_resolve_congestion(b_ptr, l_ptr);
+}
 
 /**
  * tipc_enable_bearer - enable bearer with the given name
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 000228e..a850b38 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -125,6 +125,7 @@ void tipc_bearer_remove_dest(struct bearer *b_ptr, u32 dest);
 void tipc_bearer_schedule(struct bearer *b_ptr, struct link *l_ptr);
 struct bearer *tipc_bearer_find_interface(const char *if_name);
 int tipc_bearer_resolve_congestion(struct bearer *b_ptr, struct link *l_ptr);
+int tipc_bearer_congested(struct bearer *b_ptr, struct link *l_ptr);
 int tipc_bearer_init(void);
 void tipc_bearer_stop(void);
 void tipc_bearer_lock_push(struct bearer *b_ptr);
@@ -154,17 +155,4 @@ static inline int tipc_bearer_send(struct bearer *b_ptr, struct sk_buff *buf,
 	return !b_ptr->media->send_msg(buf, &b_ptr->publ, dest);
 }
 
-/**
- * tipc_bearer_congested - determines if bearer is currently congested
- */
-
-static inline int tipc_bearer_congested(struct bearer *b_ptr, struct link *l_ptr)
-{
-	if (unlikely(b_ptr->publ.blocked))
-		return 1;
-	if (likely(list_empty(&b_ptr->cong_links)))
-		return 0;
-	return !tipc_bearer_resolve_congestion(b_ptr, l_ptr);
-}
-
-#endif
+#endif	/* _TIPC_BEARER_H */
-- 
1.7.1.rc2
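
The order of the three checks in tipc_bearer_congested() can be shown with
a small user-space sketch. Everything here is a hypothetical stand-in:
struct demo_bearer, its fields, and the stubbed resolve routine are
assumptions made for illustration, and the only property taken from the
patch is that the resolve routine returns nonzero when sending may
proceed (implied by the negation in the return statement). The branch
hints use __builtin_expect, a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints; __builtin_expect is a GCC/Clang extension. */
#define demo_likely(x)   __builtin_expect(!!(x), 1)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for struct bearer. */
struct demo_bearer {
	int blocked;		/* bearer is blocked */
	int cong_links;		/* links waiting on this bearer */
	int push_succeeds;	/* stub knob: can the waiters be drained? */
};

/* Stub for the resolve step: nonzero means it is OK to send. */
int demo_resolve_congestion(struct demo_bearer *b)
{
	assert(b != NULL);
	if (b->cong_links == 0)
		return 1;
	if (b->push_succeeds) {
		b->cong_links = 0;	/* waiters drained */
		return 1;
	}
	b->cong_links++;		/* caller's link joins the waiters */
	return 0;
}

/* Same decision order as tipc_bearer_congested() in the patch. */
int demo_bearer_congested(struct demo_bearer *b)
{
	if (demo_unlikely(b->blocked))
		return 1;		/* rare case: never send */
	if (demo_likely(b->cong_links == 0))
		return 0;		/* common case: nothing queued */
	return !demo_resolve_congestion(b);
}
```

The cheap flag tests come first, so the more expensive resolve step runs
only when links are already waiting on the bearer.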



* [PATCH net-next 15/16] tipc: Reduce footprint by un-inlining buf_acquire routine
From: Paul Gortmaker @ 2010-05-12  0:30 UTC (permalink / raw)
  To: netdev; +Cc: allan.stephens
In-Reply-To: <f90800f460df4ef216412e83e148771d2b6a7183.1273621271.git.paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Convert buf_acquire inline routine that is more than one line into
a standard function, thereby eliminating some repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/core.c |   24 ++++++++++++++++++++++++
 net/tipc/core.h |   24 +-----------------------
 2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index b47d184..6964681 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -102,6 +102,30 @@ int tipc_get_mode(void)
 }
 
 /**
+ * buf_acquire - creates a TIPC message buffer
+ * @size: message size (including TIPC header)
+ *
+ * Returns a new buffer with data pointers set to the specified size.
+ *
+ * NOTE: Headroom is reserved to allow prepending of a data link header.
+ *       There may also be unrequested tailroom present at the buffer's end.
+ */
+
+struct sk_buff *buf_acquire(u32 size)
+{
+	struct sk_buff *skb;
+	unsigned int buf_size = (BUF_HEADROOM + size + 3) & ~3u;
+
+	skb = alloc_skb_fclone(buf_size, GFP_ATOMIC);
+	if (skb) {
+		skb_reserve(skb, BUF_HEADROOM);
+		skb_put(skb, size);
+		skb->next = NULL;
+	}
+	return skb;
+}
+
+/**
  * tipc_core_stop_net - shut down TIPC networking sub-systems
  */
 
diff --git a/net/tipc/core.h b/net/tipc/core.h
index 1e149f5..1887990 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -328,29 +328,7 @@ static inline struct tipc_msg *buf_msg(struct sk_buff *skb)
 	return (struct tipc_msg *)skb->data;
 }
 
-/**
- * buf_acquire - creates a TIPC message buffer
- * @size: message size (including TIPC header)
- *
- * Returns a new buffer with data pointers set to the specified size.
- *
- * NOTE: Headroom is reserved to allow prepending of a data link header.
- *       There may also be unrequested tailroom present at the buffer's end.
- */
-
-static inline struct sk_buff *buf_acquire(u32 size)
-{
-	struct sk_buff *skb;
-	unsigned int buf_size = (BUF_HEADROOM + size + 3) & ~3u;
-
-	skb = alloc_skb_fclone(buf_size, GFP_ATOMIC);
-	if (skb) {
-		skb_reserve(skb, BUF_HEADROOM);
-		skb_put(skb, size);
-		skb->next = NULL;
-	}
-	return skb;
-}
+extern struct sk_buff *buf_acquire(u32 size);
 
 /**
  * buf_discard - frees a TIPC message buffer
-- 
1.7.1.rc2
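
The size computation in buf_acquire() rounds the requested length up to a
multiple of four bytes. The user-space sketch below shows only that
arithmetic. The demo_* names are hypothetical, and DEMO_BUF_HEADROOM is a
placeholder value chosen for the example, not the kernel's BUF_HEADROOM.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BUF_HEADROOM 48u	/* placeholder, not the kernel value */

/* Round n up to the next multiple of 4 (n itself if already aligned). */
uint32_t demo_round_up4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Bytes requested from the allocator for a message of 'size' bytes. */
uint32_t demo_buf_alloc_size(uint32_t size)
{
	/* guard against wrap-around in the addition below */
	assert(size <= UINT32_MAX - DEMO_BUF_HEADROOM - 3u);
	return demo_round_up4(DEMO_BUF_HEADROOM + size);
}

/* Padding bytes left after the message because of the rounding. */
uint32_t demo_rounding_tailroom(uint32_t size)
{
	return demo_buf_alloc_size(size) - (DEMO_BUF_HEADROOM + size);
}
```

With the placeholder headroom of 48, a 10-byte message asks for 60 bytes
and leaves 2 bytes of padding, which is one source of the unrequested
tailroom mentioned in the comment. The allocator may add more on top of
that.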



