* Re: Linux Kernel Development - A Practical Approach
From: Tapas Mishra @ 2010-10-05 10:49 UTC
To: Linux Kernel Explorer; +Cc: kernelnewbies, linux-kernel, netdev, linux-fsdevel
In-Reply-To: <AANLkTim9RFG799GVEkA0GjU_th_vTv_Y8SqYb1WLBUYp@mail.gmail.com>
On Tue, Oct 5, 2010 at 12:20 AM, Linux "Kernel" Explorer
<mylinuxlab@gmail.com> wrote:
> Hello Everyone,
>
> I am reading 'Linux Kernel Development' by Robert Love these days.
>
> This book takes you on a theoretical journey of the linux kernel
> world. Though the book is good, I do have my share of concerns.
Exactly, I have the same concerns.
www.crashcourse.ca
Check the above website; so far it is the most relevant resource I have
found that is easy to begin with.
You can then move on to other howtos on the internet that will help.
LKD is no doubt a good book, but I too did not find it easy to follow.
Jumping directly into the code is not easy in your situation, or in
mine. Having said that, I would advise you not to
rely on books. If possible, find someone who can give you
hands-on experience.
Then you will be able to understand what the textbooks are talking about.
Good advice I got from someone who works with processors: rather
than going through manuals,
jump into the code and play with it, and you will gradually understand.
There may be different views, but I would suggest you not read books.
* Re: [PATCH] SIW: Module initialization
From: Bart Van Assche @ 2010-10-05 10:57 UTC
To: Bernard Metzler
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1286261640-5121-1-git-send-email-bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>
On Tue, Oct 5, 2010 at 8:54 AM, Bernard Metzler <bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org> wrote:
> +static int loopback_enabled;
> +module_param(loopback_enabled, int, 0644);
> +MODULE_PARM_DESC(loopback_enabled, "enable_loopback");
A minor comment: since kernel 2.6.31 the type "bool" can be used for
boolean kernel module parameters.
> + * TODO: Dynamic device management (network device registration/removal).
The current implementation is such that one siw device is created for
each network device found at kernel module load time. That means that
you force the user to load the siw kernel module after all other
kernel modules that register a network device. I'm not sure that's a
good idea.
> + if (!siw_device) {
> + siw_device = siw_p;
> + siw_p->next = NULL;
> + } else {
> + siw_p->next = siw_device->next;
> + siw_device->next = siw_p;
> + }
Why a custom linked list implementation instead of using <linux/list.h> ?
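For illustration, the intrusive-list pattern behind <linux/list.h> can be sketched in userspace as below. This is a stand-in under stated assumptions, not the kernel header and not the siw patch: the helper names mirror the kernel API (list_head, LIST_HEAD_INIT, list_add_tail, list_del, container_of), while struct siw_dev_sketch, its ifindex field and siw_sketch_count() are hypothetical names invented for this example.

```c
/* Userspace stand-in for the kernel's <linux/list.h>. The names mirror the
 * kernel API, but this is an illustrative re-implementation, not kernel code. */
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev point back at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Insert new_node just before head, i.e. at the tail of the list. */
static inline void list_add_tail(struct list_head *new_node,
				 struct list_head *head)
{
	assert(new_node != NULL && head != NULL);
	new_node->prev = head->prev;
	new_node->next = head;
	head->prev->next = new_node;
	head->prev = new_node;
}

/* Unlink an entry and join its neighbours. The kernel version poisons the
 * stale pointers; this sketch simply clears them. */
static inline void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

/* Hypothetical device record; only the embedded list node matters here. */
struct siw_dev_sketch {
	int ifindex;
	struct list_head list;
};

/* Walk the list and count the entries with a matching ifindex. */
static inline int siw_sketch_count(const struct list_head *head, int ifindex)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next) {
		const struct siw_dev_sketch *dev =
			container_of(pos, struct siw_dev_sketch, list);
		if (dev->ifindex == ifindex)
			n++;
	}
	return n;
}
```

Because the head is a sentinel node, insertion and removal need no special case for an empty list, unlike the if/else in the quoted hunk.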
Bart.
* Re: [MeeGo-Dev][PATCH v3] Topcliff: Update PCH_CAN driver to 2.6.35
From: Marc Kleine-Budde @ 2010-10-05 11:08 UTC
To: Masayuki Ohtake
Cc: Wolfgang Grandegger, andrew.chih.howe.khor, qi.wang,
margie.foster, netdev, linux-kernel, yong.y.wang, socketcan-core,
kok.howg.ewe, joel.clark, Tomoya MORINAGA, meego-dev,
David S. Miller, Christian Pellegrin, Samuel Ortiz
In-Reply-To: <000401cb6477$25a0eba0$66f8800a@maildom.okisemi.com>
On 10/05/2010 12:21 PM, Masayuki Ohtake wrote:
> Hi Wolfgang,
>
> I could confirm below.
> With FIFO mode, it is able to receive packet with in-order.
> We are now implementing FIFO mode.
If FIFO is working you might also think about NAPI.
cheers, Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* [net-2.6 PATCH] MAINTAINERS: update Intel LAN Ethernet info
From: Jeff Kirsher @ 2010-10-05 11:15 UTC
To: davem; +Cc: netdev, linux-kernel, gospo, bphilips, Jeff Kirsher
- Add ixgbevf and docs files to the maintainers file
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
MAINTAINERS | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 44e6595..20a03b9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3050,16 +3050,27 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ixp2000/
-INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe)
+INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe/ixgbevf)
M: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
M: Jesse Brandeburg <jesse.brandeburg@intel.com>
M: Bruce Allan <bruce.w.allan@intel.com>
-M: Alex Duyck <alexander.h.duyck@intel.com>
+M: Carolyn Wyborny <carolyn.wyborny@intel.com>
+M: Don Skidmore <donald.c.skidmore@intel.com>
+M: Greg Rose <gregory.v.rose@intel.com>
M: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
+M: Alex Duyck <alexander.h.duyck@intel.com>
M: John Ronciak <john.ronciak@intel.com>
L: e1000-devel@lists.sourceforge.net
W: http://e1000.sourceforge.net/
S: Supported
+F: Documentation/networking/e100.txt
+F: Documentation/networking/e1000.txt
+F: Documentation/networking/e1000e.txt
+F: Documentation/networking/igb.txt
+F: Documentation/networking/igbvf.txt
+F: Documentation/networking/ixgb.txt
+F: Documentation/networking/ixgbe.txt
+F: Documentation/networking/ixgbevf.txt
F: drivers/net/e100.c
F: drivers/net/e1000/
F: drivers/net/e1000e/
@@ -3067,6 +3078,7 @@ F: drivers/net/igb/
F: drivers/net/igbvf/
F: drivers/net/ixgb/
F: drivers/net/ixgbe/
+F: drivers/net/ixgbevf/
INTEL PRO/WIRELESS 2100 NETWORK CONNECTION SUPPORT
L: linux-wireless@vger.kernel.org
* Re: checkentry function
From: Nicola Padovano @ 2010-10-05 11:16 UTC
To: Eric Dumazet; +Cc: Stephen Hemminger, netfilter-devel, netdev
In-Reply-To: <1286259838.2457.11.camel@edumazet-laptop>
>
> Because xxx_check() signature is not the one you use.
>
> Could you read source code of _current_ existing modules , and use
> copy/paste ?
>
> static int hashlimit_mt_check(const struct xt_mtchk_param *par)
> {
> ...
> }
As I wrote in a previous mail, this is the checkentry function
that I use in my source code to check whether the iptables command line
is valid.
[CHECK_ENTRY_CODE]
static bool xt_tarpit_check(const char *tablename, const void *entry,
			    const struct xt_target *target, void *targinfo,
			    unsigned int hook_mask)
{
	if (strcmp(tablename, "filter")) {
		printk(KERN_INFO "DEBUG: the tablename (not FILTER) is %s\n", tablename);
		return false;
	}
	return true;
}
[/CHECK_ENTRY_CODE]
But it doesn't work. NOTE: the function is entered, but the tablename
value is wrong (even when I pass the "-t filter" option on the
iptables command line).
I don't know what "static int hashlimit_mt_check(const struct
xt_mtchk_param *par)" is.
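For illustration, a single-argument check function in the style of the quoted hashlimit_mt_check() can be sketched in userspace as below. Assumptions are labelled: struct xt_tgchk_param_sketch is a cut-down stand-in for the kernel's struct xt_tgchk_param (only a table name and a hook mask are kept), xt_tarpit_check_sketch() is a hypothetical name, and the return convention (0 to accept, a negative errno to reject) is assumed from the int return type in the quoted prototype. If the kernel hands over one parameter pointer, a five-argument prototype would most likely read that pointer as tablename, which would explain the wrong value printed.

```c
/* Userspace sketch: xt_tgchk_param_sketch is a cut-down stand-in for the
 * kernel's struct xt_tgchk_param, keeping only the fields used here. */
#include <assert.h>
#include <errno.h>
#include <string.h>

struct xt_tgchk_param_sketch {
	const char *table;       /* table name, e.g. "filter" */
	unsigned int hook_mask;  /* hooks the rule may be reached from */
};

/* Single-argument check function: every input arrives through par.
 * Assumed convention: 0 accepts the rule, a negative errno rejects it. */
static inline int xt_tarpit_check_sketch(const struct xt_tgchk_param_sketch *par)
{
	assert(par != NULL && par->table != NULL);
	if (strcmp(par->table, "filter") != 0)
		return -EINVAL;
	return 0;
}
```

In a real module the function would be wired into the target's registration structure; that part is omitted here because it cannot be built outside a kernel tree.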
thank you
--
Nicola Padovano
e-mail: nicola.padovano@gmail.com
web: http://npadovano.altervista.org
"My only ambition is not be anything at all; it seems the most
sensible thing" (C. Bukowski)
* [PATCH 1/3] ixgbevf.txt: Update ixgbevf documentation
From: Jeff Kirsher @ 2010-10-05 11:16 UTC
To: rdunlap; +Cc: netdev, linux-doc, gospo, bphilips, Jeff Kirsher
Update the documentation for the ixgbevf (ixgbe virtual
function driver).
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
Documentation/networking/ixgbevf.txt | 40 +++-------------------------------
1 files changed, 3 insertions(+), 37 deletions(-)
mode change 100755 => 100644 Documentation/networking/ixgbevf.txt
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
old mode 100755
new mode 100644
index 19015de..21dd5d1
--- a/Documentation/networking/ixgbevf.txt
+++ b/Documentation/networking/ixgbevf.txt
@@ -1,19 +1,16 @@
Linux* Base Driver for Intel(R) Network Connection
==================================================
-November 24, 2009
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
Contents
========
-- In This Release
- Identifying Your Adapter
- Known Issues/Troubleshooting
- Support
-In This Release
-===============
-
This file describes the ixgbevf Linux* Base Driver for Intel Network
Connection.
@@ -33,7 +30,7 @@ Identifying Your Adapter
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
- http://support.intel.com/support/network/sb/CS-008441.htm
+ http://support.intel.com/support/go/network/adapter/idguide.htm
Known Issues/Troubleshooting
============================
@@ -57,34 +54,3 @@ or the Intel Wired Networking project hosted by Sourceforge at:
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
to the issue to e1000-devel@lists.sf.net
-
-License
-=======
-
-Intel 10 Gigabit Linux driver.
-Copyright(c) 1999 - 2009 Intel Corporation.
-
-This program is free software; you can redistribute it and/or modify it
-under the terms and conditions of the GNU General Public License,
-version 2, as published by the Free Software Foundation.
-
-This program is distributed in the hope it will be useful, but WITHOUT
-ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-more details.
-
-You should have received a copy of the GNU General Public License along with
-this program; if not, write to the Free Software Foundation, Inc.,
-51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
-The full GNU General Public License is included in this distribution in
-the file called "COPYING".
-
-Trademarks
-==========
-
-Intel, Itanium, and Pentium are trademarks or registered trademarks of
-Intel Corporation or its subsidiaries in the United States and other
-countries.
-
-* Other names and brands may be claimed as the property of others.
* [PATCH 2/3] e1000.txt: Update e1000 documentation
From: Jeff Kirsher @ 2010-10-05 11:17 UTC
To: rdunlap; +Cc: netdev, linux-doc, gospo, bphilips, Jeff Kirsher
In-Reply-To: <20101005111643.23000.38976.stgit@localhost.localdomain>
Updated the e1000 networking driver documentation.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
Documentation/networking/e1000.txt | 373 +++++++++---------------------------
1 files changed, 96 insertions(+), 277 deletions(-)
diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt
index 2df7186..d9271e7 100644
--- a/Documentation/networking/e1000.txt
+++ b/Documentation/networking/e1000.txt
@@ -1,82 +1,35 @@
Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters
===============================================================
-September 26, 2006
-
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
Contents
========
-- In This Release
- Identifying Your Adapter
-- Building and Installation
- Command Line Parameters
- Speed and Duplex Configuration
- Additional Configurations
-- Known Issues
- Support
-
-In This Release
-===============
-
-This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family
-of Adapters. This driver includes support for Itanium(R)2-based systems.
-
-For questions related to hardware requirements, refer to the documentation
-supplied with your Intel PRO/1000 adapter. All hardware requirements listed
-apply to use with Linux.
-
-The following features are now available in supported kernels:
- - Native VLANs
- - Channel Bonding (teaming)
- - SNMP
-
-Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.txt
-
-The driver information previously displayed in the /proc filesystem is not
-supported in this release. Alternatively, you can use ethtool (version 1.6
-or later), lspci, and ifconfig to obtain the same information.
-
-Instructions on updating ethtool can be found in the section "Additional
-Configurations" later in this document.
-
-NOTE: The Intel(R) 82562v 10/100 Network Connection only provides 10/100
-support.
-
-
Identifying Your Adapter
========================
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
- http://support.intel.com/support/network/adapter/pro100/21397.htm
+ http://support.intel.com/support/go/network/adapter/idguide.htm
For the latest Intel network drivers for Linux, refer to the following
website. In the search field, enter your adapter name or type, or use the
networking link on the left to search for your adapter:
- http://downloadfinder.intel.com/scripts-df/support_intel.asp
-
+ http://support.intel.com/support/go/network/adapter/home.htm
Command Line Parameters
=======================
-If the driver is built as a module, the following optional parameters
-are used by entering them on the command line with the modprobe command
-using this syntax:
-
- modprobe e1000 [<option>=<VAL1>,<VAL2>,...]
-
-For example, with two PRO/1000 PCI adapters, entering:
-
- modprobe e1000 TxDescriptors=80,128
-
-loads the e1000 driver with 80 TX descriptors for the first adapter and
-128 TX descriptors for the second adapter.
-
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
@@ -89,10 +42,6 @@ NOTES: For more information about the AutoNeg, Duplex, and Speed
parameters, see the application note at:
http://www.intel.com/design/network/applnots/ap450.htm
- A descriptor describes a data buffer and attributes related to
- the data buffer. This information is accessed by the hardware.
-
-
AutoNeg
-------
(Supported only on adapters with copper connections)
@@ -106,7 +55,6 @@ Duplex parameters must not be specified.
NOTE: Refer to the Speed and Duplex section of this readme for more
information on the AutoNeg parameter.
-
Duplex
------
(Supported only on adapters with copper connections)
@@ -119,7 +67,6 @@ set to auto-negotiate, the board auto-detects the correct duplex. If the
link partner is forced (either full or half), Duplex defaults to half-
duplex.
-
FlowControl
-----------
Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
@@ -128,16 +75,16 @@ Default Value: Reads flow control settings from the EEPROM
This parameter controls the automatic generation(Tx) and response(Rx)
to Ethernet PAUSE frames.
-
InterruptThrottleRate
---------------------
(not supported on Intel(R) 82542, 82543 or 82544-based adapters)
-Valid Range: 0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
+Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
+ 4=simplified balancing)
Default Value: 3
The driver can limit the amount of interrupts per second that the adapter
-will generate for incoming packets. It does this by writing a value to the
-adapter that is based on the maximum amount of interrupts that the adapter
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum amount of interrupts that the adapter
will generate per second.
Setting InterruptThrottleRate to a value greater or equal to 100
@@ -146,37 +93,43 @@ per second, even if more packets have come in. This reduces interrupt
load on the system and can lower CPU utilization under heavy load,
but will increase latency as packets are not processed as quickly.
-The default behaviour of the driver previously assumed a static
-InterruptThrottleRate value of 8000, providing a good fallback value for
-all traffic types,but lacking in small packet performance and latency.
-The hardware can handle many more small packets per second however, and
+The default behaviour of the driver previously assumed a static
+InterruptThrottleRate value of 8000, providing a good fallback value for
+all traffic types,but lacking in small packet performance and latency.
+The hardware can handle many more small packets per second however, and
for this reason an adaptive interrupt moderation algorithm was implemented.
Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which
-it dynamically adjusts the InterruptThrottleRate value based on the traffic
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
that it receives. After determining the type of incoming traffic in the last
-timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
for that traffic.
The algorithm classifies the incoming traffic every interval into
-classes. Once the class is determined, the InterruptThrottleRate value is
-adjusted to suit that traffic type the best. There are three classes defined:
+classes. Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
for small amounts of traffic and/or a significant percentage of small
-packets; and "Lowest latency", for almost completely small packets or
+packets; and "Lowest latency", for almost completely small packets or
minimal traffic.
-In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
-for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
-latency" or "Lowest latency" class, the InterruptThrottleRate is increased
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
stepwise to 20000. This default mode is suitable for most applications.
For situations where low latency is vital such as cluster or
grid computing, the algorithm can reduce latency even more when
InterruptThrottleRate is set to mode 1. In this mode, which operates
-the same as mode 3, the InterruptThrottleRate will be increased stepwise to
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
70000 for traffic in class "Lowest latency".
+In simplified mode the interrupt rate is based on the ratio of Tx and
+Rx traffic. If the bytes per second rate is approximately equal, the
+interrupt rate will drop as low as 2000 interrupts per second. If the
+traffic is mostly transmit or mostly receive, the interrupt rate could
+be as high as 8000.
+
Setting InterruptThrottleRate to 0 turns off any interrupt moderation
and may improve small packet latency, but is generally not suitable
for bulk throughput traffic.
@@ -212,8 +165,6 @@ NOTE: When e1000 is loaded with default settings and multiple adapters
be platform-specific. If CPU utilization is not a concern, use
RX_POLLING (NAPI) and default driver settings.
-
-
RxDescriptors
-------------
Valid Range: 80-256 for 82542 and 82543-based adapters
@@ -225,15 +176,14 @@ by the driver. Increasing this value allows the driver to buffer more
incoming packets, at the expense of increased system memory utilization.
Each descriptor is 16 bytes. A receive buffer is also allocated for each
-descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
+descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
on the MTU setting. The maximum MTU size is 16110.
-NOTE: MTU designates the frame size. It only needs to be set for Jumbo
- Frames. Depending on the available system resources, the request
- for a higher number of receive descriptors may be denied. In this
+NOTE: MTU designates the frame size. It only needs to be set for Jumbo
+ Frames. Depending on the available system resources, the request
+ for a higher number of receive descriptors may be denied. In this
case, use a lower number.
-
RxIntDelay
----------
Valid Range: 0-65535 (0=off)
@@ -254,7 +204,6 @@ CAUTION: When setting RxIntDelay to a value other than 0, adapters may
restoring the network connection. To eliminate the potential
for the hang ensure that RxIntDelay is set to 0.
-
RxAbsIntDelay
-------------
(This parameter is supported only on 82540, 82545 and later adapters.)
@@ -268,7 +217,6 @@ packet is received within the set amount of time. Proper tuning,
along with RxIntDelay, may improve traffic throughput in specific network
conditions.
-
Speed
-----
(This parameter is supported only on adapters with copper connections.)
@@ -280,7 +228,6 @@ Speed forces the line speed to the specified value in megabits per second
partner is set to auto-negotiate, the board will auto-detect the correct
speed. Duplex should also be set when Speed is set to either 10 or 100.
-
TxDescriptors
-------------
Valid Range: 80-256 for 82542 and 82543-based adapters
@@ -295,6 +242,36 @@ NOTE: Depending on the available system resources, the request for a
higher number of transmit descriptors may be denied. In this case,
use a lower number.
+TxDescriptorStep
+----------------
+Valid Range: 1 (use every Tx Descriptor)
+ 4 (use every 4th Tx Descriptor)
+
+Default Value: 1 (use every Tx Descriptor)
+
+On certain non-Intel architectures, it has been observed that intense TX
+traffic bursts of short packets may result in an improper descriptor
+writeback. If this occurs, the driver will report a "TX Timeout" and reset
+the adapter, after which the transmit flow will restart, though data may
+have stalled for as much as 10 seconds before it resumes.
+
+The improper writeback does not occur on the first descriptor in a system
+memory cache-line, which is typically 32 bytes, or 4 descriptors long.
+
+Setting TxDescriptorStep to a value of 4 will ensure that all TX descriptors
+are aligned to the start of a system memory cache line, and so this problem
+will not occur.
+
+NOTES: Setting TxDescriptorStep to 4 effectively reduces the number of
+ TxDescriptors available for transmits to 1/4 of the normal allocation.
+ This has a possible negative performance impact, which may be
+ compensated for by allocating more descriptors using the TxDescriptors
+ module parameter.
+
+ There are other conditions which may result in "TX Timeout", which will
+ not be resolved by the use of the TxDescriptorStep parameter. As the
+ issue addressed by this parameter has never been observed on Intel
+ Architecture platforms, it should not be used on Intel platforms.
TxIntDelay
----------
@@ -307,7 +284,6 @@ efficiency if properly tuned for specific network traffic. If the
system is reporting dropped transmits, this value may be set too high
causing the driver to run out of available transmit descriptors.
-
TxAbsIntDelay
-------------
(This parameter is supported only on 82540, 82545 and later adapters.)
@@ -330,6 +306,35 @@ Default Value: 1
A value of '1' indicates that the driver should enable IP checksum
offload for received packets (both UDP and TCP) to the adapter hardware.
+Copybreak
+---------
+Valid Range: 0-xxxxxxx (0=off)
+Default Value: 256
+Usage: insmod e1000.ko copybreak=128
+
+Driver copies all packets below or equaling this size to a fresh Rx
+buffer before handing it up the stack.
+
+This parameter is different than other parameters, in that it is a
+single (not 1,1,1 etc.) parameter applied to all driver instances and
+it is also available during runtime at
+/sys/module/e1000/parameters/copybreak
+
+SmartPowerDownEnable
+--------------------
+Valid Range: 0-1
+Default Value: 0 (disabled)
+
+Allows PHY to turn off in lower power states. The user can turn off
+this parameter in supported chipsets.
+
+KumeranLockLoss
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial
+silicon releases of ICH8 systems.
Speed and Duplex Configuration
==============================
@@ -385,40 +390,9 @@ If the link partner is forced to a specific speed and duplex, then this
parameter should not be used. Instead, use the Speed and Duplex parameters
previously mentioned to force the adapter to the same speed and duplex.
-
Additional Configurations
=========================
- Configuring the Driver on Different Distributions
- -------------------------------------------------
- Configuring a network driver to load properly when the system is started
- is distribution dependent. Typically, the configuration process involves
- adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well
- as editing other system startup scripts and/or configuration files. Many
- popular Linux distributions ship with tools to make these changes for you.
- To learn the proper way to configure a network device for your system,
- refer to your distribution documentation. If during this process you are
- asked for the driver or module name, the name for the Linux Base Driver
- for the Intel(R) PRO/1000 Family of Adapters is e1000.
-
- As an example, if you install the e1000 driver for two PRO/1000 adapters
- (eth0 and eth1) and set the speed and duplex to 10full and 100half, add
- the following to modules.conf or or modprobe.conf:
-
- alias eth0 e1000
- alias eth1 e1000
- options e1000 Speed=10,100 Duplex=2,1
-
- Viewing Link Messages
- ---------------------
- Link messages will not be displayed to the console if the distribution is
- restricting system messages. In order to see network driver link messages
- on your console, set dmesg to eight by entering the following:
-
- dmesg -n 8
-
- NOTE: This setting is not saved across reboots.
-
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the MTU to a value larger than
@@ -437,9 +411,11 @@ Additional Configurations
setting in a different location.
Notes:
-
- - To enable Jumbo Frames, increase the MTU size on the interface beyond
- 1500.
+ Degradation in throughput performance may be observed in some Jumbo frames
+ environments. If this is observed, increasing the application's socket buffer
+ size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
+ See the specific application manual and /usr/src/linux*/Documentation/
+ networking/ip-sysctl.txt for more details.
- The maximum MTU setting for Jumbo Frames is 16110. This value coincides
with the maximum Jumbo Frames size of 16128.
@@ -447,40 +423,11 @@ Additional Configurations
- Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or
loss of link.
- - Some Intel gigabit adapters that support Jumbo Frames have a frame size
- limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes.
- The adapters with this limitation are based on the Intel(R) 82571EB,
- 82572EI, 82573L and 80003ES2LAN controller. These correspond to the
- following product names:
- Intel(R) PRO/1000 PT Server Adapter
- Intel(R) PRO/1000 PT Desktop Adapter
- Intel(R) PRO/1000 PT Network Connection
- Intel(R) PRO/1000 PT Dual Port Server Adapter
- Intel(R) PRO/1000 PT Dual Port Network Connection
- Intel(R) PRO/1000 PF Server Adapter
- Intel(R) PRO/1000 PF Network Connection
- Intel(R) PRO/1000 PF Dual Port Server Adapter
- Intel(R) PRO/1000 PB Server Connection
- Intel(R) PRO/1000 PL Network Connection
- Intel(R) PRO/1000 EB Network Connection with I/O Acceleration
- Intel(R) PRO/1000 EB Backplane Connection with I/O Acceleration
- Intel(R) PRO/1000 PT Quad Port Server Adapter
-
- Adapters based on the Intel(R) 82542 and 82573V/E controller do not
support Jumbo Frames. These correspond to the following product names:
Intel(R) PRO/1000 Gigabit Server Adapter
Intel(R) PRO/1000 PM Network Connection
- - The following adapters do not support Jumbo Frames:
- Intel(R) 82562V 10/100 Network Connection
- Intel(R) 82566DM Gigabit Network Connection
- Intel(R) 82566DC Gigabit Network Connection
- Intel(R) 82566MM Gigabit Network Connection
- Intel(R) 82566MC Gigabit Network Connection
- Intel(R) 82562GT 10/100 Network Connection
- Intel(R) 82562G 10/100 Network Connection
-
-
Ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
@@ -490,142 +437,14 @@ Additional Configurations
The latest release of ethtool can be found from
http://sourceforge.net/projects/gkernel.
- NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support
- for a more complete ethtool feature set can be enabled by upgrading
- ethtool to ethtool-1.8.1.
-
Enabling Wake on LAN* (WoL)
---------------------------
- WoL is configured through the Ethtool* utility. Ethtool is included with
- all versions of Red Hat after Red Hat 7.2. For other Linux distributions,
- download and install Ethtool from the following website:
- http://sourceforge.net/projects/gkernel.
-
- For instructions on enabling WoL with Ethtool, refer to the website listed
- above.
+ WoL is configured through the Ethtool* utility.
WoL will be enabled on the system during the next shut down or reboot.
For this driver version, in order to enable WoL, the e1000 driver must be
loaded when shutting down or rebooting the system.
- Wake On LAN is only supported on port A for the following devices:
- Intel(R) PRO/1000 PT Dual Port Network Connection
- Intel(R) PRO/1000 PT Dual Port Server Connection
- Intel(R) PRO/1000 PT Dual Port Server Adapter
- Intel(R) PRO/1000 PF Dual Port Server Adapter
- Intel(R) PRO/1000 PT Quad Port Server Adapter
-
- NAPI
- ----
- NAPI (Rx polling mode) is enabled in the e1000 driver.
-
- See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
-
-
-Known Issues
-============
-
-Dropped Receive Packets on Half-duplex 10/100 Networks
-------------------------------------------------------
-If you have an Intel PCI Express adapter running at 10mbps or 100mbps, half-
-duplex, you may observe occasional dropped receive packets. There are no
-workarounds for this problem in this network configuration. The network must
-be updated to operate in full-duplex, and/or 1000mbps only.
-
-Jumbo Frames System Requirement
--------------------------------
-Memory allocation failures have been observed on Linux systems with 64 MB
-of RAM or less that are running Jumbo Frames. If you are using Jumbo
-Frames, your system may require more than the advertised minimum
-requirement of 64 MB of system memory.
-
-Performance Degradation with Jumbo Frames
------------------------------------------
-Degradation in throughput performance may be observed in some Jumbo frames
-environments. If this is observed, increasing the application's socket
-buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values
-may help. See the specific application manual and
-/usr/src/linux*/Documentation/
-networking/ip-sysctl.txt for more details.
-
-Jumbo Frames on Foundry BigIron 8000 switch
--------------------------------------------
-There is a known issue using Jumbo frames when connected to a Foundry
-BigIron 8000 switch. This is a 3rd party limitation. If you experience
-loss of packets, lower the MTU size.
-
-Allocating Rx Buffers when Using Jumbo Frames
----------------------------------------------
-Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
-the available memory is heavily fragmented. This issue may be seen with PCI-X
-adapters or with packet split disabled. This can be reduced or eliminated
-by changing the amount of available memory for receive buffer allocation, by
-increasing /proc/sys/vm/min_free_kbytes.
-
-Multiple Interfaces on Same Ethernet Broadcast Network
-------------------------------------------------------
-Due to the default ARP behavior on Linux, it is not possible to have
-one system on two IP networks in the same Ethernet broadcast domain
-(non-partitioned switch) behave as expected. All Ethernet interfaces
-will respond to IP traffic for any IP address assigned to the system.
-This results in unbalanced receive traffic.
-
-If you have multiple interfaces in a server, either turn on ARP
-filtering by entering:
-
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-(this only works if your kernel's version is higher than 2.4.5),
-
-NOTE: This setting is not saved across reboots. The configuration
-change can be made permanent by adding the line:
- net.ipv4.conf.all.arp_filter = 1
-to the file /etc/sysctl.conf
-
- or,
-
-install the interfaces in separate broadcast domains (either in
-different switches or in a switch partitioned to VLANs).
-
-82541/82547 can't link or are slow to link with some link partners
------------------------------------------------------------------
-There is a known compatibility issue with 82541/82547 and some
-low-end switches where the link will not be established, or will
-be slow to establish. In particular, these switches are known to
-be incompatible with 82541/82547:
-
- Planex FXG-08TE
- I-O Data ETG-SH8
-
-To workaround this issue, the driver can be compiled with an override
-of the PHY's master/slave setting. Forcing master or forcing slave
-mode will improve time-to-link.
-
- # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n>
-
-Where <n> is:
-
- 0 = Hardware default
- 1 = Master mode
- 2 = Slave mode
- 3 = Auto master/slave
-
-Disable rx flow control with ethtool
-------------------------------------
-In order to disable receive flow control using ethtool, you must turn
-off auto-negotiation on the same command line.
-
-For example:
-
- ethtool -A eth? autoneg off rx off
-
-Unplugging network cable while ethtool -p is running
-----------------------------------------------------
-In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging
-the network cable while ethtool -p is running will cause the system to
-become unresponsive to keyboard commands, except for control-alt-delete.
-Restarting the system appears to be the only remedy.
-
-
Support
=======
^ permalink raw reply related
* [PATCH 3/3] e1000e.txt: Add e1000e documentation
From: Jeff Kirsher @ 2010-10-05 11:17 UTC (permalink / raw)
To: rdunlap; +Cc: netdev, linux-doc, gospo, bphilips, Jeff Kirsher
In-Reply-To: <20101005111643.23000.38976.stgit@localhost.localdomain>
Adds documentation for the e1000e networking driver.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
Documentation/networking/e1000e.txt | 303 +++++++++++++++++++++++++++++++++++
1 files changed, 303 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/e1000e.txt
diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt
new file mode 100644
index 0000000..a3c6e01
--- /dev/null
+++ b/Documentation/networking/e1000e.txt
@@ -0,0 +1,303 @@
+Linux* Driver for Intel(R) Network Connection
+===============================================================
+
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Command Line Parameters
+- Additional Configurations
+- Support
+
+Identifying Your Adapter
+========================
+
+The e1000e driver supports all PCI Express Intel(R) Gigabit Network
+Connections, except those that are 82575, 82576 and 82580-based*.
+
+* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by
+ the e1000 driver, not the e1000e driver due to the 82546 part being used
+ behind a PCI Express bridge.
+
+For more information on how to identify your adapter, go to the Adapter &
+Driver ID Guide at:
+
+ http://support.intel.com/support/go/network/adapter/idguide.htm
+
+For the latest Intel network drivers for Linux, refer to the following
+website. In the search field, enter your adapter name or type, or use the
+networking link on the left to search for your adapter:
+
+ http://support.intel.com/support/go/network/adapter/home.htm
+
+Command Line Parameters
+=======================
+
+The default value for each parameter is generally the recommended setting,
+unless otherwise noted.
+
+NOTES: For more information about the InterruptThrottleRate,
+ RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
+ parameters, see the application note at:
+ http://www.intel.com/design/network/applnots/ap450.htm
+
+InterruptThrottleRate
+---------------------
+Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
+ 4=simplified balancing)
+Default Value: 3
+
+The driver can limit the number of interrupts per second that the adapter
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum number of interrupts that the adapter
+will generate per second.
+
+Setting InterruptThrottleRate to a value greater than or equal to 100
+will program the adapter to send out a maximum of that many interrupts
+per second, even if more packets have come in. This reduces interrupt
+load on the system and can lower CPU utilization under heavy load,
+but will increase latency as packets are not processed as quickly.
+
+The driver has two adaptive modes (setting 1 or 3) in which
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
+that it receives. After determining the type of incoming traffic in the last
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+for that traffic.
+
+The algorithm classifies the incoming traffic every interval into
+classes. Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
+"Bulk traffic", for large amounts of packets of normal size; "Low latency",
+for small amounts of traffic and/or a significant percentage of small
+packets; and "Lowest latency", for almost completely small packets or
+minimal traffic.
+
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
+stepwise to 20000. This default mode is suitable for most applications.
+
+For situations where low latency is vital such as cluster or
+grid computing, the algorithm can reduce latency even more when
+InterruptThrottleRate is set to mode 1. In this mode, which operates
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
+70000 for traffic in class "Lowest latency".
+
+In simplified mode the interrupt rate is based on the ratio of Tx and
+Rx traffic. If the bytes per second rate is approximately equal the
+interrupt rate will drop as low as 2000 interrupts per second. If the
+traffic is mostly transmit or mostly receive, the interrupt rate could
+be as high as 8000.
+
+Setting InterruptThrottleRate to 0 turns off any interrupt moderation
+and may improve small packet latency, but is generally not suitable
+for bulk throughput traffic.
+
+NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
+ RxAbsIntDelay parameters. In other words, minimizing the receive
+ and/or transmit absolute delays does not force the controller to
+ generate more interrupts than what the Interrupt Throttle Rate
+ allows.
+
+NOTE: When e1000e is loaded with default settings and multiple adapters
+ are in use simultaneously, the CPU utilization may increase non-
+ linearly. In order to limit the CPU utilization without impacting
+ the overall throughput, we recommend that you load the driver as
+ follows:
+
+ modprobe e1000e InterruptThrottleRate=3000,3000,3000
+
+ This sets the InterruptThrottleRate to 3000 interrupts/sec for
+ the first, second, and third instances of the driver. The range
+ of 2000 to 3000 interrupts per second works on a majority of
+ systems and is a good starting point, but the optimal value will
+ be platform-specific. If CPU utilization is not a concern, use
+ RX_POLLING (NAPI) and default driver settings.
+
+RxIntDelay
+----------
+Valid Range: 0-65535 (0=off)
+Default Value: 0
+
+This value delays the generation of receive interrupts in units of 1.024
+microseconds. Receive interrupt reduction can improve CPU efficiency if
+properly tuned for specific network traffic. Increasing this value adds
+extra latency to frame reception and can end up decreasing the throughput
+of TCP traffic. If the system is reporting dropped receives, this value
+may be set too high, causing the driver to run out of available receive
+descriptors.
+
+CAUTION: When setting RxIntDelay to a value other than 0, adapters may
+ hang (stop transmitting) under certain network conditions. If
+ this occurs a NETDEV WATCHDOG message is logged in the system
+ event log. In addition, the controller is automatically reset,
+ restoring the network connection. To eliminate the potential
+ for the hang ensure that RxIntDelay is set to 0.
+
+RxAbsIntDelay
+-------------
+Valid Range: 0-65535 (0=off)
+Default Value: 8
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+receive interrupt is generated. Useful only if RxIntDelay is non-zero,
+this value ensures that an interrupt is generated after the initial
+packet is received within the set amount of time. Proper tuning,
+along with RxIntDelay, may improve traffic throughput in specific network
+conditions.
+
+TxIntDelay
+----------
+Valid Range: 0-65535 (0=off)
+Default Value: 8
+
+This value delays the generation of transmit interrupts in units of
+1.024 microseconds. Transmit interrupt reduction can improve CPU
+efficiency if properly tuned for specific network traffic. If the
+system is reporting dropped transmits, this value may be set too high
+causing the driver to run out of available transmit descriptors.
+
+TxAbsIntDelay
+-------------
+Valid Range: 0-65535 (0=off)
+Default Value: 32
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
+this value ensures that an interrupt is generated after the initial
+packet is sent on the wire within the set amount of time. Proper tuning,
+along with TxIntDelay, may improve traffic throughput in specific
+network conditions.
+
+Copybreak
+---------
+Valid Range: 0-xxxxxxx (0=off)
+Default Value: 256
+
+The driver copies all packets of this size or smaller to a fresh Rx
+buffer before handing them up the stack.
+
+This parameter differs from the other parameters in that it is a
+single (not 1,1,1 etc.) parameter applied to all driver instances, and
+it is also available at runtime at
+/sys/module/e1000e/parameters/copybreak
+
+SmartPowerDownEnable
+--------------------
+Valid Range: 0-1
+Default Value: 0 (disabled)
+
+Allows PHY to turn off in lower power states. The user can set this parameter
+in supported chipsets.
+
+KumeranLockLoss
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial
+silicon releases of ICH8 systems.
+
+IntMode
+-------
+Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
+Default Value: 2
+
+Allows changing the interrupt mode at module load time, without requiring a
+recompile. If the driver load fails to enable a specific interrupt mode, the
+driver will try other interrupt modes, from least to most compatible. The
+interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1)
+interrupts, only MSI and Legacy will be attempted.
+
+CrcStripping
+------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+Strip the CRC from received packets before sending up the network stack. If
+you have a machine with a BMC enabled but cannot receive IPMI traffic after
+loading or enabling the driver, try disabling this feature.
+
+WriteProtectNVM
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+Set the hardware to ignore all write/erase cycles to the GbE region in the
+ICHx NVM (non-volatile memory). This feature can be disabled by the
+WriteProtectNVM module parameter (enabled by default) only after a hardware
+reset, but the machine must be power cycled before trying to enable writes.
+
+Note: if the kernel is built with CONFIG_STRICT_DEVMEM=y, the kernel boot
+option iomem=relaxed may need to be set before the root user can write the
+NVM from user space via ethtool.
+
+Additional Configurations
+=========================
+
+ Jumbo Frames
+ ------------
+ Jumbo Frames support is enabled by changing the MTU to a value larger than
+ the default of 1500. Use the ifconfig command to increase the MTU size.
+ For example:
+
+ ifconfig eth<x> mtu 9000 up
+
+ This setting is not saved across reboots.
+
+ Notes:
+
+ - The maximum MTU setting for Jumbo Frames is 9216. This value coincides
+ with the maximum Jumbo Frames size of 9234 bytes.
+
+ - Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in
+ poor performance or loss of link.
+
+ - Some adapters limit Jumbo Frames sized packets to a maximum of
+ 4096 bytes and some adapters do not support Jumbo Frames.
+
+
+ Ethtool
+ -------
+ The driver utilizes the ethtool interface for driver configuration and
+ diagnostics, as well as displaying statistical information. We
+ strongly recommend downloading the latest version of Ethtool at:
+
+ http://sourceforge.net/projects/gkernel.
+
+ Speed and Duplex
+ ----------------
+ Speed and Duplex are configured through the Ethtool* utility. For
+ instructions, refer to the Ethtool man page.
+
+ Enabling Wake on LAN* (WoL)
+ ---------------------------
+ WoL is configured through the Ethtool* utility. For instructions on
+ enabling WoL with Ethtool, refer to the Ethtool man page.
+
+ WoL will be enabled on the system during the next shut down or reboot.
+ For this driver version, in order to enable WoL, the e1000e driver must be
+ loaded when shutting down or rebooting the system.
+
+ In most cases Wake On LAN is only supported on port A for multiple port
+ adapters. To verify if a port supports Wake on LAN run ethtool eth<X>.
+
+
+Support
+=======
+
+For general information, go to the Intel support website at:
+
+ www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+ http://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on the supported
+kernel with a supported adapter, email the specific information related
+to the issue to e1000-devel@lists.sf.net
+
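As a reading aid for the InterruptThrottleRate and *IntDelay sections of the
patch above, here is a minimal userspace sketch of the per-class ceilings and
the delay units that the text describes. It is not the driver's code: the
names itr_mode, traffic_class, itr_ceiling and int_delay_ns are made up for
illustration, and only the numbers (4000, 20000, 70000, units of 1.024
microseconds) come from the documentation. Treating mode 1 as identical to
mode 3 except for the "Lowest latency" class is one reading of the text.

```c
#include <stdint.h>

/* Hypothetical names; only the numeric values are taken from e1000e.txt. */
enum itr_mode { ITR_DYNAMIC = 1, ITR_DYNAMIC_CONSERVATIVE = 3 };
enum traffic_class { LOWEST_LATENCY, LOW_LATENCY, BULK_TRAFFIC };

/*
 * Upper bound (interrupts/sec) that the adaptive modes step towards for a
 * given traffic class. Returns 0 for modes this sketch does not model.
 */
unsigned int itr_ceiling(int mode, enum traffic_class cls)
{
	if (mode != ITR_DYNAMIC && mode != ITR_DYNAMIC_CONSERVATIVE)
		return 0;
	if (cls == BULK_TRAFFIC)
		return 4000;
	/* Only mode 1 goes beyond 20000, and only for "Lowest latency". */
	if (mode == ITR_DYNAMIC && cls == LOWEST_LATENCY)
		return 70000;
	return 20000;
}

/* RxIntDelay and friends are expressed in units of 1.024 us = 1024 ns. */
uint64_t int_delay_ns(uint16_t ticks)
{
	return (uint64_t)ticks * 1024u;
}
```

With these definitions, the documented TxIntDelay default of 8 corresponds to
int_delay_ns(8), that is 8192 ns.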
^ permalink raw reply related
* [net-next-2.6 PATCH] ixgbe: Use affinity_hint when Flow Director is enabled
From: Jeff Kirsher @ 2010-10-05 11:27 UTC (permalink / raw)
To: davem; +Cc: netdev, gospo, bphilips, Peter P Waskiewicz Jr, Jeff Kirsher
From: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Use the new infrastructure to balance interrupts for flow
alignment when ATR or Flow Director are enabled.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ixgbe/ixgbe.h | 2 ++
drivers/net/ixgbe/ixgbe_main.c | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 5cebc37..a8c47b0 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -31,6 +31,7 @@
#include <linux/types.h>
#include <linux/pci.h>
#include <linux/netdevice.h>
+#include <linux/cpumask.h>
#include <linux/aer.h>
#include "ixgbe_type.h"
@@ -241,6 +242,7 @@ struct ixgbe_q_vector {
u8 tx_itr;
u8 rx_itr;
u32 eitr;
+ cpumask_var_t affinity_mask;
};
/* Helper macros to switch between ints/sec and what the register uses.
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index c35e13c..95dbf60 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1433,6 +1433,21 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
q_vector->eitr = adapter->rx_eitr_param;
ixgbe_write_eitr(q_vector);
+ /* If Flow Director is enabled, set interrupt affinity */
+ if ((adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) ||
+ (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)) {
+ /*
+ * Allocate the affinity_hint cpumask, assign the mask
+ * for this vector, and set our affinity_hint for
+ * this irq.
+ */
+ if (!alloc_cpumask_var(&q_vector->affinity_mask,
+ GFP_KERNEL))
+ return;
+ cpumask_set_cpu(v_idx, q_vector->affinity_mask);
+ irq_set_affinity_hint(adapter->msix_entries[v_idx].vector,
+ q_vector->affinity_mask);
+ }
}
if (adapter->hw.mac.type == ixgbe_mac_82598EB)
@@ -3816,6 +3831,7 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
u32 rxctrl;
u32 txdctl;
int i, j;
+ int num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
/* signal that we are down to the interrupt handler */
set_bit(__IXGBE_DOWN, &adapter->state);
@@ -3854,6 +3870,15 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
ixgbe_napi_disable_all(adapter);
+ /* Cleanup the affinity_hint CPU mask memory and callback */
+ for (i = 0; i < num_q_vectors; i++) {
+ struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
+ /* clear the affinity_mask in the IRQ descriptor */
+ irq_set_affinity_hint(adapter->msix_entries[i].vector, NULL);
+ /* release the CPU mask memory */
+ free_cpumask_var(q_vector->affinity_mask);
+ }
+
if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE ||
adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)
cancel_work_sync(&adapter->fdir_reinit_task);
^ permalink raw reply related
* Re: checkentry function
From: Jan Engelhardt @ 2010-10-05 11:32 UTC (permalink / raw)
To: Nicola Padovano; +Cc: Eric Dumazet, Stephen Hemminger, netfilter-devel, netdev
In-Reply-To: <AANLkTinVtz0VuLkOY5d8QjdvLMXOFD2EaWp_kSBbMJBW@mail.gmail.com>
On Tuesday 2010-10-05 13:16, Nicola Padovano wrote:
>>
>> Could you read source code of _current_ existing modules , and use
>> copy/paste ?
>>
>> static int hashlimit_mt_check(const struct xt_mtchk_param *par)
>> {
>> ...
>> }
>
>as i've written in a previously mail this is the checkentry function
>that i use in my source code to check if the iptables command line is
>a right line.
>
>[CHECK_ENTRY_CODE]
>static bool xt_tarpit_check(const char *tablename, const void *entry,
> const struct xt_target *target, void *targinfo,
> unsigned int hook_mask)
>
>i don't know what "static int hashlimit_mt_check(const struct
>xt_mtchk_param *par)" is...
It's the proper function header.
^ permalink raw reply
* Re: checkentry function
From: Nicola Padovano @ 2010-10-05 11:46 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Eric Dumazet, Stephen Hemminger, netfilter-devel, netdev
In-Reply-To: <alpine.LNX.2.01.1010051331550.4582@obet.zrqbmnf.qr>
On Tue, Oct 5, 2010 at 1:32 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
> On Tuesday 2010-10-05 13:16, Nicola Padovano wrote:
>>>
>>> Could you read source code of _current_ existing modules , and use
>>> copy/paste ?
>>>
>>> static int hashlimit_mt_check(const struct xt_mtchk_param *par)
>>> {
>>> ...
>>> }
>>
>>as i've written in a previously mail this is the checkentry function
>>that i use in my source code to check if the iptables command line is
>>a right line.
>>
>>[CHECK_ENTRY_CODE]
>>static bool xt_tarpit_check(const char *tablename, const void *entry,
>> const struct xt_target *target, void *targinfo,
>> unsigned int hook_mask)
>>
>>i don't know what "static int hashlimit_mt_check(const struct
>>xt_mtchk_param *par)" is...
>
> It's the proper function header.
>
this is the whole code:
[WHOLE_CODE]
static void function_target(const struct sk_buff *oskb,
struct rtable *ort)
{
...
}
/*
* target function, called every time the rule is satisfied
* standard behaviour: NF_DROP
*/
static unsigned int xt_tar_target(struct sk_buff *skb,
const struct net_device *in,
const struct net_device *out,
unsigned int hooknum,
const struct xt_target *target,
const void *targinfo)
{
struct rtable *rt = (void *)skb->_skb_refdst;
function_target(skb,rt);
return NF_DROP;
}
/*
* xt_tarpit_check, it allows only:
* 1. raw table & PRE_ROUTING hook or
* 2. filter table & (LOCAL_IN or FORWARD) hook
*/
static bool xt_function_check(const char *tablename, const void *entry,
const struct xt_target *target, void *targinfo,
unsigned int hook_mask)
{
if (strcmp(tablename, "filter"))
{
printk(KERN_INFO "!=filter %s\n",tablename);
return false;
}
return true;
}
static struct xt_target xt_tar_reg = {
.name = "FUN", /* target name */
.family = AF_INET, /* level 3 protocol */
.proto = IPPROTO_TCP, /* we recognize only tcp protocol */
.target = xt_tar_target, /* pointer to target function */
.checkentry = xt_function_check, /* pointer to check-entry function */
.me = THIS_MODULE,
};
/*
* initing module function
*/
static int __init xt_tar_init(void)
{
return xt_register_target(&xt_tar_reg);
}
/*
* delete module
*/
static void __exit xt_tar_exit(void)
{
xt_unregister_target(&xt_tar_reg);
printk(KERN_INFO "TARPIT> !!!exit!!! \n");
}
module_init(xt_tar_init);
module_exit(xt_tar_exit);
/* information about the module and its author */
MODULE_DESCRIPTION("TARPIT target, info: http://npadovano.altervista.org");
MODULE_AUTHOR("Nicola Padovano <nicola.padovano@gmail.com>");
MODULE_LICENSE("GPL");
MODULE_ALIAS("xt_TAR");
[/WHOLE_CODE]
--
Nicola Padovano
e-mail: nicola.padovano@gmail.com
web: http://npadovano.altervista.org
"My only ambition is not be anything at all; it seems the most
sensible thing" (C. Bukowski)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: checkentry function
From: Jan Engelhardt @ 2010-10-05 12:03 UTC (permalink / raw)
To: Nicola Padovano; +Cc: Eric Dumazet, Stephen Hemminger, netfilter-devel, netdev
In-Reply-To: <AANLkTim+5kETTN3sbo-woUAEN71woJb1eG9TX04tcyKW@mail.gmail.com>
On Tuesday 2010-10-05 13:46, Nicola Padovano wrote:
>On Tue, Oct 5, 2010 at 1:32 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
>> On Tuesday 2010-10-05 13:16, Nicola Padovano wrote:
>>>>
>>>> Could you read source code of _current_ existing modules , and use
>>>> copy/paste ?
>>>>
>>>> static int hashlimit_mt_check(const struct xt_mtchk_param *par)
>>>> {
>>>> ...
>>>> }
>>>
>
>this is the whole code:
>
>[WHOLE_CODE]
>static bool xt_function_check(const char *tablename, const void *entry,
> const struct xt_target *target, void *targinfo,
> unsigned int hook_mask)
>{
>
> if (strcmp(tablename, "filter"))
> {
> printk(KERN_INFO "!=filter %s\n",tablename);
> return false;
> }
>
> return true;
>}
And as Stephen said, the proper type for current kernels
is
(static) bool xt_function_check(const struct xt_mtchk_param *par).
If you are compiling against such, you should have gotten appropriate
warnings from gcc.
^ permalink raw reply
* Re: checkentry function
From: Eric Dumazet @ 2010-10-05 12:07 UTC (permalink / raw)
To: Nicola Padovano
Cc: Jan Engelhardt, Stephen Hemminger, netfilter-devel, netdev
In-Reply-To: <AANLkTim+5kETTN3sbo-woUAEN71woJb1eG9TX04tcyKW@mail.gmail.com>
Le mardi 05 octobre 2010 à 13:46 +0200, Nicola Padovano a écrit :
> On Tue, Oct 5, 2010 at 1:32 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
> > On Tuesday 2010-10-05 13:16, Nicola Padovano wrote:
> >>>
> >>> Could you read source code of _current_ existing modules , and use
> >>> copy/paste ?
> >>>
> >>> static int hashlimit_mt_check(const struct xt_mtchk_param *par)
> >>> {
> >>> ...
> >>> }
> >>
> >>as i've written in a previously mail this is the checkentry function
> >>that i use in my source code to check if the iptables command line is
> >>a right line.
> >>
> >>[CHECK_ENTRY_CODE]
> >>static bool xt_tarpit_check(const char *tablename, const void *entry,
> >> const struct xt_target *target, void *targinfo,
> >> unsigned int hook_mask)
> >>
> >>i don't know what "static int hashlimit_mt_check(const struct
> >>xt_mtchk_param *par)" is...
> >
> > It's the proper function header.
> >
>
> this is the whole code:
>
> [WHOLE_CODE]
> [/WHOLE_CODE]
>
Nicola
For the second and last time, could you please _read_ the _current_ kernel
source code, and correct your code, before asking us?
We do not support prehistoric kernels.
Thank you
Don't ask us if you are not able to find hashlimit_mt_check() or any
checkentry function in current kernel sources.
# find net/netfilter/ | xargs grep -n _check
net/netfilter/nf_conntrack_proto_dccp.c:596: if (net->ct.sysctl_checksum && hooknum == NF_INET_PRE_ROUTING &&
net/netfilter/nf_conntrack_proto_dccp.c:597: nf_checksum_partial(skb, hooknum, dataoff, cscov, IPPROTO_DCCP,
net/netfilter/xt_connmark.c:77:static int connmark_tg_check(const struct xt_tgchk_param *par)
net/netfilter/xt_connmark.c:107:static int connmark_mt_check(const struct xt_mtchk_param *par)
net/netfilter/xt_connmark.c:127: .checkentry = connmark_tg_check,
net/netfilter/xt_connmark.c:138: .checkentry = connmark_mt_check,
net/netfilter/xt_CT.c:57:static int xt_ct_tg_check(const struct xt_tgchk_param *par)
net/netfilter/xt_CT.c:149: .checkentry = xt_ct_tg_check,
...
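For readers following the thread, the shape of the newer prototype can be
sketched in plain userspace C. Everything below is a stand-in:
mock_tgchk_param is a hypothetical struct that imitates only the table member
of the kernel's struct xt_tgchk_param, and fun_tg_check is an invented name.
The grep output above shows current checkentry hooks taking a single
parameter-block pointer and returning int; the sketch assumes the usual
kernel convention of 0 for success and a negative errno for failure.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct xt_tgchk_param. */
struct mock_tgchk_param {
	const char *table;	/* name of the table the rule is added to */
};

/*
 * Same policy as the posted xt_function_check(): accept the rule only in
 * the "filter" table. All inputs arrive through one parameter block.
 */
int fun_tg_check(const struct mock_tgchk_param *par)
{
	if (strcmp(par->table, "filter") != 0)
		return -EINVAL;
	return 0;
}
```

In a real module the function would take the kernel's own parameter struct
and be assigned to .checkentry, as in the xt_connmark.c and xt_CT.c lines
listed above.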
^ permalink raw reply
* Re: [MeeGo-Dev][PATCH v3] Topcliff: Update PCH_CAN driver to 2.6.35
From: Masayuki Ohtake @ 2010-10-05 12:09 UTC (permalink / raw)
To: Marc Kleine-Budde
Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w, Samuel Ortiz,
margie.foster-ral2JQCrhuEAvxtiuMwx3w,
netdev-u79uwXL29TY76Z2rM5mHXA, Wolfgang Grandegger,
yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w, Christian Pellegrin,
Tomoya MORINAGA, meego-dev-WXzIur8shnEAvxtiuMwx3w,
David S. Miller, joel.clark-ral2JQCrhuEAvxtiuMwx3w,
qi.wang-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <4CAB0732.3010400@pengutronix.de>
Hi Marc,
On Tuesday, October 05, 2010 8:08 PM, Marc Kleine-Budde wrote:
> If FIFO is working you might also think about NAPI.
I think NAPI isn't necessary for our CAN driver.
NAPI is for high-speed networking.
CAN is NOT high-speed.
In fact, some accepted CAN drivers don't have NAPI.
Thanks, Ohtake(OKISemi)
^ permalink raw reply
* Re: [PATCH] sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
From: Américo Wang @ 2010-10-05 13:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: Américo Wang, Robin Holt, Andrew Morton, linux-kernel,
Willy Tarreau, David S. Miller, netdev, James Morris,
Hideaki YOSHIFUJI, Pekka Savola (ipv6), Patrick McHardy,
Alexey Kuznetsov
In-Reply-To: <1286188701.18293.57.camel@edumazet-laptop>
On Mon, Oct 04, 2010 at 12:38:21PM +0200, Eric Dumazet wrote:
>Le lundi 04 octobre 2010 à 18:35 +0800, Américo Wang a écrit :
>
>> Your patch does fix the problem, but seems not a good solution,
>> we should skip all min/max checking if ->extra(1|2) is NULL,
>> instead of checking it every time within the loop.
>
>Please do submit a patch, we'll see if you come to a better solution,
>with no added code size (this is slow path, I dont care for checking it
>'every time winthin the loop')
>
>
I have one, but it has only been compile-tested. :)
I will test it tomorrow.
NOT-Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
---
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f88552c..345a193 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2448,86 +2448,119 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
do_proc_dointvec_minmax_conv, &param);
}
-static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int write,
- void __user *buffer,
+static int __doulongvec_minmax_read(void *data, void __user *buffer,
size_t *lenp, loff_t *ppos,
unsigned long convmul,
unsigned long convdiv)
{
- unsigned long *i, *min, *max;
- int vleft, first = 1, err = 0;
- unsigned long page = 0;
- size_t left;
- char *kbuf;
+ unsigned long *i = (unsigned long *) data;
+ int err = 0;
+ bool first = true;
+ size_t left = *lenp;
- if (!data || !table->maxlen || !*lenp || (*ppos && !write)) {
- *lenp = 0;
- return 0;
+ for (; left; i++, first=false) {
+ unsigned long val;
+
+ val = convdiv * (*i) / convmul;
+ if (!first)
+ err = proc_put_char(&buffer, &left, '\t');
+ err = proc_put_long(&buffer, &left, val, false);
+ if (err)
+ break;
}
- i = (unsigned long *) data;
- min = (unsigned long *) table->extra1;
- max = (unsigned long *) table->extra2;
- vleft = table->maxlen / sizeof(unsigned long);
- left = *lenp;
+ if (!first && left && !err)
+ err = proc_put_char(&buffer, &left, '\n');
- if (write) {
- if (left > PAGE_SIZE - 1)
- left = PAGE_SIZE - 1;
- page = __get_free_page(GFP_TEMPORARY);
- kbuf = (char *) page;
- if (!kbuf)
- return -ENOMEM;
- if (copy_from_user(kbuf, buffer, left)) {
- err = -EFAULT;
- goto free;
- }
- kbuf[left] = 0;
+ *lenp -= left;
+ *ppos += *lenp;
+ return err;
+}
+
+static int __doulongvec_minmax_write(void *data, void __user *buffer,
+ size_t *lenp, loff_t *ppos, int vleft,
+ unsigned long min, unsigned long max)
+{
+ char *kbuf;
+ size_t left = *lenp;
+ unsigned long page = 0;
+ unsigned long *i = (unsigned long *) data;
+ int err = 0;
+ bool first = true;
+
+ if (left > PAGE_SIZE - 1)
+ left = PAGE_SIZE - 1;
+ page = __get_free_page(GFP_TEMPORARY);
+ kbuf = (char *) page;
+ if (!kbuf)
+ return -ENOMEM;
+ if (copy_from_user(kbuf, buffer, left)) {
+ err = -EFAULT;
+ goto free;
}
+ kbuf[left] = 0;
- for (; left && vleft--; i++, min++, max++, first=0) {
+ for (; left && vleft--; i++, min++, max++, first=false) {
unsigned long val;
+ bool neg;
- if (write) {
- bool neg;
-
- left -= proc_skip_spaces(&kbuf);
+ left -= proc_skip_spaces(&kbuf);
- err = proc_get_long(&kbuf, &left, &val, &neg,
- proc_wspace_sep,
- sizeof(proc_wspace_sep), NULL);
- if (err)
- break;
- if (neg)
- continue;
- if ((min && val < *min) || (max && val > *max))
- continue;
- *i = val;
- } else {
- val = convdiv * (*i) / convmul;
- if (!first)
- err = proc_put_char(&buffer, &left, '\t');
- err = proc_put_long(&buffer, &left, val, false);
- if (err)
- break;
- }
+ err = proc_get_long(&kbuf, &left, &val, &neg,
+ proc_wspace_sep,
+ sizeof(proc_wspace_sep), NULL);
+ if (err)
+ break;
+ if (neg)
+ continue;
+ if (val < min || val > max)
+ continue;
+ *i = val;
}
- if (!write && !first && left && !err)
- err = proc_put_char(&buffer, &left, '\n');
- if (write && !err)
+ if (!err)
left -= proc_skip_spaces(&kbuf);
free:
- if (write) {
- free_page(page);
- if (first)
- return err ? : -EINVAL;
- }
+ free_page(page);
+ if (first)
+ return err ? : -EINVAL;
+
*lenp -= left;
*ppos += *lenp;
return err;
}
+static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int write,
+ void __user *buffer,
+ size_t *lenp, loff_t *ppos,
+ unsigned long convmul,
+ unsigned long convdiv)
+{
+ if (!data || !table->maxlen || !*lenp || (*ppos && !write)) {
+ *lenp = 0;
+ return 0;
+ }
+
+ if (write) {
+ unsigned long min, max;
+ int vleft;
+
+ vleft = table->maxlen / sizeof(unsigned long);
+ if (table->extra1)
+ min = *(unsigned long *) table->extra1;
+ else
+ min = 0;
+ if (table->extra2)
+ max = *(unsigned long *) table->extra2;
+ else
+ max = ULONG_MAX;
+ return __doulongvec_minmax_write(data, buffer, lenp,
+ ppos, vleft, min, max);
+ } else
+ return __doulongvec_minmax_read(data, buffer, lenp,
+ ppos, convmul, convdiv);
+}
+
static int do_proc_doulongvec_minmax(struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos,
^ permalink raw reply related
* Re: [PATCH] SIW: Module initialization
From: Bernard Metzler @ 2010-10-05 13:12 UTC (permalink / raw)
To: Bart Van Assche
Cc: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <AANLkTik5=i+A5_OpU0rVyfYi=ibgS9UWMu82vMCmM=PN-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote on 10/05/2010 12:57:21 PM:
> On Tue, Oct 5, 2010 at 8:54 AM, Bernard Metzler <bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org> wrote:
> > +static int loopback_enabled;
> > +module_param(loopback_enabled, int, 0644);
> > +MODULE_PARM_DESC(loopback_enabled, "enable_loopback");
>
> A minor comment: since kernel 2.6.31 the type "bool" can be used for
> boolean kernel module parameters.
>
oh, thanks. there are currently two more occurrences (MPA crc, 0copy
transmit).
it will be changed accordingly.
> > + * TODO: Dynamic device management (network device registration/removal).
>
> The current implementation is such that one siw device is created for
> each network device found at kernel module load time. That means that
> you force the user to load the siw kernel module after all other
> kernel modules that register a network device. I'm not sure that's a
> good idea.
>
good point. do you have a suggestion here - would you like to see
siw to be enabled more selectively?
iwarp is a protocol on top of TCP, explicitly defining the
semantics of data fetching and placement. end-to-end connectivity
and efficient data shipping is provided by TCP/IP. not taking into
account dedicated RDMA hardware, any TCP stream may carry iwarp
traffic. from that point of view, binding a software based
rdma stack to dedicated devices is a concession to the
given environment, in particular to the given rdma
connection management. therefore, we started with binding to all
available network devices.
> > + if (!siw_device) {
> > + siw_device = siw_p;
> > + siw_p->next = NULL;
> > + } else {
> > + siw_p->next = siw_device->next;
> > + siw_device->next = siw_p;
> > + }
>
> Why a custom linked list implementation instead of using <linux/list.h>?
>
i agree. will be changed.
Bernard.
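For reference, a minimal userspace sketch of the <linux/list.h> idiom
suggested above. The helpers below (list_head, list_add, list_del,
container_of) are re-implemented locally and only mimic the kernel names so
that the example builds outside the kernel; struct siw_dev_sketch and its
fields are hypothetical and not taken from the siw patch.

```c
#include <stddef.h>

/* Userspace stand-ins that mimic the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry' and leave it pointing at itself. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Hypothetical device record; the real siw device struct has other fields. */
struct siw_dev_sketch {
	int ifindex;
	struct list_head list;	/* replaces a hand-rolled 'next' pointer */
};

/* Empty circular list: the head points at itself. */
static struct list_head siw_devlist = LIST_HEAD_INIT(siw_devlist);

static void siw_dev_register(struct siw_dev_sketch *dev)
{
	list_add(&dev->list, &siw_devlist);
}

/* Walk the list and return the number of registered devices. */
static int siw_dev_count(void)
{
	struct list_head *pos;
	int n = 0;

	for (pos = siw_devlist.next; pos != &siw_devlist; pos = pos->next)
		n++;
	return n;
}
```

With the sentinel head there is no special case for the empty list, which
is what the if/else in the quoted hunk had to handle by hand.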
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] bonding: fix WARN_ON when writing to bond_master sysfs file (v2)
From: Neil Horman @ 2010-10-05 13:39 UTC (permalink / raw)
To: netdev; +Cc: bonding-devel, fubar, davem, shemminger
In-Reply-To: <20101004202112.GA27897@hmsreliant.think-freely.org>
Ok, V2 of this patch, taking Stephen's notes into account. Switched to using
__dev_get_by_name to avoid reference count inc/dec.
Fix a WARN_ON failure in bond_masters sysfs file
Got a report of this warning recently:
bonding: bond0 is being created...
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:590 proc_register+0x14d/0x185()
Hardware name: ProLiant BL465c G1
proc_dir_entry 'bonding/bond0' already registered
Modules linked in: bonding ipv6 tg3 bnx2 shpchp amd64_edac_mod edac_core ipmi_si ipmi_msghandler serio_raw i2c_piix4 k8temp edac_mce_amd hpwdt microcode hpsa cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 935, comm: ifup-eth Not tainted 2.6.33.5-124.fc13.x86_64 #1
Call Trace:
[<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e
[<ffffffff8114bf0b>] proc_register+0x14d/0x185
[<ffffffff8114c20c>] proc_create_data+0x87/0xa1
[<ffffffffa0211e9b>] bond_create_proc_entry+0x55/0x95 [bonding]
[<ffffffffa0215e5d>] bond_init+0x95/0xd0 [bonding]
[<ffffffff8138cd97>] register_netdevice+0xdd/0x29e
[<ffffffffa021240b>] bond_create+0x8e/0xb8 [bonding]
[<ffffffffa021c4be>] bonding_store_bonds+0xb3/0x1c1 [bonding]
[<ffffffff812aec85>] class_attr_store+0x27/0x29
[<ffffffff8115423d>] sysfs_write_file+0x10f/0x14b
[<ffffffff81101acf>] vfs_write+0xa9/0x106
[<ffffffff81101be2>] sys_write+0x45/0x69
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
---[ end trace a677c3f7f8b16b1e ]---
bonding: Bond creation failed.
It happens because a user space writer to bond_master can try to register an
already existing bond interface name. Fix it by teaching bond_create to check
for the existence of devices with that name first in cases where a non-NULL name
parameter has been passed in.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
bond_main.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index fb70c3e..985cbc1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5168,6 +5168,15 @@ int bond_create(struct net *net, const char *name)
res = dev_alloc_name(bond_dev, "bond%d");
if (res < 0)
goto out;
+ } else {
+ /*
+ * If we're given a name to register
+ * we need to ensure that its not already
+ * registered
+ */
+ res = -EEXIST;
+ if (__dev_get_by_name(net, name) != NULL)
+ goto out;
}
res = register_netdevice(bond_dev);
^ permalink raw reply related
* Re: [PATCH] SIW: iWARP Protocol headers
From: Steve Wise @ 2010-10-05 13:53 UTC (permalink / raw)
To: Bernard Metzler
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1286261630-5085-1-git-send-email-bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>
On 10/05/2010 01:53 AM, Bernard Metzler wrote:
> ---
> drivers/infiniband/hw/siw/iwarp.h | 324 +++++++++++++++++++++++++++++++++++++
> 1 files changed, 324 insertions(+), 0 deletions(-)
> create mode 100644 drivers/infiniband/hw/siw/iwarp.h
>
> diff --git a/drivers/infiniband/hw/siw/iwarp.h b/drivers/infiniband/hw/siw/iwarp.h
> new file mode 100644
> index 0000000..762c1d3
> --- /dev/null
> +++ b/drivers/infiniband/hw/siw/iwarp.h
> @@ -0,0 +1,324 @@
> +/*
> + * Software iWARP device driver for Linux
> + *
> + * Authors: Bernard Metzler<bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>
> + * Fredy Neeser<nfd-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>
> + *
> + * Copyright (c) 2008-2010, IBM Corporation
> + *
> + * This software is available to you under a choice of one of two
> + * licenses. You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * BSD license below:
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above copyright notice,
> + * this list of conditions and the following disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + *
> + * - Neither the name of IBM nor the names of its contributors may be
> + * used to endorse or promote products derived from this software without
> + * specific prior written permission.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef _IWARP_H
> +#define _IWARP_H
> +
> +#include<rdma/rdma_user_cm.h> /* RDMA_MAX_PRIVATE_DATA */
> +#include<linux/types.h>
> +#include<asm/byteorder.h>
> +
> +
> +#define RDMAP_VERSION 1
> +#define DDP_VERSION 1
> +#define MPA_REVISION_1 1
> +#define MPA_MAX_PRIVDATA RDMA_MAX_PRIVATE_DATA
> +#define MPA_KEY_REQ "MPA ID Req Frame"
> +#define MPA_KEY_REP "MPA ID Rep Frame"
> +
> +struct mpa_rr_params {
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> + __u16 res:5,
> + r:1,
> + c:1,
> + m:1,
> + rev:8;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> + __u16 m:1,
> + c:1,
> + r:1,
> + res:5,
> + rev:8;
> +#else
> +#error "Adjust your<asm/byteorder.h> defines"
> +#endif
> + __u16 pd_len;
> +};
> +
> +/*
> + * MPA request/reply header
> + */
> +struct mpa_rr {
> + __u8 key[16];
> + struct mpa_rr_params params;
> +};
> +
> +/*
> + * Don't change the layout/size of this struct!
> + */
> +struct mpa_marker {
> + __u16 rsvd;
> + __u16 fpdu_hmd; /* FPDU header-marker distance (= MPA's FPDUPTR) */
> +};
> +
> +#define MPA_MARKER_SPACING 512
> +#define MPA_HDR_SIZE 2
> +
> +/*
> + * MPA marker size:
> + * - Standards-compliant marker insertion: Use sizeof(struct mpa_marker)
> + * - "Invisible markers" for testing sender's marker insertion
> + * without affecting receiver: Use 0
> + */
> +#define MPA_MARKER_SIZE sizeof(struct mpa_marker)
> +
> +
> +/*
> + * maximum MPA trailer
> + */
> +struct mpa_trailer {
> + char pad[4];
> + __u32 crc;
> +};
> +
> +#define MPA_CRC_SIZE 4
> +
> +
> +/*
> + * Common portion of iWARP headers (MPA, DDP, RDMAP)
> + * for any FPDU
> + */
> +struct iwarp_ctrl {
> + __u16 mpa_len;
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> + __u16 dv:2, /* DDP Version */
> + rsvd:4, /* DDP reserved, MBZ */
> + l:1, /* DDP Last flag */
> + t:1, /* DDP Tagged flag */
> + opcode:4, /* RDMAP opcode */
> + rsv:2, /* RDMAP reserved, MBZ */
> + rv:2; /* RDMAP Version, 01 for IETF, 00 for RDMAC */
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> + __u16 t:1, /* DDP Tagged flag */
> + l:1, /* DDP Last flag */
> + rsvd:4, /* DDP reserved, MBZ */
> + dv:2, /* DDP Version */
> + rv:2, /* RDMAP Version, 01 for IETF, 00 for RDMAC */
> + rsv:2, /* RDMAP reserved, MBZ */
> + opcode:4; /* RDMAP opcode */
> +#else
> +#error "Adjust your<asm/byteorder.h> defines"
> +#endif
> +};
> +
> +
> +struct rdmap_terminate_ctrl {
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> + __u32 etype:4,
> + layer:4,
> + ecode:8,
> + rsvd1:5,
> + r:1,
> + d:1,
> + m:1,
> + rsvd2:8;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> + __u32 layer:4,
> + etype:4,
> + ecode:8,
> + m:1,
> + d:1,
> + r:1,
> + rsvd1:5,
> + rsvd2:8;
> +#else
> +#error "Adjust your<asm/byteorder.h> defines"
> +#endif
> +};
> +
> +
> +struct iwarp_rdma_write {
> + struct iwarp_ctrl ctrl;
> + __u32 sink_stag;
> + __u64 sink_to;
> +} __attribute__((__packed__));
> +
> +struct iwarp_rdma_rreq {
> + struct iwarp_ctrl ctrl;
> + __u32 rsvd;
> + __u32 ddp_qn;
> + __u32 ddp_msn;
> + __u32 ddp_mo;
> + __u32 sink_stag;
> + __u64 sink_to;
> + __u32 read_size;
> + __u32 source_stag;
> + __u64 source_to;
> +} __attribute__((__packed__));
> +
> +struct iwarp_rdma_rresp {
> + struct iwarp_ctrl ctrl;
> + __u32 sink_stag;
> + __u64 sink_to;
> +} __attribute__((__packed__));
> +
> +struct iwarp_send {
> + struct iwarp_ctrl ctrl;
> + __u32 rsvd;
> + __u32 ddp_qn;
> + __u32 ddp_msn;
> + __u32 ddp_mo;
> +} __attribute__((__packed__));
> +
> +struct iwarp_send_inv {
> + struct iwarp_ctrl ctrl;
> + __u32 inval_stag;
> + __u32 ddp_qn;
> + __u32 ddp_msn;
> + __u32 ddp_mo;
> +} __attribute__((__packed__));
> +
> +struct iwarp_terminate {
> + struct iwarp_ctrl ctrl;
> + __u32 rsvd;
> + __u32 ddp_qn;
> + __u32 ddp_msn;
> + __u32 ddp_mo;
> + struct rdmap_terminate_ctrl term_ctrl;
> +} __attribute__((__packed__));
> +
> +
> +/*
> + * Common portion of iWARP headers (MPA, DDP, RDMAP)
> + * for an FPDU carrying an untagged DDP segment
> + */
> +struct iwarp_ctrl_untagged {
> + struct iwarp_ctrl ctrl;
> + __u32 rsvd;
> + __u32 ddp_qn;
> + __u32 ddp_msn;
> + __u32 ddp_mo;
> +} __attribute__((__packed__));
> +
> +/*
> + * Common portion of iWARP headers (MPA, DDP, RDMAP)
> + * for an FPDU carrying a tagged DDP segment
> + */
> +struct iwarp_ctrl_tagged {
> + struct iwarp_ctrl ctrl;
> + __u32 ddp_stag;
> + __u64 ddp_to;
> +} __attribute__((__packed__));
> +
>
All of the above header structures should use __beXX types since the
fields are all in Network Byte Order.
Also, did you run sparse on the patches (Documentation/sparse.txt)?
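To illustrate the point about byte order, here is a small userspace sketch.
In the kernel the fields would be declared __be16/__be32/__be64 and
converted with cpu_to_be32(), be32_to_cpu() and friends, so that sparse can
flag missing conversions. The typedefs and the struct tagged_hdr_sketch
below are stand-ins made up for this example; plain typedefs give no such
checking and only document intent.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htons(), htonl(), ntohl() */

/* Stand-ins for the kernel's sparse-annotated __be16/__be32/__be64. */
typedef uint16_t be16_t;
typedef uint32_t be32_t;
typedef uint64_t be64_t;

/* Hypothetical tagged header: every field is stored in network byte order. */
struct tagged_hdr_sketch {
	be16_t mpa_len;
	be16_t ctrl;
	be32_t sink_stag;
	be64_t sink_to;
} __attribute__((__packed__));

/* Portable 64-bit host-to-big-endian conversion (most significant byte first). */
static be64_t cpu_to_be64_sketch(uint64_t v)
{
	uint8_t b[8];
	be64_t out;

	for (int i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Convert once when building the header ... */
static void tagged_hdr_fill(struct tagged_hdr_sketch *h, uint16_t len,
			    uint32_t stag, uint64_t to)
{
	h->mpa_len = htons(len);
	h->ctrl = 0;
	h->sink_stag = htonl(stag);
	h->sink_to = cpu_to_be64_sketch(to);
}

/* ... and once when reading a field back into host order. */
static uint32_t tagged_hdr_stag(const struct tagged_hdr_sketch *h)
{
	return ntohl(h->sink_stag);
}
```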
> +union iwarp_hdrs {
> + struct iwarp_ctrl ctrl;
> + struct iwarp_ctrl_untagged c_untagged;
> + struct iwarp_ctrl_tagged c_tagged;
> + struct iwarp_rdma_write rwrite;
> + struct iwarp_rdma_rreq rreq;
> + struct iwarp_rdma_rresp rresp;
> + struct iwarp_terminate terminate;
> + struct iwarp_send send;
> + struct iwarp_send_inv send_inv;
> +};
> +
> +
> +#define MPA_MIN_FRAG ((sizeof(union iwarp_hdrs) + MPA_CRC_SIZE))
> +
> +enum ddp_etype {
> + DDP_ETYPE_CATASTROPHIC = 0x0,
> + DDP_ETYPE_TAGGED_BUF = 0x1,
> + DDP_ETYPE_UNTAGGED_BUF = 0x2,
> + DDP_ETYPE_RSVD = 0x3
> +};
> +
> +enum ddp_ecode {
> + DDP_ECODE_CATASTROPHIC = 0x00,
> + /* Tagged Buffer Errors */
> + DDP_ECODE_T_INVALID_STAG = 0x00,
> + DDP_ECODE_T_BASE_BOUNDS = 0x01,
> + DDP_ECODE_T_STAG_NOT_ASSOC = 0x02,
> + DDP_ECODE_T_TO_WRAP = 0x03,
> + DDP_ECODE_T_DDP_VERSION = 0x04,
> + /* Untagged Buffer Errors */
> + DDP_ECODE_UT_INVALID_QN = 0x01,
> + DDP_ECODE_UT_INVALID_MSN_NOBUF = 0x02,
> + DDP_ECODE_UT_INVALID_MSN_RANGE = 0x03,
> + DDP_ECODE_UT_INVALID_MO = 0x04,
> + DDP_ECODE_UT_MSG_TOOLONG = 0x05,
> + DDP_ECODE_UT_DDP_VERSION = 0x06
> +};
> +
> +
> +enum rdmap_untagged_qn {
> + RDMAP_UNTAGGED_QN_SEND = 0,
> + RDMAP_UNTAGGED_QN_RDMA_READ = 1,
> + RDMAP_UNTAGGED_QN_TERMINATE = 2,
> + RDMAP_UNTAGGED_QN_COUNT = 3
> +};
> +
> +enum rdmap_etype {
> + RDMAP_ETYPE_CATASTROPHIC = 0x0,
> + RDMAP_ETYPE_REMOTE_PROTECTION = 0x1,
> + RDMAP_ETYPE_REMOTE_OPERATION = 0x2
> +};
> +
> +enum rdmap_ecode {
> + RDMAP_ECODE_INVALID_STAG = 0x00,
> + RDMAP_ECODE_BASE_BOUNDS = 0x01,
> + RDMAP_ECODE_ACCESS_RIGHTS = 0x02,
> + RDMAP_ECODE_STAG_NOT_ASSOC = 0x03,
> + RDMAP_ECODE_TO_WRAP = 0x04,
> + RDMAP_ECODE_RDMAP_VERSION = 0x05,
> + RDMAP_ECODE_UNEXPECTED_OPCODE = 0x06,
> + RDMAP_ECODE_CATASTROPHIC_STREAM = 0x07,
> + RDMAP_ECODE_CATASTROPHIC_GLOBAL = 0x08,
> + RDMAP_ECODE_STAG_NOT_INVALIDATE = 0x09,
> + RDMAP_ECODE_UNSPECIFIED = 0xff
> +};
> +
> +enum rdmap_elayer {
> + RDMAP_ERROR_LAYER_RDMA = 0x00,
> + RDMAP_ERROR_LAYER_DDP = 0x01,
> + RDMAP_ERROR_LAYER_LLP = 0x02 /* eg., MPA */
> +};
> +
> +enum rdma_opcode {
> + RDMAP_RDMA_WRITE = 0x0,
> + RDMAP_RDMA_READ_REQ = 0x1,
> + RDMAP_RDMA_READ_RESP = 0x2,
> + RDMAP_SEND = 0x3,
> + RDMAP_SEND_INVAL = 0x4,
> + RDMAP_SEND_SE = 0x5,
> + RDMAP_SEND_SE_INVAL = 0x6,
> + RDMAP_TERMINATE = 0x7,
> + RDMAP_NOT_SUPPORTED = RDMAP_TERMINATE + 1
> +};
> +
> +#endif
>
^ permalink raw reply
* Re: [PATCH] SIW: User interface
From: Steve Wise @ 2010-10-05 14:17 UTC (permalink / raw)
To: Bernard Metzler
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1286261647-5139-1-git-send-email-bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>
On 10/05/2010 01:54 AM, Bernard Metzler wrote:
<snip>
> +
> +/*
> + * siw_post_send()
> + *
> + * Post a list of S-WR's to a SQ.
> + *
> + * @ofa_qp: OFA QP contained in siw QP
> + * @wr: Null terminated list of user WR's
> + * @bad_wr: Points to failing WR in case of synchronous failure.
> + */
> +int siw_post_send(struct ib_qp *ofa_qp, struct ib_send_wr *wr,
> + struct ib_send_wr **bad_wr)
> +{
> + struct siw_wqe *wqe = NULL;
> + struct siw_qp *qp = siw_qp_ofa2siw(ofa_qp);
> +
> + unsigned long flags;
> + int rv = 0;
> +
> + dprint(DBG_WR|DBG_TX, "(QP%d): state=%d\n",
> + QP_ID(qp), qp->attrs.state);
> +
> + /*
> + * Acquire QP state lock for reading. The idea is that a
> + * user cannot move the QP out of RTS during TX/RX processing.
> + */
> + down_read(&qp->state_lock);
> +
>
I don't think you can use a rw_semaphore here because it potentially can
block. You cannot block/sleep in the post_send/post_recv (and some
other) RDMA provider functions. See
Documentation/infiniband/core_locking.txt.
Steve.
^ permalink raw reply
* [PATCH net-next] dccp:
From: Stephen Hemminger @ 2010-10-05 14:24 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, David Miller; +Cc: dccp, netdev
Remove dead code and make some functions static.
Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
net/dccp/dccp.h | 2 --
net/dccp/feat.c | 10 ----------
net/dccp/feat.h | 1 -
net/dccp/options.c | 4 +---
net/dccp/proto.c | 48 ++++++++++++++++++++++++------------------------
5 files changed, 25 insertions(+), 40 deletions(-)
--- a/net/dccp/dccp.h 2010-10-05 23:09:10.000117419 +0900
+++ b/net/dccp/dccp.h 2010-10-05 23:22:37.879120065 +0900
@@ -246,7 +246,6 @@ static inline void dccp_clear_xmit_timer
extern unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu);
extern const char *dccp_packet_name(const int type);
-extern const char *dccp_state_name(const int state);
extern void dccp_set_state(struct sock *sk, const int state);
extern void dccp_done(struct sock *sk);
@@ -449,7 +448,6 @@ extern int dccp_insert_options_rsk(struc
extern int dccp_insert_option_elapsed_time(struct sk_buff *skb, u32 elapsed);
extern u32 dccp_timestamp(void);
extern void dccp_timestamping_init(void);
-extern int dccp_insert_option_timestamp(struct sk_buff *skb);
extern int dccp_insert_option(struct sk_buff *skb, unsigned char option,
const void *value, unsigned char len);
--- a/net/dccp/feat.c 2010-10-05 23:09:09.828117827 +0900
+++ b/net/dccp/feat.c 2010-10-05 23:14:58.315118201 +0900
@@ -730,16 +730,6 @@ int dccp_feat_register_sp(struct sock *s
0, list, len);
}
-/* Analogous to dccp_feat_register_sp(), but for non-negotiable values */
-int dccp_feat_register_nn(struct sock *sk, u8 feat, u64 val)
-{
- /* any changes must be registered before establishing the connection */
- if (sk->sk_state != DCCP_CLOSED)
- return -EISCONN;
- if (dccp_feat_type(feat) != FEAT_NN)
- return -EINVAL;
- return __feat_register_nn(&dccp_sk(sk)->dccps_featneg, feat, 0, val);
-}
/*
* Tracking features whose value depend on the choice of CCID
--- a/net/dccp/options.c 2010-10-05 23:09:09.884117478 +0900
+++ b/net/dccp/options.c 2010-10-05 23:14:27.375117940 +0900
@@ -369,7 +369,7 @@ int dccp_insert_option_elapsed_time(stru
EXPORT_SYMBOL_GPL(dccp_insert_option_elapsed_time);
-int dccp_insert_option_timestamp(struct sk_buff *skb)
+static int dccp_insert_option_timestamp(struct sk_buff *skb)
{
__be32 now = htonl(dccp_timestamp());
/* yes this will overflow but that is the point as we want a
@@ -378,8 +378,6 @@ int dccp_insert_option_timestamp(struct
return dccp_insert_option(skb, DCCPO_TIMESTAMP, &now, sizeof(now));
}
-EXPORT_SYMBOL_GPL(dccp_insert_option_timestamp);
-
static int dccp_insert_option_timestamp_echo(struct dccp_sock *dp,
struct dccp_request_sock *dreq,
struct sk_buff *skb)
--- a/net/dccp/proto.c 2010-10-05 23:09:09.956117344 +0900
+++ b/net/dccp/proto.c 2010-10-05 23:22:09.171119892 +0900
@@ -50,6 +50,30 @@ EXPORT_SYMBOL_GPL(dccp_hashinfo);
/* the maximum queue length for tx in packets. 0 is no limit */
int sysctl_dccp_tx_qlen __read_mostly = 5;
+#ifdef CONFIG_IP_DCCP_DEBUG
+static const char *dccp_state_name(const int state)
+{
+ static const char *const dccp_state_names[] = {
+ [DCCP_OPEN] = "OPEN",
+ [DCCP_REQUESTING] = "REQUESTING",
+ [DCCP_PARTOPEN] = "PARTOPEN",
+ [DCCP_LISTEN] = "LISTEN",
+ [DCCP_RESPOND] = "RESPOND",
+ [DCCP_CLOSING] = "CLOSING",
+ [DCCP_ACTIVE_CLOSEREQ] = "CLOSEREQ",
+ [DCCP_PASSIVE_CLOSE] = "PASSIVE_CLOSE",
+ [DCCP_PASSIVE_CLOSEREQ] = "PASSIVE_CLOSEREQ",
+ [DCCP_TIME_WAIT] = "TIME_WAIT",
+ [DCCP_CLOSED] = "CLOSED",
+ };
+
+ if (state >= DCCP_MAX_STATES)
+ return "INVALID STATE!";
+ else
+ return dccp_state_names[state];
+}
+#endif
+
void dccp_set_state(struct sock *sk, const int state)
{
const int oldstate = sk->sk_state;
@@ -146,30 +170,6 @@ const char *dccp_packet_name(const int t
EXPORT_SYMBOL_GPL(dccp_packet_name);
-const char *dccp_state_name(const int state)
-{
- static const char *const dccp_state_names[] = {
- [DCCP_OPEN] = "OPEN",
- [DCCP_REQUESTING] = "REQUESTING",
- [DCCP_PARTOPEN] = "PARTOPEN",
- [DCCP_LISTEN] = "LISTEN",
- [DCCP_RESPOND] = "RESPOND",
- [DCCP_CLOSING] = "CLOSING",
- [DCCP_ACTIVE_CLOSEREQ] = "CLOSEREQ",
- [DCCP_PASSIVE_CLOSE] = "PASSIVE_CLOSE",
- [DCCP_PASSIVE_CLOSEREQ] = "PASSIVE_CLOSEREQ",
- [DCCP_TIME_WAIT] = "TIME_WAIT",
- [DCCP_CLOSED] = "CLOSED",
- };
-
- if (state >= DCCP_MAX_STATES)
- return "INVALID STATE!";
- else
- return dccp_state_names[state];
-}
-
-EXPORT_SYMBOL_GPL(dccp_state_name);
-
int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized)
{
struct dccp_sock *dp = dccp_sk(sk);
--- a/net/dccp/feat.h 2010-10-05 23:12:45.999118130 +0900
+++ b/net/dccp/feat.h 2010-10-05 23:13:10.347118219 +0900
@@ -111,7 +111,6 @@ extern int dccp_feat_init(struct sock *
extern void dccp_feat_initialise_sysctls(void);
extern int dccp_feat_register_sp(struct sock *sk, u8 feat, u8 is_local,
u8 const *list, u8 len);
-extern int dccp_feat_register_nn(struct sock *sk, u8 feat, u64 val);
extern int dccp_feat_parse_options(struct sock *, struct dccp_request_sock *,
u8 mand, u8 opt, u8 feat, u8 *val, u8 len);
extern int dccp_feat_clone_list(struct list_head const *, struct list_head *);
^ permalink raw reply
* [net-next PATCH] igb: update adapter stats when reading /proc/net/dev.
From: Jesper Dangaard Brouer @ 2010-10-05 14:18 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Jeff Kirsher
Network driver igb: Improve the accuracy of stats in /proc/net/dev by
updating the adapter stats when reading /proc/net/dev. Currently the
stats are updated by the watchdog timer every 2 sec, or when getting
stats via ethtool -S.
A number of userspace apps read these /proc/net/dev stats every second,
e.g. ifstat, which then reports a seemingly very bursty traffic pattern
that does not reflect the actual traffic.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
---
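A rough illustration of the effect described above, not part of the patch.
It assumes a constant true rate of 100 packets/s (a made-up number) and a
counter that is only refreshed every 2 seconds, then computes what a tool
polling once per second would report. All names here are invented for the
example.

```c
#include <stdio.h>

#define TRUE_RATE_PPS	100	/* assumed constant traffic, packets per second */
#define UPDATE_PERIOD_S	2	/* refresh interval of the counter, in seconds */

/* Counter value visible at second 't' when it is refreshed only every
 * UPDATE_PERIOD_S seconds. */
static unsigned long stale_counter(unsigned int t)
{
	unsigned int last_update = t - (t % UPDATE_PERIOD_S);

	return (unsigned long)TRUE_RATE_PPS * last_update;
}

/* Rate a once-per-second poller derives at second 't' (requires t >= 1). */
static unsigned long observed_rate(unsigned int t)
{
	return stale_counter(t) - stale_counter(t - 1);
}

/* Print the observed rate for seconds 1..seconds. */
static void print_rates(unsigned int seconds)
{
	for (unsigned int t = 1; t <= seconds; t++)
		printf("t=%us: %lu pkts/s\n", t, observed_rate(t));
}
```

Under these assumptions the reported rate alternates between 0 and 200
packets/s even though the traffic is steady, which is the false burstiness
the change is meant to remove.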
drivers/net/igb/igb_main.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 55edcb7..6cec297 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -4218,11 +4218,17 @@ static void igb_reset_task(struct work_struct *work)
* @netdev: network interface device structure
*
* Returns the address of the device statistics structure.
- * The statistics are actually updated from the timer callback.
+ * The statistics are also updated from the timer callback
+ * igb_watchdog_task().
**/
static struct net_device_stats *igb_get_stats(struct net_device *netdev)
{
- /* only return the current stats */
+ struct igb_adapter *adapter = netdev_priv(netdev);
+
+ /* update stats */
+ igb_update_stats(adapter);
+
+ /* return the current stats */
return &netdev->stats;
}
@@ -4307,7 +4313,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
void igb_update_stats(struct igb_adapter *adapter)
{
- struct net_device_stats *net_stats = igb_get_stats(adapter->netdev);
+ struct net_device_stats *net_stats = &adapter->netdev->stats;
struct e1000_hw *hw = &adapter->hw;
struct pci_dev *pdev = adapter->pdev;
u32 reg, mpc;
^ permalink raw reply related
* Re: [PATCH] SIW: Object management
From: Steve Wise @ 2010-10-05 14:26 UTC (permalink / raw)
To: Bernard Metzler; +Cc: netdev, linux-rdma
In-Reply-To: <1286261665-5175-1-git-send-email-bmt@zurich.ibm.com>
On 10/05/2010 01:54 AM, Bernard Metzler wrote:
<snip>
> +
> +/***** routines for WQE handling ***/
> +
> +/*
> + * siw_wqe_get()
> + *
> + * Get new WQE. For READ RESPONSE, take it from the free list which
> + * has a maximum size of maximum inbound READs. All other WQE are
> + * malloc'ed which creates some overhead. Consider change to
> + *
> + * 1. malloc WR only if it cannot be synchonously completed, or
> + * 2. operate own cache of reuseable WQE's.
> + *
> + * Current code trusts on malloc efficiency.
> + */
> +inline struct siw_wqe *siw_wqe_get(struct siw_qp *qp, enum siw_wr_opcode op)
> +{
> + struct siw_wqe *wqe;
> +
> + if (op == SIW_WR_RDMA_READ_RESP) {
> + spin_lock(&qp->freelist_lock);
> + if (!(list_empty(&qp->wqe_freelist))) {
> + wqe = list_entry(qp->wqe_freelist.next,
> + struct siw_wqe, list);
> + list_del(&wqe->list);
> + spin_unlock(&qp->freelist_lock);
> + wqe->processed = 0;
> + dprint(DBG_OBJ|DBG_WR,
> + "(QP%d): WQE from FreeList p: %p\n",
> + QP_ID(qp), wqe);
> + } else {
> + spin_unlock(&qp->freelist_lock);
> + wqe = NULL;
> + dprint(DBG_ON|DBG_OBJ|DBG_WR,
> + "(QP%d): FreeList empty!\n", QP_ID(qp));
> + }
> + } else {
> + wqe = kzalloc(sizeof(struct siw_wqe), GFP_KERNEL);
> + dprint(DBG_OBJ|DBG_WR, "(QP%d): New WQE p: %p\n",
> + QP_ID(qp), wqe);
> + }
>
I think you can't allocate with GFP_KERNEL here if this is called from the
post_ functions, since a GFP_KERNEL allocation may sleep. I think you might
want to pre-allocate these when you create the QP...
Steve.
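A minimal userspace sketch of the pre-allocation idea, under stated
assumptions: all WQEs are allocated once at QP creation, and the fast path
only pops from a free list. struct qp_sketch, struct wqe_sketch and the
function names are hypothetical, calloc()/free() stand in for the kernel
allocator, and locking is omitted (kernel code would protect the free list
with a spinlock such as the freelist_lock in the quoted hunk).

```c
#include <stdlib.h>
#include <stddef.h>

struct wqe_sketch {
	struct wqe_sketch *next_free;	/* link while on the free list */
	int processed;
};

struct qp_sketch {
	struct wqe_sketch *pool;	/* single allocation made at creation */
	struct wqe_sketch *freelist;	/* WQEs currently available */
};

/* Slow path: may allocate, so call it only when the QP is created. */
static int qp_sketch_init(struct qp_sketch *qp, size_t depth)
{
	size_t i;

	qp->pool = calloc(depth, sizeof(*qp->pool));
	if (!qp->pool)
		return -1;
	qp->freelist = NULL;
	for (i = 0; i < depth; i++) {
		qp->pool[i].next_free = qp->freelist;
		qp->freelist = &qp->pool[i];
	}
	return 0;
}

/* Fast path: no allocation; returns NULL when the pool is exhausted. */
static struct wqe_sketch *wqe_sketch_get(struct qp_sketch *qp)
{
	struct wqe_sketch *wqe = qp->freelist;

	if (wqe) {
		qp->freelist = wqe->next_free;
		wqe->next_free = NULL;
		wqe->processed = 0;
	}
	return wqe;
}

/* Return a WQE to the pool once its work request has completed. */
static void wqe_sketch_put(struct qp_sketch *qp, struct wqe_sketch *wqe)
{
	wqe->next_free = qp->freelist;
	qp->freelist = wqe;
}

static void qp_sketch_destroy(struct qp_sketch *qp)
{
	free(qp->pool);
	qp->pool = NULL;
	qp->freelist = NULL;
}
```

The pool depth would presumably follow the queue sizes requested at QP
creation, so an empty free list maps to a "queue full" error in the post
path rather than to an allocation attempt.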
^ permalink raw reply
* Re: [PATCH] SIW: User interface
From: Bernard Metzler @ 2010-10-05 14:29 UTC (permalink / raw)
To: Steve Wise; +Cc: linux-rdma, netdev
In-Reply-To: <4CAB337E.1040205@opengridcomputing.com>
Steve Wise <swise@opengridcomputing.com> wrote on 10/05/2010 04:17:34 PM:
> On 10/05/2010 01:54 AM, Bernard Metzler wrote:
>
>
> <snip>
>
> > +
> > +/*
> > + * siw_post_send()
> > + *
> > + * Post a list of S-WR's to a SQ.
> > + *
> > + * @ofa_qp: OFA QP contained in siw QP
> > + * @wr: Null terminated list of user WR's
> > + * @bad_wr: Points to failing WR in case of synchronous failure.
> > + */
> > +int siw_post_send(struct ib_qp *ofa_qp, struct ib_send_wr *wr,
> > + struct ib_send_wr **bad_wr)
> > +{
> > + struct siw_wqe *wqe = NULL;
> > + struct siw_qp *qp = siw_qp_ofa2siw(ofa_qp);
> > +
> > + unsigned long flags;
> > + int rv = 0;
> > +
> > + dprint(DBG_WR|DBG_TX, "(QP%d): state=%d\n",
> > + QP_ID(qp), qp->attrs.state);
> > +
> > + /*
> > + * Acquire QP state lock for reading. The idea is that a
> > + * user cannot move the QP out of RTS during TX/RX processing.
> > + */
> > + down_read(&qp->state_lock);
> > +
> >
>
> I don't think you can use a rw_semaphore here because it potentially can
> block. You cannot block/sleep in the post_send/post_recv (and some
> other) RDMA provider functions. See
> Documentation/infiniband/core_locking.txt.
>
>
ah, ok.
with that, a down_read_trylock() would solve the issue...?
given the limited set of errno values - what would you suggest
as a meaningful return value? EBUSY, EINVAL, ...?
thanks!
bernard.
> Steve.
^ permalink raw reply
* Re: [PATCH] SIW: User interface
From: Steve Wise @ 2010-10-05 14:32 UTC (permalink / raw)
To: Bernard Metzler; +Cc: linux-rdma, netdev
In-Reply-To: <OF07B553EF.B746D69E-ONC12577B3.004EAD6A-C12577B3.004F92B0@ch.ibm.com>
On 10/05/2010 09:29 AM, Bernard Metzler wrote:
> Steve Wise<swise@opengridcomputing.com> wrote on 10/05/2010 04:17:34 PM:
>
>
>> Steve Wise<swise@opengridcomputing.com>
>> 10/05/2010 04:17 PM
>>
>> To
>>
>> Bernard Metzler<bmt@zurich.ibm.com>
>>
>> cc
>>
>> netdev@vger.kernel.org, linux-rdma@vger.kernel.org
>>
>> Subject
>>
>> Re: [PATCH] SIW: User interface
>>
>> On 10/05/2010 01:54 AM, Bernard Metzler wrote:
>>
>>
>> <snip>
>>
>>
>>> +
>>> +/*
>>> + * siw_post_send()
>>> + *
>>> + * Post a list of S-WR's to a SQ.
>>> + *
>>> + * @ofa_qp: OFA QP contained in siw QP
>>> + * @wr: Null terminated list of user WR's
>>> + * @bad_wr: Points to failing WR in case of synchronous failure.
>>> + */
>>> +int siw_post_send(struct ib_qp *ofa_qp, struct ib_send_wr *wr,
>>> + struct ib_send_wr **bad_wr)
>>> +{
>>> + struct siw_wqe *wqe = NULL;
>>> + struct siw_qp *qp = siw_qp_ofa2siw(ofa_qp);
>>> +
>>> + unsigned long flags;
>>> + int rv = 0;
>>> +
>>> + dprint(DBG_WR|DBG_TX, "(QP%d): state=%d\n",
>>> + QP_ID(qp), qp->attrs.state);
>>> +
>>> + /*
>>> + * Acquire QP state lock for reading. The idea is that a
>>> + * user cannot move the QP out of RTS during TX/RX processing.
>>> + */
>>> + down_read(&qp->state_lock);
>>> +
>>>
>>>
>> I don't think you can use a rw_semaphore here because it potentially can
>>
>
>> block. You cannot block/sleep in the post_send/post_recv (and some
>> other) RDMA provider functions. See
>> Documentation/infiniband/core_locking.txt.
>>
>>
>>
> ah, ok.
> with that, a down_read_trylock() would solve the issue...?
> given the limited set of errno values - what would you suggest
> as a meaningful return value? EBUSY, EINVAL, ...?
>
>
I think it is expected that you should implement this without requiring
the blocking semaphore. Returning an error will cause the application
to bail.
Steve.
^ permalink raw reply
* Re: [PATCH] bonding: fix to rejoin multicast groups immediately
From: Flavio Leitner @ 2010-10-05 14:34 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20101005.001338.52208103.davem@davemloft.net>
On Tue, Oct 05, 2010 at 12:13:38AM -0700, David Miller wrote:
> From: Flavio Leitner <fleitner@redhat.com>
> Date: Wed, 29 Sep 2010 04:12:07 -0300
>
> > It should rejoin multicast groups immediately when
> > the failover happens to restore the multicast traffic.
> >
> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
>
> I suspect the IGMPv3 handling via a delayed action, as is currently
> implemented, is on purpose and is done so to follow the specification
> of the IGMPv3 RFCs.
>
> Therefore you have to explain why your new behavior is so desirable
> and in particular why something as undesirable as violating the RFCs
> is therefore warranted.
That patch only changes the behavior for bonding during a link
failure, so if we have a bond in active-backup or any other mode
with a current-active-slave, the initialization will happen just fine,
following the IGMP specs.
However, neither the backup slave interface nor the backup switch
connected to the backup slave knows about the mcast groups. Thus when a
link failure happens, we shouldn't rely on timers, or we stay out of the
mcast group and lose traffic.
E.g. the V1 spec says that we shouldn't send a membership report
if there has been one in the last minute, because that means the switch
has been notified and the system will receive mcast traffic for that group.
Therefore, if the system sees one and a link failure happens right after
that, the backup slave will send another membership report only one minute
later. During this time the system loses traffic.
--
Flavio
^ permalink raw reply