* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
From: Eric Dumazet @ 2009-10-22 4:57 UTC (permalink / raw)
To: Bill Fink; +Cc: William Allen Simpson, netdev
In-Reply-To: <20091022003245.5cd4885c.billfink@mindspring.com>
Bill Fink wrote:
> And as mentioned previously, the global options can be quite useful
> in certain test scenarios. I also agree the per route settings are
> a very useful addition. I think the global and per route settings
> are complementary and shouldn't be thought of as in conflict with
> one another.
>
Absolutely, global setting is a must when an admin wants a quick path.
The most flexible scheme would be to have two bits per route, plus
2 bits on the global configuration.
global conf:
00 : timestamps OFF, unless a route setting is not 00
01 : timestamps ON, unless a route setting is not 00
10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
11 : Force timestamps ON, ignore route settings
Route settings (used *only* if global setting is 0Y)
00 : global conf is used
01 : Force timestamps OFF for this route
10 : Force timestamps ON for this route
11 : complement global conf
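To make the proposed table concrete, here is a minimal C sketch of how the effective timestamp setting could be resolved from the 2-bit global value and the 2-bit per-route value. The function name and the plain-integer encoding are assumptions for illustration; no such helper exists in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical resolver for the proposed 2-bit global + 2-bit per-route
 * timestamp policy (original proposal's encoding).
 */
static bool tcp_ts_effective(unsigned int global, unsigned int route)
{
	assert(global <= 3 && route <= 3);

	/* global 1X: force timestamps to X, ignore the route setting */
	if (global & 2)
		return (global & 1) != 0;

	/* global 0Y: Y is the default, which the route may override */
	bool dflt = (global & 1) != 0;

	switch (route) {
	case 0:			/* 00: use the global default */
		return dflt;
	case 1:			/* 01: force OFF for this route */
		return false;
	case 2:			/* 10: force ON for this route */
		return true;
	default:		/* 11: complement the global default */
		return !dflt;
	}
}
```

Because the global 1X check runs first, an emergency global override always wins, matching the "ignore route settings" rows.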
^ permalink raw reply
* [PATCH 1/3] netxen: fix i2c init
From: Dhananjay Phadke @ 2009-10-22 5:39 UTC (permalink / raw)
To: davem; +Cc: netdev
In-Reply-To: <1256189943-20477-1-git-send-email-dhananjay@netxen.com>
Avoid resetting the subsys ID in the i2c block. Also remove a duplicate
check for address translation errors.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
drivers/net/netxen/netxen_nic_hdr.h | 1 +
drivers/net/netxen/netxen_nic_init.c | 8 ++------
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_hdr.h b/drivers/net/netxen/netxen_nic_hdr.h
index 7a71774..1c46da6 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -419,6 +419,7 @@ enum {
#define NETXEN_CRB_ROMUSB \
NETXEN_PCI_CRB_WINDOW(NETXEN_HW_PX_MAP_CRB_ROMUSB)
#define NETXEN_CRB_I2Q NETXEN_PCI_CRB_WINDOW(NETXEN_HW_PX_MAP_CRB_I2Q)
+#define NETXEN_CRB_I2C0 NETXEN_PCI_CRB_WINDOW(NETXEN_HW_PX_MAP_CRB_I2C0)
#define NETXEN_CRB_SMB NETXEN_PCI_CRB_WINDOW(NETXEN_HW_PX_MAP_CRB_SMB)
#define NETXEN_CRB_MAX NETXEN_PCI_CRB_WINDOW(64)
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 91c2bc6..e40b914 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -531,6 +531,8 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
continue;
if (NX_IS_REVISION_P3(adapter->ahw.revision_id)) {
+ if (off == (NETXEN_CRB_I2C0 + 0x1c))
+ continue;
/* do not reset PCI */
if (off == (ROMUSB_GLB + 0xbc))
continue;
@@ -553,12 +555,6 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
continue;
}
- if (off == NETXEN_ADDR_ERROR) {
- printk(KERN_ERR "%s: Err: Unknown addr: 0x%08x\n",
- netxen_nic_driver_name, buf[i].addr);
- continue;
- }
-
init_delay = 1;
/* After writing this register, HW needs time for CRB */
/* to quiet down (else crb_window returns 0xffffffff) */
--
1.6.0.2
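The register filter this patch extends can be sketched in isolation. This is a hypothetical userspace model: the base addresses, the `rom_init_entry` struct and `apply_rom_init()` are stand-ins, not the driver's real CRB addresses or functions; only the two skipped offsets (+0x1c in the I2C0 block, +0xbc in ROMUSB_GLB) come from the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's CRB window addresses. */
#define SKETCH_I2C0_BASE	0x1000u
#define SKETCH_ROMUSB_GLB	0x2000u

struct rom_init_entry {
	uint32_t off;		/* already-translated register offset */
	uint32_t data;		/* value the flash asks us to write */
};

/* True if the ROM-supplied write to @off must be skipped on P3 parts. */
static bool skip_on_p3(uint32_t off)
{
	if (off == SKETCH_I2C0_BASE + 0x1c)	/* keep the i2c subsys ID */
		return true;
	if (off == SKETCH_ROMUSB_GLB + 0xbc)	/* do not reset PCI */
		return true;
	return false;
}

/* Apply the table to a fake register file; return entries written. */
static size_t apply_rom_init(const struct rom_init_entry *tbl, size_t n,
			     bool is_p3, uint32_t *regs, size_t nregs)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (is_p3 && skip_on_p3(tbl[i].off))
			continue;
		if (tbl[i].off / 4 >= nregs)	/* outside the model: ignore */
			continue;
		regs[tbl[i].off / 4] = tbl[i].data;
		written++;
	}
	return written;
}
```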
^ permalink raw reply related
* [PATCH 2/3] netxen: fix tx timeout handling on firmware hang
From: Dhananjay Phadke @ 2009-10-22 5:39 UTC (permalink / raw)
To: davem; +Cc: netdev, Amit Kumar Salecha
In-Reply-To: <1256189943-20477-1-git-send-email-dhananjay@netxen.com>
From: Amit Kumar Salecha <amit.salecha@qlogic.com>
Clear the __NX_RESETTING bit in netxen_tx_timeout_task() so that
the firmware watchdog task can catch the need_reset request
from the tx timeout.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
drivers/net/netxen/netxen_nic_main.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 7fc15e9..0b4a56a 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -1919,6 +1919,7 @@ static void netxen_tx_timeout_task(struct work_struct *work)
request_reset:
adapter->need_fw_reset = 1;
+ clear_bit(__NX_RESETTING, &adapter->state);
}
struct net_device_stats *netxen_nic_get_stats(struct net_device *netdev)
--
1.6.0.2
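Why the missing clear_bit() mattered can be shown with a small userspace model of the hand-off, assuming (as the description says) that the firmware watchdog only acts on need_fw_reset when it can take the resetting bit itself. C11 atomic_flag stands in for the kernel's test_and_set_bit()/clear_bit() on adapter->state; the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the adapter state shared by the two tasks. */
struct adapter_model {
	atomic_flag resetting;		/* models the __NX_RESETTING bit */
	atomic_bool need_fw_reset;
	int fw_resets;			/* resets the watchdog performed */
};

/* tx timeout path that gives up and requests a firmware reset */
static void tx_timeout_request_reset(struct adapter_model *a, bool fixed)
{
	if (atomic_flag_test_and_set(&a->resetting))
		return;			/* someone else is already resetting */

	atomic_store(&a->need_fw_reset, true);

	if (fixed)			/* the one-line fix: drop the bit */
		atomic_flag_clear(&a->resetting);
}

/* periodic watchdog: only acts if it can take the bit itself */
static void fw_watchdog(struct adapter_model *a)
{
	if (atomic_flag_test_and_set(&a->resetting))
		return;			/* bit still held: request is missed */

	if (atomic_exchange(&a->need_fw_reset, false))
		a->fw_resets++;		/* the firmware reset would run here */

	atomic_flag_clear(&a->resetting);
}
```

Initialize the model with `struct adapter_model a = { .resetting = ATOMIC_FLAG_INIT };`. Without the fix, the watchdog never sees the request because the bit stays set.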
^ permalink raw reply related
* [PATCH 0/3] netxen: bug fixes
From: Dhananjay Phadke @ 2009-10-22 5:39 UTC (permalink / raw)
To: davem; +Cc: netdev
Dave,
3 bug fixes for 2.6.32. Please apply to net-2.6.
Thanks,
Dhananjay
^ permalink raw reply
* [PATCH 3/3] netxen: avoid undue board config check
From: Dhananjay Phadke @ 2009-10-22 5:39 UTC (permalink / raw)
To: davem; +Cc: netdev
In-Reply-To: <1256189943-20477-1-git-send-email-dhananjay@netxen.com>
Old code assumed the board config version in the flash to be 1.
When tools change it, the driver simply refuses to attach. This
is unnecessary, since the driver does not have to parse the board
config structure directly (it is maintained by firmware).
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
drivers/net/netxen/netxen_nic_hw.c | 14 ++++----------
1 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 3231400..3185a98 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -1901,22 +1901,16 @@ netxen_setup_hwops(struct netxen_adapter *adapter)
int netxen_nic_get_board_info(struct netxen_adapter *adapter)
{
- int offset, board_type, magic, header_version;
+ int offset, board_type, magic;
struct pci_dev *pdev = adapter->pdev;
offset = NX_FW_MAGIC_OFFSET;
if (netxen_rom_fast_read(adapter, offset, &magic))
return -EIO;
- offset = NX_HDR_VERSION_OFFSET;
- if (netxen_rom_fast_read(adapter, offset, &header_version))
- return -EIO;
-
- if (magic != NETXEN_BDINFO_MAGIC ||
- header_version != NETXEN_BDINFO_VERSION) {
- dev_err(&pdev->dev,
- "invalid board config, magic=%08x, version=%08x\n",
- magic, header_version);
+ if (magic != NETXEN_BDINFO_MAGIC) {
+ dev_err(&pdev->dev, "invalid board config, magic=%08x\n",
+ magic);
return -EIO;
}
--
1.6.0.2
^ permalink raw reply related
* [PATCH] udev: create empty regular files to represent net interfaces
From: dann frazier @ 2009-10-22 6:36 UTC (permalink / raw)
To: linux-hotplug
Cc: Narendra_K, netdev, Matt_Domsch, Jordan_Hargrave, Charles_Rose,
Ben Hutchings
In-Reply-To: <20091016214024.GA10091@ldl.fc.hp.com>
Here's a proof of concept to further the discussion.
The default filename uses the format:
/dev/netdev/by-ifindex/$ifindex
This provides the infrastructure to permit udev rules to create aliases for
network devices using symlinks, for example:
/dev/netdev/by-name/eth0 -> ../by-ifindex/1
/dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
A library (such as the proposed libnetdevname) could use this information
to provide an alias->realname mapping for network utilities.
Tested with the following rule:
SUBSYSTEM=="net", PROGRAM=="/usr/local/bin/ifindex2name $attr{ifindex}", SYMLINK+="netdev/by-name/%c"
$ cat /usr/local/bin/ifindex2name
#!/bin/sh
set -e
ifindex="$1"
for d in /sys/class/net/*; do
testindex="$(cat $d/ifindex)"
if [ "$ifindex" = "$testindex" ]; then
echo "$(basename $d)"
exit 0
fi
done
exit 1
---
libudev/exported_symbols | 1 +
libudev/libudev.c | 29 ++++++++++++++++
libudev/libudev.h | 1 +
udev/udev-event.c | 82 ++++++++++++++++++++--------------------------
udev/udev-node.c | 41 ++++++++++++++++++++---
udev/udev-rules.c | 3 +-
6 files changed, 105 insertions(+), 52 deletions(-)
diff --git a/libudev/exported_symbols b/libudev/exported_symbols
index 018463d..31c616a 100644
--- a/libudev/exported_symbols
+++ b/libudev/exported_symbols
@@ -8,6 +8,7 @@ udev_get_userdata
udev_set_userdata
udev_get_sys_path
udev_get_dev_path
+udev_get_netdev_path
udev_list_entry_get_next
udev_list_entry_get_by_name
udev_list_entry_get_name
diff --git a/libudev/libudev.c b/libudev/libudev.c
index 1909138..2a83417 100644
--- a/libudev/libudev.c
+++ b/libudev/libudev.c
@@ -42,6 +42,7 @@ struct udev {
void *userdata;
char *sys_path;
char *dev_path;
+ char *netdev_path;
char *rules_path;
struct udev_list_node properties_list;
int log_priority;
@@ -125,8 +126,10 @@ struct udev *udev_new(void)
udev->run = 1;
udev->dev_path = strdup("/dev");
udev->sys_path = strdup("/sys");
+ udev->netdev_path = strdup("/dev/netdev/by-ifindex");
config_file = strdup(SYSCONFDIR "/udev/udev.conf");
if (udev->dev_path == NULL ||
+ udev->netdev_path == NULL ||
udev->sys_path == NULL ||
config_file == NULL)
goto err;
@@ -243,6 +246,14 @@ struct udev *udev_new(void)
udev_add_property(udev, "UDEV_ROOT", udev->dev_path);
}
+ env = getenv("NETDEV_ROOT");
+ if (env != NULL) {
+ free(udev->netdev_path);
+ udev->netdev_path = strdup(env);
+ util_remove_trailing_chars(udev->netdev_path, '/');
+ udev_add_property(udev, "NETDEV_ROOT", udev->netdev_path);
+ }
+
env = getenv("UDEV_LOG");
if (env != NULL)
udev_set_log_priority(udev, util_log_priority(env));
@@ -253,6 +264,7 @@ struct udev *udev_new(void)
dbg(udev, "log_priority=%d\n", udev->log_priority);
dbg(udev, "config_file='%s'\n", config_file);
dbg(udev, "dev_path='%s'\n", udev->dev_path);
+ dbg(udev, "netdev_path='%s'\n", udev->netdev_path);
dbg(udev, "sys_path='%s'\n", udev->sys_path);
if (udev->rules_path != NULL)
dbg(udev, "rules_path='%s'\n", udev->rules_path);
@@ -398,6 +410,23 @@ const char *udev_get_dev_path(struct udev *udev)
return udev->dev_path;
}
+/**
+ * udev_get_netdev_path:
+ * @udev: udev library context
+ *
+ * Retrieve the network device directory path. The default value is
+ * "/dev/netdev/by-ifindex"; the actual value may be overridden by the
+ * NETDEV_ROOT environment variable.
+ *
+ * Returns: the network device directory path
+ **/
+const char *udev_get_netdev_path(struct udev *udev)
+{
+ if (udev == NULL)
+ return NULL;
+ return udev->netdev_path;
+}
+
struct udev_list_entry *udev_add_property(struct udev *udev, const char *key, const char *value)
{
if (value == NULL) {
diff --git a/libudev/libudev.h b/libudev/libudev.h
index 4bcf442..5834781 100644
--- a/libudev/libudev.h
+++ b/libudev/libudev.h
@@ -77,6 +77,7 @@ struct udev_device *udev_device_get_parent_with_subsystem_devtype(struct udev_de
const char *subsystem, const char *devtype);
/* retrieve device properties */
const char *udev_device_get_devpath(struct udev_device *udev_device);
+const char *udev_get_netdev_path(struct udev *udev);
const char *udev_device_get_subsystem(struct udev_device *udev_device);
const char *udev_device_get_devtype(struct udev_device *udev_device);
const char *udev_device_get_syspath(struct udev_device *udev_device);
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d5b4d09..953f87a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -542,7 +542,7 @@ int udev_event_execute_rules(struct udev_event *event, struct udev_rules *rules)
}
/* add device node */
- if (major(udev_device_get_devnum(dev)) != 0 &&
+ if ((major(udev_device_get_devnum(dev)) != 0 || strcmp(udev_device_get_subsystem(dev), "net") == 0) &&
(strcmp(udev_device_get_action(dev), "add") == 0 || strcmp(udev_device_get_action(dev), "change") == 0)) {
char filename[UTIL_PATH_SIZE];
struct udev_device *dev_old;
@@ -603,10 +603,38 @@ int udev_event_execute_rules(struct udev_event *event, struct udev_rules *rules)
goto exit_add;
}
- /* set device node name */
- util_strscpyl(filename, sizeof(filename), udev_get_dev_path(event->udev), "/", event->name, NULL);
- udev_device_set_devnode(dev, filename);
-
+ /* add netif */
+ if (strcmp(udev_device_get_subsystem(dev), "net") == 0 &&
+ strcmp(udev_device_get_action(dev), "add") == 0) {
+ char syspath[UTIL_PATH_SIZE];
+ info(event->udev, "netif add '%s'\n", udev_device_get_devpath(dev));
+ /* look if we want to change the name of the netif */
+ if (strcmp(event->name, udev_device_get_sysname(dev)) != 0) {
+ char *pos;
+ err = rename_netif(event);
+ if (err != 0)
+ goto exit;
+ info(event->udev, "renamed netif to '%s'\n", event->name);
+
+ /* remember old name */
+ udev_device_add_property(dev, "INTERFACE_OLD", udev_device_get_sysname(dev));
+
+ /* now change the devpath, because the kernel device name has changed */
+ util_strscpy(syspath, sizeof(syspath), udev_device_get_syspath(dev));
+ pos = strrchr(syspath, '/');
+ if (pos != NULL) {
+ pos++;
+ util_strscpy(pos, sizeof(syspath) - (pos - syspath), event->name);
+ udev_device_set_syspath(event->dev, syspath);
+ udev_device_add_property(dev, "INTERFACE", udev_device_get_sysname(dev));
+ info(event->udev, "changed devpath to '%s'\n", udev_device_get_devpath(dev));
+ }
+ }
+ snprintf(syspath, sizeof(syspath), "%s/%s", udev_get_netdev_path(event->udev),
+ udev_device_get_property_value(event->dev, "IFINDEX"));
+ udev_device_set_devnode(dev, syspath);
+ }
+
/* write current database entry */
udev_device_update_db(dev);
@@ -632,49 +660,11 @@ exit_add:
goto exit;
}
- /* add netif */
- if (strcmp(udev_device_get_subsystem(dev), "net") == 0 && strcmp(udev_device_get_action(dev), "add") == 0) {
- dbg(event->udev, "netif add '%s'\n", udev_device_get_devpath(dev));
- udev_device_delete_db(dev);
-
- udev_rules_apply_to_event(rules, event);
- if (event->ignore_device) {
- info(event->udev, "device event will be ignored\n");
- goto exit;
- }
- if (event->name == NULL)
- goto exit;
-
- /* look if we want to change the name of the netif */
- if (strcmp(event->name, udev_device_get_sysname(dev)) != 0) {
- char syspath[UTIL_PATH_SIZE];
- char *pos;
-
- err = rename_netif(event);
- if (err != 0)
- goto exit;
- info(event->udev, "renamed netif to '%s'\n", event->name);
-
- /* remember old name */
- udev_device_add_property(dev, "INTERFACE_OLD", udev_device_get_sysname(dev));
-
- /* now change the devpath, because the kernel device name has changed */
- util_strscpy(syspath, sizeof(syspath), udev_device_get_syspath(dev));
- pos = strrchr(syspath, '/');
- if (pos != NULL) {
- pos++;
- util_strscpy(pos, sizeof(syspath) - (pos - syspath), event->name);
- udev_device_set_syspath(event->dev, syspath);
- udev_device_add_property(dev, "INTERFACE", udev_device_get_sysname(dev));
- info(event->udev, "changed devpath to '%s'\n", udev_device_get_devpath(dev));
- }
- }
- udev_device_update_db(dev);
- goto exit;
- }
/* remove device node */
- if (major(udev_device_get_devnum(dev)) != 0 && strcmp(udev_device_get_action(dev), "remove") == 0) {
+ if ((major(udev_device_get_devnum(dev)) != 0 ||
+ strcmp(udev_device_get_subsystem(dev), "net") == 0) &&
+ strcmp(udev_device_get_action(dev), "remove") == 0) {
/* import database entry and delete it */
udev_device_read_db(dev);
udev_device_set_info_loaded(dev);
diff --git a/udev/udev-node.c b/udev/udev-node.c
index 39bec3e..da96a4a 100644
--- a/udev/udev-node.c
+++ b/udev/udev-node.c
@@ -32,6 +32,34 @@
#define TMP_FILE_EXT ".udev-tmp"
+static bool udev_node_mode_matches(struct stat *stats, dev_t devnum, mode_t mode)
+{
+ if ((stats->st_mode & S_IFMT) != (mode & S_IFMT))
+ return false;
+
+ if ((S_ISCHR(mode) || S_ISBLK(mode)) && (stats->st_rdev != devnum))
+ return false;
+
+ return true;
+}
+
+static int udev_node_create_file(struct udev *udev, const char *path, dev_t devnum, mode_t mode)
+{
+ int fd, ret = 0;
+
+ if (S_ISCHR(mode) || S_ISBLK(mode))
+ ret = mknod(path, mode, devnum);
+ else {
+ fd = creat(path, mode);
+ if (fd < 0)
+ ret = fd;
+ else
+ close(fd);
+ }
+
+ return ret;
+}
+
int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mode_t mode, uid_t uid, gid_t gid)
{
struct udev *udev = udev_device_get_udev(dev);
@@ -47,12 +75,15 @@ int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mod
else
mode |= S_IFCHR;
+ if (strcmp(udev_device_get_subsystem(dev), "net") == 0)
+ mode = S_IFREG | S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
+
if (file == NULL)
file = udev_device_get_devnode(dev);
if (lstat(file, &stats) == 0) {
- if (((stats.st_mode & S_IFMT) == (mode & S_IFMT)) && (stats.st_rdev == devnum)) {
- info(udev, "preserve file '%s', because it has correct dev_t\n", file);
+ if (udev_node_mode_matches(&stats, devnum, mode)) {
+ info(udev, "preserve file '%s', because it has correct type\n", file);
preserve = 1;
udev_selinux_lsetfilecon(udev, file, mode);
} else {
@@ -62,10 +93,10 @@ int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mod
util_strscpyl(file_tmp, sizeof(file_tmp), file, TMP_FILE_EXT, NULL);
unlink(file_tmp);
udev_selinux_setfscreatecon(udev, file_tmp, mode);
- err = mknod(file_tmp, mode, devnum);
+ err = udev_node_create_file(udev, file_tmp, devnum, mode);
udev_selinux_resetfscreatecon(udev);
if (err != 0) {
- err(udev, "mknod(%s, %#o, %u, %u) failed: %m\n",
+ err(udev, "udev_node_create_file(%s, %#o, %u, %u) failed: %m\n",
file_tmp, mode, major(devnum), minor(devnum));
goto exit;
}
@@ -80,7 +111,7 @@ int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mod
do {
util_create_path(udev, file);
udev_selinux_setfscreatecon(udev, file, mode);
- err = mknod(file, mode, devnum);
+ err = udev_node_create_file(udev, file, devnum, mode);
if (err != 0)
err = errno;
udev_selinux_resetfscreatecon(udev);
diff --git a/udev/udev-rules.c b/udev/udev-rules.c
index ddb51de..a1fe991 100644
--- a/udev/udev-rules.c
+++ b/udev/udev-rules.c
@@ -2435,7 +2435,8 @@ int udev_rules_apply_to_event(struct udev_rules *rules, struct udev_event *event
if (event->devlink_final)
break;
- if (major(udev_device_get_devnum(event->dev)) == 0)
+ if ((major(udev_device_get_devnum(event->dev)) == 0) &&
+ (strcmp(udev_device_get_subsystem(event->dev), "net") != 0))
break;
if (cur->key.op == OP_ASSIGN_FINAL)
event->devlink_final = 1;
--
1.6.5
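The node-handling logic the patch adds to udev_node_mknod() (keep a file whose type already matches, otherwise create the file) can be sketched standalone. This is a simplified, hypothetical version: it ignores SELinux labels, ownership and parent-directory creation, always goes through a temporary name plus rename(), and `create_netdev_file()` is not a udev function. As with creat() in the patch, the requested permissions are filtered through the process umask.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same test as the patch's udev_node_mode_matches(): the file type must
 * match, and only device nodes also need a matching dev_t. */
static bool mode_matches(const struct stat *st, dev_t devnum, mode_t mode)
{
	if ((st->st_mode & S_IFMT) != (mode & S_IFMT))
		return false;
	if ((S_ISCHR(mode) || S_ISBLK(mode)) && st->st_rdev != devnum)
		return false;
	return true;
}

/* Create (or keep) an empty regular file at @path. Returns 0 or -1. */
static int create_netdev_file(const char *path, mode_t perm)
{
	struct stat st;
	char tmp[4096];
	int fd, n;

	if (lstat(path, &st) == 0 && mode_matches(&st, 0, S_IFREG | perm))
		return 0;			/* preserve the existing file */

	n = snprintf(tmp, sizeof(tmp), "%s.udev-tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	unlink(tmp);				/* drop a stale temp file */
	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, perm);
	if (fd < 0)
		return -1;
	close(fd);
	if (rename(tmp, path) < 0) {		/* atomically replace @path */
		unlink(tmp);
		return -1;
	}
	return 0;
}
```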
^ permalink raw reply related
* [PATCH 2.6.32-rc5] r8169: fix Ethernet Hangup for RTL8110SC rev d
From: Simon Wunderlich @ 2009-10-22 6:48 UTC (permalink / raw)
To: netdev; +Cc: Francois Romieu, Bernhard Schmidt
The 8110SC rev d chip on our board shows a regression that the 8110SB chip
did not have. When inbound traffic overflows the receive descriptor queue,
"holes" may appear in the ring buffer, which lead to a hang until the buffer
is filled again. The packets are then completely processed, but the ring
remains porous and no packets are processed until the next overflow. Setting
the interface down and up fixes the problem temporarily from userspace.
For reasons we don't understand, this behaviour does not occur if the RxVlan
bit for hardware VLAN untagging is set. There is another "Work around for
AMD plateform" in the current code which checks the VLAN status word in
receive descriptors, but it never takes effect when hardware VLAN support
is enabled. We assume that this is a bug in the chip.
The following patch fixes the problem. Without the patch we could reproduce
the hang within minutes (given other devices also generating lots of
interrupts); with it we could not reproduce the hang in a few days of
long-term testing.
Signed-off-by: Bernhard Schmidt <bernhard.schmidt@saxnet.de>
Signed-off-by: Simon Wunderlich <simon.wunderlich@saxnet.de>
Acked-by: Francois Romieu <romieu@zoreil.com>
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 83c47d9..0908c50 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1029,7 +1029,10 @@ static void rtl8169_vlan_rx_register(struct net_device *dev,
spin_lock_irqsave(&tp->lock, flags);
tp->vlgrp = grp;
- if (tp->vlgrp)
+ /*
+ * Do not disable RxVlan on 8110SCd.
+ */
+ if (tp->vlgrp || (tp->mac_version == RTL_GIGA_MAC_VER_05))
tp->cp_cmd |= RxVlan;
else
tp->cp_cmd &= ~RxVlan;
@@ -3197,6 +3200,15 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
}
rtl8169_init_phy(dev, tp);
+
+ /*
+ * Pretend we are using VLANs; this bypasses a nasty bug where
+ * interrupts stop flowing under high load on 8110SCd controllers.
+ */
+ if (tp->mac_version == RTL_GIGA_MAC_VER_05)
+ RTL_W16(CPlusCmd, RTL_R16(CPlusCmd) | RxVlan);
+
+
device_set_wakeup_enable(&pdev->dev, tp->features & RTL_FEATURE_WOL);
out:
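The RxVlan rule the patch puts into rtl8169_vlan_rx_register() reduces to a single bit operation, sketched below. The bit value and MAC version constant are stand-ins chosen for illustration; check r8169.c for the real RxVlan and RTL_GIGA_MAC_VER_05 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not taken from the driver. */
#define SKETCH_RXVLAN		(1u << 6)
#define SKETCH_MAC_VER_05	5

/*
 * RxVlan follows VLAN-group registration, except on the 8110SC rev d
 * (MAC version 05), where it is never cleared.
 */
static uint16_t rtl_cp_cmd_vlan(uint16_t cp_cmd, bool have_vlgrp,
				int mac_version)
{
	if (have_vlgrp || mac_version == SKETCH_MAC_VER_05)
		return cp_cmd | SKETCH_RXVLAN;
	return cp_cmd & (uint16_t)~SKETCH_RXVLAN;
}
```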
^ permalink raw reply related
* Re: [PATCH]bnx2x: remove duplication of the BCM_VLAN macro
From: Eilon Greenstein @ 2009-10-22 7:42 UTC (permalink / raw)
To: kirjanov@gmail.com; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <20091021203851.GA5311@coldcone>
On Wed, 2009-10-21 at 13:38 -0700, Denis Kirjanov wrote:
> File bnx2.c already contains condition of the macro inclusion.
> So we can remove this.
It is true that BCM_VLAN is defined in bnx2.c; however, it is not defined
in the bnx2x_*.c files. The definition in bnx2x.h is needed for the bnx2x.ko
module.
Please do not remove it.
Thanks,
Eilon
> Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
> ---
>
> diff --git a/drivers/net/bnx2x.h b/drivers/net/bnx2x.h
> index bbf8422..4b99fd2 100644
> --- a/drivers/net/bnx2x.h
> +++ b/drivers/net/bnx2x.h
> @@ -20,10 +20,6 @@
> * (you will need to reboot afterwards) */
> /* #define BNX2X_STOP_ON_ERROR */
>
> -#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
> -#define BCM_VLAN 1
> -#endif
> -
>
> #define BNX2X_MULTI_QUEUE
>
^ permalink raw reply
* Re: [net-next-2.6 PATCH 2/3] ixgbe: Set MSI-X vectors to NOBALANCING and set affinity
From: Peter P Waskiewicz Jr @ 2009-10-22 8:22 UTC (permalink / raw)
To: David Miller; +Cc: Kirsher, Jeffrey T, gospo@redhat.com, netdev@vger.kernel.org
In-Reply-To: <20091021.215031.57955781.davem@davemloft.net>
On Wed, 2009-10-21 at 21:50 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 20 Oct 2009 19:27:14 -0700
>
> > From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> >
> > This patch will set each MSI-X vector to IRQF_NOBALANCING to
> > prevent autobalance of the interrupts, then applies a CPU
> > affinity. This will only be done when Flow Director is enabled,
> > which needs interrupts to be processed on the same CPUs where the
> > applications are running.
> >
> > Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>
> Just explain to me why irqbalanced in userspace cannot take care
> of this issue.
The problem we have is when Flow Director is enabled, we want to try and
balance the applications across all CPUs. irqbalance is going to fight
with the scheduler to balance things, and our tests show that irqbalance
only utilizes a few of the CPU cores, not all of them. That fights
directly with Flow Director and what it's trying to do.
> Second, even if we cannot use irqbalanced for some reason, the last
> thing I want to see is drivers directly fiddling with interrupt
> states and attributes. Every driver is going to do it ever so
> slightly differently, and often will get it wrong.
The first thing any performance guide says is to disable irqbalance, and
affinitize the interrupts in /proc/irq/<irq>/smp_affinity. This will
ensure the best distribution of work. The major disadvantage in doing
this is disabling irqbalance affects the entire system. What this
patchset is trying to do is make sure a single driver, trying to
optimize for performance, doesn't need to affect the entire system.
Setting no-balancing on a vector is the best approach for the entire
system.
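For reference, the manual tuning step described above (pinning an interrupt by writing a hex CPU mask to /proc/irq/<irq>/smp_affinity) looks roughly like this in C. It is a sketch: it handles only CPUs 0 to 31 (larger systems use comma-separated 32-bit groups), writing the real file requires root, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Format the single-CPU mask for @cpu as hex, e.g. cpu 5 -> "20". */
static int format_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
	if (cpu >= 32)
		return -1;		/* multi-group masks not handled */
	int n = snprintf(buf, len, "%x", 1u << cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the mask for @cpu to an smp_affinity file at @path. */
static int set_irq_affinity(const char *path, unsigned int cpu)
{
	char mask[16];
	FILE *f;

	if (format_cpu_mask(cpu, mask, sizeof(mask)) < 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fprintf(f, "%s\n", mask) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, `set_irq_affinity("/proc/irq/42/smp_affinity", 3)` would pin a (made-up) IRQ 42 to CPU 3, which is only durable if irqbalance is not running to rewrite it.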
I completely understand your concern that this opens precedent for other
drivers to potentially start doing crazy things with interrupts, but
with MSI-X, we're only impacting our driver.
> There is also no global policy or policy control available when
> drivers do this stuff directly. And that's how we end up with
> situations where every driver behaves differently which results in a
> terrible user experience.
Again, I think the overall impact is worse when the usual approach to
performance tuning is to disable irqbalance altogether. The same
effect can be attained by a user disabling irqbalance and assigning
whatever affinity they want, which could be even more devastating. What
we're trying to do here is have the driver come out of the box as well
tuned as possible.
If there's something about this particular implementation you're not
comfortable with, I'm very willing to take any feedback on it. We're
trying to do a specific thing, not lead poor design in drivers when
dealing with interrupts.
Regards,
-PJ Waskiewicz
^ permalink raw reply
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Johannes Berg @ 2009-10-22 8:27 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx,
linux-wireless, linux-ppp, netdev, paulus, Michael Buesch,
Oliver Hartkopp
In-Reply-To: <20091021213947.GA12202@ami.dom.local>
On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote:
> > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> > > + raise_softirq_irqoff(NET_RX_SOFTIRQ);
> >
> > This still doesn't make any sense.
> >
> > There may or may not be a lot of code that assumes that everything else
> > is run with other tasklets disabled, and that it cannot be interrupted
> > by a tasklet and thus create a race.
> >
> > Can you prove that is not the case, across the entire networking layer?
>
> I'm not sure I can understand your question. This patch is mainly to
> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
> context handling; IMHO there're better tools for this (lockdep,
> WARN_ON's).
And how exactly does that matter to the patch at hand?!
I'm saying that, as the API indicates (and absent proof otherwise, that's
how it is), the networking layer needs to have packets handed to it with
softirqs disabled. Therefore, this patch is not needed. While it may not
be _wrong_, it'll definitely introduce a performance regression.
This really should be obvious. You're fixing the warning at the source
of the warning, rather than the source of the problem.
johannes
^ permalink raw reply
* [PATCH] isdn: fix possible circular locking dependency
From: Xiaotian Feng @ 2009-10-22 9:07 UTC (permalink / raw)
To: isdn, isdn4linux; +Cc: tilman, netdev, linux-kernel, Xiaotian Feng
There's a circular locking dependency:
---> isdn_net_get_locked_lp
    ---> lock &nd->queue_lock
    ---> lock &nd->queue->xmit_lock
    .....................
    ---> unlock &nd->queue_lock
---> isdn_net_writebuf_skb (called with &nd->queue->xmit_lock locked)
    ---> isdn_net_inc_frame_cnt
        ---> isdn_net_device_busy
            ---> lock &nd->queue_lock
This will trigger lockdep warnings:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc4-testing #7
-------------------------------------------------------
ipppd/28379 is trying to acquire lock:
(&netdev->queue_lock){......}, at: [<e62ad0fd>] isdn_net_device_busy+0x2c/0x74 [isdn]
but task is already holding lock:
(&netdev->local->xmit_lock){+.....}, at: [<e62aefc2>] isdn_net_write_super+0x3f/0x6e [isdn]
which lock already depends on the new lock.
.......
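We don't need to hold nd->queue->xmit_lock just to protect a single
isdn_net_lp_busy() call. This patch checks isdn_net_lp_busy() without it
and takes lp->xmit_lock only after releasing nd->queue_lock, which fixes
the lockdep warning above.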
Reported-and-tested-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Xiaotian Feng <xtfeng@gmail.com>
---
drivers/isdn/i4l/isdn_net.h | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/isdn/i4l/isdn_net.h b/drivers/isdn/i4l/isdn_net.h
index 74032d0..7511f08 100644
--- a/drivers/isdn/i4l/isdn_net.h
+++ b/drivers/isdn/i4l/isdn_net.h
@@ -83,19 +83,19 @@ static __inline__ isdn_net_local * isdn_net_get_locked_lp(isdn_net_dev *nd)
spin_lock_irqsave(&nd->queue_lock, flags);
lp = nd->queue; /* get lp on top of queue */
- spin_lock(&nd->queue->xmit_lock);
while (isdn_net_lp_busy(nd->queue)) {
- spin_unlock(&nd->queue->xmit_lock);
nd->queue = nd->queue->next;
if (nd->queue == lp) { /* not found -- should never happen */
lp = NULL;
goto errout;
}
- spin_lock(&nd->queue->xmit_lock);
}
lp = nd->queue;
nd->queue = nd->queue->next;
+ spin_unlock_irqrestore(&nd->queue_lock, flags);
+ spin_lock(&lp->xmit_lock);
local_bh_disable();
+ return lp;
errout:
spin_unlock_irqrestore(&nd->queue_lock, flags);
return lp;
--
1.6.2.5
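The lock ordering after this patch can be modelled in userspace with pthread mutexes: the queue lock is held only while walking the ring and is released before the chosen channel's xmit lock is taken, so the queue_lock -> xmit_lock nesting that lockdep flagged no longer occurs. All names are hypothetical, and the model ignores the irqsave and local_bh_disable() parts of the real code. As in the patch, the busy check runs without the channel's xmit lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for isdn_net_local: one channel in the ring. */
struct chan {
	pthread_mutex_t xmit_lock;
	bool busy;
	struct chan *next;
};

/* Hypothetical stand-in for isdn_net_dev. */
struct dev_model {
	pthread_mutex_t queue_lock;
	struct chan *queue;	/* ring of channels, rotated round-robin */
};

/* Return a non-busy channel with its xmit_lock held, or NULL. */
static struct chan *get_locked_chan(struct dev_model *nd)
{
	struct chan *start, *lp;

	pthread_mutex_lock(&nd->queue_lock);
	start = nd->queue;
	while (nd->queue->busy) {
		nd->queue = nd->queue->next;
		if (nd->queue == start) {	/* every channel is busy */
			pthread_mutex_unlock(&nd->queue_lock);
			return NULL;
		}
	}
	lp = nd->queue;
	nd->queue = nd->queue->next;	/* next caller starts after lp */
	pthread_mutex_unlock(&nd->queue_lock);

	/* taken only after queue_lock is dropped: no nesting */
	pthread_mutex_lock(&lp->xmit_lock);
	return lp;
}
```

The caller must release lp->xmit_lock when done; link with -pthread on older toolchains.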
^ permalink raw reply related
* Re: [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
From: Ilpo Järvinen @ 2009-10-22 9:41 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: Netdev, ori
In-Reply-To: <4ADF15A2.1050804@codefidence.com>
On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
> Hi Ilpo,
>
>
> Thanks for the feedback :-)
>
>
> Ilpo Järvinen wrote:
>
> > On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
> >
> >
> > > We need tcp_parse_options to be aware of dst_entry to take into account
> > > per dst_entry TCP options settings
> > >
> > > Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> > > Signed-off-by: Ori Finkelman <ori@comsleep.com>
> > > Signed-off-by: Yony Amit <yony@comsleep.com>
> > >
> > > ---
> > > include/net/tcp.h | 3 ++-
> > > net/ipv4/syncookies.c | 27 ++++++++++++++-------------
> > > net/ipv4/tcp_input.c | 9 ++++++---
> > > net/ipv4/tcp_ipv4.c | 19 ++++++++++---------
> > > net/ipv4/tcp_minisocks.c | 7 +++++--
> > > net/ipv6/syncookies.c | 28 +++++++++++++++-------------
> > > net/ipv6/tcp_ipv6.c | 3 ++-
> > > 7 files changed, 54 insertions(+), 42 deletions(-)
> > >
> > >
> > >
> <snip>
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index 7cda24b..1cb0ec4 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > > tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
> > > #endif
> > >
> > > + ireq = inet_rsk(req);
> > > + ireq->loc_addr = daddr;
> > > + ireq->rmt_addr = saddr;
> > > + ireq->no_srccheck = inet_sk(sk)->transparent;
> > > + ireq->opt = tcp_v4_save_options(sk, skb);
> > > +
> > > + dst = inet_csk_route_req(sk, req);
> > > tcp_clear_options(&tmp_opt);
> > > tmp_opt.mss_clamp = 536;
> > > tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
> > >
> > > - tcp_parse_options(skb, &tmp_opt, 0);
> > > + tcp_parse_options(skb, &tmp_opt, 0, dst);
> > >
> > > if (want_cookie && !tmp_opt.saw_tstamp)
> > > tcp_clear_options(&tmp_opt);
> > > @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > >
> > > tcp_openreq_init(req, &tmp_opt, skb);
> > >
> > > - ireq = inet_rsk(req);
> > > - ireq->loc_addr = daddr;
> > > - ireq->rmt_addr = saddr;
> > > - ireq->no_srccheck = inet_sk(sk)->transparent;
> > > - ireq->opt = tcp_v4_save_options(sk, skb);
> > > -
> > > if (security_inet_conn_request(sk, skb, req))
> > > - goto drop_and_free;
> > > + goto drop_and_release;
> > >
> > > if (!want_cookie)
> > > TCP_ECN_create_request(req, tcp_hdr(skb));
> > > @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > > */
> > > if (tmp_opt.saw_tstamp &&
> > > tcp_death_row.sysctl_tw_recycle &&
> > > - (dst = inet_csk_route_req(sk, req)) != NULL &&
> > > + dst != NULL &&
> > >
> >
> > Why you need this NULL check this here while you trap it with BUG_ON
> > elsewhere? Does your patch perhaps create a remote DoS opportunity?
> >
> >
> >
> Indeed, I believe you are right. Good catch.
>
> What about this (I know the patch gets eaten by Thunderbird, sorry about that.
> This is just for explaining what I want to do):
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 1cb0ec4..1d611e3 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1263,6 +1263,9 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> ireq->opt = tcp_v4_save_options(sk, skb);
>
> dst = inet_csk_route_req(sk, req);
> + if (!dst)
> + goto drop_and_free;
> +
> tcp_clear_options(&tmp_opt);
> tmp_opt.mss_clamp = 536;
> tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
> @@ -1302,7 +1305,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> */
> if (tmp_opt.saw_tstamp &&
> tcp_death_row.sysctl_tw_recycle &&
> - dst != NULL &&
> (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
> peer->v4daddr == saddr) {
> if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
>
> My rationale: if the connection is formed we will need to send a SYN/ACK
> (the call to __tcp_v4_send_synack a couple of lines below), and we can't
> do that without a route, so this makes sense.
>
> If this sounds sane, I'll re-spin the patch with this as a fix.
I'd just guard the relevant places with dst && ...? ...But I didn't go
far enough through the code to find out how many places would then need it.
--
i.
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
From: William Allen Simpson @ 2009-10-22 10:53 UTC (permalink / raw)
To: netdev
In-Reply-To: <4ADFE635.4020109@gmail.com>
Eric Dumazet wrote:
> Absolutely, global setting is a must when an admin wants a quick path.
>
> The more flexible would be to have two bits per route, plus
> 2 bits on the global configuration.
>
> global conf:
> 00 : timestamps OFF, unless a route setting is not 00
> 01 : timestamps ON, unless a route setting is not 00
> 10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
> 11 : Force timestamps ON, ignore route settings
>
> Route settings (used *only* if global setting is 0Y)
> 00 : global conf is used
> 01 : Force timestamps being OFF for this route
> 10 : Force timestamps being ON for this route
> 11 : complement global conf
>
Nice! Seems to have all the bases covered. For consistency, I'd swap the
latter values (although I doubt complement will have much use):
00 : global conf is used
01 : complement global conf
10 : Timestamps OFF for this route
11 : Timestamps ON for this route
And the documentation should make it clear that global 10 and 11 override
per route 10 and 11.
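To make the proposed encoding concrete, here is a minimal sketch of how a stack might resolve the effective timestamp setting from the 2-bit global value and the 2-bit per-route value, using the swapped route encoding suggested above. The enum and function names are hypothetical illustrations, not existing kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical 2-bit encodings (not kernel identifiers).
 * High bit: explicit/forced; low bit: on (1) or off (0). */
enum ts_global_mode {
	TS_GLOBAL_DEFAULT_OFF = 0x0, /* off unless a route overrides */
	TS_GLOBAL_DEFAULT_ON  = 0x1, /* on unless a route overrides */
	TS_GLOBAL_FORCE_OFF   = 0x2, /* route settings ignored */
	TS_GLOBAL_FORCE_ON    = 0x3, /* route settings ignored */
};

enum ts_route_mode {
	TS_ROUTE_USE_GLOBAL = 0x0, /* fall back to the global default */
	TS_ROUTE_COMPLEMENT = 0x1, /* invert the global default */
	TS_ROUTE_OFF        = 0x2,
	TS_ROUTE_ON         = 0x3,
};

/* Decide whether TCP timestamps would be used for a connection. */
static bool ts_enabled(enum ts_global_mode global, enum ts_route_mode route)
{
	bool global_on = global & 0x1;

	/* Global 10 and 11 override any per-route setting. */
	if (global & 0x2)
		return global_on;

	switch (route) {
	case TS_ROUTE_COMPLEMENT:
		return !global_on;
	case TS_ROUTE_OFF:
		return false;
	case TS_ROUTE_ON:
		return true;
	case TS_ROUTE_USE_GLOBAL:
	default:
		return global_on;
	}
}
```

With this layout both fields read the same way (high bit means "explicit", low bit means on/off), which is presumably the consistency the swap is aiming for.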
* Re: [net-next-2.6 PATCH 2/3] ixgbe: Set MSI-X vectors to NOBALANCING and set affinity
From: David Miller @ 2009-10-22 10:56 UTC (permalink / raw)
To: peter.p.waskiewicz.jr; +Cc: jeffrey.t.kirsher, gospo, netdev
In-Reply-To: <1256199756.2634.65.camel@ppwaskie-mobl2>
From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Date: Thu, 22 Oct 2009 01:22:36 -0700
> The first thing any performance guide says is to disable irqbalance
Such guides are wrong, and that's the end of this discussion.
These kinds of guides also say to do all kinds of crazy things with
the socket sysctl settings. That's wrong too and we absolutely do not
do things to accommodate or support those guides' suggestions.
And we won't do that here.
I'm especially not going to succumb in this case because Arjan has
been more than responsive to making sure irqbalanced in userspace does
the right thing for networking devices, even multiqueue ones.
So we can make it do the right thing when flow director is present.
In fact, the thing you want for flow director makes sense in the
general case too.
* Re: [PATCH kernel 2.6.32-rc5] pcnet_cs: add cis of PreMax PE-200 ethernet pcmcia card
From: Ken Kawasaki @ 2009-10-22 11:10 UTC (permalink / raw)
To: Dan Williams; +Cc: netdev
In-Reply-To: <1256152686.8469.34.camel@localhost.localdomain>
Hi,
>Dan Williams <dcbw@redhat.com> wrote:
> > add cis of PreMax ethernet pcmcia card,
> > and some Sierra Wireless serial card(AC555, AC7xx, AC8xx).
> Random question: are CIS files copyrightable?
The CIS contains the IRQ, I/O port range, voltage information, etc.,
like the PCI config space, so I think it is not copyrightable.
But Sierra Wireless provided this CIS under the GPL.
> What exactly do they
> contain, just updates to the CIS data on the card itself that the
> manufacturer forgot to burn before shipping the card?
The CIS update is needed because the original CIS does not conform to the PCMCIA spec,
not because the manufacturer forgot to burn the CIS.
> Also, I've got a Sierra AC860 here that reports as "prod_id(2):
> "AC860"", and has the same manf_id (0x0192) and card_id (0x710) as the
> AC850.
Actually, not all Sierra Wireless cards need the CIS update.
Could you remove the PCMCIA_DEVICE_CIS_PROD_ID12 and PCMCIA_DEVICE_CIS_MANF_CARD
definitions for the Sierra Wireless card,
and check whether the AC860 works?
Here is the output of dumpcis for SW_8xx_SER.cis.
Socket 0
offset 0x02, tuple 0x01, link 0x01
ff
dev_info
no_info
offset 0x05, tuple 0x17, link 0x03
41 00 ff
attr_dev_info
EEPROM 250ns, 512b
offset 0x0a, tuple 0x20, link 0x04
92 01 10 07
manfid 0x0192, 0x0710
offset 0x10, tuple 0x21, link 0x02
02 00
funcid serial_port
offset 0x14, tuple 0x15, link 0x2f
07 00 53 69 65 72 72 61 20 57 69 72 65 6c 65 73
73 00 41 43 38 35 30 00 33 47 20 4e 65 74 77 6f
72 6b 20 41 64 61 70 74 65 72 00 52 31 00 ff
vers_1 7.0, "Sierra Wireless", "AC850", "3G Network Adapter", "R1"
offset 0x45, tuple 0x1a, link 0x05
01 03 00 07 73
config base 0x0700 mask 0x0073 last_index 0x03
offset 0x4c, tuple 0x1b, link 0x10
e0 01 19 78 4d 55 5d 25 a3 60 f8 48 07 30 bc 86
cftable_entry 0x20 [default]
Vcc Istatic 45mA Iavg 50mA Ipeak 55mA Idown 20mA
io 0x48f8-0x48ff [lines=3] [8bit] [range]
irq mask 0x86bc [level]
offset 0x5e, tuple 0x1b, link 0x08
a1 01 08 a3 60 f8 47 07
cftable_entry 0x21
io 0x47f8-0x47ff [lines=3] [8bit] [range]
offset 0x68, tuple 0x1b, link 0x08
a2 01 08 a3 60 e8 48 07
cftable_entry 0x22
io 0x48e8-0x48ef [lines=3] [8bit] [range]
offset 0x72, tuple 0x1b, link 0x08
a3 01 08 a3 60 e8 47 07
cftable_entry 0x23
io 0x47e8-0x47ef [lines=3] [8bit] [range]
offset 0x7c, tuple 0x1b, link 0x04
a4 01 08 23
cftable_entry 0x24
io 0x0000-0x0007 [lines=3] [8bit]
offset 0x82, tuple 0x14, link 0x00
no_long_link
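The raw tuple bytes in the dump above can be decoded by hand. As a minimal sketch, assuming a well-formed tuple body (the bytes after the tuple code and link fields), the manfid tuple (0x20) holds two little-endian 16-bit words, and the vers_1 tuple (0x15) holds a major/minor byte pair followed by NUL-terminated strings ending in 0xff. The function names are hypothetical helpers, not the kernel's PCMCIA API.

```c
#include <stddef.h>
#include <stdint.h>

/* Tuple 0x20 (manfid) body: manufacturer ID, then card ID,
 * each a little-endian 16-bit word. */
static void cis_parse_manfid(const uint8_t *body, uint16_t *manf,
			     uint16_t *card)
{
	*manf = (uint16_t)(body[0] | (body[1] << 8));
	*card = (uint16_t)(body[2] | (body[3] << 8));
}

/* Tuple 0x15 (vers_1) body: major, minor, then NUL-terminated strings,
 * ended by 0xff. Stores up to max string pointers; returns the count.
 * Assumes each string is NUL-terminated within len bytes. */
static size_t cis_parse_vers_1(const uint8_t *body, size_t len,
			       const char **strs, size_t max)
{
	size_t i = 2, n = 0;

	while (i < len && body[i] != 0xff && n < max) {
		strs[n++] = (const char *)&body[i];
		while (i < len && body[i] != 0)  /* skip to the NUL */
			i++;
		i++;                             /* step past it */
	}
	return n;
}
```

Fed the bytes `92 01 10 07` from the dump, the first helper yields 0x0192 and 0x0710, matching the `manfid 0x0192, 0x0710` line; the vers_1 bytes yield the four strings "Sierra Wireless", "AC850", "3G Network Adapter" and "R1".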
Best Regards
Ken.
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: David Miller @ 2009-10-22 11:29 UTC (permalink / raw)
To: jarkao2
Cc: johannes, tilman, hidave.darkstar, linux-kernel, tglx,
linux-wireless, linux-ppp, netdev, paulus, mb, oliver
In-Reply-To: <20091021213947.GA12202@ami.dom.local>
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 21 Oct 2009 23:39:47 +0200
> I'm not sure I can understand your question. This patch is mainly to
> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
> context handling; IMHO there're better tools for this (lockdep,
> WARN_ON's).
Semantically I think your patch is correct, but I wonder about cost.
Something that is a simple per-cpu inline "or" operation is now a
function call and potentially mispredicted branch inside of
raise_softirq_irqoff().
And netif_rx() is indeed a fast path for tunnels and other users so
this does matter.
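The cost argument concerns the difference between __raise_softirq_irqoff(), which (as we understand kernels of this era) only ORs a bit into the per-CPU pending mask, and raise_softirq_irqoff(), which performs the same OR and then, when not in interrupt context, wakes ksoftirqd. The following is a simplified userspace model under those assumptions: a single global stands in for the per-CPU mask, a flag stands in for in_interrupt(), and a counter records where wakeup_softirqd() would be called. All names here are hypothetical.

```c
#include <stdbool.h>

static unsigned long softirq_pending;  /* per-CPU in the kernel */
static bool in_irq_context;            /* stand-in for in_interrupt() */
static unsigned int ksoftirqd_wakeups; /* where wakeup_softirqd() would run */

/* Models __raise_softirq_irqoff(): one OR, no branch. */
static inline void model_raise_fast(unsigned int nr)
{
	softirq_pending |= 1UL << nr;
}

/* Models raise_softirq_irqoff(): the same OR, plus a context test
 * that may wake the softirq thread. */
static void model_raise_checked(unsigned int nr)
{
	model_raise_fast(nr);
	if (!in_irq_context)
		ksoftirqd_wakeups++;
}
```

In a hot path such as netif_rx(), the checked variant adds an out-of-line call and a data-dependent branch on every packet, which is the overhead being objected to.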
I like having people call things in the correct context the function
was built for, and thus we can avoid completely useless operations and
tests as we can now in netif_rx().
Making things general purpose costs something, and it costs too much
here for this critical routine, sorry.
I was just having a talk with Nick Piggin about these kinds of issues
today; too few people care about these ever-encroaching tiny pieces
of bloat that slow the kernel down gradually over time, and I simply
won't stand for it when I notice it :-)
* Re: [PATCH net-next-2.6] rtnetlink: rtnl_setlink() and rtnl_getlink() changes
From: David Miller @ 2009-10-22 11:34 UTC (permalink / raw)
To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <4ADF7633.9050208@gmail.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Oct 2009 22:59:31 +0200
> Since rtnl_getlink() & rtnl_setlink() run with RTNL held, we can use
> __dev_get_by_index() and __dev_get_by_name() variants and avoid
> dev_hold()/dev_put().
>
> Adds to rtnl_getlink() the capability to find a device by its name,
> not only by its index.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Looks good, applied, thanks.
* Re: [RFC] net,socket: introduce build_sockaddr_check helper to catch overflow at build time
From: David Miller @ 2009-10-22 11:49 UTC (permalink / raw)
To: gorcunov; +Cc: netdev
In-Reply-To: <20091021170732.GE5976@lenovo>
From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed, 21 Oct 2009 21:07:32 +0400
> net,socket: introduce build_sockaddr_check helper to catch overflow at build time
>
> proto_ops->getname implies copying protocol-specific data
> into a storage unit (particularly __kernel_sockaddr_storage).
> So when implementing a new protocol, one may or may not keep
> this in mind.
>
> Let's introduce a build_sockaddr_check helper which checks that
> the storage unit is not overflowed. Note that the check happens at
> build time and introduces no slowdown at execution time.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Nice idea, and I wonder if we can automate it even further.
Perhaps some tag that gets put on the socket address type
definition or similar?
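As a rough illustration of what such a build-time check amounts to, here is a userspace sketch using C11 `_Static_assert`; inside the kernel one would more likely reach for BUILD_BUG_ON. The storage struct is a stand-in sized like __kernel_sockaddr_storage (128 bytes), and the macro body and example protocol address are hypothetical, not the patch's actual code.

```c
/* Stand-in for __kernel_sockaddr_storage (128 bytes in the kernel). */
struct model_sockaddr_storage {
	unsigned short ss_family;
	char data[128 - sizeof(unsigned short)];
};

/* Fails compilation if a protocol's address type cannot fit. */
#define build_sockaddr_check(type)					\
	_Static_assert(sizeof(type) <= sizeof(struct model_sockaddr_storage), \
		       #type " does not fit in sockaddr storage")

/* Hypothetical protocol address used to exercise the check. */
struct example_proto_addr {
	unsigned short family;
	unsigned char  addr[16];
};

build_sockaddr_check(struct example_proto_addr);
```

An address type larger than 128 bytes would then fail to compile instead of overflowing the buffer at run time in ->getname().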
* xfrm transport mode policy and forward packets
From: Timo Teräs @ 2009-10-22 12:07 UTC (permalink / raw)
To: netdev, Herbert Xu
Hi,
In my DMVPN environment I'm using security policies like:
src 0.0.0.0/0 dst 0.0.0.0/0 proto gre
dir in priority 2147483648 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 0 mode transport
src 0.0.0.0/0 dst 0.0.0.0/0 proto gre
dir out priority 2147483648 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 0 mode transport
To make sure the locally generated/received GRE traffic is IPsec protected.
Now when some other non-local gre traffic is being forwarded by this router,
that seems to match these SPs too. Basically no one behind this router box
can use GRE (or PPTP).
I originally had the 'fwd' policy too, but removing it did not help as-is.
I needed to add destination specific 'out' policies with higher priority.
Apparently, the forward path does two xfrm lookups: the first against the 'fwd'
policies to check that the received packet does not violate policy, and a second
'out' lookup to see whether it needs to be transformed.
I initially wondered whether transport mode policies ought to be ignored for
forwarded traffic, but if the forwarded packet is NATted we might actually want
to xfrm it in transport mode.
There is an 'ifindex' field in xfrm_selector, but it seems to be the output
interface. So it would not solve my problem: both local and forwarded GRE
packets are output on the same interface.
I'm now slightly curious: why was 'in' sort of split into 'in' and 'fwd', while
'out' was not split similarly, which would give us more control over policies
depending on whether the traffic is local or forwarded?
My ideas so far have been:
a) rename 'fwd' to 'infwd' and split 'out' into 'out' and 'outfwd'?
(sounds kinda intrusive)
b) iptables target that would be able to disable xfrm
Any other ideas?
What would be the proper fix for this problem?
Thanks,
Timo
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Jarek Poplawski @ 2009-10-22 12:54 UTC (permalink / raw)
To: David Miller
Cc: johannes, tilman, hidave.darkstar, linux-kernel, tglx,
linux-wireless, linux-ppp, netdev, paulus, mb, oliver
In-Reply-To: <20091022.042939.95166154.davem@davemloft.net>
On Thu, Oct 22, 2009 at 04:29:39AM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Wed, 21 Oct 2009 23:39:47 +0200
>
> > I'm not sure I can understand your question. This patch is mainly to
> > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
> > context handling; IMHO there're better tools for this (lockdep,
> > WARN_ON's).
>
> Semantically I think your patch is correct, but I wonder about cost.
>
> Something that is a simple per-cpu inline "or" operation is now a
> function call and potentially mispredicted branch inside of
> raise_softirq_irqoff().
>
> And netif_rx() is indeed a fast path for tunnels and other users so
> this does matter.
>
> I like having people call things in the correct context the function
> was built for, and thus we can avoid completely useless operations and
> tests as we can now in netif_rx().
I like it too, but in this particular case I'm not sure netif_rx()
functionality requires this kind of separation; it looks quite similar
to e.g. tasklet_schedule(), which is the same for process and softirq
contexts.
>
> Making things general purpose costs something, and it costs too much
> here for this critical routine, sorry.
>
> I was just having a talk with Nick Piggin about these kinds of issues
> today; too few people care about these ever-encroaching tiny pieces
> of bloat that slow the kernel down gradually over time, and I simply
> won't stand for it when I notice it :-)
I'm not sure we're saving in the right place. As a matter of fact,
whenever I look into kernel/ code I can't see this kind of
optimization. There are quite a lot of WARN_ONs and ifs. These NOHZ
warnings simply show somebody else's debugging triggers far from the
places where it all started, which is quite accidental, while this
particular "bug" should have been printed immediately, a long time ago,
if we really cared.
Since I understand it's a question of taste, and it's not anything
critical, I'm quite OK with staying with the old way (except old
bugs, I hope ;-).
Jarek P.
* bridging + load balancing bonding
From: Jasper Spaans @ 2009-10-22 12:23 UTC (permalink / raw)
To: netdev
Hi,
We're using the following setup for bonding and bridging, to be able to put
large amounts of data through multiple IDS analyzers:
                           +---[br0]----+      +--- eth1 ---(IDS machine 1)
(Span port from switch) -- eth0         bond0--+
                                               +--- eth2 ---(IDS machine 2)
eth0 receives network traffic, which should be passed to machines which are
connected to eth1 and eth2. These machines run an IDS package, and there are
two of those for performance reasons.
bond0 is configured to load balance the packets using "balance-xor", in this
case combined with xmit_hash_policy layer2.
However, we're seeing problems: packets from one flow do not end up at the
same IDS machine. This is because the selection is not based on the source
_and_ destination MAC addresses of the original packet, but on the MAC
address of the bonding device and the destination MAC address of the
packet.
This is also clear in the code:
For example, in bond_main.c, in bond_xmit_hash_policy_l2:
return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
Changing this to
return (data->h_dest[5] ^ data->h_source[5]) % count;
fixes our problems, but is it harmful for packets originating locally (or
being routed)?
If not, can this be applied? Or does anyone have other ideas?
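The effect of the one-line change can be checked in isolation. This sketch reproduces the two hash expressions quoted above, with plain byte arrays standing in for the struct ethhdr fields and the bond device address; the function names are illustrative only. It uses two hosts A (MAC ending 0x01) and B (ending 0x02), a bond MAC ending 0x10, and two slaves.

```c
#include <stdint.h>

/* Current layer2 policy, per the quoted bond_main.c line: last byte of
 * the destination MAC XOR last byte of the bond device's own MAC. */
static int l2_hash_current(const uint8_t h_dest[6], const uint8_t bond_addr[6],
			   int count)
{
	return (h_dest[5] ^ bond_addr[5]) % count;
}

/* Proposed variant: XOR with the frame's source MAC instead. */
static int l2_hash_proposed(const uint8_t h_dest[6], const uint8_t h_source[6],
			    int count)
{
	return (h_dest[5] ^ h_source[5]) % count;
}
```

With the current hash, A to B gives (0x02 ^ 0x10) % 2 = 0 while B to A gives (0x01 ^ 0x10) % 2 = 1, so the two directions of one conversation land on different IDS machines. Because XOR is commutative, the proposed hash gives 3 % 2 = 1 in both directions. It is still a per-MAC-pair hash rather than a per-flow one.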
Thanks,
Jasper Spaans
--
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624
* iproute2: 2 questions
From: almaop @ 2009-10-22 12:42 UTC (permalink / raw)
To: netdev
1. There is a known bug that prevents using the ipt action with recent iptables:
tc filter add ...
action ipt -j mark --set-mark 2
does not work.
It does not work with the latest release, iproute2-2.6.29, nor with the latest git.
Is there an official workaround?
2. Are there plans to release the new iproute2 which fixes this bug?
Krzysiek
* Re: xfrm transport mode policy and forward packets
From: Herbert Xu @ 2009-10-22 13:21 UTC (permalink / raw)
To: Timo Teräs; +Cc: netdev, Alexey Kuznetsov
In-Reply-To: <4AE04B00.8090207@iki.fi>
On Thu, Oct 22, 2009 at 03:07:28PM +0300, Timo Teräs wrote:
>
> I'm using on my dmvpn environment security policies like:
>
> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre dir in priority 2147483648 ptype
> main tmpl src 0.0.0.0 dst 0.0.0.0
> proto esp reqid 0 mode transport
>
> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre dir out priority 2147483648 ptype
> main tmpl src 0.0.0.0 dst 0.0.0.0
> proto esp reqid 0 mode transport
>
> To make sure the locally generated/received GRE traffic is IPsec protected.
> Now when some other non-local gre traffic is being forwarded by this router,
> that seems to match these SPs too. Basically no one behind this router box
> can use GRE (or PPTP).
This is expected since forwarded GRE packets match the selector
given.
> My ideas so far have been:
> a) rename 'fwd' to 'infwd' and split 'out' to 'out' and 'outfwd' ?
> (sounds kinda intrusive)
> b) iptables target that would be able to disable xfrm
>
> Any other ideas?
> What would be the proper fix for this problem?
We could add the fwmark as a key.
Alexey and others may have better ideas on this.
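To illustrate how a mark key could separate the two cases, here is a heavily simplified, hypothetical model: a policy carries a value/mask pair and matches only when (packet mark & mask) == value. An iptables MARK rule in the FORWARD chain would tag transit GRE, so the transport-mode policy matches only unmarked, locally handled GRE. None of these structures or functions are the kernel's xfrm API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_policy {
	uint8_t  proto;     /* IP protocol, e.g. 47 for GRE */
	uint32_t mark_v;    /* required value of (pkt mark & mark_m) */
	uint32_t mark_m;    /* mask; 0 with mark_v 0 matches any mark */
	bool     transform; /* apply ESP transport mode on match */
};

struct model_pkt {
	uint8_t  proto;
	uint32_t mark;      /* e.g. set by iptables MARK in FORWARD */
};

static bool model_policy_match(const struct model_policy *pol,
			       const struct model_pkt *pkt)
{
	return pol->proto == pkt->proto &&
	       (pkt->mark & pol->mark_m) == pol->mark_v;
}

/* First match wins; the table is assumed sorted by priority. */
static const struct model_policy *
model_policy_lookup(const struct model_policy *tbl, size_t n,
		    const struct model_pkt *pkt)
{
	for (size_t i = 0; i < n; i++)
		if (model_policy_match(&tbl[i], pkt))
			return &tbl[i];
	return NULL;
}
```

With a single policy `{ 47, 0x0, 0x1, true }`, locally generated GRE (mark 0) matches and would be protected, while forwarded GRE marked 0x1 matches nothing and passes through untransformed.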
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* [net-next-2.6 PATCH] be2net:Changes to update ethtool get_settings function to return appropriate values.
From: Sarveshwar Bandi @ 2009-10-22 13:30 UTC (permalink / raw)
To: netdev; +Cc: davem
Update ethtool get_settings function to:
- get current link speed settings from controller
- get port transceiver type from controller
- fill appropriate values for supported, phy_address
Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
drivers/net/benet/be_cmds.c | 37 +++++++++++++++++++++++++++++++--
drivers/net/benet/be_cmds.h | 45 ++++++++++++++++++++++++++++++++++++++--
drivers/net/benet/be_ethtool.c | 36 +++++++++++++++++++++++++++++++-
drivers/net/benet/be_main.c | 5 ++++
4 files changed, 117 insertions(+), 6 deletions(-)
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 25b6602..a034265 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -823,7 +823,7 @@ int be_cmd_get_stats(struct be_adapter *
/* Uses synchronous mcc */
int be_cmd_link_status_query(struct be_adapter *adapter,
- bool *link_up)
+ bool *link_up, u8 *mac_speed, u16 *link_speed)
{
struct be_mcc_wrb *wrb;
struct be_cmd_req_link_status *req;
@@ -844,8 +844,11 @@ int be_cmd_link_status_query(struct be_a
status = be_mcc_notify_wait(adapter);
if (!status) {
struct be_cmd_resp_link_status *resp = embedded_payload(wrb);
- if (resp->mac_speed != PHY_LINK_SPEED_ZERO)
+ if (resp->mac_speed != PHY_LINK_SPEED_ZERO) {
*link_up = true;
+ *link_speed = le16_to_cpu(resp->link_speed);
+ *mac_speed = resp->mac_speed;
+ }
}
spin_unlock_bh(&adapter->mcc_lock);
@@ -1177,6 +1180,36 @@ int be_cmd_get_beacon_state(struct be_ad
return status;
}
+/* Uses sync mcc */
+int be_cmd_read_port_type(struct be_adapter *adapter, u32 port,
+ u8 *connector)
+{
+ struct be_mcc_wrb *wrb;
+ struct be_cmd_req_port_type *req;
+ int status;
+
+ spin_lock_bh(&adapter->mcc_lock);
+
+ wrb = wrb_from_mccq(adapter);
+ req = embedded_payload(wrb);
+
+ be_wrb_hdr_prepare(wrb, sizeof(struct be_cmd_resp_port_type), true, 0);
+
+ be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+ OPCODE_COMMON_READ_TRANSRECV_DATA, sizeof(*req));
+
+ req->port = cpu_to_le32(port);
+ req->page_num = cpu_to_le32(TR_PAGE_A0);
+ status = be_mcc_notify_wait(adapter);
+ if (!status) {
+ struct be_cmd_resp_port_type *resp = embedded_payload(wrb);
+ *connector = resp->data.connector;
+ }
+
+ spin_unlock_bh(&adapter->mcc_lock);
+ return status;
+}
+
int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
u32 flash_type, u32 flash_opcode, u32 buf_size)
{
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index a1e78cc..65e14dd 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -140,6 +140,7 @@ #define OPCODE_COMMON_NTWK_PMAC_DEL 60
#define OPCODE_COMMON_FUNCTION_RESET 61
#define OPCODE_COMMON_ENABLE_DISABLE_BEACON 69
#define OPCODE_COMMON_GET_BEACON_STATE 70
+#define OPCODE_COMMON_READ_TRANSRECV_DATA 73
#define OPCODE_ETH_ACPI_CONFIG 2
#define OPCODE_ETH_PROMISCUOUS 3
@@ -635,9 +636,47 @@ struct be_cmd_resp_link_status {
u8 mac_fault;
u8 mgmt_mac_duplex;
u8 mgmt_mac_speed;
- u16 rsvd0;
+ u16 link_speed;
+ u32 rsvd0;
} __packed;
+/******************** Port Identification ***************************/
+/* Identifies the type of port attached to NIC */
+struct be_cmd_req_port_type {
+ struct be_cmd_req_hdr hdr;
+ u32 page_num;
+ u32 port;
+};
+
+enum {
+ TR_PAGE_A0 = 0xa0,
+ TR_PAGE_A2 = 0xa2
+};
+
+struct be_cmd_resp_port_type {
+ struct be_cmd_resp_hdr hdr;
+ u32 page_num;
+ u32 port;
+ struct data {
+ u8 identifier;
+ u8 identifier_ext;
+ u8 connector;
+ u8 transceiver[8];
+ u8 rsvd0[3];
+ u8 length_km;
+ u8 length_hm;
+ u8 length_om1;
+ u8 length_om2;
+ u8 length_cu;
+ u8 length_cu_m;
+ u8 vendor_name[16];
+ u8 rsvd;
+ u8 vendor_oui[3];
+ u8 vendor_pn[16];
+ u8 vendor_rev[4];
+ } data;
+};
+
/******************** Get FW Version *******************/
struct be_cmd_req_get_fw_version {
struct be_cmd_req_hdr hdr;
@@ -775,7 +814,7 @@ extern int be_cmd_rxq_create(struct be_a
extern int be_cmd_q_destroy(struct be_adapter *adapter, struct be_queue_info *q,
int type);
extern int be_cmd_link_status_query(struct be_adapter *adapter,
- bool *link_up);
+ bool *link_up, u8 *mac_speed, u16 *link_speed);
extern int be_cmd_reset(struct be_adapter *adapter);
extern int be_cmd_get_stats(struct be_adapter *adapter,
struct be_dma_mem *nonemb_cmd);
@@ -801,6 +840,8 @@ extern int be_cmd_set_beacon_state(struc
u8 port_num, u8 beacon, u8 status, u8 state);
extern int be_cmd_get_beacon_state(struct be_adapter *adapter,
u8 port_num, u32 *state);
+extern int be_cmd_read_port_type(struct be_adapter *adapter, u32 port,
+ u8 *connector);
extern int be_cmd_write_flashrom(struct be_adapter *adapter,
struct be_dma_mem *cmd, u32 flash_oper,
u32 flash_opcode, u32 buf_size);
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 280471e..edebce9 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -293,9 +293,43 @@ static int be_get_sset_count(struct net_
static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
{
- ecmd->speed = SPEED_10000;
+ struct be_adapter *adapter = netdev_priv(netdev);
+ u8 mac_speed = 0, connector = 0;
+ u16 link_speed = 0;
+ bool link_up = false;
+
+ be_cmd_link_status_query(adapter, &link_up, &mac_speed, &link_speed);
+
+ /* link_speed is in units of 10 Mbps */
+ if (link_speed) {
+ ecmd->speed = link_speed*10;
+ } else {
+ switch (mac_speed) {
+ case PHY_LINK_SPEED_1GBPS:
+ ecmd->speed = SPEED_1000;
+ break;
+ case PHY_LINK_SPEED_10GBPS:
+ ecmd->speed = SPEED_10000;
+ break;
+ }
+ }
ecmd->duplex = DUPLEX_FULL;
ecmd->autoneg = AUTONEG_DISABLE;
+ ecmd->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_TP);
+
+ be_cmd_read_port_type(adapter, adapter->port_num, &connector);
+ switch (connector) {
+ case 7:
+ ecmd->port = PORT_FIBRE;
+ break;
+ default:
+ ecmd->port = PORT_TP;
+ break;
+ }
+
+ ecmd->phy_address = adapter->port_num;
+ ecmd->transceiver = XCVR_INTERNAL;
+
return 0;
}
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index e0f9d64..a48e822 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1586,6 +1586,8 @@ static int be_open(struct net_device *ne
struct be_eq_obj *tx_eq = &adapter->tx_eq;
bool link_up;
int status;
+ u8 mac_speed;
+ u16 link_speed;
/* First time posting */
be_post_rx_frags(adapter);
@@ -1604,7 +1606,8 @@ static int be_open(struct net_device *ne
/* Rx compl queue may be in unarmed state; rearm it */
be_cq_notify(adapter, adapter->rx_obj.cq.id, true, 0);
- status = be_cmd_link_status_query(adapter, &link_up);
+ status = be_cmd_link_status_query(adapter, &link_up, &mac_speed,
+ &link_speed);
if (status)
return status;
be_link_status_update(adapter, link_up);
--
1.4.0
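The speed selection added to be_get_settings() above can be modeled on its own. This is a sketch under stated assumptions: the real PHY_LINK_SPEED_* values live in be_cmds.h and are not shown in this patch, so hypothetical stand-ins are used, and the function name is illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-ins for PHY_LINK_SPEED_* (real values not shown). */
enum {
	MODEL_LINK_SPEED_ZERO,
	MODEL_LINK_SPEED_1GBPS,
	MODEL_LINK_SPEED_10GBPS,
};

/* Mirrors the patch's choice, returning Mbps: prefer link_speed
 * (firmware units of 10 Mbps), else fall back to the coarse mac_speed
 * code. Returns 0 where the patch would leave ecmd->speed unassigned. */
static unsigned int model_be_speed_mbps(uint8_t mac_speed, uint16_t link_speed)
{
	if (link_speed)
		return link_speed * 10u;

	switch (mac_speed) {
	case MODEL_LINK_SPEED_1GBPS:
		return 1000;
	case MODEL_LINK_SPEED_10GBPS:
		return 10000;
	default:
		return 0;
	}
}
```

A firmware-reported link_speed of 1000 therefore maps to 10000 Mbps, the value of SPEED_10000.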
* Re: xfrm transport mode policy and forward packets
From: Timo Teräs @ 2009-10-22 13:31 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, Alexey Kuznetsov
In-Reply-To: <20091022132126.GB28893@gondor.apana.org.au>
Herbert Xu wrote:
> On Thu, Oct 22, 2009 at 03:07:28PM +0300, Timo Teräs wrote:
>> I'm using on my dmvpn environment security policies like:
>>
>> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre dir in priority 2147483648 ptype
>> main tmpl src 0.0.0.0 dst 0.0.0.0
>> proto esp reqid 0 mode transport
>>
>> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre dir out priority 2147483648 ptype
>> main tmpl src 0.0.0.0 dst 0.0.0.0
>> proto esp reqid 0 mode transport
>>
>> To make sure the locally generated/received GRE traffic is IPsec protected.
>> Now when some other non-local gre traffic is being forwarded by this router,
>> that seems to match these SPs too. Basically no one behind this router box
>> can use GRE (or PPTP).
>
> This is expected since forwarded GRE packets match the selector
> given.
Yes. I forgot to mention explicitly that I thought just removing the
'fwd' policy would fix this. It's slightly confusing that the input path
is split into two separate policy DBs, while the output path is not.
>> My ideas so far have been:
>> a) rename 'fwd' to 'infwd' and split 'out' to 'out' and 'outfwd' ?
>> (sounds kinda intrusive)
>> b) iptables target that would be able to disable xfrm
>>
>> Any other ideas?
>> What would be the proper fix for this problem?
>
> We could add the fwmark as a key.
Ah, sounds even better.
> Alexey and others may have better ideas on this.
Thanks!
Timo