* [PATCH 6/9] stmmac: rework the code to get the Synopsys ID
From: Giuseppe CAVALLARO @ 2011-08-25 8:00 UTC
To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1314259229-13767-1-git-send-email-peppe.cavallaro@st.com>
The Synopsys ID is now passed from the MAC core code
to the main driver code. This information will be used
to manage the HW capability register (supported in the
newer GMAC generations).
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/stmmac/common.h | 1 +
drivers/net/stmmac/dwmac1000_core.c | 6 ++----
drivers/net/stmmac/dwmac100_core.c | 1 +
drivers/net/stmmac/stmmac_main.c | 20 +++++++++++++++++++-
4 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/drivers/net/stmmac/common.h b/drivers/net/stmmac/common.h
index 290b97a..79c6171 100644
--- a/drivers/net/stmmac/common.h
+++ b/drivers/net/stmmac/common.h
@@ -229,6 +229,7 @@ struct mac_device_info {
const struct stmmac_dma_ops *dma;
struct mii_regs mii; /* MII register Addresses */
struct mac_link link;
+ unsigned int synopsys_uid;
};
struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr);
diff --git a/drivers/net/stmmac/dwmac1000_core.c b/drivers/net/stmmac/dwmac1000_core.c
index 9ba9cae..b1c48b9 100644
--- a/drivers/net/stmmac/dwmac1000_core.c
+++ b/drivers/net/stmmac/dwmac1000_core.c
@@ -224,10 +224,7 @@ static const struct stmmac_ops dwmac1000_ops = {
struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr)
{
struct mac_device_info *mac;
- u32 uid = readl(ioaddr + GMAC_VERSION);
-
- pr_info("\tDWMAC1000 - user ID: 0x%x, Synopsys ID: 0x%x\n",
- ((uid & 0x0000ff00) >> 8), (uid & 0x000000ff));
+ u32 hwid = readl(ioaddr + GMAC_VERSION);
mac = kzalloc(sizeof(const struct mac_device_info), GFP_KERNEL);
if (!mac)
@@ -241,6 +238,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr)
mac->link.speed = GMAC_CONTROL_FES;
mac->mii.addr = GMAC_MII_ADDR;
mac->mii.data = GMAC_MII_DATA;
+ mac->synopsys_uid = hwid;
return mac;
}
diff --git a/drivers/net/stmmac/dwmac100_core.c b/drivers/net/stmmac/dwmac100_core.c
index aacfc6e..138fb8d 100644
--- a/drivers/net/stmmac/dwmac100_core.c
+++ b/drivers/net/stmmac/dwmac100_core.c
@@ -188,6 +188,7 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr)
mac->link.speed = 0;
mac->mii.addr = MAC_MII_ADDR;
mac->mii.data = MAC_MII_DATA;
+ mac->synopsys_uid = 0;
return mac;
}
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index 5b20e1b..f5aac12 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -762,6 +762,23 @@ static void stmmac_mmc_setup(struct stmmac_priv *priv)
memset(&priv->mmc, 0, sizeof(struct stmmac_counters));
}
+static u32 stmmac_get_synopsys_id(struct stmmac_priv *priv)
+{
+ u32 hwid = priv->hw->synopsys_uid;
+
+ /* Only check valid Synopsys Id because old MAC chips
+ * have no HW register from which to read the ID */
+ if (likely(hwid)) {
+ u32 uid = ((hwid & 0x0000ff00) >> 8);
+ u32 synid = (hwid & 0x000000ff);
+
+ pr_info("STMMAC - user ID: 0x%x, Synopsys ID: 0x%x\n",
+ uid, synid);
+
+ return synid;
+ }
+ return 0;
+}
/**
* stmmac_open - open entry point of the driver
* @dev : pointer to the device structure.
@@ -834,7 +851,8 @@ static int stmmac_open(struct net_device *dev)
/* Initialize the MAC Core */
priv->hw->mac->core_init(priv->ioaddr);
- priv->rx_coe = priv->hw->mac->rx_coe(priv->ioaddr);
+ stmmac_get_synopsys_id(priv);
+
if (priv->rx_coe)
pr_info("stmmac: Rx Checksum Offload Engine supported\n");
if (priv->plat->tx_coe)
--
1.7.4.4
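Editorial sketch (not part of the patch): the ID handling above boils down to splitting one 32-bit version word into two 8-bit fields, with zero meaning "no ID register". The user-space code below mirrors the masks used in stmmac_get_synopsys_id(); struct hw_ids, decode_hwid() and print_hwid() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the two fields packed in the version word. */
struct hw_ids {
	uint32_t user_id;      /* bits 15:8 of the version word */
	uint32_t synopsys_id;  /* bits 7:0 of the version word */
};

/* Split a raw version word with the same masks as the patch.
 * A zero word is treated as "no ID available" and yields two zeros. */
struct hw_ids decode_hwid(uint32_t hwid)
{
	struct hw_ids ids = { 0, 0 };

	if (hwid) {
		ids.user_id = (hwid & 0x0000ff00) >> 8;
		ids.synopsys_id = hwid & 0x000000ff;
	}
	return ids;
}

/* Print both IDs in a format similar to the driver message. */
void print_hwid(uint32_t hwid)
{
	struct hw_ids ids = decode_hwid(hwid);

	/* Both fields are masked to 8 bits, so they can never exceed 0xff. */
	assert(ids.user_id <= 0xff && ids.synopsys_id <= 0xff);
	printf("user ID: 0x%x, Synopsys ID: 0x%x\n",
	       (unsigned int)ids.user_id, (unsigned int)ids.synopsys_id);
}
```

For instance, a raw word of 0x1036 would decode to user ID 0x10 and Synopsys ID 0x36; the value is made up and does not come from any particular chip.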
* [PATCH 7/9] stmmac: add HW DMA feature register
From: Giuseppe CAVALLARO @ 2011-08-25 8:00 UTC
To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1314259229-13767-1-git-send-email-peppe.cavallaro@st.com>
New GMAC chips have an extra register to indicate
the presence of the optional features/functions of
the DMA core.
This patch adds support for it and exports all the
HW capabilities via debugfs.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/stmmac/common.h | 33 +++++++++
drivers/net/stmmac/dwmac1000_dma.c | 6 ++
drivers/net/stmmac/dwmac_dma.h | 1 +
drivers/net/stmmac/stmmac.h | 1 +
drivers/net/stmmac/stmmac_main.c | 132 ++++++++++++++++++++++++++++++++++++
5 files changed, 173 insertions(+), 0 deletions(-)
diff --git a/drivers/net/stmmac/common.h b/drivers/net/stmmac/common.h
index 79c6171..1733019 100644
--- a/drivers/net/stmmac/common.h
+++ b/drivers/net/stmmac/common.h
@@ -115,6 +115,37 @@ enum tx_dma_irq_status {
handle_tx_rx = 3,
};
+/* DMA HW capabilities */
+struct dma_features {
+ unsigned int mbps_10_100;
+ unsigned int mbps_1000;
+ unsigned int half_duplex;
+ unsigned int hash_filter;
+ unsigned int multi_addr;
+ unsigned int pcs;
+ unsigned int sma_mdio;
+ unsigned int pmt_remote_wake_up;
+ unsigned int pmt_magic_frame;
+ unsigned int rmon;
+ /* IEEE 1588-2002*/
+ unsigned int time_stamp;
+ /* IEEE 1588-2008*/
+ unsigned int atime_stamp;
+ /* 802.3az - Energy-Efficient Ethernet (EEE) */
+ unsigned int eee;
+ unsigned int av;
+ /* TX and RX csum */
+ unsigned int tx_coe;
+ unsigned int rx_coe_type1;
+ unsigned int rx_coe_type2;
+ unsigned int rxfifo_over_2048;
+ /* TX and RX number of channels */
+ unsigned int number_rx_channel;
+ unsigned int number_tx_channel;
+ /* Alternate (enhanced) DESC mode*/
+ unsigned int enh_desc;
+};
+
/* GMAC TX FIFO is 8K, Rx FIFO is 16K */
#define BUF_SIZE_16KiB 16384
#define BUF_SIZE_8KiB 8192
@@ -187,6 +218,8 @@ struct stmmac_dma_ops {
void (*stop_rx) (void __iomem *ioaddr);
int (*dma_interrupt) (void __iomem *ioaddr,
struct stmmac_extra_stats *x);
+ /* If supported then get the optional core features */
+ unsigned int (*get_hw_feature) (void __iomem *ioaddr);
};
struct stmmac_ops {
diff --git a/drivers/net/stmmac/dwmac1000_dma.c b/drivers/net/stmmac/dwmac1000_dma.c
index 3dbeea6..5c17d35 100644
--- a/drivers/net/stmmac/dwmac1000_dma.c
+++ b/drivers/net/stmmac/dwmac1000_dma.c
@@ -139,6 +139,11 @@ static void dwmac1000_dump_dma_regs(void __iomem *ioaddr)
}
}
+static unsigned int dwmac1000_get_hw_feature(void __iomem *ioaddr)
+{
+ return readl(ioaddr + DMA_HW_FEATURE);
+}
+
const struct stmmac_dma_ops dwmac1000_dma_ops = {
.init = dwmac1000_dma_init,
.dump_regs = dwmac1000_dump_dma_regs,
@@ -152,4 +157,5 @@ const struct stmmac_dma_ops dwmac1000_dma_ops = {
.start_rx = dwmac_dma_start_rx,
.stop_rx = dwmac_dma_stop_rx,
.dma_interrupt = dwmac_dma_interrupt,
+ .get_hw_feature = dwmac1000_get_hw_feature,
};
diff --git a/drivers/net/stmmac/dwmac_dma.h b/drivers/net/stmmac/dwmac_dma.h
index da3f5cc..437edac 100644
--- a/drivers/net/stmmac/dwmac_dma.h
+++ b/drivers/net/stmmac/dwmac_dma.h
@@ -34,6 +34,7 @@
#define DMA_MISSED_FRAME_CTR 0x00001020 /* Missed Frame Counter */
#define DMA_CUR_TX_BUF_ADDR 0x00001050 /* Current Host Tx Buffer */
#define DMA_CUR_RX_BUF_ADDR 0x00001054 /* Current Host Rx Buffer */
+#define DMA_HW_FEATURE 0x00001058 /* HW Feature Register */
/* DMA Control register defines */
#define DMA_CONTROL_ST 0x00002000 /* Start/Stop Transmission */
diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index f86ea87..3461676 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -79,6 +79,7 @@ struct stmmac_priv {
#endif
struct plat_stmmacenet_data *plat;
struct stmmac_counters mmc;
+ struct dma_features dma_cap;
};
extern int stmmac_mdio_unregister(struct net_device *ndev);
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index f5aac12..6d806b5 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -779,6 +779,49 @@ static u32 stmmac_get_synopsys_id(struct stmmac_priv *priv)
}
return 0;
}
+
+/* New GMAC chips support a new register to indicate the
+ * presence of the optional feature/functions.
+ */
+static int stmmac_get_hw_features(struct stmmac_priv *priv)
+{
+ u32 hw_cap = priv->hw->dma->get_hw_feature(priv->ioaddr);
+
+ if (likely(hw_cap)) {
+ priv->dma_cap.mbps_10_100 = (hw_cap & 0x1);
+ priv->dma_cap.mbps_1000 = (hw_cap & 0x2) >> 1;
+ priv->dma_cap.half_duplex = (hw_cap & 0x4) >> 2;
+ priv->dma_cap.hash_filter = (hw_cap & 0x10) >> 4;
+ priv->dma_cap.multi_addr = (hw_cap & 0x20) >> 5;
+ priv->dma_cap.pcs = (hw_cap & 0x40) >> 6;
+ priv->dma_cap.sma_mdio = (hw_cap & 0x100) >> 8;
+ priv->dma_cap.pmt_remote_wake_up = (hw_cap & 0x200) >> 9;
+ priv->dma_cap.pmt_magic_frame = (hw_cap & 0x400) >> 10;
+ priv->dma_cap.rmon = (hw_cap & 0x800) >> 11; /* MMC */
+ /* IEEE 1588-2002*/
+ priv->dma_cap.time_stamp = (hw_cap & 0x1000) >> 12;
+ /* IEEE 1588-2008*/
+ priv->dma_cap.atime_stamp = (hw_cap & 0x2000) >> 13;
+ /* 802.3az - Energy-Efficient Ethernet (EEE) */
+ priv->dma_cap.eee = (hw_cap & 0x4000) >> 14;
+ priv->dma_cap.av = (hw_cap & 0x8000) >> 15;
+ /* TX and RX csum */
+ priv->dma_cap.tx_coe = (hw_cap & 0x10000) >> 16;
+ priv->dma_cap.rx_coe_type1 = (hw_cap & 0x20000) >> 17;
+ priv->dma_cap.rx_coe_type2 = (hw_cap & 0x40000) >> 18;
+ priv->dma_cap.rxfifo_over_2048 = (hw_cap & 0x80000) >> 19;
+ /* TX and RX number of channels */
+ priv->dma_cap.number_rx_channel = (hw_cap & 0x300000) >> 20;
+ priv->dma_cap.number_tx_channel = (hw_cap & 0xc00000) >> 22;
+ /* Alternate (enhanced) DESC mode*/
+ priv->dma_cap.enh_desc = (hw_cap & 0x1000000) >> 24;
+
+ } else
+ pr_debug("\tNo HW DMA feature register supported\n");
+
+ return hw_cap;
+}
+
/**
* stmmac_open - open entry point of the driver
* @dev : pointer to the device structure.
@@ -853,6 +896,8 @@ static int stmmac_open(struct net_device *dev)
stmmac_get_synopsys_id(priv);
+ stmmac_get_hw_features(priv);
+
if (priv->rx_coe)
pr_info("stmmac: Rx Checksum Offload Engine supported\n");
if (priv->plat->tx_coe)
@@ -1450,6 +1495,7 @@ static int stmmac_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
static struct dentry *stmmac_fs_dir;
static struct dentry *stmmac_rings_status;
static struct dentry *stmmac_mmc;
+static struct dentry *stmmac_dma_cap;
static int stmmac_sysfs_ring_read(struct seq_file *seq, void *v)
{
@@ -1715,6 +1761,78 @@ static const struct file_operations stmmac_mmc_fops = {
.release = seq_release,
};
+static int stmmac_sysfs_dma_cap_read(struct seq_file *seq, void *v)
+{
+ struct net_device *dev = seq->private;
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+ if (!stmmac_get_hw_features(priv)) {
+ seq_printf(seq, "DMA HW features not supported\n");
+ return 0;
+ }
+
+ seq_printf(seq, "==============================\n");
+ seq_printf(seq, "\tDMA HW features\n");
+ seq_printf(seq, "==============================\n");
+
+ seq_printf(seq, "\t10/100 Mbps %s\n",
+ (priv->dma_cap.mbps_10_100) ? "Y" : "N");
+ seq_printf(seq, "\t1000 Mbps %s\n",
+ (priv->dma_cap.mbps_1000) ? "Y" : "N");
+ seq_printf(seq, "\tHalf duplex %s\n",
+ (priv->dma_cap.half_duplex) ? "Y" : "N");
+ seq_printf(seq, "\tHash Filter: %s\n",
+ (priv->dma_cap.hash_filter) ? "Y" : "N");
+ seq_printf(seq, "\tMultiple MAC address registers: %s\n",
+ (priv->dma_cap.multi_addr) ? "Y" : "N");
+ seq_printf(seq, "\tPCS (TBI/SGMII/RTBI PHY interfaces): %s\n",
+ (priv->dma_cap.pcs) ? "Y" : "N");
+ seq_printf(seq, "\tSMA (MDIO) Interface: %s\n",
+ (priv->dma_cap.sma_mdio) ? "Y" : "N");
+ seq_printf(seq, "\tPMT Remote wake up: %s\n",
+ (priv->dma_cap.pmt_remote_wake_up) ? "Y" : "N");
+ seq_printf(seq, "\tPMT Magic Frame: %s\n",
+ (priv->dma_cap.pmt_magic_frame) ? "Y" : "N");
+ seq_printf(seq, "\tRMON module: %s\n",
+ (priv->dma_cap.rmon) ? "Y" : "N");
+ seq_printf(seq, "\tIEEE 1588-2002 Time Stamp: %s\n",
+ (priv->dma_cap.time_stamp) ? "Y" : "N");
+ seq_printf(seq, "\tIEEE 1588-2008 Advanced Time Stamp:%s\n",
+ (priv->dma_cap.atime_stamp) ? "Y" : "N");
+ seq_printf(seq, "\t802.3az - Energy-Efficient Ethernet (EEE) %s\n",
+ (priv->dma_cap.eee) ? "Y" : "N");
+ seq_printf(seq, "\tAV features: %s\n", (priv->dma_cap.av) ? "Y" : "N");
+ seq_printf(seq, "\tChecksum Offload in TX: %s\n",
+ (priv->dma_cap.tx_coe) ? "Y" : "N");
+ seq_printf(seq, "\tIP Checksum Offload (type1) in RX: %s\n",
+ (priv->dma_cap.rx_coe_type1) ? "Y" : "N");
+ seq_printf(seq, "\tIP Checksum Offload (type2) in RX: %s\n",
+ (priv->dma_cap.rx_coe_type2) ? "Y" : "N");
+ seq_printf(seq, "\tRXFIFO > 2048bytes: %s\n",
+ (priv->dma_cap.rxfifo_over_2048) ? "Y" : "N");
+ seq_printf(seq, "\tNumber of Additional RX channel: %d\n",
+ priv->dma_cap.number_rx_channel);
+ seq_printf(seq, "\tNumber of Additional TX channel: %d\n",
+ priv->dma_cap.number_tx_channel);
+ seq_printf(seq, "\tEnhanced descriptors: %s\n",
+ (priv->dma_cap.enh_desc) ? "Y" : "N");
+
+ return 0;
+}
+
+static int stmmac_sysfs_dma_cap_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, stmmac_sysfs_dma_cap_read, inode->i_private);
+}
+
+static const struct file_operations stmmac_dma_cap_fops = {
+ .owner = THIS_MODULE,
+ .open = stmmac_sysfs_dma_cap_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
static int stmmac_init_fs(struct net_device *dev)
{
/* Create debugfs entries */
@@ -1751,6 +1869,19 @@ static int stmmac_init_fs(struct net_device *dev)
return -ENOMEM;
}
+ /* Entry to report the DMA HW features */
+ stmmac_dma_cap = debugfs_create_file("dma_cap", S_IRUGO, stmmac_fs_dir,
+ dev, &stmmac_dma_cap_fops);
+
+ if (!stmmac_dma_cap || IS_ERR(stmmac_dma_cap)) {
+ pr_info("ERROR creating stmmac DMA cap debugfs file\n");
+ debugfs_remove(stmmac_rings_status);
+ debugfs_remove(stmmac_mmc);
+ debugfs_remove(stmmac_fs_dir);
+
+ return -ENOMEM;
+ }
+
return 0;
}
@@ -1758,6 +1889,7 @@ static void stmmac_exit_fs(void)
{
debugfs_remove(stmmac_rings_status);
debugfs_remove(stmmac_mmc);
+ debugfs_remove(stmmac_dma_cap);
debugfs_remove(stmmac_fs_dir);
}
#endif /* CONFIG_STMMAC_MONITOR */
--
1.7.4.4
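Editorial sketch (not part of the patch): stmmac_get_hw_features() is a plain bit-field decode of one 32-bit register, where single-bit flags are masked and shifted down to 0/1 and the channel counts are two-bit fields. The user-space code below reproduces that decode for a subset of the fields, using the same masks as the patch; struct dma_caps and decode_hw_features() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the capability fields decoded by the patch. */
struct dma_caps {
	unsigned int mbps_1000;          /* bit 1 */
	unsigned int tx_coe;             /* bit 16 */
	unsigned int number_rx_channel;  /* bits 21:20 */
	unsigned int number_tx_channel;  /* bits 23:22 */
	unsigned int enh_desc;           /* bit 24 */
};

/* Decode a raw feature word into *caps.
 * Returns 0 (and leaves *caps untouched) when the word is zero,
 * which the patch treats as "no feature register"; returns 1 otherwise. */
int decode_hw_features(uint32_t hw_cap, struct dma_caps *caps)
{
	assert(caps != NULL);

	if (!hw_cap)
		return 0;

	caps->mbps_1000 = (hw_cap & 0x2) >> 1;
	caps->tx_coe = (hw_cap & 0x10000) >> 16;
	caps->number_rx_channel = (hw_cap & 0x300000) >> 20;
	caps->number_tx_channel = (hw_cap & 0xc00000) >> 22;
	caps->enh_desc = (hw_cap & 0x1000000) >> 24;
	return 1;
}
```

With the made-up word 0x01610002, this decode reports 1000 Mbps, TX checksum offload and enhanced descriptors as present, two additional RX channels and one additional TX channel.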
* [PATCH 8/9] stmmac: update the doc with new info about the driver's debug.
From: Giuseppe CAVALLARO @ 2011-08-25 8:00 UTC
To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1314259229-13767-1-git-send-email-peppe.cavallaro@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
Documentation/networking/stmmac.txt | 39 ++++++++++++++++++++++++++++++++++-
1 files changed, 38 insertions(+), 1 deletions(-)
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 57a2410..d82e180 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -235,7 +235,44 @@ reset procedure etc).
o enh_desc.c: functions for handling enhanced descriptors
o norm_desc.c: functions for handling normal descriptors
-5) TODO:
+5) Debug Information
+
+The driver exports a lot of information, e.g. internal statistics,
+debug information, MAC and DMA registers, etc.
+
+These can be read in several ways depending on the
+type of the information actually needed.
+
+For example, a user can use the ethtool support
+to get statistics, e.g. using: ethtool -S ethX
+or to see the MAC/DMA registers, e.g. using: ethtool -d ethX
+
+When the kernel is compiled with CONFIG_DEBUG_FS and the
+STMMAC_MONITORING option is enabled, the driver exports the following
+debugfs entries:
+
+/sys/kernel/debug/stmmaceth/descriptors_status
+ To show the DMA TX/RX descriptor rings
+
+/sys/kernel/debug/stmmaceth/mmc
+ To show the internal Management counters (MMC)
+ if supported in the core.
+
+/sys/kernel/debug/stmmaceth/dma_cap
+ To show the DMA HW features register (if supported)
+
+Developers can also use the "debug" module parameter to get
+further debug information.
+
+Finally, there are other macros (that cannot be enabled
+via menuconfig) to turn on the RX/TX DMA debugging,
+specific MAC core debug printk etc., and others to enable
+debugging in the TX and RX processes.
+All these are only useful during the development stage
+and should never be enabled in the code for general usage.
+In fact, they can generate a huge amount of debug messages.
+
+6) TODO:
o XGMAC is not supported.
o Review the timer optimisation code to use an embedded device that will be
available in new chip generations.
--
1.7.4.4
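Editorial sketch (not part of the patch): the debugfs entries listed in the documentation are ordinary text files, so they can be read with standard file I/O once debugfs is mounted. The helper below, dump_debugfs_entry(), is a hypothetical name; it copies a file to an output stream and reports failure when the entry does not exist, for example when the monitoring option is not enabled.

```c
#include <assert.h>
#include <stdio.h>

/* Copy a text file to `out`, one fgets() chunk at a time.
 * Returns the number of chunks copied (one per line for lines shorter
 * than the buffer), or -1 if the file cannot be opened. */
int dump_debugfs_entry(const char *path, FILE *out)
{
	char line[256];
	int chunks = 0;
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		fputs(line, out);
		chunks++;
	}
	fclose(f);
	return chunks;
}
```

A caller would pass a path such as /sys/kernel/debug/stmmaceth/dma_cap and stdout. Reading debugfs normally requires sufficient privileges, so a return value of -1 can indicate a permission problem as well as a missing entry.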
* [PATCH 9/9] stmmac: update the driver version (Aug_2011)
From: Giuseppe CAVALLARO @ 2011-08-25 8:00 UTC
To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1314259229-13767-1-git-send-email-peppe.cavallaro@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/stmmac/stmmac.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index 3461676..6bc0d0c 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -20,7 +20,7 @@
Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
*******************************************************************************/
-#define DRV_MODULE_VERSION "July_2011"
+#define DRV_MODULE_VERSION "Aug_2011"
#include <linux/stmmac.h>
#include "common.h"
--
1.7.4.4
* Re: [PATCH] stmmac: remove the STBus bridge setting from the GMAC code
From: Giuseppe CAVALLARO @ 2011-08-25 8:01 UTC
To: ML netdev, David S. Miller
In-Reply-To: <1314021507-26955-1-git-send-email-peppe.cavallaro@st.com>
Hello David
you can discard this because I'm resending it in a bundle of patches to
update the whole driver to the Aug_2011 version.
Peppe
On 8/22/2011 3:58 PM, Giuseppe CAVALLARO wrote:
> This patch removes a piece of code (actually commented)
> only useful for some ST platforms in the past.
>
> This kind of setting now can be done by using the platform
> callbacks provided in linux/stmmac.h (see the stmmac.txt for
> further details).
>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> ---
> drivers/net/stmmac/dwmac1000_core.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/stmmac/dwmac1000_core.c b/drivers/net/stmmac/dwmac1000_core.c
> index 0f63b3c..eea184a 100644
> --- a/drivers/net/stmmac/dwmac1000_core.c
> +++ b/drivers/net/stmmac/dwmac1000_core.c
> @@ -37,9 +37,6 @@ static void dwmac1000_core_init(void __iomem *ioaddr)
> value |= GMAC_CORE_INIT;
> writel(value, ioaddr + GMAC_CONTROL);
>
> - /* STBus Bridge Configuration */
> - /*writel(0xc5608, ioaddr + 0x00007000);*/
> -
> /* Freeze MMC counters */
> writel(0x8, ioaddr + GMAC_MMC_CTRL);
> /* Mask GMAC interrupts */
* how to distribute irqs of ixgbevf
From: J.Hwan Kim @ 2011-08-25 8:07 UTC
To: netdev
Hi, everyone
The interrupts of my ixgbevf driver occur only on Core 0
although the user-space "irqbalance" service is running.
How can I distribute the interrupt of RX in ixgbevf to all cores?
cat /proc/interrupts | grep "isv"
 97:     8  0  0  0  0  0  0  0  PCI-MSI-edge  isv0-rx-0
 99:     7  0  0  0  0  0  0  0  PCI-MSI-edge  isv0:lsc
103:  2059  0  0  0  0  0  0  0  PCI-MSI-edge  isv2-rx-0
104:    14  0  0  0  0  0  0  0  PCI-MSI-edge  isv2-tx-0
105:     1  0  0  0  0  0  0  0  PCI-MSI-edge  isv2:mbx
"isv" is the netdevice name of my ixgbevf interface.
Thanks in advance.
Best Regards,
J.Hwan Kim
* Re: how to distribute irqs of ixgbevf
From: Eric Dumazet @ 2011-08-25 8:21 UTC
To: J.Hwan Kim; +Cc: netdev
In-Reply-To: <4E5602D6.90807@gmail.com>
On Thursday 25 August 2011 at 17:07 +0900, J.Hwan Kim wrote:
> Hi, everyone
>
> The interrupts of my ixgbevf driver occur only on Core 0
> although the user-space "irqbalance" service is running.
>
> How can I distribute the interrupt of RX in ixgbevf to all cores?
>
> cat /proc/interrupts | grep "isv"
>  97:     8  0  0  0  0  0  0  0  PCI-MSI-edge  isv0-rx-0
>  99:     7  0  0  0  0  0  0  0  PCI-MSI-edge  isv0:lsc
> 103:  2059  0  0  0  0  0  0  0  PCI-MSI-edge  isv2-rx-0
> 104:    14  0  0  0  0  0  0  0  PCI-MSI-edge  isv2-tx-0
> 105:     1  0  0  0  0  0  0  0  PCI-MSI-edge  isv2:mbx
>
> "isv" is the netdevice name of my ixgbevf interface.
>
>
Given that the load is very small, irqbalance chose to send interrupts to a
single CPU.
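Editorial sketch (not from the thread): on Linux the CPU affinity of an interrupt can also be set by hand by writing a hexadecimal CPU mask to /proc/irq/<irq>/smp_affinity. The helper below only builds the path and the mask string for a single target CPU; format_irq_affinity() is a hypothetical name, and a running irqbalance may later overwrite a value written this way.

```c
#include <assert.h>
#include <stdio.h>

/* Build "/proc/irq/<irq>/smp_affinity" in `path` and a hex mask that
 * selects exactly one CPU in `mask`. Only CPUs 0..31 are handled.
 * Returns 0 on success, -1 on an unsupported CPU or a too-small buffer. */
int format_irq_affinity(unsigned int irq, unsigned int cpu,
			char *path, size_t path_len,
			char *mask, size_t mask_len)
{
	int n;

	assert(path != NULL && mask != NULL);

	if (cpu >= 32)
		return -1;

	n = snprintf(path, path_len, "/proc/irq/%u/smp_affinity", irq);
	if (n < 0 || (size_t)n >= path_len)
		return -1;

	n = snprintf(mask, mask_len, "%x", 1u << cpu);
	if (n < 0 || (size_t)n >= mask_len)
		return -1;

	return 0;
}
```

For IRQ 103 (isv2-rx-0 in the listing above) and CPU 2, this yields the path /proc/irq/103/smp_affinity and the mask 4. Writing the mask to that file requires root. A single interrupt is still delivered to one CPU at a time, so this moves the load rather than spreading one queue across all cores.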
* Re: [PATCH] tcp: bound RTO to minimum
From: Eric Dumazet @ 2011-08-25 8:26 UTC
To: Alexander Zimmermann
Cc: Yuchung Cheng, Hagen Paul Pfeifer, netdev, Hannemann Arnd,
Lukowski Damian
In-Reply-To: <4033BFEE-C432-4D94-8372-BA166AF2AA26@comsys.rwth-aachen.de>
On Thursday 25 August 2011 at 09:28 +0200, Alexander Zimmermann wrote:
> Hi Eric,
>
> On 25.08.2011 at 07:28, Eric Dumazet wrote:
> > The real question is: do we really want to process ~1000 timer interrupts
> > per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
> > requests, only to make tcp recover in ~1sec when connectivity comes
> > back. This just doesn't scale.
>
> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
> probing time of 120s, we get 600 retransmits in a worst-case scenario
> (assuming that we get an ICMP for every RTO retransmission). No?
Where is the "max probing time of 120s" asserted?
It is not the case on my machine:
I have way more retransmits than that, even if spaced by 1600 ms
07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
Old kernels were performing up to 15 retries, doing exponential backoff.
Now it's kind of unlimited, according to experimental results.
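Editorial sketch (not from the thread): the two retransmission models being compared can be put into numbers. With a fixed 200 ms RTO and a 120 s window, the count is simply 120000 / 200 = 600. With exponential backoff, the waits double on every expiry. The code below assumes doubling from 200 ms with a 120 s cap; both values are illustrative assumptions and not a statement of what any given kernel does. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of retransmits that fit in `window_ms` when the timer fires
 * every `rto_ms` without any backoff. */
uint64_t fixed_rto_retransmits(uint64_t window_ms, uint64_t rto_ms)
{
	assert(rto_ms > 0);
	return window_ms / rto_ms;
}

/* Total time spent waiting over `timeouts` timer expirations when the
 * RTO starts at `rto_ms`, doubles after each expiry, and is capped at
 * `rto_max_ms`. */
uint64_t backoff_total_ms(uint64_t rto_ms, uint64_t rto_max_ms,
			  unsigned int timeouts)
{
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i < timeouts; i++) {
		total += rto_ms;
		rto_ms = (rto_ms * 2 > rto_max_ms) ? rto_max_ms : rto_ms * 2;
	}
	return total;
}
```

Under these assumptions, 15 expirations add up to 804.6 s: the first ten doubling waits total 204.6 s and the remaining five sit at the 120 s cap. That is 15 timer events in roughly 13 minutes, against 600 events in 2 minutes for the fixed 200 ms case.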
* [PATCH 0/9] skb fragment API: convert non-network drivers
From: Ian Campbell @ 2011-08-25 8:28 UTC
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-atm-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, devel-s9riP+hp16TNLxjTenLetw
The following series converts some non-network drivers to the SKB paged
fragment API introduced in 131ea6675c76. Included are ATM, Infiniband,
and FibreChannel. I also included the Broadcom network drivers since I
was touching the related FC driver.
This is part of my series to enable visibility into SKB paged fragments'
lifecycles; [0] contains some more background and rationale. Basically,
the completed series will allow entities which inject pages into the
networking stack to receive a notification when the stack has really
finished with those pages (i.e. including retransmissions, clones,
pull-ups etc.) and not just when the original skb is finished with.
This is beneficial to many subsystems which wish to inject pages into
the network stack without giving up full ownership of those pages'
lifecycle. It implements something broadly along the lines of what was
described in [1].
Cheers,
Ian.
[0] http://marc.info/?l=linux-netdev&m=131072801125521&w=2
[1] http://marc.info/?l=linux-netdev&m=130925719513084&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 2/9] IB: amso1100: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: Ian Campbell, Tom Tucker, Steve Wise, Roland Dreier, Sean Hefty,
Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1314260881.10283.48.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
Signed-off-by: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Tom Tucker <tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
drivers/infiniband/hw/amso1100/c2.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/amso1100/c2.c b/drivers/infiniband/hw/amso1100/c2.c
index 444470a..6a8f36e 100644
--- a/drivers/infiniband/hw/amso1100/c2.c
+++ b/drivers/infiniband/hw/amso1100/c2.c
@@ -802,11 +802,9 @@ static int c2_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
maplen = frag->size;
- mapaddr =
- pci_map_page(c2dev->pcidev, frag->page,
- frag->page_offset, maplen,
- PCI_DMA_TODEVICE);
-
+ mapaddr = skb_frag_dma_map(&c2dev->pcidev->dev, frag,
+ 0, maplen,
+ PCI_DMA_TODEVICE);
elem = elem->next;
elem->skb = NULL;
elem->mapaddr = mapaddr;
--
1.7.2.5
* [PATCH 3/9] IB: nes: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: Ian Campbell, Faisal Latif, Roland Dreier, Sean Hefty,
Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1314260881.10283.48.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
Signed-off-by: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
drivers/infiniband/hw/nes/nes_nic.c | 21 +++++++++++----------
1 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index 66e1229..96cb35a 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -441,11 +441,11 @@ static int nes_nic_send(struct sk_buff *skb, struct net_device *netdev)
nesnic->tx_skb[nesnic->sq_head] = skb;
for (skb_fragment_index = 0; skb_fragment_index < skb_shinfo(skb)->nr_frags;
skb_fragment_index++) {
- bus_address = pci_map_page( nesdev->pcidev,
- skb_shinfo(skb)->frags[skb_fragment_index].page,
- skb_shinfo(skb)->frags[skb_fragment_index].page_offset,
- skb_shinfo(skb)->frags[skb_fragment_index].size,
- PCI_DMA_TODEVICE);
+ skb_frag_t *frag =
+ &skb_shinfo(skb)->frags[skb_fragment_index];
+ bus_address = skb_frag_dma_map(&nesdev->pcidev->dev,
+ frag, 0, frag->size,
+ PCI_DMA_TODEVICE);
wqe_fragment_length[wqe_fragment_index] =
cpu_to_le16(skb_shinfo(skb)->frags[skb_fragment_index].size);
set_wqe_64bit_value(nic_sqe->wqe_words, NES_NIC_SQ_WQE_FRAG0_LOW_IDX+(2*wqe_fragment_index),
@@ -561,11 +561,12 @@ tso_sq_no_longer_full:
/* Map all the buffers */
for (tso_frag_count=0; tso_frag_count < skb_shinfo(skb)->nr_frags;
tso_frag_count++) {
- tso_bus_address[tso_frag_count] = pci_map_page( nesdev->pcidev,
- skb_shinfo(skb)->frags[tso_frag_count].page,
- skb_shinfo(skb)->frags[tso_frag_count].page_offset,
- skb_shinfo(skb)->frags[tso_frag_count].size,
- PCI_DMA_TODEVICE);
+ skb_frag_t *frag =
+ &skb_shinfo(skb)->frags[tso_frag_count];
+ tso_bus_address[tso_frag_count] =
+ skb_frag_dma_map(&nesdev->pcidev->dev,
+ frag, 0, frag->size,
+ PCI_DMA_TODEVICE);
}
tso_frag_index = 0;
--
1.7.2.5
* [PATCH 4/9] IPoIB: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: Ian Campbell, Roland Dreier, Sean Hefty, Hal Rosenstock,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1314260881.10283.48.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
Signed-off-by: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 5 +++--
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 39913a0..67a477b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -169,7 +169,7 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
goto partial_error;
skb_fill_page_desc(skb, i, page, 0, PAGE_SIZE);
- mapping[i + 1] = ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[i].page,
+ mapping[i + 1] = ib_dma_map_page(priv->ca, page,
0, PAGE_SIZE, DMA_FROM_DEVICE);
if (unlikely(ib_dma_mapping_error(priv->ca, mapping[i + 1])))
goto partial_error;
@@ -537,7 +537,8 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
if (length == 0) {
/* don't need this page */
- skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE);
+ skb_fill_page_desc(toskb, i, skb_frag_page(frag),
+ 0, PAGE_SIZE);
--skb_shinfo(skb)->nr_frags;
} else {
size = min(length, (unsigned) PAGE_SIZE);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 81ae61d..00435be 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -182,7 +182,7 @@ static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, int id)
goto partial_error;
skb_fill_page_desc(skb, 0, page, 0, PAGE_SIZE);
mapping[1] =
- ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[0].page,
+ ib_dma_map_page(priv->ca, page,
0, PAGE_SIZE, DMA_FROM_DEVICE);
if (unlikely(ib_dma_mapping_error(priv->ca, mapping[1])))
goto partial_error;
@@ -323,7 +323,8 @@ static int ipoib_dma_map_tx(struct ib_device *ca,
for (i = 0; i < skb_shinfo(skb)->nr_frags; ++i) {
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
- mapping[i + off] = ib_dma_map_page(ca, frag->page,
+ mapping[i + off] = ib_dma_map_page(ca,
+ skb_frag_page(frag),
frag->page_offset, frag->size,
DMA_TO_DEVICE);
if (unlikely(ib_dma_mapping_error(ca, mapping[i + off])))
--
1.7.2.5
* [PATCH 5/9] tg3: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Matt Carlson,
Ian Campbell, Michael Chan
In-Reply-To: <1314260881.10283.48.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
Signed-off-by: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Matt Carlson <mcarlson-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Cc: Michael Chan <mchan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
---
drivers/net/ethernet/broadcom/tg3.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 0f81111..a7e28a2 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6311,10 +6311,8 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
len = frag->size;
- mapping = pci_map_page(tp->pdev,
- frag->page,
- frag->page_offset,
- len, PCI_DMA_TODEVICE);
+ mapping = skb_frag_dma_map(&tp->pdev->dev, frag, 0,
+ len, PCI_DMA_TODEVICE);
tnapi->tx_buffers[entry].skb = NULL;
dma_unmap_addr_set(&tnapi->tx_buffers[entry], mapping,
--
1.7.2.5
^ permalink raw reply related
* [PATCH 1/9] atm: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC (permalink / raw)
To: netdev; +Cc: Ian Campbell, Chas Williams, linux-atm-general
In-Reply-To: <1314260881.10283.48.camel@zakaz.uk.xensource.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
--
The original logic here appears to be bogus (adding page-offset to the struct
page * itself doesn't seem likely to be correct) but I left that unchanged for
this mechanical change.
---
drivers/atm/eni.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 9307141..f7ca4c1 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -1134,7 +1134,8 @@ DPRINTK("doing direct send\n"); /* @@@ well, this doesn't work anyway */
skb_headlen(skb));
else
put_dma(tx->index,eni_dev->dma,&j,(unsigned long)
- skb_shinfo(skb)->frags[i].page + skb_shinfo(skb)->frags[i].page_offset,
+ skb_frag_page(&skb_shinfo(skb)->frags[i]) +
+ skb_shinfo(skb)->frags[i].page_offset,
skb_shinfo(skb)->frags[i].size);
}
if (skb->len & 3)
--
1.7.2.5
^ permalink raw reply related
* [PATCH 6/9] bnx2: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC (permalink / raw)
To: netdev; +Cc: Ian Campbell, Michael Chan
In-Reply-To: <1314260881.10283.48.camel@zakaz.uk.xensource.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: netdev@vger.kernel.org
---
drivers/net/ethernet/broadcom/bnx2.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 4a9a8c81..9afb653 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -2930,8 +2930,8 @@ bnx2_reuse_rx_skb_pages(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
shinfo = skb_shinfo(skb);
shinfo->nr_frags--;
- page = shinfo->frags[shinfo->nr_frags].page;
- shinfo->frags[shinfo->nr_frags].page = NULL;
+ page = skb_frag_page(&shinfo->frags[shinfo->nr_frags]);
+ __skb_frag_set_page(&shinfo->frags[shinfo->nr_frags], NULL);
cons_rx_pg->page = page;
dev_kfree_skb(skb);
@@ -6511,8 +6511,8 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
txbd = &txr->tx_desc_ring[ring_prod];
len = frag->size;
- mapping = dma_map_page(&bp->pdev->dev, frag->page, frag->page_offset,
- len, PCI_DMA_TODEVICE);
+ mapping = skb_frag_dma_map(&bp->pdev->dev, frag, 0, len,
+ PCI_DMA_TODEVICE);
if (dma_mapping_error(&bp->pdev->dev, mapping))
goto dma_error;
dma_unmap_addr_set(&txr->tx_buf_ring[ring_prod], mapping,
--
1.7.2.5
^ permalink raw reply related
* [PATCH 7/9] bnx2x: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC (permalink / raw)
To: netdev; +Cc: Ian Campbell, Eilon Greenstein
In-Reply-To: <1314260881.10283.48.camel@zakaz.uk.xensource.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: netdev@vger.kernel.org
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 93bff08..5c3eb17 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2800,9 +2800,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
- mapping = dma_map_page(&bp->pdev->dev, frag->page,
- frag->page_offset, frag->size,
- DMA_TO_DEVICE);
+ mapping = skb_frag_dma_map(&bp->pdev->dev, frag, 0, frag->size,
+ DMA_TO_DEVICE);
if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
DP(NETIF_MSG_TX_QUEUED, "Unable to map page - "
--
1.7.2.5
^ permalink raw reply related
* [PATCH 8/9] bnx2fc: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC (permalink / raw)
To: netdev
Cc: Ian Campbell, Bhanu Prakash Gollapudi, James E.J. Bottomley,
linux-scsi
In-Reply-To: <1314260881.10283.48.camel@zakaz.uk.xensource.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-scsi@vger.kernel.org
Cc: netdev@vger.kernel.org
---
drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index 7cb2cd4..2c780a7 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -302,7 +302,7 @@ static int bnx2fc_xmit(struct fc_lport *lport, struct fc_frame *fp)
return -ENOMEM;
}
frag = &skb_shinfo(skb)->frags[skb_shinfo(skb)->nr_frags - 1];
- cp = kmap_atomic(frag->page, KM_SKB_DATA_SOFTIRQ)
+ cp = kmap_atomic(skb_frag_page(frag), KM_SKB_DATA_SOFTIRQ)
+ frag->page_offset;
} else {
cp = (struct fcoe_crc_eof *)skb_put(skb, tlen);
--
1.7.2.5
^ permalink raw reply related
* [PATCH 9/9] fcoe: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-25 8:28 UTC (permalink / raw)
To: netdev; +Cc: Ian Campbell, Robert Love, James E.J. Bottomley, devel,
linux-scsi
In-Reply-To: <1314260881.10283.48.camel@zakaz.uk.xensource.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: devel@open-fcoe.org
Cc: linux-scsi@vger.kernel.org
Cc: netdev@vger.kernel.org
---
drivers/scsi/fcoe/fcoe.c | 2 +-
drivers/scsi/fcoe/fcoe_transport.c | 5 +++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index ba710e3..3416ab6 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1514,7 +1514,7 @@ int fcoe_xmit(struct fc_lport *lport, struct fc_frame *fp)
return -ENOMEM;
}
frag = &skb_shinfo(skb)->frags[skb_shinfo(skb)->nr_frags - 1];
- cp = kmap_atomic(frag->page, KM_SKB_DATA_SOFTIRQ)
+ cp = kmap_atomic(skb_frag_page(frag), KM_SKB_DATA_SOFTIRQ)
+ frag->page_offset;
} else {
cp = (struct fcoe_crc_eof *)skb_put(skb, tlen);
diff --git a/drivers/scsi/fcoe/fcoe_transport.c b/drivers/scsi/fcoe/fcoe_transport.c
index 41068e8..f6613f9 100644
--- a/drivers/scsi/fcoe/fcoe_transport.c
+++ b/drivers/scsi/fcoe/fcoe_transport.c
@@ -108,8 +108,9 @@ u32 fcoe_fc_crc(struct fc_frame *fp)
len = frag->size;
while (len > 0) {
clen = min(len, PAGE_SIZE - (off & ~PAGE_MASK));
- data = kmap_atomic(frag->page + (off >> PAGE_SHIFT),
- KM_SKB_DATA_SOFTIRQ);
+ data = kmap_atomic(
+ skb_frag_page(frag) + (off >> PAGE_SHIFT),
+ KM_SKB_DATA_SOFTIRQ);
crc = crc32(crc, data + (off & ~PAGE_MASK), clen);
kunmap_atomic(data, KM_SKB_DATA_SOFTIRQ);
off += clen;
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH] tcp: bound RTO to minimum
From: Alexander Zimmermann @ 2011-08-25 8:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Yuchung Cheng, Hagen Paul Pfeifer, netdev, Hannemann Arnd,
Lukowski Damian
In-Reply-To: <1314260805.2387.11.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Am 25.08.2011 um 10:26 schrieb Eric Dumazet:
> Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
>> Hi Eric,
>>
>> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
>
>>> Real question is : do we really want to process ~1000 timer interrupts
>>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
>>> requests, only to make tcp revover in ~1sec when connectivity returns
>>> back. This just doesnt scale.
>>
>> maybe a stupid question, but 1000?. With an minRTO of 200ms and a maximum
>> probing time of 120s, we 600 retransmits in a worst-case-senario
>> (assumed that we get for every rot retransmission an icmp). No?
>
> Where is asserted the "max probing time of 120s" ?
>
> It is not the case on my machine :
> I have way more retransmits than that, even if spaced by 1600 ms
>
> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>
> Old kernels where performing up to 15 retries, doing exponential backoff.
Yes, I know. And in combination with RFC 6069 we have to convert this.
See Section 7.1
and
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6fa12c85031485dff38ce550c24f10da23b0adaa
Is the transformation broken? Damian?
>
> Now its kind of unlimited, according to experimental results.
Ok, unlimited is not what I expect...
>
>
>
//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
^ permalink raw reply
* Re: [PATCH] tcp: bound RTO to minimum
From: Arnd Hannemann @ 2011-08-25 8:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: Alexander Zimmermann, Yuchung Cheng, Hagen Paul Pfeifer, netdev,
Lukowski Damian
In-Reply-To: <1314260805.2387.11.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Hi,
Am 25.08.2011 10:26, schrieb Eric Dumazet:
> Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
>> Hi Eric,
>>
>> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
>
>>> Real question is : do we really want to process ~1000 timer interrupts
>>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
>>> requests, only to make tcp revover in ~1sec when connectivity returns
>>> back. This just doesnt scale.
>>
>> maybe a stupid question, but 1000?. With an minRTO of 200ms and a maximum
>> probing time of 120s, we 600 retransmits in a worst-case-senario
>> (assumed that we get for every rot retransmission an icmp). No?
>
> Where is asserted the "max probing time of 120s" ?
>
> It is not the case on my machine :
> I have way more retransmits than that, even if spaced by 1600 ms
>
> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>
> Old kernels where performing up to 15 retries, doing exponential backoff.
>
> Now its kind of unlimited, according to experimental results.
That shouldn't be. It should stop after the same amount of time as a TCP connection whose
RTO equals the minimum RTO and which performs 15 retries (tcp_retries2=15) with exponential backoff.
So it should be around 900s*. But it could be that, because of the icsk_retransmit wraparound,
this doesn't work as expected.
* 200ms + 400ms + 800ms ...
Best regards,
Arnd
^ permalink raw reply
* Re: [BUG] tcp : how many times a frame can possibly be retransmitted ?
From: Ilpo Järvinen @ 2011-08-25 8:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Jerry Chu, Damian Lukowski
In-Reply-To: <1314226834.6797.5.camel@edumazet-laptop>
On Thu, 25 Aug 2011, Eric Dumazet wrote:
> Le jeudi 25 août 2011 à 01:44 +0300, Ilpo Järvinen a écrit :
> > On Wed, 24 Aug 2011, Eric Dumazet wrote:
> >
> > > On one dev machine running net-next, I just found strange tcp sessions
> > > that retransmit a frame forever (The other peer disappeared)
> > >
> > > # ss -emoi dst 10.2.1.1
> > > State Recv-Q Send-Q Local Address:Port Peer Address:Port
> > > ESTAB 0 816 10.2.1.2:37930 10.2.1.1:ssh timer:(on,630ms,246) ino:60786 sk:ffff8801189aa400
> > > mem:(r0,w3776,f320,t0) ts sack ecn cubic wscale:8,6 rto:1680 rtt:16.25/7.5 ato:40 ssthresh:7 send 1.4Mbps rcv_rtt:10 rcv_space:16632
> > >
> > >
> > > You can see the retransmit count : 246
> > >
> > > What possibly can be going on ?
> > >
> > > What happened to backoff ?
> >
> > But RTO (even without any backoffs) should be lower bounded to some not so
> > zeroish value?
>
> Apparently not.
>
> The only thing that protect us from a flood is that ip_error() uses
> inetpeer cache to ratelimit the icmp_send(ICMP_DEST_UNREACH)
>
> This is why we get retransmit period >= 1 sec
>
> vi +432 net/ipv4/tcp_ipv4.c
>
> icsk->icsk_backoff--;
> inet_csk(sk)->icsk_rto = (tp->srtt ? __tcp_set_rto(tp) :
> TCP_TIMEOUT_INIT) << icsk->icsk_backoff;
> tcp_bound_rto(sk);
>
> and __tcp_set_rto() uses : return (tp->srtt >> 3) + tp->rttvar;
So you think that this is not true?
/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
* guarantees that rto is higher.
*/
...it would still be smaller than 1sec though, but certainly not going to
cause flooding either. Default tcp_rto_min should be 200ms so it's
5pkts+5ICMP sent, received and processed per second, which doesn't sound
like that bad a CPU load?!?
It is unclear to me how tp->rttvar could become smaller than
tcp_rto_min().
--
i.
^ permalink raw reply
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-25 9:00 UTC (permalink / raw)
To: Wu Fengguang
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman,
Jens Axboe, Linux Netdev List
In-Reply-To: <20110824093336.GB5214@localhost>
Am 24.08.2011 11:33, schrieb Wu Fengguang:
> On Wed, Aug 24, 2011 at 05:01:03PM +0800, Stefan Priebe - Profihost AG wrote:
>>
>>>> sync && echo 3 > /proc/sys/vm/drop_caches && sleep 2 && echo 0 > /proc/sys/vm/drop_caches
>>
>> Another way to get it working again is to stop some processes. Could be
>> mysql or apache or php fcgi doesn't matter. Just free some memory.
>> Although there are already 5GB free.
>
> Is it a NUMA machine and _every_ node has enough free pages?
>
> grep . /sys/devices/system/node/node*/vmstat
>
> Thanks,
> Fengguang
Hi Fengguang,
thanks for your fast reply.
Here is the data you requested:
root@server1015-han:~# grep . /sys/devices/system/node/node*/vmstat
/sys/devices/system/node/node0/vmstat:nr_written 5546561
/sys/devices/system/node/node0/vmstat:nr_dirtied 5572497
/sys/devices/system/node/node1/vmstat:nr_written 3936
/sys/devices/system/node/node1/vmstat:nr_dirtied 4190
modified it a little bit:
~# while [ true ]; do ps -eo \
user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd \
| grep scp | grep -v grep; sleep 1; done
root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 64.0 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 0 67.7 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 8 70.6 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 8 76.0 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 8 78.2 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 8 80.0 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 8 80.9 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 2 76.7 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 75.6 42136 1724 0.0 Ds
pipe_read scp -t /tmp/
root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 1 75.2 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 1 76.6 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 1 77.9 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 1 79.0 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 72.8 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 73.0 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 73.8 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 74.3 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 73.4 42136 1724 0.0 Ss
- scp -t /tmp/
root 12409 12409 TS - 0 19 1 71.3 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 71.9 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 0 72.7 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 3 73.5 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 3 74.4 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 3 75.2 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 76.6 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 74.8 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 73.2 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 73.9 42136 1724 0.0 Rs
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 72.4 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 72.0 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 72.5 42136 1724 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 72.9 42136 1724 0.0 Rs
- scp -t /tmp/
root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 0.0 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 23.0 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 49.5 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 2 63.3 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 71.5 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 77.4 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 70.3 42136 1728 0.0 Rs
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 73.1 42136 1728 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12566 12566 TS - 0 19 0 65.7 42136 1728 0.0 Ss
poll_schedule_timeout scp -t /tmp/
root 12566 12566 TS - 0 19 1 61.2 42136 1728 0.0 Ss
- scp -t /tmp/
root 12566 12566 TS - 0 19 1 63.7 42136 1728 0.0 Rs
- scp -t /tmp/
root 12636 12636 TS - 0 19 8 0.0 42136 1728 0.0 Ss
poll_schedule_timeout scp -t /tmp/
Stefan
^ permalink raw reply
* Re: [PATCH] tcp: bound RTO to minimum
From: Eric Dumazet @ 2011-08-25 9:09 UTC (permalink / raw)
To: Arnd Hannemann
Cc: Alexander Zimmermann, Yuchung Cheng, Hagen Paul Pfeifer, netdev,
Lukowski Damian
In-Reply-To: <4E560BFD.5020301@arndnet.de>
Le jeudi 25 août 2011 à 10:46 +0200, Arnd Hannemann a écrit :
> Hi,
>
> Am 25.08.2011 10:26, schrieb Eric Dumazet:
> > Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
> >> Hi Eric,
> >>
> >> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
> >
> >>> Real question is : do we really want to process ~1000 timer interrupts
> >>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
> >>> requests, only to make tcp revover in ~1sec when connectivity returns
> >>> back. This just doesnt scale.
> >>
> >> maybe a stupid question, but 1000?. With an minRTO of 200ms and a maximum
> >> probing time of 120s, we 600 retransmits in a worst-case-senario
> >> (assumed that we get for every rot retransmission an icmp). No?
> >
> > Where is asserted the "max probing time of 120s" ?
> >
> > It is not the case on my machine :
> > I have way more retransmits than that, even if spaced by 1600 ms
> >
> > 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
> > 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
> > 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
> >
> > Old kernels where performing up to 15 retries, doing exponential backoff.
> >
> > Now its kind of unlimited, according to experimental results.
>
> That shouldn't be. It should stop after the same time a TCP connection with an
> RTO of Minimum RTO which is doing 15 retries (tcp_retries2=15) and doing exponential backoff.
> So it should be around 900s*. But it could be that because of the icsk_retransmit wrapover
> this doesn't work as expected.
>
> * 200ms + 400ms + 800ms ...
It is 924 seconds with retries2=15 (the default value).
I said ~1000 probes.
If ICMP messages are not rate limited, that could be about 924*5 probes, instead
of 15 probes on old kernels.
Maybe we should refine the thing a bit, to not reverse backoff unless
rto is > some_threshold.
Say 10s being the value, that would give at most 92 tries.
I mean, what is the gain in being able to restart a frozen TCP session with
a 1sec latency instead of 10s if it was blocked for more than 60 seconds?
^ permalink raw reply
* When set mtu 9600 by gfar_change_mtu, the maxfrm register is greater than 9600
From: Rongqing Li @ 2011-08-25 9:24 UTC (permalink / raw)
To: linuxppc-dev; +Cc: netdev
Hi:
When the MTU is set to 9600 by gfar_change_mtu(), the maxfrm register is
set to 9728, which is greater than 9600, in gianfar.c.
But the MPC8315 Reference Manual says the value of maxfrm cannot be
greater than 9600.
Is this a defect? Do we need to fix it?
--
Best Regards,
Roy | RongQing Li
^ permalink raw reply
* Re: Use of 802.3ad bonding for increasing link throughput
From: Simon Horman @ 2011-08-25 9:35 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Tom Brown, netdev
In-Reply-To: <5344.1312998372@death>
On Wed, Aug 10, 2011 at 10:46:12AM -0700, Jay Vosburgh wrote:
[snip]
> On linux, the tcp_reordering sysctl value can be raised to
> compensate, but it will still result in increased packet overhead, and
> is not likely to be very efficient, and doesn't help with anything
> that's not TCP/IP. I have not tested balance-rr in a few years now, but
> my recollection is that, as a best case, throughput of one TCP
> connection could reach about 1.5x with 2 slaves, or about 2.5x with 4
> slaves (where the multipliers are in units of "bandwidth of one slave").
Hi Jay,
for what it is worth I would like to chip in with the results of some
testing I did using balance-rr and 3 gigabit NICs late last year. The
link was three direct ("cross-over") cables to a machine that was also
using balance-rr.
I found that by increasing rx-usecs (from 3 to 45) and enabling GRO
and TSO I was able to push 2.7*10^9 bits/s.
Local CPU utilisation was 30% and remote CPU utilisation was 10%.
Local service demand was 1.7 us/KB and remote service demand was 2.2us/KB.
The MTU was 1500 bytes.
In this configuration, with the tuning options described above, increasing
tcp_reordering (to 127) did not have a noticeable effect on throughput but
did increase local CPU utilisation to about 50% and local service demand to
3.0 us/KB. There was also increased remote CPU utilisation and service
demand, although not as significant.
By using a 9000 byte MTU I was able to get close to 3*10^9 bits/s
with other parameters at their default values.
Local CPU utilisation was 15% and remote CPU utilisation was 5%.
Local service demand was 0.8us/KB and remote service demand was 1.1us/KB.
Increasing rx-usecs was suggested to me by Eric Dumazet on this list.
I no longer have access to the systems that I used to run these tests but I
do have other results that I have omitted from this email for the sake of
brevity.
Anecdotally my opinion after running these and other tests is that if you
want to push more than a gigabit/s over a single TCP stream then you would
be well advised to get a faster link rather than bond gigabit devices. I
believe you stated something similar earlier on in this thread.
^ permalink raw reply