* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: Mark Lord @ 2011-11-15 15:19 UTC (permalink / raw)
To: David Miller; +Cc: grundler, netdev, linux-kernel, allan, freddy
In-Reply-To: <20111114.214542.1423779515286773837.davem@davemloft.net>
On 11-11-14 09:45 PM, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 14 Nov 2011 21:41:51 -0500 (EST)
>
>> Come on man... are you kidding me?
>
> Want to know what really pisses me off about this?
>
> All of Mark Lord's hard work to bring the entire vendor driver over
> was thrown out.
Well, ASIX and I appear to be back on track again.
So once the dust settles in net-dev with Grant's patches,
I will take over development of the asix driver,
and start sending you (Dave) patches to merge the
rest of the vendor's driver code.
With luck, it might all make it in there in time for the next (3.3) merge.
Cheers
* [PATCH] iproute2: Display closed UDP sockets on 'ss -ul'
From: Petr Šabata @ 2011-11-15 15:19 UTC (permalink / raw)
To: netdev; +Cc: Petr Šabata
This patch emulates 'netstat -ul' behavior, showing 'closed'
(state 07) UDP sockets when ss is called with '-ul' options.
Although dirty, this seems like the least invasive way to fix
it and shouldn't really break anything.
Signed-off-by: Petr Šabata <contyk@redhat.com>
---
misc/ss.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/misc/ss.c b/misc/ss.c
index 1353620..af774d1 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2568,7 +2568,7 @@ int main(int argc, char *argv[])
current_filter.states = SS_ALL;
break;
case 'l':
- current_filter.states = (1<<SS_LISTEN);
+ current_filter.states = (1<<SS_LISTEN) | (1<<SS_CLOSE);
break;
case '4':
preferred_family = AF_INET;
--
1.7.7.1
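
The one-line change above works because the state filter is a bitmask with one bit per socket state. A minimal sketch of that test follows, assuming the state enum in misc/ss.c follows the kernel's TCP state numbering (so SS_CLOSE is 7, the "state 07" mentioned in the description). The helper state_matches() and the two mask names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Assumed to mirror the socket-state enum in misc/ss.c, which follows
 * the kernel's TCP state numbering (SS_CLOSE == 7). */
enum {
	SS_UNKNOWN,
	SS_ESTABLISHED,
	SS_SYN_SENT,
	SS_SYN_RECV,
	SS_FIN_WAIT1,
	SS_FIN_WAIT2,
	SS_TIME_WAIT,
	SS_CLOSE,
	SS_CLOSE_WAIT,
	SS_LAST_ACK,
	SS_LISTEN,
	SS_CLOSING,
	SS_MAX
};

/* Hypothetical helper: true if a socket in `state` passes the filter. */
static bool state_matches(unsigned int states, int state)
{
	return (states & (1u << state)) != 0;
}

/* Filter mask selected by '-l' before and after the patch. */
static const unsigned int listen_mask_old = 1u << SS_LISTEN;
static const unsigned int listen_mask_new =
	(1u << SS_LISTEN) | (1u << SS_CLOSE);
```

With the old mask, state_matches(listen_mask_old, SS_CLOSE) is false, so a closed UDP socket is filtered out; with the new mask it is true, while established sockets are still excluded.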
* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Sasha Levin @ 2011-11-15 15:30 UTC (permalink / raw)
To: Krishna Kumar2
Cc: Asias He, gorcunov, kvm, mingo, Michael S. Tsirkin, netdev,
penberg, Rusty Russell, virtualization
In-Reply-To: <OFDA747DDD.8D1C8FD8-ON65257949.001837DA-65257949.0019D25F@in.ibm.com>
On Tue, 2011-11-15 at 10:14 +0530, Krishna Kumar2 wrote:
> Sasha Levin <levinsasha928@gmail.com> wrote on 11/14/2011 03:45:40 PM:
>
> > > Why both the bandwidth and latency performance are dropping so
> > > dramatically with multiple VQ?
> >
> > It looks like there's no hash sync between host and guest, which makes
> > the RX VQ change for every packet. This is my guess.
>
> Yes, I confirmed this happens for macvtap. I am
> using ixgbe - it calls skb_record_rx_queue when
> a skb is allocated, but sets rxhash when a packet
> arrives. Macvtap is relying on record_rx_queue
> first ahead of rxhash (as part of my patch making
> macvtap multiqueue), hence different skbs result
> in macvtap selecting different vq's.
I'm seeing this behavior in non-macvtap setups as well (simple
tap <-> virtio-net).
--
Sasha.
* Re: [PATCH] net: fsl_pq_mdio: fix non tbi phy access
From: Baruch Siach @ 2011-11-15 15:44 UTC (permalink / raw)
To: Andy Fleming; +Cc: netdev@vger.kernel.org, linuxppc-dev
In-Reply-To: <74631EEB-F6F8-4969-AD05-81DEAFB0EAB4@freescale.com>
Hi Andy,
On Tue, Nov 15, 2011 at 09:06:03AM -0600, Andy Fleming wrote:
> On Nov 14, 2011, at 11:17 PM, Baruch Siach wrote:
> > On Mon, Nov 14, 2011 at 09:04:47PM +0000, Fleming Andy-AFLEMING wrote:
[snip]
> >> And looking at the p1010si.dtsi, I see that it's automatically there for
> >> you.
> >>
> >> How were you breaking?
> >
> > Adding linuxppc to Cc.
> >
> > My board is P1011 based, the single core version of P1020, not P1010. In
> > p1020si.dtsi I see no tbi node. In p1020rdb.dts I see a tbi node but only for
> > mdio@25000, not mdio@24000, which is what I'm using.
> >
> > Am I missing something?
>
> Well, that's a bug. In truth, the silicon dtsi trees should not have tbi
> nodes, as that's highly machine-specific. The p1020rdb is apparently relying
> on the old behavior, which is broken, and due to the fact that the first
> ethernet interface doesn't *use* the TBI PHY.
>
> You should add this to your board tree:
>
> mdio@24000 {
>
> tbi0: tbi-phy@11 {
> reg = <0x11>;
> device_type = "tbi-phy";
> };
> };
>
> And add the PHYs you use, as well as set reg (and the value after the "@")
> to something that makes sense for your board.
Thanks for your detailed explanation and prompt response. I've added a tbi
node, dropped my patch, and now my board works as expected.
> I am going to go right now, and add tbi nodes for all of the Freescale
> platforms. I will also modify the fsl_pq_mdio code to be more explicit about
> its reason for failure.
Please Cc me on these.
Thanks,
baruch
--
~. .~ Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
- baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -
* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: Grant Grundler @ 2011-11-15 15:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel, allan, freddy, kernel
In-Reply-To: <20111114.214542.1423779515286773837.davem@davemloft.net>
On Mon, Nov 14, 2011 at 6:45 PM, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 14 Nov 2011 21:41:51 -0500 (EST)
>
>> Come on man... are you kidding me?
Dave,
my apologies. That's obviously my fail.
The problem is I can't test your git tree on my systems...but I should
have at least compile tested it. Or just submitted the changes
straight from chromium.org tree. *sigh*
> Want to know what really pisses me off about this?
>
> All of Mark Lord's hard work to bring the entire vendor driver over
> was thrown out.
Not entirely correct as Mark pointed out. I was able to convince ASIX
they should be working with Mark and they committed to doing so.
> And it was thrown out in favor of this! Code that doesn't even
> compile.
*sigh* sorry...I'll resubmit the entire mess and compile test first. /o\
thanks for your patience,
grant
* Re: [PATCH] r8169: add module param for control of ASPM disable
From: Matthew Garrett @ 2011-11-15 16:32 UTC (permalink / raw)
To: Todd Broch
Cc: Francois Romieu, Realtek linux nic maintainers, netdev,
Hayes Wang
In-Reply-To: <CA+iF6Rog3ptpmQZzhcRODmZUKN18_uw5t9xfpQjbJ86qKUA0eQ@mail.gmail.com>
On Tue, Nov 15, 2011 at 08:27:41AM -0800, Todd Broch wrote:
> On Sat, Nov 12, 2011 at 2:46 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> >
> > Re-visiting the original change that disabled ASPM,
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba04c7c93bbcb48ce880cf75b6e9dffcd79d4c7b
>
> Led me to,
> https://bugzilla.redhat.com/show_bug.cgi?id=642861#c4
>
> This comment by tomi.leppikangas@ is later recanted as a h/w issue in
> https://bugzilla.redhat.com/show_bug.cgi?id=642861#c9
> 'I am now pretty sure that my problems were caused by faulty hardware.
> Cpu or motherboard seems to be broken, so pcie_aspm=off didnt help for me.
> Sorry about misleading info.'
Mike Khusid's issue was fixed by disabling ASPM.
> My assessment from the above is that ASPM was disabled prematurely and,
> given the power savings, should be re-enabled.
Power savings are great. I'm all in favour of power savings. But not
when they break otherwise working setups.
> I'd certainly be agreeable to switching the patch so that the default is
> disabled.
> Unfortunately I fear that means most will never benefit from the power
> savings.
I'd recommend working with your hardware partners to figure out which
parts are expected to work and which aren't. There's no problem with
making this code conditional on product ID or version.
--
Matthew Garrett | mjg59@srcf.ucam.org
* [PATCH] bonding: Don't allow mode change via sysfs with slaves present
From: Veaceslav Falico @ 2011-11-15 16:44 UTC (permalink / raw)
To: netdev; +Cc: Andy Gospodarek, Jay Vosburgh
When the mode is changed via bonding's sysfs, the slaves are not initialized
correctly. Forbid changing the mode while slaves are present, to ensure that
every slave is initialized correctly via bond_enslave().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
drivers/net/bonding/bond_sysfs.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 5a20804..4ef7e2f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -319,6 +319,13 @@ static ssize_t bonding_store_mode(struct device *d,
goto out;
}
+ if (bond->slave_cnt > 0) {
+ pr_err("unable to update mode of %s because it has slaves.\n",
+ bond->dev->name);
+ ret = -EPERM;
+ goto out;
+ }
+
new_value = bond_parse_parm(buf, bond_mode_tbl);
if (new_value < 0) {
pr_err("%s: Ignoring invalid mode value %.*s.\n",
--
1.7.6.4
* Re: [PATCH] bonding: Don't allow mode change via sysfs with slaves present
From: Andy Gospodarek @ 2011-11-15 17:00 UTC (permalink / raw)
To: Veaceslav Falico; +Cc: netdev, Andy Gospodarek, Jay Vosburgh
In-Reply-To: <1321375482-8637-1-git-send-email-vfalico@redhat.com>
On Tue, Nov 15, 2011 at 05:44:42PM +0100, Veaceslav Falico wrote:
> When the mode is changed via bonding's sysfs, the slaves are not initialized
> correctly. Forbid changing the mode while slaves are present, to ensure that
> every slave is initialized correctly via bond_enslave().
>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Looks good. This behavior forces someone who wants to change the mode to
go through steps that are almost as destructive as when module options
are used to configure the mode. I do not see a problem with this.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
* [PATCH 1/5] net-next:asix:PHY_MODE_RTL8211CL should be 0xC
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler
From: Grant Grundler <grundler@google.com>
Use correct value for rtl phy support.
(rtl phy are in AX88178 devices like NWU220G and USB2-ET1000).
Signed-off-by: Allan Chou <allan@asix.com.tw>
Tested-by: Grant Grundler <grundler@chromium.org>
---
drivers/net/usb/asix.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index e81e22e..873860d 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -163,7 +163,7 @@
#define MARVELL_CTRL_TXDELAY 0x0002
#define MARVELL_CTRL_RXDELAY 0x0080
-#define PHY_MODE_RTL8211CL 0x0004
+#define PHY_MODE_RTL8211CL 0x000C
/* This structure cannot exceed sizeof(unsigned long [5]) AKA 20 bytes */
struct asix_data {
--
1.7.3.1
* [PATCH 3/5] net-next:asix: reduce AX88772 init time by about 2 seconds
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>
From: Grant Grundler <grundler@google.com>
ax88772_reset takes about 2 seconds and is called twice.
Once from ax88772_bind() directly and again indirectly from usbnet_open().
Reset the USB FW/Phy enough to blink the LEDs when inserted.
Signed-off-by: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
drivers/net/usb/asix.c | 30 +++++++++++++++++++++++++-----
1 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index b4675e8..8462be5 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -1083,7 +1083,7 @@ static const struct net_device_ops ax88772_netdev_ops = {
static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
{
- int ret;
+ int ret, embd_phy;
struct asix_data *data = (struct asix_data *)&dev->data;
u8 buf[ETH_ALEN];
u32 phyid;
@@ -1108,16 +1108,36 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
dev->mii.reg_num_mask = 0x1f;
dev->mii.phy_id = asix_get_phy_addr(dev);
- phyid = asix_get_phyid(dev);
- dbg("PHYID=0x%08x", phyid);
-
dev->net->netdev_ops = &ax88772_netdev_ops;
dev->net->ethtool_ops = &ax88772_ethtool_ops;
- ret = ax88772_reset(dev);
+ embd_phy = ((dev->mii.phy_id & 0x1f) == 0x10 ? 1 : 0);
+
+ /* Reset the PHY to normal operation mode */
+ ret = asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, embd_phy, 0, 0, NULL);
+ if (ret < 0) {
+ dbg("Select PHY #1 failed: %d", ret);
+ return ret;
+ }
+
+ ret = asix_sw_reset(dev, AX_SWRESET_IPPD | AX_SWRESET_PRL);
if (ret < 0)
return ret;
+ msleep(150);
+
+ ret = asix_sw_reset(dev, AX_SWRESET_CLEAR);
+ if (ret < 0)
+ return ret;
+
+ msleep(150);
+
+ ret = asix_sw_reset(dev, embd_phy ? AX_SWRESET_IPRL : AX_SWRESET_PRTE);
+
+ /* Read PHYID register *AFTER* the PHY was reset properly */
+ phyid = asix_get_phyid(dev);
+ dbg("PHYID=0x%08x", phyid);
+
/* Asix framing packs multiple eth frames into a 2K usb bulk transfer */
if (dev->driver_info->flags & FLAG_FRAMING_AX) {
/* hard_mtu is still the default - the device does not support
--
1.7.3.1
* [PATCH 4/5] net-next:asix: V2 more fixes for ax88178 phy init sequence
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>
From: Grant Grundler <grundler@google.com>
Now works on Samsung Series 5 (chromebook)
Two fixes here:
o use 0x7F mask for phymode
o read phyid *AFTER* phy is powered up (via GPIOs)
Signed-off-by: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
Dave,
Apologies again for botching this patch (not compiling).
I had failed to s/ax88178_reset/asix_sw_reset/ and gave a blend of the two.
I've reviewed and compile tested all 5 patches.
drivers/net/usb/asix.c | 22 +++++++++++++++-------
1 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 8462be5..f870ab9 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -1248,6 +1248,7 @@ static int ax88178_reset(struct usbnet *dev)
__le16 eeprom;
u8 status;
int gpio0 = 0;
+ u32 phyid;
asix_read_cmd(dev, AX_CMD_READ_GPIOS, 0, 0, 1, &status);
dbg("GPIO Status: 0x%04x", status);
@@ -1263,12 +1264,13 @@ static int ax88178_reset(struct usbnet *dev)
data->ledmode = 0;
gpio0 = 1;
} else {
- data->phymode = le16_to_cpu(eeprom) & 7;
+ data->phymode = le16_to_cpu(eeprom) & 0x7F;
data->ledmode = le16_to_cpu(eeprom) >> 8;
gpio0 = (le16_to_cpu(eeprom) & 0x80) ? 0 : 1;
}
dbg("GPIO0: %d, PhyMode: %d", gpio0, data->phymode);
+ /* Power up external GigaPHY through AX88178 GPIO pin */
asix_write_gpio(dev, AX_GPIO_RSE | AX_GPIO_GPO_1 | AX_GPIO_GPO1EN, 40);
if ((le16_to_cpu(eeprom) >> 8) != 1) {
asix_write_gpio(dev, 0x003c, 30);
@@ -1280,6 +1282,13 @@ static int ax88178_reset(struct usbnet *dev)
asix_write_gpio(dev, AX_GPIO_GPO1EN | AX_GPIO_GPO_1, 30);
}
+ /* Read PHYID register *AFTER* powering up PHY */
+ phyid = asix_get_phyid(dev);
+ dbg("PHYID=0x%08x", phyid);
+
+ /* Set AX88178 to enable MII/GMII/RGMII interface for external PHY */
+ asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, 0, 0, 0, NULL);
+
asix_sw_reset(dev, 0);
msleep(150);
@@ -1424,7 +1433,6 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
{
int ret;
u8 buf[ETH_ALEN];
- u32 phyid;
struct asix_data *data = (struct asix_data *)&dev->data;
data->eeprom_len = AX88772_EEPROM_LEN;
@@ -1451,12 +1459,12 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
dev->net->netdev_ops = &ax88178_netdev_ops;
dev->net->ethtool_ops = &ax88178_ethtool_ops;
- phyid = asix_get_phyid(dev);
- dbg("PHYID=0x%08x", phyid);
+ /* Blink LEDS so users know driver saw dongle */
+ asix_sw_reset(dev, 0);
+ msleep(150);
- ret = ax88178_reset(dev);
- if (ret < 0)
- return ret;
+ asix_sw_reset(dev, AX_SWRESET_PRL | AX_SWRESET_IPPD);
+ msleep(150);
/* Asix framing packs multiple eth frames into a 2K usb bulk transfer */
if (dev->driver_info->flags & FLAG_FRAMING_AX) {
--
1.7.3.1
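
To illustrate the phymode mask fix in this patch, here is a minimal sketch of the field extraction done in ax88178_reset(), written as standalone C. The helper names are hypothetical, and the EEPROM word 0x010C used in the note below is a made-up example rather than a value read from real hardware.

```c
#include <stdint.h>

#define PHY_MODE_RTL8211CL 0x000C /* value after patch 1/5 */

/* Hypothetical helpers mirroring the expressions in ax88178_reset(). */

/* Old extraction: keeps only the low 3 bits. */
static unsigned int phymode_old(uint16_t eeprom)
{
	return eeprom & 7;
}

/* New extraction: keeps the low 7 bits. */
static unsigned int phymode_new(uint16_t eeprom)
{
	return eeprom & 0x7F;
}

/* The LED mode lives in the high byte. */
static unsigned int ledmode(uint16_t eeprom)
{
	return eeprom >> 8;
}

/* Bit 7 selects GPIO0; it lies outside the 0x7F phymode mask. */
static int gpio0(uint16_t eeprom)
{
	return (eeprom & 0x80) ? 0 : 1;
}
```

For the example word 0x010C, phymode_old() yields 4 because 0xC & 7 drops bit 3, whereas phymode_new() yields 0xC and so compares equal to the updated PHY_MODE_RTL8211CL. The 0x7F mask still leaves bit 7 to gpio0().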
* [PATCH 5/5] net-next:asix: V2 Update VERSION
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>
From: Grant Grundler <grundler@google.com>
Only update VERSION to reflect previous changes.
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
drivers/net/usb/asix.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index f870ab9..e6fed4d 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -36,7 +36,7 @@
#include <linux/usb/usbnet.h>
#include <linux/slab.h>
-#define DRIVER_VERSION "26-Sep-2011"
+#define DRIVER_VERSION "08-Nov-2011"
#define DRIVER_NAME "asix"
/* ASIX AX8817X based USB 2.0 Ethernet Devices */
--
1.7.3.1
* [PATCH 2/5] net-next:asix:poll in asix_get_phyid in case phy not ready
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>
From: Grant Grundler <grundler@google.com>
Sometimes the phy isn't ready after reset...poll and pray it will be soon.
Signed-off-by: Freddy Xin <freddy@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
drivers/net/usb/asix.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 873860d..b4675e8 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -652,9 +652,17 @@ static u32 asix_get_phyid(struct usbnet *dev)
{
int phy_reg;
u32 phy_id;
+ int i;
- phy_reg = asix_mdio_read(dev->net, dev->mii.phy_id, MII_PHYSID1);
- if (phy_reg < 0)
+ /* Poll for the rare case the FW or phy isn't ready yet. */
+ for (i = 0; i < 100; i++) {
+ phy_reg = asix_mdio_read(dev->net, dev->mii.phy_id, MII_PHYSID1);
+ if (phy_reg != 0 && phy_reg != 0xFFFF)
+ break;
+ mdelay(1);
+ }
+
+ if (phy_reg <= 0 || phy_reg == 0xFFFF)
return 0;
phy_id = (phy_reg & 0xffff) << 16;
--
1.7.3.1
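
The bounded polling idea in this patch can be sketched outside the driver as follows. This is an illustration under stated assumptions, not the driver code: phy_read_fn, poll_phy_reg() and fake_phy_read() are hypothetical names, the fake PHY returns an arbitrary ID value, and the delay between attempts is left as a comment. Unlike the patch, this sketch also keeps polling when the read returns a negative error.

```c
#include <stdint.h>

/* Hypothetical register reader: returns the 16-bit register value,
 * or a negative value on failure. */
typedef int (*phy_read_fn)(void *ctx);

/* Poll up to max_tries times. Returns the register value once it is
 * neither 0 nor 0xFFFF, or 0 if the PHY never answers sensibly. */
static uint16_t poll_phy_reg(phy_read_fn read_reg, void *ctx, int max_tries)
{
	int i, val;

	for (i = 0; i < max_tries; i++) {
		val = read_reg(ctx);
		if (val > 0 && val != 0xFFFF)
			return (uint16_t)val;
		/* a driver would delay here, e.g. mdelay(1) in the patch */
	}
	return 0;
}

/* Fake PHY for illustration: reads back 0xFFFF until the counter
 * pointed to by ctx reaches zero, then returns an arbitrary ID. */
static int fake_phy_read(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0xFFFF;
	}
	return 0x001C;
}
```

Calling poll_phy_reg(fake_phy_read, &n, 100) with n = 3 returns the ID on the fourth read; with n larger than 100 the loop gives up and returns 0, which is the same "no PHY ID" result the patch reports.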
* Re: [patch net-next V8] net: introduce ethernet teaming device
From: Rick Jones @ 2011-11-15 17:22 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Jiri Pirko, netdev, davem, eric.dumazet, bhutchings, shemminger,
fubar, tgraf, ebiederm, mirqus, kaber, greearb, jesse, fbl,
benjamin.poirier, jzupka, ivecera
In-Reply-To: <20111115015616.GA25132@gospo.rdu.redhat.com>
> On most modern systems I suspect there will be little to no difference
> between bonding RX performance and team performance.
>
> If there is any now, I suspect team and bond performance to be similar
> by the time team has to account for the corner-cases bonding has already
> resolved. :-)
>
> Benchmarks may prove otherwise, but I've yet to see Jiri produce
> anything. My initial testing doesn't demonstrate any measurable
> differences with 1Gbps interfaces on a multi-core, multi-socket system.
I wouldn't expect much difference in terms of bandwidth, I was thinking
the demonstration would be made in the area of service demand (CPU
consumed per unit work) and perhaps aggregate packets per second.
happy benchmarking,
rick jones
* [PATCH net-next] bnx2: switch to build_skb() infrastructure
From: Eric Dumazet @ 2011-11-15 17:30 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Michael Chan, Eilon Greenstein
This is very similar to the bnx2x conversion, but bnx2 only requires 16-byte
alignment at the start of the received frame to store its l2_fhdr, so the goal
was not to reduce skb truesize (in fact, it should not change after this
patch).
Using build_skb() reduces cache line misses in the driver, since we
use cache-hot skbs instead of cold ones. The number of in-flight sk_buff
structures is lower, and they are more likely to be recycled in SLUB caches
while still hot.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Eilon Greenstein <eilong@broadcom.com>
---
Tested with SLUB/SLAB/SLOB on my dev machine
drivers/net/ethernet/broadcom/bnx2.c | 137 ++++++++++++-------------
drivers/net/ethernet/broadcom/bnx2.h | 17 ++-
2 files changed, 85 insertions(+), 69 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 32d1f92..8556077 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -2734,31 +2734,27 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
}
static inline int
-bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
+bnx2_alloc_rx_data(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
{
- struct sk_buff *skb;
+ u8 *data;
struct sw_bd *rx_buf = &rxr->rx_buf_ring[index];
dma_addr_t mapping;
struct rx_bd *rxbd = &rxr->rx_desc_ring[RX_RING(index)][RX_IDX(index)];
- unsigned long align;
- skb = __netdev_alloc_skb(bp->dev, bp->rx_buf_size, gfp);
- if (skb == NULL) {
+ data = kmalloc(bp->rx_buf_size, gfp);
+ if (!data)
return -ENOMEM;
- }
- if (unlikely((align = (unsigned long) skb->data & (BNX2_RX_ALIGN - 1))))
- skb_reserve(skb, BNX2_RX_ALIGN - align);
-
- mapping = dma_map_single(&bp->pdev->dev, skb->data, bp->rx_buf_use_size,
+ mapping = dma_map_single(&bp->pdev->dev,
+ get_l2_fhdr(data),
+ bp->rx_buf_use_size,
PCI_DMA_FROMDEVICE);
if (dma_mapping_error(&bp->pdev->dev, mapping)) {
- dev_kfree_skb(skb);
+ kfree(data);
return -EIO;
}
- rx_buf->skb = skb;
- rx_buf->desc = (struct l2_fhdr *) skb->data;
+ rx_buf->data = data;
dma_unmap_addr_set(rx_buf, mapping, mapping);
rxbd->rx_bd_haddr_hi = (u64) mapping >> 32;
@@ -2965,8 +2961,8 @@ bnx2_reuse_rx_skb_pages(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
}
static inline void
-bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
- struct sk_buff *skb, u16 cons, u16 prod)
+bnx2_reuse_rx_data(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
+ u8 *data, u16 cons, u16 prod)
{
struct sw_bd *cons_rx_buf, *prod_rx_buf;
struct rx_bd *cons_bd, *prod_bd;
@@ -2980,8 +2976,7 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
rxr->rx_prod_bseq += bp->rx_buf_use_size;
- prod_rx_buf->skb = skb;
- prod_rx_buf->desc = (struct l2_fhdr *) skb->data;
+ prod_rx_buf->data = data;
if (cons == prod)
return;
@@ -2995,33 +2990,39 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
prod_bd->rx_bd_haddr_lo = cons_bd->rx_bd_haddr_lo;
}
-static int
-bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
+static struct sk_buff *
+bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u8 *data,
unsigned int len, unsigned int hdr_len, dma_addr_t dma_addr,
u32 ring_idx)
{
int err;
u16 prod = ring_idx & 0xffff;
+ struct sk_buff *skb;
- err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_ATOMIC);
+ err = bnx2_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
if (unlikely(err)) {
- bnx2_reuse_rx_skb(bp, rxr, skb, (u16) (ring_idx >> 16), prod);
+ bnx2_reuse_rx_data(bp, rxr, data, (u16) (ring_idx >> 16), prod);
+error:
if (hdr_len) {
unsigned int raw_len = len + 4;
int pages = PAGE_ALIGN(raw_len - hdr_len) >> PAGE_SHIFT;
bnx2_reuse_rx_skb_pages(bp, rxr, NULL, pages);
}
- return err;
+ return NULL;
}
- skb_reserve(skb, BNX2_RX_OFFSET);
dma_unmap_single(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
PCI_DMA_FROMDEVICE);
-
+ skb = build_skb(data);
+ if (!skb) {
+ kfree(data);
+ goto error;
+ }
+ skb_reserve(skb, ((u8 *)get_l2_fhdr(data) - data) + BNX2_RX_OFFSET);
if (hdr_len == 0) {
skb_put(skb, len);
- return 0;
+ return skb;
} else {
unsigned int i, frag_len, frag_size, pages;
struct sw_pg *rx_pg;
@@ -3052,7 +3053,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
skb_frag_size_sub(frag, tail);
skb->data_len -= tail;
}
- return 0;
+ return skb;
}
rx_pg = &rxr->rx_pg_ring[pg_cons];
@@ -3074,7 +3075,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
rxr->rx_pg_prod = pg_prod;
bnx2_reuse_rx_skb_pages(bp, rxr, skb,
pages - i);
- return err;
+ return NULL;
}
dma_unmap_page(&bp->pdev->dev, mapping_old,
@@ -3091,7 +3092,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
rxr->rx_pg_prod = pg_prod;
rxr->rx_pg_cons = pg_cons;
}
- return 0;
+ return skb;
}
static inline u16
@@ -3130,19 +3131,17 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
struct sw_bd *rx_buf, *next_rx_buf;
struct sk_buff *skb;
dma_addr_t dma_addr;
+ u8 *data;
sw_ring_cons = RX_RING_IDX(sw_cons);
sw_ring_prod = RX_RING_IDX(sw_prod);
rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
- skb = rx_buf->skb;
- prefetchw(skb);
+ data = rx_buf->data;
+ rx_buf->data = NULL;
- next_rx_buf =
- &rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
- prefetch(next_rx_buf->desc);
-
- rx_buf->skb = NULL;
+ rx_hdr = get_l2_fhdr(data);
+ prefetch(rx_hdr);
dma_addr = dma_unmap_addr(rx_buf, mapping);
@@ -3150,7 +3149,10 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH,
PCI_DMA_FROMDEVICE);
- rx_hdr = rx_buf->desc;
+ next_rx_buf =
+ &rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
+ prefetch(get_l2_fhdr(next_rx_buf->data));
+
len = rx_hdr->l2_fhdr_pkt_len;
status = rx_hdr->l2_fhdr_status;
@@ -3169,7 +3171,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
L2_FHDR_ERRORS_TOO_SHORT |
L2_FHDR_ERRORS_GIANT_FRAME))) {
- bnx2_reuse_rx_skb(bp, rxr, skb, sw_ring_cons,
+ bnx2_reuse_rx_data(bp, rxr, data, sw_ring_cons,
sw_ring_prod);
if (pg_ring_used) {
int pages;
@@ -3184,30 +3186,29 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
len -= 4;
if (len <= bp->rx_copy_thresh) {
- struct sk_buff *new_skb;
-
- new_skb = netdev_alloc_skb(bp->dev, len + 6);
- if (new_skb == NULL) {
- bnx2_reuse_rx_skb(bp, rxr, skb, sw_ring_cons,
+ skb = netdev_alloc_skb(bp->dev, len + 6);
+ if (skb == NULL) {
+ bnx2_reuse_rx_data(bp, rxr, data, sw_ring_cons,
sw_ring_prod);
goto next_rx;
}
/* aligned copy */
- skb_copy_from_linear_data_offset(skb,
- BNX2_RX_OFFSET - 6,
- new_skb->data, len + 6);
- skb_reserve(new_skb, 6);
- skb_put(new_skb, len);
+ memcpy(skb->data,
+ (u8 *)rx_hdr + BNX2_RX_OFFSET - 6,
+ len + 6);
+ skb_reserve(skb, 6);
+ skb_put(skb, len);
- bnx2_reuse_rx_skb(bp, rxr, skb,
+ bnx2_reuse_rx_data(bp, rxr, data,
sw_ring_cons, sw_ring_prod);
- skb = new_skb;
- } else if (unlikely(bnx2_rx_skb(bp, rxr, skb, len, hdr_len,
- dma_addr, (sw_ring_cons << 16) | sw_ring_prod)))
- goto next_rx;
-
+ } else {
+ skb = bnx2_rx_skb(bp, rxr, data, len, hdr_len, dma_addr,
+ (sw_ring_cons << 16) | sw_ring_prod);
+ if (!skb)
+ goto next_rx;
+ }
if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) &&
!(bp->rx_mode & BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG))
__vlan_hwaccel_put_tag(skb, rx_hdr->l2_fhdr_vlan_tag);
@@ -5234,7 +5235,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
ring_prod = prod = rxr->rx_prod;
for (i = 0; i < bp->rx_ring_size; i++) {
- if (bnx2_alloc_rx_skb(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
+ if (bnx2_alloc_rx_data(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d skbs only\n",
ring_num, i, bp->rx_ring_size);
break;
@@ -5329,7 +5330,7 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)
rx_size = bp->dev->mtu + ETH_HLEN + BNX2_RX_OFFSET + 8;
rx_space = SKB_DATA_ALIGN(rx_size + BNX2_RX_ALIGN) + NET_SKB_PAD +
- sizeof(struct skb_shared_info);
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
bp->rx_copy_thresh = BNX2_RX_COPY_THRESH;
bp->rx_pg_ring_size = 0;
@@ -5351,8 +5352,9 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)
}
bp->rx_buf_use_size = rx_size;
- /* hw alignment */
- bp->rx_buf_size = bp->rx_buf_use_size + BNX2_RX_ALIGN;
+ /* hw alignment + build_skb() overhead*/
+ bp->rx_buf_size = SKB_DATA_ALIGN(bp->rx_buf_use_size + BNX2_RX_ALIGN) +
+ NET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
bp->rx_jumbo_thresh = rx_size - BNX2_RX_OFFSET;
bp->rx_ring_size = size;
bp->rx_max_ring = bnx2_find_max_ring(size, MAX_RX_RINGS);
@@ -5418,9 +5420,9 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
for (j = 0; j < bp->rx_max_ring_idx; j++) {
struct sw_bd *rx_buf = &rxr->rx_buf_ring[j];
- struct sk_buff *skb = rx_buf->skb;
+ u8 *data = rx_buf->data;
- if (skb == NULL)
+ if (data == NULL)
continue;
dma_unmap_single(&bp->pdev->dev,
@@ -5428,9 +5430,9 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
bp->rx_buf_use_size,
PCI_DMA_FROMDEVICE);
- rx_buf->skb = NULL;
+ rx_buf->data = NULL;
- dev_kfree_skb(skb);
+ kfree(data);
}
for (j = 0; j < bp->rx_max_pg_ring_idx; j++)
bnx2_free_rx_page(bp, rxr, j);
@@ -5736,7 +5738,8 @@ static int
bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
{
unsigned int pkt_size, num_pkts, i;
- struct sk_buff *skb, *rx_skb;
+ struct sk_buff *skb;
+ u8 *data;
unsigned char *packet;
u16 rx_start_idx, rx_idx;
dma_addr_t map;
@@ -5828,14 +5831,14 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
}
rx_buf = &rxr->rx_buf_ring[rx_start_idx];
- rx_skb = rx_buf->skb;
+ data = rx_buf->data;
- rx_hdr = rx_buf->desc;
- skb_reserve(rx_skb, BNX2_RX_OFFSET);
+ rx_hdr = get_l2_fhdr(data);
+ data = (u8 *)rx_hdr + BNX2_RX_OFFSET;
dma_sync_single_for_cpu(&bp->pdev->dev,
dma_unmap_addr(rx_buf, mapping),
- bp->rx_buf_size, PCI_DMA_FROMDEVICE);
+ bp->rx_buf_use_size, PCI_DMA_FROMDEVICE);
if (rx_hdr->l2_fhdr_status &
(L2_FHDR_ERRORS_BAD_CRC |
@@ -5852,7 +5855,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
}
for (i = 14; i < pkt_size; i++) {
- if (*(rx_skb->data + i) != (unsigned char) (i & 0xff)) {
+ if (*(data + i) != (unsigned char) (i & 0xff)) {
goto loopback_test_done;
}
}
diff --git a/drivers/net/ethernet/broadcom/bnx2.h b/drivers/net/ethernet/broadcom/bnx2.h
index 99d31a7..1db2d51 100644
--- a/drivers/net/ethernet/broadcom/bnx2.h
+++ b/drivers/net/ethernet/broadcom/bnx2.h
@@ -6563,12 +6563,25 @@ struct l2_fhdr {
#define MB_TX_CID_ADDR MB_GET_CID_ADDR(TX_CID)
#define MB_RX_CID_ADDR MB_GET_CID_ADDR(RX_CID)
+/*
+ * This driver uses new build_skb() API :
+ * RX ring buffer contains pointer to kmalloc() data only,
+ * skb are built only after Hardware filled the frame.
+ */
struct sw_bd {
- struct sk_buff *skb;
- struct l2_fhdr *desc;
+ u8 *data;
DEFINE_DMA_UNMAP_ADDR(mapping);
};
+/* Its faster to compute this from data than storing it in sw_bd
+ * (less cache misses)
+ */
+static inline struct l2_fhdr *get_l2_fhdr(u8 *data)
+{
+ return (struct l2_fhdr *)(PTR_ALIGN(data, BNX2_RX_ALIGN) + NET_SKB_PAD);
+}
+
+
struct sw_pg {
struct page *page;
DEFINE_DMA_UNMAP_ADDR(mapping);
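
The pointer arithmetic in get_l2_fhdr() can be checked in userspace with a small sketch. It assumes an alignment of 16 bytes, as stated in the commit message, and a padding of 64 bytes as a stand-in for NET_SKB_PAD, whose real value depends on the architecture. align_up() and l2_fhdr_offset() are hypothetical names standing in for PTR_ALIGN() and the helper added by the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_ALIGN 16 /* assumed from the commit message (16-byte alignment) */
#define SKB_PAD  64 /* assumed; the kernel's NET_SKB_PAD is arch-dependent */

/* Userspace stand-in for PTR_ALIGN(): round addr up to a multiple of
 * align, which must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Offset of the l2_fhdr from the start of the kmalloc() buffer,
 * following the expression used in get_l2_fhdr(). */
static size_t l2_fhdr_offset(uintptr_t data)
{
	return (size_t)(align_up(data, RX_ALIGN) + SKB_PAD - data);
}
```

Under these assumptions the offset always lies between SKB_PAD and SKB_PAD + RX_ALIGN - 1. For a buffer at 0x1008, for instance, the header lands at 0x1050, an offset of 72 bytes. This is consistent with rx_buf_size reserving BNX2_RX_ALIGN extra bytes plus NET_SKB_PAD in bnx2_set_rx_ring_size().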
* Re: [PATCH V2] vlan:return error when real dev is enslaved
From: Ben Hutchings @ 2011-11-15 17:34 UTC (permalink / raw)
To: Weiping Pan
Cc: Patrick McHardy, David S. Miller, open list:VLAN (802.1Q),
open list
In-Reply-To: <d7ea491a500c99c0b4839ddcedab027a3c865c59.1321360959.git.wpan@redhat.com>
On Tue, 2011-11-15 at 20:44 +0800, Weiping Pan wrote:
> Qinhuibin reported a kernel panic when he did some VLAN operations.
> https://lkml.org/lkml/2011/11/6/218
>
> The operation is as below:
> ifconfig eth2 up
> modprobe bonding
> modprobe 8021q
> ifconfig bond0 up
> ifenslave bond0 eth2
> vconfig add eth2 3300
> vconfig add bond0 33
> vconfig rem eth2.3300
>
> the panic stack is as below:
> [<ffffffffa002f1c9>] panic_event+0x49/0x70 [ipmi_msghandler]
> [<ffffffff80378917>] notifier_call_chain+0x37/0x70
> [<ffffffff80372122>] panic+0xa2/0x195
> [<ffffffff80376ed8>] oops_end+0xd8/0x140
> [<ffffffff8001bea7>] no_context+0xf7/0x280
> [<ffffffff8001c1a5>] __bad_area_nosemaphore+0x175/0x250
> [<ffffffff80376318>] page_fault+0x28/0x30
> [<ffffffffa039dabd>] igb_vlan_rx_kill_vid+0x4d/0x100 [igb]
> [<ffffffffa044045f>] bond_vlan_rx_kill_vid+0x9f/0x290 [bonding]
> [<ffffffffa047e636>] unregister_vlan_dev+0x136/0x180 [8021q]
> [<ffffffffa047ed20>] vlan_ioctl_handler+0x170/0x3f0 [8021q]
> [<ffffffff802c1d3f>] sock_ioctl+0x21f/0x280
> [<ffffffff800e6d7f>] vfs_ioctl+0x2f/0xb0
> [<ffffffff800e726b>] do_vfs_ioctl+0x3cb/0x5a0
> [<ffffffff800e74e1>] sys_ioctl+0xa1/0xb0
> [<ffffffff80007388>] system_call_fastpath+0x16/0x1b
> [<00007f108a2b8bd7>] 0x7f108a2b8bd7
> And the nic is as below:
> [root@localhost ~]# ethtool -i eth2
> driver: igb
> version: 3.0.6-k2
> firmware-version: 1.2-1
> bus-info: 0000:04:00.0
> kernel version:
> 2.6.32.12-0.7 also happen in 2.6.32-131
>
> For kernel 2.6.32, the reason of this bug is that when we do "vconfig add bond0 33",
> adapter->vlgrp is overwritten in igb_vlan_rx_register. So when we do "vconfig rem
> eth2.3300", it can't find the correct vlgrp.
>
> And this bug is avoided by vlan cleanup patchset from Jiri Pirko
> <jpirko@redhat.com>, especially commit b2cb09b1a772(igb: do vlan cleanup).
Since this won't be applied to mainline first, you should send it
directly to stable@vger.kernel.org as well as to netdev.
> But it is not a correct operation to creat a vlan interface on eth2
> when it have been enslaved by bond0, so this patch is to return error
> when the real dev is already enslaved.
>
> Changelog:
> V2: use pr_err instead of pr_info
>
> Signed-off-by: Weiping Pan <wpan@redhat.com>
> ---
> net/8021q/vlan.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 5471628..7ce50ba 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -148,6 +148,11 @@ int vlan_check_real_dev(struct net_device *real_dev, u16 vlan_id)
> const char *name = real_dev->name;
> const struct net_device_ops *ops = real_dev->netdev_ops;
>
> + if (real_dev->flags & IFF_SLAVE) {
> + pr_err("Error, %s was already enslaved\n", name);
> + return -EOPNOTSUPP;
I think the appropriate error code is EBUSY. The operation is supported
(probably - we haven't checked for VLAN_CHALLENGED yet) but the device
is otherwise occupied.
Ben.
> + }
> +
> if (real_dev->features & NETIF_F_VLAN_CHALLENGED) {
> pr_info("VLANs not supported on %s\n", name);
> return -EOPNOTSUPP;
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: sky2 hw csum failure
From: Stephen Hemminger @ 2011-11-15 17:45 UTC (permalink / raw)
To: Yan, Zheng
Cc: Martin Volf, shemminger@linux-foundation.org,
bridge@lists.linux-foundation.org, netdev@vger.kernel.org,
davem@davemloft.net, wcang@sfc.wide.ad.jp
In-Reply-To: <4EC23EE7.2010606@intel.com>
On Tue, 15 Nov 2011 18:28:55 +0800
"Yan, Zheng" <zheng.z.yan@intel.com> wrote:
> I re-tested the checksum code, both CHECKSUM_NONE and CHECKSUM_COMPLETE
> cases are OK. Maybe the bug is related to sky2.
>
> Regards
> Yan, Zheng
There are three types of receive checksumming:
1. Hardware does not do checksumming (CHECKSUM_NONE)
2. Hardware validates checksum (CHECKSUM_UNNECESSARY)
3. Hardware computes sum of bytes in skb (CHECKSUM_COMPLETE)
Most hardware does #2, but sky2 uses #3.
In the third case (CHECKSUM_COMPLETE), the hardware does not look at headers;
it only reports the one's complement sum of the received bytes in skb->csum
(with ip_summed set to CHECKSUM_COMPLETE). It is up to the protocol layers to
adjust accordingly: if data is removed or added, the checksum must be adjusted.
^ permalink raw reply
* Re: bnx2 cards intermittantly going offline
From: Ken @ 2011-11-15 17:41 UTC (permalink / raw)
To: netdev
In-Reply-To: <6DD3782C33561D44B47071B09946026405F63853AB@exchange1>
+1 with identical L2 components and symptoms.
^ permalink raw reply
* [RFT] bridge: checksum not updated after pull
From: Stephen Hemminger @ 2011-11-15 18:09 UTC (permalink / raw)
To: Yan, Zheng
Cc: Martin Volf, bridge@lists.linux-foundation.org,
netdev@vger.kernel.org, davem@davemloft.net, wcang@sfc.wide.ad.jp
In-Reply-To: <4EC23EE7.2010606@intel.com>
I think this is what is necessary, please test.
Subject: bridge: correct IPv6 checksum after pull
Bridge multicast snooping of ICMPv6 would incorrectly report a checksum problem
when used with Ethernet devices like sky2 that use CHECKSUM_COMPLETE.
When bytes are removed from skb, the computed checksum needs to be adjusted.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/bridge/br_multicast.c 2011-11-09 13:55:00.028012483 -0800
+++ b/net/bridge/br_multicast.c 2011-11-15 10:05:06.171314194 -0800
@@ -1501,7 +1501,9 @@ static int br_multicast_ipv6_rcv(struct
__skb_pull(skb2, offset);
skb_reset_transport_header(skb2);
-
+ skb_postpull_rcsum(skb2, skb_network_header(skb2),
+ skb_network_header_len(skb2));
+
icmp6_type = icmp6_hdr(skb2)->icmp6_type;
switch (icmp6_type) {
^ permalink raw reply
* Re: [Devel] Re: [PATCH v5 00/10] per-cgroup tcp memory pressure
From: James Bottomley @ 2011-11-15 18:27 UTC (permalink / raw)
To: davem@davemloft.net, eric.dumazet@gmail.com
Cc: Glauber Costa, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, paul@paulmenage.org, lizf@cn.fujitsu.com,
linux-mm@kvack.org, devel@openvz.org, kirill@shutemov.name,
gthelen@google.com, kamezawa.hiroyu@jp.fujitsu.com
In-Reply-To: <4EBAC04F.1010901@parallels.com>
On Wed, 2011-11-09 at 16:02 -0200, Glauber Costa wrote:
> On 11/07/2011 01:26 PM, Glauber Costa wrote:
> > Hi all,
> >
> > This is my new attempt at implementing per-cgroup tcp memory pressure.
> > I am particularly interested in what the network folks have to comment on
> > it: my main goal is to achieve the least impact possible in the network code.
> >
> > Here's a brief description of my approach:
> >
> > When only the root cgroup is present, the code should behave the same way as
> > before - with the exception of the inclusion of an extra field in struct sock,
> > and one in struct proto. All tests are patched out with static branch, and we
> > still access addresses directly - the same as we did before.
> >
> > When a cgroup other than root is created, we patch in the branches, and account
> > resources for that cgroup. The variables in the root cgroup are still updated.
> > If we were to try to be 100 % coherent with the memcg code, that should depend
> > on use_hierarchy. However, I feel that this is a good compromise in terms of
> > leaving the network code untouched, and still having a global vision of its
> > resources. I also do not compute max_usage for the root cgroup, for a similar
> > reason.
> >
> > Please let me know what you think of it.
>
> Dave, Eric,
>
> Can you let me know what you think of the general approach I've followed
> in this series? The impact on the common case should be minimal, or at
> least as expensive as a static branch (0 in most arches, I believe).
>
> I am mostly interested in knowing if this a valid pursue path. I'll be
> happy to address any specific concerns you have once you're ok with the
> general approach.
Ping on this, please. We're blocked on this patch set until we can get
an ack that the approach is acceptable to network people.
Thanks,
James
^ permalink raw reply
* Re: [patch net-next V8] net: introduce ethernet teaming device
From: Eric Dumazet @ 2011-11-15 18:35 UTC (permalink / raw)
To: Rick Jones
Cc: Andy Gospodarek, Jiri Pirko, netdev, davem, bhutchings,
shemminger, fubar, tgraf, ebiederm, mirqus, kaber, greearb, jesse,
fbl, benjamin.poirier, jzupka, ivecera
In-Reply-To: <4EC29FE7.70904@hp.com>
On Tuesday, 15 November 2011 at 09:22 -0800, Rick Jones wrote:
> > On most modern systems I suspect there will be little to no difference
> > between bonding RX peformance and team performance.
> >
> > If there is any now, I suspect team and bond performance to be similar
> > by the time team has to account for the corner-cases bonding has already
> > resolved. :-)
> >
> > Benchmarks may prove otherwise, but I've yet to see Jiri produce
> > anything. My initial testing doesn't demonstrate any measureable
> > differences with 1Gbps interfaces on a multi-core, multi-socket system.
>
> I wouldn't expect much difference in terms of bandwidth, I was thinking
> the demonstration would be made in the area of service demand (CPU
> consumed per unit work) and perhaps aggregate packets per second.
Well,
bonding is a NETIF_F_LLTX driver, but it takes the following rwlock in its
xmit path:
read_lock(&bond->curr_slave_lock);
...
read_unlock(&bond->curr_slave_lock);
Two atomic operations on a contended cache line.
On a 16-CPU machine, here is some "perf stat" data for such a workload
(each thread doing 10,000,000 atomic_inc(&somesharedvar) calls):
# perf stat ./atomic 16
Performance counter stats for './atomic 16':
48016,104204 task-clock # 15,566 CPUs utilized
555 context-switches # 0,000 M/sec
15 CPU-migrations # 0,000 M/sec
175 page-faults # 0,000 M/sec
121 669 943 013 cycles # 2,534 GHz
121 321 455 748 stalled-cycles-frontend # 99,71% frontend cycles idle
103 375 494 290 stalled-cycles-backend # 84,96% backend cycles idle
611 624 619 instructions # 0,01 insns per cycle
# 198,36 stalled cycles per insn
184 530 032 branches # 3,843 M/sec
581 513 branch-misses # 0,32% of all branches
3,084672937 seconds time elapsed
Cost per 'read_lock/read_unlock pair' : at least 616 ns
On one CPU only:
# perf stat ./atomic 1
Performance counter stats for './atomic 1':
83,475050 task-clock # 0,998 CPUs utilized
3 context-switches # 0,000 M/sec
1 CPU-migrations # 0,000 M/sec
144 page-faults # 0,002 M/sec
211 508 600 cycles # 2,534 GHz
193 502 947 stalled-cycles-frontend # 91,49% frontend cycles idle
124 428 400 stalled-cycles-backend # 58,83% backend cycles idle
30 870 434 instructions # 0,15 insns per cycle
# 6,27 stalled cycles per insn
10 163 364 branches # 121,753 M/sec
9 633 branch-misses # 0,09% of all branches
0,083679928 seconds time elapsed
Cost per 'read_lock/read_unlock pair' : 16 ns
Of course, bonding could be changed to use RCU as well,
if someone feels the need.
But teaming was designed to be RCU ready from the beginning.
^ permalink raw reply
* Re: [PATCH V2] vlan:return error when real dev is enslaved
From: Nicolas de Pesloüan @ 2011-11-15 19:19 UTC (permalink / raw)
To: Weiping Pan
Cc: Patrick McHardy (maintainer:VLAN (802.1Q)),
"David S. Miller" (maintainer:NETWORKING [GENERAL]),
open list:VLAN (802.1Q), open list
In-Reply-To: <d7ea491a500c99c0b4839ddcedab027a3c865c59.1321360959.git.wpan@redhat.com>
On 15/11/2011 13:44, Weiping Pan wrote:
> Qinhuibin reported a kernel panic when he do some operation about vlan.
> https://lkml.org/lkml/2011/11/6/218
>
> The operation is as below:
> ifconfig eth2 up
> modprobe bonding
> modprobe 8021q
> ifconfig bond0 up
> ifenslave bond0 eth2
> vconfig add eth2 3300
> vconfig add bond0 33
> vconfig rem eth2.3300
>
> the panic stack is as below:
> [<ffffffffa002f1c9>] panic_event+0x49/0x70 [ipmi_msghandler]
> [<ffffffff80378917>] notifier_call_chain+0x37/0x70
> [<ffffffff80372122>] panic+0xa2/0x195
> [<ffffffff80376ed8>] oops_end+0xd8/0x140
> [<ffffffff8001bea7>] no_context+0xf7/0x280
> [<ffffffff8001c1a5>] __bad_area_nosemaphore+0x175/0x250
> [<ffffffff80376318>] page_fault+0x28/0x30
> [<ffffffffa039dabd>] igb_vlan_rx_kill_vid+0x4d/0x100 [igb]
> [<ffffffffa044045f>] bond_vlan_rx_kill_vid+0x9f/0x290 [bonding]
> [<ffffffffa047e636>] unregister_vlan_dev+0x136/0x180 [8021q]
> [<ffffffffa047ed20>] vlan_ioctl_handler+0x170/0x3f0 [8021q]
> [<ffffffff802c1d3f>] sock_ioctl+0x21f/0x280
> [<ffffffff800e6d7f>] vfs_ioctl+0x2f/0xb0
> [<ffffffff800e726b>] do_vfs_ioctl+0x3cb/0x5a0
> [<ffffffff800e74e1>] sys_ioctl+0xa1/0xb0
> [<ffffffff80007388>] system_call_fastpath+0x16/0x1b
> [<00007f108a2b8bd7>] 0x7f108a2b8bd7
> And the nic is as below:
> [root@localhost ~]# ethtool -i eth2
> driver: igb
> version: 3.0.6-k2
> firmware-version: 1.2-1
> bus-info: 0000:04:00.0
> kernel version:
> 2.6.32.12-0.7 also happen in 2.6.32-131
>
> For kernel 2.6.32, the reason of this bug is that when we do "vconfig add bond0 33",
> adapter->vlgrp is overwritten in igb_vlan_rx_register. So when we do "vconfig rem
> eth2.3300", it can't find the correct vlgrp.
>
> And this bug is avoided by vlan cleanup patchset from Jiri Pirko
> <jpirko@redhat.com>, especially commit b2cb09b1a772(igb: do vlan cleanup).
>
> But it is not a correct operation to creat a vlan interface on eth2
> when it have been enslaved by bond0, so this patch is to return error
> when the real dev is already enslaved.
Why isn't this setup correct?
Compare with bridging, where ebtables allows some sort of sharing of the physical interface between
the bridge and VLANs.
I think bonding should behave the same way instead of denying this setup.
Nicolas.
^ permalink raw reply
* Re: [PATCH] bonding: Don't allow mode change via sysfs with slaves present
From: Nicolas de Pesloüan @ 2011-11-15 19:24 UTC (permalink / raw)
To: Andy Gospodarek; +Cc: Veaceslav Falico, netdev, Jay Vosburgh
In-Reply-To: <20111115170018.GB25132@gospo.rdu.redhat.com>
On 15/11/2011 18:00, Andy Gospodarek wrote:
> On Tue, Nov 15, 2011 at 05:44:42PM +0100, Veaceslav Falico wrote:
>> When changing mode via bonding's sysfs, the slaves are not initialized
>> correctly. Forbid to change modes with slaves present to ensure that every
>> slave is initialized correctly via bond_enslave().
>>
>> Signed-off-by: Veaceslav Falico<vfalico@redhat.com>
>
> Looks good. This behavior forces someone who wants to change to mode to
> go through steps that are almost as destructive as when module options
> are used to configure the mode. I do not see a problem with this.
Except that it enforces one more constraint on the exact order in which one must write to sysfs
to set up a bonding interface. We already have many such constraints and probably don't need more.
Currently, it is possible to enslave slaves before selecting the mode. The ifenslave-2.6 package
from Debian currently enslaves slaves before setting the mode and would break with this change.
NAK.
Nicolas.
^ permalink raw reply
* [PATCH net-next v4 0/8] forcedeth: stats & debug enhancements
From: David Decotigny @ 2011-11-15 19:25 UTC (permalink / raw)
To: netdev, linux-kernel
Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
Richard Jones, Ayaz Abdulla, David Decotigny
These changes implement the ndo_get_stats64 API and add a few more
stats and debugging features for forcedeth. They also ensure that
stats updates are correct on SMP systems, whether 32-bit or 64-bit.
Changes since v3:
- updated get_stats64 + rx_dropped patches to use u64_stats_sync.h
- dropped the "whitespace/indentation fixes" patch (now included in
the get_stats64 API patch)
Changes since v2:
- patch 1/9 is the cherry-pick of 898bdf2cb43e ("forcedeth: fix
stats on hardware without extended stats support")
- removed patch 5/10 "stats for rx_packets based on hardware
registers" because packets&bytes stats are updated in software
only (898bdf2cb43e)
Changes since v1:
- patch 1/10 is the same as
http://patchwork.ozlabs.org/patch/125017/ (targeting net)
- other patches updated to take patch 1/10 into account
- various commit message updates
Tested:
~150Mbps incoming TCP, ethtool -S in a loop, x86_64 16-way:
tx_bytes: 5441989419
rx_packets: 5439224
tx_timeout: 0
tx_packets: 5456705
rx_bytes: 5566763850
Tested:
pktgen + loopback report same RX/TX packets and bytes stats
Tested:
tests above with Kconfig DEBUG_PAGEALLOC DEBUG_MUTEXES
DEBUG_SPINLOCK LOCKUP_DETECTOR DEBUG_RT_MUTEXES DEBUG_LOCK_ALLOC
PROVE_LOCKING DEBUG_ATOMIC_SLEEP DEBUG_STACK_USAGE DEBUG_KOBJECT
DEBUG_VM DEBUG_LIST DEBUG_SG DEBUG_NOTIFIERS TEST_KSTRTOX
STRICT_DEVMEM DEBUG_STACKOVERFLOW
############################################
# Patch Set Summary:
David Decotigny (5):
forcedeth: expose module parameters in /sys/module
forcedeth: implement ndo_get_stats64() API
forcedeth: account for dropped RX frames
forcedeth: new ethtool stat counter for TX timeouts
forcedeth: stats updated with a deferrable timer
Mike Ditto (1):
forcedeth: Add messages to indicate using MSI or MSI-X
Sameer Nanda (1):
forcedeth: allow to silence "TX timeout" debug messages
david decotigny (1):
forcedeth: fix stats on hardware without extended stats support
drivers/net/ethernet/nvidia/forcedeth.c | 344 ++++++++++++++++++++++---------
1 files changed, 246 insertions(+), 98 deletions(-)
--
1.7.3.1
^ permalink raw reply