* Re: Raise initial congestion window size / speedup slow start?
From: Mitchell Erblich @ 2010-07-14 21:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Ed W, David Miller, davidsen, linux-kernel, netdev
In-Reply-To: <20100714121040.4a674511@nehalam>
On Jul 14, 2010, at 12:10 PM, Stephen Hemminger wrote:
> On Wed, 14 Jul 2010 19:48:36 +0100
> Ed W <lists@wildgooses.com> wrote:
>
>> On 14/07/2010 19:15, David Miller wrote:
>>> From: Bill Davidsen<davidsen@tmr.com>
>>> Date: Wed, 14 Jul 2010 11:21:15 -0400
>>>
>>>
>>>> You may have to go into /proc/sys/net/core and crank up the
>>>> rmem_* settings, depending on your distribution.
>>>>
>>> You should never, ever, have to touch the various networking sysctl
>>> values to get good performance in any normal setup. If you do, it's a
>>> bug, report it so we can fix it.
>>>
>>
>> Just checking the basics here because I don't think this is a bug so
>> much as a less common installation that differs from the "normal" case.
>>
>> - When we create a tcp connection we always start with tcp slow start
>> - This sets the congestion window to effectively 4 packets?
>> - This applies in both directions?
>> - Remote sender responds to my hypothetical http request with the first
>> 4 packets of data
>> - We need to wait one RTT for the ack to come back and now we can send
>> the next 8 packets,
>> - Wait for the next ack and at 16 packets we are now moving at a
>> sensible fraction of the bandwidth delay product?
>>
>> So just to be clear:
>> - We don't seem to have any user-space tuning knobs to influence this
>> right now?
>> - In this age of short attention spans, a couple of extra seconds
>> between clicking something and it responding is worth optimising (IMHO)
>> - I think I need to take this to netdev, but anyone else with any ideas
>> happy to hear them?
>>
>> Thanks
>>
>> Ed W
>
> TCP slow start is required by the RFC. It is there to prevent a TCP congestion
> collapse. The HTTP problem is exacerbated by things beyond the user's control:
> 1. stupid server software that dribbles out data and doesn't use the full
> payload of the packets
> 2. web pages with data from multiple sources (ads especially), each of which
> requires a new connection
> 3. pages with huge graphics.
>
> Most of this is because of sites that haven't figured out that somebody on a phone
> across the globe might not have the same RTT and bandwidth as the developer on a
> local network who created them. Changing the initial cwnd isn't going to fix it.
> --
IMO, in theory one of the RFCs states an initial window of 4 ETH-MTU-sized
packets/segments (~6k window), to allow a fast retransmit if a pkt is dropped.
I thought there is a fast-rexmit knob of 2 or 3 DUPACKs, for faster loss recovery.
Theoretically it could be set to 1 DUPACK for lossy environments.
Now, the original slow-start doubles the number of pkts per RTT assuming no loss,
which is a faster ramp-up than the original congestion avoidance.
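To make the ramp-up concrete, here is a minimal C sketch of loss-free slow start counted in whole segments. The helper name slow_start_rtts() is hypothetical and not part of any kernel API; it assumes the window simply doubles once per RTT and ignores delayed ACKs, losses and receiver window limits.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel API: count the loss-free round trips
 * slow start needs to grow the congestion window from `initial` segments
 * to at least `target` segments, doubling once per RTT.
 */
unsigned int slow_start_rtts(unsigned int initial, unsigned int target)
{
	unsigned int cwnd = initial;
	unsigned int rtts = 0;

	assert(initial > 0);	/* a zero window would never grow */

	while (cwnd < target) {
		/* clamp instead of doubling past the target (avoids overflow) */
		cwnd = (cwnd > target / 2) ? target : cwnd * 2;
		rtts++;
	}

	return rtts;
}
```

With an initial window of 4 segments and a target of 16, this gives 2 round trips (4, 8, 16), the same sequence Ed described earlier in the thread.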
Now, with IPv4 and its default 576-byte segments, 12 pkts could be sent
without exceeding the same amount of data. This would be helpful if your
app only generates smaller buffers: it gets more ACKs in return, which sets
the ACK clocking at a faster rate. To compensate for the smaller pkts, the ABC
Experimental RFC does byte counting for fairness.
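The difference between counting ACKs and counting bytes can be sketched as follows. This is only an illustration of the idea behind ABC (RFC 3465), not how any particular stack implements it; the function names and the limit_bytes parameter are made up for the example, and all windows are in bytes.

```c
#include <assert.h>

/* Classic ACK counting: every ACK credits one full-sized segment. */
unsigned long cwnd_ack_counting(unsigned long cwnd, unsigned long smss)
{
	return cwnd + smss;
}

/*
 * Byte counting in the spirit of ABC: credit only the bytes newly
 * acknowledged, capped at limit_bytes per ACK.
 */
unsigned long cwnd_byte_counting(unsigned long cwnd,
				 unsigned long bytes_acked,
				 unsigned long limit_bytes)
{
	unsigned long credit;

	assert(limit_bytes > 0);	/* a zero limit would freeze the window */

	credit = bytes_acked < limit_bytes ? bytes_acked : limit_bytes;
	return cwnd + credit;
}
```

For a sender whose SMSS is 1460 but whose app writes 536-byte pkts, an ACK covering one small pkt grows the window by 1460 bytes under ACK counting and by only 536 bytes under byte counting, so many small pkts do not inflate the window faster than the data they actually carry.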
Over a few round trips, the pkt size could be increased to the 1.5k ETH MTU
and hopefully even to a 9k Jumbo, probing with one pkt of increasing size.
(To prevent a rexmit of a too-large pkt, maybe overlap the larger pkt with the next
one?)
Mitchell Erblich
* Re: Raise initial congestion window size / speedup slow start?
From: David Miller @ 2010-07-14 21:55 UTC (permalink / raw)
To: hagen; +Cc: rick.jones2, lists, davidsen, linux-kernel, netdev
In-Reply-To: <20100714203919.GD6682@nuttenaction>
From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Wed, 14 Jul 2010 22:39:19 +0200
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>>There is an effort under way, led by some folks at Google and
>>including some others, to get the RFCs enhanced in support of the
>>concept of larger initial congestion windows. Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused). There may be some previous discussion of
>>that work in the netdev archives as well.
>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because it is not a standardization issue, rather it is a
> technical issue. You cannot raise the initial CWND and expect fair behavior.
> This was discussed several times and is documented in several documents and
> RFCs.
>
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pops up every two months in netdev and until now I _never_ read a
> consolidated contribution.
>
> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
Although section 3 of RFC 5681 is a great text, it does not say at all
that increasing the initial CWND would lead to fairness issues.
To be honest, I think google's proposal holds a lot of weight. If
over time link sizes and speeds are increasing (they are) then nudging
the initial CWND every so often is a legitimate proposal. Were
someone to claim that utilization is lower than it could be because of
the currently specified initial CWND, I would have no problem
believing them.
And I'm happy to make Linux use an increased value once it has
traction in the standardization community.
But for all we know this side discussion about initial CWND settings
could have nothing to do with the issue being reported at the start of
this thread. :-)
* [PATCH] CAN: Add Flexcan CAN controller driver
From: Marc Kleine-Budde @ 2010-07-14 22:00 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde,
wg-5Yr1BZd7O62+XT7JhA+gdA
From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
This core is found on some Freescale SoCs and also some Coldfire
SoCs. Support for Coldfire is missing though at the moment as
they have an older revision of the core which does not have RX FIFO
support.
Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
Changes to prev version:
* The driver is now GPLv2 (only) as no one complained.
The patch applies to current net-next-2.6/master.
If there aren't any objections please consider applying this patch.
Wolfgang, can I get an Acked-by?
Cheers, Marc
P.S.:
This patch can be pulled, too:
The following changes since commit fae88f7eedae42c955075aec7a0cd27545f81511:
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 (2010-07-13 14:25:13 -0700)
are available in the git repository at:
git://git.pengutronix.de/git/mkl/linux-2.6.git for-net-next-2.6
Sascha Hauer (1):
CAN: Add Flexcan CAN controller driver
drivers/net/can/Kconfig | 6 +
drivers/net/can/Makefile | 1 +
drivers/net/can/flexcan.c | 1005 ++++++++++++++++++++++++++++++++++
include/linux/can/platform/flexcan.h | 20 +
4 files changed, 1032 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/can/flexcan.c
create mode 100644 include/linux/can/platform/flexcan.h
diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
index 2c5227c..3f13299 100644
--- a/drivers/net/can/Kconfig
+++ b/drivers/net/can/Kconfig
@@ -73,6 +73,12 @@ config CAN_JANZ_ICAN3
This driver can also be built as a module. If so, the module will be
called janz-ican3.ko.
+config CAN_FLEXCAN
+ tristate "Support for Freescale FLEXCAN based chips"
+ depends on CAN_DEV
+ ---help---
+ Say Y here if you want support for Freescale FlexCAN.
+
source "drivers/net/can/mscan/Kconfig"
source "drivers/net/can/sja1000/Kconfig"
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
index 9047cd0..0057537 100644
--- a/drivers/net/can/Makefile
+++ b/drivers/net/can/Makefile
@@ -16,5 +16,6 @@ obj-$(CONFIG_CAN_TI_HECC) += ti_hecc.o
obj-$(CONFIG_CAN_MCP251X) += mcp251x.o
obj-$(CONFIG_CAN_BFIN) += bfin_can.o
obj-$(CONFIG_CAN_JANZ_ICAN3) += janz-ican3.o
+obj-$(CONFIG_CAN_FLEXCAN) += flexcan.o
ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
new file mode 100644
index 0000000..a3180ba
--- /dev/null
+++ b/drivers/net/can/flexcan.c
@@ -0,0 +1,1005 @@
+/*
+ * flexcan.c - FLEXCAN CAN controller driver
+ *
+ * Copyright (c) 2005-2006 Varma Electronics Oy
+ * Copyright (c) 2009 Sascha Hauer, Pengutronix
+ * Copyright (c) 2010 Marc Kleine-Budde, Pengutronix
+ *
+ * Based on code originally by Andrey Volkov <avolkov-ppI4tVfbJvJWk0Htik3J/w@public.gmane.org>
+ *
+ * LICENCE:
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <linux/can.h>
+#include <linux/can/dev.h>
+#include <linux/can/error.h>
+#include <linux/can/platform/flexcan.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/if_arp.h>
+#include <linux/if_ether.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+
+
+#include <mach/clock.h>
+
+#define DRV_NAME "flexcan"
+#define FLEXCAN_NAPI_WEIGHT 8
+
+
+/* FLEXCAN module configuration register (CANMCR) bits */
+#define FLEXCAN_MCR_MDIS BIT(31)
+#define FLEXCAN_MCR_FRZ BIT(30)
+#define FLEXCAN_MCR_FEN BIT(29)
+#define FLEXCAN_MCR_HALT BIT(28)
+#define FLEXCAN_MCR_NOT_RDY BIT(27)
+#define FLEXCAN_MCR_WAK_MSK BIT(26)
+#define FLEXCAN_MCR_SOFTRST BIT(25)
+#define FLEXCAN_MCR_FRZ_ACK BIT(24)
+#define FLEXCAN_MCR_SUPV BIT(23)
+#define FLEXCAN_MCR_SLF_WAK BIT(22)
+#define FLEXCAN_MCR_WRN_EN BIT(21)
+#define FLEXCAN_MCR_LPM_ACK BIT(20)
+#define FLEXCAN_MCR_WAK_SRC BIT(19)
+#define FLEXCAN_MCR_DOZE BIT(18)
+#define FLEXCAN_MCR_SRX_DIS BIT(17)
+#define FLEXCAN_MCR_BCC BIT(16)
+#define FLEXCAN_MCR_LPRIO_EN BIT(13)
+#define FLEXCAN_MCR_AEN BIT(12)
+#define FLEXCAN_MCR_MAXMB(x) ((x) & 0xf)
+#define FLEXCAN_MCR_IDAM_A (0 << 8)
+#define FLEXCAN_MCR_IDAM_B (1 << 8)
+#define FLEXCAN_MCR_IDAM_C (2 << 8)
+#define FLEXCAN_MCR_IDAM_D (3 << 8)
+
+/* FLEXCAN control register (CANCTRL) bits */
+#define FLEXCAN_CTRL_PRESDIV(x) (((x) & 0xff) << 24)
+#define FLEXCAN_CTRL_RJW(x) (((x) & 0x03) << 22)
+#define FLEXCAN_CTRL_PSEG1(x) (((x) & 0x07) << 19)
+#define FLEXCAN_CTRL_PSEG2(x) (((x) & 0x07) << 16)
+#define FLEXCAN_CTRL_BOFF_MSK BIT(15)
+#define FLEXCAN_CTRL_ERR_MSK BIT(14)
+#define FLEXCAN_CTRL_CLK_SRC BIT(13)
+#define FLEXCAN_CTRL_LPB BIT(12)
+#define FLEXCAN_CTRL_TWRN_MSK BIT(11)
+#define FLEXCAN_CTRL_RWRN_MSK BIT(10)
+#define FLEXCAN_CTRL_SMP BIT(7)
+#define FLEXCAN_CTRL_BOFF_REC BIT(6)
+#define FLEXCAN_CTRL_TSYNC BIT(5)
+#define FLEXCAN_CTRL_LBUF BIT(4)
+#define FLEXCAN_CTRL_LOM BIT(3)
+#define FLEXCAN_CTRL_PROPSEG(x) ((x) & 0x07)
+
+/* FLEXCAN error and status register (ESR) bits */
+#define FLEXCAN_ESR_TWRN_INT BIT(17)
+#define FLEXCAN_ESR_RWRN_INT BIT(16)
+#define FLEXCAN_ESR_BIT1_ERR BIT(15)
+#define FLEXCAN_ESR_BIT0_ERR BIT(14)
+#define FLEXCAN_ESR_ACK_ERR BIT(13)
+#define FLEXCAN_ESR_CRC_ERR BIT(12)
+#define FLEXCAN_ESR_FRM_ERR BIT(11)
+#define FLEXCAN_ESR_STF_ERR BIT(10)
+#define FLEXCAN_ESR_TX_WRN BIT(9)
+#define FLEXCAN_ESR_RX_WRN BIT(8)
+#define FLEXCAN_ESR_IDLE BIT(7)
+#define FLEXCAN_ESR_TXRX BIT(6)
+#define FLEXCAN_ESR_FLT_CONF_SHIFT (4)
+#define FLEXCAN_ESR_FLT_CONF_MASK (0x3 << FLEXCAN_ESR_FLT_CONF_SHIFT)
+#define FLEXCAN_ESR_FLT_CONF_ACTIVE (0x0 << FLEXCAN_ESR_FLT_CONF_SHIFT)
+#define FLEXCAN_ESR_FLT_CONF_PASSIVE (0x1 << FLEXCAN_ESR_FLT_CONF_SHIFT)
+#define FLEXCAN_ESR_BOFF_INT BIT(2)
+#define FLEXCAN_ESR_ERR_INT BIT(1)
+#define FLEXCAN_ESR_WAK_INT BIT(0)
+#define FLEXCAN_ESR_ERR_FRAME \
+ (FLEXCAN_ESR_BIT1_ERR | FLEXCAN_ESR_BIT0_ERR | \
+ FLEXCAN_ESR_ACK_ERR | FLEXCAN_ESR_CRC_ERR | \
+ FLEXCAN_ESR_FRM_ERR | FLEXCAN_ESR_STF_ERR)
+#define FLEXCAN_ESR_ERR_LINE \
+ (FLEXCAN_ESR_TWRN_INT | FLEXCAN_ESR_RWRN_INT | FLEXCAN_ESR_BOFF_INT)
+
+/* FLEXCAN interrupt flag register (IFLAG) bits */
+#define FLEXCAN_TX_BUF_ID 8
+#define FLEXCAN_IFLAG_BUF(x) BIT(x)
+#define FLEXCAN_IFLAG_RX_FIFO_OVERFLOW BIT(7)
+#define FLEXCAN_IFLAG_RX_FIFO_WARN BIT(6)
+#define FLEXCAN_IFLAG_RX_FIFO_AVAILABLE BIT(5)
+#define FLEXCAN_IFLAG_DEFAULT \
+ (FLEXCAN_IFLAG_RX_FIFO_OVERFLOW | FLEXCAN_IFLAG_RX_FIFO_AVAILABLE | \
+ FLEXCAN_IFLAG_BUF(FLEXCAN_TX_BUF_ID))
+
+/* FLEXCAN message buffers */
+#define FLEXCAN_MB_CNT_CODE(x) (((x) & 0xf) << 24)
+#define FLEXCAN_MB_CNT_SRR BIT(22)
+#define FLEXCAN_MB_CNT_IDE BIT(21)
+#define FLEXCAN_MB_CNT_RTR BIT(20)
+#define FLEXCAN_MB_CNT_LENGTH(x) (((x) & 0xf) << 16)
+#define FLEXCAN_MB_CNT_TIMESTAMP(x) ((x) & 0xffff)
+
+#define FLEXCAN_MB_CODE_MASK (0xf0ffffff)
+
+/* Structure of the message buffer */
+struct flexcan_mb {
+ u32 can_ctrl;
+ u32 can_id;
+ u32 data[2];
+};
+
+/* Structure of the hardware registers */
+struct flexcan_regs {
+ u32 mcr; /* 0x00 */
+ u32 ctrl; /* 0x04 */
+ u32 timer; /* 0x08 */
+ u32 _reserved1; /* 0x0c */
+ u32 rxgmask; /* 0x10 */
+ u32 rx14mask; /* 0x14 */
+ u32 rx15mask; /* 0x18 */
+ u32 ecr; /* 0x1c */
+ u32 esr; /* 0x20 */
+ u32 imask2; /* 0x24 */
+ u32 imask1; /* 0x28 */
+ u32 iflag2; /* 0x2c */
+ u32 iflag1; /* 0x30 */
+ u32 _reserved2[19];
+ struct flexcan_mb cantxfg[64];
+};
+
+struct flexcan_priv {
+ struct can_priv can;
+ struct net_device *dev;
+ struct napi_struct napi;
+
+ void __iomem *base;
+ u32 reg_esr;
+ u32 reg_ctrl_default;
+
+ struct clk *clk;
+ struct flexcan_platform_data *pdata;
+};
+
+static struct can_bittiming_const flexcan_bittiming_const = {
+ .name = DRV_NAME,
+ .tseg1_min = 4,
+ .tseg1_max = 16,
+ .tseg2_min = 2,
+ .tseg2_max = 8,
+ .sjw_max = 4,
+ .brp_min = 1,
+ .brp_max = 256,
+ .brp_inc = 1,
+};
+
+/*
+ * Switch transceiver on or off
+ */
+static void flexcan_transceiver_switch(const struct flexcan_priv *priv, int on)
+{
+ if (priv->pdata && priv->pdata->transceiver_switch)
+ priv->pdata->transceiver_switch(on);
+}
+
+static inline void flexcan_chip_enable(struct flexcan_priv *priv)
+{
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg;
+
+ reg = readl(&regs->mcr);
+ reg &= ~FLEXCAN_MCR_MDIS;
+ writel(reg, &regs->mcr);
+
+ udelay(10);
+}
+
+static inline void flexcan_chip_disable(struct flexcan_priv *priv)
+{
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg;
+
+ reg = readl(&regs->mcr);
+ reg |= FLEXCAN_MCR_MDIS;
+ writel(reg, &regs->mcr);
+}
+
+static int flexcan_get_berr_counter(const struct net_device *dev,
+ struct can_berr_counter *bec)
+{
+ const struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg = readl(&regs->ecr);
+
+ bec->txerr = (reg >> 0) & 0xff;
+ bec->rxerr = (reg >> 8) & 0xff;
+
+ return 0;
+}
+
+
+static int flexcan_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ const struct flexcan_priv *priv = netdev_priv(dev);
+ struct net_device_stats *stats = &dev->stats;
+ struct flexcan_regs __iomem *regs = priv->base;
+ struct can_frame *frame = (struct can_frame *)skb->data;
+ u32 can_id;
+ u32 ctrl = FLEXCAN_MB_CNT_CODE(0xc) | (frame->can_dlc << 16);
+
+ if (can_dropped_invalid_skb(dev, skb))
+ return NETDEV_TX_OK;
+
+ netif_stop_queue(dev);
+
+ if (frame->can_id & CAN_EFF_FLAG) {
+ can_id = frame->can_id & CAN_EFF_MASK;
+ ctrl |= FLEXCAN_MB_CNT_IDE | FLEXCAN_MB_CNT_SRR;
+ } else {
+ can_id = (frame->can_id & CAN_SFF_MASK) << 18;
+ }
+
+ if (frame->can_id & CAN_RTR_FLAG)
+ ctrl |= FLEXCAN_MB_CNT_RTR;
+
+ if (frame->can_dlc > 0) {
+ u32 data;
+ data = frame->data[0] << 24;
+ data |= frame->data[1] << 16;
+ data |= frame->data[2] << 8;
+ data |= frame->data[3];
+ writel(data, &regs->cantxfg[FLEXCAN_TX_BUF_ID].data[0]);
+ }
+ if (frame->can_dlc > 3) {
+ u32 data;
+ data = frame->data[4] << 24;
+ data |= frame->data[5] << 16;
+ data |= frame->data[6] << 8;
+ data |= frame->data[7];
+ writel(data, &regs->cantxfg[FLEXCAN_TX_BUF_ID].data[1]);
+ }
+
+ writel(can_id, &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_id);
+ writel(ctrl, &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);
+
+ kfree_skb(skb);
+
+ stats->tx_bytes += frame->can_dlc;
+
+ return NETDEV_TX_OK;
+}
+
+
+static void flexcan_poll_err_frame(struct net_device *dev,
+ struct can_frame *cf, u32 reg_esr)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ int error_warning = 0, rx_errors = 0, tx_errors = 0;
+
+ if (reg_esr & FLEXCAN_ESR_BIT1_ERR) {
+ rx_errors = 1;
+ cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
+ cf->data[2] |= CAN_ERR_PROT_BIT1;
+ }
+
+ if (reg_esr & FLEXCAN_ESR_BIT0_ERR) {
+ rx_errors = 1;
+ cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
+ cf->data[2] |= CAN_ERR_PROT_BIT0;
+ }
+
+ if (reg_esr & FLEXCAN_ESR_FRM_ERR) {
+ rx_errors = 1;
+ cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
+ cf->data[2] |= CAN_ERR_PROT_FORM;
+ }
+
+ if (reg_esr & FLEXCAN_ESR_STF_ERR) {
+ rx_errors = 1;
+ cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
+ cf->data[2] |= CAN_ERR_PROT_STUFF;
+ }
+
+
+ if (reg_esr & FLEXCAN_ESR_ACK_ERR) {
+ tx_errors = 1;
+ cf->can_id |= CAN_ERR_ACK;
+ }
+
+ if (error_warning)
+ priv->can.can_stats.error_warning++;
+ if (rx_errors)
+ dev->stats.rx_errors++;
+ if (tx_errors)
+ dev->stats.tx_errors++;
+
+}
+
+static void flexcan_poll_err(struct net_device *dev, u32 reg_esr)
+{
+ struct sk_buff *skb;
+ struct can_frame *cf;
+
+ skb = alloc_can_err_skb(dev, &cf);
+ if (unlikely(!skb))
+ return;
+
+ flexcan_poll_err_frame(dev, cf, reg_esr);
+ netif_receive_skb(skb);
+
+ dev->stats.rx_packets++;
+ dev->stats.rx_bytes += cf->can_dlc;
+}
+
+static void flexcan_read_fifo(const struct net_device *dev, struct can_frame *cf)
+{
+ const struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ struct flexcan_mb __iomem *mb = &regs->cantxfg[0];
+ u32 reg_ctrl, reg_id;
+
+ reg_ctrl = readl(&mb->can_ctrl);
+ reg_id = readl(&mb->can_id);
+ if (reg_ctrl & FLEXCAN_MB_CNT_IDE)
+ cf->can_id = ((reg_id >> 0) & CAN_EFF_MASK) | CAN_EFF_FLAG;
+ else
+ cf->can_id = (reg_id >> 18) & CAN_SFF_MASK;
+
+ if (reg_ctrl & FLEXCAN_MB_CNT_RTR)
+ cf->can_id |= CAN_RTR_FLAG;
+ cf->can_dlc = get_can_dlc((reg_ctrl >> 16) & 0xf);
+
+ *(__be32 *)(cf->data + 0) = cpu_to_be32(readl(&mb->data[0]));
+ *(__be32 *)(cf->data + 4) = cpu_to_be32(readl(&mb->data[1]));
+
+ /* mark as read */
+ writel(FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, &regs->iflag1);
+ readl(&regs->timer);
+}
+
+static void flexcan_read_frame(struct net_device *dev)
+{
+ struct net_device_stats *stats = &dev->stats;
+ struct can_frame *cf;
+ struct sk_buff *skb;
+
+ skb = alloc_can_skb(dev, &cf);
+ if (unlikely(!skb)) {
+ stats->rx_dropped++;
+ return;
+ }
+
+ flexcan_read_fifo(dev, cf);
+ netif_receive_skb(skb);
+
+ stats->rx_packets++;
+ stats->rx_bytes += cf->can_dlc;
+}
+
+static int flexcan_poll(struct napi_struct *napi, int quota)
+{
+ struct net_device *dev = napi->dev;
+ const struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg_iflag1, reg_esr;
+ int work_done = 0;
+
+ reg_iflag1 = readl(&regs->iflag1);
+
+ /* first handle RX-FIFO */
+ while (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE &&
+ work_done < quota) {
+ flexcan_read_frame(dev);
+
+ work_done++;
+ reg_iflag1 = readl(&regs->iflag1);
+ }
+
+ /*
+ * The error bits are cleared on read,
+ * so use saved value from irq handler.
+ */
+ reg_esr = readl(&regs->esr) | priv->reg_esr;
+ if (work_done < quota) {
+ flexcan_poll_err(dev, reg_esr);
+ work_done++;
+ }
+
+ if (work_done < quota) {
+ napi_complete(napi);
+ /* enable IRQs */
+ writel(FLEXCAN_IFLAG_DEFAULT, &regs->imask1);
+ writel(priv->reg_ctrl_default, &regs->ctrl);
+ }
+
+ return work_done;
+}
+
+static void flexcan_irq_err_state(struct net_device *dev,
+ struct can_frame *cf, enum can_state new_state)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct can_berr_counter bec;
+
+ flexcan_get_berr_counter(dev, &bec);
+
+ switch (priv->can.state) {
+ case CAN_STATE_ERROR_ACTIVE:
+ /*
+ * from: ERROR_ACTIVE
+ * to : ERROR_WARNING, ERROR_PASSIVE, BUS_OFF
+ * => : there was a warning int
+ */
+ if (new_state >= CAN_STATE_ERROR_WARNING &&
+ new_state <= CAN_STATE_BUS_OFF) {
+ dev_dbg(dev->dev.parent, "Error Warning IRQ\n");
+ priv->can.can_stats.error_warning++;
+
+ cf->can_id |= CAN_ERR_CRTL;
+ cf->data[1] = (bec.txerr > bec.rxerr) ?
+ CAN_ERR_CRTL_TX_WARNING :
+ CAN_ERR_CRTL_RX_WARNING;
+ }
+ case CAN_STATE_ERROR_WARNING: /* fallthrough */
+ /*
+ * from: ERROR_ACTIVE, ERROR_WARNING
+ * to : ERROR_PASSIVE, BUS_OFF
+ * => : error passive int
+ */
+ if (new_state >= CAN_STATE_ERROR_PASSIVE &&
+ new_state <= CAN_STATE_BUS_OFF) {
+ dev_dbg(dev->dev.parent, "Error Passive IRQ\n");
+ priv->can.can_stats.error_passive++;
+
+ cf->can_id |= CAN_ERR_CRTL;
+ cf->data[1] = (bec.txerr > bec.rxerr) ?
+ CAN_ERR_CRTL_TX_PASSIVE :
+ CAN_ERR_CRTL_RX_PASSIVE;
+ }
+ break;
+ case CAN_STATE_BUS_OFF:
+ dev_err(dev->dev.parent,
+ "BUG! hardware recovered automatically from BUS_OFF\n");
+ break;
+ default:
+ break;
+ }
+
+ /* process state changes depending on the new state */
+ switch (new_state) {
+ case CAN_STATE_BUS_OFF:
+ cf->can_id |= CAN_ERR_BUSOFF;
+ can_bus_off(dev);
+ break;
+ default:
+ break;
+ }
+}
+
+static void flexcan_irq_err(struct net_device *dev)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ struct sk_buff *skb;
+ struct can_frame *cf;
+ enum can_state new_state;
+ u32 reg_esr;
+ int flt;
+
+ reg_esr = readl(&regs->esr);
+ writel(reg_esr, &regs->esr);
+
+ flt = reg_esr & FLEXCAN_ESR_FLT_CONF_MASK;
+ if (likely(flt == FLEXCAN_ESR_FLT_CONF_ACTIVE)) {
+ if (likely(!(reg_esr & (FLEXCAN_ESR_TX_WRN |
+ FLEXCAN_ESR_RX_WRN))))
+ new_state = CAN_STATE_ERROR_ACTIVE;
+ else
+ new_state = CAN_STATE_ERROR_WARNING;
+ } else if (unlikely(flt == FLEXCAN_ESR_FLT_CONF_PASSIVE))
+ new_state = CAN_STATE_ERROR_PASSIVE;
+ else
+ new_state = CAN_STATE_BUS_OFF;
+
+ /* state hasn't changed */
+ if (likely(new_state == priv->can.state))
+ return;
+
+ skb = alloc_can_err_skb(dev, &cf);
+ if (unlikely(!skb))
+ return;
+
+ flexcan_irq_err_state(dev, cf, new_state);
+ netif_rx(skb);
+
+ dev->stats.rx_packets++;
+ dev->stats.rx_bytes += cf->can_dlc;
+
+ priv->can.state = new_state;
+}
+
+static irqreturn_t flexcan_irq(int irq, void *dev_id)
+{
+ struct net_device *dev = dev_id;
+ struct net_device_stats *stats = &dev->stats;
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg_iflag1, reg_esr;
+
+ reg_iflag1 = readl(&regs->iflag1);
+ reg_esr = readl(&regs->esr);
+
+ /* receive or error interrupt -> napi */
+ if ((reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) ||
+ (reg_esr & FLEXCAN_ESR_ERR_FRAME)) {
+ /*
+ * The error bits are cleared on read,
+ * save for later use.
+ */
+ priv->reg_esr = reg_esr;
+ writel(FLEXCAN_IFLAG_DEFAULT & ~FLEXCAN_IFLAG_RX_FIFO_AVAILABLE,
+ &regs->imask1);
+ writel(priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_MSK,
+ &regs->ctrl);
+ napi_schedule(&priv->napi);
+ }
+
+ /* FIFO overflow */
+ if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_OVERFLOW) {
+ writel(FLEXCAN_IFLAG_RX_FIFO_OVERFLOW, &regs->iflag1);
+ dev->stats.rx_over_errors++;
+ dev->stats.rx_errors++;
+ }
+
+ /* transmission complete interrupt */
+ if (reg_iflag1 & (1 << FLEXCAN_TX_BUF_ID)) {
+ stats->tx_packets++;
+ writel((1 << FLEXCAN_TX_BUF_ID), &regs->iflag1);
+ netif_wake_queue(dev);
+ }
+
+ /* handle state changes */
+ flexcan_irq_err(dev);
+
+ return IRQ_HANDLED;
+}
+
+static void flexcan_set_bittiming(struct net_device *dev)
+{
+ const struct flexcan_priv *priv = netdev_priv(dev);
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg;
+
+ reg = readl(&regs->ctrl);
+ reg &= ~(FLEXCAN_CTRL_PRESDIV(0xff) |
+ FLEXCAN_CTRL_RJW(0x3) |
+ FLEXCAN_CTRL_PSEG1(0x7) |
+ FLEXCAN_CTRL_PSEG2(0x7) |
+ FLEXCAN_CTRL_PROPSEG(0x7) |
+ FLEXCAN_CTRL_LPB |
+ FLEXCAN_CTRL_SMP |
+ FLEXCAN_CTRL_LOM);
+
+ reg |= FLEXCAN_CTRL_PRESDIV(bt->brp - 1) |
+ FLEXCAN_CTRL_PSEG1(bt->phase_seg1 - 1) |
+ FLEXCAN_CTRL_PSEG2(bt->phase_seg2 - 1) |
+ FLEXCAN_CTRL_RJW(bt->sjw - 1) |
+ FLEXCAN_CTRL_PROPSEG(bt->prop_seg - 1);
+
+ if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK)
+ reg |= FLEXCAN_CTRL_LPB;
+ if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
+ reg |= FLEXCAN_CTRL_LOM;
+ if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
+ reg |= FLEXCAN_CTRL_SMP;
+
+ dev_info(dev->dev.parent, "writing ctrl=0x%08x\n", reg);
+ writel(reg, &regs->ctrl);
+
+ /* print chip status */
+ dev_dbg(dev->dev.parent, "%s: mcr=0x%08x ctrl=0x%08x\n", __func__,
+ readl(&regs->mcr), readl(&regs->ctrl));
+}
+
+/*
+ * flexcan_chip_start
+ *
+ * this function is entered with clocks enabled
+ *
+ */
+static int flexcan_chip_start(struct net_device *dev)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ unsigned int i;
+ int err;
+ u32 reg_mcr, reg_ctrl;
+
+ /* enable module */
+ flexcan_chip_enable(priv);
+
+ /* soft reset */
+ writel(FLEXCAN_MCR_SOFTRST, &regs->mcr);
+ udelay(10);
+
+ reg_mcr = readl(&regs->mcr);
+ if (reg_mcr & FLEXCAN_MCR_SOFTRST) {
+ dev_err(dev->dev.parent,
+ "Failed to softreset can module (mcr=0x%08x)\n", reg_mcr);
+ err = -ENODEV;
+ goto out;
+ }
+
+ flexcan_set_bittiming(dev);
+
+ /*
+ * MCR
+ *
+ * enable freeze
+ * enable fifo
+ * halt now
+ * only supervisor access
+ * enable warning int
+ * choose format C
+ *
+ */
+ reg_mcr = readl(&regs->mcr);
+ reg_mcr |= FLEXCAN_MCR_FRZ | FLEXCAN_MCR_FEN | FLEXCAN_MCR_HALT |
+ FLEXCAN_MCR_SUPV | FLEXCAN_MCR_WRN_EN |
+ FLEXCAN_MCR_IDAM_C;
+ dev_dbg(dev->dev.parent, "%s: writing mcr=0x%08x", __func__, reg_mcr);
+ writel(reg_mcr, &regs->mcr);
+
+ /*
+ * CTRL
+ *
+ * enable bus off interrupt
+ * disable auto busoff recovery
+ * enable tx and rx warning interrupt
+ * transmit lowest buffer first
+ */
+ reg_ctrl = readl(&regs->ctrl);
+ reg_ctrl |= FLEXCAN_CTRL_BOFF_MSK | FLEXCAN_CTRL_BOFF_REC |
+ FLEXCAN_CTRL_TWRN_MSK | FLEXCAN_CTRL_RWRN_MSK |
+ FLEXCAN_CTRL_LBUF;
+ /*
+ * TODO: for now turn on the error interrupt, otherwise we
+ * don't get any warning or bus passive interrupts.
+ */
+ reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
+
+ /* save for later use */
+ priv->reg_ctrl_default = reg_ctrl;
+ dev_dbg(dev->dev.parent, "%s: writing ctrl=0x%08x", __func__, reg_ctrl);
+ writel(reg_ctrl, &regs->ctrl);
+
+ for (i = 0; i < ARRAY_SIZE(regs->cantxfg); i++) {
+ writel(0, &regs->cantxfg[i].can_ctrl);
+ writel(0, &regs->cantxfg[i].can_id);
+ writel(0, &regs->cantxfg[i].data[0]);
+ writel(0, &regs->cantxfg[i].data[1]);
+
+ /* put MB into rx queue */
+ writel(FLEXCAN_MB_CNT_CODE(0x4), &regs->cantxfg[i].can_ctrl);
+ }
+
+ /* acceptance mask/acceptance code (accept everything) */
+ writel(0x0, &regs->rxgmask);
+ writel(0x0, &regs->rx14mask);
+ writel(0x0, &regs->rx15mask);
+
+ flexcan_transceiver_switch(priv, 1);
+
+ /* synchronize with the can bus */
+ reg_mcr = readl(&regs->mcr);
+ reg_mcr &= ~FLEXCAN_MCR_HALT;
+ writel(reg_mcr, &regs->mcr);
+
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
+
+ /* enable FIFO interrupts */
+ writel(FLEXCAN_IFLAG_DEFAULT, &regs->imask1);
+
+ /* print chip status */
+ dev_dbg(dev->dev.parent, "%s: mcr=0x%08x ctrl=0x%08x\n", __func__,
+ readl(&regs->mcr), readl(&regs->ctrl));
+
+ return 0;
+
+ out:
+ flexcan_chip_disable(priv);
+ return err;
+}
+
+/*
+ * flexcan_chip_stop
+ *
+ * this function is entered with clocks enabled
+ *
+ */
+static void flexcan_chip_stop(struct net_device *dev)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg;
+
+ /* Disable all interrupts */
+ writel(0, &regs->imask1);
+
+ /* Disable + halt module */
+ reg = readl(&regs->mcr);
+ reg |= FLEXCAN_MCR_MDIS | FLEXCAN_MCR_HALT;
+ writel(reg, &regs->mcr);
+
+ flexcan_transceiver_switch(priv, 0);
+ priv->can.state = CAN_STATE_STOPPED;
+
+ return;
+}
+
+static int flexcan_open(struct net_device *dev)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ int err;
+
+ clk_enable(priv->clk);
+
+ err = open_candev(dev);
+ if (err)
+ goto out;
+
+ err = request_irq(dev->irq, flexcan_irq, IRQF_SHARED, dev->name, dev);
+ if (err)
+ goto out_close;
+
+ /* start chip and queuing */
+ err = flexcan_chip_start(dev);
+ if (err)
+ goto out_close;
+ napi_enable(&priv->napi);
+ netif_start_queue(dev);
+
+ return 0;
+
+ out_close:
+ close_candev(dev);
+ out:
+ clk_disable(priv->clk);
+
+ return err;
+}
+
+static int flexcan_close(struct net_device *dev)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+
+ netif_stop_queue(dev);
+ napi_disable(&priv->napi);
+ flexcan_chip_stop(dev);
+
+ free_irq(dev->irq, dev);
+ clk_disable(priv->clk);
+
+ close_candev(dev);
+
+ return 0;
+}
+
+static int flexcan_set_mode(struct net_device *dev, enum can_mode mode)
+{
+ int err;
+
+ switch (mode) {
+ case CAN_MODE_START:
+ err = flexcan_chip_start(dev);
+ if (err)
+ return err;
+
+ netif_wake_queue(dev);
+ break;
+
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static const struct net_device_ops flexcan_netdev_ops = {
+ .ndo_open = flexcan_open,
+ .ndo_stop = flexcan_close,
+ .ndo_start_xmit = flexcan_start_xmit,
+};
+
+static int __devinit register_flexcandev(struct net_device *dev)
+{
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct flexcan_regs __iomem *regs = priv->base;
+ u32 reg, err;
+
+ clk_enable(priv->clk);
+
+ /* select "bus clock", chip must be disabled */
+ flexcan_chip_disable(priv);
+ reg = readl(&regs->ctrl);
+ reg |= FLEXCAN_CTRL_CLK_SRC;
+ writel(reg, &regs->ctrl);
+
+ flexcan_chip_enable(priv);
+
+ /* set freeze, halt and activate FIFO, restrict register access */
+ reg = readl(&regs->mcr);
+ reg |= FLEXCAN_MCR_FRZ | FLEXCAN_MCR_HALT |
+ FLEXCAN_MCR_FEN | FLEXCAN_MCR_SUPV;
+ writel(reg, &regs->mcr);
+
+ /*
+ * Currently we only support newer versions of this core
+ * featuring a RX FIFO. Older cores found on some Coldfire
+ * derivatives are not yet supported.
+ */
+ reg = readl(&regs->mcr);
+ if (!(reg & FLEXCAN_MCR_FEN)) {
+ dev_err(dev->dev.parent,
+ "Could not enable RX FIFO, unsupported core\n");
+ err = -ENODEV;
+ goto out;
+ }
+
+ err = register_candev(dev);
+
+ out:
+ /* disable core and turn off clocks */
+ flexcan_chip_disable(priv);
+ clk_disable(priv->clk);
+
+ return err;
+}
+
+static void __devexit unregister_flexcandev(struct net_device *dev)
+{
+ unregister_candev(dev);
+}
+
+static int __devinit flexcan_probe(struct platform_device *pdev)
+{
+ struct net_device *dev;
+ struct flexcan_priv *priv;
+ struct resource *mem;
+ struct clk *clk;
+ void __iomem *base;
+ resource_size_t mem_size;
+ int err, irq;
+
+ clk = clk_get(&pdev->dev, NULL);
+ if (IS_ERR(clk)) {
+ dev_err(&pdev->dev, "no clock defined\n");
+ err = PTR_ERR(clk);
+ goto failed_clock;
+ }
+
+ mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ irq = platform_get_irq(pdev, 0);
+ if (!mem || irq <= 0) {
+ err = -ENODEV;
+ goto failed_get;
+ }
+
+ mem_size = resource_size(mem);
+ if (!request_mem_region(mem->start, mem_size, pdev->name)) {
+ err = -EBUSY;
+ goto failed_req;
+ }
+
+ base = ioremap(mem->start, mem_size);
+ if (!base) {
+ err = -ENOMEM;
+ goto failed_map;
+ }
+
+ dev = alloc_candev(sizeof(struct flexcan_priv), 0);
+ if (!dev) {
+ err = -ENOMEM;
+ goto failed_alloc;
+ }
+
+ dev->netdev_ops = &flexcan_netdev_ops;
+ dev->irq = irq;
+ dev->flags |= IFF_ECHO; /* we support local echo in hardware */
+
+ priv = netdev_priv(dev);
+ priv->can.clock.freq = clk_get_rate(clk);
+ priv->can.bittiming_const = &flexcan_bittiming_const;
+ priv->can.do_set_mode = flexcan_set_mode;
+ priv->can.do_get_berr_counter = flexcan_get_berr_counter;
+ priv->can.ctrlmode_supported = CAN_CTRLMODE_LOOPBACK |
+ CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_3_SAMPLES;
+ priv->base = base;
+ priv->dev = dev;
+ priv->clk = clk;
+ priv->pdata = pdev->dev.platform_data;
+
+ netif_napi_add(dev, &priv->napi, flexcan_poll, FLEXCAN_NAPI_WEIGHT);
+
+ dev_set_drvdata(&pdev->dev, dev);
+ SET_NETDEV_DEV(dev, &pdev->dev);
+
+ err = register_flexcandev(dev);
+ if (err) {
+ dev_err(&pdev->dev, "registering netdev failed\n");
+ goto failed_register;
+ }
+
+ dev_info(&pdev->dev, "device registered (reg_base=%p, irq=%d)\n",
+ priv->base, dev->irq);
+
+ return 0;
+
+ failed_register:
+ free_candev(dev);
+ failed_alloc:
+ iounmap(base);
+ failed_map:
+ release_mem_region(mem->start, mem_size);
+ failed_req:
+ clk_put(clk);
+ failed_get:
+ failed_clock:
+ return err;
+}
+
+static int __devexit flexcan_remove(struct platform_device *pdev)
+{
+ struct net_device *dev = platform_get_drvdata(pdev);
+ struct flexcan_priv *priv = netdev_priv(dev);
+ struct resource *mem;
+
+ unregister_flexcandev(dev);
+ platform_set_drvdata(pdev, NULL);
+ free_candev(dev);
+ iounmap(priv->base);
+
+ mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ release_mem_region(mem->start, resource_size(mem));
+
+ clk_put(priv->clk);
+
+ return 0;
+}
+
+static struct platform_driver flexcan_driver = {
+ .driver.name = DRV_NAME,
+ .probe = flexcan_probe,
+ .remove = __devexit_p(flexcan_remove),
+};
+
+static int __init flexcan_init(void)
+{
+ pr_info("%s netdevice driver\n", DRV_NAME);
+ return platform_driver_register(&flexcan_driver);
+}
+
+static void __exit flexcan_exit(void)
+{
+ platform_driver_unregister(&flexcan_driver);
+ pr_info("%s: driver removed\n", DRV_NAME);
+}
+
+module_init(flexcan_init);
+module_exit(flexcan_exit);
+
+MODULE_AUTHOR("Sascha Hauer <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>, "
+ "Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("CAN port driver for flexcan based chip");
diff --git a/include/linux/can/platform/flexcan.h b/include/linux/can/platform/flexcan.h
new file mode 100644
index 0000000..72b713a
--- /dev/null
+++ b/include/linux/can/platform/flexcan.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2010 Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
+ *
+ * This file is released under the GPLv2
+ *
+ */
+
+#ifndef __CAN_PLATFORM_FLEXCAN_H
+#define __CAN_PLATFORM_FLEXCAN_H
+
+/**
+ * struct flexcan_platform_data - flex CAN controller platform data
+ * @transceiver_switch: - called to power on/off the transceiver
+ *
+ */
+struct flexcan_platform_data {
+ void (*transceiver_switch)(int enable);
+};
+
+#endif /* __CAN_PLATFORM_FLEXCAN_H */
--
1.7.1
^ permalink raw reply related
* Re: Raise initial congestion window size / speedup slow start?
From: Ed W @ 2010-07-14 22:05 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Rick Jones, David Miller, davidsen, linux-kernel, netdev
In-Reply-To: <20100714203919.GD6682@nuttenaction>
On 14/07/2010 21:39, Hagen Paul Pfeifer wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>
>> There is an effort under way, led by some folks at Google and
>> including some others, to get the RFCs enhanced in support of the
>> concept of larger initial congestion windows. Some of the discussion
>> may be in the "tcpm" mailing list (assuming I've not gotten my
>> mailing lists confused). There may be some previous discussion of
>> that work in the netdev archives as well.
>>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because it is not a standardization issue, rather it is a
> technical issue. You cannot raise the initial CWND and expect fair behavior.
> This was discussed several times and is documented in several documents and
> RFCs.
>
I'm sure you have covered this to the point you are fed up, but my
searches turn up only a smattering of posts covering this - could you
summarise why "you cannot raise the initial cwnd and expect a fair
behaviour"?
Initial cwnd was changed (increased) in the past (rfc3390) and the RFC
claims that studies then suggested that the benefits were all positive.
Some reasonably smart people have suggested that it might be time to
review the status quo again so it doesn't seem completely obvious that
the current number is optimal?
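To put rough numbers on why the initial window matters on a high-RTT
path, here is a minimal sketch (not taken from any kernel code) that
counts round trips in an idealised slow start. It assumes the window
doubles every RTT with no loss, no delayed ACKs and no receive window
limit; slow_start_rtts() is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Count the round trips an idealised slow start needs to deliver
 * `segments` full-sized segments, starting from `initcwnd` and doubling
 * the window every RTT. Assumption: no loss, no delayed ACKs, no
 * receive window limit, so real connections will be less tidy.
 */
static unsigned int slow_start_rtts(unsigned int segments,
				    unsigned int initcwnd)
{
	unsigned int cwnd = initcwnd;
	unsigned int sent = 0;
	unsigned int rtts = 0;

	assert(initcwnd > 0);

	while (sent < segments) {
		sent += cwnd;	/* one full window per round trip */
		cwnd *= 2;	/* exponential growth during slow start */
		rtts++;
	}
	return rtts;
}
```

For a 45-segment response (roughly 64 KB at a 1460-byte MSS) this model
gives 4 round trips with an initial window of 4 and 3 round trips with
an initial window of 10, so the saving is about one RTT per transfer of
that size.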
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pops up every two months in netdev and until now I have _never_ read a
> consolidated contribution.
>
Sorry, what do you mean by a "consolidated contribution"?
That RFC is a subtle read - it appears to give more specific guidance on
what to do in certain situations, but I'm not sure I see that it
improves slow start convergence speed for my situation (large RTT)?
Would you mind highlighting the new bits for those of us a bit newer to
the subject?
> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>
Oh, excellent. This seems like exactly what I'm after. (Thanks Stephen
Hemminger!)
Many thanks
Ed W
^ permalink raw reply
* Re: [PATCH 05/11] tulip: formatting of pointers in printk()
From: David Miller @ 2010-07-14 22:07 UTC (permalink / raw)
To: segooon-Re5JQEeQqe8AvxtiuMwx3w
Cc: grundler-6jwH94ZQLHl74goWV3ctuw, jpirko-H+wXaHxf7aLQT0dZR+AlfA,
netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
kernel-janitors-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kyle-pfcGkIkfWfAsA/PxXw9srA,
joe-6d6DIl74uiNBDgjK7y7TUQ, ben-/+tVBieCtBitmTQ+vhA3Yw
In-Reply-To: <1279130568-10857-1-git-send-email-segooon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
From: Kulikov Vasiliy <segooon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wed, 14 Jul 2010 22:02:47 +0400
> Use %p instead of %08x in printk().
>
> Signed-off-by: Kulikov Vasiliy <segooon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Since patch #5 and #6 are doing the same change to different
files in the same driver, I combined them into one commit.
There is no need to split things up with so much granularity.
If it's all in the same driver, doing the same transformation,
keep it all in one patch.
Thanks.
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Hagen Paul Pfeifer @ 2010-07-14 22:13 UTC (permalink / raw)
To: David Miller; +Cc: rick.jones2, lists, davidsen, linux-kernel, netdev
In-Reply-To: <20100714.145547.102555471.davem@davemloft.net>
* David Miller | 2010-07-14 14:55:47 [-0700]:
>Although section 3 of RFC 5681 is a great text, it does not say at all
>that increasing the initial CWND would lead to fairness issues.
Because it is only one side of the coin; conservatively probing the available
link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
instances is another.
>To be honest, I think google's proposal holds a lot of weight. If
>over time link sizes and speeds are increasing (they are) then nudging
>the initial CWND every so often is a legitimate proposal. Were
>someone to claim that utilization is lower than it could be because of
>the currently specified initial CWND, I would have no problem
>believing them.
>
>And I'm happy to make Linux use an increased value once it has
>traction in the standardization community.
Currently I know of no working link capacity probing approach that, without
active network feedback, conservatively probes the available link capacity with
a high CWND. I am curious about any future trends.
>But for all we know this side discussion about initial CWND settings
>could have nothing to do with the issue being reported at the start of
>this thread. :-)
;-) sure, but it is often wise to thwart these kind of discussions. It seems
these CWND discussions turn up once every other month. ;-)
Hagen
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Rick Jones @ 2010-07-14 22:19 UTC (permalink / raw)
To: Hagen Paul Pfeifer; +Cc: David Miller, lists, davidsen, linux-kernel, netdev
In-Reply-To: <20100714221301.GI6682@nuttenaction>
Hagen Paul Pfeifer wrote:
> * David Miller | 2010-07-14 14:55:47 [-0700]:
>>But for all we know this side discussion about initial CWND settings
>>could have nothing to do with the issue being reported at the start of
>>this thread. :-)
>
>
> ;-) sure, but it is often wise to thwart these kind of discussions. It seems
> these CWND discussions turn up once every other month. ;-)
Which suggests there is a constant "force" out there yet to be reckoned with. :)
rick jones
^ permalink raw reply
* [PATCH] bonding: fix a buffer overflow in bonding_show_queue_id.
From: Nicolas de Pesloüan @ 2010-07-14 22:24 UTC (permalink / raw)
To: bonding-devel, andy, fubar, davem, netdev; +Cc: Nicolas de Pesloüan
The test for buffer overflow ensures we have room for 6 more bytes.
sprintf, called with %s:%d, slave->dev->name, slave->queue_id may yield
far more than 6 bytes.
The correct test is res > (PAGE_SIZE - IFNAMSIZ - 6) .
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
---
drivers/net/bonding/bond_sysfs.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
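As a rough illustration of the arithmetic in the description above, the
following user-space sketch models the two tests. It is not the bonding
code: DEMO_PAGE_SIZE and DEMO_IFNAMSIZ are stand-ins for PAGE_SIZE and
IFNAMSIZ, the helper names are hypothetical, and the format string is
the "%s:%d" quoted in the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_PAGE_SIZE	4096u	/* stand-in for PAGE_SIZE */
#define DEMO_IFNAMSIZ	16u	/* stand-in for IFNAMSIZ */

/* Bytes one "name:queue_id" entry occupies, excluding the trailing NUL. */
static size_t entry_len(const char *name, unsigned int queue_id)
{
	int n = snprintf(NULL, 0, "%s:%d", name, (int)queue_id);

	assert(n >= 0);
	return (size_t)n;
}

/* Old test: keep appending while res <= PAGE_SIZE - 6. */
static int old_check_allows(size_t res)
{
	return res <= DEMO_PAGE_SIZE - 6;
}

/* New test: keep appending while res <= PAGE_SIZE - IFNAMSIZ - 6. */
static int new_check_allows(size_t res)
{
	return res <= DEMO_PAGE_SIZE - DEMO_IFNAMSIZ - 6;
}

/* True if the entry plus its NUL, written at offset res, stays in the page. */
static int entry_fits(size_t res, const char *name, unsigned int queue_id)
{
	return res + entry_len(name, queue_id) + 1 <= DEMO_PAGE_SIZE;
}
```

With a 15-character name and queue id 65535 the entry is 21 bytes. The
old test still allows an append at offset 4090, where that entry does
not fit; the new test stops allowing appends past offset 4074, where it
fits exactly.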
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index f9a0343..1a99764 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1427,8 +1427,8 @@ static ssize_t bonding_show_queue_id(struct device *d,
read_lock(&bond->lock);
bond_for_each_slave(bond, slave, i) {
- if (res > (PAGE_SIZE - 6)) {
- /* not enough space for another interface name */
+ if (res > (PAGE_SIZE - IFNAMSIZ - 6)) {
+ /* not enough space for another interface_name:queue_id pair */
if ((PAGE_SIZE - res) > 10)
res = PAGE_SIZE - 10;
res += sprintf(buf + res, "++more++ ");
--
1.7.1
^ permalink raw reply related
* Re: [PATCH] net: skb_tx_hash() fix relative to skb_orphan_try()
From: David Miller @ 2010-07-14 22:33 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1279034660.2634.439.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Jul 2010 17:24:20 +0200
> commit fc6055a5ba31e2 (net: Introduce skb_orphan_try()) added early
> orphaning of skbs.
>
> This unfortunately added a performance regression in skb_tx_hash() in
> case of stacked devices (bonding, vlans, ...)
>
> Since skb->sk is now NULL, we cannot access sk->sk_hash anymore to
> spread tx packets to multiple NIC queues on multiqueue devices.
>
> skb_tx_hash() in this case only uses skb->protocol, same value for all
> flows.
>
> skb_orphan_try() can copy sk->sk_hash into skb->rxhash and skb_tx_hash()
> can use this saved sk_hash value to compute its internal hash value.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Hagen Paul Pfeifer @ 2010-07-14 22:36 UTC (permalink / raw)
To: Ed W; +Cc: Rick Jones, David Miller, davidsen, linux-kernel, netdev
In-Reply-To: <4C3E34AB.2060405@wildgooses.com>
* Ed W | 2010-07-14 23:05:31 [+0100]:
>Initial cwnd was changed (increased) in the past (rfc3390) and the
>RFC claims that studies then suggested that the benefits were all
>positive. Some reasonably smart people have suggested that it might
>be time to review the status quo again so it doesn't seem completely
>obvious that the current number is optimal?
Do you cite "An Argument for Increasing TCP's Initial Congestion Window"?
People at google stated that a CWND of 10 seems to be fair in their
measurements. 10 because the test setup was equipped with a reasonably large
link capacity? Do they analyse their modification in environments with a small
BDP (e.g. multihop MANET setup, ...)? I am curious, but we will see what
happens if TCPM adopts this.
>That RFC is a subtle read - it appears to give more specific guidance
>on what to do in certain situations, but I'm not sure I see that it
>improves slow start convergence speed for my situation (large RTT)?
>Would you mind highlighting the new bits for those of us a bit newer
>to the subject?
The objection/hint was of a more general nature - not specific to larger RTTs.
Environments with larger RTTs are disadvantaged because TCP is ACK clocked.
That is a half-truth on my part, because RTT fairness is and was an issue in
the development of new congestion control algorithms: BIC, CUBIC and friends.
>>Partial local issues can already be "fixed" via route specific ip options -
>>see initcwnd.
>
>Oh, excellent. This seems like exactly what I'm after. (Thanks
>Stephen Hemminger!)
Great, you are welcome! ;-)
Hagen
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Hagen Paul Pfeifer @ 2010-07-14 22:40 UTC (permalink / raw)
To: Rick Jones; +Cc: David Miller, lists, davidsen, linux-kernel, netdev
In-Reply-To: <4C3E37F7.3020607@hp.com>
* Rick Jones | 2010-07-14 15:19:35 [-0700]:
>>;-) sure, but it is often wise to thwart these kind of discussions. It seems
>>these CWND discussions turn up once every other month. ;-)
>
>Which suggests there is a constant "force" out there yet to be reckoned with. :)
;-) I am _not_ unconscious, but the better address for this kind of
discussions is still tcpm.
Hagen
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Ed W @ 2010-07-14 22:52 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: David Miller, rick.jones2, davidsen, linux-kernel, netdev
In-Reply-To: <20100714221301.GI6682@nuttenaction>
>> Although section 3 of RFC 5681 is a great text, it does not say at all
>> that increasing the initial CWND would lead to fairness issues.
>>
> Because it is only one side of the coin; conservatively probing the available
> link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
> instances is another.
>
So let's define the problem more succinctly:
- New TCP connections are assumed to have no knowledge of current
network conditions (bah)
- We desire the connection to consume the maximum amount of bandwidth
possible, but staying ever so fractionally under the maximum link bandwidth
> Currently I know of no working link capacity probing approach that, without
> active network feedback, conservatively probes the available link capacity with
> a high CWND. I am curious about any future trends.
>
Sounds like smarter people than I have played this game, but just to
chuck out one idea: How about attacking the idea that we have no
knowledge of network conditions? After all we have a bunch of
information about:
1) very good information about the size of the link to the first hop (eg
the modem/network card reported rate)
2) often a reasonably good idea about the bandwidth to the first
"restrictive" router along our default path (ie usually the situation is
there is a pool of high speed network locally, then a more limited
connectivity between our network and other networks. We can look at the
maximum flows through our network device to outside our subnet and infer
an approximate link speed from that)
3) often moderate quality information about the size of the link between
us and a specific destination IP
So here goes: the heuristic could be to examine current flows through
our interface, use this to offer hints to the remote end during SYN
handshake as to a recommended starting size, and additionally the client
side can examine the implied RTT of the SYN/ACK to further fine tune the
initial cwnd?
In practice this could be implemented in other ways such as examining
recent TCP congestion windows and using some heuristic to start "near"
those. Or remembering congestion windows recently used for popular
destinations? Also we can benefit the receiver of our data - if we see
some app open up 16 http connections to some poor server then some of
those connections will NOT be given large initial cwnd.
Essentially perhaps we can refine our initial cwnd heuristic somewhat if
we assume better than zero knowledge about the network link?
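One way to picture such a heuristic, purely as a sketch: estimate the
bandwidth-delay product from a bandwidth guess and the measured SYN/ACK
RTT, convert it to segments, and clamp the result. Using the BDP here is
an assumption of this sketch rather than something proposed in the
thread, suggest_initcwnd() and its parameters are hypothetical, and the
fairness concerns raised earlier in the thread are not addressed by it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical heuristic: suggest an initial cwnd (in segments) from an
 * estimated path bandwidth and a measured RTT, clamped to
 * [min_cwnd, max_cwnd]. Assumption: the estimates are trustworthy,
 * which is exactly the hard part in practice. Very large bandwidth
 * values could overflow the intermediate product; this sketch ignores
 * that case.
 */
static uint32_t suggest_initcwnd(uint64_t bw_bits_per_sec, uint32_t rtt_ms,
				 uint32_t mss_bytes,
				 uint32_t min_cwnd, uint32_t max_cwnd)
{
	uint64_t bdp_bytes;
	uint64_t segs;

	assert(mss_bytes > 0);
	assert(min_cwnd <= max_cwnd);

	/* bandwidth-delay product in bytes */
	bdp_bytes = bw_bits_per_sec / 8 * rtt_ms / 1000;
	segs = bdp_bytes / mss_bytes;

	if (segs < min_cwnd)
		return min_cwnd;
	if (segs > max_cwnd)
		return max_cwnd;
	return (uint32_t)segs;
}
```

For example, a 1 Mbit/s path with a 600 ms RTT and a 1460-byte MSS has
a BDP of 75000 bytes, or 51 segments, so the result depends almost
entirely on where the upper clamp is set.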
Out of curiosity, why has it taken so long for active feedback to
appear? If every router simply added a hint to the packet as to the max
bandwidth it can offer then we would appear to be able to make massively
better decisions on window sizes. Furthermore routers have the ability
to put backpressure on classes of traffic as appropriate. I guess the
speed at which ECN has been adopted answers the question of why nothing
more exotic has appeared?
>> But for all we know this side discussion about initial CWND settings
>> could have nothing to do with the issue being reported at the start of
>> this thread. :-)
>>
Actually the original question was mine and it was literally - can I
adjust the initial cwnd for users of my very specific satellite network
which has a high RTT. I believe Stephen Hemminger has been kind enough
to recently add the facility to experiment with this to the ip utility
and so I am now in a position to go do some testing - thanks Stephen
Cheers
Ed W
^ permalink raw reply
* RE: [REGRESSION] e1000e stopped working [MANUALLY BISECTED]
From: Tantilov, Emil S @ 2010-07-14 22:56 UTC (permalink / raw)
To: Maxim Levitsky
Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org, Allan, Bruce W,
Pieper, Jeffrey E
In-Reply-To: <1278981483.23017.4.camel@localhost.localdomain>
Maxim Levitsky wrote:
> On Mon, 2010-07-12 at 15:23 -0600, Tantilov, Emil S wrote:
>> Maxim Levitsky wrote:
>>> On Mon, 2010-07-05 at 12:58 +0300, Maxim Levitsky wrote:
>>>> On Mon, 2010-07-05 at 01:13 -0700, Jeff Kirsher wrote:
>>>>> On Sun, Jul 4, 2010 at 15:48, Maxim Levitsky
>>>>> <maximlevitsky@gmail.com> wrote:
>>>>>> Did few guesses, and now I see that reverting the below commit
>>>>>> fixes the problem.
>>>>>>
>>>>>> "e1000e: Fix/cleanup PHY reset code for ICHx/PCHx"
>>>>>> e98cac447cc1cc418dff1d610a5c79c4f2bdec7f.
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Maxim Levitsky
>>>>>>
>>>>>> --
>>>>>
>>>>> Can you give us till Tuesday to respond? I know that there are
>>>>> some additional e1000e patches in my queue, which may resolve the
>>>>> issue, but this weekend the power is down to do some
>>>>> infrastructure upgrades which prevents us from doing any
>>>>> investigation/debugging until Tuesday.
>>>>>
>>>>
>>>> Sure.
>>>>
>>>> Best regards,
>>>> Maxim Levitsky
>>>>
>>>
>>> Updates?
>>
>> We are working on reproducing the issue. So far we have not seen the
>> problem when testing with net-next.
>>
>> I asked in previous email about some additional info from ethtool
>> (-d, -e, -S) and kernel config. That would help us to narrow it
>> down.
>>
>> Thanks,
>> Emil
> I did send -e and -d output.
Sorry, looks like I lost the email with the attachments.
Could you provide the output of dmesg after the failure occurs?
> Since you probably want -S output during failure, I need to recompile
> kernel for that. I will do that soon.
>
>
> One question, in two weeks I hope 2.6.35 won't be released?
> If so, I will have enough free time then to narrow down this issue.
>
> Other solution, is to revert this commit.
> (I have never seen this problem with it reverted).
We have been running reboot tests on 2 separate systems with recent net-next kernels
using your config and so far no luck in reproducing this issue.
What is the make/model of your system (or MB)?
Thanks,
Emil
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Hagen Paul Pfeifer @ 2010-07-14 23:01 UTC (permalink / raw)
To: Ed W; +Cc: David Miller, rick.jones2, davidsen, linux-kernel, netdev
In-Reply-To: <4C3E3F92.2090506@wildgooses.com>
* Ed W | 2010-07-14 23:52:02 [+0100]:
>Out of curiosity, why has it taken so long for active feedback to
>appear? If every router simply added a hint to the packet as to the
>max bandwidth it can offer then we would appear to be able to make
>massively better decisions on window sizes. Furthermore routers have
>the ability to put backpressure on classes of traffic as appropriate.
>I guess the speed at which ECN has been adopted answers the question
>of why nothing more exotic has appeared?
It is quite late here so I will quickly write two sentences about ECN: one
month ago Lars Eggert posted a link on the tcpm mailing list where google (not
really sure if it was google) analysed the deployment of ECN - the usage was
really low. Search for the PDF, it is quite an interesting one.
Hagen
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Ed W @ 2010-07-14 23:01 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Rick Jones, David Miller, davidsen, linux-kernel, netdev
In-Reply-To: <20100714223633.GJ6682@nuttenaction>
> Do you cite "An Argument for Increasing TCP's Initial Congestion Window"?
> People at google stated that a CWND of 10 seems to be fair in their
> measurements. 10 because the test setup was equipped with a reasonably large
> link capacity? Do they analyse their modification in environments with a small
> BDP (e.g. multihop MANET setup, ...)? I am curious, but we will see what
> happens if TCPM adopts this.
>
Well, I personally would shoot for starting from the position of
assuming better than zero knowledge about our link and incorporating
that into the initial cwnd estimate...
We know something about the RTT from the syn/ack times, speed of the
local link and quickly we will learn about median window sizes to other
destinations, plus additionally the kernel has some knowledge of other
connections currently in progress. With all that information perhaps we
can make a more informed choice than just a hard coded magic number? (Oh
and let's make the option pluggable so that we can soon have 10 different
kernel options...)
Seems like there is evidence that networks are starting to cluster into
groups that would benefit from a range of cwnd options (higher/lower) -
perhaps there is some way to choose a reasonable heuristic to cluster
these and choose a better starting option?
Cheers
Ed W
^ permalink raw reply
* Re: Raise initial congestion window size / speedup slow start?
From: Ed W @ 2010-07-14 23:05 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: David Miller, rick.jones2, davidsen, linux-kernel, netdev
In-Reply-To: <20100714230100.GL6682@nuttenaction>
On 15/07/2010 00:01, Hagen Paul Pfeifer wrote:
> It is quite late here so I will quickly write two sentences about ECN: one
> month ago Lars Eggert posted a link on the tcpm mailing list where google (not
> really sure if it was google) analysed the deployment of ECN - the usage was
> really low. Search for the PDF, it is quite an interesting one.
>
I would speculate that this is because there is a big warning on ECN
saying that it may cause you to lose customers who can't connect to
you... Businesses are driven by needing to support the most common case,
not the most optimal (witness the pain of html development and needing
to consider IE6...)
What would be more useful is for google to survey how many devices are
unable to interoperate with ECN and if that number turned out to be
extremely low, and this fact were advertised, then I suspect we might
see a mass increase in its deployment? I know I have it turned off on
all my servers because I worry more about losing one customer than
improving the experience for all customers...
Cheers
Ed W
^ permalink raw reply
* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Sridhar Samudrala @ 2010-07-14 23:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Oleg Nesterov, Peter Zijlstra, Tejun Heo, Ingo Molnar, netdev,
lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100713110939.GA3446@redhat.com>
On Tue, 2010-07-13 at 14:09 +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 12, 2010 at 11:59:08PM -0700, Sridhar Samudrala wrote:
> > On 7/4/2010 2:00 AM, Michael S. Tsirkin wrote:
> > >On Fri, Jul 02, 2010 at 11:06:37PM +0200, Oleg Nesterov wrote:
> > >>On 07/02, Peter Zijlstra wrote:
> > >>>On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> > >>>> Does it (Tejun's kthread_clone() patch) also inherit the
> > >>>>cgroup of the caller?
> > >>>Of course, its a simple do_fork() which inherits everything just as you
> > >>>would expect from a similar sys_clone()/sys_fork() call.
> > >>Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
> > >>from ioctl(), right?
> > >>
> > >>Then the new thread becomes the natural child of the caller, and it shares
> > >>->mm with the parent. And files, dup_fd() without CLONE_FS.
> > >>
> > >>Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
> > >>TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
> > >>just because the parent gets SIGQUIT or another coredumpable signal.
> > >>Or the new thread can receive SIGSTOP via ^Z.
> > >>
> > >>Perhaps this is OK, I do not know. Just to remind that kernel_thread()
> > >>is merely clone(CLONE_VM).
> > >>
> > >>Oleg.
> > >
> > >Right. Doing this might break things like flush. The signal and exit
> > >behaviour needs to be examined carefully. I am also unsure whether
> > >using such threads might be more expensive than inheriting kthreadd.
> > >
> > Should we just leave it to the userspace to set the cgroup/cpumask
> > after qemu starts the guest and
> > the vhost threads?
> >
> > Thanks
> > Sridhar
>
> Yes but we can't trust userspace to do this. It's important
> to do it on thread creation: if we don't, malicious userspace
> can create large amount of work exceeding the cgroup limits.
>
> And the same applies to the affinity: if the qemu process
> is limited to a set of CPUs, it's important to make
> the kernel thread that does work on our behalf limited to the same
> set of CPUs.
>
> This is not unique to vhost, it's just that virt scenarios are affected
> by this more: people seem to run untrusted applications and expect the
> damage to be contained.
OK. So we want to create a thread that is a child of kthreadd, but inherits the cgroup/cpumask
from the caller. How about an exported kthread function kthread_create_in_current_cg()
that does this?
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index aabc8a1..e0616f0 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -9,6 +9,9 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
const char namefmt[], ...)
__attribute__((format(printf, 3, 4)));
+struct task_struct *kthread_create_in_current_cg(int (*threadfn)(void *data),
+ void *data, char *name);
+
/**
* kthread_run - create and wake a thread.
* @threadfn: the function to run until signal_pending(current).
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 83911c7..ea4e737 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -15,6 +15,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <trace/events/sched.h>
+#include <linux/cgroup.h>
static DEFINE_SPINLOCK(kthread_create_lock);
static LIST_HEAD(kthread_create_list);
@@ -149,6 +150,42 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
}
EXPORT_SYMBOL(kthread_create);
+struct task_struct *kthread_create_in_current_cg(int (*threadfn)(void *data),
+ void *data, char *name)
+{
+ struct task_struct *worker;
+ cpumask_var_t mask;
+ int ret = -ENOMEM;
+
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ goto out_free_mask;
+
+ worker = kthread_create(threadfn, data, "%s-%d", name, current->pid);
+ if (IS_ERR(worker))
+ goto out_free_mask;
+
+ ret = sched_getaffinity(current->pid, mask);
+ if (ret)
+ goto out_stop_worker;
+
+ ret = sched_setaffinity(worker->pid, mask);
+ if (ret)
+ goto out_stop_worker;
+
+ ret = cgroup_attach_task_current_cg(worker);
+ if (ret)
+ goto out_stop_worker;
+
+ return worker;
+
+out_stop_worker:
+ kthread_stop(worker);
+out_free_mask:
+ free_cpumask_var(mask);
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(kthread_create_in_current_cg);
+
/**
* kthread_bind - bind a just-created kthread to a cpu.
* @p: thread created by kthread_create().
Thanks
Sridhar
^ permalink raw reply related
* RE: [REGRESSION] e1000e stopped working [MANUALLY BISECTED]
From: Maxim Levitsky @ 2010-07-14 23:33 UTC (permalink / raw)
To: Tantilov, Emil S
Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org, Allan, Bruce W,
Pieper, Jeffrey E
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A1365FF4B05AF9@rrsmsx501.amr.corp.intel.com>
On Wed, 2010-07-14 at 16:56 -0600, Tantilov, Emil S wrote:
> Maxim Levitsky wrote:
> > On Mon, 2010-07-12 at 15:23 -0600, Tantilov, Emil S wrote:
> >> Maxim Levitsky wrote:
> >>> On Mon, 2010-07-05 at 12:58 +0300, Maxim Levitsky wrote:
> >>>> On Mon, 2010-07-05 at 01:13 -0700, Jeff Kirsher wrote:
> >>>>> On Sun, Jul 4, 2010 at 15:48, Maxim Levitsky
> >>>>> <maximlevitsky@gmail.com> wrote:
> >>>>>> Did few guesses, and now I see that reverting the below commit
> >>>>>> fixes the problem.
> >>>>>>
> >>>>>> "e1000e: Fix/cleanup PHY reset code for ICHx/PCHx"
> >>>>>> e98cac447cc1cc418dff1d610a5c79c4f2bdec7f.
> >>>>>>
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Maxim Levitsky
> >>>>>>
> >>>>>> --
> >>>>>
> >>>>> Can you give us till Tuesday to respond? I know that there are
> >>>>> some additional e1000e patches in my queue, which may resolve the
> >>>>> issue, but this weekend the power is down to do some
> >>>>> infrastructure upgrades which prevents us from doing any
> >>>>> investigation/debugging until Tuesday.
> >>>>>
> >>>>
> >>>> Sure.
> >>>>
> >>>> Best regards,
> >>>> Maxim Levitsky
> >>>>
> >>>
> >>> Updates?
> >>
> >> We are working on reproducing the issue. So far we have not seen the
> >> problem when testing with net-next.
> >>
> >> I asked in previous email about some additional info from ethtool
> >> (-d, -e, -S) and kernel config. That would help us to narrow it
> >> down.
> >>
> >> Thanks,
> >> Emil
> > I did send -e and -d output.
>
> Sorry, looks like I lost the email with the attachments.
>
> Could you provide the output of dmesg after the failure occurs?
>
> > Since you probably want -S output during failure, I need to recompile
> > kernel for that. I will do that soon.
> >
> >
> > One question, in two weeks I hope 2.6.35 won't be released?
> > If so, I will have enough free time then to narrow down this issue.
> >
> > Other solution, is to revert this commit.
> > (I have never seen this problem with it reverted).
>
> We have been running reboot tests on 2 separate systems with recent net-next kernels
> using your config and so far no luck in reproducing this issue.
>
> What is the make/model of your system (or MB)?
the motherboard is Intel DG965RY.
However, I am using vanilla kernel.
net-next might contain further fixes.
I will see if net-next works here.
Best regards,
Maxim Levitsky
^ permalink raw reply
* multiqueue, skb_get_queue_mapping() and netdev_get_tx_queue()
From: Eldon Koyle @ 2010-07-14 23:13 UTC (permalink / raw)
To: netdev
It looks like there is a potential for an out of bounds index anywhere
skb_get_queue_mapping(skb) (which just returns skb->queue_mapping) is
used to get an index for netdev_get_tx_queue() (and probably other
places) on a device with multiple rx/tx queues.
As I understand it, skb->queue_mapping should contain rx_queue + 1,
which can be out of range for netdev_get_tx_queue (which expects a
0-based index).
Am I misunderstanding something, or should all of these occurrences be
replaced with something more like the following?
static inline u16 skb_get_queue_index(const struct sk_buff *skb)
{
return skb->queue_mapping ? skb->queue_mapping - 1 : 0;
}
Here is how it is commonly used (which looks incorrect to me):
In net/8021q/vlan_dev.c:
static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
int i = skb_get_queue_mapping(skb);
struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
...
And here is some other possibly pertinent code:
In include/linux/netdevice.h:
static inline
struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
unsigned int index)
{
return &dev->_tx[index];
}
In net/core/dev.c:
struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
		void (*setup)(struct net_device *), unsigned int queue_count)
	...
	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
	...
	dev->_tx = tx;
	...
In include/linux/skbuff.h:
static inline u16 skb_get_queue_mapping(const struct sk_buff *skb)
{
	return skb->queue_mapping;
}
...
static inline void skb_record_rx_queue(struct sk_buff *skb, u16 rx_queue)
{
	skb->queue_mapping = rx_queue + 1;
}
static inline u16 skb_get_rx_queue(const struct sk_buff *skb)
{
	return skb->queue_mapping - 1;
}
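The index arithmetic described above can be checked outside the kernel. What follows is only a user-space sketch, not kernel code: struct fake_skb, record_rx_queue(), get_queue_mapping(), get_queue_index() and index_in_range() are hypothetical stand-ins that model just the queue_mapping field and the helpers quoted in this message. It assumes the recorded rx value reaches the tx lookup unchanged, which is exactly the open question here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff; only queue_mapping is modelled. */
struct fake_skb {
	uint16_t queue_mapping;
};

/* Mirrors skb_record_rx_queue(): stores the rx queue as a 1-based value. */
static void record_rx_queue(struct fake_skb *skb, uint16_t rx_queue)
{
	skb->queue_mapping = (uint16_t)(rx_queue + 1);
}

/* Mirrors skb_get_queue_mapping(): returns the raw stored value. */
static uint16_t get_queue_mapping(const struct fake_skb *skb)
{
	return skb->queue_mapping;
}

/* The helper proposed in this message: 0-based index, 0 if nothing recorded. */
static uint16_t get_queue_index(const struct fake_skb *skb)
{
	return skb->queue_mapping ? (uint16_t)(skb->queue_mapping - 1) : 0;
}

/* Returns 1 if index is a valid slot in an array of n_queues entries. */
static int index_in_range(unsigned int index, unsigned int n_queues)
{
	assert(n_queues > 0);
	return index < n_queues;
}
```

With four queues and a packet recorded on rx queue 3, the raw mapping is 4, which index_in_range() rejects for a four-entry array, while the proposed helper yields 3, which is accepted.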
--
Eldon Koyle
--
Politicians are the same all over. They promise to build a bridge even
where there is no river.
-- Nikita Khrushchev
^ permalink raw reply
* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-07-15 0:05 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: Michael S. Tsirkin, Peter Zijlstra, Tejun Heo, Ingo Molnar,
netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1279149996.32374.5.camel@w-sridhar.beaverton.ibm.com>
On 07/14, Sridhar Samudrala wrote:
>
> OK. So we want to create a thread that is a child of kthreadd, but inherits the cgroup/cpumask
> from the caller. How about an exported kthread function kthread_create_in_current_cg()
> that does this?
Well. I must admit, this looks a bit strange to me ;)
Instead of exporting sched_xxxaffinity() we export the new function
which calls them. And I don't think this new helper is very useful
in general. Maybe I am wrong...
Oleg.
^ permalink raw reply
* Re: [PATCH] wd: fix memory leak
From: David Miller @ 2010-07-15 0:53 UTC (permalink / raw)
To: segooon; +Cc: kernel-janitors, joe, netdev
In-Reply-To: <1279020192-9484-1-git-send-email-segooon@gmail.com>
From: Kulikov Vasiliy <segooon@gmail.com>
Date: Tue, 13 Jul 2010 15:23:12 +0400
> Unmap mapped IO in wd_probe1() if register_netdev() failed.
>
> Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH NEXT 1/1] netxen: fix for kdump
From: David Miller @ 2010-07-15 0:55 UTC (permalink / raw)
To: amit.salecha; +Cc: netdev, ameen.rahman, rajesh.borundia
In-Reply-To: <1279020822-10419-1-git-send-email-amit.salecha@qlogic.com>
From: Amit Kumar Salecha <amit.salecha@qlogic.com>
Date: Tue, 13 Jul 2010 04:33:42 -0700
> From: Rajesh Borundia <rajesh.borundia@qlogic.com>
>
> When the crash kernel is loaded after crash, the device is in unknown state.
> So reset the device contexts prior to its creation in case of kdump,
> depending upon kernel parameter reset_devices.
>
> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Applied, thanks.
^ permalink raw reply
* Re: [patch] net/sched: potential data corruption
From: David Miller @ 2010-07-15 0:56 UTC (permalink / raw)
To: hadi; +Cc: error27, shemminger, netdev, kernel-janitors, matthew
In-Reply-To: <1279036694.16376.0.camel@bigi>
From: jamal <hadi@cyberus.ca>
Date: Tue, 13 Jul 2010 11:58:14 -0400
> On Tue, 2010-07-13 at 15:21 +0200, Dan Carpenter wrote:
>> The reset_policy() does:
>> memset(d->tcfd_defdata, 0, SIMP_MAX_DATA);
>> strlcpy(d->tcfd_defdata, defdata, SIMP_MAX_DATA);
>>
>> In the original code, the size of d->tcfd_defdata wasn't fixed and if
>> strlen(defdata) was less than 31, reset_policy() would cause memory
>> corruption.
>>
>> Please Note: The original alloc_defdata() assumes defdata is 32
>> characters and a NUL terminator while reset_policy() assumes defdata is
>> 31 characters and a NUL. This patch updates alloc_defdata() to match
>> reset_policy() (ie a shorter string). I'm not very familiar with this
>> code so please review carefully.
>>
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
>
>
> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Applied, thanks.
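For readers following the sizing argument in the quoted patch description, here is a minimal user-space sketch of the fixed-size approach, not the applied patch itself. The names alloc_defdata_fixed() and reset_defdata() are hypothetical, SIMP_MAX_DATA is assumed to be 32 (31 characters plus a NUL, as the description states), and snprintf() stands in for the kernel's strlcpy(), which is not portable in user space.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIMP_MAX_DATA 32 /* assumed: 31 characters plus NUL */

/* Hypothetical: always allocate the full SIMP_MAX_DATA bytes, so a later
 * reset that clears SIMP_MAX_DATA bytes cannot run past the buffer. */
static char *alloc_defdata_fixed(const char *defdata)
{
	char *buf = calloc(1, SIMP_MAX_DATA);

	if (!buf)
		return NULL;
	/* Bounded copy; truncates to 31 characters and NUL-terminates. */
	snprintf(buf, SIMP_MAX_DATA, "%s", defdata);
	return buf;
}

/* Hypothetical counterpart of reset_policy(): clear, then bounded copy. */
static void reset_defdata(char *buf, const char *defdata)
{
	assert(buf != NULL);
	memset(buf, 0, SIMP_MAX_DATA);
	snprintf(buf, SIMP_MAX_DATA, "%s", defdata);
}
```

Because both functions agree on one buffer size, a short initial string followed by a longer replacement stays within bounds; the replacement is simply truncated.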
^ permalink raw reply
* Re: [PATCH v2] eth16i: fix memory leak
From: David Miller @ 2010-07-15 0:57 UTC (permalink / raw)
To: segooon; +Cc: kernel-janitors, miku, shemminger, eric.dumazet, tj, jpirko,
netdev
In-Reply-To: <1279050563-15759-1-git-send-email-segooon@gmail.com>
From: Kulikov Vasiliy <segooon@gmail.com>
Date: Tue, 13 Jul 2010 23:49:23 +0400
> Free allocated netdev if no probe is expected.
>
> Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6] xfrm: cleanup of xfrm_input.c.
From: David Miller @ 2010-07-15 0:59 UTC (permalink / raw)
To: ramirose; +Cc: netdev
In-Reply-To: <AANLkTik6qHgbCVwZfMGXpso4p2DTC9w6U9agdFVl1ZbN@mail.gmail.com>
From: Rami Rosen <ramirose@gmail.com>
Date: Wed, 14 Jul 2010 11:18:41 +0300
> Hi,
> The patch removes unneeded inclusion of header files
> (linux/module.h, linux/netdevice.h, net/dst.h and net/ip.h)
> in net/xfrm/xfrm_input.c
>
> Regards,
> Rami Rosen
>
> Signed-off-by: Rami Rosen <ramirose@gmail.com>
If you do this, I also want to see you add includes for things like
linux/skbuff.h since data structures such as "struct sk_buff"
are used in this file.
Otherwise, this is how we end up with obscure build failures on
some configurations and not others, either now or in the future
when a similar change is made to some header file.
^ permalink raw reply