Netdev List


* [PATCH] ipw2x00: silence GCC warning for unused variable 'dev'
From: Paul Bolle @ 2012-09-21 10:02 UTC
  To: Stanislav Yakovlev; +Cc: John W. Linville, linux-wireless, netdev, linux-kernel

Building the libipw component without CONFIG_LIBIPW_DEBUG set triggers this GCC
warning:
    drivers/net/wireless/ipw2x00/libipw_wx.c:526:21: warning: unused variable 'dev' [-Wunused-variable]

The cause of this warning is that, without CONFIG_LIBIPW_DEBUG set,
LIBIPW_DEBUG_WX compiles away, leaving 'dev' unreferenced. Fix it by using dev
in place of (its equivalent) ieee->dev in the set_security() call.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
---
0) I noticed this warning while building v3.6-rc6 on current Fedora 17,
using Fedora's default config.

1) Compile tested only (by just compiling libipw_wx.o).

 drivers/net/wireless/ipw2x00/libipw_wx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ipw2x00/libipw_wx.c b/drivers/net/wireless/ipw2x00/libipw_wx.c
index 1571505..54aba47 100644
--- a/drivers/net/wireless/ipw2x00/libipw_wx.c
+++ b/drivers/net/wireless/ipw2x00/libipw_wx.c
@@ -675,7 +675,7 @@ int libipw_wx_set_encodeext(struct libipw_device *ieee,
 	}
       done:
 	if (ieee->set_security)
-		ieee->set_security(ieee->dev, &sec);
+		ieee->set_security(dev, &sec);

 	return ret;
 }
-- 
1.7.11.4
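
The mechanism behind the warning can be reproduced outside the kernel. What
follows is only a minimal sketch: the struct names, SKETCH_DEBUG_WX and
SKETCH_DEBUG are hypothetical stand-ins for the driver's types,
LIBIPW_DEBUG_WX and CONFIG_LIBIPW_DEBUG, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the driver's real definitions. */
struct net_device_sketch {
	const char *name;
};

struct ieee_sketch {
	struct net_device_sketch *dev;
	void (*set_security)(struct net_device_sketch *dev, int *sec);
};

#ifdef SKETCH_DEBUG
#define SKETCH_DEBUG_WX(...) printf(__VA_ARGS__)
#else
/* Debugging disabled: the macro and all of its arguments compile away. */
#define SKETCH_DEBUG_WX(...) do { } while (0)
#endif

/* Records what the callback received so a caller can inspect it. */
static struct net_device_sketch *sketch_last_dev;
static int sketch_last_alg;

static void sketch_record_security(struct net_device_sketch *dev, int *sec)
{
	assert(sec != NULL);
	sketch_last_dev = dev;
	sketch_last_alg = *sec;
}

static int sketch_set_encodeext(struct ieee_sketch *ieee, int alg)
{
	struct net_device_sketch *dev = ieee->dev;
	int sec = alg;
	int ret = 0;

	if (alg < 0) {
		/* With debugging off, this use of 'dev' disappears. */
		SKETCH_DEBUG_WX("%s: unknown crypto alg %d\n", dev->name, alg);
		ret = -1;
	}

	/* Passing 'dev' here keeps it referenced in every configuration. */
	if (ieee->set_security)
		ieee->set_security(dev, &sec);

	return ret;
}
```

If the callback were handed ieee->dev instead, 'dev' would only appear inside
the macro, and a build without SKETCH_DEBUG would report it as unused.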


* Re: Oops with latest (netfilter) nf-next tree, when unloading iptable_nat
From: Pablo Neira Ayuso @ 2012-09-21 10:03 UTC
  To: Jesper Dangaard Brouer
  Cc: David Miller, Patrick McHardy, Florian Westphal, netfilter-devel,
	netdev
In-Reply-To: <1348220842.3103.17.camel@localhost>

On Fri, Sep 21, 2012 at 11:47:22AM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 2012-09-21 at 03:00 +0200, Pablo Neira Ayuso wrote:
> > On Thu, Sep 20, 2012 at 07:06:52PM +0200, Patrick McHardy wrote:
> > > On Thu, 20 Sep 2012, Patrick McHardy wrote:
> [cut]
> (discussion of fixes by Patrick and Florian)
> (...settling on Patricks second patch)
> 
> > Makes sense. And we can revisit this to improve it later.
> > 
> > I'll take this patch. I'll send a batch with updates for the nf-nat
> > thin asap.
> 
> What git tree is that?
> 
> I'm trying to work off Pablo's nf-next tree (for my IPVS changes):
>   git://1984.lsi.us.es/nf-next
> 
> But I don't see the patch in that tree ...yet.

I didn't push it yet, will do asap.

> Notice, the bug is also present in DaveM's net-next tree.
> (I know I stated earlier that it didn't affect net-next, but I just
> forgot to select the new netfilter .config options for nat)


* Re: Oops with latest (netfilter) nf-next tree, when unloading iptable_nat
From: Pablo Neira Ayuso @ 2012-09-21 10:17 UTC
  To: Jesper Dangaard Brouer
  Cc: David Miller, Patrick McHardy, Florian Westphal, netfilter-devel,
	netdev
In-Reply-To: <20120921100308.GA24155@1984>

On Fri, Sep 21, 2012 at 12:03:08PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Sep 21, 2012 at 11:47:22AM +0200, Jesper Dangaard Brouer wrote:
> > On Fri, 2012-09-21 at 03:00 +0200, Pablo Neira Ayuso wrote:
> > > On Thu, Sep 20, 2012 at 07:06:52PM +0200, Patrick McHardy wrote:
> > > > On Thu, 20 Sep 2012, Patrick McHardy wrote:
> > [cut]
> > (discussion of fixes by Patrick and Florian)
> > (...settling on Patricks second patch)
> > 
> > > Makes sense. And we can revisit this to improve it later.
> > > 
> > > I'll take this patch. I'll send a batch with updates for the nf-nat
> > > thin asap.
> > 
> > What git tree is that?
> > 
> > I'm trying to work off Pablo's nf-next tree (for my IPVS changes):
> >   git://1984.lsi.us.es/nf-next
> > 
> > But I don't see the patch in that tree ...yet.
> 
> I didn't push it yet, will do asap.

Done.

You may need to run git pull --rebase to get your patches on top of the git
head.


* [PATCH] mISDN: suppress compiler warning
From: Paul Bolle @ 2012-09-21 10:25 UTC
  To: Karsten Keil; +Cc: netdev, linux-kernel

Building the hfcpci driver triggers this GCC warning:
    drivers/isdn/hardware/mISDN/hfcpci.c:2298:2: warning: ignoring return value of 'driver_for_each_device', declared with attribute warn_unused_result [-Wunused-result]

That return value is apparently ignored because _hfcpci_softirq() always
returns 0. Suppress this warning the same way a few other drivers do.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
---
0) I noticed this warning while building v3.6-rc6 on current Fedora 17,
using Fedora's default config.

1) Compile tested only.

 drivers/isdn/hardware/mISDN/hfcpci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/isdn/hardware/mISDN/hfcpci.c b/drivers/isdn/hardware/mISDN/hfcpci.c
index 81363ff..a547c8c 100644
--- a/drivers/isdn/hardware/mISDN/hfcpci.c
+++ b/drivers/isdn/hardware/mISDN/hfcpci.c
@@ -2295,8 +2295,11 @@ _hfcpci_softirq(struct device *dev, void *arg)
 static void
 hfcpci_softirq(void *arg)
 {
-	(void) driver_for_each_device(&hfc_driver.driver, NULL, arg,
+	int ret;
+
+	ret = driver_for_each_device(&hfc_driver.driver, NULL, arg,
 				      _hfcpci_softirq);
+	(void)ret;	/* suppress compiler warning */
 
 	/* if next event would be in the past ... */
 	if ((s32)(hfc_jiffies + tics - jiffies) <= 0)
-- 
1.7.11.4
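
The suppression idiom used in this patch can be tried in isolation. This is a
minimal sketch only: sketch_for_each_item() and sketch_count_item() are
hypothetical stand-ins for driver_for_each_device() and _hfcpci_softirq(), and
the attribute syntax assumes GCC or a compatible compiler.

```c
#include <assert.h>

/* Hypothetical stand-in for driver_for_each_device(). */
static int __attribute__((warn_unused_result))
sketch_for_each_item(const int *items, int n,
		     int (*fn)(int item, void *arg), void *arg)
{
	assert(n >= 0);

	for (int i = 0; i < n; i++) {
		int err = fn(items[i], arg);

		if (err)
			return err;	/* stop at the first callback failure */
	}
	return 0;
}

/* Callback that always returns 0, like _hfcpci_softirq() in the patch. */
static int sketch_count_item(int item, void *arg)
{
	(void)item;
	(*(int *)arg)++;
	return 0;
}

static int sketch_visit_all(const int *items, int n)
{
	int visited = 0;
	int ret;

	/*
	 * A bare "(void) sketch_for_each_item(...)" still warns on GCC,
	 * as the removed line in the patch shows. Storing the result
	 * first and then discarding the variable does not.
	 */
	ret = sketch_for_each_item(items, n, sketch_count_item, &visited);
	(void)ret;	/* the callback never fails, so nothing to check */

	return visited;
}
```

The trade-off is an extra local variable in exchange for a warning-free build
while keeping the must-check annotation on the callee.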


* pull-request: can 2012-09-21
From: Marc Kleine-Budde @ 2012-09-21 11:05 UTC
  To: davem; +Cc: netdev, linux-can

Hello David,

two patches for the v3.6 release cycle. Ira W. Snyder fixed support for the
older version of the Janz CMOD-IO Carrier Board. I found and fixed an oops in
the ti_hecc driver, which occurs when removing the module if the network
interface is still open.

If it's too late for these patches, I'll rebase them to net-next.

regards, Marc

--

The following changes since commit c0d680e577ff171e7b37dbdb1b1bf5451e851f04:

  net: do not disable sg for packets requiring no checksum (2012-09-20 22:23:40 -0400)

are available in the git repository at:

  git://gitorious.org/linux-can/linux-can.git fixes-for-3.6

for you to fetch changes up to ab04c8bd423edb03e2148350a091836c196107fc:

  can: ti_hecc: fix oops during rmmod (2012-09-21 12:54:53 +0200)

----------------------------------------------------------------
Ira W. Snyder (1):
      can: janz-ican3: fix support for older hardware revisions

Marc Kleine-Budde (1):
      can: ti_hecc: fix oops during rmmod

 drivers/net/can/janz-ican3.c |    4 +---
 drivers/net/can/ti_hecc.c    |    2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)


* [PATCH 1/2] can: janz-ican3: fix support for older hardware revisions
From: Marc Kleine-Budde @ 2012-09-21 11:05 UTC
  To: davem; +Cc: netdev, linux-can, Ira W. Snyder, stable, Marc Kleine-Budde
In-Reply-To: <1348225535-24976-1-git-send-email-mkl@pengutronix.de>

From: "Ira W. Snyder" <iws@ovro.caltech.edu>

The Revision 1.0 Janz CMOD-IO Carrier Board does not have support for
the reset registers. To support older hardware, the code is changed to
use the hardware reset register on the Janz VMOD-ICAN3 hardware itself.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/janz-ican3.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/can/janz-ican3.c b/drivers/net/can/janz-ican3.c
index 98ee438..7edadee 100644
--- a/drivers/net/can/janz-ican3.c
+++ b/drivers/net/can/janz-ican3.c
@@ -1391,7 +1391,6 @@ static irqreturn_t ican3_irq(int irq, void *dev_id)
  */
 static int ican3_reset_module(struct ican3_dev *mod)
 {
-	u8 val = 1 << mod->num;
 	unsigned long start;
 	u8 runold, runnew;

@@ -1405,8 +1404,7 @@ static int ican3_reset_module(struct ican3_dev *mod)
 	runold = ioread8(mod->dpm + TARGET_RUNNING);

 	/* reset the module */
-	iowrite8(val, &mod->ctrl->reset_assert);
-	iowrite8(val, &mod->ctrl->reset_deassert);
+	iowrite8(0x00, &mod->dpmctrl->hwreset);

 	/* wait until the module has finished resetting and is running */
 	start = jiffies;
-- 
1.7.10


* [PATCH 2/2] can: ti_hecc: fix oops during rmmod
From: Marc Kleine-Budde @ 2012-09-21 11:05 UTC
  To: davem; +Cc: netdev, linux-can, Marc Kleine-Budde, stable, Anant Gole
In-Reply-To: <1348225535-24976-1-git-send-email-mkl@pengutronix.de>

This patch fixes an oops which occurs when unloading the driver while the
network interface is still up. The problem is that the io mapping is torn
down first and the CAN device is unregistered only afterwards, so closing the
device accesses the hardware's iomem after it has been unmapped:

[  172.744232] Unable to handle kernel paging request at virtual address c88b0040
[  172.752441] pgd = c7be4000
[  172.755645] [c88b0040] *pgd=87821811, *pte=00000000, *ppte=00000000
[  172.762207] Internal error: Oops: 807 [#1] PREEMPT ARM
[  172.767517] Modules linked in: ti_hecc(-) can_dev
[  172.772430] CPU: 0    Not tainted  (3.5.0alpha-00037-g3554cc0 #126)
[  172.778961] PC is at ti_hecc_close+0xb0/0x100 [ti_hecc]
[  172.784423] LR is at __dev_close_many+0x90/0xc0
[  172.789123] pc : [<bf00c768>]    lr : [<c033be58>]    psr: 60000013
[  172.789123] sp : c5c1de68  ip : 00040081  fp : 00000000
[  172.801025] r10: 00000001  r9 : c5c1c000  r8 : 00100100
[  172.806457] r7 : c5d0a48c  r6 : c5d0a400  r5 : 00000000  r4 : c5d0a000
[  172.813232] r3 : c88b0000  r2 : 00000001  r1 : c5d0a000  r0 : c5d0a000
[  172.820037] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  172.827423] Control: 10c5387d  Table: 87be4019  DAC: 00000015
[  172.833404] Process rmmod (pid: 600, stack limit = 0xc5c1c2f0)
[  172.839447] Stack: (0xc5c1de68 to 0xc5c1e000)
[  172.843994] de60:                   bf00c6b8 c5c1dec8 c5d0a000 c5d0a000 00200200 c033be58
[  172.852478] de80: c5c1de44 c5c1dec8 c5c1dec8 c033bf2c c5c1de90 c5c1de90 c5d0a084 c5c1de44
[  172.860992] dea0: c5c1dec8 c033c098 c061d3dc c5d0a000 00000000 c05edf28 c05edb34 c000d724
[  172.869476] dec0: 00000000 c033c2f8 c5d0a084 c5d0a084 00000000 c033c370 00000000 c5d0a000
[  172.877990] dee0: c05edb00 c033c3b8 c5d0a000 bf00d3ac c05edb00 bf00d7c8 bf00d7c8 c02842dc
[  172.886474] df00: c02842c8 c0282f90 c5c1c000 c05edb00 bf00d7c8 c0283668 bf00d7c8 00000000
[  172.894989] df20: c0611f98 befe2f80 c000d724 c0282d10 bf00d804 00000000 00000013 c0068a8c
[  172.903472] df40: c5c538e8 685f6974 00636365 c61571a8 c5cb9980 c61571a8 c6158a20 c00c9bc4
[  172.911987] df60: 00000000 00000000 c5cb9980 00000000 c5cb9980 00000000 c7823680 00000006
[  172.920471] df80: bf00d804 00000880 c5c1df8c 00000000 000d4267 befe2f80 00000001 b6d90068
[  172.928985] dfa0: 00000081 c000d5a0 befe2f80 00000001 befe2f80 00000880 b6d90008 00000008
[  172.937469] dfc0: befe2f80 00000001 b6d90068 00000081 00000001 00000000 befe2eac 00000000
[  172.945983] dfe0: 00000000 befe2b18 00023ba4 b6e6addc 60000010 befe2f80 a8e00190 86d2d344
[  172.954498] [<bf00c768>] (ti_hecc_close+0xb0/0x100 [ti_hecc]) from [<c033be58>] (__dev_close_many+0x90/0xc0)
[  172.984161] [<c033c098>] (rollback_registered_many+0xc0/0x2a0) from [<c033c2f8>] (rollback_registered+0x20/0x30)
[  172.994750] [<c033c2f8>] (rollback_registered+0x20/0x30) from [<c033c370>] (unregister_netdevice_queue+0x68/0x98)
[  173.005401] [<c033c370>] (unregister_netdevice_queue+0x68/0x98) from [<c033c3b8>] (unregister_netdev+0x18/0x20)
[  173.015899] [<c033c3b8>] (unregister_netdev+0x18/0x20) from [<bf00d3ac>] (ti_hecc_remove+0x60/0x80 [ti_hecc])
[  173.026245] [<bf00d3ac>] (ti_hecc_remove+0x60/0x80 [ti_hecc]) from [<c02842dc>] (platform_drv_remove+0x14/0x18)
[  173.036712] [<c02842dc>] (platform_drv_remove+0x14/0x18) from [<c0282f90>] (__device_release_driver+0x7c/0xbc)

Cc: stable <stable@vger.kernel.org>
Cc: Anant Gole <anantgole@ti.com>
Tested-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/ti_hecc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
index 527dbcf..9ded21e 100644
--- a/drivers/net/can/ti_hecc.c
+++ b/drivers/net/can/ti_hecc.c
@@ -984,12 +984,12 @@ static int __devexit ti_hecc_remove(struct platform_device *pdev)
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct ti_hecc_priv *priv = netdev_priv(ndev);
 
+	unregister_candev(ndev);
 	clk_disable(priv->clk);
 	clk_put(priv->clk);
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	iounmap(priv->base);
 	release_mem_region(res->start, resource_size(res));
-	unregister_candev(ndev);
 	free_candev(ndev);
 	platform_set_drvdata(pdev, NULL);
 
-- 
1.7.10
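
The ordering problem fixed by this patch can be modelled in user space. This
is only a sketch under stated assumptions: struct candev_sketch and all
sketch_* functions are hypothetical, a heap buffer stands in for the
ioremap()ed registers, and the bad access is recorded in a flag instead of
actually dereferencing an unmapped address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the device; not the ti_hecc structures. */
struct candev_sketch {
	unsigned int *base;	/* stands in for the mapped registers */
	int is_up;		/* interface still open? */
	int bad_access;		/* set when close ran without a mapping */
};

/* Closing the interface has to touch the hardware registers. */
static void sketch_close(struct candev_sketch *dev)
{
	if (!dev->base) {
		dev->bad_access = 1;	/* the real driver oopses here */
		return;
	}
	dev->base[0] = 0;		/* quiesce the (fake) hardware */
	dev->is_up = 0;
}

/* Unregistering closes the interface if it is still up. */
static void sketch_unregister(struct candev_sketch *dev)
{
	if (dev->is_up)
		sketch_close(dev);
}

static void sketch_unmap(struct candev_sketch *dev)
{
	free(dev->base);
	dev->base = NULL;
}

/* Returns 1 if the teardown order led to a register access after unmap. */
static int sketch_remove(int unregister_first)
{
	struct candev_sketch dev;

	dev.base = calloc(16, sizeof(*dev.base));
	dev.is_up = 1;
	dev.bad_access = 0;
	assert(dev.base != NULL);

	if (unregister_first) {
		sketch_unregister(&dev);	/* order after the patch */
		sketch_unmap(&dev);
	} else {
		sketch_unmap(&dev);		/* order before the patch */
		sketch_unregister(&dev);
	}
	return dev.bad_access;
}
```

Calling sketch_remove(0) models the old order and reports the bad access,
while sketch_remove(1) models the patched order and does not.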


* [PATCH] net/stmmac: Use clk_prepare_enable and clk_disable_unprepare
From: Stefan Roese @ 2012-09-21 11:06 UTC
  To: netdev; +Cc: Viresh Kumar, Giuseppe Cavallaro

This patch fixes an issue introduced by commit ID 6a81c26f
[net/stmmac: remove conditional compilation of clk code], which
switched from the internal stmmac_clk_{en}{dis}able calls to
clk_{en}{dis}able. As a side effect, the calls to clk_prepare and
clk_unprepare were dropped.

clk_{un}prepare is mandatory for platforms using the common clock framework.
Since these drivers are used by the SPEAr platform, which supports the common
clock framework, add clk_{un}prepare() support for them. Otherwise
the clocks are not correctly en-/disabled and ethernet support doesn't
work.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 10 +++++-----
 drivers/net/ethernet/stmicro/stmmac/stmmac_timer.c |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c136162..3be8833 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1066,7 +1066,7 @@ static int stmmac_open(struct net_device *dev)
 	} else
 		priv->tm->enable = 1;
 #endif
-	clk_enable(priv->stmmac_clk);
+	clk_prepare_enable(priv->stmmac_clk);
 
 	stmmac_check_ether_addr(priv);
 
@@ -1188,7 +1188,7 @@ open_error:
 	if (priv->phydev)
 		phy_disconnect(priv->phydev);
 
-	clk_disable(priv->stmmac_clk);
+	clk_disable_unprepare(priv->stmmac_clk);
 
 	return ret;
 }
@@ -1246,7 +1246,7 @@ static int stmmac_release(struct net_device *dev)
 #ifdef CONFIG_STMMAC_DEBUG_FS
 	stmmac_exit_fs();
 #endif
-	clk_disable(priv->stmmac_clk);
+	clk_disable_unprepare(priv->stmmac_clk);
 
 	return 0;
 }
@@ -2178,7 +2178,7 @@ int stmmac_suspend(struct net_device *ndev)
 	else {
 		stmmac_set_mac(priv->ioaddr, false);
 		/* Disable clock in case of PWM is off */
-		clk_disable(priv->stmmac_clk);
+		clk_disable_unprepare(priv->stmmac_clk);
 	}
 	spin_unlock_irqrestore(&priv->lock, flags);
 	return 0;
@@ -2203,7 +2203,7 @@ int stmmac_resume(struct net_device *ndev)
 		priv->hw->mac->pmt(priv->ioaddr, 0);
 	else
 		/* enable the clk prevously disabled */
-		clk_enable(priv->stmmac_clk);
+		clk_prepare_enable(priv->stmmac_clk);
 
 	netif_device_attach(ndev);
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_timer.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_timer.c
index 2a0e1ab..63ea9987 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_timer.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_timer.c
@@ -97,12 +97,12 @@ static struct clk *timer_clock;
 static void stmmac_tmu_start(unsigned int new_freq)
 {
 	clk_set_rate(timer_clock, new_freq);
-	clk_enable(timer_clock);
+	clk_prepare_enable(timer_clock);
 }
 
 static void stmmac_tmu_stop(void)
 {
-	clk_disable(timer_clock);
+	clk_disable_unprepare(timer_clock);
 }
 
 int stmmac_open_ext_timer(struct net_device *dev, struct stmmac_timer *tm)
@@ -126,7 +126,7 @@ int stmmac_open_ext_timer(struct net_device *dev, struct stmmac_timer *tm)
 
 int stmmac_close_ext_timer(void)
 {
-	clk_disable(timer_clock);
+	clk_disable_unprepare(timer_clock);
 	tmu2_unregister_user();
 	clk_put(timer_clock);
 	return 0;
-- 
1.7.12.1
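
Why the combined helpers matter can be shown with a small user-space model.
This is a sketch only: struct clk_sketch and the sketch_clk_* functions are
hypothetical, and the rule that enabling an unprepared clock fails is an
assumption modelled on the behaviour described in this patch, not the
framework's actual code.

```c
#include <assert.h>

/* Hypothetical clock model with the two reference counts. */
struct clk_sketch {
	int prepare_count;
	int enable_count;
};

static int sketch_clk_prepare(struct clk_sketch *clk)
{
	clk->prepare_count++;
	return 0;
}

static void sketch_clk_unprepare(struct clk_sketch *clk)
{
	assert(clk->prepare_count > 0);	/* calls must be balanced */
	clk->prepare_count--;
}

/* Assumption: enabling a clock that was never prepared is an error. */
static int sketch_clk_enable(struct clk_sketch *clk)
{
	if (clk->prepare_count == 0)
		return -1;
	clk->enable_count++;
	return 0;
}

static void sketch_clk_disable(struct clk_sketch *clk)
{
	assert(clk->enable_count > 0);	/* calls must be balanced */
	clk->enable_count--;
}

/* Prepare, then enable; undo the prepare if enabling fails. */
static int sketch_clk_prepare_enable(struct clk_sketch *clk)
{
	int ret = sketch_clk_prepare(clk);

	if (ret)
		return ret;
	ret = sketch_clk_enable(clk);
	if (ret)
		sketch_clk_unprepare(clk);
	return ret;
}

/* Mirror image: disable first, then unprepare. */
static void sketch_clk_disable_unprepare(struct clk_sketch *clk)
{
	sketch_clk_disable(clk);
	sketch_clk_unprepare(clk);
}
```

In this model a bare sketch_clk_enable() on a fresh clock fails, which is the
situation the driver was in, while the combined helpers keep both counts
balanced across open/release and suspend/resume.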


* Re: [PATCH] net/stmmac: Use clk_prepare_enable and clk_disable_unprepare
From: Viresh Kumar @ 2012-09-21 11:21 UTC
  To: Stefan Roese; +Cc: netdev, Giuseppe Cavallaro, spear-devel
In-Reply-To: <1348225589-8126-1-git-send-email-sr@denx.de>

On 21 September 2012 16:36, Stefan Roese <sr@denx.de> wrote:
> This patch fixes an issue introduced by commit ID 6a81c26f
> [net/stmmac: remove conditional compilation of clk code], which
> switched from the internal stmmac_clk_{en}{dis}able calls to
> clk_{en}{dis}able. By this, calling clk_prepare and clk_unprepare
> was removed.
>
> clk_{un}prepare is mandatory for platforms using common clock framework.
> Since these drivers are used by SPEAr platform, which supports common
> clock framework, add clk_{un}prepare() support for them. Otherwise
> the clocks are not correctly en-/disabled and ethernet support doesn't
> work.

I can't believe I did this. :)

IIRC, when I wrote this code, prepare/unprepare weren't there yet. By the
time my code got merged, they were, and this mistake slipped through.

Thanks for fixing it.

Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>


* [patch net-next] teamd: send port changed when added
From: Jiri Pirko @ 2012-09-21 11:50 UTC
  To: netdev; +Cc: davem

On some hardware, the link is not yet up when an interface is added to a team.
In that case no port changed event is sent to userspace, which may cause
confusion. Fix this bug by always sending a port changed event once the port
is added to the team.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/team/team.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 266af7b..9ce0c51 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -966,7 +966,8 @@ static struct netpoll_info *team_netpoll_info(struct team *team)
 }
 #endif
 
-static void __team_port_change_check(struct team_port *port, bool linkup);
+static void __team_port_change_port_added(struct team_port *port, bool linkup);
+
 static int team_dev_type_check_change(struct net_device *dev,
 				      struct net_device *port_dev);
 
@@ -1079,7 +1080,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
 	team_port_enable(team, port);
 	list_add_tail_rcu(&port->list, &team->port_list);
 	__team_compute_features(team);
-	__team_port_change_check(port, !!netif_carrier_ok(port_dev));
+	__team_port_change_port_added(port, !!netif_carrier_ok(port_dev));
 	__team_options_change_check(team);
 
 	netdev_info(dev, "Port device %s added\n", portname);
@@ -1114,6 +1115,8 @@ err_set_mtu:
 	return err;
 }
 
+static void __team_port_change_port_removed(struct team_port *port);
+
 static int team_port_del(struct team *team, struct net_device *port_dev)
 {
 	struct net_device *dev = team->dev;
@@ -1130,8 +1133,7 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
 	__team_option_inst_mark_removed_port(team, port);
 	__team_options_change_check(team);
 	__team_option_inst_del_port(team, port);
-	port->removed = true;
-	__team_port_change_check(port, false);
+	__team_port_change_port_removed(port);
 	team_port_disable(team, port);
 	list_del_rcu(&port->list);
 	netdev_rx_handler_unregister(port_dev);
@@ -2499,13 +2501,11 @@ static void __team_options_change_check(struct team *team)
 }
 
 /* rtnl lock is held */
-static void __team_port_change_check(struct team_port *port, bool linkup)
+
+static void __team_port_change_send(struct team_port *port, bool linkup)
 {
 	int err;
 
-	if (!port->removed && port->state.linkup == linkup)
-		return;
-
 	port->changed = true;
 	port->state.linkup = linkup;
 	team_refresh_port_linkup(port);
@@ -2530,6 +2530,23 @@ send_event:
 
 }
 
+static void __team_port_change_check(struct team_port *port, bool linkup)
+{
+	if (port->state.linkup != linkup)
+		__team_port_change_send(port, linkup);
+}
+
+static void __team_port_change_port_added(struct team_port *port, bool linkup)
+{
+	__team_port_change_send(port, linkup);
+}
+
+static void __team_port_change_port_removed(struct team_port *port)
+{
+	port->removed = true;
+	__team_port_change_send(port, false);
+}
+
 static void team_port_change_check(struct team_port *port, bool linkup)
 {
 	struct team *team = port->team;
-- 
1.7.12
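
The behavioural difference between the check, added and removed paths can be
modelled in a few lines. This is only a sketch: struct port_sketch and the
sketch_* functions are hypothetical simplifications of the team code, and an
event counter stands in for the netlink notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a team port. */
struct port_sketch {
	bool linkup;
	bool removed;
	bool changed;
	int events_sent;	/* stands in for events sent to userspace */
};

/* Unconditionally record the state and notify. */
static void sketch_port_change_send(struct port_sketch *port, bool linkup)
{
	assert(port != NULL);
	port->changed = true;
	port->linkup = linkup;
	port->events_sent++;
}

/* Regular path: notify only when the link state really changed. */
static void sketch_port_change_check(struct port_sketch *port, bool linkup)
{
	if (port->linkup != linkup)
		sketch_port_change_send(port, linkup);
}

/* A newly added port always notifies, even with the link still down. */
static void sketch_port_change_port_added(struct port_sketch *port,
					  bool linkup)
{
	sketch_port_change_send(port, linkup);
}

/* A removed port is marked and always notifies with link down. */
static void sketch_port_change_port_removed(struct port_sketch *port)
{
	port->removed = true;
	sketch_port_change_send(port, false);
}
```

For a zero-initialised port, sketch_port_change_check(&port, false) sends
nothing, which corresponds to the missing event described above, whereas
sketch_port_change_port_added(&port, false) sends exactly one event.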


* [patch net-next] team: send port changed when added
From: Jiri Pirko @ 2012-09-21 11:51 UTC
  To: netdev; +Cc: davem

On some hardware, the link is not yet up when an interface is added to a team.
In that case no port changed event is sent to userspace, which may cause
confusion. Fix this bug by always sending a port changed event once the port
is added to the team.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/team/team.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 266af7b..9ce0c51 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -966,7 +966,8 @@ static struct netpoll_info *team_netpoll_info(struct team *team)
 }
 #endif
 
-static void __team_port_change_check(struct team_port *port, bool linkup);
+static void __team_port_change_port_added(struct team_port *port, bool linkup);
+
 static int team_dev_type_check_change(struct net_device *dev,
 				      struct net_device *port_dev);
 
@@ -1079,7 +1080,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
 	team_port_enable(team, port);
 	list_add_tail_rcu(&port->list, &team->port_list);
 	__team_compute_features(team);
-	__team_port_change_check(port, !!netif_carrier_ok(port_dev));
+	__team_port_change_port_added(port, !!netif_carrier_ok(port_dev));
 	__team_options_change_check(team);
 
 	netdev_info(dev, "Port device %s added\n", portname);
@@ -1114,6 +1115,8 @@ err_set_mtu:
 	return err;
 }
 
+static void __team_port_change_port_removed(struct team_port *port);
+
 static int team_port_del(struct team *team, struct net_device *port_dev)
 {
 	struct net_device *dev = team->dev;
@@ -1130,8 +1133,7 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
 	__team_option_inst_mark_removed_port(team, port);
 	__team_options_change_check(team);
 	__team_option_inst_del_port(team, port);
-	port->removed = true;
-	__team_port_change_check(port, false);
+	__team_port_change_port_removed(port);
 	team_port_disable(team, port);
 	list_del_rcu(&port->list);
 	netdev_rx_handler_unregister(port_dev);
@@ -2499,13 +2501,11 @@ static void __team_options_change_check(struct team *team)
 }
 
 /* rtnl lock is held */
-static void __team_port_change_check(struct team_port *port, bool linkup)
+
+static void __team_port_change_send(struct team_port *port, bool linkup)
 {
 	int err;
 
-	if (!port->removed && port->state.linkup == linkup)
-		return;
-
 	port->changed = true;
 	port->state.linkup = linkup;
 	team_refresh_port_linkup(port);
@@ -2530,6 +2530,23 @@ send_event:
 
 }
 
+static void __team_port_change_check(struct team_port *port, bool linkup)
+{
+	if (port->state.linkup != linkup)
+		__team_port_change_send(port, linkup);
+}
+
+static void __team_port_change_port_added(struct team_port *port, bool linkup)
+{
+	__team_port_change_send(port, linkup);
+}
+
+static void __team_port_change_port_removed(struct team_port *port)
+{
+	port->removed = true;
+	__team_port_change_send(port, false);
+}
+
 static void team_port_change_check(struct team_port *port, bool linkup)
 {
 	struct team *team = port->team;
-- 
1.7.12


* Re: [PATCH v3] ucc_geth: Lockless xmit
From: Francois Romieu @ 2012-09-21 12:51 UTC
  To: Joakim Tjernlund; +Cc: netdev
In-Reply-To: <1348218675-3804-1-git-send-email-Joakim.Tjernlund@transmode.se>

Joakim Tjernlund <Joakim.Tjernlund@transmode.se> :
> Currently ucc_geth_start_xmit wraps IRQ off for the
> whole body just to be safe. By rearranging the code a bit
> one can avoid the lock completely.

Afaics you went a bit too lockless with the queueing disable / enable
logic. The hard_start_xmit handler runs in a section with softirqs disabled
locally, but it will happily race with the napi handler on a different
CPU. Grep for netif_tx_lock in tg3.c to see how that driver handles it.

The Tx skb free logic probably requires some smp memory barriers as
well since the current skb is used by the ucc_geth driver to sync the
Tx xmit with the napi completion handler.

-- 
Ueimor


* Re: [PATCH] Xen backend support for paged out grant targets V4.
From: Konrad Rzeszutek Wilk @ 2012-09-21 13:28 UTC
  To: davem
  Cc: Ian Campbell, Andres Lagar-Cavilla, xen-devel, David Vrabel,
	David Miller, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <5B5132A4-93B2-41D0-B1A6-048810565DB5@gridcentric.ca>

> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> > 
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> > 
> > Since this is more about grant tables than netback this should probably
> > go via Konrad rather than Dave, is that OK with you Dave?
> 
> If that is the case hopefully Konrad can deal with the two typos? Otherwise happy to re-spin the patch.
> Thanks!

David, I pulled it into my tree, since the only change it makes to drivers/net/xen-* is
to change which function is called in the bowels of the grant API:

HYPERVISOR_grant_table_op(GNTTABOP_copy, netbk->tx_copy_ops, nr_gops);
to
gnttab_batch_copy(netbk->tx_copy_ops, nr_gops);

Hope that is OK. If not, I can prep a branch with the patches this depends
on for you to pull.


* Re: [PATCH net-next v1] net: use a per task frag allocator
From: Eric Dumazet @ 2012-09-21 14:57 UTC
  To: David Miller
  Cc: linux-kernel, netdev, Ben Hutchings, Vijay Subramanian,
	Alexander Duyck
In-Reply-To: <20120920.174827.1245530945282009606.davem@davemloft.net>

On Thu, 2012-09-20 at 17:48 -0400, David Miller wrote:

> 
> I like this a lot and I look forward to your upcoming changes to
> convert the other two sk_sndmsg_page users as well, but I can't
> apply this to net-next just yet.
> 

Sure, I was not expecting a merge at this early stage.

> The question on fallback is a good one and something we have
> to resolve before applying this.
> 
> Note in particular that sk_allocation can be set to just about
> anything, and this also has potential interaction issues with
> SOCK_MEMALLOC.

It seems SOCK_MEMALLOC is only used in the receive path?

The current tcp_sendmsg() merely uses:

static inline struct page *sk_stream_alloc_page(struct sock *sk)
{
	struct page *page = NULL;

	page = alloc_pages(sk->sk_allocation, 0);
...

So there is no test on SOCK_MEMALLOC flag, and everything is contained
in sk_allocation.


What I did in v2 is to use either:

- Per task __GFP_WAIT frag allocator (current->task_frag)

- Per socket !__GFP_WAIT frag allocator (sk->sk_frag), used only
  on 'special' sockets (kernel icmp sockets for example), or any socket
  that uses GFP_ATOMIC for its sk_allocation mode

Both use a common helper that tries to allocate 32768-byte pages,
with fallback to smaller ones in case of memory pressure.
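
The selection and fallback described above can be sketched in user space. All
names here (struct frag_sketch, the sketch_* functions, sketch_alloc_limit)
are hypothetical: malloc() stands in for the page allocator, a size limit
simulates memory pressure, and a plain free() replaces the page refcounting
that the kernel helper needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* smallest chunk tried (order-0) */
#define SKETCH_FRAG_MAX  32768u	/* largest chunk tried (order-3) */

/* Hypothetical user-space analogue of a page/offset/size triple. */
struct frag_sketch {
	void *page;
	size_t offset;
	size_t size;
};

/* Allocations above this limit fail, to simulate memory pressure. */
static size_t sketch_alloc_limit = SKETCH_FRAG_MAX;

static void *sketch_alloc(size_t bytes)
{
	if (bytes > sketch_alloc_limit)
		return NULL;
	return malloc(bytes);
}

/* Callers that may sleep share the per task frag, others use the socket's. */
static struct frag_sketch *sketch_pick_frag(bool may_sleep,
					    struct frag_sketch *task_frag,
					    struct frag_sketch *sock_frag)
{
	return may_sleep ? task_frag : sock_frag;
}

/* Make sure the frag has room, trying big chunks first, then smaller. */
static bool sketch_frag_refill(struct frag_sketch *frag)
{
	size_t bytes;

	assert(frag != NULL);

	if (frag->page && frag->offset < frag->size)
		return true;	/* still room in the current chunk */

	/* Nothing else references the chunk in this sketch, so drop it. */
	free(frag->page);
	frag->page = NULL;

	for (bytes = SKETCH_FRAG_MAX; bytes >= SKETCH_PAGE_SIZE; bytes /= 2) {
		frag->page = sketch_alloc(bytes);
		if (frag->page) {
			frag->offset = 0;
			frag->size = bytes;
			return true;
		}
	}
	return false;	/* even the smallest chunk could not be allocated */
}
```

With no pressure the refill yields a 32768-byte chunk. Lowering
sketch_alloc_limit to 8192 makes the same call fall back to an 8192-byte
chunk, and a limit of 0 makes it fail.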

I removed the special cork->page Herbert Xu introduced
for the lockless udp send : we can use the per task task_frag for this.

I also covered ipv6/ipv4 append_data use to benefit from high order
pages as well.

This patch actually removes 40 LOC in the kernel ;)

Thanks

(As a followup, sk_enter_memory_pressure() could be moved from
include/net/sock.h to net/core/sock.c)

[PATCH net-next v2] net: use a per task frag allocator

We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

It's done to increase the probability of coalescing small write()s into
single segments in skbs still in the write queue (not yet sent).

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

It's also quite inefficient for building TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, that's order-3 pages on x86)

This increases TCP stream performance by 20% on the loopback device,
and also benefits other network devices, since 8x fewer frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

It's possible some SG enabled hardware can't cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment into sub fragments, since some arches have PAGE_SIZE=65536.

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
v2: uses existing page_frag structure to hold page/offset/size
    convert linear_to_page()/__ip_append_data()/ip6_append_data()
    remove @page and @off fields from struct inet_cork
    move the destructor from tcp_v4_destroy_sock() to sk_common_release

 include/linux/sched.h   |    3 +
 include/net/inet_sock.h |    4 -
 include/net/sock.h      |   27 +++++++-----
 kernel/exit.c           |    3 +
 kernel/fork.c           |    1 
 net/core/skbuff.c       |   37 ++++-------------
 net/core/sock.c         |   48 ++++++++++++++++++++++-
 net/ipv4/ip_output.c    |   70 +++++++++++++--------------------
 net/ipv4/tcp.c          |   79 ++++++++++----------------------------
 net/ipv4/tcp_ipv4.c     |    8 ---
 net/ipv6/ip6_output.c   |   65 ++++++++++++-------------------
 11 files changed, 153 insertions(+), 192 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b8c8664..a8e2413 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1530,6 +1530,9 @@ struct task_struct {
 	 * cache last used pipe for splice
 	 */
 	struct pipe_inode_info *splice_pipe;
+
+	struct page_frag task_frag;
+
 #ifdef	CONFIG_TASK_DELAY_ACCT
 	struct task_delay_info *delays;
 #endif
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 613cfa4..256c1ed 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -101,10 +101,8 @@ struct inet_cork {
 	__be32			addr;
 	struct ip_options	*opt;
 	unsigned int		fragsize;
-	struct dst_entry	*dst;
 	int			length; /* Total length of all frames */
-	struct page		*page;
-	u32			off;
+	struct dst_entry	*dst;
 	u8			tx_flags;
 };
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 181b711..42053759 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -247,8 +247,7 @@ struct cg_proto;
   *	@sk_stamp: time stamp of last packet received
   *	@sk_socket: Identd and reporting IO signals
   *	@sk_user_data: RPC layer private data
-  *	@sk_sndmsg_page: cached page for sendmsg
-  *	@sk_sndmsg_off: cached offset for sendmsg
+  *	@sk_frag: cached page frag
   *	@sk_peek_off: current peek_offset value
   *	@sk_send_head: front of stuff to transmit
   *	@sk_security: used by security modules
@@ -362,9 +361,8 @@ struct sock {
 	ktime_t			sk_stamp;
 	struct socket		*sk_socket;
 	void			*sk_user_data;
-	struct page		*sk_sndmsg_page;
+	struct page_frag	sk_frag;
 	struct sk_buff		*sk_send_head;
-	__u32			sk_sndmsg_off;
 	__s32			sk_peek_off;
 	int			sk_write_pending;
 #ifdef CONFIG_SECURITY
@@ -2034,18 +2032,23 @@ static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 
 struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp);
 
-static inline struct page *sk_stream_alloc_page(struct sock *sk)
+/**
+ * sk_page_frag - return an appropriate page_frag
+ * @sk: socket
+ *
+ * If socket allocation mode allows current thread to sleep, it means its
+ * safe to use the per task page_frag instead of the per socket one.
+ */
+static inline struct page_frag *sk_page_frag(struct sock *sk)
 {
-	struct page *page = NULL;
+	if (sk->sk_allocation & __GFP_WAIT)
+		return &current->task_frag;
 
-	page = alloc_pages(sk->sk_allocation, 0);
-	if (!page) {
-		sk_enter_memory_pressure(sk);
-		sk_stream_moderate_sndbuf(sk);
-	}
-	return page;
+	return &sk->sk_frag;
 }
 
+extern bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
+
 /*
  *	Default write policy as shown to user space via poll/select/SIGIO
  */
diff --git a/kernel/exit.c b/kernel/exit.c
index f65345f..42f2595 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1046,6 +1046,9 @@ void do_exit(long code)
 	if (tsk->splice_pipe)
 		__free_pipe_info(tsk->splice_pipe);
 
+	if (tsk->task_frag.page)
+		put_page(tsk->task_frag.page);
+
 	validate_creds_for_do_exit(tsk);
 
 	preempt_disable();
diff --git a/kernel/fork.c b/kernel/fork.c
index 2c8857e..01565b9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -330,6 +330,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 	tsk->btrace_seq = 0;
 #endif
 	tsk->splice_pipe = NULL;
+	tsk->task_frag.page = NULL;
 
 	account_kernel_stack(ti, 1);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fe00d12..2ede3cf 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1655,38 +1655,19 @@ static struct page *linear_to_page(struct page *page, unsigned int *len,
 				   unsigned int *offset,
 				   struct sk_buff *skb, struct sock *sk)
 {
-	struct page *p = sk->sk_sndmsg_page;
-	unsigned int off;
+	struct page_frag *pfrag = sk_page_frag(sk);
 
-	if (!p) {
-new_page:
-		p = sk->sk_sndmsg_page = alloc_pages(sk->sk_allocation, 0);
-		if (!p)
-			return NULL;
-
-		off = sk->sk_sndmsg_off = 0;
-		/* hold one ref to this page until it's full */
-	} else {
-		unsigned int mlen;
-
-		/* If we are the only user of the page, we can reset offset */
-		if (page_count(p) == 1)
-			sk->sk_sndmsg_off = 0;
-		off = sk->sk_sndmsg_off;
-		mlen = PAGE_SIZE - off;
-		if (mlen < 64 && mlen < *len) {
-			put_page(p);
-			goto new_page;
-		}
+	if (!sk_page_frag_refill(sk, pfrag))
+		return NULL;
 
-		*len = min_t(unsigned int, *len, mlen);
-	}
+	*len = min_t(unsigned int, *len, pfrag->size - pfrag->offset);
 
-	memcpy(page_address(p) + off, page_address(page) + *offset, *len);
-	sk->sk_sndmsg_off += *len;
-	*offset = off;
+	memcpy(page_address(pfrag->page) + pfrag->offset,
+	       page_address(page) + *offset, *len);
+	*offset = pfrag->offset;
+	pfrag->offset += *len;
 
-	return p;
+	return pfrag->page;
 }
 
 static bool spd_can_coalesce(const struct splice_pipe_desc *spd,
diff --git a/net/core/sock.c b/net/core/sock.c
index 2693f76..a9f6a4d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1744,6 +1744,45 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 }
 EXPORT_SYMBOL(sock_alloc_send_skb);
 
+/* On 32bit arches, an skb frag is limited to 2^15 */
+#define SKB_FRAG_PAGE_ORDER	get_order(32768)
+
+bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
+{
+	int order;
+
+	if (pfrag->page) {
+		if (atomic_read(&pfrag->page->_count) == 1) {
+			pfrag->offset = 0;
+			return true;
+		}
+		if (pfrag->offset < pfrag->size)
+			return true;
+		put_page(pfrag->page);
+	}
+
+	/* We restrict high order allocations to users that can afford to wait */
+	order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
+
+	do {
+		gfp_t gfp = sk->sk_allocation;
+
+		if (order)
+			gfp |= __GFP_COMP | __GFP_NOWARN;
+		pfrag->page = alloc_pages(gfp, order);
+		if (likely(pfrag->page)) {
+			pfrag->offset = 0;
+			pfrag->size = PAGE_SIZE << order;
+			return true;
+		}
+	} while (--order >= 0);
+
+	sk_enter_memory_pressure(sk);
+	sk_stream_moderate_sndbuf(sk);
+	return false;
+}
+EXPORT_SYMBOL(sk_page_frag_refill);
+
 static void __lock_sock(struct sock *sk)
 	__releases(&sk->sk_lock.slock)
 	__acquires(&sk->sk_lock.slock)
@@ -2173,8 +2212,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_error_report	=	sock_def_error_report;
 	sk->sk_destruct		=	sock_def_destruct;
 
-	sk->sk_sndmsg_page	=	NULL;
-	sk->sk_sndmsg_off	=	0;
+	sk->sk_frag.page	=	NULL;
 	sk->sk_peek_off		=	-1;
 
 	sk->sk_peer_pid 	=	NULL;
@@ -2417,6 +2455,12 @@ void sk_common_release(struct sock *sk)
 	xfrm_sk_free_policy(sk);
 
 	sk_refcnt_debug_release(sk);
+
+	if (sk->sk_frag.page) {
+		put_page(sk->sk_frag.page);
+		sk->sk_frag.page = NULL;
+	}
+
 	sock_put(sk);
 }
 EXPORT_SYMBOL(sk_common_release);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a5beab1..24a29a3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -793,6 +793,7 @@ static int __ip_append_data(struct sock *sk,
 			    struct flowi4 *fl4,
 			    struct sk_buff_head *queue,
 			    struct inet_cork *cork,
+			    struct page_frag *pfrag,
 			    int getfrag(void *from, char *to, int offset,
 					int len, int odd, struct sk_buff *skb),
 			    void *from, int length, int transhdrlen,
@@ -987,47 +988,30 @@ alloc_new_skb:
 			}
 		} else {
 			int i = skb_shinfo(skb)->nr_frags;
-			skb_frag_t *frag = &skb_shinfo(skb)->frags[i-1];
-			struct page *page = cork->page;
-			int off = cork->off;
-			unsigned int left;
-
-			if (page && (left = PAGE_SIZE - off) > 0) {
-				if (copy >= left)
-					copy = left;
-				if (page != skb_frag_page(frag)) {
-					if (i == MAX_SKB_FRAGS) {
-						err = -EMSGSIZE;
-						goto error;
-					}
-					skb_fill_page_desc(skb, i, page, off, 0);
-					skb_frag_ref(skb, i);
-					frag = &skb_shinfo(skb)->frags[i];
-				}
-			} else if (i < MAX_SKB_FRAGS) {
-				if (copy > PAGE_SIZE)
-					copy = PAGE_SIZE;
-				page = alloc_pages(sk->sk_allocation, 0);
-				if (page == NULL)  {
-					err = -ENOMEM;
-					goto error;
-				}
-				cork->page = page;
-				cork->off = 0;
 
-				skb_fill_page_desc(skb, i, page, 0, 0);
-				frag = &skb_shinfo(skb)->frags[i];
-			} else {
-				err = -EMSGSIZE;
-				goto error;
-			}
-			if (getfrag(from, skb_frag_address(frag)+skb_frag_size(frag),
-				    offset, copy, skb->len, skb) < 0) {
-				err = -EFAULT;
+			err = -ENOMEM;
+			if (!sk_page_frag_refill(sk, pfrag))
 				goto error;
+
+			if (!skb_can_coalesce(skb, i, pfrag->page,
+					      pfrag->offset)) {
+				err = -EMSGSIZE;
+				if (i == MAX_SKB_FRAGS)
+					goto error;
+
+				__skb_fill_page_desc(skb, i, pfrag->page,
+						     pfrag->offset, 0);
+				skb_shinfo(skb)->nr_frags = ++i;
+				get_page(pfrag->page);
 			}
-			cork->off += copy;
-			skb_frag_size_add(frag, copy);
+			copy = min_t(int, copy, pfrag->size - pfrag->offset);
+			if (getfrag(from,
+				    page_address(pfrag->page) + pfrag->offset,
+				    offset, copy, skb->len, skb) < 0)
+				goto error_efault;
+
+			pfrag->offset += copy;
+			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
 			skb->len += copy;
 			skb->data_len += copy;
 			skb->truesize += copy;
@@ -1039,6 +1023,8 @@ alloc_new_skb:
 
 	return 0;
 
+error_efault:
+	err = -EFAULT;
 error:
 	cork->length -= length;
 	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
@@ -1079,8 +1065,6 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	cork->dst = &rt->dst;
 	cork->length = 0;
 	cork->tx_flags = ipc->tx_flags;
-	cork->page = NULL;
-	cork->off = 0;
 
 	return 0;
 }
@@ -1117,7 +1101,8 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		transhdrlen = 0;
 	}
 
-	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base, getfrag,
+	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
+				sk_page_frag(sk), getfrag,
 				from, length, transhdrlen, flags);
 }
 
@@ -1439,7 +1424,8 @@ struct sk_buff *ip_make_skb(struct sock *sk,
 	if (err)
 		return ERR_PTR(err);
 
-	err = __ip_append_data(sk, fl4, &queue, &cork, getfrag,
+	err = __ip_append_data(sk, fl4, &queue, &cork,
+			       &current->task_frag, getfrag,
 			       from, length, transhdrlen, flags);
 	if (err) {
 		__ip_flush_pending_frames(sk, &queue, &cork);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index df83d74..ede98db 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1150,78 +1150,43 @@ new_segment:
 				if (err)
 					goto do_fault;
 			} else {
-				bool merge = false;
+				bool merge = true;
 				int i = skb_shinfo(skb)->nr_frags;
-				struct page *page = sk->sk_sndmsg_page;
-				int off;
-
-				if (page && page_count(page) == 1)
-					sk->sk_sndmsg_off = 0;
-
-				off = sk->sk_sndmsg_off;
-
-				if (skb_can_coalesce(skb, i, page, off) &&
-				    off != PAGE_SIZE) {
-					/* We can extend the last page
-					 * fragment. */
-					merge = true;
-				} else if (i == MAX_SKB_FRAGS || !sg) {
-					/* Need to add new fragment and cannot
-					 * do this because interface is non-SG,
-					 * or because all the page slots are
-					 * busy. */
-					tcp_mark_push(tp, skb);
-					goto new_segment;
-				} else if (page) {
-					if (off == PAGE_SIZE) {
-						put_page(page);
-						sk->sk_sndmsg_page = page = NULL;
-						off = 0;
+				struct page_frag *pfrag = sk_page_frag(sk);
+
+				if (!sk_page_frag_refill(sk, pfrag))
+					goto wait_for_memory;
+
+				if (!skb_can_coalesce(skb, i, pfrag->page,
+						      pfrag->offset)) {
+					if (i == MAX_SKB_FRAGS || !sg) {
+						tcp_mark_push(tp, skb);
+						goto new_segment;
 					}
-				} else
-					off = 0;
+					merge = false;
+				}
 
-				if (copy > PAGE_SIZE - off)
-					copy = PAGE_SIZE - off;
+				copy = min_t(int, copy, pfrag->size - pfrag->offset);
 
 				if (!sk_wmem_schedule(sk, copy))
 					goto wait_for_memory;
 
-				if (!page) {
-					/* Allocate new cache page. */
-					if (!(page = sk_stream_alloc_page(sk)))
-						goto wait_for_memory;
-				}
-
-				/* Time to copy data. We are close to
-				 * the end! */
 				err = skb_copy_to_page_nocache(sk, from, skb,
-							       page, off, copy);
-				if (err) {
-					/* If this page was new, give it to the
-					 * socket so it does not get leaked.
-					 */
-					if (!sk->sk_sndmsg_page) {
-						sk->sk_sndmsg_page = page;
-						sk->sk_sndmsg_off = 0;
-					}
+							       pfrag->page,
+							       pfrag->offset,
+							       copy);
+				if (err)
 					goto do_error;
-				}
 
 				/* Update the skb. */
 				if (merge) {
 					skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
 				} else {
-					skb_fill_page_desc(skb, i, page, off, copy);
-					if (sk->sk_sndmsg_page) {
-						get_page(page);
-					} else if (off + copy < PAGE_SIZE) {
-						get_page(page);
-						sk->sk_sndmsg_page = page;
-					}
+					skb_fill_page_desc(skb, i, pfrag->page,
+							   pfrag->offset, copy);
+					get_page(pfrag->page);
 				}
-
-				sk->sk_sndmsg_off = off + copy;
+				pfrag->offset += copy;
 			}
 
 			if (!copied)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e64abed..1bbee19 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2196,14 +2196,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	if (inet_csk(sk)->icsk_bind_hash)
 		inet_put_port(sk);
 
-	/*
-	 * If sendmsg cached page exists, toss it.
-	 */
-	if (sk->sk_sndmsg_page) {
-		__free_page(sk->sk_sndmsg_page);
-		sk->sk_sndmsg_page = NULL;
-	}
-
 	/* TCP Cookie Transactions */
 	if (tp->cookie_values != NULL) {
 		kref_put(&tp->cookie_values->kref,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3dd4a37..aece3e7 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1279,8 +1279,6 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 		if (dst_allfrag(rt->dst.path))
 			cork->flags |= IPCORK_ALLFRAG;
 		cork->length = 0;
-		sk->sk_sndmsg_page = NULL;
-		sk->sk_sndmsg_off = 0;
 		exthdrlen = (opt ? opt->opt_flen : 0) - rt->rt6i_nfheader_len;
 		length += exthdrlen;
 		transhdrlen += exthdrlen;
@@ -1504,48 +1502,31 @@ alloc_new_skb:
 			}
 		} else {
 			int i = skb_shinfo(skb)->nr_frags;
-			skb_frag_t *frag = &skb_shinfo(skb)->frags[i-1];
-			struct page *page = sk->sk_sndmsg_page;
-			int off = sk->sk_sndmsg_off;
-			unsigned int left;
-
-			if (page && (left = PAGE_SIZE - off) > 0) {
-				if (copy >= left)
-					copy = left;
-				if (page != skb_frag_page(frag)) {
-					if (i == MAX_SKB_FRAGS) {
-						err = -EMSGSIZE;
-						goto error;
-					}
-					skb_fill_page_desc(skb, i, page, sk->sk_sndmsg_off, 0);
-					skb_frag_ref(skb, i);
-					frag = &skb_shinfo(skb)->frags[i];
-				}
-			} else if(i < MAX_SKB_FRAGS) {
-				if (copy > PAGE_SIZE)
-					copy = PAGE_SIZE;
-				page = alloc_pages(sk->sk_allocation, 0);
-				if (page == NULL) {
-					err = -ENOMEM;
-					goto error;
-				}
-				sk->sk_sndmsg_page = page;
-				sk->sk_sndmsg_off = 0;
+			struct page_frag *pfrag = sk_page_frag(sk);
 
-				skb_fill_page_desc(skb, i, page, 0, 0);
-				frag = &skb_shinfo(skb)->frags[i];
-			} else {
-				err = -EMSGSIZE;
+			err = -ENOMEM;
+			if (!sk_page_frag_refill(sk, pfrag))
 				goto error;
+
+			if (!skb_can_coalesce(skb, i, pfrag->page,
+					      pfrag->offset)) {
+				err = -EMSGSIZE;
+				if (i == MAX_SKB_FRAGS)
+					goto error;
+
+				__skb_fill_page_desc(skb, i, pfrag->page,
+						     pfrag->offset, 0);
+				skb_shinfo(skb)->nr_frags = ++i;
+				get_page(pfrag->page);
 			}
+			copy = min_t(int, copy, pfrag->size - pfrag->offset);
 			if (getfrag(from,
-				    skb_frag_address(frag) + skb_frag_size(frag),
-				    offset, copy, skb->len, skb) < 0) {
-				err = -EFAULT;
-				goto error;
-			}
-			sk->sk_sndmsg_off += copy;
-			skb_frag_size_add(frag, copy);
+				    page_address(pfrag->page) + pfrag->offset,
+				    offset, copy, skb->len, skb) < 0)
+				goto error_efault;
+
+			pfrag->offset += copy;
+			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
 			skb->len += copy;
 			skb->data_len += copy;
 			skb->truesize += copy;
@@ -1554,7 +1535,11 @@ alloc_new_skb:
 		offset += copy;
 		length -= copy;
 	}
+
 	return 0;
+
+error_efault:
+	err = -EFAULT;
 error:
 	cork->length -= length;
 	IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
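
The refill policy that sk_page_frag_refill() introduces in this patch can be
modelled in user space. The sketch below is only an illustration: the
frag_model struct, the try_alloc_fn callback and the two toy allocators are
hypothetical stand-ins for struct page_frag, alloc_pages() and the page
reference count, and a 4096-byte page size is assumed.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */
#define MODEL_MAX_ORDER 3     /* 32768 bytes / 4096-byte pages = 2^3 pages */

/* Hypothetical stand-in for struct page_frag plus the page refcount. */
struct frag_model {
	bool has_page;        /* models pfrag->page != NULL */
	int users;            /* models the page reference count */
	unsigned int offset;
	unsigned int size;
};

/* Hypothetical allocator hook: returns true if an order-N block is available. */
typedef bool (*try_alloc_fn)(int order);

/* Toy allocators, for illustration only. */
bool alloc_always(int order) { (void)order; return true; }
bool alloc_order0_only(int order) { return order == 0; }

bool frag_refill(struct frag_model *f, bool can_wait, try_alloc_fn try_alloc)
{
	int order;

	if (f->has_page) {
		if (f->users == 1) {
			/* We are the only user: reuse the block from the start. */
			f->offset = 0;
			return true;
		}
		if (f->offset < f->size)
			return true; /* room left in the current block */
		f->users--;          /* drop our reference to the full block */
		f->has_page = false;
	}

	/* Only callers that may sleep try the high-order block first. */
	order = can_wait ? MODEL_MAX_ORDER : 0;
	do {
		if (try_alloc(order)) {
			f->has_page = true;
			f->users = 1;
			f->offset = 0;
			f->size = MODEL_PAGE_SIZE << order;
			return true;
		}
	} while (--order >= 0);

	return false;
}
```

With alloc_order0_only as the allocator, a caller that may sleep still ends up
with a 4096-byte block after the larger orders fail, which is the fallback
behaviour of the do/while loop in the patch.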

^ permalink raw reply related

* Re: [PATCH] tcp: sysctl for initial receive window
From: Eric Dumazet @ 2012-09-21 15:25 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, Nandita Dukkipati
In-Reply-To: <20120921085502.4534.20232.stgit@dragon>

On Fri, 2012-09-21 at 10:55 +0200, Jesper Dangaard Brouer wrote:
> Make it possible to adjust the TCP default initial advertised receive
> window, via sysctl /proc/sys/net/ipv4/tcp_init_recv_window.
> 
> The window size is this value multiplied by the MSS of the connection.
> The default value is (still) 10, as described in commit 356f039822b
> (TCP: increase default initial receive window.)
> 
> Allow minimum value of 1, but recommend against setting value below 2
> in the documentation.
> 
> It's possible to control/override this value per route table entry via
> the iproute2 option initrwnd.  Having the global default exported via
> sysctl helps determine the default setting, and makes it easier to
> adjust.

I was wondering why it's not symmetric:

If we add a sysctl for the initial receive window, don't we need another
one for the initial send window?

Thanks
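
As a worked example of the quoted rule (window size = sysctl value multiplied
by the MSS), here is a minimal sketch. The function name is hypothetical, and
the clamp only mirrors the "minimum value of 1" from the quoted description;
it is not taken from the patch itself.

```c
#include <stdint.h>

/*
 * Initial advertised receive window in bytes, given the window expressed
 * in segments and the connection's MSS. Hypothetical helper, not kernel code.
 */
uint32_t init_rcv_window_bytes(uint32_t init_rwnd_segments, uint32_t mss)
{
	if (init_rwnd_segments < 1)
		init_rwnd_segments = 1; /* the proposal allows a minimum of 1 */
	return init_rwnd_segments * mss;
}
```

With the default of 10 and an MSS of 1460 bytes this gives 14600 bytes.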

^ permalink raw reply

* 3.6rc6 use-after-free in destroy_conntrack()
From: Dave Jones @ 2012-09-21 15:31 UTC (permalink / raw)
  To: netdev; +Cc: stefw, Fedora Kernel Team

We just had a report of this happening during shutdown.

There's a blurry photograph of the full trace here: https://bugzilla.redhat.com/attachment.cgi?id=615311

Rough transcription:

general protection fault
RIP: destroy_conntrack+0x88

RAX: 6b6b6b6b6b6b6b6b

trace:
 ? destroy_conntrack
 ? __nf_conntrack_find
 nf_conntrack_destroy
 ? nf_register_afinfo
 skb_release_head_state
 __kfree
 kfree
 arp_error_report
 ? neigh_parms_alloc
 neigh_invalidate
 ? neigh_parms_alloc
 neigh_timer_handler
 run_timer_softirq
 ? run_timer_softirq
 __do_softirq
 call_softirq
 do_softirq
 irq_exit
 smp_apic_timer_interrupt
 apic_timer_interrupt
 <EOI>
 ? mwait_idle
 ? mwait_idle
 cpu_idle
 start_secondary

Disassembly of the code line shows that the dereference is happening here
in destroy_conntrack:

        l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
    1403:       0f b6 b3 86 00 00 00    movzbl 0x86(%rbx),%esi
    140a:       0f b7 7b 72             movzwl 0x72(%rbx),%edi
    140e:       e8 00 00 00 00          callq  1413 <destroy_conntrack+0x83>
->       if (l4proto && l4proto->destroy)
    1413:       48 85 c0                test   %rax,%rax
    1416:       74 0e                   je     1426 <destroy_conntrack+0x96>
    1418:       48 8b 40 28             mov    0x28(%rax),%rax

'l4proto' seems to have been freed already, judging by the value in rax.

	Dave
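
The 0x6b pattern in RAX is the byte that slab debugging writes over freed
objects (POISON_FREE in include/linux/poison.h), which is why it points to a
use-after-free. A minimal user-space sketch of that check follows; the helper
name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define POISON_FREE_BYTE 0x6b /* value of POISON_FREE in include/linux/poison.h */

/* Returns true if every byte of a 64-bit register value is the free poison. */
bool looks_like_freed_slab(uint64_t value)
{
	for (int i = 0; i < 8; i++) {
		if (((value >> (8 * i)) & 0xff) != POISON_FREE_BYTE)
			return false;
	}
	return true;
}
```

A pointer with this value is also non-canonical on x86-64, so dereferencing it
raises a general protection fault rather than a page fault, matching the
report above.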

^ permalink raw reply

* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Eric Dumazet @ 2012-09-21 15:48 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, subramanian.vijay, netdev
In-Reply-To: <1348172722.2669.10.camel@edumazet-glaptop>

On Thu, 2012-09-20 at 22:25 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 13:06 -0700, Rick Jones wrote:
> 
> > 
> > Yes, I was being too fast and loose with my wording, paying more 
> > attention to the netperf tests than the rest of it.  While loopback may 
> > be lossless, TCP retransmissions over loopback shouldn't be all *that* 
> > surprising.
> 
> Sending perfect packets (large packets) should trigger no retransmits.

By the way, with the current MTU of 16436 on loopback, the max packet size is
48KB (3 MSS).

Using an MTU of 65536 allows another 25% increase in bulk performance...

(and less potential reordering effects, as a packet contains one MSS
instead of three)

There is probably a reason why lo default MTU is 16436 ?
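
The "48KB (3 MSS)" figure can be reproduced with a little arithmetic. The
sketch below rests on assumptions that are not stated in the mail: IPv4 with
20-byte IP and TCP headers, the 12-byte TCP timestamp option, and the
65535-byte limit of the IPv4 total-length field applied both to the MTU and to
the aggregated packet. The helper names are hypothetical.

```c
#include <stdint.h>

#define IP_MAX_DATAGRAM 65535u /* IPv4 total length is a 16-bit field */
#define HDR_OVERHEAD    52u    /* assumed: 20 (IP) + 20 (TCP) + 12 (timestamps) */

/* MSS for a given MTU, with the MTU clamped to the largest IPv4 datagram. */
uint32_t mss_for_mtu(uint32_t mtu)
{
	if (mtu > IP_MAX_DATAGRAM)
		mtu = IP_MAX_DATAGRAM;
	return mtu - HDR_OVERHEAD;
}

/* Number of whole MSS-sized segments that fit in one maximum-size packet. */
uint32_t segs_per_packet(uint32_t mtu)
{
	return (IP_MAX_DATAGRAM - HDR_OVERHEAD) / mss_for_mtu(mtu);
}
```

Under these assumptions an MTU of 16436 gives an MSS of 16384 and three
segments per packet (49152 bytes, i.e. 48KB), while an MTU of 65536 gives a
single segment per packet.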

^ permalink raw reply

* [PATCH 1/6] xen-netfront: handle backend CLOSED without CLOSING
From: David Vrabel @ 2012-09-21 16:04 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, Ian Campbell,
	netdev
In-Reply-To: <1348243464-15903-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>

Backend drivers shouldn't transition to CLOSED unless the frontend is
CLOSED.  If a backend does transition to CLOSED too soon then the
frontend may not see the CLOSING state and will not shut down properly.

So, treat an unexpected backend CLOSED state the same as CLOSING.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: <netdev@vger.kernel.org>
---
 drivers/net/xen-netfront.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3089990..843533a 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1719,7 +1719,6 @@ static void netback_changed(struct xenbus_device *dev,
 	case XenbusStateReconfiguring:
 	case XenbusStateReconfigured:
 	case XenbusStateUnknown:
-	case XenbusStateClosed:
 		break;
 
 	case XenbusStateInitWait:
@@ -1734,6 +1733,10 @@ static void netback_changed(struct xenbus_device *dev,
 		netif_notify_peers(netdev);
 		break;
 
+	case XenbusStateClosed:
+		if (dev->state == XenbusStateClosed)
+			break;
+		/* Missed the backend's CLOSING state -- fallthrough */
 	case XenbusStateClosing:
 		xenbus_frontend_closed(dev);
 		break;
-- 
1.7.2.5
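
The decision this patch adds to netback_changed() can be sketched in
isolation. The reduced enum and the function name below are hypothetical; the
real code uses the XenbusState* values and calls xenbus_frontend_closed().

```c
#include <stdbool.h>

/* Hypothetical, reduced set of states for illustration. */
enum xb_state { XB_CONNECTED, XB_CLOSING, XB_CLOSED };

/* Returns true when the frontend should run its close path. */
bool frontend_should_close(enum xb_state backend, enum xb_state frontend)
{
	switch (backend) {
	case XB_CLOSED:
		if (frontend == XB_CLOSED)
			return false; /* already closed, nothing to do */
		/* missed the backend's CLOSING state: fall through */
	case XB_CLOSING:
		return true;
	default:
		return false;
	}
}
```

The only behavioural change is the middle case: a backend that jumps straight
to CLOSED while the frontend is still open now triggers the same close path as
CLOSING.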

^ permalink raw reply related

* Re: [Patch net-next v2] l2tp: fix compile error when CONFIG_IPV6=m and CONFIG_L2TP=y
From: David Miller @ 2012-09-21 16:07 UTC (permalink / raw)
  To: amwang; +Cc: netdev
In-Reply-To: <1348209366-26978-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Fri, 21 Sep 2012 14:36:06 +0800

> When CONFIG_IPV6=m and CONFIG_L2TP=y, I got the following compile error:
...
> This is because l2tp uses symbols from IPV6, so when IPV6
> is a module, l2tp is not allowed to be builtin.
> 
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: David Miller @ 2012-09-21 16:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: rick.jones2, subramanian.vijay, netdev
In-Reply-To: <1348242511.2669.635.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 21 Sep 2012 17:48:31 +0200

> There is probably a reason why lo default MTU is 16436 ?

That's what fit into L1 caches back in 1999

^ permalink raw reply

* [PATCH net-next] be2net: Ignore spurious UE indication from NIC
From: Ajit Khaparde @ 2012-09-21 16:36 UTC (permalink / raw)
  To: davem; +Cc: netdev

Ignore spurious UE indications seen on some platforms.
Consider the error unrecoverable only when the bits
stay high during a second sampling.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h      |    2 ++
 drivers/net/ethernet/emulex/benet/be_main.c |   18 +++++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 5b622993..3d4a7bc 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -401,6 +401,8 @@ struct be_adapter {
 	bool eeh_error;
 	bool fw_timeout;
 	bool hw_error;
+	u32 ue_lo;
+	u32 ue_hi;
 
 	u32 port_num;
 	bool promiscuous;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 84379f4..e970f77 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2105,6 +2105,7 @@ void be_detect_error(struct be_adapter *adapter)
 	u32 ue_lo = 0, ue_hi = 0, ue_lo_mask = 0, ue_hi_mask = 0;
 	u32 sliport_status = 0, sliport_err1 = 0, sliport_err2 = 0;
 	u32 i;
+	struct device *dev = &adapter->pdev->dev;
 
 	if (be_crit_error(adapter))
 		return;
@@ -2129,13 +2130,22 @@ void be_detect_error(struct be_adapter *adapter)
 
 		ue_lo = (ue_lo & ~ue_lo_mask);
 		ue_hi = (ue_hi & ~ue_hi_mask);
+		if (ue_lo != adapter->ue_lo || ue_hi != adapter->ue_hi) {
+			dev_err(dev, "UE read: 0x%x/0x%x\n", ue_lo, ue_hi);
+			goto done;
+		}
+	}
+
+	if (ue_lo == 0xffffffff || ue_hi == 0xffffffff) {
+		adapter->eeh_error = true;
+		dev_err(dev, "PCI slot disconnected\n");
+		goto done;
 	}
 
 	if (ue_lo || ue_hi ||
 		sliport_status & SLIPORT_STATUS_ERR_MASK) {
 		adapter->hw_error = true;
-		dev_err(&adapter->pdev->dev,
-			"Error detected in the card\n");
+		dev_err(dev, "UE detected\n");
 	}
 
 	if (sliport_status & SLIPORT_STATUS_ERR_MASK) {
@@ -2162,7 +2172,9 @@ void be_detect_error(struct be_adapter *adapter)
 				"UE: %s bit set\n", ue_status_hi_desc[i]);
 		}
 	}
-
+done:
+	adapter->ue_lo = ue_lo;
+	adapter->ue_hi = ue_hi;
 }
 
 static void be_msix_disable(struct be_adapter *adapter)
-- 
1.7.9.5
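
The "second sampling" idea in this patch amounts to a two-sample debounce. A
minimal sketch, with a hypothetical ue_sampler struct standing in for the
ue_lo/ue_hi fields added to struct be_adapter, and leaving out the 0xffffffff
"slot disconnected" case:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the previously sampled (masked) UE bits. */
struct ue_sampler {
	uint32_t last_lo;
	uint32_t last_hi;
};

/*
 * Returns true only when the same non-zero UE bits were seen in two
 * consecutive samples; a value that differs from the last one is just
 * remembered for the next call.
 */
bool ue_confirmed(struct ue_sampler *s, uint32_t lo, uint32_t hi)
{
	bool stable = (lo == s->last_lo && hi == s->last_hi);

	s->last_lo = lo;
	s->last_hi = hi;
	return stable && (lo || hi);
}
```

A bit that shows up in one poll and is gone in the next never reports an
error, while a bit that persists is reported on the second poll.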

^ permalink raw reply related

* RE: Netfilter lacks ability to filter packets via Application-origin
From: Chad Gray @ 2012-09-21 16:39 UTC (permalink / raw)
  To: netdev@vger.kernel.org

No firewall appears to exist for Linux that can filter packets based on the originating application. Mac and Windows both offer such firewalls. Why can't Linux add this capability to its firewalls? It is a very powerful privacy, security, and awareness tool for the user.

Every attempt I've made to get this capability added by distributions, firewall makers, etc. has resulted in their telling me that the kernel does not support it, and that Linux won't be able to do this until the kernel does.

^ permalink raw reply

* Re: [PATCH] ath6kl: use list_move_tail instead of list_del/list_add_tail
From: Kalle Valo @ 2012-09-21 16:42 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: linville, yongjun_wei, linux-wireless, netdev, ath6kl-devel
In-Reply-To: <CAPgLHd-6RmrLpi9FPTFhXiA+BHywGknmvNAd-KCdHaM+tn0qgQ@mail.gmail.com>

On 09/05/2012 10:07 AM, Wei Yongjun wrote:
> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> Using list_move_tail() instead of list_del() + list_add_tail().
> 
> spatch with a semantic match was used to find this problem.
> (http://coccinelle.lip6.fr/)
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Thanks, applied to ath6kl.git.

Kalle
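
For reference, the equivalence the patch relies on can be shown with a small
user-space model of a circular doubly linked list. The node_* names are
hypothetical; the kernel's own helpers live in include/linux/list.h.

```c
/* Hypothetical minimal model of a circular doubly linked list node. */
struct node {
	struct node *prev;
	struct node *next;
};

void node_init(struct node *n)
{
	n->prev = n;
	n->next = n;
}

/* Unlink n from whatever list it is on. */
void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert n just before head, i.e. at the tail of head's list. */
void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* One call that does what the del + add_tail pair did. */
void node_move_tail(struct node *n, struct node *head)
{
	node_del(n);
	node_add_tail(n, head);
}
```

Moving an entry this way leaves the source list consistent and places the
entry last on the destination list, the same end state as the two separate
calls.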

^ permalink raw reply
