Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 07/17] sfc: Log specific message for failure of NVRAM self-test
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482916.2549.23.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:36 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 08/17] sfc: Read MEM_STAT for SRM_PERR as well as MEM_PERR errors
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482932.2549.24.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:52 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Parity errors in different blocks of SRAM may set one of two different
> interrupt flags.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 09/17] sfc: Enable IPv6 RSS using random key for Toeplitz hash
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482942.2549.25.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:02 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 10/17] sfc: Update MCDI protocol definitions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482954.2549.26.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:14 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 11/17] sfc: Set PERIODIC_NOEVENT flag for MC_CMD_MAC_STATS
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482972.2549.27.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:32 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> When set, an event is not sent whenever periodic MAC statistics are
> raised.  This avoids unnecessary wake-ups.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 12/17] sfc: Break NAPI processing after one ring-full of TX completions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482982.2549.28.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:42 +0100

> Currently TX completions do not count towards the NAPI budget.  This
> means a continuous stream of TX completions can cause the polling
> function to loop indefinitely with scheduling disabled.  To avoid
> this, follow the common practice of reporting the budget spent after
> processing one ring-full of TX completions.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 13/17] sfc: Add necessary parentheses to macro definitions in net_driver.h
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482990.2549.29.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:50 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 14/17] sfc: Clean up efx_nic::irq_zero_count
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483000.2549.30.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:00 +0100

> There is no need for this to be unsigned long; make it unsigned int.
> It does need a line in kernel-doc, so add that.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 15/17] sfc: Add Siena PHY BIST and cable diagnostic support
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483022.2549.31.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:22 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 16/17] sfc: Test only the first pair of TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483030.2549.32.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:30 +0100

> This makes no immediate difference, but we definitely do not want
> to test all TX queues once we allocate a pair of TX queues to each
> channel.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 17/17] sfc: Create multiple TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483043.2549.33.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:43 +0100

> Create a core TX queue and 2 hardware TX queues for each channel.
> If separate_tx_channels is set, create equal numbers of RX and TX
> channels instead.
> 
> Rewrite the channel and queue iteration macros accordingly.
> Eliminate efx_channel::used_flags as redundant.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* [PATCH v2] net/sb1250: register mdio bus in probe
From: Sebastian Andrzej Siewior @ 2010-04-28 19:57 UTC (permalink / raw)
  To: David Miller; +Cc: ralf, netdev
In-Reply-To: <20100427.155215.228945283.davem@davemloft.net>

"ifconfig eth0 up && ifconfig eth0 down" triggers:
| kobject (a8000000cfa5a480): tried to init an initialized object, something is seriously wrong.
| Call Trace:
| [<ffffffff8010aabc>] dump_stack+0x8/0x34
| [<ffffffff80293128>] kobject_init+0xe8/0xf0
| [<ffffffff802d922c>] device_initialize+0x2c/0x98
| [<ffffffff802d9cfc>] device_register+0x14/0x28
| [<ffffffff80312cd4>] mdiobus_register+0xdc/0x1e0
| [<ffffffff80314cf0>] sbmac_open+0x58/0x220
| [<ffffffff803519bc>] __dev_open+0x11c/0x180
| [<ffffffff8034d578>] __dev_change_flags+0x120/0x180
| [<ffffffff80351848>] dev_change_flags+0x20/0x78
| [<ffffffff803a753c>] devinet_ioctl+0x7cc/0x820
| [<ffffffff80339ac8>] sock_do_ioctl+0x38/0x90
| [<ffffffff8033a258>] compat_sock_ioctl_trans+0x408/0x1030
| [<ffffffff8033af30>] compat_sock_ioctl+0xb0/0xd0
| [<ffffffff80208b08>] compat_sys_ioctl+0xa0/0x18b8
| [<ffffffff80102f94>] handle_sys+0x114/0x130
|
| sb1250-mac-mdio: probed

mdiobus_register() calls device_register() which initializes the kobj of
the device. mdiobus_unregister() calls only device_del() so we have one
reference left. That one is leaving with mdiobus_free() which is only
called on remove.
Since I don't see any reason why mdiobus_register()/mdiobus_unregister()
should happen in ->open()/->close() I move them to probe & exit.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
v2: register mdio bus before registering netdev device

 drivers/net/sb1250-mac.c |   67 ++++++++++++++++++++++-----------------------
 1 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/drivers/net/sb1250-mac.c b/drivers/net/sb1250-mac.c
index 3f882d5..b0161b4 100644
--- a/drivers/net/sb1250-mac.c
+++ b/drivers/net/sb1250-mac.c
@@ -2256,17 +2256,36 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 
 	sc->mii_bus = mdiobus_alloc();
 	if (sc->mii_bus == NULL) {
-		sbmac_uninitctx(sc);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto uninit_ctx;
 	}
 
+	sc->mii_bus->name = sbmac_mdio_string;
+	snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
+	sc->mii_bus->priv = sc;
+	sc->mii_bus->read = sbmac_mii_read;
+	sc->mii_bus->write = sbmac_mii_write;
+	sc->mii_bus->irq = sc->phy_irq;
+	for (i = 0; i < PHY_MAX_ADDR; ++i)
+		sc->mii_bus->irq[i] = SBMAC_PHY_INT;
+
+	sc->mii_bus->parent = &pldev->dev;
+	/*
+	 * Probe PHY address
+	 */
+	err = mdiobus_register(sc->mii_bus);
+	if (err) {
+		printk(KERN_ERR "%s: unable to register MDIO bus\n",
+		       dev->name);
+		goto free_mdio;
+	}
+	dev_set_drvdata(&pldev->dev, sc->mii_bus);
+
 	err = register_netdev(dev);
 	if (err) {
 		printk(KERN_ERR "%s.%d: unable to register netdev\n",
 		       sbmac_string, idx);
-		mdiobus_free(sc->mii_bus);
-		sbmac_uninitctx(sc);
-		return err;
+		goto unreg_mdio;
 	}
 
 	pr_info("%s.%d: registered as %s\n", sbmac_string, idx, dev->name);
@@ -2282,19 +2301,15 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 	pr_info("%s: SiByte Ethernet at 0x%08Lx, address: %pM\n",
 	       dev->name, base, eaddr);
 
-	sc->mii_bus->name = sbmac_mdio_string;
-	snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
-	sc->mii_bus->priv = sc;
-	sc->mii_bus->read = sbmac_mii_read;
-	sc->mii_bus->write = sbmac_mii_write;
-	sc->mii_bus->irq = sc->phy_irq;
-	for (i = 0; i < PHY_MAX_ADDR; ++i)
-		sc->mii_bus->irq[i] = SBMAC_PHY_INT;
-
-	sc->mii_bus->parent = &pldev->dev;
-	dev_set_drvdata(&pldev->dev, sc->mii_bus);
-
 	return 0;
+unreg_mdio:
+	mdiobus_unregister(sc->mii_bus);
+	dev_set_drvdata(&pldev->dev, NULL);
+free_mdio:
+	mdiobus_free(sc->mii_bus);
+uninit_ctx:
+	sbmac_uninitctx(sc);
+	return err;
 }
 
 
@@ -2320,16 +2335,6 @@ static int sbmac_open(struct net_device *dev)
 		goto out_err;
 	}
 
-	/*
-	 * Probe PHY address
-	 */
-	err = mdiobus_register(sc->mii_bus);
-	if (err) {
-		printk(KERN_ERR "%s: unable to register MDIO bus\n",
-		       dev->name);
-		goto out_unirq;
-	}
-
 	sc->sbm_speed = sbmac_speed_none;
 	sc->sbm_duplex = sbmac_duplex_none;
 	sc->sbm_fc = sbmac_fc_none;
@@ -2360,11 +2365,7 @@ static int sbmac_open(struct net_device *dev)
 	return 0;
 
 out_unregister:
-	mdiobus_unregister(sc->mii_bus);
-
-out_unirq:
 	free_irq(dev->irq, dev);
-
 out_err:
 	return err;
 }
@@ -2553,9 +2554,6 @@ static int sbmac_close(struct net_device *dev)
 
 	phy_disconnect(sc->phy_dev);
 	sc->phy_dev = NULL;
-
-	mdiobus_unregister(sc->mii_bus);
-
 	free_irq(dev->irq, dev);
 
 	sbdma_emptyring(&(sc->sbm_txdma));
@@ -2663,6 +2661,7 @@ static int __exit sbmac_remove(struct platform_device *pldev)
 
 	unregister_netdev(dev);
 	sbmac_uninitctx(sc);
+	mdiobus_unregister(sc->mii_bus);
 	mdiobus_free(sc->mii_bus);
 	iounmap(sc->sbm_base);
 	free_netdev(dev);
-- 
1.6.6.1

^ permalink raw reply related

* Re: [PATCH net-next-2.6 1/7] caif: Ldisc add permission check and mem-alloc error check
From: David Miller @ 2010-04-28 20:02 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-1-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:34 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>    o Added permission checks for installing. CAP_SYS_ADMIN and
>      CAP_SYS_TTY_CONFIG can install the ldisc.
>    o Check if allocation of skb was successful.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/7] caif: Rename functions in cfcnfg and caif_dev
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-2-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:35 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>  o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
>  o Fixed typo cfcfg to cfcnfg
>  o Renamed linkid to channel_id
>  o Updated documentation in caif_dev.h
>  o Minor formatting changes
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 3/7] caif: Add reference counting to service layer
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-3-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:36 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
> o Added functions cfsrvl_get and cfsrvl_put.
> o Added support release_client to use by socket and net device.
> o Increase reference counting for in-flight packets from cfmuxl
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 4/7] caif: Disconnect without waiting for response
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-4-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:37 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
> o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
>   disconnect, not waiting for any response from the modem. Due to this
>   the function cfcnfg_linkdestroy_rsp does nothing anymore.
> o Because disconnect may take down a connection before a connect response
>   is received the function cfcnfg_linkup_rsp is checking if the client is
>   still waiting for the response, if not a disconnect request is sent to
>   the modem.
> o cfctrl is no longer keeping track of pending disconnect requests.
> o Added function cfctrl_cancel_req, which is used for deleting a pending
>   connect request if disconnect is done before connect response is received.
> o Removed unused function cfctrl_insert_req2
> o Added better handling of connect reject from modem.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 5/7] caif: Rewritten socket implementation
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-5-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:38 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>  This is a complete re-write of the socket layer. Making the socket
>  implementation more aligned with the other socket layers and using more
>  of the support functions available in sock.c. Lots of code is copied
>  from af_unix (and some from af_irda).
>  Non-blocking mode should be working as well.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 6/7] caif: Bugfixes in CAIF netdevice for close and flow control
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-6-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:39 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
> o Bugfix: Flow control was causing the device to be destroyed.
> o Bugfix: Handle CAIF channel connect failures.
> o If the underlying link layer is gone the net-device is no longer removed,
>   but closed.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 7/7] Bugfix: Link selection was swapped in switch.
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-7-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:40 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Paul E. McKenney @ 2010-04-28 20:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <1272483491.2201.9.camel@edumazet-laptop>

On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > This one occurred during the wakeup from suspend to RAM.
> > > 
> > > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > [  984.724700] ---------------------------------------------------
> > > [  984.724703] include/linux/fdtable.h:88 invoked
> > > rcu_dereference_check() without protection!
> > > [  984.724706]
> > > [  984.724707] other info that might help us debug this:
> > > [  984.724708]
> > > [  984.724711]
> > > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > [  984.724714] no locks held by dbus-daemon/4680.
> > > [  984.724717]
> > > [  984.724717] stack backtrace:
> > > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > [  984.724724] Call Trace:
> > > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > 
> > Hmmm...  I am not convinced that this is a false positive.  Couldn't
> > there be a multi-threaded process where one thread is invoking poll()
> > on a UNIX socket just as another thread is calling close() on it?
> > 
> > The current fcheck_files() logic requires that the caller either (1) be in
> > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > 
> > So I don't feel comfortable just slapping an RCU read-side critical
> > section around this one, at least not unless someone who understands
> > the locking says that doing so is OK.
> > 
> > 		
> 
> Its a single threaded program.
> 
> So fget_light() calls fcheck_files(files, fd); without rcu lock,
> but some /proc/pid/fd/... user temporarly raised files->count just
> before we perform the condition check.

So I should add a single-threaded check.  My first thought was to use
current_is_single_threaded(), but the bit about scanning the full list
of processes does give me pause.  However, thread_group_empty() looks
like a much lighter-weight alternative.

I believe that it is possible for a pair of single-threaded processes
to share a file descriptor, but that should not be a problem, as both
of them would need to close it for it to go away.

But what happens if someone does a clone() with CLONE_FILES, as some
of the AIO stuff seems to do?  Won't that allow one of the resulting
processes to close the file for both of them, even though both are
otherwise single-threaded?  And the ->count seems to be the only
distinction between these two cases.

And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
the check must scan the processes with current_is_single_threaded().
Besides which, a user could invoke clone() with only CLONE_FILES
specified, right?

Or am I just confused here?

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v3)
From: Vlad Yasevich @ 2010-04-28 20:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428193755.GF4818@hmsreliant.think-freely.org>



Neil Horman wrote:

... snip description ...

> 
> 
>  include/net/sctp/structs.h |    1 
>  net/sctp/sm_make_chunk.c   |   70 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 62 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index ff30177..597f8e2 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
>  			  struct iovec *data);
>  void sctp_chunk_free(struct sctp_chunk *);
>  void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
> +void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
>  struct sctp_chunk *sctp_chunkify(struct sk_buff *,
>  				 const struct sctp_association *,
>  				 struct sock *);
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..9623112 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
>  	cpu_to_be16(sizeof(struct sctp_paramhdr)),
>  };
>  
> -/* A helper to initialize to initialize an op error inside a
> +/* A helper to initialize an op error inside a
>   * provided chunk, as most cause codes will be embedded inside an
>   * abort chunk.
>   */
> @@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
>  	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
>  }
>  
> +/* A helper to initialize an op error inside a
> + * provided chunk, as most cause codes will be embedded inside an
> + * abort chunk.  Differs from sctp_init_cause in that it won't oops
> + * if there isn't enough space in the op error chunk
> + */
> +int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
> +		      size_t paylen)
> +{
> +	sctp_errhdr_t err;
> +	__u16 len;
> +
> +	/* Cause code constants are now defined in network order.  */
> +	err.cause = cause_code;
> +	len = sizeof(sctp_errhdr_t) + paylen;
> +	err.length  = htons(len);
> +
> +	if (skb_tailroom(chunk->skb) >  len)
> +		return -ENOSPC;
> +	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
> +						     sizeof(sctp_errhdr_t),
> +						     &err);
> +	return 0;
> +}
>  /* 3.3.2 Initiation (INIT) (1)
>   *
>   * This chunk is used to initiate a SCTP association between two
> @@ -1131,6 +1154,24 @@ nodata:
>  	return retval;
>  }
>  
> +/* Create an Operation Error chunk of a fixed size,
> + * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
> + * This is a helper function to allocate an error chunk for
> + * for those invalid parameter codes in which we may not want
> + * to report all the errors, if the incomming chunk is large
> + */
> +static inline struct sctp_chunk *sctp_make_op_error_fixed(
> +	const struct sctp_association *asoc,
> +	const struct sctp_chunk *chunk)
> +{
> +	size_t size = asoc ? asoc->pathmtu : 0;
> +
> +	if (size > SCTP_DEFAULT_MAXSEGMENT)
> +		size = SCTP_DEFAULT_MAXSEGMENT;
> +

This doesn't look right.  If you don't have an association or if pmtu is
not initialized, you will end up with a size of 0.  I think you simply
want to a
	if (!size)

there.


> +	return sctp_make_op_error_space(asoc, chunk, size);
> +}
> +
>  /* Create an Operation Error chunk.  */
>  struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
>  				 const struct sctp_chunk *chunk,
> @@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
>  	return target;
>  }
>  
> +/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
> + * space in the chunk
> + */
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
> +			     int len, const void *data)
> +{
> +	if (skb_tailroom(chunk->skb) > len)
> +		return sctp_addto_chunk(chunk, len, data);
> +	else
> +		return NULL;
> +}
> +
>  /* Append bytes from user space to the end of a chunk.  Will panic if
>   * chunk is not big enough.
>   * Returns a kernel err value.
> @@ -1793,9 +1846,9 @@ static int sctp_process_missing_param(const struct sctp_association *asoc,
>  	if (*errp) {
>  		report.num_missing = htonl(1);
>  		report.type = paramtype;
> -		sctp_init_cause(*errp, SCTP_ERROR_MISS_PARAM,
> -				sizeof(report));
> -		sctp_addto_chunk(*errp, sizeof(report), &report);
> +		sctp_init_cause_fixed(*errp, SCTP_ERROR_MISS_PARAM,
> +				      sizeof(report));
> +		sctp_addto_chunk_fixed(*errp, sizeof(report), &report);
>  	}
>  
>  	/* Stop processing this chunk. */
> @@ -1813,7 +1866,7 @@ static int sctp_process_inv_mandatory(const struct sctp_association *asoc,
>  		*errp = sctp_make_op_error_space(asoc, chunk, 0);
>  
>  	if (*errp)
> -		sctp_init_cause(*errp, SCTP_ERROR_INV_PARAM, 0);
> +		sctp_init_cause_fixed(*errp, SCTP_ERROR_INV_PARAM, 0);
>  
>  	/* Stop processing this chunk. */
>  	return 0;

I don't think missing or mandatory parameters are effected by this.  They are a
once and done deal.  There don't report multiple errors and we don't add any
more error after them.

> @@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
>  		 * returning multiple unknown parameters.
>  		 */
>  		if (NULL == *errp)
> -			*errp = sctp_make_op_error_space(asoc, chunk,
> -					ntohs(chunk->chunk_hdr->length));
> +			*errp = sctp_make_op_error_fixed(asoc, chunk);
>  
>  		if (*errp) {
> -			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> +			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
>  					WORD_ROUND(ntohs(param.p->length)));
> -			sctp_addto_chunk(*errp,
> +			sctp_addto_chunk_fixed(*errp,
>  					WORD_ROUND(ntohs(param.p->length)),
>  					param.v);
>  		} else {
> 

So we completely get rid of variable size error chunk in this case, which I can
live with.  It simplifies the code.

-vlad

^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Eric Dumazet @ 2010-04-28 20:18 UTC (permalink / raw)
  To: paulmck
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <20100428200904.GS2540@linux.vnet.ibm.com>

Le mercredi 28 avril 2010 à 13:09 -0700, Paul E. McKenney a écrit :
> On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> > Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > > This one occurred during the wakeup from suspend to RAM.
> > > > 
> > > > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > [  984.724700] ---------------------------------------------------
> > > > [  984.724703] include/linux/fdtable.h:88 invoked
> > > > rcu_dereference_check() without protection!
> > > > [  984.724706]
> > > > [  984.724707] other info that might help us debug this:
> > > > [  984.724708]
> > > > [  984.724711]
> > > > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > > [  984.724714] no locks held by dbus-daemon/4680.
> > > > [  984.724717]
> > > > [  984.724717] stack backtrace:
> > > > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > > [  984.724724] Call Trace:
> > > > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > > 
> > > Hmmm...  I am not convinced that this is a false positive.  Couldn't
> > > there be a multi-threaded process where one thread is invoking poll()
> > > on a UNIX socket just as another thread is calling close() on it?
> > > 
> > > The current fcheck_files() logic requires that the caller either (1) be in
> > > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > > 
> > > So I don't feel comfortable just slapping an RCU read-side critical
> > > section around this one, at least not unless someone who understands
> > > the locking says that doing so is OK.
> > > 
> > > 		
> > 
> > Its a single threaded program.
> > 
> > So fget_light() calls fcheck_files(files, fd); without rcu lock,
> > but some /proc/pid/fd/... user temporarly raised files->count just
> > before we perform the condition check.
> 
> So I should add a single-threaded check.  My first thought was to use
> current_is_single_threaded(), but the bit about scanning the full list
> of processes does give me pause.  However, thread_group_empty() looks
> like a much lighter-weight alternative.
> 
> I believe that it is possible for a pair of single-threaded processes
> to share a file descriptor, but that should not be a problem, as both
> of them would need to close it for it to go away.
> 
> But what happens if someone does a clone() with CLONE_FILES, as some
> of the AIO stuff seems to do?  Won't that allow one of the resulting
> processes to close the file for both of them, even though both are
> otherwise single-threaded?  And the ->count seems to be the only
> distinction between these two cases.
> 
> And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
> the check must scan the processes with current_is_single_threaded().
> Besides which, a user could invoke clone() with only CLONE_FILES
> specified, right?
> 
> Or am I just confused here?
> 
> 							Thanx, Paul

If a program is mono threaded, and doing a fget_light() syscall, it
cannot possibly do a clone() in // ;)

If we want to be picky, we could add a user provided condition, aka "we
are sure we are allowed to do this because we are the owner of the files
struct".






diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 6da962c..027f5e1 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2694,7 +2694,7 @@ void __do_SAK(struct tty_struct *tty)
 			spin_lock(&p->files->file_lock);
 			fdt = files_fdtable(p->files);
 			for (i = 0; i < fdt->max_fds; i++) {
-				filp = fcheck_files(p->files, i);
+				filp = fcheck_files(p->files, i, false);
 				if (!filp)
 					continue;
 				if (filp->f_op->read == tty_read &&
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 452d02f..dabf4d8 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -119,7 +119,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
 		int retval = oldfd;
 
 		rcu_read_lock();
-		if (!fcheck_files(files, oldfd))
+		if (!fcheck_files(files, oldfd, false))
 			retval = -EBADF;
 		rcu_read_unlock();
 		return retval;
diff --git a/fs/file_table.c b/fs/file_table.c
index 32d12b7..2865f72 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -274,7 +274,7 @@ struct file *fget(unsigned int fd)
 	struct files_struct *files = current->files;
 
 	rcu_read_lock();
-	file = fcheck_files(files, fd);
+	file = fcheck_files(files, fd, false);
 	if (file) {
 		if (!atomic_long_inc_not_zero(&file->f_count)) {
 			/* File object ref couldn't be taken */
@@ -303,10 +303,10 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
 
 	*fput_needed = 0;
 	if (likely((atomic_read(&files->count) == 1))) {
-		file = fcheck_files(files, fd);
+		file = fcheck_files(files, fd, true);
 	} else {
 		rcu_read_lock();
-		file = fcheck_files(files, fd);
+		file = fcheck_files(files, fd, false);
 		if (file) {
 			if (atomic_long_inc_not_zero(&file->f_count))
 				*fput_needed = 1;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 8418fcc..0e89448 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1716,7 +1716,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
 		 * hold ->file_lock.
 		 */
 		spin_lock(&files->file_lock);
-		file = fcheck_files(files, fd);
+		file = fcheck_files(files, fd, false);
 		if (file) {
 			if (path) {
 				*path = file->f_path;
@@ -1755,7 +1755,7 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 		files = get_files_struct(task);
 		if (files) {
 			rcu_read_lock();
-			if (fcheck_files(files, fd)) {
+			if (fcheck_files(files, fd, false)) {
 				rcu_read_unlock();
 				put_files_struct(files);
 				if (task_dumpable(task)) {
@@ -1813,7 +1813,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	 * hold ->file_lock.
 	 */
 	spin_lock(&files->file_lock);
-	file = fcheck_files(files, fd);
+	file = fcheck_files(files, fd, false);
 	if (!file)
 		goto out_unlock;
 	if (file->f_mode & FMODE_READ)
@@ -1899,7 +1899,7 @@ static int proc_readfd_common(struct file * filp, void * dirent,
 				char name[PROC_NUMBUF];
 				int len;
 
-				if (!fcheck_files(files, fd))
+				if (!fcheck_files(files, fd, false))
 					continue;
 				rcu_read_unlock();
 
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 013dc52..76423ad 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -57,11 +57,12 @@ struct files_struct {
 	struct file * fd_array[NR_OPEN_DEFAULT];
 };
 
-#define rcu_dereference_check_fdtable(files, fdtfd) \
+#define rcu_dereference_check_fdtable(files, fdtfd, cond) \
 	(rcu_dereference_check((fdtfd), \
 			       rcu_read_lock_held() || \
 			       lockdep_is_held(&(files)->file_lock) || \
-			       atomic_read(&(files)->count) == 1))
+			       atomic_read(&(files)->count) == 1 || \
+			       cond))
 
 #define files_fdtable(files) \
 		(rcu_dereference_check_fdtable((files), (files)->fdt))
@@ -79,13 +80,13 @@ static inline void free_fdtable(struct fdtable *fdt)
 	call_rcu(&fdt->rcu, free_fdtable_rcu);
 }
 
-static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
+static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd, bool cond)
 {
 	struct file * file = NULL;
 	struct fdtable *fdt = files_fdtable(files);
 
 	if (fd < fdt->max_fds)
-		file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
+		file = rcu_dereference_check_fdtable(files, fdt->fd[fd], cond);
 	return file;
 }
 



^ permalink raw reply related

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: Neil Horman @ 2010-04-28 20:30 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <4BD897B6.5040405@hp.com>

Ok, version 4

Change Notes:
1) Minor cleanups, from Vlads notes

Summary:


Hey-
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d 
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490 
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004 
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

I've tested the below fix and confirmed that it fixes the issue.  We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report.  Tested by me successfully

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/sctp/structs.h |    1 
 net/sctp/sm_make_chunk.c   |   62 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 5 deletions(-)


diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ff30177..597f8e2 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
 			  struct iovec *data);
 void sctp_chunk_free(struct sctp_chunk *);
 void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
+void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
 struct sctp_chunk *sctp_chunkify(struct sk_buff *,
 				 const struct sctp_association *,
 				 struct sock *);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f592163..2971731 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
 	cpu_to_be16(sizeof(struct sctp_paramhdr)),
 };
 
-/* A helper to initialize to initialize an op error inside a
+/* A helper to initialize an op error inside a
  * provided chunk, as most cause codes will be embedded inside an
  * abort chunk.
  */
@@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
 	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
 }
 
+/* A helper to initialize an op error inside a
+ * provided chunk, as most cause codes will be embedded inside an
+ * abort chunk.  Differs from sctp_init_cause in that it won't oops
+ * if there isn't enough space in the op error chunk
+ */
+int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
+		      size_t paylen)
+{
+	sctp_errhdr_t err;
+	__u16 len;
+
+	/* Cause code constants are now defined in network order.  */
+	err.cause = cause_code;
+	len = sizeof(sctp_errhdr_t) + paylen;
+	err.length  = htons(len);
+
+	if (skb_tailroom(chunk->skb) >  len)
+		return -ENOSPC;
+	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
+						     sizeof(sctp_errhdr_t),
+						     &err);
+	return 0;
+}
 /* 3.3.2 Initiation (INIT) (1)
  *
  * This chunk is used to initiate a SCTP association between two
@@ -1131,6 +1154,24 @@ nodata:
 	return retval;
 }
 
+/* Create an Operation Error chunk of a fixed size,
+ * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
+ * This is a helper function to allocate an error chunk for
+ * for those invalid parameter codes in which we may not want
+ * to report all the errors, if the incomming chunk is large
+ */
+static inline struct sctp_chunk *sctp_make_op_error_fixed(
+	const struct sctp_association *asoc,
+	const struct sctp_chunk *chunk)
+{
+	size_t size = asoc ? asoc->pathmtu : 0;
+
+	if (!size)
+		size = SCTP_DEFAULT_MAXSEGMENT;
+
+	return sctp_make_op_error_space(asoc, chunk, size);
+}
+
 /* Create an Operation Error chunk.  */
 struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
 				 const struct sctp_chunk *chunk,
@@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
 	return target;
 }
 
+/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
+ * space in the chunk
+ */
+void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
+			     int len, const void *data)
+{
+	if (skb_tailroom(chunk->skb) > len)
+		return sctp_addto_chunk(chunk, len, data);
+	else
+		return NULL;
+}
+
 /* Append bytes from user space to the end of a chunk.  Will panic if
  * chunk is not big enough.
  * Returns a kernel err value.
@@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
 		 * returning multiple unknown parameters.
 		 */
 		if (NULL == *errp)
-			*errp = sctp_make_op_error_space(asoc, chunk,
-					ntohs(chunk->chunk_hdr->length));
+			*errp = sctp_make_op_error_fixed(asoc, chunk);
 
 		if (*errp) {
-			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
+			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
 					WORD_ROUND(ntohs(param.p->length)));
-			sctp_addto_chunk(*errp,
+			sctp_addto_chunk_fixed(*errp,
 					WORD_ROUND(ntohs(param.p->length)),
 					param.v);
 		} else {

^ permalink raw reply related

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: Vlad Yasevich @ 2010-04-28 20:37 UTC (permalink / raw)
  To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428203059.GG4818@hmsreliant.think-freely.org>


Looks good.

Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>

-vlad

Neil Horman wrote:
> Ok, version 4
> 
> Change Notes:
> 1) Minor cleanups, from Vlads notes
> 
> Summary:
> 
> 
> Hey-
> 	Recently, it was reported to me that the kernel could oops in the
> following way:
> 
> <5> kernel BUG at net/core/skbuff.c:91!
> <5> invalid operand: 0000 [#1]
> <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
> ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
> vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
> ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
> snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
> pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
> mptbase sd_mod scsi_mod
> <5> CPU:    0
> <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
> <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
> <5> EIP is at skb_over_panic+0x1f/0x2d
> <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
> <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
> <5> ds: 007b   es: 007b   ss: 0068
> <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
> <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
> e0c2947d 
> <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
> df653490 
> <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
> 00000004 
> <5> Call Trace:
> <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
> <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
> <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
> <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
> <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
> <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
> <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
> <5>  [<c01555a4>] cache_grow+0x140/0x233
> <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
> <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
> <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
> <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
> <5>  [<c02d005e>] nf_iterate+0x40/0x81
> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
> <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
> <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
> <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
> <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
> <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
> <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
> <5>  [<c012a7b1>] __do_softirq+0x35/0x79
> <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
> <5>  [<c01094de>] do_softirq+0x46/0x4d
> 
> Its an skb_over_panic BUG halt that results from processing an init chunk in
> which too many of its variable length parameters are in some way malformed.
> 
> The problem is in sctp_process_unk_param:
> if (NULL == *errp)
> 	*errp = sctp_make_op_error_space(asoc, chunk,
> 					 ntohs(chunk->chunk_hdr->length));
> 
> 	if (*errp) {
> 		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> 				 WORD_ROUND(ntohs(param.p->length)));
> 		sctp_addto_chunk(*errp,
> 			WORD_ROUND(ntohs(param.p->length)),
> 				  param.v);
> 
> When we allocate an error chunk, we assume that the worst case scenario requires
> that we have chunk_hdr->length data allocated, which would be correct nominally,
> given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
> we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
> chunk, so the worst case situation in which all parameters are in violation
> requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
> 
> The result of this error is that a deliberately malformed packet sent to a
> listening host can cause a remote DOS, described in CVE-2010-1173:
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
> 
> I've tested the below fix and confirmed that it fixes the issue.  We move to a
> strategy whereby we allocate a fixed size error chunk and ignore errors we don't
> have space to report.  Tested by me successfully
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  include/net/sctp/structs.h |    1 
>  net/sctp/sm_make_chunk.c   |   62 +++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 58 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index ff30177..597f8e2 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
>  			  struct iovec *data);
>  void sctp_chunk_free(struct sctp_chunk *);
>  void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
> +void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
>  struct sctp_chunk *sctp_chunkify(struct sk_buff *,
>  				 const struct sctp_association *,
>  				 struct sock *);
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..2971731 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
>  	cpu_to_be16(sizeof(struct sctp_paramhdr)),
>  };
>  
> -/* A helper to initialize to initialize an op error inside a
> +/* A helper to initialize an op error inside a
>   * provided chunk, as most cause codes will be embedded inside an
>   * abort chunk.
>   */
> @@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
>  	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
>  }
>  
> +/* A helper to initialize an op error inside a
> + * provided chunk, as most cause codes will be embedded inside an
> + * abort chunk.  Differs from sctp_init_cause in that it won't oops
> + * if there isn't enough space in the op error chunk
> + */
> +int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
> +		      size_t paylen)
> +{
> +	sctp_errhdr_t err;
> +	__u16 len;
> +
> +	/* Cause code constants are now defined in network order.  */
> +	err.cause = cause_code;
> +	len = sizeof(sctp_errhdr_t) + paylen;
> +	err.length  = htons(len);
> +
> +	if (skb_tailroom(chunk->skb) >  len)
> +		return -ENOSPC;
> +	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
> +						     sizeof(sctp_errhdr_t),
> +						     &err);
> +	return 0;
> +}
>  /* 3.3.2 Initiation (INIT) (1)
>   *
>   * This chunk is used to initiate a SCTP association between two
> @@ -1131,6 +1154,24 @@ nodata:
>  	return retval;
>  }
>  
> +/* Create an Operation Error chunk of a fixed size,
> + * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
> + * This is a helper function to allocate an error chunk for
> + * for those invalid parameter codes in which we may not want
> + * to report all the errors, if the incomming chunk is large
> + */
> +static inline struct sctp_chunk *sctp_make_op_error_fixed(
> +	const struct sctp_association *asoc,
> +	const struct sctp_chunk *chunk)
> +{
> +	size_t size = asoc ? asoc->pathmtu : 0;
> +
> +	if (!size)
> +		size = SCTP_DEFAULT_MAXSEGMENT;
> +
> +	return sctp_make_op_error_space(asoc, chunk, size);
> +}
> +
>  /* Create an Operation Error chunk.  */
>  struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
>  				 const struct sctp_chunk *chunk,
> @@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
>  	return target;
>  }
>  
> +/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
> + * space in the chunk
> + */
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
> +			     int len, const void *data)
> +{
> +	if (skb_tailroom(chunk->skb) > len)
> +		return sctp_addto_chunk(chunk, len, data);
> +	else
> +		return NULL;
> +}
> +
>  /* Append bytes from user space to the end of a chunk.  Will panic if
>   * chunk is not big enough.
>   * Returns a kernel err value.
> @@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
>  		 * returning multiple unknown parameters.
>  		 */
>  		if (NULL == *errp)
> -			*errp = sctp_make_op_error_space(asoc, chunk,
> -					ntohs(chunk->chunk_hdr->length));
> +			*errp = sctp_make_op_error_fixed(asoc, chunk);
>  
>  		if (*errp) {
> -			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> +			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
>  					WORD_ROUND(ntohs(param.p->length)));
> -			sctp_addto_chunk(*errp,
> +			sctp_addto_chunk_fixed(*errp,
>  					WORD_ROUND(ntohs(param.p->length)),
>  					param.v);
>  		} else {
> 

^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Paul E. McKenney @ 2010-04-28 20:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <1272485923.2201.22.camel@edumazet-laptop>

On Wed, Apr 28, 2010 at 10:18:43PM +0200, Eric Dumazet wrote:
> Le mercredi 28 avril 2010 à 13:09 -0700, Paul E. McKenney a écrit :
> > On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> > > Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > > > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > > > This one occurred during the wakeup from suspend to RAM.
> > > > > 
> > > > > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > > [  984.724700] ---------------------------------------------------
> > > > > [  984.724703] include/linux/fdtable.h:88 invoked
> > > > > rcu_dereference_check() without protection!
> > > > > [  984.724706]
> > > > > [  984.724707] other info that might help us debug this:
> > > > > [  984.724708]
> > > > > [  984.724711]
> > > > > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > > > [  984.724714] no locks held by dbus-daemon/4680.
> > > > > [  984.724717]
> > > > > [  984.724717] stack backtrace:
> > > > > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > > > [  984.724724] Call Trace:
> > > > > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > > > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > > > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > > > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > > > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > > > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > > > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > > > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > > > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > > > 
> > > > Hmmm...  I am not convinced that this is a false positive.  Couldn't
> > > > there be a multi-threaded process where one thread is invoking poll()
> > > > on a UNIX socket just as another thread is calling close() on it?
> > > > 
> > > > The current fcheck_files() logic requires that the caller either (1) be in
> > > > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > > > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > > > 
> > > > So I don't feel comfortable just slapping an RCU read-side critical
> > > > section around this one, at least not unless someone who understands
> > > > the locking says that doing so is OK.
> > > > 
> > > > 		
> > > 
> > > Its a single threaded program.
> > > 
> > > So fget_light() calls fcheck_files(files, fd); without rcu lock,
> > > but some /proc/pid/fd/... user temporarly raised files->count just
> > > before we perform the condition check.
> > 
> > So I should add a single-threaded check.  My first thought was to use
> > current_is_single_threaded(), but the bit about scanning the full list
> > of processes does give me pause.  However, thread_group_empty() looks
> > like a much lighter-weight alternative.
> > 
> > I believe that it is possible for a pair of single-threaded processes
> > to share a file descriptor, but that should not be a problem, as both
> > of them would need to close it for it to go away.
> > 
> > But what happens if someone does a clone() with CLONE_FILES, as some
> > of the AIO stuff seems to do?  Won't that allow one of the resulting
> > processes to close the file for both of them, even though both are
> > otherwise single-threaded?  And the ->count seems to be the only
> > distinction between these two cases.
> > 
> > And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
> > the check must scan the processes with current_is_single_threaded().
> > Besides which, a user could invoke clone() with only CLONE_FILES
> > specified, right?
> > 
> > Or am I just confused here?
> > 
> > 							Thanx, Paul
> 
> If a program is mono threaded, and doing a fget_light() syscall, it
> cannot possibly do a clone() in // ;)

The sequence of events that I am worried about is as follows:

1.	Single-threaded process does clone(CLONE_FILES).  The
	result is a pair of single-threaded processes that share
	file descriptors.

2.	One of these processes does files_fdtable(i) at the same
	time as the other process closes file descriptor i.

So, clone and -then- do fget_light().

> If we want to be picky, we could add a user provided condition, aka "we
> are sure we are allowed to do this because we are the owner of the files
> struct".

But yes, if I understand your trick below, the race conditions from
the above sequence of events would simply force the processes off
of the fget_light() path, which should be OK.

						Thanx, Paul

> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index 6da962c..027f5e1 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -2694,7 +2694,7 @@ void __do_SAK(struct tty_struct *tty)
>  			spin_lock(&p->files->file_lock);
>  			fdt = files_fdtable(p->files);
>  			for (i = 0; i < fdt->max_fds; i++) {
> -				filp = fcheck_files(p->files, i);
> +				filp = fcheck_files(p->files, i, false);
>  				if (!filp)
>  					continue;
>  				if (filp->f_op->read == tty_read &&
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 452d02f..dabf4d8 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -119,7 +119,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
>  		int retval = oldfd;
> 
>  		rcu_read_lock();
> -		if (!fcheck_files(files, oldfd))
> +		if (!fcheck_files(files, oldfd, false))
>  			retval = -EBADF;
>  		rcu_read_unlock();
>  		return retval;
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 32d12b7..2865f72 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -274,7 +274,7 @@ struct file *fget(unsigned int fd)
>  	struct files_struct *files = current->files;
> 
>  	rcu_read_lock();
> -	file = fcheck_files(files, fd);
> +	file = fcheck_files(files, fd, false);
>  	if (file) {
>  		if (!atomic_long_inc_not_zero(&file->f_count)) {
>  			/* File object ref couldn't be taken */
> @@ -303,10 +303,10 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
> 
>  	*fput_needed = 0;
>  	if (likely((atomic_read(&files->count) == 1))) {
> -		file = fcheck_files(files, fd);
> +		file = fcheck_files(files, fd, true);
>  	} else {
>  		rcu_read_lock();
> -		file = fcheck_files(files, fd);
> +		file = fcheck_files(files, fd, false);
>  		if (file) {
>  			if (atomic_long_inc_not_zero(&file->f_count))
>  				*fput_needed = 1;
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 8418fcc..0e89448 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1716,7 +1716,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
>  		 * hold ->file_lock.
>  		 */
>  		spin_lock(&files->file_lock);
> -		file = fcheck_files(files, fd);
> +		file = fcheck_files(files, fd, false);
>  		if (file) {
>  			if (path) {
>  				*path = file->f_path;
> @@ -1755,7 +1755,7 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
>  		files = get_files_struct(task);
>  		if (files) {
>  			rcu_read_lock();
> -			if (fcheck_files(files, fd)) {
> +			if (fcheck_files(files, fd, false)) {
>  				rcu_read_unlock();
>  				put_files_struct(files);
>  				if (task_dumpable(task)) {
> @@ -1813,7 +1813,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
>  	 * hold ->file_lock.
>  	 */
>  	spin_lock(&files->file_lock);
> -	file = fcheck_files(files, fd);
> +	file = fcheck_files(files, fd, false);
>  	if (!file)
>  		goto out_unlock;
>  	if (file->f_mode & FMODE_READ)
> @@ -1899,7 +1899,7 @@ static int proc_readfd_common(struct file * filp, void * dirent,
>  				char name[PROC_NUMBUF];
>  				int len;
> 
> -				if (!fcheck_files(files, fd))
> +				if (!fcheck_files(files, fd, false))
>  					continue;
>  				rcu_read_unlock();
> 
> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index 013dc52..76423ad 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -57,11 +57,12 @@ struct files_struct {
>  	struct file * fd_array[NR_OPEN_DEFAULT];
>  };
> 
> -#define rcu_dereference_check_fdtable(files, fdtfd) \
> +#define rcu_dereference_check_fdtable(files, fdtfd, cond) \
>  	(rcu_dereference_check((fdtfd), \
>  			       rcu_read_lock_held() || \
>  			       lockdep_is_held(&(files)->file_lock) || \
> -			       atomic_read(&(files)->count) == 1))
> +			       atomic_read(&(files)->count) == 1 || \
> +			       cond))
> 
>  #define files_fdtable(files) \
>  		(rcu_dereference_check_fdtable((files), (files)->fdt))
> @@ -79,13 +80,13 @@ static inline void free_fdtable(struct fdtable *fdt)
>  	call_rcu(&fdt->rcu, free_fdtable_rcu);
>  }
> 
> -static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
> +static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd, bool cond)
>  {
>  	struct file * file = NULL;
>  	struct fdtable *fdt = files_fdtable(files);
> 
>  	if (fd < fdt->max_fds)
> -		file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
> +		file = rcu_dereference_check_fdtable(files, fdt->fd[fd], cond);
>  	return file;
>  }
> 
> 
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox