* Re: [PATCH 07/17] sfc: Log specific message for failure of NVRAM self-test
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482916.2549.23.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:36 +0100
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 08/17] sfc: Read MEM_STAT for SRM_PERR as well as MEM_PERR errors
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482932.2549.24.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:52 +0100
> From: Steve Hodgson <shodgson@solarflare.com>
>
> Parity errors in different blocks of SRAM may set one of two different
> interrupt flags.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 09/17] sfc: Enable IPv6 RSS using random key for Toeplitz hash
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482942.2549.25.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:02 +0100
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 10/17] sfc: Update MCDI protocol definitions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482954.2549.26.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:14 +0100
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 11/17] sfc: Set PERIODIC_NOEVENT flag for MC_CMD_MAC_STATS
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482972.2549.27.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:32 +0100
> From: Steve Hodgson <shodgson@solarflare.com>
>
> When set, an event is not sent whenever periodic MAC statistics are
> raised. This avoids unnecessary wake-ups.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 12/17] sfc: Break NAPI processing after one ring-full of TX completions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482982.2549.28.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:42 +0100
> Currently TX completions do not count towards the NAPI budget. This
> means a continuous stream of TX completions can cause the polling
> function to loop indefinitely with scheduling disabled. To avoid
> this, follow the common practice of reporting the budget spent after
> processing one ring-full of TX completions.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 13/17] sfc: Add necessary parentheses to macro definitions in net_driver.h
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482990.2549.29.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:50 +0100
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 14/17] sfc: Clean up efx_nic::irq_zero_count
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483000.2549.30.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:00 +0100
> There is no need for this to be unsigned long; make it unsigned int.
> It does need a line in kernel-doc, so add that.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 15/17] sfc: Add Siena PHY BIST and cable diagnostic support
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483022.2549.31.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:22 +0100
> From: Steve Hodgson <shodgson@solarflare.com>
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 16/17] sfc: Test only the first pair of TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483030.2549.32.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:30 +0100
> This makes no immediate difference, but we definitely do not want
> to test all TX queues once we allocate a pair of TX queues to each
> channel.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH 17/17] sfc: Create multiple TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483043.2549.33.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:43 +0100
> Create a core TX queue and 2 hardware TX queues for each channel.
> If separate_tx_channels is set, create equal numbers of RX and TX
> channels instead.
>
> Rewrite the channel and queue iteration macros accordingly.
> Eliminate efx_channel::used_flags as redundant.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* [PATCH v2] net/sb1250: register mdio bus in probe
From: Sebastian Andrzej Siewior @ 2010-04-28 19:57 UTC (permalink / raw)
To: David Miller; +Cc: ralf, netdev
In-Reply-To: <20100427.155215.228945283.davem@davemloft.net>
"ifconfig eth0 up && ifconfig eth0 down" triggers:
| kobject (a8000000cfa5a480): tried to init an initialized object, something is seriously wrong.
| Call Trace:
| [<ffffffff8010aabc>] dump_stack+0x8/0x34
| [<ffffffff80293128>] kobject_init+0xe8/0xf0
| [<ffffffff802d922c>] device_initialize+0x2c/0x98
| [<ffffffff802d9cfc>] device_register+0x14/0x28
| [<ffffffff80312cd4>] mdiobus_register+0xdc/0x1e0
| [<ffffffff80314cf0>] sbmac_open+0x58/0x220
| [<ffffffff803519bc>] __dev_open+0x11c/0x180
| [<ffffffff8034d578>] __dev_change_flags+0x120/0x180
| [<ffffffff80351848>] dev_change_flags+0x20/0x78
| [<ffffffff803a753c>] devinet_ioctl+0x7cc/0x820
| [<ffffffff80339ac8>] sock_do_ioctl+0x38/0x90
| [<ffffffff8033a258>] compat_sock_ioctl_trans+0x408/0x1030
| [<ffffffff8033af30>] compat_sock_ioctl+0xb0/0xd0
| [<ffffffff80208b08>] compat_sys_ioctl+0xa0/0x18b8
| [<ffffffff80102f94>] handle_sys+0x114/0x130
|
| sb1250-mac-mdio: probed
mdiobus_register() calls device_register() which initializes the kobj of
the device. mdiobus_unregister() calls only device_del() so we have one
reference left. That one is leaving with mdiobus_free() which is only
called on remove.
Since I don't see any reason why mdiobus_register()/mdiobus_unregister()
should happen in ->open()/->close() I move them to probe & exit.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
v2: register mdio bus before registering netdev device
drivers/net/sb1250-mac.c | 67 ++++++++++++++++++++++-----------------------
1 files changed, 33 insertions(+), 34 deletions(-)
diff --git a/drivers/net/sb1250-mac.c b/drivers/net/sb1250-mac.c
index 3f882d5..b0161b4 100644
--- a/drivers/net/sb1250-mac.c
+++ b/drivers/net/sb1250-mac.c
@@ -2256,17 +2256,36 @@ static int sbmac_init(struct platform_device *pldev, long long base)
sc->mii_bus = mdiobus_alloc();
if (sc->mii_bus == NULL) {
- sbmac_uninitctx(sc);
- return -ENOMEM;
+ err = -ENOMEM;
+ goto uninit_ctx;
}
+ sc->mii_bus->name = sbmac_mdio_string;
+ snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
+ sc->mii_bus->priv = sc;
+ sc->mii_bus->read = sbmac_mii_read;
+ sc->mii_bus->write = sbmac_mii_write;
+ sc->mii_bus->irq = sc->phy_irq;
+ for (i = 0; i < PHY_MAX_ADDR; ++i)
+ sc->mii_bus->irq[i] = SBMAC_PHY_INT;
+
+ sc->mii_bus->parent = &pldev->dev;
+ /*
+ * Probe PHY address
+ */
+ err = mdiobus_register(sc->mii_bus);
+ if (err) {
+ printk(KERN_ERR "%s: unable to register MDIO bus\n",
+ dev->name);
+ goto free_mdio;
+ }
+ dev_set_drvdata(&pldev->dev, sc->mii_bus);
+
err = register_netdev(dev);
if (err) {
printk(KERN_ERR "%s.%d: unable to register netdev\n",
sbmac_string, idx);
- mdiobus_free(sc->mii_bus);
- sbmac_uninitctx(sc);
- return err;
+ goto unreg_mdio;
}
pr_info("%s.%d: registered as %s\n", sbmac_string, idx, dev->name);
@@ -2282,19 +2301,15 @@ static int sbmac_init(struct platform_device *pldev, long long base)
pr_info("%s: SiByte Ethernet at 0x%08Lx, address: %pM\n",
dev->name, base, eaddr);
- sc->mii_bus->name = sbmac_mdio_string;
- snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
- sc->mii_bus->priv = sc;
- sc->mii_bus->read = sbmac_mii_read;
- sc->mii_bus->write = sbmac_mii_write;
- sc->mii_bus->irq = sc->phy_irq;
- for (i = 0; i < PHY_MAX_ADDR; ++i)
- sc->mii_bus->irq[i] = SBMAC_PHY_INT;
-
- sc->mii_bus->parent = &pldev->dev;
- dev_set_drvdata(&pldev->dev, sc->mii_bus);
-
return 0;
+unreg_mdio:
+ mdiobus_unregister(sc->mii_bus);
+ dev_set_drvdata(&pldev->dev, NULL);
+free_mdio:
+ mdiobus_free(sc->mii_bus);
+uninit_ctx:
+ sbmac_uninitctx(sc);
+ return err;
}
@@ -2320,16 +2335,6 @@ static int sbmac_open(struct net_device *dev)
goto out_err;
}
- /*
- * Probe PHY address
- */
- err = mdiobus_register(sc->mii_bus);
- if (err) {
- printk(KERN_ERR "%s: unable to register MDIO bus\n",
- dev->name);
- goto out_unirq;
- }
-
sc->sbm_speed = sbmac_speed_none;
sc->sbm_duplex = sbmac_duplex_none;
sc->sbm_fc = sbmac_fc_none;
@@ -2360,11 +2365,7 @@ static int sbmac_open(struct net_device *dev)
return 0;
out_unregister:
- mdiobus_unregister(sc->mii_bus);
-
-out_unirq:
free_irq(dev->irq, dev);
-
out_err:
return err;
}
@@ -2553,9 +2554,6 @@ static int sbmac_close(struct net_device *dev)
phy_disconnect(sc->phy_dev);
sc->phy_dev = NULL;
-
- mdiobus_unregister(sc->mii_bus);
-
free_irq(dev->irq, dev);
sbdma_emptyring(&(sc->sbm_txdma));
@@ -2663,6 +2661,7 @@ static int __exit sbmac_remove(struct platform_device *pldev)
unregister_netdev(dev);
sbmac_uninitctx(sc);
+ mdiobus_unregister(sc->mii_bus);
mdiobus_free(sc->mii_bus);
iounmap(sc->sbm_base);
free_netdev(dev);
--
1.6.6.1
^ permalink raw reply related
* Re: [PATCH net-next-2.6 1/7] caif: Ldisc add permission check and mem-alloc error check
From: David Miller @ 2010-04-28 20:02 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-1-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:34 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Changes:
> o Added permission checks for installing. CAP_SYS_ADMIN and
> CAP_SYS_TTY_CONFIG can install the ldisc.
> o Check if allocation of skb was successful.
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6 2/7] caif: Rename functions in cfcnfg and caif_dev
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-2-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:35 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Changes:
> o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
> o Fixed typo cfcfg to cfcnfg
> o Renamed linkid to channel_id
> o Updated documentation in caif_dev.h
> o Minor formatting changes
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6 3/7] caif: Add reference counting to service layer
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-3-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:36 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Changes:
> o Added functions cfsrvl_get and cfsrvl_put.
> o Added support release_client to use by socket and net device.
> o Increase reference counting for in-flight packets from cfmuxl
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6 4/7] caif: Disconnect without waiting for response
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-4-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:37 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Changes:
> o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
> disconnect, not waiting for any response from the modem. Due to this
> the function cfcnfg_linkdestroy_rsp does nothing anymore.
> o Because disconnect may take down a connection before a connect response
> is received the function cfcnfg_linkup_rsp is checking if the client is
> still waiting for the response, if not a disconnect request is sent to
> the modem.
> o cfctrl is no longer keeping track of pending disconnect requests.
> o Added function cfctrl_cancel_req, which is used for deleting a pending
> connect request if disconnect is done before connect response is received.
> o Removed unused function cfctrl_insert_req2
> o Added better handling of connect reject from modem.
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6 5/7] caif: Rewritten socket implementation
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-5-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:38 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Changes:
> This is a complete re-write of the socket layer. Making the socket
> implementation more aligned with the other socket layers and using more
> of the support functions available in sock.c. Lots of code is copied
> from af_unix (and some from af_irda).
> Non-blocking mode should be working as well.
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6 6/7] caif: Bugfixes in CAIF netdevice for close and flow control
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-6-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:39 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Changes:
> o Bugfix: Flow control was causing the device to be destroyed.
> o Bugfix: Handle CAIF channel connect failures.
> o If the underlying link layer is gone the net-device is no longer removed,
> but closed.
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next-2.6 7/7] Bugfix: Link selection was swapped in switch.
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-7-git-send-email-sjur.brandeland@stericsson.com>
From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:40 +0200
> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
>
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Applied.
^ permalink raw reply
* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Paul E. McKenney @ 2010-04-28 20:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
Li Zefan, Johannes Berg, shemminger
In-Reply-To: <1272483491.2201.9.camel@edumazet-laptop>
On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > This one occurred during the wakeup from suspend to RAM.
> > >
> > > [ 984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > [ 984.724700] ---------------------------------------------------
> > > [ 984.724703] include/linux/fdtable.h:88 invoked
> > > rcu_dereference_check() without protection!
> > > [ 984.724706]
> > > [ 984.724707] other info that might help us debug this:
> > > [ 984.724708]
> > > [ 984.724711]
> > > [ 984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > [ 984.724714] no locks held by dbus-daemon/4680.
> > > [ 984.724717]
> > > [ 984.724717] stack backtrace:
> > > [ 984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > [ 984.724724] Call Trace:
> > > [ 984.724734] [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > [ 984.724740] [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > [ 984.724745] [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > [ 984.724751] [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > [ 984.724755] [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > [ 984.724762] [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > [ 984.724766] [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > [ 984.724771] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724776] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724780] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724784] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724788] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724793] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724797] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724802] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724806] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [ 984.724812] [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > [ 984.724818] [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> >
> > Hmmm... I am not convinced that this is a false positive. Couldn't
> > there be a multi-threaded process where one thread is invoking poll()
> > on a UNIX socket just as another thread is calling close() on it?
> >
> > The current fcheck_files() logic requires that the caller either (1) be in
> > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > in a files_struct with ->count equal to 1 (initialization or cleanup).
> >
> > So I don't feel comfortable just slapping an RCU read-side critical
> > section around this one, at least not unless someone who understands
> > the locking says that doing so is OK.
> >
> >
>
> Its a single threaded program.
>
> So fget_light() calls fcheck_files(files, fd); without rcu lock,
> but some /proc/pid/fd/... user temporarly raised files->count just
> before we perform the condition check.
So I should add a single-threaded check. My first thought was to use
current_is_single_threaded(), but the bit about scanning the full list
of processes does give me pause. However, thread_group_empty() looks
like a much lighter-weight alternative.
I believe that it is possible for a pair of single-threaded processes
to share a file descriptor, but that should not be a problem, as both
of them would need to close it for it to go away.
But what happens if someone does a clone() with CLONE_FILES, as some
of the AIO stuff seems to do? Won't that allow one of the resulting
processes to close the file for both of them, even though both are
otherwise single-threaded? And the ->count seems to be the only
distinction between these two cases.
And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
the check must scan the processes with current_is_single_threaded().
Besides which, a user could invoke clone() with only CLONE_FILES
specified, right?
Or am I just confused here?
Thanx, Paul
^ permalink raw reply
* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v3)
From: Vlad Yasevich @ 2010-04-28 20:16 UTC (permalink / raw)
To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428193755.GF4818@hmsreliant.think-freely.org>
Neil Horman wrote:
... snip description ...
>
>
> include/net/sctp/structs.h | 1
> net/sctp/sm_make_chunk.c | 70 +++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 62 insertions(+), 9 deletions(-)
>
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index ff30177..597f8e2 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
> struct iovec *data);
> void sctp_chunk_free(struct sctp_chunk *);
> void *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
> struct sctp_chunk *sctp_chunkify(struct sk_buff *,
> const struct sctp_association *,
> struct sock *);
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..9623112 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
> cpu_to_be16(sizeof(struct sctp_paramhdr)),
> };
>
> -/* A helper to initialize to initialize an op error inside a
> +/* A helper to initialize an op error inside a
> * provided chunk, as most cause codes will be embedded inside an
> * abort chunk.
> */
> @@ -124,6 +124,29 @@ void sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
> chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
> }
>
> +/* A helper to initialize an op error inside a
> + * provided chunk, as most cause codes will be embedded inside an
> + * abort chunk. Differs from sctp_init_cause in that it won't oops
> + * if there isn't enough space in the op error chunk
> + */
> +int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
> + size_t paylen)
> +{
> + sctp_errhdr_t err;
> + __u16 len;
> +
> + /* Cause code constants are now defined in network order. */
> + err.cause = cause_code;
> + len = sizeof(sctp_errhdr_t) + paylen;
> + err.length = htons(len);
> +
> + if (skb_tailroom(chunk->skb) > len)
> + return -ENOSPC;
> + chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
> + sizeof(sctp_errhdr_t),
> + &err);
> + return 0;
> +}
> /* 3.3.2 Initiation (INIT) (1)
> *
> * This chunk is used to initiate a SCTP association between two
> @@ -1131,6 +1154,24 @@ nodata:
> return retval;
> }
>
> +/* Create an Operation Error chunk of a fixed size,
> + * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
> + * This is a helper function to allocate an error chunk for
> + * for those invalid parameter codes in which we may not want
> + * to report all the errors, if the incomming chunk is large
> + */
> +static inline struct sctp_chunk *sctp_make_op_error_fixed(
> + const struct sctp_association *asoc,
> + const struct sctp_chunk *chunk)
> +{
> + size_t size = asoc ? asoc->pathmtu : 0;
> +
> + if (size > SCTP_DEFAULT_MAXSEGMENT)
> + size = SCTP_DEFAULT_MAXSEGMENT;
> +
This doesn't look right. If you don't have an association or if pmtu is
not initialized, you will end up with a size of 0. I think you simply
want to a
if (!size)
there.
> + return sctp_make_op_error_space(asoc, chunk, size);
> +}
> +
> /* Create an Operation Error chunk. */
> struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
> const struct sctp_chunk *chunk,
> @@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
> return target;
> }
>
> +/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
> + * space in the chunk
> + */
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
> + int len, const void *data)
> +{
> + if (skb_tailroom(chunk->skb) > len)
> + return sctp_addto_chunk(chunk, len, data);
> + else
> + return NULL;
> +}
> +
> /* Append bytes from user space to the end of a chunk. Will panic if
> * chunk is not big enough.
> * Returns a kernel err value.
> @@ -1793,9 +1846,9 @@ static int sctp_process_missing_param(const struct sctp_association *asoc,
> if (*errp) {
> report.num_missing = htonl(1);
> report.type = paramtype;
> - sctp_init_cause(*errp, SCTP_ERROR_MISS_PARAM,
> - sizeof(report));
> - sctp_addto_chunk(*errp, sizeof(report), &report);
> + sctp_init_cause_fixed(*errp, SCTP_ERROR_MISS_PARAM,
> + sizeof(report));
> + sctp_addto_chunk_fixed(*errp, sizeof(report), &report);
> }
>
> /* Stop processing this chunk. */
> @@ -1813,7 +1866,7 @@ static int sctp_process_inv_mandatory(const struct sctp_association *asoc,
> *errp = sctp_make_op_error_space(asoc, chunk, 0);
>
> if (*errp)
> - sctp_init_cause(*errp, SCTP_ERROR_INV_PARAM, 0);
> + sctp_init_cause_fixed(*errp, SCTP_ERROR_INV_PARAM, 0);
>
> /* Stop processing this chunk. */
> return 0;
I don't think missing or mandatory parameters are effected by this. They are a
once and done deal. There don't report multiple errors and we don't add any
more error after them.
> @@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
> * returning multiple unknown parameters.
> */
> if (NULL == *errp)
> - *errp = sctp_make_op_error_space(asoc, chunk,
> - ntohs(chunk->chunk_hdr->length));
> + *errp = sctp_make_op_error_fixed(asoc, chunk);
>
> if (*errp) {
> - sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> + sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> WORD_ROUND(ntohs(param.p->length)));
> - sctp_addto_chunk(*errp,
> + sctp_addto_chunk_fixed(*errp,
> WORD_ROUND(ntohs(param.p->length)),
> param.v);
> } else {
>
So we completely get rid of variable size error chunk in this case, which I can
live with. It simplifies the code.
-vlad
^ permalink raw reply
* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Eric Dumazet @ 2010-04-28 20:18 UTC (permalink / raw)
To: paulmck
Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
Li Zefan, Johannes Berg, shemminger
In-Reply-To: <20100428200904.GS2540@linux.vnet.ibm.com>
Le mercredi 28 avril 2010 à 13:09 -0700, Paul E. McKenney a écrit :
> On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> > Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > > This one occurred during the wakeup from suspend to RAM.
> > > >
> > > > [ 984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > [ 984.724700] ---------------------------------------------------
> > > > [ 984.724703] include/linux/fdtable.h:88 invoked
> > > > rcu_dereference_check() without protection!
> > > > [ 984.724706]
> > > > [ 984.724707] other info that might help us debug this:
> > > > [ 984.724708]
> > > > [ 984.724711]
> > > > [ 984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > > [ 984.724714] no locks held by dbus-daemon/4680.
> > > > [ 984.724717]
> > > > [ 984.724717] stack backtrace:
> > > > [ 984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > > [ 984.724724] Call Trace:
> > > > [ 984.724734] [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > > [ 984.724740] [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > > [ 984.724745] [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > > [ 984.724751] [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > > [ 984.724755] [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > > [ 984.724762] [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > > [ 984.724766] [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > > [ 984.724771] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724776] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724780] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724784] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724788] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724793] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724797] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724802] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724806] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [ 984.724812] [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > > [ 984.724818] [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > >
> > > Hmmm... I am not convinced that this is a false positive. Couldn't
> > > there be a multi-threaded process where one thread is invoking poll()
> > > on a UNIX socket just as another thread is calling close() on it?
> > >
> > > The current fcheck_files() logic requires that the caller either (1) be in
> > > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > >
> > > So I don't feel comfortable just slapping an RCU read-side critical
> > > section around this one, at least not unless someone who understands
> > > the locking says that doing so is OK.
> > >
> > >
> >
> > Its a single threaded program.
> >
> > So fget_light() calls fcheck_files(files, fd); without rcu lock,
> > but some /proc/pid/fd/... user temporarly raised files->count just
> > before we perform the condition check.
>
> So I should add a single-threaded check. My first thought was to use
> current_is_single_threaded(), but the bit about scanning the full list
> of processes does give me pause. However, thread_group_empty() looks
> like a much lighter-weight alternative.
>
> I believe that it is possible for a pair of single-threaded processes
> to share a file descriptor, but that should not be a problem, as both
> of them would need to close it for it to go away.
>
> But what happens if someone does a clone() with CLONE_FILES, as some
> of the AIO stuff seems to do? Won't that allow one of the resulting
> processes to close the file for both of them, even though both are
> otherwise single-threaded? And the ->count seems to be the only
> distinction between these two cases.
>
> And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
> the check must scan the processes with current_is_single_threaded().
> Besides which, a user could invoke clone() with only CLONE_FILES
> specified, right?
>
> Or am I just confused here?
>
> Thanx, Paul
If a program is mono threaded, and doing a fget_light() syscall, it
cannot possibly do a clone() in // ;)
If we want to be picky, we could add a user provided condition, aka "we
are sure we are allowed to do this because we are the owner of the files
struct".
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 6da962c..027f5e1 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2694,7 +2694,7 @@ void __do_SAK(struct tty_struct *tty)
spin_lock(&p->files->file_lock);
fdt = files_fdtable(p->files);
for (i = 0; i < fdt->max_fds; i++) {
- filp = fcheck_files(p->files, i);
+ filp = fcheck_files(p->files, i, false);
if (!filp)
continue;
if (filp->f_op->read == tty_read &&
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 452d02f..dabf4d8 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -119,7 +119,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
int retval = oldfd;
rcu_read_lock();
- if (!fcheck_files(files, oldfd))
+ if (!fcheck_files(files, oldfd, false))
retval = -EBADF;
rcu_read_unlock();
return retval;
diff --git a/fs/file_table.c b/fs/file_table.c
index 32d12b7..2865f72 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -274,7 +274,7 @@ struct file *fget(unsigned int fd)
struct files_struct *files = current->files;
rcu_read_lock();
- file = fcheck_files(files, fd);
+ file = fcheck_files(files, fd, false);
if (file) {
if (!atomic_long_inc_not_zero(&file->f_count)) {
/* File object ref couldn't be taken */
@@ -303,10 +303,10 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
*fput_needed = 0;
if (likely((atomic_read(&files->count) == 1))) {
- file = fcheck_files(files, fd);
+ file = fcheck_files(files, fd, true);
} else {
rcu_read_lock();
- file = fcheck_files(files, fd);
+ file = fcheck_files(files, fd, false);
if (file) {
if (atomic_long_inc_not_zero(&file->f_count))
*fput_needed = 1;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 8418fcc..0e89448 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1716,7 +1716,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
* hold ->file_lock.
*/
spin_lock(&files->file_lock);
- file = fcheck_files(files, fd);
+ file = fcheck_files(files, fd, false);
if (file) {
if (path) {
*path = file->f_path;
@@ -1755,7 +1755,7 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
files = get_files_struct(task);
if (files) {
rcu_read_lock();
- if (fcheck_files(files, fd)) {
+ if (fcheck_files(files, fd, false)) {
rcu_read_unlock();
put_files_struct(files);
if (task_dumpable(task)) {
@@ -1813,7 +1813,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
* hold ->file_lock.
*/
spin_lock(&files->file_lock);
- file = fcheck_files(files, fd);
+ file = fcheck_files(files, fd, false);
if (!file)
goto out_unlock;
if (file->f_mode & FMODE_READ)
@@ -1899,7 +1899,7 @@ static int proc_readfd_common(struct file * filp, void * dirent,
char name[PROC_NUMBUF];
int len;
- if (!fcheck_files(files, fd))
+ if (!fcheck_files(files, fd, false))
continue;
rcu_read_unlock();
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 013dc52..76423ad 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -57,11 +57,12 @@ struct files_struct {
struct file * fd_array[NR_OPEN_DEFAULT];
};
-#define rcu_dereference_check_fdtable(files, fdtfd) \
+#define rcu_dereference_check_fdtable(files, fdtfd, cond) \
(rcu_dereference_check((fdtfd), \
rcu_read_lock_held() || \
lockdep_is_held(&(files)->file_lock) || \
- atomic_read(&(files)->count) == 1))
+ atomic_read(&(files)->count) == 1 || \
+ cond))
#define files_fdtable(files) \
(rcu_dereference_check_fdtable((files), (files)->fdt))
@@ -79,13 +80,13 @@ static inline void free_fdtable(struct fdtable *fdt)
call_rcu(&fdt->rcu, free_fdtable_rcu);
}
-static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
+static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd, bool cond)
{
struct file * file = NULL;
struct fdtable *fdt = files_fdtable(files);
if (fd < fdt->max_fds)
- file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
+ file = rcu_dereference_check_fdtable(files, fdt->fd[fd], cond);
return file;
}
^ permalink raw reply related
* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: Neil Horman @ 2010-04-28 20:30 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <4BD897B6.5040405@hp.com>
Ok, version 4
Change Notes:
1) Minor cleanups, from Vlads notes
Summary:
Hey-
Recently, it was reported to me that the kernel could oops in the
following way:
<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU: 0
<5> EIP: 0060:[<c02bff27>] Not tainted VLI
<5> EFLAGS: 00010216 (2.6.9-89.0.25.EL)
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44
<5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40
<5> ds: 007b es: 007b ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d
<5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490
<5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004
<5> Call Trace:
<5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5> [<c01555a4>] cache_grow+0x140/0x233
<5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5> [<c02d005e>] nf_iterate+0x40/0x81
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5> [<c02d0362>] nf_hook_slow+0x83/0xb5
<5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e103e>] ip_rcv+0x334/0x3b4
<5> [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5> [<c02c67a4>] process_backlog+0x6c/0xd9
<5> [<c02c690f>] net_rx_action+0xfe/0x1f8
<5> [<c012a7b1>] __do_softirq+0x35/0x79
<5> [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5> [<c01094de>] do_softirq+0x46/0x4d
Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.
The problem is in sctp_process_unk_param:
if (NULL == *errp)
*errp = sctp_make_op_error_space(asoc, chunk,
ntohs(chunk->chunk_hdr->length));
if (*errp) {
sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
WORD_ROUND(ntohs(param.p->length)));
sctp_addto_chunk(*errp,
WORD_ROUND(ntohs(param.p->length)),
param.v);
When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter. Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
I've tested the below fix and confirmed that it fixes the issue. We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report. Tested by me successfully
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
include/net/sctp/structs.h | 1
net/sctp/sm_make_chunk.c | 62 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ff30177..597f8e2 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
struct iovec *data);
void sctp_chunk_free(struct sctp_chunk *);
void *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
+void *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
struct sctp_chunk *sctp_chunkify(struct sk_buff *,
const struct sctp_association *,
struct sock *);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f592163..2971731 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
cpu_to_be16(sizeof(struct sctp_paramhdr)),
};
-/* A helper to initialize to initialize an op error inside a
+/* A helper to initialize an op error inside a
* provided chunk, as most cause codes will be embedded inside an
* abort chunk.
*/
@@ -124,6 +124,29 @@ void sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
}
+/* A helper to initialize an op error inside a
+ * provided chunk, as most cause codes will be embedded inside an
+ * abort chunk. Differs from sctp_init_cause in that it won't oops
+ * if there isn't enough space in the op error chunk
+ */
+int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
+ size_t paylen)
+{
+ sctp_errhdr_t err;
+ __u16 len;
+
+ /* Cause code constants are now defined in network order. */
+ err.cause = cause_code;
+ len = sizeof(sctp_errhdr_t) + paylen;
+ err.length = htons(len);
+
+ if (skb_tailroom(chunk->skb) > len)
+ return -ENOSPC;
+ chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
+ sizeof(sctp_errhdr_t),
+ &err);
+ return 0;
+}
/* 3.3.2 Initiation (INIT) (1)
*
* This chunk is used to initiate a SCTP association between two
@@ -1131,6 +1154,24 @@ nodata:
return retval;
}
+/* Create an Operation Error chunk of a fixed size,
+ * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
+ * This is a helper function to allocate an error chunk for
+ * for those invalid parameter codes in which we may not want
+ * to report all the errors, if the incomming chunk is large
+ */
+static inline struct sctp_chunk *sctp_make_op_error_fixed(
+ const struct sctp_association *asoc,
+ const struct sctp_chunk *chunk)
+{
+ size_t size = asoc ? asoc->pathmtu : 0;
+
+ if (!size)
+ size = SCTP_DEFAULT_MAXSEGMENT;
+
+ return sctp_make_op_error_space(asoc, chunk, size);
+}
+
/* Create an Operation Error chunk. */
struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
const struct sctp_chunk *chunk,
@@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
return target;
}
+/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
+ * space in the chunk
+ */
+void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
+ int len, const void *data)
+{
+ if (skb_tailroom(chunk->skb) > len)
+ return sctp_addto_chunk(chunk, len, data);
+ else
+ return NULL;
+}
+
/* Append bytes from user space to the end of a chunk. Will panic if
* chunk is not big enough.
* Returns a kernel err value.
@@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
* returning multiple unknown parameters.
*/
if (NULL == *errp)
- *errp = sctp_make_op_error_space(asoc, chunk,
- ntohs(chunk->chunk_hdr->length));
+ *errp = sctp_make_op_error_fixed(asoc, chunk);
if (*errp) {
- sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
+ sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
WORD_ROUND(ntohs(param.p->length)));
- sctp_addto_chunk(*errp,
+ sctp_addto_chunk_fixed(*errp,
WORD_ROUND(ntohs(param.p->length)),
param.v);
} else {
^ permalink raw reply related
* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: Vlad Yasevich @ 2010-04-28 20:37 UTC (permalink / raw)
To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428203059.GG4818@hmsreliant.think-freely.org>
Looks good.
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-vlad
Neil Horman wrote:
> Ok, version 4
>
> Change Notes:
> 1) Minor cleanups, from Vlads notes
>
> Summary:
>
>
> Hey-
> Recently, it was reported to me that the kernel could oops in the
> following way:
>
> <5> kernel BUG at net/core/skbuff.c:91!
> <5> invalid operand: 0000 [#1]
> <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
> ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
> vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
> ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
> snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
> pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
> mptbase sd_mod scsi_mod
> <5> CPU: 0
> <5> EIP: 0060:[<c02bff27>] Not tainted VLI
> <5> EFLAGS: 00010216 (2.6.9-89.0.25.EL)
> <5> EIP is at skb_over_panic+0x1f/0x2d
> <5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44
> <5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40
> <5> ds: 007b es: 007b ss: 0068
> <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
> <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
> e0c2947d
> <5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
> df653490
> <5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
> 00000004
> <5> Call Trace:
> <5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
> <5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
> <5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
> <5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
> <5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
> <5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
> <5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
> <5> [<c01555a4>] cache_grow+0x140/0x233
> <5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
> <5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
> <5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
> <5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
> <5> [<c02d005e>] nf_iterate+0x40/0x81
> <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
> <5> [<c02d0362>] nf_hook_slow+0x83/0xb5
> <5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
> <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5> [<c02e103e>] ip_rcv+0x334/0x3b4
> <5> [<c02c66fd>] netif_receive_skb+0x320/0x35b
> <5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
> <5> [<c02c67a4>] process_backlog+0x6c/0xd9
> <5> [<c02c690f>] net_rx_action+0xfe/0x1f8
> <5> [<c012a7b1>] __do_softirq+0x35/0x79
> <5> [<c0107efb>] handle_IRQ_event+0x0/0x4f
> <5> [<c01094de>] do_softirq+0x46/0x4d
>
> Its an skb_over_panic BUG halt that results from processing an init chunk in
> which too many of its variable length parameters are in some way malformed.
>
> The problem is in sctp_process_unk_param:
> if (NULL == *errp)
> *errp = sctp_make_op_error_space(asoc, chunk,
> ntohs(chunk->chunk_hdr->length));
>
> if (*errp) {
> sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> WORD_ROUND(ntohs(param.p->length)));
> sctp_addto_chunk(*errp,
> WORD_ROUND(ntohs(param.p->length)),
> param.v);
>
> When we allocate an error chunk, we assume that the worst case scenario requires
> that we have chunk_hdr->length data allocated, which would be correct nominally,
> given that we call sctp_addto_chunk for the violating parameter. Unfortunately,
> we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
> chunk, so the worst case situation in which all parameters are in violation
> requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
>
> The result of this error is that a deliberately malformed packet sent to a
> listening host can cause a remote DOS, described in CVE-2010-1173:
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
>
> I've tested the below fix and confirmed that it fixes the issue. We move to a
> strategy whereby we allocate a fixed size error chunk and ignore errors we don't
> have space to report. Tested by me successfully
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>
>
> include/net/sctp/structs.h | 1
> net/sctp/sm_make_chunk.c | 62 +++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 58 insertions(+), 5 deletions(-)
>
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index ff30177..597f8e2 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
> struct iovec *data);
> void sctp_chunk_free(struct sctp_chunk *);
> void *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
> struct sctp_chunk *sctp_chunkify(struct sk_buff *,
> const struct sctp_association *,
> struct sock *);
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..2971731 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
> cpu_to_be16(sizeof(struct sctp_paramhdr)),
> };
>
> -/* A helper to initialize to initialize an op error inside a
> +/* A helper to initialize an op error inside a
> * provided chunk, as most cause codes will be embedded inside an
> * abort chunk.
> */
> @@ -124,6 +124,29 @@ void sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
> chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
> }
>
> +/* A helper to initialize an op error inside a
> + * provided chunk, as most cause codes will be embedded inside an
> + * abort chunk. Differs from sctp_init_cause in that it won't oops
> + * if there isn't enough space in the op error chunk
> + */
> +int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
> + size_t paylen)
> +{
> + sctp_errhdr_t err;
> + __u16 len;
> +
> + /* Cause code constants are now defined in network order. */
> + err.cause = cause_code;
> + len = sizeof(sctp_errhdr_t) + paylen;
> + err.length = htons(len);
> +
> + if (skb_tailroom(chunk->skb) > len)
> + return -ENOSPC;
> + chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
> + sizeof(sctp_errhdr_t),
> + &err);
> + return 0;
> +}
> /* 3.3.2 Initiation (INIT) (1)
> *
> * This chunk is used to initiate a SCTP association between two
> @@ -1131,6 +1154,24 @@ nodata:
> return retval;
> }
>
> +/* Create an Operation Error chunk of a fixed size,
> + * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
> + * This is a helper function to allocate an error chunk for
> + * for those invalid parameter codes in which we may not want
> + * to report all the errors, if the incomming chunk is large
> + */
> +static inline struct sctp_chunk *sctp_make_op_error_fixed(
> + const struct sctp_association *asoc,
> + const struct sctp_chunk *chunk)
> +{
> + size_t size = asoc ? asoc->pathmtu : 0;
> +
> + if (!size)
> + size = SCTP_DEFAULT_MAXSEGMENT;
> +
> + return sctp_make_op_error_space(asoc, chunk, size);
> +}
> +
> /* Create an Operation Error chunk. */
> struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
> const struct sctp_chunk *chunk,
> @@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
> return target;
> }
>
> +/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
> + * space in the chunk
> + */
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
> + int len, const void *data)
> +{
> + if (skb_tailroom(chunk->skb) > len)
> + return sctp_addto_chunk(chunk, len, data);
> + else
> + return NULL;
> +}
> +
> /* Append bytes from user space to the end of a chunk. Will panic if
> * chunk is not big enough.
> * Returns a kernel err value.
> @@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
> * returning multiple unknown parameters.
> */
> if (NULL == *errp)
> - *errp = sctp_make_op_error_space(asoc, chunk,
> - ntohs(chunk->chunk_hdr->length));
> + *errp = sctp_make_op_error_fixed(asoc, chunk);
>
> if (*errp) {
> - sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> + sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> WORD_ROUND(ntohs(param.p->length)));
> - sctp_addto_chunk(*errp,
> + sctp_addto_chunk_fixed(*errp,
> WORD_ROUND(ntohs(param.p->length)),
> param.v);
> } else {
>
^ permalink raw reply
* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Paul E. McKenney @ 2010-04-28 20:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
Li Zefan, Johannes Berg, shemminger
In-Reply-To: <1272485923.2201.22.camel@edumazet-laptop>
On Wed, Apr 28, 2010 at 10:18:43PM +0200, Eric Dumazet wrote:
> Le mercredi 28 avril 2010 à 13:09 -0700, Paul E. McKenney a écrit :
> > On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> > > Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > > > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > > > This one occurred during the wakeup from suspend to RAM.
> > > > >
> > > > > [ 984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > > [ 984.724700] ---------------------------------------------------
> > > > > [ 984.724703] include/linux/fdtable.h:88 invoked
> > > > > rcu_dereference_check() without protection!
> > > > > [ 984.724706]
> > > > > [ 984.724707] other info that might help us debug this:
> > > > > [ 984.724708]
> > > > > [ 984.724711]
> > > > > [ 984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > > > [ 984.724714] no locks held by dbus-daemon/4680.
> > > > > [ 984.724717]
> > > > > [ 984.724717] stack backtrace:
> > > > > [ 984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > > > [ 984.724724] Call Trace:
> > > > > [ 984.724734] [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > > > [ 984.724740] [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > > > [ 984.724745] [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > > > [ 984.724751] [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > > > [ 984.724755] [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > > > [ 984.724762] [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > > > [ 984.724766] [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > > > [ 984.724771] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724776] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724780] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724784] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724788] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724793] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724797] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724802] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724806] [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [ 984.724812] [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > > > [ 984.724818] [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > > >
> > > > Hmmm... I am not convinced that this is a false positive. Couldn't
> > > > there be a multi-threaded process where one thread is invoking poll()
> > > > on a UNIX socket just as another thread is calling close() on it?
> > > >
> > > > The current fcheck_files() logic requires that the caller either (1) be in
> > > > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > > > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > > >
> > > > So I don't feel comfortable just slapping an RCU read-side critical
> > > > section around this one, at least not unless someone who understands
> > > > the locking says that doing so is OK.
> > > >
> > > >
> > >
> > > Its a single threaded program.
> > >
> > > So fget_light() calls fcheck_files(files, fd); without rcu lock,
> > > but some /proc/pid/fd/... user temporarly raised files->count just
> > > before we perform the condition check.
> >
> > So I should add a single-threaded check. My first thought was to use
> > current_is_single_threaded(), but the bit about scanning the full list
> > of processes does give me pause. However, thread_group_empty() looks
> > like a much lighter-weight alternative.
> >
> > I believe that it is possible for a pair of single-threaded processes
> > to share a file descriptor, but that should not be a problem, as both
> > of them would need to close it for it to go away.
> >
> > But what happens if someone does a clone() with CLONE_FILES, as some
> > of the AIO stuff seems to do? Won't that allow one of the resulting
> > processes to close the file for both of them, even though both are
> > otherwise single-threaded? And the ->count seems to be the only
> > distinction between these two cases.
> >
> > And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
> > the check must scan the processes with current_is_single_threaded().
> > Besides which, a user could invoke clone() with only CLONE_FILES
> > specified, right?
> >
> > Or am I just confused here?
> >
> > Thanx, Paul
>
> If a program is mono threaded, and doing a fget_light() syscall, it
> cannot possibly do a clone() in // ;)
The sequence of events that I am worried about is as follows:
1. Single-threaded process does clone(CLONE_FILES). The
result is a pair of single-threaded processes that share
file descriptors.
2. One of these processes does files_fdtable(i) at the same
time as the other process closes file descriptor i.
So, clone and -then- do fget_light().
> If we want to be picky, we could add a user provided condition, aka "we
> are sure we are allowed to do this because we are the owner of the files
> struct".
But yes, if I understand your trick below, the race conditions from
the above sequence of events would simply force the processes off
of the fget_light() path, which should be OK.
Thanx, Paul
> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index 6da962c..027f5e1 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -2694,7 +2694,7 @@ void __do_SAK(struct tty_struct *tty)
> spin_lock(&p->files->file_lock);
> fdt = files_fdtable(p->files);
> for (i = 0; i < fdt->max_fds; i++) {
> - filp = fcheck_files(p->files, i);
> + filp = fcheck_files(p->files, i, false);
> if (!filp)
> continue;
> if (filp->f_op->read == tty_read &&
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 452d02f..dabf4d8 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -119,7 +119,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
> int retval = oldfd;
>
> rcu_read_lock();
> - if (!fcheck_files(files, oldfd))
> + if (!fcheck_files(files, oldfd, false))
> retval = -EBADF;
> rcu_read_unlock();
> return retval;
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 32d12b7..2865f72 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -274,7 +274,7 @@ struct file *fget(unsigned int fd)
> struct files_struct *files = current->files;
>
> rcu_read_lock();
> - file = fcheck_files(files, fd);
> + file = fcheck_files(files, fd, false);
> if (file) {
> if (!atomic_long_inc_not_zero(&file->f_count)) {
> /* File object ref couldn't be taken */
> @@ -303,10 +303,10 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
>
> *fput_needed = 0;
> if (likely((atomic_read(&files->count) == 1))) {
> - file = fcheck_files(files, fd);
> + file = fcheck_files(files, fd, true);
> } else {
> rcu_read_lock();
> - file = fcheck_files(files, fd);
> + file = fcheck_files(files, fd, false);
> if (file) {
> if (atomic_long_inc_not_zero(&file->f_count))
> *fput_needed = 1;
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 8418fcc..0e89448 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1716,7 +1716,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
> * hold ->file_lock.
> */
> spin_lock(&files->file_lock);
> - file = fcheck_files(files, fd);
> + file = fcheck_files(files, fd, false);
> if (file) {
> if (path) {
> *path = file->f_path;
> @@ -1755,7 +1755,7 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
> files = get_files_struct(task);
> if (files) {
> rcu_read_lock();
> - if (fcheck_files(files, fd)) {
> + if (fcheck_files(files, fd, false)) {
> rcu_read_unlock();
> put_files_struct(files);
> if (task_dumpable(task)) {
> @@ -1813,7 +1813,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
> * hold ->file_lock.
> */
> spin_lock(&files->file_lock);
> - file = fcheck_files(files, fd);
> + file = fcheck_files(files, fd, false);
> if (!file)
> goto out_unlock;
> if (file->f_mode & FMODE_READ)
> @@ -1899,7 +1899,7 @@ static int proc_readfd_common(struct file * filp, void * dirent,
> char name[PROC_NUMBUF];
> int len;
>
> - if (!fcheck_files(files, fd))
> + if (!fcheck_files(files, fd, false))
> continue;
> rcu_read_unlock();
>
> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index 013dc52..76423ad 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -57,11 +57,12 @@ struct files_struct {
> struct file * fd_array[NR_OPEN_DEFAULT];
> };
>
> -#define rcu_dereference_check_fdtable(files, fdtfd) \
> +#define rcu_dereference_check_fdtable(files, fdtfd, cond) \
> (rcu_dereference_check((fdtfd), \
> rcu_read_lock_held() || \
> lockdep_is_held(&(files)->file_lock) || \
> - atomic_read(&(files)->count) == 1))
> + atomic_read(&(files)->count) == 1 || \
> + cond))
>
> #define files_fdtable(files) \
> (rcu_dereference_check_fdtable((files), (files)->fdt))
> @@ -79,13 +80,13 @@ static inline void free_fdtable(struct fdtable *fdt)
> call_rcu(&fdt->rcu, free_fdtable_rcu);
> }
>
> -static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
> +static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd, bool cond)
> {
> struct file * file = NULL;
> struct fdtable *fdt = files_fdtable(files);
>
> if (fd < fdt->max_fds)
> - file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
> + file = rcu_dereference_check_fdtable(files, fdt->fd[fd], cond);
> return file;
> }
>
>
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox