* [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Xiaotian Feng @ 2010-05-31 11:06 UTC (permalink / raw)
To: netfilter-devel, netfilter, coreteam
Cc: linux-kernel, netdev, Xiaotian Feng, Patrick McHardy,
David S. Miller, Jan Engelhardt, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In xt_register_table, xt_jumpstack_alloc is called first, later
xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
will be used again. Then the memory allocated by previous xt_jumpstack_alloc
will be leaked. We can simply remove the previous xt_jumpstack_alloc because
there aren't any users of newinfo between xt_jumpstack_alloc and
xt_replace_table.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
---
net/netfilter/x_tables.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 445de70..47b1e79 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -844,10 +844,6 @@ struct xt_table *xt_register_table(struct net *net,
struct xt_table_info *private;
struct xt_table *t, *table;
- ret = xt_jumpstack_alloc(newinfo);
- if (ret < 0)
- return ERR_PTR(ret);
-
/* Don't add one object to multiple lists. */
table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
if (!table) {
--
1.7.0.1
^ permalink raw reply related
* [PATCH] virtio-net: pass gfp to add_buf
From: Rusty Russell @ 2010-05-31 11:10 UTC (permalink / raw)
To: netdev; +Cc: Michael S. Tsirkin
From: "Michael S. Tsirkin" <mst@redhat.com>
(Dave: the virtqueue_add_buf_gfp is in Linus' tree, so please queue
this trivial use now. Thanks!)
virtio-net bounces buffer allocations off to
a thread if it can't allocate buffers from the atomic
pool. However, if posting buffers still requires atomic
buffers, this is unlikely to succeed.
Fix by passing in the proper gfp_t parameter.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
drivers/net/virtio_net.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -341,7 +341,7 @@ static int add_recvbuf_small(struct virt
skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
- err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 2, skb);
+ err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
if (err < 0)
dev_kfree_skb(skb);
@@ -386,8 +386,8 @@ static int add_recvbuf_big(struct virtne
/* chain first in list head */
first->private = (unsigned long)list;
- err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
- first);
+ err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
+ first, gfp);
if (err < 0)
give_pages(vi, first);
@@ -405,7 +405,7 @@ static int add_recvbuf_mergeable(struct
sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
- err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 1, page);
+ err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
if (err < 0)
give_pages(vi, page);
^ permalink raw reply
* [net-next-2.6 PATCH 2/2] be2net: replace udelay() with schedule_timeout() in mbox polling
From: Sathya Perla @ 2010-05-31 9:34 UTC (permalink / raw)
To: netdev
As mbox polling is done only in process context, it is better to
use schedule_timeout() instead of udelay().
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
---
drivers/net/benet/be_cmds.c | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 9d11dbf..c2bd84c 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -186,7 +186,7 @@ static int be_mcc_notify_wait(struct be_adapter *adapter)
static int be_mbox_db_ready_wait(struct be_adapter *adapter, void __iomem *db)
{
- int cnt = 0, wait = 5;
+ int msecs = 0;
u32 ready;
do {
@@ -201,15 +201,14 @@ static int be_mbox_db_ready_wait(struct be_adapter *adapter, void __iomem *db)
if (ready)
break;
- if (cnt > 4000000) {
+ if (msecs > 4000) {
dev_err(&adapter->pdev->dev, "mbox poll timed out\n");
return -1;
}
- if (cnt > 50)
- wait = 200;
- cnt += wait;
- udelay(wait);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(msecs_to_jiffies(1));
+ msecs++;
} while (true);
return 0;
--
1.6.5.2
^ permalink raw reply related
* [net-next-2.6 PATCH 1/2] be2net: cleanup in case of error in be_open()
From: Sathya Perla @ 2010-05-31 9:33 UTC (permalink / raw)
To: netdev
This patch adds cleanup code (things like unregistering irq,
disabling napi etc) to be_open() when an error occurs inside the
routine.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
---
drivers/net/benet/be_main.c | 95 ++++++++++++++++++++++---------------------
1 files changed, 49 insertions(+), 46 deletions(-)
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 54b1427..3225774 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1735,6 +1735,44 @@ done:
adapter->isr_registered = false;
}
+static int be_close(struct net_device *netdev)
+{
+ struct be_adapter *adapter = netdev_priv(netdev);
+ struct be_eq_obj *rx_eq = &adapter->rx_eq;
+ struct be_eq_obj *tx_eq = &adapter->tx_eq;
+ int vec;
+
+ cancel_delayed_work_sync(&adapter->work);
+
+ be_async_mcc_disable(adapter);
+
+ netif_stop_queue(netdev);
+ netif_carrier_off(netdev);
+ adapter->link_up = false;
+
+ be_intr_set(adapter, false);
+
+ if (adapter->msix_enabled) {
+ vec = be_msix_vec_get(adapter, tx_eq->q.id);
+ synchronize_irq(vec);
+ vec = be_msix_vec_get(adapter, rx_eq->q.id);
+ synchronize_irq(vec);
+ } else {
+ synchronize_irq(netdev->irq);
+ }
+ be_irq_unregister(adapter);
+
+ napi_disable(&rx_eq->napi);
+ napi_disable(&tx_eq->napi);
+
+ /* Wait for all pending tx completions to arrive so that
+ * all tx skbs are freed.
+ */
+ be_tx_compl_clean(adapter);
+
+ return 0;
+}
+
static int be_open(struct net_device *netdev)
{
struct be_adapter *adapter = netdev_priv(netdev);
@@ -1765,27 +1803,29 @@ static int be_open(struct net_device *netdev)
/* Now that interrupts are on we can process async mcc */
be_async_mcc_enable(adapter);
+ schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
+
status = be_cmd_link_status_query(adapter, &link_up, &mac_speed,
&link_speed);
if (status)
- goto ret_sts;
+ goto err;
be_link_status_update(adapter, link_up);
- if (be_physfn(adapter))
+ if (be_physfn(adapter)) {
status = be_vid_config(adapter);
- if (status)
- goto ret_sts;
+ if (status)
+ goto err;
- if (be_physfn(adapter)) {
status = be_cmd_set_flow_control(adapter,
adapter->tx_fc, adapter->rx_fc);
if (status)
- goto ret_sts;
+ goto err;
}
- schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
-ret_sts:
- return status;
+ return 0;
+err:
+ be_close(adapter->netdev);
+ return -EIO;
}
static int be_setup_wol(struct be_adapter *adapter, bool enable)
@@ -1913,43 +1953,6 @@ static int be_clear(struct be_adapter *adapter)
return 0;
}
-static int be_close(struct net_device *netdev)
-{
- struct be_adapter *adapter = netdev_priv(netdev);
- struct be_eq_obj *rx_eq = &adapter->rx_eq;
- struct be_eq_obj *tx_eq = &adapter->tx_eq;
- int vec;
-
- cancel_delayed_work_sync(&adapter->work);
-
- be_async_mcc_disable(adapter);
-
- netif_stop_queue(netdev);
- netif_carrier_off(netdev);
- adapter->link_up = false;
-
- be_intr_set(adapter, false);
-
- if (adapter->msix_enabled) {
- vec = be_msix_vec_get(adapter, tx_eq->q.id);
- synchronize_irq(vec);
- vec = be_msix_vec_get(adapter, rx_eq->q.id);
- synchronize_irq(vec);
- } else {
- synchronize_irq(netdev->irq);
- }
- be_irq_unregister(adapter);
-
- napi_disable(&rx_eq->napi);
- napi_disable(&tx_eq->napi);
-
- /* Wait for all pending tx completions to arrive so that
- * all tx skbs are freed.
- */
- be_tx_compl_clean(adapter);
-
- return 0;
-}
#define FW_FILE_HDR_SIGN "ServerEngines Corp. "
char flash_cookie[2][16] = {"*** SE FLAS",
--
1.6.5.2
^ permalink raw reply related
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: David Miller @ 2010-05-31 11:27 UTC (permalink / raw)
To: fthain; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.LNX.2.00.1005311905520.469@nippy.intranet>
From: fthain@telegraphics.com.au
Date: Mon, 31 May 2010 19:21:26 +1000 (EST)
> Your suggestion to use pr_debug is invoking compile time infrastructure
> (the DEBUG macro), so it is not in the spirit of this commit, and it is
> not relevant to any criticism from you or Joe of the earlier submissions.
>
> Please apply the patch.
I won't do that, because your change elides the pr_fmt prefix Joe
Perches added to the driver for the purposes of making sure such a
prefix would be added to all log messages output by the driver.
The whole idea is that everything the driver puts into the kernel
log has the driver module name at the beginning. That's what the
pr_fmt define at the top of the driver does.
It's so you can tell which driver the message came from.
You're undoing that, which is a bug. It's a step backwards, it's
wrong.
And to top it off we've had to explain this stuff to you multiple
times.
So, as long as the patch is incorrect I absolutely will not apply it.
^ permalink raw reply
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: David Miller @ 2010-05-31 11:30 UTC (permalink / raw)
To: fthain; +Cc: geert, joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.LNX.2.00.1005312011210.469@nippy.intranet>
From: fthain@telegraphics.com.au
Date: Mon, 31 May 2010 21:07:09 +1000 (EST)
> Apparently David now wants me to submit this again --
>
> if (ei_debug)
> pr_debug(...)
>
> David, if that code is acceptable, please let me know.
The only thing I care about is at the moment that you don't do
something that ends up dropping the pr_fmt prefix.
The pr_fmt define at the beginning of the driver is for nothing
if we end up adding exceptions that end up eliding it for no
good reason. And that's what your patch was doing.
^ permalink raw reply
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Jan Engelhardt @ 2010-05-31 11:51 UTC (permalink / raw)
To: Xiaotian Feng
Cc: netfilter-devel, netfilter, coreteam, linux-kernel, netdev,
Patrick McHardy, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <1275303998-2435-1-git-send-email-dfeng@redhat.com>
On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>In xt_register_table, xt_jumpstack_alloc is called first, later
>xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>there aren't any users of newinfo between xt_jumpstack_alloc and
>xt_replace_table.
Indeed that seems to be so.
>diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>index 445de70..47b1e79 100644
>--- a/net/netfilter/x_tables.c
>+++ b/net/netfilter/x_tables.c
>@@ -844,10 +844,6 @@ struct xt_table *xt_register_table(struct net *net,
> struct xt_table_info *private;
> struct xt_table *t, *table;
>
>- ret = xt_jumpstack_alloc(newinfo);
>- if (ret < 0)
>- return ERR_PTR(ret);
>-
> /* Don't add one object to multiple lists. */
> table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
> if (!table) {
^ permalink raw reply
* Re: [PATCH] IPv6: fix Mobile IPv6 regression
From: Jiri Olsa @ 2010-05-31 12:49 UTC (permalink / raw)
To: Brian Haley
Cc: David Miller, Arnaud Ebalard, Scott C Otto,
YOSHIFUJI Hideaki / 吉藤英明, netdev
In-Reply-To: <20100531084620.GB1929@jolsa.lab.eng.brq.redhat.com>
On Mon, May 31, 2010 at 10:46:20AM +0200, Jiri Olsa wrote:
> On Fri, May 28, 2010 at 02:17:43PM -0400, Brian Haley wrote:
> > Commit f4f914b5 (net: ipv6 bind to device issue) caused
> > a regression with Mobile IPv6 when it changed the meaning
> > of fl->oif to become a strict requirement of the route
> > lookup. Instead, only force strict mode when
> > sk->sk_bound_dev_if is set on the calling socket, getting
> > the intended behavior and fixing the regression.
> >
> > Tested-by: Arnaud Ebalard <arno@natisbad.org>
> > Signed-off-by: Brian Haley <brian.haley@hp.com>
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 294cbe8..252d761 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -814,7 +814,7 @@ struct dst_entry * ip6_route_output(struct net *net, struct sock *sk,
> > {
> > int flags = 0;
> >
> > - if (fl->oif || rt6_need_strict(&fl->fl6_dst))
> > + if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl->fl6_dst))
> > flags |= RT6_LOOKUP_F_IFACE;
> >
> > if (!ipv6_addr_any(&fl->fl6_src))
> hi,
>
> sorry for the late reply, I was out last week..
>
> the change looks ok, I'll verify it with the reproducer
> I used for the last fix
>
> thanks,
> jirka
it passed the test for me, so it looks fine
jirka
^ permalink raw reply
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: Finn Thain @ 2010-05-31 12:55 UTC (permalink / raw)
To: David Miller; +Cc: geert, joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <20100531.043056.258120791.davem@davemloft.net>
On Mon, 31 May 2010, David Miller wrote:
> From: fthain@telegraphics.com.au
> Date: Mon, 31 May 2010 21:07:09 +1000 (EST)
>
> > Apparently David now wants me to submit this again --
> >
> > if (ei_debug)
> > pr_debug(...)
> >
> > David, if that code is acceptable, please let me know.
>
> The only thing I care about is at the moment that you don't do something
> that ends up dropping the pr_fmt prefix.
>
> The pr_fmt define at the beginning of the driver is for nothing if we
> end up adding exceptions that end up eliding it for no good reason.
> And that's what your patch was doing.
>
Since you have rejected my most recent patch submission, which uses pr_fmt
explicitly, I imagine that what you are trying to say here is that only
pr_debug or pr_info are acceptable.
Now, so that we don't have to go through pointless resubmission
iterations, can you tell me which of the following you prefer:
if (ei_debug)
pr_debug(...)
OR
if (ei_debug)
pr_info(...)
Thanks.
Finn
^ permalink raw reply
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: David Miller @ 2010-05-31 13:02 UTC (permalink / raw)
To: fthain; +Cc: geert, joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.OSX.2.00.1005312244520.1490@localhost>
From: Finn Thain <fthain@telegraphics.com.au>
Date: Mon, 31 May 2010 22:55:23 +1000 (EST)
> if (ei_debug)
> pr_debug(...)
>
> OR
>
> if (ei_debug)
> pr_info(...)
Well for the printk in question, it's telling the user that
a certain feature can't be enabled.
And if the driver has an explicit way to control this message,
using ei_debug, it's kind of redundant to slap another layer on
top by using the debug printk facility.
Changing this to a debug log level printk is only going to make
getting the debug message shown harder for the user. So leaving it at
pr_info() is just as correct in my eyes.
Finally, your patch has so many problems getting applied because
you're doing multiple things in one patch.
And in fact I've asked you not to do this on several occiaions.
Fix the error return codes in one patch, and nothing more.
Then in another you can masterbate with printk log levels.
^ permalink raw reply
* [PATCH] phylib: Add support for the LXT973 phy.
From: Richard Cochran @ 2010-05-31 13:09 UTC (permalink / raw)
To: netdev
This patch implements a work around for Erratum 5, "3.3 V Fiber Speed
Selection." If the hardware wiring does not respect this erratum, then
fiber optic mode will not work properly.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
drivers/net/phy/lxt.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 50 insertions(+), 1 deletions(-)
diff --git a/drivers/net/phy/lxt.c b/drivers/net/phy/lxt.c
index 057ecaa..c3de582 100644
--- a/drivers/net/phy/lxt.c
+++ b/drivers/net/phy/lxt.c
@@ -53,6 +53,9 @@
#define MII_LXT971_ISR 19 /* Interrupt Status Register */
+/* register definitions for the 973 */
+#define MII_LXT973_PCR 16 /* Port Configuration Register */
+#define PCR_FIBER_SELECT 1
MODULE_DESCRIPTION("Intel LXT PHY driver");
MODULE_AUTHOR("Andy Fleming");
@@ -119,6 +122,33 @@ static int lxt971_config_intr(struct phy_device *phydev)
return err;
}
+static int lxt973_probe(struct phy_device *phydev)
+{
+ int val = phy_read(phydev, MII_LXT973_PCR);
+
+ if (val & PCR_FIBER_SELECT) {
+ /*
+ * If fiber is selected, then the only correct setting
+ * is 100Mbps, full duplex, and auto negotiation off.
+ */
+ val = phy_read(phydev, MII_BMCR);
+ val |= (BMCR_SPEED100 | BMCR_FULLDPLX);
+ val &= ~BMCR_ANENABLE;
+ phy_write(phydev, MII_BMCR, val);
+ /* Remember that the port is in fiber mode. */
+ phydev->priv = lxt973_probe;
+ } else {
+ phydev->priv = NULL;
+ }
+ return 0;
+}
+
+static int lxt973_config_aneg(struct phy_device *phydev)
+{
+ /* Do nothing if port is in fiber mode. */
+ return phydev->priv ? 0 : genphy_config_aneg(phydev);
+}
+
static struct phy_driver lxt970_driver = {
.phy_id = 0x78100000,
.name = "LXT970",
@@ -146,6 +176,18 @@ static struct phy_driver lxt971_driver = {
.driver = { .owner = THIS_MODULE,},
};
+static struct phy_driver lxt973_driver = {
+ .phy_id = 0x00137a10,
+ .name = "LXT973",
+ .phy_id_mask = 0xfffffff0,
+ .features = PHY_BASIC_FEATURES,
+ .flags = 0,
+ .probe = lxt973_probe,
+ .config_aneg = lxt973_config_aneg,
+ .read_status = genphy_read_status,
+ .driver = { .owner = THIS_MODULE,},
+};
+
static int __init lxt_init(void)
{
int ret;
@@ -157,9 +199,15 @@ static int __init lxt_init(void)
ret = phy_driver_register(&lxt971_driver);
if (ret)
goto err2;
+
+ ret = phy_driver_register(&lxt973_driver);
+ if (ret)
+ goto err3;
return 0;
- err2:
+ err3:
+ phy_driver_unregister(&lxt971_driver);
+ err2:
phy_driver_unregister(&lxt970_driver);
err1:
return ret;
@@ -169,6 +217,7 @@ static void __exit lxt_exit(void)
{
phy_driver_unregister(&lxt970_driver);
phy_driver_unregister(&lxt971_driver);
+ phy_driver_unregister(&lxt973_driver);
}
module_init(lxt_init);
--
1.6.3.3
^ permalink raw reply related
* [PATCH] ixp4xx: Support the all multicast flag on the NPE devices.
From: Richard Cochran @ 2010-05-31 13:11 UTC (permalink / raw)
To: Krzysztof Halasa; +Cc: netdev, linux-arm-kernel
This patch adds support for the IFF_ALLMULTI flag. Previously only the
IFF_PROMISC flag was supported.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
drivers/net/arm/ixp4xx_eth.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
index 6be8b09..5d35064 100644
--- a/drivers/net/arm/ixp4xx_eth.c
+++ b/drivers/net/arm/ixp4xx_eth.c
@@ -739,6 +739,17 @@ static void eth_set_mcast_list(struct net_device *dev)
struct dev_mc_list *mclist;
u8 diffs[ETH_ALEN], *addr;
int i;
+ u8 allmulti[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
+
+ if (dev->flags & IFF_ALLMULTI) {
+ for (i = 0; i < ETH_ALEN; i++) {
+ __raw_writel(allmulti[i], &port->regs->mcast_addr[i]);
+ __raw_writel(allmulti[i], &port->regs->mcast_mask[i]);
+ }
+ __raw_writel(DEFAULT_RX_CNTRL0 | RX_CNTRL0_ADDR_FLTR_EN,
+ &port->regs->rx_control[0]);
+ return;
+ }
if ((dev->flags & IFF_PROMISC) || netdev_mc_empty(dev)) {
__raw_writel(DEFAULT_RX_CNTRL0 & ~RX_CNTRL0_ADDR_FLTR_EN,
--
1.6.3.3
^ permalink raw reply related
* [PATCH] netfilter: xtables: stackptr should be percpu
From: Eric Dumazet @ 2010-05-31 13:13 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311350580.31799@obet.zrqbmnf.qr>
Le lundi 31 mai 2010 à 13:51 +0200, Jan Engelhardt a écrit :
> On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>
> >In xt_register_table, xt_jumpstack_alloc is called first, later
> >xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
> >will be used again. Then the memory allocated by previous xt_jumpstack_alloc
> >will be leaked. We can simply remove the previous xt_jumpstack_alloc because
> >there aren't any users of newinfo between xt_jumpstack_alloc and
> >xt_replace_table.
>
> Indeed that seems to be so.
An official "Acked-by: ..." would be fine Jan :)
BTW I noticed a _big_ slowdown of iptables lately, and located the
reason.
All cpus share a single cache line for their 'stackptr' storage,
introduced in commit f3c5c1bfd4
This is a stable candidate (2.6.34)
Note : We also should use alloc_percpu() for jumpstack but this is not a
critical thing and can be a net-next patch.
[PATCH] netfilter: xtables: stackptr should be percpu
commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant)
introduced a performance regression, because stackptr array is shared by
all cpus, adding cache line ping pongs. (16 cpus share a 64 bytes cache
line)
Fix this using alloc_percpu()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/netfilter/x_tables.h | 2 +-
net/ipv4/netfilter/ip_tables.c | 2 +-
net/ipv6/netfilter/ip6_tables.c | 2 +-
net/netfilter/x_tables.c | 13 +++----------
4 files changed, 6 insertions(+), 13 deletions(-)
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index c00cc0c..24e5d01 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -397,7 +397,7 @@ struct xt_table_info {
* @stacksize jumps (number of user chains) can possibly be made.
*/
unsigned int stacksize;
- unsigned int *stackptr;
+ unsigned int __percpu *stackptr;
void ***jumpstack;
/* ipt_entry tables: one per CPU */
/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 63958f3..4b6c5ca 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -336,7 +336,7 @@ ipt_do_table(struct sk_buff *skb,
cpu = smp_processor_id();
table_base = private->entries[cpu];
jumpstack = (struct ipt_entry **)private->jumpstack[cpu];
- stackptr = &private->stackptr[cpu];
+ stackptr = per_cpu_ptr(private->stackptr, cpu);
origptr = *stackptr;
e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 6f517bd..9d2d68f 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -363,7 +363,7 @@ ip6t_do_table(struct sk_buff *skb,
cpu = smp_processor_id();
table_base = private->entries[cpu];
jumpstack = (struct ip6t_entry **)private->jumpstack[cpu];
- stackptr = &private->stackptr[cpu];
+ stackptr = per_cpu_ptr(private->stackptr, cpu);
origptr = *stackptr;
e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 445de70..7e8a93d 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -699,10 +699,8 @@ void xt_free_table_info(struct xt_table_info *info)
vfree(info->jumpstack);
else
kfree(info->jumpstack);
- if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
- vfree(info->stackptr);
- else
- kfree(info->stackptr);
+
+ free_percpu(info->stackptr);
kfree(info);
}
@@ -753,14 +751,9 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
unsigned int size;
int cpu;
- size = sizeof(unsigned int) * nr_cpu_ids;
- if (size > PAGE_SIZE)
- i->stackptr = vmalloc(size);
- else
- i->stackptr = kmalloc(size, GFP_KERNEL);
+ i->stackptr = alloc_percpu(unsigned int);
if (i->stackptr == NULL)
return -ENOMEM;
- memset(i->stackptr, 0, size);
size = sizeof(void **) * nr_cpu_ids;
if (size > PAGE_SIZE)
^ permalink raw reply related
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Jan Engelhardt @ 2010-05-31 13:19 UTC (permalink / raw)
To: Xiaotian Feng
Cc: netfilter-devel, netfilter, coreteam, linux-kernel, netdev,
Patrick McHardy, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311350580.31799@obet.zrqbmnf.qr>
On Monday 2010-05-31 13:51, Jan Engelhardt wrote:
>On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>
>>In xt_register_table, xt_jumpstack_alloc is called first, later
>>xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>>will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>>will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>>there aren't any users of newinfo between xt_jumpstack_alloc and
>>xt_replace_table.
>
>Indeed that seems to be so.
Acked-By: Jan Engelhardt <jengelh@medozas.de>
>
>>diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>>index 445de70..47b1e79 100644
>>--- a/net/netfilter/x_tables.c
>>+++ b/net/netfilter/x_tables.c
>>@@ -844,10 +844,6 @@ struct xt_table *xt_register_table(struct net *net,
>> struct xt_table_info *private;
>> struct xt_table *t, *table;
>>
>>- ret = xt_jumpstack_alloc(newinfo);
>>- if (ret < 0)
>>- return ERR_PTR(ret);
>>-
>> /* Don't add one object to multiple lists. */
>> table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
>> if (!table) {
>--
>To unsubscribe from this list: send the line "unsubscribe netfilter" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Jan Engelhardt @ 2010-05-31 13:22 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <1275311580.3291.44.camel@edumazet-laptop>
On Monday 2010-05-31 15:13, Eric Dumazet wrote:
>
>All cpus share a single cache line for their 'stackptr' storage,
>introduced in commit f3c5c1bfd4
>
>This is a stable candidate (2.6.34)
Stackptr was first introduced for 2.6.35-rcX.
>+ i->stackptr = alloc_percpu(unsigned int);
> if (i->stackptr == NULL)
> return -ENOMEM;
>- memset(i->stackptr, 0, size);
>
> size = sizeof(void **) * nr_cpu_ids;
> if (size > PAGE_SIZE)
Are alloc_percpu areas cleared?
Acked-By: Jan Engelhardt <jengelh@medozas.de>
^ permalink raw reply
* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-05-31 13:39 UTC (permalink / raw)
To: Paul Mackerras
Cc: Ben McKeegan, netdev, linux-ppp, Alan Cox, Alexander E. Patrakov,
linux-kernel
In-Reply-To: <20100529021624.GA2538@brick.ozlabs.ibm.com>
On Sat, May 29, 2010 at 04:16, Paul Mackerras <paulus@samba.org> wrote:
> If you fix up the indentation issues (2-space indent in some of the
> added code -- if you're using emacs, set c-basic-offset to 8), I'll
> ack it and hopefully DaveM will pick it up.
If no one submits it by tonight, I will probably send a cleaned-up
version of the alternative patch. Not that I want to steal anyone's
laurels, mind. But having it in mainline is better than not having
it in mainline.
Richard
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Eric Dumazet @ 2010-05-31 13:44 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519340.25402@obet.zrqbmnf.qr>
Le lundi 31 mai 2010 à 15:22 +0200, Jan Engelhardt a écrit :
> On Monday 2010-05-31 15:13, Eric Dumazet wrote:
> >
> >All cpus share a single cache line for their 'stackptr' storage,
> >introduced in commit f3c5c1bfd4
> >
> >This is a stable candidate (2.6.34)
>
> Stackptr was first introduced for 2.6.35-rcX.
>
Indeed, I was fooled by 'git describe'
> >+ i->stackptr = alloc_percpu(unsigned int);
> > if (i->stackptr == NULL)
> > return -ENOMEM;
> >- memset(i->stackptr, 0, size);
> >
> > size = sizeof(void **) * nr_cpu_ids;
> > if (size > PAGE_SIZE)
>
> Are alloc_percpu areas cleared?
>
Yes, allocated chunks are cleared.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 1/2] Export firmware assigned labels of network devices to sysfs
From: Michael Ellerman @ 2010-05-31 14:07 UTC (permalink / raw)
To: K, Narendra
Cc: netdev@vger.kernel.org, linux-hotplug@vger.kernel.org,
linux-pci@vger.kernel.org, Domsch, Matt, Hargrave, Jordan,
Rose, Charles, Nijhawan, Vijay
In-Reply-To: <20100528115520.GA24114@littleblue.us.dell.com>
[-- Attachment #1: Type: text/plain, Size: 781 bytes --]
On Fri, 2010-05-28 at 06:55 -0500, K, Narendra wrote:
> Hello,
>
> This patch is in continuation of an earlier discussion -
>
> http://marc.info/?l=linux-netdev&m=126712978908314&w=3
>
> The patch has the following review suggestions from the community incorporated -
>
> 1. The name of the attribute has been changed from "smbiosname" to "label" to hide
> the implementation details.
> 2. The implementation has been moved to a new file drivers/pci/pci-label.c
You've changed the name, which is good, but the implementation is still
100% dependant on ACPI or DMI AFAICS.
So it seems to me until it's supported on another platform it may as
well go in pci-acpi.c, or at least only be compiled if (ACPI || DMI).
Otherwise it's just dead code.
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: [PATCH] ixp4xx: Support the all multicast flag on the NPE devices.
From: Krzysztof Halasa @ 2010-05-31 14:09 UTC (permalink / raw)
To: Richard Cochran; +Cc: netdev, linux-arm-kernel
In-Reply-To: <20100531131102.GA15870@riccoc20.at.omicron.at>
Richard Cochran <richardcochran@gmail.com> writes:
> This patch adds support for the IFF_ALLMULTI flag. Previously only the
> IFF_PROMISC flag was supported.
>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
> drivers/net/arm/ixp4xx_eth.c | 11 +++++++++++
> 1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
> index 6be8b09..5d35064 100644
> --- a/drivers/net/arm/ixp4xx_eth.c
> +++ b/drivers/net/arm/ixp4xx_eth.c
> @@ -739,6 +739,17 @@ static void eth_set_mcast_list(struct net_device *dev)
> struct dev_mc_list *mclist;
> u8 diffs[ETH_ALEN], *addr;
> int i;
> + u8 allmulti[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
> +
> + if (dev->flags & IFF_ALLMULTI) {
> + for (i = 0; i < ETH_ALEN; i++) {
> + __raw_writel(allmulti[i], &port->regs->mcast_addr[i]);
> + __raw_writel(allmulti[i], &port->regs->mcast_mask[i]);
> + }
> + __raw_writel(DEFAULT_RX_CNTRL0 | RX_CNTRL0_ADDR_FLTR_EN,
> + &port->regs->rx_control[0]);
> + return;
> + }
Looks good, though I'd prefer a bit of "const" and "static" in the
allmulti[] declaration. Would you please repost? TIA.
--
Krzysztof Halasa
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Jan Engelhardt @ 2010-05-31 14:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <1275313448.3291.49.camel@edumazet-laptop>
On Monday 2010-05-31 15:44, Eric Dumazet wrote:
>Le lundi 31 mai 2010 à 15:22 +0200, Jan Engelhardt a écrit :
>> On Monday 2010-05-31 15:13, Eric Dumazet wrote:
>> >
>> >All cpus share a single cache line for their 'stackptr' storage,
>> >introduced in commit f3c5c1bfd4
>> >
>> >This is a stable candidate (2.6.34)
>>
>> Stackptr was first introduced for 2.6.35-rcX.
>
>Indeed, I was fooled by 'git describe'
Keep your friends close, and your enemies closer ;-)
git describe --contains f3c5c1bfd
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Eric Dumazet @ 2010-05-31 14:16 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311608570.20623@obet.zrqbmnf.qr>
Le lundi 31 mai 2010 à 16:09 +0200, Jan Engelhardt a écrit :
> Keep your friends close, and your enemies closer ;-)
>
> git describe --contains f3c5c1bfd
Yes, --contains should be the default, and --predates the option :)
This is a bit OT anyway :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Patrick McHardy @ 2010-05-31 14:34 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519160.25402@obet.zrqbmnf.qr>
Jan Engelhardt wrote:
> On Monday 2010-05-31 13:51, Jan Engelhardt wrote:
>> On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>>
>>> In xt_register_table, xt_jumpstack_alloc is called first, later
>>> xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>>> will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>>> will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>>> there aren't any users of newinfo between xt_jumpstack_alloc and
>>> xt_replace_table.
>> Indeed that seems to be so.
>
> Acked-By: Jan Engelhardt <jengelh@medozas.de>
Applied, thanks everyone.
^ permalink raw reply
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Patrick McHardy @ 2010-05-31 14:37 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519160.25402@obet.zrqbmnf.qr>
Jan Engelhardt wrote:
> On Monday 2010-05-31 13:51, Jan Engelhardt wrote:
>> On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>>
>>> In xt_register_table, xt_jumpstack_alloc is called first, later
>>> xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>>> will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>>> will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>>> there aren't any users of newinfo between xt_jumpstack_alloc and
>>> xt_replace_table.
>> Indeed that seems to be so.
>
> Acked-By: Jan Engelhardt <jengelh@medozas.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Patrick McHardy @ 2010-05-31 14:37 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Eric Dumazet, Xiaotian Feng, netfilter-devel, netfilter, coreteam,
linux-kernel, netdev, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519340.25402@obet.zrqbmnf.qr>
Jan Engelhardt wrote:
> On Monday 2010-05-31 15:13, Eric Dumazet wrote:
>> All cpus share a single cache line for their 'stackptr' storage,
>> introduced in commit f3c5c1bfd4
>>
>> This is a stable candidate (2.6.34)
>
> Stackptr was first introduced for 2.6.35-rcX.
>
>> + i->stackptr = alloc_percpu(unsigned int);
>> if (i->stackptr == NULL)
>> return -ENOMEM;
>> - memset(i->stackptr, 0, size);
>>
>> size = sizeof(void **) * nr_cpu_ids;
>> if (size > PAGE_SIZE)
>
> Are alloc_percpu areas cleared?
>
> Acked-By: Jan Engelhardt <jengelh@medozas.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Oleg Nesterov @ 2010-05-31 14:39 UTC (permalink / raw)
To: Tejun Heo
Cc: Michael S. Tsirkin, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C961.9050606@kernel.org>
On 05/30, Tejun Heo wrote:
>
> This conversion is to make each vhost use a dedicated kthread so that
> resource control via cgroup can be applied.
Personally, I agree. I think This is better than play with workqueue thread.
A couple of simple questions after the quick glance at the unapplied patch...
> void vhost_poll_flush(struct vhost_poll *poll)
> {
> - flush_work(&poll->work);
> + int seq = poll->queue_seq;
> +
> + if (seq - poll->done_seq > 0)
> + wait_event(poll->done, seq - poll->done_seq <= 0);
The check before wait_event() is not needed, please note that wait_event()
checks the condition before __wait_event().
What I can't understand is why we do have ->queue_seq and ->done_seq.
Isn't the single "bool poll->active" enough? vhost_poll_queue() sets
->active == T, vhost_poller() clears it before wake_up_all(poll->done).
> +static int vhost_poller(void *data)
> +{
> + struct vhost_dev *dev = data;
> + struct vhost_poll *poll;
> +
> +repeat:
> + set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
I don't understand the comment... why do we need this barrier?
> + if (kthread_should_stop()) {
> + __set_current_state(TASK_RUNNING);
> + return 0;
> + }
> +
> + poll = NULL;
> + spin_lock(&dev->poller_lock);
> + if (!list_empty(&dev->poll_list)) {
> + poll = list_first_entry(&dev->poll_list,
> + struct vhost_poll, node);
> + list_del_init(&poll->node);
> + }
> + spin_unlock(&dev->poller_lock);
> +
> + if (poll) {
> + __set_current_state(TASK_RUNNING);
> + poll->fn(poll);
> + smp_wmb(); /* paired with rmb in vhost_poll_flush() */
> + poll->done_seq = poll->queue_seq;
> + wake_up_all(&poll->done);
> + } else
> + schedule();
> +
> + goto repeat;
> +}
Given that vhost_poll_queue() does list_add() and wake_up_process() under
->poller_lock, I don't think we need any barriers to change ->state.
IOW, can't vhost_poller() simply do
while(!kthread_should_stop()) {
poll = NULL;
spin_lock(&dev->poller_lock);
if (!list_empty(&dev->poll_list)) {
...
} else
__set_current_state(TASK_INTERRUPTIBLE);
spin_unlock(&dev->poller_lock);
if (poll) {
...
} else
schedule();
}
?
Oleg.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox