* [FIB]: Avoid using static variables without proper locking
From: Eric Dumazet @ 2008-01-14 19:27 UTC (permalink / raw)
To: David Miller; +Cc: Robert Olsson, netdev
In-Reply-To: <18315.41725.417992.715140@robur.slu.se>
[-- Attachment #1: Type: text/plain, Size: 287 bytes --]
fib_trie_seq_show() uses two helper functions, rtn_scope() and rtn_type(),
that can write to static storage without locking.
Pass them a caller-supplied temporary buffer instead, to avoid potential
corruption (probably not triggerable, but still...).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: fib_trie.patch --]
[-- Type: text/plain, Size: 1936 bytes --]
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 8d8c291..15a555a 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2276,10 +2276,8 @@ static void seq_indent(struct seq_file *seq, int n)
while (n-- > 0) seq_puts(seq, " ");
}
-static inline const char *rtn_scope(enum rt_scope_t s)
+static inline const char *rtn_scope(char *buf, size_t len, enum rt_scope_t s)
{
- static char buf[32];
-
switch (s) {
case RT_SCOPE_UNIVERSE: return "universe";
case RT_SCOPE_SITE: return "site";
@@ -2287,7 +2285,7 @@ static inline const char *rtn_scope(enum rt_scope_t s)
case RT_SCOPE_HOST: return "host";
case RT_SCOPE_NOWHERE: return "nowhere";
default:
- snprintf(buf, sizeof(buf), "scope=%d", s);
+ snprintf(buf, len, "scope=%d", s);
return buf;
}
}
@@ -2307,13 +2305,11 @@ static const char *rtn_type_names[__RTN_MAX] = {
[RTN_XRESOLVE] = "XRESOLVE",
};
-static inline const char *rtn_type(unsigned t)
+static inline const char *rtn_type(char *buf, size_t len, unsigned t)
{
- static char buf[32];
-
if (t < __RTN_MAX && rtn_type_names[t])
return rtn_type_names[t];
- snprintf(buf, sizeof(buf), "type %d", t);
+ snprintf(buf, len, "type %d", t);
return buf;
}
@@ -2351,13 +2347,19 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
seq_printf(seq, " |-- %d.%d.%d.%d\n", NIPQUAD(val));
for (i = 32; i >= 0; i--) {
struct leaf_info *li = find_leaf_info(l, i);
+
if (li) {
struct fib_alias *fa;
+
list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+ char buf1[32], buf2[32];
+
seq_indent(seq, iter->depth+1);
seq_printf(seq, " /%d %s %s", i,
- rtn_scope(fa->fa_scope),
- rtn_type(fa->fa_type));
+ rtn_scope(buf1, sizeof(buf1),
+ fa->fa_scope),
+ rtn_type(buf2, sizeof(buf2),
+ fa->fa_type));
if (fa->fa_tos)
seq_printf(seq, "tos =%d\n",
fa->fa_tos);
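The hazard this patch removes can be reproduced outside the kernel. The following is only a minimal userspace sketch, not the kernel code: demo_scope_static() and demo_scope_buf() are hypothetical names, and the single-threaded aliasing shown here merely stands in for the concurrent-reader case the changelog worries about.

```c
#include <stdio.h>
#include <stddef.h>

/* Old style: every caller shares the same static buffer. */
static const char *demo_scope_static(int s)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "scope=%d", s);
	return buf;
}

/* New style: the caller owns the storage, so results cannot collide. */
static const char *demo_scope_buf(char *buf, size_t len, int s)
{
	snprintf(buf, len, "scope=%d", s);
	return buf;
}
```

With the static variant, a second call overwrites the string returned by the first; with the caller-supplied variant, each result lives in its own buffer, which is what buf1 and buf2 provide in fib_trie_seq_show().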
^ permalink raw reply related
* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-14 19:25 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ray Lee, netdev, linux-kernel
In-Reply-To: <478B943C.7080009@cosmosbay.com>
Eric Dumazet wrote:
> Chris Friesen a écrit :
>> Based on profiling and instrumentation it seems like the cost of
>> sctp_endpoint_lookup_assoc() more than triples, which means that the
>> amount of time that bottom halves are disabled in that function also
>> triples.
>
> Any idea of the size of sctp hash size you have ?
> (your dmesg probably includes a message starting with SCTP: Hash tables
> configured...
> How many concurrent sctp sockets are handled ?
Our lab is currently rebooting, but I'll try and get this once it's back up.
> Maybe sctp_assoc_hashfn() is too weak for your use, and some chains are
> *really* long.
Based on the profiling information we're spending time in
sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I
can't see how the hash would be related. I'm pretty new to SCTP though,
so I may be missing something.
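If the lookup really is a plain list walk, as described above, its cost would grow with the number of associations on the endpoint, independent of any hash table sizing. The sketch below only illustrates that shape; struct demo_assoc and demo_lookup_assoc() are hypothetical names and are not taken from the SCTP sources.

```c
#include <stddef.h>

struct demo_assoc {
	unsigned short peer_port;
	struct demo_assoc *next;
};

/* Walk the list linearly; *steps reports how many entries were compared. */
static const struct demo_assoc *
demo_lookup_assoc(const struct demo_assoc *head, unsigned short peer_port,
		  unsigned int *steps)
{
	const struct demo_assoc *a;

	*steps = 0;
	for (a = head; a != NULL; a = a->next) {
		(*steps)++;
		if (a->peer_port == peer_port)
			return a;
	}
	return NULL;
}
```

A miss, or a hit near the tail, touches every entry, so tripling the list length would roughly triple the time spent in the walk.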
Here are the top results from readprofile. Unfortunately, these are
aggregated across both CPUs, so they don't really show what's going on.
The key thing is that sctp_endpoint_lookup_assoc() is the most expensive
kernel routine on this entire system.
  3147 .power4_idle                  22.4786
  1712 .native_idle                  20.3810
  1234 .sctp_endpoint_lookup_assoc    2.1725
  1212 ._spin_unlock_irqrestore       6.4468
   778 .do_futex                      0.3791
   447 ._spin_unlock_irq              4.2981
   313 .fget                          1.7784
   277 .fput                          3.8472
   275 .kfree                         0.7473
   234 .__kmalloc                     0.5571
   131 SystemCall_common              0.3411
   130 .sctp_assoc_is_match           0.6373
   123 .lock_sock                     0.4155
   119 .find_vma                      0.6919
   116 .kmem_cache_alloc              0.3580
   111 .kmem_cache_free               0.3343
   106 .skb_release_data              0.4907
   102 .__copy_tofrom_user            0.0724
   100 .exit_elf_binfmt               1.9231
   100 .do_select                     0.0820
Chris
^ permalink raw reply
* Re: occasionally corrupted network stats in /proc/net/dev
From: Eric Dumazet @ 2008-01-14 19:12 UTC (permalink / raw)
To: Mark Seger; +Cc: Ben Greear, netdev, mchan
In-Reply-To: <478BAF47.10607@hp.com>
Mark Seger a écrit :
> Ignore that last one as it was pointed out to me that we have both nic
> installed on many of our systems and ethtool told me the one
> associated with the nic is actually the broadcom one.
>
> version: 1.4.38 E1B1EC867DEEB8027B2DA0F
> license: GPL
> description: Broadcom NetXtreme II BCM5706/5708 Driver
>
I remember that some tg3 chips actually have bugs when reporting stats,
once in a while.
CCing Michael Chan to get some details.
> -mark
>
> Mark Seger wrote:
>> I'll try to get data on the other systems reporting it and as I said
>> it does not happen all that often AND you have to be looking for
>> it. The system I've personally seen it happen on several times is
>> running RHEL4/U4 which redhat numbers 2.6.9-42 and from modinfo I see:
>> version: 7.0.33-k2-NAPI 51E97FEE51D0772AFC89130
>> description: Intel(R) PRO/1000 Network Driver
>>
>> -mark
>>
>> Ben Greear wrote:
>>> Mark Seger wrote:
>>>> I had posted the following on linux-net and haven't see any
>>>> responses possibly because nobody had any or that list is
>>>> obsolete. I have been told this is the current list for everything
>>>> networking on linux so I thought I'd try again...
>>> Do you see this with multiple network drivers, or just with one
>>> particular driver. If so, which one?
>>>
>>> Thanks,
>>> Ben
>>>
>
^ permalink raw reply
* Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
From: Paul Moore @ 2008-01-14 19:07 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <1030.1200336639@turing-police.cc.vt.edu>
On Monday 14 January 2008 1:50:39 pm Valdis.Kletnieks@vt.edu wrote:
> On Mon, 14 Jan 2008 13:22:10 EST, Valdis.Kletnieks@vt.edu said:
> > Apparently the only new commit in there since the tree that was in
> > 24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some
> > warning printk's. Would it be more productive to test against the full
> > tree, or leaving out the one commit I already reverted?
>
> <voice=Emily Litella> Nevermind... </voice> :)
>
> The new commit won't apply with the other one reverted - it patches
> security/selinux/netnode.c which was created by the problematic commit...
There have been quite a few changes in lblnet-2.6_testing since 2.6.24-rc6-mm1
so I would recommend taking the whole tree. I'm also not quite sure if
simply reverting the "Convert the netif code to use ifindex values" patch
would solve the problem as there are other patches in the rc6-mm1 tree that
rely on skb->iif being valid (new code, not converted code). If you want to
stick with a _relatively_ vanilla rc6-mm1 tree I would leave everything in
and simply apply the following patch which solved the skb_clone()/iif
problem:
http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff;h=02f1c89d6e36507476f78108a3dcc78538be460b
--
paul moore
linux security @ hp
^ permalink raw reply
* Re: occasionally corrupted network stats in /proc/net/dev
From: Ben Greear @ 2008-01-14 19:01 UTC (permalink / raw)
To: Mark Seger; +Cc: netdev
In-Reply-To: <478BAF47.10607@hp.com>
Mark Seger wrote:
> Ignore that last one as it was pointed out to me that we have both nic
> installed on many of our systems and ethtool told me the one
> associated with the nic is actually the broadcom one.
>
> version: 1.4.38 E1B1EC867DEEB8027B2DA0F
> license: GPL
> description: Broadcom NetXtreme II BCM5706/5708 Driver
OK, we do similar stats polling, though through a private ioctl I hacked
into the kernel to get the netdev->stats struct with a memcpy. I haven't
noticed any problems with counters in the e1000 driver. I haven't done
enough testing on bcm drivers to ascertain whether they are reliable with
regard to stats.
If you can reproduce the problem with e1000, it would be worth looking at
the logic that prints out the proc interface text for problems, and if you
cannot, then maybe it's the bcm driver that is at issue.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Re: [PATCH][v2] phylib: add module owner to the mii_bus structure
From: Andy Fleming @ 2008-01-14 18:56 UTC (permalink / raw)
To: Ionut Nicu; +Cc: netdev, shemminger
In-Reply-To: <78b2dab672856274c518f4c523f6f63f59f06dd1.1198861866.git.ionut.nicu@freescale.com>
On Dec 28, 2007, at 11:31, Ionut Nicu wrote:
> Prevent unloading mii bus driver module when other modules have
> references to some
> phydevs on that bus. Added a new member (module owner) to struct
> mii_bus and added
> code to increment the mii bus owner module usage count on
> phy_connect and decrement
> it on phy_disconnect
>
> Set the module owner in the ucc_geth_mdio driver.
>
> Signed-off-by: Ionut Nicu <ionut.nicu@freescale.com>
> Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
> ---
> drivers/net/phy/phy_device.c | 9 ++++++++-
> drivers/net/ucc_geth_mii.c | 3 +++
> diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
> index a3af4ea..84c7295 100644
> --- a/drivers/net/ucc_geth_mii.c
> +++ b/drivers/net/ucc_geth_mii.c
> @@ -217,6 +217,9 @@ static int uec_mdio_probe(struct of_device
> *ofdev, const struct of_device_id *ma
> }
> }
>
> + /* register ourselves as the owner of this bus */
> + new_bus->owner = THIS_MODULE;
> +
> err = mdiobus_register(new_bus);
> if (0 != err) {
> printk(KERN_ERR "%s: Cannot register as MDIO bus\n",
Any reason you didn't update the other drivers?
> git grep mdiobus_register drivers/net/    // duplicates and mdio_bus.c edited out
drivers/net/au1000_eth.c: mdiobus_register(&aup->mii_bus);
drivers/net/bfin_mac.c: mdiobus_register(&lp->mii_bus);
drivers/net/cpmac.c: res = mdiobus_register(&cpmac_mii);
drivers/net/fec_mpc52xx_phy.c: err = mdiobus_register(bus);
drivers/net/fs_enet/mii-bitbang.c: ret = mdiobus_register(new_bus);
drivers/net/fs_enet/mii-fec.c: ret = mdiobus_register(new_bus);
drivers/net/gianfar_mii.c: err = mdiobus_register(new_bus);
drivers/net/macb.c: if (mdiobus_register(&bp->mii_bus))
drivers/net/sb1250-mac.c: err = mdiobus_register(&sc->mii_bus);
drivers/net/ucc_geth_mii.c: err = mdiobus_register(new_bus);
I'm guessing this was only tested on the UEC, because unless I
misunderstand the code, any other driver would now crash when you try
to get the owner.
Andy
^ permalink raw reply
* Re: [PATCH 9/9] fix sparse warnings
From: Eric Dumazet @ 2008-01-14 17:34 UTC (permalink / raw)
To: Robert Olsson; +Cc: Stephen Hemminger, David Miller, Robert Olsson, netdev
In-Reply-To: <18315.16984.456053.250600@robur.slu.se>
Robert Olsson a écrit :
> Thanks for hacking and improving and the trie... another idea that could
> be also tested. If we look into routing table we see that most leafs
> only has one prefix
>
> Main:
> Aver depth: 2.57
> Max depth: 7
> Leaves: 231173
>
> ip route | wc -l
> 241649
>
> Thats 231173/241649 = 96% with the current Internet routing.
>
> How about if would have a fastpath and store one entry direct in the
> leaf struct this to avoid loading the leaf_info list in most cases?
>
> One could believe that both lookup and dump could improve.
>
>
You mean embedding one "leaf_info" inside the leaf structure, so that we
can access it without a cache line miss?
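One way to picture the idea is the sketch below. It is a simplified userspace model under stated assumptions, not the fib_trie layout: struct demo_leaf, struct demo_leaf_info, the "value" field and demo_find_leaf_info() are all hypothetical. The first entry lives in the same allocation as the leaf, and only additional prefixes go to a separate chain.

```c
#include <stddef.h>

struct demo_leaf_info {
	int plen;			/* prefix length */
	int value;			/* stand-in for the per-prefix data */
	struct demo_leaf_info *next;	/* used only on the overflow chain */
};

struct demo_leaf {
	unsigned int key;
	struct demo_leaf_info first;	/* embedded: no extra pointer chase */
	struct demo_leaf_info *more;	/* NULL in the common one-prefix case */
};

/* Check the embedded entry first, then fall back to the overflow chain. */
static const struct demo_leaf_info *
demo_find_leaf_info(const struct demo_leaf *l, int plen)
{
	const struct demo_leaf_info *li;

	if (l->first.plen == plen)
		return &l->first;
	for (li = l->more; li != NULL; li = li->next)
		if (li->plen == plen)
			return li;
	return NULL;
}
```

With roughly 96% of leaves holding a single prefix, most lookups would stop at the embedded entry; whether that actually saves a cache miss depends on the real structure sizes, which this sketch does not model.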
^ permalink raw reply
* Re: occasionally corrupted network stats in /proc/net/dev
From: Mark Seger @ 2008-01-14 18:51 UTC (permalink / raw)
To: Mark Seger; +Cc: Ben Greear, netdev
In-Reply-To: <478BA8D7.8050803@hp.com>
Ignore that last one: it was pointed out to me that we have both NICs
installed on many of our systems, and ethtool told me the driver associated
with the NIC is actually the Broadcom one.
version: 1.4.38 E1B1EC867DEEB8027B2DA0F
license: GPL
description: Broadcom NetXtreme II BCM5706/5708 Driver
-mark
Mark Seger wrote:
> I'll try to get data on the other systems reporting it and as I said
> it does not happen all that often AND you have to be looking for it.
> The system I've personally seen it happen on several times is running
> RHEL4/U4 which redhat numbers 2.6.9-42 and from modinfo I see:
> version: 7.0.33-k2-NAPI 51E97FEE51D0772AFC89130
> description: Intel(R) PRO/1000 Network Driver
>
> -mark
>
> Ben Greear wrote:
>> Mark Seger wrote:
>>> I had posted the following on linux-net and haven't see any
>>> responses possibly because nobody had any or that list is obsolete.
>>> I have been told this is the current list for everything networking
>>> on linux so I thought I'd try again...
>> Do you see this with multiple network drivers, or just with one
>> particular driver. If so, which one?
>>
>> Thanks,
>> Ben
>>
^ permalink raw reply
* Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
From: Valdis.Kletnieks @ 2008-01-14 18:50 UTC (permalink / raw)
To: Paul Moore, Andrew Morton; +Cc: linux-kernel, netdev
In-Reply-To: <32065.1200334930@turing-police.cc.vt.edu>
[-- Attachment #1: Type: text/plain, Size: 525 bytes --]
On Mon, 14 Jan 2008 13:22:10 EST, Valdis.Kletnieks@vt.edu said:
> Apparently the only new commit in there since the tree that was in
> 24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning
> printk's. Would it be more productive to test against the full tree, or
> leaving out the one commit I already reverted?
<voice=Emily Litella> Nevermind... </voice> :)
The new commit won't apply with the other one reverted - it patches
security/selinux/netnode.c which was created by the problematic commit...
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply
* Re: [PATCH]drivers/net/phy/: default return value in ioctl phy.c
From: Andy Fleming @ 2008-01-14 18:48 UTC (permalink / raw)
To: Rini van Zetten; +Cc: netdev, linux-kernel
In-Reply-To: <014901c83c06$da9223e0$2b00a8c0@ARV127>
On Dec 11, 2007, at 09:02, Rini van Zetten wrote:
> Hello Andy,
>
> This patch (to 2.6.23.9) add a default return value EOPNOTSUPP to
> the ioctl function. The problem with the always 0 return value is
> that the iwconfig (wireless) tool found a valid device when an
> ethernet device uses the phy abstraction layer.
> I 've tetsted this with the macb driver.
>
>
> Signed-off-by: Rini van Zetten <rini@arvoo.nl>
Acked-by: Andy Fleming <afleming@freescale.com>
^ permalink raw reply
* Re: occasionally corrupted network stats in /proc/net/dev
From: Mark Seger @ 2008-01-14 18:24 UTC (permalink / raw)
To: Ben Greear; +Cc: netdev
In-Reply-To: <478BA53B.9060504@candelatech.com>
I'll try to get data on the other systems reporting it. As I said, it
does not happen all that often AND you have to be looking for it. The
system I've personally seen it happen on several times is running
RHEL4/U4, which Red Hat numbers 2.6.9-42, and from modinfo I see:
version: 7.0.33-k2-NAPI 51E97FEE51D0772AFC89130
description: Intel(R) PRO/1000 Network Driver
-mark
Ben Greear wrote:
> Mark Seger wrote:
>> I had posted the following on linux-net and haven't see any responses
>> possibly because nobody had any or that list is obsolete. I have
>> been told this is the current list for everything networking on linux
>> so I thought I'd try again...
> Do you see this with multiple network drivers, or just with one
> particular driver. If so, which one?
>
> Thanks,
> Ben
>
^ permalink raw reply
* Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
From: Valdis.Kletnieks @ 2008-01-14 18:22 UTC (permalink / raw)
Cc: Paul Moore, Andrew Morton, linux-kernel, netdev
In-Reply-To: <31130.1200333948@turing-police.cc.vt.edu>
[-- Attachment #1: Type: text/plain, Size: 472 bytes --]
On Mon, 14 Jan 2008 13:05:48 EST, Valdis.Kletnieks@vt.edu said:
> I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the
> moment, and seeing if there's already a fix in there for this.
Apparently the only new commit in there since the tree that was in
24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning
printk's. Would it be more productive to test against the full tree, or
leaving out the one commit I already reverted?
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply
* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: Dzianis Kahanovich @ 2008-01-14 22:20 UTC (permalink / raw)
To: netdev; +Cc: hadi
In-Reply-To: <1200315372.4427.75.camel@localhost>
jamal wrote:
Maybe I am mixing up other code (multi-class loop/walking) with this code. I
am deprogramming... ;)
>> Sorry, I just change focus from existing "tc_index=..." to common behaviour ;)
>
>> [...]
>>> Please refer to what i said above; if what i said still doesnt make
>>> sense i can create (the simple) patch.
>> A bit vague... sorry...
>
> I mean:
>
> #ifdef CONFIG_NET_CLS_ACT
> .... leave this part alone which already sets tc_index ...
> #else
> ...set tc_index and mark here ...
> #endif
>
> And when we have a metadata action - we remove setting of tc_index from
> #ifdef CONFIG_NET_CLS_ACT
>
> Did that make sense?
After the current "#endif", maybe.
What is "result" with:
1) no filters?
2) only one filter, with "action continue"?
--
WBR,
Denis Kaganovich, mahatma@eu.by http://mahatma.bspu.unibel.by
^ permalink raw reply
* [RFT] sky2: wake-on-lan configuration issues
From: Stephen Hemminger @ 2008-01-14 18:14 UTC (permalink / raw)
To: supersud501
Cc: Rafael J. Wysocki, Andrew Morton, netdev, linux-acpi,
bugme-daemon
In-Reply-To: <478A965E.8070306@yahoo.de>
Please test this patch against Linus's current tree (approx. 2.6.24-rc7-git5).
Ignore Andrew's premature reversion attempt...
This patch disables config mode access after clearing PCI settings.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
--- a/drivers/net/sky2.c 2008-01-14 09:44:22.000000000 -0800
+++ b/drivers/net/sky2.c 2008-01-14 09:44:51.000000000 -0800
@@ -621,6 +621,7 @@ static void sky2_phy_power(struct sky2_h
static const u32 phy_power[] = { PCI_Y2_PHY1_POWD, PCI_Y2_PHY2_POWD };
static const u32 coma_mode[] = { PCI_Y2_PHY1_COMA, PCI_Y2_PHY2_COMA };
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
/* Turn on/off phy power saving */
if (onoff)
@@ -632,7 +633,8 @@ static void sky2_phy_power(struct sky2_h
reg1 |= coma_mode[port];
sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
- reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
+ sky2_pci_read32(hw, PCI_DEV_REG1);
udelay(100);
}
@@ -2426,6 +2428,7 @@ static void sky2_hw_intr(struct sky2_hw
if (status & (Y2_IS_MST_ERR | Y2_IS_IRQ_STAT)) {
u16 pci_err;
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
pci_err = sky2_pci_read16(hw, PCI_STATUS);
if (net_ratelimit())
dev_err(&pdev->dev, "PCI hardware error (0x%x)\n",
@@ -2433,12 +2436,14 @@ static void sky2_hw_intr(struct sky2_hw
sky2_pci_write16(hw, PCI_STATUS,
pci_err | PCI_STATUS_ERROR_BITS);
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
}
if (status & Y2_IS_PCI_EXP) {
/* PCI-Express uncorrectable Error occurred */
u32 err;
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
err = sky2_read32(hw, Y2_CFG_AER + PCI_ERR_UNCOR_STATUS);
sky2_write32(hw, Y2_CFG_AER + PCI_ERR_UNCOR_STATUS,
0xfffffffful);
@@ -2446,6 +2451,7 @@ static void sky2_hw_intr(struct sky2_hw
dev_err(&pdev->dev, "PCI Express error (0x%x)\n", err);
sky2_read32(hw, Y2_CFG_AER + PCI_ERR_UNCOR_STATUS);
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
}
if (status & Y2_HWE_L1_MASK)
@@ -2811,6 +2817,7 @@ static void sky2_reset(struct sky2_hw *h
}
sky2_power_on(hw);
+ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
for (i = 0; i < hw->ports; i++) {
sky2_write8(hw, SK_REG(i, GMAC_LINK_CTRL), GMLC_RST_SET);
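The pattern the hunks above apply is "open the config-write window, touch the register, close the window again". The sketch below models only that bracketing in userspace. struct demo_hw and both functions are hypothetical, and the assumption that a write is refused while the window is closed belongs to this mock, not to any documented sky2 hardware behaviour.

```c
#include <stdint.h>

struct demo_hw {
	int cfg_write_on;	/* models the TST_CFG_WRITE_ON/OFF state */
	uint32_t dev_reg1;	/* models one PCI config register */
};

/* Assumption: the mock refuses config writes while the window is closed. */
static int demo_pci_write32(struct demo_hw *hw, uint32_t val)
{
	if (!hw->cfg_write_on)
		return -1;
	hw->dev_reg1 = val;
	return 0;
}

/* Open the window, do the read-modify-write, and always close it again. */
static int demo_set_bits(struct demo_hw *hw, uint32_t bits)
{
	int err;

	hw->cfg_write_on = 1;
	err = demo_pci_write32(hw, hw->dev_reg1 | bits);
	hw->cfg_write_on = 0;
	return err;
}
```

Keeping the enable and disable in the same function, as the patch does in sky2_phy_power() and sky2_hw_intr(), makes it harder to leave config access enabled by accident.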
^ permalink raw reply
* Re: occasionally corrupted network stats in /proc/net/dev
From: Ben Greear @ 2008-01-14 18:08 UTC (permalink / raw)
To: Mark Seger; +Cc: netdev
In-Reply-To: <478B99E6.2050800@hp.com>
Mark Seger wrote:
> I had posted the following on linux-net and haven't see any responses
> possibly because nobody had any or that list is obsolete. I have been
> told this is the current list for everything networking on linux so I
> thought I'd try again...
Do you see this with multiple network drivers, or just with one
particular driver. If so, which one?
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
From: Valdis.Kletnieks @ 2008-01-14 18:05 UTC (permalink / raw)
To: Paul Moore; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <200801141136.41140.paul.moore@hp.com>
[-- Attachment #1: Type: text/plain, Size: 812 bytes --]
On Mon, 14 Jan 2008 11:36:40 EST, Paul Moore said:
> Are you still only seeing these problems on loopback? I can't help but wonder
> if this is the skb_clone() problem where it wasn't copying skb->iif causing
> SELinux to silently drop the packets.
Yes, I've only spotted it on loopback. The odd part is that I had reverted the
one commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 "Convert the netif code to
use ifindex values" - so either I managed to get the revert terribly wrong,
or there's something else odd going on. The first time around, I was seeing
hangs during a TCP 3-packet handshake - this time data flows for some number
of packets before hanging.
I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the
moment, and seeing if there's already a fix in there for this.
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply
* Re: [PATCH 0/3] UCC TDM driver for MPC83xx platforms
From: Kim Phillips @ 2008-01-14 18:00 UTC (permalink / raw)
To: Aggrwal Poonam, Andrew Morton
Cc: sfr, rubini, linux-ppcdev, netdev, linux-kernel, Gala Kumar,
Barkowski Michael, Kalra Ashish, Cutler Richard
In-Reply-To: <FBA61160C48B8D438F3323FEFB4EF2C26E7653@zin33exm24.fsl.freescale.net>
On Thu, 10 Jan 2008 21:41:20 -0700
"Aggrwal Poonam" <Poonam.Aggrwal@freescale.com> wrote:
> Hello All
>
> I am waiting for more feedback on the patches.
>
> If there are no objections please consider them for 2.6.25.
>
If this isn't going to go through Alessandro Rubini/misc drivers, can
it go through the akpm/mm tree?
Kim
^ permalink raw reply
* Re: [PATCH 9/9] fix sparse warnings
From: Robert Olsson @ 2008-01-14 17:59 UTC (permalink / raw)
To: Eric Dumazet
Cc: Robert Olsson, Stephen Hemminger, David Miller, Robert Olsson,
netdev
In-Reply-To: <478B9D0B.5040301@cosmosbay.com>
Eric Dumazet writes:
> > Thats 231173/241649 = 96% with the current Internet routing.
> >
> > How about if would have a fastpath and store one entry direct in the
> > leaf struct this to avoid loading the leaf_info list in most cases?
> >
> > One could believe that both lookup and dump could improve.
> >
> You mean to include one "leaf_info" inside leaf structure, so that we
> can access it without cache line miss ?
Yes.
Cheers
--ro
^ permalink raw reply
* Re: [patch 5/7] netxen: fix race in interrupt / napi
From: Dhananjay Phadke @ 2008-01-14 18:00 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev
In-Reply-To: <4789414B.40800@garzik.org>
Ok, I will respin the failed patches.
Thanks,
-Dhananjay
On Sat, 12 Jan 2008, Jeff Garzik wrote:
> patch conflicted with
>
> commit 1706287f6eb58726a9a0e5cbbde87f49757615e3
> Author: David S. Miller <davem@davemloft.net>
> Date: Mon Jan 7 20:51:29 2008 -0800
>
> [NETXEN]: Fix ->poll() done logic.
>
> If work_done >= budget we should always elide the NAPI
> completion.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
^ permalink raw reply
* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Rick Jones @ 2008-01-14 17:46 UTC (permalink / raw)
To: Zhang, Yanmin; +Cc: LKML, netdev
In-Reply-To: <1200280292.3151.24.camel@ymzhang>
>>*) netperf/netserver support CPU affinity within themselves with the
>>global -T option to netperf. Is the result with taskset much different?
>> The equivalent to the above would be to run netperf with:
>>
>>./netperf -T 0,7 ..
>
> I checked the source codes and didn't find this option.
> I use netperf V2.3 (I found the number in the makefile).
Indeed, that version pre-dates the -T option. If you weren't already
chasing a regression I'd suggest an upgrade to 2.4.mumble. Once you are
at a point where changing another variable won't muddle things you may
want to consider upgrading.
happy benchmarking,
rick jones
^ permalink raw reply
* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Stephen Hemminger @ 2008-01-14 17:19 UTC (permalink / raw)
To: Andi Kleen; +Cc: netdev, linux-kerne
In-Reply-To: <200801131634.17677.ak@suse.de>
On Sun, 13 Jan 2008 16:34:17 +0100
Andi Kleen <ak@suse.de> wrote:
>
> > I think it should be in netdev_unregister_kobject(). But that would
> > only get rid of one of the two calls to synchronize_rcu() in the unregister_netdev.
>
> Would be already an improvement.
>
> > The other synchronize_rcu() is for qdisc's and not sure if that one can
> > be removed?
>
> The standard way to remove such calls is to set a "deleted" flag in the object,
> then check and ignore such objects in the reader and finally remove the object with
> call_rcu
>
> I have not checked if that is really feasible for qdiscs.
>
> -Andi
Actually, the synchronize_rcu() is now acting as a barrier between two
sections in the current unregister process. It can't be removed.
But an alternative unregister_and_free_netdev() could be created that uses
call_rcu. Basically:
void unregister_and_free_netdev() {
	do stuff before barrier...
	setup rcu callback
	call_rcu();
}
static void netdev_after_rcu() {
	rtnl_lock();
	do stuff after barrier
	rtnl_unlock();
	free_netdev
}
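The split into a "before" half and a deferred "after" half can be modelled in userspace as follows. This is a sketch under stated assumptions, not kernel code: demo_call_rcu() and demo_run_callbacks() are mocks that merely queue and later run callbacks, and struct demo_netdev is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

static struct demo_rcu_head *demo_pending;
static int demo_freed;

/* Mock of call_rcu(): queue the callback instead of blocking the caller. */
static void demo_call_rcu(struct demo_rcu_head *head,
			  void (*func)(struct demo_rcu_head *head))
{
	head->func = func;
	head->next = demo_pending;
	demo_pending = head;
}

/* Stand-in for "the grace period has ended": run all queued callbacks. */
static int demo_run_callbacks(void)
{
	int n = 0;

	while (demo_pending != NULL) {
		struct demo_rcu_head *head = demo_pending;

		demo_pending = head->next;	/* unlink before the callback frees it */
		head->func(head);
		n++;
	}
	return n;
}

struct demo_netdev {
	int registered;
	struct demo_rcu_head rcu;
};

/* Second half of the unregister: runs only after the barrier. */
static void demo_netdev_after_rcu(struct demo_rcu_head *head)
{
	struct demo_netdev *dev = (struct demo_netdev *)
		((char *)head - offsetof(struct demo_netdev, rcu));

	free(dev);
	demo_freed++;
}

/* First half: do the pre-barrier work, then return without waiting. */
static void demo_unregister_and_free_netdev(struct demo_netdev *dev)
{
	dev->registered = 0;
	demo_call_rcu(&dev->rcu, demo_netdev_after_rcu);
}
```

One caveat for a real implementation: kernel RCU callbacks run in a context that cannot sleep, so the rtnl-protected part would probably have to be handed on to process context (for example a workqueue) rather than taking the lock directly in the callback.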
--
Stephen Hemminger <stephen.hemminger@vyatta.com>
^ permalink raw reply
* occasionally corrupted network stats in /proc/net/dev
From: Mark Seger @ 2008-01-14 17:20 UTC (permalink / raw)
To: netdev
I had posted the following on linux-net and haven't seen any responses,
possibly because nobody had any or because that list is obsolete. I have
been told this is the current list for everything networking on Linux, so
I thought I'd try again...
I suspect the answer will be that it is what it is, but here's the
deal. I have a tool I use for monitoring network traffic, among other
things - see http://collectl.sourceforge.net/ - and one of its benefits
is that you can run it continuously as a daemon (similar to sar) and
generate data in a format suitable for plotting. This means that you
can automate your entire network monitoring infrastructure at fairly
fine granularity, down to one second if you like. Actually, 1-second
monitoring will provide incorrect data on earlier kernels because the
stats aren't updated on 1-second boundaries and you need to monitor at
an interval of 0.9765 seconds, but that's a different story, which is
explained at http://collectl.sourceforge.net/NetworkStats.html
But more importantly, I've found that occasionally (not that often)
bogus data is reported from /proc/net/dev. While I don't have a lot of
details on this, it seems to show up only in 64-bit kernels. Look at the
following samples taken at 1-second intervals:
eth0:135115809 1024897 0 0 0 0 0 9 135458926 910340 0 0 0 0 0 0
eth0:135118023 1024923 0 0 0 0 0 9 135460952 910363 0 0 0 0 0 0
eth0: 0 884620 0 0 0 0 0 909397 9687563 1049736 0 0 0 0 0 0
eth0:135121189 1024957 0 0 0 0 0 9 135464222 910400 0 0 0 0 0 0
eth0:135129565 1024995 0 0 0 0 0 9 135473687 910435 0 0 0 0 0 0
See the middle sample? When I look at the change between samples, it
generates a really big number, since the difference is assumed to be
caused by a counter wrapping. The problem is that it's not always
straightforward to tell when there is bad data. For example, if the
original and bogus values are close enough, it's not even clear there is
a problem.
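A monitoring tool can at least refuse the obviously impossible deltas. The following is a minimal sketch of that idea, assuming 64-bit counters and a caller-chosen plausibility bound; struct demo_counter and demo_counter_update() are hypothetical names and do not describe how collectl or the kernel behaves.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_counter {
	uint64_t last_good;	/* last sample that was accepted */
	uint64_t max_step;	/* largest believable increase per interval */
};

/*
 * Accept a sample only if the increase since the last good sample is
 * plausible. Unsigned subtraction is modulo 2^64, so a single genuine
 * wrap still yields the right delta. A rejected sample leaves last_good
 * untouched, so the next good sample is measured against the old baseline.
 */
static bool demo_counter_update(struct demo_counter *c, uint64_t sample,
				uint64_t *delta)
{
	uint64_t d = sample - c->last_good;

	if (d > c->max_step)
		return false;
	c->last_good = sample;
	*delta = d;
	return true;
}
```

Fed the received-byte column from the samples above with a bound of 125000000 (one second at 1 Gbit/s, an assumed link speed), the zero sample is rejected and the following sample is compared against 135118023. As written, a counter that genuinely resets would be rejected forever, so a real tool would need some resynchronisation rule, and a bogus value close to the true one would still pass, as noted above.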
So the obvious question is: is there any way to prevent the bogus data
from getting reported? If not, is there any way to set the values to
something that indicates the correct values can't be determined?
Clearly this problem would be visible to any tool that looks at /proc,
but since many tools are not automated or don't take it to the level I
do, probably nobody notices. As for the counter update frequency, even
though the counters now appear to be updated closer to a 1-second
boundary, it also means tools that can monitor at sub-second intervals
will report incorrect data, since the counters only change once a second.
-mark
^ permalink raw reply
* Re: questions on NAPI processing latency and dropped network packets
From: Eric Dumazet @ 2008-01-14 16:56 UTC (permalink / raw)
To: Chris Friesen; +Cc: Ray Lee, netdev, linux-kernel
In-Reply-To: <478B8473.6080506@nortel.com>
Chris Friesen a écrit :
> Ray Lee wrote:
>> On Jan 10, 2008 9:24 AM, Chris Friesen <cfriesen@nortel.com> wrote:
>
>>> After a recent userspace app change, we've started seeing packets being
>>> dropped by the ethernet hardware (e1000, NAPI is enabled). The
>>> error/dropped/fifo counts are going up in ethtool:
>
>> Can you reproduce it with a simple userspace cpu hog? (Two, really,
>> one per cpu.)
>> Can you reproduce it with the newer e1000?
>
> Hmm...good questions and I haven't checked either. The first one is
> relatively straightforward. The second is a bit trickier...last time
> I tried the latest e1000 driver the card wouldn't boot (we use netboot).
>
>> Can you reproduce it with git head?
>
> Unfortunately, I don't think I'll be able to try this. We require
> kernel mods for our userspace to run, and I doubt I'd be able to get
> the time to port all the changes forward to git head.
>
>> If the answer to the first one is yes, the last no, then bisect until
>> you get a kernel that doesn't show the problem. Backport the fix,
>> unless the fix happens to be CFS. However, I suspect that your
>> userpace app is just starving the system from time to time.
>
> It's conceivable that userspace is starving the kernel, but we have do
> about 45% idle on one cpu, and 7-10% idle on the other.
>
> We also have an odd situation where on an initial test run after
> bootup we have 18-24% idle on cpu1, but resetting the test tool drops
> that to the 7-10% I mentioned above.
>
> Based on profiling and instrumentation it seems like the cost of
> sctp_endpoint_lookup_assoc() more than triples, which means that the
> amount of time that bottom halves are disabled in that function also
> triples.
Any idea of the size of the SCTP hash tables you have?
(Your dmesg probably includes a message starting with "SCTP: Hash tables
configured...")
How many concurrent SCTP sockets are handled?
Maybe sctp_assoc_hashfn() is too weak for your use, and some chains are
*really* long.
^ permalink raw reply
* Re: [PATCH 2/9] get rid of unused revision element
From: Stephen Hemminger @ 2008-01-14 16:35 UTC (permalink / raw)
To: David Miller; +Cc: Robert.Olsson, robert.olsson, netdev, stephen.hemminger
In-Reply-To: <20080114.040657.117757802.davem@davemloft.net>
On Mon, 14 Jan 2008 04:06:57 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Robert Olsson <Robert.Olsson@data.slu.se>
> Date: Mon, 14 Jan 2008 12:44:32 +0100
>
> > The idea was to have a selective flush of route cache entries when
> > a fib insert/delete happened. From what I remember you added another/
> > better solution. Just a list with route cache entries pointing to parent
> > route. So yes this was obsoleted by your/our effort to avoid total
> > flushing of the route cache. Unfinished work.
>
> Yes, that's right. The synchronization was very hard.
>
> But there is another issue, see below....
>
> > According to http://bgpupdates.potaroo.net/instability/bgpupd.html
> > (last in page) we currently flush the route cache 2.80 times per second
> > when using full Internet routing with Linux. Maybe we're forced to pick
> > up this thread again someday.
>
> This proves we need to solve this problem.
>
> The reason I've never gone back to that work is that I didn't
> want to do it while we still had multiple FIB data structure
> implementations.
>
> Someone needs to go over whatever deficiencies exist in fib_trie
> vs. fib_hash so that we can delete fib_hash and move over to using
> fib_trie always. It makes no sense to implement everything
> interfacing into that code twice.
>
> There was a full consensus that this was the way to move forward,
> we just need the dirty work to be done.
>
> If someone wants to show their gratitude for my getting rid of
> the multipath cached routing code, the above work would be a
> great way to do so (hint hint) :-)
I will be glad to get this working. Is there any point in doing a
small-systems version as well?
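
For readers following the quoted discussion, the "list with route cache entries pointing to parent route" idea can be sketched roughly as below. This is only an illustration in plain userspace C: the struct and function names are hypothetical and are not the kernel's, and it deliberately ignores the synchronization that the quoted text says was the hard part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; these are not the kernel's structures. */
struct cache_entry {
	struct cache_entry *next_sibling;	/* next entry derived from the same route */
	int valid;				/* cleared when the parent route changes */
};

struct route {
	struct cache_entry *children;		/* cache entries derived from this route */
};

/* Link a new cache entry to the route it was derived from. */
static void cache_attach(struct route *r, struct cache_entry *e)
{
	assert(r && e);
	e->valid = 1;
	e->next_sibling = r->children;
	r->children = e;
}

/*
 * Selective flush: invalidate only the cache entries hanging off
 * route r and leave every other route's entries untouched.
 * Returns the number of entries invalidated.
 * No locking here; a real implementation would need it.
 */
static unsigned route_flush(struct route *r)
{
	struct cache_entry *e;
	unsigned n = 0;

	assert(r);
	for (e = r->children; e; e = e->next_sibling) {
		e->valid = 0;
		n++;
	}
	r->children = NULL;
	return n;
}
```

The point of the design is that an insert or delete touching one route costs work proportional to that route's cached entries, not to the size of the whole cache.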
--
Stephen Hemminger <stephen.hemminger@vyatta.com>
* Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
From: Stephen Hemminger @ 2008-01-14 16:39 UTC (permalink / raw)
To: Andrew Morton
Cc: supersud501, Rafael J. Wysocki, netdev, linux-acpi, bugme-daemon
In-Reply-To: <20080113112712.e93f07a4.akpm@linux-foundation.org>
On Sun, 13 Jan 2008 11:27:12 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 13 Jan 2008 16:08:38 +0100 supersud501 <supersud501@yahoo.de> wrote:
>
> >
> >
> > supersud501 wrote:
> > >
> > >
> > > Rafael J. Wysocki wrote:
> > >
> > >>
> > >> Since it seems to be 100% reproducible, it would be very helpful if
> > >> you could
> > >> use git-bisect to identify the offending commit.
> > >>
> > >
> > > all right, bisect found the offending commit, here's what i've done:
> > >
> > > first i started bisect with the following command (since i assumed it is
> > > a net-driver problem):
> > >
> > > git-bisect start 'v2.6.24-rc6' 'v2.6.23' '--' 'drivers/net/'
> > >
> > > after building many kernels and saying good/bad if wol worked/didn't
> > > work etc. it identified the following commit:
> > >
> > > # bad: [ac93a3946b676025fa55356180e8321639744b31] sky2: enable PCI
> > > config writes
> > >
> > > and refs/bisect/bad gives:
> > >
> > > 14:16:53 /usr/src/linux-2.6/.git # cat refs/bisect/bad
> > > ac93a3946b676025fa55356180e8321639744b31
> > >
> > >
> > > need some more info?
> > >
> >
> > i just checked it: commented out the passage of the commit in kernel
> > 2.6.24-rc7-git4 and compiled it: wol WORKS. so this one line is causing
> > my wol-disturbance...
> >
> >
>
> So simply reverting this:
>
> commit ac93a3946b676025fa55356180e8321639744b31
> Author: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Mon Nov 5 15:52:08 2007 -0800
>
> sky2: enable PCI config writes
>
> On some boards, PCI configuration space access is turned off by default.
> The 2.6.24 driver doesn't turn it on, and should have.
>
> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
> Signed-off-by: Jeff Garzik <jeff@garzik.org>
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index c27c7d6..4f41a94 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2791,6 +2791,9 @@ static void sky2_reset(struct sky2_hw *hw)
> sky2_write8(hw, B0_CTST, CS_RST_SET);
> sky2_write8(hw, B0_CTST, CS_RST_CLR);
>
> + /* allow writes to PCI config */
> + sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
> +
> /* clear PCI errors, if any */
> pci_read_config_word(pdev, PCI_STATUS, &status);
> status |= PCI_STATUS_ERROR_BITS;
>
> fixes this regression?
>
> If so, we should revert that change.
>
> > but i noticed another "bug" on 2.6.24-rc7-git with sky2: dmesg shows a
> > lot of lines every 5 seconds:
> >
> > [...]
> > [ 357.400462] sky2 0000:02:00.0: error interrupt status=0xc0000000
> > [ 362.442039] printk: 41 messages suppressed.
> > [ 362.442043] sky2 0000:02:00.0: error interrupt status=0x80000000
> > [ 367.439151] printk: 18 messages suppressed.
> > [ 367.439156] sky2 0000:02:00.0: error interrupt status=0x80000000
> > [ 372.436267] printk: 30 messages suppressed.
> > [ 372.436271] sky2 0000:02:00.0: error interrupt status=0x80000000
> > [ 377.350236] printk: 19 messages suppressed.
> > [...]
> >
> > since i do not notice any errors (yet) i'll wait till next rc, maybe it
> > will be gone then...
>
> That's not good. Is this new behaviour?
>
No, reverting that change will break other systems (including mine).
--
Stephen Hemminger <stephen.hemminger@vyatta.com>