* Re: RCU latency regression in 2.6.16-rc1
From: Lee Revell @ 2006-01-25 23:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: dipankar, Paul E. McKenney, linux-kernel, Linus Torvalds, NetDev
In-Reply-To: <20060125225639.GA1382@elte.hu>
On Wed, 2006-01-25 at 23:56 +0100, Ingo Molnar wrote:
>
> yes, that would be a nice test. (I'm busy now with mutex stuff to be
> able to do a working softirq-preemption patch, but i sent you my
> current patches off-list - if you want to give it a shot. Be warned
> though, there will likely be quite some merging work to do, so it's
> definitely not for the faint hearted.)
>
OK, I probably won't have time to test it this week either.
In the meantime can anyone explain briefly why such a heavy fix is
needed? It seems like it would be simpler to make the route cache
flushing operate in batches of 100 routes, rather than invalidating the
whole thing in one shot. This does seem to be the only softirq that
regularly runs for much more than 1ms.
Would this require major surgery on the networking subsystem?
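The batching idea can be sketched in plain C. This is a user-space illustration only, not the kernel's route cache code: the `struct cache` layout, the `flush_batch` helper and the table size are hypothetical, and the batch size of 100 simply mirrors the number suggested above.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 1000   /* hypothetical cache size */
#define BATCH_SIZE 100    /* batch size suggested in the text */

struct cache {
	int valid[TABLE_SIZE];  /* 1 = entry in use, 0 = flushed */
	size_t next;            /* where the next batch resumes */
};

/*
 * Invalidate at most BATCH_SIZE entries, starting where the previous
 * call stopped. Returns the number of entries still left to visit, so
 * the caller can reschedule itself instead of holding the CPU.
 */
static size_t flush_batch(struct cache *c)
{
	size_t end = c->next + BATCH_SIZE;

	if (end > TABLE_SIZE)
		end = TABLE_SIZE;
	for (; c->next < end; c->next++)
		c->valid[c->next] = 0;
	return TABLE_SIZE - c->next;
}
```

A caller would invoke `flush_batch` repeatedly, yielding between calls, until it returns 0. The cost of this approach is that stale entries may stay visible for longer than with a one-shot flush.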
Lee
* Re: e100 oops on resume
From: Jesse Brandeburg @ 2006-01-25 22:28 UTC (permalink / raw)
To: Olaf Kirch; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev
In-Reply-To: <20060125201450.GA15102@suse.de>
On 1/25/06, Olaf Kirch <okir@suse.de> wrote:
> On Wed, Jan 25, 2006 at 11:37:40AM -0800, Jesse Brandeburg wrote:
> > It's an interesting patch, but it raises the question: why does
> > e100_hw_init need to be called at all in resume? I looked back
> > through our history and that hw_init call has always been there. I
> > think it's incorrect, but it's taking me a while to set up a system with
> > the ability to resume.
>
> I'll ask the folks here to give it a try tomorrow. But I suspect at
> least some of it will be needed. For instance I assume you'll
> have to reload the ucode when bringing the NIC back from sleep.
I totally agree that's what it looks like, but unless I'm missing
something e100_up will take care of everything, and if the interface
is not up, e100_open->e100_up afterward will take care of it.
We have to be really careful about what might happen when resuming on
a system with an SMBus link to a BMC, as there are some tricky
transitions in the hardware that can be easily violated.
Jesse
* Re: [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
From: Al Boldi @ 2006-01-25 22:04 UTC (permalink / raw)
To: Ed L. Cashin; +Cc: linux-kernel, linux-raid, netdev
Ed L. Cashin wrote:
> This patch is a bugfix that follows and depends on the
> eight aoe driver patches sent January 19th.
Will they also fix this?
Or is this an md bug?
It only happens with aoe.
Also, why is aoe slower than nbd?
md: bind<etherd/e0.0>
------------[ cut here ]------------
kernel BUG at fs/sysfs/symlink.c:87!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c0188166>] Not tainted VLI
EFLAGS: 00210246 (2.6.15)
EIP is at sysfs_create_link+0x56/0x60
eax: c66de390 ebx: 00000000 ecx: c03db91f edx: c7ee0040
esi: c211bdf8 edi: c7ca0400 ebp: c66de360 esp: c211bdb4
ds: 007b es: 007b ss: 0068
Process mkraid (pid: 701, threadinfo=c211b000 task=c2300600)
Stack: c7ca0424 c66de390 c211bdf8 c66de390 c02e5997 c66de390 c6b1b5ec c03db91f
00200296 c0207d56 c66de3a8 c66de360 c02e650f c66de390 09800000 5c4725a7
98831dc4 65687465 652f6472 00302e30 3feed8a3 891a1652 7f3dc64e ab9a9a72
Call Trace:
[<c02e5997>] bind_rdev_to_array+0x157/0x1a0
[<c0207d56>] kobject_init+0x16/0x50
[<c02e650f>] md_import_device+0xbf/0x1c0
[<c02e80ad>] add_new_disk+0x22d/0x390
[<c024403f>] get_random_bytes+0x2f/0x40
[<c020be9e>] copy_from_user+0x4e/0x90
[<c02e8ef8>] md_ioctl+0x2e8/0x710
[<c01fdb46>] blkdev_driver_ioctl+0x56/0x70
[<c01fdbf3>] blkdev_ioctl+0x93/0x1a0
[<c015a83b>] block_ioctl+0x2b/0x30
[<c01641ce>] do_ioctl+0x6e/0x80
[<c016435a>] vfs_ioctl+0x6a/0x1e0
[<c0164515>] sys_ioctl+0x45/0x70
[<c0103009>] syscall_call+0x7/0xb
Code: 4c 24 04 8b 44 24 18 89 1c 24 89 44 24 08 e8 f2 fe ff ff 8b 53 08 89 c1
ff 42 70 0f 8e 0b 02 00 00 8b 5c 24 0c 89 c8 83 c4 10 c3 <0f> 0b 57 00 5e a6
3d c0 eb be 8b 44 24 04 8b 40 30 89 44 24 04
* Re: e100 oops on resume
From: Olaf Kirch @ 2006-01-25 20:14 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev
In-Reply-To: <4807377b0601251137r7621216byc47b03a3c634557c@mail.gmail.com>
On Wed, Jan 25, 2006 at 11:37:40AM -0800, Jesse Brandeburg wrote:
> It's an interesting patch, but it raises the question: why does
> e100_hw_init need to be called at all in resume? I looked back
> through our history and that hw_init call has always been there. I
> think it's incorrect, but it's taking me a while to set up a system with
> the ability to resume.
I'll ask the folks here to give it a try tomorrow. But I suspect at
least some of it will be needed. For instance I assume you'll
have to reload the ucode when bringing the NIC back from sleep.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
* Re: e100 oops on resume
From: Jesse Brandeburg @ 2006-01-25 19:37 UTC (permalink / raw)
To: Olaf Kirch; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev
In-Reply-To: <20060125121125.GH5465@suse.de>
[-- Attachment #1: Type: text/plain, Size: 966 bytes --]
On 1/25/06, Olaf Kirch <okir@suse.de> wrote:
> On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> > I'm not sure what the right fix would be. e100_resume would probably
> > have to call e100_alloc_cbs early on, while e100_up should avoid
> > calling it a second time if nic->cbs_avail != 0. A tentative patch
> > for testing is attached.
>
> Reportedly, the patch fixes the crash on resume.
Cool, thanks for the research. I have a concern about this, however.
It's an interesting patch, but it raises the question: why does
e100_hw_init need to be called at all in resume? I looked back
through our history and that hw_init call has always been there. I
think it's incorrect, but it's taking me a while to set up a system with
the ability to resume.
Everywhere else in the driver alloc_cbs is called before hw_init, so it
just seems like a long-standing bug.
Comments? Anyone want to test? I compile-tested this, but it is otherwise untested.
[-- Attachment #2: e100_resume_no_init.diff --]
[-- Type: application/octet-stream, Size: 818 bytes --]
e100: remove hw_init call to fix panic
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
e100 seems to have had a long-standing bug where e100_hw_init was being
called when it should not have been. This caused a panic due to recent
changes that rely on correct setup in the driver, and more robust error
paths.
---
drivers/net/e100.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/drivers/net/e100.c b/drivers/net/e100.c
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2752,8 +2752,6 @@ static int e100_resume(struct pci_dev *p
retval = pci_enable_wake(pdev, 0, 0);
if (retval)
DPRINTK(PROBE,ERR, "Error clearing wake events\n");
- if(e100_hw_init(nic))
- DPRINTK(HW, ERR, "e100_hw_init failed\n");
netif_device_attach(netdev);
if(netif_running(netdev))
* Re: [softmac-dev] [PATCH] ieee80211_rx_any: filter out packets, call ieee80211_rx or ieee80211_rx_mgt
From: Stuffed Crust @ 2006-01-25 15:44 UTC (permalink / raw)
To: Johannes Berg
Cc: Denis Vlasenko, John W. Linville, jbenc, netdev, softmac-dev,
linux-kernel, bcm43xx-dev
In-Reply-To: <1138026752.3957.98.camel@localhost>
[-- Attachment #1: Type: text/plain, Size: 764 bytes --]
On Mon, Jan 23, 2006 at 03:32:32PM +0100, Johannes Berg wrote:
> Shouldn't you BSS-filter management packets too?
Filtering on BSSID is necessary for management frames, especially when
multicast management frames are thrown into the mix.
For example, STAs are supposed to respect broadcast disassoc/deauth
messages, but of course should ignore them if they're not destined for
the local BSSID.
The only extra-BSS management frames that should not be dropped are
beacons and probe responses. That said, probe responses are directed so
our A1 (RA) filter will probably drop the frame if it is not destined
for us.
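The filtering rule described above can be sketched as a small predicate. This is a hedged user-space illustration, not the ieee80211 stack's code: `enum mgmt_subtype` and `accept_mgmt_frame` are hypothetical names, and the separate A1 (RA) filter mentioned above is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical subset of management frame subtypes, for illustration. */
enum mgmt_subtype {
	MGMT_BEACON,
	MGMT_PROBE_RESP,
	MGMT_DISASSOC,
	MGMT_DEAUTH,
};

/*
 * Decide whether a management frame should be passed up.
 * Frames from our own BSS are always kept. Frames from other BSSs are
 * kept only if they are beacons or probe responses.
 */
static bool accept_mgmt_frame(enum mgmt_subtype subtype,
			      const unsigned char frame_bssid[ETH_ALEN],
			      const unsigned char our_bssid[ETH_ALEN])
{
	if (memcmp(frame_bssid, our_bssid, ETH_ALEN) == 0)
		return true;
	return subtype == MGMT_BEACON || subtype == MGMT_PROBE_RESP;
}
```

With this predicate, a broadcast deauth from a foreign BSSID is dropped, while a beacon from the same foreign BSSID still reaches the scanning code.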
- Solomon
--
Solomon Peachy ICQ: 1318344
Melbourne, FL
Quidquid latine dictum sit, altum viditur.
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: e100 oops on resume
From: Olaf Kirch @ 2006-01-25 12:11 UTC (permalink / raw)
To: Stefan Seyfried, Linux Kernel Mailing List, netdev
In-Reply-To: <20060125090240.GA12651@suse.de>
On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> I'm not sure what the right fix would be. e100_resume would probably
> have to call e100_alloc_cbs early on, while e100_up should avoid
> calling it a second time if nic->cbs_avail != 0. A tentative patch
> for testing is attached.
Reportedly, the patch fixes the crash on resume.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
* Re: Fw: [Bugme-new] [Bug 5936] New: Openswan tunnels + netfilter problem
From: Herbert Xu @ 2006-01-25 9:57 UTC (permalink / raw)
To: Patrick McHardy; +Cc: akpm, netdev, netfilter-devel, webmaster
In-Reply-To: <43D74407.5040705@trash.net>
On Wed, Jan 25, 2006 at 10:25:27AM +0100, Patrick McHardy wrote:
>
> I don't like adding this special behaviour for NAT, people need
> to adjust their rulesets for filtering etc. anyway. We could stop
> rerouting packets in between transforms (when both dst->xfrm and
> IPSKB_XFRM_TRANSFORMED are set), but this is inconsistent with what
> happens on input, when a packet is DNATed in PRE_ROUTING it does
Actually we can never achieve perfect symmetry because the two cases
are fundamentally different. On outbound we start with a template
which guides us all the way to the end. On inbound we (currently)
don't determine the policy until the very end.
> affect the SA lookup. So I think I'd prefer handling this case in
> xfrm[46]_output_finish, but I need to think about it a bit more.
Having said that I'm certainly not averse to such a solution. The
only thing I would like to see is for it to be flexible enough so
that you always get at least one chance to SNAT before the xfrm_policy
is completely pinned down. This should leave the user with enough
flexibility to do whatever they wish.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Fw: [Bugme-new] [Bug 5936] New: Openswan tunnels + netfilter problem
From: Patrick McHardy @ 2006-01-25 9:25 UTC (permalink / raw)
To: Herbert Xu; +Cc: akpm, netdev, netfilter-devel, webmaster
In-Reply-To: <E1F1IYh-0001I3-00@gondolin.me.apana.org.au>
Herbert Xu wrote:
> Patrick McHardy <kaber@trash.net> wrote:
>
>>Andrew Morton wrote:
>>
>>>http://bugzilla.kernel.org/show_bug.cgi?id=5936
>>
>>Please post your iptables rules and the full list of loaded modules.
>
>
> The problem is caused by SNAT on a dst that already has an xfrm set.
> When ip_route_me_harder processes the dst it will cause the dst to
> lose its xfrm since it has IPSKB_XFRM_TRANSFORMED set.
>
> Since xfrm4_output_finish does not expect dst's to lose their xfrm's
> after POST_ROUTING, it crashes.
>
> Obviously we could add a check in xfrm4_output_finish to prevent this
> crash, however, I think we need to consider this a bit more since it
> breaks a fairly common setup where people just stick a rule into the
> NAT table that says
>
> iptables -t nat -I POSTROUTING -o eth1 -j MASQUERADE
>
> where eth1 is the outbound interface. If this rule catches any IPsec
> VPN traffic then it'll SNAT them even though the intention is obviously
> to let them through without SNAT.
>
> Perhaps it's best to have SNAT not touch packets with dst->xfrm set.
> Unfortunately that leads to problems as well (albeit rarer) since you
> may have catch-all IPsec policies that every packet matches, but you
> want certain packets to be SNATed so that they match more specific
> policies.
I don't like adding this special behaviour for NAT, people need
to adjust their rulesets for filtering etc. anyway. We could stop
rerouting packets in between transforms (when both dst->xfrm and
IPSKB_XFRM_TRANSFORMED are set), but this is inconsistent with what
happens on input, when a packet is DNATed in PRE_ROUTING it does
affect the SA lookup. So I think I'd prefer handling this case in
xfrm[46]_output_finish, but I need to think about it a bit more.
* Re: e100 oops on resume
From: Olaf Kirch @ 2006-01-25 9:02 UTC (permalink / raw)
To: Stefan Seyfried, Linux Kernel Mailing List, netdev
In-Reply-To: <20060124232142.GB6174@inferi.kami.home>
[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]
On Wed, Jan 25, 2006 at 12:21:42AM +0100, Mattia Dongili wrote:
> I experienced the same today, I was planning to get a photo tomorrow :)
> I'm running 2.6.16-rc1-mm2 and the last working kernel was 2.6.15-mm4
> (didn't try .16-rc1-mm1 being scared of the reiserfs breakage).
I think that's because the latest driver version wants to wait for
the ucode download, and calls e100_exec_cb_wait before allocating any
control blocks.
static inline int e100_exec_cb_wait(struct nic *nic, struct sk_buff *skb,
void (*cb_prepare)(struct nic *, struct cb *, struct sk_buff *))
{
int err = 0, counter = 50;
struct cb *cb = nic->cb_to_clean;
if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
DPRINTK(PROBE,ERR, "ucode cmd failed with error %d\n", err);
/* NOTE: the oops shows that e100_exec_cb fails with ENOMEM,
* which also means there are no cbs */
/* ... other stuff...
* and then we die here because cb is NULL: */
while (!(cb->status & cpu_to_le16(cb_complete))) {
msleep(10);
if (!--counter) break;
}
I'm not sure what the right fix would be. e100_resume would probably
have to call e100_alloc_cbs early on, while e100_up should avoid
calling it a second time if nic->cbs_avail != 0. A tentative patch
for testing is attached.
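The failure mode can be reproduced in miniature outside the kernel. The following is only a sketch under stated assumptions: `exec_cb`, `exec_cb_wait`, the trimmed-down `struct nic` and `struct cb`, and the `CB_COMPLETE` bit are stand-ins invented for illustration, and the `msleep(10)` between polls is omitted. Unlike the driver excerpt above, the sketch reports a timeout as an error instead of just breaking out of the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CB_COMPLETE 0x8000  /* hypothetical status bit for this sketch */

struct cb { unsigned short status; };

struct nic {
	struct cb *cb_to_clean;  /* NULL until control blocks are allocated */
};

/* Stand-in for e100_exec_cb(): fails when no control blocks exist. */
static int exec_cb(struct nic *nic)
{
	return nic->cb_to_clean ? 0 : -ENOMEM;
}

/*
 * Bounded wait for command completion. Returning as soon as exec_cb()
 * fails keeps the polling loop from dereferencing a NULL cb.
 */
static int exec_cb_wait(struct nic *nic)
{
	struct cb *cb = nic->cb_to_clean;
	int counter = 50;
	int err = exec_cb(nic);

	if (err)
		return err;
	while (!(cb->status & CB_COMPLETE)) {
		if (!--counter)
			return -ETIMEDOUT;
	}
	return 0;
}
```

Without the early return, a `nic` whose control blocks were never allocated would reach the `while` loop with `cb == NULL`, which matches the crash described above.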
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: e100-resume-fix --]
[-- Type: text/plain, Size: 1830 bytes --]
[PATCH] e100: allocate cbs early on when resuming
Signed-off-by: Olaf Kirch <okir@suse.de>
drivers/net/e100.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
Index: build/drivers/net/e100.c
===================================================================
--- build.orig/drivers/net/e100.c
+++ build/drivers/net/e100.c
@@ -1298,8 +1298,10 @@ static inline int e100_exec_cb_wait(stru
int err = 0, counter = 50;
struct cb *cb = nic->cb_to_clean;
- if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
+ if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode))) {
DPRINTK(PROBE,ERR, "ucode cmd failed with error %d\n", err);
+ return err;
+ }
/* must restart cuc */
nic->cuc_cmd = cuc_start;
@@ -1721,9 +1723,11 @@ static int e100_alloc_cbs(struct nic *ni
struct cb *cb;
unsigned int i, count = nic->params.cbs.count;
+ /* bail out if we've been here before */
+ if (nic->cbs_avail)
+ return 0;
+
nic->cuc_cmd = cuc_start;
- nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
- nic->cbs_avail = 0;
nic->cbs = pci_alloc_consistent(nic->pdev,
sizeof(struct cb) * count, &nic->cbs_dma_addr);
@@ -2578,6 +2582,8 @@ static int __devinit e100_probe(struct p
nic->pdev = pdev;
nic->msg_enable = (1 << debug) - 1;
pci_set_drvdata(pdev, netdev);
+ nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
+ nic->cbs_avail = 0;
if((err = pci_enable_device(pdev))) {
DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
@@ -2752,6 +2758,8 @@ static int e100_resume(struct pci_dev *p
retval = pci_enable_wake(pdev, 0, 0);
if (retval)
DPRINTK(PROBE,ERR, "Error clearing wake events\n");
+ if ((retval = e100_alloc_cbs(nic)))
+ DPRINTK(PROBE,ERR, "No memory for cbs\n");
if(e100_hw_init(nic))
DPRINTK(HW, ERR, "e100_hw_init failed\n");
* Re: [PATCH] ipw2200: fix ->eeprom[EEPROM_VERSION] check
From: Zhu Yi @ 2006-01-25 4:34 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <20060125004429.GE3234@mipter.zuzino.mipt.ru>
Acked.
Thanks,
-yi
On Wed, 2006-01-25 at 03:44 +0300, Alexey Dobriyan wrote:
> priv->eeprom is a pointer.
>
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
> ---
>
> drivers/net/wireless/ipw2200.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/drivers/net/wireless/ipw2200.c
> +++ b/drivers/net/wireless/ipw2200.c
> @@ -2456,7 +2456,7 @@ static void ipw_eeprom_init_sram(struct
> copy. Otherwise let the firmware know to perform the operation
> on it's own
> */
> - if ((priv->eeprom + EEPROM_VERSION) != 0) {
> + if (priv->eeprom[EEPROM_VERSION] != 0) {
> IPW_DEBUG_INFO("Writing EEPROM data into SRAM\n");
>
> /* write the eeprom data to sram */
>
* [PATCH] ipw2200: fix ->eeprom[EEPROM_VERSION] check
From: Alexey Dobriyan @ 2006-01-25 0:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, netdev
priv->eeprom is a pointer.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
drivers/net/wireless/ipw2200.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/ipw2200.c
+++ b/drivers/net/wireless/ipw2200.c
@@ -2456,7 +2456,7 @@ static void ipw_eeprom_init_sram(struct
copy. Otherwise let the firmware know to perform the operation
on it's own
*/
- if ((priv->eeprom + EEPROM_VERSION) != 0) {
+ if (priv->eeprom[EEPROM_VERSION] != 0) {
IPW_DEBUG_INFO("Writing EEPROM data into SRAM\n");
/* write the eeprom data to sram */
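The difference between the two conditions can be shown in user space. This is a minimal sketch: the helper names are made up, and the value of `EEPROM_VERSION` below is a hypothetical index, not the driver's actual constant.

```c
#include <assert.h>
#include <stdbool.h>

#define EEPROM_VERSION 3  /* hypothetical index, not the driver's value */

/* Buggy form: compares an address, which is never NULL for a valid buffer. */
static bool version_set_buggy(const unsigned char *eeprom)
{
	return (eeprom + EEPROM_VERSION) != 0;
}

/* Fixed form: compares the byte stored at that index. */
static bool version_set_fixed(const unsigned char *eeprom)
{
	return eeprom[EEPROM_VERSION] != 0;
}
```

For an all-zero buffer the buggy form still reports the version as set, so the "write EEPROM data into SRAM" branch would always be taken. Only the fixed form distinguishes a blank EEPROM from a programmed one.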
* Re: e100 oops on resume
From: Mattia Dongili @ 2006-01-24 23:21 UTC (permalink / raw)
To: Stefan Seyfried; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <20060124225919.GC12566@suse.de>
On Tue, Jan 24, 2006 at 11:59:19PM +0100, Stefan Seyfried wrote:
> Hi,
> Since 2.6.16-rc1-git3, e100 dies on resume (regardless of whether from disk, RAM or
> runtime power management). Unfortunately I only have a bad photo of
> the oops right now; it is available from
> https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
> I have reproduced this on a second e100 machine and can get a serial
> console log from this machine tomorrow if needed.
> It did resume fine with 2.6.15-git12
I experienced the same today, I was planning to get a photo tomorrow :)
I'm running 2.6.16-rc1-mm2 and the last working kernel was 2.6.15-mm4
(didn't try .16-rc1-mm1 being scared of the reiserfs breakage).
--
mattia
:wq!
* e100 oops on resume
From: Stefan Seyfried @ 2006-01-24 22:59 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: netdev
Hi,
Since 2.6.16-rc1-git3, e100 dies on resume (regardless of whether from disk, RAM or
runtime power management). Unfortunately I only have a bad photo of
the oops right now; it is available from
https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
I have reproduced this on a second e100 machine and can get a serial
console log from this machine tomorrow if needed.
It did resume fine with 2.6.15-git12
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \ -- Leonard Cohen
* Re: [BUG] sky2 broken for Yukon PCI-E Gigabit Ethernet Controller 11ab:4362 (rev 19)
From: Herbert Xu @ 2006-01-24 20:32 UTC (permalink / raw)
To: Knut Petersen; +Cc: shemminger, netdev, linux-kernel
In-Reply-To: <43D5F6DD.70702@t-online.de>
Knut Petersen <Knut_Petersen@t-online.de> wrote:
>
> "ethtool -K eth0 rx off" does cure my problem with sky2.
>
> Anybody is invited to send patches as the problem is 100% reproducible here.
Does the problem go away if you disable conntrack by unloading its module?
Please try to capture the offending ICMP packet with tcpdump and show us
what it looks like.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [BUG] sky2 broken for Yukon PCI-E Gigabit Ethernet Controller 11ab:4362 (rev 19)
From: Stephen Hemminger @ 2006-01-24 17:54 UTC (permalink / raw)
To: Knut Petersen; +Cc: netdev, linux-kernel
In-Reply-To: <43D5F6DD.70702@t-online.de>
On Tue, 24 Jan 2006 10:43:57 +0100
Knut Petersen <Knut_Petersen@t-online.de> wrote:
> Stephen Hemminger schrieb:
>
> >Could you try turning off rx checksumming (with ethtool).
> > ethtool -K eth0 rx off
> >
> >There probably still are (generic) bugs in the netfilter code for CHECKSUM_HW
> >socket buffers.
> >
> >
> >
> "ethtool -K eth0 rx off" does cure my problem with sky2.
>
> Anybody is invited to send patches as the problem is 100% reproducible here.
>
Does it always show up on icmp only?
What are the iptables rules (iptables -L)
--
Stephen Hemminger <shemminger@osdl.org>
OSDL http://developer.osdl.org/~shemminger
* [PATCH] sky2: fix hang on Yukon-EC (0xb6) rev 1
From: Carl-Daniel Hailfinger @ 2006-01-24 13:19 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Linux Kernel Mailing List, netdev
This patch for sky2 fixes a hang on Yukon-EC (0xb6) rev 1
where suddenly no more interrupts were delivered.
I don't know the real cause of the hang due to lack of docs,
but the patch has been running stable for a few hours
whereas the unmodified driver will hang after less than
2 minutes.
Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
--- linux-2.6.15/drivers/net/sky2.c 2006-01-23 23:41:35.000000000 +0100
+++ linux-2.6.15/drivers/net/sky2.c 2006-01-24 14:12:12.000000000 +0100
@@ -1913,8 +1913,26 @@
}
exit_loop:
+ /* Is this really a good idea?
+ * We clear all IRQs although there may be pending work due to
+ * - packets arrived since start of this function
+ * - the (++work_done >= to_do) abort
+ */
sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
+ /* Pending resolution of the comment above, at least kick the
+ * STAT_LEV_TIMER_CTRL timer.
+ * This fixes my hangs on Yukon-EC (0xb6) rev 1.
+ * The if clause is there to start the timer only if it has been
+ * configured correctly and not been disabled via ethtool.
+ * Maybe it would be sufficient to only restart the timer if
+ * there is pending work. Without docs, that is hard to say.
+ */
+ if (sky2_read8(hw, STAT_LEV_TIMER_CTRL) == TIM_START) {
+ sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_STOP);
+ sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_START);
+ }
+
sky2_tx_check(hw, 0, tx_done[0]);
sky2_tx_check(hw, 1, tx_done[1]);
* [PATCH] sky2: fix ethtool ops
From: Carl-Daniel Hailfinger @ 2006-01-24 12:49 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Linux Kernel Mailing List, netdev
This fixes setting rx_coalesce_usecs_irq via ethtool in sky2.
The write was directed to the wrong register.
Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
--- linux/drivers/net/sky2.c 2006-01-23 23:41:35.000000000 +0100
+++ linux/drivers/net/sky2.c 2006-01-24 12:52:11.000000000 +0100
@@ -2843,7 +2843,7 @@
if (ecmd->rx_coalesce_usecs_irq == 0)
sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
else {
- sky2_write32(hw, STAT_TX_TIMER_INI,
+ sky2_write32(hw, STAT_ISR_TIMER_INI,
sky2_us2clk(hw, ecmd->rx_coalesce_usecs_irq));
sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
}
* Re: [BUG] sky2 broken for Yukon PCI-E Gigabit Ethernet Controller 11ab:4362 (rev 19)
From: Knut Petersen @ 2006-01-24 9:43 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, linux-kernel
In-Reply-To: <20060123112751.2e3f1b15@dxpl.pdx.osdl.net>
Stephen Hemminger schrieb:
>Could you try turning off rx checksumming (with ethtool).
> ethtool -K eth0 rx off
>
>There probably still are (generic) bugs in the netfilter code for CHECKSUM_HW
>socket buffers.
>
>
>
"ethtool -K eth0 rx off" does cure my problem with sky2.
Anybody is invited to send patches as the problem is 100% reproducible here.
cu,
Knut
* Re: [Bcm43xx-dev] Re: [softmac-dev] [PATCH] ieee80211_rx_any: filter out packets, call ieee80211_rx or ieee80211_rx_mgt
From: Denis Vlasenko @ 2006-01-24 8:06 UTC (permalink / raw)
To: bcm43xx-dev
Cc: Johannes Berg, John W. Linville, jbenc, netdev, softmac-dev,
linux-kernel
In-Reply-To: <1138026752.3957.98.camel@localhost>
On Monday 23 January 2006 16:32, Johannes Berg wrote:
> On Sun, 2006-01-22 at 14:04 +0200, Denis Vlasenko wrote:
> > + hdr = (struct ieee80211_hdr_4addr *)skb->data;
> > + fc = le16_to_cpu(hdr->frame_ctl);
> > +
> > + switch (fc & IEEE80211_FCTL_FTYPE) {
> > + case IEEE80211_FTYPE_MGMT:
> > + ieee80211_rx_mgt(ieee, hdr, stats);
> > + return 0;
>
> Shouldn't you BSS-filter management packets too?
>
> > + is_packet_for_us = 0;
> > + switch (ieee->iw_mode) {
> > + case IW_MODE_ADHOC:
> > + /* promisc: get all */
> > + if (ieee->dev->flags & IFF_PROMISC)
> > + is_packet_for_us = 1;
>
> And I still think BSS-filtering is correct even in the promisc case. Any
> other opinions why either way is right or not? [I think we should filter
> because upper layers won't know the packet wasn't for us if it was
> broadcast in another BSSID]
In wired networks promisc literally means "receive all packets", right?
But for wireless, maybe we should filter them out, or else running tcpdump
on the iface will force us to listen to ARP packets from unrelated networks.
That would be rather surprising and disruptive.
--
vda
* RE: My vote against eepro* removal
From: kus Kusche Klaus @ 2006-01-24 7:38 UTC (permalink / raw)
To: Jesse Brandeburg
Cc: Lee Revell, Evgeniy Polyakov, Adrian Bunk, linux-kernel,
Ronciak, John, netdev, Steven Rostedt
From: Jesse Brandeburg
> On Mon, 23 Jan 2006, kus Kusche Klaus wrote:
> > Here are my results:
> >
> > If the watchdog doesn't get interrupted, preempted, or whatever,
> > it spends 340 us in its body:
> > * 303 us in the mii code
> > * 36 us in the following code up to e100_adjust_adaptive_ifs
> > * 1 us in the remaining code (I think my chip doesn't need any
> > of those chip-specific fixups)
> >
> > The 303 us in the mii code are divided in the following way:
> > * 101 us in mii_ethtool_gset
> > * 135 us in the whole if
> > * 67 us in mii_check_link
> >
> > This is with the udelay(2) instead of udelay(20) hack applied.
> > With udelay(20), the mii times are 128 + 170 + 85 us,
> > i.e. 383 us instead of 303 us, or >= 420 us for the whole watchdog.
>
> Thank you very much for that detailed analysis! Okay, so calls to mii.c
> take too long, but those depend on mmio_read in e100 to do the work, so
> this patch attempts to minimize the latency.
>
> This patch is against linus-2.6.git; I compile and ssh/ping tested it.
> Would you be willing to send your instrumentation patches? I could then
> test any fixes more easily.
No deep magic behind my instrumentation:
A few global variables and some rdtscl in the watchdog:
unsigned long my_tsc_1 = 0;
unsigned long my_tsc_2 = 0;
unsigned long my_tsc_3 = 0;
unsigned long my_tsc_4 = 0;
EXPORT_SYMBOL(my_tsc_1);
EXPORT_SYMBOL(my_tsc_2);
EXPORT_SYMBOL(my_tsc_3);
EXPORT_SYMBOL(my_tsc_4);
static void e100_watchdog(unsigned long data)
{
struct nic *nic = (struct nic *)data;
struct ethtool_cmd cmd;
DPRINTK(TIMER, DEBUG, "right now = %ld\n", jiffies);
/* mii library handles link maintenance tasks */
rdtscl(my_tsc_1);
mii_ethtool_gset(&nic->mii, &cmd);
rdtscl(my_tsc_2);
if(mii_link_ok(&nic->mii) && !netif_carrier_ok(nic->netdev)) {
DPRINTK(LINK, INFO, "link up, %sMbps, %s-duplex\n",
cmd.speed == SPEED_100 ? "100" : "10",
cmd.duplex == DUPLEX_FULL ? "full" : "half");
} else if(!mii_link_ok(&nic->mii) && netif_carrier_ok(nic->netdev)) {
DPRINTK(LINK, INFO, "link down\n");
}
rdtscl(my_tsc_3);
mii_check_link(&nic->mii);
rdtscl(my_tsc_4);
/* Software generated interrupt to recover from (rare) Rx
* allocation failure.
...
And a small module which prints the timings periodically
when loaded:
/* Example module, built after LDD book release 3 */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/version.h>
#include <linux/errno.h>
#include <linux/timer.h>
MODULE_LICENSE("GPL");
/* Output interval, in jiffies */
#define INTERVAL 2111
/* Output scaling: TSC ==> microseconds */
#define SCALE(x) ((x)/500)
extern unsigned long my_tsc_1;
extern unsigned long my_tsc_2;
extern unsigned long my_tsc_3;
extern unsigned long my_tsc_4;
static struct timer_list my_timer;
static void timer_func(unsigned long dummy)
{
printk(KERN_NOTICE "my_timer: diff = %lu / %lu / %lu\n",
SCALE(my_tsc_2 - my_tsc_1),
SCALE(my_tsc_3 - my_tsc_2),
SCALE(my_tsc_4 - my_tsc_3));
my_timer.expires += INTERVAL;
add_timer(&my_timer);
}
static int __init mymod_init(void)
{
init_timer(&my_timer);
my_timer.function = timer_func;
my_timer.expires = jiffies + INTERVAL;
add_timer(&my_timer);
printk(KERN_NOTICE "Started mymod...\n");
return 0;
}
static void __exit mymod_exit(void)
{
del_timer_sync(&my_timer);
printk(KERN_NOTICE "Finished mymod...\n");
}
module_init(mymod_init);
module_exit(mymod_exit);
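The SCALE macro above hard-codes a divisor of 500, which presumably corresponds to a TSC running at 500 MHz on the test machine; that is an assumption, since the clock rate is not stated in this message. A hedged sketch of the same conversion with the rate passed in explicitly (`tsc_delta_us` is a hypothetical helper, not part of the module above):

```c
#include <assert.h>

/*
 * Convert a TSC delta to microseconds for a TSC running at tsc_mhz
 * ticks per microsecond. Unsigned subtraction keeps the result correct
 * across a single counter wrap.
 */
static unsigned long tsc_delta_us(unsigned long start, unsigned long end,
				  unsigned long tsc_mhz)
{
	assert(tsc_mhz != 0);
	return (end - start) / tsc_mhz;
}
```

Under the 500 MHz assumption, a delta of 50500 ticks comes out as 101 us, the same order as the mii_ethtool_gset figure quoted earlier. Because rdtscl only reads the low word of the counter, deltas are only meaningful if the two samples are taken less than one wrap apart.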
>
> e100: attempt a shorter delay for mdio reads
>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
> Simply reorder our write/read sequence for mdio reads to minimize latency
> as well as delay a shorter interval for each loop.
>
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -891,23 +891,25 @@ static u16 mdio_ctrl(struct nic *nic, u3
> * procedure it should be done under lock.
> */
> spin_lock_irqsave(&nic->mdio_lock, flags);
> - for (i = 100; i; --i) {
> + for (i = 1000; i; --i) {
> if (readl(&nic->csr->mdi_ctrl) & mdi_ready)
> break;
> - udelay(20);
> + udelay(2);
> }
> if (unlikely(!i)) {
> - printk("e100.mdio_ctrl(%s) won't go Ready\n",
> + DPRINTK(PROBE, ERR, "e100.mdio_ctrl(%s) won't go Ready\n",
> nic->netdev->name );
> spin_unlock_irqrestore(&nic->mdio_lock, flags);
> return 0; /* No way to indicate timeout error */
> }
> }
The piece of code above is not yet present
in my version of e100.
(I'm still at 2.6.15, there is no -rt patch for 2.6.16 yet)
> writel((reg << 16) | (addr << 21) | dir | data,
> &nic->csr->mdi_ctrl);
>
> - for (i = 0; i < 100; i++) {
> - udelay(20);
> + /* to avoid latency, read to flush the write, then delay, and only
> + * delay 2us per loop, manual says read should complete in < 64us */
> + for (i = 0; i < 1000; i++) {
> if ((data_out = readl(&nic->csr->mdi_ctrl)) & mdi_ready)
> break;
> + udelay(2);
> }
Exchanging the if and the udelay made things slightly worse:
It runs with 103 / 136 / 68 instead of 101 / 135 / 67 us.
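The two loop orders in the quoted patch can be compared with a small simulation. This is a sketch, not a measurement: `struct fake_reg` is a hypothetical stand-in for the `mdi_ctrl` register, and the model counts only the time spent in `udelay`, not the cost of each MMIO read. That omission may matter, because the check-first loop issues more reads per microsecond waited.

```c
#include <assert.h>

/* Simulated device: becomes ready after ready_after register reads. */
struct fake_reg {
	unsigned int reads;
	unsigned int ready_after;
};

static int reg_ready(struct fake_reg *r)
{
	return ++r->reads > r->ready_after;
}

/* Old order: delay, then check. Returns us spent delaying, -1 on timeout. */
static long poll_delay_first(struct fake_reg *r, int loops, int delay_us)
{
	long waited = 0;
	int i;

	for (i = 0; i < loops; i++) {
		waited += delay_us;   /* stands in for udelay(delay_us) */
		if (reg_ready(r))
			return waited;
	}
	return -1;
}

/* New order: check, then delay. Same return convention. */
static long poll_check_first(struct fake_reg *r, int loops, int delay_us)
{
	long waited = 0;
	int i;

	for (i = 0; i < loops; i++) {
		if (reg_ready(r))
			return waited;
		waited += delay_us;   /* stands in for udelay(delay_us) */
	}
	return -1;
}
```

Both variants keep the same worst-case budget (100 x 20 us and 1000 x 2 us are each 2000 us). The check-first order only saves delay time when the device is ready early, so a small difference like the one measured above is plausible if the read itself dominates.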
--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: kus@keba.com WWW: www.keba.com
* Re: Fw: [Bugme-new] [Bug 5936] New: Openswan tunnels + netfilter problem
From: Herbert Xu @ 2006-01-24 7:25 UTC (permalink / raw)
To: Patrick McHardy; +Cc: akpm, netdev, netfilter-devel, webmaster
In-Reply-To: <43D3F186.6030206@trash.net>
Patrick McHardy <kaber@trash.net> wrote:
> Andrew Morton wrote:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=5936
>
> Please post your iptables rules and the full list of loaded modules.
The problem is caused by SNAT on a dst that already has an xfrm set.
When ip_route_me_harder processes the dst it will cause the dst to
lose its xfrm since it has IPSKB_XFRM_TRANSFORMED set.
Since xfrm4_output_finish does not expect dst's to lose their xfrm's
after POST_ROUTING, it crashes.
Obviously we could add a check in xfrm4_output_finish to prevent this
crash, however, I think we need to consider this a bit more since it
breaks a fairly common setup where people just stick a rule into the
NAT table that says
iptables -t nat -I POSTROUTING -o eth1 -j MASQUERADE
where eth1 is the outbound interface. If this rule catches any IPsec
VPN traffic then it'll SNAT them even though the intention is obviously
to let them through without SNAT.
Perhaps it's best to have SNAT not touch packets with dst->xfrm set.
Unfortunately that leads to problems as well (albeit rarer) since you
may have catch-all IPsec policies that every packet matches, but you
want certain packets to be SNATed so that they match more specific
policies.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] net: Move destructor from neigh->ops to neigh_params
From: David S. Miller @ 2006-01-23 21:54 UTC (permalink / raw)
To: rdreier; +Cc: netdev, openib-general
In-Reply-To: <adazmlmmy7v.fsf@cisco.com>
From: Roland Dreier <rdreier@cisco.com>
Date: Mon, 23 Jan 2006 13:27:32 -0800
> I'd like to get an ACK or NAK of it from Dave
Dave is in New Zealand at linux.conf.au, don't expect him to
be too active for at least a week...
^ permalink raw reply
* [PATCH] net: Move destructor from neigh->ops to neigh_params
From: Roland Dreier @ 2006-01-23 21:27 UTC (permalink / raw)
To: davem; +Cc: netdev, openib-general
This is a resend of a patch written by Michael S. Tsirkin
<mst@mellanox.co.il>. I'd like to get an ACK or NAK of it from Dave
and other networking people, so that we can either merge it upstream
or try a different approach. There definitely is a problem with
neighbour destructors that IP-over-IB is running into.
It would be good to know what the design was behind putting the
destructor method in neigh->ops in the first place.
Dave, if you want to merge this directly, that's fine. Or I'm fine
with merging this through the IB tree if you'd prefer (if you want me
to do that, let me know if you think it's 2.6.16 material).
Thanks,
Roland
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed. We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.
The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup. Two additional
patches in the patch series update ipoib to use this new interface.
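To make the problem concrete, here is a minimal userspace sketch of why a callback stored in the shared ops table cannot be set safely, while one stored in per-device parms can. It is a model only: the field names destructor and neigh_destructor come from the patch below, but the struct layouts, the helper functions, and the "cleaned" flag are simplified assumptions and not the kernel's real definitions.

```c
#include <assert.h>
#include <stddef.h>

struct neighbour;

/* Model: one ops table shared by neighbours of several devices. */
struct neigh_ops {
	void (*destructor)(struct neighbour *);
};

/* Model: one parms instance per net device. */
struct neigh_parms {
	void (*neigh_destructor)(struct neighbour *);
};

struct neighbour {
	struct neigh_ops *ops;
	struct neigh_parms *parms;
	int cleaned;	/* hypothetical flag set by the driver callback */
};

/* Stand-in for a driver cleanup callback such as ipoib's. */
static void ipoib_like_destructor(struct neighbour *n)
{
	n->cleaned = 1;
}

/* Old lookup: through the shared ops table. */
static void destroy_via_ops(struct neighbour *n)
{
	if (n->ops && n->ops->destructor)
		n->ops->destructor(n);
}

/* New lookup, as in the patch: through the per-device parms. */
static void destroy_via_parms(struct neighbour *n)
{
	if (n->parms->neigh_destructor)
		n->parms->neigh_destructor(n);
}

/* Returns 1 if the callback set for one device fires for another. */
static int shared_ops_leaks_callback(void)
{
	struct neigh_ops shared = { NULL };
	struct neigh_parms ib_parms = { NULL }, eth_parms = { NULL };
	struct neighbour ib = { &shared, &ib_parms, 0 };
	struct neighbour eth = { &shared, &eth_parms, 0 };

	(void)ib;
	shared.destructor = ipoib_like_destructor;	/* meant for ib only */
	destroy_via_ops(&eth);
	return eth.cleaned;
}

/* Returns 1 if only the device that registered the callback sees it. */
static int per_device_parms_isolated(void)
{
	struct neigh_ops shared = { NULL };
	struct neigh_parms ib_parms = { NULL }, eth_parms = { NULL };
	struct neighbour ib = { &shared, &ib_parms, 0 };
	struct neighbour eth = { &shared, &eth_parms, 0 };

	ib_parms.neigh_destructor = ipoib_like_destructor;
	destroy_via_parms(&ib);
	destroy_via_parms(&eth);
	return ib.cleaned && !eth.cleaned;
}

static void check_destructor_model(void)
{
	assert(shared_ops_leaks_callback() == 1);
	assert(per_device_parms_isolated() == 1);
}
```

Calling check_destructor_model() exercises both cases. In the model, the shared table hands the ipoib-style callback to a neighbour on an unrelated device, whereas the per-device field keeps it confined to the device that set it.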
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6fa9ae1..b0666d6 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -68,6 +68,7 @@ struct neigh_parms
struct net_device *dev;
struct neigh_parms *next;
int (*neigh_setup)(struct neighbour *);
+ void (*neigh_destructor)(struct neighbour *);
struct neigh_table *tbl;
void *sysctl_table;
@@ -145,7 +146,6 @@ struct neighbour
struct neigh_ops
{
int family;
- void (*destructor)(struct neighbour *);
void (*solicit)(struct neighbour *, struct sk_buff*);
void (*error_report)(struct neighbour *, struct sk_buff*);
int (*output)(struct sk_buff*);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index e68700f..3489e23 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -586,8 +586,8 @@ void neigh_destroy(struct neighbour *nei
kfree(hh);
}
- if (neigh->ops && neigh->ops->destructor)
- (neigh->ops->destructor)(neigh);
+ if (neigh->parms->neigh_destructor)
+ (neigh->parms->neigh_destructor)(neigh);
skb_queue_purge(&neigh->arp_queue);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index fd3f5c8..9588124 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -247,7 +247,6 @@ static void path_free(struct net_device
if (neigh->ah)
ipoib_put_ah(neigh->ah);
*to_ipoib_neigh(neigh->neighbour) = NULL;
- neigh->neighbour->ops->destructor = NULL;
kfree(neigh);
}
@@ -530,7 +529,6 @@ static void neigh_add_path(struct sk_buf
err:
*to_ipoib_neigh(skb->dst->neighbour) = NULL;
list_del(&neigh->list);
- neigh->neighbour->ops->destructor = NULL;
kfree(neigh);
++priv->stats.tx_dropped;
@@ -769,21 +767,9 @@ static void ipoib_neigh_destructor(struc
ipoib_put_ah(ah);
}
-static int ipoib_neigh_setup(struct neighbour *neigh)
-{
- /*
- * Is this kosher? I can't find anybody in the kernel that
- * sets neigh->destructor, so we should be able to set it here
- * without trouble.
- */
- neigh->ops->destructor = ipoib_neigh_destructor;
-
- return 0;
-}
-
static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms)
{
- parms->neigh_setup = ipoib_neigh_setup;
+ parms->neigh_destructor = ipoib_neigh_destructor;
return 0;
}
^ permalink raw reply related
* RE: My vote against eepro* removal
From: Jesse Brandeburg @ 2006-01-23 20:23 UTC (permalink / raw)
To: kus Kusche Klaus
Cc: Lee Revell, Evgeniy Polyakov, Adrian Bunk, linux-kernel,
Ronciak, John, Brandeburg, Jesse, netdev, Steven Rostedt
In-Reply-To: <AAD6DA242BC63C488511C611BD51F367323329@MAILIT.keba.co.at>
On Mon, 23 Jan 2006, kus Kusche Klaus wrote:
> From: John Ronciak
> > Can we try a couple of things? 1) just comment out all the check for
> > link code in the e100 driver and give that a try and 2) just comment
> > out the update stats call and see if that works. These seem to be the
> > differences and we need to know which one is causing the problem.
>
> First of all, I am still unable to get any traces of this in the
> latency tracer. Moreover, as I said before, removing parts of the
> watchdog usually made my eth0 nonfunctional (which is bad - this
> is an embedded system with ssh access).
>
> Hence, I explicitly instrumented the watchdog function with tsc.
> Output of the timings is done by a background thread, so the
> timings should not increase the runtime of the watchdog.
>
> Here are my results:
>
> If the watchdog doesn't get interrupted, preempted, or whatever,
> it spends 340 us in its body:
> * 303 us in the mii code
> * 36 us in the following code up to e100_adjust_adaptive_ifs
> * 1 us in the remaining code (I think my chip doesn't need any
> of those chip-specific fixups)
>
> The 303 us in the mii code are divided in the following way:
> * 101 us in mii_ethtool_gset
> * 135 us in the whole if
> * 67 us in mii_check_link
>
> This is with the udelay(2) instead of udelay(20) hack applied.
> With udelay(20), the mii times are 128 + 170 + 85 us,
> i.e. 383 us instead of 303 us, or >= 420 us for the whole watchdog.
>
> As the RTC runs at 8192 Hz during my tests, the watchdog is hit
> by 2-3 interrupts, which adds another 75 - 110 us to its total
> execution time, i.e. the time it blocks other rtprio 1 threads.
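The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only re-derives numbers already given in the report (the three mii timings, the 340 us body, the 8192 Hz RTC rate). The helper names are made up for illustration, and the interrupt count is simply the floor and ceiling of busy time multiplied by interrupt rate, which assumes evenly spaced interrupts.

```c
#include <assert.h>

/* Sum of the three mii measurements, in microseconds. */
static unsigned mii_total_us(unsigned gset, unsigned whole_if,
			     unsigned check_link)
{
	return gset + whole_if + check_link;
}

/* Floor of busy_us * hz / 1e6: fewest periodic interrupts in the window. */
static unsigned irq_hits_min(unsigned busy_us, unsigned hz)
{
	return (unsigned)((unsigned long long)busy_us * hz / 1000000ULL);
}

/* Ceiling of the same quantity: most interrupts in the window. */
static unsigned irq_hits_max(unsigned busy_us, unsigned hz)
{
	return (unsigned)(((unsigned long long)busy_us * hz + 999999ULL)
			  / 1000000ULL);
}

static void check_watchdog_numbers(void)
{
	/* udelay(2) variant: 101 + 135 + 67 us */
	assert(mii_total_us(101, 135, 67) == 303);
	/* udelay(20) variant: 128 + 170 + 85 us */
	assert(mii_total_us(128, 170, 85) == 383);
	/* whole watchdog body: mii code + 36 us + 1 us */
	assert(mii_total_us(101, 135, 67) + 36 + 1 == 340);
	/* a 340 us body at 8192 Hz is hit by 2 to 3 RTC interrupts */
	assert(irq_hits_min(340, 8192) == 2);
	assert(irq_hits_max(340, 8192) == 3);
}
```

At 8192 Hz the interrupt period is about 122 us, so a 340 us stretch spans between two and three periods, which matches the "2-3 interrupts" in the report.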
Thank you very much for that detailed analysis! Okay, so calls to mii.c
take too long, but those depend on mdio_read in e100 to do the work, so
this patch attempts to minimize the latency.
This patch is against linus-2.6.git; I compiled it and tested it with
ssh and ping.
Would you be willing to send your instrumentation patches? I could then
test any fixes more easily.
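As a rough illustration of what the reordering is meant to achieve, here is a small userspace simulation of the read-back loop before and after the patch. It is a sketch under strong assumptions: the "device" is a hypothetical model that becomes ready at a fixed time, the clock only advances in the simulated udelay, and the cost of the register read itself is ignored, so it says nothing about timings on real hardware.

```c
#include <assert.h>

/* Hypothetical simulated device: ready once the clock reaches ready_at_us. */
struct sim {
	unsigned now_us;
	unsigned ready_at_us;
};

static void sim_udelay(struct sim *s, unsigned us)
{
	s->now_us += us;
}

static int sim_ready(const struct sim *s)
{
	return s->now_us >= s->ready_at_us;
}

/* Old loop shape: delay 20 us, then poll, at most 100 times. */
static int poll_old(unsigned ready_at_us)
{
	struct sim s = { 0, ready_at_us };
	int i;

	for (i = 0; i < 100; i++) {
		sim_udelay(&s, 20);
		if (sim_ready(&s))
			return (int)s.now_us;
	}
	return -1;	/* timed out */
}

/* New loop shape: poll first, then delay 2 us, at most 1000 times. */
static int poll_new(unsigned ready_at_us)
{
	struct sim s = { 0, ready_at_us };
	int i;

	for (i = 0; i < 1000; i++) {
		if (sim_ready(&s))
			return (int)s.now_us;
		sim_udelay(&s, 2);
	}
	return -1;	/* timed out */
}

static void check_poll_model(void)
{
	/* device ready after 64 us: old loop overshoots to 80 us */
	assert(poll_old(64) == 80);
	assert(poll_new(64) == 64);
	/* device already ready: old loop still waits one full step */
	assert(poll_old(0) == 20);
	assert(poll_new(0) == 0);
	/* both loops give up after the same 2000 us budget */
	assert(poll_old(5000) == -1);
	assert(poll_new(5000) == -1);
}
```

In this model the total timeout stays at 2000 us (100 x 20 us versus 1000 x 2 us), while the time wasted after the device becomes ready drops from up to 20 us to up to 2 us per call. Each extra poll is a real register read on hardware, so measured results may differ from the model.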
e100: attempt a shorter delay for mdio reads
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Simply reorder our write/read sequence for mdio reads to minimize latency
as well as delay a shorter interval for each loop.
---
drivers/net/e100.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/e100.c b/drivers/net/e100.c
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -891,23 +891,25 @@ static u16 mdio_ctrl(struct nic *nic, u3
* procedure it should be done under lock.
*/
spin_lock_irqsave(&nic->mdio_lock, flags);
- for (i = 100; i; --i) {
+ for (i = 1000; i; --i) {
if (readl(&nic->csr->mdi_ctrl) & mdi_ready)
break;
- udelay(20);
+ udelay(2);
}
if (unlikely(!i)) {
- printk("e100.mdio_ctrl(%s) won't go Ready\n",
+ DPRINTK(PROBE, ERR, "e100.mdio_ctrl(%s) won't go Ready\n",
nic->netdev->name );
spin_unlock_irqrestore(&nic->mdio_lock, flags);
return 0; /* No way to indicate timeout error */
}
writel((reg << 16) | (addr << 21) | dir | data, &nic->csr->mdi_ctrl);
- for (i = 0; i < 100; i++) {
- udelay(20);
+ /* to avoid latency, read to flush the write, then delay, and only
+ * delay 2us per loop, manual says read should complete in < 64us */
+ for (i = 0; i < 1000; i++) {
if ((data_out = readl(&nic->csr->mdi_ctrl)) & mdi_ready)
break;
+ udelay(2);
}
spin_unlock_irqrestore(&nic->mdio_lock, flags);
DPRINTK(HW, DEBUG,
^ permalink raw reply