* NetDev! Please help!
@ 2008-09-20 13:19 Badalian Vyacheslav
2008-09-20 13:38 ` Badalian Vyacheslav
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-20 13:19 UTC (permalink / raw)
To: netdev
Hello all.
We bought 10 Intel servers and put them to work shaping traffic. After 5-15 hours
every one of them froze! The kernel does not see a TCO watchdog on this platform,
so it can't reboot the machine. The soft watchdog does not reboot the PC in this situation either. =(
On the screen we see messages like this (when it froze and I was near the monitor):
http://www.kerneloops.org/guilty.php?guilty=dev_watchdog&version=2.6.26-release&start=1736704&end=1769471&class=warn
We also got this once via netconsole:
[ 1352.245851] netconsole: network logging started
[ 1458.400133] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[ 1458.400133] All bugs added by David S. Miller <davem@redhat.com>
[ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
[ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 4956.420300] Tx Queue <0>
[ 4956.420300] TDH <81>
[ 4956.420301] TDT <81>
[ 4956.420302] next_to_use <81>
[ 4956.420302] next_to_clean <d6>
[ 4956.420303] buffer_info[next_to_clean]
[ 4956.420303] time_stamp <15498d>
[ 4956.420304] next_to_watch <d6>
[ 4956.420304] jiffies <15511c>
[ 4956.420305] next_to_watch.status <1>
[ 4956.420537] eth1: Detected Tx Unit Hang:
[ 4956.420538] TDH <b0>
[ 4956.420538] TDT <b0>
[ 4956.420539] next_to_use <b0>
[ 4956.420539] next_to_clean <5>
[ 4956.420540] buffer_info[next_to_clean]:
[ 4956.420540] time_stamp <15498e>
[ 4956.420541] next_to_watch <5>
[ 4956.420542] jiffies <15511c>
[ 4956.420542] next_to_watch.status <1>
[ 4956.423064] CPU 1: Bank 0: 3200004000000800
[ 4956.423190] CPU 1: Bank 5: 3200220024080400
[ 4956.423315] Kernel panic - not syncing: CPU context corrupt
[ 4956.423933] Rebooting in 3 seconds..
[ 531.843998] CPU 2: Machine Check Exception: 0000000000000005
[ 531.843998] CPU 0: Machine Check Exception: 0000000000000004
[ 531.844000] CPU 0: Bank 0: 3200004000000800
[ 531.844001] CPU 0: Bank 5: 3200121014040400
[ 531.844002] Kernel panic - not syncing: CPU context corrupt
[ 531.844916] Rebooting in 3 seconds..
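The Tx Unit Hang dumps above print time_stamp and jiffies in hexadecimal. As a rough reading aid, the following Python sketch computes how long the buffer at next_to_clean had been pending when the driver complained. The helper name and the HZ value of 1000 are assumptions for illustration; the real tick rate depends on CONFIG_HZ, which is not shown in this thread.

```python
# Hypothetical helper for reading an e1000 "Detected Tx Unit Hang" dump.
# ASSUMED_HZ is an assumption: the real value comes from CONFIG_HZ.
ASSUMED_HZ = 1000


def stall_age(time_stamp_hex, jiffies_hex, hz=ASSUMED_HZ):
    """Return (ticks, seconds) between the buffer time_stamp and jiffies.

    The subtraction is masked to 32 bits so that a jiffies wraparound on a
    32-bit kernel still yields the elapsed tick count.
    """
    ticks = (int(jiffies_hex, 16) - int(time_stamp_hex, 16)) & 0xFFFFFFFF
    return ticks, ticks / hz


if __name__ == "__main__":
    # Values copied from the eth0 and eth1 dumps in the log above.
    for name, stamp in (("eth0", "15498d"), ("eth1", "15498e")):
        ticks, seconds = stall_age(stamp, "15511c")
        print(f"{name}: {ticks} jiffies (~{seconds:.3f} s if HZ={ASSUMED_HZ})")
```

For the posted values this gives 1935 and 1934 jiffies, so both interfaces stalled at practically the same moment. That would be consistent with a cause outside a single network card, although it does not prove one.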
This is the output of lspci:
00:00.0 Host bridge: Intel Corporation Server DRAM Controller
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
02:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
03:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
lspci -tv
-[0000:00]-+-00.0  Intel Corporation Server DRAM Controller
           +-19.0  Intel Corporation 82566DM-2 Gigabit Network Connection
           +-1a.0  Intel Corporation USB UHCI Controller #4
           +-1a.1  Intel Corporation USB UHCI Controller #5
           +-1a.2  Intel Corporation USB UHCI Controller #6
           +-1a.7  Intel Corporation USB2 EHCI Controller #2
           +-1c.0-[0000:01]--
           +-1c.4-[0000:02]----00.0  Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1)
           +-1d.0  Intel Corporation USB UHCI Controller #1
           +-1d.1  Intel Corporation USB UHCI Controller #2
           +-1d.2  Intel Corporation USB UHCI Controller #3
           +-1d.7  Intel Corporation USB2 EHCI Controller #1
           +-1e.0-[0000:03]----02.0  Intel Corporation 82541GI Gigabit Ethernet Controller
           +-1f.0  Intel Corporation LPC Interface Controller
           +-1f.2  Intel Corporation 6 port SATA AHCI Controller
           \-1f.3  Intel Corporation SMBus Controller
All tests were done on the 2.6.26.2 kernel. I am now compiling 2.6.26.5, but
if KernelOops is right, it will not help.
I hope NetDev can help!
If any information or testing is needed, please write!
Thanks to everyone!
Badalian Vyacheslav.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NetDev! Please help!
2008-09-20 13:19 NetDev! Please help! Badalian Vyacheslav
@ 2008-09-20 13:38 ` Badalian Vyacheslav
2008-09-20 18:11 ` Denys Fedoryshchenko
2008-09-20 18:31 ` Machine Check Exception Was: " Jarek Poplawski
2 siblings, 0 replies; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-20 13:38 UTC (permalink / raw)
To: netdev
New crash.... netconsole log:
[ 116.333349] ------------[ cut here ]------------
[ 116.333516] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0xf1/0x110()
[ 116.333690] Modules linked in: netconsole i2c_i801 i2c_core e1000e e1000
[ 116.334199] Pid: 0, comm: swapper Not tainted 2.6.26-gentoo-r1-fw #2
[ 116.334371] [<c012506f>] warn_on_slowpath+0x5f/0x90
[ 116.334597] [<c011dd1a>] enqueue_task_fair+0x1a/0x30
[ 116.334823] [<c011b962>] enqueue_task+0x12/0x30
[ 116.335046] [<c011b9d3>] activate_task+0x23/0x40
[ 116.335268] [<c011e01a>] try_to_wake_up+0x6a/0x110
[ 116.335491] [<c0137c7b>] autoremove_wake_function+0x1b/0x50
[ 116.335718] [<c011be6b>] __wake_up_common+0x4b/0x80
[ 116.335941] [<c011cfde>] __wake_up+0x3e/0x60
[ 116.336161] [<c0134a2b>] insert_work+0x4b/0x70
[ 116.336384] [<c0134dd7>] __queue_work+0x27/0x40
[ 116.336610] [<c02d0651>] dev_watchdog+0xf1/0x110
[ 116.337333] [<c012e055>] run_timer_softirq+0x115/0x170
[ 116.337557] [<c0122c71>] scheduler_tick+0xa1/0xd0
[ 116.337780] [<c012a062>] __do_softirq+0x82/0x100
[ 116.338002] [<c012a117>] do_softirq+0x37/0x40
[ 116.338222] [<c0114027>] smp_apic_timer_interrupt+0x57/0x90
[ 116.338448] [<c0105660>] apic_timer_interrupt+0x28/0x30
[ 116.338672] [<c010a5e2>] mwait_idle+0x32/0x40
[ 116.338894] [<c010a5b0>] mwait_idle+0x0/0x40
[ 116.339115] [<c01036e8>] cpu_idle+0x48/0xc0
[ 116.339336] =======================
[ 116.339499] ---[ end trace e25a40b7dc59df07 ]---
[ 117.655918] CPU 1: Machine Check Exception: 0000000000000005
[ 117.656103] CPU 1: Bank 0: 3200004000000800
[ 117.656604] CPU 1: Bank 5: 3200220024080400
[ 117.656604] Kernel panic - not syncing: CPU context corrupt
[ 117.656624] Rebooting in 3 seconds..
Thanks
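The "Bank N:" values in the panic above look like machine-check bank status registers. The sketch below is only a reading aid: it assumes the printed value follows the architectural IA32_MCi_STATUS layout (flag bits 63-57, model-specific code in bits 31-16, MCA error code in bits 15-0). The function and table names are illustrative and not part of any kernel interface.

```python
# Flag bits of the architectural IA32_MCi_STATUS layout (assumed to apply to
# the "Bank N:" values printed by the kernel in this thread).
STATUS_FLAGS = (
    (63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
    (59, "MISCV"), (58, "ADDRV"), (57, "PCC"),
)


def decode_bank_status(status_hex):
    """Split a 64-bit bank status value into flags and error-code fields."""
    status = int(status_hex, 16)
    return {
        "flags": [name for bit, name in STATUS_FLAGS if (status >> bit) & 1],
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Values copied from the CPU 1 panic in the log above.
    for bank, raw in ((0, "3200004000000800"), (5, "3200220024080400")):
        info = decode_bank_status(raw)
        print(f"Bank {bank}: flags={info['flags']} "
              f"model=0x{info['model_specific_code']:04x} "
              f"mca=0x{info['mca_error_code']:04x}")
```

Under that assumption both banks decode to UC, EN and PCC (uncorrected, enabled, processor context corrupt), which at least matches the wording of the panic message. VAL shows as clear in these lines; the 32-bit machine-check code of this kernel generation may mask that bit before printing, but that is an assumption, not something confirmed in this thread.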
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NetDev! Please help!
2008-09-20 13:19 NetDev! Please help! Badalian Vyacheslav
2008-09-20 13:38 ` Badalian Vyacheslav
@ 2008-09-20 18:11 ` Denys Fedoryshchenko
2008-09-21 16:11 ` Jarek Poplawski
2008-09-20 18:31 ` Machine Check Exception Was: " Jarek Poplawski
2 siblings, 1 reply; 18+ messages in thread
From: Denys Fedoryshchenko @ 2008-09-20 18:11 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: netdev
On Saturday 20 September 2008, Badalian Vyacheslav wrote:
> Hello all.
>
> We bought 10 Intel servers and put them to work shaping traffic. After 5-15 hours
> every one of them froze! The kernel does not see a TCO watchdog on this platform,
> so it can't reboot the machine. The soft watchdog does not reboot the PC in this situation either. =(
>
> On the screen we see messages like this (when it froze and I was near
> the monitor):
Maybe try nmi_watchdog=1?
Also try dmidecode (http://www.nongnu.org/dmidecode/) to check whether the machine has IPMI.
Mine, for example:
....
Handle 0x003F, DMI type 38, 16 bytes
IPMI Device Information
Interface Type: KCS (Keyboard Control Style)
Specification Version: 2.0
I2C Slave Address: 0x10
NV Storage Device Address: 0
Base Address: 0x0000000000000CA2 (I/O)
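The check described above can be scripted. The following Python sketch scans dmidecode text output for "DMI type 38" (IPMI Device Information) records and collects their fields. The function name and the embedded sample are illustrative; the sample is the record quoted above, and real output may contain additional fields.

```python
import re

# Sample text mirroring the record quoted above (illustrative input only).
SAMPLE = """\
Handle 0x003F, DMI type 38, 16 bytes
IPMI Device Information
\tInterface Type: KCS (Keyboard Control Style)
\tSpecification Version: 2.0
\tI2C Slave Address: 0x10
\tNV Storage Device Address: 0
\tBase Address: 0x0000000000000CA2 (I/O)
"""


def find_ipmi_records(dmidecode_text):
    """Return one dict of 'key: value' fields per DMI type 38 record."""
    records = []
    current = None  # dict being filled while inside a type 38 record
    for line in dmidecode_text.splitlines():
        if line.startswith("Handle "):
            match = re.search(r"DMI type (\d+)", line)
            current = {} if match and match.group(1) == "38" else None
            if current is not None:
                records.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


if __name__ == "__main__":
    records = find_ipmi_records(SAMPLE)
    print("IPMI present" if records else "no IPMI record found")
    for record in records:
        print(record)
```

On a live machine the input text would come from running dmidecode as root; an empty result suggests the BIOS does not advertise an IPMI interface, so a hardware watchdog would have to come from elsewhere.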
It is also important to move to the newer 2.6.26.5, because, for example, 2.6.26.4
has this fix:
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.26.4
....
commit 685f605a498b73759cbcbc816089e804710fcc48
Author: David S. Miller <davem@davemloft.net>
Date: Wed Aug 27 22:35:56 2008 -0700
pkt_sched: Fix return value corruption in HTB and TBF.
[ Upstream commit 69747650c814a8a79fef412c7416adf823293a3e ]
Based upon a bug report by Josip Rodin.
Packet schedulers should only return NET_XMIT_DROP iff
the packet really was dropped. If the packet does reach
the device after we return NET_XMIT_DROP then TCP can
crash because it depends upon the enqueue path return
values being accurate.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
....
P.S. For netdev: I have a friend who complains that shapers are
crashing on Intel machines (using TSC; he has two different "Core"-based
servers, and both crash). With HPET I don't have any problems on high
performance shapers (except that it is CPU-expensive). It happens on the latest
2.6.26.5 too. The machine gets a hard lockup, and nothing but a hardware watchdog
is able to recover it. They don't have the experience to find the actual cause of this
issue, and they don't know English well enough to report it.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Was: NetDev! Please help!
2008-09-20 13:19 NetDev! Please help! Badalian Vyacheslav
2008-09-20 13:38 ` Badalian Vyacheslav
2008-09-20 18:11 ` Denys Fedoryshchenko
@ 2008-09-20 18:31 ` Jarek Poplawski
2 siblings, 0 replies; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-20 18:31 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: netdev, LKML
Badalian Vyacheslav wrote, On 09/20/2008 03:19 PM:
> Hello all.
Hi Vyacheslav,
I think it might be something more than netdev. Please read the help text for
"Machine Check Exception" under "Processor type and features" in the kernel config.
I pasted your second message below and Cc'd linux-kernel.
Jarek P.
> We bought 10 Intel servers and put them to work shaping traffic. After 5-15 hours
> every one of them froze! The kernel does not see a TCO watchdog on this platform,
> so it can't reboot the machine. The soft watchdog does not reboot the PC in this situation either. =(
>
> On the screen we see messages like this (when it froze and I was near the monitor):
>
> http://www.kerneloops.org/guilty.php?guilty=dev_watchdog&version=2.6.26-release&start=1736704&end=1769471&class=warn
>
> We also got this once via netconsole:
>
> [ 1352.245851] netconsole: network logging started
> [ 1458.400133] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
> [ 1458.400133] All bugs added by David S. Miller <davem@redhat.com>
> [ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
> [ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [ 4956.420300] Tx Queue <0>
> [ 4956.420300] TDH <81>
> [ 4956.420301] TDT <81>
> [ 4956.420302] next_to_use <81>
> [ 4956.420302] next_to_clean <d6>
> [ 4956.420303] buffer_info[next_to_clean]
> [ 4956.420303] time_stamp <15498d>
> [ 4956.420304] next_to_watch <d6>
> [ 4956.420304] jiffies <15511c>
> [ 4956.420305] next_to_watch.status <1>
> [ 4956.420537] eth1: Detected Tx Unit Hang:
> [ 4956.420538] TDH <b0>
> [ 4956.420538] TDT <b0>
> [ 4956.420539] next_to_use <b0>
> [ 4956.420539] next_to_clean <5>
> [ 4956.420540] buffer_info[next_to_clean]:
> [ 4956.420540] time_stamp <15498e>
> [ 4956.420541] next_to_watch <5>
> [ 4956.420542] jiffies <15511c>
> [ 4956.420542] next_to_watch.status <1>
> [ 4956.423064] CPU 1: Bank 0: 3200004000000800
> [ 4956.423190] CPU 1: Bank 5: 3200220024080400
> [ 4956.423315] Kernel panic - not syncing: CPU context corrupt
> [ 4956.423933] Rebooting in 3 seconds..
> [ 531.843998] CPU 2: Machine Check Exception: 0000000000000005
> [ 531.843998] CPU 0: Machine Check Exception: 0000000000000004
> [ 531.844000] CPU 0: Bank 0: 3200004000000800
> [ 531.844001] CPU 0: Bank 5: 3200121014040400
> [ 531.844002] Kernel panic - not syncing: CPU context corrupt
> [ 531.844916] Rebooting in 3 seconds..
>
> This is the output of lspci:
>
> 00:00.0 Host bridge: Intel Corporation Server DRAM Controller
> 00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network
> Connection (rev 02)
> 00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
> 00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)
> 00:1a.2 USB Controller: Intel Corporation USB UHCI Controller #6 (rev 02)
> 00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
> 00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
> 00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
> 00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
> 00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
> 00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
> 00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
> 00:1f.2 SATA controller: Intel Corporation 6 port SATA AHCI Controller
> (rev 02)
> 00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
> 02:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e
> [Pilot] ServerEngines (SEP1) (rev 02)
> 03:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet
> Controller (rev 05)
>
> lspci -tv
> -[0000:00]-+-00.0 Intel Corporation Server DRAM Controller
> +-19.0 Intel Corporation 82566DM-2 Gigabit Network Connection
> +-1a.0 Intel Corporation USB UHCI Controller #4
> +-1a.1 Intel Corporation USB UHCI Controller #5
> +-1a.2 Intel Corporation USB UHCI Controller #6
> +-1a.7 Intel Corporation USB2 EHCI Controller #2
> +-1c.0-[0000:01]--
> +-1c.4-[0000:02]----00.0 Matrox Graphics, Inc. MGA G200e
> [Pilot] ServerEngines (SEP1)
> +-1d.0 Intel Corporation USB UHCI Controller #1
> +-1d.1 Intel Corporation USB UHCI Controller #2
> +-1d.2 Intel Corporation USB UHCI Controller #3
> +-1d.7 Intel Corporation USB2 EHCI Controller #1
> +-1e.0-[0000:03]----02.0 Intel Corporation 82541GI Gigabit
> Ethernet Controller
> +-1f.0 Intel Corporation LPC Interface Controller
> +-1f.2 Intel Corporation 6 port SATA AHCI Controller
> \-1f.3 Intel Corporation SMBus Controller
>
> All tests were done on the 2.6.26.2 kernel. I am now compiling 2.6.26.5, but
> if KernelOops is right, it will not help.
>
> I hope NetDev can help!
> If any information or testing is needed, please write!
>
> Thanks to everyone!
>
> Badalian Vyacheslav.
Badalian Vyacheslav wrote, On 09/20/2008 03:38 PM:
> New crash.... netconsole log:
>
> [ 116.333349] ------------[ cut here ]------------
> [ 116.333516] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0xf1/0x110()
> [ 116.333690] Modules linked in: netconsole i2c_i801 i2c_core e1000e e1000
> [ 116.334199] Pid: 0, comm: swapper Not tainted 2.6.26-gentoo-r1-fw #2
> [ 116.334371] [<c012506f>] warn_on_slowpath+0x5f/0x90
> [ 116.334597] [<c011dd1a>] enqueue_task_fair+0x1a/0x30
> [ 116.334823] [<c011b962>] enqueue_task+0x12/0x30
> [ 116.335046] [<c011b9d3>] activate_task+0x23/0x40
> [ 116.335268] [<c011e01a>] try_to_wake_up+0x6a/0x110
> [ 116.335491] [<c0137c7b>] autoremove_wake_function+0x1b/0x50
> [ 116.335718] [<c011be6b>] __wake_up_common+0x4b/0x80
> [ 116.335941] [<c011cfde>] __wake_up+0x3e/0x60
> [ 116.336161] [<c0134a2b>] insert_work+0x4b/0x70
> [ 116.336384] [<c0134dd7>] __queue_work+0x27/0x40
> [ 116.336610] [<c02d0651>] dev_watchdog+0xf1/0x110
> [ 116.337333] [<c012e055>] run_timer_softirq+0x115/0x170
> [ 116.337557] [<c0122c71>] scheduler_tick+0xa1/0xd0
> [ 116.337780] [<c012a062>] __do_softirq+0x82/0x100
> [ 116.338002] [<c012a117>] do_softirq+0x37/0x40
> [ 116.338222] [<c0114027>] smp_apic_timer_interrupt+0x57/0x90
> [ 116.338448] [<c0105660>] apic_timer_interrupt+0x28/0x30
> [ 116.338672] [<c010a5e2>] mwait_idle+0x32/0x40
> [ 116.338894] [<c010a5b0>] mwait_idle+0x0/0x40
> [ 116.339115] [<c01036e8>] cpu_idle+0x48/0xc0
> [ 116.339336] =======================
> [ 116.339499] ---[ end trace e25a40b7dc59df07 ]---
> [ 117.655918] CPU 1: Machine Check Exception: 0000000000000005
> [ 117.656103] CPU 1: Bank 0: 3200004000000800
> [ 117.656604] CPU 1: Bank 5: 3200220024080400
> [ 117.656604] Kernel panic - not syncing: CPU context corrupt
> [ 117.656624] Rebooting in 3 seconds..
>
>
> Thanks
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NetDev! Please help!
2008-09-20 18:11 ` Denys Fedoryshchenko
@ 2008-09-21 16:11 ` Jarek Poplawski
[not found] ` <48D7385D.40107@bigtelecom.ru>
0 siblings, 1 reply; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-21 16:11 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Badalian Vyacheslav, netdev
Denys Fedoryshchenko wrote, On 09/20/2008 08:11 PM:
...
> P.S. For netdev: I have a friend who complains that shapers are
> crashing on Intel machines (using TSC; he has two different "Core"-based
> servers, and both crash). With HPET I don't have any problems on high
> performance shapers (except that it is CPU-expensive). It happens on the latest
> 2.6.26.5 too. The machine gets a hard lockup, and nothing but a hardware watchdog
> is able to recover it. They don't have the experience to find the actual cause of this
> issue, and they don't know English well enough to report it.
Is your friend sure it's because of shapers? If he/she can patch
there is no need to know English well to report here:
Subject: 2.6.26.5 tc not OK
Config:
.config
tc script:
script
dmesg:
dmesg
not OK when: script run/script not run
patch #1 not OK
patch #2 not OK
...
patch #2001 OK!
Jarek P.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Re: NetDev! Please help!
[not found] ` <48D7385D.40107@bigtelecom.ru>
@ 2008-09-22 6:53 ` Jarek Poplawski
2008-09-22 8:05 ` Jarek Poplawski
2008-09-22 9:40 ` Badalian Vyacheslav
0 siblings, 2 replies; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-22 6:53 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
On Mon, Sep 22, 2008 at 10:17:01AM +0400, Badalian Vyacheslav wrote:
> Jarek Poplawski:
>
> Hello!
> Here is all the requested information.
> I tried 2.6.26.5 and again got:
> [143784.513166] CPU 2: Bank 0: 3200004000000800
> [143784.513241] CPU 2: Bank 5: 3200121020080400
> [143784.513241] Kernel panic - not syncing: CPU context corrupt
> [143784.513282] Rebooting in 3 seconds..
Hi,
Actually, I suggested that you read the Machine Check Exception help
because I think you should first test your hardware instead of
sending configs. This type of error isn't usually seen with netdev
bugs.
Since I'm not a hardware expert I added linux-kernel to Cc, and
you should probably do the same (I added it to this one). But until
you get better advice, I think you should do some long and heavy
testing of your PCs, especially for overheating or memory problems.
We can start to analyze other bugs after we are sure the hardware is
OK.
BTW, your attachments are probably too big for the lists, so the
message may have been dropped. It would be better to add a link to a
server or use bugzilla for this.
Thanks,
Jarek P.
>
> Attached is all the info that I could get from the PC. Maybe the problem is that we use
> Core Duo Quad processors? They are 64-bit, but the kernel and software are compiled
> as 32-bit. On 2 x "OLD HT(2 core) Xeon 32 bit" PCs everything works great...
>
> Simple steps to reproduce:
> Add iptables and tc rules, push more than 500 Mbit/s of total traffic (we have
> more than 300/200 Mbit/s in/out) from any (many?) IPs that are present in the TC rules,
> and run any CPU-heavy process (such as compiling)...
>
> Thanks for the answers!
>
> Denys Fedoryshchenko:
> Hello!
> I am trying nmi_watchdog...
> I hope it helps. This PC does have a hardware watchdog (the BIOS has settings
> for it), but the kernel has no module for it - /S3210SH/ (ICH9-R chipset).
> I think its ID simply has not been added to the driver. I will try writing to the
> driver's author - wim@iguana.be.
> One question, please... this line:
> [ 0.143332] APIC timer registered as dummy, due to nmi_watchdog=1!
> Is this a normal start of nmi_watchdog, or do I need to use nmi_watchdog=2?
>
> Thanks for answers!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-22 6:53 ` Machine Check Exception " Jarek Poplawski
@ 2008-09-22 8:05 ` Jarek Poplawski
2008-09-22 9:40 ` Badalian Vyacheslav
1 sibling, 0 replies; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-22 8:05 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
On Mon, Sep 22, 2008 at 06:53:39AM +0000, Jarek Poplawski wrote:
...
> [...]I think you should do some long and heavy
> testing of your PCs especially for overheating or memory problems.
but also power supply, network card etc.
Jarek P.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-22 6:53 ` Machine Check Exception " Jarek Poplawski
2008-09-22 8:05 ` Jarek Poplawski
@ 2008-09-22 9:40 ` Badalian Vyacheslav
2008-09-22 11:24 ` Jarek Poplawski
1 sibling, 1 reply; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-22 9:40 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
Thanks for the answer, Jarek!
I posted it to the bug tracker - http://bugzilla.kernel.org/show_bug.cgi?id=11618
I don't think it is a hardware error, because we have this problem on 10
servers with the 2.6.26.2 kernel +)
On Friday night I compiled 2.6.26.5 and had 2 panics on the one PC that has
the highest load and 1 panic on another PC.
I wrote to the netdev list because the first messages looked like this:
[ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
[ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 4956.420300] Tx Queue <0>
[ 4956.420300] TDH <81>
[ 4956.420301] TDT <81>
[ 4956.420302] next_to_use <81>
[ 4956.420302] next_to_clean <d6>
[ 4956.420303] buffer_info[next_to_clean]
[ 4956.420303] time_stamp <15498d>
[ 4956.420304] next_to_watch <d6>
[ 4956.420304] jiffies <15511c>
[ 4956.420305] next_to_watch.status <1>
[ 4956.420537] eth1: Detected Tx Unit Hang:
[ 4956.420538] TDH <b0>
[ 4956.420538] TDT <b0>
[ 4956.420539] next_to_use <b0>
[ 4956.420539] next_to_clean <5>
[ 4956.420540] buffer_info[next_to_clean]:
[ 4956.420540] time_stamp <15498e>
[ 4956.420541] next_to_watch <5>
[ 4956.420542] jiffies <15511c>
[ 4956.420542] next_to_watch.status <1>
[ 4956.423064] CPU 1: Bank 0: 3200004000000800
[ 4956.423190] CPU 1: Bank 5: 3200220024080400
[ 4956.423315] Kernel panic - not syncing: CPU context corrupt
[ 4956.423933] Rebooting in 3 seconds..
But with 2.6.26.5 I have not seen errors like this for 2 days... Also, if the system has no network load, I can't trigger the panic with cpuburn or by compiling sources...
Anyway, I think it is good that my message also goes to the general mailing list and bugzilla...
I will try to get more info... if you or anyone else has an idea how to test this bug, I can do it)
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-22 9:40 ` Badalian Vyacheslav
@ 2008-09-22 11:24 ` Jarek Poplawski
2008-09-22 13:00 ` Badalian Vyacheslav
0 siblings, 1 reply; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-22 11:24 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
On Mon, Sep 22, 2008 at 01:40:35PM +0400, Badalian Vyacheslav wrote:
> Thanks for the answer, Jarek!
> I posted it to the bug tracker - http://bugzilla.kernel.org/show_bug.cgi?id=11618
>
> I don't think it is a hardware error, because we have this problem on 10
> servers with the 2.6.26.2 kernel +)
> On Friday night I compiled 2.6.26.5 and had 2 panics on the one PC that has
> the highest load and 1 panic on another PC.
> I wrote to the netdev list because the first messages looked like this:
>
> [ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
> [ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [ 4956.420300] Tx Queue <0>
> [ 4956.420300] TDH <81>
> [ 4956.420301] TDT <81>
> [ 4956.420302] next_to_use <81>
> [ 4956.420302] next_to_clean <d6>
> [ 4956.420303] buffer_info[next_to_clean]
> [ 4956.420303] time_stamp <15498d>
> [ 4956.420304] next_to_watch <d6>
> [ 4956.420304] jiffies <15511c>
> [ 4956.420305] next_to_watch.status <1>
> [ 4956.420537] eth1: Detected Tx Unit Hang:
> [ 4956.420538] TDH <b0>
> [ 4956.420538] TDT <b0>
> [ 4956.420539] next_to_use <b0>
> [ 4956.420539] next_to_clean <5>
> [ 4956.420540] buffer_info[next_to_clean]:
> [ 4956.420540] time_stamp <15498e>
> [ 4956.420541] next_to_watch <5>
> [ 4956.420542] jiffies <15511c>
> [ 4956.420542] next_to_watch.status <1>
> [ 4956.423064] CPU 1: Bank 0: 3200004000000800
> [ 4956.423190] CPU 1: Bank 5: 3200220024080400
> [ 4956.423315] Kernel panic - not syncing: CPU context corrupt
> [ 4956.423933] Rebooting in 3 seconds..
Yes, messages like these often indicate netdev problems, but not together
with a Machine Check Exception and "CPU context corrupt",
which should mean a severe hardware problem (unless some bug,
probably not in netdev, triggers them).
>
> But with 2.6.26.5 I have not seen errors like this for 2 days... Also, if the system has no network load, I can't trigger the panic with cpuburn or by compiling sources...
> Anyway, I think it is good that my message also goes to the general mailing list and bugzilla...
>
> I will try to get more info... if you or anyone else has an idea how to test this bug, I can do it)
I see you have some advice in bugzilla. These people really know more
about these things, so you should try this first. I think, they expect
you to compile the most current kernel version (tip) using git for
this. You can do this using the instructions from Ingo Molnar's README.
Make a script from this: from the beginning to the "git checkout ...".
Of course you have to install git before. After running the commands
it will download the kernel sources to a subdir (takes time). Copy your
config there, make oldconfig, make etc. Then send them dmesg after
rebooting. If you have any problems - write. Alternatively, I guess,
you could try the current 2.6.27-rc7 kernel at least.
Jarek P.
BTW: could you try to trigger this bug with one network card off?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-22 11:24 ` Jarek Poplawski
@ 2008-09-22 13:00 ` Badalian Vyacheslav
2008-09-22 17:23 ` Jarek Poplawski
0 siblings, 1 reply; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-22 13:00 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
> BTW: could you try to trigger this bug with one network card off?
>
Sure! I stopped eth1 and ran "/etc/init.d/bgpd stop" (this PC no longer receives
routed traffic)....
I ran "emerge portage" 2 times and got:
[25492.187405] CPU 3: Machine Check Exception: 0000000000000005
[25492.187405] MCE: The hardware reports a non fatal, correctable incident occurred on CPU 1.
[25492.187405] Bank 0: b200004000000800
[25492.187405] MCE: The hardware reports a non fatal, correctable incident occurred on CPU 1.
[25492.187405] Bank 5: b200120014040400
[25497.124884] CPU 1: Machine Check Exception: 0000000000000004
[25497.124884] Kernel panic - not syncing: Unable to continue
[25497.124884] Rebooting in 3 seconds..
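The value after "Machine Check Exception:" changes between 0000000000000005 and 0000000000000004 in these logs. Assuming it is the global machine-check status register (IA32_MCG_STATUS), its low three bits can be read with a small Python sketch like the one below; the names are illustrative only.

```python
# Low flag bits of the architectural IA32_MCG_STATUS register (assumed to be
# the value printed after "Machine Check Exception:").
MCG_FLAGS = ((0, "RIPV"), (1, "EIPV"), (2, "MCIP"))


def decode_mcg_status(value_hex):
    """Return the names of the flag bits set in a global status value."""
    value = int(value_hex, 16)
    return [name for bit, name in MCG_FLAGS if (value >> bit) & 1]


if __name__ == "__main__":
    # Both values appear in the log above.
    for raw in ("0000000000000005", "0000000000000004"):
        print(raw, decode_mcg_status(raw))
```

Under that assumption, 5 means a machine check is in progress (MCIP) with a valid restart address (RIPV), while 4 means MCIP without RIPV, that is, the interrupted code cannot be reliably resumed.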
Bug tracker updated.... I can reproduce the error on all 10 servers with
2.6.26.5... I use TC and don't want to test 2.6.27-rc because it has (if I
understand correctly) the multiqueue feature, which is not tested...
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-22 13:00 ` Badalian Vyacheslav
@ 2008-09-22 17:23 ` Jarek Poplawski
2008-09-23 7:43 ` Badalian Vyacheslav
0 siblings, 1 reply; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-22 17:23 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
On Mon, Sep 22, 2008 at 05:00:57PM +0400, Badalian Vyacheslav wrote:
>
> > BTW: could you try to trigger this bug with one network card off?
> >
>
>
> Sure! I stopped eth1 and ran "/etc/init.d/bgpd stop" (this PC no longer
> routes any traffic)...
>
> Then I ran "emerge portage" twice and got:
>
> [25492.187405] CPU 3: Machine Check Exception: 0000000000000005
> [25492.187405] MCE: The hardware reports a non fatal, correctable
> incident occurred on CPU 1.
> [25492.187405] Bank 0: b200004000000800
> [25492.187405] MCE: The hardware reports a non fatal, correctable
> incident occurred on CPU 1.
> [25492.187405] Bank 5: b200120014040400
> [25497.124884] CPU 1: Machine Check Exception: 0000000000000004
> [25497.124884] Kernel panic - not syncing: Unable to continue
> [25497.124884] Rebooting in 3 seconds..
>
> Bugtracker updated... I can reproduce the error on all 10 servers with
> 2.6.26.5... I use TC and do not want to test 2.6.27-rc because it has (if
> I understand correctly) the multiqueue feature, which is not tested...
Actually, it's quite well tested, especially by Denys, and I doubt it
will be much better in 2.6.27. BTW, could you now try the reverse:
eth1 started and eth0 stopped?
Thanks,
Jarek P.
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-22 17:23 ` Jarek Poplawski
@ 2008-09-23 7:43 ` Badalian Vyacheslav
2008-09-23 9:25 ` Jarek Poplawski
0 siblings, 1 reply; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-23 7:43 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
Hello
I stopped both eth1 and eth0, ran "emerge portage", and still got the
exception. Now I think the problem is not in the network part.
I missed what happened in 2.6.27 with multiqueue and traffic shaping
because I had a lot of work and couldn't read the whole netdev list =(
As I understand it, 2.6.27-rc supports multiqueue, but how will it work
with HTB/SFQ?
Must tc rules in 2.6.27 have one root queue (with all queues going into
this tree), or do I need to create many qdiscs and attach them to the
device queues (I read something about queue2band)?
If it is easy for you, can you briefly describe this part of the changes?
P.S. I now think the problem is not in the network part of the kernel,
so I will stop CCing netdev and Denys Fedoryshchenko. Thanks for your
work and thanks for the help!
Thanks.
> Actually, it's quite well tested, especially by Denys, and I doubt it
> will be much better in 2.6.27. BTW, could you now try the reverse:
> eth1 started and eth0 stopped?
>
> Thanks,
> Jarek P.
>
>
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-23 7:43 ` Badalian Vyacheslav
@ 2008-09-23 9:25 ` Jarek Poplawski
2008-09-23 10:36 ` Badalian Vyacheslav
0 siblings, 1 reply; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-23 9:25 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: Denys Fedoryshchenko, netdev, linux-kernel
On Tue, Sep 23, 2008 at 11:43:08AM +0400, Badalian Vyacheslav wrote:
> Hello
>
> I stopped both eth1 and eth0, ran "emerge portage", and still got the
> exception. Now I think the problem is not in the network part.
>
> I missed what happened in 2.6.27 with multiqueue and traffic shaping
> because I had a lot of work and couldn't read the whole netdev list =(
> As I understand it, 2.6.27-rc supports multiqueue, but how will it work
> with HTB/SFQ?
> Must tc rules in 2.6.27 have one root queue (with all queues going into
> this tree), or do I need to create many qdiscs and attach them to the
> device queues (I read something about queue2band)?
> If it is easy for you, can you briefly describe this part of the changes?
There are two main cases:
1) The default qdiscs (created while activating a new net device):
depending on the driver (most drivers are still single-queue), an
independent pfifo_fast qdisc is created for each supported tx queue;
if a driver doesn't change this, packets are directed to them
automatically, according to a hash function that tries to separate
different flows. This should be the fastest solution because there
are separate qdisc and transmit locks, which can be taken by
different cpus at the same time.
2) Non-default qdiscs (any qdiscs added with tc): there is only one
root qdisc (with its tree) as before, dequeued to all tx queues (if
available). Since there is only one qdisc lock, and an additional flag
preventing other processes from running the qdisc at the same time,
there is not much advantage from SMP, except on tx locking. All previous
tc configs should work without changes (except sch_prio and sch_rr
used for multiqueuing, replaced by sch_multiq and act_skbedit now).
Probably in some cases adding sch_multiq to a tree for separating
qdisc queues per tx queues could be useful.
Cheers,
Jarek P.
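The two cases above can be illustrated with a small toy model. This is
only a sketch under stated assumptions, not kernel code: zlib.crc32
stands in for the kernel's own flow hash, the class and function names
are hypothetical, and locking is not modelled at all (it is only noted
in the comments).

```python
import zlib
from collections import deque


def pick_tx_queue(flow, num_tx_queues):
    """Map a flow tuple to a tx queue index (crc32 is only a stand-in hash)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_tx_queues


class PerQueueFifos:
    """Case 1: an independent FIFO per tx queue (each has its own lock in the kernel)."""

    def __init__(self, num_tx_queues):
        self.queues = [deque() for _ in range(num_tx_queues)]

    def enqueue(self, flow, seq):
        index = pick_tx_queue(flow, len(self.queues))
        self.queues[index].append((flow, seq))
        return index


class SingleRootQdisc:
    """Case 2: one shared queue (one lock); dequeued packets are handed to
    the tx queue their flow maps to."""

    def __init__(self, num_tx_queues):
        self.root = deque()
        self.num_tx_queues = num_tx_queues

    def enqueue(self, flow, seq):
        self.root.append((flow, seq))

    def dequeue(self):
        if not self.root:
            return None
        flow, seq = self.root.popleft()
        return pick_tx_queue(flow, self.num_tx_queues), flow, seq


if __name__ == "__main__":
    # Example flows: (src address, dst address, src port, dst port).
    flow_a = ("10.0.0.1", "10.0.0.2", 5000, 80)
    flow_b = ("10.0.0.3", "10.0.0.2", 5001, 80)
    dev = PerQueueFifos(4)
    for seq in range(3):
        print("flow_a ->", dev.enqueue(flow_a, seq),
              "flow_b ->", dev.enqueue(flow_b, seq))
```

Because the queue index depends only on the flow tuple, all packets of
one flow land in the same FIFO and keep their relative order, while
different flows may be spread over different tx queues.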
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-23 9:25 ` Jarek Poplawski
@ 2008-09-23 10:36 ` Badalian Vyacheslav
2008-09-23 11:57 ` Jarek Poplawski
0 siblings, 1 reply; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-23 10:36 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev
> 2) Non-default qdiscs (any qdiscs added with tc): there is only one
> root qdisc (with its tree) as before, dequeued to all tx queues (if
> available). Since there is only one qdisc lock, and an additional flag
> preventing other processes from running the qdisc at the same time,
> there is not much advantage from SMP, except on tx locking. All previous
> tc configs should work without changes (except sch_prio and sch_rr
> used for multiqueuing, replaced by sch_multiq and act_skbedit now).
> Probably in some cases adding sch_multiq to a tree for separating
> qdisc queues per tx queues could be useful.
>
Many thanks for the detailed information!
Yes! It sounds great to me. I can also stress test this feature in our
network if you need it.
I only have 2 questions...
1. In the default case (no tc rules created by the user, just qdiscs
auto-created by the network card driver/module), will it work normally
with traffic that must be delivered "as is" (not shaped), like IPTV or
other multicast/unicast video streams?
If I understand the logic, we have 8 cpus/cores and 4 TX queues and 4 RX
queues, with one cpu linked to 1 TX/RX queue. But if 1 cpu is kept busy
by some process, this cpu will send its packets later than the other
cpus, and the packets are effectively shaped, while streams must reach
the receiving device packet by packet, in order. I understand that it
simply needs to use a hash function for the TX queue, but if I
understand correctly, RX can't separate packets into different queues by
a hash function (is it done by hardware?), so packets may be shaped at
the rx stage because one CPU gets them later than needed?
I do not wish to waste your time... just say whether it will work
normally by default and I will sleep easy :)
2. I can't locate a module like sch_multiq in the latest 2.6.27-rc tree
and can't find information about it on Google... I need to know only one
thing: what parameters for hashing were planned for it?
Thanks and thanks again! And again sorry for my English.
Best regards, Badalyan Vyacheslav.
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-23 10:36 ` Badalian Vyacheslav
@ 2008-09-23 11:57 ` Jarek Poplawski
2008-09-23 12:06 ` Jarek Poplawski
2008-09-23 12:16 ` Badalian Vyacheslav
0 siblings, 2 replies; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-23 11:57 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: netdev
On Tue, Sep 23, 2008 at 02:36:05PM +0400, Badalian Vyacheslav wrote:
>
> > 2) Non-default qdiscs (any qdiscs added with tc): there is only one
> > root qdisc (with its tree) as before, dequeued to all tx queues (if
> > available). Since there is only one qdisc lock, and an additional flag
> > preventing other processes from running the qdisc at the same time,
> > there is not much advantage from SMP, except on tx locking. All previous
> > tc configs should work without changes (except sch_prio and sch_rr
> > used for multiqueuing, replaced by sch_multiq and act_skbedit now).
> > Probably in some cases adding sch_multiq to a tree for separating
> > qdisc queues per tx queues could be useful.
> >
> Many thanks for the detailed information!
> Yes! It sounds great to me. I can also stress test this feature in our
> network if you need it.
Actually, I don't use these things too much, but I guess, you'll need
this more. Main issues were tested and fixed, but there could be always
some details not used, not noticed or not reported until you decide to
use this.
>
> I only have 2 questions...
>
> 1. In the default case (no tc rules created by the user, just qdiscs
> auto-created by the network card driver/module), will it work normally
> with traffic that must be delivered "as is" (not shaped), like IPTV or
> other multicast/unicast video streams?
> If I understand the logic, we have 8 cpus/cores and 4 TX queues and 4 RX
> queues, with one cpu linked to 1 TX/RX queue. But if 1 cpu is kept busy
> by some process, this cpu will send its packets later than the other
> cpus, and the packets are effectively shaped, while streams must reach
> the receiving device packet by packet, in order. I understand that it
> simply needs to use a hash function for the TX queue, but if I
> understand correctly, RX can't separate packets into different queues by
> a hash function (is it done by hardware?), so packets may be shaped at
> the rx stage because one CPU gets them later than needed?
> I do not wish to waste your time... just say whether it will work
> normally by default and I will sleep easy :)
Yes, I'm not sure I understand the question, but I think you shouldn't
expect too much, at least in 2.6.27. There is a lot of work going on in
drivers around this multiqueuing (and RX hashing), which should be
available in upcoming kernels, but I'm not tracking this too closely...
Anyway, with the basic support (which really isn't common for drivers in
2.6.27 yet), this separation is done only for TX, just before enqueuing.
>
> 2. I can't locate a module like sch_multiq in the latest 2.6.27-rc tree
> and can't find information about it on Google... I need to know only one
> thing: what parameters for hashing were planned for it?
sch_multiq doesn't use any params for hashing now - it uses mapping
in packets to separate them to different bands/queues. So, by default
it'll respect common hashing. You can change this using any filter with
act_skbedit (Documentation/networking/multiqueue.txt).
> Thanks and thanks again! And again sorry for my English.
Don't worry, English should understand...
Jarek P.
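A toy model of the band selection described above, for kernels that
actually include sch_multiq and act_skbedit. This is a sketch, not the
kernel implementation: the Packet and MultiqModel names are
hypothetical, queue_mapping stands in for the mapping stored in the
packet, and falling back to band 0 for an out-of-range mapping is an
assumption of this model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    payload: str
    queue_mapping: int = 0  # stands in for the mapping carried by the packet


class MultiqModel:
    """One FIFO band per tx queue; the band is taken from the packet's mapping."""

    def __init__(self, bands):
        self.bands = [deque() for _ in range(bands)]

    def classify(self, pkt):
        band = pkt.queue_mapping
        # Assumption of this model: out-of-range mappings go to band 0.
        return band if 0 <= band < len(self.bands) else 0

    def enqueue(self, pkt):
        band = self.classify(pkt)
        self.bands[band].append(pkt)
        return band


def skbedit_queue_mapping(pkt, match, new_mapping):
    """Model of a filter with an skbedit-style action: if the filter
    matches, override the packet's queue mapping."""
    if match(pkt):
        pkt.queue_mapping = new_mapping
    return pkt


if __name__ == "__main__":
    qdisc = MultiqModel(bands=4)
    pkt = Packet("video", queue_mapping=2)  # mapping set by the common hash
    print("default band:", qdisc.classify(pkt))
    skbedit_queue_mapping(pkt, lambda p: p.payload == "video", 3)
    print("band after override:", qdisc.classify(pkt))
```

The point of the model is only that no hashing parameters belong to the
qdisc itself: it follows whatever mapping the packet already carries,
and a filter action is what changes that mapping.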
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-23 11:57 ` Jarek Poplawski
@ 2008-09-23 12:06 ` Jarek Poplawski
2008-09-23 12:16 ` Badalian Vyacheslav
1 sibling, 0 replies; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-23 12:06 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: netdev
On Tue, Sep 23, 2008 at 11:57:08AM +0000, Jarek Poplawski wrote:
> On Tue, Sep 23, 2008 at 02:36:05PM +0400, Badalian Vyacheslav wrote:
> >
> > > 2) Non-default qdiscs (any qdiscs added with tc): there is only one
> > > root qdisc (with its tree) as before, dequeued to all tx queues (if
> > > available). Since there is only one qdisc lock, and an additional flag
> > > preventing other processes from running the qdisc at the same time,
> > > there is not much advantage from SMP, except on tx locking. All previous
> > > tc configs should work without changes (except sch_prio and sch_rr
> > > used for multiqueuing, replaced by sch_multiq and act_skbedit now).
> > > Probably in some cases adding sch_multiq to a tree for separating
> > > qdisc queues per tx queues could be useful.
...
> > 2. I can't locate a module like sch_multiq in the latest 2.6.27-rc tree
> > and can't find information about it on Google... I need to know only one
> > thing: what parameters for hashing were planned for it?
>
> sch_multiq doesn't use any params for hashing now - it uses mapping
> in packets to separate them to different bands/queues. So, by default
> it'll respect common hashing. You can change this using any filter with
> act_skbedit (Documentation/networking/multiqueue.txt).
OOPS!!! This 2.6.27-rc is so ...old I forgot the sch_multiq and
act_skbedit are only for the -next!
I'm very sorry for misleading!!!
Jarek P.
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-23 11:57 ` Jarek Poplawski
2008-09-23 12:06 ` Jarek Poplawski
@ 2008-09-23 12:16 ` Badalian Vyacheslav
2008-09-23 18:26 ` Jarek Poplawski
1 sibling, 1 reply; 18+ messages in thread
From: Badalian Vyacheslav @ 2008-09-23 12:16 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev
> Actually, I don't use these things too much, but I guess, you'll need
> this more. Main issues were tested and fixed, but there could be always
> some details not used, not noticed or not reported until you decide to
> use this.
>
Yep. I will begin testing it this week if I can find time for it...
> Yes, I'm not sure I understand the question, but I think you shouldn't
> expect too much, at least in 2.6.27. There is a lot of work going on in
> drivers around this multiqueuing (and RX hashing), which should be
> available in upcoming kernels, but I'm not tracking this too closely...
> Anyway, with the basic support (which really isn't common for drivers in
> 2.6.27 yet), this separation is done only for TX, just before enqueuing.
>
>
I simply hope that, after being divided into bands/queues and sent back
to the network, all packets keep the "first in, first out" order that
services like video streaming need...
> sch_multiq doesn't use any params for hashing now - it uses mapping
> in packets to separate them to different bands/queues. So, by default
> it'll respect common hashing. You can change this using any filter with
> act_skbedit (Documentation/networking/multiqueue.txt).
>
>
OK! I will read all the related information and rewrite all the tc
generation scripts to test it!
> Don't worry, English should understand...
>
>
It's great that the world has people who read messages like this and try
to understand them... thanks!
> OOPS!!! This 2.6.27-rc is so ...old I forgot the sch_multiq and
> act_skbedit are only for the -next!
> I'm very sorry for misleading!!
Not to worry ;) Thanks again. I will try the latest -next git after
someone fixes the Intel exception issue ;)
* Re: Machine Check Exception Re: NetDev! Please help!
2008-09-23 12:16 ` Badalian Vyacheslav
@ 2008-09-23 18:26 ` Jarek Poplawski
0 siblings, 0 replies; 18+ messages in thread
From: Jarek Poplawski @ 2008-09-23 18:26 UTC (permalink / raw)
To: Badalian Vyacheslav; +Cc: netdev
Badalian Vyacheslav wrote, On 09/23/2008 02:16 PM:
...
> I simply hope that, after being divided into bands/queues and sent back
> to the network, all packets keep the "first in, first out" order that
> services like video streaming need...
Of course this is the aim, and if it's really first in per flow, and
flows are bound to devs/irqs, the order after hashing should be the
same. But... shit happens - e.g. a bug in the hash which could change
this order was fixed recently. And you should remember this default
qdisc is a prioritized fifo, so you should be careful with TOS etc.
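
To make the "prioritized fifo" remark concrete, here is a toy model of
a three-band pfifo_fast-like queue. It is a sketch, not kernel code:
the class name is hypothetical, and the table below is assumed to be
the default priomap, indexed by the packet priority the stack derives
from TOS.

```python
from collections import deque

# Assumed default priomap: index is the packet priority (0..15),
# value is the band; band 0 is always dequeued first.
PRIO2BAND = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


class PfifoFastModel:
    """Three FIFO bands; a lower band number is served strictly first."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, priority, payload):
        self.bands[PRIO2BAND[priority & 0xF]].append(payload)

    def dequeue(self):
        for band in self.bands:
            if band:
                return band.popleft()
        return None


if __name__ == "__main__":
    q = PfifoFastModel()
    q.enqueue(0, "a")  # best effort -> band 1
    q.enqueue(6, "b")  # interactive -> band 0
    q.enqueue(0, "c")  # best effort -> band 1
    print([q.dequeue() for _ in range(3)])
```

Order is kept within a band, but a packet that arrives later with a
higher-priority marking is sent first. So packets of one stream keep
their order only if they all carry the same TOS marking.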
>> sch_multiq doesn't use any params for hashing now - it uses mapping
>> in packets to separate them to different bands/queues. So, by default
>> it'll respect common hashing. You can change this using any filter with
>> act_skbedit (Documentation/networking/multiqueue.txt).
>>
>>
> OK! I will read all the related information and rewrite all the tc
> generation scripts to test it!
Do you mean net-next? I think you should first do some tests if you
really need this.
...
> Not to worry ;) Thanks again. I will try the latest -next git after
> someone fixes the Intel exception issue ;)
I think, maybe you could try what Ingo Molnar asked: -tip git, for now?
Jarek P.