* codel/fq_codel triggers heaps of WARNs in net/sched/sch_hfsc.c:1426
From: Miroslav Kratochvil @ 2016-05-31 7:03 UTC
To: netdev
Hello everyone,
I've been trying to debug an issue that arises when codel (or
fq_codel) qdiscs are attached to an HFSC leaf class. The basic problem
is that at random points in time the kernel log gets flooded (tens of
MB of messages) with WARNINGs at net/sched/sch_hfsc.c:1426; the full
text of several is attached below. The warnings appear at random
times, but always in (large) groups.
I suspected that this was related to a similar issue with SFQ, which
was fixed by trimming the stats produced by SFQ.
Documented here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945
A similar patch for codel and fq_codel was recommended to me to try out, here:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/net/sched/sch_fq_codel.c?h=linux-4.5.y&id=01465faa0e2d311512690724196042f9bb466034
but it didn't solve the issue.
Also, there's my original Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=824790
Is there a good way to debug this? I currently have a test system
where I can trigger the message easily with any custom kernel, so
I'd appreciate any advice on what to try next.
The messages below are from a 4.5.5 test kernel on Debian with ~20k
HFSC classes. I'll try 4.6 ASAP, but there seems to be no relevant
change there. The tg3 driver is not to blame (the same happens with
e1000, e1000e, igb and ixgbe). I'm not sure whether u32 filter hash
buckets could trigger this behavior, but I hope not (currently I have
no way to test this without u32).
Thanks in advance for any thoughts on this.
-mk
Attached full warnings:
[ 1320.176095] ------------[ cut here ]------------
[ 1320.176104] WARNING: CPU: 2 PID: 0 at net/sched/sch_hfsc.c:1426
hfsc_dequeue+0x300/0x320 [sch_hfsc]()
[ 1320.176105] Modules linked in: sch_codel(E) binfmt_misc(E)
act_mirred(E) act_gact(E) sch_ingress(E) sch_sfq(E) cls_u32(E)
sch_hfsc(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
kvm(E) irqbypass(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E)
acpi_power_meter(E) mgag200(E) ttm(E) drm_kms_helper(E) joydev(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm(E)
i2c_algo_bit(E) hmac(E) drbg(E) ansi_cprng(E) 8250_fintek(E)
aesni_intel(E) ipmi_devintf(E) aes_x86_64(E) lrw(E) gf128mul(E)
evdev(E) sg(E) iTCO_wdt(E) iTCO_vendor_support(E) pcspkr(E) wmi(E)
shpchp(E) glue_helper(E) acpi_pad(E) ipmi_si(E) ipmi_msghandler(E)
mei_me(E) sb_edac(E) ablk_helper(E) cryptd(E) lpc_ich(E) button(E)
edac_core(E) mei(E) mfd_core(E) tpm_tis(E) tpm(E)
[ 1320.176141] processor(E) ifb(E) autofs4(E) xfs(E) libcrc32c(E)
hid_generic(E) usbhid(E) hid(E) sr_mod(E) sd_mod(E) cdrom(E)
crc32c_intel(E) ixgbe(E) dca(E) vxlan(E) ip6_udp_tunnel(E)
udp_tunnel(E) mdio(E) ehci_pci(E) ahci(E) ehci_hcd(E) libahci(E)
libata(E) tg3(E) ptp(E) pps_core(E) megaraid_sas(E) usbcore(E)
libphy(E) usb_common(E) scsi_mod(E) fjes(E)
[ 1320.176159] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.5.5 #1
[ 1320.176160] Hardware name: /08DM12, BIOS 2.1.2 01/20/2014
[ 1320.176162] 0000000000000286 21264a740a0fcbac ffffffff81302ff5
0000000000000000
[ 1320.176164] ffffffffc04db049 ffffffff81078ced ffff880610c85948
00000004cd5ee44c
[ 1320.176166] ffff880610c85800 ffff880610c85c90 ffff880606a67600
ffffffffc04d9550
[ 1320.176168] Call Trace:
[ 1320.176169] <IRQ> [<ffffffff81302ff5>] ? dump_stack+0x5c/0x77
[ 1320.176179] [<ffffffff81078ced>] ? warn_slowpath_common+0x7d/0xb0
[ 1320.176181] [<ffffffffc04d9550>] ? hfsc_dequeue+0x300/0x320 [sch_hfsc]
[ 1320.176185] [<ffffffff814db925>] ? __qdisc_run+0x65/0x190
[ 1320.176189] [<ffffffff814b33f6>] ? net_tx_action+0xd6/0x230
[ 1320.176191] [<ffffffff8107d4c8>] ? __do_softirq+0xf8/0x290
[ 1320.176193] [<ffffffff8107d7ab>] ? irq_exit+0x9b/0xa0
[ 1320.176196] [<ffffffff815b50df>] ? do_IRQ+0x4f/0xd0
[ 1320.176199] [<ffffffff815b3202>] ? common_interrupt+0x82/0x82
[ 1320.176200] <EOI> [<ffffffff8147dbf8>] ? cpuidle_enter_state+0x118/0x2c0
[ 1320.176203] [<ffffffff8147dbe5>] ? cpuidle_enter_state+0x105/0x2c0
[ 1320.176207] [<ffffffff810b8837>] ? cpu_startup_entry+0x287/0x340
[ 1320.176210] [<ffffffff8104d40a>] ? start_secondary+0x15a/0x190
[ 1320.176211] ---[ end trace b5b10ee435b3246b ]---
[ 1320.176254] ------------[ cut here ]------------
[ 1320.176256] WARNING: CPU: 2 PID: 0 at net/sched/sch_hfsc.c:1426
hfsc_dequeue+0x300/0x320 [sch_hfsc]()
[ 1320.176257] Modules linked in: sch_codel(E) binfmt_misc(E)
act_mirred(E) act_gact(E) sch_ingress(E) sch_sfq(E) cls_u32(E)
sch_hfsc(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
kvm(E) irqbypass(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E)
acpi_power_meter(E) mgag200(E) ttm(E) drm_kms_helper(E) joydev(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm(E)
i2c_algo_bit(E) hmac(E) drbg(E) ansi_cprng(E) 8250_fintek(E)
aesni_intel(E) ipmi_devintf(E) aes_x86_64(E) lrw(E) gf128mul(E)
evdev(E) sg(E) iTCO_wdt(E) iTCO_vendor_support(E) pcspkr(E) wmi(E)
shpchp(E) glue_helper(E) acpi_pad(E) ipmi_si(E) ipmi_msghandler(E)
mei_me(E) sb_edac(E) ablk_helper(E) cryptd(E) lpc_ich(E) button(E)
edac_core(E) mei(E) mfd_core(E) tpm_tis(E) tpm(E)
[ 1320.176276] processor(E) ifb(E) autofs4(E) xfs(E) libcrc32c(E)
hid_generic(E) usbhid(E) hid(E) sr_mod(E) sd_mod(E) cdrom(E)
crc32c_intel(E) ixgbe(E) dca(E) vxlan(E) ip6_udp_tunnel(E)
udp_tunnel(E) mdio(E) ehci_pci(E) ahci(E) ehci_hcd(E) libahci(E)
libata(E) tg3(E) ptp(E) pps_core(E) megaraid_sas(E) usbcore(E)
libphy(E) usb_common(E) scsi_mod(E) fjes(E)
[ 1320.176287] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W E 4.5.5 #1
[ 1320.176288] Hardware name: /08DM12, BIOS 2.1.2 01/20/2014
[ 1320.176289] 0000000000000286 21264a740a0fcbac ffffffff81302ff5
0000000000000000
[ 1320.176291] ffffffffc04db049 ffffffff81078ced ffff880610c85948
00000004cd5eee0c
[ 1320.176292] ffff880610c85800 ffff880610c85c90 000000000000004c
ffffffffc04d9550
[ 1320.176295] Call Trace:
[ 1320.176295] <IRQ> [<ffffffff81302ff5>] ? dump_stack+0x5c/0x77
[ 1320.176299] [<ffffffff81078ced>] ? warn_slowpath_common+0x7d/0xb0
[ 1320.176301] [<ffffffffc04d9550>] ? hfsc_dequeue+0x300/0x320 [sch_hfsc]
[ 1320.176303] [<ffffffff814db925>] ? __qdisc_run+0x65/0x190
[ 1320.176305] [<ffffffff814b33f6>] ? net_tx_action+0xd6/0x230
[ 1320.176308] [<ffffffff8107d4c8>] ? __do_softirq+0xf8/0x290
[ 1320.176310] [<ffffffff8107d7ab>] ? irq_exit+0x9b/0xa0
[ 1320.176311] [<ffffffff815b50df>] ? do_IRQ+0x4f/0xd0
[ 1320.176313] [<ffffffff815b3202>] ? common_interrupt+0x82/0x82
[ 1320.176314] <EOI> [<ffffffff8147dbf8>] ? cpuidle_enter_state+0x118/0x2c0
[ 1320.176316] [<ffffffff8147dbe5>] ? cpuidle_enter_state+0x105/0x2c0
[ 1320.176318] [<ffffffff810b8837>] ? cpu_startup_entry+0x287/0x340
[ 1320.176320] [<ffffffff8104d40a>] ? start_secondary+0x15a/0x190
[ 1320.176322] ---[ end trace b5b10ee435b3246c ]---
[ 1320.176332] ------------[ cut here ]------------
[ 1320.176334] WARNING: CPU: 2 PID: 0 at net/sched/sch_hfsc.c:1426
hfsc_dequeue+0x300/0x320 [sch_hfsc]()
[ 1320.176335] Modules linked in: sch_codel(E) binfmt_misc(E)
act_mirred(E) act_gact(E) sch_ingress(E) sch_sfq(E) cls_u32(E)
sch_hfsc(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
kvm(E) irqbypass(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E)
acpi_power_meter(E) mgag200(E) ttm(E) drm_kms_helper(E) joydev(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm(E)
i2c_algo_bit(E) hmac(E) drbg(E) ansi_cprng(E) 8250_fintek(E)
aesni_intel(E) ipmi_devintf(E) aes_x86_64(E) lrw(E) gf128mul(E)
evdev(E) sg(E) iTCO_wdt(E) iTCO_vendor_support(E) pcspkr(E) wmi(E)
shpchp(E) glue_helper(E) acpi_pad(E) ipmi_si(E) ipmi_msghandler(E)
mei_me(E) sb_edac(E) ablk_helper(E) cryptd(E) lpc_ich(E) button(E)
edac_core(E) mei(E) mfd_core(E) tpm_tis(E) tpm(E)
[ 1320.176354] processor(E) ifb(E) autofs4(E) xfs(E) libcrc32c(E)
hid_generic(E) usbhid(E) hid(E) sr_mod(E) sd_mod(E) cdrom(E)
crc32c_intel(E) ixgbe(E) dca(E) vxlan(E) ip6_udp_tunnel(E)
udp_tunnel(E) mdio(E) ehci_pci(E) ahci(E) ehci_hcd(E) libahci(E)
libata(E) tg3(E) ptp(E) pps_core(E) megaraid_sas(E) usbcore(E)
libphy(E) usb_common(E) scsi_mod(E) fjes(E)
[ 1320.176365] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W E 4.5.5 #1
[ 1320.176366] Hardware name: /08DM12, BIOS 2.1.2 01/20/2014
[ 1320.176366] 0000000000000286 21264a740a0fcbac ffffffff81302ff5
0000000000000000
[ 1320.176368] ffffffffc04db049 ffffffff81078ced ffff880610c85948
00000004cd5ef2d6
[ 1320.176370] ffff880610c85800 ffff880610c85c90 ffff880610e81e00
ffffffffc04d9550
[ 1320.176371] Call Trace:
[ 1320.176372] <IRQ> [<ffffffff81302ff5>] ? dump_stack+0x5c/0x77
[ 1320.176375] [<ffffffff81078ced>] ? warn_slowpath_common+0x7d/0xb0
[ 1320.176377] [<ffffffffc04d9550>] ? hfsc_dequeue+0x300/0x320 [sch_hfsc]
[ 1320.176379] [<ffffffff814db925>] ? __qdisc_run+0x65/0x190
[ 1320.176381] [<ffffffff814b7301>] ? __dev_queue_xmit+0x221/0x660
[ 1320.176384] [<ffffffffc0554626>] ? tcf_mirred+0xf6/0x178 [act_mirred]
[ 1320.176387] [<ffffffff814e11a1>] ? tcf_action_exec+0x41/0x70
[ 1320.176390] [<ffffffffc0532a02>] ? u32_classify+0x232/0x460 [cls_u32]
[ 1320.176392] [<ffffffff810e0a21>] ? hrtimer_interrupt+0xc1/0x190
[ 1320.176394] [<ffffffff8107d74c>] ? irq_exit+0x3c/0xa0
[ 1320.176396] [<ffffffff815b519e>] ? smp_apic_timer_interrupt+0x3e/0x50
[ 1320.176398] [<ffffffff815b34a2>] ? apic_timer_interrupt+0x82/0x90
[ 1320.176400] [<ffffffff814dcdea>] ? tc_classify+0x6a/0x120
[ 1320.176403] [<ffffffff814b4725>] ? __netif_receive_skb_core+0x495/0xa20
[ 1320.176405] [<ffffffff810bc7e2>] ? up+0x12/0x60
[ 1320.176408] [<ffffffff810c9624>] ? console_unlock+0x214/0x540
[ 1320.176410] [<ffffffff814b4d2f>] ? netif_receive_skb_internal+0x2f/0xa0
[ 1320.176411] [<ffffffff814b5c5b>] ? napi_gro_receive+0xbb/0x110
[ 1320.176416] [<ffffffffc0177700>] ? tg3_poll_work+0xd90/0xef0 [tg3]
[ 1320.176420] [<ffffffffc017789a>] ? tg3_poll_msix+0x3a/0x150 [tg3]
[ 1320.176421] [<ffffffff814b54de>] ? net_rx_action+0x22e/0x360
[ 1320.176423] [<ffffffff8107d4c8>] ? __do_softirq+0xf8/0x290
[ 1320.176425] [<ffffffff8107d7ab>] ? irq_exit+0x9b/0xa0
[ 1320.176427] [<ffffffff815b50df>] ? do_IRQ+0x4f/0xd0
[ 1320.176429] [<ffffffff815b3202>] ? common_interrupt+0x82/0x82
[ 1320.176429] <EOI> [<ffffffff8147dbf8>] ? cpuidle_enter_state+0x118/0x2c0
[ 1320.176432] [<ffffffff8147dbe5>] ? cpuidle_enter_state+0x105/0x2c0
[ 1320.176434] [<ffffffff810b8837>] ? cpu_startup_entry+0x287/0x340
[ 1320.176436] [<ffffffff8104d40a>] ? start_secondary+0x15a/0x190
[ 1320.176438] ---[ end trace b5b10ee435b3246d ]---
[ 1320.176443] ------------[ cut here ]------------
[ 1320.176446] WARNING: CPU: 2 PID: 0 at net/sched/sch_hfsc.c:1426
hfsc_dequeue+0x300/0x320 [sch_hfsc]()
[ 1320.176446] Modules linked in: sch_codel(E) binfmt_misc(E)
act_mirred(E) act_gact(E) sch_ingress(E) sch_sfq(E) cls_u32(E)
sch_hfsc(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
kvm(E) irqbypass(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E)
acpi_power_meter(E) mgag200(E) ttm(E) drm_kms_helper(E) joydev(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm(E)
i2c_algo_bit(E) hmac(E) drbg(E) ansi_cprng(E) 8250_fintek(E)
aesni_intel(E) ipmi_devintf(E) aes_x86_64(E) lrw(E) gf128mul(E)
evdev(E) sg(E) iTCO_wdt(E) iTCO_vendor_support(E) pcspkr(E) wmi(E)
shpchp(E) glue_helper(E) acpi_pad(E) ipmi_si(E) ipmi_msghandler(E)
mei_me(E) sb_edac(E) ablk_helper(E) cryptd(E) lpc_ich(E) button(E)
edac_core(E) mei(E) mfd_core(E) tpm_tis(E) tpm(E)
[ 1320.176465] processor(E) ifb(E) autofs4(E) xfs(E) libcrc32c(E)
hid_generic(E) usbhid(E) hid(E) sr_mod(E) sd_mod(E) cdrom(E)
crc32c_intel(E) ixgbe(E) dca(E) vxlan(E) ip6_udp_tunnel(E)
udp_tunnel(E) mdio(E) ehci_pci(E) ahci(E) ehci_hcd(E) libahci(E)
libata(E) tg3(E) ptp(E) pps_core(E) megaraid_sas(E) usbcore(E)
libphy(E) usb_common(E) scsi_mod(E) fjes(E)
[ 1320.176476] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W E 4.5.5 #1
[ 1320.176477] Hardware name: /08DM12, BIOS 2.1.2 01/20/2014
[ 1320.176478] 0000000000000286 21264a740a0fcbac ffffffff81302ff5
0000000000000000
[ 1320.176479] ffffffffc04db049 ffffffff81078ced ffff880610c85948
00000004cd5ef9a4
[ 1320.176481] ffff880610c85800 ffff880610c85c90 ffff8806092e6b00
ffffffffc04d9550
[ 1320.176483] Call Trace:
[ 1320.176484] <IRQ> [<ffffffff81302ff5>] ? dump_stack+0x5c/0x77
[ 1320.176487] [<ffffffff81078ced>] ? warn_slowpath_common+0x7d/0xb0
[ 1320.176489] [<ffffffffc04d9550>] ? hfsc_dequeue+0x300/0x320 [sch_hfsc]
[ 1320.176491] [<ffffffff814db925>] ? __qdisc_run+0x65/0x190
[ 1320.176493] [<ffffffff814b7301>] ? __dev_queue_xmit+0x221/0x660
[ 1320.176495] [<ffffffffc0554626>] ? tcf_mirred+0xf6/0x178 [act_mirred]
[ 1320.176496] [<ffffffff814e11a1>] ? tcf_action_exec+0x41/0x70
[ 1320.176498] [<ffffffffc0532a02>] ? u32_classify+0x232/0x460 [cls_u32]
[ 1320.176500] [<ffffffff810e0a21>] ? hrtimer_interrupt+0xc1/0x190
[ 1320.176502] [<ffffffff8130b8ee>] ? timerqueue_del+0x1e/0x60
[ 1320.176505] [<ffffffff810dff75>] ? __remove_hrtimer+0x35/0x90
[ 1320.176507] [<ffffffff814dcc62>] ? qdisc_watchdog+0x22/0x30
[ 1320.176510] [<ffffffff810e028a>] ? __hrtimer_run_queues+0xfa/0x280
[ 1320.176512] [<ffffffff814dcdea>] ? tc_classify+0x6a/0x120
[ 1320.176514] [<ffffffff814b4725>] ? __netif_receive_skb_core+0x495/0xa20
[ 1320.176516] [<ffffffff814b4d2f>] ? netif_receive_skb_internal+0x2f/0xa0
[ 1320.176517] [<ffffffff814b5c5b>] ? napi_gro_receive+0xbb/0x110
[ 1320.176520] [<ffffffffc0177700>] ? tg3_poll_work+0xd90/0xef0 [tg3]
[ 1320.176523] [<ffffffffc017789a>] ? tg3_poll_msix+0x3a/0x150 [tg3]
[ 1320.176525] [<ffffffff814b54de>] ? net_rx_action+0x22e/0x360
[ 1320.176527] [<ffffffff8107d4c8>] ? __do_softirq+0xf8/0x290
[ 1320.176529] [<ffffffff8107d7ab>] ? irq_exit+0x9b/0xa0
[ 1320.176531] [<ffffffff815b50df>] ? do_IRQ+0x4f/0xd0
[ 1320.176532] [<ffffffff815b3202>] ? common_interrupt+0x82/0x82
[ 1320.176533] <EOI> [<ffffffff8147dbf8>] ? cpuidle_enter_state+0x118/0x2c0
[ 1320.176535] [<ffffffff8147dbe5>] ? cpuidle_enter_state+0x105/0x2c0
[ 1320.176537] [<ffffffff810b8837>] ? cpu_startup_entry+0x287/0x340
[ 1320.176539] [<ffffffff8104d40a>] ? start_secondary+0x15a/0x190
[ 1320.176540] ---[ end trace b5b10ee435b3246e ]---
* Re: codel/fq_codel triggers heaps of WARNs in net/sched/sch_hfsc.c:1426
From: Florian Westphal @ 2016-05-31 10:00 UTC
To: Miroslav Kratochvil; +Cc: netdev
Miroslav Kratochvil <exa.exa@gmail.com> wrote:
> Hello everyone,
>
> I've been trying to debug an issue that arises when codel (or
> fq_codel) qdiscs are attached to an HFSC leaf class. The basic problem
> is that at random points in time the kernel log gets flooded (tens of
> MB of messages) with WARNINGs at net/sched/sch_hfsc.c:1426; the full
> text of several is attached below. The warnings appear at random
> times, but always in (large) groups.
>
> I suspected that this was related to a similar issue with SFQ, which
> was fixed by trimming the stats produced by SFQ.
> Documented here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945
>
> A similar patch for codel and fq_codel was recommended to me to try out, here:
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/net/sched/sch_fq_codel.c?h=linux-4.5.y&id=01465faa0e2d311512690724196042f9bb466034
> but it didn't solve the issue.
>
> Also, there's my original Debian bug report:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=824790
>
> Is there a good way to debug this? I currently have a test system
> where I can trigger the message easily with any custom kernel, so
> I'd appreciate any advice on what to try next.
>
> The messages below are from a 4.5.5 test kernel on Debian with ~20k
> HFSC classes. I'll try 4.6 ASAP, but there seems to be no relevant
> change there. The tg3 driver is not to blame (the same happens with
> e1000, e1000e, igb and ixgbe). I'm not sure whether u32 filter hash
> buckets could trigger this behavior, but I hope not (currently I have
> no way to test this without u32).
>
> Thanks in advance for any thoughts on this.
Both HFSC and fq_codel have problems, but I'm not sure if these are
relevant for your 4.5.5 kernel.
I'll submit a hfsc patch soon (it does fix a real problem).
If you have any config knobs enabled on the fq_codel leaf qdiscs it
would be good to know what parameters are used.
Can you try this patch (it doesn't fix anything but might provide more info):
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index d783d7c..045169e 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -49,6 +49,8 @@
  * a class whose fit-time exceeds the current time.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/types.h>
@@ -1423,7 +1425,11 @@ hfsc_schedule_watchdog(struct Qdisc *sch)
 		if (next_time == 0 || next_time > q->root.cl_cfmin)
 			next_time = q->root.cl_cfmin;
 	}
-	WARN_ON(next_time == 0);
+	if (WARN_ON_ONCE(next_time == 0)) {
+		pr_warn_ratelimited("qlen %u droplist_empty: %d, cfmin %llu, minel %d, root_empty %d\n",
+			sch->q.qlen, list_empty(&q->droplist),
+			(unsigned long long)q->root.cl_cfmin, !!cl, RB_EMPTY_ROOT(&q->eligible));
+	}
 	qdisc_watchdog_schedule(&q->watchdog, next_time);
 }
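
To illustrate what the debugging patch changes about the log volume, here
is a minimal user-space sketch, not kernel code. It assumes that
WARN_ON_ONCE() prints its backtrace only on the first hit but still
returns the condition every time, and that pr_warn_ratelimited() uses the
kernel's default limit of 10 messages per 5 seconds. All names below
(ratelimit_model, warn_on_once_model, simulate_flood) are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified window-based rate limiter (a model, not the kernel's code). */
struct ratelimit_model {
	uint64_t interval_ms;   /* window length */
	unsigned int burst;     /* messages allowed per window */
	uint64_t begin_ms;      /* start of the current window */
	unsigned int printed;   /* messages allowed so far in this window */
	bool started;           /* false until the first call */
};

static bool ratelimit_allow(struct ratelimit_model *rl, uint64_t now_ms)
{
	/* Open a new window on the first call or once the old one expired. */
	if (!rl->started || now_ms - rl->begin_ms >= rl->interval_ms) {
		rl->started = true;
		rl->begin_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false;
}

/* Counts a backtrace only on the first hit, but always returns cond. */
static bool warn_on_once_model(bool cond, bool *warned,
			       unsigned int *backtraces)
{
	if (cond && !*warned) {
		*warned = true;
		(*backtraces)++;
	}
	return cond;
}

struct flood_result {
	unsigned int backtraces;  /* full WARN backtraces printed */
	unsigned int lines;       /* one-line debug messages printed */
	unsigned int suppressed;  /* debug messages dropped by the limiter */
};

/* Replays `hits` consecutive next_time == 0 events, step_ms apart. */
static struct flood_result simulate_flood(unsigned int hits, uint64_t step_ms)
{
	/* Assumed defaults: 10 messages per 5000 ms. */
	struct ratelimit_model rl = { .interval_ms = 5000, .burst = 10 };
	struct flood_result res = { 0, 0, 0 };
	bool warned = false;
	unsigned int i;

	for (i = 0; i < hits; i++) {
		uint64_t next_time = 0;  /* the condition seen in this thread */

		if (warn_on_once_model(next_time == 0, &warned,
				       &res.backtraces)) {
			if (ratelimit_allow(&rl, (uint64_t)i * step_ms))
				res.lines++;
			else
				res.suppressed++;
		}
	}
	return res;
}

/* Usage example: 1000 hits within one second. */
static void flood_selfcheck(void)
{
	struct flood_result r = simulate_flood(1000, 1);

	assert(r.backtraces == 1);
	assert(r.lines == 10);
	assert(r.suppressed == 990);
}
```

Under these assumptions, a burst of 1000 hits within one window yields one
backtrace and ten one-line messages instead of 1000 backtraces.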
* Re: codel/fq_codel triggers heaps of WARNs in net/sched/sch_hfsc.c:1426
From: Miroslav Kratochvil @ 2016-05-31 12:31 UTC
To: Florian Westphal; +Cc: netdev
> Both HFSC and fq_codel have problems, but I'm not sure if these are
> relevant for your 4.5.5 kernel.
> I'll submit a hfsc patch soon (it does fix a real problem).
I'm actually shaping around 5 Gbit/s on some machines with HFSC
without any problems, though those use SFQ rather than codel. Feel
free to contact me if you need a testing ground.
>
> If you have any config knobs enabled on the fq_codel leaf qdiscs it
> would be good to know what parameters are used.
All defaults. The codel qdiscs look like this:
qdisc codel 573b: dev ifb1 parent 1:573b limit 1000p target 5.0ms interval 100.0ms
 Sent 64451468 bytes 43164 pkt (dropped 29, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  count 3 lastcount 1 ldelay 2us drop_next 0us
  maxpacket 1536 ecn_mark 0 drop_overlimit 0
>
> Can you try this patch (it doesn't fix anything but might provide more info):
> [...patch...]
The output is attached below. If I understand it correctly, given my
limited knowledge of HFSC internals, there is a packet that somehow
appears in the queue without having its fit time set?
-mk
[ 1476.685712] ------------[ cut here ]------------
[ 1476.685728] WARNING: CPU: 16 PID: 0 at net/sched/sch_hfsc.c:1426
hfsc_dequeue+0x30f/0x3a0 [sch_hfsc]()
[ 1476.685731] Modules linked in: sch_fq_codel(E) act_mirred(E)
act_gact(E) sch_ingress(E) sch_codel(E) sch_sfq(E) cls_u32(E)
sch_hfsc(E) ext4(E) x86_pkg_temp_thermal(E) crc16(E) mbcache(E)
jbd2(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E)
irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E)
ghash_clmulni_intel(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E)
aes_x86_64(E) joydev(E) lrw(E) gf128mul(E) sg(E) iTCO_wdt(E)
iTCO_vendor_support(E) glue_helper(E) mgag200(E) snd_pcm(E)
snd_timer(E) snd(E) soundcore(E) ipmi_devintf(E) ablk_helper(E)
cryptd(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E) evdev(E)
wmi(E) acpi_power_meter(E) 8250_fintek(E) pcspkr(E) acpi_pad(E)
ipmi_si(E) ipmi_msghandler(E) sb_edac(E) button(E) shpchp(E)
edac_core(E) mei_me(E) mei(E) tpm_tis(E) lpc_ich(E) mfd_core(E) tpm(E)
[ 1476.685779] processor(E) ifb(E) autofs4(E) xfs(E) libcrc32c(E)
hid_generic(E) usbhid(E) hid(E) sd_mod(E) sr_mod(E) cdrom(E)
crc32c_intel(E) ixgbe(E) dca(E) vxlan(E) ip6_udp_tunnel(E)
udp_tunnel(E) mdio(E) ahci(E) libahci(E) ehci_pci(E) libata(E) tg3(E)
ehci_hcd(E) ptp(E) pps_core(E) usbcore(E) libphy(E) megaraid_sas(E)
usb_common(E) scsi_mod(E) fjes(E)
[ 1476.685808] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G E
4.5.5 #1
[ 1476.685810] Hardware name: /08DM12, BIOS 2.1.2 01/20/2014
[ 1476.685812] 0000000000000286 2a3bd3d7e60bafde ffffffff81302ff5
0000000000000000
[ 1476.685815] ffffffffc04fb049 ffffffff81078ced 0000000000000000
0000000000000000
[ 1476.685818] ffff880c13411000 ffff880c13411490 ffff88003679a300
ffffffffc04f955f
[ 1476.685821] Call Trace:
[ 1476.685823] <IRQ> [<ffffffff81302ff5>] ? dump_stack+0x5c/0x77
[ 1476.685834] [<ffffffff81078ced>] ? warn_slowpath_common+0x7d/0xb0
[ 1476.685838] [<ffffffffc04f955f>] ? hfsc_dequeue+0x30f/0x3a0 [sch_hfsc]
[ 1476.685842] [<ffffffff814db925>] ? __qdisc_run+0x65/0x190
[ 1476.685848] [<ffffffff814b33f6>] ? net_tx_action+0xd6/0x230
[ 1476.685854] [<ffffffff8107d4c8>] ? __do_softirq+0xf8/0x290
[ 1476.685857] [<ffffffff8107d7ab>] ? irq_exit+0x9b/0xa0
[ 1476.685861] [<ffffffff815b519e>] ? smp_apic_timer_interrupt+0x3e/0x50
[ 1476.685864] [<ffffffff815b34a2>] ? apic_timer_interrupt+0x82/0x90
[ 1476.685865] <EOI> [<ffffffff8147dbf8>] ? cpuidle_enter_state+0x118/0x2c0
[ 1476.685870] [<ffffffff8147dbe5>] ? cpuidle_enter_state+0x105/0x2c0
[ 1476.685874] [<ffffffff810b8837>] ? cpu_startup_entry+0x287/0x340
[ 1476.685884] [<ffffffff8104d40a>] ? start_secondary+0x15a/0x190
[ 1476.685887] ---[ end trace 3c736fc106257086 ]---
[ 1476.685890] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.685901] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.685973] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.685982] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.685992] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.685997] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.686000] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
[ 1476.686004] qlen 1 droplist_empty: 1, cfmin 0, minel 0, root_empty 1
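
One way to read the "qlen 1 ... cfmin 0, minel 0, root_empty 1" lines is
with a small user-space model of how hfsc_schedule_watchdog() picks
next_time. This is only a sketch under assumptions: the cfmin comparison
is copied from the patch context above, while the outer "cfmin != 0"
guard and the use of the minimum eligible class's eligible time are
assumptions about code the patch does not show. The struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The scheduler state that the debug line reports, reduced to a model. */
struct hfsc_model {
	int has_minel;     /* "minel": 1 if the eligible tree has a minimum */
	uint64_t minel_e;  /* ASSUMPTION: eligible time of that class */
	uint64_t cfmin;    /* "cfmin": models q->root.cl_cfmin */
};

/* Returns the wakeup time the watchdog would be armed with. */
static uint64_t model_next_time(const struct hfsc_model *m)
{
	uint64_t next_time = 0;

	/* ASSUMPTION: an eligible class contributes its eligible time. */
	if (m->has_minel)
		next_time = m->minel_e;

	/* ASSUMPTION: guard condition; the inner test is from the patch. */
	if (m->cfmin != 0) {
		if (next_time == 0 || next_time > m->cfmin)
			next_time = m->cfmin;
	}
	return next_time;
}

/* Usage example covering the reported state and the healthy cases. */
static void model_selfcheck(void)
{
	/* State from the log: no eligible class and cfmin == 0. */
	struct hfsc_model reported = { 0, 0, 0 };
	/* Either source of a wakeup time alone avoids the warning. */
	struct hfsc_model only_cfmin = { 0, 0, 500 };
	struct hfsc_model only_minel = { 1, 300, 0 };
	/* With both present, the earlier time wins. */
	struct hfsc_model both = { 1, 700, 500 };

	assert(model_next_time(&reported) == 0);    /* WARN condition */
	assert(model_next_time(&only_cfmin) == 500);
	assert(model_next_time(&only_minel) == 300);
	assert(model_next_time(&both) == 500);
}
```

In this model next_time stays 0 only when there is neither an eligible
class nor a nonzero cfmin, which matches the logged values even though
qlen is 1, i.e. a packet is counted as queued while no class offers a
time to wake up for.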
* Re: codel/fq_codel triggers heaps of WARNs in net/sched/sch_hfsc.c:1426
From: Miroslav Kratochvil @ 2016-05-31 13:44 UTC
To: Florian Westphal; +Cc: netdev
>> I'll submit a hfsc patch soon (it does fix a real problem).
Just FYI, the patch from another thread
[PATCH] hfsc: ensure class is added to eltree exactly once
doesn't help; I just tested it. The debugging output is exactly the
same as before.
-mk