From: Jorrit Kronjee <j.kronjee@infopact.nl>
To: Eric Dumazet <eric.dumazet@gmail.com>, netfilter-devel@vger.kernel.org
Subject: Re: debugging kernel during packet drops
Date: Fri, 26 Mar 2010 11:41:06 +0100
Message-ID: <4BAC8F42.7090706@infopact.nl>
In-Reply-To: <1269509574.3626.9.camel@edumazet-laptop>
On 3/25/2010 10:32 AM, Eric Dumazet wrote:
> On Wednesday 24 March 2010 at 17:22 +0100, Eric Dumazet wrote:
>
>> Sure this helps a lot !
>>
>> You might try RPS by doing :
>>
>> echo f >/sys/class/net/eth3/queues/rx-0/rps_cpus
>>
>> (But you'll also need a new xt_hashlimit module to make it more
>> scalable, I can work on this this week if necessary)
>>
>>
> Here is patch I cooked for xt_hashlimit (on top of net-next-2.6) to make
> it use RCU and scale better in your case (allowing several concurrent
> cpus once RPS is activated), but also on more general cases.
>
> [PATCH] xt_hashlimit: RCU conversion
>
> xt_hashlimit uses a central lock per hash table and suffers from
> contention on some workloads.
>
> After RCU conversion, central lock is only used when a writer wants to
> add or delete an entry. For 'readers', updating an existing entry, they
> use an individual lock per entry.
>
Eric,
Awesome work, thanks for the effort! I've tried the patch and got some
results. The drop rate was reduced dramatically after I activated RPS.
I repeated my earlier test: I rebooted and immediately started flooding
the machine with 300 kpps. After 5 minutes, perf top looked like this:
-------------------------------------------------------------------------------------------------------------------------
   PerfTop:    1962 irqs/sec  kernel:99.3% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------------------------------------------------
samples  pcnt function                 DSO
_______ _____ ________________________ _____________________________________________________________________
4501.00 14.0% __ticket_spin_lock       /lib/modules/2.6.34-rc1-net-next/build/vmlinux
2985.00  9.3% dsthash_find             /lib/modules/2.6.34-rc1-net-next/kernel/net/netfilter/xt_hashlimit.ko
2346.00  7.3% __ticket_spin_unlock     /lib/modules/2.6.34-rc1-net-next/build/vmlinux
1354.00  4.2% e1000_xmit_frame         /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
1070.00  3.3% __slab_free              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 997.00  3.1% memcpy                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 809.00  2.5% dev_queue_xmit           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 791.00  2.5% nf_iterate               /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 705.00  2.2% e1000_clean_tx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
 634.00  2.0% nf_hook_slow             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 624.00  1.9% skb_release_head_state   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 584.00  1.8% e1000_intr               /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
 536.00  1.7% br_nf_pre_routing_finish /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 528.00  1.6% nommu_map_page           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 499.00  1.6% kfree                    /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 494.00  1.5% __netif_receive_skb      /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 472.00  1.5% __alloc_skb              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 448.00  1.4% br_fdb_update            /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 437.00  1.4% __slab_alloc             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 428.00  1.3% ipt_do_table             [ip_tables]
 403.00  1.3% memset                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 402.00  1.3% br_handle_frame          /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 389.00  1.2% e1000_clean_rx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
 388.00  1.2% e1000_clean              /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
 381.00  1.2% uhci_irq                 /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 366.00  1.1% get_rps_cpu              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 365.00  1.1% br_nf_pre_routing        /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 349.00  1.1% dst_release              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
And iptables-save -c produced this:
# Generated by iptables-save v1.4.4 on Fri Mar 26 11:24:59 2010
*filter
:INPUT ACCEPT [1043:60514]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [942:282723]
[99563191:3783420610] -A FORWARD -m hashlimit --hashlimit-upto 10000/sec --hashlimit-burst 100 --hashlimit-mode dstip --hashlimit-name hashtable --hashlimit-htable-max 131072 --hashlimit-htable-expire 1000 -j ACCEPT
[0:0] -A FORWARD -m limit --limit 5/sec -j LOG --log-prefix "HASHLIMITED -- "
[0:0] -A FORWARD -j DROP
COMMIT
# Completed on Fri Mar 26 11:24:59 2010
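For anyone who wants to turn the counters above into averages, here is a minimal sketch that parses the "[packets:bytes]" prefix written by iptables-save -c. The function name parse_counters and the 300-second window are assumptions for illustration: the window is taken from the "after 5 minutes" above, not measured, so the resulting rate is only a rough figure.

```python
import re

# Matches the "[packets:bytes]" prefix that iptables-save -c puts before a rule.
COUNTER_RE = re.compile(r"^\[(\d+):(\d+)\]\s+(.*)$")


def parse_counters(line):
    """Return (packets, bytes, rule) for a rule line, or None if it has no counter prefix."""
    m = COUNTER_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)


if __name__ == "__main__":
    # The hashlimit rule from the dump above, joined onto one line.
    line = (
        "[99563191:3783420610] -A FORWARD -m hashlimit --hashlimit-upto 10000/sec "
        "--hashlimit-burst 100 --hashlimit-mode dstip --hashlimit-name hashtable "
        "--hashlimit-htable-max 131072 --hashlimit-htable-expire 1000 -j ACCEPT"
    )
    packets, nbytes, rule = parse_counters(line)

    ASSUMED_WINDOW_S = 300  # assumption: roughly the 5-minute test run
    print("average bytes per packet:", nbytes / packets)
    print("average packets per second (assumed window):", packets / ASSUMED_WINDOW_S)
```

The LOG and DROP rules both show [0:0], so every forwarded packet counted here matched the hashlimit ACCEPT rule.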
And /proc/interrupts looked like this:
           CPU0       CPU1       CPU2       CPU3
  0:         47          0          1          0   IO-APIC-edge      timer
  1:          0          1          0          1   IO-APIC-edge      i8042
  6:          1          1          0          0   IO-APIC-edge      floppy
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          1          1          2   IO-APIC-edge      i8042
 14:         21         22         22         21   IO-APIC-edge      ata_piix
 15:          0          0          0          0   IO-APIC-edge      ata_piix
 16:        492        464        463        474   IO-APIC-fasteoi   arcmsr
 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 18:     971171     971391     948171     948663   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb7, eth3
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 21:          0          0          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb4
 23:          1          0          1          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
 27:    1003145    1002952    1026174    1025671   PCI-MSI-edge      eth4
NMI:     202553     185135     134999     185071   Non-maskable interrupts
LOC:      20270      19227      17387      23282   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:     202553     185135     134999     185071   Performance monitoring interrupts
PND:     201464     183939     134067     184098   Performance pending work
RES:       2216       2449       1212       1432   Rescheduling interrupts
CAL:    2223380    2226493    2233481    2228957   Function call interrupts
TLB:        606        584       1274       1216   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          2          2          2          2   Machine check polls
ERR:          3
MIS:          0
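To check how evenly the work is spread across the four CPUs, the per-CPU counts can be converted into shares. This is a small sketch using the numbers from the /proc/interrupts dump above; the helper name cpu_shares is hypothetical, and the counts are typed in by hand rather than read from the live file.

```python
def cpu_shares(counts):
    """Return each CPU's fraction of the total count for one interrupt line."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


if __name__ == "__main__":
    # Per-CPU counts copied from the dump above (CPU0..CPU3).
    rows = {
        "IRQ 18 (shared, includes eth3)": [971171, 971391, 948171, 948663],
        "IRQ 27 (eth4)": [1003145, 1002952, 1026174, 1025671],
        "CAL (function call interrupts)": [2223380, 2226493, 2233481, 2228957],
    }
    for name, counts in rows.items():
        shares = " ".join(f"{s:.1%}" for s in cpu_shares(counts))
        print(f"{name}: {shares}")
```

The high and evenly spread CAL count is probably a side effect of RPS, which hands packets to other CPUs via inter-processor interrupts, but that reading is an inference and not something the dump itself proves.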
ifconfig reported only 2 drops after these 5 minutes. I'm thinking about
removing or changing the hashing algorithm to make dsthash_find faster.
After all, I only need a match against a destination IP address. Also,
I'd like the limit of 10 kpps to be a bit higher. I'll see if I can work
on that during the weekend.
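To make the "match against a destination IP address" idea concrete, here is a rough user-space sketch of a per-destination token bucket keyed directly on the 32-bit IPv4 address. It is not the xt_hashlimit code and not a proposed patch: the class name DstLimiter, its parameters, and the token-bucket arithmetic are assumptions for illustration only, and Python's dict still hashes the key internally, so this shows the lookup-by-address semantics rather than any speed-up.

```python
import ipaddress


class DstLimiter:
    """Token bucket per destination IPv4 address (illustrative only)."""

    def __init__(self, rate_pps, burst):
        self.rate = float(rate_pps)  # tokens added per second
        self.burst = float(burst)    # maximum tokens a bucket can hold
        self.buckets = {}            # int(dst address) -> (tokens, last update time)

    def allow(self, dst, now):
        """Return True if a packet to dst at time now (in seconds) is within the limit."""
        key = int(ipaddress.IPv4Address(dst))
        # A destination seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False


if __name__ == "__main__":
    # Same numbers as the rule above: 10000/sec with a burst of 100.
    limiter = DstLimiter(rate_pps=10000, burst=100)
    accepted = sum(limiter.allow("192.0.2.1", now=0.0) for _ in range(150))
    print("accepted out of 150 back-to-back packets:", accepted)
```

A real kernel version would also need entry expiry and per-entry locking, which this sketch leaves out.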
Thanks again for everything!
Regards,
Jorrit Kronjee