From: Jorrit Kronjee <j.kronjee@infopact.nl>
To: Eric Dumazet <eric.dumazet@gmail.com>, netfilter-devel@vger.kernel.org
Subject: Re: debugging kernel during packet drops
Date: Fri, 26 Mar 2010 11:41:06 +0100
Message-ID: <4BAC8F42.7090706@infopact.nl>
In-Reply-To: <1269509574.3626.9.camel@edumazet-laptop>
On 3/25/2010 10:32 AM, Eric Dumazet wrote:
> On Wednesday, 24 March 2010 at 17:22 +0100, Eric Dumazet wrote:
>
>> Sure, this helps a lot!
>>
>> You might try RPS by doing:
>>
>> echo f >/sys/class/net/eth3/queues/rx-0/rps_cpus
>>
>> (But you'll also need a new xt_hashlimit module to make it more
>> scalable; I can work on this this week if necessary.)
>>
>>
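The `f` in the RPS command is a hexadecimal CPU bitmask: bit N set means CPU N may do receive processing for that queue, so `f` (binary 1111) selects CPUs 0-3 on this 4-CPU box. The helpers below are a hypothetical sketch for building and decoding such masks; they assume the usual sysfs cpumask text format, where machines with more than 32 CPUs print comma-separated 32-bit hex groups. They only compute strings and do not touch /sys.

```python
def cpus_to_rps_mask(cpus):
    """Build the hex bitmask written to rps_cpus: bit N set means CPU N may be used."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def rps_mask_to_cpus(mask_text):
    """Decode a hex mask, ignoring the commas that separate 32-bit groups."""
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    # The value used in this thread for a 4-CPU machine.
    print(cpus_to_rps_mask(range(4)))  # f
    print(rps_mask_to_cpus("f"))       # [0, 1, 2, 3]
```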
> Here is a patch I cooked up for xt_hashlimit (on top of net-next-2.6) to
> make it use RCU and scale better in your case (allowing several concurrent
> CPUs once RPS is activated), but also in more general cases.
>
> [PATCH] xt_hashlimit: RCU conversion
>
> xt_hashlimit uses a central lock per hash table and suffers from
> contention on some workloads.
>
> After the RCU conversion, the central lock is only taken when a writer
> adds or deletes an entry. 'Readers', which only update an existing
> entry, use an individual per-entry lock instead.
>
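The locking split described in the patch summary can be illustrated with a hypothetical Python sketch. Python has no RCU, so this shows only the division of labour: lookups skip the central lock (CPython's atomic dict reads stand in for what RCU guarantees in the kernel), adds and deletes take the central table lock, and updates to an existing entry take only that entry's own lock. The class and method names are illustrative and are not xt_hashlimit's.

```python
import threading


class Entry:
    """One hash-table entry (hypothetical stand-in for a per-destination record)."""

    def __init__(self, key):
        self.key = key
        self.lock = threading.Lock()  # per-entry lock: serializes updates to this entry only
        self.hits = 0


class Table:
    def __init__(self):
        self.lock = threading.Lock()  # central lock: taken only to add or delete entries
        self.entries = {}

    def find(self, key):
        # Lookup without the central lock. In the kernel, RCU is what makes this
        # safe; here CPython's atomic dict reads act as a stand-in.
        return self.entries.get(key)

    def find_or_add(self, key):
        entry = self.find(key)
        if entry is None:
            with self.lock:
                # Re-check under the central lock: another thread may have added it.
                entry = self.entries.get(key)
                if entry is None:
                    entry = Entry(key)
                    self.entries[key] = entry
        return entry

    def delete(self, key):
        with self.lock:
            self.entries.pop(key, None)

    def hit(self, key):
        """Look up (adding if missing), then update under the entry's own lock only."""
        entry = self.find_or_add(key)
        with entry.lock:
            entry.hits += 1
            return entry.hits


if __name__ == "__main__":
    table = Table()
    workers = [
        threading.Thread(target=lambda: [table.hit("10.0.0.1") for _ in range(1000)])
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(table.entries["10.0.0.1"].hits)  # 4000
```

The point of the split is that four CPUs hammering different destinations never serialize on one table lock; they only contend when they hit the same entry or when entries are created or removed.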
Eric,
Awesome work, thanks for the effort! I've tried the patch, and after I
activated RPS the drop rate went down dramatically. I repeated my earlier
test: I rebooted and immediately started flooding the machine with
300 kpps. After 5 minutes, perf top looked like this:
-------------------------------------------------------------------------------------------------------------------------
   PerfTop:    1962 irqs/sec  kernel:99.3% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------------------------------------------------

samples   pcnt function                 DSO
_______  _____ ________________________ _____________________________________________________________________

4501.00  14.0% __ticket_spin_lock       /lib/modules/2.6.34-rc1-net-next/build/vmlinux
2985.00   9.3% dsthash_find             /lib/modules/2.6.34-rc1-net-next/kernel/net/netfilter/xt_hashlimit.ko
2346.00   7.3% __ticket_spin_unlock     /lib/modules/2.6.34-rc1-net-next/build/vmlinux
1354.00   4.2% e1000_xmit_frame         /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
1070.00   3.3% __slab_free              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 997.00   3.1% memcpy                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 809.00   2.5% dev_queue_xmit           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 791.00   2.5% nf_iterate               /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 705.00   2.2% e1000_clean_tx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
 634.00   2.0% nf_hook_slow             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 624.00   1.9% skb_release_head_state   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 584.00   1.8% e1000_intr               /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
 536.00   1.7% br_nf_pre_routing_finish /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 528.00   1.6% nommu_map_page           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 499.00   1.6% kfree                    /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 494.00   1.5% __netif_receive_skb      /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 472.00   1.5% __alloc_skb              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 448.00   1.4% br_fdb_update            /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 437.00   1.4% __slab_alloc             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 428.00   1.3% ipt_do_table             [ip_tables]
 403.00   1.3% memset                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 402.00   1.3% br_handle_frame          /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 389.00   1.2% e1000_clean_rx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
 388.00   1.2% e1000_clean              /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
 381.00   1.2% uhci_irq                 /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 366.00   1.1% get_rps_cpu              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
 365.00   1.1% br_nf_pre_routing        /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
 349.00   1.1% dst_release              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
And iptables-save -c produced this:
# Generated by iptables-save v1.4.4 on Fri Mar 26 11:24:59 2010
*filter
:INPUT ACCEPT [1043:60514]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [942:282723]
[99563191:3783420610] -A FORWARD -m hashlimit --hashlimit-upto 10000/sec --hashlimit-burst 100 --hashlimit-mode dstip --hashlimit-name hashtable --hashlimit-htable-max 131072 --hashlimit-htable-expire 1000 -j ACCEPT
[0:0] -A FORWARD -m limit --limit 5/sec -j LOG --log-prefix "HASHLIMITED -- "
[0:0] -A FORWARD -j DROP
COMMIT
# Completed on Fri Mar 26 11:24:59 2010
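The first FORWARD rule accepts a packet while its destination IP is within 10000 packets/sec with a burst of 100; anything over that falls through to the rate-limited LOG rule and then DROP. The class below is a hypothetical token-bucket model of those semantics (one bucket per destination, idle entries expiring after `--hashlimit-htable-expire` milliseconds, at most `--hashlimit-htable-max` entries). It is a sketch, not xt_hashlimit's code: the module's actual credit arithmetic and garbage collection differ, and returning "no match" when the table is full is this sketch's own choice.

```python
class DstRateLimiter:
    """Per-destination token bucket (hypothetical model of the hashlimit rule above)."""

    def __init__(self, rate, burst, expire, max_entries):
        self.rate = rate                # tokens refilled per second (10000/sec)
        self.burst = burst              # bucket depth (100)
        self.expire = expire            # seconds an idle entry may stay (1000 ms -> 1.0)
        self.max_entries = max_entries  # table size cap (131072)
        self.buckets = {}               # dst_ip -> [tokens, last_seen]

    def _expire_idle(self, now):
        idle = [ip for ip, (_, last) in self.buckets.items() if now - last > self.expire]
        for ip in idle:
            del self.buckets[ip]

    def allow(self, dst_ip, now):
        """Return True if a packet to dst_ip at time `now` (seconds) is within the limit."""
        bucket = self.buckets.get(dst_ip)
        if bucket is None:
            if len(self.buckets) >= self.max_entries:
                self._expire_idle(now)
            if len(self.buckets) >= self.max_entries:
                return False  # this sketch's choice when the table is full
            bucket = self.buckets[dst_ip] = [float(self.burst), now]
        # Refill for the time elapsed since the last packet, capped at the burst size.
        tokens = min(float(self.burst), bucket[0] + (now - bucket[1]) * self.rate)
        bucket[1] = now
        if tokens >= 1.0:
            bucket[0] = tokens - 1.0
            return True
        bucket[0] = tokens
        return False


if __name__ == "__main__":
    limiter = DstRateLimiter(rate=10000, burst=100, expire=1.0, max_entries=131072)
    passed = sum(limiter.allow("192.0.2.1", now=0.0) for _ in range(150))
    print(passed)  # 100: the burst is spent and no time has passed to refill
```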
And /proc/interrupts looked like this:
          CPU0      CPU1      CPU2      CPU3
  0:        47         0         1         0  IO-APIC-edge     timer
  1:         0         1         0         1  IO-APIC-edge     i8042
  6:         1         1         0         0  IO-APIC-edge     floppy
  8:         1         0         0         0  IO-APIC-edge     rtc0
  9:         0         0         0         0  IO-APIC-fasteoi  acpi
 12:         0         1         1         2  IO-APIC-edge     i8042
 14:        21        22        22        21  IO-APIC-edge     ata_piix
 15:         0         0         0         0  IO-APIC-edge     ata_piix
 16:       492       464       463       474  IO-APIC-fasteoi  arcmsr
 17:         0         0         0         0  IO-APIC-fasteoi  ehci_hcd:usb1
 18:    971171    971391    948171    948663  IO-APIC-fasteoi  uhci_hcd:usb3, uhci_hcd:usb7, eth3
 19:         0         0         0         0  IO-APIC-fasteoi  uhci_hcd:usb6
 21:         0         0         0         0  IO-APIC-fasteoi  ata_piix, uhci_hcd:usb4
 23:         1         0         1         0  IO-APIC-fasteoi  ehci_hcd:usb2, uhci_hcd:usb5
 27:   1003145   1002952   1026174   1025671  PCI-MSI-edge     eth4
NMI:    202553    185135    134999    185071  Non-maskable interrupts
LOC:     20270     19227     17387     23282  Local timer interrupts
SPU:         0         0         0         0  Spurious interrupts
PMI:    202553    185135    134999    185071  Performance monitoring interrupts
PND:    201464    183939    134067    184098  Performance pending work
RES:      2216      2449      1212      1432  Rescheduling interrupts
CAL:   2223380   2226493   2233481   2228957  Function call interrupts
TLB:       606       584      1274      1216  TLB shootdowns
TRM:         0         0         0         0  Thermal event interrupts
THR:         0         0         0         0  Threshold APIC interrupts
MCE:         0         0         0         0  Machine check exceptions
MCP:         2         2         2         2  Machine check polls
ERR:         3
MIS:         0
ifconfig reported only 2 drops over these 5 minutes. I'm thinking about
removing or changing the hashing algorithm to make dsthash_find faster;
after all, all I need is a match on the destination IP address. I'd also
like to raise the 10 kpps limit a bit. I'll see if I can work on that
over the weekend.
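One way a destination-IP-only lookup could look is sketched below. It is an assumption for illustration, not what xt_hashlimit does: the whole key is the 32-bit IPv4 address, so comparing a stored entry is a single integer compare, and the bucket is picked with multiplicative (Fibonacci) hashing. The names `ip_key`, `bucket_index` and `DstTable` are hypothetical. A table exposed to attacker-chosen addresses would also want a secret per-table seed mixed into the hash, which this sketch omits.

```python
import ipaddress

GOLDEN_RATIO_32 = 0x9E3779B9  # about 2**32 / golden ratio, for multiplicative hashing


def ip_key(dst_ip):
    """The whole lookup key: just the 32-bit IPv4 destination address."""
    return int(ipaddress.IPv4Address(dst_ip))


def bucket_index(key, bits):
    """Fibonacci hashing: keep the top `bits` bits of key * GOLDEN_RATIO_32 mod 2**32."""
    return ((key * GOLDEN_RATIO_32) & 0xFFFFFFFF) >> (32 - bits)


class DstTable:
    """Chained hash table keyed only on destination IPv4 address (hypothetical)."""

    def __init__(self, bits):
        if not 1 <= bits <= 24:
            raise ValueError("bits must be in 1..24")
        self.bits = bits
        self.buckets = [[] for _ in range(1 << bits)]

    def find(self, dst_ip):
        key = ip_key(dst_ip)
        for stored_key, value in self.buckets[bucket_index(key, self.bits)]:
            if stored_key == key:  # a single integer comparison per probe
                return value
        return None

    def insert(self, dst_ip, value):
        key = ip_key(dst_ip)
        chain = self.buckets[bucket_index(key, self.bits)]
        for i, (stored_key, _) in enumerate(chain):
            if stored_key == key:
                chain[i] = (key, value)  # replace the existing entry's value
                return
        chain.append((key, value))
```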
Thanks again for everything!
Regards,
Jorrit Kronjee