* Conntrack Events Performance - Multipart Messages? @ 2008-07-16 16:42 Fabian Hugelshofer 2008-07-17 9:16 ` Patrick McHardy 2008-07-17 10:03 ` Pablo Neira Ayuso 0 siblings, 2 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-16 16:42 UTC (permalink / raw) To: netfilter-devel Hi, I am writing a network application for a genuine wireless router (266Mhz IXP4XX). I am capturing packets with ULOG and need connection tracking. For performance reasons I planned to use connection tracking events (NEW/DESTROY) to avoid doing the same work twice. In a high load test case I stress the router with UDP packets with random source ports (1000B payload, 1800pps). CPU usage is 100%, 10% of packets and 80% ctevents are dropped. If I disable ctevents, the CPU usage is just 24% and no packet drops occur. My application is not very heavy and I expect most of the ctevent overhead to be caused by passing events from kernel to user space. I expect that performance could be increased by using multipart messages for ctevents like it is done in ULOG/NFLOG. Do you share my opinion, that multipart messages would lead to significant performance improvements? (Actually, I doubt that I will be more efficient than performing connection tracking in user space) Do you think introducing multipart messages for connection tracking events is feasible without breaking existing applications? Maybe with a default setting of 1 bundled events, which can be increased by a function call? Is someone intending to implement multipart messages for ctevents? ;-) Any comments are appreciated. Regards, Fabian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-16 16:42 Conntrack Events Performance - Multipart Messages? Fabian Hugelshofer @ 2008-07-17 9:16 ` Patrick McHardy 2008-07-17 10:03 ` Pablo Neira Ayuso 1 sibling, 0 replies; 30+ messages in thread From: Patrick McHardy @ 2008-07-17 9:16 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel Fabian Hugelshofer wrote: > Hi, > > I am writing a network application for a genuine wireless router (266Mhz > IXP4XX). I am capturing packets with ULOG and need connection tracking. > For performance reasons I planned to use connection tracking events > (NEW/DESTROY) to avoid doing the same work twice. > > In a high load test case I stress the router with UDP packets with > random source ports (1000B payload, 1800pps). CPU usage is 100%, 10% of > packets and 80% ctevents are dropped. If I disable ctevents, the CPU > usage is just 24% and no packet drops occur. > > My application is not very heavy and I expect most of the ctevent > overhead to be caused by passing events from kernel to user space. I > expect that performance could be increased by using multipart messages > for ctevents like it is done in ULOG/NFLOG. > > Do you share my opinion, that multipart messages would lead to > significant performance improvements? (Actually, I doubt that I will be > more efficient than performing connection tracking in user space) Quite possible, but some profiles would be useful to determine whether this is actually the bottleneck. > Do you think introducing multipart messages for connection tracking > events is feasible without breaking existing applications? Maybe with a > default setting of 1 bundled events, which can be increased by a > function call? That sounds sane. > Is someone intending to implement multipart messages for ctevents? ;-) I don't think so. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-16 16:42 Conntrack Events Performance - Multipart Messages? Fabian Hugelshofer 2008-07-17 9:16 ` Patrick McHardy @ 2008-07-17 10:03 ` Pablo Neira Ayuso 2008-07-17 14:34 ` Fabian Hugelshofer 2008-07-18 2:11 ` Patrick McHardy 1 sibling, 2 replies; 30+ messages in thread From: Pablo Neira Ayuso @ 2008-07-17 10:03 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel Fabian Hugelshofer wrote: > I am writing a network application for a genuine wireless router (266Mhz > IXP4XX). I am capturing packets with ULOG and need connection tracking. > For performance reasons I planned to use connection tracking events > (NEW/DESTROY) to avoid doing the same work twice. Did you write your own application to handle ctevents and ULOG messages? Are you using any library? What does your application do? We now have the berkeley socket filtering facilities for netlink, you may use it to filter only the events that you need. I have a patch here for libnetfilter_conntrack that introduces a high-level API to autogenerate simple BSF code for filtering. As soon as I finish testing it, I'll commit it. Also, you may periodically dump the connection tracking table (polling), but, of course, this depends on the nature of your application. Assuming that your application is a logger, this is not a choice as you'll lose information. > In a high load test case I stress the router with UDP packets with > random source ports (1000B payload, 1800pps). CPU usage is 100%, 10% of > packets and 80% ctevents are dropped. If I disable ctevents, the CPU > usage is just 24% and no packet drops occur. I have a similar testbed here. You did not mention the threshold that you're using in ULOG. If you provide more information on your application I'll try to reproduce those numbers. > My application is not very heavy and I expect most of the ctevent > overhead to be caused by passing events from kernel to user space. I > expect that performance could be increased by using multipart messages > for ctevents like it is done in ULOG/NFLOG. > > Do you share my opinion, that multipart messages would lead to > significant performance improvements? (Actually, I doubt that I will be > more efficient than performing connection tracking in user space) Yes, I think that batching could help here. > Do you think introducing multipart messages for connection tracking > events is feasible without breaking existing applications? Maybe with a > default setting of 1 bundled events, which can be increased by a > function call? AFAIK, libnfnetlink and other netlink-based libraries should handle the multipart messages appropriately so that should not be a problem. > Is someone intending to implement multipart messages for ctevents? ;-) The problem here is that batching should be a per-socket parameter. We will not accept a patch that changes the behaviour for all the ctevent users. And I don't see an obvious way to do this now. -- "Los honestos son inadaptados sociales" -- Les Luthiers ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-17 10:03 ` Pablo Neira Ayuso @ 2008-07-17 14:34 ` Fabian Hugelshofer 2008-07-17 15:15 ` Fabian Hugelshofer 2008-07-18 15:56 ` Fabian Hugelshofer 2008-07-18 2:11 ` Patrick McHardy 1 sibling, 2 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-17 14:34 UTC (permalink / raw) To: netfilter-devel Pablo Neira Ayuso wrote: > Fabian Hugelshofer wrote: >> I am writing a network application for a genuine wireless router (266Mhz >> IXP4XX). I am capturing packets with ULOG and need connection tracking. >> For performance reasons I planned to use connection tracking events >> (NEW/DESTROY) to avoid doing the same work twice. > > Did you write your own application to handle ctevents and ULOG messages? > Are you using any library? What does your application do? I am using: libnetfilter_conntrack 0.0.89 libnfnetlink 0.0.38 libipulog (from ulog2 r6382) My application parses packets (IP and transport headers) and increments different counters per sending host. For this a hash table lookup is performed. Similar for ctevents. > We now have the berkeley socket filtering facilities for netlink, you > may use it to filter only the events that you need. I have a patch here > for libnetfilter_conntrack that introduces a high-level API to > autogenerate simple BSF code for filtering. As soon as I finish testing > it, I'll commit it. This is interesting and useful in various cases. Thanks for implementing. However, here I does not help, as I need all events. > Also, you may periodically dump the connection tracking table (polling), > but, of course, this depends on the nature of your application. Assuming > that your application is a logger, this is not a choice as you'll lose > information. Polling is not possible as I need destroy events and information in them. Further I don't want to loose any events. >> In a high load test case I stress the router with UDP packets with >> random source ports (1000B payload, 1800pps). CPU usage is 100%, 10% of >> packets and 80% ctevents are dropped. If I disable ctevents, the CPU >> usage is just 24% and no packet drops occur. > > I have a similar testbed here. You did not mention the threshold that > you're using in ULOG. If you provide more information on your > application I'll try to reproduce those numbers. --ulog-cprange 96 --ulog-qthreshold 50 A dispatcher is calling back a ulog and a ctevent module as soon as their sockets are readable. The modules then read one message from the socket and process it before returning to the dispatcher. If both sockets are readable at the same time, both modules are called back. To reduce any side effects I wrote a small test application which just reads the ctevent socket and does nothing else. You find it attached to this email. I started the application, sent 102'313 UDP packets with random source ports and 1000B UDP payload in 57s (1795pps) and then waited until all entries have been removed from the connection table. The CPU usage was measured with top every 10s while sending and then averaged over 5 intervals. It was 56%, which seems quite high to me. Without any applications running it is 11% to route this UDP traffic. The test application reports 113699 received events and 140 overflows. The events are NEW and DESTROY events. Generally I would roughly expect the double number of events than packets. Under this assumption 44% of events have been dropped. As I wrote earlier my real application is easily able to capture and process packets from this traffic flow without any drops and 24% CPU usage, as long as I unload nf_conntrack_netlink or do not initialise my ctevent module. If ctevents are used it is reasonable that more events are lost than with the test application as no events are read as long as a multipart ulog message is processed. However 44% drops with the test app seems too high. I think that either ctevents are very expensive in general or a big overhead is introduced by passing them one at the time. >> Do you think introducing multipart messages for connection tracking >> events is feasible without breaking existing applications? Maybe with a >> default setting of 1 bundled events, which can be increased by a >> function call? > > AFAIK, libnfnetlink and other netlink-based libraries should handle the > multipart messages appropriately so that should not be a problem. > >> Is someone intending to implement multipart messages for ctevents? ;-) > > The problem here is that batching should be a per-socket parameter. We > will not accept a patch that changes the behaviour for all the ctevent > users. And I don't see an obvious way to do this now. If you do see a good way to do it, let me know. Depending on how much time I have I might consider implementing it. Fabian Example top for ctevtest: Cpu(s): 0.1%us, 3.1%sy, 0.0%ni, 38.8%id, 0.0%wa, 1.4%hi, 56.6%si, 0.0%st PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3913 root 20 0 696 220 160 R 32.1 0.7 0:03.89 ctevtest 3916 root 20 0 1068 548 412 R 0.8 1.8 0:00.22 top ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-17 14:34 ` Fabian Hugelshofer @ 2008-07-17 15:15 ` Fabian Hugelshofer 2008-07-18 15:56 ` Fabian Hugelshofer 1 sibling, 0 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-17 15:15 UTC (permalink / raw) To: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 237 bytes --] Fabian Hugelshofer wrote: > To reduce any side effects I wrote a small test application which just > reads the ctevent socket and does nothing else. You find it attached to > this email. Forgot to attach, you find it in this email... [-- Attachment #2: ctevtest.c --] [-- Type: text/x-csrc, Size: 1201 bytes --] #include <stdlib.h> #include <stdio.h> #include <errno.h> #include <signal.h> #include <libnfnetlink/libnfnetlink.h> #include <libnetfilter_conntrack/libnetfilter_conntrack.h> volatile sig_atomic_t terminate; static void __sig_handler(int sig) { switch (sig) { case SIGTERM: case SIGINT: terminate = 1; break; } } int main(int argc, char* argv[]) { struct nfct_handle *h; struct sigaction sigact; char buf[NFNL_BUFFSIZE] __attribute__ ((aligned)); int len; int events = 0; int overflows = 0; terminate = 0; sigact.sa_handler = &__sig_handler; sigaction(SIGINT, &sigact, NULL); sigaction(SIGTERM, &sigact, NULL); h = nfct_open(NFNL_SUBSYS_CTNETLINK, NF_NETLINK_CONNTRACK_NEW | NF_NETLINK_CONNTRACK_DESTROY); if (h == NULL) { perror("opening ctnetlink failed"); exit(EXIT_FAILURE); } while (!terminate) { len = recv(nfct_fd(h), buf, sizeof(buf), 0); if (len < 0) { if (errno == ENOBUFS) { overflows++; } else if (errno != EINTR) { perror("recv failed"); nfct_close(h); exit(EXIT_FAILURE); } } else { events++; } } printf("%d events received (%d overflows)\n", events, overflows); nfct_close(h); exit(EXIT_SUCCESS); } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-17 14:34 ` Fabian Hugelshofer 2008-07-17 15:15 ` Fabian Hugelshofer @ 2008-07-18 15:56 ` Fabian Hugelshofer 1 sibling, 0 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-18 15:56 UTC (permalink / raw) To: netfilter-devel The last message I accidentially first sent to Pablo only. He replied without noticing that the list was missing. You find his reply included in this message. (And I got it wrong with this message again, sorry for that.) Pablo Neira Ayuso wrote: > Fabian Hugelshofer wrote: >> I started the application, sent 102'313 UDP packets with random source >> ports and 1000B UDP payload in 57s (1795pps) and then waited until all >> entries have been removed from the connection table. The CPU usage was >> measured with top every 10s while sending and then averaged over 5 >> intervals. It was 56%, which seems quite high to me. Without any >> applications running it is 11% to route this UDP traffic. >> >> The test application reports 113699 received events and 140 overflows. >> The events are NEW and DESTROY events. Generally I would roughly expect >> the double number of events than packets. Under this assumption 44% of >> events have been dropped. > > I guess that your device has little memory so the default socket buffer > must be pretty small. I suggest you to increase the socket buffer size > via nfnl_rcvbufsiz(), that will delay the ENOBUFS. I'd like to see the > results with my suggestion. The system has only 32MB of RAM. The default socket buffer size is 110592 bytes which applies to ctevtest. My real application increases the socket buffer to 430080 bytes. > Of course, this suggestion is not directly related with the message > batching that you're proposing that can be useful to reduce CPU > consumption - if someone wants to use ctevents for logging purposes > which is what you want. I increased the socket buffer size of ctevtest to 2MB. With this setting no more overruns occur and all events can be logged (same test as before). The CPU usage is now 84%. I did not retest my real application yet, but the CPU usage will hit 100% again and the bigger buffer size will not help to prevent losses anymore. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-17 10:03 ` Pablo Neira Ayuso 2008-07-17 14:34 ` Fabian Hugelshofer @ 2008-07-18 2:11 ` Patrick McHardy 2008-07-21 15:51 ` Fabian Hugelshofer 1 sibling, 1 reply; 30+ messages in thread From: Patrick McHardy @ 2008-07-18 2:11 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Fabian Hugelshofer, netfilter-devel Pablo Neira Ayuso wrote: > Fabian Hugelshofer wrote: >> Is someone intending to implement multipart messages for ctevents? ;-) > > The problem here is that batching should be a per-socket parameter. We > will not accept a patch that changes the behaviour for all the ctevent > users. And I don't see an obvious way to do this now Thats a good point, there's no pretty solution to this. Profiles would still be nice to have though :) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-18 2:11 ` Patrick McHardy @ 2008-07-21 15:51 ` Fabian Hugelshofer 2008-07-21 15:59 ` Patrick McHardy 0 siblings, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-21 15:51 UTC (permalink / raw) To: Patrick McHardy; +Cc: Pablo Neira Ayuso, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2826 bytes --] Patrick McHardy wrote: > Pablo Neira Ayuso wrote: >> The problem here is that batching should be a per-socket parameter. We >> will not accept a patch that changes the behaviour for all the ctevent >> users. And I don't see an obvious way to do this now > > Thats a good point, there's no pretty solution to this. Profiles > would still be nice to have though :) It took me some time to set up profiling on the router. I am using oprofile. nf_conntrack and nfnetlink are built into the kernel. Most of the time is probably spent in nfnetlink. If you need other data than the one provided here, just let me know. I'm not very familiar with oprofile, so providing the needed arguments would be helpful. opreport: CPU: ARM/XScale PMU2, speed 0 MHz (estimated) Counted CPU_CYCLES events (clock cycles counter) with a unit mask of 0x00 (No unit mask) count 100000 CPU_CYCLES:100000| samples| %| ------------------ 156310 75.3672 vmlinux 25733 12.4075 ath_pci 14674 7.0753 wlan 5383 2.5955 nf_conntrack_netlink 2926 1.4108 oprofiled 1206 0.5815 ath_rate_minstrel 246 0.1186 libuClibc-0.9.29.so 234 0.1128 iptable_raw 232 0.1119 ld-uClibc-0.9.29.so 145 0.0699 ctevtest 128 0.0617 busybox 99 0.0477 libnfnetlink.so.0.2.0 56 0.0270 libnetfilter_conntrack.so.1.2.0 25 0.0121 arp_tables 1 4.8e-04 arptable_filter opreport --symbols (first 20, full file attached): CPU: ARM/XScale PMU2, speed 0 MHz (estimated) Counted CPU_CYCLES events (clock cycles counter) with a unit mask of 0x00 (No unit mask) count 100000 samples % app name symbol name 20018 9.6520 vmlinux memcpy 19493 9.3988 ath_pci.ko ath_sysctl_register 6676 3.2189 vmlinux __nf_conntrack_find 6098 2.9402 vmlinux ipt_do_table 5858 2.8245 wlan.ko ieee80211_input 5383 2.5955 nf_conntrack_netlink.ko .text 4567 2.2020 vmlinux __memzero 4225 2.0371 vmlinux __kmalloc 4091 1.9725 vmlinux csum_partial 3469 1.6726 vmlinux nf_nat_setup_info 3468 1.6721 vmlinux netlink_broadcast 3376 1.6278 vmlinux __hash_conntrack 3304 1.5931 vmlinux nla_put 2992 1.4426 vmlinux __nla_reserve 2933 1.4142 vmlinux __nf_conntrack_confirm 2926 1.4108 oprofiled /jffs/usr/bin/oprofiled 2847 1.3727 ath_pci.ko ath_intr 2698 1.3009 vmlinux kfree 2655 1.2801 vmlinux __nla_put 2563 1.2358 vmlinux nf_conntrack_in [-- Attachment #2: oprep_symb.txt --] [-- Type: text/plain, Size: 27139 bytes --] CPU: ARM/XScale PMU2, speed 0 MHz (estimated) Counted CPU_CYCLES events (clock cycles counter) with a unit mask of 0x00 (No unit mask) count 100000 samples % app name symbol name 20018 9.6520 vmlinux memcpy 19493 9.3988 ath_pci.ko ath_sysctl_register 6676 3.2189 vmlinux __nf_conntrack_find 6098 2.9402 vmlinux ipt_do_table 5858 2.8245 wlan.ko ieee80211_input 5383 2.5955 nf_conntrack_netlink.ko .text 4567 2.2020 vmlinux __memzero 4225 2.0371 vmlinux __kmalloc 4091 1.9725 vmlinux csum_partial 3469 1.6726 vmlinux nf_nat_setup_info 3468 1.6721 vmlinux netlink_broadcast 3376 1.6278 vmlinux __hash_conntrack 3304 1.5931 vmlinux nla_put 2992 1.4426 vmlinux __nla_reserve 2933 1.4142 vmlinux __nf_conntrack_confirm 2926 1.4108 oprofiled /jffs/usr/bin/oprofiled 2847 1.3727 ath_pci.ko ath_intr 2698 1.3009 vmlinux kfree 2655 1.2801 vmlinux __nla_put 2563 1.2358 vmlinux nf_conntrack_in 2561 1.2348 vmlinux nf_conntrack_alloc 2331 1.1239 ath_pci.ko ath_suspend 2122 1.0232 vmlinux netif_receive_skb 1955 0.9426 vmlinux nf_iterate 1929 0.9301 vmlinux local_bh_enable 1916 0.9238 vmlinux __copy_to_user 1810 0.8727 vmlinux handle_IRQ_event 1798 0.8669 vmlinux dev_queue_xmit 1781 0.8587 wlan.ko ieee80211_find_txnode 1751 0.8443 vmlinux __alloc_skb 1657 0.7989 vmlinux ip_rcv 1633 0.7874 vmlinux __copy_skb_header 1619 0.7806 wlan.ko ieee80211_hardstart 1566 0.7551 vmlinux pskb_expand_head 1505 0.7257 vmlinux ip_forward 1360 0.6557 vmlinux skb_release_data 1273 0.6138 wlan.ko ieee80211_saveath 1250 0.6027 vmlinux default_idle 1222 0.5892 vmlinux nf_nat_fn 1206 0.5815 ath_rate_minstrel.ko .text 1197 0.5772 vmlinux xscale_dma_inv_range 1182 0.5699 vmlinux kmem_cache_alloc 1130 0.5448 vmlinux skb_release_all 1064 0.5130 vmlinux ip_finish_output 1062 0.5121 ath_pci.ko ath_sysctl_unregister 1039 0.5010 vmlinux netlink_recvmsg 1038 0.5005 vmlinux ip_route_input 1031 0.4971 vmlinux dma_map_single 1008 0.4860 vmlinux local_bh_disable 990 0.4773 vmlinux skb_copy 982 0.4735 vmlinux nf_ct_invert_tuple 967 0.4663 vmlinux nf_hook_slow 936 0.4513 vmlinux ip_rcv_finish 914 0.4407 vmlinux kmem_cache_free 901 0.4344 vmlinux udp_error 865 0.4171 vmlinux __wake_up 864 0.4166 vmlinux __mod_timer 845 0.4074 vmlinux dev_hard_start_xmit 843 0.4065 vmlinux death_by_timeout 802 0.3867 vmlinux __nf_ct_refresh_acct 764 0.3684 vmlinux memset 733 0.3534 vmlinux memcmp 732 0.3529 vmlinux __nf_ct_l4proto_find 730 0.3520 vmlinux pfifo_fast_enqueue 692 0.3337 vmlinux destroy_conntrack 688 0.3317 vmlinux ip_rt_send_redirect 679 0.3274 wlan.ko ieee80211_encap 678 0.3269 vmlinux dma_unmap_single 647 0.3120 wlan.ko ieee80211_unref_node 643 0.3100 vmlinux __kfree_skb 632 0.3047 vmlinux nlmsg_notify 601 0.2898 vmlinux ip_output 600 0.2893 vmlinux dma_cache_maint 597 0.2879 vmlinux kfree_skb 593 0.2859 vmlinux __udivsi3 590 0.2845 vmlinux ipv4_get_l4proto 586 0.2825 vmlinux skb_checksum 582 0.2806 vmlinux netlink_has_listeners 576 0.2777 vmlinux __aeabi_idiv 568 0.2739 vmlinux notifier_call_chain 534 0.2575 wlan.ko ieee80211_dev_alloc_skb 518 0.2498 vmlinux nf_ct_port_tuple_to_nlattr 518 0.2498 vmlinux nf_nat_packet 514 0.2478 wlan.ko ieee80211_skb_track 508 0.2449 vmlinux nf_ct_deliver_cached_events 499 0.2406 vmlinux nf_nat_rule_find 490 0.2363 vmlinux nf_ip_checksum 481 0.2319 vmlinux __nf_ct_ext_add 479 0.2310 vmlinux nf_nat_cleanup_conntrack 471 0.2271 vmlinux rt_hash_code 469 0.2261 wlan.ko ieee80211_cancel_scan 464 0.2237 vmlinux nf_ct_invert_tuplepr 456 0.2199 vmlinux nf_ct_get_tuple 455 0.2194 vmlinux __nf_ct_helper_find 439 0.2117 vmlinux ipv4_tuple_to_nlattr 434 0.2093 vmlinux nf_ct_l3proto_find_get 430 0.2073 vmlinux alloc_null_binding 424 0.2044 vmlinux del_timer 419 0.2020 vmlinux __nf_ct_ext_destroy 413 0.1991 vmlinux sock_def_readable 403 0.1943 vmlinux udp_pkt_to_tuple 401 0.1933 vmlinux dma_sync_single_for_cpu 396 0.1909 vmlinux ipv4_pkt_to_tuple 386 0.1861 vmlinux copy_skb_header 384 0.1852 vmlinux ipv4_invert_tuple 377 0.1818 vmlinux __nf_ct_event_cache_init 365 0.1760 vmlinux nfnetlink_send 363 0.1750 vmlinux __skb_checksum_complete_head 363 0.1750 vmlinux nf_ct_l4proto_find_get 356 0.1717 vmlinux module_put 354 0.1707 vmlinux skb_copy_bits 343 0.1654 vmlinux cpu_idle 342 0.1649 vmlinux sock_recvmsg 333 0.1606 vmlinux nf_nat_in 332 0.1601 wlan.ko ieee80211_parent_queue_xmit 330 0.1591 vmlinux ipv4_conntrack_defrag 323 0.1557 vmlinux ipt_route_hook 320 0.1543 vmlinux nf_ct_remove_expectations 319 0.1538 vmlinux __nf_ct_expect_find 314 0.1514 vmlinux sys_recvfrom 309 0.1490 vmlinux udp_invert_tuple 299 0.1442 vmlinux atomic_notifier_call_chain 295 0.1422 wlan.ko ieee80211_ref_node 293 0.1413 vmlinux ipv4_confirm 290 0.1398 vmlinux pfifo_fast_dequeue 284 0.1369 vmlinux ipv4_conntrack_help 280 0.1350 vmlinux __nf_conntrack_hash_insert 274 0.1321 wlan.ko ieee80211_setup_rates 273 0.1316 vmlinux nf_conntrack_find_get 273 0.1316 vmlinux nf_nat_used_tuple 271 0.1307 vmlinux nf_ct_l3proto_put 267 0.1287 vmlinux nf_conntrack_tuple_taken 267 0.1287 vmlinux nfnetlink_has_listeners 263 0.1268 vmlinux udp_packet 261 0.1258 vmlinux dev_kfree_skb_any 259 0.1249 vmlinux nf_nat_out 247 0.1191 vmlinux xscale_dma_clean_range 246 0.1186 libuClibc-0.9.29.so /rom/lib/libuClibc-0.9.29.so 246 0.1186 wlan.ko ieee80211_dev_kfree_skb 244 0.1176 vmlinux __umodsi3 241 0.1162 vmlinux dma_needs_bounce 240 0.1157 vmlinux __do_softirq 240 0.1157 vmlinux lock_timer_base 234 0.1128 iptable_raw.ko .text 233 0.1123 vmlinux nf_ct_find_expectation 232 0.1119 ld-uClibc-0.9.29.so /rom/lib/ld-uClibc-0.9.29.so 232 0.1119 vmlinux cpu_xscale_switch_mm 230 0.1109 vmlinux skb_queue_tail 228 0.1099 vmlinux ip_forward_finish 228 0.1099 vmlinux nf_ct_l4proto_put 226 0.1090 vmlinux net_rx_action 209 0.1008 vmlinux add_event_entry 209 0.1008 vmlinux skb_dequeue 208 0.1003 vmlinux init_timer 200 0.0964 vmlinux vector_swi 191 0.0921 vmlinux __csum_ipv6_magic 188 0.0906 vmlinux memcpy_toiovec 188 0.0906 vmlinux skb_recv_datagram 178 0.0858 vmlinux sync_buffer 166 0.0800 vmlinux skb_copy_datagram_iovec 163 0.0786 vmlinux nf_conntrack_destroy 158 0.0762 vmlinux __atomic_notifier_call_chain 158 0.0762 vmlinux ipv4_conntrack_in 151 0.0728 vmlinux nf_conntrack_free 148 0.0714 vmlinux schedule 144 0.0694 vmlinux ip_sabotage_in 142 0.0685 vmlinux xscale_flush_user_cache_range 138 0.0665 vmlinux ipt_hook 135 0.0651 vmlinux __qdisc_run 134 0.0646 vmlinux nf_nat_adjust 128 0.0617 busybox /rom/bin/busybox 126 0.0608 wlan.ko ieee80211_iterate_dev_nodes 124 0.0598 vmlinux __skb_checksum_complete 123 0.0593 vmlinux sock_rfree 121 0.0583 ctevtest main 108 0.0521 vmlinux fget_light 101 0.0487 vmlinux add_sample_entry 99 0.0477 libnfnetlink.so.0.2.0 /rom/usr/lib/libnfnetlink.so.0.2.0 99 0.0477 vmlinux mc_copy_user_page 98 0.0473 vmlinux helper_hash 98 0.0473 vmlinux update_mmu_cache 97 0.0468 vmlinux sys_recv 94 0.0453 vmlinux neigh_resolve_output 87 0.0419 vmlinux net_tx_action 83 0.0400 vmlinux __napi_schedule 71 0.0342 vmlinux tasklet_action 70 0.0338 vmlinux xscale_mc_clear_user_page 64 0.0309 vmlinux run_timer_softirq 57 0.0275 vmlinux netlink_overrun 56 0.0270 libnetfilter_conntrack.so.1.2.0 /jffs/usr/lib/libnetfilter_conntrack.so.1.2.0 53 0.0256 vmlinux skb_free_datagram 40 0.0193 vmlinux __flush_whole_cache 40 0.0193 vmlinux udp_new 39 0.0188 vmlinux get_page_from_freelist 37 0.0178 vmlinux eth_header 34 0.0164 vmlinux tick_nohz_stop_sched_tick 32 0.0154 vmlinux tick_nohz_restart_sched_tick 29 0.0140 vmlinux handle_mm_fault 27 0.0130 vmlinux __link_path_walk 27 0.0130 vmlinux ret_fast_syscall 26 0.0125 vmlinux __tasklet_schedule 25 0.0121 arp_tables.ko arpt_do_table 24 0.0116 ctevtest .plt 24 0.0116 vmlinux find_lock_page 24 0.0116 vmlinux unmap_vmas 23 0.0111 vmlinux hrtimer_run_queues 22 0.0106 vmlinux __switch_to 22 0.0106 vmlinux find_vma 22 0.0106 vmlinux pfifo_fast_requeue 21 0.0101 vmlinux __queue_work 21 0.0101 vmlinux do_page_fault 18 0.0087 vmlinux eth_poll 17 0.0082 vmlinux __dabt_usr 16 0.0077 vmlinux __do_fault 15 0.0072 vmlinux arp_process 14 0.0068 vmlinux filemap_fault 12 0.0058 vmlinux cache_reap 11 0.0053 vmlinux put_page 10 0.0048 vmlinux __rcu_process_callbacks 9 0.0043 vmlinux copy_process 9 0.0043 vmlinux do_alignment 9 0.0043 vmlinux drain_array 9 0.0043 vmlinux dsp_do 9 0.0043 vmlinux fib_semantic_match 9 0.0043 vmlinux free_hot_cold_page 9 0.0043 wlan.ko ieee80211_beacon_update 8 0.0039 vmlinux __d_lookup 8 0.0039 vmlinux vma_adjust 7 0.0034 vmlinux copy_page_range 7 0.0034 vmlinux fib_validate_source 7 0.0034 vmlinux zone_watermark_ok 6 0.0029 vmlinux __wake_up_bit 6 0.0029 vmlinux anon_vma_unlink 6 0.0029 vmlinux cpu_xscale_set_pte_ext 6 0.0029 vmlinux do_DataAbort 6 0.0029 vmlinux do_alignment_ldrstr 6 0.0029 vmlinux load_elf_binary 6 0.0029 vmlinux mdio_read 6 0.0029 vmlinux neigh_lookup 6 0.0029 vmlinux xscale_flush_kern_dcache_page 6 0.0029 wlan.ko ieee80211_skb_untrack 5 0.0024 vmlinux __clear_user 5 0.0024 vmlinux __page_set_anon_rmap 5 0.0024 vmlinux arp_hash 5 0.0024 vmlinux arp_rcv 5 0.0024 vmlinux do_exit 5 0.0024 vmlinux do_lookup 5 0.0024 vmlinux do_wp_page 5 0.0024 vmlinux dput 5 0.0024 vmlinux unlock_page 4 0.0019 vmlinux __up_read 4 0.0019 vmlinux find_get_page 4 0.0019 vmlinux find_vma_prev 4 0.0019 vmlinux finish_task_switch 4 0.0019 vmlinux fn_hash_lookup 4 0.0019 vmlinux fput 4 0.0019 vmlinux free_pgtables 4 0.0019 vmlinux neigh_update 4 0.0019 vmlinux page_add_file_rmap 4 0.0019 vmlinux qmgr_irq1 4 0.0019 vmlinux up_read 4 0.0019 vmlinux vma_link 4 0.0019 wlan.ko ieee80211_recv_mgmt 3 0.0014 vmlinux __copy_from_user 3 0.0014 vmlinux __dentry_open 3 0.0014 vmlinux __down_read_trylock 3 0.0014 vmlinux __pabt_usr 3 0.0014 vmlinux __pagevec_lru_add_active 3 0.0014 vmlinux __udp4_lib_rcv 3 0.0014 vmlinux __up_write 3 0.0014 vmlinux __vm_enough_memory 3 0.0014 vmlinux __vma_link 3 0.0014 vmlinux anon_vma_link 3 0.0014 vmlinux anon_vma_prepare 3 0.0014 vmlinux cpu_xscale_dcache_clean_area 3 0.0014 vmlinux do_mmap_pgoff 3 0.0014 vmlinux do_munmap 3 0.0014 vmlinux do_sync_read 3 0.0014 vmlinux down_read_trylock 3 0.0014 vmlinux eth_type_trans 3 0.0014 vmlinux file_read_actor 3 0.0014 vmlinux filp_close 3 0.0014 vmlinux find_mergeable_anon_vma 3 0.0014 vmlinux finish_wait 3 0.0014 vmlinux generic_fillattr 3 0.0014 vmlinux lru_add_drain 3 0.0014 vmlinux mark_page_accessed 3 0.0014 vmlinux mmap_region 3 0.0014 vmlinux page_remove_rmap 3 0.0014 vmlinux release_mm 3 0.0014 vmlinux ret_to_user 3 0.0014 vmlinux touch_atime 3 0.0014 vmlinux v4wbi_flush_user_tlb_range 3 0.0014 vmlinux vm_normal_page 3 0.0014 wlan.ko ieee80211_fix_rate 2 9.6e-04 vmlinux __alloc_pages 2 9.6e-04 vmlinux __dabt_svc 2 9.6e-04 vmlinux __down_read 2 9.6e-04 vmlinux __free_pages 2 9.6e-04 vmlinux __rb_rotate_left 2 9.6e-04 vmlinux __remove_shared_vm_struct 2 9.6e-04 vmlinux __user_walk_fd 2 9.6e-04 vmlinux _atomic_dec_and_lock 2 9.6e-04 vmlinux acct_collect 2 9.6e-04 vmlinux add_to_page_cache 2 9.6e-04 vmlinux arch_get_unmapped_area 2 9.6e-04 vmlinux can_vma_merge_after 2 9.6e-04 vmlinux cap_vm_enough_memory 2 9.6e-04 vmlinux check_mini_fo_file 2 9.6e-04 vmlinux cond_resched 2 9.6e-04 vmlinux current_fs_time 2 9.6e-04 vmlinux delayed_work_timer_fn 2 9.6e-04 vmlinux dentry_open 2 9.6e-04 vmlinux dnotify_parent 2 9.6e-04 vmlinux do_path_lookup 2 9.6e-04 vmlinux down_read 2 9.6e-04 vmlinux eth_rx_irq 2 9.6e-04 vmlinux fget 2 9.6e-04 vmlinux file_free_rcu 2 9.6e-04 vmlinux file_ra_state_init 2 9.6e-04 vmlinux find_vma_prepare 2 9.6e-04 vmlinux free_page_and_swap_cache 2 9.6e-04 vmlinux free_pgd_slow 2 9.6e-04 vmlinux get_pgd_slow 2 9.6e-04 vmlinux getname 2 9.6e-04 vmlinux getnstimeofday 2 9.6e-04 vmlinux kthread_should_stop 2 9.6e-04 vmlinux locks_remove_posix 2 9.6e-04 vmlinux may_expand_vm 2 9.6e-04 vmlinux mini_fo_d_delete 2 9.6e-04 vmlinux mini_fo_d_revalidate 2 9.6e-04 vmlinux mini_fo_flush 2 9.6e-04 vmlinux mini_fo_read 2 9.6e-04 vmlinux mini_fo_readlink 2 9.6e-04 vmlinux mntput_no_expire 2 9.6e-04 vmlinux move_page_tables 2 9.6e-04 vmlinux mprotect_fixup 2 9.6e-04 vmlinux mutex_lock 2 9.6e-04 vmlinux page_getlink 2 9.6e-04 vmlinux page_waitqueue 2 9.6e-04 vmlinux permission 2 9.6e-04 vmlinux prepare_to_wait_exclusive 2 9.6e-04 vmlinux proc_flush_task 2 9.6e-04 vmlinux ramfs_get_inode 2 9.6e-04 vmlinux rb_insert_color 2 9.6e-04 vmlinux read_cache_page 2 9.6e-04 vmlinux ret_from_exception 2 9.6e-04 vmlinux run_workqueue 2 9.6e-04 vmlinux split_vma 2 9.6e-04 vmlinux sys_close 2 9.6e-04 vmlinux sys_read 2 9.6e-04 vmlinux timespec_trunc 2 9.6e-04 vmlinux unlink_file_vma 2 9.6e-04 vmlinux vfs_read 2 9.6e-04 vmlinux vm_stat_account 2 9.6e-04 vmlinux worker_thread 2 9.6e-04 vmlinux wq_sync_buffer 2 9.6e-04 vmlinux write_chan 2 9.6e-04 vmlinux xscale_mc_copy_user_page 2 9.6e-04 wlan.ko ieee80211_find_rxnode 1 4.8e-04 arptable_filter.ko .text 1 4.8e-04 vmlinux __anon_vma_link 1 4.8e-04 vmlinux __down_write 1 4.8e-04 vmlinux __down_write_nested 1 4.8e-04 vmlinux __fput 1 4.8e-04 vmlinux __get_free_pages 1 4.8e-04 vmlinux __get_user_4 1 4.8e-04 vmlinux __lookup_mnt 1 4.8e-04 vmlinux __mark_inode_dirty 1 4.8e-04 vmlinux __netdev_alloc_skb 1 4.8e-04 vmlinux __strncpy_from_user 1 4.8e-04 vmlinux __strnlen_user 1 4.8e-04 vmlinux __wake_up_sync 1 4.8e-04 vmlinux _clear_bit_be 1 4.8e-04 vmlinux acct_stack_growth 1 4.8e-04 vmlinux bit_waitqueue 1 4.8e-04 vmlinux cap_bprm_secureexec 1 4.8e-04 vmlinux compute_creds 1 4.8e-04 vmlinux copy_files 1 4.8e-04 vmlinux copy_namespaces 1 4.8e-04 vmlinux copy_strings 1 4.8e-04 vmlinux cp_new_stat 1 4.8e-04 vmlinux cp_new_stat64 1 4.8e-04 vmlinux current_kernel_time 1 4.8e-04 vmlinux d_alloc 1 4.8e-04 vmlinux dnotify_flush 1 4.8e-04 vmlinux do_generic_mapping_read 1 4.8e-04 vmlinux do_translation_fault 1 4.8e-04 vmlinux dup_fd 1 4.8e-04 vmlinux dupfd 1 4.8e-04 vmlinux expand_files 1 4.8e-04 vmlinux free_hot_page 1 4.8e-04 vmlinux generic_file_mmap 1 4.8e-04 vmlinux generic_readlink 1 4.8e-04 vmlinux get_empty_filp 1 4.8e-04 vmlinux get_signal_to_deliver 1 4.8e-04 vmlinux get_task_mm 1 4.8e-04 vmlinux get_unused_fd_flags 1 4.8e-04 vmlinux get_user_pages 1 4.8e-04 vmlinux half_md4_transform 1 4.8e-04 vmlinux hrtimer_start 1 4.8e-04 vmlinux ip_local_deliver 1 4.8e-04 vmlinux iput 1 4.8e-04 vmlinux ixp4xx_get_cycles 1 4.8e-04 vmlinux ktime_get 1 4.8e-04 vmlinux link_path_walk 1 4.8e-04 vmlinux lookup_mnt 1 4.8e-04 vmlinux lru_cache_add_active 1 4.8e-04 vmlinux max_sane_readahead 1 4.8e-04 vmlinux may_open 1 4.8e-04 vmlinux mini_fo_d_compare 1 4.8e-04 vmlinux mini_fo_follow_link 1 4.8e-04 vmlinux mini_fo_getattr 1 4.8e-04 vmlinux mini_fo_open 1 4.8e-04 vmlinux mini_fo_put_link 1 4.8e-04 vmlinux mm_init 1 4.8e-04 vmlinux mutex_unlock 1 4.8e-04 vmlinux new_inode 1 4.8e-04 vmlinux notify_change 1 4.8e-04 vmlinux open_exec 1 4.8e-04 vmlinux open_namei 1 4.8e-04 vmlinux page_follow_link_light 1 4.8e-04 vmlinux path_walk 1 4.8e-04 vmlinux pdflush_operation 1 4.8e-04 vmlinux pipe_read 1 4.8e-04 vmlinux pipe_read_fasync 1 4.8e-04 vmlinux pipe_read_release 1 4.8e-04 vmlinux pipe_write_fasync 1 4.8e-04 vmlinux prepare_to_wait 1 4.8e-04 vmlinux proc_pident_lookup 1 4.8e-04 vmlinux process_task_mortuary 1 4.8e-04 vmlinux put_pid 1 4.8e-04 vmlinux qmgr_disable_irq 1 4.8e-04 vmlinux qmgr_enable_irq 1 4.8e-04 vmlinux queue_delayed_work_on 1 4.8e-04 vmlinux rb_erase 1 4.8e-04 vmlinux release_pages 1 4.8e-04 vmlinux release_task 1 4.8e-04 vmlinux remove_vma 1 4.8e-04 vmlinux schedule_delayed_work 1 4.8e-04 vmlinux seq_escape 1 4.8e-04 vmlinux serial_in 1 4.8e-04 vmlinux strlen 1 4.8e-04 vmlinux sys_brk 1 4.8e-04 vmlinux sys_fcntl64 1 4.8e-04 vmlinux sys_ioctl 1 4.8e-04 vmlinux sys_mprotect 1 4.8e-04 vmlinux sys_munmap 1 4.8e-04 vmlinux sysctl_head_next 1 4.8e-04 vmlinux tty_ioctl 1 4.8e-04 vmlinux uart_start 1 4.8e-04 vmlinux unmap_region 1 4.8e-04 vmlinux vfs_write 1 4.8e-04 vmlinux vma_merge 1 4.8e-04 vmlinux vsnprintf 1 4.8e-04 wlan.ko ieee80211_chan2ieee 1 4.8e-04 wlan.ko ieee80211_ibss_merge 1 4.8e-04 wlan.ko ieee80211_iterate_nodes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-21 15:51 ` Fabian Hugelshofer @ 2008-07-21 15:59 ` Patrick McHardy 2008-07-21 17:49 ` Fabian Hugelshofer 0 siblings, 1 reply; 30+ messages in thread From: Patrick McHardy @ 2008-07-21 15:59 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: Pablo Neira Ayuso, netfilter-devel Fabian Hugelshofer wrote: > It took me some time to set up profiling on the router. I am using > oprofile. nf_conntrack and nfnetlink are built into the kernel. Most of > the time is probably spent in nfnetlink. > > If you need other data than the one provided here, just let me know. I'm > not very familiar with oprofile, so providing the needed arguments would > be helpful. Thanks. > opreport --symbols (first 20, full file attached): > CPU: ARM/XScale PMU2, speed 0 MHz (estimated) > Counted CPU_CYCLES events (clock cycles counter) with a unit mask of > 0x00 (No unit mask) count 100000 > samples % app name symbol name > 20018 9.6520 vmlinux memcpy Callgraph information would be useful since its unclear whether this is the memcpy triggered by netlink message trimming in af_netlink.c or something different. Unfortunately according to the documentation this is only supported on x86. I think selecting the netfilter options as modules should provide slightly more detail though. > 19493 9.3988 ath_pci.ko ath_sysctl_register This looks odd. I couldn't find this function in the current kernel tree, which version are you using? > 6676 3.2189 vmlinux __nf_conntrack_find > 6098 2.9402 vmlinux ipt_do_table > 5858 2.8245 wlan.ko ieee80211_input > 5383 2.5955 nf_conntrack_netlink.ko .text > 4567 2.2020 vmlinux __memzero > 4225 2.0371 vmlinux __kmalloc > 4091 1.9725 vmlinux csum_partial You can disable conntrack checksumming by executing: echo 0 >/proc/sys/net/netfilter/nf_conntrack_checksum > 3469 1.6726 vmlinux nf_nat_setup_info > 3468 1.6721 vmlinux netlink_broadcast > 3376 1.6278 vmlinux __hash_conntrack > 3304 1.5931 vmlinux nla_put > 2992 1.4426 vmlinux __nla_reserve > 2933 1.4142 vmlinux __nf_conntrack_confirm > 2926 1.4108 oprofiled /jffs/usr/bin/oprofiled > 2847 1.3727 ath_pci.ko ath_intr > 2698 1.3009 vmlinux kfree > 2655 1.2801 vmlinux __nla_put > 2563 1.2358 vmlinux nf_conntrack_in > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-21 15:59 ` Patrick McHardy @ 2008-07-21 17:49 ` Fabian Hugelshofer 2008-07-23 14:32 ` Fabian Hugelshofer 0 siblings, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-21 17:49 UTC (permalink / raw) To: Patrick McHardy; +Cc: Pablo Neira Ayuso, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2100 bytes --] Patrick McHardy wrote: > Fabian Hugelshofer wrote: >> opreport --symbols (first 20, full file attached): >> CPU: ARM/XScale PMU2, speed 0 MHz (estimated) >> Counted CPU_CYCLES events (clock cycles counter) with a unit mask of >> 0x00 (No unit mask) count 100000 >> samples % app name symbol name >> 20018 9.6520 vmlinux memcpy > > Callgraph information would be useful since its unclear whether > this is the memcpy triggered by netlink message trimming in > af_netlink.c or something different. Unfortunately according > to the documentation this is only supported on x86. I think > selecting the netfilter options as modules should provide > slightly more detail though. Callgraphs work also with ARM. I had to enable callgraph already for capturing, which I did not before. I collected a deepness of 5 levels. You find the result attached to this email. For help on interpreting read: http://oprofile.sourceforge.net/doc/opreport.html#opreport-callgraph It basically shows the same list as before (unindented) with callers on top and callees below. memcpy is mostly invoked by skb_copy and netlink_broadcast (af_netlink). netlink_broadcast is expensive on its own and calls pskb_expand_head which is expensive as well. Using multipart messages would reduce the need to call netlink_broadcast. >> 19493 9.3988 ath_pci.ko ath_sysctl_register > > This looks odd. I couldn't find this function in the current > kernel tree, which version are you using? I am using 2.6.24.2. ath_pci is from the Madwifi drivers. The test setup is all wireless. >> 4091 1.9725 vmlinux csum_partial > > You can disable conntrack checksumming by executing: > > echo 0 >/proc/sys/net/netfilter/nf_conntrack_checksum I don't think that this has a significant impact. As I wrote it is possible to capture all packets (same traffic) with ULOG, parse the headers, do lookups and the counting without problems. For this checksumming is enabled as well. The problems only start if I enable connection events capturing. [-- Attachment #2: oprep_symb_cg5.txt.tar.gz --] [-- Type: application/x-gzip, Size: 32505 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-21 17:49 ` Fabian Hugelshofer @ 2008-07-23 14:32 ` Fabian Hugelshofer 2008-07-23 14:38 ` Patrick McHardy 0 siblings, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-23 14:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Patrick McHardy, Pablo Neira Ayuso Fabian Hugelshofer wrote: > Patrick McHardy wrote: >> Callgraph information would be useful since its unclear whether >> this is the memcpy triggered by netlink message trimming in >> af_netlink.c or something different. Unfortunately according >> to the documentation this is only supported on x86. I think >> selecting the netfilter options as modules should provide >> slightly more detail though. [...] > > memcpy is mostly invoked by skb_copy and netlink_broadcast (af_netlink). > netlink_broadcast is expensive on its own and calls pskb_expand_head > which is expensive as well. Using multipart messages would reduce the > need to call netlink_broadcast. I profiled again with nfnetlink and nf_conntrack compiled as modules: 103599 61.1842 vmlinux 24481 14.4582 ath_pci 19232 11.3582 nf_conntrack 10435 6.1628 wlan 3588 2.1190 nf_conntrack_netlink 2869 1.6944 oprofiled 1886 1.1138 nf_conntrack_ipv4 1447 0.8546 ath_rate_minstrel 627 0.3703 nfnetlink 237 0.1400 ld-uClibc-0.9.29.so 233 0.1376 libuClibc-0.9.29.so 183 0.1081 iptable_raw 174 0.1028 ctevtest 147 0.0868 busybox 85 0.0502 libnfnetlink.so.0.2.0 60 0.0354 libnetfilter_conntrack.so.1.2.0 38 0.0224 arp_tables 2 0.0012 arptable_filter Again most of the time is spent in the kernel. Memory and skb operations are accounted there. I suspect that they cause the most overhead. Do you plan to dig deeper into optimising the non-optimal parts? I consider myself not to have enough understanding to do it myself. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 14:32 ` Fabian Hugelshofer @ 2008-07-23 14:38 ` Patrick McHardy 2008-07-23 16:12 ` Fabian Hugelshofer 0 siblings, 1 reply; 30+ messages in thread From: Patrick McHardy @ 2008-07-23 14:38 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel, Pablo Neira Ayuso [-- Attachment #1: Type: text/plain, Size: 1884 bytes --] Fabian Hugelshofer wrote: > Fabian Hugelshofer wrote: >> Patrick McHardy wrote: >>> Callgraph information would be useful since its unclear whether >>> this is the memcpy triggered by netlink message trimming in >>> af_netlink.c or something different. Unfortunately according >>> to the documentation this is only supported on x86. I think >>> selecting the netfilter options as modules should provide >>> slightly more detail though. > [...] >> >> memcpy is mostly invoked by skb_copy and netlink_broadcast >> (af_netlink). netlink_broadcast is expensive on its own and calls >> pskb_expand_head which is expensive as well. Using multipart messages >> would reduce the need to call netlink_broadcast. > > I profiled again with nfnetlink and nf_conntrack compiled as modules: > 103599 61.1842 vmlinux > 24481 14.4582 ath_pci > 19232 11.3582 nf_conntrack > 10435 6.1628 wlan > 3588 2.1190 nf_conntrack_netlink > 2869 1.6944 oprofiled > 1886 1.1138 nf_conntrack_ipv4 > 1447 0.8546 ath_rate_minstrel > 627 0.3703 nfnetlink > 237 0.1400 ld-uClibc-0.9.29.so > 233 0.1376 libuClibc-0.9.29.so > 183 0.1081 iptable_raw > 174 0.1028 ctevtest > 147 0.0868 busybox > 85 0.0502 libnfnetlink.so.0.2.0 > 60 0.0354 libnetfilter_conntrack.so.1.2.0 > 38 0.0224 arp_tables > 2 0.0012 arptable_filter > > Again most of the time is spent in the kernel. Memory and skb operations > are accounted there. I suspect that they cause the most overhead. > > Do you plan to dig deeper into optimising the non-optimal parts? I > consider myself not to have enough understanding to do it myself. The first thing to try would be to use sane allocation sizes for the event messages. This patch doesn't implement it properly (uses probing), but should be enough to test whether it helps. [-- Attachment #2: x --] [-- Type: text/plain, Size: 991 bytes --] diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 105a616..0aa1b30 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -425,6 +425,7 @@ static int ctnetlink_conntrack_event(struct notifier_block *this, unsigned int type; sk_buff_data_t b; unsigned int flags = 0, group; + static unsigned int size = 128; /* ignore our fake conntrack entry */ if (ct == &nf_conntrack_untracked) @@ -446,7 +447,8 @@ static int ctnetlink_conntrack_event(struct notifier_block *this, if (!nfnetlink_has_listeners(group)) return NOTIFY_DONE; - skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); +retry: + skb = alloc_skb(size, GFP_ATOMIC); if (!skb) return NOTIFY_DONE; @@ -525,7 +527,8 @@ static int ctnetlink_conntrack_event(struct notifier_block *this, nlmsg_failure: nla_put_failure: kfree_skb(skb); - return NOTIFY_DONE; + size <<= 1; + goto retry; } #endif /* CONFIG_NF_CONNTRACK_EVENTS */ ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 14:38 ` Patrick McHardy @ 2008-07-23 16:12 ` Fabian Hugelshofer 2008-07-23 17:01 ` Patrick McHardy 2008-07-25 8:44 ` Fabian Hugelshofer 0 siblings, 2 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-23 16:12 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel, Pablo Neira Ayuso [-- Attachment #1: Type: text/plain, Size: 2286 bytes --] Patrick McHardy wrote: > Fabian Hugelshofer wrote: >> Again most of the time is spent in the kernel. Memory and skb >> operations are accounted there. I suspect that they cause the most >> overhead. >> >> Do you plan to dig deeper into optimising the non-optimal parts? I >> consider myself not to have enough understanding to do it myself. > > The first thing to try would be to use sane allocation sizes > for the event messages. This patch doesn't implement it properly > (uses probing), but should be enough to test whether it helps. Thanks a lot. This patch already decreased the CPU usage for ctevtest from 85% to 44%. Sweet... I created a new callgraph profile which you find attached to this mail. Let's have a look at two parts: First: 2055 2.7205 ctnetlink_conntrack_event 2378 21.6201 nla_put 2181 19.8291 nfnetlink_send 2055 18.6835 ctnetlink_conntrack_event [self] 1250 11.3647 __alloc_skb 955 8.6826 ipv4_tuple_to_nlattr 752 6.8370 nf_ct_port_tuple_to_nlattr 321 2.9184 __memzero 220 2.0002 nfnetlink_has_listeners 177 1.6092 nf_ct_l4proto_find_get 155 1.4092 __nla_put 116 1.0546 nf_ct_l3proto_find_get 82 0.7455 module_put 70 0.6364 nf_ct_l4proto_put 66 0.6001 nf_ct_l3proto_put 60 0.5455 nlmsg_notify 43 0.3909 netlink_has_listeners 42 0.3819 __kmalloc 37 0.3364 kmem_cache_alloc 26 0.2364 __nf_ct_l4proto_find 13 0.1182 __irq_svc nf_conntrack_event is now one of the first functions listed. Do you see other ways of improving performance? Second: 33 2.4775 __nf_ct_ext_add 63 4.7297 dev_hard_start_xmit 65 4.8799 sock_recvmsg 77 5.7808 netif_receive_skb 92 6.9069 __nla_put 96 7.2072 nf_conntrack_alloc 199 14.9399 nf_conntrack_in 246 18.4685 skb_copy 427 32.0571 nf_ct_invert_tuplepr 1793 2.3737 __memzero 1793 100.000 __memzero [self] Is the zeroing of the inverted tuple in nf_ct_invert_tuple really required? As far as I can see all fields are set by the subsequent code. [-- Attachment #2: opreport_cg_patch.tar.gz --] [-- Type: application/x-gzip, Size: 34247 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 16:12 ` Fabian Hugelshofer @ 2008-07-23 17:01 ` Patrick McHardy 2008-07-23 17:07 ` Patrick McHardy 2008-07-23 17:15 ` Fabian Hugelshofer 2008-07-25 8:44 ` Fabian Hugelshofer 1 sibling, 2 replies; 30+ messages in thread From: Patrick McHardy @ 2008-07-23 17:01 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel, Pablo Neira Ayuso Fabian Hugelshofer wrote: > Patrick McHardy wrote: >> Fabian Hugelshofer wrote: >>> Again most of the time is spent in the kernel. Memory and skb >>> operations are accounted there. I suspect that they cause the most >>> overhead. >>> >>> Do you plan to dig deeper into optimising the non-optimal parts? I >>> consider myself not to have enough understanding to do it myself. >> >> The first thing to try would be to use sane allocation sizes >> for the event messages. This patch doesn't implement it properly >> (uses probing), but should be enough to test whether it helps. > > Thanks a lot. This patch already decreased the CPU usage for ctevtest > from 85% to 44%. Sweet... Nice. Now we just need to do it properly :) > I created a new callgraph profile which you find attached to this mail. > Let's have a look at two parts: > > First: > 2055 2.7205 ctnetlink_conntrack_event > 2378 21.6201 nla_put > 2181 19.8291 nfnetlink_send > 2055 18.6835 ctnetlink_conntrack_event [self] > 1250 11.3647 __alloc_skb > 955 8.6826 ipv4_tuple_to_nlattr > 752 6.8370 nf_ct_port_tuple_to_nlattr > 321 2.9184 __memzero > 220 2.0002 nfnetlink_has_listeners > 177 1.6092 nf_ct_l4proto_find_get > 155 1.4092 __nla_put > 116 1.0546 nf_ct_l3proto_find_get > 82 0.7455 module_put > 70 0.6364 nf_ct_l4proto_put > 66 0.6001 nf_ct_l3proto_put > 60 0.5455 nlmsg_notify > 43 0.3909 netlink_has_listeners > 42 0.3819 __kmalloc > 37 0.3364 kmem_cache_alloc > 26 0.2364 __nf_ct_l4proto_find > 13 0.1182 __irq_svc > > nf_conntrack_event is now one of the first functions listed. Do you see > other ways of improving performance? For some members doing in-place message construction instead of copying the data might help, but I couldn only spot few only used rarely. The module reference stuff (module_put/nf_ct_*_find_get etc) is clearly superfluous, this runs in packet processing context and shouldn't use module references but RCU. > Second: > 33 2.4775 __nf_ct_ext_add > 63 4.7297 dev_hard_start_xmit > 65 4.8799 sock_recvmsg > 77 5.7808 netif_receive_skb > 92 6.9069 __nla_put > 96 7.2072 nf_conntrack_alloc > 199 14.9399 nf_conntrack_in > 246 18.4685 skb_copy > 427 32.0571 nf_ct_invert_tuplepr > 1793 2.3737 __memzero > 1793 100.000 __memzero [self] > > Is the zeroing of the inverted tuple in nf_ct_invert_tuple really > required? As far as I can see all fields are set by the subsequent code. It dependfs on the protocol family. For IPv6 its completely unnecessary, for IPv4 the last 12 bytes of each address need to be zeroes. We could push this down to the protocols to behave more optimally (actually something I started and didn't finish some time ago). ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:01 ` Patrick McHardy @ 2008-07-23 17:07 ` Patrick McHardy 2008-07-23 17:30 ` Fabian Hugelshofer 2008-07-23 17:15 ` Fabian Hugelshofer 1 sibling, 1 reply; 30+ messages in thread From: Patrick McHardy @ 2008-07-23 17:07 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel, Pablo Neira Ayuso Patrick McHardy wrote: > Fabian Hugelshofer wrote: >> Is the zeroing of the inverted tuple in nf_ct_invert_tuple really >> required? As far as I can see all fields are set by the subsequent code. > > It dependfs on the protocol family. For IPv6 its completely > unnecessary, for IPv4 the last 12 bytes of each address need > to be zeroes. We could push this down to the protocols to > behave more optimally (actually something I started and didn't > finish some time ago). Actually that really is necessary because we have padding in the tuple on at least ARM. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:07 ` Patrick McHardy @ 2008-07-23 17:30 ` Fabian Hugelshofer 2008-07-23 17:32 ` Patrick McHardy 0 siblings, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-23 17:30 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel, Pablo Neira Ayuso Patrick McHardy wrote: > Patrick McHardy wrote: >> Fabian Hugelshofer wrote: >>> Is the zeroing of the inverted tuple in nf_ct_invert_tuple really >>> required? As far as I can see all fields are set by the subsequent code. >> >> It dependfs on the protocol family. For IPv6 its completely >> unnecessary, for IPv4 the last 12 bytes of each address need >> to be zeroes. We could push this down to the protocols to >> behave more optimally (actually something I started and didn't >> finish some time ago). > > Actually that really is necessary because we have padding in the > tuple on at least ARM. As you write the remaining 12 bytes for IPv4 can easily be handled in the l3protocol. The original tuple is already initialised properly and one would just have to replace things like tuple->src.u3.ip = orig->dst.u3.ip; with tuple->src.u3.all = orig->dst.u3.all; in nf_conntrack_l3proto_ipv4.c. Why do you think the padding causes problems? For hashing e.g. src/dst .u3 and .u are referenced independently of potential padding. Where does access to the padding data occur? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:30 ` Fabian Hugelshofer @ 2008-07-23 17:32 ` Patrick McHardy 2008-07-23 17:38 ` Fabian Hugelshofer 0 siblings, 1 reply; 30+ messages in thread From: Patrick McHardy @ 2008-07-23 17:32 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel, Pablo Neira Ayuso Fabian Hugelshofer wrote: > Patrick McHardy wrote: >> Patrick McHardy wrote: >>> Fabian Hugelshofer wrote: >>>> Is the zeroing of the inverted tuple in nf_ct_invert_tuple really >>>> required? As far as I can see all fields are set by the subsequent >>>> code. >>> >>> It dependfs on the protocol family. For IPv6 its completely >>> unnecessary, for IPv4 the last 12 bytes of each address need >>> to be zeroes. We could push this down to the protocols to >>> behave more optimally (actually something I started and didn't >>> finish some time ago). >> >> Actually that really is necessary because we have padding in the >> tuple on at least ARM. > > As you write the remaining 12 bytes for IPv4 can easily be handled in > the l3protocol. The original tuple is already initialised properly and > one would just have to replace things like > tuple->src.u3.ip = orig->dst.u3.ip; > with > tuple->src.u3.all = orig->dst.u3.all; > in nf_conntrack_l3proto_ipv4.c. Correct. > Why do you think the padding causes problems? For hashing e.g. src/dst > .u3 and .u are referenced independently of potential padding. Where does > access to the padding data occur? The hash function hashes the entire tuple in one go. The padding needs to have deterministic content for that. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:32 ` Patrick McHardy @ 2008-07-23 17:38 ` Fabian Hugelshofer 2008-07-23 17:40 ` Patrick McHardy 0 siblings, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-23 17:38 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel, Pablo Neira Ayuso Patrick McHardy wrote: > Fabian Hugelshofer wrote: >> Why do you think the padding causes problems? For hashing e.g. src/dst >> .u3 and .u are referenced independently of potential padding. Where >> does access to the padding data occur? > > The hash function hashes the entire tuple in one go. > The padding needs to have deterministic content for > that. Ok, I see. This has been changed since 2.6.24. Makes sense... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:38 ` Fabian Hugelshofer @ 2008-07-23 17:40 ` Patrick McHardy 0 siblings, 0 replies; 30+ messages in thread From: Patrick McHardy @ 2008-07-23 17:40 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel, Pablo Neira Ayuso Fabian Hugelshofer wrote: > Patrick McHardy wrote: >> Fabian Hugelshofer wrote: >>> Why do you think the padding causes problems? For hashing e.g. >>> src/dst .u3 and .u are referenced independently of potential padding. >>> Where does access to the padding data occur? >> >> The hash function hashes the entire tuple in one go. >> The padding needs to have deterministic content for >> that. > > Ok, I see. This has been changed since 2.6.24. Makes sense... Indeed. That part might also help speed things up for you a bit. But that would also need this fix: commit 443a70d50bdc212e1292778e264ce3d0a85b896f Author: Philip Craig <philipc@snapgear.com> Date: Tue Apr 29 03:35:10 2008 -0700 netfilter: nf_conntrack: padding breaks conntrack hash on ARM ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:01 ` Patrick McHardy 2008-07-23 17:07 ` Patrick McHardy @ 2008-07-23 17:15 ` Fabian Hugelshofer 2008-07-23 17:20 ` Patrick McHardy 1 sibling, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-23 17:15 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel, Pablo Neira Ayuso Patrick McHardy wrote: > Fabian Hugelshofer wrote: >> Patrick McHardy wrote: >>> The first thing to try would be to use sane allocation sizes >>> for the event messages. This patch doesn't implement it properly >>> (uses probing), but should be enough to test whether it helps. >> >> Thanks a lot. This patch already decreased the CPU usage for ctevtest >> from 85% to 44%. Sweet... > > Nice. Now we just need to do it properly :) What do you mean with properly? Put some kind of cap? Or eliminate the guessing by allocating a reasonable small fixed amount? >> nf_conntrack_event is now one of the first functions listed. Do you >> see other ways of improving performance? > > For some members doing in-place message construction instead of > copying the data might help, but I couldn only spot few only > used rarely. > > The module reference stuff (module_put/nf_ct_*_find_get etc) > is clearly superfluous, this runs in packet processing context > and shouldn't use module references but RCU. This goes too deep, that I could help you on this. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:15 ` Fabian Hugelshofer @ 2008-07-23 17:20 ` Patrick McHardy 2008-07-24 13:21 ` Fabian Hugelshofer 0 siblings, 1 reply; 30+ messages in thread From: Patrick McHardy @ 2008-07-23 17:20 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: netfilter-devel, Pablo Neira Ayuso Fabian Hugelshofer wrote: > Patrick McHardy wrote: >> Fabian Hugelshofer wrote: >>> Patrick McHardy wrote: >>>> The first thing to try would be to use sane allocation sizes >>>> for the event messages. This patch doesn't implement it properly >>>> (uses probing), but should be enough to test whether it helps. >>> >>> Thanks a lot. This patch already decreased the CPU usage for ctevtest >>> from 85% to 44%. Sweet... >> >> Nice. Now we just need to do it properly :) > > What do you mean with properly? Put some kind of cap? Or eliminate the > guessing by allocating a reasonable small fixed amount? Yes, ideally it should use exact allocations, but thats probably not worth it because it depends on too many factors. The next best thing would be to allocate exactly the maximum thats needed. The <<= 1 in my patch might still cause reallocations, some (or all) of them could be avoided. >>> nf_conntrack_event is now one of the first functions listed. Do you >>> see other ways of improving performance? >> >> For some members doing in-place message construction instead of >> copying the data might help, but I couldn only spot few only >> used rarely. >> >> The module reference stuff (module_put/nf_ct_*_find_get etc) >> is clearly superfluous, this runs in packet processing context >> and shouldn't use module references but RCU. > > This goes too deep, that I could help you on this. Its not really hard, all the nf_ct_*_find_get functions on the event sending path should be replaced by the corresponding __nf_ct_*_find functions and the _put functions removed. In case dumping functions are used by both the event handler and userspace triggered dumps, the userspace path needs to call rcu_read_lock/rcu_read_unlock. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 17:20 ` Patrick McHardy @ 2008-07-24 13:21 ` Fabian Hugelshofer 2008-07-25 8:51 ` Fabian Hugelshofer 2008-07-25 9:32 ` Pablo Neira Ayuso 0 siblings, 2 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-24 13:21 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel, Pablo Neira Ayuso Patrick McHardy wrote: > Fabian Hugelshofer wrote: > > Patrick McHardy wrote: > >> The module reference stuff (module_put/nf_ct_*_find_get etc) > >> is clearly superfluous, this runs in packet processing context > >> and shouldn't use module references but RCU. > > > > This goes too deep, that I could help you on this. > > Its not really hard, all the nf_ct_*_find_get functions on the event > sending path should be replaced by the corresponding __nf_ct_*_find > functions and the _put functions removed. In case dumping functions > are used by both the event handler and userspace triggered dumps, > the userspace path needs to call rcu_read_lock/rcu_read_unlock. Ok, I did what you said. You find the patch below. Is this what you mean? If yes, I'll give it a try... I only touched the parts from the event notification chain. Should I leave the other find_gets as they are? Is it ok to call rcu_read_lock() multiple times in a row? In ctnetlink_fill_info e.g. I lock and then ctnetlink_dump_helpinfo locks again. Should I move the second lock out of dump_helpinfo? Generally I have the impression, that moving the locks out of some dump functions makes the whole thing less uniform. Is this acceptable? diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 95a7967..b8650e5 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -104,16 +104,14 @@ ctnetlink_dump_tuples(struct sk_buff *skb, struct nf_conntrack_l3proto *l3proto; struct nf_conntrack_l4proto *l4proto; - l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + l3proto = __nf_ct_l3proto_find(tuple->src.l3num); ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto); - nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) return ret; - l4proto = nf_ct_l4proto_find_get(tuple->src.l3num, tuple->dst.protonum); + l4proto = __nf_ct_l4proto_find(tuple->src.l3num, tuple->dst.protonum); ret = ctnetlink_dump_tuples_proto(skb, tuple, l4proto); - nf_ct_l4proto_put(l4proto); return ret; } @@ -150,9 +148,8 @@ ctnetlink_dump_protoinfo(struct sk_buff *skb, const struct nf_conn *ct) struct nlattr *nest_proto; int ret; - l4proto = nf_ct_l4proto_find_get(nf_ct_l3num(ct), nf_ct_protonum(ct)); + l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct)); if (!l4proto->to_nlattr) { - nf_ct_l4proto_put(l4proto); return 0; } @@ -162,14 +159,11 @@ ctnetlink_dump_protoinfo(struct sk_buff *skb, const struct nf_conn *ct) ret = l4proto->to_nlattr(skb, nest_proto, ct); - nf_ct_l4proto_put(l4proto); - nla_nest_end(skb, nest_proto); return ret; nla_put_failure: - nf_ct_l4proto_put(l4proto); return -1; } @@ -384,8 +378,11 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, nest_parms = nla_nest_start(skb, CTA_TUPLE_REPLY | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; - if (ctnetlink_dump_tuples(skb, tuple(ct, IP_CT_DIR_REPLY)) < 0) + rcu_read_lock(); + if (ctnetlink_dump_tuples(skb, tuple(ct, IP_CT_DIR_REPLY)) < 0) { + rcu_read_unlock(); goto nla_put_failure; + } nla_nest_end(skb, nest_parms); if (ctnetlink_dump_status(skb, ct) < 0 || @@ -399,8 +396,11 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, ctnetlink_dump_id(skb, ct) < 0 || ctnetlink_dump_use(skb, ct) < 0 || ctnetlink_dump_master(skb, ct) < 0 || - ctnetlink_dump_nat_seq_adj(skb, ct) < 0) + ctnetlink_dump_nat_seq_adj(skb, ct) < 0) { + rcu_read_unlock(); goto nla_put_failure; + } + rcu_read_unlock(); nlh->nlmsg_len = skb_tail_pointer(skb) - b; return skb->len; @@ -1288,8 +1288,12 @@ ctnetlink_exp_dump_tuple(struct sk_buff *skb, nest_parms = nla_nest_start(skb, type | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; - if (ctnetlink_dump_tuples(skb, tuple) < 0) + rcu_read_lock(); + if (ctnetlink_dump_tuples(skb, tuple) < 0) { + rcu_read_unlock(); goto nla_put_failure; + } + rcu_read_unlock(); nla_nest_end(skb, nest_parms); return 0; ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-24 13:21 ` Fabian Hugelshofer @ 2008-07-25 8:51 ` Fabian Hugelshofer 2008-07-25 9:32 ` Pablo Neira Ayuso 1 sibling, 0 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-25 8:51 UTC (permalink / raw) To: netfilter-devel; +Cc: Patrick McHardy, Pablo Neira Ayuso Fabian Hugelshofer wrote: >>> Patrick McHardy wrote: >>>> The module reference stuff (module_put/nf_ct_*_find_get etc) >>>> is clearly superfluous, this runs in packet processing context >>>> and shouldn't use module references but RCU. > > Ok, I did what you said. You find the patch below. Is this what you > mean? If yes, I'll give it a try... I applied the patch and did the test. The effect of removing unnecessary locks reduced the CPU usage by one percent (packet rates verified to be comparable this time). It's up to you to decide if making the code less nice (IMHO) is worth saving 1% CPU usage under rare circumstances. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-24 13:21 ` Fabian Hugelshofer 2008-07-25 8:51 ` Fabian Hugelshofer @ 2008-07-25 9:32 ` Pablo Neira Ayuso 2008-07-25 11:15 ` Pablo Neira Ayuso 1 sibling, 1 reply; 30+ messages in thread From: Pablo Neira Ayuso @ 2008-07-25 9:32 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: Patrick McHardy, netfilter-devel Fabian Hugelshofer wrote: > @@ -384,8 +378,11 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, > nest_parms = nla_nest_start(skb, CTA_TUPLE_REPLY | NLA_F_NESTED); > if (!nest_parms) > goto nla_put_failure; > - if (ctnetlink_dump_tuples(skb, tuple(ct, IP_CT_DIR_REPLY)) < 0) > + rcu_read_lock(); > + if (ctnetlink_dump_tuples(skb, tuple(ct, IP_CT_DIR_REPLY)) < 0) { > + rcu_read_unlock(); > goto nla_put_failure; ^^^ Would it look nicer if you add a new label 'nla_put_failure_unlock'? nla_put_failure_unlock: read_rcu_unlock(); nlmsg_failure: nla_put_failure: nlmsg_trim(skb, b); return -1; Or much simpler, just call read_rcu_unlock() before the first nla_nest_start() so that this results in much smaller patch: nlmsg_failure: nla_put_failure: read_rcu_unlock(); <--- nlmsg_trim(skb, b); return -1; BTW, please, add a short description to your patches and the 'Signed-off-by' field. -- "Los honestos son inadaptados sociales" -- Les Luthiers ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-25 9:32 ` Pablo Neira Ayuso @ 2008-07-25 11:15 ` Pablo Neira Ayuso 2008-07-27 17:23 ` Fabian Hugelshofer 2008-07-28 18:31 ` Pablo Neira Ayuso 0 siblings, 2 replies; 30+ messages in thread From: Pablo Neira Ayuso @ 2008-07-25 11:15 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Fabian Hugelshofer, Patrick McHardy, netfilter-devel Pablo Neira Ayuso wrote: > Or much simpler, just call read_rcu_unlock() before the first > nla_nest_start() so that this results in much smaller patch: > > nlmsg_failure: > nla_put_failure: > read_rcu_unlock(); <--- > nlmsg_trim(skb, b); > return -1; As said, if you do this in ctnetlink_conntrack_event, I think that you can also remove the rcu_read_lock in ctnetlink_dump_helpinfo, as then all dump functions will be invoked under rcu_read_lock. In ctnetlink_get_conntrack, I think that ctnetlink_fill_info needs to be protected with rcu_read_lock. BTW, why do we need such a big read-side critical section in ctnetlink_create_conntrack? I think that we only need for the helper assignation, right? -- "Los honestos son inadaptados sociales" -- Les Luthiers ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-25 11:15 ` Pablo Neira Ayuso @ 2008-07-27 17:23 ` Fabian Hugelshofer 2008-07-28 18:31 ` Pablo Neira Ayuso 1 sibling, 0 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-27 17:23 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel Pablo Neira Ayuso wrote: > Pablo Neira Ayuso wrote: > > Or much simpler, just call read_rcu_unlock() before the first > > nla_nest_start() so that this results in much smaller patch: > > > > nlmsg_failure: > > nla_put_failure: > > read_rcu_unlock(); <--- > > nlmsg_trim(skb, b); > > return -1; > > As said, if you do this in ctnetlink_conntrack_event, I think that you > can also remove the rcu_read_lock in ctnetlink_dump_helpinfo, as then > all dump functions will be invoked under rcu_read_lock. I modified the patch with your suggestions. Read locks have been removed from all dump functions. > In ctnetlink_get_conntrack, I think that ctnetlink_fill_info needs to be > protected with rcu_read_lock. ctnetlink_get_conntrack now contains a read lock to protect the dump functions. Protecting it outside is therefore not necessary. nf_ctnetlink: Remove read locks from dump functions to increase performance in the event notification path Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 95a7967..696649e 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -104,16 +104,14 @@ ctnetlink_dump_tuples(struct sk_buff *skb, struct nf_conntrack_l3proto *l3proto; struct nf_conntrack_l4proto *l4proto; - l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + l3proto = __nf_ct_l3proto_find(tuple->src.l3num); ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto); - nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) return ret; - l4proto = nf_ct_l4proto_find_get(tuple->src.l3num, tuple->dst.protonum); + l4proto = __nf_ct_l4proto_find(tuple->src.l3num, tuple->dst.protonum); ret = ctnetlink_dump_tuples_proto(skb, tuple, l4proto); - nf_ct_l4proto_put(l4proto); return ret; } @@ -150,9 +148,8 @@ ctnetlink_dump_protoinfo(struct sk_buff *skb, const struct nf_conn *ct) struct nlattr *nest_proto; int ret; - l4proto = nf_ct_l4proto_find_get(nf_ct_l3num(ct), nf_ct_protonum(ct)); + l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct)); if (!l4proto->to_nlattr) { - nf_ct_l4proto_put(l4proto); return 0; } @@ -162,14 +159,11 @@ ctnetlink_dump_protoinfo(struct sk_buff *skb, const struct nf_conn *ct) ret = l4proto->to_nlattr(skb, nest_proto, ct); - nf_ct_l4proto_put(l4proto); - nla_nest_end(skb, nest_proto); return ret; nla_put_failure: - nf_ct_l4proto_put(l4proto); return -1; } @@ -183,7 +177,6 @@ ctnetlink_dump_helpinfo(struct sk_buff *skb, const struct nf_conn *ct) if (!help) return 0; - rcu_read_lock(); helper = rcu_dereference(help->helper); if (!helper) goto out; @@ -198,11 +191,9 @@ ctnetlink_dump_helpinfo(struct sk_buff *skb, const struct nf_conn *ct) nla_nest_end(skb, nest_helper); out: - rcu_read_unlock(); return 0; nla_put_failure: - rcu_read_unlock(); return -1; } @@ -374,6 +365,8 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, nfmsg->version = NFNETLINK_V0; nfmsg->res_id = 0; + rcu_read_lock(); + nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; @@ -402,11 +395,14 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, ctnetlink_dump_nat_seq_adj(skb, ct) < 0) goto nla_put_failure; + rcu_read_unlock(); + nlh->nlmsg_len = skb_tail_pointer(skb) - b; return skb->len; nlmsg_failure: nla_put_failure: + rcu_read_unlock(); nlmsg_trim(skb, b); return -1; } @@ -1285,16 +1281,19 @@ ctnetlink_exp_dump_tuple(struct sk_buff *skb, { struct nlattr *nest_parms; + rcu_read_lock(); nest_parms = nla_nest_start(skb, type | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; if (ctnetlink_dump_tuples(skb, tuple) < 0) goto nla_put_failure; nla_nest_end(skb, nest_parms); + rcu_read_unlock(); return 0; nla_put_failure: + rcu_read_unlock(); return -1; } ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-25 11:15 ` Pablo Neira Ayuso 2008-07-27 17:23 ` Fabian Hugelshofer @ 2008-07-28 18:31 ` Pablo Neira Ayuso 2008-07-28 23:12 ` Fabian Hugelshofer 1 sibling, 1 reply; 30+ messages in thread From: Pablo Neira Ayuso @ 2008-07-28 18:31 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: Patrick McHardy, netfilter-devel Pablo Neira Ayuso wrote: > Pablo Neira Ayuso wrote: >> Or much simpler, just call read_rcu_unlock() before the first >> nla_nest_start() so that this results in much smaller patch: >> >> nlmsg_failure: >> nla_put_failure: >> read_rcu_unlock(); <--- >> nlmsg_trim(skb, b); >> return -1; Sorry, this is wrong. It should be: nla_put_failure: read_rcu_unlock(); nlmsg_failure: nlmsg_trim(skb, b); return -1; -- "Los honestos son inadaptados sociales" -- Les Luthiers ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-28 18:31 ` Pablo Neira Ayuso @ 2008-07-28 23:12 ` Fabian Hugelshofer 2008-07-29 17:11 ` Pablo Neira Ayuso 0 siblings, 1 reply; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-28 23:12 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel On Mon, 2008-07-28 at 20:31 +0200, Pablo Neira Ayuso wrote: > Pablo Neira Ayuso wrote: > > Pablo Neira Ayuso wrote: > >> Or much simpler, just call read_rcu_unlock() before the first > >> nla_nest_start() so that this results in much smaller patch: > >> > >> nlmsg_failure: > >> nla_put_failure: > >> read_rcu_unlock(); <--- > >> nlmsg_trim(skb, b); > >> return -1; > > Sorry, this is wrong. It should be: > > nla_put_failure: > read_rcu_unlock(); > nlmsg_failure: > nlmsg_trim(skb, b); > return -1; Very true indeed. Thanks for noticing. The nlmsg_failure is kinda hidden in the macro and the jump targets were in the wrong order. You find the corrected version below. nf_ctnetlink: Remove read locks from dump functions to increase performance in the event notification path Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 95a7967..06f69a2 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -104,16 +104,14 @@ ctnetlink_dump_tuples(struct sk_buff *skb, struct nf_conntrack_l3proto *l3proto; struct nf_conntrack_l4proto *l4proto; - l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + l3proto = __nf_ct_l3proto_find(tuple->src.l3num); ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto); - nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) return ret; - l4proto = nf_ct_l4proto_find_get(tuple->src.l3num, tuple->dst.protonum); + l4proto = __nf_ct_l4proto_find(tuple->src.l3num, tuple->dst.protonum); ret = ctnetlink_dump_tuples_proto(skb, tuple, l4proto); - nf_ct_l4proto_put(l4proto); return ret; } @@ -150,9 +148,8 @@ ctnetlink_dump_protoinfo(struct sk_buff *skb, const struct nf_conn *ct) struct nlattr *nest_proto; int ret; - l4proto = nf_ct_l4proto_find_get(nf_ct_l3num(ct), nf_ct_protonum(ct)); + l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct)); if (!l4proto->to_nlattr) { - nf_ct_l4proto_put(l4proto); return 0; } @@ -162,14 +159,11 @@ ctnetlink_dump_protoinfo(struct sk_buff *skb, const struct nf_conn *ct) ret = l4proto->to_nlattr(skb, nest_proto, ct); - nf_ct_l4proto_put(l4proto); - nla_nest_end(skb, nest_proto); return ret; nla_put_failure: - nf_ct_l4proto_put(l4proto); return -1; } @@ -183,7 +177,6 @@ ctnetlink_dump_helpinfo(struct sk_buff *skb, const struct nf_conn *ct) if (!help) return 0; - rcu_read_lock(); helper = rcu_dereference(help->helper); if (!helper) goto out; @@ -198,11 +191,9 @@ ctnetlink_dump_helpinfo(struct sk_buff *skb, const struct nf_conn *ct) nla_nest_end(skb, nest_helper); out: - rcu_read_unlock(); return 0; nla_put_failure: - rcu_read_unlock(); return -1; } @@ -374,6 +365,8 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, nfmsg->version = NFNETLINK_V0; nfmsg->res_id = 0; + rcu_read_lock(); + nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; @@ -402,11 +395,14 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq, ctnetlink_dump_nat_seq_adj(skb, ct) < 0) goto nla_put_failure; + rcu_read_unlock(); + nlh->nlmsg_len = skb_tail_pointer(skb) - b; return skb->len; -nlmsg_failure: nla_put_failure: + rcu_read_unlock(); +nlmsg_failure: nlmsg_trim(skb, b); return -1; } @@ -1285,16 +1281,19 @@ ctnetlink_exp_dump_tuple(struct sk_buff *skb, { struct nlattr *nest_parms; + rcu_read_lock(); nest_parms = nla_nest_start(skb, type | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; if (ctnetlink_dump_tuples(skb, tuple) < 0) goto nla_put_failure; nla_nest_end(skb, nest_parms); + rcu_read_unlock(); return 0; nla_put_failure: + rcu_read_unlock(); return -1; } ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-28 23:12 ` Fabian Hugelshofer @ 2008-07-29 17:11 ` Pablo Neira Ayuso 0 siblings, 0 replies; 30+ messages in thread From: Pablo Neira Ayuso @ 2008-07-29 17:11 UTC (permalink / raw) To: Fabian Hugelshofer; +Cc: Patrick McHardy, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1056 bytes --] Fabian Hugelshofer wrote: > On Mon, 2008-07-28 at 20:31 +0200, Pablo Neira Ayuso wrote: >> Pablo Neira Ayuso wrote: >>> Pablo Neira Ayuso wrote: >>>> Or much simpler, just call read_rcu_unlock() before the first >>>> nla_nest_start() so that this results in much smaller patch: >>>> >>>> nlmsg_failure: >>>> nla_put_failure: >>>> read_rcu_unlock(); <--- >>>> nlmsg_trim(skb, b); >>>> return -1; >> Sorry, this is wrong. It should be: >> >> nla_put_failure: >> read_rcu_unlock(); >> nlmsg_failure: >> nlmsg_trim(skb, b); >> return -1; > > Very true indeed. Thanks for noticing. The nlmsg_failure is kinda hidden > in the macro and the jump targets were in the wrong order. You find the > corrected version below. > > nf_ctnetlink: Remove read locks from dump functions to increase > performance in the event notification path I have six patches for ctnetlink here, one of them is based on your patch. I hope to post them tomorrow for review. -- "Los honestos son inadaptados sociales" -- Les Luthiers [-- Attachment #2: 00.patch --] [-- Type: text/x-diff, Size: 4652 bytes --] [PATCH] get rid of module refcounting in ctnetlink This patch replaces the unnecessary module refcounting with the read-side locks. With this patch, all the dump and fill_info function are called under the RCU read lock. Based on a patch from Fabien Hugelshofer. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Index: net-next-2.6.git/net/netfilter/nf_conntrack_netlink.c =================================================================== --- net-next-2.6.git.orig/net/netfilter/nf_conntrack_netlink.c 2008-07-29 14:24:39.000000000 +0200 +++ net-next-2.6.git/net/netfilter/nf_conntrack_netlink.c 2008-07-29 14:24:41.000000000 +0200 @@ -103,16 +103,14 @@ ctnetlink_dump_tuples(struct sk_buff *sk struct nf_conntrack_l3proto *l3proto; struct nf_conntrack_l4proto *l4proto; - l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + l3proto = __nf_ct_l3proto_find(tuple->src.l3num); ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto); - nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) return ret; - l4proto = nf_ct_l4proto_find_get(tuple->src.l3num, tuple->dst.protonum); + l4proto = __nf_ct_l4proto_find(tuple->src.l3num, tuple->dst.protonum); ret = ctnetlink_dump_tuples_proto(skb, tuple, l4proto); - nf_ct_l4proto_put(l4proto); return ret; } @@ -149,11 +147,9 @@ ctnetlink_dump_protoinfo(struct sk_buff struct nlattr *nest_proto; int ret; - l4proto = nf_ct_l4proto_find_get(nf_ct_l3num(ct), nf_ct_protonum(ct)); - if (!l4proto->to_nlattr) { - nf_ct_l4proto_put(l4proto); + l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct)); + if (!l4proto->to_nlattr) return 0; - } nest_proto = nla_nest_start(skb, CTA_PROTOINFO | NLA_F_NESTED); if (!nest_proto) @@ -161,14 +157,11 @@ ctnetlink_dump_protoinfo(struct sk_buff ret = l4proto->to_nlattr(skb, nest_proto, ct); - nf_ct_l4proto_put(l4proto); - nla_nest_end(skb, nest_proto); return ret; nla_put_failure: - nf_ct_l4proto_put(l4proto); return -1; } @@ -182,7 +175,6 @@ ctnetlink_dump_helpinfo(struct sk_buff * if (!help) return 0; - rcu_read_lock(); helper = rcu_dereference(help->helper); if (!helper) goto out; @@ -197,11 +189,9 @@ ctnetlink_dump_helpinfo(struct sk_buff * nla_nest_end(skb, nest_helper); out: - rcu_read_unlock(); return 0; nla_put_failure: - rcu_read_unlock(); return -1; } @@ -458,6 +448,7 @@ static int ctnetlink_conntrack_event(str nfmsg->version = NFNETLINK_V0; nfmsg->res_id = 0; + rcu_read_lock(); nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED); if (!nest_parms) goto nla_put_failure; @@ -519,13 +510,15 @@ static int ctnetlink_conntrack_event(str && ctnetlink_dump_mark(skb, ct) < 0) goto nla_put_failure; #endif + rcu_read_unlock(); nlh->nlmsg_len = skb->tail - b; nfnetlink_send(skb, 0, group, 0); return NOTIFY_DONE; -nlmsg_failure: nla_put_failure: + rcu_read_unlock(); +nlmsg_failure: kfree_skb(skb); return NOTIFY_DONE; } @@ -863,8 +856,10 @@ ctnetlink_get_conntrack(struct sock *ctn return -ENOMEM; } + rcu_read_lock(); err = ctnetlink_fill_info(skb2, NETLINK_CB(skb).pid, nlh->nlmsg_seq, IPCTNL_MSG_CT_NEW, 1, ct); + rcu_read_unlock(); nf_ct_put(ct); if (err <= 0) goto free; @@ -1316,16 +1311,14 @@ ctnetlink_exp_dump_mask(struct sk_buff * if (!nest_parms) goto nla_put_failure; - l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + l3proto = __nf_ct_l3proto_find(tuple->src.l3num); ret = ctnetlink_dump_tuples_ip(skb, &m, l3proto); - nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) goto nla_put_failure; - l4proto = nf_ct_l4proto_find_get(tuple->src.l3num, tuple->dst.protonum); + l4proto = __nf_ct_l4proto_find(tuple->src.l3num, tuple->dst.protonum); ret = ctnetlink_dump_tuples_proto(skb, &m, l4proto); - nf_ct_l4proto_put(l4proto); if (unlikely(ret < 0)) goto nla_put_failure; @@ -1432,15 +1425,18 @@ static int ctnetlink_expect_event(struct nfmsg->version = NFNETLINK_V0; nfmsg->res_id = 0; + rcu_read_lock(); if (ctnetlink_exp_dump_expect(skb, exp) < 0) goto nla_put_failure; + rcu_read_unlock(); nlh->nlmsg_len = skb->tail - b; nfnetlink_send(skb, 0, NFNLGRP_CONNTRACK_EXP_NEW, 0); return NOTIFY_DONE; -nlmsg_failure: nla_put_failure: + rcu_read_unlock(); +nlmsg_failure: kfree_skb(skb); return NOTIFY_DONE; } @@ -1543,9 +1539,11 @@ ctnetlink_get_expect(struct sock *ctnl, if (!skb2) goto out; + rcu_read_lock(); err = ctnetlink_exp_fill_info(skb2, NETLINK_CB(skb).pid, nlh->nlmsg_seq, IPCTNL_MSG_EXP_NEW, 1, exp); + rcu_read_unlock(); if (err <= 0) goto free; ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Conntrack Events Performance - Multipart Messages? 2008-07-23 16:12 ` Fabian Hugelshofer 2008-07-23 17:01 ` Patrick McHardy @ 2008-07-25 8:44 ` Fabian Hugelshofer 1 sibling, 0 replies; 30+ messages in thread From: Fabian Hugelshofer @ 2008-07-25 8:44 UTC (permalink / raw) To: netfilter-devel; +Cc: Patrick McHardy, Pablo Neira Ayuso Fabian Hugelshofer wrote: > Patrick McHardy wrote: >> The first thing to try would be to use sane allocation sizes >> for the event messages. This patch doesn't implement it properly >> (uses probing), but should be enough to test whether it helps. > > Thanks a lot. This patch already decreased the CPU usage for ctevtest > from 85% to 44%. Sweet... I just rerun the test. The 85% CPU usage was with 1700pps and 44% with 1500pps. The wireless channel does not provide the same performance all the time and I did not pay attention to the packet rate. Without the allocation patch and 1500pps the CPU usage is 56%. But 10% less is still quite nice. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2008-07-29 17:12 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-16 16:42 Conntrack Events Performance - Multipart Messages? Fabian Hugelshofer 2008-07-17 9:16 ` Patrick McHardy 2008-07-17 10:03 ` Pablo Neira Ayuso 2008-07-17 14:34 ` Fabian Hugelshofer 2008-07-17 15:15 ` Fabian Hugelshofer 2008-07-18 15:56 ` Fabian Hugelshofer 2008-07-18 2:11 ` Patrick McHardy 2008-07-21 15:51 ` Fabian Hugelshofer 2008-07-21 15:59 ` Patrick McHardy 2008-07-21 17:49 ` Fabian Hugelshofer 2008-07-23 14:32 ` Fabian Hugelshofer 2008-07-23 14:38 ` Patrick McHardy 2008-07-23 16:12 ` Fabian Hugelshofer 2008-07-23 17:01 ` Patrick McHardy 2008-07-23 17:07 ` Patrick McHardy 2008-07-23 17:30 ` Fabian Hugelshofer 2008-07-23 17:32 ` Patrick McHardy 2008-07-23 17:38 ` Fabian Hugelshofer 2008-07-23 17:40 ` Patrick McHardy 2008-07-23 17:15 ` Fabian Hugelshofer 2008-07-23 17:20 ` Patrick McHardy 2008-07-24 13:21 ` Fabian Hugelshofer 2008-07-25 8:51 ` Fabian Hugelshofer 2008-07-25 9:32 ` Pablo Neira Ayuso 2008-07-25 11:15 ` Pablo Neira Ayuso 2008-07-27 17:23 ` Fabian Hugelshofer 2008-07-28 18:31 ` Pablo Neira Ayuso 2008-07-28 23:12 ` Fabian Hugelshofer 2008-07-29 17:11 ` Pablo Neira Ayuso 2008-07-25 8:44 ` Fabian Hugelshofer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.