netfilter-devel.vger.kernel.org archive mirror
From: Jorrit Kronjee <j.kronjee@infopact.nl>
To: Eric Dumazet <eric.dumazet@gmail.com>, netfilter-devel@vger.kernel.org
Subject: Re: debugging kernel during packet drops
Date: Fri, 26 Mar 2010 11:41:06 +0100
Message-ID: <4BAC8F42.7090706@infopact.nl>
In-Reply-To: <1269509574.3626.9.camel@edumazet-laptop>

On 3/25/2010 10:32 AM, Eric Dumazet wrote:
> On Wednesday 24 March 2010 at 17:22 +0100, Eric Dumazet wrote:
>   
>> Sure this helps a lot !
>>
>> You might try RPS by doing :
>>
>> echo f >/sys/class/net/eth3/queues/rx-0/rps_cpus
>>
>> (But you'll also need a new xt_hashlimit module to make it more
>> scalable, I can work on this this week if necessary)
>>
>>     
> Here is a patch I cooked up for xt_hashlimit (on top of net-next-2.6) to make
> it use RCU and scale better in your case (allowing several concurrent
> CPUs once RPS is activated), but also in more general cases.
>
> [PATCH] xt_hashlimit: RCU conversion
>
> xt_hashlimit uses a central lock per hash table and suffers from
> contention on some workloads.
>
> After the RCU conversion, the central lock is only used when a writer wants
> to add or delete an entry. 'Readers', which update an existing entry, take
> an individual per-entry lock instead.
>   
Eric,

Awesome work, thanks for the effort! I've tried the patch and have some
results: the drop rate fell dramatically once I activated RPS.
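
For reference, the value written to rps_cpus in the quoted command is a
hexadecimal CPU bitmask in which bit n selects CPU n, so "f" selects
CPUs 0-3 (all four CPUs on this machine). A minimal sketch of that
conversion follows; the helper names cpus_to_rps_mask and
rps_mask_to_cpus are made up for illustration and are not part of any
kernel or tool interface.

```python
def cpus_to_rps_mask(cpus):
    """Return the hex bitmask string that selects the given CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu          # bit n stands for CPU n
    return format(mask, "x")


def rps_mask_to_cpus(mask_hex):
    """Return the CPU indices selected by a hex bitmask string."""
    # Large masks are shown in comma-separated groups; drop the commas.
    mask = int(mask_hex.replace(",", ""), 16)
    return [n for n in range(mask.bit_length()) if mask & (1 << n)]


if __name__ == "__main__":
    print(cpus_to_rps_mask([0, 1, 2, 3]))
    print(rps_mask_to_cpus("f"))
```

The string returned by cpus_to_rps_mask is what would be echoed into
/sys/class/net/eth3/queues/rx-0/rps_cpus.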

I repeated my earlier test: I rebooted and immediately started flooding
the machine with 300 kpps. After 5 minutes, perf top looked like this:

-------------------------------------------------------------------------------------------------------------------------
   PerfTop:    1962 irqs/sec  kernel:99.3% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                 DSO
             _______ _____ ________________________ _____________________________________________________________________

             4501.00 14.0% __ticket_spin_lock       /lib/modules/2.6.34-rc1-net-next/build/vmlinux
             2985.00  9.3% dsthash_find             /lib/modules/2.6.34-rc1-net-next/kernel/net/netfilter/xt_hashlimit.ko
             2346.00  7.3% __ticket_spin_unlock     /lib/modules/2.6.34-rc1-net-next/build/vmlinux
             1354.00  4.2% e1000_xmit_frame         /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
             1070.00  3.3% __slab_free              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              997.00  3.1% memcpy                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              809.00  2.5% dev_queue_xmit           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              791.00  2.5% nf_iterate               /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              705.00  2.2% e1000_clean_tx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
              634.00  2.0% nf_hook_slow             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              624.00  1.9% skb_release_head_state   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              584.00  1.8% e1000_intr               /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
              536.00  1.7% br_nf_pre_routing_finish /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              528.00  1.6% nommu_map_page           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              499.00  1.6% kfree                    /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              494.00  1.5% __netif_receive_skb      /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              472.00  1.5% __alloc_skb              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              448.00  1.4% br_fdb_update            /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              437.00  1.4% __slab_alloc             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              428.00  1.3% ipt_do_table             [ip_tables]
              403.00  1.3% memset                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              402.00  1.3% br_handle_frame          /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              389.00  1.2% e1000_clean_rx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
              388.00  1.2% e1000_clean              /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
              381.00  1.2% uhci_irq                 /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              366.00  1.1% get_rps_cpu              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              365.00  1.1% br_nf_pre_routing        /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              349.00  1.1% dst_release              /lib/modules/2.6.34-rc1-net-next/build/vmlinux

And iptables-save -c produced this:
# Generated by iptables-save v1.4.4 on Fri Mar 26 11:24:59 2010
*filter
:INPUT ACCEPT [1043:60514]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [942:282723]
[99563191:3783420610] -A FORWARD -m hashlimit --hashlimit-upto 10000/sec --hashlimit-burst 100 --hashlimit-mode dstip --hashlimit-name hashtable --hashlimit-htable-max 131072 --hashlimit-htable-expire 1000 -j ACCEPT
[0:0] -A FORWARD -m limit --limit 5/sec -j LOG --log-prefix "HASHLIMITED -- "
[0:0] -A FORWARD -j DROP
COMMIT
# Completed on Fri Mar 26 11:24:59 2010
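
The bracketed prefix on each rule is its [packets:bytes] counter pair.
As a rough sketch of how those counters can be read, the snippet below
parses the prefix of the hashlimit rule and derives the average packet
size. The helper name parse_counters is hypothetical, and the rule text
is shortened to its first few options for the example.

```python
import re


def parse_counters(line):
    """Extract the (packets, bytes) pair from an 'iptables-save -c' rule line."""
    match = re.match(r"\[(\d+):(\d+)\]", line.strip())
    if match is None:
        raise ValueError("line has no [packets:bytes] prefix")
    return int(match.group(1)), int(match.group(2))


# Counter prefix copied from the hashlimit ACCEPT rule above.
rule = "[99563191:3783420610] -A FORWARD -m hashlimit --hashlimit-upto 10000/sec"

packets, nbytes = parse_counters(rule)
avg_size = nbytes / packets
print(f"{packets} packets, {nbytes} bytes, {avg_size:.1f} bytes per packet on average")
```

The LOG and DROP rules both show [0:0], so every forwarded packet
counted here was accepted by the hashlimit rule.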

And /proc/interrupts looked like this:
           CPU0       CPU1       CPU2       CPU3
  0:         47          0          1          0   IO-APIC-edge      timer
  1:          0          1          0          1   IO-APIC-edge      i8042
  6:          1          1          0          0   IO-APIC-edge      floppy
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          1          1          2   IO-APIC-edge      i8042
 14:         21         22         22         21   IO-APIC-edge      ata_piix
 15:          0          0          0          0   IO-APIC-edge      ata_piix
 16:        492        464        463        474   IO-APIC-fasteoi   arcmsr
 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 18:     971171     971391     948171     948663   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb7, eth3
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 21:          0          0          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb4
 23:          1          0          1          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
 27:    1003145    1002952    1026174    1025671   PCI-MSI-edge      eth4
NMI:     202553     185135     134999     185071   Non-maskable interrupts
LOC:      20270      19227      17387      23282   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:     202553     185135     134999     185071   Performance monitoring interrupts
PND:     201464     183939     134067     184098   Performance pending work
RES:       2216       2449       1212       1432   Rescheduling interrupts
CAL:    2223380    2226493    2233481    2228957   Function call interrupts
TLB:        606        584       1274       1216   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          2          2          2          2   Machine check polls
ERR:          3
MIS:          0
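
To judge how evenly the load is spread, the per-CPU columns can be
totalled per interrupt source. The sketch below does that for the two
NIC lines (IRQ 18 and 27) and the CAL line, using rows copied from the
output above. The parser is a simplified illustration, not a complete
/proc/interrupts reader, and the name parse_interrupts is made up.
Reading the high CAL counts as RPS handing packets to other CPUs is a
plausible interpretation only; the output itself does not say so.

```python
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 18:     971171     971391     948171     948663   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb7, eth3
 27:    1003145    1002952    1026174    1025671   PCI-MSI-edge      eth4
CAL:    2223380    2226493    2233481    2228957   Function call interrupts
ERR:          3
"""


def parse_interrupts(text):
    """Map each interrupt label to (per-CPU counts, description)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:ncpus]
        # Skip rows such as "ERR: 3" that do not have one count per CPU.
        if not sep or len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        rows[label.strip()] = ([int(c) for c in counts], " ".join(fields[ncpus:]))
    return rows


rows = parse_interrupts(SAMPLE)
for label in ("18", "27", "CAL"):
    counts, desc = rows[label]
    total = sum(counts)
    shares = ", ".join(f"{100 * c / total:.1f}%" for c in counts)
    print(f"{label:>3} ({desc}): total={total}, per-CPU share: {shares}")
```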

ifconfig reported only 2 drops after these 5 minutes. I'm thinking about
removing or changing the hashing algorithm to make dsthash_find faster;
after all, all I need is a match against a destination IP address. I'd
also like the 10 kpps limit to be a bit higher. I'll see if I can work on
that during the weekend.
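
To make the idea concrete, here is a simplified user-space model of
per-destination rate limiting with the parameters from the rule above
(10000 packets/sec, burst of 100, keyed on destination IP only). It is
a sketch under stated assumptions, not xt_hashlimit's implementation:
the kernel module's accounting and its hash table are not reproduced,
a plain Python dict stands in for the lookup, and the class names
TokenBucket and PerDestinationLimiter are hypothetical.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `burst`."""

    def __init__(self, rate, burst, now):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # start with a full bucket
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last packet, capped at burst.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0       # spend one token for this packet
            return True
        return False


class PerDestinationLimiter:
    """One bucket per destination IP, looked up directly by address."""

    def __init__(self, rate=10000, burst=100):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow(self, dst_ip, now):
        bucket = self.buckets.get(dst_ip)
        if bucket is None:
            bucket = self.buckets[dst_ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerDestinationLimiter()
    accepted = sum(limiter.allow("10.0.0.1", now=0.0) for _ in range(150))
    print(f"accepted {accepted} of 150 back-to-back packets to one destination")
```

Raising the limit in this model only means passing a larger rate; the
timestamps are passed in explicitly so the behaviour is deterministic.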

Thanks again for everything!

Regards,

Jorrit Kronjee


