From: Jorrit Kronjee <j.kronjee@infopact.nl>
To: Eric Dumazet <eric.dumazet@gmail.com>, netfilter-devel@vger.kernel.org
Subject: Re: debugging kernel during packet drops
Date: Fri, 26 Mar 2010 11:41:06 +0100
Message-ID: <4BAC8F42.7090706@infopact.nl>
In-Reply-To: <1269509574.3626.9.camel@edumazet-laptop>

On 3/25/2010 10:32 AM, Eric Dumazet wrote:
> On Wednesday, 24 March 2010 at 17:22 +0100, Eric Dumazet wrote:
>   
>> Sure, this helps a lot!
>>
>> You might try RPS by doing:
>>
>> echo f >/sys/class/net/eth3/queues/rx-0/rps_cpus
>>
>> (But you'll also need a new xt_hashlimit module to make it more
>> scalable; I can work on this this week if necessary.)
>>
>>     
> Here is a patch I cooked up for xt_hashlimit (on top of net-next-2.6) to
> make it use RCU and scale better in your case (allowing several CPUs to
> work concurrently once RPS is activated), but also in more general cases.
>
> [PATCH] xt_hashlimit: RCU conversion
>
> xt_hashlimit uses a central lock per hash table and suffers from
> contention on some workloads.
>
> After the RCU conversion, the central lock is only taken when a writer
> wants to add or delete an entry. 'Readers', which update an existing
> entry, use an individual per-entry lock instead.
>   
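The locking split in the patch description can be illustrated outside the kernel. Below is a minimal userspace sketch that assumes C11 atomics and POSIX threads. Lookups take no lock, a central mutex serialises insertions, and a per-entry mutex protects updates to an existing entry.

This is not the patch's code. It leaves out what RCU actually contributes: deferring the freeing of deleted entries until every concurrent reader has finished. For that reason the sketch never deletes entries. `HT_SIZE`, `struct table`, `table_find`, `table_insert` and `table_hit` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace sketch: entries are never freed, so no RCU
 * grace periods are needed here (the kernel patch does need them). */
#define HT_SIZE 1024u                 /* hypothetical bucket count */

struct entry {
    uint32_t dst;                     /* key: destination IPv4 address */
    uint64_t hits;                    /* state that "readers" update */
    pthread_mutex_t lock;             /* per-entry lock, only for updates */
    struct entry *_Atomic next;       /* chain link, published with release */
};

struct table {
    struct entry *_Atomic bucket[HT_SIZE];
    pthread_mutex_t lock;             /* central lock: insertions only */
};

static void table_init(struct table *t)
{
    for (unsigned i = 0; i < HT_SIZE; i++)
        atomic_init(&t->bucket[i], NULL);
    pthread_mutex_init(&t->lock, NULL);
}

/* Lock-free lookup. The acquire loads pair with the release store in
 * table_insert(), so a reader never sees a half-initialised entry. */
static struct entry *table_find(struct table *t, uint32_t dst)
{
    struct entry *e = atomic_load_explicit(&t->bucket[dst % HT_SIZE],
                                           memory_order_acquire);
    for (; e != NULL; e = atomic_load_explicit(&e->next, memory_order_acquire))
        if (e->dst == dst)
            return e;
    return NULL;
}

/* Writer path: serialise on the central lock, re-check, then publish. */
static struct entry *table_insert(struct table *t, uint32_t dst)
{
    struct entry *_Atomic *head = &t->bucket[dst % HT_SIZE];

    pthread_mutex_lock(&t->lock);
    struct entry *e = table_find(t, dst);      /* another writer may have won */
    if (e == NULL && (e = calloc(1, sizeof(*e))) != NULL) {
        e->dst = dst;
        pthread_mutex_init(&e->lock, NULL);
        atomic_init(&e->next, atomic_load_explicit(head, memory_order_relaxed));
        atomic_store_explicit(head, e, memory_order_release);
    }
    pthread_mutex_unlock(&t->lock);
    return e;
}

/* Reader path: no central lock; only the entry's own lock is taken. */
static uint64_t table_hit(struct table *t, uint32_t dst)
{
    struct entry *e = table_find(t, dst);

    if (e == NULL && (e = table_insert(t, dst)) == NULL)
        return 0;                               /* allocation failed */
    pthread_mutex_lock(&e->lock);
    uint64_t n = ++e->hits;
    pthread_mutex_unlock(&e->lock);
    return n;
}
```

The common case, a packet for a destination already in the table, only touches that entry's lock. That is why contention drops once several CPUs process packets at the same time.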
Eric,

Awesome work, thanks for the effort! I've tried the patch and got some
results. The drop rate was reduced dramatically after I activated RPS.
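The value `f` written to `rps_cpus` is a hexadecimal CPU bitmap: bit n enables CPU n. So `f` (binary 1111) lets all four CPUs on this machine process packets from that RX queue.

Below is a minimal sketch of how such a mask can be built. It assumes every CPU number is below the bit width of `unsigned long`. `rps_mask_hex` is a hypothetical helper, not a kernel or sysfs API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the CPU set 'cpus' as the hex bitmap that
 * /sys/class/net/<dev>/queues/rx-<n>/rps_cpus accepts. Bit n selects
 * CPU n. Assumes every CPU number is below the width of unsigned long. */
static int rps_mask_hex(const int *cpus, size_t ncpus, char *buf, size_t len)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < ncpus; i++)
        mask |= 1UL << cpus[i];
    return snprintf(buf, len, "%lx", mask);   /* e.g. CPUs 0-3 -> "f" */
}
```

On machines with more than 32 CPUs, the sysfs bitmap format separates 32-bit groups with commas, which this sketch does not produce.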

I repeated the earlier test: I rebooted and immediately started flooding
the machine with 300 kpps. After 5 minutes, perf top looked like this:

-------------------------------------------------------------------------------------------------------------------------
   PerfTop:    1962 irqs/sec  kernel:99.3% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                 DSO
             _______ _____ ________________________ _____________________________________________________________________

             4501.00 14.0% __ticket_spin_lock       /lib/modules/2.6.34-rc1-net-next/build/vmlinux
             2985.00  9.3% dsthash_find             /lib/modules/2.6.34-rc1-net-next/kernel/net/netfilter/xt_hashlimit.ko
             2346.00  7.3% __ticket_spin_unlock     /lib/modules/2.6.34-rc1-net-next/build/vmlinux
             1354.00  4.2% e1000_xmit_frame         /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
             1070.00  3.3% __slab_free              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              997.00  3.1% memcpy                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              809.00  2.5% dev_queue_xmit           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              791.00  2.5% nf_iterate               /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              705.00  2.2% e1000_clean_tx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000e/e1000e.ko
              634.00  2.0% nf_hook_slow             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              624.00  1.9% skb_release_head_state   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              584.00  1.8% e1000_intr               /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
              536.00  1.7% br_nf_pre_routing_finish /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              528.00  1.6% nommu_map_page           /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              499.00  1.6% kfree                    /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              494.00  1.5% __netif_receive_skb      /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              472.00  1.5% __alloc_skb              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              448.00  1.4% br_fdb_update            /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              437.00  1.4% __slab_alloc             /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              428.00  1.3% ipt_do_table             [ip_tables]
              403.00  1.3% memset                   /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              402.00  1.3% br_handle_frame          /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              389.00  1.2% e1000_clean_rx_irq       /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
              388.00  1.2% e1000_clean              /lib/modules/2.6.34-rc1-net-next/kernel/drivers/net/e1000/e1000.ko
              381.00  1.2% uhci_irq                 /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              366.00  1.1% get_rps_cpu              /lib/modules/2.6.34-rc1-net-next/build/vmlinux
              365.00  1.1% br_nf_pre_routing        /lib/modules/2.6.34-rc1-net-next/kernel/net/bridge/bridge.ko
              349.00  1.1% dst_release              /lib/modules/2.6.34-rc1-net-next/build/vmlinux

And iptables-save -c produced this:
# Generated by iptables-save v1.4.4 on Fri Mar 26 11:24:59 2010
*filter
:INPUT ACCEPT [1043:60514]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [942:282723]
[99563191:3783420610] -A FORWARD -m hashlimit --hashlimit-upto 10000/sec --hashlimit-burst 100 --hashlimit-mode dstip --hashlimit-name hashtable --hashlimit-htable-max 131072 --hashlimit-htable-expire 1000 -j ACCEPT
[0:0] -A FORWARD -m limit --limit 5/sec -j LOG --log-prefix "HASHLIMITED -- "
[0:0] -A FORWARD -j DROP
COMMIT
# Completed on Fri Mar 26 11:24:59 2010
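The hashlimit rule above lets each destination send up to 10000 packets per second, with a burst of 100. Below is a generic token-bucket sketch of those semantics, written in integer nanoseconds.

xt_hashlimit's own accounting works in jiffies-based credit units and differs in detail. `struct limit`, `struct bucket`, `limit_make`, `bucket_init` and `bucket_allow` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter parameters, in nanoseconds of earned credit. */
struct limit {
    uint64_t cost_ns;   /* credit one packet costs: 1e9 / rate */
    uint64_t cap_ns;    /* maximum stored credit: burst * cost_ns */
};

/* Hypothetical per-destination state (one per hash entry). */
struct bucket {
    uint64_t credit_ns; /* currently stored credit, 0..cap_ns */
    uint64_t last_ns;   /* time of the previous refill */
};

/* rate_per_sec must be in 1..1e9; burst must be >= 1. */
static struct limit limit_make(uint64_t rate_per_sec, uint64_t burst)
{
    struct limit l;

    l.cost_ns = 1000000000ULL / rate_per_sec;
    l.cap_ns = burst * l.cost_ns;
    return l;
}

/* A new entry starts with a full bucket, so a burst passes at once. */
static void bucket_init(struct bucket *b, const struct limit *l, uint64_t now_ns)
{
    b->credit_ns = l->cap_ns;
    b->last_ns = now_ns;
}

/* Refill by elapsed time (capped at cap_ns), then charge one packet if
 * possible. Returns true when the packet is within the limit, which is
 * when an "upto" match succeeds and the rule's ACCEPT applies. */
static bool bucket_allow(struct bucket *b, const struct limit *l, uint64_t now_ns)
{
    uint64_t elapsed = now_ns - b->last_ns;

    b->last_ns = now_ns;
    if (elapsed >= l->cap_ns - b->credit_ns)
        b->credit_ns = l->cap_ns;
    else
        b->credit_ns += elapsed;

    if (b->credit_ns < l->cost_ns)
        return false;
    b->credit_ns -= l->cost_ns;
    return true;
}
```

With `limit_make(10000, 100)`, one packet costs 100 µs of credit and at most 10 ms of credit can accumulate, so 100 back-to-back packets pass before the limit applies. Raising the 10000/sec limit simply shrinks `cost_ns`. Rates that do not divide 10^9 evenly round `cost_ns` down and come out slightly generous.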

And /proc/interrupts looked like this:
     CPU0       CPU1       CPU2       CPU3
  0:         47          0          1          0   IO-APIC-edge      timer
  1:          0          1          0          1   IO-APIC-edge      i8042
  6:          1          1          0          0   IO-APIC-edge      floppy
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          1          1          2   IO-APIC-edge      i8042
 14:         21         22         22         21   IO-APIC-edge      ata_piix
 15:          0          0          0          0   IO-APIC-edge      ata_piix
 16:        492        464        463        474   IO-APIC-fasteoi   arcmsr
 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 18:     971171     971391     948171     948663   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb7, eth3
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 21:          0          0          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb4
 23:          1          0          1          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
 27:    1003145    1002952    1026174    1025671   PCI-MSI-edge      eth4
NMI:     202553     185135     134999     185071   Non-maskable interrupts
LOC:      20270      19227      17387      23282   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:     202553     185135     134999     185071   Performance monitoring interrupts
PND:     201464     183939     134067     184098   Performance pending work
RES:       2216       2449       1212       1432   Rescheduling interrupts
CAL:    2223380    2226493    2233481    2228957   Function call interrupts
TLB:        606        584       1274       1216   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          2          2          2          2   Machine check polls
ERR:          3
MIS:          0
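Rows 18 (eth3) and 27 (eth4) above show how the NIC interrupts are spread over the four CPUs. Below is a small sketch that totals one such row so the per-CPU shares can be compared. It assumes the layout shown here: an `IRQ:` label followed by one numeric column per CPU. `irq_row_total` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum the first 'ncpu' counters on one
 * /proc/interrupts row such as
 * " 27:    1003145    1002952    1026174    1025671   PCI-MSI-edge      eth4".
 * Optionally stores each CPU's count in per_cpu[]. Returns -1 if the row
 * has no "label:" prefix or fewer than ncpu numeric columns. */
static long long irq_row_total(const char *row, int ncpu, unsigned long long *per_cpu)
{
    const char *p = strchr(row, ':');
    unsigned long long total = 0;

    if (p == NULL)
        return -1;
    p++;                                            /* skip the IRQ label */
    for (int cpu = 0; cpu < ncpu; cpu++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);  /* skips blanks */
        if (end == p)
            return -1;                              /* hit a non-number */
        if (per_cpu != NULL)
            per_cpu[cpu] = v;
        total += v;
        p = end;
    }
    return (long long)total;
}
```

Single-column rows such as `ERR:` and `MIS:` hold a system-wide total, so they should be read with `ncpu` set to 1.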

ifconfig reported only 2 dropped packets after these 5 minutes. I'm
thinking about removing or changing the hashing algorithm to make
dsthash_find faster; after all, all I need is a match against a
destination IP address. I'd also like the limit of 10 kpps to be a bit
higher. I'll see if I can work on that over the weekend.

Thanks again for everything!

Regards,

Jorrit Kronjee


