RE: Using old CPU for 100s of clients

From: "Shawn Wright" <swright@sls.bc.ca>
To: Daniel Chemko <dchemko@smgtec.com>, netfilter@lists.netfilter.org
Subject: RE: Using old CPU for 100s of clients
Date: Fri, 03 Dec 2004 13:57:10 -0800
Message-ID: <41B070B6.23668.B47D46A8@localhost>
In-Reply-To: <7C9884991ADAE0479C14F10C858BCDF591E3A1@alderaan.smgtec.com>

On 3 Dec 2004 at 12:22, Daniel Chemko wrote:

> The Speed problems may not be isolated to your CPU. You'll want to make
> sure your conntrack table isn't getting full, and that conntracks are
> safely getting expired from your system. Are you using a custom kernel,
> or a stock distro one?

Thanks for the reply. I didn't give many details because I've already beaten 
this to death on the Shorewall list before coming here (I know, I should 
have started here). It is a custom kernel, as all of the recent stock kernels 
will not boot on this machine - APIC must be disabled (it's an old DEC 
Prioris). I have tried 2.4.22, two different Mandrake releases, along with a 
plain 2.4.28 from kernel.org. It is possible that I've messed up somehow, 
so I plan on taking a stock 2.4.22-37mdk kernel that currently runs well on 
a P3/667, and compile it, making no change except for CPU support and 
APIC. This might help isolate the problem.

> Just for fun, could you forward me the following:
> 
> # cat /proc/loadavg
Load average *never* goes above 0.3, currently all zeros...
I don't believe the system CPU% factors into the loadavg though?

> # free
             total       used       free     shared    buffers     cached
Mem:        223208     219472       3736          0          0     127028
-/+ buffers/cache:      92444     130764
Swap:       409616          0     409616

> # iostat 20 2 (sysstat package is nice for accounting)
don't have this installed, although I plan to... 

> # top (grab the CPU lines, over time is best)
top will show up to ~13% system CPU% during a load test when I pass 
1000+ kB/s across the 10Mb link. Otherwise, it is rarely over 5% system.
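
As a rough sanity check on that load-test figure: 1000 kB/s is about 8 Mbit/s, 
or roughly 80% of a 10 Mbit link, assuming 1 kB means 1000 bytes. A small 
sketch of the arithmetic (the function name link_utilization is made up for 
illustration, not an existing tool):

```shell
# Hypothetical helper: percent of link capacity in use.
# $1 = throughput in kB/s (assumed 1 kB = 1000 bytes)
# $2 = link speed in Mbit/s
link_utilization() {
    # kB/s * 8 = kbit/s; Mbit/s * 1000 = kbit/s; result is an integer percent
    echo $(( $1 * 8 * 100 / ($2 * 1000) ))
}

link_utilization 1000 10   # prints 80
```

So the ~13% system CPU reading corresponds to a link that is already close to 
saturated, under those assumptions.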

> # cat /proc/slabinfo
I've looked at this also - our peak conntrack count is around 4000, and max 
is set to 16K. I've also tried it at 64K, and set the hashsize upon load of the 
ip_conntrack module to 64K, just for fun; it made no difference.

> # cat /proc/net/ip_conntrack | wc -l
Usually around 1500, but I have seen 4000 peak. 

> # hdparm /dev/<your disk(s)>
This is from the "bad" machine. All machines use a 3940 PCI SCSI with 
aic7xxx driver, and one or more Seagate Cheetah 10K 9GB drives.

/dev/sda:
 readonly     =  0 (off)
 geometry     = 1106/255/63, sectors = 17783240, start = 0

> # cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
Tried 16k and 64k...
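
To put the conntrack numbers side by side, here is a minimal sketch that 
reports table usage as a percentage. The helper name conntrack_usage is 
hypothetical, and "16K" is assumed to mean 16384 entries:

```shell
# Hypothetical helper: conntrack table usage as an integer percentage.
# $1 = current number of tracked connections, $2 = configured maximum
conntrack_usage() {
    echo $(( $1 * 100 / $2 ))
}

# Figures from this thread: a peak of about 4000 entries against a 16K limit
# (assumed to be 16384).
conntrack_usage 4000 16384   # prints 24

# On the firewall itself, using the two files quoted in this message:
# conntrack_usage "$(wc -l < /proc/net/ip_conntrack)" \
#                 "$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max)"
```

At roughly a quarter full even at peak, a full table looks unlikely to be the 
cause, which fits with the 64K setting making no difference.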

> # netstat -i
This is from current live firewall (the good one). The bad one has been 
rebooted since the last time I tried it live, so no data.
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 58662850      1      0      0 74520718      3      0      0 BMRU
eth1   1500   0 74674696      0      0      0 57280898      0      0      0 BMRU
lo    16436   0    89156      0      0      0    89156      0      0      0 LRU

> # mii-tool
I've used this exhaustively to check the NICs are set up right. The outside 
NIC goes to a Cat1900 forced 10FD, and they are notoriously bad at 
playing nice with NICs. No errors though as you can see above on eth1.
The inside link is 100Mb FD to a Cat 3500, and again no errors. Current 
NICs are one Intel E100B (eepro100 driver), and a Dlink DFE500TX (tulip 
driver). I have tried all combinations of e100/eepro100/tulip with half a 
dozen different NICs, no change in symptoms.

I should mention that we can reproduce the problem within a few minutes 
by hitting random web sites until one "hangs". We've eliminated our DNS 
and proxy as sources of the problem - it occurs when bypassing the proxy 
and NATing through the firewall. We have tried 3 different DNS servers, and 
squid reports avg DNS times of < 100ms. We're talking up to 20sec 
delays before getting data from a website, even timeouts. A second visit 
to the same site, different pages, is quick. To duplicate it we need to hit 
random sites, but can do so within a few minutes, even when network load 
is low.
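
The reproduction could be scripted along these lines. This is only a sketch: 
it assumes curl is available on a client behind the firewall, the helper names 
are invented for illustration, and sites.txt is an assumed file with one URL 
per line. It uses curl's time_starttransfer write-out variable to measure the 
delay before the first byte of data arrives:

```shell
# Hypothetical helper: seconds until the first byte arrives from a URL.
# curl gives up after 30 seconds and returns a non-zero exit status.
time_to_first_byte() {
    curl -s -o /dev/null --max-time 30 -w '%{time_starttransfer}\n' "$1"
}

# Hypothetical helper: succeeds when a delay in seconds exceeds a threshold.
# awk is used because plain shell arithmetic cannot compare fractions.
is_slow() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t > limit) }'
}

# Usage sketch (sites.txt and the 5 second threshold are assumptions):
# while read -r url; do
#     if t=$(time_to_first_byte "$url"); then
#         is_slow "$t" 5 && echo "SLOW ${t}s $url"
#     else
#         echo "FAILED $url"
#     fi
# done < sites.txt
```

Running the same list once through the "good" firewall and once through the 
"bad" one should show whether the long delays follow the machine.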

> wow.. there are a lot of areas to look into.. Anyways, hope to find
> something.

So do I...
 
> Good ol' BC boy!

Nice to hear from someone nearby! :-)

Thanks!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
