* 2.5.50 + e100 benchmarking
@ 2002-12-08 12:44 Anton Blanchard
2002-12-08 20:08 ` Andrew Morton
2002-12-09 18:08 ` Serge Kuznetsov
From: Anton Blanchard @ 2002-12-08 12:44 UTC
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3409 bytes --]
Hi,
I've got the benchmarking itch and am still waiting for the mail to
deliver me some e1000 size christmas presents, so I've started playing
with some e100s that were lying around.
Setup:
2.5.50-BK, 2 ppc64 partitions, one e100 card in each, 1500 byte MTU.
In all the runs we were pumping 11.76MB/sec down the socket.
We are sending bytes down a TCP socket (using tridge's socklib), the send
side looks like:
write(4, "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"..., 65536) = 65536
So we are pushing 64kB into the networking layer at a time. And the
read side looks like this:
read(3, "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"..., 65536) = 8688
So we are getting about 8kB per read. (I'm guessing due to rx interrupt
mitigation).
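A rough cross-check of these numbers (a sketch under stated assumptions,
not part of the original measurements): 8688 is exactly 6 x 1448, and
1448 is the TCP payload of a 1500 byte MTU frame if we assume 20 byte IP
and TCP headers plus a 12 byte TCP timestamp option. The 38 bytes of
per-frame Ethernet overhead and the reading of "MB" as 10^6 bytes are
assumptions too.

```c
#include <assert.h>

/* Assumed framing, none of it stated in the mail:
 * 20 byte IP header, 20 byte TCP header, 12 byte TCP timestamp option. */
enum { MTU = 1500, IP_HDR = 20, TCP_HDR = 20, TCP_TS_OPT = 12 };

/* Assumed Ethernet overhead per frame on the wire:
 * 8 preamble + 14 header + 4 FCS + 12 inter-frame gap. */
enum { ETH_OVERHEAD = 38 };

/* TCP payload bytes carried by one full-sized frame. */
static int tcp_payload_per_frame(void)
{
    return MTU - IP_HDR - TCP_HDR - TCP_TS_OPT;
}

/* Number of full segments in one read(); insists on an exact multiple. */
static int segments_in_read(int read_bytes)
{
    int mss = tcp_payload_per_frame();

    assert(read_bytes > 0 && read_bytes % mss == 0);
    return read_bytes / mss;
}

/* TCP payload bytes per second when the link is saturated with
 * full-sized frames; link_bps is the link speed in bits per second. */
static double goodput_bytes_per_sec(double link_bps)
{
    double frames_per_sec = link_bps / 8.0 / (MTU + ETH_OVERHEAD);

    return frames_per_sec * tcp_payload_per_frame();
}
```

Under these assumptions segments_in_read(8688) is 6, and
goodput_bytes_per_sec(100e6) comes out at roughly 11.77 million bytes
per second, close to the 11.76MB/sec quoted above, which would suggest
the link was running at wire speed.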
First let me explain the patches I have attached.
1. e100_nodisable
e100_intr was the worst function in a profile. PCI reads are very costly
(and PCI reads that flush posted writes are even worse) and we were
disabling and enabling the on chip interrupt bit for each interrupt
(both operations had a PCI read to flush the write).
The question is, why do we need to disable and reenable interrupts via
the on chip status register? At least on ppc64 we can't take the same
interrupt recursively; isn't this the case on x86?
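To make the cost argument concrete, here is a small user-space model (a
sketch only; struct fake_scb and all function names are hypothetical and
are not the driver's API). It counts simulated PCI reads for a handler
that masks and unmasks around the ack, versus one that only acks,
assuming every register write is followed by a read to flush it as
described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the card's status/command block. The real
 * registers sit behind PCI; here we only count the accesses. */
struct fake_scb {
    uint16_t status;      /* pending interrupt bits */
    uint8_t  intr_masked; /* 1 = chip interrupts disabled */
    unsigned pci_reads;   /* each one would stall the CPU on real hardware */
    unsigned pci_writes;  /* posted, cheap until something flushes them */
};

static uint16_t scb_read_status(struct fake_scb *scb)
{
    scb->pci_reads++;
    return scb->status;
}

/* Ack the given interrupt bits, then read back to flush the write. */
static void scb_ack(struct fake_scb *scb, uint16_t bits)
{
    scb->pci_writes++;
    scb->status = (uint16_t)(scb->status & ~bits);
    (void)scb_read_status(scb);
}

/* Mask (1) or unmask (0) chip interrupts, then read back to flush. */
static void scb_set_mask(struct fake_scb *scb, uint8_t masked)
{
    assert(masked <= 1);
    scb->pci_writes++;
    scb->intr_masked = masked;
    (void)scb_read_status(scb);
}

/* Handler shape before the patch: mask, ack, service, unmask. */
static void intr_with_masking(struct fake_scb *scb)
{
    uint16_t pending = scb_read_status(scb); /* read 1: is it ours? */

    if (!pending)
        return;
    scb_set_mask(scb, 1);                    /* read 2: flush the disable */
    scb_ack(scb, pending);                   /* read 3: flush the ack */
    /* rx/tx servicing would happen here */
    scb_set_mask(scb, 0);                    /* read 4: flush the enable */
}

/* Handler shape after the patch: ack and service only. */
static void intr_without_masking(struct fake_scb *scb)
{
    uint16_t pending = scb_read_status(scb); /* read 1: is it ours? */

    if (!pending)
        return;
    scb_ack(scb, pending);                   /* read 2: flush the ack */
    /* rx/tx servicing would happen here */
}
```

In this model the masking variant costs four reads per interrupt and the
patched variant two. The real saving depends on how expensive each read
is on a given bus.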
Andrew Morton's cyclesoak to the rescue:
Before:
System load: 6.4% || Free: 74.4%(0) 100.1%(1) 100.1%(2) 99.7%(3)
After:
System load: 5.1% || Free: 79.6%(0) 100.0%(1) 100.1%(2) 99.7%(3)
(Ignore the 3 other cpus, I have locked both irq and process to cpu 0)
74.4% -> 79.6% idle. So e100_nodisable is worth 5% on my machine. Not bad.
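The same figures can also be read as a relative saving. The sketch
below (function names are illustrative only) turns cyclesoak's "Free"
percentages for cpu 0 into busy time and computes what fraction of that
busy time went away.

```c
#include <assert.h>

/* Busy share of a cpu in percent, given cyclesoak's "Free" figure. */
static double busy_pct(double free_pct)
{
    assert(free_pct >= 0.0 && free_pct <= 100.0);
    return 100.0 - free_pct;
}

/* Fraction of the original busy time that a change removed. */
static double relative_saving(double free_before, double free_after)
{
    double before = busy_pct(free_before);
    double after = busy_pct(free_after);

    assert(before > 0.0);
    return (before - after) / before;
}
```

With the numbers above, busy time on cpu 0 goes from 25.6% to 20.4%, so
relative_saving(74.4, 79.6) is about 0.2: roughly a fifth of the cpu
time spent on this workload.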
2. e100_txchecksum
In recent 2.5 I found almost every tx packet had an invalid pseudo
header checksum. We didn't catch this in 2.4 because we would only use tx
checksumming for zero copy. In 2.5 we use it whenever we can (and that's
good, our copy_to/from_user has been optimised to within an inch of its
life thanks to paulus).
Anyway, I know zip about this stuff but it seems (from a quick look
of the acenic and tg3 drivers) that Linux always computes this
checksum. Bottom line is the patch fixes the problems I was seeing.
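For reference, this is what a pseudo header checksum covers: source
address, destination address, protocol and TCP/UDP length, summed as 16
bit words with end-around carry. The sketch below is a plain user-space
illustration working on host byte order values; the function names are
made up here and this is neither the driver's code nor the stack's.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32 bit one's complement accumulator down to 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16); /* absorb the carry of the first fold */
    return (uint16_t)sum;
}

/* Pseudo header sum over src, dst, protocol and TCP/UDP length.
 * All arguments are host byte order values. The result is the
 * uncomplemented 16 bit sum, also as a host order value. */
static uint16_t pseudo_hdr_sum(uint32_t saddr, uint32_t daddr,
                               uint8_t protocol, uint16_t l4_len)
{
    uint32_t sum = 0;

    assert(l4_len >= 8); /* shortest legal L4 header (UDP) is 8 bytes */

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += protocol;     /* a zero byte plus the protocol form one word */
    sum += l4_len;
    return csum_fold16(sum);
}
```

For example, 192.168.0.1 -> 192.168.0.2 with protocol 6 (TCP) and a 20
byte segment gives pseudo_hdr_sum(0xC0A80001, 0xC0A80002, 6, 20) ==
0x816E. If the stack has already stored such a value in the checksum
field, as the mail suggests, a driver that recomputes it is at best
doing redundant work.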
OK now to get a feel for what is going on:
sending (roughly):
9k irqs/second
900 context switches/second
20.5% CPU
receiving (roughly):
2.3k irqs/second
3k context switches/second
13.5% CPU
Most of the extra cost on send appears to be the higher interrupt rate.
So this raises the question, can we be more aggressive with the tx
interrupt mitigation? I had a quick play with some of the e100 options
and it gave some short term relief (4k/sec) but then jumped back up to
9k/sec again.
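To put the interrupt rates in perspective, the sketch below estimates
how many full-sized frames each interrupt covers. It assumes 1448 bytes
of TCP payload per frame (1500 byte MTU minus IP and TCP headers and a
12 byte timestamp option, none of which is stated in the mail), reads
"MB" as 10^6 bytes, and ignores ACK traffic. The function names are
illustrative only.

```c
#include <assert.h>

/* Full-sized frames per second needed to carry a given payload rate. */
static double frames_per_sec(double payload_bytes_per_sec,
                             double payload_per_frame)
{
    assert(payload_per_frame > 0.0);
    return payload_bytes_per_sec / payload_per_frame;
}

/* Average number of frames handled per interrupt. */
static double frames_per_irq(double frames, double irqs_per_sec)
{
    assert(irqs_per_sec > 0.0);
    return frames / irqs_per_sec;
}
```

At 11.76MB/sec that is roughly 8100 frames per second. Dividing by the
observed rates gives about 0.9 frames per interrupt on the sender at
9k irqs/sec, about 2 at 4k irqs/sec, and about 3.5 on the receiver at
2.3k irqs/sec, so under these assumptions the sender takes close to one
interrupt per frame.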
For those who have made it this far down, here are some profiles :)
http://samba.org/~anton/linux/2.5.50-BK/e100/
Keep in mind no idle time shows up here because I was running akpm(TM)
cyclesoak.
As you can see for the receiving side (sock_sink), copy_tofrom_user is
the worst offender. Very nice. Ignore plpar_hcall_norets, it's some magic
we do for dynamic PCI mapping. Also note profile hits get attributed to
the following instruction, e.g. in e100_intr we see a bunch of time just
after the first lhbrx (the number on the left is % time of the entire
function). lhbrx is a byte reversed load - in this case it happens to be
a PCI memory read.
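For readers unfamiliar with byte reversed loads: a big-endian CPU
reading a little-endian 16 bit device register has to swap the two
bytes. A portable sketch of the same operation, assuming little-endian
registers as is usual on PCI (load_le16 is an illustrative name, not a
kernel function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a 16 bit little-endian value from memory, independent of the
 * host's byte order. On a big-endian host this amounts to a byte
 * reversed load. */
static uint16_t load_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Given the bytes 0x34, 0x12 in memory, load_le16 returns 0x1234 on any
host. The expensive part in the profile is the PCI read itself, not the
byte swap.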
On the send side (sock_source) the higher interrupt rate shows up.
(hmm I wonder how we got idle time here, cyclesoak should have sucked
all of it up).
Anton
[-- Attachment #2: e100_nodisable --]
[-- Type: text/plain, Size: 715 bytes --]
===== drivers/net/e100/e100_main.c 1.30 vs edited =====
--- 1.30/drivers/net/e100/e100_main.c Sat Nov 30 03:18:46 2002
+++ edited/drivers/net/e100/e100_main.c Sun Dec 8 20:54:06 2002
@@ -1774,15 +1774,11 @@
return;
}
- /* disable intr before we ack & after identifying the intr as ours */
- e100_dis_intr(bdp);
-
writew(intr_status, &bdp->scb->scb_status); /* ack intrs */
readw(&bdp->scb->scb_status);
/* the device is closed, don't continue or else bad things may happen. */
if (!netif_running(dev)) {
- e100_set_intr_mask(bdp);
return;
}
@@ -1801,8 +1797,6 @@
bdp->tx_count = 0; /* restart tx interrupt batch count */
e100_tx_srv(bdp);
}
-
- e100_set_intr_mask(bdp);
}
/**
[-- Attachment #3: e100_txchecksum --]
[-- Type: text/plain, Size: 2261 bytes --]
===== drivers/net/e100/e100.h 1.19 vs edited =====
--- 1.19/drivers/net/e100/e100.h Wed Nov 6 03:31:33 2002
+++ edited/drivers/net/e100/e100.h Sun Dec 8 20:43:03 2002
@@ -705,8 +705,6 @@
#define IPCB_HARDWAREPARSING_ENABLE BIT_0
#define IPCB_INSERTVLAN_ENABLE BIT_1
#define IPCB_IP_ACTIVATION_DEFAULT IPCB_HARDWAREPARSING_ENABLE
-
-#define FOLD_CSUM(_XSUM) ((((_XSUM << 16) | (_XSUM >> 16)) + _XSUM) >> 16)
/* Transmit Buffer Descriptor (TBD)*/
typedef struct _tbd_t {
===== drivers/net/e100/e100_main.c 1.30 vs edited =====
--- 1.30/drivers/net/e100/e100_main.c Sat Nov 30 03:18:46 2002
+++ edited/drivers/net/e100/e100_main.c Sun Dec 8 20:54:06 2002
@@ -2053,32 +2047,6 @@
}
/**
- * e100_pseudo_hdr_csum - compute IP pseudo-header checksum
- * @ip: points to the header of the IP packet
- *
- * Return the 16 bit checksum of the IP pseudo-header.,which is computed
- * on the fields: IP src, IP dst, next protocol, payload length.
- * The checksum vaule is returned in network byte order.
- */
-static inline u16
-e100_pseudo_hdr_csum(const struct iphdr *ip)
-{
- u32 pseudo = 0;
- u32 payload_len = 0;
-
- payload_len = ntohs(ip->tot_len) - (ip->ihl * 4);
-
- pseudo += htons(payload_len);
- pseudo += (ip->protocol << 8);
- pseudo += ip->saddr & 0x0000ffff;
- pseudo += (ip->saddr & 0xffff0000) >> 16;
- pseudo += ip->daddr & 0x0000ffff;
- pseudo += (ip->daddr & 0xffff0000) >> 16;
-
- return FOLD_CSUM(pseudo);
-}
-
-/**
* e100_prepare_xmit_buff - prepare a buffer for transmission
* @bdp: atapter's private data struct
* @skb: skb to send
@@ -2121,27 +2089,13 @@
if ((ip->protocol == IPPROTO_TCP) ||
(ip->protocol == IPPROTO_UDP)) {
- u16 *chksum;
-
tcb->tcbu.ipcb.ip_activation_high =
IPCB_HARDWAREPARSING_ENABLE;
tcb->tcbu.ipcb.ip_schedule |=
IPCB_TCPUDP_CHECKSUM_ENABLE;
- if (ip->protocol == IPPROTO_TCP) {
- struct tcphdr *tcp;
-
- tcp = (struct tcphdr *) ((u32 *) ip + ip->ihl);
- chksum = &(tcp->check);
+ if (ip->protocol == IPPROTO_TCP)
tcb->tcbu.ipcb.ip_schedule |= IPCB_TCP_PACKET;
- } else {
- struct udphdr *udp;
-
- udp = (struct udphdr *) ((u32 *) ip + ip->ihl);
- chksum = &(udp->check);
- }
-
- *chksum = e100_pseudo_hdr_csum(ip);
}
}
* Re: 2.5.50 + e100 benchmarking
2002-12-08 12:44 2.5.50 + e100 benchmarking Anton Blanchard
@ 2002-12-08 20:08 ` Andrew Morton
2002-12-09 13:59 ` Anton Blanchard
2002-12-09 18:08 ` Serge Kuznetsov
From: Andrew Morton @ 2002-12-08 20:08 UTC
To: Anton Blanchard; +Cc: linux-kernel
Anton Blanchard wrote:
>
> On the send side (sock_source) the higher interrupt rate shows up.
> (hmm I wonder how we got idle time here, cyclesoak should have sucked
> all of it up).
That has to be a CPU scheduler problem; I think Andrea identified
a glitch which could do that.
Could you put together an isolated testcase?
* Re: 2.5.50 + e100 benchmarking
2002-12-08 20:08 ` Andrew Morton
@ 2002-12-09 13:59 ` Anton Blanchard
From: Anton Blanchard @ 2002-12-09 13:59 UTC
To: Andrew Morton; +Cc: linux-kernel
> That has to be a CPU scheduler problem; I think Andrea identified
> a glitch which could do that.
>
> Could you put together an isolated testcase?
It might have been operator error; I'm still trying to reproduce it.
Anton
* Re: 2.5.50 + e100 benchmarking
2002-12-08 12:44 2.5.50 + e100 benchmarking Anton Blanchard
2002-12-08 20:08 ` Andrew Morton
@ 2002-12-09 18:08 ` Serge Kuznetsov
From: Serge Kuznetsov @ 2002-12-09 18:08 UTC
To: Anton Blanchard, linux-kernel
> e100_intr was the worst function in a profile. PCI reads are very costly
I profiled this part and I found that the main issue is e100_alloc_skbs.
It takes from 10000 to ~19000 processor cycles.
At the same time eepro100's speedo_interrupt is pretty fast.
All the Best!
Serge.