* [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-20 9:09 [0/5] GSO: Generic Segmentation Offload Herbert Xu
@ 2006-06-20 9:10 ` Herbert Xu
2006-06-21 21:48 ` Michael Chan
2006-06-20 9:28 ` [2/5] [NET]: Add generic segmentation offload Herbert Xu
` (4 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 9:10 UTC
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]
Hi:
[NET]: Merge TSO/UFO fields in sk_buff
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
The separate fields also served to identify a packet's protocol. That
role has been subsumed by the new gso_type field. This is essentially a
set of netdev feature bits (shifted right by 16 bits) that are required
to process a specific skb. As such it's easy to tell whether a given
device can process a GSO skb: you just have to shift the gso_type field
back up and AND it with the netdev's features field.
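As a minimal sketch of that test (the patch below adds exactly this
helper as netif_needs_gso(); only its standalone presentation here is
for illustration):

static inline int netif_needs_gso(struct net_device *dev,
				  struct sk_buff *skb)
{
	/* Map the skb's GSO type onto the corresponding feature bits. */
	int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT;

	/* Software segmentation is needed iff this is a GSO skb and the
	 * device lacks at least one of the required feature bits. */
	return skb_shinfo(skb)->gso_size &&
	       (dev->features & feature) != feature;
}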
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new
features. For example, if we add a hardware TSO type that supports ECN,
such hardware would declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO
packets with CWR set would then have a gso_type of
SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO packets would be
SKB_GSO_TCPV4. This means that only the CWR packets need to be emulated
in software. The emulation could even chop the packet up into one CWR
fragment and another super-packet to be further segmented by the NIC.
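To make the conjunction concrete, here is a sketch using the ECN bits
as an example. Note that SKB_GSO_TCPV4_ECN and NETIF_F_TSO_ECN are not
defined by this patch; they are hypothetical follow-up bits of the kind
discussed later in this thread:

	/* Driver probe: hypothetical hardware that can segment TCPv4
	 * with ECN declares both feature bits. */
	dev->features |= NETIF_F_TSO | NETIF_F_TSO_ECN;

	/* Stack side: a TSO packet carrying CWR would get the modified
	 * type (hypothetical; this patch only ever sets SKB_GSO_TCPV4). */
	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN;

With the feature test above, a device declaring only NETIF_F_TSO fails
the check for such a packet, so the CWR packet falls back to software
emulation while plain SKB_GSO_TCPV4 packets still reach the hardware.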
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p1.patch --]
[-- Type: text/plain, Size: 25944 bytes --]
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -792,7 +792,7 @@ static int cp_start_xmit (struct sk_buff
entry = cp->tx_head;
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
if (dev->features & NETIF_F_TSO)
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (skb_shinfo(skb)->nr_frags == 0) {
struct cp_desc *txd = &cp->tx_ring[entry];
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1640,7 +1640,7 @@ bnx2_tx_int(struct bnx2 *bp)
skb = tx_buf->skb;
#ifdef BCM_TSO
/* partial BD completions possible with TSO packets */
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
u16 last_idx, last_ring_idx;
last_idx = sw_cons +
@@ -4428,7 +4428,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
}
#ifdef BCM_TSO
- if ((mss = skb_shinfo(skb)->tso_size) &&
+ if ((mss = skb_shinfo(skb)->gso_size) &&
(skb->len > (bp->dev->mtu + ETH_HLEN))) {
u32 tcp_opt_len, ip_tcp_len;
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1418,7 +1418,7 @@ int t1_start_xmit(struct sk_buff *skb, s
struct cpl_tx_pkt *cpl;
#ifdef NETIF_F_TSO
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
int eth_type;
struct cpl_tx_pkt_lso *hdr;
@@ -1433,7 +1433,7 @@ int t1_start_xmit(struct sk_buff *skb, s
hdr->ip_hdr_words = skb->nh.iph->ihl;
hdr->tcp_hdr_words = skb->h.th->doff;
hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type,
- skb_shinfo(skb)->tso_size));
+ skb_shinfo(skb)->gso_size));
hdr->len = htonl(skb->len - sizeof(*hdr));
cpl = (struct cpl_tx_pkt *)hdr;
sge->stats.tx_lso_pkts++;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2394,7 +2394,7 @@ e1000_tso(struct e1000_adapter *adapter,
uint8_t ipcss, ipcso, tucss, tucso, hdr_len;
int err;
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
if (skb_header_cloned(skb)) {
err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
if (err)
@@ -2402,7 +2402,7 @@ e1000_tso(struct e1000_adapter *adapter,
}
hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (skb->protocol == htons(ETH_P_IP)) {
skb->nh.iph->tot_len = 0;
skb->nh.iph->check = 0;
@@ -2519,7 +2519,7 @@ e1000_tx_map(struct e1000_adapter *adapt
* tso gets written back prematurely before the data is fully
* DMA'd to the controller */
if (!skb->data_len && tx_ring->last_tx_tso &&
- !skb_shinfo(skb)->tso_size) {
+ !skb_shinfo(skb)->gso_size) {
tx_ring->last_tx_tso = 0;
size -= 4;
}
@@ -2757,7 +2757,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
}
#ifdef NETIF_F_TSO
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
/* The controller does a simple calculation to
* make sure there is enough room in the FIFO before
* initiating the DMA for each buffer. The calc is:
@@ -2807,7 +2807,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
#ifdef NETIF_F_TSO
/* Controller Erratum workaround */
if (!skb->data_len && tx_ring->last_tx_tso &&
- !skb_shinfo(skb)->tso_size)
+ !skb_shinfo(skb)->gso_size)
count++;
#endif
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -1495,8 +1495,8 @@ static int nv_start_xmit(struct sk_buff
np->tx_skbuff[nr] = skb;
#ifdef NETIF_F_TSO
- if (skb_shinfo(skb)->tso_size)
- tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->tso_size << NV_TX2_TSO_SHIFT);
+ if (skb_shinfo(skb)->gso_size)
+ tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->gso_size << NV_TX2_TSO_SHIFT);
else
#endif
tx_flags_extra = (skb->ip_summed == CHECKSUM_HW ? (NV_TX2_CHECKSUM_L3|NV_TX2_CHECKSUM_L4) : 0);
diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -1173,7 +1173,7 @@ ixgb_tso(struct ixgb_adapter *adapter, s
uint16_t ipcse, tucse, mss;
int err;
- if(likely(skb_shinfo(skb)->tso_size)) {
+ if(likely(skb_shinfo(skb)->gso_size)) {
if (skb_header_cloned(skb)) {
err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
if (err)
@@ -1181,7 +1181,7 @@ ixgb_tso(struct ixgb_adapter *adapter, s
}
hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
skb->nh.iph->tot_len = 0;
skb->nh.iph->check = 0;
skb->h.th->check = ~csum_tcpudp_magic(skb->nh.iph->saddr,
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -74,7 +74,7 @@ static void emulate_large_send_offload(s
struct iphdr *iph = skb->nh.iph;
struct tcphdr *th = (struct tcphdr*)(skb->nh.raw + (iph->ihl * 4));
unsigned int doffset = (iph->ihl + th->doff) * 4;
- unsigned int mtu = skb_shinfo(skb)->tso_size + doffset;
+ unsigned int mtu = skb_shinfo(skb)->gso_size + doffset;
unsigned int offset = 0;
u32 seq = ntohl(th->seq);
u16 id = ntohs(iph->id);
@@ -139,7 +139,7 @@ static int loopback_xmit(struct sk_buff
#endif
#ifdef LOOPBACK_TSO
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
BUG_ON(skb->protocol != htons(ETH_P_IP));
BUG_ON(skb->nh.iph->protocol != IPPROTO_TCP);
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1879,7 +1879,7 @@ again:
#ifdef NETIF_F_TSO
if (skb->len > (dev->mtu + ETH_HLEN)) {
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (mss != 0)
max_segments = MYRI10GE_MAX_SEND_DESC_TSO;
}
@@ -2113,7 +2113,7 @@ abort_linearize:
}
idx = (idx + 1) & tx->mask;
} while (idx != last_idx);
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
printk(KERN_ERR
"myri10ge: %s: TSO but wanted to linearize?!?!?\n",
mgp->dev->name);
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2172,7 +2172,7 @@ static int rtl8169_xmit_frags(struct rtl
static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
{
if (dev->features & NETIF_F_TSO) {
- u32 mss = skb_shinfo(skb)->tso_size;
+ u32 mss = skb_shinfo(skb)->gso_size;
if (mss)
return LargeSend | ((mss & MSSMask) << MSSShift);
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -3915,8 +3915,8 @@ static int s2io_xmit(struct sk_buff *skb
txdp->Control_1 = 0;
txdp->Control_2 = 0;
#ifdef NETIF_F_TSO
- mss = skb_shinfo(skb)->tso_size;
- if (mss) {
+ mss = skb_shinfo(skb)->gso_size;
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4) {
txdp->Control_1 |= TXD_TCP_LSO_EN;
txdp->Control_1 |= TXD_TCP_LSO_MSS(mss);
}
@@ -3936,10 +3936,10 @@ static int s2io_xmit(struct sk_buff *skb
}
frg_len = skb->len - skb->data_len;
- if (skb_shinfo(skb)->ufo_size) {
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) {
int ufo_size;
- ufo_size = skb_shinfo(skb)->ufo_size;
+ ufo_size = skb_shinfo(skb)->gso_size;
ufo_size &= ~7;
txdp->Control_1 |= TXD_UFO_EN;
txdp->Control_1 |= TXD_UFO_MSS(ufo_size);
@@ -3965,7 +3965,7 @@ static int s2io_xmit(struct sk_buff *skb
txdp->Host_Control = (unsigned long) skb;
txdp->Control_1 |= TXD_BUFFER0_SIZE(frg_len);
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
txdp->Control_1 |= TXD_UFO_EN;
frg_cnt = skb_shinfo(skb)->nr_frags;
@@ -3980,12 +3980,12 @@ static int s2io_xmit(struct sk_buff *skb
(sp->pdev, frag->page, frag->page_offset,
frag->size, PCI_DMA_TODEVICE);
txdp->Control_1 = TXD_BUFFER0_SIZE(frag->size);
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
txdp->Control_1 |= TXD_UFO_EN;
}
txdp->Control_1 |= TXD_GATHER_CODE_LAST;
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
frg_cnt++; /* as Txd0 was used for inband header */
tx_fifo = mac_control->tx_FIFO_start[queue];
@@ -3999,7 +3999,7 @@ static int s2io_xmit(struct sk_buff *skb
if (mss)
val64 |= TX_FIFO_SPECIAL_FUNC;
#endif
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
val64 |= TX_FIFO_SPECIAL_FUNC;
writeq(val64, &tx_fifo->List_Control);
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1160,7 +1160,7 @@ static unsigned tx_le_req(const struct s
count = sizeof(dma_addr_t) / sizeof(u32);
count += skb_shinfo(skb)->nr_frags * count;
- if (skb_shinfo(skb)->tso_size)
+ if (skb_shinfo(skb)->gso_size)
++count;
if (skb->ip_summed == CHECKSUM_HW)
@@ -1232,7 +1232,7 @@ static int sky2_xmit_frame(struct sk_buf
}
/* Check for TCP Segmentation Offload */
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (mss != 0) {
/* just drop the packet if non-linear expansion fails */
if (skb_header_cloned(skb) &&
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -3780,7 +3780,7 @@ static int tg3_start_xmit(struct sk_buff
#if TG3_TSO_SUPPORT != 0
mss = 0;
if (skb->len > (tp->dev->mtu + ETH_HLEN) &&
- (mss = skb_shinfo(skb)->tso_size) != 0) {
+ (mss = skb_shinfo(skb)->gso_size) != 0) {
int tcp_opt_len, ip_tcp_len;
if (skb_header_cloned(skb) &&
@@ -3905,7 +3905,7 @@ static int tg3_start_xmit_dma_bug(struct
#if TG3_TSO_SUPPORT != 0
mss = 0;
if (skb->len > (tp->dev->mtu + ETH_HLEN) &&
- (mss = skb_shinfo(skb)->tso_size) != 0) {
+ (mss = skb_shinfo(skb)->gso_size) != 0) {
int tcp_opt_len, ip_tcp_len;
if (skb_header_cloned(skb) &&
diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c
--- a/drivers/net/typhoon.c
+++ b/drivers/net/typhoon.c
@@ -340,7 +340,7 @@ enum state_values {
#endif
#if defined(NETIF_F_TSO)
-#define skb_tso_size(x) (skb_shinfo(x)->tso_size)
+#define skb_tso_size(x) (skb_shinfo(x)->gso_size)
#define TSO_NUM_DESCRIPTORS 2
#define TSO_OFFLOAD_ON TYPHOON_OFFLOAD_TCP_SEGMENT
#else
diff --git a/drivers/s390/net/qeth_eddp.c b/drivers/s390/net/qeth_eddp.c
--- a/drivers/s390/net/qeth_eddp.c
+++ b/drivers/s390/net/qeth_eddp.c
@@ -420,7 +420,7 @@ __qeth_eddp_fill_context_tcp(struct qeth
}
tcph = eddp->skb->h.th;
while (eddp->skb_offset < eddp->skb->len) {
- data_len = min((int)skb_shinfo(eddp->skb)->tso_size,
+ data_len = min((int)skb_shinfo(eddp->skb)->gso_size,
(int)(eddp->skb->len - eddp->skb_offset));
/* prepare qdio hdr */
if (eddp->qh.hdr.l2.id == QETH_HEADER_TYPE_LAYER2){
@@ -515,20 +515,20 @@ qeth_eddp_calc_num_pages(struct qeth_edd
QETH_DBF_TEXT(trace, 5, "eddpcanp");
/* can we put multiple skbs in one page? */
- skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->tso_size + hdr_len);
+ skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->gso_size + hdr_len);
if (skbs_per_page > 1){
- ctx->num_pages = (skb_shinfo(skb)->tso_segs + 1) /
+ ctx->num_pages = (skb_shinfo(skb)->gso_segs + 1) /
skbs_per_page + 1;
ctx->elements_per_skb = 1;
} else {
/* no -> how many elements per skb? */
- ctx->elements_per_skb = (skb_shinfo(skb)->tso_size + hdr_len +
+ ctx->elements_per_skb = (skb_shinfo(skb)->gso_size + hdr_len +
PAGE_SIZE) >> PAGE_SHIFT;
ctx->num_pages = ctx->elements_per_skb *
- (skb_shinfo(skb)->tso_segs + 1);
+ (skb_shinfo(skb)->gso_segs + 1);
}
ctx->num_elements = ctx->elements_per_skb *
- (skb_shinfo(skb)->tso_segs + 1);
+ (skb_shinfo(skb)->gso_segs + 1);
}
static inline struct qeth_eddp_context *
diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c
--- a/drivers/s390/net/qeth_main.c
+++ b/drivers/s390/net/qeth_main.c
@@ -4417,7 +4417,7 @@ qeth_send_packet(struct qeth_card *card,
struct qeth_eddp_context *ctx = NULL;
int tx_bytes = skb->len;
unsigned short nr_frags = skb_shinfo(skb)->nr_frags;
- unsigned short tso_size = skb_shinfo(skb)->tso_size;
+ unsigned short tso_size = skb_shinfo(skb)->gso_size;
int rc;
QETH_DBF_TEXT(trace, 6, "sendpkt");
@@ -4453,7 +4453,7 @@ qeth_send_packet(struct qeth_card *card,
queue = card->qdio.out_qs
[qeth_get_priority_queue(card, skb, ipv, cast_type)];
- if (skb_shinfo(skb)->tso_size)
+ if (skb_shinfo(skb)->gso_size)
large_send = card->options.large_send;
/*are we able to do TSO ? If so ,prepare and send it from here */
diff --git a/drivers/s390/net/qeth_tso.h b/drivers/s390/net/qeth_tso.h
--- a/drivers/s390/net/qeth_tso.h
+++ b/drivers/s390/net/qeth_tso.h
@@ -51,7 +51,7 @@ qeth_tso_fill_header(struct qeth_card *c
hdr->ext.hdr_version = 1;
hdr->ext.hdr_len = 28;
/*insert non-fix values */
- hdr->ext.mss = skb_shinfo(skb)->tso_size;
+ hdr->ext.mss = skb_shinfo(skb)->gso_size;
hdr->ext.dg_hdr_len = (__u16)(iph->ihl*4 + tcph->doff*4);
hdr->ext.payload_len = (__u16)(skb->len - hdr->ext.dg_hdr_len -
sizeof(struct qeth_hdr_tso));
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -309,9 +309,12 @@ struct net_device
#define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */
#define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */
#define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
-#define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */
#define NETIF_F_LLTX 4096 /* LockLess TX */
-#define NETIF_F_UFO 8192 /* Can offload UDP Large Send*/
+
+ /* Segmentation offload features */
+#define NETIF_F_GSO_SHIFT 16
+#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
+#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
@@ -980,6 +983,13 @@ extern void dev_seq_stop(struct seq_file
extern void linkwatch_run_queue(void);
+static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb)
+{
+ int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT;
+ return skb_shinfo(skb)->gso_size &&
+ (dev->features & feature) != feature;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_DEV_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -135,9 +135,10 @@ struct skb_frag_struct {
struct skb_shared_info {
atomic_t dataref;
unsigned short nr_frags;
- unsigned short tso_size;
- unsigned short tso_segs;
- unsigned short ufo_size;
+ unsigned short gso_size;
+ /* Warning: this field is not always filled in (UFO)! */
+ unsigned short gso_segs;
+ unsigned short gso_type;
unsigned int ip6_frag_id;
struct sk_buff *frag_list;
skb_frag_t frags[MAX_SKB_FRAGS];
@@ -169,6 +170,11 @@ enum {
SKB_FCLONE_CLONE,
};
+enum {
+ SKB_GSO_TCPV4 = 1 << 0,
+ SKB_GSO_UDPV4 = 1 << 1,
+};
+
/**
* struct sk_buff - socket buffer
* @next: Next buffer in list
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -570,13 +570,13 @@ struct tcp_skb_cb {
*/
static inline int tcp_skb_pcount(const struct sk_buff *skb)
{
- return skb_shinfo(skb)->tso_segs;
+ return skb_shinfo(skb)->gso_segs;
}
/* This is valid iff tcp_skb_pcount() > 1. */
static inline int tcp_skb_mss(const struct sk_buff *skb)
{
- return skb_shinfo(skb)->tso_size;
+ return skb_shinfo(skb)->gso_size;
}
static inline void tcp_dec_pcount_approx(__u32 *count,
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -34,8 +34,8 @@ static inline unsigned packet_length(con
int br_dev_queue_push_xmit(struct sk_buff *skb)
{
- /* drop mtu oversized packets except tso */
- if (packet_length(skb) > skb->dev->mtu && !skb_shinfo(skb)->tso_size)
+ /* drop mtu oversized packets except gso */
+ if (packet_length(skb) > skb->dev->mtu && !skb_shinfo(skb)->gso_size)
kfree_skb(skb);
else {
#ifdef CONFIG_BRIDGE_NETFILTER
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -761,7 +761,7 @@ static int br_nf_dev_queue_xmit(struct s
{
if (skb->protocol == htons(ETH_P_IP) &&
skb->len > skb->dev->mtu &&
- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
+ !skb_shinfo(skb)->gso_size)
return ip_fragment(skb, br_dev_queue_push_xmit);
else
return br_dev_queue_push_xmit(skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -172,9 +172,9 @@ struct sk_buff *__alloc_skb(unsigned int
shinfo = skb_shinfo(skb);
atomic_set(&shinfo->dataref, 1);
shinfo->nr_frags = 0;
- shinfo->tso_size = 0;
- shinfo->tso_segs = 0;
- shinfo->ufo_size = 0;
+ shinfo->gso_size = 0;
+ shinfo->gso_segs = 0;
+ shinfo->gso_type = 0;
shinfo->ip6_frag_id = 0;
shinfo->frag_list = NULL;
@@ -238,8 +238,9 @@ struct sk_buff *alloc_skb_from_cache(kme
atomic_set(&(skb_shinfo(skb)->dataref), 1);
skb_shinfo(skb)->nr_frags = 0;
- skb_shinfo(skb)->tso_size = 0;
- skb_shinfo(skb)->tso_segs = 0;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_segs = 0;
+ skb_shinfo(skb)->gso_type = 0;
skb_shinfo(skb)->frag_list = NULL;
out:
return skb;
@@ -528,8 +529,9 @@ static void copy_skb_header(struct sk_bu
#endif
skb_copy_secmark(new, old);
atomic_set(&new->users, 1);
- skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size;
- skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs;
+ skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
+ skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
+ skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
}
/**
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -210,8 +210,7 @@ static inline int ip_finish_output(struc
return dst_output(skb);
}
#endif
- if (skb->len > dst_mtu(skb->dst) &&
- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
+ if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size)
return ip_fragment(skb, ip_finish_output2);
else
return ip_finish_output2(skb);
@@ -362,7 +361,7 @@ packet_routed:
}
ip_select_ident_more(iph, &rt->u.dst, sk,
- (skb_shinfo(skb)->tso_segs ?: 1) - 1);
+ (skb_shinfo(skb)->gso_segs ?: 1) - 1);
/* Add an IP checksum. */
ip_send_check(iph);
@@ -744,7 +743,8 @@ static inline int ip_ufo_append_data(str
(length - transhdrlen));
if (!err) {
/* specify the length of each IP datagram fragment*/
- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen);
+ skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4;
__skb_queue_tail(&sk->sk_write_queue, skb);
return 0;
@@ -1087,14 +1087,16 @@ ssize_t ip_append_page(struct sock *sk,
inet->cork.length += size;
if ((sk->sk_protocol == IPPROTO_UDP) &&
- (rt->u.dst.dev->features & NETIF_F_UFO))
- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen);
+ (rt->u.dst.dev->features & NETIF_F_UFO)) {
+ skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4;
+ }
while (size > 0) {
int i;
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_size)
len = size;
else {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -571,7 +571,7 @@ new_segment:
skb->ip_summed = CHECKSUM_HW;
tp->write_seq += copy;
TCP_SKB_CB(skb)->end_seq += copy;
- skb_shinfo(skb)->tso_segs = 0;
+ skb_shinfo(skb)->gso_segs = 0;
if (!copied)
TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH;
@@ -818,7 +818,7 @@ new_segment:
tp->write_seq += copy;
TCP_SKB_CB(skb)->end_seq += copy;
- skb_shinfo(skb)->tso_segs = 0;
+ skb_shinfo(skb)->gso_segs = 0;
from += copy;
copied += copy;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1073,7 +1073,7 @@ tcp_sacktag_write_queue(struct sock *sk,
else
pkt_len = (end_seq -
TCP_SKB_CB(skb)->seq);
- if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->tso_size))
+ if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->gso_size))
break;
pcount = tcp_skb_pcount(skb);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -515,15 +515,17 @@ static void tcp_set_skb_tso_segs(struct
/* Avoid the costly divide in the normal
* non-TSO case.
*/
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
} else {
unsigned int factor;
factor = skb->len + (mss_now - 1);
factor /= mss_now;
- skb_shinfo(skb)->tso_segs = factor;
- skb_shinfo(skb)->tso_size = mss_now;
+ skb_shinfo(skb)->gso_segs = factor;
+ skb_shinfo(skb)->gso_size = mss_now;
+ skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
}
}
@@ -914,7 +916,7 @@ static int tcp_init_tso_segs(struct sock
if (!tso_segs ||
(tso_segs > 1 &&
- skb_shinfo(skb)->tso_size != mss_now)) {
+ tcp_skb_mss(skb) != mss_now)) {
tcp_set_skb_tso_segs(sk, skb, mss_now);
tso_segs = tcp_skb_pcount(skb);
}
@@ -1724,8 +1726,9 @@ int tcp_retransmit_skb(struct sock *sk,
tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) {
if (!pskb_trim(skb, 0)) {
TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
skb->ip_summed = CHECKSUM_NONE;
skb->csum = 0;
}
@@ -1930,8 +1933,9 @@ void tcp_send_fin(struct sock *sk)
skb->csum = 0;
TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_FIN);
TCP_SKB_CB(skb)->sacked = 0;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
/* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */
TCP_SKB_CB(skb)->seq = tp->write_seq;
@@ -1963,8 +1967,9 @@ void tcp_send_active_reset(struct sock *
skb->csum = 0;
TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_RST);
TCP_SKB_CB(skb)->sacked = 0;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
/* Send it off. */
TCP_SKB_CB(skb)->seq = tcp_acceptable_seq(sk, tp);
@@ -2047,8 +2052,9 @@ struct sk_buff * tcp_make_synack(struct
TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn;
TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1;
TCP_SKB_CB(skb)->sacked = 0;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
th->seq = htonl(TCP_SKB_CB(skb)->seq);
th->ack_seq = htonl(tcp_rsk(req)->rcv_isn + 1);
if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */
@@ -2152,8 +2158,9 @@ int tcp_connect(struct sock *sk)
TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN;
TCP_ECN_send_syn(sk, tp, buff);
TCP_SKB_CB(buff)->sacked = 0;
- skb_shinfo(buff)->tso_segs = 1;
- skb_shinfo(buff)->tso_size = 0;
+ skb_shinfo(buff)->gso_segs = 1;
+ skb_shinfo(buff)->gso_size = 0;
+ skb_shinfo(buff)->gso_type = 0;
buff->csum = 0;
TCP_SKB_CB(buff)->seq = tp->write_seq++;
TCP_SKB_CB(buff)->end_seq = tp->write_seq;
@@ -2257,8 +2264,9 @@ void tcp_send_ack(struct sock *sk)
buff->csum = 0;
TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK;
TCP_SKB_CB(buff)->sacked = 0;
- skb_shinfo(buff)->tso_segs = 1;
- skb_shinfo(buff)->tso_size = 0;
+ skb_shinfo(buff)->gso_segs = 1;
+ skb_shinfo(buff)->gso_size = 0;
+ skb_shinfo(buff)->gso_type = 0;
/* Send it off, this clears delayed acks for us. */
TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp);
@@ -2293,8 +2301,9 @@ static int tcp_xmit_probe_skb(struct soc
skb->csum = 0;
TCP_SKB_CB(skb)->flags = TCPCB_FLAG_ACK;
TCP_SKB_CB(skb)->sacked = urgent;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
/* Use a previous sequence. This should cause the other
* end to send an ack. Don't queue or clone SKB, just
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -148,7 +148,7 @@ static int ip6_output2(struct sk_buff *s
int ip6_output(struct sk_buff *skb)
{
- if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->ufo_size) ||
+ if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) ||
dst_allfrag(skb->dst))
return ip6_fragment(skb, ip6_output2);
else
@@ -833,8 +833,9 @@ static inline int ip6_ufo_append_data(st
struct frag_hdr fhdr;
/* specify the length of each IP datagram fragment*/
- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen) -
- sizeof(struct frag_hdr);
+ skb_shinfo(skb)->gso_size = mtu - fragheaderlen -
+ sizeof(struct frag_hdr);
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4;
ipv6_select_ident(skb, &fhdr);
skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
__skb_queue_tail(&sk->sk_write_queue, skb);
* Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-20 9:10 ` [1/5] [NET]: Merge TSO/UFO fields in sk_buff Herbert Xu
@ 2006-06-21 21:48 ` Michael Chan
2006-06-21 23:27 ` Herbert Xu
0 siblings, 1 reply; 21+ messages in thread
From: Michael Chan @ 2006-06-21 21:48 UTC
To: Herbert Xu; +Cc: David S. Miller, netdev
On Tue, 2006-06-20 at 19:10 +1000, Herbert Xu wrote:
> I've made gso_type a conjunction. The idea is that you have a base type
> (e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
> For example, if we add a hardware TSO type that supports ECN, such
> hardware would declare NETIF_F_TSO | NETIF_F_TSO_ECN.
Hi Herbert,
We have some hardware that supports TSO and ECN. Is something like the
patch below what you had in mind to support NETIF_F_TSO_ECN? Or are you
thinking about something more generic that works with or without
hardware support?
[NET]: Add hardware TSO support for ECN
In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection. This patch adds a new
feature NETIF_F_TSO_ECN for hardware that supports TSO and ECN.
To support NETIF_F_TSO_ECN, hardware has to set the ECE flag in the
TCP flags for all segments if the first TSO segment has the ECE flag set.
If the CWR flag is set in the first TSO segment, hardware has to set
CWR in the first segment only and clear it in all subsequent segments.
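Expressed in software, the required per-segment flag handling amounts
to something like the sketch below (not part of this patch; segs is
assumed to be the list of segments produced from one TSO frame, and
ece/cwr the flag values taken from the original header):

static void fixup_ecn_flags(struct sk_buff *segs, int ece, int cwr)
{
	struct sk_buff *skb;

	for (skb = segs; skb; skb = skb->next) {
		struct tcphdr *th = skb->h.th;

		th->ece = ece;			/* ECE replicated in every segment */
		th->cwr = cwr && skb == segs;	/* CWR kept only in the first */
	}
}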
Signed-off-by: Michael Chan <mchan@broadcom.com>
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a3af961..825b66d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
#define NETIF_F_GSO_SHIFT 16
#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN (SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)
#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 679feab..818f478 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -173,6 +173,7 @@ enum {
enum {
SKB_GSO_TCPV4 = 1 << 0,
SKB_GSO_UDPV4 = 1 << 1,
+ SKB_GSO_TCPV4_ECN = 1 << 2,
};
/**
diff --git a/include/net/sock.h b/include/net/sock.h
index 6aac245..7c1ac0c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1034,7 +1034,8 @@ static inline void sk_setup_caps(struct
if (sk->sk_route_caps & NETIF_F_GSO)
sk->sk_route_caps |= NETIF_F_TSO;
if (sk->sk_route_caps & NETIF_F_TSO) {
- if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
+ if ((sock_flag(sk, SOCK_NO_LARGESEND) &&
+ !(sk->sk_route_caps & NETIF_F_TSO_ECN)) || dst->header_len)
sk->sk_route_caps &= ~NETIF_F_TSO;
else
sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..c8a3b48 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,7 +31,8 @@ static inline void TCP_ECN_send_syn(stru
struct sk_buff *skb)
{
tp->ecn_flags = 0;
- if (sysctl_tcp_ecn && !(sk->sk_route_caps & NETIF_F_TSO)) {
+ if (sysctl_tcp_ecn && (!(sk->sk_route_caps & NETIF_F_TSO) ||
+ (sk->sk_route_caps & NETIF_F_TSO_ECN))) {
TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
tp->ecn_flags = TCP_ECN_OK;
sock_set_flag(sk, SOCK_NO_LARGESEND);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdd71db..a65fe56 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2044,7 +2044,8 @@ struct sk_buff * tcp_make_synack(struct
memset(th, 0, sizeof(struct tcphdr));
th->syn = 1;
th->ack = 1;
- if (dst->dev->features&NETIF_F_TSO)
+ if ((dst->dev->features&NETIF_F_TSO) &&
+ !(dst->dev->features&NETIF_F_TSO_ECN))
ireq->ecn_ok = 0;
TCP_ECN_make_synack(req, th);
th->source = inet_sk(sk)->sport;
* Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-21 21:48 ` Michael Chan
@ 2006-06-21 23:27 ` Herbert Xu
2006-06-22 0:46 ` Michael Chan
0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2006-06-21 23:27 UTC
To: Michael Chan; +Cc: David S. Miller, netdev
Hi Michael:
On Wed, Jun 21, 2006 at 02:48:15PM -0700, Michael Chan wrote:
>
> We have some hardware that supports TSO and ECN. Is something like the
> patch below what you had in mind to support NETIF_F_TSO_ECN? Or are you
> thinking about something more generic that works with or without
> hardware support?
Yeah I was thinking of something more generic because packets with CWR
set should be rare.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-21 23:27 ` Herbert Xu
@ 2006-06-22 0:46 ` Michael Chan
2006-06-22 1:09 ` Herbert Xu
0 siblings, 1 reply; 21+ messages in thread
From: Michael Chan @ 2006-06-22 0:46 UTC
To: Herbert Xu; +Cc: David S. Miller, netdev
On Thu, 2006-06-22 at 09:27 +1000, Herbert Xu wrote:
> Hi Michael:
>
> On Wed, Jun 21, 2006 at 02:48:15PM -0700, Michael Chan wrote:
> >
> > We have some hardware that supports TSO and ECN. Is something like the
> > patch below what you had in mind to support NETIF_F_TSO_ECN? Or are you
> > thinking about something more generic that works with or without
> > hardware support?
>
> Yeah I was thinking of something more generic because packets with CWR
> set should be rare.
>
OK, if time permits, I'll cook up some patches to support generic TSO
ECN with or without hardware support. Without hardware ECN, it will use
GSO to split up the packet with CWR. Can we assume that all hardware
will handle ECE properly?
* Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-22 0:46 ` Michael Chan
@ 2006-06-22 1:09 ` Herbert Xu
2006-06-22 1:14 ` David Miller
0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2006-06-22 1:09 UTC
To: Michael Chan; +Cc: David S. Miller, netdev
On Wed, Jun 21, 2006 at 05:46:24PM -0700, Michael Chan wrote:
>
> OK, if time permits, I'll cook up some patches to support generic TSO
> ECN with or without hardware support. Without hardware ECN, it will use
> GSO to split up the packet with CWR. Can we assume that all hardware
> will handle ECE properly?
ECE just needs to be replicated, so it would seem to be a safe bet
unless Dave knows of some really broken hardware out there. If not, I'd
say we should just assume that it works and add a new bit if such
broken hardware does turn up.
Thanks a lot for looking into this!
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-22 1:09 ` Herbert Xu
@ 2006-06-22 1:14 ` David Miller
0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2006-06-22 1:14 UTC
To: herbert; +Cc: mchan, netdev
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 22 Jun 2006 11:09:25 +1000
> ECE just needs to be replicated so it would seem to be a safe bet unless
> Dave knows some really broken hardware out there? If not I'd say that
> we should just assume that it works and add a new bit it if said broken
> stuff does turn up.
ECE simply needs to persist while the ECE condition is true. If it is
true when we build the TSO frame, it would thus have been true at the
time each individual sub-frame would have been built.
I don't anticipate any problems if you just mirror the ECE bit in each
chopped-up frame.
> Thanks a lot for looking into this!
Yes, indeed, thanks Michael.
* [2/5] [NET]: Add generic segmentation offload
2006-06-20 9:09 [0/5] GSO: Generic Segmentation Offload Herbert Xu
2006-06-20 9:10 ` [1/5] [NET]: Merge TSO/UFO fields in sk_buff Herbert Xu
@ 2006-06-20 9:28 ` Herbert Xu
2006-06-20 17:54 ` Michael Chan
2006-06-20 9:29 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 9:28 UTC
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 716 bytes --]
Hi:
[NET]: Add generic segmentation offload
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.
The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
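Reduced to a sketch, the transmit path this patch creates looks like
the following (the real code below additionally handles packet taps,
requeueing of a partially sent list via dev->gso_skb, and destructor
bookkeeping):

static int xmit_possibly_gso(struct sk_buff *skb, struct net_device *dev)
{
	struct sk_buff *segs, *nskb;

	if (!netif_needs_gso(dev, skb))
		return dev->hard_start_xmit(skb, dev);	/* h/w segments it */

	/* Allocate the MTU-sized segments only now, at the driver door. */
	segs = skb_gso_segment(skb, dev->features & NETIF_F_SG);
	if (IS_ERR(segs)) {
		kfree_skb(skb);
		return PTR_ERR(segs);
	}

	while ((nskb = segs) != NULL) {
		segs = nskb->next;
		nskb->next = NULL;
		if (dev->hard_start_xmit(nskb, dev)) {
			/* The real patch requeues nskb and the rest;
			 * this sketch simply drops them. */
			kfree_skb(nskb);
			while ((nskb = segs) != NULL) {
				segs = nskb->next;
				kfree_skb(nskb);
			}
			break;
		}
	}

	kfree_skb(skb);	/* the original super-packet */
	return 0;
}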
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p2.patch --]
[-- Type: text/plain, Size: 7421 bytes --]
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -406,6 +406,9 @@ struct net_device
struct list_head qdisc_list;
unsigned long tx_queue_len; /* Max frames per queue allowed */
+ /* Partially transmitted GSO packet. */
+ struct sk_buff *gso_skb;
+
/* ingress path synchronizer */
spinlock_t ingress_lock;
struct Qdisc *qdisc_ingress;
@@ -540,6 +543,7 @@ struct packet_type {
struct net_device *,
struct packet_type *,
struct net_device *);
+ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg);
void *af_packet_priv;
struct list_head list;
};
@@ -690,7 +694,8 @@ extern int dev_change_name(struct net_d
extern int dev_set_mtu(struct net_device *, int);
extern int dev_set_mac_address(struct net_device *,
struct sockaddr *);
-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
+extern int dev_hard_start_xmit(struct sk_buff *skb,
+ struct net_device *dev);
extern void dev_init(void);
@@ -964,6 +969,7 @@ extern int netdev_max_backlog;
extern int weight_p;
extern int netdev_set_master(struct net_device *dev, struct net_device *master);
extern int skb_checksum_help(struct sk_buff *skb, int inward);
+extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg);
#ifdef CONFIG_BUG
extern void netdev_rx_csum_fault(struct net_device *dev);
#else
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -116,6 +116,7 @@
#include <asm/current.h>
#include <linux/audit.h>
#include <linux/dmaengine.h>
+#include <linux/err.h>
/*
* The list of packet types we will receive (as opposed to discard)
@@ -1048,7 +1049,7 @@ static inline void net_timestamp(struct
* taps currently in use.
*/
-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
+static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
{
struct packet_type *ptype;
@@ -1186,6 +1187,40 @@ out:
return ret;
}
+/**
+ * skb_gso_segment - Perform segmentation on skb.
+ * @skb: buffer to segment
+ * @sg: whether scatter-gather is supported on the target.
+ *
+ * This function segments the given skb and returns a list of segments.
+ */
+struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
+ struct packet_type *ptype;
+ int type = skb->protocol;
+
+ BUG_ON(skb_shinfo(skb)->frag_list);
+ BUG_ON(skb->ip_summed != CHECKSUM_HW);
+
+ skb->mac.raw = skb->data;
+ skb->mac_len = skb->nh.raw - skb->data;
+ __skb_pull(skb, skb->mac_len);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) {
+ if (ptype->type == type && !ptype->dev && ptype->gso_segment) {
+ segs = ptype->gso_segment(skb, sg);
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return segs;
+}
+
+EXPORT_SYMBOL(skb_gso_segment);
+
/* Take action when hardware reception checksum errors are detected. */
#ifdef CONFIG_BUG
void netdev_rx_csum_fault(struct net_device *dev)
@@ -1222,6 +1257,85 @@ static inline int illegal_highdma(struct
#define illegal_highdma(dev, skb) (0)
#endif
+struct dev_gso_cb {
+ void (*destructor)(struct sk_buff *skb);
+};
+
+#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
+
+static void dev_gso_skb_destructor(struct sk_buff *skb)
+{
+ struct dev_gso_cb *cb;
+
+ do {
+ struct sk_buff *nskb = skb->next;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ kfree_skb(nskb);
+ } while (skb->next);
+
+ cb = DEV_GSO_CB(skb);
+ if (cb->destructor)
+ cb->destructor(skb);
+}
+
+/**
+ * dev_gso_segment - Perform emulated hardware segmentation on skb.
+ * @skb: buffer to segment
+ *
+ * This function segments the given skb and stores the list of segments
+ * in skb->next.
+ */
+static int dev_gso_segment(struct sk_buff *skb)
+{
+ struct sk_buff *segs;
+
+ segs = skb_gso_segment(skb, skb->dev->features & NETIF_F_SG &&
+ !illegal_highdma(dev, skb));
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ skb->next = segs;
+ DEV_GSO_CB(skb)->destructor = skb->destructor;
+ skb->destructor = dev_gso_skb_destructor;
+
+ return 0;
+}
+
+int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ if (likely(!skb->next)) {
+ if (netdev_nit)
+ dev_queue_xmit_nit(skb, dev);
+
+ if (!netif_needs_gso(dev, skb))
+ return dev->hard_start_xmit(skb, dev);
+
+ if (unlikely(dev_gso_segment(skb)))
+ goto out_kfree_skb;
+ }
+
+ do {
+ struct sk_buff *nskb = skb->next;
+ int rc;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ rc = dev->hard_start_xmit(nskb, dev);
+ if (unlikely(rc)) {
+ skb->next = nskb;
+ return rc;
+ }
+ } while (skb->next);
+
+ skb->destructor = DEV_GSO_CB(skb)->destructor;
+
+out_kfree_skb:
+ kfree_skb(skb);
+ return 0;
+}
+
#define HARD_TX_LOCK(dev, cpu) { \
if ((dev->features & NETIF_F_LLTX) == 0) { \
netif_tx_lock(dev); \
@@ -1266,6 +1380,10 @@ int dev_queue_xmit(struct sk_buff *skb)
struct Qdisc *q;
int rc = -ENOMEM;
+ /* GSO will handle the following emulations directly. */
+ if (netif_needs_gso(dev, skb))
+ goto gso;
+
if (skb_shinfo(skb)->frag_list &&
!(dev->features & NETIF_F_FRAGLIST) &&
__skb_linearize(skb))
@@ -1290,6 +1408,7 @@ int dev_queue_xmit(struct sk_buff *skb)
if (skb_checksum_help(skb, 0))
goto out_kfree_skb;
+gso:
spin_lock_prefetch(&dev->queue_lock);
/* Disable soft irqs for various locks below. Also
@@ -1346,11 +1465,8 @@ int dev_queue_xmit(struct sk_buff *skb)
HARD_TX_LOCK(dev, cpu);
if (!netif_queue_stopped(dev)) {
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
-
rc = 0;
- if (!dev->hard_start_xmit(skb, dev)) {
+ if (!dev_hard_start_xmit(skb, dev)) {
HARD_TX_UNLOCK(dev);
goto out;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -96,8 +96,11 @@ static inline int qdisc_restart(struct n
struct sk_buff *skb;
/* Dequeue packet */
- if ((skb = q->dequeue(q)) != NULL) {
+ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) {
unsigned nolock = (dev->features & NETIF_F_LLTX);
+
+ dev->gso_skb = NULL;
+
/*
* When the driver has LLTX set it does its own locking
* in start_xmit. No need to add additional overhead by
@@ -134,10 +137,8 @@ static inline int qdisc_restart(struct n
if (!netif_queue_stopped(dev)) {
int ret;
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
- ret = dev->hard_start_xmit(skb, dev);
+ ret = dev_hard_start_xmit(skb, dev);
if (ret == NETDEV_TX_OK) {
if (!nolock) {
netif_tx_unlock(dev);
@@ -171,7 +172,10 @@ static inline int qdisc_restart(struct n
*/
requeue:
- q->ops->requeue(skb, q);
+ if (skb->next)
+ dev->gso_skb = skb;
+ else
+ q->ops->requeue(skb, q);
netif_schedule(dev);
return 1;
}
@@ -572,15 +576,19 @@ void dev_activate(struct net_device *dev
void dev_deactivate(struct net_device *dev)
{
struct Qdisc *qdisc;
+ struct sk_buff *skb;
spin_lock_bh(&dev->queue_lock);
qdisc = dev->qdisc;
dev->qdisc = &noop_qdisc;
+ skb = dev->gso_skb;
+ dev->gso_skb = NULL;
qdisc_reset(qdisc);
spin_unlock_bh(&dev->queue_lock);
+ kfree_skb(skb);
dev_watchdog_down(dev);
while (test_bit(__LINK_STATE_SCHED, &dev->state))
* Re: [2/5] [NET]: Add generic segmentation offload
2006-06-20 9:28 ` [2/5] [NET]: Add generic segmentation offload Herbert Xu
@ 2006-06-20 17:54 ` Michael Chan
2006-06-20 23:46 ` Herbert Xu
0 siblings, 1 reply; 21+ messages in thread
From: Michael Chan @ 2006-06-20 17:54 UTC
To: Herbert Xu; +Cc: David S. Miller, netdev
On Tue, 2006-06-20 at 19:28 +1000, Herbert Xu wrote:
> [NET]: Add generic segmentation offload
>
> +static int dev_gso_segment(struct sk_buff *skb)
> +{
> + struct sk_buff *segs;
> +
> + segs = skb_gso_segment(skb, skb->dev->features & NETIF_F_SG &&
> + !illegal_highdma(dev, skb));
I think you need !illegal_highdma(skb->dev, skb)
* Re: [2/5] [NET]: Add generic segmentation offload
2006-06-20 17:54 ` Michael Chan
@ 2006-06-20 23:46 ` Herbert Xu
0 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 23:46 UTC
To: Michael Chan; +Cc: David S. Miller, netdev
On Tue, Jun 20, 2006 at 10:54:48AM -0700, Michael Chan wrote:
>
> I think you need !illegal_highdma(skb->dev, skb)
Thanks for catching this. You can tell that I don't have HIGHMEM :)
Here is the fixed version:
[NET]: Add generic segmentation offload
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.
The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -406,6 +406,9 @@ struct net_device
struct list_head qdisc_list;
unsigned long tx_queue_len; /* Max frames per queue allowed */
+ /* Partially transmitted GSO packet. */
+ struct sk_buff *gso_skb;
+
/* ingress path synchronizer */
spinlock_t ingress_lock;
struct Qdisc *qdisc_ingress;
@@ -540,6 +543,7 @@ struct packet_type {
struct net_device *,
struct packet_type *,
struct net_device *);
+ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg);
void *af_packet_priv;
struct list_head list;
};
@@ -690,7 +694,8 @@ extern int dev_change_name(struct net_d
extern int dev_set_mtu(struct net_device *, int);
extern int dev_set_mac_address(struct net_device *,
struct sockaddr *);
-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
+extern int dev_hard_start_xmit(struct sk_buff *skb,
+ struct net_device *dev);
extern void dev_init(void);
@@ -964,6 +969,7 @@ extern int netdev_max_backlog;
extern int weight_p;
extern int netdev_set_master(struct net_device *dev, struct net_device *master);
extern int skb_checksum_help(struct sk_buff *skb, int inward);
+extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg);
#ifdef CONFIG_BUG
extern void netdev_rx_csum_fault(struct net_device *dev);
#else
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -116,6 +116,7 @@
#include <asm/current.h>
#include <linux/audit.h>
#include <linux/dmaengine.h>
+#include <linux/err.h>
/*
* The list of packet types we will receive (as opposed to discard)
@@ -1048,7 +1049,7 @@ static inline void net_timestamp(struct
* taps currently in use.
*/
-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
+static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
{
struct packet_type *ptype;
@@ -1186,6 +1187,40 @@ out:
return ret;
}
+/**
+ * skb_gso_segment - Perform segmentation on skb.
+ * @skb: buffer to segment
+ * @sg: whether scatter-gather is supported on the target.
+ *
+ * This function segments the given skb and returns a list of segments.
+ */
+struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
+ struct packet_type *ptype;
+ int type = skb->protocol;
+
+ BUG_ON(skb_shinfo(skb)->frag_list);
+ BUG_ON(skb->ip_summed != CHECKSUM_HW);
+
+ skb->mac.raw = skb->data;
+ skb->mac_len = skb->nh.raw - skb->data;
+ __skb_pull(skb, skb->mac_len);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) {
+ if (ptype->type == type && !ptype->dev && ptype->gso_segment) {
+ segs = ptype->gso_segment(skb, sg);
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return segs;
+}
+
+EXPORT_SYMBOL(skb_gso_segment);
+
/* Take action when hardware reception checksum errors are detected. */
#ifdef CONFIG_BUG
void netdev_rx_csum_fault(struct net_device *dev)
@@ -1222,6 +1257,86 @@ static inline int illegal_highdma(struct
#define illegal_highdma(dev, skb) (0)
#endif
+struct dev_gso_cb {
+ void (*destructor)(struct sk_buff *skb);
+};
+
+#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
+
+static void dev_gso_skb_destructor(struct sk_buff *skb)
+{
+ struct dev_gso_cb *cb;
+
+ do {
+ struct sk_buff *nskb = skb->next;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ kfree_skb(nskb);
+ } while (skb->next);
+
+ cb = DEV_GSO_CB(skb);
+ if (cb->destructor)
+ cb->destructor(skb);
+}
+
+/**
+ * dev_gso_segment - Perform emulated hardware segmentation on skb.
+ * @skb: buffer to segment
+ *
+ * This function segments the given skb and stores the list of segments
+ * in skb->next.
+ */
+static int dev_gso_segment(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct sk_buff *segs;
+
+ segs = skb_gso_segment(skb, dev->features & NETIF_F_SG &&
+ !illegal_highdma(dev, skb));
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ skb->next = segs;
+ DEV_GSO_CB(skb)->destructor = skb->destructor;
+ skb->destructor = dev_gso_skb_destructor;
+
+ return 0;
+}
+
+int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ if (likely(!skb->next)) {
+ if (netdev_nit)
+ dev_queue_xmit_nit(skb, dev);
+
+ if (!netif_needs_gso(dev, skb))
+ return dev->hard_start_xmit(skb, dev);
+
+ if (unlikely(dev_gso_segment(skb)))
+ goto out_kfree_skb;
+ }
+
+ do {
+ struct sk_buff *nskb = skb->next;
+ int rc;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ rc = dev->hard_start_xmit(nskb, dev);
+ if (unlikely(rc)) {
+ skb->next = nskb;
+ return rc;
+ }
+ } while (skb->next);
+
+ skb->destructor = DEV_GSO_CB(skb)->destructor;
+
+out_kfree_skb:
+ kfree_skb(skb);
+ return 0;
+}
+
#define HARD_TX_LOCK(dev, cpu) { \
if ((dev->features & NETIF_F_LLTX) == 0) { \
netif_tx_lock(dev); \
@@ -1266,6 +1381,10 @@ int dev_queue_xmit(struct sk_buff *skb)
struct Qdisc *q;
int rc = -ENOMEM;
+ /* GSO will handle the following emulations directly. */
+ if (netif_needs_gso(dev, skb))
+ goto gso;
+
if (skb_shinfo(skb)->frag_list &&
!(dev->features & NETIF_F_FRAGLIST) &&
__skb_linearize(skb))
@@ -1290,6 +1409,7 @@ int dev_queue_xmit(struct sk_buff *skb)
if (skb_checksum_help(skb, 0))
goto out_kfree_skb;
+gso:
spin_lock_prefetch(&dev->queue_lock);
/* Disable soft irqs for various locks below. Also
@@ -1346,11 +1466,8 @@ int dev_queue_xmit(struct sk_buff *skb)
HARD_TX_LOCK(dev, cpu);
if (!netif_queue_stopped(dev)) {
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
-
rc = 0;
- if (!dev->hard_start_xmit(skb, dev)) {
+ if (!dev_hard_start_xmit(skb, dev)) {
HARD_TX_UNLOCK(dev);
goto out;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -96,8 +96,11 @@ static inline int qdisc_restart(struct n
struct sk_buff *skb;
/* Dequeue packet */
- if ((skb = q->dequeue(q)) != NULL) {
+ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) {
unsigned nolock = (dev->features & NETIF_F_LLTX);
+
+ dev->gso_skb = NULL;
+
/*
* When the driver has LLTX set it does its own locking
* in start_xmit. No need to add additional overhead by
@@ -134,10 +137,8 @@ static inline int qdisc_restart(struct n
if (!netif_queue_stopped(dev)) {
int ret;
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
- ret = dev->hard_start_xmit(skb, dev);
+ ret = dev_hard_start_xmit(skb, dev);
if (ret == NETDEV_TX_OK) {
if (!nolock) {
netif_tx_unlock(dev);
@@ -171,7 +172,10 @@ static inline int qdisc_restart(struct n
*/
requeue:
- q->ops->requeue(skb, q);
+ if (skb->next)
+ dev->gso_skb = skb;
+ else
+ q->ops->requeue(skb, q);
netif_schedule(dev);
return 1;
}
@@ -572,15 +576,19 @@ void dev_activate(struct net_device *dev
void dev_deactivate(struct net_device *dev)
{
struct Qdisc *qdisc;
+ struct sk_buff *skb;
spin_lock_bh(&dev->queue_lock);
qdisc = dev->qdisc;
dev->qdisc = &noop_qdisc;
+ skb = dev->gso_skb;
+ dev->gso_skb = NULL;
qdisc_reset(qdisc);
spin_unlock_bh(&dev->queue_lock);
+ kfree_skb(skb);
dev_watchdog_down(dev);
while (test_bit(__LINK_STATE_SCHED, &dev->state))
* [3/5] [NET]: Add software TSOv4
2006-06-20 9:09 [0/5] GSO: Generic Segmentation Offload Herbert Xu
2006-06-20 9:10 ` [1/5] [NET]: Merge TSO/UFO fields in sk_buff Herbert Xu
2006-06-20 9:28 ` [2/5] [NET]: Add generic segmentation offload Herbert Xu
@ 2006-06-20 9:29 ` Herbert Xu
2006-06-20 9:30 ` [4/5] [NET]: Added GSO toggle Herbert Xu
` (2 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 9:29 UTC
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 361 bytes --]
Hi:
[NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p3.patch --]
[-- Type: text/plain, Size: 7982 bytes --]
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1299,6 +1299,7 @@ extern void skb_split(struct sk_b
struct sk_buff *skb1, const u32 len);
extern void skb_release_data(struct sk_buff *skb);
+extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg);
static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
int len, void *buffer)
diff --git a/include/net/protocol.h b/include/net/protocol.h
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,6 +37,7 @@
struct net_protocol {
int (*handler)(struct sk_buff *skb);
void (*err_handler)(struct sk_buff *skb, u32 info);
+ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg);
int no_policy;
};
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1087,6 +1087,8 @@ extern struct request_sock_ops tcp_reque
extern int tcp_v4_destroy_sock(struct sock *sk);
+extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg);
+
#ifdef CONFIG_PROC_FS
extern int tcp4_proc_init(void);
extern void tcp4_proc_exit(void);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1826,6 +1826,132 @@ unsigned char *skb_pull_rcsum(struct sk_
EXPORT_SYMBOL_GPL(skb_pull_rcsum);
+/**
+ * skb_segment - Perform protocol segmentation on skb.
+ * @skb: buffer to segment
+ * @sg: whether scatter-gather can be used for generated segments
+ *
+ * This function performs segmentation on the given skb and returns
+ * a list of the resulting segments. On failure it returns an
+ * ERR_PTR-encoded error.
+ */
+struct sk_buff *skb_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = NULL;
+ struct sk_buff *tail = NULL;
+ unsigned int mss = skb_shinfo(skb)->gso_size;
+ unsigned int doffset = skb->data - skb->mac.raw;
+ unsigned int offset = doffset;
+ unsigned int headroom;
+ unsigned int len;
+ int nfrags = skb_shinfo(skb)->nr_frags;
+ int err = -ENOMEM;
+ int i = 0;
+ int pos;
+
+ __skb_push(skb, doffset);
+ headroom = skb_headroom(skb);
+ pos = skb_headlen(skb);
+
+ do {
+ struct sk_buff *nskb;
+ skb_frag_t *frag;
+ int hsize, nsize;
+ int k;
+ int size;
+
+ len = skb->len - offset;
+ if (len > mss)
+ len = mss;
+
+ hsize = skb_headlen(skb) - offset;
+ if (hsize < 0)
+ hsize = 0;
+ nsize = hsize + doffset;
+ if (nsize > len + doffset || !sg)
+ nsize = len + doffset;
+
+ nskb = alloc_skb(nsize + headroom, GFP_ATOMIC);
+ if (unlikely(!nskb))
+ goto err;
+
+ if (segs)
+ tail->next = nskb;
+ else
+ segs = nskb;
+ tail = nskb;
+
+ nskb->dev = skb->dev;
+ nskb->priority = skb->priority;
+ nskb->protocol = skb->protocol;
+ nskb->dst = dst_clone(skb->dst);
+ memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+ nskb->pkt_type = skb->pkt_type;
+ nskb->mac_len = skb->mac_len;
+
+ skb_reserve(nskb, headroom);
+ nskb->mac.raw = nskb->data;
+ nskb->nh.raw = nskb->data + skb->mac_len;
+ nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw);
+ memcpy(skb_put(nskb, doffset), skb->data, doffset);
+
+ if (!sg) {
+ nskb->csum = skb_copy_and_csum_bits(skb, offset,
+ skb_put(nskb, len),
+ len, 0);
+ continue;
+ }
+
+ frag = skb_shinfo(nskb)->frags;
+ k = 0;
+
+ nskb->ip_summed = CHECKSUM_HW;
+ nskb->csum = skb->csum;
+ memcpy(skb_put(nskb, hsize), skb->data + offset, hsize);
+
+ while (pos < offset + len) {
+ BUG_ON(i >= nfrags);
+
+ *frag = skb_shinfo(skb)->frags[i];
+ get_page(frag->page);
+ size = frag->size;
+
+ if (pos < offset) {
+ frag->page_offset += offset - pos;
+ frag->size -= offset - pos;
+ }
+
+ k++;
+
+ if (pos + size <= offset + len) {
+ i++;
+ pos += size;
+ } else {
+ frag->size -= pos + size - (offset + len);
+ break;
+ }
+
+ frag++;
+ }
+
+ skb_shinfo(nskb)->nr_frags = k;
+ nskb->data_len = len - hsize;
+ nskb->len += nskb->data_len;
+ nskb->truesize += nskb->data_len;
+ } while ((offset += len) < skb->len);
+
+ return segs;
+
+err:
+ while ((skb = segs)) {
+ segs = skb->next;
+ kfree_skb(skb);
+ }
+ return ERR_PTR(err);
+}
+
+EXPORT_SYMBOL_GPL(skb_segment);
+
void __init skb_init(void)
{
skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -68,6 +68,7 @@
*/
#include <linux/config.h>
+#include <linux/err.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/socket.h>
@@ -1096,6 +1097,54 @@ int inet_sk_rebuild_header(struct sock *
EXPORT_SYMBOL(inet_sk_rebuild_header);
+static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ struct iphdr *iph;
+ struct net_protocol *ops;
+ int proto;
+ int ihl;
+ int id;
+
+ if (!pskb_may_pull(skb, sizeof(*iph)))
+ goto out;
+
+ iph = skb->nh.iph;
+ ihl = iph->ihl * 4;
+ if (ihl < sizeof(*iph))
+ goto out;
+
+ if (!pskb_may_pull(skb, ihl))
+ goto out;
+
+ skb->h.raw = __skb_pull(skb, ihl);
+ iph = skb->nh.iph;
+ id = ntohs(iph->id);
+ proto = iph->protocol & (MAX_INET_PROTOS - 1);
+ segs = ERR_PTR(-EPROTONOSUPPORT);
+
+ rcu_read_lock();
+ ops = rcu_dereference(inet_protos[proto]);
+ if (ops && ops->gso_segment)
+ segs = ops->gso_segment(skb, sg);
+ rcu_read_unlock();
+
+ if (IS_ERR(segs))
+ goto out;
+
+ skb = segs;
+ do {
+ iph = skb->nh.iph;
+ iph->id = htons(id++);
+ iph->tot_len = htons(skb->len - skb->mac_len);
+ iph->check = 0;
+ iph->check = ip_fast_csum(skb->nh.raw, iph->ihl);
+ } while ((skb = skb->next));
+
+out:
+ return segs;
+}
+
#ifdef CONFIG_IP_MULTICAST
static struct net_protocol igmp_protocol = {
.handler = igmp_rcv,
@@ -1105,6 +1154,7 @@ static struct net_protocol igmp_protocol
static struct net_protocol tcp_protocol = {
.handler = tcp_v4_rcv,
.err_handler = tcp_v4_err,
+ .gso_segment = tcp_tso_segment,
.no_policy = 1,
};
@@ -1150,6 +1200,7 @@ static int ipv4_proc_init(void);
static struct packet_type ip_packet_type = {
.type = __constant_htons(ETH_P_IP),
.func = ip_rcv,
+ .gso_segment = inet_gso_segment,
};
static int __init inet_init(void)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -258,6 +258,7 @@
#include <linux/random.h>
#include <linux/bootmem.h>
#include <linux/cache.h>
+#include <linux/err.h>
#include <net/icmp.h>
#include <net/tcp.h>
@@ -2144,6 +2145,67 @@ int compat_tcp_getsockopt(struct sock *s
EXPORT_SYMBOL(compat_tcp_getsockopt);
#endif
+struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ struct tcphdr *th;
+ unsigned thlen;
+ unsigned int seq;
+ unsigned int delta;
+ unsigned int oldlen;
+ unsigned int len;
+
+ if (!pskb_may_pull(skb, sizeof(*th)))
+ goto out;
+
+ th = skb->h.th;
+ thlen = th->doff * 4;
+ if (thlen < sizeof(*th))
+ goto out;
+
+ if (!pskb_may_pull(skb, thlen))
+ goto out;
+
+ oldlen = ~htonl(skb->len);
+ __skb_pull(skb, thlen);
+
+ segs = skb_segment(skb, sg);
+ if (IS_ERR(segs))
+ goto out;
+
+ len = skb_shinfo(skb)->gso_size;
+ delta = csum_add(oldlen, htonl(thlen + len));
+
+ skb = segs;
+ th = skb->h.th;
+ seq = ntohl(th->seq);
+
+ do {
+ th->fin = th->psh = 0;
+
+ if (skb->ip_summed == CHECKSUM_NONE) {
+ th->check = csum_fold(csum_partial(
+ skb->h.raw, thlen, csum_add(skb->csum, delta)));
+ }
+
+ seq += len;
+ skb = skb->next;
+ th = skb->h.th;
+
+ th->seq = htonl(seq);
+ th->cwr = 0;
+ } while (skb->next);
+
+ if (skb->ip_summed == CHECKSUM_NONE) {
+ delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
+ th->check = csum_fold(csum_partial(
+ skb->h.raw, thlen, csum_add(skb->csum, delta)));
+ }
+
+out:
+ return segs;
+}
+
extern void __skb_cb_too_small_for_tcp(int, int);
extern struct tcp_congestion_ops tcp_reno;
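A note on how these hooks are invoked: patch 2 in this series adds a
generic entry point, skb_gso_segment(), which resolves skb->protocol to
a packet_type and calls its gso_segment hook; for IPv4 TCP that lands
in inet_gso_segment() and then tcp_tso_segment() above. A rough sketch
of that dispatcher (simplified from the patch-2 code in net/core/dev.c;
the 16-bucket ptype_base lookup is an assumption about the current
table layout):

/*
 * Sketch of the generic GSO dispatcher (simplified). It pulls the
 * link-layer header, hands the skb to the matching packet_type's
 * gso_segment hook, and restores the header offset afterwards.
 */
struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg)
{
	struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
	struct packet_type *ptype;
	int type = skb->protocol;

	BUG_ON(skb_shinfo(skb)->frag_list);

	skb->mac.raw = skb->data;
	skb->mac_len = skb->nh.raw - skb->data;
	__skb_pull(skb, skb->mac_len);

	rcu_read_lock();
	list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) {
		if (ptype->type == type && !ptype->dev && ptype->gso_segment) {
			segs = ptype->gso_segment(skb, sg);
			break;
		}
	}
	rcu_read_unlock();

	__skb_push(skb, skb->data - skb->mac.raw);
	return segs;
}

The skb_gso_segment and dev_gso_segment entries in the gso-on profile
further down this thread are this path at work.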
* [4/5] [NET]: Added GSO toggle
2006-06-20 9:09 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (2 preceding siblings ...)
2006-06-20 9:29 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
@ 2006-06-20 9:30 ` Herbert Xu
2006-06-20 9:30 ` [5/5] [IPSEC]: Handle GSO packets Herbert Xu
2006-06-20 9:32 ` [0/5] GSO: Generic Segmentation Offload Herbert Xu
5 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 9:30 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 443 bytes --]
Hi:
[NET]: Added GSO toggle
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now only TCPv4 is supported.
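Until an ethtool binary learns the new commands, the toggle can be
driven directly through the ioctl. A minimal userspace sketch (the
device name is just an example; the fallback define mirrors the value
this patch adds to ethtool.h):

/* toggle-gso.c: enable GSO on an interface via ETHTOOL_SGSO. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

#ifndef ETHTOOL_SGSO
#define ETHTOOL_SGSO 0x00000024	/* from this patch */
#endif

int main(void)
{
	struct ethtool_value eval = { .cmd = ETHTOOL_SGSO, .data = 1 };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&eval;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		perror("ETHTOOL_SGSO");

	close(fd);
	return 0;
}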
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p4.patch --]
[-- Type: text/plain, Size: 3869 bytes --]
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -408,6 +408,8 @@ struct ethtool_ops {
#define ETHTOOL_GPERMADDR 0x00000020 /* Get permanent hardware address */
#define ETHTOOL_GUFO 0x00000021 /* Get UFO enable (ethtool_value) */
#define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */
+#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */
+#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */
/* compatibility with older code */
#define SPARC_ETH_GSET ETHTOOL_GSET
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -309,6 +309,7 @@ struct net_device
#define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */
#define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */
#define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
+#define NETIF_F_GSO 2048 /* Enable software GSO. */
#define NETIF_F_LLTX 4096 /* LockLess TX */
/* Segmentation offload features */
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1031,9 +1031,13 @@ static inline void sk_setup_caps(struct
{
__sk_dst_set(sk, dst);
sk->sk_route_caps = dst->dev->features;
+ if (sk->sk_route_caps & NETIF_F_GSO)
+ sk->sk_route_caps |= NETIF_F_TSO;
if (sk->sk_route_caps & NETIF_F_TSO) {
if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
sk->sk_route_caps &= ~NETIF_F_TSO;
+ else
+ sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
}
}
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -376,15 +376,20 @@ void br_features_recompute(struct net_br
features = br->feature_mask & ~NETIF_F_ALL_CSUM;
list_for_each_entry(p, &br->port_list, list) {
- if (checksum & NETIF_F_NO_CSUM &&
- !(p->dev->features & NETIF_F_NO_CSUM))
+ unsigned long feature = p->dev->features;
+
+ if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM))
checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM;
- if (checksum & NETIF_F_HW_CSUM &&
- !(p->dev->features & NETIF_F_HW_CSUM))
+ if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM))
checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM;
- if (!(p->dev->features & NETIF_F_IP_CSUM))
+ if (!(feature & NETIF_F_IP_CSUM))
checksum = 0;
- features &= p->dev->features;
+
+ if (feature & NETIF_F_GSO)
+ feature |= NETIF_F_TSO;
+ feature |= NETIF_F_GSO;
+
+ features &= feature;
}
br->dev->features = features | checksum | NETIF_F_LLTX;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -614,6 +614,29 @@ static int ethtool_set_ufo(struct net_de
return dev->ethtool_ops->set_ufo(dev, edata.data);
}
+static int ethtool_get_gso(struct net_device *dev, char __user *useraddr)
+{
+ struct ethtool_value edata = { ETHTOOL_GGSO };
+
+ edata.data = dev->features & NETIF_F_GSO;
+ if (copy_to_user(useraddr, &edata, sizeof(edata)))
+ return -EFAULT;
+ return 0;
+}
+
+static int ethtool_set_gso(struct net_device *dev, char __user *useraddr)
+{
+ struct ethtool_value edata;
+
+ if (copy_from_user(&edata, useraddr, sizeof(edata)))
+ return -EFAULT;
+ if (edata.data)
+ dev->features |= NETIF_F_GSO;
+ else
+ dev->features &= ~NETIF_F_GSO;
+ return 0;
+}
+
static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
{
struct ethtool_test test;
@@ -905,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr)
case ETHTOOL_SUFO:
rc = ethtool_set_ufo(dev, useraddr);
break;
+ case ETHTOOL_GGSO:
+ rc = ethtool_get_gso(dev, useraddr);
+ break;
+ case ETHTOOL_SGSO:
+ rc = ethtool_set_gso(dev, useraddr);
+ break;
default:
rc = -EOPNOTSUPP;
}
* [5/5] [IPSEC]: Handle GSO packets
2006-06-20 9:09 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (3 preceding siblings ...)
2006-06-20 9:30 ` [4/5] [NET]: Added GSO toggle Herbert Xu
@ 2006-06-20 9:30 ` Herbert Xu
2006-06-20 9:32 ` [0/5] GSO: Generic Segmentation Offload Herbert Xu
5 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 9:30 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 767 bytes --]
Hi:
[IPSEC]: Handle GSO packets
This patch segments GSO packets received by the IPsec stack. This can
happen when a NIC driver injects GSO packets into the stack which are
then forwarded to another host.
The primary application of this is going to be Xen where its backend
driver may inject GSO packets into dom0.
Of course this can also be used by other virtualisation schemes such as
VMware or UML, since the tap device could be modified to inject GSO
packets received through splice.
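For driver authors wondering what such an injected packet must look
like: roughly, the backend builds an oversized skb and marks it the
same way TCP marks its super-packets. A sketch under that assumption
(build_large_skb() is a hypothetical placeholder for however the
backend maps guest buffers into an skb):

/* Sketch: inject a GSO super-packet from a virtual NIC backend. */
static int inject_gso_frame(struct net_device *dev, unsigned int mss)
{
	struct sk_buff *skb = build_large_skb(dev);	/* hypothetical */

	if (!skb)
		return -ENOMEM;

	skb->protocol = eth_type_trans(skb, dev);
	skb->ip_summed = CHECKSUM_HW;		/* checksum still pending */
	skb_shinfo(skb)->gso_size = mss;	/* non-zero marks it GSO */
	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;

	return netif_rx(skb);
}

If such a frame is then forwarded into an IPsec route, the
xfrm4_output_finish() change below picks it up via the gso_size test
and feeds it through skb_gso_segment() before encapsulation.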
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p5.patch --]
[-- Type: text/plain, Size: 3351 bytes --]
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -9,6 +9,8 @@
*/
#include <linux/compiler.h>
+#include <linux/if_ether.h>
+#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/netfilter_ipv4.h>
@@ -97,16 +99,10 @@ error_nolock:
goto out_exit;
}
-static int xfrm4_output_finish(struct sk_buff *skb)
+static int xfrm4_output_finish2(struct sk_buff *skb)
{
int err;
-#ifdef CONFIG_NETFILTER
- if (!skb->dst->xfrm) {
- IPCB(skb)->flags |= IPSKB_REROUTED;
- return dst_output(skb);
- }
-#endif
while (likely((err = xfrm4_output_one(skb)) == 0)) {
nf_reset(skb);
@@ -119,7 +115,7 @@ static int xfrm4_output_finish(struct sk
return dst_output(skb);
err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL,
- skb->dst->dev, xfrm4_output_finish);
+ skb->dst->dev, xfrm4_output_finish2);
if (unlikely(err != 1))
break;
}
@@ -127,6 +123,48 @@ static int xfrm4_output_finish(struct sk
return err;
}
+static int xfrm4_output_finish(struct sk_buff *skb)
+{
+ struct sk_buff *segs;
+
+#ifdef CONFIG_NETFILTER
+ if (!skb->dst->xfrm) {
+ IPCB(skb)->flags |= IPSKB_REROUTED;
+ return dst_output(skb);
+ }
+#endif
+
+ if (!skb_shinfo(skb)->gso_size)
+ return xfrm4_output_finish2(skb);
+
+ skb->protocol = htons(ETH_P_IP);
+ segs = skb_gso_segment(skb, 0);
+ kfree_skb(skb);
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ do {
+ struct sk_buff *nskb = segs->next;
+ int err;
+
+ segs->next = NULL;
+ err = xfrm4_output_finish2(segs);
+
+ if (unlikely(err)) {
+ while ((segs = nskb)) {
+ nskb = segs->next;
+ segs->next = NULL;
+ kfree_skb(segs);
+ }
+ return err;
+ }
+
+ segs = nskb;
+ } while (segs);
+
+ return 0;
+}
+
int xfrm4_output(struct sk_buff *skb)
{
return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -94,7 +94,7 @@ error_nolock:
goto out_exit;
}
-static int xfrm6_output_finish(struct sk_buff *skb)
+static int xfrm6_output_finish2(struct sk_buff *skb)
{
int err;
@@ -110,7 +110,7 @@ static int xfrm6_output_finish(struct sk
return dst_output(skb);
err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL,
- skb->dst->dev, xfrm6_output_finish);
+ skb->dst->dev, xfrm6_output_finish2);
if (unlikely(err != 1))
break;
}
@@ -118,6 +118,41 @@ static int xfrm6_output_finish(struct sk
return err;
}
+static int xfrm6_output_finish(struct sk_buff *skb)
+{
+ struct sk_buff *segs;
+
+ if (!skb_shinfo(skb)->gso_size)
+ return xfrm6_output_finish2(skb);
+
+ skb->protocol = htons(ETH_P_IPV6);
+ segs = skb_gso_segment(skb, 0);
+ kfree_skb(skb);
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ do {
+ struct sk_buff *nskb = segs->next;
+ int err;
+
+ segs->next = NULL;
+ err = xfrm6_output_finish2(segs);
+
+ if (unlikely(err)) {
+ while ((segs = nskb)) {
+ nskb = segs->next;
+ segs->next = NULL;
+ kfree_skb(segs);
+ }
+ return err;
+ }
+
+ segs = nskb;
+ } while (segs);
+
+ return 0;
+}
+
int xfrm6_output(struct sk_buff *skb)
{
return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev,
* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-20 9:09 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (4 preceding siblings ...)
2006-06-20 9:30 ` [5/5] [IPSEC]: Handle GSO packets Herbert Xu
@ 2006-06-20 9:32 ` Herbert Xu
2006-06-20 10:40 ` David Miller
2006-06-20 16:18 ` Rick Jones
5 siblings, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2006-06-20 9:32 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 647 bytes --]
On Tue, Jun 20, 2006 at 07:09:19PM +1000, herbert wrote:
>
> I've attached some numbers to demonstrate the savings brought on by
> doing this. The best scenario is obviously the case where the underlying
> NIC supports SG. This means that we simply have to manipulate the SG
> entries and place them into individual skb's before passing them to the
> driver. The attached file lo-res shows this.
Obviously I forgot to attach them :)
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: lo-res --]
[-- Type: text/plain, Size: 1632 bytes --]
$ sudo ./ethtool -K lo gso on
$ sudo ifconfig lo mtu 1500
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 3598.17
$ sudo ./ethtool -K lo gso off
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 3061.05
$ sudo ifconfig lo mtu 60000
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 8245.05
$ sudo ./ethtool -K lo gso on
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 8563.36
$ sudo ifconfig lo mtu 16436
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 7359.95
$ sudo ./ethtool -K lo gso off
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 7535.04
$
[-- Attachment #3: gso-off --]
[-- Type: text/plain, Size: 12446 bytes --]
CPU: PIII, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples % symbol name
1247 21.7551 csum_partial_copy_generic
294 5.1291 prep_new_page
240 4.1870 __alloc_skb
120 2.0935 tcp_sendmsg
113 1.9714 get_offset_pmtmr
113 1.9714 kfree
103 1.7969 skb_release_data
103 1.7969 timer_interrupt
101 1.7620 ip_queue_xmit
96 1.6748 skb_clone
94 1.6399 __kmalloc
94 1.6399 net_rx_action
86 1.5003 tcp_transmit_skb
80 1.3957 kmem_cache_free
76 1.3259 tcp_clean_rtx_queue
67 1.1689 ip_output
66 1.1514 mark_offset_pmtmr
65 1.1340 tcp_v4_rcv
64 1.1165 local_bh_enable
62 1.0816 kmem_cache_alloc
59 1.0293 irq_entries_start
59 1.0293 page_fault
57 0.9944 tcp_push_one
52 0.9072 kfree_skbmem
47 0.8200 __qdisc_run
47 0.8200 csum_partial
47 0.8200 netif_receive_skb
46 0.8025 __kfree_skb
46 0.8025 tcp_init_tso_segs
44 0.7676 __copy_to_user_ll
44 0.7676 dev_queue_xmit
39 0.6804 pfifo_fast_enqueue
39 0.6804 system_call
37 0.6455 __copy_from_user_ll
37 0.6455 ip_rcv
36 0.6281 __tcp_select_window
33 0.5757 sock_wfree
31 0.5408 __do_softirq
31 0.5408 tcp_v4_send_check
30 0.5234 eth_header
28 0.4885 tcp_rcv_established
27 0.4710 restore_nocheck
26 0.4536 pfifo_fast_dequeue
25 0.4361 __do_IRQ
25 0.4361 do_softirq
25 0.4361 tcp_build_and_update_options
25 0.4361 tcp_snd_test
23 0.4013 cache_alloc_refill
23 0.4013 handle_IRQ_event
23 0.4013 tcp_ack
22 0.3838 free_block
22 0.3838 ip_route_input
21 0.3664 __netif_rx_schedule
21 0.3664 schedule
20 0.3489 do_wp_page
20 0.3489 neigh_resolve_output
19 0.3315 do_IRQ
19 0.3315 do_page_fault
19 0.3315 do_select
19 0.3315 fget_light
19 0.3315 ip_local_deliver
18 0.3140 __tcp_push_pending_frames
18 0.3140 end_level_ioapic_irq
17 0.2966 cpu_idle
17 0.2966 delay_pmtmr
17 0.2966 tcp_select_window
16 0.2791 add_wait_queue
16 0.2791 rt_hash_code
16 0.2791 tcp_set_skb_tso_segs
15 0.2617 find_vma
15 0.2617 irq_exit
15 0.2617 update_send_head
14 0.2442 __switch_to
13 0.2268 __skb_checksum_complete
13 0.2268 common_interrupt
13 0.2268 dev_kfree_skb_any
13 0.2268 tcp_event_data_sent
13 0.2268 zap_pte_range
12 0.2094 __d_lookup
12 0.2094 __page_set_anon_rmap
12 0.2094 mod_timer
12 0.2094 ret_from_intr
12 0.2094 sock_poll
12 0.2094 tcp_current_mss
12 0.2094 tcp_write_xmit
11 0.1919 do_no_page
11 0.1919 error_code
11 0.1919 free_hot_cold_page
11 0.1919 i8042_interrupt
10 0.1745 __link_path_walk
10 0.1745 buffered_rmqueue
10 0.1745 sk_reset_timer
9 0.1570 __rmqueue
9 0.1570 dev_hard_start_xmit
9 0.1570 free_pages_bulk
9 0.1570 resume_kernel
9 0.1570 skb_checksum
9 0.1570 tcp_cong_avoid
9 0.1570 tcp_rtt_estimator
8 0.1396 do_anonymous_page
8 0.1396 eth_type_trans
8 0.1396 get_page_from_freelist
8 0.1396 tcp_ack_saw_tstamp
8 0.1396 tcp_v4_checksum_init
7 0.1221 __wake_up
7 0.1221 atomic_notifier_call_chain
7 0.1221 normal_poll
7 0.1221 sk_stream_write_space
7 0.1221 tcp_ack_packets_out
7 0.1221 tcp_check_space
7 0.1221 tcp_cwnd_validate
7 0.1221 tcp_reno_cong_avoid
6 0.1047 __pagevec_lru_add_active
6 0.1047 copy_from_user
6 0.1047 hrtimer_get_softirq_time
6 0.1047 lock_sock
6 0.1047 lookup_bh_lru
6 0.1047 net_tx_action
6 0.1047 remove_wait_queue
6 0.1047 tcp_new_space
6 0.1047 unmap_vmas
5 0.0872 __copy_user_intel
5 0.0872 __handle_mm_fault
5 0.0872 __page_cache_release
5 0.0872 core_sys_select
5 0.0872 del_timer
5 0.0872 dnotify_parent
5 0.0872 filemap_nopage
5 0.0872 find_get_page
5 0.0872 kfree_skb
5 0.0872 lru_cache_add_active
5 0.0872 max_select_fd
5 0.0872 mod_page_state_offset
5 0.0872 note_interrupt
5 0.0872 pipe_poll
5 0.0872 prepare_to_wait
5 0.0872 restore_all
5 0.0872 scheduler_tick
5 0.0872 slab_put_obj
5 0.0872 syscall_exit
5 0.0872 try_to_wake_up
5 0.0872 zone_watermark_ok
4 0.0698 __sk_dst_check
4 0.0698 copy_to_user
4 0.0698 do_poll
4 0.0698 do_pollfd
4 0.0698 fput
4 0.0698 inotify_dentry_parent_queue_event
4 0.0698 inotify_inode_queue_event
4 0.0698 memcpy
4 0.0698 sk_stream_wait_memory
4 0.0698 slab_get_obj
4 0.0698 sock_sendmsg
4 0.0698 strncpy_from_user
4 0.0698 strnlen_user
4 0.0698 tcp_should_expand_sndbuf
4 0.0698 tty_poll
3 0.0523 __alloc_pages
3 0.0523 __copy_user_zeroing_intel
3 0.0523 __d_path
3 0.0523 __find_get_block
3 0.0523 __follow_mount
3 0.0523 __netif_schedule
3 0.0523 __wake_up_bit
3 0.0523 __wake_up_common
3 0.0523 _atomic_dec_and_lock
3 0.0523 activate_task
3 0.0523 anon_vma_prepare
3 0.0523 bh_lru_install
3 0.0523 cond_resched
3 0.0523 do_lookup
3 0.0523 do_path_lookup
3 0.0523 do_readv_writev
3 0.0523 dup_fd
3 0.0523 effective_prio
3 0.0523 hrtimer_run_queues
3 0.0523 ing_filter
3 0.0523 link_path_walk
3 0.0523 notifier_call_chain
3 0.0523 preempt_schedule
3 0.0523 radix_tree_lookup
3 0.0523 release_pages
3 0.0523 run_timer_softirq
3 0.0523 run_workqueue
3 0.0523 sys_sendto
3 0.0523 sys_writev
3 0.0523 tty_ldisc_deref
3 0.0523 unmap_page_range
3 0.0523 vm_normal_page
2 0.0349 __brelse
2 0.0349 __find_get_block_slow
2 0.0349 __getblk
2 0.0349 __mod_page_state_offset
2 0.0349 __mod_timer
2 0.0349 acct_update_integrals
2 0.0349 adjtime_adjustment
2 0.0349 alloc_sock_iocb
2 0.0349 apic_timer_interrupt
2 0.0349 bit_waitqueue
2 0.0349 cache_flusharray
2 0.0349 cache_reap
2 0.0349 d_alloc
2 0.0349 dput
2 0.0349 fget
2 0.0349 finish_wait
2 0.0349 init_timer
2 0.0349 lock_timer_base
2 0.0349 opost_block
2 0.0349 page_remove_rmap
2 0.0349 permission
2 0.0349 poll_get_entry
2 0.0349 poll_initwait
2 0.0349 profile_munmap
2 0.0349 pty_chars_in_buffer
2 0.0349 put_page
2 0.0349 raise_softirq
2 0.0349 recalc_task_prio
2 0.0349 resume_userspace
2 0.0349 ret_from_exception
2 0.0349 rmqueue_bulk
2 0.0349 rw_verify_area
2 0.0349 sched_clock
2 0.0349 setup_frame
2 0.0349 skb_queue_head
2 0.0349 sock_aio_read
2 0.0349 sock_def_readable
2 0.0349 sys_ioctl
2 0.0349 sys_read
2 0.0349 task_curr
2 0.0349 task_timeslice
2 0.0349 tty_ldisc_try
2 0.0349 vfs_read
2 0.0349 vma_adjust
2 0.0349 vma_link
1 0.0174 __block_write_full_page
1 0.0174 __dentry_open
1 0.0174 __dequeue_signal
1 0.0174 __do_page_cache_readahead
1 0.0174 __fput
1 0.0174 __generic_file_aio_read
1 0.0174 __group_complete_signal
1 0.0174 __ip_route_output_key
1 0.0174 __lookup_mnt
1 0.0174 __mark_inode_dirty
1 0.0174 __pollwait
1 0.0174 __put_task_struct
1 0.0174 __put_user_4
1 0.0174 __queue_work
1 0.0174 __rcu_pending
1 0.0174 __sigqueue_alloc
1 0.0174 __vma_link_rb
1 0.0174 alloc_inode
1 0.0174 alloc_slabmgmt
1 0.0174 arch_unmap_area_topdown
1 0.0174 as_add_request
1 0.0174 as_fifo_expired
1 0.0174 as_find_next_arq
1 0.0174 autoremove_wake_function
1 0.0174 bio_init
1 0.0174 block_read_full_page
1 0.0174 cached_lookup
1 0.0174 can_vma_merge_before
1 0.0174 con_chars_in_buffer
1 0.0174 convert_fxsr_from_user
1 0.0174 copy_from_read_buf
1 0.0174 copy_pte_range
1 0.0174 cp_new_stat64
1 0.0174 d_splice_alias
1 0.0174 do_exit
1 0.0174 do_filp_open
1 0.0174 do_fork
1 0.0174 do_getname
1 0.0174 do_gettimeofday
1 0.0174 do_mpage_readpage
1 0.0174 do_sigaction
1 0.0174 do_sock_read
1 0.0174 do_sock_write
1 0.0174 do_sync_write
1 0.0174 do_timer
1 0.0174 drain_array
1 0.0174 dummy_inode_permission
1 0.0174 dup_mm
1 0.0174 dup_task_struct
1 0.0174 elv_queue_empty
1 0.0174 enqueue_hrtimer
1 0.0174 enqueue_task
1 0.0174 exit_mmap
1 0.0174 file_ra_state_init
1 0.0174 filesystems_read_proc
1 0.0174 find_vma_prev
1 0.0174 free_poll_entry
1 0.0174 generic_permission
1 0.0174 get_index
1 0.0174 get_signal_to_deliver
1 0.0174 get_vmalloc_info
1 0.0174 getname
1 0.0174 handle_signal
1 0.0174 hrtimer_try_to_cancel
1 0.0174 inet_csk_init_xmit_timers
1 0.0174 init_buffer_head
1 0.0174 inode_change_ok
1 0.0174 inode_init_once
1 0.0174 kbd_bh
1 0.0174 kmem_cache_zalloc
1 0.0174 kmem_getpages
1 0.0174 load_elf_binary
1 0.0174 locks_remove_posix
1 0.0174 memmove
1 0.0174 mempool_free
1 0.0174 mmput
1 0.0174 mutex_lock
1 0.0174 netlink_insert
1 0.0174 no_singlestep
1 0.0174 nr_blockdev_pages
1 0.0174 number
1 0.0174 open_namei
1 0.0174 page_add_new_anon_rmap
1 0.0174 path_release
1 0.0174 pipe_release
1 0.0174 poke_blanked_console
1 0.0174 proc_pid_readlink
1 0.0174 pty_unthrottle
1 0.0174 put_filp
1 0.0174 radix_tree_insert
1 0.0174 raise_softirq_irqoff
1 0.0174 rb_insert_color
1 0.0174 rcu_do_batch
1 0.0174 rcu_pending
1 0.0174 release_sock
1 0.0174 remove_vma
1 0.0174 restore_sigcontext
1 0.0174 search_binary_handler
1 0.0174 sk_wait_data
1 0.0174 skb_dequeue
1 0.0174 skb_queue_tail
1 0.0174 sock_aio_write
1 0.0174 sock_alloc_send_pskb
1 0.0174 sock_def_write_space
1 0.0174 sock_from_file
1 0.0174 sock_ioctl
1 0.0174 submit_bio
1 0.0174 sys_fcntl64
1 0.0174 sys_fstat64
1 0.0174 sys_rt_sigaction
1 0.0174 sys_rt_sigprocmask
1 0.0174 sys_send
1 0.0174 sys_sigreturn
1 0.0174 sys_socketcall
1 0.0174 sys_waitpid
1 0.0174 tcp_close
1 0.0174 tcp_data_queue
1 0.0174 tcp_fastretrans_alert
1 0.0174 tcp_grow_window
1 0.0174 tcp_mtu_probe
1 0.0174 tcp_v4_do_rcv
1 0.0174 tty_hung_up_p
1 0.0174 tty_insert_flip_string_flags
1 0.0174 tty_paranoia_check
1 0.0174 tty_wakeup
1 0.0174 tty_write
1 0.0174 unlock_buffer
1 0.0174 unmap_region
1 0.0174 update_process_times
1 0.0174 update_wall_time
1 0.0174 update_wall_time_one_tick
1 0.0174 vfs_ioctl
1 0.0174 vfs_permission
1 0.0174 vma_prio_tree_add
1 0.0174 wait_task_zombie
[-- Attachment #4: gso-on --]
[-- Type: text/plain, Size: 12774 bytes --]
CPU: PIII, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples % symbol name
1255 21.6865 csum_partial_copy_generic
398 6.8775 __copy_from_user_ll
343 5.9271 __alloc_skb
254 4.3891 prep_new_page
243 4.1991 skb_segment
110 1.9008 __kmalloc
106 1.8317 kfree
106 1.8317 timer_interrupt
105 1.8144 skb_copy_and_csum_bits
94 1.6243 net_rx_action
77 1.3306 kmem_cache_free
75 1.2960 tcp_v4_rcv
72 1.2442 kmem_cache_alloc
63 1.0886 page_fault
55 0.9504 mark_offset_pmtmr
54 0.9331 __kfree_skb
52 0.8986 skb_release_data
51 0.8813 csum_partial
50 0.8640 do_softirq
50 0.8640 inet_gso_segment
47 0.8122 get_offset_pmtmr
47 0.8122 irq_entries_start
45 0.7776 netif_receive_skb
43 0.7430 tcp_current_mss
41 0.7085 free_hot_cold_page
40 0.6912 tcp_clean_rtx_queue
36 0.6221 kfree_skbmem
35 0.6048 tcp_sendmsg
35 0.6048 tcp_write_xmit
34 0.5875 __do_softirq
31 0.5357 __do_IRQ
31 0.5357 ip_rcv
31 0.5357 tcp_rcv_established
30 0.5184 __pskb_trim_head
29 0.5011 system_call
28 0.4838 tcp_ack
28 0.4838 tcp_tso_segment
28 0.4838 tcp_tso_should_defer
27 0.4666 __copy_to_user_ll
26 0.4493 restore_nocheck
26 0.4493 rt_hash_code
25 0.4320 do_wp_page
24 0.4147 handle_IRQ_event
24 0.4147 schedule
23 0.3974 do_select
23 0.3974 tcp_tso_acked
22 0.3802 ip_local_deliver
22 0.3802 ip_route_input
21 0.3629 buffered_rmqueue
21 0.3629 free_block
20 0.3456 end_level_ioapic_irq
20 0.3456 tcp_init_tso_segs
19 0.3283 __netif_rx_schedule
19 0.3283 cache_alloc_refill
19 0.3283 dev_kfree_skb_any
18 0.3110 skb_split
17 0.2938 ret_from_intr
17 0.2938 tcp_mark_head_lost
16 0.2765 common_interrupt
16 0.2765 do_page_fault
16 0.2765 get_page_from_freelist
16 0.2765 slab_put_obj
15 0.2592 do_IRQ
15 0.2592 sock_poll
15 0.2592 zap_pte_range
14 0.2419 irq_exit
14 0.2419 tcp_trim_head
14 0.2419 tcp_v4_checksum_init
13 0.2246 __link_path_walk
13 0.2246 add_wait_queue
13 0.2246 delay_pmtmr
13 0.2246 tcp_rtt_estimator
12 0.2074 __skb_checksum_complete
12 0.2074 cpu_idle
12 0.2074 fget_light
12 0.2074 find_vma
12 0.2074 skb_checksum
12 0.2074 tcp_new_space
11 0.1901 copy_from_user
11 0.1901 put_page
11 0.1901 tcp_set_skb_tso_segs
10 0.1728 __d_lookup
10 0.1728 __switch_to
10 0.1728 error_code
10 0.1728 eth_type_trans
10 0.1728 i8042_interrupt
10 0.1728 skb_copy_bits
10 0.1728 tcp_transmit_skb
9 0.1555 dev_hard_start_xmit
9 0.1555 mod_page_state_offset
9 0.1555 strnlen_user
8 0.1382 __page_set_anon_rmap
8 0.1382 do_no_page
8 0.1382 ip_output
8 0.1382 resume_kernel
8 0.1382 skb_clone
8 0.1382 tcp_check_space
8 0.1382 tcp_xmit_retransmit_queue
7 0.1210 __mod_timer
7 0.1210 __tcp_push_pending_frames
7 0.1210 __tcp_select_window
7 0.1210 ip_queue_xmit
7 0.1210 mod_timer
7 0.1210 pipe_poll
7 0.1210 remove_wait_queue
7 0.1210 zone_watermark_ok
6 0.1037 __pagevec_lru_add_active
6 0.1037 core_sys_select
6 0.1037 do_pollfd
6 0.1037 find_get_page
6 0.1037 note_interrupt
6 0.1037 sk_stream_write_space
6 0.1037 skb_gso_segment
6 0.1037 sys_read
6 0.1037 tcp_ack_packets_out
6 0.1037 tcp_cong_avoid
6 0.1037 tcp_reno_cong_avoid
5 0.0864 __rmqueue
5 0.0864 __wake_up
5 0.0864 __wake_up_common
5 0.0864 dev_queue_xmit
5 0.0864 eth_header
5 0.0864 filemap_nopage
5 0.0864 fput
5 0.0864 free_pages_bulk
5 0.0864 internal_add_timer
5 0.0864 local_bh_enable
5 0.0864 lookup_bh_lru
5 0.0864 sys_socketcall
5 0.0864 syscall_exit
5 0.0864 tcp_mtu_probe
5 0.0864 tcp_v4_do_rcv
4 0.0691 __copy_user_intel
4 0.0691 __handle_mm_fault
4 0.0691 __mod_page_state_offset
4 0.0691 __page_cache_release
4 0.0691 __pollwait
4 0.0691 __qdisc_run
4 0.0691 adjtime_adjustment
4 0.0691 apic_timer_interrupt
4 0.0691 cond_resched
4 0.0691 hrtimer_run_queues
4 0.0691 kfree_skb
4 0.0691 lock_timer_base
4 0.0691 normal_poll
4 0.0691 opost_block
4 0.0691 pfifo_fast_enqueue
4 0.0691 preempt_schedule
4 0.0691 pskb_expand_head
4 0.0691 radix_tree_lookup
4 0.0691 resume_userspace
4 0.0691 sk_reset_timer
4 0.0691 skb_dequeue
4 0.0691 sys_send
4 0.0691 tcp_sacktag_write_queue
4 0.0691 tty_ldisc_try
4 0.0691 vfs_permission
3 0.0518 __alloc_pages
3 0.0518 __find_get_block
3 0.0518 __sk_dst_check
3 0.0518 anon_vma_prepare
3 0.0518 do_mmap_pgoff
3 0.0518 do_readv_writev
3 0.0518 do_sock_read
3 0.0518 dup_mm
3 0.0518 generic_permission
3 0.0518 hrtimer_get_softirq_time
3 0.0518 ing_filter
3 0.0518 lru_cache_add_active
3 0.0518 page_add_new_anon_rmap
3 0.0518 permission
3 0.0518 pfifo_fast_dequeue
3 0.0518 pty_chars_in_buffer
3 0.0518 raise_softirq
3 0.0518 rb_insert_color
3 0.0518 release_pages
3 0.0518 restore_all
3 0.0518 run_timer_softirq
3 0.0518 rw_verify_area
3 0.0518 slab_get_obj
3 0.0518 sock_wfree
3 0.0518 tcp_ack_saw_tstamp
3 0.0518 tcp_build_and_update_options
3 0.0518 tcp_event_data_sent
3 0.0518 tcp_should_expand_sndbuf
3 0.0518 tcp_v4_send_check
3 0.0518 tso_fragment
3 0.0518 unmap_vmas
3 0.0518 update_wall_time
3 0.0518 vsnprintf
2 0.0346 __rcu_pending
2 0.0346 _atomic_dec_and_lock
2 0.0346 account_system_time
2 0.0346 acct_update_integrals
2 0.0346 blk_recount_segments
2 0.0346 cache_flusharray
2 0.0346 cleanup_timers
2 0.0346 copy_pte_range
2 0.0346 cp_new_stat64
2 0.0346 current_fs_time
2 0.0346 d_instantiate
2 0.0346 default_wake_function
2 0.0346 dequeue_task
2 0.0346 dnotify_parent
2 0.0346 do_anonymous_page
2 0.0346 do_gettimeofday
2 0.0346 do_path_lookup
2 0.0346 do_setitimer
2 0.0346 do_sys_poll
2 0.0346 drain_array
2 0.0346 effective_prio
2 0.0346 find_next_zero_bit
2 0.0346 inode_init_once
2 0.0346 input_event
2 0.0346 max_select_fd
2 0.0346 memcpy
2 0.0346 memmove
2 0.0346 need_resched
2 0.0346 neigh_resolve_output
2 0.0346 no_singlestep
2 0.0346 notifier_call_chain
2 0.0346 page_remove_rmap
2 0.0346 poll_freewait
2 0.0346 prepare_to_wait
2 0.0346 recalc_task_prio
2 0.0346 rmqueue_bulk
2 0.0346 schedule_timeout
2 0.0346 scheduler_tick
2 0.0346 skb_queue_tail
2 0.0346 sock_aio_read
2 0.0346 sock_aio_write
2 0.0346 sock_from_file
2 0.0346 sock_sendmsg
2 0.0346 strncpy_from_user
2 0.0346 sys_gettimeofday
2 0.0346 sys_sendto
2 0.0346 tcp_cwnd_down
2 0.0346 tcp_data_queue
2 0.0346 tcp_fastretrans_alert
2 0.0346 tcp_parse_options
2 0.0346 tcp_push_one
2 0.0346 tcp_select_window
2 0.0346 tcp_snd_test
2 0.0346 transfer_objects
2 0.0346 try_to_wake_up
2 0.0346 tty_write
2 0.0346 unmap_page_range
2 0.0346 vfs_ioctl
2 0.0346 vma_adjust
1 0.0173 __bread
1 0.0173 __brelse
1 0.0173 __dentry_open
1 0.0173 __dequeue_signal
1 0.0173 __exit_signal
1 0.0173 __find_get_block_slow
1 0.0173 __group_complete_signal
1 0.0173 __insert_inode_hash
1 0.0173 __lookup_mnt
1 0.0173 __netif_schedule
1 0.0173 __pskb_pull_tail
1 0.0173 __pte_alloc
1 0.0173 __tasklet_schedule
1 0.0173 __wake_up_bit
1 0.0173 acct_process
1 0.0173 ack_edge_ioapic_irq
1 0.0173 acquire_console_sem
1 0.0173 activate_task
1 0.0173 alarm_setitimer
1 0.0173 alloc_new_pmd
1 0.0173 as_dispatch_request
1 0.0173 as_merged_request
1 0.0173 autoremove_wake_function
1 0.0173 bh_lru_install
1 0.0173 bit_waitqueue
1 0.0173 block_read_full_page
1 0.0173 cache_reap
1 0.0173 check_itimerval
1 0.0173 clear_user
1 0.0173 con_chars_in_buffer
1 0.0173 convert_fxsr_to_user
1 0.0173 copy_semundo
1 0.0173 copy_strings
1 0.0173 copy_to_user
1 0.0173 d_rehash
1 0.0173 deactivate_task
1 0.0173 dev_gso_segment
1 0.0173 do_fcntl
1 0.0173 do_generic_mapping_read
1 0.0173 do_lookup
1 0.0173 do_poll
1 0.0173 do_sigaction
1 0.0173 do_sync_read
1 0.0173 do_sys_open
1 0.0173 dummy_vm_enough_memory
1 0.0173 dup_fd
1 0.0173 enqueue_task
1 0.0173 exec_permission_lite
1 0.0173 file_ra_state_init
1 0.0173 file_update_time
1 0.0173 filp_close
1 0.0173 find_task_by_pid_type
1 0.0173 finish_wait
1 0.0173 flush_old_exec
1 0.0173 free_one_page
1 0.0173 free_page_and_swap_cache
1 0.0173 free_poll_entry
1 0.0173 free_uid
1 0.0173 get_empty_filp
1 0.0173 get_index
1 0.0173 get_signal_to_deliver
1 0.0173 get_task_mm
1 0.0173 get_vmalloc_info
1 0.0173 getname
1 0.0173 group_send_sig_info
1 0.0173 groups_search
1 0.0173 handle_signal
1 0.0173 hrtimer_try_to_cancel
1 0.0173 inode_setattr
1 0.0173 inode_sub_bytes
1 0.0173 inotify_dentry_parent_queue_event
1 0.0173 inotify_inode_queue_event
1 0.0173 kbd_keycode
1 0.0173 kmem_cache_zalloc
1 0.0173 kthread_should_stop
1 0.0173 locks_remove_flock
1 0.0173 lookup_create
1 0.0173 make_ahead_window
1 0.0173 mark_page_accessed
1 0.0173 math_state_restore
1 0.0173 may_expand_vm
1 0.0173 n_tty_receive_buf
1 0.0173 nameidata_to_filp
1 0.0173 opost
1 0.0173 page_waitqueue
1 0.0173 prio_tree_remove
1 0.0173 proc_file_read
1 0.0173 profile_munmap
1 0.0173 profile_tick
1 0.0173 put_io_context
1 0.0173 rb_next
1 0.0173 rcu_do_batch
1 0.0173 rcu_pending
1 0.0173 recalc_sigpending_tsk
1 0.0173 run_local_timers
1 0.0173 run_posix_cpu_timers
1 0.0173 save_i387
1 0.0173 sched_clock
1 0.0173 setup_frame
1 0.0173 signal_wake_up
1 0.0173 sk_stream_wait_memory
1 0.0173 skb_checksum_help
1 0.0173 slab_destroy
1 0.0173 smp_send_timer_broadcast_ipi
1 0.0173 sock_def_readable
1 0.0173 sock_ioctl
1 0.0173 sys_getpid
1 0.0173 sys_munmap
1 0.0173 syscall_call
1 0.0173 tcp_ack_update_window
1 0.0173 tcp_check_sack_reneging
1 0.0173 tcp_fast_parse_options
1 0.0173 tcp_fragment
1 0.0173 tcp_mtu_to_mss
1 0.0173 tcp_window_allows
1 0.0173 timespec_trunc
1 0.0173 tty_hung_up_p
1 0.0173 tty_ldisc_deref
1 0.0173 tty_poll
1 0.0173 unlink_file_vma
1 0.0173 vfs_getattr
1 0.0173 vfs_read
1 0.0173 vfs_write
1 0.0173 vm_normal_page
1 0.0173 vm_stat_account
1 0.0173 vma_prio_tree_add
1 0.0173 zone_statistics
* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-20 9:32 ` [0/5] GSO: Generic Segmentation Offload Herbert Xu
@ 2006-06-20 10:40 ` David Miller
2006-06-20 16:18 ` Rick Jones
1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2006-06-20 10:40 UTC (permalink / raw)
To: herbert; +Cc: netdev
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 20 Jun 2006 19:32:19 +1000
> On Tue, Jun 20, 2006 at 07:09:19PM +1000, herbert wrote:
> >
> > I've attached some numbers to demonstrate the savings brought on by
> > doing this. The best scenario is obviously the case where the underlying
> > NIC supports SG. This means that we simply have to manipulate the SG
> > entries and place them into individual skb's before passing them to the
> > driver. The attached file lo-res shows this.
>
> Obviously I forgot to attach them :)
:-)
The changes look good on first scan, I'll look more deeply and
meanwhile we'll let the patches ferment for a few days so others
can comment too :-)
* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-20 9:32 ` [0/5] GSO: Generic Segmentation Offload Herbert Xu
2006-06-20 10:40 ` David Miller
@ 2006-06-20 16:18 ` Rick Jones
1 sibling, 0 replies; 21+ messages in thread
From: Rick Jones @ 2006-06-20 16:18 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
> $ sudo ./ethtool -K lo gso on
> $ sudo ifconfig lo mtu 1500
> $ netperf -t TCP_STREAM
> TCP STREAM TEST to localhost
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 87380 16384 16384 10.00 3598.17
Would it really mess people up if netperf started doing CPU utilization
measurements by default on those platforms where it did not require
prior calibration? I think that might make it more likely that when
folks run tests, even over loopback (esp on MP), we'll get the service
demand figures that help show the change in stack efficiency.
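(For what it's worth, the figures can already be requested explicitly;
a run along the lines of

$ netperf -c -C -t TCP_STREAM

adds local/remote CPU utilization and service demand columns to the
banner quoted above, assuming a netperf built with CPU measurement
support for the platform.)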
rick jones
BTW, the style of the netperf test banner tells me you might want to
upgrade to a newer version of netperf :)