* [1/5] [NET]: Merge TSO/UFO fields in sk_buff
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
@ 2006-06-22 8:12 ` Herbert Xu
2006-06-22 8:13 ` [2/5] [NET]: Add generic segmentation offload Herbert Xu
` (6 subsequent siblings)
7 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:12 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 1331 bytes --]
Hi:
[NET]: Merge TSO/UFO fields in sk_buff
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p1.patch --]
[-- Type: text/plain, Size: 25944 bytes --]
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -792,7 +792,7 @@ static int cp_start_xmit (struct sk_buff
entry = cp->tx_head;
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
if (dev->features & NETIF_F_TSO)
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (skb_shinfo(skb)->nr_frags == 0) {
struct cp_desc *txd = &cp->tx_ring[entry];
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1640,7 +1640,7 @@ bnx2_tx_int(struct bnx2 *bp)
skb = tx_buf->skb;
#ifdef BCM_TSO
/* partial BD completions possible with TSO packets */
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
u16 last_idx, last_ring_idx;
last_idx = sw_cons +
@@ -4428,7 +4428,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
}
#ifdef BCM_TSO
- if ((mss = skb_shinfo(skb)->tso_size) &&
+ if ((mss = skb_shinfo(skb)->gso_size) &&
(skb->len > (bp->dev->mtu + ETH_HLEN))) {
u32 tcp_opt_len, ip_tcp_len;
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1418,7 +1418,7 @@ int t1_start_xmit(struct sk_buff *skb, s
struct cpl_tx_pkt *cpl;
#ifdef NETIF_F_TSO
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
int eth_type;
struct cpl_tx_pkt_lso *hdr;
@@ -1433,7 +1433,7 @@ int t1_start_xmit(struct sk_buff *skb, s
hdr->ip_hdr_words = skb->nh.iph->ihl;
hdr->tcp_hdr_words = skb->h.th->doff;
hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type,
- skb_shinfo(skb)->tso_size));
+ skb_shinfo(skb)->gso_size));
hdr->len = htonl(skb->len - sizeof(*hdr));
cpl = (struct cpl_tx_pkt *)hdr;
sge->stats.tx_lso_pkts++;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2394,7 +2394,7 @@ e1000_tso(struct e1000_adapter *adapter,
uint8_t ipcss, ipcso, tucss, tucso, hdr_len;
int err;
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
if (skb_header_cloned(skb)) {
err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
if (err)
@@ -2402,7 +2402,7 @@ e1000_tso(struct e1000_adapter *adapter,
}
hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (skb->protocol == htons(ETH_P_IP)) {
skb->nh.iph->tot_len = 0;
skb->nh.iph->check = 0;
@@ -2519,7 +2519,7 @@ e1000_tx_map(struct e1000_adapter *adapt
* tso gets written back prematurely before the data is fully
* DMA'd to the controller */
if (!skb->data_len && tx_ring->last_tx_tso &&
- !skb_shinfo(skb)->tso_size) {
+ !skb_shinfo(skb)->gso_size) {
tx_ring->last_tx_tso = 0;
size -= 4;
}
@@ -2757,7 +2757,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
}
#ifdef NETIF_F_TSO
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
/* The controller does a simple calculation to
* make sure there is enough room in the FIFO before
* initiating the DMA for each buffer. The calc is:
@@ -2807,7 +2807,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
#ifdef NETIF_F_TSO
/* Controller Erratum workaround */
if (!skb->data_len && tx_ring->last_tx_tso &&
- !skb_shinfo(skb)->tso_size)
+ !skb_shinfo(skb)->gso_size)
count++;
#endif
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -1495,8 +1495,8 @@ static int nv_start_xmit(struct sk_buff
np->tx_skbuff[nr] = skb;
#ifdef NETIF_F_TSO
- if (skb_shinfo(skb)->tso_size)
- tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->tso_size << NV_TX2_TSO_SHIFT);
+ if (skb_shinfo(skb)->gso_size)
+ tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->gso_size << NV_TX2_TSO_SHIFT);
else
#endif
tx_flags_extra = (skb->ip_summed == CHECKSUM_HW ? (NV_TX2_CHECKSUM_L3|NV_TX2_CHECKSUM_L4) : 0);
diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -1173,7 +1173,7 @@ ixgb_tso(struct ixgb_adapter *adapter, s
uint16_t ipcse, tucse, mss;
int err;
- if(likely(skb_shinfo(skb)->tso_size)) {
+ if(likely(skb_shinfo(skb)->gso_size)) {
if (skb_header_cloned(skb)) {
err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
if (err)
@@ -1181,7 +1181,7 @@ ixgb_tso(struct ixgb_adapter *adapter, s
}
hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
skb->nh.iph->tot_len = 0;
skb->nh.iph->check = 0;
skb->h.th->check = ~csum_tcpudp_magic(skb->nh.iph->saddr,
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -74,7 +74,7 @@ static void emulate_large_send_offload(s
struct iphdr *iph = skb->nh.iph;
struct tcphdr *th = (struct tcphdr*)(skb->nh.raw + (iph->ihl * 4));
unsigned int doffset = (iph->ihl + th->doff) * 4;
- unsigned int mtu = skb_shinfo(skb)->tso_size + doffset;
+ unsigned int mtu = skb_shinfo(skb)->gso_size + doffset;
unsigned int offset = 0;
u32 seq = ntohl(th->seq);
u16 id = ntohs(iph->id);
@@ -139,7 +139,7 @@ static int loopback_xmit(struct sk_buff
#endif
#ifdef LOOPBACK_TSO
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
BUG_ON(skb->protocol != htons(ETH_P_IP));
BUG_ON(skb->nh.iph->protocol != IPPROTO_TCP);
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1879,7 +1879,7 @@ again:
#ifdef NETIF_F_TSO
if (skb->len > (dev->mtu + ETH_HLEN)) {
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (mss != 0)
max_segments = MYRI10GE_MAX_SEND_DESC_TSO;
}
@@ -2113,7 +2113,7 @@ abort_linearize:
}
idx = (idx + 1) & tx->mask;
} while (idx != last_idx);
- if (skb_shinfo(skb)->tso_size) {
+ if (skb_shinfo(skb)->gso_size) {
printk(KERN_ERR
"myri10ge: %s: TSO but wanted to linearize?!?!?\n",
mgp->dev->name);
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2172,7 +2172,7 @@ static int rtl8169_xmit_frags(struct rtl
static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
{
if (dev->features & NETIF_F_TSO) {
- u32 mss = skb_shinfo(skb)->tso_size;
+ u32 mss = skb_shinfo(skb)->gso_size;
if (mss)
return LargeSend | ((mss & MSSMask) << MSSShift);
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -3915,8 +3915,8 @@ static int s2io_xmit(struct sk_buff *skb
txdp->Control_1 = 0;
txdp->Control_2 = 0;
#ifdef NETIF_F_TSO
- mss = skb_shinfo(skb)->tso_size;
- if (mss) {
+ mss = skb_shinfo(skb)->gso_size;
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4) {
txdp->Control_1 |= TXD_TCP_LSO_EN;
txdp->Control_1 |= TXD_TCP_LSO_MSS(mss);
}
@@ -3936,10 +3936,10 @@ static int s2io_xmit(struct sk_buff *skb
}
frg_len = skb->len - skb->data_len;
- if (skb_shinfo(skb)->ufo_size) {
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) {
int ufo_size;
- ufo_size = skb_shinfo(skb)->ufo_size;
+ ufo_size = skb_shinfo(skb)->gso_size;
ufo_size &= ~7;
txdp->Control_1 |= TXD_UFO_EN;
txdp->Control_1 |= TXD_UFO_MSS(ufo_size);
@@ -3965,7 +3965,7 @@ static int s2io_xmit(struct sk_buff *skb
txdp->Host_Control = (unsigned long) skb;
txdp->Control_1 |= TXD_BUFFER0_SIZE(frg_len);
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
txdp->Control_1 |= TXD_UFO_EN;
frg_cnt = skb_shinfo(skb)->nr_frags;
@@ -3980,12 +3980,12 @@ static int s2io_xmit(struct sk_buff *skb
(sp->pdev, frag->page, frag->page_offset,
frag->size, PCI_DMA_TODEVICE);
txdp->Control_1 = TXD_BUFFER0_SIZE(frag->size);
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
txdp->Control_1 |= TXD_UFO_EN;
}
txdp->Control_1 |= TXD_GATHER_CODE_LAST;
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
frg_cnt++; /* as Txd0 was used for inband header */
tx_fifo = mac_control->tx_FIFO_start[queue];
@@ -3999,7 +3999,7 @@ static int s2io_xmit(struct sk_buff *skb
if (mss)
val64 |= TX_FIFO_SPECIAL_FUNC;
#endif
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4)
val64 |= TX_FIFO_SPECIAL_FUNC;
writeq(val64, &tx_fifo->List_Control);
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1160,7 +1160,7 @@ static unsigned tx_le_req(const struct s
count = sizeof(dma_addr_t) / sizeof(u32);
count += skb_shinfo(skb)->nr_frags * count;
- if (skb_shinfo(skb)->tso_size)
+ if (skb_shinfo(skb)->gso_size)
++count;
if (skb->ip_summed == CHECKSUM_HW)
@@ -1232,7 +1232,7 @@ static int sky2_xmit_frame(struct sk_buf
}
/* Check for TCP Segmentation Offload */
- mss = skb_shinfo(skb)->tso_size;
+ mss = skb_shinfo(skb)->gso_size;
if (mss != 0) {
/* just drop the packet if non-linear expansion fails */
if (skb_header_cloned(skb) &&
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -3780,7 +3780,7 @@ static int tg3_start_xmit(struct sk_buff
#if TG3_TSO_SUPPORT != 0
mss = 0;
if (skb->len > (tp->dev->mtu + ETH_HLEN) &&
- (mss = skb_shinfo(skb)->tso_size) != 0) {
+ (mss = skb_shinfo(skb)->gso_size) != 0) {
int tcp_opt_len, ip_tcp_len;
if (skb_header_cloned(skb) &&
@@ -3905,7 +3905,7 @@ static int tg3_start_xmit_dma_bug(struct
#if TG3_TSO_SUPPORT != 0
mss = 0;
if (skb->len > (tp->dev->mtu + ETH_HLEN) &&
- (mss = skb_shinfo(skb)->tso_size) != 0) {
+ (mss = skb_shinfo(skb)->gso_size) != 0) {
int tcp_opt_len, ip_tcp_len;
if (skb_header_cloned(skb) &&
diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c
--- a/drivers/net/typhoon.c
+++ b/drivers/net/typhoon.c
@@ -340,7 +340,7 @@ enum state_values {
#endif
#if defined(NETIF_F_TSO)
-#define skb_tso_size(x) (skb_shinfo(x)->tso_size)
+#define skb_tso_size(x) (skb_shinfo(x)->gso_size)
#define TSO_NUM_DESCRIPTORS 2
#define TSO_OFFLOAD_ON TYPHOON_OFFLOAD_TCP_SEGMENT
#else
diff --git a/drivers/s390/net/qeth_eddp.c b/drivers/s390/net/qeth_eddp.c
--- a/drivers/s390/net/qeth_eddp.c
+++ b/drivers/s390/net/qeth_eddp.c
@@ -420,7 +420,7 @@ __qeth_eddp_fill_context_tcp(struct qeth
}
tcph = eddp->skb->h.th;
while (eddp->skb_offset < eddp->skb->len) {
- data_len = min((int)skb_shinfo(eddp->skb)->tso_size,
+ data_len = min((int)skb_shinfo(eddp->skb)->gso_size,
(int)(eddp->skb->len - eddp->skb_offset));
/* prepare qdio hdr */
if (eddp->qh.hdr.l2.id == QETH_HEADER_TYPE_LAYER2){
@@ -515,20 +515,20 @@ qeth_eddp_calc_num_pages(struct qeth_edd
QETH_DBF_TEXT(trace, 5, "eddpcanp");
/* can we put multiple skbs in one page? */
- skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->tso_size + hdr_len);
+ skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->gso_size + hdr_len);
if (skbs_per_page > 1){
- ctx->num_pages = (skb_shinfo(skb)->tso_segs + 1) /
+ ctx->num_pages = (skb_shinfo(skb)->gso_segs + 1) /
skbs_per_page + 1;
ctx->elements_per_skb = 1;
} else {
/* no -> how many elements per skb? */
- ctx->elements_per_skb = (skb_shinfo(skb)->tso_size + hdr_len +
+ ctx->elements_per_skb = (skb_shinfo(skb)->gso_size + hdr_len +
PAGE_SIZE) >> PAGE_SHIFT;
ctx->num_pages = ctx->elements_per_skb *
- (skb_shinfo(skb)->tso_segs + 1);
+ (skb_shinfo(skb)->gso_segs + 1);
}
ctx->num_elements = ctx->elements_per_skb *
- (skb_shinfo(skb)->tso_segs + 1);
+ (skb_shinfo(skb)->gso_segs + 1);
}
static inline struct qeth_eddp_context *
diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c
--- a/drivers/s390/net/qeth_main.c
+++ b/drivers/s390/net/qeth_main.c
@@ -4417,7 +4417,7 @@ qeth_send_packet(struct qeth_card *card,
struct qeth_eddp_context *ctx = NULL;
int tx_bytes = skb->len;
unsigned short nr_frags = skb_shinfo(skb)->nr_frags;
- unsigned short tso_size = skb_shinfo(skb)->tso_size;
+ unsigned short tso_size = skb_shinfo(skb)->gso_size;
int rc;
QETH_DBF_TEXT(trace, 6, "sendpkt");
@@ -4453,7 +4453,7 @@ qeth_send_packet(struct qeth_card *card,
queue = card->qdio.out_qs
[qeth_get_priority_queue(card, skb, ipv, cast_type)];
- if (skb_shinfo(skb)->tso_size)
+ if (skb_shinfo(skb)->gso_size)
large_send = card->options.large_send;
/*are we able to do TSO ? If so ,prepare and send it from here */
diff --git a/drivers/s390/net/qeth_tso.h b/drivers/s390/net/qeth_tso.h
--- a/drivers/s390/net/qeth_tso.h
+++ b/drivers/s390/net/qeth_tso.h
@@ -51,7 +51,7 @@ qeth_tso_fill_header(struct qeth_card *c
hdr->ext.hdr_version = 1;
hdr->ext.hdr_len = 28;
/*insert non-fix values */
- hdr->ext.mss = skb_shinfo(skb)->tso_size;
+ hdr->ext.mss = skb_shinfo(skb)->gso_size;
hdr->ext.dg_hdr_len = (__u16)(iph->ihl*4 + tcph->doff*4);
hdr->ext.payload_len = (__u16)(skb->len - hdr->ext.dg_hdr_len -
sizeof(struct qeth_hdr_tso));
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -309,9 +309,12 @@ struct net_device
#define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */
#define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */
#define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
-#define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */
#define NETIF_F_LLTX 4096 /* LockLess TX */
-#define NETIF_F_UFO 8192 /* Can offload UDP Large Send*/
+
+ /* Segmentation offload features */
+#define NETIF_F_GSO_SHIFT 16
+#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
+#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
@@ -980,6 +983,13 @@ extern void dev_seq_stop(struct seq_file
extern void linkwatch_run_queue(void);
+static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb)
+{
+ int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT;
+ return skb_shinfo(skb)->gso_size &&
+ (dev->features & feature) != feature;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_DEV_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -135,9 +135,10 @@ struct skb_frag_struct {
struct skb_shared_info {
atomic_t dataref;
unsigned short nr_frags;
- unsigned short tso_size;
- unsigned short tso_segs;
- unsigned short ufo_size;
+ unsigned short gso_size;
+ /* Warning: this field is not always filled in (UFO)! */
+ unsigned short gso_segs;
+ unsigned short gso_type;
unsigned int ip6_frag_id;
struct sk_buff *frag_list;
skb_frag_t frags[MAX_SKB_FRAGS];
@@ -169,6 +170,11 @@ enum {
SKB_FCLONE_CLONE,
};
+enum {
+ SKB_GSO_TCPV4 = 1 << 0,
+ SKB_GSO_UDPV4 = 1 << 1,
+};
+
/**
* struct sk_buff - socket buffer
* @next: Next buffer in list
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -570,13 +570,13 @@ struct tcp_skb_cb {
*/
static inline int tcp_skb_pcount(const struct sk_buff *skb)
{
- return skb_shinfo(skb)->tso_segs;
+ return skb_shinfo(skb)->gso_segs;
}
/* This is valid iff tcp_skb_pcount() > 1. */
static inline int tcp_skb_mss(const struct sk_buff *skb)
{
- return skb_shinfo(skb)->tso_size;
+ return skb_shinfo(skb)->gso_size;
}
static inline void tcp_dec_pcount_approx(__u32 *count,
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -34,8 +34,8 @@ static inline unsigned packet_length(con
int br_dev_queue_push_xmit(struct sk_buff *skb)
{
- /* drop mtu oversized packets except tso */
- if (packet_length(skb) > skb->dev->mtu && !skb_shinfo(skb)->tso_size)
+ /* drop mtu oversized packets except gso */
+ if (packet_length(skb) > skb->dev->mtu && !skb_shinfo(skb)->gso_size)
kfree_skb(skb);
else {
#ifdef CONFIG_BRIDGE_NETFILTER
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -761,7 +761,7 @@ static int br_nf_dev_queue_xmit(struct s
{
if (skb->protocol == htons(ETH_P_IP) &&
skb->len > skb->dev->mtu &&
- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
+ !skb_shinfo(skb)->gso_size)
return ip_fragment(skb, br_dev_queue_push_xmit);
else
return br_dev_queue_push_xmit(skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -172,9 +172,9 @@ struct sk_buff *__alloc_skb(unsigned int
shinfo = skb_shinfo(skb);
atomic_set(&shinfo->dataref, 1);
shinfo->nr_frags = 0;
- shinfo->tso_size = 0;
- shinfo->tso_segs = 0;
- shinfo->ufo_size = 0;
+ shinfo->gso_size = 0;
+ shinfo->gso_segs = 0;
+ shinfo->gso_type = 0;
shinfo->ip6_frag_id = 0;
shinfo->frag_list = NULL;
@@ -238,8 +238,9 @@ struct sk_buff *alloc_skb_from_cache(kme
atomic_set(&(skb_shinfo(skb)->dataref), 1);
skb_shinfo(skb)->nr_frags = 0;
- skb_shinfo(skb)->tso_size = 0;
- skb_shinfo(skb)->tso_segs = 0;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_segs = 0;
+ skb_shinfo(skb)->gso_type = 0;
skb_shinfo(skb)->frag_list = NULL;
out:
return skb;
@@ -528,8 +529,9 @@ static void copy_skb_header(struct sk_bu
#endif
skb_copy_secmark(new, old);
atomic_set(&new->users, 1);
- skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size;
- skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs;
+ skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
+ skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
+ skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
}
/**
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -210,8 +210,7 @@ static inline int ip_finish_output(struc
return dst_output(skb);
}
#endif
- if (skb->len > dst_mtu(skb->dst) &&
- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
+ if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size)
return ip_fragment(skb, ip_finish_output2);
else
return ip_finish_output2(skb);
@@ -362,7 +361,7 @@ packet_routed:
}
ip_select_ident_more(iph, &rt->u.dst, sk,
- (skb_shinfo(skb)->tso_segs ?: 1) - 1);
+ (skb_shinfo(skb)->gso_segs ?: 1) - 1);
/* Add an IP checksum. */
ip_send_check(iph);
@@ -744,7 +743,8 @@ static inline int ip_ufo_append_data(str
(length - transhdrlen));
if (!err) {
/* specify the length of each IP datagram fragment*/
- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen);
+ skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4;
__skb_queue_tail(&sk->sk_write_queue, skb);
return 0;
@@ -1087,14 +1087,16 @@ ssize_t ip_append_page(struct sock *sk,
inet->cork.length += size;
if ((sk->sk_protocol == IPPROTO_UDP) &&
- (rt->u.dst.dev->features & NETIF_F_UFO))
- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen);
+ (rt->u.dst.dev->features & NETIF_F_UFO)) {
+ skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4;
+ }
while (size > 0) {
int i;
- if (skb_shinfo(skb)->ufo_size)
+ if (skb_shinfo(skb)->gso_size)
len = size;
else {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -571,7 +571,7 @@ new_segment:
skb->ip_summed = CHECKSUM_HW;
tp->write_seq += copy;
TCP_SKB_CB(skb)->end_seq += copy;
- skb_shinfo(skb)->tso_segs = 0;
+ skb_shinfo(skb)->gso_segs = 0;
if (!copied)
TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH;
@@ -818,7 +818,7 @@ new_segment:
tp->write_seq += copy;
TCP_SKB_CB(skb)->end_seq += copy;
- skb_shinfo(skb)->tso_segs = 0;
+ skb_shinfo(skb)->gso_segs = 0;
from += copy;
copied += copy;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1073,7 +1073,7 @@ tcp_sacktag_write_queue(struct sock *sk,
else
pkt_len = (end_seq -
TCP_SKB_CB(skb)->seq);
- if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->tso_size))
+ if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->gso_size))
break;
pcount = tcp_skb_pcount(skb);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -515,15 +515,17 @@ static void tcp_set_skb_tso_segs(struct
/* Avoid the costly divide in the normal
* non-TSO case.
*/
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
} else {
unsigned int factor;
factor = skb->len + (mss_now - 1);
factor /= mss_now;
- skb_shinfo(skb)->tso_segs = factor;
- skb_shinfo(skb)->tso_size = mss_now;
+ skb_shinfo(skb)->gso_segs = factor;
+ skb_shinfo(skb)->gso_size = mss_now;
+ skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
}
}
@@ -914,7 +916,7 @@ static int tcp_init_tso_segs(struct sock
if (!tso_segs ||
(tso_segs > 1 &&
- skb_shinfo(skb)->tso_size != mss_now)) {
+ tcp_skb_mss(skb) != mss_now)) {
tcp_set_skb_tso_segs(sk, skb, mss_now);
tso_segs = tcp_skb_pcount(skb);
}
@@ -1724,8 +1726,9 @@ int tcp_retransmit_skb(struct sock *sk,
tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) {
if (!pskb_trim(skb, 0)) {
TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
skb->ip_summed = CHECKSUM_NONE;
skb->csum = 0;
}
@@ -1930,8 +1933,9 @@ void tcp_send_fin(struct sock *sk)
skb->csum = 0;
TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_FIN);
TCP_SKB_CB(skb)->sacked = 0;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
/* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */
TCP_SKB_CB(skb)->seq = tp->write_seq;
@@ -1963,8 +1967,9 @@ void tcp_send_active_reset(struct sock *
skb->csum = 0;
TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_RST);
TCP_SKB_CB(skb)->sacked = 0;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
/* Send it off. */
TCP_SKB_CB(skb)->seq = tcp_acceptable_seq(sk, tp);
@@ -2047,8 +2052,9 @@ struct sk_buff * tcp_make_synack(struct
TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn;
TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1;
TCP_SKB_CB(skb)->sacked = 0;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
th->seq = htonl(TCP_SKB_CB(skb)->seq);
th->ack_seq = htonl(tcp_rsk(req)->rcv_isn + 1);
if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */
@@ -2152,8 +2158,9 @@ int tcp_connect(struct sock *sk)
TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN;
TCP_ECN_send_syn(sk, tp, buff);
TCP_SKB_CB(buff)->sacked = 0;
- skb_shinfo(buff)->tso_segs = 1;
- skb_shinfo(buff)->tso_size = 0;
+ skb_shinfo(buff)->gso_segs = 1;
+ skb_shinfo(buff)->gso_size = 0;
+ skb_shinfo(buff)->gso_type = 0;
buff->csum = 0;
TCP_SKB_CB(buff)->seq = tp->write_seq++;
TCP_SKB_CB(buff)->end_seq = tp->write_seq;
@@ -2257,8 +2264,9 @@ void tcp_send_ack(struct sock *sk)
buff->csum = 0;
TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK;
TCP_SKB_CB(buff)->sacked = 0;
- skb_shinfo(buff)->tso_segs = 1;
- skb_shinfo(buff)->tso_size = 0;
+ skb_shinfo(buff)->gso_segs = 1;
+ skb_shinfo(buff)->gso_size = 0;
+ skb_shinfo(buff)->gso_type = 0;
/* Send it off, this clears delayed acks for us. */
TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp);
@@ -2293,8 +2301,9 @@ static int tcp_xmit_probe_skb(struct soc
skb->csum = 0;
TCP_SKB_CB(skb)->flags = TCPCB_FLAG_ACK;
TCP_SKB_CB(skb)->sacked = urgent;
- skb_shinfo(skb)->tso_segs = 1;
- skb_shinfo(skb)->tso_size = 0;
+ skb_shinfo(skb)->gso_segs = 1;
+ skb_shinfo(skb)->gso_size = 0;
+ skb_shinfo(skb)->gso_type = 0;
/* Use a previous sequence. This should cause the other
* end to send an ack. Don't queue or clone SKB, just
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -148,7 +148,7 @@ static int ip6_output2(struct sk_buff *s
int ip6_output(struct sk_buff *skb)
{
- if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->ufo_size) ||
+ if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) ||
dst_allfrag(skb->dst))
return ip6_fragment(skb, ip6_output2);
else
@@ -833,8 +833,9 @@ static inline int ip6_ufo_append_data(st
struct frag_hdr fhdr;
/* specify the length of each IP datagram fragment*/
- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen) -
- sizeof(struct frag_hdr);
+ skb_shinfo(skb)->gso_size = mtu - fragheaderlen -
+ sizeof(struct frag_hdr);
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4;
ipv6_select_ident(skb, &fhdr);
skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
__skb_queue_tail(&sk->sk_write_queue, skb);
^ permalink raw reply [flat|nested] 23+ messages in thread* [2/5] [NET]: Add generic segmentation offload
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
2006-06-22 8:12 ` [1/5] [NET]: Merge TSO/UFO fields in sk_buff Herbert Xu
@ 2006-06-22 8:13 ` Herbert Xu
2006-06-22 8:14 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
` (5 subsequent siblings)
7 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:13 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 716 bytes --]
Hi:
[NET]: Add generic segmentation offload
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.
The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p2.patch --]
[-- Type: text/plain, Size: 7530 bytes --]
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -406,6 +406,9 @@ struct net_device
struct list_head qdisc_list;
unsigned long tx_queue_len; /* Max frames per queue allowed */
+ /* Partially transmitted GSO packet. */
+ struct sk_buff *gso_skb;
+
/* ingress path synchronizer */
spinlock_t ingress_lock;
struct Qdisc *qdisc_ingress;
@@ -540,6 +543,7 @@ struct packet_type {
struct net_device *,
struct packet_type *,
struct net_device *);
+ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg);
void *af_packet_priv;
struct list_head list;
};
@@ -690,7 +694,8 @@ extern int dev_change_name(struct net_d
extern int dev_set_mtu(struct net_device *, int);
extern int dev_set_mac_address(struct net_device *,
struct sockaddr *);
-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
+extern int dev_hard_start_xmit(struct sk_buff *skb,
+ struct net_device *dev);
extern void dev_init(void);
@@ -964,6 +969,7 @@ extern int netdev_max_backlog;
extern int weight_p;
extern int netdev_set_master(struct net_device *dev, struct net_device *master);
extern int skb_checksum_help(struct sk_buff *skb, int inward);
+extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg);
#ifdef CONFIG_BUG
extern void netdev_rx_csum_fault(struct net_device *dev);
#else
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -116,6 +116,7 @@
#include <asm/current.h>
#include <linux/audit.h>
#include <linux/dmaengine.h>
+#include <linux/err.h>
/*
* The list of packet types we will receive (as opposed to discard)
@@ -1048,7 +1049,7 @@ static inline void net_timestamp(struct
* taps currently in use.
*/
-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
+static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
{
struct packet_type *ptype;
@@ -1186,6 +1187,40 @@ out:
return ret;
}
+/**
+ * skb_gso_segment - Perform segmentation on skb.
+ * @skb: buffer to segment
+ * @sg: whether scatter-gather is supported on the target.
+ *
+ * This function segments the given skb and returns a list of segments.
+ */
+struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
+ struct packet_type *ptype;
+ int type = skb->protocol;
+
+ BUG_ON(skb_shinfo(skb)->frag_list);
+ BUG_ON(skb->ip_summed != CHECKSUM_HW);
+
+ skb->mac.raw = skb->data;
+ skb->mac_len = skb->nh.raw - skb->data;
+ __skb_pull(skb, skb->mac_len);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) {
+ if (ptype->type == type && !ptype->dev && ptype->gso_segment) {
+ segs = ptype->gso_segment(skb, sg);
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return segs;
+}
+
+EXPORT_SYMBOL(skb_gso_segment);
+
/* Take action when hardware reception checksum errors are detected. */
#ifdef CONFIG_BUG
void netdev_rx_csum_fault(struct net_device *dev)
@@ -1222,6 +1257,86 @@ static inline int illegal_highdma(struct
#define illegal_highdma(dev, skb) (0)
#endif
+struct dev_gso_cb {
+ void (*destructor)(struct sk_buff *skb);
+};
+
+#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
+
+static void dev_gso_skb_destructor(struct sk_buff *skb)
+{
+ struct dev_gso_cb *cb;
+
+ do {
+ struct sk_buff *nskb = skb->next;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ kfree_skb(nskb);
+ } while (skb->next);
+
+ cb = DEV_GSO_CB(skb);
+ if (cb->destructor)
+ cb->destructor(skb);
+}
+
+/**
+ * dev_gso_segment - Perform emulated hardware segmentation on skb.
+ * @skb: buffer to segment
+ *
+ * This function segments the given skb and stores the list of segments
+ * in skb->next.
+ */
+static int dev_gso_segment(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct sk_buff *segs;
+
+ segs = skb_gso_segment(skb, dev->features & NETIF_F_SG &&
+ !illegal_highdma(dev, skb));
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ skb->next = segs;
+ DEV_GSO_CB(skb)->destructor = skb->destructor;
+ skb->destructor = dev_gso_skb_destructor;
+
+ return 0;
+}
+
+int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ if (likely(!skb->next)) {
+ if (netdev_nit)
+ dev_queue_xmit_nit(skb, dev);
+
+ if (!netif_needs_gso(dev, skb))
+ return dev->hard_start_xmit(skb, dev);
+
+ if (unlikely(dev_gso_segment(skb)))
+ goto out_kfree_skb;
+ }
+
+ do {
+ struct sk_buff *nskb = skb->next;
+ int rc;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ rc = dev->hard_start_xmit(nskb, dev);
+ if (unlikely(rc)) {
+ skb->next = nskb;
+ return rc;
+ }
+ } while (skb->next);
+
+ skb->destructor = DEV_GSO_CB(skb)->destructor;
+
+out_kfree_skb:
+ kfree_skb(skb);
+ return 0;
+}
+
#define HARD_TX_LOCK(dev, cpu) { \
if ((dev->features & NETIF_F_LLTX) == 0) { \
netif_tx_lock(dev); \
@@ -1266,6 +1381,10 @@ int dev_queue_xmit(struct sk_buff *skb)
struct Qdisc *q;
int rc = -ENOMEM;
+ /* GSO will handle the following emulations directly. */
+ if (netif_needs_gso(dev, skb))
+ goto gso;
+
if (skb_shinfo(skb)->frag_list &&
!(dev->features & NETIF_F_FRAGLIST) &&
__skb_linearize(skb))
@@ -1290,6 +1409,7 @@ int dev_queue_xmit(struct sk_buff *skb)
if (skb_checksum_help(skb, 0))
goto out_kfree_skb;
+gso:
spin_lock_prefetch(&dev->queue_lock);
/* Disable soft irqs for various locks below. Also
@@ -1346,11 +1466,8 @@ int dev_queue_xmit(struct sk_buff *skb)
HARD_TX_LOCK(dev, cpu);
if (!netif_queue_stopped(dev)) {
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
-
rc = 0;
- if (!dev->hard_start_xmit(skb, dev)) {
+ if (!dev_hard_start_xmit(skb, dev)) {
HARD_TX_UNLOCK(dev);
goto out;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -96,8 +96,11 @@ static inline int qdisc_restart(struct n
struct sk_buff *skb;
/* Dequeue packet */
- if ((skb = q->dequeue(q)) != NULL) {
+ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) {
unsigned nolock = (dev->features & NETIF_F_LLTX);
+
+ dev->gso_skb = NULL;
+
/*
* When the driver has LLTX set it does its own locking
* in start_xmit. No need to add additional overhead by
@@ -134,10 +137,8 @@ static inline int qdisc_restart(struct n
if (!netif_queue_stopped(dev)) {
int ret;
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
- ret = dev->hard_start_xmit(skb, dev);
+ ret = dev_hard_start_xmit(skb, dev);
if (ret == NETDEV_TX_OK) {
if (!nolock) {
netif_tx_unlock(dev);
@@ -171,7 +172,10 @@ static inline int qdisc_restart(struct n
*/
requeue:
- q->ops->requeue(skb, q);
+ if (skb->next)
+ dev->gso_skb = skb;
+ else
+ q->ops->requeue(skb, q);
netif_schedule(dev);
return 1;
}
@@ -576,6 +580,7 @@ void dev_activate(struct net_device *dev
void dev_deactivate(struct net_device *dev)
{
struct Qdisc *qdisc;
+ struct sk_buff *skb;
spin_lock_bh(&dev->queue_lock);
qdisc = dev->qdisc;
@@ -593,6 +598,11 @@ void dev_deactivate(struct net_device *d
/* Wait for outstanding qdisc_run calls. */
while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
yield();
+
+ if (dev->gso_skb) {
+ kfree_skb(dev->gso_skb);
+ dev->gso_skb = NULL;
+ }
}
void dev_init_scheduler(struct net_device *dev)
^ permalink raw reply [flat|nested] 23+ messages in thread* [3/5] [NET]: Add software TSOv4
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
2006-06-22 8:12 ` [1/5] [NET]: Merge TSO/UFO fields in sk_buff Herbert Xu
2006-06-22 8:13 ` [2/5] [NET]: Add generic segmentation offload Herbert Xu
@ 2006-06-22 8:14 ` Herbert Xu
2006-06-22 8:23 ` Herbert Xu
` (2 more replies)
2006-06-22 8:14 ` [4/5] [NET]: Added GSO toggle Herbert Xu
` (4 subsequent siblings)
7 siblings, 3 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:14 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 361 bytes --]
Hi:
[NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p3.patch --]
[-- Type: text/plain, Size: 7982 bytes --]
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1299,6 +1299,7 @@ extern void skb_split(struct sk_b
struct sk_buff *skb1, const u32 len);
extern void skb_release_data(struct sk_buff *skb);
+extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg);
static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
int len, void *buffer)
diff --git a/include/net/protocol.h b/include/net/protocol.h
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,6 +37,7 @@
struct net_protocol {
int (*handler)(struct sk_buff *skb);
void (*err_handler)(struct sk_buff *skb, u32 info);
+ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg);
int no_policy;
};
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1087,6 +1087,8 @@ extern struct request_sock_ops tcp_reque
extern int tcp_v4_destroy_sock(struct sock *sk);
+extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg);
+
#ifdef CONFIG_PROC_FS
extern int tcp4_proc_init(void);
extern void tcp4_proc_exit(void);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1826,6 +1826,132 @@ unsigned char *skb_pull_rcsum(struct sk_
EXPORT_SYMBOL_GPL(skb_pull_rcsum);
+/**
+ * skb_segment - Perform protocol segmentation on skb.
+ * @skb: buffer to segment
+ * @sg: whether scatter-gather can be used for generated segments
+ *
+ * This function performs segmentation on the given skb. It returns
+ * the segment at the given position. It returns NULL if there are
+ * no more segments to generate, or when an error is encountered.
+ */
+struct sk_buff *skb_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = NULL;
+ struct sk_buff *tail = NULL;
+ unsigned int mss = skb_shinfo(skb)->gso_size;
+ unsigned int doffset = skb->data - skb->mac.raw;
+ unsigned int offset = doffset;
+ unsigned int headroom;
+ unsigned int len;
+ int nfrags = skb_shinfo(skb)->nr_frags;
+ int err = -ENOMEM;
+ int i = 0;
+ int pos;
+
+ __skb_push(skb, doffset);
+ headroom = skb_headroom(skb);
+ pos = skb_headlen(skb);
+
+ do {
+ struct sk_buff *nskb;
+ skb_frag_t *frag;
+ int hsize, nsize;
+ int k;
+ int size;
+
+ len = skb->len - offset;
+ if (len > mss)
+ len = mss;
+
+ hsize = skb_headlen(skb) - offset;
+ if (hsize < 0)
+ hsize = 0;
+ nsize = hsize + doffset;
+ if (nsize > len + doffset || !sg)
+ nsize = len + doffset;
+
+ nskb = alloc_skb(nsize + headroom, GFP_ATOMIC);
+ if (unlikely(!nskb))
+ goto err;
+
+ if (segs)
+ tail->next = nskb;
+ else
+ segs = nskb;
+ tail = nskb;
+
+ nskb->dev = skb->dev;
+ nskb->priority = skb->priority;
+ nskb->protocol = skb->protocol;
+ nskb->dst = dst_clone(skb->dst);
+ memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+ nskb->pkt_type = skb->pkt_type;
+ nskb->mac_len = skb->mac_len;
+
+ skb_reserve(nskb, headroom);
+ nskb->mac.raw = nskb->data;
+ nskb->nh.raw = nskb->data + skb->mac_len;
+ nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw);
+ memcpy(skb_put(nskb, doffset), skb->data, doffset);
+
+ if (!sg) {
+ nskb->csum = skb_copy_and_csum_bits(skb, offset,
+ skb_put(nskb, len),
+ len, 0);
+ continue;
+ }
+
+ frag = skb_shinfo(nskb)->frags;
+ k = 0;
+
+ nskb->ip_summed = CHECKSUM_HW;
+ nskb->csum = skb->csum;
+ memcpy(skb_put(nskb, hsize), skb->data + offset, hsize);
+
+ while (pos < offset + len) {
+ BUG_ON(i >= nfrags);
+
+ *frag = skb_shinfo(skb)->frags[i];
+ get_page(frag->page);
+ size = frag->size;
+
+ if (pos < offset) {
+ frag->page_offset += offset - pos;
+ frag->size -= offset - pos;
+ }
+
+ k++;
+
+ if (pos + size <= offset + len) {
+ i++;
+ pos += size;
+ } else {
+ frag->size -= pos + size - (offset + len);
+ break;
+ }
+
+ frag++;
+ }
+
+ skb_shinfo(nskb)->nr_frags = k;
+ nskb->data_len = len - hsize;
+ nskb->len += nskb->data_len;
+ nskb->truesize += nskb->data_len;
+ } while ((offset += len) < skb->len);
+
+ return segs;
+
+err:
+ while ((skb = segs)) {
+ segs = skb->next;
+ kfree(skb);
+ }
+ return ERR_PTR(err);
+}
+
+EXPORT_SYMBOL_GPL(skb_segment);
+
void __init skb_init(void)
{
skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -68,6 +68,7 @@
*/
#include <linux/config.h>
+#include <linux/err.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/socket.h>
@@ -1096,6 +1097,54 @@ int inet_sk_rebuild_header(struct sock *
EXPORT_SYMBOL(inet_sk_rebuild_header);
+static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ struct iphdr *iph;
+ struct net_protocol *ops;
+ int proto;
+ int ihl;
+ int id;
+
+ if (!pskb_may_pull(skb, sizeof(*iph)))
+ goto out;
+
+ iph = skb->nh.iph;
+ ihl = iph->ihl * 4;
+ if (ihl < sizeof(*iph))
+ goto out;
+
+ if (!pskb_may_pull(skb, ihl))
+ goto out;
+
+ skb->h.raw = __skb_pull(skb, ihl);
+ iph = skb->nh.iph;
+ id = ntohs(iph->id);
+ proto = iph->protocol & (MAX_INET_PROTOS - 1);
+ segs = ERR_PTR(-EPROTONOSUPPORT);
+
+ rcu_read_lock();
+ ops = rcu_dereference(inet_protos[proto]);
+ if (ops && ops->gso_segment)
+ segs = ops->gso_segment(skb, sg);
+ rcu_read_unlock();
+
+ if (IS_ERR(segs))
+ goto out;
+
+ skb = segs;
+ do {
+ iph = skb->nh.iph;
+ iph->id = htons(id++);
+ iph->tot_len = htons(skb->len - skb->mac_len);
+ iph->check = 0;
+ iph->check = ip_fast_csum(skb->nh.raw, iph->ihl);
+ } while ((skb = skb->next));
+
+out:
+ return segs;
+}
+
#ifdef CONFIG_IP_MULTICAST
static struct net_protocol igmp_protocol = {
.handler = igmp_rcv,
@@ -1105,6 +1154,7 @@ static struct net_protocol igmp_protocol
static struct net_protocol tcp_protocol = {
.handler = tcp_v4_rcv,
.err_handler = tcp_v4_err,
+ .gso_segment = tcp_tso_segment,
.no_policy = 1,
};
@@ -1150,6 +1200,7 @@ static int ipv4_proc_init(void);
static struct packet_type ip_packet_type = {
.type = __constant_htons(ETH_P_IP),
.func = ip_rcv,
+ .gso_segment = inet_gso_segment,
};
static int __init inet_init(void)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -258,6 +258,7 @@
#include <linux/random.h>
#include <linux/bootmem.h>
#include <linux/cache.h>
+#include <linux/err.h>
#include <net/icmp.h>
#include <net/tcp.h>
@@ -2144,6 +2145,67 @@ int compat_tcp_getsockopt(struct sock *s
EXPORT_SYMBOL(compat_tcp_getsockopt);
#endif
+struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ struct tcphdr *th;
+ unsigned thlen;
+ unsigned int seq;
+ unsigned int delta;
+ unsigned int oldlen;
+ unsigned int len;
+
+ if (!pskb_may_pull(skb, sizeof(*th)))
+ goto out;
+
+ th = skb->h.th;
+ thlen = th->doff * 4;
+ if (thlen < sizeof(*th))
+ goto out;
+
+ if (!pskb_may_pull(skb, thlen))
+ goto out;
+
+ oldlen = ~htonl(skb->len);
+ __skb_pull(skb, thlen);
+
+ segs = skb_segment(skb, sg);
+ if (IS_ERR(segs))
+ goto out;
+
+ len = skb_shinfo(skb)->gso_size;
+ delta = csum_add(oldlen, htonl(thlen + len));
+
+ skb = segs;
+ th = skb->h.th;
+ seq = ntohl(th->seq);
+
+ do {
+ th->fin = th->psh = 0;
+
+ if (skb->ip_summed == CHECKSUM_NONE) {
+ th->check = csum_fold(csum_partial(
+ skb->h.raw, thlen, csum_add(skb->csum, delta)));
+ }
+
+ seq += len;
+ skb = skb->next;
+ th = skb->h.th;
+
+ th->seq = htonl(seq);
+ th->cwr = 0;
+ } while (skb->next);
+
+ if (skb->ip_summed == CHECKSUM_NONE) {
+ delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
+ th->check = csum_fold(csum_partial(
+ skb->h.raw, thlen, csum_add(skb->csum, delta)));
+ }
+
+out:
+ return segs;
+}
+
extern void __skb_cb_too_small_for_tcp(int, int);
extern struct tcp_congestion_ops tcp_reno;
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [3/5] [NET]: Add software TSOv4
2006-06-22 8:14 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
@ 2006-06-22 8:23 ` Herbert Xu
2006-06-22 15:04 ` YOSHIFUJI Hideaki / 吉藤英明
2006-06-23 19:33 ` Michael Chan
2 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:23 UTC (permalink / raw)
To: David S. Miller, netdev
On Thu, Jun 22, 2006 at 06:14:00PM +1000, herbert wrote:
>
> [NET]: Add software TSOv4
Doh, forgot to remove an unused declaration. Here is an updated version.
[NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -406,6 +406,9 @@ struct net_device
struct list_head qdisc_list;
unsigned long tx_queue_len; /* Max frames per queue allowed */
+ /* Partially transmitted GSO packet. */
+ struct sk_buff *gso_skb;
+
/* ingress path synchronizer */
spinlock_t ingress_lock;
struct Qdisc *qdisc_ingress;
@@ -540,6 +543,7 @@ struct packet_type {
struct net_device *,
struct packet_type *,
struct net_device *);
+ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg);
void *af_packet_priv;
struct list_head list;
};
@@ -690,7 +694,8 @@ extern int dev_change_name(struct net_d
extern int dev_set_mtu(struct net_device *, int);
extern int dev_set_mac_address(struct net_device *,
struct sockaddr *);
-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
+extern int dev_hard_start_xmit(struct sk_buff *skb,
+ struct net_device *dev);
extern void dev_init(void);
@@ -964,6 +969,7 @@ extern int netdev_max_backlog;
extern int weight_p;
extern int netdev_set_master(struct net_device *dev, struct net_device *master);
extern int skb_checksum_help(struct sk_buff *skb, int inward);
+extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg);
#ifdef CONFIG_BUG
extern void netdev_rx_csum_fault(struct net_device *dev);
#else
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -116,6 +116,7 @@
#include <asm/current.h>
#include <linux/audit.h>
#include <linux/dmaengine.h>
+#include <linux/err.h>
/*
* The list of packet types we will receive (as opposed to discard)
@@ -1048,7 +1049,7 @@ static inline void net_timestamp(struct
* taps currently in use.
*/
-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
+static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
{
struct packet_type *ptype;
@@ -1186,6 +1187,40 @@ out:
return ret;
}
+/**
+ * skb_gso_segment - Perform segmentation on skb.
+ * @skb: buffer to segment
+ * @sg: whether scatter-gather is supported on the target.
+ *
+ * This function segments the given skb and returns a list of segments.
+ */
+struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg)
+{
+ struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
+ struct packet_type *ptype;
+ int type = skb->protocol;
+
+ BUG_ON(skb_shinfo(skb)->frag_list);
+ BUG_ON(skb->ip_summed != CHECKSUM_HW);
+
+ skb->mac.raw = skb->data;
+ skb->mac_len = skb->nh.raw - skb->data;
+ __skb_pull(skb, skb->mac_len);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) {
+ if (ptype->type == type && !ptype->dev && ptype->gso_segment) {
+ segs = ptype->gso_segment(skb, sg);
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return segs;
+}
+
+EXPORT_SYMBOL(skb_gso_segment);
+
/* Take action when hardware reception checksum errors are detected. */
#ifdef CONFIG_BUG
void netdev_rx_csum_fault(struct net_device *dev)
@@ -1222,6 +1257,86 @@ static inline int illegal_highdma(struct
#define illegal_highdma(dev, skb) (0)
#endif
+struct dev_gso_cb {
+ void (*destructor)(struct sk_buff *skb);
+};
+
+#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
+
+static void dev_gso_skb_destructor(struct sk_buff *skb)
+{
+ struct dev_gso_cb *cb;
+
+ do {
+ struct sk_buff *nskb = skb->next;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ kfree_skb(nskb);
+ } while (skb->next);
+
+ cb = DEV_GSO_CB(skb);
+ if (cb->destructor)
+ cb->destructor(skb);
+}
+
+/**
+ * dev_gso_segment - Perform emulated hardware segmentation on skb.
+ * @skb: buffer to segment
+ *
+ * This function segments the given skb and stores the list of segments
+ * in skb->next.
+ */
+static int dev_gso_segment(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct sk_buff *segs;
+
+ segs = skb_gso_segment(skb, dev->features & NETIF_F_SG &&
+ !illegal_highdma(dev, skb));
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ skb->next = segs;
+ DEV_GSO_CB(skb)->destructor = skb->destructor;
+ skb->destructor = dev_gso_skb_destructor;
+
+ return 0;
+}
+
+int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ if (likely(!skb->next)) {
+ if (netdev_nit)
+ dev_queue_xmit_nit(skb, dev);
+
+ if (!netif_needs_gso(dev, skb))
+ return dev->hard_start_xmit(skb, dev);
+
+ if (unlikely(dev_gso_segment(skb)))
+ goto out_kfree_skb;
+ }
+
+ do {
+ struct sk_buff *nskb = skb->next;
+ int rc;
+
+ skb->next = nskb->next;
+ nskb->next = NULL;
+ rc = dev->hard_start_xmit(nskb, dev);
+ if (unlikely(rc)) {
+ skb->next = nskb;
+ return rc;
+ }
+ } while (skb->next);
+
+ skb->destructor = DEV_GSO_CB(skb)->destructor;
+
+out_kfree_skb:
+ kfree_skb(skb);
+ return 0;
+}
+
#define HARD_TX_LOCK(dev, cpu) { \
if ((dev->features & NETIF_F_LLTX) == 0) { \
netif_tx_lock(dev); \
@@ -1266,6 +1381,10 @@ int dev_queue_xmit(struct sk_buff *skb)
struct Qdisc *q;
int rc = -ENOMEM;
+ /* GSO will handle the following emulations directly. */
+ if (netif_needs_gso(dev, skb))
+ goto gso;
+
if (skb_shinfo(skb)->frag_list &&
!(dev->features & NETIF_F_FRAGLIST) &&
__skb_linearize(skb))
@@ -1290,6 +1409,7 @@ int dev_queue_xmit(struct sk_buff *skb)
if (skb_checksum_help(skb, 0))
goto out_kfree_skb;
+gso:
spin_lock_prefetch(&dev->queue_lock);
/* Disable soft irqs for various locks below. Also
@@ -1346,11 +1466,8 @@ int dev_queue_xmit(struct sk_buff *skb)
HARD_TX_LOCK(dev, cpu);
if (!netif_queue_stopped(dev)) {
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
-
rc = 0;
- if (!dev->hard_start_xmit(skb, dev)) {
+ if (!dev_hard_start_xmit(skb, dev)) {
HARD_TX_UNLOCK(dev);
goto out;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -96,8 +96,11 @@ static inline int qdisc_restart(struct n
struct sk_buff *skb;
/* Dequeue packet */
- if ((skb = q->dequeue(q)) != NULL) {
+ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) {
unsigned nolock = (dev->features & NETIF_F_LLTX);
+
+ dev->gso_skb = NULL;
+
/*
* When the driver has LLTX set it does its own locking
* in start_xmit. No need to add additional overhead by
@@ -134,10 +137,8 @@ static inline int qdisc_restart(struct n
if (!netif_queue_stopped(dev)) {
int ret;
- if (netdev_nit)
- dev_queue_xmit_nit(skb, dev);
- ret = dev->hard_start_xmit(skb, dev);
+ ret = dev_hard_start_xmit(skb, dev);
if (ret == NETDEV_TX_OK) {
if (!nolock) {
netif_tx_unlock(dev);
@@ -171,7 +172,10 @@ static inline int qdisc_restart(struct n
*/
requeue:
- q->ops->requeue(skb, q);
+ if (skb->next)
+ dev->gso_skb = skb;
+ else
+ q->ops->requeue(skb, q);
netif_schedule(dev);
return 1;
}
@@ -593,6 +597,11 @@ void dev_deactivate(struct net_device *d
/* Wait for outstanding qdisc_run calls. */
while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
yield();
+
+ if (dev->gso_skb) {
+ kfree_skb(dev->gso_skb);
+ dev->gso_skb = NULL;
+ }
}
void dev_init_scheduler(struct net_device *dev)
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [3/5] [NET]: Add software TSOv4
2006-06-22 8:14 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
2006-06-22 8:23 ` Herbert Xu
@ 2006-06-22 15:04 ` YOSHIFUJI Hideaki / 吉藤英明
2006-06-22 21:32 ` David Miller
2006-06-23 19:33 ` Michael Chan
2 siblings, 1 reply; 23+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-06-22 15:04 UTC (permalink / raw)
To: herbert; +Cc: davem, netdev, yoshfuji
In article <20060622081400.GC22671@gondor.apana.org.au> (at Thu, 22 Jun 2006 18:14:00 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:
> [NET]: Add software TSOv4
>
> This patch adds the GSO implementation for IPv4 TCP.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
I'd appreciate if you code up IPv6 TCP as well. :-)
Regards,
--yoshfuji
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3/5] [NET]: Add software TSOv4
2006-06-22 15:04 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2006-06-22 21:32 ` David Miller
2006-06-24 0:28 ` Ravinandan Arakali
0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2006-06-22 21:32 UTC (permalink / raw)
To: yoshfuji; +Cc: herbert, netdev
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Fri, 23 Jun 2006 00:04:03 +0900 (JST)
> In article <20060622081400.GC22671@gondor.apana.org.au> (at Thu, 22 Jun 2006 18:14:00 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:
>
> > [NET]: Add software TSOv4
> >
> > This patch adds the GSO implementation for IPv4 TCP.
> >
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> I'd appreciate if you code up IPv6 TCP as well. :-)
To my understanding doing IPV6 TCP TSO is a non-trivial task, even in
software.
The header editing is a lot more complicated because things like
routing and other extension headers can sit between IPV6 and TCP
header.
It is probably why IPV6 TSO hardware does not exist yet :)
Do not take this to mean I think it should not be implemented, I think
it should.
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [3/5] [NET]: Add software TSOv4
2006-06-22 21:32 ` David Miller
@ 2006-06-24 0:28 ` Ravinandan Arakali
2006-06-24 1:32 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 1 reply; 23+ messages in thread
From: Ravinandan Arakali @ 2006-06-24 0:28 UTC (permalink / raw)
To: 'David Miller', yoshfuji; +Cc: herbert, netdev
Neterion's Xframe adapter supports TSO over IPv6.
Ravi
-----Original Message-----
From: netdev-owner@vger.kernel.org
[mailto:netdev-owner@vger.kernel.org]On Behalf Of David Miller
Sent: Thursday, June 22, 2006 2:32 PM
To: yoshfuji@linux-ipv6.org
Cc: herbert@gondor.apana.org.au; netdev@vger.kernel.org
Subject: Re: [3/5] [NET]: Add software TSOv4
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Fri, 23 Jun 2006 00:04:03 +0900 (JST)
> In article <20060622081400.GC22671@gondor.apana.org.au> (at Thu, 22 Jun
2006 18:14:00 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:
>
> > [NET]: Add software TSOv4
> >
> > This patch adds the GSO implementation for IPv4 TCP.
> >
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> I'd appreciate if you code up IPv6 TCP as well. :-)
To my understanding doing IPV6 TCP TSO is a non-trivial task, even in
software.
The header editing is a lot more complicated because things like
routing and other extension headers can sit between IPV6 and TCP
header.
It is probably why IPV6 TSO hardware does not exist yet :)
Do not take this to mean I think it should not be implemented, I think
it should.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3/5] [NET]: Add software TSOv4
2006-06-24 0:28 ` Ravinandan Arakali
@ 2006-06-24 1:32 ` YOSHIFUJI Hideaki / 吉藤英明
2006-06-26 18:33 ` Ravinandan Arakali
0 siblings, 1 reply; 23+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-06-24 1:32 UTC (permalink / raw)
To: ravinandan.arakali; +Cc: davem, herbert, netdev, yoshfuji
In article <004201c69725$14123700$4110100a@pc.s2io.com> (at Fri, 23 Jun 2006 17:28:12 -0700), "Ravinandan Arakali" <ravinandan.arakali@neterion.com> says:
> Neterion's Xframe adapter supports TSO over IPv6.
I remember you posted some patches.
Would you post revised version reflecting Stephen's comment, please?
--yoshfuji
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [3/5] [NET]: Add software TSOv4
2006-06-24 1:32 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2006-06-26 18:33 ` Ravinandan Arakali
0 siblings, 0 replies; 23+ messages in thread
From: Ravinandan Arakali @ 2006-06-26 18:33 UTC (permalink / raw)
To: 'YOSHIFUJI Hideaki / <g"!?p-?'; +Cc: davem, herbert, netdev
We are working on it.
Ravi
-----Original Message-----
From: YOSHIFUJI Hideaki / <g"!?p-? [mailto:yoshfuji@linux-ipv6.org]
Sent: Friday, June 23, 2006 6:33 PM
To: ravinandan.arakali@neterion.com
Cc: davem@davemloft.net; herbert@gondor.apana.org.au;
netdev@vger.kernel.org; yoshfuji@linux-ipv6.org
Subject: Re: [3/5] [NET]: Add software TSOv4
In article <004201c69725$14123700$4110100a@pc.s2io.com> (at Fri, 23 Jun 2006
17:28:12 -0700), "Ravinandan Arakali" <ravinandan.arakali@neterion.com>
says:
> Neterion's Xframe adapter supports TSO over IPv6.
I remember you posted some patches.
Would you post revised version reflecting Stephen's comment, please?
--yoshfuji
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3/5] [NET]: Add software TSOv4
2006-06-22 8:14 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
2006-06-22 8:23 ` Herbert Xu
2006-06-22 15:04 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2006-06-23 19:33 ` Michael Chan
2006-06-23 21:26 ` Michael Chan
2 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2006-06-23 19:33 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
On Thu, 2006-06-22 at 18:14 +1000, Herbert Xu wrote:
> [NET]: Add software TSOv4
>
> This patch adds the GSO implementation for IPv4 TCP.
Herbert, Looks like there were some problems in the CHECKSUM_HW case.
This patch should fix it. Please double-check my checksum math.
[NET]: Fix CHECKSUM_HW GSO problems.
Fix the following 2 problems in the GSO code path for CHECKSUM_HW
packets:
1. Adjust ipv4 TCP pseudo header checksum.
2. Initialize skb->tail.
Signed-off-by: Michael Chan <mchan@broadcom.com>
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8e5044b..3f19b3d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1954,6 +1954,7 @@ struct sk_buff *skb_segment(struct sk_bu
nskb->data_len = len - hsize;
nskb->len += nskb->data_len;
nskb->truesize += nskb->data_len;
+ nskb->tail += nskb->data_len;
} while ((offset += len) < skb->len);
return segs;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0e029c4..3399110 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2186,7 +2186,8 @@ struct sk_buff *tcp_tso_segment(struct s
if (skb->ip_summed == CHECKSUM_NONE) {
th->check = csum_fold(csum_partial(
skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ } else if (skb->ip_summed == CHECKSUM_HW)
+ th->check = ~csum_fold(csum_add(th->check, delta));
seq += len;
skb = skb->next;
@@ -2196,11 +2197,12 @@ struct sk_buff *tcp_tso_segment(struct s
th->cwr = 0;
} while (skb->next);
+ delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
if (skb->ip_summed == CHECKSUM_NONE) {
- delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
th->check = csum_fold(csum_partial(
skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ } else if (skb->ip_summed == CHECKSUM_HW)
+ th->check = ~csum_fold(csum_add(th->check, delta));
out:
return segs;
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [3/5] [NET]: Add software TSOv4
2006-06-23 19:33 ` Michael Chan
@ 2006-06-23 21:26 ` Michael Chan
2006-06-23 23:38 ` Herbert Xu
0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2006-06-23 21:26 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
On Fri, 2006-06-23 at 12:33 -0700, Michael Chan wrote:
> On Thu, 2006-06-22 at 18:14 +1000, Herbert Xu wrote:
> > [NET]: Add software TSOv4
> >
> > This patch adds the GSO implementation for IPv4 TCP.
>
> Herbert, Looks like there were some problems in the CHECKSUM_HW case.
> This patch should fix it. Please double-check my checksum math.
This patch is more correct. Please ignore the previous one.
[NET]: Fix CHECKSUM_HW GSO problems.
Fix checksum problems in the GSO code path for CHECKSUM_HW packets.
The ipv4 TCP pseudo header checksum has to be adjusted for GSO
segmented packets.
Signed-off-by: Michael Chan <mchan@broadcom.com>
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0e029c4..b9c37f1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2186,7 +2186,8 @@ struct sk_buff *tcp_tso_segment(struct s
if (skb->ip_summed == CHECKSUM_NONE) {
th->check = csum_fold(csum_partial(
skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ } else if (skb->ip_summed == CHECKSUM_HW)
+ th->check = ~csum_fold(csum_add(th->check, delta));
seq += len;
skb = skb->next;
@@ -2200,6 +2201,10 @@ struct sk_buff *tcp_tso_segment(struct s
delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
th->check = csum_fold(csum_partial(
skb->h.raw, thlen, csum_add(skb->csum, delta)));
+ } else if (skb->ip_summed == CHECKSUM_HW) {
+ delta = csum_add(oldlen, htonl(skb->len -
+ (skb->h.raw - skb->data)));
+ th->check = ~csum_fold(csum_add(th->check, delta));
}
out:
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [3/5] [NET]: Add software TSOv4
2006-06-23 21:26 ` Michael Chan
@ 2006-06-23 23:38 ` Herbert Xu
2006-06-23 23:53 ` Herbert Xu
0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2006-06-23 23:38 UTC (permalink / raw)
To: Michael Chan; +Cc: David S. Miller, netdev
On Fri, Jun 23, 2006 at 02:26:16PM -0700, Michael Chan wrote:
>
> This patch is more correct. Please ignore the previous one.
>
> [NET]: Fix CHECKSUM_HW GSO problems.
>
> Fix checksum problems in the GSO code path for CHECKSUM_HW packets.
>
> The ipv4 TCP pseudo header checksum has to be adjusted for GSO
> segmented packets.
>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Good catch. Obviously the only CHECKSUM_HW I tested was loop :)
Looking at this again it seems that we can optimise it further so
how about this?
[NET]: Fix CHECKSUM_HW GSO problems.
Fix checksum problems in the GSO code path for CHECKSUM_HW packets.
The ipv4 TCP pseudo header checksum has to be adjusted for GSO
segmented packets.
The adjustment is needed because the length field in the pseudo-header
changes. However, because we have the inequality oldlen > newlen, we
know that delta = (u16)~oldlen + newlen is still a 16-bit quantity.
This also means that htonl(delta) + th->check still fits in 32 bits.
Therefore we don't have to use csum_add on this operations.
This is based on a patch by Michael Chan <mchan@broadcom.com>.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0e029c4..10f1a8c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2166,7 +2166,7 @@ struct sk_buff *tcp_tso_segment(struct s
if (!pskb_may_pull(skb, thlen))
goto out;
- oldlen = ~htonl(skb->len);
+ oldlen = (u16)~skb->len;
__skb_pull(skb, thlen);
segs = skb_segment(skb, sg);
@@ -2174,7 +2174,7 @@ struct sk_buff *tcp_tso_segment(struct s
goto out;
len = skb_shinfo(skb)->gso_size;
- delta = csum_add(oldlen, htonl(thlen + len));
+ delta = htonl(oldlen + (thlen + len));
skb = segs;
th = skb->h.th;
@@ -2183,10 +2183,10 @@ struct sk_buff *tcp_tso_segment(struct s
do {
th->fin = th->psh = 0;
- if (skb->ip_summed == CHECKSUM_NONE) {
- th->check = csum_fold(csum_partial(
- skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ th->check = ~csum_fold(th->check + delta);
+ if (skb->ip_summed != CHECKSUM_HW)
+ th->check = csum_fold(csum_partial(skb->h.raw, thlen,
+ skb->csum));
seq += len;
skb = skb->next;
@@ -2196,11 +2196,11 @@ struct sk_buff *tcp_tso_segment(struct s
th->cwr = 0;
} while (skb->next);
- if (skb->ip_summed == CHECKSUM_NONE) {
- delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
- th->check = csum_fold(csum_partial(
- skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ delta = htonl(oldlen + (skb->tail - skb->h.raw));
+ th->check = ~csum_fold(th->check + delta);
+ if (skb->ip_summed != CHECKSUM_HW)
+ th->check = csum_fold(csum_partial(skb->h.raw, thlen,
+ skb->csum));
out:
return segs;
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [3/5] [NET]: Add software TSOv4
2006-06-23 23:38 ` Herbert Xu
@ 2006-06-23 23:53 ` Herbert Xu
2006-06-24 3:08 ` Michael Chan
0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2006-06-23 23:53 UTC (permalink / raw)
To: Michael Chan; +Cc: David S. Miller, netdev
On Sat, Jun 24, 2006 at 09:38:40AM +1000, herbert wrote:
>
> Good catch. Obviously the only CHECKSUM_HW I tested was loop :)
> Looking at this again it seems that we can optimise it further so
> how about this?
Nevermind, I obviously complete ignored your other fix to the length of
the last segment :) Here is a fixed version.
[NET]: Fix CHECKSUM_HW GSO problems.
Fix checksum problems in the GSO code path for CHECKSUM_HW packets.
The ipv4 TCP pseudo header checksum has to be adjusted for GSO
segmented packets.
The adjustment is needed because the length field in the pseudo-header
changes. However, because we have the inequality oldlen > newlen, we
know that delta = (u16)~oldlen + newlen is still a 16-bit quantity.
This also means that htonl(delta) + th->check still fits in 32 bits.
Therefore we don't have to use csum_add on this operations.
This is based on a patch by Michael Chan <mchan@broadcom.com>.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
cfecbf18c32a6dca8954538b5d5fb7186ed336d1
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0e029c4..c04176b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2166,7 +2166,7 @@ struct sk_buff *tcp_tso_segment(struct s
if (!pskb_may_pull(skb, thlen))
goto out;
- oldlen = ~htonl(skb->len);
+ oldlen = (u16)~skb->len;
__skb_pull(skb, thlen);
segs = skb_segment(skb, sg);
@@ -2174,7 +2174,7 @@ struct sk_buff *tcp_tso_segment(struct s
goto out;
len = skb_shinfo(skb)->gso_size;
- delta = csum_add(oldlen, htonl(thlen + len));
+ delta = htonl(oldlen + (thlen + len));
skb = segs;
th = skb->h.th;
@@ -2183,10 +2183,10 @@ struct sk_buff *tcp_tso_segment(struct s
do {
th->fin = th->psh = 0;
- if (skb->ip_summed == CHECKSUM_NONE) {
- th->check = csum_fold(csum_partial(
- skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ th->check = ~csum_fold(th->check + delta);
+ if (skb->ip_summed != CHECKSUM_HW)
+ th->check = csum_fold(csum_partial(skb->h.raw, thlen,
+ skb->csum));
seq += len;
skb = skb->next;
@@ -2196,11 +2196,11 @@ struct sk_buff *tcp_tso_segment(struct s
th->cwr = 0;
} while (skb->next);
- if (skb->ip_summed == CHECKSUM_NONE) {
- delta = csum_add(oldlen, htonl(skb->tail - skb->h.raw));
- th->check = csum_fold(csum_partial(
- skb->h.raw, thlen, csum_add(skb->csum, delta)));
- }
+ delta = htonl(oldlen + (skb->tail - skb->h.raw) + skb->data_len);
+ th->check = ~csum_fold(th->check + delta);
+ if (skb->ip_summed != CHECKSUM_HW)
+ th->check = csum_fold(csum_partial(skb->h.raw, thlen,
+ skb->csum));
out:
return segs;
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [3/5] [NET]: Add software TSOv4
2006-06-23 23:53 ` Herbert Xu
@ 2006-06-24 3:08 ` Michael Chan
2006-06-26 6:55 ` David Miller
0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2006-06-24 3:08 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
On Sat, 2006-06-24 at 09:53 +1000, Herbert Xu wrote:
> Nevermind, I obviously complete ignored your other fix to the length of
> the last segment :) Here is a fixed version.
>
> [NET]: Fix CHECKSUM_HW GSO problems.
>
> Fix checksum problems in the GSO code path for CHECKSUM_HW packets.
>
> The ipv4 TCP pseudo header checksum has to be adjusted for GSO
> segmented packets.
>
> The adjustment is needed because the length field in the pseudo-header
> changes. However, because we have the inequality oldlen > newlen, we
> know that delta = (u16)~oldlen + newlen is still a 16-bit quantity.
> This also means that htonl(delta) + th->check still fits in 32 bits.
> Therefore we don't have to use csum_add on this operations.
>
> This is based on a patch by Michael Chan <mchan@broadcom.com>.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
Yes, this should work. ACK.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3/5] [NET]: Add software TSOv4
2006-06-24 3:08 ` Michael Chan
@ 2006-06-26 6:55 ` David Miller
0 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2006-06-26 6:55 UTC (permalink / raw)
To: mchan; +Cc: herbert, netdev
From: "Michael Chan" <mchan@broadcom.com>
Date: Fri, 23 Jun 2006 20:08:41 -0700
> On Sat, 2006-06-24 at 09:53 +1000, Herbert Xu wrote:
>
> > Nevermind, I obviously complete ignored your other fix to the length of
> > the last segment :) Here is a fixed version.
> >
> > [NET]: Fix CHECKSUM_HW GSO problems.
> >
> > Fix checksum problems in the GSO code path for CHECKSUM_HW packets.
> >
> > The ipv4 TCP pseudo header checksum has to be adjusted for GSO
> > segmented packets.
> >
> > The adjustment is needed because the length field in the pseudo-header
> > changes. However, because we have the inequality oldlen > newlen, we
> > know that delta = (u16)~oldlen + newlen is still a 16-bit quantity.
> > This also means that htonl(delta) + th->check still fits in 32 bits.
> > Therefore we don't have to use csum_add on this operations.
> >
> > This is based on a patch by Michael Chan <mchan@broadcom.com>.
> >
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >
> Yes, this should work. ACK.
Applied, thanks a lot guys.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [4/5] [NET]: Added GSO toggle
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (2 preceding siblings ...)
2006-06-22 8:14 ` [3/5] [NET]: Add software TSOv4 Herbert Xu
@ 2006-06-22 8:14 ` Herbert Xu
2006-06-22 8:14 ` [5/5] [IPSEC]: Handle GSO packets Herbert Xu
` (3 subsequent siblings)
7 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:14 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 443 bytes --]
Hi:
[NET]: Added GSO toggle
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now it only supports in TCPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p4.patch --]
[-- Type: text/plain, Size: 3869 bytes --]
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -408,6 +408,8 @@ struct ethtool_ops {
#define ETHTOOL_GPERMADDR 0x00000020 /* Get permanent hardware address */
#define ETHTOOL_GUFO 0x00000021 /* Get UFO enable (ethtool_value) */
#define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */
+#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */
+#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */
/* compatibility with older code */
#define SPARC_ETH_GSET ETHTOOL_GSET
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -309,6 +309,7 @@ struct net_device
#define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */
#define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */
#define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
+#define NETIF_F_GSO 2048 /* Enable software GSO. */
#define NETIF_F_LLTX 4096 /* LockLess TX */
/* Segmentation offload features */
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1031,9 +1031,13 @@ static inline void sk_setup_caps(struct
{
__sk_dst_set(sk, dst);
sk->sk_route_caps = dst->dev->features;
+ if (sk->sk_route_caps & NETIF_F_GSO)
+ sk->sk_route_caps |= NETIF_F_TSO;
if (sk->sk_route_caps & NETIF_F_TSO) {
if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
sk->sk_route_caps &= ~NETIF_F_TSO;
+ else
+ sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
}
}
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -376,15 +376,20 @@ void br_features_recompute(struct net_br
features = br->feature_mask & ~NETIF_F_ALL_CSUM;
list_for_each_entry(p, &br->port_list, list) {
- if (checksum & NETIF_F_NO_CSUM &&
- !(p->dev->features & NETIF_F_NO_CSUM))
+ unsigned long feature = p->dev->features;
+
+ if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM))
checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM;
- if (checksum & NETIF_F_HW_CSUM &&
- !(p->dev->features & NETIF_F_HW_CSUM))
+ if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM))
checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM;
- if (!(p->dev->features & NETIF_F_IP_CSUM))
+ if (!(feature & NETIF_F_IP_CSUM))
checksum = 0;
- features &= p->dev->features;
+
+ if (feature & NETIF_F_GSO)
+ feature |= NETIF_F_TSO;
+ feature |= NETIF_F_GSO;
+
+ features &= feature;
}
br->dev->features = features | checksum | NETIF_F_LLTX;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -614,6 +614,29 @@ static int ethtool_set_ufo(struct net_de
return dev->ethtool_ops->set_ufo(dev, edata.data);
}
+static int ethtool_get_gso(struct net_device *dev, char __user *useraddr)
+{
+ struct ethtool_value edata = { ETHTOOL_GGSO };
+
+ edata.data = dev->features & NETIF_F_GSO;
+ if (copy_to_user(useraddr, &edata, sizeof(edata)))
+ return -EFAULT;
+ return 0;
+}
+
+static int ethtool_set_gso(struct net_device *dev, char __user *useraddr)
+{
+ struct ethtool_value edata;
+
+ if (copy_from_user(&edata, useraddr, sizeof(edata)))
+ return -EFAULT;
+ if (edata.data)
+ dev->features |= NETIF_F_GSO;
+ else
+ dev->features &= ~NETIF_F_GSO;
+ return 0;
+}
+
static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
{
struct ethtool_test test;
@@ -905,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr)
case ETHTOOL_SUFO:
rc = ethtool_set_ufo(dev, useraddr);
break;
+ case ETHTOOL_GGSO:
+ rc = ethtool_get_gso(dev, useraddr);
+ break;
+ case ETHTOOL_SGSO:
+ rc = ethtool_set_gso(dev, useraddr);
+ break;
default:
rc = -EOPNOTSUPP;
}
^ permalink raw reply [flat|nested] 23+ messages in thread* [5/5] [IPSEC]: Handle GSO packets
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (3 preceding siblings ...)
2006-06-22 8:14 ` [4/5] [NET]: Added GSO toggle Herbert Xu
@ 2006-06-22 8:14 ` Herbert Xu
2006-06-22 8:15 ` [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (2 subsequent siblings)
7 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:14 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 767 bytes --]
Hi:
[IPSEC]: Handle GSO packets
This patch segments GSO packets received by the IPsec stack. This can
happen when a NIC driver injects GSO packets into the stack which are
then forwarded to another host.
The primary application of this is going to be Xen where its backend
driver may inject GSO packets into dom0.
Of course this also can be used by other virtualisation schemes such as
VMWare or UML since the tap device could be modified to inject GSO packets
received through splice.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p5.patch --]
[-- Type: text/plain, Size: 3351 bytes --]
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -9,6 +9,8 @@
*/
#include <linux/compiler.h>
+#include <linux/if_ether.h>
+#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/netfilter_ipv4.h>
@@ -97,16 +99,10 @@ error_nolock:
goto out_exit;
}
-static int xfrm4_output_finish(struct sk_buff *skb)
+static int xfrm4_output_finish2(struct sk_buff *skb)
{
int err;
-#ifdef CONFIG_NETFILTER
- if (!skb->dst->xfrm) {
- IPCB(skb)->flags |= IPSKB_REROUTED;
- return dst_output(skb);
- }
-#endif
while (likely((err = xfrm4_output_one(skb)) == 0)) {
nf_reset(skb);
@@ -119,7 +115,7 @@ static int xfrm4_output_finish(struct sk
return dst_output(skb);
err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL,
- skb->dst->dev, xfrm4_output_finish);
+ skb->dst->dev, xfrm4_output_finish2);
if (unlikely(err != 1))
break;
}
@@ -127,6 +123,48 @@ static int xfrm4_output_finish(struct sk
return err;
}
+static int xfrm4_output_finish(struct sk_buff *skb)
+{
+ struct sk_buff *segs;
+
+#ifdef CONFIG_NETFILTER
+ if (!skb->dst->xfrm) {
+ IPCB(skb)->flags |= IPSKB_REROUTED;
+ return dst_output(skb);
+ }
+#endif
+
+ if (!skb_shinfo(skb)->gso_size)
+ return xfrm4_output_finish2(skb);
+
+ skb->protocol = htons(ETH_P_IP);
+ segs = skb_gso_segment(skb, 0);
+ kfree_skb(skb);
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ do {
+ struct sk_buff *nskb = segs->next;
+ int err;
+
+ segs->next = NULL;
+ err = xfrm4_output_finish2(segs);
+
+ if (unlikely(err)) {
+ while ((segs = nskb)) {
+ nskb = segs->next;
+ segs->next = NULL;
+ kfree_skb(segs);
+ }
+ return err;
+ }
+
+ segs = nskb;
+ } while (segs);
+
+ return 0;
+}
+
int xfrm4_output(struct sk_buff *skb)
{
return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -94,7 +94,7 @@ error_nolock:
goto out_exit;
}
-static int xfrm6_output_finish(struct sk_buff *skb)
+static int xfrm6_output_finish2(struct sk_buff *skb)
{
int err;
@@ -110,7 +110,7 @@ static int xfrm6_output_finish(struct sk
return dst_output(skb);
err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL,
- skb->dst->dev, xfrm6_output_finish);
+ skb->dst->dev, xfrm6_output_finish2);
if (unlikely(err != 1))
break;
}
@@ -118,6 +118,41 @@ static int xfrm6_output_finish(struct sk
return err;
}
+static int xfrm6_output_finish(struct sk_buff *skb)
+{
+ struct sk_buff *segs;
+
+ if (!skb_shinfo(skb)->gso_size)
+ return xfrm6_output_finish2(skb);
+
+ skb->protocol = htons(ETH_P_IP);
+ segs = skb_gso_segment(skb, 0);
+ kfree_skb(skb);
+ if (unlikely(IS_ERR(segs)))
+ return PTR_ERR(segs);
+
+ do {
+ struct sk_buff *nskb = segs->next;
+ int err;
+
+ segs->next = NULL;
+ err = xfrm6_output_finish2(segs);
+
+ if (unlikely(err)) {
+ while ((segs = nskb)) {
+ nskb = segs->next;
+ segs->next = NULL;
+ kfree_skb(segs);
+ }
+ return err;
+ }
+
+ segs = nskb;
+ } while (segs);
+
+ return 0;
+}
+
int xfrm6_output(struct sk_buff *skb)
{
return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev,
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (4 preceding siblings ...)
2006-06-22 8:14 ` [5/5] [IPSEC]: Handle GSO packets Herbert Xu
@ 2006-06-22 8:15 ` Herbert Xu
2006-06-22 10:08 ` David Miller
2006-06-22 14:28 ` YOSHIFUJI Hideaki / 吉藤英明
7 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-22 8:15 UTC (permalink / raw)
To: David S. Miller, netdev
Hi:
If anyone is interested here is the incremental patch against the previous
series.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/dev.c b/net/core/dev.c
index 9c68ab8..d293e0f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1415,7 +1415,7 @@ gso:
/* Disable soft irqs for various locks below. Also
* stops preemption for RCU.
*/
- local_bh_disable();
+ rcu_read_lock_bh();
/* Updates of qdisc are serialized by queue_lock.
* The struct Qdisc which is pointed to by qdisc is now a
@@ -1486,13 +1486,13 @@ #endif
}
rc = -ENETDOWN;
- local_bh_enable();
+ rcu_read_unlock_bh();
out_kfree_skb:
kfree_skb(skb);
return rc;
out:
- local_bh_enable();
+ rcu_read_unlock_bh();
return rc;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 472cb5a..4cdd6ca 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -185,9 +185,13 @@ requeue:
void __qdisc_run(struct net_device *dev)
{
+ if (unlikely(dev->qdisc == &noop_qdisc))
+ goto out;
+
while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
/* NOTHING */;
+out:
clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
}
@@ -581,20 +585,24 @@ void dev_deactivate(struct net_device *d
spin_lock_bh(&dev->queue_lock);
qdisc = dev->qdisc;
dev->qdisc = &noop_qdisc;
- skb = dev->gso_skb;
- dev->gso_skb = NULL;
qdisc_reset(qdisc);
spin_unlock_bh(&dev->queue_lock);
- kfree_skb(skb);
dev_watchdog_down(dev);
- while (test_bit(__LINK_STATE_SCHED, &dev->state))
+ /* Wait for outstanding dev_queue_xmit calls. */
+ synchronize_rcu();
+
+ /* Wait for outstanding qdisc_run calls. */
+ while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
yield();
- spin_unlock_wait(&dev->_xmit_lock);
+ if (dev->gso_skb) {
+ kfree_skb(dev->gso_skb);
+ dev->gso_skb = NULL;
+ }
}
void dev_init_scheduler(struct net_device *dev)
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (5 preceding siblings ...)
2006-06-22 8:15 ` [0/5] GSO: Generic Segmentation Offload Herbert Xu
@ 2006-06-22 10:08 ` David Miller
2006-06-22 14:28 ` YOSHIFUJI Hideaki / 吉藤英明
7 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2006-06-22 10:08 UTC (permalink / raw)
To: herbert; +Cc: netdev
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 22 Jun 2006 18:12:11 +1000
> This is a repost of the GSO patches. The main change is the fix to a bug
> in the way dev->gso_skb is freed. This series requires the dev_deactivate
> patch that I just posted.
Applied, thanks a lot Herbert.
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-22 8:12 [0/5] GSO: Generic Segmentation Offload Herbert Xu
` (6 preceding siblings ...)
2006-06-22 10:08 ` David Miller
@ 2006-06-22 14:28 ` YOSHIFUJI Hideaki / 吉藤英明
2006-06-24 5:36 ` Herbert Xu
7 siblings, 1 reply; 23+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-06-22 14:28 UTC (permalink / raw)
To: herbert; +Cc: davem, netdev, yoshfuji
Hello.
Yes, I genrally like this idea.
In article <20060622081211.GA22505@gondor.apana.org.au> (at Thu, 22 Jun 2006 18:12:11 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:
> GSO like TSO is only effective if the MTU is significantly less than the
> maximum value of 64K. So only the case where the MTU was set to 1500 is
> of interest. There we can see that the throughput improved by 17.5%
> (3061.05Mb/s => 3598.17Mb/s). The actual saving in transmission cost is
> in fact a lot more than that as the majority of the time here is spent on
> the RX side which still has to deal with 1500-byte packets.
Can you measure some with other sizes,
e.g. 4kByte, 8kByte, 9000Byte?
--yoshfuji
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [0/5] GSO: Generic Segmentation Offload
2006-06-22 14:28 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2006-06-24 5:36 ` Herbert Xu
0 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2006-06-24 5:36 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / ?$B5HF#1QL@; +Cc: davem, netdev
On Thu, Jun 22, 2006 at 11:28:01PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Can you measure some with other sizes,
> e.g. 4kByte, 8kByte, 9000Byte?
GSO like TSO is less effective when the MTU is larger. However, NICs
supporting larger MTUs also support SG. So the figures I included for
lo should apply. In those scenarios, GSO is basically on par with the
default segmentation.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 23+ messages in thread