Netdev List
* Re: [PATCH 1/9][BNX2]: Add function to fetch hardware tx index.
From: David Miller @ 2007-12-21  3:55 UTC
  To: mchan; +Cc: netdev
In-Reply-To: <1198182667.9163.27.camel@dell>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 20 Dec 2007 12:31:07 -0800

> [BNX2]: Add function to fetch hardware tx index.
> 
> This makes the code cleaner and easier to support different tx rings.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH] [XFRM] IPv6: Fix dst/routing check at transformation.
From: David Miller @ 2007-12-21  3:50 UTC
  To: nakam; +Cc: usagi-core, herbert, netdev
In-Reply-To: <200712211248.31508.nakam@linux-ipv6.org>

From: Masahide NAKAMURA <nakam@linux-ipv6.org>
Date: Fri, 21 Dec 2007 12:48:31 +0900

> My 5 XFRM patches sent to netdev should have had David in the To: field, but they do not.
> 
> It seems that the command does not work for me.
> git-send-email --to "David S. Miller <davem@davemloft.net>" --to herbert@gondor.apana.org.au --cc...
> 
> Please look at my patches, even though they are not addressed to you.

None of your patches will make it anywhere.

In the email headers my name shows up like this:

	David S. Miller

Email header syntax rules (RFC 2822) dictate that if special
characters like "." appear in the display name, it must be surrounded
by double quotes; otherwise it is a syntax error.

This is a bug in git-send-email that I thought was fixed
by now.  Perhaps it is fixed in git mainline and not any
of the stable releases yet.

Perhaps you can submit them by hand until you resolve the
git-send-email problem?

Thanks.
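
The quoting rule described above can be sketched in C. This is only an illustration, not how git-send-email implements it: `needs_quoting()` and `format_mailbox()` are hypothetical helper names, and the set of special characters is the "specials" list from RFC 2822.

```c
#include <stdio.h>
#include <string.h>

/*
 * RFC 2822 "specials": a display name containing any of these
 * characters has to be written as a quoted-string.
 */
int needs_quoting(const char *name)
{
	return strpbrk(name, "()<>[]:;@\\,.\"") != NULL;
}

/*
 * Hypothetical helper: write "Name <addr>" into buf, quoting the
 * display name when required.  Returns 0 on success, -1 if buf is
 * too small.
 */
int format_mailbox(char *buf, size_t len, const char *name, const char *addr)
{
	size_t pos = 0;
	int n;

	if (!needs_quoting(name)) {
		n = snprintf(buf, len, "%s <%s>", name, addr);
		return (n < 0 || (size_t)n >= len) ? -1 : 0;
	}

	if (len < 2)
		return -1;
	buf[pos++] = '"';
	for (; *name; name++) {
		/* Backslash-escape embedded quotes and backslashes. */
		if (*name == '"' || *name == '\\') {
			if (pos + 1 >= len)
				return -1;
			buf[pos++] = '\\';
		}
		if (pos + 1 >= len)
			return -1;
		buf[pos++] = *name;
	}
	/* Closing quote, address, and terminating NUL. */
	n = snprintf(buf + pos, len - pos, "\" <%s>", addr);
	return (n < 0 || (size_t)n >= len - pos) ? -1 : 0;
}
```

With this sketch, the name "David S. Miller" comes out wrapped in double quotes because of the ".", while a name without special characters is emitted unchanged.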


* Re: [PATCH] [XFRM] IPv6: Fix dst/routing check at transformation.
From: Herbert Xu @ 2007-12-21  3:50 UTC
  To: Masahide NAKAMURA; +Cc: netdev, usagi-core, David S. Miller
In-Reply-To: <11982084391595-git-send-email-nakam@linux-ipv6.org>

On Fri, Dec 21, 2007 at 12:40:39PM +0900, Masahide NAKAMURA wrote:
> IPv6-specific handling was wrongly removed from transformation in net-2.6.25.
> This patch restores it within the current design.
> 
> o Update "path" of xfrm_dst since IPv6 transformation should
>   care about routing changes. It is required by MIPv6 and
>   off-link destined IPsec.
> o Rename nfheader_len which is for non-fragment transformation used by
>   MIPv6 to rt6i_nfheader_len as IPv6 name space.
> 
> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>

Thanks for fixing this up.  They both look good to me.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 1/4] [UDP]: fix send buffer check
From: Hideo AOKI @ 2007-12-21  3:43 UTC
  To: David Miller
  Cc: herbert, netdev, tyasui, mhiramat, satoshi.oshima.fk, billfink,
	andi, johnpol, shemminger, yoshfuji, yumiko.sugita.yf, haoki
In-Reply-To: <20071220.033138.157714241.davem@davemloft.net>

Hello,

David Miller wrote:

>> diff -pruN net-2.6/net/ipv4/ip_output.c net-2.6-udp-take11a1-p1/net/ipv4/ip_output.c
>> --- net-2.6/net/ipv4/ip_output.c	2007-12-11 10:54:55.000000000 -0500
>> +++ net-2.6-udp-take11a1-p1/net/ipv4/ip_output.c	2007-12-17 14:42:31.000000000 -0500
>> @@ -1004,6 +1004,11 @@ alloc_new_skb:
>>  					frag = &skb_shinfo(skb)->frags[i];
>>  				}
>>  			} else if (i < MAX_SKB_FRAGS) {
>> +				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
>> +				    > 2 * sk->sk_sndbuf) {
>> +					err = -ENOBUFS;
>> +					goto error;
>> +				}
>>  				if (copy > PAGE_SIZE)
>>  					copy = PAGE_SIZE;
>>  				page = alloc_pages(sk->sk_allocation, 0);
> 
> If we are going to do this, we need to add the same check to
> skb_append_datato_frags() which is invoked via ip_ufo_append_data().
> 
> We also have to be very careful in this area.  One problem we had a
> long time ago was that we would socket account when fragmenting an
> outgoing frame.  This was bogus because even if the socket had enough
> space for one full sized frame, the packet send would fail because it
> could not fit the space for both the original frame and the
> fragmented copy of it.
> 
> This situation was cured by simply not enforcing accounting for the
> fragmented copy.  It is valid because after we fragment, we keep
> the fragmented copy but free the original.
> 
> This doesn't apply directly to this specific patch, but it is
> something to keep in mind when doing these changes.

Hello,

Thank you for sharing your experience.

Let me investigate this code and skb_append_datato_frags().

I'll include the check code in the next patch set if it is really needed.

Regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.
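
For readers following the thread, the send buffer check in the quoted hunk can be restated as a small userspace sketch. `struct sock_sketch`, `may_add_page_frag()` and the 4 KiB `SKETCH_PAGE_SIZE` are illustrative stand-ins (assumptions), not kernel definitions; only the comparison itself is taken from the patch.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Stand-in for the two struct sock fields the check reads. */
struct sock_sketch {
	int wmem_alloc;	/* bytes currently charged to the send side */
	int sndbuf;	/* configured send buffer size */
};

/*
 * Mirrors the condition in the quoted hunk: refuse to attach another
 * page fragment once the charged memory plus one page would exceed
 * twice the send buffer size.
 */
int may_add_page_frag(const struct sock_sketch *sk)
{
	if (sk->wmem_alloc + SKETCH_PAGE_SIZE > 2 * sk->sndbuf)
		return -ENOBUFS;
	return 0;
}
```

As the review points out, the same condition would also have to be applied in skb_append_datato_frags() for the UFO path; the sketch covers only the single comparison.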


* Re: [PATCH 0/10] sysfs network namespace support
From: Greg KH @ 2007-12-21  3:07 UTC
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Tejun Heo, Linux Containers, netdev,
	cornelia.huck, stern, kay.sievers, linux-kernel, Andrew Morton,
	Herbert Xu, David Miller
In-Reply-To: <m163zi9425.fsf@ebiederm.dsl.xmission.com>

On Sat, Dec 01, 2007 at 02:06:58AM -0700, Eric W. Biederman wrote:
> 
> Now that we have network namespace support merged it is time to
> revisit the sysfs support so we can remove the dependency on !SYSFS.

<snip>

Oops, I forgot to apply this to my tree.  Eric, you still want this
submitted, right?

thanks,

greg k-h


* [patch 4/6] netxen: stop second phy correctly
From: dhananjay @ 2007-12-21  2:37 UTC
  To: netdev; +Cc: jeff
In-Reply-To: <20071221023656.409657310@netxen.com>

[-- Attachment #1: stop_port.patch --]
[-- Type: text/plain, Size: 1700 bytes --]

This patch fixes a bug where the second port is not quiesced when the
interface is brought down, which could lead to an unwarranted interrupt
during rmmod/ifdown.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>

Index: netdev-2.6/drivers/net/netxen/netxen_nic_niu.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_niu.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_niu.c
@@ -742,12 +742,12 @@ int netxen_niu_disable_xg_port(struct ne
 	__u32 mac_cfg;
 	u32 port = physical_port[adapter->portnum];
 
-	if (port != 0)
+	if (port > NETXEN_NIU_MAX_XG_PORTS)
 		return -EINVAL;
+
 	mac_cfg = 0;
-	netxen_xg_soft_reset(mac_cfg);
-	if (netxen_nic_hw_write_wx(adapter, NETXEN_NIU_XGE_CONFIG_0,
-				   &mac_cfg, 4))
+	if (netxen_nic_hw_write_wx(adapter,
+		NETXEN_NIU_XGE_CONFIG_0 + (0x10000 * port), &mac_cfg, 4))
 		return -EIO;
 	return 0;
 }
Index: netdev-2.6/drivers/net/netxen/netxen_nic_main.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_main.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_main.c
@@ -725,11 +725,6 @@ static void __devexit netxen_nic_remove(
 
 	unregister_netdev(netdev);
 
-	if (adapter->stop_port)
-		adapter->stop_port(adapter);
-
-	netxen_nic_disable_int(adapter);
-
 	if (adapter->is_up == NETXEN_ADAPTER_UP_MAGIC) {
 		init_firmware_done++;
 		netxen_free_hw_resources(adapter);
@@ -912,6 +907,9 @@ static int netxen_nic_close(struct net_d
 	netif_stop_queue(netdev);
 	napi_disable(&adapter->napi);
 
+	if (adapter->stop_port)
+		adapter->stop_port(adapter);
+
 	netxen_nic_disable_int(adapter);
 
 	cmd_buff = adapter->cmd_buf_arr;

-- 
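
The per-port register addressing used in the patch above can be sketched as follows. The base address and port count are hypothetical values chosen for illustration; only the 0x10000 stride comes from the patch. The sketch treats the limit as a port count and therefore rejects `port >= SKETCH_MAX_XG_PORTS`; whether that matches the patch's `port > NETXEN_NIU_MAX_XG_PORTS` depends on how that constant is defined, which is not shown here.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_XGE_CONFIG_0	0x00670000u	/* hypothetical base address */
#define SKETCH_XG_PORT_STRIDE	0x10000u	/* stride used in the patch */
#define SKETCH_MAX_XG_PORTS	2u		/* assumption: two XG ports */

/*
 * Compute the XGE config register for a given physical port.
 * Returns 0 and stores the address in *reg, or -EINVAL if the port
 * index is out of range.
 */
int xg_config_reg(uint32_t port, uint32_t *reg)
{
	if (port >= SKETCH_MAX_XG_PORTS)
		return -EINVAL;
	*reg = SKETCH_XGE_CONFIG_0 + SKETCH_XG_PORT_STRIDE * port;
	return 0;
}
```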


* [patch 5/6] netxen: fix race in interrupt / napi
From: dhananjay @ 2007-12-21  2:37 UTC
  To: netdev; +Cc: jeff
In-Reply-To: <20071221023656.409657310@netxen.com>

[-- Attachment #1: poll.patch --]
[-- Type: text/plain, Size: 8651 bytes --]

This patch simplifies the netxen ISR and poll() routine. The interrupt
routine no longer unmasks interrupts based on racy has_work() checks;
enabling them is left to the poll function.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>

Index: netdev-2.6/drivers/net/netxen/netxen_nic_main.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_main.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_main.c
@@ -63,7 +63,6 @@ static int netxen_nic_xmit_frame(struct 
 static void netxen_tx_timeout(struct net_device *netdev);
 static void netxen_tx_timeout_task(struct work_struct *work);
 static void netxen_watchdog(unsigned long);
-static int netxen_handle_int(struct netxen_adapter *, struct net_device *);
 static int netxen_nic_poll(struct napi_struct *napi, int budget);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void netxen_nic_poll_controller(struct net_device *netdev);
@@ -1218,40 +1217,6 @@ static void netxen_tx_timeout_task(struc
 	netif_wake_queue(adapter->netdev);
 }
 
-static int
-netxen_handle_int(struct netxen_adapter *adapter, struct net_device *netdev)
-{
-	u32 ret = 0;
-
-	DPRINTK(INFO, "Entered handle ISR\n");
-	adapter->stats.ints++;
-
-	netxen_nic_disable_int(adapter);
-
-	if (netxen_nic_rx_has_work(adapter) || netxen_nic_tx_has_work(adapter)) {
-		if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
-			/*
-			 * Interrupts are already disabled.
-			 */
-			__netif_rx_schedule(netdev, &adapter->napi);
-		} else {
-			static unsigned int intcount = 0;
-			if ((++intcount & 0xfff) == 0xfff)
-				DPRINTK(KERN_ERR
-				       "%s: %s interrupt %d while in poll\n",
-				       netxen_nic_driver_name, netdev->name,
-				       intcount);
-		}
-		ret = 1;
-	}
-
-	if (ret == 0) {
-		netxen_nic_enable_int(adapter);
-	}
-
-	return ret;
-}
-
 /*
  * netxen_intr - Interrupt Handler
  * @irq: interrupt number
@@ -1278,8 +1243,12 @@ irqreturn_t netxen_intr(int irq, void *d
 		}
 	}
 
-	if (netif_running(netdev))
-		netxen_handle_int(adapter, netdev);
+	adapter->stats.ints++;
+
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
+		netxen_nic_disable_int(adapter);
+		__netif_rx_schedule(netdev, &adapter->napi);
+	}
 
 	return IRQ_HANDLED;
 }
@@ -1287,12 +1256,11 @@ irqreturn_t netxen_intr(int irq, void *d
 static int netxen_nic_poll(struct napi_struct *napi, int budget)
 {
 	struct netxen_adapter *adapter = container_of(napi, struct netxen_adapter, napi);
-	struct net_device *netdev = adapter->netdev;
-	int done = 1;
+	int tx_complete;
 	int ctx;
 	int work_done;
 
-	DPRINTK(INFO, "polling for %d descriptors\n", *budget);
+	tx_complete = netxen_process_cmd_ring(adapter);
 
 	work_done = 0;
 	for (ctx = 0; ctx < MAX_RCV_CTX; ++ctx) {
@@ -1312,16 +1280,8 @@ static int netxen_nic_poll(struct napi_s
 						     budget / MAX_RCV_CTX);
 	}
 
-	if (work_done >= budget && netxen_nic_rx_has_work(adapter) != 0)
-		done = 0;
-
-	if (netxen_process_cmd_ring((unsigned long)adapter) == 0)
-		done = 0;
-
-	DPRINTK(INFO, "new work_done: %d work_to_do: %d\n",
-		work_done, work_to_do);
-	if (done) {
-		netif_rx_complete(netdev, napi);
+	if ((work_done < budget) && tx_complete) {
+		netif_rx_complete(adapter->netdev, &adapter->napi);
 		netxen_nic_enable_int(adapter);
 	}
 
Index: netdev-2.6/drivers/net/netxen/netxen_nic.h
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic.h
+++ netdev-2.6/drivers/net/netxen/netxen_nic.h
@@ -839,7 +839,6 @@ struct netxen_rcv_desc_ctx {
 	u32 flags;
 	u32 producer;
 	u32 rcv_pending;	/* Num of bufs posted in phantom */
-	u32 rcv_free;		/* Num of bufs in free list */
 	dma_addr_t phys_addr;
 	struct pci_dev *phys_pdev;
 	struct rcv_desc *desc_head;	/* address of rx ring in Phantom */
@@ -1073,12 +1072,10 @@ void netxen_tso_check(struct netxen_adap
 		      struct cmd_desc_type0 *desc, struct sk_buff *skb);
 int netxen_nic_hw_resources(struct netxen_adapter *adapter);
 void netxen_nic_clear_stats(struct netxen_adapter *adapter);
-int netxen_nic_rx_has_work(struct netxen_adapter *adapter);
-int netxen_nic_tx_has_work(struct netxen_adapter *adapter);
 void netxen_watchdog_task(struct work_struct *work);
 void netxen_post_rx_buffers(struct netxen_adapter *adapter, u32 ctx,
 			    u32 ringid);
-int netxen_process_cmd_ring(unsigned long data);
+int netxen_process_cmd_ring(struct netxen_adapter *adapter);
 u32 netxen_process_rcv_ring(struct netxen_adapter *adapter, int ctx, int max);
 void netxen_nic_set_multi(struct net_device *netdev);
 int netxen_nic_change_mtu(struct net_device *netdev, int new_mtu);
Index: netdev-2.6/drivers/net/netxen/netxen_nic_init.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_init.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_init.c
@@ -185,7 +185,6 @@ void netxen_initialize_adapter_sw(struct
 		for (ring = 0; ring < NUM_RCV_DESC_RINGS; ring++) {
 			struct netxen_rx_buffer *rx_buf;
 			rcv_desc = &adapter->recv_ctx[ctxid].rcv_desc[ring];
-			rcv_desc->rcv_free = rcv_desc->max_rx_desc_count;
 			rcv_desc->begin_alloc = 0;
 			rx_buf = rcv_desc->rx_buf_arr;
 			num_rx_bufs = rcv_desc->max_rx_desc_count;
@@ -975,28 +974,6 @@ int netxen_phantom_init(struct netxen_ad
 	return 0;
 }
 
-int netxen_nic_rx_has_work(struct netxen_adapter *adapter)
-{
-	int ctx;
-
-	for (ctx = 0; ctx < MAX_RCV_CTX; ++ctx) {
-		struct netxen_recv_context *recv_ctx =
-		    &(adapter->recv_ctx[ctx]);
-		u32 consumer;
-		struct status_desc *desc_head;
-		struct status_desc *desc;
-
-		consumer = recv_ctx->status_rx_consumer;
-		desc_head = recv_ctx->rcv_status_desc_head;
-		desc = &desc_head[consumer];
-
-		if (netxen_get_sts_owner(desc) & STATUS_OWNER_HOST)
-			return 1;
-	}
-
-	return 0;
-}
-
 static int netxen_nic_check_temp(struct netxen_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
@@ -1175,7 +1152,6 @@ static void netxen_process_rcv(struct ne
 
 	netdev->last_rx = jiffies;
 
-	rcv_desc->rcv_free++;
 	rcv_desc->rcv_pending--;
 
 	/*
@@ -1231,23 +1207,22 @@ u32 netxen_process_rcv_ring(struct netxe
 		recv_ctx->status_rx_consumer = consumer;
 		recv_ctx->status_rx_producer = producer;
 
+		smp_wmb();
 		/* Window = 1 */
 		writel(consumer,
 		       NETXEN_CRB_NORMALIZE(adapter,
 					    recv_crb_registers[adapter->portnum].
 					    crb_rcv_status_consumer));
-		wmb();
 	}
 
 	return count;
 }
 
 /* Process Command status ring */
-int netxen_process_cmd_ring(unsigned long data)
+int netxen_process_cmd_ring(struct netxen_adapter *adapter)
 {
 	u32 last_consumer;
 	u32 consumer;
-	struct netxen_adapter *adapter = (struct netxen_adapter *)data;
 	int count1 = 0;
 	int count2 = 0;
 	struct netxen_cmd_buffer *buffer;
@@ -1353,11 +1328,7 @@ int netxen_process_cmd_ring(unsigned lon
 	 * There is still a possible race condition and the host could miss an
 	 * interrupt. The card has to take care of this.
 	 */
-	if (adapter->last_cmd_consumer == consumer &&
-	    (((adapter->cmd_producer + 1) %
-	      adapter->max_tx_desc_count) == adapter->last_cmd_consumer)) {
-		consumer = le32_to_cpu(*(adapter->cmd_consumer));
-	}
+	consumer = le32_to_cpu(*(adapter->cmd_consumer));
 	done = (adapter->last_cmd_consumer == consumer);
 
 	spin_unlock(&adapter->tx_lock);
@@ -1436,8 +1407,6 @@ void netxen_post_rx_buffers(struct netxe
 		rcv_desc->begin_alloc = index;
 		rcv_desc->rcv_pending += count;
 		rcv_desc->producer = producer;
-		if (rcv_desc->rcv_free >= 32) {
-			rcv_desc->rcv_free = 0;
 			/* Window = 1 */
 			writel((producer - 1) &
 			       (rcv_desc->max_rx_desc_count - 1),
@@ -1461,8 +1430,6 @@ void netxen_post_rx_buffers(struct netxe
 			writel(msg,
 			       DB_NORMALIZE(adapter,
 					    NETXEN_RCV_PRODUCER_OFFSET));
-			wmb();
-		}
 	}
 }
 
@@ -1526,8 +1493,6 @@ static void netxen_post_rx_buffers_nodb(
 		rcv_desc->begin_alloc = index;
 		rcv_desc->rcv_pending += count;
 		rcv_desc->producer = producer;
-		if (rcv_desc->rcv_free >= 32) {
-			rcv_desc->rcv_free = 0;
 			/* Window = 1 */
 			writel((producer - 1) &
 			       (rcv_desc->max_rx_desc_count - 1),
@@ -1537,21 +1502,9 @@ static void netxen_post_rx_buffers_nodb(
 						    rcv_desc_crb[ringid].
 						    crb_rcv_producer_offset));
 			wmb();
-		}
 	}
 }
 
-int netxen_nic_tx_has_work(struct netxen_adapter *adapter)
-{
-	if (find_diff_among(adapter->last_cmd_consumer,
-			    adapter->cmd_producer,
-			    adapter->max_tx_desc_count) > 0)
-		return 1;
-
-	return 0;
-}
-
-
 void netxen_nic_clear_stats(struct netxen_adapter *adapter)
 {
 	memset(&adapter->stats, 0, sizeof(adapter->stats));

-- 
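
The control flow this patch moves to can be modelled in userspace as below. It is a simplified sketch, not driver code: `struct nic_sketch`, `sketch_isr()`, `sketch_poll()` and the `SKETCH_TX_BATCH` limit are all hypothetical, and the scheduled flag merely stands in for the NAPI scheduling state. The point it illustrates is that the ISR masks the interrupt only when it wins the schedule, and only the poll routine unmasks it, once rx work is below budget and tx completions are drained.

```c
#include <stdbool.h>

#define SKETCH_TX_BATCH 4	/* hypothetical per-poll tx reap limit */

struct nic_sketch {
	bool irq_enabled;	/* device interrupt unmasked? */
	bool poll_scheduled;	/* stands in for the NAPI scheduled state */
	int rx_pending;		/* packets waiting in the rx ring */
	int tx_pending;		/* tx completions waiting to be reaped */
};

/* ISR: mask the interrupt only if this call schedules the poll. */
void sketch_isr(struct nic_sketch *nic)
{
	if (!nic->poll_scheduled) {
		nic->poll_scheduled = true;
		nic->irq_enabled = false;
	}
}

/* Poll: reap tx, process up to budget rx packets, unmask when idle. */
int sketch_poll(struct nic_sketch *nic, int budget)
{
	int reaped, work_done;
	bool tx_complete;

	reaped = nic->tx_pending < SKETCH_TX_BATCH ?
		 nic->tx_pending : SKETCH_TX_BATCH;
	nic->tx_pending -= reaped;
	tx_complete = (nic->tx_pending == 0);

	work_done = nic->rx_pending < budget ? nic->rx_pending : budget;
	nic->rx_pending -= work_done;

	/* Same condition as the patch: done only if both sides are idle. */
	if (work_done < budget && tx_complete) {
		nic->poll_scheduled = false;
		nic->irq_enabled = true;
	}
	return work_done;
}
```

In this model the interrupt stays masked across consecutive polls while work remains, so there is no window in which the ISR has to guess whether work is pending.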


* [patch 3/6] netxen: improve MSI interrupt handling
From: dhananjay @ 2007-12-21  2:36 UTC
  To: netdev; +Cc: jeff
In-Reply-To: <20071221023656.409657310@netxen.com>

[-- Attachment #1: msifix.patch --]
[-- Type: text/plain, Size: 7342 bytes --]

Recent netxen firmware has a new scheme for generating MSI interrupts: it
raises an interrupt and blocks itself, waiting for the driver to unmask.
This reduces the chance of spurious interrupts.

The driver will be able to deal with older firmware as well.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>

Index: netdev-2.6/drivers/net/netxen/netxen_nic_hw.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_hw.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_hw.c
@@ -398,6 +398,8 @@ int netxen_nic_hw_resources(struct netxe
 		NETXEN_CRB_NORMALIZE(adapter, CRB_NIC_CAPABILITIES_FW));
 	printk(KERN_NOTICE "%s: FW capabilities:0x%x\n", netxen_nic_driver_name,
 			adapter->intr_scheme);
+	adapter->msi_mode = readl(
+		NETXEN_CRB_NORMALIZE(adapter, CRB_NIC_MSI_MODE_FW));
 	DPRINTK(INFO, "Receive Peg ready too. starting stuff\n");
 
 	addr = netxen_alloc(adapter->ahw.pdev,
Index: netdev-2.6/drivers/net/netxen/netxen_nic_main.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_main.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_main.c
@@ -149,33 +149,33 @@ static void netxen_nic_update_cmd_consum
 
 #define	ADAPTER_LIST_SIZE 12
 
+static uint32_t msi_tgt_status[4] = {
+	ISR_INT_TARGET_STATUS, ISR_INT_TARGET_STATUS_F1,
+	ISR_INT_TARGET_STATUS_F2, ISR_INT_TARGET_STATUS_F3
+};
+
+static uint32_t sw_int_mask[4] = {
+	CRB_SW_INT_MASK_0, CRB_SW_INT_MASK_1,
+	CRB_SW_INT_MASK_2, CRB_SW_INT_MASK_3
+};
+
 static void netxen_nic_disable_int(struct netxen_adapter *adapter)
 {
-	uint32_t	mask = 0x7ff;
+	u32 mask;
 	int retries = 32;
+	int port = adapter->portnum;
+	int pci_fn = adapter->ahw.pci_func;
 
 	DPRINTK(1, INFO, "Entered ISR Disable \n");
 
-	switch (adapter->portnum) {
-	case 0:
-		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
-		break;
-	case 1:
-		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
-		break;
-	case 2:
-		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
-		break;
-	case 3:
-		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_3));
-		break;
+	if (adapter->msi_mode != MSI_MODE_MULTIFUNC) {
+		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, sw_int_mask[port]));
 	}
 
 	if (adapter->intr_scheme != -1 &&
 	    adapter->intr_scheme != INTR_SCHEME_PERPORT)
-		writel(mask,PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+		writel(0x7ff,PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
 
-	/* Window = 0 or 1 */
 	if (!(adapter->flags & NETXEN_NIC_MSI_ENABLED)) {
 		do {
 			writel(0xffffffff,
@@ -190,6 +190,11 @@ static void netxen_nic_disable_int(struc
 			printk(KERN_NOTICE "%s: Failed to disable interrupt completely\n",
 					netxen_nic_driver_name);
 		}
+	} else {
+		if (adapter->msi_mode == MSI_MODE_MULTIFUNC) {
+			writel(0xffffffff, PCI_OFFSET_SECOND_RANGE(adapter,
+						msi_tgt_status[pci_fn]));
+		}
 	}
 
 	DPRINTK(1, INFO, "Done with Disable Int\n");
@@ -198,6 +203,7 @@ static void netxen_nic_disable_int(struc
 static void netxen_nic_enable_int(struct netxen_adapter *adapter)
 {
 	u32 mask;
+	int port = adapter->portnum;
 
 	DPRINTK(1, INFO, "Entered ISR Enable \n");
 
@@ -218,20 +224,7 @@ static void netxen_nic_enable_int(struct
 		writel(mask, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
 	}
 
-	switch (adapter->portnum) {
-	case 0:
-		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
-		break;
-	case 1:
-		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
-		break;
-	case 2:
-		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
-		break;
-	case 3:
-		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_3));
-		break;
-	}
+	writel(0x1, NETXEN_CRB_NORMALIZE(adapter, sw_int_mask[port]));
 
 	if (!(adapter->flags & NETXEN_NIC_MSI_ENABLED)) {
 		mask = 0xbff;
@@ -401,6 +394,7 @@ netxen_nic_probe(struct pci_dev *pdev, c
 
 	/* this will be read from FW later */
 	adapter->intr_scheme = -1;
+	adapter->msi_mode = -1;
 
 	/* This will be reset for mezz cards  */
 	adapter->portnum = pci_func_id;
Index: netdev-2.6/drivers/net/netxen/netxen_nic.h
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic.h
+++ netdev-2.6/drivers/net/netxen/netxen_nic.h
@@ -939,6 +939,7 @@ struct netxen_adapter {
 	struct pci_dev *ctx_desc_pdev;
 	dma_addr_t ctx_desc_phys_addr;
 	int intr_scheme;
+	int msi_mode;
 	int (*enable_phy_interrupts) (struct netxen_adapter *);
 	int (*disable_phy_interrupts) (struct netxen_adapter *);
 	void (*handle_phy_intr) (struct netxen_adapter *);
Index: netdev-2.6/drivers/net/netxen/netxen_nic_phan_reg.h
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_phan_reg.h
+++ netdev-2.6/drivers/net/netxen/netxen_nic_phan_reg.h
@@ -126,8 +126,11 @@
  */
 #define CRB_NIC_CAPABILITIES_HOST	NETXEN_NIC_REG(0x1a8)
 #define CRB_NIC_CAPABILITIES_FW	  	NETXEN_NIC_REG(0x1dc)
+#define CRB_NIC_MSI_MODE_HOST		NETXEN_NIC_REG(0x270)
+#define CRB_NIC_MSI_MODE_FW	  		NETXEN_NIC_REG(0x274)
 
 #define INTR_SCHEME_PERPORT	      	0x1
+#define MSI_MODE_MULTIFUNC	      	0x1
 
 /* used for ethtool tests */
 #define CRB_SCRATCHPAD_TEST	    NETXEN_NIC_REG(0x280)
Index: netdev-2.6/drivers/net/netxen/netxen_nic_hdr.h
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_hdr.h
+++ netdev-2.6/drivers/net/netxen/netxen_nic_hdr.h
@@ -456,6 +456,12 @@ enum {
 #define ISR_INT_MASK_SLOW	(NETXEN_PCIX_PS_REG(PCIX_INT_MASK))
 #define ISR_INT_TARGET_STATUS	(NETXEN_PCIX_PS_REG(PCIX_TARGET_STATUS))
 #define ISR_INT_TARGET_MASK	(NETXEN_PCIX_PS_REG(PCIX_TARGET_MASK))
+#define ISR_INT_TARGET_STATUS_F1   (NETXEN_PCIX_PS_REG(PCIX_TARGET_STATUS_F1))
+#define ISR_INT_TARGET_MASK_F1     (NETXEN_PCIX_PS_REG(PCIX_TARGET_MASK_F1))
+#define ISR_INT_TARGET_STATUS_F2   (NETXEN_PCIX_PS_REG(PCIX_TARGET_STATUS_F2))
+#define ISR_INT_TARGET_MASK_F2     (NETXEN_PCIX_PS_REG(PCIX_TARGET_MASK_F2))
+#define ISR_INT_TARGET_STATUS_F3   (NETXEN_PCIX_PS_REG(PCIX_TARGET_STATUS_F3))
+#define ISR_INT_TARGET_MASK_F3     (NETXEN_PCIX_PS_REG(PCIX_TARGET_MASK_F3))
 
 #define NETXEN_PCI_MAPSIZE	128
 #define NETXEN_PCI_DDR_NET	(0x00000000UL)
@@ -662,6 +668,12 @@ enum {
 
 #define PCIX_TARGET_STATUS	(0x10118)
 #define PCIX_TARGET_MASK	(0x10128)
+#define PCIX_TARGET_STATUS_F1 (0x10160)
+#define PCIX_TARGET_MASK_F1   (0x10170)
+#define PCIX_TARGET_STATUS_F2 (0x10164)
+#define PCIX_TARGET_MASK_F2   (0x10174)
+#define PCIX_TARGET_STATUS_F3 (0x10168)
+#define PCIX_TARGET_MASK_F3   (0x10178)
 
 #define PCIX_MSI_F0		(0x13000)
 #define PCIX_MSI_F1		(0x13004)
Index: netdev-2.6/drivers/net/netxen/netxen_nic_init.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_init.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_init.c
@@ -145,6 +145,8 @@ int netxen_init_firmware(struct netxen_a
 	/* Window 1 call */
 	writel(INTR_SCHEME_PERPORT,
 	       NETXEN_CRB_NORMALIZE(adapter, CRB_NIC_CAPABILITIES_HOST));
+	writel(MSI_MODE_MULTIFUNC,
+	       NETXEN_CRB_NORMALIZE(adapter, CRB_NIC_MSI_MODE_HOST));
 	writel(MPORT_MULTI_FUNCTION_MODE,
 	       NETXEN_CRB_NORMALIZE(adapter, CRB_MPORT_MODE));
 	writel(PHAN_INITIALIZE_ACK,

-- 
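
The switch-to-table refactoring in this patch follows a common pattern, sketched below. The register offsets are made-up placeholders (the real CRB_SW_INT_MASK_* values are not shown in the patch), and the bounds check is an addition of this sketch: the patch indexes the table with `adapter->portnum` directly, whereas the old switch silently ignored out-of-range ports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offsets standing in for CRB_SW_INT_MASK_0..3. */
static const uint32_t sketch_sw_int_mask[] = {
	0x1d8, 0x1e0, 0x1e4, 0x1e8
};

#define SKETCH_NPORTS \
	(sizeof(sketch_sw_int_mask) / sizeof(sketch_sw_int_mask[0]))

/*
 * Look up the per-port software interrupt mask register.
 * Returns 0 and stores the register in *reg, or -1 if the port index
 * falls outside the table.
 */
int sw_int_mask_reg(unsigned int port, uint32_t *reg)
{
	if (port >= SKETCH_NPORTS)
		return -1;
	*reg = sketch_sw_int_mask[port];
	return 0;
}
```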


* [patch 6/6] netxen: optimize tx handling
From: dhananjay @ 2007-12-21  2:37 UTC
  To: netdev; +Cc: jeff
In-Reply-To: <20071221023656.409657310@netxen.com>

[-- Attachment #1: xmit.patch --]
[-- Type: text/plain, Size: 4940 bytes --]

The netxen driver allows a limited number of threads to post skbs to the
tx ring simultaneously. If a transmit slot is unavailable, the driver
calls schedule() or loops in xmit_frame().

This patch returns NETDEV_TX_BUSY and lets the stack reschedule the packet
if a transmit slot is unavailable. It also removes an unnecessary check for
tx timeout in the driver itself; the network stack does that anyway.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>

Index: netdev-2.6/drivers/net/netxen/netxen_nic_main.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_main.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_main.c
@@ -986,28 +986,6 @@ static int netxen_nic_xmit_frame(struct 
 		return NETDEV_TX_OK;
 	}
 
-	/*
-	 * Everything is set up. Now, we just need to transmit it out.
-	 * Note that we have to copy the contents of buffer over to
-	 * right place. Later on, this can be optimized out by de-coupling the
-	 * producer index from the buffer index.
-	 */
-      retry_getting_window:
-	spin_lock_bh(&adapter->tx_lock);
-	if (adapter->total_threads >= MAX_XMIT_PRODUCERS) {
-		spin_unlock_bh(&adapter->tx_lock);
-		/*
-		 * Yield CPU
-		 */
-		if (!in_atomic())
-			schedule();
-		else {
-			for (i = 0; i < 20; i++)
-				cpu_relax();	/*This a nop instr on i386 */
-		}
-		goto retry_getting_window;
-	}
-	local_producer = adapter->cmd_producer;
 	/* There 4 fragments per descriptor */
 	no_of_desc = (frag_count + 3) >> 2;
 	if (netdev->features & NETIF_F_TSO) {
@@ -1021,16 +999,19 @@ static int netxen_nic_xmit_frame(struct 
 			}
 		}
 	}
+
+	spin_lock_bh(&adapter->tx_lock);
+	if (adapter->total_threads >= MAX_XMIT_PRODUCERS) {
+		goto out_requeue;
+	}
+	local_producer = adapter->cmd_producer;
 	k = adapter->cmd_producer;
 	max_tx_desc_count = adapter->max_tx_desc_count;
 	last_cmd_consumer = adapter->last_cmd_consumer;
 	if ((k + no_of_desc) >=
 	    ((last_cmd_consumer <= k) ? last_cmd_consumer + max_tx_desc_count :
 	     last_cmd_consumer)) {
-		netif_stop_queue(netdev);
-		adapter->flags |= NETXEN_NETDEV_STATUS;
-		spin_unlock_bh(&adapter->tx_lock);
-		return NETDEV_TX_BUSY;
+		goto out_requeue;
 	}
 	k = get_index_range(k, max_tx_desc_count, no_of_desc);
 	adapter->cmd_producer = k;
@@ -1083,6 +1064,8 @@ static int netxen_nic_xmit_frame(struct 
 						  adapter->max_tx_desc_count);
 			hwdesc = &hw->cmd_desc_head[producer];
 			memset(hwdesc, 0, sizeof(struct cmd_desc_type0));
+			pbuf = &adapter->cmd_buf_arr[producer];
+			pbuf->skb = NULL;
 		}
 		frag = &skb_shinfo(skb)->frags[i - 1];
 		len = frag->size;
@@ -1138,6 +1121,8 @@ static int netxen_nic_xmit_frame(struct 
 		}
 		/* copy the MAC/IP/TCP headers to the cmd descriptor list */
 		hwdesc = &hw->cmd_desc_head[producer];
+		pbuf = &adapter->cmd_buf_arr[producer];
+		pbuf->skb = NULL;
 
 		/* copy the first 64 bytes */
 		memcpy(((void *)hwdesc) + 2,
@@ -1146,6 +1131,8 @@ static int netxen_nic_xmit_frame(struct 
 
 		if (more_hdr) {
 			hwdesc = &hw->cmd_desc_head[producer];
+			pbuf = &adapter->cmd_buf_arr[producer];
+			pbuf->skb = NULL;
 			/* copy the next 64 bytes - should be enough except
 			 * for pathological case
 			 */
@@ -1179,14 +1166,17 @@ static int netxen_nic_xmit_frame(struct 
 	}
 
 	adapter->stats.xmitfinished++;
-	spin_unlock_bh(&adapter->tx_lock);
-
 	netdev->trans_start = jiffies;
 
-	DPRINTK(INFO, "wrote CMD producer %x to phantom\n", producer);
-
-	DPRINTK(INFO, "Done. Send\n");
+	spin_unlock_bh(&adapter->tx_lock);
 	return NETDEV_TX_OK;
+
+out_requeue:
+	netif_stop_queue(netdev);
+	adapter->flags |= NETXEN_NETDEV_STATUS;
+
+	spin_unlock_bh(&adapter->tx_lock);
+	return NETDEV_TX_BUSY;
 }
 
 static void netxen_watchdog(unsigned long v)
Index: netdev-2.6/drivers/net/netxen/netxen_nic_init.c
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic_init.c
+++ netdev-2.6/drivers/net/netxen/netxen_nic_init.c
@@ -1229,7 +1229,6 @@ int netxen_process_cmd_ring(struct netxe
 	struct pci_dev *pdev;
 	struct netxen_skb_frag *frag;
 	u32 i;
-	struct sk_buff *skb = NULL;
 	int done;
 
 	spin_lock(&adapter->tx_lock);
@@ -1259,9 +1258,8 @@ int netxen_process_cmd_ring(struct netxe
 	while ((last_consumer != consumer) && (count1 < MAX_STATUS_HANDLE)) {
 		buffer = &adapter->cmd_buf_arr[last_consumer];
 		pdev = adapter->pdev;
-		frag = &buffer->frag_array[0];
-		skb = buffer->skb;
-		if (skb && (cmpxchg(&buffer->skb, skb, 0) == skb)) {
+		if (buffer->skb) {
+			frag = &buffer->frag_array[0];
 			pci_unmap_single(pdev, frag->dma, frag->length,
 					 PCI_DMA_TODEVICE);
 			frag->dma = 0ULL;
@@ -1274,8 +1272,8 @@ int netxen_process_cmd_ring(struct netxe
 			}
 
 			adapter->stats.skbfreed++;
-			dev_kfree_skb_any(skb);
-			skb = NULL;
+			dev_kfree_skb_any(buffer->skb);
+			buffer->skb = NULL;
 		} else if (adapter->proc_cmd_buf_counter == 1) {
 			adapter->stats.txnullskb++;
 		}

-- 
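
The ring-space test that now leads to the out_requeue path can be isolated as a small function. This is a sketch with a hypothetical name, `tx_ring_full()`; the comparison is the one from the patch, with the consumer index "unwrapped" so that it is always numerically ahead of the producer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when posting `needed` descriptors at `producer` would
 * reach or pass the last consumed slot, i.e. the caller should stop
 * the queue and report busy.
 */
bool tx_ring_full(uint32_t producer, uint32_t consumer,
		  uint32_t ring_size, uint32_t needed)
{
	/* If the consumer is at or behind the producer, add one lap. */
	uint32_t limit = (consumer <= producer) ? consumer + ring_size
						: consumer;

	return producer + needed >= limit;
}
```

Because the comparison is `>=`, the ring is reported full while one slot is still free, which keeps a completely full ring distinguishable from an empty one.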


* [patch 1/6] netxen: Update MAINTAINERS
From: dhananjay @ 2007-12-21  2:36 UTC
  To: netdev; +Cc: jeff
In-Reply-To: <20071221023656.409657310@netxen.com>

[-- Attachment #1: maintainer.patch --]
[-- Type: text/plain, Size: 524 bytes --]

Changing MAINTAINERS for netxen nic driver.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>

Index: netdev-2.6/MAINTAINERS
===================================================================
--- netdev-2.6.orig/MAINTAINERS
+++ netdev-2.6/MAINTAINERS
@@ -2738,8 +2738,8 @@ T:	git kernel.org:/pub/scm/linux/kernel/
 S:	Maintained
 
 NETXEN (1/10) GbE SUPPORT
-P:	Amit S. Kale
-M:	amitkale@netxen.com
+P:	Dhananjay Phadke
+M:	dhananjay@netxen.com
 L:	netdev@vger.kernel.org
 W:	http://www.netxen.com
 S:	Supported

-- 


* [patch 2/6] netxen: update driver version
From: dhananjay @ 2007-12-21  2:36 UTC
  To: netdev; +Cc: jeff
In-Reply-To: <20071221023656.409657310@netxen.com>

[-- Attachment #1: version.patch --]
[-- Type: text/plain, Size: 713 bytes --]

Bump the driver version to 3.4.18; several fixes have gone in since version 3.4.2.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>

Index: netdev-2.6/drivers/net/netxen/netxen_nic.h
===================================================================
--- netdev-2.6.orig/drivers/net/netxen/netxen_nic.h
+++ netdev-2.6/drivers/net/netxen/netxen_nic.h
@@ -65,8 +65,8 @@
 
 #define _NETXEN_NIC_LINUX_MAJOR 3
 #define _NETXEN_NIC_LINUX_MINOR 4
-#define _NETXEN_NIC_LINUX_SUBVERSION 2
-#define NETXEN_NIC_LINUX_VERSIONID  "3.4.2"
+#define _NETXEN_NIC_LINUX_SUBVERSION 18
+#define NETXEN_NIC_LINUX_VERSIONID  "3.4.18"
 
 #define NETXEN_NUM_FLASH_SECTORS (64)
 #define NETXEN_FLASH_SECTOR_SIZE (64 * 1024)

-- 


* [patch 0/6] netxen bug fixes
From: dhananjay @ 2007-12-21  2:36 UTC
  To: netdev; +Cc: jeff

Sending out 4 bugfixes and some improvements for the netxen nic driver.
Also updating driver version and maintainer. The patches are generated
against upstream branch.


 MAINTAINERS                              |    4 +-
 drivers/net/netxen/netxen_nic.h          |   10 +-
 drivers/net/netxen/netxen_nic_hdr.h      |   12 ++
 drivers/net/netxen/netxen_nic_hw.c       |    2 +
 drivers/net/netxen/netxen_nic_init.c     |   65 ++----------
 drivers/net/netxen/netxen_nic_main.c     |  174 ++++++++++--------------------
 drivers/net/netxen/netxen_nic_niu.c      |    8 +-
 drivers/net/netxen/netxen_nic_phan_reg.h |    3 +
 8 files changed, 94 insertions(+), 184 deletions(-)




* Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
From: Satoru SATOH @ 2007-12-21  2:24 UTC
  To: netdev
In-Reply-To: <476AD127.2020909@gmail.com>

2007/12/21, Jarek Poplawski <jarkao2@gmail.com>:
> Jarek Poplawski wrote, On 12/20/2007 09:24 PM:
> ...
>
> > but since it's your patch, I hope you do some additional checking
> > if it's always like this...
>
>
> ...or maybe only changing this all a little bit will make it look safer!
>
> Jarek P.


OK, how about this?

Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>

 ip/iproute.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index f4200ae..c771b34 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
 			else {
 				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
+				unsigned hz1 = hz;
+				if (hz1 > 1000)
+					hz1 /= 1000;
+				else
+					val *= 1000;

-				val *= 1000;
 				if (i == RTAX_RTT)
 					val /= 8;
 				else if (i == RTAX_RTTVAR)
 					val /= 4;
-				if (val >= hz)
-					fprintf(fp, " %ums", val/hz);
+				if (val >= hz1)
+					fprintf(fp, " %ums", val/hz1);
 				else
-					fprintf(fp, " %.2fms", (float)val/hz);
+					fprintf(fp, " %.2fms", (float)val/hz1);
 			}
 		}
 	}


Thanks,
Satoru SATOH
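
For readers following the arithmetic: the old code always multiplied val by 1000 before dividing by hz, which can wrap a 32-bit unsigned when val is large. The patched branch can be sketched in isolation as below. This is only an illustration; format_metric_ms() is a hypothetical helper name, the RTT/RTTVAR scaling is left out, and the hz values in the usage note are assumed examples rather than anything stated in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the patched branch of print_route().
 * Writes the formatted metric into buf and returns snprintf()'s result. */
static int format_metric_ms(char *buf, size_t len, unsigned val, unsigned hz)
{
	unsigned hz1 = hz;

	assert(hz != 0);

	if (hz1 > 1000)
		hz1 /= 1000;	/* large hz: shrink the divisor instead of growing val */
	else
		val *= 1000;	/* small hz: scale val, as the old code always did */

	if (val >= hz1)
		return snprintf(buf, len, " %ums", val / hz1);
	return snprintf(buf, len, " %.2fms", (float)val / hz1);
}
```

With an assumed hz of 1000000000 and val of 300000000, the divisor becomes 1000000 and the helper writes " 300ms"; the unpatched val *= 1000 would have wrapped for the same input.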

^ permalink raw reply related

* Re: Update ip command line processing
From: Simon Horman @ 2007-12-21  1:32 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, netdev, apw
In-Reply-To: <20071220.152121.188610437.davem@davemloft.net>

On Thu, Dec 20, 2007 at 03:21:21PM -0800, David Miller wrote:
> From: Simon Horman <horms@verge.net.au>
> Date: Tue, 18 Dec 2007 17:57:32 +0900
> 
> > @@ -1414,9 +1414,16 @@ late_initcall(ip_auto_config);
> >   */
> >  static int __init ic_proto_name(char *name)
> >  {
> > +	if (!name) {
> > +		return 1;
> > +	}
> 
> I do not see any circumstance under which this pointer can
> be NULL.  Judging by your other changes, I think you mean
> to use "!*name" here.
> 
> Maybe:
> 
> 	if (*name == '\0')
> 
> would make it clearer what you're checking for, an
> empty string.
> 
> Otherwise I'm fine with your change.

Sorry, I meant if (*name == '\0') as you suggest to replace the first
portion of:

-	ic_enable = (*addrs &&
-		(strcmp(addrs, "off") != 0) &&
-		(strcmp(addrs, "none") != 0));

I'll send an updated patch shortly.
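
As a rough illustration of the check being discussed (not the updated patch itself, which is not shown here), the explicit empty-string test could look like the sketch below. ip_autoconf_wanted() is a hypothetical name used only for this example, and the code is plain userspace C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: non-zero when the argument string asks for
 * autoconfiguration, i.e. it is neither empty, "off" nor "none". */
static int ip_autoconf_wanted(const char *addrs)
{
	/* Per the review comment, the pointer itself is never NULL;
	 * the case to catch is the empty string. */
	assert(addrs != NULL);

	return *addrs != '\0' &&
	       strcmp(addrs, "off") != 0 &&
	       strcmp(addrs, "none") != 0;
}
```

Writing the first condition as *addrs != '\0' rather than *addrs behaves identically; it only makes the intent (an empty-string check) explicit.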

-- 
Horms


^ permalink raw reply

* skbuff data pointer alignment requirement
From: Keyur Chudgar @ 2007-12-21  1:12 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 2829 bytes --]

Hi,

I would like to ask about, and suggest something for, memory alignment
in the Linux kernel socket buffer allocation scheme. Is there any way
to have the address of the allocated data block of an skb (the data
pointer, not the skb itself) aligned to 256 bytes? If there is, can
you please let me know about it?

From my reading of the __alloc_skb function in skbuff.c, the data area
is allocated with kmalloc, based on the size specified.

As I understand it, kmalloc allocates from one of several fixed-size
pools whose sizes are powers of 2. So, for example, if the size
specified is 100, I will get a 128-byte buffer, whose address will
also be 128-byte aligned.

Suppose some hardware requires a 256-byte-aligned address for DMA, no
matter what the packet size is. In that case, can you guide me on what
I should do? Is there already a way to do this in Linux?
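
For reference, one general way to get a stronger alignment than an allocator provides is to over-allocate and round the address up. The sketch below shows only that arithmetic in userspace C; align_up() and padded_size() are hypothetical names, and this is not something __alloc_skb does today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align; align must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Worst-case request size so that an aligned block of 'size' bytes still
 * fits after rounding the start address up. */
static size_t padded_size(size_t size, size_t align)
{
	return size + align - 1;
}
```

For example, an address of 0x1080 rounds up to 0x1100 for a 256-byte alignment, and a 100-byte block needs at most 355 bytes requested. The original (unrounded) pointer still has to be kept for freeing.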

If there is currently no way to do it, can I propose a change to the
__alloc_skb function so that, if a flag specifying the minimum
alignment for the data pointer is defined, kmalloc will give a pointer
with the specified alignment? I have made this change in the
__alloc_skb function as follows:

#if defined SKB_ADDR_MIN_ALIGN
                size+=SKB_ADDR_MIN_ALIGN;
#endif

Just before the following code:
        size = SKB_DATA_ALIGN(size);

        data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
                        gfp_mask, node);

In the situation described above, I can define SKB_ADDR_MIN_ALIGN=256
in my Makefile, or leave it undefined if I am okay with the default
alignment.

Similarly, if one wants to reserve a specific amount of space as soon
as the data pointer is allocated in __alloc_skb, one can define a
flag for this. We can add the following code to the same function,
__alloc_skb:

#if defined SKB_RESERVE_MIN
        skb_reserve(skb,SKB_RESERVE_MIN);
#endif

Just after the following code:
        memset(skb, 0, offsetof(struct sk_buff, truesize));
        skb->truesize = size + sizeof(struct sk_buff);
        atomic_set(&skb->users, 1);
        skb->head = data;
        skb->data = data;
        skb->tail = data;
        skb->end  = data + size;

Done this way, others are not impacted if they have not defined these
flags, while people who have requirements like these can define them.
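
To make the pointer arithmetic concrete, here is a small userspace model of the idea: reserve just enough headroom on an empty buffer that the data pointer lands on the required boundary. The struct and the toy_* functions are hypothetical stand-ins, not the kernel's struct sk_buff or skb_reserve(), and the 256-byte figure is just the example alignment from this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the four sk_buff data pointers; not the kernel struct. */
struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of packet data */
	unsigned char *tail;	/* end of packet data */
	unsigned char *end;	/* end of the allocation */
};

static void toy_init(struct toy_buf *b, unsigned char *mem, size_t size)
{
	b->head = b->data = b->tail = mem;
	b->end = mem + size;
}

/* Advance data and tail together, as skb_reserve() does on an empty buffer.
 * The asserts are additions for this sketch. */
static void toy_reserve(struct toy_buf *b, size_t len)
{
	assert(b->data == b->tail);
	assert(len <= (size_t)(b->end - b->tail));
	b->data += len;
	b->tail += len;
}

/* Bytes to reserve so that data lands on an 'align' boundary
 * (align must be a power of two). */
static size_t toy_align_pad(const struct toy_buf *b, size_t align)
{
	return (size_t)(-(uintptr_t)b->data & (uintptr_t)(align - 1));
}
```

Calling toy_reserve(&b, toy_align_pad(&b, 256)) on a freshly initialised buffer leaves b.data 256-byte aligned while b.head still points at the start of the allocation, which is what a later free would need. The allocation must be at least align - 1 bytes larger than the payload for this to be safe.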

Would you please advise on the suggested implementation, or let me
know if you think Linux already provides some means of fulfilling
these requirements? I would really appreciate your feedback.

Sincerely,
Keyur Chudgar

[-- Attachment #2: skbuff.c --]
[-- Type: text/plain, Size: 52143 bytes --]

/*
 *	Routines having to do with the 'struct sk_buff' memory handlers.
 *
 *	Authors:	Alan Cox <iiitac@pyr.swan.ac.uk>
 *			Florian La Roche <rzsfl@rz.uni-sb.de>
 *
 *	Version:	$Id: skbuff.c,v 1.90 2001/11/07 05:56:19 davem Exp $
 *
 *	Fixes:
 *		Alan Cox	:	Fixed the worst of the load
 *					balancer bugs.
 *		Dave Platt	:	Interrupt stacking fix.
 *	Richard Kooijman	:	Timestamp fixes.
 *		Alan Cox	:	Changed buffer format.
 *		Alan Cox	:	destructor hook for AF_UNIX etc.
 *		Linus Torvalds	:	Better skb_clone.
 *		Alan Cox	:	Added skb_copy.
 *		Alan Cox	:	Added all the changed routines Linus
 *					only put in the headers
 *		Ray VanTassle	:	Fixed --skb->lock in free
 *		Alan Cox	:	skb_copy copy arp field
 *		Andi Kleen	:	slabified it.
 *		Robert Olsson	:	Removed skb_head_pool
 *
 *	NOTE:
 *		The __skb_ routines should be called with interrupts
 *	disabled, or you better be *real* sure that the operation is atomic
 *	with respect to whatever list is being frobbed (e.g. via lock_sock()
 *	or via disabling bottom half handlers, etc).
 *
 *	This program is free software; you can redistribute it and/or
 *	modify it under the terms of the GNU General Public License
 *	as published by the Free Software Foundation; either version
 *	2 of the License, or (at your option) any later version.
 */

/*
 *	The functions in this file will not compile correctly with gcc 2.4.x
 */

#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/slab.h>
#include <linux/netdevice.h>
#ifdef CONFIG_NET_CLS_ACT
#include <net/pkt_sched.h>
#endif
#include <linux/string.h>
#include <linux/skbuff.h>
#include <linux/cache.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>

#include <net/protocol.h>
#include <net/dst.h>
#include <net/sock.h>
#include <net/checksum.h>
#include <net/xfrm.h>

#include <asm/uaccess.h>
#include <asm/system.h>

#include "kmap_skb.h"

static struct kmem_cache *skbuff_head_cache __read_mostly;
static struct kmem_cache *skbuff_fclone_cache __read_mostly;

/*#define SKB_ADDR_MIN_ALIGN 	256
#define SKB_RESERVE_MIN		 32*/

/*
 *	Keep out-of-line to prevent kernel bloat.
 *	__builtin_return_address is not used because it is not always
 *	reliable.
 */

/**
 *	skb_over_panic	- 	private function
 *	@skb: buffer
 *	@sz: size
 *	@here: address
 *
 *	Out of line support code for skb_put(). Not user callable.
 */
void skb_over_panic(struct sk_buff *skb, int sz, void *here)
{
	printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
			  "data:%p tail:%p end:%p dev:%s\n",
	       here, skb->len, sz, skb->head, skb->data, skb->tail, skb->end,
	       skb->dev ? skb->dev->name : "<NULL>");
	BUG();
}

/**
 *	skb_under_panic	- 	private function
 *	@skb: buffer
 *	@sz: size
 *	@here: address
 *
 *	Out of line support code for skb_push(). Not user callable.
 */

void skb_under_panic(struct sk_buff *skb, int sz, void *here)
{
	printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
			  "data:%p tail:%p end:%p dev:%s\n",
	       here, skb->len, sz, skb->head, skb->data, skb->tail, skb->end,
	       skb->dev ? skb->dev->name : "<NULL>");
	BUG();
}

void skb_truesize_bug(struct sk_buff *skb)
{
	printk(KERN_ERR "SKB BUG: Invalid truesize (%u) "
	       "len=%u, sizeof(sk_buff)=%Zd\n",
	       skb->truesize, skb->len, sizeof(struct sk_buff));
}
EXPORT_SYMBOL(skb_truesize_bug);

/* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
 *	'private' fields and also do memory statistics to find all the
 *	[BEEP] leaks.
 *
 */

/**
 *	__alloc_skb	-	allocate a network buffer
 *	@size: size to allocate
 *	@gfp_mask: allocation mask
 *	@fclone: allocate from fclone cache instead of head cache
 *		and allocate a cloned (child) skb
 *	@node: numa node to allocate memory on
 *
 *	Allocate a new &sk_buff. The returned buffer has no headroom and a
 *	tail room of size bytes. The object has a reference count of one.
 *	The return is the buffer. On a failure the return is %NULL.
 *
 *	Buffers may only be allocated from interrupts using a @gfp_mask of
 *	%GFP_ATOMIC.
 */
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
			    int fclone, int node)
{
	struct kmem_cache *cache;
	struct skb_shared_info *shinfo;
	struct sk_buff *skb;
	u8 *data;

	cache = fclone ? skbuff_fclone_cache : skbuff_head_cache;

	/* Get the HEAD */
	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
	if (!skb)
		goto out;

	/* Get the DATA. Size must match skb_add_mtu(). */
#if defined SKB_ADDR_MIN_ALIGN
	size += SKB_ADDR_MIN_ALIGN;
#endif
	size = SKB_DATA_ALIGN(size);

	data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
			gfp_mask, node);
	
	if (!data)
		goto nodata;

	memset(skb, 0, offsetof(struct sk_buff, truesize));
	skb->truesize = size + sizeof(struct sk_buff);
	atomic_set(&skb->users, 1);
	skb->head = data;
	skb->data = data;
	skb->tail = data;
	skb->end  = data + size;

#if defined SKB_RESERVE_MIN
	skb_reserve(skb, SKB_RESERVE_MIN);
#endif
	/* make sure we initialize shinfo sequentially */
	shinfo = skb_shinfo(skb);
	atomic_set(&shinfo->dataref, 1);
	shinfo->nr_frags  = 0;
	shinfo->gso_size = 0;
	shinfo->gso_segs = 0;
	shinfo->gso_type = 0;
	shinfo->ip6_frag_id = 0;
	shinfo->frag_list = NULL;

	if (fclone) {
		struct sk_buff *child = skb + 1;
		atomic_t *fclone_ref = (atomic_t *) (child + 1);

		skb->fclone = SKB_FCLONE_ORIG;
		atomic_set(fclone_ref, 1);

		child->fclone = SKB_FCLONE_UNAVAILABLE;
	}
out:
	return skb;
nodata:
	kmem_cache_free(cache, skb);
	skb = NULL;
	goto out;
}

/**
 *	alloc_skb_from_cache	-	allocate a network buffer
 *	@cp: kmem_cache from which to allocate the data area
 *           (object size must be big enough for @size bytes + skb overheads)
 *	@size: size to allocate
 *	@gfp_mask: allocation mask
 *
 *	Allocate a new &sk_buff. The returned buffer has no headroom and
 *	tail room of size bytes. The object has a reference count of one.
 *	The return is the buffer. On a failure the return is %NULL.
 *
 *	Buffers may only be allocated from interrupts using a @gfp_mask of
 *	%GFP_ATOMIC.
 */
struct sk_buff *alloc_skb_from_cache(struct kmem_cache *cp,
				     unsigned int size,
				     gfp_t gfp_mask)
{
	struct sk_buff *skb;
	u8 *data;

	/* Get the HEAD */
	skb = kmem_cache_alloc(skbuff_head_cache,
			       gfp_mask & ~__GFP_DMA);
	if (!skb)
		goto out;

	/* Get the DATA. */
	size = SKB_DATA_ALIGN(size);
	data = kmem_cache_alloc(cp, gfp_mask);
	if (!data)
		goto nodata;

	memset(skb, 0, offsetof(struct sk_buff, truesize));
	skb->truesize = size + sizeof(struct sk_buff);
	atomic_set(&skb->users, 1);
	skb->head = data;
	skb->data = data;
	skb->tail = data;
	skb->end  = data + size;

	atomic_set(&(skb_shinfo(skb)->dataref), 1);
	skb_shinfo(skb)->nr_frags  = 0;
	skb_shinfo(skb)->gso_size = 0;
	skb_shinfo(skb)->gso_segs = 0;
	skb_shinfo(skb)->gso_type = 0;
	skb_shinfo(skb)->frag_list = NULL;
out:
	return skb;
nodata:
	kmem_cache_free(skbuff_head_cache, skb);
	skb = NULL;
	goto out;
}

/**
 *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
 *	@dev: network device to receive on
 *	@length: length to allocate
 *	@gfp_mask: get_free_pages mask, passed to alloc_skb
 *
 *	Allocate a new &sk_buff and assign it a usage count of one. The
 *	buffer has unspecified headroom built in. Users should allocate
 *	the headroom they think they need without accounting for the
 *	built in space. The built in space is used for optimisations.
 *
 *	%NULL is returned if there is no free memory.
 */
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
		unsigned int length, gfp_t gfp_mask)
{
	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
	struct sk_buff *skb;

	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
	if (likely(skb)) {
		skb_reserve(skb, NET_SKB_PAD);
		skb->dev = dev;
	}
	return skb;
}

static void skb_drop_list(struct sk_buff **listp)
{
	struct sk_buff *list = *listp;

	*listp = NULL;

	do {
		struct sk_buff *this = list;
		list = list->next;
		kfree_skb(this);
	} while (list);
}

static inline void skb_drop_fraglist(struct sk_buff *skb)
{
	skb_drop_list(&skb_shinfo(skb)->frag_list);
}

static void skb_clone_fraglist(struct sk_buff *skb)
{
	struct sk_buff *list;

	for (list = skb_shinfo(skb)->frag_list; list; list = list->next)
		skb_get(list);
}

static void skb_release_data(struct sk_buff *skb)
{
	if (!skb->cloned ||
	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
			       &skb_shinfo(skb)->dataref)) {
		if (skb_shinfo(skb)->nr_frags) {
			int i;
			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
				put_page(skb_shinfo(skb)->frags[i].page);
		}

		if (skb_shinfo(skb)->frag_list)
			skb_drop_fraglist(skb);

		kfree(skb->head);
	}
}

/*
 *	Free an skbuff by memory without cleaning the state.
 */
void kfree_skbmem(struct sk_buff *skb)
{
	struct sk_buff *other;
	atomic_t *fclone_ref;

	skb_release_data(skb);
	switch (skb->fclone) {
	case SKB_FCLONE_UNAVAILABLE:
		kmem_cache_free(skbuff_head_cache, skb);
		break;

	case SKB_FCLONE_ORIG:
		fclone_ref = (atomic_t *) (skb + 2);
		if (atomic_dec_and_test(fclone_ref))
			kmem_cache_free(skbuff_fclone_cache, skb);
		break;

	case SKB_FCLONE_CLONE:
		fclone_ref = (atomic_t *) (skb + 1);
		other = skb - 1;

		/* The clone portion is available for
		 * fast-cloning again.
		 */
		skb->fclone = SKB_FCLONE_UNAVAILABLE;

		if (atomic_dec_and_test(fclone_ref))
			kmem_cache_free(skbuff_fclone_cache, other);
		break;
	};
}

/**
 *	__kfree_skb - private function
 *	@skb: buffer
 *
 *	Free an sk_buff. Release anything attached to the buffer.
 *	Clean the state. This is an internal helper function. Users should
 *	always call kfree_skb
 */

void __kfree_skb(struct sk_buff *skb)
{
	dst_release(skb->dst);
#ifdef CONFIG_XFRM
	secpath_put(skb->sp);
#endif
	if (skb->destructor) {
		WARN_ON(in_irq());
		skb->destructor(skb);
	}
#ifdef CONFIG_NETFILTER
	nf_conntrack_put(skb->nfct);
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
	nf_conntrack_put_reasm(skb->nfct_reasm);
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
	nf_bridge_put(skb->nf_bridge);
#endif
#endif
/* XXX: IS this still necessary? - JHS */
#ifdef CONFIG_NET_SCHED
	skb->tc_index = 0;
#ifdef CONFIG_NET_CLS_ACT
	skb->tc_verd = 0;
#endif
#endif

	kfree_skbmem(skb);
}

/**
 *	kfree_skb - free an sk_buff
 *	@skb: buffer to free
 *
 *	Drop a reference to the buffer and free it if the usage count has
 *	hit zero.
 */
void kfree_skb(struct sk_buff *skb)
{
	if (unlikely(!skb))
		return;
	if (likely(atomic_read(&skb->users) == 1))
		smp_rmb();
	else if (likely(!atomic_dec_and_test(&skb->users)))
		return;
	__kfree_skb(skb);
}

/**
 *	skb_clone	-	duplicate an sk_buff
 *	@skb: buffer to clone
 *	@gfp_mask: allocation priority
 *
 *	Duplicate an &sk_buff. The new one is not owned by a socket. Both
 *	copies share the same packet data but not structure. The new
 *	buffer has a reference count of 1. If the allocation fails the
 *	function returns %NULL otherwise the new buffer is returned.
 *
 *	If this function is called from an interrupt gfp_mask() must be
 *	%GFP_ATOMIC.
 */

struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
{
	struct sk_buff *n;

	n = skb + 1;
	if (skb->fclone == SKB_FCLONE_ORIG &&
	    n->fclone == SKB_FCLONE_UNAVAILABLE) {
		atomic_t *fclone_ref = (atomic_t *) (n + 1);
		n->fclone = SKB_FCLONE_CLONE;
		atomic_inc(fclone_ref);
	} else {
		n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
		if (!n)
			return NULL;
		n->fclone = SKB_FCLONE_UNAVAILABLE;
	}

#define C(x) n->x = skb->x

	n->next = n->prev = NULL;
	n->sk = NULL;
	C(tstamp);
	C(dev);
	C(h);
	C(nh);
	C(mac);
	C(dst);
	dst_clone(skb->dst);
	C(sp);
#ifdef CONFIG_INET
	secpath_get(skb->sp);
#endif
	memcpy(n->cb, skb->cb, sizeof(skb->cb));
	C(len);
	C(data_len);
	C(mac_len);
	C(csum);
	C(local_df);
	n->cloned = 1;
	n->nohdr = 0;
	C(pkt_type);
	C(ip_summed);
	C(priority);
#if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
	C(ipvs_property);
#endif
	C(protocol);
	n->destructor = NULL;
	C(mark);
#ifdef CONFIG_NETFILTER
	C(nfct);
	nf_conntrack_get(skb->nfct);
	C(nfctinfo);
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
	C(nfct_reasm);
	nf_conntrack_get_reasm(skb->nfct_reasm);
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
	C(nf_bridge);
	nf_bridge_get(skb->nf_bridge);
#endif
#endif /*CONFIG_NETFILTER*/
#ifdef CONFIG_NET_SCHED
	C(tc_index);
#ifdef CONFIG_NET_CLS_ACT
	n->tc_verd = SET_TC_VERD(skb->tc_verd,0);
	n->tc_verd = CLR_TC_OK2MUNGE(n->tc_verd);
	n->tc_verd = CLR_TC_MUNGED(n->tc_verd);
	C(input_dev);
#endif
	skb_copy_secmark(n, skb);
#endif
	C(truesize);
	atomic_set(&n->users, 1);
	C(head);
	C(data);
	C(tail);
	C(end);

	atomic_inc(&(skb_shinfo(skb)->dataref));
	skb->cloned = 1;

	return n;
}

static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
{
	/*
	 *	Shift between the two data areas in bytes
	 */
	unsigned long offset = new->data - old->data;

	new->sk		= NULL;
	new->dev	= old->dev;
	new->priority	= old->priority;
	new->protocol	= old->protocol;
	new->dst	= dst_clone(old->dst);
#ifdef CONFIG_INET
	new->sp		= secpath_get(old->sp);
#endif
	new->h.raw	= old->h.raw + offset;
	new->nh.raw	= old->nh.raw + offset;
	new->mac.raw	= old->mac.raw + offset;
	memcpy(new->cb, old->cb, sizeof(old->cb));
	new->local_df	= old->local_df;
	new->fclone	= SKB_FCLONE_UNAVAILABLE;
	new->pkt_type	= old->pkt_type;
	new->tstamp	= old->tstamp;
	new->destructor = NULL;
	new->mark	= old->mark;
#ifdef CONFIG_NETFILTER
	new->nfct	= old->nfct;
	nf_conntrack_get(old->nfct);
	new->nfctinfo	= old->nfctinfo;
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
	new->nfct_reasm = old->nfct_reasm;
	nf_conntrack_get_reasm(old->nfct_reasm);
#endif
#if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
	new->ipvs_property = old->ipvs_property;
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
	new->nf_bridge	= old->nf_bridge;
	nf_bridge_get(old->nf_bridge);
#endif
#endif
#ifdef CONFIG_NET_SCHED
#ifdef CONFIG_NET_CLS_ACT
	new->tc_verd = old->tc_verd;
#endif
	new->tc_index	= old->tc_index;
#endif
	skb_copy_secmark(new, old);
	atomic_set(&new->users, 1);
	skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
	skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
	skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
}

/**
 *	skb_copy	-	create private copy of an sk_buff
 *	@skb: buffer to copy
 *	@gfp_mask: allocation priority
 *
 *	Make a copy of both an &sk_buff and its data. This is used when the
 *	caller wishes to modify the data and needs a private copy of the
 *	data to alter. Returns %NULL on failure or the pointer to the buffer
 *	on success. The returned buffer has a reference count of 1.
 *
 *	As by-product this function converts non-linear &sk_buff to linear
 *	one, so that &sk_buff becomes completely private and caller is allowed
 *	to modify all the data of returned buffer. This means that this
 *	function is not recommended for use in circumstances when only
 *	header is going to be modified. Use pskb_copy() instead.
 */

struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
{
	int headerlen = skb->data - skb->head;
	/*
	 *	Allocate the copy buffer
	 */
	struct sk_buff *n = alloc_skb(skb->end - skb->head + skb->data_len,
				      gfp_mask);
	if (!n)
		return NULL;

	/* Set the data pointer */
	skb_reserve(n, headerlen);
	/* Set the tail pointer and length */
	skb_put(n, skb->len);
	n->csum	     = skb->csum;
	n->ip_summed = skb->ip_summed;

	if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len))
		BUG();

	copy_skb_header(n, skb);
	return n;
}


/**
 *	pskb_copy	-	create copy of an sk_buff with private head.
 *	@skb: buffer to copy
 *	@gfp_mask: allocation priority
 *
 *	Make a copy of both an &sk_buff and part of its data, located
 *	in header. Fragmented data remain shared. This is used when
 *	the caller wishes to modify only header of &sk_buff and needs
 *	private copy of the header to alter. Returns %NULL on failure
 *	or the pointer to the buffer on success.
 *	The returned buffer has a reference count of 1.
 */

struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
{
	/*
	 *	Allocate the copy buffer
	 */
	struct sk_buff *n = alloc_skb(skb->end - skb->head, gfp_mask);

	if (!n)
		goto out;

	/* Set the data pointer */
	skb_reserve(n, skb->data - skb->head);
	/* Set the tail pointer and length */
	skb_put(n, skb_headlen(skb));
	/* Copy the bytes */
	memcpy(n->data, skb->data, n->len);
	n->csum	     = skb->csum;
	n->ip_summed = skb->ip_summed;

	n->truesize += skb->data_len;
	n->data_len  = skb->data_len;
	n->len	     = skb->len;

	if (skb_shinfo(skb)->nr_frags) {
		int i;

		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
			get_page(skb_shinfo(n)->frags[i].page);
		}
		skb_shinfo(n)->nr_frags = i;
	}

	if (skb_shinfo(skb)->frag_list) {
		skb_shinfo(n)->frag_list = skb_shinfo(skb)->frag_list;
		skb_clone_fraglist(n);
	}

	copy_skb_header(n, skb);
out:
	return n;
}

/**
 *	pskb_expand_head - reallocate header of &sk_buff
 *	@skb: buffer to reallocate
 *	@nhead: room to add at head
 *	@ntail: room to add at tail
 *	@gfp_mask: allocation priority
 *
 *	Expands (or creates identical copy, if &nhead and &ntail are zero)
 *	header of skb. &sk_buff itself is not changed. &sk_buff MUST have
 *	reference count of 1. Returns zero in the case of success or error,
 *	if expansion failed. In the last case, &sk_buff is not changed.
 *
 *	All the pointers pointing into skb header may change and must be
 *	reloaded after call to this function.
 */

int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
		     gfp_t gfp_mask)
{
	int i;
	u8 *data;
	int size = nhead + (skb->end - skb->head) + ntail;
	long off;

	if (skb_shared(skb))
		BUG();

	size = SKB_DATA_ALIGN(size);

	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
	if (!data)
		goto nodata;

	/* Copy only real data... and, alas, header. This should be
	 * optimized for the cases when header is void. */
	memcpy(data + nhead, skb->head, skb->tail - skb->head);
	memcpy(data + size, skb->end, sizeof(struct skb_shared_info));

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
		get_page(skb_shinfo(skb)->frags[i].page);

	if (skb_shinfo(skb)->frag_list)
		skb_clone_fraglist(skb);

	skb_release_data(skb);

	off = (data + nhead) - skb->head;

	skb->head     = data;
	skb->end      = data + size;
	skb->data    += off;
	skb->tail    += off;
	skb->mac.raw += off;
	skb->h.raw   += off;
	skb->nh.raw  += off;
	skb->cloned   = 0;
	skb->nohdr    = 0;
	atomic_set(&skb_shinfo(skb)->dataref, 1);
	return 0;

nodata:
	return -ENOMEM;
}

/* Make private copy of skb with writable head and some headroom */

struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom)
{
	struct sk_buff *skb2;
	int delta = headroom - skb_headroom(skb);

	if (delta <= 0)
		skb2 = pskb_copy(skb, GFP_ATOMIC);
	else {
		skb2 = skb_clone(skb, GFP_ATOMIC);
		if (skb2 && pskb_expand_head(skb2, SKB_DATA_ALIGN(delta), 0,
					     GFP_ATOMIC)) {
			kfree_skb(skb2);
			skb2 = NULL;
		}
	}
	return skb2;
}


/**
 *	skb_copy_expand	-	copy and expand sk_buff
 *	@skb: buffer to copy
 *	@newheadroom: new free bytes at head
 *	@newtailroom: new free bytes at tail
 *	@gfp_mask: allocation priority
 *
 *	Make a copy of both an &sk_buff and its data and while doing so
 *	allocate additional space.
 *
 *	This is used when the caller wishes to modify the data and needs a
 *	private copy of the data to alter as well as more space for new fields.
 *	Returns %NULL on failure or the pointer to the buffer
 *	on success. The returned buffer has a reference count of 1.
 *
 *	You must pass %GFP_ATOMIC as the allocation priority if this function
 *	is called from an interrupt.
 *
 *	BUG ALERT: ip_summed is not copied. Why does this work? Is it used
 *	only by netfilter in the cases when checksum is recalculated? --ANK
 */
struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
				int newheadroom, int newtailroom,
				gfp_t gfp_mask)
{
	/*
	 *	Allocate the copy buffer
	 */
	struct sk_buff *n = alloc_skb(newheadroom + skb->len + newtailroom,
				      gfp_mask);
	int head_copy_len, head_copy_off;

	if (!n)
		return NULL;

	skb_reserve(n, newheadroom);

	/* Set the tail pointer and length */
	skb_put(n, skb->len);

	head_copy_len = skb_headroom(skb);
	head_copy_off = 0;
	if (newheadroom <= head_copy_len)
		head_copy_len = newheadroom;
	else
		head_copy_off = newheadroom - head_copy_len;

	/* Copy the linear header and data. */
	if (skb_copy_bits(skb, -head_copy_len, n->head + head_copy_off,
			  skb->len + head_copy_len))
		BUG();

	copy_skb_header(n, skb);

	return n;
}

/**
 *	skb_pad			-	zero pad the tail of an skb
 *	@skb: buffer to pad
 *	@pad: space to pad
 *
 *	Ensure that a buffer is followed by a padding area that is zero
 *	filled. Used by network drivers which may DMA or transfer data
 *	beyond the buffer end onto the wire.
 *
 *	May return error in out of memory cases. The skb is freed on error.
 */

int skb_pad(struct sk_buff *skb, int pad)
{
	int err;
	int ntail;

	/* If the skbuff is non linear tailroom is always zero.. */
	if (!skb_cloned(skb) && skb_tailroom(skb) >= pad) {
		memset(skb->data+skb->len, 0, pad);
		return 0;
	}

	ntail = skb->data_len + pad - (skb->end - skb->tail);
	if (likely(skb_cloned(skb) || ntail > 0)) {
		err = pskb_expand_head(skb, 0, ntail, GFP_ATOMIC);
		if (unlikely(err))
			goto free_skb;
	}

	/* FIXME: The use of this function with non-linear skb's really needs
	 * to be audited.
	 */
	err = skb_linearize(skb);
	if (unlikely(err))
		goto free_skb;

	memset(skb->data + skb->len, 0, pad);
	return 0;

free_skb:
	kfree_skb(skb);
	return err;
}

/* Trims skb to length len. It can change skb pointers.
 */

int ___pskb_trim(struct sk_buff *skb, unsigned int len)
{
	struct sk_buff **fragp;
	struct sk_buff *frag;
	int offset = skb_headlen(skb);
	int nfrags = skb_shinfo(skb)->nr_frags;
	int i;
	int err;

	if (skb_cloned(skb) &&
	    unlikely((err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC))))
		return err;

	i = 0;
	if (offset >= len)
		goto drop_pages;

	for (; i < nfrags; i++) {
		int end = offset + skb_shinfo(skb)->frags[i].size;

		if (end < len) {
			offset = end;
			continue;
		}

		skb_shinfo(skb)->frags[i++].size = len - offset;

drop_pages:
		skb_shinfo(skb)->nr_frags = i;

		for (; i < nfrags; i++)
			put_page(skb_shinfo(skb)->frags[i].page);

		if (skb_shinfo(skb)->frag_list)
			skb_drop_fraglist(skb);
		goto done;
	}

	for (fragp = &skb_shinfo(skb)->frag_list; (frag = *fragp);
	     fragp = &frag->next) {
		int end = offset + frag->len;

		if (skb_shared(frag)) {
			struct sk_buff *nfrag;

			nfrag = skb_clone(frag, GFP_ATOMIC);
			if (unlikely(!nfrag))
				return -ENOMEM;

			nfrag->next = frag->next;
			kfree_skb(frag);
			frag = nfrag;
			*fragp = frag;
		}

		if (end < len) {
			offset = end;
			continue;
		}

		if (end > len &&
		    unlikely((err = pskb_trim(frag, len - offset))))
			return err;

		if (frag->next)
			skb_drop_list(&frag->next);
		break;
	}

done:
	if (len > skb_headlen(skb)) {
		skb->data_len -= skb->len - len;
		skb->len       = len;
	} else {
		skb->len       = len;
		skb->data_len  = 0;
		skb->tail      = skb->data + len;
	}

	return 0;
}

/**
 *	__pskb_pull_tail - advance tail of skb header
 *	@skb: buffer to reallocate
 *	@delta: number of bytes to advance tail
 *
 *	The function makes a sense only on a fragmented &sk_buff,
 *	it expands header moving its tail forward and copying necessary
 *	data from fragmented part.
 *
 *	&sk_buff MUST have reference count of 1.
 *
 *	Returns %NULL (and &sk_buff does not change) if pull failed
 *	or value of new tail of skb in the case of success.
 *
 *	All the pointers pointing into skb header may change and must be
 *	reloaded after call to this function.
 */

/* Moves tail of skb head forward, copying data from fragmented part,
 * when it is necessary.
 * 1. It may fail due to malloc failure.
 * 2. It may change skb pointers.
 *
 * It is pretty complicated. Luckily, it is called only in exceptional cases.
 */
unsigned char *__pskb_pull_tail(struct sk_buff *skb, int delta)
{
	/* If skb has not enough free space at tail, get new one
	 * plus 128 bytes for future expansions. If we have enough
	 * room at tail, reallocate without expansion only if skb is cloned.
	 */
	int i, k, eat = (skb->tail + delta) - skb->end;

	if (eat > 0 || skb_cloned(skb)) {
		if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0,
				     GFP_ATOMIC))
			return NULL;
	}

	if (skb_copy_bits(skb, skb_headlen(skb), skb->tail, delta))
		BUG();

	/* Optimization: no fragments, no reasons to preestimate
	 * size of pulled pages. Superb.
	 */
	if (!skb_shinfo(skb)->frag_list)
		goto pull_pages;

	/* Estimate size of pulled pages. */
	eat = delta;
	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		if (skb_shinfo(skb)->frags[i].size >= eat)
			goto pull_pages;
		eat -= skb_shinfo(skb)->frags[i].size;
	}

	/* If we need update frag list, we are in troubles.
	 * Certainly, it possible to add an offset to skb data,
	 * but taking into account that pulling is expected to
	 * be very rare operation, it is worth to fight against
	 * further bloating skb head and crucify ourselves here instead.
	 * Pure masohism, indeed. 8)8)
	 */
	if (eat) {
		struct sk_buff *list = skb_shinfo(skb)->frag_list;
		struct sk_buff *clone = NULL;
		struct sk_buff *insp = NULL;

		do {
			BUG_ON(!list);

			if (list->len <= eat) {
				/* Eaten as whole. */
				eat -= list->len;
				list = list->next;
				insp = list;
			} else {
				/* Eaten partially. */

				if (skb_shared(list)) {
					/* Sucks! We need to fork list. :-( */
					clone = skb_clone(list, GFP_ATOMIC);
					if (!clone)
						return NULL;
					insp = list->next;
					list = clone;
				} else {
					/* This may be pulled without
					 * problems. */
					insp = list;
				}
				if (!pskb_pull(list, eat)) {
					if (clone)
						kfree_skb(clone);
					return NULL;
				}
				break;
			}
		} while (eat);

		/* Free pulled out fragments. */
		while ((list = skb_shinfo(skb)->frag_list) != insp) {
			skb_shinfo(skb)->frag_list = list->next;
			kfree_skb(list);
		}
		/* And insert new clone at head. */
		if (clone) {
			clone->next = list;
			skb_shinfo(skb)->frag_list = clone;
		}
	}
	/* Success! Now we may commit changes to skb data. */

pull_pages:
	eat = delta;
	k = 0;
	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		if (skb_shinfo(skb)->frags[i].size <= eat) {
			put_page(skb_shinfo(skb)->frags[i].page);
			eat -= skb_shinfo(skb)->frags[i].size;
		} else {
			skb_shinfo(skb)->frags[k] = skb_shinfo(skb)->frags[i];
			if (eat) {
				skb_shinfo(skb)->frags[k].page_offset += eat;
				skb_shinfo(skb)->frags[k].size -= eat;
				eat = 0;
			}
			k++;
		}
	}
	skb_shinfo(skb)->nr_frags = k;

	skb->tail     += delta;
	skb->data_len -= delta;

	return skb->tail;
}

/* Copy some data bits from skb to kernel buffer. */

int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len)
{
	int i, copy;
	int start = skb_headlen(skb);

	if (offset > (int)skb->len - len)
		goto fault;

	/* Copy header. */
	if ((copy = start - offset) > 0) {
		if (copy > len)
			copy = len;
		memcpy(to, skb->data + offset, copy);
		if ((len -= copy) == 0)
			return 0;
		offset += copy;
		to     += copy;
	}

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		int end;

		BUG_TRAP(start <= offset + len);

		end = start + skb_shinfo(skb)->frags[i].size;
		if ((copy = end - offset) > 0) {
			u8 *vaddr;

			if (copy > len)
				copy = len;

			vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[i]);
			memcpy(to,
			       vaddr + skb_shinfo(skb)->frags[i].page_offset+
			       offset - start, copy);
			kunmap_skb_frag(vaddr);

			if ((len -= copy) == 0)
				return 0;
			offset += copy;
			to     += copy;
		}
		start = end;
	}

	if (skb_shinfo(skb)->frag_list) {
		struct sk_buff *list = skb_shinfo(skb)->frag_list;

		for (; list; list = list->next) {
			int end;

			BUG_TRAP(start <= offset + len);

			end = start + list->len;
			if ((copy = end - offset) > 0) {
				if (copy > len)
					copy = len;
				if (skb_copy_bits(list, offset - start,
						  to, copy))
					goto fault;
				if ((len -= copy) == 0)
					return 0;
				offset += copy;
				to     += copy;
			}
			start = end;
		}
	}
	if (!len)
		return 0;

fault:
	return -EFAULT;
}

/**
 *	skb_store_bits - store bits from kernel buffer to skb
 *	@skb: destination buffer
 *	@offset: offset in destination
 *	@from: source buffer
 *	@len: number of bytes to copy
 *
 *	Copy the specified number of bytes from the source buffer to the
 *	destination skb.  This function handles all the messy bits of
 *	traversing fragment lists and such.
 */

int skb_store_bits(const struct sk_buff *skb, int offset, void *from, int len)
{
	int i, copy;
	int start = skb_headlen(skb);

	if (offset > (int)skb->len - len)
		goto fault;

	if ((copy = start - offset) > 0) {
		if (copy > len)
			copy = len;
		memcpy(skb->data + offset, from, copy);
		if ((len -= copy) == 0)
			return 0;
		offset += copy;
		from += copy;
	}

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
		int end;

		BUG_TRAP(start <= offset + len);

		end = start + frag->size;
		if ((copy = end - offset) > 0) {
			u8 *vaddr;

			if (copy > len)
				copy = len;

			vaddr = kmap_skb_frag(frag);
			memcpy(vaddr + frag->page_offset + offset - start,
			       from, copy);
			kunmap_skb_frag(vaddr);

			if ((len -= copy) == 0)
				return 0;
			offset += copy;
			from += copy;
		}
		start = end;
	}

	if (skb_shinfo(skb)->frag_list) {
		struct sk_buff *list = skb_shinfo(skb)->frag_list;

		for (; list; list = list->next) {
			int end;

			BUG_TRAP(start <= offset + len);

			end = start + list->len;
			if ((copy = end - offset) > 0) {
				if (copy > len)
					copy = len;
				if (skb_store_bits(list, offset - start,
						   from, copy))
					goto fault;
				if ((len -= copy) == 0)
					return 0;
				offset += copy;
				from += copy;
			}
			start = end;
		}
	}
	if (!len)
		return 0;

fault:
	return -EFAULT;
}

EXPORT_SYMBOL(skb_store_bits);

/* Checksum skb data. */

__wsum skb_checksum(const struct sk_buff *skb, int offset,
			  int len, __wsum csum)
{
	int start = skb_headlen(skb);
	int i, copy = start - offset;
	int pos = 0;

	/* Checksum header. */
	if (copy > 0) {
		if (copy > len)
			copy = len;
		csum = csum_partial(skb->data + offset, copy, csum);
		if ((len -= copy) == 0)
			return csum;
		offset += copy;
		pos	= copy;
	}

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		int end;

		BUG_TRAP(start <= offset + len);

		end = start + skb_shinfo(skb)->frags[i].size;
		if ((copy = end - offset) > 0) {
			__wsum csum2;
			u8 *vaddr;
			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

			if (copy > len)
				copy = len;
			vaddr = kmap_skb_frag(frag);
			csum2 = csum_partial(vaddr + frag->page_offset +
					     offset - start, copy, 0);
			kunmap_skb_frag(vaddr);
			csum = csum_block_add(csum, csum2, pos);
			if (!(len -= copy))
				return csum;
			offset += copy;
			pos    += copy;
		}
		start = end;
	}

	if (skb_shinfo(skb)->frag_list) {
		struct sk_buff *list = skb_shinfo(skb)->frag_list;

		for (; list; list = list->next) {
			int end;

			BUG_TRAP(start <= offset + len);

			end = start + list->len;
			if ((copy = end - offset) > 0) {
				__wsum csum2;
				if (copy > len)
					copy = len;
				csum2 = skb_checksum(list, offset - start,
						     copy, 0);
				csum = csum_block_add(csum, csum2, pos);
				if ((len -= copy) == 0)
					return csum;
				offset += copy;
				pos    += copy;
			}
			start = end;
		}
	}
	BUG_ON(len);

	return csum;
}

/* Both of above in one bottle. */

__wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
				    u8 *to, int len, __wsum csum)
{
	int start = skb_headlen(skb);
	int i, copy = start - offset;
	int pos = 0;

	/* Copy header. */
	if (copy > 0) {
		if (copy > len)
			copy = len;
		csum = csum_partial_copy_nocheck(skb->data + offset, to,
						 copy, csum);
		if ((len -= copy) == 0)
			return csum;
		offset += copy;
		to     += copy;
		pos	= copy;
	}

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		int end;

		BUG_TRAP(start <= offset + len);

		end = start + skb_shinfo(skb)->frags[i].size;
		if ((copy = end - offset) > 0) {
			__wsum csum2;
			u8 *vaddr;
			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

			if (copy > len)
				copy = len;
			vaddr = kmap_skb_frag(frag);
			csum2 = csum_partial_copy_nocheck(vaddr +
							  frag->page_offset +
							  offset - start, to,
							  copy, 0);
			kunmap_skb_frag(vaddr);
			csum = csum_block_add(csum, csum2, pos);
			if (!(len -= copy))
				return csum;
			offset += copy;
			to     += copy;
			pos    += copy;
		}
		start = end;
	}

	if (skb_shinfo(skb)->frag_list) {
		struct sk_buff *list = skb_shinfo(skb)->frag_list;

		for (; list; list = list->next) {
			__wsum csum2;
			int end;

			BUG_TRAP(start <= offset + len);

			end = start + list->len;
			if ((copy = end - offset) > 0) {
				if (copy > len)
					copy = len;
				csum2 = skb_copy_and_csum_bits(list,
							       offset - start,
							       to, copy, 0);
				csum = csum_block_add(csum, csum2, pos);
				if ((len -= copy) == 0)
					return csum;
				offset += copy;
				to     += copy;
				pos    += copy;
			}
			start = end;
		}
	}
	BUG_ON(len);
	return csum;
}

void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
{
	__wsum csum;
	long csstart;

	if (skb->ip_summed == CHECKSUM_PARTIAL)
		csstart = skb->h.raw - skb->data;
	else
		csstart = skb_headlen(skb);

	BUG_ON(csstart > skb_headlen(skb));

	memcpy(to, skb->data, csstart);

	csum = 0;
	if (csstart != skb->len)
		csum = skb_copy_and_csum_bits(skb, csstart, to + csstart,
					      skb->len - csstart, 0);

	if (skb->ip_summed == CHECKSUM_PARTIAL) {
		long csstuff = csstart + skb->csum_offset;

		*((__sum16 *)(to + csstuff)) = csum_fold(csum);
	}
}

/**
 *	skb_dequeue - remove from the head of the queue
 *	@list: list to dequeue from
 *
 *	Remove the head of the list. The list lock is taken so the function
 *	may be used safely with other locking list functions. The head item is
 *	returned or %NULL if the list is empty.
 */

struct sk_buff *skb_dequeue(struct sk_buff_head *list)
{
	unsigned long flags;
	struct sk_buff *result;

	spin_lock_irqsave(&list->lock, flags);
	result = __skb_dequeue(list);
	spin_unlock_irqrestore(&list->lock, flags);
	return result;
}

/**
 *	skb_dequeue_tail - remove from the tail of the queue
 *	@list: list to dequeue from
 *
 *	Remove the tail of the list. The list lock is taken so the function
 *	may be used safely with other locking list functions. The tail item is
 *	returned or %NULL if the list is empty.
 */
struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list)
{
	unsigned long flags;
	struct sk_buff *result;

	spin_lock_irqsave(&list->lock, flags);
	result = __skb_dequeue_tail(list);
	spin_unlock_irqrestore(&list->lock, flags);
	return result;
}

/**
 *	skb_queue_purge - empty a list
 *	@list: list to empty
 *
 *	Delete all buffers on an &sk_buff list. Each buffer is removed from
 *	the list and one reference dropped. This function takes the list
 *	lock and is atomic with respect to other list locking functions.
 */
void skb_queue_purge(struct sk_buff_head *list)
{
	struct sk_buff *skb;
	while ((skb = skb_dequeue(list)) != NULL)
		kfree_skb(skb);
}

/**
 *	skb_queue_head - queue a buffer at the list head
 *	@list: list to use
 *	@newsk: buffer to queue
 *
 *	Queue a buffer at the start of the list. This function takes the
 *	list lock and can be used safely with other locking &sk_buff
 *	functions.
 *
 *	A buffer cannot be placed on two lists at the same time.
 */
void skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk)
{
	unsigned long flags;

	spin_lock_irqsave(&list->lock, flags);
	__skb_queue_head(list, newsk);
	spin_unlock_irqrestore(&list->lock, flags);
}

/**
 *	skb_queue_tail - queue a buffer at the list tail
 *	@list: list to use
 *	@newsk: buffer to queue
 *
 *	Queue a buffer at the tail of the list. This function takes the
 *	list lock and can be used safely with other locking &sk_buff
 *	functions.
 *
 *	A buffer cannot be placed on two lists at the same time.
 */
void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)
{
	unsigned long flags;

	spin_lock_irqsave(&list->lock, flags);
	__skb_queue_tail(list, newsk);
	spin_unlock_irqrestore(&list->lock, flags);
}

/**
 *	skb_unlink	-	remove a buffer from a list
 *	@skb: buffer to remove
 *	@list: list to use
 *
 *	Remove a packet from a list. The list locks are taken and this
 *	function is atomic with respect to other list locked calls
 *
 *	You must know what list the SKB is on.
 */
void skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
{
	unsigned long flags;

	spin_lock_irqsave(&list->lock, flags);
	__skb_unlink(skb, list);
	spin_unlock_irqrestore(&list->lock, flags);
}

/**
 *	skb_append	-	append a buffer
 *	@old: buffer to insert after
 *	@newsk: buffer to insert
 *	@list: list to use
 *
 *	Place a packet after a given packet in a list. The list locks are taken
 *	and this function is atomic with respect to other list locked calls.
 *	A buffer cannot be placed on two lists at the same time.
 */
void skb_append(struct sk_buff *old, struct sk_buff *newsk, struct sk_buff_head *list)
{
	unsigned long flags;

	spin_lock_irqsave(&list->lock, flags);
	__skb_append(old, newsk, list);
	spin_unlock_irqrestore(&list->lock, flags);
}


/**
 *	skb_insert	-	insert a buffer
 *	@old: buffer to insert before
 *	@newsk: buffer to insert
 *	@list: list to use
 *
 *	Place a packet before a given packet in a list. The list locks are
 * 	taken and this function is atomic with respect to other list locked
 *	calls.
 *
 *	A buffer cannot be placed on two lists at the same time.
 */
void skb_insert(struct sk_buff *old, struct sk_buff *newsk, struct sk_buff_head *list)
{
	unsigned long flags;

	spin_lock_irqsave(&list->lock, flags);
	__skb_insert(newsk, old->prev, old, list);
	spin_unlock_irqrestore(&list->lock, flags);
}

#if 0
/*
 * 	Tune the memory allocator for a new MTU size.
 */
void skb_add_mtu(int mtu)
{
	/* Must match allocation in alloc_skb */
	mtu = SKB_DATA_ALIGN(mtu) + sizeof(struct skb_shared_info);

	kmem_add_cache_size(mtu);
}
#endif

static inline void skb_split_inside_header(struct sk_buff *skb,
					   struct sk_buff* skb1,
					   const u32 len, const int pos)
{
	int i;

	memcpy(skb_put(skb1, pos - len), skb->data + len, pos - len);

	/* And move data appendix as is. */
	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
		skb_shinfo(skb1)->frags[i] = skb_shinfo(skb)->frags[i];

	skb_shinfo(skb1)->nr_frags = skb_shinfo(skb)->nr_frags;
	skb_shinfo(skb)->nr_frags  = 0;
	skb1->data_len		   = skb->data_len;
	skb1->len		   += skb1->data_len;
	skb->data_len		   = 0;
	skb->len		   = len;
	skb->tail		   = skb->data + len;
}

static inline void skb_split_no_header(struct sk_buff *skb,
				       struct sk_buff* skb1,
				       const u32 len, int pos)
{
	int i, k = 0;
	const int nfrags = skb_shinfo(skb)->nr_frags;

	skb_shinfo(skb)->nr_frags = 0;
	skb1->len		  = skb1->data_len = skb->len - len;
	skb->len		  = len;
	skb->data_len		  = len - pos;

	for (i = 0; i < nfrags; i++) {
		int size = skb_shinfo(skb)->frags[i].size;

		if (pos + size > len) {
			skb_shinfo(skb1)->frags[k] = skb_shinfo(skb)->frags[i];

			if (pos < len) {
				/* Split frag.
				 * We have two variants in this case:
				 * 1. Move all the frag to the second
				 *    part, if it is possible. F.e.
				 *    this approach is mandatory for TUX,
				 *    where splitting is expensive.
				 * 2. Split it accurately. We do this.
				 */
				get_page(skb_shinfo(skb)->frags[i].page);
				skb_shinfo(skb1)->frags[0].page_offset += len - pos;
				skb_shinfo(skb1)->frags[0].size -= len - pos;
				skb_shinfo(skb)->frags[i].size	= len - pos;
				skb_shinfo(skb)->nr_frags++;
			}
			k++;
		} else
			skb_shinfo(skb)->nr_frags++;
		pos += size;
	}
	skb_shinfo(skb1)->nr_frags = k;
}

/**
 * skb_split - Split fragmented skb to two parts at length len.
 * @skb: the buffer to split
 * @skb1: the buffer to receive the second part
 * @len: new length for skb
 */
void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len)
{
	int pos = skb_headlen(skb);

	if (len < pos)	/* Split line is inside header. */
		skb_split_inside_header(skb, skb1, len, pos);
	else		/* Second chunk has no header, nothing to copy. */
		skb_split_no_header(skb, skb1, len, pos);
}

/**
 * skb_prepare_seq_read - Prepare a sequential read of skb data
 * @skb: the buffer to read
 * @from: lower offset of data to be read
 * @to: upper offset of data to be read
 * @st: state variable
 *
 * Initializes the specified state variable. Must be called before
 * invoking skb_seq_read() for the first time.
 */
void skb_prepare_seq_read(struct sk_buff *skb, unsigned int from,
			  unsigned int to, struct skb_seq_state *st)
{
	st->lower_offset = from;
	st->upper_offset = to;
	st->root_skb = st->cur_skb = skb;
	st->frag_idx = st->stepped_offset = 0;
	st->frag_data = NULL;
}

/**
 * skb_seq_read - Sequentially read skb data
 * @consumed: number of bytes consumed by the caller so far
 * @data: destination pointer for data to be returned
 * @st: state variable
 *
 * Reads a block of skb data at &consumed relative to the
 * lower offset specified to skb_prepare_seq_read(). Assigns
 * the head of the data block to &data and returns the length
 * of the block or 0 if the end of the skb data or the upper
 * offset has been reached.
 *
 * The caller is not required to consume all of the data
 * returned, i.e. &consumed is typically set to the number
 * of bytes already consumed and the next call to
 * skb_seq_read() will return the remaining part of the block.
 *
 * Note: The size of each block of data returned can be arbitrary,
 *       this limitation is the cost for zerocopy sequential
 *       reads of potentially non linear data.
 *
 * Note: Fragment lists within fragments are not implemented
 *       at the moment, state->root_skb could be replaced with
 *       a stack for this purpose.
 */
unsigned int skb_seq_read(unsigned int consumed, const u8 **data,
			  struct skb_seq_state *st)
{
	unsigned int block_limit, abs_offset = consumed + st->lower_offset;
	skb_frag_t *frag;

	if (unlikely(abs_offset >= st->upper_offset))
		return 0;

next_skb:
	block_limit = skb_headlen(st->cur_skb);

	if (abs_offset < block_limit) {
		*data = st->cur_skb->data + abs_offset;
		return block_limit - abs_offset;
	}

	if (st->frag_idx == 0 && !st->frag_data)
		st->stepped_offset += skb_headlen(st->cur_skb);

	while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
		frag = &skb_shinfo(st->cur_skb)->frags[st->frag_idx];
		block_limit = frag->size + st->stepped_offset;

		if (abs_offset < block_limit) {
			if (!st->frag_data)
				st->frag_data = kmap_skb_frag(frag);

			*data = (u8 *) st->frag_data + frag->page_offset +
				(abs_offset - st->stepped_offset);

			return block_limit - abs_offset;
		}

		if (st->frag_data) {
			kunmap_skb_frag(st->frag_data);
			st->frag_data = NULL;
		}

		st->frag_idx++;
		st->stepped_offset += frag->size;
	}

	if (st->cur_skb->next) {
		st->cur_skb = st->cur_skb->next;
		st->frag_idx = 0;
		goto next_skb;
	} else if (st->root_skb == st->cur_skb &&
		   skb_shinfo(st->root_skb)->frag_list) {
		st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
		goto next_skb;
	}

	return 0;
}

/**
 * skb_abort_seq_read - Abort a sequential read of skb data
 * @st: state variable
 *
 * Must be called if skb_seq_read() was not called until it
 * returned 0.
 */
void skb_abort_seq_read(struct skb_seq_state *st)
{
	if (st->frag_data)
		kunmap_skb_frag(st->frag_data);
}

#define TS_SKB_CB(state)	((struct skb_seq_state *) &((state)->cb))

static unsigned int skb_ts_get_next_block(unsigned int offset, const u8 **text,
					  struct ts_config *conf,
					  struct ts_state *state)
{
	return skb_seq_read(offset, text, TS_SKB_CB(state));
}

static void skb_ts_finish(struct ts_config *conf, struct ts_state *state)
{
	skb_abort_seq_read(TS_SKB_CB(state));
}

/**
 * skb_find_text - Find a text pattern in skb data
 * @skb: the buffer to look in
 * @from: search offset
 * @to: search limit
 * @config: textsearch configuration
 * @state: uninitialized textsearch state variable
 *
 * Finds a pattern in the skb data according to the specified
 * textsearch configuration. Use textsearch_next() to retrieve
 * subsequent occurrences of the pattern. Returns the offset
 * to the first occurrence or UINT_MAX if no match was found.
 */
unsigned int skb_find_text(struct sk_buff *skb, unsigned int from,
			   unsigned int to, struct ts_config *config,
			   struct ts_state *state)
{
	unsigned int ret;

	config->get_next_block = skb_ts_get_next_block;
	config->finish = skb_ts_finish;

	skb_prepare_seq_read(skb, from, to, TS_SKB_CB(state));

	ret = textsearch_find(config, state);
	return (ret <= to - from ? ret : UINT_MAX);
}

/**
 * skb_append_datato_frags: - append the user data to a skb
 * @sk: sock  structure
 * @skb: skb structure to be appened with user data.
 * @getfrag: call back function to be used for getting the user data
 * @from: pointer to user message iov
 * @length: length of the iov message
 *
 * Description: This procedure append the user data in the fragment part
 * of the skb if any page alloc fails user this procedure returns  -ENOMEM
 */
int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
			int (*getfrag)(void *from, char *to, int offset,
					int len, int odd, struct sk_buff *skb),
			void *from, int length)
{
	int frg_cnt = 0;
	skb_frag_t *frag = NULL;
	struct page *page = NULL;
	int copy, left;
	int offset = 0;
	int ret;

	do {
		/* Return error if we don't have space for new frag */
		frg_cnt = skb_shinfo(skb)->nr_frags;
		if (frg_cnt >= MAX_SKB_FRAGS)
			return -EFAULT;

		/* allocate a new page for next frag */
		page = alloc_pages(sk->sk_allocation, 0);

		/* If alloc_page fails just return failure and caller will
		 * free previous allocated pages by doing kfree_skb()
		 */
		if (page == NULL)
			return -ENOMEM;

		/* initialize the next frag */
		sk->sk_sndmsg_page = page;
		sk->sk_sndmsg_off = 0;
		skb_fill_page_desc(skb, frg_cnt, page, 0, 0);
		skb->truesize += PAGE_SIZE;
		atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc);

		/* get the new initialized frag */
		frg_cnt = skb_shinfo(skb)->nr_frags;
		frag = &skb_shinfo(skb)->frags[frg_cnt - 1];

		/* copy the user data to page */
		left = PAGE_SIZE - frag->page_offset;
		copy = (length > left)? left : length;

		ret = getfrag(from, (page_address(frag->page) +
			    frag->page_offset + frag->size),
			    offset, copy, 0, skb);
		if (ret < 0)
			return -EFAULT;

		/* copy was successful so update the size parameters */
		sk->sk_sndmsg_off += copy;
		frag->size += copy;
		skb->len += copy;
		skb->data_len += copy;
		offset += copy;
		length -= copy;

	} while (length > 0);

	return 0;
}

/**
 *	skb_pull_rcsum - pull skb and update receive checksum
 *	@skb: buffer to update
 *	@start: start of data before pull
 *	@len: length of data pulled
 *
 *	This function performs an skb_pull on the packet and updates
 *	update the CHECKSUM_COMPLETE checksum.  It should be used on
 *	receive path processing instead of skb_pull unless you know
 *	that the checksum difference is zero (e.g., a valid IP header)
 *	or you are setting ip_summed to CHECKSUM_NONE.
 */
unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
{
	BUG_ON(len > skb->len);
	skb->len -= len;
	BUG_ON(skb->len < skb->data_len);
	skb_postpull_rcsum(skb, skb->data, len);
	return skb->data += len;
}

EXPORT_SYMBOL_GPL(skb_pull_rcsum);

/**
 *	skb_segment - Perform protocol segmentation on skb.
 *	@skb: buffer to segment
 *	@features: features for the output path (see dev->features)
 *
 *	This function performs segmentation on the given skb.  It returns
 *	the list of generated segments, linked through their next pointers.
 *	If an allocation fails, the segments built so far are freed and an
 *	ERR_PTR() encoded error is returned.
 */
struct sk_buff *skb_segment(struct sk_buff *skb, int features)
{
	struct sk_buff *segs = NULL;
	struct sk_buff *tail = NULL;
	unsigned int mss = skb_shinfo(skb)->gso_size;
	unsigned int doffset = skb->data - skb->mac.raw;
	unsigned int offset = doffset;
	unsigned int headroom;
	unsigned int len;
	int sg = features & NETIF_F_SG;
	int nfrags = skb_shinfo(skb)->nr_frags;
	int err = -ENOMEM;
	int i = 0;
	int pos;

	__skb_push(skb, doffset);
	headroom = skb_headroom(skb);
	pos = skb_headlen(skb);

	do {
		struct sk_buff *nskb;
		skb_frag_t *frag;
		int hsize;
		int k;
		int size;

		len = skb->len - offset;
		if (len > mss)
			len = mss;

		hsize = skb_headlen(skb) - offset;
		if (hsize < 0)
			hsize = 0;
		if (hsize > len || !sg)
			hsize = len;

		nskb = alloc_skb(hsize + doffset + headroom, GFP_ATOMIC);
		if (unlikely(!nskb))
			goto err;

		if (segs)
			tail->next = nskb;
		else
			segs = nskb;
		tail = nskb;

		nskb->dev = skb->dev;
		nskb->priority = skb->priority;
		nskb->protocol = skb->protocol;
		nskb->dst = dst_clone(skb->dst);
		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
		nskb->pkt_type = skb->pkt_type;
		nskb->mac_len = skb->mac_len;

		skb_reserve(nskb, headroom);
		nskb->mac.raw = nskb->data;
		nskb->nh.raw = nskb->data + skb->mac_len;
		nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw);
		memcpy(skb_put(nskb, doffset), skb->data, doffset);

		if (!sg) {
			nskb->csum = skb_copy_and_csum_bits(skb, offset,
							    skb_put(nskb, len),
							    len, 0);
			continue;
		}

		frag = skb_shinfo(nskb)->frags;
		k = 0;

		nskb->ip_summed = CHECKSUM_PARTIAL;
		nskb->csum = skb->csum;
		memcpy(skb_put(nskb, hsize), skb->data + offset, hsize);

		while (pos < offset + len) {
			BUG_ON(i >= nfrags);

			*frag = skb_shinfo(skb)->frags[i];
			get_page(frag->page);
			size = frag->size;

			if (pos < offset) {
				frag->page_offset += offset - pos;
				frag->size -= offset - pos;
			}

			k++;

			if (pos + size <= offset + len) {
				i++;
				pos += size;
			} else {
				frag->size -= pos + size - (offset + len);
				break;
			}

			frag++;
		}

		skb_shinfo(nskb)->nr_frags = k;
		nskb->data_len = len - hsize;
		nskb->len += nskb->data_len;
		nskb->truesize += nskb->data_len;
	} while ((offset += len) < skb->len);

	return segs;

err:
	while ((skb = segs)) {
		segs = skb->next;
		kfree_skb(skb);
	}
	return ERR_PTR(err);
}

EXPORT_SYMBOL_GPL(skb_segment);

void __init skb_init(void)
{
	skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
					      sizeof(struct sk_buff),
					      0,
					      SLAB_HWCACHE_ALIGN|SLAB_PANIC,
					      NULL, NULL);
	skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
						(2*sizeof(struct sk_buff)) +
						sizeof(atomic_t),
						0,
						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
						NULL, NULL);
}

EXPORT_SYMBOL(___pskb_trim);
EXPORT_SYMBOL(__kfree_skb);
EXPORT_SYMBOL(kfree_skb);
EXPORT_SYMBOL(__pskb_pull_tail);
EXPORT_SYMBOL(__alloc_skb);
EXPORT_SYMBOL(__netdev_alloc_skb);
EXPORT_SYMBOL(pskb_copy);
EXPORT_SYMBOL(pskb_expand_head);
EXPORT_SYMBOL(skb_checksum);
EXPORT_SYMBOL(skb_clone);
EXPORT_SYMBOL(skb_clone_fraglist);
EXPORT_SYMBOL(skb_copy);
EXPORT_SYMBOL(skb_copy_and_csum_bits);
EXPORT_SYMBOL(skb_copy_and_csum_dev);
EXPORT_SYMBOL(skb_copy_bits);
EXPORT_SYMBOL(skb_copy_expand);
EXPORT_SYMBOL(skb_over_panic);
EXPORT_SYMBOL(skb_pad);
EXPORT_SYMBOL(skb_realloc_headroom);
EXPORT_SYMBOL(skb_under_panic);
EXPORT_SYMBOL(skb_dequeue);
EXPORT_SYMBOL(skb_dequeue_tail);
EXPORT_SYMBOL(skb_insert);
EXPORT_SYMBOL(skb_queue_purge);
EXPORT_SYMBOL(skb_queue_head);
EXPORT_SYMBOL(skb_queue_tail);
EXPORT_SYMBOL(skb_unlink);
EXPORT_SYMBOL(skb_append);
EXPORT_SYMBOL(skb_split);
EXPORT_SYMBOL(skb_prepare_seq_read);
EXPORT_SYMBOL(skb_seq_read);
EXPORT_SYMBOL(skb_abort_seq_read);
EXPORT_SYMBOL(skb_find_text);
EXPORT_SYMBOL(skb_append_datato_frags);


* Re: [PATCH]: remove netif_running() check from myri10ge_poll()
From: David Miller @ 2007-12-21  0:37 UTC (permalink / raw)
  To: akpm; +Cc: gallatin, jeff, netdev, linux-kernel
In-Reply-To: <20071220160518.fa65ac90.akpm@linux-foundation.org>

From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 20 Dec 2007 16:05:18 -0800

> On Wed, 12 Dec 2007 11:02:43 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Gallatin <gallatin@myri.com>
> > Date: Wed, 12 Dec 2007 13:38:34 -0500
> > 
> > > Remove the bogus netif_running() check from myri10ge_poll().
> > > 
> > > This eliminates any chance that myri10ge_poll() can trigger
> > > an oops by calling netif_rx_complete() and returning
> > > with work_done == budget.
> > > 
> > > Signed-off-by: Andrew Gallatin <gallatin@myri.com>
> > 
> > Acked-by: David S. Miller <davem@davemloft.net>
> 
> hm, eight days old, fixes a possible oops and hasn't been merged anywhere
> yet?
> 
> I'll put it in my for-2.6.24-via-other-subsystem queue.

This actually adds a bug back into the code.

We're trying to work out how to cleanly break out of
the net_rx_action() ->poll() loop when a device is
brought down yet getting hammered with packets.

I'll try to devote some time to this over the weekend,
meanwhile just toss this patch, we know it's an issue
that all the drivers need to get audited for but we
can't fix that until the above paragraph stuff is worked
out.

Thanks.


* Re: [PATCH net-2.6.25 3/3] Uninline the inet_twsk_put function
From: David Miller @ 2007-12-21  0:08 UTC (permalink / raw)
  To: netdev; +Cc: xemul, netdev, devel
In-Reply-To: <200712201932.45900.netdev@axxeo.de>

From: Ingo Oeser <netdev@axxeo.de>
Date: Thu, 20 Dec 2007 19:32:45 +0100

> static inline void inet_twsk_put(struct inet_timewait_sock *tw)
> {
> 	kref_put(&tw->kref, inet_twsk_release);
> }
> 
> David, can you see any reason (e.g. some crazy lock stuff) NOT to do this?

Look at how this datastructure actually works before making
such suggestions, don't just look at the context provided
purely by a patch.

"inet_timewait_sock" begins with a "struct sock_common"
which is where the atomic_t is, and:

#define tw_refcnt		__tw_common.skc_refcnt

So you would have to change struct sock_common over to kref, and thus
the entire networking, in order to make such a change.

I see zero value in this.  There are millions of more useful things to
invest that kind of time on.

But you would have seen this instantly if you had spent 5 seconds
looking at how these data structures are defined.  Instead you chose
to make me do it and explain it to you.
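
To make the layout point concrete, here is a minimal user-space
sketch.  The struct names ending in _sketch, the plain int counter
(standing in for the kernel's atomic_t) and the helpers tw_hold() and
tw_put() are assumptions made for illustration, not the kernel's
definitions.  Only the tw_refcnt alias follows the #define quoted
above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the common header shared by every socket type.
 * The plain int is an assumption; it takes the place of an atomic_t. */
struct sock_common_sketch {
	int skc_refcnt;
};

/* Simplified stand-in for a timewait socket: the common header comes
 * first, so the reference count physically lives inside it. */
struct tw_sock_sketch {
	struct sock_common_sketch __tw_common;
	int tw_timeout;		/* hypothetical extra field */
};

/* Same aliasing trick as the #define quoted in the mail. */
#define tw_refcnt	__tw_common.skc_refcnt

/* Hypothetical helper: take one reference. */
static void tw_hold(struct tw_sock_sketch *tw)
{
	tw->tw_refcnt++;
}

/* Hypothetical helper: drop one reference, return 1 if it was the last. */
static int tw_put(struct tw_sock_sketch *tw)
{
	return --tw->tw_refcnt == 0;
}
```

Because tw_refcnt is only an alias into the shared header, changing
its type means changing the shared header, and with it every other
structure that embeds that header.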


* Re: [PATCH]: remove netif_running() check from myri10ge_poll()
From: Andrew Morton @ 2007-12-21  0:05 UTC (permalink / raw)
  To: David Miller; +Cc: gallatin, jeff, netdev, linux-kernel
In-Reply-To: <20071212.110243.202840959.davem@davemloft.net>

On Wed, 12 Dec 2007 11:02:43 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Gallatin <gallatin@myri.com>
> Date: Wed, 12 Dec 2007 13:38:34 -0500
> 
> > Remove the bogus netif_running() check from myri10ge_poll().
> > 
> > This eliminates any chance that myri10ge_poll() can trigger
> > an oops by calling netif_rx_complete() and returning
> > with work_done == budget.
> > 
> > Signed-off-by: Andrew Gallatin <gallatin@myri.com>
> 
> Acked-by: David S. Miller <davem@davemloft.net>

hm, eight days old, fixes a possible oops and hasn't been merged anywhere
yet?

I'll put it in my for-2.6.24-via-other-subsystem queue.


* Re: TSO trimming question
From: David Miller @ 2007-12-20 23:55 UTC (permalink / raw)
  To: herbert; +Cc: ilpo.jarvinen, netdev
In-Reply-To: <20071220140012.GA22495@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 20 Dec 2007 22:00:12 +0800

> On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote:
> >
> > In the most ideal sense, tcp_window_allows() should probably
> > be changed to only return MSS multiples.
> > 
> > Unfortunately this would add an expensive modulo operation,
> > however I think it would eliminate this problem case.
> 
> Well you only have to divide in the unlikely case of us being
> limited by the receiver window.  In that case speed is probably
> not of the essence anyway.

Agreed, to some extent.

I say "to some extent" because it might be realistic, with
lots (millions) of sockets to hit this case a lot.

There are so many things that are a "don't care" performance
wise until you have a lot of stinky connections over crappy
links.
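
A rough sketch of the idea under discussion, assuming a hypothetical
helper window_allows_mss() (this is not the kernel's
tcp_window_allows(), and the parameter names are made up for
illustration).  The modulo is only paid for when the receiver window
is the limiting factor:

```c
#include <assert.h>

/* Hypothetical helper: how many bytes may be sent, given the number of
 * bytes wanted, the remaining receiver window and the MSS. */
static unsigned int window_allows_mss(unsigned int wanted,
				      unsigned int window,
				      unsigned int mss)
{
	assert(mss > 0);

	/* Common case: the window is not the limit, no division needed. */
	if (wanted <= window)
		return wanted;

	/* Unlikely case: limited by the receiver window.  Round the
	 * window down to a whole number of segments. */
	return window - (window % mss);
}
```

With an MSS of 1460 and a window of 4000 bytes, a request for 10000
bytes would be trimmed to 2920, i.e. two full segments.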


* Re: [PATCH net-2.6.25][NEIGH] Make neigh_add_timer symmetrical to neigh_del_timer
From: David Miller @ 2007-12-20 23:50 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <476A3AED.7080503@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Thu, 20 Dec 2007 12:50:37 +0300

> The neigh_del_timer() looks sane - it removes the timer and
> (conditionally) puts the neighbor. I expected, that the
> neigh_add_timer() is symmetrical to the del one - i.e. it
> holds the neighbor and arms the timer - but it turned out
> that it was not so.
> 
> I think, that making them look symmetrical makes the code 
> more readable.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

I agree, it looks more readable now, applied.

This code used to be a lot worse, I think we had some
confusion about whether the timer should always not be
pending in these circumstances.  But that was a bug
fix from a long time ago, however I believe that's
where the dump_stack() bug check came from in the
add timer case.
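
For readers without the patch at hand, the symmetry being described
can be modelled in user space roughly as follows.  This is a sketch
only: struct neigh_sketch, its fields and both helpers are
hypothetical stand-ins, not the code from net/core/neighbour.c.

```c
#include <assert.h>

/* Hypothetical model of a neighbour entry with a timer. */
struct neigh_sketch {
	int refcnt;		/* references held on the entry */
	int timer_pending;	/* 1 while the timer is armed */
};

/* Hold the entry and arm the timer.  Returns 1 on a double add, which
 * is treated as a caller bug: it is reported and nothing is changed. */
static int neigh_add_timer_sketch(struct neigh_sketch *n)
{
	if (n->timer_pending)
		return 1;
	n->refcnt++;
	n->timer_pending = 1;
	return 0;
}

/* Mirror image: disarm the timer and, only if it was pending, put the
 * entry.  Returns 1 if a pending timer was removed. */
static int neigh_del_timer_sketch(struct neigh_sketch *n)
{
	if (!n->timer_pending)
		return 0;
	n->timer_pending = 0;
	n->refcnt--;
	return 1;
}
```

Each successful add is undone by exactly one successful del, so the
reference count ends where it started.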


* Re: [PATCH net-2.6.25 1/19] [NETNS] Add netns parameter to fib_rules_(un)register.
From: David Miller @ 2007-12-20 23:46 UTC (permalink / raw)
  To: den; +Cc: benjamin.thery, dlezcano, devel, containers, netdev, xemul
In-Reply-To: <1198077889-10693-2-git-send-email-den@openvz.org>

From: "Denis V. Lunev" <den@openvz.org>
Date: Wed, 19 Dec 2007 18:24:31 +0300

> @@ -101,14 +101,12 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
>  	return frh->table;
>  }
>  
> -extern int			fib_rules_register(struct fib_rules_ops *);
> -extern int			fib_rules_unregister(struct fib_rules_ops *);
> -extern void                     fib_rules_cleanup_ops(struct fib_rules_ops *);
> +extern int fib_rules_register(struct net *, struct fib_rules_ops *);
> +extern int fib_rules_unregister(struct net *, struct fib_rules_ops *);
> +extern void fib_rules_cleanup_ops(struct fib_rules_ops *);
>  
> -extern int			fib_rules_lookup(struct fib_rules_ops *,
> -						 struct flowi *, int flags,
> -						 struct fib_lookup_arg *);
> -extern int			fib_default_rule_add(struct fib_rules_ops *,
> -						     u32 pref, u32 table,
> -						     u32 flags);
> +extern int fib_rules_lookup(struct fib_rules_ops *, struct flowi *, int flags,
> +			    struct fib_lookup_arg *);
> +extern int fib_default_rule_add(struct fib_rules_ops *, u32 pref, u32 table,
> +				u32 flags);
>  #endif

Please do not make gratuitous coding style changes like this!

What bothers you so much that there is lots of whitespace there after
the "extern int"?  Does it bother you so much that you think the side
effect of your patch being unreadable is worth it?!?!

Why is it unreadable?  I'm glad you asked....

Just like me, someone will have to read this over carefully to
see what you're actually doing.

Are you deleting all the existing declarations and adding new
ones with different names?

Are you deleting some of them, but keeping others yet changing
the arguments to them somehow?

Are you deleting some of them, but masturbating with the coding
style of others?

NOBODY KNOWS!

Whereas if you just deleted the lines for the functions you
are removing, it would be totally clear what is happening.

This patch, from a reviewability standpoint, sucks.  It makes
efficient patch review next to impossible.

I'm not looking at the rest of this patch set, clean this stuff up and
resubmit it all, thank you.

^ permalink raw reply

* Re: [patch 1/2][NETNS] net: Modify the neighbour table code so it handles multiple network namespaces
From: David Miller @ 2007-12-20 23:40 UTC (permalink / raw)
  To: dlezcano; +Cc: netdev, den, benjamin.thery, ebiederm
In-Reply-To: <20071219145830.605868037@ICON-9-164-138-215.megacenter.de.ibm.com>

From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Wed, 19 Dec 2007 15:55:45 +0100

> -struct neigh_seq_state {
> +struct neigh_seq_state
> +{

Please don't make coding style changes like this.

The accepted convention is:

struct NAME {
	...
};

I know the other structs in that header file use the lousy:

struct NAME
{
};

format, but that doesn't make it right and we gain nothing
by taking a step backwards instead of converting all the
instances in that header file over to the correct style.

^ permalink raw reply

* Re: [PATCH net-2.6.25 3/3] Uninline the inet_twsk_put function
From: David Miller @ 2007-12-20 23:33 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <4768F8CD.2050209@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 19 Dec 2007 13:56:13 +0300

> This one is not that big, but is widely used: saves 1200 bytes 
> from net/ipv4/built-in.o
> 
> add/remove: 1/0 grow/shrink: 1/12 up/down: 97/-1300 (-1203)
> function                                     old     new   delta
> inet_twsk_put                                  -      87     +87
> __inet_lookup_listener                       274     284     +10
> tcp_sacktag_write_queue                     2255    2254      -1
> tcp_time_wait                                482     411     -71
> __inet_check_established                     796     722     -74
> tcp_v4_err                                   973     898     -75
> __inet_twsk_kill                             230     154     -76
> inet_twsk_deschedule                         180     103     -77
> tcp_v4_do_rcv                                462     384     -78
> inet_hash_connect                            686     607     -79
> inet_twdr_do_twkill_work                     236     150     -86
> inet_twdr_twcal_tick                         395     307     -88
> tcp_v4_rcv                                  1744    1480    -264
> tcp_timewait_state_process                   975     644    -331
> 
> Export it for ipv6 module.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied, thanks.
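
As a cross-check of the summary line in the quoted table (which
appears to be scripts/bloat-o-meter output), the per-function deltas
can be summed directly.  A minimal sketch: the toy_* names are
hypothetical and the numbers are copied from the delta column above.

```c
#include <stddef.h>

/* Per-function size deltas, in bytes, copied from the quoted table. */
static const int toy_deltas[] = {
	+87, +10, -1, -71, -74, -75, -76, -77, -78, -79, -86, -88, -264, -331,
};

struct toy_totals {
	int up;		/* sum of the positive deltas */
	int down;	/* sum of the negative deltas */
	int net;	/* up + down */
};

/* Sum growth and shrinkage separately, then combine them. */
static struct toy_totals toy_sum_deltas(const int *d, size_t n)
{
	struct toy_totals t = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (d[i] > 0)
			t.up += d[i];
		else
			t.down += d[i];
	}
	t.net = t.up + t.down;
	return t;
}
```

Called on toy_deltas, this gives up = 97, down = -1300 and
net = -1203, matching "up/down: 97/-1300 (-1203)".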

^ permalink raw reply

* Re: [PATCH net-2.6.25 2/3] Uninline the __inet_lookup_established function
From: David Miller @ 2007-12-20 23:32 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <4768F834.3060100@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 19 Dec 2007 13:53:40 +0300

> This is -700 bytes from the net/ipv4/built-in.o
> 
> add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700)
> function                                     old     new   delta
> __inet_lookup_established                      -     339    +339
> tcp_sacktag_write_queue                     2254    2255      +1
> tcp_v4_err                                  1304     973    -331
> tcp_v4_rcv                                  2089    1744    -345
> tcp_v4_do_rcv                                826     462    -364
> 
> Exporting is for dccp module (used via e.g. inet_lookup).
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

^ permalink raw reply

* Re: [PATCH net-2.6.25 (resend) 1/3] Uninline the __inet_hash function
From: David Miller @ 2007-12-20 23:31 UTC (permalink / raw)
  To: xemul; +Cc: dada1, netdev, devel
In-Reply-To: <476A39DF.5080604@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Thu, 20 Dec 2007 12:46:07 +0300

> This one is used in quite many places in the networking code and
> seems too big to be inline.
> 
> After the patch net/ipv4/built-in.o loses ~650 bytes:
> add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
> function                                     old     new   delta
> __inet_hash_nolisten                           -     282    +282
> __inet_hash                                    -     179    +179
> tcp_sacktag_write_queue                     2255    2254      -1
> __inet_lookup_listener                       284     274     -10
> tcp_v4_syn_recv_sock                         755     493    -262
> tcp_v4_hash                                  389      35    -354
> inet_hash_connect                           1086     599    -487
> 
> This version addresses the issue pointed out by Eric, that
> while being inline this function was optimized by gcc
> with respect to the 'listen_possible' argument.
> 
> (Patches 2 and 3 in this series are still applied after this)
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.
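
The 'listen_possible' point in the quoted description can be
illustrated with a small user-space sketch.  All toy_* names are
hypothetical and the "hash table" is reduced to two counters.  The
assumption shown is that an inline helper called with a constant flag
lets the compiler drop the dead branch at each call site, and that
splitting it into two out-of-line functions keeps that
specialization by hand once the helper is no longer inline.

```c
#include <stdbool.h>

/* Hypothetical socket and table, reduced to the bare minimum. */
struct toy_sock {
	bool listening;
};

struct toy_table {
	int listening;		/* entries on the listening side */
	int established;	/* entries on the established side */
};

/*
 * Before: one inline helper taking a flag.  When a caller passes a
 * constant false, the compiler can discard the listening branch.
 */
static inline void toy_hash_inline(struct toy_table *t,
				   const struct toy_sock *sk,
				   bool listen_possible)
{
	if (listen_possible && sk->listening)
		t->listening++;
	else
		t->established++;
}

/* After: the variant for callers that know listening is impossible. */
void toy_hash_nolisten(struct toy_table *t, const struct toy_sock *sk)
{
	(void)sk;
	t->established++;
}

/* After: the general variant, deferring to the fast one when it can. */
void toy_hash(struct toy_table *t, const struct toy_sock *sk)
{
	if (!sk->listening) {
		toy_hash_nolisten(t, sk);
		return;
	}
	t->listening++;
}
```

Callers that used to pass a constant false would call the nolisten
variant, and the rest call the general one, so both forms produce the
same table contents.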

^ permalink raw reply

