Netdev List


* Re: [v3 PATCH 0/2] NETFILTER new target module, HMARK
From: Pablo Neira Ayuso @ 2011-10-20  8:54 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans
In-Reply-To: <1318532530-29446-1-git-send-email-hans.schillstrom@ericsson.com>

Hi Hans,

On Thu, Oct 13, 2011 at 09:02:08PM +0200, Hans Schillstrom wrote:
> The target allows you to create rules in the "raw" and "mangle" tables
> which alter the netfilter mark (nfmark) field within a given range.
> First a 32-bit hash value is generated, then reduced modulo <limit>, and
> finally an offset is added before it is written to nfmark.
> Prior to routing, the nfmark can influence the routing method (see
> "Use netfilter MARK value as routing key") and can also be used by
> other subsystems to change their behaviour.
> 
> The mark match can also be used to match nfmark produced by this module.
> See the kernel module for more info.
> 
> REVISION
> Version 3
>         Handling of SCTP for IPv6 added.
> 
> Version 2
> 	NAT Added for IPv4
> 	IPv6 ICMP handling enhanced.
> 	Usage example added
> 
> Version 1
> 	Initial RFC
> 
> 
> We (Ericsson) use hmark in front of ipvs as a pre-load-balancer, and it
> handles up to 70 ipvs instances running in parallel in clusters.
> However, hmark is not restricted to running in front of IPVS; it can also be
> used as a "poor man's" load balancer.
> With this version NAT is also supported as an option; with a very large
> number of flows you might not want to use conntrack.
> 
> The idea is to generate a direction-independent fw mark range to use as input to
> the routing (i.e. ip rule add fwmark ...).
> Pretty straightforward and simple.
> 
> 
> Example:
>                                       App Server (Real Server)
> 
>                                            +---------+
>                                         -->| Service |
>      Gateway A                             +---------+
>                           /
>             +----------+ /     +----+      +---------+
> --- if -A---| selector |---->  |ipvs|  --->| Service |
>             +----------+ \     +----+      +---------+
>                           \
>                                +----+      +---------+
>                                |ipvs|   -->| Service |
>                                +----+      +---------+
>       Gateway C
>             +----------+ /     +----+
> --- if-B ---| selector | --->  |ipvs|
>             +----------+ \     +----+      +---------+
>                                            | Service |
>                                            +---------+
>                           /
>             +----------+ /     +----+     ..
> --- if-B ---| selector | --->  |ipvs|      +---------+
>             +----------+ \     +----+      | Service |
>                           \                +---------+
> #
> # Example with four ipvs loadbalancers
> #
> iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100

I think you can replace this rule by:

iptables -t mangle -I PREROUTING -d $IPADDR -m cluster \
        --cluster-total-nodes 4 --cluster-local-node 1 \
        --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 100
iptables -t mangle -I PREROUTING -d $IPADDR -m cluster \
        --cluster-total-nodes 4 --cluster-local-node 2 \
        --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 101
iptables -t mangle -I PREROUTING -d $IPADDR -m cluster \
        --cluster-total-nodes 4 --cluster-local-node 3 \
        --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 102
iptables -t mangle -I PREROUTING -d $IPADDR -m cluster \
        --cluster-total-nodes 4 --cluster-local-node 4 \
        --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 103

The hashing is done by the cluster match, which is currently based on
the source address.

This match currently depends on the connection tracking system, so you
could save the mark in the ctmark with CONNMARK. Thus, you only have to
hash the first packet of the flow, instead of hashing every single packet.

> ip rule add fwmark 100 table 100
> ip rule add fwmark 101 table 101
> ip rule add fwmark 102 table 102
> ip rule add fwmark 103 table 103
> 
> ip ro ad table 100 default via x.y.z.1 dev bond1
> ip ro ad table 101 default via x.y.z.2 dev bond1
> ip ro ad table 102 default via x.y.z.3 dev bond1
> ip ro ad table 103 default via x.y.z.4 dev bond1
>
> If conntrack doesn't handle the return path,
> do the opposite with HMARK and send it straight back to ipvs.
> 
> Another example of usage could be if you have cluster-originated connections
> and want to spread the connections over a number of interfaces
> (NAT will complicate things for you in this case).
> 
> 
> 
>                      \  Blade 1
>                       \ +----------+      +---------+
>                     <-- | selector | <--- | Service |
>                       / +----------+      +---------+
>                      /
>    +------+
> -- | Gw-A |          \  Blade 2
>    +------+           \ +----------+      +---------+
>    +------+         <-- | selector | <--- | Service |
> -- | Gw-B |           / +----------+      +---------+
>    +------+          /
>    +------+
> -- | Gw-C |          \
>    +------+           \ +----------+      +---------+
>                     <-- | selector | <--- | Service |
>                       / +----------+      +---------+
>                      /
> 
>                      \  Blade n
>                       \ +----------+      +---------+
>                     <-- | selector | <--- | Service |
>                       / +----------+      +---------+
>                      /

Unless I'm missing something, I think this can be done with the
cluster match as well.
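
For reference, the mark computation described in the quoted text (hash,
then modulo, then offset) can be sketched in userspace C roughly as below.
This is only an illustration under stated assumptions: mix32() is a
stand-in mixer, not the hash that HMARK or the cluster match uses, and
flow_mark() is a hypothetical name. Sorting the two addresses before
hashing is one simple way to get the direction independence mentioned
above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in 32-bit mixer (an assumption; not the kernel's hash). */
static uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/*
 * Hypothetical helper: map a flow to a mark in [offs, offs + mod - 1].
 * The addresses are ordered first, so A->B and B->A give the same mark.
 */
static uint32_t flow_mark(uint32_t saddr, uint32_t daddr, uint32_t seed,
			  uint32_t mod, uint32_t offs)
{
	uint32_t lo = saddr < daddr ? saddr : daddr;
	uint32_t hi = saddr < daddr ? daddr : saddr;
	uint32_t hash;

	assert(mod != 0);
	hash = mix32(lo ^ seed);
	hash = mix32(hash ^ hi);
	return hash % mod + offs;
}
```

With mod 4 and offs 100, every flow lands on one of the marks 100..103,
which is the range the "ip rule add fwmark" lines below expect.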

^ permalink raw reply

* Re: [PATCH] virtio_net: Clean up set_skb_frag()
From: David Miller @ 2011-10-20  8:54 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, rusty, mst, netdev, linux-kernel, virtualization
In-Reply-To: <1319100362.3781.3.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 Oct 2011 10:46:02 +0200

> On Thursday 20 October 2011 at 13:47 +0530, Krishna Kumar wrote:
>> Remove manual initialization in set_skb_frag, and instead
>> use __skb_fill_page_desc() to do the same. Patch tested
>> on net-next.
>> 
>> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
...
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH V2 net-next] mlx4_en: fix skb truesize underestimation
From: David Miller @ 2011-10-20  8:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, yevgenyp
In-Reply-To: <1319086192.8416.42.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 Oct 2011 06:49:52 +0200

> skb->truesize must account for allocated memory, not the used part of
> it. Doing this work is important to avoid unexpected OOM situations.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> ---
> V2: respin after recent patches

Applied, thanks Eric.

^ permalink raw reply

* [PATCH net-next] bnx2x: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-20  9:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eilon Greenstein

bnx2x allocates a full page per fragment.

We must account for the full size of the fragment in skb->truesize, not
just the used part of it.
    
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index dd8ee56..580b44e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -454,7 +454,7 @@ static int bnx2x_fill_frag_skb(struct bnx2x *bp, struct bnx2x_fastpath *fp,
 		skb_fill_page_desc(skb, j, old_rx_pg.page, 0, frag_len);
 
 		skb->data_len += frag_len;
-		skb->truesize += frag_len;
+		skb->truesize += SGE_PAGE_SIZE * PAGES_PER_SGE;
 		skb->len += frag_len;
 
 		frag_size -= frag_len;
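
To illustrate the accounting rule in the changelog above, here is a small
userspace model. It is a sketch only: struct skb_model and
model_add_frag() are hypothetical stand-ins, not kernel types or
functions. The point is that len and data_len grow by the bytes actually
used, while truesize grows by the bytes allocated for the fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three sk_buff counters involved. */
struct skb_model {
	size_t len;      /* total bytes of packet data */
	size_t data_len; /* bytes held in paged fragments */
	size_t truesize; /* bytes of memory charged to the packet */
};

/* Attach one fragment: frag_len bytes used out of alloc_size allocated. */
static void model_add_frag(struct skb_model *skb, size_t frag_len,
			   size_t alloc_size)
{
	assert(frag_len <= alloc_size);
	skb->len += frag_len;
	skb->data_len += frag_len;
	skb->truesize += alloc_size; /* charge the whole buffer */
}
```

In this model, a 100-byte fragment backed by a 4096-byte page adds 100 to
len and 4096 to truesize, so memory limits see what was really allocated.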

^ permalink raw reply related

* Re: [patch net-next]alx: Atheros AR8131/AR8151/AR8152/AR8161 Ethernet driver
From: Joe Perches @ 2011-10-20  9:00 UTC (permalink / raw)
  To: David Miller; +Cc: cloud.ren, Luis.Rodriguez, netdev, linux-kernel
In-Reply-To: <20111020.044541.970282389722164761.davem@davemloft.net>

On Thu, 2011-10-20 at 04:45 -0400, David Miller wrote:
> From: <cloud.ren@Atheros.com>
[]
> +#define ALX_HW_WARN(_fmt, _args...) \
> +		ALX_HW_PRINTA(WARNING, _fmt, ## _args)
> +
> +#define ALX_HW_INFO(_fmt, _args...) \
> +		ALX_HW_PRINTA(INFO, _fmt, ## _args)
> +
> +#define ALX_HW_DBG(_fmt, _args...) \
> +		ALX_HW_PRINTA(DEBUG, _fmt, ## _args)
> +

I've just done patches for these
and for ALX_PRINTA and ALX_PRINTB.

I'll send them directly to Cloud.

> Please just submit it to staging and let it cook there for a couple
> weeks in the interests of our sanity.

That's a good plan.

^ permalink raw reply

* [PATCH 0/6] skb fragment API: convert network drivers (part V, take 2)
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: linux-scsi@vger.kernel.org, linux-mm@kvack.org

The following series is the second attempt to convert a fifth (and
hopefully final) batch of network drivers to the SKB pages fragment API
introduced in 131ea6675c76.

There are four drivers here (mlx4, cxgb4, cxgb4vf and cxgbi) which used
skb_frag_t as part of their internal data structures, which meant that
they were impacted by changes to that type more than most drivers. To
break this dependency I added a "struct page_frag" (struct page + offset
+ len) and converted them to use it. These conversions are a little less
trivial than most of the preceding ones and I have only been able to
compile test them.

The struct page_frag addition has been acked by Andrew Morton to go
through the net tree. (Andrew, I took your "yes please" as an Acked-by. I
hope that's ok).

The final patch here wraps the page member of skb_frag_t in a structure.
This is a precursor to adding the destructor here (those patches need a
little more work, arising from comments made at LPC; I'll post regarding
those shortly). This should help ensure that no direct uses of the page
get introduced in the meantime.

In the previous posting of this series I ran an allmodconfig build on a
boatload of architectures [2] on a baseline of the then-current
net-next/master (88c5100c28b0) and with that series. Although the
baseline didn't build for most architectures, I used "make -k" and
confirmed that this series added no new warnings or errors. For this
iteration I have just rebuilt things which changed in the interval
88c5100c28b0..a0bec1cd8f7a (current net-next/master) on amd64 and
eyeballed the diff for new uses of frag->page (I saw none).

This is part of my series to enable visibility into the lifecycle of SKB
paged fragments. [0] contains some more background and rationale, but
basically the completed series will allow entities which inject pages
into the networking stack to receive a notification when the stack has
really finished with those pages (i.e. including retransmissions,
clones, pull-ups etc.) and not just when the original skb is finished
with. This is beneficial to many subsystems which wish to inject pages
into the network stack without giving up full ownership of those pages'
lifecycle. It implements something broadly along the lines of what was
described in [1].

Cheers,
Ian.

[0] http://marc.info/?l=linux-netdev&m=131072801125521&w=2
[1] http://marc.info/?l=linux-netdev&m=130925719513084&w=2
[2] arm amd64 blackfin cris i386 ia64 m68k mips64 mips powerpc64 powerpc
s390x sh4 sparc64 sparc xtensa 
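
As an illustration of why wrapping the page member discourages direct
use, here is a userspace sketch. The names page_model, frag_model and the
two accessors are hypothetical stand-ins, not the kernel definitions.
Once the pointer sits inside a nested struct, code that still writes
"frag->page = ptr" or passes frag->page where a page pointer is expected
stops compiling, so such users get pushed towards the accessors.

```c
#include <assert.h>
#include <stddef.h>

struct page_model { int id; }; /* hypothetical stand-in for struct page */

struct frag_model {
	struct {
		struct page_model *p;
	} page; /* wrapped: "frag->page = ptr" is now a type error */
	unsigned int page_offset;
	unsigned int size;
};

/* Accessors are the only intended way to reach the pointer. */
static struct page_model *frag_model_page(const struct frag_model *frag)
{
	assert(frag != NULL);
	return frag->page.p;
}

static void frag_model_set_page(struct frag_model *frag,
				struct page_model *page)
{
	assert(frag != NULL);
	frag->page.p = page;
}
```

A later change to what the wrapper holds then only needs to touch the
accessors rather than every driver.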

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 1/6] mm: add a "struct page_frag" type containing a page, offset and length
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Ian Campbell, Christoph Hellwig, David Miller, linux-mm,
	linux-kernel
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

A few network drivers currently use skb_frag_struct for this purpose, but I have
patches which add additional fields and semantics there which these other uses
do not want.

A structure for referencing sub-page regions seems like a generally useful thing,
so add one here instead of adding a network-subsystem-specific structure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
[since v1: s/struct subpage/struct page_frag/ on advice from Christoph]
[since v2: s/page_offset/offset/ on advice from Andrew]
---
 include/linux/mm_types.h |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 774b895..29971a5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -135,6 +135,17 @@ struct page {
 #endif
 ;
 
+struct page_frag {
+	struct page *page;
+#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
+	__u32 offset;
+	__u32 size;
+#else
+	__u16 offset;
+	__u16 size;
+#endif
+};
+
 typedef unsigned long __nocast vm_flags_t;
 
 /*
-- 
1.7.2.5


^ permalink raw reply related

* [PATCH 5/6] cxgbi: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Ian Campbell, James E.J. Bottomley, David S. Miller,
	Mike Christie, James Bottomley, Karen Xie, linux-scsi, netdev
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Karen Xie <kxie@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: netdev@vger.kernel.org
---
 drivers/scsi/cxgbi/libcxgbi.c |   28 +++++++++++++++-------------
 drivers/scsi/cxgbi/libcxgbi.h |    2 +-
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index be69da3..1c1329b 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -1787,7 +1787,7 @@ static int sgl_seek_offset(struct scatterlist *sgl, unsigned int sgcnt,
 }
 
 static int sgl_read_to_frags(struct scatterlist *sg, unsigned int sgoffset,
-				unsigned int dlen, skb_frag_t *frags,
+				unsigned int dlen, struct page_frag *frags,
 				int frag_max)
 {
 	unsigned int datalen = dlen;
@@ -1814,8 +1814,8 @@ static int sgl_read_to_frags(struct scatterlist *sg, unsigned int sgoffset,
 		copy = min(datalen, sglen);
 		if (i && page == frags[i - 1].page &&
 		    sgoffset + sg->offset ==
-			frags[i - 1].page_offset + skb_frag_size(&frags[i - 1])) {
-			skb_frag_size_add(&frags[i - 1], copy);
+			frags[i - 1].offset + frags[i - 1].size) {
+			frags[i - 1].size += copy;
 		} else {
 			if (i >= frag_max) {
 				pr_warn("too many pages %u, dlen %u.\n",
@@ -1824,8 +1824,8 @@ static int sgl_read_to_frags(struct scatterlist *sg, unsigned int sgoffset,
 			}
 
 			frags[i].page = page;
-			frags[i].page_offset = sg->offset + sgoffset;
-			skb_frag_size_set(&frags[i], copy);
+			frags[i].offset = sg->offset + sgoffset;
+			frags[i].size = copy;
 			i++;
 		}
 		datalen -= copy;
@@ -1944,15 +1944,15 @@ int cxgbi_conn_init_pdu(struct iscsi_task *task, unsigned int offset,
 		if (tdata->nr_frags > MAX_SKB_FRAGS ||
 		    (padlen && tdata->nr_frags == MAX_SKB_FRAGS)) {
 			char *dst = skb->data + task->hdr_len;
-			skb_frag_t *frag = tdata->frags;
+			struct page_frag *frag = tdata->frags;
 
 			/* data fits in the skb's headroom */
 			for (i = 0; i < tdata->nr_frags; i++, frag++) {
 				char *src = kmap_atomic(frag->page,
 							KM_SOFTIRQ0);
 
-				memcpy(dst, src+frag->page_offset, skb_frag_size(frag));
-				dst += skb_frag_size(frag);
+				memcpy(dst, src+frag->offset, frag->size);
+				dst += frag->size;
 				kunmap_atomic(src, KM_SOFTIRQ0);
 			}
 			if (padlen) {
@@ -1962,11 +1962,13 @@ int cxgbi_conn_init_pdu(struct iscsi_task *task, unsigned int offset,
 			skb_put(skb, count + padlen);
 		} else {
 			/* data fit into frag_list */
-			for (i = 0; i < tdata->nr_frags; i++)
-				get_page(tdata->frags[i].page);
-
-			memcpy(skb_shinfo(skb)->frags, tdata->frags,
-				sizeof(skb_frag_t) * tdata->nr_frags);
+			for (i = 0; i < tdata->nr_frags; i++) {
+				__skb_fill_page_desc(skb, i,
+						tdata->frags[i].page,
+						tdata->frags[i].offset,
+						tdata->frags[i].size);
+				skb_frag_ref(skb, i);
+			}
 			skb_shinfo(skb)->nr_frags = tdata->nr_frags;
 			skb->len += count;
 			skb->data_len += count;
diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
index 9267844..3a25b11 100644
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -574,7 +574,7 @@ struct cxgbi_endpoint {
 #define MAX_PDU_FRAGS	((ULP2_MAX_PDU_PAYLOAD + 512 - 1) / 512)
 struct cxgbi_task_data {
 	unsigned short nr_frags;
-	skb_frag_t frags[MAX_PDU_FRAGS];
+	struct page_frag frags[MAX_PDU_FRAGS];
 	struct sk_buff *skb;
 	unsigned int offset;
 	unsigned int count;
-- 
1.7.2.5
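
The merge-or-append logic that sgl_read_to_frags() applies to the
page_frag array in the patch above can be modelled in userspace roughly
as follows. This is a sketch under assumptions: page_model,
page_frag_model and frag_append() are hypothetical names, and the
scatterlist walking of the real function is left out. A new region that
starts exactly where the previous fragment ends on the same page extends
that fragment; anything else takes a new slot if one is free.

```c
#include <assert.h>
#include <stddef.h>

struct page_model { int id; }; /* hypothetical stand-in for struct page */

struct page_frag_model {
	struct page_model *page;
	unsigned int offset;
	unsigned int size;
};

/*
 * Add a region to frags[0..n). Returns the new fragment count,
 * or -1 if a new slot is needed but frag_max is already reached.
 */
static int frag_append(struct page_frag_model *frags, int n, int frag_max,
		       struct page_model *page, unsigned int offset,
		       unsigned int len)
{
	assert(n >= 0 && n <= frag_max);

	/* Contiguous with the previous fragment on the same page: merge. */
	if (n > 0 && frags[n - 1].page == page &&
	    frags[n - 1].offset + frags[n - 1].size == offset) {
		frags[n - 1].size += len;
		return n;
	}

	if (n >= frag_max)
		return -1;

	frags[n].page = page;
	frags[n].offset = offset;
	frags[n].size = len;
	return n + 1;
}
```

As in the patch, the capacity check only applies when a new slot is
needed, so a contiguous region can still be merged into a full array.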


^ permalink raw reply related

* [PATCH 2/6] mlx4: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Ian Campbell, netdev
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   32 ++++++++++++++--------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |   20 +++--------------
 2 files changed, 20 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9b18d85..af26983 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -44,7 +44,7 @@
 
 static int mlx4_en_alloc_frag(struct mlx4_en_priv *priv,
 			      struct mlx4_en_rx_desc *rx_desc,
-			      struct skb_frag_struct *skb_frags,
+			      struct page_frag *skb_frags,
 			      struct mlx4_en_rx_alloc *ring_alloc,
 			      int i)
 {
@@ -61,7 +61,7 @@ static int mlx4_en_alloc_frag(struct mlx4_en_priv *priv,
 			return -ENOMEM;
 
 		skb_frags[i].page = page_alloc->page;
-		skb_frags[i].page_offset = page_alloc->offset;
+		skb_frags[i].offset = page_alloc->offset;
 		page_alloc->page = page;
 		page_alloc->offset = frag_info->frag_align;
 	} else {
@@ -69,11 +69,11 @@ static int mlx4_en_alloc_frag(struct mlx4_en_priv *priv,
 		get_page(page);
 
 		skb_frags[i].page = page;
-		skb_frags[i].page_offset = page_alloc->offset;
+		skb_frags[i].offset = page_alloc->offset;
 		page_alloc->offset += frag_info->frag_stride;
 	}
 	dma = pci_map_single(mdev->pdev, page_address(skb_frags[i].page) +
-			     skb_frags[i].page_offset, frag_info->frag_size,
+			     skb_frags[i].offset, frag_info->frag_size,
 			     PCI_DMA_FROMDEVICE);
 	rx_desc->data[i].addr = cpu_to_be64(dma);
 	return 0;
@@ -157,8 +157,8 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
 				   struct mlx4_en_rx_ring *ring, int index)
 {
 	struct mlx4_en_rx_desc *rx_desc = ring->buf + (index * ring->stride);
-	struct skb_frag_struct *skb_frags = ring->rx_info +
-					    (index << priv->log_rx_info);
+	struct page_frag *skb_frags = ring->rx_info +
+				      (index << priv->log_rx_info);
 	int i;
 
 	for (i = 0; i < priv->num_frags; i++)
@@ -183,7 +183,7 @@ static void mlx4_en_free_rx_desc(struct mlx4_en_priv *priv,
 				 int index)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
-	struct skb_frag_struct *skb_frags;
+	struct page_frag *skb_frags;
 	struct mlx4_en_rx_desc *rx_desc = ring->buf + (index << ring->log_stride);
 	dma_addr_t dma;
 	int nr;
@@ -194,7 +194,7 @@ static void mlx4_en_free_rx_desc(struct mlx4_en_priv *priv,
 		dma = be64_to_cpu(rx_desc->data[nr].addr);
 
 		en_dbg(DRV, priv, "Unmapping buffer at dma:0x%llx\n", (u64) dma);
-		pci_unmap_single(mdev->pdev, dma, skb_frag_size(&skb_frags[nr]),
+		pci_unmap_single(mdev->pdev, dma, skb_frags[nr].size,
 				 PCI_DMA_FROMDEVICE);
 		put_page(skb_frags[nr].page);
 	}
@@ -403,7 +403,7 @@ void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
 /* Unmap a completed descriptor and free unused pages */
 static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 				    struct mlx4_en_rx_desc *rx_desc,
-				    struct skb_frag_struct *skb_frags,
+				    struct page_frag *skb_frags,
 				    struct skb_frag_struct *skb_frags_rx,
 				    struct mlx4_en_rx_alloc *page_alloc,
 				    int length)
@@ -420,9 +420,9 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 			break;
 
 		/* Save page reference in skb */
-		skb_frags_rx[nr].page = skb_frags[nr].page;
-		skb_frag_size_set(&skb_frags_rx[nr], skb_frag_size(&skb_frags[nr]));
-		skb_frags_rx[nr].page_offset = skb_frags[nr].page_offset;
+		__skb_frag_set_page(&skb_frags_rx[nr], skb_frags[nr].page);
+		skb_frag_size_set(&skb_frags_rx[nr], skb_frags[nr].size);
+		skb_frags_rx[nr].page_offset = skb_frags[nr].offset;
 		dma = be64_to_cpu(rx_desc->data[nr].addr);
 
 		/* Allocate a replacement page */
@@ -444,7 +444,7 @@ fail:
 	 * the descriptor) of this packet; remaining fragments are reused... */
 	while (nr > 0) {
 		nr--;
-		put_page(skb_frags_rx[nr].page);
+		__skb_frag_unref(&skb_frags_rx[nr]);
 	}
 	return 0;
 }
@@ -452,7 +452,7 @@ fail:
 
 static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
 				      struct mlx4_en_rx_desc *rx_desc,
-				      struct skb_frag_struct *skb_frags,
+				      struct page_frag *skb_frags,
 				      struct mlx4_en_rx_alloc *page_alloc,
 				      unsigned int length)
 {
@@ -474,7 +474,7 @@ static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
 
 	/* Get pointer to first fragment so we could copy the headers into the
 	 * (linear part of the) skb */
-	va = page_address(skb_frags[0].page) + skb_frags[0].page_offset;
+	va = page_address(skb_frags[0].page) + skb_frags[0].offset;
 
 	if (length <= SMALL_PACKET_SIZE) {
 		/* We are copying all relevant data to the skb - temporarily
@@ -533,7 +533,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_cqe *cqe;
 	struct mlx4_en_rx_ring *ring = &priv->rx_ring[cq->ring];
-	struct skb_frag_struct *skb_frags;
+	struct page_frag *skb_frags;
 	struct mlx4_en_rx_desc *rx_desc;
 	struct sk_buff *skb;
 	int index;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 75dda26..75338eb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -460,26 +460,13 @@ static inline void mlx4_en_xmit_poll(struct mlx4_en_priv *priv, int tx_ind)
 		}
 }
 
-static void *get_frag_ptr(struct sk_buff *skb)
-{
-	struct skb_frag_struct *frag =  &skb_shinfo(skb)->frags[0];
-	struct page *page = frag->page;
-	void *ptr;
-
-	ptr = page_address(page);
-	if (unlikely(!ptr))
-		return NULL;
-
-	return ptr + frag->page_offset;
-}
-
 static int is_inline(struct sk_buff *skb, void **pfrag)
 {
 	void *ptr;
 
 	if (inline_thold && !skb_is_gso(skb) && skb->len <= inline_thold) {
 		if (skb_shinfo(skb)->nr_frags == 1) {
-			ptr = get_frag_ptr(skb);
+			ptr = skb_frag_address_safe(&skb_shinfo(skb)->frags[0]);
 			if (unlikely(!ptr))
 				return 0;
 
@@ -756,8 +743,9 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		/* Map fragments */
 		for (i = skb_shinfo(skb)->nr_frags - 1; i >= 0; i--) {
 			frag = &skb_shinfo(skb)->frags[i];
-			dma = pci_map_page(mdev->dev->pdev, frag->page, frag->page_offset,
-					   skb_frag_size(frag), PCI_DMA_TODEVICE);
+			dma = skb_frag_dma_map(&mdev->dev->pdev->dev, frag,
+					       0, skb_frag_size(frag),
+					       DMA_TO_DEVICE);
 			data->addr = cpu_to_be64(dma);
 			data->lkey = cpu_to_be32(mdev->mr.key);
 			wmb();
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 4/6] cxgb4vf: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Ian Campbell, Casey Leedom, netdev
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |    2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c     |   92 ++++++++++-------------
 2 files changed, 41 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 594334d..611396c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -144,7 +144,7 @@ struct sge_fl {
  * An ingress packet gather list.
  */
 struct pkt_gl {
-	skb_frag_t frags[MAX_SKB_FRAGS];
+	struct page_frag frags[MAX_SKB_FRAGS];
 	void *va;			/* virtual address of first byte */
 	unsigned int nfrags;		/* # of fragments */
 	unsigned int tot_len;		/* total length of fragments */
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index c2d456d..8d5d55a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -296,8 +296,8 @@ static int map_skb(struct device *dev, const struct sk_buff *skb,
 	si = skb_shinfo(skb);
 	end = &si->frags[si->nr_frags];
 	for (fp = si->frags; fp < end; fp++) {
-		*++addr = dma_map_page(dev, fp->page, fp->page_offset,
-				       skb_frag_size(fp), DMA_TO_DEVICE);
+		*++addr = skb_frag_dma_map(dev, fp, 0, skb_frag_size(fp),
+					   DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, *addr))
 			goto unwind;
 	}
@@ -1357,6 +1357,35 @@ out_free:
 }
 
 /**
+ *	copy_frags - copy fragments from gather list into skb_shared_info
+ *	@skb: destination skb
+ *	@gl: source internal packet gather list
+ *	@offset: packet start offset in first page
+ *
+ *	Copy an internal packet gather list into a Linux skb_shared_info
+ *	structure.
+ */
+static inline void copy_frags(struct sk_buff *skb,
+			      const struct pkt_gl *gl,
+			      unsigned int offset)
+{
+	int i;
+
+	/* usually there's just one frag */
+	__skb_fill_page_desc(skb, 0, gl->frags[0].page,
+			     gl->frags[0].offset + offset,
+			     gl->frags[0].size - offset);
+	skb_shinfo(skb)->nr_frags = gl->nfrags;
+	for (i = 1; i < gl->nfrags; i++)
+		__skb_fill_page_desc(skb, i, gl->frags[i].page,
+				     gl->frags[i].offset,
+				     gl->frags[i].size);
+
+	/* get a reference to the last page, we don't own it */
+	get_page(gl->frags[gl->nfrags - 1].page);
+}
+
+/**
  *	t4vf_pktgl_to_skb - build an sk_buff from a packet gather list
  *	@gl: the gather list
  *	@skb_len: size of sk_buff main body if it carries fragments
@@ -1369,7 +1398,6 @@ struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
 				  unsigned int skb_len, unsigned int pull_len)
 {
 	struct sk_buff *skb;
-	struct skb_shared_info *ssi;
 
 	/*
 	 * If the ingress packet is small enough, allocate an skb large enough
@@ -1396,21 +1424,10 @@ struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
 		__skb_put(skb, pull_len);
 		skb_copy_to_linear_data(skb, gl->va, pull_len);
 
-		ssi = skb_shinfo(skb);
-		ssi->frags[0].page = gl->frags[0].page;
-		ssi->frags[0].page_offset = gl->frags[0].page_offset + pull_len;
-		skb_frag_size_set(&ssi->frags[0], skb_frag_size(&gl->frags[0]) - pull_len);
-		if (gl->nfrags > 1)
-			memcpy(&ssi->frags[1], &gl->frags[1],
-			       (gl->nfrags-1) * sizeof(skb_frag_t));
-		ssi->nr_frags = gl->nfrags;
-
+		copy_frags(skb, gl, pull_len);
 		skb->len = gl->tot_len;
 		skb->data_len = skb->len - pull_len;
 		skb->truesize += skb->data_len;
-
-		/* Get a reference for the last page, we don't own it */
-		get_page(gl->frags[gl->nfrags - 1].page);
 	}
 
 out:
@@ -1434,35 +1451,6 @@ void t4vf_pktgl_free(const struct pkt_gl *gl)
 }
 
 /**
- *	copy_frags - copy fragments from gather list into skb_shared_info
- *	@si: destination skb shared info structure
- *	@gl: source internal packet gather list
- *	@offset: packet start offset in first page
- *
- *	Copy an internal packet gather list into a Linux skb_shared_info
- *	structure.
- */
-static inline void copy_frags(struct skb_shared_info *si,
-			      const struct pkt_gl *gl,
-			      unsigned int offset)
-{
-	unsigned int n;
-
-	/* usually there's just one frag */
-	si->frags[0].page = gl->frags[0].page;
-	si->frags[0].page_offset = gl->frags[0].page_offset + offset;
-	skb_frag_size_set(&si->frags[0], skb_frag_size(&gl->frags[0]) - offset);
-	si->nr_frags = gl->nfrags;
-
-	n = gl->nfrags - 1;
-	if (n)
-		memcpy(&si->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));
-
-	/* get a reference to the last page, we don't own it */
-	get_page(gl->frags[n].page);
-}
-
-/**
  *	do_gro - perform Generic Receive Offload ingress packet processing
  *	@rxq: ingress RX Ethernet Queue
  *	@gl: gather list for ingress packet
@@ -1484,7 +1472,7 @@ static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
 		return;
 	}
 
-	copy_frags(skb_shinfo(skb), gl, PKTSHIFT);
+	copy_frags(skb, gl, PKTSHIFT);
 	skb->len = gl->tot_len - PKTSHIFT;
 	skb->data_len = skb->len;
 	skb->truesize += skb->data_len;
@@ -1667,7 +1655,7 @@ int process_responses(struct sge_rspq *rspq, int budget)
 		rmb();
 		rsp_type = RSPD_TYPE(rc->type_gen);
 		if (likely(rsp_type == RSP_TYPE_FLBUF)) {
-			skb_frag_t *fp;
+			struct page_frag *fp;
 			struct pkt_gl gl;
 			const struct rx_sw_desc *sdesc;
 			u32 bufsz, frag;
@@ -1701,9 +1689,9 @@ int process_responses(struct sge_rspq *rspq, int budget)
 				sdesc = &rxq->fl.sdesc[rxq->fl.cidx];
 				bufsz = get_buf_size(sdesc);
 				fp->page = sdesc->page;
-				fp->page_offset = rspq->offset;
-				skb_frag_size_set(fp, min(bufsz, len));
-				len -= skb_frag_size(fp);
+				fp->offset = rspq->offset;
+				fp->size = min(bufsz, len);
+				len -= fp->size;
 				if (!len)
 					break;
 				unmap_rx_buf(rspq->adapter, &rxq->fl);
@@ -1717,9 +1705,9 @@ int process_responses(struct sge_rspq *rspq, int budget)
 			 */
 			dma_sync_single_for_cpu(rspq->adapter->pdev_dev,
 						get_buf_addr(sdesc),
-						skb_frag_size(fp), DMA_FROM_DEVICE);
+						fp->size, DMA_FROM_DEVICE);
 			gl.va = (page_address(gl.frags[0].page) +
-				 gl.frags[0].page_offset);
+				 gl.frags[0].offset);
 			prefetch(gl.va);
 
 			/*
@@ -1728,7 +1716,7 @@ int process_responses(struct sge_rspq *rspq, int budget)
 			 */
 			ret = rspq->handler(rspq, rspq->cur_desc, &gl);
 			if (likely(ret == 0))
-				rspq->offset += ALIGN(skb_frag_size(fp), FL_ALIGN);
+				rspq->offset += ALIGN(fp->size, FL_ALIGN);
 			else
 				restore_rx_bufs(&gl, &rxq->fl, frag);
 		} else if (likely(rsp_type == RSP_TYPE_CPL)) {
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 6/6] net: add opaque struct around skb frag page
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Ian Campbell
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 include/linux/skbuff.h |   10 ++++++----
 net/core/skbuff.c      |    6 +++---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1ebf1ea..aec73c1 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -140,7 +140,9 @@ struct sk_buff;
 typedef struct skb_frag_struct skb_frag_t;
 
 struct skb_frag_struct {
-	struct page *page;
+	struct {
+		struct page *p;
+	} page;
 #if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
 	__u32 page_offset;
 	__u32 size;
@@ -1175,7 +1177,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
 {
 	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
-	frag->page		  = page;
+	frag->page.p		  = page;
 	frag->page_offset	  = off;
 	skb_frag_size_set(frag, size);
 }
@@ -1699,7 +1701,7 @@ static inline void netdev_free_page(struct net_device *dev, struct page *page)
  */
 static inline struct page *skb_frag_page(const skb_frag_t *frag)
 {
-	return frag->page;
+	return frag->page.p;
 }
 
 /**
@@ -1785,7 +1787,7 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
  */
 static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page)
 {
-	frag->page = page;
+	frag->page.p = page;
 }
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e271040..ca4db40 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -668,14 +668,14 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 
 	/* skb frags release userspace buffers */
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		put_page(skb_shinfo(skb)->frags[i].page);
+		skb_frag_unref(skb, i);
 
 	uarg->callback(uarg);
 
 	/* skb frags point to kernel buffers */
 	for (i = skb_shinfo(skb)->nr_frags; i > 0; i--) {
-		skb_shinfo(skb)->frags[i - 1].page_offset = 0;
-		skb_shinfo(skb)->frags[i - 1].page = head;
+		__skb_fill_page_desc(skb, i-1, head, 0,
+				     skb_shinfo(skb)->frags[i - 1].size);
 		head = (struct page *)head->private;
 	}
 
-- 
1.7.2.5
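
As an illustration of why wrapping the pointer helps enforce the fragment API, here is a minimal user-space sketch (not kernel code). The toy_* names are hypothetical stand-ins for struct page, skb_frag_t, skb_frag_page() and __skb_frag_set_page(). Since the page member is now a struct rather than a pointer, a direct assignment such as frag->page = page no longer compiles, which pushes callers toward the accessors.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct toy_page {
	int refcount;
};

/* Same layout idea as the patch: the page pointer sits inside a wrapper struct. */
struct toy_frag {
	struct {
		struct toy_page *p;
	} page;
	unsigned int page_offset;
	unsigned int size;
};

/* Read accessor, playing the role of skb_frag_page(). */
static inline struct toy_page *toy_frag_page(const struct toy_frag *frag)
{
	return frag->page.p;
}

/* Write accessor, playing the role of __skb_frag_set_page(). */
static inline void toy_frag_set_page(struct toy_frag *frag,
				     struct toy_page *page)
{
	frag->page.p = page;
}

/*
 * A direct "frag->page = page;" is rejected by the compiler here,
 * because a pointer cannot be assigned to a struct member.
 */
```

Code that previously touched the member directly has to be converted to the accessors (or knowingly reach for .p), which makes leftover open-coded users easy to spot at build time.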

^ permalink raw reply related

* [PATCH 3/6] cxgb4: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-20  9:01 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Ian Campbell, Dimitris Michailidis, netdev
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |    2 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   45 ++++++++++++++-------------
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 223a7f7..0fe1885 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -326,7 +326,7 @@ struct sge_fl {                     /* SGE free-buffer queue state */
 
 /* A packet gather list */
 struct pkt_gl {
-	skb_frag_t frags[MAX_SKB_FRAGS];
+	struct page_frag frags[MAX_SKB_FRAGS];
 	void *va;                         /* virtual address of first byte */
 	unsigned int nfrags;              /* # of fragments */
 	unsigned int tot_len;             /* total length of fragments */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 14f31d3..ddc1698 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -215,8 +215,8 @@ static int map_skb(struct device *dev, const struct sk_buff *skb,
 	end = &si->frags[si->nr_frags];
 
 	for (fp = si->frags; fp < end; fp++) {
-		*++addr = dma_map_page(dev, fp->page, fp->page_offset,
-				       skb_frag_size(fp), DMA_TO_DEVICE);
+		*++addr = skb_frag_dma_map(dev, fp, 0, skb_frag_size(fp),
+					   DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, *addr))
 			goto unwind;
 	}
@@ -1409,22 +1409,23 @@ int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(cxgb4_ofld_send);
 
-static inline void copy_frags(struct skb_shared_info *ssi,
+static inline void copy_frags(struct sk_buff *skb,
 			      const struct pkt_gl *gl, unsigned int offset)
 {
-	unsigned int n;
+	int i;
 
 	/* usually there's just one frag */
-	ssi->frags[0].page = gl->frags[0].page;
-	ssi->frags[0].page_offset = gl->frags[0].page_offset + offset;
-	skb_frag_size_set(&ssi->frags[0], skb_frag_size(&gl->frags[0]) - offset);
-	ssi->nr_frags = gl->nfrags;
-	n = gl->nfrags - 1;
-	if (n)
-		memcpy(&ssi->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));
+	__skb_fill_page_desc(skb, 0, gl->frags[0].page,
+			     gl->frags[0].offset + offset,
+			     gl->frags[0].size - offset);
+	skb_shinfo(skb)->nr_frags = gl->nfrags;
+	for (i = 1; i < gl->nfrags; i++)
+		__skb_fill_page_desc(skb, i, gl->frags[i].page,
+				     gl->frags[i].offset,
+				     gl->frags[i].size);
 
 	/* get a reference to the last page, we don't own it */
-	get_page(gl->frags[n].page);
+	get_page(gl->frags[gl->nfrags - 1].page);
 }
 
 /**
@@ -1459,7 +1460,7 @@ struct sk_buff *cxgb4_pktgl_to_skb(const struct pkt_gl *gl,
 		__skb_put(skb, pull_len);
 		skb_copy_to_linear_data(skb, gl->va, pull_len);
 
-		copy_frags(skb_shinfo(skb), gl, pull_len);
+		copy_frags(skb, gl, pull_len);
 		skb->len = gl->tot_len;
 		skb->data_len = skb->len - pull_len;
 		skb->truesize += skb->data_len;
@@ -1478,7 +1479,7 @@ EXPORT_SYMBOL(cxgb4_pktgl_to_skb);
 static void t4_pktgl_free(const struct pkt_gl *gl)
 {
 	int n;
-	const skb_frag_t *p;
+	const struct page_frag *p;
 
 	for (p = gl->frags, n = gl->nfrags - 1; n--; p++)
 		put_page(p->page);
@@ -1522,7 +1523,7 @@ static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
 		return;
 	}
 
-	copy_frags(skb_shinfo(skb), gl, RX_PKT_PAD);
+	copy_frags(skb, gl, RX_PKT_PAD);
 	skb->len = gl->tot_len - RX_PKT_PAD;
 	skb->data_len = skb->len;
 	skb->truesize += skb->data_len;
@@ -1698,7 +1699,7 @@ static int process_responses(struct sge_rspq *q, int budget)
 		rmb();
 		rsp_type = RSPD_TYPE(rc->type_gen);
 		if (likely(rsp_type == RSP_TYPE_FLBUF)) {
-			skb_frag_t *fp;
+			struct page_frag *fp;
 			struct pkt_gl si;
 			const struct rx_sw_desc *rsd;
 			u32 len = ntohl(rc->pldbuflen_qid), bufsz, frags;
@@ -1717,9 +1718,9 @@ static int process_responses(struct sge_rspq *q, int budget)
 				rsd = &rxq->fl.sdesc[rxq->fl.cidx];
 				bufsz = get_buf_size(rsd);
 				fp->page = rsd->page;
-				fp->page_offset = q->offset;
-				skb_frag_size_set(fp, min(bufsz, len));
-				len -= skb_frag_size(fp);
+				fp->offset = q->offset;
+				fp->size = min(bufsz, len);
+				len -= fp->size;
 				if (!len)
 					break;
 				unmap_rx_buf(q->adap, &rxq->fl);
@@ -1731,16 +1732,16 @@ static int process_responses(struct sge_rspq *q, int budget)
 			 */
 			dma_sync_single_for_cpu(q->adap->pdev_dev,
 						get_buf_addr(rsd),
-						skb_frag_size(fp), DMA_FROM_DEVICE);
+						fp->size, DMA_FROM_DEVICE);
 
 			si.va = page_address(si.frags[0].page) +
-				si.frags[0].page_offset;
+				si.frags[0].offset;
 			prefetch(si.va);
 
 			si.nfrags = frags + 1;
 			ret = q->handler(q, q->cur_desc, &si);
 			if (likely(ret == 0))
-				q->offset += ALIGN(skb_frag_size(fp), FL_ALIGN);
+				q->offset += ALIGN(fp->size, FL_ALIGN);
 			else
 				restore_rx_bufs(&si, &rxq->fl, frags);
 		} else if (likely(rsp_type == RSP_TYPE_CPL)) {
-- 
1.7.2.5
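
The offset arithmetic in the reworked copy_frags() can be sketched in user space as follows. This is only an illustration under stated assumptions: the toy_* types and TOY_MAX_FRAGS are hypothetical, toy_fill_page_desc() stands in for __skb_fill_page_desc(), and the get_page() reference on the last page is left out. It assumes nfrags >= 1 and offset <= frags[0].size.

```c
#include <stddef.h>

#define TOY_MAX_FRAGS 17	/* arbitrary bound chosen for this sketch */

/* The three fields the patch uses from struct page_frag. */
struct toy_page_frag {
	void *page;
	unsigned int offset;
	unsigned int size;
};

/* Simplified gather list (stand-in for struct pkt_gl). */
struct toy_gl {
	struct toy_page_frag frags[TOY_MAX_FRAGS];
	unsigned int nfrags;
};

/* Simplified skb holding only its fragment array. */
struct toy_skb {
	struct toy_page_frag frags[TOY_MAX_FRAGS];
	unsigned int nr_frags;
};

/* Fills one fragment descriptor, like __skb_fill_page_desc(). */
static void toy_fill_page_desc(struct toy_skb *skb, unsigned int i,
			       void *page, unsigned int off,
			       unsigned int size)
{
	skb->frags[i].page = page;
	skb->frags[i].offset = off;
	skb->frags[i].size = size;
}

/* First fragment skips 'offset' bytes; the rest are copied unchanged. */
static void toy_copy_frags(struct toy_skb *skb, const struct toy_gl *gl,
			   unsigned int offset)
{
	unsigned int i;

	toy_fill_page_desc(skb, 0, gl->frags[0].page,
			   gl->frags[0].offset + offset,
			   gl->frags[0].size - offset);
	skb->nr_frags = gl->nfrags;
	for (i = 1; i < gl->nfrags; i++)
		toy_fill_page_desc(skb, i, gl->frags[i].page,
				   gl->frags[i].offset,
				   gl->frags[i].size);
}
```

Going through a fill helper per fragment, instead of a memcpy() of the descriptor array, is what lets the gather list use a different descriptor type (struct page_frag) from the skb's own skb_frag_t.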

^ permalink raw reply related

* [PATCH net-next] virtio_net: fix truesize underestimation
From: Eric Dumazet @ 2011-10-20  9:14 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Rusty Russell, Michael S. Tsirkin, virtualization,
	Krishna Kumar

We must account in skb->truesize, the size of the fragments, not the
used part of them.

Doing this work is important to avoid unexpected OOM situations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
CC: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/virtio_net.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index abbf34f..765ab9a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -150,6 +150,7 @@ static void set_skb_frag(struct sk_buff *skb, struct page *page,
 
 	skb->data_len += size;
 	skb->len += size;
+	skb->truesize += PAGE_SIZE;
 	skb_shinfo(skb)->nr_frags++;
 	*len -= size;
 }
@@ -287,7 +288,6 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 	}
 
 	hdr = skb_vnet_hdr(skb);
-	skb->truesize += skb->data_len;
 
 	u64_stats_update_begin(&stats->syncp);
 	stats->rx_bytes += skb->len;
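
The accounting change can be illustrated with a small user-space sketch. The toy_* names and the 4096-byte TOY_PAGE_SIZE are assumptions made for illustration, and the function is a simplified model of set_skb_frag() that takes the fragment size as a parameter. The point it shows is that each attached fragment charges a whole page to truesize, however few bytes of that page carry data.

```c
#define TOY_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Only the accounting fields touched by the patch. */
struct toy_skb {
	unsigned int len;
	unsigned int data_len;
	unsigned int truesize;
	unsigned int nr_frags;
};

/*
 * Models the bookkeeping of set_skb_frag() after the patch:
 * len and data_len grow by the bytes used, truesize grows by a
 * full page, and the remaining length shrinks by the bytes used.
 */
static void toy_set_skb_frag(struct toy_skb *skb, unsigned int size,
			     unsigned int *len)
{
	skb->data_len += size;
	skb->len += size;
	skb->truesize += TOY_PAGE_SIZE;
	skb->nr_frags++;
	*len -= size;
}
```

With a 100-byte payload in one page, truesize grows by 4096 in this model, whereas the removed "truesize += data_len" would have added only 100.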

^ permalink raw reply related

* RE: [patch net-next]alx: Atheros AR8131/AR8151/AR8152/AR8161 Ethernet driver
From: Ren, Cloud @ 2011-10-20  9:23 UTC (permalink / raw)
  To: David Miller
  Cc: Rodriguez, Luis, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20111020.044541.970282389722164761.davem@davemloft.net>


>From: <cloud.ren@Atheros.com>
>Date: Thu, 20 Oct 2011 14:46:24 +0800
>
>> +#define __far
>
>So much unused crap left in these header files, get rid of this stuff.
>
>+#define ALX_HW_WARN(_fmt, _args...) \
>+		ALX_HW_PRINTA(WARNING, _fmt, ## _args)
>+
>+#define ALX_HW_INFO(_fmt, _args...) \
>+		ALX_HW_PRINTA(INFO, _fmt, ## _args)
>+
>+#define ALX_HW_DBG(_fmt, _args...) \
>+		ALX_HW_PRINTA(DEBUG, _fmt, ## _args)
>+
>
>We told you to get rid of your customized debug logging interfaces, yet all of
>this stuff is still there.
>
>+/* delay function */
>+#define US_DELAY(_hw, _n)	__US_DELAY(_n)
>+#define MS_DELAY(_hw, _n)	__MS_DELAY(_n)
>+#define __US_DELAY(_n)		udelay(_n)
>+#define __MS_DELAY(_n)		mdelay(_n)
>
>Useless wrappers for standard kernel interfaces, kill this.
>
>+#define DEBUG_INFO(_a, _b)
>+#define DEBUG_INFOS(_a, _b)
>
>Again we told you to get rid of this stuff.
>
>I suspect it's going to take many rounds of feedback before this driver is
>anywhere near ready for inclusion.
>
>Please just submit it to staging and let it cook there for a couple weeks in the
>interests of our sanity.

As you said, should I take the following two steps?
1. First, submit the code to linux-staging.git.
2. After the driver has been accepted by linux-staging.git, submit it to net-next.git again.

^ permalink raw reply

* Re: [PATCH 0/6] skb fragment API: convert network drivers (part V, take 2)
From: David Miller @ 2011-10-20  9:23 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: netdev, linux-scsi, linux-mm
In-Reply-To: <1319101275.3385.129.camel@zakaz.uk.xensource.com>

From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Thu, 20 Oct 2011 10:01:15 +0100

> The following series is the second attempt to convert a fifth (and
> hopefully final) batch of network drivers to the SKB pages fragment API
> introduced in 131ea6675c76.

Applied, thanks Ian.

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: fix skb truesize underestimation
From: David Miller @ 2011-10-20  9:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, eilong
In-Reply-To: <1319101223.3781.7.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 Oct 2011 11:00:23 +0200

> bnx2x allocates a full page per fragment.
> 
> We must account in skb->truesize, the size of the fragment, not the used
> part of it.
>     
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply

* Re: [patch net-next]alx: Atheros AR8131/AR8151/AR8152/AR8161 Ethernet driver
From: David Miller @ 2011-10-20  9:25 UTC (permalink / raw)
  To: cjren; +Cc: rodrigue, netdev, linux-kernel
In-Reply-To: <6349D7A510622448B1BA0967850A8438011CC21D@nasanexd02d.na.qualcomm.com>

From: "Ren, Cloud" <cjren@qca.qualcomm.com>
Date: Thu, 20 Oct 2011 09:23:07 +0000

> As you said, should I take the following two steps?
> 1. First, submit the code to linux-staging.git.
> 2. After the driver has been accepted by linux-staging.git, submit it to net-next.git again.

You submit and get it into staging so that it can sit there for some
time and get reviewed and improved by others.

One doesn't submit directly to net-next right after it gets into
staging; staging is a place where your driver lives while it is still
smelly and funky and needs more work.

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: fix truesize underestimation
From: David Miller @ 2011-10-20  9:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, rusty, mst, virtualization, krkumar2
In-Reply-To: <1319102086.3781.13.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 Oct 2011 11:14:46 +0200

> We must account in skb->truesize, the size of the fragments, not the
> used part of them.
> 
> Doing this work is important to avoid unexpected OOM situations.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: David Miller @ 2011-10-20  9:30 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras
In-Reply-To: <20111019080807.GA25099@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.hengli.com.au>
Date: Wed, 19 Oct 2011 10:08:07 +0200

> I think Eric's initial patch is probably the safest bet for rc10.
> We can then work on the proper fix for the next release.

There are two "initial patches"; I wonder which one you mean.

There's his very first patch, which removes the lines in IP_GRE
which change dev->needed_headroom.  I was under the impression we
were against doing that.

The other patch he posted duplicates the device attribute variable
caching in two functions.

My patch is just a tweak so that we only do this sequence in one
place, the new sock_alloc_send_skb_reserve() helper, instead of
in both the ipv4 and ipv6 RAW code.

So I'm a little confused what your suggestion for rc10 really
is :-)

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Herbert Xu @ 2011-10-20  9:35 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras
In-Reply-To: <20111020.053050.383972361986316046.davem@davemloft.net>

On Thu, Oct 20, 2011 at 05:30:50AM -0400, David Miller wrote:
>
> So I'm a little confused what your suggestion for rc10 really
> is :-)

I meant his first initial patch :)

While it is suboptimal in the sense that should the value of
needed_headroom increase we'll end up constantly reallocating
skbs, I believe that it is at least semantically correct.

In the meantime I'll look more closely at all the users of
needed_headroom to see if there's anything we've missed.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2011-10-20  9:43 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


I have two fixes still being worked on and under discussion.  One for
pktgen giving too large values to ndelay(), and one for RAW ipv4/ipv6
sockets crashing when used over IP_GRE tunnels.  Probably I can have
both fixes finalized in about a day.

1) When bridge is removed via netlink, we hang, fix from Stephen Hemminger.

2) USE_PHYLIB flag test reversed in tg3 due to regression, fix from Jiri Pirko.

3) IPVS netns down/up deadlock fix from Hans Schillstrom.

4) Leaks and missing SKB pull calls in pptp and l2tp, from Eric Dumazet.

5) Several buffer overruns and missing skb size checks in x25, fixes from
   Matthew Daley.

6) bond_handle_frame() races with taking a bond down, resulting in crash,
   fix from Mitsuo Hayasaka.

7) R8169 WoL regression fix from Francois Romieu.  Energy Efficient Ethernet
   setting for rtl8111evl r8169 chip from Hayes Wang.

8) Add SMSC LAN89218 device IDs, from Phil Edworthy.

9) Bluetooth forgets to propagate LSM attributes on child sockets, fix
   from Paul Moore.

10) Transparent proxy doesn't propagate flag to TIME_WAIT sockets, resulting
    in resets.  Fix from KOVACS Krisztian.

Please pull, thanks a lot.

The following changes since commit 486cf46f3f9be5f2a966016c1a8fe01e32cde09e:

  mm: fix race between mremap and removing migration entry (2011-10-19 23:42:58 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

David S. Miller (1):
      Merge branch 'nf' of git://1984.lsi.us.es/net

Eric Dumazet (3):
      l2tp: fix a potential skb leak in l2tp_xmit_skb()
      pptp: fix skb leak in pptp_xmit()
      pptp: pptp_rcv_core() misses pskb_may_pull() call

Florian Westphal (1):
      netfilter: nf_conntrack: fix event flooding in GRE protocol tracker

Gao feng (1):
      netconsole: enable netconsole can make net_device refcnt incorrent

Gerrit Renker (1):
      udplite: fast-path computation of checksum coverage

Hans Schillstrom (1):
      IPVS netns shutdown/startup dead-lock

Jiri Pirko (1):
      tg3: negate USE_PHYLIB flag check

KOVACS Krisztian (1):
      tproxy: copy transparent flag when creating a time wait

Matthew Daley (3):
      x25: Validate incoming call user data lengths
      x25: Handle undersized/fragmented skbs
      x25: Prevent skb overreads when checking call user data

Mitsuo Hayasaka (1):
      bonding: use local function pointer of bond->recv_probe in bond_handle_frame

Paul Moore (1):
      bluetooth: Properly clone LSM attributes to newly created child connections

Phil Edworthy (1):
      smsc911x: Add support for SMSC LAN89218

Thadeu Lima de Souza Cascardo (1):
      ehea: Change maintainer to me

Yan, Zheng (1):
      fib_rules: fix unresolved_rules counting

françois romieu (1):
      r8169: fix driver shutdown WoL regression.

hayeswang (1):
      r8169: fix wrong eee setting for rlt8111evl

stephen hemminger (1):
      bridge: fix hang on removal of bridge via netlink

 MAINTAINERS                            |    2 +-
 drivers/net/bonding/bond_main.c        |    7 +-
 drivers/net/netconsole.c               |    5 +
 drivers/net/pptp.c                     |   22 ++++--
 drivers/net/r8169.c                    |   90 ++++++++++++++--------
 drivers/net/smsc911x.c                 |    2 +
 drivers/net/tg3.c                      |    2 +-
 include/net/ip_vs.h                    |    1 +
 include/net/udplite.h                  |   63 ++++++++--------
 net/bluetooth/l2cap_sock.c             |    4 +
 net/bluetooth/rfcomm/sock.c            |    3 +
 net/bluetooth/sco.c                    |    5 +-
 net/bridge/br_if.c                     |    9 +-
 net/bridge/br_netlink.c                |    1 +
 net/bridge/br_private.h                |    1 +
 net/core/fib_rules.c                   |    5 +-
 net/ipv4/tcp_minisocks.c               |    1 +
 net/l2tp/l2tp_core.c                   |    4 +-
 net/netfilter/ipvs/ip_vs_ctl.c         |  131 +++++++++++++++++++------------
 net/netfilter/ipvs/ip_vs_sync.c        |    6 ++
 net/netfilter/nf_conntrack_proto_gre.c |    4 +-
 net/x25/af_x25.c                       |   40 ++++++++--
 net/x25/x25_dev.c                      |    6 ++
 net/x25/x25_facilities.c               |   10 ++-
 net/x25/x25_in.c                       |   43 +++++++++-
 net/x25/x25_link.c                     |    3 +
 net/x25/x25_subr.c                     |   14 +++-
 security/security.c                    |    1 +
 28 files changed, 330 insertions(+), 155 deletions(-)

^ permalink raw reply

* RE: [patch net-next]alx: Atheros AR8131/AR8151/AR8152/AR8161 Ethernet driver
From: Ren, Cloud @ 2011-10-20  9:48 UTC (permalink / raw)
  To: David Miller
  Cc: Rodriguez, Luis, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20111020.052506.373437241768777548.davem@davemloft.net>

>From: "Ren, Cloud" <cjren@qca.qualcomm.com>
>Date: Thu, 20 Oct 2011 09:23:07 +0000
>
>> As you said, should I take the following two steps?
>> 1. First, submit the code to linux-staging.git.
>> 2. After the driver has been accepted by linux-staging.git, submit it to
>> net-next.git again.
>
>You submit and get it into staging so that it can sit there for some time and get
>reviewed and improved by others.
>
>One doesn't submit directly to net-next right after it gets into staging; staging
>is a place where your driver lives while it is still smelly and funky and needs more
>work.

The driver will support the next generation of Atheros NICs. It also offers
better optimization for AR8131 and AR8151 than atl1c. For various reasons, we
don't plan to patch the atl1c driver to support our new NICs, such as AR8161. So I hope the driver
can end up in net-next eventually. Of course, I will be responsible for modifying the source code
so that it meets kernel requirements.

^ permalink raw reply

* Re: Comment on nf_queue NF_STOLEN patch
From: Pablo Neira Ayuso @ 2011-10-20 10:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jim Sansing, Linux Network Development list,
	Netfilter Development Mailinglist, Florian Westphal
In-Reply-To: <1318997435.19139.16.camel@edumazet-laptop>

On Wed, Oct 19, 2011 at 06:10:35AM +0200, Eric Dumazet wrote:
> Le mardi 18 octobre 2011 à 17:34 -0400, Jim Sansing a écrit :
> > Eric Dumazet wrote:
> > > Le mardi 18 octobre 2011 à 15:08 -0400, Jim Sansing a écrit :
> > >   
> > >> I have been working on a kernel module that registers with netfilter,
> > >> and I noticed that a patch was added to nf_queue that changed the
> > >> handling of return code NF_STOLEN from 'do nothing' to 'free the skb'.
> > >> I'm not sure which kernel version this went in, but the date of the
> > >> patch is Feb 19, 2010.
> > >>
> > >> Everything I have read about netfilter states that it is up to the
> > >> netfilter hook to free the skb if NF_STOLEN is returned.  The
> > >> implications of this patch from a hook programming perspective are:
> > >>
> > >> 1) If the skb is used after the return from the hook, it must be cloned.
> > >> 2) The original skb must not be freed.
> > >>
> > >> I suggest that a comment be added to include/linux/netfilter.h that says
> > >> explicitly the skb will be freed if NF_STOLEN is returned.
> > >>     
> > >
> > > But it's not true. Just read the code.
> > >
> > > If you are working on this stuff I recommend you take a look at
> > > commits :
> > >
> > > c6675233f9015d3c0460c8aab53ed9b99d915c64
> > > (netfilter: nf_queue: reject NF_STOLEN verdicts from userspace)
> > >
> > > fad54440438a7c231a6ae347738423cbabc936d9
> > > (netfilter: avoid double free in nf_reinject)
> > >
> > > 64507fdbc29c3a622180378210ecea8659b14e40
> > > (netfilter: nf_queue: fix NF_STOLEN skb leak)
> > >
> > > 3bc38712e3a6e0596ccb6f8299043a826f983701
> > > ([NETFILTER]: nf_queue: handle NF_STOP and unknown verdicts in
> > > nf_reinject)
> > >
> > >   
> > 
> > I see that fad54440438a7c231a6ae347738423cbabc936d9 (netfilter: avoid
> > double free in nf_reinject) returns the switch case for NF_STOLEN back
> > to the original state, but I just downloaded 3.0.4, and the skb is still
> > freed.  So for some versions of the kernel, the situation exists. 
> > Hopefully anyone who runs into it will find this thread.
> > 
> 
> Hopefully the netfilter guys (CCed) will sort out the problem and ask for stable
> submissions, if not already done. 3.0.4 is quite old :)

Not done yet, sorry. I'll do it asap.

^ permalink raw reply

* [PATCH 3/4] net: xen-netback: use API provided by xenbus module to map rings
From: David Vrabel @ 2011-10-20 10:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, David Vrabel, netdev, David S . Miller
In-Reply-To: <1319107519-2253-1-git-send-email-david.vrabel@citrix.com>

The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree().  Use these to map the Tx and Rx ring pages
granted by the frontend.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Dave, this is a standalone patch and can be applied independently of
the rest of the series.

 drivers/net/xen-netback/common.h  |   11 ++---
 drivers/net/xen-netback/netback.c |   80 ++++++++-----------------------------
 2 files changed, 22 insertions(+), 69 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 161f207..94b79c3 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -58,10 +58,6 @@ struct xenvif {
 	u8               fe_dev_addr[6];
 
 	/* Physical parameters of the comms window. */
-	grant_handle_t   tx_shmem_handle;
-	grant_ref_t      tx_shmem_ref;
-	grant_handle_t   rx_shmem_handle;
-	grant_ref_t      rx_shmem_ref;
 	unsigned int     irq;
 
 	/* List of frontends to notify after a batch of frames sent. */
@@ -70,8 +66,6 @@ struct xenvif {
 	/* The shared rings and indexes. */
 	struct xen_netif_tx_back_ring tx;
 	struct xen_netif_rx_back_ring rx;
-	struct vm_struct *tx_comms_area;
-	struct vm_struct *rx_comms_area;
 
 	/* Frontend feature information. */
 	u8 can_sg:1;
@@ -106,6 +100,11 @@ struct xenvif {
 	wait_queue_head_t waiting_to_free;
 };
 
+static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif)
+{
+	return to_xenbus_device(vif->dev->dev.parent);
+}
+
 #define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
 #define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index fd00f25..3af2924 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1577,88 +1577,42 @@ static int xen_netbk_kthread(void *data)
 
 void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
 {
-	struct gnttab_unmap_grant_ref op;
-
-	if (vif->tx.sring) {
-		gnttab_set_unmap_op(&op, (unsigned long)vif->tx_comms_area->addr,
-				    GNTMAP_host_map, vif->tx_shmem_handle);
-
-		if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
-			BUG();
-	}
-
-	if (vif->rx.sring) {
-		gnttab_set_unmap_op(&op, (unsigned long)vif->rx_comms_area->addr,
-				    GNTMAP_host_map, vif->rx_shmem_handle);
-
-		if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
-			BUG();
-	}
-	if (vif->rx_comms_area)
-		free_vm_area(vif->rx_comms_area);
-	if (vif->tx_comms_area)
-		free_vm_area(vif->tx_comms_area);
+	if (vif->tx.sring)
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
+					vif->tx.sring);
+	if (vif->rx.sring)
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
+					vif->rx.sring);
 }
 
 int xen_netbk_map_frontend_rings(struct xenvif *vif,
 				 grant_ref_t tx_ring_ref,
 				 grant_ref_t rx_ring_ref)
 {
-	struct gnttab_map_grant_ref op;
+	void *addr;
 	struct xen_netif_tx_sring *txs;
 	struct xen_netif_rx_sring *rxs;
 
 	int err = -ENOMEM;
 
-	vif->tx_comms_area = alloc_vm_area(PAGE_SIZE);
-	if (vif->tx_comms_area == NULL)
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+				     tx_ring_ref, &addr);
+	if (err)
 		goto err;
 
-	vif->rx_comms_area = alloc_vm_area(PAGE_SIZE);
-	if (vif->rx_comms_area == NULL)
-		goto err;
-
-	gnttab_set_map_op(&op, (unsigned long)vif->tx_comms_area->addr,
-			  GNTMAP_host_map, tx_ring_ref, vif->domid);
-
-	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
-		BUG();
-
-	if (op.status) {
-		netdev_warn(vif->dev,
-			    "failed to map tx ring. err=%d status=%d\n",
-			    err, op.status);
-		err = op.status;
-		goto err;
-	}
-
-	vif->tx_shmem_ref    = tx_ring_ref;
-	vif->tx_shmem_handle = op.handle;
-
-	txs = (struct xen_netif_tx_sring *)vif->tx_comms_area->addr;
+	txs = (struct xen_netif_tx_sring *)addr;
 	BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
 
-	gnttab_set_map_op(&op, (unsigned long)vif->rx_comms_area->addr,
-			  GNTMAP_host_map, rx_ring_ref, vif->domid);
-
-	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
-		BUG();
-
-	if (op.status) {
-		netdev_warn(vif->dev,
-			    "failed to map rx ring. err=%d status=%d\n",
-			    err, op.status);
-		err = op.status;
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+				     rx_ring_ref, &addr);
+	if (err)
 		goto err;
-	}
-
-	vif->rx_shmem_ref     = rx_ring_ref;
-	vif->rx_shmem_handle  = op.handle;
-	vif->rx_req_cons_peek = 0;
 
-	rxs = (struct xen_netif_rx_sring *)vif->rx_comms_area->addr;
+	rxs = (struct xen_netif_rx_sring *)addr;
 	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE);
 
+	vif->rx_req_cons_peek = 0;
+
 	return 0;
 
 err:
-- 
1.7.2.5

^ permalink raw reply related

* Re: [RFC PATCH 0/5] SUNRPC: "RPC pipefs per network namespace" preparations
From: Stanislav Kinsbursky @ 2011-10-20 11:06 UTC (permalink / raw)
  To: Trond.Myklebust@netapp.com
  Cc: linux-nfs@vger.kernel.org, Pavel Emelianov, neilb@suse.de,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bfields@fieldses.org, davem@davemloft.net, devel@openvz.org
In-Reply-To: <20111017120629.4541.67395.stgit@localhost6.localdomain6>

Guys, please spend some of your valuable time briefly reviewing this patch set.
It is not meant for commit; it is just a presentation of the idea.
I really need some opinions about it, since all my further work around RPC pipefs
depends on it.
IOW, I need to know whether anyone has anything against this idea.
Trond, please respond: does this idea suit you in general or not?

17.10.2011 17:10, Stanislav Kinsbursky wrote:
> Hello everyone.
> Making the RPC pipefs file system work per network namespace context is required
> prior to any NFS modifications.
> This is one way to do it. I would really appreciate any comments.
>
> There are several statements about how to make RPC pipefs work per network
> namespace context.
> Here they are:
> 1) RPC pipefs should be mounted per network namespace context.
> 2) The RPC pipefs superblock should hold the network namespace while active.
> 3) RPC pipefs lookup and readdir should be performed in the network namespace context
> in which it was mounted. IOW, a user-space process working in another network namespace
> context should see the RPC pipefs dentries of the network namespace context in which this
> mount point was created (as was done for sysfs).
>
> These statements lead to some restrictions which we must follow during
> implementation. Here they are:
> 1) An RPC pipefs mount can't be performed in kernel context, since the new superblock
> will hold a network namespace reference and it's impossible to tell when
> and how we have to release this mount point. IOW, rpc_get_mount() and
> rpc_put_mount() have to be removed.
> 2) RPC pipefs should provide some new helpers to look up the directory dentry for
> those modules which create pipes, because without an RPC pipefs mount point a
> general lookup can't be performed.
> 3) These methods must guarantee that the pipefs superblock will be active during
> pipe creation and destruction.
>
> So, here is the idea for making RPC pipefs work per network namespace context:
> 1) The RPC pipefs superblock should hold the network namespace context while active.
> 2) RPC pipefs should send notification events on superblock creation and
> destruction.
> 3) RPC pipefs should provide a "lookup dentry by name" method for notification
> subscribers.
> 4) RPC pipefs should place a superblock reference on the current network namespace
> context on creation and remove it on destruction.
> 5) RPC pipefs should provide a safe "lookup dentry by name" method for per-net
> operations, which guarantees that the superblock is active while
> per-net operations are being performed.
> 6) Client and cache directory creation and destruction should also be performed
> on superblock creation and destruction notification events. Note: generic
> creation (as now) can fail (if no superblock has been created yet).
> 7) Pipe creation and destruction should be performed on superblock creation
> and destruction events. Pipe operations should also be performed during
> per-net operations, and in this case they could fail (for the same reason as
> in the statement above).
>
> This patch set implements the first 5 points and thus doesn't affect the
> current RPC pipefs logic.
>
> The only problem I'm not sure how to solve properly yet is the auth gss
> pipe creation operations. Hoping for some help with it.
>
>
> The following series consists of:
>
> ---
>
> Stanislav Kinsbursky (5):
>        SUNRPC: hold current network namespace while pipefs superblock is active
>        SUNRPC: send notification events on pipefs sb creation and destruction
>        SUNRPC: pipefs dentry lookup helper introduced
>        SUNRPC: put pipefs superblock link on network namespace
>        SUNRPC: pipefs per-net operations helper introduced
>
>
>   include/linux/sunrpc/rpc_pipe_fs.h |   16 ++++++
>   net/sunrpc/netns.h                 |    3 +
>   net/sunrpc/rpc_pipe.c              |  103 ++++++++++++++++++++++++++++++++++++
>   net/sunrpc/sunrpc_syms.c           |    1
>   4 files changed, 122 insertions(+), 1 deletions(-)
>


-- 
Best regards,
Stanislav Kinsbursky
