Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-05-07 16:33 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1336374778.2386.1.camel@jtkirshe-mobl>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 07 May 2012 00:12:58 -0700

> On Sun, 2012-05-06 at 13:25 -0400, David Miller wrote:
>> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Date: Sat,  5 May 2012 05:38:09 -0700
>> 
>> > This series of patches contains updates for e1000e and ixgbe.
>> > 
>> > NOTE- The ixgbe patch can and probably should be applied to
>> > David Miller's net tree as well.
>> > 
>> > The following are changes since commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24:
>> >   tcp: be more strict before accepting ECN negociation
>> > and are available in the git repository at:
>> >   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>> 
>> No new changes there?
>> 
>> [davem@drr net-next]$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>> From git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
>>  * branch            master     -> FETCH_HEAD
>> Already up-to-date.
> 
> Sorry Dave, I thought I had pushed the changes but it appears I did not.
> I have rectified that and now my net-next tree contains the four
> patches.

That looks better, pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH] mwl8k: Add 0x2a02 PCI device-id (Marvell 88W8361)
From: Dan Williams @ 2012-05-07 16:09 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: sedat.dilek, John W. Linville, linux-wireless, netdev,
	linux-kernel, lautriv, Jim Cromie
In-Reply-To: <20120501125102.GN3157@wantstofly.org>

On Tue, 2012-05-01 at 14:51 +0200, Lennert Buytenhek wrote:
> On Sun, Apr 29, 2012 at 12:25:21AM +0200, Sedat Dilek wrote:
> 
> > > On 1st sight, logs look fine:
> > >
> > > [21:52:52] <lautriv> [    6.050967] ieee80211 phy0: 88w8361p v4,
> > > 00173f3bdde3, STA firmware 2.1.4.25
> > >
> > > But WLAN connection is not that fast and stable as lautriv reports
> > > (several abnormalities were observed).
> > >
> > > I requested a tarball which includes:
> > > * dmesg (Linux-3.3.3)
> > > * e_n_a (/etc/network/interfaces)
> > > * ifconfig output
> > > * iwconfig output
> > > * iw_phy output
> > > * ps_axu (WPA) output
> > >
> > > lautriv will be so kind to be around on #linux-wireless/Freenode the
> > > next days (UTC+2: German/Swiss local-time).
> > > Just ping him.
> > >
> > > Hope you have fun, together!
> > >
> > > - Sedat -
> > 
> > A new tarball from lautriv with same outputs as before, but now tested
> > with Linux-3.4-rc4.
> 
> The output looks good enough for me to ACK adding the PCI ID.
> 
> Can the firmware being used here be submitted to the linux-firmware
> git tree?

So Marvell sent John a driver for TopDog a long time ago, which he put
up on kernel.org.  That driver was reworked by Louis and put up in a git
tree, but both were lost to the kernel.org hack.  I have git backups of
both git trees.  I put Louis' cleanup here:

http://people.redhat.com/dcbw/mrvl_cb82.tar.bz2

That driver (mrvl_cb82) has the following PCI IDs:

static const struct pci_device_id mwl_id_tbl[] __devinitdata = {
        { PCI_VDEVICE(MARVELL, 0x2a02), 0 },
        { PCI_VDEVICE(MARVELL, 0x2a03), 1 },
        { PCI_VDEVICE(MARVELL, 0x2a06), 2 },
        { PCI_VDEVICE(MARVELL, 0x2a07), 3 },
        { PCI_VDEVICE(MARVELL, 0x2a04), 4 },
        { PCI_VDEVICE(MARVELL, 0x2a08), 5 },
        { PCI_VDEVICE(MARVELL, 0x2a0a), 6 },
        { PCI_VDEVICE(MARVELL, 0x2a0b), 7 },
        { PCI_VDEVICE(MARVELL, 0x2a0c), 8 },
        { 0 }
};

and supposedly works for CB82 + CB85.  The firmware helper for CB82
looks pretty close to the mwl8k one.

The firmware API exposed by mrvl_cb82 looks very close to mwl8k
actually.  I only checked the HostCmd bits, not the structures, so I
would expect a few differences.  There are some commands that mwl8k
exposes that mrvl_cb82 does not and vice versa, but I'm not sure if the
drivers actually use those commands.

Hope this helps.

Dan

^ permalink raw reply

* Re: New commands to configure IOV features
From: Greg Rose @ 2012-05-07 15:16 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: Ben Hutchings, netdev@vger.kernel.org
In-Reply-To: <4FA7AF62.8000405@broadcom.com>

On Mon, 7 May 2012 14:17:54 +0300
Yuval Mintz <yuvalmin@broadcom.com> wrote:

> I've tried to figure out if there was a standard interface
> (ethtool/iproute) through which a user could configure the number
> of vfs in his system.
> 
> I've seen the RFC suggested in
> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
> later references to it (commits or further discussion on this topic).
> 
> How exactly are things standing with these RFCs? Were they abandoned?

The only way to configure the number of VFs continues to be through the
max_vfs module parameter.  I've got a patch to do it through ethtool
sitting on the back burner but due to other requirements of my day job
I've not been able to work on it since last fall.

- Greg

> 
> Thanks,
> Yuval
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-07 14:54 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <201205071457.32127.hans.schillstrom@ericsson.com>

On Mon, May 07, 2012 at 02:57:30PM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> > On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > > 
> > > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > > 
> > > > > > Still curious, how can 0xfffc be useful?
> > > > > 
> > > > > That's a special case where an appl is using 4 ports.
> > > > > But in general, have not seen other than "on/off" except for above.
> > > > 
> > > > I see. Well I'm fine with this way to switch on/off things, just
> > > > wanted some clafication.
> > > > 
> > > > Still one final thing I'd like to remove before inclusion:
> > > > 
> > > > +       union hmark_ports       port_mask;
> > > > +       union hmark_ports       port_set;
> > > > +       __u32                   spi_mask;
> > > > +       __u32                   spi_set;
> > > > 
> > > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > > it.
> > > 
> > > No problems, I'll remove it.
> > 
> > OK. As a nice side-effect, this will lead to removing the branch that
> > tests ESP/AH in hmark_set_tuple_ports.
> >
> Yes, only check if not ESP or AH to swap src/dst

Do you really that branch? I mean, unless I'm missing anything, swapping
them shouldn't be a problem.

^ permalink raw reply

* [PATCH net] cdc_ether: add Novatel USB551L device IDs for FLAG_WWAN
From: Dan Williams @ 2012-05-07 14:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, Oliver Neukum,
	stable-u79uwXL29TY76Z2rM5mHXA

Needs to be tagged with FLAG_WWAN, which since it has generic
descriptors, won't happen if we don't override the generic
driver info.

Cc: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Dan Williams <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/net/usb/cdc_ether.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 90a3002..41cd7da 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -475,6 +475,7 @@ static const struct driver_info wwan_info = {
 /*-------------------------------------------------------------------------*/
 
 #define HUAWEI_VENDOR_ID	0x12D1
+#define NOVATEL_VENDOR_ID	0x1410
 
 static const struct usb_device_id	products [] = {
 /*
@@ -592,6 +593,21 @@ static const struct usb_device_id	products [] = {
  * because of bugs/quirks in a given product (like Zaurus, above).
  */
 {
+	/* Novatel USB551L */
+	/* This match must come *before* the generic CDC-ETHER match so that
+	 * we get FLAG_WWAN set on the device, since it's descriptors are
+	 * generic CDC-ETHER.
+	 */
+	.match_flags    =   USB_DEVICE_ID_MATCH_VENDOR
+		 | USB_DEVICE_ID_MATCH_PRODUCT
+		 | USB_DEVICE_ID_MATCH_INT_INFO,
+	.idVendor               = NOVATEL_VENDOR_ID,
+	.idProduct		= 0xB001,
+	.bInterfaceClass	= USB_CLASS_COMM,
+	.bInterfaceSubClass	= USB_CDC_SUBCLASS_ETHERNET,
+	.bInterfaceProtocol	= USB_CDC_PROTO_NONE,
+	.driver_info = (unsigned long)&wwan_info,
+}, {
 	USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_ETHERNET,
 			USB_CDC_PROTO_NONE),
 	.driver_info = (unsigned long) &cdc_info,
-- 
1.7.7.6


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH net] cdc_ether: add Novatel USB551L device IDs for FLAG_WWAN
From: Oliver Neukum @ 2012-05-07 14:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1336400691.2385.10.camel-wKZy7rqYPVb5EHUCmHmTqw@public.gmane.org>

Am Montag, 7. Mai 2012, 16:24:51 schrieb Dan Williams:
> Needs to be tagged with FLAG_WWAN, which since it has generic
> descriptors, won't happen if we don't override the generic
> driver info.
> 
> Cc: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: Dan Williams <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Johannes Berg @ 2012-05-07 14:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1336398809.3752.2313.camel@edumazet-glaptop>

On Mon, 2012-05-07 at 15:53 +0200, Eric Dumazet wrote:
> On Mon, 2012-05-07 at 15:39 +0200, Johannes Berg wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > Neither compare_ether_addr() nor compare_ether_addr_64bits()
> > (as it can fall back to the former) have comparison semantics
> > like memcmp() where the sign of the return value indicates sort
> > order. We had a bug in the wireless code due to a blind memcmp
> > replacement because of this.
> > 
> > A cursory look suggests that the wireless bug was the only one
> > due to this semantic difference.
> > 
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > ---
> >  include/linux/etherdevice.h |   11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> The right way to avoid this kind of problems is to change these
> functions to return a bool

Well, I guess so, but that'd be a weird thing for a compare_ function...
should probably be named equal_... then, but I'm not really able to do
such a huge change on the first day after my vacation :-)

johannes

^ permalink raw reply

* [PATCH RFC 6/6] skbuff: set zerocopy flag on frag destructor
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

set tx flags when adding a frag destructor to
force frags to be orphaned as appropriate.
copy when copying frags between skbs.
Note: rare false positives (where flag is set with no
destructors) are harmless.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/skbuff.h |   19 ++++++++++++++++++-
 net/core/skbuff.c      |    3 +++
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 28d842e..2876e4d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -258,6 +258,13 @@ enum {
 	SKBTX_WIFI_STATUS = 1 << 5,
 };
 
+static inline void skb_copy_frag_destructor(struct sk_buff *to,
+					    struct sk_buff *from)
+{
+	skb_shinfo(to)->tx_flags |= skb_shinfo(from)->tx_flags &
+		SKBTX_DEV_ZEROCOPY;
+}
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
@@ -1260,6 +1267,8 @@ static inline void skb_frag_set_destructor(struct sk_buff *skb, int i,
 {
 	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 	frag->page.destructor = destroy;
+	if (destroy)
+		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 }
 
 /**
@@ -1726,7 +1735,15 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 	return skb_copy_ubufs(skb, gfp_mask);
 }
 
-
+/**
+ *	skb_copy_frag_destructor - update skb after destructor copy
+ *	@to: target skb to which frags were copied
+ *	@from: source skb from which frags where copied
+ *
+ *	Called after some frags move between skbs.
+ *	If any frags in @from have a destructor, a flag in tx_flags is set.
+ *	Set flag for @to so that it gets checked for destructors.
+ */
 static inline void skb_copy_frag_destructor(struct sk_buff *to,
 					    struct sk_buff *from)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bdf5b09..83b04d9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -902,6 +902,7 @@ struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom, gfp_t gfp_mask)
 	n->truesize += skb->data_len;
 	n->data_len  = skb->data_len;
 	n->len	     = skb->len;
+	skb_copy_frag_destructor(n, skb);
 
 	if (skb_shinfo(skb)->nr_frags) {
 		int i;
@@ -2302,6 +2303,7 @@ static inline void skb_split_no_header(struct sk_buff *skb,
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len)
 {
 	int pos = skb_headlen(skb);
+	skb_copy_frag_destructor(skb1, skb);
 
 	if (len < pos)	/* Split line is inside header. */
 		skb_split_inside_header(skb, skb1, len, pos);
@@ -2781,6 +2783,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			if (unlikely(!nskb))
 				goto err;
 
+			skb_copy_frag_destructor(nskb, skb);
 			skb_reserve(nskb, headroom);
 			__skb_put(nskb, doffset);
 		}
-- 
MST

^ permalink raw reply related

* [PATCH RFC 5/6] net: orphan frags on receive
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

zero copy packets are normally sent to the outside
network, but bridging, tun etc might loop them
back to host networking stack. If this happens
destructors will never be called, so orphan
the frags immediately on receive.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/dev.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a2be59f..c0cdc00 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1630,6 +1630,8 @@ static inline int deliver_skb(struct sk_buff *skb,
 			      struct packet_type *pt_prev,
 			      struct net_device *orig_dev)
 {
+	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+		return -ENOMEM;
 	atomic_inc(&skb->users);
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
-- 
MST

^ permalink raw reply related

* [PATCH RFC 4/6] tun: orphan frags on xmit
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

tun xmit is actually receive of the internal tun
socket. Orphan the frags same as we do for normal rx path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/tun.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..1ccbfd0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -414,6 +414,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)

 	/* Orphan the skb - required as we might hang on to it
 	 * for indefinite time. */
+	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+		goto drop;
 	skb_orphan(skb);

 	/* Enqueue packet */
-- 
MST

^ permalink raw reply related

* [PATCH RFC 3/6] skbuff: convert to skb_orphan_frags
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

Reduce dode duplication a bit using the new helper.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/skbuff.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bd28e80..bdf5b09 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -785,10 +785,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
-	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-		if (skb_copy_ubufs(skb, gfp_mask))
-			return NULL;
-	}
+	if (skb_orphan_frags(skb, gfp_mask))
+		return NULL;
 
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
@@ -908,12 +906,10 @@ struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom, gfp_t gfp_mask)
 	if (skb_shinfo(skb)->nr_frags) {
 		int i;
 
-		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-			if (skb_copy_ubufs(skb, gfp_mask)) {
-				kfree_skb(n);
-				n = NULL;
-				goto out;
-			}
+		if (skb_orphan_frags(skb, gfp_mask)) {
+			kfree_skb(n);
+			n = NULL;
+			goto out;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
@@ -1005,10 +1001,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		skb_free_head(skb);
 	} else {
 		/* copy this zero copy skb frags */
-		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-			if (skb_copy_ubufs(skb, gfp_mask))
-				goto nofrags;
-		}
+		if (skb_orphan_frags(skb, gfp_mask))
+			goto nofrags;
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
 			skb_frag_ref(skb, i);
 
-- 
MST

^ permalink raw reply related

* [PATCH RFC 2/6] skbuff: add an api to orphan frags
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

Many places do
       if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
		skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/skbuff.h |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bb78f70..28d842e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1711,6 +1711,30 @@ static inline void skb_orphan(struct sk_buff *skb)
 }
 
 /**
+ *	skb_orphan_frags - orphan the frags contained in a buffer
+ *	@skb: buffer to orphan frags from
+ *	@gfp_mask: allocation mask for replacement pages
+ *
+ *	For each frag in the SKB which needs a destructor (i.e. has an
+ *	owner) create a copy of that frag and release the original
+ *	page by calling the destructor.
+ */
+static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
+		return 0;
+	return skb_copy_ubufs(skb, gfp_mask);
+}
+
+
+static inline void skb_copy_frag_destructor(struct sk_buff *to,
+					    struct sk_buff *from)
+{
+	skb_shinfo(to)->tx_flags |= skb_shinfo(from)->tx_flags &
+		SKBTX_DEV_ZEROCOPY;
+}
+
+/**
  *	__skb_queue_purge - empty a list
  *	@list: list to empty
  *
-- 
MST

^ permalink raw reply related

* [PATCH RFC 1/6] skbuff: support per-page destructors in copy_ubufs
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

sunrpc wants to use zero copy with tcp which means
some fragments are zero copy while others are not.
This in turn means there's no per skb destructor_arg,
instead some fragments have destructors. Teach
skb_copy_ubufs and skb_release_data to handle such skbs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/skbuff.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c81240c..bd28e80 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -423,7 +423,7 @@ static void skb_release_data(struct sk_buff *skb)
 			struct ubuf_info *uarg;
 
 			uarg = skb_shinfo(skb)->destructor_arg;
-			if (uarg->callback)
+			if (uarg && uarg->callback)
 				uarg->callback(uarg);
 		}
 
@@ -721,6 +721,8 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	for (i = 0; i < num_frags; i++) {
 		u8 *vaddr;
 		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		if (unlikely((!uarg && !f->page.destructor)))
+			continue;
 
 		page = alloc_page(GFP_ATOMIC);
 		if (!page) {
@@ -740,13 +742,21 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	}
 
 	/* skb frags release userspace buffers */
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		skb_frag_unref(skb, i);
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		if (unlikely((!uarg && !f->page.destructor)))
+			continue;
+		__skb_frag_unref(f);
+	}
 
-	uarg->callback(uarg);
+	if (uarg)
+		uarg->callback(uarg);
 
 	/* skb frags point to kernel buffers */
 	for (i = skb_shinfo(skb)->nr_frags; i > 0; i--) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		if (unlikely((!uarg && !f->page.destructor)))
+			continue;
 		__skb_fill_page_desc(skb, i-1, head, 0,
 				     skb_shinfo(skb)->frags[i - 1].size);
 		head = (struct page *)head->private;
-- 
MST

^ permalink raw reply related

* [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
From: Michael S. Tsirkin @ 2012-05-07 13:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com

On Fri, May 04, 2012 at 11:51:31AM +0100, Ian Campbell wrote:
> On Fri, 2012-05-04 at 11:03 +0100, Michael S. Tsirkin wrote:
> > On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > Date: Fri, 4 May 2012 00:10:24 +0300
> > > 
> > > > Hmm we orphan skbs when we loop them back so how about reusing the
> > > > skb->destructor for this?
> > > 
> > > That's one possibility.

So originally I thought it would work: destructor would wake the
original owner which would do data copy and modify the fragments.  But
then I unfortunately realized that would be racy: the new owner could be
using the old frags and there appears no way for us to
make sure it doesn't so that we can put the original page.
And the same logic applies to modifying the frags at any
other time if the skb is cloned. So it seems we must copy if we
want to clone the skb.

Further, destructor itself can't do the copy because this needs
to allocate memory in atomic context, and destructor
itself can't fail.
For this second problem I see two solutions: either pre-allocate the
copy buffer just in case and track it together with the destructor, or
use an skb flag to make the check for destructors quick (if not
completely free).

Second option is what macvtap zero copy uses and it already
does copy on clone too. So I hacked that to make it support tcp/udp
used by sunrpc.

> > > 
> > > But I fear we're about to toss Ian into yet another rabbit hole. :-)
> > > 
> > > Let's try to converge on something quickly as I think integration of
> > > his work has been delayed enough as-is.

...

> > It's weekend here, I'll work on a patch like this
> > Sunday.
> 
> Thanks, I was starting to feel my nose twitching and my ears beginning
> to elongate ;-)
> 
> Ian.

Here's a first stub at a fix. Basically to be able to modify frags on
the fly we must make sure the skb isn't cloned, so the moment someone
clones the skb we need to trigger the frag copy logic.  And this is
exactly what happens with SKBTX_DEV_ZEROCOPY so it seems to make sense
to reuse that logic.

The below patchset replaces patch 7
([PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
in Ian's patchset and needs to be applied there.

Compiled only but I'd like to hear what people think all the same
because it does add a couple of branches on fast path.  On the other
hand this makes it generic so the same logic will be reusable for packet
sockets (which IIRC are currently buggy in the same way as sunrpc) and
for adding zero copy support to tun.

Please comment,
Thanks!

-- 
MST

Michael S. Tsirkin (6):
  skbuff: support per-page destructors in copy_ubufs
  skbuff: add an api to orphan frags
  skbuff: convert to skb_orphan_frags
  tun: orphan frags on xmit
  net: orphan frags on receive
  skbuff: set zerocopy flag on frag destructor

 drivers/net/tun.c      |    2 ++
 include/linux/skbuff.h |   41 +++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c         |    2 ++
 net/core/skbuff.c      |   43 +++++++++++++++++++++++++------------------
 4 files changed, 70 insertions(+), 18 deletions(-)

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Eric Dumazet @ 2012-05-07 13:53 UTC (permalink / raw)
  To: Johannes Berg; +Cc: netdev
In-Reply-To: <1336397946.4325.27.camel@jlt3.sipsolutions.net>

On Mon, 2012-05-07 at 15:39 +0200, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Neither compare_ether_addr() nor compare_ether_addr_64bits()
> (as it can fall back to the former) have comparison semantics
> like memcmp() where the sign of the return value indicates sort
> order. We had a bug in the wireless code due to a blind memcmp
> replacement because of this.
> 
> A cursory look suggests that the wireless bug was the only one
> due to this semantic difference.
> 
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>  include/linux/etherdevice.h |   11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)

The right way to avoid this kind of problems is to change these
functions to return a bool

^ permalink raw reply

* [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Johannes Berg @ 2012-05-07 13:39 UTC (permalink / raw)
  To: netdev

From: Johannes Berg <johannes.berg@intel.com>

Neither compare_ether_addr() nor compare_ether_addr_64bits()
(as it can fall back to the former) have comparison semantics
like memcmp() where the sign of the return value indicates sort
order. We had a bug in the wireless code due to a blind memcmp
replacement because of this.

A cursory look suggests that the wireless bug was the only one
due to this semantic difference.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 include/linux/etherdevice.h |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/include/linux/etherdevice.h	2012-04-12 05:40:35.000000000 +0200
+++ b/include/linux/etherdevice.h	2012-05-07 15:34:28.000000000 +0200
@@ -159,7 +159,8 @@ static inline void eth_hw_addr_random(st
  * @addr1: Pointer to a six-byte array containing the Ethernet address
  * @addr2: Pointer other six-byte array containing the Ethernet address
  *
- * Compare two ethernet addresses, returns 0 if equal
+ * Compare two ethernet addresses, returns 0 if equal, non-zero otherwise.
+ * Unlike memcmp(), it doesn't return a value suitable for sorting.
  */
 static inline unsigned compare_ether_addr(const u8 *addr1, const u8 *addr2)
 {
@@ -184,10 +185,10 @@ static inline unsigned long zap_last_2by
  * @addr1: Pointer to an array of 8 bytes
  * @addr2: Pointer to an other array of 8 bytes
  *
- * Compare two ethernet addresses, returns 0 if equal.
- * Same result than "memcmp(addr1, addr2, ETH_ALEN)" but without conditional
- * branches, and possibly long word memory accesses on CPU allowing cheap
- * unaligned memory reads.
+ * Compare two ethernet addresses, returns 0 if equal, non-zero otherwise.
+ * Unlike memcmp(), it doesn't return a value suitable for sorting.
+ * The function doesn't need any conditional branches and possibly uses
+ * word memory accesses on CPU allowing cheap unaligned memory reads.
  * arrays = { byte1, byte2, byte3, byte4, byte6, byte7, pad1, pad2}
  *
  * Please note that alignment of addr1 & addr2 is only guaranted to be 16 bits.

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07 12:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507122232.GA32146@1984>

On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > 
> > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > 
> > > > > Still curious, how can 0xfffc be useful?
> > > > 
> > > > That's a special case where an appl is using 4 ports.
> > > > But in general, have not seen other than "on/off" except for above.
> > > 
> > > I see. Well I'm fine with this way to switch on/off things, just
> > > wanted some clafication.
> > > 
> > > Still one final thing I'd like to remove before inclusion:
> > > 
> > > +       union hmark_ports       port_mask;
> > > +       union hmark_ports       port_set;
> > > +       __u32                   spi_mask;
> > > +       __u32                   spi_set;
> > > 
> > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > it.
> > 
> > No problems, I'll remove it.
> 
> OK. As a nice side-effect, this will lead to removing the branch that
> tests ESP/AH in hmark_set_tuple_ports.
>
Yes, only check if not ESP or AH to swap src/dst

+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+			info->port_set.v32;
+
+	if (t->proto != IPPROTO_ESP && t->proto != IPPROTO_AH)
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+}

> Please, use the patch that I sent you yesterday. Recover the swap
> behaviour that you need, I'll mangle the patch myself to add the
> little comment to explain why we do this with CT as well.
> 
> BTW, note that you do *not* have to remove the XT_HMARK_SPI flags, we
> still need those for iptables-save.
> 
> While at it:
> 
> +enum {                      
> +       XT_HMARK_NONE,       
> +       XT_HMARK_SADR_AND,   
> +       XT_HMARK_DADR_AND,   
> +       XT_HMARK_SPI_AND,    
> +       XT_HMARK_SPI_OR,    
> 
> remove all trailing _OR
> 
> +       XT_HMARK_SPORT_AND,  
> +       XT_HMARK_DPORT_AND,  
> +       XT_HMARK_SPORT_OR,   
> +       XT_HMARK_DPORT_OR,   
> +       XT_HMARK_PROTO_AND,
> 
> rename all _AND by _MASK.
> 
> +       XT_HMARK_RND,        
> +       XT_HMARK_MODULUS,    
> +       XT_HMARK_OFFSET,     
> +       XT_HMARK_CT,         
> +       XT_HMARK_METHOD_L3,  
> +       XT_HMARK_METHOD_L3_4,
> };
> 
> What I'm asking should require very little changes in the kernel-code.
> 

I'll send you the updates later to day

> > > In case you want to support different masks for AH/ESP and TCP, you
> > > could do the following:
> > > 
> > > iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
> > > iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc
> > > 
> > > Any objection?
> > 
> > I don't think this is a problem, but it should be written in the man page
> > that ports and spi share mask so they can't be used at the same time.
> 
> documentation is fine.
> 
> iptables can stop this by spotting a warning message from user-space.

If you think thats enough, I fine with that.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-07 12:22 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <201205071409.47945.hans.schillstrom@ericsson.com>

On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > We have plenty of rules where just source port mask is zero.
> > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > 
> > > > 0xffff and 0x0000 means on/off respectively.
> > > > 
> > > > Still curious, how can 0xfffc be useful?
> > > 
> > > That's a special case where an appl is using 4 ports.
> > > But in general, have not seen other than "on/off" except for above.
> > 
> > I see. Well I'm fine with this way to switch on/off things, just
> > wanted some clafication.
> > 
> > Still one final thing I'd like to remove before inclusion:
> > 
> > +       union hmark_ports       port_mask;
> > +       union hmark_ports       port_set;
> > +       __u32                   spi_mask;
> > +       __u32                   spi_set;
> > 
> > the spi_mask seems redundant. The port_mask already provides u32 for
> > it.
> 
> No problems, I'll remove it.

OK. As a nice side-effect, this will lead to removing the branch that
tests ESP/AH in hmark_set_tuple_ports.

Please, use the patch that I sent you yesterday. Recover the swap
behaviour that you need, I'll mangle the patch myself to add the
little comment to explain why we do this with CT as well.

BTW, note that you do *not* have to remove the XT_HMARK_SPI flags, we
still need those for iptables-save.

While at it:

+enum {                      
+       XT_HMARK_NONE,       
+       XT_HMARK_SADR_AND,   
+       XT_HMARK_DADR_AND,   
+       XT_HMARK_SPI_AND,    
+       XT_HMARK_SPI_OR,    

remove all trailing _OR

+       XT_HMARK_SPORT_AND,  
+       XT_HMARK_DPORT_AND,  
+       XT_HMARK_SPORT_OR,   
+       XT_HMARK_DPORT_OR,   
+       XT_HMARK_PROTO_AND,

rename all _AND by _MASK.

+       XT_HMARK_RND,        
+       XT_HMARK_MODULUS,    
+       XT_HMARK_OFFSET,     
+       XT_HMARK_CT,         
+       XT_HMARK_METHOD_L3,  
+       XT_HMARK_METHOD_L3_4,
};

What I'm asking should require very little changes in the kernel-code.

> > In case you want to support different masks for AH/ESP and TCP, you
> > could do the following:
> > 
> > iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
> > iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc
> > 
> > Any objection?
> 
> I don't think this is a problem, but it should be written in the man page
> that ports and spi share mask so they can't be used at the same time.

documentation is fine.

iptables can stop this by spotting a warning message from user-space.

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07 12:09 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507115612.GA31110@1984>

On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > We have plenty of rules where just source port mask is zero.
> > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > 
> > > 0xffff and 0x0000 means on/off respectively.
> > > 
> > > Still curious, how can 0xfffc be useful?
> > 
> > That's a special case where an appl is using 4 ports.
> > But in general, have not seen other than "on/off" except for above.
> 
> I see. Well I'm fine with this way to switch on/off things, just
> wanted some clafication.
> 
> Still one final thing I'd like to remove before inclusion:
> 
> +       union hmark_ports       port_mask;
> +       union hmark_ports       port_set;
> +       __u32                   spi_mask;
> +       __u32                   spi_set;
> 
> the spi_mask seems redundant. The port_mask already provides u32 for
> it.

No problems, I'll remove it.

> In case you want to support different masks for AH/ESP and TCP, you
> could do the following:
> 
> iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
> iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc
> 
> Any objection?

I don't think this is a problem, but it should be written in the man page
that ports and spi share mask so they can't be used at the same time.


> Yes, you'll have to change user-space again, but we have time for
> that.

:-)

> 
> > > > > I'm also telling this because I think that ICMP support will be
> > > > > easier to add if port masking is removed.
> > > > > 
> > > > > [...]
> > > > > > This is what I have done.
> > > > > >
> > > > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > > > >   (it's not set in the rtuple)
> > > > > 
> > > > > Good one, this made the code even smaller.
> > > > > 
> > > > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > > > 
> > > > > Not really, you don't need for the conntrack part. The original tuple
> > > > > is always the same, not matter where the packet is coming from. I have
> > > > > removed this again so it only affects packet-based hashing.
> > > > 
> > > > Yes original tuple is always the same but not always less than the rtuple.
> > > > If you have two nodes that should produce the same hmark,
> > > > one with conntrack an one without you must make a compare to make it consistent.
> > > 
> > > I see, for consistency still makes sense although this seems to me
> > > like still strange configuration. In what scenario would you use two
> > > different approaches?
> > 
> > In the way that we use HMARK,
> > in the incomming path there is conntrack disabled in the contrainer, 
> > for the outgoing patch i.e. at the payloads there is conntrack used.
> > In that case the --hmark-ct makes life easier.
> 
> That's still not enough to guarantee that the mark will be consistent
> if NAT is in user, but I don't mind recovering the swap and add some
> comment on the code to explain this if this makes your life easier.

Thanks,  I will send a new patch soon.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-07 11:56 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <201205071114.35324.hans.schillstrom@ericsson.com>

On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > We have plenty of rules where just source port mask is zero.
> > > and the dest-port-mask is 0xfffc (or 0xffff)
> > 
> > 0xffff and 0x0000 means on/off respectively.
> > 
> > Still curious, how can 0xfffc be useful?
> 
> That's a special case where an appl is using 4 ports.
> But in general, have not seen other than "on/off" except for above.

I see. Well I'm fine with this way to switch on/off things, just
wanted some clafication.

Still one final thing I'd like to remove before inclusion:

+       union hmark_ports       port_mask;
+       union hmark_ports       port_set;
+       __u32                   spi_mask;
+       __u32                   spi_set;

the spi_mask seems redundant. The port_mask already provides u32 for
it.

In case you want to support different masks for AH/ESP and TCP, you
could do the following:

iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc

Any objection?

Yes, you'll have to change user-space again, but we have time for
that.

> > > > I'm also telling this because I think that ICMP support will be
> > > > easier to add if port masking is removed.
> > > > 
> > > > [...]
> > > > > This is what I have done.
> > > > >
> > > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > > >   (it's not set in the rtuple)
> > > > 
> > > > Good one, this made the code even smaller.
> > > > 
> > > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > > 
> > > > Not really, you don't need for the conntrack part. The original tuple
> > > > is always the same, not matter where the packet is coming from. I have
> > > > removed this again so it only affects packet-based hashing.
> > > 
> > > Yes original tuple is always the same but not always less than the rtuple.
> > > If you have two nodes that should produce the same hmark,
> > > one with conntrack an one without you must make a compare to make it consistent.
> > 
> > I see, for consistency still makes sense although this seems to me
> > like still strange configuration. In what scenario would you use two
> > different approaches?
> 
> In the way that we use HMARK,
> in the incomming path there is conntrack disabled in the contrainer, 
> for the outgoing patch i.e. at the payloads there is conntrack used.
> In that case the --hmark-ct makes life easier.

That's still not enough to guarantee that the mark will be consistent
if NAT is in user, but I don't mind recovering the swap and add some
comment on the code to explain this if this makes your life easier.

^ permalink raw reply

* batostr() function
From: Johannes Berg @ 2012-05-07 11:49 UTC (permalink / raw)
  To: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev

Really? 2 static buffers that are used alternately based on a static
variable? How can that possibly be thread-safe? That may work in very
restricted scenarios, but ...

johannes

^ permalink raw reply

* New commands to configure IOV features
From: Yuval Mintz @ 2012-05-07 11:17 UTC (permalink / raw)
  To: gregory.v.rose; +Cc: Ben Hutchings, netdev@vger.kernel.org

I've tried to figure out if there was a standard interface
(ethtool/iproute) through which a user could configure the number
of vfs in his system.

I've seen the RFC suggested in http://markmail.org/thread/qblfcv7zbxsxp7q6,
and http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
later references to it (commits or further discussion on this topic).

How exactly are things standing with these RFCs? Were they abandoned?

Thanks,
Yuval

^ permalink raw reply

* Re: [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors
From: Michael S. Tsirkin @ 2012-05-07 10:24 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev, David Miller, Eric Dumazet
In-Reply-To: <1336056971-7839-7-git-send-email-ian.campbell@citrix.com>

On Thu, May 03, 2012 at 03:56:09PM +0100, Ian Campbell wrote:
> This should be used by drivers which need to hold on to an skb for an extended
> (perhaps unbounded) period of time. e.g. the tun driver which relies on
> userspace consuming the skb.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: mst@redhat.com
> ---
>  drivers/net/tun.c      |    1 +
>  include/linux/skbuff.h |   11 ++++++++
>  net/core/skbuff.c      |   68 ++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 61 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..b53e04e 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -415,6 +415,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	/* Orphan the skb - required as we might hang on to it
>  	 * for indefinite time. */
>  	skb_orphan(skb);
> +	skb_orphan_frags(skb, GFP_KERNEL);
>  
>  	/* Enqueue packet */
>  	skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index ccc7d93..9145f83 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1711,6 +1711,17 @@ static inline void skb_orphan(struct sk_buff *skb)
>  }
>  
>  /**
> + *	skb_orphan_frags - orphan the frags contained in a buffer
> + *	@skb: buffer to orphan frags from
> + *	@gfp_mask: allocation mask for replacement pages
> + *
> + *	For each frag in the SKB which has a destructor (i.e. has an
> + *	owner) create a copy of that frag and release the original
> + *	page by calling the destructor.
> + */
> +extern int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask);
> +
> +/**
>   *	__skb_queue_purge - empty a list
>   *	@list: list to empty
>   *
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 945b807..f009abb 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -697,31 +697,25 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
>  }
>  EXPORT_SYMBOL_GPL(skb_morph);
>  
> -/*	skb_copy_ubufs	-	copy userspace skb frags buffers to kernel
> - *	@skb: the skb to modify
> - *	@gfp_mask: allocation priority
> - *
> - *	This must be called on SKBTX_DEV_ZEROCOPY skb.
> - *	It will copy all frags into kernel and drop the reference
> - *	to userspace pages.
> - *
> - *	If this function is called from an interrupt gfp_mask() must be
> - *	%GFP_ATOMIC.
> - *
> - *	Returns 0 on success or a negative error code on failure
> - *	to allocate kernel memory to copy to.
> +/*
> + * If uarg != NULL copy and replace all frags.
> + * If uarg == NULL then only copy and replace those which have a destructor
> + * pointer.
>   */
> -int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
> +static int skb_copy_frags(struct sk_buff *skb, gfp_t gfp_mask,
> +			  struct ubuf_info *uarg)
>  {
>  	int i;
>  	int num_frags = skb_shinfo(skb)->nr_frags;
>  	struct page *page, *head = NULL;
> -	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
>  
>  	for (i = 0; i < num_frags; i++) {
>  		u8 *vaddr;
>  		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
>  
> +		if (!uarg && !f->page.destructor)
> +			continue;
> +
>  		page = alloc_page(GFP_ATOMIC);
>  		if (!page) {
>  			while (head) {
> @@ -739,11 +733,16 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>  		head = page;
>  	}
>  
> -	/* skb frags release userspace buffers */
> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> +	/* skb frags release buffers */
> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
> +		if (!uarg && !f->page.destructor)
> +			continue;
>  		skb_frag_unref(skb, i);
> +	}
>  
> -	uarg->callback(uarg);
> +	if (uarg)
> +		uarg->callback(uarg);
>  

So above we only linked up copied pages, but below we
try to use the list for all frags. Looks like a bug,
I think it needs to check destructor and uarg too.


>  	/* skb frags point to kernel buffers */
>  	for (i = skb_shinfo(skb)->nr_frags; i > 0; i--) {
> @@ -752,10 +751,41 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>  		head = (struct page *)head->private;
>  	}
>  
> -	skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
>  	return 0;
>  }
>  
> +/*	skb_copy_ubufs	-	copy userspace skb frags buffers to kernel
> + *	@skb: the skb to modify
> + *	@gfp_mask: allocation priority
> + *
> + *	This must be called on SKBTX_DEV_ZEROCOPY skb.
> + *	It will copy all frags into kernel and drop the reference
> + *	to userspace pages.
> + *
> + *	If this function is called from an interrupt gfp_mask() must be
> + *	%GFP_ATOMIC.
> + *
> + *	Returns 0 on success or a negative error code on failure
> + *	to allocate kernel memory to copy to.
> + */
> +int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
> +{
> +	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
> +	int rc;
> +
> +	rc = skb_copy_frags(skb, gfp_mask, uarg);
> +
> +	if (rc == 0)
> +		skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
> +
> +	return rc;
> +}
> +
> +int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
> +{
> +	return skb_copy_frags(skb, gfp_mask, NULL);
> +}
> +EXPORT_SYMBOL(skb_orphan_frags);
>  
>  /**
>   *	skb_clone	-	duplicate an sk_buff
> -- 
> 1.7.2.5

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07  9:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507090328.GA27650@1984>

On Monday 07 May 2012 11:03:28 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 10:20:42AM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> > > Hi Hans,
> > > 
> > > [...]
> > > > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > > > >
> > > > > > Yes why not, I can give it a try.
> > > > > >
> > > >
> > > > I think we wait with this one..
> > > 
> > > I see. This is easy to add for the conntrack side, but it will require
> > > some extra code for the packet-based solution.
> > 
> > Actually I think there is very little gain to spread with type 
> > and then we must add a user mode possibility to turn it off 
> > i.e. a --hmark-icmp-type-mask 
> > 
> > > Not directly related to this but, I know that your intention is to
> > > make this as flexible as possible. However, I still don't find how I
> > > would use the port mask feature in any of my setups.  Basically, I
> > > don't come up with any useful example for this situation.
> > 
> > We have plenty of rules where just source port mask is zero.
> > and the dest-port-mask is 0xfffc (or 0xffff)
> 
> 0xffff and 0x0000 means on/off respectively.
> 
> Still curious, how can 0xfffc be useful?

That's a special case where an appl is using 4 ports.
But in general, have not seen other than "on/off" except for above.

> 
> > > I'm also telling this because I think that ICMP support will be
> > > easier to add if port masking is removed.
> > > 
> > > [...]
> > > > This is what I have done.
> > > >
> > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > >   (it's not set in the rtuple)
> > > 
> > > Good one, this made the code even smaller.
> > > 
> > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > 
> > > Not really, you don't need for the conntrack part. The original tuple
> > > is always the same, not matter where the packet is coming from. I have
> > > removed this again so it only affects packet-based hashing.
> > 
> > Yes original tuple is always the same but not always less than the rtuple.
> > If you have two nodes that should produce the same hmark,
> > one with conntrack an one without you must make a compare to make it consistent.
> 
> I see, for consistency still makes sense although this seems to me
> like still strange configuration. In what scenario would you use two
> different approaches?

In the way that we use HMARK,
in the incomming path there is conntrack disabled in the contrainer, 
for the outgoing patch i.e. at the payloads there is conntrack used.
In that case the --hmark-ct makes life easier.

> 
> > > > - Moved the L3 check a little bit earlier.
> > > 
> > > good.
> > > 
> > > > - changed return values for fragments.
> > > 
> > > With this, you're giving up on trying to classify fragments. Do you
> > > really want this?
> > > 
> > > From my point of view, if your firewalls (assuming they are the HMARK
> > > classification) are stateless, it still makes sense to me to classify
> > > fragments using the XT_HMARK_METHOD_L3_4.
> > 
> > I do agree, it is back to "return 0" again.
> 
> OK.
> 

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* [net-next 2/2] stmmac: add mixed burst for DMA
From: Giuseppe CAVALLARO @ 2012-05-07  9:12 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1336381953-18041-1-git-send-email-peppe.cavallaro@st.com>

In mixed burst (MB) mode, the AHB master always initiates
the bursts with fixed-size when the DMA requests transfers
of size less than or equal to 16 beats.
This patch adds the MB support and the flag that can be
passed from the platform to select it.
MB mode can also give some benefits in terms of performances
on some platforms.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h       |    4 ++--
 drivers/net/ethernet/stmicro/stmmac/dwmac1000.h    |    1 +
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |    6 +++++-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |    2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |    6 ++++--
 include/linux/stmmac.h                             |    1 +
 6 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 7164509..b343b71 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -247,8 +247,8 @@ struct stmmac_desc_ops {
 
 struct stmmac_dma_ops {
 	/* DMA core initialization */
-	int (*init) (void __iomem *ioaddr, int pbl, int fb, int burst_len,
-			u32 dma_tx, u32 dma_rx);
+	int (*init) (void __iomem *ioaddr, int pbl, int fb, int mb,
+			int burst_len, u32 dma_tx, u32 dma_rx);
 	/* Dump DMA registers */
 	void (*dump_regs) (void __iomem *ioaddr);
 	/* Set tx/rx threshold in the csr6 register
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index 53ed56b..f02162f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -141,6 +141,7 @@ enum rx_tx_priority_ratio {
 };
 
 #define DMA_BUS_MODE_FB		0x00010000	/* Fixed burst */
+#define DMA_BUS_MODE_MB		0x04000000	/* Mixed burst */
 #define DMA_BUS_MODE_RPBL_MASK	0x003e0000	/* Rx-Programmable Burst Len */
 #define DMA_BUS_MODE_RPBL_SHIFT	17
 #define DMA_BUS_MODE_USP	0x00800000
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 3675c57..15e1aa1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -30,7 +30,7 @@
 #include "dwmac1000.h"
 #include "dwmac_dma.h"
 
-static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb,
+static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
 			      int burst_len, u32 dma_tx, u32 dma_rx)
 {
 	u32 value = readl(ioaddr + DMA_BUS_MODE);
@@ -66,6 +66,10 @@ static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb,
 	if (fb)
 		value |= DMA_BUS_MODE_FB;
 
+	/* Mixed Burst has no effect when fb is set */
+	if (mb)
+		value |= DMA_BUS_MODE_MB;
+
 #ifdef CONFIG_STMMAC_DA
 	value |= DMA_BUS_MODE_DA;	/* Rx has priority over tx */
 #endif
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
index 92ed2e0..e4eca62 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
@@ -32,7 +32,7 @@
 #include "dwmac100.h"
 #include "dwmac_dma.h"
 
-static int dwmac100_dma_init(void __iomem *ioaddr, int pbl, int fb,
+static int dwmac100_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
 			     int burst_len, u32 dma_tx, u32 dma_rx)
 {
 	u32 value = readl(ioaddr + DMA_BUS_MODE);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a9699ae..4fd62ff 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -924,7 +924,8 @@ static void stmmac_check_ether_addr(struct stmmac_priv *priv)
 
 static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 {
-	int pbl = DEFAULT_DMA_PBL, fixed_burst = 0, burst_len = 0;
+	int pbl = DEFAULT_DMA_PBL, fixed_burst = 0, burst_len = 0,
+	    mixed_burst = 0;
 
 	/* Some DMA parameters can be passed from the platform;
 	 * in case of these are not passed we keep a default
@@ -932,10 +933,11 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 	if (priv->plat->dma_cfg) {
 		pbl = priv->plat->dma_cfg->pbl;
 		fixed_burst = priv->plat->dma_cfg->fixed_burst;
+		mixed_burst = priv->plat->dma_cfg->mixed_burst;
 		burst_len = priv->plat->dma_cfg->burst_len;
 	}
 
-	return priv->hw->dma->init(priv->ioaddr, pbl, fixed_burst,
+	return priv->hw->dma->init(priv->ioaddr, pbl, fixed_burst, mixed_burst,
 				   burst_len, priv->dma_tx_phy,
 				   priv->dma_rx_phy);
 }
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index f85c93d..b69bdb1 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -86,6 +86,7 @@ struct stmmac_mdio_bus_data {
 struct stmmac_dma_cfg {
 	int pbl;
 	int fixed_burst;
+	int mixed_burst;
 	int burst_len;
 };
 
-- 
1.7.4.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox