Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: UDP path MTU discovery
From: Glen Turner @ 2010-03-31 23:43 UTC (permalink / raw)
  To: Rick Jones; +Cc: Andi Kleen, netdev
In-Reply-To: <4BB0DCF6.9020401@hp.com>

On Mon, 2010-03-29 at 10:01 -0700, Rick Jones wrote:

> But which of the last N datagrams sent by the application should be retained for 
> retransmission?  It could be scores if not hundreds of datagrams depending on 
> the behaviour of the application and the latency to the narrow part of the network.

We don't need that sort of exotica from the kernel.  The applications
have to be prepared to retransmit lost packets in any case.

What we need is an API for an instant notification that a ICMP Packet
Too Big message has arrived concerning the socket.

Then the application simply retransmits immediately, without adding
to the exponential backoff penalty which the application maintains.
The application maintain a overall packet-transmitted limit to prevent
a DoS.

>From this application behaviour the kernel sees a stream of packets
it can use for UDP Path MTU Discovery (paced at the RTT, so not
contributing to congestion collapse). That stream halts when the
first packet makes it to the end system.

As for David Miller's rant, the applications currently have no choice
but to "do it stupidly" as the kernel doesn't pass enough information
for user space to do it intelligently.  If the kernel passed user space
the same indication as TCP gets, then we could -- and would -- do it
right.

Re-writing the applications to take advantage of the API is no great
shakes -- there aren't many of them, they are written by people with
a good knowledge of networking, but unfortunately they tend to do
important stuff (allocate addresses, serve names, authenticate link
layer access).

It would be nice if the API had some commonality between platforms.
But there's no shortage of #ifdefs already, and one more to make
these applications work well for IPv6 on jumbo frames on the platform
of choice for networking infrastructure would be seen by application
authors as well worthwhile.

Thanks for your consideration,
Glen

-- 
 Glen Turner
 www.gdt.id.au/~gdt

^ permalink raw reply

* Re: UDP path MTU discovery
From: Glen Turner @ 2010-03-31 23:42 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev
In-Reply-To: <20100325.202636.149498207.davem@davemloft.net>

On Thu, 2010-03-25 at 20:26 -0700, David Miller wrote:
> We already provide this information.
> 
> The socket ends up with EMSGSIZE in it's error queue, so the next time
> the application does I/O it sees that error immediately from the
> read/write call and thus knows that path MTU arrived.

Thanks David.

Does select() return from its blocking so the application can make
use of this indication immediately, rather than after the
application's exponentially-increasing wait?

Is an incoming ICMP the only cause of EMSGSIZE?  That is, can an
application safely retransmit immediately?

-- 
 Glen Turner
 www.gdt.id.au/~gdt

^ permalink raw reply

* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: Ben Hutchings @ 2010-03-31 23:28 UTC (permalink / raw)
  To: Greg KH
  Cc: L. Alberto Gim?nez, linux-kernel, netdev, linux-usb, oliver,
	linville, j.dumon, steve.glendinning, davem, dgiagio, dborca
In-Reply-To: <20100331232516.GA27730@suse.de>

On Wed, 2010-03-31 at 16:25 -0700, Greg KH wrote:
> On Thu, Apr 01, 2010 at 12:18:58AM +0100, Ben Hutchings wrote:
> > On Wed, 2010-03-31 at 21:42 +0200, L. Alberto Gim??nez wrote:
[...]
> > > +#include <linux/usb.h>
> > > +#include <linux/workqueue.h>
> > > +
> > > +#define USB_VENDOR_APPLE        0x05ac
> > > +#define USB_PRODUCT_IPHETH     0x1290
> > > +#define USB_PRODUCT_IPHETH_3G   0x1292
> > > +#define USB_PRODUCT_IPHETH_3GS  0x1294
> > 
> > Apple doesn't assign device ids to ipheth so either the names are
> > incorrect or you should get proper device ids.  I believe the Linux
> > Foundation has a USB vendor id and could assign device ids under that.
> 
> You mean there is an application on the iphone that sets the device id
> here?  I thought this was the iphone device ids already assigned to
> apple.
[...]

That's why my first suggestion was that the names are incorrect; they
should probably include IPHONE not IPHETH.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* RE: re-submit2 [ANNOUNCEMENT] NET: usb: sierra_net.c driver
From: Ben Hutchings @ 2010-03-31 23:26 UTC (permalink / raw)
  To: Elina Pasheva
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Rory Filer,
	dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1270069108.6053.5.camel@Linuxdev3>

On Wed, 2010-03-31 at 13:58 -0700, Elina Pasheva wrote:
> On Tue, 2010-03-30 at 23:21 -0700, Rory Filer wrote:
> > -----Original Message-----
> > > >
> > > > Applied, thanks.
> > > 
> > > Nevermind, reverted, it doesn't even build.
> > > 
> > > drivers/net/usb/sierra_net.c: In function 'check_ethip_packet':
> > > drivers/net/usb/sierra_net.c:221:3: error: implicit declaration of
> > > function 'deverr'
> > > 
> > > Really, this should never ever happen, and there is simply
> > > no excuse at all for this.
> > 
> > Well actually there is an excuse, but I'm sure you don't want to
> > know what it is. 
> 
> The patch was posted saying it had only been tested on 2.6.33.1.
> There was failure-to-build and boot problems  beyond our control with 2.6.34-rc2.
> Now that 2.6.34-rc3 is available (and we are able to build and boot on our Ubuntu platform)
> I will modify the driver to reflect the fact that deverr is no longer in existence
> and is replaced by another function. I will submit the modified driver tested on 2.6.34-rc3 shortly.

You should be sending patches based on David Miller's net-next-2.6 or
net-2.6 git repository, where he will be applying them.  As this is a
new driver I assume it can go into net-2.6.

> Perhaps we should have waited until the build problem was fixed.

Or come up with a fix yourself.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: Greg KH @ 2010-03-31 23:25 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: L. Alberto Gim?nez, linux-kernel, netdev, linux-usb, oliver,
	linville, j.dumon, steve.glendinning, davem, dgiagio, dborca
In-Reply-To: <1270077538.8653.484.camel@localhost>

On Thu, Apr 01, 2010 at 12:18:58AM +0100, Ben Hutchings wrote:
> On Wed, 2010-03-31 at 21:42 +0200, L. Alberto Gim??nez wrote:
> [...]
> > --- /dev/null
> > +++ b/drivers/net/usb/ipheth.c
> [...]
> > +#include <linux/kernel.h>
> > +#include <linux/errno.h>
> > +#include <linux/init.h>
> > +#include <linux/slab.h>
> > +#include <linux/module.h>
> > +#include <linux/netdevice.h>
> > +#include <linux/etherdevice.h>
> > +#include <linux/ethtool.h>
> > +#include <linux/uaccess.h>
> 
> I don't think you need this header.
> 
> > +#include <linux/usb.h>
> > +#include <linux/workqueue.h>
> > +
> > +#define USB_VENDOR_APPLE        0x05ac
> > +#define USB_PRODUCT_IPHETH     0x1290
> > +#define USB_PRODUCT_IPHETH_3G   0x1292
> > +#define USB_PRODUCT_IPHETH_3GS  0x1294
> 
> Apple doesn't assign device ids to ipheth so either the names are
> incorrect or you should get proper device ids.  I believe the Linux
> Foundation has a USB vendor id and could assign device ids under that.

You mean there is an application on the iphone that sets the device id
here?  I thought this was the iphone device ids already assigned to
apple.

If we need a linux-specific device id, I can assign them from the Linux
foundations pool of device ids, but so far I've only done that for
devices that either run Linux (usb gadgets), or for the usb root hubs.
But hey, if it's controlable from software, let me know and I'll dole
one out.  In that case, why would you need 3 different device ids?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: Ben Hutchings @ 2010-03-31 23:18 UTC (permalink / raw)
  To: L. Alberto Giménez
  Cc: linux-kernel, netdev, linux-usb, oliver, linville, j.dumon,
	steve.glendinning, davem, gregkh, dgiagio, dborca
In-Reply-To: <1270064527-8485-1-git-send-email-agimenez@sysvalve.es>

On Wed, 2010-03-31 at 21:42 +0200, L. Alberto Giménez wrote:
[...]
> --- /dev/null
> +++ b/drivers/net/usb/ipheth.c
[...]
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/ethtool.h>
> +#include <linux/uaccess.h>

I don't think you need this header.

> +#include <linux/usb.h>
> +#include <linux/workqueue.h>
> +
> +#define USB_VENDOR_APPLE        0x05ac
> +#define USB_PRODUCT_IPHETH     0x1290
> +#define USB_PRODUCT_IPHETH_3G   0x1292
> +#define USB_PRODUCT_IPHETH_3GS  0x1294

Apple doesn't assign device ids to ipheth so either the names are
incorrect or you should get proper device ids.  I believe the Linux
Foundation has a USB vendor id and could assign device ids under that.

> +struct ipheth_device {
> +	struct usb_device *udev;
> +	struct usb_interface *intf;
> +	struct net_device *net;
> +	struct net_device_stats stats;

You can store stats in the net_device.

> +	struct sk_buff *tx_skb;
> +	struct urb *tx_urb;
> +	struct urb *rx_urb;
> +	unsigned char *tx_buf;
> +	unsigned char *rx_buf;
> +	unsigned char *ctrl_buf;
> +	__u8 bulk_in;
> +	__u8 bulk_out;

No need for the double-underscores; that's a convention for use in
definitions shared with user-space.

> +	struct delayed_work carrier_work;
> +};
[...]
> +static int ipheth_open(struct net_device *net)
> +{
> +	struct ipheth_device *dev = netdev_priv(net);
> +	struct usb_device *udev = dev->udev;
> +	int retval = 0;
> +
> +	usb_set_interface(udev, IPHETH_INTFNUM, IPHETH_ALT_INTFNUM);
> +	usb_clear_halt(udev, usb_rcvbulkpipe(udev, dev->bulk_in));
> +	usb_clear_halt(udev, usb_sndbulkpipe(udev, dev->bulk_out));
> +
> +	retval = ipheth_carrier_set(dev);
> +	if (retval)
> +		goto error;
> +
> +	retval = ipheth_rx_submit(dev, GFP_KERNEL);
> +	if (retval)
> +		goto error;
> +
> +	schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);
> +	netif_start_queue(net);
> +error:
> +	return retval;
> +}

There is no cleanup here (and none appears to be needed) so you could
take out the label and replace the gotos with returns.

> +static int ipheth_tx(struct sk_buff *skb, struct net_device *net)
> +{
> +	struct ipheth_device *dev = netdev_priv(net);
> +	struct usb_device *udev = dev->udev;
> +	int retval;
> +
> +	/* Paranoid */
> +	if (skb->len > IPHETH_BUF_SIZE) {
> +		err("%s: skb too large: %d bytes", __func__, skb->len);
> +		dev->stats.tx_dropped++;
> +		dev_kfree_skb_irq(skb);
> +		goto exit;
> +	}

Use WARN here as this would indicate a bug.

Again the goto is unnecessary.

> +	memset(dev->tx_buf, 0, IPHETH_BUF_SIZE);
> +	memcpy(dev->tx_buf, skb->data, skb->len);

This is wasteful.  If you really must pad the buffer then do so, but
don't clear the area that is going to be filled with real data.

> +	usb_fill_bulk_urb(dev->tx_urb, udev,
> +			  usb_sndbulkpipe(udev, dev->bulk_out),
> +			  dev->tx_buf, IPHETH_BUF_SIZE,
> +			  ipheth_sndbulk_callback,
> +			  dev);
> +	dev->tx_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
> +
> +	retval = usb_submit_urb(dev->tx_urb, GFP_ATOMIC);
> +	if (retval) {
> +		err("%s: usb_submit_urb: %d", __func__, retval);
> +		dev->stats.tx_errors++;
> +		dev_kfree_skb_irq(skb);
> +	} else {
> +		net->trans_start = jiffies;

No longer needed.

> +		dev->tx_skb = skb;
> +
> +		dev->stats.tx_packets++;
> +		dev->stats.tx_bytes += skb->len;
> +		netif_stop_queue(net);
> +	}
> +exit:
> +	return NETDEV_TX_OK;
> +}
[...]
> +#ifdef HAVE_NET_DEVICE_OPS
> +static const struct net_device_ops ipheth_netdev_ops = {
> +       .ndo_open = &ipheth_open,
> +       .ndo_stop = &ipheth_close,
> +       .ndo_start_xmit = &ipheth_tx,
> +       .ndo_tx_timeout = &ipheth_tx_timeout,
> +       .ndo_get_stats = &ipheth_stats,
> +};
> +#endif

Remove the #ifdef, there is no question whether we have net_device_ops.

> +static int ipheth_probe(struct usb_interface *intf,
> +			 const struct usb_device_id *id)
> +{
[...]
> +#ifdef HAVE_NET_DEVICE_OPS
> +	netdev->netdev_ops = &ipheth_netdev_ops;
> +#else /* CONFIG_COMPAT_NET_DEV_OPS */
> +	netdev->open = &ipheth_open;
> +	netdev->stop = &ipheth_close;
> +	netdev->hard_start_xmit = &ipheth_tx;
> +	netdev->tx_timeout = &ipheth_tx_timeout;
> +	netdev->get_stats = &ipheth_stats;
> +#endif
[...]

Similarly here.

As a general point, you should now be using netdev_err(), netdev_info(),
etc. for logging messages related to net devices.

I have no idea about USB so I haven't checked the USB API usage at all.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v3 04/12] l2tp: Add ppp device name to L2TP ppp session data
From: Stephen Hemminger @ 2010-03-31 23:08 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev
In-Reply-To: <4BB2FD0D.1080105@katalix.com>

On Wed, 31 Mar 2010 08:43:09 +0100
James Chapman <jchapman@katalix.com> wrote:

> Stephen Hemminger wrote:
> > On Tue, 30 Mar 2010 17:17:46 +0100
> > James Chapman <jchapman@katalix.com> wrote:
> > 
> >> When dumping L2TP PPP sessions using /proc/net/l2tp, get
> >> the assigned PPP device name from PPP using ppp_dev_name().
> >>
> >> Signed-off-by: James Chapman <jchapman@katalix.com>
> >> Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
> >>
> > 
> > Why is this a necessary API?
> > Why not put it in debugfs if just a debugging tool?
> 
> With the original driver (merged in 2.6.23), some people use horrible
> hacks in scripts to derive info about their L2TP connections from /proc.
> So I was reluctant to move it to debugfs in the new driver. If it is ok
> to move an existing /proc file to debugfs, I'm happy to do so. People
> should obtain such info from their L2TP userspace daemon, or through
> netlink anyway.
> 
> 


Sounds like a good use of sysfs either with attribute or symlink
back to underlying device

^ permalink raw reply

* Re: iproute2: silence errors about kernel missing 6rd on "ip tun show".
From: Stephen Hemminger @ 2010-03-31 23:06 UTC (permalink / raw)
  To: Andreas Henriksson; +Cc: Alexandre Cassen, 575970, netdev
In-Reply-To: <20100331080854.GA3810@amd64.fatal.se>

On Wed, 31 Mar 2010 10:08:54 +0200
Andreas Henriksson <andreas@fatal.se> wrote:

> Hello!
> 
> As reported in http://bugs.debian.org/575970 there is currently a warning
> printed for every tunnel when using latest iproute2 on atleast <= 2.6.32
> kernels (missing 6rd?!).
> 
> The attached patch avoids perror when errno is EINVAL, which I assume
> is the way to detect missing 6rd support. A better/cleaner
> method to detect and avoid 6rd when there's no kernel support
> is more then welcome.
> 
> Regards,
> Andreas Henriksson
> 

I will wait (a little while) to see if Alexandre has a preferred alternative.

^ permalink raw reply

* Re: [Patch] bonding: fix potential deadlock in bond_uninit()
From: Stephen Hemminger @ 2010-03-31 23:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Amerigo Wang, linux-kernel, Jiri Pirko, netdev, David S. Miller,
	bonding-devel, Jay Vosburgh
In-Reply-To: <m1d3ykzq5a.fsf@fess.ebiederm.org>

On Wed, 31 Mar 2010 04:28:33 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Amerigo Wang <amwang@redhat.com> writes:
> 
> > bond_uninit() is invoked with rtnl_lock held, when it does destroy_workqueue()
> > which will potentially flush all works in this workqueue, if we hold rtnl_lock
> > again in the work function, it will deadlock.
> >
> > So unlock rtnl_lock before calling destroy_workqueue().
> 
> Ouch.  That seems rather rude to our caller, and likely very
> dangerous.
> 
> Is this a deadlock you actually hit, or is this something lockdep
> warned about?
> 
> My gut feel says we need to move the destroy_workqueue into
> the network device destructor.
> 
> Eric

Why is there one workqueue per bond device rather than just one workqueue for
all bonding devices controlled by the module instance? It would be cleaner
on removal and less space and overhead.  I can't see that doing arp/mii or alb
work is high parallel and load activity.

^ permalink raw reply

* Re: [RFC] SPD basic actions per netdev
From: jamal @ 2010-03-31 22:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, David S. Miller, Patrick McHardy, netdev
In-Reply-To: <1270053478.26743.111.camel@bigi>

[-- Attachment #1: Type: text/plain, Size: 2130 bytes --]


And here's something i just tested on net-next that
fixes this for me.

cheers,
jamal

On Wed, 2010-03-31 at 12:38 -0400, jamal wrote:
> This may be oversight in current implementation and possibly
> nobody has needed it before - hence it is not functional.
> 
> I want to have a drop-all policy on a per-interface level
> for incoming packets and add exceptions as i need them.
> [using the flow table is cheap if you have xfrm built in].
> i.e something along the lines of:
> 
> #eth0, wild-card drop all
> ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 dev eth0 \
>        dir in ptype main action block priority $SOME-HIGH-value
> #eth0, exception
> ip xfrm policy add blah blah dev eth0 \
> dir in ptype main action allow priority $SOME-small-value
> #eth1, wild-card drop all
> ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 dev eth1 \
>        dir in ptype main action block priority $SOME-HIGH-value
> #eth1 exception ...
> 
> The problem is this works as long as i dont specify an interface.
> i.e, this would work in the in-direction:
> 
> ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
>         dir in ptype main action block priority $SOME-HIGH-value
> 
> This would not work:
> ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 dev eth0 \
>        dir in ptype main action block priority $SOME-HIGH-value
> 
> 
> The checks in the selector matching is the culprit, example for v4:
> 
> __xfrm4_selector_match(struct xfrm_selector *sel, struct flowi *fl)
> {
>         return  .... &&
>                 .... &&
>                 (fl->oif == sel->ifindex || !sel->ifindex);
> }
> 
> i.e in the second case i have a non-zero sel->ifindex but
> a zero fl->oif; so it wont match.
> 
> One approach to fix this is to pass the direction then i can do
> in the function call, then i can do something along the lines of
> matching if:
> (fl_dir == FLOW_DIR_IN && (fl->iif == sel->ifindex || !sel->ifindex) ||
> (fl->oif == sel->ifindex || !sel->ifindex);
> 
> Is there any reason the selector matching only assumes fl->oif?
> Are there any unforeseen issues/breakages if i added a check for the
> above.
> 
> cheers,
> jamal

[-- Attachment #2: spd-per-intf --]
[-- Type: text/x-patch, Size: 6044 bytes --]

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index d74e080..ce18464 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -838,7 +838,7 @@ __be16 xfrm_flowi_dport(struct flowi *fl)
 }
 
 extern int xfrm_selector_match(struct xfrm_selector *sel, struct flowi *fl,
-			       unsigned short family);
+			       unsigned short family, int use_if);
 
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 /*	If neither has a context --> match
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 843e066..e7e230b 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -55,35 +55,37 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 						int dir);
 
 static inline int
-__xfrm4_selector_match(struct xfrm_selector *sel, struct flowi *fl)
+__xfrm4_selector_match(struct xfrm_selector *sel, struct flowi *fl, int uif)
 {
+	int use_if = uif ? uif : fl->oif;
 	return  addr_match(&fl->fl4_dst, &sel->daddr, sel->prefixlen_d) &&
 		addr_match(&fl->fl4_src, &sel->saddr, sel->prefixlen_s) &&
 		!((xfrm_flowi_dport(fl) ^ sel->dport) & sel->dport_mask) &&
 		!((xfrm_flowi_sport(fl) ^ sel->sport) & sel->sport_mask) &&
 		(fl->proto == sel->proto || !sel->proto) &&
-		(fl->oif == sel->ifindex || !sel->ifindex);
+		(use_if == sel->ifindex || !sel->ifindex);
 }
 
 static inline int
-__xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl)
+__xfrm6_selector_match(struct xfrm_selector *sel, struct flowi *fl, int uif)
 {
+	int use_if = uif ? uif : fl->oif;
 	return  addr_match(&fl->fl6_dst, &sel->daddr, sel->prefixlen_d) &&
 		addr_match(&fl->fl6_src, &sel->saddr, sel->prefixlen_s) &&
 		!((xfrm_flowi_dport(fl) ^ sel->dport) & sel->dport_mask) &&
 		!((xfrm_flowi_sport(fl) ^ sel->sport) & sel->sport_mask) &&
 		(fl->proto == sel->proto || !sel->proto) &&
-		(fl->oif == sel->ifindex || !sel->ifindex);
+		(use_if == sel->ifindex || !sel->ifindex);
 }
 
 int xfrm_selector_match(struct xfrm_selector *sel, struct flowi *fl,
-		    unsigned short family)
+		    unsigned short family, int ifindex)
 {
 	switch (family) {
 	case AF_INET:
-		return __xfrm4_selector_match(sel, fl);
+		return __xfrm4_selector_match(sel, fl, ifindex);
 	case AF_INET6:
-		return __xfrm6_selector_match(sel, fl);
+		return __xfrm6_selector_match(sel, fl, ifindex);
 	}
 	return 0;
 }
@@ -917,14 +919,17 @@ static int xfrm_policy_match(struct xfrm_policy *pol, struct flowi *fl,
 			     u8 type, u16 family, int dir)
 {
 	struct xfrm_selector *sel = &pol->selector;
-	int match, ret = -ESRCH;
+	int use_if = fl->oif, match, ret = -ESRCH;
 
 	if (pol->family != family ||
 	    (fl->mark & pol->mark.m) != pol->mark.v ||
 	    pol->type != type)
 		return ret;
 
-	match = xfrm_selector_match(sel, fl, family);
+	if (dir == FLOW_DIR_IN)
+		use_if = fl->iif;
+
+	match = xfrm_selector_match(sel, fl, family, use_if);
 	if (match)
 		ret = security_xfrm_policy_lookup(pol->security, fl->secid,
 						  dir);
@@ -1041,7 +1046,7 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struc
 	read_lock_bh(&xfrm_policy_lock);
 	if ((pol = sk->sk_policy[dir]) != NULL) {
 		int match = xfrm_selector_match(&pol->selector, fl,
-						sk->sk_family);
+						sk->sk_family, 0);
 		int err = 0;
 
 		if (match) {
@@ -1918,6 +1923,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 	struct flowi fl;
 	u8 fl_dir;
 	int xerr_idx = -1;
+	int use_if = 0;
 
 	reverse = dir & ~XFRM_POLICY_MASK;
 	dir &= XFRM_POLICY_MASK;
@@ -1928,6 +1934,9 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 		return 0;
 	}
 
+	if (fl_dir ==  FLOW_DIR_IN)
+		use_if = fl.iif = skb->skb_iif;
+
 	nf_nat_decode_session(skb, &fl, family);
 
 	/* First, check used SA against their selectors. */
@@ -1936,7 +1945,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 
 		for (i=skb->sp->len-1; i>=0; i--) {
 			struct xfrm_state *x = skb->sp->xvec[i];
-			if (!xfrm_selector_match(&x->sel, &fl, family)) {
+			if (!xfrm_selector_match(&x->sel, &fl, family, use_if)) {
 				XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMISMATCH);
 				return 0;
 			}
@@ -1956,6 +1965,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 		pol = flow_cache_lookup(net, &fl, family, fl_dir,
 					xfrm_policy_lookup);
 
+
 	if (IS_ERR(pol)) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
 		return 0;
@@ -2243,7 +2253,7 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct xfrm_dst *first,
 		if (first->origin && !flow_cache_uli_match(first->origin, fl))
 			return 0;
 		if (first->partner &&
-		    !xfrm_selector_match(first->partner, fl, family))
+		    !xfrm_selector_match(first->partner, fl, family, 0))
 			return 0;
 	}
 #endif
@@ -2253,7 +2263,7 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct xfrm_dst *first,
 	do {
 		struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 
-		if (fl && !xfrm_selector_match(&dst->xfrm->sel, fl, family))
+		if (fl && !xfrm_selector_match(&dst->xfrm->sel, fl, family, 0))
 			return 0;
 		if (fl && pol &&
 		    !security_xfrm_state_pol_flow_match(dst->xfrm, pol, fl))
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 17d5b96..f47c90b 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -756,7 +756,7 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
 	 */
 	if (x->km.state == XFRM_STATE_VALID) {
 		if ((x->sel.family &&
-		     !xfrm_selector_match(&x->sel, fl, x->sel.family)) ||
+		     !xfrm_selector_match(&x->sel, fl, x->sel.family, 0)) ||
 		    !security_xfrm_state_pol_flow_match(x, pol, fl))
 			return;
 
@@ -769,7 +769,7 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
 		*acq_in_progress = 1;
 	} else if (x->km.state == XFRM_STATE_ERROR ||
 		   x->km.state == XFRM_STATE_EXPIRED) {
-		if (xfrm_selector_match(&x->sel, fl, x->sel.family) &&
+		if (xfrm_selector_match(&x->sel, fl, x->sel.family, 0) &&
 		    security_xfrm_state_pol_flow_match(x, pol, fl))
 			*error = -ESRCH;
 	}

^ permalink raw reply related

* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: Daniel Borca @ 2010-03-31 22:10 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: L. Alberto Giménez, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-usb@vger.kernel.org,
	linville@tuxdriver.com, j.dumon@option.com,
	steve.glendinning@smsc.com, davem@davemloft.net, gregkh@suse.de,
	dgiagio@gmail.com

Hello,

Thanks for the comments.  I've already fixed a few things.  We'll fix the remaining ones (below) ASAP.  

Regards,
Daniel Borca

On 31.03.2010, at 23:33, Oliver Neukum <oliver@neukum.org> wrote:

Am Mittwoch, 31. März 2010 21:42:07 schrieb L. Alberto Giménez:

Hi,

a few comments below.

   Regards
       Oliver

+
+static int ipheth_open(struct net_device *net)
+{
+    struct ipheth_device *dev = netdev_priv(net);
+    struct usb_device *udev = dev->udev;
+    int retval = 0;
+
+    usb_set_interface(udev, IPHETH_INTFNUM, IPHETH_ALT_INTFNUM);
+    usb_clear_halt(udev, usb_rcvbulkpipe(udev, dev->bulk_in));
+    usb_clear_halt(udev, usb_sndbulkpipe(udev, dev->bulk_out));

Is this really needed? If so, please add a comment.

+
+    retval = ipheth_carrier_set(dev);
+    if (retval)
+        goto error;
+
+    retval = ipheth_rx_submit(dev, GFP_KERNEL);
+    if (retval)
+        goto error;
+
+    schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);

Does it make sense to start rx while you have no carrier?

+static int ipheth_tx(struct sk_buff *skb, struct net_device *net)
+{
+    struct ipheth_device *dev = netdev_priv(net);
+    struct usb_device *udev = dev->udev;
+    int retval;
+
+    /* Paranoid */
+    if (skb->len > IPHETH_BUF_SIZE) {
+        err("%s: skb too large: %d bytes", __func__, skb->len);
+        dev->stats.tx_dropped++;
+        dev_kfree_skb_irq(skb);
+        goto exit;
+    }
+
+    memset(dev->tx_buf, 0, IPHETH_BUF_SIZE);
+    memcpy(dev->tx_buf, skb->data, skb->len);

a bit wasteful

+static void ipheth_disconnect(struct usb_interface *intf)
+{
+    struct ipheth_device *dev;
+
+    dev = usb_get_intfdata(intf);
+    if (dev != NULL) {

is this check needed?

+static struct usb_driver ipheth_driver = {
+    .name =        "ipheth",
+    .probe =    ipheth_probe,
+    .disconnect =    ipheth_disconnect,
+    .id_table =    ipheth_table,
+    .supports_autosuspend = 0,

redundant



      

^ permalink raw reply

* Re: Undefined behaviour of connect(fd, NULL, 0);
From: Neil Brown @ 2010-03-31 22:07 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev
In-Reply-To: <20100331.141732.225997212.davem@davemloft.net>

On Wed, 31 Mar 2010 14:17:32 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Neil Brown <neilb@suse.de>
> Date: Thu, 1 Apr 2010 07:24:12 +1100
> 
> >> --- a/net/ipv4/af_inet.c	2010-03-31 11:47:01.952910248 -0700
> >> +++ b/net/ipv4/af_inet.c	2010-03-31 11:48:09.852938406 -0700
> >> @@ -575,7 +575,7 @@ int inet_stream_connect(struct socket *s
> >>  
> >>  	lock_sock(sk);
> >>  
> >> -	if (uaddr->sa_family == AF_UNSPEC) {
> >> +	if (addr_len < sizeof(sa_family_t) || uaddr->sa_family == AF_UNSPEC) {
> >>  		err = sk->sk_prot->disconnect(sk, flags);
> >>  		sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
> >>  		goto out;
> > 
> > Thanks for the reply.
> > 
> > The implication of this patch is that
> >    connect(fd, NULL, 0)
> > is actually a valid way to check if an in-progress connection has completed.
> > 
> > Is that the intention?
> 
> That's not how I read the patch, the result is that connect(fd, NULL...)
> will now disconnect the socket.

Yes, you are right - I read it upside-down.  Sorry.

Thanks,
NeilBrown

^ permalink raw reply

* Re: [PATCH v2] Add Mergeable RX buffer feature to vhost_net
From: David Stevens @ 2010-03-31 22:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, kvm-owner, netdev, rusty, virtualization
In-Reply-To: <20100331120228.GH31085@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 03/31/2010 05:02:28 AM:
> 
> attached patch seems to be whiespace damaged as well.
> Does the origin pass checkpatch.pl for you?

        Yes, but I probably broke it in the transfer -- will be more
careful with the next revision.



> > +               head.iov_base = (void *)vhost_get_vq_desc(&net->dev, 
vq,
> > +                       vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, 

> > NULL);
> 
> I this casting confusing.
> Is it really expensive to add an array of heads so that
> we do not need to cast?

        It needs the heads and the lengths, which looks a lot
like an iovec. I was trying to resist adding a new
struct XXX { unsigned head; unsigned len; } just for this,
but I could make these parallel arrays, one with head index and
the other with length.

> > +               if (vq->rxmaxheadcount < headcount)
> > +                       vq->rxmaxheadcount = headcount;
> 
> This seems the only place where we set the rxmaxheadcount
> value. Maybe it can be moved out of vhost.c to net.c?
> If vhost library needs this it can get it as function
> parameter.

        I can move it to vhost_get_heads(), sure.
> 
> > +               /* Skip header. TODO: support TSO. */
> 
> You seem to have removed the code that skips the header.
> Won't this break non-GSO backends such as raw?

        From our prior discussion, what I had in mind was that
we'd remove all special-header processing by using the ioctl
you added for TUN and later, an equivalent ioctl for raw (when
supported in qemu-kvm). Qemu would arrange headers with tap
(and later raw) to get what the guest expects, and vhost then
just passes all data as-is between the socket and the ring.
        That not only removes the different-header-len code
for mergeable buffers, but also eliminates making a copy of the
header for every packet as before. Vhost has no need to do
anything with the header, or even know its length. It also
doesn't have to care what the type of the backend is-- raw or
tap.
        I think that simplifies everything, but it does mean that
raw socket support requires a header ioctl for the different
vnethdr sizes if the guest wants a vnethdr with and without
mergeable buffers. Actually, I thought this was your idea and
the point of the TUNSETVNETHDRSZ ioctl, but I guess you had
something else in mind?

> > -               /* TODO: Check specific error and bomb out unless 
EAGAIN? 
> > */
> 
> Do you think it's not a good idea?

        EAGAIN is not possible after the change, because we don't
even enter the loop unless we have an skb on the read queue; the
other cases bomb out, so I figured the comment for future work is
now done. :-)

> 
> >                 if (err < 0) {
> > -                       vhost_discard_vq_desc(vq);
> > +                       vhost_discard_vq_desc(vq, headcount);
> >                         break;
> >                 }
> 
> I think we should detect and discard truncated messages,
> since len might not be reliable if userspace pulls
> a packet from under us. Also, if new packet is
> shorter than the old one, there's no truncation but
> headcount is wrong.
> 
> So simplest fix IMO would be to compare err with expected len.
> If there's a difference, we hit the race and so
> we would discard the packet.

Sure.

> 
> 
> >                 /* TODO: Should check and handle checksum. */
> > +               if (vhost_has_feature(&net->dev, 
VIRTIO_NET_F_MRG_RXBUF)) 
> > {
> > +                       struct virtio_net_hdr_mrg_rxbuf *vhdr =
> > +                               (struct virtio_net_hdr_mrg_rxbuf *)
> > +                               vq->iov[0].iov_base;
> > +                       /* add num_bufs */
> > +                       if (put_user(headcount, &vhdr->num_buffers)) {
> > +                               vq_err(vq, "Failed to write 
num_buffers");
> > +                               vhost_discard_vq_desc(vq, headcount);
> 
> Let's do memcpy_to_iovecend etc so that we do not assume layout.
> This is also why we need move_iovec: sendmsg might modify the iovec.
> It would also be nice not to corrupt memory and
> get a reasonable error if buffer size
> that we get is smaller than expected header size.

        I wanted to avoid the header copy here, with the reasonable 
restriction
that the guest gives you a buffer at least big enough for the vnet_hdr. A
length check alone (on iov[0].iov_len) could enforce that without copying
headers back and forth to support silly cases like 8-byte buffers or
num_buffers spanning multiple iovecs, and I think paying the price of the
copy on every packet to support guests that broken isn't worth it.
        So, my preference here would be to add:

        if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF))
                struct virtio_net_mrg_rxbuf *vhdr...

                if (vq->iov[0].iov_len < sizeof(*vhdr)) {
                        vq_err(vq, "tiny buffers (len %d < %d) are not 
supported",
                                vq->iov[0].iov_len, sizeof(*vhdr));
                        break;
                }
                /* add num_bufs */
                ...

Does that work for you?

> > -               if (err) {
> > -                       vq_err(vq, "Unable to write vnet_hdr at addr 
%p: 
> > %d\n",
> > -                              vq->iov->iov_base, err);
> > -                       break;
> > -               }
> > -               len += hdr_size;
> 
> This seems to break non-GSO backends as well.

I don't have any to test with w/ current qemu-kvm, but again, I thought
your plan was to remove special header processing from vhost and let
the socket end do it via ioctl. Wouldn't a similar ioctl for raw
sockets when supported by qemu do that?


> >         }
> >         n->dev.acked_features = features;
> >         smp_wmb();
> > -       for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
> > -               mutex_lock(&n->vqs[i].mutex);
> > -               n->vqs[i].hdr_size = hdr_size;
> > -               mutex_unlock(&n->vqs[i].mutex);
> 
> I expect the above is a leftover from the previous version
> which calculated header size in kernel?

        Right. With the ioctl, and assuming raw gets a similar
one, we don't need to know anything about the header size in
vhost, except that qemu needs to make sure mergeable rxbufs is
only set when the socket has the larger vnet_hdr that includes
the num_bufs field.


> > @@ -410,6 +410,7 @@ static long vhost_set_vring(struct vhost
> >                 vq->last_avail_idx = s.num;
> >                 /* Forget the cached index value. */
> >                 vq->avail_idx = vq->last_avail_idx;
> > +               vq->rxmaxheadcount = 0;
> >                 break;
> >         case VHOST_GET_VRING_BASE:
> >                 s.index = idx;
> > @@ -856,6 +857,48 @@ static unsigned get_indirect(struct vhos
> >         return 0;
> >  }
> > 
> > +/* This is a multi-head version of vhost_get_vq_desc
> 
> How about:
> version of vhost_get_vq_desc that returns multiple
> descriptors

OK.

> 
> > + * @vq         - the relevant virtqueue
> > + * datalen     - data length we'll be reading
> > + * @iovcount   - returned count of io vectors we fill
> > + * @log                - vhost log
> > + * @log_num    - log offset
> 
> return value?
> Also - why unsigned?

        Returned value is the number of heads, which is
0 or greater.
> 
> > + */
> > +unsigned vhost_get_heads(struct vhost_virtqueue *vq, int datalen, int 

> 
> Would vhost_get_vq_desc_multiple be a better name?

        26 chars for the function name with indentation and
argument list might not be pretty on the calls. What about
changing vhost_get_desc() and vhost_get_desc_n() [make them
both shorter]? ("_n" indicating >= 1 instead of just one).

> > +       struct vhost_log *log, unsigned int *log_num)
> > +{
> > +       struct iovec *heads = vq->heads;
> 
> I think it's better to pass in heads array than take it from vq->heads.

        I'd actually prefer to go the other way, and remove the
log stuff, for example, since they are accessible from vq which
we have already. Adding heads make it look less similar to the
arg list for vhost_get_vq_desc(), but I'm more interested in
getting the feature than particular arg lists, so your call.

> 
> > +       int out, in = 0;
> 
> Why is in initialized here?

Needed in the previous version, but not here.

> 
> > +       int seg = 0;            /* iov index */
> > +       int hc = 0;             /* head count */
> > +
> > +       while (datalen > 0) {
> 
> Can't this simply call vhost_get_vq_desc in a loop somehow,
> or use a common function that both this and vhost_get_vq_desc call?

It is calling vhost_get_vq_desc() in this loop; that's how it
gets the indiviual heads and iov entries. The rest of it is just
massaging the offsets so vhost_get_vq_desc puts them all in the
right places and tracks the heads and lengths correctly for add_used.

> 
> > +               if (hc >= VHOST_NET_MAX_SG) {
> > +                       vhost_discard_vq_desc(vq, hc);
> > +                       return 0;
> > +               }
> > +               heads[hc].iov_base = (void 
*)vhost_get_vq_desc(vq->dev, 
> > vq,
> > +                       vq->iov+seg, ARRAY_SIZE(vq->iov)-seg, &out, 
&in, 
> > log,
> > +                       log_num);
> > +               if (heads[hc].iov_base == (void *)vq->num) {
> > +                       vhost_discard_vq_desc(vq, hc);
> > +                       return 0;
> > +               }
> > +               if (out || in <= 0) {
> > +                       vq_err(vq, "unexpected descriptor format for 
RX: "
> > +                               "out %d, in %d\n", out, in);
> > +                       vhost_discard_vq_desc(vq, hc);
> 
> goto err above might help simplify cleanup.

I generally would rather see the error cleanup explicitly where the
error detection and message are, for code readability, but I can change
it to a goto here if you think that's clearer.

> > -int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int 

> > len)
> > +int vhost_add_used(struct vhost_virtqueue *vq, struct iovec *heads, 
int 
> > count)
> 
> count is always 0 for send, right?
> I think it is better to have two APIs here as well:
> vhost_add_used
> vhost_add_used_multiple
> we can use functions to avoid code duplication.
> 
> >  {
> > -       struct vring_used_elem *used;
> > +       struct vring_used_elem *used = 0;
> 
> why is used initialized here?

        This is to remove a compiler warning that "used" may
be used unitialized, for a case that can't happen (count <= 0).


> > +               vq->last_used_idx++;
> 
> I think we should update last_used_idx on success only,
> at the end. Simply use last_used_idx + count
> instead of last_used_idx + 1.

        OK.

> > @@ -1023,22 +1071,35 @@ int vhost_add_used(struct vhost_virtqueu
> >                 if (vq->log_ctx)
> >                         eventfd_signal(vq->log_ctx, 1);
> >         }
> > -       vq->last_used_idx++;
> >         return 0;
> >  }
> > 
> > +int vhost_available(struct vhost_virtqueue *vq)
> 
> since this function is non-static, please document what it does.

        OK.

> > +{
> > +       int avail;
> > +
> > +       if (!vq->rxmaxheadcount)        /* haven't got any yet */
> > +               return 1;
> 
> since seems to make net - specific asumptions.
> How about moving this check out to net.c?

        I can, but its only user is vhost_signal() in this file.
 
> > +       avail = vq->avail_idx - vq->last_avail_idx;
> > +       if (avail < 0)
> > +               avail += 0x10000; /* wrapped */
> 
> A branch that is likely non-taken. Also, rxmaxheadcount
> is really unlikely to get as large as half the ring.
> So this just wastes cycles?

        The indexes are free-running counters, so if
avail_idx has overflowed but last_avail_idx has not
then the result will be (incorrectly) negative. It
has nothing to do with how many buffers are in use--
consider avail_idx = 2 and last_avail_idx = 65534,
then avail = will be -65532, but we want it to be 4.

> 
> > +       return avail;
> > +}
> > +
> >  /* This actually signals the guest, using eventfd. */
> > -void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> > +void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq, 
bool 
> > recvr)
> 
> This one is not elegant. receive is net.c concept, let's not
> put it in vhost.c. Pass in headcount if you must.

        Mainly what I was after here is to not affect signal's
for the transmit side. I thought about splitting it into a
separate function entirely for the transmit, but came up with
this, instead. I can't tell if it's a receiver from the vq,
but I don't want to override NOTIFY_ON_EMPTY or the headcount
check (which has nothing to do with transmitters) for the
transmitter at all.
        The vq doesn't tell you if it is a receiver or not,
so I either need a flag as argument to tell, or I need to
split into a transmit vhost_signal and a receive vhost_signal.
        I can do the available check before calling vhost_signal,
but then I need to remove the NOTIFY check entirely so it'll
still notify for the receiver case when available is not
enough but still not 0.
        So, I think the options are:
1) a flag
2) separate transmit or receive signal functions
3) move NOTIFY_ON_EMPTY out of vhost_signal and
        check that or available (for recvr) before
        calling vhost_signal().

> 
> >  {
> >         __u16 flags = 0;
> > +
> >         if (get_user(flags, &vq->avail->flags)) {
> >                 vq_err(vq, "Failed to get flags");
> >                 return;
> >         }
> > 
> > -       /* If they don't want an interrupt, don't signal, unless 
empty. */
> > +       /* If they don't want an interrupt, don't signal, unless
> > +        * empty or receiver can't get a max-sized packet. */
> >         if ((flags & VRING_AVAIL_F_NO_INTERRUPT) &&
> > -           (vq->avail_idx != vq->last_avail_idx ||
> > +           (!recvr || vhost_available(vq) >= vq->rxmaxheadcount ||
> 
> Is the above really worth the complexity?
> Guests can't rely on this kind of fuzzy logic, can they?
> Do you see this helping performance at all?

        It definitely hurts performance to allocate max-sized buffers
and then free up the excess (the translate_desc is expensive). It is
a heuristic, but it's one that keeps the notifies flowing without
having to fail on a packet to reenable them.
        The notify code has to change because there is no recovery
if we need 5 buffers and have only 3, while notification is
disabled unless we go all the way to 0. We can't honor NOTIFY_ON_EMPTY
in that case because it will never proceed. We can remove the
NOTIFY_ON_EMPTY check in vhost_signal and do it in the caller; then allow
receiver to fail a buffer allocation and signal unconditionally,
but we cannot wait for it to be 0 even if NOTIFY_ON_EMPTY is set--
that's a deadlock.


> >  /* OK, now we need to know about added descriptors. */
> > -bool vhost_enable_notify(struct vhost_virtqueue *vq)
> > +bool vhost_enable_notify(struct vhost_virtqueue *vq, bool recvr)
> >  {
> >         u16 avail_idx;
> >         int r;
> > @@ -1080,6 +1141,8 @@ bool vhost_enable_notify(struct vhost_vi
> >                 return false;
> >         }
> > 
> > +       if (recvr && vq->rxmaxheadcount)
> > +               return (avail_idx - vq->last_avail_idx) >= 
> > vq->rxmaxheadcount;
> 
> The fuzzy logic behind rxmaxheadcount might be a good
> optimization, but I am not comfortable using it
> for correctness. Maybe vhost_enable_notify should get
> the last head so we can redo poll when another one
> is added.

        I tried requiring at least 64K here always, but as
I said before, the cost of doing the translate_desc and
then throwing those away for small packets killed performance.
I also considered using a highwater mark on the ring, but
potentially differing buffer sizes makes that less useful,
and a small ring may only have enough space for 1 max-sized
packet. 
        Maybe we should just remove NOTIFY_ON_EMPTY if we
fail a packet allocation; I can try that out.
        Another alternative I considered was splitting out
the translate_desc stuff in the hopes that we could found
out how many buffers we need in a less-expensive way; then
we could go for max-sized and free the excess without too
much overhead, maybe.
        Anyway, I'll look harder at working around this.

> >         struct iovec indirect[VHOST_NET_MAX_SG];
> > -       struct iovec iov[VHOST_NET_MAX_SG];
> > -       struct iovec hdr[VHOST_NET_MAX_SG];
> > -       size_t hdr_size;
> > +       struct iovec iov[VHOST_NET_MAX_SG+1]; /* an extra for vnet hdr 
*/
> 
> VHOST_NET_MAX_SG should already include iovec vnet hdr.

        That's actually an artifact from the previous patch. I
was arranging the iovec to have just the header needed by the
guest and this simplified the bounds checking. It can be
removed if we leave header size management to the socket side,
but if vhost is going to be doing vnet_hdr manipulation for
raw sockets, then it'll depend on what that new code needs
to do to manage the header, I suppose.

                                                +-DLS


^ permalink raw reply

* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: "L. Alberto Giménez" @ 2010-03-31 21:38 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: linux-kernel, netdev, linux-usb, linville, j.dumon,
	steve.glendinning, davem, gregkh, dgiagio, dborca
In-Reply-To: <201003312233.26130.oliver@neukum.org>

On 03/31/2010 10:33 PM, Oliver Neukum wrote:
> Am Mittwoch, 31. März 2010 21:42:07 schrieb L. Alberto Giménez:
> 
> Hi,
> 
> a few comments below.

Hi Oliver,

I'll try to fix all the problems that you spotted as soon as I know how
to do that (documentation pointers more than the obvious ones are
welcome). In the meantime, please be patient :)

Regards,
-- 
L. Alberto Giménez
JabberID agimenez@jabber.sysvalve.es
GnuPG key ID 0x3BAABDE1

^ permalink raw reply

* Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
From: David Miller @ 2010-03-31 21:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: andy, netdev, lhh, fubar, bonding-devel
In-Reply-To: <1270070586.2593.6.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 31 Mar 2010 23:23:06 +0200

> New compiler got these new options :  -Wframe-larger-than=1024
> -fno-strict-overflow -fno-dwarf2-cfi-asm -fconserve-stack

Ok, so much for my warning option theory :-)

^ permalink raw reply

* Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
From: Eric Dumazet @ 2010-03-31 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: andy, netdev, lhh, fubar, bonding-devel
In-Reply-To: <20100331.140055.246389406.davem@davemloft.net>

Le mercredi 31 mars 2010 à 14:00 -0700, David Miller a écrit :

> Funny how going back in time gives us better diagnostic messages from
> the compiler :-)
> 
> FWIW I also didn't get the warning, and that was with gcc-4.5 built
> from the gcc trunk just the other day.
> 
> I suspect this is to do with a change to what warnings get enabled by
> default with the -W options we put in the cflags rather than gcc
> losing the ability to detect this case.

  x86_64-unknown-linux-gcc -Wp,-MD,drivers/net/bonding/.bond_main.o.d
-nostdinc
-isystem /data/x86-64/lib/gcc/x86_64-unknown-linux/4.1.2/include
-I/data/src/linux-2.6/arch/x86/include -Iinclude  -include
include/generated/autoconf.h -D__KERNEL__ -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-Werror-implicit-function-declaration -Wno-format-security
-fno-delete-null-pointer-checks -O2 -m64 -mno-red-zone -mcmodel=kernel
-funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1
-DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare
-fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
-fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls
-g -pg -Wdeclaration-after-statement -Wno-pointer-sign
-D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(bond_main)"
-D"KBUILD_MODNAME=KBUILD_STR(bonding)"  -c -o
drivers/net/bonding/.tmp_bond_main.o drivers/net/bonding/bond_main.c
drivers/net/bonding/bond_main.c: In function ‘bond_xmit_roundrobin’:
drivers/net/bonding/bond_main.c:4159: warning: comparison is always
false due to limited range of data type


while with gcc-4.4.2 (native compiler, no warning displayed)

 gcc -Wp,-MD,drivers/net/bonding/.bond_main.o.d  -nostdinc
-isystem /usr/lib/gcc/x86_64-linux-gnu/4.4.3/include
-I/usr/src/git/linux-2.6/arch/x86/include -Iinclude  -include
include/generated/autoconf.h -D__KERNEL__ -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-Werror-implicit-function-declaration -Wno-format-security
-fno-delete-null-pointer-checks -O2 -m64 -mtune=generic -mno-red-zone
-mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args
-DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare
-fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
-Wframe-larger-than=1024 -fno-stack-protector -fno-omit-frame-pointer
-fno-optimize-sibling-calls -g -Wdeclaration-after-statement
-Wno-pointer-sign -fno-strict-overflow -fno-dwarf2-cfi-asm
-fconserve-stack   -D"KBUILD_STR(s)=#s"
-D"KBUILD_BASENAME=KBUILD_STR(bond_main)"
-D"KBUILD_MODNAME=KBUILD_STR(bond_main)"  -c -o
drivers/net/bonding/.tmp_bond_main.o drivers/net/bonding/bond_main.c


New compiler got these new options :  -Wframe-larger-than=1024
-fno-strict-overflow -fno-dwarf2-cfi-asm -fconserve-stack




^ permalink raw reply

* Re: Undefined behaviour of connect(fd, NULL, 0);
From: David Miller @ 2010-03-31 21:17 UTC (permalink / raw)
  To: neilb; +Cc: shemminger, netdev
In-Reply-To: <20100401072412.032aa8e6@notabene.brown>

From: Neil Brown <neilb@suse.de>
Date: Thu, 1 Apr 2010 07:24:12 +1100

>> --- a/net/ipv4/af_inet.c	2010-03-31 11:47:01.952910248 -0700
>> +++ b/net/ipv4/af_inet.c	2010-03-31 11:48:09.852938406 -0700
>> @@ -575,7 +575,7 @@ int inet_stream_connect(struct socket *s
>>  
>>  	lock_sock(sk);
>>  
>> -	if (uaddr->sa_family == AF_UNSPEC) {
>> +	if (addr_len < sizeof(sa_family_t) || uaddr->sa_family == AF_UNSPEC) {
>>  		err = sk->sk_prot->disconnect(sk, flags);
>>  		sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
>>  		goto out;
> 
> Thanks for the reply.
> 
> The implication of this patch is that
>    connect(fd, NULL, 0)
> is actually a valid way to check if an in-progress connection has completed.
> 
> Is that the intention?

That's not how I read the patch, the result is that connect(fd, NULL...)
will now disconnect the socket.

^ permalink raw reply

* Re: Undefined behaviour of connect(fd, NULL, 0);
From: Stephen Hemminger @ 2010-03-31 21:14 UTC (permalink / raw)
  To: Neil Brown; +Cc: netdev
In-Reply-To: <20100401072412.032aa8e6@notabene.brown>

On Thu, 1 Apr 2010 07:24:12 +1100
Neil Brown <neilb@suse.de> wrote:

> On Wed, 31 Mar 2010 11:49:36 -0700
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
> > On Wed, 31 Mar 2010 22:36:37 +1100
> > Neil Brown <neilb@suse.de> wrote:
> > 
> > > 
> > > Hi Netdev.
> > > 
> > > We have a customer who was reporting strangely unpredictable behaviour of an
> > > in-house application that used networking.
> > > 
> > > It called connect on a non-blocking socket and subsequently called
> > >    connect(fd, NULL, 0)
> > > 
> > > to check if the connection had succeeded.
> > > This would sometime "work" and sometimes close the connection.
> > > 
> > > Looking at the code (sys_connect, move_addr_to_kernel, inet_stream_connect),
> > > it seems that in this case an uninitialised on-stack address is passed
> > > to inet_stream_connect and it makes a decision based on ->sa_family (which is
> > > uninitialised).
> > > 
> > > It seems clear that connect(fd, NULL, 0) is the wrong thing to do in this
> > > circumstance, but I think it would be good if it failed consistently rather
> > > than unpredictably.
> > > 
> > > Would it be appropriate for move_addr_to_kernel to zero out the remainder of
> > > the address?
> > >    memset(kaddr+ulen, 0, MAX_SOCK_ADDR-ulen);
> > > ??
> > > 
> > > Then connect(fd, NULL, 0) would always break the connection.
> > 
> > I think the problem is inet_stream_connect referencing past addr_len.
> > 
> > --- a/net/ipv4/af_inet.c	2010-03-31 11:47:01.952910248 -0700
> > +++ b/net/ipv4/af_inet.c	2010-03-31 11:48:09.852938406 -0700
> > @@ -575,7 +575,7 @@ int inet_stream_connect(struct socket *s
> >  
> >  	lock_sock(sk);
> >  
> > -	if (uaddr->sa_family == AF_UNSPEC) {
> > +	if (addr_len < sizeof(sa_family_t) || uaddr->sa_family == AF_UNSPEC) {
> >  		err = sk->sk_prot->disconnect(sk, flags);
> >  		sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
> >  		goto out;
> 
> Thanks for the reply.
> 
> The implication of this patch is that
>    connect(fd, NULL, 0)
> is actually a valid way to check if an in-progress connection has completed.
> 
> Is that the intention?
The rationale is that move_addr_to_kernel, explcitly allow addr=NULL with addr_len=0
so if it is allowed there why not let it through. The implication of this is that
addr_len is the same as AF_UNSPEC.

> Does all other address manipulation code check the addr_len ?? (probably).

Not sure.
Someone ought to check BSD/Solaris to see if there is some standard here.

^ permalink raw reply

* Re: [PATCH 5/6] cxgb4: Add main driver file and driver Makefile
From: David Miller @ 2010-03-31 21:13 UTC (permalink / raw)
  To: swise; +Cc: shemminger, dm, netdev
In-Reply-To: <4BB3A301.3090100@opengridcomputing.com>

From: Steve Wise <swise@opengridcomputing.com>
Date: Wed, 31 Mar 2010 14:31:13 -0500

> This sounds interesting.  Dave, how would you envision the ethtool
> interface for this?  A new modifier to --statistics?  Like: 'ethtool
> --statistics ethX viewname' where viewname could maybe be passed to
> the driver to allow arbitrary views?  Or something else?

I really don't care much about the userland side, I'm only
interested in how you implement this in the kernel :-)

^ permalink raw reply

* Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
From: David Miller @ 2010-03-31 21:00 UTC (permalink / raw)
  To: eric.dumazet; +Cc: andy, netdev, lhh, fubar, bonding-devel
In-Reply-To: <1270048448.2103.28.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 31 Mar 2010 17:14:08 +0200

> ARCH=x86_64 CROSS_COMPILE=x86_64-unknown-linux- make vmlinux
> 
> so, it was a cross compiler :
> 
> /data/x86-64/bin/x86_64-unknown-linux-gcc-4.1.2 -v
> Using built-in specs.
> Target: x86_64-unknown-linux
> Configured with: ../gcc-4.1.2/configure --prefix=/data/x86-64
> --target=x86_64-unknown-linux --enable-languages=c --disable-shared
> --disable-multilib --disable-threads --disable-libssp --without-headers
> --disable-libmudflap
> Thread model: single
> gcc version 4.1.2

Funny how going back in time gives us better diagnostic messages from
the compiler :-)

FWIW I also didn't get the warning, and that was with gcc-4.5 built
from the gcc trunk just the other day.

I suspect this is to do with a change to what warnings get enabled by
default with the -W options we put in the cflags rather than gcc
losing the ability to detect this case.

^ permalink raw reply

* RE: re-submit2 [ANNOUNCEMENT] NET: usb: sierra_net.c driver
From: Elina Pasheva @ 2010-03-31 20:58 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: Rory Filer,
	dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <8F90F944E50427428C60E12A34A309D22FC800D825-3qf8vkpM5jTbmvMHnzRVpW3Pdy6AhKVLXbPIYa3/oNjDOqzlkpFKJg@public.gmane.org>

On Tue, 2010-03-30 at 23:21 -0700, Rory Filer wrote:
> -----Original Message-----
> > >
> > > Applied, thanks.
> > 
> > Nevermind, reverted, it doesn't even build.
> > 
> > drivers/net/usb/sierra_net.c: In function 'check_ethip_packet':
> > drivers/net/usb/sierra_net.c:221:3: error: implicit declaration of
> > function 'deverr'
> > 
> > Really, this should never ever happen, and there is simply
> > no excuse at all for this.
> 
> Well actually there is an excuse, but I'm sure you don't want to
> know what it is. 

The patch was posted saying it had only been tested on 2.6.33.1. 
There was failure-to-build and boot problems  beyond our control with 2.6.34-rc2.
Now that 2.6.34-rc3 is available (and we are able to build and boot on our Ubuntu platform)
I will modify the driver to reflect the fact that deverr is no longer in existence
and is replaced by another function. I will submit the modified driver tested on 2.6.34-rc3 shortly.
Perhaps we should have waited until the build problem was fixed.
Regards,
Elina

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: David Miller @ 2010-03-31 20:57 UTC (permalink / raw)
  To: herbert; +Cc: hadi, timo.teras, netdev
In-Reply-To: <20100331141525.GA14331@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 31 Mar 2010 22:15:25 +0800

> Dave can keep or revert this as he likes.

Herbert, if it doesn't break anything, I feel the gains are very
worthwhile.

When we have a breakage report, I will undo this or fix it
immediately.  Until then, we have to be cognizant of what positives we
get out of this.

Have you actually tried to monitor the IPSEC netlink socket events
when a daemon is running?  It's way too painful before Jamal's
changes, and we already emit way too many semantically empty events in
netlink.

Minimization of the noise is a good thing, if it can legally be done,
which I believe is the case here.  And this applies to pfkey events
too, because processing those is even more expensive on the
application side.

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: David Miller @ 2010-03-31 20:54 UTC (permalink / raw)
  To: herbert; +Cc: hadi, timo.teras, netdev
In-Reply-To: <20100331135629.GD14082@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 31 Mar 2010 21:56:29 +0800

> This is inconsistent with the behaviour of SADB flushes, and the
> spirit of RFC2367.

Want to have an even larger list of areas where we're not compliant
with that RFC2367 that everyone has to extend and makes changes to in
order to be useful? :-)

Herbert, you have to show that something breaks because of this,
because the new behavior helps in diagnosis and efficiency and the old
behavior has been a serious pet peeve of mine just as it was for
Jamal.

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: David Miller @ 2010-03-31 20:52 UTC (permalink / raw)
  To: herbert; +Cc: hadi, timo.teras, netdev
In-Reply-To: <20100331135331.GB14082@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 31 Mar 2010 21:53:31 +0800

> Dave, please revert this patch.

Herbert I actually don't agree with you on this one, sorry :-)

If no semantic change occurred to the policy or state databases,
there is zero reason to report the flush event, it's noise.

Like Jamal, I too am pretty frustrated with how noisy and unusable the
netlink messages are to watch when trying to debug something and this
behavior change should break no applications whatsoever.

When you have evidence of a real existing case failing because of
Jamal's changes (not some case you crafted specifically to show that
an empty flush doesn't get reported), I'll consider a revert or ask
Jamal to fix that case up.

Until then, Jamal's change is staying in the tree.

^ permalink raw reply

* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: Oliver Neukum @ 2010-03-31 20:33 UTC (permalink / raw)
  To: L. Alberto Giménez
  Cc: linux-kernel, netdev, linux-usb, linville, j.dumon,
	steve.glendinning, davem, gregkh, dgiagio, dborca
In-Reply-To: <1270064527-8485-1-git-send-email-agimenez@sysvalve.es>

Am Mittwoch, 31. März 2010 21:42:07 schrieb L. Alberto Giménez:

Hi,

a few comments below.

	Regards
		Oliver

> +
> +static struct usb_device_id ipheth_table[] = {
> +	{ USB_DEVICE(USB_VENDOR_APPLE, USB_PRODUCT_IPHETH) },
> +	{ USB_DEVICE(USB_VENDOR_APPLE, USB_PRODUCT_IPHETH_3G) },
> +	{ USB_DEVICE(USB_VENDOR_APPLE, USB_PRODUCT_IPHETH_3GS) },

Please use the macros specifying the interface.

> +	{ }
> +};
> +MODULE_DEVICE_TABLE(usb, ipheth_table);


> +static void ipheth_unlink_urbs(struct ipheth_device *dev)

The naming might be misleading.

> +{
> +	usb_kill_urb(dev->tx_urb);
> +	usb_kill_urb(dev->rx_urb);
> +}

> +
> +static void ipheth_sndbulk_callback(struct urb *urb)
> +{
> +	struct ipheth_device *dev;
> +
> +	dev = urb->context;
> +	if (dev == NULL)
> +		return;
> +
> +	if (urb->status != 0 &&
> +	    urb->status != -ENOENT &&
> +	    urb->status != -ECONNRESET &&
> +	    urb->status != -ESHUTDOWN)
> +		err("%s: urb status: %d", __func__, urb->status);
> +
> +	netif_wake_queue(dev->net);
> +	dev_kfree_skb_irq(dev->tx_skb);

Race condition. You must free the skb before you wake the queue.

> +}

> +
> +static int ipheth_open(struct net_device *net)
> +{
> +	struct ipheth_device *dev = netdev_priv(net);
> +	struct usb_device *udev = dev->udev;
> +	int retval = 0;
> +
> +	usb_set_interface(udev, IPHETH_INTFNUM, IPHETH_ALT_INTFNUM);
> +	usb_clear_halt(udev, usb_rcvbulkpipe(udev, dev->bulk_in));
> +	usb_clear_halt(udev, usb_sndbulkpipe(udev, dev->bulk_out));

Is this really needed? If so, please add a comment.

> +
> +	retval = ipheth_carrier_set(dev);
> +	if (retval)
> +		goto error;
> +
> +	retval = ipheth_rx_submit(dev, GFP_KERNEL);
> +	if (retval)
> +		goto error;
> +
> +	schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);

Does it make sense to start rx while you have no carrier?

> +	netif_start_queue(net);
> +error:
> +	return retval;
> +}

> +static int ipheth_tx(struct sk_buff *skb, struct net_device *net)
> +{
> +	struct ipheth_device *dev = netdev_priv(net);
> +	struct usb_device *udev = dev->udev;
> +	int retval;
> +
> +	/* Paranoid */
> +	if (skb->len > IPHETH_BUF_SIZE) {
> +		err("%s: skb too large: %d bytes", __func__, skb->len);
> +		dev->stats.tx_dropped++;
> +		dev_kfree_skb_irq(skb);
> +		goto exit;
> +	}
> +
> +	memset(dev->tx_buf, 0, IPHETH_BUF_SIZE);
> +	memcpy(dev->tx_buf, skb->data, skb->len);

a bit wasteful

> +static int ipheth_probe(struct usb_interface *intf,
> +			 const struct usb_device_id *id)
> +{
> +	struct usb_device *udev = interface_to_usbdev(intf);
> +	struct usb_host_interface *hintf;
> +	struct usb_endpoint_descriptor *endp;
> +	struct ipheth_device *dev;
> +	struct net_device *netdev;
> +	int i;
> +	int retval;
> +
> +	/* Ensure we are probing the right interface */
> +	if (intf->cur_altsetting->desc.bInterfaceClass != IPHETH_USBINTF_CLASS ||
> +	    intf->cur_altsetting->desc.bInterfaceSubClass != IPHETH_USBINTF_SUBCLASS)
> +		return -ENODEV;

Please use the correct macro to specify the interface you are interested in

> +static void ipheth_disconnect(struct usb_interface *intf)
> +{
> +	struct ipheth_device *dev;
> +
> +	dev = usb_get_intfdata(intf);
> +	if (dev != NULL) {

is this check needed?

> +static struct usb_driver ipheth_driver = {
> +	.name =		"ipheth",
> +	.probe =	ipheth_probe,
> +	.disconnect =	ipheth_disconnect,
> +	.id_table =	ipheth_table,
> +	.supports_autosuspend = 0,

redundant

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox