Netdev List


* [PATCH] mlx4: Performing SENSE_PORT command only when supported
From: Yevgeny Petrilin @ 2011-05-04 13:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Not all HW supports this functionality; where it is unsupported, the FW
reports a command error.
This patch checks for the capability before trying to sense the link partner.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/main.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 3814fc9..f47ac5a 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -944,10 +944,12 @@ static int mlx4_setup_hca(struct mlx4_dev *dev)
 	}
 
 	for (port = 1; port <= dev->caps.num_ports; port++) {
-		enum mlx4_port_type port_type = 0;
-		mlx4_SENSE_PORT(dev, port, &port_type);
-		if (port_type)
-			dev->caps.port_type[port] = port_type;
+		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP) {
+			enum mlx4_port_type port_type = 0;
+			mlx4_SENSE_PORT(dev, port, &port_type);
+			if (port_type)
+				dev->caps.port_type[port] = port_type;
+		}
 		ib_port_default_caps = 0;
 		err = mlx4_get_port_ib_caps(dev, port, &ib_port_default_caps);
 		if (err)
-- 
1.6.0.2
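
The gating pattern in the hunk above can be tried outside the kernel. The
following is a minimal userspace sketch, not the driver code: every demo_*
name, the DEMO_CAP_FLAG_DPDP bit position and the fake sense command are
assumptions made for illustration. It shows that the command is never issued
when the capability bit is clear, so the configured port type is left alone.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the mlx4 headers. */
#define DEMO_CAP_FLAG_DPDP (1ULL << 2)	/* bit position is an assumption */

enum demo_port_type { DEMO_PORT_NONE = 0, DEMO_PORT_IB = 1, DEMO_PORT_ETH = 2 };

struct demo_caps {
	uint64_t flags;
	enum demo_port_type port_type[3];	/* index 0 unused, ports are 1-based */
	int num_ports;
};

/* Counts how often the (fake) firmware command was issued. */
static int demo_sense_calls;

/* Models a firmware command that only some hardware implements. */
static int demo_sense_port(int port, enum demo_port_type *type)
{
	demo_sense_calls++;
	*type = (port == 1) ? DEMO_PORT_ETH : DEMO_PORT_NONE;
	return 0;
}

/* Same shape as the patched loop: sense only when the capability is set. */
static void demo_setup_ports(struct demo_caps *caps)
{
	for (int port = 1; port <= caps->num_ports; port++) {
		if (caps->flags & DEMO_CAP_FLAG_DPDP) {
			enum demo_port_type type = DEMO_PORT_NONE;

			demo_sense_port(port, &type);
			if (type)
				caps->port_type[port] = type;
		}
	}
}
```

With the flag clear, demo_setup_ports() leaves port_type[] untouched and the
command counter stays at zero; with the flag set, only ports for which the
sense reports a non-zero type are overwritten.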




^ permalink raw reply related

* [PATCH] mlx4_en: Setting RSS hash result to skb->rxhash field
From: Yevgeny Petrilin @ 2011-05-04 13:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/en_rx.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_rx.c b/drivers/net/mlx4/en_rx.c
index 62dd21b..bb4d66a 100644
--- a/drivers/net/mlx4/en_rx.c
+++ b/drivers/net/mlx4/en_rx.c
@@ -610,6 +610,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 					gro_skb->data_len = length;
 					gro_skb->truesize += length;
 					gro_skb->ip_summed = CHECKSUM_UNNECESSARY;
+					gro_skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid) << 24;
+					skb_record_rx_queue(gro_skb, cq->ring);
 
 					if (priv->vlgrp && (cqe->vlan_my_qpn &
 							    cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK)))
@@ -643,6 +645,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			goto next;
 		}
 
+		skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid) << 24;
 		skb->ip_summed = ip_summed;
 		skb->protocol = eth_type_trans(skb, dev);
 		skb_record_rx_queue(skb, cq->ring);
-- 
1.6.0.2




^ permalink raw reply related

* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Arnd Bergmann @ 2011-05-04 13:11 UTC (permalink / raw)
  To: Subhasish Ghosh
  Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
	davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	Netdev-u79uwXL29TY76Z2rM5mHXA, nsekhar-l0cyMroinI0, open list,
	CAN NETWORK DRIVERS, Marc Kleine-Budde,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	m-watkins-l0cyMroinI0, Wolfgang Grandegger
In-Reply-To: <15AD189F851849F69A011B6F4D1DDB6C@subhasishg>

On Wednesday 04 May 2011, Subhasish Ghosh wrote:
> CAN requires mail box IDs to be programmed in. But, the socket
> CAN subsystem supports only software filtering of the mail box IDs.
> 
> So, the mail box IDs programmed into socket CAN during initialization
> does not propagate into the hardware. This is planned to be a future
> implementation in Socket CAN.
> 
> In our case, we support hardware filtering, to work around with this,
> Wolfgang (Socket CAN owner) suggested that we implement
> this using sysfs.
> 
> These setting are not for debugging, but to program the mail box IDs
> into the hardware. 

Ok, I see. Can you point me to that discussion?

Wolfgang, I'm a bit worried by the API being split between sockets and sysfs.
The problem is that once the sysfs API is established, users will start
relying on it, and you can no longer migrate away from it, even when
a later version of the Socket CAN also supports setting through a different
interface. What is the current interface to set mail box IDs in software?
How hard would it be to implement that feature in Socket CAN?

Is that something that Subhasish or someone else could do as a prerequisite
to merging the driver?

	Arnd

^ permalink raw reply

* Re: [PATCH 2/2] usbnet: Convert dbg to dev_dbg and neatening
From: Michał Mirosław @ 2011-05-04 12:42 UTC (permalink / raw)
  To: Joe Perches
  Cc: Sergei Shtylyov, Oliver Neukum, David Brownell,
	Greg Kroah-Hartman, netdev, linux-usb, linux-kernel
In-Reply-To: <1304511102.1788.91.camel@Joe-Laptop>

2011/5/4 Joe Perches <joe@perches.com>:
> On Wed, 2011-05-04 at 15:35 +0400, Sergei Shtylyov wrote:
>> Hello.
>
> And hello to you Sergei.
>
> On 03-05-2011 22:17, Joe Perches wrote:
>> > Use the more standard logging form.
>> > Add a bit more tidying style.
>>     Style changes look rather doubtful to me...
>> >   drivers/net/usb/usbnet.c |   23 +++++++++++------------
>> > diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> []
>> > @@ -192,8 +192,8 @@ static int init_status(struct usbnet *dev, struct usb_interface *intf)
>> >             return 0;
>> >
>> >     pipe = usb_rcvintpipe(dev->udev,
>> > -                         dev->status->desc.bEndpointAddress
>> > -                   & USB_ENDPOINT_NUMBER_MASK);
>> > +                         (dev->status->desc.bEndpointAddress
>> > +                   & USB_ENDPOINT_NUMBER_MASK));
>>
>>     Why add parens?
>
> Leading & uses are almost always addressof.
> This makes it easier for me to see that it's not an addressof use.

This is a clear case where the 80-char limit impairs code readability. Why
not just use another variable?

int epn = dev->status->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK;
pipe = usb_rcvintpipe(dev->udev, epn);
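
For reference, the masking step on its own can be checked in userspace. This
is only a sketch: demo_endpoint_number() is a made-up helper, and the mask
value 0x0f is assumed to match the definition in linux/usb/ch9.h.

```c
#include <stdint.h>

/* Assumed to match linux/usb/ch9.h: the low four bits hold the endpoint number. */
#define USB_ENDPOINT_NUMBER_MASK 0x0f

/* Hypothetical helper: strip the direction bit and keep the endpoint number. */
static inline unsigned int demo_endpoint_number(uint8_t bEndpointAddress)
{
	return bEndpointAddress & USB_ENDPOINT_NUMBER_MASK;
}
```

An IN endpoint address of 0x81 then yields endpoint number 1, which is the
value the local variable would carry into usb_rcvintpipe().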

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH 2/2 net-next] net: drivers: set TSO/UFO offload option explicitly
From: Michał Mirosław @ 2011-05-04 12:36 UTC (permalink / raw)
  To: Shan Wei
  Cc: David Miller, netdev, rusty, mst, Eric Dumazet, mirq-linux,
	bhutchings, dm
In-Reply-To: <4DBA4DF5.5020101@cn.fujitsu.com>

2011/4/29 Shan Wei <shanwei@cn.fujitsu.com>:
> The device drivers should not use NETIF_F_ALL_TSO mask to set hw_features(or features),
> but have to explicitly set offload option. Because, This would make drivers automatically
> clain to support any new TSO feature an the moment of NETIF_F_ALL_TSO is expanded.
>
> Some code style tuning. Just compile test.
>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
> ---
>  drivers/net/loopback.c   |   18 ++++++++----------
>  drivers/net/virtio_net.c |    9 ++++++---
>  2 files changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index d70fb76..bfb6a4a 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -152,6 +152,9 @@ static const struct net_device_ops loopback_ops = {
>        .ndo_get_stats64 = loopback_get_stats64,
>  };
>
> +#define LOOPBACK_USER_FEATURES (NETIF_F_TSO | NETIF_F_TSO_ECN | \
> +                               NETIF_F_TSO6 | NETIF_F_UFO)
> +
>  /*
>  * The loopback device is special. There is only one instance
>  * per network namespace.
> @@ -165,16 +168,11 @@ static void loopback_setup(struct net_device *dev)
>        dev->type               = ARPHRD_LOOPBACK;      /* 0x0001*/
>        dev->flags              = IFF_LOOPBACK;
>        dev->priv_flags        &= ~IFF_XMIT_DST_RELEASE;
> -       dev->hw_features        = NETIF_F_ALL_TSO | NETIF_F_UFO;
> -       dev->features           = NETIF_F_SG | NETIF_F_FRAGLIST
> -               | NETIF_F_ALL_TSO
> -               | NETIF_F_UFO
> -               | NETIF_F_NO_CSUM
> -               | NETIF_F_RXCSUM
> -               | NETIF_F_HIGHDMA
> -               | NETIF_F_LLTX
> -               | NETIF_F_NETNS_LOCAL
> -               | NETIF_F_VLAN_CHALLENGED;
> +       dev->hw_features        = LOOPBACK_USER_FEATURES;
> +       dev->features           = NETIF_F_SG | NETIF_F_FRAGLIST
> +               | LOOPBACK_USER_FEATURES | NETIF_F_NO_CSUM | NETIF_F_RXCSUM
> +               | NETIF_F_HIGHDMA | NETIF_F_LLTX
> +               | NETIF_F_NETNS_LOCAL | NETIF_F_VLAN_CHALLENGED;
>        dev->ethtool_ops        = &loopback_ethtool_ops;
>        dev->header_ops         = &eth_header_ops;
>        dev->netdev_ops         = &loopback_ops;

You can add NETIF_F_HIGHDMA and NETIF_F_NO_CSUM to
LOOPBACK_USER_FEATURES in one go. NETIF_F_RXCSUM could match
NETIF_F_NO_CSUM state (this needs ndo_fix_features callback), but this
won't have much real functional impact.
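
The rationale quoted above (an "all TSO" mask silently growing) can be shown
with a small userspace sketch. All DEMO_* names and bit positions are
hypothetical and do not correspond to the real NETIF_F_* values.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_F_TSO	(1u << 0)
#define DEMO_F_TSO_ECN	(1u << 1)
#define DEMO_F_TSO6	(1u << 2)
#define DEMO_F_UFO	(1u << 3)
#define DEMO_F_NEW_TSO	(1u << 4)	/* a TSO variant added in a later kernel */

/* The "all TSO" mask before and after the new variant is introduced. */
#define DEMO_ALL_TSO_OLD (DEMO_F_TSO | DEMO_F_TSO_ECN | DEMO_F_TSO6)
#define DEMO_ALL_TSO_NEW (DEMO_ALL_TSO_OLD | DEMO_F_NEW_TSO)

/* Explicit list, in the spirit of LOOPBACK_USER_FEATURES. */
#define DEMO_USER_FEATURES \
	(DEMO_F_TSO | DEMO_F_TSO_ECN | DEMO_F_TSO6 | DEMO_F_UFO)

/* Driver that names each offload it supports. */
static uint32_t demo_hw_features_explicit(void)
{
	return DEMO_USER_FEATURES;
}

/* Driver that relies on whatever the "all TSO" mask currently contains. */
static uint32_t demo_hw_features_via_mask(uint32_t all_tso)
{
	return all_tso | DEMO_F_UFO;
}
```

Both variants agree while the mask is unchanged; once the mask grows, only the
mask-based driver starts claiming the new offload.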

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH 2/2] usbnet: Convert dbg to dev_dbg and neatening
From: Joe Perches @ 2011-05-04 12:11 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Oliver Neukum, David Brownell, Greg Kroah-Hartman, netdev,
	linux-usb, linux-kernel
In-Reply-To: <4DC13A1B.8020004@ru.mvista.com>

On Wed, 2011-05-04 at 15:35 +0400, Sergei Shtylyov wrote:
> Hello.

And hello to you Sergei.

On 03-05-2011 22:17, Joe Perches wrote:
> > Use the more standard logging form.
> > Add a bit more tidying style.
>     Style changes look rather doubtful to me...
> >   drivers/net/usb/usbnet.c |   23 +++++++++++------------
> > diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
[]
> > @@ -192,8 +192,8 @@ static int init_status(struct usbnet *dev, struct usb_interface *intf)
> >   		return 0;
> >
> >   	pipe = usb_rcvintpipe(dev->udev,
> > -			      dev->status->desc.bEndpointAddress
> > -			& USB_ENDPOINT_NUMBER_MASK);
> > +			      (dev->status->desc.bEndpointAddress
> > +			& USB_ENDPOINT_NUMBER_MASK));
> 
>     Why add parens?

Leading & uses are almost always addressof.
This makes it easier for me to see that it's not an addressof use.

> > @@ -1345,8 +1346,9 @@ usbnet_probe(struct usb_interface *udev, const struct usb_device_id *prod)
> >   	dev->intf = udev;
> >   	dev->driver_info = info;
> >   	dev->driver_name = name;
> > -	dev->msg_enable = netif_msg_init(msg_level, NETIF_MSG_DRV
> > -					 | NETIF_MSG_PROBE | NETIF_MSG_LINK);
> > +	dev->msg_enable = netif_msg_init(msg_level, (NETIF_MSG_DRV |
> > +						     NETIF_MSG_PROBE |
> > +						     NETIF_MSG_LINK));
> 
>     Why add parens?

I think it's neater.
Or's are almost always placed at EOL.
It also makes alignment easier in emacs.
It could otherwise be a #define on a single line
like several other uses of netif_msg_init.

> > @@ -1485,16 +1486,14 @@ int usbnet_suspend(struct usb_interface *intf, pm_message_t message)
> >   			set_bit(EVENT_DEV_ASLEEP,&dev->flags);
> >   			spin_unlock_irq(&dev->txq.lock);
> >   		}
> > -		/*
> > -		 * accelerate emptying of the rx and queues, to avoid
> > +		/* accelerate emptying of the rx and queues, to avoid
> 
>     Why?

It's the style David Miller prefers for drivers/net/...
I don't care either way.

http://www.spinics.net/lists/netdev/msg139647.html

cheers, Joe

^ permalink raw reply

* 2.6.38.2, kernel panic, probably related to fragmentation handling
From: Denys Fedoryshchenko @ 2011-05-04 11:36 UTC (permalink / raw)
  To: netdev

 It seems that once more, while bringing up another type of tunnel (this
 time a userspace one, working over a tun device) and switching routes, I
 got another kernel panic.
 It is a vanilla kernel, but with many source routing rules, firewall, QoS
 etc., now including this tunnel as well. Here is what I got on netconsole.
 Is any other info required?

 netc [1192230.881002]
 netc [1192230.881002] Pid: 0, comm: kworker/0:1 Not tainted 2.6.38.2-devel2 #2 Dell Inc. PowerEdge 1950/0D8635
 netc [1192230.881002] EIP: 0060:[<c03c0847>] EFLAGS: 00010206 CPU: 3
 netc [1192230.881002] EIP is at icmp_send+0x39/0x396
 netc [1192230.881002] EAX: 121a8aca EBX: d1d28600 ECX: 00000001 EDX: c63b6600
 netc [1192230.881002] ESI: d1d28600 EDI: c33438a0 EBP: f2a41840 ESP: f64b5e8c
 netc [1192230.881002]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 netc [1192230.881002] Process kworker/0:1 (pid: 0, ti=f64b4000 task=f64a4a80 task.ti=f64b0000)
 netc [1192230.881002] Stack:
 netc [1192230.881002]  c0113ea1 00000001 0000000b 00000000 efed48c7 0000114a 00000000 f6b01fa8
 netc [1192230.881002]  e21be5e1 c0148bf3 e217d5d0 00043c53 00000001 f6b02d14 e21be5e1 00043c53
 netc [1192230.881002]  e21be5e1 00043c53 c0148d2e 00000000 00000058 00000000 c0140779 ce9f5aa9
 netc [1192230.881002] Call Trace:
 netc [1192230.881002]  [<c0113ea1>] ? lapic_next_event+0x13/0x16
 netc [1192230.881002]  [<c0148bf3>] ? tick_dev_program_event+0x26/0x116
 netc [1192230.881002]  [<c0148d2e>] ? tick_program_event+0x1b/0x1f
 netc [1192230.881002]  [<c0140779>] ? hrtimer_interrupt+0x10c/0x1ca
 netc [1192230.881002]  [<c0140e49>] ? hrtimer_start+0x20/0x25
 netc [1192230.881002]  [<c012f18e>] ? irq_exit+0x36/0x59
 netc [1192230.881002]  [<c0114933>] ? smp_apic_timer_interrupt+0x71/0x7d
 netc [1192230.881002]  [<c03f2752>] ? apic_timer_interrupt+0x2a/0x30
 netc [1192230.881002]  [<c039f527>] ? ip_expire+0xf2/0x11b
 netc [1192230.881002]  [<c039f435>] ? ip_expire+0x0/0x11b
 netc [1192230.881002]  [<c0133421>] ? run_timer_softirq+0x140/0x1c7
 netc [1192230.881002]  [<c012f28f>] ? __do_softirq+0x6b/0x104
 netc [1192230.881002]  [<c012f224>] ? __do_softirq+0x0/0x104
 netc [1192230.881002]  [<c012f224>] ? __do_softirq+0x0/0x104
 netc [1192230.881002]  <IRQ>
 netc [1192230.881002]  [<c012f17e>] ? irq_exit+0x26/0x59
 netc [1192230.881002]  [<c0103b3d>] ? do_IRQ+0x81/0x95
 netc [1192230.881002]  [<c0114933>] ? smp_apic_timer_interrupt+0x71/0x7d
 netc [1192230.881002]  [<c0102ca9>] ? common_interrupt+0x29/0x30
 netc [1192230.881002]  [<c010807a>] ? mwait_idle+0x51/0x56
 netc [1192230.881002]  [<c0101a97>] ? cpu_idle+0x41/0x5e
 netc [1192230.881002] Code: 08 89 c6 89 4c 24 04 8b 40 48 89 c2 83 e2 fe 0f 84 66 03 00 00 89 94 24 c0 00 00 00 8b 42 0c 8b be 94 00 00 00 3b be a4 00 00 00 <8b> 80 80 02 00 00 89 44 24 10 0f 82 40 03 00 00 8d 47 14 39 86
 netc [1192230.881002] EIP: [<c03c0847>] icmp_send+0x39/0x396 SS:ESP 0068:f64b5e8c
 netc [1192230.881002] CR2: 00000000121a8d4a
 netc [1192230.910072] ---[ end trace 42aae79d7fb08725 ]---
 netc [1192230.910354] Kernel panic - not syncing: Fatal exception in interrupt
 netc [1192230.911062] Rebooting in 5 seconds..


^ permalink raw reply

* Re: [PATCH 2/2] usbnet: Convert dbg to dev_dbg and neatening
From: Sergei Shtylyov @ 2011-05-04 11:35 UTC (permalink / raw)
  To: Joe Perches
  Cc: Oliver Neukum, David Brownell, Greg Kroah-Hartman, netdev,
	linux-usb, linux-kernel
In-Reply-To: <f86364fbdc335fadad003081d010843579e95765.1304445019.git.joe@perches.com>

Hello.

On 03-05-2011 22:17, Joe Perches wrote:

> Use the more standard logging form.
> Add a bit more tidying style.

    Style changes look rather doubtful to me...

> Signed-off-by: Joe Perches<joe@perches.com>
> ---
>   drivers/net/usb/usbnet.c |   23 +++++++++++------------
>   1 files changed, 11 insertions(+), 12 deletions(-)

> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 28aecbb..b803085 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -192,8 +192,8 @@ static int init_status(struct usbnet *dev, struct usb_interface *intf)
>   		return 0;
>
>   	pipe = usb_rcvintpipe(dev->udev,
> -			      dev->status->desc.bEndpointAddress
> -			& USB_ENDPOINT_NUMBER_MASK);
> +			      (dev->status->desc.bEndpointAddress
> +			& USB_ENDPOINT_NUMBER_MASK));

    Why add parens?

> @@ -1345,8 +1346,9 @@ usbnet_probe(struct usb_interface *udev, const struct usb_device_id *prod)
>   	dev->intf = udev;
>   	dev->driver_info = info;
>   	dev->driver_name = name;
> -	dev->msg_enable = netif_msg_init(msg_level, NETIF_MSG_DRV
> -					 | NETIF_MSG_PROBE | NETIF_MSG_LINK);
> +	dev->msg_enable = netif_msg_init(msg_level, (NETIF_MSG_DRV |
> +						     NETIF_MSG_PROBE |
> +						     NETIF_MSG_LINK));

    Why add parens?

> @@ -1466,8 +1468,7 @@ EXPORT_SYMBOL_GPL(usbnet_probe);
>
>   /*-------------------------------------------------------------------------*/
>
> -/*
> - * suspend the whole driver as soon as the first interface is suspended
> +/* suspend the whole driver as soon as the first interface is suspended

    Why? This already corresponded to the preferred multi-line comment style...

>    * resume only when the last interface is resumed
>    */
>
> @@ -1485,16 +1486,14 @@ int usbnet_suspend(struct usb_interface *intf, pm_message_t message)
>   			set_bit(EVENT_DEV_ASLEEP,&dev->flags);
>   			spin_unlock_irq(&dev->txq.lock);
>   		}
> -		/*
> -		 * accelerate emptying of the rx and queues, to avoid
> +		/* accelerate emptying of the rx and queues, to avoid

    Why?

>   		 * having everything error out.
>   		 */
>   		netif_device_detach(dev->net);
>   		usbnet_terminate_urbs(dev);
>   		usb_kill_urb(dev->interrupt);
>
> -		/*
> -		 * reattach so runtime management can use and
> +		/* reattach so runtime management can use and

    Why?

WBR, Sergei

^ permalink raw reply

* Re: [PATCH 1/2] net: Allow ethtool to set interface in loopback mode.
From: Michał Mirosław @ 2011-05-04 11:15 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Matt Carlson, David Miller, netdev, Michael Chan, Ben Hutchings,
	Tom Herbert
In-Reply-To: <1304471935-402-2-git-send-email-maheshb@google.com>

On Tue, May 03, 2011 at 06:18:54PM -0700, Mahesh Bandewar wrote:
> This patch enables ethtool to set the loopback mode on a given interface.
> By configuring the interface in loopback mode in conjunction with a policy
> route / rule, a userland application can stress the egress / ingress path
> exposing the flows of the change in progress and potentially help developer(s)
> understand the impact of those changes without even sending a packet out
> on the network.
> 
> Following set of commands illustrates one such example -
>     a) ip -4 addr add 192.168.1.1/24 dev eth1
>     b) ip -4 rule add from all iif eth1 lookup 250
>     c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
>     d) arp -Ds 192.168.1.100 eth1
>     e) arp -Ds 192.168.1.200 eth1
>     f) sysctl -w net.ipv4.ip_nonlocal_bind=1
>     g) sysctl -w net.ipv4.conf.all.accept_local=1
>     # Assuming that the machine has 8 cores
>     h) taskset 000f netserver -L 192.168.1.200
>     i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
> 
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> ---
>  include/linux/netdevice.h |    3 ++-
>  net/core/ethtool.c        |    2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index d5de66a..e7244ed 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1067,6 +1067,7 @@ struct net_device {
>  #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
>  #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
>  #define NETIF_F_NOCACHE_COPY	(1 << 30) /* Use no-cache copyfromuser */
> +#define NETIF_F_LOOPBACK	(1 << 31) /* Enable loopback */
[...]

Just for correctness: you could add this flag to loopback's dev->features.
It's just an aesthetics point, though.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCHv3 2/2] tg3: Allow ethtool to enable/disable loopback.
From: Michał Mirosław @ 2011-05-04 11:11 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Matt Carlson, David Miller, netdev, Michael Chan, Ben Hutchings,
	Tom Herbert
In-Reply-To: <1304471935-402-3-git-send-email-maheshb@google.com>

On Tue, May 03, 2011 at 06:18:55PM -0700, Mahesh Bandewar wrote:
> This patch adds tg3_set_features() to handle loopback mode. Currently the
> capability is added for the devices which support internal MAC loopback mode.
> So when enabled, it enables internal-MAC loopback.
[...]
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 7c7c9a8..46de633 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -6319,6 +6319,51 @@ static u32 tg3_fix_features(struct net_device *dev, u32 features)
>  	return features;
>  }
>  
> +static int tg3_set_features(struct net_device *dev, u32 features)
> +{
> +	struct tg3 *tp = netdev_priv(dev);
> +	u32 cur_mode = 0;
> +	int err = 0;
> +
> +	if (!netif_running(dev)) {
> +		err = -EAGAIN;
> +		goto sfeatures_out;
> +	}

netdev_update_features() is not designed to handle -EAGAIN from
ndo_set_features callback. It might be useful to implement this
handling, but in this case you should just return 0 and check
dev->features in ndo_open callback.
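
A rough userspace model of that suggestion follows. It is a sketch under
stated assumptions: struct demo_dev and the demo_* functions are invented
stand-ins for the net_device, ndo_set_features and ndo_open, and the model
stores the feature word inside the callback, which simplifies what the
networking core does.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F_LOOPBACK (1u << 31)	/* mirrors the bit proposed in patch 1/2 */

/* Hypothetical, heavily reduced device state. */
struct demo_dev {
	bool running;		/* netif_running() analogue */
	uint32_t features;	/* dev->features analogue */
	bool hw_loopback;	/* what the "hardware" is currently doing */
};

/* Program the hardware from the current feature word. */
static void demo_apply_loopback(struct demo_dev *dev)
{
	dev->hw_loopback = (dev->features & DEMO_F_LOOPBACK) != 0;
}

/* set_features analogue: never fails just because the device is down. */
static int demo_set_features(struct demo_dev *dev, uint32_t features)
{
	dev->features = features;	/* simplification, see lead-in */
	if (!dev->running)
		return 0;		/* deferred until open */
	demo_apply_loopback(dev);
	return 0;
}

/* open analogue: pick up whatever was requested while the device was down. */
static int demo_open(struct demo_dev *dev)
{
	dev->running = true;
	demo_apply_loopback(dev);
	return 0;
}
```

Requesting loopback on a stopped device returns 0 and leaves the hardware
untouched; the setting takes effect when the device is opened.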

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH V4 4/8]vhost: vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-05-04  9:56 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <1304496684.20660.84.camel@localhost.localdomain>

On Wed, May 04, 2011 at 01:11:24AM -0700, Shirley Ma wrote:
> This patch maintains the outstanding userspace buffers in the 
> sequence it is delivered to vhost. The outstanding userspace buffers 
> will be marked as done once the lower device buffers DMA has finished. 
> This is monitored through last reference of kfree_skb callback. Two
> buffer index are used for this purpose.
> 
> The vhost passes the userspace buffers info to lower device skb 
> through message control. Since there will be some done DMAs when
> entering vhost handle_tx. The worse case is all buffers in the vq are
> in pending/done status, so we need to notify guest to release DMA done 
> buffers first before get any new buffers from the vq.
> 
> Signed-off-by: Shirley <xma@us.ibm.com>

Looks good overall. Some nits to iron out below.

> ---
> 
>  drivers/vhost/net.c   |   30 +++++++++++++++++++++++++++-
>  drivers/vhost/vhost.c |   50
> ++++++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/vhost/vhost.h |   10 +++++++++
>  3 files changed, 87 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 2f7c76a..c403afb 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -32,6 +32,8 @@
>   * Using this limit prevents one virtqueue from starving others. */
>  #define VHOST_NET_WEIGHT 0x80000
>  
> +#define MAX_ZEROCOPY_PEND 64
> +

Pls document what the above is. Also scope with VHOST_

>  enum {
>  	VHOST_NET_VQ_RX = 0,
>  	VHOST_NET_VQ_TX = 1,
> @@ -129,6 +131,7 @@ static void handle_tx(struct vhost_net *net)
>  	int err, wmem;
>  	size_t hdr_size;
>  	struct socket *sock;
> +	struct skb_ubuf_info pend;
>  
>  	/* TODO: check that we are running from vhost_worker? */
>  	sock = rcu_dereference_check(vq->private_data, 1);
> @@ -151,6 +154,10 @@ static void handle_tx(struct vhost_net *net)
>  	hdr_size = vq->vhost_hlen;
>  
>  	for (;;) {
> +		/* Release DMAs done buffers first */
> +		if (sock_flag(sock->sk, SOCK_ZEROCOPY))
> +			vhost_zerocopy_signal_used(vq);
> +
>  		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>  					 ARRAY_SIZE(vq->iov),
>  					 &out, &in,
> @@ -166,6 +173,12 @@ static void handle_tx(struct vhost_net *net)
>  				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
>  				break;
>  			}
> +			/* If more outstanding DMAs, queue the work */
> +			if (sock_flag(sock->sk, SOCK_ZEROCOPY) &&
> +			    (atomic_read(&vq->refcnt) > MAX_ZEROCOPY_PEND)) {
> +				vhost_poll_queue(&vq->poll);

Well, this just keeps polling, doesn't it?
If you want to wait until # of DMAs is below MAX_ZEROCOPY_PEND,
you'll need to do the queueing from some callback.

Something like this: when refcnt is above 2 * MAX_ZEROCOPY_PEND,
stop submitting and wait until some are freed.

BTW, can the socket poll wakeup do the job?

> +				break;
> +			}
>  			if (unlikely(vhost_enable_notify(vq))) {
>  				vhost_disable_notify(vq);
>  				continue;
> @@ -188,17 +201,30 @@ static void handle_tx(struct vhost_net *net)
>  			       iov_length(vq->hdr, s), hdr_size);
>  			break;
>  		}
> +		/* use msg_control to pass vhost zerocopy ubuf info to skb */
> +		if (sock_flag(sock->sk, SOCK_ZEROCOPY)) {
> +			pend.callback = vhost_zerocopy_callback;
> +			pend.arg = vq;
> +			pend.desc = vq->upend_idx;
> +			msg.msg_control = &pend;
> +			msg.msg_controllen = sizeof(pend);
> +			vq->heads[vq->upend_idx].id = head;
> +			vq->upend_idx = (vq->upend_idx + 1) % UIO_MAXIOV;
> +			atomic_inc(&vq->refcnt);
> +		}
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(NULL, sock, &msg, len);
>  		if (unlikely(err < 0)) {
> -			vhost_discard_vq_desc(vq, 1);
> +			if (!sock_flag(sock->sk, SOCK_ZEROCOPY))
> +				vhost_discard_vq_desc(vq, 1);
>  			tx_poll_start(net, sock);
>  			break;
>  		}
>  		if (err != len)
>  			pr_debug("Truncated TX packet: "
>  				 " len %d != %zd\n", err, len);
> -		vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +		if (!sock_flag(sock->sk, SOCK_ZEROCOPY))
> +			vhost_add_used_and_signal(&net->dev, vq, head, 0);
>  		total_len += len;
>  		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>  			vhost_poll_queue(&vq->poll);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2ab2912..3048953 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -174,6 +174,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>  	vq->call_ctx = NULL;
>  	vq->call = NULL;
>  	vq->log_ctx = NULL;
> +	vq->upend_idx = 0;
> +	vq->done_idx = 0;
> +	atomic_set(&vq->refcnt, 0);
>  }
>  
>  static int vhost_worker(void *data)
> @@ -230,7 +233,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev
> *dev)
>  					       UIO_MAXIOV, GFP_KERNEL);
>  		dev->vqs[i].log = kmalloc(sizeof *dev->vqs[i].log * UIO_MAXIOV,
>  					  GFP_KERNEL);
> -		dev->vqs[i].heads = kmalloc(sizeof *dev->vqs[i].heads *
> +		dev->vqs[i].heads = kzalloc(sizeof *dev->vqs[i].heads *
>  					    UIO_MAXIOV, GFP_KERNEL);

Do we really need to zero it all out? We generally tried to only
init what is necessary ...

>  		if (!dev->vqs[i].indirect || !dev->vqs[i].log ||
> @@ -385,10 +388,41 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
>  	return 0;
>  }
>  

Pls document what the below does.

> +void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
> +{
> +	int i, j = 0;
> +
> +	i = vq->done_idx;
> +	while (i != vq->upend_idx) {
> +		/* len = 1 means DMA done */

Hmm. Guests aren't likely to use len 1 in practice,
but I think it's better to support this.

I'd suggest sticking extra stuff in id, IIRC only values
< vq size are legal there, anything else we can use.
Also, pls add some defines for special values, better than
a comment:
	if (len == VHOST_DMA_DONE_LEN)
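
To make the define and the wrap-around batching concrete, here is a userspace
sketch of the scan. It keeps the marker in len (it does not implement the
"extra stuff in id" idea), uses a ring of 8 in place of UIO_MAXIOV, and all
demo_* / DEMO_* names are hypothetical.

```c
#define DEMO_RING_SIZE		8	/* stands in for UIO_MAXIOV */
#define DEMO_DMA_DONE_LEN	1	/* named marker instead of a bare 1 */
#define DEMO_DMA_CLEAR_LEN	0

struct demo_elem {
	unsigned int id;
	unsigned int len;
};

struct demo_vq {
	struct demo_elem heads[DEMO_RING_SIZE];
	int upend_idx;	/* next slot to be handed to the lower device */
	int done_idx;	/* first slot not yet reported to the guest */
	int reported;	/* total entries passed to demo_add_used_n() */
};

/* Stand-in for vhost_add_used_n(): only counts what it is given. */
static void demo_add_used_n(struct demo_vq *vq, const struct demo_elem *heads,
			    int count)
{
	(void)heads;
	vq->reported += count;
}

/* Report the contiguous run of completed entries starting at done_idx. */
static int demo_signal_used(struct demo_vq *vq)
{
	int i = vq->done_idx;
	int j = 0;

	while (i != vq->upend_idx && vq->heads[i].len == DEMO_DMA_DONE_LEN) {
		vq->heads[i].len = DEMO_DMA_CLEAR_LEN;
		i = (i + 1) % DEMO_RING_SIZE;
		++j;
	}
	if (!j)
		return 0;

	if (i > vq->done_idx) {
		/* The run did not wrap: one contiguous batch. */
		demo_add_used_n(vq, &vq->heads[vq->done_idx], j);
	} else {
		/* The run wrapped: tail of the array first, then the head. */
		demo_add_used_n(vq, &vq->heads[vq->done_idx],
				DEMO_RING_SIZE - vq->done_idx);
		demo_add_used_n(vq, vq->heads, i);
	}
	vq->done_idx = i;
	return j;
}
```

The scan stops at the first entry that is still pending, so a completed entry
behind a pending one is not reported until the pending one finishes.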

> +		if (vq->heads[i].len == 1) {
> +			/* reset len = 0 */
> +			vq->heads[i].len = 0;
> +			i = (i + 1) % UIO_MAXIOV;
> +			++j;
> +		} else
> +			break;
> +	}
> +	if (j) {

Pls add some comments to explain the logic here.

> +		if (i > vq->done_idx)
> +			vhost_add_used_n(vq, &vq->heads[vq->done_idx], j);
> +		else {
> +			vhost_add_used_n(vq, &vq->heads[vq->done_idx],
> +					 UIO_MAXIOV - vq->done_idx);
> +			vhost_add_used_n(vq, vq->heads, i);
> +		}
> +		vq->done_idx = i;
> +		vhost_signal(vq->dev, vq);
> +		atomic_sub(j, &vq->refcnt);
> +	}
> +}
> +
>  /* Caller should have device mutex */
>  void vhost_dev_cleanup(struct vhost_dev *dev)
>  {
>  	int i;
> +	unsigned long begin = jiffies;
> +
>  
>  	for (i = 0; i < dev->nvqs; ++i) {
>  		if (dev->vqs[i].kick && dev->vqs[i].handle_kick) {
> @@ -405,6 +439,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  			eventfd_ctx_put(dev->vqs[i].call_ctx);
>  		if (dev->vqs[i].call)
>  			fput(dev->vqs[i].call);
> +		/* wait for all lower device DMAs done, then notify guest */
> +		while (atomic_read(&dev->vqs[i].refcnt)) {
> +			if (time_after(jiffies, begin + 5 * HZ))

Hmm, does this actually busy-wait?  Let's at least sleep here.
Or maybe some wakeup scheme can be cooked up.
For example, have a kref with release function that signals some
completion.

> +				vhost_zerocopy_signal_used(&dev->vqs[i]);
> +		}
>  		vhost_vq_reset(dev, dev->vqs + i);
>  	}
>  	vhost_dev_free_iovecs(dev);
> @@ -1416,3 +1455,12 @@ void vhost_disable_notify(struct vhost_virtqueue
> *vq)
>  		vq_err(vq, "Failed to enable notification at %p: %d\n",
>  		       &vq->used->flags, r);
>  }
> +
> +void vhost_zerocopy_callback(struct sk_buff *skb)
> +{
> +	int idx = skb_shinfo(skb)->ubuf.desc;
> +	struct vhost_virtqueue *vq = skb_shinfo(skb)->ubuf.arg;
> +
> +	/* set len = 1 to mark this desc buffers done DMA */
> +	vq->heads[idx].len = 1;
> +}

So any kind of callback like that, that goes into the skb,
will be racy wrt module unloading because module can go away
after you mark dma done and before this function returns.
Solution is to have a core function that does the
final signalling (e.g. sock_wfree is in core).
Would be nice to fix, even though this race is
completely theoretical, I don't believe it will
trigger in practice.


> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index b3363ae..ec032a0 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -108,6 +108,14 @@ struct vhost_virtqueue {
>  	/* Log write descriptors */
>  	void __user *log_base;
>  	struct vhost_log *log;
> +	/* vhost zerocopy support */
> +	atomic_t refcnt; /* num of outstanding zerocopy DMAs */
> +	/* index of zerocopy pending DMA buffers */
> +	int upend_idx;
> +	/* index of zerocopy done DMA buffers, but not notify guest yet */
> +	int done_idx;

Pls try to find more descriptive names for the above,
and clarify the comments: I could not tell what the
comments mean.

upend_idx seems to be a copy of avail idx?
done_idx is ... ?

> +	/* notify vhost zerocopy DMA buffers has done in lower device */

Do you mean 'notify vhost that zerocopy DMA is complete'?

> +	void (*callback)(struct sk_buff *);

Is this actually used?
If yes rename it zerocopy_dma_done or something like this?

>  };
>  
>  struct vhost_dev {
> @@ -154,6 +162,8 @@ bool vhost_enable_notify(struct vhost_virtqueue *);
>  
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>  		    unsigned int log_num, u64 len);
> +void vhost_zerocopy_callback(struct sk_buff *skb);
> +void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
>  
>  #define vq_err(vq, fmt, ...) do {                                  \
>  		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
> 

^ permalink raw reply

* Re: [RFC v3 02/10] Revert "lsm: Remove the socket_post_accept() hook"
From: Samir Bellabes @ 2011-05-04  8:50 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: paul.moore, linux-security-module, linux-kernel, netdev,
	netfilter-devel, hadi, kaber, zbr, root
In-Reply-To: <201105041128.BAB13061.LMHVtOSOQOFFJF@I-love.SAKURA.ne.jp>

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> Paul Moore wrote:
>> On Tuesday, May 03, 2011 10:24:15 AM Samir Bellabes wrote:
>> > snet needs to reintroduce this hook, as it was designed to be: a hook for
>> > updating security informations on objects.
>> 
>> Looking at this and 5/10 again, it seems that you should be able to do what 
>> you need with the sock_graft() hook.  Am I missing something?
>> 
>> My apologies if we've already discussed this approach previously ...
>
> Third problem (though independent with security_sock_graft()) is that
> snet_do_send_event() ignores snet_nl_send_event() failure.

Using snet_do_send_event() means that the system is sending data to
userspace; the system is not waiting for a verdict from userspace.

If an error occurs, we actually lose the data.
I may be able to write a solution which tries to send the data again, but
we need an exit condition for this loop (a maximum number of tries?).
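
A bounded retry could look roughly like the sketch below. It is plain
userspace C and only an illustration: send_fn, send_with_retry() and
flaky_send() are made-up names, not snet functions, and treating only
-EAGAIN and -ENOBUFS as transient is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sender type: returns 0 on success or a negative errno. */
typedef int (*send_fn)(void *ctx);

/*
 * Call send() until it succeeds, fails with a non-transient error, or
 * max_tries attempts are used up. Only -EAGAIN and -ENOBUFS are treated
 * as transient; that choice is an assumption of this sketch.
 */
static int send_with_retry(send_fn send, void *ctx, unsigned int max_tries)
{
	int err = -EINVAL;	/* returned if max_tries is 0 */
	unsigned int i;

	assert(send != NULL);
	for (i = 0; i < max_tries; i++) {
		err = send(ctx);
		if (err != -EAGAIN && err != -ENOBUFS)
			break;	/* success or permanent failure */
	}
	return err;
}

/* Test double: fails with -EAGAIN until *remaining reaches 0. */
static int flaky_send(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

The loop always terminates after max_tries calls, so the caller still has
to decide what to do with an event that could not be delivered.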

^ permalink raw reply

* [PATCH V4 5/8]macvtap: macvtap TX zero-copy support
From: Shirley Ma @ 2011-05-04  8:14 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

macvtap enables zero-copy only when the buffer size is greater than
GOODCOPY_LEN (256).
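
For illustration, the size decision can be sketched in userspace C as
below. The helper names linear_copy_len() and mapped_len() are made up
for this sketch; zc_enabled stands in for the SOCK_ZEROCOPY socket flag
check done in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GOODCOPY_LEN 256	/* threshold from the patch */

/*
 * Return how many bytes go into the linear part of the skb.
 * Zero-copy is used only when it is enabled and len exceeds the
 * threshold; otherwise the whole packet is copied.
 */
static size_t linear_copy_len(bool zc_enabled, size_t len)
{
	bool zerocopy = zc_enabled && len > GOODCOPY_LEN;

	return zerocopy ? GOODCOPY_LEN : len;
}

/* Bytes left to map from user pages after the linear copy. */
static size_t mapped_len(bool zc_enabled, size_t len)
{
	size_t copied = linear_copy_len(zc_enabled, len);

	assert(copied <= len);
	return len - copied;
}
```

A packet of exactly 256 bytes is therefore still copied in full; only
larger packets have their remainder mapped.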

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/net/macvtap.c |  126 ++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 115 insertions(+), 11 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 6696e56..e8bc5ff 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
  */
 static dev_t macvtap_major;
 #define MACVTAP_NUM_DEVS 65536
+#define GOODCOPY_LEN 256
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
@@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
 {
 	struct net *net = current->nsproxy->net_ns;
 	struct net_device *dev = dev_get_by_index(net, iminor(inode));
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *q;
 	int err;
 
@@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
 	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
+	/*
+	 * so far only VM uses macvtap, enable zero copy between guest
+	 * kernel and host kernel when lower device supports high memory
+	 * DMA
+	 */
+	if (vlan) {
+		if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
+			sock_set_flag(&q->sk, SOCK_ZEROCOPY);
+	}
+
 	err = macvtap_set_queue(dev, file, q);
 	if (err)
 		sock_put(&q->sk);
@@ -433,6 +445,80 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
 	return skb;
 }
 
+/* set skb frags from iovec, this can move to core network code for reuse */
+static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
+				  int offset, size_t count)
+{
+	int len = iov_length(from, count) - offset;
+	int copy = skb_headlen(skb);
+	int size, offset1 = 0;
+	int i = 0;
+	skb_frag_t *f;
+
+	/* Skip over from offset */
+	while (offset >= from->iov_len) {
+		offset -= from->iov_len;
+		++from;
+		--count;
+	}
+
+	/* copy up to skb headlen */
+	while (copy > 0) {
+		size = min_t(unsigned int, copy, from->iov_len - offset);
+		if (copy_from_user(skb->data + offset1, from->iov_base + offset,
+				   size))
+			return -EFAULT;
+		if (copy > size) {
+			++from;
+			--count;
+		}
+		copy -= size;
+		offset1 += size;
+		offset = 0;
+	}
+
+	if (len == offset1)
+		return 0;
+
+	while (count--) {
+		struct page *page[MAX_SKB_FRAGS];
+		int num_pages;
+		unsigned long base;
+
+		len = from->iov_len - offset1;
+		if (!len) {
+			offset1 = 0;
+			++from;
+			continue;
+		}
+		base = (unsigned long)from->iov_base + offset1;
+		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+		num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+		if ((num_pages != size) ||
+		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
+			/* put_page is in skb free */
+			return -EFAULT;
+		skb->data_len += len;
+		skb->len += len;
+		skb->truesize += len;
+		while (len) {
+			f = &skb_shinfo(skb)->frags[i];
+			f->page = page[i];
+			f->page_offset = base & ~PAGE_MASK;
+			f->size = min_t(int, len, PAGE_SIZE - f->page_offset);
+			skb_shinfo(skb)->nr_frags++;
+			/* increase sk_wmem_alloc */
+			atomic_add(f->size, &skb->sk->sk_wmem_alloc);
+			base += f->size;
+			len -= f->size;
+			i++;
+		}
+		offset1 = 0;
+		++from;
+	}
+	return 0;
+}
+
 /*
  * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
  * be shared with the tun/tap driver.
@@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
 
 
 /* Get packet from user space buffer */
-static ssize_t macvtap_get_user(struct macvtap_queue *q,
-				const struct iovec *iv, size_t count,
-				int noblock)
+static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
+				const struct iovec *iv, unsigned long total_len,
+				size_t count, int noblock)
 {
 	struct sk_buff *skb;
 	struct macvlan_dev *vlan;
-	size_t len = count;
+	unsigned long len = total_len;
 	int err;
 	struct virtio_net_hdr vnet_hdr = { 0 };
 	int vnet_hdr_len = 0;
+	int copylen, zerocopy;
 
+	zerocopy = sock_flag(&q->sk, SOCK_ZEROCOPY) && (len > GOODCOPY_LEN);
 	if (q->flags & IFF_VNET_HDR) {
 		vnet_hdr_len = q->vnet_hdr_sz;
 
@@ -552,12 +640,28 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
 	if (unlikely(len < ETH_HLEN))
 		goto err;
 
-	skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, len, vnet_hdr.hdr_len,
-				noblock, &err);
+	if (zerocopy)
+		/* There are 256 bytes to be copied in skb, so there is enough
+		 * room for skb expand head in case it is used.
+		 * The rest buffer is mapped from userspace.
+		 */
+		copylen = GOODCOPY_LEN;
+	else
+		copylen = len;
+
+	skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, copylen,
+				vnet_hdr.hdr_len, noblock, &err);
 	if (!skb)
 		goto err;
 
-	err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len, len);
+	if (zerocopy)
+		err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len, count);
+	else
+		err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
+						   len);
+	if (sock_flag(&q->sk, SOCK_ZEROCOPY))
+		memcpy(&skb_shinfo(skb)->ubuf, m->msg_control,
+			sizeof(struct skb_ubuf_info));
 	if (err)
 		goto err_kfree;
 
@@ -579,7 +683,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
 		kfree_skb(skb);
 	rcu_read_unlock_bh();
 
-	return count;
+	return total_len;
 
 err_kfree:
 	kfree_skb(skb);
@@ -601,8 +705,8 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
 	ssize_t result = -ENOLINK;
 	struct macvtap_queue *q = file->private_data;
 
-	result = macvtap_get_user(q, iv, iov_length(iv, count),
-			      file->f_flags & O_NONBLOCK);
+	result = macvtap_get_user(q, NULL, iv, iov_length(iv, count), count,
+				  file->f_flags & O_NONBLOCK);
 	return result;
 }
 
@@ -815,7 +919,7 @@ static int macvtap_sendmsg(struct kiocb *iocb, struct socket *sock,
 			   struct msghdr *m, size_t total_len)
 {
 	struct macvtap_queue *q = container_of(sock, struct macvtap_queue, sock);
-	return macvtap_get_user(q, m->msg_iov, total_len,
+	return macvtap_get_user(q, m, m->msg_iov, total_len, m->msg_iovlen,
 			    m->msg_flags & MSG_DONTWAIT);
 }
 

^ permalink raw reply related

* [PATCH V4 4/8]vhost: vhost TX zero-copy support
From: Shirley Ma @ 2011-05-04  8:11 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

This patch tracks the outstanding userspace buffers in the order in
which they are delivered to vhost. An outstanding userspace buffer is
marked as done once the lower device has finished DMA on it. Completion
is detected through a callback invoked when the last reference to the
skb is dropped (kfree_skb). Two buffer indices are used for this purpose.

vhost passes the userspace buffer info to the lower device skb
through msg_control. Some DMAs may already be done when entering vhost
handle_tx. In the worst case all buffers in the vq are in pending/done
status, so we need to notify the guest to release the DMA-done buffers
first, before getting any new buffers from the vq.
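
The two-index scheme can be illustrated with a small userspace model.
This is only a sketch: struct zc_ring and the zc_*() helpers are
hypothetical names, RING_SIZE stands in for UIO_MAXIOV, and a bool array
replaces the "len == 1" marker used in the patch. It does not handle a
completely full ring.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* stands in for UIO_MAXIOV; small hypothetical value */

struct zc_ring {
	bool done[RING_SIZE];	/* set by the completion callback */
	int pend_idx;		/* next slot to hand to the lower device */
	int done_idx;		/* oldest slot not yet reported to the guest */
};

/* Claim the next slot for an in-flight buffer and return its index. */
static int zc_submit(struct zc_ring *r)
{
	int idx = r->pend_idx;

	r->done[idx] = false;
	r->pend_idx = (idx + 1) % RING_SIZE;
	return idx;
}

/* Completion callback: DMA for slot idx has finished. */
static void zc_complete(struct zc_ring *r, int idx)
{
	assert(idx >= 0 && idx < RING_SIZE);
	r->done[idx] = true;
}

/*
 * Advance done_idx over the contiguous run of completed slots and return
 * how many were released. Stops at the first slot still in flight, so
 * buffers are reported in submission order.
 */
static int zc_reap(struct zc_ring *r)
{
	int n = 0;

	while (r->done_idx != r->pend_idx && r->done[r->done_idx]) {
		r->done[r->done_idx] = false;
		r->done_idx = (r->done_idx + 1) % RING_SIZE;
		n++;
	}
	return n;
}
```

A slot that completes out of order stays unreported until every earlier
slot has completed as well.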

Signed-off-by: Shirley <xma@us.ibm.com>
---

 drivers/vhost/net.c   |   30 +++++++++++++++++++++++++++-
 drivers/vhost/vhost.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/vhost/vhost.h |   10 +++++++++
 3 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2f7c76a..c403afb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -32,6 +32,8 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+#define MAX_ZEROCOPY_PEND 64
+
 enum {
 	VHOST_NET_VQ_RX = 0,
 	VHOST_NET_VQ_TX = 1,
@@ -129,6 +131,7 @@ static void handle_tx(struct vhost_net *net)
 	int err, wmem;
 	size_t hdr_size;
 	struct socket *sock;
+	struct skb_ubuf_info pend;
 
 	/* TODO: check that we are running from vhost_worker? */
 	sock = rcu_dereference_check(vq->private_data, 1);
@@ -151,6 +154,10 @@ static void handle_tx(struct vhost_net *net)
 	hdr_size = vq->vhost_hlen;
 
 	for (;;) {
+		/* Release DMAs done buffers first */
+		if (sock_flag(sock->sk, SOCK_ZEROCOPY))
+			vhost_zerocopy_signal_used(vq);
+
 		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 					 ARRAY_SIZE(vq->iov),
 					 &out, &in,
@@ -166,6 +173,12 @@ static void handle_tx(struct vhost_net *net)
 				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
 				break;
 			}
+			/* If more outstanding DMAs, queue the work */
+			if (sock_flag(sock->sk, SOCK_ZEROCOPY) &&
+			    (atomic_read(&vq->refcnt) > MAX_ZEROCOPY_PEND)) {
+				vhost_poll_queue(&vq->poll);
+				break;
+			}
 			if (unlikely(vhost_enable_notify(vq))) {
 				vhost_disable_notify(vq);
 				continue;
@@ -188,17 +201,30 @@ static void handle_tx(struct vhost_net *net)
 			       iov_length(vq->hdr, s), hdr_size);
 			break;
 		}
+		/* use msg_control to pass vhost zerocopy ubuf info to skb */
+		if (sock_flag(sock->sk, SOCK_ZEROCOPY)) {
+			pend.callback = vhost_zerocopy_callback;
+			pend.arg = vq;
+			pend.desc = vq->upend_idx;
+			msg.msg_control = &pend;
+			msg.msg_controllen = sizeof(pend);
+			vq->heads[vq->upend_idx].id = head;
+			vq->upend_idx = (vq->upend_idx + 1) % UIO_MAXIOV;
+			atomic_inc(&vq->refcnt);
+		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq, 1);
+			if (!sock_flag(sock->sk, SOCK_ZEROCOPY))
+				vhost_discard_vq_desc(vq, 1);
 			tx_poll_start(net, sock);
 			break;
 		}
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		if (!sock_flag(sock->sk, SOCK_ZEROCOPY))
+			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		total_len += len;
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ab2912..3048953 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -174,6 +174,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->call_ctx = NULL;
 	vq->call = NULL;
 	vq->log_ctx = NULL;
+	vq->upend_idx = 0;
+	vq->done_idx = 0;
+	atomic_set(&vq->refcnt, 0);
 }
 
 static int vhost_worker(void *data)
@@ -230,7 +233,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 					       UIO_MAXIOV, GFP_KERNEL);
 		dev->vqs[i].log = kmalloc(sizeof *dev->vqs[i].log * UIO_MAXIOV,
 					  GFP_KERNEL);
-		dev->vqs[i].heads = kmalloc(sizeof *dev->vqs[i].heads *
+		dev->vqs[i].heads = kzalloc(sizeof *dev->vqs[i].heads *
 					    UIO_MAXIOV, GFP_KERNEL);
 
 		if (!dev->vqs[i].indirect || !dev->vqs[i].log ||
@@ -385,10 +388,41 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
 	return 0;
 }
 
+void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+{
+	int i, j = 0;
+
+	i = vq->done_idx;
+	while (i != vq->upend_idx) {
+		/* len = 1 means DMA done */
+		if (vq->heads[i].len == 1) {
+			/* reset len = 0 */
+			vq->heads[i].len = 0;
+			i = (i + 1) % UIO_MAXIOV;
+			++j;
+		} else
+			break;
+	}
+	if (j) {
+		if (i > vq->done_idx)
+			vhost_add_used_n(vq, &vq->heads[vq->done_idx], j);
+		else {
+			vhost_add_used_n(vq, &vq->heads[vq->done_idx],
+					 UIO_MAXIOV - vq->done_idx);
+			vhost_add_used_n(vq, vq->heads, i);
+		}
+		vq->done_idx = i;
+		vhost_signal(vq->dev, vq);
+		atomic_sub(j, &vq->refcnt);
+	}
+}
+
 /* Caller should have device mutex */
 void vhost_dev_cleanup(struct vhost_dev *dev)
 {
 	int i;
+	unsigned long begin = jiffies;
+
 
 	for (i = 0; i < dev->nvqs; ++i) {
 		if (dev->vqs[i].kick && dev->vqs[i].handle_kick) {
@@ -405,6 +439,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 			eventfd_ctx_put(dev->vqs[i].call_ctx);
 		if (dev->vqs[i].call)
 			fput(dev->vqs[i].call);
+		/* wait for all lower device DMAs done, then notify guest */
+		while (atomic_read(&dev->vqs[i].refcnt)) {
+			if (time_after(jiffies, begin + 5 * HZ))
+				vhost_zerocopy_signal_used(&dev->vqs[i]);
+		}
 		vhost_vq_reset(dev, dev->vqs + i);
 	}
 	vhost_dev_free_iovecs(dev);
@@ -1416,3 +1455,12 @@ void vhost_disable_notify(struct vhost_virtqueue *vq)
 		vq_err(vq, "Failed to enable notification at %p: %d\n",
 		       &vq->used->flags, r);
 }
+
+void vhost_zerocopy_callback(struct sk_buff *skb)
+{
+	int idx = skb_shinfo(skb)->ubuf.desc;
+	struct vhost_virtqueue *vq = skb_shinfo(skb)->ubuf.arg;
+
+	/* set len = 1 to mark this desc buffers done DMA */
+	vq->heads[idx].len = 1;
+}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index b3363ae..ec032a0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -108,6 +108,14 @@ struct vhost_virtqueue {
 	/* Log write descriptors */
 	void __user *log_base;
 	struct vhost_log *log;
+	/* vhost zerocopy support */
+	atomic_t refcnt; /* num of outstanding zerocopy DMAs */
+	/* index of zerocopy pending DMA buffers */
+	int upend_idx;
+	/* index of zerocopy done DMA buffers, but not notify guest yet */
+	int done_idx;
+	/* notify vhost zerocopy DMA buffers has done in lower device */
+	void (*callback)(struct sk_buff *);
 };
 
 struct vhost_dev {
@@ -154,6 +162,8 @@ bool vhost_enable_notify(struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len);
+void vhost_zerocopy_callback(struct sk_buff *skb);
+void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
 
 #define vq_err(vq, fmt, ...) do {                                  \
 		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \

^ permalink raw reply related

* [PATCH V4 3/8] skbuff: Add userspace buffers support in skb (zero-copy)
From: Shirley Ma @ 2011-05-04  8:06 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

This patch adds userspace buffer support to the skb shared info. A new
struct skb_ubuf_info holds the userspace buffer argument and index; a
callback is used to notify userspace to release the buffers once the
lower device has finished DMA (the last reference to that skb is gone).
This kind of skb has 256 bytes of data copied into its head, to make
sure there is enough room for head expansion; the rest of the userspace
buffer is mapped into the skb frags.
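
As a rough userspace illustration of the callback-on-last-reference
idea: struct buf, buf_put() and mark_done() below are hypothetical
stand-ins, not kernel API, and the plain int refcount is a
simplification (no atomics).

```c
#include <assert.h>
#include <stddef.h>

struct buf;

/* Mirrors the shape of skb_ubuf_info from the patch. */
struct ubuf_info {
	void (*callback)(struct buf *);
	void *arg;
	size_t desc;
};

/* Hypothetical stand-in for an skb: a refcount plus the ubuf info. */
struct buf {
	int refcnt;
	struct ubuf_info ubuf;
};

/* Drop one reference; on the last one, fire the callback exactly once. */
static void buf_put(struct buf *b)
{
	assert(b->refcnt > 0);
	if (--b->refcnt > 0)
		return;
	if (b->ubuf.callback) {
		b->ubuf.callback(b);
		b->ubuf.callback = NULL;
	}
}

/* Example callback: record the completed descriptor in the owner's array. */
static void mark_done(struct buf *b)
{
	int *done = b->ubuf.arg;

	done[b->ubuf.desc] = 1;
}
```

Clearing the callback after the call keeps it from running twice for the
same buffer.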

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 include/linux/skbuff.h |   26 ++++++++++++++++++++++++++
 net/core/skbuff.c      |   13 +++++++++++++
 2 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d0ae90a..025de5c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -189,6 +189,18 @@ enum {
 	SKBTX_DRV_NEEDS_SK_REF = 1 << 3,
 };
 
+/*
+ * The callback notifies userspace to release buffers when skb DMA is done in
+ * lower device, the skb last reference should be 0 when calling this.
+ * The desc is used to track userspace buffer index.
+ */
+struct skb_ubuf_info {
+	/* support buffers allocation from userspace */
+	void		(*callback)(struct sk_buff *);
+	void		*arg;
+	size_t		desc;
+};
+
 /* This data is invariant across clones and lives at
  * the end of the header data, ie. at skb->end.
  */
@@ -211,6 +223,10 @@ struct skb_shared_info {
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
 	void *		destructor_arg;
+
+	/* DMA mapping from/to userspace buffers */
+	struct skb_ubuf_info ubuf;
+
 	/* must be last field, see pskb_expand_head() */
 	skb_frag_t	frags[MAX_SKB_FRAGS];
 };
@@ -2261,5 +2277,15 @@ static inline void skb_checksum_none_assert(struct sk_buff *skb)
 }
 
 bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off);
+
+/*
+ *	skb_ubuf - is the buffer from userspace
+ *	@skb: buffer to check
+ */
+static inline int skb_ubuf(const struct sk_buff *skb)
+{
+	return (skb_shinfo(skb)->ubuf.callback != NULL);
+}
+
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7ebeed0..9cbd3fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -210,6 +210,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	shinfo = skb_shinfo(skb);
 	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
 	atomic_set(&shinfo->dataref, 1);
+	shinfo->ubuf.callback = NULL;
+	shinfo->ubuf.arg = NULL;
 	kmemcheck_annotate_variable(shinfo->destructor_arg);
 
 	if (fclone) {
@@ -328,6 +330,14 @@ static void skb_release_data(struct sk_buff *skb)
 				put_page(skb_shinfo(skb)->frags[i].page);
 		}
 
+		/*
+		 * if skb buf is from userspace, we need to notify the caller
+		 * the lower device DMA has done;
+		 */
+		if (skb_shinfo(skb)->ubuf.callback) {
+			skb_shinfo(skb)->ubuf.callback(skb);
+			skb_shinfo(skb)->ubuf.callback = NULL;
+		}
 		if (skb_has_frag_list(skb))
 			skb_drop_fraglist(skb);
 
@@ -480,6 +490,9 @@ bool skb_recycle_check(struct sk_buff *skb, int skb_size)
 	if (irqs_disabled())
 		return false;
 
+	if (skb_ubuf(skb))
+		return false;
+
 	if (skb_is_nonlinear(skb) || skb->fclone != SKB_FCLONE_UNAVAILABLE)
 		return false;
 

^ permalink raw reply related

* [PATCH V4 2/8] netdevice.h: Add a new zerocopy device flag
From: Shirley Ma @ 2011-05-04  7:55 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 include/linux/netdevice.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0808f1e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,16 @@ struct net_device {
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
 
+/*
+ * Bit 31 is for device to map userspace buffers -- zerocopy
+ * Device can set this flag when it supports HIGHDMA.
+ * Device can't recycle this kind of skb buffers.
+ * There are 256 bytes copied, the rest of buffers are mapped.
+ * The userspace callback should only be called when last reference to this skb
+ * is gone.
+ */
+#define NETIF_F_ZEROCOPY	(1 << 31)
+
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
 #define NETIF_F_GSO_MASK	0x00ff0000



^ permalink raw reply related

* [PATCH V4 1/8] sock.h: Add a new sock zero-copy flag
From: Shirley Ma @ 2011-05-04  7:53 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 include/net/sock.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 01810a3..ab09097 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -562,6 +562,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
 	SOCK_FASYNC, /* fasync() active */
 	SOCK_RXQ_OVFL,
+	SOCK_ZEROCOPY, /* buffers from userspace */
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)

^ permalink raw reply related

* [PATCH V4 0/8] macvtap/vhost TX zero-copy support
From: Shirley Ma @ 2011-05-04  7:48 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

This patchset adds support for TX zero-copy between guest and host
kernel through vhost. It significantly reduces CPU utilization on the
local host on which the guest is located (it reduced CPU usage of the
vhost thread by 30-50% in a single-stream test). The patchset is based on
the previous submission and on comments from the community regarding
when/how to release guest kernel buffers. This is the simplest
approach I can think of after comparing with several other solutions.

This patchset has integrated V3 review comments from the community:

1. Add more comments on how to use device ZEROCOPY flag;

2. Change device ZEROCOPY to available bit 31

3. Fix skb header linear allocation when virtio_net GSO is not enabled

This patchset includes:

1/8: Add a new sock zero-copy flag, SOCK_ZEROCOPY;

2/8: Add a new device flag, NETIF_F_ZEROCOPY, for lower-level devices
that support zero-copy;

3/8: Add a new struct skb_ubuf_info in skb_shared_info for the userspace
buffer release callback, invoked when the lower device has finished DMA
for that skb, i.e. when the last reference is gone;

4/8: Add the vhost zero-copy callback, invoked when the last skb
reference is gone; add vhost_zerocopy_signal_used to notify the guest to
release TX skb buffers.

5/8: Add macvtap zero-copy to the lower device when the packet being sent
is greater than 256 bytes, to make sure there is enough room for expanding
the skb head.

6/8: Add Chelsio 10Gb NIC zero-copy feature flag

7/8: Add Intel 10Gb NIC zero-copy feature flag

8/8: Add Emulex 10Gb NIC zero-copy feature flag

The patchset is built against the most recent linux 2.6.39-rc5. It has
passed the netperf/netserver multiple-stream stress test on the above NICs.

Single TCP_STREAM 120-second test results over ixgbe 10Gb NIC:

Message    BW(Mb/s)  qemu-kvm(NumCPU)  vhost-net(NumCPU)  PerfTop irq/s
4K         7408.57   92.1%             22.6%              1229
4K(Orig)   4913.17   118.1%            84.1%              2086
8K         9129.90   89.3%             23.3%              1141
8K(Orig)   7094.55   115.9%            84.7%              2157
16K        9178.81   89.1%             23.3%              1139
16K(Orig)  8927.1    118.7%            83.4%              2262
64K        9171.43   88.4%             24.9%              1253
64K(Orig)  9085.85   115.9%            82.4%              2229

For message sizes less than or equal to 2K, there is a known KVM guest TX
overrun issue. With this zero-copy patch the issue becomes more severe:
guest io_exits have tripled, so the performance is not good.
Once the TX overrun problem has been addressed, I will retest the small
message size performance.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH] usbnet: Transfer of maintainership
From: Oliver Neukum @ 2011-05-04  7:36 UTC (permalink / raw)
  To: Richard Cochran; +Cc: davem, netdev, USB list
In-Reply-To: <20110504054511.GA3362@riccoc20.at.omicron.at>

Am Mittwoch, 4. Mai 2011, 07:45:11 schrieb Richard Cochran:
> On Fri, Apr 29, 2011 at 02:19:04PM +0200, Oliver Neukum wrote:
> 
> >  USB "USBNET" DRIVER FRAMEWORK
> > -M:	David Brownell <dbrownell@users.sourceforge.net>
> > +M:	Oliver Neukum <oneukum@suse.de>
> 
> Oliver,
> 
> We have been looking at usbnet and have a question.
> 
> Usbnet doesn't use either phylib or napi, but I think the reason is
> probably purely historical. Is there a technical reason why phylib or
> napi won't work with usbnet devices?

Hi,

phylib is for historical reasons, although many devices won't
give you much access to the phy.

NAPI however is technically not very meaningful for USB
devices as you cannot poll them. So the central concept
behind NAPI doesn't apply.

	Regards
		Oliver
-- 
- - - 
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg) 
Maxfeldstraße 5                         
90409 Nürnberg 
Germany 
- - - 

^ permalink raw reply

* Re: [PATCH] usbnet: runtime pm: fix out of memory
From: Oliver Neukum @ 2011-05-04  7:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: David Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <BANLkTinZK_qjO7u+ckG_83_paVyMeyrgPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Am Mittwoch, 4. Mai 2011, 08:59:41 schrieb Ming Lei:
> Hi,
> 
> 2011/5/3 Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org>:
> 
> > Do the devices in question use cdc_ether?
> 
> No, the device is smsc95xx, which is compound device and is
> integrated into pandaboard.

OK,

in this case:

Acked-by: Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org>

	Regards
		Oliver
-- 
- - - 
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg) 
Maxfeldstraße 5                         
90409 Nürnberg 
Germany 
- - - 

^ permalink raw reply

* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Subhasish Ghosh @ 2011-05-04  7:13 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
	davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	Netdev-u79uwXL29TY76Z2rM5mHXA, nsekhar-l0cyMroinI0, open list,
	CAN NETWORK DRIVERS, Marc Kleine-Budde, m-watkins-l0cyMroinI0,
	Wolfgang Grandegger
In-Reply-To: <201104271525.28512.arnd-r2nGTMty4D4@public.gmane.org>

> On Wednesday 27 April 2011, Subhasish Ghosh wrote:
>> >
>> > - Use just one value per sysfs file
>> 
>> SG - I felt adding entry for each mbx_id will clutter the sysfs.
>>         Is it ok to do that.
> 
> That is probably not much better either.
> 
> Note also that every sysfs file needs to come with associated
> documentation in Documentation/ABI/*/ to make sure that users
> will know exactly how the file is meant to work. 
> 
> Why do you need to export these values in the first place? Is
> it just for debugging or do you expect all CAN user space
> to look at this?
> 
> If it's for debugging, please don't export the files through sysfs.
> Depending on how useful the data is to regular users, you can
> still export it through a debugfs file in that case, which has
> much less strict rules.
> 
> If the file is instead meant as part of the regular operation of
> the device, it should not be in debugfs but probably be integrated
> into the CAN socket interface, so that users don't need to work
> with two different ways of getting to the device (socket and sysfs).
> 

CAN requires mailbox IDs to be programmed in, but the Socket CAN
subsystem supports only software filtering of the mailbox IDs.

So the mailbox IDs programmed into Socket CAN during initialization
do not propagate into the hardware. Hardware filtering is planned as a
future addition to Socket CAN.

In our case we support hardware filtering. To work around this,
Wolfgang (Socket CAN owner) suggested that we implement
it using sysfs.

These settings are not for debugging, but to program the mailbox IDs
into the hardware.
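
To make the filtering idea concrete, here is a small userspace sketch of
an id/mask match, which is the rule Socket CAN's software filters use.
Whether the PRUSS hardware matches mailboxes the same way is an
assumption; struct mbx_filter, mbx_match() and mbx_lookup() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter entry, shaped like Socket CAN's id/mask pair. */
struct mbx_filter {
	uint32_t id;
	uint32_t mask;
};

/* A frame is accepted when its ID equals the filter ID on all mask bits. */
static bool mbx_match(const struct mbx_filter *f, uint32_t rx_id)
{
	return (rx_id & f->mask) == (f->id & f->mask);
}

/* Return the index of the first matching mailbox, or -1 if none accepts. */
static int mbx_lookup(const struct mbx_filter *tbl, int n, uint32_t rx_id)
{
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (mbx_match(&tbl[i], rx_id))
			return i;
	}
	return -1;
}
```

With a full 11-bit mask (0x7ff) a mailbox accepts exactly one ID; a
shorter mask accepts a range of IDs.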

^ permalink raw reply

* Re: [PATCH] usbnet: runtime pm: fix out of memory
From: Ming Lei @ 2011-05-04  6:59 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: David Miller, netdev, linux-usb
In-Reply-To: <201105030953.44131.oneukum@suse.de>

Hi,

2011/5/3 Oliver Neukum <oneukum@suse.de>:

> Do the devices in question use cdc_ether?

No, the device is smsc95xx, which is compound device and is
integrated into pandaboard.

> The problem I see with this patch is that cdc_ether uses .reset_resume = usbnet_resume
> Therefore the device will not have been reset from the viewpoint of the device, yet
> the device may be open, so the bug would strike again.
>
> It seems to me that this patch is not wrong as such, but incomplete.

Since it is not a cdc device, the patch should be complete. The idea behind
the patch is reasonable: only start to schedule urbs for data packets after the
interface is opened.

Judging from the comment below in usbnet_open, it is even a generic fix for
this kind of issue: we should always put the device into a "known safe" state
before starting communication.

    int usbnet_open (struct net_device *net)
         ......
         // put into "known safe" state
         ......

thanks,
-- 
Ming Lei

^ permalink raw reply

* [PATCH] net: add mac_pton() for parsing MAC address
From: Alexey Dobriyan @ 2011-05-04  6:15 UTC (permalink / raw)
  To: davem; +Cc: netdev

mac_pton() parses a MAC address in the form XX:XX:XX:XX:XX:XX and
only in that form.

mac_pton() doesn't dirty the result until it's sure the string representation is valid.

mac_pton() doesn't care about characters _after_ the last octet;
it's up to the caller to deal with them.

mac_pton() diverges from the 0/-E return value convention.
Target usage:

	if (!mac_pton(str, whatever->mac))
		return -EINVAL;
	/* ->mac being u8 [ETH_ALEN] is filled at this point. */
	/* optionally check str[3 * ETH_ALEN - 1] for termination */


Use mac_pton() in pktgen and netconsole for a start.
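
The semantics described above can be modelled in userspace as follows.
This is a sketch written from the description only, not the function
added to net/core/utils.c; mac_parse() and hex_val() are names made up
for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Returns 1 and fills mac[] on success, 0 (mac[] untouched) otherwise. */
static int mac_parse(const char *s, uint8_t *mac)
{
	uint8_t tmp[ETH_ALEN];
	int i;

	assert(s != NULL && mac != NULL);
	/* XX:XX:XX:XX:XX:XX needs at least 17 characters */
	if (strlen(s) < 3 * ETH_ALEN - 1)
		return 0;
	for (i = 0; i < ETH_ALEN; i++) {
		int hi = hex_val(s[3 * i]);
		int lo = hex_val(s[3 * i + 1]);

		if (hi < 0 || lo < 0)
			return 0;
		/* every octet except the last must be followed by ':' */
		if (i != ETH_ALEN - 1 && s[3 * i + 2] != ':')
			return 0;
		tmp[i] = (uint8_t)(hi << 4 | lo);
	}
	/* commit only after the whole string has been validated */
	memcpy(mac, tmp, ETH_ALEN);
	return 1;
}
```

Parsing into a temporary and copying at the end is one way to keep the
output untouched on failure; characters after the last octet are left
for the caller to check.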

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 drivers/net/netconsole.c |   18 +++------------
 include/linux/if_ether.h |    1 
 net/core/netpoll.c       |   26 -----------------------
 net/core/pktgen.c        |   53 +++++++----------------------------------------
 net/core/utils.c         |   24 +++++++++++++++++++++
 5 files changed, 38 insertions(+), 84 deletions(-)

--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -473,23 +473,13 @@ static ssize_t store_remote_mac(struct netconsole_target *nt,
 		return -EINVAL;
 	}
 
-	for (i = 0; i < ETH_ALEN - 1; i++) {
-		remote_mac[i] = simple_strtoul(p, &p, 16);
-		if (*p != ':')
-			goto invalid;
-		p++;
-	}
-	remote_mac[ETH_ALEN - 1] = simple_strtoul(p, &p, 16);
-	if (*p && (*p != '\n'))
-		goto invalid;
-
+	if (!mac_pton(buf, remote_mac))
+		return -EINVAL;
+	if (buf[3 * ETH_ALEN - 1] && buf[3 * ETH_ALEN - 1] != '\n')
+		return -EINVAL;
 	memcpy(nt->np.remote_mac, remote_mac, ETH_ALEN);
 
 	return strnlen(buf, count);
-
-invalid:
-	printk(KERN_ERR "netconsole: invalid input\n");
-	return -EINVAL;
 }
 
 /*
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -136,6 +136,7 @@ int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr);
 extern struct ctl_table ether_table[];
 #endif
 
+int mac_pton(const char *s, u8 *mac);
 extern ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len);
 
 #endif
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -698,32 +698,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 
 	if (*cur != 0) {
 		/* MAC address */
-		if ((delim = strchr(cur, ':')) == NULL)
+		if (!mac_pton(cur, np->remote_mac))
 			goto parse_failed;
-		*delim = 0;
-		np->remote_mac[0] = simple_strtol(cur, NULL, 16);
-		cur = delim + 1;
-		if ((delim = strchr(cur, ':')) == NULL)
-			goto parse_failed;
-		*delim = 0;
-		np->remote_mac[1] = simple_strtol(cur, NULL, 16);
-		cur = delim + 1;
-		if ((delim = strchr(cur, ':')) == NULL)
-			goto parse_failed;
-		*delim = 0;
-		np->remote_mac[2] = simple_strtol(cur, NULL, 16);
-		cur = delim + 1;
-		if ((delim = strchr(cur, ':')) == NULL)
-			goto parse_failed;
-		*delim = 0;
-		np->remote_mac[3] = simple_strtol(cur, NULL, 16);
-		cur = delim + 1;
-		if ((delim = strchr(cur, ':')) == NULL)
-			goto parse_failed;
-		*delim = 0;
-		np->remote_mac[4] = simple_strtol(cur, NULL, 16);
-		cur = delim + 1;
-		np->remote_mac[5] = simple_strtol(cur, NULL, 16);
 	}
 
 	netpoll_print_options(np);
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1420,11 +1420,6 @@ static ssize_t pktgen_if_write(struct file *file,
 		return count;
 	}
 	if (!strcmp(name, "dst_mac")) {
-		char *v = valstr;
-		unsigned char old_dmac[ETH_ALEN];
-		unsigned char *m = pkt_dev->dst_mac;
-		memcpy(old_dmac, pkt_dev->dst_mac, ETH_ALEN);
-
 		len = strn_len(&user_buffer[i], sizeof(valstr) - 1);
 		if (len < 0)
 			return len;
@@ -1432,35 +1427,16 @@ static ssize_t pktgen_if_write(struct file *file,
 		memset(valstr, 0, sizeof(valstr));
 		if (copy_from_user(valstr, &user_buffer[i], len))
 			return -EFAULT;
-		i += len;
-
-		for (*m = 0; *v && m < pkt_dev->dst_mac + 6; v++) {
-			int value;
-
-			value = hex_to_bin(*v);
-			if (value >= 0)
-				*m = *m * 16 + value;
-
-			if (*v == ':') {
-				m++;
-				*m = 0;
-			}
-		}
 
+		if (!mac_pton(valstr, pkt_dev->dst_mac))
+			return -EINVAL;
 		/* Set up Dest MAC */
-		if (compare_ether_addr(old_dmac, pkt_dev->dst_mac))
-			memcpy(&(pkt_dev->hh[0]), pkt_dev->dst_mac, ETH_ALEN);
+		memcpy(&pkt_dev->hh[0], pkt_dev->dst_mac, ETH_ALEN);
 
-		sprintf(pg_result, "OK: dstmac");
+		sprintf(pg_result, "OK: dstmac %pM", pkt_dev->dst_mac);
 		return count;
 	}
 	if (!strcmp(name, "src_mac")) {
-		char *v = valstr;
-		unsigned char old_smac[ETH_ALEN];
-		unsigned char *m = pkt_dev->src_mac;
-
-		memcpy(old_smac, pkt_dev->src_mac, ETH_ALEN);
-
 		len = strn_len(&user_buffer[i], sizeof(valstr) - 1);
 		if (len < 0)
 			return len;
@@ -1468,26 +1444,13 @@ static ssize_t pktgen_if_write(struct file *file,
 		memset(valstr, 0, sizeof(valstr));
 		if (copy_from_user(valstr, &user_buffer[i], len))
 			return -EFAULT;
-		i += len;
-
-		for (*m = 0; *v && m < pkt_dev->src_mac + 6; v++) {
-			int value;
-
-			value = hex_to_bin(*v);
-			if (value >= 0)
-				*m = *m * 16 + value;
-
-			if (*v == ':') {
-				m++;
-				*m = 0;
-			}
-		}
 
+		if (!mac_pton(valstr, pkt_dev->src_mac))
+			return -EINVAL;
 		/* Set up Src MAC */
-		if (compare_ether_addr(old_smac, pkt_dev->src_mac))
-			memcpy(&(pkt_dev->hh[6]), pkt_dev->src_mac, ETH_ALEN);
+		memcpy(&pkt_dev->hh[6], pkt_dev->src_mac, ETH_ALEN);
 
-		sprintf(pg_result, "OK: srcmac");
+		sprintf(pg_result, "OK: srcmac %pM", pkt_dev->src_mac);
 		return count;
 	}
 
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -296,3 +296,27 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 				csum_unfold(*sum)));
 }
 EXPORT_SYMBOL(inet_proto_csum_replace4);
+
+int mac_pton(const char *s, u8 *mac)
+{
+	int i;
+
+	/* XX:XX:XX:XX:XX:XX */
+	if (strlen(s) < 3 * ETH_ALEN - 1)
+		return 0;
+
+	/* Don't half dirty result. */
+	for (i = 0; i < ETH_ALEN; i++) {
+		if (!strchr("0123456789abcdefABCDEF", s[i * 3]))
+			return 0;
+		if (!strchr("0123456789abcdefABCDEF", s[i * 3 + 1]))
+			return 0;
+		if (i != ETH_ALEN - 1 && s[i * 3 + 2] != ':')
+			return 0;
+	}
+	for (i = 0; i < ETH_ALEN; i++) {
+		mac[i] = (hex_to_bin(s[i * 3]) << 4) | hex_to_bin(s[i * 3 + 1]);
+	}
+	return 1;
+}
+EXPORT_SYMBOL(mac_pton);
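
Editorial sketch, not part of the patch: the two-pass structure of mac_pton() above (validate everything first, write only afterwards) can be tried out in userspace roughly as follows. The names mac_pton_sketch() and hex_val() are hypothetical; hex_val() only stands in for the kernel's hex_to_bin(), and ETH_ALEN is redefined locally as an assumption for building outside the kernel.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* assumption: redefined here for a userspace build */

/* Hypothetical userspace stand-in for the kernel's hex_to_bin(). */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Same contract as the patch: returns 1 on success and 0 on failure,
 * and leaves mac[] untouched when the input is rejected.
 */
static int mac_pton_sketch(const char *s, uint8_t *mac)
{
	int i;

	/* Need at least "XX:XX:XX:XX:XX:XX", i.e. 17 characters. */
	if (strlen(s) < 3 * ETH_ALEN - 1)
		return 0;

	/* Pass 1: validate every digit and separator before writing. */
	for (i = 0; i < ETH_ALEN; i++) {
		if (hex_val(s[i * 3]) < 0 || hex_val(s[i * 3 + 1]) < 0)
			return 0;
		if (i != ETH_ALEN - 1 && s[i * 3 + 2] != ':')
			return 0;
	}

	/* Pass 2: the input is known to be good, so fill in the result. */
	for (i = 0; i < ETH_ALEN; i++)
		mac[i] = (uint8_t)((hex_val(s[i * 3]) << 4) |
				   hex_val(s[i * 3 + 1]));
	return 1;
}
```

Calling mac_pton_sketch("00:1b:21:AB:cd:ef", buf) fills buf and returns 1, while a string with a non-hex digit returns 0 and leaves buf as it was. Like the patch, the sketch inspects only the first 17 characters, so anything after a valid address is not rejected.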


* Re: linux-next: manual merge of the rcu tree with the net tree
From: Paul E. McKenney @ 2011-05-04  6:05 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-next, linux-kernel, Eric Dumazet, David Miller, netdev
In-Reply-To: <20110503142419.2b7d5e23.sfr@canb.auug.org.au>

On Tue, May 03, 2011 at 02:24:19PM +1000, Stephen Rothwell wrote:
> Hi Paul,
> 
> Today's linux-next merge of the rcu tree got a conflict in
> net/core/filter.c between commit 0a14842f5a3c ("net: filter: Just In Time
> compiler for x86-64") from the net tree and commit 10cde158c259
> ("net,rcu: convert call_rcu(sk_filter_release_rcu) to kfree_rcu") from
> the rcu tree.
> 
> The former adds another operation into sk_filter_release_rcu(), so I have
> effectively reverted the rcu tree change for now (by applying the patch
> below as a merge fixup).

Thank you, Stephen!

Looks like I need to get my changes forward-ported to the latest -rc...

						Thanx, Paul

> -- 
> Cheers,
> Stephen Rothwell                    sfr@canb.auug.org.au
> 
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 3 May 2011 14:06:50 +1000
> Subject: [PATCH] Revert "net,rcu: convert call_rcu(sk_filter_release_rcu) to
>  kfree_rcu"
> 
> This reverts commit 10cde158c2591422a2b32a2f560f406b8e69bee6.
> ---
>  include/net/sock.h |    4 +++-
>  net/core/filter.c  |   13 +++++++++++++
>  2 files changed, 16 insertions(+), 1 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 1a2f255..f2046e4 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1180,6 +1180,8 @@ extern void sk_common_release(struct sock *sk);
>  /* Initialise core socket variables */
>  extern void sock_init_data(struct socket *sock, struct sock *sk);
> 
> +extern void sk_filter_release_rcu(struct rcu_head *rcu);
> +
>  /**
>   *	sk_filter_release - release a socket filter
>   *	@fp: filter to remove
> @@ -1190,7 +1192,7 @@ extern void sock_init_data(struct socket *sock, struct sock *sk);
>  static inline void sk_filter_release(struct sk_filter *fp)
>  {
>  	if (atomic_dec_and_test(&fp->refcnt))
> -		kfree_rcu(fp, rcu);
> +		call_rcu(&fp->rcu, sk_filter_release_rcu);
>  }
> 
>  static inline void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 66d403d..0eb8c44 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -572,6 +572,19 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
>  EXPORT_SYMBOL(sk_chk_filter);
> 
>  /**
> + * 	sk_filter_release_rcu - Release a socket filter by rcu_head
> + *	@rcu: rcu_head that contains the sk_filter to free
> + */
> +void sk_filter_release_rcu(struct rcu_head *rcu)
> +{
> +	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
> +
> +	bpf_jit_free(fp);
> +	kfree(fp);
> +}
> +EXPORT_SYMBOL(sk_filter_release_rcu);
> +
> +/**
>   *	sk_attach_filter - attach a socket filter
>   *	@fprog: the filter program
>   *	@sk: the socket to use
> -- 
> 1.7.4.4
> 
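
Editorial sketch, not part of the thread: the revert quoted above restores a full callback because the release path now has to do more than free the filter itself (the patch adds a bpf_jit_free() call before kfree()). The userspace code below only illustrates that callback shape, recovering the enclosing object from its embedded head with container_of(). All names in it (struct cb_head, struct filter, filter_release_cb, jit_frees) are hypothetical stand-ins, there is no grace period, and free() replaces the kernel calls.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

/* Hypothetical stand-in for struct sk_filter. */
struct filter {
	int refcnt;
	void *jit_image;	/* extra resource owned by the filter */
	struct cb_head rcu;	/* embedded head handed to the callback */
};

/* Local definition, assumed equivalent to the kernel macro's intent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int jit_frees;	/* counts extra-resource releases, for checking */

/*
 * Callback in the style of sk_filter_release_rcu(): map the embedded
 * head back to its filter, release the extra resource, then the filter.
 */
static void filter_release_cb(struct cb_head *head)
{
	struct filter *fp = container_of(head, struct filter, rcu);

	free(fp->jit_image);
	jit_frees++;
	free(fp);
}
```

A helper that frees only the enclosing object has no place to run the extra release step, which is presumably why the merge fixup goes back to an explicit callback.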


* Re: [PATCH] usbnet: Transfer of maintainership
From: Richard Cochran @ 2011-05-04  5:45 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA,
	USB list
In-Reply-To: <201104291419.04498.oneukum-l3A5Bk7waGM@public.gmane.org>

On Fri, Apr 29, 2011 at 02:19:04PM +0200, Oliver Neukum wrote:

>  USB "USBNET" DRIVER FRAMEWORK
> -M:	David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
> +M:	Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org>

Oliver,

We have been looking at usbnet and have a question.

Usbnet doesn't use either phylib or napi, but I think the reason is
probably purely historical. Is there a technical reason why phylib or
napi won't work with usbnet devices?

If not, I would like to convert them in the near future.

Thanks,

Richard
