Netdev List

* Re: [PATCH] vxlan: Allow setting unicast address to the group address
From: Cong Wang @ 2013-04-11  4:05 UTC (permalink / raw)
  To: netdev
In-Reply-To: <87txnen4y4.wl%atzm@stratosphere.co.jp>

On Wed, 10 Apr 2013 at 08:52 GMT, Atzm Watanabe <atzm@stratosphere.co.jp> wrote:
> This patch allows a unicast address to be set as the VXLAN group address,
> so that VXLAN can be used as a peer-to-peer tunnel without multicast.
>

Then GROUP is confusing, please pick another name and attribute.

Thanks.

* Re: [Patch net-next] vxlan: revert "vxlan: Bypass encapsulation if the destination is local"
From: Sridhar Samudrala @ 2013-04-11  4:53 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller
In-Reply-To: <1365646215.25993.3.camel@cr0>

On 4/10/2013 7:10 PM, Cong Wang wrote:
>> - when source and destination endpoints belonging to different vni's
>>    are on 2 different bridges on the same host. encap bypass is done
>>    in this scenario by checking if rt_flags has RTCF_LOCAL set. I think
>>    you must be hitting this path and the following patch should fix
>>    it by only doing bypass if the source and dest devices belong to
>>    the same net. Can you try it and see if it fixes your tests?
> I just tested it, unfortunately it doesn't work, the bug still exists.
>
> If you need any other info, please let me know.
So does that mean you are hitting the 'if' condition that does the encap
bypass even after the net_eq() check? Do the tests pass if you comment out
the 'if' block?

Can you share your test config/scripts so that I can try out your setup, if
it is not too complicated?

Thanks
Sridhar

* Re: [Patch net-next] vxlan: revert "vxlan: Bypass encapsulation if the destination is local"
From: Cong Wang @ 2013-04-11  5:55 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: netdev, David S. Miller
In-Reply-To: <516641CF.4020101@us.ibm.com>



----- Original Message -----
> On 4/10/2013 7:10 PM, Cong Wang wrote:
> >> - when source and destination endpoints belonging to different vni's
> >>    are on 2 different bridges on the same host. encap bypass is done
> >>    in this scenario by checking if rt_flags has RTCF_LOCAL set. I think
> >>    you must be hitting this path and the following patch should fix
> >>    it by only doing bypass if the source and dest devices belong to
> >>    the same net. Can you try it and see if it fixes your tests?
> > I just tested it, unfortunately it doesn't work, the bug still exists.
> >
> > If you need any other info, please let me know.
> So does that mean you are hitting the 'if' condition that does the encap
> bypass even after the net_eq() check? Do the tests pass if you comment out
> the 'if' block?

Yes, after adding a printk inside the 'if' block, I got:

[   71.456329] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
[   71.596551] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
[   72.028574] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
[   72.436384] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
[   73.028576] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
[   73.185134] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
[   73.436582] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
[   74.184251] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0

It seems the dst dev is the device that vxlan0 is set up on, so
there is no way to tell whether the packet is destined for a different netns
on the same host; at least I could not find such an RTCF_* flag.

I'd propose to revert that commit partially:

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9a64715..0847564 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1012,18 +1012,6 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
                goto tx_error;
        }
 
-       /* Bypass encapsulation if the destination is local */
-       if (rt->rt_flags & RTCF_LOCAL) {
-               struct vxlan_dev *dst_vxlan;
-
-               ip_rt_put(rt);
-               dst_vxlan = vxlan_find_vni(dev_net(dev), vni);
-               if (!dst_vxlan)
-                       goto tx_error;
-               vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-               return NETDEV_TX_OK;
-       }
-
        memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
        IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
                              IPSKB_REROUTED);


> 
> Can you share your test config/scripts so that I can try out your setup, if
> it is not too complicated?
> 


Sure, here is what I did:

1) create a veth pair: veth0 and veth1
2) create a new netns
3) move veth1 to the new netns
4) set up vxlan0 on veth0
5) set up vxlan0 on veth1 in the new netns
6) ping the remote, that is, the IP of vxlan0 in the new netns

* Re: [PATCH iproute2] vxlan: Allow setting unicast address to the group address
From: Atzm Watanabe @ 2013-04-11  6:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20130410094639.748d022f@nehalam.linuxnetplumber.net>


At Wed, 10 Apr 2013 09:46:39 -0700,
Stephen Hemminger wrote:
> 
> On Wed, 10 Apr 2013 17:52:05 +0900
> Atzm Watanabe <atzm@stratosphere.co.jp> wrote:
> 
> > This patch allows a unicast address to be set as the VXLAN group address,
> > so that VXLAN can be used as a peer-to-peer tunnel without multicast.
> > 
> > Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
> > ---
> >  ip/iplink_vxlan.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
> > index 1025326..cfe324c 100644
> > --- a/ip/iplink_vxlan.c
> > +++ b/ip/iplink_vxlan.c
> > @@ -66,9 +66,6 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
> >  		} else if (!matches(*argv, "group")) {
> >  			NEXT_ARG();
> >  			gaddr = get_addr32(*argv);
> > -
> > -			if (!IN_MULTICAST(ntohl(gaddr)))
> > -				invarg("invald group address", *argv);
> >  		} else if (!matches(*argv, "local")) {
> >  			NEXT_ARG();
> >  			if (strcmp(*argv, "any"))
> 
> Could you use another name or argument to express the different intended behavior?

OK, I agree.

But simply adding another name and argument would also mean that a VXLAN
interface could have more than one "last resort" destination.
So I'll try to fix it as follows:

   1) Add "remote" alongside "group".
   2) Specifying both arguments at the same time will fail.

If you have any better ideas, please let me know.

Thanks.

* Re: [PATCH] vxlan: Allow setting unicast address to the group address
From: Atzm Watanabe @ 2013-04-11  6:27 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <kk5cpc$f14$1@ger.gmane.org>


At Thu, 11 Apr 2013 04:05:01 +0000 (UTC),
Cong Wang wrote:
> 
> On Wed, 10 Apr 2013 at 08:52 GMT, Atzm Watanabe <atzm@stratosphere.co.jp> wrote:
> > This patch allows a unicast address to be set as the VXLAN group address,
> > so that VXLAN can be used as a peer-to-peer tunnel without multicast.
> >
> 
> Then GROUP is confusing, please pick another name and attribute.

OK, I agree.

But simply adding another name and attribute would also mean that a VXLAN
interface could have more than one "last resort" destination.
So I'll try to fix it as follows:

  1) Replace "gaddr" with "daddr" in struct vxlan_dev.
  2) Add IFLA_VXLAN_REMOTE alongside IFLA_VXLAN_GROUP.
  3) If IFLA_VXLAN_REMOTE and IFLA_VXLAN_GROUP are both specified at the
     same time, vxlan_validate() will return an error.
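
The rule in the steps above can be sketched in userspace as follows. This
is only an illustration, not the kernel's vxlan_validate(): the helper name
check_vxlan_dst() and its string arguments are hypothetical, and the
assumption that "remote" accepts unicast addresses only (while "group"
stays multicast-only) is mine, not something the plan states.

```c
#include <arpa/inet.h>   /* inet_pton(), ntohl() */
#include <netinet/in.h>  /* IN_MULTICAST(), struct in_addr */
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: "group" and "remote" are dotted-quad strings, or
 * NULL when the attribute was not given.  Returns 0 if the combination
 * is acceptable and -EINVAL otherwise.
 */
int check_vxlan_dst(const char *group, const char *remote)
{
	struct in_addr addr;

	/* The two attributes are mutually exclusive. */
	if (group && remote)
		return -EINVAL;

	if (group) {
		if (inet_pton(AF_INET, group, &addr) != 1)
			return -EINVAL;
		/* Assumption: "group" keeps its multicast-only meaning. */
		if (!IN_MULTICAST(ntohl(addr.s_addr)))
			return -EINVAL;
	}

	if (remote) {
		if (inet_pton(AF_INET, remote, &addr) != 1)
			return -EINVAL;
		/* Assumption: "remote" is for unicast peers only. */
		if (IN_MULTICAST(ntohl(addr.s_addr)))
			return -EINVAL;
	}

	return 0;
}
```

With this shape, a device still ends up with exactly one default
destination, whichever of the two attributes was used to set it.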

If you have any better ideas, please let me know.

Thanks.

* Re: [Patch net-next] vxlan: revert "vxlan: Bypass encapsulation if the destination is local"
From: Sridhar Samudrala @ 2013-04-11  6:33 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller
In-Reply-To: <1937058599.2214531.1365659704193.JavaMail.root@redhat.com>

On 4/10/2013 10:55 PM, Cong Wang wrote:
>
> ----- Original Message -----
>> On 4/10/2013 7:10 PM, Cong Wang wrote:
>>>> - when source and destination endpoints belonging to different vni's
>>>>     are on 2 different bridges on the same host. encap bypass is done
>>>>     in this scenario by checking if rt_flags has RTCF_LOCAL set. I think
>>>>     you must be hitting this path and the following patch should fix
>>>>     it by only doing bypass if the source and dest devices belong to
>>>>     the same net. Can you try it and see if it fixes your tests?
>>> I just tested it, unfortunately it doesn't work, the bug still exists.
>>>
>>> If you need any other info, please let me know.
>> So does that mean you are hitting the 'if' condition that does the encap
>> bypass even after the net_eq() check? Do the tests pass if you comment out
>> the 'if' block?
> Yes, after adding a printk inside the 'if' block, I got:
>
> [   71.456329] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   71.596551] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
> [   72.028574] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   72.436384] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
> [   73.028576] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   73.185134] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
> [   73.436582] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth1
> [   74.184251] vxlan: dev: vxlan0, dst: 224.8.8.8, dst dev: veth0
>
> It seems the dst dev is the device that vxlan0 is set up on, so
> there is no way to tell whether the packet is destined for a different netns
> on the same host; at least I could not find such an RTCF_* flag.
>
> I'd propose to revert that commit partially:
I think we should spend some more time to address this issue correctly.
Bypassing encap gives a significant performance improvement when the
destination endpoint is on the same host.
So is vxlan_encap_bypass() getting called, or are you hitting goto tx_error?

>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 9a64715..0847564 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1012,18 +1012,6 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>                  goto tx_error;
>          }
>
> -       /* Bypass encapsulation if the destination is local */
> -       if (rt->rt_flags & RTCF_LOCAL) {
> -               struct vxlan_dev *dst_vxlan;
> -
> -               ip_rt_put(rt);
> -               dst_vxlan = vxlan_find_vni(dev_net(dev), vni);
> -               if (!dst_vxlan)
> -                       goto tx_error;
> -               vxlan_encap_bypass(skb, vxlan, dst_vxlan);
> -               return NETDEV_TX_OK;
> -       }
> -
>          memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>          IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
>                                IPSKB_REROUTED);
>
>
>> Can you share your test config/scripts so that I can try out your setup, if
>> it is not too complicated?
>>
>
> Sure, here is what I did:
>
> 1) create a veth pair: veth0 and veth1
> 2) create a new netns
> 3) move veth1 to the new netns
> 4) set up vxlan0 on veth0
> 5) set up vxlan0 on veth1 in the new netns
> 6) ping the remote, that is, the IP of vxlan0 in the new netns
>
I am not all that familiar with creating netns and veth interfaces.
I guess we can do all this via the 'ip' command.
Can you give me a script with the exact commands to do this setup?

Thanks
Sridhar

* [net-next PATCH] xen-netback: switch to use skb_partial_csum_set()
From: Jason Wang @ 2013-04-11  6:35 UTC (permalink / raw)
  To: ian.campbell, netdev, linux-kernel, davem; +Cc: Jason Wang

Switch to skb_partial_csum_set() to simplify the code.

Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Note:
- Compile test only.
---
 drivers/net/xen-netback/netback.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 83905a9..70631f0 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1156,7 +1156,6 @@ static int netbk_set_skb_gso(struct xenvif *vif,
 static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 {
 	struct iphdr *iph;
-	unsigned char *th;
 	int err = -EPROTO;
 	int recalculate_partial_csum = 0;
 
@@ -1180,28 +1179,26 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 		goto out;
 
 	iph = (void *)skb->data;
-	th = skb->data + 4 * iph->ihl;
-	if (th >= skb_tail_pointer(skb))
-		goto out;
-
-	skb_set_transport_header(skb, 4 * iph->ihl);
-	skb->csum_start = th - skb->head;
 	switch (iph->protocol) {
 	case IPPROTO_TCP:
-		skb->csum_offset = offsetof(struct tcphdr, check);
+		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+					  offsetof(struct tcphdr, check)))
+			goto out;
 
 		if (recalculate_partial_csum) {
-			struct tcphdr *tcph = (struct tcphdr *)th;
+			struct tcphdr *tcph = tcp_hdr(skb);
 			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
 							 skb->len - iph->ihl*4,
 							 IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
-		skb->csum_offset = offsetof(struct udphdr, check);
+		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+					  offsetof(struct udphdr, check)))
+			goto out;
 
 		if (recalculate_partial_csum) {
-			struct udphdr *udph = (struct udphdr *)th;
+			struct udphdr *udph = udp_hdr(skb);
 			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
 							 skb->len - iph->ihl*4,
 							 IPPROTO_UDP, 0);
@@ -1215,9 +1212,6 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 		goto out;
 	}
 
-	if ((th + skb->csum_offset + 2) > skb_tail_pointer(skb))
-		goto out;
-
 	err = 0;
 
 out:
-- 
1.7.1
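
For readers unfamiliar with the helper, here is a simplified userspace model
of the bounds checks that the patch above delegates to
skb_partial_csum_set(). It is a sketch, not the kernel function: struct
toy_skb and toy_partial_csum_set() are hypothetical names, and only the
fields touched by the patch are modelled. The transport-header update is
included because the patch relies on tcp_hdr()/udp_hdr() being valid after
the call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the few sk_buff fields involved (hypothetical type). */
struct toy_skb {
	size_t headroom;      /* bytes between the buffer head and data */
	size_t headlen;       /* bytes of linear data starting at data */
	size_t csum_start;    /* offset from head where checksumming starts */
	size_t csum_offset;   /* offset from csum_start of the checksum field */
	size_t transport_off; /* offset from data of the L4 header */
};

/*
 * Record where the checksum lives, but only if both the L4 header start
 * and the 16-bit checksum field lie within the linear data.
 */
bool toy_partial_csum_set(struct toy_skb *skb, uint16_t start, uint16_t off)
{
	/* The L4 header must start inside the linear data ... */
	if (start > skb->headlen)
		return false;
	/* ... and the two-byte checksum field must fit as well. */
	if ((size_t)start + off + 2 > skb->headlen)
		return false;

	skb->csum_start = skb->headroom + start;
	skb->csum_offset = off;
	skb->transport_off = start;
	return true;
}
```

The two checks correspond to the two open-coded tests against
skb_tail_pointer() that the patch removes from checksum_setup().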

* Re: [net-next 4/5] netback: set transport header before passing it to kernel
From: Jason Wang @ 2013-04-11  6:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, Eric Dumazet
In-Reply-To: <1365600803.27868.60.camel@zakaz.uk.xensource.com>

On 04/10/2013 09:33 PM, Ian Campbell wrote:
> On Tue, 2013-03-26 at 06:19 +0000, Jason Wang wrote:
>> Currently, for packets received from netback, the kernel just resets the
>> transport header in netif_receive_skb() before doing the header check, which
>> pretends there is no l4 header. This is suboptimal for precise packet length
>> estimation (introduced in 1def9238: net_sched: more precise pkt_len
>> computation), which needs the correct l4 header for gso packets.
>>
>> This patch reuses the header probed by netback for partial checksum packets
>> and tries skb_flow_dissect() for the other cases; if both fail, it just
>> pretends there is no l4 header.
>>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Ian Campbell <ian.campbell@citrix.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/net/xen-netback/netback.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index aa28550..fc8faa7 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -39,6 +39,7 @@
>>  #include <linux/udp.h>
>>  
>>  #include <net/tcp.h>
>> +#include <net/flow_keys.h>
>>  
>>  #include <xen/xen.h>
>>  #include <xen/events.h>
>> @@ -1184,6 +1185,7 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>>  	if (th >= skb_tail_pointer(skb))
>>  		goto out;
>>  
>> +	skb_set_transport_header(skb, 4 * iph->ihl);
>>  	skb->csum_start = th - skb->head;
> Should the use of th here (and perhaps above) be replaced with
> skb_transport_header() too?

Yes, and furthermore it looks like we can use skb_partial_csum_set() here;
I will send a patch.

Thanks
>>  	switch (iph->protocol) {
>>  	case IPPROTO_TCP:
>> @@ -1495,6 +1497,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
>>  
>>  		skb->dev      = vif->dev;
>>  		skb->protocol = eth_type_trans(skb, skb->dev);
>> +		skb_reset_network_header(skb);
>>  
>>  		if (checksum_setup(vif, skb)) {
>>  			netdev_dbg(vif->dev,
>> @@ -1503,6 +1506,15 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
>>  			continue;
>>  		}
>>  
>> +		if (!skb_transport_header_was_set(skb)) {
>> +			struct flow_keys keys;
>> +
>> +			if (skb_flow_dissect(skb, &keys))
>> +				skb_set_transport_header(skb, keys.thoff);
>> +			else
>> +				skb_reset_transport_header(skb);
>> +		}
>> +
>>  		vif->dev->stats.rx_bytes += skb->len;
>>  		vif->dev->stats.rx_packets++;
>>  

* Re: [PATCH 1/2 v5] usbnet: allow status interrupt URB to always be active
From: Oliver Neukum @ 2013-04-11  6:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Dan Williams, Elina Pasheva, Network Development, linux-usb,
	Rory Filer, Phil Sutter
In-Reply-To: <CACVXFVNKSszw+3VnM3hwepbTHh3gvjC+pG2SL_OsVm+1SqrNzQ@mail.gmail.com>

On Thursday 11 April 2013 10:31:31 Ming Lei wrote:
 
> 'mem_flags' isn't needed any more since we can apply allocation
> of GFP_NOIO automatically in resume path now, and you can always
> use GFP_KERNEL safely. Considering that it is an API, please don't
> introduce it.

The automatic system goes a long way, but there are corner cases, for example
work queues, which still need mem_flags.

	Regards
		Oliver

* [PATCH] vhost_net: remove tx polling state
From: Jason Wang @ 2013-04-11  6:50 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there's no need to duplicate the work with an extra
vhost_net_polling_state. This patch removes it and makes the code simpler.

This patch also removes all the tx starting/stopping code in the tx path,
according to Michael's suggestion.

Netperf shows almost the same results in the stream tests, but improvements
in the TCP_RR tests (both zerocopy and copy), especially in the low-load cases.

Tested between a multiqueue kvm guest and an external host with two directly
connected 82599s.

zerocopy disabled:

sessions | transaction rate (before/after/improvement) | normalized (before/after/improvement)
1        | 9510.24/11727.29/+23.3%                     | 693.54/887.68/+28.0%
25       | 192931.50/241729.87/+25.3%                  | 2376.80/2771.70/+16.6%
50       | 277634.64/291905.76/+5%                     | 3118.36/3230.11/+3.6%

zerocopy enabled:

sessions | transaction rate (before/after/improvement) | normalized (before/after/improvement)
1        | 7318.33/11929.76/+63.0%                     | 521.86/843.30/+61.6%
25       | 167264.88/242422.15/+44.9%                  | 2181.60/2788.16/+27.8%
50       | 272181.02/294347.04/+8.1%                   | 3071.56/3257.85/+6.1%

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   |   74 ++++---------------------------------------------
 drivers/vhost/vhost.c |    3 ++
 2 files changed, 9 insertions(+), 68 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ec6fb3f..87c216c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -64,20 +64,10 @@ enum {
 	VHOST_NET_VQ_MAX = 2,
 };
 
-enum vhost_net_poll_state {
-	VHOST_NET_POLL_DISABLED = 0,
-	VHOST_NET_POLL_STARTED = 1,
-	VHOST_NET_POLL_STOPPED = 2,
-};
-
 struct vhost_net {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
 	struct vhost_poll poll[VHOST_NET_VQ_MAX];
-	/* Tells us whether we are polling a socket for TX.
-	 * We only do this when socket buffer fills up.
-	 * Protected by tx vq lock. */
-	enum vhost_net_poll_state tx_poll_state;
 	/* Number of TX recently submitted.
 	 * Protected by tx vq lock. */
 	unsigned tx_packets;
@@ -155,28 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
 	}
 }
 
-/* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)
-{
-	if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
-		return;
-	vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
-	net->tx_poll_state = VHOST_NET_POLL_STOPPED;
-}
-
-/* Caller must have TX VQ lock */
-static int tx_poll_start(struct vhost_net *net, struct socket *sock)
-{
-	int ret;
-
-	if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
-		return 0;
-	ret = vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
-	if (!ret)
-		net->tx_poll_state = VHOST_NET_POLL_STARTED;
-	return ret;
-}
-
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -242,7 +210,7 @@ static void handle_tx(struct vhost_net *net)
 		.msg_flags = MSG_DONTWAIT,
 	};
 	size_t len, total_len = 0;
-	int err, wmem;
+	int err;
 	size_t hdr_size;
 	struct socket *sock;
 	struct vhost_ubuf_ref *uninitialized_var(ubufs);
@@ -253,19 +221,9 @@ static void handle_tx(struct vhost_net *net)
 	if (!sock)
 		return;
 
-	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
-	if (wmem >= sock->sk->sk_sndbuf) {
-		mutex_lock(&vq->mutex);
-		tx_poll_start(net, sock);
-		mutex_unlock(&vq->mutex);
-		return;
-	}
-
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(&net->dev, vq);
 
-	if (wmem < sock->sk->sk_sndbuf / 2)
-		tx_poll_stop(net);
 	hdr_size = vq->vhost_hlen;
 	zcopy = vq->ubufs;
 
@@ -285,23 +243,14 @@ static void handle_tx(struct vhost_net *net)
 		if (head == vq->num) {
 			int num_pends;
 
-			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
-			if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
-				tx_poll_start(net, sock);
-				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
-				break;
-			}
 			/* If more outstanding DMAs, queue the work.
 			 * Handle upend_idx wrap around
 			 */
 			num_pends = likely(vq->upend_idx >= vq->done_idx) ?
 				    (vq->upend_idx - vq->done_idx) :
 				    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
-			if (unlikely(num_pends > VHOST_MAX_PEND)) {
-				tx_poll_start(net, sock);
-				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
+			if (unlikely(num_pends > VHOST_MAX_PEND))
 				break;
-			}
 			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
 				vhost_disable_notify(&net->dev, vq);
 				continue;
@@ -364,8 +313,6 @@ static void handle_tx(struct vhost_net *net)
 					UIO_MAXIOV;
 			}
 			vhost_discard_vq_desc(vq, 1);
-			if (err == -EAGAIN || err == -ENOBUFS)
-				tx_poll_start(net, sock);
 			break;
 		}
 		if (err != len)
@@ -628,7 +575,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
-	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
 
 	f->private_data = n;
 
@@ -638,32 +584,24 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 static void vhost_net_disable_vq(struct vhost_net *n,
 				 struct vhost_virtqueue *vq)
 {
+	struct vhost_poll *poll = n->poll + (vq - n->vqs);
 	if (!vq->private_data)
 		return;
-	if (vq == n->vqs + VHOST_NET_VQ_TX) {
-		tx_poll_stop(n);
-		n->tx_poll_state = VHOST_NET_POLL_DISABLED;
-	} else
-		vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
+	vhost_poll_stop(poll);
 }
 
 static int vhost_net_enable_vq(struct vhost_net *n,
 				struct vhost_virtqueue *vq)
 {
+	struct vhost_poll *poll = n->poll + (vq - n->vqs);
 	struct socket *sock;
-	int ret;
 
 	sock = rcu_dereference_protected(vq->private_data,
 					 lockdep_is_held(&vq->mutex));
 	if (!sock)
 		return 0;
-	if (vq == n->vqs + VHOST_NET_VQ_TX) {
-		n->tx_poll_state = VHOST_NET_POLL_STOPPED;
-		ret = tx_poll_start(n, sock);
-	} else
-		ret = vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
 
-	return ret;
+	return vhost_poll_start(poll, sock->file);
 }
 
 static struct socket *vhost_net_stop_vq(struct vhost_net *n,
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9759249..4eecdb8 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -89,6 +89,9 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file)
 	unsigned long mask;
 	int ret = 0;
 
+	if (poll->wqh)
+		return 0;
+
 	mask = file->f_op->poll(file, &poll->table);
 	if (mask)
 		vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
-- 
1.7.1
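
The idea behind dropping the separate tx_poll_state can be illustrated with
a small userspace model. This is a sketch under stated assumptions, not the
vhost code: struct toy_poll, toy_poll_start() and toy_poll_stop() are
hypothetical names, and the registration counter exists only to make the
behaviour observable. The point is that the wqh pointer alone says whether
polling is active, so starting twice is harmless.

```c
#include <stddef.h>

/* Toy model (hypothetical type); the real code lives in drivers/vhost. */
struct toy_poll {
	void *wqh;         /* non-NULL while registered on a wait queue */
	int registrations; /* how many times we really registered */
};

/* Start polling; a no-op when already started, like the new wqh check. */
int toy_poll_start(struct toy_poll *poll, void *waitqueue)
{
	if (poll->wqh)
		return 0;

	poll->wqh = waitqueue;
	poll->registrations++;
	return 0;
}

/* Stop polling; safe to call when polling was never started. */
void toy_poll_stop(struct toy_poll *poll)
{
	if (poll->wqh)
		poll->wqh = NULL;
}
```

Because start and stop both key off the same pointer, callers such as the
enable/disable paths no longer need to distinguish TX from RX.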

* Re: [PATCH v2] net: mv643xx_eth: use managed devm_kzalloc
From: Sebastian Hesselbarth @ 2013-04-11  6:53 UTC (permalink / raw)
  To: David Miller
  Cc: buytenh, andrew, jason, florian, sergei.shtylyov, netdev,
	linux-kernel
In-Reply-To: <20130410.233933.2109788843089649198.davem@davemloft.net>

On 04/11/2013 05:39 AM, David Miller wrote:
> From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> Date: Wed, 10 Apr 2013 22:42:07 +0200
>
>> This patch moves the shared private data kzalloc to managed devm_kzalloc and
>> cleans up the now-unnecessary kfree and error handling.
>>
>> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>
> This doesn't apply cleanly to the net-next tree.

Yeah. I sent two single patches for mv643xx_eth, while they should
have been sent together in one patch set. I'll prepare a cover letter
and resend both in one patch set.

Sebastian

* Re: [PATCH] net: usb: active URB submitted multiple times
From: Petko Manolov @ 2013-04-11  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, Sarah Sharp
In-Reply-To: <20130410.232241.1038793023507666278.davem@davemloft.net>

From: Petko Manolov <petkan@nucleusys.com>

(For inclusion in 3.10, diff against latest net-next.)

The Pegasus driver used a single callback for both sync and async control URBs.
Special flags were employed to distinguish between the two, but due to flawed
logic (as Sarah Sharp spotted) this did not always work.  As a result of this
change, [get|set]_registers() are now much simpler.  The async write is also
leaner and no longer uses a single, statically allocated usb_ctrlrequest,
which was another potential race when submitting URBs asynchronously.
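
The ownership pattern described above (one request structure per
submission, released by the completion callback) can be modelled in plain
userspace C. This is only a sketch: the toy_* types and functions are
hypothetical stand-ins rather than the USB core API, and the live-request
counter exists purely so the behaviour can be checked.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for usb_ctrlrequest and struct urb. */
struct toy_ctrlrequest {
	uint16_t index;
	uint16_t length;
};

struct toy_urb {
	void *context;                         /* owns the request */
	void (*complete)(struct toy_urb *urb); /* completion callback */
};

int toy_live_requests; /* bookkeeping for the example only */

/* The callback releases everything the submission allocated. */
static void toy_async_callback(struct toy_urb *urb)
{
	free(urb->context);
	free(urb);
	toy_live_requests--;
}

/* Each submission gets its own request, so two in flight never alias. */
struct toy_urb *toy_submit_async(uint16_t index, uint16_t length)
{
	struct toy_ctrlrequest *req = malloc(sizeof(*req));
	struct toy_urb *urb = malloc(sizeof(*urb));

	if (!req || !urb) {
		free(req);
		free(urb);
		return NULL;
	}

	req->index = index;
	req->length = length;
	urb->context = req;
	urb->complete = toy_async_callback;
	toy_live_requests++;
	return urb;
}
```

With a shared, statically allocated request, a second submission could
overwrite the fields of the first while it is still in flight; per-request
allocation avoids that without extra locking.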

The socket buffer pool for the receive path is now gone.  Its existence
didn't make much difference (performance-wise) and the code is better off
without the spinlocks protecting it.

Largely duplicated code in the routines reading and writing MII registers is
now packed into __mii_op().

Also add the URL of the public pegasus git repository.

Signed-off-by: Petko Manolov <petkan@nucleusys.com>
---
  MAINTAINERS               |    6 +-
  drivers/net/usb/pegasus.c |  601 +++++++++++++--------------------------------
  drivers/net/usb/pegasus.h |   10 +-
  3 files changed, 181 insertions(+), 436 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c39bdc3..863d8cb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8358,9 +8358,10 @@ S:	Maintained
  F:	drivers/usb/serial/option.c

  USB PEGASUS DRIVER
-M:	Petko Manolov <petkan@users.sourceforge.net>
+M:	Petko Manolov <petkan@nucleusys.com>
  L:	linux-usb@vger.kernel.org
  L:	netdev@vger.kernel.org
+T:	git git://git.code.sf.net/p/pegasus2/git
  W:	http://pegasus2.sourceforge.net/
  S:	Maintained
  F:	drivers/net/usb/pegasus.*
@@ -8380,9 +8381,10 @@ S:	Supported
  F:	drivers/usb/class/usblp.c

  USB RTL8150 DRIVER
-M:	Petko Manolov <petkan@users.sourceforge.net>
+M:	Petko Manolov <petkan@nucleusys.com>
  L:	linux-usb@vger.kernel.org
  L:	netdev@vger.kernel.org
+T:	git git://git.code.sf.net/p/pegasus2/git
  W:	http://pegasus2.sourceforge.net/
  S:	Maintained
  F:	drivers/net/usb/rtl8150.c
diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index 73051d1..501db20 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -1,5 +1,5 @@
  /*
- *  Copyright (c) 1999-2005 Petko Manolov (petkan@users.sourceforge.net)
+ *  Copyright (c) 1999-2013 Petko Manolov (petkan@nucleusys.com)
   *
   * This program is free software; you can redistribute it and/or modify
   * it under the terms of the GNU General Public License version 2 as
@@ -26,6 +26,9 @@
   *		v0.5.1	ethtool support added
   *		v0.5.5	rx socket buffers are in a pool and the their allocation
   *			is out of the interrupt routine.
+ *		...
+ *		v0.9.1	simplified [get|set]_register(s), async update registers
+ *			logic revisited, receive skb_pool removed.
   */

  #include <linux/sched.h>
@@ -45,8 +48,8 @@
  /*
   * Version Information
   */
-#define DRIVER_VERSION "v0.6.14 (2006/09/27)"
-#define DRIVER_AUTHOR "Petko Manolov <petkan@users.sourceforge.net>"
+#define DRIVER_VERSION "v0.9.3 (2013/04/09)"
+#define DRIVER_AUTHOR "Petko Manolov <petkan@nucleusys.com>"
  #define DRIVER_DESC "Pegasus/Pegasus II USB Ethernet driver"

  static const char driver_name[] = "pegasus";
@@ -108,299 +111,154 @@ MODULE_PARM_DESC(msg_level, "Override default message level");
  MODULE_DEVICE_TABLE(usb, pegasus_ids);
  static const struct net_device_ops pegasus_netdev_ops;

-static int update_eth_regs_async(pegasus_t *);
-/* Aargh!!! I _really_ hate such tweaks */
-static void ctrl_callback(struct urb *urb)
+static void async_ctrl_callback(struct urb *urb)
  {
-	pegasus_t *pegasus = urb->context;
+	struct usb_ctrlrequest *req = (struct usb_ctrlrequest *)urb->context;
  	int status = urb->status;

-	if (!pegasus)
-		return;
-
-	switch (status) {
-	case 0:
-		if (pegasus->flags & ETH_REGS_CHANGE) {
-			pegasus->flags &= ~ETH_REGS_CHANGE;
-			pegasus->flags |= ETH_REGS_CHANGED;
-			update_eth_regs_async(pegasus);
-			return;
-		}
-		break;
-	case -EINPROGRESS:
-		return;
-	case -ENOENT:
-		break;
-	default:
-		if (net_ratelimit())
-			netif_dbg(pegasus, drv, pegasus->net,
-				  "%s, status %d\n", __func__, status);
-		break;
-	}
-	pegasus->flags &= ~ETH_REGS_CHANGED;
-	wake_up(&pegasus->ctrl_wait);
+	if (status < 0)
+		dev_dbg(&urb->dev->dev, "%s failed with %d", __func__, status);
+	kfree(req);
+	usb_free_urb(urb);
  }

-static int get_registers(pegasus_t *pegasus, __u16 indx, __u16 size,
+static int get_registers(pegasus_t * pegasus, __u16 indx, __u16 size,
  			 void *data)
  {
  	int ret;
-	char *buffer;
-	DECLARE_WAITQUEUE(wait, current);
-
-	buffer = kmalloc(size, GFP_KERNEL);
-	if (!buffer)
-		return -ENOMEM;
-
-	add_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-	while (pegasus->flags & ETH_REGS_CHANGED)
-		schedule();
-	remove_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_RUNNING);
-
-	pegasus->dr.bRequestType = PEGASUS_REQT_READ;
-	pegasus->dr.bRequest = PEGASUS_REQ_GET_REGS;
-	pegasus->dr.wValue = cpu_to_le16(0);
-	pegasus->dr.wIndex = cpu_to_le16(indx);
-	pegasus->dr.wLength = cpu_to_le16(size);
-	pegasus->ctrl_urb->transfer_buffer_length = size;
-
-	usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
-			     usb_rcvctrlpipe(pegasus->usb, 0),
-			     (char *) &pegasus->dr,
-			     buffer, size, ctrl_callback, pegasus);
-
-	add_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-
-	/* using ATOMIC, we'd never wake up if we slept */
-	if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
-		set_current_state(TASK_RUNNING);
-		if (ret == -ENODEV)
-			netif_device_detach(pegasus->net);
-		if (net_ratelimit())
-			netif_err(pegasus, drv, pegasus->net,
-				  "%s, status %d\n", __func__, ret);
-		goto out;
-	}
-
-	schedule();
-out:
-	remove_wait_queue(&pegasus->ctrl_wait, &wait);
-	memcpy(data, buffer, size);
-	kfree(buffer);

+	ret = usb_control_msg(pegasus->usb, usb_rcvctrlpipe(pegasus->usb, 0),
+			      PEGASUS_REQ_GET_REGS, PEGASUS_REQT_READ, 0,
+			      indx, data, size, 1000);
+	if (ret < 0)
+		netif_dbg(pegasus, drv, pegasus->net,
+			  "%s returned %d\n", __func__, ret);
  	return ret;
  }

-static int set_registers(pegasus_t *pegasus, __u16 indx, __u16 size,
+static int set_registers(pegasus_t * pegasus, __u16 indx, __u16 size,
  			 void *data)
  {
  	int ret;
-	char *buffer;
-	DECLARE_WAITQUEUE(wait, current);
-
-	buffer = kmemdup(data, size, GFP_KERNEL);
-	if (!buffer) {
-		netif_warn(pegasus, drv, pegasus->net,
-			   "out of memory in %s\n", __func__);
-		return -ENOMEM;
-	}
-
-	add_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-	while (pegasus->flags & ETH_REGS_CHANGED)
-		schedule();
-	remove_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_RUNNING);
-
-	pegasus->dr.bRequestType = PEGASUS_REQT_WRITE;
-	pegasus->dr.bRequest = PEGASUS_REQ_SET_REGS;
-	pegasus->dr.wValue = cpu_to_le16(0);
-	pegasus->dr.wIndex = cpu_to_le16(indx);
-	pegasus->dr.wLength = cpu_to_le16(size);
-	pegasus->ctrl_urb->transfer_buffer_length = size;
-
-	usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
-			     usb_sndctrlpipe(pegasus->usb, 0),
-			     (char *) &pegasus->dr,
-			     buffer, size, ctrl_callback, pegasus);
-
-	add_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-
-	if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
-		if (ret == -ENODEV)
-			netif_device_detach(pegasus->net);
-		netif_err(pegasus, drv, pegasus->net,
-			  "%s, status %d\n", __func__, ret);
-		goto out;
-	}
-
-	schedule();
-out:
-	remove_wait_queue(&pegasus->ctrl_wait, &wait);
-	kfree(buffer);

+	ret = usb_control_msg(pegasus->usb, usb_sndctrlpipe(pegasus->usb, 0),
+			      PEGASUS_REQ_SET_REGS, PEGASUS_REQT_WRITE, 0,
+			      indx, data, size, 100);
+	if (ret < 0)
+		netif_dbg(pegasus, drv, pegasus->net,
+			  "%s returned %d\n", __func__, ret);
  	return ret;
  }

-static int set_register(pegasus_t *pegasus, __u16 indx, __u8 data)
+static int set_register(pegasus_t * pegasus, __u16 indx, __u8 data)
  {
  	int ret;
-	char *tmp;
-	DECLARE_WAITQUEUE(wait, current);
-
-	tmp = kmemdup(&data, 1, GFP_KERNEL);
-	if (!tmp) {
-		netif_warn(pegasus, drv, pegasus->net,
-			   "out of memory in %s\n", __func__);
-		return -ENOMEM;
-	}
-	add_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-	while (pegasus->flags & ETH_REGS_CHANGED)
-		schedule();
-	remove_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_RUNNING);
-
-	pegasus->dr.bRequestType = PEGASUS_REQT_WRITE;
-	pegasus->dr.bRequest = PEGASUS_REQ_SET_REG;
-	pegasus->dr.wValue = cpu_to_le16(data);
-	pegasus->dr.wIndex = cpu_to_le16(indx);
-	pegasus->dr.wLength = cpu_to_le16(1);
-	pegasus->ctrl_urb->transfer_buffer_length = 1;
-
-	usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
-			     usb_sndctrlpipe(pegasus->usb, 0),
-			     (char *) &pegasus->dr,
-			     tmp, 1, ctrl_callback, pegasus);
-
-	add_wait_queue(&pegasus->ctrl_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-
-	if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
-		if (ret == -ENODEV)
-			netif_device_detach(pegasus->net);
-		if (net_ratelimit())
-			netif_err(pegasus, drv, pegasus->net,
-				  "%s, status %d\n", __func__, ret);
-		goto out;
-	}
-
-	schedule();
-out:
-	remove_wait_queue(&pegasus->ctrl_wait, &wait);
-	kfree(tmp);

+	ret = usb_control_msg(pegasus->usb, usb_sndctrlpipe(pegasus->usb, 0),
+			      PEGASUS_REQ_SET_REG, PEGASUS_REQT_WRITE, data,
+			      indx, &data, 1, 1000);
+	if (ret < 0)
+		netif_dbg(pegasus, drv, pegasus->net,
+			  "%s returned %d\n", __func__, ret);
  	return ret;
  }

-static int update_eth_regs_async(pegasus_t *pegasus)
+static int update_eth_regs_async(pegasus_t * pegasus)
  {
-	int ret;
+	int ret = -ENOMEM;
+	struct urb *async_urb;
+	struct usb_ctrlrequest *req;

-	pegasus->dr.bRequestType = PEGASUS_REQT_WRITE;
-	pegasus->dr.bRequest = PEGASUS_REQ_SET_REGS;
-	pegasus->dr.wValue = cpu_to_le16(0);
-	pegasus->dr.wIndex = cpu_to_le16(EthCtrl0);
-	pegasus->dr.wLength = cpu_to_le16(3);
-	pegasus->ctrl_urb->transfer_buffer_length = 3;
+	req = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
+	if (req == NULL)
+		return ret;

-	usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
-			     usb_sndctrlpipe(pegasus->usb, 0),
-			     (char *) &pegasus->dr,
-			     pegasus->eth_regs, 3, ctrl_callback, pegasus);
+	if ((async_urb = usb_alloc_urb(0, GFP_ATOMIC)) == NULL) {
+		kfree(req);
+		return ret;
+	}
+	req->bRequestType = PEGASUS_REQT_WRITE;
+	req->bRequest = PEGASUS_REQ_SET_REGS;
+	req->wValue = cpu_to_le16(0);
+	req->wIndex = cpu_to_le16(EthCtrl0);
+	req->wLength = cpu_to_le16(3);
+
+	usb_fill_control_urb(async_urb, pegasus->usb,
+			     usb_sndctrlpipe(pegasus->usb, 0), (void *)req,
+			     pegasus->eth_regs, 3, async_ctrl_callback, req);

-	if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
+	if ((ret = usb_submit_urb(async_urb, GFP_ATOMIC))) {
  		if (ret == -ENODEV)
  			netif_device_detach(pegasus->net);
  		netif_err(pegasus, drv, pegasus->net,
-			  "%s, status %d\n", __func__, ret);
+			  "%s returned %d\n", __func__, ret);
  	}
-
  	return ret;
  }

-/* Returns 0 on success, error on failure */
-static int read_mii_word(pegasus_t *pegasus, __u8 phy, __u8 indx, __u16 *regd)
+static int __mii_op(pegasus_t * p, __u8 phy, __u8 indx, __u16 * regd, __u8 cmd)
  {
  	int i;
  	__u8 data[4] = { phy, 0, 0, indx };
  	__le16 regdi;
-	int ret;
+	int ret = -ETIMEDOUT;

-	set_register(pegasus, PhyCtrl, 0);
-	set_registers(pegasus, PhyAddr, sizeof(data), data);
-	set_register(pegasus, PhyCtrl, (indx | PHY_READ));
+	if (cmd & PHY_WRITE) {
+		__le16 *t = (__le16 *) & data[1];
+		*t = cpu_to_le16(*regd);
+	}
+	set_register(p, PhyCtrl, 0);
+	set_registers(p, PhyAddr, sizeof(data), data);
+	set_register(p, PhyCtrl, (indx | cmd));
  	for (i = 0; i < REG_TIMEOUT; i++) {
-		ret = get_registers(pegasus, PhyCtrl, 1, data);
-		if (ret == -ESHUTDOWN)
+		ret = get_registers(p, PhyCtrl, 1, data);
+		if (ret < 0)
  			goto fail;
  		if (data[0] & PHY_DONE)
  			break;
  	}
-
  	if (i >= REG_TIMEOUT)
  		goto fail;
-
-	ret = get_registers(pegasus, PhyData, 2, &regdi);
-	*regd = le16_to_cpu(regdi);
+	if (cmd & PHY_READ) {
+		ret = get_registers(p, PhyData, 2, &regdi);
+		*regd = le16_to_cpu(regdi);
+		return ret;
+	}
+	return 0;
+fail:
+	netif_dbg(p, drv, p->net, "%s failed\n", __func__);
  	return ret;
+}

-fail:
-	netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
+/* Returns non-negative int on success, error on failure */
+static int read_mii_word(pegasus_t * pegasus, __u8 phy, __u8 indx, __u16 * regd)
+{
+	return __mii_op(pegasus, phy, indx, regd, PHY_READ);
+}

-	return ret;
+/* Returns zero on success, error on failure */
+static int write_mii_word(pegasus_t * pegasus, __u8 phy, __u8 indx, __u16 * regd)
+{
+	return __mii_op(pegasus, phy, indx, regd, PHY_WRITE);
  }

  static int mdio_read(struct net_device *dev, int phy_id, int loc)
  {
  	pegasus_t *pegasus = netdev_priv(dev);
-	u16 res;
+	__u16 res;

  	read_mii_word(pegasus, phy_id, loc, &res);
  	return (int)res;
  }

-static int write_mii_word(pegasus_t *pegasus, __u8 phy, __u8 indx, __u16 regd)
-{
-	int i;
-	__u8 data[4] = { phy, 0, 0, indx };
-	int ret;
-
-	data[1] = (u8) regd;
-	data[2] = (u8) (regd >> 8);
-	set_register(pegasus, PhyCtrl, 0);
-	set_registers(pegasus, PhyAddr, sizeof(data), data);
-	set_register(pegasus, PhyCtrl, (indx | PHY_WRITE));
-	for (i = 0; i < REG_TIMEOUT; i++) {
-		ret = get_registers(pegasus, PhyCtrl, 1, data);
-		if (ret == -ESHUTDOWN)
-			goto fail;
-		if (data[0] & PHY_DONE)
-			break;
-	}
-
-	if (i >= REG_TIMEOUT)
-		goto fail;
-
-	return ret;
-
-fail:
-	netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
-	return -ETIMEDOUT;
-}
-
  static void mdio_write(struct net_device *dev, int phy_id, int loc, int val)
  {
  	pegasus_t *pegasus = netdev_priv(dev);

-	write_mii_word(pegasus, phy_id, loc, val);
+	write_mii_word(pegasus, phy_id, loc, &(__u16){ val });
  }

-static int read_eprom_word(pegasus_t *pegasus, __u8 index, __u16 *retdata)
+static int read_eprom_word(pegasus_t * pegasus, __u8 index, __u16 * retdata)
  {
  	int i;
  	__u8 tmp;
@@ -420,18 +278,16 @@ static int read_eprom_word(pegasus_t *pegasus, __u8 index, __u16 *retdata)
  	}
  	if (i >= REG_TIMEOUT)
  		goto fail;
-
  	ret = get_registers(pegasus, EpromData, 2, &retdatai);
  	*retdata = le16_to_cpu(retdatai);
  	return ret;
-
  fail:
  	netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
  	return -ETIMEDOUT;
  }

  #ifdef	PEGASUS_WRITE_EEPROM
-static inline void enable_eprom_write(pegasus_t *pegasus)
+static inline void enable_eprom_write(pegasus_t * pegasus)
  {
  	__u8 tmp;
  	int ret;
@@ -440,7 +296,7 @@ static inline void enable_eprom_write(pegasus_t *pegasus)
  	set_register(pegasus, EthCtrl2, tmp | EPROM_WR_ENABLE);
  }

-static inline void disable_eprom_write(pegasus_t *pegasus)
+static inline void disable_eprom_write(pegasus_t * pegasus)
  {
  	__u8 tmp;
  	int ret;
@@ -450,7 +306,7 @@ static inline void disable_eprom_write(pegasus_t *pegasus)
  	set_register(pegasus, EthCtrl2, tmp & ~EPROM_WR_ENABLE);
  }

-static int write_eprom_word(pegasus_t *pegasus, __u8 index, __u16 data)
+static int write_eprom_word(pegasus_t * pegasus, __u8 index, __u16 data)
  {
  	int i;
  	__u8 tmp, d[4] = { 0x3f, 0, 0, EPROM_WRITE };
@@ -473,16 +329,14 @@ static int write_eprom_word(pegasus_t *pegasus, __u8 index, __u16 data)
  	disable_eprom_write(pegasus);
  	if (i >= REG_TIMEOUT)
  		goto fail;
-
  	return ret;
-
  fail:
  	netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
  	return -ETIMEDOUT;
  }
-#endif				/* PEGASUS_WRITE_EEPROM */
+#endif /* PEGASUS_WRITE_EEPROM */

-static inline void get_node_id(pegasus_t *pegasus, __u8 *id)
+static inline void get_node_id(pegasus_t * pegasus, __u8 * id)
  {
  	int i;
  	__u16 w16;
@@ -493,7 +347,7 @@ static inline void get_node_id(pegasus_t *pegasus, __u8 *id)
  	}
  }

-static void set_ethernet_addr(pegasus_t *pegasus)
+static void set_ethernet_addr(pegasus_t * pegasus)
  {
  	__u8 node_id[6];

@@ -506,7 +360,7 @@ static void set_ethernet_addr(pegasus_t *pegasus)
  	memcpy(pegasus->net->dev_addr, node_id, sizeof(node_id));
  }

-static inline int reset_mac(pegasus_t *pegasus)
+static inline int reset_mac(pegasus_t * pegasus)
  {
  	__u8 data = 0x8;
  	int i;
@@ -528,7 +382,6 @@ static inline int reset_mac(pegasus_t *pegasus)
  	}
  	if (i == REG_TIMEOUT)
  		return -ETIMEDOUT;
-
  	if (usb_dev_id[pegasus->dev_index].vendor == VENDOR_LINKSYS ||
  	    usb_dev_id[pegasus->dev_index].vendor == VENDOR_DLINK) {
  		set_register(pegasus, Gpio0, 0x24);
@@ -537,18 +390,17 @@ static inline int reset_mac(pegasus_t *pegasus)
  	if (usb_dev_id[pegasus->dev_index].vendor == VENDOR_ELCON) {
  		__u16 auxmode;
  		read_mii_word(pegasus, 3, 0x1b, &auxmode);
-		write_mii_word(pegasus, 3, 0x1b, auxmode | 4);
+		auxmode |= 4;
+		write_mii_word(pegasus, 3, 0x1b, &auxmode);
  	}
-
  	return 0;
  }

-static int enable_net_traffic(struct net_device *dev, struct usb_device *usb)
+static void enable_net_traffic(struct net_device *dev, struct usb_device *usb)
  {
  	__u16 linkpart;
  	__u8 data[4];
  	pegasus_t *pegasus = netdev_priv(dev);
-	int ret;

  	read_mii_word(pegasus, pegasus->phy, MII_LPA, &linkpart);
  	data[0] = 0xc9;
@@ -562,62 +414,16 @@ static int enable_net_traffic(struct net_device *dev, struct usb_device *usb)
  	data[2] = loopback ? 0x09 : 0x01;

  	memcpy(pegasus->eth_regs, data, sizeof(data));
-	ret = set_registers(pegasus, EthCtrl0, 3, data);
+	set_registers(pegasus, EthCtrl0, 3, data);

  	if (usb_dev_id[pegasus->dev_index].vendor == VENDOR_LINKSYS ||
  	    usb_dev_id[pegasus->dev_index].vendor == VENDOR_LINKSYS2 ||
  	    usb_dev_id[pegasus->dev_index].vendor == VENDOR_DLINK) {
-		u16 auxmode;
+		__u16 auxmode;
  		read_mii_word(pegasus, 0, 0x1b, &auxmode);
-		write_mii_word(pegasus, 0, 0x1b, auxmode | 4);
+		auxmode |= 4;
+		write_mii_word(pegasus, 0, 0x1b, &auxmode);
  	}
-
-	return ret;
-}
-
-static void fill_skb_pool(pegasus_t *pegasus)
-{
-	int i;
-
-	for (i = 0; i < RX_SKBS; i++) {
-		if (pegasus->rx_pool[i])
-			continue;
-		pegasus->rx_pool[i] = dev_alloc_skb(PEGASUS_MTU + 2);
-		/*
-		 ** we give up if the allocation fail. the tasklet will be
-		 ** rescheduled again anyway...
-		 */
-		if (pegasus->rx_pool[i] == NULL)
-			return;
-		skb_reserve(pegasus->rx_pool[i], 2);
-	}
-}
-
-static void free_skb_pool(pegasus_t *pegasus)
-{
-	int i;
-
-	for (i = 0; i < RX_SKBS; i++) {
-		if (pegasus->rx_pool[i]) {
-			dev_kfree_skb(pegasus->rx_pool[i]);
-			pegasus->rx_pool[i] = NULL;
-		}
-	}
-}
-
-static inline struct sk_buff *pull_skb(pegasus_t * pegasus)
-{
-	int i;
-	struct sk_buff *skb;
-
-	for (i = 0; i < RX_SKBS; i++) {
-		if (likely(pegasus->rx_pool[i] != NULL)) {
-			skb = pegasus->rx_pool[i];
-			pegasus->rx_pool[i] = NULL;
-			return skb;
-		}
-	}
-	return NULL;
  }

  static void read_bulk_callback(struct urb *urb)
@@ -643,7 +449,7 @@ static void read_bulk_callback(struct urb *urb)
  		netif_dbg(pegasus, rx_err, net, "reset MAC\n");
  		pegasus->flags &= ~PEGASUS_RX_BUSY;
  		break;
-	case -EPIPE:		/* stall, or disconnect from TT */
+	case -EPIPE:	/* stall, or disconnect from TT */
  		/* FIXME schedule work to clear the halt */
  		netif_warn(pegasus, rx_err, net, "no rx stall recovery\n");
  		return;
@@ -665,16 +471,16 @@ static void read_bulk_callback(struct urb *urb)
  		netif_dbg(pegasus, rx_err, net,
  			  "RX packet error %x\n", rx_status);
  		pegasus->stats.rx_errors++;
-		if (rx_status & 0x06)	/* long or runt	*/
+		if (rx_status & 0x06)	/* long or runt */
  			pegasus->stats.rx_length_errors++;
  		if (rx_status & 0x08)
  			pegasus->stats.rx_crc_errors++;
-		if (rx_status & 0x10)	/* extra bits	*/
+		if (rx_status & 0x10)	/* extra bits */
  			pegasus->stats.rx_frame_errors++;
  		goto goon;
  	}
  	if (pegasus->chip == 0x8513) {
-		pkt_len = le32_to_cpu(*(__le32 *)urb->transfer_buffer);
+		pkt_len = le32_to_cpu(*(__le32 *) urb->transfer_buffer);
  		pkt_len &= 0x0fff;
  		pegasus->rx_skb->data += 2;
  	} else {
@@ -683,14 +489,12 @@ static void read_bulk_callback(struct urb *urb)
  		pkt_len &= 0xfff;
  		pkt_len -= 8;
  	}
-
  	/*
  	 * If the packet is unreasonably long, quietly drop it rather than
  	 * kernel panicing by calling skb_put.
  	 */
  	if (pkt_len > PEGASUS_MTU)
  		goto goon;
-
  	/*
  	 * at this point we are sure pegasus->rx_skb != NULL
  	 * so we go ahead and pass up the packet.
@@ -704,10 +508,8 @@ static void read_bulk_callback(struct urb *urb)
  	if (pegasus->flags & PEGASUS_UNPLUG)
  		return;

-	spin_lock(&pegasus->rx_pool_lock);
-	pegasus->rx_skb = pull_skb(pegasus);
-	spin_unlock(&pegasus->rx_pool_lock);
-
+	pegasus->rx_skb = __netdev_alloc_skb_ip_align(pegasus->net,
+						      PEGASUS_MTU, GFP_ATOMIC);
  	if (pegasus->rx_skb == NULL)
  		goto tl_sched;
  goon:
@@ -724,9 +526,7 @@ goon:
  	} else {
  		pegasus->flags &= ~PEGASUS_RX_URB_FAIL;
  	}
-
  	return;
-
  tl_sched:
  	tasklet_schedule(&pegasus->rx_tl);
  }
@@ -734,24 +534,22 @@ tl_sched:
  static void rx_fixup(unsigned long data)
  {
  	pegasus_t *pegasus;
-	unsigned long flags;
  	int status;

  	pegasus = (pegasus_t *) data;
  	if (pegasus->flags & PEGASUS_UNPLUG)
  		return;
-
-	spin_lock_irqsave(&pegasus->rx_pool_lock, flags);
-	fill_skb_pool(pegasus);
  	if (pegasus->flags & PEGASUS_RX_URB_FAIL)
  		if (pegasus->rx_skb)
  			goto try_again;
  	if (pegasus->rx_skb == NULL)
-		pegasus->rx_skb = pull_skb(pegasus);
+		pegasus->rx_skb = __netdev_alloc_skb_ip_align(pegasus->net,
+							      PEGASUS_MTU,
+							      GFP_ATOMIC);
  	if (pegasus->rx_skb == NULL) {
  		netif_warn(pegasus, rx_err, pegasus->net, "low on memory\n");
  		tasklet_schedule(&pegasus->rx_tl);
-		goto done;
+		return;
  	}
  	usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
  			  usb_rcvbulkpipe(pegasus->usb, 1),
@@ -767,8 +565,6 @@ try_again:
  	} else {
  		pegasus->flags &= ~PEGASUS_RX_URB_FAIL;
  	}
-done:
-	spin_unlock_irqrestore(&pegasus->rx_pool_lock, flags);
  }

  static void write_bulk_callback(struct urb *urb)
@@ -779,12 +575,9 @@ static void write_bulk_callback(struct urb *urb)

  	if (!pegasus)
  		return;
-
  	net = pegasus->net;
-
  	if (!netif_device_present(net) || !netif_running(net))
  		return;
-
  	switch (status) {
  	case -EPIPE:
  		/* FIXME schedule_work() to clear the tx halt */
@@ -802,8 +595,7 @@ static void write_bulk_callback(struct urb *urb)
  	case 0:
  		break;
  	}
-
-	net->trans_start = jiffies; /* prevent tx timeout */
+	net->trans_start = jiffies;	/* prevent tx timeout */
  	netif_wake_queue(net);
  }

@@ -816,7 +608,6 @@ static void intr_callback(struct urb *urb)
  	if (!pegasus)
  		return;
  	net = pegasus->net;
-
  	switch (status) {
  	case 0:
  		break;
@@ -830,13 +621,12 @@ static void intr_callback(struct urb *urb)
  		 */
  		netif_dbg(pegasus, timer, net, "intr status %d\n", status);
  	}
-
  	if (urb->actual_length >= 6) {
  		u8 *d = urb->transfer_buffer;

  		/* byte 0 == tx_status1, reg 2B */
-		if (d[0] & (TX_UNDERRUN|EXCESSIVE_COL
-					|LATE_COL|JABBER_TIMEOUT)) {
+		if (d[0] & (TX_UNDERRUN | EXCESSIVE_COL
+			    | LATE_COL | JABBER_TIMEOUT)) {
  			pegasus->stats.tx_errors++;
  			if (d[0] & TX_UNDERRUN)
  				pegasus->stats.tx_fifo_errors++;
@@ -854,7 +644,6 @@ static void intr_callback(struct urb *urb)
  		/* bytes 3-4 == rx_lostpkt, reg 2E/2F */
  		pegasus->stats.rx_missed_errors += ((d[3] & 0x7f) << 8) | d[4];
  	}
-
  	res = usb_submit_urb(urb, GFP_ATOMIC);
  	if (res == -ENODEV)
  		netif_device_detach(pegasus->net);
@@ -872,7 +661,7 @@ static void pegasus_tx_timeout(struct net_device *net)
  }

  static netdev_tx_t pegasus_start_xmit(struct sk_buff *skb,
-					    struct net_device *net)
+				      struct net_device *net)
  {
  	pegasus_t *pegasus = netdev_priv(net);
  	int count = ((skb->len + 2) & 0x3f) ? skb->len + 2 : skb->len + 3;
@@ -890,10 +679,10 @@ static netdev_tx_t pegasus_start_xmit(struct sk_buff *skb,
  	if ((res = usb_submit_urb(pegasus->tx_urb, GFP_ATOMIC))) {
  		netif_warn(pegasus, tx_err, net, "fail tx, %d\n", res);
  		switch (res) {
-		case -EPIPE:		/* stall, or disconnect from TT */
+		case -EPIPE:	/* stall, or disconnect from TT */
  			/* cleanup should already have been scheduled */
  			break;
-		case -ENODEV:		/* disconnect() upcoming */
+		case -ENODEV:	/* disconnect() upcoming */
  		case -EPERM:
  			netif_device_detach(pegasus->net);
  			break;
@@ -915,14 +704,14 @@ static struct net_device_stats *pegasus_netdev_stats(struct net_device *dev)
  	return &((pegasus_t *) netdev_priv(dev))->stats;
  }

-static inline void disable_net_traffic(pegasus_t *pegasus)
+static inline void disable_net_traffic(pegasus_t * pegasus)
  {
  	__le16 tmp = cpu_to_le16(0);

  	set_registers(pegasus, EthCtrl0, sizeof(tmp), &tmp);
  }

-static inline void get_interrupt_interval(pegasus_t *pegasus)
+static inline void get_interrupt_interval(pegasus_t * pegasus)
  {
  	u16 data;
  	u8 interval;
@@ -935,7 +724,7 @@ static inline void get_interrupt_interval(pegasus_t *pegasus)
  				   "intr interval changed from %ums to %ums\n",
  				   interval, 0x80);
  			interval = 0x80;
-			data = (data & 0x00FF) | ((u16)interval << 8);
+			data = (data & 0x00FF) | ((u16) interval << 8);
  #ifdef PEGASUS_WRITE_EEPROM
  			write_eprom_word(pegasus, 4, data);
  #endif
@@ -951,71 +740,58 @@ static void set_carrier(struct net_device *net)

  	if (read_mii_word(pegasus, pegasus->phy, MII_BMSR, &tmp))
  		return;
-
  	if (tmp & BMSR_LSTATUS)
  		netif_carrier_on(net);
  	else
  		netif_carrier_off(net);
  }

-static void free_all_urbs(pegasus_t *pegasus)
+static void free_all_urbs(pegasus_t * pegasus)
  {
  	usb_free_urb(pegasus->intr_urb);
  	usb_free_urb(pegasus->tx_urb);
  	usb_free_urb(pegasus->rx_urb);
-	usb_free_urb(pegasus->ctrl_urb);
  }

-static void unlink_all_urbs(pegasus_t *pegasus)
+static void unlink_all_urbs(pegasus_t * pegasus)
  {
  	usb_kill_urb(pegasus->intr_urb);
  	usb_kill_urb(pegasus->tx_urb);
  	usb_kill_urb(pegasus->rx_urb);
-	usb_kill_urb(pegasus->ctrl_urb);
  }

-static int alloc_urbs(pegasus_t *pegasus)
+static int alloc_urbs(pegasus_t * pegasus)
  {
-	pegasus->ctrl_urb = usb_alloc_urb(0, GFP_KERNEL);
-	if (!pegasus->ctrl_urb)
-		return 0;
  	pegasus->rx_urb = usb_alloc_urb(0, GFP_KERNEL);
  	if (!pegasus->rx_urb) {
-		usb_free_urb(pegasus->ctrl_urb);
  		return 0;
  	}
  	pegasus->tx_urb = usb_alloc_urb(0, GFP_KERNEL);
  	if (!pegasus->tx_urb) {
  		usb_free_urb(pegasus->rx_urb);
-		usb_free_urb(pegasus->ctrl_urb);
  		return 0;
  	}
  	pegasus->intr_urb = usb_alloc_urb(0, GFP_KERNEL);
  	if (!pegasus->intr_urb) {
  		usb_free_urb(pegasus->tx_urb);
  		usb_free_urb(pegasus->rx_urb);
-		usb_free_urb(pegasus->ctrl_urb);
  		return 0;
  	}
-
  	return 1;
  }

  static int pegasus_open(struct net_device *net)
  {
  	pegasus_t *pegasus = netdev_priv(net);
-	int res;
+	int res = -ENOMEM;

  	if (pegasus->rx_skb == NULL)
-		pegasus->rx_skb = pull_skb(pegasus);
-	/*
-	 ** Note: no point to free the pool.  it is empty :-)
-	 */
+		pegasus->rx_skb = __netdev_alloc_skb_ip_align(pegasus->net,
+							      PEGASUS_MTU,
+							      GFP_KERNEL);
  	if (!pegasus->rx_skb)
-		return -ENOMEM;
-
+		goto exit;
  	res = set_registers(pegasus, EthID, 6, net->dev_addr);
-
  	usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
  			  usb_rcvbulkpipe(pegasus->usb, 1),
  			  pegasus->rx_skb->data, PEGASUS_MTU + 8,
@@ -1026,7 +802,6 @@ static int pegasus_open(struct net_device *net)
  		netif_dbg(pegasus, ifup, net, "failed rx_urb, %d\n", res);
  		goto exit;
  	}
-
  	usb_fill_int_urb(pegasus->intr_urb, pegasus->usb,
  			 usb_rcvintpipe(pegasus->usb, 3),
  			 pegasus->intr_buff, sizeof(pegasus->intr_buff),
@@ -1038,18 +813,9 @@ static int pegasus_open(struct net_device *net)
  		usb_kill_urb(pegasus->rx_urb);
  		goto exit;
  	}
-	if ((res = enable_net_traffic(net, pegasus->usb))) {
-		netif_dbg(pegasus, ifup, net,
-			  "can't enable_net_traffic() - %d\n", res);
-		res = -EIO;
-		usb_kill_urb(pegasus->rx_urb);
-		usb_kill_urb(pegasus->intr_urb);
-		free_skb_pool(pegasus);
-		goto exit;
-	}
+	enable_net_traffic(net, pegasus->usb);
  	set_carrier(net);
  	netif_start_queue(net);
-	netif_dbg(pegasus, ifup, net, "open\n");
  	res = 0;
  exit:
  	return res;
@@ -1081,25 +847,22 @@ static void pegasus_get_drvinfo(struct net_device *dev,
  /* also handles three patterns of some kind in hardware */
  #define	WOL_SUPPORTED	(WAKE_MAGIC|WAKE_PHY)

-static void
-pegasus_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+static void pegasus_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
  {
-	pegasus_t	*pegasus = netdev_priv(dev);
+	pegasus_t *pegasus = netdev_priv(dev);

  	wol->supported = WAKE_MAGIC | WAKE_PHY;
  	wol->wolopts = pegasus->wolopts;
  }

-static int
-pegasus_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+static int pegasus_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
  {
-	pegasus_t	*pegasus = netdev_priv(dev);
-	u8		reg78 = 0x04;
-	int		ret;
+	pegasus_t *pegasus = netdev_priv(dev);
+	u8 reg78 = 0x04;
+	int r;

  	if (wol->wolopts & ~WOL_SUPPORTED)
  		return -EINVAL;
-
  	if (wol->wolopts & WAKE_MAGIC)
  		reg78 |= 0x80;
  	if (wol->wolopts & WAKE_PHY)
@@ -1111,11 +874,10 @@ pegasus_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
  		pegasus->eth_regs[0] &= ~0x10;
  	pegasus->wolopts = wol->wolopts;

-	ret = set_register(pegasus, WakeupControl, reg78);
-	if (!ret)
-		ret = device_set_wakeup_enable(&pegasus->usb->dev,
-						wol->wolopts);
-	return ret;
+	r = set_register(pegasus, WakeupControl, reg78);
+	if (!r)
+		r = device_set_wakeup_enable(&pegasus->usb->dev, wol->wolopts);
+	return r;
  }

  static inline void pegasus_reset_wol(struct net_device *dev)
@@ -1123,7 +885,7 @@ static inline void pegasus_reset_wol(struct net_device *dev)
  	struct ethtool_wolinfo wol;

  	memset(&wol, 0, sizeof wol);
-	(void) pegasus_set_wol(dev, &wol);
+	(void)pegasus_set_wol(dev, &wol);
  }

  static int
@@ -1133,6 +895,7 @@ pegasus_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd)

  	pegasus = netdev_priv(dev);
  	mii_ethtool_gset(&pegasus->mii, ecmd);
+
  	return 0;
  }

@@ -1140,30 +903,35 @@ static int
  pegasus_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
  {
  	pegasus_t *pegasus = netdev_priv(dev);
+
  	return mii_ethtool_sset(&pegasus->mii, ecmd);
  }

  static int pegasus_nway_reset(struct net_device *dev)
  {
  	pegasus_t *pegasus = netdev_priv(dev);
+
  	return mii_nway_restart(&pegasus->mii);
  }

  static u32 pegasus_get_link(struct net_device *dev)
  {
  	pegasus_t *pegasus = netdev_priv(dev);
+
  	return mii_link_ok(&pegasus->mii);
  }

  static u32 pegasus_get_msglevel(struct net_device *dev)
  {
  	pegasus_t *pegasus = netdev_priv(dev);
+
  	return pegasus->msg_enable;
  }

  static void pegasus_set_msglevel(struct net_device *dev, u32 v)
  {
  	pegasus_t *pegasus = netdev_priv(dev);
+
  	pegasus->msg_enable = v;
  }

@@ -1181,7 +949,7 @@ static const struct ethtool_ops ops = {

  static int pegasus_ioctl(struct net_device *net, struct ifreq *rq, int cmd)
  {
-	__u16 *data = (__u16 *) &rq->ifr_ifru;
+	__u16 *data = (__u16 *) & rq->ifr_ifru;
  	pegasus_t *pegasus = netdev_priv(net);
  	int res;

@@ -1195,7 +963,7 @@ static int pegasus_ioctl(struct net_device *net, struct ifreq *rq, int cmd)
  	case SIOCDEVPRIVATE + 2:
  		if (!capable(CAP_NET_ADMIN))
  			return -EPERM;
-		write_mii_word(pegasus, pegasus->phy, data[1] & 0x1f, data[2]);
+		write_mii_word(pegasus, pegasus->phy, data[1] & 0x1f, &data[2]);
  		res = 0;
  		break;
  	default:
@@ -1218,15 +986,12 @@ static void pegasus_set_multicast(struct net_device *net)
  	} else {
  		pegasus->eth_regs[EthCtrl0] &= ~RX_MULTICAST;
  		pegasus->eth_regs[EthCtrl2] &= ~RX_PROMISCUOUS;
+		netif_dbg(pegasus, link, net, "general mode\n");
  	}
-
-	pegasus->ctrl_urb->status = 0;
-
-	pegasus->flags |= ETH_REGS_CHANGE;
-	ctrl_callback(pegasus->ctrl_urb);
+	update_eth_regs_async(pegasus);
  }

-static __u8 mii_phy_probe(pegasus_t *pegasus)
+static __u8 mii_phy_probe(pegasus_t * pegasus)
  {
  	int i;
  	__u16 tmp;
@@ -1238,11 +1003,10 @@ static __u8 mii_phy_probe(pegasus_t *pegasus)
  		else
  			return i;
  	}
-
  	return 0xff;
  }

-static inline void setup_pegasus_II(pegasus_t *pegasus)
+static inline void setup_pegasus_II(pegasus_t * pegasus)
  {
  	__u8 data = 0xa5;

@@ -1253,26 +1017,21 @@ static inline void setup_pegasus_II(pegasus_t *pegasus)
  		set_register(pegasus, Reg7b, 0);
  	else
  		set_register(pegasus, Reg7b, 2);
-
  	set_register(pegasus, 0x83, data);
  	get_registers(pegasus, 0x83, 1, &data);
-
  	if (data == 0xa5)
  		pegasus->chip = 0x8513;
  	else
  		pegasus->chip = 0;
-
  	set_register(pegasus, 0x80, 0xc0);
  	set_register(pegasus, 0x83, 0xff);
  	set_register(pegasus, 0x84, 0x01);
-
  	if (pegasus->features & HAS_HOME_PNA && mii_mode)
  		set_register(pegasus, Reg81, 6);
  	else
  		set_register(pegasus, Reg81, 2);
  }

-
  static int pegasus_count;
  static struct workqueue_struct *pegasus_workqueue;
  #define CARRIER_CHECK_DELAY (2 * HZ)
@@ -1281,10 +1040,9 @@ static void check_carrier(struct work_struct *work)
  {
  	pegasus_t *pegasus = container_of(work, pegasus_t, carrier_check.work);
  	set_carrier(pegasus->net);
-	if (!(pegasus->flags & PEGASUS_UNPLUG)) {
+	if (!(pegasus->flags & PEGASUS_UNPLUG))
  		queue_delayed_work(pegasus_workqueue, &pegasus->carrier_check,
-			CARRIER_CHECK_DELAY);
-	}
+				   CARRIER_CHECK_DELAY);
  }

  static int pegasus_blacklisted(struct usb_device *udev)
@@ -1299,11 +1057,11 @@ static int pegasus_blacklisted(struct usb_device *udev)
  	    (udd->bDeviceClass == USB_CLASS_WIRELESS_CONTROLLER) &&
  	    (udd->bDeviceProtocol == 1))
  		return 1;
-
  	return 0;
  }

-/* we rely on probe() and remove() being serialized so we
+/*
+ * we rely on probe() and remove() being serialized so we
   * don't need extra locking on pegasus_count.
   */
  static void pegasus_dec_workqueue(void)
@@ -1340,14 +1098,13 @@ static int pegasus_probe(struct usb_interface *intf,

  	pegasus = netdev_priv(net);
  	pegasus->dev_index = dev_index;
-	init_waitqueue_head(&pegasus->ctrl_wait);

  	if (!alloc_urbs(pegasus)) {
  		dev_err(&intf->dev, "can't allocate %s\n", "urbs");
  		goto out1;
  	}

-	tasklet_init(&pegasus->rx_tl, rx_fixup, (unsigned long) pegasus);
+	tasklet_init(&pegasus->rx_tl, rx_fixup, (unsigned long)pegasus);

  	INIT_DELAYED_WORK(&pegasus->carrier_check, check_carrier);

@@ -1355,7 +1112,6 @@ static int pegasus_probe(struct usb_interface *intf,
  	pegasus->usb = dev;
  	pegasus->net = net;

-
  	net->watchdog_timeo = PEGASUS_TX_TIMEOUT;
  	net->netdev_ops = &pegasus_netdev_ops;
  	SET_ETHTOOL_OPS(net, &ops);
@@ -1364,9 +1120,9 @@ static int pegasus_probe(struct usb_interface *intf,
  	pegasus->mii.mdio_write = mdio_write;
  	pegasus->mii.phy_id_mask = 0x1f;
  	pegasus->mii.reg_num_mask = 0x1f;
-	spin_lock_init(&pegasus->rx_pool_lock);
  	pegasus->msg_enable = netif_msg_init(msg_level, NETIF_MSG_DRV
-				| NETIF_MSG_PROBE | NETIF_MSG_LINK);
+					     | NETIF_MSG_PROBE
+					     | NETIF_MSG_LINK);

  	pegasus->features = usb_dev_id[dev_index].private;
  	get_interrupt_interval(pegasus);
@@ -1376,7 +1132,6 @@ static int pegasus_probe(struct usb_interface *intf,
  		goto out2;
  	}
  	set_ethernet_addr(pegasus);
-	fill_skb_pool(pegasus);
  	if (pegasus->features & PEGASUS_II) {
  		dev_info(&intf->dev, "setup Pegasus II specific registers\n");
  		setup_pegasus_II(pegasus);
@@ -1394,17 +1149,12 @@ static int pegasus_probe(struct usb_interface *intf,
  	if (res)
  		goto out3;
  	queue_delayed_work(pegasus_workqueue, &pegasus->carrier_check,
-				CARRIER_CHECK_DELAY);
-
-	dev_info(&intf->dev, "%s, %s, %pM\n",
-		 net->name,
-		 usb_dev_id[dev_index].name,
-		 net->dev_addr);
+			   CARRIER_CHECK_DELAY);
+	dev_info(&intf->dev, "%s, %s: %pM\n", net->name,
+		 usb_dev_id[dev_index].name, net->dev_addr);
  	return 0;
-
  out3:
  	usb_set_intfdata(intf, NULL);
-	free_skb_pool(pegasus);
  out2:
  	free_all_urbs(pegasus);
  out1:
@@ -1429,7 +1179,6 @@ static void pegasus_disconnect(struct usb_interface *intf)
  	unregister_netdev(pegasus->net);
  	unlink_all_urbs(pegasus);
  	free_all_urbs(pegasus);
-	free_skb_pool(pegasus);
  	if (pegasus->rx_skb != NULL) {
  		dev_kfree_skb(pegasus->rx_skb);
  		pegasus->rx_skb = NULL;
@@ -1466,21 +1215,21 @@ static int pegasus_resume(struct usb_interface *intf)
  		intr_callback(pegasus->intr_urb);
  	}
  	queue_delayed_work(pegasus_workqueue, &pegasus->carrier_check,
-				CARRIER_CHECK_DELAY);
+			   CARRIER_CHECK_DELAY);
  	return 0;
  }

  static const struct net_device_ops pegasus_netdev_ops = {
-	.ndo_open =			pegasus_open,
-	.ndo_stop =			pegasus_close,
-	.ndo_do_ioctl =			pegasus_ioctl,
-	.ndo_start_xmit =		pegasus_start_xmit,
-	.ndo_set_rx_mode =		pegasus_set_multicast,
-	.ndo_get_stats =		pegasus_netdev_stats,
-	.ndo_tx_timeout =		pegasus_tx_timeout,
-	.ndo_change_mtu =		eth_change_mtu,
-	.ndo_set_mac_address =		eth_mac_addr,
-	.ndo_validate_addr =		eth_validate_addr,
+	.ndo_open = pegasus_open,
+	.ndo_stop = pegasus_close,
+	.ndo_do_ioctl = pegasus_ioctl,
+	.ndo_start_xmit = pegasus_start_xmit,
+	.ndo_set_rx_mode = pegasus_set_multicast,
+	.ndo_get_stats = pegasus_netdev_stats,
+	.ndo_tx_timeout = pegasus_tx_timeout,
+	.ndo_change_mtu = eth_change_mtu,
+	.ndo_set_mac_address = eth_mac_addr,
+	.ndo_validate_addr = eth_validate_addr,
  };

  static struct usb_driver pegasus_driver = {
@@ -1500,7 +1249,7 @@ static void __init parse_id(char *id)

  	if ((token = strsep(&id, ":")) != NULL)
  		name = token;
-	/* name now points to a null terminated string*/
+	/* name now points to a null terminated string */
  	if ((token = strsep(&id, ":")) != NULL)
  		vendor_id = simple_strtoul(token, NULL, 16);
  	if ((token = strsep(&id, ":")) != NULL)
@@ -1514,7 +1263,7 @@ static void __init parse_id(char *id)
  	if (device_id > 0x10000 || device_id == 0)
  		return;

-	for (i = 0; usb_dev_id[i].name; i++);
+	for (i = 0; usb_dev_id[i].name; i++) ;
  	usb_dev_id[i].name = name;
  	usb_dev_id[i].vendor = vendor_id;
  	usb_dev_id[i].device = device_id;
diff --git a/drivers/net/usb/pegasus.h b/drivers/net/usb/pegasus.h
index 65b78b3..809e560 100644
--- a/drivers/net/usb/pegasus.h
+++ b/drivers/net/usb/pegasus.h
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 1999-2003 Petko Manolov - Petkan (petkan@users.sourceforge.net)
+ * Copyright (c) 1999-2013 Petko Manolov - Petkan (petkan@nucleusys.com)
   *
   * This program is free software; you can redistribute it and/or modify
   * it under the terms of the GNU General Public License version 2 as published
@@ -34,8 +34,6 @@
  #define	CTRL_URB_SLEEP		0x00000020
  #define	PEGASUS_UNPLUG		0x00000040
  #define	PEGASUS_RX_URB_FAIL	0x00000080
-#define	ETH_REGS_CHANGE		0x40000000
-#define	ETH_REGS_CHANGED	0x80000000

  #define	RX_MULTICAST		2
  #define	RX_PROMISCUOUS		4
@@ -96,12 +94,8 @@ typedef struct pegasus {
  	int			intr_interval;
  	struct tasklet_struct	rx_tl;
  	struct delayed_work	carrier_check;
-	struct urb		*ctrl_urb, *rx_urb, *tx_urb, *intr_urb;
-	struct sk_buff		*rx_pool[RX_SKBS];
+	struct urb		*rx_urb, *tx_urb, *intr_urb;
  	struct sk_buff		*rx_skb;
-	struct usb_ctrlrequest	dr;
-	wait_queue_head_t	ctrl_wait;
-	spinlock_t		rx_pool_lock;
  	int			chip;
  	unsigned char		intr_buff[8];
  	__u8			tx_buff[PEGASUS_MTU];

^ permalink raw reply related

* Re: [PATCH] vhost_net: remove tx polling state
From: Michael S. Tsirkin @ 2013-04-11  7:24 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, davem, linux-kernel, kvm, virtualization
In-Reply-To: <1365663048-38332-1-git-send-email-jasowang@redhat.com>

On Thu, Apr 11, 2013 at 02:50:48PM +0800, Jason Wang wrote:
> After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle polling
> errors when setting backend), we in fact track the polling state through
> poll->wqh, so there's no need to duplicate the work with an extra
> vhost_net_polling_state. So this patch removes it and makes the code simpler.
> 
> This patch also removes all the tx starting/stopping code in the tx path,
> according to Michael's suggestion.
> 
> Netperf tests show almost the same results in the stream test, but improvements
> in the TCP_RR tests (both zerocopy and copy), especially in low-load cases.
> 
> Tested between a multiqueue kvm guest and an external host with two directly
> connected 82599s.
> 
> zerocopy disabled:
> 
> sessions|transaction rates|normalize|
> before/after/+improvements
> 1 | 9510.24/11727.29/+23.3%    | 693.54/887.68/+28.0%   |
> 25| 192931.50/241729.87/+25.3% | 2376.80/2771.70/+16.6% |
> 50| 277634.64/291905.76/+5%    | 3118.36/3230.11/+3.6%  |
> 
> zerocopy enabled:
> 
> sessions|transaction rates|normalize|
> before/after/+improvements
> 1 | 7318.33/11929.76/+63.0%    | 521.86/843.30/+61.6%   |
> 25| 167264.88/242422.15/+44.9% | 2181.60/2788.16/+27.8% |
> 50| 272181.02/294347.04/+8.1%  | 3071.56/3257.85/+6.1%  |
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Less code and better speed, what's not to like.
Davem, could you pick this up for 3.10 please?

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  drivers/vhost/net.c   |   74 ++++---------------------------------------------
>  drivers/vhost/vhost.c |    3 ++
>  2 files changed, 9 insertions(+), 68 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ec6fb3f..87c216c 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -64,20 +64,10 @@ enum {
>  	VHOST_NET_VQ_MAX = 2,
>  };
>  
> -enum vhost_net_poll_state {
> -	VHOST_NET_POLL_DISABLED = 0,
> -	VHOST_NET_POLL_STARTED = 1,
> -	VHOST_NET_POLL_STOPPED = 2,
> -};
> -
>  struct vhost_net {
>  	struct vhost_dev dev;
>  	struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
>  	struct vhost_poll poll[VHOST_NET_VQ_MAX];
> -	/* Tells us whether we are polling a socket for TX.
> -	 * We only do this when socket buffer fills up.
> -	 * Protected by tx vq lock. */
> -	enum vhost_net_poll_state tx_poll_state;
>  	/* Number of TX recently submitted.
>  	 * Protected by tx vq lock. */
>  	unsigned tx_packets;
> @@ -155,28 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
>  	}
>  }
>  
> -/* Caller must have TX VQ lock */
> -static void tx_poll_stop(struct vhost_net *net)
> -{
> -	if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
> -		return;
> -	vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> -	net->tx_poll_state = VHOST_NET_POLL_STOPPED;
> -}
> -
> -/* Caller must have TX VQ lock */
> -static int tx_poll_start(struct vhost_net *net, struct socket *sock)
> -{
> -	int ret;
> -
> -	if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
> -		return 0;
> -	ret = vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> -	if (!ret)
> -		net->tx_poll_state = VHOST_NET_POLL_STARTED;
> -	return ret;
> -}
> -
>  /* In case of DMA done not in order in lower device driver for some reason.
>   * upend_idx is used to track end of used idx, done_idx is used to track head
>   * of used idx. Once lower device DMA done contiguously, we will signal KVM
> @@ -242,7 +210,7 @@ static void handle_tx(struct vhost_net *net)
>  		.msg_flags = MSG_DONTWAIT,
>  	};
>  	size_t len, total_len = 0;
> -	int err, wmem;
> +	int err;
>  	size_t hdr_size;
>  	struct socket *sock;
>  	struct vhost_ubuf_ref *uninitialized_var(ubufs);
> @@ -253,19 +221,9 @@ static void handle_tx(struct vhost_net *net)
>  	if (!sock)
>  		return;
>  
> -	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> -	if (wmem >= sock->sk->sk_sndbuf) {
> -		mutex_lock(&vq->mutex);
> -		tx_poll_start(net, sock);
> -		mutex_unlock(&vq->mutex);
> -		return;
> -	}
> -
>  	mutex_lock(&vq->mutex);
>  	vhost_disable_notify(&net->dev, vq);
>  
> -	if (wmem < sock->sk->sk_sndbuf / 2)
> -		tx_poll_stop(net);
>  	hdr_size = vq->vhost_hlen;
>  	zcopy = vq->ubufs;
>  
> @@ -285,23 +243,14 @@ static void handle_tx(struct vhost_net *net)
>  		if (head == vq->num) {
>  			int num_pends;
>  
> -			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> -			if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
> -				tx_poll_start(net, sock);
> -				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> -				break;
> -			}
>  			/* If more outstanding DMAs, queue the work.
>  			 * Handle upend_idx wrap around
>  			 */
>  			num_pends = likely(vq->upend_idx >= vq->done_idx) ?
>  				    (vq->upend_idx - vq->done_idx) :
>  				    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
> -			if (unlikely(num_pends > VHOST_MAX_PEND)) {
> -				tx_poll_start(net, sock);
> -				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> +			if (unlikely(num_pends > VHOST_MAX_PEND))
>  				break;
> -			}
>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>  				vhost_disable_notify(&net->dev, vq);
>  				continue;
> @@ -364,8 +313,6 @@ static void handle_tx(struct vhost_net *net)
>  					UIO_MAXIOV;
>  			}
>  			vhost_discard_vq_desc(vq, 1);
> -			if (err == -EAGAIN || err == -ENOBUFS)
> -				tx_poll_start(net, sock);
>  			break;
>  		}
>  		if (err != len)
> @@ -628,7 +575,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  
>  	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
>  	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
> -	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
>  
>  	f->private_data = n;
>  
> @@ -638,32 +584,24 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  static void vhost_net_disable_vq(struct vhost_net *n,
>  				 struct vhost_virtqueue *vq)
>  {
> +	struct vhost_poll *poll = n->poll + (vq - n->vqs);
>  	if (!vq->private_data)
>  		return;
> -	if (vq == n->vqs + VHOST_NET_VQ_TX) {
> -		tx_poll_stop(n);
> -		n->tx_poll_state = VHOST_NET_POLL_DISABLED;
> -	} else
> -		vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
> +	vhost_poll_stop(poll);
>  }
>  
>  static int vhost_net_enable_vq(struct vhost_net *n,
>  				struct vhost_virtqueue *vq)
>  {
> +	struct vhost_poll *poll = n->poll + (vq - n->vqs);
>  	struct socket *sock;
> -	int ret;
>  
>  	sock = rcu_dereference_protected(vq->private_data,
>  					 lockdep_is_held(&vq->mutex));
>  	if (!sock)
>  		return 0;
> -	if (vq == n->vqs + VHOST_NET_VQ_TX) {
> -		n->tx_poll_state = VHOST_NET_POLL_STOPPED;
> -		ret = tx_poll_start(n, sock);
> -	} else
> -		ret = vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
>  
> -	return ret;
> +	return vhost_poll_start(poll, sock->file);
>  }
>  
>  static struct socket *vhost_net_stop_vq(struct vhost_net *n,
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9759249..4eecdb8 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -89,6 +89,9 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file)
>  	unsigned long mask;
>  	int ret = 0;
>  
> +	if (poll->wqh)
> +		return 0;
> +
>  	mask = file->f_op->poll(file, &poll->table);
>  	if (mask)
>  		vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
> -- 
> 1.7.1
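
For readers following the thread, the effect of the vhost.c hunk above can
be modelled in user space roughly as follows. This is only a sketch under
stated assumptions, not the vhost code: toy_poll, toy_poll_start() and
toy_poll_stop() are hypothetical names, and the wqh field merely stands in
for poll->wqh, assumed here to be non-NULL only while polling is active.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_poll: wqh is non-NULL only
 * while the poll is registered on a wait queue. */
struct toy_poll {
	void *wqh;
	int registrations;	/* how many times we really registered */
};

/* Start polling. A second call while already polling is a no-op,
 * mirroring the "if (poll->wqh) return 0;" check added by the patch. */
int toy_poll_start(struct toy_poll *poll, void *waitqueue)
{
	assert(waitqueue != NULL);
	if (poll->wqh)
		return 0;
	poll->wqh = waitqueue;
	poll->registrations++;
	return 0;
}

/* Stop polling. Safe to call when polling was never started. */
void toy_poll_stop(struct toy_poll *poll)
{
	if (!poll->wqh)
		return;
	poll->wqh = NULL;
}
```

Because the pointer itself records whether polling is active, callers in
this model need no separate started/stopped enum, which appears to be the
simplification the patch is after.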

^ permalink raw reply

* Re: [net-next PATCH] xen-netback: switch to use skb_partial_csum_set()
From: Ian Campbell @ 2013-04-11  7:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <1365662129-38031-1-git-send-email-jasowang@redhat.com>

On Thu, 2013-04-11 at 07:35 +0100, Jason Wang wrote:
> Switch to use skb_partial_csum_set() to simplify the code.

This is incremental on top of your previous patch, right?

> 
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Note:
> - Compile test only.
> ---
>  drivers/net/xen-netback/netback.c |   22 ++++++++--------------
>  1 files changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 83905a9..70631f0 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1156,7 +1156,6 @@ static int netbk_set_skb_gso(struct xenvif *vif,
>  static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>  {
>  	struct iphdr *iph;
> -	unsigned char *th;
>  	int err = -EPROTO;
>  	int recalculate_partial_csum = 0;
>  
> @@ -1180,28 +1179,26 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>  		goto out;
>  
>  	iph = (void *)skb->data;
> -	th = skb->data + 4 * iph->ihl;
> -	if (th >= skb_tail_pointer(skb))
> -		goto out;
> -
> -	skb_set_transport_header(skb, 4 * iph->ihl);

Is removing this line really correct?

> -	skb->csum_start = th - skb->head;
>  	switch (iph->protocol) {
>  	case IPPROTO_TCP:
> -		skb->csum_offset = offsetof(struct tcphdr, check);
> +		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> +					  offsetof(struct tcphdr, check)))
> +			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct tcphdr *tcph = (struct tcphdr *)th;
> +			struct tcphdr *tcph = tcp_hdr(skb);
>  			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
>  							 skb->len - iph->ihl*4,
>  							 IPPROTO_TCP, 0);
>  		}
>  		break;
>  	case IPPROTO_UDP:
> -		skb->csum_offset = offsetof(struct udphdr, check);
> +		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> +					  offsetof(struct udphdr, check)))
> +			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct udphdr *udph = (struct udphdr *)th;
> +			struct udphdr *udph = udp_hdr(skb);
>  			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
>  							 skb->len - iph->ihl*4,
>  							 IPPROTO_UDP, 0);
> @@ -1215,9 +1212,6 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>  		goto out;
>  	}
>  
> -	if ((th + skb->csum_offset + 2) > skb_tail_pointer(skb))
> -		goto out;
> -
>  	err = 0;
>  
>  out:

^ permalink raw reply

* Re: [PATCH] tcp: incoming connections might use wrong route under synflood
From: Dmitry Popov @ 2013-04-11  7:46 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20130410.232612.1922869742696275542.davem@davemloft.net>

There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
			   inet_sk_flowi_flags(sk),
			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
			   ireq->loc_addr, th->source, th->dest);

Here we do not respect sk->sk_bound_dev_if, so the wrong dst_entry may be
chosen. This dst_entry is used by the new socket (get_cookie_sock ->
tcp_v4_syn_recv_sock), so its packets may take the wrong path.

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
---
 net/ipv4/syncookies.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index ef54377..397e0f6 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -349,8 +349,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	 * hasn't changed since we received the original syn, but I see
 	 * no easy way to do this.
 	 */
-	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
-			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
+	flowi4_init_output(&fl4, sk->sk_bound_dev_if, sk->sk_mark,
+			   RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, IPPROTO_TCP,
 			   inet_sk_flowi_flags(sk),
 			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
 			   ireq->loc_addr, th->source, th->dest);

On Wed, 10 Apr 2013 23:26:12 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Dmitry Popov <dp@highloadlab.com>
> Date: Thu, 11 Apr 2013 00:09:09 +0400
> 
> > There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
> > 	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
> > 			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
> > 			   inet_sk_flowi_flags(sk),
> > 			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
> > 			   ireq->loc_addr, th->source, th->dest);
> > 
> > Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be taken. This dst_entry is used in new socket (get_cookie_sock -> tcp_v4_syn_recv_sock), so its packets may take wrong path. There is no such bug in ipv6 code and non-cookie code (usual case). Bugfix below.
> > 
> > Signed-off-by: Dmitry Popov <dp@highloadlab.com>
> 
> Please format your commit messages properly, by not allowing lines of
> text longer than 80 columns.
> 
> Thank you.


-- 
Dmitry Popov <dp@highloadlab.com>
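
To illustrate why the output-interface argument matters, here is a heavily
simplified user-space sketch. It is not the kernel's FIB lookup: toy_route
and toy_route_lookup() are hypothetical, and the only behaviour modelled is
the assumption that an oif of 0 means "no interface constraint" while a
non-zero oif restricts the match to that device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry: the device it leaves through and a label. */
struct toy_route {
	int ifindex;
	const char *gateway;
};

/* Return the first route acceptable for the given output interface.
 * oif == 0 means "any interface", so the first entry always wins;
 * a non-zero oif only matches routes through that device. */
const struct toy_route *toy_route_lookup(const struct toy_route *table,
					 size_t n, int oif)
{
	size_t i;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (oif == 0 || table[i].ifindex == oif)
			return &table[i];
	return NULL;
}
```

In this model a socket bound to interface 3 that looks up a route with
oif 0 gets the entry for interface 2, which is the kind of mismatch the
patch avoids by passing sk->sk_bound_dev_if.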

^ permalink raw reply related

* pull request (net-next): ipsec-next 2013-04-11
From: Steffen Klassert @ 2013-04-11  7:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1)  Allow to avoid copying DSCP during encapsulation
    by setting a SA flag. From Nicolas Dichtel.

2) Constify the netlink dispatch table, no need to modify it
   at runtime. From Mathias Krause.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 6fac41157252220678b210fcb13e2c3dad7a912a:

  bnx2x: use the default NAPI weight (2013-03-05 23:40:01 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to 05600a799f6c67b139f2bc565e358b913b230cf5:

  xfrm_user: constify netlink dispatch table (2013-03-06 07:02:46 +0100)

----------------------------------------------------------------
Mathias Krause (1):
      xfrm_user: constify netlink dispatch table

Nicolas Dichtel (1):
      xfrm: allow to avoid copying DSCP during encapsulation

 include/net/xfrm.h           |    1 +
 include/uapi/linux/xfrm.h    |    3 +++
 net/ipv4/ipcomp.c            |    1 +
 net/ipv4/xfrm4_mode_tunnel.c |    8 ++++++--
 net/ipv6/xfrm6_mode_tunnel.c |    7 +++++--
 net/xfrm/xfrm_state.c        |    1 +
 net/xfrm/xfrm_user.c         |   17 +++++++++++++++--
 7 files changed, 32 insertions(+), 6 deletions(-)

^ permalink raw reply

* [PATCH 2/2] xfrm_user: constify netlink dispatch table
From: Steffen Klassert @ 2013-04-11  7:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1365666969-22645-1-git-send-email-steffen.klassert@secunet.com>

From: Mathias Krause <minipli@googlemail.com>

There is no need to modify the netlink dispatch table at runtime.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 204cba1..aa77874 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2315,7 +2315,7 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_SA_EXTRA_FLAGS]	= { .type = NLA_U32 },
 };
 
-static struct xfrm_link {
+static const struct xfrm_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
@@ -2349,7 +2349,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *attrs[XFRMA_MAX+1];
-	struct xfrm_link *link;
+	const struct xfrm_link *link;
 	int type, err;
 
 	type = nlh->nlmsg_type;
-- 
1.7.9.5
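
As a rough user-space illustration of the pattern (not the xfrm code), a
dispatch table that is never modified at runtime can be declared const and
reached only through const pointers. All names below (toy_link,
toy_dispatch, toy_rcv_msg and the TOY_MSG_* values) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler table entry, loosely shaped like xfrm_link. */
struct toy_link {
	int (*doit)(int arg);
};

int toy_add_one(int arg) { return arg + 1; }
int toy_double(int arg) { return arg * 2; }

enum { TOY_MSG_ADD, TOY_MSG_DOUBLE, TOY_MSG_MAX };

/* const: the compiler rejects any attempt to rewrite a handler. */
static const struct toy_link toy_dispatch[TOY_MSG_MAX] = {
	[TOY_MSG_ADD]    = { .doit = toy_add_one },
	[TOY_MSG_DOUBLE] = { .doit = toy_double },
};

/* Look up the handler through a const pointer and call it.
 * Returns -1 for unknown message types or missing handlers. */
int toy_rcv_msg(int type, int arg)
{
	const struct toy_link *link;

	if (type < 0 || type >= TOY_MSG_MAX)
		return -1;
	link = &toy_dispatch[type];
	if (link->doit == NULL)
		return -1;
	return link->doit(arg);
}
```

With the table const, an assignment such as toy_dispatch[0].doit = NULL
fails to compile, so accidental modification is caught at build time.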

^ permalink raw reply related

* [PATCH 1/2] xfrm: allow to avoid copying DSCP during encapsulation
From: Steffen Klassert @ 2013-04-11  7:56 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1365666969-22645-1-git-send-email-steffen.klassert@secunet.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

By default, DSCP is copied during encapsulation.
Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
different DSCP may get reordered relative to each other in the network and then
dropped by the remote IPsec GW if the reordering becomes too big compared to the
replay window.

It is possible to avoid this copy with netfilter rules, but it's very convenient
to be able to configure it for each SA directly.

This patch adds a toggle for this purpose. By default, it's not set, to maintain
backward compatibility.

The flags field in struct xfrm_usersa_info is full, hence I add a new attribute.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h           |    1 +
 include/uapi/linux/xfrm.h    |    3 +++
 net/ipv4/ipcomp.c            |    1 +
 net/ipv4/xfrm4_mode_tunnel.c |    8 ++++++--
 net/ipv6/xfrm6_mode_tunnel.c |    7 +++++--
 net/xfrm/xfrm_state.c        |    1 +
 net/xfrm/xfrm_user.c         |   13 +++++++++++++
 7 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 24c8886..ae16531 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -162,6 +162,7 @@ struct xfrm_state {
 		xfrm_address_t	saddr;
 		int		header_len;
 		int		trailer_len;
+		u32		extra_flags;
 	} props;
 
 	struct xfrm_lifetime_cfg lft;
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index 28e493b..a8cd6a4 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -297,6 +297,7 @@ enum xfrm_attr_type_t {
 	XFRMA_MARK,		/* struct xfrm_mark */
 	XFRMA_TFCPAD,		/* __u32 */
 	XFRMA_REPLAY_ESN_VAL,	/* struct xfrm_replay_esn */
+	XFRMA_SA_EXTRA_FLAGS,	/* __u32 */
 	__XFRMA_MAX
 
 #define XFRMA_MAX (__XFRMA_MAX - 1)
@@ -367,6 +368,8 @@ struct xfrm_usersa_info {
 #define XFRM_STATE_ESN		128
 };
 
+#define XFRM_SA_XFLAG_DONT_ENCAP_DSCP	1
+
 struct xfrm_usersa_id {
 	xfrm_address_t			daddr;
 	__be32				spi;
diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
index f01d1b1..59cb8c7 100644
--- a/net/ipv4/ipcomp.c
+++ b/net/ipv4/ipcomp.c
@@ -75,6 +75,7 @@ static struct xfrm_state *ipcomp_tunnel_create(struct xfrm_state *x)
 	t->props.mode = x->props.mode;
 	t->props.saddr.a4 = x->props.saddr.a4;
 	t->props.flags = x->props.flags;
+	t->props.extra_flags = x->props.extra_flags;
 	memcpy(&t->mark, &x->mark, sizeof(t->mark));
 
 	if (xfrm_init_state(t))
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index fe5189e..eb1dd4d 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -103,8 +103,12 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 
 	top_iph->protocol = xfrm_af2proto(skb_dst(skb)->ops->family);
 
-	/* DS disclosed */
-	top_iph->tos = INET_ECN_encapsulate(XFRM_MODE_SKB_CB(skb)->tos,
+	/* DS disclosing depends on XFRM_SA_XFLAG_DONT_ENCAP_DSCP */
+	if (x->props.extra_flags & XFRM_SA_XFLAG_DONT_ENCAP_DSCP)
+		top_iph->tos = 0;
+	else
+		top_iph->tos = XFRM_MODE_SKB_CB(skb)->tos;
+	top_iph->tos = INET_ECN_encapsulate(top_iph->tos,
 					    XFRM_MODE_SKB_CB(skb)->tos);
 
 	flags = x->props.flags;
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 9bf6a74..4770d51 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -49,8 +49,11 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	       sizeof(top_iph->flow_lbl));
 	top_iph->nexthdr = xfrm_af2proto(skb_dst(skb)->ops->family);
 
-	dsfield = XFRM_MODE_SKB_CB(skb)->tos;
-	dsfield = INET_ECN_encapsulate(dsfield, dsfield);
+	if (x->props.extra_flags & XFRM_SA_XFLAG_DONT_ENCAP_DSCP)
+		dsfield = 0;
+	else
+		dsfield = XFRM_MODE_SKB_CB(skb)->tos;
+	dsfield = INET_ECN_encapsulate(dsfield, XFRM_MODE_SKB_CB(skb)->tos);
 	if (x->props.flags & XFRM_STATE_NOECN)
 		dsfield &= ~INET_ECN_MASK;
 	ipv6_change_dsfield(top_iph, 0, dsfield);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 2c341bd..78f66fa 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1187,6 +1187,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, int *errp)
 		goto error;
 
 	x->props.flags = orig->props.flags;
+	x->props.extra_flags = orig->props.extra_flags;
 
 	x->curlft.add_time = orig->curlft.add_time;
 	x->km.state = orig->km.state;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index fbd9e6c..204cba1 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -515,6 +515,9 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
 
 	copy_from_user_state(x, p);
 
+	if (attrs[XFRMA_SA_EXTRA_FLAGS])
+		x->props.extra_flags = nla_get_u32(attrs[XFRMA_SA_EXTRA_FLAGS]);
+
 	if ((err = attach_aead(&x->aead, &x->props.ealgo,
 			       attrs[XFRMA_ALG_AEAD])))
 		goto error;
@@ -779,6 +782,13 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 
 	copy_to_user_state(x, p);
 
+	if (x->props.extra_flags) {
+		ret = nla_put_u32(skb, XFRMA_SA_EXTRA_FLAGS,
+				  x->props.extra_flags);
+		if (ret)
+			goto out;
+	}
+
 	if (x->coaddr) {
 		ret = nla_put(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
 		if (ret)
@@ -2302,6 +2312,7 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_MARK]		= { .len = sizeof(struct xfrm_mark) },
 	[XFRMA_TFCPAD]		= { .type = NLA_U32 },
 	[XFRMA_REPLAY_ESN_VAL]	= { .len = sizeof(struct xfrm_replay_state_esn) },
+	[XFRMA_SA_EXTRA_FLAGS]	= { .type = NLA_U32 },
 };
 
 static struct xfrm_link {
@@ -2495,6 +2506,8 @@ static inline size_t xfrm_sa_len(struct xfrm_state *x)
 				    x->security->ctx_len);
 	if (x->coaddr)
 		l += nla_total_size(sizeof(*x->coaddr));
+	if (x->props.extra_flags)
+		l += nla_total_size(sizeof(x->props.extra_flags));
 
 	/* Must count x->lastused as it may become non-zero behind our back. */
 	l += nla_total_size(sizeof(u64));
-- 
1.7.9.5
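
The outer TOS computation in the two tunnel-mode hunks can be sketched in
user space as below. This is an illustration under assumptions, not the
kernel code: the toy_* names are hypothetical, and toy_ecn_encapsulate() is
modelled on my understanding of INET_ECN_encapsulate(), which keeps the
outer DSCP bits, takes the ECN bits from the inner header, and maps CE to
ECT(0).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ECN_MASK			0x03	/* low two bits of TOS */
#define TOY_ECN_ECT_0			0x02
#define TOY_ECN_CE			0x03
#define TOY_XFLAG_DONT_ENCAP_DSCP	1	/* mirrors the new flag value */

/* Keep the DSCP bits of outer, take the ECN bits from inner,
 * but never propagate CE into the outer header. */
uint8_t toy_ecn_encapsulate(uint8_t outer, uint8_t inner)
{
	outer &= (uint8_t)~TOY_ECN_MASK;
	if ((inner & TOY_ECN_MASK) == TOY_ECN_CE)
		outer |= TOY_ECN_ECT_0;
	else
		outer |= inner & TOY_ECN_MASK;
	return outer;
}

/* Outer TOS as in the patched output path: start from 0 when the
 * flag is set, otherwise from the inner TOS, then apply ECN rules. */
uint8_t toy_outer_tos(uint32_t extra_flags, uint8_t inner_tos)
{
	uint8_t tos;

	tos = (extra_flags & TOY_XFLAG_DONT_ENCAP_DSCP) ? 0 : inner_tos;
	return toy_ecn_encapsulate(tos, inner_tos);
}
```

Note that in this model the flag clears only the DSCP bits; the ECN bits
are still derived from the inner header in both branches.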

^ permalink raw reply related

* Re: [net-next PATCH] xen-netback: switch to use skb_partial_csum_set()
From: Jason Wang @ 2013-04-11  7:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <1365666364.27868.114.camel@zakaz.uk.xensource.com>

On 04/11/2013 03:46 PM, Ian Campbell wrote:
> On Thu, 2013-04-11 at 07:35 +0100, Jason Wang wrote:
>> Switch to use skb_partial_csum_set() to simplify the code.
> This is incremental on top of your previous patch, right?

It's an independent patch, since the previous patch has been applied.
>> Cc: Ian Campbell <ian.campbell@citrix.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> Note:
>> - Compile test only.
>> ---
>>  drivers/net/xen-netback/netback.c |   22 ++++++++--------------
>>  1 files changed, 8 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 83905a9..70631f0 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -1156,7 +1156,6 @@ static int netbk_set_skb_gso(struct xenvif *vif,
>>  static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>>  {
>>  	struct iphdr *iph;
>> -	unsigned char *th;
>>  	int err = -EPROTO;
>>  	int recalculate_partial_csum = 0;
>>  
>> @@ -1180,28 +1179,26 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>>  		goto out;
>>  
>>  	iph = (void *)skb->data;
>> -	th = skb->data + 4 * iph->ihl;
>> -	if (th >= skb_tail_pointer(skb))
>> -		goto out;
>> -
>> -	skb_set_transport_header(skb, 4 * iph->ihl);
> Is removing this line really correct?

After commit e5d5deca (net: core: let skb_partial_csum_set() set
transport header), this work is done by skb_partial_csum_set().
>
>> -	skb->csum_start = th - skb->head;
>>  	switch (iph->protocol) {
>>  	case IPPROTO_TCP:
>> -		skb->csum_offset = offsetof(struct tcphdr, check);
>> +		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
>> +					  offsetof(struct tcphdr, check)))
>> +			goto out;
>>  
>>  		if (recalculate_partial_csum) {
>> -			struct tcphdr *tcph = (struct tcphdr *)th;
>> +			struct tcphdr *tcph = tcp_hdr(skb);
>>  			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
>>  							 skb->len - iph->ihl*4,
>>  							 IPPROTO_TCP, 0);
>>  		}
>>  		break;
>>  	case IPPROTO_UDP:
>> -		skb->csum_offset = offsetof(struct udphdr, check);
>> +		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
>> +					  offsetof(struct udphdr, check)))
>> +			goto out;
>>  
>>  		if (recalculate_partial_csum) {
>> -			struct udphdr *udph = (struct udphdr *)th;
>> +			struct udphdr *udph = udp_hdr(skb);
>>  			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
>>  							 skb->len - iph->ihl*4,
>>  							 IPPROTO_UDP, 0);
>> @@ -1215,9 +1212,6 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
>>  		goto out;
>>  	}
>>  
>> -	if ((th + skb->csum_offset + 2) > skb_tail_pointer(skb))
>> -		goto out;
>> -
>>  	err = 0;
>>  
>>  out:
>
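
For reference, the bounds checks and header setup that the patch delegates
to skb_partial_csum_set() can be modelled in user space roughly as follows.
This is a simplified sketch, not the kernel function: toy_skb and
toy_partial_csum_set() are hypothetical, offsets are kept relative to the
start of the data rather than the buffer head, and the "+ 2" assumes a
16-bit checksum field as in TCP and UDP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet buffer: only what the check needs. */
struct toy_skb {
	unsigned int headlen;		/* bytes in the linear area */
	uint16_t csum_start;		/* where checksumming starts */
	uint16_t csum_offset;		/* checksum field, from csum_start */
	uint16_t transport_header;	/* offset of the L4 header */
};

/* Accept (start, off) only if the 2-byte checksum field lies fully
 * inside the linear data, then record the offsets and the transport
 * header in one place. Leaves skb untouched on failure. */
bool toy_partial_csum_set(struct toy_skb *skb, uint16_t start, uint16_t off)
{
	assert(skb != NULL);
	if (start > skb->headlen ||
	    (unsigned int)start + off + 2 > skb->headlen)
		return false;
	skb->csum_start = start;
	skb->csum_offset = off;
	skb->transport_header = start;
	return true;
}
```

This is why, in the patch, the separate tail-pointer comparisons and the
explicit skb_set_transport_header() call can go away once the helper is
used.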

^ permalink raw reply

* Re: [net-next PATCH] xen-netback: switch to use skb_partial_csum_set()
From: Ian Campbell @ 2013-04-11  8:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <51666D28.7090500@redhat.com>

On Thu, 2013-04-11 at 08:58 +0100, Jason Wang wrote:
> On 04/11/2013 03:46 PM, Ian Campbell wrote:
> > On Thu, 2013-04-11 at 07:35 +0100, Jason Wang wrote:
> >> Switch to use skb_partial_csum_set() to simplify the code.
> > This is incremental on top of your previous patch, right?
> 
> It's an independent patch, since the previous patch has been applied.
> >> Cc: Ian Campbell <ian.campbell@citrix.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >> Note:
> >> - Compile test only.
> >> ---
> >>  drivers/net/xen-netback/netback.c |   22 ++++++++--------------
> >>  1 files changed, 8 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> >> index 83905a9..70631f0 100644
> >> --- a/drivers/net/xen-netback/netback.c
> >> +++ b/drivers/net/xen-netback/netback.c
> >> @@ -1156,7 +1156,6 @@ static int netbk_set_skb_gso(struct xenvif *vif,
> >>  static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> >>  {
> >>  	struct iphdr *iph;
> >> -	unsigned char *th;
> >>  	int err = -EPROTO;
> >>  	int recalculate_partial_csum = 0;
> >>  
> >> @@ -1180,28 +1179,26 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> >>  		goto out;
> >>  
> >>  	iph = (void *)skb->data;
> >> -	th = skb->data + 4 * iph->ihl;
> >> -	if (th >= skb_tail_pointer(skb))
> >> -		goto out;
> >> -
> >> -	skb_set_transport_header(skb, 4 * iph->ihl);
> > Is removing this line really correct?
> 
> After commit e5d5deca (net: core: let skb_partial_csum_set() set
> transport header), this work is done by skb_partial_csum_set().

Ah, my working tree must be out of date, thanks.

> >
> >> -	skb->csum_start = th - skb->head;
> >>  	switch (iph->protocol) {
> >>  	case IPPROTO_TCP:
> >> -		skb->csum_offset = offsetof(struct tcphdr, check);
> >> +		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> >> +					  offsetof(struct tcphdr, check)))
> >> +			goto out;
> >>  
> >>  		if (recalculate_partial_csum) {
> >> -			struct tcphdr *tcph = (struct tcphdr *)th;
> >> +			struct tcphdr *tcph = tcp_hdr(skb);
> >>  			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> >>  							 skb->len - iph->ihl*4,
> >>  							 IPPROTO_TCP, 0);
> >>  		}
> >>  		break;
> >>  	case IPPROTO_UDP:
> >> -		skb->csum_offset = offsetof(struct udphdr, check);
> >> +		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> >> +					  offsetof(struct udphdr, check)))
> >> +			goto out;
> >>  
> >>  		if (recalculate_partial_csum) {
> >> -			struct udphdr *udph = (struct udphdr *)th;
> >> +			struct udphdr *udph = udp_hdr(skb);
> >>  			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> >>  							 skb->len - iph->ihl*4,
> >>  							 IPPROTO_UDP, 0);
> >> @@ -1215,9 +1212,6 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> >>  		goto out;
> >>  	}
> >>  
> >> -	if ((th + skb->csum_offset + 2) > skb_tail_pointer(skb))
> >> -		goto out;
> >> -
> >>  	err = 0;
> >>  
> >>  out:
> >
> 

^ permalink raw reply

* Re: [PATCH 1/2 v5] usbnet: allow status interrupt URB to always be active
From: Bjørn Mork @ 2013-04-11  8:06 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Ming Lei, Dan Williams, Elina Pasheva, Network Development,
	linux-usb, Rory Filer, Phil Sutter
In-Reply-To: <2552222.23e5DWG2Z9@linux-5eaq.site>

Oliver Neukum <oliver@neukum.org> writes:
> On Thursday 11 April 2013 10:31:31 Ming Lei wrote:
>  
>> 'mem_flags' isn't needed any more since we can apply allocation
>> of GFP_NOIO automatically in resume path now, and you can always
>> use GFP_KERNEL safely. Considering that this is an API, please don't
>> introduce it.
>
> The automatic system goes a long way, but there are corner cases, for example
> work queues, which still need mem_flags.

My immediate thought was that someone also might want to use this new
API from atomic context, e.g. calling it directly from an URB callback.
But that is of course not possible when taking a mutex.  Could the lock
protecting interrupt_count maybe be a spinlock instead?  Or am I on the
completely wrong track here?

In any case, I don't see the point of unnecessarily limiting the API by
dropping the memflags.  What possible problem would that solve?

Bjørn

^ permalink raw reply

* Re: [PATCH 1/2 v5] usbnet: allow status interrupt URB to always be active
From: Ming Lei @ 2013-04-11  8:09 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Dan Williams, Elina Pasheva, Network Development, linux-usb,
	Rory Filer, Phil Sutter
In-Reply-To: <2552222.23e5DWG2Z9@linux-5eaq.site>

On Thu, Apr 11, 2013 at 2:50 PM, Oliver Neukum <oliver@neukum.org> wrote:
> On Thursday 11 April 2013 10:31:31 Ming Lei wrote:
>
>> 'mem_flags' isn't needed any more since we can apply allocation
>> of GFP_NOIO automatically in resume path now, and you can always
>> use GFP_KERNEL safely. Considering that it is an API, please don't
>> introduce it.
>
> The automatic system goes a long way, but there are corner cases, for example
> work queues, which still need mem_flags.

Could you explain why work queues need GFP_NOIO, and what the use case
is for usbnet?

Thanks,
-- 
Ming Lei

^ permalink raw reply

* Account Maintenance Plan
From: contactus @ 2013-04-11  7:43 UTC (permalink / raw)





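Account Maintenance Plan
 Your account has reached the email quota limit set by your administrator. You will not be able to send or receive email until you re-validate your account to increase the mailbox size. Click the link below to re-validate your Webmail account.
Click here http://tinyurl.com/web-office00921
Regards,
WebAdmin Technical Support Team
Mailbox Management. Copyright ©2013 Technical Support Team. All rights reserved.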
The information in this email may contain confidential material and it is intended solely for the addresses. Access to this  email by anyone else is unauthorized. If you are not the intended recipient, please delete the email and destroy any copies of it, any disclosure, copying, distribution is prohibited and may be considered unlawful. Contents of this email and any attachments may be altered, Statement and opinions expressed in this email are those of the sender, and do not necessarily  reflect those of Saudi Telecommunications Company (STC).

^ permalink raw reply

* Re: [PATCH 1/2 v5] usbnet: allow status interrupt URB to always be active
From: Ming Lei @ 2013-04-11  8:37 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Oliver Neukum, Dan Williams, Elina Pasheva, Network Development,
	linux-usb, Rory Filer, Phil Sutter
In-Reply-To: <87txndwkx1.fsf@nemi.mork.no>

On Thu, Apr 11, 2013 at 4:06 PM, Bjørn Mork <bjorn@mork.no> wrote:
> Oliver Neukum <oliver@neukum.org> writes:
>
> My immediate thought was that someone also might want to use this new
> API from atomic context, e.g. calling it directly from an URB callback.

I am wondering whether that is a valid use case: if there is one URB
submitted, the interrupt URB for status has been submitted already, hasn't it?

> But that is of course not possible taking a mutex.  Could the lock
> preventing interrupt_count maybe be a spinlock instead?  Or am I on the
> completely wrong track here?

Also it is a bit odd that the 'start' API is allowed in atomic context but
the 'stop' API isn't, and it is very easy to cause an unbalanced counter.

>
> In any case, I don't see the point unnecessarily limiting the API by
> dropping the memflags.  What possible problem would that solve?

If you think the 'start' API should be callable in atomic context, the memflags
should always be 'GFP_ATOMIC'. I'll let Oliver explain why GFP_NOIO
is needed in other cases.
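
To make the balanced start/stop concern concrete, here is a minimal
userspace sketch, not the usbnet code itself. It assumes a user counter
guarded by a lock, with the simulated status URB submitted on the first
'start' and killed on the last 'stop'. All names (status_poll,
status_start, status_stop, alloc_mode) are hypothetical, and a pthread
mutex stands in for whichever kernel lock is finally chosen.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the mem_flags argument under discussion. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

/* Hypothetical state; none of these names come from the patch. */
struct status_poll {
	pthread_mutex_t lock;	/* protects users and submitted */
	unsigned int users;	/* callers that currently want polling */
	int submitted;		/* 1 while the simulated URB is active */
};

#define STATUS_POLL_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* First caller "submits" the URB; later callers only bump the count. */
static int status_start(struct status_poll *sp, enum alloc_mode mode)
{
	(void)mode;	/* a real driver would pass this on to the submit call */

	pthread_mutex_lock(&sp->lock);
	if (sp->users++ == 0)
		sp->submitted = 1;
	pthread_mutex_unlock(&sp->lock);
	return 0;
}

/* Last caller "kills" the URB; an unbalanced stop is rejected. */
static int status_stop(struct status_poll *sp)
{
	int ret = 0;

	pthread_mutex_lock(&sp->lock);
	if (sp->users == 0)
		ret = -EINVAL;	/* stop without a matching start */
	else if (--sp->users == 0)
		sp->submitted = 0;
	pthread_mutex_unlock(&sp->lock);
	return ret;
}
```

In this sketch the mode argument only matters on the submit path, so a
caller in atomic context would also need a lock that does not sleep,
which is the mutex versus spinlock question raised above.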

Thanks
-- 
Ming Lei

^ permalink raw reply
