Netdev List


* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-15 13:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, zhigang gong, netdev
In-Reply-To: <m3hbnfzsye.fsf@ursa.amorsen.dk>

Benny Amorsen <benny+usenet@amorsen.dk> writes:

> I'll keep monitoring the server, and if it starts dropping packets again
> or load increases I'll check whether irqbalanced does the right thing,
> and if not I'll implement your suggestion.

It did start dropping packets (although very few, a few packets dropped
at once perhaps every ten minutes). Irqbalanced didn't move the
interrupts.

Doing

echo 01 >/proc/irq/99/smp_affinity
echo 02 >/proc/irq/100/smp_affinity
echo 04 >/proc/irq/101/smp_affinity

and so on like Eric Dumazet suggested seems to have helped, but not
entirely solved the problem.
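
For reference, the mask values written above follow a simple pattern: each IRQ is pinned to one CPU by setting only that CPU's bit. A minimal C sketch of the calculation, assuming at most 32 CPUs; the helper name irq_affinity_mask is made up for illustration and is not part of any tool:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the value to write (in hex) to
 * /proc/irq/<N>/smp_affinity so that the IRQ is delivered to a single CPU.
 * Bit n of the mask stands for CPU n. This sketch only handles CPUs 0..31.
 */
uint32_t irq_affinity_mask(unsigned int cpu)
{
    assert(cpu < 32);
    return UINT32_C(1) << cpu;
}
```

With IRQs 99, 100 and 101 assigned to CPUs 0, 1 and 2, this gives 0x1, 0x2 and 0x4, matching the echo commands; a fourth queue on CPU 3 would get 0x8.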

The problem now manifests itself this way in ethtool -S:
     rx_no_buffer_count: 270
     rx_queue_drop_packet_count: 270

I can't be sure that I'm not just getting hit by a 1Gbps traffic spike,
of course, but it is a bit strange that a machine which can do 200Mbps
at 92% idle can't handle subsecond peaks close to 1Gbps...

I wish ifstat could report errors so I could see what the traffic rate
was when the problem occurred...

/Benny

^ permalink raw reply

* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-15 13:42 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: Changli Gao, zhigang gong, netdev
In-Reply-To: <m3633szw61.fsf@ursa.amorsen.dk>

On Thursday 15 April 2010 at 15:23 +0200, Benny Amorsen wrote:
> Benny Amorsen <benny+usenet@amorsen.dk> writes:
> 
> > I'll keep monitoring the server, and if it starts dropping packets again
> > or load increases I'll check whether irqbalanced does the right thing,
> > and if not I'll implement your suggestion.
> 
> It did start dropping packets (although very few, a few packets dropped
> at once perhaps every ten minutes). Irqbalanced didn't move the
> interrupts.
> 
> Doing
> 
> echo 01 >/proc/irq/99/smp_affinity
> echo 02 >/proc/irq/100/smp_affinity
> echo 04 >/proc/irq/101/smp_affinity
> 
> and so on like Eric Dumazet suggested seems to have helped, but not
> entirely solved the problem.
> 
> The problem now manifests itself this way in ethtool -S:
>      rx_no_buffer_count: 270
>      rx_queue_drop_packet_count: 270
> 
> I can't be sure that I'm not just getting hit by a 1Gbps traffic spike,
> of course, but it is a bit strange that a machine which can do 200Mbps
> at 92% idle can't handle subsecond peaks close to 1Gbps...
> 

Even with multiqueue, it's quite possible that one queue gets more than one
packet per microsecond. The time to process a packet might be greater than
1 us even on recent hardware. So a burst of 1000 small packets with the same
flow information hits one queue and one CPU, and fills the rx ring.
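
A rough back-of-the-envelope version of this argument, as a C sketch. It assumes minimum-size 64-byte Ethernet frames, a 256-entry rx ring and a simple fluid model of arrivals and processing; none of these figures come from the thread, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame overhead on the wire: 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Nanoseconds one frame occupies on a link of the given speed (in Mbit/s). */
uint64_t frame_wire_time_ns(uint64_t frame_bytes, uint64_t link_mbps)
{
    assert(link_mbps > 0);
    return (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8 * 1000 / link_mbps;
}

/*
 * Fluid model (an assumption, not driver code): packets arrive every
 * arrival_ns and the CPU drains one every service_ns. Returns how many
 * back-to-back packets it takes to fill a ring of ring_size descriptors,
 * or 0 if the CPU keeps up and the ring never fills.
 */
uint64_t burst_to_fill_ring(uint64_t ring_size, uint64_t arrival_ns,
                            uint64_t service_ns)
{
    if (service_ns <= arrival_ns)
        return 0;
    uint64_t gap = service_ns - arrival_ns;
    /* Backlog after n arrivals is about n * gap / service_ns; round up. */
    return (ring_size * service_ns + gap - 1) / gap;
}
```

Under these assumptions a 64-byte frame takes 672 ns on a 1 Gbps link, so with 1 us of processing per packet a 256-entry ring overflows after a burst of about 780 packets, which is consistent with the order of magnitude mentioned above.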

Losing these packets is OK; it's very likely an attack :)

> I wish ifstat could report errors so I could see what the traffic rate
> was when the problem occurred...

yes, it could be added I guess.



^ permalink raw reply

* "kernel:nf_ct_icmp: bad HW ICMP checksum" too noisy
From: Benny Amorsen @ 2010-04-15 14:25 UTC (permalink / raw)
  To: netdev

Would it be possible to lower the log level of "kernel:nf_ct_icmp: bad
HW ICMP checksum" so that they don't show up on the console? Obviously I
could configure rsyslog to never send anything to the console, but then
I might miss something which is actually critical.

It's obviously nice to know about corrupted ICMP on a controlled LAN,
but on the open Internet you can't really do anything about it.

/Benny

^ permalink raw reply

* Re: "kernel:nf_ct_icmp: bad HW ICMP checksum" too noisy
From: Patrick McHardy @ 2010-04-15 14:34 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: netdev, Netfilter Development Mailinglist
In-Reply-To: <m31vegztav.fsf@ursa.amorsen.dk>

Benny Amorsen wrote:
> Would it be possible to lower the log level of "kernel:nf_ct_icmp: bad
> HW ICMP checksum" so that they don't show up on the console? Obviously I
> could configure rsyslog to never send anything to the console, but then
> I might miss something which is actually critical.

Yeah, I guess defaulting to KERN_EMERG wasn't the best choice :)
I'll lower it to something reasonable - I guess KERN_NOTICE would
be appropriate.
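
To illustrate why the level matters: the kernel prints a message on the console only when its level is numerically lower than the current console loglevel, so a KERN_EMERG message (level 0) gets through at any usual setting while KERN_NOTICE (level 5) can be filtered. A minimal sketch of that comparison follows; the enum and function names are made up for illustration, only the numeric levels mirror the kernel's KERN_* levels.

```c
#include <assert.h>

/* Numeric values match the kernel's KERN_* levels; the names are hypothetical. */
enum toy_loglevel {
    TOY_EMERG = 0, TOY_ALERT, TOY_CRIT, TOY_ERR,
    TOY_WARNING, TOY_NOTICE, TOY_INFO, TOY_DEBUG
};

/* A message reaches the console only if its level is below console_loglevel. */
int toy_shown_on_console(enum toy_loglevel msg, int console_loglevel)
{
    assert(console_loglevel >= 0);
    return (int)msg < console_loglevel;
}
```

For example, with a console loglevel of 4 a notice-level message stays off the console, whereas an emergency-level message is still printed even at a console loglevel of 1.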

> It's obviously nice to know about corrupted ICMP on a controlled LAN,
> but on the open Internet you can't really do anything about it.

You should only see that message when nf_conntrack_log_invalid is
active.

^ permalink raw reply

* Re: [RFC][PATCH v3 1/3] A device for zero-copy based on KVM virtio-net.
From: Arnd Bergmann @ 2010-04-15 15:06 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, jdike@linux.intel.com
In-Reply-To: <97F6D3BD476C464182C1B7BABF0B0AF5C18969A5@shzsmsx502.ccr.corp.intel.com>

On Thursday 15 April 2010, Xin, Xiaohui wrote:
> 
> >It seems that you are duplicating a lot of functionality that
> >is already in macvtap. I've asked about this before but then
> >didn't look at your newer versions. Can you explain the value
> >of introducing another interface to user land?
> 
> >I'm still planning to add zero-copy support to macvtap,
> >hopefully reusing parts of your code, but do you think there
> >is value in having both?
> 
> I have not looked into your macvtap code in detail before.
> Are the two interfaces exactly the same? We just want to create a simple
> way to do zero-copy. Now it can only support vhost, but in future
> we also want it to support direct read/write operations from user space too.

Right now, the features are mostly distinct. Macvtap first of all provides
a "tap" style interface for users, and can also be used by vhost-net.
It also provides a way to share a NIC among a number of guests by software,
though I intend to add support for VMDq and SR-IOV as well. Zero-copy
is also not yet done in macvtap but should be added.

mpassthru right now does not allow sharing a NIC between guests, and
does not have a tap interface for non-vhost operation, but does the
zero-copy that is missing in macvtap.

> Basically, compared to the interface, I'm more worried about the modification
> to net core we have made to implement zero-copy now. If this hardest part
> can be done, then any user space interface modifications or integrations can
> be done more easily after that.

I agree that the network stack modifications are the hard part for zero-copy,
and your work on that looks very promising and is complementary to what I've
done with macvtap. Your current user interface looks good for testing this out,
but I think we should not merge it (the interface) upstream if we can get the
same or better result by integrating your buffer management code into macvtap.

I can try to merge your code into macvtap myself if you agree, so you
can focus on getting the internals right.

> >Not sure what I'm missing, but who calls the vq->receiver? This seems
> >to be neither in the upstream version of vhost nor introduced by your
> >patch.
> 
> See Patch v3 2/3 I have sent out, it is called by handle_rx() in vhost.

Ok, I see. As a general rule, it's preferred to split a patch series
in a way that makes it possible to apply each patch separately and still
get a working kernel, ideally with more features than the version before
the patch. I believe you could get there by reordering your patches to
make the actual driver the last one in the series.

Not a big problem though, I was mostly looking in the wrong place.

> >> +		ifr.ifr_name[IFNAMSIZ-1] = '\0';
> >> +
> >> +		ret = -EBUSY;
> >> +
> >> +		if (ifr.ifr_flags & IFF_MPASSTHRU_EXCL)
> >> +			break;
> 
> >Your current use of the IFF_MPASSTHRU* flags does not seem to make
> >any sense whatsoever. You check that this flag is never set, but set
> >it later yourself and then ignore all flags.
> 
> That flag is meant to prevent another user from binding the same device
> again. But I will check whether it really ignores all other flags.

The ifr variable is on the stack of the mp_chr_ioctl function, and you never
look at the value after setting it. In order to prevent multiple opens
of that device, you probably need to lock out any other users as well,
and make it a property of the underlying device. E.g. you also want to
prevent users on the host from setting an IP address on the NIC and
using it to send and receive data there.

	Arnd

^ permalink raw reply

* Poor localhost net performance on recent stable kernel
From: Kelly Burkhart @ 2010-04-15 15:44 UTC (permalink / raw)
  To: netdev, linux-kernel

Hello,

While working on upgrading distributions, I've noticed that local
network communication is much slower on 2.6.33.2 than on our old
kernel 2.6.16.60 (sles 10.2).

Running netperf UDP_RR against localhost, I get around 150000 tps
on the new kernel vs. 290000 tps with the old kernel.  The netperf
command:

netperf -T 1 -H 127.0.0.1 -t UDP_RR -c -C -- -r 100

TCP_RR had similar results.  The problem did not exist with TCP_STREAM.

While trying to track this down, I wrote a test program that writes
then reads a 32 bit integer to a pipe:

static void tst_pipe0( int sleep_us )
{
    int pipefd[2];
    int idx;
    uint32_t tarr[ITERS];

    printf("tst_pipe0 -- sleep %dus\n", sleep_us);

    if (pipe(pipefd) < 0)
        err_exit("pipe");

    for(idx=0; idx<ITERS; ++idx) {
        uint32_t btsc;
        uint32_t rtsc;
        uint32_t etsc;
        get_tscl(btsc);
        write(pipefd[1], (char *)&btsc, sizeof(btsc));
        read(pipefd[0], (char *)&rtsc, sizeof(rtsc));
        get_tscl(etsc);
        tarr[idx] = etsc-btsc;
        do_sleep(sleep_us);
    }
    prt_avg(tarr, ITERS);
    close(pipefd[0]);
    close(pipefd[1]);
    printf("\n");
}
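
The helpers used by tst_pipe0() (ITERS, get_tscl, err_exit, do_sleep, prt_avg) are not shown in the message. The following is only a guess at what they might look like, so that the function above can be compiled; the iteration count, the non-x86 fallback clock and the avg_u32 helper are assumptions. The posted function additionally needs <unistd.h> for pipe(), read() and write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 10000   /* assumed iteration count; the original value is not shown */

/* Read the low 32 bits of the TSC on x86; elsewhere fall back to a ns clock. */
#if defined(__x86_64__) || defined(__i386__)
#define get_tscl(val) __asm__ __volatile__("rdtsc" : "=a"(val) : : "edx")
#else
#define get_tscl(val) do {                                              \
        struct timespec ts_;                                            \
        clock_gettime(CLOCK_MONOTONIC, &ts_);                           \
        (val) = (uint32_t)((uint64_t)ts_.tv_sec * 1000000000u +         \
                           (uint64_t)ts_.tv_nsec);                      \
    } while (0)
#endif

/* Print the failing call together with the errno text, then exit. */
void err_exit(const char *what)
{
    perror(what);
    exit(EXIT_FAILURE);
}

/* Sleep for sleep_us microseconds; 0 means no sleep call at all. */
void do_sleep(int sleep_us)
{
    if (sleep_us > 0) {
        struct timespec ts = {
            .tv_sec = sleep_us / 1000000,
            .tv_nsec = (sleep_us % 1000000) * 1000L,
        };
        nanosleep(&ts, NULL);
    }
}

/* Mean of n samples; 0 for an empty array. */
uint32_t avg_u32(const uint32_t *arr, size_t n)
{
    uint64_t sum = 0;

    assert(arr != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        sum += arr[i];
    return n ? (uint32_t)(sum / n) : 0;
}

/* Print the average round-trip time over the collected samples. */
void prt_avg(const uint32_t *arr, size_t n)
{
    printf("avg round trip: %u cycles\n", (unsigned)avg_u32(arr, n));
}
```

Because only the low 32 bits of the counter are kept, the unsigned subtraction etsc-btsc in the loop still gives the right delta across a single wraparound.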

There's a dramatic difference if there's a sleep between iterations on
the new kernel.  On the old kernel the write/read round trip takes
1100-1300 cycles with or without sleep.  On the new kernel, with no
sleep the round trip is about 1400 cycles.  It doubles with a 1us
sleep then gradually increases to 12000-14000 cycles then stabilizes
as I increase the sleep time to 1500us.  I'm not sure if this is
related to the netperf difference or is a completely different
scheduling issue.

I'm running on an Intel Xeon X5570 @ 2.93GHz.  Different tick/notick,
preemption, and HZ kernel config option values don't substantially change
the magnitude of the difference.

Does anyone have any ideas regarding what could be causing the netperf
issue?  And is the pipe microbenchmark meaningful and if so what does
it mean?

Thanks,

-Kelly

^ permalink raw reply

* [RFC][PATCH] xfrm6 refcnt problem in bundle creation
From: Nicolas Dichtel @ 2010-04-15 16:32 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

Hi all,

I got a ref count problem in xfrm IPv6 part, but I don't really know what is the 
best way to fix it.

When xfrm6_fill_dst() is called, a dev is given as parameter:

static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
                           struct flowi *fl)
{
         struct rt6_info *rt = (struct rt6_info*)xdst->route;

         xdst->u.dst.dev = dev;
         dev_hold(dev);

         xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev);
         if (!xdst->u.rt6.rt6i_idev)
                 return -ENODEV;
[snip]

In my case, dev points to an ethernet device and the route (rt->u.dst.dev) 
points to a tunnel interface (ip6 over ip6). This function will get a ref on the 
idev of the tunnel (xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev)), but dev 
of the dst is set to the ethernet interface (xdst->u.dst.dev = dev).
Afterwards, when we try to remove the tunnel interface, the xfrm gc function will 
never check rt6i_idev, it will only check u.dst.dev, hence it will not remove 
the dst.
The consequence is that the interface cannot be removed.
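
The mismatch can be modelled in a few lines of userspace C. This is a toy model only: the structures, counters and function names below are made up, and the real gc logic is more involved than a single pointer comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for net_device and xfrm_dst; all names here are hypothetical. */
struct toy_dev {
    int dev_refs;   /* models dev_hold()/dev_put() */
    int idev_refs;  /* models in6_dev_get()/in6_dev_put() */
};

struct toy_dst {
    struct toy_dev *dev;        /* models xdst->u.dst.dev */
    struct toy_dev *idev_owner; /* device whose idev rt6i_idev points at */
};

/* Models xfrm6_fill_dst(): hold dev, and take the idev from idev_src. */
void toy_fill_dst(struct toy_dst *d, struct toy_dev *dev,
                  struct toy_dev *idev_src)
{
    d->dev = dev;
    dev->dev_refs++;
    d->idev_owner = idev_src;
    idev_src->idev_refs++;
}

/* Models the gc on device removal: only a dst whose ->dev matches is released. */
void toy_gc_for_dev(struct toy_dst *d, struct toy_dev *going_away)
{
    if (d->dev != going_away)
        return;
    d->dev->dev_refs--;
    d->idev_owner->idev_refs--;
    d->dev = NULL;
    d->idev_owner = NULL;
}

/*
 * Build one bundle over (ethernet dev, tunnel route), then try to remove the
 * tunnel. Returns the references still pinning the tunnel afterwards.
 * idev_from_route != 0 models the current code, 0 models the proposed patch.
 */
int toy_refs_left_on_tunnel(int idev_from_route)
{
    struct toy_dev eth = { 0, 0 }, tunnel = { 0, 0 };
    struct toy_dst dst = { NULL, NULL };

    toy_fill_dst(&dst, &eth, idev_from_route ? &tunnel : &eth);
    toy_gc_for_dev(&dst, &tunnel);
    assert(dst.dev == &eth); /* the bundle itself survives either way */
    return tunnel.dev_refs + tunnel.idev_refs;
}
```

In this model, taking the idev from the route's device leaves one reference on the tunnel after the gc pass, while taking it from dev leaves none.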

IPv4 code takes the same dev to get idev, rather than using rt->u.dst.dev. Is it 
right to do the same in IPv6?
A proposed patch is attached.

Before the bundle creation merge patch, the code used 'rt->u.dst.dev' both to 
get idev and to set dst.dev.

Suggestions are welcome.

Regards,
Nicolas

[-- Attachment #2: 0001-xfrm6-ensure-to-use-the-same-dev-when-building-a-bu.patch --]
[-- Type: text/x-diff, Size: 950 bytes --]

From 80432d47369925d4e9e38bcb1068ebf923de3a8f Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 15 Apr 2010 18:27:30 +0200
Subject: [PATCH] xfrm6: ensure to use the same dev when building a bundle

When building a bundle, we set dst.dev and rt6.rt6i_idev.
We must ensure to set the same device for both fields.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/xfrm6_policy.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ae18165..00bf7c9 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -124,7 +124,7 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	xdst->u.dst.dev = dev;
 	dev_hold(dev);

-	xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev);
+	xdst->u.rt6.rt6i_idev = in6_dev_get(dev);
 	if (!xdst->u.rt6.rt6i_idev)
 		return -ENODEV;

-- 
1.5.4.5

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: Rick Jones @ 2010-04-15 16:41 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, eric.dumazet, therbert, netdev, robert, xiaosuo,
	andi
In-Reply-To: <1271332528.4567.150.camel@bigi>

> 
> I speculate again that it may be too costly to run rps on something like
> a tigerton or intel clovertown where you have cores sharing/contending
> for an FSB. If I can get answers to the question: "What h/ware are
> people running?" i could be proven wrong.
> [Note: I am not against RPS - i think it has its place; so i hope my
> desire to find out when to use rps doesn't show as hostility towards
> rps.]

IPS (~= RPS) was running on shared FSB HP9000's.  Now, that was also a BSD 
networking stack with netisrq's and the like.  TOPS (~= RFS) was also run on 
shared FSB HP9000s, as well as CC-NUMA HP9000s and Integrity systems.  TOPS was 
implemented in a Streams-based stack tracing its history to a common ancestor 
with Solaris (Mentat).

rick jones

^ permalink raw reply

* [PATCH net-next] net/l2tp/l2tp_debugfs.c: Convert NIPQUAD to %pI4
From: Joe Perches @ 2010-04-15 16:41 UTC (permalink / raw)
  To: David Miller; +Cc: James Chapman, LKML, netdev

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/l2tp/l2tp_debugfs.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index 908f10f..104ec3b 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -122,8 +122,8 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
 	seq_printf(m, "\nTUNNEL %u peer %u", tunnel->tunnel_id, tunnel->peer_tunnel_id);
 	if (tunnel->sock) {
 		struct inet_sock *inet = inet_sk(tunnel->sock);
-		seq_printf(m, " from " NIPQUAD_FMT " to " NIPQUAD_FMT "\n",
-			   NIPQUAD(inet->inet_saddr), NIPQUAD(inet->inet_daddr));
+		seq_printf(m, " from %pI4 to %pI4\n",
+			   &inet->inet_saddr, &inet->inet_daddr);
 		if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
 			seq_printf(m, " source port %hu, dest port %hu\n",
 				   ntohs(inet->inet_sport), ntohs(inet->inet_dport));

^ permalink raw reply related

* Network multiqueue question
From: George B. @ 2010-04-15 16:58 UTC (permalink / raw)
  To: netdev

I am in need of a little education on multiqueue and was wondering if
someone here might be able to help me.

Given intel igb network driver, it appears I can do something like:

 tc qdisc del dev eth0 root handle 1: multiq

which works and reports 4 bands:  dev eth0 root refcnt 4 bands 4/4

But our network is a little more complicated.  Above the ethernet we
have the bonding driver which is using mode 2 bonding with two
ethernet slaves.  Then we have vlans on the bond interface.  Our
production traffic is on a vlan and resource contention is an issue as
these are busy machines.

It is my understanding that the vlan driver became multiqueue aware in
2.6.32 (we are currently using 2.6.31).

It would seem that the first thing the kernel would encounter with
traffic headed out would be the vlan interface, and then the bond
interface, and then the physical ethernet interface.  Is that correct?
 So with my kernel, I would seem to get no utility from multiq on the
ethernet interface if the vlan interface is going to be a
single-threaded bottleneck.  What about the bond driver?  Is it
currently multiqueue aware?

I am trying to get some sort of logical picture of how all these things
interact with each other to get things a little more efficient and
reduce resource contention in the application while still trying to be
efficient in use of network ports/interfaces.

If someone feels up to the task of sending a little education my way,
I would be most appreciative.  There doesn't seem to be a whole lot of
documentation floating around about multiqueue other than a blurb of
text in the kernel and David's presentation of last year.

Thanks!

George

^ permalink raw reply

* Re: Network multiqueue question
From: Eric Dumazet @ 2010-04-15 17:47 UTC (permalink / raw)
  To: George B.; +Cc: netdev
In-Reply-To: <i2tb65cae941004150958n5c66dc42j26724bbb075125a0@mail.gmail.com>

On Thursday 15 April 2010 at 09:58 -0700, George B. wrote:
> I am in need of a little education on multiqueue and was wondering if
> someone here might be able to help me.
> 
> Given intel igb network driver, it appears I can do something like:
> 
>  tc qdisc del dev eth0 root handle 1: multiq
> 
> which works and reports 4 bands:  dev eth0 root refcnt 4 bands 4/4
> 
> But our network is a little more complicated.  Above the ethernet we
> have the bonding driver which is using mode 2 bonding with two
> ethernet slaves.  Then we have vlans on the bond interface.  Our
> production traffic is on a vlan and resource contention is an issue as
> these are busy machines.
> 
> It is my understanding that the vlan driver became multiqueue aware in
> 2.6.32 (we are currently using 2.6.31).
> 
> It would seem that the first thing the kernel would encounter with
> traffic headed out would be the vlan interface, and then the bond
> interface, and then the physical ethernet interface.  Is that correct?
>  So with my kernel, I would seem to get no utility from multiq on the
> ethernet interface if the vlan interface is going to be a
> single-threaded bottleneck.  What about the bond driver?  Is it
> currently multiqueue aware?
> 
> I am trying to get some sort of logical picture of how all these things
> interact with each other to get things a little more efficient and
> reduce resource contention in the application while still trying to be
> efficient in use of network ports/interfaces.
> 
> If someone feels up to the task of sending a little education my way,
> I would be most appreciative.  There doesn't seem to be a whole lot of
> documentation floating around about multiqueue other than a blurb of
> text in the kernel and David's presentation of last year.

Hi George

Vlan is multiqueue aware, but unfortunately bonding is not at the
moment.

We could make it 'multiqueue' (a patch was submitted by Oleg A.
Arkhangelsky a while ago), but the bonding xmit routine needs to take a
central lock, shared by all queues, so it won't be very efficient...

Since this bothers me a bit, I will probably work on this in the near
future (adding real multiqueue capability and RCU to bonding fast
paths).

Ref: http://permalink.gmane.org/gmane.linux.network/152987



^ permalink raw reply

* Re: [PATCH 1/2 resend] igb: double increment nr_frags
From: Jeff Kirsher @ 2010-04-15 18:03 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, e1000-devel, davem, jesse.brandeburg, bruce.w.allan,
	john.ronciak, Taku Izumi
In-Reply-To: <4BC6D705.80601@jp.fujitsu.com>

On Thu, Apr 15, 2010 at 02:06, Koki Sanagi <sanagi.koki@jp.fujitsu.com> wrote:
> Previous patch has some mail format problem.
> Maybe I've fixed and re-sent.
>
> There is no need to increment nr_frags because skb_fill_page_desc increments
> it.
>
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  drivers/net/igb/igb_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>

Thanks, I have added the patch to my queue of patches.

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [PATCH 2/2] igbvf: double increment nr_frags
From: Jeff Kirsher @ 2010-04-15 18:05 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: e1000-devel, netdev, bruce.w.allan, jesse.brandeburg,
	john.ronciak, davem
In-Reply-To: <4BC6BEFC.3010309@jp.fujitsu.com>

2010/4/15 Koki Sanagi <sanagi.koki@jp.fujitsu.com>:
> There is no need to increment nr_frags because skb_fill_page_desc increments
> it.
>
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  drivers/net/igbvf/netdev.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>

Thanks, I have added this patch to my queue of patches.

-- 
Cheers,
Jeff


^ permalink raw reply

* Re: Network multiqueue question
From: Jay Vosburgh @ 2010-04-15 18:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: George B., netdev
In-Reply-To: <1271353637.16881.2846.camel@edumazet-laptop>

Eric Dumazet <eric.dumazet@gmail.com> wrote:

>On Thursday 15 April 2010 at 09:58 -0700, George B. wrote:
>> I am in need of a little education on multiqueue and was wondering if
>> someone here might be able to help me.
>> 
>> Given intel igb network driver, it appears I can do something like:
>> 
>>  tc qdisc del dev eth0 root handle 1: multiq
>> 
>> which works and reports 4 bands:  dev eth0 root refcnt 4 bands 4/4
>> 
>> But our network is a little more complicated.  Above the ethernet we
>> have the bonding driver which is using mode 2 bonding with two
>> ethernet slaves.  Then we have vlans on the bond interface.  Our
>> production traffic is on a vlan and resource contention is an issue as
>> these are busy machines.
>> 
>> It is my understanding that the vlan driver became multiqueue aware in
>> 2.6.32 (we are currently using 2.6.31).
>> 
>> It would seem that the first thing the kernel would encounter with
>> traffic headed out would be the vlan interface, and then the bond
>> interface, and then the physical ethernet interface.  Is that correct?
>>  So with my kernel, I would seem to get no utility from multiq on the
>> ethernet interface if the vlan interface is going to be a
>> single-threaded bottleneck.  What about the bond driver?  Is it
>> currently multiqueue aware?
>> 
>> I am trying to get some sort of logical picture of how all these things
>> interact with each other to get things a little more efficient and
>> reduce resource contention in the application while still trying to be
>> efficient in use of network ports/interfaces.
>> 
>> If someone feels up to the task of sending a little education my way,
>> I would be most appreciative.  There doesn't seem to be a whole lot of
>> documentation floating around about multiqueue other than a blurb of
>> text in the kernel and David's presentation of last year.
>
>Hi George
>
>Vlan is multiqueue aware, but unfortunately bonding is not at the
>moment.
>
>We could make it 'multiqueue' (a patch was submitted by Oleg A.
>Arkhangelsky a while ago), but the bonding xmit routine needs to take a
>central lock, shared by all queues, so it won't be very efficient...

	The lock is a read lock, so theoretically it should be possible
to enter the bonding transmit function on multiple CPUs at the same
time.  The lock may thrash around, though.

>Since this bothers me a bit, I will probably work on this in the near
>future (adding real multiqueue capability and RCU to bonding fast
>paths).
>
>Ref: http://permalink.gmane.org/gmane.linux.network/152987

	The question I have about it (and the above patch), is: what
does multi-queue "awareness" really mean for a bonding device?  How does
allocating a bunch of TX queues help, given that the determination of
the transmitting device hasn't necessarily been made?

	I haven't had the chance to acquire some multi-queue network
cards and check things out with bonding, so I'm not really sure how it
should work.  Should the bond look, from a multi-queue perspective, like
the largest slave, or should it look like the sum of the slaves?  Some
of this may be mode-specific, as well.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: Network multiqueue question
From: Eric Dumazet @ 2010-04-15 18:41 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: George B., netdev
In-Reply-To: <21433.1271354986@death.nxdomain.ibm.com>

On Thursday 15 April 2010 at 11:09 -0700, Jay Vosburgh wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> >Vlan is multiqueue aware, but unfortunately bonding is not at the
> >moment.
> >
> >We could make it 'multiqueue' (a patch was submitted by Oleg A.
> >Arkhangelsky a while ago), but the bonding xmit routine needs to take a
> >central lock, shared by all queues, so it won't be very efficient...
> 
> 	The lock is a read lock, so theoretically it should be possible
> to enter the bonding transmit function on multiple CPUs at the same
> time.  The lock may thrash around, though.
> 

Yes, and with 10Gb cards, this is a limiting factor, if you want to send
14 million packets per second ;)

read_lock() is one atomic op, dirtying cacheline
read_unlock() is one atomic op, dirtying cache line again (if contended)

In active-passive mode, using RCU should be really easy, given netdevices
are already RCU compatible. This way, each cpu only reads bonding state,
without any memory changes.
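
The difference between the two read paths can be sketched in userspace with C11 atomics. This is a simplified illustration, not bonding code: the struct and function names are made up, the "lock" is only a reader counter that never waits for a writer, and the grace period a real RCU writer must observe before freeing an old slave is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_slave { int ifindex; };

struct toy_bond {
    atomic_int readers;                        /* stands in for the rwlock word */
    _Atomic(struct toy_slave *) active_slave;  /* pointer published by the writer */
};

/* read_lock()-style path: two atomic read-modify-writes on a shared word. */
int toy_xmit_locked(struct toy_bond *bond)
{
    atomic_fetch_add_explicit(&bond->readers, 1, memory_order_acquire);
    struct toy_slave *s = atomic_load_explicit(&bond->active_slave,
                                               memory_order_relaxed);
    int ifindex = s ? s->ifindex : -1;
    atomic_fetch_sub_explicit(&bond->readers, 1, memory_order_release);
    return ifindex;
}

/* RCU-style path: a single load, so the shared cache line is never written. */
int toy_xmit_rcu_like(struct toy_bond *bond)
{
    struct toy_slave *s = atomic_load_explicit(&bond->active_slave,
                                               memory_order_acquire);
    return s ? s->ifindex : -1;
}

/* Returns 1 if both paths pick the same active slave. */
int toy_paths_agree(void)
{
    struct toy_slave eth0 = { 2 };
    struct toy_bond bond;

    atomic_init(&bond.readers, 0);
    atomic_init(&bond.active_slave, &eth0);
    int a = toy_xmit_locked(&bond);
    int b = toy_xmit_rcu_like(&bond);
    assert(atomic_load(&bond.readers) == 0); /* lock word is back to idle */
    return a == 2 && b == 2;
}
```

Both paths return the same slave; the point is that the first one writes to shared memory twice per packet, while the second one only reads.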


> >Since this bothers me a bit, I will probably work on this in the near
> >future (adding real multiqueue capability and RCU to bonding fast
> >paths).
> >
> >Ref: http://permalink.gmane.org/gmane.linux.network/152987
> 
> 	The question I have about it (and the above patch), is: what
> does multi-queue "awareness" really mean for a bonding device?  How does
> allocating a bunch of TX queues help, given that the determination of
> the transmitting device hasn't necessarily been made?
> 

Well, this problem was also taken into account for vlan; you
might take a look at this commit:

commit 669d3e0babb40018dd6e78f4093c13a2eac73866
Author: Vasu Dev <vasu.dev@intel.com>
Date:   Tue Mar 23 14:41:45 2010 +0000

    vlan: adds vlan_dev_select_queue
    
    This is required to correctly select vlan tx queue for a driver
    supporting multi tx queue with ndo_select_queue implemented since
    currently selected vlan tx queue is unaligned to selected queue by
    real net_devce ndo_select_queue.
    
    Unaligned vlan tx queue selection causes thrash with higher vlan
    tx lock contention for least fcoe traffic and wrong socket tx
    queue_mapping for ixgbe having ndo_select_queue implemented.
    
    -v2
    
    As per Eric Dumazet<eric.dumazet@gmail.com> comments, mirrored
    vlan net_device_ops to have them with and without vlan_dev_select_queue
    and then select according to real dev ndo_select_queue present or not
    for a vlan net_device. This is to completely skip vlan_dev_select_queue
    calling for real net_device not supporting ndo_select_queue.
    
    Signed-off-by: Vasu Dev <vasu.dev@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


> 	I haven't had the chance to acquire some multi-queue network
> cards and check things out with bonding, so I'm not really sure how it
> should work.  Should the bond look, from a multi-queue perspective, like
> the largest slave, or should it look like the sum of the slaves?  Some
> of this may be mode-specific, as well.
> 





^ permalink raw reply

* Re: [PATCH net-next] net/l2tp/l2tp_debugfs.c: Convert NIPQUAD to %pI4
From: James Chapman @ 2010-04-15 18:44 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, LKML, netdev
In-Reply-To: <1271349703.1726.62.camel@Joe-Laptop.home>

Joe Perches wrote:
> Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: James Chapman <jchapman@katalix.com>

> ---
>  net/l2tp/l2tp_debugfs.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
> index 908f10f..104ec3b 100644
> --- a/net/l2tp/l2tp_debugfs.c
> +++ b/net/l2tp/l2tp_debugfs.c
> @@ -122,8 +122,8 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
>  	seq_printf(m, "\nTUNNEL %u peer %u", tunnel->tunnel_id, tunnel->peer_tunnel_id);
>  	if (tunnel->sock) {
>  		struct inet_sock *inet = inet_sk(tunnel->sock);
> -		seq_printf(m, " from " NIPQUAD_FMT " to " NIPQUAD_FMT "\n",
> -			   NIPQUAD(inet->inet_saddr), NIPQUAD(inet->inet_daddr));
> +		seq_printf(m, " from %pI4 to %pI4\n",
> +			   &inet->inet_saddr, &inet->inet_daddr);
>  		if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
>  			seq_printf(m, " source port %hu, dest port %hu\n",
>  				   ntohs(inet->inet_sport), ntohs(inet->inet_dport));
> 
> 
> 


^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon:  caller is netif_rx
From: Eric Dumazet @ 2010-04-15 19:07 UTC (permalink / raw)
  To: Eric Paris, David Miller; +Cc: netdev, Tom Herbert
In-Reply-To: <1271337401.16881.2563.camel@edumazet-laptop>

On Thursday 15 April 2010 at 15:16 +0200, Eric Dumazet wrote:
> On Monday 12 April 2010 at 21:40 +0200, Eric Dumazet wrote:
> > Good spot, RPS changed a bit netif_rx() requirements.
> > 
> > I would change ip_dev_loopback_xmit() to call netif_rx_ni() instead...
> > 
> > David, Tom ?
> > 
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index c65f18e..d1bcc9f 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
> >  	newskb->pkt_type = PACKET_LOOPBACK;
> >  	newskb->ip_summed = CHECKSUM_UNNECESSARY;
> >  	WARN_ON(!skb_dst(newskb));
> > -	netif_rx(newskb);
> > +	netif_rx_ni(newskb);
> >  	return 0;
> >  }
> >  
> 
> After some confusion, it seems this was the right fix after all :)
> 
> [PATCH] ip: Fix ip_dev_loopback_xmit()
> 
> Eric Paris got following trace with a linux-next kernel
> 
> [   14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093
> [   14.204025] caller is netif_rx+0xfa/0x110
> [   14.204035] Call Trace:
> [   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
> [   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
> [   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
> [   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
> [   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
> [   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
> [   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
> [   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
> [   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
> [   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
> [   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
> [   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b
> 
> While the current linux-2.6 kernel doesn't emit this warning, the bug is latent
> and might cause unexpected failures.
> 
> ip_dev_loopback_xmit() runs in process context with preemption enabled, so it
> must call netif_rx_ni() instead of netif_rx() to make sure that any pending
> software interrupt is processed.
> 
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index c65f18e..d1bcc9f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
>  	newskb->pkt_type = PACKET_LOOPBACK;
>  	newskb->ip_summed = CHECKSUM_UNNECESSARY;
>  	WARN_ON(!skb_dst(newskb));
> -	netif_rx(newskb);
> +	netif_rx_ni(newskb);
>  	return 0;
>  }
>  

Oops, silly me, I forgot IPv6.

I'll send an updated patch in a couple of minutes.





^ permalink raw reply

* [PATCH] ip: Fix ip_dev_loopback_xmit()
From: Eric Dumazet @ 2010-04-15 19:13 UTC (permalink / raw)
  To: Eric Paris, David Miller; +Cc: netdev, Tom Herbert
In-Reply-To: <1271337401.16881.2563.camel@edumazet-laptop>

[PATCH] ip: Fix ip_dev_loopback_xmit()

Eric Paris got the following trace with a linux-next kernel:

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b

While the current linux-2.6 kernel doesn't emit this warning, the bug is latent
and might cause unexpected failures.

ip_dev_loopback_xmit() runs in process context with preemption enabled, so it
must call netif_rx_ni() instead of netif_rx() to make sure that any pending
software interrupt is processed.

Same change for ip6_dev_loopback_xmit()
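
To illustrate why the _ni variant matters in process context, here is a toy
userspace model, not kernel code. All toy_* names are made up for this sketch.
It only models the "queue, then run a pending softirq" part; as far as I
recall, the real netif_rx_ni() also brackets the call with
preempt_disable()/preempt_enable(), which is what silences the
smp_processor_id() warning.

```c
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_cpu {
	int backlog;            /* packets queued, not yet processed */
	int delivered;          /* packets handed to the protocol stack */
	bool softirq_pending;   /* models a raised receive softirq */
};

/* Models netif_rx(): queue the packet and raise the softirq. */
static void toy_netif_rx(struct toy_cpu *cpu)
{
	cpu->backlog++;
	cpu->softirq_pending = true;
}

/* Models the softirq handler draining the backlog. */
static void toy_do_softirq(struct toy_cpu *cpu)
{
	cpu->delivered += cpu->backlog;
	cpu->backlog = 0;
	cpu->softirq_pending = false;
}

/* Models netif_rx_ni(): queue, then run any pending softirq right away. */
static void toy_netif_rx_ni(struct toy_cpu *cpu)
{
	toy_netif_rx(cpu);
	if (cpu->softirq_pending)
		toy_do_softirq(cpu);
}
```

In this model a caller of toy_netif_rx() leaves the packet sitting in the
backlog until something else runs the softirq, while toy_netif_rx_ni()
delivers it before returning.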

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_output.c  |    2 +-
 net/ipv6/ip6_output.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
 	newskb->pkt_type = PACKET_LOOPBACK;
 	newskb->ip_summed = CHECKSUM_UNNECESSARY;
 	WARN_ON(!skb_dst(newskb));
-	netif_rx(newskb);
+	netif_rx_ni(newskb);
 	return 0;
 }
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 16c4391..65f9c37 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -108,7 +108,7 @@ static int ip6_dev_loopback_xmit(struct sk_buff *newskb)
 	newskb->ip_summed = CHECKSUM_UNNECESSARY;
 	WARN_ON(!skb_dst(newskb));
 
-	netif_rx(newskb);
+	netif_rx_ni(newskb);
 	return 0;
 }
 



^ permalink raw reply related

* [PATCH] net: mac8390 - Sort out memory/MMIO accesses and casts (was: Re: drivers/net/mac8390.c: Remove useless memcpy casting)
From: Geert Uytterhoeven @ 2010-04-15 19:34 UTC (permalink / raw)
  To: Joe Perches, Finn Thain
  Cc: David S. Miller, netdev, Linux Kernel Mailing List, Linux/m68k

On Mon, Mar 8, 2010 at 18:16, Joe Perches <joe@perches.com> wrote:
> On Sun, 2010-03-07 at 10:19 +0100, Geert Uytterhoeven wrote:
>> ... hence you introduced 3 compiler warnings:
>>
>> drivers/net/mac8390.c:249: warning: passing argument 1 of
>> '__builtin_memcpy' makes pointer from integer without a cast
>> drivers/net/mac8390.c:254: warning: passing argument 1 of
>> 'word_memcpy_tocard' makes pointer from integer without a cast
>> drivers/net/mac8390.c:256: warning: passing argument 2 of
>> 'word_memcpy_fromcard' makes pointer from integer without a cast
>
> Thanks, I'll submit a patch to fix it by tomorrow or so.

It hasn't been fixed yet. But here's a better solution.
I do not have the hardware to test it, though.
Finn, does it {look OK,work}?

From 7ef849741922afa7b613f271f414024c454a0d23 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed, 14 Apr 2010 18:48:50 +0200
Subject: [PATCH] net: mac8390 - Sort out memory/MMIO accesses and casts

commit 5c7fffd0e3b57cb63f50bbd710868f012d67654f ("drivers/net/mac8390.c: Remove
useless memcpy casting") removed too many casts, introducing the following
warnings:

| drivers/net/mac8390.c:248: warning: passing argument 1 of
'__builtin_memcpy' makes pointer from integer without a cast
| drivers/net/mac8390.c:253: warning: passing argument 1 of
'word_memcpy_tocard' makes pointer from integer without a cast
| drivers/net/mac8390.c:255: warning: passing argument 2 of
'word_memcpy_fromcard' makes pointer from integer without a cast

Instead of just readding the casts,
  - move all casts inside word_memcpy_{to,from}card(),
  - replace an incorrect memcpy() by memcpy_toio(),
  - add memcmp_withio() as a wrapper around memcmp(),
  - replace an incorrect memcpy_toio() by memcpy_fromio().
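
As a rough userspace illustration of the first item (keeping the cast inside
the helper, so callers pass the card address as a plain integer), a minimal
sketch could look like this. word_copy_to_addr() is a hypothetical name, and
uintptr_t stands in for the kernel's unsigned long convention:

```c
#include <stdint.h>

/*
 * Userspace sketch of a 16-bit-at-a-time copy whose "card" side is passed
 * as an integer address, so the only cast lives inside the helper.
 * word_copy_to_addr() is a hypothetical name, not the driver's function.
 */
static void word_copy_to_addr(uintptr_t dst_addr, const void *src, int count)
{
	volatile uint16_t *to = (volatile uint16_t *)dst_addr;
	const uint16_t *from = src;

	/* Round the byte count up to whole 16-bit words. */
	count = (count + 1) / 2;
	while (count--)
		*to++ = *from++;
}
```

Note that an odd byte count is rounded up, so both buffers need room for
the extra byte.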

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 drivers/net/mac8390.c |   44 ++++++++++++++++++++++----------------------
 1 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/net/mac8390.c b/drivers/net/mac8390.c
index c8e68fd..8c96e9c 100644
--- a/drivers/net/mac8390.c
+++ b/drivers/net/mac8390.c
@@ -157,6 +157,8 @@ static void dayna_block_output(struct net_device *dev, int count,
 #define memcpy_fromio(a, b, c)	memcpy((a), (void *)(b), (c))
 #define memcpy_toio(a, b, c)	memcpy((void *)(a), (b), (c))

+#define memcmp_withio(a, b, c)	memcmp((a), (void *)(b), (c))
+
 /* Slow Sane (16-bit chunk memory read/write) Cabletron uses this */
 static void slow_sane_get_8390_hdr(struct net_device *dev,
 				   struct e8390_pkt_hdr *hdr, int ring_page);
@@ -164,8 +166,8 @@ static void slow_sane_block_input(struct net_device *dev, int count,
 				  struct sk_buff *skb, int ring_offset);
 static void slow_sane_block_output(struct net_device *dev, int count,
 				   const unsigned char *buf, int start_page);
-static void word_memcpy_tocard(void *tp, const void *fp, int count);
-static void word_memcpy_fromcard(void *tp, const void *fp, int count);
+static void word_memcpy_tocard(unsigned long tp, const void *fp, int count);
+static void word_memcpy_fromcard(void *tp, unsigned long fp, int count);

 static enum mac8390_type __init mac8390_ident(struct nubus_dev *dev)
 {
@@ -245,9 +247,9 @@ static enum mac8390_access __init mac8390_testio(volatile unsigned long membase)
 	unsigned long outdata = 0xA5A0B5B0;
 	unsigned long indata =  0x00000000;
 	/* Try writing 32 bits */
-	memcpy(membase, &outdata, 4);
+	memcpy_toio(membase, &outdata, 4);
 	/* Now compare them */
-	if (memcmp((char *)&outdata, (char *)membase, 4) == 0)
+	if (memcmp_withio(&outdata, membase, 4) == 0)
 		return ACCESS_32;
 	/* Write 16 bit output */
 	word_memcpy_tocard(membase, &outdata, 4);
@@ -733,7 +735,7 @@ static void sane_get_8390_hdr(struct net_device *dev,
 			      struct e8390_pkt_hdr *hdr, int ring_page)
 {
 	unsigned long hdr_start = (ring_page - WD_START_PG)<<8;
-	memcpy_fromio((void *)hdr, (char *)dev->mem_start + hdr_start, 4);
+	memcpy_fromio(hdr, dev->mem_start + hdr_start, 4);
 	/* Fix endianness */
 	hdr->count = swab16(hdr->count);
 }
@@ -747,14 +749,13 @@ static void sane_block_input(struct net_device *dev, int count,
 	if (xfer_start + count > ei_status.rmem_end) {
 		/* We must wrap the input move. */
 		int semi_count = ei_status.rmem_end - xfer_start;
-		memcpy_fromio(skb->data, (char *)dev->mem_start + xfer_base,
+		memcpy_fromio(skb->data, dev->mem_start + xfer_base,
 			      semi_count);
 		count -= semi_count;
-		memcpy_toio(skb->data + semi_count,
-			    (char *)ei_status.rmem_start, count);
-	} else {
-		memcpy_fromio(skb->data, (char *)dev->mem_start + xfer_base,
+		memcpy_fromio(skb->data + semi_count, ei_status.rmem_start,
 			      count);
+	} else {
+		memcpy_fromio(skb->data, dev->mem_start + xfer_base, count);
 	}
 }

@@ -763,7 +764,7 @@ static void sane_block_output(struct net_device *dev, int count,
 {
 	long shmem = (start_page - WD_START_PG)<<8;

-	memcpy_toio((char *)dev->mem_start + shmem, buf, count);
+	memcpy_toio(dev->mem_start + shmem, buf, count);
 }

 /* dayna block input/output */
@@ -814,7 +815,7 @@ static void slow_sane_get_8390_hdr(struct net_device *dev,
 				   int ring_page)
 {
 	unsigned long hdr_start = (ring_page - WD_START_PG)<<8;
-	word_memcpy_fromcard(hdr, (char *)dev->mem_start + hdr_start, 4);
+	word_memcpy_fromcard(hdr, dev->mem_start + hdr_start, 4);
 	/* Register endianism - fix here rather than 8390.c */
 	hdr->count = (hdr->count&0xFF)<<8|(hdr->count>>8);
 }
@@ -828,15 +829,14 @@ static void slow_sane_block_input(struct net_device *dev, int count,
 	if (xfer_start + count > ei_status.rmem_end) {
 		/* We must wrap the input move. */
 		int semi_count = ei_status.rmem_end - xfer_start;
-		word_memcpy_fromcard(skb->data,
-				     (char *)dev->mem_start + xfer_base,
+		word_memcpy_fromcard(skb->data, dev->mem_start + xfer_base,
 				     semi_count);
 		count -= semi_count;
 		word_memcpy_fromcard(skb->data + semi_count,
-				     (char *)ei_status.rmem_start, count);
+				     ei_status.rmem_start, count);
 	} else {
-		word_memcpy_fromcard(skb->data,
-				     (char *)dev->mem_start + xfer_base, count);
+		word_memcpy_fromcard(skb->data, dev->mem_start + xfer_base,
+				     count);
 	}
 }

@@ -845,12 +845,12 @@ static void slow_sane_block_output(struct net_device *dev, int count,
 {
 	long shmem = (start_page - WD_START_PG)<<8;

-	word_memcpy_tocard((char *)dev->mem_start + shmem, buf, count);
+	word_memcpy_tocard(dev->mem_start + shmem, buf, count);
 }

-static void word_memcpy_tocard(void *tp, const void *fp, int count)
+static void word_memcpy_tocard(unsigned long tp, const void *fp, int count)
 {
-	volatile unsigned short *to = tp;
+	volatile unsigned short *to = (void *)tp;
 	const unsigned short *from = fp;

 	count++;
@@ -860,10 +860,10 @@ static void word_memcpy_tocard(void *tp, const void *fp, int count)
 		*to++ = *from++;
 }

-static void word_memcpy_fromcard(void *tp, const void *fp, int count)
+static void word_memcpy_fromcard(void *tp, unsigned long fp, int count)
 {
 	unsigned short *to = tp;
-	const volatile unsigned short *from = fp;
+	const volatile unsigned short *from = (const void *)fp;

 	count++;
 	count /= 2;
-- 
1.6.0.4
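
For readers following the sane_block_input() hunk: both halves of a wrapped
receive are reads from the card, which is why the second copy should be a
"from" copy as well. A userspace sketch of that wrapped read, with
hypothetical names (ring_read() and its byte offsets are not from the
driver), might be:

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of a wrapped read from a receive ring.
 * ring_read() and its parameters are hypothetical; offsets are in bytes
 * relative to the start of `ring`, and [ring_start, ring_end) is the
 * region that wraps.
 */
static void ring_read(void *dst, const unsigned char *ring,
		      size_t ring_start, size_t ring_end,
		      size_t xfer_start, size_t count)
{
	unsigned char *out = dst;

	if (xfer_start + count > ring_end) {
		/* First part: from xfer_start up to the end of the ring. */
		size_t semi_count = ring_end - xfer_start;

		memcpy(out, ring + xfer_start, semi_count);
		/* Second part: the remainder, read from the ring start. */
		memcpy(out + semi_count, ring + ring_start,
		       count - semi_count);
	} else {
		memcpy(out, ring + xfer_start, count);
	}
}
```

The destination is the packet buffer in both branches; only the source
offset changes when the transfer crosses the end of the ring.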

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply related

* [PATCH Resubmission v2] drivers/net/usb: Add new driver ipheth
From: L. Alberto Giménez @ 2010-04-15 19:46 UTC (permalink / raw)
  Cc: dgiagio, dborca, David S. Miller, James Bottomley, Ralf Baechle,
	Greg Kroah-Hartman, Jonas Sjöquist, Torgny Johansson,
	Steve Glendinning, David Brownell, Omar Laazimani,
	Rémi Denis-Courmont, L. Alberto Giménez, netdev,
	linux-kernel, linux-usb
In-Reply-To: <1269984864-28159-1-git-send-email-agimenez@sysvalve.es>

From: dborca@yahoo.com

Add new driver to use tethering with an iPhone device. After initial submission,
apply fixes to fit the new driver into the kernel standards.

There are still a couple of minor (almost cosmetic-level) issues, but the driver
is fully functional right now.

Signed-off-by: L. Alberto Giménez <agimenez@sysvalve.es>
---
 drivers/net/Makefile     |    1 +
 drivers/net/usb/Kconfig  |   12 +
 drivers/net/usb/Makefile |    1 +
 drivers/net/usb/ipheth.c |  568 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 582 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/usb/ipheth.c

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a583b50..12b280a 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -273,6 +273,7 @@ obj-$(CONFIG_USB_RTL8150)       += usb/
 obj-$(CONFIG_USB_HSO)		+= usb/
 obj-$(CONFIG_USB_USBNET)        += usb/
 obj-$(CONFIG_USB_ZD1201)        += usb/
+obj-$(CONFIG_USB_IPHETH)        += usb/
 
 obj-y += wireless/
 obj-$(CONFIG_NET_TULIP) += tulip/
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index ba56ce4..63be4ca 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -385,4 +385,16 @@ config USB_CDC_PHONET
 	  cellular modem, as found on most Nokia handsets with the
 	  "PC suite" USB profile.
 
+config USB_IPHETH
+	tristate "Apple iPhone USB Ethernet driver"
+	default n
+	---help---
+	  Module used to share Internet connection (tethering) from your
+	  iPhone (Original, 3G and 3GS) to your system.
+	  Note that you need userspace libraries and programs that can pair
+	  your device with your system and that understand the iPhone
+	  protocol.
+
+	  For more information: http://giagio.com/wiki/moin.cgi/iPhoneEthernetDriver
+
 endmenu
diff --git a/drivers/net/usb/Makefile b/drivers/net/usb/Makefile
index 82ea629..edb09c0 100644
--- a/drivers/net/usb/Makefile
+++ b/drivers/net/usb/Makefile
@@ -23,4 +23,5 @@ obj-$(CONFIG_USB_NET_MCS7830)	+= mcs7830.o
 obj-$(CONFIG_USB_USBNET)	+= usbnet.o
 obj-$(CONFIG_USB_NET_INT51X1)	+= int51x1.o
 obj-$(CONFIG_USB_CDC_PHONET)	+= cdc-phonet.o
+obj-$(CONFIG_USB_IPHETH)	+= ipheth.o
 
diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c
new file mode 100644
index 0000000..fd10331
--- /dev/null
+++ b/drivers/net/usb/ipheth.c
@@ -0,0 +1,568 @@
+/*
+ * ipheth.c - Apple iPhone USB Ethernet driver
+ *
+ * Copyright (c) 2009 Diego Giagio <diego@giagio.com>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of GIAGIO.COM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * The provided data structures and external interfaces from this code
+ * are not restricted to be used by modules with a GPL compatible license.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ *
+ * Attention: iPhone device must be paired, otherwise it won't respond to our
+ * driver. For more info: http://giagio.com/wiki/moin.cgi/iPhoneEthernetDriver
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/usb.h>
+#include <linux/workqueue.h>
+
+#define USB_VENDOR_APPLE        0x05ac
+#define USB_PRODUCT_IPHONE      0x1290
+#define USB_PRODUCT_IPHONE_3G   0x1292
+#define USB_PRODUCT_IPHONE_3GS  0x1294
+
+#define IPHETH_USBINTF_CLASS    255
+#define IPHETH_USBINTF_SUBCLASS 253
+#define IPHETH_USBINTF_PROTO    1
+
+#define IPHETH_BUF_SIZE         1516
+#define IPHETH_TX_TIMEOUT       (5 * HZ)
+
+#define IPHETH_INTFNUM          2
+#define IPHETH_ALT_INTFNUM      1
+
+#define IPHETH_CTRL_ENDP        0x00
+#define IPHETH_CTRL_BUF_SIZE    0x40
+#define IPHETH_CTRL_TIMEOUT     (5 * HZ)
+
+#define IPHETH_CMD_GET_MACADDR   0x00
+#define IPHETH_CMD_CARRIER_CHECK 0x45
+
+#define IPHETH_CARRIER_CHECK_TIMEOUT round_jiffies_relative(1 * HZ)
+#define IPHETH_CARRIER_ON       0x04
+
+static struct usb_device_id ipheth_table[] = {
+	{ USB_DEVICE_AND_INTERFACE_INFO(
+		USB_VENDOR_APPLE, USB_PRODUCT_IPHONE,
+		IPHETH_USBINTF_CLASS, IPHETH_USBINTF_SUBCLASS,
+		IPHETH_USBINTF_PROTO) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(
+		USB_VENDOR_APPLE, USB_PRODUCT_IPHONE_3G,
+		IPHETH_USBINTF_CLASS, IPHETH_USBINTF_SUBCLASS,
+		IPHETH_USBINTF_PROTO) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(
+		USB_VENDOR_APPLE, USB_PRODUCT_IPHONE_3GS,
+		IPHETH_USBINTF_CLASS, IPHETH_USBINTF_SUBCLASS,
+		IPHETH_USBINTF_PROTO) },
+	{ }
+};
+MODULE_DEVICE_TABLE(usb, ipheth_table);
+
+struct ipheth_device {
+	struct usb_device *udev;
+	struct usb_interface *intf;
+	struct net_device *net;
+	struct sk_buff *tx_skb;
+	struct urb *tx_urb;
+	struct urb *rx_urb;
+	unsigned char *tx_buf;
+	unsigned char *rx_buf;
+	unsigned char *ctrl_buf;
+	u8 bulk_in;
+	u8 bulk_out;
+	struct delayed_work carrier_work;
+};
+
+static int ipheth_rx_submit(struct ipheth_device *dev, gfp_t mem_flags);
+
+static int ipheth_alloc_urbs(struct ipheth_device *iphone)
+{
+	struct urb *tx_urb = NULL;
+	struct urb *rx_urb = NULL;
+	u8 *tx_buf = NULL;
+	u8 *rx_buf = NULL;
+
+	tx_urb = usb_alloc_urb(0, GFP_KERNEL);
+	if (tx_urb == NULL)
+		goto error;
+
+	rx_urb = usb_alloc_urb(0, GFP_KERNEL);
+	if (rx_urb == NULL)
+		goto error;
+
+	tx_buf = usb_buffer_alloc(iphone->udev,
+				  IPHETH_BUF_SIZE,
+				  GFP_KERNEL,
+				  &tx_urb->transfer_dma);
+	if (tx_buf == NULL)
+		goto error;
+
+	rx_buf = usb_buffer_alloc(iphone->udev,
+				  IPHETH_BUF_SIZE,
+				  GFP_KERNEL,
+				  &rx_urb->transfer_dma);
+	if (rx_buf == NULL)
+		goto error;
+
+
+	iphone->tx_urb = tx_urb;
+	iphone->rx_urb = rx_urb;
+	iphone->tx_buf = tx_buf;
+	iphone->rx_buf = rx_buf;
+	return 0;
+
+error:
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, rx_buf,
+			rx_urb->transfer_dma);
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, tx_buf,
+			tx_urb->transfer_dma);
+	usb_free_urb(rx_urb);
+	usb_free_urb(tx_urb);
+	return -ENOMEM;
+}
+
+static void ipheth_free_urbs(struct ipheth_device *iphone)
+{
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, iphone->rx_buf,
+			iphone->rx_urb->transfer_dma);
+	usb_buffer_free(iphone->udev, IPHETH_BUF_SIZE, iphone->tx_buf,
+			iphone->tx_urb->transfer_dma);
+	usb_free_urb(iphone->rx_urb);
+	usb_free_urb(iphone->tx_urb);
+}
+
+static void ipheth_kill_urbs(struct ipheth_device *dev)
+{
+	usb_kill_urb(dev->tx_urb);
+	usb_kill_urb(dev->rx_urb);
+}
+
+static void ipheth_rcvbulk_callback(struct urb *urb)
+{
+	struct ipheth_device *dev;
+	struct sk_buff *skb;
+	int status;
+	char *buf;
+	int len;
+
+	dev = urb->context;
+	if (dev == NULL)
+		return;
+
+	status = urb->status;
+	switch (status) {
+	case -ENOENT:
+	case -ECONNRESET:
+	case -ESHUTDOWN:
+		return;
+	case 0:
+		break;
+	default:
+		err("%s: urb status: %d", __func__, urb->status);
+		return;
+	}
+
+	len = urb->actual_length;
+	buf = urb->transfer_buffer;
+
+	skb = dev_alloc_skb(NET_IP_ALIGN + len);
+	if (!skb) {
+		err("%s: dev_alloc_skb: -ENOMEM", __func__);
+		dev->net->stats.rx_dropped++;
+		return;
+	}
+
+	skb_reserve(skb, NET_IP_ALIGN);
+	memcpy(skb_put(skb, len), buf + NET_IP_ALIGN, len - NET_IP_ALIGN);
+	skb->dev = dev->net;
+	skb->protocol = eth_type_trans(skb, dev->net);
+
+	dev->net->stats.rx_packets++;
+	dev->net->stats.rx_bytes += len;
+
+	netif_rx(skb);
+	ipheth_rx_submit(dev, GFP_ATOMIC);
+}
+
+static void ipheth_sndbulk_callback(struct urb *urb)
+{
+	struct ipheth_device *dev;
+
+	dev = urb->context;
+	if (dev == NULL)
+		return;
+
+	if (urb->status != 0 &&
+	    urb->status != -ENOENT &&
+	    urb->status != -ECONNRESET &&
+	    urb->status != -ESHUTDOWN)
+		err("%s: urb status: %d", __func__, urb->status);
+
+	dev_kfree_skb_irq(dev->tx_skb);
+	netif_wake_queue(dev->net);
+}
+
+static int ipheth_carrier_set(struct ipheth_device *dev)
+{
+	struct usb_device *udev = dev->udev;
+	int retval;
+
+	retval = usb_control_msg(udev,
+			usb_rcvctrlpipe(udev, IPHETH_CTRL_ENDP),
+			IPHETH_CMD_CARRIER_CHECK, /* request */
+			0xc0, /* request type */
+			0x00, /* value */
+			0x02, /* index */
+			dev->ctrl_buf, IPHETH_CTRL_BUF_SIZE,
+			IPHETH_CTRL_TIMEOUT);
+	if (retval < 0) {
+		err("%s: usb_control_msg: %d", __func__, retval);
+		return retval;
+	}
+
+	if (dev->ctrl_buf[0] == IPHETH_CARRIER_ON)
+		netif_carrier_on(dev->net);
+	else
+		netif_carrier_off(dev->net);
+
+	return 0;
+}
+
+static void ipheth_carrier_check_work(struct work_struct *work)
+{
+	struct ipheth_device *dev = container_of(work, struct ipheth_device,
+						 carrier_work.work);
+
+	ipheth_carrier_set(dev);
+	schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);
+}
+
+static int ipheth_get_macaddr(struct ipheth_device *dev)
+{
+	struct usb_device *udev = dev->udev;
+	struct net_device *net = dev->net;
+	int retval;
+
+	retval = usb_control_msg(udev,
+				 usb_rcvctrlpipe(udev, IPHETH_CTRL_ENDP),
+				 IPHETH_CMD_GET_MACADDR, /* request */
+				 0xc0, /* request type */
+				 0x00, /* value */
+				 0x02, /* index */
+				 dev->ctrl_buf,
+				 IPHETH_CTRL_BUF_SIZE,
+				 IPHETH_CTRL_TIMEOUT);
+	if (retval < 0) {
+		err("%s: usb_control_msg: %d", __func__, retval);
+	} else if (retval < ETH_ALEN) {
+		err("%s: usb_control_msg: short packet: %d bytes",
+			__func__, retval);
+		retval = -EINVAL;
+	} else {
+		memcpy(net->dev_addr, dev->ctrl_buf, ETH_ALEN);
+		retval = 0;
+	}
+
+	return retval;
+}
+
+static int ipheth_rx_submit(struct ipheth_device *dev, gfp_t mem_flags)
+{
+	struct usb_device *udev = dev->udev;
+	int retval;
+
+	usb_fill_bulk_urb(dev->rx_urb, udev,
+			  usb_rcvbulkpipe(udev, dev->bulk_in),
+			  dev->rx_buf, IPHETH_BUF_SIZE,
+			  ipheth_rcvbulk_callback,
+			  dev);
+	dev->rx_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+
+	retval = usb_submit_urb(dev->rx_urb, mem_flags);
+	if (retval)
+		err("%s: usb_submit_urb: %d", __func__, retval);
+	return retval;
+}
+
+static int ipheth_open(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	struct usb_device *udev = dev->udev;
+	int retval = 0;
+
+	usb_set_interface(udev, IPHETH_INTFNUM, IPHETH_ALT_INTFNUM);
+
+	retval = ipheth_carrier_set(dev);
+	if (retval)
+		return retval;
+
+	retval = ipheth_rx_submit(dev, GFP_KERNEL);
+	if (retval)
+		return retval;
+
+	schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT);
+	netif_start_queue(net);
+	return retval;
+}
+
+static int ipheth_close(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+
+	cancel_delayed_work_sync(&dev->carrier_work);
+	netif_stop_queue(net);
+	return 0;
+}
+
+static int ipheth_tx(struct sk_buff *skb, struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	struct usb_device *udev = dev->udev;
+	int retval;
+
+	/* Paranoid */
+	if (skb->len > IPHETH_BUF_SIZE) {
+		WARN(1, "%s: skb too large: %d bytes", __func__, skb->len);
+		dev->net->stats.tx_dropped++;
+		dev_kfree_skb_irq(skb);
+		return NETDEV_TX_OK;
+	}
+
+	memcpy(dev->tx_buf, skb->data, skb->len);
+	if (skb->len < IPHETH_BUF_SIZE)
+		memset(dev->tx_buf + skb->len, 0, IPHETH_BUF_SIZE - skb->len);
+
+	usb_fill_bulk_urb(dev->tx_urb, udev,
+			  usb_sndbulkpipe(udev, dev->bulk_out),
+			  dev->tx_buf, IPHETH_BUF_SIZE,
+			  ipheth_sndbulk_callback,
+			  dev);
+	dev->tx_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+
+	retval = usb_submit_urb(dev->tx_urb, GFP_ATOMIC);
+	if (retval) {
+		err("%s: usb_submit_urb: %d", __func__, retval);
+		dev->net->stats.tx_errors++;
+		dev_kfree_skb_irq(skb);
+	} else {
+		dev->tx_skb = skb;
+
+		dev->net->stats.tx_packets++;
+		dev->net->stats.tx_bytes += skb->len;
+		netif_stop_queue(net);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static void ipheth_tx_timeout(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+
+	err("%s: TX timeout", __func__);
+	dev->net->stats.tx_errors++;
+	usb_unlink_urb(dev->tx_urb);
+}
+
+static struct net_device_stats *ipheth_stats(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	return &dev->net->stats;
+}
+
+static u32 ipheth_ethtool_op_get_link(struct net_device *net)
+{
+	struct ipheth_device *dev = netdev_priv(net);
+	return netif_carrier_ok(dev->net);
+}
+
+static struct ethtool_ops ops = {
+	.get_link = ipheth_ethtool_op_get_link
+};
+
+static const struct net_device_ops ipheth_netdev_ops = {
+	.ndo_open = &ipheth_open,
+	.ndo_stop = &ipheth_close,
+	.ndo_start_xmit = &ipheth_tx,
+	.ndo_tx_timeout = &ipheth_tx_timeout,
+	.ndo_get_stats = &ipheth_stats,
+};
+
+static struct device_type ipheth_type = {
+	.name	= "wwan",
+};
+
+static int ipheth_probe(struct usb_interface *intf,
+			const struct usb_device_id *id)
+{
+	struct usb_device *udev = interface_to_usbdev(intf);
+	struct usb_host_interface *hintf;
+	struct usb_endpoint_descriptor *endp;
+	struct ipheth_device *dev;
+	struct net_device *netdev;
+	int i;
+	int retval;
+
+	netdev = alloc_etherdev(sizeof(struct ipheth_device));
+	if (!netdev)
+		return -ENOMEM;
+
+	netdev->netdev_ops = &ipheth_netdev_ops;
+	netdev->watchdog_timeo = IPHETH_TX_TIMEOUT;
+	strcpy(netdev->name, "wwan%d");
+
+	dev = netdev_priv(netdev);
+	dev->udev = udev;
+	dev->net = netdev;
+	dev->intf = intf;
+
+	/* Set up endpoints */
+	hintf = usb_altnum_to_altsetting(intf, IPHETH_ALT_INTFNUM);
+	if (hintf == NULL) {
+		retval = -ENODEV;
+		err("Unable to find alternate settings interface");
+		goto err_endpoints;
+	}
+
+	for (i = 0; i < hintf->desc.bNumEndpoints; i++) {
+		endp = &hintf->endpoint[i].desc;
+		if (usb_endpoint_is_bulk_in(endp))
+			dev->bulk_in = endp->bEndpointAddress;
+		else if (usb_endpoint_is_bulk_out(endp))
+			dev->bulk_out = endp->bEndpointAddress;
+	}
+	if (!(dev->bulk_in && dev->bulk_out)) {
+		retval = -ENODEV;
+		err("Unable to find endpoints");
+		goto err_endpoints;
+	}
+
+	dev->ctrl_buf = kmalloc(IPHETH_CTRL_BUF_SIZE, GFP_KERNEL);
+	if (dev->ctrl_buf == NULL) {
+		retval = -ENOMEM;
+		goto err_alloc_ctrl_buf;
+	}
+
+	retval = ipheth_get_macaddr(dev);
+	if (retval)
+		goto err_get_macaddr;
+
+	INIT_DELAYED_WORK(&dev->carrier_work, ipheth_carrier_check_work);
+
+	retval = ipheth_alloc_urbs(dev);
+	if (retval) {
+		err("error allocating urbs: %d", retval);
+		goto err_alloc_urbs;
+	}
+
+	usb_set_intfdata(intf, dev);
+
+	SET_NETDEV_DEV(netdev, &intf->dev);
+	SET_ETHTOOL_OPS(netdev, &ops);
+	SET_NETDEV_DEVTYPE(netdev, &ipheth_type);
+
+	retval = register_netdev(netdev);
+	if (retval) {
+		err("error registering netdev: %d", retval);
+		retval = -EIO;
+		goto err_register_netdev;
+	}
+
+	dev_info(&intf->dev, "Apple iPhone USB Ethernet device attached\n");
+	return 0;
+
+err_register_netdev:
+	ipheth_free_urbs(dev);
+err_alloc_urbs:
+err_get_macaddr:
+err_alloc_ctrl_buf:
+	kfree(dev->ctrl_buf);
+err_endpoints:
+	free_netdev(netdev);
+	return retval;
+}
+
+static void ipheth_disconnect(struct usb_interface *intf)
+{
+	struct ipheth_device *dev;
+
+	dev = usb_get_intfdata(intf);
+	if (dev != NULL) {
+		unregister_netdev(dev->net);
+		ipheth_kill_urbs(dev);
+		ipheth_free_urbs(dev);
+		kfree(dev->ctrl_buf);
+		free_netdev(dev->net);
+	}
+	usb_set_intfdata(intf, NULL);
+	dev_info(&intf->dev, "Apple iPhone USB Ethernet now disconnected\n");
+}
+
+static struct usb_driver ipheth_driver = {
+	.name =		"ipheth",
+	.probe =	ipheth_probe,
+	.disconnect =	ipheth_disconnect,
+	.id_table =	ipheth_table,
+};
+
+static int __init ipheth_init(void)
+{
+	int retval;
+
+	retval = usb_register(&ipheth_driver);
+	if (retval) {
+		err("usb_register failed: %d", retval);
+		return retval;
+	}
+	return 0;
+}
+
+static void __exit ipheth_exit(void)
+{
+	usb_deregister(&ipheth_driver);
+}
+
+module_init(ipheth_init);
+module_exit(ipheth_exit);
+
+MODULE_AUTHOR("Diego Giagio <diego@giagio.com>");
+MODULE_DESCRIPTION("Apple iPhone USB Ethernet driver");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
1.7.0

^ permalink raw reply related

* [PATCH] rdma/cm: Randomize local port allocation.
From: Sean Hefty @ 2010-04-15 19:55 UTC (permalink / raw)
  To: 'Tetsuo Handa'
  Cc: amwang-H+wXaHxf7aLQT0dZR+AlfA, opurdila-+zzKsuq53OdBDgjK7y7TUQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	rolandd-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201004150229.o3F2T4dZ054768-etx+eQDEXHD7nzcFbJAaVXf5DAMn2ifp@public.gmane.org>

From: Tetsuo Handa <penguin-kernel-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>

>Randomize local port allocation the way sctp_get_port_local() does.
>Update rover at the end of loop since we're likely to pick a valid port
>on the first try.
>
>Signed-off-by: Tetsuo Handa <penguin-kernel-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

>---

I like this version, thanks!  I'm not sure which tree to merge it through.
Do you need this for 2.6.34, or is 2.6.35 okay?

> drivers/infiniband/core/cma.c |   70 +++++++++++++++---------------------------
> 1 file changed, 25 insertions(+), 45 deletions(-)
>
>--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
>+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
>@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
> static DEFINE_IDR(tcp_ps);
> static DEFINE_IDR(udp_ps);
> static DEFINE_IDR(ipoib_ps);
>-static int next_port;
>
> struct cma_device {
> 	struct list_head	list;
>@@ -1970,47 +1969,33 @@ err1:
>
> static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
> {
>-	struct rdma_bind_list *bind_list;
>-	int port, ret, low, high;
>-
>-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
>-	if (!bind_list)
>-		return -ENOMEM;
>-
>-retry:
>-	/* FIXME: add proper port randomization per like inet_csk_get_port */
>-	do {
>-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>-
>-	if (ret)
>-		goto err1;
>+	static unsigned int last_used_port;
>+	int low, high, remaining;
>+	unsigned int rover;
>
> 	inet_get_local_port_range(&low, &high);
>-	if (port > high) {
>-		if (next_port != low) {
>-			idr_remove(ps, port);
>-			next_port = low;
>-			goto retry;
>-		}
>-		ret = -EADDRNOTAVAIL;
>-		goto err2;
>+	remaining = (high - low) + 1;
>+	rover = net_random() % remaining + low;
>+retry:
>+	if (last_used_port != rover &&
>+	    !idr_find(ps, (unsigned short) rover)) {
>+		int ret = cma_alloc_port(ps, id_priv, rover);
>+		/*
>+		 * Remember previously used port number in order to avoid
>+		 * re-using same port immediately after it is closed.
>+		 */
>+		if (!ret)
>+			last_used_port = rover;
>+		if (ret != -EADDRNOTAVAIL)
>+			return ret;
> 	}
>-
>-	if (port == high)
>-		next_port = low;
>-	else
>-		next_port = port + 1;
>-
>-	bind_list->ps = ps;
>-	bind_list->port = (unsigned short) port;
>-	cma_bind_port(bind_list, id_priv);
>-	return 0;
>-err2:
>-	idr_remove(ps, port);
>-err1:
>-	kfree(bind_list);
>-	return ret;
>+	if (--remaining) {
>+		rover++;
>+		if ((rover < low) || (rover > high))
>+			rover = low;
>+		goto retry;
>+	}
>+	return -EADDRNOTAVAIL;
> }
>
> static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
>@@ -2995,12 +2980,7 @@ static void cma_remove_one(struct ib_dev
>
> static int __init cma_init(void)
> {
>-	int ret, low, high, remaining;
>-
>-	get_random_bytes(&next_port, sizeof next_port);
>-	inet_get_local_port_range(&low, &high);
>-	remaining = (high - low) + 1;
>-	next_port = ((unsigned int) next_port % remaining) + low;
>+	int ret;
>
> 	cma_wq = create_singlethread_workqueue("rdma_cm");
> 	if (!cma_wq)
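
For readers following the quoted patch, here is a minimal userspace sketch
of the rover-based search it introduces. It is not the kernel code: the
port_in_use[] array is a hypothetical stand-in for the idr lookup, rand()
stands in for net_random(), and alloc_any_port() is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PORT_SPACE 65536

/* Hypothetical stand-in for the idr: one flag per port number. */
static bool port_in_use[PORT_SPACE];
/* Most recently handed-out port, skipped to avoid immediate reuse. */
static unsigned int last_used_port;

/*
 * Pick a free port in [low, high]. The search starts at a random rover
 * and examines each port in the range at most once, wrapping from high
 * back to low. Returns the port, or -1 if nothing suitable is free.
 */
static int alloc_any_port(int low, int high)
{
	int remaining;
	unsigned int rover;

	assert(low > 0 && low <= high && high < PORT_SPACE);
	remaining = (high - low) + 1;
	rover = (unsigned int)rand() % remaining + low;

	while (remaining-- > 0) {
		if (rover != last_used_port && !port_in_use[rover]) {
			port_in_use[rover] = true;
			last_used_port = rover;
			return (int)rover;
		}
		/* Advance the rover only after a failed attempt. */
		if (++rover > (unsigned int)high)
			rover = low;
	}
	return -1;
}
```

As in the patch, the rover is only advanced after a failed attempt, since
the first candidate is usually free when the range is sparsely used.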


^ permalink raw reply

* pull request: wireless-2.6 2010-04-15
From: John W. Linville @ 2010-04-15 20:03 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

Dave,

Another fix intended for 2.6.34...without it some firmware weirdness can
induce the driver into hanging the box... :-(

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 4eaa0e3c869acd5dbc7c2e3818a9ae9cbf221d27:
  Eric Dumazet (1):
        fib: suppress lockdep-RCU false positive in FIB trie.

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git master

Johannes Berg (1):
      iwlwifi: work around bogus active chains detection

 drivers/net/wireless/iwlwifi/iwl-calib.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-calib.c b/drivers/net/wireless/iwlwifi/iwl-calib.c
index 845831a..64de42b 100644
--- a/drivers/net/wireless/iwlwifi/iwl-calib.c
+++ b/drivers/net/wireless/iwlwifi/iwl-calib.c
@@ -807,6 +807,18 @@ void iwl_chain_noise_calibration(struct iwl_priv *priv,
 		}
 	}
 
+	/*
+	 * The above algorithm sometimes fails when the ucode
+	 * reports 0 for all chains. It's not clear why that
+	 * happens to start with, but it is then causing trouble
+	 * because this can make us enable more chains than the
+	 * hardware really has.
+	 *
+	 * To be safe, simply mask out any chains that we know
+	 * are not on the device.
+	 */
+	active_chains &= priv->hw_params.valid_rx_ant;
+
 	num_tx_chains = 0;
 	for (i = 0; i < NUM_RX_CHAINS; i++) {
 		/* loops on all the bits of
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
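
The effect of the one-line fix can be illustrated with a small standalone
sketch. This is not driver code: count_valid_chains() is a hypothetical
helper, and NUM_RX_CHAINS is assumed to be 3 here purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RX_CHAINS 3	/* assumed value for this sketch */

/*
 * Mask out chains the hardware does not have, then count what is left.
 * active_chains: bitmask produced by the calibration step (bit i = chain i)
 * valid_rx_ant:  bitmask of RX antennas actually present on the device
 */
static unsigned int count_valid_chains(uint8_t active_chains,
				       uint8_t valid_rx_ant)
{
	unsigned int i, n = 0;

	/* The valid mask itself must not name chains beyond the maximum. */
	assert((valid_rx_ant >> NUM_RX_CHAINS) == 0);

	active_chains &= valid_rx_ant;
	for (i = 0; i < NUM_RX_CHAINS; i++)
		if (active_chains & (1u << i))
			n++;
	return n;
}
```

With a bogus detection result of 0x7 on a two-antenna device (valid mask
0x3), the mask leaves only the two chains that really exist.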

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-15 20:16 UTC (permalink / raw)
  To: Rick Jones
  Cc: David Miller, eric.dumazet, therbert, netdev, robert, xiaosuo,
	andi
In-Reply-To: <4BC741AE.3000108@hp.com>

On Thu, 2010-04-15 at 09:41 -0700, Rick Jones wrote:

> IPS (~= RPS) was running on shared FSB HP9000's.  Now, that was also a BSD 
> networking stack with netisrq's and the like.  TOPS (~= RFS) was also run on 
> shared FSB HP9000s, as well as CC-NUMA HP9000s and Integrity systems.  TOPS was 
> implemented in a Streams-based stack tracing its history to a common ancestor 
> with Solaris (Mentat).

Sounds interesting.
Wikipedia information overload. Any arch description of the HP9000? 
Did your scheme use IPIs to message the other CPUs?

cheers,
jamal 


^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Rick Jones @ 2010-04-15 20:25 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, eric.dumazet, therbert, netdev, robert, xiaosuo,
	andi
In-Reply-To: <1271362581.23780.12.camel@bigi>

jamal wrote:
> On Thu, 2010-04-15 at 09:41 -0700, Rick Jones wrote:
> 
> 
>>IPS (~= RPS) was running on shared FSB HP9000's.  Now, that was also a BSD 
>>networking stack with netisrq's and the like.  TOPS (~= RFS) was also run on 
>>shared FSB HP9000s, as well as CC-NUMA HP9000s and Integrity systems.  TOPS was 
>>implemented in a Streams-based stack tracing its history to a common ancestor 
>>with Solaris (Mentat).
> 
> 
> Sounds interesting.
> Wikipedia information overload. Any arch description of the HP9000? 

I should have been more specific - HP 9000 Model 800's :) PA-RISC based business 
computers running HP-UX.  In the case of IPS, HP-UX 10.20 ca 1995 or so.

> Did your scheme use IPIs to message the other CPUs?

Netisrs were kernel processes one per CPU (back then a core, a processor and a 
CPU were one and the same :), and while we didn't call them IPI's, yes, it was a 
"soft interrupt" directed at the given processor to launch the netisr if it 
wasn't already running.

TOPS was similar, but was built on Streams; that did/does have some kernel
processes, but not everything would happen as a kernel process.

rick jones

HP 3000 Model 900's - by and large the same PA-RISC hardware but running MPE/XL 
(later called MPE/iX)
HP 9000 Model 700's - PA-RISC based workstations
HP 9000 Model 300's - Moto 68K-based workstations (replaced by the 700s)

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-15 20:30 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <20100415.020619.00349859.davem@davemloft.net>

Le jeudi 15 avril 2010 à 02:06 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:51:47 +0200
> 
> > In any case, I think there is a fundamental problem with this sk
> > caching. Because one packet can travel in many stacked devices before
> > hitting the wire.
> > 
> > (bonding, vlan, ethernet) for example.
> > 
> > Socket cache is meaningful for one level only...
> 
> We were talking the other day about that 'tun' change to orphan the
> SKB on TX, and I mentioned the possibility of just doing this in some
> generic location before we give the packet to the device ->xmit()
> method.
> 
> Such a scheme could help with this problem too.

Same thing we did with 

if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
	skb_dst_drop(skb);

in dev_hard_start_xmit() ?

Problem is this skb_tstamp_tx() thing....

One possibility would be to change skb_orphan() to let everything be done
by the destructor.

One more argument would let destructor() know if this is the final
destructor() call, made from skb_release_head_state().

destructor() would be responsible for setting skb->destructor and/or
skb->sk to NULL when possible.


All normal destructors would not care about this 2nd argument and would
just do what they do today, plus setting skb->sk = NULL and
skb->destructor = NULL.

Fast path would stay as today, no extra test.

Only tstamp users would need to set up another destructor, a bit more
complex (it would have to take a look at 2nd argument before really
doing the job)
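
To make the two-argument idea concrete, here is a toy userspace model. It
is only a sketch: the struct layouts are reduced to what the example needs,
and tstamp_wfree() is a hypothetical illustration of the "look at the 2nd
argument first" destructor, which the proposal itself leaves unfinished.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (assumptions for the sketch). */
struct sock {
	int wmem_alloc;
};

struct sk_buff {
	struct sock *sk;
	void (*destructor)(struct sk_buff *skb, int final);
	int truesize;
};

/* Normal destructor: ignores 'final', uncharges and detaches right away. */
static void plain_wfree(struct sk_buff *skb, int final)
{
	(void)final;
	assert(skb->sk != NULL);
	skb->sk->wmem_alloc -= skb->truesize;
	skb->sk = NULL;
	skb->destructor = NULL;
}

/* Hypothetical timestamping destructor: stays attached until the final call. */
static void tstamp_wfree(struct sk_buff *skb, int final)
{
	if (!final)
		return;	/* early orphan: keep skb->sk for the timestamp */
	plain_wfree(skb, final);
}

/* Early orphan, e.g. just before handing the skb to the driver. */
static void skb_orphan(struct sk_buff *skb)
{
	if (skb->destructor)
		skb->destructor(skb, 0);
}

/* Final release when the skb is freed. */
static void skb_release_head_state(struct sk_buff *skb)
{
	if (skb->destructor)
		skb->destructor(skb, 1);
}
```

In this model skb_orphan() detaches an skb that uses plain_wfree() at once,
while one that uses tstamp_wfree() keeps its socket reference until
skb_release_head_state() runs.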


Completely untested patch to get the idea :
(to be completed for the tstamp thing)


 include/linux/skbuff.h         |    8 +++-----
 include/net/sctp/sctp.h        |    2 +-
 include/net/sock.h             |    4 ++--
 net/caif/caif_socket.c         |    4 +++-
 net/core/dev.c                 |   26 +++++++++-----------------
 net/core/skbuff.c              |    2 +-
 net/core/sock.c                |    8 ++++++--
 net/l2tp/l2tp_core.c           |    4 +++-
 net/netfilter/nf_tproxy_core.c |    2 +-
 net/packet/af_packet.c         |    4 ++--
 net/unix/af_unix.c             |    4 ++--
 11 files changed, 33 insertions(+), 35 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 38501d2..7dfd833 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -353,7 +353,7 @@ struct sk_buff {
 	kmemcheck_bitfield_end(flags1);
 	__be16			protocol;
 
-	void			(*destructor)(struct sk_buff *skb);
+	void			(*destructor)(struct sk_buff *skb, int final);
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct nf_conntrack	*nfct;
 	struct sk_buff		*nfct_reasm;
@@ -1407,15 +1407,13 @@ static inline void pskb_trim_unique(struct sk_buff *skb, unsigned int len)
  *	@skb: buffer to orphan
  *
  *	If a buffer currently has an owner then we call the owner's
- *	destructor function and make the @skb unowned. The buffer continues
+ *	destructor function. The buffer continues
  *	to exist but is no longer charged to its former owner.
  */
 static inline void skb_orphan(struct sk_buff *skb)
 {
 	if (skb->destructor)
-		skb->destructor(skb);
-	skb->destructor = NULL;
-	skb->sk		= NULL;
+		skb->destructor(skb, 0);
 }
 
 /**
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 5915155..49e2162 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -130,7 +130,7 @@ int sctp_inet_listen(struct socket *sock, int backlog);
 void sctp_write_space(struct sock *sk);
 unsigned int sctp_poll(struct file *file, struct socket *sock,
 		poll_table *wait);
-void sctp_sock_rfree(struct sk_buff *skb);
+void sctp_sock_rfree(struct sk_buff *skb, int final);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
 extern struct percpu_counter sctp_sockets_allocated;
diff --git a/include/net/sock.h b/include/net/sock.h
index 56df440..a042c9d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -988,8 +988,8 @@ extern struct sk_buff		*sock_wmalloc(struct sock *sk,
 extern struct sk_buff		*sock_rmalloc(struct sock *sk,
 					      unsigned long size, int force,
 					      gfp_t priority);
-extern void			sock_wfree(struct sk_buff *skb);
-extern void			sock_rfree(struct sk_buff *skb);
+extern void			sock_wfree(struct sk_buff *skb, int final);
+extern void			sock_rfree(struct sk_buff *skb, int final);
 
 extern int			sock_setsockopt(struct socket *sock, int level,
 						int op, char __user *optval,
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index cdf62b9..4e7276a 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -256,10 +256,12 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 	}
 }
 
-static void skb_destructor(struct sk_buff *skb)
+static void skb_destructor(struct sk_buff *skb, int final)
 {
 	dbfs_atomic_inc(&cnt.skb_free);
 	dbfs_atomic_dec(&cnt.skb_in_use);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..9bffbe5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1828,12 +1828,12 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 }
 
 struct dev_gso_cb {
-	void (*destructor)(struct sk_buff *skb);
+	void (*destructor)(struct sk_buff *skb, int final);
 };
 
 #define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
 
-static void dev_gso_skb_destructor(struct sk_buff *skb)
+static void dev_gso_skb_destructor(struct sk_buff *skb, int final)
 {
 	struct dev_gso_cb *cb;
 
@@ -1847,7 +1847,9 @@ static void dev_gso_skb_destructor(struct sk_buff *skb)
 
 	cb = DEV_GSO_CB(skb);
 	if (cb->destructor)
-		cb->destructor(skb);
+		cb->destructor(skb, final);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 
 /**
@@ -1904,23 +1906,11 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
+		skb_orphan(skb);
+
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
-		/*
-		 * TODO: if skb_orphan() was called by
-		 * dev->hard_start_xmit() (for example, the unmodified
-		 * igb driver does that; bnx2 doesn't), then
-		 * skb_tx_software_timestamp() will be unable to send
-		 * back the time stamp.
-		 *
-		 * How can this be prevented? Always create another
-		 * reference to the socket before calling
-		 * dev->hard_start_xmit()? Prevent that skb_orphan()
-		 * does anything in dev->hard_start_xmit() by clearing
-		 * the skb destructor before the call and restoring it
-		 * afterwards, then doing the skb_orphan() ourselves?
-		 */
 		return rc;
 	}
 
@@ -1938,6 +1928,8 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan(nskb);
+
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bdea0ef..90c171f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -396,7 +396,7 @@ static void skb_release_head_state(struct sk_buff *skb)
 #endif
 	if (skb->destructor) {
 		WARN_ON(in_irq());
-		skb->destructor(skb);
+		skb->destructor(skb, 1);
 	}
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	nf_conntrack_put(skb->nfct);
diff --git a/net/core/sock.c b/net/core/sock.c
index 7effa1e..fcf67ea 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1259,7 +1259,7 @@ void __init sk_init(void)
 /*
  * Write buffer destructor automatically called from kfree_skb.
  */
-void sock_wfree(struct sk_buff *skb)
+void sock_wfree(struct sk_buff *skb, int final)
 {
 	struct sock *sk = skb->sk;
 	unsigned int len = skb->truesize;
@@ -1279,18 +1279,22 @@ void sock_wfree(struct sk_buff *skb)
 	 */
 	if (atomic_sub_and_test(len, &sk->sk_wmem_alloc))
 		__sk_free(sk);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 EXPORT_SYMBOL(sock_wfree);
 
 /*
  * Read buffer destructor automatically called from kfree_skb.
  */
-void sock_rfree(struct sk_buff *skb)
+void sock_rfree(struct sk_buff *skb, int final)
 {
 	struct sock *sk = skb->sk;
 
 	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 	sk_mem_uncharge(skb->sk, skb->truesize);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 EXPORT_SYMBOL(sock_rfree);
 
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 98dfcce..a3b0a95 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -973,9 +973,11 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_core);
 
 /* Automatically called when the skb is freed.
  */
-static void l2tp_sock_wfree(struct sk_buff *skb)
+static void l2tp_sock_wfree(struct sk_buff *skb, int final)
 {
 	sock_put(skb->sk);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 
 /* For data skbs that we transmit, we associate with the tunnel socket
diff --git a/net/netfilter/nf_tproxy_core.c b/net/netfilter/nf_tproxy_core.c
index 5490fc3..17dd2d9 100644
--- a/net/netfilter/nf_tproxy_core.c
+++ b/net/netfilter/nf_tproxy_core.c
@@ -55,7 +55,7 @@ nf_tproxy_get_sock_v4(struct net *net, const u8 protocol,
 EXPORT_SYMBOL_GPL(nf_tproxy_get_sock_v4);
 
 static void
-nf_tproxy_destructor(struct sk_buff *skb)
+nf_tproxy_destructor(struct sk_buff *skb, int final)
 {
 	struct sock *sk = skb->sk;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f162d59..dc8e843 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -808,7 +808,7 @@ ring_is_full:
 	goto drop_n_restore;
 }
 
-static void tpacket_destruct_skb(struct sk_buff *skb)
+static void tpacket_destruct_skb(struct sk_buff *skb, int final)
 {
 	struct packet_sock *po = pkt_sk(skb->sk);
 	void *ph;
@@ -823,7 +823,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
 		__packet_set_status(po, ph, TP_STATUS_AVAILABLE);
 	}
 
-	sock_wfree(skb);
+	sock_wfree(skb, final);
 }
 
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3d9122e..17fca55 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1303,7 +1303,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 		unix_notinflight(scm->fp->fp[i]);
 }
 
-static void unix_destruct_fds(struct sk_buff *skb)
+static void unix_destruct_fds(struct sk_buff *skb, int final)
 {
 	struct scm_cookie scm;
 	memset(&scm, 0, sizeof(scm));
@@ -1312,7 +1312,7 @@ static void unix_destruct_fds(struct sk_buff *skb)
 	/* Alas, it calls VFS */
 	/* So fscking what? fput() had been SMP-safe since the last Summer */
 	scm_destroy(&scm);
-	sock_wfree(skb);
+	sock_wfree(skb, final);
 }
 
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)



^ permalink raw reply related
