* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: Paul Gortmaker @ 2012-05-25 19:05 UTC (permalink / raw)
To: David Miller
Cc: jon.maloy, netdev, tipc-discussion, ying.xue, erik.hugne,
allan.stephens, maloy
In-Reply-To: <20120524.161231.1058511318935925082.davem@davemloft.net>
[Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label] On 24/05/2012 (Thu 16:12) David Miller wrote:
> From: Paul Gortmaker <paul.gortmaker@windriver.com>
> Date: Thu, 24 May 2012 15:58:16 -0400
>
> > But for new TIPC development features, future direction, and things like
> > that -- making the right call requires intimate understanding of TIPC
> > and its users, which is something that a maintainer should have but
> > something I know I don't have. (A man has to know his limitations.)
> >
> > In this context, I'm not talking about these three trivial patches; but
> > more complicated stuff that I imagine will be floated in the future.
> >
> > To that end, I can still review and call out issues in a crap patch when
> > I see them. But I'd like to see new stuff sent to netdev, so that folks
> > smarter than me have a chance to catch when a patch appears generally OK
> > but is architecturally the wrong direction etc.
>
> For maintainership, taste is more important than deep knowledge of the
> specific technology. Worst case you ask the submitter to explain the
> background of their change more thoroughly and that information is an
> absolute requirement in the commit message and code comments
> anyways.
OK, what I'm hearing is that you'd prefer I continue to collect up TIPC
patches and issue pull requests for a while longer. I can do that. Any
specifics of how you'd like things done? -- e.g. if the reviews of new
TIPC development patches takes place here on netdev before I stage them,
will that create extra work for you dealing with them in patchworks?
Paul.
^ permalink raw reply
* Re: Using jiffies for tcp_time_stamp?
From: Eric Dumazet @ 2012-05-25 19:00 UTC (permalink / raw)
To: Srećko Jurić-Kavelj; +Cc: Dave Taht, Chris Friesen, netdev
In-Reply-To: <CAACrLC0eVZJdu802Ff4BWRBKuabS=Z-T3WvZCxyqBTem8-n8ug@mail.gmail.com>
On Fri, 2012-05-25 at 20:35 +0200, Srećko Jurić-Kavelj wrote:
> From what I've seen in the code, NO_HZ doesn't make jiffies go away,
> it simply doesn't use the regular CONFIG_HZ interrupt to update them, but
> updates them when it has an opportunity?
HZ=1000 makes jiffies 10 times more precise, and with NO_HZ, generates
no extra timer interrupts.
This also smooths the timer workload instead of producing spikes.
^ permalink raw reply
* Re: Using jiffies for tcp_time_stamp?
From: Srećko Jurić-Kavelj @ 2012-05-25 18:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Dave Taht, Chris Friesen, netdev
In-Reply-To: <1337964880.3347.52.camel@edumazet-glaptop>
On Fri, May 25, 2012 at 6:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> linux TCP uses high precision timestamps (ktime_get_real()) where
> needed.
>
> # find net|xargs grep -n TCP_CONG_RTT_STAMP
> net/ipv4/tcp_veno.c:205: .flags = TCP_CONG_RTT_STAMP,
> net/ipv4/tcp_vegas.c:308: .flags = TCP_CONG_RTT_STAMP,
> net/ipv4/tcp_cubic.c:478: cubictcp.flags |= TCP_CONG_RTT_STAMP;
> net/ipv4/tcp_output.c:815: if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
> net/ipv4/tcp_lp.c:317: .flags = TCP_CONG_RTT_STAMP,
> net/ipv4/tcp_yeah.c:229: .flags = TCP_CONG_RTT_STAMP,
> net/ipv4/tcp_illinois.c:326: .flags = TCP_CONG_RTT_STAMP,
> net/ipv4/tcp_input.c:3496: if (ca_ops->flags & TCP_CONG_RTT_STAMP &&
Didn't know about TCP_CONG_RTT_STAMP.
Thing is, the device I'm connecting to doesn't even support TCP time
stamp option. The returning SYN ACK packet only has maximum segment
size 1460 bytes in options.
From the net/ipv4/tcp_input.c code, RTT is estimated using
#define tcp_time_stamp ((__u32)(jiffies))
from include/net/tcp.h.
Could ktime_get_real() be used for tcp_time_stamp instead of jiffies?
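To make the granularity concern concrete, here is a minimal userspace sketch (not kernel code) of how a tick-based clock quantizes an RTT sample. The helper names tick_period_us and rtt_in_ticks are hypothetical; the only thing taken from the code above is that the timestamp is a counter that advances HZ times per second.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one tick in microseconds for a given HZ value. */
static uint32_t tick_period_us(uint32_t hz)
{
    assert(hz > 0);
    return 1000000u / hz;
}

/*
 * Number of whole ticks a tick-based clock would report between two
 * instants given in microseconds (send time and ACK arrival time).
 */
static uint32_t rtt_in_ticks(uint64_t send_us, uint64_t ack_us, uint32_t hz)
{
    uint32_t period = tick_period_us(hz);

    return (uint32_t)(ack_us / period - send_us / period);
}
```

In this model, with HZ=100 a 3 ms round trip that falls inside one tick measures as 0 ticks, and a 2 ms round trip that straddles a tick boundary measures as 1 tick (10 ms); with HZ=1000 the same 3 ms sample measures as 3 ticks.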
> Other than that HZ=1000 seems fine.
>
> HZ=100 seems a poor choice; we have had NO_HZ for a long time.
I have:
$ grep HZ /boot/config-2.6.32-41-generic
CONFIG_NO_HZ=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_MACHZ_WDT=m
From what I've seen in the code, NO_HZ doesn't make jiffies go away,
it simply doesn't use the regular CONFIG_HZ interrupt to update them, but
updates them when it has an opportunity?
--
JKS
^ permalink raw reply
* Inadvertently sending a Christmas Tree TCP packet
From: Earl Chew @ 2012-05-25 18:30 UTC (permalink / raw)
To: netdev
Does anyone have a reference to any discussions or patches that address this issue?
Running a userspace daemon on a rather old 2.6.18 system can inadvertently cause a TCP
packet containing the flags FIN, PSH, ACK and URG to be sent (see packet 16237), which can cause the receiver
(not Linux in this case) to become confused:
16220 111.075627 10.64.33.43 10.128.163.100 TCP 59253 > exec [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=2
16222 0.203210 10.128.163.100 10.64.33.43 TCP exec > 59253 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1250 WS=7
16223 0.000032 10.64.33.43 10.128.163.100 TCP 59253 > exec [ACK] Seq=1 Ack=1 Win=65532 Len=0
16224 0.000215 10.64.33.43 10.128.163.100 TCP 59253 > exec [PSH, ACK] Seq=1 Ack=1 Win=65532 Len=6
16225 0.202465 10.128.163.100 10.64.33.43 TCP exec > 59253 [ACK] Seq=1 Ack=7 Win=5888 Len=0
16229 0.209383 10.64.33.43 10.128.163.100 TCP 59253 > exec [PSH, ACK] Seq=7 Ack=1 Win=65532 Len=9
16231 0.202573 10.128.163.100 10.64.33.43 TCP exec > 59253 [ACK] Seq=1 Ack=16 Win=5888 Len=0
16232 0.000024 10.64.33.43 10.128.163.100 TCP 59253 > exec [PSH, ACK] Seq=16 Ack=1 Win=65532 Len=14
16233 0.202618 10.128.163.100 10.64.33.43 TCP exec > 59253 [ACK] Seq=1 Ack=30 Win=5888 Len=0
16234 0.012718 10.128.163.100 10.64.33.43 TCP exec > 59253 [PSH, ACK] Seq=1 Ack=30 Win=5888 Len=1
16235 0.101229 10.128.163.100 10.64.33.43 TCP exec > 59253 [PSH, ACK] Seq=2 Ack=30 Win=5888 Len=29
16236 0.000032 10.64.33.43 10.128.163.100 TCP 59253 > exec [ACK] Seq=30 Ack=31 Win=65504 Len=0
16237 0.000319 10.128.163.100 10.64.33.43 TCP exec > 59253 [FIN, PSH, ACK, URG] Seq=31 Ack=30 Win=5888 Urg=1 Len=1
16240 1.114085 10.128.163.100 10.64.33.43 TCP [TCP Retransmission] exec > 59253 [FIN, PSH, ACK, URG] Seq=31 Ack=30 Win=5888 Urg=1 Len=1
The receiver has become confused, and so the Linux sender retransmits at packet 16240 and continues retransmitting.
In this case, the application code at the receiver is blocked indefinitely trying to read a socket that seemingly
has (URG) data and yet at the same time doesn't have any more data (FIN).
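For anyone who wants to filter a capture for this combination, here is a small sketch that decodes the TCP flags byte. The bit values follow the TCP header layout in RFC 793; the function and enum names are hypothetical and not taken from any kernel or libpcap header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* TCP flag bits as laid out in the TCP header (RFC 793). */
enum {
    TCP_FLAG_FIN = 0x01,
    TCP_FLAG_SYN = 0x02,
    TCP_FLAG_RST = 0x04,
    TCP_FLAG_PSH = 0x08,
    TCP_FLAG_ACK = 0x10,
    TCP_FLAG_URG = 0x20,
};

/* True when FIN, PSH, ACK and URG are all set, as in packet 16237. */
static bool is_fin_psh_ack_urg(uint8_t flags)
{
    const uint8_t mask = TCP_FLAG_FIN | TCP_FLAG_PSH |
                         TCP_FLAG_ACK | TCP_FLAG_URG;

    return (flags & mask) == mask;
}

/* Write the set flags into buf as a comma-separated list. */
static void tcp_flags_to_str(uint8_t flags, char *buf, size_t len)
{
    static const struct { uint8_t bit; const char *name; } names[] = {
        { TCP_FLAG_FIN, "FIN" }, { TCP_FLAG_SYN, "SYN" },
        { TCP_FLAG_RST, "RST" }, { TCP_FLAG_PSH, "PSH" },
        { TCP_FLAG_ACK, "ACK" }, { TCP_FLAG_URG, "URG" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(flags & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? ", " : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; stop appending */
        used += (size_t)n;
    }
}
```

The flags byte of packet 16237 would be 0x39 (0x01 | 0x08 | 0x10 | 0x20) under this layout.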
Perhaps the makings of a DoS attack?
Earl
^ permalink raw reply
* Re: [PATCH 05/21] vswitchd: Add add_tunnel_ports()
From: Ben Pfaff @ 2012-05-25 17:18 UTC (permalink / raw)
To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1337850554-10339-6-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
On Thu, May 24, 2012 at 06:08:58PM +0900, Simon Horman wrote:
> Add tunnel tundevs for tunnel realdevs as needed.
>
> In general the notion is that realdevs may be configured by users
> and from an end-user point of view are compatible with the existing
> port-based tunneling code. And that tundevs exist in the datapath
> and are actually used to send and receive packets, based on flows.
>
> Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
This seems reasonable at a glance. There are bits I might quibble
with as this gets closer, but the structure seems reasonable.
^ permalink raw reply
* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Eric Dumazet @ 2012-05-25 17:18 UTC (permalink / raw)
To: Tom Herbert; +Cc: Denys Fedoryshchenko, netdev
In-Reply-To: <CA+mtBx8fpS_w7X2iq+tOEFuFhhgJ-4V_1ubBsd8-MvvmCef45w@mail.gmail.com>
On Fri, 2012-05-25 at 09:59 -0700, Tom Herbert wrote:
> > TX completion has no budget, I am not sure what you mean.
> >
> Right, it's the budget on RX that can be a factor. TX completion is
> done from the NAPI poll routine, so if RX does not complete the NAPI
> is rescheduled and TX completion is done again for the same HW
> interrupt.
>
> We need to remove the constraint that netdev_completed can only be
> called once per interrupt...
It is not clear where this constraint is in the code.
Under heavy load, we can end up in a loop, with one CPU serving NAPI
for a bunch of devices (no more hardware interrupts are delivered, since
we don't re-enable them at all).
^ permalink raw reply
* Re: [RFC] mac80211: Use correct originator sequence number in a Path Reply
From: Qasim Javed @ 2012-05-25 17:17 UTC (permalink / raw)
To: Javier Cardona
Cc: devel-ZwoEplunGu1xMJw8dq7oimD2FQJk+8+b,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, ravip-DNmUmOh1Rg72fBVCVOL8/A
In-Reply-To: <CAEFj987dNMyMcS9rySzVpfY0Fo5t0LtL9FZg770ohKi+bDO9ZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Forgot to add Javier. Could you please comment on this?
Thanks,
-Qasim
On Fri, May 25, 2012 at 11:31 AM, Yeoh Chun-Yeow <yeohchunyeow-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I think that the PREQ element has originator sequence number at the
>> end but the PREQ element has the target sequence number at the end.
>> This is what mesh_path_sel_frame_tx is doing.
>
> PREP element has originator sequence number at the end but the PREQ
> element has the target sequence number at the end.
>
> Regards,
> Chun-Yeow
> _______________________________________________
> Devel mailing list
> Devel-ZwoEplunGu1xMJw8dq7oimD2FQJk+8+b@public.gmane.org
> http://lists.open80211s.org/cgi-bin/mailman/listinfo/devel
^ permalink raw reply
* Re: Strange latency spikes/TX network stalls on Sun Fire X4150(x86) and e1000e
From: Tom Herbert @ 2012-05-25 16:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Denys Fedoryshchenko, netdev
In-Reply-To: <1337926938.7753.8.camel@edumazet-glaptop>
> TX completion has no budget, I am not sure what you mean.
>
Right, it's the budget on RX that can be a factor. TX completion is
done from the NAPI poll routine, so if RX does not complete the NAPI
is rescheduled and TX completion is done again for the same HW
interrupt.
We need to remove the constraint that netdev_completed can only be
called once per interrupt...
Tom
> e1000e driver indeed has a limit : It cannot clean more than
> tx_ring->count frames per e1000_clean_tx_irq() invocation.
>
> But with BQL, this should not happen ?
>
> # ethtool -g eth0
> Ring parameters for eth0:
> Pre-set maximums:
> RX: 4096
> RX Mini: 0
> RX Jumbo: 0
> TX: 4096
> Current hardware settings:
> RX: 256
> RX Mini: 0
> RX Jumbo: 0
> TX: 256
>
>
^ permalink raw reply
* Re: Using jiffies for tcp_time_stamp?
From: Eric Dumazet @ 2012-05-25 16:54 UTC (permalink / raw)
To: Srećko Jurić-Kavelj; +Cc: Dave Taht, Chris Friesen, netdev
In-Reply-To: <CAACrLC0b5dyjJM=DGf-9nUOwar3O9EVTTR0tvynQj285EDpfwA@mail.gmail.com>
On Fri, 2012-05-25 at 18:23 +0200, Srećko Jurić-Kavelj wrote:
> On Fri, May 25, 2012 at 6:17 PM, Dave Taht <dave.taht@gmail.com> wrote:
> > On Fri, May 25, 2012 at 4:58 PM, Chris Friesen
> > <chris.friesen@genband.com> wrote:
> >> I don't know if it would make any difference to the tcp algorithms, but
> >> certainly on some architectures you can get a fast and accurate hardware
> >> timestamp.
> >
> > I would be interested in someone doing that experiment in light of the
> > codel work.
>
> I've looked this up in other implementations, e.g. FreeBSD uses 1ms
> granularity no matter what HZ says, NetBSD has 500ms ticks, ...
>
> I guess that granularity also depends on the retransmit timers used. I
> didn't make out what's the precision of the timers that Linux uses in
> TCP, but I guess it uses high resolution timers? At least on x86?
>
> I've done a simple experiment by repeatedly calling clock_gettime
> (from userspace, but I guess it ends up as a vsyscall). I get >17
> million calls per second on a Q6600.
linux TCP uses high precision timestamps (ktime_get_real()) where
needed.
# find net|xargs grep -n TCP_CONG_RTT_STAMP
net/ipv4/tcp_veno.c:205: .flags = TCP_CONG_RTT_STAMP,
net/ipv4/tcp_vegas.c:308: .flags = TCP_CONG_RTT_STAMP,
net/ipv4/tcp_cubic.c:478: cubictcp.flags |= TCP_CONG_RTT_STAMP;
net/ipv4/tcp_output.c:815: if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
net/ipv4/tcp_lp.c:317: .flags = TCP_CONG_RTT_STAMP,
net/ipv4/tcp_yeah.c:229: .flags = TCP_CONG_RTT_STAMP,
net/ipv4/tcp_illinois.c:326: .flags = TCP_CONG_RTT_STAMP,
net/ipv4/tcp_input.c:3496: if (ca_ops->flags & TCP_CONG_RTT_STAMP &&
Other than that HZ=1000 seems fine.
HZ=100 seems a poor choice; we have had NO_HZ for a long time.
^ permalink raw reply
* Re: [RFC] mac80211: Use correct originator sequence number in a Path Reply
From: Yeoh Chun-Yeow @ 2012-05-25 16:31 UTC (permalink / raw)
To: devel; +Cc: netdev, linux-wireless, linux-kernel, ravip
In-Reply-To: <CAEFj984efuO12nSPsZ2A0Q0nwPLesYJ8gDHwgh=0e1VO3_UtGw@mail.gmail.com>
> I think that the PREQ element has originator sequence number at the
> end but the PREQ element has the target sequence number at the end.
> This is what mesh_path_sel_frame_tx is doing.
PREP element has originator sequence number at the end but the PREQ
element has the target sequence number at the end.
Regards,
Chun-Yeow
^ permalink raw reply
* Re: [RFC] mac80211: Use correct originator sequence number in a Path Reply
From: Yeoh Chun-Yeow @ 2012-05-25 16:29 UTC (permalink / raw)
To: devel-ZwoEplunGu1xMJw8dq7oimD2FQJk+8+b
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, ravip-DNmUmOh1Rg72fBVCVOL8/A
In-Reply-To: <CAJivULpmLp9JK+PQZ9RkrvvYHNk=uA33=A-ffftroigivy1_NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> No. I am referring to the originator sequence number in PREP because
> when the PREP reaches the originator of the PREQ, the originator
> sequence number in the PREP and the value of the metric is used to
> determine which PREP will be accepted. If the originator sequence
> numbers in the PREPs are different, then the PREP with the higher
> sequence number will be accepted irrespective of the value of the
> metric. Only if the originator sequence numbers in the PREP are equal
> will the metric values in the PREPs be examined.
Based on the "Table 11C-17—Contents of a PREP element in Case A", the
originator HWMP sequence number is the HWMP sequence number of the
originator mesh STA. So this value (orig_sn in
hwmp_preq_frame_process) is unchanged in the generated PREP element
upon receiving the PREQ element.
The target HWMP sequence number is the HWMP sequence number of the
target mesh STA or target proxy mesh gate after it has been updated
according to 11C.9.8.3. So this value is changed based on target_sn in
hwmp_preq_frame_process.
I think that the confusion is in the hwmp_route_info_get,
    case MPATH_PREP:
        /* Originator here refers to the MP that was the target in the
         * Path Request. We divert from the nomenclature in the draft
         * so that we can easily use a single function to gather path
         * information from both PREQ and PREP frames.
         */
        orig_sn = PREP_IE_TARGET_SN(hwmp_ie);
orig_sn here is actually the target HWMP sequence number of PREP
element generated by hwmp_preq_frame_process.
> Please notice that in hwmp_preq_frame_process, target_sn ends up being
> used as orig_sn for the PREP. This is probably what is causing the
> confusion in your case.
I think that the PREQ element has originator sequence number at the
end but the PREQ element has the target sequence number at the end.
This is what mesh_path_sel_frame_tx is doing.
> Your patch is definitely not what I was pointing out, in fact it
> diverts from the standard functionality since it removes the check for
> HWMPNetDiameterTraversalTime.
I thought that this is only for originator sequence number which is
done by mesh_path_start_discovery.
Regards,
Chun-Yeow
^ permalink raw reply
* Re: Using jiffies for tcp_time_stamp?
From: Srećko Jurić-Kavelj @ 2012-05-25 16:23 UTC (permalink / raw)
To: Dave Taht; +Cc: Chris Friesen, netdev
In-Reply-To: <CAA93jw7MU8PjFk-UX864STSBp2SjjjivmfpRiPhaQ2kxZLu2sA@mail.gmail.com>
On Fri, May 25, 2012 at 6:17 PM, Dave Taht <dave.taht@gmail.com> wrote:
> On Fri, May 25, 2012 at 4:58 PM, Chris Friesen
> <chris.friesen@genband.com> wrote:
>> I don't know if it would make any difference to the tcp algorithms, but
>> certainly on some architectures you can get a fast and accurate hardware
>> timestamp.
>
> I would be interested in someone doing that experiment in light of the
> codel work.
I've looked this up in other implementations, e.g. FreeBSD uses 1ms
granularity no matter what HZ says, NetBSD has 500ms ticks, ...
I guess that granularity also depends on the retransmit timers used. I
didn't make out what's the precision of the timers that Linux uses in
TCP, but I guess it uses high resolution timers? At least on x86?
I've done a simple experiment by repeatedly calling clock_gettime
(from userspace, but I guess it ends up as a vsyscall). I get >17
million calls per second on a Q6600.
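For reference, a minimal sketch of that kind of measurement is below. It assumes CLOCK_MONOTONIC and a fixed iteration count, which may differ from what was actually run; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds since an arbitrary origin, read from CLOCK_MONOTONIC. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc; /* keeps the build quiet when assertions are disabled */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call clock_gettime() `iterations` times and return calls per second. */
static double clock_gettime_rate(uint64_t iterations)
{
    uint64_t start = now_ns();
    uint64_t last = start;

    for (uint64_t i = 0; i < iterations; i++)
        last = now_ns();

    uint64_t elapsed = last - start;

    return elapsed ? (double)iterations * 1e9 / (double)elapsed : 0.0;
}
```

Calling clock_gettime_rate(10000000) and printing the result gives a calls-per-second figure; the number depends heavily on the CPU and on whether the call is served without entering the kernel.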
--
JKS
^ permalink raw reply
* Re: Using jiffies for tcp_time_stamp?
From: Dave Taht @ 2012-05-25 16:17 UTC (permalink / raw)
To: Chris Friesen; +Cc: Srećko Jurić-Kavelj, netdev
In-Reply-To: <4FBFAC30.8050508@genband.com>
On Fri, May 25, 2012 at 4:58 PM, Chris Friesen
<chris.friesen@genband.com> wrote:
> On 05/22/2012 11:21 AM, Srećko Jurić-Kavelj wrote:
>
>> By looking at the code it's clear that the time stamping is done with
>> jiffies, and my kernel has CONFIG_HZ=100.
>>
>> I understand that this is for performance reasons (and the RTT
>> smoothing filter is implemented with bit shifting operations), but
>> would using a more precise time stamp have significant impact on
>> performance? Since RTT is used to compute RTO, wouldn't there be any
>> benefits of having more accurate estimate of this value?
>
>
> I don't know if it would make any difference to the tcp algorithms, but
> certainly on some architectures you can get a fast and accurate hardware
> timestamp.
I would be interested in someone doing that experiment in light of the
codel work.
>
> Chris
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net
^ permalink raw reply
* Re: Using jiffies for tcp_time_stamp?
From: Chris Friesen @ 2012-05-25 15:58 UTC (permalink / raw)
To: Srećko Jurić-Kavelj; +Cc: netdev
In-Reply-To: <CAACrLC2CXU-DNeonWQGJTfX53ssm_asK7WQrFuWRBB77cg-YdA@mail.gmail.com>
On 05/22/2012 11:21 AM, Srećko Jurić-Kavelj wrote:
> By looking at the code it's clear that the time stamping is done with
> jiffies, and my kernel has CONFIG_HZ=100.
>
> I understand that this is for performance reasons (and the RTT
> smoothing filter is implemented with bit shifting operations), but
> would using a more precise time stamp have significant impact on
> performance? Since RTT is used to compute RTO, wouldn't there be any
> benefits of having more accurate estimate of this value?
I don't know if it would make any difference to the tcp algorithms, but
certainly on some architectures you can get a fast and accurate hardware
timestamp.
Chris
^ permalink raw reply
* Re: [PATCH] gianfar:don't add FCB length to hard_header_len
From: Paul Gortmaker @ 2012-05-25 15:58 UTC (permalink / raw)
To: Joe Perches; +Cc: Jan Ceuleers, David Miller, b06378, netdev, linuxppc-dev
In-Reply-To: <1337876210.5070.4.camel@joe2Laptop>
[Re: [PATCH] gianfar:don't add FCB length to hard_header_len] On 24/05/2012 (Thu 09:16) Joe Perches wrote:
> On Thu, 2012-05-24 at 17:04 +0200, Jan Ceuleers wrote:
> > On 05/22/2012 09:18 PM, David Miller wrote:
> > > From: Jiajun Wu <b06378@freescale.com>
> > > Date: Tue, 22 May 2012 17:00:48 +0800
> > >
> > >> FCB (Frame Control Block) isn't part of the netdev hard header.
> > >> Adding FCB to hard_header_len will make GRO fail at the MAC comparison stage.
> > >>
> > >> Signed-off-by: Jiajun Wu <b06378@freescale.com>
> > >
> > > Applied, thanks.
> > >
> > > Someone needs to go through this driver when net-next opens up
> > > and fix all of the indentation in this driver.
> >
> > May I give that a go?
>
> I have scripts that automate most of this.
> I don't have the card though.
There is no card. The gianfar is a SOC for freescale 83xx, 85xx, 86xx
CPUs. If need be, I can test just as I did for your name overrun fix
in commit 0015e551e.
But you really shouldn't need the hardware to validate this kind of
patch anyways -- aside from your code flow change in the irq routine of
gianfar_ptp, you should have been simply able to check for object file
equivalence before and after your change.
Paul.
>
> Maybe this is a starting point?
> It doesn't fix most 80 column warnings.
>
> drivers/net/ethernet/freescale/gianfar.c | 299 +++++++++++-----------
> drivers/net/ethernet/freescale/gianfar_ethtool.c | 131 +++++-----
> drivers/net/ethernet/freescale/gianfar_ptp.c | 8 +-
> drivers/net/ethernet/freescale/gianfar_sysfs.c | 2 +-
> 4 files changed, 225 insertions(+), 215 deletions(-)
>
[...]
> diff --git a/drivers/net/ethernet/freescale/gianfar_ptp.c b/drivers/net/ethernet/freescale/gianfar_ptp.c
> index c08e5d4..3f7b81d 100644
> --- a/drivers/net/ethernet/freescale/gianfar_ptp.c
> +++ b/drivers/net/ethernet/freescale/gianfar_ptp.c
> @@ -268,11 +268,11 @@ static irqreturn_t isr(int irq, void *priv)
> ptp_clock_event(etsects->clock, &event);
> }
>
> - if (ack) {
> - gfar_write(&etsects->regs->tmr_tevent, ack);
> - return IRQ_HANDLED;
> - } else
> + if (!ack)
> return IRQ_NONE;
> +
> + gfar_write(&etsects->regs->tmr_tevent, ack);
> + return IRQ_HANDLED;
> }
>
> /*
^ permalink raw reply
* Re: [RFC] mac80211: Use correct originator sequence number in a Path Reply
From: Qasim Javed @ 2012-05-25 15:50 UTC (permalink / raw)
To: devel-ZwoEplunGu1xMJw8dq7oimD2FQJk+8+b
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, ravip-DNmUmOh1Rg72fBVCVOL8/A
In-Reply-To: <CAEFj985HxSrOwOoevDjG1jxPxobLda-X_LZUtj6LgwXZwozBog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Please see my comments below.
On Fri, May 25, 2012 at 10:30 AM, Yeoh Chun-Yeow <yeohchunyeow-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi, Qasim Javed
>
> I think that you are referring to the target HWMP sequence number in
> PREP element.
No. I am referring to the originator sequence number in PREP because
when the PREP reaches the originator of the PREQ, the originator
sequence number in the PREP and the value of the metric is used to
determine which PREP will be accepted. If the originator sequence
numbers in the PREPs are different, then the PREP with the higher
sequence number will be accepted irrespective of the value of the
metric. Only if the originator sequence numbers in the PREP are equal
will the metric values in the PREPs be examined.
>
> The 802.11s standard specifies that
> dot11MeshHWMPnetDiameterTraversalTime applies only to the originator HWMP
> sequence number for PREQ, as mentioned in the "Contents of a PREQ
> element" in section 11C.
>
> For PREP element, it should be based on the description in section 11C.9.8.3:
> "If it is a target mesh STA, it shall update its own HWMP sequence
> number to maximum (current HWMP sequence number, target HWMP sequence
> number in the PREQ) + 1 immediately before it generates a PREP in
> response to a PREQ. The target HWMP sequence number of the PREQ is
> relevant when a link was broken along the path and the stored sequence
> number was increased at an intermediate mesh STA."
>
> So the target HWMP sequence number should be modified as follows:
>
> diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
> index 70ac7d1..5988e82 100644
> --- a/net/mac80211/mesh_hwmp.c
> +++ b/net/mac80211/mesh_hwmp.c
> @@ -538,12 +538,10 @@ static void hwmp_preq_frame_process(struct
> ieee80211_sub_if_data *sdata,
> forward = false;
> reply = true;
> metric = 0;
> - if (time_after(jiffies, ifmsh->last_sn_update +
> - net_traversal_jiffies(sdata)) ||
> - time_before(jiffies, ifmsh->last_sn_update)) {
> - target_sn = ++ifmsh->sn;
> - ifmsh->last_sn_update = jiffies;
> - }
> + if (SN_LT(ifmsh->sn, target_sn))
> + ifmsh->sn = target_sn;
> + target_sn = ++ifmsh->sn;
> + ifmsh->last_sn_update = jiffies;
>
> Comments.
I agree with the description in the standard but you seem to be
misinterpreting it. Please note that the function being considered
here is hwmp_preq_frame_process, which evidently processes a PREQ.
However, because a PREP is generated in response to a PREQ, this
function also checks whether a PREP needs to be generated and then
calls mesh_path_sel_frame_tx with frame type being MPATH_PREP. This
function is also passed the originator and target sequence numbers.
What I am saying is that, in the scenario described in my original
email, the wrong originator sequence number is being used for the
PREP.
Please notice that in hwmp_preq_frame_process, target_sn ends up being
used as orig_sn for the PREP. This is probably what is causing the
confusion in your case.
Your patch is definitely not what I was pointing out, in fact it
diverts from the standard functionality since it removes the check for
HWMPNetDiameterTraversalTime.
>
> Regards,
> Chun-Yeow
^ permalink raw reply
* Re: on the the two IPs unreachable if second interface is up
From: lejeczek @ 2012-05-25 15:35 UTC (permalink / raw)
To: netdev
In-Reply-To: <4FBFA172.2060604@yahoo.co.uk>
here is something potentially interesting!
when I arping the BOX's public IP from a system/client of
the private subnet I get a reply with the MAC of the BOX's
second (private) interface :-[
could this be normal?
On 25/05/12 16:12, lejeczek wrote:
> hello everybody
>
> apologies if this may feel off-topic, I was hoping
> some net experts could shed some light on some peculiar
> symptoms I experience with one linux box
>
> a BOX has two net interfaces, a public and private one
> public IP is reachable from/via the Internet just fine
> public IP is not reachable from the same private network
> the BOX's second interface is on
> public IP becomes reachable to private subnet immediately
> after second(private) interface was turned down
> BOX's firewall whether on or off makes no difference
>
> this is the most peculiar problem of this nature I've ever
> experienced
>
> the goal is simple, have other systems on the same private
> subnet as the BOX's second interface to be able to talk to
> the BOX's public IP
> default gateway for the private subnet is another,
> separate system.
>
> any suggestions as to how to troubleshoot it would be very
> much appreciated
> many thanks!
^ permalink raw reply
* Re: [RFC] mac80211: Use correct originator sequence number in a Path Reply
From: Yeoh Chun-Yeow @ 2012-05-25 15:30 UTC (permalink / raw)
To: devel-ZwoEplunGu1xMJw8dq7oimD2FQJk+8+b
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, ravip-DNmUmOh1Rg72fBVCVOL8/A
In-Reply-To: <1337934071-29342-1-git-send-email-qasimj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Hi, Qasim Javed
I think that you are referring to the target HWMP sequence number in
PREP element.
The 802.11s standard specifies that
dot11MeshHWMPnetDiameterTraversalTime applies only to the originator HWMP
sequence number for PREQ, as mentioned in the "Contents of a PREQ
element" in section 11C.
For PREP element, it should be based on the description in section 11C.9.8.3:
"If it is a target mesh STA, it shall update its own HWMP sequence
number to maximum (current HWMP sequence number, target HWMP sequence
number in the PREQ) + 1 immediately before it generates a PREP in
response to a PREQ. The target HWMP sequence number of the PREQ is
relevant when a link was broken along the path and the stored sequence
number was increased at an intermediate mesh STA."
So the target HWMP sequence number should be modified as follows:
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index 70ac7d1..5988e82 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -538,12 +538,10 @@ static void hwmp_preq_frame_process(struct
ieee80211_sub_if_data *sdata,
forward = false;
reply = true;
metric = 0;
- if (time_after(jiffies, ifmsh->last_sn_update +
- net_traversal_jiffies(sdata)) ||
- time_before(jiffies, ifmsh->last_sn_update)) {
- target_sn = ++ifmsh->sn;
- ifmsh->last_sn_update = jiffies;
- }
+ if (SN_LT(ifmsh->sn, target_sn))
+ ifmsh->sn = target_sn;
+ target_sn = ++ifmsh->sn;
+ ifmsh->last_sn_update = jiffies;
Comments.
Regards,
Chun-Yeow
^ permalink raw reply related
* on the the two IPs unreachable if second interface is up
From: lejeczek @ 2012-05-25 15:12 UTC (permalink / raw)
To: netdev
hello everybody
apologies if this may feel off-topic, I was hoping some
net experts could shed some light on some peculiar symptoms
I experience with one linux box
a BOX has two net interfaces, a public and private one
public IP is reachable from/via the Internet just fine
public IP is not reachable from the same private network the
BOX's second interface is on
public IP becomes reachable to private subnet immediately
after second(private) interface was turned down
BOX's firewall whether on or off makes no difference
this is the most peculiar problem of this nature I've ever
experienced
the goal is simple, have other systems on the same private
subnet as the BOX's second interface to be able to talk to
the BOX's public IP
default gateway for the private subnet is another, separate
system.
any suggestions as to how to troubleshoot it would be very
much appreciated
many thanks!
^ permalink raw reply
* Re: skb_release_data oops
From: Eric Dumazet @ 2012-05-25 15:11 UTC (permalink / raw)
To: kendo; +Cc: netdev
In-Reply-To: <4FBF9876.032A7B.32344@m12-16.163.com>
On Fri, 2012-05-25 at 22:19 +0800, kendo wrote:
> I use Linux kernel 2.6.38.8 and found a bug when freeing an skb. What could cause this failure? Can you give some suggestions? Thanks!
>
> Best regards.
>
> ---------------------------------------------------------------
>
> May 25 19:30:54 AnShion <9> klogd: [164619.378640] BUG: unable to handle kernel paging request at 000095a3
> May 25 19:30:54 AnShion <9> klogd: [164619.454609] IP: [<c01c2353>] put_page+0x3/0x40
> May 25 19:30:54 AnShion <12> klogd: [164619.508726] *pde = 00000000
> May 25 19:30:54 AnShion <8> klogd: [164619.544185] Oops: 0000 [#1] SMP
> May 25 19:30:54 AnShion <8> klogd: [164619.583891] last sysfs file: /sys/devices/virtual/net/tunl_FJ/uevent
> May 25 19:30:54 AnShion <12> klogd: [164619.660716] Modules linked in:
> dpi_engine ipmi_watchdog nf_connmark ip_set_hash_netiface ip_set_hash_net ip_set_hash_ip xt_set ip_set \
> xt_hashrate xt_dpi xt_pcc xt_nth xt_random xt_nflog xt_replace igb e1000e
Looks like you use a bunch of alien modules.
netdev is not the place to discuss their bugs.
^ permalink raw reply
* [PATCH v8] tilegx network driver: initial support
From: Chris Metcalf @ 2012-05-25 14:42 UTC (permalink / raw)
To: bhutchings, arnd, David Miller, linux-kernel, netdev
In-Reply-To: <20120524.003148.700603156196416506.davem@davemloft.net>
This change adds support for the tilegx network driver based on the
GXIO IORPC support in the tilegx software stack, using the on-chip
mPIPE packet processing engine.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
This version of the patch fixes the issue where we were failing
to properly stop the net_device queue when the mpipe egress queue
filled up. I also removed the internal bug numbers from the sources.
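For readers unfamiliar with the stop/wake pattern mentioned above, here
is a minimal user-space sketch of the idea. It is not driver code: every
name in it (model_txq, model_xmit, model_complete, MODEL_RING_SLOTS) is
hypothetical, and it simplifies the wake step by checking for free space
rather than using a timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the driver. */
#define MODEL_RING_SLOTS 4	/* kept small so the full case is easy to hit */

struct model_txq {
	unsigned long next;	/* slots handed to the "hardware" so far */
	unsigned long done;	/* slots the "hardware" has completed */
	bool stopped;		/* stands in for the stopped-queue state */
};

/* Number of free slots left in the ring. */
static unsigned long model_free_slots(const struct model_txq *q)
{
	assert(q->next - q->done <= MODEL_RING_SLOTS);
	return MODEL_RING_SLOTS - (q->next - q->done);
}

/* Try to queue one packet. If the ring is full, mark the queue as
 * stopped and report "busy" so the caller retries later.
 */
static bool model_xmit(struct model_txq *q)
{
	if (q->stopped || model_free_slots(q) == 0) {
		q->stopped = true;
		return false;
	}
	q->next++;
	return true;
}

/* The "hardware" finishes up to n packets; a stopped queue is woken
 * once at least one slot is free again.
 */
static void model_complete(struct model_txq *q, unsigned long n)
{
	unsigned long in_flight = q->next - q->done;

	q->done += (n < in_flight) ? n : in_flight;
	if (q->stopped && model_free_slots(q) > 0)
		q->stopped = false;
}
```

The point of the sketch is only that a full ring must leave the queue
stopped until completions free space; the real driver does this with
the netdev queue and an hrtimer rather than these helpers.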
drivers/net/ethernet/tile/Kconfig | 1 +
drivers/net/ethernet/tile/Makefile | 4 +-
drivers/net/ethernet/tile/tilegx.c | 1854 ++++++++++++++++++++++++++++++++++++
3 files changed, 1857 insertions(+), 2 deletions(-)
create mode 100644 drivers/net/ethernet/tile/tilegx.c
diff --git a/drivers/net/ethernet/tile/Kconfig b/drivers/net/ethernet/tile/Kconfig
index 2d9218f..9184b61 100644
--- a/drivers/net/ethernet/tile/Kconfig
+++ b/drivers/net/ethernet/tile/Kconfig
@@ -7,6 +7,7 @@ config TILE_NET
depends on TILE
default y
select CRC32
+ select TILE_GXIO_MPIPE if TILEGX
---help---
This is a standard Linux network device driver for the
on-chip Tilera Gigabit Ethernet and XAUI interfaces.
diff --git a/drivers/net/ethernet/tile/Makefile b/drivers/net/ethernet/tile/Makefile
index f634f14..0ef9eef 100644
--- a/drivers/net/ethernet/tile/Makefile
+++ b/drivers/net/ethernet/tile/Makefile
@@ -4,7 +4,7 @@
obj-$(CONFIG_TILE_NET) += tile_net.o
ifdef CONFIG_TILEGX
-tile_net-objs := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o
+tile_net-y := tilegx.o
else
-tile_net-objs := tilepro.o
+tile_net-y := tilepro.o
endif
diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
new file mode 100644
index 0000000..cc00ba5
--- /dev/null
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -0,0 +1,1854 @@
+/*
+ * Copyright 2012 Tilera Corporation. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/sched.h>
+#include <linux/kernel.h> /* printk() */
+#include <linux/slab.h> /* kmalloc() */
+#include <linux/errno.h> /* error codes */
+#include <linux/types.h> /* size_t */
+#include <linux/interrupt.h>
+#include <linux/in.h>
+#include <linux/irq.h>
+#include <linux/netdevice.h> /* struct device, and other headers */
+#include <linux/etherdevice.h> /* eth_type_trans */
+#include <linux/skbuff.h>
+#include <linux/ioctl.h>
+#include <linux/cdev.h>
+#include <linux/hugetlb.h>
+#include <linux/in6.h>
+#include <linux/timer.h>
+#include <linux/hrtimer.h>
+#include <linux/ktime.h>
+#include <linux/io.h>
+#include <linux/ctype.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+
+#include <asm/checksum.h>
+#include <asm/homecache.h>
+#include <gxio/mpipe.h>
+#include <arch/sim.h>
+
+/* Default transmit lockup timeout period, in jiffies. */
+#define TILE_NET_TIMEOUT (5 * HZ)
+
+/* The maximum number of distinct channels (idesc.channel is 5 bits). */
+#define TILE_NET_CHANNELS 32
+
+/* Maximum number of idescs to handle per "poll". */
+#define TILE_NET_BATCH 128
+
+/* Maximum number of packets to handle per "poll". */
+#define TILE_NET_WEIGHT 64
+
+/* Number of entries in each iqueue. */
+#define IQUEUE_ENTRIES 512
+
+/* Number of entries in each equeue. */
+#define EQUEUE_ENTRIES 2048
+
+/* Total header bytes per equeue slot. Must be big enough for 2 bytes
+ * of NET_IP_ALIGN alignment, plus 14 bytes (?) of L2 header, plus up to
+ * 60 bytes of actual TCP header. We round up to align to cache lines.
+ */
+#define HEADER_BYTES 128
+
+/* Maximum completions per cpu per device (must be a power of two).
+ * ISSUE: What is the right number here? If this is too small, then
+ * egress might block waiting for free space in a completions array.
+ * ISSUE: At the least, allocate these only for initialized echannels.
+ */
+#define TILE_NET_MAX_COMPS 64
+
+#define MAX_FRAGS (MAX_SKB_FRAGS + 1)
+
+/* Size of completions data to allocate.
+ * ISSUE: Probably more than needed since we don't use all the channels.
+ */
+#define COMPS_SIZE (TILE_NET_CHANNELS * sizeof(struct tile_net_comps))
+
+/* Size of NotifRing data to allocate. */
+#define NOTIF_RING_SIZE (IQUEUE_ENTRIES * sizeof(gxio_mpipe_idesc_t))
+
+/* Timeout to wake the per-device TX timer after we stop the queue.
+ * We don't want the timeout too short (adds overhead, and might end
+ * up causing stop/wake/stop/wake cycles) or too long (affects performance).
+ * For the 10 Gb NIC, 30 usec means roughly 30+ 1500-byte packets.
+ */
+#define TX_TIMER_DELAY_USEC 30
+
+/* Timeout to wake the per-cpu egress timer to free completions. */
+#define EGRESS_TIMER_DELAY_USEC 1000
+
+MODULE_AUTHOR("Tilera Corporation");
+MODULE_LICENSE("GPL");
+
+/* A "packet fragment" (a chunk of memory). */
+struct frag {
+ void *buf;
+ size_t length;
+};
+
+/* A single completion. */
+struct tile_net_comp {
+ /* The "complete_count" when the completion will be complete. */
+ s64 when;
+ /* The buffer to be freed when the completion is complete. */
+ struct sk_buff *skb;
+};
+
+/* The completions for a given cpu and device. */
+struct tile_net_comps {
+ /* The completions. */
+ struct tile_net_comp comp_queue[TILE_NET_MAX_COMPS];
+ /* The number of completions used. */
+ unsigned long comp_next;
+ /* The number of completions freed. */
+ unsigned long comp_last;
+};
+
+/* Info for a specific cpu. */
+struct tile_net_info {
+ /* The NAPI struct. */
+ struct napi_struct napi;
+ /* Packet queue. */
+ gxio_mpipe_iqueue_t iqueue;
+ /* Our cpu. */
+ int my_cpu;
+ /* True if iqueue is valid. */
+ bool has_iqueue;
+ /* NAPI flags. */
+ bool napi_added;
+ bool napi_enabled;
+ /* Number of small sk_buffs which must still be provided. */
+ unsigned int num_needed_small_buffers;
+ /* Number of large sk_buffs which must still be provided. */
+ unsigned int num_needed_large_buffers;
+ /* A timer for handling egress completions. */
+ struct hrtimer egress_timer;
+ /* True if "egress_timer" is scheduled. */
+ bool egress_timer_scheduled;
+ /* Comps for each egress channel. */
+ struct tile_net_comps *comps_for_echannel[TILE_NET_CHANNELS];
+};
+
+/* Info for egress on a particular egress channel. */
+struct tile_net_egress {
+ /* The "equeue". */
+ gxio_mpipe_equeue_t *equeue;
+ /* The headers for TSO. */
+ unsigned char *headers;
+};
+
+/* Info for a specific device. */
+struct tile_net_priv {
+ /* Our network device. */
+ struct net_device *dev;
+ /* The primary link. */
+ gxio_mpipe_link_t link;
+ /* The primary channel, if open, else -1. */
+ int channel;
+ /* The "loopify" egress link, if needed. */
+ gxio_mpipe_link_t loopify_link;
+ /* The "loopify" egress channel, if open, else -1. */
+ int loopify_channel;
+ /* The egress channel (channel or loopify_channel). */
+ int echannel;
+ /* Total stats. */
+ struct net_device_stats stats;
+ /* Timer to wake up tx queue */
+ struct hrtimer tx_wake_timer;
+};
+
+/* Egress info, indexed by "priv->echannel" (lazily created as needed). */
+static struct tile_net_egress egress_for_echannel[TILE_NET_CHANNELS];
+
+/* Devices currently associated with each channel.
+ * NOTE: The array entry can become NULL after ifconfig down, but
+ * we do not free the underlying net_device structures, so it is
+ * safe to use a pointer after reading it from this array.
+ */
+static struct net_device *tile_net_devs_for_channel[TILE_NET_CHANNELS];
+
+/* A mutex for "tile_net_devs_for_channel". */
+static DEFINE_MUTEX(tile_net_devs_for_channel_mutex);
+
+/* The per-cpu info. */
+static DEFINE_PER_CPU(struct tile_net_info, per_cpu_info);
+
+/* The "context" for all devices. */
+static gxio_mpipe_context_t context;
+
+/* The small/large "buffer stacks". */
+static int small_buffer_stack = -1;
+static int large_buffer_stack = -1;
+
+/* Amount of memory allocated for each buffer stack. */
+static size_t buffer_stack_size;
+
+/* The actual memory allocated for the buffer stacks. */
+static void *small_buffer_stack_va;
+static void *large_buffer_stack_va;
+
+/* The buckets. */
+static int first_bucket = -1;
+static int num_buckets = 1;
+
+/* The ingress irq. */
+static int ingress_irq = -1;
+
+/* Text value of tile_net.cpus if passed as a module parameter. */
+static char *network_cpus_string;
+
+/* The actual cpus in "network_cpus". */
+static struct cpumask network_cpus_map;
+
+/* If "loopify=LINK" was specified, this is "LINK". */
+static char *loopify_link_name;
+
+/* If "tile_net.custom" was specified, this is non-NULL. */
+static char *custom_str;
+
+/* The "tile_net.cpus" argument specifies the cpus that are dedicated
+ * to handle ingress packets.
+ *
+ * The parameter should be in the form "tile_net.cpus=m-n[,x-y]", where
+ * m, n, x, y are integers naming cpus; a listed cpu must be neither a
+ * dedicated cpu nor a dataplane cpu.
+ */
+static bool network_cpus_init(void)
+{
+ char buf[1024];
+ int rc;
+
+ if (network_cpus_string == NULL)
+ return false;
+
+ rc = cpulist_parse_crop(network_cpus_string, &network_cpus_map);
+ if (rc != 0) {
+ pr_warn("tile_net.cpus=%s: malformed cpu list\n",
+ network_cpus_string);
+ return false;
+ }
+
+ /* Remove dedicated cpus. */
+ cpumask_and(&network_cpus_map, &network_cpus_map, cpu_possible_mask);
+
+ if (cpumask_empty(&network_cpus_map)) {
+ pr_warn("Ignoring empty tile_net.cpus='%s'.\n",
+ network_cpus_string);
+ return false;
+ }
+
+ cpulist_scnprintf(buf, sizeof(buf), &network_cpus_map);
+ pr_info("Linux network CPUs: %s\n", buf);
+ return true;
+}
+
+module_param_named(cpus, network_cpus_string, charp, 0444);
+MODULE_PARM_DESC(cpus, "cpulist of cores that handle network interrupts");
+
+/* The "tile_net.loopify=LINK" argument causes the named device to
+ * actually use "loop0" for ingress, and "loop1" for egress. This
+ * allows an app to sit between the actual link and linux, passing
+ * (some) packets along to linux, and forwarding (some) packets sent
+ * out by linux.
+ */
+module_param_named(loopify, loopify_link_name, charp, 0444);
+MODULE_PARM_DESC(loopify, "name the device to use loop0/1 for ingress/egress");
+
+/* The "tile_net.custom" argument causes us to ignore the "conventional"
+ * classifier metadata, in particular, the "l2_offset".
+ */
+module_param_named(custom, custom_str, charp, 0444);
+MODULE_PARM_DESC(custom, "indicates a (heavily) customized classifier");
+
+/* Atomically update a statistics field.
+ * Note that on TILE-Gx, this operation is fire-and-forget on the
+ * issuing core (single-cycle dispatch) and takes only a few cycles
+ * longer than a regular store when the request reaches the home cache.
+ * No expensive bus management overhead is required.
+ */
+static void tile_net_stats_add(unsigned long value, unsigned long *field)
+{
+ BUILD_BUG_ON(sizeof(atomic_long_t) != sizeof(unsigned long));
+ atomic_long_add(value, (atomic_long_t *)field);
+}
+
+/* Allocate and push a buffer. */
+static bool tile_net_provide_buffer(bool small)
+{
+ int stack = small ? small_buffer_stack : large_buffer_stack;
+ const unsigned long buffer_alignment = 128;
+ struct sk_buff *skb;
+ int len;
+
+ len = sizeof(struct sk_buff **) + buffer_alignment;
+ len += (small ? 128 : 1664);
+ skb = dev_alloc_skb(len);
+ if (skb == NULL)
+ return false;
+
+ /* Make room for a back-pointer to 'skb' and guarantee alignment. */
+ skb_reserve(skb, sizeof(struct sk_buff **));
+ skb_reserve(skb, -(long)skb->data & (buffer_alignment - 1));
+
+ /* Save a back-pointer to 'skb'. */
+ *(struct sk_buff **)(skb->data - sizeof(struct sk_buff **)) = skb;
+
+ /* Make sure "skb" and the back-pointer have been flushed. */
+ wmb();
+
+ gxio_mpipe_push_buffer(&context, stack,
+ (void *)va_to_tile_io_addr(skb->data));
+
+ return true;
+}
+
+static void tile_net_pop_all_buffers(int stack)
+{
+ void *va;
+ while ((va = gxio_mpipe_pop_buffer(&context, stack)) != NULL) {
+ struct sk_buff **skb_ptr = va - sizeof(*skb_ptr);
+ struct sk_buff *skb = *skb_ptr;
+ dev_kfree_skb_irq(skb);
+ }
+}
+
+/* Provide linux buffers to mPIPE. */
+static void tile_net_provide_needed_buffers(struct tile_net_info *info)
+{
+ while (info->num_needed_small_buffers != 0) {
+ if (!tile_net_provide_buffer(true))
+ goto oops;
+ info->num_needed_small_buffers--;
+ }
+
+ while (info->num_needed_large_buffers != 0) {
+ if (!tile_net_provide_buffer(false))
+ goto oops;
+ info->num_needed_large_buffers--;
+ }
+
+ return;
+
+oops:
+ /* Add a description to the page allocation failure dump. */
+ pr_notice("Tile %d still needs some buffers\n", info->my_cpu);
+}
+
+static inline bool filter_packet(struct net_device *dev, void *buf)
+{
+ /* Filter packets received before we're up. */
+ if (dev == NULL || !(dev->flags & IFF_UP))
+ return true;
+
+ /* Filter out packets that aren't for us. */
+ if (!(dev->flags & IFF_PROMISC) &&
+ !is_multicast_ether_addr(buf) &&
+ compare_ether_addr(dev->dev_addr, buf) != 0)
+ return true;
+
+ return false;
+}
+
+/* Convert a raw mpipe buffer to its matching skb pointer. */
+static struct sk_buff *mpipe_buf_to_skb(void *va)
+{
+ /* Acquire the associated "skb". */
+ struct sk_buff **skb_ptr = va - sizeof(*skb_ptr);
+ struct sk_buff *skb = *skb_ptr;
+
+ /* Paranoia. */
+ if (skb->data != va) {
+ /* Panic here since there's a reasonable chance
+ * that corrupt buffers means generic memory
+ * corruption, with unpredictable system effects.
+ */
+ panic("Corrupt linux buffer! va=%p, skb=%p, skb->data=%p",
+ va, skb, skb->data);
+ }
+
+ return skb;
+}
+
+static void tile_net_receive_skb(struct net_device *dev, struct sk_buff *skb,
+ struct tile_net_info *info,
+ gxio_mpipe_idesc_t *idesc, unsigned long len)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+
+ /* Encode the actual packet length. */
+ skb_put(skb, len);
+
+ skb->protocol = eth_type_trans(skb, dev);
+
+ /* Acknowledge "good" hardware checksums. */
+ if (idesc->cs && idesc->csum_seed_val == 0xFFFF)
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ netif_receive_skb(skb);
+
+ /* Update stats. */
+ tile_net_stats_add(1, &priv->stats.rx_packets);
+ tile_net_stats_add(len, &priv->stats.rx_bytes);
+
+ /* Need a new buffer. */
+ if (idesc->size == GXIO_MPIPE_BUFFER_SIZE_128)
+ info->num_needed_small_buffers++;
+ else
+ info->num_needed_large_buffers++;
+}
+
+/* Handle a packet. Return true if "processed", false if "filtered". */
+static bool tile_net_handle_packet(struct tile_net_info *info,
+ gxio_mpipe_idesc_t *idesc)
+{
+ struct net_device *dev = tile_net_devs_for_channel[idesc->channel];
+ uint8_t l2_offset;
+ void *va;
+ void *buf;
+ unsigned long len;
+ bool filter;
+
+ /* Drop packets for which no buffer was available.
+ * NOTE: This happens under heavy load.
+ */
+ if (idesc->be) {
+ struct tile_net_priv *priv = netdev_priv(dev);
+ tile_net_stats_add(1, &priv->stats.rx_dropped);
+ gxio_mpipe_iqueue_consume(&info->iqueue, idesc);
+ if (net_ratelimit())
+ pr_info("Dropping packet (insufficient buffers).\n");
+ return false;
+ }
+
+ /* Get the "l2_offset", if allowed. */
+ l2_offset = custom_str ? 0 : gxio_mpipe_idesc_get_l2_offset(idesc);
+
+ /* Get the raw buffer VA (includes "headroom"). */
+ va = tile_io_addr_to_va((unsigned long)(long)idesc->va);
+
+ /* Get the actual packet start/length. */
+ buf = va + l2_offset;
+ len = idesc->l2_size - l2_offset;
+
+ /* Point "va" at the raw buffer. */
+ va -= NET_IP_ALIGN;
+
+ filter = filter_packet(dev, buf);
+ if (filter) {
+ gxio_mpipe_iqueue_drop(&info->iqueue, idesc);
+ } else {
+ struct sk_buff *skb = mpipe_buf_to_skb(va);
+
+ /* Skip headroom, and any custom header. */
+ skb_reserve(skb, NET_IP_ALIGN + l2_offset);
+
+ tile_net_receive_skb(dev, skb, info, idesc, len);
+ }
+
+ gxio_mpipe_iqueue_consume(&info->iqueue, idesc);
+ return !filter;
+}
+
+/* Handle some packets for the current CPU.
+ *
+ * This function handles up to TILE_NET_BATCH idescs per call.
+ *
+ * ISSUE: Since we do not provide new buffers until this function is
+ * complete, we must initially provide enough buffers for each network
+ * cpu to fill its iqueue and also its batched idescs.
+ *
+ * ISSUE: The "rotting packet" race condition occurs if a packet
+ * arrives after the queue appears to be empty, and before the
+ * hypervisor interrupt is re-enabled.
+ */
+static int tile_net_poll(struct napi_struct *napi, int budget)
+{
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+ unsigned int work = 0;
+ gxio_mpipe_idesc_t *idesc;
+ int i, n;
+
+ /* Process packets. */
+ while ((n = gxio_mpipe_iqueue_try_peek(&info->iqueue, &idesc)) > 0) {
+ for (i = 0; i < n; i++) {
+ if (i == TILE_NET_BATCH)
+ goto done;
+ if (tile_net_handle_packet(info, idesc + i)) {
+ if (++work >= budget)
+ goto done;
+ }
+ }
+ }
+
+ /* There are no packets left. */
+ napi_complete(&info->napi);
+
+ /* Re-enable hypervisor interrupts. */
+ gxio_mpipe_enable_notif_ring_interrupt(&context, info->iqueue.ring);
+
+ /* HACK: Avoid the "rotting packet" problem. */
+ if (gxio_mpipe_iqueue_try_peek(&info->iqueue, &idesc) > 0)
+ napi_schedule(&info->napi);
+
+ /* ISSUE: Handle completions? */
+
+done:
+ tile_net_provide_needed_buffers(info);
+
+ return work;
+}
+
+/* Handle an ingress interrupt on the current cpu. */
+static irqreturn_t tile_net_handle_ingress_irq(int irq, void *unused)
+{
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+ napi_schedule(&info->napi);
+ return IRQ_HANDLED;
+}
+
+/* Free some completions. This must be called with interrupts blocked. */
+static int tile_net_free_comps(gxio_mpipe_equeue_t *equeue,
+ struct tile_net_comps *comps,
+ int limit, bool force_update)
+{
+ int n = 0;
+ while (comps->comp_last < comps->comp_next) {
+ unsigned int cid = comps->comp_last % TILE_NET_MAX_COMPS;
+ struct tile_net_comp *comp = &comps->comp_queue[cid];
+ if (!gxio_mpipe_equeue_is_complete(equeue, comp->when,
+ force_update || n == 0))
+ break;
+ dev_kfree_skb_irq(comp->skb);
+ comps->comp_last++;
+ if (++n == limit)
+ break;
+ }
+ return n;
+}
+
+/* Add a completion. This must be called with interrupts blocked.
+ * tile_net_equeue_try_reserve() will have ensured a free completion entry.
+ */
+static void add_comp(gxio_mpipe_equeue_t *equeue,
+ struct tile_net_comps *comps,
+ uint64_t when, struct sk_buff *skb)
+{
+ int cid = comps->comp_next % TILE_NET_MAX_COMPS;
+ comps->comp_queue[cid].when = when;
+ comps->comp_queue[cid].skb = skb;
+ comps->comp_next++;
+}
+
+static void tile_net_schedule_tx_wake_timer(struct net_device *dev)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+
+ hrtimer_start(&priv->tx_wake_timer,
+ ktime_set(0, TX_TIMER_DELAY_USEC * 1000UL),
+ HRTIMER_MODE_REL);
+}
+
+static enum hrtimer_restart tile_net_handle_tx_wake_timer(struct hrtimer *t)
+{
+ struct net_device *dev;
+ struct tile_net_priv *priv;
+
+ priv = container_of(t, struct tile_net_priv, tx_wake_timer);
+ dev = priv->dev;
+
+ if (netif_queue_stopped(dev))
+ netif_wake_queue(dev);
+
+ return HRTIMER_NORESTART;
+}
+
+/* Make sure the egress timer is scheduled.
+ *
+ * Note that we use "schedule if not scheduled" logic instead of the more
+ * obvious "reschedule" logic, because "reschedule" is fairly expensive.
+ */
+static void tile_net_schedule_egress_timer(struct tile_net_info *info)
+{
+ if (!info->egress_timer_scheduled) {
+ hrtimer_start(&info->egress_timer,
+ ktime_set(0, EGRESS_TIMER_DELAY_USEC * 1000UL),
+ HRTIMER_MODE_REL);
+ info->egress_timer_scheduled = true;
+ }
+}
+
+/* The "function" for "info->egress_timer".
+ *
+ * This timer will reschedule itself as long as there are any pending
+ * completions expected for this tile.
+ */
+static enum hrtimer_restart tile_net_handle_egress_timer(struct hrtimer *t)
+{
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+ unsigned long irqflags;
+ bool pending = false;
+ int i;
+
+ local_irq_save(irqflags);
+
+ /* The timer is no longer scheduled. */
+ info->egress_timer_scheduled = false;
+
+ /* Free all possible comps for this tile. */
+ for (i = 0; i < TILE_NET_CHANNELS; i++) {
+ struct tile_net_egress *egress = &egress_for_echannel[i];
+ struct tile_net_comps *comps = info->comps_for_echannel[i];
+ if (comps->comp_last >= comps->comp_next)
+ continue;
+ tile_net_free_comps(egress->equeue, comps, -1, true);
+ pending = pending || (comps->comp_last < comps->comp_next);
+ }
+
+ /* Reschedule timer if needed. */
+ if (pending)
+ tile_net_schedule_egress_timer(info);
+
+ local_irq_restore(irqflags);
+
+ return HRTIMER_NORESTART;
+}
+
+/* Helper function for "tile_net_update()".
+ * "dev" (i.e. arg) is the device being brought up or down,
+ * or NULL if all devices are now down.
+ */
+static void tile_net_update_cpu(void *arg)
+{
+ struct net_device *dev = arg;
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+
+ if (!info->has_iqueue)
+ return;
+
+ if (dev != NULL) {
+ if (!info->napi_added) {
+ netif_napi_add(dev, &info->napi,
+ tile_net_poll, TILE_NET_WEIGHT);
+ info->napi_added = true;
+ }
+ if (!info->napi_enabled) {
+ napi_enable(&info->napi);
+ info->napi_enabled = true;
+ }
+ enable_percpu_irq(ingress_irq, 0);
+ } else {
+ disable_percpu_irq(ingress_irq);
+ if (info->napi_enabled) {
+ napi_disable(&info->napi);
+ info->napi_enabled = false;
+ }
+ /* FIXME: Drain the iqueue. */
+ }
+}
+
+/* Helper function for tile_net_open() and tile_net_stop().
+ * Always called under tile_net_devs_for_channel_mutex.
+ */
+static int tile_net_update(struct net_device *dev)
+{
+ static gxio_mpipe_rules_t rules; /* too big to fit on the stack */
+ bool saw_channel = false;
+ int channel;
+ int rc;
+ int cpu;
+
+ gxio_mpipe_rules_init(&rules, &context);
+
+ for (channel = 0; channel < TILE_NET_CHANNELS; channel++) {
+ if (tile_net_devs_for_channel[channel] == NULL)
+ continue;
+ if (!saw_channel) {
+ saw_channel = true;
+ gxio_mpipe_rules_begin(&rules, first_bucket,
+ num_buckets, NULL);
+ gxio_mpipe_rules_set_headroom(&rules, NET_IP_ALIGN);
+ }
+ gxio_mpipe_rules_add_channel(&rules, channel);
+ }
+
+ /* NOTE: This can fail if there is no classifier.
+ * ISSUE: Can anything else cause it to fail?
+ */
+ rc = gxio_mpipe_rules_commit(&rules);
+ if (rc != 0) {
+ netdev_warn(dev, "gxio_mpipe_rules_commit failed: %d\n", rc);
+ return -EIO;
+ }
+
+ /* Update all cpus, sequentially (to protect "netif_napi_add()"). */
+ for_each_online_cpu(cpu)
+ smp_call_function_single(cpu, tile_net_update_cpu,
+ (saw_channel ? dev : NULL), 1);
+
+ /* HACK: Allow packets to flow in the simulator. */
+ if (saw_channel)
+ sim_enable_mpipe_links(0, -1);
+
+ return 0;
+}
+
+/* Allocate and initialize mpipe buffer stacks, and register them in
+ * the mPIPE TLBs, for both small and large packet sizes.
+ * This routine supports tile_net_init_mpipe(), below.
+ */
+static int init_buffer_stacks(struct net_device *dev, int num_buffers)
+{
+ pte_t hash_pte = pte_set_home((pte_t) { 0 }, PAGE_HOME_HASH);
+ int rc;
+
+ /* Compute stack bytes; we round up to 64KB and then use
+ * alloc_pages() so we get the required 64KB alignment as well.
+ */
+ buffer_stack_size =
+ ALIGN(gxio_mpipe_calc_buffer_stack_bytes(num_buffers),
+ 64 * 1024);
+
+ /* Allocate two buffer stack indices. */
+ rc = gxio_mpipe_alloc_buffer_stacks(&context, 2, 0, 0);
+ if (rc < 0) {
+ netdev_err(dev, "gxio_mpipe_alloc_buffer_stacks failed: %d\n",
+ rc);
+ return rc;
+ }
+ small_buffer_stack = rc;
+ large_buffer_stack = rc + 1;
+
+ /* Allocate the small buffer stack. */
+ small_buffer_stack_va =
+ alloc_pages_exact(buffer_stack_size, GFP_KERNEL);
+ if (small_buffer_stack_va == NULL) {
+ netdev_err(dev,
+ "Could not alloc %zd bytes for buffer stacks\n",
+ buffer_stack_size);
+ return -ENOMEM;
+ }
+ rc = gxio_mpipe_init_buffer_stack(&context, small_buffer_stack,
+ GXIO_MPIPE_BUFFER_SIZE_128,
+ small_buffer_stack_va,
+ buffer_stack_size, 0);
+ if (rc != 0) {
+ netdev_err(dev, "gxio_mpipe_init_buffer_stack failed: %d\n", rc);
+ return rc;
+ }
+ rc = gxio_mpipe_register_client_memory(&context, small_buffer_stack,
+ hash_pte, 0);
+ if (rc != 0) {
+ netdev_err(dev,
+ "gxio_mpipe_register_client_memory failed: %d\n",
+ rc);
+ return rc;
+ }
+
+ /* Allocate the large buffer stack. */
+ large_buffer_stack_va =
+ alloc_pages_exact(buffer_stack_size, GFP_KERNEL);
+ if (large_buffer_stack_va == NULL) {
+ netdev_err(dev,
+ "Could not alloc %zd bytes for buffer stacks\n",
+ buffer_stack_size);
+ return -ENOMEM;
+ }
+ rc = gxio_mpipe_init_buffer_stack(&context, large_buffer_stack,
+ GXIO_MPIPE_BUFFER_SIZE_1664,
+ large_buffer_stack_va,
+ buffer_stack_size, 0);
+ if (rc != 0) {
+ netdev_err(dev, "gxio_mpipe_init_buffer_stack failed: %d\n",
+ rc);
+ return rc;
+ }
+ rc = gxio_mpipe_register_client_memory(&context, large_buffer_stack,
+ hash_pte, 0);
+ if (rc != 0) {
+ netdev_err(dev,
+ "gxio_mpipe_register_client_memory failed: %d\n",
+ rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+/* Allocate per-cpu resources (memory for completions and idescs).
+ * This routine supports tile_net_init_mpipe(), below.
+ */
+static int alloc_percpu_mpipe_resources(struct net_device *dev,
+ int cpu, int ring)
+{
+ struct tile_net_info *info = &per_cpu(per_cpu_info, cpu);
+ int order, i, rc;
+ struct page *page;
+ void *addr;
+
+ /* Allocate the "comps". */
+ order = get_order(COMPS_SIZE);
+ page = homecache_alloc_pages(GFP_KERNEL, order, cpu);
+ if (page == NULL) {
+ netdev_err(dev, "Failed to alloc %zd bytes comps memory\n",
+ COMPS_SIZE);
+ return -ENOMEM;
+ }
+ addr = pfn_to_kaddr(page_to_pfn(page));
+ memset(addr, 0, COMPS_SIZE);
+ for (i = 0; i < TILE_NET_CHANNELS; i++)
+ info->comps_for_echannel[i] =
+ addr + i * sizeof(struct tile_net_comps);
+
+ /* If this is a network cpu, create an iqueue. */
+ if (cpu_isset(cpu, network_cpus_map)) {
+ order = get_order(NOTIF_RING_SIZE);
+ page = homecache_alloc_pages(GFP_KERNEL, order, cpu);
+ if (page == NULL) {
+ netdev_err(dev,
+ "Failed to alloc %zd bytes iqueue memory\n",
+ NOTIF_RING_SIZE);
+ return -ENOMEM;
+ }
+ addr = pfn_to_kaddr(page_to_pfn(page));
+ rc = gxio_mpipe_iqueue_init(&info->iqueue, &context, ring,
+ addr, NOTIF_RING_SIZE, 0);
+ if (rc != 0) {
+ netdev_err(dev,
+ "gxio_mpipe_iqueue_init failed: %d\n", rc);
+ return rc;
+ }
+ info->has_iqueue = true;
+ }
+
+ return 0;
+}
+
+/* Initialize NotifGroup and buckets.
+ * This routine supports tile_net_init_mpipe(), below.
+ */
+static int init_notif_group_and_buckets(struct net_device *dev,
+ int ring, int network_cpus_count)
+{
+ int group, rc;
+
+ /* Allocate one NotifGroup. */
+ rc = gxio_mpipe_alloc_notif_groups(&context, 1, 0, 0);
+ if (rc < 0) {
+ netdev_err(dev, "gxio_mpipe_alloc_notif_groups failed: %d\n",
+ rc);
+ return rc;
+ }
+ group = rc;
+
+ /* Initialize global num_buckets value. */
+ if (network_cpus_count > 4)
+ num_buckets = 256;
+ else if (network_cpus_count > 1)
+ num_buckets = 16;
+
+ /* Allocate some buckets, and set global first_bucket value. */
+ rc = gxio_mpipe_alloc_buckets(&context, num_buckets, 0, 0);
+ if (rc < 0) {
+ netdev_err(dev, "gxio_mpipe_alloc_buckets failed: %d\n", rc);
+ return rc;
+ }
+ first_bucket = rc;
+
+ /* Init group and buckets. */
+ rc = gxio_mpipe_init_notif_group_and_buckets(
+ &context, group, ring, network_cpus_count,
+ first_bucket, num_buckets,
+ GXIO_MPIPE_BUCKET_STICKY_FLOW_LOCALITY);
+ if (rc != 0) {
+ netdev_err(
+ dev,
+ "gxio_mpipe_init_notif_group_and_buckets failed: %d\n",
+ rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+/* Create an irq and register it, then activate the irq and request
+ * interrupts on all cores. Note that "ingress_irq" being initialized
+ * is how we know not to call tile_net_init_mpipe() again.
+ * This routine supports tile_net_init_mpipe(), below.
+ */
+static int tile_net_setup_interrupts(struct net_device *dev)
+{
+ int cpu, rc;
+
+ rc = create_irq();
+ if (rc < 0) {
+ netdev_err(dev, "create_irq failed: %d\n", rc);
+ return rc;
+ }
+ ingress_irq = rc;
+ tile_irq_activate(ingress_irq, TILE_IRQ_PERCPU);
+ rc = request_irq(ingress_irq, tile_net_handle_ingress_irq,
+ 0, NULL, NULL);
+ if (rc != 0) {
+ netdev_err(dev, "request_irq failed: %d\n", rc);
+ destroy_irq(ingress_irq);
+ ingress_irq = -1;
+ return rc;
+ }
+
+ for_each_online_cpu(cpu) {
+ struct tile_net_info *info = &per_cpu(per_cpu_info, cpu);
+ if (info->has_iqueue) {
+ gxio_mpipe_request_notif_ring_interrupt(
+ &context, cpu_x(cpu), cpu_y(cpu),
+ 1, ingress_irq, info->iqueue.ring);
+ }
+ }
+
+ return 0;
+}
+
+/* Undo any state set up partially by a failed call to tile_net_init_mpipe. */
+static void tile_net_init_mpipe_fail(void)
+{
+ int cpu;
+
+ /* Do cleanups that require the mpipe context first. */
+ if (small_buffer_stack >= 0)
+ tile_net_pop_all_buffers(small_buffer_stack);
+ if (large_buffer_stack >= 0)
+ tile_net_pop_all_buffers(large_buffer_stack);
+
+ /* Destroy mpipe context so the hardware no longer owns any memory. */
+ gxio_mpipe_destroy(&context);
+
+ for_each_online_cpu(cpu) {
+ struct tile_net_info *info = &per_cpu(per_cpu_info, cpu);
+ free_pages((unsigned long)(info->comps_for_echannel[0]),
+ get_order(COMPS_SIZE));
+ info->comps_for_echannel[0] = NULL;
+ free_pages((unsigned long)(info->iqueue.idescs),
+ get_order(NOTIF_RING_SIZE));
+ info->iqueue.idescs = NULL;
+ }
+
+ if (small_buffer_stack_va)
+ free_pages_exact(small_buffer_stack_va, buffer_stack_size);
+ if (large_buffer_stack_va)
+ free_pages_exact(large_buffer_stack_va, buffer_stack_size);
+
+ small_buffer_stack_va = NULL;
+ large_buffer_stack_va = NULL;
+ large_buffer_stack = -1;
+ small_buffer_stack = -1;
+ first_bucket = -1;
+}
+
+/* The first time any tilegx network device is opened, we initialize
+ * the global mpipe state. If this step fails, we fail to open the
+ * device, but if it succeeds, we never need to do it again, and since
+ * tile_net can't be unloaded, we never undo it.
+ *
+ * Note that some resources in this path (buffer stack indices,
+ * bindings from init_buffer_stack, etc.) are hypervisor resources
+ * that are freed implicitly by gxio_mpipe_destroy().
+ */
+static int tile_net_init_mpipe(struct net_device *dev)
+{
+ int i, num_buffers, rc;
+ int cpu;
+ int first_ring, ring;
+ int network_cpus_count = cpus_weight(network_cpus_map);
+
+ if (!hash_default) {
+ netdev_err(dev, "Networking requires hash_default!\n");
+ return -EIO;
+ }
+
+ rc = gxio_mpipe_init(&context, 0);
+ if (rc != 0) {
+ netdev_err(dev, "gxio_mpipe_init failed: %d\n", rc);
+ return -EIO;
+ }
+
+ /* Set up the buffer stacks. */
+ num_buffers =
+ network_cpus_count * (IQUEUE_ENTRIES + TILE_NET_BATCH);
+ rc = init_buffer_stacks(dev, num_buffers);
+ if (rc != 0)
+ goto fail;
+
+ /* Provide initial buffers. */
+ rc = -ENOMEM;
+ for (i = 0; i < num_buffers; i++) {
+ if (!tile_net_provide_buffer(true)) {
+ netdev_err(dev, "Cannot allocate initial sk_buffs!\n");
+ goto fail;
+ }
+ }
+ for (i = 0; i < num_buffers; i++) {
+ if (!tile_net_provide_buffer(false)) {
+ netdev_err(dev, "Cannot allocate initial sk_buffs!\n");
+ goto fail;
+ }
+ }
+
+ /* Allocate one NotifRing for each network cpu. */
+ rc = gxio_mpipe_alloc_notif_rings(&context, network_cpus_count, 0, 0);
+ if (rc < 0) {
+ netdev_err(dev, "gxio_mpipe_alloc_notif_rings failed: %d\n",
+ rc);
+ goto fail;
+ }
+
+ /* Init NotifRings per-cpu. */
+ first_ring = rc;
+ ring = first_ring;
+ for_each_online_cpu(cpu) {
+ rc = alloc_percpu_mpipe_resources(dev, cpu, ring++);
+ if (rc != 0)
+ goto fail;
+ }
+
+ /* Initialize NotifGroup and buckets. */
+ rc = init_notif_group_and_buckets(dev, first_ring, network_cpus_count);
+ if (rc != 0)
+ goto fail;
+
+ /* Create and enable interrupts. */
+ rc = tile_net_setup_interrupts(dev);
+ if (rc != 0)
+ goto fail;
+
+ return 0;
+
+fail:
+ tile_net_init_mpipe_fail();
+ return rc;
+}
+
+/* Create persistent egress info for a given egress channel.
+ * Note that this may be shared between, say, "gbe0" and "xgbe0".
+ * ISSUE: Defer header allocation until TSO is actually needed?
+ */
+static int tile_net_init_egress(struct net_device *dev, int echannel)
+{
+ struct page *headers_page, *edescs_page, *equeue_page;
+ gxio_mpipe_edesc_t *edescs;
+ gxio_mpipe_equeue_t *equeue;
+ unsigned char *headers;
+ int headers_order, edescs_order, equeue_order;
+ size_t edescs_size;
+ int edma;
+ int rc = -ENOMEM;
+
+ /* Only initialize once. */
+ if (egress_for_echannel[echannel].equeue != NULL)
+ return 0;
+
+ /* Allocate memory for the "headers". */
+ headers_order = get_order(EQUEUE_ENTRIES * HEADER_BYTES);
+ headers_page = alloc_pages(GFP_KERNEL, headers_order);
+ if (headers_page == NULL) {
+ netdev_warn(dev,
+ "Could not alloc %zd bytes for TSO headers.\n",
+ PAGE_SIZE << headers_order);
+ goto fail;
+ }
+ headers = pfn_to_kaddr(page_to_pfn(headers_page));
+
+ /* Allocate memory for the "edescs". */
+ edescs_size = EQUEUE_ENTRIES * sizeof(*edescs);
+ edescs_order = get_order(edescs_size);
+ edescs_page = alloc_pages(GFP_KERNEL, edescs_order);
+ if (edescs_page == NULL) {
+ netdev_warn(dev,
+ "Could not alloc %zd bytes for eDMA ring.\n",
+ edescs_size);
+ goto fail_headers;
+ }
+ edescs = pfn_to_kaddr(page_to_pfn(edescs_page));
+
+ /* Allocate memory for the "equeue". */
+ equeue_order = get_order(sizeof(*equeue));
+ equeue_page = alloc_pages(GFP_KERNEL, equeue_order);
+ if (equeue_page == NULL) {
+ netdev_warn(dev,
+ "Could not alloc %zd bytes for equeue info.\n",
+ PAGE_SIZE << equeue_order);
+ goto fail_edescs;
+ }
+ equeue = pfn_to_kaddr(page_to_pfn(equeue_page));
+
+ /* Allocate an edma ring. Note that in practice this can't
+ * fail, which is good, because we will leak an edma ring if so.
+ */
+ rc = gxio_mpipe_alloc_edma_rings(&context, 1, 0, 0);
+ if (rc < 0) {
+ netdev_warn(dev, "gxio_mpipe_alloc_edma_rings failed: %d\n",
+ rc);
+ goto fail_equeue;
+ }
+ edma = rc;
+
+ /* Initialize the equeue. */
+ rc = gxio_mpipe_equeue_init(equeue, &context, edma, echannel,
+ edescs, edescs_size, 0);
+ if (rc != 0) {
+ netdev_err(dev, "gxio_mpipe_equeue_init failed: %d\n", rc);
+ goto fail_equeue;
+ }
+
+ /* Done. */
+ egress_for_echannel[echannel].equeue = equeue;
+ egress_for_echannel[echannel].headers = headers;
+ return 0;
+
+fail_equeue:
+ __free_pages(equeue_page, equeue_order);
+
+fail_edescs:
+ __free_pages(edescs_page, edescs_order);
+
+fail_headers:
+ __free_pages(headers_page, headers_order);
+
+fail:
+ return rc;
+}
+
+/* Return channel number for a newly-opened link. */
+static int tile_net_link_open(struct net_device *dev, gxio_mpipe_link_t *link,
+ const char *link_name)
+{
+ int rc = gxio_mpipe_link_open(link, &context, link_name, 0);
+ if (rc < 0) {
+ netdev_err(dev, "Failed to open '%s'\n", link_name);
+ return rc;
+ }
+ rc = gxio_mpipe_link_channel(link);
+ if (rc < 0 || rc >= TILE_NET_CHANNELS) {
+ netdev_err(dev, "gxio_mpipe_link_channel bad value: %d\n", rc);
+ gxio_mpipe_link_close(link);
+ return -EINVAL;
+ }
+ return rc;
+}
+
+/* Help the kernel activate the given network interface. */
+static int tile_net_open(struct net_device *dev)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+ int rc;
+
+ mutex_lock(&tile_net_devs_for_channel_mutex);
+
+ /* Do one-time initialization the first time any device is opened. */
+ if (ingress_irq < 0) {
+ rc = tile_net_init_mpipe(dev);
+ if (rc != 0)
+ goto fail;
+ }
+
+ /* Determine if this is the "loopify" device. */
+ if (unlikely((loopify_link_name != NULL) &&
+ !strcmp(dev->name, loopify_link_name))) {
+ rc = tile_net_link_open(dev, &priv->link, "loop0");
+ if (rc < 0)
+ goto fail;
+ priv->channel = rc;
+ rc = tile_net_link_open(dev, &priv->loopify_link, "loop1");
+ if (rc < 0)
+ goto fail;
+ priv->loopify_channel = rc;
+ priv->echannel = rc;
+ } else {
+ rc = tile_net_link_open(dev, &priv->link, dev->name);
+ if (rc < 0)
+ goto fail;
+ priv->channel = rc;
+ priv->echannel = rc;
+ }
+
+ /* Initialize egress info (if needed). Once ever, per echannel. */
+ rc = tile_net_init_egress(dev, priv->echannel);
+ if (rc != 0)
+ goto fail;
+
+ tile_net_devs_for_channel[priv->channel] = dev;
+
+ rc = tile_net_update(dev);
+ if (rc != 0)
+ goto fail;
+
+ mutex_unlock(&tile_net_devs_for_channel_mutex);
+
+ netif_start_queue(dev);
+ netif_carrier_on(dev);
+ return 0;
+
+fail:
+ if (priv->loopify_channel >= 0) {
+ if (gxio_mpipe_link_close(&priv->loopify_link) != 0)
+ netdev_warn(dev, "Failed to close loopify link!\n");
+ priv->loopify_channel = -1;
+ }
+ if (priv->channel >= 0) {
+ tile_net_devs_for_channel[priv->channel] = NULL;
+ if (gxio_mpipe_link_close(&priv->link) != 0)
+ netdev_warn(dev, "Failed to close link!\n");
+ priv->channel = -1;
+ }
+ priv->echannel = -1;
+ mutex_unlock(&tile_net_devs_for_channel_mutex);
+
+ /* Don't return raw gxio error codes to generic Linux. */
+ return (rc > -512) ? rc : -EIO;
+}
+
+/* Help the kernel deactivate the given network interface. */
+static int tile_net_stop(struct net_device *dev)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+
+ netif_stop_queue(dev);
+
+ mutex_lock(&tile_net_devs_for_channel_mutex);
+ tile_net_devs_for_channel[priv->channel] = NULL;
+ (void)tile_net_update(dev);
+ if (priv->loopify_channel >= 0) {
+ if (gxio_mpipe_link_close(&priv->loopify_link) != 0)
+ netdev_warn(dev, "Failed to close loopify link!\n");
+ priv->loopify_channel = -1;
+ }
+ if (priv->channel >= 0) {
+ if (gxio_mpipe_link_close(&priv->link) != 0)
+ netdev_warn(dev, "Failed to close link!\n");
+ priv->channel = -1;
+ }
+ priv->echannel = -1;
+ mutex_unlock(&tile_net_devs_for_channel_mutex);
+
+ return 0;
+}
+
+/* Determine the VA for a fragment. */
+static inline void *tile_net_frag_buf(skb_frag_t *f)
+{
+ unsigned long pfn = page_to_pfn(skb_frag_page(f));
+ return pfn_to_kaddr(pfn) + f->page_offset;
+}
+
+/* Acquire a completion entry and an egress slot, or if we can't,
+ * stop the queue and schedule the tx_wake timer.
+ */
+static s64 tile_net_equeue_try_reserve(struct net_device *dev,
+ struct tile_net_comps *comps,
+ gxio_mpipe_equeue_t *equeue,
+ int num_edescs)
+{
+ /* Try to acquire a completion entry. */
+ if (comps->comp_next - comps->comp_last < TILE_NET_MAX_COMPS - 1 ||
+ tile_net_free_comps(equeue, comps, 32, false) != 0) {
+
+ /* Try to acquire an egress slot. */
+ s64 slot = gxio_mpipe_equeue_try_reserve(equeue, num_edescs);
+ if (slot >= 0)
+ return slot;
+
+ /* Freeing some completions gives the equeue time to drain. */
+ tile_net_free_comps(equeue, comps, TILE_NET_MAX_COMPS, false);
+
+ slot = gxio_mpipe_equeue_try_reserve(equeue, num_edescs);
+ if (slot >= 0)
+ return slot;
+ }
+
+ /* Still nothing; give up and stop the queue for a short while. */
+ netif_stop_queue(dev);
+ tile_net_schedule_tx_wake_timer(dev);
+ return -1;
+}
+
+/* Determine how many edesc's are needed for TSO.
+ *
+ * Sometimes, if "sendfile()" requires copying, we will be called with
+ * "data" containing the header and payload, with "frags" being empty.
+ * Sometimes, for example when using NFS over TCP, a single segment can
+ * span 3 fragments. This requires special care.
+ */
+static int tso_count_edescs(struct sk_buff *skb)
+{
+ struct skb_shared_info *sh = skb_shinfo(skb);
+ unsigned int len = skb->len;
+ unsigned int p_len = sh->gso_size;
+ long f_id = -1; /* id of the current fragment */
+ long f_size = -1; /* size of the current fragment */
+ long f_used = -1; /* bytes used from the current fragment */
+ long n; /* size of the current piece of payload */
+ int num_edescs = 0;
+ int segment;
+
+ for (segment = 0; segment < sh->gso_segs; segment++) {
+
+ unsigned int p_used = 0;
+
+ /* The last segment may be less than gso_size. */
+ len -= p_len;
+ if (len < p_len)
+ p_len = len;
+
+ /* One edesc for header and for each piece of the payload. */
+ for (num_edescs++; p_used < p_len; num_edescs++) {
+
+ /* Advance as needed. */
+ while (f_used >= f_size) {
+ f_id++;
+ f_size = sh->frags[f_id].size;
+ f_used = 0;
+ }
+
+ /* Use bytes from the current fragment. */
+ n = p_len - p_used;
+ if (n > f_size - f_used)
+ n = f_size - f_used;
+ f_used += n;
+ p_used += n;
+ }
+ }
+
+ return num_edescs;
+}
+
+/* Prepare modified copies of the skbuff headers.
+ * FIXME: add support for IPv6.
+ */
+static void tso_headers_prepare(struct sk_buff *skb, unsigned char *headers,
+ s64 slot)
+{
+ struct skb_shared_info *sh = skb_shinfo(skb);
+ struct iphdr *ih;
+ struct tcphdr *th;
+ unsigned int len = skb->len;
+ unsigned char *data = skb->data;
+ unsigned int ih_off, th_off, sh_len, total_len, p_len;
+ unsigned int isum_start, tsum_start, id, seq;
+ long f_id = -1; /* id of the current fragment */
+ long f_size = -1; /* size of the current fragment */
+ long f_used = -1; /* bytes used from the current fragment */
+ long n; /* size of the current piece of payload */
+ int segment;
+
+ /* Locate original headers and compute various lengths. */
+ ih = ip_hdr(skb);
+ th = tcp_hdr(skb);
+ ih_off = (unsigned char *)ih - data;
+ th_off = (unsigned char *)th - data;
+ sh_len = th_off + tcp_hdrlen(skb);
+ p_len = sh->gso_size;
+ total_len = p_len + sh_len;
+
+ /* Set up seed values for IP and TCP csum and initialize id and seq. */
+ isum_start = ((0xFFFF - ih->check) +
+ (0xFFFF - ih->tot_len) +
+ (0xFFFF - ih->id));
+ tsum_start = th->check + (0xFFFF ^ htons(len));
+ id = ntohs(ih->id);
+ seq = ntohl(th->seq);
+
+ /* Prepare all the headers. */
+ for (segment = 0; segment < sh->gso_segs; segment++) {
+ unsigned char *buf;
+ unsigned int p_used = 0;
+
+ /* The last segment may be less than gso_size. */
+ len -= p_len;
+ if (len < p_len) {
+ p_len = len;
+ total_len = p_len + sh_len;
+ }
+
+ /* Copy to the header memory for this segment. */
+ buf = headers + (slot % EQUEUE_ENTRIES) * HEADER_BYTES +
+ NET_IP_ALIGN;
+ memcpy(buf, data, sh_len);
+
+ /* Update copied ip header. */
+ ih = (struct iphdr *)(buf + ih_off);
+ ih->tot_len = htons(total_len - ih_off);
+ ih->id = htons(id);
+ ih->check = csum_long(isum_start + htons(total_len - ih_off) +
+ htons(id)) ^ 0xffff;
+
+ /* Update copied tcp header. */
+ th = (struct tcphdr *)(buf + th_off);
+ th->seq = htonl(seq);
+ th->check = csum_long(tsum_start + htons(total_len));
+ if (segment != sh->gso_segs - 1) {
+ th->fin = 0;
+ th->psh = 0;
+ }
+
+ /* Skip past the header. */
+ slot++;
+
+ /* Skip past the payload. */
+ while (p_used < p_len) {
+
+ /* Advance as needed. */
+ while (f_used >= f_size) {
+ f_id++;
+ f_size = sh->frags[f_id].size;
+ f_used = 0;
+ }
+
+ /* Use bytes from the current fragment. */
+ n = p_len - p_used;
+ if (n > f_size - f_used)
+ n = f_size - f_used;
+ f_used += n;
+ p_used += n;
+
+ slot++;
+ }
+
+ id++;
+ seq += p_len;
+ }
+
+ /* Flush the headers so they are ready for hardware DMA. */
+ wmb();
+}
+
+/* Pass all the data to mpipe for egress. */
+static void tso_egress(struct net_device *dev, gxio_mpipe_equeue_t *equeue,
+ struct sk_buff *skb, unsigned char *headers, s64 slot)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+ struct skb_shared_info *sh = skb_shinfo(skb);
+ unsigned int len = skb->len;
+ unsigned int p_len = sh->gso_size;
+ gxio_mpipe_edesc_t edesc_head = { { 0 } };
+ gxio_mpipe_edesc_t edesc_body = { { 0 } };
+ long f_id = -1; /* id of the current fragment */
+ long f_size = -1; /* size of the current fragment */
+ long f_used = -1; /* bytes used from the current fragment */
+ long n; /* size of the current piece of payload */
+ unsigned long tx_packets = 0, tx_bytes = 0;
+ unsigned int csum_start, sh_len;
+ int segment;
+
+ /* Prepare to egress the headers: set up header edesc. */
+ csum_start = skb_checksum_start_offset(skb);
+ sh_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+ edesc_head.csum = 1;
+ edesc_head.csum_start = csum_start;
+ edesc_head.csum_dest = csum_start + skb->csum_offset;
+ edesc_head.xfer_size = sh_len;
+
+ /* This is only used to specify the TLB. */
+ edesc_head.stack_idx = large_buffer_stack;
+ edesc_body.stack_idx = large_buffer_stack;
+
+ /* Egress all the edescs. */
+ for (segment = 0; segment < sh->gso_segs; segment++) {
+ void *va;
+ unsigned char *buf;
+ unsigned int p_used = 0;
+
+ /* The last segment may be less than gso_size. */
+ len -= p_len;
+ if (len < p_len)
+ p_len = len;
+
+ /* Egress the header. */
+ buf = headers + (slot % EQUEUE_ENTRIES) * HEADER_BYTES +
+ NET_IP_ALIGN;
+ edesc_head.va = va_to_tile_io_addr(buf);
+ gxio_mpipe_equeue_put_at(equeue, edesc_head, slot);
+ slot++;
+
+ /* Egress the payload. */
+ while (p_used < p_len) {
+
+ /* Advance as needed. */
+ while (f_used >= f_size) {
+ f_id++;
+ f_size = sh->frags[f_id].size;
+ f_used = 0;
+ }
+
+ va = tile_net_frag_buf(&sh->frags[f_id]) + f_used;
+
+ /* Use bytes from the current fragment. */
+ n = p_len - p_used;
+ if (n > f_size - f_used)
+ n = f_size - f_used;
+ f_used += n;
+ p_used += n;
+
+ /* Egress a piece of the payload. */
+ edesc_body.va = va_to_tile_io_addr(va);
+ edesc_body.xfer_size = n;
+ edesc_body.bound = !(p_used < p_len);
+ gxio_mpipe_equeue_put_at(equeue, edesc_body, slot);
+ slot++;
+ }
+
+ tx_packets++;
+ tx_bytes += sh_len + p_len;
+ }
+
+ /* Update stats. */
+ tile_net_stats_add(tx_packets, &priv->stats.tx_packets);
+ tile_net_stats_add(tx_bytes, &priv->stats.tx_bytes);
+}
+
+/* Do TSO handling for egress. */
+static int tile_net_tx_tso(struct sk_buff *skb, struct net_device *dev)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+ int channel = priv->echannel;
+ struct tile_net_egress *egress = &egress_for_echannel[channel];
+ struct tile_net_comps *comps = info->comps_for_echannel[channel];
+ gxio_mpipe_equeue_t *equeue = egress->equeue;
+ unsigned long irqflags;
+ int num_edescs;
+ s64 slot;
+
+ /* Determine how many mpipe edesc's are needed. */
+ num_edescs = tso_count_edescs(skb);
+
+ local_irq_save(irqflags);
+
+ /* Set first reserved egress slot. */
+ slot = tile_net_equeue_try_reserve(dev, comps, equeue, num_edescs);
+ if (slot < 0) {
+ local_irq_restore(irqflags);
+ return NETDEV_TX_BUSY;
+ }
+
+ /* Set up copies of header data properly. */
+ tso_headers_prepare(skb, egress->headers, slot);
+
+ /* Actually pass the data to the network hardware. */
+ tso_egress(dev, equeue, skb, egress->headers, slot);
+
+ /* Add a completion record. */
+ add_comp(equeue, comps, slot + num_edescs - 1, skb);
+
+ local_irq_restore(irqflags);
+
+ /* Make sure the egress timer is scheduled. */
+ tile_net_schedule_egress_timer(info);
+
+ return NETDEV_TX_OK;
+}
+
+/* Analyze the body and frags for a transmit request. */
+static unsigned int tile_net_tx_frags(struct frag *frags,
+ struct sk_buff *skb,
+ void *b_data, unsigned int b_len)
+{
+ unsigned int i, n = 0;
+
+ struct skb_shared_info *sh = skb_shinfo(skb);
+
+ if (b_len != 0) {
+ frags[n].buf = b_data;
+ frags[n++].length = b_len;
+ }
+
+ for (i = 0; i < sh->nr_frags; i++) {
+ skb_frag_t *f = &sh->frags[i];
+ frags[n].buf = tile_net_frag_buf(f);
+ frags[n++].length = skb_frag_size(f);
+ }
+
+ return n;
+}
+
+/* Help the kernel transmit a packet. */
+static int tile_net_tx(struct sk_buff *skb, struct net_device *dev)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+ struct tile_net_egress *egress = &egress_for_echannel[priv->echannel];
+ gxio_mpipe_equeue_t *equeue = egress->equeue;
+ struct tile_net_comps *comps =
+ info->comps_for_echannel[priv->echannel];
+ unsigned int len = skb->len;
+ unsigned char *data = skb->data;
+ unsigned int num_edescs;
+ struct frag frags[MAX_FRAGS];
+ gxio_mpipe_edesc_t edescs[MAX_FRAGS];
+ unsigned long irqflags;
+ gxio_mpipe_edesc_t edesc = { { 0 } };
+ unsigned int i;
+ s64 slot;
+
+ /* Save the timestamp. */
+ dev->trans_start = jiffies;
+
+ if (skb_is_gso(skb))
+ return tile_net_tx_tso(skb, dev);
+
+ num_edescs = tile_net_tx_frags(frags, skb, data, skb_headlen(skb));
+
+ /* This is only used to specify the TLB. */
+ edesc.stack_idx = large_buffer_stack;
+
+ /* Prepare the edescs. */
+ for (i = 0; i < num_edescs; i++) {
+ edesc.xfer_size = frags[i].length;
+ edesc.va = va_to_tile_io_addr(frags[i].buf);
+ edescs[i] = edesc;
+ }
+
+ /* Mark the final edesc. */
+ edescs[num_edescs - 1].bound = 1;
+
+ /* Add checksum info to the initial edesc, if needed. */
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ unsigned int csum_start = skb_checksum_start_offset(skb);
+ edescs[0].csum = 1;
+ edescs[0].csum_start = csum_start;
+ edescs[0].csum_dest = csum_start + skb->csum_offset;
+ }
+
+ local_irq_save(irqflags);
+
+ /* Set first reserved egress slot. */
+ slot = tile_net_equeue_try_reserve(dev, comps, equeue, num_edescs);
+ if (slot < 0) {
+ local_irq_restore(irqflags);
+ return NETDEV_TX_BUSY;
+ }
+
+ for (i = 0; i < num_edescs; i++)
+ gxio_mpipe_equeue_put_at(equeue, edescs[i], slot++);
+
+ /* Add a completion record. */
+ add_comp(equeue, comps, slot - 1, skb);
+
+ /* NOTE: Use ETH_ZLEN for short packets (e.g. 42 < 60). */
+ tile_net_stats_add(1, &priv->stats.tx_packets);
+ tile_net_stats_add(max_t(unsigned int, len, ETH_ZLEN),
+ &priv->stats.tx_bytes);
+
+ local_irq_restore(irqflags);
+
+ /* Make sure the egress timer is scheduled. */
+ tile_net_schedule_egress_timer(info);
+
+ return NETDEV_TX_OK;
+}
+
+/* Deal with a transmit timeout. */
+static void tile_net_tx_timeout(struct net_device *dev)
+{
+ netif_wake_queue(dev);
+}
+
+/* Ioctl commands. */
+static int tile_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+ return -EOPNOTSUPP;
+}
+
+/* Get system network statistics for device. */
+static struct net_device_stats *tile_net_get_stats(struct net_device *dev)
+{
+ struct tile_net_priv *priv = netdev_priv(dev);
+ return &priv->stats;
+}
+
+/* Change the MTU. */
+static int tile_net_change_mtu(struct net_device *dev, int new_mtu)
+{
+ if ((new_mtu < 68) || (new_mtu > 1500))
+ return -EINVAL;
+ dev->mtu = new_mtu;
+ return 0;
+}
+
+/* Change the Ethernet address of the NIC.
+ *
+ * The hypervisor driver does not support changing MAC address. However,
+ * the hardware does not do anything with the MAC address, so the address
+ * which gets used on outgoing packets, and which is accepted on incoming
+ * packets, is completely up to us.
+ *
+ * Returns 0 on success, negative on failure.
+ */
+static int tile_net_set_mac_address(struct net_device *dev, void *p)
+{
+ struct sockaddr *addr = p;
+
+ if (!is_valid_ether_addr(addr->sa_data))
+ return -EINVAL;
+ memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+ return 0;
+}
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/* Polling 'interrupt' - used by things like netconsole to send skbs
+ * without having to re-enable interrupts. It's not called while
+ * the interrupt routine is executing.
+ */
+static void tile_net_netpoll(struct net_device *dev)
+{
+ disable_percpu_irq(ingress_irq);
+ tile_net_handle_ingress_irq(ingress_irq, NULL);
+ enable_percpu_irq(ingress_irq, 0);
+}
+#endif
+
+static const struct net_device_ops tile_net_ops = {
+ .ndo_open = tile_net_open,
+ .ndo_stop = tile_net_stop,
+ .ndo_start_xmit = tile_net_tx,
+ .ndo_do_ioctl = tile_net_ioctl,
+ .ndo_get_stats = tile_net_get_stats,
+ .ndo_change_mtu = tile_net_change_mtu,
+ .ndo_tx_timeout = tile_net_tx_timeout,
+ .ndo_set_mac_address = tile_net_set_mac_address,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ .ndo_poll_controller = tile_net_netpoll,
+#endif
+};
+
+/* The setup function.
+ *
+ * This uses ether_setup() to assign various fields in dev, including
+ * setting IFF_BROADCAST and IFF_MULTICAST, then sets some extra fields.
+ */
+static void tile_net_setup(struct net_device *dev)
+{
+ ether_setup(dev);
+ dev->netdev_ops = &tile_net_ops;
+ dev->watchdog_timeo = TILE_NET_TIMEOUT;
+ dev->features |= NETIF_F_LLTX;
+ dev->features |= NETIF_F_HW_CSUM;
+ dev->features |= NETIF_F_SG;
+ dev->features |= NETIF_F_TSO;
+ dev->tx_queue_len = 0;
+ dev->mtu = 1500;
+}
+
+/* Allocate the device structure, register the device, and obtain the
+ * MAC address from the hypervisor.
+ */
+static void tile_net_dev_init(const char *name, const uint8_t *mac)
+{
+ int ret;
+ int i;
+ int nz_addr = 0;
+ struct net_device *dev;
+ struct tile_net_priv *priv;
+
+ /* HACK: Ignore "loop" links. */
+ if (strncmp(name, "loop", 4) == 0)
+ return;
+
+ /* Allocate the device structure. Normally, "name" is a
+ * template, instantiated by register_netdev(), but not for us.
+ */
+ dev = alloc_netdev(sizeof(*priv), name, tile_net_setup);
+ if (!dev) {
+ pr_err("alloc_netdev(%s) failed\n", name);
+ return;
+ }
+
+ /* Initialize "priv". */
+ priv = netdev_priv(dev);
+ memset(priv, 0, sizeof(*priv));
+ priv->dev = dev;
+ priv->channel = -1;
+ priv->loopify_channel = -1;
+ priv->echannel = -1;
+
+ /* Get the MAC address and set it in the device struct; this must
+ * be done before the device is opened. If the MAC is all zeroes,
+ * we use a random address, since we're probably on the simulator.
+ */
+ for (i = 0; i < 6; i++)
+ nz_addr |= mac[i];
+
+ if (nz_addr) {
+ memcpy(dev->dev_addr, mac, 6);
+ dev->addr_len = 6;
+ } else {
+ random_ether_addr(dev->dev_addr);
+ }
+
+ /* Initialize the transmit wake timer. */
+ hrtimer_init(&priv->tx_wake_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ priv->tx_wake_timer.function = tile_net_handle_tx_wake_timer;
+
+ /* Register the network device. */
+ ret = register_netdev(dev);
+ if (ret) {
+ netdev_err(dev, "register_netdev failed %d\n", ret);
+ free_netdev(dev);
+ return;
+ }
+}
+
+/* Per-cpu module initialization. */
+static void tile_net_init_module_percpu(void *unused)
+{
+ struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+ int my_cpu = smp_processor_id();
+
+ info->has_iqueue = false;
+
+ info->my_cpu = my_cpu;
+
+ /* Initialize the egress timer. */
+ hrtimer_init(&info->egress_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ info->egress_timer.function = tile_net_handle_egress_timer;
+}
+
+/* Module initialization. */
+static int __init tile_net_init_module(void)
+{
+ int i;
+ char name[GXIO_MPIPE_LINK_NAME_LEN];
+ uint8_t mac[6];
+
+ pr_info("Tilera Network Driver\n");
+
+ mutex_init(&tile_net_devs_for_channel_mutex);
+
+ /* Initialize each CPU. */
+ on_each_cpu(tile_net_init_module_percpu, NULL, 1);
+
+ /* Find out what devices we have, and initialize them. */
+ for (i = 0; gxio_mpipe_link_enumerate_mac(i, name, mac) >= 0; i++)
+ tile_net_dev_init(name, mac);
+
+ if (!network_cpus_init())
+ network_cpus_map = *cpu_online_mask;
+
+ return 0;
+}
+
+module_init(tile_net_init_module);
--
1.6.5.2
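
The fragment walk shared by tso_count_edescs(), tso_headers_prepare() and
tso_egress() in the patch above can be modeled in plain userspace C. What
follows is only a minimal sketch, not the driver's code: count_edescs() and
its parameters are hypothetical names, and it assumes that the whole payload
lives in the fragments and that payload_len excludes the headers.

```c
#include <stddef.h>

/* Hypothetical userspace model of the fragment walk: one descriptor for
 * each segment header, plus one for each contiguous piece of payload.
 * frag_sizes[] stands in for sh->frags[].size; payload_len is the total
 * payload (headers excluded) and should equal the sum of frag_sizes[].
 * Returns the descriptor count, or -1 if the fragments run out early.
 */
static int count_edescs(const size_t *frag_sizes, size_t nfrags,
			size_t payload_len, size_t gso_size)
{
	size_t f_id = 0, f_used = 0;
	size_t remaining = payload_len;
	int num_edescs = 0;

	if (gso_size == 0)
		return 0;

	while (remaining > 0) {
		/* The last segment may be shorter than gso_size. */
		size_t p_len = remaining < gso_size ? remaining : gso_size;
		size_t p_used = 0;

		num_edescs++;	/* header descriptor */

		while (p_used < p_len) {
			size_t n;

			/* Advance past exhausted (or empty) fragments. */
			while (f_id < nfrags && f_used >= frag_sizes[f_id]) {
				f_id++;
				f_used = 0;
			}
			if (f_id >= nfrags)
				return -1;	/* payload exceeds fragments */

			/* Use bytes from the current fragment. */
			n = p_len - p_used;
			if (n > frag_sizes[f_id] - f_used)
				n = frag_sizes[f_id] - f_used;
			f_used += n;
			p_used += n;
			num_edescs++;	/* payload-piece descriptor */
		}
		remaining -= p_len;
	}
	return num_edescs;
}
```

With three 1000-byte fragments and a gso_size of 1448, the second segment
straddles two fragments, so the walk needs one more payload descriptor than
there are fragments. Unlike the driver loops, the sketch bounds-checks f_id
rather than trusting the lengths to be consistent.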
^ permalink raw reply related
* Investment
From: Alexander Eric @ 2012-05-25 14:28 UTC (permalink / raw)
In-Reply-To: <909505805.23519491337955964409.JavaMail.root@zim-store04.web.westnet.com.au>
[-- Attachment #1: Type: text/plain, Size: 51 bytes --]
Attached are details of Investment
cooperation
[-- Attachment #2: Investment!.pdf --]
[-- Type: application/pdf, Size: 25258 bytes --]
^ permalink raw reply
* skb_release_data oops
From: kendo @ 2012-05-25 14:19 UTC (permalink / raw)
To: netdev
I am using Linux kernel 2.6.38.8 and hit an oops while an skb was being freed. What could cause this failure? Any suggestions would be appreciated, thanks!
Best regards.
---------------------------------------------------------------
May 25 19:30:54 AnShion <9> klogd: [164619.378640] BUG: unable to handle kernel paging request at 000095a3
May 25 19:30:54 AnShion <9> klogd: [164619.454609] IP: [<c01c2353>] put_page+0x3/0x40
May 25 19:30:54 AnShion <12> klogd: [164619.508726] *pde = 00000000
May 25 19:30:54 AnShion <8> klogd: [164619.544185] Oops: 0000 [#1] SMP
May 25 19:30:54 AnShion <8> klogd: [164619.583891] last sysfs file: /sys/devices/virtual/net/tunl_FJ/uevent
May 25 19:30:54 AnShion <12> klogd: [164619.660716] Modules linked in: dpi_engine ipmi_watchdog nf_connmark ip_set_hash_netiface ip_set_hash_net ip_set_hash_ip xt_set ip_set xt_hashrate xt_dpi xt_pcc xt_nth xt_random xt_nflog xt_replace igb e1000e [last unloaded: dpi_engine]
May 25 19:30:54 AnShion <12> klogd: [164619.912644]
May 25 19:30:54 AnShion <12> klogd: [164619.931412] Pid: 0, comm: kworker/0:1 Not tainted 2.6.38.8 #347 To be filled by O.E.M. To be filled by O.E.M./P8B-X series
May 25 19:30:54 AnShion <12> klogd: [164620.064736] EIP: 0060:[<c01c2353>] EFLAGS: 00010202 CPU: 5
May 25 19:30:54 AnShion <12> klogd: [164620.131193] EIP is at put_page+0x3/0x40
May 25 19:30:54 AnShion <12> klogd: [164620.177950] EAX: 000095a3 EBX: 00000001 ECX: 00000000 EDX: 000095a3
May 25 19:30:54 AnShion <12> klogd: [164620.253737] ESI: dbda30c0 EDI: dbda30c0 EBP: f3cffd4c ESP: f3cffd3c
May 25 19:30:54 AnShion <12> klogd: [164620.329522] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May 25 19:30:54 AnShion <8> klogd: [164620.394942] Process kworker/0:1 (pid: 0, ti=f3cfe000 task=f3cd4b00 task.ti=f3ce6000)
May 25 19:30:54 AnShion <8> klogd: [164620.488351] Stack:
May 25 19:30:54 AnShion <12> klogd: [164620.513337] f3cffd4c c069dbe4 dbda30c0 00000001 f3cffd58 c069d7a2 00000000 f3cffd70
May 25 19:30:54 AnShion <12> klogd: [164620.607577] c069d8ea c06dba0f 00000000 00000001 dbda30c0 f3cffda4 c06dba0f f30e0000
May 25 19:30:54 AnShion <12> klogd: [164620.701819] 00000000 f3cffd94 c0700ba0 80000000 00000002 c0b407a8 f34d3240 dbda30c0
May 25 19:30:54 AnShion <8> klogd: [164620.796059] Call Trace:
May 25 19:30:54 AnShion <12> klogd: [164620.826232] [<c069dbe4>] ? skb_release_data+0x84/0xa0
May 25 19:30:54 AnShion <12> klogd: [164620.888539] [<c069d7a2>] __kfree_skb+0x12/0x90
May 25 19:30:54 AnShion <12> klogd: [164620.943589] [<c069d8ea>] kfree_skb+0x5a/0x70
May 25 19:30:54 AnShion <12> klogd: [164620.996568] [<c06dba0f>] ? nf_hook_slow+0xcf/0xf0
May 25 19:30:54 AnShion <12> klogd: [164621.054730] [<c06dba0f>] nf_hook_slow+0xcf/0xf0
May 25 19:30:54 AnShion <12> klogd: [164621.110820] [<c0700ba0>] ? ip_local_deliver_finish+0x0/0x260
May 25 19:30:54 AnShion <12> klogd: [164621.180385] [<c0700e52>] ip_local_deliver+0x52/0xa0
May 25 19:30:54 AnShion <12> klogd: [164621.240620] [<c0700ba0>] ? ip_local_deliver_finish+0x0/0x260
May 25 19:30:54 AnShion <12> klogd: [164621.310186] [<c0700701>] ip_rcv_finish+0x241/0x3c0
May 25 19:30:54 AnShion <12> klogd: [164621.369383] [<c0700b26>] ip_rcv+0x2a6/0x320
May 25 19:30:54 AnShion <12> klogd: [164621.421324] [<c07004c0>] ? ip_rcv_finish+0x0/0x3c0
May 25 19:30:54 AnShion <12> klogd: [164621.480523] [<c06a9468>] __netif_receive_skb+0x258/0x520
May 25 19:30:54 AnShion <12> klogd: [164621.545942] [<c01e8cc0>] ? add_partial+0x40/0x70
May 25 19:30:54 AnShion <12> klogd: [164621.603068] [<c06a9863>] netif_receive_skb+0x23/0x50
May 25 19:30:54 AnShion <12> klogd: [164621.664340] [<c06a9987>] napi_skb_finish+0x37/0x50
May 25 19:30:54 AnShion <12> klogd: [164621.723537] [<c06a9fcb>] napi_gro_receive+0xdb/0xf0
May 25 19:30:54 AnShion <12> klogd: [164621.783774] [<c0192a4f>] ? irq_to_desc+0xf/0x20
May 25 19:30:54 AnShion <12> klogd: [164621.839860] [<c0105606>] ? handle_irq+0x16/0x90
May 25 19:30:54 AnShion <12> klogd: [164621.895954] [<f81d679c>] igb_poll+0x5fc/0xef0 [igb]
May 25 19:30:54 AnShion <12> klogd: [164621.956184] [<c0104bc5>] ? do_IRQ+0x45/0xb0
May 25 19:30:54 AnShion <12> klogd: [164622.008128] [<c06a9dfa>] net_rx_action+0xaa/0x1a0
May 25 19:30:54 AnShion <12> klogd: [164622.066288] [<c014bfa1>] __do_softirq+0xb1/0x190
May 25 19:30:54 AnShion <12> klogd: [164622.123411] [<c014bef0>] ? __do_softirq+0x0/0x190
May 25 19:30:54 AnShion <8> klogd: [164622.181572] <IRQ>
May 25 19:30:54 AnShion <12> klogd: [164622.207700] [<c014be6d>] ? irq_exit+0x5d/0x80
May 25 19:30:54 AnShion <12> klogd: [164622.261716] [<c011c9a6>] ? smp_apic_timer_interrupt+0x56/0x90
May 25 19:30:54 AnShion <12> klogd: [164622.332317] [<c0819c61>] ? apic_timer_interrupt+0x31/0x38
May 25 19:30:54 AnShion <12> klogd: [164622.398773] [<c0124b05>] ? native_safe_halt+0x5/0x10
May 25 19:30:54 AnShion <12> klogd: [164622.460046] [<c040e345>] ? acpi_idle_do_entry+0x33/0x54
May 25 19:30:54 AnShion <12> klogd: [164622.524426] [<c040e3bd>] ? acpi_idle_enter_c1+0x57/0x95
May 25 19:30:54 AnShion <12> klogd: [164622.588809] [<c0674089>] ? cpuidle_idle_call+0xd9/0x1c0
May 25 19:30:54 AnShion <12> klogd: [164622.653190] [<c010214a>] ? cpu_idle+0x8a/0xc0
May 25 19:30:54 AnShion <12> klogd: [164622.707206] [<c0813699>] ? start_secondary+0x1a1/0x1e8
May 25 19:30:54 AnShion <8> klogd: [164622.770550] Code: 04 f0 ff 0e 0f 94 c0 84 c0 74 d4 89 f8 e8 96 fe ff ff eb cb 0f ae e8 89 f6 8b 03 eb de 8d 74 26 00 8d bc 27 00 00 00 00 55 89 c2 <66> f7 00 00 c0 89 e5 75 1d 8b 40 04 f0 ff 4a 04 0f 94 c0 84 c0
^ permalink raw reply
* [PATCH] ieee802154: pass source address in dgram_recvmsg
From: Stephen Röttger @ 2012-05-25 12:14 UTC (permalink / raw)
To: dbaryshkov, slapin
Cc: davem, linux-zigbee-devel, netdev, linux-kernel,
Stephen Röttger
This patch lets dgram_recvmsg fill in the sockaddr struct in
msg->msg_name with the source address of the packet.
The userland functions recvmsg() and recvfrom() use this to obtain the
sender's address.
The patch is based on the devel branch of
git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee/kernel
Signed-off-by: Stephen Röttger <stephen.roettger@zero-entropy.de>
---
net/ieee802154/dgram.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index 7883fa6..d0a6ebc 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -290,6 +290,9 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
size_t copied = 0;
int err = -EOPNOTSUPP;
struct sk_buff *skb;
+ struct sockaddr_ieee802154 *saddr;
+
+ saddr = (struct sockaddr_ieee802154 *)msg->msg_name;
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (!skb)
@@ -308,6 +311,13 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
sock_recv_ts_and_drops(msg, sk, skb);
+ if (saddr) {
+ saddr->family = AF_IEEE802154;
+ saddr->addr = mac_cb(skb)->sa;
+ }
+ if (addr_len)
+ *addr_len = sizeof(*saddr);
+
if (flags & MSG_TRUNC)
copied = skb->len;
done:
--
1.7.8
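
The effect of the second hunk can be modeled outside the kernel. The sketch
below is illustrative only: struct mock_addr, struct mock_sockaddr,
MOCK_AF_IEEE802154 and fill_source() are hypothetical stand-ins (their
layouts and values are assumptions, not the real struct sockaddr_ieee802154
or mac_cb() data). It shows that the address is written only when the caller
supplied a buffer, while the length is reported whenever a length pointer is
given.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; layouts are illustrative. */
struct mock_addr {
	unsigned short pan_id;
	unsigned short short_addr;
};

struct mock_sockaddr {
	unsigned short family;
	struct mock_addr addr;
};

/* Placeholder family value, chosen for illustration only. */
#define MOCK_AF_IEEE802154 36

/* Mirrors the logic the patch adds to dgram_recvmsg(): copy the source
 * address into the caller's buffer if there is one, and report the
 * address length if a length pointer was passed.
 */
static void fill_source(struct mock_sockaddr *saddr, int *addr_len,
			const struct mock_addr *src)
{
	if (saddr) {
		saddr->family = MOCK_AF_IEEE802154;
		saddr->addr = *src;
	}
	if (addr_len)
		*addr_len = sizeof(*saddr);	/* sizeof does not dereference */
}
```

A caller that passes no address buffer still gets a length back, which
matches the two independent checks in the patch.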
^ permalink raw reply related