* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 21:24 UTC
To: David Ahern
Cc: Alexei Starovoitov, Martin KaFai Lau, netdev@vger.kernel.org,
David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <5491EE01.5020406@gmail.com>
On Wed, Dec 17, 2014 at 01:56:33PM -0700, David Ahern wrote:
> On 12/17/14 1:42 PM, Alexei Starovoitov wrote:
> >> It is not strictly necessary to carry vmlinux, that is just a probe
> >> point resolution time problem, solvable when generating a shell script,
> >> on the development machine, to insert the probes.
> > on N development machines with kernels that
> > would match worker machines...
> > I'm not saying it's impossible, just operationally difficult.
> > This is my understanding of Martin's use case.
> That's the use case I am talking about ... N-different kernel versions and
> the probe definitions would need to be generated at *build* time of the
> kernel that uses a cross-compile environment; i.e., we can't assume there is a
> development machine running the kernel from which you can generate the probe
> definitions. This gets messy quick for embedded deployments.
It shouldn't. You're saying that production kernels are pushed out at
such a high rate that we get lost and can't find the matching original
binaries with full debug info.
We have build-ids for that: content-based keys that let us match what
runs in production, which has to be as lean as possible, while still
being able to get back to all that fat.
Is it that people want so badly to forget about that extra debugging fat,
when in the end we need to keep it to be able to figure out what happens
when things go wrong?
I understand the expectation that each production build will have an
unwieldy number of different probe point definitions to keep, but
is that really so?
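To make the build-id matching idea concrete, here is a minimal userspace sketch (not perf's code) that pulls the GNU build-id out of a raw ELF notes blob. It assumes the blob is in host byte order, as /sys/kernel/notes is for the running kernel; find_gnu_build_id() and struct note_hdr are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Note type used by the GNU toolchain for build-id notes. */
#define NT_GNU_BUILD_ID 3

/* ELF note header: the same three 32-bit words for 32- and 64-bit ELF. */
struct note_hdr {
	uint32_t namesz;
	uint32_t descsz;
	uint32_t type;
};

/* Note name and descriptor are each padded to a 4-byte boundary. */
static size_t align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: scan a notes blob for the "GNU" build-id note and
 * write it as lowercase hex into hex[]. Returns the build-id length in
 * bytes, or 0 if it is absent, malformed, or does not fit in hex[].
 */
static size_t find_gnu_build_id(const unsigned char *blob, size_t len,
				char *hex, size_t hex_size)
{
	size_t off = 0;

	assert(hex != NULL && hex_size > 0);
	hex[0] = '\0';

	while (len - off >= sizeof(struct note_hdr)) {
		struct note_hdr h;
		size_t name_off, desc_off, next;

		memcpy(&h, blob + off, sizeof(h));	/* blob may be unaligned */
		name_off = off + sizeof(h);
		desc_off = name_off + align4(h.namesz);
		next = desc_off + align4(h.descsz);
		if (next > len || next <= off)
			return 0;			/* truncated or malformed */

		if (h.type == NT_GNU_BUILD_ID && h.namesz == 4 &&
		    memcmp(blob + name_off, "GNU", 4) == 0) {
			if (2 * (size_t)h.descsz + 1 > hex_size)
				return 0;
			for (uint32_t i = 0; i < h.descsz; i++)
				sprintf(hex + 2 * i, "%02x", blob[desc_off + i]);
			return h.descsz;
		}
		off = next;
	}
	return 0;
}
```

The resulting hex string could then be compared with the build-id recorded for the candidate debug vmlinux (readelf -n prints it), which is all the "content key" lookup needs.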
- Arnaldo
* Re: Bug: mv643xxx fails with highmem
From: Ezequiel Garcia @ 2014-12-17 21:18 UTC
To: Russell King - ARM Linux, David Miller, Nimrod Andy,
Fabio Estevam
Cc: netdev, fugang.duan
In-Reply-To: <20141211202507.GS11285@n2100.arm.linux.org.uk>
Russell, David:
On 12/11/2014 05:25 PM, Russell King - ARM Linux wrote:
> On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
>> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Date: Thu, 11 Dec 2014 19:49:20 +0000
>>
>>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>>> all fragments with dma_map_single(). This fails when the driver is
>>> used in an environment with highmem.
>>
>> This change looks really buggy to me.
>>
>> Unfortunately, all the changes he subsequently makes for software TSO
>> support depend upon this :-/
>>
>> The change is definitely wrong.
I've been trying to find a fix for this issue, and also trying to
reproduce the bug.
As for the fix, we need to fix the non-TSO and TSO paths independently.
The former is fairly straightforward, but the latter might be a bit more
involved.
The problem is that the tso_t struct holds a pointer to the skb linear
and non-linear data.
struct tso_t {
	int next_frag_idx;
	void *data;
	size_t size;
	u16 ip_id;
	u32 tcp_seq;
};
Instead, we should deal with pages, and only map the non-linear skb with
skb_frag_dma_map().
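As a rough userspace model of "dealing with pages" (an assumption about the shape of a fix, not actual driver code), the cursor below replaces tso_t's void *data with a fragment index plus an offset, so each piece of payload is described as (page, offset, len) and could be mapped on its own. struct frag_model, struct tso_cursor_model and tso_next_chunk() are hypothetical names; page_id merely stands in for a struct page.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one skb page fragment; in the kernel this is skb_frag_t. */
struct frag_model {
	int page_id;		/* label standing in for a struct page */
	size_t page_offset;	/* where the data starts in that page */
	size_t size;		/* bytes of payload in this fragment */
};

/* Hypothetical page-based cursor: no kernel virtual address of the
 * (possibly highmem) payload is ever stored, only index and offset. */
struct tso_cursor_model {
	const struct frag_model *frags;
	int nr_frags;
	int frag_idx;		/* fragment currently being consumed */
	size_t frag_off;	/* bytes already consumed in that fragment */
};

/* Return the next piece as (page, offset, len): at most max_len bytes
 * (e.g. what is still missing from the current segment) and never
 * crossing a fragment boundary. Returns 0 once everything is consumed. */
static size_t tso_next_chunk(struct tso_cursor_model *c, size_t max_len,
			     int *page_id, size_t *offset)
{
	const struct frag_model *f;
	size_t left, len;

	assert(max_len > 0);
	/* Skip fragments that are fully consumed (or empty). */
	while (c->frag_idx < c->nr_frags &&
	       c->frag_off == c->frags[c->frag_idx].size) {
		c->frag_idx++;
		c->frag_off = 0;
	}
	if (c->frag_idx >= c->nr_frags)
		return 0;

	f = &c->frags[c->frag_idx];
	left = f->size - c->frag_off;
	len = left < max_len ? left : max_len;
	*page_id = f->page_id;
	*offset = f->page_offset + c->frag_off;
	c->frag_off += len;
	return len;
}
```

In a driver, each (page, offset, len) triple would be what gets handed to skb_frag_dma_map(), while the linear area, which is never highmem, could keep using dma_map_single().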
On the other hand, I haven't been able to reproduce this on my boards. I
did try a hack to hold most lowmem pages, but it didn't make a
difference. (In fact, I haven't been able to clearly see how the pages
for the skbuff are allocated from high memory.)
Russell, would you share any hints about your setup? I don't have access
to any Dove boards at the moment, but I do have Kirkwoods, Armadas and
i.MX6.
Thanks a lot for your report and help!
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 21:19 UTC
To: Alexei Starovoitov
Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQ+zkNL9sJhJuAiQ_y4bis=Sck5pzG86qccXE9vvM0-drQ@mail.gmail.com>
On Wed, Dec 17, 2014 at 12:42:34PM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > On Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov wrote:
> >> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
> >> <arnaldo.melo@gmail.com> wrote:
> >> > I guess even just using 'perf probe' to set those wannabe tracepoints
> >> > should be enough, no? Then he can refer to those in his perf record
> >> > call, etc and process it just like with the real tracepoints.
> >
> >> it's far from ideal for two reasons.
> >> - they have different kernels and dragging along vmlinux
> >> with debug info or multiple 'perf list' data is too cumbersome
> >
> > It is not strictly necessary to carry vmlinux, that is just a probe
> > point resolution time problem, solvable when generating a shell script,
> > on the development machine, to insert the probes.
>
> on N development machines with kernels that
> would match worker machines...
> I'm not saying it's impossible, just operationally difficult.
> This is my understanding of Martin's use case.
The point here is that it's difficult to cater to the needs of all
involved; researchers and maintainers don't like to be bound by
contracts to keep metrics and crossroads that made sense at some point.
It will be difficult, in some cases, for some people, to get
all they want. What I tried to stress is that there are alternatives to
committing to tons of tracepoints (or just a few), in the form of dynamic
ones, that with some infrastructure could be put to use before
something better comes along.
> >> operationally. Permanent tracepoints solve this problem.
> >
> > Sure, and when available, use them, my suggestion wasn't to use
> > exclusively any mechanism, but to initially use what is available to
> > create the tools, then find places that could be improved (if that
> > proves to be the case) by using a higher performance mechanism.
> agree. I think if kprobe approach was usable, it would have
Who said it was not?
> been used already and yet here you have these patches
> that add tracepoints in few strategic places of tcp stack.
Well, until these points have been argued to death as being
strategic enough to deserve a tracepoint, kprobes are the way to go, or, in
other words, the _only_ way to go if you don't want to run a patched
kernel.
> >> - the action upon hitting tracepoint is non-trivial.
> >> perf probe style of unconditionally walking pointer chains
> >> will be tripping over wrong pointers.
> >
> > Huh? Care to elaborate on this one?
>
> if perf probe does 'result->name' as in your example
> then it would work, but patch 5 does conditional
> walking of pointers, so you cannot just add
> a perf probe that does print(ptr1->value1, ptr2->value2)
> It won't crash, but will be collecting wrong stats.
> (likely counting zeros)
Right, for that we need to activate eBPF code when we hit such probes,
but then, it continues being something dynamic, not something that is
forever there, in the source code.
> >> Plus they already need to do aggregation for high
> >> frequency events.
> >
> >> As part of acting on trace_transmit_skb() event:
> >> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
> >> 	tcp_trace_stats_add(...)
> >> }
> >> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
> >> 	tcp_trace_stats_add(...)
> >> }
> >
> > But aren't these stats TCP already keeps or could be made to?
>
> that's the whole discussion about.
> tcp_info has some of them.
> Though it's difficult to claim that, say, tcp_info->tcpi_lost is
> the same as loss_segs_retrans from patch 5.
For such flexibility I think we need to go the eBPF way, i.e. strive to
reduce as much as possible the cost of inserting a stat collection point.
- Arnaldo
* Re: [PATCH net 2/2] geneve: Fix races between socket add and release.
From: Thomas Graf @ 2014-12-17 21:15 UTC
To: Jesse Gross; +Cc: David Miller, netdev, Andy Zhou, Stephen Hemminger
In-Reply-To: <CAEP_g=_-hgeosH83FdPZLb9mvi5DSW6mbe+Xe5x0YoR7mKaTPA@mail.gmail.com>
On 12/17/14 at 10:48am, Jesse Gross wrote:
> I generally agree (with the exception of kfree_rcu() - I believe that
> is still needed since incoming packets reference it using RCU).
I didn't inspect this in full detail, but it seems like the data path
should only care about gs->sock, which is properly refcnt'ed.
> However, since this patch is targeted at net, I wanted to make a
> minimal change and not completely redo the locking. A lot of the
> locking here was pulled over from VXLAN and I think it can be
> simplified since I don't expect that the Geneve code will bring in all
> of that logic.
Makes sense. Feel free to take:
Acked-by: Thomas Graf <tgraf@suug.ch>
> for destroying the socket. This was added by Stephen in "vxlan: listen
> on multiple ports" but it's not obvious to me what problem it is
> trying to avoid and I don't see a comment. If possible, it would be
> nice to simplify this as well if the issue doesn't apply to Geneve.
I don't have an explanation for that either. Each entry on the
vni_list[] takes a vs->refcnt.
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: David Ahern @ 2014-12-17 20:56 UTC
To: Alexei Starovoitov, Arnaldo Carvalho de Melo
Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQ+zkNL9sJhJuAiQ_y4bis=Sck5pzG86qccXE9vvM0-drQ@mail.gmail.com>
On 12/17/14 1:42 PM, Alexei Starovoitov wrote:
>> It is not strictly necessary to carry vmlinux, that is just a probe
>> point resolution time problem, solvable when generating a shell script,
>> on the development machine, to insert the probes.
> on N development machines with kernels that
> would match worker machines...
> I'm not saying it's impossible, just operationally difficult.
> This is my understanding of Martin's use case.
>
That's the use case I am talking about ... N-different kernel versions
and the probe definitions would need to be generated at *build* time of
the kernel that uses a cross-compile environment; i.e., we can't assume
there is a development machine running the kernel from which you can
generate the probe definitions. This gets messy quick for embedded
deployments.
David
* pull request: bluetooth 2014-12-17
From: Johan Hedberg @ 2014-12-17 20:46 UTC
To: davem; +Cc: netdev, linux-wireless, linux-bluetooth, linux-kernel
Hi Dave,
Here's the first direct (i.e. skipping the wireless tree) bluetooth pull
request for you, intended for 3.19. It's just one patch: a fix from
Marcel for remote service discovery filtering which also fixes a
'used uninitialized' compiler warning.
Please let me know if there are any issues pulling. Thanks.
Johan
---
The following changes since commit 65891feac27e26115dc4cce881743a1ac33372df:
net: Disallow providing non zero VLAN ID for NIC drivers FDB add flow (2014-12-16 15:41:19 -0500)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream
for you to fetch changes up to ea8ae2516ac43028a01c40b58ffa80d3b0afb802:
Bluetooth: Fix bug with filter in service discovery optimization (2014-12-17 22:03:49 +0200)
----------------------------------------------------------------
Marcel Holtmann (1):
Bluetooth: Fix bug with filter in service discovery optimization
net/bluetooth/mgmt.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: rapier @ 2014-12-17 20:45 UTC
To: Yuchung Cheng, Blake Matheny
Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Martin Lau,
netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <CAK6E8=fsKNcVBBewDw89cLLFuQb1a0parZ7hmvuRfherqjXTng@mail.gmail.com>
On 12/15/14 2:56 PM, Yuchung Cheng wrote:
> On Mon, Dec 15, 2014 at 8:08 AM, Blake Matheny <bmatheny@fb.com> wrote:
>>
>> We have an additional set of patches for web10g that builds on these
>> tracepoints. It can be made to work either way, but I agree the idea of
>> something like a sockopt would be really nice.
>
> I'd like to compare these patches with tools that parse pcap files to
> generate per-flow counters to collect RTTs, #dupacks, etc. What
> additional values or insights do they provide to improve/debug TCP
> performance? maybe an example?
So this is our use scenario:
If the stack were instrumented on a per-flow basis, we could gather metrics
proactively. This data could likely be processed on a near-real-time basis
to get at least a general idea of the health of the flow (dupacks,
congestion events, spurious RTOs, etc.). It's possible we could use this
data to provisionally flag flows during the lifespan of the transfer. If we
store the collected metrics, NOC engineers can access them to make a
final determination about performance. They may then start the
resolution process immediately using data collected in situ. With the
web10g data we collect not only stack data but also
information about the path and the interaction between the application
and the stack.
This scenario is particularly appealing in the realm of big data
science. We're currently working with datasets that are hundreds of TBs
in size and will soon be dealing with multiple PBs as a matter of
course. In many cases we're aware of the path characteristics in advance
via SDN, so we can apply the macroscopic model and see when we're
dropping below thresholds for that path. Since most of our transfers
are between loosely federated sets of distantly located transfer
nodes, we don't generally have access to the far end of the connection,
which might be the right place to collect the pcap data.
> IMO these stats provide a general picture of how TCP works on a
> specific network, but not enough to really nail specific bugs in the TCP
> protocol or implementation. Then SNMP stats or sampling with pcap
> traces with offline analysis can achieve the same purpose.
I'd agree with that, but in the scenario we are most interested in,
protocol/implementation issues are secondary concerns. They are
important, but we've mostly been focused on what we can do to make the
scientific workflow easier when dealing with the transfer of large data
sets.
* Re: [PATCH 03/10] ovs: Enable handling of UFO6 packets.
From: Vlad Yasevich @ 2014-12-17 20:44 UTC
To: Sergei Shtylyov, Vladislav Yasevich, netdev
Cc: mst, ben, stefanha, virtualization
In-Reply-To: <5491E4C2.5080402@cogentembedded.com>
On 12/17/2014 03:17 PM, Sergei Shtylyov wrote:
> Hello.
>
> On 12/17/2014 09:20 PM, Vladislav Yasevich wrote:
>
>> Since UFO6 packets can now be identified by SKB_GSO_UDP6, add proper checks
>> to handle UFO6 flows.
>> Legacy applications may still have UFO6 packets identified by SKB_GSO_UDP,
>> so we need to continue to handle them correctly.
>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>> net/openvswitch/datapath.c | 3 ++-
>> net/openvswitch/flow.c | 2 +-
>> 2 files changed, 3 insertions(+), 2 deletions(-)
>
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index f9e556b..b43fc60 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -334,7 +334,8 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
>> if (err)
>> break;
>>
>> - if (skb == segs && gso_type & SKB_GSO_UDP) {
>> + if (skb == segs &&
>> + ((gso_type & SKB_GSO_UDP) || (gso_type & SKB_GSO_UDP6))) {
>
> 'gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6)' would be shorter...
Thanks, will do.
-vlad
>
>> /* The initial flow key extracted by ovs_flow_extract()
>> * in this case is for a first fragment, so we need to
>> * properly mark later fragments.
>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> index 2b78789..d03adf4 100644
>> --- a/net/openvswitch/flow.c
>> +++ b/net/openvswitch/flow.c
>> @@ -602,7 +602,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>>
>> if (key->ip.frag == OVS_FRAG_TYPE_LATER)
>> return 0;
>> - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
>> + if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6))
>
> .... like here.
>
> [...]
>
> WBR, Sergei
>
* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-17 20:43 UTC
To: Ben Hutchings, Vladislav Yasevich; +Cc: netdev, mst, stefanha, virtualization
In-Reply-To: <1418847039.30883.29.camel@decadent.org.uk>
On 12/17/2014 03:10 PM, Ben Hutchings wrote:
> On Wed, 2014-12-17 at 13:20 -0500, Vladislav Yasevich wrote:
>> Split IPv6 support for UFO into its own feature similar to TSO.
>> This will later allow us to re-enable UFO support for virtio-net
>> devices.
> [...]
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 6c8b6f6..8538b67 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -372,6 +372,7 @@ enum {
>>
>> SKB_GSO_MPLS = 1 << 12,
>>
>> + SKB_GSO_UDP6 = 1 << 13
>
> It seems like it would be cleaner to use the names SKB_GSO_UDPV{4,6},
> similarly to SKB_GSO_TCPV{4,6}.
I wanted to try to avoid touching IPv4 paths if I could. I could use
GSO_UDPV6 though.
>
>> };
>>
>> #if BITS_PER_LONG > 32
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 945bbd0..fa4d2ee 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
> [...]
>> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
> [...]
>> + /* UFO also needs checksumming */
>> + if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
>> + !(features & NETIF_F_IP_CSUM)) {
>
> You can use !(features & NETIF_F_V4_CSUM) instead of the last two terms.
>
>> + netdev_dbg(dev,
>> + "Dropping NETIF_F_UFO since no checksum offload features.\n");
>> + features &= ~NETIF_F_UFO;
>> + }
>> + if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
>> + !(features & NETIF_F_IPV6_CSUM)) {
> [...]
>
> Similarly you can use !(features & NETIF_F_V6_CSUM) instead of the last
> two terms.
I made those to look the same as the TSO checks for consistency, but I can change
these to be shorter like above.
-vlad
>
> Aside from those minor points, this looks fine.
>
> Ben.
>
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Alexei Starovoitov @ 2014-12-17 20:42 UTC
To: Arnaldo Carvalho de Melo
Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
Josef Bacik, Kernel Team
On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> On Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov wrote:
>> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > I guess even just using 'perf probe' to set those wannabe tracepoints
>> > should be enough, no? Then he can refer to those in his perf record
>> > call, etc and process it just like with the real tracepoints.
>
>> it's far from ideal for two reasons.
>> - they have different kernels and dragging along vmlinux
>> with debug info or multiple 'perf list' data is too cumbersome
>
> It is not strictly necessary to carry vmlinux, that is just a probe
> point resolution time problem, solvable when generating a shell script,
> on the development machine, to insert the probes.
on N development machines with kernels that
would match worker machines...
I'm not saying it's impossible, just operationally difficult.
This is my understanding of Martin's use case.
>> operationally. Permanent tracepoints solve this problem.
>
> Sure, and when available, use them, my suggestion wasn't to use
> exclusively any mechanism, but to initially use what is available to
> create the tools, then find places that could be improved (if that
> proves to be the case) by using a higher performance mechanism.
agree. I think if kprobe approach was usable, it would have
been used already and yet here you have these patches
that add tracepoints in few strategic places of tcp stack.
>> - the action upon hitting tracepoint is non-trivial.
>> perf probe style of unconditionally walking pointer chains
>> will be tripping over wrong pointers.
>
> Huh? Care to elaborate on this one?
if perf probe does 'result->name' as in your example
then it would work, but patch 5 does conditional
walking of pointers, so you cannot just add
a perf probe that does print(ptr1->value1, ptr2->value2)
It won't crash, but will be collecting wrong stats.
(likely counting zeros)
>> Plus they already need to do aggregation for high
>> frequency events.
>
>> As part of acting on trace_transmit_skb() event:
>> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
>> 	tcp_trace_stats_add(...)
>> }
>> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
>> 	tcp_trace_stats_add(...)
>> }
>
> But aren't these stats TCP already keeps or could be made to?
that's the whole discussion about.
tcp_info has some of them.
Though it's difficult to claim that, say, tcp_info->tcpi_lost is
the same as loss_segs_retrans from patch 5.
* Re: [PATCH] MAINTAINERS: changes for wireless
From: Johannes Berg @ 2014-12-17 20:23 UTC
To: Arend van Spriel; +Cc: John W. Linville, netdev, linux-wireless, davem
In-Reply-To: <5491D991.30908@broadcom.com>
On Wed, 2014-12-17 at 20:29 +0100, Arend van Spriel wrote:
> > +NETWORKING DRIVERS (WIRELESS)
> > +M: Kalle Valo<kvalo@codeaurora.org>
> > +L: linux-wireless@vger.kernel.org
> > +Q: http://patchwork.kernel.org/project/linux-wireless/list/
> > +T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git/
> > +S: Maintained
> > +F: drivers/net/wireless/
>
> So what about the other paths that were in "NETWORKING [WIRELESS]".
> Couple of them are obviously maintained by Johannes, but..
The remaining ones are probably just wext, and nobody cares any more ...
I guess they're orphaned. If anyone really really really needs to have a
patch against them, and we actually end up wanting it, I'm sure we can
figure something out :)
johannes
* Re: [PATCH 03/10] ovs: Enable handling of UFO6 packets.
From: Sergei Shtylyov @ 2014-12-17 20:17 UTC
To: Vladislav Yasevich, netdev; +Cc: mst, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-4-git-send-email-vyasevic@redhat.com>
Hello.
On 12/17/2014 09:20 PM, Vladislav Yasevich wrote:
> Since UFO6 packets can now be identified by SKB_GSO_UDP6, add proper checks
> to handle UFO6 flows.
> Legacy applications may still have UFO6 packets identified by SKB_GSO_UDP,
> so we need to continue to handle them correctly.
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
> net/openvswitch/datapath.c | 3 ++-
> net/openvswitch/flow.c | 2 +-
> 2 files changed, 3 insertions(+), 2 deletions(-)
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index f9e556b..b43fc60 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -334,7 +334,8 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
> if (err)
> break;
>
> - if (skb == segs && gso_type & SKB_GSO_UDP) {
> + if (skb == segs &&
> + ((gso_type & SKB_GSO_UDP) || (gso_type & SKB_GSO_UDP6))) {
'gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6)' would be shorter...
> /* The initial flow key extracted by ovs_flow_extract()
> * in this case is for a first fragment, so we need to
> * properly mark later fragments.
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 2b78789..d03adf4 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -602,7 +602,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>
> if (key->ip.frag == OVS_FRAG_TYPE_LATER)
> return 0;
> - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
> + if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6))
.... like here.
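The shorter form is equivalent because (x & A) || (x & B) is non-zero exactly when x & (A | B) is. A small self-check with local stand-ins: SKB_GSO_MPLS and SKB_GSO_UDP6 take the values from the quoted skbuff.h hunk, while the SKB_GSO_UDP value is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins. MPLS and UDP6 values come from the quoted diff;
 * the SKB_GSO_UDP value is assumed. */
#define SKB_GSO_UDP  (1u << 1)
#define SKB_GSO_MPLS (1u << 12)
#define SKB_GSO_UDP6 (1u << 13)

/* Form used in the patch: two separate bit tests. */
static bool is_ufo_long(unsigned int gso_type)
{
	return (gso_type & SKB_GSO_UDP) || (gso_type & SKB_GSO_UDP6);
}

/* Suggested form: one test against the OR-ed mask. */
static bool is_ufo_short(unsigned int gso_type)
{
	return gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6);
}

/* Compare both forms for every gso_type value in the low 16 bits. */
static void check_ufo_forms(void)
{
	for (unsigned int t = 0; t < (1u << 16); t++)
		assert(is_ufo_long(t) == is_ufo_short(t));
}
```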
[...]
WBR, Sergei
* Re: [PATCH 01/10] core: Split out UFO6 support
From: Ben Hutchings @ 2014-12-17 20:10 UTC
To: Vladislav Yasevich
Cc: netdev, virtualization, mst, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-2-git-send-email-vyasevic@redhat.com>
On Wed, 2014-12-17 at 13:20 -0500, Vladislav Yasevich wrote:
> Split IPv6 support for UFO into its own feature similar to TSO.
> This will later allow us to re-enable UFO support for virtio-net
> devices.
[...]
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 6c8b6f6..8538b67 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -372,6 +372,7 @@ enum {
>
> SKB_GSO_MPLS = 1 << 12,
>
> + SKB_GSO_UDP6 = 1 << 13
It seems like it would be cleaner to use the names SKB_GSO_UDPV{4,6},
similarly to SKB_GSO_TCPV{4,6}.
> };
>
> #if BITS_PER_LONG > 32
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 945bbd0..fa4d2ee 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
[...]
> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
[...]
> + /* UFO also needs checksumming */
> + if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
> + !(features & NETIF_F_IP_CSUM)) {
You can use !(features & NETIF_F_V4_CSUM) instead of the last two terms.
> + netdev_dbg(dev,
> + "Dropping NETIF_F_UFO since no checksum offload features.\n");
> + features &= ~NETIF_F_UFO;
> + }
> + if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
> + !(features & NETIF_F_IPV6_CSUM)) {
[...]
Similarly you can use !(features & NETIF_F_V6_CSUM) instead of the last
two terms.
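The suggestion works by De Morgan's law, assuming NETIF_F_V4_CSUM is NETIF_F_GEN_CSUM | NETIF_F_IP_CSUM and NETIF_F_V6_CSUM is NETIF_F_GEN_CSUM | NETIF_F_IPV6_CSUM, as the suggestion implies: !(f & A) && !(f & B) equals !(f & (A | B)). The sketch checks this exhaustively; the F_* names and their bit positions are hypothetical stand-ins, not the kernel's values.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; bit positions are illustrative, NOT the kernel's. */
#define F_IP_CSUM   (1u << 0)
#define F_GEN_CSUM  (1u << 1)
#define F_IPV6_CSUM (1u << 2)
/* Combined masks, assumed to mirror NETIF_F_V4_CSUM / NETIF_F_V6_CSUM. */
#define F_V4_CSUM   (F_GEN_CSUM | F_IP_CSUM)
#define F_V6_CSUM   (F_GEN_CSUM | F_IPV6_CSUM)

/* Form in the patch: two separate negated tests. */
static bool no_v4_csum_long(unsigned int f)
{
	return !(f & F_GEN_CSUM) && !(f & F_IP_CSUM);
}

/* Suggested form: one test against the combined mask. */
static bool no_v4_csum_short(unsigned int f)
{
	return !(f & F_V4_CSUM);
}

static bool no_v6_csum_long(unsigned int f)
{
	return !(f & F_GEN_CSUM) && !(f & F_IPV6_CSUM);
}

static bool no_v6_csum_short(unsigned int f)
{
	return !(f & F_V6_CSUM);
}

/* !a && !b == !(a || b): check every combination of the three bits. */
static void check_csum_forms(void)
{
	for (unsigned int f = 0; f < (1u << 3); f++) {
		assert(no_v4_csum_long(f) == no_v4_csum_short(f));
		assert(no_v6_csum_long(f) == no_v6_csum_short(f));
	}
}
```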
Aside from those minor points, this looks fine.
Ben.
--
Ben Hutchings
Absolutum obsoletum. (If it works, it's out of date.) - Stafford Beer
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 19:51 UTC
To: Alexei Starovoitov
Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQJgUn-hUrR1XLE4J64Ms52zRCRQkJDFUDu9SGTONJ507w@mail.gmail.com>
On Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > I guess even just using 'perf probe' to set those wannabe tracepoints
> > should be enough, no? Then he can refer to those in his perf record
> > call, etc and process it just like with the real tracepoints.
> it's far from ideal for two reasons.
> - they have different kernels and dragging along vmlinux
> with debug info or multiple 'perf list' data is too cumbersome
It is not strictly necessary to carry vmlinux, that is just a probe
point resolution time problem, solvable when generating a shell script,
on the development machine, to insert the probes.
> operationally. Permanent tracepoints solve this problem.
Sure, and when available, use them, my suggestion wasn't to use
exclusively any mechanism, but to initially use what is available to
create the tools, then find places that could be improved (if that
proves to be the case) by using a higher performance mechanism.
> - the action upon hitting tracepoint is non-trivial.
> perf probe style of unconditionally walking pointer chains
> will be tripping over wrong pointers.
Huh? Care to elaborate on this one?
> Plus they already need to do aggregation for high
> frequency events.
> As part of acting on trace_transmit_skb() event:
> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
> 	tcp_trace_stats_add(...)
> }
> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
> 	tcp_trace_stats_add(...)
> }
But aren't these stats TCP already keeps or could be made to?
- Arnaldo
* Re: [iproute2] tc: Show classes more hierarchically]
From: Stephen Hemminger @ 2014-12-17 19:55 UTC
To: Marcelo Ricardo Leitner; +Cc: vadim4j, netdev
In-Reply-To: <54907619.5080508@gmail.com>
On Tue, 16 Dec 2014 16:12:41 -0200
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> On 15-12-2014 20:48, vadim4j@gmail.com wrote:
> > Hi All,
> >
> > I am playing with showing classes in a more hierarchical format, and I
> > have some code; an example of the output from my tc looks like:
> >
> > # tc/tc -t class show dev tap0
> >
> > \---1:2 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >     \---1:40 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >     \---1:50 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >     \---1:60 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > \---1:1 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >     \---1:10 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >         \---1:11 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >             \---1:111 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >     \---1:20 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >     \---1:30 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >
> >
> > which in standard output mode looks like:
> >
> > # tc/tc class show dev tap0
> >
> > class htb 1:11 parent 1:10 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:111 parent 1:11 prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:10 parent 1:1 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> > class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> > class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >
> > So I'd like to ask: might it be useful for tc users (maybe with a
> > better format?) to have this?
>
> Good idea! It already looks good, but what about:
>
> |-- 1:2 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | |-- 1:40 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | |-- 1:50 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | '-- 1:60 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> |-- 1:1 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> ...
>
> just another idea..
>
> Thanks.
> Marcelo
There are several places that also print tree format, hopefully there would
be reusable code (lspci, tree, ps).
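For reference, the core of such a tree printer is small. The sketch below is not iproute2 or lspci code; it assumes classes are available as (id, parent index) pairs with -1 marking a root, and prints them in the "|-- / '--" style suggested above. struct tc_class_node, format_class_tree() and the other names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* One class: an id plus the index of its parent (-1 for a root).
 * Assumes the parent links form a forest, i.e. no cycles. */
struct tc_class_node {
	const char *id;
	int parent;
};

/* Bounded output buffer that silently truncates. */
struct outbuf {
	char *buf;
	size_t size;
	size_t len;
};

static void out_printf(struct outbuf *o, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (o->len + 1 >= o->size)
		return;				/* no room left */
	va_start(ap, fmt);
	n = vsnprintf(o->buf + o->len, o->size - o->len, fmt, ap);
	va_end(ap);
	if (n > 0)
		o->len += (size_t)n;
	if (o->len >= o->size)
		o->len = o->size - 1;		/* output was truncated */
}

/* Print the children of `parent`; the last child gets "'--" and its
 * subtree is indented with blanks instead of a "|" continuation. */
static void print_children(const struct tc_class_node *nodes, int n,
			   int parent, const char *prefix, struct outbuf *o)
{
	char child_prefix[128];
	int last = -1;

	for (int i = 0; i < n; i++)
		if (nodes[i].parent == parent)
			last = i;

	for (int i = 0; i < n; i++) {
		if (nodes[i].parent != parent)
			continue;
		out_printf(o, "%s%s %s\n", prefix, i == last ? "'--" : "|--",
			   nodes[i].id);
		assert(strlen(prefix) + 3 <= sizeof(child_prefix));
		snprintf(child_prefix, sizeof(child_prefix), "%s%s", prefix,
			 i == last ? "  " : "| ");
		print_children(nodes, n, i, child_prefix, o);
	}
}

/* Render the whole forest into buf; returns the number of bytes written. */
static size_t format_class_tree(const struct tc_class_node *nodes, int n,
				char *buf, size_t size)
{
	struct outbuf o = { buf, size, 0 };

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	print_children(nodes, n, -1, "", &o);
	return o.len;
}
```

Keeping the nodes in netlink dump order and resolving parents by index keeps the printer independent of how the classes were fetched.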
* Re: [PATCH] MAINTAINERS: changes for wireless
From: Arend van Spriel @ 2014-12-17 19:29 UTC
To: John W. Linville; +Cc: netdev, linux-wireless, davem
In-Reply-To: <1418836025-9035-1-git-send-email-linville@tuxdriver.com>
On 12/17/14 18:07, John W. Linville wrote:
> http://marc.info/?l=linux-wireless&m=141883202530292&w=2
>
> This makes it official... :-)
Let's see if I can comment on this patch.
First of all, thanks for the years of service. You already gave the
heads-up a few months ago, but now it's official. It has been good working
with you on getting rid of eyesore code in the brcm80211 drivers.
> Signed-off-by: John W. Linville<linville@tuxdriver.com>
> ---
> MAINTAINERS | 19 ++++++++-----------
> 1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fdffe962a16a..e82d31aeb936 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6603,19 +6603,8 @@ L: netdev@vger.kernel.org
> S: Maintained
>
> NETWORKING [WIRELESS]
> -M: "John W. Linville"<linville@tuxdriver.com>
> L: linux-wireless@vger.kernel.org
> Q: http://patchwork.kernel.org/project/linux-wireless/list/
> -T: git git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless.git
> -S: Maintained
> -F: net/mac80211/
> -F: net/rfkill/
> -F: net/wireless/
> -F: include/net/ieee80211*
> -F: include/linux/wireless.h
> -F: include/uapi/linux/wireless.h
> -F: include/net/iw_handler.h
> -F: drivers/net/wireless/
>
> NETWORKING DRIVERS
> L: netdev@vger.kernel.org
> @@ -6636,6 +6625,14 @@ F: include/linux/inetdevice.h
> F: include/uapi/linux/if_*
> F: include/uapi/linux/netdevice.h
>
> +NETWORKING DRIVERS (WIRELESS)
> +M: Kalle Valo<kvalo@codeaurora.org>
> +L: linux-wireless@vger.kernel.org
> +Q: http://patchwork.kernel.org/project/linux-wireless/list/
> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git/
> +S: Maintained
> +F: drivers/net/wireless/
So what about the other paths that were in "NETWORKING [WIRELESS]"?
A couple of them are obviously maintained by Johannes, but...
> +
> NETXEN (1/10) GbE SUPPORT
> M: Manish Chopra<manish.chopra@qlogic.com>
> M: Sony Chacko<sony.chacko@qlogic.com>
^ permalink raw reply
* [PATCH] net: unisys: adding unisys virtnic driver
From: Erik Arfvidson @ 2014-12-17 18:52 UTC (permalink / raw)
To: benjamin.romer, netdev, dzickus, davem, Bruce.Vessey,
sparmaintainer, prarit
Cc: Erik Arfvidson
The purpose of this patch is to add the Unisys virtual network driver
to the network directory and also to start a discussion about the
requirements it needs to meet.
Signed-off-by: Erik Arfvidson <earfvids@redhat.com>
---
drivers/net/virtnic.c | 2475 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 2475 insertions(+)
create mode 100644 drivers/net/virtnic.c
diff --git a/drivers/net/virtnic.c b/drivers/net/virtnic.c
new file mode 100644
index 0000000..0af48f3
--- /dev/null
+++ b/drivers/net/virtnic.c
@@ -0,0 +1,2475 @@
+/* virtnic.c
+ *
+ * Copyright © 2010 - 2014 UNISYS CORPORATION
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for more
+ * details.
+ */
+
+#define EXPORT_SYMTAB
+
+#include <linux/kernel.h>
+#ifdef CONFIG_MODVERSIONS
+#include <config/modversions.h>
+#endif
+
+#include "uniklog.h"
+#include "diagnostics/appos_subsystems.h"
+#include "uisutils.h"
+#include "uisthread.h"
+#include "uisqueue.h"
+#include "visorchipset.h"
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/string.h>
+#include <linux/tcp.h>
+#include <linux/ip.h>
+#include <linux/types.h>
+#include <linux/uuid.h>
+#include <linux/debugfs.h>
+
+#include "virtpci.h"
+#include "version.h"
+
+/* this is shorter than using __FILE__ (full path name) in */
+/* debug/info/error messages */
+#define __MYFILE__ "virtnic.c"
+
+/* turn off collecting of debug statistics */
+#define VIRTNIC_STATS 0
+
+ /* MAX_BUF = 64 lines x 32 MAXVHBA x 80 characters
+ * = 163840 bytes ~ 40 pages
+ */
+#define MAX_BUF 163840
+
+/*
+ * uisnic virtnic
+ * <---- xmit --- virtnic_xmit(hard-start-xmit)
+ * <-- rcvpost -- open, virtnic_rx
+ * <-- unpost --- close
+ * <-- enb/dis -- open, close
+ *
+ * open & close can't run at the same time as each other or rcv/xmit, but
+ * virtnic_xmit and virtnic_rx could be running at the same time.
+ * All messages sent to uisnic MUST be delivered, so if the queue is
+ * full we have to retry, but we don't want to retry with a spinlock held.
+ */
+
+/*****************************************************/
+/* Forward declarations */
+/*****************************************************/
+static int virtnic_probe(struct virtpci_dev *dev,
+ const struct pci_device_id *id);
+static void virtnic_remove(struct virtpci_dev *dev);
+static int virtnic_change_mtu(struct net_device *netdev, int new_mtu);
+static int virtnic_close(struct net_device *netdev);
+static struct net_device_stats *virtnic_get_stats(struct net_device *netdev);
+static int virtnic_open(struct net_device *netdev);
+static int virtnic_ioctl(struct net_device *netdev, struct ifreq *ifr,
+ int cmd);
+static void virtnic_rx(struct uiscmdrsp *cmdrsp);
+static int virtnic_xmit(struct sk_buff *skb, struct net_device *netdev);
+static void virtnic_xmit_timeout(struct net_device *netdev);
+static void virtnic_set_multi(struct net_device *netdev);
+static int virtnic_serverdown(struct virtpci_dev *virtpcidev, u32 state);
+static int virtnic_serverup(struct virtpci_dev *virtpcidev);
+static void virtnic_serverdown_complete(struct work_struct *work);
+static void virtnic_timeout_reset(struct work_struct *work);
+static int process_incoming_rsps(void *);
+static ssize_t info_debugfs_read(struct file *file, char __user *buf,
+ size_t len, loff_t *offset);
+static ssize_t enable_ints_write(struct file *file,
+ const char __user *buffer,
+ size_t count, loff_t *ppos);
+
+/*****************************************************/
+/* Globals */
+/*****************************************************/
+
+#define VIRTNIC_XMIT_TIMEOUT (5 * HZ) /* Default timeout period in jiffies */
+#define VIRTNIC_INFINITE_RESPONSE_WAIT 0
+#define INTERRUPT_VECTOR_MASK 0x3F
+
+static struct workqueue_struct *virtnic_serverdown_workqueue;
+static struct workqueue_struct *virtnic_timeout_reset_workqueue;
+
+static const struct pci_device_id virtnic_id_table[] = {
+ {PCI_DEVICE(PCI_VENDOR_ID_UNISYS, PCI_DEVICE_ID_VIRTNIC)},
+ {0},
+};
+/* export virtnic_id_table */
+MODULE_DEVICE_TABLE(pci, virtnic_id_table);
+
+static struct virtpci_driver virtnic_driver = {
+ .name = "uisvirtnic",
+ .version = VERSION,
+ .vertag = NULL,
+ .id_table = virtnic_id_table,
+ .probe = virtnic_probe,
+ .remove = virtnic_remove,
+ .suspend = virtnic_serverdown,
+ .resume = virtnic_serverup
+};
+
+#define SEND_ENBDIS(ndev, state, cmdrsp, queue, insertlock, stats) { \
+ DBGINF("sending rcv enb/dis netdev:%p state:%d\n", ndev, state); \
+ cmdrsp->net.enbdis.enable = state; \
+ cmdrsp->net.enbdis.context = ndev; \
+ cmdrsp->net.type = NET_RCV_ENBDIS; \
+ cmdrsp->cmdtype = CMD_NET_TYPE; \
+ uisqueue_put_cmdrsp_with_lock_client(queue, cmdrsp, IOCHAN_TO_IOPART, \
+ (void *)insertlock, \
+ DONT_ISSUE_INTERRUPT, \
+ (uint64_t)NULL, \
+ OK_TO_WAIT, "vnic"); \
+ stats.sent_enbdis++;\
+}
+
+struct chanstat {
+ unsigned long got_rcv; /* count of NET_RCV received */
+ unsigned long got_enbdisack; /* count of NET_RCV_ENBDIS_ACK rcvd */
+ unsigned long got_xmit_done; /* count of NET_XMIT_DONE received */
+ unsigned long xmit_fail; /* count of NET_XMIT_DONE failures */
+ unsigned long sent_enbdis; /* count of NET_RCV_ENBDIS sent */
+ unsigned long sent_promisc; /* count of NET_RCV_PROMISC sent */
+ unsigned long sent_post; /* count of NET_RCV_POST sent */
+ unsigned long sent_xmit; /* count of NET_XMIT sent */
+ unsigned long reject_count; /* count of NET_XMIT rejected because */
+ /* of BUSY/queue full */
+ unsigned long extra_rcvbufs_sent;
+#if VIRTNIC_STATS
+ unsigned long reject_jiffies_start; /* jiffie count at start of
+ NET_XMIT rejects */
+#endif /* VIRTNIC_STATS */
+};
+
+struct datachan {
+ struct chaninfo chinfo;
+ struct chanstat chstat;
+};
+
+struct virtnic_info {
+ struct virtpci_dev *virtpcidev;
+ struct net_device *netdev;
+ struct net_device_stats net_stats;
+ spinlock_t priv_lock; /* protects this private struct */
+ struct datachan datachan;
+ struct sk_buff **rcvbuf; /* rcvbuf is the array of rcv buffers */
+ /* we post to */
+ unsigned long long uniquenum;
+
+ /* the IOPART end */
+ int num_rcv_bufs; /* indicates how many receive buffers the
+ vnic will post */
+ int num_rcv_bufs_could_not_alloc;
+ atomic_t num_rcv_bufs_in_iovm; /* indicates how many receive buffers
+ have actually been sent to the iovm */
+ unsigned long inner_loop_limit_reached_cnt;
+ unsigned long alloc_failed_in_if_needed_cnt;
+ unsigned long alloc_failed_in_repost_return_cnt;
+
+ struct sk_buff_head xmitbufhead; /* xmitbufhead is the head of
+ the xmit buffer list that
+ have been sent to the IOPART
+ end */
+ int max_outstanding_net_xmits; /* absolute max number of outstanding
+ xmits - should never hit this */
+ int upper_threshold_net_xmits; /* high water mark for calling
+ netif_stop_queue() */
+ int lower_threshold_net_xmits; /* low water mark for calling
+ netif_wake_queue() */
+ uuid_le zoneguid; /* specifies the zone for the switch in
+ which this VNIC resides */
+ struct uiscmdrsp *cmdrsp_rcv; /* cmdrsp_rcv is used for
+ posting/unposting rcv buffers */
+ unsigned short enabled; /* 0 disabled 1 enabled to receive */
+ unsigned short enab_dis_acked; /* NET_RCV_ENABLE/DISABLE acked by
+ uisnic */
+ atomic_t usage; /* count of users */
+ unsigned short old_flags; /* flags as they were prior to
+ set_multicast_list */
+ struct uiscmdrsp *xmit_cmdrsp; /* used to issue NET_XMIT - there is
+ never more than one xmit in progress
+ at a time */
+ struct dentry *eth_debugfs_dir; /* this points to /proc/eth?
+ directory */
+ struct dentry *zone_debugfs_entry; /* this points to
+ /proc/virtnic/eth?/zone */
+ /* file */
+ struct dentry *clientstr_debugfs_entry;/* this points to
+ /proc/virtnic/eth?/clientstr
+ file */
+ struct irq_info intr; /* use recvInterrupt info to connect
+ to this to receive interrupts when
+ IOs complete */
+ int interrupt_vector;
+ int thread_wait_ms;
+ int queuefullmsg_logged; /* flag for throttling queue full */
+ /* messages */
+ /* some debug counters */
+ ulong n_rcv0; /* # rcvs of 0 buffers */
+ ulong n_rcv1; /* # rcvs of 1 buffer */
+ ulong n_rcv2; /* # rcvs of 2 buffers */
+ ulong n_rcvx; /* # rcvs of >2 buffers */
+ ulong found_repost_rcvbuf_cnt; /* # times we called repost_rcvbuf */
+ ulong repost_found_skb_cnt; /* # times found the skb */
+ ulong n_repost_deficit; /* # times we couldn't find all of the
+ rcv buffers */
+ ulong bad_rcv_buf; /* # times we neglected to
+ free the rcv skb because
+ we didn't know where it
+ came from */
+ ulong n_rcv_packet_not_accepted; /* # bogus recv packets */
+ bool server_down;
+ bool server_change_state;
+ unsigned long long interrupts_rcvd;
+ unsigned long long interrupts_notme;
+ unsigned long long interrupts_disabled;
+ unsigned long long busy_cnt;
+ unsigned long long flow_control_upper_hits;
+ unsigned long long flow_control_lower_hits;
+ struct work_struct serverdown_completion;
+ struct work_struct timeout_reset;
+ uint64_t __iomem *flags_addr;
+ atomic_t interrupt_rcvd;
+ wait_queue_head_t rsp_queue;
+};
+
+struct virtnic_devices_open {
+ struct net_device *netdev;
+ struct virtnic_info *vnicinfo;
+};
+
+static ssize_t show_zone(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct net_device *net = to_net_dev(dev);
+ struct virtnic_info *vnicinfo = netdev_priv(net);
+
+ return scnprintf(buf, PAGE_SIZE, "%pUL\n", &vnicinfo->zoneguid);
+}
+
+static ssize_t show_clientstr(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct net_device *net = to_net_dev(dev);
+ struct virtnic_info *vnicinfo = netdev_priv(net);
+ struct spar_io_channel_protocol *chan =
+ (struct spar_io_channel_protocol *)vnicinfo->
+ datachan.chinfo.queueinfo->chan;
+
+ return scnprintf(buf, PAGE_SIZE, "%s\n",
+ (char *)&chan->client_string);
+}
+static DEVICE_ATTR(clientstr, S_IRUGO, show_clientstr, NULL);
+static DEVICE_ATTR(zone, S_IRUGO, show_zone, NULL);
+
+#define VIRTNICSOPENMAX 32
+/* array of open devices maintained by open() and close() */
+static struct virtnic_devices_open num_virtnic_open[VIRTNICSOPENMAX];
+static struct dentry *virtnic_debugfs_dir;
+
+static const struct file_operations debugfs_info_fops = {
+ .read = info_debugfs_read,
+};
+
+static const struct file_operations debugfs_enable_ints_fops = {
+ .write = enable_ints_write,
+};
+
+/*****************************************************/
+/* Probe Remove Functions */
+/*****************************************************/
+/* set up net.rcvpost struct in cmdrsp.
+ * all rcv buf skb are allocated at RCVPOST_BUF_SIZE, so length is
+ * RCVPOST_BUF_SIZE by default. and since RCVPOST_BUF_SIZE < 2048, one
+ * phys_info struct can describe the rcv buf.
+ */
+static inline void
+post_skb(struct uiscmdrsp *cmdrsp,
+ struct virtnic_info *vnicinfo, struct sk_buff *skb)
+{
+ cmdrsp->net.buf = skb;
+ cmdrsp->net.rcvpost.frag.pi_pfn = page_to_pfn(virt_to_page(skb->data));
+ cmdrsp->net.rcvpost.frag.pi_off =
+ (unsigned long)skb->data & PI_PAGE_MASK;
+ cmdrsp->net.rcvpost.frag.pi_len = skb->len;
+ cmdrsp->net.rcvpost.unique_num = vnicinfo->uniquenum;
+
+ DBGINF("RCV_POST skb:%p pfn:%llu off:%x len:%d\n", skb,
+ cmdrsp->net.rcvpost.frag.pi_pfn,
+ cmdrsp->net.rcvpost.frag.pi_off,
+ cmdrsp->net.rcvpost.frag.pi_len);
+ if ((cmdrsp->net.rcvpost.frag.pi_off + skb->len) > PI_PAGE_SIZE) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** pi_off:0x%x pi_len:%d SPAN ACROSS A PAGE\n",
+ cmdrsp->net.rcvpost.frag.pi_off, skb->len);
+ } else {
+ cmdrsp->net.type = NET_RCV_POST;
+ cmdrsp->cmdtype = CMD_NET_TYPE;
+ uisqueue_put_cmdrsp_with_lock_client(vnicinfo->datachan.chinfo.
+ queueinfo, cmdrsp,
+ IOCHAN_TO_IOPART,
+ (void *)&vnicinfo->
+ datachan.chinfo.insertlock,
+ DONT_ISSUE_INTERRUPT,
+ (uint64_t)NULL,
+ OK_TO_WAIT,
+ "vnic");
+ atomic_inc(&vnicinfo->num_rcv_bufs_in_iovm);
+ vnicinfo->datachan.chstat.sent_post++;
+ }
+}
+
+static irqreturn_t
+virtnic_ISR(int irq, void *dev_id)
+{
+ struct virtnic_info *vnicinfo = (struct virtnic_info *)dev_id;
+
+ struct channel_header __iomem *p_channel_header;
+
+ struct signal_queue_header __iomem *pqhdr;
+ uint64_t mask;
+ unsigned long long rc1;
+
+ if (vnicinfo == NULL)
+ return IRQ_NONE;
+ vnicinfo->interrupts_rcvd++;
+ p_channel_header = vnicinfo->datachan.chinfo.queueinfo->chan;
+ if (((readq(&p_channel_header->features) &
+ ULTRA_IO_IOVM_IS_OK_WITH_DRIVER_DISABLING_INTS) != 0) &&
+ ((readq(&p_channel_header->features) &
+ ULTRA_IO_DRIVER_DISABLES_INTS) != 0)) {
+ /*
+ * should not enter this path because we setup without
+ * DRIVER_DISABLES_INTS.
+ */
+ vnicinfo->interrupts_disabled++;
+ mask = ~ULTRA_CHANNEL_ENABLE_INTS;
+ rc1 = uisqueue_interlocked_and(vnicinfo->flags_addr, mask);
+ }
+ if (spar_signalqueue_empty(p_channel_header, IOCHAN_FROM_IOPART)) {
+ vnicinfo->interrupts_notme++;
+ return IRQ_NONE;
+ }
+ pqhdr = (struct signal_queue_header __iomem *)
+ ((char __iomem *)p_channel_header +
+ readq(&p_channel_header->ch_space_offset)) +
+ IOCHAN_FROM_IOPART;
+ writeq(readq(&pqhdr->num_irq_received) + 1,
+ &pqhdr->num_irq_received);
+ atomic_set(&vnicinfo->interrupt_rcvd, 1);
+ wake_up_interruptible(&vnicinfo->rsp_queue);
+ return IRQ_HANDLED;
+}
+
+static const struct net_device_ops virtnic_dev_ops = {
+ .ndo_open = virtnic_open,
+ .ndo_stop = virtnic_close,
+ .ndo_start_xmit = virtnic_xmit,
+ .ndo_get_stats = virtnic_get_stats,
+ .ndo_do_ioctl = virtnic_ioctl,
+ .ndo_change_mtu = virtnic_change_mtu,
+ .ndo_tx_timeout = virtnic_xmit_timeout,
+ .ndo_set_rx_mode = virtnic_set_multi,
+};
+
+static int
+virtnic_probe(struct virtpci_dev *virtpcidev, const struct pci_device_id *id)
+{
+ struct net_device *netdev = NULL;
+ struct virtnic_info *vnicinfo;
+ int err;
+ int rsp;
+ irq_handler_t handler = virtnic_ISR;
+ struct channel_header __iomem *p_channel_header;
+ struct signal_queue_header __iomem *pqhdr;
+ uint64_t mask;
+
+#define RETFAIL(res) {\
+ kfree(vnicinfo->cmdrsp_rcv); \
+ kfree(vnicinfo->xmit_cmdrsp); \
+ kfree(vnicinfo->rcvbuf); \
+ if (vnicinfo->interrupt_vector != -1) \
+ free_irq(vnicinfo->interrupt_vector, vnicinfo); \
+ if (netdev) \
+ free_netdev(netdev); \
+ return res; \
+}
+
+ DBGINF("virtpci_dev:%p\n", virtpcidev);
+ DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>",
+ virtpcidev->bus_no, virtpcidev->device_no);
+ netdev = alloc_etherdev(sizeof(struct virtnic_info));
+ if (netdev == NULL) {
+ LOGERR("**** FAILED to alloc etherdev\n");
+ return -ENOMEM;
+ }
+ netdev->netdev_ops = &virtnic_dev_ops;
+ netdev->watchdog_timeo = VIRTNIC_XMIT_TIMEOUT;
+
+ memcpy(netdev->dev_addr, virtpcidev->net.mac_addr, MAX_MACADDR_LEN);
+ netdev->addr_len = MAX_MACADDR_LEN;
+ /* netdev->name should be ethx already */
+ netdev->dev.parent = &virtpcidev->generic_dev;
+
+ /* setup our private struct */
+ vnicinfo = netdev_priv(netdev);
+ memset(vnicinfo, 0, sizeof(struct virtnic_info));
+ vnicinfo->interrupt_vector = -1;
+ vnicinfo->netdev = netdev;
+ vnicinfo->virtpcidev = virtpcidev;
+ init_waitqueue_head(&vnicinfo->rsp_queue);
+ spin_lock_init(&vnicinfo->priv_lock);
+ vnicinfo->datachan.chinfo.queueinfo = &virtpcidev->queueinfo;
+ spin_lock_init(&vnicinfo->datachan.chinfo.insertlock);
+ vnicinfo->enabled = 0; /* not yet */
+ atomic_set(&vnicinfo->usage, 1); /* starting val */
+ vnicinfo->zoneguid = virtpcidev->net.zone_uuid;
+ vnicinfo->num_rcv_bufs = virtpcidev->net.num_rcv_bufs;
+ LOGINFNAME(vnicinfo->netdev, "num_rcv_bufs = %d\n",
+ vnicinfo->num_rcv_bufs);
+ vnicinfo->rcvbuf = kmalloc(sizeof(struct sk_buff *) *
+ vnicinfo->num_rcv_bufs, GFP_ATOMIC);
+ if (vnicinfo->rcvbuf == NULL) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to allocate memory for %d receive buffers.\n",
+ vnicinfo->num_rcv_bufs);
+ RETFAIL(-ENOMEM);
+ }
+ memset(vnicinfo->rcvbuf, 0,
+ sizeof(struct sk_buff *) * vnicinfo->num_rcv_bufs);
+ /* set the net_xmit outstanding threshold */
+ vnicinfo->max_outstanding_net_xmits =
+ max(3, ((vnicinfo->num_rcv_bufs / 3) - 2));
+ /* always leave two slots open but you should have 3 at a minimum */
+ LOGINFNAME(vnicinfo->netdev, "max_outstanding_net_xmits = %d\n",
+ vnicinfo->max_outstanding_net_xmits);
+ vnicinfo->upper_threshold_net_xmits =
+ max(2, vnicinfo->max_outstanding_net_xmits - 1);
+ LOGINFNAME(vnicinfo->netdev, "upper_threshold_net_xmits = %d\n",
+ vnicinfo->upper_threshold_net_xmits);
+ vnicinfo->lower_threshold_net_xmits =
+ max(1, vnicinfo->max_outstanding_net_xmits / 2);
+ LOGINFNAME(vnicinfo->netdev, "lower_threshold_net_xmits = %d\n",
+ vnicinfo->lower_threshold_net_xmits);
+ skb_queue_head_init(&vnicinfo->xmitbufhead);
+
+ /* create a cmdrsp we can use to post and unpost rcv buffers */
+ vnicinfo->cmdrsp_rcv = kmalloc(SIZEOF_CMDRSP, GFP_ATOMIC);
+ if (vnicinfo->cmdrsp_rcv == NULL) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to allocate cmdrsp to use for posting rcv buffers\n");
+ RETFAIL(-ENOMEM);
+ }
+ vnicinfo->xmit_cmdrsp = kmalloc(SIZEOF_CMDRSP, GFP_ATOMIC);
+ if (vnicinfo->xmit_cmdrsp == NULL) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to allocate cmdrsp to use for xmits\n");
+ RETFAIL(-ENOMEM);
+ }
+ INIT_WORK(&vnicinfo->serverdown_completion,
+ virtnic_serverdown_complete);
+ INIT_WORK(&vnicinfo->timeout_reset, virtnic_timeout_reset);
+ vnicinfo->server_down = false;
+ vnicinfo->server_change_state = false;
+
+ /* set the default mtu */
+ netdev->mtu = virtpcidev->net.mtu;
+
+ vnicinfo->intr = virtpcidev->intr;
+ /* buffers will be allocated in open using mtu */
+
+ /* save off netdev in virtpcidev */
+ virtpcidev->net.netdev = netdev;
+
+ /* start thread that will receive responses */
+ writeq(readq(&vnicinfo->datachan.chinfo.queueinfo->chan->features) |
+ ULTRA_IO_CHANNEL_IS_POLLING,
+ &vnicinfo->datachan.chinfo.queueinfo->chan->features);
+ DBGINF("starting rsp thread queueinfo:%p threadinfo:%p\n",
+ vnicinfo->datachan.chinfo.queueinfo,
+ &vnicinfo->datachan.chinfo.threadinfo);
+ p_channel_header = vnicinfo->datachan.chinfo.queueinfo->chan;
+ pqhdr = (struct signal_queue_header __iomem *)
+ ((char __iomem *)p_channel_header +
+ readq(&p_channel_header->ch_space_offset)) +
+ IOCHAN_FROM_IOPART;
+ vnicinfo->flags_addr = (__force uint64_t __iomem *)&pqhdr->features;
+ vnicinfo->thread_wait_ms = 2;
+ if (!uisthread_start(&vnicinfo->datachan.chinfo.threadinfo,
+ process_incoming_rsps, &vnicinfo->datachan,
+ "vnic_incoming")) {
+ LOGERRNAME(vnicinfo->netdev, "**** FAILED to start thread\n");
+ RETFAIL(-ENODEV);
+ }
+
+ /* register_netdev */
+ LOGINFNAME(vnicinfo->netdev, "sendInterruptHandle=0x%16llX",
+ (unsigned long long)vnicinfo->intr.send_irq_handle);
+ LOGINFNAME(vnicinfo->netdev, "recvInterruptHandle=0x%16llX",
+ (unsigned long long)vnicinfo->intr.recv_irq_handle);
+ LOGINFNAME(vnicinfo->netdev, "recvInterruptVector=0x%8X",
+ vnicinfo->intr.recv_irq_vector);
+ LOGINFNAME(vnicinfo->netdev, "recvInterruptShared=0x%2X",
+ vnicinfo->intr.recv_irq_shared);
+ LOGINFNAME(vnicinfo->netdev, "netdev->name=%s", netdev->name);
+ vnicinfo->interrupt_vector = vnicinfo->intr.recv_irq_handle &
+ INTERRUPT_VECTOR_MASK;
+ netdev->irq = vnicinfo->interrupt_vector;
+ err = register_netdev(netdev);
+ if (err) {
+ uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
+ RETFAIL(err);
+ }
+
+ /* create debugfs ethx directory */
+ vnicinfo->eth_debugfs_dir = debugfs_create_dir(netdev->name,
+ virtnic_debugfs_dir);
+ if (!vnicinfo->eth_debugfs_dir) {
+ LOGERRNAME(vnicinfo->netdev,
+ "****FAILED to create proc dir entry:%s\n",
+ netdev->name);
+ uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
+ RETFAIL(-ENODEV);
+ }
+
+ if (device_create_file(&netdev->dev, &dev_attr_zone) < 0) {
+ uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
+ RETFAIL(-ENODEV);
+ }
+ if (device_create_file(&netdev->dev, &dev_attr_clientstr) < 0) {
+ device_remove_file(&netdev->dev, &dev_attr_zone);
+ uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
+ RETFAIL(-ENODEV);
+ }
+ /* register the receive interrupt; on failure we keep polling */
+ rsp = request_irq(vnicinfo->interrupt_vector, handler, IRQF_SHARED,
+ netdev->name, vnicinfo);
+ if (rsp != 0) {
+ LOGERRNAME(vnicinfo->netdev,
+ "request_irq(%d) uislib_vnic_ISR request failed with rsp=%d\n",
+ vnicinfo->interrupt_vector, rsp);
+ vnicinfo->interrupt_vector = -1;
+ } else {
+ uint64_t __iomem *features_addr =
+ &vnicinfo->datachan.chinfo.queueinfo->chan->features;
+ LOGINFNAME(vnicinfo->netdev,
+ "request_irq(%d) uislib_vnic_ISR request succeeded\n",
+ vnicinfo->interrupt_vector);
+ mask = ~(ULTRA_IO_CHANNEL_IS_POLLING |
+ ULTRA_IO_DRIVER_DISABLES_INTS |
+ ULTRA_IO_DRIVER_SUPPORTS_ENHANCED_RCVBUF_CHECKING);
+ uisqueue_interlocked_and(features_addr, mask);
+ mask = ULTRA_IO_DRIVER_ENABLES_INTS |
+ ULTRA_IO_DRIVER_SUPPORTS_ENHANCED_RCVBUF_CHECKING;
+ uisqueue_interlocked_or(features_addr, mask);
+
+ vnicinfo->thread_wait_ms = 2000;
+ }
+
+ LOGINFNAME(vnicinfo->netdev,
+ "Added VirtNic:%p %s insertlock:%p %02x:%02x:%02x:%02x:%02x:%02x\n",
+ netdev, netdev->name, &vnicinfo->datachan.chinfo.insertlock,
+ netdev->dev_addr[0], netdev->dev_addr[1],
+ netdev->dev_addr[2], netdev->dev_addr[3],
+ netdev->dev_addr[4], netdev->dev_addr[5]);
+ return 0;
+}
+
+static void
+virtnic_remove(struct virtpci_dev *virtpcidev)
+{
+ struct net_device *netdev = virtpcidev->net.netdev;
+ struct virtnic_info *vnicinfo;
+
+ vnicinfo = netdev_priv(netdev);
+
+ LOGINFNAME(vnicinfo->netdev,
+ "virtpcidev:%p netdev:%p name:%s vnicinfo:%p\n",
+ virtpcidev, netdev, netdev->name, vnicinfo);
+ LOGINFNAME(vnicinfo->netdev,
+ "virtpcidev busNo<<%d>>devNo<<%d>>",
+ virtpcidev->bus_no, virtpcidev->device_no);
+ /* REMOVE netdev */
+ DBGINF("unregistering netdev\n");
+ if (vnicinfo->interrupt_vector != -1)
+ free_irq(vnicinfo->interrupt_vector, vnicinfo);
+ unregister_netdev(netdev);
+ /* this is going to call virtnic_close, which will send out a */
+ /* disable; don't take the thread down until after that */
+ uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
+
+ /* freeing of rcv bufs should have happened in close. */
+ /* free cmdrsp we allocated for rcv post/unpost */
+ kfree(vnicinfo->cmdrsp_rcv);
+ kfree(vnicinfo->xmit_cmdrsp);
+
+ /* remove the sysfs attribute files */
+ device_remove_file(&netdev->dev, &dev_attr_zone);
+ device_remove_file(&netdev->dev, &dev_attr_clientstr);
+
+ debugfs_remove(vnicinfo->eth_debugfs_dir);
+ LOGINFNAME(vnicinfo->netdev, "removed dentry %s\n",
+ netdev->name);
+
+ kfree(vnicinfo->rcvbuf);
+ free_netdev(netdev);
+
+ LOGINF("virtnic removed\n");
+}
+
+/*****************************************************/
+/* NIC statistics handling */
+/*****************************************************/
+
+/* update rcv stats - locking done by invoker */
+#define UPD_RCV_STATS { \
+ vnicinfo->net_stats.rx_packets++; \
+ vnicinfo->net_stats.rx_bytes += skb->len; \
+}
+
+/* update xmt stats - locking done by invoker */
+#define UPD_XMT_STATS { \
+ vnicinfo->net_stats.tx_packets++; \
+ vnicinfo->net_stats.tx_bytes += skb->len; \
+}
+
+static struct net_device_stats *
+virtnic_get_stats(struct net_device *netdev)
+{
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+
+ /* take this opportunity to print out our internal stats */
+ DBGINF
+ ("NET_RCV_ENBDIS sent: %ld NET_RCV_ENBDIS_ACK received: %ld\n",
+ vnicinfo->datachan.chstat.sent_enbdis,
+ vnicinfo->datachan.chstat.got_enbdisack);
+
+ DBGINF("NET_RCV received: %ld NET_RCV_POST sent: %ld\n",
+ vnicinfo->datachan.chstat.got_rcv,
+ vnicinfo->datachan.chstat.sent_post);
+
+ DBGINF("extra NET_RCV_POST sent: %ld\n",
+ vnicinfo->datachan.chstat.extra_rcvbufs_sent);
+
+ DBGINF("NET_XMIT sent: %ld NET_XMIT_DONE received: %ld\n",
+ vnicinfo->datachan.chstat.sent_xmit,
+ vnicinfo->datachan.chstat.got_xmit_done);
+
+ DBGINF("XMIT failures: %ld NET_RCV_PROMISC sent: %ld\n",
+ vnicinfo->datachan.chstat.xmit_fail,
+ vnicinfo->datachan.chstat.sent_promisc);
+
+ DBGINF("XMIT reject/busy: %ld\n",
+ vnicinfo->datachan.chstat.reject_count);
+
+ return &vnicinfo->net_stats;
+}
+
+/*****************************************************/
+/* Local functions */
+/*****************************************************/
+
+/*
+ * This function allocates an skb whose data area holds the first (and
+ * only) fragment. All rcv buffers are RCVPOST_BUF_SIZE, regardless of MTU.
+ */
+static struct sk_buff *
+alloc_rcv_buf(struct net_device *netdev)
+{
+ struct sk_buff *skb;
+
+/*
+ * NOTE: the first fragment in each rcv buffer is pointed to by rcvskb->data.
+ * For now all rcv buffers will be RCVPOST_BUF_SIZE in length, so the firstfrag
+ * is large enough to hold 1514.
+ */
+ DBGINF("netdev->name <<%s>>: allocating skb len:%d\n", netdev->name,
+ RCVPOST_BUF_SIZE);
+ skb = alloc_skb(RCVPOST_BUF_SIZE, GFP_ATOMIC | __GFP_NOWARN);
+ if (!skb) {
+ LOGVER("**** alloc_skb failed\n");
+ return NULL;
+ }
+ skb->dev = netdev;
+ skb->len = RCVPOST_BUF_SIZE;
+ /* current value of mtu doesn't come into play here; large
+ * packets will just end up using multiple rcv buffers all of
+ * same size
+ */
+ skb->data_len = 0; /* alloc_skb already zeroes it out;
+ set here for clarity. */
+ return skb;
+}
+
+static int
+init_rcv_bufs(struct net_device *netdev, struct virtnic_info *vnicinfo)
+{
+ int i, count;
+
+ DBGINF("netdev->name <<%s>>", netdev->name);
+ /*
+ * allocate fixed number of receive buffers to post to uisnic
+ * post receive buffers after we've allocated a required
+ * amount
+ */
+ for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
+ vnicinfo->rcvbuf[i] = alloc_rcv_buf(netdev);
+ if (!vnicinfo->rcvbuf[i])
+ break; /* if we failed to allocate one let us stop */
+ }
+ if (i < vnicinfo->num_rcv_bufs) {
+ LOGWRNNAME(vnicinfo->netdev,
+ "only allocated %d of %d receive buffers", i,
+ vnicinfo->num_rcv_bufs);
+ if (i == 0) {
+ /* couldn't even allocate one - bail out */
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to allocate any rcv buffers\n");
+ return -ENOMEM;
+ }
+ }
+ count = i;
+ /* Ensure we can alloc 2/3rd of the requested number of
+ * buffers. 2/3 is an arbitrary choice; it is also used in the
+ * ndis init.c.
+ */
+ if (count < ((2 * vnicinfo->num_rcv_bufs) / 3)) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to allocate enough rcv bufs; allocated only:%d MAX_NET_RCV_BUFS:%d\n",
+ count, MAX_NET_RCV_BUFS);
+ /* free receive buffers we did allocate and then bail out */
+ for (i = 0; i < count; i++) {
+ kfree_skb(vnicinfo->rcvbuf[i]);
+ vnicinfo->rcvbuf[i] = NULL;
+ }
+ return -ENOMEM;
+ }
+
+ /* post receive buffers to receive incoming input - without holding */
+ /* lock - we've neither enabled nor started the queue so there */
+ /* shouldn't be any rcv or xmit activity */
+ for (i = 0; i < count; i++)
+ post_skb(vnicinfo->cmdrsp_rcv, vnicinfo, vnicinfo->rcvbuf[i]);
+
+ /* push through with what buffers we've got - unallocated ones will */
+ /* be null */
+ LOGINFNAME(vnicinfo->netdev, "Allocated & posted %d rcv buffers\n",
+ count);
+
+ return 0;
+}
+
+/* Sends disable to IOVM and frees receive buffers that were posted to
+ * IOVM (cleared by IOVM when disable is received)
+ * returns 0 on success, negative number on failure
+ *
+ * timeout is defined in msecs (timeout of 0 specifies infinite wait)
+ */
+static int
+virtnic_disable_with_timeout(struct net_device *netdev, const int timeout)
+{
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+ int i, count = 0;
+ unsigned long flags;
+ int wait = 0;
+
+ LOGINFNAME(vnicinfo->netdev, "netdev->name <<%s>>", netdev->name);
+ /* stop the transmit queue so nothing more can be transmitted */
+ netif_stop_queue(netdev);
+
+ /* send a msg telling the other end we are stopping incoming pkts */
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ vnicinfo->enabled = 0;
+ vnicinfo->enab_dis_acked = 0; /* must wait for ack */
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ /* send disable and wait for ack - don't hold lock when
+ * sending disable because if the queue is full, insert might
+ * sleep.
+ */
+ SEND_ENBDIS(netdev, 0, vnicinfo->cmdrsp_rcv,
+ vnicinfo->datachan.chinfo.queueinfo,
+ &vnicinfo->datachan.chinfo.insertlock,
+ vnicinfo->datachan.chstat);
+
+ LOGINFNAME(vnicinfo->netdev,
+ "Waiting for ENBDIS ACK before freeing rcv buffers...\n");
+ /* wait for ack to arrive before we try to free rcv buffers
+ * NOTE: the other end automatically unposts the rcv buffers
+ * when it gets a disable.
+ */
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ while (!vnicinfo->enab_dis_acked) {
+ if (vnicinfo->server_down ||
+ vnicinfo->server_change_state) {
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ LOGERRNAME(vnicinfo->netdev,
+ "IOVM is down so disable will not be acknowledged. Stopping wait.\n");
+ return -1;
+ }
+ if ((timeout != VIRTNIC_INFINITE_RESPONSE_WAIT) &&
+ (wait >= timeout)) {
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ LOGERRNAME(vnicinfo->netdev,
+ "IOVM did not respond to Disable in allocated time (%d msecs).\n",
+ timeout);
+ return -1;
+ }
+ set_current_state(TASK_INTERRUPTIBLE);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ /* schedule_timeout() returns jiffies left, not time slept */
+ schedule_timeout(msecs_to_jiffies(10));
+ wait += 10;
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ }
+ LOGINFNAME(vnicinfo->netdev,
+ "Got ENBDIS ACK; now waiting for usage count to drop to 1...\n");
+
+ /*
+ * wait for usage to go to 1 (no other users) before freeing
+ * rcv buffers
+ */
+ if (atomic_read(&vnicinfo->usage) > 1) {
+ /* wait for usage count to be 1 */
+ while (1) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ schedule_timeout(msecs_to_jiffies(10));
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ if (atomic_read(&vnicinfo->usage) == 1) {
+ break; /* go do work and only after
+ that give up lock */
+ }
+ }
+ }
+ /* we've set enabled to 0, so we can give up the lock. */
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ LOGINFNAME(vnicinfo->netdev,
+ "Usage count is 1 (no other users); freeing the rcv buffers now\n");
+
+ /* free rcv buffers - other end has automatically unposted
+ * them on disable
+ */
+ for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
+ if (vnicinfo->rcvbuf[i]) {
+ kfree_skb(vnicinfo->rcvbuf[i]);
+ vnicinfo->rcvbuf[i] = NULL;
+ count++;
+ }
+ }
+ LOGINFNAME(vnicinfo->netdev, "Freed %d rcv bufs\n", count);
+
+ /* remove references from debug array */
+ for (i = 0; i < VIRTNICSOPENMAX; i++) {
+ if (num_virtnic_open[i].netdev == netdev) {
+ num_virtnic_open[i].netdev = NULL;
+ num_virtnic_open[i].vnicinfo = NULL;
+ break;
+ }
+ }
+
+ return 0;
+}
+
+/* Wait indefinitely for IOVM to acknowledge disable request */
+static int
+virtnic_disable(struct net_device *netdev)
+{
+ return virtnic_disable_with_timeout(netdev,
+ VIRTNIC_INFINITE_RESPONSE_WAIT);
+}
+
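The disable and enable paths both poll a flag every 10 ms until the IOVM acknowledges or the timeout runs out. A user-space sketch of that bounded poll follows, assuming VIRTNIC_INFINITE_RESPONSE_WAIT is 0 (as the enable comment states); wait_for_ack(), ack_after() and POLL_STEP_MS are hypothetical names, and the 10 ms sleep is only simulated. Elapsed time advances by the fixed step rather than by schedule_timeout()'s return value, which is the number of jiffies remaining (0 once the timeout expires) and so would never let a finite timeout elapse.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VIRTNIC_INFINITE_RESPONSE_WAIT (assumed 0). */
#define INFINITE_WAIT 0
#define POLL_STEP_MS 10

/* Hypothetical probe: true once the peer has acknowledged. 'elapsed_ms'
 * lets a caller simulate an ack that arrives after some delay. */
typedef bool (*ack_probe_fn)(void *ctx, int elapsed_ms);

/* Poll until acked or until timeout_ms has elapsed. Returns the elapsed
 * msecs when the ack was seen, or -1 on timeout. */
static int wait_for_ack(ack_probe_fn acked, void *ctx, int timeout_ms)
{
	int wait = 0;

	assert(timeout_ms >= 0);
	while (timeout_ms == INFINITE_WAIT || wait < timeout_ms) {
		if (acked(ctx, wait))
			return wait;
		/* the driver sleeps here; count the step, not the sleep's return */
		wait += POLL_STEP_MS;
	}
	/* final check after the deadline, like the driver's post-loop test */
	return acked(ctx, wait) ? wait : -1;
}

/* Example probe: acks once *ctx msecs have elapsed. */
static bool ack_after(void *ctx, int elapsed_ms)
{
	return elapsed_ms >= *(const int *)ctx;
}
```

With an ack delay of 30 ms, `wait_for_ack(ack_after, &delay, 100)` returns 30; with a delay longer than the timeout it returns -1.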
+/* Sends enable to IOVM, inits, and posts receive buffers to IOVM
+ * returns 0 on success, negative number on failure
+ *
+ * timeout is defined in msecs (timeout of 0 specifies infinite wait)
+ */
+static int
+virtnic_enable_with_timeout(struct net_device *netdev, const int timeout)
+{
+ int i;
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+ unsigned long flags;
+ int wait = 0;
+
+ /* NOTE: the other end automatically unposts the rcv buffers when
+ * it gets a disable.
+ */
+ i = init_rcv_bufs(netdev, vnicinfo);
+ if (i < 0)
+ return i;
+
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ vnicinfo->enabled = 1;
+ /* now we're ready, let's send an ENB to uisnic but until we
+ * get an ACK back from uisnic, we'll drop the packets
+ */
+ vnicinfo->enab_dis_acked = 0;
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ /* send enable and wait for ack - don't hold lock when sending
+ * enable because if the queue is full, insert might sleep.
+ */
+ SEND_ENBDIS(netdev, 1, vnicinfo->cmdrsp_rcv,
+ vnicinfo->datachan.chinfo.queueinfo,
+ &vnicinfo->datachan.chinfo.insertlock,
+ vnicinfo->datachan.chstat);
+
+ LOGINFNAME(vnicinfo->netdev, "netdev->name <<%s>>", netdev->name);
+ LOGINFNAME(vnicinfo->netdev,
+ "Waiting for ENBDIS ACK before starting device queue...\n");
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ while ((timeout == VIRTNIC_INFINITE_RESPONSE_WAIT) ||
+ (wait < timeout)) {
+ if (vnicinfo->enab_dis_acked) {
+ /* now we can continue */
+ break;
+ } else if (vnicinfo->server_down ||
+ vnicinfo->server_change_state) {
+ /* IOVM is going down so don't wait for a response */
+ LOGERRNAME(vnicinfo->netdev,
+ "IOVM is down so enable will not be acknowledged. Stopping wait.\n");
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ return -1;
+ }
+ set_current_state(TASK_INTERRUPTIBLE);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ /* schedule_timeout() returns the jiffies remaining (0 on expiry),
+ * so count the elapsed time in msecs ourselves
+ */
+ schedule_timeout(msecs_to_jiffies(10));
+ wait += 10;
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ }
+ /* priv_lock is held here however the loop exited */
+ if (!vnicinfo->enab_dis_acked) {
+ LOGERRNAME(vnicinfo->netdev,
+ "IOVM did not respond to Enable in allocated time (%d msecs).\n",
+ timeout);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ return -1;
+ }
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ LOGINFNAME(vnicinfo->netdev, "Got ENBDIS ACK\n");
+
+ /* find an open slot in the array to save off VirtNic
+ * references for debug
+ */
+ for (i = 0; i < VIRTNICSOPENMAX; i++) {
+ if (num_virtnic_open[i].netdev == NULL) {
+ num_virtnic_open[i].netdev = netdev;
+ num_virtnic_open[i].vnicinfo = vnicinfo;
+ break;
+ }
+ }
+ if (i == VIRTNICSOPENMAX)
+ LOGINFNAME(vnicinfo->netdev,
+ "No storage for debug ref for netdev = 0x%p vnicinfo = 0x%p\n",
+ netdev, vnicinfo);
+
+ return 0;
+}
+
+/* Wait indefinitely for IOVM to acknowledge enable request */
+static int
+virtnic_enable(struct net_device *netdev)
+{
+ return virtnic_enable_with_timeout(netdev,
+ VIRTNIC_INFINITE_RESPONSE_WAIT);
+}
+
+static void
+send_rcv_posts_if_needed(struct virtnic_info *vnicinfo)
+{
+ int i;
+ struct net_device *netdev;
+ struct uiscmdrsp *cmdrsp = vnicinfo->cmdrsp_rcv;
+ int cur_num_rcv_bufs_to_alloc, rcv_bufs_allocated;
+
+ if (!(vnicinfo->enabled && vnicinfo->enab_dis_acked)) {
+ /* don't do this until the vnic is marked ready. */
+ return;
+ }
+ netdev = vnicinfo->netdev;
+ rcv_bufs_allocated = 0;
+ /* bound the number of retry passes so we can't get stuck here
+ * forever, but still retry the bufs we couldn't allocate earlier.
+ */
+ cur_num_rcv_bufs_to_alloc = vnicinfo->num_rcv_bufs_could_not_alloc;
+ while (cur_num_rcv_bufs_to_alloc > 0) {
+ cur_num_rcv_bufs_to_alloc--;
+ for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
+ if (vnicinfo->rcvbuf[i] != NULL)
+ continue;
+ vnicinfo->rcvbuf[i] = alloc_rcv_buf(netdev);
+ if (!vnicinfo->rcvbuf[i]) {
+ LOGVER("**** %s FAILED to allocate new rcv buf - no REPOST\n",
+ netdev->name);
+ vnicinfo->
+ alloc_failed_in_if_needed_cnt++;
+ break;
+ } else {
+ rcv_bufs_allocated++;
+ post_skb(cmdrsp, vnicinfo,
+ vnicinfo->rcvbuf[i]);
+ vnicinfo->datachan.chstat.
+ extra_rcvbufs_sent++;
+ }
+ }
+ }
+ vnicinfo->num_rcv_bufs_could_not_alloc -= rcv_bufs_allocated;
+ if (vnicinfo->num_rcv_bufs_could_not_alloc > 0) {
+ /*
+ * this path means you failed to alloc an skb in the
+ * normal path, and you are trying again later, and
+ * it still fails.
+ */
+ LOGVER("attempted to recover buffers which could not be allocated and failed");
+ LOGVER("rcv_bufs_allocated=%d, num_rcv_bufs_could_not_alloc=%d",
+ rcv_bufs_allocated,
+ vnicinfo->num_rcv_bufs_could_not_alloc);
+ }
+}
+
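send_rcv_posts_if_needed() refills empty rcvbuf[] slots, making at most num_rcv_bufs_could_not_alloc passes and abandoning a pass at the first allocation failure. A user-space sketch of that bounded refill follows; refill_slots(), alloc_fn, budget_alloc() and token are hypothetical names, and the comment marks where the driver would call post_skb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook; returns NULL to simulate memory pressure. */
typedef void *(*alloc_fn)(void *ctx);

/* Fill NULL slots, making at most *deficit passes. A pass stops at the
 * first failed allocation. Returns the number of buffers allocated and
 * lowers *deficit by the same amount. */
static int refill_slots(void **slots, int nslots, alloc_fn alloc,
			void *ctx, int *deficit)
{
	int passes = *deficit;
	int allocated = 0;

	assert(slots != NULL && deficit != NULL);
	while (passes-- > 0) {
		for (int i = 0; i < nslots; i++) {
			if (slots[i] != NULL)
				continue;
			slots[i] = alloc(ctx);
			if (slots[i] == NULL)
				break;		/* give up this pass */
			allocated++;	/* driver would post_skb() here */
		}
	}
	*deficit -= allocated;
	return allocated;
}

/* Example allocator: hands out *budget tokens, then fails. */
static int token;
static void *budget_alloc(void *ctx)
{
	int *budget = ctx;

	if (*budget <= 0)
		return NULL;
	(*budget)--;
	return &token;
}
```

Bounding the passes by the outstanding deficit keeps a persistently failing allocator from pinning the receive thread; whatever is still missing is retried on the thread's next wakeup.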
+static void
+drain_queue(struct datachan *dc, struct uiscmdrsp *cmdrsp,
+ struct virtnic_info *vnicinfo)
+{
+ unsigned long flags;
+ int qrslt;
+ struct net_device *netdev;
+
+ /* drain queue */
+ while (1) {
+ spin_lock_irqsave(&dc->chinfo.insertlock, flags);
+ if (!spar_channel_client_acquire_os(dc->chinfo.queueinfo->chan,
+ "vnic")) {
+ spin_unlock_irqrestore(&dc->chinfo.insertlock,
+ flags);
+ break;
+ }
+ qrslt = uisqueue_get_cmdrsp(dc->chinfo.queueinfo, cmdrsp,
+ IOCHAN_FROM_IOPART);
+ spar_channel_client_release_os(dc->chinfo.queueinfo->chan,
+ "vnic");
+ spin_unlock_irqrestore(&dc->chinfo.insertlock, flags);
+ if (qrslt == 0)
+ break; /* queue empty */
+ DBGINF("%p cmdrsp->net.type:%d\n",
+ &dc->chinfo.queueinfo, cmdrsp->net.type);
+ switch (cmdrsp->net.type) {
+ case NET_RCV:
+ DBGINF("Got NET_RCV\n");
+ dc->chstat.got_rcv++;
+ /* process incoming packet */
+ virtnic_rx(cmdrsp);
+ break;
+ case NET_XMIT_DONE:
+ DBGINF("Got NET_XMIT_DONE %p\n", cmdrsp->net.buf);
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ dc->chstat.got_xmit_done++;
+ if (cmdrsp->net.xmtdone.xmt_done_result) {
+ LOGERRNAME(vnicinfo->netdev,
+ "XMIT_DONE failure buf:%p\n",
+ cmdrsp->net.buf);
+ dc->chstat.xmit_fail++;
+ }
+ /* only call queue wake if we stopped it */
+ netdev = ((struct sk_buff *)cmdrsp->net.buf)->dev;
+ /* ASSERT netdev == vnicinfo->netdev; */
+ if (netdev != vnicinfo->netdev) {
+ LOGERRNAME(vnicinfo->netdev, "NET_XMIT_DONE something wrong; vnicinfo->netdev:%p != cmdrsp->net.buf->dev:%p\n",
+ vnicinfo->netdev, netdev);
+ } else if (netif_queue_stopped(netdev)) {
+ /*
+ * check to see if we have crossed
+ * the lower watermark for
+ * netif_wake_queue()
+ */
+ if (((vnicinfo->datachan.chstat.sent_xmit >=
+ vnicinfo->datachan.chstat.got_xmit_done) &&
+ (vnicinfo->datachan.chstat.sent_xmit -
+ vnicinfo->datachan.chstat.got_xmit_done <=
+ vnicinfo->lower_threshold_net_xmits)) ||
+ ((vnicinfo->datachan.chstat.sent_xmit <
+ vnicinfo->datachan.chstat.got_xmit_done) &&
+ (ULONG_MAX -
+ vnicinfo->datachan.chstat.got_xmit_done
+ + vnicinfo->datachan.chstat.sent_xmit <=
+ vnicinfo->lower_threshold_net_xmits))) {
+ /*
+ * enough NET_XMITs completed
+ * so can restart netif queue
+ */
+ netif_wake_queue(netdev);
+ vnicinfo->flow_control_lower_hits++;
+ }
+ }
+ skb_unlink(cmdrsp->net.buf, &vnicinfo->xmitbufhead);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ kfree_skb(cmdrsp->net.buf);
+ break;
+ case NET_RCV_ENBDIS_ACK:
+ DBGINF("Got NET_RCV_ENBDIS_ACK on:%p\n",
+ (struct net_device *)
+ cmdrsp->net.enbdis.context);
+ dc->chstat.got_enbdisack++;
+ netdev = (struct net_device *)
+ cmdrsp->net.enbdis.context;
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ vnicinfo->enab_dis_acked = 1;
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ if (vnicinfo->server_down &&
+ vnicinfo->server_change_state) {
+ /* Inform Linux that the link is up */
+ vnicinfo->server_down = false;
+ vnicinfo->server_change_state = false;
+ netif_wake_queue(netdev);
+ netif_carrier_on(netdev);
+ }
+ break;
+ case NET_CONNECT_STATUS:
+ DBGINF("NET_CONNECT_STATUS, enable=:%d\n",
+ cmdrsp->net.enbdis.enable);
+ netdev = vnicinfo->netdev;
+ if (cmdrsp->net.enbdis.enable == 1) {
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ vnicinfo->enabled = cmdrsp->net.enbdis.enable;
+ spin_unlock_irqrestore(&vnicinfo->priv_lock,
+ flags);
+ netif_wake_queue(netdev);
+ netif_carrier_on(netdev);
+ } else {
+ netif_stop_queue(netdev);
+ netif_carrier_off(netdev);
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ vnicinfo->enabled = cmdrsp->net.enbdis.enable;
+ spin_unlock_irqrestore(&vnicinfo->priv_lock,
+ flags);
+ }
+ break;
+ default:
+ LOGERRNAME(vnicinfo->netdev,
+ "Invalid net type:%d in cmdrsp\n",
+ cmdrsp->net.type);
+ break;
+ }
+ /* cmdrsp is now available for reuse */
+
+ if (dc->chinfo.threadinfo.should_stop)
+ break;
+ }
+}
+
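The flow-control checks in drain_queue() and virtnic_xmit() count outstanding NET_XMITs with an explicit two-branch wrap test. Because sent_xmit and got_xmit_done appear to be unsigned long (they are compared against ULONG_MAX and printed with %lu), plain unsigned subtraction already yields the modular difference. The sketch below, with hypothetical helper names, compares the two forms and shows that the wrapped branch comes out one short.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Outstanding NET_XMITs via modular unsigned arithmetic: correct even
 * after 'sent' wraps past ULONG_MAX, provided fewer than ULONG_MAX + 1
 * transmits are really in flight. */
static unsigned long xmits_in_flight(unsigned long sent, unsigned long done)
{
	return sent - done;
}

/* The two-branch form used in the driver, for comparison; in the
 * wrapped branch it yields one less than the modular difference. */
static unsigned long xmits_in_flight_two_branch(unsigned long sent,
						unsigned long done)
{
	if (sent >= done)
		return sent - done;
	return ULONG_MAX - done + sent;
}

/* Lower-watermark test: wake the tx queue once in-flight <= lower. */
static bool should_wake_queue(unsigned long sent, unsigned long done,
			      unsigned long lower)
{
	return xmits_in_flight(sent, done) <= lower;
}
```

With sent = 2 just after a wrap and done = ULONG_MAX - 1, four transmits are outstanding. The modular form reports 4, and the two-branch form reports 3.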
+static int
+process_incoming_rsps(void *v)
+{
+ struct datachan *dc = v;
+ struct uiscmdrsp *cmdrsp = NULL;
+ const int SZ = SIZEOF_CMDRSP;
+ struct virtnic_info *vnicinfo;
+ struct channel_header __iomem *p_channel_header;
+ struct signal_queue_header __iomem *pqhdr;
+ uint64_t mask;
+ unsigned long long rc1;
+
+ UIS_DAEMONIZE("vnic_incoming");
+ DBGINF("In process_incoming_rsps pid:%d queueinfo:%p threadinfo:%p\n",
+ current->pid, dc->chinfo.queueinfo, &dc->chinfo.threadinfo);
+ /* alloc once and reuse */
+ vnicinfo = container_of(dc, struct virtnic_info, datachan);
+ cmdrsp = kmalloc(SZ, GFP_ATOMIC);
+ if (cmdrsp == NULL) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to malloc - thread exiting\n");
+ complete_and_exit(&dc->chinfo.threadinfo.has_stopped, 0);
+ }
+ p_channel_header = vnicinfo->datachan.chinfo.queueinfo->chan;
+ pqhdr =
+ (struct signal_queue_header __iomem *)
+ ((char __iomem *)p_channel_header +
+ readq(&p_channel_header->ch_space_offset)) +
+ IOCHAN_FROM_IOPART;
+ mask = ULTRA_CHANNEL_ENABLE_INTS;
+ while (1) {
+ wait_event_interruptible_timeout(
+ vnicinfo->rsp_queue, (atomic_read
+ (&vnicinfo->interrupt_rcvd) == 1),
+ msecs_to_jiffies(vnicinfo->thread_wait_ms));
+ /*
+ * periodically check to see if there are any rcv bufs which
+ * need to get sent to the iovm. This can only happen if
+ * we run out of memory when trying to allocate skbs.
+ */
+ atomic_set(&vnicinfo->interrupt_rcvd, 0);
+ send_rcv_posts_if_needed(vnicinfo);
+ drain_queue(dc, cmdrsp, vnicinfo);
+ rc1 = uisqueue_interlocked_or((uint64_t __iomem *)
+ vnicinfo->flags_addr, mask);
+ if (dc->chinfo.threadinfo.should_stop)
+ break;
+ }
+
+ kfree(cmdrsp);
+ DBGINF("In process_incoming_rsps exiting\n");
+ complete_and_exit(&dc->chinfo.threadinfo.has_stopped, 0);
+}
+
+/*****************************************************/
+/* NIC support functions called external */
+/*****************************************************/
+
+static int
+virtnic_change_mtu(struct net_device *netdev, int new_mtu)
+{
+ /*
+ * we cannot willy-nilly change the MTU; it has to come from
+ * CONTROL VM and all the vnics and pnics in a switch have to
+ * have the same MTU for everything to work.
+ */
+ LOGERRNAME(netdev, "netdev->name <<%s>>", netdev->name);
+ LOGERRNAME(netdev, "**** FAILED: MTU cannot be changed at this end.\n");
+ LOGERRNAME(netdev, "The same MTU is used for all the PNICs and VNICs in a switch.\n");
+ LOGERRNAME(netdev, "Please change MTU from the Resource Partition\n");
+ LOGERRNAME(netdev, "Current MTU is: %d\n", netdev->mtu);
+ return -EINVAL;
+}
+
+/*
+ * Called by kernel when ifconfig down is run.
+ * Returns 0 on success, negative value on failure.
+ */
+static int
+virtnic_close(struct net_device *netdev)
+{
+ /* this is called on ifconfig down but also if the device is
+ * being removed
+ */
+ LOGINFNAME(netdev, "Closing %p name:%s\n", netdev, netdev->name);
+
+ netif_stop_queue(netdev);
+ virtnic_disable(netdev);
+
+ LOGINFNAME(netdev, "Closed:%p\n", netdev);
+
+ return 0;
+}
+
+static int
+virtnic_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+ return -EOPNOTSUPP;
+}
+
+/*
+ * Called by kernel when ifconfig up is run.
+ * Returns 0 on success, negative value on failure.
+ */
+static int
+virtnic_open(struct net_device *netdev)
+{
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+ void *p = (__force void *)netdev->ip_ptr;
+
+ LOGINFNAME(vnicinfo->netdev,
+ "Opening %p name:%s allocating:%d rcvbufs mtu:%d\n", netdev,
+ netdev->name, vnicinfo->num_rcv_bufs, netdev->mtu);
+
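+ if (virtnic_enable(netdev) < 0) {
+ LOGERRNAME(vnicinfo->netdev, "**** FAILED to enable %s\n",
+ netdev->name);
+ return -EIO;
+ }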
+ /* start the interface's transmit queue, allowing it to accept
+ * packets for transmission
+ */
+ netif_start_queue(netdev);
+
+ LOGINFNAME(vnicinfo->netdev,
+ "Opened %p netdev->ip_ptr:%p name:%s %02x:%02x:%02x:%02x:%02x:%02x\n",
+ netdev, netdev->ip_ptr, netdev->name, netdev->dev_addr[0],
+ netdev->dev_addr[1], netdev->dev_addr[2],
+ netdev->dev_addr[3], netdev->dev_addr[4],
+ netdev->dev_addr[5]);
+
+ /*
+ * temporary trap to catch vnic inet addresses
+ * getting trashed
+ */
+ if (p != (__force void *)netdev->ip_ptr) {
+ LOGERRNAME(vnicinfo->netdev, "***********FAILURE HAPPENED\n");
+ LOGERRNAME(vnicinfo->netdev, " Test to catch if vnic inet addresses are getting trashed.\n");
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(msecs_to_jiffies(1000));
+ }
+ return 0;
+}
+
+static inline int
+repost_return(
+ struct uiscmdrsp *cmdrsp,
+ struct virtnic_info *vnicinfo,
+ struct sk_buff *skb,
+ struct net_device *netdev)
+{
+ struct net_pkt_rcv copy;
+ int i = 0, cc, numreposted;
+ int found_skb = 0;
+ int status = 0;
+
+ copy = cmdrsp->net.rcv;
+ LOGVER("REPOST_RETURN: realloc rcv skbs to replace:%d rcvbufs\n",
+ copy.numrcvbufs);
+ switch (copy.numrcvbufs) {
+ case 0:
+ vnicinfo->n_rcv0++;
+ break;
+ case 1:
+ vnicinfo->n_rcv1++;
+ break;
+ case 2:
+ vnicinfo->n_rcv2++;
+ break;
+ default:
+ vnicinfo->n_rcvx++;
+ break;
+ }
+ for (cc = 0, numreposted = 0; cc < copy.numrcvbufs; cc++) {
+ for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
+ if (vnicinfo->rcvbuf[i] != copy.rcvbuf[cc])
+ continue;
+
+ LOGVER("REPOST_RETURN: orphaning old rcvbuf[%d]:%p cc=%d",
+ i, vnicinfo->rcvbuf[i], cc);
+ vnicinfo->found_repost_rcvbuf_cnt++;
+ if ((skb) && vnicinfo->rcvbuf[i] == skb) {
+ found_skb = 1;
+ vnicinfo->repost_found_skb_cnt++;
+ }
+ vnicinfo->rcvbuf[i] = alloc_rcv_buf(netdev);
+ if (!vnicinfo->rcvbuf[i]) {
+ LOGVER("**** %s FAILED to reallocate new rcv buf - no REPOST, found_skb=%d, cc=%d, i=%d\n",
+ netdev->name, found_skb, cc, i);
+ vnicinfo->num_rcv_bufs_could_not_alloc++;
+ vnicinfo->alloc_failed_in_repost_return_cnt++;
+ status = -1;
+ break;
+ }
+ LOGVER("REPOST_RETURN: reposting new rcvbuf[%d]:%p\n",
+ i, vnicinfo->rcvbuf[i]);
+ post_skb(cmdrsp, vnicinfo, vnicinfo->rcvbuf[i]);
+ numreposted++;
+ break;
+ }
+ }
+ LOGVER("REPOST_RETURN: num rcvbufs posted:%d\n", numreposted);
+ if (numreposted != copy.numrcvbufs) {
+ LOGVER("**** %s FAILED to repost all the rcv bufs; numreposted:%d rcv.numrcvbufs:%d\n",
+ netdev->name, numreposted, copy.numrcvbufs);
+ vnicinfo->n_repost_deficit++;
+ status = -1;
+ }
+ if (skb) {
+ if (found_skb) {
+ LOGVER("REPOST_RETURN: skb is %p - freeing it", skb);
+ kfree_skb(skb);
+ } else {
+ LOGERRNAME(vnicinfo->netdev, "%s REPOST_RETURN: skb %p NOT found in rcvbuf list!!",
+ netdev->name, skb);
+ status = -3;
+ vnicinfo->bad_rcv_buf++;
+ }
+ }
+ atomic_dec(&vnicinfo->usage);
+ return status;
+}
+
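repost_return() matches each returned rcv buffer against the rcvbuf[] table and swaps a freshly allocated buffer into that slot. A reposted count lower than numrcvbufs is reported as a deficit. A user-space sketch of that match-and-replace step follows; replace_returned(), alloc_fresh() and fresh are hypothetical names, and the comment marks where the driver would repost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook; NULL simulates allocation failure. */
typedef void *(*alloc_fn)(void *ctx);

/* For each returned buffer, find its slot and swap in a fresh one.
 * Returns how many were replaced; fewer than nret means a deficit. */
static int replace_returned(void **slots, int nslots,
			    void *const *returned, int nret,
			    alloc_fn alloc, void *ctx)
{
	int reposted = 0;

	assert(slots != NULL && returned != NULL);
	for (int cc = 0; cc < nret; cc++) {
		for (int i = 0; i < nslots; i++) {
			if (slots[i] != returned[cc])
				continue;
			slots[i] = alloc(ctx);
			if (slots[i] != NULL)
				reposted++;	/* driver would post_skb() here */
			break;	/* a returned buffer matches one slot */
		}
	}
	return reposted;
}

/* Example allocator that always succeeds. */
static int fresh;
static void *alloc_fresh(void *ctx)
{
	(void)ctx;
	return &fresh;
}
```

Buffers that never match a slot are simply not counted, which is how the driver's `numreposted != copy.numrcvbufs` check detects them.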
+static void
+virtnic_rx(struct uiscmdrsp *cmdrsp)
+{
+ struct virtnic_info *vnicinfo;
+ struct sk_buff *skb, *prev, *curr;
+ struct net_device *netdev;
+ int cc, currsize, off, status;
+ struct ethhdr *eth;
+ unsigned long flags;
+#ifdef DEBUG
+ struct phys_info testfrags[MAX_PHYS_INFO];
+ int i;
+#endif
+
+/*
+ * Post a new rcv buf to the other end using the cmdrsp we have at hand.
+ * Post it without holding the lock; the signal lock synchronizes the
+ * queue insert. The cmdrsp that contains the net.rcv is the one we are
+ * using to repost, so copy the info we need from it.
+ */
+ skb = cmdrsp->net.buf;
+ netdev = skb->dev;
+
+ if (netdev)
+ DBGINF("in virtnic_rx %p %s len:%d\n", netdev, netdev->name,
+ cmdrsp->net.rcv.rcv_done_len);
+ else {
+ /* We must have previously downed this network device and
+ * this skb and device is no longer valid. This also means
+ * the skb reference was removed from virtnic->rcvbuf so no
+ * need to search for it.
+ * All we can do is free the skb and return.
+ * Note: We crash if we try to log this here.
+ */
+ kfree_skb(skb);
+ return;
+ }
+
+ vnicinfo = netdev_priv(netdev);
+
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ atomic_dec(&vnicinfo->num_rcv_bufs_in_iovm);
+
+ /* update rcv stats - call it with priv_lock held */
+ UPD_RCV_STATS;
+
+ atomic_inc(&vnicinfo->usage); /* don't want a close to happen before
+ we're done here */
+ /*
+ * set length to how much was ACTUALLY received -
+ * NOTE: rcv_done_len includes actual length of data rcvd
+ * including ethhdr
+ */
+ skb->len = cmdrsp->net.rcv.rcv_done_len;
+
+ /* test enabled while holding lock */
+ if (!(vnicinfo->enabled && vnicinfo->enab_dis_acked)) {
+ /*
+ * don't process it unless we're in enable mode and until
+ * we've gotten an ACK saying the other end got our RCV enable
+ */
+ LOGERRNAME(vnicinfo->netdev,
+ "%s dropping packet - perhaps old\n", netdev->name);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
+ LOGERRNAME(vnicinfo->netdev, "repost_return failed");
+ return;
+ }
+
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ /*
+ * when skb was allocated, skb->dev, skb->data, skb->len and
+ * skb->data_len were set up, AND data has already been put into the
+ * skb (both first frag and in frags pages).
+ * NOTE: firstfragslen is the amount of data in skb->data and not in
+ * nr_frags or frag_list; this is now simply RCVPOST_BUF_SIZE.
+ * Bump tail to show how much data is in the first frag, set data_len
+ * to show the rest, and see if we have to chain frag_list.
+ */
+ if (skb->len > RCVPOST_BUF_SIZE) { /* do PRECAUTIONARY check */
+ if (cmdrsp->net.rcv.numrcvbufs < 2) {
+ LOGERRNAME(vnicinfo->netdev, "**** %s Something is wrong; rcv_done_len:%d > RCVPOST_BUF_SIZE:%d but numrcvbufs:%d < 2\n",
+ netdev->name, skb->len, RCVPOST_BUF_SIZE,
+ cmdrsp->net.rcv.numrcvbufs);
+ if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
+ LOGERRNAME(vnicinfo->netdev,
+ "repost_return failed");
+ return;
+ }
+ /* length rcvd is greater than firstfrag in this skb rcv buf */
+ skb->tail += RCVPOST_BUF_SIZE; /* amount in skb->data */
+ skb->data_len = skb->len - RCVPOST_BUF_SIZE; /* amount that
+ will be in
+ frag_list */
+ DBGINF("len:%d data:%d\n", skb->len, skb->data_len);
+ } else {
+ /*
+ * data fits in this skb - no chaining - do PRECAUTIONARY check
+ */
+ if (cmdrsp->net.rcv.numrcvbufs != 1) { /* should be 1 */
+ LOGERRNAME(vnicinfo->netdev, "**** %s Something is wrong; rcv_done_len:%d <= RCVPOST_BUF_SIZE:%d but numrcvbufs:%d != 1\n",
+ netdev->name, skb->len, RCVPOST_BUF_SIZE,
+ cmdrsp->net.rcv.numrcvbufs);
+ if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
+ LOGERRNAME(vnicinfo->netdev,
+ "repost_return failed");
+ return;
+ }
+ skb->tail += skb->len;
+ skb->data_len = 0; /* nothing rcvd in frag_list */
+ }
+ off = skb_tail_pointer(skb) - skb->data;
+ /*
+ * amount we bumped tail by in the head skb
+ * it is used to calculate the size of each chained skb below
+ * it is also used to index into bufline to continue the copy
+ * (for chansocktwopc)
+ * if necessary chain the rcv skbs together.
+ * NOTE: index 0 has the same as cmdrsp->net.rcv.skb; we need to
+ * chain the rest to that one.
+ * - do PRECAUTIONARY check
+ */
+ if (cmdrsp->net.rcv.rcvbuf[0] != skb) {
+ LOGERRNAME(vnicinfo->netdev, "**** %s Something is wrong; rcvbuf[0]:%p != skb:%p\n",
+ netdev->name, cmdrsp->net.rcv.rcvbuf[0], skb);
+ if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
+ LOGERRNAME(vnicinfo->netdev, "repost_return failed");
+ return;
+ }
+
+ if (cmdrsp->net.rcv.numrcvbufs > 1) {
+ /* chain the various rcv buffers into the skb's frag_list. */
+ /* Note: off was initialized above */
+ for (cc = 1, prev = NULL;
+ cc < cmdrsp->net.rcv.numrcvbufs; cc++) {
+ curr = (struct sk_buff *)cmdrsp->net.rcv.rcvbuf[cc];
+ curr->next = NULL;
+ DBGINF("chaining skb:%p data:%p to skb:%p data:%p\n",
+ curr, curr->data, skb, skb->data);
+ if (prev == NULL) /* start of list- set head */
+ skb_shinfo(skb)->frag_list = curr;
+ else
+ prev->next = curr;
+ prev = curr;
+ /*
+ * should we set skb->len and skb->data_len for each
+ * buffer being chained??? can't hurt!
+ */
+ currsize =
+ min(skb->len - off,
+ (unsigned int)RCVPOST_BUF_SIZE);
+ curr->len = currsize;
+ curr->tail += currsize;
+ curr->data_len = 0;
+ off += currsize;
+ }
+#ifdef DEBUG
+ /* assert skb->len == off */
+ if (skb->len != off) {
+ LOGERRNAME(vnicinfo->netdev, "%s something wrong; skb->len:%d != off:%d\n",
+ netdev->name, skb->len, off);
+ }
+ /* test code */
+ cc = uisutil_copy_fragsinfo_from_skb("rcvchaintest", skb,
+ RCVPOST_BUF_SIZE,
+ MAX_PHYS_INFO, testfrags);
+ LOGINFNAME(vnicinfo->netdev, "rcvchaintest returned:%d\n", cc);
+ if (cc != cmdrsp->net.rcv.numrcvbufs) {
+ LOGERRNAME(vnicinfo->netdev, "**** %s Something wrong; rcvd chain length %d different from one we calculated %d\n",
+ netdev->name, cmdrsp->net.rcv.numrcvbufs,
+ cc);
+ }
+ for (i = 0; i < cc; i++) {
+ LOGINFNAME(vnicinfo->netdev, "test:RCVPOST_BUF_SIZE:%d[%d] pfn:%llu off:0x%x len:%d\n",
+ RCVPOST_BUF_SIZE, i, testfrags[i].pi_pfn,
+ testfrags[i].pi_off, testfrags[i].pi_len);
+ }
+#endif
+ }
+
+ /* set up packet's protocol type using ethernet header - this
+ * sets up skb->pkt_type & it also PULLS out the eth header
+ */
+ skb->protocol = eth_type_trans(skb, netdev);
+
+ eth = eth_hdr(skb);
+
+ DBGINF("%d Src:%02x:%02x:%02x:%02x:%02x:%02x Dest:%02x:%02x:%02x:%02x:%02x:%02x proto:%x\n",
+ skb->pkt_type, eth->h_source[0], eth->h_source[1],
+ eth->h_source[2], eth->h_source[3], eth->h_source[4],
+ eth->h_source[5], eth->h_dest[0], eth->h_dest[1], eth->h_dest[2],
+ eth->h_dest[3], eth->h_dest[4], eth->h_dest[5], eth->h_proto);
+
+ skb->csum = 0;
+ skb->ip_summed = CHECKSUM_NONE; /* not verified here; the
+ stack will verify the checksum */
+
+ do {
+ if (netdev->flags & IFF_PROMISC) {
+ DBGINF("IFF_PROMISC is set.\n");
+ break; /* accept all packets */
+ }
+ if (skb->pkt_type == PACKET_BROADCAST) {
+ DBGINF("packet is broadcast.\n");
+ if (netdev->flags & IFF_BROADCAST) {
+ DBGINF("IFF_BROADCAST is set.\n");
+ break; /* accept all broadcast packets */
+ }
+ } else if (skb->pkt_type == PACKET_MULTICAST) {
+ DBGINF("packet is multicast.\n");
+ if (netdev->flags & IFF_ALLMULTI)
+ DBGINF("IFF_ALLMULTI is set.\n");
+ if ((netdev->flags & IFF_MULTICAST) &&
+ (netdev_mc_count(netdev))) {
+ struct netdev_hw_addr *ha;
+ int found_mc = 0;
+
+ DBGINF("IFF_MULTICAST is set %d.\n",
+ netdev_mc_count(netdev));
+ /*
+ * only accept multicast packets that we can
+ * find in our multicast address list
+ */
+ netdev_for_each_mc_addr(ha, netdev) {
+ if (memcmp
+ (eth->h_dest, ha->addr,
+ MAX_MACADDR_LEN) == 0) {
+ DBGINF("multicast address is in our list.\n");
+ found_mc = 1;
+ break;
+ }
+ }
+ if (found_mc) {
+ break; /* accept packet, dest
+ matches a multicast
+ address */
+ }
+ }
+ } else if (skb->pkt_type == PACKET_HOST) {
+ DBGINF("packet is directed.\n");
+ break; /* accept packet, h_dest must match vnic
+ mac address */
+ } else if (skb->pkt_type == PACKET_OTHERHOST) {
+ /* something is not right */
+ LOGERRNAME(vnicinfo->netdev, "**** FAILED to deliver rcv packet to OS; name:%s Dest:%02x:%02x:%02x:%02x:%02x:%02x VNIC:%02x:%02x:%02x:%02x:%02x:%02x\n",
+ netdev->name, eth->h_dest[0], eth->h_dest[1],
+ eth->h_dest[2], eth->h_dest[3],
+ eth->h_dest[4], eth->h_dest[5],
+ netdev->dev_addr[0], netdev->dev_addr[1],
+ netdev->dev_addr[2], netdev->dev_addr[3],
+ netdev->dev_addr[4], netdev->dev_addr[5]);
+ }
+ /* drop packet - don't forward it up to OS */
+ DBGINF("we cannot indicate this recv pkt! (netdev->flags:0x%04x, skb->pkt_type:0x%02x).\n",
+ netdev->flags, skb->pkt_type);
+ vnicinfo->n_rcv_packet_not_accepted++;
+ if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
+ LOGERRNAME(vnicinfo->netdev, "repost_return failed");
+ return;
+ } while (0);
+
+ DBGINF("Calling netif_rx skb:%p head:%p end:%p data:%p tail:%p len:%d data_len:%d skb->nr_frags:%d\n",
+ skb, skb->head, skb->end, skb->data, skb->tail, skb->len,
+ skb->data_len, skb_shinfo(skb)->nr_frags);
+
+ status = netif_rx(skb);
+ if (status != NET_RX_SUCCESS)
+ LOGWRNNAME(vnicinfo->netdev, "status=%d\n", status);
+ /*
+ * netif_rx() returns NET_RX_SUCCESS or NET_RX_DROP, but "in
+ * practice most drivers ignore the return value"
+ */
+
+ skb = NULL;
+ /*
+ * whether the packet got dropped or handled, the skb is freed by
+ * kernel code, so we shouldn't free it. but we should repost a
+ * new rcv buffer.
+ */
+ if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
+ LOGVER("repost_return failed");
+ return;
+}
+
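virtnic_rx() places the first RCVPOST_BUF_SIZE bytes of a frame in the head skb and chains the rest, RCVPOST_BUF_SIZE bytes at a time, through frag_list. Its PRECAUTIONARY checks only test numrcvbufs against 1 and 2. The sketch below computes the exact buffer count and per-buffer byte count for a given rcv_done_len. RCVPOST_BUF_SIZE is given an assumed value of 2048 for illustration only, because the real definition is not shown in this hunk, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed value for illustration; the driver defines the real one. */
#define RCVPOST_BUF_SIZE 2048u

/* Buffers a frame of 'len' bytes should occupy: ceil(len / size),
 * with at least one (the head skb). */
static unsigned int expected_rcvbufs(unsigned int len)
{
	assert(len <= 0xffff0000u);	/* keep the rounding add from overflowing */
	if (len == 0)
		return 1;
	return (len + RCVPOST_BUF_SIZE - 1) / RCVPOST_BUF_SIZE;
}

/* Bytes that land in buffer 'idx' (0 = head skb), mirroring the
 * min(skb->len - off, RCVPOST_BUF_SIZE) step in the chaining loop. */
static unsigned int bytes_in_buf(unsigned int len, unsigned int idx)
{
	unsigned int off = idx * RCVPOST_BUF_SIZE;

	if (off >= len)
		return 0;
	return (len - off < RCVPOST_BUF_SIZE) ? len - off : RCVPOST_BUF_SIZE;
}
```

For a 9000-byte frame this gives 5 buffers: four full ones and a fifth holding 808 bytes. A stricter check could compare numrcvbufs with expected_rcvbufs(rcv_done_len) directly.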
+/*
+ * This function is protected from concurrent calls by a spinlock xmit_lock
+ * in the net_device struct, but as soon as the function returns it can be
+ * called again.
+ * Returns NETDEV_TX_OK (0) if the packet was accepted, NETDEV_TX_BUSY otherwise.
+ */
+static int
+virtnic_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct virtnic_info *vnicinfo;
+ int len, firstfraglen, padlen;
+ struct uiscmdrsp *cmdrsp = NULL;
+ unsigned long flags;
+ int qrslt;
+
+/* Note: NETDEV_TX_OK is 0, NETDEV_TX_BUSY is 1. */
+#define BUSY { \
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags); \
+ vnicinfo->busy_cnt++; \
+ return NETDEV_TX_BUSY; \
+}
+
+/* return value NETDEV_TX_OK == 0 */
+ DBGINF("got xmit for netdev:%p %s len:%d ip_summed:%d skb->data:%p data_len:%d transport_hdr:%p maxdatalen:%d\n",
+ netdev, netdev->name, skb->len, skb->ip_summed, skb->data,
+ skb->data_len, skb_transport_header(skb),
+ (int)(skb_end_pointer(skb) - skb->data));
+
+ vnicinfo = netdev_priv(netdev);
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ /*Modified for Trac #2395 FIX TEL_CKS */
+ if (netif_queue_stopped(netdev)) {
+ LOGINFNAME(vnicinfo->netdev,
+ "Returning Busy because queue is stopped\n");
+ BUSY;
+ }
+ if (vnicinfo->server_down || vnicinfo->server_change_state) {
+ LOGINFNAME(vnicinfo->netdev, "Returning BUSY because server is down/changing state\n");
+ BUSY;
+ }
+ /*
+ * sk_buff struct is used to host network data throughout all the
+ * Linux network subsystems
+ */
+ len = skb->len;
+ /*
+ * skb->len is the FULL length of data (including fragmentary portion)
+ * skb->data_len is the length of the fragment portion in frags
+ * skb->len - skb->data_len is the size of the 1st fragment in skb->data
+ * calculate the length of the first fragment that skb->data is
+ * pointing to
+ */
+ firstfraglen = skb->len - skb->data_len;
+ if (firstfraglen < ETH_HEADER_SIZE) {
+ LOGERRNAME(vnicinfo->netdev, "first fragment in skb->data too small for ethernet header len:%d data_len:%d\n",
+ skb->len, skb->data_len);
+ BUSY; /* NOT LIKELY TO HAPPEN */
+ }
+
+ if ((len < ETH_MIN_PACKET_SIZE) &&
+ ((skb_end_pointer(skb) - skb->data) >= ETH_MIN_PACKET_SIZE)) {
+ /* pad the packet out to minimum size */
+ padlen = ETH_MIN_PACKET_SIZE - len;
+ DBGINF("padding %d\n", padlen);
+ memset(&skb->data[len], 0, padlen);
+ skb->tail += padlen;
+ skb->len += padlen;
+ len += padlen;
+ firstfraglen += padlen;
+ }
+
+ cmdrsp = vnicinfo->xmit_cmdrsp;
+ /* clear cmdrsp */
+ memset(cmdrsp, 0, SIZEOF_CMDRSP);
+ cmdrsp->net.type = NET_XMIT;
+ cmdrsp->cmdtype = CMD_NET_TYPE;
+
+ /* save the pointer to skb - we'll need it for completion */
+ cmdrsp->net.buf = skb;
+
+ if (((vnicinfo->datachan.chstat.sent_xmit >=
+ vnicinfo->datachan.chstat.got_xmit_done) &&
+ (vnicinfo->datachan.chstat.sent_xmit -
+ vnicinfo->datachan.chstat.got_xmit_done >=
+ vnicinfo->max_outstanding_net_xmits)) ||
+ /* OR check wrap condition */
+ ((vnicinfo->datachan.chstat.sent_xmit <
+ vnicinfo->datachan.chstat.got_xmit_done) &&
+ (ULONG_MAX - vnicinfo->datachan.chstat.got_xmit_done +
+ vnicinfo->datachan.chstat.sent_xmit >=
+ vnicinfo->max_outstanding_net_xmits))
+ ) {
+ /*
+ * too many NET_XMITs queued over to IOVM - need to wait
+ * Might need to remove the below message as these might be
+ * excessive under load.
+ */
+ vnicinfo->datachan.chstat.reject_count++;
+ if (!vnicinfo->queuefullmsg_logged &&
+ ((vnicinfo->datachan.chstat.reject_count & 0x3ff) ==
+ 1)) {
+ vnicinfo->queuefullmsg_logged = 1;
+#if VIRTNIC_STATS
+ vnicinfo->datachan.chstat.reject_jiffies_start =
+ jiffies;
+#endif
+ LOGINFNAME(vnicinfo->netdev, "**** REJECTING NET_XMIT - rejected count=%ld chstat.sent_xmit=%lu chstat.got_xmit_done=%lu\n",
+ vnicinfo->datachan.chstat.reject_count,
+ vnicinfo->datachan.chstat.sent_xmit,
+ vnicinfo->datachan.chstat.got_xmit_done);
+ }
+ netif_stop_queue(netdev); /* calling stop queue */
+ BUSY; /* return status that packet not accepted */
+ } else if (vnicinfo->queuefullmsg_logged) {
+#if VIRTNIC_STATS
+ LOGINFNAME(vnicinfo->netdev, "**** NET_XMITs now working again - rejected count = %ld msec = %ld\n",
+ vnicinfo->datachan.chstat.reject_count,
+ ((long)jiffies -
+ (long)(vnicinfo->datachan.chstat.
+ reject_jiffies_start)) * 1000 / HZ);
+#else
+ LOGINFNAME(vnicinfo->netdev, "**** NET_XMITs now working again - rejected count = %ld\n",
+ vnicinfo->datachan.chstat.reject_count);
+#endif
+ /* queue is not blocked so reset the logging flag */
+ vnicinfo->queuefullmsg_logged = 0;
+ }
+
+ if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
+ DBGINF("CHECKSUM_UNNECESSARY protocol:%x csum:%x gso_size:%x data:%p transport_hdr:%p network_hdr:%p\n",
+ skb->protocol, skb->csum, skb_shinfo(skb)->gso_size,
+ skb->data, skb_transport_header(skb),
+ skb_network_header(skb));
+ cmdrsp->net.xmt.lincsum.valid = 1;
+ cmdrsp->net.xmt.lincsum.protocol = skb->protocol;
+ if (skb_transport_header(skb) > skb->data) {
+ cmdrsp->net.xmt.lincsum.hrawoff =
+ skb_transport_header(skb) - skb->data;
+ cmdrsp->net.xmt.lincsum.hrawoffv = 1;
+ }
+ if (skb_network_header(skb) > skb->data) {
+ cmdrsp->net.xmt.lincsum.nhrawoff =
+ skb_network_header(skb) - skb->data;
+ cmdrsp->net.xmt.lincsum.nhrawoffv = 1;
+ }
+ cmdrsp->net.xmt.lincsum.csum = skb->csum;
+ } else {
+ cmdrsp->net.xmt.lincsum.valid = 0;
+ }
+ /* save off the length of the entire data packet */
+ cmdrsp->net.xmt.len = len; /* total data length */
+ /*
+ * copy ethernet header from first frag into cmdrsp
+ * - everything else will be passed in frags & DMA'ed
+ */
+ memcpy(cmdrsp->net.xmt.ethhdr, skb->data, ETH_HEADER_SIZE);
+ /*
+ * copy frags info - from skb->data we need to only provide access
+ * beyond eth header
+ */
+ cmdrsp->net.xmt.num_frags =
+ uisutil_copy_fragsinfo_from_skb("virtnic_xmit", skb, firstfraglen,
+ MAX_PHYS_INFO,
+ cmdrsp->net.xmt.frags);
+ if (cmdrsp->net.xmt.num_frags == -1) {
+ LOGERRNAME(vnicinfo->netdev, "**** FAILED to copy fragsinfo\n");
+ BUSY; /* WILL HAPPEN ONLY IF FRAG ARRAY WITH
+ MAX_PHYS_INFO ENTRIES IS NOT ENOUGH */
+ }
+
+ DBGINF("Forwarding packet cmdrsp:%p\n", cmdrsp);
+
+ /*
+ * priv_lock is still held here, so ask for an insert that does not
+ * wait: if the queue is full it fails rather than sleeping
+ */
+ qrslt = uisqueue_put_cmdrsp_with_lock_client(
+ vnicinfo->datachan.chinfo.queueinfo, cmdrsp,
+ IOCHAN_TO_IOPART,
+ (void *)&vnicinfo->datachan.chinfo.insertlock,
+ DONT_ISSUE_INTERRUPT, (uint64_t)NULL,
+ 0 /* don't wait */ ,
+ "vnic");
+ if (!qrslt) {
+ /* failed to queue xmit - return busy */
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to insert NET_XMIT\n");
+ netif_stop_queue(netdev); /* calling stop queue */
+ BUSY; /* return status that packet not accepted */
+ }
+ /* Track the skbs that have been sent to the IOVM for XMIT */
+ skb_queue_head(&vnicinfo->xmitbufhead, skb);
+
+ /*
+ * set the last transmission start time
+ * The Linux docs say: do not forget to update netdev->trans_start to
+ * jiffies after each new tx packet is given to the hardware.
+ */
+ netdev->trans_start = jiffies; /* some code in Linux uses this. */
+
+ /* update xmt stats */
+ UPD_XMT_STATS;
+ vnicinfo->datachan.chstat.sent_xmit++;
+
+ /*
+ * check to see if we have hit the high watermark for
+ * netif_stop_queue()
+ */
+ if (((vnicinfo->datachan.chstat.sent_xmit >=
+ vnicinfo->datachan.chstat.got_xmit_done) &&
+ (vnicinfo->datachan.chstat.sent_xmit -
+ vnicinfo->datachan.chstat.got_xmit_done >=
+ vnicinfo->upper_threshold_net_xmits)) ||
+ /* OR check wrap condition */
+ ((vnicinfo->datachan.chstat.sent_xmit <
+ vnicinfo->datachan.chstat.got_xmit_done) &&
+ (ULONG_MAX - vnicinfo->datachan.chstat.got_xmit_done +
+ vnicinfo->datachan.chstat.sent_xmit >=
+ vnicinfo->upper_threshold_net_xmits))
+ ) {
+ /* too many NET_XMITs queued over to IOVM - need to wait */
+ netif_stop_queue(netdev); /* calling stop queue - call
+ netif_wake_queue() after lower
+ threshold */
+ vnicinfo->flow_control_upper_hits++;
+ }
+
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ /* skb will be freed when we get back NET_XMIT_DONE */
+ return NETDEV_TX_OK;
+}
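The high-watermark check in virtnic_xmit() handles counter wraparound with a separate branch. A minimal userspace sketch, assuming sent_xmit and got_xmit_done are plain unsigned long counters (as the %lu formats in info_debugfs_read() suggest): C defines unsigned subtraction modulo ULONG_MAX + 1, so sent - done already yields the in-flight count after a wrap. The hand-written wrap branch computes ULONG_MAX - done + sent, which appears to be one less than that distance. The helper names below are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Number of transmits handed to the IOVM but not yet completed.
 * Unsigned subtraction wraps modulo ULONG_MAX + 1, so this stays
 * correct even after 'sent' has wrapped past ULONG_MAX. */
static unsigned long xmits_in_flight(unsigned long sent, unsigned long done)
{
	return sent - done;
}

/* Hypothetical helper: true when the queue should be stopped. */
static bool over_upper_threshold(unsigned long sent, unsigned long done,
				 unsigned long upper)
{
	return xmits_in_flight(sent, done) >= upper;
}
```

With this form the two-branch condition collapses into a single comparison, and no explicit wrap test is needed.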
+
+static void
+virtnic_serverdown_complete(struct work_struct *work)
+{
+ struct virtnic_info *vnicinfo;
+ struct net_device *netdev;
+ struct virtpci_dev *virtpcidev;
+ unsigned long flags;
+ int i = 0, count = 0;
+
+ vnicinfo =
+ container_of(work, struct virtnic_info, serverdown_completion);
+ netdev = vnicinfo->netdev;
+ virtpcidev = vnicinfo->virtpcidev;
+
+ DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>", virtpcidev->bus_no,
+ virtpcidev->device_no);
+ DBGINF("net_device name<<%s>>", netdev->name);
+ /* Stop Using Datachan */
+ uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
+
+ /* Inform Linux that the link is down */
+ netif_carrier_off(netdev);
+ netif_stop_queue(netdev);
+
+ /*
+ * Free the skbs for XMITs that haven't been serviced by the server.
+ * We shouldn't have to inform Linux about these IOs because they
+ * are "lost in the ethernet".
+ */
+ skb_queue_purge(&vnicinfo->xmitbufhead);
+
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ /* free rcv buffers */
+ for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
+ if (vnicinfo->rcvbuf[i]) {
+ kfree_skb(vnicinfo->rcvbuf[i]);
+ vnicinfo->rcvbuf[i] = NULL;
+ count++;
+ }
+ }
+ atomic_set(&vnicinfo->num_rcv_bufs_in_iovm, 0);
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ LOGINFNAME(vnicinfo->netdev, "Closed:%p Freed %d rcv bufs\n", netdev,
+ count);
+
+ vnicinfo->server_down = true;
+ vnicinfo->server_change_state = false;
+ visorchipset_device_pause_response(virtpcidev->bus_no,
+ virtpcidev->device_no, 0);
+}
+
+/* As per VirtpciFunc returns 1 for success and 0 for failure */
+static int
+virtnic_serverdown(struct virtpci_dev *virtpcidev, u32 state)
+{
+ struct net_device *netdev = virtpcidev->net.netdev;
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+
+ DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>", virtpcidev->bus_no,
+ virtpcidev->device_no);
+ DBGINF("entering virtnic_serverdown");
+
+ if (!vnicinfo->server_down && !vnicinfo->server_change_state) {
+ vnicinfo->server_change_state = true;
+ queue_work(virtnic_serverdown_workqueue,
+ &vnicinfo->serverdown_completion);
+ } else if (vnicinfo->server_change_state) {
+ LOGERRNAME(vnicinfo->netdev,
+ "Server already processing change state message.");
+ return 0;
+ } else {
+ LOGERRNAME(vnicinfo->netdev,
+ "Server already down, but another server down message received.");
+ }
+ DBGINF("exiting virtnic_serverdown");
+ return 1;
+}
+
+/* As per VirtpciFunc returns 1 for success and 0 for failure */
+static int
+virtnic_serverup(struct virtpci_dev *virtpcidev)
+{
+ struct net_device *netdev = virtpcidev->net.netdev;
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+ unsigned long flags;
+
+ DBGINF("entering virtnic_serverup");
+ DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>", virtpcidev->bus_no,
+ virtpcidev->device_no);
+ DBGINF("net_device name<<%s>>", netdev->name);
+ if (vnicinfo->server_down && !vnicinfo->server_change_state) {
+ vnicinfo->server_change_state = true;
+ /*
+ * Must transition channel to ATTACHED state BEFORE we can
+ * start using the device again
+ */
+ SPAR_CHANNEL_CLIENT_TRANSITION(vnicinfo->datachan.chinfo.
+ queueinfo->chan,
+ dev_name(&virtpcidev->
+ generic_dev),
+ CHANNELCLI_ATTACHED, NULL);
+
+ if (!uisthread_start(&vnicinfo->datachan.chinfo.threadinfo,
+ process_incoming_rsps,
+ &vnicinfo->datachan, "vnic_incoming")) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to start thread\n");
+ return 0;
+ }
+
+ init_rcv_bufs(netdev, vnicinfo);
+
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ vnicinfo->enabled = 1;
+ /*
+ * now we're ready, let's send an ENB to uisnic
+ * but until we get an ACK back from uisnic, we'll drop
+ * the packets
+ */
+ vnicinfo->enab_dis_acked = 0;
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ /*
+ * send enable and wait for ack - don't hold lock when
+ * sending enable because if the queue is full, insert
+ * might sleep.
+ */
+ SEND_ENBDIS(netdev, 1, vnicinfo->cmdrsp_rcv,
+ vnicinfo->datachan.chinfo.queueinfo,
+ &vnicinfo->datachan.chinfo.insertlock,
+ vnicinfo->datachan.chstat);
+ } else if (vnicinfo->server_change_state) {
+ LOGERRNAME(vnicinfo->netdev,
+ "Server already processing change state message.");
+ return 0;
+ } else {
+ DBGINF("Server up message received for server that was already up.");
+ }
+ DBGINF("exiting virtnic_serverup");
+ return 1;
+}
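virtnic_serverdown() and virtnic_serverup() gate every transition on the server_down and server_change_state flags, returning 1 for success and 0 for failure. The sketch below models only that flag logic in userspace; struct link_state and the model_* functions are hypothetical, the thread-start failure path of serverup is omitted, and clearing server_change_state after a successful serverup is assumed to happen elsewhere in the driver (it is not shown in this excerpt).

```c
#include <stdbool.h>

/* Hypothetical mirror of the two flags kept in struct virtnic_info. */
struct link_state {
	bool server_down;
	bool server_change_state;
};

/* Decision logic of virtnic_serverdown(). Sets *queued when the
 * completion work would be scheduled with queue_work(). */
static int model_serverdown(struct link_state *s, bool *queued)
{
	*queued = false;
	if (!s->server_down && !s->server_change_state) {
		s->server_change_state = true;
		*queued = true;
		return 1;
	}
	if (s->server_change_state)
		return 0;	/* a transition is already in progress */
	return 1;		/* already down: logged, but accepted */
}

/* What virtnic_serverdown_complete() does to the flags. */
static void model_serverdown_complete(struct link_state *s)
{
	s->server_down = true;
	s->server_change_state = false;
}

/* Decision logic of virtnic_serverup(). */
static int model_serverup(struct link_state *s)
{
	if (s->server_down && !s->server_change_state) {
		s->server_change_state = true;
		return 1;
	}
	if (s->server_change_state)
		return 0;
	return 1;		/* already up */
}
```

The key property is that a second request arriving while a transition is in flight is refused rather than queued twice.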
+
+static void
+virtnic_timeout_reset(struct work_struct *work)
+{
+ struct virtnic_info *vnicinfo;
+ struct net_device *netdev;
+ struct virtpci_dev *virtpcidev;
+ int response = 0;
+
+ vnicinfo = container_of(work, struct virtnic_info, timeout_reset);
+ netdev = vnicinfo->netdev;
+
+ DBGINF("net_device name<<%s>>", netdev->name);
+ /*
+ * Transmit timeouts are typically handled by resetting the
+ * device. For our virtual NIC, we send a Disable and an
+ * Enable to the IOVM; if it doesn't respond, we trigger
+ * a serverdown.
+ */
+ DBGINF("Disabling connection to server.\n");
+ netif_stop_queue(netdev);
+ response = virtnic_disable_with_timeout(netdev, 100);
+ if (response != 0)
+ goto call_serverdown;
+
+ DBGINF("Disable returned so reenable connection to server.\n");
+ response = virtnic_enable_with_timeout(netdev, 100);
+ if (response != 0)
+ goto call_serverdown;
+ netif_wake_queue(netdev);
+
+ LOGWRNNAME(vnicinfo->netdev, "Virtual connection reset.\n");
+ return;
+
+call_serverdown:
+ LOGERRNAME(vnicinfo->netdev,
+ "Disable/enable pair failed to return, so start serverdown.\n");
+ virtpcidev = vnicinfo->virtpcidev;
+ virtnic_serverdown(virtpcidev, 0);
+ return;
+}
+
+static void
+virtnic_xmit_timeout(struct net_device *netdev)
+{
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+ unsigned long flags;
+
+ LOGWRNNAME(vnicinfo->netdev,
+ "Transmit Timeout. Resetting virtual connection.\n");
+ LOGWRNNAME(vnicinfo->netdev, "net_device name<<%s>>", netdev->name);
+
+ spin_lock_irqsave(&vnicinfo->priv_lock, flags);
+ /* Ensure that a ServerDown message hasn't been received */
+ if (!vnicinfo->enabled ||
+ (vnicinfo->server_down && !vnicinfo->server_change_state)) {
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+ return;
+ }
+ spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
+
+ queue_work(virtnic_timeout_reset_workqueue, &vnicinfo->timeout_reset);
+}
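virtnic_xmit_timeout() defers the real work to a workqueue, and virtnic_timeout_reset() tries a Disable/Enable pair before escalating to a serverdown. A userspace sketch of that escalation, assuming hypothetical callbacks in place of virtnic_disable_with_timeout(), virtnic_enable_with_timeout() and virtnic_serverdown(); the timeout value 100 is copied from the patch, whose unit is not stated there. Queue stop/wake handling is left out.

```c
#include <stdbool.h>

/* Hypothetical callbacks; disable/enable return 0 when the IOVM
 * acknowledged within the timeout. */
struct reset_ops {
	int (*disable)(void *ctx, int timeout);
	int (*enable)(void *ctx, int timeout);
	void (*serverdown)(void *ctx);
};

/* Disable, then re-enable; if either step fails, escalate to a
 * serverdown. Short-circuit evaluation skips enable after a failed
 * disable, matching the goto in the patch. */
static bool reset_connection(const struct reset_ops *ops, void *ctx)
{
	if (ops->disable(ctx, 100) != 0 || ops->enable(ctx, 100) != 0) {
		ops->serverdown(ctx);
		return false;
	}
	return true;
}

/* Example stubs: ctx is an int[4] holding
 * { disable_rc, enable_rc, serverdown_calls, enable_calls }. */
static int stub_disable(void *ctx, int timeout)
{
	(void)timeout;
	return ((int *)ctx)[0];
}

static int stub_enable(void *ctx, int timeout)
{
	(void)timeout;
	((int *)ctx)[3]++;
	return ((int *)ctx)[1];
}

static void stub_serverdown(void *ctx)
{
	((int *)ctx)[2]++;
}

static const struct reset_ops stub_ops = {
	stub_disable, stub_enable, stub_serverdown
};
```

Running the reset from process context (a workqueue) matters because waiting for an acknowledgement may sleep, which the ndo_tx_timeout callback itself should not do.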
+
+static void
+virtnic_set_multi(struct net_device *netdev)
+{
+ struct uiscmdrsp *cmdrsp;
+ struct virtnic_info *vnicinfo = netdev_priv(netdev);
+
+ DBGINF("net_device name<<%s>>", netdev->name);
+ DBGINF("entering virtnic_set_multi\n");
+
+ /* any filtering changes? */
+ if (vnicinfo->old_flags != netdev->flags) {
+ LOGINFNAME(vnicinfo->netdev,
+ "old filter = 0x%04x, new filter = 0x%04x.\n",
+ vnicinfo->old_flags, netdev->flags);
+ if ((netdev->flags & IFF_PROMISC) !=
+ (vnicinfo->old_flags & IFF_PROMISC)) {
+ LOGINFNAME(vnicinfo->netdev,
+ "we are %s promiscuous mode.\n",
+ (netdev->
+ flags & IFF_PROMISC) ? "entering" :
+ "exiting");
+ cmdrsp = kmalloc(SIZEOF_CMDRSP, GFP_ATOMIC);
+ if (cmdrsp == NULL) {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to kmalloc cmdrsp.\n");
+ return;
+ }
+ memset(cmdrsp, 0, SIZEOF_CMDRSP);
+ cmdrsp->cmdtype = CMD_NET_TYPE;
+ cmdrsp->net.type = NET_RCV_PROMISC;
+ cmdrsp->net.enbdis.context = netdev;
+ cmdrsp->net.enbdis.enable =
+ (netdev->flags & IFF_PROMISC);
+ if (uisqueue_put_cmdrsp_with_lock_client
+ (vnicinfo->datachan.chinfo.queueinfo, cmdrsp,
+ IOCHAN_TO_IOPART,
+ (void *)&vnicinfo->datachan.chinfo.insertlock,
+ DONT_ISSUE_INTERRUPT, (uint64_t)NULL,
+ 0 /* don't wait */ , "vnic")) {
+ vnicinfo->datachan.chstat.sent_promisc++;
+ } else {
+ LOGERRNAME(vnicinfo->netdev,
+ "**** FAILED to insert NET_RCV_PROMISC.\n");
+ }
+ kfree(cmdrsp);
+ }
+
+ vnicinfo->old_flags = netdev->flags;
+ }
+ DBGINF("exiting virtnic_set_multi\n");
+}
+
+/*****************************************************/
+/* debugfs filesystem functions */
+/*****************************************************/
+
+static ssize_t info_debugfs_read(struct file *file,
+ char __user *buf, size_t len, loff_t *offset)
+{
+ int i;
+ ssize_t bytes_read = 0;
+ int str_pos = 0;
+ struct virtnic_info *vni;
+ char *vbuf;
+
+ if (len > MAX_BUF)
+ len = MAX_BUF;
+ vbuf = kzalloc(len, GFP_KERNEL);
+ if (!vbuf)
+ return -ENOMEM;
+
+ /* for each vnic channel
+ * dump out channel specific data
+ */
+ for (i = 0; i < VIRTNICSOPENMAX; i++) {
+ if (num_virtnic_open[i].netdev == NULL)
+ continue;
+
+ vni = num_virtnic_open[i].vnicinfo;
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, "Vnic i = %d\n", i);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, "netdev = %s (0x%p), MAC Addr: %02x:%02x:%02x:%02x:%02x:%02x\n",
+ num_virtnic_open[i].netdev->name,
+ num_virtnic_open[i].netdev,
+ num_virtnic_open[i].netdev->dev_addr[0],
+ num_virtnic_open[i].netdev->dev_addr[1],
+ num_virtnic_open[i].netdev->dev_addr[2],
+ num_virtnic_open[i].netdev->dev_addr[3],
+ num_virtnic_open[i].netdev->dev_addr[4],
+ num_virtnic_open[i].netdev->dev_addr[5]);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, "vnicinfo = 0x%p\n", vni);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " num_rcv_bufs = %d\n",
+ vni->num_rcv_bufs);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " features = 0x%016llX\n",
+ (uint64_t)readq(&vni->datachan.chinfo.queueinfo->chan->
+ features));
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " max_outstanding_net_xmits = %d\n",
+ vni->max_outstanding_net_xmits);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " upper_threshold_net_xmits = %d\n",
+ vni->upper_threshold_net_xmits);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " lower_threshold_net_xmits = %d\n",
+ vni->lower_threshold_net_xmits);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " queuefullmsg_logged = %d\n",
+ vni->queuefullmsg_logged);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " queueinfo->packets_sent = %lld\n",
+ vni->datachan.chinfo.queueinfo->packets_sent);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " queueinfo->packets_received = %lld\n",
+ vni->datachan.chinfo.queueinfo->packets_received);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.got_rcv = %lu\n",
+ vni->datachan.chstat.got_rcv);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.got_enbdisack = %lu\n",
+ vni->datachan.chstat.got_enbdisack);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.got_xmit_done = %lu\n",
+ vni->datachan.chstat.got_xmit_done);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.xmit_fail = %lu\n",
+ vni->datachan.chstat.xmit_fail);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.sent_enbdis = %lu\n",
+ vni->datachan.chstat.sent_enbdis);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.sent_promisc = %lu\n",
+ vni->datachan.chstat.sent_promisc);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.sent_post = %lu\n",
+ vni->datachan.chstat.sent_post);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.sent_xmit = %lu\n",
+ vni->datachan.chstat.sent_xmit);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.reject_count = %lu\n",
+ vni->datachan.chstat.reject_count);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " chstat.extra_rcvbufs_sent = %lu\n",
+ vni->datachan.chstat.extra_rcvbufs_sent);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " n_rcv0 = %lu\n", vni->n_rcv0);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " n_rcv1 = %lu\n", vni->n_rcv1);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " n_rcv2 = %lu\n", vni->n_rcv2);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " n_rcvx = %lu\n", vni->n_rcvx);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " num_rcv_bufs_in_iovm = %d\n",
+ atomic_read(&vni->num_rcv_bufs_in_iovm));
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " alloc_failed_in_if_needed_cnt = %lu\n",
+ vni->alloc_failed_in_if_needed_cnt);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " alloc_failed_in_repost_return_cnt = %lu\n",
+ vni->alloc_failed_in_repost_return_cnt);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " inner_loop_limit_reached_cnt = %lu\n",
+ vni->inner_loop_limit_reached_cnt);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " found_repost_rcvbuf_cnt = %lu\n",
+ vni->found_repost_rcvbuf_cnt);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " repost_found_skb_cnt = %lu\n",
+ vni->repost_found_skb_cnt);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " n_repost_deficit = %lu\n",
+ vni->n_repost_deficit);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " bad_rcv_buf = %lu\n",
+ vni->bad_rcv_buf);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " n_rcv_packet_not_accepted = %lu\n",
+ vni->n_rcv_packet_not_accepted);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " interrupts_rcvd = %llu\n",
+ vni->interrupts_rcvd);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " interrupts_notme = %llu\n",
+ vni->interrupts_notme);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " interrupts_disabled = %llu\n",
+ vni->interrupts_disabled);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " busy_cnt = %llu\n",
+ vni->busy_cnt);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " flow_control_upper_hits = %llu\n",
+ vni->flow_control_upper_hits);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " flow_control_lower_hits = %llu\n",
+ vni->flow_control_lower_hits);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " thread_wait_ms = %d\n",
+ vni->thread_wait_ms);
+ str_pos += scnprintf(vbuf + str_pos,
+ len - str_pos, " netif_queue = %s\n",
+ netif_queue_stopped(vni->netdev) ?
+ "stopped" : "running");
+ }
+ bytes_read = simple_read_from_buffer(buf, len, offset, vbuf, str_pos);
+ kfree(vbuf);
+ return bytes_read;
+}
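info_debugfs_read() builds its report with repeated str_pos += scnprintf(vbuf + str_pos, len - str_pos, ...). This is safe because the kernel's scnprintf() returns the number of characters actually written, never more than size - 1, whereas snprintf() returns the length it would have written, which could push the offset past len and make len - str_pos underflow. Userspace has no scnprintf(), so ex_scnprintf() below is a hypothetical stand-in built on vsnprintf(), followed by an accumulation loop in the same style.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Returns characters written (excluding the NUL), capped at size - 1,
 * so the result can always be added to a running offset. */
static int ex_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Accumulate lines the way info_debugfs_read() does; once the buffer
 * is full, later calls write only the NUL and return 0. */
static int fill_report(char *buf, size_t len, const unsigned long *vals, int n)
{
	int pos = 0;

	for (int i = 0; i < n; i++)
		pos += ex_scnprintf(buf + pos, len - pos,
				    "val[%d] = %lu\n", i, vals[i]);
	return pos;
}
```

The final pos is what the patch passes to simple_read_from_buffer() as the available length.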
+
+static ssize_t enable_ints_write(struct file *file,
+ const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ char buf[4];
+ int i, new_value;
+ struct virtnic_info *vnicinfo;
+ uint64_t __iomem *features_addr;
+ uint64_t mask;
+
+ if (count >= ARRAY_SIZE(buf))
+ return -EINVAL;
+
+ buf[count] = '\0';
+ if (copy_from_user(buf, buffer, count)) {
+ LOGERR("copy_from_user failed.\n");
+ return -EFAULT;
+ }
+
+ i = kstrtoint(buf, 10, &new_value);
+
+ if (i != 0) {
+ LOGERR("Failed to scan value for enable_ints, buf<<%.*s>>",
+ (int)count, buf);
+ return i;
+ }
+
+ /* switch every open vnic to interrupt mode (1) or polling mode (any other value) */
+ for (i = 0; i < VIRTNICSOPENMAX; i++) {
+ if (num_virtnic_open[i].vnicinfo != NULL) {
+ vnicinfo = num_virtnic_open[i].vnicinfo;
+ features_addr =
+ &vnicinfo->datachan.chinfo.queueinfo->chan->
+ features;
+ if (new_value == 1) {
+ mask =
+ ~(ULTRA_IO_CHANNEL_IS_POLLING |
+ ULTRA_IO_DRIVER_DISABLES_INTS);
+ uisqueue_interlocked_and(features_addr, mask);
+ mask = ULTRA_IO_DRIVER_ENABLES_INTS;
+ uisqueue_interlocked_or(features_addr, mask);
+ vnicinfo->thread_wait_ms = 2000;
+ } else {
+ mask =
+ ~(ULTRA_IO_DRIVER_ENABLES_INTS |
+ ULTRA_IO_DRIVER_DISABLES_INTS);
+ uisqueue_interlocked_and(features_addr, mask);
+ mask = ULTRA_IO_CHANNEL_IS_POLLING;
+ uisqueue_interlocked_or(features_addr, mask);
+ vnicinfo->thread_wait_ms = 2;
+ }
+ }
+ }
+
+ return count;
+}
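enable_ints_write() flips each channel between interrupt and polling mode by atomically clearing the conflicting feature bits and then atomically setting the wanted one. A userspace sketch with C11 <stdatomic.h> standing in for uisqueue_interlocked_and()/uisqueue_interlocked_or(); the EX_* bit positions are hypothetical, since the real ULTRA_IO_* values are not shown in this patch. Like the original, it performs two separate atomic operations, so a concurrent reader can briefly see neither mode bit set.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit positions for the channel feature word. */
#define EX_CHANNEL_IS_POLLING   (1ULL << 3)
#define EX_DRIVER_ENABLES_INTS  (1ULL << 4)
#define EX_DRIVER_DISABLES_INTS (1ULL << 5)

/* Clear the conflicting bits, then set the wanted one. Returns the
 * thread wait time (ms) the driver pairs with each mode. */
static int set_interrupt_mode(_Atomic uint64_t *features, int enable_ints)
{
	if (enable_ints) {
		atomic_fetch_and(features, ~(EX_CHANNEL_IS_POLLING |
					     EX_DRIVER_DISABLES_INTS));
		atomic_fetch_or(features, EX_DRIVER_ENABLES_INTS);
		return 2000;
	}
	atomic_fetch_and(features, ~(EX_DRIVER_ENABLES_INTS |
				     EX_DRIVER_DISABLES_INTS));
	atomic_fetch_or(features, EX_CHANNEL_IS_POLLING);
	return 2;
}
```

Unrelated bits in the word are preserved, which is why AND/OR are used instead of a plain store.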
+
+/*****************************************************/
+/* Module init & exit functions */
+/*****************************************************/
+
+static int __init
+virtnic_mod_init(void)
+{
+ int error, i;
+
+ LOGINF("entering virtnic_mod_init");
+ /* ASSERT RCVPOST_BUF_SIZE < 4K */
+ if (RCVPOST_BUF_SIZE > PI_PAGE_SIZE) {
+ LOGERR("**** FAILED RCVPOST_BUF_SIZE:%d larger than a page\n",
+ RCVPOST_BUF_SIZE);
+ return -1;
+ }
+ /* ASSERT RCVPOST_BUF_SIZE is big enough to hold eth header */
+ if (RCVPOST_BUF_SIZE < ETH_HEADER_SIZE) {
+ LOGERR("**** FAILED RCVPOST_BUF_SIZE:%d is < ETH_HEADER_SIZE:%d\n",
+ RCVPOST_BUF_SIZE, ETH_HEADER_SIZE);
+ return -1;
+ }
+
+ /* clear out array */
+ for (i = 0; i < VIRTNICSOPENMAX; i++) {
+ num_virtnic_open[i].netdev = NULL;
+ num_virtnic_open[i].vnicinfo = NULL;
+ }
+ /* create workqueue for serverdown completion */
+ virtnic_serverdown_workqueue =
+ create_singlethread_workqueue("virtnic_serverdown");
+ if (virtnic_serverdown_workqueue == NULL) {
+ LOGERR("**** FAILED virtnic_serverdown_workqueue creation\n");
+ return -1;
+ }
+ /* create workqueue for tx timeout reset */
+ virtnic_timeout_reset_workqueue =
+ create_singlethread_workqueue("virtnic_timeout_reset");
+ if (virtnic_timeout_reset_workqueue == NULL) {
+ LOGERR
+ ("**** FAILED virtnic_timeout_reset_workqueue creation\n");
+ return -1;
+ }
+ virtnic_debugfs_dir = debugfs_create_dir("virtnic", NULL);
+ debugfs_create_file("info", S_IRUSR, virtnic_debugfs_dir,
+ NULL, &debugfs_info_fops);
+ debugfs_create_file("enable_ints", S_IWUSR,
+ virtnic_debugfs_dir, NULL,
+ &debugfs_enable_ints_fops);
+
+ error = virtpci_register_driver(&virtnic_driver);
+ if (error < 0) {
+ LOGERR("**** FAILED to register driver %x\n", error);
+ debugfs_remove_recursive(virtnic_debugfs_dir);
+ return -1;
+ }
+ LOGINF("exiting virtnic_mod_init");
+ return error;
+}
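As posted, the error paths in virtnic_mod_init() return -1 instead of an errno, and do not destroy workqueues created earlier (for example, when virtpci_register_driver() fails only the debugfs directory is removed). The usual kernel idiom is goto-based unwinding that releases, in reverse order, only what was acquired. A userspace sketch of that pattern; acquire(), release() and the live counter are hypothetical stand-ins for the workqueue and registration calls, chosen so leaks are observable.

```c
#include <errno.h>

/* Resources currently held; makes leaks visible. */
static int live;

/* Hypothetical acquisition step: fails with -ENOMEM when step == fail_at. */
static int acquire(int step, int fail_at)
{
	if (step == fail_at)
		return -ENOMEM;
	live++;
	return 0;
}

static void release(void)
{
	live--;
}

/* fail_at selects the failing step (0 = none). Each label undoes the
 * steps that succeeded before the jump. */
static int example_mod_init(int fail_at)
{
	int err;

	err = acquire(1, fail_at);	/* serverdown workqueue */
	if (err)
		goto out;
	err = acquire(2, fail_at);	/* timeout-reset workqueue */
	if (err)
		goto out_serverdown_wq;
	err = acquire(3, fail_at);	/* driver registration */
	if (err)
		goto out_reset_wq;
	return 0;

out_reset_wq:
	release();
out_serverdown_wq:
	release();
out:
	return err;
}
```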
+
+static void __exit
+virtnic_mod_exit(void)
+{
+ LOGINF("entering virtnic_mod_exit...\n");
+ virtpci_unregister_driver(&virtnic_driver);
+ /* unregister is going to call virtnic_remove for all devices */
+ /* destroy serverdown completion workqueue */
+ if (virtnic_serverdown_workqueue) {
+ destroy_workqueue(virtnic_serverdown_workqueue);
+ virtnic_serverdown_workqueue = NULL;
+ }
+
+ /* destroy timeout reset workqueue */
+ if (virtnic_timeout_reset_workqueue) {
+ destroy_workqueue(virtnic_timeout_reset_workqueue);
+ virtnic_timeout_reset_workqueue = NULL;
+ }
+
+ debugfs_remove_recursive(virtnic_debugfs_dir);
+ LOGINF("exiting virtnic_mod_exit...\n");
+}
+
+module_init(virtnic_mod_init);
+module_exit(virtnic_mod_exit);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Usha Srinivasan");
+MODULE_ALIAS("uisvirtnic");
+/* this is extracted during depmod and kept in modules.dep */
--
1.8.3.1
* Re: [PATCH net 2/2] geneve: Fix races between socket add and release.
From: Jesse Gross @ 2014-12-17 18:48 UTC
To: Thomas Graf; +Cc: David Miller, netdev, Andy Zhou, Stephen Hemminger
In-Reply-To: <20141217165454.GE28766@casper.infradead.org>
On Wed, Dec 17, 2014 at 8:54 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 12/16/14 at 06:25pm, Jesse Gross wrote:
>> diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
>> index 5a47188..95e47c9 100644
>> --- a/net/ipv4/geneve.c
>> +++ b/net/ipv4/geneve.c
>> @@ -296,6 +296,7 @@ struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
>> geneve_rcv_t *rcv, void *data,
>> bool no_share, bool ipv6)
>> {
>> + struct geneve_net *gn = net_generic(net, geneve_net_id);
>> struct geneve_sock *gs;
>>
>> gs = geneve_socket_create(net, port, rcv, data, ipv6);
>> @@ -305,15 +306,15 @@ struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
>> if (no_share) /* Return error if sharing is not allowed. */
>> return ERR_PTR(-EINVAL);
>>
>> + spin_lock(&gn->sock_lock);
>> gs = geneve_find_sock(net, port);
>
> Perhaps remove the _rcu of the iterator in the geneve_find_sock?
> Also, the kfree_rcu() seems no longer needed as all read accesses
> are protected by the spinlock.
>
>> - if (gs) {
>> - if (gs->rcv == rcv)
>> - atomic_inc(&gs->refcnt);
>> - else
>> + if (gs && ((gs->rcv != rcv) ||
>> + !atomic_add_unless(&gs->refcnt, 1, 0)))
>> gs = ERR_PTR(-EBUSY);
>
> Since you are taking gn->sock_lock in geneve_sock_release()
> anyway, all accesses to refcnt could eventually be converted
> to non-atomic ops.
I generally agree (with the exception of kfree_rcu() - I believe that
is still needed since incoming packets reference it using RCU).
However, since this patch is targeted at net, I wanted to make a
minimal change and not completely redo the locking. A lot of the
locking here was pulled over from VXLAN and I think it can be
simplified since I don't expect that the Geneve code will bring in all
of that logic.
The one part that is not entirely clear is the workqueue in VXLAN used
for destroying the socket. This was added by Stephen in "vxlan: listen
on multiple ports" but it's not obvious to me what problem it is
trying to avoid and I don't see a comment. If possible, it would be
nice to simplify this as well if the issue doesn't apply to Geneve.
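The quoted hunk replaces the unconditional atomic_inc() with atomic_add_unless(&gs->refcnt, 1, 0), so a reference is taken only while the count is still nonzero; a socket already on its way to release is treated as busy rather than revived. A userspace sketch of the same semantics with a C11 compare-and-swap loop; add_unless() is a hypothetical name, not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Add 'a' to *v unless *v == u; report whether the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}
```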
* [PATCH 10/10] Revert "drivers/net: Disable UFO through virtio"
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
This reverts commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4.
Now that we've split UFO into v4 and v6 versions, we can turn
UFO support for IPv4 back on. Full IPv6 support will come later, as
it requires extending the vnet header structure.
Any older VM that assumes IPv6 support is included in UFO
will continue to use UFO, and the host will generate fragment
IDs for it, thus preserving connectivity.
Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
CC: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
---
drivers/net/virtio_net.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b0bc8ea..534b633 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -491,17 +491,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
break;
case VIRTIO_NET_HDR_GSO_UDP:
- {
- static bool warned;
-
- if (!warned) {
- warned = true;
- netdev_warn(dev,
- "host using disabled UFO feature; please fix it\n");
- }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
break;
- }
case VIRTIO_NET_HDR_GSO_TCPV6:
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
@@ -890,6 +881,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+ else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
+ hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
@@ -1749,7 +1742,7 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
- dev->hw_features |= NETIF_F_TSO
+ dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
| NETIF_F_TSO_ECN | NETIF_F_TSO6;
}
/* Individual feature bits: what can host handle? */
@@ -1759,9 +1752,11 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->hw_features |= NETIF_F_TSO6;
if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
dev->hw_features |= NETIF_F_TSO_ECN;
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
+ dev->hw_features |= NETIF_F_UFO;
if (gso)
- dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
+ dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
/* (!csum && gso) case will be fixed by register_netdev() */
}
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
@@ -1799,7 +1794,8 @@ static int virtnet_probe(struct virtio_device *vdev)
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
- virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+ virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
+ virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
vi->big_packets = true;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
@@ -1993,9 +1989,9 @@ static struct virtio_device_id id_table[] = {
static unsigned int features[] = {
VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
- VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
+ VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_ECN,
+ VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
--
1.9.3
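After this revert, xmit_skb() maps SKB_GSO_UDP to VIRTIO_NET_HDR_GSO_UDP instead of falling through to BUG(). A userspace sketch of that translation: the VIRTIO_NET_HDR_GSO_* values are those of the virtio-net header, while the EX_SKB_GSO_* bits are hypothetical stand-ins for the kernel's SKB_GSO_* flags, and -1 marks the case where the driver would BUG().

```c
/* gso_type values carried in the virtio-net header. */
#define VIRTIO_NET_HDR_GSO_TCPV4 1
#define VIRTIO_NET_HDR_GSO_UDP   3
#define VIRTIO_NET_HDR_GSO_TCPV6 4
#define VIRTIO_NET_HDR_GSO_ECN   0x80

/* Hypothetical stand-ins for SKB_GSO_* bits. */
#define EX_SKB_GSO_TCPV4   (1u << 0)
#define EX_SKB_GSO_UDP     (1u << 1)
#define EX_SKB_GSO_TCP_ECN (1u << 3)
#define EX_SKB_GSO_TCPV6   (1u << 4)

static int to_virtio_gso_type(unsigned int skb_gso)
{
	int type;

	if (skb_gso & EX_SKB_GSO_TCPV4)
		type = VIRTIO_NET_HDR_GSO_TCPV4;
	else if (skb_gso & EX_SKB_GSO_TCPV6)
		type = VIRTIO_NET_HDR_GSO_TCPV6;
	else if (skb_gso & EX_SKB_GSO_UDP)
		type = VIRTIO_NET_HDR_GSO_UDP;
	else
		return -1;	/* the driver calls BUG() here */
	if (skb_gso & EX_SKB_GSO_TCP_ECN)
		type |= VIRTIO_NET_HDR_GSO_ECN;
	return type;
}
```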
* [PATCH 09/10] macvtap: Re-enable UFO support
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
Now that UFO is split into v4 and v6 parts, we can bring
back v4 support. Continue to handle legacy applications
by selecting the IPv6 fragment ID, but do not change the
GSO type. This allows two legacy VMs to continue to communicate.
Based on original work from Ben Hutchings.
Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
CC: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/macvtap.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 880cc09..75febd4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
static const struct proto_ops macvtap_socket_ops;
#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
- NETIF_F_TSO6)
+ NETIF_F_TSO6 | NETIF_F_UFO)
#define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
@@ -570,11 +570,14 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
- pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
- current->comm);
gso_type = SKB_GSO_UDP;
- if (skb->protocol == htons(ETH_P_IPV6))
+ if (vlan_get_protocol(skb) == htons(ETH_P_IPV6)) {
+ /* This is to support legacy applications.
+ * Do not change the gso_type as legacy apps
+ * may not know about the new type.
+ */
ipv6_proxy_select_ident(skb);
+ }
break;
default:
return -EINVAL;
@@ -619,6 +622,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+ else if (sinfo->gso_type & SKB_GSO_UDP)
+ vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (sinfo->gso_type & SKB_GSO_TCP_ECN)
@@ -955,6 +960,9 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
if (arg & TUN_F_TSO6)
feature_mask |= NETIF_F_TSO6;
}
+
+ if (arg & TUN_F_UFO)
+ feature_mask |= NETIF_F_UFO;
}
/* tun/tap driver inverts the usage for TSO offloads, where
@@ -965,7 +973,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
* When user space turns off TSO, we turn off GSO/LRO so that
* user-space will not receive TSO frames.
*/
- if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
+ if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
features |= RX_OFFLOADS;
else
features &= ~RX_OFFLOADS;
@@ -1066,7 +1074,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
case TUNSETOFFLOAD:
/* let the user check for future flags */
if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
- TUN_F_TSO_ECN))
+ TUN_F_TSO_ECN | TUN_F_UFO))
return -EINVAL;
rtnl_lock();
--
1.9.3
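With this patch, macvtap accepts TUN_F_UFO in TUNSETOFFLOAD and keeps GRO/LRO (RX_OFFLOADS) enabled when userspace accepts UFO as well as TSO. A userspace sketch of both checks: unknown flags are rejected up front with a mask, and, as the hunk context suggests, TSO/UFO only count when TUN_F_CSUM is also set. The TUN_F_* values are as defined in <linux/if_tun.h>; the function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Offload flags as defined in <linux/if_tun.h>. */
#define TUN_F_CSUM    0x01
#define TUN_F_TSO4    0x02
#define TUN_F_TSO6    0x04
#define TUN_F_TSO_ECN 0x08
#define TUN_F_UFO     0x10

/* Reject any bit this driver does not know, as macvtap_ioctl() does. */
static int check_offload_arg(unsigned long arg)
{
	if (arg & ~(unsigned long)(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
				   TUN_F_TSO_ECN | TUN_F_UFO))
		return -EINVAL;
	return 0;
}

/* Keep RX offloads only if userspace can take segmented frames. */
static bool want_rx_offloads(unsigned long arg)
{
	if (!(arg & TUN_F_CSUM))
		return false;
	return (arg & (TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_UFO)) != 0;
}
```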
* [PATCH 08/10] tun: Re-enable UFO support.
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
Now that UFO is split into v4 and v6 parts, we can bring
back v4 support without any trouble.
Continue to handle legacy applications by selecting the
IPv6 fragment ID, but do not change the GSO type. This
makes sure that two legacy VMs can still communicate.
Based on original work from Ben Hutchings.
Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
CC: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/tun.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9dd3746..8c32fca 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -175,7 +175,7 @@ struct tun_struct {
struct net_device *dev;
netdev_features_t set_features;
#define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
- NETIF_F_TSO6)
+ NETIF_F_TSO6|NETIF_F_UFO)
int vnet_hdr_sz;
int sndbuf;
@@ -1152,20 +1152,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
- {
- static bool warned;
-
- if (!warned) {
- warned = true;
- netdev_warn(tun->dev,
- "%s: using disabled UFO feature; please fix this program\n",
- current->comm);
- }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
- if (skb->protocol == htons(ETH_P_IPV6))
+ if (vlan_get_protocol(skb) == htons(ETH_P_IPV6)) {
+ /* This allows legacy applications to work.
+ * Do not change the gso_type as it may
+ * not be understood by legacy applications.
+ */
ipv6_proxy_select_ident(skb);
+ }
break;
- }
default:
tun->dev->stats.rx_frame_errors++;
kfree_skb(skb);
@@ -1273,6 +1268,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+ else if (sinfo->gso_type & SKB_GSO_UDP)
+ gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
else {
pr_err("unexpected GSO type: "
"0x%x, gso_size %d, hdr_len %d\n",
@@ -1780,6 +1777,11 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
features |= NETIF_F_TSO6;
arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
}
+
+ if (arg & TUN_F_UFO) {
+ features |= NETIF_F_UFO;
+ arg &= ~TUN_F_UFO;
+ }
}
/* This gives the user a way to test for new features in future by
--
1.9.3
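Unlike macvtap_ioctl(), which rejects unknown bits with an up-front mask, tun's set_offload() clears each flag it recognises from arg and fails with -EINVAL if any bit remains; that is what the "test for new features" comment refers to. A userspace sketch modelled on the hunk context: the EX_F_* bits are hypothetical stand-ins for NETIF_F_*, and the TSO_ECN handling is assumed from the surrounding driver code rather than shown in this diff.

```c
#include <errno.h>

/* Offload flags as defined in <linux/if_tun.h>. */
#define TUN_F_CSUM    0x01
#define TUN_F_TSO4    0x02
#define TUN_F_TSO6    0x04
#define TUN_F_TSO_ECN 0x08
#define TUN_F_UFO     0x10

/* Hypothetical stand-ins for NETIF_F_* feature bits. */
#define EX_F_HW_CSUM (1u << 0)
#define EX_F_TSO     (1u << 1)
#define EX_F_TSO6    (1u << 2)
#define EX_F_TSO_ECN (1u << 3)
#define EX_F_UFO     (1u << 4)

/* Translate and consume known flags; leftover bits mean "unsupported". */
static int tun_offload_features(unsigned long arg, unsigned int *features)
{
	unsigned int f = 0;

	if (arg & TUN_F_CSUM) {
		f |= EX_F_HW_CSUM;
		arg &= ~(unsigned long)TUN_F_CSUM;
		if (arg & (TUN_F_TSO4 | TUN_F_TSO6)) {
			if (arg & TUN_F_TSO_ECN) {
				f |= EX_F_TSO_ECN;
				arg &= ~(unsigned long)TUN_F_TSO_ECN;
			}
			if (arg & TUN_F_TSO4)
				f |= EX_F_TSO;
			if (arg & TUN_F_TSO6)
				f |= EX_F_TSO6;
			arg &= ~(unsigned long)(TUN_F_TSO4 | TUN_F_TSO6);
		}
		if (arg & TUN_F_UFO) {
			f |= EX_F_UFO;
			arg &= ~(unsigned long)TUN_F_UFO;
		}
	}
	if (arg)
		return -EINVAL;
	*features = f;
	return 0;
}
```

Note that TUN_F_UFO without TUN_F_CSUM is left unconsumed and therefore rejected.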
* [PATCH 07/10] s2io: Enable UFO6 support.
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich, Jon Mason
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
CC: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/ethernet/neterion/s2io.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index f5e4b82..d823bb7 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -4140,7 +4140,7 @@ static netdev_tx_t s2io_xmit(struct sk_buff *skb, struct net_device *dev)
}
frg_len = skb_headlen(skb);
- if (offload_type == SKB_GSO_UDP) {
+ if (offload_type == SKB_GSO_UDP || offload_type == SKB_GSO_UDP6) {
int ufo_size;
ufo_size = s2io_udp_mss(skb);
@@ -7917,9 +7917,9 @@ s2io_init_nic(struct pci_dev *pdev, const struct pci_device_id *pre)
dev->features |= dev->hw_features |
NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
if (sp->device_type & XFRAME_II_DEVICE) {
- dev->hw_features |= NETIF_F_UFO;
+ dev->hw_features |= NETIF_F_ALL_UFO;
if (ufo)
- dev->features |= NETIF_F_UFO;
+ dev->features |= NETIF_F_ALL_UFO;
}
if (sp->high_dma_flag == true)
dev->features |= NETIF_F_HIGHDMA;
--
1.9.3
^ permalink raw reply related
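The s2io patch makes two changes. It routes both UDP GSO types through the same UFO transmit path, and it advertises NETIF_F_ALL_UFO instead of NETIF_F_UFO on Xframe II devices. The sketch below models both steps outside the kernel. It assumes that NETIF_F_ALL_UFO, defined elsewhere in this series, is the union of the IPv4 and IPv6 UFO feature bits. The GSO_* and FEAT_* names and values are hypothetical stand-ins, and the dispatch uses the same equality test the patch uses.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SKB_GSO_UDP / SKB_GSO_UDP6 (illustrative values). */
#define GSO_UDP  (1u << 1)
#define GSO_UDP6 (1u << 9)

/* Hypothetical stand-ins for the UFO feature bits (illustrative values). */
#define FEAT_UFO  (1ull << 0)
#define FEAT_UFO6 (1ull << 1)
/* Assumed to mirror NETIF_F_ALL_UFO: IPv4 and IPv6 UFO together. */
#define FEAT_ALL_UFO (FEAT_UFO | FEAT_UFO6)

/* Model of the s2io_xmit() check: either UDP GSO type takes the UFO path. */
static bool needs_ufo_path(unsigned int offload_type)
{
    return offload_type == GSO_UDP || offload_type == GSO_UDP6;
}

/* Model of the s2io_init_nic() block: Xframe II devices make both UFO
 * bits user-toggleable and enable them when the ufo parameter is set. */
static unsigned long long init_features(bool xframe_ii, bool ufo_param,
                                        unsigned long long *hw_features)
{
    unsigned long long features = 0;

    if (xframe_ii) {
        *hw_features |= FEAT_ALL_UFO;
        if (ufo_param)
            features |= FEAT_ALL_UFO;
    }
    return features;
}
```

The macvlan and veth patches below make the same substitution in their feature macros, so the model applies to them unchanged.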
* [PATCH 06/10] macvlan: Enable UFO6 support.
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC (permalink / raw)
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
Turn on UFO6 feature.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/macvlan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index bfb0b6e..807b98d 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -746,7 +746,7 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
#define MACVLAN_FEATURES \
(NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
- NETIF_F_GSO | NETIF_F_TSO | NETIF_F_UFO | NETIF_F_GSO_ROBUST | \
+ NETIF_F_GSO | NETIF_F_TSO | NETIF_F_ALL_UFO | NETIF_F_GSO_ROBUST | \
NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
--
1.9.3
^ permalink raw reply related
* [PATCH 05/10] veth: Enable UFO6 support.
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC (permalink / raw)
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
Turn on UFO6 feature.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/veth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8ad5965..0052db5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -280,7 +280,7 @@ static const struct net_device_ops veth_netdev_ops = {
#define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
NETIF_F_HW_CSUM | NETIF_F_RXCSUM | NETIF_F_HIGHDMA | \
NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL | \
- NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT | NETIF_F_UFO | \
+ NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT | NETIF_F_ALL_UFO | \
NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | \
NETIF_F_HW_VLAN_STAG_TX | NETIF_F_HW_VLAN_STAG_RX )
--
1.9.3
^ permalink raw reply related
* [PATCH 04/10] loopback: Turn on UFO6 support.
From: Vladislav Yasevich @ 2014-12-17 18:20 UTC (permalink / raw)
To: netdev; +Cc: virtualization, mst, ben, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-1-git-send-email-vyasevic@redhat.com>
Turn on UFO6 support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/loopback.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c76283c..762c28a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -170,10 +170,10 @@ static void loopback_setup(struct net_device *dev)
dev->flags = IFF_LOOPBACK;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
netif_keep_dst(dev);
- dev->hw_features = NETIF_F_ALL_TSO | NETIF_F_UFO;
+ dev->hw_features = NETIF_F_ALL_TSO | NETIF_F_ALL_UFO;
dev->features = NETIF_F_SG | NETIF_F_FRAGLIST
| NETIF_F_ALL_TSO
- | NETIF_F_UFO
+ | NETIF_F_ALL_UFO
| NETIF_F_HW_CSUM
| NETIF_F_RXCSUM
| NETIF_F_SCTP_CSUM
--
1.9.3
^ permalink raw reply related
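The loopback patch sets NETIF_F_ALL_UFO in two fields: hw_features (the features a user may toggle) and features (the features currently enabled). The sketch below is a toy model of that split. It is a hypothetical simplification: the real ethtool feature-change path is more involved, while this version simply rejects any change outside hw_features. The toy_* names and FEAT_* values are illustrative only, and FEAT_ALL_UFO again assumes NETIF_F_ALL_UFO covers both UFO bits.

```c
#include <errno.h>

/* Hypothetical, illustrative bit values (not the kernel's NETIF_F_* numbers). */
#define FEAT_SG      (1ull << 0)
#define FEAT_ALL_TSO (1ull << 1)
#define FEAT_UFO     (1ull << 2)
#define FEAT_UFO6    (1ull << 3)
#define FEAT_ALL_UFO (FEAT_UFO | FEAT_UFO6)

struct toy_netdev {
    unsigned long long hw_features; /* bits the user may toggle */
    unsigned long long features;    /* bits currently enabled */
};

/* Simplified user feature change: 'valid' selects the bits to change and
 * 'requested' gives their new values; bits outside hw_features are refused. */
static int toy_set_features(struct toy_netdev *dev,
                            unsigned long long valid,
                            unsigned long long requested)
{
    if (valid & ~dev->hw_features)
        return -EINVAL;
    dev->features = (dev->features & ~valid) | (requested & valid);
    return 0;
}

/* Mirrors the patched loopback_setup() lines (subset of flags). */
static void toy_loopback_setup(struct toy_netdev *dev)
{
    dev->hw_features = FEAT_ALL_TSO | FEAT_ALL_UFO;
    dev->features = FEAT_SG | FEAT_ALL_TSO | FEAT_ALL_UFO;
}
```

With both UFO bits in hw_features, IPv6 UFO can be switched off independently of IPv4 UFO. Scatter-gather stays fixed because it appears only in features.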