Netdev List


* Re: [PATCH 01/10] core: Split out UFO6 support
From: Ben Hutchings @ 2014-12-17 20:10 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, virtualization, mst, stefanha, Vladislav Yasevich
In-Reply-To: <1418840455-22598-2-git-send-email-vyasevic@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]

On Wed, 2014-12-17 at 13:20 -0500, Vladislav Yasevich wrote:
> Split IPv6 support for UFO into its own feature similiar to TSO.
> This will later allow us to re-enable UFO support for virtio-net
> devices.
[...]
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 6c8b6f6..8538b67 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -372,6 +372,7 @@ enum {
>  
>  	SKB_GSO_MPLS = 1 << 12,
>  
> +	SKB_GSO_UDP6 = 1 << 13

It seems like it would be cleaner to use the names SKB_GSO_UDPV{4,6},
similarly to SKB_GSO_TCPV{4,6}.

>  };
>  
>  #if BITS_PER_LONG > 32
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 945bbd0..fa4d2ee 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
[...]
> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
[...]
> +	/* UFO also needs checksumming */
> +	if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
> +					!(features & NETIF_F_IP_CSUM)) {

You can use !(features & NETIF_F_V4_CSUM) instead of the last two terms.

> +		netdev_dbg(dev,
> +			   "Dropping NETIF_F_UFO since no checksum offload features.\n");
> +		features &= ~NETIF_F_UFO;
> +	}
> +	if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
> +					 !(features & NETIF_F_IPV6_CSUM)) {
[...]

Similarly you can use !(features & NETIF_F_V6_CSUM) instead of the last
two terms.
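
To illustrate why the shorter test is equivalent, here is a minimal
user-space sketch. It assumes that the V4 mask is simply
GEN_CSUM | IP_CSUM; the F_* names and bit values below are placeholders
for the example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for illustration only. */
#define F_IP_CSUM   (1u << 0)
#define F_GEN_CSUM  (1u << 1)
#define F_V4_CSUM   (F_GEN_CSUM | F_IP_CSUM)

/* Form used in the patch: two separate negated tests. */
static bool lacks_csum_two_terms(uint32_t features)
{
	return !(features & F_GEN_CSUM) && !(features & F_IP_CSUM);
}

/* Suggested form: one negated test against the combined mask. */
static bool lacks_csum_one_term(uint32_t features)
{
	return !(features & F_V4_CSUM);
}

/* Compare both forms over every combination of the two bits. */
static void check_forms_agree(void)
{
	for (uint32_t f = 0; f < 4; f++)
		assert(lacks_csum_two_terms(f) == lacks_csum_one_term(f));
}
```

The same reasoning applies to the V6 mask with NETIF_F_IPV6_CSUM in place
of NETIF_F_IP_CSUM.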

Aside from those minor points, this looks fine.

Ben.

-- 
Ben Hutchings
Absolutum obsoletum. (If it works, it's out of date.) - Stafford Beer



* Re: [PATCH 03/10] ovs: Enable handling of UFO6 packets.
From: Sergei Shtylyov @ 2014-12-17 20:17 UTC (permalink / raw)
  To: Vladislav Yasevich, netdev; +Cc: mst, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-4-git-send-email-vyasevic@redhat.com>

Hello.

On 12/17/2014 09:20 PM, Vladislav Yasevich wrote:

> Since UFO6 packets can now be identified by SKB_GSO_UDP6, add proper checks
> to handel UFO6 flows.
> Legacy applications may still have UFO6 packets identified by SKB_GSO_UDP,
> so we need to continue to handle them correclty.

> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>   net/openvswitch/datapath.c | 3 ++-
>   net/openvswitch/flow.c     | 2 +-
>   2 files changed, 3 insertions(+), 2 deletions(-)

> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index f9e556b..b43fc60 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -334,7 +334,8 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
>   		if (err)
>   			break;
>
> -		if (skb == segs && gso_type & SKB_GSO_UDP) {
> +		if (skb == segs &&
> +		    ((gso_type & SKB_GSO_UDP) || (gso_type & SKB_GSO_UDP6))) {

    'gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6)' would be shorter...

>   			/* The initial flow key extracted by ovs_flow_extract()
>   			 * in this case is for a first fragment, so we need to
>   			 * properly mark later fragments.
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 2b78789..d03adf4 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -602,7 +602,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>
>   		if (key->ip.frag == OVS_FRAG_TYPE_LATER)
>   			return 0;
> -		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
> +		if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6))

    .... like here.

[...]

WBR, Sergei


* Re: [PATCH] MAINTAINERS: changes for wireless
From: Johannes Berg @ 2014-12-17 20:23 UTC (permalink / raw)
  To: Arend van Spriel; +Cc: John W. Linville, netdev, linux-wireless, davem
In-Reply-To: <5491D991.30908@broadcom.com>

On Wed, 2014-12-17 at 20:29 +0100, Arend van Spriel wrote:

> > +NETWORKING DRIVERS (WIRELESS)
> > +M:	Kalle Valo<kvalo@codeaurora.org>
> > +L:	linux-wireless@vger.kernel.org
> > +Q:	http://patchwork.kernel.org/project/linux-wireless/list/
> > +T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git/
> > +S:	Maintained
> > +F:	drivers/net/wireless/
> 
> So what about the other paths that were in "NETWORKING [WIRELESS]". 
> Couple of them are obviously maintained by Johannes, but..

The remaining ones are probably just wext, and nobody cares any more ...
I guess they're orphaned. If anyone really really really needs to have a
patch against them, and we actually end up wanting it, I'm sure we can
figure something out :)

johannes


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Alexei Starovoitov @ 2014-12-17 20:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team

On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov escreveu:
>> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > I guess even just using 'perf probe' to set those wannabe tracepoints
>> > should be enough, no? Then he can refer to those in his perf record
>> > call, etc and process it just like with the real tracepoints.
>
>> it's far from ideal for two reasons.
>> - they have different kernels and dragging along vmlinux
>> with debug info or multiple 'perf list' data is too cumbersome
>
> It is not strictly necessary to carry vmlinux, that is just a probe
> point resolution time problem, solvable when generating a shell script,
> on the development machine, to insert the probes.

on N development machines with kernels that
would match worker machines...
I'm not saying it's impossible, just operationally difficult.
This is my understanding of Martin's use case.

>> operationally. Permanent tracepoints solve this problem.
>
> Sure, and when available, use them, my suggestion wasn't to use
> exclusively any mechanism, but to initially use what is available to
> create the tools, then find places that could be improved (if that
> proves to be the case) by using a higher performance mechanism.

agree. I think if kprobe approach was usable, it would have
been used already and yet here you have these patches
that add tracepoints in few strategic places of tcp stack.

>> - the action upon hitting tracepoint is non-trivial.
>> perf probe style of unconditionally walking pointer chains
>> will be tripping over wrong pointers.
>
> Huh? Care to elaborate on this one?

if perf probe does 'result->name' as in your example
then it would work, but patch 5 does conditional
walking of pointers, so you cannot just add
a perf probe that does print(ptr1->value1, ptr2->value2)
It won't crash, but will be collecting wrong stats.
(likely counting zeros)

>> Plus they already need to do aggregation for high
>> frequency events.
>
>> As part of acting on trace_transmit_skb() event:
>> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
>>   tcp_trace_stats_add(...)
>> }
>> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
>>   tcp_trace_stats_add(...)
>> }
>
> But aren't these stats TCP already keeps or could be made to?

that's what the whole discussion is about.
tcp_info has some of them.
Though it's difficult to claim that, say, tcp_info->tcpi_lost is
the same as loss_segs_retrans from patch 5.
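
For anyone unfamiliar with the before() test in the snippet quoted above:
it is a wraparound-safe comparison of 32-bit TCP sequence numbers. A
minimal user-space sketch of the same idea follows. seq_before() is a
stand-in name for this example, and the cast relies on the usual
two's-complement conversion that common compilers provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if seq1 comes before seq2 in 32-bit sequence space.
 * The unsigned subtraction wraps, and the signed view of the
 * difference tells us which side of seq2 we are on. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* A few cases, including one that crosses the 2^32 wrap. */
static void seq_before_examples(void)
{
	assert(seq_before(100, 200));
	assert(!seq_before(200, 100));
	assert(seq_before(0xfffffff0u, 0x10u));
}
```

A plain seq1 < seq2 would get the last case wrong, which is why the
retransmit check in the snippet cannot be written as a simple comparison.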


* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-17 20:43 UTC (permalink / raw)
  To: Ben Hutchings, Vladislav Yasevich; +Cc: netdev, mst, stefanha, virtualization
In-Reply-To: <1418847039.30883.29.camel@decadent.org.uk>

On 12/17/2014 03:10 PM, Ben Hutchings wrote:
> On Wed, 2014-12-17 at 13:20 -0500, Vladislav Yasevich wrote:
>> Split IPv6 support for UFO into its own feature similiar to TSO.
>> This will later allow us to re-enable UFO support for virtio-net
>> devices.
> [...]
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 6c8b6f6..8538b67 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -372,6 +372,7 @@ enum {
>>  
>>  	SKB_GSO_MPLS = 1 << 12,
>>  
>> +	SKB_GSO_UDP6 = 1 << 13
> 
> It seems like it would be cleaner to use the names SKB_GSO_UDPV{4,6},
> similarly to SKB_GSO_TCPV{4,6}.

I wanted to try to avoid touching ipv4 paths if I could.  I could use
GSO_UDPV6 though.

> 
>>  };
>>  
>>  #if BITS_PER_LONG > 32
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 945bbd0..fa4d2ee 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
> [...]
>> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
> [...]
>> +	/* UFO also needs checksumming */
>> +	if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
>> +					!(features & NETIF_F_IP_CSUM)) {
> 
> You can use !(features & NETIF_F_V4_CSUM) instead of the last two terms.
> 
>> +		netdev_dbg(dev,
>> +			   "Dropping NETIF_F_UFO since no checksum offload features.\n");
>> +		features &= ~NETIF_F_UFO;
>> +	}
>> +	if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
>> +					 !(features & NETIF_F_IPV6_CSUM)) {
> [...]
> 
> Similarly you can use !(features & NETIF_F_V6_CSUM) instead of the last
> two terms.

I made those to look the same as the TSO checks for consistency, but I can change
these to be shorter like above.

-vlad

> 
> Aside from those minor points, this looks fine.
> 
> Ben.
> 


* Re: [PATCH 03/10] ovs: Enable handling of UFO6 packets.
From: Vlad Yasevich @ 2014-12-17 20:44 UTC (permalink / raw)
  To: Sergei Shtylyov, Vladislav Yasevich, netdev
  Cc: mst, ben, stefanha, virtualization
In-Reply-To: <5491E4C2.5080402@cogentembedded.com>

On 12/17/2014 03:17 PM, Sergei Shtylyov wrote:
> Hello.
> 
> On 12/17/2014 09:20 PM, Vladislav Yasevich wrote:
> 
>> Since UFO6 packets can now be identified by SKB_GSO_UDP6, add proper checks
>> to handel UFO6 flows.
>> Legacy applications may still have UFO6 packets identified by SKB_GSO_UDP,
>> so we need to continue to handle them correclty.
> 
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>>   net/openvswitch/datapath.c | 3 ++-
>>   net/openvswitch/flow.c     | 2 +-
>>   2 files changed, 3 insertions(+), 2 deletions(-)
> 
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index f9e556b..b43fc60 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -334,7 +334,8 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
>>           if (err)
>>               break;
>>
>> -        if (skb == segs && gso_type & SKB_GSO_UDP) {
>> +        if (skb == segs &&
>> +            ((gso_type & SKB_GSO_UDP) || (gso_type & SKB_GSO_UDP6))) {
> 
>    'gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6)' would be shorter...

Thanks, will do.

-vlad

> 
>>               /* The initial flow key extracted by ovs_flow_extract()
>>                * in this case is for a first fragment, so we need to
>>                * properly mark later fragments.
>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> index 2b78789..d03adf4 100644
>> --- a/net/openvswitch/flow.c
>> +++ b/net/openvswitch/flow.c
>> @@ -602,7 +602,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>>
>>           if (key->ip.frag == OVS_FRAG_TYPE_LATER)
>>               return 0;
>> -        if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
>> +        if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6))
> 
>    .... like here.
> 
> [...]
> 
> WBR, Sergei
> 


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: rapier @ 2014-12-17 20:45 UTC (permalink / raw)
  To: Yuchung Cheng, Blake Matheny
  Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Martin Lau,
	netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <CAK6E8=fsKNcVBBewDw89cLLFuQb1a0parZ7hmvuRfherqjXTng@mail.gmail.com>

On 12/15/14 2:56 PM, Yuchung Cheng wrote:
> On Mon, Dec 15, 2014 at 8:08 AM, Blake Matheny <bmatheny@fb.com> wrote:
>>
>> We have an additional set of patches for web10g that builds on these
>> tracepoints. It can be made to work either way, but I agree the idea of
>> something like a sockopt would be really nice.
>
> I'd like to compare these patches  with tools that parse pcap files to
> generate per-flow counters to collect RTTs, #dupacks, etc. What
> additional values or insights do they provide to improve/debug TCP
> performance? maybe an example?

So this is our use scenario:

If the stack were instrumented on a per flow basis we can gather metrics 
proactively. This data can likely be processed on a near-real-time basis
to at least get some general idea about the health of the flow (dupack, 
cong events, spurious rto, etc). It's possible we can use this data to 
provisionally flag flows during the lifespan of the transfer. If we 
store the collected metrics NOC engineers can access this to make a 
final determination about performance. They may then start the 
resolution process immediately using data collected in situ. With the 
web10g data we do collect stack data but we are also collecting 
information about the path and the interaction between the application 
and the stack.

This scenario is particularly appealing in the realm of big data 
science. We're currently working with datasets that are hundreds of TBs 
in size and will soon be dealing with multiple PBs as a matter of 
course. In many cases we're aware of the path characteristics in advance 
via SDN so we can apply the macroscopic model and see when we're 
dropping below thresholds for that path. Since we're doing most of our
transfers between loosely federated sets of distantly located transfer 
nodes we don't generally have access to the far end of the connection 
which might be the right place to collect the pcap data.

> IMO these stats provide a general pictures of how TCP works of a
> specific network, but not enough to really nail specific bugs in TCP
> protocol or implementation. Then SNMP stats or sampling with pcap
> traces with offline analysis can achieve the same purpose.

I'd agree with that but in the scenario we are most interested in 
protocol/implementation issues are secondary concerns. They are 
important but we've mostly been focused on what we can do to make the
scientific workflow easier when dealing with the transfer of large data 
sets.


* pull request: bluetooth 2014-12-17
From: Johan Hedberg @ 2014-12-17 20:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-wireless, linux-bluetooth, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

Hi Dave,

Here's the first direct (i.e. skipping the wireless tree) bluetooth pull
request for you, intended for 3.19. It's just one patch: a fix from
Marcel for remote service discovery filtering which also fixes a
'used uninitialized' compiler warning.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 65891feac27e26115dc4cce881743a1ac33372df:

  net: Disallow providing non zero VLAN ID for NIC drivers FDB add flow (2014-12-16 15:41:19 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to ea8ae2516ac43028a01c40b58ffa80d3b0afb802:

  Bluetooth: Fix bug with filter in service discovery optimization (2014-12-17 22:03:49 +0200)

----------------------------------------------------------------
Marcel Holtmann (1):
      Bluetooth: Fix bug with filter in service discovery optimization

 net/bluetooth/mgmt.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)




* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: David Ahern @ 2014-12-17 20:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQ+zkNL9sJhJuAiQ_y4bis=Sck5pzG86qccXE9vvM0-drQ@mail.gmail.com>

On 12/17/14 1:42 PM, Alexei Starovoitov wrote:
>> It is not strictly necessary to carry vmlinux, that is just a probe
>> point resolution time problem, solvable when generating a shell script,
>> on the development machine, to insert the probes.
> on N development machines with kernels that
> would match worker machines...
> I'm not saying it's impossible, just operationally difficult.
> This is my understanding of Martin's use case.
>

That's the use case I am talking about ... N-different kernel versions 
and the probe definitions would need to be generated at *build* time of 
the kernel that uses a cross-compile environment, i.e., one can't assume
there is a development machine running the kernel from which you can 
generate the probe definitions. This gets messy quick for embedded 
deployments.

David


* Re: [PATCH net 2/2] geneve: Fix races between socket add and release.
From: Thomas Graf @ 2014-12-17 21:15 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev, Andy Zhou, Stephen Hemminger
In-Reply-To: <CAEP_g=_-hgeosH83FdPZLb9mvi5DSW6mbe+Xe5x0YoR7mKaTPA@mail.gmail.com>

On 12/17/14 at 10:48am, Jesse Gross wrote:
> I generally agree (with the exception of kfree_rcu() - I believe that
> is still needed since incoming packets reference it using RCU).

I didn't inspect this in full detail but it seems like the data path
should only care about gs->sock which is properly refcnt'ed.

> However, since this patch is targeted a net- I wanted to make a
> minimal change and not completely redo the locking. A lot of the
> locking here was pulled over from VXLAN and I think it can be
> simplified since I don't expect that the Geneve code will bring in all
> of that logic.

Makes sense. Feel free to take:
Acked-by: Thomas Graf <tgraf@suug.ch>

> for destroying the socket. This was added by Stephen in "vxlan: listen
> on multiple ports" but it's not obvious to me what problem it is
> trying to avoid and I don't see a comment. If possible, it would be
> nice to simplify this as well if the issue doesn't apply to Geneve.

I don't have an explanation for that either. Each entry on the
vni_list[] takes a vs->refcnt.


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 21:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQ+zkNL9sJhJuAiQ_y4bis=Sck5pzG86qccXE9vvM0-drQ@mail.gmail.com>

Em Wed, Dec 17, 2014 at 12:42:34PM -0800, Alexei Starovoitov escreveu:
> On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov escreveu:
> >> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
> >> <arnaldo.melo@gmail.com> wrote:
> >> > I guess even just using 'perf probe' to set those wannabe tracepoints
> >> > should be enough, no? Then he can refer to those in his perf record
> >> > call, etc and process it just like with the real tracepoints.
> >
> >> it's far from ideal for two reasons.
> >> - they have different kernels and dragging along vmlinux
> >> with debug info or multiple 'perf list' data is too cumbersome
> >
> > It is not strictly necessary to carry vmlinux, that is just a probe
> > point resolution time problem, solvable when generating a shell script,
> > on the development machine, to insert the probes.
> 
> on N development machines with kernels that
> would match worker machines...
> I'm not saying it's impossible, just operationally difficult.
> This is my understanding of Martin's use case.

The point here is that it's difficult to cater to the needs of all
involved, researchers and maintainers don't like to be plastered by
contracts to keep metrics and crossroads that at some point made sense.

It will be difficult, in some cases, to some people, to be able to get
all they want, what I tried to stress is that there are alternatives to
committing to tons of tracepoints (or just a few), in the form of dynamic
ones, that with some infrastructure, could be put to use before
something better comes along.
 
> >> operationally. Permanent tracepoints solve this problem.
> >
> > Sure, and when available, use them, my suggestion wasn't to use
> > exclusively any mechanism, but to initially use what is available to
> > create the tools, then find places that could be improved (if that
> > proves to be the case) by using a higher performance mechanism.
 
> agree. I think if kprobe approach was usable, it would have

Who said it was not?

> been used already and yet here you have these patches
> that add tracepoints in few strategic places of tcp stack.

Well, up to the point that these points are argued to death to being
strategic enough to have a tracepoint, kprobes is the way to go, or, in
other words, the _only_ way to go, if you don't want to have a patched
kernel.
 
> >> - the action upon hitting tracepoint is non-trivial.
> >> perf probe style of unconditionally walking pointer chains
> >> will be tripping over wrong pointers.
> >
> > Huh? Care to elaborate on this one?
> 
> if perf probe does 'result->name' as in your example
> then it would work, but patch 5 does conditional
> walking of pointers, so you cannot just add
> a perf probe that does print(ptr1->value1, ptr2->value2)
> It won't crash, but will be collecting wrong stats.
> (likely counting zeros)

Right, for that we need to activate eBPF code when we hit such probes,
but then, it continues being something dynamic, not something that is
forever there, in the source code.
 
> >> Plus they already need to do aggregation for high
> >> frequency events.
> >
> >> As part of acting on trace_transmit_skb() event:
> >> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
> >>   tcp_trace_stats_add(...)
> >> }
> >> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
> >>   tcp_trace_stats_add(...)
> >> }
> >
> > But aren't these stats TCP already keeps or could be made to?
> 
> that's the whole discussion about.
> tcp_info has some of them.
> Though it's difficult to claim that, say, tcp_info->tcpi_lost is
> the same as loss_segs_retrans from patch 5.

For such flexibility I think we need to go the eBPF way, i.e. strive the
most to reduce the cost of inserting a stat collection point.

- Arnaldo


* Re: Bug: mv643xxx fails with highmem
From: Ezequiel Garcia @ 2014-12-17 21:18 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Miller, Nimrod Andy,
	Fabio Estevam
  Cc: netdev, fugang.duan
In-Reply-To: <20141211202507.GS11285@n2100.arm.linux.org.uk>

Russell, David:

On 12/11/2014 05:25 PM, Russell King - ARM Linux wrote:
> On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
>> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Date: Thu, 11 Dec 2014 19:49:20 +0000
>>
>>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>>> all fragments with dma_map_single().  This fails when the driver is
>>> used in an environment with highmem.
>>
>> This change looks really buggy to me.
>>
>> Unfortunately, all the changes he subsequently makes for software TSO
>> support depend upon this :-/
>>
>> The change is definitely wrong.

I've been trying to find a fix for this issue, and also trying to
reproduce the bug.

As for the fix, we need to fix the non-TSO and TSO paths independently.
The former is fairly straightforward, but the latter might be a bit more
involved.

The problem is that the tso_t struct holds a pointer to the skb linear
and non-linear data.

struct tso_t {
        int next_frag_idx;
        void *data;
        size_t size;
        u16 ip_id;
        u32 tcp_seq;
};

Instead, we should deal with pages, and only map the non-linear skb with
skb_frag_dma_map().

On the other hand, I haven't been able to reproduce this on my boards. I
did try to put a hack to hold most lowmem pages, but it didn't make a
difference. (In fact, I haven't been able to clearly see how the pages
for the skbuff are allocated from high memory.)

Russell, would you share any hints about your setup? I don't have access
to any Dove boards at the moment, but I do have Kirkwoods, Armadas and
i.MX6.

Thanks a lot for your report and help!
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 21:24 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexei Starovoitov, Martin KaFai Lau, netdev@vger.kernel.org,
	David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <5491EE01.5020406@gmail.com>

Em Wed, Dec 17, 2014 at 01:56:33PM -0700, David Ahern escreveu:
> On 12/17/14 1:42 PM, Alexei Starovoitov wrote:
> >>It is not strictly necessary to carry vmlinux, that is just a probe
> >>point resolution time problem, solvable when generating a shell script,
> >>on the development machine, to insert the probes.
> >on N development machines with kernels that
> >would match worker machines...
> >I'm not saying it's impossible, just operationally difficult.
> >This is my understanding of Martin's use case.

> That's the use case I am talking about ... N-different kernel versions and
> the probe definitions would need to be generated at *build* time of the
> kernel that uses a cross-compile environment. ie., can't assume there is a
> development machine running the kernel from which you can generate the probe
> definitions. This gets messy quick for embedded deployments.

It shouldn't, you're saying that the rate of pushing out production
kernels is so high that we get lost and can't find the matching full
debug original binaries used.

We have build-ids for that, to have binary content keys, that we can
match what is in production, that has to be as lean as possible, while
being able to get back to all that fat.

Is it that people want so hard to forget about that extra debugging fat
that in the end we need to keep it to be able to figure out what happens
when things go wrong?

I understand that the expectation is that for each production build
there will be unwieldy, different probe point definitions to keep, but
is that so?

- Arnaldo


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Josef Bacik @ 2014-12-17 21:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Kernel Team
In-Reply-To: <CAADnVQL6-72UvvL-T7JUa-6c5+xJ2XggW=iCwb5G_ZbH3r-SZQ@mail.gmail.com>

On 12/16/2014 10:06 PM, Alexei Starovoitov wrote:
> On Tue, Dec 16, 2014 at 5:30 PM, Martin Lau <kafai@fb.com> wrote:
>>>>>>> I think systemtap like scripting on top of patches 1 and 3
>>>>>>> should solve your use case ?
>>>> We have quite a few different versions running in the production.  It may not
>>>> be operationally easy.
>>>
>>> different versions of kernel or different versions of tcp_tracer ?
>> Former and we are releasing new kernel pretty often.
>
> I see. So for dynamic tracer to be useful in such environment,
> the scripts should be compatible across different kernel version
> without recompilation. All makes sense.
>
>> How does the current TRACE_EVENT do it when it wants to printf more data?
>
> tracepoints, like any other user interface, shouldn't
> break compatibility. With printf it's practically impossible.
> Some subsystems may be breaking this rule arguing that
> tracepoints is a debug facility, but networking tracepoints don't change.
>

So that's what the events/<subsystem>/<event>/format is for, to provide 
a nice way for scripts to know what they are looking at.  For things 
like the tcp estats and other tracing tools we use in production 
internally we use something (our own stuff in case of estats, trace-cmd 
in the case of normal tracepoints) to read the raw data and pull out the 
fields we need, and that way it works no matter what kernel we're on. 
Sometimes tracepoints move and so we have to adjust our scripts, but 
that's the cost of doing business and I think that's acceptable.
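
As a rough sketch of what such a reader does with a format file: each
field line carries the declaration, offset and size, so a tool can look a
field up by name instead of hard-coding its position. The function name
match_field() and the sample line in the usage note are assumptions for
illustration; a real parser such as the one in trace-cmd is considerably
more thorough (array fields, for instance, are not handled here).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and fills *offset and *size if `line` describes the field
 * called `name`, otherwise returns 0. Expects lines of the form
 *   field:<type> <name>; offset:<n>; size:<n>; signed:<n>;
 */
static int match_field(const char *line, const char *name,
		       int *offset, int *size)
{
	char decl[128];
	int off, sz;
	size_t dlen, nlen;

	assert(line && name && offset && size);

	if (sscanf(line, " field:%127[^;]; offset:%d; size:%d;",
		   decl, &off, &sz) != 3)
		return 0;

	/* The declaration is "<type> <name>": compare the trailing word. */
	dlen = strlen(decl);
	nlen = strlen(name);
	if (dlen <= nlen || decl[dlen - nlen - 1] != ' ' ||
	    strcmp(decl + dlen - nlen, name) != 0)
		return 0;

	*offset = off;
	*size = sz;
	return 1;
}
```

Fed a line such as "field:int common_pid; offset:4; size:4; signed:1;",
a script can ask for common_pid by name and read the raw record at
whatever offset that particular kernel reports.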

>>> It feels that for stats collection only, tracepoints+tcp_trace
>>> do not add much additional value vs extending tcp_info
>>> and using ss.
>> I think we are on the same page. Once 'this should cost nothing if not
>> activated' proposition was cleared out.  It was what I meant that doing the
>> collection part in the TCP itself (instead of tracepoints) would be nice.
>
> agree.
>
>> I think going forward, as others have suggested, it may be better to come
>> together and reach a common ground on what to collect first before I re-work
>> patch 1 to 3 and repost.
>
> I think as a minimum it will be discussed at netdev01 in Feb,
> but I suspect not everyone on this list can(want) go to Ottawa,
> so would be nice to have a meetup for bay area folks to
> discuss this sooner with public g+ hangout.
> Thoughts?
>

Yeah I think we're all in agreement that this is a good netdev01 
discussion.  I'm happy to include people who want to talk about this 
beforehand in the bay area meetup we're throwing, but it seems like
this is going to be something that the larger community is going to want 
to talk about so it may be more productive to wait until netdev01.  Thanks,

Josef


* Re: [PATCH 03/10] ovs: Enable handling of UFO6 packets.
From: Michael S. Tsirkin @ 2014-12-17 22:26 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-4-git-send-email-vyasevic@redhat.com>

On Wed, Dec 17, 2014 at 01:20:48PM -0500, Vladislav Yasevich wrote:
> Since UFO6 packets can now be identified by SKB_GSO_UDP6, add proper checks
> to handel UFO6 flows.

s/handel/handle/

> Legacy applications may still have UFO6 packets identified by SKB_GSO_UDP,
> so we need to continue to handle them correclty.
> 
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  net/openvswitch/datapath.c | 3 ++-
>  net/openvswitch/flow.c     | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index f9e556b..b43fc60 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -334,7 +334,8 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
>  		if (err)
>  			break;
>  
> -		if (skb == segs && gso_type & SKB_GSO_UDP) {
> +		if (skb == segs &&
> +		    ((gso_type & SKB_GSO_UDP) || (gso_type & SKB_GSO_UDP6))) {
>  			/* The initial flow key extracted by ovs_flow_extract()
>  			 * in this case is for a first fragment, so we need to
>  			 * properly mark later fragments.
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 2b78789..d03adf4 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -602,7 +602,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>  
>  		if (key->ip.frag == OVS_FRAG_TYPE_LATER)
>  			return 0;
> -		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
> +		if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP6))
>  			key->ip.frag = OVS_FRAG_TYPE_FIRST;
>  
>  		/* Transport layer. */
> -- 
> 1.9.3

^ permalink raw reply

* Re: [PATCH 08/10] tun: Re-uanble UFO support.
From: Michael S. Tsirkin @ 2014-12-17 22:33 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-9-git-send-email-vyasevic@redhat.com>

subs: re-enable

On Wed, Dec 17, 2014 at 01:20:53PM -0500, Vladislav Yasevich wrote:
> Now that UFO is split into v4 and v6 parts, we can bring
> back v4 support without any trouble.
> 
> Continue to handle legacy applications by selecting the
> IPv6 fragment id but do not change the gso type.  Thist

s/Thist/This/

> makes sure that two legacy VMs may still communicate.

This means IPv6 skbs with UFO (not UFO6) flag set are present
in the stack.
If possible, I think it would be better to make GSO type correct: UFO6,
and then convert to UFO when copying to guest.

A similar approach should be possible for OVS?
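
A rough standalone sketch of the "convert when copying to guest" idea,
not kernel code.  The GSO_* and VNET_GSO_* macros are local stand-ins for
the SKB_GSO_* and VIRTIO_NET_HDR_GSO_* constants, and gso_to_vnet() is a
hypothetical helper.  It assumes the stack carries the correct type (UDP6
for IPv6) and that both UDP types collapse to the single legacy vnet
header value on the way out:

```c
#include <stdint.h>

/* Stand-ins for SKB_GSO_* bits; GSO_UDP6 follows the value in this series. */
#define GSO_TCPV4 (1u << 0)
#define GSO_UDP   (1u << 1)
#define GSO_TCPV6 (1u << 4)
#define GSO_UDP6  (1u << 13)

/* Stand-ins for VIRTIO_NET_HDR_GSO_* values. */
#define VNET_GSO_TCPV4 1
#define VNET_GSO_UDP   3
#define VNET_GSO_TCPV6 4

/* Map an internal GSO type to the vnet header type.
 * Returns 0 on success, -1 if the type cannot be expressed. */
static int gso_to_vnet(uint32_t gso_type, uint8_t *vnet_type)
{
	if (gso_type & GSO_TCPV4)
		*vnet_type = VNET_GSO_TCPV4;
	else if (gso_type & GSO_TCPV6)
		*vnet_type = VNET_GSO_TCPV6;
	else if (gso_type & (GSO_UDP | GSO_UDP6))
		/* Legacy guests only know one UDP type. */
		*vnet_type = VNET_GSO_UDP;
	else
		return -1;
	return 0;
}
```

With something like this on the transmit-to-guest path, the receive path
would be free to tag IPv6 packets as UDP6 internally.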


> Based on original work from Ben Hutchings.
> 
> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
> CC: Ben Hutchings <ben@decadent.org.uk>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  drivers/net/tun.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 9dd3746..8c32fca 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -175,7 +175,7 @@ struct tun_struct {
>  	struct net_device	*dev;
>  	netdev_features_t	set_features;
>  #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> -			  NETIF_F_TSO6)
> +			  NETIF_F_TSO6|NETIF_F_UFO)
>  
>  	int			vnet_hdr_sz;
>  	int			sndbuf;
> @@ -1152,20 +1152,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>  			break;
>  		case VIRTIO_NET_HDR_GSO_UDP:
> -		{
> -			static bool warned;
> -
> -			if (!warned) {
> -				warned = true;
> -				netdev_warn(tun->dev,
> -					    "%s: using disabled UFO feature; please fix this program\n",
> -					    current->comm);
> -			}
>  			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> -			if (skb->protocol == htons(ETH_P_IPV6))
> +			if (vlan_get_protocol(skb) == htons(ETH_P_IPV6)) {
> +				/* This allows legacy application to work.
> +				 * Do not change the gso_type as it may
> +				 * not be upderstood by legacy applications.

Shouldn't we handle legacy applications when passing packets to
userspace?

> +				 */
>  				ipv6_proxy_select_ident(skb);
> +			}
>  			break;
> -		}
>  		default:
>  			tun->dev->stats.rx_frame_errors++;
>  			kfree_skb(skb);
> @@ -1273,6 +1268,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>  				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>  			else if (sinfo->gso_type & SKB_GSO_TCPV6)
>  				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> +			else if (sinfo->gso_type & SKB_GSO_UDP)
> +				gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
>  			else {
>  				pr_err("unexpected GSO type: "
>  				       "0x%x, gso_size %d, hdr_len %d\n",
> @@ -1780,6 +1777,11 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>  				features |= NETIF_F_TSO6;
>  			arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
>  		}
> +
> +		if (arg & TUN_F_UFO) {
> +			features |= NETIF_F_UFO;
> +			arg &= ~TUN_F_UFO;
> +		}
>  	}
>  
>  	/* This gives the user a way to test for new features in future by
> -- 
> 1.9.3

^ permalink raw reply

* Re: [PATCH 09/10] macvtap: Re-enable UFO support
From: Michael S. Tsirkin @ 2014-12-17 22:41 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-10-git-send-email-vyasevic@redhat.com>

On Wed, Dec 17, 2014 at 01:20:54PM -0500, Vladislav Yasevich wrote:
> Now that UFO is split into v4 and v6 parts, we can bring
> back v4 support.  Continue to handle legacy applications
> by selecting the ipv6 fagment id but do not change the
> gso type.  This allows 2 legacy VMs to continue to communicate.
> 
> Based on original work from Ben Hutchings.
> 
> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
> CC: Ben Hutchings <ben@decadent.org.uk>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  drivers/net/macvtap.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 880cc09..75febd4 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
>  static const struct proto_ops macvtap_socket_ops;
>  
>  #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
> -		      NETIF_F_TSO6)
> +		      NETIF_F_TSO6 | NETIF_F_UFO)
>  #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
>  #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
>  
> @@ -570,11 +570,14 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
>  			gso_type = SKB_GSO_TCPV6;
>  			break;
>  		case VIRTIO_NET_HDR_GSO_UDP:
> -			pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
> -				     current->comm);
>  			gso_type = SKB_GSO_UDP;
> -			if (skb->protocol == htons(ETH_P_IPV6))
> +			if (vlan_get_protocol(skb) == htons(ETH_P_IPV6)) {
> +				/* This is to support legacy appliacations.
> +				 * Do not change the gso_type as legacy apps
> +				 * may not know about the new type.
> +				 */
>  				ipv6_proxy_select_ident(skb);
> +			}
>  			break;
>  		default:
>  			return -EINVAL;
> @@ -619,6 +622,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
>  			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>  		else if (sinfo->gso_type & SKB_GSO_TCPV6)
>  			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> +		else if (sinfo->gso_type & SKB_GSO_UDP)
> +			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
>  		else
>  			BUG();
>  		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> @@ -955,6 +960,9 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
>  			if (arg & TUN_F_TSO6)
>  				feature_mask |= NETIF_F_TSO6;
>  		}
> +
> +		if (arg & TUN_F_UFO)
> +			feature_mask |= NETIF_F_UFO;
>  	}
>  
>  	/* tun/tap driver inverts the usage for TSO offloads, where
> @@ -965,7 +973,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
>  	 * When user space turns off TSO, we turn off GSO/LRO so that
>  	 * user-space will not receive TSO frames.
>  	 */
> -	if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
> +	if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
>  		features |= RX_OFFLOADS;
>  	else
>  		features &= ~RX_OFFLOADS;

By the way this logic is completely broken, even without your patch,
isn't it?

It says: enable LRO+GRO if any of NETIF_F_TSO | NETIF_F_TSO6 |
NETIF_F_UFO set.

So what happens if I enable TSO only?
LRO gets enabled so I can still get TSO6 packets.


This really should be:

	if ((feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO)) ==
			(NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))


Fixing this should probably be a separate patch before your
series, with Cc: stable.
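
To spell out the difference between the "any bit set" and "all bits set"
tests, here is a minimal standalone sketch.  The F_* values are made-up
stand-ins for the NETIF_F_* bits, and the helper names are hypothetical.
Note that in C, == binds tighter than &, so the masked value needs its
own parentheses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NETIF_F_TSO, NETIF_F_TSO6 and NETIF_F_UFO;
 * the real definitions live in include/linux/netdev_features.h. */
#define F_TSO  (1u << 0)
#define F_TSO6 (1u << 1)
#define F_UFO  (1u << 2)
#define F_ALL  (F_TSO | F_TSO6 | F_UFO)

/* True if at least one offload is requested (the existing test). */
static bool any_offload_set(uint32_t feature_mask)
{
	return (feature_mask & F_ALL) != 0;
}

/* True only if every offload is requested (the suggested test). */
static bool all_offloads_set(uint32_t feature_mask)
{
	return (feature_mask & F_ALL) == F_ALL;
}
```

With only F_TSO set, any_offload_set() is true but all_offloads_set() is
false, which is the case described above.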


> @@ -1066,7 +1074,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  	case TUNSETOFFLOAD:
>  		/* let the user check for future flags */
>  		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> -			    TUN_F_TSO_ECN))
> +			    TUN_F_TSO_ECN | TUN_F_UFO))
>  			return -EINVAL;
>  
>  		rtnl_lock();
> -- 
> 1.9.3

^ permalink raw reply

* Re: [PATCH 10/10] Revert "drivers/net: Disable UFO through virtio"
From: Michael S. Tsirkin @ 2014-12-17 22:44 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-11-git-send-email-vyasevic@redhat.com>

On Wed, Dec 17, 2014 at 01:20:55PM -0500, Vladislav Yasevich wrote:
> This reverts commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4.
> Now that we've split UFO into v4 and v6 version, we can turn
> back UFO support for ipv4.  Full IPv6 support will come later as
> it requires extending vnet header structure.
> 
> Any older VM that assumes IPv6 support is included in UFO
> will continue to use UFO and the host will generate fragment
> ids for it, thus preserving connectivity.
> 
> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
> CC: Ben Hutchings <ben@decadent.org.uk>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> ---
>  drivers/net/virtio_net.c | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b0bc8ea..534b633 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -491,17 +491,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
>  			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>  			break;
>  		case VIRTIO_NET_HDR_GSO_UDP:
> -		{
> -			static bool warned;
> -
> -			if (!warned) {
> -				warned = true;
> -				netdev_warn(dev,
> -					    "host using disabled UFO feature; please fix it\n");
> -			}
>  			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>  			break;


This might not be true: could be a legacy host.
I think we need to check for IPv6 and set it
correctly to SKB_GSO_UDP/SKB_GSO_UDP6?
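
A rough standalone sketch of that check, not the actual driver change.
GSO_UDP and GSO_UDP6 are local stand-ins for SKB_GSO_UDP and the
SKB_GSO_UDP6 bit added in this series, and udp_gso_type() is a
hypothetical helper.  It takes the EtherType in host byte order, whereas
the driver would compare skb->protocol against htons(ETH_P_IPV6):

```c
#include <stdint.h>

#define ETHERTYPE_IPV6 0x86DD	/* same value as ETH_P_IPV6 */

/* Stand-ins for the SKB_GSO_* bits discussed in this thread. */
#define GSO_UDP  (1u << 1)
#define GSO_UDP6 (1u << 13)

/* Pick the GSO type for a packet the host marked as
 * VIRTIO_NET_HDR_GSO_UDP: a legacy host may use that one value
 * for both IPv4 and IPv6, so look at the L3 protocol. */
static uint32_t udp_gso_type(uint16_t ethertype)
{
	if (ethertype == ETHERTYPE_IPV6)
		return GSO_UDP6;
	return GSO_UDP;
}
```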


> -		}
>  		case VIRTIO_NET_HDR_GSO_TCPV6:
>  			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>  			break;
> @@ -890,6 +881,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>  		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
>  			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
> +			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
>  		else
>  			BUG();
>  		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
> @@ -1749,7 +1742,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>  			dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
>  
>  		if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
> -			dev->hw_features |= NETIF_F_TSO
> +			dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
>  				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
>  		}
>  		/* Individual feature bits: what can host handle? */
> @@ -1759,9 +1752,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  			dev->hw_features |= NETIF_F_TSO6;
>  		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
>  			dev->hw_features |= NETIF_F_TSO_ECN;
> +		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
> +			dev->hw_features |= NETIF_F_UFO;
>  
>  		if (gso)
> -			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
> +			dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
>  		/* (!csum && gso) case will be fixed by register_netdev() */
>  	}
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> @@ -1799,7 +1794,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	/* If we can receive ANY GSO packets, we must allocate large ones. */
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>  	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> -	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>  		vi->big_packets = true;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> @@ -1993,9 +1989,9 @@ static struct virtio_device_id id_table[] = {
>  static unsigned int features[] = {
>  	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
>  	VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
> -	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
> +	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
>  	VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
> -	VIRTIO_NET_F_GUEST_ECN,
> +	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
>  	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
> -- 
> 1.9.3

^ permalink raw reply

* Re: [PATCH 01/10] core: Split out UFO6 support
From: Michael S. Tsirkin @ 2014-12-17 22:45 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, ben, stefanha, virtualization
In-Reply-To: <1418840455-22598-2-git-send-email-vyasevic@redhat.com>

On Wed, Dec 17, 2014 at 01:20:46PM -0500, Vladislav Yasevich wrote:
> Split IPv6 support for UFO into its own feature similiar to TSO.
> This will later allow us to re-enable UFO support for virtio-net
> devices.
> 
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  include/linux/netdev_features.h |  7 +++++--
>  include/linux/netdevice.h       |  1 +
>  include/linux/skbuff.h          |  1 +
>  net/core/dev.c                  | 35 +++++++++++++++++++----------------
>  net/core/ethtool.c              |  2 +-
>  5 files changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index dcfdecb..a078945 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -48,8 +48,9 @@ enum {
>  	NETIF_F_GSO_UDP_TUNNEL_BIT,	/* ... UDP TUNNEL with TSO */
>  	NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
>  	NETIF_F_GSO_MPLS_BIT,		/* ... MPLS segmentation */
> +	NETIF_F_UFO6_BIT,		/* ... UDPv6 fragmentation */
>  	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
> -		NETIF_F_GSO_MPLS_BIT,
> +		NETIF_F_UFO6_BIT,
>  
>  	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
>  	NETIF_F_SCTP_CSUM_BIT,		/* SCTP checksum offload */
> @@ -109,6 +110,7 @@ enum {
>  #define NETIF_F_TSO_ECN		__NETIF_F(TSO_ECN)
>  #define NETIF_F_TSO		__NETIF_F(TSO)
>  #define NETIF_F_UFO		__NETIF_F(UFO)
> +#define NETIF_F_UFO6		__NETIF_F(UFO6)
>  #define NETIF_F_VLAN_CHALLENGED	__NETIF_F(VLAN_CHALLENGED)
>  #define NETIF_F_RXFCS		__NETIF_F(RXFCS)
>  #define NETIF_F_RXALL		__NETIF_F(RXALL)
> @@ -141,7 +143,7 @@ enum {
>  
>  /* List of features with software fallbacks. */
>  #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
> -				 NETIF_F_TSO6 | NETIF_F_UFO)
> +				 NETIF_F_TSO6 | NETIF_F_UFO | NETIF_F_UFO6)
>  
>  #define NETIF_F_GEN_CSUM	NETIF_F_HW_CSUM
>  #define NETIF_F_V4_CSUM		(NETIF_F_GEN_CSUM | NETIF_F_IP_CSUM)
> @@ -149,6 +151,7 @@ enum {
>  #define NETIF_F_ALL_CSUM	(NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
>  
>  #define NETIF_F_ALL_TSO 	(NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
> +#define NETIF_F_ALL_UFO		(NETIF_F_UFO | NETIF_F_UFO6)
>  
>  #define NETIF_F_ALL_FCOE	(NETIF_F_FCOE_CRC | NETIF_F_FCOE_MTU | \
>  				 NETIF_F_FSO)
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 74fd5d3..86af10a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3559,6 +3559,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>  	/* check flags correspondence */
>  	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_UDP     != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
> +	BUILD_BUG_ON(SKB_GSO_UDP6    != (NETIF_F_UFO6 >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 6c8b6f6..8538b67 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -372,6 +372,7 @@ enum {
>  
>  	SKB_GSO_MPLS = 1 << 12,
>  
> +	SKB_GSO_UDP6 = 1 << 13
>  };
>  
>  #if BITS_PER_LONG > 32

So this implies anything getting GSO packets e.g.
from userspace now needs to check IP version to
set GSO type correctly.

I think you missed some places that do this, e.g. af_packet
sockets.


> diff --git a/net/core/dev.c b/net/core/dev.c
> index 945bbd0..fa4d2ee 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5929,6 +5929,12 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>  		features &= ~NETIF_F_ALL_TSO;
>  	}
>  
> +	/* UFO requires that SG is present as well */
> +	if ((features & NETIF_F_ALL_UFO) && !(features & NETIF_F_SG)) {
> +		netdev_dbg(dev, "Dropping UFO features since no SG feature.\n");
> +		features &= ~NETIF_F_ALL_UFO;
> +	}
> +
>  	if ((features & NETIF_F_TSO) && !(features & NETIF_F_HW_CSUM) &&
>  					!(features & NETIF_F_IP_CSUM)) {
>  		netdev_dbg(dev, "Dropping TSO features since no CSUM feature.\n");
> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>  		features &= ~NETIF_F_GSO;
>  	}
>  
> -	/* UFO needs SG and checksumming */
> -	if (features & NETIF_F_UFO) {
> -		/* maybe split UFO into V4 and V6? */
> -		if (!((features & NETIF_F_GEN_CSUM) ||
> -		    (features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
> -			    == (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
> -			netdev_dbg(dev,
> -				"Dropping NETIF_F_UFO since no checksum offload features.\n");
> -			features &= ~NETIF_F_UFO;
> -		}
> -
> -		if (!(features & NETIF_F_SG)) {
> -			netdev_dbg(dev,
> -				"Dropping NETIF_F_UFO since no NETIF_F_SG feature.\n");
> -			features &= ~NETIF_F_UFO;
> -		}
> +	/* UFO also needs checksumming */
> +	if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
> +					!(features & NETIF_F_IP_CSUM)) {
> +		netdev_dbg(dev,
> +			   "Dropping NETIF_F_UFO since no checksum offload features.\n");
> +		features &= ~NETIF_F_UFO;
> +	}
> +	if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
> +					 !(features & NETIF_F_IPV6_CSUM)) {
> +		netdev_dbg(dev,
> +			   "Dropping NETIF_F_UFO6 since no checksum offload features.\n");
> +		features &= ~NETIF_F_UFO6;
>  	}
>  
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	if (dev->netdev_ops->ndo_busy_poll)
>  		features |= NETIF_F_BUSY_POLL;
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 06dfb29..93eff41 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -223,7 +223,7 @@ static netdev_features_t ethtool_get_feature_mask(u32 eth_cmd)
>  		return NETIF_F_ALL_TSO;
>  	case ETHTOOL_GUFO:
>  	case ETHTOOL_SUFO:
> -		return NETIF_F_UFO;
> +		return NETIF_F_ALL_UFO;
>  	case ETHTOOL_GGSO:
>  	case ETHTOOL_SGSO:
>  		return NETIF_F_GSO;
> -- 
> 1.9.3

^ permalink raw reply

* Re: [PATCH] Documentation: clarify phys_port_id
From: Florian Fainelli @ 2014-12-17 23:24 UTC (permalink / raw)
  To: Dan Williams, netdev; +Cc: Joshua Watt, jpirko
In-Reply-To: <1418835576.1160.38.camel@dcbw.local>

On 17/12/14 08:59, Dan Williams wrote:
> Signed-off-by: Dan Williams <dcbw@redhat.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

> ---
>  Documentation/ABI/testing/sysfs-class-net | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> index e1b2e78..7fe823a 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -186,7 +186,12 @@ KernelVersion:	3.12
>  Contact:	netdev@vger.kernel.org
>  Description:
>  		Indicates the interface unique physical port identifier within
> -		the NIC, as a string.
> +		the NIC, as a string.  If two net_device objects share physical
> +		hardware or other resources, and/or do not operate independently
> +		both net_device objects should be assigned the
> +		same phys_port_id.  phys_port_id should be as globally unique
> +		as possible to prevent conflicts between different drivers and
> +		vendors, eg with MAC addresses or hardware GUIDs.
>  
>  What:		/sys/class/net/<iface>/speed
>  Date:		October 2009
> 

^ permalink raw reply

* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-17 23:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, Vladislav Yasevich
  Cc: netdev, ben, stefanha, virtualization
In-Reply-To: <20141217224553.GE30969@redhat.com>

On 12/17/2014 05:45 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 17, 2014 at 01:20:46PM -0500, Vladislav Yasevich wrote:
>> Split IPv6 support for UFO into its own feature similiar to TSO.
>> This will later allow us to re-enable UFO support for virtio-net
>> devices.
>>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>>  include/linux/netdev_features.h |  7 +++++--
>>  include/linux/netdevice.h       |  1 +
>>  include/linux/skbuff.h          |  1 +
>>  net/core/dev.c                  | 35 +++++++++++++++++++----------------
>>  net/core/ethtool.c              |  2 +-
>>  5 files changed, 27 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>> index dcfdecb..a078945 100644
>> --- a/include/linux/netdev_features.h
>> +++ b/include/linux/netdev_features.h
>> @@ -48,8 +48,9 @@ enum {
>>  	NETIF_F_GSO_UDP_TUNNEL_BIT,	/* ... UDP TUNNEL with TSO */
>>  	NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
>>  	NETIF_F_GSO_MPLS_BIT,		/* ... MPLS segmentation */
>> +	NETIF_F_UFO6_BIT,		/* ... UDPv6 fragmentation */
>>  	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
>> -		NETIF_F_GSO_MPLS_BIT,
>> +		NETIF_F_UFO6_BIT,
>>  
>>  	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
>>  	NETIF_F_SCTP_CSUM_BIT,		/* SCTP checksum offload */
>> @@ -109,6 +110,7 @@ enum {
>>  #define NETIF_F_TSO_ECN		__NETIF_F(TSO_ECN)
>>  #define NETIF_F_TSO		__NETIF_F(TSO)
>>  #define NETIF_F_UFO		__NETIF_F(UFO)
>> +#define NETIF_F_UFO6		__NETIF_F(UFO6)
>>  #define NETIF_F_VLAN_CHALLENGED	__NETIF_F(VLAN_CHALLENGED)
>>  #define NETIF_F_RXFCS		__NETIF_F(RXFCS)
>>  #define NETIF_F_RXALL		__NETIF_F(RXALL)
>> @@ -141,7 +143,7 @@ enum {
>>  
>>  /* List of features with software fallbacks. */
>>  #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
>> -				 NETIF_F_TSO6 | NETIF_F_UFO)
>> +				 NETIF_F_TSO6 | NETIF_F_UFO | NETIF_F_UFO6)
>>  
>>  #define NETIF_F_GEN_CSUM	NETIF_F_HW_CSUM
>>  #define NETIF_F_V4_CSUM		(NETIF_F_GEN_CSUM | NETIF_F_IP_CSUM)
>> @@ -149,6 +151,7 @@ enum {
>>  #define NETIF_F_ALL_CSUM	(NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
>>  
>>  #define NETIF_F_ALL_TSO 	(NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
>> +#define NETIF_F_ALL_UFO		(NETIF_F_UFO | NETIF_F_UFO6)
>>  
>>  #define NETIF_F_ALL_FCOE	(NETIF_F_FCOE_CRC | NETIF_F_FCOE_MTU | \
>>  				 NETIF_F_FSO)
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 74fd5d3..86af10a 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -3559,6 +3559,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>>  	/* check flags correspondence */
>>  	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_UDP     != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
>> +	BUILD_BUG_ON(SKB_GSO_UDP6    != (NETIF_F_UFO6 >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 6c8b6f6..8538b67 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -372,6 +372,7 @@ enum {
>>  
>>  	SKB_GSO_MPLS = 1 << 12,
>>  
>> +	SKB_GSO_UDP6 = 1 << 13
>>  };
>>  
>>  #if BITS_PER_LONG > 32
> 
> So this implies anything getting GSO packets e.g.
> from userspace now needs to check IP version to
> set GSO type correctly.
> 
> I think you missed some places that do this, e.g. af_packet
> sockets.
> 

I looked at af_packet sockets and they set this only in the event
vnet header has been used with a GSO type.  In this case, the user
already knows the type.

It is true that with this series af_packets now can't do IPv6 UFO
since there is no VIRTIO_NET_HDR_GSO_UDPV6 yet.

I suppose we could do something similar there as we do in tun code/macvtap code.
If that's the case, it's currently broken as well.

-vlad

> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 945bbd0..fa4d2ee 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5929,6 +5929,12 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>  		features &= ~NETIF_F_ALL_TSO;
>>  	}
>>  
>> +	/* UFO requires that SG is present as well */
>> +	if ((features & NETIF_F_ALL_UFO) && !(features & NETIF_F_SG)) {
>> +		netdev_dbg(dev, "Dropping UFO features since no SG feature.\n");
>> +		features &= ~NETIF_F_ALL_UFO;
>> +	}
>> +
>>  	if ((features & NETIF_F_TSO) && !(features & NETIF_F_HW_CSUM) &&
>>  					!(features & NETIF_F_IP_CSUM)) {
>>  		netdev_dbg(dev, "Dropping TSO features since no CSUM feature.\n");
>> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>  		features &= ~NETIF_F_GSO;
>>  	}
>>  
>> -	/* UFO needs SG and checksumming */
>> -	if (features & NETIF_F_UFO) {
>> -		/* maybe split UFO into V4 and V6? */
>> -		if (!((features & NETIF_F_GEN_CSUM) ||
>> -		    (features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
>> -			    == (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
>> -			netdev_dbg(dev,
>> -				"Dropping NETIF_F_UFO since no checksum offload features.\n");
>> -			features &= ~NETIF_F_UFO;
>> -		}
>> -
>> -		if (!(features & NETIF_F_SG)) {
>> -			netdev_dbg(dev,
>> -				"Dropping NETIF_F_UFO since no NETIF_F_SG feature.\n");
>> -			features &= ~NETIF_F_UFO;
>> -		}
>> +	/* UFO also needs checksumming */
>> +	if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
>> +					!(features & NETIF_F_IP_CSUM)) {
>> +		netdev_dbg(dev,
>> +			   "Dropping NETIF_F_UFO since no checksum offload features.\n");
>> +		features &= ~NETIF_F_UFO;
>> +	}
>> +	if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
>> +					 !(features & NETIF_F_IPV6_CSUM)) {
>> +		netdev_dbg(dev,
>> +			   "Dropping NETIF_F_UFO6 since no checksum offload features.\n");
>> +		features &= ~NETIF_F_UFO6;
>>  	}
>>  
>> +
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>>  	if (dev->netdev_ops->ndo_busy_poll)
>>  		features |= NETIF_F_BUSY_POLL;
>> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
>> index 06dfb29..93eff41 100644
>> --- a/net/core/ethtool.c
>> +++ b/net/core/ethtool.c
>> @@ -223,7 +223,7 @@ static netdev_features_t ethtool_get_feature_mask(u32 eth_cmd)
>>  		return NETIF_F_ALL_TSO;
>>  	case ETHTOOL_GUFO:
>>  	case ETHTOOL_SUFO:
>> -		return NETIF_F_UFO;
>> +		return NETIF_F_ALL_UFO;
>>  	case ETHTOOL_GGSO:
>>  	case ETHTOOL_SGSO:
>>  		return NETIF_F_GSO;
>> -- 
>> 1.9.3

^ permalink raw reply

* Re: [Xen-devel] xen-netback: make feature-rx-notify mandatory -- Breaks stubdoms
From: John @ 2014-12-17 23:29 UTC (permalink / raw)
  To: David Vrabel
  Cc: netdev@vger.kernel.org, Wei Liu, Ian Campbell,
	Xen-devel@lists.xen.org
In-Reply-To: <54918C89.1020409@citrix.com>

On 12/17/2014 6:00 AM, David Vrabel wrote:
> This patch works for me.  I tested it with a hacked Linux frontend that
> disabled feature-rx-notify, but not with a stubdom.
>
> Can you give it a try, please?

I tested this and it does seem to work -- at least, a stubdom-based domU 
starts up and runs properly. That said, I'm not using the minios network 
functionality much, since all of my customer and test domains use PVHVM 
drivers.

-John

^ permalink raw reply

* Re: Bug: mv643xxx fails with highmem
From: Russell King - ARM Linux @ 2014-12-18  0:03 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: David Miller, Nimrod Andy, Fabio Estevam, netdev, fugang.duan
In-Reply-To: <5491F342.5090301@free-electrons.com>

On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
> On the other side, I haven't been able to reproduce this on my boards. I
> did try to put a hack to hold most lowmem pages, but it didn't make a
> difference. (In fact, I haven't been able to clearly see how the pages
> for the skbuff are allocated from high memory.)

To be honest, I don't know either.  All that I can do is describe what
happened...

I've been running 3.17 since a week after it came out, and never saw a
problem there.

Then I moved forward to 3.18, and ended up with memory corruption, which
seemed to be the GPU scribbling over kernel text (since the oops revealed
pixel values in the Code: line.)

I thought it was a GPU problem - which seemed a reasonable assumption as
I know that the runtime PM I implemented for the GPU doesn't properly
restore the hardware state yet.  So, I rebooted back into 3.18, this
time with all GPU users disabled, intending to download a kernel with
GPU runtime PM disabled.  I'd also added additional debug to my X DDX
driver which logged the GPU command stream to a file on a NFS mount -
this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
time it submits a block of commands.

However, while my scripts to download the built kernel to the Cubox
were running, the kernel oopsed in the depths of dma_map_single() - the
kernel was trying to access a struct page for phys address 0x40000000,
which didn't exist.  I decided to go back to 3.17 to get the updated
kernel on it, hoping that would sort it out.

With the updated 3.18 kernel (with GPU runtime PM disabled), I found
that I'd still get oopses from the network driver while X was starting
up, again from dma_map_single().  So, with all GPU users again disabled,
I set about debugging this issue.

I added a BUG_ON(!addr) after the page_address(), and that fired.  I
added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
(Each time, I had to boot back to 3.17 in order to download the new
kernel, because every time I tried with 3.18, I'd hit this bug.)

That's when I reported the issue and asked the questions...

I've since done a simple change, taking advantage that on ARM (or any
asm-generic/dma-mapping-common.h user), dma_unmap_single() and
dma_unmap_page() are the same function:

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index d44560d1d268..c343ab03ab8b 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -879,10 +879,8 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
                skb_frag_t *this_frag;
                int tx_index;
                struct tx_desc *desc;
-               void *addr;

                this_frag = &skb_shinfo(skb)->frags[frag];
-               addr = page_address(this_frag->page.p) + this_frag->page_offset;
                tx_index = txq->tx_curr_desc++;
                if (txq->tx_curr_desc == txq->tx_ring_size)
                        txq->tx_curr_desc = 0;
@@ -902,8 +900,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)

                desc->l4i_chk = 0;
                desc->byte_cnt = skb_frag_size(this_frag);
-               desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
-                                              desc->byte_cnt, DMA_TO_DEVICE);
+               desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
+                                                this_frag, 0,
+                                                desc->byte_cnt, DMA_TO_DEVICE);
        }
 }

I've been running that for the last five days, and I've yet to see
/any/ issues whatsoever, and that includes running with the GPU
logging all that time:

-rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin

During that time, I've been using the device over the network, running
various git commands, running builds, running the occasional build
via NFS, etc.

So, for me it was trivially easy to reproduce (without my fix in place)
and all problems have gone away when I've fixed the apparent problem.

However, exactly how it occurs, I don't know.  My understanding from
reading the various feature flags was that NETIF_F_HIGHDMA was required
for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
be seeing highmem fragments - which is why I asked the question in my
original email.
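
The expectation described above can be written down as a small userspace
model. This is only a sketch of the check as the paragraph describes it, not
the kernel's illegal_highdma() itself: MODEL_F_HIGHDMA, struct model_frag and
model_illegal_highdma() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for NETIF_F_HIGHDMA; the bit value is arbitrary in this model. */
#define MODEL_F_HIGHDMA 0x1u

/* Hypothetical, simplified fragment: only records where its page lives. */
struct model_frag {
	bool page_is_highmem;
};

/*
 * Returns true when a packet should not reach the driver as-is:
 * the device did not advertise HIGHDMA, yet a fragment sits in highmem.
 */
bool model_illegal_highdma(unsigned int features,
			   const struct model_frag *frags, size_t nr_frags)
{
	size_t i;

	assert(frags != NULL || nr_frags == 0);

	if (features & MODEL_F_HIGHDMA)
		return false;	/* device claims it can DMA from highmem */

	for (i = 0; i < nr_frags; i++)
		if (frags[i].page_is_highmem)
			return true;

	return false;
}
```

If the stack always applied a check of this shape before calling the driver,
a device without the flag would never be handed a highmem fragment, which is
why the observed behaviour is surprising.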

If you want me to revert my fix above, and reproduce again, I can
certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
in there, but I seem to remember that it wasn't particularly useful as
the backtrace didn't show where the memory actually came from.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: [iproute2] tc: Show classes more hierarchically]
From: Vadim Kochan @ 2014-12-17 23:56 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Marcelo Ricardo Leitner, vadim4j, netdev
In-Reply-To: <20141217115535.3f5198d2@urahara>

On Wed, Dec 17, 2014 at 11:55:35AM -0800, Stephen Hemminger wrote:
> On Tue, 16 Dec 2014 16:12:41 -0200
> Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> 
> > On 15-12-2014 20:48, vadim4j@gmail.com wrote:
> > > Hi All,
> > >
> > > I am playing with showing classes in a more hierarchical format. I
> > > have some code, and example output from my TC looks like:
> > >
> > > # tc/tc -t class show dev tap0
> > >
> > >   \---1:2 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >          \---1:40 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >          \---1:50 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >          \---1:60 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >   \---1:1 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >          \---1:10 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >                 \---1:11 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >                        \---1:111 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >          \---1:20 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >          \---1:30 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >
> > >
> > > which in standard output mode looks like:
> > >
> > > # tc/tc class show dev tap0
> > >
> > > class htb 1:11 parent 1:10 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:111 parent 1:11 prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:10 parent 1:1 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> > > class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> > > class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > > class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > >
> > > So I'd like to ask whether it would be useful for TC users to have
> > > this (maybe in a better format?).
> > 
> > Good idea! It already looks good, but what about:
> > 
> >    |-- 1:2 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >    |      |-- 1:40 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >    |      |-- 1:50 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >    |      '-- 1:60 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >    |-- 1:1 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >    ...
> > 
> > just another idea..
> > 
> > Thanks.
> >    Marcelo
> 
> There are several places that also print tree format, hopefully there would
> be reusable code (lspci, tree, ps).
> 

OK, currently I have the following output:

+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b 
|   +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b 
|   +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
|   +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
|   
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b 
    |   +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    |   |   +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
    |   |   +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
    |   |   
    |   +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
    |   
    +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

How about this ?
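
For reference, the general approach to drawing such a tree can be sketched in
a few lines of C. This is a minimal illustration and not the actual tc patch:
struct cls, same_parent(), render() and render_tree() are hypothetical names,
the input is assumed to be an acyclic flat list like the one "tc class show"
prints, and the spacer lines after the last child are left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flat record, loosely modelled on one "tc class show" line. */
struct cls {
	const char *id;		/* e.g. "1:10" */
	const char *parent;	/* parent id, or NULL for a root class */
};

static int same_parent(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Append the subtree below `parent` to `out`; returns bytes used so far. */
static size_t render(const struct cls *c, size_t n, const char *parent,
		     const char *prefix, char *out, size_t outsz, size_t used)
{
	size_t i, last = n;

	/* Find the last child first, so we know where the '|' rail stops. */
	for (i = 0; i < n; i++)
		if (same_parent(c[i].parent, parent))
			last = i;

	for (i = 0; i < n; i++) {
		char child_prefix[128];
		int w;

		if (!same_parent(c[i].parent, parent))
			continue;

		w = snprintf(out + used, outsz - used, "%s+---%s\n",
			     prefix, c[i].id);
		if (w < 0 || (size_t)w >= outsz - used)
			return outsz;	/* output truncated, give up */
		used += (size_t)w;

		/* Children of the last sibling get blanks instead of a rail. */
		snprintf(child_prefix, sizeof(child_prefix), "%s%s",
			 prefix, i == last ? "    " : "|   ");
		used = render(c, n, c[i].id, child_prefix, out, outsz, used);
	}
	return used;
}

/* Returns the number of bytes written, or outsz if the buffer was too small. */
size_t render_tree(const struct cls *c, size_t n, char *out, size_t outsz)
{
	assert(out != NULL && outsz > 0);
	out[0] = '\0';
	return render(c, n, NULL, "", out, outsz, 0);
}
```

Scanning the whole list for every node is quadratic, which should not matter
for the number of classes a typical qdisc has; building a real child list per
class would avoid that at the cost of more code.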

Regards,
Vadim Kochan

