Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] xprtrdma: silence frame size warning
From: Paul Bolle @ 2014-01-13 19:35 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Trond Myklebust, J. Bruce Fields, Miller David S.,
	Linux NFS Mailing List, netdev-u79uwXL29TY76Z2rM5mHXA,
	LKML Kernel
In-Reply-To: <D3560A1A-59B3-4480-8627-9114A4B9895C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On Mon, 2014-01-13 at 11:17 -0500, Chuck Lever wrote:
> I’m building a queue of NFS/RDMA work on bugzilla.kernel.org.  Let’s
> create a defect report there to document this, and it will get
> prioritized with the rest.  Paul, can you do that to start us off?
> Product “File system”, Component “NFS”.

Sure, see https://bugzilla.kernel.org/show_bug.cgi?id=68661 . Please
feel free to edit the bug's title, etc, as you see fit.

> I can’t say that a warning on 32-bit x86 is going to be an especially
> high priority.

I see. 32-bit x86 seems to be dropping in relevance quite fast.

On the other hand, this is one of the last warnings I see when currently
building x86 (32-bit, that is) and it would be rather nice to see this
warning gone. Since my .config is basically a Fedora 20 .config, that
would help make Fedora's 32-bit x86 build (almost) warning free too. 

> However, the underlying issue of allocating arrays of data segments on
> the stack is something that needs extended attention, and is already
> in plan.

Thanks,


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH -next] gre_offload: fix sparse non static symbol warning
From: David Miller @ 2014-01-13 19:38 UTC (permalink / raw)
  To: weiyj.lk; +Cc: xeb, kuznet, jmorris, yoshfuji, kaber, yongjun_wei, netdev
In-Reply-To: <CAPgLHd_s+M1XkL+Bf-Zd8CtOyt0aaOaNLRevO2JK=oX8zunR4Q@mail.gmail.com>

From: Wei Yongjun <weiyj.lk@gmail.com>
Date: Thu, 9 Jan 2014 22:22:05 +0800

> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> Fixes the following sparse warning:
> 
> net/ipv4/gre_offload.c:253:5: warning:
>  symbol 'gre_gro_complete' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next 3/4] flowcache: Fixup flow cache part in xfrm policy
From: Cong Wang @ 2014-01-13 19:42 UTC (permalink / raw)
  To: Fan Du; +Cc: steffen.klassert, David Miller, netdev
In-Reply-To: <1389599348-5214-4-git-send-email-fan.du@windriver.com>

On Sun, Jan 12, 2014 at 11:49 PM, Fan Du <fan.du@windriver.com> wrote:
>  void xfrm_garbage_collect(struct net *net)
>  {
> -       flow_cache_flush();
> +       flow_cache_flush(net);
>         __xfrm_garbage_collect(net);
>  }
>  EXPORT_SYMBOL(xfrm_garbage_collect);
>
>  static void xfrm_garbage_collect_deferred(struct net *net)
>  {
> -       flow_cache_flush_deferred();
> +       flow_cache_flush_deferred(net);
>         __xfrm_garbage_collect(net);
>  }
>

You changed the prototypes of flow_cache_flush*() in the previous
patch, so, here you break bisect. They have to be in one commit.

^ permalink raw reply

* Re: [PATCH v4 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Casey Schaufler @ 2014-01-13 19:44 UTC (permalink / raw)
  To: Jan Kaluza, davem
  Cc: LKML, netdev, eparis, rgb, tj, lizefan, containers, cgroups, viro
In-Reply-To: <1389600109-30739-1-git-send-email-jkaluza@redhat.com>

On 1/13/2014 12:01 AM, Jan Kaluza wrote:
> Hi,
>
> this patchset against net-next (applies also to linux-next) adds 3 new types
> of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).

How about the group list, while you're at it?

>
> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
>
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-living processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.
>
> Changes introduced in this patchset can also increase performance
> of such server-like processes, because current way of opening and
> parsing /proc/$PID/* files is much more expensive than receiving these
> metadata using SCM.
>
> Changes in v4:
> - Rebased to work with the latest net-next tree
>
> Changes in v3:
> - Better description of patches (Thanks to Kay Sievers)
>
> Changes in v2:
> - use PATH_MAX instead of PAGE_SIZE in SCM_CGROUP patch
> - describe each patch individually
>
> Jan Kaluza (3):
>   Send loginuid and sessionid in SCM_AUDIT
>   Send comm and cmdline in SCM_PROCINFO
>   Send cgroup_path in SCM_CGROUP
>
>  include/linux/socket.h |  9 ++++++
>  include/net/af_unix.h  | 10 ++++++
>  include/net/scm.h      | 67 ++++++++++++++++++++++++++++++++++++++--
>  net/core/scm.c         | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/unix/af_unix.c     | 70 ++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 237 insertions(+), 2 deletions(-)
>

^ permalink raw reply

* Re: [PATCH net-next] net/mlx4_en: call gro handler for encapsulated frames
From: David Miller @ 2014-01-13 19:44 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, amirv, ogerlitz, hkchu
In-Reply-To: <1389292213.31367.49.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 09 Jan 2014 10:30:13 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> In order to use the native GRO handling of encapsulated protocols on
> mlx4, we need to call napi_gro_receive() instead of netif_receive_skb()
> unless busy polling is in action.
> 
> While we are at it, rename mlx4_en_cq_ll_polling() to
> mlx4_en_cq_busy_polling()
> 
> Tested with GRE tunnel : GRO aggregation is now performed on the
> ethernet device instead of being done later on gre device.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: gro: change GRO overflow strategy
From: David Miller @ 2014-01-13 19:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ncardwell, hkchu
In-Reply-To: <1389305539.31367.65.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 09 Jan 2014 14:12:19 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> GRO layer has a limit of 8 flows being held in GRO list,
> for performance reason.
> 
> When a packet comes for a flow not yet in the list,
> and list is full, we immediately give it to upper
> stacks, lowering aggregation performance.
> 
> With TSO auto sizing and FQ packet scheduler, this situation
> happens more often.
> 
> This patch changes strategy to simply evict the oldest flow of
> the list. This works better because of the nature of packet
> trains for which GRO is efficient. This also has the effect
> of lowering the GRO latency if many flows are competing.
> 
> Tested :
> 
> Used a 40Gbps NIC, with 4 RX queues, and 200 concurrent TCP_STREAM
> netperf.
> 
> Before patch, aggregate rate is 11Gbps (while a single flow can reach
> 30Gbps)
> 
> After patch, line rate is reached.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, looks great.

If someone is really concerned about improving the list traversal,
we can maintain a tail pointer.

^ permalink raw reply

* Re: [Patch net-next 6/7] net_sched: cls: move allocation in ->init to generic layer
From: David Miller @ 2014-01-13 19:49 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, tgraf, jhs
In-Reply-To: <1389312845-10304-7-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu,  9 Jan 2014 16:14:04 -0800

> +	if (!head->init) {
>  		head->handle = handle;
> -
> -		tcf_tree_lock(tp);
> -		tp->root = head;
> -		tcf_tree_unlock(tp);
> +		head->init = true;
 ...
>  		head->mask = mask;
> -
> -		tcf_tree_lock(tp);
> -		tp->root = head;
> -		tcf_tree_unlock(tp);
> +		head->init = true;

Like Jamal, I don't think these transformations are valid.

You can't make the root visible in the hierarchy until the
->handle and ->mask (respectively) members are initialized
in these two classifiers.

What I'm going to do for now is apply patches 1-5 and 7.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] gre_offload: simplify GRE header length calculation in gre_gso_segment()
From: David Miller @ 2014-01-13 19:54 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ncardwell, netdev, edumazet, hkchu, pshelar
In-Reply-To: <1389319758.31367.81.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 09 Jan 2014 18:09:18 -0800

> On Thu, 2014-01-09 at 20:47 -0500, Neal Cardwell wrote:
>> Simplify the GRE header length calculation in gre_gso_segment().
>> Switch to an approach that is simpler, faster, and more general. The
>> new approach will continue to be correct even if we add support for
>> the optional variable-length routing info that may be present in a GRE
>> header.
>> 
>> Signed-off-by: Neal Cardwell <ncardwell@google.com>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: H.K. Jerry Chu <hkchu@google.com>
>> Cc: Pravin B Shelar <pshelar@nicira.com>
>> ---
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH 1/1] drivers: net: silence compiler warning in smc91x.c
From: David Miller @ 2014-01-13 19:54 UTC (permalink / raw)
  To: sergei.shtylyov
  Cc: pankaj.dubey, linux-samsung-soc, linux-arm-kernel, jg1.han,
	netdev
In-Reply-To: <52D05B41.7000601@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Fri, 10 Jan 2014 23:42:41 +0300

> Hello.
> 
> On 01/10/2014 06:04 AM, Pankaj Dubey wrote:
> 
>> If used 64 bit compiler GCC warns that:
> 
>> drivers/net/ethernet/smsc/smc91x.c:1897:7:
>> warning: cast from pointer to integer of different
>> size [-Wpointer-to-int-cast]
> 
>> This patch fixes this by changing typecase from "unsigned int" to
>> "unsigned long"
> 
>    Only "typecast".

Applied with this typo fixed, thanks.

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: David Miller @ 2014-01-13 19:59 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, netdev
In-Reply-To: <CA+mtBx_c=we-13b0Vyrim0=kArhx0OYY1P99HwUeuR2w87hRFw@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Fri, 10 Jan 2014 08:27:20 -0800

> On Thu, Jan 9, 2014 at 9:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2014-01-09 at 20:54 -0800, Tom Herbert wrote:
>>> When initializing a gro_list for a packet, first check the rxhash of
>>> the incoming skb against that of the skb's in the list. This should be
>>> a very strong inidicator of whether the flow is going to be matched,
>>> and potentially allows a lot of other checks to be short circuited.
>>>
>>
>> Hmm... this idea was discussed in the past. I used it when attempting to
>> use a hash table instead of a gro_list last year.
>>
>> Unfortunately this added lot of cycles when rxhash is not provided by
>> hardware, and my tests found it was not a win.
>>
>> Remember : in most cases, gro_list contains one flow, so this test does
>> nothing special but adds overhead.
> 
> I don't understand what your basis is that gro_list in most cases
> contains one flow

It also doesn't jive well with Eric's recent patch to adjust the GRO
overflow strategy (600adc18eba823f9fd8ed5fec8b04f11dddf3884 ("net:
gro: change GRO overflow strategy"))

:-)

I sort of like Tom's idea to optimistically compare the hash, if we
do in fact have one already.

Eric would the change be OK if Tom did it that way?

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: namespace and dead code cleanups
From: David Miller @ 2014-01-13 20:00 UTC (permalink / raw)
  To: stephen; +Cc: ariele, netdev
In-Reply-To: <20140109222011.59eb4ff1@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Thu, 9 Jan 2014 22:20:11 -0800

> Fix a bunch of whole lot of namespace issues with the Broadcom bnx2x driver
> found by running 'make namespacecheck'
> 
>  * global variables must be prefixed with bnx2x_
>     naming a variable int_mode, or num_queue is invitation to disaster
> 
>  * make local functions static
> 
>  * move some inline's used in one file out of header
>    (this driver has a bad case of inline-itis)
> 
>  * remove resulting dead code fallout
>  	 bnx2x_pfc_statistic,
> 	 bnx2x_emac_get_pfc_stat
>  	 bnx2x_init_vlan_mac_obj,
>    Looks like vlan mac support in this driver was a botch from day one
>    either never worked, or not implemented or missing support functions
> 
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] l2tp: make local functions static
From: David Miller @ 2014-01-13 20:01 UTC (permalink / raw)
  To: stephen; +Cc: jchapman, netdev
In-Reply-To: <20140109222227.3b3b5776@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Thu, 9 Jan 2014 22:22:27 -0800

> Avoid needless export of local functions
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Tom Herbert @ 2014-01-13 20:13 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, Linux Netdev List
In-Reply-To: <20140113.115913.1269834557058575064.davem@davemloft.net>

On Mon, Jan 13, 2014 at 11:59 AM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Fri, 10 Jan 2014 08:27:20 -0800
>
>> On Thu, Jan 9, 2014 at 9:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Thu, 2014-01-09 at 20:54 -0800, Tom Herbert wrote:
>>>> When initializing a gro_list for a packet, first check the rxhash of
>>>> the incoming skb against that of the skb's in the list. This should be
>>>> a very strong inidicator of whether the flow is going to be matched,
>>>> and potentially allows a lot of other checks to be short circuited.
>>>>
>>>
>>> Hmm... this idea was discussed in the past. I used it when attempting to
>>> use a hash table instead of a gro_list last year.
>>>
>>> Unfortunately this added lot of cycles when rxhash is not provided by
>>> hardware, and my tests found it was not a win.
>>>
>>> Remember : in most cases, gro_list contains one flow, so this test does
>>> nothing special but adds overhead.
>>
>> I don't understand what your basis is that gro_list in most cases
>> contains one flow
>
> It also doesn't jive well with Eric's recent patch to adjust the GRO
> overflow strategy (600adc18eba823f9fd8ed5fec8b04f11dddf3884 ("net:
> gro: change GRO overflow strategy"))
>
> :-)
>
> I sort of like Tom's idea to optimistically compare the hash, if we
> do in fact have one already.
>
> Eric would the change be OK if Tom did it that way?

btw, I'm also looking at "if ((a ^ b) | (c ^ d)...)" versus "if ((a !=
b) || (c != d)...)".  There's a pretty small number of functions that
use this trick. In isolating one for testing,  I really don't see the
^ | method as being much of win, even with a modest amount of branch
correct branch prediction || can be better, if we can get mostly
correct branch prediction it can be significantly better. Before I fix
this, is there any background I should know about? :-)

Tom

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Eric Dumazet @ 2014-01-13 20:17 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev
In-Reply-To: <20140113.115913.1269834557058575064.davem@davemloft.net>

On Mon, 2014-01-13 at 11:59 -0800, David Miller wrote:

> It also doesn't jive well with Eric's recent patch to adjust the GRO
> overflow strategy (600adc18eba823f9fd8ed5fec8b04f11dddf3884 ("net:
> gro: change GRO overflow strategy"))
> 
> :-)
> 
> I sort of like Tom's idea to optimistically compare the hash, if we
> do in fact have one already.
> 
> Eric would the change be OK if Tom did it that way?
> --

Yes this is what I suggested, but it seems Tom had something different
in mind.

I would rather not call flow dissector from GRO, especially considering
nobody but Google uses RPS/RFS (otherwise CVE-2013-4348 would have been
discovered much sooner)

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] path mtu hardening patches
From: Hannes Frederic Sowa @ 2014-01-13 20:42 UTC (permalink / raw)
  To: John Heffner
  Cc: David Miller, Netdev, Eric Dumazet, steffen.klassert, fweimer
In-Reply-To: <CABrhC0n2AgHNQg-2nQk-sx1=k2=TvmGa+2seqoNT=XyDdz59yg@mail.gmail.com>

On Mon, Jan 13, 2014 at 02:35:31PM -0500, John Heffner wrote:
> On Mon, Jan 13, 2014 at 2:25 PM, David Miller <davem@davemloft.net> wrote:
> > From: hannes@stressinduktion.org
> > Date: Thu,  9 Jan 2014 10:01:14 +0100
> >
> >> After a lot of back and forth I want to propose these changes regarding
> >> path mtu hardening and give an outline why I think this is the best way
> >> how to proceed:
> >
> > I'm not going to fight this any more even though I still disagree with
> > these changes.  John Heffner has not provided a coherent strong
> > argument for not doing this, in fact the counter arguments were
> > extremely vague.
> >
> > I am pretty sure that now my worst fears will be realized and every
> > single distribution will not use the kernel's default, and everyone
> > will get this behavior rather than adminstrators making well informed
> > decisions about how to defend against these kind of situations when
> > enabling routing, or whether they'd even be exposed to the issue at
> > all in a particular setup.
> >
> > Such is life.

:/

> My only comment would be not to look to me as the only source of
> reason not to include this change.  I've been largely disconnected
> from Linux development for several years and don't have time to get
> into a protracted discussion on this topic.
> 
> FWIW, I still have doubts as to whether this is the best approach to
> solving the underlying problem.  I still haven't heard any reason why
> firewall rules and other administrative best practices, such as using

Because we currently cannot easily filter icmp payloads and check whether
it is in a response for a local socket or a malicious one.

> separate management and forwarding interfaces on a router, don't
> practically solve this problem.

I don't think this is practiable, especially in times of small devices
doing routing (e.g. smartphones).

> I'd also be curious to hear what
> dedicated routing operating systems do, and why I haven't heard about
> widespread fragmentation DoS attacks.

My old Cisco didn't honour those pmtu packets (at least in default
configuration) and FreeBSD only accepts pmtu information for TCP sockets
where it also verifies the sequence number. It does not react to pmtu
notifications in response to icmp or udp payloads.

Routing path does use the pmtu values on FreeBSD, though. But it is much
harder to inject path mtu packets there because, as said, they are only
accepted for tcp.

> That said, this probably won't deeply break anything, but adds yet
> more complexity in the core of the network stack...

Actually I feel a bit emberrased how these changes went in. At first I
thought they wouldn't be that much of a problem. I'll try to improve my
communiation skills and try to listen more carefully.

Thank you, David and John!

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Ben Hutchings @ 2014-01-13 20:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, therbert, netdev
In-Reply-To: <1389644256.31367.223.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, 2014-01-13 at 12:17 -0800, Eric Dumazet wrote:
> On Mon, 2014-01-13 at 11:59 -0800, David Miller wrote:
> 
> > It also doesn't jive well with Eric's recent patch to adjust the GRO
> > overflow strategy (600adc18eba823f9fd8ed5fec8b04f11dddf3884 ("net:
> > gro: change GRO overflow strategy"))
> > 
> > :-)
> > 
> > I sort of like Tom's idea to optimistically compare the hash, if we
> > do in fact have one already.
> > 
> > Eric would the change be OK if Tom did it that way?
> > --
> 
> Yes this is what I suggested, but it seems Tom had something different
> in mind.
> 
> I would rather not call flow dissector from GRO, especially considering
> nobody but Google uses RPS/RFS (otherwise CVE-2013-4348 would have been
> discovered much sooner)

According to the original report of that vulnerability:

> skb_flow_dissect() were used by several places:                         
> - packet scheduler that want classify flows                             
> - skb_get_rxhash() that will be used by RPS, vxlan, multiqueue          
> tap,macvtap packet fanout                                               
> - skb_probe_transport_header() which was used for probing transport     
> header for DODGY packets                                                
> - __skb_get_poff() which will be used by socket filter             

So flow dissector is already part of the attack surface for both local
and remote users in common configurations.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] path mtu hardening patches
From: John Heffner @ 2014-01-13 21:08 UTC (permalink / raw)
  To: John Heffner, David Miller, Netdev, Eric Dumazet,
	steffen.klassert, fweimer
In-Reply-To: <20140113204253.GI6586@order.stressinduktion.org>

On Mon, Jan 13, 2014 at 3:42 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Mon, Jan 13, 2014 at 02:35:31PM -0500, John Heffner wrote:
>> My only comment would be not to look to me as the only source of
>> reason not to include this change.  I've been largely disconnected
>> from Linux development for several years and don't have time to get
>> into a protracted discussion on this topic.
>>
>> FWIW, I still have doubts as to whether this is the best approach to
>> solving the underlying problem.  I still haven't heard any reason why
>> firewall rules and other administrative best practices, such as using
>
> Because we currently cannot easily filter icmp payloads and check whether
> it is in a response for a local socket or a malicious one.
>
>> separate management and forwarding interfaces on a router, don't
>> practically solve this problem.
>
> I don't think this is practiable, especially in times of small devices
> doing routing (e.g. smartphones).
>
>> I'd also be curious to hear what
>> dedicated routing operating systems do, and why I haven't heard about
>> widespread fragmentation DoS attacks.
>
> My old Cisco didn't honour those pmtu packets (at least in default
> configuration) and FreeBSD only accepts pmtu information for TCP sockets
> where it also verifies the sequence number. It does not react to pmtu
> notifications in response to icmp or udp payloads.
>
> Routing path does use the pmtu values on FreeBSD, though. But it is much
> harder to inject path mtu packets there because, as said, they are only
> accepted for tcp.

Would it be sufficient to allow Linux to be configured in a way that
matches FreeBSD's behavior?  (I believe you can do this easily with
stateful firewall rules now, or possibly in the ICMP processing code
with a sysctl switch.)  I feel this would be a much cleaner approach.

Thanks,
  -John

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Tom Herbert @ 2014-01-13 21:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Netdev List
In-Reply-To: <1389644256.31367.223.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, Jan 13, 2014 at 12:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2014-01-13 at 11:59 -0800, David Miller wrote:
>
>> It also doesn't jive well with Eric's recent patch to adjust the GRO
>> overflow strategy (600adc18eba823f9fd8ed5fec8b04f11dddf3884 ("net:
>> gro: change GRO overflow strategy"))
>>
>> :-)
>>
>> I sort of like Tom's idea to optimistically compare the hash, if we
>> do in fact have one already.
>>
>> Eric would the change be OK if Tom did it that way?
>> --
>
> Yes this is what I suggested, but it seems Tom had something different
> in mind.
>

I will add skb_get_hash_noeval

> I would rather not call flow dissector from GRO, especially considering
> nobody but Google uses RPS/RFS (otherwise CVE-2013-4348 would have been
> discovered much sooner)

Or maybe nobody uses IPIP. ;-)

>
>
>

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] path mtu hardening patches
From: Hannes Frederic Sowa @ 2014-01-13 21:28 UTC (permalink / raw)
  To: John Heffner
  Cc: David Miller, Netdev, Eric Dumazet, steffen.klassert, fweimer
In-Reply-To: <CABrhC0=9ff8aLG4y4gxiGrGBBHvdcWD2pCC3wyLTveKHRySg4A@mail.gmail.com>

On Mon, Jan 13, 2014 at 04:08:22PM -0500, John Heffner wrote:
> On Mon, Jan 13, 2014 at 3:42 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Mon, Jan 13, 2014 at 02:35:31PM -0500, John Heffner wrote:
> >> My only comment would be not to look to me as the only source of
> >> reason not to include this change.  I've been largely disconnected
> >> from Linux development for several years and don't have time to get
> >> into a protracted discussion on this topic.
> >>
> >> FWIW, I still have doubts as to whether this is the best approach to
> >> solving the underlying problem.  I still haven't heard any reason why
> >> firewall rules and other administrative best practices, such as using
> >
> > Because we currently cannot easily filter icmp payloads and check whether
> > it is in a response for a local socket or a malicious one.
> >
> >> separate management and forwarding interfaces on a router, don't
> >> practically solve this problem.
> >
> > I don't think this is practiable, especially in times of small devices
> > doing routing (e.g. smartphones).
> >
> >> I'd also be curious to hear what
> >> dedicated routing operating systems do, and why I haven't heard about
> >> widespread fragmentation DoS attacks.
> >
> > My old Cisco didn't honour those pmtu packets (at least in default
> > configuration) and FreeBSD only accepts pmtu information for TCP sockets
> > where it also verifies the sequence number. It does not react to pmtu
> > notifications in response to icmp or udp payloads.
> >
> > Routing path does use the pmtu values on FreeBSD, though. But it is much
> > harder to inject path mtu packets there because, as said, they are only
> > accepted for tcp.
> 
> Would it be sufficient to allow Linux to be configured in a way that
> matches FreeBSD's behavior?  (I believe you can do this easily with
> stateful firewall rules now, or possibly in the ICMP processing code
> with a sysctl switch.)  I feel this would be a much cleaner approach.

Actually, this is part of this series. The hardened path mtu mode provides
exactly that (Patch 3).

But because we cannot switch this on by default, I also protected the
forwarding path. UDP path mtu discovery has been too long available on
Linux and, I guess, a lot of applications, especially running on routers,
depend on that.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Eric Dumazet @ 2014-01-13 21:36 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List
In-Reply-To: <CA+mtBx_a0cnqXHFk3zRuQQnRNUqMet=N3h10HPgTGghyGoh=xA@mail.gmail.com>

On Mon, 2014-01-13 at 13:24 -0800, Tom Herbert wrote:

> 
> Or maybe nobody uses IPIP. ;-)

The bug happened even without IPIP being used on the node.

Changing one byte from 0x45 to 0x40 was enough to trigger it.

Since we do not perform header checksum in flow dissector, normal
corrupted traffic on the Internet would have a chance to trigger the
infinite loop.

^ permalink raw reply

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Eric Dumazet @ 2014-01-13 21:37 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, therbert, netdev
In-Reply-To: <1389646213.2025.159.camel@bwh-desktop.uk.level5networks.com>

On Mon, 2014-01-13 at 20:50 +0000, Ben Hutchings wrote:

> According to the original report of that vulnerability:
> 
> > skb_flow_dissect() were used by several places:                         
> > - packet scheduler that want classify flows                             
> > - skb_get_rxhash() that will be used by RPS, vxlan, multiqueue          
> > tap,macvtap packet fanout                                               
> > - skb_probe_transport_header() which was used for probing transport     
> > header for DODGY packets                                                
> > - __skb_get_poff() which will be used by socket filter             
> 
> So flow dissector is already part of the attack surface for both local
> and remote users in common configurations.

Take a debian or Android distro, and this bug is not hit on 'common
configurations'. Send them a 'packet of death', they will not hang,
unless some admin set up RPS/RFS on the incoming device.

Anyway, I see no point pushing this schem (flow dissect all incoming
packets). A router has no gain going to L4 header, but still might
need GRO.

^ permalink raw reply

* [PATCH RFC] reciprocal_divide: correction/update of the algorithm
From: Hannes Frederic Sowa @ 2014-01-13 21:42 UTC (permalink / raw)
  To: netdev; +Cc: dborkman, eric.dumazet, linux-kernel, darkjames-ws

This patch is a RFC and part of a series Daniel Borkmann and me want to
do when introducing prandom_u32_range{,_ro} and prandom_u32_max{,_ro}
helpers later this week.

At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct:
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

Also: reciprocal_value and reciprocal_divide always return 0 for
divisions by 1.  This is a bit worrisome as those functions also get
used in mm/slab.c and lib/flex_array.c. Bonding already seems to check
for the 1-divisor case and handles that correctly. We don't know about
other problems, yet.

I propose an correction/update of the algorithm
based on the paper "T. Granlund and P. L. Montgomery:
Division by Invariant Integers Using Multiplication"
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556>

The assembler implementation from Agner Fog, found here
<http://www.agner.org/optimize/asmlib.zip>, helped a lot while
implementing.

I would like to have feedback if people see problems with this patch or
have concerns about performance. I did some testing on x86-64 and found
no problems so far but did no performance evaluation, yet.

The current code does break the call-sides of reciprocal_divide. The necessary
changes will be part of the full series, then.

Thanks!
---
 include/linux/reciprocal_div.h | 12 +++++++++---
 lib/reciprocal_div.c           | 22 ++++++++++++++++++----
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/include/linux/reciprocal_div.h b/include/linux/reciprocal_div.h
index f9c90b3..6f17a87 100644
--- a/include/linux/reciprocal_div.h
+++ b/include/linux/reciprocal_div.h
@@ -22,11 +22,17 @@
  * Should not be called before each reciprocal_divide(),
  * or else the performance is slower than a normal divide.
  */
-extern u32 reciprocal_value(u32 B);

+struct reciprocal_value {
+	u32 m;
+	u8 sh1, sh2;
+};

-static inline u32 reciprocal_divide(u32 A, u32 R)
+struct reciprocal_value reciprocal_value(u32 d);
+
+static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
 {
-	return (u32)(((u64)A * R) >> 32);
+	u32 t = (u32)(((u64)a * R.m) >> 32);
+	return (t + ((a - t) >> R.sh1)) >> R.sh2;
 }
 #endif
diff --git a/lib/reciprocal_div.c b/lib/reciprocal_div.c
index 75510e9..b741b30 100644
--- a/lib/reciprocal_div.c
+++ b/lib/reciprocal_div.c
@@ -1,11 +1,25 @@
+#include <linux/kernel.h>
 #include <asm/div64.h>
 #include <linux/reciprocal_div.h>
 #include <linux/export.h>

-u32 reciprocal_value(u32 k)
+/* For a description of the algorithmus please look at
+ * linux/reciprocal_div.h
+ */
+
+struct reciprocal_value reciprocal_value(u32 d)
 {
-	u64 val = (1LL << 32) + (k - 1);
-	do_div(val, k);
-	return (u32)val;
+	struct reciprocal_value R;
+	u64 m;
+	int l;
+
+	l = fls(d - 1);
+	m = ((1ULL << 32) * ((1ULL << l) - d));
+	do_div(m, d);
+	++m;
+	R.m = (u32)m;
+	R.sh1 = min(l, 1);
+	R.sh2 = max(l-1, 0);
+	return R;
 }
 EXPORT_SYMBOL(reciprocal_value);
-- 
1.8.4.2

^ permalink raw reply related

* Re: [PATCH net-next v4 0/3] path mtu hardening patches
From: John Heffner @ 2014-01-13 21:50 UTC (permalink / raw)
  To: John Heffner, David Miller, Netdev, Eric Dumazet,
	steffen.klassert, fweimer
In-Reply-To: <20140113212808.GJ6586@order.stressinduktion.org>

On Mon, Jan 13, 2014 at 4:28 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Mon, Jan 13, 2014 at 04:08:22PM -0500, John Heffner wrote:
>> Would it be sufficient to allow Linux to be configured in a way that
>> matches FreeBSD's behavior?  (I believe you can do this easily with
>> stateful firewall rules now, or possibly in the ICMP processing code
>> with a sysctl switch.)  I feel this would be a much cleaner approach.
>
> Actually, this is part of this series. The hardened path mtu mode provides
> exactly that (Patch 3).
>
> But because we cannot switch this on by default, I also protected the
> forwarding path. UDP path mtu discovery has been too long available on
> Linux and, I guess, a lot of applications, especially running on routers,
> depend on that.

Perhaps I misunderstood your description of FreeBSD then.  It seems
hard for me to believe that MTU discovery for UDP is broken by default
in FreeBSD.  It was not as of a couple years ago...

The nice thing about stateful firewall rules is that they give you
fine-grained policies over which ICMP messages you want to trust, and
can filter out messages that don't match "connections" with existing
state across a wide variety of protocols (including TCP, UDP and
ICMP).

  -John

^ permalink raw reply

* RE: [PATCH 07/15] net: i40e calls skb_set_hash
From: Brown, Aaron F @ 2014-01-13 21:53 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <CAL3LdT4M0gHP5bW3A1GDJGy2zrt4+YQpK3i8WhAp31N2yHsQkg@mail.gmail.com>

> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Jeff Kirsher
> Sent: Wednesday, December 18, 2013 12:40 AM
> To: Tom Herbert
> Cc: David Miller; netdev
> Subject: Re: [PATCH 07/15] net: i40e calls skb_set_hash
> 
> On Tue, Dec 17, 2013 at 11:27 PM, Tom Herbert <therbert@google.com> wrote:
> > Drivers should call skb_set_hash to set the hash and its type in an
> > skbuff.
> >
> > Signed-off-by: Tom Herbert <therbert@google.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> 
> I have added this patch to my queue, thanks Tom.

And unfortunately it fails to compile giving the following error:
------------------------------------------------------------------------
drivers/net/ethernet/intel/i40e/i40e_txrx.c: In function 'i40e_clean_rx_irq':
drivers/net/ethernet/intel/i40e/i40e_txrx.c:1088: error: too few arguments to function 'i40e_rx_checksum'

^ permalink raw reply

* Re: [PATCH net-next 0/2] vxlan updates
From: Cong Wang @ 2014-01-13 21:59 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, netdev
In-Reply-To: <1389634880-4138-1-git-send-email-dborkman@redhat.com>

On Mon, Jan 13, 2014 at 9:41 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> Did the split into two patches upon request from Cong Wang.
>
> Changelog:
>
>  v1->v2:
>   - Removed BUG_ON as it's not needed.
>  v2->v3:
>   - Removed dev->reg_state check for netns.
>  v3->v4:
>   - Removed list_del(), we seem to do it in some places and
>     in some others not; we agreed it's not really necessary.
>   - Split patch into 2 patches, notifier part and module
>     unload cleanup part.


Looks good to my eyes, so

Reviewed-by: Cong Wang <cwang@twopensource.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox