Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: UBSAN reports issue in ip_idents_reserve
From: Eric Dumazet @ 2016-09-20 15:25 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <1474380711.23058.8.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2016-09-20 at 07:11 -0700, Eric Dumazet wrote:
> On Tue, 2016-09-20 at 15:39 +0200, Jiri Pirko wrote:
> 
> > I see. So how to silent the warning?
> > 
> 
> We can replace the atomic_add_return() and use a loop around
> atomic_read() and atomic_cmpxhg()
> 
> This would change the nice property of x86 xadd into a loop.
> 
> Or we also could fallback to random generation if the atomic_cmpxchg()
> fails.
> 
> I'll provide a patch, thanks.
> 

I looks at other places, I am surprised you do not see other UBSAN
issues in networking :)

netdev_refcnt_read() can potentially gives errors as well.

^ permalink raw reply

* Re: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments
From: Jason Baron @ 2016-09-20 15:19 UTC (permalink / raw)
  To: Mintz, Yuval, davem@davemloft.net
  Cc: netdev@vger.kernel.org, Ariel.Elior@qlogic.com
In-Reply-To: <BL2PR07MB2306ECB3DBAB53BFB9CF30EC8DF70@BL2PR07MB2306.namprd07.prod.outlook.com>



On 09/20/2016 11:00 AM, Mintz, Yuval wrote:
>>> The question I rose was whether it actually makes a difference under
>>> such circumstances whether the device would actually filter those
>>> multicast addresses or be completely multicast promiscuous.
>>> e.g., whether it's significant to be filtering out multicast ingress
>>> traffic when you're already allowing 1/2 of all random multicast
>>> packets to be classified for the interface.
>>>
>>
>> Agreed, I think this is the more interesting question here. I thought that we
>> would want to make sure we are using most of the bins before falling back to
>> multicast ingress. The reason being that even if its more expensive for the NIC to
>> do the filtering than the multicast mode, it would be more than made up for by
>> having to drop the traffic higher up the stack. So I think if we can determine the
>> percent of the bins that we want to use, we can then back into the average
>> number of filters required to get there. As I said, I thought we would want to
>> make sure we filled basically all the bins (with a high probability that is) before
>> falling back to multicast, and so I threw out 2,048.
>
> AFAIK configuring multiple filters doesn't incur any performance penalty
> from the adapter side.
> And I agree that from 'offloading' perspective it's probably better to
> filter in HW even if the gain is negligible.
> So for the upper limit - there's not much of a reason to it; The only gain
> would be to prevent driver from allocating lots-and-lots of memory
> temporarily for an unnecessary configuration.
>

Ok. We already have an upper limit to an extent with 
/proc/sys/net/ipv4/igmp_max_memberships. And as posted I didn't include 
one b/c of the higher level limits already in place.

Thanks,

-Jason

^ permalink raw reply

* Re: [PATCH] netfilter: xt_socket: fix transparent match for IPv6 request sockets
From: Eric Dumazet @ 2016-09-20 15:05 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: Pablo Neira Ayuso, netfilter-devel, Alex Badics, netdev
In-Reply-To: <1474383672.23058.19.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2016-09-20 at 08:01 -0700, Eric Dumazet wrote:

> 
> Something like :
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 3ebf45b38bc3..0fccfd6cc258 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6264,6 +6264,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>  	/* Note: tcp_v6_init_req() might override ir_iif for link locals */
>  	inet_rsk(req)->ir_iif = inet_request_bound_dev_if(sk, skb);
>  
> +	ireq->no_srccheck = inet_sk(sk)->transparent;

Since there is no @ireq there, simply use :

	inet_rsk(req)->no_srccheck = inet_sk(sk)->transparent;


>  	af_ops->init_req(req, sk, skb);
>  

^ permalink raw reply

* Re: [PATCH] netfilter: xt_socket: fix transparent match for IPv6 request sockets
From: Eric Dumazet @ 2016-09-20 15:01 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: Pablo Neira Ayuso, netfilter-devel, Alex Badics, netdev
In-Reply-To: <20160920132637.221892-1-hidden@balabit.com>

On Tue, 2016-09-20 at 15:26 +0200, KOVACS Krisztian wrote:
> The introduction of TCP_NEW_SYN_RECV state, and the addition of request
> sockets to the ehash table seems to have broken the --transparent option
> of the socket match for IPv6 (around commit a9407000).
> 
> Now that the socket lookup finds the TCP_NEW_SYN_RECV socket instead of the
> listener, the --transparent option tries to match on the no_srccheck flag
> of the request socket.
> 
> Unfortunately, that flag was only set for IPv4 sockets in tcp_v4_init_req()
> by copying the transparent flag of the listener socket. This effectively
> causes '-m socket --transparent' not match on the ACK packet sent by the
> client in a TCP handshake.
> 
> This change adds the same code initializing no_srccheck to
> tcp_v6_init_req(), rendering the above scenario working again.
> 
> Signed-off-by: Alex Badics <alex.badics@balabit.com>
> Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
> ---
>  net/ipv6/tcp_ipv6.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 94f4f89..21f2e5c 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -690,6 +690,7 @@ static void tcp_v6_init_req(struct request_sock *req,
>  
>  	ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr;
>  	ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr;
> +	ireq->no_srccheck = inet_sk(sk_listener)->transparent;
>  
>  	/* So that link locals have meaning */
>  	if (!sk_listener->sk_bound_dev_if &&

Thanks a lot for the bug hunting guys.

Could you add the precise tag to help stable backports ?

Fixes: 12-digit-sha1 ("patch title")

Since it is a netdev patch, I would also copy netdev@ (done here)

Also what about moving this (IPv4/IPv6 common code) before the
af_ops->init_req(req, sk, skb); call, since it is no longer family
specific ?

Something like :

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3ebf45b38bc3..0fccfd6cc258 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6264,6 +6264,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	/* Note: tcp_v6_init_req() might override ir_iif for link locals */
 	inet_rsk(req)->ir_iif = inet_request_bound_dev_if(sk, skb);
 
+	ireq->no_srccheck = inet_sk(sk)->transparent;
 	af_ops->init_req(req, sk, skb);
 
 	if (security_inet_conn_request(sk, skb, req))
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7158d4f8dae4..b448eb9fe1b9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1195,7 +1195,6 @@ static void tcp_v4_init_req(struct request_sock *req,
 
 	sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr);
 	sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr);
-	ireq->no_srccheck = inet_sk(sk_listener)->transparent;
 	ireq->opt = tcp_v4_save_options(skb);
 }
 



^ permalink raw reply related

* RE: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments
From: Mintz, Yuval @ 2016-09-20 15:00 UTC (permalink / raw)
  To: Jason Baron, davem@davemloft.net
  Cc: netdev@vger.kernel.org, Ariel.Elior@qlogic.com
In-Reply-To: <57E14D24.40504@akamai.com>

> > The question I rose was whether it actually makes a difference under
> > such circumstances whether the device would actually filter those
> > multicast addresses or be completely multicast promiscuous.
> > e.g., whether it's significant to be filtering out multicast ingress
> > traffic when you're already allowing 1/2 of all random multicast
> > packets to be classified for the interface.
> >
> 
> Agreed, I think this is the more interesting question here. I thought that we
> would want to make sure we are using most of the bins before falling back to
> multicast ingress. The reason being that even if its more expensive for the NIC to
> do the filtering than the multicast mode, it would be more than made up for by
> having to drop the traffic higher up the stack. So I think if we can determine the
> percent of the bins that we want to use, we can then back into the average
> number of filters required to get there. As I said, I thought we would want to
> make sure we filled basically all the bins (with a high probability that is) before
> falling back to multicast, and so I threw out 2,048.

AFAIK configuring multiple filters doesn't incur any performance penalty
from the adapter side.
And I agree that from 'offloading' perspective it's probably better to
filter in HW even if the gain is negligible. 
So for the upper limit - there's not much of a reason to it; The only gain
would be to prevent driver from allocating lots-and-lots of memory
temporarily for an unnecessary configuration.

^ permalink raw reply

* Re: [PATCH net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments
From: Jason Baron @ 2016-09-20 14:52 UTC (permalink / raw)
  To: Mintz, Yuval, davem@davemloft.net
  Cc: netdev@vger.kernel.org, Ariel.Elior@qlogic.com
In-Reply-To: <BL2PR07MB23061EAE5CF41EE56A1ED99B8DF70@BL2PR07MB2306.namprd07.prod.outlook.com>



On 09/20/2016 03:41 AM, Mintz, Yuval wrote:
>>>> Currently, we can have high order page allocations that specify
>>>> GFP_ATOMIC when configuring multicast MAC address filters.
>>>>
>>>> For example, we have seen order 2 page allocation failures with
>>>> ~500 multicast addresses configured.
>>>>
>>>> Convert the allocation for the pending list to be done in PAGE_SIZE
>>>> increments.
>>>>
>>>> Signed-off-by: Jason Baron <jbaron@akamai.com>
>>>
>>> While I appreciate the effort, I wonder whether it's worth it:
>>>
>>> - The hardware [even in its newer generation] provides an approximate
>>> based classification [I.e., hashed] with 256 bins.
>>> When configuring 500 multicast addresses, one can argue the difference
>>> between multicast-promisc mode and actual configuration is
>>> insignificant.
>>
>> With 256 bins, I think it takes close to: 256*lg(256) or 2,048 multicast addresses
>> to expect to have all bins have at least one hash, assuming a uniform distribution
>> of the hashes.
>>
>>> Perhaps the easier-to-maintain alternative would simply be to
>>> determine the maximal number of multicast addresses that can be
>>> configured using a single PAGE, and if in need of more than that
>>> simply move into multicast-promisc.
>>>
>>
>> sizeof(struct bnx2x_mcast_list_elem) = 24. So there are 170 per page on x86. So
>> if we want to fit 2,048 elements, we need 12 pages.
>
> That's not exactly what I mean - let's assume you'd have problems
> allocating more than a PAGE. According to your calculation, that
> means you're already using more than 170 multicast addresses.
> I didn't bother trying to solve the combinatorics question of how
> many bins you'd use on average for 170 filters given there are only
> 256 bins, but that would be a significant portion.

On average for 170 filters, I get an average of 124 bins in use out
of 256 possible bins.

> The question I rose was whether it actually makes a difference
> under such circumstances whether the device would actually filter
> those multicast addresses or be completely multicast promiscuous.
> e.g., whether it's significant to be filtering out multicast ingress
> traffic when you're already allowing 1/2 of all random multicast
> packets to be classified for the interface.
>

Agreed, I think this is the more interesting question here. I thought 
that we would want to make sure we are using most of the bins before 
falling back to multicast ingress. The reason being that even if its 
more expensive for the NIC to do the filtering than the multicast mode, 
it would be more than made up for by having to drop the traffic higher 
up the stack. So I think if we can determine the percent of the bins 
that we want to use, we can then back into the average number of filters 
required to get there. As I said, I thought we would want to make sure 
we filled basically all the bins (with a high probability that is) 
before falling back to multicast, and so I threw out 2,048.

Thanks,

-Jason

^ permalink raw reply

* Re: [PATCH net-next v2 05/10] bnxt_en: Fix ethtool -l|-L inconsistent channel counts.
From: Jakub Kicinski @ 2016-09-20 14:52 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, Mintz, Yuval
In-Reply-To: <1474354228-3221-6-git-send-email-michael.chan@broadcom.com>

Hi!

On Tue, 20 Sep 2016 02:50:23 -0400, Michael Chan wrote:
> The existing code is inconsistent in reporting and accepting the combined
> channel count.  bnxt_get_channels() reports maximum combined as the
> maximum rx count.  bnxt_set_channels() accepts combined count that
> cannot be bigger than max rx or max tx.
> 
> For example, if max rx = 2 and max tx = 1, we report max supported
> combined to be 2.  But if the user tries to set combined to 2, it will
> fail because 2 is bigger than max tx which is 1.
> 
> Fix the code to be consistent.  Max allowed combined = max(max_rx, max_tx).
> We will accept a combined channel count <= max(max_rx, max_tx).
> 
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

Sorry I wasn't able to respond in time but I'm with Yuval on this one.

The canonical meaning for the parameters is set by man page for ethtool:
>        rx N   Changes the number of channels with only receive queues.
> 
>        tx N   Changes the number of channels with only transmit queues.
> 
>        other N
>               Changes the number of channels used only for other purposes e.g.
>               link interrupts or SR-IOV co-ordination.
> 
>        combined N
>               Changes the number of multi-purpose channels.

Could we please agree that combined means having both RX and TX and the
others mean having only the specified one?  See for example:

e261768e9e39 ("be2net: support asymmetric rx/tx queue counts")

^ permalink raw reply

* [PATCH] cxgb4: fix signed wrap around when decrementing index idx
From: Colin King @ 2016-09-20 14:48 UTC (permalink / raw)
  To: Hariprasad S, netdev; +Cc: linux-kernel
In-Reply-To: <20160920144846.21765-1-colin.king@canonical.com>

From: Colin Ian King <colin.king@canonical.com>

Change predecrement compare to post decrement compare to avoid an
unsigned integer wrap-around comparison when decrementing idx in
the while loop.

For example, when idx is zero, the current situation will
predecrement idx in the while loop, wrapping idx to the maximum
signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index d12a73e..42a9c8d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -367,7 +367,7 @@ int request_msix_queue_irqs_uld(struct adapter *adap, unsigned int uld_type)
 	}
 	return 0;
 unwind:
-	while (--idx >= 0) {
+	while (idx-- > 0) {
 		bmap_idx = rxq_info->msix_tbl[idx];
 		free_msix_idx_in_bmap(adap, bmap_idx);
 		free_irq(adap->msix_info_ulds[bmap_idx].vec,
-- 
2.9.3

^ permalink raw reply related

* [PATCH] cxgb4: fix signed wrap around when decrementing index idx
From: Colin King @ 2016-09-20 14:48 UTC (permalink / raw)
  To: Hariprasad S, netdev; +Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Change predecrement compare to post decrement compare to avoid an
unsigned integer wrap-around comparison when decrementing idx in
the while loop.

For example, when idx is zero, the current situation will
predecrement idx in the while loop, wrapping idx to the maximum
signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index d12a73e..42a9c8d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -367,7 +367,7 @@ int request_msix_queue_irqs_uld(struct adapter *adap, unsigned int uld_type)
 	}
 	return 0;
 unwind:
-	while (--idx >= 0) {
+	while (idx-- > 0) {
 		bmap_idx = rxq_info->msix_tbl[idx];
 		free_msix_idx_in_bmap(adap, bmap_idx);
 		free_irq(adap->msix_info_ulds[bmap_idx].vec,
-- 
2.9.3

^ permalink raw reply related

* Re: DSA: Suspicious RCU usage (via rtnl_bridge_getlink)
From: Jiri Pirko @ 2016-09-20 14:46 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Andrew Lunn, Russell King - ARM Linux, Jiri Pirko, netdev,
	Paul E. McKenney
In-Reply-To: <87y42m3alm.fsf@ketchup.i-did-not-set--mail-host-address--so-tickle-me>

Tue, Sep 20, 2016 at 04:32:53PM CEST, vivien.didelot@savoirfairelinux.com wrote:
>Hi Andrew, Russell,
>
>Andrew Lunn <andrew@lunn.ch> writes:
>
>> On Tue, Sep 20, 2016 at 11:26:12AM +0100, Russell King - ARM Linux wrote:
>>> Issuing "bridge vlan show" on clearfog provokes a "suspicious RCU usage"
>>> warning from the kernel (see below).
>>> 
>>> As it's illegal to schedule while holding the RCU read lock, there's the
>>> possibility for this happening much earlier in the call sequence -
>>> mv88e6xxx_port_vlan_dump() takes a mutex, and if that mutex were already
>>> held, we'd schedule at that point.  The RCU read lock was taken by
>>> rtnl_bridge_getlink().
>>> 
>>> It looks horrible to fix - mvmdio.c as well as DSA locking are involved.
>>
>> I would say this needs fixing higher up, in the bridge code. DSA has
>> to be able to sleep, since the switch can be on any arbitrary bus,
>> MDIO, SPI, etc. This will affect pure switchdev devices as well, since
>> they often need to send a request to the switch and wait for a reply.
>
>It looks similar to when a switchdev object/attribute is added/deleted
>without the SWITCHDEV_F_DEFER flag, used in the bridge code to defer
>switchdev operations until switchdev_deferred_process() is called.
>
>This is usually used to process switchdev ops outside the bridge lock.
>
>Jiri, can switchdev_port_vlan_fill not using SWITCHDEV_F_DEFER be the
>reason for this suspicious RCU usage when issuing "bridge vlan show"?

If it is called from atomic context, it should be deferred.


>
>Thanks,
>
>        Vivien

^ permalink raw reply

* Re: [PATCH net-next 8/8] net: qualcomm: add QCA7000 UART driver
From: kbuild test robot @ 2016-09-20 14:44 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: kbuild-all, David S. Miller, Greg Kroah-Hartman, Jiri Slaby,
	netdev, linux-kernel, Stefan Wahren
In-Reply-To: <1474376544-20515-9-git-send-email-stefan.wahren@i2se.com>

[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]

Hi Stefan,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Stefan-Wahren/net-qualcomm-add-QCA7000-UART-driver/20160920-210908
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/qualcomm/qca_uart.c: In function 'qcauart_netdev_xmit':
>> drivers/net/ethernet/qualcomm/qca_uart.c:302:5: error: 'struct net_device' has no member named 'trans_start'; did you mean 'mem_start'?
     dev->trans_start = jiffies;
        ^~
   drivers/net/ethernet/qualcomm/qca_uart.c: In function 'qcauart_netdev_tx_timeout':
   drivers/net/ethernet/qualcomm/qca_uart.c:314:29: error: 'struct net_device' has no member named 'trans_start'; did you mean 'mem_start'?
          jiffies, jiffies - dev->trans_start);
                                ^~

vim +302 drivers/net/ethernet/qualcomm/qca_uart.c

   296		written = qca->tty->ops->write(qca->tty, qca->xbuff, pos - qca->xbuff);
   297		qca->xleft = (pos - qca->xbuff) - written;
   298		qca->xhead = qca->xbuff + written;
   299		n_stats->tx_bytes += written;
   300		spin_unlock(&qca->lock);
   301	
 > 302		dev->trans_start = jiffies;
   303	out:
   304		kfree_skb(skb);
   305		return NETDEV_TX_OK;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 55965 bytes --]

^ permalink raw reply

* Re: DSA: Suspicious RCU usage (via rtnl_bridge_getlink)
From: Vivien Didelot @ 2016-09-20 14:32 UTC (permalink / raw)
  To: Andrew Lunn, Russell King - ARM Linux, Jiri Pirko
  Cc: netdev, Paul E. McKenney
In-Reply-To: <20160920133833.GD20638@lunn.ch>

Hi Andrew, Russell,

Andrew Lunn <andrew@lunn.ch> writes:

> On Tue, Sep 20, 2016 at 11:26:12AM +0100, Russell King - ARM Linux wrote:
>> Issuing "bridge vlan show" on clearfog provokes a "suspicious RCU usage"
>> warning from the kernel (see below).
>> 
>> As it's illegal to schedule while holding the RCU read lock, there's the
>> possibility for this happening much earlier in the call sequence -
>> mv88e6xxx_port_vlan_dump() takes a mutex, and if that mutex were already
>> held, we'd schedule at that point.  The RCU read lock was taken by
>> rtnl_bridge_getlink().
>> 
>> It looks horrible to fix - mvmdio.c as well as DSA locking are involved.
>
> I would say this needs fixing higher up, in the bridge code. DSA has
> to be able to sleep, since the switch can be on any arbitrary bus,
> MDIO, SPI, etc. This will affect pure switchdev devices as well, since
> they often need to send a request to the switch and wait for a reply.

It looks similar to when a switchdev object/attribute is added/deleted
without the SWITCHDEV_F_DEFER flag, used in the bridge code to defer
switchdev operations until switchdev_deferred_process() is called.

This is usually used to process switchdev ops outside the bridge lock.

Jiri, can switchdev_port_vlan_fill not using SWITCHDEV_F_DEFER be the
reason for this suspicious RCU usage when issuing "bridge vlan show"?

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Pablo Neira Ayuso @ 2016-09-20 14:29 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun, daniel, ast, davem, kafai, fw, harald, netdev, sargun,
	cgroups
In-Reply-To: <83afdc54-1bbe-3530-e5fd-b74fefe9a042@zonque.org>

On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
[...]
> Why would we artificially limit the use-cases of this implementation if
> the way it stands, both filtering and introspection are possible?

Why should we place infrastructure in the kernel to filter packets so
late, and why at postrouting btw, when we can do this way earlier
before any packet is actually sent? No performance impact, no need for
skbuff allocation and *no cycles wasted to evaluate if every packet is
wanted or not*.

^ permalink raw reply

* Re: DSA: Suspicious RCU usage (via rtnl_bridge_getlink)
From: Russell King - ARM Linux @ 2016-09-20 14:28 UTC (permalink / raw)
  To: Andrew Lunn, Dave Miller, Eric Dumazet
  Cc: netdev, Vivien Didelot, Paul E. McKenney
In-Reply-To: <20160920133833.GD20638@lunn.ch>

On Tue, Sep 20, 2016 at 03:38:33PM +0200, Andrew Lunn wrote:
> On Tue, Sep 20, 2016 at 11:26:12AM +0100, Russell King - ARM Linux wrote:
> > Issuing "bridge vlan show" on clearfog provokes a "suspicious RCU usage"
> > warning from the kernel (see below).
> > 
> > As it's illegal to schedule while holding the RCU read lock, there's the
> > possibility for this happening much earlier in the call sequence -
> > mv88e6xxx_port_vlan_dump() takes a mutex, and if that mutex were already
> > held, we'd schedule at that point.  The RCU read lock was taken by
> > rtnl_bridge_getlink().
> > 
> > It looks horrible to fix - mvmdio.c as well as DSA locking are involved.
> 
> Hi Russell
> 
> I would say this needs fixing higher up, in the bridge code. DSA has
> to be able to sleep, since the switch can be on any arbitrary bus,
> MDIO, SPI, etc. This will affect pure switchdev devices as well, since
> they often need to send a request to the switch and wait for a reply.

Hmm, okay, so looking around, other rtnl operations in there just
use for_each_netdev() or for_each_netdev_safe() without taking
any locks, apart from the rtnl mutex which we can see was already
taken.

Why does rtnl_bridge_getlink use RCU?  Can we drop the RCU read lock
and switch to using for_each_netdev() here?  Adding Dave and Eric,
as I guess they're more knowledgeable of the core rtnl code.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: Sowmini Varadhan @ 2016-09-20 14:27 UTC (permalink / raw)
  To: netdev; +Cc: davem, jbenc, hannes, aduyck, daniel, pabeni, sowmini.varadhan

The vxlan header is at offset (14 + 20 + 8) into the packet,
so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
functions to modify flags and vni field in the vxh.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/vxlan.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e7d1668..f903fa4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1751,15 +1751,17 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 		goto out_free;
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = VXLAN_HF_VNI;
-	vxh->vx_vni = vxlan_vni_field(vni);
+	put_unaligned(VXLAN_HF_VNI, &vxh->vx_flags);
+	put_unaligned(vxlan_vni_field(vni), &vxh->vx_vni);
 
 	if (type & SKB_GSO_TUNNEL_REMCSUM) {
 		unsigned int start;
+		__be32 tmpvni = get_unaligned(&vxh->vx_vni);
 
 		start = skb_checksum_start_offset(skb) - sizeof(struct vxlanhdr);
-		vxh->vx_vni |= vxlan_compute_rco(start, skb->csum_offset);
-		vxh->vx_flags |= VXLAN_HF_RCO;
+		tmpvni |= vxlan_compute_rco(start, skb->csum_offset);
+		put_unaligned(tmpvni, &vxh->vx_vni);
+		put_unaligned(VXLAN_HF_VNI | VXLAN_HF_RCO, &vxh->vx_flags);
 
 		if (!skb_is_gso(skb)) {
 			skb->ip_summed = CHECKSUM_NONE;
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH v5 0/6] Add eBPF hooks for cgroups
From: Daniel Mack @ 2016-09-20 14:25 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: Pablo Neira Ayuso, htejun-b10kYP2dOMg,
	daniel-FeC+5ew28dpmcu3hnIyYJQ, ast-b10kYP2dOMg,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160919215311.GA9723-I4sfFR6g6EicJoAdRrHjTrzMkBWIpU9tytq7g7fCXyjEk0E+pv7Png@public.gmane.org>

On 09/19/2016 11:53 PM, Sargun Dhillon wrote:
> On Mon, Sep 19, 2016 at 06:34:28PM +0200, Daniel Mack wrote:
>> On 09/16/2016 09:57 PM, Sargun Dhillon wrote:

>>> Now, with this patch, we don't have that, but I think we can reasonably add some 
>>> flag like "no override" when applying policies, or alternatively something like 
>>> "no new privileges", to prevent children from applying policies that override 
>>> top-level policy.
>>
>> Yes, but the API is already guarded by CAP_NET_ADMIN. Take that
>> capability away from your children, and they can't tamper with the
>> policy. Does that work for you?
>
> No. This can be addressed in a follow-on patch, but the use-case is that I have 
> a container orchestrator (Docker, or Mesos), and systemd. The sysadmin controls 
> systemd, and Docker is controlled by devs. Typically, the system owner wants 
> some system level statistics, and filtering, and then we want to do 
> per-container filtering.
> 
> We really want to be able to do nesting with userspace tools that are oblivious, 
> and we want to delegate a level of the cgroup hierarchy to the tool that created 
> it. I do not see Docker integrating with systemd any time soon, and that's 
> really the only other alternative.

Then we'd need to find out whether you want to block other users from
installing (thus overriding) an existing eBPF program, or if you want to
allow that but execute them all. Both is possible.

[...]

>>> It would be nice to be able to see whether or not a filter is attached to a 
>>> cgroup, but given this is going through syscalls, at least introspection
>>> is possible as opposed to something like netlink.
>>
>> Sure, there are many ways. I implemented the bpf cgroup logic using an
>> own cgroup controller once, which made it possible to read out the
>> status. But as we agreed on attaching programs through the bpf(2) system
>> call, I moved back to the implementation that directly stores the
>> pointers in the cgroup.
>>
>> First enabling the controller through the fs-backed cgroup interface,
>> then come back through the bpf(2) syscall and then go back to the fs
>> interface to read out status values is a bit weird.
>>
> Hrm, that makes sense. with the BPF syscall, would there be a way to get
> file descriptor of the currently attached BPF program?

A file descriptor is local to a task, so we would need to install a new
fd and return its number. But I'm not sure what we'd gain from that.


Thanks,
Daniel

^ permalink raw reply

* Re: UBSAN reports issue in ip_idents_reserve
From: Eric Dumazet @ 2016-09-20 14:24 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <1474381092.23058.10.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2016-09-20 at 07:18 -0700, Eric Dumazet wrote:

> +	 */
> +	if (unlikely(atomic_cmpxchg(p_id, old, new) != old))
> +		new = prandom_u32();
> +	return new;

Looks like we should return new - segs;

^ permalink raw reply

* [PATCH net v2] ipmr, ip6mr: return lastuse relative to now
From: Nikolay Aleksandrov @ 2016-09-20 14:17 UTC (permalink / raw)
  To: netdev; +Cc: roopa, sashok, davem, David.Laight, Nikolay Aleksandrov
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0104872@AcuExch.aculab.com>

When I introduced the lastuse member I made a subtle error because it was
returned as an absolute value but that is meaningless to user-space as it
doesn't allow to see how old exactly an entry is. Let's make it similar to
how the bridge returns such values and make it relative to "now" (jiffies).
This allows us to show the actual age of the entries and is much more
useful (e.g. user-space daemons can age out entries, iproute2 can display
the lastuse properly).

Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: make sure lastuse is before or equal to jiffies as per David Laight's
comment

I realize that this changes the way it is exported, but since it hasn't
been in a release yet and we're most probably the only users, I think it is
worth fixing. This change allows user-space daemons to age out entries and
also for the lastuse value to be shown properly via iproute2.

 net/ipv4/ipmr.c  | 7 +++++--
 net/ipv6/ip6mr.c | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 26253328d227..a87bcd2d4a94 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2076,6 +2076,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 	struct rta_mfc_stats mfcs;
 	struct nlattr *mp_attr;
 	struct rtnexthop *nhp;
+	unsigned long lastuse;
 	int ct;
 
 	/* If cache is unresolved, don't try to parse IIF and OIF */
@@ -2105,12 +2106,14 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 
 	nla_nest_end(skb, mp_attr);
 
+	lastuse = READ_ONCE(c->mfc_un.res.lastuse);
+	lastuse = time_after_eq(jiffies, lastuse) ? jiffies - lastuse : 0;
+
 	mfcs.mfcs_packets = c->mfc_un.res.pkt;
 	mfcs.mfcs_bytes = c->mfc_un.res.bytes;
 	mfcs.mfcs_wrong_if = c->mfc_un.res.wrong_if;
 	if (nla_put_64bit(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs, RTA_PAD) ||
-	    nla_put_u64_64bit(skb, RTA_EXPIRES,
-			      jiffies_to_clock_t(c->mfc_un.res.lastuse),
+	    nla_put_u64_64bit(skb, RTA_EXPIRES, jiffies_to_clock_t(lastuse),
 			      RTA_PAD))
 		return -EMSGSIZE;
 
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 6122f9c5cc49..fccb5dd91902 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2239,6 +2239,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
 	struct rta_mfc_stats mfcs;
 	struct nlattr *mp_attr;
 	struct rtnexthop *nhp;
+	unsigned long lastuse;
 	int ct;
 
 	/* If cache is unresolved, don't try to parse IIF and OIF */
@@ -2269,12 +2270,14 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
 
 	nla_nest_end(skb, mp_attr);
 
+	lastuse = READ_ONCE(c->mfc_un.res.lastuse);
+	lastuse = time_after_eq(jiffies, lastuse) ? jiffies - lastuse : 0;
+
 	mfcs.mfcs_packets = c->mfc_un.res.pkt;
 	mfcs.mfcs_bytes = c->mfc_un.res.bytes;
 	mfcs.mfcs_wrong_if = c->mfc_un.res.wrong_if;
 	if (nla_put_64bit(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs, RTA_PAD) ||
-	    nla_put_u64_64bit(skb, RTA_EXPIRES,
-			      jiffies_to_clock_t(c->mfc_un.res.lastuse),
+	    nla_put_u64_64bit(skb, RTA_EXPIRES, jiffies_to_clock_t(lastuse),
 			      RTA_PAD))
 		return -EMSGSIZE;
 
-- 
2.1.4

^ permalink raw reply related

* Re: UBSAN reports issue in ip_idents_reserve
From: Eric Dumazet @ 2016-09-20 14:18 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <1474380711.23058.8.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2016-09-20 at 07:11 -0700, Eric Dumazet wrote:
> On Tue, 2016-09-20 at 15:39 +0200, Jiri Pirko wrote:
> 
> > I see. So how to silent the warning?
> > 
> 
> We can replace the atomic_add_return() and use a loop around
> atomic_read() and atomic_cmpxhg()
> 
> This would change the nice property of x86 xadd into a loop.
> 
> Or we also could fallback to random generation if the atomic_cmpxchg()
> fails.
> 
> I'll provide a patch, thanks.
> 

Could you try the following ?

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index
b52496fd51075821c39435f50ac62f813967aecc..91dc108ef6dc75df80f0e73b6fa062d98dc9a58a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -476,12 +476,19 @@ u32 ip_idents_reserve(u32 hash, int segs)
 	atomic_t *p_id = ip_idents + hash % IP_IDENTS_SZ;
 	u32 old = ACCESS_ONCE(*p_tstamp);
 	u32 now = (u32)jiffies;
-	u32 delta = 0;
+	u32 new, delta = 0;
 
 	if (old != now && cmpxchg(p_tstamp, old, now) == old)
 		delta = prandom_u32_max(now - old);
 
-	return atomic_add_return(segs + delta, p_id) - segs;
+	old = (u32)atomic_read(p_id);
+	new = old + delta + segs;
+	/* Do not try too hard, if multiple cpus are there,
+	 * just fallback to pseudo random number.
+	 */
+	if (unlikely(atomic_cmpxchg(p_id, old, new) != old))
+		new = prandom_u32();
+	return new;
 }
 EXPORT_SYMBOL(ip_idents_reserve);
 

^ permalink raw reply

* Re: UBSAN reports issue in ip_idents_reserve
From: Eric Dumazet @ 2016-09-20 14:11 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <20160920133915.GJ1843@nanopsycho.orion>

On Tue, 2016-09-20 at 15:39 +0200, Jiri Pirko wrote:

> I see. So how to silent the warning?
> 

We can replace the atomic_add_return() and use a loop around
atomic_read() and atomic_cmpxhg()

This would change the nice property of x86 xadd into a loop.

Or we also could fallback to random generation if the atomic_cmpxchg()
fails.

I'll provide a patch, thanks.

^ permalink raw reply

* Re: [PATCH v2 4/6] net: ethernet: bgmac: convert to feature flags
From: Jon Mason @ 2016-09-20 13:59 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: David Miller, Florian Fainelli, Hauke Mehrtens, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Ray Jui,
	Scott Branden, bcm-kernel-feedback-list, Network Development,
	Linux Kernel Mailing List, devicetree@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CACna6rymVcKV1h_PEvRuCQrtOd4rsqV34bMWzVOof15fdc8KEA@mail.gmail.com>

On Tue, Sep 20, 2016 at 1:19 AM, Rafał Miłecki <zajec5@gmail.com> wrote:
> On 17 August 2016 at 13:34, Rafał Miłecki <zajec5@gmail.com> wrote:
>> On 8 July 2016 at 01:08, Jon Mason <jon.mason@broadcom.com> wrote:
>>>         mode = (bgmac_read(bgmac, BGMAC_DEV_STATUS) & BGMAC_DS_MM_MASK) >>
>>>                 BGMAC_DS_MM_SHIFT;
>>> -       if (ci->id != BCMA_CHIP_ID_BCM47162 || mode != 0)
>>> +       if (bgmac->feature_flags & BGMAC_FEAT_CLKCTLST || mode != 0)
>>>                 bgmac_set(bgmac, BCMA_CLKCTLST, BCMA_CLKCTLST_FORCEHT);
>>> -       if (ci->id == BCMA_CHIP_ID_BCM47162 && mode == 2)
>>> +       if (bgmac->feature_flags & BGMAC_FEAT_CLKCTLST && mode == 2)
>>>                 bcma_chipco_chipctl_maskset(&bgmac->core->bus->drv_cc, 1, ~0,
>>>                                             BGMAC_CHIPCTL_1_RXC_DLL_BYPASS);
>>
>> Jon, it looks to me you translated two following conditions:
>> ci->id != BCMA_CHIP_ID_BCM47162
>> and
>> ci->id == BCMA_CHIP_ID_BCM47162
>> into the same flag check:
>> bgmac->feature_flags & BGMAC_FEAT_CLKCTLST
>>
>> I don't think it's intentional, is it? Do you have a moment to fix this?
>
> Ping

Sorry, just seeing this now.  I'll double check the original code and
verify it (or fix it).

Thanks,
Jon

>
> --
> Rafał

^ permalink raw reply

* Re: UBSAN reports issue in ip_idents_reserve
From: Eric Dumazet @ 2016-09-20 13:45 UTC (permalink / raw)
  To: David Laight; +Cc: Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB01049DD@AcuExch.aculab.com>

On Tue, 2016-09-20 at 13:36 +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 20 September 2016 14:29
> ...
> > > [   47.565420] -2117905507 + -695755206 cannot be represented in type 'int'
> ...
> > I do not think we have to worry here.
> >
> > These is best effort, and unfortunately atomic_t are int.
> 
> Not until we compile on a cpu where int arithmetic doesn't wrap.

Then I guess the guy adding this kind of arches in the kernel will have
to add all the core kernel infra.

If you have an idea, I will happily review a patch.

^ permalink raw reply

* [PATCH ipsec-next] xfrm: state lookup can be lockless
From: Florian Westphal @ 2016-09-20 13:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

This is called from the packet input path, we get lock contention
if many cpus handle ipsec in parallel.

After recent rcu conversion it is safe to call __xfrm_state_lookup
without the spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/xfrm/xfrm_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index ba8bf51..a38fdea 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1431,9 +1431,9 @@ xfrm_state_lookup(struct net *net, u32 mark, const xfrm_address_t *daddr, __be32
 {
 	struct xfrm_state *x;
 
-	spin_lock_bh(&net->xfrm.xfrm_state_lock);
+	rcu_read_lock();
 	x = __xfrm_state_lookup(net, mark, daddr, spi, proto, family);
-	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+	rcu_read_unlock();
 	return x;
 }
 EXPORT_SYMBOL(xfrm_state_lookup);
-- 
2.7.3

^ permalink raw reply related

* Re: UBSAN reports issue in ip_idents_reserve
From: Jiri Pirko @ 2016-09-20 13:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1474378115.23058.2.camel@edumazet-glaptop3.roam.corp.google.com>

Tue, Sep 20, 2016 at 03:28:35PM CEST, eric.dumazet@gmail.com wrote:
>On Tue, 2016-09-20 at 14:00 +0200, Jiri Pirko wrote:
>> Hi.
>> 
>> I'm consistently getting following UBSAN warning on every bootup:
>> 
>> [   47.545820] ================================================================================
>> [   47.554340] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
>> [   47.561808] signed integer overflow:
>> [   47.565420] -2117905507 + -695755206 cannot be represented in type 'int'
>> [   47.572226] CPU: 0 PID: 389 Comm: ntpd Not tainted 4.8.0-rc6jiri+ #1
>> [   47.578636] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
>> [   47.588586]  ffffffff847bf8c0 00000000987b8f47 ffff8803829af5a8 ffffffff818354e3
>> [   47.596165]  0000000041b58ab3 ffffffff8277e711 ffffffff81835431 ffff8803829af5d0
>> [   47.603722]  ffff8803829af580 ffffffffd6879e3a 1ffffffff08f8214 ffffed0070535e6c
>> [   47.611298] Call Trace:
>> [   47.613795]  [<ffffffff818354e3>] dump_stack+0xb2/0x10f
>> [   47.619077]  [<ffffffff81835431>] ? _atomic_dec_and_lock+0xa1/0xa1
>> [   47.625327]  [<ffffffff818a884f>] ubsan_epilogue+0xd/0x4e
>> [   47.630811]  [<ffffffff818a9821>] handle_overflow+0x190/0x1de
>> [   47.636627]  [<ffffffff818a9691>] ? __ubsan_handle_negate_overflow+0x140/0x140
>> [   47.643914]  [<ffffffff81863130>] ? iov_iter_copy_from_user_atomic+0x6e0/0x6e0
>> [   47.651219]  [<ffffffff811e6f79>] ? __lock_acquire.isra.17+0xb79/0xe50
>> [   47.657832]  [<ffffffff81e581f2>] ? ip_generic_getfrag+0xd2/0x190
>> [   47.664011]  [<ffffffff81e58120>] ? ip_setup_cork+0x320/0x320
>> [   47.669827]  [<ffffffff818a987d>] __ubsan_handle_add_overflow+0xe/0x10
>> [   47.676444]  [<ffffffff81e41d52>] ip_idents_reserve+0xb2/0xe0
>> [   47.682254]  [<ffffffff81e443e9>] __ip_select_ident+0x159/0x1b0
>> [   47.688248]  [<ffffffff81e44290>] ? update_or_create_fnhe+0x850/0x850
>> [   47.694782]  [<ffffffff81e58120>] ? ip_setup_cork+0x320/0x320
>> [   47.700624]  [<ffffffff81e5ef40>] __ip_make_skb+0x8a0/0xab0
>> [   47.706259]  [<ffffffff81e5f3fd>] ip_make_skb+0x17d/0x1d0
>> [   47.711717]  [<ffffffff81e58120>] ? ip_setup_cork+0x320/0x320
>> [   47.717526]  [<ffffffff81e5f280>] ? ip_flush_pending_frames+0x20/0x20
>> [   47.724032]  [<ffffffff81e46ef0>] ? ip_rt_update_pmtu+0x4f0/0x4f0
>> [   47.730231]  [<ffffffff81f35291>] ? xfrm_lookup_route+0x21/0xe0
>> [   47.736216]  [<ffffffff81ec0cdb>] udp_sendmsg+0x9db/0xf60
>> [   47.741668]  [<ffffffff81e58120>] ? ip_setup_cork+0x320/0x320
>> [   47.747472]  [<ffffffff81ec0300>] ? udp_abort+0x70/0x70
>> [   47.752763]  [<ffffffff81ede3d8>] inet_sendmsg+0x198/0x220
>> [   47.758324]  [<ffffffff81ede292>] ? inet_sendmsg+0x52/0x220
>> [   47.763982]  [<ffffffff81ede240>] ? inet_recvmsg+0x300/0x300
>> [   47.769728]  [<ffffffff81d6fd25>] sock_sendmsg+0xa5/0xd0
>> [   47.775100]  [<ffffffff81d72f70>] SYSC_sendto+0x1d0/0x280
>> [   47.780551]  [<ffffffff81d72da0>] ? SYSC_connect+0x200/0x200
>> [   47.786283]  [<ffffffff814f66df>] ? poll_select_copy_remaining+0x2af/0x310
>> [   47.793265]  [<ffffffff814f6430>] ? set_fd_set+0x60/0x60
>> [   47.798665]  [<ffffffff811ee360>] ? do_raw_spin_trylock+0x90/0x90
>> [   47.804853]  [<ffffffff814f80e3>] ? SyS_select+0x1a3/0x200
>> [   47.810399]  [<ffffffff814f7f40>] ? core_sys_select+0x570/0x570
>> [   47.816415]  [<ffffffff8100467c>] ? exit_to_usermode_loop+0xec/0x110
>> [   47.822842]  [<ffffffff811e8abd>] ? lockdep_sys_exit+0x2d/0xb0
>> [   47.828769]  [<ffffffff81004016>] ? lockdep_sys_exit_thunk+0x16/0x30
>> [   47.835199]  [<ffffffff81d7433e>] SyS_sendto+0xe/0x10
>> [   47.840321]  [<ffffffff820381f2>] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [   47.846826] ================================================================================
>> 
>> Looks like this might be result of following commit:
>> 
>> commit 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Sat Jul 26 08:58:10 2014 +0200
>> 
>>     ip: make IP identifiers less predictable
>> 
>> Eric, could you please take look at that?
>
>Sure
>
>I do not think we have to worry here.
>
>These is best effort, and unfortunately atomic_t are int.

I see. So how to silent the warning?

>
>Adding uatomic_t helpers in the kernel with unsigned int would be a huge
>effort, given this would touch all arches.
>
>
>

^ permalink raw reply

* RE: UBSAN reports issue in ip_idents_reserve
From: David Laight @ 2016-09-20 13:36 UTC (permalink / raw)
  To: 'Eric Dumazet', Jiri Pirko; +Cc: netdev@vger.kernel.org
In-Reply-To: <1474378115.23058.2.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet
> Sent: 20 September 2016 14:29
...
> > [   47.565420] -2117905507 + -695755206 cannot be represented in type 'int'
...
> I do not think we have to worry here.
>
> These is best effort, and unfortunately atomic_t are int.

Not until we compile on a cpu where int arithmetic doesn't wrap.

While I expect that various other parts of the kernel (and userspace)
wouldn't like such cpu, they do exist.

	David


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox