Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 14/27] net: mvpp2: add ip_version field in "struct mvpp2"
From: Thomas Petazzoni @ 2016-12-21 17:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Yehuda Yitschak, Jason Cooper, netdev, Hanna Hawa, Nadav Haklai,
	Gregory Clement, Stefan Chulski, Marcin Wojtas, David S. Miller,
	linux-arm-kernel, Sebastian Hesselbarth
In-Reply-To: <20161221170252.GO30952@lunn.ch>

Hello,

On Wed, 21 Dec 2016 18:02:52 +0100, Andrew Lunn wrote:

> On Wed, Dec 21, 2016 at 12:16:21PM +0100, Thomas Petazzoni wrote:
> > In preparation to the introduction for the support of PPv2.2 in the
> > mvpp2 driver, this commit adds an ip_version field to the struct
> > mvpp2  
> 
> When i read this, i was thinking IPv4 vs IPv6. It is a network driver after all.
> Could you maybe call this hw_version?

Sure, will do. Thanks for the feedback!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH net-next 14/27] net: mvpp2: add ip_version field in "struct mvpp2"
From: Andrew Lunn @ 2016-12-21 17:02 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, netdev, Yehuda Yitschak, Jason Cooper,
	Hanna Hawa, Nadav Haklai, Gregory Clement, Stefan Chulski,
	Marcin Wojtas, linux-arm-kernel, Sebastian Hesselbarth
In-Reply-To: <1482318994-23488-15-git-send-email-thomas.petazzoni@free-electrons.com>

Hi Thomas

Minor nit pick.

On Wed, Dec 21, 2016 at 12:16:21PM +0100, Thomas Petazzoni wrote:
> In preparation to the introduction for the support of PPv2.2 in the
> mvpp2 driver, this commit adds an ip_version field to the struct
> mvpp2

When i read this, i was thinking IPv4 vs IPv6. It is a network driver after all.
Could you maybe call this hw_version?

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH 5/5 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
From: Craig Gallek @ 2016-12-21 16:49 UTC (permalink / raw)
  To: Josef Bacik
  Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, Tom Herbert,
	netdev, kernel-team
In-Reply-To: <1482264424-15439-6-git-send-email-jbacik@fb.com>

On Tue, Dec 20, 2016 at 3:07 PM, Josef Bacik <jbacik@fb.com> wrote:
> If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
> never set it again.  Which means that in the future if we end up adding a bunch
> of reuseport sk's to that tb we'll have to do the expensive scan every time.
> Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
> Once one sk has made it onto the list we know that there are no potential bind
> conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
> common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT so
> we know we have to do an extra check for subsequent reuseport sockets and skip
> the expensive bind conflict check.
>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  include/net/inet_hashtables.h   |  4 ++++
>  net/ipv4/inet_connection_sock.c | 53 +++++++++++++++++++++++++++++++++++++----
>  2 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 50f635c..b776401 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -74,12 +74,16 @@ struct inet_ehash_bucket {
>   * users logged onto your box, isn't it nice to know that new data
>   * ports are created in O(1) time?  I thought so. ;-)  -DaveM
>   */
> +#define FASTREUSEPORT_ANY      1
> +#define FASTREUSEPORT_STRICT   2
> +
>  struct inet_bind_bucket {
>         possible_net_t          ib_net;
>         unsigned short          port;
>         signed char             fastreuse;
>         signed char             fastreuseport;
>         kuid_t                  fastuid;
> +       struct sock_common      fastsock;
>         int                     num_owners;
>         struct hlist_node       node;
>         struct hlist_head       owners;
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d3ccf62..9e29fad 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -164,6 +164,32 @@ success:
>         return head;
>  }
>
> +static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
> +                                    struct sock *sk)
> +{
> +       struct sock *sk2 = (struct sock *)&tb->fastsock;
> +       kuid_t uid = sock_i_uid(sk);
> +
> +       if (tb->fastreuseport <= 0)
> +               return 0;
> +       if (!sk->sk_reuseport)
> +               return 0;
> +       if (rcu_access_pointer(sk->sk_reuseport_cb))
> +               return 0;
> +       if (!uid_eq(tb->fastuid, uid))
> +               return 0;
> +       /* We only need to check the rcv_saddr if this tb was once marked
> +        * without fastreuseport and then was reset, as we can only know that
> +        * the fastsock has no potential bind conflicts with the rest of the
> +        * possible socks on the owners list.
> +        */
> +       if (tb->fastreuseport == FASTREUSEPORT_ANY)
> +               return 1;
> +       if (!inet_csk(sk)->icsk_af_ops->rcv_saddr_equal(sk, sk2, true))
The rcv_saddr_equal functions assume the type of the sk to be
inet_sock.  It doesn't look like they actually depend on any of the
inet-specific fields, but it's probably safer to either remove this
assumption or change the type of tb->fastsock to be a full inet_sock.

> +               return 0;
> +       return 1;
> +}
> +
>  /* Obtain a reference to a local port for the given sock,
>   * if snum is zero it means select any available local port.
>   * We try to allocate an odd port (and leave even ports for connect())
> @@ -206,9 +232,7 @@ tb_found:
>                         goto success;
>
>                 if ((tb->fastreuse > 0 && reuse) ||
> -                    (tb->fastreuseport > 0 &&
> -                     !rcu_access_pointer(sk->sk_reuseport_cb) &&
> -                     sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
> +                   sk_reuseport_match(tb, sk))
>                         goto success;
>                 if (inet_csk_bind_conflict(sk, tb, true, true))
>                         goto fail_unlock;
> @@ -220,14 +244,35 @@ success:
>                 if (sk->sk_reuseport) {
>                         tb->fastreuseport = 1;
>                         tb->fastuid = uid;
> +                       memcpy(&tb->fastsock, &sk->__sk_common,
> +                              sizeof(struct sock_common));
>                 } else {
>                         tb->fastreuseport = 0;
>                 }
>         } else {
>                 if (!reuse)
>                         tb->fastreuse = 0;
> -               if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
> +               if (sk->sk_reuseport) {
> +                       /* We didn't match or we don't have fastreuseport set on
> +                        * the tb, but we have sk_reuseport set on this socket
> +                        * and we know that there are no bind conflicts with
> +                        * this socket in this tb, so reset our tb's reuseport
> +                        * settings so that any subsequent sockets that match
> +                        * our current socket will be put on the fast path.
> +                        *
> +                        * If we reset we need to set FASTREUSEPORT_STRICT so we
> +                        * do extra checking for all subsequent sk_reuseport
> +                        * socks.
> +                        */
> +                       if (!sk_reuseport_match(tb, sk)) {
> +                               tb->fastreuseport = FASTREUSEPORT_STRICT;
> +                               tb->fastuid = uid;
> +                               memcpy(&tb->fastsock, &sk->__sk_common,
> +                                      sizeof(struct sock_common));
> +                       }
> +               } else {
>                         tb->fastreuseport = 0;
> +               }
>         }
>         if (!inet_csk(sk)->icsk_bind_hash)
>                 inet_bind_hash(sk, tb, port);
> --
> 2.9.3
>

^ permalink raw reply

* Re: mlx4: Bug in XDP_TX + 16 rx-queues
From: Tariq Toukan @ 2016-12-21 16:47 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Tariq Toukan, Saeed Mahameed, netdev@vger.kernel.org,
	Alexei Starovoitov
In-Reply-To: <20161220183117.GA63721@kafai-mba.local>



On 20/12/2016 8:31 PM, Martin KaFai Lau wrote:
> On Tue, Dec 20, 2016 at 02:02:05PM +0200, Tariq Toukan wrote:
>> Thanks Martin, nice catch!
>>
>>
>> On 20/12/2016 1:37 AM, Martin KaFai Lau wrote:
>>> Hi Tariq,
>>>
>>> On Sat, Dec 17, 2016 at 02:18:03AM -0800, Martin KaFai Lau wrote:
>>>> Hi All,
>>>>
>>>> I have been debugging with XDP_TX and 16 rx-queues.
>>>>
>>>> 1) When 16 rx-queues is used and an XDP prog is doing XDP_TX,
>>>> it seems that the packet cannot be XDP_TX out if the pkt
>>>> is received from some particular CPUs (/rx-queues).
>>>>
>>>> 2) If 8 rx-queues is used, it does not have problem.
>>>>
>>>> 3) The 16 rx-queues problem also went away after reverting these
>>>> two patches:
>>>> 15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases
>>>> 67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme
>>>>
>>> After taking a closer look at 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
>>> and armed with the fact that '>8 rx-queues does not work', I have
>>> made the attached change that fixed the issue.
>>>
>>> Making change in mlx4_en_fill_qp_context() could be an easier fix
>>> but I think this change will be easier for discussion purpose.
>>>
>>> I don't want to lie that I know anything about how this variable
>>> works in CX3.  If this change makes sense, I can cook up a diff.
>>> Otherwise, can you shed some light on what could be happening
>>> and hopefully can lead to a diff?
>>>
>>> Thanks
>>> --Martin
>>>
>>>
>>> diff --git i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> index bcd955339058..b3bfb987e493 100644
>>> --- i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> +++ w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> @@ -1638,10 +1638,10 @@ int mlx4_en_start_port(struct net_device *dev)
>>>
>>>   	/* Configure tx cq's and rings */
>>>   	for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
>>> -		u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1;
>> The bug lies in this line.
>> Number of rings per UP in case of TX_XDP should be priv->tx_ring_num[TX_XDP
>> ], not 1.
>> Please try the following fix.
>> I can prepare and send it once the window opens again.
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> index bcd955339058..edbe200ac2fa 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> @@ -1638,7 +1638,8 @@ int mlx4_en_start_port(struct net_device *dev)
>>
>>          /* Configure tx cq's and rings */
>>          for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
>> -               u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up :
>> 1;
>> +               u8 num_tx_rings_p_up = t == TX ?
>> +                       priv->num_tx_rings_p_up : priv->tx_ring_num[t];
>>
>>                  for (i = 0; i < priv->tx_ring_num[t]; i++) {
>>                          /* Configure cq */
>>
> Thanks for confirming the bug is related to the user_prio argument.
>
> I have some questions:
>
> 1. Just to confirm the intention of the change.  Your change is essentially
>     always passing 0 to the user_prio parameter for the TX_XDP type by
>     doing (i / priv->tx_ring_num[t])?
Yes
>    If yes, would it be clearer to
>     always pass 0 instead?
I think that it should be implied by num_tx_rings_p_up, it simply hadthe 
wrong value.
>
>     And yes, it also works in our test.  Please post an offical patch
>     if it is the fix.
Sure.
>
> 2. Can you explain a little on how does the user_prio affect
>     the tx behavior?  e.g. What is the difference between
>     different user_prio like 0, 1, 2...etc?
An implementation of QoS (Quality of Service).
This would tell the HW to prioritize between different send queues.
In XDP forward we use a single user prio, 0 (default).
>
> 3. Mostly a follow up on (2).
>     In mlx4_en_get_profile(), num_tx_rings_p_up (of the struct mlx4_en_profile)
>     depends on mlx4_low_memory_profile() and number of cpu.  Does these
>     similar bounds apply to the 'u8 num_tx_rings_p_up' here for
>     TX_XDP type?
No.
The number of XDP_TX rings is equal to the number of RX rings.
>
> Thanks,
> Martin
Regards,
Tariq
>>> -
>>>   		for (i = 0; i < priv->tx_ring_num[t]; i++) {
>>>   			/* Configure cq */
>>> +			int user_prio;
>>> +
>>>   			cq = priv->tx_cq[t][i];
>>>   			err = mlx4_en_activate_cq(priv, cq, i);
>>>   			if (err) {
>>> @@ -1660,9 +1660,14 @@ int mlx4_en_start_port(struct net_device *dev)
>>>
>>>   			/* Configure ring */
>>>   			tx_ring = priv->tx_ring[t][i];
>>> +			if (t != TX_XDP)
>>> +				user_prio = i / priv->num_tx_rings_p_up;
>>> +			else
>>> +				user_prio = i & 0x07;
>>> +
>>>   			err = mlx4_en_activate_tx_ring(priv, tx_ring,
>>>   						       cq->mcq.cqn,
>>> -						       i / num_tx_rings_p_up);
>>> +						       user_prio);
>>>   			if (err) {
>>>   				en_err(priv, "Failed allocating Tx ring\n");
>>>   				mlx4_en_deactivate_cq(priv, cq);
>> Regards,
>> Tariq Toukan.

^ permalink raw reply

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Rik van Riel @ 2016-12-21 16:41 UTC (permalink / raw)
  To: kernel-hardening, Jason, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	jeanphilippe.aumasson, linux-crypto, linux-kernel, luto, netdev,
	tom, torvalds, tytso, vegard.nossum
In-Reply-To: <20161221155540.29529.qmail@ns.sciencehorizons.net>

[-- Attachment #1: Type: text/plain, Size: 710 bytes --]

On Wed, 2016-12-21 at 10:55 -0500, George Spelvin wrote:
> Actually, DJB just made a very relevant suggestion.
> 
> As I've mentioned, the 32-bit performance problems are an x86-
> specific
> problem.  ARM does very well, and other processors aren't bad at all.
> 
> SipHash fits very nicely (and runs very fast) in the MMX registers.
> 
> They're 64 bits, and there are 8 of them, so the integer registers
> can
> be reserved for pointers and loop counters and all that.  And there's
> reference code available.
> 
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?

Those can be very expensive. Almost certainly not
worth it for small amounts of data.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Rik van Riel @ 2016-12-21 16:39 UTC (permalink / raw)
  To: kernel-hardening, Jason A. Donenfeld
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum
In-Reply-To: <1482335804.8944.44.camel@edumazet-glaptop3.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

On Wed, 2016-12-21 at 07:56 -0800, Eric Dumazet wrote:
> On Wed, 2016-12-21 at 15:42 +0100, Jason A. Donenfeld wrote:
> George said :
> 
> > Cycles per byte on 1024 bytes of data:
> >                       Pentium Core 2  Ivy
> >                       4       Duo     Bridge
> > SipHash-2-4           38.9     8.3     5.8
> > HalfSipHash-2-4       12.7     4.5     3.2
> > MD5                    8.3     5.7     4.7
> 
> 
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?
> 
> Reading your numbers last week, I thought SipHash was faster, but
> George
> numbers are giving the opposite impression.
> 
> I do not have a P4 to make tests, so I only can trust you or George.

Does anybody still have a P4?

If they do, they're probably better off replacing
it with an Atom. The reduced power bills will pay
for replacing that P4 within a year or two.

In short, I am not sure how important the P4
performance numbers are, especially if we can
improve security for everybody else...

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* Re: mlx4: Bug in XDP_TX + 16 rx-queues
From: Tariq Toukan @ 2016-12-21 16:37 UTC (permalink / raw)
  To: David Miller; +Cc: kafai, tariqt, saeedm, netdev, ast
In-Reply-To: <20161220.092211.428014402629205135.davem@davemloft.net>


On 20/12/2016 4:22 PM, David Miller wrote:
> From: Tariq Toukan <ttoukan.linux@gmail.com>
> Date: Tue, 20 Dec 2016 14:02:05 +0200
>
>> I can prepare and send it once the window opens again.
> Bug fixes like this can and should be sent at any time whatsoever.
Thanks Dave, I will verify my fix and send it.

Regards,
Tariq

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-21 16:37 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andi Kleen, David Miller, David Laight, Daniel J . Bernstein,
	Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Theodore Ts'o, Vegard Nossum
In-Reply-To: <20161221155540.29529.qmail@ns.sciencehorizons.net>

Hi George,

On Wed, Dec 21, 2016 at 4:55 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Actually, DJB just made a very relevant suggestion.
>
> As I've mentioned, the 32-bit performance problems are an x86-specific
> problem.  ARM does very well, and other processors aren't bad at all.
>
> SipHash fits very nicely (and runs very fast) in the MMX registers.
>
> They're 64 bits, and there are 8 of them, so the integer registers can
> be reserved for pointers and loop counters and all that.  And there's
> reference code available.
>
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?

In my experience, these functions are only worth calling when
processing more significant amounts of data. I don't have any
benchmarks, but when I _remove_ all of these calls in a kernel,
accelerated crypto gets noticeably faster (until the system crashes).
We can measure it, though.

By the way, if somehow SipHash becomes acceptably fast on x86, would
you consider HalfSipHash for hash tables to be no longer needed? Or do
you suspect that HalfSipHash will always be faster even on, say,
32-bit ARM.

Jason

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-21 16:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum
In-Reply-To: <1482335804.8944.44.camel@edumazet-glaptop3.roam.corp.google.com>

Hi Eric,

On Wed, Dec 21, 2016 at 4:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?
>
> Reading your numbers last week, I thought SipHash was faster, but George
> numbers are giving the opposite impression.
>
> I do not have a P4 to make tests, so I only can trust you or George.

I'm not sure how George came up with those numbers, but the ones I
sent are output from that benchmark function in the last email. I'd be
interested in learning this too.

As mentioned in the last email, it looks like potential 32-bit issues
are really just specific to old Intel chips. Other 32-bit
architectures do fine. So, for new kernels, even if somehow there is a
tiny performance regression (though I couldn't see one) on old
architectures, I really doubt it will affect anybody in practice.

Jason

^ permalink raw reply

* Re: [PATCH net-next 00/27] net: mvpp2: add basic support for PPv2.2
From: David Miller @ 2016-12-21 16:24 UTC (permalink / raw)
  To: thomas.petazzoni
  Cc: netdev, nadavh, hannah, yehuday, jason, andrew,
	sebastian.hesselbarth, gregory.clement, linux-arm-kernel, stefanc,
	mw
In-Reply-To: <20161221171246.77e91d2c@free-electrons.com>

From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Wed, 21 Dec 2016 17:12:46 +0100

> Hello,
> 
> On Wed, 21 Dec 2016 11:03:48 -0500 (EST), David Miller wrote:
> 
>> 1) net-next is closed, please do not submit net-next material during
>>    this time.
> 
> I did not expect you to review these patches right now, as I know
> net-next is closed. However I expect other people in the net community
> and people interested in Marvell platforms to look at those patches,
> potentially test them and give feedback.

Then mark the patches as "RFC".

^ permalink raw reply

* Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Mark Greer @ 2016-12-21 16:13 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo, robh+dt,
	mark.rutland, netdev, devicetree, linux-kernel, justin
In-Reply-To: <32FAB08E-BE8E-4127-80A6-013300B43BD0@kuvee.com>

On Wed, Dec 21, 2016 at 06:47:36AM -0500, Geoff Lansberry wrote:
> Thanks Mark.   Should I resubmit patches with the requested edits today, or wait a bit for more comments?  What is the desired etiquette?

Its up to you.  I don't think there are any hard & fast rules on this.

If it were me, I would likely spin a new version today because there are
several responses already and it lets people review them at their leisure
over the holidays.

Just a thought - you may want to consider separating the third patch from
the other two.  The problems the first two solve are well understood and
have reasonable solutions (I believe).  The third one - for me, at least -
tries to fix a problem that is not well understood yet.

Mark
--

^ permalink raw reply

* Re: [PATCH net-next 00/27] net: mvpp2: add basic support for PPv2.2
From: Thomas Petazzoni @ 2016-12-21 16:12 UTC (permalink / raw)
  To: David Miller
  Cc: andrew, yehuday, jason, netdev, hannah, nadavh, gregory.clement,
	stefanc, mw, linux-arm-kernel, sebastian.hesselbarth
In-Reply-To: <20161221.110348.346806544020316686.davem@davemloft.net>

Hello,

On Wed, 21 Dec 2016 11:03:48 -0500 (EST), David Miller wrote:

> 1) net-next is closed, please do not submit net-next material during
>    this time.

I did not expect you to review these patches right now, as I know
net-next is closed. However I expect other people in the net community
and people interested in Marvell platforms to look at those patches,
potentially test them and give feedback.

> 2) 27 patches is way too many to submit at one time, please keep your
>    patch series submissions to a small, reasonable size.

OK, I will do some arbitrary cut in the series and send several
smaller series instead.

Thanks for the feedback,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH v3 net-next 0/3] Add support for the ethernet switch on the ESPRESSObin
From: David Miller @ 2016-12-21 16:10 UTC (permalink / raw)
  To: romain.perier
  Cc: andrew, vivien.didelot, f.fainelli, jason, sebastian.hesselbarth,
	gregory.clement, netdev, devicetree, robh+dt, ijc+devicetree,
	pawel.moll, mark.rutland, galak, linux-arm-kernel,
	thomas.petazzoni, nadavh
In-Reply-To: <20161221090045.474-1-romain.perier@free-electrons.com>


net-next is not open, please do not submit net-next changes during this
time.

^ permalink raw reply

* Re: [PATCH] net: macb: Applied changes according to review made by Andy Shevchenko.
From: David Miller @ 2016-12-21 16:08 UTC (permalink / raw)
  To: bfolta
  Cc: nicolas.ferre, niklas.cassel, alexandre.torgue, satananda.burla,
	rvatsavayi, simon.horman, linux-kernel, netdev, rafalo
In-Reply-To: <SN1PR0701MB1951B994F86160A24C436F30CC930@SN1PR0701MB1951.namprd07.prod.outlook.com>

From: Bartosz Folta <bfolta@cadence.com>
Date: Wed, 21 Dec 2016 08:53:05 +0000

> This patch is sent in regard to recently applied patch
> Commit 83a77e9ec4150ee4acc635638f7dedd9da523a26
> net: macb: Added PCI wrapper for Platform Driver.
> Andy's review does not appear on mailing lists. I can't reference it.
> 
> Signed-off-by: Bartosz Folta <bfolta@cadence.com>

This commit log message and Subject line need a lot of improvement.

You should be saying what exactly you are doing, rather than "someone
said I should do this".  The latter gives the reader no information
whatsoever about what the patch is doing.

^ permalink raw reply

* Re: [PATCH perf/core REBASE 3/5] tools lib bpf: Add bpf_prog_{attach,detach}
From: Arnaldo Carvalho de Melo @ 2016-12-21 16:06 UTC (permalink / raw)
  To: Joe Stringer; +Cc: LKML, netdev, Wang Nan, ast, Daniel Borkmann
In-Reply-To: <CAPWQB7HyzaAck1LEX_ec7gRpoKjyaZPZ+dco3Ca=sR4qQ403BQ@mail.gmail.com>

Em Tue, Dec 20, 2016 at 10:50:22AM -0800, Joe Stringer escreveu:
> On 20 December 2016 at 06:32, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Em Tue, Dec 20, 2016 at 11:18:51AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> This one makes it fail for CentOS 5 and 6, others may fail as well,
> >> still building, investigating...
> >
> > Ok, fixed it by making it follow the model of the other sys_bpf wrappers
> > setting up that bpf_attr union wrt initializing unamed struct members:
> > -       union bpf_attr attr = {
> > -               .target_fd = target_fd,
> > -       };
> > +       union bpf_attr attr;
> > +
> > +       bzero(&attr, sizeof(attr));
> > +       attr.target_fd     = target_fd;

> Ah, I just shifted these across originally so the delta would be
> minimal but now I know why this code is like this. Thanks.

np, making sure this code works in all those environments requires
automation, I'd say its impossible otherwise, too many details :-\

Fixed, pushed, merged, should hit 4.10 soon :-)

- Arnaldo

^ permalink raw reply

* Re: [PATCH net-next 00/27] net: mvpp2: add basic support for PPv2.2
From: David Miller @ 2016-12-21 16:03 UTC (permalink / raw)
  To: thomas.petazzoni
  Cc: netdev, nadavh, hannah, yehuday, jason, andrew,
	sebastian.hesselbarth, gregory.clement, linux-arm-kernel, stefanc,
	mw
In-Reply-To: <1482318994-23488-1-git-send-email-thomas.petazzoni@free-electrons.com>

Two things:

1) net-next is closed, please do not submit net-next material during
   this time.

2) 27 patches is way too many to submit at one time, please keep your
   patch series submissions to a small, reasonable size.

Thanks.

^ permalink raw reply

* [PATCH] net: fddi: skfp: use %p format specifier for addresses rather than %x
From: Colin King @ 2016-12-21 16:03 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix: Addresses should be printed using the %p format specifier
rather than using %x.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/fddi/skfp/hwmtm.c | 12 ++++++------
 drivers/net/fddi/skfp/pmf.c   |  2 +-
 drivers/net/fddi/skfp/smt.c   |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/fddi/skfp/hwmtm.c b/drivers/net/fddi/skfp/hwmtm.c
index e26398b..d0a68bd 100644
--- a/drivers/net/fddi/skfp/hwmtm.c
+++ b/drivers/net/fddi/skfp/hwmtm.c
@@ -1483,7 +1483,7 @@ void mac_drv_clear_rx_queue(struct s_smc *smc)
 	r = queue->rx_curr_get ;
 	while (queue->rx_used) {
 		DRV_BUF_FLUSH(r,DDI_DMA_SYNC_FORCPU) ;
-		DB_RX("switch OWN bit of RxD 0x%x ",r,0,5) ;
+		DB_RX("switch OWN bit of RxD 0x%p ",r,0,5) ;
 		r->rxd_rbctrl &= ~cpu_to_le32(BMU_OWN) ;
 		frag_count = 1 ;
 		DRV_BUF_FLUSH(r,DDI_DMA_SYNC_FORDEV) ;
@@ -1645,7 +1645,7 @@ void hwm_tx_frag(struct s_smc *smc, char far *virt, u_long phys, int len,
 	DB_TX("hwm_tx_frag: len = %d, frame_status = %x ",len,frame_status,2) ;
 	if (frame_status & LAN_TX) {
 		/* '*t' is already defined */
-		DB_TX("LAN_TX: TxD = %x, virt = %x ",t,virt,3) ;
+		DB_TX("LAN_TX: TxD = %p, virt = %p ",t,virt,3) ;
 		t->txd_virt = virt ;
 		t->txd_txdscr = cpu_to_le32(smc->os.hwm.tx_descr) ;
 		t->txd_tbadr = cpu_to_le32(phys) ;
@@ -1819,7 +1819,7 @@ void smt_send_mbuf(struct s_smc *smc, SMbuf *mb, int fc)
 	__le32	tbctrl;
 
 	NDD_TRACE("THSB",mb,fc,0) ;
-	DB_TX("smt_send_mbuf: mb = 0x%x, fc = 0x%x",mb,fc,4) ;
+	DB_TX("smt_send_mbuf: mb = 0x%p, fc = 0x%x",mb,fc,4) ;
 
 	mb->sm_off-- ;	/* set to fc */
 	mb->sm_len++ ;	/* + fc */
@@ -1960,7 +1960,7 @@ static void mac_drv_clear_txd(struct s_smc *smc)
 
 			do {
 				DRV_BUF_FLUSH(t1,DDI_DMA_SYNC_FORCPU) ;
-				DB_TX("check OWN/EOF bit of TxD 0x%x",t1,0,5) ;
+				DB_TX("check OWN/EOF bit of TxD 0x%p",t1,0,5) ;
 				tbctrl = le32_to_cpu(CR_READ(t1->txd_tbctrl));
 
 				if (tbctrl & BMU_OWN || !queue->tx_used){
@@ -1988,7 +1988,7 @@ static void mac_drv_clear_txd(struct s_smc *smc)
 			}
 			else {
 #ifndef PASS_1ST_TXD_2_TX_COMP
-				DB_TX("mac_drv_tx_comp for TxD 0x%x",t2,0,4) ;
+				DB_TX("mac_drv_tx_comp for TxD 0x%p",t2,0,4) ;
 				mac_drv_tx_complete(smc,t2) ;
 #else
 				DB_TX("mac_drv_tx_comp for TxD 0x%x",
@@ -2052,7 +2052,7 @@ void mac_drv_clear_tx_queue(struct s_smc *smc)
 		tx_used = queue->tx_used ;
 		while (tx_used) {
 			DRV_BUF_FLUSH(t,DDI_DMA_SYNC_FORCPU) ;
-			DB_TX("switch OWN bit of TxD 0x%x ",t,0,5) ;
+			DB_TX("switch OWN bit of TxD 0x%p ",t,0,5) ;
 			t->txd_tbctrl &= ~cpu_to_le32(BMU_OWN) ;
 			DRV_BUF_FLUSH(t,DDI_DMA_SYNC_FORDEV) ;
 			t = t->txd_next ;
diff --git a/drivers/net/fddi/skfp/pmf.c b/drivers/net/fddi/skfp/pmf.c
index 441b4dc..52fa162 100644
--- a/drivers/net/fddi/skfp/pmf.c
+++ b/drivers/net/fddi/skfp/pmf.c
@@ -284,7 +284,7 @@ void smt_pmf_received_pack(struct s_smc *smc, SMbuf *mb, int local)
 	SMbuf		*reply ;
 
 	sm = smtod(mb,struct smt_header *) ;
-	DB_SMT("SMT: processing PMF frame at %x len %d\n",sm,mb->sm_len) ;
+	DB_SMT("SMT: processing PMF frame at %p len %d\n",sm,mb->sm_len) ;
 #ifdef	DEBUG
 	dump_smt(smc,sm,"PMF Received") ;
 #endif
diff --git a/drivers/net/fddi/skfp/smt.c b/drivers/net/fddi/skfp/smt.c
index cd78b7c..e80a089 100644
--- a/drivers/net/fddi/skfp/smt.c
+++ b/drivers/net/fddi/skfp/smt.c
@@ -504,7 +504,7 @@ void smt_received_pack(struct s_smc *smc, SMbuf *mb, int fs)
 #endif
 
 	smt_swap_para(sm,(int) mb->sm_len,1) ;
-	DB_SMT("SMT : received packet [%s] at 0x%x\n",
+	DB_SMT("SMT : received packet [%s] at 0x%p\n",
 		smt_type_name[m_fc(mb) & 0xf],sm) ;
 	DB_SMT("SMT : version %d, class %s\n",sm->smt_version,
 		smt_class_name[(sm->smt_class>LAST_CLASS)?0 : sm->smt_class]) ;
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
From: Josef Bacik @ 2016-12-21 15:59 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, kraigatgoog, eric.dumazet, tom, netdev, kernel-team
In-Reply-To: <1482333786.2260.10.camel@stressinduktion.org>

On Wed, Dec 21, 2016 at 10:23 AM, Hannes Frederic Sowa 
<hannes@stressinduktion.org> wrote:
> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
>>  --- a/net/dccp/ipv6.c
>>  +++ b/net/dccp/ipv6.c
>>  @@ -926,7 +926,7 @@ static const struct inet_connection_sock_af_ops 
>> dccp_ipv6_af_ops = {
>>   	.getsockopt	   = ipv6_getsockopt,
>>   	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
>>   	.sockaddr_len	   = sizeof(struct sockaddr_in6),
>>  -	.bind_conflict	   = inet6_csk_bind_conflict,
>>  +	.rcv_saddr_equal   = ipv6_rcv_saddr_equal,
>>   #ifdef CONFIG_COMPAT
>>   	.compat_setsockopt = compat_ipv6_setsockopt,
>>   	.compat_getsockopt = compat_ipv6_getsockopt,
>> 
> 
> Btw, small nit, you forgot the corresponding changes in
> dccp_ipv6_mapped, thus causing this compiler error:
> 
> net/dccp/ipv6.c:961:2: error: unknown field ‘bind_conflict’ 
> specified in initializer
>   .bind_conflict    = inet6_csk_bind_conflict,
>   ^
> net/dccp/ipv6.c:961:22: error: ‘inet6_csk_bind_conflict’ 
> undeclared here (not in a function)
>   .bind_conflict    = inet6_csk_bind_conflict,
>                       ^~~~~~~~~~~~~~~~~~~~~~~
> scripts/Makefile.build:293: recipe for target 'net/dccp/ipv6.o' failed

Yeah kbuild caught that yesterday and I have it fixed up in my git tree 
already, thanks,

Josef

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Eric Dumazet @ 2016-12-21 15:56 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: George Spelvin, Theodore Ts'o, Andi Kleen, David Miller,
	David Laight, Daniel J . Bernstein, Eric Biggers,
	Hannes Frederic Sowa, Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum
In-Reply-To: <CAHmME9oCfCwAq1qP09uAN6vvakh4wXDMHunsL9D7W_LDeN_OQQ@mail.gmail.com>

On Wed, 2016-12-21 at 15:42 +0100, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> I computed performance numbers for both 32-bit and 64-bit using the
> actual functions in which talking about replacing MD5 with SipHash.
> The basic harness is here [1] if you're curious. SipHash was a pretty
> clear winner for both cases.
> 
> x86_64:
> [    1.714302] secure_tcpv6_sequence_number_md5# cycles: 102373398
> [    1.747685] secure_tcp_sequence_number_md5# cycles: 92042258
> [    1.773522] secure_tcpv6_sequence_number_siphash# cycles: 70786533
> [    1.798701] secure_tcp_sequence_number_siphash# cycles: 68941043
> 
> x86:
> [    1.635749] secure_tcpv6_sequence_number_md5# cycles: 106016335
> [    1.670259] secure_tcp_sequence_number_md5# cycles: 95670512
> [    1.708387] secure_tcpv6_sequence_number_siphash# cycles: 105988635
> [    1.740264] secure_tcp_sequence_number_siphash# cycles: 88225395
> 
> >>> 102373398 > 70786533
> True
> >>> 92042258 > 68941043
> True
> >>> 106016335 > 105988635
> True
> >>> 95670512 > 88225395
> True
> 
> While MD5 is probably faster for some kind of large-data
> cycles-per-byte, due to its 64-byte internal state, SipHash -- the
> "Sip" part standing "Short Input PRF" -- is fast for shorter inputs.
> In practice with the functions we're talking about replacing, there's
> no need to hash 64-bytes. So, SipHash comes out faster and more
> secure.
> 
> I also haven't begun to look focusedly at the assembly my SipHash
> implemention is generating, which means there's still window for even
> more performance improvements.
> 
> Jason
> 
> 
> [1] https://git.zx2c4.com/linux-dev/tree/net/core/secure_seq.c?h=siphash-bench#n194

Now I am quite confused.

George said :

> Cycles per byte on 1024 bytes of data:
>                       Pentium Core 2  Ivy
>                       4       Duo     Bridge
> SipHash-2-4           38.9     8.3     5.8
> HalfSipHash-2-4               12.7     4.5     3.2
> MD5                    8.3     5.7     4.7


That really was for 1024 bytes blocks, so pretty much useless for our
discussion ?

Reading your numbers last week, I thought SipHash was faster, but George
numbers are giving the opposite impression.

I do not have a P4 to make tests, so I only can trust you or George.

Thanks.

^ permalink raw reply

* RE: [PATCH net] tcp: add a missing barrier in tcp_tasklet_func()
From: Madalin-Cristian Bucur @ 2016-12-21 14:23 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Eric Dumazet, Miffy Lei
In-Reply-To: <1482327763.8944.26.camel@edumazet-glaptop3.roam.corp.google.com>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, December 21, 2016 3:43 PM
> 
> Madalin reported crashes happening in tcp_tasklet_func() on powerpc64
> 
> Before TSQ_QUEUED bit is cleared, we must ensure the changes done
> by list_del(&tp->tsq_node); are committed to memory, otherwise
> corruption might happen, as an other cpu could catch TSQ_QUEUED
> clearance too soon.
> 
> We can notice that old kernels were immune to this bug, because
> TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk)
> section, but they could have missed a kick to write additional bytes,
> when NIC interrupts for a given flow are spread to multiple cpus.
> 
> Affected TCP flows would need an incoming ACK or RTO timer to add more
> packets to the pipe. So overall situation should be better now.
> 
> Fixes: b223feb9de2a ("tcp: tsq: add shortcut in tcp_tasklet_func()")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Madalin Bucur <madalin.bucur@nxp.com>
> Tested-by: Madalin Bucur <madalin.bucur@nxp.com>

It's actually tested by Xing Lei:

Tested-by: Xing Lei <xing.lei@nxp.com>

Thank you for the fast resolution.

> ---
>  net/ipv4/tcp_output.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index
> b45101f3d2bd2e0f0077305a061add4f7ea0de27..31a255b555ad86a3537c077862e3ea38
> f9b44284 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -769,6 +769,7 @@ static void tcp_tasklet_func(unsigned long data)
>  		list_del(&tp->tsq_node);
> 
>  		sk = (struct sock *)tp;
> +		smp_mb__before_atomic();
>  		clear_bit(TSQ_QUEUED, &sk->sk_tsq_flags);
> 
>  		if (!sk->sk_lock.owned &&
> 


^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: George Spelvin @ 2016-12-21 15:55 UTC (permalink / raw)
  To: Jason, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CAHmME9oJDOLpPKRpX=N+DY9BuzTueWjTaWzeVtdVMBG7mcrqKA@mail.gmail.com>

Actually, DJB just made a very relevant suggestion.

As I've mentioned, the 32-bit performance problems are an x86-specific
problem.  ARM does very well, and other processors aren't bad at all.

SipHash fits very nicely (and runs very fast) in the MMX registers.

They're 64 bits, and there are 8 of them, so the integer registers can
be reserved for pointers and loop counters and all that.  And there's
reference code available.

How much does kernel_fpu_begin()/kernel_fpu_end() cost?

Although there are a lot of pre-MMX x86es in embedded control applications,
I don't think anyone is worried about their networking performance.
(Specifically, all of this affects only connection setup, not throughput 
on established connections.)

^ permalink raw reply

* Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
From: Hannes Frederic Sowa @ 2016-12-21 15:23 UTC (permalink / raw)
  To: Josef Bacik, davem, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-2-git-send-email-jbacik@fb.com>

On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
> --- a/net/dccp/ipv6.c
> +++ b/net/dccp/ipv6.c
> @@ -926,7 +926,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = {
>  	.getsockopt	   = ipv6_getsockopt,
>  	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
>  	.sockaddr_len	   = sizeof(struct sockaddr_in6),
> -	.bind_conflict	   = inet6_csk_bind_conflict,
> +	.rcv_saddr_equal   = ipv6_rcv_saddr_equal,
>  #ifdef CONFIG_COMPAT
>  	.compat_setsockopt = compat_ipv6_setsockopt,
>  	.compat_getsockopt = compat_ipv6_getsockopt,
> 

Btw, small nit, you forgot the corresponding changes in
dccp_ipv6_mapped, thus causing this compiler error:

net/dccp/ipv6.c:961:2: error: unknown field ‘bind_conflict’ specified in initializer
  .bind_conflict    = inet6_csk_bind_conflict,
  ^
net/dccp/ipv6.c:961:22: error: ‘inet6_csk_bind_conflict’ undeclared here (not in a function)
  .bind_conflict    = inet6_csk_bind_conflict,
                      ^~~~~~~~~~~~~~~~~~~~~~~
scripts/Makefile.build:293: recipe for target 'net/dccp/ipv6.o' failed

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
From: Josef Bacik @ 2016-12-21 15:16 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, kraigatgoog, eric.dumazet, tom, netdev, kernel-team
In-Reply-To: <1482332814.2260.6.camel@stressinduktion.org>

On Wed, Dec 21, 2016 at 10:06 AM, Hannes Frederic Sowa 
<hannes@stressinduktion.org> wrote:
> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
>>  The only difference between inet6_csk_bind_conflict and 
>> inet_csk_bind_conflict
>>  is how they check the rcv_saddr.  Since we want to be able to check 
>> the saddr in
>>  other places just drop the protocol specific ->bind_conflict and 
>> replace it with
>>  ->rcv_saddr_equal, then make inet_csk_bind_conflict the one true 
>> bind conflict
>>  function.
>> 
>>  Signed-off-by: Josef Bacik <jbacik@fb.com>
>> 
> 
> 
> 
>>  ---
>>   include/net/inet6_connection_sock.h |  5 -----
>>   include/net/inet_connection_sock.h  |  9 +++------
>>   net/dccp/ipv4.c                     |  3 ++-
>>   net/dccp/ipv6.c                     |  2 +-
>>   net/ipv4/inet_connection_sock.c     | 22 +++++++-------------
>>   net/ipv4/tcp_ipv4.c                 |  3 ++-
>>   net/ipv4/udp.c                      |  1 +
>>   net/ipv6/inet6_connection_sock.c    | 40 
>> -------------------------------------
>>   net/ipv6/tcp_ipv6.c                 |  4 ++--
>>   9 files changed, 18 insertions(+), 71 deletions(-)
>> 
>>  diff --git a/include/net/inet6_connection_sock.h 
>> b/include/net/inet6_connection_sock.h
>>  index 3212b39..8ec87b6 100644
>>  --- a/include/net/inet6_connection_sock.h
>>  +++ b/include/net/inet6_connection_sock.h
>>  @@ -15,16 +15,11 @@
>> 
>>   #include <linux/types.h>
>> 
>>  -struct inet_bind_bucket;
>>   struct request_sock;
>>   struct sk_buff;
>>   struct sock;
>>   struct sockaddr;
>> 
>>  -int inet6_csk_bind_conflict(const struct sock *sk,
>>  -			    const struct inet_bind_bucket *tb, bool relax,
>>  -			    bool soreuseport_ok);
>>  -
>>   struct dst_entry *inet6_csk_route_req(const struct sock *sk, 
>> struct flowi6 *fl6,
>>   				      const struct request_sock *req, u8 proto);
>> 
>>  diff --git a/include/net/inet_connection_sock.h 
>> b/include/net/inet_connection_sock.h
>>  index ec0479a..9cd43c5 100644
>>  --- a/include/net/inet_connection_sock.h
>>  +++ b/include/net/inet_connection_sock.h
>>  @@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops {
>>   				char __user *optval, int __user *optlen);
>>   #endif
>>   	void	    (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
>>  -	int	    (*bind_conflict)(const struct sock *sk,
>>  -				     const struct inet_bind_bucket *tb,
>>  -				     bool relax, bool soreuseport_ok);
>>  +	int         (*rcv_saddr_equal)(const struct sock *sk1,
>>  +				       const struct sock *sk2,
>>  +				       bool match_wildcard);
>>   	void	    (*mtu_reduced)(struct sock *sk);
>>   };
>> 
>> 
> 
> The patch looks as a nice code cleanup already!
> 
> Have you looked if we can simply have one rcv_saddr_equal for both 
> ipv4
> and ipv6 that e.g. uses sk->sk_family instead of function pointers?
> This could give us even more possibilities to remove some indirect
> functions calls and thus might relieve some cycles?

I was going to do that but I'm not familiar enough with how sockets 
work to be comfortable.  My main concern is we have the ipv6_only() 
check on a socket, which seems to indicate to me that you can have a 
socket that can do both ipv4/ipv6, so what if we're specifically going 
through the ipv6 code, but we aren't ipv6_only() and we end up doing 
the ipv4 address compare when we really need to do the ipv6 address 
compare?  If this can't happen (and honestly as I type it out it sounds 
crazier than it did in my head) then yeah I'll totally do that as well 
and we can just have a global function without the protocol specific 
callbacks, but I need you or somebody to tell me I'm crazy and that it 
would be ok to have it all in one function.  Thanks,

Josef

^ permalink raw reply

* Re: [PATCH 3/5 net-next] inet: don't check for bind conflicts twice when searching for a port
From: Josef Bacik @ 2016-12-21 15:12 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, kraigatgoog, eric.dumazet, tom, netdev, kernel-team
In-Reply-To: <1482332912.2260.8.camel@stressinduktion.org>

On Wed, Dec 21, 2016 at 10:08 AM, Hannes Frederic Sowa 
<hannes@stressinduktion.org> wrote:
> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
>>  --- a/net/ipv4/inet_connection_sock.c
>>  +++ b/net/ipv4/inet_connection_sock.c
>>  @@ -92,7 +92,7 @@ int inet_csk_get_port(struct sock *sk, unsigned 
>> short snum)
>>   {
>>   	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
>>   	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
>>  -	int ret = 1, attempts = 5, port = snum;
>>  +	int ret = 1, port = snum;
>>   	struct inet_bind_hashbucket *head;
>>   	struct net *net = sock_net(sk);
>>   	int i, low, high, attempt_half;
>>  @@ -100,6 +100,7 @@ int inet_csk_get_port(struct sock *sk, unsigned 
>> short snum)
>>   	kuid_t uid = sock_i_uid(sk);
>>   	u32 remaining, offset;
>>   	bool reuseport_ok = !!snum;
>>  +	bool empty_tb = true;
>> 
>>   	if (port) {
>>   		head = &hinfo->bhash[inet_bhashfn(net, port,
>>  @@ -111,7 +112,6 @@ int inet_csk_get_port(struct sock *sk, unsigned 
>> short snum)
>> 
>>   		goto tb_not_found;
>>   	}
>>  -again:
>>   	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
>>   other_half_scan:
>>   	inet_get_local_port_range(net, &low, &high);
>>  @@ -148,8 +148,12 @@ other_parity_scan:
>>   		spin_lock_bh(&head->lock);
>>   		inet_bind_bucket_for_each(tb, &head->chain)
>>   			if (net_eq(ib_net(tb), net) && tb->port == port) {
>>  -				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
>>  -					goto tb_found;
>>  +				if (hlist_empty(&tb->owners))
>>  +					goto success;
>>  +				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) {
>>  +					empty_tb = false;
>>  +					goto success;
>>  +				}
>>   				goto next_port;
>>   			}
>>   		goto tb_not_found;
>>  @@ -184,23 +188,12 @@ tb_found:
>>   		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>   		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
>>   			goto success;
>>  -		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
>>  -			if ((reuse ||
>>  -			     (tb->fastreuseport > 0 &&
>>  -			      sk->sk_reuseport &&
>>  -			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>  -			      uid_eq(tb->fastuid, uid))) && !snum &&
>>  -			    --attempts >= 0) {
>>  -				spin_unlock_bh(&head->lock);
>>  -				goto again;
>>  -			}
>>  +		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
>>   			goto fail_unlock;
>>  -		}
>>  -		if (!reuse)
>>  -			tb->fastreuse = 0;
>>  -		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
>>  -			tb->fastreuseport = 0;
>>  -	} else {
>>  +		empty_tb = false;
>>  +	}
>>  +success:
>>  +	if (empty_tb) {
> 
> 
> I would fine it even more simple to read, if you redo the hlist_empty
> check here instead of someone has to review all the paths where this
> might get set. hlist_empty is a very quick test.

Yup that's fair, I'll fix that up.  Thanks,

Josef

^ permalink raw reply

* Re: [PATCH 3/5 net-next] inet: don't check for bind conflicts twice when searching for a port
From: Hannes Frederic Sowa @ 2016-12-21 15:08 UTC (permalink / raw)
  To: Josef Bacik, davem, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-4-git-send-email-jbacik@fb.com>

On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -92,7 +92,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
>  {
>  	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
>  	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
> -	int ret = 1, attempts = 5, port = snum;
> +	int ret = 1, port = snum;
>  	struct inet_bind_hashbucket *head;
>  	struct net *net = sock_net(sk);
>  	int i, low, high, attempt_half;
> @@ -100,6 +100,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
>  	kuid_t uid = sock_i_uid(sk);
>  	u32 remaining, offset;
>  	bool reuseport_ok = !!snum;
> +	bool empty_tb = true;
>  
>  	if (port) {
>  		head = &hinfo->bhash[inet_bhashfn(net, port,
> @@ -111,7 +112,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
>  
>  		goto tb_not_found;
>  	}
> -again:
>  	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
>  other_half_scan:
>  	inet_get_local_port_range(net, &low, &high);
> @@ -148,8 +148,12 @@ other_parity_scan:
>  		spin_lock_bh(&head->lock);
>  		inet_bind_bucket_for_each(tb, &head->chain)
>  			if (net_eq(ib_net(tb), net) && tb->port == port) {
> -				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
> -					goto tb_found;
> +				if (hlist_empty(&tb->owners))
> +					goto success;
> +				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) {
> +					empty_tb = false;
> +					goto success;
> +				}
>  				goto next_port;
>  			}
>  		goto tb_not_found;
> @@ -184,23 +188,12 @@ tb_found:
>  		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
>  		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
>  			goto success;
> -		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
> -			if ((reuse ||
> -			     (tb->fastreuseport > 0 &&
> -			      sk->sk_reuseport &&
> -			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
> -			      uid_eq(tb->fastuid, uid))) && !snum &&
> -			    --attempts >= 0) {
> -				spin_unlock_bh(&head->lock);
> -				goto again;
> -			}
> +		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
>  			goto fail_unlock;
> -		}
> -		if (!reuse)
> -			tb->fastreuse = 0;
> -		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
> -			tb->fastreuseport = 0;
> -	} else {
> +		empty_tb = false;
> +	}
> +success:
> +	if (empty_tb) {


I would fine it even more simple to read, if you redo the hlist_empty
check here instead of someone has to review all the paths where this
might get set. hlist_empty is a very quick test.


Thanks,
Hannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox