Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Pavel Tatashin @ 2018-05-03 13:30 UTC (permalink / raw)
  To: alexander.duyck
  Cc: Steven Sistare, Daniel Jordan, LKML, jeffrey.t.kirsher,
	intel-wired-lan, netdev, gregkh
In-Reply-To: <CAKgT0UdocfEm9oXZ1dkEMari8m3OA4uVTrYg45uj9fk2V41bxQ@mail.gmail.com>

Hi Alex,

> I'm not a fan of dropping the mutex while we go through
> ixgbe_close_suspend. I'm concerned it will result in us having a
> number of races on shutdown.

I would agree, but ixgbe_close_suspend() is already called without this
mutex from ixgbe_close(). This path is executed also during machine
shutdown but when kexec'ed. So, it is either an existing bug or there are
no races. But, because ixgbe_close() is called from the userland, and a
little earlier than ixgbe_shutdown() I think this means there are no races.

> If anything, I think we would need to find a replacement for the mutex
> that can operate at the per-netdev level if you are wanting to
> parallelize this.

Yes, if lock is necessary, it can be replaced in this place (and added to
ixgbe_close())  with something scalable.

Thank you,
Pavel

^ permalink raw reply

* Re: [PATCH] sctp: fix a potential missing-check bug
From: Wenwen Wang @ 2018-05-03 13:31 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Kangjie Lu, Vlad Yasevich, Neil Horman, David S. Miller,
	open list:SCTP PROTOCOL, open list:NETWORKING [GENERAL],
	open list, Wenwen Wang
In-Reply-To: <20180503124629.GM5105@localhost.localdomain>

On Thu, May 3, 2018 at 7:46 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, May 03, 2018 at 07:01:51AM -0500, Wenwen Wang wrote:
>> On Wed, May 2, 2018 at 8:48 PM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Wed, May 02, 2018 at 08:27:05PM -0500, Wenwen Wang wrote:
>> >> On Wed, May 2, 2018 at 8:24 PM, Marcelo Ricardo Leitner
>> >> <marcelo.leitner@gmail.com> wrote:
>> >> > On Wed, May 02, 2018 at 08:15:45PM -0500, Wenwen Wang wrote:
>> >> >> In sctp_setsockopt_maxseg(), the integer 'val' is compared against min_len
>> >> >> and max_len to check whether it is in the appropriate range. If it is not,
>> >> >> an error code -EINVAL will be returned. This is enforced by a security
>> >> >> check. But, this check is only executed when 'val' is not 0. In fact, if
>> >> >> 'val' is 0, it will be assigned with a new value (if the return value of
>> >> >> the function sctp_id2assoc() is not 0) in the following execution. However,
>> >> >> this new value of 'val' is not checked before it is used to assigned to
>> >> >> asoc->user_frag. That means it is possible that the new value of 'val'
>> >> >> could be out of the expected range. This can cause security issues
>> >> >> such as buffer overflows, e.g., the new value of 'val' is used as an index
>> >> >> to access a buffer.
>> >> >>
>> >> >> This patch inserts a check for the new value of 'val' to see if it is in
>> >> >> the expected range. If it is not, an error code -EINVAL will be returned.
>> >> >>
>> >> >> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
>> >> >> ---
>> >> >>  net/sctp/socket.c | 22 +++++++++++-----------
>> >> >>  1 file changed, 11 insertions(+), 11 deletions(-)
>> >> >
>> >> > ?
>> >> > This patch is the same as previous one. git send-email <old file>
>> >> > maybe?
>> >> >
>> >> >   Marcelo
>> >>
>> >> Thanks for your suggestion, Marcelo. I can send the old file. But, I
>> >> have added a line of comment in this patch.
>> >
>> > I meant if you had sent the old patch again by accident, because you
>> > said you worked on an old version of the tree, but then posted a patch
>> > that also doesn't use the new MTU function I mentioned.
>> >
>> >   Marcelo
>>
>> I worked on the latest kernel. But, I didn't find the MTU function
>> sctp_mtu_payload().
>
> Which tree are you using?
> [a] git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
>    or
> [b] git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> ?
>
> The function isn't on [a] yet, but it is on [b].
>
>   Marcelo

Many thanks for your patience, Marcelo :)

The tree I am working on is:
git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Wenwen

^ permalink raw reply

* [PATCH] net/mlx5: fix spelling mistake: "modfiy" -> "modify"
From: Colin King @ 2018-05-03 13:35 UTC (permalink / raw)
  To: Saeed Mahameed, Matan Barak, Leon Romanovsky, David S . Miller,
	netdev, linux-rdma
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in netdev_warn warning message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 610d485c4b03..f64b5e78519b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -565,7 +565,7 @@ static void arfs_modify_rule_rq(struct mlx5e_priv *priv,
 	err =  mlx5_modify_rule_destination(rule, &dst, NULL);
 	if (err)
 		netdev_warn(priv->netdev,
-			    "Failed to modfiy aRFS rule destination to rq=%d\n", rxq);
+			    "Failed to modify aRFS rule destination to rq=%d\n", rxq);
 }
 
 static void arfs_handle_work(struct work_struct *work)
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH 2/2] drivers core: multi-threading device shutdown
From: Pavel Tatashin @ 2018-05-03 13:38 UTC (permalink / raw)
  To: tobin
  Cc: Steven Sistare, Daniel Jordan, LKML, jeffrey.t.kirsher,
	intel-wired-lan, netdev, gregkh
In-Reply-To: <20180503055407.GN3791@eros>

On Thu, May 3, 2018 at 1:54 AM Tobin C. Harding <tobin@apporbit.com> wrote:

> This code was a pleasure to read, super clean.

Hi Tobin,

Thank you very much for your review, I will address all of your comments in
the next revision.

BTW, I found  a lock ordering issue in my work that  that I will need to
fix:

In device_shutdown() device_lock() must be taken before
devices_kset->list_lock. Instead I will use device_trylock(), and if that
fails, will drop devices_kset->list_lock and acquiring the device lock
outside, and check that the device is still in the list after taking the
list lock again.

Pavel

^ permalink raw reply

* Re: [PATCH bpf-next 00/12] Move ld_abs/ld_ind to native BPF
From: Daniel Borkmann @ 2018-05-03 13:39 UTC (permalink / raw)
  To: ast; +Cc: netdev
In-Reply-To: <20180503010536.7917-1-daniel@iogearbox.net>

On 05/03/2018 03:05 AM, Daniel Borkmann wrote:
> This set simplifies BPF JITs significantly by moving ld_abs/ld_ind
> to native BPF, for details see individual patches. Main rationale
> is in patch 'implement ld_abs/ld_ind in native bpf'. Thanks!
[...]

Noticed a minor issue, therefore will respin the series in a v2.

^ permalink raw reply

* Re: [PATCH] sctp: fix a potential missing-check bug
From: Marcelo Ricardo Leitner @ 2018-05-03 13:39 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Kangjie Lu, Vlad Yasevich, Neil Horman, David S. Miller,
	open list:SCTP PROTOCOL, open list:NETWORKING [GENERAL],
	open list
In-Reply-To: <CAAa=b7dvAFnVbFuMuUsCfEUJiKFgbW0UoNSuHEXwmVPdb_pWuA@mail.gmail.com>

On Thu, May 03, 2018 at 08:31:28AM -0500, Wenwen Wang wrote:
> On Thu, May 3, 2018 at 7:46 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Thu, May 03, 2018 at 07:01:51AM -0500, Wenwen Wang wrote:
> >> On Wed, May 2, 2018 at 8:48 PM, Marcelo Ricardo Leitner
> >> <marcelo.leitner@gmail.com> wrote:
> >> > On Wed, May 02, 2018 at 08:27:05PM -0500, Wenwen Wang wrote:
> >> >> On Wed, May 2, 2018 at 8:24 PM, Marcelo Ricardo Leitner
> >> >> <marcelo.leitner@gmail.com> wrote:
> >> >> > On Wed, May 02, 2018 at 08:15:45PM -0500, Wenwen Wang wrote:
> >> >> >> In sctp_setsockopt_maxseg(), the integer 'val' is compared against min_len
> >> >> >> and max_len to check whether it is in the appropriate range. If it is not,
> >> >> >> an error code -EINVAL will be returned. This is enforced by a security
> >> >> >> check. But, this check is only executed when 'val' is not 0. In fact, if
> >> >> >> 'val' is 0, it will be assigned with a new value (if the return value of
> >> >> >> the function sctp_id2assoc() is not 0) in the following execution. However,
> >> >> >> this new value of 'val' is not checked before it is used to assigned to
> >> >> >> asoc->user_frag. That means it is possible that the new value of 'val'
> >> >> >> could be out of the expected range. This can cause security issues
> >> >> >> such as buffer overflows, e.g., the new value of 'val' is used as an index
> >> >> >> to access a buffer.
> >> >> >>
> >> >> >> This patch inserts a check for the new value of 'val' to see if it is in
> >> >> >> the expected range. If it is not, an error code -EINVAL will be returned.
> >> >> >>
> >> >> >> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
> >> >> >> ---
> >> >> >>  net/sctp/socket.c | 22 +++++++++++-----------
> >> >> >>  1 file changed, 11 insertions(+), 11 deletions(-)
> >> >> >
> >> >> > ?
> >> >> > This patch is the same as previous one. git send-email <old file>
> >> >> > maybe?
> >> >> >
> >> >> >   Marcelo
> >> >>
> >> >> Thanks for your suggestion, Marcelo. I can send the old file. But, I
> >> >> have added a line of comment in this patch.
> >> >
> >> > I meant if you had sent the old patch again by accident, because you
> >> > said you worked on an old version of the tree, but then posted a patch
> >> > that also doesn't use the new MTU function I mentioned.
> >> >
> >> >   Marcelo
> >>
> >> I worked on the latest kernel. But, I didn't find the MTU function
> >> sctp_mtu_payload().
> >
> > Which tree are you using?
> > [a] git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
> >    or
> > [b] git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> > ?
> >
> > The function isn't on [a] yet, but it is on [b].
> >
> >   Marcelo
>
> Many thanks for your patience, Marcelo :)
>
> The tree I am working on is:
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Ahh! That explains the discrepancy :)
For networking patches, please refer to
Documentation/networking/netdev-FAQ.txt
It describes what the 2 trees I pointed out are and how they should be
used.
In short, both net and net-next are always newer than the one you're
using for networking subsystem.

Regards,
Marcelo

^ permalink raw reply

* Re: [PATCH] sctp: fix a potential missing-check bug
From: Wenwen Wang @ 2018-05-03 13:43 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Kangjie Lu, Vlad Yasevich, Neil Horman, David S. Miller,
	open list:SCTP PROTOCOL, open list:NETWORKING [GENERAL],
	open list, Wenwen Wang
In-Reply-To: <20180503133945.GN5105@localhost.localdomain>

On Thu, May 3, 2018 at 8:39 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, May 03, 2018 at 08:31:28AM -0500, Wenwen Wang wrote:
>> On Thu, May 3, 2018 at 7:46 AM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Thu, May 03, 2018 at 07:01:51AM -0500, Wenwen Wang wrote:
>> >> On Wed, May 2, 2018 at 8:48 PM, Marcelo Ricardo Leitner
>> >> <marcelo.leitner@gmail.com> wrote:
>> >> > On Wed, May 02, 2018 at 08:27:05PM -0500, Wenwen Wang wrote:
>> >> >> On Wed, May 2, 2018 at 8:24 PM, Marcelo Ricardo Leitner
>> >> >> <marcelo.leitner@gmail.com> wrote:
>> >> >> > On Wed, May 02, 2018 at 08:15:45PM -0500, Wenwen Wang wrote:
>> >> >> >> In sctp_setsockopt_maxseg(), the integer 'val' is compared against min_len
>> >> >> >> and max_len to check whether it is in the appropriate range. If it is not,
>> >> >> >> an error code -EINVAL will be returned. This is enforced by a security
>> >> >> >> check. But, this check is only executed when 'val' is not 0. In fact, if
>> >> >> >> 'val' is 0, it will be assigned with a new value (if the return value of
>> >> >> >> the function sctp_id2assoc() is not 0) in the following execution. However,
>> >> >> >> this new value of 'val' is not checked before it is used to assigned to
>> >> >> >> asoc->user_frag. That means it is possible that the new value of 'val'
>> >> >> >> could be out of the expected range. This can cause security issues
>> >> >> >> such as buffer overflows, e.g., the new value of 'val' is used as an index
>> >> >> >> to access a buffer.
>> >> >> >>
>> >> >> >> This patch inserts a check for the new value of 'val' to see if it is in
>> >> >> >> the expected range. If it is not, an error code -EINVAL will be returned.
>> >> >> >>
>> >> >> >> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
>> >> >> >> ---
>> >> >> >>  net/sctp/socket.c | 22 +++++++++++-----------
>> >> >> >>  1 file changed, 11 insertions(+), 11 deletions(-)
>> >> >> >
>> >> >> > ?
>> >> >> > This patch is the same as previous one. git send-email <old file>
>> >> >> > maybe?
>> >> >> >
>> >> >> >   Marcelo
>> >> >>
>> >> >> Thanks for your suggestion, Marcelo. I can send the old file. But, I
>> >> >> have added a line of comment in this patch.
>> >> >
>> >> > I meant if you had sent the old patch again by accident, because you
>> >> > said you worked on an old version of the tree, but then posted a patch
>> >> > that also doesn't use the new MTU function I mentioned.
>> >> >
>> >> >   Marcelo
>> >>
>> >> I worked on the latest kernel. But, I didn't find the MTU function
>> >> sctp_mtu_payload().
>> >
>> > Which tree are you using?
>> > [a] git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
>> >    or
>> > [b] git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
>> > ?
>> >
>> > The function isn't on [a] yet, but it is on [b].
>> >
>> >   Marcelo
>>
>> Many thanks for your patience, Marcelo :)
>>
>> The tree I am working on is:
>> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> Ahh! That explains the discrepancy :)
> For networking patches, please refer to
> Documentation/networking/netdev-FAQ.txt
> It describes what the 2 trees I pointed out are and how they should be
> used.
> In short, both net and net-next are always newer than the one you're
> using for networking subsystem.
>
> Regards,
> Marcelo

I see now. Will work on the new networking trees. Thanks!

Wenwen

^ permalink raw reply

* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Alexander Duyck @ 2018-05-03 13:46 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Steven Sistare, Daniel Jordan, LKML, Jeff Kirsher,
	intel-wired-lan, Netdev, Greg KH
In-Reply-To: <CAGM2rea3XXJ7OqGeAdexyOUb1_7W_84t5HAb_hp7Rb=TkTM86Q@mail.gmail.com>

On Thu, May 3, 2018 at 6:30 AM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> Hi Alex,

Hi Pavel,

>> I'm not a fan of dropping the mutex while we go through
>> ixgbe_close_suspend. I'm concerned it will result in us having a
>> number of races on shutdown.
>
> I would agree, but ixgbe_close_suspend() is already called without this
> mutex from ixgbe_close(). This path is executed also during machine
> shutdown but when kexec'ed. So, it is either an existing bug or there are
> no races. But, because ixgbe_close() is called from the userland, and a
> little earlier than ixgbe_shutdown() I think this means there are no races.

All callers of ixgbe_close are supposed to already be holding the RTNL
mutex. That is the whole reason why we need to hold it here as the
shutdown path doesn't have the mutex held otherwise. The mutex was
added here because shutdown was racing with the open/close calls.

>
>> If anything, I think we would need to find a replacement for the mutex
>> that can operate at the per-netdev level if you are wanting to
>> parallelize this.
>
> Yes, if lock is necessary, it can be replaced in this place (and added to
> ixgbe_close())  with something scalable.

I wouldn't suggest adding yet another lock for all this. Instead it
might make more sense for us to start looking at breaking up the
locking since most of the networking subsystem uses the rtnl_lock call
it might be time to start looking at doing something like cutting back
on the use of this in cases where all the resources needed are
specific to one device and instead have a per device lock that could
address those kind of cases.

- Alex

^ permalink raw reply

* [PATCH RESEND] SUNRPC: fix include for cmpxchg_relaxed()
From: Mark Rutland @ 2018-05-03 13:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, Trond Myklebust, Anna Schumaker, J . Bruce Fields,
	Jeff Layton, David S . Miller, linux-nfs, netdev

Currently net/sunrpc/xprtmultipath.c is the only file outside of arch/
headers and asm-generic/ headers to include <asm/cmpxhcg.h>, apparently
for the use of cmpxchg_relaxed().

However, many architectures do not provide cmpxchg_relaxed() in their
<asm/cmpxhcg.h>, and it is necessary to include <linux/atomic.h> to get
this definition, as noted in Documentation/core-api/atomic_ops.rst:

  If someone wants to use xchg(), cmpxchg() and their variants,
  linux/atomic.h should be included rather than asm/cmpxchg.h, unless
  the code is in arch/* and can take care of itself.

Evidently we're getting the right header this via some transitive
include today, but this isn't something we can/should rely upon,
especially with ongoing rework of the atomic headers for KASAN
instrumentation.

Let's fix the code to include <linux/atomic.h>, avoiding fragility.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
---
 net/sunrpc/xprtmultipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I sent this about a year ago [1], but got no response. This still applies atop
of v4.17-rc3.

I'm currently trying to implement instrumented atomics for arm64, and it would
be great to have this fixed.

Mark.

[1] https://lkml.kernel.org/r/1489574142-20856-1-git-send-email-mark.rutland@arm.com

diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
index e2d64c7138c3..d897f41be244 100644
--- a/net/sunrpc/xprtmultipath.c
+++ b/net/sunrpc/xprtmultipath.c
@@ -13,7 +13,7 @@
 #include <linux/rcupdate.h>
 #include <linux/rculist.h>
 #include <linux/slab.h>
-#include <asm/cmpxchg.h>
+#include <linux/atomic.h>
 #include <linux/spinlock.h>
 #include <linux/sunrpc/xprt.h>
 #include <linux/sunrpc/addr.h>
-- 
2.11.0

^ permalink raw reply related

* Re: DSA switch
From: Ran Shalit @ 2018-05-03 13:54 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Andrew Lunn, netdev
In-Reply-To: <20180503071124.GM19250@nanopsycho>

On Thu, May 3, 2018 at 10:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, May 03, 2018 at 08:50:52AM CEST, ranshalit@gmail.com wrote:
>>On Wed, May 2, 2018 at 11:56 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>> On Wed, May 02, 2018 at 11:20:05PM +0300, Ran Shalit wrote:
>>>> Hello,
>>>>
>>>> Is it possible to use switch just like external real switch,
>>>> connecting all ports to the same subnet ?
>>>
>>> Yes. Just bridge all ports/interfaces together and put your host IP
>>> address on the bridge.
>>>
>>>         Andrew
>>
>>
>>Hi,
>>
>>I get error on trying to add bridge.
>>I am trying to =understand which configuration is missing probably in my kernel,
>> I ran strace, but not sure , does it point to any missing configuration ?
>>
>>root@dm814x-evm:~# ip link add br0 type bridge
>
> Is the bridge module enabled in the kernel config?
>
>

Hi,

It seems that although the bridge command functions, it takes several
seconds (~6-7 seconds !) from the time it resturns to shell till a
real communication works (ping between 2 PCs connected to switch).
Does it usually takes so much time ?

Thank you,
ranran

^ permalink raw reply

* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-03 13:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel, netdev, wexu,
	jfreimann
In-Reply-To: <9f0b4e37-63ff-42f9-f2e6-3747a19a0206@redhat.com>

On Thu, May 03, 2018 at 03:25:29PM +0800, Jason Wang wrote:
> On 2018年05月03日 10:09, Tiwei Bie wrote:
> > > > > So how about we use the straightforward way then?
> > > > You mean we do new += vq->vring_packed.num instead
> > > > of event_idx -= vq->vring_packed.num before calling
> > > > vring_need_event()?
> > > > 
> > > > The problem is that, the second param (new_idx) of
> > > > vring_need_event() will be used for:
> > > > 
> > > > (__u16)(new_idx - event_idx - 1)
> > > > (__u16)(new_idx - old)
> > > > 
> > > > So if we change new, we will need to change old too.
> > > I think that since we have a branch there anyway,
> > > we are better off just special-casing if (wrap_counter != vq->wrap_counter).
> > > Treat is differenty and avoid casts.
> > > 
> > > > And that would be an ugly hack..
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > I consider casts and huge numbers with two's complement
> > > games even uglier.
> > The dependency on two's complement game is introduced
> > since the split ring.
> > 
> > In packed ring, old is calculated via:
> > 
> > old = vq->next_avail_idx - vq->num_added;
> > 
> > In split ring, old is calculated via:
> > 
> > old = vq->avail_idx_shadow - vq->num_added;
> > 
> > In both cases, when vq->num_added is bigger, old will
> > be a big number.
> > 
> > Best regards,
> > Tiwei Bie
> > 
> 
> How about just do something like vhost:
> 
> static u16 vhost_idx_diff(struct vhost_virtqueue *vq, u16 old, u16 new)
> {
>     if (new > old)
>         return new - old;
>     return  (new + vq->num - old);
> }
> 
> static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
>                       __u16 event_off, __u16 new,
>                       __u16 old)
> {
>     return (__u16)(vhost_idx_diff(vq, new, event_off) - 1) <
>            (__u16)vhost_idx_diff(vq, new, old);
> }
> 
> ?

It seems that there is a typo in above code. The second
param of vhost_idx_diff() is `old`, but when calling this
function in vhost_vring_packed_need_event(), `new` is
passed as the second param.

If we assume the second param of vhost_idx_diff() is new
and the third one is old, i.e.:

static u16 vhost_idx_diff(struct vhost_virtqueue *vq, u16 new, u16 old)
{
    if (new > old)
        return new - old;
    return  (new + vq->num - old);
}

I think it's still not right.

Because in virtqueue_enable_cb_delayed(), we may set an
event_off which is bigger than new and both of them have
wrapped. And in this case, although new is smaller than
event_off (i.e. the third param -- old), new shouldn't
add vq->num, and actually we are expecting a very big
idx diff.

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Willem de Bruijn @ 2018-05-03 13:55 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <20180502110136.3738-1-bjorn.topel@gmail.com>

On Wed, May 2, 2018 at 1:01 PM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch set introduces a new address family called AF_XDP that is
> optimized for high performance packet processing

Great patchset, thanks.

> and, in upcoming
> patch sets, zero-copy semantics.

And looking forward to this!

> Thanks: Björn and Magnus
>
> Björn Töpel (7):
>   net: initial AF_XDP skeleton
>   xsk: add user memory registration support sockopt
>   xsk: add Rx queue setup and mmap support
>   xsk: add Rx receive functions and poll support
>   bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP
>   xsk: wire up XDP_DRV side of AF_XDP
>   xsk: wire up XDP_SKB side of AF_XDP
>
> Magnus Karlsson (8):
>   xsk: add umem fill queue support and mmap
>   xsk: add support for bind for Rx
>   xsk: add umem completion queue support and mmap
>   xsk: add Tx queue setup and mmap support
>   dev: packet: make packet_direct_xmit a common function
>   xsk: support for Tx
>   xsk: statistics support
>   samples/bpf: sample application and documentation for AF_XDP sockets

For the series

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: Eric Dumazet @ 2018-05-03 13:55 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Miller, Tal Gilboa, netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <298c3b30-d3d3-6069-dd46-bcf0ce315126@mellanox.com>

On Thu, May 3, 2018 at 4:06 AM Tariq Toukan <tariqt@mellanox.com> wrote:



> On 03/05/2018 6:25 AM, Eric Dumazet wrote:
> > When adding rb-tree for TCP retransmit queue, we inadvertently broke
> > TCP autocorking.
> >
> > tcp_should_autocork() should really check if the rtx queue is not empty.
> >

> Hi Eric,

> We are glad to see that the issue that Tal investigated and reported [1]
> is now addressed.
> Thanks for doing that!

> Tal, let’s perf test to see the effect of this fix.

> Best,
> Tariq

> [1] https://patchwork.ozlabs.org/cover/822218/

Yes, but you never really gave any  insights of what exact tests you were
running :/

Sometimes the crystal ball is silent, sometimes it gives a hint :)

^ permalink raw reply

* Re: DSA switch
From: Andrew Lunn @ 2018-05-03 14:01 UTC (permalink / raw)
  To: Ran Shalit; +Cc: Jiri Pirko, netdev
In-Reply-To: <CAJ2oMhJx-n6qWchO27BV2u9Jd5X7ymtSfWeqL971DQJVQQNM-g@mail.gmail.com>

> It seems that although the bridge command functions, it takes several
> seconds (~6-7 seconds !) from the time it resturns to shell till a
> real communication works (ping between 2 PCs connected to switch).
> Does it usually takes so much time ?

PHY link up can take a second or two. Less if you use interrupts.

What do you have the bridge forwarding delay set to?

I would suggest profiling it, see what is take time, and how you can
tune it.

     Andrew

^ permalink raw reply

* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Steven Sistare @ 2018-05-03 14:01 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Alexander Duyck, Daniel Jordan, LKML, Jeff Kirsher,
	intel-wired-lan, Netdev, Greg KH
In-Reply-To: <CAKgT0UfNBugQywBZuepUJ6deEYsSWmTw3b6VbQBJLn-kKXe7ng@mail.gmail.com>

On 5/3/2018 9:46 AM, Alexander Duyck wrote:
> On Thu, May 3, 2018 at 6:30 AM, Pavel Tatashin
> <pasha.tatashin@oracle.com> wrote:
>> Hi Alex,
> 
> Hi Pavel,
> 
>>> I'm not a fan of dropping the mutex while we go through
>>> ixgbe_close_suspend. I'm concerned it will result in us having a
>>> number of races on shutdown.
>>
>> I would agree, but ixgbe_close_suspend() is already called without this
>> mutex from ixgbe_close(). This path is executed also during machine
>> shutdown but when kexec'ed. So, it is either an existing bug or there are
>> no races. But, because ixgbe_close() is called from the userland, and a
>> little earlier than ixgbe_shutdown() I think this means there are no races.
> 
> All callers of ixgbe_close are supposed to already be holding the RTNL
> mutex. That is the whole reason why we need to hold it here as the
> shutdown path doesn't have the mutex held otherwise. The mutex was
> added here because shutdown was racing with the open/close calls.
> 
>>
>>> If anything, I think we would need to find a replacement for the mutex
>>> that can operate at the per-netdev level if you are wanting to
>>> parallelize this.
>>
>> Yes, if lock is necessary, it can be replaced in this place (and added to
>> ixgbe_close())  with something scalable.
> 
> I wouldn't suggest adding yet another lock for all this. Instead it
> might make more sense for us to start looking at breaking up the
> locking since most of the networking subsystem uses the rtnl_lock call
> it might be time to start looking at doing something like cutting back
> on the use of this in cases where all the resources needed are
> specific to one device and instead have a per device lock that could
> address those kind of cases.

Hi Pavel, you might want to pull lock optimization out of this patch series.
The parallelization by itself is valuable, and optimizations for individual 
devices including locks can come later.

- Steve

^ permalink raw reply

* [PATCH] netfilter: nf_queue: Replace conntrack entry
From: Kristian Evensen @ 2018-05-03 14:07 UTC (permalink / raw)
  To: netfilter-devel
  Cc: Kristian Evensen, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, David S. Miller, coreteam, netdev, linux-kernel

SKBs are assigned a conntrack entry before being passed to any NFQUEUEs,
and if no entry is found then a new one is created. This behavior causes
problems for some traffic patterns. For example, if two UDP packets
to/from the same host (using the same ports) arrive at the "same" time,
both are assigned a new conntrack entry. After the first packet have
traversed all chains, the conntrack entry will be inserted into the
global table. The second packet will then be dropped during the
insertion step, as an entry for the same flow already exists. One type
of application that frequently generates this traffic pattern, is DNS
resolvers.

This commit introduces a new function that checks, and potentially
replaces, the conntrack entry for any additional "new" SKBs mapping to
an existing flow. While not a perfect solution, there are still
situations where to-be-dropped SKBs can slip through, the situations is
improved considerably. On the routers I have used for testing, packets
belonging to the same UDP flow are let through (when generating the
traffic pattern described above). Without the change in this commit, all
packets except the first one was dropped.

With the change in this commit, a user can implement "perfect" solutions
in user-space. An application can for example keep track of seen UDP
flows, and then only release packets belonging to one flow when the
entry has been created. Without the change, and SKB is stuck with the
original conntrack entry.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 net/netfilter/nfnetlink_queue.c | 68 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c97966298..150c11ff4 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -43,6 +43,9 @@
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_l3proto.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
 #endif
 
 #define NFQNL_QMAX_DEFAULT 1024
@@ -1046,6 +1049,53 @@ static int nfq_id_after(unsigned int id, unsigned int max)
 	return (int)(id - max) > 0;
 }
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+static void nfqnl_update_ct(struct net *net, struct sk_buff *skb)
+{
+	const struct nf_conntrack_l3proto *l3proto;
+	const struct nf_conntrack_l4proto *l4proto;
+	struct nf_conntrack_tuple_hash *h;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = NULL;
+	unsigned int dataoff;
+	u16 l3num;
+	u8 l4num;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	l3num = nf_ct_l3num(ct);
+	l3proto = nf_ct_l3proto_find_get(l3num);
+
+	if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
+				 &l4num) <= 0) {
+		return;
+	}
+
+	l4proto = nf_ct_l4proto_find_get(l3num, l4num);
+
+	if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
+			     l4num, net, &tuple, l3proto, l4proto)) {
+		return;
+	}
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)
+	h = nf_conntrack_find_get(net, &ct->zone, &tuple);
+#else
+	h = nf_conntrack_find_get(net, NULL, &tuple);
+#endif
+
+	if (h) {
+		pr_debug("%s: tuple %u %pI4:%hu -> %pI4:%hu\n", __func__,
+			 tuple.dst.protonum, &tuple.src.u3.ip,
+			 ntohs(tuple.src.u.all), &tuple.dst.u3.ip,
+			 ntohs(tuple.dst.u.all));
+		nf_ct_put(ct);
+		ct = nf_ct_tuplehash_to_ctrack(h);
+		nf_ct_set(skb, ct, IP_CT_NEW);
+	}
+}
+#endif
+
 static int nfqnl_recv_verdict_batch(struct net *net, struct sock *ctnl,
 				    struct sk_buff *skb,
 				    const struct nlmsghdr *nlh,
@@ -1060,6 +1110,7 @@ static int nfqnl_recv_verdict_batch(struct net *net, struct sock *ctnl,
 	LIST_HEAD(batch_list);
 	u16 queue_num = ntohs(nfmsg->res_id);
 	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+	enum ip_conntrack_info ctinfo;
 
 	queue = verdict_instance_lookup(q, queue_num,
 					NETLINK_CB(skb).portid);
@@ -1090,6 +1141,16 @@ static int nfqnl_recv_verdict_batch(struct net *net, struct sock *ctnl,
 	list_for_each_entry_safe(entry, tmp, &batch_list, list) {
 		if (nfqa[NFQA_MARK])
 			entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+			nf_ct_get(entry->skb, &ctinfo);
+
+			if (ctinfo == IP_CT_NEW && verdict != NF_STOLEN &&
+			    verdict != NF_DROP) {
+				nfqnl_update_ct(net, entry->skb);
+			}
+#endif
+
 		nf_reinject(entry, verdict);
 	}
 	return 0;
@@ -1213,6 +1274,13 @@ static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 	if (nfqa[NFQA_MARK])
 		entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	nf_ct_get(entry->skb, &ctinfo);
+
+	if (ctinfo == IP_CT_NEW && verdict != NF_STOLEN && verdict != NF_DROP)
+		nfqnl_update_ct(net, entry->skb);
+#endif
+
 	nf_reinject(entry, verdict);
 	return 0;
 }
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v1 net-next] net: dsa: mv88e6xxx: Fix name string for 88E6141
From: Marek Behún @ 2018-05-03 14:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Greg Kroah-Hartman, Vivien Didelot, Arkadi Sharshevsky,
	David S . Miller
In-Reply-To: <20180503131143.GA17027@lunn.ch>

Hmm, sorry, this is already merged. It seems I forgor to etch new
commits for some time.

On Thu, 3 May 2018 15:11:43 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Thu, May 03, 2018 at 03:06:55PM +0200, Marek Behún wrote:
> > The structure was copied from 88E6341 but the name was not changed.
> > 
> > Signed-off-by: Marek Behun <marek.behun@nic.cz>  
> 
> Hi Marek
> 
> Which tree is this against?
> 
> > ---
> >  drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
> > b/drivers/net/dsa/mv88e6xxx/chip.c index e9600a82dc83..fae362020305
> > 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > @@ -3206,7 +3206,7 @@ static const struct mv88e6xxx_info
> > mv88e6xxx_table[] = { [MV88E6141] = {
> >  		.prod_num = MV88E6XXX_PORT_SWITCH_ID_PROD_6141,
> >  		.family = MV88E6XXX_FAMILY_6341,
> > -		.name = "Marvell 88E6341",
> > +		.name = "Marvell 88E6141",
> >  		.num_databases = 4096,
> >  		.num_ports = 6,
> >  		.max_vid = 4095,
> > -- 
> > 2.16.1
> >   
> 
> commit 79a68b2631d8ec3e149081b1ecfb23509c040b4e
> Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Date:   Tue Mar 20 10:44:40 2018 +0100
> 
>     net: dsa: mv88e6xxx: Fix name of switch 88E6141
>     
>     The switch name is emitted in the kernel log, so having the right
> name there is nice.
>     
>     Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for
> ethernet switch 88E6141") Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>     Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 

^ permalink raw reply

* Re: [PATCH RESEND] SUNRPC: fix include for cmpxchg_relaxed()
From: Jeff Layton @ 2018-05-03 14:16 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, J . Bruce Fields,
	David S . Miller, linux-nfs, netdev
In-Reply-To: <20180503135050.15778-1-mark.rutland@arm.com>

On Thu, 2018-05-03 at 14:50 +0100, Mark Rutland wrote:
> Currently net/sunrpc/xprtmultipath.c is the only file outside of arch/
> headers and asm-generic/ headers to include <asm/cmpxhcg.h>, apparently
> for the use of cmpxchg_relaxed().
> 
> However, many architectures do not provide cmpxchg_relaxed() in their
> <asm/cmpxhcg.h>, and it is necessary to include <linux/atomic.h> to get
> this definition, as noted in Documentation/core-api/atomic_ops.rst:
> 
>   If someone wants to use xchg(), cmpxchg() and their variants,
>   linux/atomic.h should be included rather than asm/cmpxchg.h, unless
>   the code is in arch/* and can take care of itself.
> 
> Evidently we're getting the right header this via some transitive
> include today, but this isn't something we can/should rely upon,
> especially with ongoing rework of the atomic headers for KASAN
> instrumentation.
> 
> Let's fix the code to include <linux/atomic.h>, avoiding fragility.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Trond Myklebust <trond.myklebust@primarydata.com>
> Cc: Anna Schumaker <anna.schumaker@netapp.com>
> Cc: J. Bruce Fields <bfields@fieldses.org>
> Cc: Jeff Layton <jlayton@poochiereds.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: linux-nfs@vger.kernel.org
> Cc: netdev@vger.kernel.org
> ---
>  net/sunrpc/xprtmultipath.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I sent this about a year ago [1], but got no response. This still applies atop
> of v4.17-rc3.
> 
> I'm currently trying to implement instrumented atomics for arm64, and it would
> be great to have this fixed.
> 
> Mark.
> 
> [1] https://lkml.kernel.org/r/1489574142-20856-1-git-send-email-mark.rutland@arm.com
> 
> diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
> index e2d64c7138c3..d897f41be244 100644
> --- a/net/sunrpc/xprtmultipath.c
> +++ b/net/sunrpc/xprtmultipath.c
> @@ -13,7 +13,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/rculist.h>
>  #include <linux/slab.h>
> -#include <asm/cmpxchg.h>
> +#include <linux/atomic.h>
>  #include <linux/spinlock.h>
>  #include <linux/sunrpc/xprt.h>
>  #include <linux/sunrpc/addr.h>

Looks fine.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply

* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Pavel Tatashin @ 2018-05-03 14:23 UTC (permalink / raw)
  To: Steven Sistare
  Cc: alexander.duyck, Daniel Jordan, LKML, jeffrey.t.kirsher,
	intel-wired-lan, netdev, gregkh
In-Reply-To: <50c9ac34-3971-ac1b-0b5e-1bfa004fd4cf@oracle.com>

> Hi Pavel, you might want to pull lock optimization out of this patch
series.
> The parallelization by itself is valuable, and optimizations for
individual
> devices including locks can come later.

Hi Steve,

Yes, I will pull this patch out of the series. Thank you for the suggestion.

Pavel

^ permalink raw reply

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: Edward Cree @ 2018-05-03 14:23 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: daniel, torvalds, gregkh, luto, netdev, linux-kernel, kernel-team
In-Reply-To: <20180503043604.1604587-3-ast@kernel.org>

On 03/05/18 05:36, Alexei Starovoitov wrote:
> bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
> and user mode helper code that is embedded into bpfilter.ko
>
> The steps to build bpfilter.ko are the following:
> - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
> - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
>   is converted into bpfilter_umh.o object file
>   with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
>   Example:
>   $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
>   0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
>   0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
>   0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
> - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
>
> bpfilter_kern.c is a normal kernel module code that calls
> the fork_usermode_blob() helper to execute part of its own data
> as a user mode process.
>
> Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> is placed into .init.rodata section, so it's freed as soon as __init
> function of bpfilter.ko is finished.
> As part of __init the bpfilter.ko does first request/reply action
> via two unix pipe provided by fork_usermode_blob() helper to
> make sure that umh is healthy. If not it will kill it via pid.
>
> Later bpfilter_process_sockopt() will be called from bpfilter hooks
> in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
>
> If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
> kill umh as well.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpfilter.h      | 15 +++++++
>  include/uapi/linux/bpfilter.h | 21 ++++++++++
>  net/Kconfig                   |  2 +
>  net/Makefile                  |  1 +
>  net/bpfilter/Kconfig          | 17 ++++++++
>  net/bpfilter/Makefile         | 24 +++++++++++
>  net/bpfilter/bpfilter_kern.c  | 93 +++++++++++++++++++++++++++++++++++++++++++
>  net/bpfilter/main.c           | 63 +++++++++++++++++++++++++++++
>  net/bpfilter/msgfmt.h         | 17 ++++++++
>  net/ipv4/Makefile             |  2 +
>  net/ipv4/bpfilter/Makefile    |  2 +
>  net/ipv4/bpfilter/sockopt.c   | 42 +++++++++++++++++++
>  net/ipv4/ip_sockglue.c        | 17 ++++++++
>  13 files changed, 316 insertions(+)
>  create mode 100644 include/linux/bpfilter.h
>  create mode 100644 include/uapi/linux/bpfilter.h
>  create mode 100644 net/bpfilter/Kconfig
>  create mode 100644 net/bpfilter/Makefile
>  create mode 100644 net/bpfilter/bpfilter_kern.c
>  create mode 100644 net/bpfilter/main.c
>  create mode 100644 net/bpfilter/msgfmt.h
>  create mode 100644 net/ipv4/bpfilter/Makefile
>  create mode 100644 net/ipv4/bpfilter/sockopt.c
>
> diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h
> new file mode 100644
> index 000000000000..687b1760bb9f
> --- /dev/null
> +++ b/include/linux/bpfilter.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_BPFILTER_H
> +#define _LINUX_BPFILTER_H
> +
> +#include <uapi/linux/bpfilter.h>
> +
> +struct sock;
> +int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char *optval,
> +			    unsigned int optlen);
> +int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char *optval,
> +			    int *optlen);
> +extern int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
> +				       char __user *optval,
> +				       unsigned int optlen, bool is_set);
> +#endif
> diff --git a/include/uapi/linux/bpfilter.h b/include/uapi/linux/bpfilter.h
> new file mode 100644
> index 000000000000..2ec3cc99ea4c
> --- /dev/null
> +++ b/include/uapi/linux/bpfilter.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _UAPI_LINUX_BPFILTER_H
> +#define _UAPI_LINUX_BPFILTER_H
> +
> +#include <linux/if.h>
> +
> +enum {
> +	BPFILTER_IPT_SO_SET_REPLACE = 64,
> +	BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
> +	BPFILTER_IPT_SET_MAX,
> +};
> +
> +enum {
> +	BPFILTER_IPT_SO_GET_INFO = 64,
> +	BPFILTER_IPT_SO_GET_ENTRIES = 65,
> +	BPFILTER_IPT_SO_GET_REVISION_MATCH = 66,
> +	BPFILTER_IPT_SO_GET_REVISION_TARGET = 67,
> +	BPFILTER_IPT_GET_MAX,
> +};
> +
> +#endif /* _UAPI_LINUX_BPFILTER_H */
> diff --git a/net/Kconfig b/net/Kconfig
> index b62089fb1332..ed6368b306fa 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -201,6 +201,8 @@ source "net/bridge/netfilter/Kconfig"
>  
>  endif
>  
> +source "net/bpfilter/Kconfig"
> +
>  source "net/dccp/Kconfig"
>  source "net/sctp/Kconfig"
>  source "net/rds/Kconfig"
> diff --git a/net/Makefile b/net/Makefile
> index a6147c61b174..7f982b7682bd 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_TLS)		+= tls/
>  obj-$(CONFIG_XFRM)		+= xfrm/
>  obj-$(CONFIG_UNIX)		+= unix/
>  obj-$(CONFIG_NET)		+= ipv6/
> +obj-$(CONFIG_BPFILTER)		+= bpfilter/
>  obj-$(CONFIG_PACKET)		+= packet/
>  obj-$(CONFIG_NET_KEY)		+= key/
>  obj-$(CONFIG_BRIDGE)		+= bridge/
> diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
> new file mode 100644
> index 000000000000..782a732b9a5c
> --- /dev/null
> +++ b/net/bpfilter/Kconfig
> @@ -0,0 +1,17 @@
> +menuconfig BPFILTER
> +	bool "BPF based packet filtering framework (BPFILTER)"
> +	default n
> +	depends on NET && BPF
> +	help
> +	  This builds experimental bpfilter framework that is aiming to
> +	  provide netfilter compatible functionality via BPF
> +
> +if BPFILTER
> +config BPFILTER_UMH
> +	tristate "bpftiler kernel module with user mode helper"
sp. "bpftiler" -> "bpfilter"
> +	default m
> +	depends on m
> +	help
> +	  This builds bpfilter kernel module with embedded user mode helper
> +endif
> +
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> new file mode 100644
> index 000000000000..897eedae523e
> --- /dev/null
> +++ b/net/bpfilter/Makefile
> @@ -0,0 +1,24 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for the Linux BPFILTER layer.
> +#
> +
> +hostprogs-y := bpfilter_umh
> +bpfilter_umh-objs := main.o
> +HOSTCFLAGS += -I. -Itools/include/
> +
> +# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> +# inside bpfilter_umh.o elf file referenced by
> +# _binary_net_bpfilter_bpfilter_umh_start symbol
> +# which bpfilter_kern.c passes further into umh blob loader at run-time
> +quiet_cmd_copy_umh = GEN $@
> +      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> +      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> +      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> +      --rename-section .data=.init.rodata $< $@
> +
> +$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
> +	$(call cmd,copy_umh)
> +
> +obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
> +bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> new file mode 100644
> index 000000000000..e0a6fdd5842b
> --- /dev/null
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/umh.h>
> +#include <linux/bpfilter.h>
> +#include <linux/sched.h>
> +#include <linux/sched/signal.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include "msgfmt.h"
> +
> +#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
> +#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
> +
> +extern char UMH_start;
> +extern char UMH_end;
> +
> +static struct umh_info info;
> +
> +static void shutdown_umh(struct umh_info *info)
> +{
> +	struct task_struct *tsk;
> +
> +	tsk = pid_task(find_vpid(info->pid), PIDTYPE_PID);
> +	if (tsk)
> +		force_sig(SIGKILL, tsk);
> +	fput(info->pipe_to_umh);
> +	fput(info->pipe_from_umh);
> +}
> +
> +static void stop_umh(void)
> +{
> +	if (bpfilter_process_sockopt) {
I worry about locking here.  Is it possible for two calls to
 bpfilter_process_sockopt() to run in parallel, both fail, and thus both
 call stop_umh()?  And if both end up calling shutdown_umh(), we double
 fput().
> +		bpfilter_process_sockopt = NULL;
> +		shutdown_umh(&info);
> +	}
> +}
> +
> +static int __bpfilter_process_sockopt(struct sock *sk, int optname,
> +				      char __user *optval,
> +				      unsigned int optlen, bool is_set)
> +{
> +	struct mbox_request req;
> +	struct mbox_reply reply;
> +	loff_t pos;
> +	ssize_t n;
> +
> +	req.is_set = is_set;
> +	req.pid = current->pid;
> +	req.cmd = optname;
> +	req.addr = (long)optval;
> +	req.len = optlen;
> +	n = __kernel_write(info.pipe_to_umh, &req, sizeof(req), &pos);
> +	if (n != sizeof(req)) {
> +		pr_err("write fail %zd\n", n);
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	pos = 0;
> +	n = kernel_read(info.pipe_from_umh, &reply, sizeof(reply), &pos);
> +	if (n != sizeof(reply)) {
> +		pr_err("read fail %zd\n", n);
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	return reply.status;
> +}
> +
> +static int __init load_umh(void)
> +{
> +	int err;
> +
> +	err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
> +	if (err)
> +		return err;
> +	pr_info("Loaded umh pid %d\n", info.pid);
> +	bpfilter_process_sockopt = &__bpfilter_process_sockopt;
> +
> +	if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	return 0;
> +}
> +
> +static void __exit fini_umh(void)
> +{
> +	stop_umh();
> +}
> +module_init(load_umh);
> +module_exit(fini_umh);
> +MODULE_LICENSE("GPL");
> diff --git a/net/bpfilter/main.c b/net/bpfilter/main.c
> new file mode 100644
> index 000000000000..81bbc1684896
> --- /dev/null
> +++ b/net/bpfilter/main.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <sys/uio.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <sys/socket.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include "include/uapi/linux/bpf.h"
> +#include <asm/unistd.h>
> +#include "msgfmt.h"
> +
> +int debug_fd;
> +
> +static int handle_get_cmd(struct mbox_request *cmd)
> +{
> +	switch (cmd->cmd) {
> +	case 0:
> +		return 0;
> +	default:
> +		break;
> +	}
> +	return -ENOPROTOOPT;
> +}
> +
> +static int handle_set_cmd(struct mbox_request *cmd)
> +{
> +	return -ENOPROTOOPT;
> +}
> +
> +static void loop(void)
> +{
> +	while (1) {
> +		struct mbox_request req;
> +		struct mbox_reply reply;
> +		int n;
> +
> +		n = read(0, &req, sizeof(req));
> +		if (n != sizeof(req)) {
> +			dprintf(debug_fd, "invalid request %d\n", n);
> +			return;
> +		}
> +
> +		reply.status = req.is_set ?
> +			handle_set_cmd(&req) :
> +			handle_get_cmd(&req);
> +
> +		n = write(1, &reply, sizeof(reply));
> +		if (n != sizeof(reply)) {
> +			dprintf(debug_fd, "reply failed %d\n", n);
> +			return;
> +		}
> +	}
> +}
> +
> +int main(void)
> +{
> +	debug_fd = open("/dev/console", 00000002 | 00000100);
Should probably handle failure of this open() call.
> +	dprintf(debug_fd, "Started bpfilter\n");
> +	loop();
> +	close(debug_fd);
> +	return 0;
> +}
> diff --git a/net/bpfilter/msgfmt.h b/net/bpfilter/msgfmt.h
> new file mode 100644
> index 000000000000..94b9ac9e5114
> --- /dev/null
> +++ b/net/bpfilter/msgfmt.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _NET_BPFILTER_MSGFMT_H
> +#define _NET_BPFTILER_MSGFMT_H
Another bpftiler here, should be
+#define _NET_BPFILTER_MSGFMT_H

-Ed
> +
> +struct mbox_request {
> +	__u64 addr;
> +	__u32 len;
> +	__u32 is_set;
> +	__u32 cmd;
> +	__u32 pid;
> +};
> +
> +struct mbox_reply {
> +	__u32 status;
> +};
> +
> +#endif
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index b379520f9133..7018f91c5a39 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -16,6 +16,8 @@ obj-y     := route.o inetpeer.o protocol.o \
>  	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
>  	     metrics.o
>  
> +obj-$(CONFIG_BPFILTER) += bpfilter/
> +
>  obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
>  obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
>  obj-$(CONFIG_PROC_FS) += proc.o
> diff --git a/net/ipv4/bpfilter/Makefile b/net/ipv4/bpfilter/Makefile
> new file mode 100644
> index 000000000000..ce262d76cc48
> --- /dev/null
> +++ b/net/ipv4/bpfilter/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_BPFILTER) += sockopt.o
> +
> diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
> new file mode 100644
> index 000000000000..42a96d2d8d05
> --- /dev/null
> +++ b/net/ipv4/bpfilter/sockopt.c
> @@ -0,0 +1,42 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/uaccess.h>
> +#include <linux/bpfilter.h>
> +#include <uapi/linux/bpf.h>
> +#include <linux/wait.h>
> +#include <linux/kmod.h>
> +
> +int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
> +				char __user *optval,
> +				unsigned int optlen, bool is_set);
> +EXPORT_SYMBOL_GPL(bpfilter_process_sockopt);
> +
> +int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval,
> +			  unsigned int optlen, bool is_set)
> +{
> +	if (!bpfilter_process_sockopt) {
> +		int err = request_module("bpfilter");
> +
> +		if (err)
> +			return err;
> +		if (!bpfilter_process_sockopt)
> +			return -ECHILD;
> +	}
> +	return bpfilter_process_sockopt(sk, optname, optval, optlen, is_set);
> +}
> +
> +int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval,
> +			    unsigned int optlen)
> +{
> +	return bpfilter_mbox_request(sk, optname, optval, optlen, true);
> +}
> +
> +int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval,
> +			    int __user *optlen)
> +{
> +	int len;
> +
> +	if (get_user(len, optlen))
> +		return -EFAULT;
> +
> +	return bpfilter_mbox_request(sk, optname, optval, len, false);
> +}
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 5ad2d8ed3a3f..e0791faacb24 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -47,6 +47,8 @@
>  #include <linux/errqueue.h>
>  #include <linux/uaccess.h>
>  
> +#include <linux/bpfilter.h>
> +
>  /*
>   *	SOL_IP control messages.
>   */
> @@ -1244,6 +1246,11 @@ int ip_setsockopt(struct sock *sk, int level,
>  		return -ENOPROTOOPT;
>  
>  	err = do_ip_setsockopt(sk, level, optname, optval, optlen);
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_SET_REPLACE &&
> +	    optname < BPFILTER_IPT_SET_MAX)
> +		err = bpfilter_ip_set_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
> @@ -1552,6 +1559,11 @@ int ip_getsockopt(struct sock *sk, int level,
>  	int err;
>  
>  	err = do_ip_getsockopt(sk, level, optname, optval, optlen, 0);
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
> +	    optname < BPFILTER_IPT_GET_MAX)
> +		err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
> @@ -1584,6 +1596,11 @@ int compat_ip_getsockopt(struct sock *sk, int level, int optname,
>  	err = do_ip_getsockopt(sk, level, optname, optval, optlen,
>  		MSG_CMSG_COMPAT);
>  
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
> +	    optname < BPFILTER_IPT_GET_MAX)
> +		err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&

^ permalink raw reply

* RE: [PATCH net] tcp: restore autocorking
From: Michael Wenig @ 2018-05-03 14:27 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

Thank you, Eric, for fixing this!

Michael 

-----Original Message-----
From: Eric Dumazet [mailto:edumazet@google.com] 
Sent: Wednesday, May 2, 2018 8:25 PM
To: David S . Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>; Eric Dumazet <edumazet@google.com>; Michael Wenig <mwenig@vmware.com>; Eric Dumazet <eric.dumazet@gmail.com>
Subject: [PATCH net] tcp: restore autocorking

When adding rb-tree for TCP retransmit queue, we inadvertently broke TCP autocorking.

tcp_should_autocork() should really check if the rtx queue is not empty.

Tested:

Before the fix :
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      2682.85   2.47     1.59     3.618   2.329
TcpExtTCPAutoCorking            33                 0.0

// Same test, but forcing TCP_NODELAY
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      1408.75   2.44     2.96     6.802   8.259
TcpExtTCPAutoCorking            1                  0.0

After the fix :
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      5472.46   2.45     1.43     1.761   1.027
TcpExtTCPAutoCorking            361293             0.0

// With TCP_NODELAY option
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

540000 262144    500    10.00      5454.96   2.46     1.63     1.775   1.174
TcpExtTCPAutoCorking            315448             0.0

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Wenig <mwenig@vmware.com>
Tested-by: Michael Wenig <mwenig@vmware.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 44be7f43455e4aefde8db61e2d941a69abcc642a..c9d00ef54deca15d5760bcbe154001a96fa1e2a7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -697,7 +697,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,  {
 	return skb->len < size_goal &&
 	       sock_net(sk)->ipv4.sysctl_tcp_autocorking &&
-	       skb != tcp_write_queue_head(sk) &&
+	       !tcp_rtx_queue_empty(sk) &&
 	       refcount_read(&sk->sk_wmem_alloc) > skb->truesize;  }

^ permalink raw reply

* [PATCH v2 net-next] net: dsa: mv88e6xxx: 88E6141/6341 SERDES support
From: Marek Behún @ 2018-05-03 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Greg Kroah-Hartman, Vivien Didelot,
	Arkadi Sharshevsky, David S . Miller, Marek Behún

The 88E6141/6341 switches (also known as Topaz) have 1 SGMII lane,
which can be configured the same way as the SERDES lane on 88E6390.

Signed-off-by: Marek Behun <marek.behun@nic.cz>
---
 drivers/net/dsa/mv88e6xxx/chip.c   |  2 ++
 drivers/net/dsa/mv88e6xxx/serdes.c | 20 ++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/serdes.h |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index bd74c45f5495..fae362020305 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2426,6 +2426,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
 	.reset = mv88e6352_g1_reset,
 	.vtu_getnext = mv88e6352_g1_vtu_getnext,
 	.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+	.serdes_power = mv88e6341_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6161_ops = {
@@ -2924,6 +2925,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
 	.reset = mv88e6352_g1_reset,
 	.vtu_getnext = mv88e6352_g1_vtu_getnext,
 	.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+	.serdes_power = mv88e6341_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6350_ops = {
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index f3c01119b3d1..bfecbdf5f64d 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -227,3 +227,23 @@ int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
 
 	return 0;
 }
+
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
+{
+	int err;
+	u8 cmode;
+
+	if (port != 5)
+		return 0;
+
+	err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
+	if (err)
+		return err;
+
+	if ((cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X) ||
+	     (cmode == MV88E6XXX_PORT_STS_CMODE_SGMII) ||
+	     (cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX))
+		return mv88e6390_serdes_sgmii(chip, MV88E6341_ADDR_SERDES, on);
+
+	return 0;
+}
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.h b/drivers/net/dsa/mv88e6xxx/serdes.h
index 5c1cd6d8e9a5..87bfafc5fb29 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.h
+++ b/drivers/net/dsa/mv88e6xxx/serdes.h
@@ -19,6 +19,8 @@
 #define MV88E6352_ADDR_SERDES		0x0f
 #define MV88E6352_SERDES_PAGE_FIBER	0x01
 
+#define MV88E6341_ADDR_SERDES		0x15
+
 #define MV88E6390_PORT9_LANE0		0x09
 #define MV88E6390_PORT9_LANE1		0x12
 #define MV88E6390_PORT9_LANE2		0x13
@@ -42,6 +44,7 @@
 #define MV88E6390_SGMII_CONTROL_LOOPBACK	BIT(14)
 #define MV88E6390_SGMII_CONTROL_PDOWN		BIT(11)
 
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
 
-- 
2.16.1

^ permalink raw reply related

* Re: [PATCH v1 net-next] net: dsa: mv88e6xxx: 88E6141/6341 SERDES support
From: Marek Behún @ 2018-05-03 14:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Greg Kroah-Hartman, Vivien Didelot, Arkadi Sharshevsky,
	David S . Miller
In-Reply-To: <20180503131515.GB17027@lunn.ch>

Sent v2

On Thu, 3 May 2018 15:15:15 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Thu, May 03, 2018 at 03:06:48PM +0200, Marek Behún wrote:
> > The 88E6141/6341 switches (also known as Topaz) have 1 SGMII lane,
> > which can be configured the same way as the SERDES lane on 88E6390.
> > 
> > Signed-off-by: Marek Behun <marek.behun@nic.cz>  
> 
> > +int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port,
> > bool on) +{
> > +	int err;
> > +	u8 cmode;
> > +
> > +	if (port != 5)
> > +		return 0;
> > +
> > +	err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
> > +	if (err)
> > +		return err;
> > +
> > +	if ((cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X) ||
> > +	     (cmode == MV88E6XXX_PORT_STS_CMODE_SGMII) ||
> > +	     (cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX))
> > +		return mv88e6390_serdes_sgmii(chip, 0x15, on);  
> 
> Hi Marek
> 
> Please add a #define for this 0x15.
> 
> Otherwise, this looks good.
> 
> 	   Andrew

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: Neal Cardwell @ 2018-05-03 14:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Netdev, mwenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

On Wed, May 2, 2018 at 11:25 PM Eric Dumazet <edumazet@google.com> wrote:

> When adding rb-tree for TCP retransmit queue, we inadvertently broke
> TCP autocorking.

> tcp_should_autocork() should really check if the rtx queue is not empty.

...

> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Nice. Thanks, Eric!

neal

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: Soheil Hassas Yeganeh @ 2018-05-03 14:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

On Wed, May 2, 2018 at 11:25 PM, Eric Dumazet <edumazet@google.com> wrote:
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for catching and fixing this!

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox