Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 0/2] inet: Fixes for inet_csk_get_port and soreusport
From: David Miller @ 2016-12-17 16:13 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team, jbacik, eric.dumazet, raigatgoog
In-Reply-To: <20161215005416.1561632-1-tom@herbertland.com>

From: Tom Herbert <tom@herbertland.com>
Date: Wed, 14 Dec 2016 16:54:14 -0800

> This patch set fixes a couple of issues I noticed while debugging our
> softlockup issue in inet_csk_get_port.
> 
> - Don't allow jump into port scan in inet_csk_get_port if function
>   was called with non-zero port number (looking up explicit port
>   number).
> - When inet_csk_get_port is called with zero port number (ie. perform
>   scan) an reuseport is set on the socket, don't match sockets that
>   also have reuseport set. The intent from the user should be
>   to get a new port number and then explictly bind other
>   sockets to that number using soreuseport.
> 
> Tested:
> 
> Ran first patch on production workload with no ill effect.
> 
> For second patch, ran a little listener application and first
> demonstrated that unbound sockets with soreuseport can indeed
> be bound to unrelated soreuseport sockets.

Series applied, thanks Tom.

^ permalink raw reply

* Re: [PATCH] bpf: cgroup: annotate pointers in struct cgroup_bpf with __rcu
From: David Miller @ 2016-12-17 16:14 UTC (permalink / raw)
  To: daniel; +Cc: ast, daniel, netdev
In-Reply-To: <20161215095321.10571-1-daniel@zonque.org>

From: Daniel Mack <daniel@zonque.org>
Date: Thu, 15 Dec 2016 10:53:21 +0100

> The member 'effective' in 'struct cgroup_bpf' is protected by RCU.
> Annotate it accordingly to squelch a sparse warning.
> 
> Signed-off-by: Daniel Mack <daniel@zonque.org>

Applied.

^ permalink raw reply

* Re: Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jeffrey Walton @ 2016-12-17 16:14 UTC (permalink / raw)
  To: Theodore Ts'o, kernel-hardening, Jason A. Donenfeld,
	George Spelvin, ak, davem, David Laight, D. J. Bernstein,
	Eric Biggers, Hannes Frederic Sowa, Jean-Philippe Aumasson,
	linux-crypto, LKML, luto, Netdev, Tom Herbert, Linus Torvalds,
	Vegard Nossum
In-Reply-To: <20161217154152.5oug7mzb4tmfknwv@thunk.org>

> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to
4.0. Both are well below MD5 times. (At least with the data sets I've
tested).

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper only mentions random-number once in a different context).

Making the leap from internal hash tables and short-lived network
packets to the rng case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long term
private keys.

Jeff

^ permalink raw reply

* Re: [PATCH 1/5] irda: irproc.c: Remove unneeded linux/miscdevice.h include
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-1-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:46 +0100

> irproc.c does not use any miscdevice so this patch remove this
> unnecessary inclusion.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/5] irda: irnet: Move linux/miscdevice.h include
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-2-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:47 +0100

> The only use of miscdevice is irda_ppp so no need to include
> linux/miscdevice.h for all irda files.
> This patch move the linux/miscdevice.h include to irnet_ppp.h
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/5] irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-3-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:48 +0100

> This patch move the define for IRNET_MINOR to include/linux/miscdevice.h
> It is better that all minor number definitions are in the same place.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 4/5] irda: irnet: Remove unused IRNET_MAJOR define
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-4-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:49 +0100

> The IRNET_MAJOR define is not used, so this patch remove it.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 5/5] irda: irnet: add member name to the miscdevice declaration
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-5-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:50 +0100

> Since the struct miscdevice have many members, it is dangerous to init
> it without members name relying only on member order.
> 
> This patch add member name to the init declaration.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/3] dpaa_eth: a couple of fixes
From: David Miller @ 2016-12-17 16:22 UTC (permalink / raw)
  To: madalin.bucur; +Cc: netdev, linuxppc-dev, linux-kernel
In-Reply-To: <1481807586-4798-1-git-send-email-madalin.bucur@nxp.com>

From: Madalin Bucur <madalin.bucur@nxp.com>
Date: Thu, 15 Dec 2016 15:13:03 +0200

> This patch set introduces big endian accessors in the dpaa_eth driver
> making sure accesses to the QBMan HW are correct on little endian
> platforms. Removing a redundant Kconfig dependency on FSL_SOC.
> Adding myself as maintainer of the dpaa_eth driver.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net] sctp: sctp_epaddr_lookup_transport should be protected by rcu_read_lock
From: David Miller @ 2016-12-17 16:25 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman, dvyukov
In-Reply-To: <eb70a1f54c102c728a5572f1aa2842940fc3f5a5.1481814055.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 15 Dec 2016 23:00:55 +0800

> Since commit 7fda702f9315 ("sctp: use new rhlist interface on sctp transport
> rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
> rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.
> 
> It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
> sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
> as __sctp_lookup_association is called in rx path or sctp_lookup_association
> which are in the protection of rcu_read_lock() already.
> 
> But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it
> doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check
> usage' in __rhashtable_lookup.
> 
> This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
> before calling sctp_epaddr_lookup_transport.
> 
> Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null
From: David Miller @ 2016-12-17 16:25 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <3d4f0205d5b5c64d372dffb37602f48e4a173612.1481814352.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 15 Dec 2016 23:05:52 +0800

> Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
> when it failed to find a transport by sctp_addrs_lookup_transport.
> 
> This patch is to fix it by moving up rcu_read_unlock right before checking
> transport and also to remove the out path.
> 
> Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] net: mvpp2: fix dma unmapping of TX buffers for fragments
From: David Miller @ 2016-12-17 16:35 UTC (permalink / raw)
  To: thomas.petazzoni
  Cc: netdev, linux-arm-kernel, jason, andrew, sebastian.hesselbarth,
	gregory.clement, mw, stefanc, nadavh, hannah, yehuday,
	raphael.glon, stable
In-Reply-To: <20161217162658.63bc2423@free-electrons.com>

From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Sat, 17 Dec 2016 16:26:58 +0100

> Yes, I was thinking of moving towards a single array, as it's indeed
> crazy to have three arrays for that. However, since it's a fix going
> into stable, I also wanted to keep it as simple/straightforward as
> possible and avoid refactoring other parts of the code.

By the same token, by adding a third array you are making the code
more complex, adding more error recovery paths, etc.

> If you however believe moving to one array should be done as part of
> the fix, I'll do this.

Please do.

^ permalink raw reply

* Re: ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones @ 2016-12-17 16:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20161217.104120.945262627572958024.davem@davemloft.net>

On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote:
 > From: Dave Jones <davej@codemonkey.org.uk>
 > Date: Wed, 14 Dec 2016 10:47:29 -0500
 > 
 > > It seems to be possible to craft a packet for sendmsg that triggers
 > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:
 > > 
 > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
 > > RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
 > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
 > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
 > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
 > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
 > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
 > > 
 > > Call Trace:
 > >  [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 > >  [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 > >  [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 > >  [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 > >  [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 > >  [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 > >  [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
 > > 
 > > Handle this in rawv6_push_pending_frames and jump to the failure path.
 > > 
 > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
 > 
 > Hmmm, that's interesting.  Becaue the code in __ip6_append_data(), which
 > sets up the ->cork.base.length value, seems to be defensively trying to
 > avoid this possibility.
 > 
 > For example, it checks things like:
 > 
 > 	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
 > 	    (sk->sk_protocol == IPPROTO_UDP ||
 > 	     sk->sk_protocol == IPPROTO_RAW)) {
 > 
 > This is why the transport offset plus the length should never exceed
 > the total length for that skb_copy_bits() call.
 > 
 > Perhaps this protocol check in the code above is incomplete?  Do you
 > know what the sk->sk_protocol value was when that BUG triggered?  That
 > might shine some light on what is really happening here.

I'll see if I can craft up a reproducer next week.
For some reason I've not hit this on my test setup at home, but it
reproduces daily in our test setup at facebook.  The only thing
I can think of is that those fb boxes are ipv6 only, so I might be
exercising v4 more at home.

	Dave

^ permalink raw reply

* Re: [net-next PATCH v6 0/5] XDP for virtio_net
From: David Miller @ 2016-12-17 16:56 UTC (permalink / raw)
  To: mst
  Cc: john.fastabend, daniel, netdev, alexei.starovoitov,
	john.r.fastabend, brouer, tgraf
In-Reply-To: <20161216224600-mutt-send-email-mst@kernel.org>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 16 Dec 2016 22:48:14 +0200

> On Fri, Dec 16, 2016 at 01:20:02PM -0500, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Fri, 16 Dec 2016 01:17:44 +0200
>> 
>> > OK, I think we can queue this for -next.
>> > 
>> > It's fairly limited in the kind of hardware supported, we can and
>> > probably should extend it further with time.
>> > 
>> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> 
>> Michael, thanks for reviewing.
>> 
>> Since the substance of this patch series has honestly been ready since
>> before the merge window, would you mind if I send this to Linus now?
>> 
>> That's what I was hoping I would be able to do.
>> 
>> Thanks again.
> 
> ACK, no problem.
> BTW in case people wonder, I'll be offline for a couple of weeks.
> This might delay review of some patches a bit.

Great, series applied, thanks everyone!

^ permalink raw reply

* Re: [PATCH] isdn/gigaset: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook
  Cc: linux-kernel, pebolle, isdn, dan.carpenter, tilman,
	gigaset307x-common, netdev
In-Reply-To: <20161217005806.GA140074@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:58:06 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] ATM: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook
  Cc: linux-kernel, felipe.balbi, mugunthanvnm, fw, javier, jarod,
	netdev
In-Reply-To: <20161217005843.GA140202@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:58:43 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, linux-decnet-user, netdev
In-Reply-To: <20161217005858.GA140223@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:58:58 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied, although "decnet: " would have been a much better
subsystem prefix.

^ permalink raw reply

* Re: [PATCH] WAN: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, netdev
In-Reply-To: <20161217005918.GA140243@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:59:18 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] bna: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook
  Cc: linux-kernel, rasesh.mody, sudarsana.kalluru, Dept-GELinuxNICDev,
	netdev
In-Reply-To: <20161217010054.GA140360@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 17:00:54 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] isdn: use designated initializers
From: David Miller @ 2016-12-17 16:58 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, isdn, mugunthanvnm, a, felipe.balbi, fw, netdev
In-Reply-To: <20161217010142.GA140401@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 17:01:42 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net/x25: use designated initializers
From: David Miller @ 2016-12-17 16:58 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, andrew.hendry, linux-x25, netdev
In-Reply-To: <20161217010339.GA140529@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 17:03:39 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH 0/2] GTP tunneling fixes for net
From: David Miller @ 2016-12-17 17:01 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, laforge
In-Reply-To: <1481837753-10317-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 15 Dec 2016 22:35:51 +0100

> The following patchset contains two GTP tunneling fixes for your net
> tree, they are:
> 
> 1) Offset to IPv4 header in gtp_check_src_ms_ipv4() is incorrect, thus
>    this function always succeeds and therefore this defeats this sanity
>    check. This allows packets that have no PDP to go though, patch from
>    Lionel Gauthier.
> 
> 2) According to Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
>    Release 13, always set GTPv1 reserved bit to zero. This may cause
>    interoperability problems, patch from Harald Welte.
> 
> Please, apply, thanks a lot!

Series applied, thanks Pablo.

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Pavel Machek @ 2016-12-17 17:31 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <6f43eac8-754b-cfa2-371d-050701deb4cd@gmx.de>

[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]

On Thu 2016-12-15 23:33:22, Lino Sanfilippo wrote:
> On 15.12.2016 22:32, Lino Sanfilippo wrote:
> 
> > Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the
> > tx path properly) and the HW is still active on the tx path while the tx buffers are
> > freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped.
> > Did you try to stop the phy fist in stmmac_tx_err_work(), too?
> > 
> > Regards,
> > Lino
> > 
> 
> And this is the "sledgehammer" approach: Do a complete shutdown and restart
> of the hardware in case of tx error (against net-next and only
>compile tested).

Wow, thanks a lot. I'll try to get the driver back to the non-working
state, and try it.

I believe I have some idea what is wrong there. (Missing memory barriers).

> +static void stmmac_tx_err_work(struct work_struct *work)
> +{
> +	struct stmmac_priv *priv = container_of(work, struct stmmac_priv,
> +						tx_err_work);
> +	/* restart netdev */
> +	rtnl_lock();
> +	stmmac_release(priv->dev);
> +	stmmac_open(priv->dev);
> +	rtnl_unlock();
> +}

Won't this up/down the interface, in a way userspace can observe?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Pavel Machek @ 2016-12-17 17:38 UTC (permalink / raw)
  To: Joao Pinto
  Cc: Niklas Cassel, Giuseppe CAVALLARO, Florian Fainelli,
	Andy Shevchenko, David Miller, larper, rabinv, netdev,
	CARLOS.PALMINHA, Jie.Deng1, Stephen Warren
In-Reply-To: <79642215-95ce-7f04-3db7-121c585e2f2a@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 3796 bytes --]

Hi!

> >> So if there is a long time before handling interrupts,
> >> I guess that it makes sense that one stream could
> >> get an advantage in the net scheduler.
> >>
> >> If I find the time, and if no one beats me to it, I will try to replace
> >> the normal timers with HR timers + a smaller default timeout.
> >>
> > 
> > Can you try something like this? Highres timers will be needed, too,
> > but this fixes the logic problem.
> > 
> > You'll need to apply it twice as code is copy&pasted.
> > 
> > Best regards,
> > 									Pavel
> > 
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > 
> >  	 */
> >  	priv->tx_count_frames += nfrags + 1;
> >  	if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
> > -		mod_timer(&priv->txtimer,
> > -			  STMMAC_COAL_TIMER(priv->tx_coal_timer));
> > +		if (priv->tx_count_frames == nfrags + 1)
> > +			mod_timer(&priv->txtimer,
> > +				  STMMAC_COAL_TIMER(priv->tx_coal_timer));
> >  	} else {
> >  		priv->tx_count_frames = 0;
> >  		priv->hw->desc->set_tx_ic(desc);
> > 
> > 
> 
> I know that this is completely of topic, but I am facing a dificulty with
> stmmac. I have interrupts, mac well configured rx packets being received
> successfully, but TX is not working, resulting in Tx errors = Total TX packets.
> I have made a lot of debug and my conclusions is that by some reason when using
> stmmac after starting tx dma, the hw state machine enters a deadend state
> resulting in those errors. Anyone faced this trouble?

SMP or UP system? AFAICT the driver gets the memory barriers wrong. It
does not fail completely for me, but still fails rather quickly.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..641b03d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2121,11 +2205,11 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (mss_desc)
 		priv->hw->desc->set_tx_owner(mss_desc);
 
-	/* The own bit must be the latest setting done when prepare the
+	/* The own bit must be the latest setting done when preparing the
 	 * descriptor and then barrier is needed to make sure that
 	 * all is coherent before granting the DMA engine.
 	 */
-	smp_wmb();
+	wmb();
 
 	if (netif_msg_pktdata(priv)) {
 		pr_info("%s: curr=%d dirty=%d f=%d, e=%d, f_p=%p, nfrags %d\n",
@@ -2336,9 +2401,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		/* The own bit must be the latest setting done when prepare the
 		 * descriptor and then barrier is needed to make sure that
-		 * all is coherent before granting the DMA engine.
+		 * all is coherent before granting access to the DMA engine.
 		 */
-		smp_wmb();
+		wmb();
 	}
 
 	netdev_sent_queue(dev, skb->len);

Plus I'd suggest... at least (hand-edited). Driver should really be
modified to use readl() when accessing memory that changes.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..641b03d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1309,6 +1323,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
 		status = priv->hw->desc->tx_status(&priv->dev->stats,
 						      &priv->xstats, p,
 						      priv->ioaddr);
+		rmb();
+		
 		/* Check if the descriptor is owned by the DMA */
 		if (unlikely(status & tx_dma_own))
 			break;

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related

* Potential issues (security and otherwise) with the current cgroup-bpf API
From: Andy Lutomirski @ 2016-12-17 18:18 UTC (permalink / raw)
  To: Daniel Mack, Alexei Starovoitov, Mickaël Salaün,
	Kees Cook, Jann Horn, Tejun Heo, David Ahern, David S. Miller,
	Thomas Graf, Michael Kerrisk, Peter Zijlstra
  Cc: Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Network Development

[-- Attachment #1: Type: text/plain, Size: 4679 bytes --]

Hi all-

I apologize for being rather late with this.  I didn't realize that
cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
it on the linux-api list, so I missed the discussion.

I think that the inet ingress, egress etc filters are a neat feature,
but I think the API has some issues that will bite us down the road
if it becomes stable in its current form.

Most of the problems I see are summarized in this transcript:

# mkdir cg2
# mount -t cgroup2 none cg2
# mkdir cg2/nosockets
# strace cgrp_socket_rule cg2/nosockets/ 0
...
open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3

^^^^ You can modify a cgroup after opening it O_RDONLY?

bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
log_buf=0x6020c0, kern_version=0}, 48) = 4

^^^^ This is fine.  The bpf() syscall manipulates bpf objects.

bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0

^^^^ This is not so good:
^^^^
^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
^^^^    is manipulating a cgroup.  There's no reason that a socket creation
^^^^    filter couldn't be written in a different language (new iptables
^^^^    table?  Simple list of address families?), but if that happened,
^^^^    then using bpf() to install it would be entirely nonsensical.
^^^^
^^^^ b) This is starting to be an excessively ugly multiplexer.  Among
^^^^    other things, it's very unfriendly to seccomp.

# echo $$ >cg2/nosockets/cgroup.procs
# ping 127.0.0.1
ping: socket: Operation not permitted
# ls cg2/nosockets/
cgroup.controllers  cgroup.events  cgroup.procs  cgroup.subtree_control
# cat cg2/nosockets/cgroup.controllers

^^^^ Something in cgroupfs should give an indication that this cgroup
^^^^ filters socket creation, but there's nothing there.  You should also
^^^^ be able to turn the filter off from cgroupfs.

# mkdir cg2/nosockets/sockets
# /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1

^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
^^^^ there would be a chance to refine it.

# echo $$ >cg2/nosockets/sockets/cgroup.procs
# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
^C
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms

^^^^ Bash was inside a cgroup that disallowed socket creation, but socket
^^^^ creation wasn't disallowed.  This means that the obvious use of socket
^^^^ creation filters in nestable constainers fails insecurely.


There's also a subtle but nasty potential security problem here.
In 4.9 and before, cgroups has only one real effect in the kernel:
resource control. A process in a malicious cgroup could be DoSed,
but that was about the extent of the damage that a malicious cgroup
could do.

In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
programs attached that can do things if various events occur. (Right
now, this means socket operations, but there are plans in the works
to do this for LSM hooks too.) These bpf programs can say yes or no,
but they can also read out various data (including socket payloads!)
and save them away where an attacker can find them. This sounds a
lot like seccomp with a narrower scope but a much stronger ability to
exfiltrate private information.

Unfortunately, while seccomp is very, very careful to prevent
injection of a privileged victim into a malicious sandbox, the
CGROUP_BPF mechanism appears to have no real security model. There
is nothing to prevent a program that's in a malicious cgroup from
running a setuid binary, and there is nothing to prevent a program
that has the ability to move itself or another program into a
malicious cgroup from doing so and then, if needed for exploitation,
exec a setuid binary.

This isn't much of a problem yet because you currently need
CAP_NET_ADMIN to create a malicious sandbox in the first place.  I'm
sure that, in the near future, someone will want to make this stuff
work in containers with delegated cgroup hierarchies, and then there
may be a real problem here.


I've included a few security people on this thread.  The current API
looks abusable, and it would be nice to find all the holes before
4.10 comes out.


(The cgrp_socket_rule source is attached.  You can build it by sticking it
 in samples/bpf and doing:

 $ make headers_install
 $ cd samples/bpf
 $ gcc -o cgrp_socket_rule cgrp_socket_rule.c libbpf.c -I../../usr/include
)

--Andy

[-- Attachment #2: cgrp_socket_rule.c --]
[-- Type: text/x-csrc, Size: 1545 bytes --]

/* eBPF example program:
 *
 * - Loads eBPF program
 *
 *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
 *   sockets opened by processes in the cgroup.
 *
 * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
 */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <linux/bpf.h>

#include "libbpf.h"

static int prog_load(int value)
{
	struct bpf_insn prog[] = {
		BPF_MOV64_IMM(BPF_REG_0, value), /* r0 = verdict */
		BPF_EXIT_INSN(),
	};

	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
			     "GPL", 0);
}

static int usage(const char *argv0)
{
	printf("Usage: %s cg-path value\n", argv0);
	return EXIT_FAILURE;
}

int main(int argc, char **argv)
{
	int cg_fd, prog_fd, value, ret;

	if (argc < 2)
		return usage(argv[0]);

	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
	if (cg_fd < 0) {
		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	value = atoi(argv[2]);

	prog_fd = prog_load(value);
	/* printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf); */

	if (prog_fd < 0) {
		printf("Failed to load prog: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
	if (ret < 0) {
		printf("Failed to attach prog to cgroup: '%s'\n",
		       strerror(errno));
		return EXIT_FAILURE;
	}

	return EXIT_SUCCESS;
}

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox