Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] bpf: cgroup: annotate pointers in struct cgroup_bpf with __rcu
From: David Miller @ 2016-12-17 16:14 UTC (permalink / raw)
  To: daniel; +Cc: ast, daniel, netdev
In-Reply-To: <20161215095321.10571-1-daniel@zonque.org>

From: Daniel Mack <daniel@zonque.org>
Date: Thu, 15 Dec 2016 10:53:21 +0100

> The member 'effective' in 'struct cgroup_bpf' is protected by RCU.
> Annotate it accordingly to squelch a sparse warning.
> 
> Signed-off-by: Daniel Mack <daniel@zonque.org>

Applied.

^ permalink raw reply

* Re: Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jeffrey Walton @ 2016-12-17 16:14 UTC (permalink / raw)
  To: Theodore Ts'o, kernel-hardening, Jason A. Donenfeld,
	George Spelvin, ak, davem, David Laight, D. J. Bernstein,
	Eric Biggers, Hannes Frederic Sowa, Jean-Philippe Aumasson,
	linux-crypto, LKML, luto, Netdev, Tom Herbert, Linus Torvalds,
	Vegard Nossum
In-Reply-To: <20161217154152.5oug7mzb4tmfknwv@thunk.org>

> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to
4.0. Both are well below MD5 times. (At least with the data sets I've
tested).

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper only mentions random-number once in a different context).

Making the leap from internal hash tables and short-lived network
packets to the rng case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long term
private keys.

Jeff

^ permalink raw reply

* Re: [PATCH 1/5] irda: irproc.c: Remove unneeded linux/miscdevice.h include
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-1-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:46 +0100

> irproc.c does not use any miscdevice so this patch remove this
> unnecessary inclusion.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/5] irda: irnet: Move linux/miscdevice.h include
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-2-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:47 +0100

> The only use of miscdevice is irda_ppp so no need to include
> linux/miscdevice.h for all irda files.
> This patch move the linux/miscdevice.h include to irnet_ppp.h
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/5] irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-3-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:48 +0100

> This patch move the define for IRNET_MINOR to include/linux/miscdevice.h
> It is better that all minor number definitions are in the same place.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 4/5] irda: irnet: Remove unused IRNET_MAJOR define
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-4-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:49 +0100

> The IRNET_MAJOR define is not used, so this patch remove it.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 5/5] irda: irnet: add member name to the miscdevice declaration
From: David Miller @ 2016-12-17 16:18 UTC (permalink / raw)
  To: clabbe.montjoie; +Cc: arnd, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20161215104250.16813-5-clabbe.montjoie@gmail.com>

From: Corentin Labbe <clabbe.montjoie@gmail.com>
Date: Thu, 15 Dec 2016 11:42:50 +0100

> Since the struct miscdevice have many members, it is dangerous to init
> it without members name relying only on member order.
> 
> This patch add member name to the init declaration.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/3] dpaa_eth: a couple of fixes
From: David Miller @ 2016-12-17 16:22 UTC (permalink / raw)
  To: madalin.bucur; +Cc: netdev, linuxppc-dev, linux-kernel
In-Reply-To: <1481807586-4798-1-git-send-email-madalin.bucur@nxp.com>

From: Madalin Bucur <madalin.bucur@nxp.com>
Date: Thu, 15 Dec 2016 15:13:03 +0200

> This patch set introduces big endian accessors in the dpaa_eth driver
> making sure accesses to the QBMan HW are correct on little endian
> platforms. Removing a redundant Kconfig dependency on FSL_SOC.
> Adding myself as maintainer of the dpaa_eth driver.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net] sctp: sctp_epaddr_lookup_transport should be protected by rcu_read_lock
From: David Miller @ 2016-12-17 16:25 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman, dvyukov
In-Reply-To: <eb70a1f54c102c728a5572f1aa2842940fc3f5a5.1481814055.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 15 Dec 2016 23:00:55 +0800

> Since commit 7fda702f9315 ("sctp: use new rhlist interface on sctp transport
> rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
> rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.
> 
> It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
> sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
> as __sctp_lookup_association is called in rx path or sctp_lookup_association
> which are in the protection of rcu_read_lock() already.
> 
> But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it
> doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check
> usage' in __rhashtable_lookup.
> 
> This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
> before calling sctp_epaddr_lookup_transport.
> 
> Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null
From: David Miller @ 2016-12-17 16:25 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <3d4f0205d5b5c64d372dffb37602f48e4a173612.1481814352.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 15 Dec 2016 23:05:52 +0800

> Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
> when it failed to find a transport by sctp_addrs_lookup_transport.
> 
> This patch is to fix it by moving up rcu_read_unlock right before checking
> transport and also to remove the out path.
> 
> Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] net: mvpp2: fix dma unmapping of TX buffers for fragments
From: David Miller @ 2016-12-17 16:35 UTC (permalink / raw)
  To: thomas.petazzoni
  Cc: netdev, linux-arm-kernel, jason, andrew, sebastian.hesselbarth,
	gregory.clement, mw, stefanc, nadavh, hannah, yehuday,
	raphael.glon, stable
In-Reply-To: <20161217162658.63bc2423@free-electrons.com>

From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Sat, 17 Dec 2016 16:26:58 +0100

> Yes, I was thinking of moving towards a single array, as it's indeed
> crazy to have three arrays for that. However, since it's a fix going
> into stable, I also wanted to keep it as simple/straightforward as
> possible and avoid refactoring other parts of the code.

By the same token, by adding a third array you are making the code
more complex, adding more error recovery paths, etc.

> If you however believe moving to one array should be done as part of
> the fix, I'll do this.

Please do.

^ permalink raw reply

* Re: ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones @ 2016-12-17 16:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20161217.104120.945262627572958024.davem@davemloft.net>

On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote:
 > From: Dave Jones <davej@codemonkey.org.uk>
 > Date: Wed, 14 Dec 2016 10:47:29 -0500
 > 
 > > It seems to be possible to craft a packet for sendmsg that triggers
 > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:
 > > 
 > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
 > > RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
 > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
 > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
 > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
 > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
 > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
 > > 
 > > Call Trace:
 > >  [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 > >  [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 > >  [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 > >  [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 > >  [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 > >  [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 > >  [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
 > > 
 > > Handle this in rawv6_push_pending_frames and jump to the failure path.
 > > 
 > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
 > 
 > Hmmm, that's interesting.  Becaue the code in __ip6_append_data(), which
 > sets up the ->cork.base.length value, seems to be defensively trying to
 > avoid this possibility.
 > 
 > For example, it checks things like:
 > 
 > 	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
 > 	    (sk->sk_protocol == IPPROTO_UDP ||
 > 	     sk->sk_protocol == IPPROTO_RAW)) {
 > 
 > This is why the transport offset plus the length should never exceed
 > the total length for that skb_copy_bits() call.
 > 
 > Perhaps this protocol check in the code above is incomplete?  Do you
 > know what the sk->sk_protocol value was when that BUG triggered?  That
 > might shine some light on what is really happening here.

I'll see if I can craft up a reproducer next week.
For some reason I've not hit this on my test setup at home, but it
reproduces daily in our test setup at facebook.  The only thing
I can think of is that those fb boxes are ipv6 only, so I might be
exercising v4 more at home.

	Dave

^ permalink raw reply

* Re: [net-next PATCH v6 0/5] XDP for virtio_net
From: David Miller @ 2016-12-17 16:56 UTC (permalink / raw)
  To: mst
  Cc: john.fastabend, daniel, netdev, alexei.starovoitov,
	john.r.fastabend, brouer, tgraf
In-Reply-To: <20161216224600-mutt-send-email-mst@kernel.org>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 16 Dec 2016 22:48:14 +0200

> On Fri, Dec 16, 2016 at 01:20:02PM -0500, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Fri, 16 Dec 2016 01:17:44 +0200
>> 
>> > OK, I think we can queue this for -next.
>> > 
>> > It's fairly limited in the kind of hardware supported, we can and
>> > probably should extend it further with time.
>> > 
>> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> 
>> Michael, thanks for reviewing.
>> 
>> Since the substance of this patch series has honestly been ready since
>> before the merge window, would you mind if I send this to Linus now?
>> 
>> That's what I was hoping I would be able to do.
>> 
>> Thanks again.
> 
> ACK, no problem.
> BTW in case people wonder, I'll be offline for a couple of weeks.
> This might delay review of some patches a bit.

Great, series applied, thanks everyone!

^ permalink raw reply

* Re: [PATCH] isdn/gigaset: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook
  Cc: linux-kernel, pebolle, isdn, dan.carpenter, tilman,
	gigaset307x-common, netdev
In-Reply-To: <20161217005806.GA140074@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:58:06 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] ATM: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook
  Cc: linux-kernel, felipe.balbi, mugunthanvnm, fw, javier, jarod,
	netdev
In-Reply-To: <20161217005843.GA140202@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:58:43 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, linux-decnet-user, netdev
In-Reply-To: <20161217005858.GA140223@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:58:58 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied, although "decnet: " would have been a much better
subsystem prefix.

^ permalink raw reply

* Re: [PATCH] WAN: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, netdev
In-Reply-To: <20161217005918.GA140243@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 16:59:18 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] bna: use designated initializers
From: David Miller @ 2016-12-17 16:57 UTC (permalink / raw)
  To: keescook
  Cc: linux-kernel, rasesh.mody, sudarsana.kalluru, Dept-GELinuxNICDev,
	netdev
In-Reply-To: <20161217010054.GA140360@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 17:00:54 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] isdn: use designated initializers
From: David Miller @ 2016-12-17 16:58 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, isdn, mugunthanvnm, a, felipe.balbi, fw, netdev
In-Reply-To: <20161217010142.GA140401@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 17:01:42 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net/x25: use designated initializers
From: David Miller @ 2016-12-17 16:58 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, andrew.hendry, linux-x25, netdev
In-Reply-To: <20161217010339.GA140529@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 16 Dec 2016 17:03:39 -0800

> Prepare to mark sensitive kernel structures for randomization by making
> sure they're using designated initializers. These were identified during
> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
> extracted from grsecurity.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.

^ permalink raw reply

* Re: [PATCH 0/2] GTP tunneling fixes for net
From: David Miller @ 2016-12-17 17:01 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, laforge
In-Reply-To: <1481837753-10317-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 15 Dec 2016 22:35:51 +0100

> The following patchset contains two GTP tunneling fixes for your net
> tree, they are:
> 
> 1) Offset to IPv4 header in gtp_check_src_ms_ipv4() is incorrect, thus
>    this function always succeeds and therefore this defeats this sanity
>    check. This allows packets that have no PDP to go though, patch from
>    Lionel Gauthier.
> 
> 2) According to Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
>    Release 13, always set GTPv1 reserved bit to zero. This may cause
>    interoperability problems, patch from Harald Welte.
> 
> Please, apply, thanks a lot!

Series applied, thanks Pablo.

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Pavel Machek @ 2016-12-17 17:31 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <6f43eac8-754b-cfa2-371d-050701deb4cd@gmx.de>

[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]

On Thu 2016-12-15 23:33:22, Lino Sanfilippo wrote:
> On 15.12.2016 22:32, Lino Sanfilippo wrote:
> 
> > Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the
> > tx path properly) and the HW is still active on the tx path while the tx buffers are
> > freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped.
> > Did you try to stop the phy fist in stmmac_tx_err_work(), too?
> > 
> > Regards,
> > Lino
> > 
> 
> And this is the "sledgehammer" approach: Do a complete shutdown and restart
> of the hardware in case of tx error (against net-next and only
>compile tested).

Wow, thanks a lot. I'll try to get the driver back to the non-working
state, and try it.

I believe I have some idea what is wrong there. (Missing memory barriers).

> +static void stmmac_tx_err_work(struct work_struct *work)
> +{
> +	struct stmmac_priv *priv = container_of(work, struct stmmac_priv,
> +						tx_err_work);
> +	/* restart netdev */
> +	rtnl_lock();
> +	stmmac_release(priv->dev);
> +	stmmac_open(priv->dev);
> +	rtnl_unlock();
> +}

Won't this up/down the interface, in a way userspace can observe?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Pavel Machek @ 2016-12-17 17:38 UTC (permalink / raw)
  To: Joao Pinto
  Cc: Niklas Cassel, Giuseppe CAVALLARO, Florian Fainelli,
	Andy Shevchenko, David Miller, larper, rabinv, netdev,
	CARLOS.PALMINHA, Jie.Deng1, Stephen Warren
In-Reply-To: <79642215-95ce-7f04-3db7-121c585e2f2a@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 3796 bytes --]

Hi!

> >> So if there is a long time before handling interrupts,
> >> I guess that it makes sense that one stream could
> >> get an advantage in the net scheduler.
> >>
> >> If I find the time, and if no one beats me to it, I will try to replace
> >> the normal timers with HR timers + a smaller default timeout.
> >>
> > 
> > Can you try something like this? Highres timers will be needed, too,
> > but this fixes the logic problem.
> > 
> > You'll need to apply it twice as code is copy&pasted.
> > 
> > Best regards,
> > 									Pavel
> > 
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > 
> >  	 */
> >  	priv->tx_count_frames += nfrags + 1;
> >  	if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
> > -		mod_timer(&priv->txtimer,
> > -			  STMMAC_COAL_TIMER(priv->tx_coal_timer));
> > +		if (priv->tx_count_frames == nfrags + 1)
> > +			mod_timer(&priv->txtimer,
> > +				  STMMAC_COAL_TIMER(priv->tx_coal_timer));
> >  	} else {
> >  		priv->tx_count_frames = 0;
> >  		priv->hw->desc->set_tx_ic(desc);
> > 
> > 
> 
> I know that this is completely of topic, but I am facing a dificulty with
> stmmac. I have interrupts, mac well configured rx packets being received
> successfully, but TX is not working, resulting in Tx errors = Total TX packets.
> I have made a lot of debug and my conclusions is that by some reason when using
> stmmac after starting tx dma, the hw state machine enters a deadend state
> resulting in those errors. Anyone faced this trouble?

SMP or UP system? AFAICT the driver gets the memory barriers wrong. It
does not fail completely for me, but still fails rather quickly.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..641b03d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2121,11 +2205,11 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (mss_desc)
 		priv->hw->desc->set_tx_owner(mss_desc);
 
-	/* The own bit must be the latest setting done when prepare the
+	/* The own bit must be the latest setting done when preparing the
 	 * descriptor and then barrier is needed to make sure that
 	 * all is coherent before granting the DMA engine.
 	 */
-	smp_wmb();
+	wmb();
 
 	if (netif_msg_pktdata(priv)) {
 		pr_info("%s: curr=%d dirty=%d f=%d, e=%d, f_p=%p, nfrags %d\n",
@@ -2336,9 +2401,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		/* The own bit must be the latest setting done when prepare the
 		 * descriptor and then barrier is needed to make sure that
-		 * all is coherent before granting the DMA engine.
+		 * all is coherent before granting access to the DMA engine.
 		 */
-		smp_wmb();
+		wmb();
 	}
 
 	netdev_sent_queue(dev, skb->len);

Plus I'd suggest... at least (hand-edited). Driver should really be
modified to use readl() when accessing memory that changes.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..641b03d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1309,6 +1323,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
 		status = priv->hw->desc->tx_status(&priv->dev->stats,
 						      &priv->xstats, p,
 						      priv->ioaddr);
+		rmb();
+		
 		/* Check if the descriptor is owned by the DMA */
 		if (unlikely(status & tx_dma_own))
 			break;

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related

* Potential issues (security and otherwise) with the current cgroup-bpf API
From: Andy Lutomirski @ 2016-12-17 18:18 UTC (permalink / raw)
  To: Daniel Mack, Alexei Starovoitov, Mickaël Salaün,
	Kees Cook, Jann Horn, Tejun Heo, David Ahern, David S. Miller,
	Thomas Graf, Michael Kerrisk, Peter Zijlstra
  Cc: Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Network Development

[-- Attachment #1: Type: text/plain, Size: 4679 bytes --]

Hi all-

I apologize for being rather late with this.  I didn't realize that
cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
it on the linux-api list, so I missed the discussion.

I think that the inet ingress, egress etc filters are a neat feature,
but I think the API has some issues that will bite us down the road
if it becomes stable in its current form.

Most of the problems I see are summarized in this transcript:

# mkdir cg2
# mount -t cgroup2 none cg2
# mkdir cg2/nosockets
# strace cgrp_socket_rule cg2/nosockets/ 0
...
open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3

^^^^ You can modify a cgroup after opening it O_RDONLY?

bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
log_buf=0x6020c0, kern_version=0}, 48) = 4

^^^^ This is fine.  The bpf() syscall manipulates bpf objects.

bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0

^^^^ This is not so good:
^^^^
^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
^^^^    is manipulating a cgroup.  There's no reason that a socket creation
^^^^    filter couldn't be written in a different language (new iptables
^^^^    table?  Simple list of address families?), but if that happened,
^^^^    then using bpf() to install it would be entirely nonsensical.
^^^^
^^^^ b) This is starting to be an excessively ugly multiplexer.  Among
^^^^    other things, it's very unfriendly to seccomp.

# echo $$ >cg2/nosockets/cgroup.procs
# ping 127.0.0.1
ping: socket: Operation not permitted
# ls cg2/nosockets/
cgroup.controllers  cgroup.events  cgroup.procs  cgroup.subtree_control
# cat cg2/nosockets/cgroup.controllers

^^^^ Something in cgroupfs should give an indication that this cgroup
^^^^ filters socket creation, but there's nothing there.  You should also
^^^^ be able to turn the filter off from cgroupfs.

# mkdir cg2/nosockets/sockets
# /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1

^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
^^^^ there would be a chance to refine it.

# echo $$ >cg2/nosockets/sockets/cgroup.procs
# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
^C
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms

^^^^ Bash was inside a cgroup that disallowed socket creation, but socket
^^^^ creation wasn't disallowed.  This means that the obvious use of socket
^^^^ creation filters in nestable constainers fails insecurely.


There's also a subtle but nasty potential security problem here.
In 4.9 and before, cgroups has only one real effect in the kernel:
resource control. A process in a malicious cgroup could be DoSed,
but that was about the extent of the damage that a malicious cgroup
could do.

In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
programs attached that can do things if various events occur. (Right
now, this means socket operations, but there are plans in the works
to do this for LSM hooks too.) These bpf programs can say yes or no,
but they can also read out various data (including socket payloads!)
and save them away where an attacker can find them. This sounds a
lot like seccomp with a narrower scope but a much stronger ability to
exfiltrate private information.

Unfortunately, while seccomp is very, very careful to prevent
injection of a privileged victim into a malicious sandbox, the
CGROUP_BPF mechanism appears to have no real security model. There
is nothing to prevent a program that's in a malicious cgroup from
running a setuid binary, and there is nothing to prevent a program
that has the ability to move itself or another program into a
malicious cgroup from doing so and then, if needed for exploitation,
exec a setuid binary.

This isn't much of a problem yet because you currently need
CAP_NET_ADMIN to create a malicious sandbox in the first place.  I'm
sure that, in the near future, someone will want to make this stuff
work in containers with delegated cgroup hierarchies, and then there
may be a real problem here.


I've included a few security people on this thread.  The current API
looks abusable, and it would be nice to find all the holes before
4.10 comes out.


(The cgrp_socket_rule source is attached.  You can build it by sticking it
 in samples/bpf and doing:

 $ make headers_install
 $ cd samples/bpf
 $ gcc -o cgrp_socket_rule cgrp_socket_rule.c libbpf.c -I../../usr/include
)

--Andy

[-- Attachment #2: cgrp_socket_rule.c --]
[-- Type: text/x-csrc, Size: 1545 bytes --]

/* eBPF example program:
 *
 * - Loads eBPF program
 *
 *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
 *   sockets opened by processes in the cgroup.
 *
 * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
 */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <linux/bpf.h>

#include "libbpf.h"

static int prog_load(int value)
{
	struct bpf_insn prog[] = {
		BPF_MOV64_IMM(BPF_REG_0, value), /* r0 = verdict */
		BPF_EXIT_INSN(),
	};

	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
			     "GPL", 0);
}

static int usage(const char *argv0)
{
	printf("Usage: %s cg-path value\n", argv0);
	return EXIT_FAILURE;
}

int main(int argc, char **argv)
{
	int cg_fd, prog_fd, value, ret;

	if (argc < 2)
		return usage(argv[0]);

	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
	if (cg_fd < 0) {
		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	value = atoi(argv[2]);

	prog_fd = prog_load(value);
	/* printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf); */

	if (prog_fd < 0) {
		printf("Failed to load prog: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
	if (ret < 0) {
		printf("Failed to attach prog to cgroup: '%s'\n",
		       strerror(errno));
		return EXIT_FAILURE;
	}

	return EXIT_SUCCESS;
}

^ permalink raw reply

* [PATCH net-next v4 0/2] stmmac: dwmac-meson8b: configurable RGMII TX delay
From: Martin Blumenstingl @ 2016-12-17 18:21 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	carlo-KA+7E9HrN00dnm+yROfE0A, khilman-rdvid1DuHRBWk0Htik3J/w
  Cc: peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Martin Blumenstingl
In-Reply-To: <20161202233238.17561-1-martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>

Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
cycle (= 2ns) TX clock delay. This seems to work fine for many boards
(for example Odroid-C2 or Amlogic's reference boards) but there are
some others where TX traffic is simply broken.
There are probably multiple reasons why it's working on some boards
while it's broken on others:
- some of Amlogic's reference boards are using a Micrel PHY
- hardware circuit design
- maybe more...

iperf3 results on my Mecool BB2 board (Meson GXM, RTL8211F PHY) with
TX clock delay disabled on the MAC (as it's enabled in the PHY driver).
TX throughput was virtually zero before:
$ iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  4] local 192.168.1.206 port 52828 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   108 MBytes   901 Mbits/sec
[  4]   1.00-2.00   sec  94.2 MBytes   791 Mbits/sec
[  4]   2.00-3.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   3.00-4.00   sec  96.2 MBytes   808 Mbits/sec
[  4]   4.00-5.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   5.00-6.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   6.00-7.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   7.00-8.00   sec  96.5 MBytes   809 Mbits/sec
[  4]   8.00-9.00   sec   105 MBytes   884 Mbits/sec
[  4]   9.00-10.00  sec   111 MBytes   934 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   998 MBytes   837 Mbits/sec                  receiver

iperf Done.
$ iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  4] local 192.168.1.206 port 52832 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.01   sec  99.5 MBytes   829 Mbits/sec  117    139 KBytes
[  4]   1.01-2.00   sec   105 MBytes   884 Mbits/sec  129   70.7 KBytes
[  4]   2.00-3.01   sec   107 MBytes   889 Mbits/sec  106    187 KBytes
[  4]   3.01-4.01   sec   105 MBytes   878 Mbits/sec   92    143 KBytes
[  4]   4.01-5.00   sec   105 MBytes   882 Mbits/sec  140    129 KBytes
[  4]   5.00-6.01   sec   106 MBytes   883 Mbits/sec  115    195 KBytes
[  4]   6.01-7.00   sec   102 MBytes   863 Mbits/sec  133   70.7 KBytes
[  4]   7.00-8.01   sec   106 MBytes   884 Mbits/sec  143   97.6 KBytes
[  4]   8.01-9.01   sec   104 MBytes   875 Mbits/sec  124    107 KBytes
[  4]   9.01-10.01  sec   105 MBytes   876 Mbits/sec   90    139 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.01  sec  1.02 GBytes   874 Mbits/sec  1189             sender
[  4]   0.00-10.01  sec  1.02 GBytes   873 Mbits/sec                  receiver

iperf Done.

I get similar TX throughput on my Meson GXBB "MXQ Pro+" board when I
disable the PHY's TX-delay and configure a 4ms TX-delay on the MAC.
So changes to at least the RTL8211F PHY driver are needed to get it
working properly in all situations.

Changes since v3:
- rebased to apply against current net-next branch (fixes a conflict
  with d2ed0a7755fe14c7 "net: ethernet: stmmac: fix of-node and
  fixed-link-phydev leaks")
- 

Changes since v2:
- moved all .dts patches (3-7) to a separate series
- removed the default 2ns TX delay when phy-mode RGMII is specified
- (rebased against current net-next)

Changes since v1:
- renamed the devicetree property "amlogic,tx-delay" to
  "amlogic,tx-delay-ns", which makes the .dts easier to read as we can
  simply specify human-readable values instead of having "preprocessor
  defines and calculation in human brain". Thanks to Andrew Lunn for
  the suggestion!
- improved documentation to indicate when the MAC TX-delay should be
  configured and how to use the PHY's TX-delay
- changed the default TX-delay in the dwmac-meson8b driver from 2ns
  to 0ms when any of the rgmii-*id modes are used (the 2ns default
  value still applies for phy-mode "rgmii")
- added patches to properly reset the PHY on Meson GXBB devices and to
  use a similar configuration than the one we use on Meson GXL devices
  (by passing a phy-handle to stmmac and defining the PHY in the mdio0
  bus - patch 3-6)
- add the "amlogic,tx-delay-ns" property to all boards which are using
  the RGMII PHY (patch 7)

Martin Blumenstingl (2):
  net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac
  net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable

 .../devicetree/bindings/net/meson-dwmac.txt         | 14 ++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 21 +++++++++++++++------
 2 files changed, 29 insertions(+), 6 deletions(-)

-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox