Netdev List


* Re: [PATCH net-next] rds: tcp: remove register_netdevice_notifier infrastructure.
From: Kirill Tkhai @ 2018-03-19 15:13 UTC (permalink / raw)
  To: Sowmini Varadhan, netdev; +Cc: davem, santosh.shilimkar
In-Reply-To: <1521467568-37876-1-git-send-email-sowmini.varadhan@oracle.com>

On 19.03.2018 16:52, Sowmini Varadhan wrote:
> The netns deletion path does not need to wait for all net_devices
> to be unregistered before dismantling rds_tcp state for the netns
> (we are able to dismantle this state on module unload even when
> all net_devices are active so there is no dependency here).
> 
> This patch removes code related to netdevice notifiers and
> refactors all the code needed to dismantle rds_tcp state
> into a ->exit callback for the pernet_operations used with
> register_pernet_device().
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

To repeat what I said earlier:
rds_tcp_listen_sock destruction looks nice and safe, since every place
where the socket is dereferenced takes sk_callback_lock.
So none of them can miss rds_tcp_listen_sock = NULL, as rds_tcp_listen_stop()
takes the lock too.

rds_tcp_conn_list is populated from:

1) rds_tcp_accept_one(), which can't happen after we have flushed the queue
in rds_tcp_listen_stop();

2) rds_sendmsg(), which is triggered by userspace, and that's impossible
when the net is dead;

3) rds_ib_cm_handle_connect(), which calls rds_conn_create() with the init_net
argument only. This can race only with module unloading, but that problem
is already solved in RDS by the rds_destroy_pending() check, which takes
care of it.

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>

(The only thing I don't know is why we need to destroy the sockets
 before the last netdevice, but I haven't dived into that. Just mentioning it...)

Thanks, Sowmini.

Kirill

> ---
>  net/rds/tcp.c |   93 ++++++++++++++-------------------------------------------
>  1 files changed, 23 insertions(+), 70 deletions(-)
> 
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 08ea9cd..4f3a32c 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net)
>  	return err;
>  }
>  
> -static void __net_exit rds_tcp_exit_net(struct net *net)
> -{
> -	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> -
> -	if (rtn->rds_tcp_sysctl)
> -		unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
> -
> -	if (net != &init_net && rtn->ctl_table)
> -		kfree(rtn->ctl_table);
> -
> -	/* If rds_tcp_exit_net() is called as a result of netns deletion,
> -	 * the rds_tcp_kill_sock() device notifier would already have cleaned
> -	 * up the listen socket, thus there is no work to do in this function.
> -	 *
> -	 * If rds_tcp_exit_net() is called as a result of module unload,
> -	 * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
> -	 * we do need to clean up the listen socket here.
> -	 */
> -	if (rtn->rds_tcp_listen_sock) {
> -		struct socket *lsock = rtn->rds_tcp_listen_sock;
> -
> -		rtn->rds_tcp_listen_sock = NULL;
> -		rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
> -	}
> -}
> -
> -static struct pernet_operations rds_tcp_net_ops = {
> -	.init = rds_tcp_init_net,
> -	.exit = rds_tcp_exit_net,
> -	.id = &rds_tcp_netid,
> -	.size = sizeof(struct rds_tcp_net),
> -	.async = true,
> -};
> -
>  static void rds_tcp_kill_sock(struct net *net)
>  {
>  	struct rds_tcp_connection *tc, *_tc;
> @@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net)
>  		rds_conn_destroy(tc->t_cpath->cp_conn);
>  }
>  
> -void *rds_tcp_listen_sock_def_readable(struct net *net)
> +static void __net_exit rds_tcp_exit_net(struct net *net)
>  {
>  	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> -	struct socket *lsock = rtn->rds_tcp_listen_sock;
>  
> -	if (!lsock)
> -		return NULL;
> +	rds_tcp_kill_sock(net);
>  
> -	return lsock->sk->sk_user_data;
> +	if (rtn->rds_tcp_sysctl)
> +		unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
> +
> +	if (net != &init_net && rtn->ctl_table)
> +		kfree(rtn->ctl_table);
>  }
>  
> -static int rds_tcp_dev_event(struct notifier_block *this,
> -			     unsigned long event, void *ptr)
> +static struct pernet_operations rds_tcp_net_ops = {
> +	.init = rds_tcp_init_net,
> +	.exit = rds_tcp_exit_net,
> +	.id = &rds_tcp_netid,
> +	.size = sizeof(struct rds_tcp_net),
> +	.async = true,
> +};
> +
> +void *rds_tcp_listen_sock_def_readable(struct net *net)
>  {
> -	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> +	struct socket *lsock = rtn->rds_tcp_listen_sock;
>  
> -	/* rds-tcp registers as a pernet subys, so the ->exit will only
> -	 * get invoked after network acitivity has quiesced. We need to
> -	 * clean up all sockets  to quiesce network activity, and use
> -	 * the unregistration of the per-net loopback device as a trigger
> -	 * to start that cleanup.
> -	 */
> -	if (event == NETDEV_UNREGISTER_FINAL &&
> -	    dev->ifindex == LOOPBACK_IFINDEX)
> -		rds_tcp_kill_sock(dev_net(dev));
> +	if (!lsock)
> +		return NULL;
>  
> -	return NOTIFY_DONE;
> +	return lsock->sk->sk_user_data;
>  }
>  
> -static struct notifier_block rds_tcp_dev_notifier = {
> -	.notifier_call        = rds_tcp_dev_event,
> -	.priority = -10, /* must be called after other network notifiers */
> -};
> -
>  /* when sysctl is used to modify some kernel socket parameters,this
>   * function  resets the RDS connections in that netns  so that we can
>   * restart with new parameters.  The assumption is that such reset
> @@ -625,9 +589,7 @@ static void rds_tcp_exit(void)
>  	rds_tcp_set_unloading();
>  	synchronize_rcu();
>  	rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
> -	unregister_pernet_subsys(&rds_tcp_net_ops);
> -	if (unregister_netdevice_notifier(&rds_tcp_dev_notifier))
> -		pr_warn("could not unregister rds_tcp_dev_notifier\n");
> +	unregister_pernet_device(&rds_tcp_net_ops);
>  	rds_tcp_destroy_conns();
>  	rds_trans_unregister(&rds_tcp_transport);
>  	rds_tcp_recv_exit();
> @@ -651,24 +613,15 @@ static int rds_tcp_init(void)
>  	if (ret)
>  		goto out_slab;
>  
> -	ret = register_pernet_subsys(&rds_tcp_net_ops);
> +	ret = register_pernet_device(&rds_tcp_net_ops);
>  	if (ret)
>  		goto out_recv;
>  
> -	ret = register_netdevice_notifier(&rds_tcp_dev_notifier);
> -	if (ret) {
> -		pr_warn("could not register rds_tcp_dev_notifier\n");
> -		goto out_pernet;
> -	}
> -
>  	rds_trans_register(&rds_tcp_transport);
>  
>  	rds_info_register_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
>  
>  	goto out;
> -
> -out_pernet:
> -	unregister_pernet_subsys(&rds_tcp_net_ops);
>  out_recv:
>  	rds_tcp_recv_exit();
>  out_slab:
> 


* RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: David Laight @ 2018-03-19 15:19 UTC (permalink / raw)
  To: 'Thomas Gleixner'
  Cc: 'Rahul Lakkireddy', x86@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	mingo@redhat.com, hpa@zytor.com, davem@davemloft.net,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	ganeshgr@chelsio.com, nirranjan@chelsio.com, indranil@chelsio.com
In-Reply-To: <alpine.DEB.2.21.1803191557400.2010@nanos.tec.linutronix.de>

From: Thomas Gleixner
> Sent: 19 March 2018 15:05
> 
> On Mon, 19 Mar 2018, David Laight wrote:
> > From: Rahul Lakkireddy
> > In principle it ought to be possible to get access to one or two
> > (eg) AVX registers by saving them to stack and telling the fpu
> > save code where you've put them.
> 
> No. We have functions for this and we are not adding new ad hoc magic.

I was thinking that a real API might do this...
Useful also for code that needs AVX-like registers to do things like CRCs.

> > OTOH, for x86, if the code always runs in process context (eg from a
> > system call) then, since the ABI defines them all as caller-saved
> > the AVX(2) registers, it is only necessary to ensure that the current
> > FPU registers belong to the current process once.
> > The registers can be set to zero by an 'invalidate' instruction on
> > system call entry (hope this is done) and after use.
> 
> Why would a system call touch the FPU registers? The kernel normally does
> not use FPU instructions and the code which explicitly does has to take
> care of save/restore. It would be performance madness to fiddle with the
> FPU stuff unconditionally if nothing uses it.

If system call entry reset the AVX registers then any FP save/restore
would be faster because the AVX registers wouldn't need to be saved
(and the cpu won't save them).
I believe the instruction to reset the AVX registers is fast.
The AVX registers only ever need saving if the process enters the
kernel through an interrupt.

	David


* Re: [RFC v2 0/2] kernel: add support to collect hardware logs in crash recovery kernel
From: Stephen Hemminger @ 2018-03-19 15:22 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	kexec@lists.infradead.org, davem@davemloft.net,
	ebiederm@xmission.com, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, Ganesh GR, Nirranjan Kirubaharan,
	Indranil Choudhury
In-Reply-To: <20180319075555.GA22955@chelsio.com>

On Mon, 19 Mar 2018 13:25:56 +0530
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> wrote:

> On Friday, March 03/16/18, 2018 at 16:42:03 +0530, Rahul Lakkireddy wrote:
> > On production servers running variety of workloads over time, kernel
> > panic can happen sporadically after days or even months. It is
> > important to collect as much debug logs as possible to root cause
> > and fix the problem, that may not be easy to reproduce. Snapshot of
> > underlying hardware/firmware state (like register dump, firmware
> > logs, adapter memory, etc.), at the time of kernel panic will be very
> > helpful while debugging the culprit device driver.
> > 
> > This series of patches add new generic framework that enable device
> > drivers to collect device specific snapshot of the hardware/firmware
> > state of the underlying device in the crash recovery kernel. In crash
> > recovery kernel, the collected logs are exposed via /proc/crashdd/
> > directory, which is copied by user space scripts for post-analysis.
> > 
> > A kernel module crashdd is newly added. In crash recovery kernel,
> > crashdd exposes /proc/crashdd/ directory containing device specific
> > hardware/firmware logs.
> > 
> > The sequence of actions done by device drivers to append their device
> > specific hardware/firmware logs to /proc/crashdd/ directory are as
> > follows:
> > 
> > 1. During probe (before hardware is initialized), device drivers
> > register to the crashdd module (via crashdd_add_dump()), with
> > callback function, along with buffer size and log name needed for
> > firmware/hardware log collection.
> > 
> > 2. Crashdd creates a driver's directory under /proc/crashdd/<driver>.
> > Then, it allocates the buffer with requested size and invokes the
> > device driver's registered callback function.
> > 
> > 3. Device driver collects all hardware/firmware logs into the buffer
> > and returns control back to crashdd.
> > 
> > 4. Crashdd exposes the buffer as a file via
> > /proc/crashdd/<driver>/<dump_file>.
> > 
> > 5. User space script (/usr/lib/kdump/kdump-lib-initramfs.sh) copies
> > the entire /proc/crashdd/ directory to /var/crash/ directory.
> > 
> > Patch 1 adds crashdd module to allow drivers to register callback to
> > collect the device specific hardware/firmware logs.  The module also
> > exports /proc/crashdd/ directory containing the hardware/firmware logs.
> > 
> > Patch 2 shows a cxgb4 driver example using the API to collect
> > hardware/firmware logs in crash recovery kernel, before hardware is
> > initialized.  The logs for the devices are made available under
> > /proc/crashdd/cxgb4/ directory.
> > 
> > Suggestions and feedback will be much appreciated.
> > 
> > Thanks,
> > Rahul
> > 
> > RFC v1: https://www.spinics.net/lists/netdev/msg486562.html
> > 
> > ---
> > v2:
> > - Added new crashdd module that exports /proc/crashdd/ containing
> >   driver's registered hardware/firmware logs in patch 1.
> > - Replaced the API to allow drivers to register their hardware/firmware
> >   log collect routine in crash recovery kernel in patch 1.
> > - Updated patch 2 to use the new API in patch 1.
> > 
> > Rahul Lakkireddy (2):
> >   proc/crashdd: add API to collect hardware dump in second kernel
> >   cxgb4: collect hardware dump in second kernel
> > 
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |   4 +
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c |  25 +++
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |   3 +
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  12 ++
> >  fs/proc/Kconfig                                  |  11 +
> >  fs/proc/Makefile                                 |   1 +
> >  fs/proc/crashdd.c                                | 263 +++++++++++++++++++++++
> >  include/linux/crashdd.h                          |  43 ++++
> >  8 files changed, 362 insertions(+)
> >  create mode 100644 fs/proc/crashdd.c
> >  create mode 100644 include/linux/crashdd.h
> > 
> > -- 
> > 2.14.1
> >   
> 
> Does anyone have any comments with this approach?  If there are no
> comments, then I'll re-spin this RFC to Patch series.
> 
> Thanks,
> Rahul

This does look like it gives useful data, but it is not clear that this
cannot already be done with existing APIs or small extensions.

Introducing a new /proc interface and one that is mostly device specific is
unlikely to be greeted with a warm reception by the current Linux kernel community.

For example, getting firmware logs seems like something more related to
ethtool or sysfs.


* Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: Christoph Hellwig @ 2018-03-19 15:27 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: x86, linux-kernel, netdev, tglx, mingo, hpa, davem, akpm,
	torvalds, ganeshgr, nirranjan, indranil
In-Reply-To: <cover.1521469118.git.rahul.lakkireddy@chelsio.com>

On Mon, Mar 19, 2018 at 07:50:33PM +0530, Rahul Lakkireddy wrote:
> This series of patches add support for 256-bit IO read and write.
> The APIs are readqq and writeqq (quad quadword - 4 x 64), that read
> and write 256-bits at a time from IO, respectively.

What a horrible name.  Please encode the actual number of bits instead.


* Re: get_user_pages returning 0 (was Re: kernel BUG at drivers/vhost/vhost.c:LINE!)
From: David Sterba @ 2018-03-19 15:29 UTC (permalink / raw)
  To:  Michael S. Tsirkin 
  Cc: syzbot, Michel Lespinasse, syzkaller-bugs, linux-mm,
	Andrew Morton, virtualization, aarcange, jasowang, kvm,
	linux-kernel, netdev
In-Reply-To: <20180319161406-mutt-send-email-mst@kernel.org>

On Mon, Mar 19, 2018 at 05:09:28PM +0200,  Michael S. Tsirkin  wrote:
> Hello!
> The following code triggered by syzbot 
> 
>         r = get_user_pages_fast(log, 1, 1, &page);
>         if (r < 0)
>                 return r;
>         BUG_ON(r != 1);
> 
> Just looking at get_user_pages_fast's documentation this seems
> impossible - it is supposed to only ever return # of pages
> pinned or errno.
> 
> However, poking at code, I see at least one path that might cause this:
> 
>                         ret = faultin_page(tsk, vma, start, &foll_flags,
>                                         nonblocking);
>                         switch (ret) {
>                         case 0:
>                                 goto retry;
>                         case -EFAULT:
>                         case -ENOMEM:
>                         case -EHWPOISON:
>                                 return i ? i : ret;
>                         case -EBUSY:
>                                 return i;
> 
> which originally comes from:
> 
> commit 53a7706d5ed8f1a53ba062b318773160cc476dde
> Author: Michel Lespinasse <walken@google.com>
> Date:   Thu Jan 13 15:46:14 2011 -0800
> 
>     mlock: do not hold mmap_sem for extended periods of time
>     
>     __get_user_pages gets a new 'nonblocking' parameter to signal that the
>     caller is prepared to re-acquire mmap_sem and retry the operation if
>     needed.  This is used to split off long operations if they are going to
>     block on a disk transfer, or when we detect contention on the mmap_sem.
>     
>     [akpm@linux-foundation.org: remove ref to rwsem_is_contended()]
>     Signed-off-by: Michel Lespinasse <walken@google.com>
>     Cc: Hugh Dickins <hughd@google.com>
>     Cc: Rik van Riel <riel@redhat.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Nick Piggin <npiggin@kernel.dk>
>     Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>     Cc: Ingo Molnar <mingo@elte.hu>
>     Cc: "H. Peter Anvin" <hpa@zytor.com>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Cc: David Howells <dhowells@redhat.com>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> I started looking into this, if anyone has any feedback meanwhile,
> that would be appreciated.
> 
> In particular I don't really see why would this trigger
> on commit 8f5fd927c3a7576d57248a2d7a0861c3f2795973:
> 
> Merge: 8757ae2 093e037
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Fri Mar 16 13:37:42 2018 -0700
> 
>     Merge tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> 
> is btrfs used on these systems?

That tag pulled in 3 patches, and none of them is even remotely
related to the reported bug, AFAICS. If there is any impact, it must be
indirect: an obvious bug like a NULL pointer dereference would show up
differently and leave at least some trace in the stacks.
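
For reference, the return-value path quoted above can be modelled in plain
userspace C. This is only a sketch of the quoted switch statement, not the
kernel function: gup_fault_result() and pin_exactly() are hypothetical names,
the default branch is an assumption, and -EFAULT as the short-count error is
an illustrative choice. It shows that, if that path is reachable with nothing
pinned yet, the result is 0 rather than a page count or an errno, which a
caller can handle without asserting.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model of the quoted switch: i is the number of pages already
 * pinned, ret is the nonzero result of faulting in the next page. */
long gup_fault_result(long i, int ret)
{
	assert(ret != 0);	/* ret == 0 means "retry" in the quoted code */

	switch (ret) {
	case -EFAULT:
	case -ENOMEM:
		return i ? i : ret;
	case -EBUSY:
		return i;	/* 0 when nothing was pinned yet */
	default:
		return ret;	/* assumption: other errors pass through */
	}
}

/* Hypothetical caller-side check: treat a short count as an error
 * instead of hitting BUG_ON(). */
long pin_exactly(long want, long got)
{
	if (got < 0)
		return got;
	if (got != want)
		return -EFAULT;
	return 0;
}
```

With i == 0 and ret == -EBUSY the model returns 0, which is exactly the value
the BUG_ON(r != 1) in the quoted vhost code does not expect.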


* Re: [PATCH RFC 1/2] netlink: extend extack so it can carry more than one message
From: Marcelo Ricardo Leitner @ 2018-03-19 15:34 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, Alexander Aring, Jiri Pirko, Jakub Kicinski
In-Reply-To: <cea2b3c4-c5ae-878a-21c4-57ada02aa2f3@gmail.com>

On Sun, Mar 18, 2018 at 10:27:00PM -0600, David Ahern wrote:
> On 3/18/18 12:19 PM, Marcelo Ricardo Leitner wrote:
> > On Sun, Mar 18, 2018 at 10:11:20AM -0600, David Ahern wrote:
> >> On 3/16/18 1:23 PM, Marcelo Ricardo Leitner wrote:
> >>> Currently extack can carry only a single message, which is usually the
> >>> error message.
> >>>
> >>> This imposes a limitation on a more verbose error reporting. For
> >>> example, it's not able to carry warning messages together with the error
> >>> message, or 2 warning messages.
> >>
> >>
> >> The only means for userspace to separate an error message from info or
> >> warnings is the error in nlmsgerr. If it is non-0, any extack message is
> >> considered an error else it is a warning.
> > 
> > I don't see your point here.
> > 
> > The proposed patch extends what you said to:
> > - include warnings on error reports
> > - allow more than 1 message
> > 
> > With the proposed patch, if nlmsgerr is 0 all messages are considered
> > as warnings. If it's non-zero, some may be marked as warnings.
> 
> It's the 'some' that I was referring to, but ...
> 
> 
> >>> +#define NL_SET_ERR_MSG(extack, msg)	NL_SET_MSG(extack, msg)
> >>> +#define NL_SET_WARN_MSG(extack, msg)	NL_SET_MSG(extack, KERN_WARNING msg)
> >>> +
> >>>  #define NL_SET_ERR_MSG_MOD(extack, msg)			\
> >>>  	NL_SET_ERR_MSG((extack), KBUILD_MODNAME ": " msg)
> >>> +#define NL_SET_WARN_MSG_MOD(extack, msg)		\
> >>> +	NL_SET_WARN_MSG((extack), KBUILD_MODNAME ": " msg)
> >>> +
> >>
> >> Adding separate macros for error versus warning is confusing since from
> >> an extack perspective a message is a message and there is no uapi to
> >> separate them.
> > 
> > Are you saying the markings at beginning of the messages are not
> > possible? If that's the case, we probably can think of something else,
> > as I see value in being able to deliver warnings together with errors.
> 
> ... I did miss the KERN_WARNING above. That means that warning messages
> are prefixed by 0x1 (KERN_SOH) and "4" (warning loglevel). There will be
> cases missed by iproute2 as current versions won't catch the 2 new
> characters.

The first one is not printable, so it would print a weird '4' at the
beginning of the message. But: only if it didn't have any error
message later, because old iproute will display only the last message
(and error messages are not tagged).
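
To make the prefix issue concrete, here is a minimal userspace sketch of how
a consumer could recognise a tagged warning. The two macros mirror the values
in include/linux/kern_levels.h; extack_msg_is_warning() is a hypothetical
helper and is not part of iproute2 or of the proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as include/linux/kern_levels.h: SOH byte plus loglevel digit. */
#define KERN_SOH	"\001"
#define KERN_WARNING	KERN_SOH "4"

/* Hypothetical helper: returns true when msg carries the two-byte warning
 * prefix, and points *text at the printable part either way. */
bool extack_msg_is_warning(const char *msg, const char **text)
{
	assert(msg != NULL && text != NULL);

	/* msg[1] is safe to read: msg[0] != '\0' means msg[1] exists. */
	if (msg[0] == '\001' && msg[1] == '4') {
		*text = msg + 2;
		return true;
	}
	*text = msg;
	return false;
}
```

A consumer without such a check would pass the raw string to its output
routine, so the unprintable SOH byte is lost and the '4' shows up in front of
the message.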

> 
> Converting code to be able to continue generating error messages yet
> ultimately fail seems overly complex to me. I get the intent of
> returning as much info as possible, but most of that feels (e.g., in the
> mlx5 example you referenced) like someone learning how to do something
> the first time in which case 1 at a time errors seems reasonable - in
> fact might drive home some lessons. ;-)

That is true.

Yep, I'm still lacking a real user for it. Maybe with the patchset
split it will come up.

  M.


* Re: [PATCH] net: phy: realtek: Add dummy stubs for MMD register access for rtl8211b
From: Andrew Lunn @ 2018-03-19 15:37 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: Kevin Hao, netdev@vger.kernel.org, Florian Fainelli
In-Reply-To: <AM0PR0402MB33963682784604EC4904928096D40@AM0PR0402MB3396.eurprd04.prod.outlook.com>

> This gianfar patch should have been a temporary workaround.
> Obviously, the driver of an (old) eth controller that does not support EEE should
> not be modified to have the same eth controller work normally when some new EEE
> capable phy happens to be attached to that controller (i.e. on a new board).
> It should be up to the phy integration layer to identify that the controller and the phy
> are not EEE compatible, and restrict the phy from entering EEE mode. (without any
> change to the eth driver)

We end up in the same place, needing to patch the RealTek PHY driver.

A MAC driver indicates it can do EEE by calling phy_init_eee(). This
will then turn on EEE in the PHY.

A PHY is not supposed to negotiate EEE unless it is asked to. But some
PHYs do it by default, and the PHY driver is not turning it off. That
is what b6b5e8a69118 fixes for the gianfar.

We could unconditionally disable EEE on all PHYs at probe time, and
then let phy_init_eee() turn it back on again. But then we need the
fix posted here for the RealTek PHY.

There does not appear to be any bit in the PHY status registers to
indicate if MMD is supported or not. So I don't see how we can make
unconditional disable of EEE safe without introducing lots of
regressions.

	Andrew


* RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: Thomas Gleixner @ 2018-03-19 15:37 UTC (permalink / raw)
  To: David Laight
  Cc: 'Rahul Lakkireddy', x86@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	mingo@redhat.com, hpa@zytor.com, davem@davemloft.net,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	ganeshgr@chelsio.com, nirranjan@chelsio.com, indranil@chelsio.com
In-Reply-To: <7f8d811e79284a78a763f4852984eb3f@AcuMS.aculab.com>

On Mon, 19 Mar 2018, David Laight wrote:
> From: Thomas Gleixner
> > Sent: 19 March 2018 15:05
> > 
> > On Mon, 19 Mar 2018, David Laight wrote:
> > > From: Rahul Lakkireddy
> > > In principle it ought to be possible to get access to one or two
> > > (eg) AVX registers by saving them to stack and telling the fpu
> > > save code where you've put them.
> > 
> > No. We have functions for this and we are not adding new ad hoc magic.
> 
> I was thinking that a real API might do this...

We have a real API and that's good enough for the stuff we have using AVX
in the kernel.

> Useful also for code that needs AVX-like registers to do things like CRCs.

x86/crypto/ has a lot of AVX optimized code.

> > > OTOH, for x86, if the code always runs in process context (eg from a
> > > system call) then, since the ABI defines them all as caller-saved
> > > the AVX(2) registers, it is only necessary to ensure that the current
> > > FPU registers belong to the current process once.
> > > The registers can be set to zero by an 'invalidate' instruction on
> > > system call entry (hope this is done) and after use.
> > 
> > Why would a system call touch the FPU registers? The kernel normally does
> > not use FPU instructions and the code which explicitly does has to take
> > care of save/restore. It would be performance madness to fiddle with the
> > FPU stuff unconditionally if nothing uses it.
> 
> If system call entry reset the AVX registers then any FP save/restore
> would be faster because the AVX registers wouldn't need to be saved
> (and the cpu won't save them).
> I believe the instruction to reset the AVX registers is fast.
> The AVX registers only ever need saving if the process enters the
> kernel through an interrupt.

Wrong. The x86-64 ABI clearly states:

   Linux Kernel code is not allowed to change the x87 and SSE units. If
   those are changed by kernel code, they have to be restored properly
   before sleeping or leaving the kernel.

That means the syscall interface relies on FPU state being not changed by
the kernel. So if you want to clear AVX on syscall entry you need to save
it first and then restore before returning. That would be a huge
performance hit.

Thanks,

	tglx


* Re: linux-next on x60: network manager often complains "network is disabled" after resume
From: Dan Williams @ 2018-03-19 15:40 UTC (permalink / raw)
  To: Pavel Machek, Woody Suwalski
  Cc: Rafael J. Wysocki, kernel list, Linux-pm mailing list,
	Netdev list
In-Reply-To: <20180319092106.GA5683@amd>

On Mon, 2018-03-19 at 10:21 +0100, Pavel Machek wrote:
> On Mon 2018-03-19 05:17:45, Woody Suwalski wrote:
> > Pavel Machek wrote:
> > > Hi!
> > > 
> > > With recent linux-next, after resume networkmanager often claims
> > > that
> > > "network is disabled". Sometimes suspend/resume clears that.
> > > 
> > > Any ideas? Does it work for you?
> > > 									
> > > Pavel
> > 
> > Tried the 4.16-rc6 with nm 1.4.4 - I do not see the issue.
> 
> Thanks for testing... but yes, 4.16 should be ok. If not fixed,
> problem will appear in 4.17-rc1.

Where does the complaint occur?  In the GUI, or with nmcli, or
somewhere else?  Also, what's the output of "nmcli dev" after resume?

Dan


* RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: David Laight @ 2018-03-19 15:53 UTC (permalink / raw)
  To: 'Thomas Gleixner'
  Cc: 'Rahul Lakkireddy', x86@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	mingo@redhat.com, hpa@zytor.com, davem@davemloft.net,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	ganeshgr@chelsio.com, nirranjan@chelsio.com, indranil@chelsio.com
In-Reply-To: <alpine.DEB.2.21.1803191625080.2010@nanos.tec.linutronix.de>

From: Thomas Gleixner
> Sent: 19 March 2018 15:37
...
> > If system call entry reset the AVX registers then any FP save/restore
> > would be faster because the AVX registers wouldn't need to be saved
> > (and the cpu won't save them).
> > I believe the instruction to reset the AVX registers is fast.
> > The AVX registers only ever need saving if the process enters the
> > kernel through an interrupt.
> 
> Wrong. The x86-64 ABI clearly states:
> 
>    Linux Kernel code is not allowed to change the x87 and SSE units. If
>    those are changed by kernel code, they have to be restored properly
>    before sleeping or leaving the kernel.
> 
> That means the syscall interface relies on FPU state being not changed by
> the kernel. So if you want to clear AVX on syscall entry you need to save
> it first and then restore before returning. That would be a huge
> performance hit.

The x87 and SSE registers can't be changed - they can contain callee-saved
registers.
But (IIRC) the AVX and AVX2 registers are all caller-saved.
So the system call entry stub functions are allowed to change them.
Which means that the syscall entry code can also change them.
Of course it must not leak kernel values back to userspace.

It is a few years since I looked at the AVX and fpu save code.

	David


* Re: NULL pointer dereferences with 4.14.27
From: Holger Hoffstätte @ 2018-03-19 15:57 UTC (permalink / raw)
  To: Carlos Carvalho, Soheil Hassas Yeganeh, David S. Miller, Greg KH,
	netdev, stable
In-Reply-To: <f5882c4c-049e-3485-9ad4-f552a38e8b0c@applied-asynchrony.com>


(CC: davem, soheil & gregkh)

On 03/17/18 20:12, Holger Hoffstätte wrote:
> On 03/17/18 19:41, Carlos Carvalho wrote:
>> I've put 4.14.27 this morning in this machine and in about 2h it started
>> showing null dereferences identical to the following one. There were several of
>> them, with about 1/2h of interval. Strangely it continued to work and I saw no
>> other anomalies. I've just reverted to 4.14.26.
>>
>> It only happened in this machine, which has a net traffic of several Gb/s and
>> thousands of simultaneous connections.
>>
>> Mar 17 13:29:21 sagres kernel: : BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
>> Mar 17 13:29:21 sagres kernel: : IP: tcp_push+0x4e/0xe7
>> Mar 17 13:29:21 sagres kernel: : PGD 0 P4D 0 
>> Mar 17 13:29:21 sagres kernel: : Oops: 0002 [#1] SMP PTI
>> Mar 17 13:29:21 sagres kernel: : CPU: 55 PID: 2658 Comm: apache2 Not tainted 4.14.27 #4
(snip)
> 
> Fixed by: https://www.spinics.net/lists/netdev/msg489445.html
> 
> -h
> 

This patch is in the netdev patchwork at https://patchwork.ozlabs.org/patch/886324/
but has been marked as "not applicable" without further queued/rejected comment
from Dave, so I believe it became a victim of email lossage.
As the patch says it doesn't apply to anything older than 4.14, but it has been
tested & reported by several people as fixing the problem, and indeed works
fine. Since GregKH only accepts net patches from Dave I wanted to make sure
it got queued up for 4.14.

Thanks,
Holger


* [PATCH AUTOSEL for 4.9 005/281] x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
From: Sasha Levin @ 2018-03-19 15:57 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
  Cc: Josh Poimboeuf, Cong Wang, David S . Miller, Dmitry Vyukov,
	Eric Dumazet, Kostya Serebryany, Linus Torvalds,
	Marcelo Ricardo Leitner, Neil Horman, Peter Zijlstra,
	Thomas Gleixner, Vlad Yasevich, linux-sctp@vger.kernel.org,
	netdev, syzkaller, Ingo Molnar, Sasha Levin
In-Reply-To: <20180319155742.13731-1-alexander.levin@microsoft.com>

From: Josh Poimboeuf <jpoimboe@redhat.com>

[ Upstream commit 42fc6c6cb1662ba2fa727dd01c9473c63be4e3b6 ]

Andrey Konovalov reported the following warning while fuzzing the kernel
with syzkaller:

  WARNING: kernel stack regs at ffff8800686869f8 in a.out:4933 has bad 'bp' value c3fc855a10167ec0

The unwinder dump revealed that RBP had a bad value when an interrupt
occurred in csum_partial_copy_generic().

That function saves RBP on the stack and then overwrites it, using it as
a scratch register.  That's problematic because it breaks stack traces
if an interrupt occurs in the middle of the function.

Replace the usage of RBP with another callee-saved register (R15) so
stack traces are no longer affected.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: linux-sctp@vger.kernel.org
Cc: netdev <netdev@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 arch/x86/lib/csum-copy_64.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 7e48807b2fa1..45a53dfe1859 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic)
 	movq  %r12, 3*8(%rsp)
 	movq  %r14, 4*8(%rsp)
 	movq  %r13, 5*8(%rsp)
-	movq  %rbp, 6*8(%rsp)
+	movq  %r15, 6*8(%rsp)
 
 	movq  %r8, (%rsp)
 	movq  %r9, 1*8(%rsp)
@@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic)
 	/* main loop. clear in 64 byte blocks */
 	/* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */
 	/* r11:	temp3, rdx: temp4, r12 loopcnt */
-	/* r10:	temp5, rbp: temp6, r14 temp7, r13 temp8 */
+	/* r10:	temp5, r15: temp6, r14 temp7, r13 temp8 */
 	.p2align 4
 .Lloop:
 	source
@@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic)
 	source
 	movq  32(%rdi), %r10
 	source
-	movq  40(%rdi), %rbp
+	movq  40(%rdi), %r15
 	source
 	movq  48(%rdi), %r14
 	source
@@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic)
 	adcq  %r11, %rax
 	adcq  %rdx, %rax
 	adcq  %r10, %rax
-	adcq  %rbp, %rax
+	adcq  %r15, %rax
 	adcq  %r14, %rax
 	adcq  %r13, %rax
 
@@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic)
 	dest
 	movq %r10, 32(%rsi)
 	dest
-	movq %rbp, 40(%rsi)
+	movq %r15, 40(%rsi)
 	dest
 	movq %r14, 48(%rsi)
 	dest
@@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic)
 	movq 3*8(%rsp), %r12
 	movq 4*8(%rsp), %r14
 	movq 5*8(%rsp), %r13
-	movq 6*8(%rsp), %rbp
+	movq 6*8(%rsp), %r15
 	addq $7*8, %rsp
 	ret
 
-- 
2.14.1

* [PATCH AUTOSEL for 4.4 004/167] x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
From: Sasha Levin @ 2018-03-19 16:05 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
  Cc: Josh Poimboeuf, Cong Wang, David S . Miller, Dmitry Vyukov,
	Eric Dumazet, Kostya Serebryany, Linus Torvalds,
	Marcelo Ricardo Leitner, Neil Horman, Peter Zijlstra,
	Thomas Gleixner, Vlad Yasevich, linux-sctp@vger.kernel.org,
	netdev, syzkaller, Ingo Molnar, Sasha Levin
In-Reply-To: <20180319160513.16384-1-alexander.levin@microsoft.com>

From: Josh Poimboeuf <jpoimboe@redhat.com>

[ Upstream commit 42fc6c6cb1662ba2fa727dd01c9473c63be4e3b6 ]

Andrey Konovalov reported the following warning while fuzzing the kernel
with syzkaller:

  WARNING: kernel stack regs at ffff8800686869f8 in a.out:4933 has bad 'bp' value c3fc855a10167ec0

The unwinder dump revealed that RBP had a bad value when an interrupt
occurred in csum_partial_copy_generic().

That function saves RBP on the stack and then overwrites it, using it as
a scratch register.  That's problematic because it breaks stack traces
if an interrupt occurs in the middle of the function.

Replace the usage of RBP with another callee-saved register (R15) so
stack traces are no longer affected.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: linux-sctp@vger.kernel.org
Cc: netdev <netdev@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 arch/x86/lib/csum-copy_64.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 7e48807b2fa1..45a53dfe1859 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic)
 	movq  %r12, 3*8(%rsp)
 	movq  %r14, 4*8(%rsp)
 	movq  %r13, 5*8(%rsp)
-	movq  %rbp, 6*8(%rsp)
+	movq  %r15, 6*8(%rsp)
 
 	movq  %r8, (%rsp)
 	movq  %r9, 1*8(%rsp)
@@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic)
 	/* main loop. clear in 64 byte blocks */
 	/* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */
 	/* r11:	temp3, rdx: temp4, r12 loopcnt */
-	/* r10:	temp5, rbp: temp6, r14 temp7, r13 temp8 */
+	/* r10:	temp5, r15: temp6, r14 temp7, r13 temp8 */
 	.p2align 4
 .Lloop:
 	source
@@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic)
 	source
 	movq  32(%rdi), %r10
 	source
-	movq  40(%rdi), %rbp
+	movq  40(%rdi), %r15
 	source
 	movq  48(%rdi), %r14
 	source
@@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic)
 	adcq  %r11, %rax
 	adcq  %rdx, %rax
 	adcq  %r10, %rax
-	adcq  %rbp, %rax
+	adcq  %r15, %rax
 	adcq  %r14, %rax
 	adcq  %r13, %rax
 
@@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic)
 	dest
 	movq %r10, 32(%rsi)
 	dest
-	movq %rbp, 40(%rsi)
+	movq %r15, 40(%rsi)
 	dest
 	movq %r14, 48(%rsi)
 	dest
@@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic)
 	movq 3*8(%rsp), %r12
 	movq 4*8(%rsp), %r14
 	movq 5*8(%rsp), %r13
-	movq 6*8(%rsp), %rbp
+	movq 6*8(%rsp), %r15
 	addq $7*8, %rsp
 	ret
 
-- 
2.14.1

* Re: [PATCH v2 06/21] fpga: Remove depends on HAS_DMA in case of platform dependency
From: Alan Tull @ 2018-03-19 16:06 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ulf Hansson, Wolfram Sang, linux-iio, linux-fpga,
	linux-remoteproc, alsa-devel, Bjorn Andersson, Eric Anholt,
	netdev, linux-mtd, linux-i2c, linux1394-devel, Christoph Hellwig,
	Marek Szyprowski, Stefan Wahren, Boris Brezillon,
	James E . J . Bottomley, Herbert Xu, linux-scsi,
	Richard Weinberger, Joerg Roedel, Jassi Brar, Marek Vasut,
	linux-serial, Matias Bjorling
In-Reply-To: <1521208314-4783-7-git-send-email-geert@linux-m68k.org>

On Fri, Mar 16, 2018 at 8:51 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:

Hi Geert,

This essentially reverts this commit:

commit 1c8cb409491403036919dd1c6b45013dc8835a44
Author: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Date:   Wed Aug 3 13:45:46 2016 -0700

    drivers/fpga/Kconfig: fix build failure

    While building m32r allmodconfig the build is failing with the error:

      ERROR: "bad_dma_ops" [drivers/fpga/zynq-fpga.ko] undefined!

    Xilinx Zynq FPGA is using DMA but there was no dependency while
    building.

    Link: http://lkml.kernel.org/r/1464346526-13913-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
    Acked-by: Moritz Fischer <moritz.fischer@ettus.com>
    Cc: Alan Tull <atull@opensource.altera.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Alan

> Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
> symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
> In most cases this other symbol is an architecture or platform specific
> symbol, or PCI.
>
> Generic symbols and drivers without platform dependencies keep their
> dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
> cannot work anyway.
>
> This simplifies the dependencies, and allows to improve compile-testing.
>
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Reviewed-by: Mark Brown <broonie@kernel.org>
> Acked-by: Robin Murphy <robin.murphy@arm.com>
> ---
> v2:
>   - Add Reviewed-by, Acked-by,
>   - Drop RFC state,
>   - Split per subsystem.
> ---
>  drivers/fpga/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> index f47ef848bcd056d5..fd539132542e30ee 100644
> --- a/drivers/fpga/Kconfig
> +++ b/drivers/fpga/Kconfig
> @@ -53,7 +53,6 @@ config FPGA_MGR_ALTERA_CVP
>  config FPGA_MGR_ZYNQ_FPGA
>         tristate "Xilinx Zynq FPGA"
>         depends on ARCH_ZYNQ || COMPILE_TEST
> -       depends on HAS_DMA
>         help
>           FPGA manager driver support for Xilinx Zynq FPGAs.
>
> --
> 2.7.4
>

* Re: [PATCH RFC iproute2] libnetlink: allow reading more than one message from extack
From: Stephen Hemminger @ 2018-03-19 16:09 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, Alexander Aring, Jiri Pirko, Jakub Kicinski
In-Reply-To: <0e68f6e4a4fc68fd3f5f164b8d50a95fde6a1f50.1521227930.git.mleitner@redhat.com>

On Fri, 16 Mar 2018 16:23:09 -0300
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:

> This patch introduces support for reading more than one message from
> extack's and to adjust their level (warning/error) accordingly.
> 
> Yes, there is a FIXME tag in the callback call for now.
> 
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Makes sense, can hold off until the kernel supports warnings.

> ---
>  lib/libnetlink.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 40 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 928de1dd16d84b7802c06d0a659f5d73e1bbcb4b..7c4c81e11e02ea857888190eb5e7a9e99d159bb3 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -43,9 +43,16 @@ static const enum mnl_attr_data_type extack_policy[NLMSGERR_ATTR_MAX + 1] = {
>  	[NLMSGERR_ATTR_OFFS]	= MNL_TYPE_U32,
>  };
>  
> +#define NETLINK_MAX_EXTACK_MSGS 8

Would rather not have fixed maximums

> +struct extack_args {
> +	const struct nlattr *msg[NETLINK_MAX_EXTACK_MSGS];
> +	const struct nlattr *offs;
> +	int msg_count;
> +};

If you put msg[] last in the structure, it could be variable length.
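
A minimal sketch of that suggestion, assuming a C99 flexible array member is acceptable here. struct attr_stub stands in for struct nlattr so the snippet builds on its own, and msg_cap, extack_args_alloc() and extack_args_add() are hypothetical names that do not appear in the patch:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nlattr, only to keep the sketch standalone. */
struct attr_stub {
	const char *text;
};

/* Same fields as the patch's extack_args, with msg[] moved last so it can be
 * a flexible array member. msg_cap is an added, hypothetical field recording
 * how many slots were allocated.
 */
struct extack_args {
	const struct attr_stub *offs;
	int msg_count;
	int msg_cap;
	const struct attr_stub *msg[];
};

/* Allocate the header plus room for 'cap' message pointers, all zeroed. */
static struct extack_args *extack_args_alloc(int cap)
{
	struct extack_args *tb;

	assert(cap >= 0);
	tb = calloc(1, sizeof(*tb) + (size_t)cap * sizeof(tb->msg[0]));
	if (tb)
		tb->msg_cap = cap;
	return tb;
}

/* Store one attribute; returns 0 on success, -1 when all slots are used. */
static int extack_args_add(struct extack_args *tb, const struct attr_stub *attr)
{
	if (tb->msg_count >= tb->msg_cap)
		return -1;
	tb->msg[tb->msg_count++] = attr;
	return 0;
}
```

The capacity is then chosen by the caller at allocation time instead of being a compile-time constant; the caller frees the whole object with a single free().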

>  static int err_attr_cb(const struct nlattr *attr, void *data)
>  {
> -	const struct nlattr **tb = data;
> +	struct extack_args *tb = data;
>  	uint16_t type;
>  
>  	if (mnl_attr_type_valid(attr, NLMSGERR_ATTR_MAX) < 0) {
> @@ -60,19 +67,23 @@ static int err_attr_cb(const struct nlattr *attr, void *data)
>  		return MNL_CB_ERROR;
>  	}
>  
> -	tb[type] = attr;
> +	if (type == NLMSGERR_ATTR_OFFS)
> +		tb->offs = attr;
> +	else if (tb->msg_count < NETLINK_MAX_EXTACK_MSGS)
> +		tb->msg[tb->msg_count++] = attr;
>  	return MNL_CB_OK;
>  }
>  
>  /* dump netlink extended ack error message */
>  int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
>  {
> -	struct nlattr *tb[NLMSGERR_ATTR_MAX + 1] = {};
> +	struct extack_args tb = {};
>  	const struct nlmsgerr *err = mnl_nlmsg_get_payload(nlh);
>  	const struct nlmsghdr *err_nlh = NULL;
>  	unsigned int hlen = sizeof(*err);
> -	const char *msg = NULL;
> +	const char *msg[NETLINK_MAX_EXTACK_MSGS] = {};
>  	uint32_t off = 0;
> +	int ret, i;
>  
>  	/* no TLVs, nothing to do here */
>  	if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
> @@ -82,14 +93,14 @@ int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
>  	if (!(nlh->nlmsg_flags & NLM_F_CAPPED))
>  		hlen += mnl_nlmsg_get_payload_len(&err->msg);
>  
> -	if (mnl_attr_parse(nlh, hlen, err_attr_cb, tb) != MNL_CB_OK)
> +	if (mnl_attr_parse(nlh, hlen, err_attr_cb, &tb) != MNL_CB_OK)
>  		return 0;
>  
> -	if (tb[NLMSGERR_ATTR_MSG])
> -		msg = mnl_attr_get_str(tb[NLMSGERR_ATTR_MSG]);
> +	for (i = 0; i < NETLINK_MAX_EXTACK_MSGS && tb.msg[i]; i++)
> +		msg[i] = mnl_attr_get_str(tb.msg[i]);
>  
> -	if (tb[NLMSGERR_ATTR_OFFS]) {
> -		off = mnl_attr_get_u32(tb[NLMSGERR_ATTR_OFFS]);
> +	if (tb.offs) {
> +		off = mnl_attr_get_u32(tb.offs);
>  
>  		if (off > nlh->nlmsg_len) {
>  			fprintf(stderr,
> @@ -100,21 +111,35 @@ int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
>  	}
>  
>  	if (errfn)
> -		return errfn(msg, off, err_nlh);
> +		return errfn(*msg, off, err_nlh); /* FIXME */
>  
> -	if (msg && *msg != '\0') {
> +	ret = 0;
> +	for (i = 0; i < NETLINK_MAX_EXTACK_MSGS && msg[i]; i++) {
>  		bool is_err = !!err->error;
> +		const char *_msg = msg[i];
> +
> +		/* Message tagging has precedence.
> +		 * KERN_WARNING = ASCII Start Of Header ('\001') + '4'
> +		 */
> +		if (!strncmp(_msg, "\0014", 2)) {
> +			is_err = false;
> +			_msg += 2;
> +		}

If you are going to have an API that has levels, it must be the same
as the existing syslog kernel log format, and maybe even reuse some of that code.
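
A rough sketch of what following the kernel log format could look like, assuming the prefix is the SOH byte ('\001') followed by one level digit '0'..'7', as the quoted hunk does for level 4. parse_level() and level_label() are hypothetical helpers, not existing libnetlink functions, and the mapping from level to label is only one possible policy:

```c
#include <assert.h>
#include <string.h>

#define SOH_ASCII '\001'	/* start-of-header byte used by the kernel log prefix */

/* If *msg starts with SOH + digit '0'..'7', return that level and advance
 * *msg past the two-byte prefix. Otherwise return -1 and leave *msg alone.
 */
static int parse_level(const char **msg)
{
	const char *p = *msg;

	if (p[0] == SOH_ASCII && p[1] >= '0' && p[1] <= '7') {
		*msg = p + 2;
		return p[1] - '0';
	}
	return -1;
}

/* Assumed policy: untagged messages (-1) and levels 0..3 (EMERG..ERR) are
 * reported as "Error" only when the request itself failed; everything else
 * is reported as "Warning".
 */
static const char *level_label(int level, int request_failed)
{
	if (level <= 3)
		return request_failed ? "Error" : "Warning";
	return "Warning";
}
```

With this shape the tag handling is not tied to one hard-coded "\0014" string, so more levels can be mapped later without touching the parsing.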

> +		/* But we can't have Error if it didn't fail. */
> +		if (is_err && !err->error)
> +			is_err = 0;
>  
>  		fprintf(stderr, "%s: %s",
> -			is_err ? "Error" : "Warning", msg);
> -		if (msg[strlen(msg) - 1] != '.')
> +			is_err ? "Error" : "Warning", _msg);
> +		if (_msg[strlen(_msg) - 1] != '.')
>  			fprintf(stderr, ".");
>  		fprintf(stderr, "\n");
>  
> -		return is_err ? 1 : 0;
> +		if (is_err)
> +			ret = 1;
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  #else
>  #warning "libmnl required for error support"

* RE: DTS for our Configuration
From: Alayev Michael @ 2018-03-19 16:23 UTC (permalink / raw)
  To: 'andrew@lunn.ch'
  Cc: 'netdev@vger.kernel.org', Efter Yoram, Dror Alon
In-Reply-To: <3988EA6F088014488BE41D19253A7EE40152236EE1@EXS10.iai.co.il>

[-- Attachment #1: Type: text/plain, Size: 4052 bytes --]

Hello Andrew,

Thank you very much for your device-tree suggestion. The Linux boot log looks a lot better and this is big progress for us.
We still have some issues, though:

1. Attached are two log files produced with your suggested DTS:
	The first uses your device tree as is and results in a kernel panic. It is probably caused by the "link" property.
	The second uses your device tree with the "link" property of port 10 (under gem0) commented out on both switches. The kernel boots, but the devices are still not detected properly. Maybe this is related to your comment below about the fixed PHY issue, which prevents the driver from completing its PHY address scan (it stops at PHY address 1...).
2. The switch's product number should be 0x0a1, but it is reported as 0xa10 (is it just a high/low byte thing?).
3. The stand-alone PHY (gem1) is not detected properly. It should be detected as an mv88e1510.
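
On item 2, one possible reading, sketched under an assumption that should be checked against the datasheet and the driver source for this kernel: if the 16-bit switch identifier register holds a 12-bit product number in bits 15:4 and a 4-bit revision in bits 3:0, then 0x0a1 and 0xa10 are the same product number printed with and without the 4-bit shift, rather than a byte swap. The macro and function names below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout: bits 15:4 product number, bits 3:0 revision. */
#define SWITCH_ID_PROD_MASK	0xfff0u
#define SWITCH_ID_REV_MASK	0x000fu

/* Product number left in place, as in a "switch 0xa10 detected" style print. */
static uint16_t switch_id_prod_masked(uint16_t id)
{
	return (uint16_t)(id & SWITCH_ID_PROD_MASK);
}

/* Product number shifted down to bit 0, as a datasheet table might list it. */
static uint16_t switch_id_prod_shifted(uint16_t id)
{
	return (uint16_t)((id & SWITCH_ID_PROD_MASK) >> 4);
}

/* Revision from the low nibble. */
static uint16_t switch_id_rev(uint16_t id)
{
	return (uint16_t)(id & SWITCH_ID_REV_MASK);
}
```

Under that assumption, a raw register value of 0x0a11 would match both the "0xa10" and the "revision 1" in the attached logs.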

Please advise how to continue from here.

Regards,
Michael Alayev



-----Original Message-----
From: Dror Alon 
Sent: Sunday, March 18, 2018 8:16 AM
To: Alayev Michael
Subject: FW: DTS for our Configuration



-----Original Message-----
From: Andrew Lunn [mailto:andrew@lunn.ch]
Sent: Friday, March 16, 2018 5:12 PM
To: Dror Alon
Cc: 'netdev@vger.kernel.org'; Efter Yoram; Alayev Michael
Subject: Re: DTS for our Configuration

On Thu, Mar 15, 2018 at 02:00:00PM +0000, Dror Alon wrote:
> Hello Andrew,
> Thanks for your fast responses.
> Michael and I keep trying to configure our Linux Zynq-7000 board.
> We have not succeeded in configuring our switches via the DTS file.

> Please see the following diagram. Do you have a DTS sample
> for this setup?

Hi Dror

I'm surprised you are doing SGMII between ports 10. They are 10G capable.

&gem0 {
	fixed-link {
		speed = <1000>;
		full-duplex;
	};

	phy0: phy@0 {
		reg = <0>;
	};

	switch0: switch@1c {
		compatible = "marvell,mv88e6190";
		reg = <0x1c>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
				reg = <0>;
				label = "cpu";
				fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};
			port@1 {
				reg = <1>;
				label = "lan0";
			};
			port@2 {
				reg = <2>;
				label = "lan1";
			};
			...
			switch0port10: port@10 {
				reg = <10>;
				label = "dsa";
				link = <&switch1port10>;
				phy-mode = "sgmii";
			 	fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};
		};
	};

	switch1: switch@1d {
		compatible = "marvell,mv88e6190";
		reg = <0x1d>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
				reg = <0>;
				label = "lan9";
			};
			port@1 {
				reg = <1>;
				label = "lan10";
			};
			port@2 {
				reg = <2>;
				label = "lan11";
			};
			...
			switch1port10: port@10 {
				reg = <10>;
				label = "dsa";
				link = <&switch0port10>;
				phy-mode = "sgmii";
			 	fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};
		};
	};
};

&gem1 {
	phy-handle = <&phy0>;
};

It looks like you have at least one issue to solve in the macb driver.
If you have a fixed-phy, it calls mdiobus_register(). This means it is going to ignore all the other device tree properties. You need it to call of_mdiobus_register().

   Andrew

[-- Attachment #2: linux_log_andrew_dts_original.txt --]
[-- Type: text/plain, Size: 12164 bytes --]

Starting kernel ...

Booting Linux on physical CPU 0x0
Linux version 4.14.0-xilinx (84alav@fedora64) (gcc version 5.4.0 (Buildroot 2017.05.1)) #15 SMP PREEMPT Mon Mar 19 12:30:34 IST 2018
CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
OF: fdt: Machine model: Zynq Zed Development Board
Memory policy: Data cache writealloc
cma: Reserved 16 MiB at 0x1dc00000
percpu: Embedded 16 pages/cpu @dfbc5000 s34764 r8192 d22580 u65536
Built 1 zonelists, mobility grouping on.  Total pages: 130048
Kernel command line:
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 490488K/524288K available (5120K kernel code, 224K rwdata, 1404K rodata, 1024K init, 150K bss, 17416K reserved, 16384K cma-reserved, 0K highmem)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xe0800000 - 0xff800000   ( 496 MB)
    lowmem  : 0xc0000000 - 0xe0000000   ( 512 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc0600000   (6112 kB)
      .init : 0xc0800000 - 0xc0900000   (1024 kB)
      .data : 0xc0900000 - 0xc0938000   ( 224 kB)
       .bss : 0xc0938000 - 0xc095da14   ( 151 kB)
Preemptible hierarchical RCU implementation.
        RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
        Tasks RCU enabled.
RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
efuse mapped to e0800000
slcr mapped to e0802000
L2C: platform modifies aux control register: 0x72360000 -> 0x72760000
L2C: DT/platform modifies aux control register: 0x72360000 -> 0x72760000
L2C-310 erratum 769419 enabled
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 ID prefetch enabled, offset 1 lines
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 8 ways, 512 kB
L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
zynq_clock_init: clkc starts at e0802100
Zynq clock init
sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x4ce07af025, max_idle_ns: 440795209040 ns
Switching to timer-based delay loop, resolution 3ns
clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff, max_idle_ns: 537538477 ns
timer #0 at e080a000, irq=17
Console: colour dummy device 80x30
console [tty0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 666.66 BogoMIPS (lpj=3333333)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x100000 - 0x100060
Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (1333.33 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
random: get_random_u32 called from bucket_table_alloc+0x1c4/0x224 with crng_init=0
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
futex hash table entries: 512 (order: 3, 32768 bytes)
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
random: fast init done
DMA: preallocated 256 KiB pool for atomic coherent allocations
cpuidle: using governor menu
hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 4 bytes.
zynq-ocm f800c000.ocmc: ZYNQ OCM pool: 256 KiB @ 0xe0840000
zynq-pinctrl 700.pinctrl: zynq pinctrl initialized
e0000000.serial: ttyPS0 at MMIO 0xe0000000 (irq = 25, base_baud = 6249999) is a xuartps
console [ttyPS0] enabled
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
FPGA manager framework
clocksource: Switched to clocksource arm_global_timer
NET: Registered protocol family 2
TCP established hash table entries: 4096 (order: 2, 16384 bytes)
TCP bind hash table entries: 4096 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
UDP hash table entries: 256 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 3604K
hw perfevents: no interrupt-affinity property for /pmu, guessing.
hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
workingset: timestamp_bits=30 max_order=17 bucket_order=0
jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
io scheduler mq-deadline registered
io scheduler kyber registered
dma-pl330 f8003000.dmac: Loaded driver for PL330 DMAC-241330
dma-pl330 f8003000.dmac:        DBUFF-128x8bytes Num_Chans-8 Num_Peri-4 Num_Events-16
brd: module loaded
loop: module loaded
libphy: Fixed MDIO Bus: probed
CAN device driver interface
mv88e6xxx_init kelev
libphy: MACB_mii_bus: probed
mdio_bus e000b000.ethernet-ffffffff: /amba/ethernet@e000b000/fixed-link has invalid PHY address
mv88e6085 e000b000.ethernet-ffffffff:1d: switch 0xa10 detected: Marvell 88E6390X, revision 1
libphy: mv88e6xxx SMI: probed
mv88e6085 e000b000.ethernet-ffffffff:1c: switch 0xa10 detected: Marvell 88E6390X, revision 1
libphy: mv88e6xxx SMI: probed
mv88e6085: probe of e000b000.ethernet-ffffffff:1c failed with error -16
mdio_bus e000b000.ethernet-ffffffff: scan phy fixed-link at address 1
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0004000
[00000004] *pgd=00000000
Internal error: Oops - BUG: 17 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.14.0-xilinx #15
Hardware name: Xilinx Zynq Platform
task: df43b840 task.stack: df43c000
PC is at dsa_unregister_switch+0x10/0x48
LR is at dsa_unregister_switch+0x10/0x48
pc : [<c05aad48>]    lr : [<c05aad48>]    psr: 60000013
sp : df43dd58  ip : 00000000  fp : 00000000
r10: 00000000  r9 : dd009c78  r8 : 00000034
r7 : dd0d1434  r6 : c091dd68  r5 : 00000000  r4 : dd0d1210
r3 : df43b840  r2 : 00000000  r1 : 00000000  r0 : c0936cdc
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 18c5387d  Table: 1f65c04a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xdf43c210)
Stack: (0xdf43dd58 to 0xdf43e000)
dd40:                                                       dd0d1210 00000000
dd60: c091dd68 c0418ac0 dd0d1400 00000000 c091dd68 c04127ac dd0d1400 c03b72b8
dd80: dd0d1400 df4ff530 c091c448 dd0d1460 dfbeb3c4 c03b6420 dd0d1400 dd0d145c
dda0: dd009c78 c03b3acc dd009e8c a0000013 dd1d2800 dd009e80 dd009e8c dd0d1400
ddc0: dd0d1400 dd009e84 dd009e8c c0412868 dd009c00 c0412624 ffffffed 00000001
dde0: dd009c00 dfbeb67c dfbeb3c4 c04bafac 00000000 c04b378c dfbecb5c 0000001c
de00: c06ffa6d df77f000 df77c000 df77c4c0 dfbeb3c4 df524a10 00000000 c0423104
de20: ffffffff df43de5c 00000001 c0229cd4 dfbeb3c4 c0421054 00000001 00000002
de40: 00000001 00000001 dd1c8898 dd1c9cc0 dd1c9c40 dd1c9bc0 00000000 00000000
de60: 00000001 00000001 c091ddc8 c0422974 df524a10 c091ddc8 00000000 00000000
de80: c091ddc8 00000000 00000000 c03b84d4 df524a10 c09523dc c09523e0 c03b6e84
dea0: df524a10 df524a44 c091ddc8 c0918a28 00000000 c0938000 c083383c c03b7080
dec0: 00000000 c091ddc8 c03b7000 c03b5790 df472f58 df4edb34 c091ddc8 00000000
dee0: dd1c5300 c03b64f8 c06ff8d2 c06ff8d3 00000000 c091ddc8 c081aea4 000000a8
df00: c083be5c c03b7808 00000006 c081aea4 000000a8 c0101900 00000000 df43df24
df20: 00000000 00000000 00000000 c075d548 000000a8 00000006 00000006 00000000
df40: cccccccd 00000000 00000000 c0938000 c083383c 00000006 c0833830 000000a8
df60: 00000006 c0833834 000000a8 c083be5c c0938000 c0800d40 00000006 00000006
df80: 00000000 c0800594 00000000 c05e1e64 00000000 00000000 00000000 00000000
dfa0: 00000000 c05e1e6c 00000000 c0106fd0 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c05aad48>] (dsa_unregister_switch) from [<c0418ac0>] (mv88e6xxx_remove+0x1c/0x68)
[<c0418ac0>] (mv88e6xxx_remove) from [<c04127ac>] (mdio_remove+0x18/0x28)
[<c04127ac>] (mdio_remove) from [<c03b72b8>] (device_release_driver_internal+0x128/0x1d0)
[<c03b72b8>] (device_release_driver_internal) from [<c03b6420>] (bus_remove_device+0xcc/0xdc)
[<c03b6420>] (bus_remove_device) from [<c03b3acc>] (device_del+0x1bc/0x258)
[<c03b3acc>] (device_del) from [<c0412868>] (mdio_device_remove+0xc/0x18)
[<c0412868>] (mdio_device_remove) from [<c0412624>] (mdiobus_unregister+0x40/0x74)
[<c0412624>] (mdiobus_unregister) from [<c04bafac>] (of_mdiobus_register+0x234/0x254)
[<c04bafac>] (of_mdiobus_register) from [<c0423104>] (macb_probe+0x790/0xb88)
[<c0423104>] (macb_probe) from [<c03b84d4>] (platform_drv_probe+0x50/0xa0)
[<c03b84d4>] (platform_drv_probe) from [<c03b6e84>] (driver_probe_device+0x13c/0x2b8)
[<c03b6e84>] (driver_probe_device) from [<c03b7080>] (__driver_attach+0x80/0xa4)
[<c03b7080>] (__driver_attach) from [<c03b5790>] (bus_for_each_dev+0x68/0x8c)
[<c03b5790>] (bus_for_each_dev) from [<c03b64f8>] (bus_add_driver+0xc8/0x1dc)
[<c03b64f8>] (bus_add_driver) from [<c03b7808>] (driver_register+0x9c/0xe0)
[<c03b7808>] (driver_register) from [<c0101900>] (do_one_initcall+0xa8/0x11c)
[<c0101900>] (do_one_initcall) from [<c0800d40>] (kernel_init_freeable+0x10c/0x1cc)
[<c0800d40>] (kernel_init_freeable) from [<c05e1e6c>] (kernel_init+0x8/0x10c)
[<c05e1e6c>] (kernel_init) from [<c0106fd0>] (ret_from_fork+0x14/0x24)
Code: e92d4070 e1a05000 e59f0034 eb00ea7e (e5954004)
---[ end trace f6d20ab8f9ad8edb ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

CPU0: stopping
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D         4.14.0-xilinx #15
Hardware name: Xilinx Zynq Platform
[<c010e7b4>] (unwind_backtrace) from [<c010a9b0>] (show_stack+0x10/0x14)
[<c010a9b0>] (show_stack) from [<c05d1950>] (dump_stack+0x80/0xa0)
[<c05d1950>] (dump_stack) from [<c010cf20>] (ipi_cpu_stop+0x3c/0x70)
[<c010cf20>] (ipi_cpu_stop) from [<c010d720>] (handle_IPI+0x64/0x84)
[<c010d720>] (handle_IPI) from [<c01013f8>] (gic_handle_irq+0x7c/0x98)
[<c01013f8>] (gic_handle_irq) from [<c010b40c>] (__irq_svc+0x6c/0xa8)
Exception stack(0xc0901f48 to 0xc0901f90)
1f40:                   00000001 00000000 00000000 c0116800 00000000 00000000
1f60: c0900000 c08441f8 c0901fa0 c0833a30 00000000 00000000 1f388000 c0901f98
1f80: c01079e0 c01079e4 60000013 ffffffff
[<c010b40c>] (__irq_svc) from [<c01079e4>] (arch_cpu_idle+0x2c/0x38)
[<c01079e4>] (arch_cpu_idle) from [<c01490fc>] (do_idle+0xd0/0x198)
[<c01490fc>] (do_idle) from [<c01492fc>] (cpu_startup_entry+0x18/0x1c)
[<c01492fc>] (cpu_startup_entry) from [<c0800bd4>] (start_kernel+0x308/0x368)
[<c0800bd4>] (start_kernel) from [<0000807c>] (0x807c)
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

[-- Attachment #3: linux_log_andrew_dts_with_Port10in_both_switches_and_no_link.txt --]
[-- Type: text/plain, Size: 8926 bytes --]

Starting kernel ...

Booting Linux on physical CPU 0x0
Linux version 4.14.0-xilinx (84alav@fedora64) (gcc version 5.4.0 (Buildroot 2017.05.1)) #15 SMP PREEMPT Mon Mar 19 12:30:34 IST 2018
CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
OF: fdt: Machine model: Zynq Zed Development Board
Memory policy: Data cache writealloc
cma: Reserved 16 MiB at 0x1dc00000
percpu: Embedded 16 pages/cpu @dfbc5000 s34764 r8192 d22580 u65536
Built 1 zonelists, mobility grouping on.  Total pages: 130048
Kernel command line:
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 490488K/524288K available (5120K kernel code, 224K rwdata, 1404K rodata, 1024K init, 150K bss, 17416K reserved, 16384K cma-reserved, 0K highmem)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xe0800000 - 0xff800000   ( 496 MB)
    lowmem  : 0xc0000000 - 0xe0000000   ( 512 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc0600000   (6112 kB)
      .init : 0xc0800000 - 0xc0900000   (1024 kB)
      .data : 0xc0900000 - 0xc0938000   ( 224 kB)
       .bss : 0xc0938000 - 0xc095da14   ( 151 kB)
Preemptible hierarchical RCU implementation.
        RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
        Tasks RCU enabled.
RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
efuse mapped to e0800000
slcr mapped to e0802000
L2C: platform modifies aux control register: 0x72360000 -> 0x72760000
L2C: DT/platform modifies aux control register: 0x72360000 -> 0x72760000
L2C-310 erratum 769419 enabled
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 ID prefetch enabled, offset 1 lines
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 8 ways, 512 kB
L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
zynq_clock_init: clkc starts at e0802100
Zynq clock init
sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x4ce07af025, max_idle_ns: 440795209040 ns
Switching to timer-based delay loop, resolution 3ns
clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff, max_idle_ns: 537538477 ns
timer #0 at e080a000, irq=17
Console: colour dummy device 80x30
console [tty0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 666.66 BogoMIPS (lpj=3333333)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x100000 - 0x100060
Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (1333.33 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
random: get_random_u32 called from bucket_table_alloc+0x1c4/0x224 with crng_init=0
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
futex hash table entries: 512 (order: 3, 32768 bytes)
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
random: fast init done
DMA: preallocated 256 KiB pool for atomic coherent allocations
cpuidle: using governor menu
hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 4 bytes.
zynq-ocm f800c000.ocmc: ZYNQ OCM pool: 256 KiB @ 0xe0840000
zynq-pinctrl 700.pinctrl: zynq pinctrl initialized
e0000000.serial: ttyPS0 at MMIO 0xe0000000 (irq = 25, base_baud = 6249999) is a xuartps
console [ttyPS0] enabled
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
FPGA manager framework
clocksource: Switched to clocksource arm_global_timer
NET: Registered protocol family 2
TCP established hash table entries: 4096 (order: 2, 16384 bytes)
TCP bind hash table entries: 4096 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
UDP hash table entries: 256 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 3604K
hw perfevents: no interrupt-affinity property for /pmu, guessing.
hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
workingset: timestamp_bits=30 max_order=17 bucket_order=0
jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
io scheduler mq-deadline registered
io scheduler kyber registered
dma-pl330 f8003000.dmac: Loaded driver for PL330 DMAC-241330
dma-pl330 f8003000.dmac:        DBUFF-128x8bytes Num_Chans-8 Num_Peri-4 Num_Events-16
brd: module loaded
loop: module loaded
libphy: Fixed MDIO Bus: probed
CAN device driver interface
mv88e6xxx_init kelev
libphy: MACB_mii_bus: probed
mdio_bus e000b000.ethernet-ffffffff: /amba/ethernet@e000b000/fixed-link has invalid PHY address
mv88e6085 e000b000.ethernet-ffffffff:1d: switch 0xa10 detected: Marvell 88E6390X, revision 1
libphy: mv88e6xxx SMI: probed
DSA: switch 0 0 parsed
Tree has no master device
mv88e6085: probe of e000b000.ethernet-ffffffff:1d failed with error -22
mv88e6085 e000b000.ethernet-ffffffff:1c: switch 0xa10 detected: Marvell 88E6390X, revision 1
libphy: mv88e6xxx SMI: probed
DSA: switch 0 0 parsed
Tree has no master device
mv88e6085: probe of e000b000.ethernet-ffffffff:1c failed with error -22
mdio_bus e000b000.ethernet-ffffffff: scan phy fixed-link at address 1
macb e000c000.ethernet: invalid hw address, using random
libphy: MACB_mii_bus: probed
macb: macb_mii_probe kelev
macb e000c000.ethernet eth0: Cadence GEM rev 0x00020118 at 0xe000c000 irq 28 (5e:89:7b:36:e5:d6)
Generic PHY e000c000.ethernet-ffffffff:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e000c000.ethernet-ffffffff:00, irq=POLL)
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
usbcore: registered new interface driver usb-storage
chipidea-usb2 e0002000.usb: e0002000.usb supply vbus not found, using dummy regulator
ci_hdrc ci_hdrc.0: unable to init phy: -110
ci_hdrc: probe of ci_hdrc.0 failed with error -110
i2c /dev entries driver
cdns-wdt f8005000.watchdog: Xilinx Watchdog Timer at e0948000 with timeout 10s
EDAC MC: ECC not enabled
Xilinx Zynq CpuIdle Driver started
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
sdhci-pltfm: SDHCI platform and OF driver helper
mmc0: SDHCI controller on e0100000.sdhci [e0100000.sdhci] using ADMA
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
fpga_manager fpga0: Xilinx Zynq FPGA Manager registered
NET: Registered protocol family 10
Segment Routing with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered protocol family 17
can: controller area network core (rev 20170425 abi 9)
NET: Registered protocol family 29
can: raw protocol (rev 20170425)
can: broadcast manager protocol (rev 20170425 t)
can: netlink gateway (rev 20170425) max_hops=1
Registering SWP/SWPB emulation handler
hctosys: unable to open rtc device (rtc0)
of_cfs_init
of_cfs_init: OK
Freeing unused kernel memory: 1024K
/bin/mount: special device /dev/mmcblk0p1 does not exist
/bin/mount: special device /dev/mmcblk0p2 does not exist
/bin/mount: special device /dev/mmcblk0p3 does not exist
Starting logging: OK
Initializing random number generator... done.
Starting network: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
OK
Starting openntpd: creating new /var/db/ntpd.drift
OK
Starting dropbear sshd: OK
Starting vsftpd: OK

[-- Attachment #4: zynq-zed.dts --]
[-- Type: application/octet-stream, Size: 4384 bytes --]

/*
 *  Copyright (C) 2011 - 2014 Xilinx
 *  Copyright (C) 2012 National Instruments Corp.
 *
 * This software is licensed under the terms of the GNU General Public
 * License version 2, as published by the Free Software Foundation, and
 * may be copied, distributed, and modified under those terms.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */
/dts-v1/;
/include/ "zynq-7000.dtsi"

/ {
	model = "Zynq Zed Development Board";
	compatible = "xlnx,zynq-zed", "xlnx,zynq-7000";

	aliases {
		ethernet0 = &gem0;
		ethernet1 = &gem1;
		serial0 = &uart0;
		spi0 = &qspi;
		mmc0 = &sdhci0;
	};

	memory {
		device_type = "memory";
		reg = <0x0 0x20000000>;
	};

	chosen {
		bootargs = "";
		stdout-path = "serial0:115200n8";
	};

	usb_phy0: phy0 {
		compatible = "ulpi-phy";
		#phy-cells = <0>;
		reg = <0xe0002000 0x1000>;
		view-port = <0x0170>;
		drv-vbus;
	};
};

&clkc {
	ps-clk-frequency = <33333333>;
};

&gem0 {
	status = "okay";
			fixed-link {
					speed = <1000>;
					full-duplex;
			};
			phy0: phy@0 {
				reg = <0>;
			};

			switch0: switch@1d {
				compatible = "marvell,mv88e6190";
				reg = <0x1d>;

				ports {
					#address-cells = <1>;
					#size-cells = <0>;

					port@0 {
						reg = <0>;
						label = "cpu";
						fixed-link {
							speed = <1000>;
							full-duplex;
						};
					};

					port@1 {
						reg = <1>;
						label = "lan0";
					};
					
					port@2 {
						reg = <2>;
						label = "lan1";
					};

					port@3 {
						reg = <3>;
						label = "lan2";
					};
					
					port@4 {
						reg = <4>;
						label = "lan3";
					};

					port@5 {
						reg = <5>;
						label = "lan4";
					};
					
					port@6 {
						reg = <6>;
						label = "lan5";
					};

					port@7 {
						reg = <7>;
						label = "lan6";
					};
					
					port@8 {
						reg = <8>;
						label = "lan7";
					};			

					port@9 {
						reg = <9>;
						label = "lan8";
					};

					switch0port10: port@10 {
						reg = <10>;
						label = "dsa";
						link = <&switch1port10>;
						phy-mode = "sgmii";
			 			fixed-link {
							speed = <1000>;
							full-duplex;
						};
					};

				};

			};


			switch1: switch@1c {
				compatible = "marvell,mv88e6190";
				reg = <0x1c>;

				ports {
					#address-cells = <1>;
					#size-cells = <0>;

					port@0 {
						reg = <0>;
						label = "lan9";
						};
					port@1 {
						reg = <1>;
						label = "lan10";
					};
					port@2 {
						reg = <2>;
						label = "lan11";
					};
					port@3 {
						reg = <3>;
						label = "lan12";
						};
					port@4 {
						reg = <4>;
						label = "lan13";
					};
					port@5 {
						reg = <5>;
						label = "lan14";
					};
					port@6 {
						reg = <6>;
						label = "lan15";
						};
					port@7 {
						reg = <7>;
						label = "lan16";
					};
					port@8 {
						reg = <8>;
						label = "lan17";
					};
					port@9 {
						reg = <9>;
						label = "lan18";
					};

					switch1port10: port@10 {
						reg = <10>;
						label = "dsa";
						link = <&switch0port10>;
						phy-mode = "sgmii";
					 	fixed-link {
							speed = <1000>;
							full-duplex;
						};
					};
				};
			};

		};


&gem1 {
	phy-handle = <&phy0>;
	status = "okay";
};



&qspi {
	status = "okay";
	is-dual = <0>;
	num-cs = <1>;
	flash@0 {
		compatible = "n25q128a11";
		reg = <0x0>;
		spi-tx-bus-width = <1>;
		spi-rx-bus-width = <4>;
		spi-max-frequency = <50000000>;
		#address-cells = <1>;
		#size-cells = <1>;
		partition@qspi-fsbl-uboot {
			label = "qspi-fsbl-uboot";
			reg = <0x0 0x100000>;
		};
		partition@qspi-linux {
			label = "qspi-linux";
			reg = <0x100000 0x500000>;
		};
		partition@qspi-device-tree {
			label = "qspi-device-tree";
			reg = <0x600000 0x20000>;
		};
		partition@qspi-rootfs {
			label = "qspi-rootfs";
			reg = <0x620000 0x5E0000>;
		};
		partition@qspi-bitstream {
			label = "qspi-bitstream";
			reg = <0xC00000 0x400000>;
		};
	};
};

&sdhci0 {
	status = "okay";
};

/*
&uart1 {
	current-speed = <115200>;
	device_type = "serial";
	port-number = <1>;
	status = "okay";
};
*/

&uart0 {
	current-speed = <115200>;
	device_type = "serial";
	port-number = <0>;
	status = "okay";
};
&usb0 {
	status = "okay";
	dr_mode = "host";
	usb-phy = <&usb_phy0>;
};

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] brcmfmac: add new dt entries for SG SDIO settings
From: Kalle Valo @ 2018-03-19 16:23 UTC (permalink / raw)
  To: Alexey Roslyakov
  Cc: andrew, robh+dt, mark.rutland, arend.vanspriel, franky.lin,
	hante.meuleman, chi-hsien.lin, wright.feng, netdev,
	linux-wireless, devicetree, linux-kernel, brcm80211-dev-list.pdl,
	brcm80211-dev-list
In-Reply-To: <20180319014032.9394-2-alexey.roslyakov@gmail.com>

Alexey Roslyakov <alexey.roslyakov@gmail.com> writes:

> There are 3 fields in SDIO settings (quirks) to workaround some of the
> SG SDIO host particularities, i.e higher align requirements for SG
> items. All coding is done the long time ago, but there is no way to
> change the driver behavior without patching the kernel. Add missing
> devicetree entries.
>
> Signed-off-by: Alexey Roslyakov <alexey.roslyakov@gmail.com>

The commit log is not clear for me, what does "all coding is done long
time ago" exactly mean? What code and where?

>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

Why net-next? To me it looks like this should go to
wireless-drivers-next.

-- 
Kalle Valo

^ permalink raw reply

* Re: 4.14.2[6-7] tcp_push NULL pointer
From: Soheil Hassas Yeganeh @ 2018-03-19 16:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tc, netdev
In-Reply-To: <72586b88-7099-8e16-4352-7fec9d7aa143@gmail.com>

On Mon, Mar 19, 2018 at 10:16 AM Eric Dumazet <eric.dumazet@gmail.com>
wrote:



> On 03/19/2018 07:03 AM, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Mon, 19 Mar 2018 05:17:37 -0700
> >
> >> We have sent a fix last week, I am not sure if David took it.
> >>
> >> https://patchwork.ozlabs.org/patch/886324/
> >
> > I thought I submitted that in my last round of -stable submissions,
> > but I have re-added this to my stable queue and will make sure it
> > really gets there if not.
> >

> Thanks David !

Thanks very much, David!

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] brcmfmac: add new dt entries for SG SDIO settings
From: Alexey Roslyakov @ 2018-03-19 16:27 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Andrew Lunn, robh+dt, mark.rutland, Arend van Spriel, franky.lin,
	hante.meuleman, chi-hsien.lin, wright.feng, netdev,
	linux-wireless, devicetree, linux-kernel, brcm80211-dev-list.pdl,
	brcm80211-dev-list
In-Reply-To: <877eq8yqym.fsf@kamboji.qca.qualcomm.com>

Hi, Kalle,
good remark, I'll try to make it clear in the next version.

Thank you.


On 19 March 2018 at 23:23, Kalle Valo <kvalo@codeaurora.org> wrote:
> Alexey Roslyakov <alexey.roslyakov@gmail.com> writes:
>
>> There are 3 fields in SDIO settings (quirks) to workaround some of the
>> SG SDIO host particularities, i.e higher align requirements for SG
>> items. All coding is done the long time ago, but there is no way to
>> change the driver behavior without patching the kernel. Add missing
>> devicetree entries.
>>
>> Signed-off-by: Alexey Roslyakov <alexey.roslyakov@gmail.com>
>
> The commit log is not clear for me, what does "all coding is done long
> time ago" exactly mean? What code and where?
>
>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> Why net-next? To me it looks like this should go to
> wireless-drivers-next.
>
> --
> Kalle Valo



-- 
With best regards,
  Alexey Roslyakov
Email: alexey.roslyakov@gmail.com

^ permalink raw reply

* Re: [bpf-next PATCH v3 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
From: Alexei Starovoitov @ 2018-03-19 16:27 UTC (permalink / raw)
  To: John Fastabend; +Cc: davejwatson, davem, daniel, ast, netdev
In-Reply-To: <20180318195710.14466.34467.stgit@john-Precision-Tower-5810>

On Sun, Mar 18, 2018 at 12:57:10PM -0700, John Fastabend wrote:
> This implements a BPF ULP layer to allow policy enforcement and
> monitoring at the socket layer. In order to support this a new
> program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
> the sendmsg/sendpage hook. To attach the policy to sockets a
> sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.
> 
> Similar to previous sockmap usages when a sock is added to a
> sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT
> program type attached then the BPF ULP layer is created on the
> socket and the attached BPF_PROG_TYPE_SK_MSG program is run for
> every msg in sendmsg case and page/offset in sendpage case.
...
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

looks great

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* Re: [bpf-next PATCH v3 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
From: Alexei Starovoitov @ 2018-03-19 16:27 UTC (permalink / raw)
  To: John Fastabend; +Cc: davejwatson, davem, daniel, ast, netdev
In-Reply-To: <20180318195715.14466.28900.stgit@john-Precision-Tower-5810>

On Sun, Mar 18, 2018 at 12:57:15PM -0700, John Fastabend wrote:
> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as its received.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Alexei Starovoitov <ast@kernel.org>
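
The apply_bytes behaviour described in the quoted commit log can be
modelled in plain user-space C. This is only a sketch under stated
assumptions: stream_state, prog_fn, record_prog and process_chunk are
hypothetical names, not kernel or BPF APIs, and the 4-byte record
policy is invented for illustration. It shows the two cases from the
log: N smaller than the current call (the program is re-run at the
next offset) and N larger than the current call (the verdict carries
over into the following call).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

struct stream_state {
	size_t apply_bytes;   /* bytes still covered by the last verdict */
	enum verdict last;    /* last verdict returned by the program */
	unsigned int runs;    /* number of times the program was invoked */
};

/* A "program" returns a verdict and may ask that it cover *apply bytes. */
typedef enum verdict (*prog_fn)(const unsigned char *data, size_t len,
				size_t *apply);

/* Example policy (an assumption): 4-byte records, passed only if the
 * record starts with 'A'.
 */
enum verdict record_prog(const unsigned char *data, size_t len, size_t *apply)
{
	(void)len;
	*apply = 4;
	return data[0] == 'A' ? VERDICT_PASS : VERDICT_DROP;
}

/* Model one sendmsg() call of len bytes; returns the bytes passed. */
size_t process_chunk(struct stream_state *st, prog_fn prog,
		     const unsigned char *data, size_t len)
{
	size_t off = 0, passed = 0;

	assert(st && prog && (data || len == 0));
	while (off < len) {
		size_t remaining = len - off;
		size_t n;

		if (st->apply_bytes == 0) {
			size_t apply = 0;

			/* No verdict outstanding: re-run the program here. */
			st->last = prog(data + off, remaining, &apply);
			st->runs++;
			/* With no request, the verdict covers the rest of this call. */
			st->apply_bytes = apply ? apply : remaining;
		}
		n = st->apply_bytes < remaining ? st->apply_bytes : remaining;
		if (st->last == VERDICT_PASS)
			passed += n;
		off += n;
		st->apply_bytes -= n;	/* any leftover carries into the next call */
	}
	return passed;
}
```

In this model, a 2-byte call followed by a 4-byte call invokes the
program only twice, because the first verdict still covers the first
two bytes of the second call.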

 

^ permalink raw reply

* Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: Linus Torvalds @ 2018-03-19 16:29 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, Rahul Lakkireddy, x86@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	mingo@redhat.com, hpa@zytor.com, davem@davemloft.net,
	akpm@linux-foundation.org, ganeshgr@chelsio.com,
	nirranjan@chelsio.com, indranil@chelsio.com
In-Reply-To: <ba8ac1f8c1f444da857b6a46c4153672@AcuMS.aculab.com>

On Mon, Mar 19, 2018 at 8:53 AM, David Laight <David.Laight@aculab.com> wrote:
>
> The x87 and SSE registers can't be changed - they can contain callee-saved
> registers.
> But (IIRC) the AVX and AVX2 registers are all caller-saved.

No.

The kernel entry is not the usual function call.

On kernel entry, *all* registers are callee-saved.

Of course, some may be return values, and I have a slight hope that I
can trash %eflags. But basically, a system call is simply not a
function call. We have different calling conventions on the argument
side too. In fact, the arguments are in different registers depending
on just *which* system call you take.

                  Linus

^ permalink raw reply

* Re: [bpf-next PATCH v3 07/18] bpf: sockmap, add msg_cork_bytes() helper
From: Alexei Starovoitov @ 2018-03-19 16:30 UTC (permalink / raw)
  To: John Fastabend; +Cc: davejwatson, davem, daniel, ast, netdev
In-Reply-To: <20180318195720.14466.35911.stgit@john-Precision-Tower-5810>

On Sun, Mar 18, 2018 at 12:57:20PM -0700, John Fastabend wrote:
> In the case where we need a specific number of bytes before a
> verdict can be assigned, even if the data spans multiple sendmsg
> or sendfile calls. The BPF program may use msg_cork_bytes().
> 
> The extreme case is a user can call sendmsg repeatedly with
> 1-byte msg segments. Obviously, this is bad for performance but
> is still valid. If the BPF program needs N bytes to validate
> a header it can use msg_cork_bytes to specify N bytes and the
> BPF program will not be called again until N bytes have been
> accumulated. The infrastructure will attempt to coalesce data
> if possible so in many cases (most my use cases at least) the
> data will be in a single scatterlist element with data pointers
> pointing to start/end of the element. However, this is dependent
> on available memory so is not guaranteed. So BPF programs must
> validate data pointer ranges, but this is the case anyways to
> convince the verifier the accesses are valid.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/uapi/linux/bpf.h |    3 ++-
>  net/core/filter.c        |   16 ++++++++++++++++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index a557a2a..1765cfb 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -792,7 +792,8 @@ struct bpf_stack_build_id {
>  	FN(override_return),		\
>  	FN(sock_ops_cb_flags_set),	\
>  	FN(msg_redirect_map),		\
> -	FN(msg_apply_bytes),
> +	FN(msg_apply_bytes),		\
> +	FN(msg_cork_bytes),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 17d6775..0c9daf6 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1942,6 +1942,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>  	.arg2_type      = ARG_ANYTHING,
>  };
>  
> +BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg_buff *, msg, u32, bytes)
> +{
> +	msg->cork_bytes = bytes;
> +	return 0;
> +}

My understanding is that setting it to zero here, and in the other
*_bytes helper, will effectively be a nop. Right?

Acked-by: Alexei Starovoitov <ast@kernel.org>
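
The corking behaviour from the quoted commit log (1-byte sendmsg calls,
a program that needs N bytes of header) can be modelled in user-space
C. This is a rough sketch, not the kernel implementation: cork_state,
header_prog, feed and the "HDR1" header are all hypothetical, and the
model simply assumes the verdict is ignored while data is corked. A
cork request of zero leaves the state unchanged here, which matches the
"nop" reading above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CORK_BUF_MAX 64	/* arbitrary limit for this model */

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

struct cork_state {
	unsigned char buf[CORK_BUF_MAX];
	size_t len;		/* bytes accumulated so far */
	size_t cork_bytes;	/* program is not run until len reaches this */
	unsigned int runs;	/* number of times the program was invoked */
};

typedef enum verdict (*prog_fn)(const unsigned char *data, size_t len,
				size_t *cork);

/* Example policy (an assumption): needs a 4-byte header equal to "HDR1". */
enum verdict header_prog(const unsigned char *data, size_t len, size_t *cork)
{
	if (len < 4) {
		*cork = 4;		/* ask to be called again with 4 bytes */
		return VERDICT_DROP;	/* ignored by feed() while corked */
	}
	return memcmp(data, "HDR1", 4) == 0 ? VERDICT_PASS : VERDICT_DROP;
}

/* Model one sendmsg() call. Returns 1 and stores the verdict in *out once
 * a verdict is reached, or 0 while the data is still corked.
 */
int feed(struct cork_state *st, prog_fn prog, const unsigned char *data,
	 size_t len, enum verdict *out)
{
	size_t cork = 0;
	enum verdict v;

	assert(st && prog && data && out);
	assert(len <= CORK_BUF_MAX - st->len);
	memcpy(st->buf + st->len, data, len);
	st->len += len;

	if (st->len < st->cork_bytes)
		return 0;		/* still corked: program is not called */

	v = prog(st->buf, st->len, &cork);
	st->runs++;
	if (cork > st->len) {
		st->cork_bytes = cork;	/* hold data until enough has arrived */
		return 0;
	}
	st->cork_bytes = 0;
	st->len = 0;			/* data is released with verdict v */
	*out = v;
	return 1;
}
```

Feeding "H", "D", "R", "1" one byte at a time runs header_prog only
twice in this model: once on the first byte, and once when the fourth
byte completes the header.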

^ permalink raw reply

* Re: NULL pointer dereferences with 4.14.27
From: David Miller @ 2018-03-19 16:33 UTC (permalink / raw)
  To: holger; +Cc: carlos, soheil, gregkh, netdev, stable
In-Reply-To: <d11d73b5-bde3-98bc-a936-db7352d9787b@applied-asynchrony.com>

From: Holger Hoffstätte <holger@applied-asynchrony.com>
Date: Mon, 19 Mar 2018 16:57:48 +0100

> This patch is in the netdev patchwork at https://patchwork.ozlabs.org/patch/886324/
> but has been marked as "not applicable" without further queued/rejected comment
> from Dave, so I believe it became a victim of email lossage.

It is not a victim of email lossage.

When someone posts a backport for -stable, that is not intended to be
applied upstream (because it's already there), I add the patch to the
stable bundle and mark it as "Not applicable" because it's "Not
applicable" for upstream.

^ permalink raw reply

* Re: recursive static routes
From: David Ahern @ 2018-03-19 16:34 UTC (permalink / raw)
  To: Saku Ytti, netdev
In-Reply-To: <CAAeewD_-s32D9GJEyV_06LPmkXLk1s9stspiY0Xrrn81PyhGgg@mail.gmail.com>

On 3/19/18 1:42 AM, Saku Ytti wrote:
> I believe Linux does not support recursive static routes, is this correct?

The Linux stack does not flatten routes when inserting into the FIB.
Recursion is expected to be done by a routing daemon such as bgp, which
will be able to handle updates as the network changes.

I have thought about adding such a feature to the stack, basically have
the gateway recomputed on link changes. It would most certainly not be
as robust as having the updates come from a routing daemon.

Any solution would need to handle encapsulations (e.g., MPLS), which is
one area where it gets complicated fast.
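
For illustration, the flattening that a routing daemon would perform
before installing a route can be sketched in user-space C. This is a
toy model under stated assumptions: struct route, lpm, resolve, the IP
macro and MAX_DEPTH are hypothetical, only IPv4 is handled, and
encapsulations and link-state updates are ignored entirely. It follows
each gateway through a longest-prefix match until it reaches a directly
connected route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8	/* arbitrary bound to stop resolution loops */
#define IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical route entry; not a kernel structure. */
struct route {
	uint32_t prefix;	/* network address, host byte order */
	uint8_t plen;		/* prefix length, 0..32 */
	uint32_t gateway;	/* 0 means directly connected */
	int ifindex;		/* egress interface for connected routes */
};

static uint32_t prefix_mask(uint8_t plen)
{
	assert(plen <= 32);
	return plen ? ~(uint32_t)0 << (32 - plen) : 0;
}

/* Longest-prefix match over a flat table; returns NULL if nothing matches. */
const struct route *lpm(const struct route *tbl, size_t n, uint32_t addr)
{
	const struct route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((addr & prefix_mask(tbl[i].plen)) != tbl[i].prefix)
			continue;
		if (!best || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return best;
}

/* Flatten dst to a directly reachable next hop and egress interface.
 * Returns 0 on success, -1 if unreachable or if recursion is too deep.
 */
int resolve(const struct route *tbl, size_t n, uint32_t dst,
	    uint32_t *nexthop, int *ifindex)
{
	uint32_t addr = dst;
	int depth;

	for (depth = 0; depth < MAX_DEPTH; depth++) {
		const struct route *r = lpm(tbl, n, addr);

		if (!r)
			return -1;
		if (r->gateway == 0) {
			*nexthop = addr;	/* on-link address to forward to */
			*ifindex = r->ifindex;
			return 0;
		}
		addr = r->gateway;	/* recurse on the gateway address */
	}
	return -1;
}
```

With 10.0.0.0/8 via 172.16.0.1, 172.16.0.0/16 via 192.168.1.254 and
192.168.1.0/24 directly connected, resolving 10.1.2.3 in this model
ends at next hop 192.168.1.254 on the connected interface. The result
would have to be recomputed whenever any route in the chain changes,
which is the part a routing daemon handles.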

^ permalink raw reply
