Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] net: dsa: Discard frames from unused ports
From: David Miller @ 2018-04-06  2:04 UTC (permalink / raw)
  To: andrew; +Cc: netdev, f.fainelli, vivien.didelot
In-Reply-To: <1522886204-1545-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu,  5 Apr 2018 01:56:44 +0200

> The Marvell switches under some conditions will pass a frame to the
> host with the port being the CPU port. Such frames are invalid, and
> should be dropped. Not dropping them can result in a crash when
> incrementing the receive statistics for an invalid port.
> 
> Reported-by: Chris Healy <cphealy@gmail.com>
> Fixes: 5f6b4e14cada ("net: dsa: User per-cpu 64-bit statistics")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
 ...
> +	slave_port = &ds->ports[port];
> +
> +	if (slave_port->type != DSA_PORT_TYPE_USER)
> +		return NULL;

Look like we need a Fixes: update and an adjustment to use unlikely()
here based upon Florian's feedback.

^ permalink raw reply

* Re: [PATCH net] arp: fix arp_filter on l3slave devices
From: David Miller @ 2018-04-06  2:05 UTC (permalink / raw)
  To: dsahern; +Cc: mfadon, netdev
In-Reply-To: <8ea161a0-b164-8f05-5026-d416f17fbdfb@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Thu, 5 Apr 2018 08:40:48 -0600

> On 4/5/18 2:25 AM, Miguel Fadon Perlines wrote:
>> arp_filter performs an ip_route_output search for arp source address and
>> checks if output device is the same where the arp request was received,
>> if it is not, the arp request is not answered.
>> 
>> This route lookup is always done on main route table so l3slave devices
>> never find the proper route and arp is not answered.
>> 
>> Passing l3mdev_master_ifindex_rcu(dev) return value as oif fixes the
>> lookup for l3slave devices while maintaining same behavior for non
>> l3slave devices as this function returns 0 in that case.
>> 
>> Signed-off-by: Miguel Fadon Perlines <mfadon@teldat.com>
>> ---
>>  net/ipv4/arp.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
> 
> Acked-by: David Ahern <dsa@cumulusnetworks.com>
> 
> DaveM: this is a day 1 bug for VRF. Best guess at a Fixes tag would be:
> 
> Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
> 
> It would be good to get this into stable releases 4.9 and up. Thanks,

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] net: mvpp2: Fix parser entry init boundary check
From: David Miller @ 2018-04-06  2:14 UTC (permalink / raw)
  To: maxime.chevallier
  Cc: netdev, linux-kernel, antoine.tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, ymarkman, mw
In-Reply-To: <20180405095548.14433-1-maxime.chevallier@bootlin.com>

From: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date: Thu,  5 Apr 2018 11:55:48 +0200

> Boundary check in mvpp2_prs_init_from_hw must be done according to the
> passed "tid" parameter, not the mvpp2_prs_entry index, which is not yet
> initialized at the time of the check.
> 
> Fixes: 47e0e14eb1a6 ("net: mvpp2: Make mvpp2_prs_hw_read a parser entry init function")
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 08/32] aio: replace kiocb_set_cancel_fn with a cancel_kiocb file operation
From: Al Viro @ 2018-04-06  2:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel
In-Reply-To: <20180330150809.28094-9-hch@lst.de>

On Fri, Mar 30, 2018 at 05:07:45PM +0200, Christoph Hellwig wrote:
> The current kiocb_set_cancel_fn implementation assumes the kiocb is
> embedded into an aio_kiocb, which is fundamentally unsafe as it might
> have been submitted by non-aio callers.  Instead add a cancel_kiocb
> file operation that replaced the ki_cancel function pointer set by
> kiocb_set_cancel_fn, and only adds iocbs to the active list when
> the read/write_iter methods return -EIOCBQUEUED and the file has
> a cancel_kiocb method.

> @@ -1440,6 +1423,16 @@ static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
>  {
>  	switch (ret) {
>  	case -EIOCBQUEUED:
> +		if (req->ki_filp->f_op->cancel_kiocb) {

... and by that point req might've been already freed by IO completion on
another CPU.
> +			struct aio_kiocb *iocb =
> +				container_of(req, struct aio_kiocb, rw);
> +			struct kioctx *ctx = iocb->ki_ctx;
> +			unsigned long flags;
> +
> +			spin_lock_irqsave(&ctx->ctx_lock, flags);
> +			list_add_tail(&iocb->ki_list, &ctx->active_reqs);

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply

* Re: [PATCH net-next] hv_netvsc: Add NetVSP v6 into version negotiation
From: David Miller @ 2018-04-06  2:17 UTC (permalink / raw)
  To: haiyangz, haiyangz
  Cc: netdev, kys, sthemmin, olaf, vkuznets, devel, linux-kernel
In-Reply-To: <20180405184222.3167-1-haiyangz@linuxonhyperv.com>

From: Haiyang Zhang <haiyangz@linuxonhyperv.com>
Date: Thu,  5 Apr 2018 11:42:22 -0700

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> This patch adds the NetVSP v6 message structures, and includes this
> version into NetVSC/NetVSP version negotiation process.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

The net-next tree is closed, please resubmit this when the net-next
tree reopens.

Thank you.

^ permalink raw reply

* Re: [PATCH 0/4] hv_netvsc: Fix shutdown issues on older Windows hosts
From: David Miller @ 2018-04-06  2:21 UTC (permalink / raw)
  To: mgamal
  Cc: netdev, sthemmin, devel, linux-kernel, kys, haiyangz, vkuznets,
	otubo
In-Reply-To: <1522955361-14704-1-git-send-email-mgamal@redhat.com>

From: Mohammed Gamal <mgamal@redhat.com>
Date: Thu,  5 Apr 2018 21:09:17 +0200

> Guests running on WS2012 hosts would not shutdown when changing network
> interface setting (e.g. Number of channels, MTU ... etc). 
> 
> This patch series addresses these shutdown issues we enecountered with WS2012
> hosts. It's essentialy a rework of the series sent in 
> https://lkml.org/lkml/2018/1/23/111 on top of latest upstream
> 
> Fixes: 0ef58b0a05c1 ("hv_netvsc: change GPAD teardown order on older versions")

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH 2/2] net: phy: dp83640: Read strapped configuration settings
From: David Miller @ 2018-04-06  2:24 UTC (permalink / raw)
  To: andrew
  Cc: esben, f.fainelli, richardcochran, netdev, linux-kernel,
	rasmus.villemoes
In-Reply-To: <20180405204049.GD17495@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 5 Apr 2018 22:40:49 +0200

> Or could it still contain whatever state the last boot of Linux, or
> maybe the bootloader, left the PHY in?

Right, this is my concern as well.

^ permalink raw reply

* Re: [PATCH net] net/ipv6: Increment OUTxxx counters after netfilter hook
From: David Miller @ 2018-04-06  2:24 UTC (permalink / raw)
  To: 0xeffeff; +Cc: netdev, dsahern
In-Reply-To: <20180405212947.17858-1-0xeffeff@gmail.com>

From: Jeff Barnhill <0xeffeff@gmail.com>
Date: Thu,  5 Apr 2018 21:29:47 +0000

> At the end of ip6_forward(), IPSTATS_MIB_OUTFORWDATAGRAMS and
> IPSTATS_MIB_OUTOCTETS are incremented immediately before the NF_HOOK call
> for NFPROTO_IPV6 / NF_INET_FORWARD.  As a result, these counters get
> incremented regardless of whether or not the netfilter hook allows the
> packet to continue being processed.  This change increments the counters
> in ip6_forward_finish() so that it will not happen if the netfilter hook
> chooses to terminate the packet, which is similar to how IPv4 works.
> 
> Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>

Yep, this is more consistent with ipv4.

Applied, thank you.

^ permalink raw reply

* Re: [PATCH v15 ] net/veth/XDP: Line-rate packet forwarding in kernel
From: David Ahern @ 2018-04-06  2:55 UTC (permalink / raw)
  To: Md. Islam
  Cc: netdev, David Miller, Stephen Hemminger, Anton Gary Ceph,
	Pavel Emelyanov, Eric Dumazet, alexei.starovoitov, brouer
In-Reply-To: <CAFgPn1D2SE_wvCvLJU9uRJbwa4hy4NVj2as_C=VzmDLqyBeGbQ@mail.gmail.com>

On 4/3/18 9:15 PM, Md. Islam wrote:
>> Have you looked at what I would consider a more interesting use case of
>> packets into a node and delivered to a namespace via veth?
>>
>>    +--------------------------+---------------
>>    | Host                     | container
>>    |                          |
>>    |        +-------{ veth1 }-|-{veth2}----
>>    |       |                  |
>>    +----{ eth1 }------------------
>>
>> Can xdp / bpf on eth1 be used to speed up delivery to the container?
> 
> I didn't consider that, but it sounds like an important use case. How
> do we determine which namespace gets the packet?
> 

FIB lookups of course. Starting with my patch set that handles
forwarding on eth1, what is needed for XDP with veth? ie., a program on
eth1 does the lookup and redirects the packet to veth1 for Tx.
ndo_xdp_xmit for veth knows the packet needs to be forwarded to veth2
internally and there is no skb allocated for the packet yet.

^ permalink raw reply

* Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.
From: NeilBrown @ 2018-04-06  3:11 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Thomas Graf, netdev, linux-kernel, David S. Miller, Eric Dumazet
In-Reply-To: <20180329052231.GA21028@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 5902 bytes --]

On Thu, Mar 29 2018, Herbert Xu wrote:

> On Thu, Mar 29, 2018 at 08:26:21AM +1100, NeilBrown wrote:
>>
>> I say "astronomically unlikely", you say "probability .. is extremely
>> low".  I think we are in agreement here.
>> 
>> The point remains that if an error *can* be returned then I have to
>> write code to handle it and test that code.  I'd rather not.
>
> You have be able to handle errors anyway because of memory allocation
> failures.  Ultimately if you keep inserting you will eventually
> fail with ENOMEM.  So I don't see the issue with an additional
> error value.

You don't need to handle memory allocation failures at the point where
you insert into the table - adding to a linked list requires no new
memory.

If you are running out of memory, you will fail to allocate new objects,
so you won't be able to add them to the rhashtable.  Obviously you have
to handle that failure, and that is likely to save rhashtable from
getting many mem-alloc failures.

But once you have allocated a new object, rhashtable should just accept
it.  It doesn't help anyone to have the insertion fail.
The insertion can currently fail even when allocating a new object would
succeed, as assertion can require a GFP_ATOMIC allocation to succeed,
while allocating a new object might only need a GFP_KERNEL allocation to
succeed. 

>
>> > Even if it does happen we won't fail because we will perform
>> > an immediate rehash.  We only fail if it happens right away
>> > after the rehash (that is, at least another 16 elements have
>> > been inserted and you're trying to insert a 17th element, all
>> > while the new hash table has not been completely populated),
>> > which means that somebody has figured out our hash secret and
>> > failing in that case makes sense.
>
> BTW, you didn't acknowledge this bit which I think is crucial to
> how likely such an error is.

The likelihood of the error isn't really the issue - it either can
happen or it cannot.  If it can, it requires code to handle it.

>
>> I never suggested retrying, but I would have to handle it somehow.  I'd
>> rather not.
>
> ...
>
>> While I have no doubt that there are hashtables where someone could try
>> to attack the hash, I am quite sure there are others where is such an
>> attack is meaningless - any code which could generate the required range of
>> keys, could do far worse things more easily.
>
> Our network hashtable has to be secure against adversaries.  I
> understand that this may not be important to your use-case.  However,
> given the fact that the failure would only occur if an adversary
> is present and actively attacking your hash table, I don't think
> it has that much of a negative effect on your use-case either.

I certainly accept that there are circumstances where it is a real
possibility that an adversary might succeed in attacking the hash
function, and changing the seed for each table is an excellent idea.
It isn't clear to me that failing insertions is needed - it is done
rather late and so doesn't throttle much, and could give the attacker
feedback that they had succeeded (?).

Not all hash tables are susceptible to attack.  It would be nice if
developers working in less exposed areas could use rhashtable without ever
having to check for errors.

Error codes are messages from the implementation to the caller.
They should be chosen to help the caller make useful choices, not to
allow the implementation to avoid awkward situations.

>
> Of course if you can reproduce the EBUSY error without your
> disable_count patch or even after you have fixed the issue I
> have pointed out in your disable_count patch you can still reproduce
> it then that would suggest a real bug and we would need to fix it,
> for everyone.

I suspect that I cannot.  However experience tells me that customers do
things that I cannot and would never expect - they can often trigger
errors that I cannot.  It is best if the error cannot be returned.

>
>> Yes, storing a sharded count in the spinlock table does seem like an
>> appropriate granularity.  However that leads me to ask: why do we have
>> the spinlock table?  Why not bit spinlocks in the hashchain head like
>> include/linux/list_bl uses?
>
> The spinlock table predates rhashtable.  Perhaps Thomas/Eric/Dave
> can elucidate this.

Maybe I'll submit an RFC patch to change it - if I can make it work.

>
>> I don't understand how it can ever be "too late", though I appreciate
>> that in some cases "sooner" is better than "later"
>> If we give up on the single atomic_t counter, then we must accept that
>> the number of elements could exceed any given value.  The only promise
>> we can provide is that it wont exceed N% of the table size for more than
>> T seconds.
>
> Sure.  However, assuming we use an estimate that is hash-based,
> that *should* be fairly accurate assuming that your hash function
> is working in the first place.  It's completely different vs.
> estimating based on a per-cpu count which could be wildly inaccurate.

Ahhh... I see what you are thinking now.  You aren't suggestion a
sharded count that is summed periodically.  Rather you are suggesting that
we divide the hash space into N regions each with their own independent
counter, and resize the table if any one region reaches 70% occupancy -
on the assumption that the other regions are likely to be close.  Is
that right?
It would probably be dangerous to allow automatic shrinking (when just
one drops below 30%) in that case, but it might be OK for growing.

I'm not sure I like the idea, but it is worth thinking about.

Thanks,
NeilBrown

>
> Cheers,
> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: KASAN: use-after-free Read in worker_thread (2)
From: Eric Biggers @ 2018-04-06  3:12 UTC (permalink / raw)
  To: syzbot
  Cc: davem, dvyukov, ebiggers, jiangshanlai, linux-kernel, mingo,
	netdev, syzkaller-bugs, tj, tklauser, tom, xiyou.wangcong
In-Reply-To: <001a1140ccfee824f8055db71210@google.com>

On Sat, Nov 11, 2017 at 07:56:01AM -0800, syzbot wrote:
> syzkaller has found reproducer for the following crash on
> d9e0e63d9a6f88440eb201e1491fcf730272c706
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> BUG: KASAN: use-after-free in worker_thread+0x15bb/0x1990
> kernel/workqueue.c:2244
> Read of size 8 at addr ffff88002d0e3de0 by task kworker/u8:1/1209
> 
> CPU: 0 PID: 1209 Comm: kworker/u8:1 Not tainted 4.14.0-rc8-next-20171110+
> #12
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>  worker_thread+0x15bb/0x1990 kernel/workqueue.c:2244
>  kthread+0x37a/0x440 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:437
> 
> Allocated by task 11866:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3548
>  kmem_cache_zalloc include/linux/slab.h:693 [inline]
>  kcm_attach net/kcm/kcmsock.c:1394 [inline]
>  kcm_attach_ioctl net/kcm/kcmsock.c:1460 [inline]
>  kcm_ioctl+0x2d1/0x1610 net/kcm/kcmsock.c:1695
>  sock_do_ioctl+0x65/0xb0 net/socket.c:960
>  sock_ioctl+0x2c2/0x440 net/socket.c:1057
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:686
>  SYSC_ioctl fs/ioctl.c:701 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> 
> Freed by task 11867:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3492 [inline]
>  kmem_cache_free+0x77/0x280 mm/slab.c:3750
>  kcm_unattach+0xe50/0x1510 net/kcm/kcmsock.c:1563
>  kcm_unattach_ioctl net/kcm/kcmsock.c:1608 [inline]
>  kcm_ioctl+0xdf0/0x1610 net/kcm/kcmsock.c:1705
>  sock_do_ioctl+0x65/0xb0 net/socket.c:960
>  sock_ioctl+0x2c2/0x440 net/socket.c:1057
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:686
>  SYSC_ioctl fs/ioctl.c:701 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> 
> The buggy address belongs to the object at ffff88002d0e3d00
>  which belongs to the cache kcm_psock_cache of size 576
> The buggy address is located 224 bytes inside of
>  576-byte region [ffff88002d0e3d00, ffff88002d0e3f40)
> The buggy address belongs to the page:
> page:ffffea0000b43880 count:1 mapcount:0 mapping:ffff88002d0e2180 index:0x0
> compound_mapcount: 0
> flags: 0x100000000008100(slab|head)
> raw: 0100000000008100 ffff88002d0e2180 0000000000000000 000000010000000b
> raw: ffffea0000b14920 ffffea0000b27e20 ffff88002b0089c0 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff88002d0e3c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff88002d0e3d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff88002d0e3d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                        ^
>  ffff88002d0e3e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88002d0e3e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================

No longer occurring, the fix seems to have been commit 7e9964574ee97:

#syz fix: kcm: Only allow TCP sockets to be attached to a KCM mux

- Eric

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Eric W. Biederman @ 2018-04-06  3:59 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Kirill Tkhai, davem, gregkh, netdev, linux-kernel, avagin, serge
In-Reply-To: <20180405144130.GB26043@gmail.com>

Christian Brauner <christian.brauner@canonical.com> writes:

> On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> On 05.04.2018 17:07, Christian Brauner wrote:
>> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
>> >>>
>> >>> enabled sending hotplug events into all network namespaces back in 2010.
>> >>> Over time the set of uevents that get sent into all network namespaces has
>> >>> shrunk. We have now reached the point where hotplug events for all devices
>> >>> that carry a namespace tag are filtered according to that namespace.
>> >>>
>> >>> Specifically, they are filtered whenever the namespace tag of the kobject
>> >>> does not match the namespace tag of the netlink socket. One example are
>> >>> network devices. Uevents for network devices only show up in the network
>> >>> namespaces these devices are moved to or created in.
>> >>>
>> >>> However, any uevent for a kobject that does not have a namespace tag
>> >>> associated with it will not be filtered and we will *try* to broadcast it
>> >>> into all network namespaces.
>> >>>
>> >>> The original patchset was written in 2010 before user namespaces were a
>> >>> thing. With the introduction of user namespaces sending out uevents became
>> >>> partially isolated as they were filtered by user namespaces:
>> >>>
>> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >>>
>> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >>>                 return;
>> >>>
>> >>>         if (!peernet_has_id(sock_net(sk), p->net))
>> >>>                 return;
>> >>>
>> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >>>                              CAP_NET_BROADCAST))
>> >>>         j       return;
>> >>> }
>> >>>
>> >>> The file_ns_capable() check will check whether the caller had
>> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
>> >>> namespace of interest. This check is fine in general but seems insufficient
>> >>> to me when paired with uevents. The reason is that devices always belong to
>> >>> the initial user namespace so uevents for kobjects that do not carry a
>> >>> namespace tag should never be sent into another user namespace. This has
>> >>> been the intention all along. But there's one case where this breaks,
>> >>> namely if a new user namespace is created by root on the host and an
>> >>> identity mapping is established between root on the host and root in the
>> >>> new user namespace. Here's a reproducer:
>> >>>
>> >>>  sudo unshare -U --map-root
>> >>>  udevadm monitor -k
>> >>>  # Now change to initial user namespace and e.g. do
>> >>>  modprobe kvm
>> >>>  # or
>> >>>  rmmod kvm
>> >>>
>> >>> will allow the non-initial user namespace to retrieve all uevents from the
>> >>> host. This seems very anecdotal given that in the general case user
>> >>> namespaces do not see any uevents and also can't really do anything useful
>> >>> with them.
>> >>>
>> >>> Additionally, it is now possible to send uevents from userspace. As such we
>> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> >>> namespace of the network namespace of the netlink socket) userspace process
>> >>> make a decision what uevents should be sent.
>> >>>
>> >>> This makes me think that we should simply ensure that uevents for kobjects
>> >>> that do not carry a namespace tag are *always* filtered by user namespace
>> >>> in kobj_bcast_filter(). Specifically:
>> >>> - If the owning user namespace of the uevent socket is not init_user_ns the
>> >>>   event will always be filtered.
>> >>> - If the network namespace the uevent socket belongs to was created in the
>> >>>   initial user namespace but was opened from a non-initial user namespace
>> >>>   the event will be filtered as well.
>> >>> Put another way, uevents for kobjects not carrying a namespace tag are now
>> >>> always only sent to the initial user namespace. The regression potential
>> >>> for this is near to non-existent since user namespaces can't really do
>> >>> anything with interesting devices.
>> >>>
>> >>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
>> >>> ---
>> >>>  lib/kobject_uevent.c | 10 +++++++++-
>> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> >>> index 15ea216a67ce..cb98cddb6e3b 100644
>> >>> --- a/lib/kobject_uevent.c
>> >>> +++ b/lib/kobject_uevent.c
>> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>> >>>  		return sock_ns != ns;
>> >>>  	}
>> >>>  
>> >>> -	return 0;
>> >>> +	/*
>> >>> +	 * The kobject does not carry a namespace tag so filter by user
>> >>> +	 * namespace below.
>> >>> +	 */
>> >>> +	if (sock_net(dsk)->user_ns != &init_user_ns)
>> >>> +		return 1;
>> >>> +
>> >>> +	/* Check if socket was opened from non-initial user namespace. */
>> >>> +	return sk_user_ns(dsk) != &init_user_ns;
>> >>>  }
>> >>>  #endif
>> >>
>> >> So, this prohibits to listen events of all devices except network-related
>> >> in containers? If it's so, I don't think it's a good solution. Uevents is not
>> > 
>> > No, this is not correct: As it is right now *without my patch* no
>> > non-initial user namespace is receiving *any uevents* but those
>> > specifically namespaced such as those for network devices. This patch
>> > doesn't change that at all. The commit message outlines this in detail
>> > how this comes about.
>> > There is only one case where this currently breaks and this is as I
>> > outlined explicitly in my commit message when you create a new user
>> > namespace and map container(0) -> host(0). This patch fixes this.
>> 
>> Could you please point the place, where non-initial user namespaces are filtered?
>> I only see the kobj_bcast_filter() logic, and it used to return 0, which means "accepted".
>> Now it will return 1 sometimes.
>
> Oh sure, it's in the commit message though. The callchain is
> lib/kobject_uevent.c:kobject_uevent_net_broadcast() ->
> nnet/netlink/af_netlink.c:netlink_broadcast_filtered() ->
> net/netlink/af_netlink.c:do_one_broadcast():
>
> This codepiece will check whether the openened socket holds
> CAP_NET_BROADCAST in the user namespace of the target network namespace
> which it won't because we don't have device namespaces and all devices
> belong to the initial set of namespaces.
>
>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>                              CAP_NET_BROADCAST))
>         j       return;
>

The above that only applies if someone has set NETLINK_F_LISTEN_ALL_NSID
on their socket and has had someone with the appropriate privileges
assign a peerrnetid.

All of which is to say that file_ns_capable is not nearly as applicable
as it might be, and if you can pass the other two checks I think it is
pointless (because the peernet attributes are not generated for
kobj_uevents) but valid to receive events from outside your network
namespace.


I might be missing something but I don't see anything excluding network
namespaces owned by !init_user_ns excluded from the kobject_uevent
logic.

The uevent_sock_list has one entry per network namespace. And
kobject_uevent_net_broacast appears to walk each one.

I had a memory of filtering uevent messages and I had a memory
that the netlink_has_listeners had a per network namespace effect.
Neither seems true from my inspection of the code tonight.

If we are not filtering ordinary uevents at least at the user namespace
level that does seem to be at least a nuisance.


Christian can you dig a little deeper into this.  I have a feeling that
there are some real efficiency improvements that we could make to
kobject_uevent_net_broadcast if nothing else.

Perhaps you could see where uevents are broadcast by poking
the sysfs uevent of an existing device, and triggering a retransmit.

Eric

^ permalink raw reply

* Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.
From: Herbert Xu @ 2018-04-06  4:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: Thomas Graf, netdev, linux-kernel, David S. Miller, Eric Dumazet
In-Reply-To: <87y3i1vxj7.fsf@notabene.neil.brown.name>

On Fri, Apr 06, 2018 at 01:11:56PM +1000, NeilBrown wrote:
>
> You don't need to handle memory allocation failures at the point where
> you insert into the table - adding to a linked list requires no new
> memory.

You do actually.  The r in rhashtable stands for resizable.  We
cannot completely rely on a background hash table resize because
the insertions can be triggered from softirq context and be without
rate-limits of any kind (e.g., incoming TCP connections).

Therefore at some point you either have to wait for the resize or
fail the insertion.  Since we cannot sleep then failing is the only
option.

> The likelihood of the error isn't really the issue - it either can
> happen or it cannot.  If it can, it requires code to handle it.

Sure, but you just have to handle it the same way you would handle
a memory allocation failure, because that's what it essentially is.

I'm sorry if that means you have to restructure your code to do that
but that's what you pay for having a resizable hash table because
at some point you just have to perform a resize.

> Ahhh... I see what you are thinking now.  You aren't suggestion a
> sharded count that is summed periodically.  Rather you are suggesting that
> we divide the hash space into N regions each with their own independent
> counter, and resize the table if any one region reaches 70% occupancy -
> on the assumption that the other regions are likely to be close.  Is
> that right?

Yes.

> It would probably be dangerous to allow automatic shrinking (when just
> one drops below 30%) in that case, but it might be OK for growing.

At the greatest granularity it would be a counter per bucket, so
clearly we need to maintain some kind of bound on the granularity
so our sample space is not too small.

Automatic shrinking shouldn't be an issue because that's the slow
kind of resizing that we punt to kthread context and it can afford
to do a careful count.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent
From: Steffen Klassert @ 2018-04-06  4:31 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <20180329013526.GA27752@la.guarana.org>

On Wed, Mar 28, 2018 at 09:35:26PM -0400, Kevin Easton wrote:
> On Wed, Mar 28, 2018 at 07:59:25AM +0200, Steffen Klassert wrote:
> > On Mon, Mar 26, 2018 at 07:39:16AM -0400, Kevin Easton wrote:
> > > Several places use (x + 7) / 8 to convert from a number of bits to a number
> > > of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> > > with other parts of the same file.
> > > 
> > > Signed-off-by: Kevin Easton <kevin@guarana.org>
> > 
> > Is this a fix or just a cleanup?
> > If it is just a cleanup, please resent it based on ipsec-next.
> 
> Hi Steffen,
> 
> This patch (1/2) is just a cleanup, but it's in preparation for patch 2/2
> which is a fix and modifies some of the same lines.

So please resend the fix without the cleanup, otherwise we can
get conflicts when backporting the fix to the stable trees.

^ permalink raw reply

* Re: [PATCH 05/32] fs: introduce new ->get_poll_head and ->poll_mask methods
From: Al Viro @ 2018-04-06  4:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel
In-Reply-To: <20180330150809.28094-6-hch@lst.de>

On Fri, Mar 30, 2018 at 05:07:42PM +0200, Christoph Hellwig wrote:

> +  get_poll_head: Returns the struct wait_queue_head that poll, select,
> +  epoll or aio poll should wait on in case this instance only has single
> +  waitqueue.  Can return NULL to indicate polling is not supported,
> +  or a POLL* value using the POLL_TO_PTR helper in case a grave error
> +  occured and ->poll_mask shall not be called.

> +		if (IS_ERR(head))
> +			return PTR_TO_POLL(head);

> + * ->get_poll_head can return a __poll_t in the PTR_ERR, use these macros
> + * to return the value and recover it.  It takes care of the negation as
> + * well as off the annotations.
> + */
> +#define POLL_TO_PTR(mask)	(ERR_PTR(-(__force int)(mask)))

Uh-oh...
static inline bool __must_check IS_ERR(__force const void *ptr)
{
        return IS_ERR_VALUE((unsigned long)ptr);
}
#define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
#define MAX_ERRNO       4095

IOW, your trick relies upon arguments of PTR_TO_POLL being no greater
than 4095.  Now, consider
#define EPOLLRDHUP      (__force __poll_t)0x00002000
which is to say, 8192...

So anything that tries e.g. POLL_TO_PTR(EPOLLRDHUP | EPOLLERR) will be in
for a quiet unpleasant surprise...

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply

* Re: [PATCH net-next 6/6] netdevsim: Add simple FIB resource controller via devlink
From: Jiri Pirko @ 2018-04-06  5:35 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski,
	andy.roulin
In-Reply-To: <a916f016-5d8b-3f56-0a84-95d1712bec4c@cumulusnetworks.com>

Thu, Apr 05, 2018 at 10:10:29PM CEST, dsa@cumulusnetworks.com wrote:
>On 4/5/18 11:27 AM, Jiri Pirko wrote:
>> Wed, Mar 28, 2018 at 03:22:00AM CEST, dsa@cumulusnetworks.com wrote:
>>> Add devlink support to netdevsim and use it to implement a simple,
>>> profile based resource controller. Only one controller is needed
>>> per namespace, so the first netdevsim netdevice in a namespace
>>> registers with devlink. If that device is deleted, the resource
>>> settings are deleted.
>> 
>> I don't understand why you add 1:1 fixed relationship between
>> netnamespace and devlink instance. That is highly misleading and reader
>> might think that those 2 are somehow related. They are not. You can have
>> multiple devlink instances for many ports in a single namespace.
>
>The netdevsim devlink instance is an example of limiting the number of
>FIB entries and FIB rules for a namespace. It is currently limited to
>the init_net based on past discussion.
>
>It does not make sense to have multiple resource controllers for the
>same network namespace, hence the limit of only registering with devlink
>on the first device create.

Devlink instance represents an ASIC. 1:1. There is no relation with
network namespaces and should not be. I have no clue why you think so.

The model looks as I described it down below in the picture.


>
>> 
>> Could you please clarify?
>> 
>> Also, to see the relationship between individual netdevsim netdevices
>> and the parent devlink instance, we should use devlink_port
>> instances, like this: 
>> 
>>       devlink1              devlink2
>>        |    |                |    |
>>  dl_port1_1 dlport1_2   dlport2_1 dlport2_2
>>        |    |                |    |
>>      eth0  eth1             eth2 eth3
>> 
>> Note that "devlink instance" reprensents one ASIC.
>> The address of the devlink instance is the bus address of the ASIC.
>> Here, you use address of some/first netdevsim netdev instance.
>
>The ASIC here is the kernel tables in a namespace. It does not make
>sense to have 2 devlink instances for a single namespace.

Again. No clue why you build relationship with namespace.


>
>> 
>> The way it is implemented in netdevsim by this patch is wrong on
>> so many levels :(
>> 
>> Could you please fix this? I'm more than happy to help you with this,
>> please say so. Thanks!
>
>What is there to fix?
>
>Not creating a netdevsim device per netdevsim netdevice? That is
>completely unrelated to the devlink change.

To fit the model. Multiple devlink instances, each representing one
"virtual" ASIC, devlink_port instances, 1 for each netdevsim port.
Netdevsim port should simulate real devices. No real device should have
1:1 relation with network namespace. That is just simply wrong.

^ permalink raw reply

* Re: [PATCH net-next 6/6] netdevsim: Add simple FIB resource controller via devlink
From: Jiri Pirko @ 2018-04-06  5:52 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski,
	andy.roulin
In-Reply-To: <e9c59b0c-328e-d343-6e8d-d19f643d2e9d@cumulusnetworks.com>

Thu, Apr 05, 2018 at 11:06:41PM CEST, dsa@cumulusnetworks.com wrote:
>On 4/5/18 2:10 PM, David Ahern wrote:
>> 
>> The ASIC here is the kernel tables in a namespace. It does not make
>> sense to have 2 devlink instances for a single namespace.
>
>I put this example controller in netdevsim per a suggestion from Ido.
>The netdevsim seemed like a good idea given that modules intention --
>testing network facilities. Perhaps I should have done this as a
>completely standalone module ...
>
>The intention is to treat the kernel's tables *per namespace* as a
>standalone entity that can be managed very similar to ASIC resources.

So you say you want to treat a namespace as an ASIC? That sounds very
odd to me :/


>Given that I can add a resource controller module
>(drivers/net/kern_res_mgr.c?) that creates a 'struct device' per network
>namespace with a devlink instance. In this case the device would very
>much be tied to the namespace 1:1.

That sounds more reasonable and accurate, yet still odd. You would not
have any netdevices there? Any ports?

^ permalink raw reply

* [PATCH net 0/3] lan78xx: Fixes and enhancements
From: Raghuram Chary J @ 2018-04-06  6:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli

These series of patches have fix and enhancements for
lan78xx driver.

Raghuram Chary J (3):
  lan78xx: PHY DSP registers initialization to address EEE link drop
    issues with long cables
  lan78xx: Add support to dump lan78xx registers
  lan78xx: Lan7801 Support for Fixed PHY

 drivers/net/phy/microchip.c  | 123 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/usb/Kconfig      |   1 +
 drivers/net/usb/lan78xx.c    |  93 ++++++++++++++++++++++++++++++--
 include/linux/microchipphy.h |   8 +++
 4 files changed, 220 insertions(+), 5 deletions(-)

-- 
2.16.2

^ permalink raw reply

* [PATCH net 1/3] lan78xx: PHY DSP registers initialization to address EEE link drop issues with long cables
From: Raghuram Chary J @ 2018-04-06  6:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180406061204.18257-1-raghuramchary.jallipalli@microchip.com>

The patch is to configure DSP registers of PHY device
to handle Gbe-EEE failures with >40m cable length.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
 drivers/net/phy/microchip.c  | 123 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/microchipphy.h |   8 +++
 2 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
index 0f293ef28935..174ae9808722 100644
--- a/drivers/net/phy/microchip.c
+++ b/drivers/net/phy/microchip.c
@@ -20,6 +20,7 @@
 #include <linux/ethtool.h>
 #include <linux/phy.h>
 #include <linux/microchipphy.h>
+#include <linux/delay.h>
 
 #define DRIVER_AUTHOR	"WOOJUNG HUH <woojung.huh@microchip.com>"
 #define DRIVER_DESC	"Microchip LAN88XX PHY driver"
@@ -66,6 +67,107 @@ static int lan88xx_suspend(struct phy_device *phydev)
 	return 0;
 }
 
+static void lan88xx_TR_reg_set(struct phy_device *phydev, u16 regaddr,
+			       u32 data)
+{
+	int val;
+	u16 buf;
+
+	/* Get access to token ring page */
+	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+		  LAN88XX_EXT_PAGE_ACCESS_TR);
+
+	phy_write(phydev, LAN88XX_EXT_PAGE_TR_LOW_DATA, (data & 0xFFFF));
+	phy_write(phydev, LAN88XX_EXT_PAGE_TR_HIGH_DATA,
+		  (data & 0x00FF0000) >> 16);
+
+	/* Config control bits [15:13] of register */
+	buf = (regaddr & ~(0x3 << 13));/* Clr [14:13] to write data in reg */
+	buf |= 0x8000; /* Set [15] to Packet transmit */
+
+	phy_write(phydev, LAN88XX_EXT_PAGE_TR_CR, buf);
+
+	usleep_range(1000, 2000);/* Wait for Data to be written */
+	val = phy_read(phydev, LAN88XX_EXT_PAGE_TR_CR);
+	if (!(val & 0x8000))
+		pr_warn("TR Register[0x%X] configuration failed\n", regaddr);
+
+	/* Setting to Main page registers space*/
+	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_0);
+}
+
+static void lan88xx_config_TR_regs(struct phy_device *phydev)
+{
+	/* Get access to Channel 0x1, Node 0xF , Register 0x01.
+	 * Write 24-bit value 0x12B00A to register. Setting MrvlTrFix1000Kf,
+	 * MrvlTrFix1000Kp, MasterEnableTR bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x0F82, 0x12B00A);
+
+	/* Get access to Channel b'10, Node b'1101, Register 0x06.
+	 * Write 24-bit value 0xD2C46F to register. Setting SSTrKf1000Slv,
+	 * SSTrKp1000Mas bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x168C, 0xD2C46F);
+
+	/* Get access to Channel b'10, Node b'1111, Register 0x11.
+	 * Write 24-bit value 0x620 to register. Setting rem_upd_done_thresh
+	 * bits
+	 */
+	lan88xx_TR_reg_set(phydev, 0x17A2, 0x620);
+
+	/* Get access to Channel b'10, Node b'1101, Register 0x10.
+	 * Write 24-bit value 0xEEFFDD to register. Setting
+	 * eee_TrKp1Long_1000, eee_TrKp2Long_1000, eee_TrKp3Long_1000,
+	 * eee_TrKp1Short_1000,eee_TrKp2Short_1000, eee_TrKp3Short_1000 bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x16A0, 0xEEFFDD);
+
+	/* Get access to Channel b'10, Node b'1101, Register 0x13.
+	 * Write 24-bit value 0x071448 to register. Setting
+	 * slv_lpi_tr_tmr_val1, slv_lpi_tr_tmr_val2 bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x16A6, 0x071448);
+
+	/* Get access to Channel b'10, Node b'1101, Register 0x12.
+	 * Write 24-bit value 0x13132F to register. Setting
+	 * slv_sigdet_timer_val1, slv_sigdet_timer_val2 bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x16A4, 0x13132F);
+
+	/* Get access to Channel b'10, Node b'1101, Register 0x14.
+	 * Write 24-bit value 0x0 to register. Setting eee_3level_delay,
+	 * eee_TrKf_freeze_delay bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x16A8, 0x0);
+
+	/* Get access to Channel b'01, Node b'1111, Register 0x34.
+	 * Write 24-bit value 0x91B06C to register. Setting
+	 * FastMseSearchThreshLong1000, FastMseSearchThreshShort1000,
+	 * FastMseSearchUpdGain1000 bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x0FE8, 0x91B06C);
+
+	/* Get access to Channel b'01, Node b'1111, Register 0x3E.
+	 * Write 24-bit value 0xC0A028 to register. Setting
+	 * FastMseKp2ThreshLong1000, FastMseKp2ThreshShort1000,
+	 * FastMseKp2UpdGain1000, FastMseKp2ExitEn1000 bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x0FFC, 0xC0A028);
+
+	/* Get access to Channel b'01, Node b'1111, Register 0x35.
+	 * Write 24-bit value 0x041600 to register. Setting
+	 * FastMseSearchPhShNum1000, FastMseSearchClksPerPh1000,
+	 * FastMsePhChangeDelay1000 bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x0FEA, 0x041600);
+
+	/* Get access to Channel b'10, Node b'1101, Register 0x03.
+	 * Write 24-bit value 0x000004 to register. Setting TrFreeze bits.
+	 */
+	lan88xx_TR_reg_set(phydev, 0x1686, 0x000004);
+}
+
 static int lan88xx_probe(struct phy_device *phydev)
 {
 	struct device *dev = &phydev->mdio.dev;
@@ -132,6 +234,25 @@ static void lan88xx_set_mdix(struct phy_device *phydev)
 	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_0);
 }
 
+static int lan88xx_config_init(struct phy_device *phydev)
+{
+	int val;
+
+	genphy_config_init(phydev);
+	/*Zerodetect delay enable */
+	val = phy_read_mmd(phydev, MDIO_MMD_PCS,
+			   PHY_ARDENNES_MMD_DEV_3_PHY_CFG);
+	val |= PHY_ARDENNES_MMD_DEV_3_PHY_CFG_ZD_DLY_EN_;
+
+	phy_write_mmd(phydev, MDIO_MMD_PCS, PHY_ARDENNES_MMD_DEV_3_PHY_CFG,
+		      val);
+
+	/* Config DSP registers */
+	lan88xx_config_TR_regs(phydev);
+
+	return 0;
+}
+
 static int lan88xx_config_aneg(struct phy_device *phydev)
 {
 	lan88xx_set_mdix(phydev);
@@ -151,7 +272,7 @@ static struct phy_driver microchip_phy_driver[] = {
 	.probe		= lan88xx_probe,
 	.remove		= lan88xx_remove,
 
-	.config_init	= genphy_config_init,
+	.config_init	= lan88xx_config_init,
 	.config_aneg	= lan88xx_config_aneg,
 
 	.ack_interrupt	= lan88xx_phy_ack_interrupt,
diff --git a/include/linux/microchipphy.h b/include/linux/microchipphy.h
index eb492d47f717..8f9c90379732 100644
--- a/include/linux/microchipphy.h
+++ b/include/linux/microchipphy.h
@@ -70,4 +70,12 @@
 #define	LAN88XX_MMD3_CHIP_ID			(32877)
 #define	LAN88XX_MMD3_CHIP_REV			(32878)
 
+/* DSP registers */
+#define PHY_ARDENNES_MMD_DEV_3_PHY_CFG		(0x806A)
+#define PHY_ARDENNES_MMD_DEV_3_PHY_CFG_ZD_DLY_EN_	(0x2000)
+#define LAN88XX_EXT_PAGE_ACCESS_TR		(0x52B5)
+#define LAN88XX_EXT_PAGE_TR_CR			16
+#define LAN88XX_EXT_PAGE_TR_LOW_DATA		17
+#define LAN88XX_EXT_PAGE_TR_HIGH_DATA		18
+
 #endif /* _MICROCHIPPHY_H */
-- 
2.16.2

^ permalink raw reply related

* [PATCH net 2/3] lan78xx: Add support to dump lan78xx registers
From: Raghuram Chary J @ 2018-04-06  6:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180406061204.18257-1-raghuramchary.jallipalli@microchip.com>

In order to dump lan78xx family registers using ethtool, add
support at lan78xx driver level.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
 drivers/net/usb/lan78xx.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 55a78eb96961..e3cc3b504c87 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -278,6 +278,30 @@ struct lan78xx_statstage64 {
 	u64 eee_tx_lpi_time;
 };
 
+u32 lan78xx_regs[] = {
+	ID_REV,
+	INT_STS,
+	HW_CFG,
+	PMT_CTL,
+	E2P_CMD,
+	E2P_DATA,
+	USB_STATUS,
+	VLAN_TYPE,
+	MAC_CR,
+	MAC_RX,
+	MAC_TX,
+	FLOW,
+	ERR_STS,
+	MII_ACC,
+	MII_DATA,
+	EEE_TX_LPI_REQ_DLY,
+	EEE_TW_TX_SYS,
+	EEE_TX_LPI_REM_DLY,
+	WUCSR
+};
+
+#define PHY_REG_SIZE (32 * sizeof(u32))
+
 struct lan78xx_net;
 
 struct lan78xx_priv {
@@ -1604,6 +1628,31 @@ static int lan78xx_set_pause(struct net_device *net,
 	return ret;
 }
 
+static int lan78xx_get_regs_len(struct net_device *netdev)
+{
+	return (sizeof(lan78xx_regs) + PHY_REG_SIZE);
+}
+
+static void
+lan78xx_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
+		 void *buf)
+{
+	u32 *data = buf;
+	int i, j;
+	struct lan78xx_net *dev = netdev_priv(netdev);
+
+	/* Read Device/MAC registers */
+	for (i = 0, j = 0; i < (sizeof(lan78xx_regs) / sizeof(u32)); i++, j++)
+		lan78xx_read_reg(dev, lan78xx_regs[i], &data[j]);
+
+	if (!netdev->phydev)
+		return;
+
+	/* Read PHY registers */
+	for (i = 0; i < 32; i++, j++)
+		data[j] = phy_read(netdev->phydev, i);
+}
+
 static const struct ethtool_ops lan78xx_ethtool_ops = {
 	.get_link	= lan78xx_get_link,
 	.nway_reset	= phy_ethtool_nway_reset,
@@ -1624,6 +1673,8 @@ static const struct ethtool_ops lan78xx_ethtool_ops = {
 	.set_pauseparam	= lan78xx_set_pause,
 	.get_link_ksettings = lan78xx_get_link_ksettings,
 	.set_link_ksettings = lan78xx_set_link_ksettings,
+	.get_regs_len	= lan78xx_get_regs_len,
+	.get_regs	= lan78xx_get_regs,
 };
 
 static int lan78xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
-- 
2.16.2

^ permalink raw reply related

* [PATCH net 3/3] lan78xx: Lan7801 Support for Fixed PHY
From: Raghuram Chary J @ 2018-04-06  6:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180406061204.18257-1-raghuramchary.jallipalli@microchip.com>

Adding Fixed PHY support to the lan78xx driver.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
 drivers/net/usb/Kconfig   |  1 +
 drivers/net/usb/lan78xx.c | 42 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index f28bd74ac275..418b0904cecb 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -111,6 +111,7 @@ config USB_LAN78XX
 	select MII
 	select PHYLIB
 	select MICROCHIP_PHY
+	select FIXED_PHY
 	help
 	  This option adds support for Microchip LAN78XX based USB 2
 	  & USB 3 10/100/1000 Ethernet adapters.
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index e3cc3b504c87..e67b2dabde66 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -36,7 +36,7 @@
 #include <linux/irq.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/microchipphy.h>
-#include <linux/phy.h>
+#include <linux/phy_fixed.h>
 #include "lan78xx.h"
 
 #define DRIVER_AUTHOR	"WOOJUNG HUH <woojung.huh@microchip.com>"
@@ -426,6 +426,7 @@ struct lan78xx_net {
 	struct statstage	stats;
 
 	struct irq_domain_data	domain_data;
+	struct phy_device       *fixedphy;
 };
 
 /* define external phy id */
@@ -2058,11 +2059,39 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 	int ret;
 	u32 mii_adv;
 	struct phy_device *phydev;
+	struct fixed_phy_status fphy_status = {
+		.link = 1,
+		.speed = SPEED_1000,
+		.duplex = DUPLEX_FULL,
+	};
 
 	phydev = phy_find_first(dev->mdiobus);
 	if (!phydev) {
-		netdev_err(dev->net, "no PHY found\n");
-		return -EIO;
+		if (dev->chipid == ID_REV_CHIP_ID_7801_) {
+			u32 buf;
+
+			netdev_info(dev->net, "PHY Not Found!! Registering Fixed PHY\n");
+			phydev = fixed_phy_register(PHY_POLL, &fphy_status, -1,
+						    NULL);
+			if (IS_ERR(phydev)) {
+				netdev_err(dev->net, "No PHY/fixed_PHY found\n");
+				return -ENODEV;
+			}
+			netdev_info(dev->net, "Registered FIXED PHY\n");
+			dev->interface = PHY_INTERFACE_MODE_RGMII;
+			dev->fixedphy = phydev;
+			ret = lan78xx_write_reg(dev, MAC_RGMII_ID,
+						MAC_RGMII_ID_TXC_DELAY_EN_);
+			ret = lan78xx_write_reg(dev, RGMII_TX_BYP_DLL, 0x3D00);
+			ret = lan78xx_read_reg(dev, HW_CFG, &buf);
+			buf |= HW_CFG_CLK125_EN_;
+			buf |= HW_CFG_REFCLK25_EN_;
+			ret = lan78xx_write_reg(dev, HW_CFG, buf);
+			goto phyinit;
+		} else {
+			netdev_err(dev->net, "no PHY found\n");
+			return -EIO;
+		}
 	}
 
 	if ((dev->chipid == ID_REV_CHIP_ID_7800_) ||
@@ -2100,7 +2129,7 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 		ret = -EIO;
 		goto error;
 	}
-
+phyinit:
 	/* if phyirq is not set, use polling mode in phylib */
 	if (dev->domain_data.phyirq > 0)
 		phydev->irq = dev->domain_data.phyirq;
@@ -3559,6 +3588,11 @@ static void lan78xx_disconnect(struct usb_interface *intf)
 	udev = interface_to_usbdev(intf);
 
 	net = dev->net;
+
+	if (dev->fixedphy) {
+		fixed_phy_unregister(dev->fixedphy);
+		dev->fixedphy = NULL;
+	}
 	unregister_netdev(net);
 
 	cancel_delayed_work_sync(&dev->wq);
-- 
2.16.2

^ permalink raw reply related

* Re: marvell switch
From: Ran Shalit @ 2018-04-06  7:23 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180405204622.GE17495@lunn.ch>

On Thu, Apr 5, 2018 at 11:46 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> > Hi Ran
>> >
>> > The Marvell driver makes each port act like a normal Linux network
>> > interface. So if you want to enable a port, do
>> >
>> > ip link set lan0 up
>> >
>> > Want to add an ip address to a port
>> >
>> > ip addr add 10.42.42.42/24 dev lan0
>> >
>> > Want to bridge two ports
>> >
>> > ip link add name br0 type bridge
>> > ip link set dev br0 up
>> > ip link set dev lan0 master br0
>> > ip link set dev lan1 master br0
>> >
>> > Just treat them as normal interfaces.
>> >
>>
>> If I may please ask,
>> What is the purpose of using bridge for configuring switch interfaces.
>> Is it in order to isolate some ports from others?
>> I ask because according to my understanding the default configuration of
>> the driver is to enable switch in "flat" configuration, i.e. as if all
>> ports are connected to each other.
>
> Please think about what i said. They are standard Linux network
> interfaces. Do standard Linux network interfaces bridge themselves
> together by default? No, you need to configure a bridge.
>
>          Andrew

I understand now...
Thank you very much.
ranran

^ permalink raw reply

* Re: WARNING in xfrm6_tunnel_net_exit
From: syzbot @ 2018-04-06  7:55 UTC (permalink / raw)
  To: davem, herbert, kuznet, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs, yoshfuji
In-Reply-To: <001a11479e32f4f8c50561f21514@google.com>

syzbot has found reproducer for the following crash on upstream commit
3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 (Sat Mar 31 01:52:36 2018 +0000)
kernel.h: Retain constant expression output for max()/min()
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=777bf170a89e7b326405

So far this crash happened 10982 times on linux-next, mmots, net-next,  
upstream.
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5399809707999232
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4550974920196096
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-1647968177339044852
compiler: gcc (GCC) 8.0.1 20180301 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
WARNING: CPU: 0 PID: 180 at net/ipv6/xfrm6_tunnel.c:345  
xfrm6_tunnel_net_exit+0x2c0/0x4f0 net/ipv6/xfrm6_tunnel.c:345
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 180 Comm: kworker/u4:4 Not tainted 4.16.0+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x1b9/0x29f lib/dump_stack.c:53
  panic+0x22f/0x4de kernel/panic.c:183
  __warn.cold.8+0x163/0x1a3 kernel/panic.c:547
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1bc/0x470 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:991
RIP: 0010:xfrm6_tunnel_net_exit+0x2c0/0x4f0 net/ipv6/xfrm6_tunnel.c:345
RSP: 0018:ffff8801d96373d8 EFLAGS: 00010293
RAX: ffff8801d961c080 RBX: ffff8801b0e999a0 RCX: ffffffff866b08c6
RDX: 0000000000000000 RSI: ffffffff866b08d0 RDI: 0000000000000007
RBP: ffff8801d96374f8 R08: ffff8801d961c080 R09: ffffed003b6046c2
R10: 0000000000000003 R11: 0000000000000003 R12: 000000000000007c
R13: ffffed003b2c6e82 R14: ffff8801d96374d0 R15: ffff8801b6185f80
  ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
  cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
  kthread+0x345/0x410 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

^ permalink raw reply

* [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
From: haibinzhang(张海斌) @ 2018-04-06  8:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: lidongchen(陈立东),
	yunfangtai(台运方)

handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
polling udp packets with small length(e.g. 1byte udp payload), because setting
VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.

Ping-Latencies shown below were tested between two Virtual Machines using
netperf (UDP_STREAM, len=1), and then another machine pinged the client:

Packet-Weight      Ping-Latencies(millisecond)
                   min      avg       max
Origin           3.319   18.489    57.303
64               1.643    2.021     2.552
128              1.825    2.600     3.224
256              1.997    2.710     4.295
512              1.860    3.171     4.631
1024             2.002    4.173     9.056
2048             2.257    5.650     9.688
4096             2.093    8.508    15.943

Ring size is a hint from device about a burst size it can tolerate. Based on
benchmarks, set the weight to 2 * vq size.

To evaluate this change, another tests were done using netperf(RR, TX) between
two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
tweaked through qemu. Results shown below does not show obvious changes.

vq size=256 TCP_RR                vq size=512 TCP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -7%/        -2%      1/       1/   0%/        -2%
   1/       4/  +1%/         0%      1/       4/  +1%/         0%
   1/       8/  +1%/        -2%      1/       8/   0%/        +1%
  64/       1/  -6%/         0%     64/       1/  +7%/        +3%
  64/       4/   0%/        +2%     64/       4/  -1%/        +1%
  64/       8/   0%/         0%     64/       8/  -1%/        -2%
 256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
 256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
 256/       8/  +2%/         0%    256/       8/  +1%/        -1%

vq size=256 UDP_RR                vq size=512 UDP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
   1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
   1/       8/  -1%/        -1%      1/       8/  -1%/         0%
  64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
  64/       4/  -5%/        -1%     64/       4/  +2%/         0%
  64/       8/   0%/        -1%     64/       8/  -2%/        +1%
 256/       1/  +7%/        +1%    256/       1/  -7%/         0%
 256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
 256/       8/  +2%/        +2%    256/       8/  +1%/        +1%

vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
  64/       1/   0%/        -3%     64/       1/   0%/         0%
  64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
  64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
 256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
 256/       4/  -1%/        -1%    256/       4/  -3%/         0%
 256/       8/  +7%/        +5%    256/       8/  -3%/         0%
 512/       1/  +1%/         0%    512/       1/  -1%/        -1%
 512/       4/  +1%/        -1%    512/       4/   0%/         0%
 512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
4096/       4/  +2%/         0%   4096/       4/   0%/         0%
4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%

Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 drivers/vhost/net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8139bc70ad7d..3563a305cc0a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+/* Max number of packets transferred before requeueing the job.
+ * Using this limit prevents one virtqueue from starving rx. */
+#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
+
 /* MAX number of TX used buffers for outstanding zerocopy */
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
@@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	int sent_pkts = 0;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
+		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
+		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
-- 
2.12.3


^ permalink raw reply related

* Re: Best userspace programming API for XDP features query to kernel?
From: Jesper Dangaard Brouer via iovisor-dev @ 2018-04-06  8:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: oisf-devel-ZwoEplunGu2j570ONfqVQLVmwVP6tfMwSoIsB4E12gc,
	Alexei Starovoitov, Victor Julien, netdev-u79uwXL29TY76Z2rM5mHXA,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org,
	Peter Manev, Jiri Benc, Saeed Mahameed, Eric Leblond,
	Daniel Borkmann
In-Reply-To: <20180405144716.2bdd4214-68UzVGuGftmUSpRRplVxJ1aTQe2KTcn/@public.gmane.org>

On Thu, 5 Apr 2018 14:47:16 -0700
Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:

> On Thu, 5 Apr 2018 22:51:33 +0200, Jesper Dangaard Brouer wrote:
> > > What about nfp in terms of XDP
> > > offload capabilities, should they be included as well or is probing to load
> > > the program and see if it loads/JITs as we do today just fine (e.g. you'd
> > > otherwise end up with extra flags on a per BPF helper basis)?    
> > 
> > No, flags per BPF helper basis. As I've described above, helper belong
> > to the BPF core, not the driver.  Here I want to know what the specific
> > driver support.  
> 
> I think Daniel meant for nfp offload.  The offload restrictions are
> quite involved, are we going to be able to express those?

Let's keep thing separate.

I'm requesting something really simple.  I want the driver to tell me
what XDP actions it supports.  We/I can implement an XDP_QUERY_ACTIONS
via ndo_bpf, problem solved.  It is small, specific and simple.

For my other use-case of enabling XDP-xmit on a device, I can
implement another ndo_bpf extension. Current approach today is loading
a dummy XDP prog via ndo_bpf anyway (which is awkward). Again a
specific change that let us move one-step further.

For your nfp offload use-case, you/we have to find a completely
different solution.  You have hit a design choice made by BPF.
Which is that BPF is part of the core kernel, and helpers cannot be
loaded as kernel modules.  As we cannot remove or add helpers after the
verifier certified the program.  And basically your nfp offload driver
comes as a kernel module.
 (Details: and you basically already solved your issue by modifying the
core verifier to do a call back to bpf_prog_offload_verify_insn()).
Point being this is very different from what I'm requesting.  Thus, for
offloading you already have a solution, to my setup time detect
problem, as your program gets rejected setup/load time by the verifier.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox