Netdev List

Netdev List
 help / color / mirror / Atom feed

* KASAN: use-after-free Read in sk_psock_unlink
From: syzbot @ 2018-10-26  7:37 UTC (permalink / raw)
  To: daniel, davem, john.fastabend, linux-kernel, netdev,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    8c60c36d0b8c Add linux-next specific files for 20181019
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17356dbd400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8b6d7c4c81535e89
dashboard link: https://syzkaller.appspot.com/bug?extid=3acd9f67a6a15766686e
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3acd9f67a6a15766686e@syzkaller.appspotmail.com

EXT4-fs (loop5): couldn't mount RDWR because of unsupported optional  
features (80)
EXT4-fs (loop5): couldn't mount RDWR because of unsupported optional  
features (80)
input: \x02 as /devices/virtual/input/input21
==================================================================
BUG: KASAN: use-after-free in sk_psock_unlink+0x4d8/0x700  
net/core/sock_map.c:992
Read of size 4 at addr ffff8801b7fc1018 by task syz-executor4/21409

CPU: 0 PID: 21409 Comm: syz-executor4 Not tainted 4.19.0-rc8-next-20181019+  
#98
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
  sk_psock_unlink+0x4d8/0x700 net/core/sock_map.c:992
  tcp_bpf_remove+0xd0/0x130 net/ipv4/tcp_bpf.c:511
  tcp_bpf_close+0x1c6/0x4a0 net/ipv4/tcp_bpf.c:551
  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:428
  __sock_release+0xd7/0x250 net/socket.c:580
  sock_close+0x19/0x20 net/socket.c:1142
  __fput+0x3bc/0xa70 fs/file_table.c:279
  ____fput+0x15/0x20 fs/file_table.c:312
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  get_signal+0x1550/0x1970 kernel/signal.c:2347
  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fd88965ec78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: 0000000000280000 RBX: 0000000000000006 RCX: 0000000000457569
RDX: fffffffffffffe6e RSI: 0000000020a88f88 RDI: 0000000000000003
RBP: 000000000072bf00 R08: 0000000020e68000 R09: 0000000000000010
R10: 0000000020000000 R11: 0000000000000246 R12: 00007fd88965f6d4
R13: 00000000004c3915 R14: 00000000004d57c0 R15: 00000000ffffffff

Allocated by task 21423:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_trace+0x152/0x750 mm/slab.c:3620
  kmalloc include/linux/slab.h:546 [inline]
  kzalloc include/linux/slab.h:741 [inline]
  sock_hash_alloc+0x1eb/0x5a0 net/core/sock_map.c:801
  find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
  map_create+0x3bd/0x1100 kernel/bpf/syscall.c:509
  __do_sys_bpf kernel/bpf/syscall.c:2394 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2371 [inline]
  __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2371
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 10109:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xcf/0x230 mm/slab.c:3817
  sock_hash_free+0x450/0x640 net/core/sock_map.c:864
  bpf_map_free_deferred+0xd9/0x110 kernel/bpf/syscall.c:290
  process_one_work+0xc8b/0x1c40 kernel/workqueue.c:2153
  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
  kthread+0x35a/0x440 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff8801b7fc1000
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 24 bytes inside of
  512-byte region [ffff8801b7fc1000, ffff8801b7fc1200)
The buggy address belongs to the page:
page:ffffea0006dff040 count:1 mapcount:0 mapping:ffff8801da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006ee5888 ffffea00071fc6c8 ffff8801da800940
raw: 0000000000000000 ffff8801b7fc1000 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801b7fc0f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801b7fc0f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff8801b7fc1000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
  ffff8801b7fc1080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801b7fc1100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH 3/3] usb: renesas_usbhs: Remove dummy runtime PM callbacks
From: Wolfram Sang @ 2018-10-25 22:57 UTC (permalink / raw)
  To: Jarkko Nikula, Yoshihiro Shimoda
  Cc: linux-pm, linux-i2c, Wolfram Sang, netdev, David S . Miller,
	Sergei Shtylyov, linux-renesas-soc, linux-usb, Yoshihiro Shimoda
In-Reply-To: <20181024135134.28456-4-jarkko.nikula@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

On Wed, Oct 24, 2018 at 04:51:34PM +0300, Jarkko Nikula wrote:
> Platform drivers don't need dummy runtime PM callbacks that just return
> success in order to have runtime PM happening. This has changed since
> following commits:
> 
> 05aa55dddb9e ("PM / Runtime: Lenient generic runtime pm callbacks")
> 543f2503a956 ("PM / platform_bus: Allow runtime PM by default")
> 8b313a38ecff ("PM / Platform: Use generic runtime PM callbacks directly")
> 
> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

>  static const struct dev_pm_ops usbhsc_pm_ops = {
>  	.suspend		= usbhsc_suspend,
>  	.resume			= usbhsc_resume,

Unrelated to this patch, but I wonder right now: is there a reason not
to use SET_SYSTEM_SLEEP_PM_OPS here? Shimoda-san?


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 bpf] bpf: devmap: fix wrong interface selection in notifier_call
From: Daniel Borkmann @ 2018-10-25 22:56 UTC (permalink / raw)
  To: Taehee Yoo, ast; +Cc: netdev, john.fastabend
In-Reply-To: <20181024111517.13361-1-ap420073@gmail.com>

On 10/24/2018 01:15 PM, Taehee Yoo wrote:
> The dev_map_notification() removes interface in devmap if
> unregistering interface's ifindex is same.
> But only checking ifindex is not enough because other netns can have
> same ifindex. so that wrong interface selection could occurred.
> Hence netdev pointer comparison code is added.
> 
> v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann)
> v1: Initial patch
> 
> Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

Applied to bpf, thanks Taehee!

^ permalink raw reply

* Re: [PATCH] selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS
From: Daniel Borkmann @ 2018-10-25 22:55 UTC (permalink / raw)
  To: Naresh Kamboju, netdev; +Cc: bhole_prashant_q7, davem
In-Reply-To: <1540478848-27827-1-git-send-email-naresh.kamboju@linaro.org>

On 10/25/2018 04:47 PM, Naresh Kamboju wrote:
> BPF sockmap and hashmap are dependent on CONFIG_BPF_STREAM_PARSER and
> xskmap is dependent on CONFIG_XDP_SOCKETS
> 
> Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Applied to bpf, thanks Naresh!

^ permalink raw reply

* Re: [PATCH 2/3] net: ethernet: Remove dummy runtime PM callbacks from Renesas drivers
From: Wolfram Sang @ 2018-10-25 22:54 UTC (permalink / raw)
  To: Jarkko Nikula
  Cc: linux-pm, linux-i2c, Wolfram Sang, netdev, David S . Miller,
	Sergei Shtylyov, linux-renesas-soc, linux-usb, Yoshihiro Shimoda
In-Reply-To: <20181024135134.28456-3-jarkko.nikula@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 584 bytes --]

On Wed, Oct 24, 2018 at 04:51:33PM +0300, Jarkko Nikula wrote:
> Platform drivers don't need dummy runtime PM callbacks that just return
> success in order to have runtime PM happening. This has changed since
> following commits:
> 
> 05aa55dddb9e ("PM / Runtime: Lenient generic runtime pm callbacks")
> 543f2503a956 ("PM / platform_bus: Allow runtime PM by default")
> 8b313a38ecff ("PM / Platform: Use generic runtime PM callbacks directly")
> 
> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports
From: Jay Vosburgh @ 2018-10-25 22:40 UTC (permalink / raw)
  To: Chas Williams; +Cc: davem, netdev, vfalico, andy, jiri, kuznet, yoshfuji
In-Reply-To: <b8ae5ea8-985f-432c-e90a-7cd9b9da6009@gmail.com>

Chas Williams <3chas3@gmail.com> wrote:

>On 10/25/2018 05:59 PM, Jay Vosburgh wrote:
>> Chas Williams <3chas3@gmail.com> wrote:
>>
>>> netif_is_lag_port should be used to identify link aggregation ports.
>>> For this to work, we need to reorganize the bonding and team drivers
>>> so that the necessary flags are set before dev_open is called.
>>>
>>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>>> made this decision originally based on the IFF_SLAVE flag which isn't
>>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>>> check for the eql driver.
>>
>> 	Is 31e77c93e432 the correct commit reference?  I don't see
>> anything in there about IFF_SLAVE or bonding; it's a patch to the
>> process scheduler.
>
>No, that's wrong.  It should be c2edacf80e155.
>
>> 	And, as Jiri said, the subject doesn't mention bonding.
>
>The behavior of bonding wasn't changed.  The intent of the patch
>is to add team slaves to the interfaces that don't get automatic
>IPv6 addresses.  The body discusses why bonding had to change as
>well.

	Sure, but the bonding code has changed, and the current
presentation makes it harder for reviewers to follow (or perhaps even
notice).

>I was under the impression that the subject needs to kept short.
>If there a better way to phrase what I want to do?

	I'd suggest splitting this into three patches: A first patch
that adds the new IPv6 functionality, then one patch each for team and
bonding to take advantage of that new functionality.  Each of the three
would then be very straightforward, change just one thing, and should be
clearer all around.

	-J

>>> Signed-off-by: Chas Williams <3chas3@gmail.com>
>>> ---
>>> drivers/net/bonding/bond_main.c | 4 ++--
>>> drivers/net/team/team.c         | 7 +++++--
>>> net/ipv6/addrconf.c             | 2 +-
>>> 3 files changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index ffa37adb7681..5cdad164332b 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>>
>>> 	/* set slave flag before open to prevent IPv6 addrconf */
>>> 	slave_dev->flags |= IFF_SLAVE;
>>> +	slave_dev->priv_flags |= IFF_BONDING;
>>>
>>> 	/* open the slave since the application closed it */
>>> 	res = dev_open(slave_dev);
>>> @@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>> 		goto err_restore_mac;
>>> 	}
>>>
>>> -	slave_dev->priv_flags |= IFF_BONDING;
>>> 	/* initialize slave stats */
>>> 	dev_get_stats(new_slave->dev, &new_slave->slave_stats);
>>>
>>> @@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>> 	slave_disable_netpoll(new_slave);
>>>
>>> err_close:
>>> -	slave_dev->priv_flags &= ~IFF_BONDING;
>>> 	dev_close(slave_dev);
>>>
>>> err_restore_mac:
>>> +	slave_dev->priv_flags &= ~IFF_BONDING;
>>> 	slave_dev->flags &= ~IFF_SLAVE;
>>> 	if (!bond->params.fail_over_mac ||
>>> 	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>>> index db633ae9f784..8fc7d57e9f6d 100644
>>> --- a/drivers/net/team/team.c
>>> +++ b/drivers/net/team/team.c
>>> @@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, struct team_port *port,
>>> 					   &lag_upper_info, extack);
>>> 	if (err)
>>> 		return err;
>>> -	port->dev->priv_flags |= IFF_TEAM_PORT;
>>> 	return 0;
>>> }
>>>
>>> static void team_upper_dev_unlink(struct team *team, struct team_port *port)
>>> {
>>> 	netdev_upper_dev_unlink(port->dev, team->dev);
>>> -	port->dev->priv_flags &= ~IFF_TEAM_PORT;
>>> }
>>>
>>> static void __team_port_change_port_added(struct team_port *port, bool linkup);
>>> @@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
>>> 		goto err_port_enter;
>>> 	}
>>>
>>> +	/* set slave flag before open to prevent IPv6 addrconf */
>>> +	port->dev->priv_flags |= IFF_TEAM_PORT;
>>> +
>>> 	err = dev_open(port_dev);
>>> 	if (err) {
>>> 		netdev_dbg(dev, "Device %s opening failed\n",
>>> @@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
>>> 	dev_close(port_dev);
>>>
>>> err_dev_open:
>>> +	port->dev->priv_flags &= ~IFF_TEAM_PORT;
>>> 	team_port_leave(team, port);
>>> 	team_port_set_orig_dev_addr(port);
>>>
>>> @@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
>>> 	dev_uc_unsync(port_dev, dev);
>>> 	dev_mc_unsync(port_dev, dev);
>>> 	dev_close(port_dev);
>>> +	port->dev->priv_flags &= ~IFF_TEAM_PORT;
>>> 	team_port_leave(team, port);
>>>
>>> 	__team_option_inst_mark_removed_port(team, port);
>>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>>> index 45b84dd5c4eb..121f863022ed 100644
>>> --- a/net/ipv6/addrconf.c
>>> +++ b/net/ipv6/addrconf.c
>>> @@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>>>
>>> 	case NETDEV_UP:
>>> 	case NETDEV_CHANGE:
>>> -		if (dev->flags & IFF_SLAVE)
>>> +		if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)
>>
>> 	Note that netvsc_vf_join() also uses IFF_SLAVE in order skip
>> IPv6 addrconf for netvsc devices; I don't believe its usage will pass
>> netif_is_lag_port().  It looks like the above will work, but your commit
>> message mentions eql as the reason for retaining the IFF_SLAVE test, and
>> eql isn't the only user of IFF_SLAVE in this manner.
>>
>> 	-J
>>
>>> 			break;
>>>
>>> 		if (idev && idev->cnf.disable_ipv6)
>>> -- 
>>> 2.14.4

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* RE: [PATCH] bonding:avoid repeated display of same link status change
From: Manish Kumar Singh @ 2018-10-26  6:49 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Eric Dumazet,
	Mahesh Bandewar (महेश बंडेवार),
	linux-netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, linux-kernel
In-Reply-To: <20181025092930.GC22291@unicorn.suse.cz>



> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@suse.cz]
> Sent: 25 अक्तूबर 2018 14:59
> To: Manish Kumar Singh
> Cc: Eric Dumazet; Mahesh Bandewar (महेश बंडेवार); linux-netdev; Jay
> Vosburgh; Veaceslav Falico; Andy Gospodarek; David S. Miller; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] bonding:avoid repeated display of same link status
> change
> 
> On Thu, Oct 25, 2018 at 02:21:05AM -0700, Manish Kumar Singh wrote:
> > > From: Michal Kubecek [mailto:mkubecek@suse.cz]
> > > IMHO it does not. AFAICS multiple instances of bond_mii_monitor()
> cannot
> > > run simultaneously for the same bond so that there doesn't seem to be
> > > anything to collide with. (And if they could, we would need to test and
> > > set the flag atomically in bond_miimon_inspect().)
> > >
> > Yes, Michal, we are inline with your understanding.
> > when the -original- patch was posted to upstream there was no
> > -synchronization- nor -racing- addressing code was in read/write of this
> > added filed, as we -never- saw need for either.
> >
> > -only- writer of the added field is bond_mii_monitor.
> > -only- reader of the added field is bond_miimon_inspect.
> > -this writer & reader -never- can run concurrently.
> > -writer invokes the reader.
> >
> > hence, imo uint_8 rtnl_needed is all what is needed; with
> bond_mii_monitor doing rtnl_needed = 1; and bond_miimon_inspect doing
> if rtnl_needed.
> >
> > here is the gravity of the situation with multiple customers whose names
> including machine names redacted:
> >
> >  4353 May 31 02:38:57 hostname kernel: ixgbe 0000:03:00.0: removed PHC
> on p2p1
> >  4354 May 31 02:38:57 hostname kernel: public: link status down for active
> interface p2p1, disabling it in 100 ms
> >  4355 May 31 02:38:57 hostname kernel: public: link status down for active
> interface p2p1, disabling it in 100 ms
> >  4356 May 31 02:38:57 hostname kernel: public: link status definitely down
> for interface p2p1, disabling it
> >  4357 May 31 02:38:57 hostname kernel: public: making interface p2p2 the
> new active one
> >  4358 May 31 02:38:59 hostname kernel: ixgbe 0000:03:00.0: registered PHC
> device on p2p1
> >  4359 May 31 02:39:00 hostname kernel: ixgbe 0000:03:00.0 p2p1: NIC Link is
> Up 10 Gbps, Flow Control: RX/TX
> >  4360 May 31 02:39:00 hostname kernel: public: link status up for interface
> p2p1, enabling it in 200 ms
> >  4361 May 31 02:39:00 hostname kernel: public: link status definitely up for
> interface p2p1, 10000 Mbps full duplex
> >  4362 May 31 02:45:37 hostname journal: Missed 217723 kernel messages
> >  4363 May 31 02:45:37 hostname kernel: public: link status down for active
> interface p2p2, disabling it in 100 ms
> >  	---------------------
> >                       11000+ APPROX SAME REPEATED MESSAGES in second
> > 	---------------------
> > 15877 May 31 02:45:37 hostname kernel: public: link status down for active
> interface p2p2, disabling it in 100 ms
> > 15878 May 31 02:45:37 hostname kernel: public: link status definitely down
> for interface p2p2, disabling it
> > 15879 May 31 02:45:37 hostname kernel: public: making interface p2p1 the
> new active one
> 
> When I was replying, I didn't know this was a v2 and I haven't seen the
> v1 discussion. I have read it since and I think I understand Eric's
> point now. The thing is that just adding e.g. u8 is OK as it is now.
> However, someone could later add another u8 next to it which would also
> be perfectly OK on its own but reads/writes to these two could collide
> between each other.
> 
> And as pointed out by a colleague, even having atomic_t and u8 flag in
> one 64-bit word could be a problem on architectures which cannot do an
> atomic read/write from/to a 32-bit word (sparc seems to be one).

Thanks Michal for explaining it, now we understand the problem what Eric was referring to in v1 of the patch.
I could think of fixing it in 3 ways, Please suggest which one would be safe and optimal fix:

1. Use type unit64_t for rtnl_needed .
2. Use type atomic64_t for rtnl_needed and atomic64_set/read.
3. Use type uint64_t for rtnl_needed with spinlock protection.

I think option 3 would be overkill keeping in mind the frequency of bond_mii_monitor.

Thanks,
Manish
> 
> Michal Kubecek

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports
From: Chas Williams @ 2018-10-25 22:12 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: davem, netdev, vfalico, andy, jiri, kuznet, yoshfuji
In-Reply-To: <25151.1540504752@famine>



On 10/25/2018 05:59 PM, Jay Vosburgh wrote:
> Chas Williams <3chas3@gmail.com> wrote:
> 
>> netif_is_lag_port should be used to identify link aggregation ports.
>> For this to work, we need to reorganize the bonding and team drivers
>> so that the necessary flags are set before dev_open is called.
>>
>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>> made this decision originally based on the IFF_SLAVE flag which isn't
>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>> check for the eql driver.
> 
> 	Is 31e77c93e432 the correct commit reference?  I don't see
> anything in there about IFF_SLAVE or bonding; it's a patch to the
> process scheduler.

No, that's wrong.  It should be c2edacf80e155.

> 	And, as Jiri said, the subject doesn't mention bonding.

The behavior of bonding wasn't changed.  The intent of the patch
is to add team slaves to the interfaces that don't get automatic
IPv6 addresses.  The body discusses why bonding had to change as
well.

I was under the impression that the subject needs to kept short.
If there a better way to phrase what I want to do?

> 
>> Signed-off-by: Chas Williams <3chas3@gmail.com>
>> ---
>> drivers/net/bonding/bond_main.c | 4 ++--
>> drivers/net/team/team.c         | 7 +++++--
>> net/ipv6/addrconf.c             | 2 +-
>> 3 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index ffa37adb7681..5cdad164332b 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>
>> 	/* set slave flag before open to prevent IPv6 addrconf */
>> 	slave_dev->flags |= IFF_SLAVE;
>> +	slave_dev->priv_flags |= IFF_BONDING;
>>
>> 	/* open the slave since the application closed it */
>> 	res = dev_open(slave_dev);
>> @@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>> 		goto err_restore_mac;
>> 	}
>>
>> -	slave_dev->priv_flags |= IFF_BONDING;
>> 	/* initialize slave stats */
>> 	dev_get_stats(new_slave->dev, &new_slave->slave_stats);
>>
>> @@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>> 	slave_disable_netpoll(new_slave);
>>
>> err_close:
>> -	slave_dev->priv_flags &= ~IFF_BONDING;
>> 	dev_close(slave_dev);
>>
>> err_restore_mac:
>> +	slave_dev->priv_flags &= ~IFF_BONDING;
>> 	slave_dev->flags &= ~IFF_SLAVE;
>> 	if (!bond->params.fail_over_mac ||
>> 	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> index db633ae9f784..8fc7d57e9f6d 100644
>> --- a/drivers/net/team/team.c
>> +++ b/drivers/net/team/team.c
>> @@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, struct team_port *port,
>> 					   &lag_upper_info, extack);
>> 	if (err)
>> 		return err;
>> -	port->dev->priv_flags |= IFF_TEAM_PORT;
>> 	return 0;
>> }
>>
>> static void team_upper_dev_unlink(struct team *team, struct team_port *port)
>> {
>> 	netdev_upper_dev_unlink(port->dev, team->dev);
>> -	port->dev->priv_flags &= ~IFF_TEAM_PORT;
>> }
>>
>> static void __team_port_change_port_added(struct team_port *port, bool linkup);
>> @@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
>> 		goto err_port_enter;
>> 	}
>>
>> +	/* set slave flag before open to prevent IPv6 addrconf */
>> +	port->dev->priv_flags |= IFF_TEAM_PORT;
>> +
>> 	err = dev_open(port_dev);
>> 	if (err) {
>> 		netdev_dbg(dev, "Device %s opening failed\n",
>> @@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
>> 	dev_close(port_dev);
>>
>> err_dev_open:
>> +	port->dev->priv_flags &= ~IFF_TEAM_PORT;
>> 	team_port_leave(team, port);
>> 	team_port_set_orig_dev_addr(port);
>>
>> @@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
>> 	dev_uc_unsync(port_dev, dev);
>> 	dev_mc_unsync(port_dev, dev);
>> 	dev_close(port_dev);
>> +	port->dev->priv_flags &= ~IFF_TEAM_PORT;
>> 	team_port_leave(team, port);
>>
>> 	__team_option_inst_mark_removed_port(team, port);
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 45b84dd5c4eb..121f863022ed 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>>
>> 	case NETDEV_UP:
>> 	case NETDEV_CHANGE:
>> -		if (dev->flags & IFF_SLAVE)
>> +		if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)
> 
> 	Note that netvsc_vf_join() also uses IFF_SLAVE in order skip
> IPv6 addrconf for netvsc devices; I don't believe its usage will pass
> netif_is_lag_port().  It looks like the above will work, but your commit
> message mentions eql as the reason for retaining the IFF_SLAVE test, and
> eql isn't the only user of IFF_SLAVE in this manner.
> 
> 	-J
> 
>> 			break;
>>
>> 		if (idev && idev->cnf.disable_ipv6)
>> -- 
>> 2.14.4
>>
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
> 

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports
From: Chas Williams @ 2018-10-25 22:08 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, j.vosburgh, vfalico, andy, kuznet, yoshfuji
In-Reply-To: <20181025211008.GA2229@nanopsycho.orion>



On 10/25/2018 05:10 PM, Jiri Pirko wrote:
> Thu, Oct 25, 2018 at 11:02:27PM CEST, 3chas3@gmail.com wrote:
>> netif_is_lag_port should be used to identify link aggregation ports.
>> For this to work, we need to reorganize the bonding and team drivers
>> so that the necessary flags are set before dev_open is called.
>>
>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>> made this decision originally based on the IFF_SLAVE flag which isn't
>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>> check for the eql driver.
>>
>> Signed-off-by: Chas Williams <3chas3@gmail.com>
>> ---
>> drivers/net/bonding/bond_main.c | 4 ++--
>> drivers/net/team/team.c         | 7 +++++--
>> net/ipv6/addrconf.c             | 2 +-
> 
> Subject talks about "team" yet you modify bond and team. Confusing..

The subject discusses what I want to do. The body of the message
covers how I had to do it. The behavior of bonding with respect to
addrconf isn't changed but netif_is_lag_port is picky about the
flags it wants to see from bonding.  So some bonding changes are
necessary.

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports
From: Jay Vosburgh @ 2018-10-25 21:59 UTC (permalink / raw)
  To: Chas Williams; +Cc: davem, netdev, vfalico, andy, jiri, kuznet, yoshfuji
In-Reply-To: <20181025210227.25544-1-3chas3@gmail.com>

Chas Williams <3chas3@gmail.com> wrote:

>netif_is_lag_port should be used to identify link aggregation ports.
>For this to work, we need to reorganize the bonding and team drivers
>so that the necessary flags are set before dev_open is called.
>
>commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>made this decision originally based on the IFF_SLAVE flag which isn't
>used by the team driver.  Note, we do need to retain the IFF_SLAVE
>check for the eql driver.

	Is 31e77c93e432 the correct commit reference?  I don't see
anything in there about IFF_SLAVE or bonding; it's a patch to the
process scheduler.

	And, as Jiri said, the subject doesn't mention bonding.

>Signed-off-by: Chas Williams <3chas3@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 4 ++--
> drivers/net/team/team.c         | 7 +++++--
> net/ipv6/addrconf.c             | 2 +-
> 3 files changed, 8 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index ffa37adb7681..5cdad164332b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> 
> 	/* set slave flag before open to prevent IPv6 addrconf */
> 	slave_dev->flags |= IFF_SLAVE;
>+	slave_dev->priv_flags |= IFF_BONDING;
> 
> 	/* open the slave since the application closed it */
> 	res = dev_open(slave_dev);
>@@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> 		goto err_restore_mac;
> 	}
> 
>-	slave_dev->priv_flags |= IFF_BONDING;
> 	/* initialize slave stats */
> 	dev_get_stats(new_slave->dev, &new_slave->slave_stats);
> 
>@@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> 	slave_disable_netpoll(new_slave);
> 
> err_close:
>-	slave_dev->priv_flags &= ~IFF_BONDING;
> 	dev_close(slave_dev);
> 
> err_restore_mac:
>+	slave_dev->priv_flags &= ~IFF_BONDING;
> 	slave_dev->flags &= ~IFF_SLAVE;
> 	if (!bond->params.fail_over_mac ||
> 	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index db633ae9f784..8fc7d57e9f6d 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, struct team_port *port,
> 					   &lag_upper_info, extack);
> 	if (err)
> 		return err;
>-	port->dev->priv_flags |= IFF_TEAM_PORT;
> 	return 0;
> }
> 
> static void team_upper_dev_unlink(struct team *team, struct team_port *port)
> {
> 	netdev_upper_dev_unlink(port->dev, team->dev);
>-	port->dev->priv_flags &= ~IFF_TEAM_PORT;
> }
> 
> static void __team_port_change_port_added(struct team_port *port, bool linkup);
>@@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
> 		goto err_port_enter;
> 	}
> 
>+	/* set slave flag before open to prevent IPv6 addrconf */
>+	port->dev->priv_flags |= IFF_TEAM_PORT;
>+
> 	err = dev_open(port_dev);
> 	if (err) {
> 		netdev_dbg(dev, "Device %s opening failed\n",
>@@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
> 	dev_close(port_dev);
> 
> err_dev_open:
>+	port->dev->priv_flags &= ~IFF_TEAM_PORT;
> 	team_port_leave(team, port);
> 	team_port_set_orig_dev_addr(port);
> 
>@@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
> 	dev_uc_unsync(port_dev, dev);
> 	dev_mc_unsync(port_dev, dev);
> 	dev_close(port_dev);
>+	port->dev->priv_flags &= ~IFF_TEAM_PORT;
> 	team_port_leave(team, port);
> 
> 	__team_option_inst_mark_removed_port(team, port);
>diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>index 45b84dd5c4eb..121f863022ed 100644
>--- a/net/ipv6/addrconf.c
>+++ b/net/ipv6/addrconf.c
>@@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
> 
> 	case NETDEV_UP:
> 	case NETDEV_CHANGE:
>-		if (dev->flags & IFF_SLAVE)
>+		if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)

	Note that netvsc_vf_join() also uses IFF_SLAVE in order skip
IPv6 addrconf for netvsc devices; I don't believe its usage will pass
netif_is_lag_port().  It looks like the above will work, but your commit
message mentions eql as the reason for retaining the IFF_SLAVE test, and
eql isn't the only user of IFF_SLAVE in this manner.

	-J

> 			break;
> 
> 		if (idev && idev->cnf.disable_ipv6)
>-- 
>2.14.4
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Steve Grubb @ 2018-10-25 21:55 UTC (permalink / raw)
  To: Paul Moore
  Cc: rgb, simo, carlos, linux-api, containers, linux-kernel, dhowells,
	linux-audit, netfilter-devel, ebiederm, luto, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn, viro
In-Reply-To: <CAHC9VhScaG8aOFYRV5hPXEKob1QRth5YFEbHNY=ZgKKAKxBBsQ@mail.gmail.com>

On Thu, 25 Oct 2018 16:40:19 -0400
Paul Moore <paul@paul-moore.com> wrote:

> On Thu, Oct 25, 2018 at 1:38 PM Richard Guy Briggs <rgb@redhat.com>
> wrote:
> > On 2018-10-25 17:57, Steve Grubb wrote:  
> > > On Thu, 25 Oct 2018 08:27:32 -0400
> > > Richard Guy Briggs <rgb@redhat.com> wrote:
> > >  
> > > > On 2018-10-25 06:49, Paul Moore wrote:  
> > > > > On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb
> > > > > <sgrubb@redhat.com> wrote:  
> > > > > > On Wed, 24 Oct 2018 20:42:55 -0400
> > > > > > Richard Guy Briggs <rgb@redhat.com> wrote:  
> > > > > > > On 2018-10-24 16:55, Paul Moore wrote:  
> > > > > > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > > > > > <rgb@redhat.com> wrote:  
> > > > > > > > > On 2018-10-19 19:16, Paul Moore wrote:  
> > > > > > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > > > > > <rgb@redhat.com> wrote:  
> > > > >
> > > > > ...
> > > > >  
> > > > > > > > > > > +/*
> > > > > > > > > > > + * audit_log_contid - report container info
> > > > > > > > > > > + * @tsk: task to be recorded
> > > > > > > > > > > + * @context: task or local context for record
> > > > > > > > > > > + * @op: contid string description
> > > > > > > > > > > + */
> > > > > > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > > > > > +                            struct audit_context
> > > > > > > > > > > *context, char *op) +{
> > > > > > > > > > > +       struct audit_buffer *ab;
> > > > > > > > > > > +
> > > > > > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > > > > > +               return 0;
> > > > > > > > > > > +       /* Generate AUDIT_CONTAINER record with
> > > > > > > > > > > container ID */
> > > > > > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > > > > > AUDIT_CONTAINER);
> > > > > > > > > > > +       if (!ab)
> > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > > > > > +                        op,
> > > > > > > > > > > audit_get_contid(tsk));
> > > > > > > > > > > +       audit_log_end(ab);
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +}
> > > > > > > > > > > +EXPORT_SYMBOL(audit_log_contid);  
> > > > > > > > > >
> > > > > > > > > > As discussed in the previous iteration of the
> > > > > > > > > > patch, I prefer AUDIT_CONTAINER_ID here over
> > > > > > > > > > AUDIT_CONTAINER.  If you feel strongly about
> > > > > > > > > > keeping it as-is with AUDIT_CONTAINER I suppose I
> > > > > > > > > > could live with that, but it is isn't my first
> > > > > > > > > > choice.  
> > > > > > > > >
> > > > > > > > > I don't have a strong opinion on this one, mildly
> > > > > > > > > preferring the shorter one only because it is
> > > > > > > > > shorter.  
> > > > > > > >
> > > > > > > > We already have multiple AUDIT_CONTAINER* record types,
> > > > > > > > so it seems as though we should use "AUDIT_CONTAINER"
> > > > > > > > as a prefix of sorts, rather than a type itself.  
> > > > > > >
> > > > > > > I'm fine with that.  I'd still like to hear Steve's
> > > > > > > input.  He had stronger opinions than me.  
> > > > > >
> > > > > > The creation event should be separate and distinct from the
> > > > > > continuing use when its used as a supplemental record. IOW,
> > > > > > binding the ID to a container is part of the lifecycle and
> > > > > > needs to be kept distinct.  
> > > > >
> > > > > Steve's comment is pretty ambiguous when it comes to
> > > > > AUDIT_CONTAINER vs AUDIT_CONTAINER_ID, but one could argue
> > > > > that AUDIT_CONTAINER_ID helps distinguish the audit container
> > > > > id marking record and gets to what I believe is the spirit of
> > > > > Steve's comment.  Taking this in context with my previous
> > > > > remarks, let's switch to using AUDIT_CONTAINER_ID.  
> > > >
> > > > I suspect Steve is mixing up AUDIT_CONTAINER_OP with
> > > > AUDIT_CONTAINER_ID, confusing the fact that they are two
> > > > seperate records.  As a summary, the suggested records are:
> > > >     CONTAINER_OP    audit container identifier creation
> > > >     CONTAINER       audit container identifier aux record to an
> > > > event
> > > >
> > > > and what Paul is suggesting (which is fine by me) is:
> > > >     CONTAINER_OP    audit container identifier creation event
> > > >     CONTAINER_ID    audit container identifier aux record to
> > > > an event
> > > >
> > > > Steve, please indicate you are fine with this.  
> > >
> > > I thought it was:  
> >
> > It *was*.  It was changed at Paul's request in this v3 thread:
> >         https://www.redhat.com/archives/linux-audit/2018-July/msg00087.html
> >
> > And listed in the examples and changelog to this v4 patchset:
> >         https://www.redhat.com/archives/linux-audit/2018-July/msg00178.html
> >
> > It is also listed in this userspace patchset update v4 (which should
> > also have had a changelog added to it, note to self...):
> >         https://www.redhat.com/archives/linux-audit/2018-July/msg00189.html
> >
> > I realize it is hard to keep up with all the detail changes in these
> > patchsets...
> >  
> > > CONTAINER_ID audit container identifier creation event
> > > event. CONTAINER audit container identifier aux record to an
> > > event
> > >
> > > Or vice versa. Don't mix up creation of the identifier with
> > > operations.  
> >
> > Exactly what I'm trying to avoid...  Worded another way: "Don't mix
> > up the creation operation with routine reporting of the identifier
> > in events."  Steve, can you and Paul discuss and agree on what they
> > should be called?  I don't have a horse in this race, but I need to
> > record the result of that run.  ;-)  
> 
> See my previous comments, I think I've been pretty clear on what I
> would like to see.

And historically speaking setting audit loginuid produces a LOGIN
event, so it only makes sense to consider binding container ID to
container as a CONTAINER event. For other supplemental records, we name
things what they are: PATH, CWD, SOCKADDR, etc. So, CONTAINER_ID makes
sense. CONTAINER_OP sounds like its for operations on a container. Do
we have any operations on a container?

-Steve

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports
From: Jiri Pirko @ 2018-10-25 21:10 UTC (permalink / raw)
  To: Chas Williams; +Cc: davem, netdev, j.vosburgh, vfalico, andy, kuznet, yoshfuji
In-Reply-To: <20181025210227.25544-1-3chas3@gmail.com>

Thu, Oct 25, 2018 at 11:02:27PM CEST, 3chas3@gmail.com wrote:
>netif_is_lag_port should be used to identify link aggregation ports.
>For this to work, we need to reorganize the bonding and team drivers
>so that the necessary flags are set before dev_open is called.
>
>commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>made this decision originally based on the IFF_SLAVE flag which isn't
>used by the team driver.  Note, we do need to retain the IFF_SLAVE
>check for the eql driver.
>
>Signed-off-by: Chas Williams <3chas3@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 4 ++--
> drivers/net/team/team.c         | 7 +++++--
> net/ipv6/addrconf.c             | 2 +-

Subject talks about "team" yet you modify bond and team. Confusing..

^ permalink raw reply

* [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports
From: Chas Williams @ 2018-10-25 21:02 UTC (permalink / raw)
  To: davem
  Cc: netdev, j.vosburgh, vfalico, andy, jiri, kuznet, yoshfuji,
	Chas Williams

netif_is_lag_port should be used to identify link aggregation ports.
For this to work, we need to reorganize the bonding and team drivers
so that the necessary flags are set before dev_open is called.

commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
made this decision originally based on the IFF_SLAVE flag which isn't
used by the team driver.  Note, we do need to retain the IFF_SLAVE
check for the eql driver.

Signed-off-by: Chas Williams <3chas3@gmail.com>
---
 drivers/net/bonding/bond_main.c | 4 ++--
 drivers/net/team/team.c         | 7 +++++--
 net/ipv6/addrconf.c             | 2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ffa37adb7681..5cdad164332b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 
 	/* set slave flag before open to prevent IPv6 addrconf */
 	slave_dev->flags |= IFF_SLAVE;
+	slave_dev->priv_flags |= IFF_BONDING;
 
 	/* open the slave since the application closed it */
 	res = dev_open(slave_dev);
@@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		goto err_restore_mac;
 	}
 
-	slave_dev->priv_flags |= IFF_BONDING;
 	/* initialize slave stats */
 	dev_get_stats(new_slave->dev, &new_slave->slave_stats);
 
@@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 	slave_disable_netpoll(new_slave);
 
 err_close:
-	slave_dev->priv_flags &= ~IFF_BONDING;
 	dev_close(slave_dev);
 
 err_restore_mac:
+	slave_dev->priv_flags &= ~IFF_BONDING;
 	slave_dev->flags &= ~IFF_SLAVE;
 	if (!bond->params.fail_over_mac ||
 	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index db633ae9f784..8fc7d57e9f6d 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, struct team_port *port,
 					   &lag_upper_info, extack);
 	if (err)
 		return err;
-	port->dev->priv_flags |= IFF_TEAM_PORT;
 	return 0;
 }
 
 static void team_upper_dev_unlink(struct team *team, struct team_port *port)
 {
 	netdev_upper_dev_unlink(port->dev, team->dev);
-	port->dev->priv_flags &= ~IFF_TEAM_PORT;
 }
 
 static void __team_port_change_port_added(struct team_port *port, bool linkup);
@@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
 		goto err_port_enter;
 	}
 
+	/* set slave flag before open to prevent IPv6 addrconf */
+	port->dev->priv_flags |= IFF_TEAM_PORT;
+
 	err = dev_open(port_dev);
 	if (err) {
 		netdev_dbg(dev, "Device %s opening failed\n",
@@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
 	dev_close(port_dev);
 
 err_dev_open:
+	port->dev->priv_flags &= ~IFF_TEAM_PORT;
 	team_port_leave(team, port);
 	team_port_set_orig_dev_addr(port);
 
@@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
 	dev_uc_unsync(port_dev, dev);
 	dev_mc_unsync(port_dev, dev);
 	dev_close(port_dev);
+	port->dev->priv_flags &= ~IFF_TEAM_PORT;
 	team_port_leave(team, port);
 
 	__team_option_inst_mark_removed_port(team, port);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 45b84dd5c4eb..121f863022ed 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 
 	case NETDEV_UP:
 	case NETDEV_CHANGE:
-		if (dev->flags & IFF_SLAVE)
+		if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)
 			break;
 
 		if (idev && idev->cnf.disable_ipv6)
-- 
2.14.4

^ permalink raw reply related

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Paul Moore @ 2018-10-25 20:40 UTC (permalink / raw)
  To: rgb
  Cc: sgrubb, simo, carlos, linux-api, containers, linux-kernel,
	dhowells, linux-audit, netfilter-devel, ebiederm, luto, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn, viro
In-Reply-To: <20181025173830.4yklhnrydt5qvr67@madcap2.tricolour.ca>

On Thu, Oct 25, 2018 at 1:38 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-10-25 17:57, Steve Grubb wrote:
> > On Thu, 25 Oct 2018 08:27:32 -0400
> > Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > > On 2018-10-25 06:49, Paul Moore wrote:
> > > > On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb <sgrubb@redhat.com>
> > > > wrote:
> > > > > On Wed, 24 Oct 2018 20:42:55 -0400
> > > > > Richard Guy Briggs <rgb@redhat.com> wrote:
> > > > > > On 2018-10-24 16:55, Paul Moore wrote:
> > > > > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > > > > <rgb@redhat.com> wrote:
> > > > > > > > On 2018-10-19 19:16, Paul Moore wrote:
> > > > > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > > > > <rgb@redhat.com> wrote:
> > > >
> > > > ...
> > > >
> > > > > > > > > > +/*
> > > > > > > > > > + * audit_log_contid - report container info
> > > > > > > > > > + * @tsk: task to be recorded
> > > > > > > > > > + * @context: task or local context for record
> > > > > > > > > > + * @op: contid string description
> > > > > > > > > > + */
> > > > > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > > > > +                            struct audit_context
> > > > > > > > > > *context, char *op) +{
> > > > > > > > > > +       struct audit_buffer *ab;
> > > > > > > > > > +
> > > > > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > > > > +               return 0;
> > > > > > > > > > +       /* Generate AUDIT_CONTAINER record with
> > > > > > > > > > container ID */
> > > > > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > > > > AUDIT_CONTAINER);
> > > > > > > > > > +       if (!ab)
> > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > > > > +                        op, audit_get_contid(tsk));
> > > > > > > > > > +       audit_log_end(ab);
> > > > > > > > > > +       return 0;
> > > > > > > > > > +}
> > > > > > > > > > +EXPORT_SYMBOL(audit_log_contid);
> > > > > > > > >
> > > > > > > > > As discussed in the previous iteration of the patch, I
> > > > > > > > > prefer AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If
> > > > > > > > > you feel strongly about keeping it as-is with
> > > > > > > > > AUDIT_CONTAINER I suppose I could live with that, but it
> > > > > > > > > is isn't my first choice.
> > > > > > > >
> > > > > > > > I don't have a strong opinion on this one, mildly
> > > > > > > > preferring the shorter one only because it is shorter.
> > > > > > >
> > > > > > > We already have multiple AUDIT_CONTAINER* record types, so it
> > > > > > > seems as though we should use "AUDIT_CONTAINER" as a prefix
> > > > > > > of sorts, rather than a type itself.
> > > > > >
> > > > > > I'm fine with that.  I'd still like to hear Steve's input.  He
> > > > > > had stronger opinions than me.
> > > > >
> > > > > The creation event should be separate and distinct from the
> > > > > continuing use when its used as a supplemental record. IOW,
> > > > > binding the ID to a container is part of the lifecycle and needs
> > > > > to be kept distinct.
> > > >
> > > > Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> > > > vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> > > > helps distinguish the audit container id marking record and gets to
> > > > what I believe is the spirit of Steve's comment.  Taking this in
> > > > context with my previous remarks, let's switch to using
> > > > AUDIT_CONTAINER_ID.
> > >
> > > I suspect Steve is mixing up AUDIT_CONTAINER_OP with
> > > AUDIT_CONTAINER_ID, confusing the fact that they are two seperate
> > > records.  As a summary, the suggested records are:
> > >     CONTAINER_OP    audit container identifier creation
> > >     CONTAINER       audit container identifier aux record to an
> > > event
> > >
> > > and what Paul is suggesting (which is fine by me) is:
> > >     CONTAINER_OP    audit container identifier creation event
> > >     CONTAINER_ID    audit container identifier aux record to
> > > an event
> > >
> > > Steve, please indicate you are fine with this.
> >
> > I thought it was:
>
> It *was*.  It was changed at Paul's request in this v3 thread:
>         https://www.redhat.com/archives/linux-audit/2018-July/msg00087.html
>
> And listed in the examples and changelog to this v4 patchset:
>         https://www.redhat.com/archives/linux-audit/2018-July/msg00178.html
>
> It is also listed in this userspace patchset update v4 (which should
> also have had a changelog added to it, note to self...):
>         https://www.redhat.com/archives/linux-audit/2018-July/msg00189.html
>
> I realize it is hard to keep up with all the detail changes in these
> patchsets...
>
> > CONTAINER_ID audit container identifier creation event
> > CONTAINER audit container identifier aux record to an event
> >
> > Or vice versa. Don't mix up creation of the identifier with operations.
>
> Exactly what I'm trying to avoid...  Worded another way: "Don't mix up
> the creation operation with routine reporting of the identifier in
> events."  Steve, can you and Paul discuss and agree on what they should
> be called?  I don't have a horse in this race, but I need to record the
> result of that run.  ;-)

See my previous comments, I think I've been pretty clear on what I
would like to see.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [RFC] net: stmmac: RX Jumbo packet size > 8191 problem
From: Thor Thayer @ 2018-10-25 20:41 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, alexandre.torgue, joabreu, netdev

Hi,

I'm running into a weird issue at the DMA boundary for large packets 
(>8192) that I can't explain.  I'm hoping someone here has an idea on 
why I'm seeing this issue.

This is the Synopsys DesignWare Ethernet GMAC core (3.74) using the 
stmmac driver found at drivers/net/ethernet/stmicro/stmmac.

If I ping with data sizes that exceed the first DMA buffer size (size 
set to 8191), ping reports a data mismatch as follows at byte #8144:

$ ping -c 1 -M do -s 8150 192.168.1.99
PING 192.168.1.99 (192.168.1.99) 8150(8178) bytes of data.
8158 bytes from 192.168.1.99: icmp_seq=1 ttl=64 time=0.669 ms
wrong data byte #8144 should be 0xd0 but was 0x0
#16	10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 
27 28 29 2a 2b 2c 2d 2e 2f
%< ---------------snip--------------------------------------
#8112	b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf c0 c1 c2 c3 c4 c5 
c6 c7 c8 c9 ca cb cc cd ce cf
#8144	0 0 0 0 d0 d1
         ^^^^^^^
Notice the 4 bytes of 0 there before the expected byte of d0. I 
confirmed the on-wire result with wireshark - same data packet as shown 
above.

Looking at the queue, I'm seeing these values in the RX descriptors (I'm 
using ring mode, enhanced descriptors).
0xa0040320 0x9fff1fff 0x7a358042 0x7a35a042
  ^des0      ^des1      ^des2      ^desc3

desc0 => 8196 bytes, OWN, First & Last Descriptor, Frame type = Eth
desc1 => Disable IRQ on done, Rx Buffer2 sz = 8191, Rx Buffer1 sz = 8191
desc2 => Buffer 1 Addr Pointer
desc3 => Buffer 2 Addr Pointer

If I adjust init_desc3() and refill_desc3() to initialize desc3 to 
desc2+BUF_SIZE_8KiB-4, I get a descriptor as show below and ping 
completes successfully.
0xa0040320 0x9fff1fff 0x77df8042 0x77dfa03e
                                   ^ this is now different

But I'm not sure why the -4 works because desc3 overlaps into the end of 
the first DMA buffer area (des2) which is counterintuitive.

At first I thought the 4 extra bytes were the FCS but that should occur 
at the end of the complete transfer, so I'd expect it to be at the end 
of all the data (in buffer2)

Here is the change that works. I ran a ping sweep with packet sizes from 
8100 to 8300 successfully with this change.
-------------------------------------------------------
$ git diff
diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c 
b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
index abc3f85270cd..b52be0235d8f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
@@ -115,13 +115,13 @@ static void refill_desc3(void *priv_ptr, struct 
dma_desc *p)

         /* Fill DES3 in case of RING mode */
         if (priv->dma_buf_sz >= BUF_SIZE_8KiB)
-               p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB);
+               p->des3 = cpu_to_le32(le32_to_cpu(p->des2) +
+                                     BUF_SIZE_8KiB - 4);

  }

  /* In ring mode we need to fill the desc3 because it is used as buffer */
  static void init_desc3(struct dma_desc *p)
  {
-       p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB);
+       p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB - 4);
  }

  static void clean_desc3(void *priv_ptr, struct dma_desc *p)
-----------------------------------------------------------

Any thoughts on why I need to change the indexing?

Thanks,

Thor

^ permalink raw reply related

* Re: [PATCH 1/2] Bluetooth: Add quirk for reading BD_ADDR from fwnode property
From: Balakrishna Godavarthi @ 2018-10-26  5:01 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg, hemantg
In-Reply-To: <4041ef05cdd70d28d665d3288c4d4c43@codeaurora.org>

Hi Matthias,

I missed to add a point here.

On 2018-10-25 20:06, Balakrishna Godavarthi wrote:
> On 2018-10-25 05:51, Matthias Kaehlcke wrote:
>> Add HCI_QUIRK_USE_BDADDR_PROPERTY to allow controllers to retrieve
>> the public Bluetooth address from the firmware node property
>> 'local-bd-address'. If quirk is set and the property does not exist
>> or is invalid the controller is marked as unconfigured.
>> 
>> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
>> ---
>> hci_dev_get_bd_addr_from_property() currently assumes that the
>> firmware node with 'local-bd-address' is from hdev->dev.parent, not
>> sure if this universally true. However if it is true for existing
>> device that might use this interface we can assume this for now
>> (unless there is a clear solution now), and cross the bridge of
>> finding an alternative when we actually encounter the situation.
>> One option could be to look for the first parent that has a fwnode.
>> ---
>>  include/net/bluetooth/hci.h | 12 +++++++++++
>>  net/bluetooth/hci_core.c    | 42 
>> +++++++++++++++++++++++++++++++++++++
>>  net/bluetooth/mgmt.c        |  6 ++++--
>>  3 files changed, 58 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>> index cdd9f1fe7cfa..a5d748099752 100644
>> --- a/include/net/bluetooth/hci.h
>> +++ b/include/net/bluetooth/hci.h
>> @@ -158,6 +158,18 @@ enum {
>>  	 */
>>  	HCI_QUIRK_INVALID_BDADDR,
>> 
>> +	/* When this quirk is set, the public Bluetooth address
>> +	 * initially reported by HCI Read BD Address command
>> +	 * is considered invalid. The public BD Address can be
>> +	 * specified in the fwnode property 'local-bd-address'.
>> +	 * If this property does not exist or is invalid controller
>> +	 * configuration is required before this device can be used.
>> +	 *
>> +	 * This quirk can be set before hci_register_dev is called or
>> +	 * during the hdev->setup vendor callback.
>> +	 */
>> +	HCI_QUIRK_USE_BDADDR_PROPERTY,
>> +
>>  	/* When this quirk is set, the duplicate filtering during
>>  	 * scanning is based on Bluetooth devices addresses. To allow
>>  	 * RSSI based updates, restart scanning if needed.
>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> index 74b29c7d841c..97214262c4fb 100644
>> --- a/net/bluetooth/hci_core.c
>> +++ b/net/bluetooth/hci_core.c
>> @@ -30,6 +30,7 @@
>>  #include <linux/rfkill.h>
>>  #include <linux/debugfs.h>
>>  #include <linux/crypto.h>
>> +#include <linux/property.h>
>>  #include <asm/unaligned.h>
>> 
>>  #include <net/bluetooth/bluetooth.h>
>> @@ -1355,9 +1356,40 @@ int hci_inquiry(void __user *arg)
>>  	return err;
>>  }
>> 
>> +/**
>> + * hci_dev_get_bd_addr_from_property - Get the Bluetooth Device 
>> Address
>> + *				       (BD_ADDR) for a HCI device from
>> + *				       a firmware node property.
>> + * @hdev:	The HCI device
>> + *
>> + * Search the firmware node for 'local-bd-address'.
>> + *
>> + * All-zero BD addresses are rejected, because those could be 
>> properties
>> + * that exist in the firmware tables, but were not updated by the 
>> firmware. For
>> + * example, the DTS could define 'local-bd-address', with zero BD 
>> addresses.
>> + */
>> +static int hci_dev_get_bd_addr_from_property(struct hci_dev *hdev)
>> +{
>> +	struct fwnode_handle *fwnode = dev_fwnode(hdev->dev.parent);
>> +	bdaddr_t ba;
>> +	int ret;
>> +
>> +	ret = fwnode_property_read_u8_array(fwnode, "local-bd-address",
>> +					    (u8 *)&ba, sizeof(ba));
>> +	if (ret < 0)
>> +		return ret;
>> +	if (!bacmp(&ba, BDADDR_ANY))
>> +		return -ENODATA;
>> +
>> +	hdev->public_addr = ba;
>> +
>> +	return 0;
>> +}
>> +
>>  static int hci_dev_do_open(struct hci_dev *hdev)
>>  {
>>  	int ret = 0;
>> +	bool bd_addr_set = false;
>> 
>>  	BT_DBG("%s %p", hdev->name, hdev);
>> 
>> @@ -1422,6 +1454,16 @@ static int hci_dev_do_open(struct hci_dev 
>> *hdev)
>>  		if (hdev->setup)
>>  			ret = hdev->setup(hdev);
>> 
>> +		if (test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) {
>> +			if (!hci_dev_get_bd_addr_from_property(hdev))
>> +				if (hdev->set_bdaddr &&
>> +				    !hdev->set_bdaddr(hdev, &hdev->public_addr))
>> +					bd_addr_set = true;

Can we check the return status of hdev->setup() before calling 
hdev->set_bdaddr().
some vendors assign hdev->set_baddr helper before calling hdev->setup().
https://elixir.bootlin.com/linux/v4.19-rc8/source/drivers/bluetooth/btqcomsmd.c#L194
There will no use in calling hdev->set_baddr() if hdev->setup() fails.

>> +
>> +			if (!bd_addr_set)
>> +				hci_dev_set_flag(hdev, HCI_UNCONFIGURED);
>> +		}
>> +
>>  		/* The transport driver can set these quirks before
>>  		 * creating the HCI device or in its setup callback.
>>  		 *
>> diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
>> index 3bdc8f3ca259..3d9edb752403 100644
>> --- a/net/bluetooth/mgmt.c
>> +++ b/net/bluetooth/mgmt.c
>> @@ -551,7 +551,8 @@ static bool is_configured(struct hci_dev *hdev)
>>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>>  		return false;
>> 
>> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
>> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
>> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>>  		return false;
>> 
>> @@ -566,7 +567,8 @@ static __le32 get_missing_options(struct hci_dev 
>> *hdev)
>>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>>  		options |= MGMT_OPTION_EXTERNAL_CONFIG;
>> 
>> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
>> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
>> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>>  		options |= MGMT_OPTION_PUBLIC_ADDRESS;
> 
> Looks fine to me.
> 
> Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>

-- 
Regards
Balakrishna.

^ permalink raw reply

* [4.18-stable 1/1] netfilter: use kvmalloc_array to allocate memory for hashtable
From: Mark Asselstine @ 2018-10-25 20:18 UTC (permalink / raw)
  To: netdev, David Miller

David,

Please promote mainline commit 285189c78eeb6f684a024b86fb5997d10c6aa564 
[netfilter: use kvmalloc_array to allocate memory for hashtable] to 
linux-4.18.y stable. As it happens this not only fixes the issue described in 
the commit log, it also solves the issue of kmemleak reporting false positives 
of 'struct nf_conn' objects.

unreferenced object 0xffff9af78fa6de00 (size 256): 
  comm "rdate", pid 4215, jiffies 4299506036 (age 115.149s) 
  hex dump (first 32 bytes): 
    01 00 00 00 00 00 00 00 0a 00 96 98 f7 9a ff ff ................ 
    45 e6 00 00 00 00 00 00 10 99 a3 94 f7 9a ff ff E............... 
  backtrace: 
    [<0000000006b47d03>] kmem_cache_alloc+0x146/0x200 
    [<00000000dbb53245>] __nf_conntrack_alloc.isra.13+0x4d/0x170[nf_conntrack] 
    [<000000008c1c1285>] init_conntrack+0x6a/0x2f0 [nf_conntrack] 
    [<00000000a6dd3a04>] nf_conntrack_in+0x2c5/0x360 [nf_conntrack] 
    [<0000000000213d80>] ipv4_conntrack_local+0x5d/0x70 [nf_conntrack_ipv4] 
    [<00000000d98fc633>] nf_hook_slow+0x48/0xd0 
    [<00000000fea9b61e>] __ip_local_out+0xbd/0xf0 
    [<00000000e1418ed2>] ip_local_out+0x1c/0x50 
    [<0000000071f63135>] ip_queue_xmit+0x15f/0x3d0 
    [<000000008fb87cfd>] __tcp_transmit_skb+0x5bf/0xab0 
    [<0000000073c7808d>] tcp_connect+0x648/0x830 
    [<000000000e12e101>] tcp_v4_connect+0x458/0x4d0 
    [<000000003223764c>] __inet_stream_connect+0xe2/0x380 
    [<000000005c32d180>] inet_stream_connect+0x3b/0x60 
    [<00000000465bcd15>] __sys_connect+0xce/0x100 
    [<0000000055a63178>] __x64_sys_connect+0x1a/0x20 

The main object pointer to these struct nf_conn objects is 'salted' with 
ip_conntrack_info in sk_buff._nfct, and as such is not a viable pointer to 
this object by the kmemleak logic.

The only other consistent reference to these objects or contents is found in 
the hash table. But it appears that kmemleak does not scan the 
nf_conntrack_hash which is initialized in nf_ct_alloc_hashtable() via 
__get_free_pages(). This results in the objects appearing as "leaks".

I could solve this by keeping the original code and adding a call to 
kmemleak_alloc() in nf_ct_alloc_hashtable() and similarly a call to 
kmemleak_free() in nf_ct_free_hashtable(). But since this mainline commit 
exists which happens to also sort out this issue we are most likely best to do 
the backport and kill two birds with one stone.

He Zhe previously sent out a patch to this list "[RFC] [PATCH] netfilter: Fix 
kmemleak false positive reports". With the additional analysis summarized here 
that patch should not be considered for merging.

Thanks,
Mark Asselstine

^ permalink raw reply

* Re: [PATCH] net/{ipv4,ipv6}: Do not put target net if input nsid is invalid
From: David Ahern @ 2018-10-25 20:16 UTC (permalink / raw)
  To: Bjørn Mork, netdev; +Cc: Li RongQing, David Miller
In-Reply-To: <20181025191825.23936-1-bjorn@mork.no>

On 10/25/18 1:18 PM, Bjørn Mork wrote:
> The cleanup path will put the target net when netnsid is set.  So we must
> reset netnsid if the input is invalid.
> 
> Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes")
> Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes")
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
>  net/ipv4/devinet.c  | 1 +
>  net/ipv6/addrconf.c | 1 +
>  2 files changed, 2 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH v3 2/2] net: qcom/emac: add phy-handle support for ACPI
From: Andrew Lunn @ 2018-10-25 19:24 UTC (permalink / raw)
  To: Wang Dongsheng; +Cc: timur, yu.zheng, f.fainelli, netdev
In-Reply-To: <7935985e49270ad2948b2a52d26510bdf55572e6.1540459999.git.dongsheng.wang@hxt-semitech.com>

On Thu, Oct 25, 2018 at 06:09:15PM +0800, Wang Dongsheng wrote:
> Use "phy-handle" to porint an internal MDIO device port.

Hi Dongsheng

You are basically defining how all future ACPI based MAC drivers get
access to their PHY. This needs to become part of the ACPI standard,
etc.

This code should not be hidden away in the emac driver. It needs to be
placed somewhere public so other drivers can use it. And it needs good
documentation, including an example of what needs to go into the ACPI
tables, etc.

Thanks
	Andrew

^ permalink raw reply

* [PATCH] net/{ipv4,ipv6}: Do not put target net if input nsid is invalid
From: Bjørn Mork @ 2018-10-25 19:18 UTC (permalink / raw)
  To: netdev; +Cc: Li RongQing, David Miller, Bjørn Mork, David Ahern

The cleanup path will put the target net when netnsid is set.  So we must
reset netnsid if the input is invalid.

Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes")
Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes")
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
 net/ipv4/devinet.c  | 1 +
 net/ipv6/addrconf.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 9250b309c742..a34602ae27de 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1704,6 +1704,7 @@ static int inet_valid_dump_ifaddr_req(const struct nlmsghdr *nlh,
 
 			net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
 			if (IS_ERR(net)) {
+				fillargs->netnsid = -1;
 				NL_SET_ERR_MSG(extack, "ipv4: Invalid target network namespace id");
 				return PTR_ERR(net);
 			}
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 7eb09c86fa13..63a808d5af15 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5058,6 +5058,7 @@ static int inet6_valid_dump_ifaddr_req(const struct nlmsghdr *nlh,
 			fillargs->netnsid = nla_get_s32(tb[i]);
 			net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
 			if (IS_ERR(net)) {
+				fillargs->netnsid = -1;
 				NL_SET_ERR_MSG_MOD(extack, "Invalid target network namespace id");
 				return PTR_ERR(net);
 			}
-- 
2.11.0

^ permalink raw reply related

* for your images 12
From: Kate @ 2018-10-24 11:37 UTC (permalink / raw)
  To: netdev

We are an imaging team who can process 300+ images daily.

If you need any image editing service, please contact us today.

We do mainly images cut out and clipping path, masking.
Such as for your ecommerce photos, jewelry photos retouching, also it is
for beauty portraits and skin images
and wedding photos.

We provide test editing if you send some photos.

Thanks,
Kate

^ permalink raw reply

* for your images 17
From: Kate @ 2018-10-24 12:06 UTC (permalink / raw)
  To: netdev

We are an imaging team who can process 300+ images daily.

If you need any image editing service, please contact us today.

We do mainly images cut out and clipping path, masking.
Such as for your ecommerce photos, jewelry photos retouching, also it is
for beauty portraits and skin images
and wedding photos.

We provide test editing if you send some photos.

Thanks,
Kate

^ permalink raw reply

* for your images 15
From: Kate @ 2018-10-25 12:05 UTC (permalink / raw)
  To: netdev

We are an imaging team who can process 300+ images daily.

If you need any image editing service, please contact us today.

We do mainly images cut out and clipping path, masking.
Such as for your ecommerce photos, jewelry photos retouching, also it is
for beauty portraits and skin images
and wedding photos.

We provide test editing if you send some photos.

Thanks,
Kate

^ permalink raw reply

* Re: [net-next][PATCH] net/ipv4: fix a net leak
From: David Ahern @ 2018-10-25 18:46 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Li RongQing, netdev, David Miller
In-Reply-To: <87lg6l7tq2.fsf@miraculix.mork.no>

On 10/25/18 12:43 PM, Bjørn Mork wrote:
> 
> inet_valid_dump_ifaddr_req() will bail out with an error, but only
> *after* setting fillargs->netnsid:
> 
>                 if (i == IFA_TARGET_NETNSID) {
>                         struct net *net;
> 
>                         fillargs->netnsid = nla_get_s32(tb[i]);
> 
>                         net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
>                         if (IS_ERR(net)) {
>                                 NL_SET_ERR_MSG(extack, "ipv4: Invalid target network namespace id");
>                                 return PTR_ERR(net);
>                         }
>                         *tgt_net = net;
>                 } else {
> 
> 
> 
> So inet_dump_ifaddr() ends up doing put_net(tgt_net):
> 
> 
>                 err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
>                                                  skb->sk, cb);
>                 if (err < 0)
>                         goto put_tgt_net;
> ..
> put_tgt_net:
>         if (fillargs.netnsid >= 0)
>                 put_net(tgt_net);
> 
> 
> 
> I believe you should set fillargs->netnsid back to -1 in the
> inet_valid_dump_ifaddr_req() error path, or use a temp variable to avoid
> changing it unless get_net is successful.

good point. either use of an intermediate or resetting nsid on failure.
Will you send a patch to fix ipv4 and v6?

Thanks,

^ permalink raw reply

* Re: [net-next][PATCH] net/ipv4: fix a net leak
From: Bjørn Mork @ 2018-10-25 18:43 UTC (permalink / raw)
  To: David Ahern; +Cc: Li RongQing, netdev, David Miller
In-Reply-To: <b2fe3de8-44b7-1ba9-3543-ac8abee44bf6@gmail.com>

David Ahern <dsahern@gmail.com> writes:
> On 10/24/18 9:02 AM, David Ahern wrote:
>> On 10/24/18 3:36 AM, Li RongQing wrote:
>>> put net when input a invalid ifindex, otherwise it will be leaked
>>>
>>> Fixes: 5fcd266a9f64("net/ipv4: Add support for dumping addresses for a specific device")
>>> Cc: David Ahern <dsahern@gmail.com>
>>> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>>> ---
>>>  net/ipv4/devinet.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
>>> index 63d5b58fbfdb..fd0c5a47e742 100644
>>> --- a/net/ipv4/devinet.c
>>> +++ b/net/ipv4/devinet.c
>>> @@ -1775,8 +1775,10 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>>>  
>>>  		if (fillargs.ifindex) {
>>>  			dev = __dev_get_by_index(tgt_net, fillargs.ifindex);
>>> -			if (!dev)
>>> +			if (!dev) {
>>> +				put_net(tgt_net);
>>>  				return -ENODEV;
>>> +			}
>>>  
>>>  			in_dev = __in_dev_get_rtnl(dev);
>>>  			if (in_dev) {
>>>
>> 
>> Good catch. IPv6 has the same problem. Will fix that one.
>> 
> Actually remove that 'Reviewed-by'. You should only call put_net if
> (fillargs.netnsid >= 0)
>
> DaveM: just want to call this out since I mistakenly added the
> Reviewed-by. This patch should be dropped.

Hmm, I see that you implemented that.  But I believe it's still buggy if
called with an invalid netnsid.

inet_valid_dump_ifaddr_req() will bail out with an error, but only
*after* setting fillargs->netnsid:

                if (i == IFA_TARGET_NETNSID) {
                        struct net *net;

                        fillargs->netnsid = nla_get_s32(tb[i]);

                        net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
                        if (IS_ERR(net)) {
                                NL_SET_ERR_MSG(extack, "ipv4: Invalid target network namespace id");
                                return PTR_ERR(net);
                        }
                        *tgt_net = net;
                } else {



So inet_dump_ifaddr() ends up doing put_net(tgt_net):


                err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
                                                 skb->sk, cb);
                if (err < 0)
                        goto put_tgt_net;
..
put_tgt_net:
        if (fillargs.netnsid >= 0)
                put_net(tgt_net);



I believe you should set fillargs->netnsid back to -1 in the
inet_valid_dump_ifaddr_req() error path, or use a temp variable to avoid
changing it unless get_net is successful.



Bjørn

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox