* [AX25] patch did not fix -- was: ax25: fix incorrect dev_tracker usage @ 2022-09-10 7:16 Thomas Osterried 2022-09-10 11:49 ` Thomas Osterried 0 siblings, 1 reply; 4+ messages in thread From: Thomas Osterried @ 2022-09-10 7:16 UTC (permalink / raw) To: Eric Dumazet, Jakub Kicinski Cc: David S . Miller, Paolo Abeni, Eric Dumazet, Bernard Pidoux, Duoming Zhou, netdev, linux-hams Hello, patch: "ax25: fix incorrect dev_tracker usage" commit d7c4c9e075f8cc6d88d277bc24e5d99297f03c06 date 2022-07-28 ..does not fix Tested it with current towalrds tree, which uses latest change 7c6327c77d509e78bff76f2a4551fcfee851682e (netdev_put() instead of dev_put_track()). userspace: # rmmod bpqether refcount complpanis about [ 302.326051] unregister_netdevice: waiting for bpq1 to become free. Usage count = -2 [ 312.406495] unregister_netdevice: waiting for bpq1 to become free. Usage count = -2 trace (comparable to trace mentioned iin d7c4c9e075f8cc6d88d277bc24e5d99297f03c06): [ 291.965794] refcount_t: underflow; use-after-free. [ 291.968761] WARNING: CPU: 0 PID: 5954 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110 [ 291.973994] Modules linked in: nft_chain_nat(E) xt_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) libcrc32c(E) nfnetlink(E) tun(E) mkiss(E) bpqether(E-) ax25(OE) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) pcspkr(E) qxl(E) drm_ttm_helper(E) evdev(E) serio_raw(E) ttm(E) virtio_console(E) virtio_balloon(E) drm_kms_helper(E) qemu_fw_cfg(E) button(E) netconsole(E) fuse(E) drm(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) virtio_net(E) net_failover(E) virtio_blk(E) failover(E) hid_generic(E) usbhid(E) hid(E) crc32c_intel(E) psmouse(E) ata_generic(E) ehci_pci(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) libata(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_pci_legacy_dev(E) scs i_mod(E) virtio_pci_modern_dev(E) virtio(E) virtio_ring(E) scsi_common(Endard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 292.025488] RIP: 0010:refcount_warn_saturate+0xba/0x110 [ 292.027887] Code: 01 01 e8 e6 10 45 00 0f 0b c3 cc cc cc cc 80 3d 32 bf 10 01 00 75 85 48 c7 c7 80 57 76 92 c6 05 22 bf 10 01 01 e8 c3 10 45 00 <0f> 0b c3 cc cc cc cc 80 3d 0d bf 10 01 00 0f 85 5e ff ff ff 48 c7 [ 292.035844] RSP: 0018:ffffae0d806fbd30 EFLAGS: 00010286 [ 292.038080] RAX: 0000000000000000 RBX: ffff8fd9888b3e40 RCX: 0000000000000000 [ 292.040926] RDX: 0000000000000001 RSI: ffffffff9274e0e2 RDI: 00000000ffffffff [ 292.043823] RBP: ffff8fd983c05e00 R08: 0000000000000000 R09: 00000000ffffefff [ 292.046710] R10: ffffae0d806fbbd0 R11: ffffffff92acbaa8 R12: ffff8fd988ce0000 [ 292.049458] R13: ffff8fd983488000 R14: 0000000000000001 R15: ffff8fd983488080 [ 292.052199] FS: 0000000000000000(0000) GS:ffff8fd99fc00000(0063) knlGS:00000000f7ee2700 [ 292.055244] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 292.057403] CR2: 00000000f6ec4e20 CR3: 00000000037d6000 CR4: 00000000000006f0 [ 292.060108] Call Trace: [ 292.061079] <TASK> [ 292.061971] ax25_device_event+0x234/0x250 [ax25] [ 292.063758] raw_notifier_call_chain+0x44/0x60 [ 292.065392] dev_close_many+0xe9/0x140 [ 292.066834] dev_close+0x7f/0xb0 [ 292.068044] bpq_device_event+0x209/0x2a0 [bpqether] [ 292.069910] call_netdevice_unregister_notifiers+0x66/0xb0 [ 292.071874] unregister_netdevice_notifier+0x6c/0xb0 [ 292.073716] bpq_cleanup_driver+0x24/0x62f [bpqether] [ 292.075588] __do_sys_delete_module+0x198/0x300 [ 292.077298] ? fpregs_assert_state_consistent+0x22/0x50 [ 292.079290] ? exit_to_user_mode_prepare+0x3a/0x150 [ 292.081081] __do_fast_syscall_32+0x6f/0xf0 [ 292.082709] do_fast_syscall_32+0x2f/0x70 [ 292.084215] entry_SYSENTER_compat_after_hwframe+0x70/0x82 [ 292.086244] RIP: 0023:0xf7f25559 [ 292.087482] Code: 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 [ 292.097365] RSP: 002b:00000000fff45da8 EFLAGS: 00200206 ORIG_RAX: 00000000000102028] RAX: ffffffffffffffda RBX: 00000000569cd19c RCX: 0000000000000800 [ 292.106296] RDX: 00000000565aa939 RSI: 00000000569cd160 RDI: 00000000569cd160 [ 292.110722] RBP: 00000000fff468e4 R08: 0000000000000000 R09: 0000000000000000 [ 292.115096] R10: 0000000000000000 R11: 0000000000200206 R12: 0000000000000000 [ 292.119435] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 292.123755] </TASK> [ 292.126362] ---[ end trace 0000000000000000 ]--- ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [AX25] patch did not fix -- was: ax25: fix incorrect dev_tracker usage 2022-09-10 7:16 [AX25] patch did not fix -- was: ax25: fix incorrect dev_tracker usage Thomas Osterried @ 2022-09-10 11:49 ` Thomas Osterried 2022-10-24 18:00 ` Thomas Osterried 0 siblings, 1 reply; 4+ messages in thread From: Thomas Osterried @ 2022-09-10 11:49 UTC (permalink / raw) To: Eric Dumazet, Jakub Kicinski Cc: David S . Miller, Paolo Abeni, Bernard Pidoux, Duoming Zhou, netdev, linux-hams Hello, please allow me the question what the patch tries to fix. 1. we add sessions to the list of active sessions ax25_cb_add(ax25), and remove them on close. Why do we need a tracker? 2. ax25_dev.c: ax25_dev_device_up(): netdev_hold(dev, &ax25_dev->dev_tracker, GFP_KERNEL); ax25_dev_device_down(): netdev_put(dev, &ax25_dev->dev_tracker); ax25_dev_free(): netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); Just to be sure: It's the list of active ax25-interface, correct? -> Looks good to me. 3. On device status change, i.e. NETDEV_DOWN, ax25_device_event() calls ax25_kill_by_device(dev) and ax25_dev_device_down(dev) (we saw in 2)). Before patches 7c6327c77d509e78bff76f2a4551fcfee851682e / d7c4c9e075f8cc6d88d277bc24e5d99297f03c06 we did in ax25_relase(): netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); and in ax25_bind(); netdev_hold(ax25_dev->dev, &ax25_dev->dev_tracker, GFP_ATOMIC); ax25_kill_by_device() traverses through the list of active connections (ax25_list): if (sk->sk_socket) { netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); ax25_dev_put(ax25_dev); } The patches mentioned above were: ax25_release(): netdev_put(ax25_dev->dev, &ax25->dev_tracker); ax25_bind(): netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); -> Previously, each session (ax25_cb) was added to it's device with the dev_tracker of the _device_ (&ax25_dev->dev_tracker) Now, each session (ax25_cb) is added with it's own dev_tracker &ax25->dev_tracker But: ax25_kill_by_device() goes through the list of active sessions and does netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); instead of the new concept; I would have expected: netdev_put(ax25_dev->dev, &s->dev_tracker); If we netdev_put() to a non-existent tracker, this may explain the warnings unregister_netdevice: waiting for bpq1 to become free. Usage count = -2 and refcount_t: underflow; use-after-free. that I observed in my previous mail. 4. Again, my question for understanding about the dev_tracker concept: should _all_ in- and outgoing sessions be tracked? If so, I argue we have to add - new outbound sessions to the tracker (i.E. if a IP mode VC frame is sent) - new inbound sessions to the tracker Also, ax25_bind() currently does not track all sessions: /* * User already set interface with SO_BINDTODEVICE */ if (ax25->ax25_dev != NULL) goto done; ... if (ax25_dev) { ax25_fillin_cb(ax25, ax25_dev); netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); } done: ax25_cb_add(ax25); -> If user has previously called ax25_setsockopt() with SO_BINDTODEVICE, the device has been added to ax25->ax25_dev. But ax25_bind() does not add it to the tracker list (see above, ax25->ax25_dev is != NULL). ax25_connect() does also currently not track all sessions: There's a spcial condition sock_flag(sk, SOCK_ZAPPED), where connect() is allowed to be called without having gone through ax25_bind(). Via ax25_rt_autobind(ax25, ..) it get's the appropriete ax25>dev. -> ax25_fillin_cb(ax25, ax25->ax25_dev); ax25_cb_add(ax25); => No tracker is added for this session. ax25_kill_by_device() does remove the tracker. I argued above, it removes the wrong tracker here: if (sk->sk_socket) { netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); ax25_dev_put(ax25_dev); } ..instead of &s->dev_tracker. And if we have trackers for all sessions (not only those that came via ax25_bind() from userspace), it's not enough to remove the tracker only for sessions with sk->socket. -> I would have expeted ax25_for_each(s, &ax25_list) { if (s->ax25_dev == ax25_dev) { if (s->ax25_dev == ax25_dev) { netdev_put(ax25_dev->dev, &s->dev_tracker); ... Finally: On normal session close (userspace program, or idle timer expiry), we need to have timers running for correct AX.25-session-close: If we are in connected state (AX25_STATE_3 or AX25_STATE_4), we need to send DISC in interval until max retry reached, or until we receive a DM-). In ax25_release(), we see: if (ax25_dev) { ... netdev_put(ax25_dev->dev, &ax25->dev_tracker); ax25_dev_put(ax25_dev); But we need have timers running for ax25_cb's that still refer to the network dev (and need to be there until correct session termination), we have a conflict here, because netdev_put() assures here we are not allowed to refer to the dev anymore. That needs to be resolved. Context: diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index bbac3cb4dc99d..d82a51e69386b 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1066,7 +1066,7 @@ static int ax25_release(struct socket *sock) del_timer_sync(&ax25->t3timer); del_timer_sync(&ax25->idletimer); } - netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); + netdev_put(ax25_dev->dev, &ax25->dev_tracker); ax25_dev_put(ax25_dev); } @@ -1147,7 +1147,7 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) if (ax25_dev) { ax25_fillin_cb(ax25, ax25_dev); - netdev_hold(ax25_dev->dev, &ax25_dev->dev_tracker, GFP_ATOMIC); + netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); } done: ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [AX25] patch did not fix -- was: ax25: fix incorrect dev_tracker usage 2022-09-10 11:49 ` Thomas Osterried @ 2022-10-24 18:00 ` Thomas Osterried 2022-10-24 20:56 ` Jakub Kicinski 0 siblings, 1 reply; 4+ messages in thread From: Thomas Osterried @ 2022-10-24 18:00 UTC (permalink / raw) To: Eric Dumazet, Jakub Kicinski Cc: David S . Miller, Paolo Abeni, Bernard Pidoux, Duoming Zhou, netdev, linux-hams Hello, perhaps my questions in my last post were too complex. I seem to not fully understand the concept of netdev_hold() / netdev_put(). I'll restart with my question "1" of my previous post, and for convenience, I try to answer my question by myself (-> "yes" or "no" makes it easier to answer ;) I) Why do we need these trackers? If we remove a network device, there may be active network structures (in our case AX.25-sessions) that use the device. It's a good idea to trace them (and inform), so we know that they linger around, and even more, we could wait until they are cleaned up properly. II) What consequences has the tracker counter? As far as I can see by kernel messages, the netdev tracker forces the kernel to wait (on ifdown (i.e. ifconfig ax0 down) or rmmod (i.e. rmmod bpqether or rmmod ax25) ) until all references to the network device are freed. If there's a bug (refcount > 0 or < 0), kernel obviously waits for ever. III) Is it only to track sessions initiated from userspace? I think no. But in the current implementation, netdev_hold() is only called in ax25_bind(). And even there, not in all cases. But netdev_put() is called in all cases -> wrong count possible. There's no netdev_hold() for new - incoming AX.25-connects or - outgoing AX.25-sessions (initiated by i.e. by a packet of type IP-over-AX.25, netrom or rose). "Luckily" (in regard of the refcounter), there's also no netdev_put() if the AX.25 session get disconnected. Can this be correct? Thank you for your help, - Thomas dl9sau > Am 10.09.2022 um 13:49 schrieb Thomas Osterried <thomas@osterried.de>: > > Hello, > > please allow me the question what the patch tries to fix. > > 1. we add sessions to the list of active sessions ax25_cb_add(ax25), > and remove them on close. > Why do we need a tracker? > > 2. ax25_dev.c: > ax25_dev_device_up(): > netdev_hold(dev, &ax25_dev->dev_tracker, GFP_KERNEL); > ax25_dev_device_down(): > netdev_put(dev, &ax25_dev->dev_tracker); > ax25_dev_free(): > netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); > > Just to be sure: It's the list of active ax25-interface, correct? > > -> Looks good to me. > > 3. On device status change, i.e. NETDEV_DOWN, ax25_device_event() calls > ax25_kill_by_device(dev) > and > ax25_dev_device_down(dev) (we saw in 2)). > > Before patches > 7c6327c77d509e78bff76f2a4551fcfee851682e / d7c4c9e075f8cc6d88d277bc24e5d99297f03c06 > we did > in ax25_relase(): netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); > and > in ax25_bind(); netdev_hold(ax25_dev->dev, &ax25_dev->dev_tracker, GFP_ATOMIC); > > ax25_kill_by_device() traverses through the list of active connections > (ax25_list): > if (sk->sk_socket) { > netdev_put(ax25_dev->dev, > &ax25_dev->dev_tracker); > ax25_dev_put(ax25_dev); > } > > The patches mentioned above were: > ax25_release(): netdev_put(ax25_dev->dev, &ax25->dev_tracker); > ax25_bind(): netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); > > -> Previously, each session (ax25_cb) was added to it's device with the > dev_tracker of the _device_ (&ax25_dev->dev_tracker) > Now, each session (ax25_cb) is added with it's own dev_tracker > &ax25->dev_tracker > But: > ax25_kill_by_device() goes through the list of active sessions and > does > netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); > instead of the new concept; I would have expected: > netdev_put(ax25_dev->dev, &s->dev_tracker); > > If we netdev_put() to a non-existent tracker, this may explain > the warnings > unregister_netdevice: waiting for bpq1 to become free. Usage count = -2 > and > refcount_t: underflow; use-after-free. > that I observed in my previous mail. > > > 4. Again, my question for understanding about the dev_tracker concept: > should _all_ in- and outgoing sessions be tracked? > If so, I argue we have to add > - new outbound sessions to the tracker (i.E. if a IP mode VC frame is > sent) > - new inbound sessions to the tracker > > > Also, ax25_bind() currently does not track all sessions: > > /* > * User already set interface with SO_BINDTODEVICE > */ > if (ax25->ax25_dev != NULL) > goto done; > ... > if (ax25_dev) { > ax25_fillin_cb(ax25, ax25_dev); > netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); > } > done: > ax25_cb_add(ax25); > > -> If user has previously called ax25_setsockopt() with SO_BINDTODEVICE, > the device has been added to ax25->ax25_dev. But ax25_bind() does not > add it to the tracker list (see above, ax25->ax25_dev is != NULL). > > > ax25_connect() does also currently not track all sessions: > > There's a spcial condition sock_flag(sk, SOCK_ZAPPED), where connect() > is allowed to be called without having gone through ax25_bind(). > Via ax25_rt_autobind(ax25, ..) it get's the appropriete ax25>dev. > -> ax25_fillin_cb(ax25, ax25->ax25_dev); > ax25_cb_add(ax25); > => No tracker is added for this session. > > > ax25_kill_by_device() does remove the tracker. I argued above, it removes > the wrong tracker here: > if (sk->sk_socket) { > netdev_put(ax25_dev->dev, > &ax25_dev->dev_tracker); > ax25_dev_put(ax25_dev); > } > ..instead of &s->dev_tracker. > > And if we have trackers for all sessions (not only those that came via > ax25_bind() from userspace), it's not enough to remove the tracker only > for sessions with sk->socket. > > -> I would have expeted > ax25_for_each(s, &ax25_list) { > if (s->ax25_dev == ax25_dev) { > if (s->ax25_dev == ax25_dev) { > netdev_put(ax25_dev->dev, &s->dev_tracker); > ... > > > Finally: > On normal session close (userspace program, or idle timer expiry), > we need to have timers running for correct AX.25-session-close: > If we are in connected state (AX25_STATE_3 or AX25_STATE_4), we need > to send DISC in interval until max retry reached, or until we receive > a DM-). > In ax25_release(), we see: > if (ax25_dev) { > ... > netdev_put(ax25_dev->dev, &ax25->dev_tracker); > ax25_dev_put(ax25_dev); > But we need have timers running for ax25_cb's that still refer to > the network dev (and need to be there until correct session > termination), we have a conflict here, because > netdev_put() assures here we are not allowed to refer to the > dev anymore. > That needs to be resolved. > > > Context: > > diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c > index bbac3cb4dc99d..d82a51e69386b 100644 > --- a/net/ax25/af_ax25.c > +++ b/net/ax25/af_ax25.c > @@ -1066,7 +1066,7 @@ static int ax25_release(struct socket *sock) > del_timer_sync(&ax25->t3timer); > del_timer_sync(&ax25->idletimer); > } > - netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker); > + netdev_put(ax25_dev->dev, &ax25->dev_tracker); > ax25_dev_put(ax25_dev); > } > > @@ -1147,7 +1147,7 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) > > if (ax25_dev) { > ax25_fillin_cb(ax25, ax25_dev); > - netdev_hold(ax25_dev->dev, &ax25_dev->dev_tracker, GFP_ATOMIC); > + netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); > } > > done: > > > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [AX25] patch did not fix -- was: ax25: fix incorrect dev_tracker usage 2022-10-24 18:00 ` Thomas Osterried @ 2022-10-24 20:56 ` Jakub Kicinski 0 siblings, 0 replies; 4+ messages in thread From: Jakub Kicinski @ 2022-10-24 20:56 UTC (permalink / raw) To: Thomas Osterried Cc: Eric Dumazet, David S . Miller, Paolo Abeni, Bernard Pidoux, Duoming Zhou, netdev, linux-hams On Mon, 24 Oct 2022 20:00:00 +0200 Thomas Osterried wrote: > II) What consequences has the tracker counter? > > As far as I can see by kernel messages, the netdev tracker forces the > kernel to wait > (on ifdown (i.e. ifconfig ax0 down) > or rmmod (i.e. rmmod bpqether or rmmod ax25) ) > until all references to the network device are freed. > If there's a bug (refcount > 0 or < 0), kernel obviously waits for ever. Small correction here - the wait is when netdev is unregistered, which often happens on rmmod, but also when user asks for a sw netdev to be deleted, or HW device is removed, etc. > III) Is it only to track sessions initiated from userspace? > > I think no. Correct, the trackers track the references taken on the device. Doesn't really matter if the reference is somehow tracable to a user request or not. Sorry for lack of input on the actual ax25 problem, I had looked briefly in September and it wasn't obvious how things work :S ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-10-24 20:56 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-09-10 7:16 [AX25] patch did not fix -- was: ax25: fix incorrect dev_tracker usage Thomas Osterried 2022-09-10 11:49 ` Thomas Osterried 2022-10-24 18:00 ` Thomas Osterried 2022-10-24 20:56 ` Jakub Kicinski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).