From: Jan Kiszka
Date: Fri, 27 May 2016 11:39:07 +0200
Subject: Re: [Xenomai] [Xenomai-git] Jan Kiszka : cobalt/rtdm: Fix driver reference counting
To: Gilles Chanteperdrix
Cc: xenomai@xenomai.org
List-Id: Discussions about the Xenomai project
Message-ID: <574815BB.6060609@web.de>
In-Reply-To: <57480F0E.2050501@web.de>
References: <20160527065822.GH22660@hermes.click-hack.org>
 <57480435.7040408@web.de>
 <20160527083333.GK22660@hermes.click-hack.org>
 <57480F0E.2050501@web.de>

On 2016-05-27 11:10, Jan Kiszka wrote:
> On 2016-05-27 10:33, Gilles Chanteperdrix wrote:
>> On Fri, May 27, 2016 at 10:24:21AM +0200, Jan Kiszka wrote:
>>> On 2016-05-27 08:58, Gilles Chanteperdrix wrote:
>>>> On Fri, May 27, 2016 at 08:36:43AM +0200, git repository hosting wrote:
>>>>> Module: xenomai-jki
>>>>> Branch: for-forge
>>>>> Commit: c9d83776c0ed882c71045dc32b340b57f88c5e00
>>>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=c9d83776c0ed882c71045dc32b340b57f88c5e00
>>>>>
>>>>> Author: Jan Kiszka
>>>>> Date:   Fri May 27 08:32:41 2016 +0200
>>>>>
>>>>> cobalt/rtdm: Fix driver reference counting
>>>>>
>>>>> The rtdm smokey test triggered a BUG due to rtdm_dev_unregister not
>>>>> taking the reference counter of a driver into account. Fix this by
>>>>> moving the check into unregister_driver directly.
>>>>
>>>> Did you have a look at commit
>>>> 96e85548a56c8c7fbd6d64c079701483a8e5da27 ?
>>>> This looks like a revert, so since the commit was fixing something,
>>>> I believe you are reintroducing a bug.
>>>>
>>>
>>> Didn't see that. However, that commit was wrong because you were mixing
>>> up different reference counters. One is that for devices, which is
>>> decreased in __rtdm_put_device.
>>> The other is what un/register_driver
>>> have to handle: that of the corresponding rtdm_driver. A device might
>>> pass earlier than a driver because the latter may manage multiple
>>> devices - exactly what the unit test checks.
>>>
>>> Maybe you can describe what scenario was triggering the issue back
>>> then, and we can check if it reoccurred and fix it for good.
>>
>> Well, that was pretty simple, removing kernel modules registering
>> devices (in my case it was rtnet.ko), would fail to unregister some
>> part of the driver (some proc or sys files if I remember correctly),
>> so that reinserting the driver would first cause some warning, and
>> after several rmmod/insmod result in a crash or a simple failure I
>> do not remember. I traced that to the fact that the test for the
>> reference counter in unregister_driver was failing because the
>> reference counter had already been decremented elsewhere. This was a
>> long time ago, I do not remember all the details, but I think it is
>> something like that.
>>
>
> I can remove and reload a stack of rtnet, rt_e1000 and rtipv4 multiple
> times without any bug reports. /proc/rtnet also properly disappears and
> reappears. However, unloading and loading rtpacket causes an oops.
>
> BUG: unable to handle kernel paging request at ffffffffa01b5b20
> IP: [] blocking_notifier_chain_register+0x40/0xb0
> ...
> Call Trace:
>  [] cobalt_add_state_chain+0x18/0x20
>  [] rtdm_dev_register+0x1d3/0x660
>  [] ? ipipe_unstall_root+0x5c/0x90
>  [] ? do_one_initcall+0x80/0x1f0
>  [] ? 0xffffffffa01d5000
>  [] rt_packet_proto_init+0x15/0x48 [rtpacket]
>  [] ? 0xffffffffa01d5000
>  [] do_one_initcall+0x90/0x1f0
>
> Let me check that.
>

Pushed the proper fix for that. Was only affecting protocol drivers,
thus the exposure via rtnet.

Jan
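
For readers following the thread, the difference between the two counters
can be sketched in plain user-space C. This is only an illustration of the
idea described above (a per-device counter plus a separate per-driver
counter, with the driver-wide teardown decided at driver level). It is not
the Xenomai code: every name below (toy_driver, toy_device, toy_dev_put,
...) is hypothetical, and the boolean "state_live" merely stands in for
whatever driver-wide state (proc entries, notifier chains) a real driver
would own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver that may manage several devices. */
struct toy_driver {
    int dev_count;   /* driver-level counter: devices currently registered */
    bool state_live; /* placeholder for driver-wide state */
};

/* Hypothetical stand-in for one device instance. */
struct toy_device {
    struct toy_driver *drv;
    int refcount;    /* device-level counter: holders of this device */
    bool registered;
};

static void toy_dev_register(struct toy_device *dev, struct toy_driver *drv)
{
    /* The first device of a driver brings up the driver-wide state. */
    if (drv->dev_count++ == 0)
        drv->state_live = true;
    dev->drv = drv;
    dev->refcount = 1; /* the registration itself holds one reference */
    dev->registered = true;
}

static void toy_dev_put(struct toy_device *dev)
{
    /* Only the device counter changes here; the driver counter is a
     * different object and is deliberately left alone. */
    assert(dev->refcount > 0);
    dev->refcount--;
}

static void toy_dev_unregister(struct toy_device *dev)
{
    struct toy_driver *drv = dev->drv;

    toy_dev_put(dev); /* drop the reference taken at registration */
    dev->registered = false;

    /* Driver-level check: tear down the driver-wide state only when the
     * last device of this driver is gone. */
    if (--drv->dev_count == 0)
        drv->state_live = false;
}

/* Two devices share one driver; returns 0 if the state follows the
 * driver counter rather than either device counter. */
int toy_demo(void)
{
    struct toy_driver drv = { 0, false };
    struct toy_device a = { 0 }, b = { 0 };

    toy_dev_register(&a, &drv);
    toy_dev_register(&b, &drv);

    toy_dev_unregister(&a);
    if (!drv.state_live)
        return 1; /* b is still registered, state must survive */

    toy_dev_unregister(&b);
    if (drv.state_live)
        return 2; /* last device gone, state must be torn down */

    /* Re-registering, as on module reload, brings the state back up. */
    toy_dev_register(&a, &drv);
    if (!drv.state_live)
        return 3;
    toy_dev_unregister(&a);
    return drv.state_live ? 4 : 0;
}
```

In this toy model, decrementing the driver counter from toy_dev_put would
be the kind of mix-up the thread describes: the driver-level check would
then see a counter that had already been decremented elsewhere.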