* [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly @ 2019-01-09 5:48 yangerkun 2019-01-09 21:02 ` Dave Chinner 0 siblings, 1 reply; 7+ messages in thread From: yangerkun @ 2019-01-09 5:48 UTC (permalink / raw) To: guaneryu; +Cc: houtao1, yangerkun, fstests Case generic/108 sometimes will fail while testing ext2, and the reson is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now add the loop to do rmmod to make sure scsi_debug can be removed correctly. Signed-off-by: yangerkun <yangerkun@huawei.com> --- common/scsi_debug | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/common/scsi_debug b/common/scsi_debug index d9aa0bd..890ec68 100644 --- a/common/scsi_debug +++ b/common/scsi_debug @@ -45,11 +45,13 @@ _put_scsi_debug_dev() { lsmod | grep -wq scsi_debug || return - n=2 + n=15 # use redirection not -q option of modprobe here, because -q of old # modprobe is only quiet when the module is not found, not when the # module is in use. - while [ $n -ge 0 ] && ! modprobe -nr scsi_debug >/dev/null 2>&1; do + while [ $n -ge 0 ]; do + modprobe -r scsi_debug >/dev/null 2>&1 + [ $? -eq 0 ] && return 0 sleep 1 n=$((n-1)) done -- 2.9.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly 2019-01-09 5:48 [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly yangerkun @ 2019-01-09 21:02 ` Dave Chinner 2019-01-12 2:20 ` yangerkun 0 siblings, 1 reply; 7+ messages in thread From: Dave Chinner @ 2019-01-09 21:02 UTC (permalink / raw) To: yangerkun; +Cc: guaneryu, houtao1, fstests On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > Case generic/108 sometimes will fail while testing ext2, and the reson > is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > add the loop to do rmmod to make sure scsi_debug can be removed > correctly. Why does 'rmmod scsi_debug' randomly fail? What bug does ext2 have that prevents the scsi debug module from being released and hence removed? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly 2019-01-09 21:02 ` Dave Chinner @ 2019-01-12 2:20 ` yangerkun 2019-01-13 15:21 ` Eryu Guan 2019-01-15 3:47 ` Dave Chinner 0 siblings, 2 replies; 7+ messages in thread From: yangerkun @ 2019-01-12 2:20 UTC (permalink / raw) To: Dave Chinner; +Cc: guaneryu, houtao1, fstests Dave Chinner wrote on 2019/1/10 5:02: > On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: >> Case generic/108 sometimes will fail while testing ext2, and the reson >> is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now >> add the loop to do rmmod to make sure scsi_debug can be removed >> correctly. > > Why does 'rmmod scsi_debug' randomly fail? > > What bug does ext2 have that prevents the scsi debug module from > being released and hence removed? It's not a bug with ext2, ever been existing in ext4 too. This patch is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why need this is that the behavior of udev cannot be speculated, so scsi_debug may rmmod failed since udev scan open the device and take the reference of module scsi_debug. Thanks, Kun. > > Cheers, > > Dave. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly 2019-01-12 2:20 ` yangerkun @ 2019-01-13 15:21 ` Eryu Guan 2019-01-15 3:47 ` Dave Chinner 1 sibling, 0 replies; 7+ messages in thread From: Eryu Guan @ 2019-01-13 15:21 UTC (permalink / raw) To: yangerkun; +Cc: Dave Chinner, houtao1, fstests On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: > > > Dave Chinner wrote on 2019/1/10 5:02: > > On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > > > Case generic/108 sometimes will fail while testing ext2, and the reson > > > is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > > > add the loop to do rmmod to make sure scsi_debug can be removed > > > correctly. > > > > Why does 'rmmod scsi_debug' randomly fail? > > > > What bug does ext2 have that prevents the scsi debug module from > > being released and hence removed? > > It's not a bug with ext2, ever been existing in ext4 too. This patch is a > reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' commit > d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why need this is > that the behavior of udev cannot be speculated, so scsi_debug may rmmod > failed since udev scan open the device and take the reference of module > scsi_debug. It's probably a good thing to have the specific reason in commit log too. Thanks, Eryu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly 2019-01-12 2:20 ` yangerkun 2019-01-13 15:21 ` Eryu Guan @ 2019-01-15 3:47 ` Dave Chinner 2019-01-15 5:31 ` yangerkun 1 sibling, 1 reply; 7+ messages in thread From: Dave Chinner @ 2019-01-15 3:47 UTC (permalink / raw) To: yangerkun; +Cc: guaneryu, houtao1, fstests On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: > > > Dave Chinner wrote on 2019/1/10 5:02: > >On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > >>Case generic/108 sometimes will fail while testing ext2, and the reson > >>is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > >>add the loop to do rmmod to make sure scsi_debug can be removed > >>correctly. > > > >Why does 'rmmod scsi_debug' randomly fail? > > > >What bug does ext2 have that prevents the scsi debug module from > >being released and hence removed? > > It's not a bug with ext2, ever been existing in ext4 too. This patch > is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' > commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why > need this is that the behavior of udev cannot be speculated, so > scsi_debug may rmmod failed since udev scan open the device and take > the reference of module scsi_debug. IOWs, you copied a hack from cryptsetup tests because you didn't know about $UDEV_SETTLE_PROG and didn't think to ask if anyone knew a solution to this problem? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly 2019-01-15 3:47 ` Dave Chinner @ 2019-01-15 5:31 ` yangerkun 2019-01-15 6:08 ` Dave Chinner 0 siblings, 1 reply; 7+ messages in thread From: yangerkun @ 2019-01-15 5:31 UTC (permalink / raw) To: Dave Chinner; +Cc: guaneryu, houtao1, fstests Dave Chinner wrote on 2019/1/15 11:47: > On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: >> >> >> Dave Chinner wrote on 2019/1/10 5:02: >>> On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: >>>> Case generic/108 sometimes will fail while testing ext2, and the reson >>>> is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now >>>> add the loop to do rmmod to make sure scsi_debug can be removed >>>> correctly. >>> >>> Why does 'rmmod scsi_debug' randomly fail? >>> >>> What bug does ext2 have that prevents the scsi debug module from >>> being released and hence removed? >> >> It's not a bug with ext2, ever been existing in ext4 too. This patch >> is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' >> commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why >> need this is that the behavior of udev cannot be speculated, so >> scsi_debug may rmmod failed since udev scan open the device and take >> the reference of module scsi_debug. > > IOWs, you copied a hack from cryptsetup tests because you didn't > know about $UDEV_SETTLE_PROG and didn't think to ask if anyone knew > a solution to this problem? I have test to add udev settle before really do 'rmmod scsi_debug', and the result is the same, sometimes 'rmmod scsi_debug' may failed too. For this question, i have ask Ondrej Konzina, the question and answer are as follow: Question: "Hi, While we are testing fs with xfstests, generic/108 will fail at some times with the reson that rmmod find the refcnt of scsi_debug module isn't 0 by checking /proc/modules. And this your patch is really describe the same thing. But, i am confusing about the scene as you said: cleanup() { if [ -d "$MNT_DIR" ] ; then umount -f $MNT_DIR 2>/dev/null rmdir $MNT_DIR 2>/dev/null fi rm -f $LOCAL_FILE 2> /dev/null - udevadm settle >/dev/null 2>&1 - rmmod scsi_debug 2> /dev/null + scsi_debug_teardown "$DEV" || exit 100 } After the settle, all the event has been finish, so who will trigger the scan, and how to scan? Can you help to explain this? Thanks a lot! " Answer: "Well, almost true. But by our experience udev settle command behaviour is not strictly defined (and also discouraged from using even by udev maintainers themselves IIRC). For example it will not synchronize you against events coming after you trigger the settle command. And you have zero control over when the event is added. So the long-term goal is to get rid of it from all tests in cryptsetup (not being on top of our todo list as you can clearly see if you grep scripts across tests subdirectory :)). Due to many existing udev versions among many distributions we found out there's no more reliable way than trying to remove scsi_debug in retry loop. And that loop also handles some weird versions (legacy versions but we do sometimes test even old distros) where udev fails to remove device nodes from /dev from time to time." Since settle can not resolve the problem, so there has no choice add the loop to do these. Maybe it is not a good way for this problem, do you have any suggestion? Cheer, Kun. > > Cheers, > > Dave. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly 2019-01-15 5:31 ` yangerkun @ 2019-01-15 6:08 ` Dave Chinner 0 siblings, 0 replies; 7+ messages in thread From: Dave Chinner @ 2019-01-15 6:08 UTC (permalink / raw) To: yangerkun; +Cc: guaneryu, houtao1, fstests On Tue, Jan 15, 2019 at 01:31:24PM +0800, yangerkun wrote: > > > Dave Chinner wrote on 2019/1/15 11:47: > >On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: > >> > >> > >>Dave Chinner wrote on 2019/1/10 5:02: > >>>On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > >>>>Case generic/108 sometimes will fail while testing ext2, and the reson > >>>>is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > >>>>add the loop to do rmmod to make sure scsi_debug can be removed > >>>>correctly. > >>> > >>>Why does 'rmmod scsi_debug' randomly fail? > >>> > >>>What bug does ext2 have that prevents the scsi debug module from > >>>being released and hence removed? > >> > >>It's not a bug with ext2, ever been existing in ext4 too. This patch > >>is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' > >>commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why > >>need this is that the behavior of udev cannot be speculated, so > >>scsi_debug may rmmod failed since udev scan open the device and take > >>the reference of module scsi_debug. > > > >IOWs, you copied a hack from cryptsetup tests because you didn't > >know about $UDEV_SETTLE_PROG and didn't think to ask if anyone knew > >a solution to this problem? > > I have test to add udev settle before really do 'rmmod scsi_debug', > and the result is the same, sometimes 'rmmod scsi_debug' may failed > too.i So what is generating the events that udev is running that pins the scsi debug device given that it is no longer in use? > For this question, i have ask Ondrej Konzina, the question and > answer are as follow: > > Question: > "Hi, > > While we are testing fs with xfstests, generic/108 will fail at some > times with the reson that rmmod find the refcnt of scsi_debug module > isn't 0 by checking /proc/modules. And this your patch is really > describe the same thing. But, i am confusing about the scene as you said: > cleanup() { > if [ -d "$MNT_DIR" ] ; then > umount -f $MNT_DIR 2>/dev/null > rmdir $MNT_DIR 2>/dev/null > fi > rm -f $LOCAL_FILE 2> /dev/null > - udevadm settle >/dev/null 2>&1 > - rmmod scsi_debug 2> /dev/null > + scsi_debug_teardown "$DEV" || exit 100 > } > > After the settle, all the event has been finish, so who will trigger the > scan, and how to scan? Can you help to explain this? Thanks a lot! > " > Answer: > "Well, almost true. But by our experience udev settle command > behaviour is not strictly defined (and also discouraged from using > even by udev maintainers themselves IIRC). For example it will not > synchronize you against events coming after you trigger the settle > command. So, in summary: We used to have wait loops like you are proposing, then the udevadm settle command was added and we replaced all the wait loops with settles because that worked reliably. Now we're at the stage where udevadm settle doesn't work any more, the maintainers have effectively deprecated it without a replacement, and so we have to go back to hard coding hacky wait loops everywhere to retry until udev releases whatever random devices it holds references to? Forget I said anything, I'm rapidly coming to the conclusion that I have far too high standards for writing "working" software these days. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-01-15 6:08 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-01-09 5:48 [PATCH] common/scsi_debug: make sure scsi_debug been removed correctly yangerkun 2019-01-09 21:02 ` Dave Chinner 2019-01-12 2:20 ` yangerkun 2019-01-13 15:21 ` Eryu Guan 2019-01-15 3:47 ` Dave Chinner 2019-01-15 5:31 ` yangerkun 2019-01-15 6:08 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox