* [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() @ 2021-12-27 2:03 Jia-Ju Bai 2021-12-27 2:08 ` Matthew Wilcox 0 siblings, 1 reply; 6+ messages in thread From: Jia-Ju Bai @ 2021-12-27 2:03 UTC (permalink / raw) To: viro, Jens Axboe, hch; +Cc: linux-fsdevel, linux-block, linux-kernel Hello, My static analysis tool reports several possible ABBA deadlocks in Linux 5.10: do_thaw_all_callback() down_write(&sb->s_umount); --> Line 1028 (Lock A) emergency_thaw_bdev() thaw_bdev() mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 602 (Lock B) freeze_bdev() mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 556 (Lock B) freeze_super() down_write(&sb->s_umount); --> Line 1716 (Lock A) down_write(&sb->s_umount); --> Line 1738 (Lock A) deactivate_super() down_write(&s->s_umount); --> Line 365 (Lock A) When do_thaw_all_callback() and freeze_bdev() are concurrently executed, the deadlocks can occur. I am not quite sure whether these possible deadlocks are real and how to fix them if them are real. Any feedback would be appreciated, thanks :) Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Best wishes, Jia-Ju Bai ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() 2021-12-27 2:03 [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() Jia-Ju Bai @ 2021-12-27 2:08 ` Matthew Wilcox 2021-12-27 4:39 ` Theodore Ts'o 0 siblings, 1 reply; 6+ messages in thread From: Matthew Wilcox @ 2021-12-27 2:08 UTC (permalink / raw) To: Jia-Ju Bai Cc: viro, Jens Axboe, hch, linux-fsdevel, linux-block, linux-kernel On Mon, Dec 27, 2021 at 10:03:35AM +0800, Jia-Ju Bai wrote: > My static analysis tool reports several possible ABBA deadlocks in Linux > 5.10: > > do_thaw_all_callback() > down_write(&sb->s_umount); --> Line 1028 (Lock A) > emergency_thaw_bdev() > thaw_bdev() > mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 602 (Lock B) > > freeze_bdev() > mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 556 (Lock B) > freeze_super() > down_write(&sb->s_umount); --> Line 1716 (Lock A) > down_write(&sb->s_umount); --> Line 1738 (Lock A) > deactivate_super() > down_write(&s->s_umount); --> Line 365 (Lock A) > > When do_thaw_all_callback() and freeze_bdev() are concurrently executed, the > deadlocks can occur. > > I am not quite sure whether these possible deadlocks are real and how to fix > them if them are real. > Any feedback would be appreciated, thanks :) As a rule, ABBA deadlocks that can actually occur are already found by lockdep. Tools that think they've found something are generally wrong. I'm not inclined to look in detail to find out why this tool is wrong because lockdep is so effective. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() 2021-12-27 2:08 ` Matthew Wilcox @ 2021-12-27 4:39 ` Theodore Ts'o 2021-12-27 9:32 ` Jia-Ju Bai 0 siblings, 1 reply; 6+ messages in thread From: Theodore Ts'o @ 2021-12-27 4:39 UTC (permalink / raw) To: Matthew Wilcox Cc: Jia-Ju Bai, viro, Jens Axboe, hch, linux-fsdevel, linux-block, linux-kernel On Mon, Dec 27, 2021 at 02:08:58AM +0000, Matthew Wilcox wrote: > On Mon, Dec 27, 2021 at 10:03:35AM +0800, Jia-Ju Bai wrote: > > My static analysis tool reports several possible ABBA deadlocks in Linux > > 5.10: > > > > do_thaw_all_callback() > > down_write(&sb->s_umount); --> Line 1028 (Lock A) > > emergency_thaw_bdev() > > thaw_bdev() > > mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 602 (Lock B) > > > > freeze_bdev() > > mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 556 (Lock B) > > freeze_super() > > down_write(&sb->s_umount); --> Line 1716 (Lock A) > > down_write(&sb->s_umount); --> Line 1738 (Lock A) > > deactivate_super() > > down_write(&s->s_umount); --> Line 365 (Lock A) > > > > When do_thaw_all_callback() and freeze_bdev() are concurrently executed, the > > deadlocks can occur. > > > > I am not quite sure whether these possible deadlocks are real and how to fix > > them if them are real. > > Any feedback would be appreciated, thanks :) > > As a rule, ABBA deadlocks that can actually occur are already found by > lockdep. Tools that think they've found something are generally wrong. > I'm not inclined to look in detail to find out why this tool is wrong > because lockdep is so effective. Well, to be fair, lockdep will only find problems if both code paths are actually executed during a boot session where lockdep is active. In this particular case, "do_thaw_all_callback()" is called only from emergency_thaw_all(), which is executed via a magic-sysrq. (Sysrq-j). In practice, this sysrq is almost never used except to work around userspace bugs where a particular block device is frozen via the FIFREEZE ioctl, and never thawed via the FITHAW ioctl. So unless we had, say, an xfstest which tried to simulate triggering sysrq-j (e.g., via "echo j > /proc/sysrq-trigger"), lockdep would never find it. Of course, how likely is it that a user would try to trigger sysrq-j, because the user was trying to debug a buggy program that froze a block device and then, say, crashed before it had a chance to thaw it? It's probably pretty darned unlikely. So as to whether or not it's real, I'm sure we could probably trigger the deadlock using an artificial workload if you had one process constantly calling FIFREEZE and FITHAW on a block device, and other process constantly triggering "echo j > /proc/sysrq-trigger". So it *technically* could happen. Is it *likely* to happen under any kind of normal workload? Not hardly.... This makes it fall in the category of, "patch to fix something that never happens in real life, and would require root privs to trigger, and root can screw over the system in enough other ways anyway so it's kind of pointless", versus "let's try to shut up the static checker so we can find real bugs". And there I'd agree with Willy; I run xfstests with lockdep enabled, and given that the code coverage of xfstests is pretty good, I'm confident that any ABBA deadlocks that are *likely* to happen in real life tend to be found quickly, and fixed. If someone wanted to rewrite the emergency_thaw codepath to fix the locking order, in my opinion it's *technically* a bug fix. But it's the sort of thing which gets categorized as a P2 bug, and after a year, gets dropped down to P3, and a year after that, dropped down to P4 and ignored, since for most engineering organizations, resources are finite, and while this is a real bug, for most companies it's not worth fixing. Cheers, - Ted ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() 2021-12-27 4:39 ` Theodore Ts'o @ 2021-12-27 9:32 ` Jia-Ju Bai 2021-12-27 13:43 ` Theodore Ts'o 0 siblings, 1 reply; 6+ messages in thread From: Jia-Ju Bai @ 2021-12-27 9:32 UTC (permalink / raw) To: Theodore Ts'o, Matthew Wilcox Cc: viro, Jens Axboe, hch, linux-fsdevel, linux-block, linux-kernel On 2021/12/27 12:39, Theodore Ts'o wrote: > On Mon, Dec 27, 2021 at 02:08:58AM +0000, Matthew Wilcox wrote: >> On Mon, Dec 27, 2021 at 10:03:35AM +0800, Jia-Ju Bai wrote: >>> My static analysis tool reports several possible ABBA deadlocks in Linux >>> 5.10: >>> >>> do_thaw_all_callback() >>> down_write(&sb->s_umount); --> Line 1028 (Lock A) >>> emergency_thaw_bdev() >>> thaw_bdev() >>> mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 602 (Lock B) >>> >>> freeze_bdev() >>> mutex_lock(&bdev->bd_fsfreeze_mutex); --> Line 556 (Lock B) >>> freeze_super() >>> down_write(&sb->s_umount); --> Line 1716 (Lock A) >>> down_write(&sb->s_umount); --> Line 1738 (Lock A) >>> deactivate_super() >>> down_write(&s->s_umount); --> Line 365 (Lock A) >>> >>> When do_thaw_all_callback() and freeze_bdev() are concurrently executed, the >>> deadlocks can occur. >>> >>> I am not quite sure whether these possible deadlocks are real and how to fix >>> them if them are real. >>> Any feedback would be appreciated, thanks :) >> As a rule, ABBA deadlocks that can actually occur are already found by >> lockdep. Tools that think they've found something are generally wrong. >> I'm not inclined to look in detail to find out why this tool is wrong >> because lockdep is so effective. > Well, to be fair, lockdep will only find problems if both code paths > are actually executed during a boot session where lockdep is active. > > In this particular case, "do_thaw_all_callback()" is called only from > emergency_thaw_all(), which is executed via a magic-sysrq. (Sysrq-j). > In practice, this sysrq is almost never used except to work around > userspace bugs where a particular block device is frozen via the > FIFREEZE ioctl, and never thawed via the FITHAW ioctl. > > So unless we had, say, an xfstest which tried to simulate triggering > sysrq-j (e.g., via "echo j > /proc/sysrq-trigger"), lockdep would > never find it. Of course, how likely is it that a user would try to > trigger sysrq-j, because the user was trying to debug a buggy program > that froze a block device and then, say, crashed before it had a > chance to thaw it? It's probably pretty darned unlikely. > > So as to whether or not it's real, I'm sure we could probably trigger > the deadlock using an artificial workload if you had one process > constantly calling FIFREEZE and FITHAW on a block device, and other > process constantly triggering "echo j > /proc/sysrq-trigger". So it > *technically* could happen. Is it *likely* to happen under any kind > of normal workload? Not hardly.... > > This makes it fall in the category of, "patch to fix something that > never happens in real life, and would require root privs to trigger, > and root can screw over the system in enough other ways anyway so it's > kind of pointless", versus "let's try to shut up the static checker so > we can find real bugs". > > And there I'd agree with Willy; I run xfstests with lockdep enabled, > and given that the code coverage of xfstests is pretty good, I'm > confident that any ABBA deadlocks that are *likely* to happen in real > life tend to be found quickly, and fixed. > > If someone wanted to rewrite the emergency_thaw codepath to fix the > locking order, in my opinion it's *technically* a bug fix. But it's > the sort of thing which gets categorized as a P2 bug, and after a > year, gets dropped down to P3, and a year after that, dropped down to > P4 and ignored, since for most engineering organizations, resources > are finite, and while this is a real bug, for most companies it's not > worth fixing. Thanks for your reply and suggestions. I will try to trigger this possible deadlock by enabling lockdep and using the workloads that you suggested. In my opinion, static analysis can conveniently cover some code that is hard to be covered at runtime, and thus it is useful to detecting some infrequently-triggered bugs. However, it is true that static analysis sometimes has many false positives, which is unsatisfactory :( I am trying some works to relieve this problem in kernel-code analysis. I can understand that the related code is not frequently executed, but I think that finding and fixing bugs should be always useful in practice :) Best wishes, Jia-Ju Bai ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() 2021-12-27 9:32 ` Jia-Ju Bai @ 2021-12-27 13:43 ` Theodore Ts'o 2021-12-28 2:31 ` Jia-Ju Bai 0 siblings, 1 reply; 6+ messages in thread From: Theodore Ts'o @ 2021-12-27 13:43 UTC (permalink / raw) To: Jia-Ju Bai Cc: Matthew Wilcox, viro, Jens Axboe, hch, linux-fsdevel, linux-block, linux-kernel On Mon, Dec 27, 2021 at 05:32:09PM +0800, Jia-Ju Bai wrote: > Thanks for your reply and suggestions. > I will try to trigger this possible deadlock by enabling lockdep and using > the workloads that you suggested. > In my opinion, static analysis can conveniently cover some code that is hard > to be covered at runtime, and thus it is useful to detecting some > infrequently-triggered bugs. > However, it is true that static analysis sometimes has many false positives, > which is unsatisfactory :( > I am trying some works to relieve this problem in kernel-code analysis. > I can understand that the related code is not frequently executed, but I > think that finding and fixing bugs should be always useful in practice :) The thing about the sysrq commands is that they are almost always used in emergency situations when the system administrator with physical access to the console sends a sysrq command (e.g., by sending a BREAK to the serial console). This is usually done when the system has *already* locked up for some reason, such as getting livelocked due to an out of memory condition, or maybe even a deadlock. So if sysrq-j could potentially cause a deadlock, so what? Sysrq-j would only be used when the system was in a really bad state due to a bug in any case. In over 10 years of kernel development, I can't remember a single time when I've needed to use sysrq-j. So it might be that the better way to handle this would be to make sure all of the emergency sysrq code in fs/super.c is under the CONFIG_MAGIC_SYSRQ #ifdef --- and then do the static analysis without CONFIG_MAGIC_SYSRQ defined. As I said, I agree it's a bug, and if I had infinite resources, I'd certainly ask an engineer to completely rework the emergency sysrq-j code path to address the potential ABBA deadlock. The problem is I do *not* have infinite resources, which means I have to prioritize which bugs get attention, and how much time engineers on my team spend working on new features or performance enhacements that can justify their salaries and ensure that they get good performance ratings --- since leadership, technical difficulty and business impact is how engineers get judged at my company. Unfortunately, judging business impact is one of those things that is unfair to expect a static analyzer to do. And after all, if we have infinite resources, why should an OS bother with a VM? We can just pin all process text/data segments in memory, if money (and DRAM availability in the supply chain) is no object. :-) Cheers, - Ted ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() 2021-12-27 13:43 ` Theodore Ts'o @ 2021-12-28 2:31 ` Jia-Ju Bai 0 siblings, 0 replies; 6+ messages in thread From: Jia-Ju Bai @ 2021-12-28 2:31 UTC (permalink / raw) To: Theodore Ts'o Cc: Matthew Wilcox, viro, Jens Axboe, hch, linux-fsdevel, linux-block, linux-kernel On 2021/12/27 21:43, Theodore Ts'o wrote: > On Mon, Dec 27, 2021 at 05:32:09PM +0800, Jia-Ju Bai wrote: >> Thanks for your reply and suggestions. >> I will try to trigger this possible deadlock by enabling lockdep and using >> the workloads that you suggested. >> In my opinion, static analysis can conveniently cover some code that is hard >> to be covered at runtime, and thus it is useful to detecting some >> infrequently-triggered bugs. >> However, it is true that static analysis sometimes has many false positives, >> which is unsatisfactory :( >> I am trying some works to relieve this problem in kernel-code analysis. >> I can understand that the related code is not frequently executed, but I >> think that finding and fixing bugs should be always useful in practice :) > The thing about the sysrq commands is that they are almost always used > in emergency situations when the system administrator with physical > access to the console sends a sysrq command (e.g., by sending a BREAK > to the serial console). This is usually done when the system has > *already* locked up for some reason, such as getting livelocked due to > an out of memory condition, or maybe even a deadlock. So if sysrq-j > could potentially cause a deadlock, so what? Sysrq-j would only be > used when the system was in a really bad state due to a bug in any > case. In over 10 years of kernel development, I can't remember a > single time when I've needed to use sysrq-j. > > So it might be that the better way to handle this would be to make > sure all of the emergency sysrq code in fs/super.c is under the > CONFIG_MAGIC_SYSRQ #ifdef --- and then do the static analysis without > CONFIG_MAGIC_SYSRQ defined. Thanks for the explanation. In fact, I did not know the sysrq commands, before finding this bug and seeing your explanation. > > As I said, I agree it's a bug, and if I had infinite resources, I'd > certainly ask an engineer to completely rework the emergency sysrq-j > code path to address the potential ABBA deadlock. The problem is I do > *not* have infinite resources, which means I have to prioritize which > bugs get attention, and how much time engineers on my team spend > working on new features or performance enhacements that can justify > their salaries and ensure that they get good performance ratings --- > since leadership, technical difficulty and business impact is how > engineers get judged at my company. I can understand the priority of bug fixing, with the consideration of resources and time. My static analysis tool just provides a small message that there is a possible bug :) > > Unfortunately, judging business impact is one of those things that is > unfair to expect a static analyzer to do. Thanks for your understanding :) Before seeing your explanation, I have no idea of business impact. But it is indeed practical to consider business impact and resource assignment in kernel development. > And after all, if we have > infinite resources, why should an OS bother with a VM? We can just > pin all process text/data segments in memory, if money (and DRAM > availability in the supply chain) is no object. :-) Haha, interesting idea :) Thanks a lot, Jia-Ju Bai ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-12-28 2:31 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-12-27 2:03 [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev() Jia-Ju Bai 2021-12-27 2:08 ` Matthew Wilcox 2021-12-27 4:39 ` Theodore Ts'o 2021-12-27 9:32 ` Jia-Ju Bai 2021-12-27 13:43 ` Theodore Ts'o 2021-12-28 2:31 ` Jia-Ju Bai
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).