* [bug report] scsi: target: tcmu: Fix possible data corruption
@ 2022-05-04 15:12 Dan Carpenter
2022-05-06 8:47 ` Xiaoguang Wang
2022-05-08 18:03 ` Bodo Stroesser
0 siblings, 2 replies; 8+ messages in thread
From: Dan Carpenter @ 2022-05-04 15:12 UTC (permalink / raw)
To: xiaoguang.wang; +Cc: target-devel
Hello Xiaoguang Wang,
The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
corruption" from Apr 21, 2022, leads to the following Smatch static
checker warning:
drivers/target/target_core_user.c:1689 tcmu_blocks_release()
warn: sleeping in atomic context
drivers/target/target_core_user.c
1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
1662 unsigned long last)
1663 {
1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
1665 struct page *page;
1666 u32 pages_freed = 0;
1667
1668 xas_lock(&xas);
^^^^^^^^^^^^^^
We take a spinlock here.
1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
1670 xas_store(&xas, NULL);
1671 /*
1672 * While reaching here there may be page faults occurring on
1673 * the to-be-released pages. A race condition may occur if
1674 * unmap_mapping_range() is called before page faults on these
1675 * pages have completed; a valid but stale map is created.
1676 *
1677 * If another command subsequently runs and needs to extend
1678 * dbi_thresh, it may reuse the slot corresponding to the
1679 * previous page in data_bitmap. Though we will allocate a new
1680 * page for the slot in data_area, no page fault will happen
1681 * because we have a valid map. Therefore the command's data
1682 * will be lost.
1683 *
1684 * We lock and unlock pages that are to be released to ensure
1685 * all page faults have completed. This way
1686 * unmap_mapping_range() can ensure stale maps are cleanly
1687 * removed.
1688 */
--> 1689 lock_page(page);
^^^^^^^^^^^^^^^
The lock_page() function calls might_sleep() (inside the declaration
block).
1690 unlock_page(page);
1691 __free_page(page);
1692 pages_freed++;
1693 }
1694 xas_unlock(&xas);
^^^^^^^^^^^^^^^^^
Unlock
1695
1696 atomic_sub(pages_freed, &global_page_count);
1697
1698 return pages_freed;
1699 }
regards,
dan carpenter
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-04 15:12 [bug report] scsi: target: tcmu: Fix possible data corruption Dan Carpenter @ 2022-05-06 8:47 ` Xiaoguang Wang 2022-05-08 18:03 ` Bodo Stroesser 1 sibling, 0 replies; 8+ messages in thread From: Xiaoguang Wang @ 2022-05-06 8:47 UTC (permalink / raw) To: Dan Carpenter; +Cc: target-devel hi, Sorry for late response, I missed this mail. Thanks for this report. Regards, Xiaoguang Wang > Hello Xiaoguang Wang, > > The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data > corruption" from Apr 21, 2022, leads to the following Smatch static > checker warning: > > drivers/target/target_core_user.c:1689 tcmu_blocks_release() > warn: sleeping in atomic context > > drivers/target/target_core_user.c > 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, > 1662 unsigned long last) > 1663 { > 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); > 1665 struct page *page; > 1666 u32 pages_freed = 0; > 1667 > 1668 xas_lock(&xas); > ^^^^^^^^^^^^^^ > We take a spinlock here. > > > 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { > 1670 xas_store(&xas, NULL); > 1671 /* > 1672 * While reaching here there may be page faults occurring on > 1673 * the to-be-released pages. A race condition may occur if > 1674 * unmap_mapping_range() is called before page faults on these > 1675 * pages have completed; a valid but stale map is created. > 1676 * > 1677 * If another command subsequently runs and needs to extend > 1678 * dbi_thresh, it may reuse the slot corresponding to the > 1679 * previous page in data_bitmap. Though we will allocate a new > 1680 * page for the slot in data_area, no page fault will happen > 1681 * because we have a valid map. Therefore the command's data > 1682 * will be lost. > 1683 * > 1684 * We lock and unlock pages that are to be released to ensure > 1685 * all page faults have completed. This way > 1686 * unmap_mapping_range() can ensure stale maps are cleanly > 1687 * removed. > 1688 */ > --> 1689 lock_page(page); > ^^^^^^^^^^^^^^^ > The lock_page() function calls might_sleep() (inside the declaration > block). > > 1690 unlock_page(page); > 1691 __free_page(page); > 1692 pages_freed++; > 1693 } > 1694 xas_unlock(&xas); > ^^^^^^^^^^^^^^^^^ > Unlock > > 1695 > 1696 atomic_sub(pages_freed, &global_page_count); > 1697 > 1698 return pages_freed; > 1699 } > > regards, > dan carpenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-04 15:12 [bug report] scsi: target: tcmu: Fix possible data corruption Dan Carpenter 2022-05-06 8:47 ` Xiaoguang Wang @ 2022-05-08 18:03 ` Bodo Stroesser 2022-05-09 3:13 ` Xiaoguang Wang 2022-05-09 6:05 ` Dan Carpenter 1 sibling, 2 replies; 8+ messages in thread From: Bodo Stroesser @ 2022-05-08 18:03 UTC (permalink / raw) To: Dan Carpenter, xiaoguang.wang; +Cc: target-devel Hi Dan, Thank you for the report. I'm quite sure that our code does not cause any problems, because in tcmu we explicitly or implicitly take the xarray's lock only while holding the so called cmdr_lock mutex. Also, we take the lock without disabling irq or bh. So I think there is no problem if lock_page sleeps while we hold the xarray's lock. Of course, we wouldn't need the xarray lock at all, but xarray code forces us to take it to avoid warnings. In tcmu_blocks_release we use the advanced xarray API to keep the overhead small. It allows us to lock/unlock before and after the loop only. If there is no other way to avoid the Smatch warning, we could easily put additional xas_unlock() and xas_lock() around the lock_page/unlock_page block. But if there is a way to avoid the warning without imposing overhead, I would of course prefer it. Regards, Bodo On 04.05.22 17:12, Dan Carpenter wrote: > Hello Xiaoguang Wang, > > The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data > corruption" from Apr 21, 2022, leads to the following Smatch static > checker warning: > > drivers/target/target_core_user.c:1689 tcmu_blocks_release() > warn: sleeping in atomic context > > drivers/target/target_core_user.c > 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, > 1662 unsigned long last) > 1663 { > 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); > 1665 struct page *page; > 1666 u32 pages_freed = 0; > 1667 > 1668 xas_lock(&xas); > ^^^^^^^^^^^^^^ > We take a spinlock here. > > > 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { > 1670 xas_store(&xas, NULL); > 1671 /* > 1672 * While reaching here there may be page faults occurring on > 1673 * the to-be-released pages. A race condition may occur if > 1674 * unmap_mapping_range() is called before page faults on these > 1675 * pages have completed; a valid but stale map is created. > 1676 * > 1677 * If another command subsequently runs and needs to extend > 1678 * dbi_thresh, it may reuse the slot corresponding to the > 1679 * previous page in data_bitmap. Though we will allocate a new > 1680 * page for the slot in data_area, no page fault will happen > 1681 * because we have a valid map. Therefore the command's data > 1682 * will be lost. > 1683 * > 1684 * We lock and unlock pages that are to be released to ensure > 1685 * all page faults have completed. This way > 1686 * unmap_mapping_range() can ensure stale maps are cleanly > 1687 * removed. > 1688 */ > --> 1689 lock_page(page); > ^^^^^^^^^^^^^^^ > The lock_page() function calls might_sleep() (inside the declaration > block). > > 1690 unlock_page(page); > 1691 __free_page(page); > 1692 pages_freed++; > 1693 } > 1694 xas_unlock(&xas); > ^^^^^^^^^^^^^^^^^ > Unlock > > 1695 > 1696 atomic_sub(pages_freed, &global_page_count); > 1697 > 1698 return pages_freed; > 1699 } > > regards, > dan carpenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-08 18:03 ` Bodo Stroesser @ 2022-05-09 3:13 ` Xiaoguang Wang 2022-05-09 6:05 ` Dan Carpenter 1 sibling, 0 replies; 8+ messages in thread From: Xiaoguang Wang @ 2022-05-09 3:13 UTC (permalink / raw) To: Bodo Stroesser, Dan Carpenter; +Cc: target-devel hi, > Hi Dan, > > Thank you for the report. > > I'm quite sure that our code does not cause any problems, because > in tcmu we explicitly or implicitly take the xarray's lock only while > holding the so called cmdr_lock mutex. Also, we take the lock without > disabling irq or bh. So I think there is no problem if lock_page sleeps > while we hold the xarray's lock. > > Of course, we wouldn't need the xarray lock at all, but xarray code > forces us to take it to avoid warnings. I also see it now, thanks. Indeed I was going to prepare patch which removes this xarry lock calls. Regards, Xiaoguang Wang > > In tcmu_blocks_release we use the advanced xarray API to keep the > overhead small. It allows us to lock/unlock before and after the loop > only. If there is no other way to avoid the Smatch warning, we could > easily put additional xas_unlock() and xas_lock() around the > lock_page/unlock_page block. > > But if there is a way to avoid the warning without imposing overhead, > I would of course prefer it. > > Regards, > Bodo > > > On 04.05.22 17:12, Dan Carpenter wrote: >> Hello Xiaoguang Wang, >> >> The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data >> corruption" from Apr 21, 2022, leads to the following Smatch static >> checker warning: >> >> drivers/target/target_core_user.c:1689 tcmu_blocks_release() >> warn: sleeping in atomic context >> >> drivers/target/target_core_user.c >> 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, >> 1662 unsigned long last) >> 1663 { >> 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); >> 1665 struct page *page; >> 1666 u32 pages_freed = 0; >> 1667 >> 1668 xas_lock(&xas); >> ^^^^^^^^^^^^^^ >> We take a spinlock here. >> >> >> 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { >> 1670 xas_store(&xas, NULL); >> 1671 /* >> 1672 * While reaching here there may be page faults occurring on >> 1673 * the to-be-released pages. A race condition may occur if >> 1674 * unmap_mapping_range() is called before page faults on these >> 1675 * pages have completed; a valid but stale map is created. >> 1676 * >> 1677 * If another command subsequently runs and needs to extend >> 1678 * dbi_thresh, it may reuse the slot corresponding to the >> 1679 * previous page in data_bitmap. Though we will allocate a new >> 1680 * page for the slot in data_area, no page fault will happen >> 1681 * because we have a valid map. Therefore the command's data >> 1682 * will be lost. >> 1683 * >> 1684 * We lock and unlock pages that are to be released to ensure >> 1685 * all page faults have completed. This way >> 1686 * unmap_mapping_range() can ensure stale maps are cleanly >> 1687 * removed. >> 1688 */ >> --> 1689 lock_page(page); >> ^^^^^^^^^^^^^^^ >> The lock_page() function calls might_sleep() (inside the declaration >> block). >> >> 1690 unlock_page(page); >> 1691 __free_page(page); >> 1692 pages_freed++; >> 1693 } >> 1694 xas_unlock(&xas); >> ^^^^^^^^^^^^^^^^^ >> Unlock >> >> 1695 >> 1696 atomic_sub(pages_freed, &global_page_count); >> 1697 >> 1698 return pages_freed; >> 1699 } >> >> regards, >> dan carpenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-08 18:03 ` Bodo Stroesser 2022-05-09 3:13 ` Xiaoguang Wang @ 2022-05-09 6:05 ` Dan Carpenter 2022-05-09 18:12 ` Matthew Wilcox 2022-05-09 18:28 ` Bodo Stroesser 1 sibling, 2 replies; 8+ messages in thread From: Dan Carpenter @ 2022-05-09 6:05 UTC (permalink / raw) To: Bodo Stroesser, Matthew Wilcox; +Cc: xiaoguang.wang, target-devel > If there is no other way to avoid the Smatch warning, The Smatch warning is not the issue. If we're holding a spinlock and we call might_sleep() then that generates a stack trace at runtime if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled. Probably enabling CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC process. Anyway, it sounds you're just doing the locking to silence a warning in xarray. Let's ask Matthew if he has a hint. regards, dan carpenter On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote: > Hi Dan, > > Thank you for the report. > > I'm quite sure that our code does not cause any problems, because > in tcmu we explicitly or implicitly take the xarray's lock only while > holding the so called cmdr_lock mutex. Also, we take the lock without > disabling irq or bh. So I think there is no problem if lock_page sleeps > while we hold the xarray's lock. > > Of course, we wouldn't need the xarray lock at all, but xarray code > forces us to take it to avoid warnings. > > In tcmu_blocks_release we use the advanced xarray API to keep the > overhead small. It allows us to lock/unlock before and after the loop > only. If there is no other way to avoid the Smatch warning, we could > easily put additional xas_unlock() and xas_lock() around the > lock_page/unlock_page block. > > But if there is a way to avoid the warning without imposing overhead, > I would of course prefer it. > > Regards, > Bodo > > > On 04.05.22 17:12, Dan Carpenter wrote: > > Hello Xiaoguang Wang, > > > > The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data > > corruption" from Apr 21, 2022, leads to the following Smatch static > > checker warning: > > > > drivers/target/target_core_user.c:1689 tcmu_blocks_release() > > warn: sleeping in atomic context > > > > drivers/target/target_core_user.c > > 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, > > 1662 unsigned long last) > > 1663 { > > 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); > > 1665 struct page *page; > > 1666 u32 pages_freed = 0; > > 1667 > > 1668 xas_lock(&xas); > > ^^^^^^^^^^^^^^ > > We take a spinlock here. > > > > > > 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { > > 1670 xas_store(&xas, NULL); > > 1671 /* > > 1672 * While reaching here there may be page faults occurring on > > 1673 * the to-be-released pages. A race condition may occur if > > 1674 * unmap_mapping_range() is called before page faults on these > > 1675 * pages have completed; a valid but stale map is created. > > 1676 * > > 1677 * If another command subsequently runs and needs to extend > > 1678 * dbi_thresh, it may reuse the slot corresponding to the > > 1679 * previous page in data_bitmap. Though we will allocate a new > > 1680 * page for the slot in data_area, no page fault will happen > > 1681 * because we have a valid map. Therefore the command's data > > 1682 * will be lost. > > 1683 * > > 1684 * We lock and unlock pages that are to be released to ensure > > 1685 * all page faults have completed. This way > > 1686 * unmap_mapping_range() can ensure stale maps are cleanly > > 1687 * removed. > > 1688 */ > > --> 1689 lock_page(page); > > ^^^^^^^^^^^^^^^ > > The lock_page() function calls might_sleep() (inside the declaration > > block). > > > > 1690 unlock_page(page); > > 1691 __free_page(page); > > 1692 pages_freed++; > > 1693 } > > 1694 xas_unlock(&xas); > > ^^^^^^^^^^^^^^^^^ > > Unlock > > > > 1695 > > 1696 atomic_sub(pages_freed, &global_page_count); > > 1697 > > 1698 return pages_freed; > > 1699 } > > > > regards, > > dan carpenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-09 6:05 ` Dan Carpenter @ 2022-05-09 18:12 ` Matthew Wilcox 2022-05-09 19:22 ` Bodo Stroesser 2022-05-09 18:28 ` Bodo Stroesser 1 sibling, 1 reply; 8+ messages in thread From: Matthew Wilcox @ 2022-05-09 18:12 UTC (permalink / raw) To: Dan Carpenter; +Cc: Bodo Stroesser, xiaoguang.wang, target-devel On Mon, May 09, 2022 at 09:05:45AM +0300, Dan Carpenter wrote: > > If there is no other way to avoid the Smatch warning, > > The Smatch warning is not the issue. If we're holding a spinlock and > we call might_sleep() then that generates a stack trace at runtime if > you have CONFIG_DEBUG_ATOMIC_SLEEP enabled. Probably enabling > CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC > process. > > Anyway, it sounds you're just doing the locking to silence a warning in > xarray. Let's ask Matthew if he has a hint. I suspect that the tcmu code is doing something horrible & wrong again. It keeps poking at VM internals without understanding of what's going on or why. Anyway ... > On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote: > > I'm quite sure that our code does not cause any problems, because > > in tcmu we explicitly or implicitly take the xarray's lock only while > > holding the so called cmdr_lock mutex. Also, we take the lock without > > disabling irq or bh. So I think there is no problem if lock_page sleeps > > while we hold the xarray's lock. > > > > Of course, we wouldn't need the xarray lock at all, but xarray code > > forces us to take it to avoid warnings. You mean "The XArray warns us when we break its locking rules". > > In tcmu_blocks_release we use the advanced xarray API to keep the > > overhead small. It allows us to lock/unlock before and after the loop > > only. If there is no other way to avoid the Smatch warning, we could > > easily put additional xas_unlock() and xas_lock() around the > > lock_page/unlock_page block. ... then you'd have to call xas_reset(), and you might as well use xa_for_each(). > > > 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, > > > 1662 unsigned long last) > > > 1663 { > > > 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); > > > 1665 struct page *page; > > > 1666 u32 pages_freed = 0; > > > 1667 > > > 1668 xas_lock(&xas); > > > ^^^^^^^^^^^^^^ > > > We take a spinlock here. > > > > > > > > > 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { > > > 1670 xas_store(&xas, NULL); > > > 1671 /* > > > 1672 * While reaching here there may be page faults occurring on > > > 1673 * the to-be-released pages. A race condition may occur if > > > 1674 * unmap_mapping_range() is called before page faults on these > > > 1675 * pages have completed; a valid but stale map is created. > > > 1676 * > > > 1677 * If another command subsequently runs and needs to extend > > > 1678 * dbi_thresh, it may reuse the slot corresponding to the > > > 1679 * previous page in data_bitmap. Though we will allocate a new > > > 1680 * page for the slot in data_area, no page fault will happen > > > 1681 * because we have a valid map. Therefore the command's data > > > 1682 * will be lost. > > > 1683 * > > > 1684 * We lock and unlock pages that are to be released to ensure > > > 1685 * all page faults have completed. This way > > > 1686 * unmap_mapping_range() can ensure stale maps are cleanly > > > 1687 * removed. > > > 1688 */ > > > --> 1689 lock_page(page); > > > ^^^^^^^^^^^^^^^ > > > The lock_page() function calls might_sleep() (inside the declaration > > > block). > > > > > > 1690 unlock_page(page); > > > 1691 __free_page(page); > > > 1692 pages_freed++; > > > 1693 } > > > 1694 xas_unlock(&xas); There are a number of things you can do. One is to remove pages into a pagevec until it is full, then xas_unlock(); xas_reset(); lock and unlock each page and pass the pagevec to __pagevec_release(). Then xas_lock() and continue the loop. Another possibility is to trylock each page; if it fails, put it into the pagevec (and do the above dance if the pagevec is now full), but if it succeeds, you can now unlock it and __free_page(). The key to going fast is batching. And that goes for __free_page() vs __pagevec_release() as much as for walking the XArray. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-09 18:12 ` Matthew Wilcox @ 2022-05-09 19:22 ` Bodo Stroesser 0 siblings, 0 replies; 8+ messages in thread From: Bodo Stroesser @ 2022-05-09 19:22 UTC (permalink / raw) To: Matthew Wilcox, Dan Carpenter; +Cc: xiaoguang.wang, target-devel On 09.05.22 20:12, Matthew Wilcox wrote: > On Mon, May 09, 2022 at 09:05:45AM +0300, Dan Carpenter wrote: >>> If there is no other way to avoid the Smatch warning, >> >> The Smatch warning is not the issue. If we're holding a spinlock and >> we call might_sleep() then that generates a stack trace at runtime if >> you have CONFIG_DEBUG_ATOMIC_SLEEP enabled. Probably enabling >> CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC >> process. >> >> Anyway, it sounds you're just doing the locking to silence a warning in >> xarray. Let's ask Matthew if he has a hint. > > I suspect that the tcmu code is doing something horrible & wrong again. > It keeps poking at VM internals without understanding of what's going > on or why. Anyway ... If we really are doing something wrong, the long comment in the new code in tcmu_blocks_release probably is wrong. Can you please give us a hint what our misunderstanding is? > >> On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote: >>> I'm quite sure that our code does not cause any problems, because >>> in tcmu we explicitly or implicitly take the xarray's lock only while >>> holding the so called cmdr_lock mutex. Also, we take the lock without >>> disabling irq or bh. So I think there is no problem if lock_page sleeps >>> while we hold the xarray's lock. >>> >>> Of course, we wouldn't need the xarray lock at all, but xarray code >>> forces us to take it to avoid warnings. > > You mean "The XArray warns us when we break its locking rules". > Yes. The point is, that tcmu does not use spin_locks, but mutex. So we unfortunately cannot use the XArray lock alone, but always have to take tcmu's mutex and while holding it, we additionally (implicitly or explicitly) take XArray's lock. >>> In tcmu_blocks_release we use the advanced xarray API to keep the >>> overhead small. It allows us to lock/unlock before and after the loop >>> only. If there is no other way to avoid the Smatch warning, we could >>> easily put additional xas_unlock() and xas_lock() around the >>> lock_page/unlock_page block. > > ... then you'd have to call xas_reset(), and you might as well use > xa_for_each(). Yes. I wrote a mail few minutes ago that we should switch to xa_for_each_range(). > >>>> 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, >>>> 1662 unsigned long last) >>>> 1663 { >>>> 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); >>>> 1665 struct page *page; >>>> 1666 u32 pages_freed = 0; >>>> 1667 >>>> 1668 xas_lock(&xas); >>>> ^^^^^^^^^^^^^^ >>>> We take a spinlock here. >>>> >>>> >>>> 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { >>>> 1670 xas_store(&xas, NULL); >>>> 1671 /* >>>> 1672 * While reaching here there may be page faults occurring on >>>> 1673 * the to-be-released pages. A race condition may occur if >>>> 1674 * unmap_mapping_range() is called before page faults on these >>>> 1675 * pages have completed; a valid but stale map is created. >>>> 1676 * >>>> 1677 * If another command subsequently runs and needs to extend >>>> 1678 * dbi_thresh, it may reuse the slot corresponding to the >>>> 1679 * previous page in data_bitmap. Though we will allocate a new >>>> 1680 * page for the slot in data_area, no page fault will happen >>>> 1681 * because we have a valid map. Therefore the command's data >>>> 1682 * will be lost. >>>> 1683 * >>>> 1684 * We lock and unlock pages that are to be released to ensure >>>> 1685 * all page faults have completed. This way >>>> 1686 * unmap_mapping_range() can ensure stale maps are cleanly >>>> 1687 * removed. >>>> 1688 */ >>>> --> 1689 lock_page(page); >>>> ^^^^^^^^^^^^^^^ >>>> The lock_page() function calls might_sleep() (inside the declaration >>>> block). >>>> >>>> 1690 unlock_page(page); >>>> 1691 __free_page(page); >>>> 1692 pages_freed++; >>>> 1693 } >>>> 1694 xas_unlock(&xas); > > There are a number of things you can do. One is to remove pages into > a pagevec until it is full, then xas_unlock(); xas_reset(); lock and > unlock each page and pass the pagevec to __pagevec_release(). Then > xas_lock() and continue the loop. > > Another possibility is to trylock each page; if it fails, put it into > the pagevec (and do the above dance if the pagevec is now full), but > if it succeeds, you can now unlock it and __free_page(). Thank you for that hint. The probability, that trylock_page fails is _very_ low. In future we might call tcmu_blocks_release more often. Then we can speed it up again by using the method you described, even without a need for batching. > > The key to going fast is batching. And that goes for __free_page() > vs __pagevec_release() as much as for walking the XArray. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report] scsi: target: tcmu: Fix possible data corruption 2022-05-09 6:05 ` Dan Carpenter 2022-05-09 18:12 ` Matthew Wilcox @ 2022-05-09 18:28 ` Bodo Stroesser 1 sibling, 0 replies; 8+ messages in thread From: Bodo Stroesser @ 2022-05-09 18:28 UTC (permalink / raw) To: Dan Carpenter, Matthew Wilcox, xiaoguang.wang; +Cc: target-devel To be honest, tcmu_blocks_release is not relevant for tcmu performance. Therefore I think we should try to keep the code simple and easy to understand. So, it probably is the simplest and also best solution to switch to xarray normal API. That would make tcmu_blocks_release look like the following (untested) code: static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, unsigned long last) { struct page *page; unsigned long dpi; u32 pages_freed = 0; first = first * udev->data_pages_per_blk; last = (last + 1) * udev->data_pages_per_blk - 1; xa_for_each_range(&udev->data_pages, dpi, page, first, last) { xa_erase(&udev->data_pages, dpi); lock_page(page); unlock_page(page); __free_page(page); pages_freed++; } atomic_sub(pages_freed, &global_page_count); return pages_freed; } Regards, Bodo On 09.05.22 08:05, Dan Carpenter wrote: >> If there is no other way to avoid the Smatch warning, > > The Smatch warning is not the issue. If we're holding a spinlock and > we call might_sleep() then that generates a stack trace at runtime if > you have CONFIG_DEBUG_ATOMIC_SLEEP enabled. Probably enabling > CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC > process. > > Anyway, it sounds you're just doing the locking to silence a warning in > xarray. Let's ask Matthew if he has a hint. > > regards, > dan carpenter > > On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote: >> Hi Dan, >> >> Thank you for the report. >> >> I'm quite sure that our code does not cause any problems, because >> in tcmu we explicitly or implicitly take the xarray's lock only while >> holding the so called cmdr_lock mutex. Also, we take the lock without >> disabling irq or bh. So I think there is no problem if lock_page sleeps >> while we hold the xarray's lock. >> >> Of course, we wouldn't need the xarray lock at all, but xarray code >> forces us to take it to avoid warnings. >> >> In tcmu_blocks_release we use the advanced xarray API to keep the >> overhead small. It allows us to lock/unlock before and after the loop >> only. If there is no other way to avoid the Smatch warning, we could >> easily put additional xas_unlock() and xas_lock() around the >> lock_page/unlock_page block. >> >> But if there is a way to avoid the warning without imposing overhead, >> I would of course prefer it. >> >> Regards, >> Bodo >> >> >> On 04.05.22 17:12, Dan Carpenter wrote: >>> Hello Xiaoguang Wang, >>> >>> The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data >>> corruption" from Apr 21, 2022, leads to the following Smatch static >>> checker warning: >>> >>> drivers/target/target_core_user.c:1689 tcmu_blocks_release() >>> warn: sleeping in atomic context >>> >>> drivers/target/target_core_user.c >>> 1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first, >>> 1662 unsigned long last) >>> 1663 { >>> 1664 XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk); >>> 1665 struct page *page; >>> 1666 u32 pages_freed = 0; >>> 1667 >>> 1668 xas_lock(&xas); >>> ^^^^^^^^^^^^^^ >>> We take a spinlock here. >>> >>> >>> 1669 xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) { >>> 1670 xas_store(&xas, NULL); >>> 1671 /* >>> 1672 * While reaching here there may be page faults occurring on >>> 1673 * the to-be-released pages. A race condition may occur if >>> 1674 * unmap_mapping_range() is called before page faults on these >>> 1675 * pages have completed; a valid but stale map is created. >>> 1676 * >>> 1677 * If another command subsequently runs and needs to extend >>> 1678 * dbi_thresh, it may reuse the slot corresponding to the >>> 1679 * previous page in data_bitmap. Though we will allocate a new >>> 1680 * page for the slot in data_area, no page fault will happen >>> 1681 * because we have a valid map. Therefore the command's data >>> 1682 * will be lost. >>> 1683 * >>> 1684 * We lock and unlock pages that are to be released to ensure >>> 1685 * all page faults have completed. This way >>> 1686 * unmap_mapping_range() can ensure stale maps are cleanly >>> 1687 * removed. >>> 1688 */ >>> --> 1689 lock_page(page); >>> ^^^^^^^^^^^^^^^ >>> The lock_page() function calls might_sleep() (inside the declaration >>> block). >>> >>> 1690 unlock_page(page); >>> 1691 __free_page(page); >>> 1692 pages_freed++; >>> 1693 } >>> 1694 xas_unlock(&xas); >>> ^^^^^^^^^^^^^^^^^ >>> Unlock >>> >>> 1695 >>> 1696 atomic_sub(pages_freed, &global_page_count); >>> 1697 >>> 1698 return pages_freed; >>> 1699 } >>> >>> regards, >>> dan carpenter ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-05-09 19:22 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-05-04 15:12 [bug report] scsi: target: tcmu: Fix possible data corruption Dan Carpenter 2022-05-06 8:47 ` Xiaoguang Wang 2022-05-08 18:03 ` Bodo Stroesser 2022-05-09 3:13 ` Xiaoguang Wang 2022-05-09 6:05 ` Dan Carpenter 2022-05-09 18:12 ` Matthew Wilcox 2022-05-09 19:22 ` Bodo Stroesser 2022-05-09 18:28 ` Bodo Stroesser
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.