[bug report] scsi: target: tcmu: Fix possible data corruption

All of lore.kernel.org
 help / color / mirror / Atom feed

* [bug report] scsi: target: tcmu: Fix possible data corruption
@ 2022-05-04 15:12 Dan Carpenter
  2022-05-06  8:47 ` Xiaoguang Wang
  2022-05-08 18:03 ` Bodo Stroesser
  0 siblings, 2 replies; 8+ messages in thread
From: Dan Carpenter @ 2022-05-04 15:12 UTC (permalink / raw)
  To: xiaoguang.wang; +Cc: target-devel

Hello Xiaoguang Wang,

The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
corruption" from Apr 21, 2022, leads to the following Smatch static
checker warning:

	drivers/target/target_core_user.c:1689 tcmu_blocks_release()
	warn: sleeping in atomic context

drivers/target/target_core_user.c
    1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
    1662                                 unsigned long last)
    1663 {
    1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
    1665         struct page *page;
    1666         u32 pages_freed = 0;
    1667 
    1668         xas_lock(&xas);
                 ^^^^^^^^^^^^^^
We take a spinlock here.


    1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
    1670                 xas_store(&xas, NULL);
    1671                 /*
    1672                  * While reaching here there may be page faults occurring on
    1673                  * the to-be-released pages. A race condition may occur if
    1674                  * unmap_mapping_range() is called before page faults on these
    1675                  * pages have completed; a valid but stale map is created.
    1676                  *
    1677                  * If another command subsequently runs and needs to extend
    1678                  * dbi_thresh, it may reuse the slot corresponding to the
    1679                  * previous page in data_bitmap. Though we will allocate a new
    1680                  * page for the slot in data_area, no page fault will happen
    1681                  * because we have a valid map. Therefore the command's data
    1682                  * will be lost.
    1683                  *
    1684                  * We lock and unlock pages that are to be released to ensure
    1685                  * all page faults have completed. This way
    1686                  * unmap_mapping_range() can ensure stale maps are cleanly
    1687                  * removed.
    1688                  */
--> 1689                 lock_page(page);
                         ^^^^^^^^^^^^^^^
The lock_page() function calls might_sleep() (inside the declaration
block).

    1690                 unlock_page(page);
    1691                 __free_page(page);
    1692                 pages_freed++;
    1693         }
    1694         xas_unlock(&xas);
                 ^^^^^^^^^^^^^^^^^
Unlock

    1695 
    1696         atomic_sub(pages_freed, &global_page_count);
    1697 
    1698         return pages_freed;
    1699 }

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-04 15:12 [bug report] scsi: target: tcmu: Fix possible data corruption Dan Carpenter
@ 2022-05-06  8:47 ` Xiaoguang Wang
  2022-05-08 18:03 ` Bodo Stroesser
  1 sibling, 0 replies; 8+ messages in thread
From: Xiaoguang Wang @ 2022-05-06  8:47 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: target-devel

hi,

Sorry for late response, I missed this mail.
Thanks for this report.

Regards,
Xiaoguang Wang
> Hello Xiaoguang Wang,
>
> The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
> corruption" from Apr 21, 2022, leads to the following Smatch static
> checker warning:
>
> 	drivers/target/target_core_user.c:1689 tcmu_blocks_release()
> 	warn: sleeping in atomic context
>
> drivers/target/target_core_user.c
>     1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
>     1662                                 unsigned long last)
>     1663 {
>     1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
>     1665         struct page *page;
>     1666         u32 pages_freed = 0;
>     1667 
>     1668         xas_lock(&xas);
>                  ^^^^^^^^^^^^^^
> We take a spinlock here.
>
>
>     1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
>     1670                 xas_store(&xas, NULL);
>     1671                 /*
>     1672                  * While reaching here there may be page faults occurring on
>     1673                  * the to-be-released pages. A race condition may occur if
>     1674                  * unmap_mapping_range() is called before page faults on these
>     1675                  * pages have completed; a valid but stale map is created.
>     1676                  *
>     1677                  * If another command subsequently runs and needs to extend
>     1678                  * dbi_thresh, it may reuse the slot corresponding to the
>     1679                  * previous page in data_bitmap. Though we will allocate a new
>     1680                  * page for the slot in data_area, no page fault will happen
>     1681                  * because we have a valid map. Therefore the command's data
>     1682                  * will be lost.
>     1683                  *
>     1684                  * We lock and unlock pages that are to be released to ensure
>     1685                  * all page faults have completed. This way
>     1686                  * unmap_mapping_range() can ensure stale maps are cleanly
>     1687                  * removed.
>     1688                  */
> --> 1689                 lock_page(page);
>                          ^^^^^^^^^^^^^^^
> The lock_page() function calls might_sleep() (inside the declaration
> block).
>
>     1690                 unlock_page(page);
>     1691                 __free_page(page);
>     1692                 pages_freed++;
>     1693         }
>     1694         xas_unlock(&xas);
>                  ^^^^^^^^^^^^^^^^^
> Unlock
>
>     1695 
>     1696         atomic_sub(pages_freed, &global_page_count);
>     1697 
>     1698         return pages_freed;
>     1699 }
>
> regards,
> dan carpenter


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-04 15:12 [bug report] scsi: target: tcmu: Fix possible data corruption Dan Carpenter
  2022-05-06  8:47 ` Xiaoguang Wang
@ 2022-05-08 18:03 ` Bodo Stroesser
  2022-05-09  3:13   ` Xiaoguang Wang
  2022-05-09  6:05   ` Dan Carpenter
  1 sibling, 2 replies; 8+ messages in thread
From: Bodo Stroesser @ 2022-05-08 18:03 UTC (permalink / raw)
  To: Dan Carpenter, xiaoguang.wang; +Cc: target-devel

Hi Dan,

Thank you for the report.

I'm quite sure that our code does not cause any problems, because
in tcmu we explicitly or implicitly take the xarray's lock only while
holding the so called cmdr_lock mutex. Also, we take the lock without
disabling irq or bh. So I think there is no problem if lock_page sleeps
while we hold the xarray's lock.

Of course, we wouldn't need the xarray lock at all, but xarray code
forces us to take it to avoid warnings.

In tcmu_blocks_release we use the advanced xarray API to keep the
overhead small. It allows us to lock/unlock before and after the loop
only. If there is no other way to avoid the Smatch warning, we could
easily put additional xas_unlock() and xas_lock() around the
lock_page/unlock_page block.

But if there is a way to avoid the warning without imposing overhead,
I would of course prefer it.

Regards,
Bodo


On 04.05.22 17:12, Dan Carpenter wrote:
> Hello Xiaoguang Wang,
> 
> The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
> corruption" from Apr 21, 2022, leads to the following Smatch static
> checker warning:
> 
> 	drivers/target/target_core_user.c:1689 tcmu_blocks_release()
> 	warn: sleeping in atomic context
> 
> drivers/target/target_core_user.c
>      1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
>      1662                                 unsigned long last)
>      1663 {
>      1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
>      1665         struct page *page;
>      1666         u32 pages_freed = 0;
>      1667
>      1668         xas_lock(&xas);
>                   ^^^^^^^^^^^^^^
> We take a spinlock here.
> 
> 
>      1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
>      1670                 xas_store(&xas, NULL);
>      1671                 /*
>      1672                  * While reaching here there may be page faults occurring on
>      1673                  * the to-be-released pages. A race condition may occur if
>      1674                  * unmap_mapping_range() is called before page faults on these
>      1675                  * pages have completed; a valid but stale map is created.
>      1676                  *
>      1677                  * If another command subsequently runs and needs to extend
>      1678                  * dbi_thresh, it may reuse the slot corresponding to the
>      1679                  * previous page in data_bitmap. Though we will allocate a new
>      1680                  * page for the slot in data_area, no page fault will happen
>      1681                  * because we have a valid map. Therefore the command's data
>      1682                  * will be lost.
>      1683                  *
>      1684                  * We lock and unlock pages that are to be released to ensure
>      1685                  * all page faults have completed. This way
>      1686                  * unmap_mapping_range() can ensure stale maps are cleanly
>      1687                  * removed.
>      1688                  */
> --> 1689                 lock_page(page);
>                           ^^^^^^^^^^^^^^^
> The lock_page() function calls might_sleep() (inside the declaration
> block).
> 
>      1690                 unlock_page(page);
>      1691                 __free_page(page);
>      1692                 pages_freed++;
>      1693         }
>      1694         xas_unlock(&xas);
>                   ^^^^^^^^^^^^^^^^^
> Unlock
> 
>      1695
>      1696         atomic_sub(pages_freed, &global_page_count);
>      1697
>      1698         return pages_freed;
>      1699 }
> 
> regards,
> dan carpenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-08 18:03 ` Bodo Stroesser
@ 2022-05-09  3:13   ` Xiaoguang Wang
  2022-05-09  6:05   ` Dan Carpenter
  1 sibling, 0 replies; 8+ messages in thread
From: Xiaoguang Wang @ 2022-05-09  3:13 UTC (permalink / raw)
  To: Bodo Stroesser, Dan Carpenter; +Cc: target-devel

hi,

> Hi Dan,
>
> Thank you for the report.
>
> I'm quite sure that our code does not cause any problems, because
> in tcmu we explicitly or implicitly take the xarray's lock only while
> holding the so called cmdr_lock mutex. Also, we take the lock without
> disabling irq or bh. So I think there is no problem if lock_page sleeps
> while we hold the xarray's lock.
>
> Of course, we wouldn't need the xarray lock at all, but xarray code
> forces us to take it to avoid warnings.
I also see it now, thanks.
Indeed I was going to prepare patch which removes this xarry lock calls.

Regards,
Xiaoguang Wang

>
> In tcmu_blocks_release we use the advanced xarray API to keep the
> overhead small. It allows us to lock/unlock before and after the loop
> only. If there is no other way to avoid the Smatch warning, we could
> easily put additional xas_unlock() and xas_lock() around the
> lock_page/unlock_page block.
>
> But if there is a way to avoid the warning without imposing overhead,
> I would of course prefer it.
>
> Regards,
> Bodo
>
>
> On 04.05.22 17:12, Dan Carpenter wrote:
>> Hello Xiaoguang Wang,
>>
>> The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
>> corruption" from Apr 21, 2022, leads to the following Smatch static
>> checker warning:
>>
>>     drivers/target/target_core_user.c:1689 tcmu_blocks_release()
>>     warn: sleeping in atomic context
>>
>> drivers/target/target_core_user.c
>>      1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
>>      1662                                 unsigned long last)
>>      1663 {
>>      1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
>>      1665         struct page *page;
>>      1666         u32 pages_freed = 0;
>>      1667
>>      1668         xas_lock(&xas);
>>                   ^^^^^^^^^^^^^^
>> We take a spinlock here.
>>
>>
>>      1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
>>      1670                 xas_store(&xas, NULL);
>>      1671                 /*
>>      1672                  * While reaching here there may be page faults occurring on
>>      1673                  * the to-be-released pages. A race condition may occur if
>>      1674                  * unmap_mapping_range() is called before page faults on these
>>      1675                  * pages have completed; a valid but stale map is created.
>>      1676                  *
>>      1677                  * If another command subsequently runs and needs to extend
>>      1678                  * dbi_thresh, it may reuse the slot corresponding to the
>>      1679                  * previous page in data_bitmap. Though we will allocate a new
>>      1680                  * page for the slot in data_area, no page fault will happen
>>      1681                  * because we have a valid map. Therefore the command's data
>>      1682                  * will be lost.
>>      1683                  *
>>      1684                  * We lock and unlock pages that are to be released to ensure
>>      1685                  * all page faults have completed. This way
>>      1686                  * unmap_mapping_range() can ensure stale maps are cleanly
>>      1687                  * removed.
>>      1688                  */
>> --> 1689                 lock_page(page);
>>                           ^^^^^^^^^^^^^^^
>> The lock_page() function calls might_sleep() (inside the declaration
>> block).
>>
>>      1690                 unlock_page(page);
>>      1691                 __free_page(page);
>>      1692                 pages_freed++;
>>      1693         }
>>      1694         xas_unlock(&xas);
>>                   ^^^^^^^^^^^^^^^^^
>> Unlock
>>
>>      1695
>>      1696         atomic_sub(pages_freed, &global_page_count);
>>      1697
>>      1698         return pages_freed;
>>      1699 }
>>
>> regards,
>> dan carpenter


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-08 18:03 ` Bodo Stroesser
  2022-05-09  3:13   ` Xiaoguang Wang
@ 2022-05-09  6:05   ` Dan Carpenter
  2022-05-09 18:12     ` Matthew Wilcox
  2022-05-09 18:28     ` Bodo Stroesser
  1 sibling, 2 replies; 8+ messages in thread
From: Dan Carpenter @ 2022-05-09  6:05 UTC (permalink / raw)
  To: Bodo Stroesser, Matthew Wilcox; +Cc: xiaoguang.wang, target-devel

> If there is no other way to avoid the Smatch warning,

The Smatch warning is not the issue.  If we're holding a spinlock and
we call might_sleep() then that generates a stack trace at runtime if
you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.  Probably enabling
CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC
process.

Anyway, it sounds you're just doing the locking to silence a warning in
xarray.  Let's ask Matthew if he has a hint.

regards,
dan carpenter

On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote:
> Hi Dan,
> 
> Thank you for the report.
> 
> I'm quite sure that our code does not cause any problems, because
> in tcmu we explicitly or implicitly take the xarray's lock only while
> holding the so called cmdr_lock mutex. Also, we take the lock without
> disabling irq or bh. So I think there is no problem if lock_page sleeps
> while we hold the xarray's lock.
> 
> Of course, we wouldn't need the xarray lock at all, but xarray code
> forces us to take it to avoid warnings.
> 
> In tcmu_blocks_release we use the advanced xarray API to keep the
> overhead small. It allows us to lock/unlock before and after the loop
> only. If there is no other way to avoid the Smatch warning, we could
> easily put additional xas_unlock() and xas_lock() around the
> lock_page/unlock_page block.
> 
> But if there is a way to avoid the warning without imposing overhead,
> I would of course prefer it.
> 
> Regards,
> Bodo
> 
> 
> On 04.05.22 17:12, Dan Carpenter wrote:
> > Hello Xiaoguang Wang,
> > 
> > The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
> > corruption" from Apr 21, 2022, leads to the following Smatch static
> > checker warning:
> > 
> > 	drivers/target/target_core_user.c:1689 tcmu_blocks_release()
> > 	warn: sleeping in atomic context
> > 
> > drivers/target/target_core_user.c
> >      1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
> >      1662                                 unsigned long last)
> >      1663 {
> >      1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
> >      1665         struct page *page;
> >      1666         u32 pages_freed = 0;
> >      1667
> >      1668         xas_lock(&xas);
> >                   ^^^^^^^^^^^^^^
> > We take a spinlock here.
> > 
> > 
> >      1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
> >      1670                 xas_store(&xas, NULL);
> >      1671                 /*
> >      1672                  * While reaching here there may be page faults occurring on
> >      1673                  * the to-be-released pages. A race condition may occur if
> >      1674                  * unmap_mapping_range() is called before page faults on these
> >      1675                  * pages have completed; a valid but stale map is created.
> >      1676                  *
> >      1677                  * If another command subsequently runs and needs to extend
> >      1678                  * dbi_thresh, it may reuse the slot corresponding to the
> >      1679                  * previous page in data_bitmap. Though we will allocate a new
> >      1680                  * page for the slot in data_area, no page fault will happen
> >      1681                  * because we have a valid map. Therefore the command's data
> >      1682                  * will be lost.
> >      1683                  *
> >      1684                  * We lock and unlock pages that are to be released to ensure
> >      1685                  * all page faults have completed. This way
> >      1686                  * unmap_mapping_range() can ensure stale maps are cleanly
> >      1687                  * removed.
> >      1688                  */
> > --> 1689                 lock_page(page);
> >                           ^^^^^^^^^^^^^^^
> > The lock_page() function calls might_sleep() (inside the declaration
> > block).
> > 
> >      1690                 unlock_page(page);
> >      1691                 __free_page(page);
> >      1692                 pages_freed++;
> >      1693         }
> >      1694         xas_unlock(&xas);
> >                   ^^^^^^^^^^^^^^^^^
> > Unlock
> > 
> >      1695
> >      1696         atomic_sub(pages_freed, &global_page_count);
> >      1697
> >      1698         return pages_freed;
> >      1699 }
> > 
> > regards,
> > dan carpenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-09  6:05   ` Dan Carpenter
@ 2022-05-09 18:12     ` Matthew Wilcox
  2022-05-09 19:22       ` Bodo Stroesser
  2022-05-09 18:28     ` Bodo Stroesser
  1 sibling, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2022-05-09 18:12 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Bodo Stroesser, xiaoguang.wang, target-devel

On Mon, May 09, 2022 at 09:05:45AM +0300, Dan Carpenter wrote:
> > If there is no other way to avoid the Smatch warning,
> 
> The Smatch warning is not the issue.  If we're holding a spinlock and
> we call might_sleep() then that generates a stack trace at runtime if
> you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.  Probably enabling
> CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC
> process.
> 
> Anyway, it sounds you're just doing the locking to silence a warning in
> xarray.  Let's ask Matthew if he has a hint.

I suspect that the tcmu code is doing something horrible & wrong again.
It keeps poking at VM internals without understanding of what's going
on or why.  Anyway ...

> On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote:
> > I'm quite sure that our code does not cause any problems, because
> > in tcmu we explicitly or implicitly take the xarray's lock only while
> > holding the so called cmdr_lock mutex. Also, we take the lock without
> > disabling irq or bh. So I think there is no problem if lock_page sleeps
> > while we hold the xarray's lock.
> > 
> > Of course, we wouldn't need the xarray lock at all, but xarray code
> > forces us to take it to avoid warnings.

You mean "The XArray warns us when we break its locking rules".

> > In tcmu_blocks_release we use the advanced xarray API to keep the
> > overhead small. It allows us to lock/unlock before and after the loop
> > only. If there is no other way to avoid the Smatch warning, we could
> > easily put additional xas_unlock() and xas_lock() around the
> > lock_page/unlock_page block.

... then you'd have to call xas_reset(), and you might as well use
xa_for_each().

> > >      1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
> > >      1662                                 unsigned long last)
> > >      1663 {
> > >      1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
> > >      1665         struct page *page;
> > >      1666         u32 pages_freed = 0;
> > >      1667
> > >      1668         xas_lock(&xas);
> > >                   ^^^^^^^^^^^^^^
> > > We take a spinlock here.
> > > 
> > > 
> > >      1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
> > >      1670                 xas_store(&xas, NULL);
> > >      1671                 /*
> > >      1672                  * While reaching here there may be page faults occurring on
> > >      1673                  * the to-be-released pages. A race condition may occur if
> > >      1674                  * unmap_mapping_range() is called before page faults on these
> > >      1675                  * pages have completed; a valid but stale map is created.
> > >      1676                  *
> > >      1677                  * If another command subsequently runs and needs to extend
> > >      1678                  * dbi_thresh, it may reuse the slot corresponding to the
> > >      1679                  * previous page in data_bitmap. Though we will allocate a new
> > >      1680                  * page for the slot in data_area, no page fault will happen
> > >      1681                  * because we have a valid map. Therefore the command's data
> > >      1682                  * will be lost.
> > >      1683                  *
> > >      1684                  * We lock and unlock pages that are to be released to ensure
> > >      1685                  * all page faults have completed. This way
> > >      1686                  * unmap_mapping_range() can ensure stale maps are cleanly
> > >      1687                  * removed.
> > >      1688                  */
> > > --> 1689                 lock_page(page);
> > >                           ^^^^^^^^^^^^^^^
> > > The lock_page() function calls might_sleep() (inside the declaration
> > > block).
> > > 
> > >      1690                 unlock_page(page);
> > >      1691                 __free_page(page);
> > >      1692                 pages_freed++;
> > >      1693         }
> > >      1694         xas_unlock(&xas);

There are a number of things you can do.  One is to remove pages into
a pagevec until it is full, then xas_unlock(); xas_reset(); lock and
unlock each page and pass the pagevec to __pagevec_release().  Then
xas_lock() and continue the loop.

Another possibility is to trylock each page; if it fails, put it into
the pagevec (and do the above dance if the pagevec is now full), but
if it succeeds, you can now unlock it and __free_page().

The key to going fast is batching.  And that goes for __free_page()
vs __pagevec_release() as much as for walking the XArray.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-09 18:12     ` Matthew Wilcox
@ 2022-05-09 19:22       ` Bodo Stroesser
  0 siblings, 0 replies; 8+ messages in thread
From: Bodo Stroesser @ 2022-05-09 19:22 UTC (permalink / raw)
  To: Matthew Wilcox, Dan Carpenter; +Cc: xiaoguang.wang, target-devel

On 09.05.22 20:12, Matthew Wilcox wrote:
> On Mon, May 09, 2022 at 09:05:45AM +0300, Dan Carpenter wrote:
>>> If there is no other way to avoid the Smatch warning,
>>
>> The Smatch warning is not the issue.  If we're holding a spinlock and
>> we call might_sleep() then that generates a stack trace at runtime if
>> you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.  Probably enabling
>> CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC
>> process.
>>
>> Anyway, it sounds you're just doing the locking to silence a warning in
>> xarray.  Let's ask Matthew if he has a hint.
> 
> I suspect that the tcmu code is doing something horrible & wrong again.
> It keeps poking at VM internals without understanding of what's going
> on or why.  Anyway ...

If we really are doing something wrong, the long comment in the new
code in tcmu_blocks_release probably is wrong. Can you please give us
a hint what our misunderstanding is?

> 
>> On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote:
>>> I'm quite sure that our code does not cause any problems, because
>>> in tcmu we explicitly or implicitly take the xarray's lock only while
>>> holding the so called cmdr_lock mutex. Also, we take the lock without
>>> disabling irq or bh. So I think there is no problem if lock_page sleeps
>>> while we hold the xarray's lock.
>>>
>>> Of course, we wouldn't need the xarray lock at all, but xarray code
>>> forces us to take it to avoid warnings.
> 
> You mean "The XArray warns us when we break its locking rules".
> 

Yes.
The point is, that tcmu does not use spin_locks, but mutex. So we
unfortunately cannot use the XArray lock alone, but always have to take
tcmu's mutex and while holding it, we additionally (implicitly or
explicitly) take XArray's lock.

>>> In tcmu_blocks_release we use the advanced xarray API to keep the
>>> overhead small. It allows us to lock/unlock before and after the loop
>>> only. If there is no other way to avoid the Smatch warning, we could
>>> easily put additional xas_unlock() and xas_lock() around the
>>> lock_page/unlock_page block.
> 
> ... then you'd have to call xas_reset(), and you might as well use
> xa_for_each().

Yes.
I wrote a mail few minutes ago that we should switch to
xa_for_each_range().

> 
>>>>       1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
>>>>       1662                                 unsigned long last)
>>>>       1663 {
>>>>       1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
>>>>       1665         struct page *page;
>>>>       1666         u32 pages_freed = 0;
>>>>       1667
>>>>       1668         xas_lock(&xas);
>>>>                    ^^^^^^^^^^^^^^
>>>> We take a spinlock here.
>>>>
>>>>
>>>>       1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
>>>>       1670                 xas_store(&xas, NULL);
>>>>       1671                 /*
>>>>       1672                  * While reaching here there may be page faults occurring on
>>>>       1673                  * the to-be-released pages. A race condition may occur if
>>>>       1674                  * unmap_mapping_range() is called before page faults on these
>>>>       1675                  * pages have completed; a valid but stale map is created.
>>>>       1676                  *
>>>>       1677                  * If another command subsequently runs and needs to extend
>>>>       1678                  * dbi_thresh, it may reuse the slot corresponding to the
>>>>       1679                  * previous page in data_bitmap. Though we will allocate a new
>>>>       1680                  * page for the slot in data_area, no page fault will happen
>>>>       1681                  * because we have a valid map. Therefore the command's data
>>>>       1682                  * will be lost.
>>>>       1683                  *
>>>>       1684                  * We lock and unlock pages that are to be released to ensure
>>>>       1685                  * all page faults have completed. This way
>>>>       1686                  * unmap_mapping_range() can ensure stale maps are cleanly
>>>>       1687                  * removed.
>>>>       1688                  */
>>>> --> 1689                 lock_page(page);
>>>>                            ^^^^^^^^^^^^^^^
>>>> The lock_page() function calls might_sleep() (inside the declaration
>>>> block).
>>>>
>>>>       1690                 unlock_page(page);
>>>>       1691                 __free_page(page);
>>>>       1692                 pages_freed++;
>>>>       1693         }
>>>>       1694         xas_unlock(&xas);
> 
> There are a number of things you can do.  One is to remove pages into
> a pagevec until it is full, then xas_unlock(); xas_reset(); lock and
> unlock each page and pass the pagevec to __pagevec_release().  Then
> xas_lock() and continue the loop.
> 
> Another possibility is to trylock each page; if it fails, put it into
> the pagevec (and do the above dance if the pagevec is now full), but
> if it succeeds, you can now unlock it and __free_page().

Thank you for that hint.
The probability, that trylock_page fails is _very_ low. In future we
might call tcmu_blocks_release more often. Then we can speed it up
again by using the method you described, even without a need for
batching.

> 
> The key to going fast is batching.  And that goes for __free_page()
> vs __pagevec_release() as much as for walking the XArray.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] scsi: target: tcmu: Fix possible data corruption
  2022-05-09  6:05   ` Dan Carpenter
  2022-05-09 18:12     ` Matthew Wilcox
@ 2022-05-09 18:28     ` Bodo Stroesser
  1 sibling, 0 replies; 8+ messages in thread
From: Bodo Stroesser @ 2022-05-09 18:28 UTC (permalink / raw)
  To: Dan Carpenter, Matthew Wilcox, xiaoguang.wang; +Cc: target-devel

To be honest, tcmu_blocks_release is not relevant for tcmu performance.
Therefore I think we should try to keep the code simple and easy to
understand.

So, it probably is the simplest and also best solution to switch to
xarray normal API. That would make tcmu_blocks_release look like
the following (untested) code:

static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
                                unsigned long last)
{
	struct page *page;
	unsigned long dpi;
	u32 pages_freed = 0;

	first = first * udev->data_pages_per_blk;
	last = (last + 1) * udev->data_pages_per_blk - 1;
	xa_for_each_range(&udev->data_pages, dpi, page, first, last) {
		xa_erase(&udev->data_pages, dpi);
		lock_page(page);
		unlock_page(page);
		__free_page(page);
		pages_freed++;
	}

	atomic_sub(pages_freed, &global_page_count);

	return pages_freed;
}

Regards,
Bodo


On 09.05.22 08:05, Dan Carpenter wrote:
>> If there is no other way to avoid the Smatch warning,
> 
> The Smatch warning is not the issue.  If we're holding a spinlock and
> we call might_sleep() then that generates a stack trace at runtime if
> you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.  Probably enabling
> CONFIG_DEBUG_ATOMIC_SLEEP should just be a standard part of the QC
> process.
> 
> Anyway, it sounds you're just doing the locking to silence a warning in
> xarray.  Let's ask Matthew if he has a hint.
> 
> regards,
> dan carpenter
> 
> On Sun, May 08, 2022 at 08:03:14PM +0200, Bodo Stroesser wrote:
>> Hi Dan,
>>
>> Thank you for the report.
>>
>> I'm quite sure that our code does not cause any problems, because
>> in tcmu we explicitly or implicitly take the xarray's lock only while
>> holding the so called cmdr_lock mutex. Also, we take the lock without
>> disabling irq or bh. So I think there is no problem if lock_page sleeps
>> while we hold the xarray's lock.
>>
>> Of course, we wouldn't need the xarray lock at all, but xarray code
>> forces us to take it to avoid warnings.
>>
>> In tcmu_blocks_release we use the advanced xarray API to keep the
>> overhead small. It allows us to lock/unlock before and after the loop
>> only. If there is no other way to avoid the Smatch warning, we could
>> easily put additional xas_unlock() and xas_lock() around the
>> lock_page/unlock_page block.
>>
>> But if there is a way to avoid the warning without imposing overhead,
>> I would of course prefer it.
>>
>> Regards,
>> Bodo
>>
>>
>> On 04.05.22 17:12, Dan Carpenter wrote:
>>> Hello Xiaoguang Wang,
>>>
>>> The patch bb9b9eb0ae2e: "scsi: target: tcmu: Fix possible data
>>> corruption" from Apr 21, 2022, leads to the following Smatch static
>>> checker warning:
>>>
>>> 	drivers/target/target_core_user.c:1689 tcmu_blocks_release()
>>> 	warn: sleeping in atomic context
>>>
>>> drivers/target/target_core_user.c
>>>       1661 static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
>>>       1662                                 unsigned long last)
>>>       1663 {
>>>       1664         XA_STATE(xas, &udev->data_pages, first * udev->data_pages_per_blk);
>>>       1665         struct page *page;
>>>       1666         u32 pages_freed = 0;
>>>       1667
>>>       1668         xas_lock(&xas);
>>>                    ^^^^^^^^^^^^^^
>>> We take a spinlock here.
>>>
>>>
>>>       1669         xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
>>>       1670                 xas_store(&xas, NULL);
>>>       1671                 /*
>>>       1672                  * While reaching here there may be page faults occurring on
>>>       1673                  * the to-be-released pages. A race condition may occur if
>>>       1674                  * unmap_mapping_range() is called before page faults on these
>>>       1675                  * pages have completed; a valid but stale map is created.
>>>       1676                  *
>>>       1677                  * If another command subsequently runs and needs to extend
>>>       1678                  * dbi_thresh, it may reuse the slot corresponding to the
>>>       1679                  * previous page in data_bitmap. Though we will allocate a new
>>>       1680                  * page for the slot in data_area, no page fault will happen
>>>       1681                  * because we have a valid map. Therefore the command's data
>>>       1682                  * will be lost.
>>>       1683                  *
>>>       1684                  * We lock and unlock pages that are to be released to ensure
>>>       1685                  * all page faults have completed. This way
>>>       1686                  * unmap_mapping_range() can ensure stale maps are cleanly
>>>       1687                  * removed.
>>>       1688                  */
>>> --> 1689                 lock_page(page);
>>>                            ^^^^^^^^^^^^^^^
>>> The lock_page() function calls might_sleep() (inside the declaration
>>> block).
>>>
>>>       1690                 unlock_page(page);
>>>       1691                 __free_page(page);
>>>       1692                 pages_freed++;
>>>       1693         }
>>>       1694         xas_unlock(&xas);
>>>                    ^^^^^^^^^^^^^^^^^
>>> Unlock
>>>
>>>       1695
>>>       1696         atomic_sub(pages_freed, &global_page_count);
>>>       1697
>>>       1698         return pages_freed;
>>>       1699 }
>>>
>>> regards,
>>> dan carpenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-09 19:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-05-04 15:12 [bug report] scsi: target: tcmu: Fix possible data corruption Dan Carpenter
2022-05-06  8:47 ` Xiaoguang Wang
2022-05-08 18:03 ` Bodo Stroesser
2022-05-09  3:13   ` Xiaoguang Wang
2022-05-09  6:05   ` Dan Carpenter
2022-05-09 18:12     ` Matthew Wilcox
2022-05-09 19:22       ` Bodo Stroesser
2022-05-09 18:28     ` Bodo Stroesser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.