From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
Bodo Stroesser <bostroesser@gmail.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
Sasha Levin <sashal@kernel.org>,
linux-scsi@vger.kernel.org, target-devel@vger.kernel.org
Subject: [PATCH AUTOSEL 5.18 074/159] scsi: target: tcmu: Fix possible data corruption
Date: Mon, 30 May 2022 09:22:59 -0400 [thread overview]
Message-ID: <20220530132425.1929512-74-sashal@kernel.org> (raw)
In-Reply-To: <20220530132425.1929512-1-sashal@kernel.org>
From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
[ Upstream commit bb9b9eb0ae2e9d3f6036f0ad907c3a83dcd43485 ]
When tcmu_vma_fault() gets a page successfully, before the current context
completes page fault procedure, find_free_blocks() may run and call
unmap_mapping_range() to unmap the page. Assume that when
find_free_blocks() initially completes and the previous page fault
procedure starts to run again and completes, then one truncated page has
been mapped to userspace. But note that tcmu_vma_fault() has gotten a
refcount for the page so any other subsystem won't be able to use the page
unless the userspace address is unmapped later.
If another command subsequently runs and needs to extend dbi_thresh it may
reuse the corresponding slot for the previous page in data_bitmap. Then
though we'll allocate new page for this slot in data_area, no page fault
will happen because we have a valid map and the real request's data will be
lost.
Filesystem implementations will also run into this issue but they usually
lock the page when vm_operations_struct->fault gets a page and unlock the
page after finish_fault() completes. For truncate filesystems lock pages in
truncate_inode_pages() to protect against racing wrt. page faults.
To fix this possible data corruption scenario we can apply a method similar
to the filesystems. For pages that are to be freed, tcmu_blocks_release()
locks and unlocks. Make tcmu_vma_fault() also lock found page under
cmdr_lock. At the same time, since tcmu_vma_fault() gets an extra page
refcount, tcmu_blocks_release() won't free pages if pages are in page fault
procedure, which means it is safe to call tcmu_blocks_release() before
unmap_mapping_range().
With these changes tcmu_blocks_release() will wait for all page faults to
be completed before calling unmap_mapping_range(). And later, if
unmap_mapping_range() is called, it will ensure stale mappings are removed.
Link: https://lore.kernel.org/r/20220421023735.9018-1-xiaoguang.wang@linux.alibaba.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/target/target_core_user.c | 40 ++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index fd7267baa707..b1fd06edea59 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -20,6 +20,7 @@
#include <linux/configfs.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
+#include <linux/pagemap.h>
#include <net/genetlink.h>
#include <scsi/scsi_common.h>
#include <scsi/scsi_proto.h>
@@ -1667,6 +1668,26 @@ static u32 tcmu_blocks_release(struct tcmu_dev *udev, unsigned long first,
xas_lock(&xas);
xas_for_each(&xas, page, (last + 1) * udev->data_pages_per_blk - 1) {
xas_store(&xas, NULL);
+ /*
+ * While reaching here there may be page faults occurring on
+ * the to-be-released pages. A race condition may occur if
+ * unmap_mapping_range() is called before page faults on these
+ * pages have completed; a valid but stale map is created.
+ *
+ * If another command subsequently runs and needs to extend
+ * dbi_thresh, it may reuse the slot corresponding to the
+ * previous page in data_bitmap. Though we will allocate a new
+ * page for the slot in data_area, no page fault will happen
+ * because we have a valid map. Therefore the command's data
+ * will be lost.
+ *
+ * We lock and unlock pages that are to be released to ensure
+ * all page faults have completed. This way
+ * unmap_mapping_range() can ensure stale maps are cleanly
+ * removed.
+ */
+ lock_page(page);
+ unlock_page(page);
__free_page(page);
pages_freed++;
}
@@ -1822,6 +1843,7 @@ static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi)
page = xa_load(&udev->data_pages, dpi);
if (likely(page)) {
get_page(page);
+ lock_page(page);
mutex_unlock(&udev->cmdr_lock);
return page;
}
@@ -1863,6 +1885,7 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf)
struct page *page;
unsigned long offset;
void *addr;
+ vm_fault_t ret = 0;
int mi = tcmu_find_mem_index(vmf->vma);
if (mi < 0)
@@ -1887,10 +1910,11 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf)
page = tcmu_try_get_data_page(udev, dpi);
if (!page)
return VM_FAULT_SIGBUS;
+ ret = VM_FAULT_LOCKED;
}
vmf->page = page;
- return 0;
+ return ret;
}
static const struct vm_operations_struct tcmu_vm_ops = {
@@ -3205,12 +3229,22 @@ static void find_free_blocks(void)
udev->dbi_max = block;
}
+ /*
+ * Release the block pages.
+ *
+ * Also note that since tcmu_vma_fault() gets an extra page
+ * refcount, tcmu_blocks_release() won't free pages if pages
+ * are mapped. This means it is safe to call
+ * tcmu_blocks_release() before unmap_mapping_range() which
+ * drops the refcount of any pages it unmaps and thus releases
+ * them.
+ */
+ pages_freed = tcmu_blocks_release(udev, start, end - 1);
+
/* Here will truncate the data area from off */
off = udev->data_off + (loff_t)start * udev->data_blk_size;
unmap_mapping_range(udev->inode->i_mapping, off, 0, 1);
- /* Release the block pages */
- pages_freed = tcmu_blocks_release(udev, start, end - 1);
mutex_unlock(&udev->cmdr_lock);
total_pages_freed += pages_freed;
--
2.35.1
next prev parent reply other threads:[~2022-05-30 13:32 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20220530132425.1929512-1-sashal@kernel.org>
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 035/159] scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 036/159] scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 037/159] scsi: lpfc: Fix null pointer dereference after failing to issue FLOGI and PLOGI Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 038/159] scsi: lpfc: Protect memory leak for NPIV ports sending PLOGI_RJT Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 039/159] scsi: lpfc: Fix call trace observed during I/O with CMF enabled Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 058/159] scsi: megaraid: Fix error check return value of register_chrdev() Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 060/159] scsi: ufs: Use pm_runtime_resume_and_get() instead of pm_runtime_get_sync() Sasha Levin
2022-05-30 13:22 ` [PATCH AUTOSEL 5.18 061/159] scsi: lpfc: Fix resource leak in lpfc_sli4_send_seq_to_ulp() Sasha Levin
2022-05-30 13:22 ` Sasha Levin [this message]
2022-05-30 15:47 ` [PATCH AUTOSEL 5.18 074/159] scsi: target: tcmu: Fix possible data corruption Bodo Stroesser
2022-06-05 13:16 ` Sasha Levin
2022-05-30 13:23 ` [PATCH AUTOSEL 5.18 092/159] scsi: hisi_sas: Undo RPM resume for failed notify phy event for v3 HW Sasha Levin
2022-05-30 13:23 ` [PATCH AUTOSEL 5.18 093/159] scsi: lpfc: Inhibit aborts if external loopback plug is inserted Sasha Levin
2022-05-30 13:23 ` [PATCH AUTOSEL 5.18 094/159] scsi: lpfc: Alter FPIN stat accounting logic Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220530132425.1929512-74-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=bostroesser@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=stable@vger.kernel.org \
--cc=target-devel@vger.kernel.org \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox