* [PATCH 6.3 001/431] mm: call arch_swap_restore() from do_swap_page()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 002/431] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2 Greg Kroah-Hartman
` (430 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Peter Collingbourne, Qun-wei Lin,
David Hildenbrand, Huang, Ying, Steven Price, Catalin Marinas,
Andrew Morton
From: Peter Collingbourne <pcc@google.com>
commit 6dca4ac6fc91fd41ea4d6c4511838d37f4e0eab2 upstream.
Commit c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved
the call to swap_free() before the call to set_pte_at(), which meant that
the MTE tags could end up being freed before set_pte_at() had a chance to
restore them. Fix it by adding a call to the arch_swap_restore() hook
before the call to swap_free().
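
The ordering constraint can be illustrated with a small userspace toy model. This is only a sketch: the `toy_*` names, the slot table, and the integer "tag" are hypothetical stand-ins for metadata indexed by swap entry, not the kernel's MTE or swap code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 4

/* Hypothetical per-swap-entry metadata store (stand-in for saved tags). */
struct toy_slot {
	bool in_use;
	int saved_tag;
};

struct toy_page {
	int tag;
};

struct toy_slot toy_slots[TOY_NR_SLOTS];

/* Save the page's metadata under the swap entry at swap-out time. */
void toy_swap_out(size_t entry, const struct toy_page *page)
{
	toy_slots[entry].in_use = true;
	toy_slots[entry].saved_tag = page->tag;
}

/* Copy saved metadata back to the page; fails if it was already freed. */
bool toy_swap_restore(size_t entry, struct toy_page *page)
{
	if (!toy_slots[entry].in_use)
		return false;
	page->tag = toy_slots[entry].saved_tag;
	return true;
}

/* Release the swap entry together with the metadata it indexes. */
void toy_swap_free(size_t entry)
{
	toy_slots[entry].in_use = false;
	toy_slots[entry].saved_tag = 0;
}

/* Ordering after the fix: restore first, then free. */
bool toy_swap_in_fixed(size_t entry, struct toy_page *page)
{
	bool ok = toy_swap_restore(entry, page);

	toy_swap_free(entry);
	return ok;
}

/* Ordering with the regression: the metadata is gone before the restore. */
bool toy_swap_in_buggy(size_t entry, struct toy_page *page)
{
	toy_swap_free(entry);
	return toy_swap_restore(entry, page);
}
```

In this model the buggy ordering leaves the page without its tag, which mirrors the problem described above in spirit only.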
Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965
Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memory.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3915,6 +3915,13 @@ vm_fault_t do_swap_page(struct vm_fault
}
/*
+ * Some architectures may have to restore extra metadata to the page
+ * when reading from swap. This metadata may be indexed by swap entry
+ * so this must be called before swap_free().
+ */
+ arch_swap_restore(entry, folio);
+
+ /*
* Remove the swap entry and conditionally try to free up the swapcache.
* We're already holding a reference on the page but haven't mapped it
* yet.
* [PATCH 6.3 002/431] drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Jani Nikula, Jeff Layton,
Jani Nikula, Lyude Paul, Limonciello, Mario
From: Jeff Layton <jlayton@kernel.org>
commit 54d217406afe250d7a768783baaa79a035f21d38 upstream.
I've been experiencing some intermittent crashes down in the display
driver code. The symptoms are usually a line like this in dmesg:
amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5
...followed by an Oops due to a NULL pointer dereference.
Switch to using mgr->dev instead of state->dev since "state" can be
NULL in some cases.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230419112447.18471-1-jlayton@kernel.org
Cc: "Limonciello, Mario" <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_
/* Skip failed payloads */
if (payload->vc_start_slot == -1) {
- drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
+ drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
payload->port->connector->name);
return -EIO;
}
* [PATCH 6.3 003/431] fs: pipe: reveal missing function protoypes
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Arnd Bergmann, Christian Brauner,
Sasha Levin
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 247c8d2f9837a3e29e3b6b7a4aa9c36c37659dd4 ]
A couple of functions from fs/pipe.c are used both internally
and for the watch queue code, but the declaration is only
visible when the latter is enabled:
fs/pipe.c:1254:5: error: no previous prototype for 'pipe_resize_ring'
fs/pipe.c:758:15: error: no previous prototype for 'account_pipe_buffers'
fs/pipe.c:764:6: error: no previous prototype for 'too_many_pipe_buffers_soft'
fs/pipe.c:771:6: error: no previous prototype for 'too_many_pipe_buffers_hard'
fs/pipe.c:777:6: error: no previous prototype for 'pipe_is_unprivileged_user'
Make them visible unconditionally to avoid these warnings.
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Message-Id: <20230516195629.551602-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/pipe_fs_i.h | 4 ----
1 file changed, 4 deletions(-)
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index d2c3f16cf6b18..02e0086b10f6f 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -261,18 +261,14 @@ void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *);
extern const struct pipe_buf_operations nosteal_pipe_buf_ops;
-#ifdef CONFIG_WATCH_QUEUE
unsigned long account_pipe_buffers(struct user_struct *user,
unsigned long old, unsigned long new);
bool too_many_pipe_buffers_soft(unsigned long user_bufs);
bool too_many_pipe_buffers_hard(unsigned long user_bufs);
bool pipe_is_unprivileged_user(void);
-#endif
/* for F_SETPIPE_SZ and F_GETPIPE_SZ */
-#ifdef CONFIG_WATCH_QUEUE
int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots);
-#endif
long pipe_fcntl(struct file *, unsigned int, unsigned long arg);
struct pipe_inode_info *get_pipe_info(struct file *file, bool for_splice);
--
2.39.2
* [PATCH 6.3 004/431] block: Fix the type of the second bdev_op_is_zoned_write() argument
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Johannes Thumshirn, Pankaj Raghav,
Christoph Hellwig, Damien Le Moal, Hannes Reinecke, Ming Lei,
Bart Van Assche, Jens Axboe, Sasha Levin
From: Bart Van Assche <bvanassche@acm.org>
[ Upstream commit 3ddbe2a7e0d4a155a805f69c906c9beed30d4cc4 ]
Change the type of the second argument of bdev_op_is_zoned_write() from
blk_opf_t into enum req_op because this function expects an operation
without flags as second argument.
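
Why the distinction matters can be sketched in plain C. The constants below (`TOY_OP_*`, `TOY_OP_BITS`, `TOY_FLAG_SYNC`) are hypothetical values chosen for illustration and are not the kernel's definitions; the point is only that a value carrying flag bits does not compare equal to a bare operation unless it is masked first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation codes; the low bits of an "opf" value. */
enum toy_req_op {
	TOY_OP_READ = 0,
	TOY_OP_WRITE = 1,
	TOY_OP_ZONE_APPEND = 13,
};

#define TOY_OP_BITS	8
#define TOY_OP_MASK	((1u << TOY_OP_BITS) - 1)
#define TOY_FLAG_SYNC	(1u << 11)	/* hypothetical flag bit above the op */

/* Operation plus flags packed into one word. */
typedef uint32_t toy_opf_t;

/* Strip the flag bits and return the bare operation. */
enum toy_req_op toy_opf_to_op(toy_opf_t opf)
{
	return (enum toy_req_op)(opf & TOY_OP_MASK);
}

/* Takes a bare operation, so the comparison is well defined. */
bool toy_is_write(enum toy_req_op op)
{
	return op == TOY_OP_WRITE;
}

/* Takes op plus flags but compares as if it were a bare op. */
bool toy_is_write_unmasked(toy_opf_t opf)
{
	return opf == (toy_opf_t)TOY_OP_WRITE;
}
```

Typing the parameter as the bare-operation enum documents at the call site that flags must already have been stripped.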
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 8cafdb5ab94c ("block: adapt blk_mq_plug() to not plug for writes that require a zone lock")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230517174230.897144-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/blkdev.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 941304f17492f..3d620f298aebd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1297,7 +1297,7 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
}
static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
- blk_opf_t op)
+ enum req_op op)
{
if (!bdev_is_zoned(bdev))
return false;
--
2.39.2
* [PATCH 6.3 005/431] splice: Fix filemap_splice_read() to use the correct inode
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, David Howells, Christoph Hellwig,
Christian Brauner, Steve French, Jens Axboe, Al Viro,
David Hildenbrand, John Hubbard, linux-mm, linux-block,
linux-fsdevel, Sasha Levin
From: David Howells <dhowells@redhat.com>
[ Upstream commit c37222082f23c456664d1c3182a714670ab8f9a4 ]
Fix filemap_splice_read() to use file->f_mapping->host, not file->f_inode,
as the source of the file size because in the case of a block device,
file->f_inode points to the block-special file (which is typically 0
length) and not the backing store.
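
The difference between the two inodes can be sketched with simplified, hypothetical `toy_*` structures that keep only the fields named above. This is a userspace illustration under those assumptions, not the kernel's `struct file` layout.

```c
/* Simplified stand-ins; only the fields relevant to the size lookup. */
struct toy_inode {
	long long size;
};

struct toy_mapping {
	struct toy_inode *host;	/* inode that owns the page cache */
};

struct toy_file {
	struct toy_inode *f_inode;	/* inode the file was opened through */
	struct toy_mapping *f_mapping;	/* page cache actually being read */
};

/* Size as seen through the opened inode (zero for a block-special file). */
long long toy_size_via_inode(const struct toy_file *f)
{
	return f->f_inode->size;
}

/* Size as seen through the mapping's host (the backing store). */
long long toy_size_via_host(const struct toy_file *f)
{
	return f->f_mapping->host->size;
}
```

For a regular file both pointers would refer to the same inode, so the two helpers agree; they differ only in the block-device case.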
Fixes: 07073eb01c5f ("splice: Add a func to do a splice from a buffered file without ITER_PIPE")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Steve French <stfrench@microsoft.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-2-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/filemap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a1..8f048e62279a2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2903,7 +2903,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
do {
cond_resched();
- if (*ppos >= i_size_read(file_inode(in)))
+ if (*ppos >= i_size_read(in->f_mapping->host))
break;
iocb.ki_pos = *ppos;
@@ -2919,7 +2919,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
* part of the page is not copied back to userspace (unless
* another truncate extends the file - this is desired though).
*/
- isize = i_size_read(file_inode(in));
+ isize = i_size_read(in->f_mapping->host);
if (unlikely(*ppos >= isize))
break;
end_offset = min_t(loff_t, isize, *ppos + len);
--
2.39.2
* [PATCH 6.3 006/431] erofs: kill hooked chains to avoid loops on deduplicated compressed images
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable; +Cc: Greg Kroah-Hartman, patches, Gao Xiang, Yue Hu, Sasha Levin
From: Gao Xiang <hsiangkao@linux.alibaba.com>
[ Upstream commit 967c28b23f6c89bb8eef6a046ea88afe0d7c1029 ]
After heavily stressing EROFS with several images which include a
hand-crafted image of repeated patterns for more than 46 days, I found
two chains could be linked with each other almost simultaneously and
form a loop so that the entire loop won't be submitted. As a
consequence, the corresponding file pages will remain locked forever.
It can be _only_ observed on data-deduplicated compressed images.
For example, consider two chains with five pclusters in total:
Chain 1: 2->3->4->5 -- The tail pcluster is 5;
Chain 2: 5->1->2 -- The tail pcluster is 2.
Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link
to Chain 2 at the same time with pcluster 2.
Since hooked chains are all linked locklessly now, I have no idea how
to simply avoid the race. Instead, let's avoid hooked chains completely
until I could work out a proper way to fix this and end users finally
tell us that it's needed to add it back.
Actually, this optimization can be found with multi-threaded workloads
(especially even more often on deduplicated compressed images), yet I'm
not sure about the overall system impacts of not having this compared
with implementation complexity.
Fixes: 267f2492c8f7 ("erofs: introduce multi-reference pclusters (fully-referenced)")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/erofs/zdata.c | 72 ++++++++----------------------------------------
1 file changed, 11 insertions(+), 61 deletions(-)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index d7add72a09437..72325d4b98f9d 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -94,11 +94,8 @@ struct z_erofs_pcluster {
/* let's avoid the valid 32-bit kernel addresses */
-/* the chained workgroup has't submitted io (still open) */
+/* the end of a chain of pclusters */
#define Z_EROFS_PCLUSTER_TAIL ((void *)0x5F0ECAFE)
-/* the chained workgroup has already submitted io */
-#define Z_EROFS_PCLUSTER_TAIL_CLOSED ((void *)0x5F0EDEAD)
-
#define Z_EROFS_PCLUSTER_NIL (NULL)
struct z_erofs_decompressqueue {
@@ -499,20 +496,6 @@ int __init z_erofs_init_zip_subsystem(void)
enum z_erofs_pclustermode {
Z_EROFS_PCLUSTER_INFLIGHT,
- /*
- * The current pclusters was the tail of an exist chain, in addition
- * that the previous processed chained pclusters are all decided to
- * be hooked up to it.
- * A new chain will be created for the remaining pclusters which are
- * not processed yet, so different from Z_EROFS_PCLUSTER_FOLLOWED,
- * the next pcluster cannot reuse the whole page safely for inplace I/O
- * in the following scenario:
- * ________________________________________________________________
- * | tail (partial) page | head (partial) page |
- * | (belongs to the next pcl) | (belongs to the current pcl) |
- * |_______PCLUSTER_FOLLOWED______|________PCLUSTER_HOOKED__________|
- */
- Z_EROFS_PCLUSTER_HOOKED,
/*
* a weak form of Z_EROFS_PCLUSTER_FOLLOWED, the difference is that it
* could be dispatched into bypass queue later due to uptodated managed
@@ -530,8 +513,8 @@ enum z_erofs_pclustermode {
* ________________________________________________________________
* | tail (partial) page | head (partial) page |
* | (of the current cl) | (of the previous collection) |
- * | PCLUSTER_FOLLOWED or | |
- * |_____PCLUSTER_HOOKED__|___________PCLUSTER_FOLLOWED____________|
+ * | | |
+ * |__PCLUSTER_FOLLOWED___|___________PCLUSTER_FOLLOWED____________|
*
* [ (*) the above page can be used as inplace I/O. ]
*/
@@ -544,7 +527,7 @@ struct z_erofs_decompress_frontend {
struct z_erofs_bvec_iter biter;
struct page *candidate_bvpage;
- struct z_erofs_pcluster *pcl, *tailpcl;
+ struct z_erofs_pcluster *pcl;
z_erofs_next_pcluster_t owned_head;
enum z_erofs_pclustermode mode;
@@ -750,19 +733,7 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
return;
}
- /*
- * type 2, link to the end of an existing open chain, be careful
- * that its submission is controlled by the original attached chain.
- */
- if (*owned_head != &pcl->next && pcl != f->tailpcl &&
- cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
- *owned_head) == Z_EROFS_PCLUSTER_TAIL) {
- *owned_head = Z_EROFS_PCLUSTER_TAIL;
- f->mode = Z_EROFS_PCLUSTER_HOOKED;
- f->tailpcl = NULL;
- return;
- }
- /* type 3, it belongs to a chain, but it isn't the end of the chain */
+ /* type 2, it belongs to an ongoing chain */
f->mode = Z_EROFS_PCLUSTER_INFLIGHT;
}
@@ -823,9 +794,6 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
goto err_out;
}
}
- /* used to check tail merging loop due to corrupted images */
- if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
- fe->tailpcl = pcl;
fe->owned_head = &pcl->next;
fe->pcl = pcl;
return 0;
@@ -846,7 +814,6 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
/* must be Z_EROFS_PCLUSTER_TAIL or pointed to previous pcluster */
DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_NIL);
- DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
if (!(map->m_flags & EROFS_MAP_META)) {
grp = erofs_find_workgroup(fe->inode->i_sb,
@@ -865,10 +832,6 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
if (ret == -EEXIST) {
mutex_lock(&fe->pcl->lock);
- /* used to check tail merging loop due to corrupted images */
- if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
- fe->tailpcl = fe->pcl;
-
z_erofs_try_to_claim_pcluster(fe);
} else if (ret) {
return ret;
@@ -1025,8 +988,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
* those chains are handled asynchronously thus the page cannot be used
* for inplace I/O or bvpage (should be processed in a strict order.)
*/
- tight &= (fe->mode >= Z_EROFS_PCLUSTER_HOOKED &&
- fe->mode != Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE);
+ tight &= (fe->mode > Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE);
cur = end - min_t(unsigned int, offset + end - map->m_la, end);
if (!(map->m_flags & EROFS_MAP_MAPPED)) {
@@ -1407,10 +1369,7 @@ static void z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
};
z_erofs_next_pcluster_t owned = io->head;
- while (owned != Z_EROFS_PCLUSTER_TAIL_CLOSED) {
- /* impossible that 'owned' equals Z_EROFS_WORK_TPTR_TAIL */
- DBG_BUGON(owned == Z_EROFS_PCLUSTER_TAIL);
- /* impossible that 'owned' equals Z_EROFS_PCLUSTER_NIL */
+ while (owned != Z_EROFS_PCLUSTER_TAIL) {
DBG_BUGON(owned == Z_EROFS_PCLUSTER_NIL);
be.pcl = container_of(owned, struct z_erofs_pcluster, next);
@@ -1427,7 +1386,7 @@ static void z_erofs_decompressqueue_work(struct work_struct *work)
container_of(work, struct z_erofs_decompressqueue, u.work);
struct page *pagepool = NULL;
- DBG_BUGON(bgq->head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
+ DBG_BUGON(bgq->head == Z_EROFS_PCLUSTER_TAIL);
z_erofs_decompress_queue(bgq, &pagepool);
erofs_release_pages(&pagepool);
kvfree(bgq);
@@ -1615,7 +1574,7 @@ static struct z_erofs_decompressqueue *jobqueue_init(struct super_block *sb,
q->sync = true;
}
q->sb = sb;
- q->head = Z_EROFS_PCLUSTER_TAIL_CLOSED;
+ q->head = Z_EROFS_PCLUSTER_TAIL;
return q;
}
@@ -1633,11 +1592,7 @@ static void move_to_bypass_jobqueue(struct z_erofs_pcluster *pcl,
z_erofs_next_pcluster_t *const submit_qtail = qtail[JQ_SUBMIT];
z_erofs_next_pcluster_t *const bypass_qtail = qtail[JQ_BYPASS];
- DBG_BUGON(owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
- if (owned_head == Z_EROFS_PCLUSTER_TAIL)
- owned_head = Z_EROFS_PCLUSTER_TAIL_CLOSED;
-
- WRITE_ONCE(pcl->next, Z_EROFS_PCLUSTER_TAIL_CLOSED);
+ WRITE_ONCE(pcl->next, Z_EROFS_PCLUSTER_TAIL);
WRITE_ONCE(*submit_qtail, owned_head);
WRITE_ONCE(*bypass_qtail, &pcl->next);
@@ -1708,15 +1663,10 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
unsigned int i = 0;
bool bypass = true;
- /* no possible 'owned_head' equals the following */
- DBG_BUGON(owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
DBG_BUGON(owned_head == Z_EROFS_PCLUSTER_NIL);
-
pcl = container_of(owned_head, struct z_erofs_pcluster, next);
+ owned_head = READ_ONCE(pcl->next);
- /* close the main owned chain at first */
- owned_head = cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
- Z_EROFS_PCLUSTER_TAIL_CLOSED);
if (z_erofs_is_inline_pcluster(pcl)) {
move_to_bypass_jobqueue(pcl, qtail, owned_head);
continue;
--
2.39.2
* [PATCH 6.3 007/431] x86/resctrl: Only show tasks pid in current pid namespace
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Shawn Wang, Borislav Petkov (AMD),
Reinette Chatre, Fenghua Yu, Sasha Levin
From: Shawn Wang <shawnwang@linux.alibaba.com>
[ Upstream commit 2997d94b5dd0e8b10076f5e0b6f18410c73e28bd ]
When writing a task id to the "tasks" file in an rdtgroup,
rdtgroup_tasks_write() treats the pid as a number in the current pid
namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
the list of global pids from the init namespace, which is confusing and
incorrect.
To be more robust, let the "tasks" file only show pids in the current pid
namespace.
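
The filtering idea can be sketched in userspace as follows. The `toy_task` structure and its `ns_pid` field are hypothetical; the sketch assumes that a translated pid of 0 means the task is not visible in the reader's pid namespace, and that such tasks are skipped.

```c
#include <stddef.h>

/* Hypothetical task record: one global pid and one namespace-local pid. */
struct toy_task {
	int global_pid;	/* pid in the init namespace */
	int ns_pid;	/* pid in the reader's namespace, 0 if not visible */
};

/*
 * Collect the namespace-local pids of all visible tasks into out[].
 * Returns the number of entries written; out must hold at least n ints.
 */
size_t toy_collect_visible(const struct toy_task *tasks, size_t n, int *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		int pid = tasks[i].ns_pid;

		if (pid)	/* skip tasks outside the reader's namespace */
			out[count++] = pid;
	}
	return count;
}
```

With this shape, the values read back are in the same namespace as the values that would be written, which is the consistency the change aims for.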
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/all/20230116071246.97717-1-shawnwang@linux.alibaba.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6ad33f355861f..61cdd9b1bb6d8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -726,11 +726,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
{
struct task_struct *p, *t;
+ pid_t pid;
rcu_read_lock();
for_each_process_thread(p, t) {
- if (is_closid_match(t, r) || is_rmid_match(t, r))
- seq_printf(s, "%d\n", t->pid);
+ if (is_closid_match(t, r) || is_rmid_match(t, r)) {
+ pid = task_pid_vnr(t);
+ if (pid)
+ seq_printf(s, "%d\n", pid);
+ }
}
rcu_read_unlock();
}
--
2.39.2
* [PATCH 6.3 008/431] blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Li Nan, Tejun Heo, Yu Kuai,
Jens Axboe, Sasha Levin
From: Li Nan <linan122@huawei.com>
[ Upstream commit 8d211554679d0b23702bd32ba04aeac0c1c4f660 ]
adjust_inuse_and_calc_cost() uses spin_lock_irq(), so IRQs are
unconditionally enabled on unlock. A deadlock might happen if the caller
already holds other locks and disabled IRQs before invoking it.
Fix it by using spin_lock_irqsave() instead, which restores the previous
IRQ state on unlock.
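
The difference between the two lock variants can be modelled in userspace with a single flag standing in for the CPU's interrupt-enable state. Everything here (`toy_irqs_enabled` and the `toy_*` helpers) is a hypothetical sketch; no real locking or interrupt masking takes place.

```c
#include <stdbool.h>

/* Stand-in for the CPU's interrupt-enable state. */
bool toy_irqs_enabled = true;

/* Model of the _irq variant: disable on lock, always enable on unlock. */
void toy_lock_irq(void)
{
	toy_irqs_enabled = false;
}

void toy_unlock_irq(void)
{
	toy_irqs_enabled = true;
}

/* Model of the _irqsave variant: remember the prior state, then disable. */
unsigned long toy_lock_irqsave(void)
{
	unsigned long flags = toy_irqs_enabled ? 1UL : 0UL;

	toy_irqs_enabled = false;
	return flags;
}

/* Restore whatever state was saved at lock time. */
void toy_unlock_irqrestore(unsigned long flags)
{
	toy_irqs_enabled = (flags != 0);
}

/* Inner critical section using the _irq model; returns the state left behind. */
bool toy_inner_irq(void)
{
	toy_lock_irq();
	toy_unlock_irq();
	return toy_irqs_enabled;
}

/* Inner critical section using the _irqsave model. */
bool toy_inner_irqsave(void)
{
	unsigned long flags = toy_lock_irqsave();

	toy_unlock_irqrestore(flags);
	return toy_irqs_enabled;
}
```

When the caller enters with interrupts already disabled, the first helper leaves them enabled on return while the second leaves them disabled, which is the inconsistency that lockdep reports in the trace that follows.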
================================
WARNING: inconsistent lock state
5.10.0-02758-g8e5f91fd772f #26 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes:
ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
{IN-HARDIRQ-W} state was registered at:
__lock_acquire+0x3d7/0x1070
lock_acquire+0x197/0x4a0
__raw_spin_lock_irqsave
_raw_spin_lock_irqsave+0x3b/0x60
bfq_idle_slice_timer_body
bfq_idle_slice_timer+0x53/0x1d0
__run_hrtimer+0x477/0xa70
__hrtimer_run_queues+0x1c6/0x2d0
hrtimer_interrupt+0x302/0x9e0
local_apic_timer_interrupt
__sysvec_apic_timer_interrupt+0xfd/0x420
run_sysvec_on_irqstack_cond
sysvec_apic_timer_interrupt+0x46/0xa0
asm_sysvec_apic_timer_interrupt+0x12/0x20
irq event stamp: 837522
hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore
hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40
hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq
hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50
softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec
softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&bfqd->lock);
<Interrupt>
lock(&bfqd->lock);
*** DEADLOCK ***
3 locks held by kworker/2:3/388:
#0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0
#1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0
#2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
#2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
stack backtrace:
CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: kthrotld blk_throtl_dispatch_work_fn
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x107/0x167
print_usage_bug
valid_state
mark_lock_irq.cold+0x32/0x3a
mark_lock+0x693/0xbc0
mark_held_locks+0x9e/0xe0
__trace_hardirqs_on_caller
lockdep_hardirqs_on_prepare.part.0+0x151/0x360
trace_hardirqs_on+0x5b/0x180
__raw_spin_unlock_irq
_raw_spin_unlock_irq+0x24/0x40
spin_unlock_irq
adjust_inuse_and_calc_cost+0x4fb/0x970
ioc_rqos_merge+0x277/0x740
__rq_qos_merge+0x62/0xb0
rq_qos_merge
bio_attempt_back_merge+0x12c/0x4a0
blk_mq_sched_try_merge+0x1b6/0x4d0
bfq_bio_merge+0x24a/0x390
__blk_mq_sched_bio_merge+0xa6/0x460
blk_mq_sched_bio_merge
blk_mq_submit_bio+0x2e7/0x1ee0
__submit_bio_noacct_mq+0x175/0x3b0
submit_bio_noacct+0x1fb/0x270
blk_throtl_dispatch_work_fn+0x1ef/0x2b0
process_one_work+0x83e/0x13f0
process_scheduled_works
worker_thread+0x7e3/0xd80
kthread+0x353/0x470
ret_from_fork+0x1f/0x30
Fixes: b0853ab4a238 ("blk-iocost: revamp in-period donation snapbacks")
Signed-off-by: Li Nan <linan122@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230527091904.3001833-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
block/blk-iocost.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 285ced3467abb..6084a9519883e 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2455,6 +2455,7 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
u32 hwi, adj_step;
s64 margin;
u64 cost, new_inuse;
+ unsigned long flags;
current_hweight(iocg, NULL, &hwi);
old_hwi = hwi;
@@ -2473,11 +2474,11 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
iocg->inuse == iocg->active)
return cost;
- spin_lock_irq(&ioc->lock);
+ spin_lock_irqsave(&ioc->lock, flags);
/* we own inuse only when @iocg is in the normal active state */
if (iocg->abs_vdebt || list_empty(&iocg->active_list)) {
- spin_unlock_irq(&ioc->lock);
+ spin_unlock_irqrestore(&ioc->lock, flags);
return cost;
}
@@ -2498,7 +2499,7 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
} while (time_after64(vtime + cost, now->vnow) &&
iocg->inuse != iocg->active);
- spin_unlock_irq(&ioc->lock);
+ spin_unlock_irqrestore(&ioc->lock, flags);
TRACE_IOCG_PATH(inuse_adjust, iocg, now,
old_inuse, iocg->inuse, old_hwi, hwi);
--
2.39.2
* [PATCH 6.3 009/431] x86/sev: Fix calculation of end address based on number of pages
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Tom Lendacky, Borislav Petkov (AMD),
Sasha Levin
From: Tom Lendacky <thomas.lendacky@amd.com>
[ Upstream commit 5dee19b6b2b194216919b99a1f5af2949a754016 ]
When calculating an end address based on an unsigned int number of pages,
any value greater than or equal to 0x100000 that is shifted left by
PAGE_SHIFT bits overflows 32 bits (0x100000 itself becomes 0), resulting
in an invalid end address. Change the
number of pages variable in various routines from an unsigned int to an
unsigned long to calculate the end address correctly.
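
The arithmetic can be checked with a short sketch. It assumes a page shift of 12 (4 KiB pages) and uses fixed-width types in place of `unsigned int` and a 64-bit `unsigned long`; the `toy_*` names are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed: 4 KiB pages */

/* Byte length computed in 32 bits: the result wraps modulo 2^32. */
uint32_t toy_len_u32(uint32_t npages)
{
	return npages << TOY_PAGE_SHIFT;
}

/* Byte length computed in 64 bits: no wrap for realistic page counts. */
uint64_t toy_len_u64(uint64_t npages)
{
	return npages << TOY_PAGE_SHIFT;
}
```

With 0x100000 pages the 32-bit version yields 0, so an end address computed as start plus length would equal the start address, while the 64-bit version yields 0x100000000.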
Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes")
Fixes: dc3f3d2474b8 ("x86/mm: Validate memory when changing the C-bit")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/6a6e4eea0e1414402bac747744984fa4e9c01bb6.1686063086.git.thomas.lendacky@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/include/asm/sev.h | 16 ++++++++--------
arch/x86/kernel/sev.c | 14 +++++++-------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8ed..a0a58c4122ec3 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -187,12 +187,12 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
}
void setup_ghcb(void);
void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
- unsigned int npages);
+ unsigned long npages);
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned int npages);
+ unsigned long npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
-void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
-void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_shared(unsigned long vaddr, unsigned long npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
@@ -207,12 +207,12 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
static inline void setup_ghcb(void) { }
static inline void __init
-early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
static inline void __init
-early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
-static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
-static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline bool snp_init(struct boot_params *bp) { return false; }
static inline void snp_abort(void) { }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 3f664ab277c49..45ef3926381f8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -643,7 +643,7 @@ static u64 __init get_jump_table_addr(void)
return ret;
}
-static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
+static void pvalidate_pages(unsigned long vaddr, unsigned long npages, bool validate)
{
unsigned long vaddr_end;
int rc;
@@ -660,7 +660,7 @@ static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool valid
}
}
-static void __init early_set_pages_state(unsigned long paddr, unsigned int npages, enum psc_op op)
+static void __init early_set_pages_state(unsigned long paddr, unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
u64 val;
@@ -699,7 +699,7 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage
}
void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
- unsigned int npages)
+ unsigned long npages)
{
/*
* This can be invoked in early boot while running identity mapped, so
@@ -721,7 +721,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
}
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned int npages)
+ unsigned long npages)
{
/*
* This can be invoked in early boot while running identity mapped, so
@@ -877,7 +877,7 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
}
-static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
+static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
{
unsigned long vaddr_end, next_vaddr;
struct snp_psc_desc *desc;
@@ -902,7 +902,7 @@ static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
kfree(desc);
}
-void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
{
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;
@@ -912,7 +912,7 @@ void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
}
-void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+void snp_set_memory_private(unsigned long vaddr, unsigned long npages)
{
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;
--
2.39.2
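Note on the overflow fixed above: a minimal userspace sketch, assuming 4 KiB pages (PAGE_SHIFT of 12) and using uint64_t to stand in for the kernel's 64-bit unsigned long. The helper names end_addr_uint() and end_addr_ulong() are hypothetical and do not appear in the patch. The sketch only illustrates why a 32-bit page count wraps once it reaches 0x100000.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/*
 * Old behaviour: npages is 32 bits wide, so the shift is carried out in
 * 32-bit arithmetic and wraps to 0 once npages reaches 0x100000.
 */
static uint64_t end_addr_uint(uint64_t vaddr, unsigned int npages)
{
	return vaddr + (npages << PAGE_SHIFT);
}

/*
 * Fixed behaviour: npages is 64 bits wide, so the shift keeps the high
 * bits and the end address is correct.
 */
static uint64_t end_addr_ulong(uint64_t vaddr, uint64_t npages)
{
	/* Guard against a wrap of the 64-bit shift itself. */
	assert(npages <= (UINT64_MAX >> PAGE_SHIFT));
	return vaddr + (npages << PAGE_SHIFT);
}
```

With vaddr = 0x1000 and npages = 0x100000, the first helper returns the start address unchanged, while the second returns an address 4 GiB higher.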
* [PATCH 6.3 010/431] blk-cgroup: Reinit blkg_iostat_set after clearing in blkcg_reset_stats()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (8 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 009/431] x86/sev: Fix calculation of end address based on number of pages Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 011/431] virt: sevguest: Add CONFIG_CRYPTO dependency Greg Kroah-Hartman
` (421 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Waiman Long, Ming Lei, Tejun Heo,
Jens Axboe, Sasha Levin
From: Waiman Long <longman@redhat.com>
[ Upstream commit 3d2af77e31ade05ff7ccc3658c3635ec1bea0979 ]
When blkg_alloc() is called to allocate a blkcg_gq structure
with the associated blkg_iostat_set's, there are 2 fields within
blkg_iostat_set that require proper initialization - blkg & sync.
The former field was introduced by commit 3b8cc6298724 ("blk-cgroup:
Optimize blkcg_rstat_flush()") while the latter one was introduced by
commit f73316482977 ("blk-cgroup: reimplement basic IO stats using
cgroup rstat").
Unfortunately those fields in the blkg_iostat_set's are not properly
re-initialized when they are cleared in v1's blkcg_reset_stats(). This
can lead to a kernel panic due to NULL pointer access of the blkg
pointer. The missing initialization of sync is less problematic, but it
can still be a problem in a debug kernel due to missing lockdep initialization.
Fix these problems by re-initializing them after memory clearing.
Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Fixes: f73316482977 ("blk-cgroup: reimplement basic IO stats using cgroup rstat")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230606180724.2455066-1-longman@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
block/blk-cgroup.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ad0cd992a6519..c50da8b3af029 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -585,8 +585,13 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
struct blkg_iostat_set *bis =
per_cpu_ptr(blkg->iostat_cpu, cpu);
memset(bis, 0, sizeof(*bis));
+
+ /* Re-initialize the cleared blkg_iostat_set */
+ u64_stats_init(&bis->sync);
+ bis->blkg = blkg;
}
memset(&blkg->iostat, 0, sizeof(blkg->iostat));
+ u64_stats_init(&blkg->iostat.sync);
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
--
2.39.2
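Note on the pattern fixed above: a minimal userspace sketch, assuming simplified stand-in types. struct group, struct stat_set and reset_stats() are hypothetical names, not the kernel's. It shows why a structure that carries a back-pointer has to be re-initialized after memset() clears it.

```c
#include <assert.h>
#include <string.h>

struct group;                    /* stand-in for struct blkcg_gq */

struct stat_set {                /* stand-in for struct blkg_iostat_set */
	struct group *owner;     /* back-pointer, comparable to bis->blkg */
	unsigned long bytes;
	unsigned long ios;
};

struct group {
	struct stat_set stats;
};

/* Clear the counters, then restore the fields that must never be zero. */
static void reset_stats(struct group *g)
{
	assert(g != NULL);
	memset(&g->stats, 0, sizeof(g->stats));
	g->stats.owner = g;      /* without this, owner stays NULL */
}
```

Leaving out the last assignment reproduces the bug class described in the commit message: a later user that dereferences the back-pointer hits NULL.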
* [PATCH 6.3 011/431] virt: sevguest: Add CONFIG_CRYPTO dependency
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (9 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 010/431] blk-cgroup: Reinit blkg_iostat_set after clearing in blkcg_reset_stats() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 012/431] blk-mq: fix potential io hang by wrong wake_batch Greg Kroah-Hartman
` (420 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Arnd Bergmann, Borislav Petkov (AMD),
Sasha Levin
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 84b9b44b99780d35fe72ac63c4724f158771e898 ]
This driver fails to link when CRYPTO is disabled, or in a loadable
module:
WARNING: unmet direct dependencies detected for CRYPTO_GCM
WARNING: unmet direct dependencies detected for CRYPTO_AEAD2
Depends on [m]: CRYPTO [=m]
Selected by [y]:
- SEV_GUEST [=y] && VIRT_DRIVERS [=y] && AMD_MEM_ENCRYPT [=y]
x86_64-linux-ld: crypto/aead.o: in function `crypto_register_aeads':
Fixes: fce96cf04430 ("virt: Add SEV-SNP guest driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230117171416.2715125-1-arnd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/virt/coco/sev-guest/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/virt/coco/sev-guest/Kconfig b/drivers/virt/coco/sev-guest/Kconfig
index f9db0799ae67c..da2d7ca531f0f 100644
--- a/drivers/virt/coco/sev-guest/Kconfig
+++ b/drivers/virt/coco/sev-guest/Kconfig
@@ -2,6 +2,7 @@ config SEV_GUEST
tristate "AMD SEV Guest driver"
default m
depends on AMD_MEM_ENCRYPT
+ select CRYPTO
select CRYPTO_AEAD2
select CRYPTO_GCM
help
--
2.39.2
* [PATCH 6.3 012/431] blk-mq: fix potential io hang by wrong wake_batch
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (10 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 011/431] virt: sevguest: Add CONFIG_CRYPTO dependency Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 013/431] lockd: drop inappropriate svc_get() from locked_get() Greg Kroah-Hartman
` (419 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Yu Kuai, Jan Kara, Jens Axboe,
Sasha Levin
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit 4f1731df60f9033669f024d06ae26a6301260b55 ]
In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:
t1:                             t2:
__blk_mq_tag_busy               __blk_mq_tag_busy
inc active_queues
// assume 1 -> 2
                                inc active_queues
                                // 2 -> 3
                                blk_mq_update_wake_batch
                                // calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */
Fix this problem by protecting them with 'tags->lock'. This is not a hot
path, so performance should not be a concern. And now that all writers
are inside the lock, switch 'active_queues' from atomic to unsigned
int.
Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
block/blk-mq-debugfs.c | 2 +-
block/blk-mq-tag.c | 15 ++++++++++-----
block/blk-mq.h | 3 +--
include/linux/blk-mq.h | 3 +--
4 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b01818f8e216e..c152276736832 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -427,7 +427,7 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
seq_printf(m, "active_queues=%d\n",
- atomic_read(&tags->active_queues));
+ READ_ONCE(tags->active_queues));
seq_puts(m, "\nbitmap_tags:\n");
sbitmap_queue_show(&tags->bitmap_tags, m);
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a80d7c62bdfe6..100889c276c3f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -40,6 +40,7 @@ static void blk_mq_update_wake_batch(struct blk_mq_tags *tags,
void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
unsigned int users;
+ struct blk_mq_tags *tags = hctx->tags;
/*
* calling test_bit() prior to test_and_set_bit() is intentional,
@@ -57,9 +58,11 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
return;
}
- users = atomic_inc_return(&hctx->tags->active_queues);
-
- blk_mq_update_wake_batch(hctx->tags, users);
+ spin_lock_irq(&tags->lock);
+ users = tags->active_queues + 1;
+ WRITE_ONCE(tags->active_queues, users);
+ blk_mq_update_wake_batch(tags, users);
+ spin_unlock_irq(&tags->lock);
}
/*
@@ -92,9 +95,11 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
return;
}
- users = atomic_dec_return(&tags->active_queues);
-
+ spin_lock_irq(&tags->lock);
+ users = tags->active_queues - 1;
+ WRITE_ONCE(tags->active_queues, users);
blk_mq_update_wake_batch(tags, users);
+ spin_unlock_irq(&tags->lock);
blk_mq_tag_wakeup_all(tags, false);
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a7482d2cc82e7..4542308c8e62f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -362,8 +362,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
return true;
}
- users = atomic_read(&hctx->tags->active_queues);
-
+ users = READ_ONCE(hctx->tags->active_queues);
if (!users)
return true;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index de0b0c3e7395a..4110d6e99b2b9 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -748,8 +748,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
struct blk_mq_tags {
unsigned int nr_tags;
unsigned int nr_reserved_tags;
-
- atomic_t active_queues;
+ unsigned int active_queues;
struct sbitmap_queue bitmap_tags;
struct sbitmap_queue breserved_tags;
--
2.39.2
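Note on the locking change above: a minimal userspace sketch using a pthread mutex in place of the kernel spinlock. struct tags, TOTAL_TAGS and the wake-batch formula are assumptions made for illustration; the kernel derives wake_batch differently. The point is only that the counter update and the value derived from it sit inside one critical section, so the derived value can never be computed from a stale count.

```c
#include <assert.h>
#include <pthread.h>

#define TOTAL_TAGS 64u /* hypothetical tag depth for this sketch */

struct tags {
	pthread_mutex_t lock;
	unsigned int active_queues;
	unsigned int wake_batch;
};

static void tags_init(struct tags *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->active_queues = 0;
	t->wake_batch = TOTAL_TAGS;
}

/* Hypothetical formula: share the tags evenly between active users. */
static void update_wake_batch(struct tags *t, unsigned int users)
{
	t->wake_batch = users ? TOTAL_TAGS / users : TOTAL_TAGS;
}

static void tag_busy(struct tags *t)
{
	pthread_mutex_lock(&t->lock);
	t->active_queues++;
	update_wake_batch(t, t->active_queues); /* same critical section */
	pthread_mutex_unlock(&t->lock);
}

static void tag_idle(struct tags *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->active_queues > 0);
	t->active_queues--;
	update_wake_batch(t, t->active_queues);
	pthread_mutex_unlock(&t->lock);
}
```

Because every writer holds the lock, the counter no longer needs to be atomic, which mirrors the switch from atomic_t to unsigned int in the patch.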
* [PATCH 6.3 013/431] lockd: drop inappropriate svc_get() from locked_get()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (11 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 012/431] blk-mq: fix potential io hang by wrong wake_batch Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 014/431] nvme-core: fix memory leak in dhchap_secret_store Greg Kroah-Hartman
` (418 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Ido Schimmel, NeilBrown,
Ido Schimmel, Chuck Lever, Sasha Levin
From: NeilBrown <neilb@suse.de>
[ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ]
The below-mentioned patch was intended to simplify refcounting on the
svc_serv used by lockd. The goal was to only ever have a single
reference from the single thread. To that end we dropped a call to
lockd_start_svc() (except when creating thread) which would take a
reference, and dropped the svc_put(serv) that would drop that reference.
Unfortunately we didn't also remove the svc_get() from
lockd_create_svc() in the case where the svc_serv already existed.
So after the patch:
- on the first call the svc_serv was allocated and the one reference
was given to the thread, so there are no extra references
- on subsequent calls svc_get() was called so there is now an extra
reference.
This is clearly not consistent.
The inconsistency is also clear in the current code, in that lockd_get()
takes *two* references, one on nlmsvc_serv and one by incrementing
nlmsvc_users. This clearly does not match lockd_put().
So: drop that svc_get() from lockd_get() (which used to be in
lockd_create_svc()).
Reported-by: Ido Schimmel <idosch@idosch.org>
Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u
Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()")
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/lockd/svc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 9a47303b2cba6..0c05668019c2b 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -355,7 +355,6 @@ static int lockd_get(void)
int error;
if (nlmsvc_serv) {
- svc_get(nlmsvc_serv);
nlmsvc_users++;
return 0;
}
--
2.39.2
* [PATCH 6.3 014/431] nvme-core: fix memory leak in dhchap_secret_store
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (12 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 013/431] lockd: drop inappropriate svc_get() from locked_get() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 015/431] nvme-core: fix memory leak in dhchap_ctrl_secret Greg Kroah-Hartman
` (417 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Chaitanya Kulkarni, Yi Zhang,
Christoph Hellwig, Sagi Grimberg, Keith Busch, Sasha Levin
From: Chaitanya Kulkarni <kch@nvidia.com>
[ Upstream commit a836ca33c5b07d34dd5347af9f64d25651d12674 ]
Free dhchap_secret in nvme_ctrl_dhchap_secret_store() before we return,
fixing the following kmemleak report:
unreferenced object 0xffff8886376ea800 (size 64):
comm "check", pid 22048, jiffies 4344316705 (age 92.199s)
hex dump (first 32 bytes):
44 48 48 43 2d 31 3a 30 30 3a 6e 78 72 35 4b 67 DHHC-1:00:nxr5Kg
75 58 34 75 6f 41 78 73 4a 61 34 63 2f 68 75 4c uX4uoAxsJa4c/huL
backtrace:
[<0000000030ce5d4b>] __kmalloc+0x4b/0x130
[<000000009be1cdc1>] nvme_ctrl_dhchap_secret_store+0x8f/0x160 [nvme_core]
[<00000000ac06c96a>] kernfs_fop_write_iter+0x12b/0x1c0
[<00000000437e7ced>] vfs_write+0x2ba/0x3c0
[<00000000f9491baf>] ksys_write+0x5f/0xe0
[<000000001c46513d>] do_syscall_64+0x3b/0x90
[<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff8886376eaf00 (size 64):
comm "check", pid 22048, jiffies 4344316736 (age 92.168s)
hex dump (first 32 bytes):
44 48 48 43 2d 31 3a 30 30 3a 6e 78 72 35 4b 67 DHHC-1:00:nxr5Kg
75 58 34 75 6f 41 78 73 4a 61 34 63 2f 68 75 4c uX4uoAxsJa4c/huL
backtrace:
[<0000000030ce5d4b>] __kmalloc+0x4b/0x130
[<000000009be1cdc1>] nvme_ctrl_dhchap_secret_store+0x8f/0x160 [nvme_core]
[<00000000ac06c96a>] kernfs_fop_write_iter+0x12b/0x1c0
[<00000000437e7ced>] vfs_write+0x2ba/0x3c0
[<00000000f9491baf>] ksys_write+0x5f/0xe0
[<000000001c46513d>] do_syscall_64+0x3b/0x90
[<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8a632bf7f5a8f..8a706ca1b3e14 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3872,8 +3872,10 @@ static ssize_t nvme_ctrl_dhchap_secret_store(struct device *dev,
int ret;
ret = nvme_auth_generate_key(dhchap_secret, &key);
- if (ret)
+ if (ret) {
+ kfree(dhchap_secret);
return ret;
+ }
kfree(opts->dhchap_secret);
opts->dhchap_secret = dhchap_secret;
host_key = ctrl->host_key;
@@ -3881,7 +3883,8 @@ static ssize_t nvme_ctrl_dhchap_secret_store(struct device *dev,
ctrl->host_key = key;
mutex_unlock(&ctrl->dhchap_auth_mutex);
nvme_auth_free_key(host_key);
- }
+ } else
+ kfree(dhchap_secret);
/* Start re-authentication */
dev_info(ctrl->device, "re-authenticating controller\n");
queue_work(nvme_wq, &ctrl->dhchap_auth_work);
--
2.39.2
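Note on the ownership rule behind this fix: a minimal userspace sketch, assuming malloc()/free() in place of kmalloc()/kfree(). All names here (struct opts, secret_store(), validate(), live_allocs) are hypothetical, and validate() is only a stand-in for nvme_auth_generate_key(). The copy of the secret is freed on every path that does not hand it over to the options structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs; /* outstanding buffers, tracked for illustration */

static char *dup_secret(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p) {
		memcpy(p, s, n);
		live_allocs++;
	}
	return p;
}

static void free_secret(char *s)
{
	if (s) {
		free(s);
		live_allocs--;
	}
}

struct opts {
	char *secret;
};

/* Hypothetical check standing in for nvme_auth_generate_key(). */
static int validate(const char *s)
{
	return strncmp(s, "DHHC-1:", 7) == 0 ? 0 : -1;
}

static int secret_store(struct opts *o, const char *buf)
{
	char *copy = dup_secret(buf);

	if (!copy)
		return -1;
	if (o->secret && strcmp(copy, o->secret) == 0) {
		free_secret(copy);      /* unchanged: nobody takes the copy */
		return 0;
	}
	if (validate(copy)) {
		free_secret(copy);      /* error path must not leak the copy */
		return -1;
	}
	free_secret(o->secret);         /* drop the previous secret */
	o->secret = copy;               /* ownership moves to opts */
	return 0;
}
```

The two free_secret(copy) calls correspond to the two kfree(dhchap_secret) lines the patch adds: one on the key-generation error path and one in the else branch for an unchanged secret.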
* [PATCH 6.3 015/431] nvme-core: fix memory leak in dhchap_ctrl_secret
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (13 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 014/431] nvme-core: fix memory leak in dhchap_secret_store Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 016/431] nvme-core: add missing fault-injection cleanup Greg Kroah-Hartman
` (416 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Chaitanya Kulkarni,
Christoph Hellwig, Sagi Grimberg, Keith Busch, Sasha Levin
From: Chaitanya Kulkarni <kch@nvidia.com>
[ Upstream commit 99c2dcc8ffc24e210a3aa05c204d92f3ef460b05 ]
Free dhchap_secret in nvme_ctrl_dhchap_ctrl_secret_store() before we
return when nvme_auth_generate_key() returns an error.
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8a706ca1b3e14..b03f5a34b1ee0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3929,8 +3929,10 @@ static ssize_t nvme_ctrl_dhchap_ctrl_secret_store(struct device *dev,
int ret;
ret = nvme_auth_generate_key(dhchap_secret, &key);
- if (ret)
+ if (ret) {
+ kfree(dhchap_secret);
return ret;
+ }
kfree(opts->dhchap_ctrl_secret);
opts->dhchap_ctrl_secret = dhchap_secret;
ctrl_key = ctrl->ctrl_key;
@@ -3938,7 +3940,8 @@ static ssize_t nvme_ctrl_dhchap_ctrl_secret_store(struct device *dev,
ctrl->ctrl_key = key;
mutex_unlock(&ctrl->dhchap_auth_mutex);
nvme_auth_free_key(ctrl_key);
- }
+ } else
+ kfree(dhchap_secret);
/* Start re-authentication */
dev_info(ctrl->device, "re-authenticating controller\n");
queue_work(nvme_wq, &ctrl->dhchap_auth_work);
--
2.39.2
* [PATCH 6.3 016/431] nvme-core: add missing fault-injection cleanup
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (14 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 015/431] nvme-core: fix memory leak in dhchap_ctrl_secret Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 017/431] nvme-core: fix dev_pm_qos memleak Greg Kroah-Hartman
` (415 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Chaitanya Kulkarni, Yi Zhang,
Christoph Hellwig, Sagi Grimberg, Keith Busch, Sasha Levin
From: Chaitanya Kulkarni <kch@nvidia.com>
[ Upstream commit 3a12a0b868a512fcada564699d00f5e652c0998c ]
Add missing fault-injection cleanup in nvme_init_ctrl() in the error
unwind path. This also fixes the following messages seen with blktests:
linux-block (for-next) # grep debugfs debugfs-err.log
[ 147.853464] debugfs: Directory 'nvme1' with parent '/' already present!
[ 147.853973] nvme1: failed to create debugfs attr
[ 148.802490] debugfs: Directory 'nvme1' with parent '/' already present!
[ 148.803244] nvme1: failed to create debugfs attr
[ 148.877304] debugfs: Directory 'nvme1' with parent '/' already present!
[ 148.877775] nvme1: failed to create debugfs attr
[ 149.816652] debugfs: Directory 'nvme1' with parent '/' already present!
[ 149.818011] nvme1: failed to create debugfs attr
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Stable-dep-of: 7ed5cf8e6d9b ("nvme-core: fix dev_pm_qos memleak")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b03f5a34b1ee0..f07fd2fed3f9d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5249,6 +5249,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
return 0;
out_free_cdev:
+ nvme_fault_inject_fini(&ctrl->fault_inject);
cdev_device_del(&ctrl->cdev, ctrl->device);
out_free_name:
nvme_put_ctrl(ctrl);
--
2.39.2
* [PATCH 6.3 017/431] nvme-core: fix dev_pm_qos memleak
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (15 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 016/431] nvme-core: add missing fault-injection cleanup Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 018/431] md/raid10: check slab-out-of-bounds in md_bitmap_get_counter Greg Kroah-Hartman
` (414 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Chaitanya Kulkarni, Yi Zhang,
Christoph Hellwig, Sagi Grimberg, Keith Busch, Sasha Levin
From: Chaitanya Kulkarni <kch@nvidia.com>
[ Upstream commit 7ed5cf8e6d9bfb6a78d0471317edff14f0f2b4dd ]
Call dev_pm_qos_hide_latency_tolerance() in the error unwind path to
avoid the following kmemleak report:
blktests (master) # kmemleak-clear; ./check nvme/044;
blktests (master) # kmemleak-scan ; kmemleak-show
nvme/044 (Test bi-directional authentication) [passed]
runtime 2.111s ... 2.124s
unreferenced object 0xffff888110c46240 (size 96):
comm "nvme", pid 33461, jiffies 4345365353 (age 75.586s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000069ac2cec>] kmalloc_trace+0x25/0x90
[<000000006acc66d5>] dev_pm_qos_update_user_latency_tolerance+0x6f/0x100
[<00000000cc376ea7>] nvme_init_ctrl+0x38e/0x410 [nvme_core]
[<000000007df61b4b>] 0xffffffffc05e88b3
[<00000000d152b985>] 0xffffffffc05744cb
[<00000000f04a4041>] vfs_write+0xc5/0x3c0
[<00000000f9491baf>] ksys_write+0x5f/0xe0
[<000000001c46513d>] do_syscall_64+0x3b/0x90
[<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Link: https://lore.kernel.org/linux-nvme/CAHj4cs-nDaKzMx2txO4dbE+Mz9ePwLtU0e3egz+StmzOUgWUrA@mail.gmail.com/
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f07fd2fed3f9d..8d8403b65e1b3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5250,6 +5250,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
return 0;
out_free_cdev:
nvme_fault_inject_fini(&ctrl->fault_inject);
+ dev_pm_qos_hide_latency_tolerance(ctrl->device);
cdev_device_del(&ctrl->cdev, ctrl->device);
out_free_name:
nvme_put_ctrl(ctrl);
--
2.39.2
* [PATCH 6.3 018/431] md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (16 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 017/431] nvme-core: fix dev_pm_qos memleak Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 019/431] md/raid10: fix overflow of md/safe_mode_delay Greg Kroah-Hartman
` (413 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Li Nan, Yu Kuai, Song Liu,
Sasha Levin
From: Li Nan <linan122@huawei.com>
[ Upstream commit 301867b1c16805aebbc306aafa6ecdc68b73c7e5 ]
If we write a large number to md/bitmap_set_bits, md_bitmap_checkpage()
will return -EINVAL because 'page >= bitmap->pages'. However,
md_bitmap_get_counter() does not check that return value right away,
because it first sets the *blocks value, and a slab-out-of-bounds
access occurs.
Move the check of 'page >= bitmap->pages' to md_bitmap_get_counter() and
return directly if it is true.
Fixes: ef4256733506 ("md/bitmap: optimise scanning of empty bitmaps.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230515134808.3936750-2-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md-bitmap.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index e7cc6ba1b657f..9640741e8d369 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -54,14 +54,7 @@ __acquires(bitmap->lock)
{
unsigned char *mappage;
- if (page >= bitmap->pages) {
- /* This can happen if bitmap_start_sync goes beyond
- * End-of-device while looking for a whole page.
- * It is harmless.
- */
- return -EINVAL;
- }
-
+ WARN_ON_ONCE(page >= bitmap->pages);
if (bitmap->bp[page].hijacked) /* it's hijacked, don't try to alloc */
return 0;
@@ -1364,6 +1357,14 @@ __acquires(bitmap->lock)
sector_t csize;
int err;
+ if (page >= bitmap->pages) {
+ /*
+ * This can happen if bitmap_start_sync goes beyond
+ * End-of-device while looking for a whole page or
+ * user set a huge number to sysfs bitmap_set_bits.
+ */
+ return NULL;
+ }
err = md_bitmap_checkpage(bitmap, page, create, 0);
if (bitmap->bp[page].hijacked ||
--
2.39.2
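Note on the bounds check moved by this patch: a minimal userspace sketch, assuming simplified stand-in types. struct page_slot, struct bitmap and get_counter() are hypothetical and much smaller than the md structures. The index is validated before the array is touched at all, instead of after a helper has already been called.

```c
#include <assert.h>
#include <stddef.h>

struct page_slot {
	unsigned int count;
};

struct bitmap {
	struct page_slot *bp;   /* array of 'pages' slots */
	unsigned long pages;
};

/* Return the slot for 'page', or NULL if the index is out of range. */
static struct page_slot *get_counter(struct bitmap *b, unsigned long page)
{
	assert(b != NULL);
	if (page >= b->pages)   /* check first, index second */
		return NULL;
	return &b->bp[page];
}
```

A caller that passes a huge user-supplied index gets NULL back and never reads past the end of the array.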
* [PATCH 6.3 019/431] md/raid10: fix overflow of md/safe_mode_delay
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (17 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 018/431] md/raid10: check slab-out-of-bounds in md_bitmap_get_counter Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 020/431] md/raid10: fix wrong setting of max_corr_read_errors Greg Kroah-Hartman
` (412 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Li Nan, Song Liu, Sasha Levin
From: Li Nan <linan122@huawei.com>
[ Upstream commit 6beb489b2eed25978523f379a605073f99240c50 ]
There is no input check when writing to md/safe_mode_delay in
safe_delay_store(), and msec might also overflow when HZ < 1000 in
safe_delay_show(). Fix it by checking for overflow in safe_delay_store()
and using an unsigned long conversion in safe_delay_show().
Fixes: 72e02075a33f ("md: factor out parsing of fixed-point numbers")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230522072535.1523740-2-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d479e1656ef33..61ad7dfe0e99a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3814,8 +3814,9 @@ int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale)
static ssize_t
safe_delay_show(struct mddev *mddev, char *page)
{
- int msec = (mddev->safemode_delay*1000)/HZ;
- return sprintf(page, "%d.%03d\n", msec/1000, msec%1000);
+ unsigned int msec = ((unsigned long)mddev->safemode_delay*1000)/HZ;
+
+ return sprintf(page, "%u.%03u\n", msec/1000, msec%1000);
}
static ssize_t
safe_delay_store(struct mddev *mddev, const char *cbuf, size_t len)
@@ -3827,7 +3828,7 @@ safe_delay_store(struct mddev *mddev, const char *cbuf, size_t len)
return -EINVAL;
}
- if (strict_strtoul_scaled(cbuf, &msec, 3) < 0)
+ if (strict_strtoul_scaled(cbuf, &msec, 3) < 0 || msec > UINT_MAX / HZ)
return -EINVAL;
if (msec == 0)
mddev->safemode_delay = 0;
--
2.39.2
* [PATCH 6.3 020/431] md/raid10: fix wrong setting of max_corr_read_errors
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (18 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 019/431] md/raid10: fix overflow of md/safe_mode_delay Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 021/431] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request Greg Kroah-Hartman
` (411 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Li Nan, Yu Kuai, Song Liu,
Sasha Levin
From: Li Nan <linan122@huawei.com>
[ Upstream commit f8b20a405428803bd9881881d8242c9d72c6b2b2 ]
There is no input check when writing to md/max_read_errors, so overflow
might occur. Add a check of the input number.
Fixes: 1e50915fe0bb ("raid: improve MD/raid10 handling of correctable read errors.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230522072535.1523740-3-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 61ad7dfe0e99a..2674cb1d699c0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4498,6 +4498,8 @@ max_corrected_read_errors_store(struct mddev *mddev, const char *buf, size_t len
rv = kstrtouint(buf, 10, &n);
if (rv < 0)
return rv;
+ if (n > INT_MAX)
+ return -EINVAL;
atomic_set(&mddev->max_corr_read_errors, n);
return len;
}
--
2.39.2
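
For illustration, the bound added above can be modelled in userspace. The sketch assumes strtoul() as a stand-in for kstrtouint() (their parsing rules differ in detail) and uses a hypothetical helper name, parse_max_errors(); the point it shows is that a value parsed as unsigned must be checked against INT_MAX before it is stored in an int-sized counter.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the store path: parse an unsigned
 * decimal and reject anything that does not fit in a signed int.
 */
static int parse_max_errors(const char *buf, int *out)
{
	char *end;
	unsigned long n;

	errno = 0;
	n = strtoul(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (n > INT_MAX)
		return -EINVAL;	/* same bound the patch adds */
	*out = (int)n;
	return 0;
}
```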
* [PATCH 6.3 021/431] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (19 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 020/431] md/raid10: fix wrong setting of max_corr_read_errors Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 022/431] md/raid10: fix io loss while replacement replace rdev Greg Kroah-Hartman
` (410 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Li Nan, Yu Kuai, Song Liu,
Sasha Levin
From: Li Nan <linan122@huawei.com>
[ Upstream commit 34817a2441747b48e444cb0e05d84e14bc9443da ]
There are two checks of 'mreplace' in raid10_sync_request(). In the first
check, 'need_replace' is set, and 'mreplace' will be used later, if a
non-Faulty 'mreplace' exists. In the second check, 'mreplace' is set to
NULL if it is Faulty, but 'need_replace' is not changed accordingly. A
null-ptr-deref occurs if Faulty is set between the two checks.
Fix it by merging the two checks into one, and replace 'need_replace' with
'mreplace' because their values are always the same.
Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230527072218.2365857-2-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid10.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ea6967aeaa02a..7b5f26726b310 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3436,7 +3436,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int must_sync;
int any_working;
int need_recover = 0;
- int need_replace = 0;
struct raid10_info *mirror = &conf->mirrors[i];
struct md_rdev *mrdev, *mreplace;
@@ -3448,11 +3447,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
!test_bit(Faulty, &mrdev->flags) &&
!test_bit(In_sync, &mrdev->flags))
need_recover = 1;
- if (mreplace != NULL &&
- !test_bit(Faulty, &mreplace->flags))
- need_replace = 1;
+ if (mreplace && test_bit(Faulty, &mreplace->flags))
+ mreplace = NULL;
- if (!need_recover && !need_replace) {
+ if (!need_recover && !mreplace) {
rcu_read_unlock();
continue;
}
@@ -3468,8 +3466,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
rcu_read_unlock();
continue;
}
- if (mreplace && test_bit(Faulty, &mreplace->flags))
- mreplace = NULL;
/* Unless we are doing a full sync, or a replacement
* we only need to recover the block if it is set in
* the bitmap
@@ -3592,11 +3588,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
bio = r10_bio->devs[1].repl_bio;
if (bio)
bio->bi_end_io = NULL;
- /* Note: if need_replace, then bio
+ /* Note: if replace is not NULL, then bio
* cannot be NULL as r10buf_pool_alloc will
* have allocated it.
*/
- if (!need_replace)
+ if (!mreplace)
break;
bio->bi_next = biolist;
biolist = bio;
--
2.39.2
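
The idea of the fix above, a single decision point instead of a flag and a pointer that can disagree, can be sketched as follows. The struct and function names are hypothetical, and a plain bool stands in for the Faulty bit; this is a model of the pattern, not the raid10 code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev; only the Faulty state is modelled. */
struct dev_model {
	bool faulty;
};

/*
 * Decide once: a faulty replacement is treated as absent from here on.
 * Later code tests the returned pointer itself, so no separate
 * "need_replace" flag can drift out of sync with it.
 */
static struct dev_model *usable_replacement(struct dev_model *mreplace)
{
	if (mreplace && mreplace->faulty)
		return NULL;
	return mreplace;
}
```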
* [PATCH 6.3 022/431] md/raid10: fix io loss while replacement replace rdev
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (20 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 021/431] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 023/431] md/raid1-10: factor out a helper to add bio to plug Greg Kroah-Hartman
` (409 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Li Nan, Yu Kuai, Song Liu,
Sasha Levin
From: Li Nan <linan122@huawei.com>
[ Upstream commit 2ae6aaf76912bae53c74b191569d2ab484f24bf3 ]
When removing a disk that has a replacement, the replacement is used to
replace rdev. During this process, there is a brief window in which both
rdev and replacement are read as NULL in raid10_write_request(). As a
result, io that should be submitted is not submitted.
//remove                                //write
raid10_remove_disk                      raid10_write_request
 mirror->rdev = NULL
                                         read rdev -> NULL
 mirror->rdev = mirror->replacement
 mirror->replacement = NULL
                                         read replacement -> NULL
Fix it by reading replacement first and rdev later, and use smp_mb()
to prevent memory reordering.
Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230602091839.743798-3-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid10.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7b5f26726b310..5af4e8aa08e96 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
disk = r10_bio->devs[slot].devnum;
rdev = rcu_dereference(conf->mirrors[disk].replacement);
if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
- r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+ r10_bio->devs[slot].addr + sectors >
+ rdev->recovery_offset) {
+ /*
+ * Read replacement first to prevent reading both rdev
+ * and replacement as NULL during replacement replace
+ * rdev.
+ */
+ smp_mb();
rdev = rcu_dereference(conf->mirrors[disk].rdev);
+ }
if (rdev == NULL ||
test_bit(Faulty, &rdev->flags))
continue;
@@ -1477,9 +1485,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
for (i = 0; i < conf->copies; i++) {
int d = r10_bio->devs[i].devnum;
- struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
- struct md_rdev *rrdev = rcu_dereference(
- conf->mirrors[d].replacement);
+ struct md_rdev *rdev, *rrdev;
+
+ rrdev = rcu_dereference(conf->mirrors[d].replacement);
+ /*
+ * Read replacement first to prevent reading both rdev and
+ * replacement as NULL during replacement replace rdev.
+ */
+ smp_mb();
+ rdev = rcu_dereference(conf->mirrors[d].rdev);
if (rdev == rrdev)
rrdev = NULL;
if (rdev && (test_bit(Faulty, &rdev->flags)))
--
2.39.2
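
The read ordering argument in the patch above can be checked with a small sequential model. This is a sketch under assumptions: C11 seq_cst atomics stand in for rcu_dereference() plus smp_mb(), the three writer stores follow the timeline in the commit message, and struct mirror_model, remove_step() and observe() are hypothetical names. It models only the order of reads against the writer's steps, not real CPU reordering.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mirror_model {
	_Atomic(void *) rdev;
	_Atomic(void *) replacement;
};

/* One of the three writer stores from the commit message (step is 1..3). */
static void remove_step(struct mirror_model *m, int step)
{
	void *r;

	switch (step) {
	case 1:
		atomic_store(&m->rdev, NULL);
		break;
	case 2:
		r = atomic_load(&m->replacement);
		atomic_store(&m->rdev, r);
		break;
	case 3:
		atomic_store(&m->replacement, NULL);
		break;
	}
}

/*
 * The reader does its first load after k1 writer steps and its second load
 * after k2 >= k1 writer steps. Returns true if it saw at least one device.
 */
static bool observe(bool replacement_first, int k1, int k2)
{
	static int dev_a, dev_b;
	struct mirror_model m;
	void *first, *second;
	int step = 0;

	atomic_init(&m.rdev, (void *)&dev_a);
	atomic_init(&m.replacement, (void *)&dev_b);

	while (step < k1)
		remove_step(&m, ++step);
	first = replacement_first ? atomic_load(&m.replacement)
				  : atomic_load(&m.rdev);
	while (step < k2)
		remove_step(&m, ++step);
	second = replacement_first ? atomic_load(&m.rdev)
				   : atomic_load(&m.replacement);

	return first != NULL || second != NULL;
}
```

In this model, observe(false, 1, 3) reproduces the lost-io window from the commit message, while observe(true, k1, k2) finds a device for every interleaving with 0 <= k1 <= k2 <= 3.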
* [PATCH 6.3 023/431] md/raid1-10: factor out a helper to add bio to plug
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (21 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 022/431] md/raid10: fix io loss while replacement replace rdev Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 024/431] md/raid1-10: factor out a helper to submit normal write Greg Kroah-Hartman
` (408 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Yu Kuai, Song Liu, Sasha Levin
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit 5ec6ca140a034682e421e2e808ef5ddfdfd65242 ]
The code in raid1 and raid10 is identical. Factor it out into a helper in
preparation for limiting the number of plugged bios.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529131106.2123367-3-yukuai1@huaweicloud.com
Stable-dep-of: 7db922bae3ab ("md/raid1-10: submit write io directly if bitmap is not enabled")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid1-10.c | 16 ++++++++++++++++
drivers/md/raid1.c | 12 +-----------
drivers/md/raid10.c | 11 +----------
3 files changed, 18 insertions(+), 21 deletions(-)
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index e61f6cad4e08e..9bf19a3409cef 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -109,3 +109,19 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
size -= len;
} while (idx++ < RESYNC_PAGES && size > 0);
}
+
+static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
+ blk_plug_cb_fn unplug)
+{
+ struct raid1_plug_cb *plug = NULL;
+ struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
+ sizeof(*plug));
+
+ if (!cb)
+ return false;
+
+ plug = container_of(cb, struct raid1_plug_cb, cb);
+ bio_list_add(&plug->pending, bio);
+
+ return true;
+}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 68a9e2d9985b2..131d8fd5ccaab 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1343,8 +1343,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
struct bitmap *bitmap = mddev->bitmap;
unsigned long flags;
struct md_rdev *blocked_rdev;
- struct blk_plug_cb *cb;
- struct raid1_plug_cb *plug = NULL;
int first_clone;
int max_sectors;
bool write_behind = false;
@@ -1573,15 +1571,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
r1_bio->sector);
/* flush_pending_writes() needs access to the rdev so...*/
mbio->bi_bdev = (void *)rdev;
-
- cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
- if (cb)
- plug = container_of(cb, struct raid1_plug_cb, cb);
- else
- plug = NULL;
- if (plug) {
- bio_list_add(&plug->pending, mbio);
- } else {
+ if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
spin_lock_irqsave(&conf->device_lock, flags);
bio_list_add(&conf->pending_bio_list, mbio);
spin_unlock_irqrestore(&conf->device_lock, flags);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5af4e8aa08e96..a6cc066a86f09 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1288,8 +1288,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
const blk_opf_t do_fua = bio->bi_opf & REQ_FUA;
unsigned long flags;
- struct blk_plug_cb *cb;
- struct raid1_plug_cb *plug = NULL;
struct r10conf *conf = mddev->private;
struct md_rdev *rdev;
int devnum = r10_bio->devs[n_copy].devnum;
@@ -1329,14 +1327,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
atomic_inc(&r10_bio->remaining);
- cb = blk_check_plugged(raid10_unplug, mddev, sizeof(*plug));
- if (cb)
- plug = container_of(cb, struct raid1_plug_cb, cb);
- else
- plug = NULL;
- if (plug) {
- bio_list_add(&plug->pending, mbio);
- } else {
+ if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
spin_lock_irqsave(&conf->device_lock, flags);
bio_list_add(&conf->pending_bio_list, mbio);
spin_unlock_irqrestore(&conf->device_lock, flags);
--
2.39.2
* [PATCH 6.3 024/431] md/raid1-10: factor out a helper to submit normal write
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (22 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 023/431] md/raid1-10: factor out a helper to add bio to plug Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 025/431] md/raid1-10: submit write io directly if bitmap is not enabled Greg Kroah-Hartman
` (407 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Yu Kuai, Song Liu, Sasha Levin
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit 8295efbe68c080047e98d9c0eb5cb933b238a8cb ]
The same thing is done in multiple places. Factor out a helper to remove
the redundant code; the helper will also be used in a following patch.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529131106.2123367-4-yukuai1@huaweicloud.com
Stable-dep-of: 7db922bae3ab ("md/raid1-10: submit write io directly if bitmap is not enabled")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid1-10.c | 17 +++++++++++++++++
drivers/md/raid1.c | 13 ++-----------
drivers/md/raid10.c | 26 ++++----------------------
3 files changed, 23 insertions(+), 33 deletions(-)
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 9bf19a3409cef..506299bd55cb6 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -110,6 +110,23 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
} while (idx++ < RESYNC_PAGES && size > 0);
}
+
+static inline void raid1_submit_write(struct bio *bio)
+{
+ struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
+
+ bio->bi_next = NULL;
+ bio_set_dev(bio, rdev->bdev);
+ if (test_bit(Faulty, &rdev->flags))
+ bio_io_error(bio);
+ else if (unlikely(bio_op(bio) == REQ_OP_DISCARD &&
+ !bdev_max_discard_sectors(bio->bi_bdev)))
+ /* Just ignore it */
+ bio_endio(bio);
+ else
+ submit_bio_noacct(bio);
+}
+
static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
blk_plug_cb_fn unplug)
{
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 131d8fd5ccaab..e51b77a3a8397 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -799,17 +799,8 @@ static void flush_bio_list(struct r1conf *conf, struct bio *bio)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
- struct md_rdev *rdev = (void *)bio->bi_bdev;
- bio->bi_next = NULL;
- bio_set_dev(bio, rdev->bdev);
- if (test_bit(Faulty, &rdev->flags)) {
- bio_io_error(bio);
- } else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
- !bdev_max_discard_sectors(bio->bi_bdev)))
- /* Just ignore it */
- bio_endio(bio);
- else
- submit_bio_noacct(bio);
+
+ raid1_submit_write(bio);
bio = next;
cond_resched();
}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a6cc066a86f09..f2f7538dd2a68 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -917,17 +917,8 @@ static void flush_pending_writes(struct r10conf *conf)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
- struct md_rdev *rdev = (void*)bio->bi_bdev;
- bio->bi_next = NULL;
- bio_set_dev(bio, rdev->bdev);
- if (test_bit(Faulty, &rdev->flags)) {
- bio_io_error(bio);
- } else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
- !bdev_max_discard_sectors(bio->bi_bdev)))
- /* Just ignore it */
- bio_endio(bio);
- else
- submit_bio_noacct(bio);
+
+ raid1_submit_write(bio);
bio = next;
}
blk_finish_plug(&plug);
@@ -1136,17 +1127,8 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
- struct md_rdev *rdev = (void*)bio->bi_bdev;
- bio->bi_next = NULL;
- bio_set_dev(bio, rdev->bdev);
- if (test_bit(Faulty, &rdev->flags)) {
- bio_io_error(bio);
- } else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
- !bdev_max_discard_sectors(bio->bi_bdev)))
- /* Just ignore it */
- bio_endio(bio);
- else
- submit_bio_noacct(bio);
+
+ raid1_submit_write(bio);
bio = next;
}
kfree(plug);
--
2.39.2
* [PATCH 6.3 025/431] md/raid1-10: submit write io directly if bitmap is not enabled
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (23 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 024/431] md/raid1-10: factor out a helper to submit normal write Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 026/431] block: fix blktrace debugfs entries leakage Greg Kroah-Hartman
` (406 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Yu Kuai, Song Liu, Sasha Levin
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit 7db922bae3abdf0a1db81ef7228cc0b996a0c1e3 ]
Commit 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
added bitmap support and changed write io to be submitted through the
daemon thread, because the bitmap needs to be updated before write io.
Later, plugging was used to fix the resulting performance regression: with
all write io going to the daemon thread, io can't be issued concurrently.
However, if bitmap is not enabled, write io should not go to the daemon
thread in the first place, and plugging is not needed either.
Fixes: 6cce3b23f6f8 ("[PATCH] md: write intent bitmap support for raid10")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529131106.2123367-5-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md-bitmap.c | 4 +---
drivers/md/md-bitmap.h | 7 +++++++
drivers/md/raid1-10.c | 13 +++++++++++--
3 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 9640741e8d369..8bbeeec70905c 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -993,7 +993,6 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
return set;
}
-
/* this gets called when the md device is ready to unplug its underlying
* (slave) device queues -- before we let any writes go down, we need to
* sync the dirty pages of the bitmap file to disk */
@@ -1003,8 +1002,7 @@ void md_bitmap_unplug(struct bitmap *bitmap)
int dirty, need_write;
int writing = 0;
- if (!bitmap || !bitmap->storage.filemap ||
- test_bit(BITMAP_STALE, &bitmap->flags))
+ if (!md_bitmap_enabled(bitmap))
return;
/* look at each page to see if there are any set bits that need to be
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index cfd7395de8fd3..3a4750952b3a7 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -273,6 +273,13 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
sector_t *lo, sector_t *hi, bool clear_bits);
void md_bitmap_free(struct bitmap *bitmap);
void md_bitmap_wait_behind_writes(struct mddev *mddev);
+
+static inline bool md_bitmap_enabled(struct bitmap *bitmap)
+{
+ return bitmap && bitmap->storage.filemap &&
+ !test_bit(BITMAP_STALE, &bitmap->flags);
+}
+
#endif
#endif
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 506299bd55cb6..73cc3cb9154d8 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -131,9 +131,18 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
blk_plug_cb_fn unplug)
{
struct raid1_plug_cb *plug = NULL;
- struct blk_plug_cb *cb = blk_check_plugged(unplug, mddev,
- sizeof(*plug));
+ struct blk_plug_cb *cb;
+
+ /*
+ * If bitmap is not enabled, it's safe to submit the io directly, and
+ * this can get optimal performance.
+ */
+ if (!md_bitmap_enabled(mddev->bitmap)) {
+ raid1_submit_write(bio);
+ return true;
+ }
+ cb = blk_check_plugged(unplug, mddev, sizeof(*plug));
if (!cb)
return false;
--
2.39.2
* [PATCH 6.3 026/431] block: fix blktrace debugfs entries leakage
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (24 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 025/431] md/raid1-10: submit write io directly if bitmap is not enabled Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 027/431] irqchip/loongson-eiointc: Fix irq affinity setting during resume Greg Kroah-Hartman
` (405 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Yu Kuai, Christoph Hellwig,
Jens Axboe, Sasha Levin
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit dd7de3704af9989b780693d51eaea49a665bd9c2 ]
Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
blk_unregister_queue") moved blk_trace_shutdown() from
blk_release_queue() to blk_unregister_queue(). This is safe if blktrace
is created through sysfs; however, there is a regression in a corner
case.
blktrace can still be enabled through ioctl after del_gendisk() if
the disk was opened before del_gendisk(), and if blktrace is not shut down
through ioctl before the disk is closed, debugfs entries will be leaked.
Fix this problem by shutting down blktrace in disk_release(). This is safe
because blk_trace_remove() is reentrant.
Fixes: 99d055b4fd4b ("block: remove per-disk debugfs files in blk_unregister_queue")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230610022003.2557284-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
block/genhd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c
index 7f874737af682..c5a35e1b462fa 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -25,8 +25,9 @@
#include <linux/pm_runtime.h>
#include <linux/badblocks.h>
#include <linux/part_stat.h>
-#include "blk-throttle.h"
+#include <linux/blktrace_api.h>
+#include "blk-throttle.h"
#include "blk.h"
#include "blk-mq-sched.h"
#include "blk-rq-qos.h"
@@ -1183,6 +1184,8 @@ static void disk_release(struct device *dev)
might_sleep();
WARN_ON_ONCE(disk_live(disk));
+ blk_trace_remove(disk->queue);
+
/*
* To undo the all initialization from blk_mq_init_allocated_queue in
* case of a probe failure where add_disk is never called we have to
--
2.39.2
* [PATCH 6.3 027/431] irqchip/loongson-eiointc: Fix irq affinity setting during resume
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (25 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 026/431] block: fix blktrace debugfs entries leakage Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 028/431] splice: dont call file_accessed in copy_splice_read Greg Kroah-Hartman
` (404 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, yangqiming, Jianmin Lv, Marc Zyngier,
Sasha Levin
From: Jianmin Lv <lvjianmin@loongson.cn>
[ Upstream commit fb07b8f83441febeb0daf199b5f18c6de9bbab03 ]
The hierarchy of PCH PIC, PCH PCI MSI and EIOINTC is as follows:
    PCH PIC ------->|
                    |---->EIOINTC
    PCH PCI MSI --->|
so the irq_data list of irq_desc for IRQs on PCH PIC and PCH PCI MSI
looks like this:
    irq_desc->irq_data(domain: PCH PIC)->parent_data(domain: EIOINTC)
    irq_desc->irq_data(domain: PCH PCI MSI)->parent_data(domain: EIOINTC)
In eiointc_resume(), the irq_data passed into eiointc_set_irq_affinity()
should be the one that belongs to the EIOINTC domain rather than to the
PCH PIC or PCH PCI MSI domain, so fix it.
Fixes: a90335c2dfb4 ("irqchip/loongson-eiointc: Add suspend/resume support")
Reported-by: yangqiming <yangqiming@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614115936.5950-6-lvjianmin@loongson.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/irqchip/irq-loongson-eiointc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-loongson-eiointc.c b/drivers/irqchip/irq-loongson-eiointc.c
index 90181c42840b4..873a326ed6cbc 100644
--- a/drivers/irqchip/irq-loongson-eiointc.c
+++ b/drivers/irqchip/irq-loongson-eiointc.c
@@ -317,7 +317,7 @@ static void eiointc_resume(void)
desc = irq_resolve_mapping(eiointc_priv[i]->eiointc_domain, j);
if (desc && desc->handle_irq && desc->handle_irq != handle_bad_irq) {
raw_spin_lock(&desc->lock);
- irq_data = &desc->irq_data;
+ irq_data = irq_domain_get_irq_data(eiointc_priv[i]->eiointc_domain, irq_desc_get_irq(desc));
eiointc_set_irq_affinity(irq_data, irq_data->common->affinity, 0);
raw_spin_unlock(&desc->lock);
}
--
2.39.2
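
The difference between the top-level irq_data and the one belonging to a parent domain can be pictured with a small userspace model. The types and the function below are hypothetical; the sketch assumes that the lookup amounts to following parent_data links until the requested domain is found, which is the behaviour the commit message relies on, and it is not the kernel's implementation.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct irq_domain and struct irq_data. */
struct domain_model {
	const char *name;
};

struct irq_data_model {
	struct domain_model *domain;
	struct irq_data_model *parent_data;
};

/*
 * Walk from the top-level irq_data towards the root of the hierarchy and
 * return the entry that belongs to the wanted domain, or NULL if none does.
 */
static struct irq_data_model *
find_data_for_domain(struct irq_data_model *top,
		     const struct domain_model *wanted)
{
	struct irq_data_model *d;

	for (d = top; d; d = d->parent_data)
		if (d->domain == wanted)
			return d;
	return NULL;
}
```

In this model, taking the top-level entry directly (as the old code did with &desc->irq_data) yields the PCH PIC entry, whereas the walk yields the EIOINTC entry that the affinity setter expects.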
* [PATCH 6.3 028/431] splice: dont call file_accessed in copy_splice_read
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (26 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 027/431] irqchip/loongson-eiointc: Fix irq affinity setting during resume Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 029/431] irqchip/stm32-exti: Fix warning on initialized field overwritten Greg Kroah-Hartman
` (403 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Christoph Hellwig,
Johannes Thumshirn, Christian Brauner, David Howells, Jens Axboe,
Sasha Levin
From: Christoph Hellwig <hch@lst.de>
[ Upstream commit 0b24be4691c9e6ea13ca70050d42a9f9032fa788 ]
copy_splice_read calls into ->read_iter to read the data, which already
calls file_accessed.
Fixes: 33b3b041543e ("splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230614140341.521331-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/splice.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/splice.c b/fs/splice.c
index 2c3dec2b6dfaf..5eca589fe8479 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -338,7 +338,6 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
reclaim -= ret;
remain = ret;
*ppos = kiocb.ki_pos;
- file_accessed(in);
} else if (ret < 0) {
/*
* callers of ->splice_read() expect -EAGAIN on
--
2.39.2
* [PATCH 6.3 029/431] irqchip/stm32-exti: Fix warning on initialized field overwritten
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (27 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 028/431] splice: dont call file_accessed in copy_splice_read Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 030/431] irqchip/jcore-aic: Fix missing allocation of IRQ descriptors Greg Kroah-Hartman
` (402 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Antonio Borneo, Marc Zyngier,
Sasha Levin
From: Antonio Borneo <antonio.borneo@foss.st.com>
[ Upstream commit 48f31e496488a25f443c0df52464da446fb1d10c ]
While compiling with W=1, both gcc and clang complain about a
tricky way to initialize an array: filling it with a non-zero
value and then overriding some of the array elements.
In this case the override is intentional, so just disable the
specific warning for only this part of the code.
Note: the flag "-Woverride-init" is recognized by both compilers,
but the warning message from clang reports "-Winitializer-overrides".
The clang documentation clarifies that the two flags are synonyms, so use
only the flag name common to both compilers here.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Fixes: c297493336b7 ("irqchip/stm32-exti: Simplify irq description table")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230601155614.34490-1-antonio.borneo@foss.st.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/irqchip/irq-stm32-exti.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/irqchip/irq-stm32-exti.c b/drivers/irqchip/irq-stm32-exti.c
index 6a3f7498ea8ea..8bbb2b114636c 100644
--- a/drivers/irqchip/irq-stm32-exti.c
+++ b/drivers/irqchip/irq-stm32-exti.c
@@ -173,6 +173,16 @@ static struct irq_chip stm32_exti_h_chip_direct;
#define EXTI_INVALID_IRQ U8_MAX
#define STM32MP1_DESC_IRQ_SIZE (ARRAY_SIZE(stm32mp1_exti_banks) * IRQS_PER_BANK)
+/*
+ * Use some intentionally tricky logic here to initialize the whole array to
+ * EXTI_INVALID_IRQ, but then override certain fields, requiring us to indicate
+ * that we "know" that there are overrides in this structure, and we'll need to
+ * disable that warning from W=1 builds.
+ */
+__diag_push();
+__diag_ignore_all("-Woverride-init",
+ "logic to initialize all and then override some is OK");
+
static const u8 stm32mp1_desc_irq[] = {
/* default value */
[0 ... (STM32MP1_DESC_IRQ_SIZE - 1)] = EXTI_INVALID_IRQ,
@@ -266,6 +276,8 @@ static const u8 stm32mp13_desc_irq[] = {
[70] = 98,
};
+__diag_pop();
+
static const struct stm32_exti_drv_data stm32mp1_drv_data = {
.exti_banks = stm32mp1_exti_banks,
.bank_nr = ARRAY_SIZE(stm32mp1_exti_banks),
--
2.39.2
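
The pattern discussed in the patch above can be tried outside the kernel. The sketch below is a minimal example under assumptions: it uses plain diagnostic pragmas in place of the kernel's __diag_push()/__diag_ignore_all()/__diag_pop() macros, relies on the GNU range-designator extension accepted by gcc and clang, and the table contents and the names desc_irq and lookup_irq() are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_IRQ	UINT8_MAX
#define TABLE_SIZE	8

/* Silence the W=1 warning only around the intentionally overridden table. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverride-init"

/* Hypothetical table: every slot defaults to INVALID_IRQ, two are overridden. */
static const uint8_t desc_irq[TABLE_SIZE] = {
	[0 ... (TABLE_SIZE - 1)] = INVALID_IRQ,	/* default value */
	[2] = 40,				/* intentional overrides */
	[5] = 41,
};

#pragma GCC diagnostic pop

/* Return the mapped IRQ for a line, or -1 if the line has no mapping. */
static int lookup_irq(size_t line)
{
	if (line >= TABLE_SIZE || desc_irq[line] == INVALID_IRQ)
		return -1;
	return desc_irq[line];
}
```

Keeping the push/pop pair tight around the table means the warning stays active for the rest of the file, which is the same scoping choice the patch makes.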
* [PATCH 6.3 030/431] irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (28 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 029/431] irqchip/stm32-exti: Fix warning on initialized field overwritten Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 031/431] svcrdma: Prevent page release when nothing was received Greg Kroah-Hartman
` (401 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, John Paul Adrian Glaubitz,
Rob Landley, Marc Zyngier, Sasha Levin
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
[ Upstream commit 4848229494a323eeaab62eee5574ef9f7de80374 ]
The initialization function for the J-Core AIC, aic_irq_of_init(), is
currently missing the call to irq_alloc_descs(), which allocates and
initializes all the IRQ descriptors. Add the missing function call and
return the error code from irq_alloc_descs() in case the allocation
fails.
Fixes: 981b58f66cfc ("irqchip/jcore-aic: Add J-Core AIC driver")
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230510163343.43090-1-glaubitz@physik.fu-berlin.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/irqchip/irq-jcore-aic.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/irqchip/irq-jcore-aic.c b/drivers/irqchip/irq-jcore-aic.c
index 5f47d8ee4ae39..b9dcc8e78c750 100644
--- a/drivers/irqchip/irq-jcore-aic.c
+++ b/drivers/irqchip/irq-jcore-aic.c
@@ -68,6 +68,7 @@ static int __init aic_irq_of_init(struct device_node *node,
unsigned min_irq = JCORE_AIC2_MIN_HWIRQ;
unsigned dom_sz = JCORE_AIC_MAX_HWIRQ+1;
struct irq_domain *domain;
+ int ret;
pr_info("Initializing J-Core AIC\n");
@@ -100,6 +101,12 @@ static int __init aic_irq_of_init(struct device_node *node,
jcore_aic.irq_unmask = noop;
jcore_aic.name = "AIC";
+ ret = irq_alloc_descs(-1, min_irq, dom_sz - min_irq,
+ of_node_to_nid(node));
+
+ if (ret < 0)
+ return ret;
+
domain = irq_domain_add_legacy(node, dom_sz - min_irq, min_irq, min_irq,
&jcore_aic_irqdomain_ops,
&jcore_aic);
--
2.39.2
* [PATCH 6.3 031/431] svcrdma: Prevent page release when nothing was received
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (29 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 030/431] irqchip/jcore-aic: Fix missing allocation of IRQ descriptors Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 032/431] erofs: fix compact 4B support for 16k block size Greg Kroah-Hartman
` (400 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Jeff Layton, Chuck Lever,
Sasha Levin
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit baf6d18b116b7dc84ed5e212c3a89f17cdc3f28c ]
I noticed that svc_rqst_release_pages() was still unnecessarily
releasing a page when svc_rdma_recvfrom() returns zero.
Fixes: a53d5cb0646a ("svcrdma: Avoid releasing a page in svc_xprt_release()")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index a22fe7587fa6f..70207d8a318a4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -796,6 +796,12 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
struct svc_rdma_recv_ctxt *ctxt;
int ret;
+ /* Prevent svc_xprt_release() from releasing pages in rq_pages
+ * when returning 0 or an error.
+ */
+ rqstp->rq_respages = rqstp->rq_pages;
+ rqstp->rq_next_page = rqstp->rq_respages;
+
rqstp->rq_xprt_ctxt = NULL;
ctxt = NULL;
@@ -819,12 +825,6 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
DMA_FROM_DEVICE);
svc_rdma_build_arg_xdr(rqstp, ctxt);
- /* Prevent svc_xprt_release from releasing pages in rq_pages
- * if we return 0 or an error.
- */
- rqstp->rq_respages = rqstp->rq_pages;
- rqstp->rq_next_page = rqstp->rq_respages;
-
ret = svc_rdma_xdr_decode_req(&rqstp->rq_arg, ctxt);
if (ret < 0)
goto out_err;
--
2.39.2
* [PATCH 6.3 032/431] erofs: fix compact 4B support for 16k block size
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (30 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 031/431] svcrdma: Prevent page release when nothing was received Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 033/431] posix-timers: Prevent RT livelock in itimer_delete() Greg Kroah-Hartman
` (399 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Gao Xiang, Sasha Levin
From: Gao Xiang <hsiangkao@linux.alibaba.com>
[ Upstream commit 001b8ccd0650727e54ec16ef72bf1b8eeab7168e ]
In compact 4B, two adjacent lclusters are packed together as a unit to
form on-disk indexes for effective random access, as below:
(amortized = 4, vcnt = 2)
       _____________________________________________
      |___@_____ encoded bits __________|_ blkaddr _|
      0        .                                    amortized * vcnt = 8
      .             .
      .                  .              amortized * vcnt - 4 = 4
      .                        .
      .____________________________.
      |_type (2 bits)_|_clusterofs_|
Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs,
since each lcluster can get 16 bits for its type and clusterofs, the
maximum supported lclustersize for compact 4B format is 16k (14 bits).
Fix this to enable compact 4B format for 16k lclusters (blocks), which
is tested on an arm64 server with 16k page size.
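The bit budget described above can be checked with a few lines of arithmetic. This is a userspace sketch only: the constant and function names are made up for illustration and do not appear in fs/erofs/zmap.c.

```c
#include <stdint.h>

/* Illustrative constants read off the diagram above (names are hypothetical). */
enum {
	AMORTIZED_BYTES = 4,	/* on-disk bytes per lcluster in compact 4B */
	VCNT            = 2,	/* lclusters packed into one unit */
	BLKADDR_BYTES   = 4,	/* trailing blkaddr field of each pack */
	TYPE_BITS       = 2,	/* lcluster type */
};

/* Bits left for clusterofs once the type bits are taken out of each lcluster's share. */
static unsigned int compact4b_clusterofs_bits(void)
{
	unsigned int encoded_bits = (AMORTIZED_BYTES * VCNT - BLKADDR_BYTES) * 8;

	return encoded_bits / VCNT - TYPE_BITS;
}

/* Largest lcluster size (in bytes) whose offsets still fit into clusterofs. */
static uint32_t compact4b_max_lclustersize(void)
{
	return UINT32_C(1) << compact4b_clusterofs_bits();
}
```

With these numbers each lcluster gets 16 bits, 14 of which remain for clusterofs, which is where the "lclusterbits <= 14" bound in the patch comes from.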
Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/erofs/zmap.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index b5f4086537548..322f110b3c8f4 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -148,7 +148,7 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
u8 *in, type;
bool big_pcluster;
- if (1 << amortizedshift == 4)
+ if (1 << amortizedshift == 4 && lclusterbits <= 14)
vcnt = 2;
else if (1 << amortizedshift == 2 && lclusterbits == 12)
vcnt = 16;
@@ -250,7 +250,6 @@ static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
{
struct inode *const inode = m->inode;
struct erofs_inode *const vi = EROFS_I(inode);
- const unsigned int lclusterbits = vi->z_logical_clusterbits;
const erofs_off_t ebase = sizeof(struct z_erofs_map_header) +
ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, 8);
const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
@@ -258,9 +257,6 @@ static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
unsigned int amortizedshift;
erofs_off_t pos;
- if (lclusterbits != 12)
- return -EOPNOTSUPP;
-
if (lcn >= totalidx)
return -EINVAL;
--
2.39.2
* [PATCH 6.3 033/431] posix-timers: Prevent RT livelock in itimer_delete()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (31 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 032/431] erofs: fix compact 4B support for 16k block size Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 034/431] tick/rcu: Fix bogus ratelimit condition Greg Kroah-Hartman
` (398 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Thomas Gleixner, Frederic Weisbecker,
Sasha Levin
From: Thomas Gleixner <tglx@linutronix.de>
[ Upstream commit 9d9e522010eb5685d8b53e8a24320653d9d4cbbf ]
itimer_delete() has a retry loop when the timer is concurrently expired. On
non-RT kernels this just spin-waits until the timer callback has completed,
except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK
enabled.
In that case and on RT kernels the exiting task could livelock when
preempting the task which does the timer delivery.
Replace spin_unlock() with an invocation of timer_wait_running() to handle
it the same way as the other retry loops in the posix timer code.
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87v8g7c50d.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/time/posix-timers.c | 43 +++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 8 deletions(-)
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 808a247205a9a..ed3c4a9543982 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1037,27 +1037,52 @@ SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
}
/*
- * return timer owned by the process, used by exit_itimers
+ * Delete a timer if it is armed, remove it from the hash and schedule it
+ * for RCU freeing.
*/
static void itimer_delete(struct k_itimer *timer)
{
-retry_delete:
- spin_lock_irq(&timer->it_lock);
+ unsigned long flags;
+
+ /*
+ * irqsave is required to make timer_wait_running() work.
+ */
+ spin_lock_irqsave(&timer->it_lock, flags);
+retry_delete:
+ /*
+ * Even if the timer is not longer accessible from other tasks
+ * it still might be armed and queued in the underlying timer
+ * mechanism. Worse, that timer mechanism might run the expiry
+ * function concurrently.
+ */
if (timer_delete_hook(timer) == TIMER_RETRY) {
- spin_unlock_irq(&timer->it_lock);
+ /*
+ * Timer is expired concurrently, prevent livelocks
+ * and pointless spinning on RT.
+ *
+ * timer_wait_running() drops timer::it_lock, which opens
+ * the possibility for another task to delete the timer.
+ *
+ * That's not possible here because this is invoked from
+ * do_exit() only for the last thread of the thread group.
+ * So no other task can access and delete that timer.
+ */
+ if (WARN_ON_ONCE(timer_wait_running(timer, &flags) != timer))
+ return;
+
goto retry_delete;
}
list_del(&timer->list);
- spin_unlock_irq(&timer->it_lock);
+ spin_unlock_irqrestore(&timer->it_lock, flags);
release_posix_timer(timer, IT_ID_SET);
}
/*
- * This is called by do_exit or de_thread, only when nobody else can
- * modify the signal->posix_timers list. Yet we need sighand->siglock
- * to prevent the race with /proc/pid/timers.
+ * Invoked from do_exit() when the last thread of a thread group exits.
+ * At that point no other task can access the timers of the dying
+ * task anymore.
*/
void exit_itimers(struct task_struct *tsk)
{
@@ -1067,10 +1092,12 @@ void exit_itimers(struct task_struct *tsk)
if (list_empty(&tsk->signal->posix_timers))
return;
+ /* Protect against concurrent read via /proc/$PID/timers */
spin_lock_irq(&tsk->sighand->siglock);
list_replace_init(&tsk->signal->posix_timers, &timers);
spin_unlock_irq(&tsk->sighand->siglock);
+ /* The timers are not longer accessible via tsk::signal */
while (!list_empty(&timers)) {
tmr = list_first_entry(&timers, struct k_itimer, list);
itimer_delete(tmr);
--
2.39.2
* [PATCH 6.3 034/431] tick/rcu: Fix bogus ratelimit condition
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (32 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 033/431] posix-timers: Prevent RT livelock in itimer_delete() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 035/431] tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode() Greg Kroah-Hartman
` (397 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable; +Cc: Greg Kroah-Hartman, patches, Wen Yang, Thomas Gleixner,
Sasha Levin
From: Wen Yang <wenyang.linux@foxmail.com>
[ Upstream commit a7e282c77785c7eabf98836431b1f029481085ad ]
The ratelimit logic in report_idle_softirq() is broken because the
exit condition is always true:
static int ratelimit;
if (ratelimit < 10)
return false; ---> always returns here
ratelimit++; ---> no chance to run
Make it check for >= 10 instead.
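For illustration, the two shapes of the check can be compared in a small userspace sketch. The function names and the REPORT_LIMIT macro are hypothetical, and the counter is passed in explicitly instead of being a function-local static so that both variants can be exercised side by side.

```c
#include <stdbool.h>

#define REPORT_LIMIT 10

/*
 * Pre-fix shape: the early return fires while the counter is still below
 * the limit, so the counter never advances and nothing is ever reported.
 */
static bool report_broken(int *ratelimit)
{
	if (*ratelimit < REPORT_LIMIT)
		return false;
	(*ratelimit)++;
	return true;
}

/* Post-fix shape: report until the limit is reached, then stay quiet. */
static bool report_fixed(int *ratelimit)
{
	if (*ratelimit >= REPORT_LIMIT)
		return false;
	(*ratelimit)++;
	return true;
}

/* Count how many of n consecutive calls would emit a report. */
static int count_reports(bool (*fn)(int *), int n)
{
	int ratelimit = 0, reports = 0;

	for (int i = 0; i < n; i++)
		if (fn(&ratelimit))
			reports++;
	return reports;
}
```

Calling count_reports() with each variant shows the difference: the pre-fix shape never reports, while the fixed one stops after REPORT_LIMIT reports.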
Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/tencent_5AAA3EEAB42095C9B7740BE62FBF9A67E007@qq.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/time/tick-sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d6fb6a676bbbb..1ad89eec2a55f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1046,7 +1046,7 @@ static bool report_idle_softirq(void)
return false;
}
- if (ratelimit < 10)
+ if (ratelimit >= 10)
return false;
/* On RT, softirqs handling may be waiting on some lock */
--
2.39.2
* [PATCH 6.3 035/431] tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode().
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (33 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 034/431] tick/rcu: Fix bogus ratelimit condition Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 036/431] btrfs: make btrfs_split_bio work on struct btrfs_bio Greg Kroah-Hartman
` (396 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Sebastian Andrzej Siewior,
Thomas Gleixner, Mukesh Ojha, Steven Rostedt (Google),
Sasha Levin
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Upstream commit 2951580ba6adb082bb6b7154a5ecb24e7c1f7569 ]
The trace output for the HRTIMER_MODE_.*_HARD modes is seen as a number
since these modes are not decoded. The author was not aware of the fancy
decoding function which makes life easier.
Extend decode_hrtimer_mode() with the additional HRTIMER_MODE_.*_HARD
modes.
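A rough userspace analogue of such a symbolic decode table is sketched here. The flag values are assumed to match enum hrtimer_mode in include/linux/hrtimer.h, the table is abbreviated, and mode_names and decode_mode() are hypothetical names; the kernel macro relies on __print_symbolic() rather than a hand-written loop.

```c
#include <stddef.h>

/* Flag values assumed to mirror enum hrtimer_mode. */
enum {
	MODE_ABS    = 0x00,
	MODE_REL    = 0x01,
	MODE_PINNED = 0x02,
	MODE_SOFT   = 0x04,
	MODE_HARD   = 0x08,
};

struct mode_name {
	unsigned int mode;
	const char *name;
};

/* Abbreviated table: the *_HARD rows are the kind of entries the patch adds. */
static const struct mode_name mode_names[] = {
	{ MODE_ABS,                           "ABS" },
	{ MODE_REL,                           "REL" },
	{ MODE_ABS | MODE_SOFT,               "ABS|SOFT" },
	{ MODE_REL | MODE_SOFT,               "REL|SOFT" },
	{ MODE_ABS | MODE_HARD,               "ABS|HARD" },
	{ MODE_REL | MODE_HARD,               "REL|HARD" },
	{ MODE_ABS | MODE_PINNED | MODE_HARD, "ABS|PINNED|HARD" },
	{ MODE_REL | MODE_PINNED | MODE_HARD, "REL|PINNED|HARD" },
};

/* Return the symbolic name, or NULL so the caller can print the raw number. */
static const char *decode_mode(unsigned int mode)
{
	for (size_t i = 0; i < sizeof(mode_names) / sizeof(mode_names[0]); i++)
		if (mode_names[i].mode == mode)
			return mode_names[i].name;
	return NULL;
}
```

A mode without a table row falls through to NULL, which corresponds to the numeric output the commit message describes.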
Fixes: ae6683d815895 ("hrtimer: Introduce HARD expiry mode")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230418143854.8vHWQKLM@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/trace/events/timer.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 3e8619c72f774..b4bc2828fa09f 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -158,7 +158,11 @@ DEFINE_EVENT(timer_class, timer_cancel,
{ HRTIMER_MODE_ABS_SOFT, "ABS|SOFT" }, \
{ HRTIMER_MODE_REL_SOFT, "REL|SOFT" }, \
{ HRTIMER_MODE_ABS_PINNED_SOFT, "ABS|PINNED|SOFT" }, \
- { HRTIMER_MODE_REL_PINNED_SOFT, "REL|PINNED|SOFT" })
+ { HRTIMER_MODE_REL_PINNED_SOFT, "REL|PINNED|SOFT" }, \
+ { HRTIMER_MODE_ABS_HARD, "ABS|HARD" }, \
+ { HRTIMER_MODE_REL_HARD, "REL|HARD" }, \
+ { HRTIMER_MODE_ABS_PINNED_HARD, "ABS|PINNED|HARD" }, \
+ { HRTIMER_MODE_REL_PINNED_HARD, "REL|PINNED|HARD" })
/**
* hrtimer_init - called when the hrtimer is initialized
--
2.39.2
* [PATCH 6.3 036/431] btrfs: make btrfs_split_bio work on struct btrfs_bio
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (34 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 035/431] tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 037/431] btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split Greg Kroah-Hartman
` (395 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Anand Jain, Johannes Thumshirn,
Qu Wenruo, Christoph Hellwig, David Sterba, Sasha Levin
From: Christoph Hellwig <hch@lst.de>
[ Upstream commit 2cef0c79bb81d8bae1dbc45195771a824ca45e76 ]
btrfs_split_bio expects a btrfs_bio as argument and always allocates one.
Type both the orig_bio argument and the return value as struct btrfs_bio
to improve type safety.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: c731cd0b6d25 ("btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/bio.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index ada899613486a..67e5156f940d3 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -59,30 +59,31 @@ struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
return bio;
}
-static struct bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
- struct bio *orig, u64 map_length,
- bool use_append)
+static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
+ struct btrfs_bio *orig_bbio,
+ u64 map_length, bool use_append)
{
- struct btrfs_bio *orig_bbio = btrfs_bio(orig);
+ struct btrfs_bio *bbio;
struct bio *bio;
if (use_append) {
unsigned int nr_segs;
- bio = bio_split_rw(orig, &fs_info->limits, &nr_segs,
+ bio = bio_split_rw(&orig_bbio->bio, &fs_info->limits, &nr_segs,
&btrfs_clone_bioset, map_length);
} else {
- bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS,
- &btrfs_clone_bioset);
+ bio = bio_split(&orig_bbio->bio, map_length >> SECTOR_SHIFT,
+ GFP_NOFS, &btrfs_clone_bioset);
}
- btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio);
+ bbio = btrfs_bio(bio);
+ btrfs_bio_init(bbio, orig_bbio->inode, NULL, orig_bbio);
- btrfs_bio(bio)->file_offset = orig_bbio->file_offset;
- if (!(orig->bi_opf & REQ_BTRFS_ONE_ORDERED))
+ bbio->file_offset = orig_bbio->file_offset;
+ if (!(orig_bbio->bio.bi_opf & REQ_BTRFS_ONE_ORDERED))
orig_bbio->file_offset += map_length;
atomic_inc(&orig_bbio->pending_ios);
- return bio;
+ return bbio;
}
static void btrfs_orig_write_end_io(struct bio *bio);
@@ -631,8 +632,8 @@ static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
map_length = min(map_length, fs_info->max_zone_append_size);
if (map_length < length) {
- bio = btrfs_split_bio(fs_info, bio, map_length, use_append);
- bbio = btrfs_bio(bio);
+ bbio = btrfs_split_bio(fs_info, bbio, map_length, use_append);
+ bio = &bbio->bio;
}
/*
--
2.39.2
* [PATCH 6.3 037/431] btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (35 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 036/431] btrfs: make btrfs_split_bio work on struct btrfs_bio Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 038/431] clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe Greg Kroah-Hartman
` (394 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Johannes Thumshirn, Josef Bacik,
Christoph Hellwig, David Sterba, Sasha Levin
From: Christoph Hellwig <hch@lst.de>
[ Upstream commit c731cd0b6d255e4855a7cac9f276864032ab2387 ]
If a bio gets split, it needs to have a proper file_offset for checksum
validation and repair to work properly.
Based on feedback from Josef, commit 852eee62d31a ("btrfs: allow
btrfs_submit_bio to split bios") skipped this adjustment for ONE_ORDERED
bios. But if we actually ever need to split a ONE_ORDERED read bio, this
will lead to a wrong file offset in the repair code. Right now the only
user of the file_offset is logging of an error message so this is mostly
harmless, but the wrong offset might be more problematic for additional
users in the future.
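The offset bookkeeping can be pictured with a stripped-down sketch. struct sketch_bbio and sketch_split() are hypothetical stand-ins that keep only the file_offset field; they are not the btrfs data structures.

```c
#include <stdint.h>

/* Hypothetical stand-in holding the one field that matters here. */
struct sketch_bbio {
	uint64_t file_offset;
};

/*
 * Split map_length bytes off the front of orig: the new piece keeps the
 * old offset, and orig advances past it. After the fix the advance is
 * unconditional, regardless of any ONE_ORDERED-style flag.
 */
static struct sketch_bbio sketch_split(struct sketch_bbio *orig,
				       uint64_t map_length)
{
	struct sketch_bbio split = { .file_offset = orig->file_offset };

	orig->file_offset += map_length;
	return split;
}
```

Without the advance, the remainder would still claim the offset of the piece that was split off, which is the wrong offset the repair code would then see.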
Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/bio.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 67e5156f940d3..4bb2c6f4ad0e7 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -79,8 +79,7 @@ static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
btrfs_bio_init(bbio, orig_bbio->inode, NULL, orig_bbio);
bbio->file_offset = orig_bbio->file_offset;
- if (!(orig_bbio->bio.bi_opf & REQ_BTRFS_ONE_ORDERED))
- orig_bbio->file_offset += map_length;
+ orig_bbio->file_offset += map_length;
atomic_inc(&orig_bbio->pending_ios);
return bbio;
--
2.39.2
* [PATCH 6.3 038/431] clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (36 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 037/431] btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 039/431] PM: domains: fix integer overflow issues in genpd_parse_state() Greg Kroah-Hartman
` (393 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Feng Mingxi, Dongliang Mu,
Michal Simek, Daniel Lezcano, Sasha Levin
From: Feng Mingxi <m202271825@hust.edu.cn>
[ Upstream commit 8b5bf64c89c7100c921bd807ba39b2eb003061ab ]
Smatch reports:
drivers/clocksource/timer-cadence-ttc.c:529 ttc_timer_probe()
warn: 'timer_baseaddr' from of_iomap() not released on lines: 498,508,516.
timer_baseaddr may not be released after use. Replace of_iomap() with
devm_of_iomap() and add clk_put() calls to clean up "clk_ce" and
"clk_cs" on the error paths.
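The goto-based unwinding used for the clocks follows a common kernel idiom, which can be sketched in userspace as follows. fake_clk_get(), fake_clk_put(), clk_refs and the fail_at parameter are hypothetical helpers for the demonstration; they only count outstanding references and do not model the clk API.

```c
#include <errno.h>

/* Outstanding references handed out by the fake clock helpers. */
static int clk_refs;

static int fake_clk_get(int fail)
{
	if (fail)
		return -ENOENT;
	clk_refs++;
	return 0;
}

static void fake_clk_put(void)
{
	clk_refs--;
}

/*
 * fail_at: 0 = success, 1 = first clock lookup fails,
 *          2 = second clock lookup fails, 3 = a later setup step fails.
 */
static int probe_sketch(int fail_at)
{
	int ret;

	ret = fake_clk_get(fail_at == 1);	/* plays the role of clk_cs */
	if (ret)
		return ret;

	ret = fake_clk_get(fail_at == 2);	/* plays the role of clk_ce */
	if (ret)
		goto put_clk_cs;

	if (fail_at == 3) {
		ret = -EINVAL;
		goto put_clk_ce;
	}

	return 0;	/* success: both references stay held */

put_clk_ce:
	fake_clk_put();
put_clk_cs:
	fake_clk_put();
	return ret;
}
```

Each label releases exactly what was acquired before the failing step, so every error path leaves the reference count where it started.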
Fixes: e932900a3279 ("arm: zynq: Use standard timer binding")
Fixes: 70504f311d4b ("clocksource/drivers/cadence_ttc: Convert init function to return error")
Signed-off-by: Feng Mingxi <m202271825@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230425065611.702917-1-m202271825@hust.edu.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/clocksource/timer-cadence-ttc.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/clocksource/timer-cadence-ttc.c b/drivers/clocksource/timer-cadence-ttc.c
index 4efd0cf3b602d..0d52e28fea4de 100644
--- a/drivers/clocksource/timer-cadence-ttc.c
+++ b/drivers/clocksource/timer-cadence-ttc.c
@@ -486,10 +486,10 @@ static int __init ttc_timer_probe(struct platform_device *pdev)
* and use it. Note that the event timer uses the interrupt and it's the
* 2nd TTC hence the irq_of_parse_and_map(,1)
*/
- timer_baseaddr = of_iomap(timer, 0);
- if (!timer_baseaddr) {
+ timer_baseaddr = devm_of_iomap(&pdev->dev, timer, 0, NULL);
+ if (IS_ERR(timer_baseaddr)) {
pr_err("ERROR: invalid timer base address\n");
- return -ENXIO;
+ return PTR_ERR(timer_baseaddr);
}
irq = irq_of_parse_and_map(timer, 1);
@@ -513,20 +513,27 @@ static int __init ttc_timer_probe(struct platform_device *pdev)
clk_ce = of_clk_get(timer, clksel);
if (IS_ERR(clk_ce)) {
pr_err("ERROR: timer input clock not found\n");
- return PTR_ERR(clk_ce);
+ ret = PTR_ERR(clk_ce);
+ goto put_clk_cs;
}
ret = ttc_setup_clocksource(clk_cs, timer_baseaddr, timer_width);
if (ret)
- return ret;
+ goto put_clk_ce;
ret = ttc_setup_clockevent(clk_ce, timer_baseaddr + 4, irq);
if (ret)
- return ret;
+ goto put_clk_ce;
pr_info("%pOFn #0 at %p, irq=%d\n", timer, timer_baseaddr, irq);
return 0;
+
+put_clk_ce:
+ clk_put(clk_ce);
+put_clk_cs:
+ clk_put(clk_cs);
+ return ret;
}
static const struct of_device_id ttc_timer_of_match[] = {
--
2.39.2
* [PATCH 6.3 039/431] PM: domains: fix integer overflow issues in genpd_parse_state()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (37 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 038/431] clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 040/431] perf/arm-cmn: Fix DTC reset Greg Kroah-Hartman
` (392 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Nikita Zhandarovich, Ulf Hansson,
Rafael J. Wysocki, Sasha Levin
From: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
[ Upstream commit e5d1c8722083f0332dcd3c85fa1273d85fb6bed8 ]
Currently, while calculating residency and latency values, the
right-hand side multiplications may overflow if the resulting values are
big enough. To prevent this, albeit unlikely, case, play it safe and
convert the right operands to the left ones' type, s64.
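The effect is easy to reproduce outside the kernel. The sketch below assumes a platform where int is 32 bits wide; us_to_ns_32bit() and us_to_ns_64bit() are hypothetical names that mirror the expression before and after the change.

```c
#include <stdint.h>

/*
 * Pre-fix shape: 1000 is an int and us is a 32-bit unsigned value, so the
 * product is computed in 32 bits and wraps modulo 2^32 before it is
 * widened to the 64-bit result type.
 */
static int64_t us_to_ns_32bit(uint32_t us)
{
	return 1000 * us;
}

/* Post-fix shape: the LL suffix forces a 64-bit multiplication. */
static int64_t us_to_ns_64bit(uint32_t us)
{
	return 1000LL * us;
}
```

Any input above roughly 4.29 seconds expressed in microseconds (4294967 us) is enough to make the two functions disagree.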
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 30f604283e05 ("PM / Domains: Allow domain power states to be read from DT")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/base/power/domain.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 32084e38b73d0..51b9d4eaab5ea 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2939,10 +2939,10 @@ static int genpd_parse_state(struct genpd_power_state *genpd_state,
err = of_property_read_u32(state_node, "min-residency-us", &residency);
if (!err)
- genpd_state->residency_ns = 1000 * residency;
+ genpd_state->residency_ns = 1000LL * residency;
- genpd_state->power_on_latency_ns = 1000 * exit_latency;
- genpd_state->power_off_latency_ns = 1000 * entry_latency;
+ genpd_state->power_on_latency_ns = 1000LL * exit_latency;
+ genpd_state->power_off_latency_ns = 1000LL * entry_latency;
genpd_state->fwnode = &state_node->fwnode;
return 0;
--
2.39.2
* [PATCH 6.3 040/431] perf/arm-cmn: Fix DTC reset
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (38 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 039/431] PM: domains: fix integer overflow issues in genpd_parse_state() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 041/431] x86/mm: Allow guest.enc_status_change_prepare() to fail Greg Kroah-Hartman
` (391 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Geoff Blake, Robin Murphy,
Will Deacon, Sasha Levin
From: Robin Murphy <robin.murphy@arm.com>
[ Upstream commit 71746c995cac92fcf6a65661b51211cf2009d7f0 ]
It turns out that my naive DTC reset logic fails to work as intended,
since, after checking with the hardware designers, the PMU actually
needs to be fully enabled in order to correctly clear any pending
overflows. Therefore, invert the sequence to start with turning on both
enables so that we can reliably get the DTCs into a known state, then
move to our normal counters-stopped state from there. Since all the
DTM counters have already been unpaired during the initial discovery
pass, we just need to additionally reset the cycle counters to ensure
that no other unexpected overflows occur during this period.
Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
Reported-by: Geoff Blake <blakgeof@amazon.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0ea4559261ea394f827c9aee5168c77a60aaee03.1684946389.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/perf/arm-cmn.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 44b719f39c3b3..4f86b7fd9823f 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1899,9 +1899,10 @@ static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id
if (dtc->irq < 0)
return dtc->irq;
- writel_relaxed(0, dtc->base + CMN_DT_PMCR);
+ writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
+ writel_relaxed(CMN_DT_PMCR_PMU_EN | CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
+ writeq_relaxed(0, dtc->base + CMN_DT_PMCCNTR);
writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR);
- writel_relaxed(CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
return 0;
}
@@ -1961,7 +1962,7 @@ static int arm_cmn_init_dtcs(struct arm_cmn *cmn)
dn->type = CMN_TYPE_CCLA;
}
- writel_relaxed(CMN_DT_DTC_CTL_DT_EN, cmn->dtc[0].base + CMN_DT_DTC_CTL);
+ arm_cmn_set_state(cmn, CMN_STATE_DISABLED);
return 0;
}
--
2.39.2
* [PATCH 6.3 041/431] x86/mm: Allow guest.enc_status_change_prepare() to fail
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (39 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 040/431] perf/arm-cmn: Fix DTC reset Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 042/431] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad() Greg Kroah-Hartman
` (390 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Kirill A. Shutemov, Dave Hansen,
Kuppuswamy Sathyanarayanan, Sasha Levin
From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Upstream commit 3f6819dd192ef4f0c568ec3e9d6d408b3fa1ad3d ]
TDX code is going to provide guest.enc_status_change_prepare() that is
able to fail. TDX will use the call to convert the GPA range from shared
to private. This operation can fail.
Add a way to return an error from the callback.
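The shape of the change, a hook that used to return void and now reports success or failure to its caller, can be sketched in userspace like this. struct guest_ops, prepare_noop(), prepare_failing() and set_memory_enc_sketch() are hypothetical miniatures, not the x86 structures.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical miniature of the hook table. */
struct guest_ops {
	bool (*enc_status_change_prepare)(unsigned long vaddr, int npages,
					  bool enc);
};

/* Default hook: nothing to prepare, so it always succeeds. */
static bool prepare_noop(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr;
	(void)npages;
	(void)enc;
	return true;
}

/* A hook that cannot complete its conversion, for demonstration only. */
static bool prepare_failing(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr;
	(void)npages;
	(void)enc;
	return false;
}

/* Caller side: a failed prepare step is turned into -EIO. */
static int set_memory_enc_sketch(const struct guest_ops *ops,
				 unsigned long addr, int numpages, bool enc)
{
	if (!ops->enc_status_change_prepare(addr, numpages, enc))
		return -EIO;

	/* The page attribute change itself would follow here. */
	return 0;
}
```

Existing hooks only need to return true to keep their behaviour, which is what the noop and AMD variants in the patch do.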
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/all/20230606095622.1939-2-kirill.shutemov%40linux.intel.com
Stable-dep-of: 195edce08b63 ("x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/include/asm/x86_init.h | 2 +-
arch/x86/kernel/x86_init.c | 2 +-
arch/x86/mm/mem_encrypt_amd.c | 4 +++-
arch/x86/mm/pat/set_memory.c | 3 ++-
4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index c1c8c581759d6..034e62838b284 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -150,7 +150,7 @@ struct x86_init_acpi {
* @enc_cache_flush_required Returns true if a cache flush is needed before changing page encryption status
*/
struct x86_guest {
- void (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
+ bool (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
bool (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
bool (*enc_tlb_flush_required)(bool enc);
bool (*enc_cache_flush_required)(void);
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 10622cf2b30f4..41e5b4cb898c3 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -130,7 +130,7 @@ struct x86_cpuinit_ops x86_cpuinit = {
static void default_nmi_init(void) { };
-static void enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { }
+static bool enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return true; }
static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
static bool enc_tlb_flush_required_noop(bool enc) { return false; }
static bool enc_cache_flush_required_noop(void) { return false; }
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 9c4d8dbcb1296..ff6c0462beee7 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -319,7 +319,7 @@ static void enc_dec_hypercall(unsigned long vaddr, int npages, bool enc)
#endif
}
-static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
+static bool amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
{
/*
* To maintain the security guarantees of SEV-SNP guests, make sure
@@ -327,6 +327,8 @@ static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool
*/
if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP) && !enc)
snp_set_memory_shared(vaddr, npages);
+
+ return true;
}
/* Return true unconditionally: return value doesn't matter for the SEV side */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 356758b7d4b47..6a167290a1fd1 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2151,7 +2151,8 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
cpa_flush(&cpa, x86_platform.guest.enc_cache_flush_required());
/* Notify hypervisor that we are about to set/clr encryption attribute. */
- x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
+ if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
+ return -EIO;
ret = __change_page_attr_set_clr(&cpa, 1);
--
2.39.2
* [PATCH 6.3 042/431] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (40 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 041/431] x86/mm: Allow guest.enc_status_change_prepare() to fail Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 043/431] drivers/perf: hisi: Dont migrate perf to the CPU going to teardown Greg Kroah-Hartman
` (389 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Kirill A. Shutemov, Dave Hansen,
Kuppuswamy Sathyanarayanan, Sasha Levin
From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Upstream commit 195edce08b63d293377f615f4f7f086715d2d212 ]
tl;dr: There is a race in the TDX private<=>shared conversion code
which could kill the TDX guest. Fix it by changing conversion
ordering to eliminate the window.
TDX hardware maintains metadata to track which pages are private and
shared. Additionally, TDX guests use the guest x86 page tables to
specify whether a given mapping is intended to be private or shared.
Bad things happen when the intent and metadata do not match.
So there are two things in play:
1. "the page" -- the physical TDX page metadata
2. "the mapping" -- the guest-controlled x86 page table intent
For instance, an unrecoverable exit to VMM occurs if a guest touches a
private mapping that points to a shared physical page.
In summary:
* Private mapping => Private Page == OK (obviously)
* Shared mapping => Shared Page == OK (obviously)
* Private mapping => Shared Page == BIG BOOM!
* Shared mapping => Private Page == OK-ish
(It will generate a recoverable #VE via handle_mmio())
Enter load_unaligned_zeropad(). It can touch memory that is adjacent but
otherwise unrelated to the memory it needs to touch. It will cause one
of those unrecoverable exits (aka. BIG BOOM) if it blunders into a
private mapping pointing to a shared page.
This is a problem when __set_memory_enc_pgtable() converts pages from
shared to private. It first changes the mapping and second modifies
the TDX page metadata. It's moving from:
* Shared mapping => Shared Page == OK
to:
* Private mapping => Shared Page == BIG BOOM!
This means that there is a window with a private mapping pointing to a
shared page where load_unaligned_zeropad() can strike.
Add a TDX handler for guest.enc_status_change_prepare(). This converts
the page from shared to private *before* the mapping becomes private. This
ensures that there is never a private mapping to a shared page.
Leave a guest.enc_status_change_finish() in place but only use it for
private=>shared conversions. This delays updating the TDX metadata
to mark the page shared until *after* the mapping has become shared.
This also ensures that there is never a private mapping to a shared page.
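The ordering argument can be checked with a small userspace model. This is
only a sketch, not kernel code: the toy_* names are hypothetical, and it
assumes the caller runs prepare, then the mapping change, then finish, as
__set_memory_enc_pgtable() appears to do. toy_convert_old() models the
previous behaviour, where the metadata was always updated last.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one page with a mapping and TDX metadata. */
struct toy_page {
	bool mapping_private;	/* guest page-table intent */
	bool page_private;	/* TDX page metadata */
};

/* The one fatal combination: private mapping => shared page. */
static bool toy_is_fatal(const struct toy_page *p)
{
	return p->mapping_private && !p->page_private;
}

/* Old ordering: change the mapping first, fix the metadata afterwards. */
static bool toy_convert_old(struct toy_page *p, bool enc)
{
	bool fatal;

	assert(p);
	p->mapping_private = enc;
	fatal = toy_is_fatal(p);	/* window between the two steps */
	p->page_private = enc;
	return fatal || toy_is_fatal(p);
}

/*
 * New ordering: prepare handles shared->private only, finish handles
 * private->shared only. Returns true if the fatal state was ever seen.
 */
static bool toy_convert(struct toy_page *p, bool enc)
{
	bool fatal = false;

	assert(p);
	if (enc)			/* prepare */
		p->page_private = true;
	fatal |= toy_is_fatal(p);
	p->mapping_private = enc;	/* mapping change */
	fatal |= toy_is_fatal(p);
	if (!enc)			/* finish */
		p->page_private = false;
	fatal |= toy_is_fatal(p);
	return fatal;
}
```

In this model the old ordering hits the fatal state on a shared=>private
conversion, while the new ordering only ever passes through the recoverable
"shared mapping => private page" state.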
[ dhansen: rewrite changelog ]
Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/all/20230606095622.1939-3-kirill.shutemov%40linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/coco/tdx/tdx.c | 51 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 055300e08fb38..3d191ec036fb7 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -840,6 +840,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
return true;
}
+static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
+ bool enc)
+{
+ /*
+ * Only handle shared->private conversion here.
+ * See the comment in tdx_early_init().
+ */
+ if (enc)
+ return tdx_enc_status_changed(vaddr, numpages, enc);
+ return true;
+}
+
+static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
+ bool enc)
+{
+ /*
+ * Only handle private->shared conversion here.
+ * See the comment in tdx_early_init().
+ */
+ if (!enc)
+ return tdx_enc_status_changed(vaddr, numpages, enc);
+ return true;
+}
+
void __init tdx_early_init(void)
{
u64 cc_mask;
@@ -867,9 +891,30 @@ void __init tdx_early_init(void)
*/
physical_mask &= cc_mask - 1;
- x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
- x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
- x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
+ /*
+ * The kernel mapping should match the TDX metadata for the page.
+ * load_unaligned_zeropad() can touch memory *adjacent* to that which is
+ * owned by the caller and can catch even _momentary_ mismatches. Bad
+ * things happen on mismatch:
+ *
+ * - Private mapping => Shared Page == Guest shutdown
+ * - Shared mapping => Private Page == Recoverable #VE
+ *
+ * guest.enc_status_change_prepare() converts the page from
+ * shared=>private before the mapping becomes private.
+ *
+ * guest.enc_status_change_finish() converts the page from
+ * private=>shared after the mapping becomes private.
+ *
+ * In both cases there is a temporary shared mapping to a private page,
+ * which can result in a #VE. But, there is never a private mapping to
+ * a shared page.
+ */
+ x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
+ x86_platform.guest.enc_status_change_finish = tdx_enc_status_change_finish;
+
+ x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
+ x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
pr_info("Guest detected\n");
}
--
2.39.2
* [PATCH 6.3 043/431] drivers/perf: hisi: Dont migrate perf to the CPU going to teardown
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (41 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 042/431] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 044/431] perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used Greg Kroah-Hartman
` (388 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Junhao He, Jonathan Cameron,
Yicong Yang, Mark Rutland, Will Deacon, Sasha Levin
From: Junhao He <hejunhao3@huawei.com>
[ Upstream commit 7a6a9f1c5a0a875a421db798d4b2ee022dc1ee1a ]
The driver needs to migrate the perf context if the CPU it is currently
using is going to teardown. By the time the cpuhp::teardown() callback is
called, cpu_online_mask has not been updated yet and still includes the CPU
going to teardown. In the current driver implementation we may migrate the
context to the CPU being torn down, which leads to the call trace below:
...
[ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
[ 368.113699][ T932] Call trace:
[ 368.116834][ T932] __switch_to+0x7c/0xbc
[ 368.120924][ T932] __schedule+0x338/0x6f0
[ 368.125098][ T932] schedule+0x50/0xe0
[ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
[ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
[ 368.144573][ T932] mutex_lock+0x50/0x60
[ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
[ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
[ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
[ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
[ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
[ 368.175099][ T932] kthread+0x108/0x13c
[ 368.179012][ T932] ret_from_fork+0x10/0x18
...
Use cpumask_any_but() to pick an online CPU other than the one going down,
which fixes this issue.
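The difference between the two selection strategies can be sketched in
userspace with a plain 64-bit mask. The toy_* helpers below are hypothetical
stand-ins, not the kernel cpumask API; they only mirror the convention that
a result greater than or equal to the CPU count means "no CPU found".

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64u

/* Stand-in for cpumask_first(): lowest set bit, or TOY_NR_CPUS if none. */
static unsigned int toy_first(uint64_t mask)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Stand-in for cpumask_any_but(): a set bit other than 'but'. */
static unsigned int toy_any_but(uint64_t mask, unsigned int but)
{
	assert(but < TOY_NR_CPUS);
	return toy_first(mask & ~(UINT64_C(1) << but));
}
```

With CPUs 0, 1 and 3 online and CPU 0 going down, toy_first() still picks
CPU 0, whereas toy_any_but() skips it and picks CPU 1.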
Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230608114326.27649-1-hejunhao3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index 6fee0b6e163bb..e10fc7cb9493a 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
pcie_pmu->on_cpu = -1;
/* Choose a new CPU from all online cpus. */
- target = cpumask_first(cpu_online_mask);
+ target = cpumask_any_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids) {
pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
return 0;
--
2.39.2
* [PATCH 6.3 044/431] perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (42 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 043/431] drivers/perf: hisi: Dont migrate perf to the CPU going to teardown Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 045/431] perf/arm_cspmu: Fix event attribute type Greg Kroah-Hartman
` (387 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Ilkka Koskinen, Will Deacon,
Sasha Levin
From: Ilkka Koskinen <ilkka@os.amperecomputing.com>
[ Upstream commit 225d757012e0afa673d8c862e6fb39ed2f429b4d ]
Don't try to set irq affinity if PMU doesn't have an overflow interrupt.
Fixes: e37dfd65731d ("perf: arm_cspmu: Add support for ARM CoreSight PMU driver")
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20230608203742.3503486-1-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/perf/arm_cspmu/arm_cspmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index e31302ab7e37c..c1b6f4bdb04af 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -1230,7 +1230,8 @@ static struct platform_driver arm_cspmu_driver = {
static void arm_cspmu_set_active_cpu(int cpu, struct arm_cspmu *cspmu)
{
cpumask_set_cpu(cpu, &cspmu->active_cpu);
- WARN_ON(irq_set_affinity(cspmu->irq, &cspmu->active_cpu));
+ if (cspmu->irq)
+ WARN_ON(irq_set_affinity(cspmu->irq, &cspmu->active_cpu));
}
static int arm_cspmu_cpu_online(unsigned int cpu, struct hlist_node *node)
--
2.39.2
* [PATCH 6.3 045/431] perf/arm_cspmu: Fix event attribute type
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (43 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 044/431] perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 046/431] APEI: GHES: correctly return NULL for ghes_get_devices() Greg Kroah-Hartman
` (386 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Suzuki K Poulose, Robin Murphy,
Will Deacon, Sasha Levin, Ilkka Koskinen
From: Robin Murphy <robin.murphy@arm.com>
[ Upstream commit 71e0cb32d5fc61468e83ed962379af71bba8237e ]
ARM_CSPMU_EVENT_ATTR() defines a struct perf_pmu_events_attr, so
arm_cspmu_sysfs_event_show() should not be interpreting it as struct
dev_ext_attribute.
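As a rough userspace illustration of the pattern, the sketch below recovers
the enclosing structure from an embedded member using the type that was
actually defined. All toy_* names are hypothetical; the struct layout is an
assumption loosely modelled on struct perf_pmu_events_attr, not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified container_of(): step back from a member to its container. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_attr {
	const char *name;
};

/* Hypothetical stand-in for the attribute type the macro defines. */
struct toy_events_attr {
	struct toy_attr attr;
	unsigned long long id;
	const char *event_str;
};

/* Show callback: must use the same container type as the definition. */
static int toy_event_show(struct toy_attr *attr, char *buf, size_t len)
{
	struct toy_events_attr *pmu_attr;

	assert(attr && buf);
	pmu_attr = toy_container_of(attr, struct toy_events_attr, attr);
	return snprintf(buf, len, "event=0x%llx\n", pmu_attr->id);
}
```

Interpreting the same pointer as a different container type would read
whatever happens to sit at that offset, which is the mismatch this patch
removes.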
Fixes: e37dfd65731d ("perf: arm_cspmu: Add support for ARM CoreSight PMU driver")
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/27c0804af64007b2400abbc40278f642ee6a0a29.1685983270.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/perf/arm_cspmu/arm_cspmu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index c1b6f4bdb04af..35d2fe33a7b6f 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -189,10 +189,10 @@ static inline bool use_64b_counter_reg(const struct arm_cspmu *cspmu)
ssize_t arm_cspmu_sysfs_event_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- struct dev_ext_attribute *eattr =
- container_of(attr, struct dev_ext_attribute, attr);
- return sysfs_emit(buf, "event=0x%llx\n",
- (unsigned long long)eattr->var);
+ struct perf_pmu_events_attr *pmu_attr;
+
+ pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+ return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
}
EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_event_show);
--
2.39.2
* [PATCH 6.3 046/431] APEI: GHES: correctly return NULL for ghes_get_devices()
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (44 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 045/431] perf/arm_cspmu: Fix event attribute type Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 047/431] powercap: RAPL: fix invalid initialization for pl4_supported field Greg Kroah-Hartman
` (385 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Li Yang, Tony Luck,
Rafael J. Wysocki, Sasha Levin
From: Li Yang <leoyang.li@nxp.com>
[ Upstream commit 9368aa1882ac7178adcd936cee5f0899dbf76dc4 ]
Since 315bada690e0 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor-specific EDAC drivers do not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present. Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.
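The intended decision can be sketched in userspace as follows. This is a
hedged model only: the toy_* names and the boolean parameters are
hypothetical, and the x86 branch is an assumption about the surrounding
code (platform allow-list plus a force-enable override), which this patch
does not change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list head, in the style of struct list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static bool toy_list_empty(const struct toy_list *head)
{
	return head->next == head;
}

/*
 * Returns the device list when GHES should be preferred, or NULL when
 * a vendor-specific driver is free to probe.
 */
static struct toy_list *toy_get_devices(struct toy_list *devs, bool is_x86,
					bool platform_listed, bool force)
{
	assert(devs);
	if (is_x86) {
		if (!platform_listed && !force)
			return NULL;
	} else if (toy_list_empty(devs)) {
		/* New case: no GHES device present on a non-x86 system. */
		return NULL;
	}
	return devs;
}
```

A caller that treats a non-NULL return as "GHES is in charge" now sees NULL
on a non-x86 system with no GHES devices and can continue probing.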
Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/acpi/apei/ghes.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 34ad071a64e96..4382fe13ee3e4 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1544,6 +1544,8 @@ struct list_head *ghes_get_devices(void)
pr_warn_once("Force-loading ghes_edac on an unsupported platform. You're on your own!\n");
}
+ } else if (list_empty(&ghes_devs)) {
+ return NULL;
}
return &ghes_devs;
--
2.39.2
* [PATCH 6.3 047/431] powercap: RAPL: fix invalid initialization for pl4_supported field
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (45 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 046/431] APEI: GHES: correctly return NULL for ghes_get_devices() Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 048/431] powercap: RAPL: Fix CONFIG_IOSF_MBI dependency Greg Kroah-Hartman
` (384 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Dave Hansen, Sumeet Pawnikar,
Rafael J. Wysocki, Sasha Levin
From: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
[ Upstream commit d05b5e0baf424c8c4b4709ac11f66ab726c8deaf ]
The current initialization of the struct x86_cpu_id entries in
pl4_support_ids[] is partial and wrong. It initializes the
"stepping" field with "X86_FEATURE_ANY" instead of the "feature" field.
Use the X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing
each field of struct x86_cpu_id for the pl4_supported list of CPUs.
The X86_MATCH_INTEL_FAM6_MODEL macro internally uses another macro,
X86_MATCH_VENDOR_FAM_MODEL_FEATURE, for x86 CPU matching with
appropriately initialized values.
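The pitfall with positional initializers can be reproduced in userspace.
The struct below is a hypothetical stand-in whose field order is assumed to
resemble struct x86_cpu_id, and TOY_FEATURE_ANY is a made-up nonzero
placeholder chosen only to make the misplaced value visible.

```c
#include <assert.h>

/* Hypothetical stand-in; field order is an assumption for illustration. */
struct toy_cpu_id {
	unsigned short vendor;
	unsigned short family;
	unsigned short model;
	unsigned short steppings;
	unsigned short feature;
	unsigned long driver_data;
};

#define TOY_FEATURE_ANY 0x7u	/* placeholder value, not the kernel's */

/* Positional form: the fourth value silently lands in 'steppings'. */
static struct toy_cpu_id toy_positional(unsigned short model)
{
	struct toy_cpu_id id = { 0, 6, model, TOY_FEATURE_ANY };

	assert(model != 0);	/* an all-zero entry terminates a match table */
	return id;
}

/* Designated form, as a match macro would expand: each field is named. */
static struct toy_cpu_id toy_designated(unsigned short model)
{
	struct toy_cpu_id id = {
		.vendor = 0,
		.family = 6,
		.model = model,
		.steppings = 0,
		.feature = TOY_FEATURE_ANY,
	};

	assert(model != 0);
	return id;
}
```

Naming every field, whether by hand or through a macro, keeps the entry
correct even if the struct gains or reorders members later.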
Reported-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com
Fixes: eb52bc2ae5b8 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC")
Fixes: b08b95cf30f5 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P")
Fixes: 515755906921 ("powercap: RAPL: Add Power Limit4 support for RaptorLake")
Fixes: 1cc5b9a411e4 ("powercap: Add Power Limit4 support for Alder Lake SoC")
Fixes: 8365a898fe53 ("powercap: Add Power Limit4 support")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/powercap/intel_rapl_msr.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c
index a27673706c3d6..7be7561f5ad64 100644
--- a/drivers/powercap/intel_rapl_msr.c
+++ b/drivers/powercap/intel_rapl_msr.c
@@ -137,14 +137,14 @@ static int rapl_msr_write_raw(int cpu, struct reg_action *ra)
/* List of verified CPUs. */
static const struct x86_cpu_id pl4_support_ids[] = {
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_TIGERLAKE_L, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_L, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_N, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE_P, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_METEORLAKE, X86_FEATURE_ANY },
- { X86_VENDOR_INTEL, 6, INTEL_FAM6_METEORLAKE_L, X86_FEATURE_ANY },
+ X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_N, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE, NULL),
+ X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE_L, NULL),
{}
};
--
2.39.2
* [PATCH 6.3 048/431] powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
2023-07-09 11:09 [PATCH 6.3 000/431] 6.3.13-rc1 review Greg Kroah-Hartman
` (46 preceding siblings ...)
2023-07-09 11:09 ` [PATCH 6.3 047/431] powercap: RAPL: fix invalid initialization for pl4_supported field Greg Kroah-Hartman
@ 2023-07-09 11:09 ` Greg Kroah-Hartman
2023-07-09 11:09 ` [PATCH 6.3 049/431] PM: domains: Move the verification of in-params from genpd_add_device() Greg Kroah-Hartman
` (383 subsequent siblings)
431 siblings, 0 replies; 440+ messages in thread
From: Greg Kroah-Hartman @ 2023-07-09 11:09 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Arnd Bergmann, Zhang Rui,
Rafael J. Wysocki, Sasha Levin
From: Zhang Rui <rui.zhang@intel.com>
[ Upstream commit 4658fe81b3f8afe8adf37734ec5fe595d90415c6 ]
After commit 3382388d7148 ("intel_rapl: abstract RAPL common code"),
access to the IOSF_MBI interface is done in the RAPL common code.
Thus it is CONFIG_INTEL_RAPL_CORE that depends on
CONFIG_IOSF_MBI, while CONFIG_INTEL_RAPL_MSR does not.
This problem was not exposed previously because all the previous RAPL
common code users, aka, the RAPL MSR and MMIO I/F drivers, have
CONFIG_IOSF_MBI selected.
Fix the CONFIG_IOSF_MBI dependency in RAPL code. This also fixes a build
time failure when the RAPL TPMI I/F driver is introduced without
selecting CONFIG_IOSF_MBI.
x86_64-linux-ld: vmlinux.o: in function `set_floor_freq_atom':
intel_rapl_common.c:(.text+0x2dac9b8): undefined reference to `iosf_mbi_write'
x86_64-linux-ld: intel_rapl_common.c:(.text+0x2daca66): undefined reference to `iosf_mbi_read'
Reference to iosf_mbi.h is also removed from the RAPL MSR I/F driver.
Fixes: 3382388d7148 ("intel_rapl: abstract RAPL common code")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230601213246.3271412-1-arnd@kernel.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---