Netdev List
 help / color / mirror / Atom feed
* [PATCH 1/2] tg3: Move the [rt]x_dropped counters to tg3_napi
From: alexey.pakhunov @ 2023-11-08  1:01 UTC (permalink / raw)
  To: mchan
  Cc: vincent.wong2, netdev, linux-kernel, siva.kallam, prashant,
	Alex Pakhunov

From: Alex Pakhunov <alexey.pakhunov@spacex.com>

This change moves [rt]x_dropped counters to tg3_napi so that they can be
updated by a single writer, race-free.

Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 29 +++++++++++++++++++++++++----
 drivers/net/ethernet/broadcom/tg3.h |  4 ++--
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..09acc2504965 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6845,7 +6845,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 				       desc_idx, *post_ptr);
 		drop_it_no_recycle:
 			/* Other statistics kept track of by card. */
-			tp->rx_dropped++;
+			tnapi->rx_dropped++;
 			goto next_pkt;
 		}
 
@@ -8146,7 +8146,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 drop:
 	dev_kfree_skb_any(skb);
 drop_nofree:
-	tp->tx_dropped++;
+	tnapi->tx_dropped++;
 	return NETDEV_TX_OK;
 }
 
@@ -8587,6 +8587,9 @@ static int tg3_init_rings(struct tg3 *tp)
 			tg3_free_rings(tp);
 			return -ENOMEM;
 		}
+
+		tnapi->rx_dropped = 0;
+		tnapi->tx_dropped = 0;
 	}
 
 	return 0;
@@ -11895,6 +11898,9 @@ static void tg3_get_nstats(struct tg3 *tp, struct rtnl_link_stats64 *stats)
 {
 	struct rtnl_link_stats64 *old_stats = &tp->net_stats_prev;
 	struct tg3_hw_stats *hw_stats = tp->hw_stats;
+	unsigned long rx_dropped = 0;
+	unsigned long tx_dropped = 0;
+	int i;
 
 	stats->rx_packets = old_stats->rx_packets +
 		get_stat64(&hw_stats->rx_ucast_packets) +
@@ -11941,8 +11947,23 @@ static void tg3_get_nstats(struct tg3 *tp, struct rtnl_link_stats64 *stats)
 	stats->rx_missed_errors = old_stats->rx_missed_errors +
 		get_stat64(&hw_stats->rx_discards);
 
-	stats->rx_dropped = tp->rx_dropped;
-	stats->tx_dropped = tp->tx_dropped;
+	/* Aggregate per-queue counters. The per-queue counters are updated
+	 * by a single writer, race-free. The result computed by this loop
+	 * might not be 100% accurate (counters can be updated in the middle of
+	 * the loop) but the next tg3_get_nstats() will recompute the current
+	 * value so it is acceptable.
+	 */
+	for (i = 0; i < tp->irq_cnt; i++) {
+		struct tg3_napi *tnapi = &tp->napi[i];
+
+		rx_dropped += tnapi->rx_dropped;
+		tx_dropped += tnapi->tx_dropped;
+	}
+
+	/* The counters wrap around at 4G on 32bit machines. */
+	stats->rx_dropped = rx_dropped;
+	stats->tx_dropped = tx_dropped;
+
 }
 
 static int tg3_get_regs_len(struct net_device *dev)
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 1000c894064f..8d753f8c5b06 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3018,6 +3018,7 @@ struct tg3_napi {
 	u16				*rx_rcb_prod_idx;
 	struct tg3_rx_prodring_set	prodring;
 	struct tg3_rx_buffer_desc	*rx_rcb;
+	unsigned long			rx_dropped;
 
 	u32				tx_prod	____cacheline_aligned;
 	u32				tx_cons;
@@ -3026,6 +3027,7 @@ struct tg3_napi {
 	u32				prodmbox;
 	struct tg3_tx_buffer_desc	*tx_ring;
 	struct tg3_tx_ring_info		*tx_buffers;
+	unsigned long			tx_dropped;
 
 	dma_addr_t			status_mapping;
 	dma_addr_t			rx_rcb_mapping;
@@ -3219,8 +3221,6 @@ struct tg3 {
 
 
 	/* begin "everything else" cacheline(s) section */
-	unsigned long			rx_dropped;
-	unsigned long			tx_dropped;
 	struct rtnl_link_stats64	net_stats_prev;
 	struct tg3_ethtool_stats	estats_prev;
 

base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa
-- 
2.39.3


^ permalink raw reply related

* [PATCH 2/2] tg3: Increment tx_dropped in tg3_tso_bug()
From: alexey.pakhunov @ 2023-11-08  1:01 UTC (permalink / raw)
  To: mchan
  Cc: vincent.wong2, netdev, linux-kernel, siva.kallam, prashant,
	Alex Pakhunov
In-Reply-To: <20231108010129.2009947-1-alexey.pakhunov@spacex.com>

From: Alex Pakhunov <alexey.pakhunov@spacex.com>

tg3_tso_bug() drops a packet if it cannot be segmented for any reason.
The number of discarded frames should be incremented accordingly.

Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 09acc2504965..264d6aff67e7 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7874,8 +7874,10 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 
 	segs = skb_gso_segment(skb, tp->dev->features &
 				    ~(NETIF_F_TSO | NETIF_F_TSO6));
-	if (IS_ERR(segs) || !segs)
+	if (IS_ERR(segs) || !segs) {
+		tnapi->tx_dropped++;
 		goto tg3_tso_bug_end;
+	}
 
 	skb_list_walk_safe(segs, seg, next) {
 		skb_mark_not_on_list(seg);
-- 
2.39.3


^ permalink raw reply related

* Re: [RFC PATCH v3 06/12] memory-provider: dmabuf devmem memory provider
From: David Ahern @ 2023-11-08  0:10 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Arnd Bergmann, Willem de Bruijn, Shuah Khan, Sumit Semwal,
	Christian König, Shakeel Butt, Jeroen de Borst,
	Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang
In-Reply-To: <CAHS8izME7NixQrrh+qKnMR4+FyTzKW=B2pYyNffJ+igiehe-7g@mail.gmail.com>

On 11/7/23 5:02 PM, Mina Almasry wrote:
> On Mon, Nov 6, 2023 at 1:02 PM Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On 11/05, Mina Almasry wrote:
>>> +static inline bool page_is_page_pool_iov(const struct page *page)
>>> +{
>>> +     return (unsigned long)page & PP_DEVMEM;
>>> +}
>>
>> Speaking of bpf: one thing that might be problematic with this PP_DEVMEM
>> bit is that it will make debugging with bpftrace a bit (more)
>> complicated. If somebody were trying to get to that page_pool_iov from
>> the frags, they will have to do the equivalent of page_is_page_pool_iov,
>> but probably not a big deal? (thinking out loud)
> 
> Good point, but that doesn't only apply to bpf I think. I'm guessing
> even debugger drgn access to the bv_page in the frag will have trouble
> if it's actually accessing an iov with LSB set.
> 
> But this is not specific to this use for LSB pointer trick. I think
> all code that currently uses LSB pointer trick will have similar
> troubles. In this context my humble vote is that we get such big
> upside from reducing code churn that it's reasonable to tolerate such
> side effects.

+1

> 
> I could alleviate some of the issues by teaching drgn to do the right
> thing for devmem/iovs... time permitting.
> 
Tools like drgn and crash have to know when the LSB trick is used  -
e.g., dst_entry - and handle it when dereferencing pointers.

^ permalink raw reply

* Re: [RFC PATCH v3 06/12] memory-provider: dmabuf devmem memory provider
From: Mina Almasry @ 2023-11-08  0:02 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, linaro-mm-sig, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer,
	Ilias Apalodimas, Arnd Bergmann, Willem de Bruijn, Shuah Khan,
	Sumit Semwal, Christian König, Shakeel Butt, Jeroen de Borst,
	Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang
In-Reply-To: <583db67b-96c6-4e17-bea0-b5a14799db4a@kernel.org>

On Mon, Nov 6, 2023 at 1:02 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On 11/05, Mina Almasry wrote:
> > +static inline bool page_is_page_pool_iov(const struct page *page)
> > +{
> > +     return (unsigned long)page & PP_DEVMEM;
> > +}
>
> Speaking of bpf: one thing that might be problematic with this PP_DEVMEM
> bit is that it will make debugging with bpftrace a bit (more)
> complicated. If somebody were trying to get to that page_pool_iov from
> the frags, they will have to do the equivalent of page_is_page_pool_iov,
> but probably not a big deal? (thinking out loud)

Good point, but that doesn't only apply to bpf I think. I'm guessing
even debugger drgn access to the bv_page in the frag will have trouble
if it's actually accessing an iov with LSB set.

But this is not specific to this use for LSB pointer trick. I think
all code that currently uses LSB pointer trick will have similar
troubles. In this context my humble vote is that we get such big
upside from reducing code churn that it's reasonable to tolerate such
side effects.

I could alleviate some of the issues by teaching drgn to do the right
thing for devmem/iovs... time permitting.

On Mon, Nov 6, 2023 at 3:49 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 11/5/23 7:44 PM, Mina Almasry wrote:
> > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > index 78cbb040af94..b93243c2a640 100644
> > --- a/include/net/page_pool/helpers.h
> > +++ b/include/net/page_pool/helpers.h
> > @@ -111,6 +112,45 @@ page_pool_iov_binding(const struct page_pool_iov *ppiov)
> >       return page_pool_iov_owner(ppiov)->binding;
> >  }
> >
> > +static inline int page_pool_iov_refcount(const struct page_pool_iov *ppiov)
> > +{
> > +     return refcount_read(&ppiov->refcount);
> > +}
> > +
> > +static inline void page_pool_iov_get_many(struct page_pool_iov *ppiov,
> > +                                       unsigned int count)
> > +{
> > +     refcount_add(count, &ppiov->refcount);
> > +}
> > +
> > +void __page_pool_iov_free(struct page_pool_iov *ppiov);
> > +
> > +static inline void page_pool_iov_put_many(struct page_pool_iov *ppiov,
> > +                                       unsigned int count)
> > +{
> > +     if (!refcount_sub_and_test(count, &ppiov->refcount))
> > +             return;
> > +
> > +     __page_pool_iov_free(ppiov);
> > +}
> > +
> > +/* page pool mm helpers */
> > +
> > +static inline bool page_is_page_pool_iov(const struct page *page)
> > +{
> > +     return (unsigned long)page & PP_DEVMEM;
>
> This is another one where the code can be more generic to not force a
> lot changes later.  e.g., PP_CUSTOM or PP_NO_PAGE. Then io_uring use
> case with host memory can leverage the iov pool in a similar manner.
>
> That does mean skb->devmem needs to be a flag on the page pool and not
> just assume iov == device memory.
>
>
>


-- 
Thanks,
Mina

^ permalink raw reply

* Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
From: David Ahern @ 2023-11-08  0:01 UTC (permalink / raw)
  To: Mina Almasry, Willem de Bruijn
  Cc: Stanislav Fomichev, netdev, linux-kernel, linux-arch,
	linux-kselftest, linux-media, dri-devel, linaro-mm-sig,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann,
	Shuah Khan, Sumit Semwal, Christian König, Shakeel Butt,
	Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn,
	Kaiyuan Zhang
In-Reply-To: <CAHS8izOU06ceKyc5oVZhdCKJqmeRdcRyJBFpjGe=u2yh=V52dQ@mail.gmail.com>

On 11/7/23 4:55 PM, Mina Almasry wrote:
> On Mon, Nov 6, 2023 at 4:03 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>>
>> On Mon, Nov 6, 2023 at 3:55 PM David Ahern <dsahern@kernel.org> wrote:
>>>
>>> On 11/6/23 4:32 PM, Stanislav Fomichev wrote:
>>>>> The concise notification API returns tokens as a range for
>>>>> compression, encoding as two 32-bit unsigned integers start + length.
>>>>> It allows for even further batching by returning multiple such ranges
>>>>> in a single call.
>>>>
>>>> Tangential: should tokens be u64? Otherwise we can't have more than
>>>> 4gb unacknowledged. Or that's a reasonable constraint?
>>>>
>>>
>>> Was thinking the same and with bits reserved for a dmabuf id to allow
>>> multiple dmabufs in a single rx queue (future extension, but build the
>>> capability in now). e.g., something like a 37b offset (128GB dmabuf
>>> size), 19b length (large GRO), 8b dmabuf id (lots of dmabufs to a queue).
>>
>> Agreed. Converting to 64b now sounds like a good forward looking revision.
> 
> The concept of IDing a dma-buf came up in a couple of different
> contexts. First, in the context of us giving the dma-buf ID to the
> user on recvmsg() to tell the user the data is in this specific
> dma-buf. The second context is here, to bind dma-bufs with multiple
> user-visible IDs to an rx queue.
> 
> My issue here is that I don't see anything in the struct dma_buf that
> can practically serve as an ID:
> 
> https://elixir.bootlin.com/linux/v6.6-rc7/source/include/linux/dma-buf.h#L302
> 
> Actually, from the userspace, only the name of the dma-buf seems
> queryable. That's only unique if the user sets it as such. The dmabuf
> FD can't serve as an ID. For our use case we need to support 1 process
> doing the dma-buf bind via netlink, sharing the dma-buf FD to another
> process, and that process receives the data.  In this case the FDs
> shown by the 2 processes may be different. Converting to 64b is a
> trivial change I can make now, but I'm not sure how to ID these
> dma-bufs. Suggestions welcome. I'm not sure the dma-buf guys will
> allow adding a new ID + APIs to query said dma-buf ID.
> 

The API can be unique to this usage: e.g., add a dmabuf id to the
netlink API. Userspace manages the ids (tells the kernel what value to
use with an instance), the kernel validates no 2 dmabufs have the same
id and then returns the value here.



^ permalink raw reply

* Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
From: Mina Almasry @ 2023-11-07 23:55 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Ahern, Stanislav Fomichev, netdev, linux-kernel, linux-arch,
	linux-kselftest, linux-media, dri-devel, linaro-mm-sig,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann,
	Shuah Khan, Sumit Semwal, Christian König, Shakeel Butt,
	Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn,
	Kaiyuan Zhang
In-Reply-To: <CAF=yD-Ln4v8orUne8E7D2_eHu39PWPCrMR3Qtuh312pCu=erng@mail.gmail.com>

On Mon, Nov 6, 2023 at 4:03 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, Nov 6, 2023 at 3:55 PM David Ahern <dsahern@kernel.org> wrote:
> >
> > On 11/6/23 4:32 PM, Stanislav Fomichev wrote:
> > >> The concise notification API returns tokens as a range for
> > >> compression, encoding as two 32-bit unsigned integers start + length.
> > >> It allows for even further batching by returning multiple such ranges
> > >> in a single call.
> > >
> > > Tangential: should tokens be u64? Otherwise we can't have more than
> > > 4gb unacknowledged. Or that's a reasonable constraint?
> > >
> >
> > Was thinking the same and with bits reserved for a dmabuf id to allow
> > multiple dmabufs in a single rx queue (future extension, but build the
> > capability in now). e.g., something like a 37b offset (128GB dmabuf
> > size), 19b length (large GRO), 8b dmabuf id (lots of dmabufs to a queue).
>
> Agreed. Converting to 64b now sounds like a good forward looking revision.

The concept of IDing a dma-buf came up in a couple of different
contexts. First, in the context of us giving the dma-buf ID to the
user on recvmsg() to tell the user the data is in this specific
dma-buf. The second context is here, to bind dma-bufs with multiple
user-visible IDs to an rx queue.

My issue here is that I don't see anything in the struct dma_buf that
can practically serve as an ID:

https://elixir.bootlin.com/linux/v6.6-rc7/source/include/linux/dma-buf.h#L302

Actually, from the userspace, only the name of the dma-buf seems
queryable. That's only unique if the user sets it as such. The dmabuf
FD can't serve as an ID. For our use case we need to support 1 process
doing the dma-buf bind via netlink, sharing the dma-buf FD to another
process, and that process receives the data.  In this case the FDs
shown by the 2 processes may be different. Converting to 64b is a
trivial change I can make now, but I'm not sure how to ID these
dma-bufs. Suggestions welcome. I'm not sure the dma-buf guys will
allow adding a new ID + APIs to query said dma-buf ID.

--
Thanks,
Mina

^ permalink raw reply

* [PATCH AUTOSEL 5.10 5/5] 9p/trans_fd: Annotate data-racy writes to file::f_flags
From: Sasha Levin @ 2023-11-07 23:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marco Elver, syzbot+e441aeeb422763cc5511, Dominique Martinet,
	Sasha Levin, ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs,
	netdev
In-Reply-To: <20231107232503.3776802-1-sashal@kernel.org>

From: Marco Elver <elver@google.com>

[ Upstream commit 355f074609dbf3042900ea9d30fcd2b0c323a365 ]

syzbot reported:

 | BUG: KCSAN: data-race in p9_fd_create / p9_fd_create
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15599 on cpu 0:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15563 on cpu 1:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | value changed: 0x00008002 -> 0x00008802

Within p9_fd_open(), O_NONBLOCK is added to f_flags of the read and
write files. This may happen concurrently if e.g. mounting process
modifies the fd in another thread.

Mark the plain read-modify-writes as intentional data-races, with the
assumption that the result of executing the accesses concurrently will
always result in the same result despite the accesses themselves not
being atomic.

Reported-by: syzbot+e441aeeb422763cc5511@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/ZO38mqkS0TYUlpFp@elver.google.com
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Message-ID: <20231025103445.1248103-1-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/9p/trans_fd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e070a0b8e5ca3..63f4d2067059e 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -823,14 +823,21 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
 		goto out_free_ts;
 	if (!(ts->rd->f_mode & FMODE_READ))
 		goto out_put_rd;
-	/* prevent workers from hanging on IO when fd is a pipe */
-	ts->rd->f_flags |= O_NONBLOCK;
+	/* Prevent workers from hanging on IO when fd is a pipe.
+	 * It's technically possible for userspace or concurrent mounts to
+	 * modify this flag concurrently, which will likely result in a
+	 * broken filesystem. However, just having bad flags here should
+	 * not crash the kernel or cause any other sort of bug, so mark this
+	 * particular data race as intentional so that tooling (like KCSAN)
+	 * can allow it and detect further problems.
+	 */
+	data_race(ts->rd->f_flags |= O_NONBLOCK);
 	ts->wr = fget(wfd);
 	if (!ts->wr)
 		goto out_put_rd;
 	if (!(ts->wr->f_mode & FMODE_WRITE))
 		goto out_put_wr;
-	ts->wr->f_flags |= O_NONBLOCK;
+	data_race(ts->wr->f_flags |= O_NONBLOCK);
 
 	client->trans = ts;
 	client->status = Connected;
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 6/6] 9p: v9fs_listxattr: fix %s null argument warning
From: Sasha Levin @ 2023-11-07 23:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dominique Martinet, Su Hui, Christian Schoenebeck, Sasha Levin,
	ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs, netdev
In-Reply-To: <20231107232446.3776662-1-sashal@kernel.org>

From: Dominique Martinet <asmadeus@codewreck.org>

[ Upstream commit 9b5c6281838fc84683dd99b47302d81fce399918 ]

W=1 warns about null argument to kprintf:
In file included from fs/9p/xattr.c:12:
In function ‘v9fs_xattr_get’,
    inlined from ‘v9fs_listxattr’ at fs/9p/xattr.c:142:9:
include/net/9p/9p.h:55:2: error: ‘%s’ directive argument is null
[-Werror=format-overflow=]
   55 |  _p9_debug(level, __func__, fmt, ##__VA_ARGS__)
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use an empty string instead of :
 - this is ok 9p-wise because p9pdu_vwritef serializes a null string
and an empty string the same way (one '0' word for length)
 - since this degrades the print statements, add new single quotes for
xattr's name delimter (Old: "file = (null)", new: "file = ''")

Link: https://lore.kernel.org/r/20231008060138.517057-1-suhui@nfschina.com
Suggested-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20231025103445.1248103-2-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/9p/xattr.c   | 5 +++--
 net/9p/client.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index ee331845e2c7a..31799ac10e33a 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -73,7 +73,7 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
 	struct p9_fid *fid;
 	int ret;
 
-	p9_debug(P9_DEBUG_VFS, "name = %s value_len = %zu\n",
+	p9_debug(P9_DEBUG_VFS, "name = '%s' value_len = %zu\n",
 		 name, buffer_size);
 	fid = v9fs_fid_lookup(dentry);
 	if (IS_ERR(fid))
@@ -144,7 +144,8 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
 
 ssize_t v9fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 {
-	return v9fs_xattr_get(dentry, NULL, buffer, buffer_size);
+	/* Txattrwalk with an empty string lists xattrs instead */
+	return v9fs_xattr_get(dentry, "", buffer, buffer_size);
 }
 
 static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
diff --git a/net/9p/client.c b/net/9p/client.c
index c4c1e44cd7ca3..3696e9373e687 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -2018,7 +2018,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
 		goto error;
 	}
 	p9_debug(P9_DEBUG_9P,
-		 ">>> TXATTRWALK file_fid %d, attr_fid %d name %s\n",
+		 ">>> TXATTRWALK file_fid %d, attr_fid %d name '%s'\n",
 		 file_fid->fid, attr_fid->fid, attr_name);
 
 	req = p9_client_rpc(clnt, P9_TXATTRWALK, "dds",
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 5/6] 9p/trans_fd: Annotate data-racy writes to file::f_flags
From: Sasha Levin @ 2023-11-07 23:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marco Elver, syzbot+e441aeeb422763cc5511, Dominique Martinet,
	Sasha Levin, ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs,
	netdev
In-Reply-To: <20231107232446.3776662-1-sashal@kernel.org>

From: Marco Elver <elver@google.com>

[ Upstream commit 355f074609dbf3042900ea9d30fcd2b0c323a365 ]

syzbot reported:

 | BUG: KCSAN: data-race in p9_fd_create / p9_fd_create
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15599 on cpu 0:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15563 on cpu 1:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | value changed: 0x00008002 -> 0x00008802

Within p9_fd_open(), O_NONBLOCK is added to f_flags of the read and
write files. This may happen concurrently if e.g. mounting process
modifies the fd in another thread.

Mark the plain read-modify-writes as intentional data-races, with the
assumption that the result of executing the accesses concurrently will
always result in the same result despite the accesses themselves not
being atomic.

Reported-by: syzbot+e441aeeb422763cc5511@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/ZO38mqkS0TYUlpFp@elver.google.com
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Message-ID: <20231025103445.1248103-1-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/9p/trans_fd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index f359cfdc1858f..b44b77d3b35d1 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -835,14 +835,21 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
 		goto out_free_ts;
 	if (!(ts->rd->f_mode & FMODE_READ))
 		goto out_put_rd;
-	/* prevent workers from hanging on IO when fd is a pipe */
-	ts->rd->f_flags |= O_NONBLOCK;
+	/* Prevent workers from hanging on IO when fd is a pipe.
+	 * It's technically possible for userspace or concurrent mounts to
+	 * modify this flag concurrently, which will likely result in a
+	 * broken filesystem. However, just having bad flags here should
+	 * not crash the kernel or cause any other sort of bug, so mark this
+	 * particular data race as intentional so that tooling (like KCSAN)
+	 * can allow it and detect further problems.
+	 */
+	data_race(ts->rd->f_flags |= O_NONBLOCK);
 	ts->wr = fget(wfd);
 	if (!ts->wr)
 		goto out_put_rd;
 	if (!(ts->wr->f_mode & FMODE_WRITE))
 		goto out_put_wr;
-	ts->wr->f_flags |= O_NONBLOCK;
+	data_race(ts->wr->f_flags |= O_NONBLOCK);
 
 	client->trans = ts;
 	client->status = Connected;
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.1 11/11] 9p: v9fs_listxattr: fix %s null argument warning
From: Sasha Levin @ 2023-11-07 23:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dominique Martinet, Su Hui, Christian Schoenebeck, Sasha Levin,
	ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs, netdev
In-Reply-To: <20231107232420.3776419-1-sashal@kernel.org>

From: Dominique Martinet <asmadeus@codewreck.org>

[ Upstream commit 9b5c6281838fc84683dd99b47302d81fce399918 ]

W=1 warns about null argument to kprintf:
In file included from fs/9p/xattr.c:12:
In function ‘v9fs_xattr_get’,
    inlined from ‘v9fs_listxattr’ at fs/9p/xattr.c:142:9:
include/net/9p/9p.h:55:2: error: ‘%s’ directive argument is null
[-Werror=format-overflow=]
   55 |  _p9_debug(level, __func__, fmt, ##__VA_ARGS__)
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use an empty string instead of :
 - this is ok 9p-wise because p9pdu_vwritef serializes a null string
and an empty string the same way (one '0' word for length)
 - since this degrades the print statements, add new single quotes for
xattr's name delimter (Old: "file = (null)", new: "file = ''")

Link: https://lore.kernel.org/r/20231008060138.517057-1-suhui@nfschina.com
Suggested-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20231025103445.1248103-2-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/9p/xattr.c   | 5 +++--
 net/9p/client.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index 2807bb63f7802..3b9aa61de8c2d 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -65,7 +65,7 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
 	struct p9_fid *fid;
 	int ret;
 
-	p9_debug(P9_DEBUG_VFS, "name = %s value_len = %zu\n",
+	p9_debug(P9_DEBUG_VFS, "name = '%s' value_len = %zu\n",
 		 name, buffer_size);
 	fid = v9fs_fid_lookup(dentry);
 	if (IS_ERR(fid))
@@ -136,7 +136,8 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
 
 ssize_t v9fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 {
-	return v9fs_xattr_get(dentry, NULL, buffer, buffer_size);
+	/* Txattrwalk with an empty string lists xattrs instead */
+	return v9fs_xattr_get(dentry, "", buffer, buffer_size);
 }
 
 static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
diff --git a/net/9p/client.c b/net/9p/client.c
index af59c3f2ec2e7..b86629deea438 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1985,7 +1985,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
 		goto error;
 	}
 	p9_debug(P9_DEBUG_9P,
-		 ">>> TXATTRWALK file_fid %d, attr_fid %d name %s\n",
+		 ">>> TXATTRWALK file_fid %d, attr_fid %d name '%s'\n",
 		 file_fid->fid, attr_fid->fid, attr_name);
 
 	req = p9_client_rpc(clnt, P9_TXATTRWALK, "dds",
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.1 10/11] 9p/trans_fd: Annotate data-racy writes to file::f_flags
From: Sasha Levin @ 2023-11-07 23:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marco Elver, syzbot+e441aeeb422763cc5511, Dominique Martinet,
	Sasha Levin, ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs,
	netdev
In-Reply-To: <20231107232420.3776419-1-sashal@kernel.org>

From: Marco Elver <elver@google.com>

[ Upstream commit 355f074609dbf3042900ea9d30fcd2b0c323a365 ]

syzbot reported:

 | BUG: KCSAN: data-race in p9_fd_create / p9_fd_create
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15599 on cpu 0:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15563 on cpu 1:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | value changed: 0x00008002 -> 0x00008802

Within p9_fd_open(), O_NONBLOCK is added to f_flags of the read and
write files. This may happen concurrently if e.g. mounting process
modifies the fd in another thread.

Mark the plain read-modify-writes as intentional data-races, with the
assumption that the result of executing the accesses concurrently will
always result in the same result despite the accesses themselves not
being atomic.

Reported-by: syzbot+e441aeeb422763cc5511@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/ZO38mqkS0TYUlpFp@elver.google.com
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Message-ID: <20231025103445.1248103-1-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/9p/trans_fd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 5a1aecf7fe487..a69422366a235 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -833,14 +833,21 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
 		goto out_free_ts;
 	if (!(ts->rd->f_mode & FMODE_READ))
 		goto out_put_rd;
-	/* prevent workers from hanging on IO when fd is a pipe */
-	ts->rd->f_flags |= O_NONBLOCK;
+	/* Prevent workers from hanging on IO when fd is a pipe.
+	 * It's technically possible for userspace or concurrent mounts to
+	 * modify this flag concurrently, which will likely result in a
+	 * broken filesystem. However, just having bad flags here should
+	 * not crash the kernel or cause any other sort of bug, so mark this
+	 * particular data race as intentional so that tooling (like KCSAN)
+	 * can allow it and detect further problems.
+	 */
+	data_race(ts->rd->f_flags |= O_NONBLOCK);
 	ts->wr = fget(wfd);
 	if (!ts->wr)
 		goto out_put_rd;
 	if (!(ts->wr->f_mode & FMODE_WRITE))
 		goto out_put_wr;
-	ts->wr->f_flags |= O_NONBLOCK;
+	data_race(ts->wr->f_flags |= O_NONBLOCK);
 
 	client->trans = ts;
 	client->status = Connected;
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.5 18/18] 9p: v9fs_listxattr: fix %s null argument warning
From: Sasha Levin @ 2023-11-07 23:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dominique Martinet, Su Hui, Christian Schoenebeck, Sasha Levin,
	ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs, netdev
In-Reply-To: <20231107232330.3776001-1-sashal@kernel.org>

From: Dominique Martinet <asmadeus@codewreck.org>

[ Upstream commit 9b5c6281838fc84683dd99b47302d81fce399918 ]

W=1 warns about null argument to kprintf:
In file included from fs/9p/xattr.c:12:
In function ‘v9fs_xattr_get’,
    inlined from ‘v9fs_listxattr’ at fs/9p/xattr.c:142:9:
include/net/9p/9p.h:55:2: error: ‘%s’ directive argument is null
[-Werror=format-overflow=]
   55 |  _p9_debug(level, __func__, fmt, ##__VA_ARGS__)
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use an empty string instead of :
 - this is ok 9p-wise because p9pdu_vwritef serializes a null string
and an empty string the same way (one '0' word for length)
 - since this degrades the print statements, add new single quotes for
xattr's name delimter (Old: "file = (null)", new: "file = ''")

Link: https://lore.kernel.org/r/20231008060138.517057-1-suhui@nfschina.com
Suggested-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20231025103445.1248103-2-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/9p/xattr.c   | 5 +++--
 net/9p/client.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index e00cf8109b3f3..3c4572ef3a488 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -68,7 +68,7 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
 	struct p9_fid *fid;
 	int ret;
 
-	p9_debug(P9_DEBUG_VFS, "name = %s value_len = %zu\n",
+	p9_debug(P9_DEBUG_VFS, "name = '%s' value_len = %zu\n",
 		 name, buffer_size);
 	fid = v9fs_fid_lookup(dentry);
 	if (IS_ERR(fid))
@@ -139,7 +139,8 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
 
 ssize_t v9fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 {
-	return v9fs_xattr_get(dentry, NULL, buffer, buffer_size);
+	/* Txattrwalk with an empty string lists xattrs instead */
+	return v9fs_xattr_get(dentry, "", buffer, buffer_size);
 }
 
 static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
diff --git a/net/9p/client.c b/net/9p/client.c
index 86bbc7147fc14..9c2bc15e3cfaf 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1979,7 +1979,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
 		goto error;
 	}
 	p9_debug(P9_DEBUG_9P,
-		 ">>> TXATTRWALK file_fid %d, attr_fid %d name %s\n",
+		 ">>> TXATTRWALK file_fid %d, attr_fid %d name '%s'\n",
 		 file_fid->fid, attr_fid->fid, attr_name);
 
 	req = p9_client_rpc(clnt, P9_TXATTRWALK, "dds",
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.5 17/18] 9p/trans_fd: Annotate data-racy writes to file::f_flags
From: Sasha Levin @ 2023-11-07 23:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marco Elver, syzbot+e441aeeb422763cc5511, Dominique Martinet,
	Sasha Levin, ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs,
	netdev
In-Reply-To: <20231107232330.3776001-1-sashal@kernel.org>

From: Marco Elver <elver@google.com>

[ Upstream commit 355f074609dbf3042900ea9d30fcd2b0c323a365 ]

syzbot reported:

 | BUG: KCSAN: data-race in p9_fd_create / p9_fd_create
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15599 on cpu 0:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15563 on cpu 1:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | value changed: 0x00008002 -> 0x00008802

Within p9_fd_open(), O_NONBLOCK is added to f_flags of the read and
write files. This may happen concurrently if e.g. mounting process
modifies the fd in another thread.

Mark the plain read-modify-writes as intentional data-races, with the
assumption that the result of executing the accesses concurrently will
always result in the same result despite the accesses themselves not
being atomic.

Reported-by: syzbot+e441aeeb422763cc5511@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/ZO38mqkS0TYUlpFp@elver.google.com
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Message-ID: <20231025103445.1248103-1-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/9p/trans_fd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 00b684616e8d9..9374790f17ce4 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -832,14 +832,21 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
 		goto out_free_ts;
 	if (!(ts->rd->f_mode & FMODE_READ))
 		goto out_put_rd;
-	/* prevent workers from hanging on IO when fd is a pipe */
-	ts->rd->f_flags |= O_NONBLOCK;
+	/* Prevent workers from hanging on IO when fd is a pipe.
+	 * It's technically possible for userspace or concurrent mounts to
+	 * modify this flag concurrently, which will likely result in a
+	 * broken filesystem. However, just having bad flags here should
+	 * not crash the kernel or cause any other sort of bug, so mark this
+	 * particular data race as intentional so that tooling (like KCSAN)
+	 * can allow it and detect further problems.
+	 */
+	data_race(ts->rd->f_flags |= O_NONBLOCK);
 	ts->wr = fget(wfd);
 	if (!ts->wr)
 		goto out_put_rd;
 	if (!(ts->wr->f_mode & FMODE_WRITE))
 		goto out_put_wr;
-	ts->wr->f_flags |= O_NONBLOCK;
+	data_race(ts->wr->f_flags |= O_NONBLOCK);
 
 	client->trans = ts;
 	client->status = Connected;
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.6 18/18] 9p: v9fs_listxattr: fix %s null argument warning
From: Sasha Levin @ 2023-11-07 23:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dominique Martinet, Su Hui, Christian Schoenebeck, Sasha Levin,
	ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs, netdev
In-Reply-To: <20231107232231.3775605-1-sashal@kernel.org>

From: Dominique Martinet <asmadeus@codewreck.org>

[ Upstream commit 9b5c6281838fc84683dd99b47302d81fce399918 ]

W=1 warns about null argument to kprintf:
In file included from fs/9p/xattr.c:12:
In function ‘v9fs_xattr_get’,
    inlined from ‘v9fs_listxattr’ at fs/9p/xattr.c:142:9:
include/net/9p/9p.h:55:2: error: ‘%s’ directive argument is null
[-Werror=format-overflow=]
   55 |  _p9_debug(level, __func__, fmt, ##__VA_ARGS__)
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use an empty string instead of :
 - this is ok 9p-wise because p9pdu_vwritef serializes a null string
and an empty string the same way (one '0' word for length)
 - since this degrades the print statements, add new single quotes for
xattr's name delimter (Old: "file = (null)", new: "file = ''")

Link: https://lore.kernel.org/r/20231008060138.517057-1-suhui@nfschina.com
Suggested-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20231025103445.1248103-2-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/9p/xattr.c   | 5 +++--
 net/9p/client.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index e00cf8109b3f3..3c4572ef3a488 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -68,7 +68,7 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
 	struct p9_fid *fid;
 	int ret;
 
-	p9_debug(P9_DEBUG_VFS, "name = %s value_len = %zu\n",
+	p9_debug(P9_DEBUG_VFS, "name = '%s' value_len = %zu\n",
 		 name, buffer_size);
 	fid = v9fs_fid_lookup(dentry);
 	if (IS_ERR(fid))
@@ -139,7 +139,8 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
 
 ssize_t v9fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 {
-	return v9fs_xattr_get(dentry, NULL, buffer, buffer_size);
+	/* Txattrwalk with an empty string lists xattrs instead */
+	return v9fs_xattr_get(dentry, "", buffer, buffer_size);
 }
 
 static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
diff --git a/net/9p/client.c b/net/9p/client.c
index 86bbc7147fc14..9c2bc15e3cfaf 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1979,7 +1979,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
 		goto error;
 	}
 	p9_debug(P9_DEBUG_9P,
-		 ">>> TXATTRWALK file_fid %d, attr_fid %d name %s\n",
+		 ">>> TXATTRWALK file_fid %d, attr_fid %d name '%s'\n",
 		 file_fid->fid, attr_fid->fid, attr_name);
 
 	req = p9_client_rpc(clnt, P9_TXATTRWALK, "dds",
-- 
2.42.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.6 17/18] 9p/trans_fd: Annotate data-racy writes to file::f_flags
From: Sasha Levin @ 2023-11-07 23:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marco Elver, syzbot+e441aeeb422763cc5511, Dominique Martinet,
	Sasha Levin, ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs,
	netdev
In-Reply-To: <20231107232231.3775605-1-sashal@kernel.org>

From: Marco Elver <elver@google.com>

[ Upstream commit 355f074609dbf3042900ea9d30fcd2b0c323a365 ]

syzbot reported:

 | BUG: KCSAN: data-race in p9_fd_create / p9_fd_create
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15599 on cpu 0:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | read-write to 0xffff888130fb3d48 of 4 bytes by task 15563 on cpu 1:
 |  p9_fd_open net/9p/trans_fd.c:842 [inline]
 |  p9_fd_create+0x210/0x250 net/9p/trans_fd.c:1092
 |  p9_client_create+0x595/0xa70 net/9p/client.c:1010
 |  v9fs_session_init+0xf9/0xd90 fs/9p/v9fs.c:410
 |  v9fs_mount+0x69/0x630 fs/9p/vfs_super.c:123
 |  legacy_get_tree+0x74/0xd0 fs/fs_context.c:611
 |  vfs_get_tree+0x51/0x190 fs/super.c:1519
 |  do_new_mount+0x203/0x660 fs/namespace.c:3335
 |  path_mount+0x496/0xb30 fs/namespace.c:3662
 |  do_mount fs/namespace.c:3675 [inline]
 |  __do_sys_mount fs/namespace.c:3884 [inline]
 |  [...]
 |
 | value changed: 0x00008002 -> 0x00008802

Within p9_fd_open(), O_NONBLOCK is added to f_flags of the read and
write files. This may happen concurrently if e.g. mounting process
modifies the fd in another thread.

Mark the plain read-modify-writes as intentional data-races, with the
assumption that the result of executing the accesses concurrently will
always result in the same result despite the accesses themselves not
being atomic.

Reported-by: syzbot+e441aeeb422763cc5511@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/ZO38mqkS0TYUlpFp@elver.google.com
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Message-ID: <20231025103445.1248103-1-asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/9p/trans_fd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index c4015f30f9fa7..d0eb03ada704d 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -832,14 +832,21 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
 		goto out_free_ts;
 	if (!(ts->rd->f_mode & FMODE_READ))
 		goto out_put_rd;
-	/* prevent workers from hanging on IO when fd is a pipe */
-	ts->rd->f_flags |= O_NONBLOCK;
+	/* Prevent workers from hanging on IO when fd is a pipe.
+	 * It's technically possible for userspace or concurrent mounts to
+	 * modify this flag concurrently, which will likely result in a
+	 * broken filesystem. However, just having bad flags here should
+	 * not crash the kernel or cause any other sort of bug, so mark this
+	 * particular data race as intentional so that tooling (like KCSAN)
+	 * can allow it and detect further problems.
+	 */
+	data_race(ts->rd->f_flags |= O_NONBLOCK);
 	ts->wr = fget(wfd);
 	if (!ts->wr)
 		goto out_put_rd;
 	if (!(ts->wr->f_mode & FMODE_WRITE))
 		goto out_put_wr;
-	ts->wr->f_flags |= O_NONBLOCK;
+	data_race(ts->wr->f_flags |= O_NONBLOCK);
 
 	client->trans = ts;
 	client->status = Connected;
-- 
2.42.0


^ permalink raw reply related

* Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator
From: Mina Almasry @ 2023-11-07 23:03 UTC (permalink / raw)
  To: David Ahern, David Wei
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Arnd Bergmann, Willem de Bruijn, Shuah Khan, Sumit Semwal,
	Christian König, Shakeel Butt, Jeroen de Borst,
	Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang
In-Reply-To: <a5b95e6b-8716-4e2e-9183-959b754b5b5e@kernel.org>

On Tue, Nov 7, 2023 at 2:55 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 11/7/23 3:10 PM, Mina Almasry wrote:
> > On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@kernel.org> wrote:
> >>
> >> On 11/5/23 7:44 PM, Mina Almasry wrote:
> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>> index eeeda849115c..1c351c138a5b 100644
> >>> --- a/include/linux/netdevice.h
> >>> +++ b/include/linux/netdevice.h
> >>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
> >>>  };
> >>>
> >>>  #ifdef CONFIG_DMA_SHARED_BUFFER
> >>> +struct page_pool_iov *
> >>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
> >>> +void netdev_free_devmem(struct page_pool_iov *ppiov);
> >>
> >> netdev_{alloc,free}_dmabuf?
> >>
> >
> > Can do.
> >
> >> I say that because a dmabuf can be host memory, at least I am not aware
> >> of a restriction that a dmabuf is device memory.
> >>
> >
> > In my limited experience dma-buf is generally device memory, and
> > that's really its use case. CONFIG_UDMABUF is a driver that mocks
> > dma-buf with a memfd which I think is used for testing. But I can do
> > the rename, it's more clear anyway, I think.
>
> config UDMABUF
>         bool "userspace dmabuf misc driver"
>         default n
>         depends on DMA_SHARED_BUFFER
>         depends on MEMFD_CREATE || COMPILE_TEST
>         help
>           A driver to let userspace turn memfd regions into dma-bufs.
>           Qemu can use this to create host dmabufs for guest framebuffers.
>
>
> Qemu is just a userspace process; it is no way a special one.
>
> Treating host memory as a dmabuf should radically simplify the io_uring
> extension of this set.

I agree actually, and I was about to make that comment to David Wei's
series once I have the time.

David, your io_uring RX zerocopy proposal actually works with devmem
TCP, if you're inclined to do that instead, what you'd do roughly is
(I think):

- Allocate a memfd,
- Use CONFIG_UDMABUF to create a dma-buf out of that memfd.
- Bind the dma-buf to the NIC using the netlink API in this RFC.
- Your io_uring extensions and io_uring uapi should work as-is almost
on top of this series, I think.

If you do this the incoming packets should land into your memfd, which
may or may not work for you. In the future if you feel inclined to use
device memory, this approach that I'm describing here would be more
extensible to device memory, because you'd already be using dma-bufs
for your user memory; you'd just replace one kind of dma-buf (UDMABUF)
with another.

> That the io_uring set needs to dive into
> page_pools is just wrong - complicating the design and code and pushing
> io_uring into a realm it does not need to be involved in.
>
> Most (all?) of this patch set can work with any memory; only device
> memory is unreadable.
>
>


-- 
Thanks,
Mina

^ permalink raw reply

* Re: [PATCH net] net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
From: Jamal Hadi Salim @ 2023-11-07 22:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
	Cong Wang, Toke Høiland-Jørgensen, Jiri Pirko, netdev,
	eric.dumazet, syzbot
In-Reply-To: <20231107160440.1992526-1-edumazet@google.com>

On Tue, Nov 7, 2023 at 11:04 AM Eric Dumazet <edumazet@google.com> wrote:
>
> syzbot was able to trigger the following report while providing
> too small TCA_FQ_WEIGHTS attribute [1]
>
> Fix is to use NLA_POLICY_EXACT_LEN() to ensure user space
> provided correct sizes.
>
> Apply the same fix to TCA_FQ_PRIOMAP.
>
> [1]
> BUG: KMSAN: uninit-value in fq_load_weights net/sched/sch_fq.c:960 [inline]
> BUG: KMSAN: uninit-value in fq_change+0x1348/0x2fe0 net/sched/sch_fq.c:1071
> fq_load_weights net/sched/sch_fq.c:960 [inline]
> fq_change+0x1348/0x2fe0 net/sched/sch_fq.c:1071
> fq_init+0x68e/0x780 net/sched/sch_fq.c:1159
> qdisc_create+0x12f3/0x1be0 net/sched/sch_api.c:1326
> tc_modify_qdisc+0x11ef/0x2c20
> rtnetlink_rcv_msg+0x16a6/0x1840 net/core/rtnetlink.c:6558
> netlink_rcv_skb+0x371/0x650 net/netlink/af_netlink.c:2545
> rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6576
> netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
> netlink_unicast+0xf47/0x1250 net/netlink/af_netlink.c:1368
> netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
> sock_sendmsg_nosec net/socket.c:730 [inline]
> __sock_sendmsg net/socket.c:745 [inline]
> ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2588
> ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2642
> __sys_sendmsg net/socket.c:2671 [inline]
> __do_sys_sendmsg net/socket.c:2680 [inline]
> __se_sys_sendmsg net/socket.c:2678 [inline]
> __x64_sys_sendmsg+0x307/0x490 net/socket.c:2678
> do_syscall_x64 arch/x86/entry/common.c:51 [inline]
> do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
>
> Uninit was created at:
> slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
> slab_alloc_node mm/slub.c:3478 [inline]
> kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523
> kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
> __alloc_skb+0x318/0x740 net/core/skbuff.c:651
> alloc_skb include/linux/skbuff.h:1286 [inline]
> netlink_alloc_large_skb net/netlink/af_netlink.c:1214 [inline]
> netlink_sendmsg+0xb34/0x13d0 net/netlink/af_netlink.c:1885
> sock_sendmsg_nosec net/socket.c:730 [inline]
> __sock_sendmsg net/socket.c:745 [inline]
> ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2588
> ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2642
> __sys_sendmsg net/socket.c:2671 [inline]
> __do_sys_sendmsg net/socket.c:2680 [inline]
> __se_sys_sendmsg net/socket.c:2678 [inline]
> __x64_sys_sendmsg+0x307/0x490 net/socket.c:2678
> do_syscall_x64 arch/x86/entry/common.c:51 [inline]
> do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
>
> CPU: 1 PID: 5001 Comm: syz-executor300 Not tainted 6.6.0-syzkaller-12401-g8f6f76a6a29f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
>
> Fixes: 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR scheduling")
> Fixes: 49e7265fd098 ("net_sched: sch_fq: add TCA_FQ_WEIGHTS attribute")
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/sched/sch_fq.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 0fd18c344ab5ae6d53e12fc764c0506a2979b4c8..3a31c47fea9bd97d815f2624d926bf7be62387cd 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -919,14 +919,8 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
>         [TCA_FQ_TIMER_SLACK]            = { .type = NLA_U32 },
>         [TCA_FQ_HORIZON]                = { .type = NLA_U32 },
>         [TCA_FQ_HORIZON_DROP]           = { .type = NLA_U8 },
> -       [TCA_FQ_PRIOMAP]                = {
> -                       .type = NLA_BINARY,
> -                       .len = sizeof(struct tc_prio_qopt),
> -               },
> -       [TCA_FQ_WEIGHTS]                = {
> -                       .type = NLA_BINARY,
> -                       .len = FQ_BANDS * sizeof(s32),
> -               },
> +       [TCA_FQ_PRIOMAP]                = NLA_POLICY_EXACT_LEN(sizeof(struct tc_prio_qopt)),
> +       [TCA_FQ_WEIGHTS]                = NLA_POLICY_EXACT_LEN(FQ_BANDS * sizeof(s32)),
>  };
>
>  /* compress a u8 array with all elems <= 3 to an array of 2-bit fields */

Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>

cheers,
jamal
> --
> 2.42.0.869.gea05f2083d-goog
>

^ permalink raw reply

* Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator
From: David Ahern @ 2023-11-07 22:55 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Arnd Bergmann, Willem de Bruijn, Shuah Khan, Sumit Semwal,
	Christian König, Shakeel Butt, Jeroen de Borst,
	Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang
In-Reply-To: <CAHS8izOHYx+oYnzksUDrK1S0+6CdMJmirApntP5W862yFumezw@mail.gmail.com>

On 11/7/23 3:10 PM, Mina Almasry wrote:
> On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@kernel.org> wrote:
>>
>> On 11/5/23 7:44 PM, Mina Almasry wrote:
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index eeeda849115c..1c351c138a5b 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
>>>  };
>>>
>>>  #ifdef CONFIG_DMA_SHARED_BUFFER
>>> +struct page_pool_iov *
>>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
>>> +void netdev_free_devmem(struct page_pool_iov *ppiov);
>>
>> netdev_{alloc,free}_dmabuf?
>>
> 
> Can do.
> 
>> I say that because a dmabuf can be host memory, at least I am not aware
>> of a restriction that a dmabuf is device memory.
>>
> 
> In my limited experience dma-buf is generally device memory, and
> that's really its use case. CONFIG_UDMABUF is a driver that mocks
> dma-buf with a memfd which I think is used for testing. But I can do
> the rename, it's more clear anyway, I think.

config UDMABUF
        bool "userspace dmabuf misc driver"
        default n
        depends on DMA_SHARED_BUFFER
        depends on MEMFD_CREATE || COMPILE_TEST
        help
          A driver to let userspace turn memfd regions into dma-bufs.
          Qemu can use this to create host dmabufs for guest framebuffers.


Qemu is just a userspace process; it is no way a special one.

Treating host memory as a dmabuf should radically simplify the io_uring
extension of this set. That the io_uring set needs to dive into
page_pools is just wrong - complicating the design and code and pushing
io_uring into a realm it does not need to be involved in.

Most (all?) of this patch set can work with any memory; only device
memory is unreadable.



^ permalink raw reply

* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
From: kernel test robot @ 2023-11-07 22:45 UTC (permalink / raw)
  To: Faizal Rahim, Vladimir Oltean, Vinicius Costa Gomes,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: oe-kbuild-all, netdev, linux-kernel
In-Reply-To: <20231107112023.676016-5-faizal.abdul.rahim@linux.intel.com>

Hi Faizal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base:   net/main
patch link:    https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311080506.qMlPx2WA-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for 'get_interval' [-Wmissing-prototypes]
     227 | u32 get_interval(const struct sched_entry *entry,
         |     ^~~~~~~~~~~~
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for 'get_cycle_time' [-Wmissing-prototypes]
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         |     ^~~~~~~~~~~~~~


vim +/get_interval +227 net/sched/sch_taprio.c

   226	
 > 227	u32 get_interval(const struct sched_entry *entry,
   228			 const struct sched_gate_list *oper)
   229	{
   230		if (entry->correction_active)
   231			return entry->interval + oper->cycle_time_correction;
   232		else
   233			return entry->interval;
   234	}
   235	
 > 236	s64 get_cycle_time(const struct sched_gate_list *oper)
   237	{
   238		if (cycle_corr_active(oper->cycle_time_correction))
   239			return oper->cycle_time + oper->cycle_time_correction;
   240		else
   241			return oper->cycle_time;
   242	}
   243	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: Missing a write memory barrier in tls_init()
From: Sabrina Dubroca @ 2023-11-07 22:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Dae R. Jeong, borisp, john.fastabend, davem, edumazet, pabeni,
	netdev, linux-kernel, ywchoi
In-Reply-To: <20231106143659.12e0d126@kernel.org>

2023-11-06, 14:36:59 -0800, Jakub Kicinski wrote:
> On Thu, 2 Nov 2023 16:11:29 +0900 Dae R. Jeong wrote:
> > In addition, I believe the {tls_setsockopt, tls_getsockopt}
> > implementation is fine because of the address dependency. I think
> > load-load reordering is prohibited in this case so we don't need a
> > read barrier.
> 
> Sounds plausible, could you send a patch?
>
> The smb_wmb() would be better placed in tls_init(), IMHO.

Wouldn't it be enough to just move the rcu_assign_pointer after ctx is
fully initialized, ie just before update_sk_prot? also clearer wrt
RCU.
(and maybe get rid of tls_ctx_create and move all that into tls_init,
it's not much and we don't even set ctx->{tx,rx}_conf in there)

-- 
Sabrina


^ permalink raw reply

* Re: [PATCH net v2] net: phylink: initialize carrier state at creation
From: patchwork-bot+netdevbpf @ 2023-11-07 22:40 UTC (permalink / raw)
  To: Klaus Kudielka
  Cc: rmk+kernel, andrew, hkallweit1, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel, stable
In-Reply-To: <20231107174402.3590-1-klaus.kudielka@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Tue,  7 Nov 2023 18:44:02 +0100 you wrote:
> Background: Turris Omnia (Armada 385); eth2 (mvneta) connected to SFP bus;
> SFP module is present, but no fiber connected, so definitely no carrier.
> 
> After booting, eth2 is down, but netdev LED trigger surprisingly reports
> link active. Then, after "ip link set eth2 up", the link indicator goes
> away - as I would have expected it from the beginning.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: phylink: initialize carrier state at creation
    https://git.kernel.org/netdev/net/c/02d5fdbf4f2b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v3] tcp: Fix -Wc23-extensions in tcp_options_write()
From: patchwork-bot+netdevbpf @ 2023-11-07 22:30 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: edumazet, davem, dsahern, kuba, pabeni, ndesaulniers, trix,
	0x7f454c46, noureddine, hch, netdev, linux-kernel, llvm, patches
In-Reply-To: <20231106-tcp-ao-fix-label-in-compound-statement-warning-v3-1-b54a64602a85@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 06 Nov 2023 14:14:16 -0700 you wrote:
> Clang warns (or errors with CONFIG_WERROR=y) when CONFIG_TCP_AO is set:
> 
>   net/ipv4/tcp_output.c:663:2: error: label at end of compound statement is a C23 extension [-Werror,-Wc23-extensions]
>     663 |         }
>         |         ^
>   1 error generated.
> 
> [...]

Here is the summary with links:
  - [net,v3] tcp: Fix -Wc23-extensions in tcp_options_write()
    https://git.kernel.org/netdev/net/c/7425627b2b2c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 0/4] vsock: fix server prevents clients from reconnecting
From: patchwork-bot+netdevbpf @ 2023-11-07 22:30 UTC (permalink / raw)
  To: None
  Cc: luigi.leonardi, kvm, davem, edumazet, mst, imbrenda, kuba, asias,
	stefanha, pabeni, sgarzare, netdev, linux-kernel, virtualization
In-Reply-To: <20231103175551.41025-1-f.storniolo95@gmail.com>

Hello:

This series was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri,  3 Nov 2023 18:55:47 +0100 you wrote:
> From: Filippo Storniolo <f.storniolo95@gmail.com>
> 
> This patch series introduce fix and tests for the following vsock bug:
> If the same remote peer, using the same port, tries to connect
> to a server on a listening port more than once, the server will
> reject the connection, causing a "connection reset by peer"
> error on the remote peer. This is due to the presence of a
> dangling socket from a previous connection in both the connected
> and bound socket lists.
> The inconsistency of the above lists only occurs when the remote
> peer disconnects and the server remains active.
> This bug does not occur when the server socket is closed.
> 
> [...]

Here is the summary with links:
  - [net,1/4] vsock/virtio: remove socket from connected/bound list on shutdown
    https://git.kernel.org/netdev/net/c/3a5cc90a4d17
  - [net,2/4] test/vsock fix: add missing check on socket creation
    https://git.kernel.org/netdev/net/c/bfada5a7672f
  - [net,3/4] test/vsock: refactor vsock_accept
    https://git.kernel.org/netdev/net/c/84d5fb974131
  - [net,4/4] test/vsock: add dobule bind connect test
    https://git.kernel.org/netdev/net/c/d80f63f69025

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [RFC PATCH v3 09/12] net: add support for skbs with unreadable frags
From: Stanislav Fomichev @ 2023-11-07 22:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mina Almasry, Willem de Bruijn, David Ahern, netdev, linux-kernel,
	linux-arch, linux-kselftest, linux-media, dri-devel,
	linaro-mm-sig, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann,
	Shuah Khan, Sumit Semwal, Christian König, Shakeel Butt,
	Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn,
	Kaiyuan Zhang
In-Reply-To: <CANn89iJNR8bYYBO92=f5_2hFoTK8+giH11o-7NHURoahwvV11w@mail.gmail.com>

On 11/07, Eric Dumazet wrote:
> On Tue, Nov 7, 2023 at 10:05 PM Stanislav Fomichev <sdf@google.com> wrote:
> 
> >
> > I don't understand. We require an elaborate setup to receive devmem cmsgs,
> > why would some random application receive those?
> 
> 
> A TCP socket can receive 'valid TCP packets' from many different sources,
> especially with BPF hooks...
> 
> Think of a bonding setup, packets being mirrored by some switches or
> even from tc.
> 
> Better double check than be sorry.
> 
> We have not added a 5th component in the 4-tuple lookups, being "is
> this socket a devmem one".
> 
> A mix of regular/devmem skb is supported.

Can we mark a socket as devmem-only? Do we have any use-case for those
hybrid setups? Or, let me put it that way: do we expect API callers
to handle both linear and non-linear cases correctly?
As a consumer of the previous versions of these apis internally,
I find all those corner cases confusing :-( Hence trying to understand
whether we can make it a bit more rigid and properly defined upstream.

But going back to that MSG_SOCK_DEVMEM flag. If the application is
supposed to handle both linear and devmem chucks, why do we need
this extra MSG_SOCK_DEVMEM opt-in to signal that it's able to process
it? From Mina's reply, it seemed like MSG_SOCK_DEVMEM is there to
protect random applications that get misrouted devmem skb. I don't
see how returning EFAULT helps in that case.

^ permalink raw reply

* Re: [PATCH] s390/qeth: Fix typo 'weed' in comment
From: patchwork-bot+netdevbpf @ 2023-11-07 22:20 UTC (permalink / raw)
  To: Kuan-Wei Chiu
  Cc: wintera, wenjia, hca, gor, agordeev, borntraeger, svens,
	linux-s390, netdev, linux-kernel
In-Reply-To: <20231106222059.1475375-1-visitorckw@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Tue,  7 Nov 2023 06:20:59 +0800 you wrote:
> Replace 'weed' with 'we' in the comment.
> 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  drivers/s390/net/qeth_core_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - s390/qeth: Fix typo 'weed' in comment
    https://git.kernel.org/netdev/net/c/dbc9e341e365

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox