Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2 14/21] virtio_net: xsk: tx: virtnet_free_old_xmit() distinguishes xsk buffer
From: Xuan Zhuo @ 2023-11-10  5:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jason Wang, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf
In-Reply-To: <20231110003159-mutt-send-email-mst@kernel.org>

On Fri, 10 Nov 2023 00:32:50 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Nov 10, 2023 at 09:44:32AM +0800, Xuan Zhuo wrote:
> > On Thu, 9 Nov 2023 06:59:48 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Nov 09, 2023 at 07:16:08PM +0800, Xuan Zhuo wrote:
> > > > On Thu, 9 Nov 2023 06:11:49 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Tue, Nov 07, 2023 at 11:12:20AM +0800, Xuan Zhuo wrote:
> > > > > > virtnet_free_old_xmit distinguishes three type ptr(skb, xdp frame, xsk
> > > > > > buffer) by the last bits of the pointer.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio/virtio_net.h | 18 ++++++++++++++++--
> > > > > >  drivers/net/virtio/xsk.h        |  5 +++++
> > > > > >  2 files changed, 21 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > > > > index a431a2c1ee47..a13d6d301fdb 100644
> > > > > > --- a/drivers/net/virtio/virtio_net.h
> > > > > > +++ b/drivers/net/virtio/virtio_net.h
> > > > > > @@ -225,6 +225,11 @@ struct virtnet_info {
> > > > > >  	struct failover *failover;
> > > > > >  };
> > > > > >
> > > > > > +static inline bool virtnet_is_skb_ptr(void *ptr)
> > > > > > +{
> > > > > > +	return !((unsigned long)ptr & VIRTIO_XMIT_DATA_MASK);
> > > > > > +}
> > > > > > +
> > > > > >  static inline bool virtnet_is_xdp_frame(void *ptr)
> > > > > >  {
> > > > > >  	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> > > > > > @@ -235,6 +240,8 @@ static inline struct xdp_frame *virtnet_ptr_to_xdp(void *ptr)
> > > > > >  	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > > >  }
> > > > > >
> > > > > > +static inline u32 virtnet_ptr_to_xsk(void *ptr);
> > > > > > +
> > > > >
> > > > > I don't understand why you need this here.
> > > >
> > > > The below function virtnet_free_old_xmit needs this.
> > > >
> > > > Thanks.
> > >
> > > I don't understand why is virtnet_free_old_xmit inline, either.
> >
> > That is in the header file.
> >
>
> It does not belong there.


Do you mean virtnet_free_old_xmit?

That will be called by xsk.c. So I move it to the header file.

Thanks.


>
>
> > >
> > > > >
> > > > >
> > > > > >  static inline void *virtnet_sq_unmap(struct virtnet_sq *sq, void *data)
> > > > > >  {
> > > > > >  	struct virtnet_sq_dma *next, *head;
> > > > > > @@ -261,11 +268,12 @@ static inline void *virtnet_sq_unmap(struct virtnet_sq *sq, void *data)
> > > > > >  static inline void virtnet_free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > >  					 u64 *bytes, u64 *packets)
> > > > > >  {
> > > > > > +	unsigned int xsknum = 0;
> > > > > >  	unsigned int len;
> > > > > >  	void *ptr;
> > > > > >
> > > > > >  	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > > -		if (!virtnet_is_xdp_frame(ptr)) {
> > > > > > +		if (virtnet_is_skb_ptr(ptr)) {
> > > > > >  			struct sk_buff *skb;
> > > > > >
> > > > > >  			if (sq->do_dma)
> > > > > > @@ -277,7 +285,7 @@ static inline void virtnet_free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > >
> > > > > >  			*bytes += skb->len;
> > > > > >  			napi_consume_skb(skb, in_napi);
> > > > > > -		} else {
> > > > > > +		} else if (virtnet_is_xdp_frame(ptr)) {
> > > > > >  			struct xdp_frame *frame;
> > > > > >
> > > > > >  			if (sq->do_dma)
> > > > > > @@ -287,9 +295,15 @@ static inline void virtnet_free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > >
> > > > > >  			*bytes += xdp_get_frame_len(frame);
> > > > > >  			xdp_return_frame(frame);
> > > > > > +		} else {
> > > > > > +			*bytes += virtnet_ptr_to_xsk(ptr);
> > > > > > +			++xsknum;
> > > > > >  		}
> > > > > >  		(*packets)++;
> > > > > >  	}
> > > > > > +
> > > > > > +	if (xsknum)
> > > > > > +		xsk_tx_completed(sq->xsk.pool, xsknum);
> > > > > >  }
> > > > > >
> > > > > >  static inline bool virtnet_is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
> > > > > > diff --git a/drivers/net/virtio/xsk.h b/drivers/net/virtio/xsk.h
> > > > > > index 1bd19dcda649..7ebc9bda7aee 100644
> > > > > > --- a/drivers/net/virtio/xsk.h
> > > > > > +++ b/drivers/net/virtio/xsk.h
> > > > > > @@ -14,6 +14,11 @@ static inline void *virtnet_xsk_to_ptr(u32 len)
> > > > > >  	return (void *)(p | VIRTIO_XSK_FLAG);
> > > > > >  }
> > > > > >
> > > > > > +static inline u32 virtnet_ptr_to_xsk(void *ptr)
> > > > > > +{
> > > > > > +	return ((unsigned long)ptr) >> VIRTIO_XSK_FLAG_OFFSET;
> > > > > > +}
> > > > > > +
> > > > > >  int virtnet_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp);
> > > > > >  bool virtnet_xsk_xmit(struct virtnet_sq *sq, struct xsk_buff_pool *pool,
> > > > > >  		      int budget);
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > > >
> > >
>

^ permalink raw reply

* Re: [PATCH v2 2/2] tg3: Increment tx_dropped in tg3_tso_bug()
From: Michael Chan @ 2023-11-10  4:56 UTC (permalink / raw)
  To: alexey.pakhunov
  Cc: mchan, vincent.wong2, netdev, linux-kernel, siva.kallam, prashant
In-Reply-To: <20231110002340.3612515-2-alexey.pakhunov@spacex.com>

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

On Thu, Nov 9, 2023 at 4:24 PM <alexey.pakhunov@spacex.com> wrote:
>
> From: Alex Pakhunov <alexey.pakhunov@spacex.com>
>
> tg3_tso_bug() drops a packet if it cannot be segmented for any reason.
> The number of discarded frames should be incremented accordingly.
>
> Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
> Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>

Thanks.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]

^ permalink raw reply

* Re: [PATCH net 2/3] dpll: fix pin dump crash for rebound module
From: Jiri Pirko @ 2023-11-10  6:45 UTC (permalink / raw)
  To: Kubalewski, Arkadiusz
  Cc: netdev@vger.kernel.org, vadim.fedorenko@linux.dev,
	Michalik, Michal, Olech, Milena, pabeni@redhat.com,
	kuba@kernel.org
In-Reply-To: <DM6PR11MB4657DAC525E05B5DB72145119BAFA@DM6PR11MB4657.namprd11.prod.outlook.com>

Fri, Nov 10, 2023 at 12:32:21AM CET, arkadiusz.kubalewski@intel.com wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Sent: Thursday, November 9, 2023 7:06 PM
>>
>>Thu, Nov 09, 2023 at 05:30:20PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>>From: Jiri Pirko <jiri@resnulli.us>
>>>>Sent: Thursday, November 9, 2023 2:19 PM
>>>>
>>>>Thu, Nov 09, 2023 at 01:20:48PM CET, arkadiusz.kubalewski@intel.com
>>>>wrote:
>>>>>>From: Jiri Pirko <jiri@resnulli.us>
>>>>>>Sent: Wednesday, November 8, 2023 3:30 PM
>>>>>>
>>>>>>Wed, Nov 08, 2023 at 11:32:25AM CET, arkadiusz.kubalewski@intel.com
>>>>>>wrote:
>>>>>>>When a kernel module is unbound but the pin resources were not
>>>>>>>entirely
>>>>>>>freed (other kernel module instance have had kept the reference to
>>>>>>>that
>>>>>>>pin), and kernel module is again bound, the pin properties would not
>>>>>>>be
>>>>>>>updated (the properties are only assigned when memory for the pin is
>>>>>>>allocated), prop pointer still points to the kernel module memory of
>>>>>>>the kernel module which was deallocated on the unbind.
>>>>>>>
>>>>>>>If the pin dump is invoked in this state, the result is a kernel
>>>>>>>crash.
>>>>>>>Prevent the crash by storing persistent pin properties in dpll
>>>>>>>subsystem,
>>>>>>>copy the content from the kernel module when pin is allocated, instead
>>>>>>>of
>>>>>>>using memory of the kernel module.
>>>>>>>
>>>>>>>Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
>>>>>>>Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base
>>>>>>>functions")
>>>>>>>Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>>>>>>>---
>>>>>>> drivers/dpll/dpll_core.c    |  4 ++--
>>>>>>> drivers/dpll/dpll_core.h    |  4 ++--
>>>>>>> drivers/dpll/dpll_netlink.c | 28 ++++++++++++++--------------
>>>>>>> 3 files changed, 18 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>>diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>>>>index 3568149b9562..4077b562ba3b 100644
>>>>>>>--- a/drivers/dpll/dpll_core.c
>>>>>>>+++ b/drivers/dpll/dpll_core.c
>>>>>>>@@ -442,7 +442,7 @@ dpll_pin_alloc(u64 clock_id, u32 pin_idx, struct
>>>>>>>module *module,
>>>>>>> 		ret = -EINVAL;
>>>>>>> 		goto err;
>>>>>>> 	}
>>>>>>>-	pin->prop = prop;
>>>>>>>+	memcpy(&pin->prop, prop, sizeof(pin->prop));
>>>>>>
>>>>>>Odd, you don't care about the pointer within this structure?
>>>>>>
>>>>>
>>>>>Well, true. Need a fix.
>>>>>Wondering if copying idea is better than just assigning prop pointer on
>>>>>each call to dpll_pin_get(..) function (when pin already exists)?
>>>>
>>>>Not sure what do you mean. Examples please.
>>>>
>>>
>>>Sure,
>>>
>>>Basically this change:
>>>
>>>diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>index ae884b92d68c..06b72d5877c3 100644
>>>--- a/drivers/dpll/dpll_core.c
>>>+++ b/drivers/dpll/dpll_core.c
>>>@@ -483,6 +483,7 @@ dpll_pin_get(u64 clock_id, u32 pin_idx, struct module
>>>*module,
>>>                    pos->pin_idx == pin_idx &&
>>>                    pos->module == module) {
>>>                        ret = pos;
>>>+                       pos->prop = prop;
>>>                        refcount_inc(&ret->refcount);
>>>                        break;
>>>                }
>>>
>>>would replace whole of this patch changes, although seems a bit hacky.
>>
>>Or event better, as I suggested in the other patch reply, resolve this
>>internally in the driver registering things only when they are valid.
>>Much better then to hack anything in dpll core.
>>
>
>This approach seemed to me hacky, that is why started with coping the
>data.
>It is not about registering, rather about unregistering on driver
>unbind, which brakes things, and currently cannot be recovered in
>described case.

Sure it can. PF0 unbind-> internal notification-> PF1 unregisters all
related object. Very clean and simple.


>
>Thank you!
>Arkadiusz
>
>>
>>>
>>>Thank you!
>>>Arkadiusz
>>>
>>>>
>>>>>
>>>>>Thank you!
>>>>>Arkadiusz
>>>>>
>>>>>>
>>>>>>> 	refcount_set(&pin->refcount, 1);
>>>>>>> 	xa_init_flags(&pin->dpll_refs, XA_FLAGS_ALLOC);
>>>>>>> 	xa_init_flags(&pin->parent_refs, XA_FLAGS_ALLOC);
>>>>>>>@@ -634,7 +634,7 @@ int dpll_pin_on_pin_register(struct dpll_pin
>>>>>>>*parent,
>>>>>>>struct dpll_pin *pin,
>>>>>>> 	unsigned long i, stop;
>>>>>>> 	int ret;
>>>>>>>
>>>>>>>-	if (WARN_ON(parent->prop->type != DPLL_PIN_TYPE_MUX))
>>>>>>>+	if (WARN_ON(parent->prop.type != DPLL_PIN_TYPE_MUX))
>>>>>>> 		return -EINVAL;
>>>>>>>
>>>>>>> 	if (WARN_ON(!ops) ||
>>>>>>>diff --git a/drivers/dpll/dpll_core.h b/drivers/dpll/dpll_core.h
>>>>>>>index 5585873c5c1b..717f715015c7 100644
>>>>>>>--- a/drivers/dpll/dpll_core.h
>>>>>>>+++ b/drivers/dpll/dpll_core.h
>>>>>>>@@ -44,7 +44,7 @@ struct dpll_device {
>>>>>>>  * @module:		module of creator
>>>>>>>  * @dpll_refs:		hold referencees to dplls pin was registered
>>>>>>>with
>>>>>>>  * @parent_refs:	hold references to parent pins pin was registered
>>>>>>>with
>>>>>>>- * @prop:		pointer to pin properties given by registerer
>>>>>>>+ * @prop:		pin properties copied from the registerer
>>>>>>>  * @rclk_dev_name:	holds name of device when pin can recover
>>>>>>>clock
>>>>>>>from it
>>>>>>>  * @refcount:		refcount
>>>>>>>  **/
>>>>>>>@@ -55,7 +55,7 @@ struct dpll_pin {
>>>>>>> 	struct module *module;
>>>>>>> 	struct xarray dpll_refs;
>>>>>>> 	struct xarray parent_refs;
>>>>>>>-	const struct dpll_pin_properties *prop;
>>>>>>>+	struct dpll_pin_properties prop;
>>>>>>> 	refcount_t refcount;
>>>>>>> };
>>>>>>>
>>>>>>>diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>>>>>>>index 93fc6c4b8a78..963bbbbe6660 100644
>>>>>>>--- a/drivers/dpll/dpll_netlink.c
>>>>>>>+++ b/drivers/dpll/dpll_netlink.c
>>>>>>>@@ -278,17 +278,17 @@ dpll_msg_add_pin_freq(struct sk_buff *msg,
>>>>>>>struct
>>>>>>>dpll_pin *pin,
>>>>>>> 	if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY, sizeof(freq),
>>>>>>>&freq,
>>>>>>> 			  DPLL_A_PIN_PAD))
>>>>>>> 		return -EMSGSIZE;
>>>>>>>-	for (fs = 0; fs < pin->prop->freq_supported_num; fs++) {
>>>>>>>+	for (fs = 0; fs < pin->prop.freq_supported_num; fs++) {
>>>>>>> 		nest = nla_nest_start(msg,
>>>>>>>DPLL_A_PIN_FREQUENCY_SUPPORTED);
>>>>>>> 		if (!nest)
>>>>>>> 			return -EMSGSIZE;
>>>>>>>-		freq = pin->prop->freq_supported[fs].min;
>>>>>>>+		freq = pin->prop.freq_supported[fs].min;
>>>>>>> 		if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MIN,
>>>>>>>sizeof(freq),
>>>>>>> 				  &freq, DPLL_A_PIN_PAD)) {
>>>>>>> 			nla_nest_cancel(msg, nest);
>>>>>>> 			return -EMSGSIZE;
>>>>>>> 		}
>>>>>>>-		freq = pin->prop->freq_supported[fs].max;
>>>>>>>+		freq = pin->prop.freq_supported[fs].max;
>>>>>>> 		if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MAX,
>>>>>>>sizeof(freq),
>>>>>>> 				  &freq, DPLL_A_PIN_PAD)) {
>>>>>>> 			nla_nest_cancel(msg, nest);
>>>>>>>@@ -304,9 +304,9 @@ static bool dpll_pin_is_freq_supported(struct
>>>>>>>dpll_pin
>>>>>>>*pin, u32 freq)
>>>>>>> {
>>>>>>> 	int fs;
>>>>>>>
>>>>>>>-	for (fs = 0; fs < pin->prop->freq_supported_num; fs++)
>>>>>>>-		if (freq >= pin->prop->freq_supported[fs].min &&
>>>>>>>-		    freq <= pin->prop->freq_supported[fs].max)
>>>>>>>+	for (fs = 0; fs < pin->prop.freq_supported_num; fs++)
>>>>>>>+		if (freq >= pin->prop.freq_supported[fs].min &&
>>>>>>>+		    freq <= pin->prop.freq_supported[fs].max)
>>>>>>> 			return true;
>>>>>>> 	return false;
>>>>>>> }
>>>>>>>@@ -403,7 +403,7 @@ static int
>>>>>>> dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
>>>>>>> 		     struct netlink_ext_ack *extack)
>>>>>>> {
>>>>>>>-	const struct dpll_pin_properties *prop = pin->prop;
>>>>>>>+	const struct dpll_pin_properties *prop = &pin->prop;
>>>>>>> 	struct dpll_pin_ref *ref;
>>>>>>> 	int ret;
>>>>>>>
>>>>>>>@@ -696,7 +696,7 @@ dpll_pin_on_pin_state_set(struct dpll_pin *pin,
>>>>>>>u32
>>>>>>>parent_idx,
>>>>>>> 	int ret;
>>>>>>>
>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>> 		NL_SET_ERR_MSG(extack, "state changing is not allowed");
>>>>>>> 		return -EOPNOTSUPP;
>>>>>>> 	}
>>>>>>>@@ -732,7 +732,7 @@ dpll_pin_state_set(struct dpll_device *dpll,
>>>>>>>struct
>>>>>>>dpll_pin *pin,
>>>>>>> 	int ret;
>>>>>>>
>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>> 		NL_SET_ERR_MSG(extack, "state changing is not allowed");
>>>>>>> 		return -EOPNOTSUPP;
>>>>>>> 	}
>>>>>>>@@ -759,7 +759,7 @@ dpll_pin_prio_set(struct dpll_device *dpll, struct
>>>>>>>dpll_pin *pin,
>>>>>>> 	int ret;
>>>>>>>
>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_PRIORITY_CAN_CHANGE &
>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>> 		NL_SET_ERR_MSG(extack, "prio changing is not allowed");
>>>>>>> 		return -EOPNOTSUPP;
>>>>>>> 	}
>>>>>>>@@ -787,7 +787,7 @@ dpll_pin_direction_set(struct dpll_pin *pin,
>>>>>>>struct
>>>>>>>dpll_device *dpll,
>>>>>>> 	int ret;
>>>>>>>
>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_DIRECTION_CAN_CHANGE &
>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>> 		NL_SET_ERR_MSG(extack, "direction changing is not
>>>>>>>allowed");
>>>>>>> 		return -EOPNOTSUPP;
>>>>>>> 	}
>>>>>>>@@ -817,8 +817,8 @@ dpll_pin_phase_adj_set(struct dpll_pin *pin,
>>>>>>>struct
>>>>>>>nlattr *phase_adj_attr,
>>>>>>> 	int ret;
>>>>>>>
>>>>>>> 	phase_adj = nla_get_s32(phase_adj_attr);
>>>>>>>-	if (phase_adj > pin->prop->phase_range.max ||
>>>>>>>-	    phase_adj < pin->prop->phase_range.min) {
>>>>>>>+	if (phase_adj > pin->prop.phase_range.max ||
>>>>>>>+	    phase_adj < pin->prop.phase_range.min) {
>>>>>>> 		NL_SET_ERR_MSG_ATTR(extack, phase_adj_attr,
>>>>>>> 				    "phase adjust value not supported");
>>>>>>> 		return -EINVAL;
>>>>>>>@@ -999,7 +999,7 @@ dpll_pin_find(u64 clock_id, struct nlattr
>>>>>>>*mod_name_attr,
>>>>>>> 	unsigned long i;
>>>>>>>
>>>>>>> 	xa_for_each_marked(&dpll_pin_xa, i, pin, DPLL_REGISTERED) {
>>>>>>>-		prop = pin->prop;
>>>>>>>+		prop = &pin->prop;
>>>>>>> 		cid_match = clock_id ? pin->clock_id == clock_id : true;
>>>>>>> 		mod_match = mod_name_attr && module_name(pin->module) ?
>>>>>>> 			!nla_strcmp(mod_name_attr,
>>>>>>>--
>>>>>>>2.38.1
>>>>>>>
>>>>>
>

^ permalink raw reply

* Re: [PATCH net 3/3] dpll: fix register pin with unregistered parent pin
From: Jiri Pirko @ 2023-11-10  6:48 UTC (permalink / raw)
  To: Kubalewski, Arkadiusz
  Cc: Vadim Fedorenko, netdev@vger.kernel.org, Michalik, Michal,
	Olech, Milena, pabeni@redhat.com, kuba@kernel.org
In-Reply-To: <DM6PR11MB465763E8D261CEC6B215358C9BAFA@DM6PR11MB4657.namprd11.prod.outlook.com>

Fri, Nov 10, 2023 at 12:21:11AM CET, arkadiusz.kubalewski@intel.com wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Sent: Thursday, November 9, 2023 7:04 PM
>>
>>Thu, Nov 09, 2023 at 05:02:48PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>>Sent: Thursday, November 9, 2023 11:56 AM
>>>>To: Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>; Jiri Pirko
>>>>
>>>>On 09/11/2023 09:59, Kubalewski, Arkadiusz wrote:
>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>> Sent: Wednesday, November 8, 2023 4:08 PM
>>>>>>
>>>>>> Wed, Nov 08, 2023 at 11:32:26AM CET, arkadiusz.kubalewski@intel.com
>>>>>> wrote:
>>>>>>> In case of multiple kernel module instances using the same dpll
>>>>>>>device:
>>>>>>> if only one registers dpll device, then only that one can register
>>>>>>
>>>>>> They why you don't register in multiple instances? See mlx5 for a
>>>>>> reference.
>>>>>>
>>>>>
>>>>> Every registration requires ops, but for our case only PF0 is able to
>>>>> control dpll pins and device, thus only this can provide ops.
>>>>> Basically without PF0, dpll is not able to be controlled, as well
>>>>> as directly connected pins.
>>>>>
>>>>But why do you need other pins then, if FP0 doesn't exist?
>>>>
>>>
>>>In general we don't need them at that point, but this is a corner case,
>>>where users for some reason decided to unbind PF 0, and I treat this state
>>>as temporary, where dpll/pins controllability is temporarily broken.
>>
>>So resolve this broken situation internally in the driver, registering
>>things only in case PF0 is present. Some simple notification infra would
>>do. Don't drag this into the subsystem internals.
>>
>
>Thanks for your feedback, but this is already wrong advice.
>
>Our HW/FW is designed in different way than yours, it doesn't mean it is wrong.
>As you might recall from our sync meetings, the dpll subsystem is to unify
>approaches and reduce the code in the drivers, where your advice is exactly

No. Your driver knows when what objects are valid or not. Of a pin of
PF1 is not valid because the "master" PF0 is gone, it is responsibility
of your driver to resolve that. Don't bring this internal dependencies
to the dpll core please, does not make any sense to do so. Thanks!


>opposite, suggested fix would require to implement extra synchronization of the
>dpll and pin registration state between driver instances, most probably with
>use of additional modules like aux-bus or something similar, which was from the
>very beginning something we tried to avoid.
>Only ice uses the infrastructure of muxed pins, and this is broken as it
>doesn't allow unbind the driver which have registered dpll and pins without
>crashing the kernel, so a fix is required in dpll subsystem, not in the driver.
>
>Thank you!
>Arkadiusz
>
>>
>>>
>>>The dpll at that point is not registered, all the direct pins are also
>>>not registered, thus not available to the users.
>>>
>>>When I do dump at that point there are still 3 pins present, one for each
>>>PF, although they are all zombies - no parents as their parent pins are
>>>not
>>>registered (as the other patch [1/3] prevents dump of pin parent if the
>>>parent is not registered). Maybe we can remove the REGISTERED mark for all
>>>the muxed pins, if all their parents have been unregistered, so they won't
>>>be visible to the user at all. Will try to POC that.
>>>
>>>>>>
>>>>>>> directly connected pins with a dpll device. If unregistered parent
>>>>>>> determines if the muxed pin can be register with it or not, it forces
>>>>>>> serialized driver load order - first the driver instance which
>>>>>>> registers the direct pins needs to be loaded, then the other
>>>>>>> instances
>>>>>>> could register muxed type pins.
>>>>>>>
>>>>>>> Allow registration of a pin with a parent even if the parent was not
>>>>>>> yet registered, thus allow ability for unserialized driver instance
>>>>>>
>>>>>> Weird.
>>>>>>
>>>>>
>>>>> Yeah, this is issue only for MUX/parent pin part, couldn't find better
>>>>> way, but it doesn't seem to break things around..
>>>>>
>>>>
>>>>I just wonder how do you see the registration procedure? How can parent
>>>>pin exist if it's not registered? I believe you cannot get it through
>>>>DPLL API, then the only possible way is to create it within the same
>>>>driver code, which can be simply re-arranged. Am I wrong here?
>>>>
>>>
>>>By "parent exist" I mean the parent pin exist in the dpll subsystem
>>>(allocated on pins xa), but it doesn't mean it is available to the users,
>>>as it might not be registered with a dpll device.
>>>
>>>We have this 2 step init approach:
>>>1. dpll_pin_get(..) -> allocate new pin or increase reference if exist
>>>2.1. dpll_pin_register(..) -> register with a dpll device
>>>2.2. dpll_pin_on_pin_register -> register with a parent pin
>>>
>>>Basically:
>>>- PF 0 does 1 & 2.1 for all the direct inputs, and steps: 1 & 2.2 for its
>>>  recovery clock pin,
>>>- other PF's only do step 1 for the direct input pins (as they must get
>>>  reference to those in order to register recovery clock pin with them),
>>>  and steps: 1 & 2.2 for their recovery clock pin.
>>>
>>>
>>>Thank you!
>>>Arkadiusz
>>>
>>>>> Thank you!
>>>>> Arkadiusz
>>>>>
>>>>>>
>>>>>>> load order.
>>>>>>> Do not WARN_ON notification for unregistered pin, which can be
>>>>>>> invoked
>>>>>>> for described case, instead just return error.
>>>>>>>
>>>>>>> Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
>>>>>>> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base
>>>>>>> functions")
>>>>>>> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>>>>>>> ---
>>>>>>> drivers/dpll/dpll_core.c    | 4 ----
>>>>>>> drivers/dpll/dpll_netlink.c | 2 +-
>>>>>>> 2 files changed, 1 insertion(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>>>> index
>>>>>>> 4077b562ba3b..ae884b92d68c 100644
>>>>>>> --- a/drivers/dpll/dpll_core.c
>>>>>>> +++ b/drivers/dpll/dpll_core.c
>>>>>>> @@ -28,8 +28,6 @@ static u32 dpll_xa_id;
>>>>>>> 	WARN_ON_ONCE(!xa_get_mark(&dpll_device_xa, (d)->id,
>>>>>>> DPLL_REGISTERED))
>>>>>>> #define ASSERT_DPLL_NOT_REGISTERED(d)	\
>>>>>>> 	WARN_ON_ONCE(xa_get_mark(&dpll_device_xa, (d)->id,
>>>>>>> DPLL_REGISTERED))
>>>>>>> -#define ASSERT_PIN_REGISTERED(p)	\
>>>>>>> -	WARN_ON_ONCE(!xa_get_mark(&dpll_pin_xa, (p)->id,
>>>>>>> DPLL_REGISTERED))
>>>>>>>
>>>>>>> struct dpll_device_registration {
>>>>>>> 	struct list_head list;
>>>>>>> @@ -641,8 +639,6 @@ int dpll_pin_on_pin_register(struct dpll_pin
>>>>>>> *parent,
>>>>>>> struct dpll_pin *pin,
>>>>>>> 	    WARN_ON(!ops->state_on_pin_get) ||
>>>>>>> 	    WARN_ON(!ops->direction_get))
>>>>>>> 		return -EINVAL;
>>>>>>> -	if (ASSERT_PIN_REGISTERED(parent))
>>>>>>> -		return -EINVAL;
>>>>>>>
>>>>>>> 	mutex_lock(&dpll_lock);
>>>>>>> 	ret = dpll_xa_ref_pin_add(&pin->parent_refs, parent, ops,
>>>>>>> priv); diff
>>>>>>> --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>>>>>>> index
>>>>>>> 963bbbbe6660..ff430f43304f 100644
>>>>>>> --- a/drivers/dpll/dpll_netlink.c
>>>>>>> +++ b/drivers/dpll/dpll_netlink.c
>>>>>>> @@ -558,7 +558,7 @@ dpll_pin_event_send(enum dpll_cmd event, struct
>>>>>>> dpll_pin *pin)
>>>>>>> 	int ret = -ENOMEM;
>>>>>>> 	void *hdr;
>>>>>>>
>>>>>>> -	if (WARN_ON(!xa_get_mark(&dpll_pin_xa, pin->id,
>>>>>>> DPLL_REGISTERED)))
>>>>>>> +	if (!xa_get_mark(&dpll_pin_xa, pin->id, DPLL_REGISTERED))
>>>>>>> 		return -ENODEV;
>>>>>>>
>>>>>>> 	msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>>>>>>> --
>>>>>>> 2.38.1
>>>>>>>
>>>>
>>>

^ permalink raw reply

* Re: [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues
From: Jiri Pirko @ 2023-11-10  6:48 UTC (permalink / raw)
  To: Kubalewski, Arkadiusz
  Cc: Vadim Fedorenko, netdev@vger.kernel.org, Michalik, Michal,
	Olech, Milena, pabeni@redhat.com, kuba@kernel.org
In-Reply-To: <DM6PR11MB4657209FFC300E207E600F3F9BAFA@DM6PR11MB4657.namprd11.prod.outlook.com>

Fri, Nov 10, 2023 at 12:35:43AM CET, arkadiusz.kubalewski@intel.com wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Sent: Thursday, November 9, 2023 7:07 PM
>>
>>Thu, Nov 09, 2023 at 06:20:14PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>>Sent: Thursday, November 9, 2023 11:51 AM
>>>>
>>>>On 08/11/2023 10:32, Arkadiusz Kubalewski wrote:
>>>>> Fix issues when performing unordered unbind/bind of a kernel modules
>>>>> which are using a dpll device with DPLL_PIN_TYPE_MUX pins.
>>>>> Currently only serialized bind/unbind of such use case works, fix
>>>>> the issues and allow for unserialized kernel module bind order.
>>>>>
>>>>> The issues are observed on the ice driver, i.e.,
>>>>>
>>>>> $ echo 0000:af:00.0 > /sys/bus/pci/drivers/ice/unbind
>>>>> $ echo 0000:af:00.1 > /sys/bus/pci/drivers/ice/unbind
>>>>>
>>>>> results in:
>>>>>
>>>>> ice 0000:af:00.0: Removed PTP clock
>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>>>>> PF: supervisor read access in kernel mode
>>>>> PF: error_code(0x0000) - not-present page
>>>>> PGD 0 P4D 0
>>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>>> CPU: 7 PID: 71848 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5_next-
>>>>>queue_19th-Oct-2023-01625-g039e5d15e451 #1
>>>>> Hardware name: Intel Corporation S2600STB/S2600STB, BIOS
>>>>>SE5C620.86B.02.01.0008.031920191559 03/19/2019
>>>>> RIP: 0010:ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>>> Code: 41 57 4d 89 cf 41 56 41 55 4d 89 c5 41 54 55 48 89 f5 53 4c 8b 66
>>>>>08 48 89 cb 4d 8d b4 24 f0 49 00 00 4c 89 f7 e8 71 ec 1f c5 <0f> b6 5b
>>>>>10
>>>>>41 0f b6 84 24 30 4b 00 00 29 c3 41 0f b6 84 24 28 4b
>>>>> RSP: 0018:ffffc902b179fb60 EFLAGS: 00010246
>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>> RDX: ffff8882c1398000 RSI: ffff888c7435cc60 RDI: ffff888c7435cb90
>>>>> RBP: ffff888c7435cc60 R08: ffffc902b179fbb0 R09: 0000000000000000
>>>>> R10: ffff888ef1fc8050 R11: fffffffffff82700 R12: ffff888c743581a0
>>>>> R13: ffffc902b179fbb0 R14: ffff888c7435cb90 R15: 0000000000000000
>>>>> FS:  00007fdc7dae0740(0000) GS:ffff888c105c0000(0000)
>>>>>knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000000000000010 CR3: 0000000132c24002 CR4: 00000000007706e0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> PKRU: 55555554
>>>>> Call Trace:
>>>>>   <TASK>
>>>>>   ? __die+0x20/0x70
>>>>>   ? page_fault_oops+0x76/0x170
>>>>>   ? exc_page_fault+0x65/0x150
>>>>>   ? asm_exc_page_fault+0x22/0x30
>>>>>   ? ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>>>   ? __pfx_ice_dpll_rclk_state_on_pin_get+0x10/0x10 [ice]
>>>>>   dpll_msg_add_pin_parents+0x142/0x1d0
>>>>>   dpll_pin_event_send+0x7d/0x150
>>>>>   dpll_pin_on_pin_unregister+0x3f/0x100
>>>>>   ice_dpll_deinit_pins+0xa1/0x230 [ice]
>>>>>   ice_dpll_deinit+0x29/0xe0 [ice]
>>>>>   ice_remove+0xcd/0x200 [ice]
>>>>>   pci_device_remove+0x33/0xa0
>>>>>   device_release_driver_internal+0x193/0x200
>>>>>   unbind_store+0x9d/0xb0
>>>>>   kernfs_fop_write_iter+0x128/0x1c0
>>>>>   vfs_write+0x2bb/0x3e0
>>>>>   ksys_write+0x5f/0xe0
>>>>>   do_syscall_64+0x59/0x90
>>>>>   ? filp_close+0x1b/0x30
>>>>>   ? do_dup2+0x7d/0xd0
>>>>>   ? syscall_exit_work+0x103/0x130
>>>>>   ? syscall_exit_to_user_mode+0x22/0x40
>>>>>   ? do_syscall_64+0x69/0x90
>>>>>   ? syscall_exit_work+0x103/0x130
>>>>>   ? syscall_exit_to_user_mode+0x22/0x40
>>>>>   ? do_syscall_64+0x69/0x90
>>>>>   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>>> RIP: 0033:0x7fdc7d93eb97
>>>>> Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e
>>>>>fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0
>>>>>ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>>>>> RSP: 002b:00007fff2aa91028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>>>> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fdc7d93eb97
>>>>> RDX: 000000000000000d RSI: 00005644814ec9b0 RDI: 0000000000000001
>>>>> RBP: 00005644814ec9b0 R08: 0000000000000000 R09: 00007fdc7d9b14e0
>>>>> R10: 00007fdc7d9b13e0 R11: 0000000000000246 R12: 000000000000000d
>>>>> R13: 00007fdc7d9fb780 R14: 000000000000000d R15: 00007fdc7d9f69e0
>>>>>   </TASK>
>>>>> Modules linked in: uinput vfio_pci vfio_pci_core vfio_iommu_type1 vfio
>>>>>irqbypass ixgbevf snd_seq_dummy snd_hrtimer snd_seq snd_timer
>>>>>snd_seq_device snd soundcore overlay qrtr rfkill vfat fat xfs libcrc32c
>>>>>rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod
>>>>>target_core_mod
>>>>>ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr
>>>>>intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
>>>>>isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal
>>>>>intel_powerclamp coretemp irdma rapl intel_cstate ib_uverbs iTCO_wdt
>>>>>iTCO_vendor_support acpi_ipmi intel_uncore mei_me ipmi_si pcspkr
>>>>>i2c_i801
>>>>>ib_core mei ipmi_devintf intel_pch_thermal ioatdma i2c_smbus
>>>>>ipmi_msghandler lpc_ich joydev acpi_power_meter acpi_pad ext4 mbcache
>>>>>jbd2
>>>>>sd_mod t10_pi sg ast i2c_algo_bit drm_shmem_helper drm_kms_helper ice
>>>>>crct10dif_pclmul ixgbe crc32_pclmul drm crc32c_intel ahci i40e libahci
>>>>>ghash_clmulni_intel libata mdio dca gnss wmi fuse [last unloaded: iavf]
>>>>> CR2: 0000000000000010
>>>>>
>>>>> Arkadiusz Kubalewski (3):
>>>>>    dpll: fix pin dump crash after module unbind
>>>>>    dpll: fix pin dump crash for rebound module
>>>>>    dpll: fix register pin with unregistered parent pin
>>>>>
>>>>>   drivers/dpll/dpll_core.c    |  8 ++------
>>>>>   drivers/dpll/dpll_core.h    |  4 ++--
>>>>>   drivers/dpll/dpll_netlink.c | 37 ++++++++++++++++++++++--------------
>>>>>-
>>>>>   3 files changed, 26 insertions(+), 23 deletions(-)
>>>>>
>>>>
>>>>
>>>>I still don't get how can we end up with unregistered pin. And shouldn't
>>>>drivers do unregister of dpll/pin during release procedure? I thought it
>>>>was kind of agreement we reached while developing the subsystem.
>>>>
>>>
>>>It's definitely not about ending up with unregistered pins.
>>>
>>>Usually the driver is loaded for PF0, PF1, PF2, PF3 and unloaded in
>>>opposite
>>>order: PF3, PF2, PF1, PF0. And this is working without any issues.
>>
>>Please fix this in the driver.
>>
>
>Thanks for your feedback, but this is already wrong advice.
>
>Our HW/FW is designed in different way than yours, it doesn't mean it is wrong.
>As you might recall from our sync meetings, the dpll subsystem is to unify
>approaches and reduce the code in the drivers, where your advice is exactly
>opposite, suggested fix would require to implement extra synchronization of the
>dpll and pin registration state between driver instances, most probably with
>use of additional modules like aux-bus or something similar, which was from the
>very beginning something we tried to avoid.
>Only ice uses the infrastructure of muxed pins, and this is broken as it
>doesn't allow unbind the driver which have registered dpll and pins without
>crashing the kernel, so a fix is required in dpll subsystem, not in the driver.

I replied in the other patch thread.


>
>Thank you!
>Arkadiusz
>
>>
>>>
>>>Above crash is caused because of unordered driver unload, where dpll
>>>subsystem
>>>tries to notify muxed pin was deleted, but at that time the parent is
>>>already
>>>gone, thus data points to memory which is no longer available, thus crash
>>>happens when trying to dump pin parents.
>>>
>>>This series fixes all issues I could find connected to the situation where
>>>muxed-pins are trying to access their parents, when parent registerer was
>>>removed
>>>in the meantime.
>>>
>>>Thank you!
>>>Arkadiusz

^ permalink raw reply

* Re: [PATCH v2 1/2] tg3: Move the [rt]x_dropped counters to tg3_napi
From: Michael Chan @ 2023-11-10  4:55 UTC (permalink / raw)
  To: alexey.pakhunov
  Cc: mchan, vincent.wong2, netdev, linux-kernel, siva.kallam, prashant
In-Reply-To: <20231110002340.3612515-1-alexey.pakhunov@spacex.com>

[-- Attachment #1: Type: text/plain, Size: 434 bytes --]

On Thu, Nov 9, 2023 at 4:24 PM <alexey.pakhunov@spacex.com> wrote:
>
> From: Alex Pakhunov <alexey.pakhunov@spacex.com>
>
> This change moves [rt]x_dropped counters to tg3_napi so that they can be
> updated by a single writer, race-free.
>
> Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
> Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>

Thanks.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]

^ permalink raw reply

* Re: [RFC v1 3/8] vhost: Add 3 new uapi to support iommufd
From: Cindy Lu @ 2023-11-10  6:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, yi.l.liu, jgg, linux-kernel, virtualization, netdev
In-Reply-To: <CACGkMEtVqAYP3ec0+uxmdiOdXXevjy5S+7Vuc9s=PcS3ry0nCg@mail.gmail.com>

Jason Wang <jasowang@redhat.com> 于2023年11月10日周五 10:32写道:
>
> On Wed, Nov 8, 2023 at 3:09 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Nov 8, 2023 at 2:39 PM Cindy Lu <lulu@redhat.com> wrote:
> > >
> > > On Wed, Nov 8, 2023 at 11:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Nov 7, 2023 at 2:57 PM Cindy Lu <lulu@redhat.com> wrote:
> > > > >
> > > > > On Mon, Nov 6, 2023 at 3:30 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Sat, Nov 4, 2023 at 1:17 AM Cindy Lu <lulu@redhat.com> wrote:
> > > > > > >
> > > > > > > VHOST_VDPA_SET_IOMMU_FD: bind the device to iommufd device
> > > > > > >
> > > > > > > VDPA_DEVICE_ATTACH_IOMMUFD_AS: Attach a vdpa device to an iommufd
> > > > > > > address space specified by IOAS id.
> > > > > > >
> > > > > > > VDPA_DEVICE_DETACH_IOMMUFD_AS: Detach a vdpa device
> > > > > > > from the iommufd address space
> > > > > > >
> > > > > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > > > > ---
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > > > > > index f5c48b61ab62..07e1b2c443ca 100644
> > > > > > > --- a/include/uapi/linux/vhost.h
> > > > > > > +++ b/include/uapi/linux/vhost.h
> > > > > > > @@ -219,4 +219,70 @@
> > > > > > >   */
> > > > > > >  #define VHOST_VDPA_RESUME              _IO(VHOST_VIRTIO, 0x7E)
> > > > > > >
> > > > > > > +/* vhost_vdpa_set_iommufd
> > > > > > > + * Input parameters:
> > > > > > > + * @iommufd: file descriptor from /dev/iommu; pass -1 to unset
> > > > > > > + * @iommufd_ioasid: IOAS identifier returned from ioctl(IOMMU_IOAS_ALLOC)
> > > > > > > + * Output parameters:
> > > > > > > + * @out_dev_id: device identifier
> > > > > > > + */
> > > > > > > +struct vhost_vdpa_set_iommufd {
> > > > > > > +       __s32 iommufd;
> > > > > > > +       __u32 iommufd_ioasid;
> > > > > > > +       __u32 out_dev_id;
> > > > > > > +};
> > > > > > > +
> > > > > > > +#define VHOST_VDPA_SET_IOMMU_FD \
> > > > > > > +       _IOW(VHOST_VIRTIO, 0x7F, struct vhost_vdpa_set_iommufd)
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * VDPA_DEVICE_ATTACH_IOMMUFD_AS -
> > > > > > > + * _IOW(VHOST_VIRTIO, 0x7f, struct vdpa_device_attach_iommufd_as)
> > > > > > > + *
> > > > > > > + * Attach a vdpa device to an iommufd address space specified by IOAS
> > > > > > > + * id.
> > > > > > > + *
> > > > > > > + * Available only after a device has been bound to iommufd via
> > > > > > > + * VHOST_VDPA_SET_IOMMU_FD
> > > > > > > + *
> > > > > > > + * Undo by VDPA_DEVICE_DETACH_IOMMUFD_AS or device fd close.
> > > > > > > + *
> > > > > > > + * @argsz:     user filled size of this data.
> > > > > > > + * @flags:     must be 0.
> > > > > > > + * @ioas_id:   Input the target id which can represent an ioas
> > > > > > > + *             allocated via iommufd subsystem.
> > > > > > > + *
> > > > > > > + * Return: 0 on success, -errno on failure.
> > > > > > > + */
> > > > > > > +struct vdpa_device_attach_iommufd_as {
> > > > > > > +       __u32 argsz;
> > > > > > > +       __u32 flags;
> > > > > > > +       __u32 ioas_id;
> > > > > > > +};
> > > > > >
> > > > > > I think we need to map ioas to vDPA AS, so there should be an ASID
> > > > > > from the view of vDPA?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > The qemu will have a structure save and  maintain this information,So
> > > > > I didn't add this
> > > > >  in kernel,we can add this but maybe only for check?
> > > >
> > > > I meant for example, a simulator has two AS. How can we attach an ioas
> > > > to a specific AS with the above uAPI?
> > > >
> > > > Thank>
> > > this   __u32 ioas_id here is alloc from the iommufd system. maybe I
> > > need to change to new name iommuds_asid to
> > > make this more clear
> > > the process in qemu is
> > >
> > > 1) qemu want to use AS 0 (for example)
> > > 2) checking the existing asid. the asid 0 not used before
> > > 3 )alloc new asid from iommufd system, get new ioas_id (maybe 3 for example)
> > > qemu will save this relation 3<-->0 in the driver.
> > > 4) setting the ioctl VDPA_DEVICE_ATTACH_IOMMUFD_AS to attach new ASID
> > > to the kernel
> >
> > So if we want to map IOMMUFD AS 3 to VDPA AS 0, how can it be done?
> >
> > For example I didn't see a vDPA AS parameter in the above uAPI.
> >
> > vhost_vdpa_set_iommufd has iommufd_ioasid which is obviously not the vDPA AS.
> >
> > And ioas_id of vdpa_device_attach_iommufd_as (as you explained above)
> > is not vDPA AS.
>
> For example, the simulator/mlx5e has two ASes. It needs to know the
> mapping between vDPA AS and iommufd AS. Otherwise the translation will
> be problematic.
>
> Thanks
>
Got it, thanks Jason. I will re-write this part
Thanks
Cindy
> >
> > Thanks
> >
> >
> > > 5) while map the memory, qemu will use ASID 3 to map /umap
> > > and use ASID 0 for legacy mode map/umap
> > >
> > > So kernel here will not maintain the ioas_id from iommufd,
> > > and this also make the code strange since there will 2 different asid
> > > for the same AS, maybe we can save these information in the kernel
> > > Thanks
> > > cindy
> > > > > Thanks
> > > > > Cindy
> > > > > > > +
> > > > > > > +#define VDPA_DEVICE_ATTACH_IOMMUFD_AS \
> > > > > > > +       _IOW(VHOST_VIRTIO, 0x82, struct vdpa_device_attach_iommufd_as)
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * VDPA_DEVICE_DETACH_IOMMUFD_AS
> > > > > > > + *
> > > > > > > + * Detach a vdpa device from the iommufd address space it has been
> > > > > > > + * attached to. After it, device should be in a blocking DMA state.
> > > > > > > + *
> > > > > > > + * Available only after a device has been bound to iommufd via
> > > > > > > + * VHOST_VDPA_SET_IOMMU_FD
> > > > > > > + *
> > > > > > > + * @argsz:     user filled size of this data.
> > > > > > > + * @flags:     must be 0.
> > > > > > > + *
> > > > > > > + * Return: 0 on success, -errno on failure.
> > > > > > > + */
> > > > > > > +struct vdpa_device_detach_iommufd_as {
> > > > > > > +       __u32 argsz;
> > > > > > > +       __u32 flags;
> > > > > > > +};
> > > > > > > +
> > > > > > > +#define VDPA_DEVICE_DETACH_IOMMUFD_AS \
> > > > > > > +       _IOW(VHOST_VIRTIO, 0x83, struct vdpa_device_detach_iommufd_as)
> > > > > > > +
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.34.3
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


^ permalink raw reply

* Re: [RFC PATCH v3 08/12] net: support non paged skb frags
From: Mina Almasry @ 2023-11-10  4:06 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, linaro-mm-sig, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Jesper Dangaard Brouer, Ilias Apalodimas,
	Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan,
	Sumit Semwal, Christian König, Shakeel Butt, Jeroen de Borst,
	Praveen Kaligineedi
In-Reply-To: <adde2b31fdd9e7bb4a09f0073580b840bea0bab1.camel@redhat.com>

On Thu, Nov 9, 2023 at 1:15 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote:
> [...]
> > @@ -3421,7 +3446,7 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
> >   */
> >  static inline void __skb_frag_ref(skb_frag_t *frag)
> >  {
> > -     get_page(skb_frag_page(frag));
> > +     page_pool_page_get_many(frag->bv_page, 1);
>
> I guess the above needs #ifdef CONFIG_PAGE_POOL guards and explicit
> skb_frag_is_page_pool_iov() check ?
>

It doesn't actually. page_pool_page_* helpers are compiled in
regardless of CONFIG_PAGE_POOL, and handle both page_pool_iov* & page*
just fine (the checking happens inside the function).

You may yell at me that it's too confusing... I somewhat agree, but
I'm unsure of what is a better name or location for the helpers. The
helpers handle (page_pool_iov* || page*) gracefully, so they seem to
belong in the page pool for me, but it is indeed surprising/confusing
that these helpers are available even if !CONFIG_PAGE_POOL.

>
> Cheers,
>
> Paolo
>
>


-- 
Thanks,
Mina

^ permalink raw reply

* Re: [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice
From: Yunsheng Lin @ 2023-11-10  7:38 UTC (permalink / raw)
  To: Mina Almasry, Paolo Abeni
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, linaro-mm-sig, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Jesper Dangaard Brouer, Ilias Apalodimas,
	Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan,
	Sumit Semwal, Christian König, Shakeel Butt, Jeroen de Borst,
	Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang
In-Reply-To: <CAHS8izOGSE-PJ1uShkH_Mr6kUoC1EjM_9P1J=_TO6nLFP9K53Q@mail.gmail.com>

On 2023/11/10 10:59, Mina Almasry wrote:
> On Thu, Nov 9, 2023 at 12:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> I'm trying to wrap my head around the whole infra... the above line is
>> confusing. Why do you increment dma_addr? it will be re-initialized in
>> the next iteration.
>>
> 
> That is just a mistake, sorry. Will remove this increment.

You seems to be combining comments in different thread and replying in
one thread, I am not sure that is a good practice and I almost missed the
reply below as I don't seem to be cc'ed.

> 
> On Thu, Nov 9, 2023 at 1:29 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:> >>>
>>>>> gen_pool_destroy BUG_ON() if it's not empty at the time of destroying.
>>>>> Technically that should never happen, because
>>>>> __netdev_devmem_binding_free() should only be called when the refcount
>>>>> hits 0, so all the chunks have been freed back to the gen_pool. But,
>>>>> just in case, I don't want to crash the server just because I'm
>>>>> leaking a chunk... this is a bit of defensive programming that is
>>>>> typically frowned upon, but the behavior of gen_pool is so severe I
>>>>> think the WARN() + check is warranted here.
>>>>
>>>> It seems it is pretty normal for the above to happen nowadays because of
>>>> retransmits timeouts, NAPI defer schemes mentioned below:
>>>>
>>>> https://lkml.kernel.org/netdev/168269854650.2191653.8465259808498269815.stgit@firesoul/
>>>>
>>>> And currently page pool core handles that by using a workqueue.
>>>
>>> Forgive me but I'm not understanding the concern here.
>>>
>>> __netdev_devmem_binding_free() is called when binding->ref hits 0.
>>>
>>> binding->ref is incremented when an iov slice of the dma-buf is
>>> allocated, and decremented when an iov is freed. So,
>>> __netdev_devmem_binding_free() can't really be called unless all the
>>> iovs have been freed, and gen_pool_size() == gen_pool_avail(),
>>> regardless of what's happening on the page_pool side of things, right?
>>
>> I seems to misunderstand it. In that case, it seems to be about
>> defensive programming like other checking.
>>
>> By looking at it more closely, it seems napi_frag_unref() call
>> page_pool_page_put_many() directly, which means devmem seems to
>> be bypassing the napi_safe optimization.
>>
>> Can napi_frag_unref() reuse napi_pp_put_page() in order to reuse
>> the napi_safe optimization?
>>
> 
> I think it already does. page_pool_page_put_many() is only called if
> !recycle or !napi_pp_put_page(). In that case
> page_pool_page_put_many() is just a replacement for put_page(),
> because this 'page' may be an iov.

Is there a reason why not calling napi_pp_put_page() for devmem too
instead of calling page_pool_page_put_many()? mem provider has a
'release_page' ops, calling page_pool_page_put_many() directly here
seems to be bypassing the 'release_page' ops, which means devmem is
bypassing most of the main features of page pool.

As far as I can tell, the main features of page pool:
1. Allow lockless allocation and freeing in pool->alloc cache by
   utilizing NAPI non-concurrent context.
2. Allow concurrent allocation and freeing in pool->ring cache by
   utilizing ptr_ring.
3. Allow dma map/unmap and cache sync optimization.
4. Allow detailed stats logging and tracing.
5. Allow some bulk allocation and freeing.
6. support both skb packet and xdp frame.

I am wondering what is the main features that devmem is utilizing
by intergrating into page pool?

It seems the driver can just call netdev_alloc_devmem() and
napi_frag_unref() can call netdev_free_devmem() directly without
intergrating into page pool and it should just works too?

Maybe we should consider creating a new thin layer, in order to
demux to page pool, devmem or other mem type if my suggestion does
not work out too?

> 

^ permalink raw reply

* Re: [PATCH iwl-net] i40e: Fix max frame size check
From: Ivan Vecera @ 2023-11-10  8:10 UTC (permalink / raw)
  To: Jesse Brandeburg, intel-wired-lan
  Cc: Jacob Keller, Wojciech Drewek, Tony Nguyen, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	open list:NETWORKING DRIVERS
In-Reply-To: <d6114f2b-4ac8-ce15-19f6-483965daf3f3@intel.com>

On 08. 11. 23 21:38, Jesse Brandeburg wrote:
> On 11/8/2023 7:10 AM, Ivan Vecera wrote:
>> Commit 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") added
>> a check for port's MFS (max frame size) value. The value is stored
>> in PRTGL_SAH register for each physical port. According datasheet this
>> register is defined as:
>>
>> PRTGL_SAH[PRT]: (0x001E2140 + 0x4*PRT, PRT=0...3)
>>
>> where PRT is physical port number.
> 
> <trimmed lkml, and a couple of non-existent intel addresses>
> 
> Was there an actual problem here? I suspect if you read all the
> registers for each PF's BAR, you'll find that all 4 report the same,
> correct value, for the perspective of the BAR they're being read from.
> 
> The i40e hardware does this (somewhat non-obvious) for *lots* of port
> specific registers, and what happens is no matter which of the 4 you
> read the value from, you'll get the right "port specific" value. This is
> because the hardware designers decided to make a different "view" on the
> register set depending on which PF you access it from. The only time
> these offsets matter is when the part is in debug mode or when the
> firmware is reading the internal registers (from the internal firmware
> register space - which has no aliasing) that rely on the correct offset.
> 
> In this case, I think your change won't make any functional difference,
> but I can see why you want to make the change as the code doesn't match
> the datasheet's definition of the register.
> 
> That all said, unless you can prove a problem, I'm relatively sure that
> nothing is wrong here in functionality or code. And if you go this
> route, there might be a lot of other registers to fix that have the same
> aliasing.
> 
> I apologize for the confusing manuals and header file, it's complicated
> but in practice works really well. Effectively you can't read other
> port's registers by accident.
> 
> Here was my experiment showing the aliasing on X722. You'll note that
> the lower 16 bits are a MAC address that doesn't change no matter which
> register you read.
> 
> device 20:0.0
> 0x1e2140 == 0x26008245
> 0x1e2144 == 0x26008245
> 0x1e2148 == 0x26008245
> 0x1e214c == 0x26008245
> device 20:0.1
> 0x1e2140 == 0x26008345
> 0x1e2144 == 0x26008345
> 0x1e2148 == 0x26008345
> 0x1e214c == 0x26008345
> device 20:0.2
> 0x1e2140 == 0x26008445
> 0x1e2144 == 0x26008445
> 0x1e2148 == 0x26008445
> 0x1e214c == 0x26008445
> device 20:0.3
> 0x1e2140 == 0x26008545
> 0x1e2144 == 0x26008545
> 0x1e2148 == 0x26008545
> 0x1e214c == 0x26008545
> 
> lspci -d ::0200
> 20:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722
> for 10GBASE-T (rev 04)
> 20:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722
> for 10GBASE-T (rev 04)
> 20:00.2 Ethernet controller: Intel Corporation Ethernet Connection X722
> for 10GbE SFP+ (rev 04)
> 20:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722
> for 10GbE SFP+ (rev 04)
> 
> Hope this helps!

Hi Jesse,
thanks a lot for the explanation. I found this during preparation of my 
iwl-next stuff and found that variable 'i' is used inappropriately so I 
checked also the datasheet and found the definition of PRTGL_SAH 
register that is defined per port but I didn't know there is such 
aliasing for registers in PF BAR space.
I will send a new patch that at least fix the wrong usage of 'i' variable.

Thanks,
Ivan


^ permalink raw reply

* [PATCH iwl-net] i40e: Fix unexpected MFS warning message
From: Ivan Vecera @ 2023-11-10  8:12 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Jacob Keller, Wojciech Drewek, Jesse Brandeburg, Tony Nguyen,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Todd Fujinaka, Jeff Kirsher, open list:NETWORKING DRIVERS,
	open list

Commit 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") added
a warning message that reports unexpected size of port's MFS (max
frame size) value. This message use for the port number local
variable 'i' that is wrong.
In i40e_probe() this 'i' variable is used only to iterate VSIs
to find FDIR VSI:

<code>
...
/* if FDIR VSI was set up, start it now */
        for (i = 0; i < pf->num_alloc_vsi; i++) {
                if (pf->vsi[i] && pf->vsi[i]->type == I40E_VSI_FDIR) {
                        i40e_vsi_open(pf->vsi[i]);
                        break;
                }
        }
...
</code>

So the warning message use for the port number indext of FDIR VSI
if this exists or pf->num_alloc_vsi if not.

Fix the message by using 'pf->hw.port' for the port number.

Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e870afa0e401..c36535145a41 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -16225,7 +16225,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	       I40E_PRTGL_SAH_MFS_MASK) >> I40E_PRTGL_SAH_MFS_SHIFT;
 	if (val < MAX_FRAME_SIZE_DEFAULT)
 		dev_warn(&pdev->dev, "MFS for port %x has been set below the default: %x\n",
-			 i, val);
+			 pf->hw.port, val);
 
 	/* Add a filter to drop all Flow control frames from any VSI from being
 	 * transmitted. By doing so we stop a malicious VF from sending out
-- 
2.41.0


^ permalink raw reply related

* Re: [RFC net-next] net: don't dump stack on queue timeout
From: Daniele Palmas @ 2023-11-10  8:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, syzbot+d55372214aff0faa1f1f, jhs,
	xiyou.wangcong, jiri
In-Reply-To: <20231109071850.053f04a7@kernel.org>

Il giorno gio 9 nov 2023 alle ore 16:18 Jakub Kicinski
<kuba@kernel.org> ha scritto:
>
> On Thu, 9 Nov 2023 08:40:00 +0100 Daniele Palmas wrote:
> > For example, I can see the splat with MBIM modems when radio link
> > failure happens, something for which the host can't really do
> > anything. So, the main result of using WARN is to scare the users who
> > are not aware of the reasons behind it and create unneeded support
> > requests...
>
> Is it not possible to clear the carrier on downstream devices?
> Radio link failure sounds like carrier loss.

The problem is that the MBIM standard does not define the
CDC_NOTIFY_NETWORK_CONNECTION, so carrier loss detection is managed
through the indications on the control channel.

But the kernel is not aware of what's passing through the control
channel, so it's the userspace tool that should detect carrier loss,
disconnect the bearers and set the network interface down.

For example, ModemManager is capable of doing that, but the problem is
that usually the standard modem notifications on the control channel
arrive later than the splat: increasing watchdog_timeo does not seem
to me a good option, since the notification could arrive much later.

One possible solution is to have some proprietary notifications on the
control channel that detect RLF early and trigger the above described
process before the warn happens: by coincidence, I wrote a custom
ModemManager patch for this a few days ago
https://gitlab.freedesktop.org/dnlplm/ModemManager/-/commit/89ba8ab65d4bfbd4cf1ff11ed58c08b112aca80f

Regards,
Daniele

^ permalink raw reply

* Re: PRP with VLAN support - or how to contribute to a Linux network driver
From: Heiko Gerstung @ 2023-11-10  8:24 UTC (permalink / raw)
  To: Kristian Myrland Overskeid; +Cc: Andrew Lunn, netdev@vger.kernel.org
In-Reply-To: <CAGL4nSN0ZLHjARRRvS8Df8gLQLUb0ddiSJ5UfjNte0oX83VTOg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5397 bytes --]



Am 09.11.23, 13:20 schrieb "Kristian Myrland Overskeid" <koverskeid@gmail.com <mailto:koverskeid@gmail.com>>:


Hi Kristian,

> If you simply remove the line "dev->features |=
> NETIF_F_VLAN_CHALLENGED;" in hsr_device.c, the hsr-module is handling
> vlan frames without any further modifications. Unless you need to send
> vlan tagged supervision frames, I'm pretty sure the current
> implementation works just as fine with vlan as without.

thanks a lot for your respsonse - we tried removing the NETIF_F_VLAN_CHALLENGED flag and it did not work for us. We could set up a VLAN interface on top of the PRP interface, but traffic did not get through. I will retest this to make sure we did not overlook something.

> However, in my opinion, the discard-algorithm
> (hsr_register_frame_out() in hsr_framereg.c) is not made for switched
> networks. The problem with the current implementation is that it does
> not account for frames arriving in a different order than it was sent
> from a host. It simply checks if the sequence number of an arriving
> frame is higher than the previous one. If the network has some sort of
> priority, it must be expected that frames will arrive out of order
> when the network load is big enough for the switches to start
> prioritizing.
>
> My solution was to add a linked list to the node struct, one for each
> registered vlan id. It contains the vlan id, last sequence number and
> time. On reception of a vlan frame to the HSR_PT_MASTER, it retrieves
> the "node_seq_out" and "node_time_out" based on the vlan.

I agree that it would be necessary to handle frames arriving in a mixed up order.

> This works fine for me because all the prp nodes are connected to
> trunk ports and the switches are prioritizing frames based on the vlan
> tag.

> If a prp node is connected to an access port, but the network is using
> vlan priority, all sequence numbers and timestamps with the
> corresponding vlan id must be kept in a hashed list. The list must be
> regularly checked to remove elements before new frames with a wrapped
> around sequence number can arrive.

If I understand correctly, this would make the discard process more robust because in the access port scenario the frames can arrive in an even more mixed up order or do you mean that the access port is removing the VLAN tag and sends the frames untagged to the node?

> ZHAW School of Engineering has made a prp program for both linux user
> and kernel space with such a discard algorithm. The program does not
> compile without some modifications, but the discard algorithm works
> fine. The program is open source and can be found at
> https://github.com/ZHAW-InES-Team/sw_stack_prp1 <https://github.com/ZHAW-InES-Team/sw_stack_prp1>.


I will reach out to ZHAW and check with them if they would be willing to implement their more robust discard mechanism into the hsr module. The github repo has a note saying it moved to github.zhaw.ch which I cannot access as it requires credentials. 

Thanks again, 

Heiko





tor. 9. nov. 2023 kl. 09:08 skrev Heiko Gerstung <heiko.gerstung@meinberg.de <mailto:heiko.gerstung@meinberg.de>>:
>
> Am 08.11.23, 16:17 schrieb "Andrew Lunn" <andrew@lunn.ch <mailto:andrew@lunn.ch> <mailto:andrew@lunn.ch <mailto:andrew@lunn.ch>>>:
>
>
> >> I would like to discuss if it makes sense to remove the PRP
> >> functionality from the HSR driver (which is based on the bridge
> >> kernel module AFAICS) and instead implement PRP as a separate module
> >> (based on the Bonding driver, which would make more sense for PRP).
>
>
> > Seems like nobody replied. I don't know PRP or HSR, so i can only make
> > general remarks.
>
> Thank you for responding!
>
> > The general policy is that we don't rip something out and replace it
> > with new code. We try to improve what already exists to meet the
> > demands. This is partially because of backwards compatibility. There
> > could be users using the code as is. You cannot break that. Can you
> > step by step modify the current code to make use of bonding, and in
> > the process show you don't break the current use cases?
>
> Understood. I am not sure if we can change the hsr driver to gradually use a more bonding-like approach for prp and I believe this might not be required, as long as we can get VLAN support into it.
>
> > You also need to consider offloading to hardware. The bridge code has infrastructure
> > to offload. Does the bond driver? I've no idea about that.
>
> I do not know this either but would expect that the nature of bonding would not require offloading support (I do not see a potential for efficiency/performance improvements here, unlike HSR or PRP).
>
> >> Hoping for advise what the next steps could be. Happy to discuss
> >> this off-list as it may not be of interest for most people.
>
> > You probably want to get together with others who are interested in
> > PRP and HSR. linutronix, ti, microchip, etc.
>
> Yes, would love to do that and my hope was that I would find them here. I am not familiar with the "orphaned" status for a kernel module, but I would have expected that one of the mentioned parties interested in PRP/HSR would have adopted the module.
>
> > Andrew
>
> Again, thanks a lot for your comments and remarks, very useful.
>
> Heiko
>
>
>






[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 8165 bytes --]

^ permalink raw reply

* Re: [PATCH net] bonding: stop the device in bond_setup_by_slave()
From: Eric Dumazet @ 2023-11-10  8:38 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jay Vosburgh,
	Andy Gospodarek, netdev, eric.dumazet, syzbot
In-Reply-To: <ZU2nBgeOAZVs4KKJ@Laptop-X1>

On Fri, Nov 10, 2023 at 4:44 AM Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> On Thu, Nov 09, 2023 at 06:01:02PM +0000, Eric Dumazet wrote:
> > Commit 9eed321cde22 ("net: lapbether: only support ethernet devices")
> > has been able to keep syzbot away from net/lapb, until today.
> >
> > In the following splat [1], the issue is that a lapbether device has
> > been created on a bonding device without members. Then adding a non
> > ARPHRD_ETHER member forced the bonding master to change its type.
> >
> > The fix is to make sure we call dev_close() in bond_setup_by_slave()
> > so that the potential linked lapbether devices (or any other devices
> > having assumptions on the physical device) are removed.
> >
> > A similar bug has been addressed in commit 40baec225765
> > ("bonding: fix panic on non-ARPHRD_ETHER enslave failure")
> >
>
> Do we need also do this if the bond changed to ether device from other dev
> type? e.g.
>
>     if (slave_dev->type != ARPHRD_ETHER)
>             bond_setup_by_slave(bond_dev, slave_dev);
>     else
>             bond_ether_setup(bond_dev);

Hmmm... possibly, but as far as I know, nothing can be stacked on top of IPoIB

Note that another way to deal with this in a fine grained way is to
return NOTIFY_BAD
from NETDEV_PRE_TYPE_CHANGE event (vlan, macvlan, ipvlan ...)

^ permalink raw reply

* Re: [PATCH RESEND] ptp: Fixes a null pointer dereference in ptp_ioctl
From: patchwork-bot+netdevbpf @ 2023-11-10  8:40 UTC (permalink / raw)
  To: Yuran Pereira
  Cc: richardcochran, netdev, eadavis, davem, reibax, linux-kernel,
	linux-kernel-mentees, syzbot+8a78ecea7ac1a2ea26e5
In-Reply-To: <DB3PR10MB6835D68E7E632532155AE585E8A9A@DB3PR10MB6835.EURPRD10.PROD.OUTLOOK.COM>

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Wed,  8 Nov 2023 02:18:36 +0530 you wrote:
> Syzkaller found a null pointer dereference in ptp_ioctl
> originating from the lack of a null check for tsevq.
> 
> ```
> general protection fault, probably for non-canonical
> 	address 0xdffffc000000020b: 0000 [#1] PREEMPT SMP KASAN
> KASAN: probably user-memory-access in range
> 	[0x0000000000001058-0x000000000000105f]
> CPU: 0 PID: 5053 Comm: syz-executor353 Not tainted
> 	6.6.0-syzkaller-10396-g4652b8e4f3ff #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> 	BIOS Google 10/09/2023
> RIP: 0010:ptp_ioctl+0xcb7/0x1d10 drivers/ptp/ptp_chardev.c:476
> ...
> Call Trace:
>  <TASK>
>  posix_clock_ioctl+0xf8/0x160 kernel/time/posix-clock.c:86
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:871 [inline]
>  __se_sys_ioctl fs/ioctl.c:857 [inline]
>  __x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
>  do_syscall_x64 arch/x86/entry/common.c:51 [inline]
>  do_syscall_64+0x3f/0x110 arch/x86/entry/common.c:82
>  entry_SYSCALL_64_after_hwframe+0x63/0x6b
> ```
> 
> [...]

Here is the summary with links:
  - [RESEND] ptp: Fixes a null pointer dereference in ptp_ioctl
    https://git.kernel.org/netdev/net/c/8a4f030dbced

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v2] net: set SOCK_RCU_FREE before inserting socket into hashtable
From: patchwork-bot+netdevbpf @ 2023-11-10  8:40 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, davem, edumazet, kuba, pabeni
In-Reply-To: <20231108211325.18938-1-sdf@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Wed,  8 Nov 2023 13:13:25 -0800 you wrote:
> We've started to see the following kernel traces:
> 
>  WARNING: CPU: 83 PID: 0 at net/core/filter.c:6641 sk_lookup+0x1bd/0x1d0
> 
>  Call Trace:
>   <IRQ>
>   __bpf_skc_lookup+0x10d/0x120
>   bpf_sk_lookup+0x48/0xd0
>   bpf_sk_lookup_tcp+0x19/0x20
>   bpf_prog_<redacted>+0x37c/0x16a3
>   cls_bpf_classify+0x205/0x2e0
>   tcf_classify+0x92/0x160
>   __netif_receive_skb_core+0xe52/0xf10
>   __netif_receive_skb_list_core+0x96/0x2b0
>   napi_complete_done+0x7b5/0xb70
>   <redacted>_poll+0x94/0xb0
>   net_rx_action+0x163/0x1d70
>   __do_softirq+0xdc/0x32e
>   asm_call_irq_on_stack+0x12/0x20
>   </IRQ>
>   do_softirq_own_stack+0x36/0x50
>   do_softirq+0x44/0x70
> 
> [...]

Here is the summary with links:
  - [net,v2] net: set SOCK_RCU_FREE before inserting socket into hashtable
    https://git.kernel.org/netdev/net/c/871019b22d1b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH] MAINTAINERS: net: Update reviewers for TI's Ethernet drivers
From: Ravi Gunasekaran @ 2023-11-10  8:42 UTC (permalink / raw)
  To: netdev
  Cc: linux-omap, linux-kernel, s-vadapalli, nm, srk, rogerq,
	r-gunasekaran

Grygorii is no longer associated with TI and messages addressed to
him bounce.

Add Siddharth and myself as reviewers.

Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b151710e8c5..bd52c33bca02 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21775,7 +21775,8 @@ F:	Documentation/devicetree/bindings/counter/ti-eqep.yaml
 F:	drivers/counter/ti-eqep.c
 
 TI ETHERNET SWITCH DRIVER (CPSW)
-R:	Grygorii Strashko <grygorii.strashko@ti.com>
+R:	Siddharth Vadapalli <s-vadapalli@ti.com>
+R:	Ravi Gunasekaran <r-gunasekaran@ti.com>
 L:	linux-omap@vger.kernel.org
 L:	netdev@vger.kernel.org
 S:	Maintained
-- 
2.17.1


^ permalink raw reply related

* [PATCH] Fixes the null pointer deferences in nsim_bpf
From: Dipendra Khadka @ 2023-11-10  8:44 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: Dipendra Khadka, netdev, linux-kernel, linux-kernel-mentees,
	syzbot+44c2416196b7c607f226

Syzkaller found a null pointer dereference in nsim_bpf
originating from the lack of a null check for state.

This patch fixes the issue by adding a check for state
in two functions nsim_prog_set_loaded and nsim_setup_prog_hw_checks

Reported-by: syzbot+44c2416196b7c607f226@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com./bug?extid=44c2416196b7c607f226

Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com>
---
 drivers/net/netdevsim/bpf.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index f60eb97e3a62..e407efb0e3de 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -97,7 +97,8 @@ static void nsim_prog_set_loaded(struct bpf_prog *prog, bool loaded)
 		return;
 
 	state = prog->aux->offload->dev_priv;
-	state->is_loaded = loaded;
+	if (state)
+		state->is_loaded = loaded;
 }
 
 static int
@@ -317,10 +318,12 @@ nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
 	}
 
 	state = bpf->prog->aux->offload->dev_priv;
-	if (WARN_ON(strcmp(state->state, "xlated"))) {
-		NSIM_EA(bpf->extack, "offloading program in bad state");
-		return -EINVAL;
-	}
+	if (state) {
+		if (WARN_ON(strcmp(state->state, "xlated"))) {
+			NSIM_EA(bpf->extack, "offloading program in bad state");
+			return -EINVAL;
+		}
+	}
 	return 0;
 }
 
-- 
2.34.1


^ permalink raw reply related

* RE: [PATCH net 3/3] dpll: fix register pin with unregistered parent pin
From: Kubalewski, Arkadiusz @ 2023-11-10  8:50 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Vadim Fedorenko, netdev@vger.kernel.org, Michalik, Michal,
	Olech, Milena, pabeni@redhat.com, kuba@kernel.org
In-Reply-To: <ZU3SJdlOwoFOEavA@nanopsycho>

>From: Jiri Pirko <jiri@resnulli.us>
>Sent: Friday, November 10, 2023 7:48 AM
>
>Fri, Nov 10, 2023 at 12:21:11AM CET, arkadiusz.kubalewski@intel.com wrote:
>>>From: Jiri Pirko <jiri@resnulli.us>
>>>Sent: Thursday, November 9, 2023 7:04 PM
>>>
>>>Thu, Nov 09, 2023 at 05:02:48PM CET, arkadiusz.kubalewski@intel.com
>>>wrote:
>>>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>>>Sent: Thursday, November 9, 2023 11:56 AM
>>>>>To: Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>; Jiri Pirko
>>>>>
>>>>>On 09/11/2023 09:59, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>> Sent: Wednesday, November 8, 2023 4:08 PM
>>>>>>>
>>>>>>> Wed, Nov 08, 2023 at 11:32:26AM CET, arkadiusz.kubalewski@intel.com
>>>>>>> wrote:
>>>>>>>> In case of multiple kernel module instances using the same dpll
>>>>>>>>device:
>>>>>>>> if only one registers dpll device, then only that one can register
>>>>>>>
>>>>>>> They why you don't register in multiple instances? See mlx5 for a
>>>>>>> reference.
>>>>>>>
>>>>>>
>>>>>> Every registration requires ops, but for our case only PF0 is able to
>>>>>> control dpll pins and device, thus only this can provide ops.
>>>>>> Basically without PF0, dpll is not able to be controlled, as well
>>>>>> as directly connected pins.
>>>>>>
>>>>>But why do you need other pins then, if FP0 doesn't exist?
>>>>>
>>>>
>>>>In general we don't need them at that point, but this is a corner case,
>>>>where users for some reason decided to unbind PF 0, and I treat this
>>>>state
>>>>as temporary, where dpll/pins controllability is temporarily broken.
>>>
>>>So resolve this broken situation internally in the driver, registering
>>>things only in case PF0 is present. Some simple notification infra would
>>>do. Don't drag this into the subsystem internals.
>>>
>>
>>Thanks for your feedback, but this is already wrong advice.
>>
>>Our HW/FW is designed in different way than yours, it doesn't mean it is
>>wrong.
>>As you might recall from our sync meetings, the dpll subsystem is to unify
>>approaches and reduce the code in the drivers, where your advice is
>>exactly
>
>No. Your driver knows when what objects are valid or not. Of a pin of
>PF1 is not valid because the "master" PF0 is gone, it is responsibility
>of your driver to resolve that. Don't bring this internal dependencies
>to the dpll core please, does not make any sense to do so. Thanks!
>

No, a driver doesn't know it, those are separated instances, and you already
suggested to implement special notification bus in the driver.
This is not needed and prone for another errors. The dpll subsystem is here to
make driver life easier.

Thank you!
Arkadiusz

>
>>opposite, suggested fix would require to implement extra synchronization
>>of the
>>dpll and pin registration state between driver instances, most probably
>>with
>>use of additional modules like aux-bus or something similar, which was
>>from the
>>very beginning something we tried to avoid.
>>Only ice uses the infrastructure of muxed pins, and this is broken as it
>>doesn't allow unbind the driver which have registered dpll and pins
>>without
>>crashing the kernel, so a fix is required in dpll subsystem, not in the
>>driver.
>>
>>Thank you!
>>Arkadiusz
>>
>>>
>>>>
>>>>The dpll at that point is not registered, all the direct pins are also
>>>>not registered, thus not available to the users.
>>>>
>>>>When I do dump at that point there are still 3 pins present, one for
>>>>each
>>>>PF, although they are all zombies - no parents as their parent pins are
>>>>not
>>>>registered (as the other patch [1/3] prevents dump of pin parent if the
>>>>parent is not registered). Maybe we can remove the REGISTERED mark for
>>>>all
>>>>the muxed pins, if all their parents have been unregistered, so they
>>>>won't
>>>>be visible to the user at all. Will try to POC that.
>>>>
>>>>>>>
>>>>>>>> directly connected pins with a dpll device. If unregistered parent
>>>>>>>> determines if the muxed pin can be register with it or not, it
>>>>>>>>forces
>>>>>>>> serialized driver load order - first the driver instance which
>>>>>>>> registers the direct pins needs to be loaded, then the other
>>>>>>>> instances
>>>>>>>> could register muxed type pins.
>>>>>>>>
>>>>>>>> Allow registration of a pin with a parent even if the parent was
>>>>>>>>not
>>>>>>>> yet registered, thus allow ability for unserialized driver instance
>>>>>>>
>>>>>>> Weird.
>>>>>>>
>>>>>>
>>>>>> Yeah, this is issue only for MUX/parent pin part, couldn't find
>>>>>>better
>>>>>> way, but it doesn't seem to break things around..
>>>>>>
>>>>>
>>>>>I just wonder how do you see the registration procedure? How can parent
>>>>>pin exist if it's not registered? I believe you cannot get it through
>>>>>DPLL API, then the only possible way is to create it within the same
>>>>>driver code, which can be simply re-arranged. Am I wrong here?
>>>>>
>>>>
>>>>By "parent exist" I mean the parent pin exist in the dpll subsystem
>>>>(allocated on pins xa), but it doesn't mean it is available to the
>>>>users,
>>>>as it might not be registered with a dpll device.
>>>>
>>>>We have this 2 step init approach:
>>>>1. dpll_pin_get(..) -> allocate new pin or increase reference if exist
>>>>2.1. dpll_pin_register(..) -> register with a dpll device
>>>>2.2. dpll_pin_on_pin_register -> register with a parent pin
>>>>
>>>>Basically:
>>>>- PF 0 does 1 & 2.1 for all the direct inputs, and steps: 1 & 2.2 for
>>>>its
>>>>  recovery clock pin,
>>>>- other PF's only do step 1 for the direct input pins (as they must get
>>>>  reference to those in order to register recovery clock pin with them),
>>>>  and steps: 1 & 2.2 for their recovery clock pin.
>>>>
>>>>
>>>>Thank you!
>>>>Arkadiusz
>>>>
>>>>>> Thank you!
>>>>>> Arkadiusz
>>>>>>
>>>>>>>
>>>>>>>> load order.
>>>>>>>> Do not WARN_ON notification for unregistered pin, which can be
>>>>>>>> invoked
>>>>>>>> for described case, instead just return error.
>>>>>>>>
>>>>>>>> Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base
>>>>>>>>functions")
>>>>>>>> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base
>>>>>>>> functions")
>>>>>>>> Signed-off-by: Arkadiusz Kubalewski
>>>>>>>><arkadiusz.kubalewski@intel.com>
>>>>>>>> ---
>>>>>>>> drivers/dpll/dpll_core.c    | 4 ----
>>>>>>>> drivers/dpll/dpll_netlink.c | 2 +-
>>>>>>>> 2 files changed, 1 insertion(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>>>>> index
>>>>>>>> 4077b562ba3b..ae884b92d68c 100644
>>>>>>>> --- a/drivers/dpll/dpll_core.c
>>>>>>>> +++ b/drivers/dpll/dpll_core.c
>>>>>>>> @@ -28,8 +28,6 @@ static u32 dpll_xa_id;
>>>>>>>> 	WARN_ON_ONCE(!xa_get_mark(&dpll_device_xa, (d)->id,
>>>>>>>> DPLL_REGISTERED))
>>>>>>>> #define ASSERT_DPLL_NOT_REGISTERED(d)	\
>>>>>>>> 	WARN_ON_ONCE(xa_get_mark(&dpll_device_xa, (d)->id,
>>>>>>>> DPLL_REGISTERED))
>>>>>>>> -#define ASSERT_PIN_REGISTERED(p)	\
>>>>>>>> -	WARN_ON_ONCE(!xa_get_mark(&dpll_pin_xa, (p)->id,
>>>>>>>> DPLL_REGISTERED))
>>>>>>>>
>>>>>>>> struct dpll_device_registration {
>>>>>>>> 	struct list_head list;
>>>>>>>> @@ -641,8 +639,6 @@ int dpll_pin_on_pin_register(struct dpll_pin
>>>>>>>> *parent,
>>>>>>>> struct dpll_pin *pin,
>>>>>>>> 	    WARN_ON(!ops->state_on_pin_get) ||
>>>>>>>> 	    WARN_ON(!ops->direction_get))
>>>>>>>> 		return -EINVAL;
>>>>>>>> -	if (ASSERT_PIN_REGISTERED(parent))
>>>>>>>> -		return -EINVAL;
>>>>>>>>
>>>>>>>> 	mutex_lock(&dpll_lock);
>>>>>>>> 	ret = dpll_xa_ref_pin_add(&pin->parent_refs, parent, ops,
>>>>>>>> priv); diff
>>>>>>>> --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>>>>>>>> index
>>>>>>>> 963bbbbe6660..ff430f43304f 100644
>>>>>>>> --- a/drivers/dpll/dpll_netlink.c
>>>>>>>> +++ b/drivers/dpll/dpll_netlink.c
>>>>>>>> @@ -558,7 +558,7 @@ dpll_pin_event_send(enum dpll_cmd event, struct
>>>>>>>> dpll_pin *pin)
>>>>>>>> 	int ret = -ENOMEM;
>>>>>>>> 	void *hdr;
>>>>>>>>
>>>>>>>> -	if (WARN_ON(!xa_get_mark(&dpll_pin_xa, pin->id,
>>>>>>>> DPLL_REGISTERED)))
>>>>>>>> +	if (!xa_get_mark(&dpll_pin_xa, pin->id, DPLL_REGISTERED))
>>>>>>>> 		return -ENODEV;
>>>>>>>>
>>>>>>>> 	msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>>>>>>>> --
>>>>>>>> 2.38.1
>>>>>>>>
>>>>>
>>>>

^ permalink raw reply

* Re: [PATCH] MAINTAINERS: net: Update reviewers for TI's Ethernet drivers
From: Roger Quadros @ 2023-11-10  8:51 UTC (permalink / raw)
  To: Ravi Gunasekaran, netdev
  Cc: linux-omap, linux-kernel, s-vadapalli, nm, srk, Md Danish Anwar
In-Reply-To: <20231110084227.2616-1-r-gunasekaran@ti.com>

Hi Ravi,

On 10/11/2023 10:42, Ravi Gunasekaran wrote:
> Grygorii is no longer associated with TI and messages addressed to
> him bounce.
> 
> Add Siddharth and myself as reviewers.
> 
> Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
> ---
>  MAINTAINERS | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7b151710e8c5..bd52c33bca02 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -21775,7 +21775,8 @@ F:	Documentation/devicetree/bindings/counter/ti-eqep.yaml
>  F:	drivers/counter/ti-eqep.c
>  
>  TI ETHERNET SWITCH DRIVER (CPSW)
> -R:	Grygorii Strashko <grygorii.strashko@ti.com>
> +R:	Siddharth Vadapalli <s-vadapalli@ti.com>
> +R:	Ravi Gunasekaran <r-gunasekaran@ti.com>

Could you please add me as Reviewer as well. Thanks!

>  L:	linux-omap@vger.kernel.org
>  L:	netdev@vger.kernel.org
>  S:	Maintained

> F:      drivers/net/ethernet/ti/cpsw*
> F:      drivers/net/ethernet/ti/davinci*

What about am65-cpsw*?

And drivers/net/ethernet/ti/icssg/*

I also see 

OMAP GPIO DRIVER
M:      Grygorii Strashko <grygorii.strashko@ti.com>

Maybe a separate patch to remove the invalid email-id?

-- 
cheers,
-roger

^ permalink raw reply

* Re: [PATCH] Fixes the null pointer deferences in nsim_bpf
From: Eric Dumazet @ 2023-11-10  8:52 UTC (permalink / raw)
  To: Dipendra Khadka
  Cc: kuba, davem, pabeni, netdev, linux-kernel, linux-kernel-mentees,
	syzbot+44c2416196b7c607f226
In-Reply-To: <20231110084425.2123-1-kdipendra88@gmail.com>

On Fri, Nov 10, 2023 at 9:45 AM Dipendra Khadka <kdipendra88@gmail.com> wrote:
>
> Syzkaller found a null pointer dereference in nsim_bpf
> originating from the lack of a null check for state.
>
> This patch fixes the issue by adding a check for state
> in two functions nsim_prog_set_loaded and nsim_setup_prog_hw_checks
>
> Reported-by: syzbot+44c2416196b7c607f226@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com./bug?extid=44c2416196b7c607f226

Please add a Fixes: tag, and remove this empty line, thanks.

>
> Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com>
> ---
>  drivers/net/netdevsim/bpf.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
> index f60eb97e3a62..e407efb0e3de 100644
> --- a/drivers/net/netdevsim/bpf.c
> +++ b/drivers/net/netdevsim/bpf.c
> @@ -97,7 +97,8 @@ static void nsim_prog_set_loaded(struct bpf_prog *prog, bool loaded)
>                 return;
>
>         state = prog->aux->offload->dev_priv;
> -       state->is_loaded = loaded;
> +       if (state)
> +               state->is_loaded = loaded;
>  }
>
>  static int
> @@ -317,10 +318,12 @@ nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
>         }
>
>         state = bpf->prog->aux->offload->dev_priv;
> -       if (WARN_ON(strcmp(state->state, "xlated"))) {
> -               NSIM_EA(bpf->extack, "offloading program in bad state");
> -               return -EINVAL;
> -       }
> +       if (state) {
> +               if (WARN_ON(strcmp(state->state, "xlated"))) {
> +                       NSIM_EA(bpf->extack, "offloading program in bad state");
> +                       return -EINVAL;
> +               }
> +       }
>         return 0;
>  }
>
> --
> 2.34.1
>

^ permalink raw reply

* Re: [PATCH v2 1/3] net: phy: at803x: add QCA8084 ethernet phy support
From: Jie Luo @ 2023-11-10  8:53 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <20231109101618.009efb45@fedora>



On 11/9/2023 5:16 PM, Maxime Chevallier wrote:
> Hello,
> 
> On Thu, 9 Nov 2023 16:32:36 +0800
> Jie Luo <quic_luoj@quicinc.com> wrote:
> 
> [...]
> 
>>> What I understand from this is that this PHY can be used either as a
>>> switch, in which case port 4 would be connected to the host interface
>>> at up to 2.5G, or as a quad-phy, but since it uses QUSGMII the link
>>> speed would be limited to 1G per-port, is that correct ?
>>
>> When the PHY works on the interface mode QUSGMII for quad-phy, all 4
>> PHYs can support to the max link speed 2.5G, actually the PHY can
>> support to max link speed 2.5G for all supported interface modes
>> including qusgmii and sgmii.
> 
> I'm a bit confused then, as the USGMII spec says that Quad USGMII really
> is for quad 10/100/1000 speeds, using 10b/8b encoding.
> 
> Aren't you using the USXGMII mode instead, which can convey 4 x 2.5Gbps
>   with 66b/64b encoding ?
> 
> Thanks,
> 
> Maxime

Hi Maxime,
Yes, for quad PHY mode, it is using 66b/64 encoding.

it seems that PHY_INTERFACE_MODE_USXGMII is for single port,
so i take the interface name PHY_INTERFACE_MODE_QUSGMII for
quad PHYs here.

can we apply PHY_INTERFACE_MODE_USXGMII to quad PHYs in this
case(qca8084 quad PHY mode)?

Thanks,
Jie.

^ permalink raw reply

* Re: [RFC PATCH 3/8] net: pcs: pcs-mtk-lynxi: use 2500Base-X without AN
From: Russell King (Oracle) @ 2023-11-10  8:54 UTC (permalink / raw)
  To: Daniel Golle
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
	Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
	Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
	AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
	Alexander Couzens, Philipp Zabel, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, linux-phy
In-Reply-To: <091e466912f1333bb76d23e95dc6019c9b71645f.1699565880.git.daniel@makrotopia.org>

On Thu, Nov 09, 2023 at 09:51:22PM +0000, Daniel Golle wrote:
> Using 2500Base-T SFP modules e.g. on the BananaPi R3 requires manually
> disabling auto-negotiation, e.g. using ethtool. While a proper fix
> using SFP quirks is being discussed upstream, bring a work-around to
> restore user experience to what it was before the switch to the
> dedicated SGMII PCS driver.

No.

> @@ -129,7 +138,8 @@ static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int neg_mode,
>  	if (neg_mode & PHYLINK_PCS_NEG_INBAND)
>  		sgm_mode |= SGMII_REMOTE_FAULT_DIS;
>  
> -	if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED) {
> +	if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED &&
> +	    interface != PHY_INTERFACE_MODE_2500BASEX) {
>  		if (interface == PHY_INTERFACE_MODE_SGMII)
>  			sgm_mode |= SGMII_SPEED_DUPLEX_AN;
>  		bmcr = BMCR_ANENABLE;

Phylink is asking you to have inband enabled. If inband needs to be
disabled, then we need to arrange for phylink to pass
PHYLINK_PCS_NEG_INBAND_DISABLED.

Please don't hack special handling and behaviour into drivers.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* RE: [PATCH net 2/3] dpll: fix pin dump crash for rebound module
From: Kubalewski, Arkadiusz @ 2023-11-10  9:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev@vger.kernel.org, vadim.fedorenko@linux.dev,
	Michalik, Michal, Olech, Milena, pabeni@redhat.com,
	kuba@kernel.org
In-Reply-To: <ZU3RlSmInnoXufxf@nanopsycho>

>From: Jiri Pirko <jiri@resnulli.us>
>Sent: Friday, November 10, 2023 7:46 AM
>
>Fri, Nov 10, 2023 at 12:32:21AM CET, arkadiusz.kubalewski@intel.com wrote:
>>>From: Jiri Pirko <jiri@resnulli.us>
>>>Sent: Thursday, November 9, 2023 7:06 PM
>>>
>>>Thu, Nov 09, 2023 at 05:30:20PM CET, arkadiusz.kubalewski@intel.com
>>>wrote:
>>>>>From: Jiri Pirko <jiri@resnulli.us>
>>>>>Sent: Thursday, November 9, 2023 2:19 PM
>>>>>
>>>>>Thu, Nov 09, 2023 at 01:20:48PM CET, arkadiusz.kubalewski@intel.com
>>>>>wrote:
>>>>>>>From: Jiri Pirko <jiri@resnulli.us>
>>>>>>>Sent: Wednesday, November 8, 2023 3:30 PM
>>>>>>>
>>>>>>>Wed, Nov 08, 2023 at 11:32:25AM CET, arkadiusz.kubalewski@intel.com
>>>>>>>wrote:
>>>>>>>>When a kernel module is unbound but the pin resources were not
>>>>>>>>entirely
>>>>>>>>freed (other kernel module instance have had kept the reference to
>>>>>>>>that
>>>>>>>>pin), and kernel module is again bound, the pin properties would not
>>>>>>>>be
>>>>>>>>updated (the properties are only assigned when memory for the pin is
>>>>>>>>allocated), prop pointer still points to the kernel module memory of
>>>>>>>>the kernel module which was deallocated on the unbind.
>>>>>>>>
>>>>>>>>If the pin dump is invoked in this state, the result is a kernel
>>>>>>>>crash.
>>>>>>>>Prevent the crash by storing persistent pin properties in dpll
>>>>>>>>subsystem,
>>>>>>>>copy the content from the kernel module when pin is allocated,
>>>>>>>>instead
>>>>>>>>of
>>>>>>>>using memory of the kernel module.
>>>>>>>>
>>>>>>>>Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base
>>>>>>>>functions")
>>>>>>>>Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base
>>>>>>>>functions")
>>>>>>>>Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>>>>>>>>---
>>>>>>>> drivers/dpll/dpll_core.c    |  4 ++--
>>>>>>>> drivers/dpll/dpll_core.h    |  4 ++--
>>>>>>>> drivers/dpll/dpll_netlink.c | 28 ++++++++++++++--------------
>>>>>>>> 3 files changed, 18 insertions(+), 18 deletions(-)
>>>>>>>>
>>>>>>>>diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>>>>>index 3568149b9562..4077b562ba3b 100644
>>>>>>>>--- a/drivers/dpll/dpll_core.c
>>>>>>>>+++ b/drivers/dpll/dpll_core.c
>>>>>>>>@@ -442,7 +442,7 @@ dpll_pin_alloc(u64 clock_id, u32 pin_idx, struct
>>>>>>>>module *module,
>>>>>>>> 		ret = -EINVAL;
>>>>>>>> 		goto err;
>>>>>>>> 	}
>>>>>>>>-	pin->prop = prop;
>>>>>>>>+	memcpy(&pin->prop, prop, sizeof(pin->prop));
>>>>>>>
>>>>>>>Odd, you don't care about the pointer within this structure?
>>>>>>>
>>>>>>
>>>>>>Well, true. Need a fix.
>>>>>>Wondering if copying idea is better than just assigning prop pointer
>>>>>>on
>>>>>>each call to dpll_pin_get(..) function (when pin already exists)?
>>>>>
>>>>>Not sure what do you mean. Examples please.
>>>>>
>>>>
>>>>Sure,
>>>>
>>>>Basically this change:
>>>>
>>>>diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>index ae884b92d68c..06b72d5877c3 100644
>>>>--- a/drivers/dpll/dpll_core.c
>>>>+++ b/drivers/dpll/dpll_core.c
>>>>@@ -483,6 +483,7 @@ dpll_pin_get(u64 clock_id, u32 pin_idx, struct
>>>>module
>>>>*module,
>>>>                    pos->pin_idx == pin_idx &&
>>>>                    pos->module == module) {
>>>>                        ret = pos;
>>>>+                       pos->prop = prop;
>>>>                        refcount_inc(&ret->refcount);
>>>>                        break;
>>>>                }
>>>>
>>>>would replace whole of this patch changes, although seems a bit hacky.
>>>
>>>Or event better, as I suggested in the other patch reply, resolve this
>>>internally in the driver registering things only when they are valid.
>>>Much better then to hack anything in dpll core.
>>>
>>
>>This approach seemed to me hacky, that is why started with coping the
>>data.
>>It is not about registering, rather about unregistering on driver
>>unbind, which brakes things, and currently cannot be recovered in
>>described case.
>
>Sure it can. PF0 unbind-> internal notification-> PF1 unregisters all
>related object. Very clean and simple.
>

What you are suggesting is:
- special purpose bus in the driver,
- dpll-related,
- not needed,
- prone for errors.

The dpll subsystem is here to make driver life easier.

Thank you!
Arkadiusz

>
>>
>>Thank you!
>>Arkadiusz
>>
>>>
>>>>
>>>>Thank you!
>>>>Arkadiusz
>>>>
>>>>>
>>>>>>
>>>>>>Thank you!
>>>>>>Arkadiusz
>>>>>>
>>>>>>>
>>>>>>>> 	refcount_set(&pin->refcount, 1);
>>>>>>>> 	xa_init_flags(&pin->dpll_refs, XA_FLAGS_ALLOC);
>>>>>>>> 	xa_init_flags(&pin->parent_refs, XA_FLAGS_ALLOC);
>>>>>>>>@@ -634,7 +634,7 @@ int dpll_pin_on_pin_register(struct dpll_pin
>>>>>>>>*parent,
>>>>>>>>struct dpll_pin *pin,
>>>>>>>> 	unsigned long i, stop;
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>>-	if (WARN_ON(parent->prop->type != DPLL_PIN_TYPE_MUX))
>>>>>>>>+	if (WARN_ON(parent->prop.type != DPLL_PIN_TYPE_MUX))
>>>>>>>> 		return -EINVAL;
>>>>>>>>
>>>>>>>> 	if (WARN_ON(!ops) ||
>>>>>>>>diff --git a/drivers/dpll/dpll_core.h b/drivers/dpll/dpll_core.h
>>>>>>>>index 5585873c5c1b..717f715015c7 100644
>>>>>>>>--- a/drivers/dpll/dpll_core.h
>>>>>>>>+++ b/drivers/dpll/dpll_core.h
>>>>>>>>@@ -44,7 +44,7 @@ struct dpll_device {
>>>>>>>>  * @module:		module of creator
>>>>>>>>  * @dpll_refs:		hold referencees to dplls pin was registered
>>>>>>>>with
>>>>>>>>  * @parent_refs:	hold references to parent pins pin was
>registered
>>>>>>>>with
>>>>>>>>- * @prop:		pointer to pin properties given by registerer
>>>>>>>>+ * @prop:		pin properties copied from the registerer
>>>>>>>>  * @rclk_dev_name:	holds name of device when pin can recover
>>>>>>>>clock
>>>>>>>>from it
>>>>>>>>  * @refcount:		refcount
>>>>>>>>  **/
>>>>>>>>@@ -55,7 +55,7 @@ struct dpll_pin {
>>>>>>>> 	struct module *module;
>>>>>>>> 	struct xarray dpll_refs;
>>>>>>>> 	struct xarray parent_refs;
>>>>>>>>-	const struct dpll_pin_properties *prop;
>>>>>>>>+	struct dpll_pin_properties prop;
>>>>>>>> 	refcount_t refcount;
>>>>>>>> };
>>>>>>>>
>>>>>>>>diff --git a/drivers/dpll/dpll_netlink.c
>b/drivers/dpll/dpll_netlink.c
>>>>>>>>index 93fc6c4b8a78..963bbbbe6660 100644
>>>>>>>>--- a/drivers/dpll/dpll_netlink.c
>>>>>>>>+++ b/drivers/dpll/dpll_netlink.c
>>>>>>>>@@ -278,17 +278,17 @@ dpll_msg_add_pin_freq(struct sk_buff *msg,
>>>>>>>>struct
>>>>>>>>dpll_pin *pin,
>>>>>>>> 	if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY, sizeof(freq),
>>>>>>>>&freq,
>>>>>>>> 			  DPLL_A_PIN_PAD))
>>>>>>>> 		return -EMSGSIZE;
>>>>>>>>-	for (fs = 0; fs < pin->prop->freq_supported_num; fs++) {
>>>>>>>>+	for (fs = 0; fs < pin->prop.freq_supported_num; fs++) {
>>>>>>>> 		nest = nla_nest_start(msg,
>>>>>>>>DPLL_A_PIN_FREQUENCY_SUPPORTED);
>>>>>>>> 		if (!nest)
>>>>>>>> 			return -EMSGSIZE;
>>>>>>>>-		freq = pin->prop->freq_supported[fs].min;
>>>>>>>>+		freq = pin->prop.freq_supported[fs].min;
>>>>>>>> 		if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MIN,
>>>>>>>>sizeof(freq),
>>>>>>>> 				  &freq, DPLL_A_PIN_PAD)) {
>>>>>>>> 			nla_nest_cancel(msg, nest);
>>>>>>>> 			return -EMSGSIZE;
>>>>>>>> 		}
>>>>>>>>-		freq = pin->prop->freq_supported[fs].max;
>>>>>>>>+		freq = pin->prop.freq_supported[fs].max;
>>>>>>>> 		if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MAX,
>>>>>>>>sizeof(freq),
>>>>>>>> 				  &freq, DPLL_A_PIN_PAD)) {
>>>>>>>> 			nla_nest_cancel(msg, nest);
>>>>>>>>@@ -304,9 +304,9 @@ static bool dpll_pin_is_freq_supported(struct
>>>>>>>>dpll_pin
>>>>>>>>*pin, u32 freq)
>>>>>>>> {
>>>>>>>> 	int fs;
>>>>>>>>
>>>>>>>>-	for (fs = 0; fs < pin->prop->freq_supported_num; fs++)
>>>>>>>>-		if (freq >= pin->prop->freq_supported[fs].min &&
>>>>>>>>-		    freq <= pin->prop->freq_supported[fs].max)
>>>>>>>>+	for (fs = 0; fs < pin->prop.freq_supported_num; fs++)
>>>>>>>>+		if (freq >= pin->prop.freq_supported[fs].min &&
>>>>>>>>+		    freq <= pin->prop.freq_supported[fs].max)
>>>>>>>> 			return true;
>>>>>>>> 	return false;
>>>>>>>> }
>>>>>>>>@@ -403,7 +403,7 @@ static int
>>>>>>>> dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
>>>>>>>> 		     struct netlink_ext_ack *extack)
>>>>>>>> {
>>>>>>>>-	const struct dpll_pin_properties *prop = pin->prop;
>>>>>>>>+	const struct dpll_pin_properties *prop = &pin->prop;
>>>>>>>> 	struct dpll_pin_ref *ref;
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>>@@ -696,7 +696,7 @@ dpll_pin_on_pin_state_set(struct dpll_pin *pin,
>>>>>>>>u32
>>>>>>>>parent_idx,
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
>>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>>> 		NL_SET_ERR_MSG(extack, "state changing is not allowed");
>>>>>>>> 		return -EOPNOTSUPP;
>>>>>>>> 	}
>>>>>>>>@@ -732,7 +732,7 @@ dpll_pin_state_set(struct dpll_device *dpll,
>>>>>>>>struct
>>>>>>>>dpll_pin *pin,
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
>>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>>> 		NL_SET_ERR_MSG(extack, "state changing is not allowed");
>>>>>>>> 		return -EOPNOTSUPP;
>>>>>>>> 	}
>>>>>>>>@@ -759,7 +759,7 @@ dpll_pin_prio_set(struct dpll_device *dpll,
>struct
>>>>>>>>dpll_pin *pin,
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_PRIORITY_CAN_CHANGE &
>>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>>> 		NL_SET_ERR_MSG(extack, "prio changing is not allowed");
>>>>>>>> 		return -EOPNOTSUPP;
>>>>>>>> 	}
>>>>>>>>@@ -787,7 +787,7 @@ dpll_pin_direction_set(struct dpll_pin *pin,
>>>>>>>>struct
>>>>>>>>dpll_device *dpll,
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>> 	if (!(DPLL_PIN_CAPABILITIES_DIRECTION_CAN_CHANGE &
>>>>>>>>-	      pin->prop->capabilities)) {
>>>>>>>>+	      pin->prop.capabilities)) {
>>>>>>>> 		NL_SET_ERR_MSG(extack, "direction changing is not
>>>>>>>>allowed");
>>>>>>>> 		return -EOPNOTSUPP;
>>>>>>>> 	}
>>>>>>>>@@ -817,8 +817,8 @@ dpll_pin_phase_adj_set(struct dpll_pin *pin,
>>>>>>>>struct
>>>>>>>>nlattr *phase_adj_attr,
>>>>>>>> 	int ret;
>>>>>>>>
>>>>>>>> 	phase_adj = nla_get_s32(phase_adj_attr);
>>>>>>>>-	if (phase_adj > pin->prop->phase_range.max ||
>>>>>>>>-	    phase_adj < pin->prop->phase_range.min) {
>>>>>>>>+	if (phase_adj > pin->prop.phase_range.max ||
>>>>>>>>+	    phase_adj < pin->prop.phase_range.min) {
>>>>>>>> 		NL_SET_ERR_MSG_ATTR(extack, phase_adj_attr,
>>>>>>>> 				    "phase adjust value not supported");
>>>>>>>> 		return -EINVAL;
>>>>>>>>@@ -999,7 +999,7 @@ dpll_pin_find(u64 clock_id, struct nlattr
>>>>>>>>*mod_name_attr,
>>>>>>>> 	unsigned long i;
>>>>>>>>
>>>>>>>> 	xa_for_each_marked(&dpll_pin_xa, i, pin, DPLL_REGISTERED) {
>>>>>>>>-		prop = pin->prop;
>>>>>>>>+		prop = &pin->prop;
>>>>>>>> 		cid_match = clock_id ? pin->clock_id == clock_id : true;
>>>>>>>> 		mod_match = mod_name_attr && module_name(pin->module) ?
>>>>>>>> 			!nla_strcmp(mod_name_attr,
>>>>>>>>--
>>>>>>>>2.38.1
>>>>>>>>
>>>>>>
>>

^ permalink raw reply

* RE: [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues
From: Kubalewski, Arkadiusz @ 2023-11-10  9:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Vadim Fedorenko, netdev@vger.kernel.org, Michalik, Michal,
	Olech, Milena, pabeni@redhat.com, kuba@kernel.org
In-Reply-To: <ZU3SSClU6Ijn3M7B@nanopsycho>

>From: Jiri Pirko <jiri@resnulli.us>
>Sent: Friday, November 10, 2023 7:49 AM
>
>Fri, Nov 10, 2023 at 12:35:43AM CET, arkadiusz.kubalewski@intel.com wrote:
>>>From: Jiri Pirko <jiri@resnulli.us>
>>>Sent: Thursday, November 9, 2023 7:07 PM
>>>
>>>Thu, Nov 09, 2023 at 06:20:14PM CET, arkadiusz.kubalewski@intel.com
>>>wrote:
>>>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>>>Sent: Thursday, November 9, 2023 11:51 AM
>>>>>
>>>>>On 08/11/2023 10:32, Arkadiusz Kubalewski wrote:
>>>>>> Fix issues when performing unordered unbind/bind of a kernel modules
>>>>>> which are using a dpll device with DPLL_PIN_TYPE_MUX pins.
>>>>>> Currently only serialized bind/unbind of such use case works, fix
>>>>>> the issues and allow for unserialized kernel module bind order.
>>>>>>
>>>>>> The issues are observed on the ice driver, i.e.,
>>>>>>
>>>>>> $ echo 0000:af:00.0 > /sys/bus/pci/drivers/ice/unbind
>>>>>> $ echo 0000:af:00.1 > /sys/bus/pci/drivers/ice/unbind
>>>>>>
>>>>>> results in:
>>>>>>
>>>>>> ice 0000:af:00.0: Removed PTP clock
>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>>>>>> PF: supervisor read access in kernel mode
>>>>>> PF: error_code(0x0000) - not-present page
>>>>>> PGD 0 P4D 0
>>>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>> CPU: 7 PID: 71848 Comm: bash Kdump: loaded Not tainted 6.6.0-
>>>>>>rc5_next-
>>>>>>queue_19th-Oct-2023-01625-g039e5d15e451 #1
>>>>>> Hardware name: Intel Corporation S2600STB/S2600STB, BIOS
>>>>>>SE5C620.86B.02.01.0008.031920191559 03/19/2019
>>>>>> RIP: 0010:ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>>>> Code: 41 57 4d 89 cf 41 56 41 55 4d 89 c5 41 54 55 48 89 f5 53 4c 8b
>>>>>>66
>>>>>>08 48 89 cb 4d 8d b4 24 f0 49 00 00 4c 89 f7 e8 71 ec 1f c5 <0f> b6 5b
>>>>>>10
>>>>>>41 0f b6 84 24 30 4b 00 00 29 c3 41 0f b6 84 24 28 4b
>>>>>> RSP: 0018:ffffc902b179fb60 EFLAGS: 00010246
>>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>>> RDX: ffff8882c1398000 RSI: ffff888c7435cc60 RDI: ffff888c7435cb90
>>>>>> RBP: ffff888c7435cc60 R08: ffffc902b179fbb0 R09: 0000000000000000
>>>>>> R10: ffff888ef1fc8050 R11: fffffffffff82700 R12: ffff888c743581a0
>>>>>> R13: ffffc902b179fbb0 R14: ffff888c7435cb90 R15: 0000000000000000
>>>>>> FS:  00007fdc7dae0740(0000) GS:ffff888c105c0000(0000)
>>>>>>knlGS:0000000000000000
>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> CR2: 0000000000000010 CR3: 0000000132c24002 CR4: 00000000007706e0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> PKRU: 55555554
>>>>>> Call Trace:
>>>>>>   <TASK>
>>>>>>   ? __die+0x20/0x70
>>>>>>   ? page_fault_oops+0x76/0x170
>>>>>>   ? exc_page_fault+0x65/0x150
>>>>>>   ? asm_exc_page_fault+0x22/0x30
>>>>>>   ? ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>>>>   ? __pfx_ice_dpll_rclk_state_on_pin_get+0x10/0x10 [ice]
>>>>>>   dpll_msg_add_pin_parents+0x142/0x1d0
>>>>>>   dpll_pin_event_send+0x7d/0x150
>>>>>>   dpll_pin_on_pin_unregister+0x3f/0x100
>>>>>>   ice_dpll_deinit_pins+0xa1/0x230 [ice]
>>>>>>   ice_dpll_deinit+0x29/0xe0 [ice]
>>>>>>   ice_remove+0xcd/0x200 [ice]
>>>>>>   pci_device_remove+0x33/0xa0
>>>>>>   device_release_driver_internal+0x193/0x200
>>>>>>   unbind_store+0x9d/0xb0
>>>>>>   kernfs_fop_write_iter+0x128/0x1c0
>>>>>>   vfs_write+0x2bb/0x3e0
>>>>>>   ksys_write+0x5f/0xe0
>>>>>>   do_syscall_64+0x59/0x90
>>>>>>   ? filp_close+0x1b/0x30
>>>>>>   ? do_dup2+0x7d/0xd0
>>>>>>   ? syscall_exit_work+0x103/0x130
>>>>>>   ? syscall_exit_to_user_mode+0x22/0x40
>>>>>>   ? do_syscall_64+0x69/0x90
>>>>>>   ? syscall_exit_work+0x103/0x130
>>>>>>   ? syscall_exit_to_user_mode+0x22/0x40
>>>>>>   ? do_syscall_64+0x69/0x90
>>>>>>   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>>>> RIP: 0033:0x7fdc7d93eb97
>>>>>> Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f
>>>>>>1e
>>>>>>fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00
>>>>>>f0
>>>>>>ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>>>>>> RSP: 002b:00007fff2aa91028 EFLAGS: 00000246 ORIG_RAX:
>>>>>>0000000000000001
>>>>>> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fdc7d93eb97
>>>>>> RDX: 000000000000000d RSI: 00005644814ec9b0 RDI: 0000000000000001
>>>>>> RBP: 00005644814ec9b0 R08: 0000000000000000 R09: 00007fdc7d9b14e0
>>>>>> R10: 00007fdc7d9b13e0 R11: 0000000000000246 R12: 000000000000000d
>>>>>> R13: 00007fdc7d9fb780 R14: 000000000000000d R15: 00007fdc7d9f69e0
>>>>>>   </TASK>
>>>>>> Modules linked in: uinput vfio_pci vfio_pci_core vfio_iommu_type1
>>>>>>vfio
>>>>>>irqbypass ixgbevf snd_seq_dummy snd_hrtimer snd_seq snd_timer
>>>>>>snd_seq_device snd soundcore overlay qrtr rfkill vfat fat xfs
>>>>>>libcrc32c
>>>>>>rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod
>>>>>>target_core_mod
>>>>>>ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm
>>>>>>intel_rapl_msr
>>>>>>intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
>>>>>>isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal
>>>>>>intel_powerclamp coretemp irdma rapl intel_cstate ib_uverbs iTCO_wdt
>>>>>>iTCO_vendor_support acpi_ipmi intel_uncore mei_me ipmi_si pcspkr
>>>>>>i2c_i801
>>>>>>ib_core mei ipmi_devintf intel_pch_thermal ioatdma i2c_smbus
>>>>>>ipmi_msghandler lpc_ich joydev acpi_power_meter acpi_pad ext4 mbcache
>>>>>>jbd2
>>>>>>sd_mod t10_pi sg ast i2c_algo_bit drm_shmem_helper drm_kms_helper ice
>>>>>>crct10dif_pclmul ixgbe crc32_pclmul drm crc32c_intel ahci i40e libahci
>>>>>>ghash_clmulni_intel libata mdio dca gnss wmi fuse [last unloaded:
>>>>>>iavf]
>>>>>> CR2: 0000000000000010
>>>>>>
>>>>>> Arkadiusz Kubalewski (3):
>>>>>>    dpll: fix pin dump crash after module unbind
>>>>>>    dpll: fix pin dump crash for rebound module
>>>>>>    dpll: fix register pin with unregistered parent pin
>>>>>>
>>>>>>   drivers/dpll/dpll_core.c    |  8 ++------
>>>>>>   drivers/dpll/dpll_core.h    |  4 ++--
>>>>>>   drivers/dpll/dpll_netlink.c | 37 ++++++++++++++++++++++------------
>>>>>>--
>>>>>>-
>>>>>>   3 files changed, 26 insertions(+), 23 deletions(-)
>>>>>>
>>>>>
>>>>>
>>>>>I still don't get how can we end up with unregistered pin. And
>>>>>shouldn't
>>>>>drivers do unregister of dpll/pin during release procedure? I thought
>>>>>it
>>>>>was kind of agreement we reached while developing the subsystem.
>>>>>
>>>>
>>>>It's definitely not about ending up with unregistered pins.
>>>>
>>>>Usually the driver is loaded for PF0, PF1, PF2, PF3 and unloaded in
>>>>opposite
>>>>order: PF3, PF2, PF1, PF0. And this is working without any issues.
>>>
>>>Please fix this in the driver.
>>>
>>
>>Thanks for your feedback, but this is already wrong advice.
>>
>>Our HW/FW is designed in different way than yours, it doesn't mean it is
>>wrong.
>>As you might recall from our sync meetings, the dpll subsystem is to unify
>>approaches and reduce the code in the drivers, where your advice is
>>exactly
>>opposite, suggested fix would require to implement extra synchronization
>>of the
>>dpll and pin registration state between driver instances, most probably
>>with
>>use of additional modules like aux-bus or something similar, which was
>>from the
>>very beginning something we tried to avoid.
>>Only ice uses the infrastructure of muxed pins, and this is broken as it
>>doesn't allow unbind the driver which have registered dpll and pins
>>without
>>crashing the kernel, so a fix is required in dpll subsystem, not in the
>>driver.
>
>I replied in the other patch thread.
>

Yes, so did I.
But what is the reason you have moved the discussion from the other thread
into this one?

Thank you!
Arkadiusz

>
>>
>>Thank you!
>>Arkadiusz
>>
>>>
>>>>
>>>>Above crash is caused because of unordered driver unload, where dpll
>>>>subsystem
>>>>tries to notify muxed pin was deleted, but at that time the parent is
>>>>already
>>>>gone, thus data points to memory which is no longer available, thus
>>>>crash
>>>>happens when trying to dump pin parents.
>>>>
>>>>This series fixes all issues I could find connected to the situation
>>>>where
>>>>muxed-pins are trying to access their parents, when parent registerer
>>>>was
>>>>removed
>>>>in the meantime.
>>>>
>>>>Thank you!
>>>>Arkadiusz

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox