Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: KASAN: slab-out-of-bounds Read in pfkey_add
From: Dmitry Vyukov @ 2018-04-11  7:48 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Eric Biggers, David Miller, Herbert Xu, LKML, netdev,
	Steffen Klassert, syzkaller-bugs, syzbot
In-Reply-To: <20180411061824.GA28791@la.guarana.org>

On Wed, Apr 11, 2018 at 8:18 AM, Kevin Easton <kevin@guarana.org> wrote:
> On Mon, Apr 09, 2018 at 01:56:36AM -0400, Kevin Easton wrote:
>> On Sun, Apr 08, 2018 at 09:04:33PM -0700, Eric Biggers wrote:
>> ...
>> >
>> > Looks like this is going to be fixed by
>> > https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of
>> > provided sadb_key"), but it's not applied yet to the ipsec tree yet.  Kevin, for
>> > future reference, for syzbot bugs it would be helpful to reply to the original
>> > bug report and say that a patch was sent out, or even better send the patch as a
>> > reply to the bug report email, e.g.
>> >
>> >     git format-patch --in-reply-to="<001a114292fadd3e2505607060a8@google.com>"
>> >
>> > for this one (and the Message ID can be found in the syzkaller-bugs archive even
>> > if the email isn't in your inbox).
>>
>> Sure, I can do that.
>
> I recalled one reason I _didn't_ do this - the message ID is retrievable
> from the archived email, but because the archive is Google Groups the
> message recipients aren't (only masked).
>
>     - Kevin
>

Hi Kevin,

This was mailed to other lists too:

To: davem@, herbert@, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, steffen.klassert@,
syzkaller-bugs@googlegroups.com

In the groups UI there is a drop down menu with "Show Original" option
which shows raw email which include Message-ID: header.

^ permalink raw reply

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Jiri Pirko @ 2018-04-11  7:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Sridhar Samudrala, davem, netdev,
	virtualization, virtio-dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, jasowang, loseweigh
In-Reply-To: <20180411022807-mutt-send-email-mst@kernel.org>

Wed, Apr 11, 2018 at 01:28:51AM CEST, mst@redhat.com wrote:
>On Tue, Apr 10, 2018 at 02:26:08PM -0700, Stephen Hemminger wrote:
>> On Tue, 10 Apr 2018 11:59:50 -0700
>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>> 
>> > Use the registration/notification framework supported by the generic
>> > bypass infrastructure.
>> > 
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > ---
>> 
>> Thanks for doing this.  Your current version has couple show stopper
>> issues.
>> 
>> First, the slave device is instantly taking over the slave.
>> This doesn't allow udev/systemd to do its device rename of the slave
>> device. Netvsc uses a delayed work to workaround this.
>
>Interesting. Does this mean udev must act within a specific time window
>then?

Yeah. That is scarry. Also, wrong.


>
>> Secondly, the select queue needs to call queue selection in VF.
>> The bonding/teaming logic doesn't work well for UDP flows.
>> Commit b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
>> fixed this performance problem.
>> 
>> Lastly, more indirection is bad in current climate.
>> 
>> I am not completely adverse to this but it needs to be fast, simple
>> and completely transparent.

^ permalink raw reply

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Jiri Pirko @ 2018-04-11  7:53 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sridhar Samudrala, mst, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <20180410142608.50f15b45@xeon-e3>

Tue, Apr 10, 2018 at 11:26:08PM CEST, stephen@networkplumber.org wrote:
>On Tue, 10 Apr 2018 11:59:50 -0700
>Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> bypass infrastructure.
>> 
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
>
>Thanks for doing this.  Your current version has couple show stopper
>issues.
>
>First, the slave device is instantly taking over the slave.
>This doesn't allow udev/systemd to do its device rename of the slave
>device. Netvsc uses a delayed work to workaround this.

Wait. Why the fact a device is enslaved has to affect the udev in any
way? If it does, smells like a bug in udev.


>
>Secondly, the select queue needs to call queue selection in VF.
>The bonding/teaming logic doesn't work well for UDP flows.
>Commit b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
>fixed this performance problem.
>
>Lastly, more indirection is bad in current climate.
>
>I am not completely adverse to this but it needs to be fast, simple
>and completely transparent.

^ permalink raw reply

* [PATCH net] rds: MP-RDS may use an invalid c_path
From: Ka-Cheong Poon @ 2018-04-11  7:57 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, davem, rds-devel

rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
send a message.  Suppose the RDS connection is not yet up.  In
rds_send_mprds_hash(), it does

	if (conn->c_npaths == 0)
		wait_event_interruptible(conn->c_hs_waitq,
					 (conn->c_npaths != 0));

If it is interrupted before the connection is set up,
rds_send_mprds_hash() will return a non-zero hash value.  Hence
rds_sendmsg() will use a non-zero c_path to send the message.  But if
the RDS connection ends up to be non-MP capable, the message will be
lost as only the zero c_path can be used.

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
---
 net/rds/send.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/rds/send.c b/net/rds/send.c
index acad042..94c7f74 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 Oracle.  All rights reserved.
+ * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -1017,10 +1017,15 @@ static int rds_send_mprds_hash(struct rds_sock *rs, struct rds_connection *conn)
 	if (conn->c_npaths == 0 && hash != 0) {
 		rds_send_ping(conn, 0);
 
-		if (conn->c_npaths == 0) {
-			wait_event_interruptible(conn->c_hs_waitq,
-						 (conn->c_npaths != 0));
-		}
+		/* The underlying connection is not up yet.  Need to wait
+		 * until it is up to be sure that the non-zero c_path can be
+		 * used.  But if we are interrupted, we have to use the zero
+		 * c_path in case the connection ends up being non-MP capable.
+		 */
+		if (conn->c_npaths == 0)
+			if (wait_event_interruptible(conn->c_hs_waitq,
+						     conn->c_npaths != 0))
+				hash = 0;
 		if (conn->c_npaths == 1)
 			hash = 0;
 	}
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 1/2] staging: irda: Replace mdelay with usleep_range in stir421x_fw_upload
From: Greg KH @ 2018-04-11  8:03 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: devel, samuel, netdev, johan, linux-kernel, arvind.yadav.cs,
	davem
In-Reply-To: <d65d1116-7ffd-e289-5379-cc052782700d@gmail.com>

On Wed, Apr 11, 2018 at 03:17:10PM +0800, Jia-Ju Bai wrote:
> 
> 
> On 2018/4/11 14:41, Greg KH wrote:
> > On Wed, Apr 11, 2018 at 09:29:34AM +0800, Jia-Ju Bai wrote:
> > > stir421x_fw_upload() is never called in atomic context.
> > > 
> > > The call chain ending up at stir421x_fw_upload() is:
> > > [1] stir421x_fw_upload() <- stir421x_patch_device() <- irda_usb_probe()
> > > 
> > > irda_usb_probe() is set as ".probe" in struct usb_driver.
> > > This function is not called in atomic context.
> > > 
> > > Despite never getting called from atomic context, stir421x_fw_upload()
> > > calls mdelay() to busily wait.
> > > This is not necessary and can be replaced with usleep_range() to
> > > avoid busy waiting.
> > > 
> > > This is found by a static analysis tool named DCNS written by myself.
> > > And I also manually check it.
> > > 
> > > Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> > > ---
> > >   drivers/staging/irda/drivers/irda-usb.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > Please, at the very least, work off of Linus's tree.  There is no
> > drivers/staging/irda/ anymore :)
> > 
> 
> Okay, sorry.
> Could you please recommend me a right tree or its git address?

Have you looked in the MAINTAINERS file?  Worst case, always use
linux-next.

greg k-h

^ permalink raw reply

* Re: [RFC PATCH net-next v5 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-11  8:03 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <5d56e572-24f1-745d-49ae-c2dada5db03c@intel.com>

Wed, Apr 11, 2018 at 08:24:43AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/10/2018 11:03 PM, Jiri Pirko wrote:
>> Tue, Apr 10, 2018 at 05:59:02PM CEST, sridhar.samudrala@intel.com wrote:
>> > On 4/10/2018 8:43 AM, Jiri Pirko wrote:
>> > > Tue, Apr 10, 2018 at 05:27:48PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > On 4/10/2018 8:22 AM, Jiri Pirko wrote:
>> > > > > Tue, Apr 10, 2018 at 05:13:40PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > > > On 4/10/2018 3:55 AM, Jiri Pirko wrote:
>> > > > > > > Mon, Apr 09, 2018 at 08:47:06PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > > > > > On 4/9/2018 1:07 AM, Jiri Pirko wrote:
>> > > > > > > > > Sat, Apr 07, 2018 at 12:59:14AM CEST, sridhar.samudrala@intel.com wrote:
>> > > > > > > > > > On 4/6/2018 5:48 AM, Jiri Pirko wrote:
>> > > > > > > > > > > Thu, Apr 05, 2018 at 11:08:22PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > > > > > > [...]
>> > > > > > > > > 
>> > > > > > > > > > > > +static int virtnet_bypass_join_child(struct net_device *bypass_netdev,
>> > > > > > > > > > > > +				     struct net_device *child_netdev)
>> > > > > > > > > > > > +{
>> > > > > > > > > > > > +	struct virtnet_bypass_info *vbi;
>> > > > > > > > > > > > +	bool backup;
>> > > > > > > > > > > > +
>> > > > > > > > > > > > +	vbi = netdev_priv(bypass_netdev);
>> > > > > > > > > > > > +	backup = (child_netdev->dev.parent == bypass_netdev->dev.parent);
>> > > > > > > > > > > > +	if (backup ? rtnl_dereference(vbi->backup_netdev) :
>> > > > > > > > > > > > +			rtnl_dereference(vbi->active_netdev)) {
>> > > > > > > > > > > > +		netdev_info(bypass_netdev,
>> > > > > > > > > > > > +			    "%s attempting to join bypass dev when %s already present\n",
>> > > > > > > > > > > > +			    child_netdev->name, backup ? "backup" : "active");
>> > > > > > > > > > > Bypass module should check if there is already some other netdev
>> > > > > > > > > > > enslaved and refuse right there.
>> > > > > > > > > > This will work for virtio-net with 3 netdev model, but this check has to be done by netvsc
>> > > > > > > > > > as its bypass_netdev is same as the backup_netdev.
>> > > > > > > > > > Will add a flag while registering with the bypass module to indicate if the driver is doing
>> > > > > > > > > > a 2 netdev or 3 netdev model and based on that flag this check can be done in bypass module
>> > > > > > > > > > for 3 netdev scenario.
>> > > > > > > > > Just let me undestand it clearly. What I expect the difference would be
>> > > > > > > > > between 2netdev and3 netdev model is this:
>> > > > > > > > > 2netdev:
>> > > > > > > > >        bypass_master
>> > > > > > > > >           /
>> > > > > > > > >          /
>> > > > > > > > > VF_slave
>> > > > > > > > > 
>> > > > > > > > > 3netdev:
>> > > > > > > > >        bypass_master
>> > > > > > > > >           /     \
>> > > > > > > > >          /       \
>> > > > > > > > > VF_slave   backup_slave
>> > > > > > > > > 
>> > > > > > > > > Is that correct? If not, how does it look like?
>> > > > > > > > > 
>> > > > > > > > > 
>> > > > > > > > Looks correct.
>> > > > > > > > VF_slave and backup_slave are the original netdevs and are present in both the models.
>> > > > > > > > In the 3 netdev model, bypass_master netdev is created and VF_slave and backup_slave are
>> > > > > > > > marked as the 2 slaves of this new netdev.
>> > > > > > > You say it looks correct and in another sentence you provide completely
>> > > > > > > different description. Could you please look again?
>> > > > > > > 
>> > > > > > To be exact, 2 netdev model with netvsc looks like this.
>> > > > > > 
>> > > > > >       netvsc_netdev
>> > > > > >         /
>> > > > > >        /
>> > > > > > VF_slave
>> > > > > > 
>> > > > > > With virtio_net, 3 netdev model
>> > > > > > 
>> > > > > >     bypass_netdev
>> > > > > >         /     \
>> > > > > >        /       \
>> > > > > > VF_slave   virtio_net netdev
>> > > > > Could you also mark the original netdev which is there now? is it
>> > > > > bypass_netdev or virtio_net_netdev ?
>> > > > bypass_netdev
>> > > >       /     \
>> > > >      /       \
>> > > > VF_slave   virtio_net netdev (original)
>> > > That does not make sense.
>> > > 1) You diverge from the behaviour of the netvsc, where the original
>> > >      netdev is a master of the VF
>> > > 2) If the original netdev is a slave, you cannot have any IP address
>> > >      configured on it (well you could, but the rx_handler would eat every
>> > >      incoming packet). So you will break the user bacause he would have to
>> > >      move the configuration to the new master device.
>> > > This only makes sense that the original netdev becomes the master for both
>> > > netvsc and virtio_net.
>> > Forgot to mention that bypass_netdev takes over the name of the original
>> > netdev and
>> > virtio_net netdev will get the backup name.
>> What do you mean by "name"?
>
>bypass_netdev also is associated with the same pci device as the original virtio_net
>netdev via SET_NETDEV_DEV().  Also, we added ndo_get_phys_port_name() to virtio_net
>that will return _bkup when BACKUP feature is enabled.

Okay.

>
>So for ex: if virtio_net inteface was getting 'ens12' as the name assigned by udev
>without BACKUP feature,  when BACKUP feature is enabled,  the  bypass_netdev will be
>named 'ens12' and the original virtio_net will get named as ens12n_bkup.

Got it.

I don't like the bypass_master to look differently in netvsc and
virtio_net :/ The best would be to convert netvsc to 3 netdev model and
treat them the same. The more I think about it, the more the 2 netdev
model feels wrong.


>
>
>> 
>> > So the userspace network configuration doesn't need to change.
>> > 
>> > 
>

^ permalink raw reply

* Re: [PATCH 1/2] staging: irda: Replace mdelay with usleep_range in stir421x_fw_upload
From: Jia-Ju Bai @ 2018-04-11  8:11 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, samuel, netdev, johan, linux-kernel, arvind.yadav.cs,
	davem
In-Reply-To: <20180411080311.GB2137@kroah.com>



On 2018/4/11 16:03, Greg KH wrote:
> On Wed, Apr 11, 2018 at 03:17:10PM +0800, Jia-Ju Bai wrote:
>>
>> On 2018/4/11 14:41, Greg KH wrote:
>>> On Wed, Apr 11, 2018 at 09:29:34AM +0800, Jia-Ju Bai wrote:
>>>> stir421x_fw_upload() is never called in atomic context.
>>>>
>>>> The call chain ending up at stir421x_fw_upload() is:
>>>> [1] stir421x_fw_upload() <- stir421x_patch_device() <- irda_usb_probe()
>>>>
>>>> irda_usb_probe() is set as ".probe" in struct usb_driver.
>>>> This function is not called in atomic context.
>>>>
>>>> Despite never getting called from atomic context, stir421x_fw_upload()
>>>> calls mdelay() to busily wait.
>>>> This is not necessary and can be replaced with usleep_range() to
>>>> avoid busy waiting.
>>>>
>>>> This is found by a static analysis tool named DCNS written by myself.
>>>> And I also manually check it.
>>>>
>>>> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>>>> ---
>>>>    drivers/staging/irda/drivers/irda-usb.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>> Please, at the very least, work off of Linus's tree.  There is no
>>> drivers/staging/irda/ anymore :)
>>>
>> Okay, sorry.
>> Could you please recommend me a right tree or its git address?
> Have you looked in the MAINTAINERS file?  Worst case, always use
> linux-next.
>
> greg k-h

Oh, sorry, I did notice the git tree in the MAINTAINERS file.
I always used linux-stable.
Thanks for telling me this :)


Best wishes,
Jia-Ju Bai

^ permalink raw reply

* Re: [PATCH 1/2] staging: irda: Replace mdelay with usleep_range in stir421x_fw_upload
From: Greg KH @ 2018-04-11  8:17 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: devel, samuel, netdev, johan, linux-kernel, arvind.yadav.cs,
	davem
In-Reply-To: <ecf21a6e-0e99-55dc-54be-3bd633bedbc5@gmail.com>

On Wed, Apr 11, 2018 at 04:11:00PM +0800, Jia-Ju Bai wrote:
> 
> 
> On 2018/4/11 16:03, Greg KH wrote:
> > On Wed, Apr 11, 2018 at 03:17:10PM +0800, Jia-Ju Bai wrote:
> > > 
> > > On 2018/4/11 14:41, Greg KH wrote:
> > > > On Wed, Apr 11, 2018 at 09:29:34AM +0800, Jia-Ju Bai wrote:
> > > > > stir421x_fw_upload() is never called in atomic context.
> > > > > 
> > > > > The call chain ending up at stir421x_fw_upload() is:
> > > > > [1] stir421x_fw_upload() <- stir421x_patch_device() <- irda_usb_probe()
> > > > > 
> > > > > irda_usb_probe() is set as ".probe" in struct usb_driver.
> > > > > This function is not called in atomic context.
> > > > > 
> > > > > Despite never getting called from atomic context, stir421x_fw_upload()
> > > > > calls mdelay() to busily wait.
> > > > > This is not necessary and can be replaced with usleep_range() to
> > > > > avoid busy waiting.
> > > > > 
> > > > > This is found by a static analysis tool named DCNS written by myself.
> > > > > And I also manually check it.
> > > > > 
> > > > > Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> > > > > ---
> > > > >    drivers/staging/irda/drivers/irda-usb.c | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > Please, at the very least, work off of Linus's tree.  There is no
> > > > drivers/staging/irda/ anymore :)
> > > > 
> > > Okay, sorry.
> > > Could you please recommend me a right tree or its git address?
> > Have you looked in the MAINTAINERS file?  Worst case, always use
> > linux-next.
> > 
> > greg k-h
> 
> Oh, sorry, I did notice the git tree in the MAINTAINERS file.
> I always used linux-stable.

linux-stable is almost never the tree to use as it is almost always
12000 patches behind what is in Linus's tree and about 20000 changes
behind linux-next.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 1/2] staging: irda: Replace mdelay with usleep_range in stir421x_fw_upload
From: Jia-Ju Bai @ 2018-04-11  8:20 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, samuel, netdev, johan, linux-kernel, arvind.yadav.cs,
	davem
In-Reply-To: <20180411081724.GA4155@kroah.com>



On 2018/4/11 16:17, Greg KH wrote:
> On Wed, Apr 11, 2018 at 04:11:00PM +0800, Jia-Ju Bai wrote:
>>
>> On 2018/4/11 16:03, Greg KH wrote:
>>> On Wed, Apr 11, 2018 at 03:17:10PM +0800, Jia-Ju Bai wrote:
>>>> On 2018/4/11 14:41, Greg KH wrote:
>>>>> On Wed, Apr 11, 2018 at 09:29:34AM +0800, Jia-Ju Bai wrote:
>>>>>> stir421x_fw_upload() is never called in atomic context.
>>>>>>
>>>>>> The call chain ending up at stir421x_fw_upload() is:
>>>>>> [1] stir421x_fw_upload() <- stir421x_patch_device() <- irda_usb_probe()
>>>>>>
>>>>>> irda_usb_probe() is set as ".probe" in struct usb_driver.
>>>>>> This function is not called in atomic context.
>>>>>>
>>>>>> Despite never getting called from atomic context, stir421x_fw_upload()
>>>>>> calls mdelay() to busily wait.
>>>>>> This is not necessary and can be replaced with usleep_range() to
>>>>>> avoid busy waiting.
>>>>>>
>>>>>> This is found by a static analysis tool named DCNS written by myself.
>>>>>> And I also manually check it.
>>>>>>
>>>>>> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>>>>>> ---
>>>>>>     drivers/staging/irda/drivers/irda-usb.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> Please, at the very least, work off of Linus's tree.  There is no
>>>>> drivers/staging/irda/ anymore :)
>>>>>
>>>> Okay, sorry.
>>>> Could you please recommend me a right tree or its git address?
>>> Have you looked in the MAINTAINERS file?  Worst case, always use
>>> linux-next.
>>>
>>> greg k-h
>> Oh, sorry, I did notice the git tree in the MAINTAINERS file.
>> I always used linux-stable.
> linux-stable is almost never the tree to use as it is almost always
> 12000 patches behind what is in Linus's tree and about 20000 changes
> behind linux-next.
>

Okay, I now know why many of my patches were not replied.
I should choose correct tree in the MAINTAINERS file.
Thanks :)


Best wishes,
Jia-Ju Bai

^ permalink raw reply

* Re: [patch net] devlink: convert occ_get op to separate registration
From: gregkh @ 2018-04-11  8:46 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David Miller, jiri@resnulli.us, jiri@mellanox.com,
	netdev@vger.kernel.org, idosch@mellanox.com,
	stable@vger.kernel.org
In-Reply-To: <20180410204329.GJ2341@sasha-vm>

On Tue, Apr 10, 2018 at 08:43:31PM +0000, Sasha Levin wrote:
> >Bots are starting to overwhelm actual content from human beings
> >on this list, and I want to put my foot on the brake right now
> >before it gets even more out of control.
> 
> I think we're just hitting the limitations of using a mailing list. Bots
> aren't around to spam everyone for fun, right?
> 
> 0day bot is spammy because patches fail to compile, syzbot is spammy
> because we have tons of bugs users can hit and the stable bot is
> spammy because we miss lots of commits that should be in stable.
> 
> As the kernel grows, not only the codebase is expanding but also the
> tooling around it. While spammy, bots provide valuable input that in the
> past would take blood, sweat and tears to acquire. We just need a better
> way to consume it rather than blocking off these inputs.

As one of the primary abusers of the "I send too much email" flood of
mess, I'm going to start cutting down on what type of message I respond
to a public vger list from now on.  I'll write more on the stable@ list
about this, but for your individual patches like this, how about just
responding to the developers themselves and not a public list?  I do
that when I commit a patch to my tree, stripping out lists, as it's not
a useful message for anyone other than the person who wrote the patch
and the reviewers.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Christian Brauner @ 2018-04-11  9:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kirill Tkhai, davem, gregkh, netdev, linux-kernel, avagin, serge
In-Reply-To: <87in8zumpd.fsf@xmission.com>

On Tue, Apr 10, 2018 at 10:04:46AM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
> 
> > On Mon, Apr 09, 2018 at 06:21:31PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> 
> >> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> >> 
> >> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> >> >> >> >>>
> >> >> >> >>> enabled sending hotplug events into all network namespaces back in 2010.
> >> >> >> >>> Over time the set of uevents that get sent into all network namespaces has
> >> >> >> >>> shrunk. We have now reached the point where hotplug events for all devices
> >> >> >> >>> that carry a namespace tag are filtered according to that namespace.
> >> >> >> >>>
> >> >> >> >>> Specifically, they are filtered whenever the namespace tag of the kobject
> >> >> >> >>> does not match the namespace tag of the netlink socket. One example are
> >> >> >> >>> network devices. Uevents for network devices only show up in the network
> >> >> >> >>> namespaces these devices are moved to or created in.
> >> >> >> >>>
> >> >> >> >>> However, any uevent for a kobject that does not have a namespace tag
> >> >> >> >>> associated with it will not be filtered and we will *try* to broadcast it
> >> >> >> >>> into all network namespaces.
> >> >> >> >>>
> >> >> >> >>> The original patchset was written in 2010 before user namespaces were a
> >> >> >> >>> thing. With the introduction of user namespaces sending out uevents became
> >> >> >> >>> partially isolated as they were filtered by user namespaces:
> >> >> >> >>>
> >> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >> >> >>>
> >> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >> >> >>>                 return;
> >> >> >> >>>
> >> >> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
> >> >> >> >>>                 return;
> >> >> >> >>>
> >> >> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >> >> >>>                              CAP_NET_BROADCAST))
> >> >> >> >>>         j       return;
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> The file_ns_capable() check will check whether the caller had
> >> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >> >> >> >>> namespace of interest. This check is fine in general but seems insufficient
> >> >> >> >>> to me when paired with uevents. The reason is that devices always belong to
> >> >> >> >>> the initial user namespace so uevents for kobjects that do not carry a
> >> >> >> >>> namespace tag should never be sent into another user namespace. This has
> >> >> >> >>> been the intention all along. But there's one case where this breaks,
> >> >> >> >>> namely if a new user namespace is created by root on the host and an
> >> >> >> >>> identity mapping is established between root on the host and root in the
> >> >> >> >>> new user namespace. Here's a reproducer:
> >> >> >> >>>
> >> >> >> >>>  sudo unshare -U --map-root
> >> >> >> >>>  udevadm monitor -k
> >> >> >> >>>  # Now change to initial user namespace and e.g. do
> >> >> >> >>>  modprobe kvm
> >> >> >> >>>  # or
> >> >> >> >>>  rmmod kvm
> >> >> >> >>>
> >> >> >> >>> will allow the non-initial user namespace to retrieve all uevents from the
> >> >> >> >>> host. This seems very anecdotal given that in the general case user
> >> >> >> >>> namespaces do not see any uevents and also can't really do anything useful
> >> >> >> >>> with them.
> >> >> >> >>>
> >> >> >> >>> Additionally, it is now possible to send uevents from userspace. As such we
> >> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> >> >> >>> namespace of the network namespace of the netlink socket) userspace process
> >> >> >> >>> make a decision what uevents should be sent.
> >> >> >> >>>
> >> >> >> >>> This makes me think that we should simply ensure that uevents for kobjects
> >> >> >> >>> that do not carry a namespace tag are *always* filtered by user namespace
> >> >> >> >>> in kobj_bcast_filter(). Specifically:
> >> >> >> >>> - If the owning user namespace of the uevent socket is not init_user_ns the
> >> >> >> >>>   event will always be filtered.
> >> >> >> >>> - If the network namespace the uevent socket belongs to was created in the
> >> >> >> >>>   initial user namespace but was opened from a non-initial user namespace
> >> >> >> >>>   the event will be filtered as well.
> >> >> >> >>> Put another way, uevents for kobjects not carrying a namespace tag are now
> >> >> >> >>> always only sent to the initial user namespace. The regression potential
> >> >> >> >>> for this is near to non-existent since user namespaces can't really do
> >> >> >> >>> anything with interesting devices.
> >> >> >> >>>
> >> >> >> >>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> >> >> >> >>> ---
> >> >> >> >>>  lib/kobject_uevent.c | 10 +++++++++-
> >> >> >> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> >> >> >>>
> >> >> >> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> >> >> >>> index 15ea216a67ce..cb98cddb6e3b 100644
> >> >> >> >>> --- a/lib/kobject_uevent.c
> >> >> >> >>> +++ b/lib/kobject_uevent.c
> >> >> >> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >> >> >> >>>  		return sock_ns != ns;
> >> >> >> >>>  	}
> >> >> >> >>>  
> >> >> >> >>> -	return 0;
> >> >> >> >>> +	/*
> >> >> >> >>> +	 * The kobject does not carry a namespace tag so filter by user
> >> >> >> >>> +	 * namespace below.
> >> >> >> >>> +	 */
> >> >> >> >>> +	if (sock_net(dsk)->user_ns != &init_user_ns)
> >> >> >> >>> +		return 1;
> >> >> >> >>> +
> >> >> >> >>> +	/* Check if socket was opened from non-initial user namespace. */
> >> >> >> >>> +	return sk_user_ns(dsk) != &init_user_ns;
> >> >> >> >>>  }
> >> >> >> >>>  #endif
> >> >> >> >>
> >> >> >> >> So, this prohibits to listen events of all devices except network-related
> >> >> >> >> in containers? If it's so, I don't think it's a good solution. Uevents is not
> >> >> >> > 
> >> >> >> > No, this is not correct: As it is right now *without my patch* no
> >> >> >> > non-initial user namespace is receiving *any uevents* but those
> >> >> >> > specifically namespaced such as those for network devices. This patch
> >> >> >> > doesn't change that at all. The commit message outlines this in detail
> >> >> >> > how this comes about.
> >> >> >> > There is only one case where this currently breaks and this is as I
> >> >> >> > outlined explicitly in my commit message when you create a new user
> >> >> >> > namespace and map container(0) -> host(0). This patch fixes this.
> >> >> >> 
> >> >> >> Could you please point the place, where non-initial user namespaces are filtered?
> >> >> >> I only see the kobj_bcast_filter() logic, and it used to return 0, which means "accepted".
> >> >> >> Now it will return 1 sometimes.
> >> >> >
> >> >> > Oh sure, it's in the commit message though. The callchain is
> >> >> > lib/kobject_uevent.c:kobject_uevent_net_broadcast() ->
> >> >> > nnet/netlink/af_netlink.c:netlink_broadcast_filtered() ->
> >> >> > net/netlink/af_netlink.c:do_one_broadcast():
> >> >> >
> >> >> > This codepiece will check whether the openened socket holds
> >> >> > CAP_NET_BROADCAST in the user namespace of the target network namespace
> >> >> > which it won't because we don't have device namespaces and all devices
> >> >> > belong to the initial set of namespaces.
> >> >> >
> >> >> >         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >> >                              CAP_NET_BROADCAST))
> >> >> >         j       return;
> >> >> >
> >> >> 
> >> >> The above that only applies if someone has set NETLINK_F_LISTEN_ALL_NSID
> >> >> on their socket and has had someone with the appropriate privileges
> >> >> assign a peerrnetid.
> >> >> 
> >> >> All of which is to say that file_ns_capable is not nearly as applicable
> >> >> as it might be, and if you can pass the other two checks I think it is
> >> >> pointless (because the peernet attributes are not generated for
> >> >> kobj_uevents) but valid to receive events from outside your network
> >> >> namespace.
> >> >> 
> >> >> 
> >> >> I might be missing something but I don't see anything excluding network
> >> >> namespaces owned by !init_user_ns excluded from the kobject_uevent
> >> >> logic.
> >> >> 
> >> >> The uevent_sock_list has one entry per network namespace. And
> >> >> kobject_uevent_net_broacast appears to walk each one.
> >> >> 
> >> >> I had a memory of filtering uevent messages and I had a memory
> >> >> that the netlink_has_listeners had a per network namespace effect.
> >> >> Neither seems true from my inspection of the code tonight.
> >> >> 
> >> >> If we are not filtering ordinary uevents at least at the user namespace
> >> >> level that does seem to be at least a nuisance.
> >> >> 
> >> >> 
> >> >> Christian can you dig a little deeper into this.  I have a feeling that
> >> >> there are some real efficiency improvements that we could make to
> >> >> kobject_uevent_net_broadcast if nothing else.
> >> >> 
> >> >> Perhaps you could see where uevents are broadcast by poking
> >> >> the sysfs uevent of an existing device, and triggering a retransmit.
> >> >
> >> > @Eric, so I did some intensive testing over the weekend and forget everything I
> >> > said before. Uevents are not filtered by the kernel at the moment. This is
> >> > currently - apart from network devices - a pure userspace thing. Specifically,
> >> > anyone  on the host can listen to all uevents from everywhere. It's neither
> >> > filtered by user nor by network namespace. The fact that I used
> >> >
> >> > udevadm --debug monitor
> >> >
> >> > to test my prior hypothesis was what led me to believe that uevents are already
> >> > correctly filtered.
> >> > Instead, what is actually happening is that every udev implementation out there
> >> > is discarding uevents that were send by uids != 0 in the CMSG_DATA.
> >> > Specifically,
> >> 
> >> Yes.  I remember something of the sort.  I believe udev also filters to
> >> ensure that the netlink port id == 0 to ensure the message came from
> >> the kernel and was not spoofed in any way.
> >> 
> >> > - legacy standalone udevd:
> >> >   https://git.kernel.org/pub/scm/linux/hotplug/udev.git/snapshot/udev-062.tar.gz
> >> > - eudevd
> >> >   https://github.com/gentoo/eudev/blob/6f630d32bf494a457171b3f99e329592497bf271/src/libudev/libudev-monitor.c#L645
> >> > - systemd-udevd
> >> >   https://github.com/systemd/systemd/blob/e89ab7f219a399ab719c78cf43c07c0da60bd151/src/libudev/libudev-monitor.c#L656
> >> > - ueventd (Android)
> >> >   https://android.googlesource.com/platform/system/core.git/+/android-8.1.0_r22/libcutils/uevent.c#81
> >> >
> >> > For all of those I could trace this behavior back to the first released
> >> > version. (To be precise, for legacy udevd that eventually became systemd-udevd
> >> > I could trace it back to the first version that is still available on
> >> > git.kernel.org which is 062. Since eudevd is a fork of systemd-udevd it is
> >> > trivially true that it has the same behavior from the beginning.)
> >> >
> >> > In any case, userspace udev is not making use of uevents at all right now since
> >> > any uid != 0 events are **explicitly** discarded.
> >> > The fact that you receive uevents for
> >> >
> >> > sudo unshare -U --map-root -n
> >> > udevadm --debug monitor
> >> >
> >> > is simply explained by the fact that container(0) <=> host(0) at which point
> >> > the uid in CMSG_DATA will be 0 in the new user namespace and udev will not
> >> > discard it.
> >> > The use case for receiving uevents in containers/user namespaces is definitely
> >> > there but that's what the uevent injection patch series was for that we merged.
> >> > This is a much safer and saner solution.
> >> > The fact that all udev implementations filter uevents by uid != 0 very much
> >> > seems like a security mechanism in userspace that we probably should provide by
> >> > isolating uevents based on user and/or network namespaces.
> >> 
> >> So in summary.  Uevents are filtered in a user namespace (by userspace)
> >> because the received uid != 0.  It instead == 65534 == "nobody" because
> >> the global root uid is not mapped.
> >
> > Exactly.
> >
> >> 
> >> Which means that we can modify the kernel to not send to all network
> >> namespaces whose user_ns != &init_user_ns because we know that userspace
> >> will ignore the message because of the uid anyway.  Which means when
> >
> > Yes.
> >
> >> net-next reopens you can send that patch.  But please base it on just
> >> not including network namespaces in the list, as that is much more
> >> efficient than adding more conditions to the filter.
> >
> > I'll send a patch out once net-next reopens. I'll also make sure to
> > inform all udev userspace implementations of the change. It won't affect
> > them but it is nice for them to know that they're safer now.
> 
> The real danger is in a user namespace or in a container really is too
> many daemons responding to events will generate a thundering hurd of
> activity when there is really nothing to do.
> 
> > Something like this (Proper commit message and so on will be added once
> > I sent this out.):
> 
> Exactly.
> 
> I would make the comment say something like: "ignore all but the initial
> user namespace".

Yeah, agreed.
But I think the patch is not complete. To guarantee that no non-initial
user namespace actually receives uevents we need to:
1. only sent uevents to uevent sockets that are located in network
   namespaces that are owned by init_user_ns
2. filter uevents that are sent to sockets in mc_list that have opened a
   uevent socket that is owned by init_user_ns *from* a
   non-init_user_ns

We account for 1. by only recording uevent sockets in the global uevent
socket list who are owned by init_user_ns.
But to account for 2. we need to filter by the user namespace who owns
the socket in mc_list. So in addition to that we also need to slightly
change the filter logic in kobj_bcast_filter() I think:

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 22a2c1a98b8f..064d7d29ace5 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -251,7 +251,8 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 		return sock_ns != ns;
 	}
 
-	return 0;
+ 	/* Check if socket was opened from non-initial user namespace. */
+ 	return sk_user_ns(dsk) != &init_user_ns;
 }
 #endif
 

But correct me if I'm wrong.

Christian

> 
> Eric
> 
> 
> > From 68d3c27435520cd600874999b6d9d17572854a7a Mon Sep 17 00:00:00 2001
> > From: Christian Brauner <christian.brauner@ubuntu.com>
> > Date: Tue, 10 Apr 2018 11:56:49 +0200
> > Subject: [PATCH] netns: restrict uevents to initial user namespace
> >
> > /* Here'll be a useful commit message describing in detail what's
> >  * happening once I sent it to net-next. */
> >
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> > ---
> >  lib/kobject_uevent.c | 21 +++++++++++++++------
> >  1 file changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> > index 15ea216a67ce..22a2c1a98b8f 100644
> > --- a/lib/kobject_uevent.c
> > +++ b/lib/kobject_uevent.c
> > @@ -703,9 +703,16 @@ static int uevent_net_init(struct net *net)
> >  
> >  	net->uevent_sock = ue_sk;
> >  
> > -	mutex_lock(&uevent_sock_mutex);
> > -	list_add_tail(&ue_sk->list, &uevent_sock_list);
> > -	mutex_unlock(&uevent_sock_mutex);
> > +	/*
> > +	 * Only sent uevents to uevent sockets that are located in network
> > +	 * namespaces owned by the initial user namespace.
> > +	 */
> > +	if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
> > +		mutex_lock(&uevent_sock_mutex);
> > +		list_add_tail(&ue_sk->list, &uevent_sock_list);
> > +		mutex_unlock(&uevent_sock_mutex);
> > +	}
> > +
> >  	return 0;
> >  }
> >  
> > @@ -713,9 +720,11 @@ static void uevent_net_exit(struct net *net)
> >  {
> >  	struct uevent_sock *ue_sk = net->uevent_sock;
> >  
> > -	mutex_lock(&uevent_sock_mutex);
> > -	list_del(&ue_sk->list);
> > -	mutex_unlock(&uevent_sock_mutex);
> > +	if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
> > +		mutex_lock(&uevent_sock_mutex);
> > +		list_del(&ue_sk->list);
> > +		mutex_unlock(&uevent_sock_mutex);
> > +	}
> >  
> >  	netlink_kernel_release(ue_sk->sk);
> >  	kfree(ue_sk);

^ permalink raw reply related

* [iproute PATCH] iproute: Abort if nexthop cannot be parsed
From: Jakub Sitnicki @ 2018-04-11  9:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Attempt to add a multipath route where a nexthop definition refers to a
non-existent device causes 'ip' to crash and burn due to stack buffer
overflow:

  # ip -6 route add fd00::1/64 nexthop dev fake1
  Cannot find device "fake1"
  Cannot find device "fake1"
  Cannot find device "fake1"
  ...
  Segmentation fault (core dumped)

Don't ignore errors from the helper routine that parses the nexthop
definition, and abort immediately if parsing fails.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 ip/iproute.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 1d8fd815..44351bc5 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1038,7 +1038,10 @@ static int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r,
 		memset(rtnh, 0, sizeof(*rtnh));
 		rtnh->rtnh_len = sizeof(*rtnh);
 		rta->rta_len += rtnh->rtnh_len;
-		parse_one_nh(n, r, rta, rtnh, &argc, &argv);
+		if (parse_one_nh(n, r, rta, rtnh, &argc, &argv)) {
+			fprintf(stderr, "Error: cannot parse nexthop\n");
+			exit(-1);
+		}
 		rtnh = RTNH_NEXT(rtnh);
 	}
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH] lan78xx: Correctly indicate invalid OTP
From: Phil Elwell @ 2018-04-11  9:59 UTC (permalink / raw)
  To: Woojung Huh, Microchip Linux Driver Support, netdev, linux-usb,
	linux-kernel
  Cc: Phil Elwell

lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
content, but the value gets overwritten before it is returned and the
read goes ahead anyway. Make the read conditional as it should be
and preserve the error code.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 55a78eb..32cf217 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -928,7 +928,8 @@ static int lan78xx_read_otp(struct lan78xx_net *dev, u32 offset,
 			offset += 0x100;
 		else
 			ret = -EINVAL;
-		ret = lan78xx_read_raw_otp(dev, offset, length, data);
+		if (!ret)
+			ret = lan78xx_read_raw_otp(dev, offset, length, data);
 	}
 
 	return ret;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Ying Xue @ 2018-04-11 10:11 UTC (permalink / raw)
  To: Jia-Ju Bai, jon.maloy, davem; +Cc: netdev, tipc-discussion, linux-kernel
In-Reply-To: <1523323053-28127-1-git-send-email-baijiaju1990@gmail.com>

On 04/10/2018 09:17 AM, Jia-Ju Bai wrote:
> tipc_mon_create() is never called in atomic context.
> 
> The call chain ending up at dn_route_init() is:

Sorry, I don't think there is any relationship between the following
call chain with dn_route_init().

> [1] tipc_mon_create() <- tipc_enable_bearer() <- tipc_nl_bearer_enable()
> tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this function
> is not called in atomic context.
> 
> Despite never getting called from atomic context,
> tipc_mon_create() calls kzalloc() with GFP_ATOMIC,
> which does not sleep for allocation.
> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
> which can sleep and improve the possibility of sucessful allocation.

s/sucessful/successful

> 
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>> ---
> v2:
> * Modify the description of GFP_ATOMIC in v1.
>   Thank Eric for good advice.
> ---
>  net/tipc/monitor.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
> index 9e109bb..9714d80 100644
> --- a/net/tipc/monitor.c
> +++ b/net/tipc/monitor.c
> @@ -604,9 +604,9 @@ int tipc_mon_create(struct net *net, int bearer_id)
>  	if (tn->monitors[bearer_id])
>  		return 0;
>  
> -	mon = kzalloc(sizeof(*mon), GFP_ATOMIC);
> -	self = kzalloc(sizeof(*self), GFP_ATOMIC);
> -	dom = kzalloc(sizeof(*dom), GFP_ATOMIC);
> +	mon = kzalloc(sizeof(*mon), GFP_KERNEL);
> +	self = kzalloc(sizeof(*self), GFP_KERNEL);
> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
>  	if (!mon || !self || !dom) {
>  		kfree(mon);
>  		kfree(self);
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply

* Re: [RFC bpf-next v2 8/8] bpf: add documentation for eBPF helpers (58-64)
From: Jesper Dangaard Brouer @ 2018-04-11 10:17 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: brouer, daniel, ast, netdev, oss-drivers, linux-doc, linux-man,
	John Fastabend
In-Reply-To: <20180410144157.4831-9-quentin.monnet@netronome.com>

On Tue, 10 Apr 2018 15:41:57 +0100
Quentin Monnet <quentin.monnet@netronome.com> wrote:

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 7343af4196c8..db090ad03626 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1250,6 +1250,51 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
> + * 	Description
> + * 		Redirect the packet to the endpoint referenced by *map* at
> + * 		index *key*. Depending on its type, his *map* can contain
> + * 		references to net devices (for forwarding packets through other
> + * 		ports), or to CPUs (for redirecting XDP frames to another CPU;
> + * 		but this is not fully implemented as of this writing).

Stating that CPUMAP redirect "is not fully implemented" is confusing.
The issue is that CPUMAP only works for "native" XDP.

What about saying:

"[...] or to CPUs (for redirecting XDP frames to another CPU;
 but this is only implemented for native XDP as of this writing)"

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v2] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Jia-Ju Bai @ 2018-04-11 10:18 UTC (permalink / raw)
  To: Ying Xue, jon.maloy, davem; +Cc: netdev, tipc-discussion, linux-kernel
In-Reply-To: <73c0b7cb-5148-9c38-d45b-750f711b5570@windriver.com>



On 2018/4/11 18:11, Ying Xue wrote:
> On 04/10/2018 09:17 AM, Jia-Ju Bai wrote:
>> tipc_mon_create() is never called in atomic context.
>>
>> The call chain ending up at dn_route_init() is:
> Sorry, I don't think there is any relationship between the following
> call chain with dn_route_init().
>
>> [1] tipc_mon_create() <- tipc_enable_bearer() <- tipc_nl_bearer_enable()
>> tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this function
>> is not called in atomic context.
>>
>> Despite never getting called from atomic context,
>> tipc_mon_create() calls kzalloc() with GFP_ATOMIC,
>> which does not sleep for allocation.
>> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
>> which can sleep and improve the possibility of sucessful allocation.
> s/sucessful/successful

Thanks for your reply.
I am sorry for my mistakes.
I will revised the text and send a V3.


Jia-Ju Bai

^ permalink raw reply

* [PATCH v3] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Jia-Ju Bai @ 2018-04-11 10:24 UTC (permalink / raw)
  To: jon.maloy, ying.xue, davem
  Cc: netdev, tipc-discussion, linux-kernel, Jia-Ju Bai

tipc_mon_create() is never called in atomic context.

The call chain ending up at tipc_mon_create() is:
[1] tipc_mon_create() <- tipc_enable_bearer() <- tipc_nl_bearer_enable()
tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this function
is not called in atomic context.

Despite never getting called from atomic context,
tipc_mon_create() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of successful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
v2:
* Modify the description of GFP_ATOMIC in v1.
  Thank Eric for good advice.
v3:
* Modify wrong text in description in v2.
  Thank Ying for good advice.
---
 net/tipc/monitor.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
index 9e109bb..9714d80 100644
--- a/net/tipc/monitor.c
+++ b/net/tipc/monitor.c
@@ -604,9 +604,9 @@ int tipc_mon_create(struct net *net, int bearer_id)
 	if (tn->monitors[bearer_id])
 		return 0;

-	mon = kzalloc(sizeof(*mon), GFP_ATOMIC);
-	self = kzalloc(sizeof(*self), GFP_ATOMIC);
-	dom = kzalloc(sizeof(*dom), GFP_ATOMIC);
+	mon = kzalloc(sizeof(*mon), GFP_KERNEL);
+	self = kzalloc(sizeof(*self), GFP_KERNEL);
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
 	if (!mon || !self || !dom) {
 		kfree(mon);
 		kfree(self);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v3] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Ying Xue @ 2018-04-11 10:26 UTC (permalink / raw)
  To: Jia-Ju Bai, jon.maloy, davem; +Cc: netdev, tipc-discussion, linux-kernel
In-Reply-To: <1523442262-4823-1-git-send-email-baijiaju1990@gmail.com>

On 04/11/2018 06:24 PM, Jia-Ju Bai wrote:
> tipc_mon_create() is never called in atomic context.
> 
> The call chain ending up at tipc_mon_create() is:
> [1] tipc_mon_create() <- tipc_enable_bearer() <- tipc_nl_bearer_enable()
> tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this function
> is not called in atomic context.
> 
> Despite never getting called from atomic context,
> tipc_mon_create() calls kzalloc() with GFP_ATOMIC,
> which does not sleep for allocation.
> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
> which can sleep and improve the possibility of successful allocation.
> 
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>

Acked-by: Ying Xue <ying.xue@windriver.com>

> ---
> v2:
> * Modify the description of GFP_ATOMIC in v1.
>   Thank Eric for good advice.
> v3:
> * Modify wrong text in description in v2.
>   Thank Ying for good advice.
> ---
>  net/tipc/monitor.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
> index 9e109bb..9714d80 100644
> --- a/net/tipc/monitor.c
> +++ b/net/tipc/monitor.c
> @@ -604,9 +604,9 @@ int tipc_mon_create(struct net *net, int bearer_id)
>  	if (tn->monitors[bearer_id])
>  		return 0;
>  
> -	mon = kzalloc(sizeof(*mon), GFP_ATOMIC);
> -	self = kzalloc(sizeof(*self), GFP_ATOMIC);
> -	dom = kzalloc(sizeof(*dom), GFP_ATOMIC);
> +	mon = kzalloc(sizeof(*mon), GFP_KERNEL);
> +	self = kzalloc(sizeof(*self), GFP_KERNEL);
> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
>  	if (!mon || !self || !dom) {
>  		kfree(mon);
>  		kfree(self);
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply

* Re: TCP one-by-one acking - RFC interpretation question
From: Michal Kubecek @ 2018-04-11 10:58 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Yuchung Cheng, Neal Cardwell,
	Kenneth Klette Jonassen
In-Reply-To: <f98ec5a8-391d-5b42-2690-0f58e8122f5a@gmail.com>

On Fri, Apr 06, 2018 at 09:49:58AM -0700, Eric Dumazet wrote:
> Cc Neal and Yuchung if they missed this thread.
> 
> On 04/06/2018 08:03 AM, Michal Kubecek wrote:
> > On Fri, Apr 06, 2018 at 05:01:29AM -0700, Eric Dumazet wrote:
> >>
> >>
> >> On 04/06/2018 03:05 AM, Michal Kubecek wrote:
> >>> Hello
> >>>
> >>> I encountered a strange behaviour of some (non-linux) TCP stack which
> >>> I believe is incorrect but support engineers from the company producing
> >>> it claim is OK.
> >>>
> >>> Assume a client (sender, Linux 4.4 kernel) sends a stream of MSS sized
> >>> segments but segments 2, 4 and 6 do not reach the server (receiver):
> >>>
> >>>          ACK             SAK             SAK             SAK
> >>>       +-------+-------+-------+-------+-------+-------+-------+
> >>>       |   1   |   2   |   3   |   4   |   5   |   6   |   7   |
> >>>       +-------+-------+-------+-------+-------+-------+-------+
> >>>     34273   35701   37129   38557   39985   41413   42841   44269
> >>>
> >>> When segment 2 is retransmitted after RTO timeout, normal response would
> >>> be ACK-ing segment 3 (38557) with SACK for 5 and 7 (39985-41413 and
> >>> 42841-44269).
> >>>
> >>> However, this server stack responds with two separate ACKs:
> >>>
> >>>   - ACK 37129, SACK 37129-38557 39985-41413 42841-44269
> >>>   - ACK 38557, SACK 39985-41413 42841-44269
> >>
> >> Hmmm... Yes this seems very very wrong and lazy.
> >>
> >> Have you verified behavior of more recent linux kernel to such threats ?
> > 
> > No, unfortunately the problem was only encountered by our customer in
> > production environment (they tried to reproduce in a test lab but no
> > luck). They are running backups to NFS server and it happens from time
> > to time (in the order of hours, IIUC). So it would be probably hard to
> > let them try with more recent kernel.
> > 
> > On the other hand, they reported that SLE11 clients (kernel 3.0) do not
> > run into this kind of problem. It was originally reported as a
> > a regression on migration from SLE11-SP4 (3.0 kernel) to SLE12-SP2 (4.4
> > kernel) and the problem was reported as "SLE12-SP2 is ignoring dupacks"
> > (which seems to be mostly caused by the switch to RACK).
> > 
> > It also seems that part of the problem is specific packet loss pattern
> > where at some point, many packets are lost in "every second" pattern.
> > The customer finally started to investigate this problem and it seems it
> > has something to do with their bonding setup (they provided no details,
> > my guess is packets are divided over two paths and one of them fails).
> > 
> >> packetdrill test would be relatively easy to write.
> > 
> > I'll try but I have very little experience with writing packetdrill
> > scripts so it will probably take some time.
> > 
> >> Regardless of this broken alien stack, we might be able to work around
> >> this faster than the vendor is able to fix and deploy a new stack.
> >>
> >> ( https://en.wikipedia.org/wiki/Robustness_principle )
> >> Be conservative in what you do, be liberal in what you accept from
> >> others...
> > 
> > I was thinking about this a bit. "Fixing" the acknowledgment number
> > could do the trick but it doesn't feel correct. We might use the fact
> > that TSecr of both ACKs above matches TSval of the retransmission which
> > triggered them so that RTT calculated from timestamp would be the right
> > one. So perhaps something like "prefer timestamp RTT if measured RTT
> > seems way too off". But I'm not sure if it couldn't break other use
> > cases where (high) measured RTT is actually correct, rather than (low)
> > timestamp RTT.

I stared at the code some more and apparently I was wrong. I put my
tracing right after the check

        if (flag & FLAG_SACK_RENEGING) {

in tcp_check_sack_reneging() and completely missed Neal's commit
5ae344c949e7 ("tcp: reduce spurious retransmits due to transient SACK
reneging") which deals with exactly this kind of broken TCP stacks
sending a series of acks for each segment from out of order queue.
Therefore it seems the events in my log weren't actual SACK reneging
events and the SACK scoreboard wasn't cleared.

There is something else I don't understand, though. In the case of
acking previously sacked and never retransmitted segment,
tcp_clean_rtx_queue() calculates the parameters for tcp_ack_update_rtt()
using

        if (sack->first_sackt.v64) {
                sack_rtt_us = skb_mstamp_us_delta(&now,
&sack->first_sackt);
                ca_rtt_us = skb_mstamp_us_delta(&now,
&sack->last_sackt);
        }

(in 4.4; mainline code replaces &now with tp->tcp_mstamp). If I read the
code correctly, both sack->first_sackt and sack->last_sackt contain
timestamps of initial segment transmission. This would mean we use the
time difference between the initial transmission and now, i.e. including
the RTO of the lost packet).

IMHO we should take the actual round trip time instead, i.e. the
difference between the original transmission and the time the packet
sacked (first time). It seems we have been doing this before commit
31231a8a8730 ("tcp: improve RTT from SACK for CC").

Michal Kubecek

^ permalink raw reply

* [RFC/PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4 version
From: Andre Naujoks @ 2018-04-11 11:02 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, David Ahern,
	Greg Kroah-Hartman, Andre Naujoks, Maciej Żenczykowski,
	Shaohua Li, Paolo Abeni, Thomas Gleixner, Kate Stewart,
	Philippe Ombredanne, linux-kernel, netdev

Hi.

I was running into a problem, when trying to join multiple multicast groups
on a single socket and thus binding to the any-address on said socket. I
received traffic from multicast groups, I did not join on that socket and
was at first surprised by that. After reading some old e-mails/threads,
which came to the conclusion "It is, as it is."
(e.g https://marc.info/?l=linux-kernel&m=115815686626791&w=2), I discovered
the IPv4 socketoption IP_MULTICAST_ALL, which, when disabled, does exactly
what I would expect from a socket by default.

I propose a socket option for IPv6, which does the same and has the same
default as the IPv4 version. My first thought was, to just apply
IP_MULTICAST_ALL to a ipv6 socket, but that would change the behavior of
current applications and would probably be a big no-no.

Regards
  Andre

>From 473653086c05a3de839c3504885053f6254c7bc5 Mon Sep 17 00:00:00 2001
From: Andre Naujoks <nautsch2@gmail.com>
Date: Wed, 11 Apr 2018 12:38:28 +0200
Subject: [PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4
 version

The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.

A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.

Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.

The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
---
 include/linux/ipv6.h     |  3 ++-
 include/uapi/linux/in6.h |  1 +
 net/ipv6/af_inet6.c      |  1 +
 net/ipv6/ipv6_sockglue.c | 11 +++++++++++
 net/ipv6/mcast.c         |  2 +-
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 8415bf1a9776..495e834c1367 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -274,7 +274,8 @@ struct ipv6_pinfo {
 						 */
 				dontfrag:1,
 				autoflowlabel:1,
-				autoflowlabel_set:1;
+				autoflowlabel_set:1,
+				mc_all:1;
 	__u8			min_hopcount;
 	__u8			tclass;
 	__be32			rcv_flowinfo;
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index ed291e55f024..71d82fe15b03 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -177,6 +177,7 @@ struct in6_flowlabel_req {
 #define IPV6_V6ONLY		26
 #define IPV6_JOIN_ANYCAST	27
 #define IPV6_LEAVE_ANYCAST	28
+#define IPV6_MULTICAST_ALL	29

 /* IPV6_MTU_DISCOVER values */
 #define IPV6_PMTUDISC_DONT		0
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8da0b513f188..7844cd9d2f10 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -209,6 +209,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	np->hop_limit	= -1;
 	np->mcast_hops	= IPV6_DEFAULT_MCASTHOPS;
 	np->mc_loop	= 1;
+	np->mc_all	= 1;
 	np->pmtudisc	= IPV6_PMTUDISC_WANT;
 	np->repflow	= net->ipv6.sysctl.flowlabel_reflect;
 	sk->sk_ipv6only	= net->ipv6.sysctl.bindv6only;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4d780c7f0130..b2bc1942a2ee 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -664,6 +664,13 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			retv = ipv6_sock_ac_drop(sk, mreq.ipv6mr_ifindex, &mreq.ipv6mr_acaddr);
 		break;
 	}
+	case IPV6_MULTICAST_ALL:
+		if (optlen < sizeof(int))
+			goto e_inval;
+		np->mc_all = valbool;
+		retv = 0;
+		break;
+
 	case MCAST_JOIN_GROUP:
 	case MCAST_LEAVE_GROUP:
 	{
@@ -1255,6 +1262,10 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		val = np->mcast_oif;
 		break;

+	case IPV6_MULTICAST_ALL:
+		val = np->mc_all;
+		break;
+
 	case IPV6_UNICAST_IF:
 		val = (__force int)htonl((__u32) np->ucast_oif);
 		break;
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 793159d77d8a..623ad00eb3c2 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -622,7 +622,7 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr,
 	}
 	if (!mc) {
 		rcu_read_unlock();
-		return true;
+		return np->mc_all;
 	}
 	read_lock(&mc->sflock);
 	psl = mc->sflist;
-- 
2.17.0

^ permalink raw reply related

* [PATCH] lan78xx: Avoid spurious kevent 4 "error"
From: Phil Elwell @ 2018-04-11 11:02 UTC (permalink / raw)
  To: Woojung Huh, Microchip Linux Driver Support, netdev, linux-usb,
	linux-kernel
  Cc: Phil Elwell

lan78xx_defer_event generates an error message whenever the work item
is already scheduled. lan78xx_open defers three events -
EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware
of the likelihood (or certainty) of an error message, the DEV_OPEN
event is added to the set of pending events directly, relying on
the subsequent deferral of the EVENT_LINK_RESET call to schedule the
work.  Take the same precaution with EVENT_STAT_UPDATE to avoid a
totally unnecessary error message.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 32cf217..3102374 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2507,7 +2507,7 @@ static void lan78xx_init_stats(struct lan78xx_net *dev)
 	dev->stats.rollover_max.eee_tx_lpi_transitions = 0xFFFFFFFF;
 	dev->stats.rollover_max.eee_tx_lpi_time = 0xFFFFFFFF;

-	lan78xx_defer_kevent(dev, EVENT_STAT_UPDATE);
+	set_bit(EVENT_STAT_UPDATE, &dev->flags);
 }

 static int lan78xx_open(struct net_device *net)
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 net] slip: Check if rstate is initialized before uncompressing
From: Tejaswi Tanikella @ 2018-04-11 11:04 UTC (permalink / raw)
  To: netdev, davem; +Cc: g.nault

On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.

89.666953:   <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
89.666965:   <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
89.666978:   <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
89.666991:   <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
89.667005:   <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
89.667027:   <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
89.667040:   <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
89.667053:   <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
89.667065:   <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
89.667085:   <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
89.667098:   <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
89.667117:   <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
89.667131:   <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c

./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
 ip_fast_csum at arch/arm64/include/asm/checksum.h:40
 (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615

Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.

Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
---
 drivers/net/slip/slhc.c | 5 +++++
 include/net/slhc_vj.h   | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index 5782733..f4e93f5 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -509,6 +509,10 @@ struct slcompress *
 		if(x < 0 || x > comp->rslot_limit)
 			goto bad;
 
+		/* Check if the cstate is initialized */
+		if (!comp->rstate[x].initialized)
+			goto bad;
+
 		comp->flags &=~ SLF_TOSS;
 		comp->recv_current = x;
 	} else {
@@ -673,6 +677,7 @@ struct slcompress *
 	if (cs->cs_tcp.doff > 5)
 	  memcpy(cs->cs_tcpopt, icp + ihl*4 + sizeof(struct tcphdr), (cs->cs_tcp.doff - 5) * 4);
 	cs->cs_hsize = ihl*2 + cs->cs_tcp.doff*2;
+	cs->initialized = true;
 	/* Put headers back on packet
 	 * Neither header checksum is recalculated
 	 */
diff --git a/include/net/slhc_vj.h b/include/net/slhc_vj.h
index 8716d59..8fcf890 100644
--- a/include/net/slhc_vj.h
+++ b/include/net/slhc_vj.h
@@ -127,6 +127,7 @@
  */
 struct cstate {
 	byte_t	cs_this;	/* connection id number (xmit) */
+	bool	initialized;	/* true if initialized */
 	struct cstate *next;	/* next in ring (xmit) */
 	struct iphdr cs_ip;	/* ip/tcp hdr from most recent packet */
 	struct tcphdr cs_tcp;
-- 
1.9.1

^ permalink raw reply related

* [PATCH] cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
From: Bassem Boubaker @ 2018-04-11 11:15 UTC (permalink / raw)
  Cc: bassem.boubaker, Oliver Neukum, linux-usb, netdev, linux-kernel

    The Cinterion AHS8 is a 3G device with one embedded WWAN interface
    using cdc_ether as a driver.

    The modem is  controlled via AT commands through the exposed TTYs.

    AT+CGDCONT write command can be used to activate or deactivate a WWAN
    connection for a PDP context defined with the same command. UE supports
    one WWAN adapter.

Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
---
 drivers/net/usb/cdc_ether.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index fff4b13..5c42cf8 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -902,6 +902,12 @@ static const struct usb_device_id	products[] = {
 				      USB_CDC_PROTO_NONE),
 	.driver_info = (unsigned long)&wwan_info,
 }, {
+	/* Cinterion AHS3 modem by GEMALTO */
+	USB_DEVICE_AND_INTERFACE_INFO(0x1e2d, 0x0055, USB_CLASS_COMM,
+				      USB_CDC_SUBCLASS_ETHERNET,
+				      USB_CDC_PROTO_NONE),
+	.driver_info = (unsigned long)&wwan_info,
+}, {
 	USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_ETHERNET,
 			USB_CDC_PROTO_NONE),
 	.driver_info = (unsigned long) &cdc_info,
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
From: Oliver Neukum @ 2018-04-11 11:25 UTC (permalink / raw)
  To: Bassem Boubaker; +Cc: linux-kernel, linux-usb, netdev
In-Reply-To: <1523445353-7631-1-git-send-email-bassem.boubaker@actia.fr>

Am Mittwoch, den 11.04.2018, 13:15 +0200 schrieb Bassem Boubaker:
>     The Cinterion AHS8 is a 3G device with one embedded WWAN interface
>     using cdc_ether as a driver.
> 
>     The modem is  controlled via AT commands through the exposed TTYs.
> 
>     AT+CGDCONT write command can be used to activate or deactivate a WWAN
>     connection for a PDP context defined with the same command. UE supports
>     one WWAN adapter.
> 
> Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
Acked-by: Oliver Neukum <oneukum@suse.com>

^ permalink raw reply

* Re: TCP one-by-one acking - RFC interpretation question
From: Michal Kubecek @ 2018-04-11 12:06 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Yuchung Cheng, Neal Cardwell,
	Kenneth Klette Jonassen
In-Reply-To: <20180411105837.bwnpoqvbra43kjub@unicorn.suse.cz>

On Wed, Apr 11, 2018 at 12:58:37PM +0200, Michal Kubecek wrote:
> There is something else I don't understand, though. In the case of
> acking previously sacked and never retransmitted segment,
> tcp_clean_rtx_queue() calculates the parameters for tcp_ack_update_rtt()
> using
> 
>         if (sack->first_sackt.v64) {
>                 sack_rtt_us = skb_mstamp_us_delta(&now,
> &sack->first_sackt);
>                 ca_rtt_us = skb_mstamp_us_delta(&now,
> &sack->last_sackt);
>         }
> 
> (in 4.4; mainline code replaces &now with tp->tcp_mstamp). If I read the
> code correctly, both sack->first_sackt and sack->last_sackt contain
> timestamps of initial segment transmission. This would mean we use the
> time difference between the initial transmission and now, i.e. including
> the RTO of the lost packet).
> 
> IMHO we should take the actual round trip time instead, i.e. the
> difference between the original transmission and the time the packet
> sacked (first time). It seems we have been doing this before commit
> 31231a8a8730 ("tcp: improve RTT from SACK for CC").

Sorry for the noise, this was my misunderstanding, the first_sackt and
last_sackt values are only taken from segments newly sacked by ack
received right now, not those which were already sacked before.

The actual problem and unrealistic RTT measurements come from another
RFC violation I didn't mention before: the NAS doesn't follow RFC 2018
section 4 rule for ordering of SACK blocks. Rather than sending SACK
blocks three most recently received out-of-order blocks, it simply sends
first three ordered by sequence numbers. In the earlier example (odd
packets were received, even lost)

       ACK             SAK             SAK             SAK
    +-------+-------+-------+-------+-------+-------+-------+-------+-------+
    |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |
    +-------+-------+-------+-------+-------+-------+-------+-------+-------+
  34273   35701   37129   38557   39985   41413   42841   44269   45697   47125

it responds to retransmitted segment 2 by

  1. ACK 37129, SACK 37129-38557 39985-41413 42841-44269
  2. ACK 38557, SACK 39985-41413 42841-44269 45697-47125

This new SACK block 45697-47125 has not been retransmitted and as it
wasn't sacked before, it is considered newly sacked. Therefore it gets
processed and its deemed RTT (time since its original transmit time)
"poisons" the RTT calculation, leading to RTO spiraling up.

Thus if we want to work around the NAS behaviour, we would need to
recognize such new SACK block as "not really new" and ignore it for
first_sackt/last_sackt. I'm not sure if it's possible without
misinterpreting actually delayed out of order packets. Of course, it is
not clear if it's worth the effort to work around so severely broken TCP
implementations (two obvious RFC violations, even if we don't count the
one-by-one acking).

Michal Kubecek

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox