Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net: ipv6: Truncate single route when it doesn't fit into dump buffer.
From: Johannes Berg @ 2017-05-14 21:00 UTC (permalink / raw)
  To: Jan Moskyto Matejka, David Ahern; +Cc: David Miller, mq, netdev, roopa
In-Reply-To: <20170513172915.5ram3mlsy4ihzwmw@lopatka.joja.cz>

On Sat, 2017-05-13 at 19:29 +0200, Jan Moskyto Matejka wrote:
> 
> > When adding a route to the skb, track whether it contains at least
> > 1
> > route. If not, it means the next route in the dump is larger than
> > the
> > given buffer. Detect this condition and error out of the dump -
> > returning an error to the user (-ENOSPC? or EMSGSIZE?)
> 
> EMSGSIZE seems OK for me.

If we return an error here, and consequently allow for userspace
changes to pick this up, perhaps we could also consider allowing to
split the dump between nexthops, so that arbitrary such things can be
returned.

We did a similar thing in nl80211 at some point - originally, a dump of
wireless devices present was doing a whole device per message, we later
allowed splitting a single device across multiple messages because
capability information was reaching a reasonable message size limit.

johannes

^ permalink raw reply

* Re: [PATCH] net: ipv6: Truncate single route when it doesn't fit into dump buffer.
From: David Ahern @ 2017-05-14 22:14 UTC (permalink / raw)
  To: Johannes Berg, Jan Moskyto Matejka; +Cc: David Miller, mq, netdev, roopa
In-Reply-To: <1494795637.2803.2.camel@sipsolutions.net>

On 5/14/17 3:00 PM, Johannes Berg wrote:
> On Sat, 2017-05-13 at 19:29 +0200, Jan Moskyto Matejka wrote:
>>
>>> When adding a route to the skb, track whether it contains at least
>>> 1
>>> route. If not, it means the next route in the dump is larger than
>>> the
>>> given buffer. Detect this condition and error out of the dump -
>>> returning an error to the user (-ENOSPC? or EMSGSIZE?)
>>
>> EMSGSIZE seems OK for me.
> 
> If we return an error here, and consequently allow for userspace
> changes to pick this up, perhaps we could also consider allowing to
> split the dump between nexthops, so that arbitrary such things can be
> returned.

Returning an error should not impact existing userspace; it should
already be checking for an error response in the message.

Splitting the dump between nexthops across multiple messages will have
repercussions on userspace. I think (at least for rtnetlink and links,
addresses, routes) userspace needs to provide a buffer large enough for
a complete object. If we limit the number of nexthops to something
reasonable (e.g., 256), then ipv4 for example will be ~ 3kB + lwt encap
size we are talking on the order of an 8kb maybe 16kB buffer. That is a
reasonable request for the API.

^ permalink raw reply

* Re: [PATCH] net: ipv6: Truncate single route when it doesn't fit into dump buffer.
From: Johannes Berg @ 2017-05-14 22:20 UTC (permalink / raw)
  To: David Ahern, Jan Moskyto Matejka; +Cc: David Miller, mq, netdev, roopa
In-Reply-To: <fefe8b7a-3264-333a-bab3-7e0dd82efed1@gmail.com>

On Sun, 2017-05-14 at 16:14 -0600, David Ahern wrote:
> On 5/14/17 3:00 PM, Johannes Berg wrote:
> > On Sat, 2017-05-13 at 19:29 +0200, Jan Moskyto Matejka wrote:
> > > 
> > > > When adding a route to the skb, track whether it contains at
> > > > least
> > > > 1
> > > > route. If not, it means the next route in the dump is larger
> > > > than
> > > > the
> > > > given buffer. Detect this condition and error out of the dump -
> > > > returning an error to the user (-ENOSPC? or EMSGSIZE?)
> > > 
> > > EMSGSIZE seems OK for me.
> > 
> > If we return an error here, and consequently allow for userspace
> > changes to pick this up, perhaps we could also consider allowing to
> > split the dump between nexthops, so that arbitrary such things can
> > be
> > returned.
> 
> Returning an error should not impact existing userspace; it should
> already be checking for an error response in the message.

Right. I was sort of thinking that you'd catch the error and try with a
bigger buffer or so.

> Splitting the dump between nexthops across multiple messages will
> have repercussions on userspace. I think (at least for rtnetlink and
> links, addresses, routes) userspace needs to provide a buffer large
> enough for a complete object. If we limit the number of nexthops to
> something reasonable (e.g., 256), then ipv4 for example will be ~ 3kB
> + lwt encap size we are talking on the order of an 8kb maybe 16kB
> buffer. That is a reasonable request for the API.

Yeah, that seems reasonable.

We had a bigger problem here, in that the # of channels supported, and
the amount of data per channel, was increasing all the time.

johannes

^ permalink raw reply

* 465 netdev
From: bcohen @ 2017-05-14 23:03 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 2786216.zip --]
[-- Type: application/zip, Size: 4820 bytes --]

^ permalink raw reply

* Re: [PATCH 2/3] Fix ERROR: Macros with complex values should be enclosed in parentheses
From: Andrew Lunn @ 2017-05-15  0:25 UTC (permalink / raw)
  To: Maciek Fijalkowski; +Cc: mst, jasonwang, virtualization, netdev, linux-kernel
In-Reply-To: <20170514175130.18664-2-macfij7@wp.pl>

On Sun, May 14, 2017 at 07:51:29PM +0200, Maciek Fijalkowski wrote:
> From: Maciej Fijalkowski <macfij7@wp.pl>

Hi Maciek

Please include some commit message, even if it is just the checkpatch
error you are fixing.

Please include the subsystem/driver you are patch in the subject line.
Also, your subject of Fix ERROR: makes it sound a lot worse than it
is. 

net: virtio: Fix checkpatch error Complex macros should use ()

Also, you should at least compile test your change. It is clearly
wrong, as shown by 0-day.

       Andrew

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] i40e: proper update of the page_offset field
From: Alexander Duyck @ 2017-05-15  0:31 UTC (permalink / raw)
  To: Björn Töpel; +Cc: Netdev, intel-wired-lan, Björn Töpel
In-Reply-To: <20170514175649.31616-1-bjorn.topel@gmail.com>

On Sun, May 14, 2017 at 10:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> In f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
> i40e_build_skb updates the page_offset field with an incorrect offset,
> which can lead to data corruption. This patch updates page_offset
> correctly.
>
> Note that the bug only appears on architectures where PAGE_SIZE is
> 8192 or larger.
>
> Fixes: f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Minor nit. I think i40evf has the same issue and requires the same
fix. Can you add that to this patch and resubmit? Otherwise this fix
looks good to me.

> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 29321a6167a6..cd894f4023b1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1854,7 +1854,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>  #if (PAGE_SIZE < 8192)
>         unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
>  #else
> -       unsigned int truesize = SKB_DATA_ALIGN(size);
> +       unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
> +                               SKB_DATA_ALIGN(I40E_SKB_PAD + size);
>  #endif
>         struct sk_buff *skb;
>
> --
> 2.11.0
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply

* Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
From: David Miller @ 2017-05-15  0:44 UTC (permalink / raw)
  To: Bart.VanAssche; +Cc: hch, netdev, linux-rdma, stable, ubraun
In-Reply-To: <1494788929.21774.1.camel@sandisk.com>

From: Bart Van Assche <Bart.VanAssche@sandisk.com>
Date: Sun, 14 May 2017 19:08:50 +0000

> What is your plan to avoid that applications start using and
> depending on AF_SMC?

The API is out there already so we are out of luck, and neither
you nor I nor anyone else can "stop" this from happening.

^ permalink raw reply

* Re: [PATCH 2/3] bpf: Track alignment of MAP pointers in verifier.
From: David Miller @ 2017-05-15  1:00 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev
In-Reply-To: <59186A2E.2010904@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sun, 14 May 2017 16:31:10 +0200

> On 05/13/2017 04:28 AM, David Miller wrote:
>> @@ -823,10 +825,27 @@ static int check_pkt_ptr_alignment(const struct
>> bpf_reg_state *reg,
>>   }
>>
>>   static int check_val_ptr_alignment(const struct bpf_reg_state *reg,
>> -				   int size, bool strict)
>> +				   int off, int size, bool strict)
>>   {
>> -	if (strict && size != 1) {
>> -		verbose("Unknown alignment. Only byte-sized access allowed in value
>> -		access.\n");
>> +	int reg_off;
>> +
>> +	/* Byte size accesses are always allowed. */
>> +	if (!strict || size == 1)
>> +		return 0;
>> +
>> +	reg_off = reg->off;
>> +	if (reg->id) {
>> +		if (reg->aux_off_align % size) {
>> + verbose("Value access is only %u byte aligned, %d byte access not
>> allowed\n",
>> +				reg->aux_off_align, size);
>> +			return -EACCES;
>> +		}
>> +		reg_off += reg->aux_off;
>> +	}
>
> What are the semantics of using id here? In ptr_to_pkt, we have it,
> so that eventually, in find_good_pkt_pointers() we can match on id
> and update the range for all such regs with the same id. I'm just
> wondering as the side effect of this is that this makes state
> pruning worse.

Ok.  I was advancing reg->id so that it can be used here as the signal
that there is "auxiliary" components to the pointer, and thus we need
to take reg->aux_off_align and reg->aux_off into account.

I did not realize the state pruning component of reg->id so I'll need
to look more deeply into this.

We could use something other than reg->id to decide if there are
variable components to the pointer if necessary.

> Also, reg->off is currently only used in ptr_to_pkt types and
> checked as well in check_packet_access(). Now as semantics change,
> do we need to check for it as well in check_map_access_adj() which
> we currently don't do?

It should not be necessary.  Both before and after my changes we
validate the access range using the reg->min_value and reg->max_value.

>> -		/* a constant was added to pkt_ptr.
>> +		/* a constant was added to the pointer.
>>   		 * Remember it while keeping the same 'id'
>>   		 */
>>   		dst_reg->off += imm;
> 
> Can this now overflow for map type? Also in the UNKNOWN_VALUE case
> below since overflow checks are then only enforced in ptr_to_pkt case?

Indeed, we will have to do "something".  The reg->off used to be u16
and is now a u32 with my changes.  So it can handle something larger
than MAX_PACKET_OFF.

But we still have to handle overflow.  I am not so sure what range of
offsets is reasonable for these MAP pointers, can you make a
suggestion?

> 
>>   	} else {
>> -		bool had_id;
>> -
>> -		if (src_reg->type == PTR_TO_PACKET) {
>> +		if (is_packet && src_reg->type == PTR_TO_PACKET) {
>>   			/* R6=pkt(id=0,off=0,r=62) R7=imm22; r7 += r6 */
>>   			tmp_reg = *dst_reg;  /* save r7 state */
>>   			*dst_reg = *src_reg; /* copy pkt_ptr state r6 into r7 */
> 
> I believe clang could probably generate something similar also for
> map value pointers.

Ok, it should be easy to make that part work both with packet pointers
and MAPs.

Thanks for your feedback, I'll try to refine this some more.

^ permalink raw reply

* RE: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
From: Parav Pandit @ 2017-05-15  1:58 UTC (permalink / raw)
  To: David Miller,
	Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org
  Cc: hch-jcswGhMUV9g@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
In-Reply-To: <20170514.204404.1844909849561204299.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Hi Dave,

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of David Miller
> Sent: Sunday, May 14, 2017 7:44 PM
> To: Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org
> Cc: hch-jcswGhMUV9g@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
> Subject: Re: [PATCH] net/smc: mark as BROKEN due to remote memory
> exposure
> 
> From: Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Date: Sun, 14 May 2017 19:08:50 +0000
> 
> > What is your plan to avoid that applications start using and depending
> > on AF_SMC?
> 

status = socket(AF_SMC, field, IPPROT_TCP);
Here,
- AF_SMC actually means AF_INET IPv4 addresses!
- IPPROTO_TCP means TCP and RDMA both when socket is AF_SMC.
- When creating socket addresses, use AF_INET based addresses.
-  When invoking bind(), listen(), connect() APIs, use AF_INET addresses instead.
- Supporting IPv6 is TBD with AF_SMC sockets.
- At user level get_addrinfo will continue to return AF_INET addresses.

Such explanation for socket APIs doesn't sound correct.

The primary motivation for SMC protocol was to simplify the applications and library to make use of RDMA.
This kind of API is against such simplicity and creates more confusion.
RFC only gives example and doesn't asks to create new socket family.
I can provide more data, but a simple grep in get_addrinfo() and friend functions in user space has heavy dependence on AF_INET and AF_INET6.

> The API is out there already so we are out of luck, and neither you nor I nor
> anyone else can "stop" this from happening.

I think it is still not too late to fix this API. SMC is released in v4.11 very recently.
v4.12 is still not out.
Given the limitation of protocol being RoCEv1 only, we might not have many users whose applications will stop functioning.
(Which will anyway won't work for RoCEv2, and IPv6 addresses).

I propose,
(a) AF_SMC socket 43 can be marked reserved in future kernel versions to avoid use.
(b) New protocol family that represents TCP and RDMA protocol, may be named IPPROTO_SMC even though it is not a protocol in IP header.

We can possibly target to have this fix in 4.13 kernel timeframe.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info
> at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] net: socket: mark socket protocol handler structs as const
From: linzhang @ 2017-05-15  2:26 UTC (permalink / raw)
  To: marcel, gustavo, johan.hedberg, davem
  Cc: dmitry.tarnyagin, sameo, nhorman, linux-bluetooth, netdev,
	linux-kernel, linux-wireless, linzhang

Signed-off-by: linzhang <xiaolou4617@gmail.com>
---
 net/bluetooth/af_bluetooth.c | 2 +-
 net/caif/caif_socket.c       | 2 +-
 net/kcm/kcmsock.c            | 2 +-
 net/nfc/af_nfc.c             | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 42d0997..8a8f77a 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -733,7 +733,7 @@ void bt_procfs_cleanup(struct net *net, const char *name)
 EXPORT_SYMBOL(bt_procfs_init);
 EXPORT_SYMBOL(bt_procfs_cleanup);
 
-static struct net_proto_family bt_sock_family_ops = {
+static const struct net_proto_family bt_sock_family_ops = {
 	.owner	= THIS_MODULE,
 	.family	= PF_BLUETOOTH,
 	.create	= bt_sock_create,
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index adcad34..4674d17 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1099,7 +1099,7 @@ static int caif_create(struct net *net, struct socket *sock, int protocol,
 }
 
 
-static struct net_proto_family caif_family_ops = {
+static const struct net_proto_family caif_family_ops = {
 	.family = PF_CAIF,
 	.create = caif_create,
 	.owner = THIS_MODULE,
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index deca20f..da49191 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1985,7 +1985,7 @@ static int kcm_create(struct net *net, struct socket *sock,
 	return 0;
 }
 
-static struct net_proto_family kcm_family_ops = {
+static const struct net_proto_family kcm_family_ops = {
 	.family = PF_KCM,
 	.create = kcm_create,
 	.owner  = THIS_MODULE,
diff --git a/net/nfc/af_nfc.c b/net/nfc/af_nfc.c
index 54e40fa..d3e594e 100644
--- a/net/nfc/af_nfc.c
+++ b/net/nfc/af_nfc.c
@@ -48,7 +48,7 @@ static int nfc_sock_create(struct net *net, struct socket *sock, int proto,
 	return rc;
 }
 
-static struct net_proto_family nfc_sock_family_ops = {
+static const struct net_proto_family nfc_sock_family_ops = {
 	.owner  = THIS_MODULE,
 	.family = PF_NFC,
 	.create = nfc_sock_create,
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] kmod: don't load module unless req process has CAP_SYS_MODULE
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-05-15  2:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Mahesh Bandewar, Ingo Molnar, LKML, netdev, Eric W . Biederman,
	Kees Cook, David Miller, Eric Dumazet
In-Reply-To: <20170514104537.GA29323@kroah.com>

On Sun, May 14, 2017 at 3:45 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Fri, May 12, 2017 at 04:22:59PM -0700, Mahesh Bandewar wrote:
>> From: Mahesh Bandewar <maheshb@google.com>
>>
[...]
>>   Now try to create a bridge inside this newly created net-ns which would
>>   mean bridge module need to be loaded.
>>   # ip link add br0 type bridge
>>   # echo $?
>>   0
>>   # lsmod | grep bridge
>>   bridge                110592  0
>>   stp                    16384  1 bridge
>>   llc                    16384  2 bridge,stp
>>   #
>>
>>   After this patch -
>>   # ip link add br0 type bridge
>>   RTNETLINK answers: Operation not supported
>>   # echo $?
>>   2
>>   # lsmod | grep bridge
>>   #
>
> Well, it only loads this because the kernel asked for it to be loaded,
> right?
>
Yes, kernel asked for it because of a user action.

>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> ---
>>  kernel/kmod.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/kmod.c b/kernel/kmod.c
>> index 563f97e2be36..ac30157169b7 100644
>> --- a/kernel/kmod.c
>> +++ b/kernel/kmod.c
>> @@ -133,6 +133,9 @@ int __request_module(bool wait, const char *fmt, ...)
>>  #define MAX_KMOD_CONCURRENT 50       /* Completely arbitrary value - KAO */
>>       static int kmod_loop_msg;
>>
>> +     if (!capable(CAP_SYS_MODULE))
>> +             return -EPERM;
>
> At first glance this looks right, but I'm worried what this will break
> that currently relies on this.  There might be lots of systems that are
> used to this being the method that the needed module is requested.  What
> about when userspace asks for a random char device and that module is
> then loaded?  Does this patch break that functionality?
>
Any module when loaded gets loaded system-wide as we can't allow
module loading per-ns. To validate the behavior I was comparing it
with insmod/modprobe, if that doesn't allow because of lack of this
capability in default-ns, then this *indirect* method of loading
module should not allow the same action and the behavior should be
consistent. So with that logic if userspace asks for a random
char-device if insmod/modprobe cannot load it, then this method should
not load it either for the consistency, right?

> thanks,
>
> greg k-h

^ permalink raw reply

* Re: [PATCH 2/3] Fix ERROR: Macros with complex values should be enclosed in parentheses
From: Michael S. Tsirkin @ 2017-05-15  2:56 UTC (permalink / raw)
  To: Maciek Fijalkowski; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20170514175130.18664-2-macfij7@wp.pl>

On Sun, May 14, 2017 at 07:51:29PM +0200, Maciek Fijalkowski wrote:
> From: Maciej Fijalkowski <macfij7@wp.pl>
> 
> Signed-off-by: Maciej Fijalkowski <macfij7@wp.pl>

This is not a complex value.

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f20dfb8..6c8170c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2663,7 +2663,7 @@ static struct virtio_device_id id_table[] = {
>  };
>  
>  #define VIRTNET_FEATURES \
> -	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM, \
> +	(VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM, \
>  	VIRTIO_NET_F_MAC, \
>  	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6, \
>  	VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, \
> @@ -2672,7 +2672,7 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU
> +	VIRTIO_NET_F_MTU)
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> -- 
> 2.4.11

^ permalink raw reply

* Re: [PATCH 1/3] Fix ERROR: trailing statements should be on next line
From: Michael S. Tsirkin @ 2017-05-15  2:58 UTC (permalink / raw)
  To: Maciek Fijalkowski; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20170514175130.18664-1-macfij7@wp.pl>

On Sun, May 14, 2017 at 07:51:28PM +0200, Maciek Fijalkowski wrote:
> From: Maciej Fijalkowski <macfij7@wp.pl>
> 
> Signed-off-by: Maciej Fijalkowski <macfij7@wp.pl>

I prefer the original form - ; isn't a full statement.

> ---
>  drivers/net/virtio_net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9320d96..f20dfb8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -217,7 +217,8 @@ static void give_pages(struct receive_queue *rq, struct page *page)
>  	struct page *end;
>  
>  	/* Find end of list, sew whole thing into vi->rq.pages. */
> -	for (end = page; end->private; end = (struct page *)end->private);
> +	for (end = page; end->private; end = (struct page *)end->private)
> +		;
>  	end->private = (unsigned long)rq->pages;
>  	rq->pages = page;
>  }
> -- 
> 2.4.11

^ permalink raw reply

* Re: [PATCH 3/3] Fix ERROR: code indent should use tabs where possible
From: Michael S. Tsirkin @ 2017-05-15  3:00 UTC (permalink / raw)
  To: Maciek Fijalkowski; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20170514175130.18664-3-macfij7@wp.pl>

On Sun, May 14, 2017 at 07:51:30PM +0200, Maciek Fijalkowski wrote:
> From: Maciej Fijalkowski <macfij7@wp.pl>
> 
> Signed-off-by: Maciej Fijalkowski <macfij7@wp.pl>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

But you really want to fix the subject. Make it less verbose
drop upper case and add info about module. Something like:

virtio_net: coding style fix: use tabs for indent

Also, you misspelled Jason's email.

> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6c8170c..5d71e9f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2717,7 +2717,7 @@ static __init int virtio_net_driver_init(void)
>  	if (ret)
>  		goto err_dead;
>  
> -        ret = register_virtio_driver(&virtio_net_driver);
> +	ret = register_virtio_driver(&virtio_net_driver);
>  	if (ret)
>  		goto err_virtio;
>  	return 0;
> -- 
> 2.4.11

^ permalink raw reply

* Re: [PATCH 1/3] Fix ERROR: trailing statements should be on next line
From: Alex Williamson @ 2017-05-15  3:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, virtualization, linux-kernel, Maciek Fijalkowski
In-Reply-To: <20170515055700-mutt-send-email-mst@kernel.org>

On Mon, 15 May 2017 05:58:05 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Sun, May 14, 2017 at 07:51:28PM +0200, Maciek Fijalkowski wrote:
> > From: Maciej Fijalkowski <macfij7@wp.pl>
> > 
> > Signed-off-by: Maciej Fijalkowski <macfij7@wp.pl>  
> 
> I prefer the original form - ; isn't a full statement.
> 
> > ---
> >  drivers/net/virtio_net.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 9320d96..f20dfb8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -217,7 +217,8 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> >  	struct page *end;
> >  
> >  	/* Find end of list, sew whole thing into vi->rq.pages. */
> > -	for (end = page; end->private; end = (struct page *)end->private);
> > +	for (end = page; end->private; end = (struct page *)end->private)
> > +		;

FWIW, I generally like to put a comment on the next line to make it
abundantly clear that there's nothing in the body of the loop, it's
also more aesthetically pleasing than a semi-colon on the line by
itself, ex. /* Nothing */;  It's just too easy to misinterpret the
loop otherwise, especially without gratuitous white space.  Thanks,

Alex


> >  	end->private = (unsigned long)rq->pages;
> >  	rq->pages = page;
> >  }
> > -- 
> > 2.4.11  
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* [PATCH] net: x25: fix one potential use-after-free issue
From: linzhang @ 2017-05-15  4:12 UTC (permalink / raw)
  To: andrew.hendry, davem; +Cc: nhorman, linux-x25, netdev, linux-kernel, linzhang

The function x25_init is not properly unregister related resources
on error handler.It is will result in kernel oops if x25_init init
failed, so add right unregister call on error handler.

Signed-off-by: linzhang <xiaolou4617@gmail.com>
---
 net/x25/af_x25.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 8b911c2..e01e09a 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1811,12 +1811,14 @@ static int __init x25_init(void)
 	x25_register_sysctl();
 	rc = x25_proc_init();
 	if (rc != 0)
-		goto out_dev;
+		goto out_sysctl;
 out:
 	return rc;
-out_dev:
+out_sysctl:
+	x25_unregister_sysctl();
 	unregister_netdevice_notifier(&x25_dev_notifier);
 out_sock:
+	dev_remove_pack(&x25_packet_type);
 	sock_unregister(AF_X25);
 out_proto:
 	proto_unregister(&x25_proto);
-- 
1.8.3.1

^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH net] i40e: proper update of the page_offset field
From: Björn Töpel @ 2017-05-15  4:13 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Netdev, intel-wired-lan, Björn Töpel
In-Reply-To: <CAKgT0UdvyF5UJG9y0hWS3_JsjRTotHUEgK1bvOSiHWDSpMGggA@mail.gmail.com>

2017-05-15 2:31 GMT+02:00 Alexander Duyck <alexander.duyck@gmail.com>:
> On Sun, May 14, 2017 at 10:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> In f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
>> i40e_build_skb updates the page_offset field with an incorrect offset,
>> which can lead to data corruption. This patch updates page_offset
>> correctly.
>>
>> Note that the bug only appears on architectures where PAGE_SIZE is
>> 8192 or larger.
>>
>> Fixes: f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Minor nit. I think i40evf has the same issue and requires the same
> fix. Can you add that to this patch and resubmit? Otherwise this fix
> looks good to me.

Good point. I'll send a v2, that addresses i40evf as well.

>
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> index 29321a6167a6..cd894f4023b1 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> @@ -1854,7 +1854,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>>  #if (PAGE_SIZE < 8192)
>>         unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
>>  #else
>> -       unsigned int truesize = SKB_DATA_ALIGN(size);
>> +       unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
>> +                               SKB_DATA_ALIGN(I40E_SKB_PAD + size);
>>  #endif
>>         struct sk_buff *skb;
>>
>> --
>> 2.11.0
>>
>> _______________________________________________
>> Intel-wired-lan mailing list
>> Intel-wired-lan@osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply

* Re: [PATCH v4] net/mlx4_core: Use min3 to select number of MSI-X vectors
From: Leon Romanovsky @ 2017-05-15  4:43 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: yishaih-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20170514190133.GC28994@yuval-lap>

[-- Attachment #1: Type: text/plain, Size: 2584 bytes --]

On Sun, May 14, 2017 at 10:01:34PM +0300, Yuval Shaia wrote:
> On Fri, May 12, 2017 at 09:10:51AM +0300, Yuval Shaia wrote:
> > Signed-off-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > ---
> > v0 -> v1:
> > 	* s/"min_t("/"min_t(int"
> > v1 -> v2:
> > 	* Use min3 instead of min_t twice
> > v2 -> v3:
> > 	* Change commit log header message to reflect the changes made in
> > 	  v2
> > v3 -> v4:
> > 	* Cast return value from num_online_cpus to int to avoid
> > 	  compilation errors from "sparse"
>
> Hi Leon,
> Got your r-b for v3, can you please review v4?

What was the sparse error?

num_online_cpus() is declared as "unsigned int", so you don't need to do casting.

➜  linux-rdma git:(master) git grep num_online_cpus include/
include/linux/cpumask.h:#define num_online_cpus()       cpumask_weight(cpu_online_mask)
include/linux/cpumask.h:#define num_online_cpus()       1U

➜  linux-rdma git:(master) git grep cpumask_weight include/
include/linux/cpumask.h:static inline unsigned int cpumask_weight(const struct cpumask *srcp)

Thanks

>
> Thanks,
>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/main.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> > index 7032054..83aab1e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> > @@ -2862,12 +2862,10 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
> >  	int port = 0;
> >
> >  	if (msi_x) {
> > -		int nreq = dev->caps.num_ports * num_online_cpus() + 1;
> > -
> > -		nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
> > -			     nreq);
> > -		if (nreq > MAX_MSIX)
> > -			nreq = MAX_MSIX;
> > +		int nreq = min3(dev->caps.num_ports *
> > +				(int)num_online_cpus() + 1,
> > +				dev->caps.num_eqs - dev->caps.reserved_eqs,
> > +				MAX_MSIX);
> >
> >  		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
> >  		if (!entries)
> > --
> > 2.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH net v2] i40e/i40evf: proper update of the page_offset field
From: Björn Töpel @ 2017-05-15  4:52 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Björn Töpel, alexander.h.duyck, jeffrey.t.kirsher,
	alexander.duyck

From: Björn Töpel <bjorn.topel@intel.com>

In f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
i40e_build_skb updates the page_offset field with an incorrect offset,
which can lead to data corruption. This patch updates page_offset
correctly, by properly setting truesize.

Note that the bug only appears on architectures where PAGE_SIZE is
8192 or larger.

Fixes: f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 3 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 29321a6167a6..cd894f4023b1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1854,7 +1854,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(size);
+	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
+				SKB_DATA_ALIGN(I40E_SKB_PAD + size);
 #endif
 	struct sk_buff *skb;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index dfe241a12ad0..12b02e530503 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1190,7 +1190,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(size);
+	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
+				SKB_DATA_ALIGN(I40E_SKB_PAD + size);
 #endif
 	struct sk_buff *skb;
 
-- 
2.11.0

^ permalink raw reply related

* Re: Queries regarding TAP interface
From: Kashyap, Saurav @ 2017-05-15  5:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170512085229.3cecd2d1@xeon-e3>

Hi Stephen,
Comments inline.



>On Fri, 12 May 2017 05:02:37 +0000
>"Kashyap, Saurav" <Saurav.Kashyap@cavium.com> wrote:
>
>> Hi,
>> 
>> I am using upstream kernel 4.11.0 and base is RHEL 7.2. I am playing around with TAP devices and created it using following commands
>> =======
>> ip tuntap add tap10 mode tap
>> ip addr add 172.28.12.1/24 dev tap10
>> ip link set tap10 up
>> ethtool -s tap10 msglvl 1
>> =======
>> 
>> I enabled debug in driver and also added some prints in tun_net_xmit function. I tried to ping using tap10 but I don’t see request reaching driver. I also tried to create a bridge using brctl and link with physical ethernet interface but it didn’t work.
>> 
>> Second experiment I did was to create two tap device on same system and try pinging one from another, that also didn’t worked. 
>> I have following queries
>> 1) Is I am missing something? 
>> 2) Is is possible to hook my own tx/rx to that tap device, so that data packets actually goes on wire?
>> 3) Does DHCP works with tap devices?
>> 
>> 
>> Thanks,
>> ~Saurav
>> 
>
>Don't you need an application reading/writing the tap device?
<SK> So ping is not intelligent enough to work with TAP? I had wrote a simple tap program that open a tun device (/dev/net/tun) and does read/write. With this program I can see that calls are going to the driver. I ran this program and from other terminal I tried ping but ping failed. Does ping/arping/dhcp utilities work with TAP interface?

>The carrier state of tap device is up/down based on whether the corresponding tap device is open

<SK> Yes, thank for an info.

Thanks,
~Saurav

^ permalink raw reply

* Re: [PATCH] net: fec: select queue depending on VLAN priority
From: Stefan Agner @ 2017-05-15  5:39 UTC (permalink / raw)
  To: Andy Duan; +Cc: David Miller, andrew, festevam, netdev, linux-kernel
In-Reply-To: <AM4PR0401MB2260CCF651251378BBE13CB7FFED0@AM4PR0401MB2260.eurprd04.prod.outlook.com>

On 2017-05-10 21:49, Andy Duan wrote:
> From: Stefan Agner <stefan@agner.ch> Sent: Thursday, May 11, 2017 12:08 PM
>>To: Andy Duan <fugang.duan@nxp.com>
>>Cc: David Miller <davem@davemloft.net>; andrew@lunn.ch;
>>festevam@gmail.com; netdev@vger.kernel.org; linux-
>>kernel@vger.kernel.org
>>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority
>>
>>On 2017-05-09 19:42, Andy Duan wrote:
>>> From: David Miller <davem@davemloft.net> Sent: Tuesday, May 09, 2017
>>> 9:39 PM
>>>>To: stefan@agner.ch
>>>>Cc: Andy Duan <fugang.duan@nxp.com>; andrew@lunn.ch;
>>>>festevam@gmail.com; netdev@vger.kernel.org; linux-
>>>>kernel@vger.kernel.org
>>>>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority
>>>>
>>>>From: Stefan Agner <stefan@agner.ch>
>>>>Date: Mon,  8 May 2017 22:37:08 -0700
>>>>
>>>>> Since the addition of the multi queue code with commit 59d0f7465644
>>>>> ("net: fec: init multi queue date structure") the queue selection
>>>>> has been handelt by the default transmit queue selection
>>>>> implementation which tries to evenly distribute the traffic across
>>>>> all available queues. This selection presumes that the queues are
>>>>> using an equal priority, however, the queues 1 and 2 are actually of
>>>>> higher priority (the classification of the queues is enabled in
>>fec_enet_enable_ring).
>>>>>
>>>>> This can lead to net scheduler warnings and continuous TX ring dumps
>>>>> when exercising the system with iperf.
>>>>>
>>>>> Use only queue 0 for all common traffic (no VLAN and P802.1p
>>>>> priority
>>>>> 0 and 1) and route level 2-7 through queue 1 and 2.
>>>>>
>>>>> Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
>>>>> Fixes: 59d0f7465644 ("net: fec: init multi queue date structure")
>>>>
>>>>If the queues are used for prioritization, and it does not have
>>>>multiple normal priority level queues, multiqueue is not what the
>>>>driver should have implemented.
>>> Firstly, HW multiple queues support:
>>> 	- Traffic-shaping bandwidth distribution supports credit-based and
>>> round-robin-based policies. Either policy can be combined with
>>> time-based shaping.
>>> 	- AVB (Audio Video Bridging, IEEE 802.1Qav) features:
>>> 		* Credit-based bandwidth distribution policy can be combined
>>with
>>> time-based shaping
>>> 		* AVB endpoint talker and listener support
>>> 		* Support for arbitration between different priority traffic (for
>>> example, AVB class A, AVB class B, and non-AVB) Round-robin-based
>>> policies:
>>> 	It has the same priority for three queues: In the round-robin QoS
>>> scheme, each queue is given an equal opportunity to transmit one
>>> frame. For example, if queue n has a frame to transmit, the queue
>>> transmits its frame. After queue n has transmitted its frame, or if
>>> queue n does not have a frame to transmit, queue n+1 is then allowed
>>> to transmit its frame, and so on.
>>>
>>> Credit-based policies:
>>> 	The AVB credit based shaper acts independently, per class, to control
>>> the bandwidth distribution between normal traffic and time-sensitive
>>> traffic with respect to the total link bandwidth available.
>>> 	Credit based shaper conbined with time-based shaping:
>>> 		- priority: ClassA queue > ClassB queue > best effort
>>> 		- ensure the queue bandwidth as user set based on time-
>>based shaping
>>> algorithms (transmitter transmit frame from three queue in turn based
>>> on time-based shaping algorithms)
>>> 	And in real AVB case,  each streaming can be independent, and are
>>> fixed on related queue. Then driver level should implement
>>> .ndo_select_queue() to put the streaming into related queue. That is
>>> what the patch did.
>>>
>>> The current driver config the three queue to credit-based policies
>>> (AVB), the patch seems no problem for the implementation. Do you have
>>> any suggestion ?
>>>
>>
>>I tried using the round robin mode by adding this:
>>
>>+       /* Set Round-Robin policy */
>>+       writel(1, fep->hwp + FEC_QOS_SCHEME);
>>
>>After a while I got the warning non the less:
>>
>>WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316
>>dev_watchdog+0x248/0x24c NETDEV WATCHDOG: eth0 (fec): transmit queue
>>2 timed out Modules linked in:
>>CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00058-g56d22eced8bc-
>>dirty #377 Hardware name: Freescale i.MX7 Dual (Device Tree) [<c02266b0>]
>>(unwind_backtrace) from [<c0222d7c>] (show_stack+0x10/0x14) [<c0222d7c>]
>>(show_stack) from [<c04d4098>] (dump_stack+0x78/0x8c) [<c04d4098>]
>>(dump_stack) from [<c0236548>] (__warn+0xe8/0x100) [<c0236548>]
>>(__warn) from [<c0236598>] (warn_slowpath_fmt+0x38/0x48) [<c0236598>]
>>(warn_slowpath_fmt) from [<c0805904>]
>>(dev_watchdog+0x248/0x24c)
>>[<c0805904>] (dev_watchdog) from [<c028a0e8>] (call_timer_fn+0x28/0x98)
>>[<c028a0e8>] (call_timer_fn) from [<c028a1f8>] (expire_timers+0xa0/0xac)
>>[<c028a1f8>] (expire_timers) from [<c028a2a0>]
>>(run_timer_softirq+0x9c/0x194)
>>[<c028a2a0>] (run_timer_softirq) from [<c023aaf8>]
>>(__do_softirq+0x114/0x234)
>>[<c023aaf8>] (__do_softirq) from [<c023aee4>] (irq_exit+0xcc/0x108)
>>[<c023aee4>] (irq_exit) from [<c0279920>]
>>(__handle_domain_irq+0x80/0xec)
>>[<c0279920>] (__handle_domain_irq) from [<c0201544>]
>>(gic_handle_irq+0x48/0x8c)
>>[<c0201544>] (gic_handle_irq) from [<c0223838>] (__irq_svc+0x58/0x8c)
>>Exception stack(0xc1001f28 to 0xc1001f70)
>>1f20:                   00000001 00000000 00000000 c022fdc0 c1000000
>>c1003d80
>>1f40: c1003d34 c0e72ed0 c0bd9b04 c1001f80 00000000 00000000 00000000
>>c1001f78
>>1f60: c022048c c0220490 60000013 ffffffff [<c0223838>] (__irq_svc) from
>>[<c0220490>] (arch_cpu_idle+0x38/0x3c) [<c0220490>] (arch_cpu_idle) from
>>[<c026ec60>] (do_idle+0x170/0x204) [<c026ec60>] (do_idle) from
>>[<c026efac>] (cpu_startup_entry+0x18/0x1c) [<c026efac>]
>>(cpu_startup_entry) from [<c0e00c88>]
>>(start_kernel+0x394/0x3a0)
>>---[ end trace a474f341d40e0705 ]---
>>fec 30be0000.ethernet eth0: TX ring dump
>>
>>I disabled the regular ring dump and only printed one line. It seems to come
>>up every 2 seconds, and checking cat /proc/interrupts showed that queue 2
>>stayed at its last value (3562218):
>>
>> 58:    3091320     GIC-0 150 Level     30be0000.ethernet
>> 59:    3562218     GIC-0 151 Level     30be0000.ethernet
>> 60:   13377922     GIC-0 152 Level     30be0000.ethernet
> 
> Pls check ENETx_DMAnCFG[16] whether is set, and disable time-based
> shaping for round robin.

When I do not set ENETx_DMAnCFG[16] (DMA_CLASS_EN), then networking
stops working (I cannot mount NFS).

Time-based is disabled by default I guess?

^ permalink raw reply

* Re: [PATCH] kmod: don't load module unless req process has CAP_SYS_MODULE
From: Greg Kroah-Hartman @ 2017-05-15  6:10 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश बंडेवार)
  Cc: Mahesh Bandewar, Ingo Molnar, LKML, netdev, Eric W . Biederman,
	Kees Cook, David Miller, Eric Dumazet
In-Reply-To: <CAF2d9jh6zARipRGM92drOjjrpicF6177cnUf=6oHbR9tHX1ObA@mail.gmail.com>

On Sun, May 14, 2017 at 07:42:08PM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Sun, May 14, 2017 at 3:45 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Fri, May 12, 2017 at 04:22:59PM -0700, Mahesh Bandewar wrote:
> >> From: Mahesh Bandewar <maheshb@google.com>
> >>
> [...]
> >>   Now try to create a bridge inside this newly created net-ns which would
> >>   mean bridge module need to be loaded.
> >>   # ip link add br0 type bridge
> >>   # echo $?
> >>   0
> >>   # lsmod | grep bridge
> >>   bridge                110592  0
> >>   stp                    16384  1 bridge
> >>   llc                    16384  2 bridge,stp
> >>   #
> >>
> >>   After this patch -
> >>   # ip link add br0 type bridge
> >>   RTNETLINK answers: Operation not supported
> >>   # echo $?
> >>   2
> >>   # lsmod | grep bridge
> >>   #
> >
> > Well, it only loads this because the kernel asked for it to be loaded,
> > right?
> >
> Yes, kernel asked for it because of a user action.

Which is good, that's the way it is supposed to work.

> >> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >> ---
> >>  kernel/kmod.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/kernel/kmod.c b/kernel/kmod.c
> >> index 563f97e2be36..ac30157169b7 100644
> >> --- a/kernel/kmod.c
> >> +++ b/kernel/kmod.c
> >> @@ -133,6 +133,9 @@ int __request_module(bool wait, const char *fmt, ...)
> >>  #define MAX_KMOD_CONCURRENT 50       /* Completely arbitrary value - KAO */
> >>       static int kmod_loop_msg;
> >>
> >> +     if (!capable(CAP_SYS_MODULE))
> >> +             return -EPERM;
> >
> > At first glance this looks right, but I'm worried what this will break
> > that currently relies on this.  There might be lots of systems that are
> > used to this being the method that the needed module is requested.  What
> > about when userspace asks for a random char device and that module is
> > then loaded?  Does this patch break that functionality?
> >
> Any module when loaded gets loaded system-wide as we can't allow
> module loading per-ns.

That's the joys of "namespaces" :)

> To validate the behavior I was comparing it
> with insmod/modprobe, if that doesn't allow because of lack of this
> capability in default-ns, then this *indirect* method of loading
> module should not allow the same action and the behavior should be
> consistent. So with that logic if userspace asks for a random
> char-device if insmod/modprobe cannot load it, then this method should
> not load it either for the consistency, right?

No, that would break things that are expecting this type of
functionality, right?

What is the "problem" with loading kernel modules when userspace asks
for the functionality involved in them?  There has been some work with
the LSM interface to disallow this if so desired, why not just use that
instead?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] kmod: don't load module unless req process has CAP_SYS_MODULE
From: Greg Kroah-Hartman @ 2017-05-15  6:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Mahesh Bandewar, Ingo Molnar, LKML, netdev, Kees Cook,
	David Miller, Eric Dumazet, Mahesh Bandewar
In-Reply-To: <87d1bbo81d.fsf@xmission.com>

On Sun, May 14, 2017 at 08:57:34AM -0500, Eric W. Biederman wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> > On Fri, May 12, 2017 at 04:22:59PM -0700, Mahesh Bandewar wrote:
> >> From: Mahesh Bandewar <maheshb@google.com>
> >> 
> >> A process inside random user-ns should not load a module, which is
> >> currently possible. As demonstrated in following scenario -
> >> 
> >>   Create namespaces; especially a user-ns and become root inside.
> >>   $ unshare -rfUp -- unshare -unm -- bash
> >> 
> >>   Try to load the bridge module. It should fail and this is expected!
> >>   #  modprobe bridge
> >>   WARNING: Error inserting stp (/lib/modules/4.11.0-smp-DEV/kernel/net/802/stp.ko): Operation not permitted
> >>   FATAL: Error inserting bridge (/lib/modules/4.11.0-smp-DEV/kernel/net/bridge/bridge.ko): Operation not permitted
> >> 
> >>   Verify bridge module is not loaded.
> >>   # lsmod | grep bridge
> >>   #
> >> 
> >>   Now try to create a bridge inside this newly created net-ns which would
> >>   mean bridge module need to be loaded.
> >>   # ip link add br0 type bridge
> >>   # echo $?
> >>   0
> >>   # lsmod | grep bridge
> >>   bridge                110592  0
> >>   stp                    16384  1 bridge
> >>   llc                    16384  2 bridge,stp
> >>   #
> >> 
> >>   After this patch -
> >>   # ip link add br0 type bridge
> >>   RTNETLINK answers: Operation not supported
> >>   # echo $?
> >>   2
> >>   # lsmod | grep bridge
> >>   #
> >
> > Well, it only loads this because the kernel asked for it to be loaded,
> > right?
> >
> >> 
> >> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >> ---
> >>  kernel/kmod.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/kernel/kmod.c b/kernel/kmod.c
> >> index 563f97e2be36..ac30157169b7 100644
> >> --- a/kernel/kmod.c
> >> +++ b/kernel/kmod.c
> >> @@ -133,6 +133,9 @@ int __request_module(bool wait, const char *fmt, ...)
> >>  #define MAX_KMOD_CONCURRENT 50	/* Completely arbitrary value - KAO */
> >>  	static int kmod_loop_msg;
> >>  
> >> +	if (!capable(CAP_SYS_MODULE))
> >> +		return -EPERM;
> >
> > At first glance this looks right, but I'm worried what this will break
> > that currently relies on this.  There might be lots of systems that are
> > used to this being the method that the needed module is requested.  What
> > about when userspace asks for a random char device and that module is
> > then loaded?  Does this patch break that functionality?
> 
> For the specific example give I think we would be better served by
> adding a capability check at the call site.  In this case CAP_NET_ADMIN
> as those are the capabilities iproute traditionally has.
> 
> We have something similar in dev_load in already in the networking code.
> 
> This limits the people who can't load modules to root user in user
> namespaces.  I would be fine with any other code paths in a user
> namespace getting a similar treatment.
> 
> Eric
> 
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index bcb0f610ee42..6b72528a4636 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2595,7 +2595,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>                 if (!ops) {
>  #ifdef CONFIG_MODULES
> -                       if (kind[0]) {
> +                       if (kind[0] && capable(CAP_NET_ADMIN)) {
>                                 __rtnl_unlock();
>                                 request_module("rtnl-link-%s", kind);
>                                 rtnl_lock();

I don't object to this if the networking developers don't mind the
change in functionality.  They can handle the fallout :)

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
From: Sagi Grimberg @ 2017-05-15  6:41 UTC (permalink / raw)
  To: David Miller, hch-jcswGhMUV9g
  Cc: Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
In-Reply-To: <20170514.115116.499149210596634881.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Hi Dave,

>> this patch has not been superceeded by anything, can you explain why
>> it has been marked as such in patchworks?
>
> I think you're being overbearing by requiring this to be marked BROKEN
> and I would like you to explore other ways with the authors to fix
> whatever perceived problems you think SMC has.

Well, its not one's opinion, its a real problem. To be fair, this
security breach existed in other RDMA based storage protocols for
years in the past, but we cleaned it up completely.

Our assumption is that *if* the user is willingly choosing to expose its
entire physical address space to remote access to get some performance
boost that's fine, but to have the kernel expose it by default, without
even letting the user to control it is plain irresponsible. IMHO we
should *not* repeat the mistakes of the past and set a higher bar for
RDMA protocol implementations.

> You claim that this is somehow "urgent" is false.  You can ask
> distributions to disable SMC or whatever in the short term if it
> reallly, truly, bothers you.

I doubt that is sufficient given that not all implementations
out there rely on distros. I'm afraid one time is too much in
this case.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
From: Leon Romanovsky @ 2017-05-15  7:18 UTC (permalink / raw)
  To: David Miller
  Cc: hch-jcswGhMUV9g, Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
In-Reply-To: <20170514.115116.499149210596634881.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Sun, May 14, 2017 at 11:51:16AM -0400, David Miller wrote:
> From: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Date: Sun, 14 May 2017 07:58:48 +0200
>
> > this patch has not been superceeded by anything, can you explain why
> > it has been marked as such in patchworks?
>
> I think you're being overbearing by requiring this to be marked BROKEN
> and I would like you to explore other ways with the authors to fix
> whatever perceived problems you think SMC has.

Hi Dave,

Christoph proposed very clear way to remove BROKEN tag, by providing
secure environment by default to all users.

All ULPs in RDMA and lustre in staging [1] are working in secure mode
by default and I don't understand why SMC-R should be different.

From my experience such "BROKEN" tag will help for authors to get proper
priority from their managers to fix it in a first place, before they are
moved to anything else.

>
> You claim that this is somehow "urgent" is false.  You can ask
> distributions to disable SMC or whatever in the short term if it
> reallly, truly, bothers you.

It is not distributions only. For example, RDMA stack is released by
different providers. Mellanox OFED [2] is one of them and it is based on
upstream kernel.

Thanks

[1] e0ccf5d085ab ("staging: lustre: ko2iblnd: Adapt to the removal of ib_get_dma_mr()")
[2] http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox