Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [v2 PATCH 1/1] tg3: fix meaningless hw_stats reading after tg3_halt memset 0 hw_stats
From: Zumeng Chen @ 2018-05-05  2:40 UTC (permalink / raw)
  To: Michael Chan
  Cc: Netdev, open list, Siva Reddy Kallam,
	prashant.sreedharan@broadcom.com, David Miller
In-Reply-To: <CACKFLi=Rv_+Gig5UpdnpbOPcyLyGr3TYGUbFEmeGcZWAk3YZRQ@mail.gmail.com>

On 05/03/2018 01:04 PM, Michael Chan wrote:
> On Wed, May 2, 2018 at 5:30 PM, Zumeng Chen <zumeng.chen@gmail.com> wrote:
>> On 2018年05月03日 01:32, Michael Chan wrote:
>>> On Wed, May 2, 2018 at 3:27 AM, Zumeng Chen <zumeng.chen@gmail.com> wrote:
>>>> On 2018年05月02日 13:12, Michael Chan wrote:
>>>>> On Tue, May 1, 2018 at 5:42 PM, Zumeng Chen <zumeng.chen@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> diff --git a/drivers/net/ethernet/broadcom/tg3.h
>>>>>> b/drivers/net/ethernet/broadcom/tg3.h
>>>>>> index 3b5e98e..c61d83c 100644
>>>>>> --- a/drivers/net/ethernet/broadcom/tg3.h
>>>>>> +++ b/drivers/net/ethernet/broadcom/tg3.h
>>>>>> @@ -3102,6 +3102,7 @@ enum TG3_FLAGS {
>>>>>>            TG3_FLAG_ROBOSWITCH,
>>>>>>            TG3_FLAG_ONE_DMA_AT_ONCE,
>>>>>>            TG3_FLAG_RGMII_MODE,
>>>>>> +       TG3_FLAG_HALT,
>>>>> I think you should be able to use the existing INIT_COMPLETE flag
>>>>
>>>> No,  it will bring the uncertain factors into the existed complicate
>>>> logic
>>>> of INIT_COMPLETE.
>>>> And I think it's very simple logic here to fix the meaningless hw_stats
>>>> reading and the problem
>>>> of commit f5992b72. I even suspect if you have read INIT_COMPLETE related
>>>> codes carefully.
>>>>
>>> We should use an existing flag whenever appropriate
>>
>> I disagree. This is sort of blahblah...
> I don't want to see another flag added that is practically the same as
> !INIT_COMPLETE.  The driver already has close to one hundred flags.
> Adding a new flag that is similar to an existing flag will just make
> the code more difficult to understand and maintain.
>
> If you don't want to fix it the cleaner way, Siva or I will fix it.

Feel free to go, I just take a double look, INIT_COMPLETE can directly 
be used as follows:

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index 08bbb63..0e04fd7 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8733,14 +8733,11 @@ static void tg3_free_consistent(struct tg3 *tp)
         tg3_mem_rx_release(tp);
         tg3_mem_tx_release(tp);

-       /* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
-       tg3_full_lock(tp, 0);
         if (tp->hw_stats) {
                 dma_free_coherent(&tp->pdev->dev, sizeof(struct 
tg3_hw_stats),
                                   tp->hw_stats, tp->stats_mapping);
                 tp->hw_stats = NULL;
         }
-       tg3_full_unlock(tp);
  }

  /*
@@ -14178,7 +14175,7 @@ static void tg3_get_stats64(struct net_device *dev,
         struct tg3 *tp = netdev_priv(dev);

         spin_lock_bh(&tp->lock);
-       if (!tp->hw_stats) {
+       if (!tp->hw_stats || tg3_flag(tp, INIT_COMPLETE)) {
                 *stats = tp->net_stats_prev;
                 spin_unlock_bh(&tp->lock);
                 return;

Cheers,
Zumeng

^ permalink raw reply related

* [PATCH] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
From: Sun Lianwen @ 2018-05-05  3:29 UTC (permalink / raw)
  To: davem; +Cc: netdev

The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s".

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
---
 net/9p/mod.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/9p/mod.c b/net/9p/mod.c
index 6ab36aea7727..eb9777f05755 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -104,7 +104,7 @@ EXPORT_SYMBOL(v9fs_unregister_trans);
 
 /**
  * v9fs_get_trans_by_name - get transport with the matching name
- * @name: string identifying transport
+ * @s: string identifying transport
  *
  */
 struct p9_trans_module *v9fs_get_trans_by_name(char *s)
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Jann Horn @ 2018-05-05  4:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Linus Torvalds,
	Greg Kroah-Hartman, Andy Lutomirski, Network Development,
	kernel list, kernel-team
In-Reply-To: <20180503043604.1604587-2-ast@kernel.org>

On Thu, May 3, 2018 at 12:36 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> Introduce helper:
> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> struct umh_info {
>        struct file *pipe_to_umh;
>        struct file *pipe_from_umh;
>        pid_t pid;
> };
>
> that GPLed kernel modules (signed or unsigned) can use it to execute part
> of its own data as swappable user mode process.
>
> The kernel will do:
> - mount "tmpfs"
> - allocate a unique file in tmpfs
> - populate that file with [data, data + len] bytes
> - user-mode-helper code will do_execve that file and, before the process
>   starts, the kernel will create two unix pipes for bidirectional
>   communication between kernel module and umh
> - close tmpfs file, effectively deleting it
> - the fork_usermode_blob will return zero on success and populate
>   'struct umh_info' with two unix pipes and the pid of the user process
>
> As the first step in the development of the bpfilter project
> the fork_usermode_blob() helper is introduced to allow user mode code
> to be invoked from a kernel module. The idea is that user mode code plus
> normal kernel module code are built as part of the kernel build
> and installed as traditional kernel module into distro specified location,
> such that from a distribution point of view, there is
> no difference between regular kernel modules and kernel modules + umh code.
> Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> by a kernel module doesn't make it any special from kernel and user space
> tooling point of view.
[...]
> +static struct vfsmount *umh_fs;
> +
> +static int init_tmpfs(void)
> +{
> +       struct file_system_type *type;
> +
> +       if (umh_fs)
> +               return 0;
> +       type = get_fs_type("tmpfs");
> +       if (!type)
> +               return -ENODEV;
> +       umh_fs = kern_mount(type);
> +       if (IS_ERR(umh_fs)) {
> +               int err = PTR_ERR(umh_fs);
> +
> +               put_filesystem(type);
> +               umh_fs = NULL;
> +               return err;
> +       }
> +       return 0;
> +}

Should init_tmpfs() be holding some sort of mutex if it's fiddling
with `umh_fs`? The current code only calls it in initcall context, but
if that ever changes and two processes try to initialize the tmpfs at
the same time, a few things could go wrong.
I guess Luis' suggestion (putting a call to init_tmpfs() in
do_basic_setup()) might be the easiest way to get rid of that problem.

> +static int alloc_tmpfs_file(size_t size, struct file **filp)
> +{
> +       struct file *file;
> +       int err;
> +
> +       err = init_tmpfs();
> +       if (err)
> +               return err;
> +       file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> +       if (IS_ERR(file))
> +               return PTR_ERR(file);
> +       *filp = file;
> +       return 0;
> +}

^ permalink raw reply

* Re: [PATCH iproute2] rdma: fix header files
From: Stephen Hemminger @ 2018-05-05  4:58 UTC (permalink / raw)
  To: David Ahern; +Cc: swise, netdev
In-Reply-To: <d51c9746-d3e9-48f5-fc93-b63e4c5c7a8c@gmail.com>

On Fri, 4 May 2018 16:13:07 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 5/4/18 3:56 PM, Stephen Hemminger wrote:
> > All user api headers in iproute2 should be in include/uapi
> > so that script can be used to put correct sanitized kernel headers
> > there. And the header files for rdma must be a complete set; if one
> > header file includes another, all must be present.
> > 
> > This fixes build on older distributions, and Windows Services
> > for Linux.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  include/uapi/rdma/ib_user_sa.h                |   77 ++
> >  include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
> >  .../uapi/rdma/rdma_netlink.h                  |   13 +
> >  .../uapi/rdma/rdma_user_cm.h                  |    6 +-
> >  4 files changed, 1303 insertions(+), 3 deletions(-)
> >  create mode 100644 include/uapi/rdma/ib_user_sa.h
> >  create mode 100644 include/uapi/rdma/ib_user_verbs.h
> >  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
> >  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> >   
> 
> Stephen:
> 
> Per a recent discussion the RDMA folks need to take ownership of the
> uapi files. RDMA features do not hit Dave's net-next tree so the rdma
> code can never hit iproute2-next during a dev cycle.

I want all uapi headers in include/uapi because it avoids possible overlap problems,
During the linux-net/linus release cycle they should match what is Linus's tree.

During the net-next they can come from two sources.

^ permalink raw reply

* Re: [PATCH net-next v2] net: core: rework basic flow dissection helper
From: kbuild test robot @ 2018-05-05  5:21 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: kbuild-all, netdev, David S. Miller, Eric Dumazet, Jason Wang
In-Reply-To: <e87a05fed35c9cb8e60fe8f867c43b27b3044e21.1525426286.git.pabeni@redhat.com>

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-core-rework-basic-flow-dissection-helper/20180505-090417
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/net/tipc.h:55:30: sparse: incorrect type in return expression (different base types) @@    expected unsigned int @@    got restriunsigned int @@
   include/net/tipc.h:55:30:    expected unsigned int
   include/net/tipc.h:55:30:    got restricted __be32 <noident>
   net/core/flow_dissector.c:840:48: sparse: incorrect type in assignment (different base types) @@    expected restricted __be32 [usertype] key @@    got  [usertype] key @@
   net/core/flow_dissector.c:840:48:    expected restricted __be32 [usertype] key
   net/core/flow_dissector.c:840:48:    got unsigned int
   net/core/flow_dissector.c:1035:30: sparse: expression using sizeof(void)
   net/core/flow_dissector.c:1035:30: sparse: expression using sizeof(void)
   net/core/flow_dissector.c:1276:25: sparse: expression using sizeof(void)
>> net/core/flow_dissector.c:1319:59: sparse: Using plain integer as NULL pointer

vim +1319 net/core/flow_dissector.c

  1305	
  1306	/**
  1307	 * skb_get_poff - get the offset to the payload
  1308	 * @skb: sk_buff to get the payload offset from
  1309	 *
  1310	 * The function will get the offset to the payload as far as it could
  1311	 * be dissected.  The main user is currently BPF, so that we can dynamically
  1312	 * truncate packets without needing to push actual payload to the user
  1313	 * space and can analyze headers only, instead.
  1314	 */
  1315	u32 skb_get_poff(const struct sk_buff *skb)
  1316	{
  1317		struct flow_keys_basic keys;
  1318	
> 1319		if (!skb_flow_dissect_flow_keys_basic(skb, &keys, 0, 0, 0, 0, 0))
  1320			return 0;
  1321	
  1322		return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
  1323	}
  1324	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* [PATCHv2 net] sctp: delay the authentication for the duplicated cookie-echo chunk
From: Xin Long @ 2018-05-05  6:59 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman

Now sctp only delays the authentication for the normal cookie-echo
chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
authentication first based on the old asoc, which will definitely
fail due to the different auth info in the old asoc.

The duplicated cookie-echo chunk will create a new asoc with the
auth info from this chunk, and the authentication should also be
done with the new asoc's auth info for all of the collision 'A',
'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
will never pass the authentication and create the new connection.

This issue exists since very beginning, and this fix is to make
sctp_assoc_bh_rcv() follow the way sctp_endpoint_bh_rcv() does
for the normal cookie-echo chunk to delay the authentication.

While at it, remove the unused params from sctp_sf_authenticate()
and define sctp_auth_chunk_verify() used for all the places that
do the delayed authentication.

v1->v2:
  fix the typo in changelog as Marcelo noticed.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/associola.c    | 30 ++++++++++++++++-
 net/sctp/sm_statefuns.c | 86 ++++++++++++++++++++++++++-----------------------
 2 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 837806d..a47179d 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 	struct sctp_endpoint *ep;
 	struct sctp_chunk *chunk;
 	struct sctp_inq *inqueue;
-	int state;
+	int first_time = 1;	/* is this the first time through the loop */
 	int error = 0;
+	int state;
 
 	/* The association should be held so we should be safe. */
 	ep = asoc->ep;
@@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 		state = asoc->state;
 		subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
 
+		/* If the first chunk in the packet is AUTH, do special
+		 * processing specified in Section 6.3 of SCTP-AUTH spec
+		 */
+		if (first_time && subtype.chunk == SCTP_CID_AUTH) {
+			struct sctp_chunkhdr *next_hdr;
+
+			next_hdr = sctp_inq_peek(inqueue);
+			if (!next_hdr)
+				goto normal;
+
+			/* If the next chunk is COOKIE-ECHO, skip the AUTH
+			 * chunk while saving a pointer to it so we can do
+			 * Authentication later (during cookie-echo
+			 * processing).
+			 */
+			if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
+				chunk->auth_chunk = skb_clone(chunk->skb,
+							      GFP_ATOMIC);
+				chunk->auth = 1;
+				continue;
+			}
+		}
+
+normal:
 		/* SCTP-AUTH, Section 6.3:
 		 *    The receiver has a list of chunk types which it expects
 		 *    to be received only after an AUTH-chunk.  This list has
@@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 		/* If there is an error on chunk, discard this packet. */
 		if (error && chunk)
 			chunk->pdiscard = 1;
+
+		if (first_time)
+			first_time = 0;
 	}
 	sctp_association_put(asoc);
 }
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 28c070e..c9ae340 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
 					struct sctp_cmd_seq *commands);
 
 static enum sctp_ierror sctp_sf_authenticate(
-					struct net *net,
-					const struct sctp_endpoint *ep,
 					const struct sctp_association *asoc,
-					const union sctp_subtype type,
 					struct sctp_chunk *chunk);
 
 static enum sctp_disposition __sctp_sf_do_9_1_abort(
@@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
 	return SCTP_DISPOSITION_CONSUME;
 }
 
+static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
+				   const struct sctp_association *asoc)
+{
+	struct sctp_chunk auth;
+
+	if (!chunk->auth_chunk)
+		return true;
+
+	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
+	 * is supposed to be authenticated and we have to do delayed
+	 * authentication.  We've just recreated the association using
+	 * the information in the cookie and now it's much easier to
+	 * do the authentication.
+	 */
+
+	/* Make sure that we and the peer are AUTH capable */
+	if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
+		return false;
+
+	/* set-up our fake chunk so that we can process it */
+	auth.skb = chunk->auth_chunk;
+	auth.asoc = chunk->asoc;
+	auth.sctp_hdr = chunk->sctp_hdr;
+	auth.chunk_hdr = (struct sctp_chunkhdr *)
+				skb_push(chunk->auth_chunk,
+					 sizeof(struct sctp_chunkhdr));
+	skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
+	auth.transport = chunk->transport;
+
+	return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
+}
+
 /*
  * Respond to a normal COOKIE ECHO chunk.
  * We are the side that is being asked for an association.
@@ -763,37 +792,9 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 	if (error)
 		goto nomem_init;
 
-	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
-	 * is supposed to be authenticated and we have to do delayed
-	 * authentication.  We've just recreated the association using
-	 * the information in the cookie and now it's much easier to
-	 * do the authentication.
-	 */
-	if (chunk->auth_chunk) {
-		struct sctp_chunk auth;
-		enum sctp_ierror ret;
-
-		/* Make sure that we and the peer are AUTH capable */
-		if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
-			sctp_association_free(new_asoc);
-			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
-		}
-
-		/* set-up our fake chunk so that we can process it */
-		auth.skb = chunk->auth_chunk;
-		auth.asoc = chunk->asoc;
-		auth.sctp_hdr = chunk->sctp_hdr;
-		auth.chunk_hdr = (struct sctp_chunkhdr *)
-					skb_push(chunk->auth_chunk,
-						 sizeof(struct sctp_chunkhdr));
-		skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
-		auth.transport = chunk->transport;
-
-		ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
-		if (ret != SCTP_IERROR_NO_ERROR) {
-			sctp_association_free(new_asoc);
-			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
-		}
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
+		sctp_association_free(new_asoc);
+		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
 	}
 
 	repl = sctp_make_cookie_ack(new_asoc, chunk);
@@ -1797,13 +1798,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
 	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
 		goto nomem;
 
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Make sure no new addresses are being added during the
 	 * restart.  Though this is a pretty complicated attack
 	 * since you'd have to get inside the cookie.
 	 */
-	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
+	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
 		return SCTP_DISPOSITION_CONSUME;
-	}
 
 	/* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
 	 * the peer has restarted (Action A), it MUST NOT setup a new
@@ -1912,6 +1915,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
 	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
 		goto nomem;
 
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Update the content of current association.  */
 	sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
 	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
@@ -2009,6 +2015,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
 	 * a COOKIE ACK.
 	 */
 
+	if (!sctp_auth_chunk_verify(net, chunk, asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Don't accidentally move back into established state. */
 	if (asoc->state < SCTP_STATE_ESTABLISHED) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
@@ -4171,10 +4180,7 @@ enum sctp_disposition sctp_sf_eat_fwd_tsn_fast(
  * The return value is the disposition of the chunk.
  */
 static enum sctp_ierror sctp_sf_authenticate(
-					struct net *net,
-					const struct sctp_endpoint *ep,
 					const struct sctp_association *asoc,
-					const union sctp_subtype type,
 					struct sctp_chunk *chunk)
 {
 	struct sctp_shared_key *sh_key = NULL;
@@ -4275,7 +4281,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
 						  commands);
 
 	auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
-	error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
+	error = sctp_sf_authenticate(asoc, chunk);
 	switch (error) {
 	case SCTP_IERROR_AUTH_BAD_HMAC:
 		/* Generate the ERROR chunk and discard the rest
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net] sctp: delay the authentication for the duplicated cookie-echo chunk
From: Xin Long @ 2018-05-05  7:01 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <20180504223337.GO5105@localhost.localdomain>

On Sat, May 5, 2018 at 6:33 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Fri, May 04, 2018 at 05:05:10PM +0800, Xin Long wrote:
>> Now sctp only delays the authentication for the normal cookie-echo
>> chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
>> for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
>> authentication first based on the old asoc, which will definitely
>> fail due to the different auth info in the old asoc.
>>
>> The duplicated cookie-echo chunk will create a new asoc with the
>> auth info from this chunk, and the authentication should also be
>> done with the new asoc's auth info for all of the collision 'A',
>> 'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
>> will never pass the authentication and create the new connection.
>>
>> This issue exists since very beginning, and this fix is to make
>> sctp_assoc_bh_rcv() follow the way sctp_assoc_bh_rcv() does for
>    I guess you meant sctp_endpoint_bh_rcv here --^ right?
have posted v2, thanks.

>
> Other than this LGTM
>
>> the normal cookie-echo chunk to delay the authentication.
>>
>> While at it, remove the unused params from sctp_sf_authenticate()
>> and define sctp_auth_chunk_verify() used for all the places that
>> do the delayed authentication.
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>>  net/sctp/associola.c    | 30 ++++++++++++++++-
>>  net/sctp/sm_statefuns.c | 86 ++++++++++++++++++++++++++-----------------------
>>  2 files changed, 75 insertions(+), 41 deletions(-)
>>
>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>> index 837806d..a47179d 100644
>> --- a/net/sctp/associola.c
>> +++ b/net/sctp/associola.c
>> @@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>>       struct sctp_endpoint *ep;
>>       struct sctp_chunk *chunk;
>>       struct sctp_inq *inqueue;
>> -     int state;
>> +     int first_time = 1;     /* is this the first time through the loop */
>>       int error = 0;
>> +     int state;
>>
>>       /* The association should be held so we should be safe. */
>>       ep = asoc->ep;
>> @@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>>               state = asoc->state;
>>               subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
>>
>> +             /* If the first chunk in the packet is AUTH, do special
>> +              * processing specified in Section 6.3 of SCTP-AUTH spec
>> +              */
>> +             if (first_time && subtype.chunk == SCTP_CID_AUTH) {
>> +                     struct sctp_chunkhdr *next_hdr;
>> +
>> +                     next_hdr = sctp_inq_peek(inqueue);
>> +                     if (!next_hdr)
>> +                             goto normal;
>> +
>> +                     /* If the next chunk is COOKIE-ECHO, skip the AUTH
>> +                      * chunk while saving a pointer to it so we can do
>> +                      * Authentication later (during cookie-echo
>> +                      * processing).
>> +                      */
>> +                     if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
>> +                             chunk->auth_chunk = skb_clone(chunk->skb,
>> +                                                           GFP_ATOMIC);
>> +                             chunk->auth = 1;
>> +                             continue;
>> +                     }
>> +             }
>> +
>> +normal:
>>               /* SCTP-AUTH, Section 6.3:
>>                *    The receiver has a list of chunk types which it expects
>>                *    to be received only after an AUTH-chunk.  This list has
>> @@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>>               /* If there is an error on chunk, discard this packet. */
>>               if (error && chunk)
>>                       chunk->pdiscard = 1;
>> +
>> +             if (first_time)
>> +                     first_time = 0;
>>       }
>>       sctp_association_put(asoc);
>>  }
>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
>> index 28c070e..c9ae340 100644
>> --- a/net/sctp/sm_statefuns.c
>> +++ b/net/sctp/sm_statefuns.c
>> @@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
>>                                       struct sctp_cmd_seq *commands);
>>
>>  static enum sctp_ierror sctp_sf_authenticate(
>> -                                     struct net *net,
>> -                                     const struct sctp_endpoint *ep,
>>                                       const struct sctp_association *asoc,
>> -                                     const union sctp_subtype type,
>>                                       struct sctp_chunk *chunk);
>>
>>  static enum sctp_disposition __sctp_sf_do_9_1_abort(
>> @@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
>>       return SCTP_DISPOSITION_CONSUME;
>>  }
>>
>> +static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
>> +                                const struct sctp_association *asoc)
>> +{
>> +     struct sctp_chunk auth;
>> +
>> +     if (!chunk->auth_chunk)
>> +             return true;
>> +
>> +     /* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
>> +      * is supposed to be authenticated and we have to do delayed
>> +      * authentication.  We've just recreated the association using
>> +      * the information in the cookie and now it's much easier to
>> +      * do the authentication.
>> +      */
>> +
>> +     /* Make sure that we and the peer are AUTH capable */
>> +     if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
>> +             return false;
>> +
>> +     /* set-up our fake chunk so that we can process it */
>> +     auth.skb = chunk->auth_chunk;
>> +     auth.asoc = chunk->asoc;
>> +     auth.sctp_hdr = chunk->sctp_hdr;
>> +     auth.chunk_hdr = (struct sctp_chunkhdr *)
>> +                             skb_push(chunk->auth_chunk,
>> +                                      sizeof(struct sctp_chunkhdr));
>> +     skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
>> +     auth.transport = chunk->transport;
>> +
>> +     return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
>> +}
>> +
>>  /*
>>   * Respond to a normal COOKIE ECHO chunk.
>>   * We are the side that is being asked for an association.
>> @@ -763,37 +792,9 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
>>       if (error)
>>               goto nomem_init;
>>
>> -     /* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
>> -      * is supposed to be authenticated and we have to do delayed
>> -      * authentication.  We've just recreated the association using
>> -      * the information in the cookie and now it's much easier to
>> -      * do the authentication.
>> -      */
>> -     if (chunk->auth_chunk) {
>> -             struct sctp_chunk auth;
>> -             enum sctp_ierror ret;
>> -
>> -             /* Make sure that we and the peer are AUTH capable */
>> -             if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
>> -                     sctp_association_free(new_asoc);
>> -                     return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>> -             }
>> -
>> -             /* set-up our fake chunk so that we can process it */
>> -             auth.skb = chunk->auth_chunk;
>> -             auth.asoc = chunk->asoc;
>> -             auth.sctp_hdr = chunk->sctp_hdr;
>> -             auth.chunk_hdr = (struct sctp_chunkhdr *)
>> -                                     skb_push(chunk->auth_chunk,
>> -                                              sizeof(struct sctp_chunkhdr));
>> -             skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
>> -             auth.transport = chunk->transport;
>> -
>> -             ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
>> -             if (ret != SCTP_IERROR_NO_ERROR) {
>> -                     sctp_association_free(new_asoc);
>> -                     return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>> -             }
>> +     if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
>> +             sctp_association_free(new_asoc);
>> +             return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>>       }
>>
>>       repl = sctp_make_cookie_ack(new_asoc, chunk);
>> @@ -1797,13 +1798,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
>>       if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>>               goto nomem;
>>
>> +     if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
>> +             return SCTP_DISPOSITION_DISCARD;
>> +
>>       /* Make sure no new addresses are being added during the
>>        * restart.  Though this is a pretty complicated attack
>>        * since you'd have to get inside the cookie.
>>        */
>> -     if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
>> +     if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
>>               return SCTP_DISPOSITION_CONSUME;
>> -     }
>>
>>       /* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
>>        * the peer has restarted (Action A), it MUST NOT setup a new
>> @@ -1912,6 +1915,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
>>       if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>>               goto nomem;
>>
>> +     if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
>> +             return SCTP_DISPOSITION_DISCARD;
>> +
>>       /* Update the content of current association.  */
>>       sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
>>       sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
>> @@ -2009,6 +2015,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
>>        * a COOKIE ACK.
>>        */
>>
>> +     if (!sctp_auth_chunk_verify(net, chunk, asoc))
>> +             return SCTP_DISPOSITION_DISCARD;
>> +
>>       /* Don't accidentally move back into established state. */
>>       if (asoc->state < SCTP_STATE_ESTABLISHED) {
>>               sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
>> @@ -4171,10 +4180,7 @@ enum sctp_disposition sctp_sf_eat_fwd_tsn_fast(
>>   * The return value is the disposition of the chunk.
>>   */
>>  static enum sctp_ierror sctp_sf_authenticate(
>> -                                     struct net *net,
>> -                                     const struct sctp_endpoint *ep,
>>                                       const struct sctp_association *asoc,
>> -                                     const union sctp_subtype type,
>>                                       struct sctp_chunk *chunk)
>>  {
>>       struct sctp_shared_key *sh_key = NULL;
>> @@ -4275,7 +4281,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
>>                                                 commands);
>>
>>       auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
>> -     error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
>> +     error = sctp_sf_authenticate(asoc, chunk);
>>       switch (error) {
>>       case SCTP_IERROR_AUTH_BAD_HMAC:
>>               /* Generate the ERROR chunk and discard the rest
>> --
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply

* Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning
From: Ingo Molnar @ 2018-05-05  7:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, linux-kernel, tglx, Ingo Molnar, David S. Miller,
	Johannes Berg, Alexander Aring, Stefan Schmidt, netdev,
	linux-wireless, linux-wpan, Anna-Maria Gleixner
In-Reply-To: <20180504190735.izmzibhb66gjb5wr@linutronix.de>


* Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2018-05-04 20:51:32 [+0200], Peter Zijlstra wrote:
> > On Fri, May 04, 2018 at 08:45:39PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> > > > On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > From: Anna-Maria Gleixner <anna-maria@linutronix.de>
> > > > > 
> > > > > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> > > > > the softirq context for the subsequent netif_receive_skb() call. 
> > > > 
> > > > That's not in fact what it does though; so while that might indeed be
> > > > the intent that's not what it does.
> > > 
> > > It was introduced in commit d20ef63d3246 ("mac80211: document
> > > ieee80211_rx() context requirement"):
> > > 
> > >     mac80211: document ieee80211_rx() context requirement
> > >     
> > >     ieee80211_rx() must be called with softirqs disabled
> > 
> > softirqs disabled, ack that is exactly what it checks.
> > 
> > But afaict the assertion you introduced tests that we are _in_ softirq
> > context, which is not the same.
> 
> indeed, now it clicked. Given what I wrote in the cover letter would you
> be in favour of (a proper) lockdep_assert_BH_disabled() or the cheaper
> local_bh_enable() (assuming the network folks don't mind the cheaper
> version)?

BTW., going by the hardirq variant nomenclature:

        lockdep_assert_irqs_enabled();

... the proper name would not be lockdep_assert_BH_disabled(), but:

        lockdep_assert_softirqs_disabled();

Thanks,

	Ingo

^ permalink raw reply

* Re: [net-next PATCH v2 2/8] udp: Verify that pulling UDP header in GSO segmentation doesn't fail
From: Willem de Bruijn @ 2018-05-05  8:12 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504182857.5194.45504.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:29 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> We should verify that we can pull the UDP header before we attempt to do
> so. Otherwise if this fails we have no way of knowing and GSO will not work
> correctly.

This already happened in the callers udp[46]_ufo_fragment

^ permalink raw reply

* Re: [net-next PATCH v2 3/8] udp: Do not pass MSS as parameter to GSO segmentation
From: Willem de Bruijn @ 2018-05-05  8:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Duyck, Network Development, Willem de Bruijn,
	David Miller
In-Reply-To: <84200426-da0f-6ff4-8ae1-209e417f76c6@gmail.com>

On Fri, May 4, 2018 at 10:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/04/2018 11:29 AM, Alexander Duyck wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> There is no point in passing MSS as a parameter for for the GSO
>> segmentation call as it is already available via the shared info for the
>> skb itself.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [net-next PATCH v2 1/8] udp: Record gso_segs when supporting UDP segmentation offload
From: Willem de Bruijn @ 2018-05-05  8:23 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504182847.5194.27449.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:28 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> We need to record the number of segments that will be generated when this
> frame is segmented. The expectation is that if gso_size is set then
> gso_segs is set as well. Without this some drivers such as ixgbe get
> confused if they attempt to offload this as they record 0 segments for the
> entire packet instead of the correct value.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/udp.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index dd3102a37ef9..e07db83b311e 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -793,6 +793,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
>
>                 skb_shinfo(skb)->gso_size = cork->gso_size;
>                 skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> +               skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
> +                                                        cork->gso_size);
>                 goto csum_partial;
>         }

Acked-by: Willem de Bruijn <willemb@google.com>

But we have to be careful that both UDP GSO and UFO packets can traverse
qdisc_pkt_len_init. This does seem to fix the UDP GSO case, as it adds the
header size to each segment. But that is odd, as the code precedes UDP
GSO, so must have been intended to estimate UFO size on the wire. Only
external sources can generate UFO. It seems like we need to not add
sizeof(struct udphdr) to hdrlen in the SKB_GSO_DODGY branch.

^ permalink raw reply

* Re: [net-next PATCH v2 5/8] udp: Partially unroll handling of first segment and last segment
From: Willem de Bruijn @ 2018-05-05  8:37 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504183019.5194.47789.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:30 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch allows us to take care of unrolling the first segment and the
> last segment

Only the last segment needs to be unrolled, right?

> of the loop for processing the segmented skb. Part of the
> motivation for this is that it makes it easier to process the fact that the
> first fame and all of the frames in between should be mostly identical
> in terms of header data, and the last frame has differences in the length
> and partial checksum.
>
> In addition I am dropping the header length calculation since we don't
> really need it for anything but the last frame and it can be easily
> obtained by just pulling the data_len and offset of tail from the transport
> header.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>
> v2: New break-out patch based on one patch from earlier series
>
>  net/ipv4/udp_offload.c |   35 ++++++++++++++++++++---------------
>  1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 5c4bb8b9e83e..946d06d2aa0c 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -193,7 +193,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>         struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
>         struct sock *sk = gso_skb->sk;
>         unsigned int sum_truesize = 0;
> -       unsigned int hdrlen;
>         struct udphdr *uh;
>         unsigned int mss;
>         __sum16 check;
> @@ -206,7 +205,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>         if (!pskb_may_pull(gso_skb, sizeof(*uh)))
>                 goto out;
>
> -       hdrlen = gso_skb->data - skb_mac_header(gso_skb);
>         skb_pull(gso_skb, sizeof(*uh));
>
>         /* clear destructor to avoid skb_segment assigning it to tail */
> @@ -219,7 +217,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>                 return segs;
>         }
>
> -       uh = udp_hdr(segs);
> +       seg = segs;
> +       uh = udp_hdr(seg);
>
>         /* compute checksum adjustment based on old length versus new */
>         newlen = htons(sizeof(*uh) + mss);
> @@ -227,25 +226,31 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>                                             ((__force u32)uh->len ^ 0xFFFF) +
>                                             (__force u32)newlen));
>
> -       for (seg = segs; seg; seg = seg->next) {
> -               uh = udp_hdr(seg);
> +       for (;;) {
> +               seg->destructor = sock_wfree;
> +               seg->sk = sk;
> +               sum_truesize += seg->truesize;
>
> -               /* last packet can be partial gso_size */
> -               if (!seg->next) {
> -                       newlen = htons(seg->len - hdrlen);
> -                       check = ~csum_fold((__force __wsum)((__force u32)uh->check +
> -                                                           ((__force u32)uh->len ^ 0xFFFF) +
> -                                                           (__force u32)newlen));
> -               }
> +               if (!seg->next)
> +                       break;

Not critical, but I find replacing

  for (seg = segs; seg; seg = seg->next) {
    uh = udp_hdr(seg);
    ...
  }

with

  uh = udp_hdr(seg);
  for (;;) {
    ...
    if (!seg->next)
      break;

    uh = udp_hdr(seg);
  }

much less easy to parse and it inflates patch size. How about just

  - for (seg = segs; seg; seg = seg->next)
  + for( seg = segs; seg->next; seg = seg->next)


>
>                 uh->len = newlen;
>                 uh->check = check;
>
> -               seg->destructor = sock_wfree;
> -               seg->sk = sk;
> -               sum_truesize += seg->truesize;
> +               seg = seg->next;
> +               uh = udp_hdr(seg);
>         }
>
> +       /* last packet can be partial gso_size, account for that in checksum */
> +       newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
> +                      seg->data_len);
> +       check = ~csum_fold((__force __wsum)((__force u32)uh->check +
> +                                           ((__force u32)uh->len ^ 0xFFFF)  +
> +                                           (__force u32)newlen));
> +       uh->len = newlen;
> +       uh->check = check;
> +
> +       /* update refcount for the packet */
>         refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
>  out:
>         return segs;
>

^ permalink raw reply

* Re: [net-next PATCH v2 7/8] udp: Do not copy destructor if one is not present
From: Willem de Bruijn @ 2018-05-05  8:45 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504183110.5194.5449.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:31 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch makes it so that if a destructor is not present we avoid trying
> to update the skb socket or any reference counting that would be associated
> with the NULL socket and/or descriptor. By doing this we can support
> traffic coming from another namespace without any issues.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* BUG?: receiving on a packet socket with .sll_protocoll and bridging
From: Uwe Kleine-König @ 2018-05-05  8:57 UTC (permalink / raw)
  To: netdev

Hello,

my eventual goal is to implement MRP and for that I started to program a
bit and stumbled over a problem I don't understand.

For testing purposes I created a veth device pair (veth0 veth1), open a
socket for each of the devices and send packets around between them. In
tcpdump a typical package looks as follows:

10:36:34.755208 ae:a9:da:50:48:db (oui Unknown) > 01:15:e4:00:00:01 (oui Unknown), ethertype Unknown (0x88e3), length 58: 
	0x0000:  0001 0212 8000 aea9 da50 48db 0000 0000  .........PH.....
	0x0010:  0000 0589 40f2 6574 6800 0000 0000 0000  ....@.eth.......
	0x0020:  0000 0100 0a80 3d38 4c5e 0000            ......=8L^..

The socket to receive these packages is opened using:

	#define ETH_P_MRP 0x88e3

	struct sockaddr_ll sa_ll = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_MRP),
		.sll_ifindex = if_nametoindex("veth0")
	};

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_MRP));
	bind(fd, (struct sockaddr *)&sa_ll, sizeof(sa_ll));

So far everything works fine and I can receive the packets I send.

If now I add veth0 to a bridge (e.g.

	ip link add br0 type bridge
	ip link set dev veth0 master br0

) and continue to send on veth1 and receive on veth0 I don't receive
the packets any more. The other direction (veth0 sending, veth1
receiving) still works fine. Each of the following changes allow to
receive again:

 a) take veth0 out of the bridge
 b) bind(2) the receiving socket to br0 instead of veth0
 c) use .sll_protocol = htons(ETH_P_ALL) for bind(2)

In the end only c) could be sensible (because I need to know the port
the packet entered the stack and that might well be bridged), but I
wonder why .sll_protocol = htons(ETH_P_MRP) seems to do the right thing
for an unbridged device but not for a bridged one.

Is this a bug or a feature I don't understand?

If someone wants to reproduce this locally, I can simplify my program
and provide it here. I tested this on a Debian 4.15 kernel (x86), but
also with the same symptoms on an arm64 with 4.16 and a dsa switch.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

* Re: [PATCH 1/8] rhashtable: silence RCU warning in rhashtable_test.
From: Herbert Xu @ 2018-05-05  9:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605424.18473.310003836978239224.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> print_ht in rhashtable_test calls rht_dereference() with neither
> RCU protection or the mutex.  This triggers an RCU warning.
> So take the mutex to silence the warning.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I don't think the mutex is actually needed but since we don't
have rht_dereference_raw and this is just test code:

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 2/8] rhashtable: remove nulls_base and related code.
From: Herbert Xu @ 2018-05-05  9:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605427.18473.12277050439942480863.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> This "feature" is unused, undocumented, and untested and so
> doesn't really belong.  If a use for the nulls marker
> is found, all this code would need to be reviewed to
> ensure it works as required.  It would be just as easy to
> just add the code if/when it is needed instead.
> 
> This patch actually fixes a bug too.  The table resizing allows a
> table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
> any growth beyond 2^27 is wasteful an ineffective.
> 
> This patch result in NULLS_MARKER(0) being used for all chains,
> and leave the use of rht_is_a_null() to test for it.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I disagree.  This is a fundamental requirement for the use of
rhashtable in certain networking systems such as TCP/UDP.  So
we know that there will be a use for this.

As to the bug fix, please separate it out of the patch and resubmit.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/8] rhashtable: use cmpxchg() to protect ->future_tbl.
From: Herbert Xu @ 2018-05-05  9:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605430.18473.11758878046224478232.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> Rather than borrowing one of the bucket locks to
> protect ->future_tbl updates, use cmpxchg().
> This gives more freedom to change how bucket locking
> is implemented.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

This looks nice.

> -	spin_unlock_bh(old_tbl->locks);
> +	rcu_assign_pointer(tmp, new_tbl);

Do we need this barrier since cmpxchg is supposed to provide memory
barrier semantics?

> +	if (cmpxchg(&old_tbl->future_tbl, NULL, tmp) != NULL)
> +		return -EEXIST;

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 4/8] rhashtable: fix race in nested_table_alloc()
From: Herbert Xu @ 2018-05-05  9:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605432.18473.11813271279255176724.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> If two threads run nested_table_alloc() at the same time
> they could both allocate a new table.
> Best case is that one of them will never be freed, leaking memory.
> Worst case is hat entry get stored there before it leaks,
> and the are lost from the table.
> 
> So use cmpxchg to detect the race and free the unused table.
> 
> Fixes: da20420f83ea ("rhashtable: Add nested tables")
> Cc: stable@vger.kernel.org # 4.11+
> Signed-off-by: NeilBrown <neilb@suse.com>

What about the spinlock that's meant to be held around this
operation?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 5/8] rhashtable: remove rhashtable_walk_peek()
From: Herbert Xu @ 2018-05-05  9:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel, Tom Herbert
In-Reply-To: <152540605435.18473.12657571579874985957.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> This function has a somewhat confused behavior that is not properly
> described by the documentation.
> Sometimes is returns the previous object, sometimes it returns the
> next one.
> Sometimes it changes the iterator, sometimes it doesn't.
> 
> This function is not currently used and is not worth keeping, so
> remove it.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I don't have a problem with this.  But it would be good to hear
from Tom Herbert who added this.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 8/8] rhashtable: don't hold lock on first table throughout insertion.
From: Herbert Xu @ 2018-05-05  9:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605444.18473.9591316658457316578.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> rhashtable_try_insert() currently hold a lock on the bucket in
> the first table, while also locking buckets in subsequent tables.
> This is unnecessary and looks like a hold-over from some earlier
> version of the implementation.
> 
> As insert and remove always lock a bucket in each table in turn, and
> as insert only inserts in the final table, there cannot be any races
> that are not covered by simply locking a bucket in each table in turn.
> 
> When an insert call reaches that last table it can be sure that there
> is no match entry in any other table as it has searched them all, and
> insertion never happens anywhere but in the last table.  The fact that
> code tests for the existence of future_tbl while holding a lock on
> the relevant bucket ensures that two threads inserting the same key
> will make compatible decisions about which is the "last" table.
> 
> This simplifies the code and allows the ->rehash field to be
> discarded.
> We still need a way to ensure that a dead bucket_table is never
> re-linked by rhashtable_walk_stop().  This can be achieved by
> setting the ->size to 1.  This still allows lookup code to work (and
> correctly not find anything) but can never happen on an active bucket
> table (as the minimum size is 4).
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I'm not convinced this is safe.  There can be multiple tables
in existence.  Unless you hold the lock on the first table, what
is to stop two parallel inserters each from inserting into their
"last" tables which may not be the same table?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 6/8] rhashtable: further improve stability of rhashtable_walk
From: Herbert Xu @ 2018-05-05  9:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605438.18473.4797800779538116583.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> If the sequence:
>    obj = rhashtable_walk_next(iter);
>    rhashtable_walk_stop(iter);
>    rhashtable_remove_fast(ht, &obj->head, params);
>    rhashtable_walk_start(iter);
> 
>  races with another thread inserting or removing
>  an object on the same hash chain, a subsequent
>  rhashtable_walk_next() is not guaranteed to get the "next"
>  object. It is possible that an object could be
>  repeated, or missed.
> 
>  This can be made more reliable by keeping the objects in a hash chain
>  sorted by memory address.  A subsequent rhashtable_walk_next()
>  call can reliably find the correct position in the list, and thus
>  find the 'next' object.
> 
>  It is not possible (certainly not so easy) to achieve this with an
>  rhltable as keeping the hash chain in order is not so easy.  When the
>  first object with a given key is removed, it is replaced in the chain
>  with the next object with the same key, and the address of that
>  object may not be correctly ordered.
>  No current user of rhltable_walk_enter() calls
>  rhashtable_walk_start() more than once, so no current code
>  could benefit from a more reliable walk of rhltables.
> 
>  This patch only attempts to improve walks for rhashtables.
>  - a new object is always inserted after the last object with a
>    smaller address, or at the start
>  - when rhashtable_walk_start() is called, it records that 'p' is not
>    'safe', meaning that it cannot be dereferenced.  The revalidation
>    that was previously done here is moved to rhashtable_walk_next()
>  - when rhashtable_walk_next() is called while p is not NULL and not
>    safe, it walks the chain looking for the first object with an
>    address greater than p and returns that.  If there is none, it moves
>    to the next hash chain.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I'm a bit torn on this.  On the hand this is definitely an improvement
over the status quo.  On the other this does not work on rhltable and
we do have a way of fixing it for both rhashtable and rhltable.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 7/8] rhashtable: add rhashtable_walk_prev()
From: Herbert Xu @ 2018-05-05  9:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel, Tom Herbert
In-Reply-To: <152540605441.18473.4087381584733882012.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> rhashtable_walk_prev() returns the object returned by
> the previous rhashtable_walk_next(), providing it is still in the
> table (or was during this grace period).
> This works even if rhashtable_walk_stop() and rhashtable_talk_start()
> have been called since the last rhashtable_walk_next().
> 
> If there have been no calls to rhashtable_walk_next(), or if the
> object is gone from the table, then NULL is returned.
> 
> This can usefully be used in a seq_file ->start() function.
> If the pos is the same as was returned by the last ->next() call,
> then rhashtable_walk_prev() can be used to re-establish the
> current location in the table.  If it returns NULL, then
> rhashtable_walk_next() should be used.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I will ack this if Tom is OK with replacing peek with it.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [net-next PATCH v2 4/8] udp: Do not pass checksum as a parameter to GSO segmentation
From: Willem de Bruijn @ 2018-05-05 10:01 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504182958.5194.62398.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:30 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch is meant to allow us to avoid having to recompute the checksum
> from scratch and have it passed as a parameter.
>
> Instead of taking that approach we can take advantage of the fact that the
> length that was used to compute the existing checksum is included in the
> UDP header. If we cancel that out by adding the value XOR with 0xFFFF we
> can then just add the new length in and fold that into the new result.
>
> I think this may be fixing a checksum bug in the original code as well
> since the checksum that was passed included the UDP header in the checksum
> computation, but then excluded it for the adjustment on the last frame. I
> believe this may have an effect on things in the cases where the two differ
> by bits that would result in things crossing the byte boundaries.

The replacement code, below, subtracts original payload size then adds
the new payload size. mss here excludes the udp header size.

>                 /* last packet can be partial gso_size */
> -               if (!seg->next)
> -                       csum_replace2(&uh->check, htons(mss),
> -                                     htons(seg->len - hdrlen - sizeof(*uh)));

^ permalink raw reply

* Re: [net-next PATCH v2 0/8] UDP GSO Segmentation clean-ups
From: Willem de Bruijn @ 2018-05-05 10:06 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504182537.5194.72775.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:28 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> This patch set addresses a number of issues I found while sorting out
> enabling UDP GSO Segmentation support for ixgbe/ixgbevf. Specifically there
> were a number of issues related to the checksum and such that seemed to
> cause either minor irregularities or kernel panics in the case of the
> offload request being allowed to traverse between name spaces.

Were you able to traverse GSO packets between network namespace before
adding to NETIF_F_GSO_SOFTWARE? It does appear that veth includes
NETIF_F_GSO_ENCAP_ALL, which also allows GSO.

In either case, it should not be possible for GSO packets to arrive on a veth
device, as that can result in queuing the GSO packet to a recipient socket.
In this regard veth is like loopback and must exclude GSO support.

I'll take a look.

^ permalink raw reply

* Re: [PATCH] net: dsa: drop some VLAs in switch.c
From: Salvatore Mesoraca @ 2018-05-05 10:36 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, linux-kernel, Kernel Hardening, netdev,
	David S. Miller, Andrew Lunn, Kees Cook
In-Reply-To: <b948ef55-8b16-fee4-b22f-4c6d09fea0cf@gmail.com>

2018-03-13 21:06 GMT+01:00 Florian Fainelli <f.fainelli@gmail.com>:
> On 03/13/2018 12:58 PM, Vivien Didelot wrote:
>> Hi Salvatore,
>>
>> Salvatore Mesoraca <s.mesoraca16@gmail.com> writes:
>>
>>> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
>>> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
>>>
>>> [1] https://lkml.org/lkml/2018/3/7/621
>>>
>>> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
>>
>> NAK.
>>
>> We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
>> and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.
>
> Then this means that we need to allocate a bitmap from the heap, which
> sounds a bit superfluous and could theoretically fail... not sure which
> way is better, but bumping the size to DSA_MAX_PORTS definitively does
> help people working on enabling -Wvla.

Hi Florian,

Should I consider this patch still NAKed or not?
Should I resend the patch with some modifications?

Thank you,

Salvatore

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox