Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/8] rhashtable: remove nulls_base and related code.
From: Herbert Xu @ 2018-05-05  9:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605427.18473.12277050439942480863.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> This "feature" is unused, undocumented, and untested and so
> doesn't really belong.  If a use for the nulls marker
> is found, all this code would need to be reviewed to
> ensure it works as required.  It would be just as easy to
> just add the code if/when it is needed instead.
> 
> This patch actually fixes a bug too.  The table resizing allows a
> table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
> any growth beyond 2^27 is wasteful an ineffective.
> 
> This patch result in NULLS_MARKER(0) being used for all chains,
> and leave the use of rht_is_a_null() to test for it.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I disagree.  This is a fundamental requirement for the use of
rhashtable in certain networking systems such as TCP/UDP.  So
we know that there will be a use for this.

As to the bug fix, please separate it out of the patch and resubmit.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/8] rhashtable: silence RCU warning in rhashtable_test.
From: Herbert Xu @ 2018-05-05  9:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152540605424.18473.310003836978239224.stgit@noble>

On Fri, May 04, 2018 at 01:54:14PM +1000, NeilBrown wrote:
> print_ht in rhashtable_test calls rht_dereference() with neither
> RCU protection or the mutex.  This triggers an RCU warning.
> So take the mutex to silence the warning.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

I don't think the mutex is actually needed but since we don't
have rht_dereference_raw and this is just test code:

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* BUG?: receiving on a packet socket with .sll_protocoll and bridging
From: Uwe Kleine-König @ 2018-05-05  8:57 UTC (permalink / raw)
  To: netdev

Hello,

my eventual goal is to implement MRP and for that I started to program a
bit and stumbled over a problem I don't understand.

For testing purposes I created a veth device pair (veth0 veth1), open a
socket for each of the devices and send packets around between them. In
tcpdump a typical package looks as follows:

10:36:34.755208 ae:a9:da:50:48:db (oui Unknown) > 01:15:e4:00:00:01 (oui Unknown), ethertype Unknown (0x88e3), length 58: 
	0x0000:  0001 0212 8000 aea9 da50 48db 0000 0000  .........PH.....
	0x0010:  0000 0589 40f2 6574 6800 0000 0000 0000  ....@.eth.......
	0x0020:  0000 0100 0a80 3d38 4c5e 0000            ......=8L^..

The socket to receive these packages is opened using:

	#define ETH_P_MRP 0x88e3

	struct sockaddr_ll sa_ll = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_MRP),
		.sll_ifindex = if_nametoindex("veth0")
	};

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_MRP));
	bind(fd, (struct sockaddr *)&sa_ll, sizeof(sa_ll));

So far everything works fine and I can receive the packets I send.

If now I add veth0 to a bridge (e.g.

	ip link add br0 type bridge
	ip link set dev veth0 master br0

) and continue to send on veth1 and receive on veth0 I don't receive
the packets any more. The other direction (veth0 sending, veth1
receiving) still works fine. Each of the following changes allow to
receive again:

 a) take veth0 out of the bridge
 b) bind(2) the receiving socket to br0 instead of veth0
 c) use .sll_protocol = htons(ETH_P_ALL) for bind(2)

In the end only c) could be sensible (because I need to know the port
the packet entered the stack and that might well be bridged), but I
wonder why .sll_protocol = htons(ETH_P_MRP) seems to do the right thing
for an unbridged device but not for a bridged one.

Is this a bug or a feature I don't understand?

If someone wants to reproduce this locally, I can simplify my program
and provide it here. I tested this on a Debian 4.15 kernel (x86), but
also with the same symptoms on an arm64 with 4.16 and a dsa switch.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

* Re: [net-next PATCH v2 7/8] udp: Do not copy destructor if one is not present
From: Willem de Bruijn @ 2018-05-05  8:45 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504183110.5194.5449.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:31 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch makes it so that if a destructor is not present we avoid trying
> to update the skb socket or any reference counting that would be associated
> with the NULL socket and/or descriptor. By doing this we can support
> traffic coming from another namespace without any issues.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [net-next PATCH v2 5/8] udp: Partially unroll handling of first segment and last segment
From: Willem de Bruijn @ 2018-05-05  8:37 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504183019.5194.47789.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:30 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch allows us to take care of unrolling the first segment and the
> last segment

Only the last segment needs to be unrolled, right?

> of the loop for processing the segmented skb. Part of the
> motivation for this is that it makes it easier to process the fact that the
> first fame and all of the frames in between should be mostly identical
> in terms of header data, and the last frame has differences in the length
> and partial checksum.
>
> In addition I am dropping the header length calculation since we don't
> really need it for anything but the last frame and it can be easily
> obtained by just pulling the data_len and offset of tail from the transport
> header.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>
> v2: New break-out patch based on one patch from earlier series
>
>  net/ipv4/udp_offload.c |   35 ++++++++++++++++++++---------------
>  1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 5c4bb8b9e83e..946d06d2aa0c 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -193,7 +193,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>         struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
>         struct sock *sk = gso_skb->sk;
>         unsigned int sum_truesize = 0;
> -       unsigned int hdrlen;
>         struct udphdr *uh;
>         unsigned int mss;
>         __sum16 check;
> @@ -206,7 +205,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>         if (!pskb_may_pull(gso_skb, sizeof(*uh)))
>                 goto out;
>
> -       hdrlen = gso_skb->data - skb_mac_header(gso_skb);
>         skb_pull(gso_skb, sizeof(*uh));
>
>         /* clear destructor to avoid skb_segment assigning it to tail */
> @@ -219,7 +217,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>                 return segs;
>         }
>
> -       uh = udp_hdr(segs);
> +       seg = segs;
> +       uh = udp_hdr(seg);
>
>         /* compute checksum adjustment based on old length versus new */
>         newlen = htons(sizeof(*uh) + mss);
> @@ -227,25 +226,31 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>                                             ((__force u32)uh->len ^ 0xFFFF) +
>                                             (__force u32)newlen));
>
> -       for (seg = segs; seg; seg = seg->next) {
> -               uh = udp_hdr(seg);
> +       for (;;) {
> +               seg->destructor = sock_wfree;
> +               seg->sk = sk;
> +               sum_truesize += seg->truesize;
>
> -               /* last packet can be partial gso_size */
> -               if (!seg->next) {
> -                       newlen = htons(seg->len - hdrlen);
> -                       check = ~csum_fold((__force __wsum)((__force u32)uh->check +
> -                                                           ((__force u32)uh->len ^ 0xFFFF) +
> -                                                           (__force u32)newlen));
> -               }
> +               if (!seg->next)
> +                       break;

Not critical, but I find replacing

  for (seg = segs; seg; seg = seg->next) {
    uh = udp_hdr(seg);
    ...
  }

with

  uh = udp_hdr(seg);
  for (;;) {
    ...
    if (!seg->next)
      break;

    uh = udp_hdr(seg);
  }

much less easy to parse and it inflates patch size. How about just

  - for (seg = segs; seg; seg = seg->next)
  + for( seg = segs; seg->next; seg = seg->next)


>
>                 uh->len = newlen;
>                 uh->check = check;
>
> -               seg->destructor = sock_wfree;
> -               seg->sk = sk;
> -               sum_truesize += seg->truesize;
> +               seg = seg->next;
> +               uh = udp_hdr(seg);
>         }
>
> +       /* last packet can be partial gso_size, account for that in checksum */
> +       newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
> +                      seg->data_len);
> +       check = ~csum_fold((__force __wsum)((__force u32)uh->check +
> +                                           ((__force u32)uh->len ^ 0xFFFF)  +
> +                                           (__force u32)newlen));
> +       uh->len = newlen;
> +       uh->check = check;
> +
> +       /* update refcount for the packet */
>         refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
>  out:
>         return segs;
>

^ permalink raw reply

* Re: [net-next PATCH v2 1/8] udp: Record gso_segs when supporting UDP segmentation offload
From: Willem de Bruijn @ 2018-05-05  8:23 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504182847.5194.27449.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:28 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> We need to record the number of segments that will be generated when this
> frame is segmented. The expectation is that if gso_size is set then
> gso_segs is set as well. Without this some drivers such as ixgbe get
> confused if they attempt to offload this as they record 0 segments for the
> entire packet instead of the correct value.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/udp.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index dd3102a37ef9..e07db83b311e 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -793,6 +793,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
>
>                 skb_shinfo(skb)->gso_size = cork->gso_size;
>                 skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> +               skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
> +                                                        cork->gso_size);
>                 goto csum_partial;
>         }

Acked-by: Willem de Bruijn <willemb@google.com>

But we have to be careful that both UDP GSO and UFO packets can traverse
qdisc_pkt_len_init. This does seem to fix the UDP GSO case, as it adds the
header size to each segment. But that is odd, as the code precedes UDP
GSO, so must have been intended to estimate UFO size on the wire. Only
external sources can generate UFO. It seems like we need to not add
sizeof(struct udphdr) to hdrlen in the SKB_GSO_DODGY branch.

^ permalink raw reply

* Re: [net-next PATCH v2 3/8] udp: Do not pass MSS as parameter to GSO segmentation
From: Willem de Bruijn @ 2018-05-05  8:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Duyck, Network Development, Willem de Bruijn,
	David Miller
In-Reply-To: <84200426-da0f-6ff4-8ae1-209e417f76c6@gmail.com>

On Fri, May 4, 2018 at 10:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/04/2018 11:29 AM, Alexander Duyck wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> There is no point in passing MSS as a parameter for for the GSO
>> segmentation call as it is already available via the shared info for the
>> skb itself.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [net-next PATCH v2 2/8] udp: Verify that pulling UDP header in GSO segmentation doesn't fail
From: Willem de Bruijn @ 2018-05-05  8:12 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, Willem de Bruijn, David Miller
In-Reply-To: <20180504182857.5194.45504.stgit@localhost.localdomain>

On Fri, May 4, 2018 at 8:29 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> We should verify that we can pull the UDP header before we attempt to do
> so. Otherwise if this fails we have no way of knowing and GSO will not work
> correctly.

This already happened in the callers udp[46]_ufo_fragment

^ permalink raw reply

* Re: [RFC PATCH 2/2] net: mac808211: mac802154: use lockdep_assert_in_softirq() instead own warning
From: Ingo Molnar @ 2018-05-05  7:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, linux-kernel, tglx, Ingo Molnar, David S. Miller,
	Johannes Berg, Alexander Aring, Stefan Schmidt, netdev,
	linux-wireless, linux-wpan, Anna-Maria Gleixner
In-Reply-To: <20180504190735.izmzibhb66gjb5wr@linutronix.de>


* Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2018-05-04 20:51:32 [+0200], Peter Zijlstra wrote:
> > On Fri, May 04, 2018 at 08:45:39PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2018-05-04 20:32:49 [+0200], Peter Zijlstra wrote:
> > > > On Fri, May 04, 2018 at 07:51:44PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > From: Anna-Maria Gleixner <anna-maria@linutronix.de>
> > > > > 
> > > > > The warning in ieee802154_rx() and ieee80211_rx_napi() is there to ensure
> > > > > the softirq context for the subsequent netif_receive_skb() call. 
> > > > 
> > > > That's not in fact what it does though; so while that might indeed be
> > > > the intent that's not what it does.
> > > 
> > > It was introduced in commit d20ef63d3246 ("mac80211: document
> > > ieee80211_rx() context requirement"):
> > > 
> > >     mac80211: document ieee80211_rx() context requirement
> > >     
> > >     ieee80211_rx() must be called with softirqs disabled
> > 
> > softirqs disabled, ack that is exactly what it checks.
> > 
> > But afaict the assertion you introduced tests that we are _in_ softirq
> > context, which is not the same.
> 
> indeed, now it clicked. Given what I wrote in the cover letter would you
> be in favour of (a proper) lockdep_assert_BH_disabled() or the cheaper
> local_bh_enable() (assuming the network folks don't mind the cheaper
> version)?

BTW., going by the hardirq variant nomenclature:

        lockdep_assert_irqs_enabled();

... the proper name would not be lockdep_assert_BH_disabled(), but:

        lockdep_assert_softirqs_disabled();

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH net] sctp: delay the authentication for the duplicated cookie-echo chunk
From: Xin Long @ 2018-05-05  7:01 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <20180504223337.GO5105@localhost.localdomain>

On Sat, May 5, 2018 at 6:33 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Fri, May 04, 2018 at 05:05:10PM +0800, Xin Long wrote:
>> Now sctp only delays the authentication for the normal cookie-echo
>> chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
>> for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
>> authentication first based on the old asoc, which will definitely
>> fail due to the different auth info in the old asoc.
>>
>> The duplicated cookie-echo chunk will create a new asoc with the
>> auth info from this chunk, and the authentication should also be
>> done with the new asoc's auth info for all of the collision 'A',
>> 'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
>> will never pass the authentication and create the new connection.
>>
>> This issue exists since very beginning, and this fix is to make
>> sctp_assoc_bh_rcv() follow the way sctp_assoc_bh_rcv() does for
>    I guess you meant sctp_endpoint_bh_rcv here --^ right?
have posted v2, thanks.

>
> Other than this LGTM
>
>> the normal cookie-echo chunk to delay the authentication.
>>
>> While at it, remove the unused params from sctp_sf_authenticate()
>> and define sctp_auth_chunk_verify() used for all the places that
>> do the delayed authentication.
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>>  net/sctp/associola.c    | 30 ++++++++++++++++-
>>  net/sctp/sm_statefuns.c | 86 ++++++++++++++++++++++++++-----------------------
>>  2 files changed, 75 insertions(+), 41 deletions(-)
>>
>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>> index 837806d..a47179d 100644
>> --- a/net/sctp/associola.c
>> +++ b/net/sctp/associola.c
>> @@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>>       struct sctp_endpoint *ep;
>>       struct sctp_chunk *chunk;
>>       struct sctp_inq *inqueue;
>> -     int state;
>> +     int first_time = 1;     /* is this the first time through the loop */
>>       int error = 0;
>> +     int state;
>>
>>       /* The association should be held so we should be safe. */
>>       ep = asoc->ep;
>> @@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>>               state = asoc->state;
>>               subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
>>
>> +             /* If the first chunk in the packet is AUTH, do special
>> +              * processing specified in Section 6.3 of SCTP-AUTH spec
>> +              */
>> +             if (first_time && subtype.chunk == SCTP_CID_AUTH) {
>> +                     struct sctp_chunkhdr *next_hdr;
>> +
>> +                     next_hdr = sctp_inq_peek(inqueue);
>> +                     if (!next_hdr)
>> +                             goto normal;
>> +
>> +                     /* If the next chunk is COOKIE-ECHO, skip the AUTH
>> +                      * chunk while saving a pointer to it so we can do
>> +                      * Authentication later (during cookie-echo
>> +                      * processing).
>> +                      */
>> +                     if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
>> +                             chunk->auth_chunk = skb_clone(chunk->skb,
>> +                                                           GFP_ATOMIC);
>> +                             chunk->auth = 1;
>> +                             continue;
>> +                     }
>> +             }
>> +
>> +normal:
>>               /* SCTP-AUTH, Section 6.3:
>>                *    The receiver has a list of chunk types which it expects
>>                *    to be received only after an AUTH-chunk.  This list has
>> @@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>>               /* If there is an error on chunk, discard this packet. */
>>               if (error && chunk)
>>                       chunk->pdiscard = 1;
>> +
>> +             if (first_time)
>> +                     first_time = 0;
>>       }
>>       sctp_association_put(asoc);
>>  }
>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
>> index 28c070e..c9ae340 100644
>> --- a/net/sctp/sm_statefuns.c
>> +++ b/net/sctp/sm_statefuns.c
>> @@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
>>                                       struct sctp_cmd_seq *commands);
>>
>>  static enum sctp_ierror sctp_sf_authenticate(
>> -                                     struct net *net,
>> -                                     const struct sctp_endpoint *ep,
>>                                       const struct sctp_association *asoc,
>> -                                     const union sctp_subtype type,
>>                                       struct sctp_chunk *chunk);
>>
>>  static enum sctp_disposition __sctp_sf_do_9_1_abort(
>> @@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
>>       return SCTP_DISPOSITION_CONSUME;
>>  }
>>
>> +static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
>> +                                const struct sctp_association *asoc)
>> +{
>> +     struct sctp_chunk auth;
>> +
>> +     if (!chunk->auth_chunk)
>> +             return true;
>> +
>> +     /* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
>> +      * is supposed to be authenticated and we have to do delayed
>> +      * authentication.  We've just recreated the association using
>> +      * the information in the cookie and now it's much easier to
>> +      * do the authentication.
>> +      */
>> +
>> +     /* Make sure that we and the peer are AUTH capable */
>> +     if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
>> +             return false;
>> +
>> +     /* set-up our fake chunk so that we can process it */
>> +     auth.skb = chunk->auth_chunk;
>> +     auth.asoc = chunk->asoc;
>> +     auth.sctp_hdr = chunk->sctp_hdr;
>> +     auth.chunk_hdr = (struct sctp_chunkhdr *)
>> +                             skb_push(chunk->auth_chunk,
>> +                                      sizeof(struct sctp_chunkhdr));
>> +     skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
>> +     auth.transport = chunk->transport;
>> +
>> +     return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
>> +}
>> +
>>  /*
>>   * Respond to a normal COOKIE ECHO chunk.
>>   * We are the side that is being asked for an association.
>> @@ -763,37 +792,9 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
>>       if (error)
>>               goto nomem_init;
>>
>> -     /* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
>> -      * is supposed to be authenticated and we have to do delayed
>> -      * authentication.  We've just recreated the association using
>> -      * the information in the cookie and now it's much easier to
>> -      * do the authentication.
>> -      */
>> -     if (chunk->auth_chunk) {
>> -             struct sctp_chunk auth;
>> -             enum sctp_ierror ret;
>> -
>> -             /* Make sure that we and the peer are AUTH capable */
>> -             if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
>> -                     sctp_association_free(new_asoc);
>> -                     return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>> -             }
>> -
>> -             /* set-up our fake chunk so that we can process it */
>> -             auth.skb = chunk->auth_chunk;
>> -             auth.asoc = chunk->asoc;
>> -             auth.sctp_hdr = chunk->sctp_hdr;
>> -             auth.chunk_hdr = (struct sctp_chunkhdr *)
>> -                                     skb_push(chunk->auth_chunk,
>> -                                              sizeof(struct sctp_chunkhdr));
>> -             skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
>> -             auth.transport = chunk->transport;
>> -
>> -             ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
>> -             if (ret != SCTP_IERROR_NO_ERROR) {
>> -                     sctp_association_free(new_asoc);
>> -                     return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>> -             }
>> +     if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
>> +             sctp_association_free(new_asoc);
>> +             return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>>       }
>>
>>       repl = sctp_make_cookie_ack(new_asoc, chunk);
>> @@ -1797,13 +1798,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
>>       if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>>               goto nomem;
>>
>> +     if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
>> +             return SCTP_DISPOSITION_DISCARD;
>> +
>>       /* Make sure no new addresses are being added during the
>>        * restart.  Though this is a pretty complicated attack
>>        * since you'd have to get inside the cookie.
>>        */
>> -     if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
>> +     if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
>>               return SCTP_DISPOSITION_CONSUME;
>> -     }
>>
>>       /* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
>>        * the peer has restarted (Action A), it MUST NOT setup a new
>> @@ -1912,6 +1915,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
>>       if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>>               goto nomem;
>>
>> +     if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
>> +             return SCTP_DISPOSITION_DISCARD;
>> +
>>       /* Update the content of current association.  */
>>       sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
>>       sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
>> @@ -2009,6 +2015,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
>>        * a COOKIE ACK.
>>        */
>>
>> +     if (!sctp_auth_chunk_verify(net, chunk, asoc))
>> +             return SCTP_DISPOSITION_DISCARD;
>> +
>>       /* Don't accidentally move back into established state. */
>>       if (asoc->state < SCTP_STATE_ESTABLISHED) {
>>               sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
>> @@ -4171,10 +4180,7 @@ enum sctp_disposition sctp_sf_eat_fwd_tsn_fast(
>>   * The return value is the disposition of the chunk.
>>   */
>>  static enum sctp_ierror sctp_sf_authenticate(
>> -                                     struct net *net,
>> -                                     const struct sctp_endpoint *ep,
>>                                       const struct sctp_association *asoc,
>> -                                     const union sctp_subtype type,
>>                                       struct sctp_chunk *chunk)
>>  {
>>       struct sctp_shared_key *sh_key = NULL;
>> @@ -4275,7 +4281,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
>>                                                 commands);
>>
>>       auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
>> -     error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
>> +     error = sctp_sf_authenticate(asoc, chunk);
>>       switch (error) {
>>       case SCTP_IERROR_AUTH_BAD_HMAC:
>>               /* Generate the ERROR chunk and discard the rest
>> --
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply

* [PATCHv2 net] sctp: delay the authentication for the duplicated cookie-echo chunk
From: Xin Long @ 2018-05-05  6:59 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman

Now sctp only delays the authentication for the normal cookie-echo
chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
authentication first based on the old asoc, which will definitely
fail due to the different auth info in the old asoc.

The duplicated cookie-echo chunk will create a new asoc with the
auth info from this chunk, and the authentication should also be
done with the new asoc's auth info for all of the collision 'A',
'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
will never pass the authentication and create the new connection.

This issue exists since very beginning, and this fix is to make
sctp_assoc_bh_rcv() follow the way sctp_endpoint_bh_rcv() does
for the normal cookie-echo chunk to delay the authentication.

While at it, remove the unused params from sctp_sf_authenticate()
and define sctp_auth_chunk_verify() used for all the places that
do the delayed authentication.

v1->v2:
  fix the typo in changelog as Marcelo noticed.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/associola.c    | 30 ++++++++++++++++-
 net/sctp/sm_statefuns.c | 86 ++++++++++++++++++++++++++-----------------------
 2 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 837806d..a47179d 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 	struct sctp_endpoint *ep;
 	struct sctp_chunk *chunk;
 	struct sctp_inq *inqueue;
-	int state;
+	int first_time = 1;	/* is this the first time through the loop */
 	int error = 0;
+	int state;
 
 	/* The association should be held so we should be safe. */
 	ep = asoc->ep;
@@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 		state = asoc->state;
 		subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
 
+		/* If the first chunk in the packet is AUTH, do special
+		 * processing specified in Section 6.3 of SCTP-AUTH spec
+		 */
+		if (first_time && subtype.chunk == SCTP_CID_AUTH) {
+			struct sctp_chunkhdr *next_hdr;
+
+			next_hdr = sctp_inq_peek(inqueue);
+			if (!next_hdr)
+				goto normal;
+
+			/* If the next chunk is COOKIE-ECHO, skip the AUTH
+			 * chunk while saving a pointer to it so we can do
+			 * Authentication later (during cookie-echo
+			 * processing).
+			 */
+			if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
+				chunk->auth_chunk = skb_clone(chunk->skb,
+							      GFP_ATOMIC);
+				chunk->auth = 1;
+				continue;
+			}
+		}
+
+normal:
 		/* SCTP-AUTH, Section 6.3:
 		 *    The receiver has a list of chunk types which it expects
 		 *    to be received only after an AUTH-chunk.  This list has
@@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 		/* If there is an error on chunk, discard this packet. */
 		if (error && chunk)
 			chunk->pdiscard = 1;
+
+		if (first_time)
+			first_time = 0;
 	}
 	sctp_association_put(asoc);
 }
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 28c070e..c9ae340 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
 					struct sctp_cmd_seq *commands);
 
 static enum sctp_ierror sctp_sf_authenticate(
-					struct net *net,
-					const struct sctp_endpoint *ep,
 					const struct sctp_association *asoc,
-					const union sctp_subtype type,
 					struct sctp_chunk *chunk);
 
 static enum sctp_disposition __sctp_sf_do_9_1_abort(
@@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
 	return SCTP_DISPOSITION_CONSUME;
 }
 
+static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
+				   const struct sctp_association *asoc)
+{
+	struct sctp_chunk auth;
+
+	if (!chunk->auth_chunk)
+		return true;
+
+	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
+	 * is supposed to be authenticated and we have to do delayed
+	 * authentication.  We've just recreated the association using
+	 * the information in the cookie and now it's much easier to
+	 * do the authentication.
+	 */
+
+	/* Make sure that we and the peer are AUTH capable */
+	if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
+		return false;
+
+	/* set-up our fake chunk so that we can process it */
+	auth.skb = chunk->auth_chunk;
+	auth.asoc = chunk->asoc;
+	auth.sctp_hdr = chunk->sctp_hdr;
+	auth.chunk_hdr = (struct sctp_chunkhdr *)
+				skb_push(chunk->auth_chunk,
+					 sizeof(struct sctp_chunkhdr));
+	skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
+	auth.transport = chunk->transport;
+
+	return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
+}
+
 /*
  * Respond to a normal COOKIE ECHO chunk.
  * We are the side that is being asked for an association.
@@ -763,37 +792,9 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 	if (error)
 		goto nomem_init;
 
-	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
-	 * is supposed to be authenticated and we have to do delayed
-	 * authentication.  We've just recreated the association using
-	 * the information in the cookie and now it's much easier to
-	 * do the authentication.
-	 */
-	if (chunk->auth_chunk) {
-		struct sctp_chunk auth;
-		enum sctp_ierror ret;
-
-		/* Make sure that we and the peer are AUTH capable */
-		if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
-			sctp_association_free(new_asoc);
-			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
-		}
-
-		/* set-up our fake chunk so that we can process it */
-		auth.skb = chunk->auth_chunk;
-		auth.asoc = chunk->asoc;
-		auth.sctp_hdr = chunk->sctp_hdr;
-		auth.chunk_hdr = (struct sctp_chunkhdr *)
-					skb_push(chunk->auth_chunk,
-						 sizeof(struct sctp_chunkhdr));
-		skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
-		auth.transport = chunk->transport;
-
-		ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
-		if (ret != SCTP_IERROR_NO_ERROR) {
-			sctp_association_free(new_asoc);
-			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
-		}
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
+		sctp_association_free(new_asoc);
+		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
 	}
 
 	repl = sctp_make_cookie_ack(new_asoc, chunk);
@@ -1797,13 +1798,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
 	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
 		goto nomem;
 
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Make sure no new addresses are being added during the
 	 * restart.  Though this is a pretty complicated attack
 	 * since you'd have to get inside the cookie.
 	 */
-	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
+	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
 		return SCTP_DISPOSITION_CONSUME;
-	}
 
 	/* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
 	 * the peer has restarted (Action A), it MUST NOT setup a new
@@ -1912,6 +1915,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
 	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
 		goto nomem;
 
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Update the content of current association.  */
 	sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
 	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
@@ -2009,6 +2015,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
 	 * a COOKIE ACK.
 	 */
 
+	if (!sctp_auth_chunk_verify(net, chunk, asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Don't accidentally move back into established state. */
 	if (asoc->state < SCTP_STATE_ESTABLISHED) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
@@ -4171,10 +4180,7 @@ enum sctp_disposition sctp_sf_eat_fwd_tsn_fast(
  * The return value is the disposition of the chunk.
  */
 static enum sctp_ierror sctp_sf_authenticate(
-					struct net *net,
-					const struct sctp_endpoint *ep,
 					const struct sctp_association *asoc,
-					const union sctp_subtype type,
 					struct sctp_chunk *chunk)
 {
 	struct sctp_shared_key *sh_key = NULL;
@@ -4275,7 +4281,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
 						  commands);
 
 	auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
-	error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
+	error = sctp_sf_authenticate(asoc, chunk);
 	switch (error) {
 	case SCTP_IERROR_AUTH_BAD_HMAC:
 		/* Generate the ERROR chunk and discard the rest
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net-next v2] net: core: rework basic flow dissection helper
From: kbuild test robot @ 2018-05-05  5:21 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: kbuild-all, netdev, David S. Miller, Eric Dumazet, Jason Wang
In-Reply-To: <e87a05fed35c9cb8e60fe8f867c43b27b3044e21.1525426286.git.pabeni@redhat.com>

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-core-rework-basic-flow-dissection-helper/20180505-090417
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/net/tipc.h:55:30: sparse: incorrect type in return expression (different base types) @@    expected unsigned int @@    got restriunsigned int @@
   include/net/tipc.h:55:30:    expected unsigned int
   include/net/tipc.h:55:30:    got restricted __be32 <noident>
   net/core/flow_dissector.c:840:48: sparse: incorrect type in assignment (different base types) @@    expected restricted __be32 [usertype] key @@    got  [usertype] key @@
   net/core/flow_dissector.c:840:48:    expected restricted __be32 [usertype] key
   net/core/flow_dissector.c:840:48:    got unsigned int
   net/core/flow_dissector.c:1035:30: sparse: expression using sizeof(void)
   net/core/flow_dissector.c:1035:30: sparse: expression using sizeof(void)
   net/core/flow_dissector.c:1276:25: sparse: expression using sizeof(void)
>> net/core/flow_dissector.c:1319:59: sparse: Using plain integer as NULL pointer

vim +1319 net/core/flow_dissector.c

  1305	
  1306	/**
  1307	 * skb_get_poff - get the offset to the payload
  1308	 * @skb: sk_buff to get the payload offset from
  1309	 *
  1310	 * The function will get the offset to the payload as far as it could
  1311	 * be dissected.  The main user is currently BPF, so that we can dynamically
  1312	 * truncate packets without needing to push actual payload to the user
  1313	 * space and can analyze headers only, instead.
  1314	 */
  1315	u32 skb_get_poff(const struct sk_buff *skb)
  1316	{
  1317		struct flow_keys_basic keys;
  1318	
> 1319		if (!skb_flow_dissect_flow_keys_basic(skb, &keys, 0, 0, 0, 0, 0))
  1320			return 0;
  1321	
  1322		return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
  1323	}
  1324	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH iproute2] rdma: fix header files
From: Stephen Hemminger @ 2018-05-05  4:58 UTC (permalink / raw)
  To: David Ahern; +Cc: swise, netdev
In-Reply-To: <d51c9746-d3e9-48f5-fc93-b63e4c5c7a8c@gmail.com>

On Fri, 4 May 2018 16:13:07 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 5/4/18 3:56 PM, Stephen Hemminger wrote:
> > All user api headers in iproute2 should be in include/uapi
> > so that script can be used to put correct sanitized kernel headers
> > there. And the header files for rdma must be a complete set; if one
> > header file includes another, all must be present.
> > 
> > This fixes build on older distributions, and Windows Services
> > for Linux.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  include/uapi/rdma/ib_user_sa.h                |   77 ++
> >  include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
> >  .../uapi/rdma/rdma_netlink.h                  |   13 +
> >  .../uapi/rdma/rdma_user_cm.h                  |    6 +-
> >  4 files changed, 1303 insertions(+), 3 deletions(-)
> >  create mode 100644 include/uapi/rdma/ib_user_sa.h
> >  create mode 100644 include/uapi/rdma/ib_user_verbs.h
> >  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
> >  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> >   
> 
> Stephen:
> 
> Per a recent discussion the RDMA folks need to take ownership of the
> uapi files. RDMA features do not hit Dave's net-next tree so the rdma
> code can never hit iproute2-next during a dev cycle.

I want all uapi headers in include/uapi because it avoids possible overlap problems,
During the linux-net/linus release cycle they should match what is Linus's tree.

During the net-next they can come from two sources.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Jann Horn @ 2018-05-05  4:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Linus Torvalds,
	Greg Kroah-Hartman, Andy Lutomirski, Network Development,
	kernel list, kernel-team
In-Reply-To: <20180503043604.1604587-2-ast@kernel.org>

On Thu, May 3, 2018 at 12:36 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> Introduce helper:
> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> struct umh_info {
>        struct file *pipe_to_umh;
>        struct file *pipe_from_umh;
>        pid_t pid;
> };
>
> that GPLed kernel modules (signed or unsigned) can use it to execute part
> of its own data as swappable user mode process.
>
> The kernel will do:
> - mount "tmpfs"
> - allocate a unique file in tmpfs
> - populate that file with [data, data + len] bytes
> - user-mode-helper code will do_execve that file and, before the process
>   starts, the kernel will create two unix pipes for bidirectional
>   communication between kernel module and umh
> - close tmpfs file, effectively deleting it
> - the fork_usermode_blob will return zero on success and populate
>   'struct umh_info' with two unix pipes and the pid of the user process
>
> As the first step in the development of the bpfilter project
> the fork_usermode_blob() helper is introduced to allow user mode code
> to be invoked from a kernel module. The idea is that user mode code plus
> normal kernel module code are built as part of the kernel build
> and installed as traditional kernel module into distro specified location,
> such that from a distribution point of view, there is
> no difference between regular kernel modules and kernel modules + umh code.
> Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> by a kernel module doesn't make it any special from kernel and user space
> tooling point of view.
[...]
> +static struct vfsmount *umh_fs;
> +
> +static int init_tmpfs(void)
> +{
> +       struct file_system_type *type;
> +
> +       if (umh_fs)
> +               return 0;
> +       type = get_fs_type("tmpfs");
> +       if (!type)
> +               return -ENODEV;
> +       umh_fs = kern_mount(type);
> +       if (IS_ERR(umh_fs)) {
> +               int err = PTR_ERR(umh_fs);
> +
> +               put_filesystem(type);
> +               umh_fs = NULL;
> +               return err;
> +       }
> +       return 0;
> +}

Should init_tmpfs() be holding some sort of mutex if it's fiddling
with `umh_fs`? The current code only calls it in initcall context, but
if that ever changes and two processes try to initialize the tmpfs at
the same time, a few things could go wrong.
I guess Luis' suggestion (putting a call to init_tmpfs() in
do_basic_setup()) might be the easiest way to get rid of that problem.

> +static int alloc_tmpfs_file(size_t size, struct file **filp)
> +{
> +       struct file *file;
> +       int err;
> +
> +       err = init_tmpfs();
> +       if (err)
> +               return err;
> +       file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> +       if (IS_ERR(file))
> +               return PTR_ERR(file);
> +       *filp = file;
> +       return 0;
> +}

^ permalink raw reply

* [PATCH] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
From: Sun Lianwen @ 2018-05-05  3:29 UTC (permalink / raw)
  To: davem; +Cc: netdev

The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s".

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
---
 net/9p/mod.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/9p/mod.c b/net/9p/mod.c
index 6ab36aea7727..eb9777f05755 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -104,7 +104,7 @@ EXPORT_SYMBOL(v9fs_unregister_trans);
 
 /**
  * v9fs_get_trans_by_name - get transport with the matching name
- * @name: string identifying transport
+ * @s: string identifying transport
  *
  */
 struct p9_trans_module *v9fs_get_trans_by_name(char *s)
-- 
2.17.0

^ permalink raw reply related

* Re: [v2 PATCH 1/1] tg3: fix meaningless hw_stats reading after tg3_halt memset 0 hw_stats
From: Zumeng Chen @ 2018-05-05  2:40 UTC (permalink / raw)
  To: Michael Chan
  Cc: Netdev, open list, Siva Reddy Kallam,
	prashant.sreedharan@broadcom.com, David Miller
In-Reply-To: <CACKFLi=Rv_+Gig5UpdnpbOPcyLyGr3TYGUbFEmeGcZWAk3YZRQ@mail.gmail.com>

On 05/03/2018 01:04 PM, Michael Chan wrote:
> On Wed, May 2, 2018 at 5:30 PM, Zumeng Chen <zumeng.chen@gmail.com> wrote:
>> On 2018年05月03日 01:32, Michael Chan wrote:
>>> On Wed, May 2, 2018 at 3:27 AM, Zumeng Chen <zumeng.chen@gmail.com> wrote:
>>>> On 2018年05月02日 13:12, Michael Chan wrote:
>>>>> On Tue, May 1, 2018 at 5:42 PM, Zumeng Chen <zumeng.chen@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> diff --git a/drivers/net/ethernet/broadcom/tg3.h
>>>>>> b/drivers/net/ethernet/broadcom/tg3.h
>>>>>> index 3b5e98e..c61d83c 100644
>>>>>> --- a/drivers/net/ethernet/broadcom/tg3.h
>>>>>> +++ b/drivers/net/ethernet/broadcom/tg3.h
>>>>>> @@ -3102,6 +3102,7 @@ enum TG3_FLAGS {
>>>>>>            TG3_FLAG_ROBOSWITCH,
>>>>>>            TG3_FLAG_ONE_DMA_AT_ONCE,
>>>>>>            TG3_FLAG_RGMII_MODE,
>>>>>> +       TG3_FLAG_HALT,
>>>>> I think you should be able to use the existing INIT_COMPLETE flag
>>>>
>>>> No,  it will bring the uncertain factors into the existed complicate
>>>> logic
>>>> of INIT_COMPLETE.
>>>> And I think it's very simple logic here to fix the meaningless hw_stats
>>>> reading and the problem
>>>> of commit f5992b72. I even suspect if you have read INIT_COMPLETE related
>>>> codes carefully.
>>>>
>>> We should use an existing flag whenever appropriate
>>
>> I disagree. This is sort of blahblah...
> I don't want to see another flag added that is practically the same as
> !INIT_COMPLETE.  The driver already has close to one hundred flags.
> Adding a new flag that is similar to an existing flag will just make
> the code more difficult to understand and maintain.
>
> If you don't want to fix it the cleaner way, Siva or I will fix it.

Feel free to go, I just take a double look, INIT_COMPLETE can directly 
be used as follows:

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index 08bbb63..0e04fd7 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8733,14 +8733,11 @@ static void tg3_free_consistent(struct tg3 *tp)
         tg3_mem_rx_release(tp);
         tg3_mem_tx_release(tp);

-       /* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
-       tg3_full_lock(tp, 0);
         if (tp->hw_stats) {
                 dma_free_coherent(&tp->pdev->dev, sizeof(struct 
tg3_hw_stats),
                                   tp->hw_stats, tp->stats_mapping);
                 tp->hw_stats = NULL;
         }
-       tg3_full_unlock(tp);
  }

  /*
@@ -14178,7 +14175,7 @@ static void tg3_get_stats64(struct net_device *dev,
         struct tg3 *tp = netdev_priv(dev);

         spin_lock_bh(&tp->lock);
-       if (!tp->hw_stats) {
+       if (!tp->hw_stats || tg3_flag(tp, INIT_COMPLETE)) {
                 *stats = tp->net_stats_prev;
                 spin_unlock_bh(&tp->lock);
                 return;

Cheers,
Zumeng

^ permalink raw reply related

* Re: [PATCH V3] net/netlink: make sure the headers line up actual value output
From: Bo YU @ 2018-05-05  1:54 UTC (permalink / raw)
  To: David Miller; +Cc: xiyou.wangcong, yuzibode, netdev
In-Reply-To: <20180504.130223.1863527612784159289.davem@davemloft.net>

Hi,
On Fri, May 04, 2018 at 01:02:23PM -0400, David Miller wrote:
>From: YU Bo <tsu.yubo@gmail.com>
>Date: Thu, 3 May 2018 23:09:23 -0400
>
>> Making sure the headers line up properly with the actual value output
>> of the command
>> `cat /proc/net/netlink`
>>
>> Before the patch:
>> <sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
>> <ffff8cd2c2f7b000 0 909 00000550 0 0 0 2 0 18946
>>
>> After the patch:
>>>sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
>>>0000000033203952 0 897 00000113 0 0 0 2 0 14906
>>
>> Signed-off-by: Bo YU <tsu.yubo@gmail.com>
>
>Applied, but why did you send this V3 to the list two times?
Thanks a lot. When sent the email,i encounter networking issue and not sure
send to the list.Sorry for making noise.
>
>Thank you.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov @ 2018-05-05  1:37 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto, netdev,
	linux-kernel, kernel-team, Al Viro, David Howells, Mimi Zohar,
	Kees Cook, Andrew Morton, Dominik Brodowski, Hugh Dickins,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	Rafael J. Wysocki, Linux FS Devel, p
In-Reply-To: <20180504195642.GB12838@wotan.suse.de>

On Fri, May 04, 2018 at 07:56:43PM +0000, Luis R. Rodriguez wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

sure.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> > Introduce helper:
> > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > struct umh_info {
> >        struct file *pipe_to_umh;
> >        struct file *pipe_from_umh;
> >        pid_t pid;
> > };
> > 
> > that GPLed kernel modules (signed or unsigned) can use it to execute part
> > of its own data as swappable user mode process.
> > 
> > The kernel will do:
> > - mount "tmpfs"
> 
> Actually its a *shared* vfsmount tmpfs for all umh blobs.

yep

> > - allocate a unique file in tmpfs
> > - populate that file with [data, data + len] bytes
> > - user-mode-helper code will do_execve that file and, before the process
> >   starts, the kernel will create two unix pipes for bidirectional
> >   communication between kernel module and umh
> > - close tmpfs file, effectively deleting it
> > - the fork_usermode_blob will return zero on success and populate
> >   'struct umh_info' with two unix pipes and the pid of the user process
> 
> But since its using UMH_WAIT_EXEC, all we can guarantee currently is the
> inception point was intended, well though out, and will run, but the return
> value in no way reflects the success or not of the execution. More below.

yep

> > As the first step in the development of the bpfilter project
> > the fork_usermode_blob() helper is introduced to allow user mode code
> > to be invoked from a kernel module. The idea is that user mode code plus
> > normal kernel module code are built as part of the kernel build
> > and installed as traditional kernel module into distro specified location,
> > such that from a distribution point of view, there is
> > no difference between regular kernel modules and kernel modules + umh code.
> > Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> > by a kernel module doesn't make it any special from kernel and user space
> > tooling point of view.
> > 
> > Such approach enables kernel to delegate functionality traditionally done
> > by the kernel modules into the user space processes (either root or !root) and
> > reduces security attack surface of the new code. The buggy umh code would crash
> > the user process, but not the kernel. Another advantage is that umh code
> > of the kernel module can be debugged and tested out of user space
> > (e.g. opening the possibility to run clang sanitizers, fuzzers or
> > user space test suites on the umh code).
> > In case of the bpfilter project such architecture allows complex control plane
> > to be done in the user space while bpf based data plane stays in the kernel.
> > 
> > Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> > the kernel module that uses them (like bpfilter) needs to manage life
> > time of umh on its own via two unix pipes and the pid of umh.
> > 
> > The exit code of such kernel module should kill the umh it started,
> > so that rmmod of the kernel module will cleanup the corresponding umh.
> > Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  fs/exec.c               |  38 ++++++++---
> >  include/linux/binfmts.h |   1 +
> >  include/linux/umh.h     |  12 ++++
> >  kernel/umh.c            | 176 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 215 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 183059c427b9..30a36c2a39bf 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
> >  /*
> >   * sys_execve() executes a new program.
> >   */
> > -static int do_execveat_common(int fd, struct filename *filename,
> > -			      struct user_arg_ptr argv,
> > -			      struct user_arg_ptr envp,
> > -			      int flags)
> > +static int __do_execve_file(int fd, struct filename *filename,
> > +			    struct user_arg_ptr argv,
> > +			    struct user_arg_ptr envp,
> > +			    int flags, struct file *file)
> >  {
> >  	char *pathbuf = NULL;
> >  	struct linux_binprm *bprm;
> > -	struct file *file;
> >  	struct files_struct *displaced;
> >  	int retval;
> 
> Keeping in mind a fuzzer...
> 
> Note, right below this, and not shown here in the hunk, is:
> 
>         if (IS_ERR(filename))                                                   
>                 return PTR_ERR(filename)
> >  
> > @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	check_unsafe_exec(bprm);
> >  	current->in_execve = 1;
> >  
> > -	file = do_open_execat(fd, filename, flags);
> > +	if (!file)
> > +		file = do_open_execat(fd, filename, flags);
> 
> 
> Here we now seem to allow !file and open the file with the passed fd as in
> the old days. This is an expected change.
> 
> >  	retval = PTR_ERR(file);
> >  	if (IS_ERR(file))
> >  		goto out_unmark;
> > @@ -1760,7 +1760,9 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	sched_exec();
> >  
> >  	bprm->file = file;
> > -	if (fd == AT_FDCWD || filename->name[0] == '/') {
> > +	if (!filename) {
> 
> If anything shouldn't this be:
> 
> 	if (IS_ERR(filename))

nope. it should be !filename as do_execve_file() passes NULL.
IS_ERR != IS_ERR_OR_NULL

> But, wouldn't the above first branch in the routine catch this?
> 
> > +		bprm->filename = "none";
> 
> Given this seems like a desirable branch which was tested, wonder how this
> ever got set if the above branch in the first hunk I noted hit true?
> 
> In any case, we seem to have two cases, can we rule out the exact requirements
> at the top so we can bail out with an error code if one or the other way to
> call this function does not align with expectations?

I think you're misreading the code or I don't understand the concern at all.

> > +	} else if (fd == AT_FDCWD || filename->name[0] == '/') {
> >  		bprm->filename = filename->name;
> >  	} else {
> >  		if (filename->name[0] == '\0')
> > @@ -1826,7 +1828,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	task_numa_free(current);
> >  	free_bprm(bprm);
> >  	kfree(pathbuf);
> > -	putname(filename);
> > +	if (filename)
> > +		putname(filename);
> >  	if (displaced)
> >  		put_files_struct(displaced);
> >  	return retval;
> > @@ -1849,10 +1852,27 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	if (displaced)
> >  		reset_files_struct(displaced);
> >  out_ret:
> > -	putname(filename);
> > +	if (filename)
> > +		putname(filename);
> >  	return retval;
> >  }
> >  
> > +static int do_execveat_common(int fd, struct filename *filename,
> 
> Further signs the filename is now optional. But I don't understand how these
> branches ever be true, but perhaps I'm missing something?
> 
> > +			      struct user_arg_ptr argv,
> > +			      struct user_arg_ptr envp,
> > +			      int flags)
> > +{
> > +	return __do_execve_file(fd, filename, argv, envp, flags, NULL);
> > +}
> > +
> > +int do_execve_file(struct file *file, void *__argv, void *__envp)
> > +{
> > +	struct user_arg_ptr argv = { .ptr.native = __argv };
> > +	struct user_arg_ptr envp = { .ptr.native = __envp };
> > +
> > +	return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
> > +}
> 
> Or maybe do the semantics expectations checks here, so we don't clutter
> do_execveat_common() with them?

specifically ?

> > +
> >  int do_execve(struct filename *filename,
> >  	const char __user *const __user *__argv,
> >  	const char __user *const __user *__envp)
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 4955e0863b83..c05f24fac4f6 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -150,5 +150,6 @@ extern int do_execveat(int, struct filename *,
> >  		       const char __user * const __user *,
> >  		       const char __user * const __user *,
> >  		       int);
> > +int do_execve_file(struct file *file, void *__argv, void *__envp);
> >  
> >  #endif /* _LINUX_BINFMTS_H */
> > diff --git a/include/linux/umh.h b/include/linux/umh.h
> > index 244aff638220..5c812acbb80a 100644
> > --- a/include/linux/umh.h
> > +++ b/include/linux/umh.h
> > @@ -22,8 +22,10 @@ struct subprocess_info {
> >  	const char *path;
> >  	char **argv;
> >  	char **envp;
> > +	struct file *file;
> >  	int wait;
> >  	int retval;
> > +	pid_t pid;
> >  	int (*init)(struct subprocess_info *info, struct cred *new);
> >  	void (*cleanup)(struct subprocess_info *info);
> >  	void *data;
> 
> While at it, can you kdocify struct subprocess_info and add new docs for at
> least these two entires you are adding ?

Sorry 'while at it' doesn't sound as a good reason to
add kdoc now instead of later.

> > @@ -38,6 +40,16 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
> >  			  int (*init)(struct subprocess_info *info, struct cred *new),
> >  			  void (*cleanup)(struct subprocess_info *), void *data);
> >  
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > +			  int (*init)(struct subprocess_info *info, struct cred *new),
> > +			  void (*cleanup)(struct subprocess_info *), void *data);
> 
> Likewise but on the umc.c file.
> 
> > +struct umh_info {
> > +	struct file *pipe_to_umh;
> > +	struct file *pipe_from_umh;
> > +	pid_t pid;
> > +};
> 
> Likewise.

what 'likewise' ? The kdoc ?

> 
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> 
> Likewise but on the umc.c files.
> 
> > +
> >  extern int
> >  call_usermodehelper_exec(struct subprocess_info *info, int wait);
> >  
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index f76b3ff876cf..c3f418d7d51a 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -25,6 +25,8 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/async.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/shmem_fs.h>
> > +#include <linux/pipe_fs_i.h>
> >  
> >  #include <trace/events/module.h>
> >  
> > @@ -97,9 +99,13 @@ static int call_usermodehelper_exec_async(void *data)
> >  
> >  	commit_creds(new);
> >  
> > -	retval = do_execve(getname_kernel(sub_info->path),
> > -			   (const char __user *const __user *)sub_info->argv,
> > -			   (const char __user *const __user *)sub_info->envp);
> > +	if (sub_info->file)
> > +		retval = do_execve_file(sub_info->file,
> > +					sub_info->argv, sub_info->envp);
> > +	else
> > +		retval = do_execve(getname_kernel(sub_info->path),
> > +				   (const char __user *const __user *)sub_info->argv,
> > +				   (const char __user *const __user *)sub_info->envp);
> >  out:
> >  	sub_info->retval = retval;
> >  	/*
> > @@ -185,6 +191,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
> >  		if (pid < 0) {
> >  			sub_info->retval = pid;
> >  			umh_complete(sub_info);
> > +		} else {
> > +			sub_info->pid = pid;
> >  		}
> >  	}
> >  }
> > @@ -393,6 +401,168 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
> >  }
> >  EXPORT_SYMBOL(call_usermodehelper_setup);
> >  
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > +		int (*init)(struct subprocess_info *info, struct cred *new),
> > +		void (*cleanup)(struct subprocess_info *info), void *data)
> 
> Should be static, no other users outside of this file.

good catch. will change to static.

> Please use umh_setup_file().

sorry. makes no sense.
There is call_usermodehelper_setup() right above it.
call_usermodehelper_setup_file() just follows the naming convention.
If you prefer shorter names, both have to be renamed in the separate patch series.

> > +{
> > +	struct subprocess_info *sub_info;
> 
> Considering a possible fuzzer triggering random data we should probably
> return NULL early and avoid the kzalloc if:

I missing 'fuzzer' point here and earlier.
'fuzzer' cannot reach here. It's all internal api.

> 	if (!file || !init || !cleanup)
> 		return NULL;

sorry, nope. in kernel we don't do defensive programming like this.

> Is data optional? The kdoc could clarify this.

No. Should be obvious from this patch.
The only caller of call_usermodehelper_setup_file() is fork_usermode_blob()
and it passes 'struct umh_info *info'.

> 
> > +
> > +	sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
> > +	if (!sub_info)
> > +		return NULL;
> > +
> > +	INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
> > +	sub_info->path = "none";
> > +	sub_info->file = file;
> > +	sub_info->init = init;
> > +	sub_info->cleanup = cleanup;
> > +	sub_info->data = data;
> > +	return sub_info;
> > +}
> > +
> > +static struct vfsmount *umh_fs;
> > +
> > +static int init_tmpfs(void)
> 
> Please use umh_init_tmpfs(). 

ok

> Also see init/main.c do_basic_setup() which calls
> usermodehelper_enable() prior to do_initcalls(). Now, fortunately TMPFS is only
> bool, saving us from some races and we do call tmpfs's init first shmem_init():
> 
> static void __init do_basic_setup(void)
> {
> 	cpuset_init_smp();
> 	shmem_init();
> 	driver_init();
> 	init_irq_proc();
> 	do_ctors();
> 	usermodehelper_enable();
> 	do_initcalls();
> }
> 
> But it also means we're enabling your new call call fork_usermode_blob() on
> early init code even if we're not setup. Since this umh tmpfs vfsmount is
> shared I'd say just call this init right before usermodehelper_enable()
> on do_basic_setup().

Not following.
Why init_tmpfs() should be called by __init function?
Are you saying make 'static struct vfsmount *shm_mnt;'
global and use it here? so no init_tmpfs() necessary?
I think that can work, but feels that having two
tmpfs mounts (one for shmem and one for umh) is cleaner.

> 
> > +{
> > +	struct file_system_type *type;
> > +
> > +	if (umh_fs)
> > +		return 0;
> > +	type = get_fs_type("tmpfs");
> > +	if (!type)
> > +		return -ENODEV;
> > +	umh_fs = kern_mount(type);
> > +	if (IS_ERR(umh_fs)) {
> > +		int err = PTR_ERR(umh_fs);
> > +
> > +		put_filesystem(type);
> > +		umh_fs = NULL;
> > +		return err;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int alloc_tmpfs_file(size_t size, struct file **filp)
> 
> Please use umh_alloc_tmpfs_file()

ok

> > +{
> > +	struct file *file;
> > +	int err;
> > +
> > +	err = init_tmpfs();
> > +	if (err)
> > +		return err;
> > +	file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> > +	if (IS_ERR(file))
> > +		return PTR_ERR(file);
> > +	*filp = file;
> > +	return 0;
> > +}
> > +
> > +static int populate_file(struct file *file, const void *data, size_t size)
> 
> Please use umh_populate_file()

ok

> > +{
> > +	size_t offset = 0;
> > +	int err;
> > +
> > +	do {
> > +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > +		struct page *page;
> > +		void *pgdata, *vaddr;
> > +
> > +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > +					    0, &page, &pgdata);
> > +		if (err < 0)
> > +			goto fail;
> > +
> > +		vaddr = kmap(page);
> > +		memcpy(vaddr, data, len);
> > +		kunmap(page);
> > +
> > +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> > +					  len, page, pgdata);
> > +		if (err < 0)
> > +			goto fail;
> > +
> > +		size -= len;
> > +		data += len;
> > +		offset += len;
> > +	} while (size);
> 
> Character for character, this looks like a wonderful copy and paste from
> i915_gem_object_create_from_data()'s own loop which does the same exact
> thing. Perhaps its time for a helper on mm/filemap.c with an export so
> if a bug is fixed in one place its fixed in both places.

yes, of course, but not right now.
Once it all lands that will be the time to create common helper.
It's not a good idea to mess with i915 in one patch set.

> > +	return 0;
> > +fail:
> > +	return err;
> > +}
> > +
> > +static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> 
> The function name umh_pipe_setup() is also used on fs/coredump.c, with the same
> prototype, perhaps rename that before we take this on, even if both are static.

hmm. why?
These are two static functions with the same name, so?
tags get confusing?

> > +{
> > +	struct umh_info *umh_info = info->data;
> > +	struct file *from_umh[2];
> > +	struct file *to_umh[2];
> > +	int err;
> > +
> > +	/* create pipe to send data to umh */
> > +	err = create_pipe_files(to_umh, 0);
> > +	if (err)
> > +		return err;
> > +	err = replace_fd(0, to_umh[0], 0);
> > +	fput(to_umh[0]);
> > +	if (err < 0) {
> > +		fput(to_umh[1]);
> > +		return err;
> > +	}
> > +
> > +	/* create pipe to receive data from umh */
> > +	err = create_pipe_files(from_umh, 0);
> > +	if (err) {
> > +		fput(to_umh[1]);
> > +		replace_fd(0, NULL, 0);
> > +		return err;
> > +	}
> > +	err = replace_fd(1, from_umh[1], 0);
> > +	fput(from_umh[1]);
> > +	if (err < 0) {
> > +		fput(to_umh[1]);
> > +		replace_fd(0, NULL, 0);
> > +		fput(from_umh[0]);
> > +		return err;
> > +	}
> > +
> > +	umh_info->pipe_to_umh = to_umh[1];
> > +	umh_info->pipe_from_umh = from_umh[0];
> > +	return 0;
> > +}
> > +
> > +static void umh_save_pid(struct subprocess_info *info)
> > +{
> > +	struct umh_info *umh_info = info->data;
> > +
> > +	umh_info->pid = info->pid;
> > +}
> > +
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> 
> Please use umh_fork_blob()

sorry, no. fork_usermode_blob() is much more descriptive name.

> > +{
> > +	struct subprocess_info *sub_info;
> > +	struct file *file = NULL;
> > +	int err;
> > +
> > +	err = alloc_tmpfs_file(len, &file);
> > +	if (err)
> > +		return err;
> > +
> > +	err = populate_file(file, data, len);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = -ENOMEM;
> > +	sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
> > +						  umh_save_pid, info);
> > +	if (!sub_info)
> > +		goto out;
> > +
> > +	err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
> 
> Alright, neat, so to be clear, we're just glad to try inception, we have no
> clue or idea what the real return value would be, its up to the caller to track
> the progress somehow?

yep.

> Can you add a kdoc entry for this and clarify requirements?

ok. I'll add a comment to this helper.

> Also, can you extend lib/test_kmod.c with a test case for this with its own
> demo and try to blow it up?

in what sense? bpfilter is the test and the driving component for it.
I'm expecting that folks who want to use this helper to do usb drivers
in user space may want to extend this helper further, but that's their job.

> I hadn't tried suspend/resume during a kmod test, but since we're using a
> kernel_thread() I wouldn't be surprised if we barf while stress testing the
> module loader. Its surely a corner case, but better mention that now than cry
> later if we get heavy umh modules and all of a sudden we start using this for
> whatever reason close to suspend.

folks that care about suspend/resume should do that.
I'm happy to gate this helper for !CONFIG_SUSPEND, since I have
no idea what issues can be uncovered, how to fix them and no desire to do so.

Thanks

^ permalink raw reply

* Re: [PATCH v2 net-next] net: stmmac: Add support for U32 TC filter using Flexible RX Parser
From: Jakub Kicinski @ 2018-05-05  1:33 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <c933954a111938eb8aef46082993aa541372a9ee.1525424332.git.joabreu@synopsys.com>

On Fri,  4 May 2018 10:01:38 +0100, Jose Abreu wrote:
> This adds support for U32 filter by using an HW only feature called
> Flexible RX Parser. This allow us to match any given packet field with a
> pattern and accept/reject or even route the packet to a specific DMA
> channel.
> 
> Right now we only support acception or rejection of frame and we only
> support simple rules. Though, the Parser has the flexibility of jumping to
> specific rules as an if condition so complex rules can be established.
> 
> This is only supported in GMAC5.10+.
> 
> The following commands can be used to test this code:
> 
> 	1) Setup an ingress qdisk:
> 	# tc qdisc add dev eth0 handle ffff: ingress
> 
> 	2) Setup a filter (e.g. filter by IP):
> 	# tc filter add dev eth0 parent ffff: protocol ip u32 match ip \
> 		src 192.168.0.3 skip_sw action drop
> 
> In every tests performed we always used the "skip_sw" flag to make sure
> only the RX Parser was involved.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Vitor Soares <soares@synopsys.com>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> Cc: Jakub Kicinski <kubakici@wp.pl>
> ---
> Changes from v1:
> 	- Follow Linux network coding style (David)
> 	- Use tc_cls_can_offload_and_chain0() (Jakub)

Thanks!

> @@ -4223,6 +4277,11 @@ int stmmac_dvr_probe(struct device *device,
>  	ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
>  			    NETIF_F_RXCSUM;
>  
> +	ret = stmmac_tc_init(priv, priv);
> +	if (!ret) {
> +		ndev->hw_features |= NETIF_F_HW_TC;
> +	}
> +
>  	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
>  		ndev->hw_features |= NETIF_F_TSO | NETIF_F_TSO6;
>  		priv->tso = true;

One more comment, but perhaps not a showstopper, it's considered good
practice to disallow clearing/disabling this flag while filters are
installed.  Driver should return -EBUSY from .ndo_set_features if TC
rules are offloaded and user wants to disable HW_TC feature flag.

^ permalink raw reply

* [PATCH 24/24] selftests: net: return Kselftest Skip code for skipped tests
From: Shuah Khan (Samsung OSG) @ 2018-05-05  1:13 UTC (permalink / raw)
  To: shuah, davem; +Cc: linux-kselftest, linux-kernel, netdev
In-Reply-To: <20180505011328.32078-1-shuah@kernel.org>

When net test is skipped because of unmet dependencies and/or unsupported
configuration, it returns 0 which is treated as a pass by the Kselftest
framework. This leads to false positive result even when the test could
not be run.

Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.

Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.

Change psock_tpacket to use ksft_exit_skip() when a non-root user runs
the test and add an explicit check for root and a clear message, instead
of failing the test when /sys/power/state file open fails.

Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
---
 tools/testing/selftests/net/fib_tests.sh    |  8 +++++---
 tools/testing/selftests/net/netdevice.sh    | 16 +++++++++------
 tools/testing/selftests/net/pmtu.sh         |  5 ++++-
 tools/testing/selftests/net/psock_tpacket.c |  4 +++-
 tools/testing/selftests/net/rtnetlink.sh    | 31 ++++++++++++++++-------------
 5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9164e60d4b66..5baac82b9287 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -5,6 +5,8 @@
 # different events.
 
 ret=0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
 
 VERBOSE=${VERBOSE:=0}
 PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
@@ -579,18 +581,18 @@ fib_test()
 
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
-	exit 0
+	exit $ksft_skip;
 fi
 
 if [ ! -x "$(command -v ip)" ]; then
 	echo "SKIP: Could not run test without ip tool"
-	exit 0
+	exit $ksft_skip
 fi
 
 ip route help 2>&1 | grep -q fibmatch
 if [ $? -ne 0 ]; then
 	echo "SKIP: iproute2 too old, missing fibmatch"
-	exit 0
+	exit $ksft_skip
 fi
 
 # start clean
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index 903679e0ff31..e3afcb424710 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -8,6 +8,9 @@
 # if not they probably have failed earlier in the boot process and their logged error will be catched by another test
 #
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 # this function will try to up the interface
 # if already up, nothing done
 # arg1: network interface name
@@ -18,7 +21,7 @@ kci_net_start()
 	ip link show "$netdev" |grep -q UP
 	if [ $? -eq 0 ];then
 		echo "SKIP: $netdev: interface already up"
-		return 0
+		return $ksft_skip
 	fi
 
 	ip link set "$netdev" up
@@ -61,12 +64,12 @@ kci_net_setup()
 	ip address show "$netdev" |grep '^[[:space:]]*inet'
 	if [ $? -eq 0 ];then
 		echo "SKIP: $netdev: already have an IP"
-		return 0
+		return $ksft_skip
 	fi
 
 	# TODO what ipaddr to set ? DHCP ?
 	echo "SKIP: $netdev: set IP address"
-	return 0
+	return $ksft_skip
 }
 
 # test an ethtool command
@@ -84,6 +87,7 @@ kci_netdev_ethtool_test()
 	if [ $ret -ne 0 ];then
 		if [ $ret -eq "$1" ];then
 			echo "SKIP: $netdev: ethtool $2 not supported"
+			return $ksft_skip
 		else
 			echo "FAIL: $netdev: ethtool $2"
 			return 1
@@ -104,7 +108,7 @@ kci_netdev_ethtool()
 	ethtool --version 2>/dev/null >/dev/null
 	if [ $? -ne 0 ];then
 		echo "SKIP: ethtool not present"
-		return 1
+		return $ksft_skip
 	fi
 
 	TMP_ETHTOOL_FEATURES="$(mktemp)"
@@ -176,13 +180,13 @@ kci_test_netdev()
 #check for needed privileges
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
-	exit 0
+	exit $ksft_skip
 fi
 
 ip link show 2>/dev/null >/dev/null
 if [ $? -ne 0 ];then
 	echo "SKIP: Could not run test without the ip tool"
-	exit 0
+	exit $ksft_skip
 fi
 
 TMP_LIST_NETDEV="$(mktemp)"
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 1e428781a625..7514f93e1624 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -43,6 +43,9 @@
 #	that MTU is properly calculated instead when MTU is not configured from
 #	userspace
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 tests="
 	pmtu_vti6_exception		vti6: PMTU exceptions
 	pmtu_vti4_exception		vti4: PMTU exceptions
@@ -162,7 +165,7 @@ setup_xfrm6() {
 }
 
 setup() {
-	[ "$(id -u)" -ne 0 ] && echo "  need to run as root" && return 1
+	[ "$(id -u)" -ne 0 ] && echo "  need to run as root" && return $ksft_skip
 
 	cleanup_done=0
 	for arg do
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 7f6cd9fdacf3..7ec4fa4d55dc 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -60,6 +60,8 @@
 
 #include "psock_lib.h"
 
+#include "../kselftest.h"
+
 #ifndef bug_on
 # define bug_on(cond)		assert(!(cond))
 #endif
@@ -825,7 +827,7 @@ static int test_tpacket(int version, int type)
 		fprintf(stderr, "test: skip %s %s since user and kernel "
 			"space have different bit width\n",
 			tpacket_str[version], type_str[type]);
-		return 0;
+		return KSFT_SKIP;
 	}
 
 	sock = pfsocket(version);
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index e6f485235435..fb3767844e42 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -7,6 +7,9 @@
 devdummy="test-dummy0"
 ret=0
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 # set global exit status, but never reset nonzero one.
 check_err()
 {
@@ -333,7 +336,7 @@ kci_test_vrf()
 	ip link show type vrf 2>/dev/null
 	if [ $? -ne 0 ]; then
 		echo "SKIP: vrf: iproute2 too old"
-		return 0
+		return $ksft_skip
 	fi
 
 	ip link add "$vrfname" type vrf table 10
@@ -409,7 +412,7 @@ kci_test_encap_fou()
 	ip fou help 2>&1 |grep -q 'Usage: ip fou'
 	if [ $? -ne 0 ];then
 		echo "SKIP: fou: iproute2 too old"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns exec "$testns" ip fou add port 7777 ipproto 47 2>/dev/null
@@ -444,7 +447,7 @@ kci_test_encap()
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP encap tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns exec "$testns" ip link set lo up
@@ -469,7 +472,7 @@ kci_test_macsec()
 	ip macsec help 2>&1 | grep -q "^Usage: ip macsec"
 	if [ $? -ne 0 ]; then
 		echo "SKIP: macsec: iproute2 too old"
-		return 0
+		return $ksft_skip
 	fi
 
 	ip link add link "$devdummy" "$msname" type macsec port 42 encrypt on
@@ -511,14 +514,14 @@ kci_test_gretap()
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP gretap tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip link help gretap 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: gretap: iproute2 too old"
 		ip netns del "$testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel
@@ -561,14 +564,14 @@ kci_test_ip6gretap()
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP ip6gretap tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip link help ip6gretap 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: ip6gretap: iproute2 too old"
 		ip netns del "$testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel
@@ -611,13 +614,13 @@ kci_test_erspan()
 	ip link help erspan 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: erspan: iproute2 too old"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP erspan tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel erspan v1
@@ -676,13 +679,13 @@ kci_test_ip6erspan()
 	ip link help ip6erspan 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: ip6erspan: iproute2 too old"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP ip6erspan tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel ip6erspan v1
@@ -762,14 +765,14 @@ kci_test_rtnl()
 #check for needed privileges
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
-	exit 0
+	exit $ksft_skip
 fi
 
 for x in ip tc;do
 	$x -Version 2>/dev/null >/dev/null
 	if [ $? -ne 0 ];then
 		echo "SKIP: Could not run test without the $x tool"
-		exit 0
+		exit $ksft_skip
 	fi
 done
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next] vlan: correct the file path in vlan_dev_change_flags() comment
From: Sun Lianwen @ 2018-05-05  1:08 UTC (permalink / raw)
  To: davem; +Cc: netdev

The vlan_flags enum is defined in include/uapi/linux/if_vlan.h file.
not in include/linux/if_vlan.h file.

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
---
 net/8021q/vlan_dev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 236452ebbd9e..546af0e73ac3 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -215,7 +215,9 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
 	return 0;
 }
 
-/* Flags are defined in the vlan_flags enum in include/linux/if_vlan.h file. */
+/* Flags are defined in the vlan_flags enum in
+ * include/uapi/linux/if_vlan.h file.
+ */
 int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
 {
 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: Alexei Starovoitov @ 2018-05-05  1:00 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto, netdev,
	linux-kernel, kernel-team
In-Reply-To: <c0ad0d12-418f-8dfd-ed2c-92e36316310b@solarflare.com>

On Thu, May 03, 2018 at 03:23:55PM +0100, Edward Cree wrote:
> On 03/05/18 05:36, Alexei Starovoitov wrote:
> > bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
> > and user mode helper code that is embedded into bpfilter.ko
> >
> > The steps to build bpfilter.ko are the following:
> > - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
> > - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
> >   is converted into bpfilter_umh.o object file
> >   with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
> >   Example:
> >   $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
> >   0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
> >   0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
> >   0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
> > - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
> >
> > bpfilter_kern.c is a normal kernel module code that calls
> > the fork_usermode_blob() helper to execute part of its own data
> > as a user mode process.
> >
> > Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> > is placed into .init.rodata section, so it's freed as soon as __init
> > function of bpfilter.ko is finished.
> > As part of __init the bpfilter.ko does first request/reply action
> > via two unix pipe provided by fork_usermode_blob() helper to
> > make sure that umh is healthy. If not it will kill it via pid.
> >
> > Later bpfilter_process_sockopt() will be called from bpfilter hooks
> > in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
> >
> > If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
> > kill umh as well.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
...
> > +static void stop_umh(void)
> > +{
> > +	if (bpfilter_process_sockopt) {
> I worry about locking here.  Is it possible for two calls to
>  bpfilter_process_sockopt() to run in parallel, both fail, and thus both
>  call stop_umh()?  And if both end up calling shutdown_umh(), we double
>  fput().

I thought iptables sockopt is serialized earlier. Nope.
We need to grab the mutex to access these pipes.
Will fix.

Thanks for spelling nits. Will fix as well.

^ permalink raw reply

* Re: [PATCH iproute2-next] bpf: don't offload perf array maps
From: Daniel Borkmann @ 2018-05-05  0:48 UTC (permalink / raw)
  To: Jakub Kicinski, dsahern, alexei.starovoitov; +Cc: stephen, netdev, oss-drivers
In-Reply-To: <20180505003751.2232-1-jakub.kicinski@netronome.com>

On 05/05/2018 02:37 AM, Jakub Kicinski wrote:
> Perf arrays are handled specially by the kernel, don't request
> offload even when used by an offloaded program.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* [PATCH iproute2-next] bpf: don't offload perf array maps
From: Jakub Kicinski @ 2018-05-05  0:37 UTC (permalink / raw)
  To: dsahern, alexei.starovoitov, daniel
  Cc: stephen, netdev, oss-drivers, Jakub Kicinski

Perf arrays are handled specially by the kernel, don't request
offload even when used by an offloaded program.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 lib/bpf.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index d9a406bf55f2..4e26c0df76c5 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -97,6 +97,11 @@ static const struct bpf_prog_meta __bpf_prog_meta[] = {
 	},
 };
 
+static bool bpf_map_offload_neutral(enum bpf_map_type type)
+{
+	return type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
+}
+
 static const char *bpf_prog_to_subdir(enum bpf_prog_type type)
 {
 	assert(type < ARRAY_SIZE(__bpf_prog_meta) &&
@@ -1594,7 +1599,7 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
 			  const struct bpf_elf_map *map, struct bpf_map_ext *ext,
 			  int *have_map_in_map)
 {
-	int fd, ret, map_inner_fd = 0;
+	int fd, ifindex, ret, map_inner_fd = 0;
 
 	fd = bpf_probe_pinned(name, ctx, map->pinning);
 	if (fd > 0) {
@@ -1631,10 +1636,10 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
 		}
 	}
 
+	ifindex = bpf_map_offload_neutral(map->type) ? 0 : ctx->ifindex;
 	errno = 0;
 	fd = bpf_map_create(map->type, map->size_key, map->size_value,
-			    map->max_elem, map->flags, map_inner_fd,
-			    ctx->ifindex);
+			    map->max_elem, map->flags, map_inner_fd, ifindex);
 
 	if (fd < 0 || ctx->verbose) {
 		bpf_map_report(fd, name, map, ctx, map_inner_fd);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Alexei Starovoitov @ 2018-05-05  0:34 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Daniel Borkmann, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Alexander Duyck, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Willem de Bruijn,
	Michael S. Tsirkin, Network Development, Björn Töpel,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z
In-Reply-To: <CAJ8uoz3V8x4uv8Xeb+qaVB0_Rkd73TuU=3ubvkDh9b7nAkXSyw@mail.gmail.com>

On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:
> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:
> >> > From: Björn Töpel <bjorn.topel@intel.com>
> >> >
> >> > This patch set introduces a new address family called AF_XDP that is
> >> > optimized for high performance packet processing and, in upcoming
> >> > patch sets, zero-copy semantics. In this patch set, we have removed
> >> > all zero-copy related code in order to make it smaller, simpler and
> >> > hopefully more review friendly. This patch set only supports copy-mode
> >> > for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> >> > for RX using the XDP_DRV path. Zero-copy support requires XDP and
> >> > driver changes that Jesper Dangaard Brouer is working on. Some of his
> >> > work has already been accepted. We will publish our zero-copy support
> >> > for RX and TX on top of his patch sets at a later point in time.
> >>
> >> +1, would be great to see it land this cycle. Saw few minor nits here
> >> and there but nothing to hold it up, for the series:
> >>
> >> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> >>
> >> Thanks everyone!
> >
> > Great stuff!
> >
> > Applied to bpf-next, with one condition.
> > Upcoming zero-copy patches for both RX and TX need to be posted
> > and reviewed within this release window.
> > If netdev community as a whole won't be able to agree on the zero-copy
> > bits we'd need to revert this feature before the next merge window.
> 
> Thanks everyone for reviewing this. Highly appreciated.
> 
> Just so we understand the purpose correctly:
> 
> 1: Do you want to see the ZC patches in order to verify that the user
> space API holds? If so, we can produce an additional RFC  patch set
> using a big chunk of code that we had in RFC V1. We are not proud of
> this code since it is clunky, but it hopefully proves the point with
> the uapi being the same.
> 
> 2: And/Or are you worried about us all (the netdev community) not
> agreeing on a way to implement ZC internally in the drivers and the
> XDP infrastructure? This is not going to be possible to finish during
> this cycle since we do not like the implementation we had in RFC V1.
> Too intrusive and now we also have nicer abstractions from Jesper that
> we can use and extend to provide a (hopefully) much cleaner and less
> intrusive solution.

short answer: both.

Cleanliness and performance of the ZC code is not as important as
getting API right. The main concern that during ZC review process
we will find out that existing API has issues, so we have to
do this exercise before the merge window.
And RFC won't fly. Send the patches for real. They have to go
through the proper code review. The hackers of netdev community
can accept a partial, or a bit unclean, or slightly inefficient
implementation, since it can be and will be improved later,
but API we cannot change once it goes into official release.

Here is the example of API concern:
this patch set added shared umem concept. It sounds good in theory,
but will it perform well with ZC ? Earlier RFCs didn't have that
feature. If it won't perform well than it shouldn't be in the tree.
The key reason to let AF_XDP into the tree is its performance promise.
If it doesn't perform we should rip it out and redesign.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox