Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH][next] can: at91_can: mark expected switch fall-throughs
From: Sergei Shtylyov @ 2019-02-09 19:17 UTC (permalink / raw)
  To: Gustavo A. R. Silva, Wolfgang Grandegger, Marc Kleine-Budde,
	David S. Miller, Nicolas Ferre, Alexandre Belloni,
	Ludovic Desroches
  Cc: linux-can, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <8402d6f6-dff1-eb99-3bd1-051b4f6f24f0@cogentembedded.com>

On 02/08/2019 09:55 PM, Sergei Shtylyov wrote:

>> In preparation to enabling -Wimplicit-fallthrough, mark switch
>> cases where we are expecting to fall through.
>>
>> Notice that, in this particular case, the /* fall through */
>> comments are placed at the bottom of the case statement, which
>> is what GCC is expecting to find.
>>
>> Warning level 3 was used: -Wimplicit-fallthrough=3
>>
>> This patch is part of the ongoing efforts to enabling
>> -Wimplicit-fallthrough.
>>
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>> ---
>>  drivers/net/can/at91_can.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
>> index d98c69045b17..1718c20f9c99 100644
>> --- a/drivers/net/can/at91_can.c
>> +++ b/drivers/net/can/at91_can.c
>> @@ -902,7 +902,8 @@ static void at91_irq_err_state(struct net_device *dev,
>>  				CAN_ERR_CRTL_TX_WARNING :
>>  				CAN_ERR_CRTL_RX_WARNING;
>>  		}
>> -	case CAN_STATE_ERROR_WARNING:	/* fallthrough */
>> +		/* fall through */
> 
>    Why do we need this comment at all? Just remove it, that's all.
> 
>> +	case CAN_STATE_ERROR_WARNING:
>>  		/*
>>  		 * from: ERROR_ACTIVE, ERROR_WARNING
>>  		 * to  : ERROR_PASSIVE, BUS_OFF
>> @@ -951,7 +952,8 @@ static void at91_irq_err_state(struct net_device *dev,
>>  		netdev_dbg(dev, "Error Active\n");
>>  		cf->can_id |= CAN_ERR_PROT;
>>  		cf->data[2] = CAN_ERR_PROT_ACTIVE;
>> -	case CAN_STATE_ERROR_WARNING:	/* fallthrough */
>> +		/* fall through */
> 
>    Again, we don;t need it here.
> 
>> +	case CAN_STATE_ERROR_WARNING:
>>  		reg_idr = AT91_IRQ_ERRA | AT91_IRQ_WARN | AT91_IRQ_BOFF;
>>  		reg_ier = AT91_IRQ_ERRP;
>>  		break;

   Ignore me, I misread the code...

MBR, Sergei

^ permalink raw reply

* Re: Possible bug into DSA2 code.
From: Andrew Lunn @ 2019-02-09 19:34 UTC (permalink / raw)
  To: Rodolfo Giometti
  Cc: Florian Fainelli, Vivien Didelot, David S. Miller, netdev
In-Reply-To: <528f797d-445b-a314-d8ef-db15a3b6a8ce@enneenne.com>

> So we I see two possible solutions:
> 
> 1) having both ds->slave_mii_bus and ds->ops->phy_read already defined is an
> error, then it must be signaled to the calling code, or

I don't think we can do that. mv88e6xxx optionally instantiates the
MDIO busses, depending on what is in device tree. If there is no mdio
property, we need the DSA core to create an MDIO bus.

Looking at the driver, ds->slave_mii_bus is assigned in
mv88e6xxx_setup().

We have talked about adding a teardown() to the ops structure. This
seems like another argument we should do it. The mv88e6xxx_teardown()
can set ds->slave_mii_bus back to NULL, undoing what it did in the
setup code.

      Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Andrew Lunn @ 2019-02-09 19:42 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <1984f4fa-78df-bb51-5021-306ce87c3a80@gmail.com>

> I know, it's patch 15 in your series ;) And I'm aware that usually new
> core functionality is acceptable only if it comes together with a user,
> to avoid having billions of orphaned good ideas in the code.
> I focused on the core here to not get lost in all the new stuff, and to
> provide Maxime with some direction for adjusting his series.

I'm just trying to avoid Maxime reimplementing something when we
already have a patch:

https://github.com/lunn/linux/commit/07e3fa8f183f05a09d969a9378534da35238eeb9

Maxime, feel free to cherry-pick this into your series.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Heiner Kallweit @ 2019-02-09 19:50 UTC (permalink / raw)
  To: Andrew Lunn, Maxime Chevallier
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <20190209194255.GJ30856@lunn.ch>

On 09.02.2019 20:42, Andrew Lunn wrote:
>> I know, it's patch 15 in your series ;) And I'm aware that usually new
>> core functionality is acceptable only if it comes together with a user,
>> to avoid having billions of orphaned good ideas in the code.
>> I focused on the core here to not get lost in all the new stuff, and to
>> provide Maxime with some direction for adjusting his series.
> 
> I'm just trying to avoid Maxime reimplementing something when we
> already have a patch:
> 
Sure, I didn't mean Maxime should re-implement things we have in the pipe.
I meant it more in a way that he gets an idea in which direction we're moving.

> https://github.com/lunn/linux/commit/07e3fa8f183f05a09d969a9378534da35238eeb9
> 
> Maxime, feel free to cherry-pick this into your series.
> 
I'll submit this one too.

> 	Andrew
> 
Heiner

Oh, and I just saw: When we talk about/with somebody, we should add him to the
mail addressee list.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Heiner Kallweit @ 2019-02-09 20:08 UTC (permalink / raw)
  To: Andrew Lunn, Maxime Chevallier
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <addc4332-bf56-dd86-d70f-5001db5192b0@gmail.com>

On 09.02.2019 20:50, Heiner Kallweit wrote:
> On 09.02.2019 20:42, Andrew Lunn wrote:
>>> I know, it's patch 15 in your series ;) And I'm aware that usually new
>>> core functionality is acceptable only if it comes together with a user,
>>> to avoid having billions of orphaned good ideas in the code.
>>> I focused on the core here to not get lost in all the new stuff, and to
>>> provide Maxime with some direction for adjusting his series.
>>
>> I'm just trying to avoid Maxime reimplementing something when we
>> already have a patch:
>>
> Sure, I didn't mean Maxime should re-implement things we have in the pipe.
> I meant it more in a way that he gets an idea in which direction we're moving.
> 
>> https://github.com/lunn/linux/commit/07e3fa8f183f05a09d969a9378534da35238eeb9
>>
>> Maxime, feel free to cherry-pick this into your series.
>>
> I'll submit this one too.
> 
Just saw that this patch depends on patch 5 of Maxime's series. And it needs a
small change because the generic function was renamed to
genphy_c45_pma_read_abilities(). So indeed it may be better if Maxime adds
this patch to his series.

One comment regarding the pause bits in Maxime's patch 5:
If no pause bit is set the core automatically sets both. So we have to do this
neither in the marvell10g driver nor in the generic read-abilities function.

>> 	Andrew
>>
Heiner

> Heiner
> 
> Oh, and I just saw: When we talk about/with somebody, we should add him to the
> mail addressee list.
> 


^ permalink raw reply

* How to set promiscuous mode
From: William Flanagan @ 2019-02-09 20:22 UTC (permalink / raw)
  To: netdev



Hi,

Working with iproute2 for a task with Wireshark.  I don't see the 
command in 'ip' to put an Ethernet port into promiscuous mode.  A reply 
from the openSuse forum (below) tells me how.

I'm wondering if this should be in the 'MAN ip' page.

Regards,

Bill Flanagan


-------- Forwarded Message --------


Dear konsultor,

tsu2 has just replied to a thread you have subscribed to entitled - LEAP 
15 how to set promiscuous mode on eth0? - in the Network/Internet forum 
of openSUSE Forums.

This thread is located at:
https://forums.opensuse.org/showthread.php/534881-how-to-set-promiscuous-mode-on-eth0?goto=newpost

Here is the message that has just been posted:
***************

---Quote (Originally by konsultor)---
Thanks, Tsu. Double 'atta boy" for sure.
What I find odd is that neither 'promisc' nor 'promiscuous' appears in 
the 'MAN ip" document. Is that a reportable bug?
---End Quote---
MAN authors are typically listed in the document, best to write them 
directly.

TSU
***************


There may also be other replies, but you will not receive any more 
notifications until you visit the forum again.

All the best,
openSUSE Forums

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unsubscription information:

To unsubscribe from this thread, please visit this page:
https://forums.opensuse.org/subscription.php?do=removesubscription&type=thread&subscriptionid=189629&auth=887f2c6151997a4244524bdac85ff08b

To unsubscribe from ALL threads, please visit this page:
https://forums.opensuse.org/subscription.php?do=viewsubscription&folderid=all

-- 
Flanagan Consulting

Creative Network Solutions
 From Desktop to Data Center

3800 Concorde Parkway, Suite 1500, Chantilly, Virginia 20151 USA
Ph:  +1.703.242.8381    Fx:  +1.703.242.8391
www.Flanagan-Consulting.com
Flanagan Consulting is a Service Mark of W. A. Flanagan, Inc.

"Beware of false knowledge; it is more dangerous than ignorance."
                                        --George Bernard Shaw



^ permalink raw reply

* [PATCH] net/packet: fix 4gb buffer limit due to overflow check
From: Kal Conley @ 2019-02-09 20:37 UTC (permalink / raw)
  To: davem
  Cc: kal.conley, Willem de Bruijn, Eric Dumazet, Greg Kroah-Hartman,
	Jeff Kirsher, Alexander Duyck, Kirill Tkhai, Vincent Whitchurch,
	Li RongQing, Magnus Karlsson, open list:NETWORKING [GENERAL],
	open list

When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.

This change fixes support for packet ring buffers >= UINT_MAX.
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d0945253f43b..d603a430378e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4291,7 +4291,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 		rb->frames_per_block = req->tp_block_size / req->tp_frame_size;
 		if (unlikely(rb->frames_per_block == 0))
 			goto out;
-		if (unlikely(req->tp_block_size > UINT_MAX / req->tp_block_nr))
+		if (unlikely(rb->frames_per_block > UINT_MAX / req->tp_block_nr))
 			goto out;
 		if (unlikely((rb->frames_per_block * req->tp_block_nr) !=
 					req->tp_frame_nr))
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next 15/16] net: switchdev: Replace port attr set SDO with a notification
From: Jiri Pirko @ 2019-02-09 20:38 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <83d8f16f-a73d-5aa7-4718-fc45e0fed226@gmail.com>

Sat, Feb 09, 2019 at 01:36:18AM CET, f.fainelli@gmail.com wrote:
>On 2/8/19 4:32 PM, Florian Fainelli wrote:
>> Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
>> from all clients, which were migrated to use switchdev notification in
>> the previous patches.
>> 
>> Add a new function switchdev_port_attr_notify() that sends the switchdev
>> notifications SWITCHDEV_PORT_ATTR_SET.
>> 
>> Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
>> likewise.
>> 
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  include/net/switchdev.h   | 18 --------
>>  net/switchdev/switchdev.c | 92 ++++++++++-----------------------------
>>  2 files changed, 22 insertions(+), 88 deletions(-)
>> 
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index 4c5f7e5430cf..5387ff6f41c5 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -111,21 +111,6 @@ void *switchdev_trans_item_dequeue(struct switchdev_trans *trans);
>>  
>>  typedef int switchdev_obj_dump_cb_t(struct switchdev_obj *obj);
>>  
>> -/**
>> - * struct switchdev_ops - switchdev operations
>> - *
>> - * @switchdev_port_attr_get: Get a port attribute (see switchdev_attr).
>> - *
>> - * @switchdev_port_attr_set: Set a port attribute (see switchdev_attr).
>> - */
>> -struct switchdev_ops {
>> -	int	(*switchdev_port_attr_get)(struct net_device *dev,
>> -					   struct switchdev_attr *attr);
>> -	int	(*switchdev_port_attr_set)(struct net_device *dev,
>> -					   const struct switchdev_attr *attr,
>> -					   struct switchdev_trans *trans);
>> -};
>> -
>
>This and the hunk below bisection, I will move that into patch #16 after
>receiving feedback on this.

Looks good. Thanks!

^ permalink raw reply

* [PATCH net-next v2] add snmp counter document
From: yupeng @ 2019-02-09 22:46 UTC (permalink / raw)
  To: netdev, davem; +Cc: xiyou.wangcong, rdunlap

add document for tcp retransmission, tcp fast open, syn cookies,
challenge ack, prune and several general counters

Signed-off-by: yupeng <yupeng0921@gmail.com>
---
 Documentation/networking/snmp_counter.rst | 184 +++++++++++++++++++++++++++++-
 1 file changed, 181 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index c5642f43..52b026b 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -367,16 +367,19 @@ to the accept queue.
 TCP Fast Open
 =============
 * TcpEstabResets
+
 Defined in `RFC1213 tcpEstabResets`_.
 
 .. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
 
 * TcpAttemptFails
+
 Defined in `RFC1213 tcpAttemptFails`_.
 
 .. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
 
 * TcpOutRsts
+
 Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
 the 'segments sent containing the RST flag', but in linux kernel, this
 couner indicates the segments kerenl tried to send. The sending
@@ -384,6 +387,30 @@ process might be failed due to some errors (e.g. memory alloc failed).
 
 .. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
 
+* TcpExtTCPSpuriousRtxHostQueues
+
+When the TCP stack wants to retransmit a packet, and finds that packet
+is not lost in the network, but the packet is not sent yet, the TCP
+stack would give up the retransmission and update this counter. It
+might happen if a packet stays too long time in a qdisc or driver
+queue.
+
+* TcpEstabResets
+
+The socket receives a RST packet in Establish or CloseWait state.
+
+* TcpExtTCPKeepAlive
+
+This counter indicates many keepalive packets were sent. The keepalive
+won't be enabled by default. A userspace program could enable it by
+setting the SO_KEEPALIVE socket option.
+
+* TcpExtTCPSpuriousRTOs
+
+The spurious retransmission timeout detected by the `F-RTO`_
+algorithm.
+
+.. _F-RTO: https://tools.ietf.org/html/rfc5682
 
 TCP Fast Path
 ============
@@ -609,6 +636,29 @@ packet yet, the sender would know packet 4 is out of order. The TCP
 stack of kernel will increase TcpExtTCPSACKReorder for both of the
 above scenarios.
 
+* TcpExtTCPSlowStartRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is 'Loss'.
+
+* TcpExtTCPFastRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is not 'Loss'.
+
+* TcpExtTCPLostRetransmit
+
+A SACK points out that a retransmission packet is lost again.
+
+* TcpExtTCPRetransFail
+
+The TCP stack tries to deliver a retransmission packet to lower layers
+but the lower layers return an error.
+
+* TcpExtTCPSynRetrans
+
+The TCP stack retransmits a SYN packet.
+
 DSACK
 =====
 The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
@@ -790,8 +840,9 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
 .. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
 
 TCP receive window
-=================
+==================
 * TcpExtTCPWantZeroWindowAdv
+
 Depending on current memory usage, the TCP stack tries to set receive
 window to zero. But the receive window might still be a no-zero
 value. For example, if the previous window size is 10, and the TCP
@@ -799,14 +850,16 @@ stack receives 3 bytes, the current window size would be 7 even if the
 window size calculated by the memory usage is zero.
 
 * TcpExtTCPToZeroWindowAdv
+
 The TCP receive window is set to zero from a no-zero value.
 
 * TcpExtTCPFromZeroWindowAdv
+
 The TCP receive window is set to no-zero value from zero.
 
 
 Delayed ACK
-==========
+===========
 The TCP Delayed ACK is a technique which is used for reducing the
 packet count in the network. For more details, please refer the
 `Delayed ACK wiki`_
@@ -814,10 +867,12 @@ packet count in the network. For more details, please refer the
 .. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
 
 * TcpExtDelayedACKs
+
 A delayed ACK timer expires. The TCP stack will send a pure ACK packet
 and exit the delayed ACK mode.
 
 * TcpExtDelayedACKLocked
+
 A delayed ACK timer expires, but the TCP stack can't send an ACK
 immediately due to the socket is locked by a userspace program. The
 TCP stack will send a pure ACK later (after the userspace program
@@ -826,24 +881,147 @@ TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
 mode.
 
 * TcpExtDelayedACKLost
+
 It will be updated when the TCP stack receives a packet which has been
 ACKed. A Delayed ACK loss might cause this issue, but it would also be
 triggered by other reasons, such as a packet is duplicated in the
 network.
 
 Tail Loss Probe (TLP)
-===================
+=====================
 TLP is an algorithm which is used to detect TCP packet loss. For more
 details, please refer the `TLP paper`_.
 
 .. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
 
 * TcpExtTCPLossProbes
+
 A TLP probe packet is sent.
 
 * TcpExtTCPLossProbeRecovery
+
 A packet loss is detected and recovered by TLP.
 
+TCP Fast Open
+=============
+TCP Fast Open is a technology which allows data transfer before the
+3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
+general description.
+
+.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open
+
+* TcpExtTCPFastOpenActive
+
+When the TCP stack receives an ACK packet in the SYN-SENT status, and
+the ACK packet acknowledges the data in the SYN packet, the TCP stack
+understand the TFO cookie is accepted by the other side, then it
+updates this counter.
+
+* TcpExtTCPFastOpenActiveFail
+
+This counter indicates that the TCP stack initiated a TCP Fast Open,
+but it failed. This counter would be updated in three scenarios: (1)
+the other side doesn't acknowledge the data in the SYN packet. (2) The
+SYN packet which has the TFO cookie is timeout at least once. (3)
+after the 3-way handshake, the retransmission timeout happens
+net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
+fast open after the handshake.
+
+* TcpExtTCPFastOpenPassive
+
+This counter indicates how many times the TCP stack accepts the fast
+open request.
+
+* TcpExtTCPFastOpenPassiveFail
+
+This counter indicates how many times the TCP stack rejects the fast
+open request. It is caused by either the TFO cookie is invalid or the
+TCP stack finds an error during the socket creating process.
+
+* TcpExtTCPFastOpenListenOverflow
+
+When the pending fast open request number is larger than
+fastopenq->max_qlen, the TCP stack will reject the fast open request
+and update this counter. When this counter is updated, the TCP stack
+won't update TcpExtTCPFastOpenPassive or
+TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
+TCP_FASTOPEN socket operation and it could not be larger than
+net.core.somaxconn. For example:
+
+setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
+
+* TcpExtTCPFastOpenCookieReqd
+
+This counter indicates how many times a client wants to request a TFO
+cookie.
+
+SYN cookies
+===========
+SYN cookies are used to mitigate SYN flood, for details, please refer
+the `SYN cookies wiki`_.
+
+.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies
+
+* TcpExtSyncookiesSent
+
+It indicates how many SYN cookies are sent.
+
+* TcpExtSyncookiesRecv
+
+How many reply packets of the SYN cookies the TCP stack receives.
+
+* TcpExtSyncookiesFailed
+
+The MSS decoded from the SYN cookie is invalid. When this counter is
+updated, the received packet won't be treated as a SYN cookie and the
+TcpExtSyncookiesRecv counter wont be updated.
+
+Challenge ACK
+=============
+For details of challenge ACK, please refer the explaination of
+TcpExtTCPACKSkippedChallenge.
+
+* TcpExtTCPChallengeACK
+
+The number of challenge acks sent.
+
+* TcpExtTCPSYNChallenge
+
+The number of challenge acks sent in response to SYN packets. After
+updates this counter, the TCP stack might send a challenge ACK and
+update the TcpExtTCPChallengeACK counter, or it might also skip to
+send the challenge and update the TcpExtTCPACKSkippedChallenge.
+
+prune
+=====
+When a socket is under memory pressure, the TCP stack will try to
+reclaim memory from the receiving queue and out of order queue. One of
+the reclaiming method is 'collapse', which means allocate a big sbk,
+copy the contiguous skbs to the single big skb, and free these
+contiguous skbs.
+
+* TcpExtPruneCalled
+
+The TCP stack tries to reclaim memory for a socket. After updates this
+counter, the TCP stack will try to collapse the out of order queue and
+the receiving queue. If the memory is still not enough, the TCP stack
+will try to discard packets from the out of order queue (and update the
+TcpExtOfoPruned counter)
+
+* TcpExtOfoPruned
+
+The TCP stack tries to discard packet on the out of order queue.
+
+* TcpExtRcvPruned
+
+After 'collapse' and discard packets from the out of order queue, if
+the actually used memory is still larger than the max allowed memory,
+this counter will be updated. It means the 'prune' fails.
+
+* TcpExtTCPRcvCollapsed
+
+This counter indicates how many skbs are freed during 'collapse'.
+
 examples
 ========
 
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: David Miller @ 2019-02-09 23:12 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: julien, netdev, linux-sctp, linux-kernel, nhorman, vyasevich,
	lucien.xin
In-Reply-To: <20190206203754.GC13621@localhost.localdomain>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 6 Feb 2019 18:37:54 -0200

> On Wed, Feb 06, 2019 at 12:14:30PM -0800, Julien Gomes wrote:
>> Make sctp_setsockopt_events() able to accept sctp_event_subscribe
>> structures longer than the current definitions.
>> 
>> This should prevent unjustified setsockopt() failures due to struct
>> sctp_event_subscribe extensions (as in 4.11 and 4.12) when using
>> binaries that should be compatible, but were built with later kernel
>> uapi headers.
> 
> Not sure if we support backwards compatibility like this?

What a complete mess we have here.

Use new socket option numbers next time, do not change the size and/or
layout of existing socket options.

This whole thread, if you read it, is basically "if we compatability
this way, that breaks, and if we do compatability this other way oh
shit this other thing doesn't work."

I think we really need to specifically check for the difference sizes
that existed one by one, clear out the part not given by the user, and
backport this as far back as possible in a way that in the older kernels
we see if the user is actually trying to use the new features and if so
error out.

Which, btw, is terrible behavior.  Newly compiled apps should work on
older kernels if they don't try to use the new features, and if they
can the ones that want to try to use the new features should be able
to fall back when that feature isn't available in a non-ambiguous
and precisely defined way.

The fact that the use of the new feature is hidden in the new
structure elements is really rotten.

This patch, at best, needs some work and definitely a longer and more
detailed commit message.

^ permalink raw reply

* Re: [PATCH bpf] bpf: Fix narrow load on a bpf_sock returned from sk_lookup()
From: Joe Stringer @ 2019-02-10  1:47 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Joe Stringer
In-Reply-To: <20190209062554.142612-1-kafai@fb.com>

On Fri, 8 Feb 2019 at 22:27, Martin KaFai Lau <kafai@fb.com> wrote:
>
> By adding this test to test_verifier:
> {
>         "reference tracking: access sk->src_ip4 (narrow load)",
>         .insns = {
>         BPF_SK_LOOKUP,
>         BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
>         BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
>         BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2),
>         BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
>         BPF_EMIT_CALL(BPF_FUNC_sk_release),
>         BPF_EXIT_INSN(),
>         },
>         .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>         .result = ACCEPT,
> },
>
> The above test loads 2 bytes from sk->src_ip4 where
> sk is obtained by bpf_sk_lookup_tcp().
>
> It hits an internal verifier error from convert_ctx_accesses():
> [root@arch-fb-vm1 bpf]# ./test_verifier 665 665
> Failed to load prog 'Invalid argument'!
> 0: (b7) r2 = 0
> 1: (63) *(u32 *)(r10 -8) = r2
> 2: (7b) *(u64 *)(r10 -16) = r2
> 3: (7b) *(u64 *)(r10 -24) = r2
> 4: (7b) *(u64 *)(r10 -32) = r2
> 5: (7b) *(u64 *)(r10 -40) = r2
> 6: (7b) *(u64 *)(r10 -48) = r2
> 7: (bf) r2 = r10
> 8: (07) r2 += -48
> 9: (b7) r3 = 36
> 10: (b7) r4 = 0
> 11: (b7) r5 = 0
> 12: (85) call bpf_sk_lookup_tcp#84
> 13: (bf) r6 = r0
> 14: (15) if r0 == 0x0 goto pc+3
>  R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1
> 15: (69) r2 = *(u16 *)(r0 +26)
> 16: (bf) r1 = r6
> 17: (85) call bpf_sk_release#86
> 18: (95) exit
>
> from 14 to 18: safe
> processed 20 insns (limit 131072), stack depth 48
> bpf verifier is misconfigured
> Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
>
> The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly
> loaded (meaning load any 1 or 2 bytes of the src_ip4) by
> marking info->ctx_field_size.  However, this marked
> ctx_field_size is not used.  This patch fixes it.
>
> Due to the recent refactoring in test_verifier,
> this new test will be added to the bpf-next branch
> (together with the bpf_tcp_sock patchset)
> to avoid merge conflict.
>
> Fixes: c64b7983288e ("bpf: Add PTR_TO_SOCKET verifier type")
> Cc: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Nice find, thanks.

Acked-by: Joe Stringer <joe@wand.net.nz>

^ permalink raw reply

* Re: [PATCH net-next v2] add snmp counter document
From: David Miller @ 2019-02-10  3:00 UTC (permalink / raw)
  To: yupeng0921; +Cc: netdev, xiyou.wangcong, rdunlap
In-Reply-To: <20190209224618.6127-1-yupeng0921@gmail.com>

From: yupeng <yupeng0921@gmail.com>
Date: Sat,  9 Feb 2019 14:46:18 -0800

> add document for tcp retransmission, tcp fast open, syn cookies,
> challenge ack, prune and several general counters
> 
> Signed-off-by: yupeng <yupeng0921@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net/packet: fix 4gb buffer limit due to overflow check
From: David Miller @ 2019-02-10  3:01 UTC (permalink / raw)
  To: kal.conley
  Cc: willemb, edumazet, gregkh, jeffrey.t.kirsher, alexander.h.duyck,
	ktkhai, vincent.whitchurch, lirongqing, magnus.karlsson, netdev,
	linux-kernel
In-Reply-To: <20190209203701.27252-1-kal.conley@dectris.com>

From: Kal Conley <kal.conley@dectris.com>
Date: Sat,  9 Feb 2019 21:37:00 +0100

> When calculating rb->frames_per_block * req->tp_block_nr the result
> can overflow. Check it for overflow without limiting the total buffer
> size to UINT_MAX.
> 
> This change fixes support for packet ring buffers >= UINT_MAX.

Please resubmit with a proper signoff and also an appropriate Fixes:
tag.

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mv88e6xxx: SERDES support 2500BaseT via external PHY
From: David Miller @ 2019-02-10  3:02 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, vivien.didelot, netdev
In-Reply-To: <aed891fb-8568-2291-6050-18b6e2fd12e8@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 8 Feb 2019 22:25:44 +0100

> From: Andrew Lunn <andrew@lunn.ch>
> By using an external PHY, ports 9 and 10 can support 2500BaseT.
> So set this link mode in the mask when validating.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: aquantia: add support for AQCS109
From: David Miller @ 2019-02-10  3:03 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev, nikita.yoush
In-Reply-To: <23f602a2-1b35-a504-a634-2b352d8fad89@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 8 Feb 2019 20:56:55 +0100

> Add support for the AQCS109. From software point of view,
> it should be almost equivalent to AQR107.
> 
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH bpf] bpf: Fix narrow load on a bpf_sock returned from sk_lookup()
From: Alexei Starovoitov @ 2019-02-10  4:02 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Joe Stringer
In-Reply-To: <20190209062554.142612-1-kafai@fb.com>

On Fri, Feb 08, 2019 at 10:25:54PM -0800, Martin KaFai Lau wrote:
> By adding this test to test_verifier:
> {
> 	"reference tracking: access sk->src_ip4 (narrow load)",
> 	.insns = {
> 	BPF_SK_LOOKUP,
> 	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
> 	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
> 	BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2),
> 	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> 	BPF_EMIT_CALL(BPF_FUNC_sk_release),
> 	BPF_EXIT_INSN(),
> 	},
> 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> 	.result = ACCEPT,
> },
> 
> The above test loads 2 bytes from sk->src_ip4 where
> sk is obtained by bpf_sk_lookup_tcp().
> 
> It hits an internal verifier error from convert_ctx_accesses():
> [root@arch-fb-vm1 bpf]# ./test_verifier 665 665
> Failed to load prog 'Invalid argument'!
> 0: (b7) r2 = 0
> 1: (63) *(u32 *)(r10 -8) = r2
> 2: (7b) *(u64 *)(r10 -16) = r2
> 3: (7b) *(u64 *)(r10 -24) = r2
> 4: (7b) *(u64 *)(r10 -32) = r2
> 5: (7b) *(u64 *)(r10 -40) = r2
> 6: (7b) *(u64 *)(r10 -48) = r2
> 7: (bf) r2 = r10
> 8: (07) r2 += -48
> 9: (b7) r3 = 36
> 10: (b7) r4 = 0
> 11: (b7) r5 = 0
> 12: (85) call bpf_sk_lookup_tcp#84
> 13: (bf) r6 = r0
> 14: (15) if r0 == 0x0 goto pc+3
>  R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1
> 15: (69) r2 = *(u16 *)(r0 +26)
> 16: (bf) r1 = r6
> 17: (85) call bpf_sk_release#86
> 18: (95) exit
> 
> from 14 to 18: safe
> processed 20 insns (limit 131072), stack depth 48
> bpf verifier is misconfigured
> Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
> 
> The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly
> loaded (meaning load any 1 or 2 bytes of the src_ip4) by
> marking info->ctx_field_size.  However, this marked
> ctx_field_size is not used.  This patch fixes it.
> 
> Due to the recent refactoring in test_verifier,
> this new test will be added to the bpf-next branch
> (together with the bpf_tcp_sock patchset)
> to avoid merge conflict.
> 
> Fixes: c64b7983288e ("bpf: Add PTR_TO_SOCKET verifier type")
> Cc: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Applied to bpf tree.

Martin, if your is_fullsock work depends on it, I can apply the fix
to bpf-next as well. Just let me know.


^ permalink raw reply

* Re: [PATCH net-next 10/16] rocker: Handle SWITCHDEV_PORT_ATTR_SET
From: Florian Fainelli @ 2019-02-10  4:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190209182147.GQ2353@nanopsycho>

Le 2/9/19 à 10:21 AM, Jiri Pirko a écrit :
> Sat, Feb 09, 2019 at 01:32:42AM CET, f.fainelli@gmail.com wrote:
>> Following patches will change the way we communicate getting or setting
> 
> Just "setting", no "getting".
> 
> 
>> a port's attribute and use a blocking notifier to perform those tasks.
>>
>> Prepare rocker to support receiving notifier events targeting
>> SWITCHDEV_PORT_ATTR_SET and simply translate that into the existing
>> rocker_port_attr_set call.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> drivers/net/ethernet/rocker/rocker_main.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
>> index ff3f14504f4f..f10e4888ecff 100644
>> --- a/drivers/net/ethernet/rocker/rocker_main.c
>> +++ b/drivers/net/ethernet/rocker/rocker_main.c
>> @@ -2811,6 +2811,24 @@ rocker_switchdev_port_obj_event(unsigned long event, struct net_device *netdev,
>> 	return notifier_from_errno(err);
>> }
>>
>> +static int
>> +rocker_switchdev_port_attr_event(unsigned long event, struct net_device *netdev,
>> +				 struct switchdev_notifier_port_attr_info
>> +				 *port_attr_info)
>> +{
>> +	int err = -EOPNOTSUPP;
>> +
>> +	switch (event) {
>> +	case SWITCHDEV_PORT_ATTR_SET:
> 
> Do you expect some other event to be handled in
> rocker_switchdev_port_attr_event()? Because you have SWITCHDEV_PORT_ATTR_SET
> selected in case here and in rocker_switchdev_blocking_event.
> Perhaps you can rename rocker_switchdev_port_attr_event() to
> rocker_switchdev_port_attr_set_event() and avoid this switchcase.

That's a good point, I have PORT_ATTR_{GET_SET} initially which was the
reason for the switch/case statement, will change it according to your
suggestion. Thanks!
-- 
Florian

^ permalink raw reply

* [RFC] apparently bogus logics in unix_find_other() since 2002
From: Al Viro @ 2019-02-10  4:24 UTC (permalink / raw)
  To: netdev; +Cc: Solar Designer, David Miller

	In "net/unix/af_unix.c: Set ATIME on socket inode" (back in
2002) we'd grown something rather odd in unix_find_other().  In the
original patch it was
                u=unix_find_socket_byname(sunname, len, type, hash);
-               if (!u)
+               if (u) {
+                       struct dentry *dentry;
+                       dentry = u->protinfo.af_unix.dentry;
+                       if (dentry)
+                               UPDATE_ATIME(dentry->d_inode);
+               } else
                        goto fail;

These days the code is

                u = unix_find_socket_byname(net, sunname, len, type, hash);
                if (u) {
                        struct dentry *dentry;
                        dentry = unix_sk(u)->path.dentry;
                        if (dentry)
                                touch_atime(&unix_sk(u)->path);
                } else  
                        goto fail;

but the logics is the same.  It's the abstract address case - we have
'\0' in sunname->sun_path[0].  How in hell could that possibly have
non-NULL ->path.dentry and what would it be?

Note that unix_find_socket_byname() returns non-NULL u here only if
u->addr->name->sun_path[0] is equal to sunname->sun_path[0], i.e.
'\0'.  There are only two places where we ever might assign non-NULL
to ->path.dentry:
        if (sun_path[0]) {
                addr->hash = UNIX_HASH_SIZE;
                hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
                spin_lock(&unix_table_lock);
                u->path = path;
                list = &unix_socket_table[hash];
        } else {
in unix_bind() (in which case u->addr->name->sun_path[0] will not be '\0') and
        /* copy address information from listening to new sock*/
        if (otheru->addr) {
                refcount_inc(&otheru->addr->refcnt);
                newu->addr = otheru->addr;
        }
        if (otheru->path.dentry) {
                path_get(&otheru->path);
                newu->path = otheru->path;
        }
in unix_stream_connect().  And once ->addr is non-NULL, it's never changed,
and ->addr->name contents is never modified afterwards.  So we would have
to had the same condition (non-NULL ->path.dentry with '\0' in
->addr->name->sun_path[0]) at earlier point on the listener socket.

Looks like that should be impossible; what am I missing here?  Incidentally,
how can the quoted fragment in in unix_stream_connect() be reached with NULL
otheru->addr?  After all, otheru is unix_sock of a listener; how could
we possibly have found it if it had NULL ->addr?

Confused...

^ permalink raw reply

* Re: [PATCH bpf] bpf: Fix narrow load on a bpf_sock returned from sk_lookup()
From: Martin Lau @ 2019-02-10  7:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev@vger.kernel.org, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Joe Stringer
In-Reply-To: <20190210040241.wtsldfavw2vk3afv@ast-mbp>

On Sat, Feb 09, 2019 at 08:02:43PM -0800, Alexei Starovoitov wrote:
> On Fri, Feb 08, 2019 at 10:25:54PM -0800, Martin KaFai Lau wrote:
> > By adding this test to test_verifier:
> > {
> > 	"reference tracking: access sk->src_ip4 (narrow load)",
> > 	.insns = {
> > 	BPF_SK_LOOKUP,
> > 	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
> > 	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
> > 	BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2),
> > 	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> > 	BPF_EMIT_CALL(BPF_FUNC_sk_release),
> > 	BPF_EXIT_INSN(),
> > 	},
> > 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> > 	.result = ACCEPT,
> > },
> > 
> > The above test loads 2 bytes from sk->src_ip4 where
> > sk is obtained by bpf_sk_lookup_tcp().
> > 
> > It hits an internal verifier error from convert_ctx_accesses():
> > [root@arch-fb-vm1 bpf]# ./test_verifier 665 665
> > Failed to load prog 'Invalid argument'!
> > 0: (b7) r2 = 0
> > 1: (63) *(u32 *)(r10 -8) = r2
> > 2: (7b) *(u64 *)(r10 -16) = r2
> > 3: (7b) *(u64 *)(r10 -24) = r2
> > 4: (7b) *(u64 *)(r10 -32) = r2
> > 5: (7b) *(u64 *)(r10 -40) = r2
> > 6: (7b) *(u64 *)(r10 -48) = r2
> > 7: (bf) r2 = r10
> > 8: (07) r2 += -48
> > 9: (b7) r3 = 36
> > 10: (b7) r4 = 0
> > 11: (b7) r5 = 0
> > 12: (85) call bpf_sk_lookup_tcp#84
> > 13: (bf) r6 = r0
> > 14: (15) if r0 == 0x0 goto pc+3
> >  R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1
> > 15: (69) r2 = *(u16 *)(r0 +26)
> > 16: (bf) r1 = r6
> > 17: (85) call bpf_sk_release#86
> > 18: (95) exit
> > 
> > from 14 to 18: safe
> > processed 20 insns (limit 131072), stack depth 48
> > bpf verifier is misconfigured
> > Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
> > 
> > The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly
> > loaded (meaning load any 1 or 2 bytes of the src_ip4) by
> > marking info->ctx_field_size.  However, this marked
> > ctx_field_size is not used.  This patch fixes it.
> > 
> > Due to the recent refactoring in test_verifier,
> > this new test will be added to the bpf-next branch
> > (together with the bpf_tcp_sock patchset)
> > to avoid merge conflict.
> > 
> > Fixes: c64b7983288e ("bpf: Add PTR_TO_SOCKET verifier type")
> > Cc: Joe Stringer <joe@wand.net.nz>
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> 
> Applied to bpf tree.
Thanks!

> 
> Martin, if your is_fullsock work depends on it, I can apply the fix
> to bpf-next as well. Just let me know.
Yes, the is_fullsock work depends on it.
I should have mentioned it in this commit log.

^ permalink raw reply

* [PATCH v2 bpf-next 0/7] Add __sk_buff->sk, bpf_tcp_sock, BPF_FUNC_tcp_sock and BPF_FUNC_sk_fullsock
From: Martin KaFai Lau @ 2019-02-10  7:22 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo

This series adds __sk_buff->sk, "struct bpf_tcp_sock",
BPF_FUNC_sk_fullsock and BPF_FUNC_tcp_sock.  Together, they provide
a common way to expose the members of "struct tcp_sock" and
"struct bpf_sock" for the bpf_prog to access.

The patch series first adds a bpf_sock pointer to __sk_buff
and a new helper BPF_FUNC_sk_fullsock.

It then adds BPF_FUNC_tcp_sock to get a bpf_tcp_sock
pointer from a bpf_sock pointer.

The current use case is to allow a cg_skb_bpf_prog to provide
per cgroup traffic policing/shaping.

Please see individual patch for details.

v2:
- Patch 1 depends on
  commit d623876646be ("bpf: Fix narrow load on a bpf_sock returned from sk_lookup()")
  in the bpf branch.
- Add sk_to_full_sk() to bpf_sk_fullsock() and bpf_tcp_sock()
  such that there is a way to access the listener's sk and tcp_sk
  when __sk_buff->sk is a request_sock.
  The comments in the uapi bpf.h is updated accordingly.
- bpf_ctx_range_till() is used in bpf_sock_common_is_valid_access()
  in patch 1.  Saved a few lines.
- Patch 2 is new in v2 and it adds "state", "dst_ip4", "dst_ip6" and
  "dst_port" to the bpf_sock.  Narrow load is allowed on them.
  The "state" (i.e. sk_state) has already been used in
  INET_DIAG (e.g. ss -t) and getsockopt(TCP_INFO).
- While at it in the new patch 2, also allow narrow load on some
  existing fields of the bpf_sock, which are "family", "type", "protocol"
  and "src_port".  Only allow loading from first byte for now.
  i.e. does not allow narrow load starting from the 2nd byte.
- Add some narrow load tests to the test_verifier's sock.c

Martin KaFai Lau (7):
  bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
  bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock
  bpf: Refactor sock_ops_convert_ctx_access
  bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock
  bpf: Sync bpf.h to tools/
  bpf: Add skb->sk, bpf_sk_fullsock and bpf_tcp_sock tests to
    test_verifer
  bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock

 include/linux/bpf.h                           |  42 ++
 include/uapi/linux/bpf.h                      |  72 ++-
 kernel/bpf/verifier.c                         | 159 ++++--
 net/core/filter.c                             | 495 +++++++++++-------
 tools/include/uapi/linux/bpf.h                |  72 ++-
 tools/testing/selftests/bpf/Makefile          |   6 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   4 +
 tools/testing/selftests/bpf/bpf_util.h        |   9 +
 .../testing/selftests/bpf/test_sock_fields.c  | 327 ++++++++++++
 .../selftests/bpf/test_sock_fields_kern.c     | 152 ++++++
 .../selftests/bpf/verifier/ref_tracking.c     |   4 +-
 tools/testing/selftests/bpf/verifier/sock.c   | 384 ++++++++++++++
 tools/testing/selftests/bpf/verifier/unpriv.c |   2 +-
 13 files changed, 1493 insertions(+), 235 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_sock_fields.c
 create mode 100644 tools/testing/selftests/bpf/test_sock_fields_kern.c
 create mode 100644 tools/testing/selftests/bpf/verifier/sock.c

-- 
2.17.1


^ permalink raw reply

* [PATCH v2 bpf-next 1/7] bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
From: Martin KaFai Lau @ 2019-02-10  7:22 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
In-Reply-To: <20190210072220.1530061-1-kafai@fb.com>

In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock.  For example, in __netdev_pick_tx:

static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
			    struct net_device *sb_dev)
{
	/* ... */

	struct sock *sk = skb->sk;

		if (queue_index != new_index && sk &&
		    sk_fullsock(sk) &&
		    rcu_access_pointer(sk->sk_dst_cache))
			sk_tx_queue_set(sk, new_index);

	/* ... */

	return queue_index;
}

This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().

"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fileds in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer.  It is limited
by the new "bpf_sock_common_is_valid_access()".
e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
     are not allowed.

The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).

int cg_skb_foo(struct __sk_buff *skb) {
	struct bpf_sock *sk;

	sk = skb->sk;
	if (!sk)
		return 1;

	sk = bpf_sk_fullsock(sk);
	if (!sk)
		return 1;

	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
		return 1;

	/* some_traffic_shaping(); */

	return 1;
}

(1) The sk is read only

(2) There is no new "struct bpf_sock_common" introduced.

(3) Future kernel sock's members could be added to bpf_sock only
    instead of repeatedly adding at multiple places like currently
    in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.

(4) After "sk = skb->sk", the reg holding sk is in type
    PTR_TO_SOCK_COMMON_OR_NULL.

(5) After bpf_sk_fullsock(), the return type will be in type
    PTR_TO_SOCKET_OR_NULL which is the same as the return type of
    bpf_sk_lookup_xxx().

    However, bpf_sk_fullsock() does not take refcnt.  The
    acquire_reference_state() is only depending on the return type now.
    To avoid it, a new is_acquire_function() is checked before calling
    acquire_reference_state().

(6) The WARN_ON in "release_reference_state()" is no longer an
    internal verifier bug.

    When reg->id is not found in state->refs[], it means the
    bpf_prog does something wrong like
    "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
    never been acquired by calling "bpf_sk_fullsock(skb->sk)".

    A -EINVAL and a verbose are done instead of WARN_ON.  A test is
    added to the test_verifier in a later patch.

    Since the WARN_ON in "release_reference_state()" is no longer
    needed, "__release_reference_state()" is folded into
    "release_reference_state()" also.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/bpf.h      |  12 ++++
 include/uapi/linux/bpf.h |  12 +++-
 kernel/bpf/verifier.c    | 132 +++++++++++++++++++++++++++------------
 net/core/filter.c        |  42 +++++++++++++
 4 files changed, 157 insertions(+), 41 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bd169a7bcc93..a60463b45b54 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -194,6 +194,7 @@ enum bpf_arg_type {
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
 	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock */
 	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
+	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
 };
 
 /* type of values returned from helper functions */
@@ -256,6 +257,8 @@ enum bpf_reg_type {
 	PTR_TO_FLOW_KEYS,	 /* reg points to bpf_flow_keys */
 	PTR_TO_SOCKET,		 /* reg points to struct bpf_sock */
 	PTR_TO_SOCKET_OR_NULL,	 /* reg points to struct bpf_sock or NULL */
+	PTR_TO_SOCK_COMMON,	 /* reg points to sock_common */
+	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -920,6 +923,9 @@ void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
 #if defined(CONFIG_NET)
+bool bpf_sock_common_is_valid_access(int off, int size,
+				     enum bpf_access_type type,
+				     struct bpf_insn_access_aux *info);
 bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 			      struct bpf_insn_access_aux *info);
 u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
@@ -928,6 +934,12 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 				struct bpf_prog *prog,
 				u32 *target_size);
 #else
+static inline bool bpf_sock_common_is_valid_access(int off, int size,
+						   enum bpf_access_type type,
+						   struct bpf_insn_access_aux *info)
+{
+	return false;
+}
 static inline bool bpf_sock_is_valid_access(int off, int size,
 					    enum bpf_access_type type,
 					    struct bpf_insn_access_aux *info)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1777fa0c61e4..5d79cba74ddc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2329,6 +2329,14 @@ union bpf_attr {
  *		"**y**".
  *	Return
  *		0
+ *
+ * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
+ *	Description
+ *		This helper gets a **struct bpf_sock** pointer such
+ *		that all the fields in bpf_sock can be accessed.
+ *	Return
+ *		A **struct bpf_sock** pointer on success, or NULL in
+ *		case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2425,7 +2433,8 @@ union bpf_attr {
 	FN(msg_pop_data),		\
 	FN(rc_pointer_rel),		\
 	FN(spin_lock),			\
-	FN(spin_unlock),
+	FN(spin_unlock),		\
+	FN(sk_fullsock),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2545,6 +2554,7 @@ struct __sk_buff {
 	__u64 tstamp;
 	__u32 wire_len;
 	__u32 gso_segs;
+	__bpf_md_ptr(struct bpf_sock *, sk);
 };
 
 struct bpf_tunnel_key {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 516dfc6d78de..b755d55a3791 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -331,10 +331,17 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
 	       type == PTR_TO_PACKET_META;
 }
 
+static bool type_is_sk_pointer(enum bpf_reg_type type)
+{
+	return type == PTR_TO_SOCKET ||
+		type == PTR_TO_SOCK_COMMON;
+}
+
 static bool reg_type_may_be_null(enum bpf_reg_type type)
 {
 	return type == PTR_TO_MAP_VALUE_OR_NULL ||
-	       type == PTR_TO_SOCKET_OR_NULL;
+	       type == PTR_TO_SOCKET_OR_NULL ||
+	       type == PTR_TO_SOCK_COMMON_OR_NULL;
 }
 
 static bool type_is_refcounted(enum bpf_reg_type type)
@@ -377,6 +384,12 @@ static bool is_release_function(enum bpf_func_id func_id)
 	return func_id == BPF_FUNC_sk_release;
 }
 
+static bool is_acquire_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_sk_lookup_tcp ||
+		func_id == BPF_FUNC_sk_lookup_udp;
+}
+
 /* string representation of 'enum bpf_reg_type' */
 static const char * const reg_type_str[] = {
 	[NOT_INIT]		= "?",
@@ -392,6 +405,8 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_FLOW_KEYS]	= "flow_keys",
 	[PTR_TO_SOCKET]		= "sock",
 	[PTR_TO_SOCKET_OR_NULL] = "sock_or_null",
+	[PTR_TO_SOCK_COMMON]	= "sock_common",
+	[PTR_TO_SOCK_COMMON_OR_NULL] = "sock_common_or_null",
 };
 
 static char slot_type_char[] = {
@@ -618,13 +633,10 @@ static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
 }
 
 /* release function corresponding to acquire_reference_state(). Idempotent. */
-static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
+static int release_reference_state(struct bpf_func_state *state, int ptr_id)
 {
 	int i, last_idx;
 
-	if (!ptr_id)
-		return -EFAULT;
-
 	last_idx = state->acquired_refs - 1;
 	for (i = 0; i < state->acquired_refs; i++) {
 		if (state->refs[i].id == ptr_id) {
@@ -636,21 +648,7 @@ static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
 			return 0;
 		}
 	}
-	return -EFAULT;
-}
-
-/* variation on the above for cases where we expect that there must be an
- * outstanding reference for the specified ptr_id.
- */
-static int release_reference_state(struct bpf_verifier_env *env, int ptr_id)
-{
-	struct bpf_func_state *state = cur_func(env);
-	int err;
-
-	err = __release_reference_state(state, ptr_id);
-	if (WARN_ON_ONCE(err != 0))
-		verbose(env, "verifier internal error: can't release reference\n");
-	return err;
+	return -EINVAL;
 }
 
 static int transfer_reference_state(struct bpf_func_state *dst,
@@ -1209,6 +1207,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_SOCKET:
 	case PTR_TO_SOCKET_OR_NULL:
+	case PTR_TO_SOCK_COMMON:
+	case PTR_TO_SOCK_COMMON_OR_NULL:
 		return true;
 	default:
 		return false;
@@ -1647,6 +1647,7 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx,
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = &regs[regno];
 	struct bpf_insn_access_aux info = {};
+	bool valid;
 
 	if (reg->smin_value < 0) {
 		verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
@@ -1654,15 +1655,28 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx,
 		return -EACCES;
 	}
 
-	if (!bpf_sock_is_valid_access(off, size, t, &info)) {
-		verbose(env, "invalid bpf_sock access off=%d size=%d\n",
-			off, size);
-		return -EACCES;
+	switch (reg->type) {
+	case PTR_TO_SOCK_COMMON:
+		valid = bpf_sock_common_is_valid_access(off, size, t, &info);
+		break;
+	case PTR_TO_SOCKET:
+		valid = bpf_sock_is_valid_access(off, size, t, &info);
+		break;
+	default:
+		valid = false;
 	}
 
-	env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
 
-	return 0;
+	if (valid) {
+		env->insn_aux_data[insn_idx].ctx_field_size =
+			info.ctx_field_size;
+		return 0;
+	}
+
+	verbose(env, "R%d invalid %s access off=%d size=%d\n",
+		regno, reg_type_str[reg->type], off, size);
+
+	return -EACCES;
 }
 
 static bool __is_pointer_value(bool allow_ptr_leaks,
@@ -1688,8 +1702,14 @@ static bool is_ctx_reg(struct bpf_verifier_env *env, int regno)
 {
 	const struct bpf_reg_state *reg = reg_state(env, regno);
 
-	return reg->type == PTR_TO_CTX ||
-	       reg->type == PTR_TO_SOCKET;
+	return reg->type == PTR_TO_CTX;
+}
+
+static bool is_sk_reg(struct bpf_verifier_env *env, int regno)
+{
+	const struct bpf_reg_state *reg = reg_state(env, regno);
+
+	return type_is_sk_pointer(reg->type);
 }
 
 static bool is_pkt_reg(struct bpf_verifier_env *env, int regno)
@@ -1800,6 +1820,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 	case PTR_TO_SOCKET:
 		pointer_desc = "sock ";
 		break;
+	case PTR_TO_SOCK_COMMON:
+		pointer_desc = "sock_common ";
+		break;
 	default:
 		break;
 	}
@@ -2003,11 +2026,14 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			 * PTR_TO_PACKET[_META,_END]. In the latter
 			 * case, we know the offset is zero.
 			 */
-			if (reg_type == SCALAR_VALUE)
+			if (reg_type == SCALAR_VALUE) {
 				mark_reg_unknown(env, regs, value_regno);
-			else
+			} else {
 				mark_reg_known_zero(env, regs,
 						    value_regno);
+				if (reg_type_may_be_null(reg_type))
+					regs[value_regno].id = ++env->id_gen;
+			}
 			regs[value_regno].type = reg_type;
 		}
 
@@ -2053,9 +2079,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_flow_keys_access(env, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
-	} else if (reg->type == PTR_TO_SOCKET) {
+	} else if (type_is_sk_pointer(reg->type)) {
 		if (t == BPF_WRITE) {
-			verbose(env, "cannot write into socket\n");
+			verbose(env, "R%d cannot write into %s\n",
+				regno, reg_type_str[reg->type]);
 			return -EACCES;
 		}
 		err = check_sock_access(env, insn_idx, regno, off, size, t);
@@ -2102,7 +2129,8 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
 
 	if (is_ctx_reg(env, insn->dst_reg) ||
 	    is_pkt_reg(env, insn->dst_reg) ||
-	    is_flow_key_reg(env, insn->dst_reg)) {
+	    is_flow_key_reg(env, insn->dst_reg) ||
+	    is_sk_reg(env, insn->dst_reg)) {
 		verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
 			insn->dst_reg,
 			reg_type_str[reg_state(env, insn->dst_reg)->type]);
@@ -2369,6 +2397,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		err = check_ctx_reg(env, reg, regno);
 		if (err < 0)
 			return err;
+	} else if (arg_type == ARG_PTR_TO_SOCK_COMMON) {
+		expected_type = PTR_TO_SOCK_COMMON;
+		/* Any sk pointer can be ARG_PTR_TO_SOCK_COMMON */
+		if (!type_is_sk_pointer(type))
+			goto err_type;
 	} else if (arg_type == ARG_PTR_TO_SOCKET) {
 		expected_type = PTR_TO_SOCKET;
 		if (type != expected_type)
@@ -2783,7 +2816,7 @@ static int release_reference(struct bpf_verifier_env *env,
 	for (i = 0; i <= vstate->curframe; i++)
 		release_reg_references(env, vstate->frame[i], meta->ptr_id);
 
-	return release_reference_state(env, meta->ptr_id);
+	return release_reference_state(cur_func(env), meta->ptr_id);
 }
 
 static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
@@ -3049,8 +3082,11 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		}
 	} else if (is_release_function(func_id)) {
 		err = release_reference(env, &meta);
-		if (err)
+		if (err) {
+			verbose(env, "func %s#%d reference has not been acquired before\n",
+				func_id_name(func_id), func_id);
 			return err;
+		}
 	}
 
 	regs = cur_regs(env);
@@ -3099,12 +3135,19 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
 	} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
-		int id = acquire_reference_state(env, insn_idx);
-		if (id < 0)
-			return id;
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL;
-		regs[BPF_REG_0].id = id;
+		if (is_acquire_function(func_id)) {
+			int id = acquire_reference_state(env, insn_idx);
+
+			if (id < 0)
+				return id;
+			/* For release_reference() */
+			regs[BPF_REG_0].id = id;
+		} else {
+			/* For mark_ptr_or_null_reg() */
+			regs[BPF_REG_0].id = ++env->id_gen;
+		}
 	} else {
 		verbose(env, "unknown return type %d of func %s#%d\n",
 			fn->ret_type, func_id_name(func_id), func_id);
@@ -3364,6 +3407,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 	case PTR_TO_PACKET_END:
 	case PTR_TO_SOCKET:
 	case PTR_TO_SOCKET_OR_NULL:
+	case PTR_TO_SOCK_COMMON:
+	case PTR_TO_SOCK_COMMON_OR_NULL:
 		verbose(env, "R%d pointer arithmetic on %s prohibited\n",
 			dst, reg_type_str[ptr_reg->type]);
 		return -EACCES;
@@ -4597,6 +4642,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 			}
 		} else if (reg->type == PTR_TO_SOCKET_OR_NULL) {
 			reg->type = PTR_TO_SOCKET;
+		} else if (reg->type == PTR_TO_SOCK_COMMON_OR_NULL) {
+			reg->type = PTR_TO_SOCK_COMMON;
 		}
 		if (is_null || !(reg_is_refcounted(reg) ||
 				 reg_may_point_to_spin_lock(reg))) {
@@ -4621,7 +4668,7 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 	int i, j;
 
 	if (reg_is_refcounted_or_null(&regs[regno]) && is_null)
-		__release_reference_state(state, id);
+		release_reference_state(state, id);
 
 	for (i = 0; i < MAX_BPF_REG; i++)
 		mark_ptr_or_null_reg(state, &regs[i], id, is_null);
@@ -5790,6 +5837,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
 	case PTR_TO_FLOW_KEYS:
 	case PTR_TO_SOCKET:
 	case PTR_TO_SOCKET_OR_NULL:
+	case PTR_TO_SOCK_COMMON:
+	case PTR_TO_SOCK_COMMON_OR_NULL:
 		/* Only valid matches are exact, which memcmp() above
 		 * would have accepted
 		 */
@@ -6110,6 +6159,8 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
 	case PTR_TO_CTX:
 	case PTR_TO_SOCKET:
 	case PTR_TO_SOCKET_OR_NULL:
+	case PTR_TO_SOCK_COMMON:
+	case PTR_TO_SOCK_COMMON_OR_NULL:
 		return false;
 	default:
 		return true;
@@ -7112,6 +7163,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			convert_ctx_access = ops->convert_ctx_access;
 			break;
 		case PTR_TO_SOCKET:
+		case PTR_TO_SOCK_COMMON:
 			convert_ctx_access = bpf_sock_convert_ctx_access;
 			break;
 		default:
diff --git a/net/core/filter.c b/net/core/filter.c
index 3a49f68eda10..401d2e0aebf8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1793,6 +1793,20 @@ static const struct bpf_func_proto bpf_skb_pull_data_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
+{
+	sk = sk_to_full_sk(sk);
+
+	return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
+}
+
+static const struct bpf_func_proto bpf_sk_fullsock_proto = {
+	.func		= bpf_sk_fullsock,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_SOCKET_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
+};
+
 static inline int sk_skb_try_make_writable(struct sk_buff *skb,
 					   unsigned int write_len)
 {
@@ -5406,6 +5420,8 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	switch (func_id) {
 	case BPF_FUNC_get_local_storage:
 		return &bpf_get_local_storage_proto;
+	case BPF_FUNC_sk_fullsock:
+		return &bpf_sk_fullsock_proto;
 	default:
 		return sk_filter_func_proto(func_id, prog);
 	}
@@ -5477,6 +5493,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_uid_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_skb_fib_lookup_proto;
+	case BPF_FUNC_sk_fullsock:
+		return &bpf_sk_fullsock_proto;
 #ifdef CONFIG_XFRM
 	case BPF_FUNC_skb_get_xfrm_state:
 		return &bpf_skb_get_xfrm_state_proto;
@@ -5764,6 +5782,11 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
 		if (size != sizeof(__u64))
 			return false;
 		break;
+	case offsetof(struct __sk_buff, sk):
+		if (type == BPF_WRITE || size != sizeof(__u64))
+			return false;
+		info->reg_type = PTR_TO_SOCK_COMMON_OR_NULL;
+		break;
 	default:
 		/* Only narrow read access allowed for now. */
 		if (type == BPF_WRITE) {
@@ -5950,6 +5973,18 @@ static bool __sock_filter_check_size(int off, int size,
 	return size == size_default;
 }
 
+bool bpf_sock_common_is_valid_access(int off, int size,
+				     enum bpf_access_type type,
+				     struct bpf_insn_access_aux *info)
+{
+	switch (off) {
+	case bpf_ctx_range_till(struct bpf_sock, type, priority):
+		return false;
+	default:
+		return bpf_sock_is_valid_access(off, size, type, info);
+	}
+}
+
 bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 			      struct bpf_insn_access_aux *info)
 {
@@ -6748,6 +6783,13 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 		off += offsetof(struct qdisc_skb_cb, pkt_len);
 		*target_size = 4;
 		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, off);
+		break;
+
+	case offsetof(struct __sk_buff, sk):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_buff, sk));
+		break;
 	}
 
 	return insn - insn_buf;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 bpf-next 2/7] bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock
From: Martin KaFai Lau @ 2019-02-10  7:22 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
In-Reply-To: <20190210072220.1530061-1-kafai@fb.com>

This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
bpf_sock.  The userspace has already been using "state",
e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).

This patch also allows narrow load on the following existing fields:
"family", "type", "protocol" and "src_port".  Unlike IP address,
the load offset is resticted to the first byte for them but it
can be relaxed later if there is a use case.

This patch also folds __sock_filter_check_size() into
bpf_sock_is_valid_access() since it is not called
by any where else.  All bpf_sock checking is in
one place.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/uapi/linux/bpf.h | 17 ++++---
 net/core/filter.c        | 99 +++++++++++++++++++++++++++++++---------
 2 files changed, 85 insertions(+), 31 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5d79cba74ddc..d8f91777c5b6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2606,15 +2606,14 @@ struct bpf_sock {
 	__u32 protocol;
 	__u32 mark;
 	__u32 priority;
-	__u32 src_ip4;		/* Allows 1,2,4-byte read.
-				 * Stored in network byte order.
-				 */
-	__u32 src_ip6[4];	/* Allows 1,2,4-byte read.
-				 * Stored in network byte order.
-				 */
-	__u32 src_port;		/* Allows 4-byte read.
-				 * Stored in host byte order
-				 */
+	/* IP address also allows 1 and 2 bytes access */
+	__u32 src_ip4;
+	__u32 src_ip6[4];
+	__u32 src_port;		/* host byte order */
+	__u32 dst_port;		/* network byte order */
+	__u32 dst_ip4;
+	__u32 dst_ip6[4];
+	__u32 state;
 };
 
 struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index 401d2e0aebf8..01bb64bf2b5e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5958,21 +5958,6 @@ static bool __sock_filter_check_attach_type(int off,
 	return true;
 }
 
-static bool __sock_filter_check_size(int off, int size,
-				     struct bpf_insn_access_aux *info)
-{
-	const int size_default = sizeof(__u32);
-
-	switch (off) {
-	case bpf_ctx_range(struct bpf_sock, src_ip4):
-	case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
-		bpf_ctx_record_field_size(info, size_default);
-		return bpf_ctx_narrow_access_ok(off, size, size_default);
-	}
-
-	return size == size_default;
-}
-
 bool bpf_sock_common_is_valid_access(int off, int size,
 				     enum bpf_access_type type,
 				     struct bpf_insn_access_aux *info)
@@ -5988,13 +5973,29 @@ bool bpf_sock_common_is_valid_access(int off, int size,
 bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 			      struct bpf_insn_access_aux *info)
 {
+	const int size_default = sizeof(__u32);
+
 	if (off < 0 || off >= sizeof(struct bpf_sock))
 		return false;
 	if (off % size != 0)
 		return false;
-	if (!__sock_filter_check_size(off, size, info))
-		return false;
-	return true;
+
+	switch (off) {
+	case offsetof(struct bpf_sock, state):
+	case offsetof(struct bpf_sock, family):
+	case offsetof(struct bpf_sock, type):
+	case offsetof(struct bpf_sock, protocol):
+	case offsetof(struct bpf_sock, dst_port):
+	case offsetof(struct bpf_sock, src_port):
+	case bpf_ctx_range(struct bpf_sock, src_ip4):
+	case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
+	case bpf_ctx_range(struct bpf_sock, dst_ip4):
+	case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
+		bpf_ctx_record_field_size(info, size_default);
+		return bpf_ctx_narrow_access_ok(off, size, size_default);
+	}
+
+	return size == size_default;
 }
 
 static bool sock_filter_is_valid_access(int off, int size,
@@ -6838,24 +6839,32 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 		break;
 
 	case offsetof(struct bpf_sock, family):
-		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_family) != 2);
-
-		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
-				      offsetof(struct sock, sk_family));
+		*insn++ = BPF_LDX_MEM(
+			BPF_FIELD_SIZEOF(struct sock_common, skc_family),
+			si->dst_reg, si->src_reg,
+			bpf_target_off(struct sock_common,
+				       skc_family,
+				       FIELD_SIZEOF(struct sock_common,
+						    skc_family),
+				       target_size));
 		break;
 
 	case offsetof(struct bpf_sock, type):
+		BUILD_BUG_ON(HWEIGHT32(SK_FL_TYPE_MASK) != BITS_PER_BYTE * 2);
 		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 				      offsetof(struct sock, __sk_flags_offset));
 		*insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, SK_FL_TYPE_MASK);
 		*insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, SK_FL_TYPE_SHIFT);
+		*target_size = 2;
 		break;
 
 	case offsetof(struct bpf_sock, protocol):
+		BUILD_BUG_ON(HWEIGHT32(SK_FL_PROTO_MASK) != BITS_PER_BYTE);
 		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 				      offsetof(struct sock, __sk_flags_offset));
 		*insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, SK_FL_PROTO_MASK);
 		*insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, SK_FL_PROTO_SHIFT);
+		*target_size = 1;
 		break;
 
 	case offsetof(struct bpf_sock, src_ip4):
@@ -6867,6 +6876,15 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 				       target_size));
 		break;
 
+	case offsetof(struct bpf_sock, dst_ip4):
+		*insn++ = BPF_LDX_MEM(
+			BPF_SIZE(si->code), si->dst_reg, si->src_reg,
+			bpf_target_off(struct sock_common, skc_daddr,
+				       FIELD_SIZEOF(struct sock_common,
+						    skc_daddr),
+				       target_size));
+		break;
+
 	case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
 #if IS_ENABLED(CONFIG_IPV6)
 		off = si->off;
@@ -6885,6 +6903,23 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 #endif
 		break;
 
+	case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_sock, dst_ip6[0]);
+		*insn++ = BPF_LDX_MEM(
+			BPF_SIZE(si->code), si->dst_reg, si->src_reg,
+			bpf_target_off(struct sock_common,
+				       skc_v6_daddr.s6_addr32[0],
+				       FIELD_SIZEOF(struct sock_common,
+						    skc_v6_daddr.s6_addr32[0]),
+				       target_size) + off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+		*target_size = 4;
+#endif
+		break;
+
 	case offsetof(struct bpf_sock, src_port):
 		*insn++ = BPF_LDX_MEM(
 			BPF_FIELD_SIZEOF(struct sock_common, skc_num),
@@ -6894,6 +6929,26 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 						    skc_num),
 				       target_size));
 		break;
+
+	case offsetof(struct bpf_sock, dst_port):
+		*insn++ = BPF_LDX_MEM(
+			BPF_FIELD_SIZEOF(struct sock_common, skc_dport),
+			si->dst_reg, si->src_reg,
+			bpf_target_off(struct sock_common, skc_dport,
+				       FIELD_SIZEOF(struct sock_common,
+						    skc_dport),
+				       target_size));
+		break;
+
+	case offsetof(struct bpf_sock, state):
+		*insn++ = BPF_LDX_MEM(
+			BPF_FIELD_SIZEOF(struct sock_common, skc_state),
+			si->dst_reg, si->src_reg,
+			bpf_target_off(struct sock_common, skc_state,
+				       FIELD_SIZEOF(struct sock_common,
+						    skc_state),
+				       target_size));
+		break;
 	}
 
 	return insn - insn_buf;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 bpf-next 4/7] bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock
From: Martin KaFai Lau @ 2019-02-10  7:22 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
In-Reply-To: <20190210072220.1530061-1-kafai@fb.com>

This patch adds a helper function BPF_FUNC_tcp_sock and it
is currently available for cg_skb and sched_(cls|act):

struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);

int cg_skb_foo(struct __sk_buff *skb) {
	struct bpf_tcp_sock *tp;
	struct bpf_sock *sk;
	__u32 snd_cwnd;

	sk = skb->sk;
	if (!sk)
		return 1;

	tp = bpf_tcp_sock(sk);
	if (!tp)
		return 1;

	snd_cwnd = tp->snd_cwnd;
	/* ... */

	return 1;
}

A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
read-only access.  bpf_tcp_sock has all the existing tcp_sock's fields
that has already been exposed by the bpf_sock_ops.
i.e. no new tcp_sock's fields are exposed in bpf.h.

This helper returns a pointer to the tcp_sock.  If it is not a tcp_sock
or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
returns NULL.  Hence, the caller needs to check for NULL before
accessing it.

The current use case is to expose members from tcp_sock
to allow a cg_skb_bpf_prog to provide per cgroup traffic
policing/shaping.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/bpf.h      | 30 +++++++++++++++
 include/uapi/linux/bpf.h | 51 +++++++++++++++++++++++++-
 kernel/bpf/verifier.c    | 31 +++++++++++++++-
 net/core/filter.c        | 79 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 188 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a60463b45b54..7f58828755fd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -204,6 +204,7 @@ enum bpf_return_type {
 	RET_PTR_TO_MAP_VALUE,		/* returns a pointer to map elem value */
 	RET_PTR_TO_MAP_VALUE_OR_NULL,	/* returns a pointer to map elem value or NULL */
 	RET_PTR_TO_SOCKET_OR_NULL,	/* returns a pointer to a socket or NULL */
+	RET_PTR_TO_TCP_SOCK_OR_NULL,	/* returns a pointer to a tcp_sock or NULL */
 };
 
 /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@@ -259,6 +260,8 @@ enum bpf_reg_type {
 	PTR_TO_SOCKET_OR_NULL,	 /* reg points to struct bpf_sock or NULL */
 	PTR_TO_SOCK_COMMON,	 /* reg points to sock_common */
 	PTR_TO_SOCK_COMMON_OR_NULL, /* reg points to sock_common or NULL */
+	PTR_TO_TCP_SOCK,	 /* reg points to struct tcp_sock */
+	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -956,4 +959,31 @@ static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 }
 #endif
 
+#ifdef CONFIG_INET
+bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+				  struct bpf_insn_access_aux *info);
+
+u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
+				    const struct bpf_insn *si,
+				    struct bpf_insn *insn_buf,
+				    struct bpf_prog *prog,
+				    u32 *target_size);
+#else
+static inline bool bpf_tcp_sock_is_valid_access(int off, int size,
+						enum bpf_access_type type,
+						struct bpf_insn_access_aux *info)
+{
+	return false;
+}
+
+static inline u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
+						  const struct bpf_insn *si,
+						  struct bpf_insn *insn_buf,
+						  struct bpf_prog *prog,
+						  u32 *target_size)
+{
+	return 0;
+}
+#endif /* CONFIG_INET */
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d8f91777c5b6..25c8c0e62ecf 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2337,6 +2337,15 @@ union bpf_attr {
  *	Return
  *		A **struct bpf_sock** pointer on success, or NULL in
  *		case of failure.
+ *
+ * struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
+ *	Description
+ *		This helper gets a **struct bpf_tcp_sock** pointer from a
+ *		**struct bpf_sock** pointer.
+ *
+ *	Return
+ *		A **struct bpf_tcp_sock** pointer on success, or NULL in
+ *		case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2434,7 +2443,8 @@ union bpf_attr {
 	FN(rc_pointer_rel),		\
 	FN(spin_lock),			\
 	FN(spin_unlock),		\
-	FN(sk_fullsock),
+	FN(sk_fullsock),		\
+	FN(tcp_sock),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2616,6 +2626,45 @@ struct bpf_sock {
 	__u32 state;
 };
 
+struct bpf_tcp_sock {
+	__u32 snd_cwnd;		/* Sending congestion window		*/
+	__u32 srtt_us;		/* smoothed round trip time << 3 in usecs */
+	__u32 rtt_min;
+	__u32 snd_ssthresh;	/* Slow start size threshold		*/
+	__u32 rcv_nxt;		/* What we want to receive next		*/
+	__u32 snd_nxt;		/* Next sequence we send		*/
+	__u32 snd_una;		/* First byte we want an ack for	*/
+	__u32 mss_cache;	/* Cached effective mss, not including SACKS */
+	__u32 ecn_flags;	/* ECN status bits.			*/
+	__u32 rate_delivered;	/* saved rate sample: packets delivered */
+	__u32 rate_interval_us;	/* saved rate sample: time elapsed */
+	__u32 packets_out;	/* Packets which are "in flight"	*/
+	__u32 retrans_out;	/* Retransmitted packets out		*/
+	__u32 total_retrans;	/* Total retransmits for entire connection */
+	__u32 segs_in;		/* RFC4898 tcpEStatsPerfSegsIn
+				 * total number of segments in.
+				 */
+	__u32 data_segs_in;	/* RFC4898 tcpEStatsPerfDataSegsIn
+				 * total number of data segments in.
+				 */
+	__u32 segs_out;		/* RFC4898 tcpEStatsPerfSegsOut
+				 * The total number of segments sent.
+				 */
+	__u32 data_segs_out;	/* RFC4898 tcpEStatsPerfDataSegsOut
+				 * total number of data segments sent.
+				 */
+	__u32 lost_out;		/* Lost packets			*/
+	__u32 sacked_out;	/* SACK'd packets			*/
+	__u64 bytes_received;	/* RFC4898 tcpEStatsAppHCThruOctetsReceived
+				 * sum(delta(rcv_nxt)), or how many bytes
+				 * were acked.
+				 */
+	__u64 bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
+				 * sum(delta(snd_una)), or how many bytes
+				 * were acked.
+				 */
+};
+
 struct bpf_sock_tuple {
 	union {
 		struct {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b755d55a3791..1b9496c41383 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -334,14 +334,16 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
 static bool type_is_sk_pointer(enum bpf_reg_type type)
 {
 	return type == PTR_TO_SOCKET ||
-		type == PTR_TO_SOCK_COMMON;
+		type == PTR_TO_SOCK_COMMON ||
+		type == PTR_TO_TCP_SOCK;
 }
 
 static bool reg_type_may_be_null(enum bpf_reg_type type)
 {
 	return type == PTR_TO_MAP_VALUE_OR_NULL ||
 	       type == PTR_TO_SOCKET_OR_NULL ||
-	       type == PTR_TO_SOCK_COMMON_OR_NULL;
+	       type == PTR_TO_SOCK_COMMON_OR_NULL ||
+	       type == PTR_TO_TCP_SOCK_OR_NULL;
 }
 
 static bool type_is_refcounted(enum bpf_reg_type type)
@@ -407,6 +409,8 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_SOCKET_OR_NULL] = "sock_or_null",
 	[PTR_TO_SOCK_COMMON]	= "sock_common",
 	[PTR_TO_SOCK_COMMON_OR_NULL] = "sock_common_or_null",
+	[PTR_TO_TCP_SOCK]	= "tcp_sock",
+	[PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
 };
 
 static char slot_type_char[] = {
@@ -1209,6 +1213,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_SOCKET_OR_NULL:
 	case PTR_TO_SOCK_COMMON:
 	case PTR_TO_SOCK_COMMON_OR_NULL:
+	case PTR_TO_TCP_SOCK:
+	case PTR_TO_TCP_SOCK_OR_NULL:
 		return true;
 	default:
 		return false;
@@ -1662,6 +1668,9 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx,
 	case PTR_TO_SOCKET:
 		valid = bpf_sock_is_valid_access(off, size, t, &info);
 		break;
+	case PTR_TO_TCP_SOCK:
+		valid = bpf_tcp_sock_is_valid_access(off, size, t, &info);
+		break;
 	default:
 		valid = false;
 	}
@@ -1823,6 +1832,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 	case PTR_TO_SOCK_COMMON:
 		pointer_desc = "sock_common ";
 		break;
+	case PTR_TO_TCP_SOCK:
+		pointer_desc = "tcp_sock ";
+		break;
 	default:
 		break;
 	}
@@ -3148,6 +3160,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 			/* For mark_ptr_or_null_reg() */
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
+	} else if (fn->ret_type == RET_PTR_TO_TCP_SOCK_OR_NULL) {
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		regs[BPF_REG_0].type = PTR_TO_TCP_SOCK_OR_NULL;
+		regs[BPF_REG_0].id = ++env->id_gen;
 	} else {
 		verbose(env, "unknown return type %d of func %s#%d\n",
 			fn->ret_type, func_id_name(func_id), func_id);
@@ -3409,6 +3425,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 	case PTR_TO_SOCKET_OR_NULL:
 	case PTR_TO_SOCK_COMMON:
 	case PTR_TO_SOCK_COMMON_OR_NULL:
+	case PTR_TO_TCP_SOCK:
+	case PTR_TO_TCP_SOCK_OR_NULL:
 		verbose(env, "R%d pointer arithmetic on %s prohibited\n",
 			dst, reg_type_str[ptr_reg->type]);
 		return -EACCES;
@@ -4644,6 +4662,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 			reg->type = PTR_TO_SOCKET;
 		} else if (reg->type == PTR_TO_SOCK_COMMON_OR_NULL) {
 			reg->type = PTR_TO_SOCK_COMMON;
+		} else if (reg->type == PTR_TO_TCP_SOCK_OR_NULL) {
+			reg->type = PTR_TO_TCP_SOCK;
 		}
 		if (is_null || !(reg_is_refcounted(reg) ||
 				 reg_may_point_to_spin_lock(reg))) {
@@ -5839,6 +5859,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
 	case PTR_TO_SOCKET_OR_NULL:
 	case PTR_TO_SOCK_COMMON:
 	case PTR_TO_SOCK_COMMON_OR_NULL:
+	case PTR_TO_TCP_SOCK:
+	case PTR_TO_TCP_SOCK_OR_NULL:
 		/* Only valid matches are exact, which memcmp() above
 		 * would have accepted
 		 */
@@ -6161,6 +6183,8 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
 	case PTR_TO_SOCKET_OR_NULL:
 	case PTR_TO_SOCK_COMMON:
 	case PTR_TO_SOCK_COMMON_OR_NULL:
+	case PTR_TO_TCP_SOCK:
+	case PTR_TO_TCP_SOCK_OR_NULL:
 		return false;
 	default:
 		return true;
@@ -7166,6 +7190,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		case PTR_TO_SOCK_COMMON:
 			convert_ctx_access = bpf_sock_convert_ctx_access;
 			break;
+		case PTR_TO_TCP_SOCK:
+			convert_ctx_access = bpf_tcp_sock_convert_ctx_access;
+			break;
 		default:
 			continue;
 		}
diff --git a/net/core/filter.c b/net/core/filter.c
index c0d7b9ef279f..353735575204 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5315,6 +5315,79 @@ static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
 	.arg5_type	= ARG_ANYTHING,
 };
 
+bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+				  struct bpf_insn_access_aux *info)
+{
+	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, bytes_acked))
+		return false;
+
+	if (off % size != 0)
+		return false;
+
+	switch (off) {
+	case offsetof(struct bpf_tcp_sock, bytes_received):
+	case offsetof(struct bpf_tcp_sock, bytes_acked):
+		return size == sizeof(__u64);
+	default:
+		return size == sizeof(__u32);
+	}
+}
+
+u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
+				    const struct bpf_insn *si,
+				    struct bpf_insn *insn_buf,
+				    struct bpf_prog *prog, u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+
+#define BPF_TCP_SOCK_GET_COMMON(FIELD)					\
+	do {								\
+		BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD) >	\
+			     FIELD_SIZEOF(struct bpf_tcp_sock, FIELD));	\
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct tcp_sock, FIELD),\
+				      si->dst_reg, si->src_reg,		\
+				      offsetof(struct tcp_sock, FIELD)); \
+	} while (0)
+
+	CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_tcp_sock,
+				       BPF_TCP_SOCK_GET_COMMON);
+
+	if (insn > insn_buf)
+		return insn - insn_buf;
+
+	switch (si->off) {
+	case offsetof(struct bpf_tcp_sock, rtt_min):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, rtt_min) !=
+			     sizeof(struct minmax));
+		BUILD_BUG_ON(sizeof(struct minmax) <
+			     sizeof(struct minmax_sample));
+
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      offsetof(struct tcp_sock, rtt_min) +
+				      offsetof(struct minmax_sample, v));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+BPF_CALL_1(bpf_tcp_sock, struct sock *, sk)
+{
+	sk = sk_to_full_sk(sk);
+
+	if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP)
+		return (unsigned long)sk;
+
+	return (unsigned long)NULL;
+}
+
+static const struct bpf_func_proto bpf_tcp_sock_proto = {
+	.func		= bpf_tcp_sock,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_TCP_SOCK_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
+};
+
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -5470,6 +5543,10 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_local_storage_proto;
 	case BPF_FUNC_sk_fullsock:
 		return &bpf_sk_fullsock_proto;
+#ifdef CONFIG_INET
+	case BPF_FUNC_tcp_sock:
+		return &bpf_tcp_sock_proto;
+#endif
 	default:
 		return sk_filter_func_proto(func_id, prog);
 	}
@@ -5560,6 +5637,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sk_lookup_udp_proto;
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
+	case BPF_FUNC_tcp_sock:
+		return &bpf_tcp_sock_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 bpf-next 5/7] bpf: Sync bpf.h to tools/
From: Martin KaFai Lau @ 2019-02-10  7:22 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
In-Reply-To: <20190210072220.1530061-1-kafai@fb.com>

This patch sync the uapi bpf.h to tools/.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/include/uapi/linux/bpf.h | 72 ++++++++++++++++++++++++++++++----
 1 file changed, 65 insertions(+), 7 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1777fa0c61e4..25c8c0e62ecf 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2329,6 +2329,23 @@ union bpf_attr {
  *		"**y**".
  *	Return
  *		0
+ *
+ * struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)
+ *	Description
+ *		This helper gets a **struct bpf_sock** pointer such
+ *		that all the fields in bpf_sock can be accessed.
+ *	Return
+ *		A **struct bpf_sock** pointer on success, or NULL in
+ *		case of failure.
+ *
+ * struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk)
+ *	Description
+ *		This helper gets a **struct bpf_tcp_sock** pointer from a
+ *		**struct bpf_sock** pointer.
+ *
+ *	Return
+ *		A **struct bpf_tcp_sock** pointer on success, or NULL in
+ *		case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2425,7 +2442,9 @@ union bpf_attr {
 	FN(msg_pop_data),		\
 	FN(rc_pointer_rel),		\
 	FN(spin_lock),			\
-	FN(spin_unlock),
+	FN(spin_unlock),		\
+	FN(sk_fullsock),		\
+	FN(tcp_sock),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2545,6 +2564,7 @@ struct __sk_buff {
 	__u64 tstamp;
 	__u32 wire_len;
 	__u32 gso_segs;
+	__bpf_md_ptr(struct bpf_sock *, sk);
 };
 
 struct bpf_tunnel_key {
@@ -2596,14 +2616,52 @@ struct bpf_sock {
 	__u32 protocol;
 	__u32 mark;
 	__u32 priority;
-	__u32 src_ip4;		/* Allows 1,2,4-byte read.
-				 * Stored in network byte order.
+	/* IP address also allows 1 and 2 bytes access */
+	__u32 src_ip4;
+	__u32 src_ip6[4];
+	__u32 src_port;		/* host byte order */
+	__u32 dst_port;		/* network byte order */
+	__u32 dst_ip4;
+	__u32 dst_ip6[4];
+	__u32 state;
+};
+
+struct bpf_tcp_sock {
+	__u32 snd_cwnd;		/* Sending congestion window		*/
+	__u32 srtt_us;		/* smoothed round trip time << 3 in usecs */
+	__u32 rtt_min;
+	__u32 snd_ssthresh;	/* Slow start size threshold		*/
+	__u32 rcv_nxt;		/* What we want to receive next		*/
+	__u32 snd_nxt;		/* Next sequence we send		*/
+	__u32 snd_una;		/* First byte we want an ack for	*/
+	__u32 mss_cache;	/* Cached effective mss, not including SACKS */
+	__u32 ecn_flags;	/* ECN status bits.			*/
+	__u32 rate_delivered;	/* saved rate sample: packets delivered */
+	__u32 rate_interval_us;	/* saved rate sample: time elapsed */
+	__u32 packets_out;	/* Packets which are "in flight"	*/
+	__u32 retrans_out;	/* Retransmitted packets out		*/
+	__u32 total_retrans;	/* Total retransmits for entire connection */
+	__u32 segs_in;		/* RFC4898 tcpEStatsPerfSegsIn
+				 * total number of segments in.
 				 */
-	__u32 src_ip6[4];	/* Allows 1,2,4-byte read.
-				 * Stored in network byte order.
+	__u32 data_segs_in;	/* RFC4898 tcpEStatsPerfDataSegsIn
+				 * total number of data segments in.
+				 */
+	__u32 segs_out;		/* RFC4898 tcpEStatsPerfSegsOut
+				 * The total number of segments sent.
+				 */
+	__u32 data_segs_out;	/* RFC4898 tcpEStatsPerfDataSegsOut
+				 * total number of data segments sent.
+				 */
+	__u32 lost_out;		/* Lost packets			*/
+	__u32 sacked_out;	/* SACK'd packets			*/
+	__u64 bytes_received;	/* RFC4898 tcpEStatsAppHCThruOctetsReceived
+				 * sum(delta(rcv_nxt)), or how many bytes
+				 * were acked.
 				 */
-	__u32 src_port;		/* Allows 4-byte read.
-				 * Stored in host byte order
+	__u64 bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
+				 * sum(delta(snd_una)), or how many bytes
+				 * were acked.
 				 */
 };
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 bpf-next 3/7] bpf: Refactor sock_ops_convert_ctx_access
From: Martin KaFai Lau @ 2019-02-10  7:22 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Lawrence Brakmo
In-Reply-To: <20190210072220.1530061-1-kafai@fb.com>

The next patch will introduce a new "struct bpf_tcp_sock" which
exposes the same tcp_sock's fields already exposed in
"struct bpf_sock_ops".

This patch refactor the existing convert_ctx_access() codes for
"struct bpf_sock_ops" to get them ready to be reused for
"struct bpf_tcp_sock".  The "rtt_min" is not refactored
in this patch because its handling is different from other
fields.

The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
changes are code move only.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/core/filter.c | 287 ++++++++++++++++++++--------------------------
 1 file changed, 127 insertions(+), 160 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 01bb64bf2b5e..c0d7b9ef279f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5030,6 +5030,54 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
 };
 #endif /* CONFIG_IPV6_SEG6_BPF */
 
+#define CONVERT_COMMON_TCP_SOCK_FIELDS(md_type, CONVERT)		\
+do {									\
+	switch (si->off) {						\
+	case offsetof(md_type, snd_cwnd):				\
+		CONVERT(snd_cwnd); break;				\
+	case offsetof(md_type, srtt_us):				\
+		CONVERT(srtt_us); break;				\
+	case offsetof(md_type, snd_ssthresh):				\
+		CONVERT(snd_ssthresh); break;				\
+	case offsetof(md_type, rcv_nxt):				\
+		CONVERT(rcv_nxt); break;				\
+	case offsetof(md_type, snd_nxt):				\
+		CONVERT(snd_nxt); break;				\
+	case offsetof(md_type, snd_una):				\
+		CONVERT(snd_una); break;				\
+	case offsetof(md_type, mss_cache):				\
+		CONVERT(mss_cache); break;				\
+	case offsetof(md_type, ecn_flags):				\
+		CONVERT(ecn_flags); break;				\
+	case offsetof(md_type, rate_delivered):				\
+		CONVERT(rate_delivered); break;				\
+	case offsetof(md_type, rate_interval_us):			\
+		CONVERT(rate_interval_us); break;			\
+	case offsetof(md_type, packets_out):				\
+		CONVERT(packets_out); break;				\
+	case offsetof(md_type, retrans_out):				\
+		CONVERT(retrans_out); break;				\
+	case offsetof(md_type, total_retrans):				\
+		CONVERT(total_retrans); break;				\
+	case offsetof(md_type, segs_in):				\
+		CONVERT(segs_in); break;				\
+	case offsetof(md_type, data_segs_in):				\
+		CONVERT(data_segs_in); break;				\
+	case offsetof(md_type, segs_out):				\
+		CONVERT(segs_out); break;				\
+	case offsetof(md_type, data_segs_out):				\
+		CONVERT(data_segs_out); break;				\
+	case offsetof(md_type, lost_out):				\
+		CONVERT(lost_out); break;				\
+	case offsetof(md_type, sacked_out):				\
+		CONVERT(sacked_out); break;				\
+	case offsetof(md_type, bytes_received):				\
+		CONVERT(bytes_received); break;				\
+	case offsetof(md_type, bytes_acked):				\
+		CONVERT(bytes_acked); break;				\
+	}								\
+} while (0)
+
 #ifdef CONFIG_INET
 static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
 			      int dif, int sdif, u8 family, u8 proto)
@@ -7196,6 +7244,85 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 	struct bpf_insn *insn = insn_buf;
 	int off;
 
+/* Helper macro for adding read access to tcp_sock or sock fields. */
+#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
+	do {								      \
+		BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) >		      \
+			     FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD));   \
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
+						struct bpf_sock_ops_kern,     \
+						is_fullsock),		      \
+				      si->dst_reg, si->src_reg,		      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+					       is_fullsock));		      \
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2);	      \
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
+						struct bpf_sock_ops_kern, sk),\
+				      si->dst_reg, si->src_reg,		      \
+				      offsetof(struct bpf_sock_ops_kern, sk));\
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(OBJ,		      \
+						       OBJ_FIELD),	      \
+				      si->dst_reg, si->dst_reg,		      \
+				      offsetof(OBJ, OBJ_FIELD));	      \
+	} while (0)
+
+#define SOCK_OPS_GET_TCP_SOCK_FIELD(FIELD) \
+		SOCK_OPS_GET_FIELD(FIELD, FIELD, struct tcp_sock)
+
+/* Helper macro for adding write access to tcp_sock or sock fields.
+ * The macro is called with two registers, dst_reg which contains a pointer
+ * to ctx (context) and src_reg which contains the value that should be
+ * stored. However, we need an additional register since we cannot overwrite
+ * dst_reg because it may be used later in the program.
+ * Instead we "borrow" one of the other register. We first save its value
+ * into a new (temp) field in bpf_sock_ops_kern, use it, and then restore
+ * it at the end of the macro.
+ */
+#define SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
+	do {								      \
+		int reg = BPF_REG_9;					      \
+		BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) >		      \
+			     FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD));   \
+		if (si->dst_reg == reg || si->src_reg == reg)		      \
+			reg--;						      \
+		if (si->dst_reg == reg || si->src_reg == reg)		      \
+			reg--;						      \
+		*insn++ = BPF_STX_MEM(BPF_DW, si->dst_reg, reg,		      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+					       temp));			      \
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
+						struct bpf_sock_ops_kern,     \
+						is_fullsock),		      \
+				      reg, si->dst_reg,			      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+					       is_fullsock));		      \
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2);		      \
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
+						struct bpf_sock_ops_kern, sk),\
+				      reg, si->dst_reg,			      \
+				      offsetof(struct bpf_sock_ops_kern, sk));\
+		*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD),	      \
+				      reg, si->src_reg,			      \
+				      offsetof(OBJ, OBJ_FIELD));	      \
+		*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg,		      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+					       temp));			      \
+	} while (0)
+
+#define SOCK_OPS_GET_OR_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ, TYPE)	      \
+	do {								      \
+		if (TYPE == BPF_WRITE)					      \
+			SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ);	      \
+		else							      \
+			SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ);	      \
+	} while (0)
+
+	CONVERT_COMMON_TCP_SOCK_FIELDS(struct bpf_sock_ops,
+				       SOCK_OPS_GET_TCP_SOCK_FIELD);
+
+	if (insn > insn_buf)
+		return insn - insn_buf;
+
 	switch (si->off) {
 	case offsetof(struct bpf_sock_ops, op) ...
 	     offsetof(struct bpf_sock_ops, replylong[3]):
@@ -7353,175 +7480,15 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 				      FIELD_SIZEOF(struct minmax_sample, t));
 		break;
 
-/* Helper macro for adding read access to tcp_sock or sock fields. */
-#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
-	do {								      \
-		BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) >		      \
-			     FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD));   \
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
-						struct bpf_sock_ops_kern,     \
-						is_fullsock),		      \
-				      si->dst_reg, si->src_reg,		      \
-				      offsetof(struct bpf_sock_ops_kern,      \
-					       is_fullsock));		      \
-		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2);	      \
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
-						struct bpf_sock_ops_kern, sk),\
-				      si->dst_reg, si->src_reg,		      \
-				      offsetof(struct bpf_sock_ops_kern, sk));\
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(OBJ,		      \
-						       OBJ_FIELD),	      \
-				      si->dst_reg, si->dst_reg,		      \
-				      offsetof(OBJ, OBJ_FIELD));	      \
-	} while (0)
-
-/* Helper macro for adding write access to tcp_sock or sock fields.
- * The macro is called with two registers, dst_reg which contains a pointer
- * to ctx (context) and src_reg which contains the value that should be
- * stored. However, we need an additional register since we cannot overwrite
- * dst_reg because it may be used later in the program.
- * Instead we "borrow" one of the other register. We first save its value
- * into a new (temp) field in bpf_sock_ops_kern, use it, and then restore
- * it at the end of the macro.
- */
-#define SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
-	do {								      \
-		int reg = BPF_REG_9;					      \
-		BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) >		      \
-			     FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD));   \
-		if (si->dst_reg == reg || si->src_reg == reg)		      \
-			reg--;						      \
-		if (si->dst_reg == reg || si->src_reg == reg)		      \
-			reg--;						      \
-		*insn++ = BPF_STX_MEM(BPF_DW, si->dst_reg, reg,		      \
-				      offsetof(struct bpf_sock_ops_kern,      \
-					       temp));			      \
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
-						struct bpf_sock_ops_kern,     \
-						is_fullsock),		      \
-				      reg, si->dst_reg,			      \
-				      offsetof(struct bpf_sock_ops_kern,      \
-					       is_fullsock));		      \
-		*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2);		      \
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
-						struct bpf_sock_ops_kern, sk),\
-				      reg, si->dst_reg,			      \
-				      offsetof(struct bpf_sock_ops_kern, sk));\
-		*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD),	      \
-				      reg, si->src_reg,			      \
-				      offsetof(OBJ, OBJ_FIELD));	      \
-		*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg,		      \
-				      offsetof(struct bpf_sock_ops_kern,      \
-					       temp));			      \
-	} while (0)
-
-#define SOCK_OPS_GET_OR_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ, TYPE)	      \
-	do {								      \
-		if (TYPE == BPF_WRITE)					      \
-			SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ);	      \
-		else							      \
-			SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ);	      \
-	} while (0)
-
-	case offsetof(struct bpf_sock_ops, snd_cwnd):
-		SOCK_OPS_GET_FIELD(snd_cwnd, snd_cwnd, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, srtt_us):
-		SOCK_OPS_GET_FIELD(srtt_us, srtt_us, struct tcp_sock);
-		break;
-
 	case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
 		SOCK_OPS_GET_FIELD(bpf_sock_ops_cb_flags, bpf_sock_ops_cb_flags,
 				   struct tcp_sock);
 		break;
 
-	case offsetof(struct bpf_sock_ops, snd_ssthresh):
-		SOCK_OPS_GET_FIELD(snd_ssthresh, snd_ssthresh, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, rcv_nxt):
-		SOCK_OPS_GET_FIELD(rcv_nxt, rcv_nxt, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, snd_nxt):
-		SOCK_OPS_GET_FIELD(snd_nxt, snd_nxt, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, snd_una):
-		SOCK_OPS_GET_FIELD(snd_una, snd_una, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, mss_cache):
-		SOCK_OPS_GET_FIELD(mss_cache, mss_cache, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, ecn_flags):
-		SOCK_OPS_GET_FIELD(ecn_flags, ecn_flags, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, rate_delivered):
-		SOCK_OPS_GET_FIELD(rate_delivered, rate_delivered,
-				   struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, rate_interval_us):
-		SOCK_OPS_GET_FIELD(rate_interval_us, rate_interval_us,
-				   struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, packets_out):
-		SOCK_OPS_GET_FIELD(packets_out, packets_out, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, retrans_out):
-		SOCK_OPS_GET_FIELD(retrans_out, retrans_out, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, total_retrans):
-		SOCK_OPS_GET_FIELD(total_retrans, total_retrans,
-				   struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, segs_in):
-		SOCK_OPS_GET_FIELD(segs_in, segs_in, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, data_segs_in):
-		SOCK_OPS_GET_FIELD(data_segs_in, data_segs_in, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, segs_out):
-		SOCK_OPS_GET_FIELD(segs_out, segs_out, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, data_segs_out):
-		SOCK_OPS_GET_FIELD(data_segs_out, data_segs_out,
-				   struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, lost_out):
-		SOCK_OPS_GET_FIELD(lost_out, lost_out, struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, sacked_out):
-		SOCK_OPS_GET_FIELD(sacked_out, sacked_out, struct tcp_sock);
-		break;
-
 	case offsetof(struct bpf_sock_ops, sk_txhash):
 		SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
 					  struct sock, type);
 		break;
-
-	case offsetof(struct bpf_sock_ops, bytes_received):
-		SOCK_OPS_GET_FIELD(bytes_received, bytes_received,
-				   struct tcp_sock);
-		break;
-
-	case offsetof(struct bpf_sock_ops, bytes_acked):
-		SOCK_OPS_GET_FIELD(bytes_acked, bytes_acked, struct tcp_sock);
-		break;
-
 	}
 	return insn - insn_buf;
 }
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox