Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH rdma-next v3 0/3] ODP support for mlx5 DC QPs
From: Leon Romanovsky @ 2019-08-28  7:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Michael Guralnik, Saeed Mahameed,
	linux-netdev
In-Reply-To: <20190827155140.GA15153@ziepe.ca>

On Tue, Aug 27, 2019 at 12:51:40PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 19, 2019 at 03:08:12PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Changelog
> >  v3:
> >  * Rewrote patches to expose through DEVX without need to change mlx5-abi.h at all.
> >  v2: https://lore.kernel.org/linux-rdma/20190806074807.9111-1-leon@kernel.org
> >  * Fixed reserved_* field wrong name (Saeed M.)
> >  * Split first patch to two patches, one for mlx5-next and one for  rdma-next. (Saeed M.)
> >  v1: https://lore.kernel.org/linux-rdma/20190804100048.32671-1-leon@kernel.org
> >  * Fixed alignment to u64 in mlx5-abi.h (Gal P.)
> >  v0: https://lore.kernel.org/linux-rdma/20190801122139.25224-1-leon@kernel.org
> >
> > >From Michael,
> >
> > The series adds support for on-demand paging for DC transport.
> >
> > As DC is mlx-only transport, the capabilities are exposed
> > to the user using DEVX objects and later on through mlx5dv_query_device.
> >
> > Thanks
> >
> > Michael Guralnik (3):
> >   net/mlx5: Set ODP capabilities for DC transport to max
> >   IB/mlx5: Remove check of FW capabilities in ODP page fault handling
> >   IB/mlx5: Add page fault handler for DC initiator WQE
>
> This seems fine, can you put the commit on the shared branch?

Thanks, applied to mlx5-next
00679b631edd net/mlx5: Set ODP capabilities for DC transport to max

>
> Thanks,
> Jason

^ permalink raw reply

* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Jiri Pirko @ 2019-08-28  7:07 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David Miller, Jakub Kicinski, David Ahern, netdev,
	Stephen Hemminger, dcbw, Michal Kubecek, Andrew Lunn, parav,
	Saeed Mahameed, mlxsw
In-Reply-To: <CAJieiUjpE+o-=x2hQcsKQJNxB8O7VLHYw2tSnqzTFRuy_vtOxw@mail.gmail.com>

Tue, Aug 27, 2019 at 05:14:49PM CEST, roopa@cumulusnetworks.com wrote:
>On Tue, Aug 27, 2019 at 2:35 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Tue, Aug 27, 2019 at 10:22:42AM CEST, davem@davemloft.net wrote:
>> >From: Jiri Pirko <jiri@resnulli.us>
>> >Date: Tue, 27 Aug 2019 09:08:08 +0200
>> >
>> >> Okay, so if I understand correctly, on top of separate commands for
>> >> add/del of alternative names, you suggest also get/dump to be separate
>> >> command and don't fill this up in existing newling/getlink command.
>> >
>> >I'm not sure what to do yet.
>> >
>> >David has a point, because the only way these ifnames are useful is
>> >as ways to specify and choose net devices.  So based upon that I'm
>> >slightly learning towards not using separate commands.
>>
>> Well yeah, one can use it to handle existing commands instead of
>> IFLA_NAME.
>>
>> But why does it rule out separate commands? I think it is cleaner than
>> to put everything in poor setlink messages :/ The fact that we would
>> need to add "OP" to the setlink message just feels of. Other similar
>> needs may show up in the future and we may endup in ridiculous messages
>> like:
>>
>> SETLINK
>>   IFLA_NAME eth0
>>   IFLA_ATLNAME_LIST (nest)
>>       IFLA_ALTNAME_OP add
>>       IFLA_ALTNAME somereallylongname
>>       IFLA_ALTNAME_OP del
>>       IFLA_ALTNAME somereallyreallylongname
>>       IFLA_ALTNAME_OP add
>>       IFLA_ALTNAME someotherreallylongname
>>   IFLA_SOMETHING_ELSE_LIST (nest)
>>       IFLA_SOMETHING_ELSE_OP add
>>       ...
>>       IFLA_SOMETHING_ELSE_OP del
>>       ...
>>       IFLA_SOMETHING_ELSE_OP add
>>       ...
>>
>> I don't know what to think about it. Rollbacks are going to be pure hell :/
>
>I don't see a huge problem with the above. We need a way to solve this
>anyways for other list types in the future correct ?.
>The approach taken by this series will not scale if we have to add a
>new msg type and header for every such list attribute in the future.

Do you have some other examples in mind? So far, this was not needed.


>
>A good parallel here is bridge vlan which uses RTM_SETLINK and
>RTM_DELLINK for vlan add and deletes. But it does have an advantage of
>a separate
>msg space under AF_BRIDGE which makes it cleaner. Maybe something
>closer to that  can be made to work (possibly with a msg flag) ?.

1) Not sure if AF_BRIDGE is the right example how to do things
2) See br_vlan_info(). It is not an OP-PER-VLAN. You either add or
delete all passed info, depending on the cmd (RTM_SETLINK/RTM_DETLINK).


>
>Would be good to have a consistent way to update list attributes for
>future needs too.

Okay. Do you suggest to have new set of commands to handle
adding/deleting lists of items? altNames now, others (other nests) later?

Something like:

CMD SETLISTS
     IFLA_NAME eth0
     IFLA_ATLNAME_LIST (nest)
       IFLA_ALTNAME somereallylongname
       IFLA_ALTNAME somereallyreallylongname
       IFLA_ALTNAME someotherreallylongname
     IFLA_SOMETHING_ELSE_LIST (nest)
       IFLA_SOMETHING_ELSE
       IFLA_SOMETHING_ELSE
       IFLA_SOMETHING_ELSE


CMD DELLISTS
     IFLA_NAME eth0
     IFLA_ATLNAME_LIST (nest)
       IFLA_ALTNAME somereallylongname
       IFLA_ALTNAME somereallyreallylongname
       IFLA_ALTNAME someotherreallylongname
     IFLA_SOMETHING_ELSE_LIST (nest)
       IFLA_SOMETHING_ELSE
       IFLA_SOMETHING_ELSE
       IFLA_SOMETHING_ELSE

How does this sound?

^ permalink raw reply

* Re: libbpf distro packaging
From: Jiri Olsa @ 2019-08-28  7:12 UTC (permalink / raw)
  To: Julia Kartseva
  Cc: Alexei Starovoitov, Andrii Nakryiko, labbott@redhat.com,
	acme@kernel.org, debian-kernel@lists.debian.org,
	netdev@vger.kernel.org, Andrey Ignatov, Yonghong Song,
	jolsa@kernel.org, Daniel Borkmann
In-Reply-To: <A2E805DD-8237-4703-BE6F-CC96A4D4D909@fb.com>

On Tue, Aug 27, 2019 at 10:30:24PM +0000, Julia Kartseva wrote:
> On 8/25/19, 11:42 PM, "Jiri Olsa" <jolsa@redhat.com> wrote:
> 
> > On Fri, Aug 23, 2019 at 04:00:01PM +0000, Alexei Starovoitov wrote:
> > > 
> > > Technically we can bump it at any time.
> > > The goal was to bump it only when new kernel is released
> > > to capture a collection of new APIs in a given 0.0.X release.
> > > So that libbpf versions are synchronized with kernel versions
> > > in some what loose way.
> > > In this case we can make an exception and bump it now.
> >
> > I see, I dont think it's worth of the exception now,
> > the patch is simple or we'll start with 0.0.3
> 
> PR introducing 0.0.5 ABI was merged:
> https://github.com/libbpf/libbpf/commit/476e158
> Jiri, you'd like to avoid patching, you can start w/ 0.0.5.
> Also if you're planning to use *.spec from libbpf as a source of truth,
> It may be enhanced by syncing spec and ABI versions, similar to
> https://github.com/libbpf/libbpf/commit/d60f568

cool, anyway I started with v0.0.3 ;-) I'll update
to latest once we are merged in

the spec/srpm is currently under Fedora review:
  https://bugzilla.redhat.com/show_bug.cgi?id=1745478

you can check it in here:
  http://people.redhat.com/~jolsa/libbpf/v2/

I think it's little different from what you have,
but not in the essential parts

jirka

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Peter Zijlstra @ 2019-08-28  7:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Masami Hiramatsu, Steven Rostedt, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrV8iJv9+Ai11_1_r6MapPhhwt9hjxi=6EoixytabTScqg@mail.gmail.com>

On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:

> > Tracing:
> >
> > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > are necessary to:

That's not tracing, that's perf.

> > +bool cap_bpf_tracing(void)
> > +{
> > +       return capable(CAP_SYS_ADMIN) ||
> > +              (capable(CAP_BPF) && !perf_paranoid_tracepoint_raw());
> > +}

A whole long time ago, I proposed we introduce CAP_PERF or something
along those lines; as a replacement for that horrible crap Android and
Debian ship. But nobody was ever interested enough.

The nice thing about that is that you can then disallow perf/tracing in
general, but tag the perf executable (and similar tools) with the
capability so that unpriv users can still use it, but only limited
through the tool, not the syscalls directly.


^ permalink raw reply

* Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
From: Jakub Jankowski @ 2019-08-28  7:18 UTC (permalink / raw)
  To: Fujinaka, Todd, e1000-devel@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, mhemsley@open-systems.com, Jeff Kirsher,
	Lihong Yang
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD69B01402F@ORSMSX115.amr.corp.intel.com>

This commit suggests that it should be possible: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c271dd6c391b535226cf1a81aaad9f33cb5899d3
(It has been in upstream kernel since v4.12, so my test kernel does have 
it, and so does the out-of-tree driver I'm testing with)

On 8/28/19 2:53 AM, Fujinaka, Todd wrote:
> Sorry about the top posting, but if I don't do it this way I can't read anything in Outlook (not my preferred MUA).
>
> I think I may have been wrong about things. I'm not as familiar with the x722, and the NVM versions are completely different than the x710 and I was confused.
>
> Even worse, I'm not sure if the x722 is able to read the data from the SFP/SFP+ EEPROM. I remembered that was a feature we requested internally but I don't remember what the progress was.
>
> I'm asking around to see if I can get clarification. I haven't heard anything yet.
>
> Todd Fujinaka
> Software Application Engineer
> Datacenter Engineering Group
> Intel Corporation
> todd.fujinaka@intel.com
>
>
> -----Original Message-----
> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
> Sent: Tuesday, August 27, 2019 4:01 PM
> To: Fujinaka, Todd <todd.fujinaka@intel.com>; e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org; mhemsley@open-systems.com
> Subject: Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>
> Hi,
>
> On 8/27/19 7:56 PM, Fujinaka, Todd wrote:
>> The hints should be:
>> # ethtool -m eth10
>> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>>
>> # ethtool -i eth10
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 3.31 0x80000d31 1.1767.0
>>
>> And the working case:
>> # ethtool -i eth8
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 6.01 0x800035cf 1.1876.0
>>
>> If you don't see it, 6.01 > 3.31.
> The reason why firmware between the two is (that much) different is because the non-working case is from X722 NIC, while the working one is from X710.
>
>> The NVM update tool should be available on downloadcenter.intel.com
> Thanks for the pointer to NVM updater. I'd like to offer some additional comments about my experience with the newest one (v4.00):
>
> a) running ./nvmupdate64e (from X722_NVMUpdate_Linux_x64 subdir) errors out without really saying what's wrong:
>
>     # ./nvmupdate64e
>
>     Intel(R) Ethernet NVM Update Tool
>     NVMUpdate version 1.30.2.11
>     Copyright (C) 2013 - 2017 Intel Corporation.
>
>
>     WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.
>     Inventory in progress. Please wait [+.........]
>     Tool execution completed with the following status: The configuration file could not be opened/read, or a syntax error was discovered in the file
>     Press any key to exit.
>
> after enabling logging (-l out.log) a bit more is revealed:
>
>     # tail -n 2 out.log
>     Error:   Config file line 2: Not supported config file version.
>     Error:   Missing CONFIG VERSION parameter in configuration file.
>
> but that's not entirely true, CONFIG VERSION is set in the default configuration file:
>
>     # head -n 2 nvmupdate.cfg
>     CURRENT FAMILY: 1.0.0
>     CONFIG VERSION: 1.14.0
>
> so why isn't this understood?
> Manually editing nvmupdate.cfg and setting CONFIG VERSION: 1.11.0 seems to make this particular problem go away.
>
> b) Re-doing this with downgraded config version exposes another problem:
>
>     Config file read.
>     Error:   Can't open NVM map file [Immediate_offset_2.txt]
>
> and indeed, there is no Immediate_offset_2.txt in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/
> There is one, however, in
> NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ subdir.
> Copying it over to the _Linux_x64 resolves this particular problem
>
> c) Re-doing this with Immediate_offset_2.txt in place exposes third problem:
>
>     Error:   Can't open NVM image file
> [LBG_B2_Wolf_Pass_WFT_X557_P01_PHY_Auto_Detect_P23_NCSI_v3.31_800016DB.bin]
>
> and once again - same story. It exists in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ but not NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/ - had to copy it over.
>
>
> Once I managed to get all these out of the way, the tool finally ran:
>
>     Num Description                               Ver. DevId S:B Status
>     === ======================================== ===== ===== ====== ===============
>     01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 Update not available
>     02) Intel(R) Ethernet Connection X722 for     3.49  37D2 00:061 Update
>         10GBASE-T available
>     03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 Update not available
>
>
> The initial starting point was:
>
> 0) firmware-version: 3.31 0x80000d31 1.1767.0
>
> After first update+reboot, this was bumped to:
>
> 1) firmware-version: 3.1d 0x800016db 1.1767.0    (but ethtool -m ethX still doesn't work)
>
> So I ran the tool the second time, it said 'Update available' again, but this time:
>
>     Num Description                               Ver. DevId S:B Status
>     === ======================================== ===== ===== ====== ===============
>     01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 Update not available
>     02) Intel(R) Ethernet Connection X722 for     3.29  37D2 00:061 Update
>         10GBASE-T available
>     03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 Update not available
>
>     Options: Adapter Index List (comma-separated), [A]ll, e[X]it
>     Enter selection:02
>     Would you like to back up the NVM images? [Y]es/[N]o: Y
>     Update in progress. This operation may take several minutes.
>     [*******+..]
>     Tool execution completed with the following status: <---------- why is there no status printed?
>     Press any key to exit.
>
>
> Checking output log:
>
>     # cat out3.log
>     Intel(R) Ethernet NVM Update Tool
>     NVMUpdate version 1.30.2.11
>     Copyright (C) 2013 - 2017 Intel Corporation.
>
>     ./nvmupdate64e -c nvmupdate.cfg -l out3.log
>
>     Config file read.
>     Inventory
>     [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
>         Flash inventory started
>         Shadow RAM inventory started
>     Alternate MAC address is not set
>         Shadow RAM inventory finished
>         Flash inventory finished
>         OROM inventory started
>         OROM inventory finished
>         PHY NVM inventory started
>         PHY NVM inventory finished
>     [00:061:00:01]: Intel(R) Ethernet Connection X722 for 10GBASE-T
>         Device already inventoried.
>     [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
>         Device already inventoried.
>         PHY NVM inventory started
>         PHY NVM inventory finished
>     [00:061:00:03]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
>         Device already inventoried.
>     Update
>     [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
>         Creating backup images in directory: A4BF0164884A
>         Backup images created.
>         Flash update started
>         NVM image verification started
>         Shadow RAM image verification started
>
>     Image differences found at offset 0x3AE [Device=0xF, Buffer=0x0] -
> update required.
>     Error:   Flash update failed
>     [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
>     #
>
> However, ethtool -i suggests that firmware was updated to:
>
> 2) firmware-version: 4.00 0x80001577 1.1580.0    <------- so it did
> _something_ after all?
>
> At this point, every subsequent attempt to run the NVM updater yields
> the same results: an update is available, but attempting to apply it
> fails with the same message in log.
>
> And my initial issue still persists - ethtool -m <iface> still returns
> "invalid argument" with "Module EEPROM memory read not supported. Please
> update the NVM image" logged in dmesg.
>
> How can I resolve this?
>
> Cheers,
>    Jakub.
>
>> Todd Fujinaka
>> Software Application Engineer
>> Datacenter Engineering Group
>> Intel Corporation
>> todd.fujinaka@intel.com
>>
>>
>> -----Original Message-----
>> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
>> Sent: Tuesday, August 27, 2019 4:03 AM
>> To: e1000-devel@lists.sourceforge.net
>> Cc: netdev@vger.kernel.org; shasta@toxcorp.com; mhemsley@open-systems.com
>> Subject: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>>
>> Hi,
>>
>> We can't get SFP+ EEPROM readouts for X722 to work at all:
>>
>> # ethtool -m eth10
>> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>> # lspci | grep 3d:00.3
>> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>>
>>
>> We're running 4.19.65 kernel at the moment, testing using the newest out-of-tree Intel module
>>
>> # modinfo -F version i40e
>> 2.9.21
>>
>> We also tried:
>> - 4.19.65 with in-tree i40e (2.3.2-k)
>> - stock Arch Linux (kernel 5.2.5, driver 2.8.20-k) and the results are the same, as shown above.
>>
>> # ethtool -i eth10
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 3.31 0x80000d31 1.1767.0
>> expansion-rom-version:
>> bus-info: 0000:3d:00.3
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: yes
>> # dmidecode -s baseboard-manufacturer
>> Intel Corporation
>> # dmidecode -s baseboard-product-name
>> S2600WFT
>> # dmidecode -s baseboard-version
>> H48104-853
>>
>> # lspci -vvv
>> (...)
>> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>> 	DeviceName: Intel PCH Integrated 10 Gigabit Ethernet Controller
>> 	Subsystem: Intel Corporation Ethernet Connection X722 for 10GbE SFP+
>> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> 	Latency: 0, Cache Line Size: 32 bytes
>> 	Interrupt: pin A routed to IRQ 112
>> 	NUMA node: 0
>> 	Region 0: Memory at ab000000 (64-bit, prefetchable) [size=16M]
>> 	Region 3: Memory at b0000000 (64-bit, prefetchable) [size=32K]
>> 	Expansion ROM at <ignored> [disabled]
>> 	Capabilities: [40] Power Management version 3
>> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>> 		Address: 0000000000000000  Data: 0000
>> 		Masking: 00000000  Pending: 00000000
>> 	Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
>> 		Vector table: BAR=3 offset=00000000
>> 		PBA: BAR=3 offset=00001000
>> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
>> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
>> 			MaxPayload 256 bytes, MaxReadReq 512 bytes
>> 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
>> 		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> 		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
>> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>> 		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
>> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>> 			 AtomicOpsCtl: ReqEn-
>> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>> 	Capabilities: [e0] Vital Product Data
>> 		Product Name: Example VPD
>> 		Read-only fields:
>> 			[V0] Vendor specific:
>> 			[RV] Reserved: checksum good, 0 byte(s) reserved
>> 		End
>> 	Capabilities: [100 v2] Advanced Error Reporting
>> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>> 		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>> 		HeaderLog: 00000000 00000000 00000000 00000000
>> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
>> 		ARICap:	MFVC- ACS-, Next Function: 0
>> 		ARICtl:	MFVC- ACS-, Function Group: 0
>> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
>> 		IOVCap:	Migration-, Interrupt Message Number: 000
>> 		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
>> 		IOVSta:	Migration-
>> 		Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 03
>> 		VF offset: 109, stride: 1, Device ID: 37cd
>> 		Supported Page Size: 00000553, System Page Size: 00000001
>> 		Region 0: Memory at 00000000af000000 (64-bit, prefetchable)
>> 		Region 3: Memory at 00000000b0020000 (64-bit, prefetchable)
>> 		VF Migration: offset: 00000000, BIR: 0
>> 	Capabilities: [1a0 v1] Transaction Processing Hints
>> 		Device specific mode supported
>> 		No steering table available
>> 	Capabilities: [1b0 v1] Access Control Services
>> 		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
>> 		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
>> 	Kernel driver in use: i40e
>> 	Kernel modules: i40e
>>
>>
>> Same kernel+i40e, same SFP+ module - but on Intel X710, works like a treat:
>>
>> # lspci | grep X7
>> 81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
>> 81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) # ethtool -m eth8
>> 	Identifier                                : 0x03 (SFP)
>> 	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
>> 	Connector                                 : 0x07 (LC)
>> 	Transceiver codes                         : 0x10 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00
>> 	Transceiver type                          : 10G Ethernet: 10G Base-SR
>> 	Transceiver type                          : Ethernet: 1000BASE-SX
>> 	Encoding                                  : 0x06 (64B/66B)
>> 	BR, Nominal                               : 10300MBd
>>            (...)
>> # ethtool -i eth8
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 6.01 0x800035cf 1.1876.0
>> expansion-rom-version:
>> bus-info: 0000:81:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: yes
>> #
>>
>>
>> Is this a known problem?
>>
>>
>> Best regards,
>>     Jakub
>>
>>
>>
>> _______________________________________________
>> E1000-devel mailing list
>> E1000-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/e1000-devel
>> To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired


-- 
Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/
GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D


^ permalink raw reply

* [RFCv2 bpf-next 00/12] Programming socket lookup with BPF
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

This patch set adds a mechanism for programming mappings between the local
addresses and listening/receiving sockets with BPF.

It introduces a new per-netns BPF program type, called inet_lookup, which
runs during the socket lookup. The program is allowed to select a
listening/receiving socket from a SOCKARRAY map that the packet will be
delivered to.

BPF inet_lookup intends to be an alternative for:

* SO_BINDTOPREFIX [1] - a mechanism that provides a way to listen/receive
  on all local addresses that belong to a network prefix. An alternative to
  binding to INADDR_ANY that allows applications bound to disjoint network
  prefixes to share a port. Not generic. Never got upstreamed.

* TPROXY [2] - a powerful mechanism that allows steering packets destined
  to non-local addresses to a local socket. It also works for local
  addresses, which is a less restrictive case. Can be used to implement
  what SO_BINDTOPREFIX does, and more - in particular, all ports can be
  redirected to a single socket. Socket dispatch happens early in ingress
  path (PREROUTING hook). Versatile but comes with complexities.

Compared to the above, inet_lookup aims to be a programmatic way to map
(address, port) pairs to a socket. It runs after a routing decision for
local delivery was made, and hence is limited to local addresses only.

Being part of the socket lookup, has a desired effect that redirection is
visible to XDP programs which call bpf_sk_lookup helpers.

When it comes to use cases, we have presented them in RFCv1 [3] cover
letter and also at last Netconf [4]. To recap, they are:

1) sharing a port between two services

   Services are accepting connections on different (disjoint) IP ranges but
   same port. Requests going to 192.0.2.0/24 tcp/80 are handled by NGINX,
   while 198.51.100.0/24 tcp/80 IP range is handled by Apache server.
   Applications are running as different users, in a flat single-netns
   setup.

2) receiving traffic on all ports

   We have a proxy server that accepts connections to _any_ port [5].

A simple demo program that implements (1) could look like

#define NET1 (IP4(192,  0,   2, 0) >> 8)
#define NET2 (IP4(198, 51, 100, 0) >> 8)

#define MAX_SERVERS 2

struct {
	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
	__uint(max_entries, MAX_SERVERS);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("inet_lookup/demo_two_servers")
int demo_two_http_servers(struct bpf_inet_lookup *ctx)
{
	__u32 index = 0;
	__u64 flags = 0;

        if (ctx->family != AF_INET)
                return BPF_OK;
	if (ctx->protocol != IPPROTO_TCP)
		return BPF_OK;
        if (ctx->local_port != 80)
                return BPF_OK;

        switch (bpf_ntohl(ctx->local_ip4) >> 8) {
        case NET1:
		index = 0;
		break;
        case NET2:
		index = 1;
		break;
	default:
		return BPF_OK;
        }

        return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
}

Since RFCv1, we've changed the approach from rewriting the lookup key to
map-based redirection. This has been suggested at Netconf, and is a
recurring pattern in existing BPF program types.

We're posting the 2nd version of RFC patch set to collect further feedback
and set context for the presentation and discussions at the upcoming
Network Summit at LPC '19 [6].

Patches are also available on GitHub [7].

Thanks,
Jakub

[1] https://www.spinics.net/lists/netdev/msg370789.html
[2] https://www.kernel.org/doc/Documentation/networking/tproxy.txt
[3] https://lore.kernel.org/netdev/20190618130050.8344-1-jakub@cloudflare.com/
[4] http://vger.kernel.org/netconf2019_files/Programmable%20socket%20lookup.pdf
[5] https://blog.cloudflare.com/how-we-built-spectrum/
[6] https://linuxplumbersconf.org/event/4/contributions/487/
[7] https://github.com/jsitnicki/linux/commits/bpf-inet-lookup

Changes RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.

- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP socket work as expected in the presence of
  inet_lookup prog.

- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.


Jakub Sitnicki (12):
  flow_dissector: Extract attach/detach/query helpers
  bpf: Introduce inet_lookup program type for redirecting socket lookup
  bpf: Add verifier tests for inet_lookup context access
  inet: Store layer 4 protocol in inet_hashinfo
  udp: Store layer 4 protocol in udp_table
  inet: Run inet_lookup bpf program on socket lookup
  inet6: Run inet_lookup bpf program on socket lookup
  udp: Run inet_lookup bpf program on socket lookup
  udp6: Run inet_lookup bpf program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for inet_lookup program type
  bpf: Test redirecting listening/receiving socket lookup

 include/linux/bpf.h                           |   8 +
 include/linux/bpf_types.h                     |   1 +
 include/linux/filter.h                        |  18 +
 include/net/inet6_hashtables.h                |  19 +
 include/net/inet_hashtables.h                 |  36 +
 include/net/net_namespace.h                   |   2 +
 include/net/udp.h                             |  10 +-
 include/uapi/linux/bpf.h                      |  58 +-
 kernel/bpf/syscall.c                          |  10 +
 kernel/bpf/verifier.c                         |   7 +-
 net/core/filter.c                             | 304 ++++++++
 net/core/flow_dissector.c                     |  65 +-
 net/dccp/proto.c                              |   2 +-
 net/ipv4/inet_hashtables.c                    |   5 +
 net/ipv4/tcp_ipv4.c                           |   2 +-
 net/ipv4/udp.c                                |  59 +-
 net/ipv4/udp_impl.h                           |   2 +-
 net/ipv4/udplite.c                            |   4 +-
 net/ipv6/inet6_hashtables.c                   |   5 +
 net/ipv6/udp.c                                |  54 +-
 net/ipv6/udp_impl.h                           |   2 +-
 net/ipv6/udplite.c                            |   2 +-
 tools/include/uapi/linux/bpf.h                |  58 +-
 tools/lib/bpf/libbpf.c                        |   4 +
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
 .../selftests/bpf/progs/inet_lookup_progs.c   |  78 ++
 .../testing/selftests/bpf/test_inet_lookup.c  | 522 +++++++++++++
 .../testing/selftests/bpf/test_inet_lookup.sh |  35 +
 .../selftests/bpf/verifier/ctx_inet_lookup.c  | 696 ++++++++++++++++++
 34 files changed, 1974 insertions(+), 108 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/inet_lookup_progs.c
 create mode 100644 tools/testing/selftests/bpf/test_inet_lookup.c
 create mode 100755 tools/testing/selftests/bpf/test_inet_lookup.sh
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c

-- 
2.20.1


^ permalink raw reply

* [RFCv2 bpf-next 01/12] flow_dissector: Extract attach/detach/query helpers
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Move generic parts of callbacks for querying, attaching, and detaching a
single BPF program for reuse by other BPF program types.

Subsequent patch makes use of the extracted routines.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf.h       |  8 +++++
 net/core/filter.c         | 73 +++++++++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c | 65 ++++++----------------------------
 3 files changed, 92 insertions(+), 54 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b9d22338606..b301e0c03a8c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@ struct sock;
 struct seq_file;
 struct btf;
 struct btf_type;
+struct mutex;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1145,4 +1146,11 @@ static inline u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type,
 }
 #endif /* CONFIG_INET */
 
+int bpf_prog_query_one(struct bpf_prog __rcu **pprog,
+		       const union bpf_attr *attr,
+		       union bpf_attr __user *uattr);
+int bpf_prog_attach_one(struct bpf_prog __rcu **pprog, struct mutex *lock,
+			struct bpf_prog *prog, u32 flags);
+int bpf_prog_detach_one(struct bpf_prog __rcu **pprog, struct mutex *lock);
+
 #endif /* _LINUX_BPF_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index 0c1059cdad3d..a498fbaa2d50 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8668,6 +8668,79 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 	return ret;
 }
 
+int bpf_prog_query_one(struct bpf_prog __rcu **pprog,
+		       const union bpf_attr *attr,
+		       union bpf_attr __user *uattr)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	u32 prog_id, prog_cnt = 0, flags = 0;
+	struct bpf_prog *attached;
+
+	if (attr->query.query_flags)
+		return -EINVAL;
+
+	rcu_read_lock();
+	attached = rcu_dereference(*pprog);
+	if (attached) {
+		prog_cnt = 1;
+		prog_id = attached->aux->id;
+	}
+	rcu_read_unlock();
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int bpf_prog_attach_one(struct bpf_prog __rcu **pprog, struct mutex *lock,
+			struct bpf_prog *prog, u32 flags)
+{
+	struct bpf_prog *attached;
+
+	if (flags)
+		return -EINVAL;
+
+	mutex_lock(lock);
+	attached = rcu_dereference_protected(*pprog,
+					     lockdep_is_held(lock));
+	if (attached) {
+		/* Only one BPF program can be attached at a time */
+		mutex_unlock(lock);
+		return -EEXIST;
+	}
+	rcu_assign_pointer(*pprog, prog);
+	mutex_unlock(lock);
+
+	return 0;
+}
+
+int bpf_prog_detach_one(struct bpf_prog __rcu **pprog, struct mutex *lock)
+{
+	struct bpf_prog *attached;
+
+	mutex_lock(lock);
+	attached = rcu_dereference_protected(*pprog,
+					     lockdep_is_held(lock));
+	if (!attached) {
+		mutex_unlock(lock);
+		return -ENOENT;
+	}
+	RCU_INIT_POINTER(*pprog, NULL);
+	bpf_prog_put(attached);
+	mutex_unlock(lock);
+
+	return 0;
+}
+
 #ifdef CONFIG_INET
 struct sk_reuseport_kern {
 	struct sk_buff *skb;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 9741b593ea53..c51602158906 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -73,80 +73,37 @@ EXPORT_SYMBOL(skb_flow_dissector_init);
 int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
-	struct bpf_prog *attached;
 	struct net *net;
-
-	if (attr->query.query_flags)
-		return -EINVAL;
+	int ret;
 
 	net = get_net_ns_by_fd(attr->query.target_fd);
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	rcu_read_lock();
-	attached = rcu_dereference(net->flow_dissector_prog);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
+	ret = bpf_prog_query_one(&net->flow_dissector_prog, attr, uattr);
 
 	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
+	return ret;
 }
 
 int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 				       struct bpf_prog *prog)
 {
-	struct bpf_prog *attached;
-	struct net *net;
+	struct net *net = current->nsproxy->net_ns;
 
-	net = current->nsproxy->net_ns;
-	mutex_lock(&flow_dissector_mutex);
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (attached) {
-		/* Only one BPF program can be attached at a time */
-		mutex_unlock(&flow_dissector_mutex);
-		return -EEXIST;
-	}
-	rcu_assign_pointer(net->flow_dissector_prog, prog);
-	mutex_unlock(&flow_dissector_mutex);
-	return 0;
+	return bpf_prog_attach_one(&net->flow_dissector_prog,
+				   &flow_dissector_mutex, prog,
+				   attr->attach_flags);
 }
 
 int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 {
-	struct bpf_prog *attached;
-	struct net *net;
+	struct net *net = current->nsproxy->net_ns;
 
-	net = current->nsproxy->net_ns;
-	mutex_lock(&flow_dissector_mutex);
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (!attached) {
-		mutex_unlock(&flow_dissector_mutex);
-		return -ENOENT;
-	}
-	bpf_prog_put(attached);
-	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
-	mutex_unlock(&flow_dissector_mutex);
-	return 0;
+	return bpf_prog_detach_one(&net->flow_dissector_prog,
+				   &flow_dissector_mutex);
 }
+
 /**
  * skb_flow_get_be16 - extract be16 entity
  * @skb: sk_buff to extract from
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 02/12] bpf: Introduce inet_lookup program type for redirecting socket lookup
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Add a new program type for redirecting the listening/bound socket lookup
from BPF. The program attaches to a network namespace. It is allowed to
select a socket from a SOCKARRAY, which will be used as a result of socket
lookup.

This provides a mechanism for programming the mapping between
local (address, port) pairs and listening/receiving sockets.

The program receives the 4-tuple, as well as the IP version and L4
protocol, of the packet that triggered the lookup as its context for making
a decision.

The netns-attached program is not called anywhere yet. Following patches
hook it up to ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf_types.h   |   1 +
 include/linux/filter.h      |  18 +++
 include/net/net_namespace.h |   2 +
 include/uapi/linux/bpf.h    |  58 ++++++++-
 kernel/bpf/syscall.c        |  10 ++
 kernel/bpf/verifier.c       |   7 +-
 net/core/filter.c           | 231 ++++++++++++++++++++++++++++++++++++
 7 files changed, 325 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 36a9c2325176..cc5c4ece748a 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -37,6 +37,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
 #endif
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport)
+BPF_PROG_TYPE(BPF_PROG_TYPE_INET_LOOKUP, inet_lookup)
 #endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 92c6e31fb008..5b1b3b754c28 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1229,4 +1229,22 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+struct bpf_inet_lookup_kern {
+	unsigned short	family;
+	u8		protocol;
+	__be32		saddr;
+	struct in6_addr	saddr6;
+	__be16		sport;
+	__be32		daddr;
+	struct in6_addr	daddr6;
+	unsigned short	hnum;
+	struct sock	*redir_sk;
+};
+
+int inet_lookup_bpf_prog_attach(const union bpf_attr *attr,
+				struct bpf_prog *prog);
+int inet_lookup_bpf_prog_detach(const union bpf_attr *attr);
+int inet_lookup_bpf_prog_query(const union bpf_attr *attr,
+			       union bpf_attr __user *uattr);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4a9da951a794..bd01147cc064 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -171,6 +171,8 @@ struct net {
 #ifdef CONFIG_XDP_SOCKETS
 	struct netns_xdp	xdp;
 #endif
+	struct bpf_prog __rcu	*inet_lookup_prog;
+
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
 } __randomize_layout;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b5889257cc33..639abfa96779 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -173,6 +173,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SYSCTL,
 	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 	BPF_PROG_TYPE_CGROUP_SOCKOPT,
+	BPF_PROG_TYPE_INET_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -199,6 +200,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP6_RECVMSG,
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
+	BPF_INET_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2747,6 +2749,33 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_redirect_lookup(struct bpf_inet_lookup *ctx, struct bpf_map *sockarray, void *key, u64 flags)
+ *	Description
+ *		Select a socket referenced by *map* (of type
+ *		**BPF_MAP_TYPE_REUSEPORT_SOCKARRAY**) at index *key* to use as a
+ *		result of listening (TCP) or bound (UDP) socket lookup.
+ *
+ *		The IP family and L4 protocol in *ctx* object, populated from
+ *		the packet that triggered the lookup, must match the selected
+ *		socket's family and protocol. IP6_V6ONLY socket option is
+ *		honored.
+ *
+ *		To be used by **BPF_INET_LOOKUP** programs attached to the
+ *		network namespace. Program needs to return **BPF_REDIRECT**, the
+ *		helper's success return value, for the selected socket to be
+ *		actually used.
+ *
+ *	Return
+ *		**BPF_REDIRECT** on success, if the socket at index *key* was selected.
+ *
+ *		**-EINVAL** if *flags* are invalid (not zero).
+ *
+ *		**-ENOENT** if there is no socket at index *key*.
+ *
+ *		**-EPROTOTYPE** if *ctx->protocol* does not match the socket protocol.
+ *
+ *		**-EAFNOSUPPORT** if socket does not accept IP version in *ctx->family*.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2859,7 +2888,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(redirect_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3116,6 +3146,32 @@ struct bpf_tcp_sock {
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
 };
 
+/* User accessible data for inet_lookup programs.
+ * New fields must be added at the end.
+ */
+struct bpf_inet_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPROTO_TCP, IPPROTO_UDP */
+	__u32 remote_ip4;	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_ip6[4];	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip6[4];	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_port;	/* Allows 4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_port;	/* Allows 4-byte read and write.
+				 * Stored in host byte order.
+				 */
+};
+
 struct bpf_sock_tuple {
 	union {
 		struct {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c0f62fd67c6b..763f2352ff7f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1935,6 +1935,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_SETSOCKOPT:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
 		break;
+	case BPF_INET_LOOKUP:
+		ptype = BPF_PROG_TYPE_INET_LOOKUP;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1959,6 +1962,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_INET_LOOKUP:
+		ret = inet_lookup_bpf_prog_attach(attr, prog);
+		break;
 	default:
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 	}
@@ -2022,6 +2028,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_CGROUP_SETSOCKOPT:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
 		break;
+	case BPF_INET_LOOKUP:
+		return inet_lookup_bpf_prog_detach(attr);
 	default:
 		return -EINVAL;
 	}
@@ -2065,6 +2073,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
 		return skb_flow_dissector_prog_query(attr, uattr);
+	case BPF_INET_LOOKUP:
+		return inet_lookup_bpf_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 10c0ff93f52b..5717dd10cc4d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3494,7 +3494,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
-		if (func_id != BPF_FUNC_sk_select_reuseport)
+		if (func_id != BPF_FUNC_sk_select_reuseport &&
+		    func_id != BPF_FUNC_redirect_lookup)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_QUEUE:
@@ -3578,6 +3579,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
 			goto error;
 		break;
+	case BPF_FUNC_redirect_lookup:
+		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
+			goto error;
+		break;
 	default:
 		break;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index a498fbaa2d50..d9375a7e60f5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9007,4 +9007,235 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
 #endif /* CONFIG_INET */
+
+static DEFINE_MUTEX(inet_lookup_prog_mutex);
+
+BPF_CALL_4(redirect_lookup, struct bpf_inet_lookup_kern *, ctx,
+	   struct bpf_map *, map, void *, key, u64, flags)
+{
+	struct sock_reuseport *reuse;
+	struct sock *redir_sk;
+
+	if (unlikely(flags))
+		return -EINVAL;
+
+	/* Lookup socket in the map */
+	redir_sk = map->ops->map_lookup_elem(map, key);
+	if (!redir_sk)
+		return -ENOENT;
+
+	/* Check if socket got unhashed from sockets table, e.g. by
+	 * close(), after the above map_lookup_elem(). Treat it as
+	 * removed from the map.
+	 */
+	reuse = rcu_dereference(redir_sk->sk_reuseport_cb);
+	if (!reuse)
+		return -ENOENT;
+
+	/* Check protocol & family are a match */
+	if (ctx->protocol != redir_sk->sk_protocol)
+		return -EPROTOTYPE;
+	if (ctx->family != redir_sk->sk_family &&
+	    (redir_sk->sk_family == AF_INET || ipv6_only_sock(redir_sk)))
+		return -EAFNOSUPPORT;
+
+	/* Store socket in context */
+	ctx->redir_sk = redir_sk;
+
+	/* Signal redirect action */
+	return BPF_REDIRECT;
+}
+
+static const struct bpf_func_proto bpf_redirect_lookup_proto = {
+	.func		= redirect_lookup,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_PTR_TO_MAP_KEY,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+inet_lookup_func_proto(enum bpf_func_id func_id,
+		       const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_redirect_lookup:
+		return &bpf_redirect_lookup_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+int inet_lookup_bpf_prog_attach(const union bpf_attr *attr,
+				struct bpf_prog *prog)
+{
+	struct net *net = current->nsproxy->net_ns;
+
+	return bpf_prog_attach_one(&net->inet_lookup_prog,
+				   &inet_lookup_prog_mutex, prog,
+				   attr->attach_flags);
+}
+
+int inet_lookup_bpf_prog_detach(const union bpf_attr *attr)
+{
+	struct net *net = current->nsproxy->net_ns;
+
+	return bpf_prog_detach_one(&net->inet_lookup_prog,
+				   &inet_lookup_prog_mutex);
+}
+
+int inet_lookup_bpf_prog_query(const union bpf_attr *attr,
+			       union bpf_attr __user *uattr)
+{
+	struct net *net;
+	int ret;
+
+	net = get_net_ns_by_fd(attr->query.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	ret = bpf_prog_query_one(&net->inet_lookup_prog, attr, uattr);
+
+	put_net(net);
+	return ret;
+}
+
+static bool inet_lookup_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					const struct bpf_prog *prog,
+					struct bpf_insn_access_aux *info)
+{
+	const int size_default = sizeof(__u32);
+
+	if (off < 0 || off >= sizeof(struct bpf_inet_lookup))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct bpf_inet_lookup, remote_ip4):
+	case bpf_ctx_range(struct bpf_inet_lookup, local_ip4):
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				remote_ip6[0], remote_ip6[3]):
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				local_ip6[0], local_ip6[3]):
+		if (!bpf_ctx_narrow_access_ok(off, size, size_default))
+			return false;
+		bpf_ctx_record_field_size(info, size_default);
+		break;
+
+	case bpf_ctx_range(struct bpf_inet_lookup, family):
+	case bpf_ctx_range(struct bpf_inet_lookup, protocol):
+	case bpf_ctx_range(struct bpf_inet_lookup, remote_port):
+	case bpf_ctx_range(struct bpf_inet_lookup, local_port):
+		if (size != size_default)
+			return false;
+		break;
+
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+#define LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, OFF) ({			\
+	*insn++ = BPF_LDX_MEM(SIZE, si->dst_reg, si->src_reg,		\
+			      bpf_target_off(TYPE, FIELD,		\
+					     FIELD_SIZEOF(TYPE, FIELD),	\
+					     target_size) + (OFF));	\
+})
+
+#define LOAD_FIELD_SIZE(TYPE, FIELD, SIZE) \
+	LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, 0)
+
+#define LOAD_FIELD(TYPE, FIELD) \
+	LOAD_FIELD_SIZE(TYPE, FIELD, BPF_FIELD_SIZEOF(TYPE, FIELD))
+
+static u32 inet_lookup_convert_ctx_access(enum bpf_access_type type,
+					  const struct bpf_insn *si,
+					  struct bpf_insn *insn_buf,
+					  struct bpf_prog *prog,
+					  u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_inet_lookup, family):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, family);
+		break;
+
+	case offsetof(struct bpf_inet_lookup, protocol):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, protocol);
+		break;
+
+	case offsetof(struct bpf_inet_lookup, remote_ip4):
+		LOAD_FIELD_SIZE(struct bpf_inet_lookup_kern, saddr,
+				BPF_SIZE(si->code));
+		break;
+
+	case offsetof(struct bpf_inet_lookup, local_ip4):
+		LOAD_FIELD_SIZE(struct bpf_inet_lookup_kern, daddr,
+				BPF_SIZE(si->code));
+
+		break;
+
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				remote_ip6[0], remote_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_inet_lookup, remote_ip6[0]);
+
+		LOAD_FIELD_SIZE_OFF(struct bpf_inet_lookup_kern,
+				    saddr6.s6_addr32[0],
+				    BPF_SIZE(si->code), off);
+#else
+		(void)off;
+
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				local_ip6[0], local_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_inet_lookup, local_ip6[0]);
+
+		LOAD_FIELD_SIZE_OFF(struct bpf_inet_lookup_kern,
+				    daddr6.s6_addr32[0],
+				    BPF_SIZE(si->code), off);
+#else
+		(void)off;
+
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_inet_lookup, remote_port):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, sport);
+		break;
+
+	case offsetof(struct bpf_inet_lookup, local_port):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, hnum);
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_prog_ops inet_lookup_prog_ops = {
+};
+
+const struct bpf_verifier_ops inet_lookup_verifier_ops = {
+	.get_func_proto		= inet_lookup_func_proto,
+	.is_valid_access	= inet_lookup_is_valid_access,
+	.convert_ctx_access	= inet_lookup_convert_ctx_access,
+};
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 04/12] inet: Store layer 4 protocol in inet_hashinfo
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Make it possible to identify the protocol of the sockets stored in hashinfo
without looking up one.

Subsequent patches make use the new field at the socket lookup time to
enforce that the BPF program selects only sockets with matching protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 3 +++
 net/dccp/proto.c              | 2 +-
 net/ipv4/tcp_ipv4.c           | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index af2b4c065a04..b2d43ee72dc1 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -138,6 +138,9 @@ struct inet_hashinfo {
 	unsigned int			lhash2_mask;
 	struct inet_listen_hashbucket	*lhash2;
 
+	/* Layer 4 protocol of the stored sockets */
+	int				protocol;
+
 	/* All the above members are written once at bootup and
 	 * never written again _or_ are predominantly read-access.
 	 *
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 5bad08dc4316..805eee1b4fb0 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL_GPL(dccp_statistics);
 struct percpu_counter dccp_orphan_count;
 EXPORT_SYMBOL_GPL(dccp_orphan_count);
 
-struct inet_hashinfo dccp_hashinfo;
+struct inet_hashinfo dccp_hashinfo = { .protocol = IPPROTO_DCCP };
 EXPORT_SYMBOL_GPL(dccp_hashinfo);
 
 /* the maximum queue length for tx in packets. 0 is no limit */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fd394ad179a0..5d2afbcc45cc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -87,7 +87,7 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 			       __be32 daddr, __be32 saddr, const struct tcphdr *th);
 #endif
 
-struct inet_hashinfo tcp_hashinfo;
+struct inet_hashinfo tcp_hashinfo = { .protocol = IPPROTO_TCP };
 EXPORT_SYMBOL(tcp_hashinfo);
 
 static u32 tcp_v4_init_seq(const struct sk_buff *skb)
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 03/12] bpf: Add verifier tests for inet_lookup context access
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Exercise verifier access checks for bpf_inet_lookup context object fields.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/verifier/ctx_inet_lookup.c  | 696 ++++++++++++++++++
 1 file changed, 696 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c

diff --git a/tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c
new file mode 100644
index 000000000000..665984d8179d
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c
@@ -0,0 +1,696 @@
+{
+	"valid 1,2,4-byte read bpf_inet_lookup remote_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup remote_ip4",
+	.insns = {
+		/* 8-byte read */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 4-byte write */
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 4-byte write */
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 2-byte write */
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 1-byte write */
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_inet_lookup local_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4) + 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_inet_lookup remote_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup,
+				     remote_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup,
+				     remote_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_inet_lookup local_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 05/12] udp: Store layer 4 protocol in udp_table
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Because UDP and UDP-Lite share code we have to pass the L4 protocol
identifier alongside the socket table to down call sites where
distinguishing between the two is needed.

There is currently only one such call site, which by itself is not reason
enough for the change.

However, subsequent patches will also make use the new udp_table field
inside the socket lookup routine to enforce that the BPF program selects
only sockets with matching protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/udp.h   | 10 ++++++----
 net/ipv4/udp.c      | 15 +++++++--------
 net/ipv4/udp_impl.h |  2 +-
 net/ipv4/udplite.c  |  4 ++--
 net/ipv6/udp.c      | 12 ++++++------
 net/ipv6/udp_impl.h |  2 +-
 net/ipv6/udplite.c  |  2 +-
 7 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 79d141d2103b..97778976c5ec 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -63,16 +63,18 @@ struct udp_hslot {
 /**
  *	struct udp_table - UDP table
  *
- *	@hash:	hash table, sockets are hashed on (local port)
- *	@hash2:	hash table, sockets are hashed on (local port, local address)
- *	@mask:	number of slots in hash tables, minus 1
- *	@log:	log2(number of slots in hash table)
+ *	@hash:		hash table, sockets are hashed on (local port)
+ *	@hash2:		hash table, sockets are hashed on (local port, local address)
+ *	@mask:		number of slots in hash tables, minus 1
+ *	@log:		log2(number of slots in hash table)
+ *	@protocol:	layer 4 protocol of the stored sockets
  */
 struct udp_table {
 	struct udp_hslot	*hash;
 	struct udp_hslot	*hash2;
 	unsigned int		mask;
 	unsigned int		log;
+	int			protocol;
 };
 extern struct udp_table udp_table;
 void udp_table_init(struct udp_table *, const char *);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d88821c794fb..9fffe9e9eec6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,7 +113,7 @@
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
 
-struct udp_table udp_table __read_mostly;
+struct udp_table udp_table __read_mostly = { .protocol = IPPROTO_UDP };
 EXPORT_SYMBOL(udp_table);
 
 long sysctl_udp_mem[3] __read_mostly;
@@ -2106,8 +2106,7 @@ EXPORT_SYMBOL(udp_sk_rx_dst_set);
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
-				    struct udp_table *udptable,
-				    int proto)
+				    struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	unsigned short hnum = ntohs(uh->dest);
@@ -2163,7 +2162,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				proto == IPPROTO_UDPLITE);
+				udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -2240,8 +2239,7 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
  *	All we need to do is get the socket, and then do a checksum.
  */
 
-int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	struct sock *sk;
 	struct udphdr *uh;
@@ -2249,6 +2247,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	struct rtable *rt = skb_rtable(skb);
 	__be32 saddr, daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 
 	/*
 	 *  Validate the packet.
@@ -2289,7 +2288,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 		return __udp4_lib_mcast_deliver(net, skb, uh,
-						saddr, daddr, udptable, proto);
+						saddr, daddr, udptable);
 
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	if (sk)
@@ -2463,7 +2462,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
 
 int udp_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp4_lib_rcv(skb, &udp_table);
 }
 
 void udp_destroy_sock(struct sock *sk)
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index 6b2fa77eeb1c..7013535f9084 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -6,7 +6,7 @@
 #include <net/protocol.h>
 #include <net/inet_common.h>
 
-int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp4_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp4_lib_err(struct sk_buff *, u32, struct udp_table *);
 
 int udp_v4_get_port(struct sock *sk, unsigned short snum);
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 5936d66d1ce2..4e4e85de95b2 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -14,12 +14,12 @@
 #include <linux/proc_fs.h>
 #include "udp_impl.h"
 
-struct udp_table 	udplite_table __read_mostly;
+struct udp_table udplite_table __read_mostly = { .protocol = IPPROTO_UDPLITE };
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp4_lib_rcv(skb, &udplite_table);
 }
 
 static int udplite_err(struct sk_buff *skb, u32 info)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 827fe7385078..16ef2303bd8d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -741,7 +741,7 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
  */
 static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		struct udp_table *udptable, int proto)
+		struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	const struct udphdr *uh = udp_hdr(skb);
@@ -803,7 +803,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				 proto == IPPROTO_UDPLITE);
+				 udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -836,11 +836,11 @@ static int udp6_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 	return 0;
 }
 
-int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	const struct in6_addr *saddr, *daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 	struct udphdr *uh;
 	struct sock *sk;
 	u32 ulen = 0;
@@ -902,7 +902,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	if (ipv6_addr_is_multicast(daddr))
 		return __udp6_lib_mcast_deliver(net, skb,
-				saddr, daddr, udptable, proto);
+				saddr, daddr, udptable);
 
 	/* Unicast */
 	sk = __udp6_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
@@ -1011,7 +1011,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
 
 INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp6_lib_rcv(skb, &udp_table);
 }
 
 /*
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 20e324b6f358..acd5a942c633 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -8,7 +8,7 @@
 #include <net/inet_common.h>
 #include <net/transp_v6.h>
 
-int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp6_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int,
 		   __be32, struct udp_table *);
 
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index bf7a7acd39b1..f442ed595e6f 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -14,7 +14,7 @@
 
 static int udplitev6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp6_lib_rcv(skb, &udplite_table);
 }
 
 static int udplitev6_err(struct sk_buff *skb,
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 06/12] inet: Run inet_lookup bpf program on socket lookup
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Run a BPF program before looking up the listening socket. The program can
redirect the skb to a listening socket of its choice, providing it calls
bpf_redirect_lookup() helper and returns BPF_REDIRECT.

This lets the user-space program mappings between packet 4-tuple and
listening sockets. With the possibility to override the socket lookup from
BPF, applications don't need to bind sockets to every addresses they
receive on, or resort to listening on all addresses with INADDR_ANY.

Also port sharing conflicts become a non-issue. Application can listen on
any free port and still receive traffic destined to its assigned service
port.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 33 +++++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c    |  5 +++++
 2 files changed, 38 insertions(+)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index b2d43ee72dc1..c9c7efb961cb 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -417,4 +417,37 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk);
+
+static inline struct sock *__inet_lookup_run_bpf(const struct net *net,
+						 struct bpf_inet_lookup_kern *ctx)
+{
+	struct bpf_prog *prog;
+	int ret = BPF_OK;
+
+	rcu_read_lock();
+	prog = rcu_dereference(net->inet_lookup_prog);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, ctx);
+	rcu_read_unlock();
+
+	return ret == BPF_REDIRECT ? ctx->redir_sk : NULL;
+}
+
+static inline struct sock *inet_lookup_run_bpf(const struct net *net, u8 proto,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr,
+					       unsigned short hnum)
+{
+	struct bpf_inet_lookup_kern ctx = {
+		.family		= AF_INET,
+		.protocol	= proto,
+		.saddr		= saddr,
+		.sport		= sport,
+		.daddr		= daddr,
+		.hnum		= hnum,
+	};
+
+	return __inet_lookup_run_bpf(net, &ctx);
+}
+
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 97824864e40d..ab6d89c27c94 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -299,6 +299,11 @@ struct sock *__inet_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	result = inet_lookup_run_bpf(net, hashinfo->protocol,
+				     saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 09/12] udp6: Run inet_lookup bpf program on socket lookup
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Same as for udp, split the socket lookup into two phases and let the BPF
inet_lookup program select the receiving socket.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/udp.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 16ef2303bd8d..7380cf57e88c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -101,7 +101,7 @@ void udp_v6_rehash(struct sock *sk)
 static int compute_score(struct sock *sk, struct net *net,
 			 const struct in6_addr *saddr, __be16 sport,
 			 const struct in6_addr *daddr, unsigned short hnum,
-			 int dif, int sdif)
+			 int dif, int sdif, unsigned char state)
 {
 	int score;
 	struct inet_sock *inet;
@@ -112,6 +112,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	    sk->sk_family != PF_INET6)
 		return -1;
 
+	if (state && sk->sk_state != state)
+		return -1;
+
 	if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr))
 		return -1;
 
@@ -146,7 +149,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
 		const struct in6_addr *daddr, unsigned int hnum,
 		int dif, int sdif, struct udp_hslot *hslot2,
-		struct sk_buff *skb)
+		struct sk_buff *skb, unsigned char state)
 {
 	struct sock *sk, *result;
 	int score, badness;
@@ -156,7 +159,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 	badness = -1;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
 		score = compute_score(sk, net, saddr, sport,
-				      daddr, hnum, dif, sdif);
+				      daddr, hnum, dif, sdif, state);
 		if (score > badness) {
 			if (sk->sk_reuseport) {
 				hash = udp6_ehashfn(net, daddr, hnum,
@@ -190,19 +193,34 @@ struct sock *__udp6_lib_lookup(struct net *net,
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
-				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-		slot2 = hash2 & udptable->mask;
+				  hslot2, skb, TCP_ESTABLISHED);
+	if (result)
+		goto done;
 
-		hslot2 = &udptable->hash2[slot2];
+	/* Lookup redirect from BPF */
+	result = inet6_lookup_run_bpf(net, udptable->protocol,
+				      saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
 
-		result = udp6_lib_lookup2(net, saddr, sport,
-					  &in6addr_any, hnum, dif, sdif,
-					  hslot2, skb);
-	}
+	/* Lookup bound sockets */
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  daddr, hnum, dif, sdif,
+				  hslot2, skb, 0);
+	if (result)
+		goto done;
+
+	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  &in6addr_any, hnum, dif, sdif,
+				  hslot2, skb, 0);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 10/12] bpf: Sync linux/bpf.h to tools/
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Newly added program, context type and helper is used by tests in a
subsequent patch. Synchronize the header file.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/include/uapi/linux/bpf.h | 58 +++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b5889257cc33..58ee3d24a430 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -173,6 +173,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SYSCTL,
 	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 	BPF_PROG_TYPE_CGROUP_SOCKOPT,
+	BPF_PROG_TYPE_INET_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -199,6 +200,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP6_RECVMSG,
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
+	BPF_INET_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2747,6 +2749,33 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_redirect_lookup(struct bpf_inet_lookup_kern *ctx, struct bpf_map *sockarray, void *key, u64 flags)
+ *	Description
+ *		Select a socket referenced by *map* (of type
+ *		**BPF_MAP_TYPE_REUSEPORT_SOCKARRAY**) at index *key* to use as a
+ *		result of listening (TCP) or bound (UDP) socket lookup.
+ *
+ *		The IP family and L4 protocol in *ctx* object, populated from
+ *		the packet that triggered the lookup, must match the selected
+ *		socket's family and protocol. IP6_V6ONLY socket option is
+ *		honored.
+ *
+ *		To be used by **BPF_INET_LOOKUP** programs attached to the
+ *		network namespace. Program needs to return **BPF_REDIRECT**, the
+ *		helper's success return value, for the selected socket to be
+ *		actually used.
+ *
+ *	Return
+ *		**BPF_REDIRECT** on success, if the socket at index *key* was selected.
+ *
+ *		**-EINVAL** if *flags* are invalid (not zero).
+ *
+ *		**-ENOENT** if there is no socket at index *key*.
+ *
+ *		**-EPROTOTYPE** if *ctx->protocol* does not match the socket protocol.
+ *
+ *		**-EAFNOSUPPORT** if socket does not accept IP version in *ctx->family*.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2859,7 +2888,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(redirect_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3116,6 +3146,32 @@ struct bpf_tcp_sock {
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
 };
 
+/* User accessible data for inet_lookup programs.
+ * New fields must be added at the end.
+ */
+struct bpf_inet_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPROTO_TCP, IPPROTO_UDP */
+	__u32 remote_ip4;	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_ip6[4];	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip6[4];	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_port;	/* Allows 4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_port;	/* Allows 4-byte read and write.
+				 * Stored in host byte order.
+				 */
+};
+
 struct bpf_sock_tuple {
 	union {
 		struct {
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 11/12] libbpf: Add support for inet_lookup program type
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Make libbpf aware of the newly added program type. Reserve a section name
for it.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/lib/bpf/libbpf.c        | 4 ++++
 tools/lib/bpf/libbpf.h        | 2 ++
 tools/lib/bpf/libbpf.map      | 2 ++
 tools/lib/bpf/libbpf_probes.c | 1 +
 4 files changed, 9 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2233f919dd88..addb9762e965 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3580,6 +3580,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_INET_LOOKUP:
 		return false;
 	case BPF_PROG_TYPE_KPROBE:
 	default:
@@ -4447,6 +4448,7 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(raw_tracepoint, BPF_PROG_TYPE_RAW_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
+BPF_PROG_TYPE_FNS(inet_lookup, BPF_PROG_TYPE_INET_LOOKUP);
 
 void bpf_program__set_expected_attach_type(struct bpf_program *prog,
 					   enum bpf_attach_type type)
@@ -4542,6 +4544,8 @@ static const struct {
 						BPF_CGROUP_GETSOCKOPT),
 	BPF_EAPROG_SEC("cgroup/setsockopt",	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 						BPF_CGROUP_SETSOCKOPT),
+	BPF_EAPROG_SEC("inet_lookup",		BPF_PROG_TYPE_INET_LOOKUP,
+						BPF_INET_LOOKUP),
 };
 
 #undef BPF_PROG_SEC_IMPL
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e8f70977d137..937d6da9430a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -262,6 +262,7 @@ LIBBPF_API int bpf_program__set_sched_cls(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_sched_act(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_xdp(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
+LIBBPF_API int bpf_program__set_inet_lookup(struct bpf_program *prog);
 LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
 				      enum bpf_prog_type type);
 LIBBPF_API void
@@ -276,6 +277,7 @@ LIBBPF_API bool bpf_program__is_sched_cls(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_sched_act(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_xdp(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
+LIBBPF_API bool bpf_program__is_inet_lookup(const struct bpf_program *prog);
 
 /*
  * No need for __attribute__((packed)), all members of 'bpf_map_def'
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 664ce8e7a60e..57564ad458ba 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -67,6 +67,7 @@ LIBBPF_0.0.1 {
 		bpf_prog_test_run;
 		bpf_prog_test_run_xattr;
 		bpf_program__fd;
+		bpf_program__is_inet_lookup;
 		bpf_program__is_kprobe;
 		bpf_program__is_perf_event;
 		bpf_program__is_raw_tracepoint;
@@ -84,6 +85,7 @@ LIBBPF_0.0.1 {
 		bpf_program__priv;
 		bpf_program__set_expected_attach_type;
 		bpf_program__set_ifindex;
+		bpf_program__set_inet_lookup;
 		bpf_program__set_kprobe;
 		bpf_program__set_perf_event;
 		bpf_program__set_prep;
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 4b0b0364f5fc..c365223a2d1e 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -102,6 +102,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_INET_LOOKUP:
 	default:
 		break;
 	}
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 07/12] inet6: Run inet_lookup bpf program on socket lookup
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Following the ipv4 changes, run a BPF program attached to netns in context
of which we're doing the socket lookup so that it can redirect the skb to a
socket of its choice. The program runs before the listening socket lookup.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet6_hashtables.h | 19 +++++++++++++++++++
 net/ipv6/inet6_hashtables.c    |  5 +++++
 2 files changed, 24 insertions(+)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index fe96bf247aac..c2393d148d8d 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -104,6 +104,25 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
 			  const int dif);
 
 int inet6_hash(struct sock *sk);
+
+static inline struct sock *inet6_lookup_run_bpf(struct net *net, u8 proto,
+						const struct in6_addr *saddr,
+						__be16 sport,
+						const struct in6_addr *daddr,
+						unsigned short hnum)
+{
+	struct bpf_inet_lookup_kern ctx = {
+		.family		= AF_INET6,
+		.protocol	= proto,
+		.saddr6		= *saddr,
+		.sport		= sport,
+		.daddr6		= *daddr,
+		.hnum		= hnum,
+	};
+
+	return __inet_lookup_run_bpf(net, &ctx);
+}
+
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
 #define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif, __sdif) \
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index cf60fae9533b..40dd0a3d80ed 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -157,6 +157,11 @@ struct sock *inet6_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	result = inet6_lookup_run_bpf(net, hashinfo->protocol,
+				      saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
+
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 08/12] udp: Run inet_lookup bpf program on socket lookup
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Following the TCP socket lookup changes, allow selecting the receiving
socket from BPF before searching for bound socket by destination address
and port.

As connected and bound but non-connected socket lookup currently happens in
one step, we split the lookup in two phases to run BPF only after a lookup
for a connected socket was a miss. Hence making sure connected UDP sockets
continue to work as expected in presence of a BPF inet_lookup program.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9fffe9e9eec6..3a4b98f89249 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -353,7 +353,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum)
 static int compute_score(struct sock *sk, struct net *net,
 			 __be32 saddr, __be16 sport,
 			 __be32 daddr, unsigned short hnum,
-			 int dif, int sdif)
+			 int dif, int sdif, unsigned char state)
 {
 	int score;
 	struct inet_sock *inet;
@@ -364,6 +364,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	    ipv6_only_sock(sk))
 		return -1;
 
+	if (state && sk->sk_state != state)
+		return -1;
+
 	if (sk->sk_rcv_saddr != daddr)
 		return -1;
 
@@ -411,7 +414,8 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 daddr, unsigned int hnum,
 				     int dif, int sdif,
 				     struct udp_hslot *hslot2,
-				     struct sk_buff *skb)
+				     struct sk_buff *skb,
+				     unsigned char state)
 {
 	struct sock *sk, *result;
 	int score, badness;
@@ -421,7 +425,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	badness = 0;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
 		score = compute_score(sk, net, saddr, sport,
-				      daddr, hnum, dif, sdif);
+				      daddr, hnum, dif, sdif, state);
 		if (score > badness) {
 			if (sk->sk_reuseport) {
 				hash = udp_ehashfn(net, daddr, hnum,
@@ -454,18 +458,34 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected sockets */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
-				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-		slot2 = hash2 & udptable->mask;
-		hslot2 = &udptable->hash2[slot2];
+				  hslot2, skb, TCP_ESTABLISHED);
+	if (result)
+		goto done;
 
-		result = udp4_lib_lookup2(net, saddr, sport,
-					  htonl(INADDR_ANY), hnum, dif, sdif,
-					  hslot2, skb);
-	}
+	/* Lookup redirect from BPF */
+	result = inet_lookup_run_bpf(net, udptable->protocol,
+				     saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
+
+	/* Lookup bound sockets */
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  daddr, hnum, dif, sdif,
+				  hslot2, skb, 0);
+	if (result)
+		goto done;
+
+	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  htonl(INADDR_ANY), hnum, dif, sdif,
+				  hslot2, skb, 0);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.20.1


^ permalink raw reply related

* [RFCv2 bpf-next 12/12] bpf: Test redirecting listening/receiving socket lookup
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski
In-Reply-To: <20190828072250.29828-1-jakub@cloudflare.com>

Check that steering the packets targeted at a local (address, port) that is
different than the server's bind() address with a BPF inet_lookup program
works as expected for TCP or UDP over either IPv4 or IPv6. Make sure that
it is possible to redirect IPv4 packets to IPv6 sockets that are not
V6-only.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
 .../selftests/bpf/progs/inet_lookup_progs.c   |  78 +++
 .../testing/selftests/bpf/test_inet_lookup.c  | 522 ++++++++++++++++++
 .../testing/selftests/bpf/test_inet_lookup.sh |  35 ++
 6 files changed, 642 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/inet_lookup_progs.c
 create mode 100644 tools/testing/selftests/bpf/test_inet_lookup.c
 create mode 100755 tools/testing/selftests/bpf/test_inet_lookup.sh

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 60c9338cd9b4..7442bd9166c7 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -44,3 +44,4 @@ test_sockopt_sk
 test_sockopt_multi
 test_sockopt_inherit
 test_tcp_rtt
+test_inet_lookup
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 7a23d94fe6a9..89dbbc032c8f 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -65,7 +65,8 @@ TEST_PROGS := test_kmod.sh \
 	test_tcp_check_syncookie.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
-	test_xdping.sh
+	test_xdping.sh \
+	test_inet_lookup.sh
 
 TEST_PROGS_EXTENDED := with_addr.sh \
 	with_tunnels.sh \
@@ -75,7 +76,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
-	test_lirc_mode2_user
+	test_lirc_mode2_user test_inet_lookup
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 6c4930bc6e2e..dda00609098a 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
 static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
 					  int ip_len, void *tcp, int tcp_len) =
 	(void *) BPF_FUNC_tcp_gen_syncookie;
+static int (*bpf_redirect_lookup)(void *ctx, void *map, void *key,
+				  __u64 flags) =
+	(void *) BPF_FUNC_redirect_lookup;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/progs/inet_lookup_progs.c b/tools/testing/selftests/bpf/progs/inet_lookup_progs.c
new file mode 100644
index 000000000000..16b1b2e241e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/inet_lookup_progs.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include "bpf_endian.h"
+#include "bpf_helpers.h"
+
+#define IP4(a, b, c, d)	((__u32)(		\
+	((__u32)((a) & (__u32)0xffUL) << 24) |	\
+	((__u32)((b) & (__u32)0xffUL) << 16) |	\
+	((__u32)((c) & (__u32)0xffUL) <<  8) |	\
+	((__u32)((d) & (__u32)0xffUL) <<  0)))
+
+#define REUSEPORT_ARRAY_SIZE 32
+
+struct {
+	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
+	__uint(max_entries, REUSEPORT_ARRAY_SIZE);
+	__type(key, __u32);
+	__type(value, __u64);
+} redir_map SEC(".maps");
+
+static const __u32 DST_PORT = 7007;
+static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
+static const __u32 DST_IP6[] = { 0xfd000000, 0x0, 0x0, 0x00000001 };
+
+/* Redirect packets destined for port DST_PORT to socket at redir_map[0]. */
+SEC("inet_lookup/redir_port")
+int redir_port(struct bpf_inet_lookup *ctx)
+{
+	__u32 index = 0;
+	__u64 flags = 0;
+
+	if (ctx->local_port != DST_PORT)
+		return BPF_OK;
+
+	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
+}
+
+/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
+SEC("inet_lookup/redir_ip4")
+int redir_ip4(struct bpf_inet_lookup *ctx)
+{
+	__u32 index = 0;
+	__u64 flags = 0;
+
+	if (ctx->family != AF_INET)
+		return BPF_OK;
+	if (ctx->local_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->local_ip4 != bpf_htonl(DST_IP4))
+		return BPF_OK;
+
+	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
+}
+
+/* Redirect packets destined for DST_IP6 address to socket at redir_map[0]. */
+SEC("inet_lookup/redir_ip6")
+int redir_ip6(struct bpf_inet_lookup *ctx)
+{
+	__u32 index = 0;
+	__u64 flags = 0;
+
+	if (ctx->family != AF_INET6)
+		return BPF_OK;
+	if (ctx->local_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->local_ip6[0] != bpf_htonl(DST_IP6[0]) ||
+	    ctx->local_ip6[1] != bpf_htonl(DST_IP6[1]) ||
+	    ctx->local_ip6[2] != bpf_htonl(DST_IP6[2]) ||
+	    ctx->local_ip6[3] != bpf_htonl(DST_IP6[3]))
+		return BPF_OK;
+
+	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/test_inet_lookup.c b/tools/testing/selftests/bpf/test_inet_lookup.c
new file mode 100644
index 000000000000..7e222488514c
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_inet_lookup.c
@@ -0,0 +1,522 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * L7 echo tests with the server listening/receiving at a different
+ * (address, port) than the client sends packets to.
+ *
+ * Traffic is steered to the server socket by redirecting the socket
+ * lookup with an eBPF inet_lookup program. The inet_lookup program,
+ * selected a target listening/bound socket from SOCKARRAY map based
+ * on the packet's 4-tuple.
+ */
+
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+
+#define BPF_FILE	"./inet_lookup_progs.o"
+#define MAX_ERROR_LEN	256
+
+/* External (address, port) pairs the client sends packets to. */
+#define EXT_IP4		"127.0.0.1"
+#define EXT_IP6		"fd00::1"
+#define EXT_PORT	7007
+
+/* Internal (address, port) pairs the server listens/receives at. */
+#define INT_IP4		"127.0.0.2"
+#define INT_IP4_V6	"::ffff:127.0.0.2"
+#define INT_IP6		"fd00::2"
+#define INT_PORT	8008
+
+#define REUSEPORT_ARRAY_SIZE 32
+
+struct inet_addr {
+	const char *ip;
+	unsigned short port;
+};
+
+struct test {
+	const char *desc;
+	const char *bpf_prog;
+
+	int socket_type;
+
+	struct inet_addr send_to;
+	struct inet_addr recv_at;
+};
+
+static const struct test tests[] = {
+	{
+		.desc		= "TCP IPv4 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { EXT_IP4, INT_PORT },
+	},
+	{
+		.desc		= "TCP IPv4 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip4",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { INT_IP4, EXT_PORT },
+	},
+	{
+		.desc		= "TCP IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { EXT_IP6, INT_PORT },
+	},
+	{
+		.desc		= "TCP IPv6 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip6",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { INT_IP6, EXT_PORT },
+	},
+	{
+		.desc		= "TCP IPv4->IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_STREAM,
+		.recv_at	= { INT_IP4_V6, INT_PORT },
+		.send_to	= { EXT_IP4, EXT_PORT },
+	},
+	{
+		.desc		= "UDP IPv4 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { EXT_IP4, INT_PORT },
+	},
+	{
+		.desc		= "UDP IPv4 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip4",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { INT_IP4, EXT_PORT },
+	},
+	{
+		.desc		= "UDP IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { EXT_IP6, INT_PORT },
+	},
+	{
+		.desc		= "UDP IPv6 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip6",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { INT_IP6, EXT_PORT },
+	},
+	{
+		.desc		= "UDP IPv4->IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_DGRAM,
+		.recv_at	= { INT_IP4_V6, INT_PORT },
+		.send_to	= { EXT_IP4, EXT_PORT },
+	},
+};
+
+static bool is_ipv6_addr(const char *ip)
+{
+	return !!strchr(ip, ':');
+}
+
+static void make_addr(int family, const char *ip, int port,
+		      struct sockaddr_storage *ss, int *sz)
+{
+	struct sockaddr_in *addr4;
+	struct sockaddr_in6 *addr6;
+
+	switch (family) {
+	case AF_INET:
+		addr4 = (struct sockaddr_in *)ss;
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (!inet_pton(AF_INET, ip, &addr4->sin_addr))
+			error(1, errno, "inet_pton failed: %s", ip);
+		*sz = sizeof(*addr4);
+		break;
+	case AF_INET6:
+		addr6 = (struct sockaddr_in6 *)ss;
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (!inet_pton(AF_INET6, ip, &addr6->sin6_addr))
+			error(1, errno, "inet_pton failed: %s", ip);
+		*sz = sizeof(*addr6);
+		break;
+	default:
+		error(1, 0, "unsupported family %d", family);
+	}
+}
+
+static int make_server(int type, const char *ip, int port)
+{
+	struct sockaddr_storage ss = {0};
+	int fd, opt, sz;
+	int family;
+
+	family = is_ipv6_addr(ip) ? AF_INET6 : AF_INET;
+	make_addr(family, ip, port, &ss, &sz);
+
+	fd = socket(family, type, 0);
+	if (fd < 0)
+		error(1, errno, "failed to create listen socket");
+
+	opt = 1;
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)))
+		error(1, errno, "failed to set SO_REUSEPORT");
+	if (type == SOCK_DGRAM) {
+		if (setsockopt(fd, SOL_IP, IP_RECVORIGDSTADDR,
+			       &opt, sizeof(opt)))
+			error(1, errno, "failed to set IP_RECVORIGDSTADDR");
+	}
+	if (family == AF_INET6 && type == SOCK_DGRAM) {
+		if (setsockopt(fd, SOL_IPV6, IPV6_RECVORIGDSTADDR,
+			       &opt, sizeof(opt)))
+			error(1, errno, "failed to set IPV6_RECVORIGDSTADDR");
+	}
+
+	if (bind(fd, (struct sockaddr *)&ss, sz))
+		error(1, errno, "failed to bind listen socket");
+
+	if (type == SOCK_STREAM && listen(fd, 1))
+		error(1, errno, "failed to listen on port %d", port);
+
+	return fd;
+}
+
+static int make_client(int type, const char *ip, int port)
+{
+	struct sockaddr_storage ss = {0};
+	struct sockaddr *sa;
+	int family;
+	int fd, sz;
+
+	family = is_ipv6_addr(ip) ? AF_INET6 : AF_INET;
+	make_addr(family, ip, port, &ss, &sz);
+	sa = (struct sockaddr *)&ss;
+
+	fd = socket(family, type, 0);
+	if (fd < 0)
+		error(1, errno, "failed to create socket");
+
+	if (connect(fd, sa, sz))
+		error(1, errno, "failed to connect socket");
+
+	return fd;
+}
+
+static void send_byte(int fd)
+{
+	if (send(fd, "a", 1, 0) < 1)
+		error(1, errno, "failed to send message");
+}
+
+static void recv_byte(int fd)
+{
+	char buf[1];
+
+	if (recv(fd, buf, sizeof(buf), 0) < 1)
+		error(1, errno, "failed to receive message");
+}
+
+static void tcp_recv_send(int server_fd)
+{
+	char buf[1];
+	size_t len;
+	ssize_t n;
+	int fd;
+
+	fd = accept(server_fd, NULL, NULL);
+	if (fd < 0)
+		error(1, errno, "failed to accept");
+
+	len = sizeof(buf);
+	n = recv(fd, buf, len, 0);
+	if (n < 0)
+		error(1, errno, "failed to receive");
+	if (n < len)
+		error(1, 0, "partial receive");
+
+	n = send(fd, buf, len, 0);
+	if (n < 0)
+		error(1, errno, "failed to send");
+	if (n < len)
+		error(1, 0, "partial send");
+
+	close(fd);
+}
+
+static void udp_recv_send(int server_fd)
+{
+	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
+	struct sockaddr_storage _src_addr = { 0 };
+	struct sockaddr_storage _dst_addr = { 0 };
+	struct sockaddr_storage *src_addr = &_src_addr;
+	struct sockaddr_storage *dst_addr = NULL;
+	struct msghdr msg = { 0 };
+	struct iovec iov = { 0 };
+	struct cmsghdr *cm;
+	char buf[1];
+	ssize_t n;
+	int fd;
+
+	iov.iov_base = buf;
+	iov.iov_len = sizeof(buf);
+
+	msg.msg_name = src_addr;
+	msg.msg_namelen = sizeof(*src_addr);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	n = recvmsg(server_fd, &msg, 0);
+	if (n < 0)
+		error(1, errno, "failed to receive");
+	if (n < sizeof(buf))
+		error(1, 0, "partial receive");
+	if (msg.msg_flags & MSG_CTRUNC)
+		error(1, errno, "truncated cmsg");
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if ((cm->cmsg_level == SOL_IP &&
+		     cm->cmsg_type == IP_ORIGDSTADDR) ||
+		    (cm->cmsg_level == SOL_IPV6 &&
+		     cm->cmsg_type == IPV6_ORIGDSTADDR)) {
+			dst_addr = (struct sockaddr_storage *)CMSG_DATA(cm);
+			break;
+		}
+		error(0, 0, "ignored cmsg at level %d type %d",
+		      cm->cmsg_level, cm->cmsg_type);
+	}
+	if (!dst_addr)
+		error(1, 0, "failed to get destination address");
+
+	/* Server bound to IPv4-mapped IPv6 address */
+	if (src_addr->ss_family != dst_addr->ss_family) {
+		assert(dst_addr->ss_family == AF_INET);
+
+		struct sockaddr_in *dst4 = (void *)dst_addr;
+		struct sockaddr_in6 *dst6 = (void *)&_dst_addr;
+
+		dst6->sin6_family = AF_INET6;
+		dst6->sin6_port = dst4->sin_port;
+
+		dst6->sin6_addr.s6_addr[10] = 0xff;
+		dst6->sin6_addr.s6_addr[11] = 0xff;
+		memcpy(&dst6->sin6_addr.s6_addr[12],
+		       &dst4->sin_addr.s_addr, sizeof(dst4->sin_addr.s_addr));
+
+		dst_addr = (void *)dst6;
+	}
+
+	/* Reply from original destination address. */
+	fd = socket(dst_addr->ss_family, SOCK_DGRAM, 0);
+	if (fd < 0)
+		error(1, errno, "failed to create socket");
+
+	if (bind(fd, (struct sockaddr *)dst_addr, sizeof(*dst_addr)))
+		error(1, errno, "failed to bind socket");
+
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	n = sendmsg(fd, &msg, 0);
+	if (n < 0)
+		error(1, errno, "failed to send");
+	if (n < sizeof(buf))
+		error(1, 0, "partial send");
+
+	close(fd);
+}
+
+static void tcp_echo(int client_fd, int server_fd)
+{
+	send_byte(client_fd);
+	tcp_recv_send(server_fd);
+	recv_byte(client_fd);
+}
+
+static void udp_echo(int client_fd, int server_fd)
+{
+	send_byte(client_fd);
+	udp_recv_send(server_fd);
+	recv_byte(client_fd);
+}
+
+static struct bpf_object *load_prog(void)
+{
+	char buf[MAX_ERROR_LEN];
+	struct bpf_object *obj;
+	int prog_fd;
+	int err;
+
+	err = bpf_prog_load(BPF_FILE, BPF_PROG_TYPE_UNSPEC, &obj, &prog_fd);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to open bpf file '%s': %s", BPF_FILE, buf);
+	}
+
+	return obj;
+}
+
+static void attach_prog(struct bpf_object *obj, const char *sec)
+{
+	enum bpf_attach_type attach_type;
+	struct bpf_program *prog;
+	char buf[MAX_ERROR_LEN];
+	int target_fd = -1;
+	int prog_fd;
+	int err;
+
+	prog = bpf_object__find_program_by_title(obj, sec);
+	err = libbpf_get_error(prog);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to find section \"%s\": %s", sec, buf);
+	}
+
+	err = libbpf_attach_type_by_name(sec, &attach_type);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to identify attach type: %s", buf);
+	}
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0)
+		error(1, errno, "failed to get prog fd");
+
+	err = bpf_prog_attach(prog_fd, target_fd, attach_type, 0);
+	if (err)
+		error(1, -err, "failed to attach prog");
+}
+
+static void detach_prog(const char *sec)
+{
+	enum bpf_attach_type attach_type;
+	char buf[MAX_ERROR_LEN];
+	int target_fd = -1;
+	int err;
+
+	err = libbpf_attach_type_by_name(sec, &attach_type);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to identify attach type: %s", buf);
+	}
+
+	err = bpf_prog_detach(target_fd, attach_type);
+	if (err && err != -EPERM)
+		error(1, -err, "failed to detach prog");
+}
+
+static void update_redir_map(int map_fd, int index, int sock_fd)
+{
+	uint64_t value;
+	int err;
+
+	value = (uint64_t)sock_fd;
+	err = bpf_map_update_elem(map_fd, &index, &value, BPF_NOEXIST);
+	if (err)
+		error(1, errno, "failed to update redir_map @ %d", index);
+}
+
+static void test_prog_query(void)
+{
+	__u32 attach_flags = 0;
+	__u32 prog_ids[1] = { 0 };
+	__u32 prog_cnt = 1;
+	int fd, err;
+
+	fd = open("/proc/self/ns/net", O_RDONLY);
+	if (fd < 0)
+		error(1, errno, "failed to open /proc/self/ns/net");
+
+	err = bpf_prog_query(fd, BPF_INET_LOOKUP, 0,
+			     &attach_flags, prog_ids, &prog_cnt);
+	if (err)
+		error(1, errno, "failed to query BPF_INET_LOOKUP prog");
+
+	assert(attach_flags == 0);
+	assert(prog_cnt == 1);
+	assert(prog_ids[0] != 0);
+
+	close(fd);
+}
+
+static void run_test(const struct test *t, struct bpf_object *obj,
+		     int redir_map)
+{
+	int client_fd, server_fd;
+
+	fprintf(stderr, "test %s\n", t->desc);
+
+	/* Clean up after any previous failed test runs */
+	detach_prog(t->bpf_prog);
+
+	attach_prog(obj, t->bpf_prog);
+	test_prog_query();
+
+	server_fd = make_server(t->socket_type,
+				t->recv_at.ip, t->recv_at.port);
+	update_redir_map(redir_map, 0, server_fd);
+
+	client_fd = make_client(t->socket_type,
+				t->send_to.ip, t->send_to.port);
+
+	if (t->socket_type == SOCK_STREAM)
+		tcp_echo(client_fd, server_fd);
+	else
+		udp_echo(client_fd, server_fd);
+
+	close(client_fd);
+	close(server_fd);
+
+	detach_prog(t->bpf_prog);
+}
+
+static int find_redir_map(struct bpf_object *obj)
+{
+	struct bpf_map *map;
+	int fd;
+
+	map = bpf_object__find_map_by_name(obj, "redir_map");
+	if (!map)
+		error(1, 0, "failed to find 'redir_map'");
+	fd = bpf_map__fd(map);
+	if (fd < 0)
+		error(1, 0, "failed to get 'redir_map' fd");
+
+	return fd;
+}
+
+int main(void)
+{
+	struct bpf_object *obj;
+	const struct test *t;
+	int redir_map;
+
+	obj = load_prog();
+	redir_map = find_redir_map(obj);
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++)
+		run_test(t, obj, redir_map);
+
+	close(redir_map);
+	bpf_object__unload(obj);
+
+	fprintf(stderr, "PASS\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_inet_lookup.sh b/tools/testing/selftests/bpf/test_inet_lookup.sh
new file mode 100755
index 000000000000..5efb42fbdf59
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_inet_lookup.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+if [[ $EUID -ne 0 ]]; then
+        echo "This script must be run as root"
+        echo "FAIL"
+        exit 1
+fi
+
+# Run the script in a dedicated network namespace.
+if [[ -z $(ip netns identify $$) ]]; then
+        ../net/in_netns.sh "$0" "$@"
+        exit $?
+fi
+
+readonly IP6_1="fd00::1"
+readonly IP6_2="fd00::2"
+
+setup()
+{
+        ip -6 addr add ${IP6_1}/128 dev lo
+        ip -6 addr add ${IP6_2}/128 dev lo
+}
+
+cleanup()
+{
+        ip -6 addr del ${IP6_1}/128 dev lo
+        ip -6 addr del ${IP6_2}/128 dev lo
+}
+
+trap cleanup EXIT
+setup
+
+./test_inet_lookup
+exit $?
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 1/2] PTP: introduce new versions of IOCTLs
From: Felipe Balbi @ 2019-08-28  8:23 UTC (permalink / raw)
  To: Joe Perches, Richard Cochran; +Cc: Christopher S Hall, netdev, linux-kernel
In-Reply-To: <0f1487356ae2e9ff185ede2359381630007538c7.camel@perches.com>

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]


Hi,

Joe Perches <joe@perches.com> writes:

> On Mon, 2019-08-19 at 08:43 -0700, Richard Cochran wrote:
>> On Sun, Aug 18, 2019 at 03:07:18PM -0700, Joe Perches wrote:
>> > Also the original patch deletes 2 case entries for
>> > PTP_PIN_GETFUNC and PTP_PIN_SETFUNC and converts them to
>> > PTP_PIN_GETFUNC2 and PTP_PIN_SETFUNC2 but still uses tests
>> > for the deleted case label entries making part of the case
>> > code block unreachable.
>> > 
>> > That's at least a defect:
>> > 
>> > -	case PTP_PIN_GETFUNC:
>> > +	case PTP_PIN_GETFUNC2:
>> > 
>> > and
>> >  
>> > -	case PTP_PIN_SETFUNC:
>> > +	case PTP_PIN_SETFUNC2:
>> 
>> Good catch.  Felipe, please fix that!
>> 
>> (Regarding Joe's memset suggestion, I'll leave that to your discretion.)
>
> Not just how declarations are done or memset.
>
> Minimizing unnecessary stack consumption is generally good.

Originally I had memset only on the three cases where they were
needed. Richard, which do you prefer? I don't mind changing it back.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* RE: [PATCH net-next v2 3/3] dpaa2-eth: Add pause frame support
From: Ioana Ciocoi Radulescu @ 2019-08-28  8:40 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev@vger.kernel.org, davem@davemloft.net, Ioana Ciornei
In-Reply-To: <20190827232132.GD26248@lunn.ch>

> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Wednesday, August 28, 2019 2:22 AM
> To: Ioana Ciocoi Radulescu <ruxandra.radulescu@nxp.com>
> Cc: netdev@vger.kernel.org; davem@davemloft.net; Ioana Ciornei
> <ioana.ciornei@nxp.com>
> Subject: Re: [PATCH net-next v2 3/3] dpaa2-eth: Add pause frame support
> 
> On Tue, Aug 27, 2019 at 05:15:51PM +0300, Ioana Radulescu wrote:
> > Starting with firmware version MC10.18.0, we have support for
> > L2 flow control. Asymmetrical configuration (Rx or Tx only) is
> > supported, but not pause frame autonegotioation.
> 
> > +static int set_pause(struct dpaa2_eth_priv *priv)
> > +{
> > +	struct device *dev = priv->net_dev->dev.parent;
> > +	struct dpni_link_cfg link_cfg = {0};
> > +	int err;
> > +
> > +	/* Get the default link options so we don't override other flags */
> > +	err = dpni_get_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
> > +	if (err) {
> > +		dev_err(dev, "dpni_get_link_cfg() failed\n");
> > +		return err;
> > +	}
> > +
> > +	link_cfg.options |= DPNI_LINK_OPT_PAUSE;
> > +	link_cfg.options &= ~DPNI_LINK_OPT_ASYM_PAUSE;
> > +	err = dpni_set_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
> > +	if (err) {
> > +		dev_err(dev, "dpni_set_link_cfg() failed\n");
> > +		return err;
> > +	}
> > +
> > +	priv->link_state.options = link_cfg.options;
> > +
> > +	return 0;
> > +}
> > +
> >  /* Configure the DPNI object this interface is associated with */
> >  static int setup_dpni(struct fsl_mc_device *ls_dev)
> >  {
> > @@ -2500,6 +2562,13 @@ static int setup_dpni(struct fsl_mc_device
> *ls_dev)
> >
> >  	set_enqueue_mode(priv);
> >
> > +	/* Enable pause frame support */
> > +	if (dpaa2_eth_has_pause_support(priv)) {
> > +		err = set_pause(priv);
> > +		if (err)
> > +			goto close;
> 
> Hi Ioana
> 
> So by default you have the MAC do pause, not asym pause?  Generally,
> any MAC that can do asym pause does asym pause.

Clearing the ASYM_PAUSE flag only means we tell the firmware we want
both Rx and Tx pause to be enabled in the beginning. User can still set
an asymmetric config (i.e. only Rx pause or only Tx pause to be enabled)
if needed.

The truth table is like this:

PAUSE | ASYM_PAUSE | Rx pause | Tx pause
----------------------------------------
  0   |     0      | disabled | disabled
  0   |     1      | disabled | enabled
  1   |     0      | enabled  | enabled
  1   |     1      | enabled  | disabled

Thanks,
Ioana

^ permalink raw reply

* Re: [PATCH] arcnet: capmode: remove redundant assignment to pointer pkt
From: Sergei Shtylyov @ 2019-08-28  9:14 UTC (permalink / raw)
  To: Colin King, Michael Grzeschik, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20190827112954.26677-1-colin.king@canonical.com>

Hello!

On 27.08.2019 14:29, Colin King wrote:

> From: Colin Ian King <colin.king@canonical.com>
> 
> Pointer pkt is being initialized with a value that is never read
> and pkg is being re-assigned a little later on. The assignment is
       ^^^ pkt

> redundant and hence can be removed.
> 
> Addresses-Coverity: ("Ununsed value")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH] bridge:fragmented packets dropped by bridge
From: Rundong Ge @ 2019-08-28  9:21 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Florian Westphal, davem, kuznet, yoshfuji, netdev,
	Pablo Neira Ayuso, kadlec, Roopa Prabhu, netfilter-devel,
	coreteam, bridge, Nikolay Aleksandrov, linux-kernel
In-Reply-To: <nycvar.YFH.7.76.1908260955400.22383@n3.vanv.qr>

Jan Engelhardt <jengelh@inai.de> 于2019年8月26日周一 下午3:59写道:
>
>
> On Tuesday 2019-07-30 14:35, Florian Westphal wrote:
> >Rundong Ge <rdong.ge@gmail.com> wrote:
> >> Given following setup:
> >> -modprobe br_netfilter
> >> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> >> -brctl addbr br0
> >> -brctl addif br0 enp2s0
> >> -brctl addif br0 enp3s0
> >> -brctl addif br0 enp6s0
> >> -ifconfig enp2s0 mtu 1300
> >> -ifconfig enp3s0 mtu 1500
> >> -ifconfig enp6s0 mtu 1500
> >> -ifconfig br0 up
> >>
> >>                  multi-port
> >> mtu1500 - mtu1500|bridge|1500 - mtu1500
> >>   A                  |            B
> >>                    mtu1300
> >
> >How can a bridge forward a frame from A/B to mtu1300?
>
> There might be a misunderstanding here judging from the shortness of this
> thread.
>
> I understood it such that the bridge ports (eth0,eth1) have MTU 1500, yet br0
> (in essence the third bridge port if you so wish) itself has MTU 1300.
>
> Therefore, frame forwarding from eth0 to eth1 should succeed, since the
> 1300-byte MTU is only relevant if the bridge decides the packet needs to be
> locally delivered.

Under this setup when I do "ping B -l 2000" from A, the fragmented
packets will be dropped by bridge.
When the "/proc/sys/net/bridge/bridge-nf-call-iptables" is on, bridge
will do defragment at PREROUTING and re-fragment at POSTROUTING. At
the re-fragment bridge will check if the max frag size is larger than
the bridge's MTU in  br_nf_ip_fragment(), if it is true packets will
be dropped.
And this patch use the outdev's MTU instead of the bridge's MTU to do
the br_nf_ip_fragment.

^ permalink raw reply

* Re: [PATCH net-next v4 3/3] dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the new phylink API
From: Matthias Brugger @ 2019-08-28  9:29 UTC (permalink / raw)
  To: René van Dorst, John Crispin, Sean Wang, Nelson Chang,
	David S . Miller
  Cc: netdev, linux-arm-kernel, linux-mediatek, linux-mips,
	Russell King, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190825174341.20750-4-opensource@vdorst.com>

Hi David,

On 25/08/2019 19:43, René van Dorst wrote:
> This patch the removes the recently added mediatek,physpeed property.
> Use the fixed-link property speed = <2500> to set the phy in 2.5Gbit.
> See mt7622-bananapi-bpi-r64.dts for a working example.
> 
> Signed-off-by: René van Dorst <opensource@vdorst.com>
> --
> v3->v4:
> * no change
> v2->v3:
> * no change
> v1->v2:
> * SGMII port only support BASE-X at 2.5Gbit.
> ---
>  .../arm/mediatek/mediatek,sgmiisys.txt        |  2 --
>  .../dts/mediatek/mt7622-bananapi-bpi-r64.dts  | 28 +++++++++++++------
>  arch/arm64/boot/dts/mediatek/mt7622.dtsi      |  1 -
>  3 files changed, 19 insertions(+), 12 deletions(-)

Thanks for taking this patch. For the next time, please make sure that dts[i]
patches are independent from the binding description, as dts[i] should go
through my tree. No problem for this round, just saying for the future.

Regards,
Matthias

> 
> diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
> index f5518f26a914..30cb645c0e54 100644
> --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
> +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
> @@ -9,8 +9,6 @@ Required Properties:
>  	- "mediatek,mt7622-sgmiisys", "syscon"
>  	- "mediatek,mt7629-sgmiisys", "syscon"
>  - #clock-cells: Must be 1
> -- mediatek,physpeed: Should be one of "auto", "1000" or "2500" to match up
> -		     the capability of the target PHY.
>  
>  The SGMIISYS controller uses the common clk binding from
>  Documentation/devicetree/bindings/clock/clock-bindings.txt
> diff --git a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
> index 710c5c3d87d3..83e10591e0e5 100644
> --- a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
> +++ b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
> @@ -115,24 +115,34 @@
>  };
>  
>  &eth {
> -	pinctrl-names = "default";
> -	pinctrl-0 = <&eth_pins>;
>  	status = "okay";
> +	gmac0: mac@0 {
> +		compatible = "mediatek,eth-mac";
> +		reg = <0>;
> +		phy-mode = "2500base-x";
> +
> +		fixed-link {
> +			speed = <2500>;
> +			full-duplex;
> +			pause;
> +		};
> +	};
>  
>  	gmac1: mac@1 {
>  		compatible = "mediatek,eth-mac";
>  		reg = <1>;
> -		phy-handle = <&phy5>;
> +		phy-mode = "rgmii";
> +
> +		fixed-link {
> +			speed = <1000>;
> +			full-duplex;
> +			pause;
> +		};
>  	};
>  
> -	mdio-bus {
> +	mdio: mdio-bus {
>  		#address-cells = <1>;
>  		#size-cells = <0>;
> -
> -		phy5: ethernet-phy@5 {
> -			reg = <5>;
> -			phy-mode = "sgmii";
> -		};
>  	};
>  };
>  
> diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> index d1e13d340e26..dac51e98204c 100644
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> @@ -931,6 +931,5 @@
>  			     "syscon";
>  		reg = <0 0x1b128000 0 0x3000>;
>  		#clock-cells = <1>;
> -		mediatek,physpeed = "2500";
>  	};
>  };
> 

^ permalink raw reply

* Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: Jiri Pirko @ 2019-08-28 10:37 UTC (permalink / raw)
  To: David Ahern; +Cc: davem, netdev, David Ahern
In-Reply-To: <20190806191517.8713-1-dsahern@kernel.org>

Tue, Aug 06, 2019 at 09:15:17PM CEST, dsahern@kernel.org wrote:
>From: David Ahern <dsahern@gmail.com>
>
>Prior to the commit in the fixes tag, the resource controller in netdevsim
>tracked fib entries and rules per network namespace. Restore that behavior.

David, please help me understand. If the counters are per-device, not
per-netns, they are both the same. If we have device (devlink instance)
is in a netns and take only things happening in this netns into account,
it should count exactly the same amount of fib entries, doesn't it?

I re-thinked the devlink netns patchset and currently I'm going in
slightly different direction. I'm having netns as an attribute of
devlink reload. So all the port netdevices and everything gets
re-instantiated into new netns. Works fine with mlxsw. There we also
re-register the fib notifier.

I think that this can work for your usecase in netdevsim too:
1) devlink instance is registering a fib notifier to track all fib
   entries in a namespace it belongs to. The counters are per-device -
   counting fib entries in a namespace the device is in.
2) another devlink instance can do the same tracking in the same
   namespace. No problem, it's a separate counter, but the numbers are
   the same. One can set different limits to different devlink
   instances, but you can have only one. That is the bahaviour you have
   now.
3) on devlink reload, netdevsim re-instantiates ports and re-registers
   fib notifier
4) on devlink reload with netns change, all should be fine as the
   re-registered fib nofitier replays the entries. The ports are
   re-instatiated in new netns.

This way, we would get consistent behaviour between netdevsim and real
devices (mlxsw), correct devlink-netns implementation (you also
suggested to move ports to the namespace). Everyone should be happy.

What do you think?

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: kbuild test robot @ 2019-08-28 10:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: kbuild-all, luto, davem, daniel, netdev, bpf, kernel-team,
	linux-api
In-Reply-To: <20190827205213.456318-1-ast@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1553 bytes --]

Hi Alexei,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Alexei-Starovoitov/bpf-capabilities-introduce-CAP_BPF/20190828-142441
base:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: ia64-defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=ia64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/bpf/core.c: In function 'cap_bpf_tracing':
>> kernel/bpf/core.c:2110:31: error: implicit declaration of function 'perf_paranoid_tracepoint_raw' [-Werror=implicit-function-declaration]
            (capable(CAP_BPF) && !perf_paranoid_tracepoint_raw());
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/perf_paranoid_tracepoint_raw +2110 kernel/bpf/core.c

  2106	
  2107	bool cap_bpf_tracing(void)
  2108	{
  2109		return capable(CAP_SYS_ADMIN) ||
> 2110		       (capable(CAP_BPF) && !perf_paranoid_tracepoint_raw());
  2111	}
  2112	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19214 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox