* Re: [PATCH net v3] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Willem de Bruijn @ 2019-08-08 18:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: John Fastabend, David Miller, Network Development, davejwatson,
borisp, aviadye, Daniel Borkmann, Eric Dumazet,
Alexei Starovoitov, oss-drivers
In-Reply-To: <20190808103148.164bec9f@cakuba.netronome.com>
On Thu, Aug 8, 2019 at 1:32 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 8 Aug 2019 11:59:18 -0400, Willem de Bruijn wrote:
> > > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> > > index 7c0b2b778703..43922d86e510 100644
> > > --- a/net/tls/tls_device.c
> > > +++ b/net/tls/tls_device.c
> > > @@ -373,9 +373,9 @@ static int tls_push_data(struct sock *sk,
> > > struct tls_context *tls_ctx = tls_get_ctx(sk);
> > > struct tls_prot_info *prot = &tls_ctx->prot_info;
> > > struct tls_offload_context_tx *ctx = tls_offload_ctx_tx(tls_ctx);
> > > - int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> > > int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);
> > > struct tls_record_info *record = ctx->open_record;
> > > + int tls_push_record_flags;
> > > struct page_frag *pfrag;
> > > size_t orig_size = size;
> > > u32 max_open_record_len;
> > > @@ -390,6 +390,9 @@ static int tls_push_data(struct sock *sk,
> > > if (sk->sk_err)
> > > return -sk->sk_err;
> > >
> > > + flags |= MSG_SENDPAGE_DECRYPTED;
> > > + tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> > > +
> >
> > Without being too familiar with this code: can this plaintext flag be
> > set once, closer to the call to do_tcp_sendpages, in tls_push_sg?
> >
> > Instead of two locations with multiple non-trivial codepaths between
> > them and do_tcp_sendpages.
> >
> > Or are there paths where the flag is not set? Which I imagine would
> > imply already passing s/w encrypted ciphertext.
>
> tls_push_sg() is shared with sw path which doesn't have the device
> validation.
>
> Device TLS can read tls_push_sg() via tls_push_partial_record() and
> tls_push_data(). tls_push_data() is addressed directly here,
> tls_push_partial_record() is again shared with SW path, so we have to
> address it by adding the flag in tls_device_write_space().
>
> The alternative is to add a conditional to tls_push_sg() which is
> a little less nice from performance and layering PoV but it is a lot
> simpler..
>
> Should I change?
Not at all. Thanks for the detailed explanation. That answered my last question
Acked-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply
* Re: [PATCH net-next] r8169: make use of xmit_more
From: Heiner Kallweit @ 2019-08-08 18:17 UTC (permalink / raw)
To: Holger Hoffstätte, Realtek linux nic maintainers,
David Miller
Cc: netdev@vger.kernel.org, Sander Eikelenboom, Eric Dumazet
In-Reply-To: <cfb9a1c7-57c8-db04-1081-ac1cb92bb447@applied-asynchrony.com>
On 08.08.2019 17:53, Holger Hoffstätte wrote:
> On 8/8/19 4:37 PM, Holger Hoffstätte wrote:
>>
>> Hello Heiner -
>>
>> On 7/28/19 11:25 AM, Heiner Kallweit wrote:
>>> There was a previous attempt to use xmit_more, but the change had to be
>>> reverted because under load sometimes a transmit timeout occurred [0].
>>> Maybe this was caused by a missing memory barrier, the new attempt
>>> keeps the memory barrier before the call to netif_stop_queue like it
>>> is used by the driver as of today. The new attempt also changes the
>>> order of some calls as suggested by Eric.
>>>
>>> [0] https://lkml.org/lkml/2019/2/10/39
>>>
>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>
>> I decided to take one for the team and merged this into my 5.2.x tree (just
>> fixing up the path) and it has been working fine for the last 2 weeks in two
>> machines..until today, when for the first time in forever some random NFS traffic
>> made this old friend come out from under the couch:
>>
>> [Aug 8 14:13] ------------[ cut here ]------------
>> [ +0.000006] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
>> [ +0.000021] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x21f/0x230
>> [ +0.000001] Modules linked in: lz4 lz4_compress lz4_decompress nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel btrfs xor zstd_compress raid6_pq zstd_decompress bfq jitterentropy_rng nct6775 hwmon_vid coretemp hwmon x86_pkg_temp_thermal aesni_intel aes_x86_64 i915 glue_helper crypto_simd cryptd i2c_i801 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea usbhid sysfillrect r8169 sysimgblt fb_sys_fops realtek drm libphy drm_panel_orientation_quirks i2c_core video backlight mq_deadline
>> [ +0.000026] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.2.7 #1
>> [ +0.000001] Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
>> [ +0.000004] RIP: 0010:dev_watchdog+0x21f/0x230
>> [ +0.000002] Code: 3b 00 75 ea eb ad 4c 89 ef c6 05 1c 45 bd 00 01 e8 66 35 fc ff 44 89 e1 4c 89 ee 48 c7 c7 e8 5e fc 81 48 89 c2 e8 90 df 92 ff <0f> 0b eb 8e 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 66 66 66 90
>> [ +0.000002] RSP: 0018:ffffc90000118e68 EFLAGS: 00010286
>> [ +0.000002] RAX: 0000000000000000 RBX: ffff8887f7837600 RCX: 0000000000000303
>> [ +0.000001] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff827a488c
>> [ +0.000001] RBP: ffff8887f9fbc440 R08: 0000000000000303 R09: 0000000000000003
>> [ +0.000001] R10: 000000000001004c R11: 0000000000000001 R12: 0000000000000000
>> [ +0.000009] R13: ffff8887f9fbc000 R14: ffffffff8173aa20 R15: dead000000000200
>> [ +0.000001] FS: 0000000000000000(0000) GS:ffff8887ff580000(0000) knlGS:0000000000000000
>> [ +0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ +0.000001] CR2: 00007f8d1c04d000 CR3: 0000000002209001 CR4: 00000000000606e0
>> [ +0.000000] Call Trace:
>> [ +0.000002] <IRQ>
>> [ +0.000005] call_timer_fn+0x2b/0x120
>> [ +0.000002] expire_timers+0xa4/0x100
>> [ +0.000001] run_timer_softirq+0x8c/0x170
>> [ +0.000002] ? __hrtimer_run_queues+0x13a/0x290
>> [ +0.000003] ? sched_clock_cpu+0xe/0x130
>> [ +0.000003] __do_softirq+0xeb/0x2de
>> [ +0.000003] irq_exit+0x9d/0xe0
>> [ +0.000002] smp_apic_timer_interrupt+0x60/0x110
>> [ +0.000003] apic_timer_interrupt+0xf/0x20
>> [ +0.000001] </IRQ>
>> [ +0.000003] RIP: 0010:cpuidle_enter_state+0xad/0x930
>> [ +0.000001] Code: c5 66 66 66 66 90 31 ff e8 90 99 9e ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 39 08 00 00 31 ff e8 e7 26 a2 ff fb 45 85 e4 <0f> 88 34 02 00 00 49 63 cc 4c 2b 2c 24 48 8d 04 49 48 c1 e0 05 8b
>> [ +0.000000] RSP: 0018:ffffc9000008be50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>> [ +0.000001] RAX: ffff8887ff5a9180 RBX: ffffffff822b6c40 RCX: 000000000000001f
>> [ +0.000001] RDX: 0000000000000000 RSI: 0000000033087154 RDI: 0000000000000000
>> [ +0.000001] RBP: ffff8887ff5b1310 R08: 000030d021fae397 R09: ffff8887ff59c8c0
>> [ +0.000000] R10: ffff8887ff59c8c0 R11: 0000000000000006 R12: 0000000000000004
>> [ +0.000001] R13: 000030d021fae397 R14: 0000000000000004 R15: ffff8887fc281600
>> [ +0.000001] cpuidle_enter+0x29/0x40
>> [ +0.000002] do_idle+0x1e5/0x280
>> [ +0.000001] cpu_startup_entry+0x19/0x20
>> [ +0.000002] start_secondary+0x186/0x1c0
>> [ +0.000001] secondary_startup_64+0xa4/0xb0
>> [ +0.000001] ---[ end trace 99493c768580f4fd ]---
>>
>> The device is:
>>
>> Aug 7 23:19:09 tux kernel: libphy: r8169: probed
>> Aug 7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, c8:60:00:68:33:cc, XID 2c9, IRQ 36
>> Aug 7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>> Aug 7 23:19:12 tux kernel: RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
>> Aug 7 23:19:13 tux kernel: r8169 0000:04:00.0 eth0: No native access to PCI extended config space, falling back to CSI
>>
>> and using fq_codel, of course.
>>
>> This cpuidle hiccup used to be completely gone without xmit_more and this was
>> the first (and so far only) time since merging it (regardless of load).
>> Also, while I'm using BMQ as CPU scheduler, that hasn't made a difference for
>> this particular problem in the past (with MuQSS/PDS) either; way back when I had
>> Eric's previous attempt(s) it also hiccupped with CFS.
>>
>> Revert or wait for more reports when -next is merged in 5.4?
>
> Another question/data point: I've had the whole basket of offloads activated:
>
> ethtool --offload eth0 rx on tx on gro on gso on sg on tso on
>
> and this caused zero problems without the xmit_more patch. However I just saw
> that net-next has a patch where TSO is disabled due to a known HW defect in
> RTL8168evl, which is of course what I have. Could this be the reason for the
> stall/hiccup when xmit_more has its fingers in the pie? I kind of know what
> xmit_more does, just not how it could interact with a possibly broken TSO that
> nevertheless seems to work fine otherwise..
>
I was about to ask exactly that, whether you have TSO enabled. I don't know what
can trigger the HW issue, it was just confirmed by Realtek that this chip version
has a problem with TSO. So the logical conclusion is: test w/o TSO, ideally the
linux-next version.
> thanks
> Holger
>
Heiner
^ permalink raw reply
* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: John Hubbard @ 2019-08-08 18:18 UTC (permalink / raw)
To: Weiny, Ira, Michal Hocko
Cc: Jan Kara, Matthew Wilcox, Andrew Morton, Christoph Hellwig,
Williams, Dan J, Dave Chinner, Dave Hansen, Jason Gunthorpe,
Jérôme Glisse, LKML, amd-gfx@lists.freedesktop.org,
ceph-devel@vger.kernel.org, devel@driverdev.osuosl.org,
devel@lists.orangefs.org, dri-devel@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, kvm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-block@vger.kernel.org,
linux-crypto@vger.kernel.org, linux-fbdev@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-media@vger.kernel.org,
linux-mm@kvack.org, linux-nfs@vger.kernel.org,
linux-rdma@vger.kernel.org, linux-rpi-kernel@lists.infradead.org,
linux-xfs@vger.kernel.org, netdev@vger.kernel.org,
rds-devel@oss.oracle.com, sparclinux@vger.kernel.org,
x86@kernel.org, xen-devel@lists.xenproject.org
In-Reply-To: <2807E5FD2F6FDA4886F6618EAC48510E79E79644@CRSMSX101.amr.corp.intel.com>
On 8/8/19 9:25 AM, Weiny, Ira wrote:
>>
>> On 8/7/19 7:36 PM, Ira Weiny wrote:
>>> On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
>>>> On Wed 07-08-19 10:37:26, Jan Kara wrote:
>>>>> On Fri 02-08-19 12:14:09, John Hubbard wrote:
>>>>>> On 8/2/19 7:52 AM, Jan Kara wrote:
>>>>>>> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
>>>>>>>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
>>>>>>>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
>>>>>>>>>> On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
>> [...]
> Yep I can do this. I did not realize that Andrew had accepted any of this work. I'll check out his tree. But I don't think he is going to accept this series through his tree. So what is the ETA on that landing in Linus' tree?
>
I'd expect it to go into 5.4, according to my understanding of how
the release cycles are arranged.
> To that point I'm still not sure who would take all this as I am now touching mm, procfs, rdma, ext4, and xfs.
>
> I just thought I would chime in with my progress because I'm to a point where things are working and so I can submit the code but I'm not sure what I can/should depend on landing... Also, now that 0day has run overnight it has found issues with this rebase so I need to clean those up... Perhaps I will base on Andrew's tree prior to doing that...
I'm certainly not the right person to answer, but in spite of that, I'd think
Andrew's tree is a reasonable place for it. Sort of.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
* Re: [PATCH v3 0/2] dt-bindings: net: meson-dwmac: convert to yaml
From: David Miller @ 2019-08-08 18:20 UTC (permalink / raw)
To: narmstrong
Cc: robh+dt, martin.blumenstingl, devicetree, netdev, linux-amlogic,
linux-arm-kernel, linux-kernel
In-Reply-To: <20190808114101.29982-1-narmstrong@baylibre.com>
From: Neil Armstrong <narmstrong@baylibre.com>
Date: Thu, 8 Aug 2019 13:40:59 +0200
> This patchsets converts the Amlogic Meson DWMAC glue bindings over to
> YAML schemas using the already converted dwmac bindings.
>
> The first patch is needed because the Amlogic glue needs a supplementary
> reg cell to access the DWMAC glue registers.
>
> Changes since v2:
> - Added review tags
> - Updated allwinner,sun7i-a20-gmac.yaml reg maxItems
Where is this targetted to be merged, an ARM tree? Or one of my
networking trees?
^ permalink raw reply
* Re: [Linux-decnet-user] [PATCH] Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
From: David Miller @ 2019-08-08 18:21 UTC (permalink / raw)
To: emserrat
Cc: clabbe, corbet, linux-doc, netdev, linux-decnet-user,
linux-kernel, tgraf
In-Reply-To: <DM5PR22MB03797234267E8B37EA3080BBC4D70@DM5PR22MB0379.namprd22.prod.outlook.com>
From: Eduardo Marcelo Serrat <emserrat@hotmail.com>
Date: Thu, 8 Aug 2019 11:44:14 +0000
> Sorry for using the list for this purpose but we are looking for
> senior engineers with knowledge in OpenVMS/ Tru64 Unix, Solaris,
> HP-UX and of course Linux and familiar with virtualization
> technologies, specially cross platform emulators. We need to fill
> support engineer roles. If anybody interested for positions in the
> US / Europe please send me an email.
Please do not ever use the vger.kernel.org mailing lists for this kind
of solicitation.
It is completely inappropriate.
^ permalink raw reply
* Re: [PATCH v2 00/15] net: phy: adin: add support for Analog Devices PHYs
From: David Miller @ 2019-08-08 18:24 UTC (permalink / raw)
To: alexandru.ardelean
Cc: netdev, devicetree, linux-kernel, robh+dt, mark.rutland,
f.fainelli, hkallweit1, andrew
In-Reply-To: <20190808123026.17382-1-alexandru.ardelean@analog.com>
From: Alexandru Ardelean <alexandru.ardelean@analog.com>
Date: Thu, 8 Aug 2019 15:30:11 +0300
> This changeset adds support for Analog Devices Industrial Ethernet PHYs.
> Particularly the PHYs this driver adds support for:
> * ADIN1200 - Robust, Industrial, Low Power 10/100 Ethernet PHY
> * ADIN1300 - Robust, Industrial, Low Latency 10/100/1000 Gigabit
> Ethernet PHY
>
> The 2 chips are pin & register compatible with one another. The main
> difference being that ADIN1200 doesn't operate in gigabit mode.
>
> The chips can be operated by the Generic PHY driver as well via the
> standard IEEE PHY registers (0x0000 - 0x000F) which are supported by the
> kernel as well. This assumes that configuration of the PHY has been done
> completely in HW, according to spec, i.e. no extra SW configuration
> required.
>
> This changeset also implements the ability to configure the chips via SW
> registers.
>
> Datasheets:
> https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1300.pdf
> https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1200.pdf
>
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
I think, at a minimum, the c22 vs. c45 issues need to be discussed more
and even if no code changes occur there is definitely some adjustments
and clairifications that need to occur on this issue in the commit
messages and/or documentation.
^ permalink raw reply
* Re: [PATCH 0/2] pull request for net: batman-adv 2019-08-08
From: David Miller @ 2019-08-08 18:26 UTC (permalink / raw)
To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190808130208.2124-1-sw@simonwunderlich.de>
From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Thu, 8 Aug 2019 15:02:06 +0200
> here are some bugfixes which we would like to have integrated into net.
>
> Please pull or let me know of any problem!
Pulled.
^ permalink raw reply
* Re: [PATCH bpf-next 1/3] bpf: support cloning sk storage on accept()
From: Martin Lau @ 2019-08-08 18:27 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org,
davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190808152830.GC2820@mini-arch>
On Thu, Aug 08, 2019 at 08:28:30AM -0700, Stanislav Fomichev wrote:
> On 08/08, Martin Lau wrote:
> > On Wed, Aug 07, 2019 at 08:47:18AM -0700, Stanislav Fomichev wrote:
> > > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > > and call it from bpf_sk_storage_clone. Reuse the gap in
> > > bpf_sk_storage_elem to store clone/non-clone flag.
> > >
> > > Cc: Martin KaFai Lau <kafai@fb.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > > include/net/bpf_sk_storage.h | 10 ++++
> > > include/uapi/linux/bpf.h | 1 +
> > > net/core/bpf_sk_storage.c | 102 +++++++++++++++++++++++++++++++++--
> > > net/core/sock.c | 9 ++--
> > > 4 files changed, 115 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
> > > index b9dcb02e756b..8e4f831d2e52 100644
> > > --- a/include/net/bpf_sk_storage.h
> > > +++ b/include/net/bpf_sk_storage.h
> > > @@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
> > > extern const struct bpf_func_proto bpf_sk_storage_get_proto;
> > > extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
> > >
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
> > > +#else
> > > +static inline int bpf_sk_storage_clone(const struct sock *sk,
> > > + struct sock *newsk)
> > > +{
> > > + return 0;
> > > +}
> > > +#endif
> > > +
> > > #endif /* _BPF_SK_STORAGE_H */
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 4393bd4b2419..00459ca4c8cf 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -2931,6 +2931,7 @@ enum bpf_func_id {
> > >
> > > /* BPF_FUNC_sk_storage_get flags */
> > > #define BPF_SK_STORAGE_GET_F_CREATE (1ULL << 0)
> > > +#define BPF_SK_STORAGE_GET_F_CLONE (1ULL << 1)
> > It is only used in bpf_sk_storage_get().
> > What if the elem is created from bpf_fd_sk_storage_update_elem()
> > i.e. from the syscall API ?
> >
> > What may be the use case for a map to have both CLONE and non-CLONE
> > elements? If it is not the case, would it be better to add
> > BPF_F_CLONE to bpf_attr->map_flags?
> I didn't think about putting it on the map itself since the API
> is on a per-element, but it does make sense. I can't come up
> with a use-case for a per-element selective clone/non-clone.
> Thanks, will move to the map itself.
>
> > >
> > > /* Mode for BPF_FUNC_skb_adjust_room helper. */
> > > enum bpf_adj_room_mode {
> > > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > > index 94c7f77ecb6b..b6dea67965bc 100644
> > > --- a/net/core/bpf_sk_storage.c
> > > +++ b/net/core/bpf_sk_storage.c
> > > @@ -12,6 +12,9 @@
> > >
> > > static atomic_t cache_idx;
> > >
> > > +#define BPF_SK_STORAGE_GET_F_MASK (BPF_SK_STORAGE_GET_F_CREATE | \
> > > + BPF_SK_STORAGE_GET_F_CLONE)
> > > +
> > > struct bucket {
> > > struct hlist_head list;
> > > raw_spinlock_t lock;
> > > @@ -66,7 +69,8 @@ struct bpf_sk_storage_elem {
> > > struct hlist_node snode; /* Linked to bpf_sk_storage */
> > > struct bpf_sk_storage __rcu *sk_storage;
> > > struct rcu_head rcu;
> > > - /* 8 bytes hole */
> > > + u8 clone:1;
> > > + /* 7 bytes hole */
> > > /* The data is stored in aother cacheline to minimize
> > > * the number of cachelines access during a cache hit.
> > > */
> > > @@ -509,7 +513,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
> > > return 0;
> > > }
> > >
> > > -/* Called by __sk_destruct() */
> > > +/* Called by __sk_destruct() & bpf_sk_storage_clone() */
> > > void bpf_sk_storage_free(struct sock *sk)
> > > {
> > > struct bpf_sk_storage_elem *selem;
> > > @@ -739,19 +743,106 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
> > > return err;
> > > }
> > >
> > > +static struct bpf_sk_storage_elem *
> > > +bpf_sk_storage_clone_elem(struct sock *newsk,
> > > + struct bpf_sk_storage_map *smap,
> > > + struct bpf_sk_storage_elem *selem)
> > > +{
> > > + struct bpf_sk_storage_elem *copy_selem;
> > > +
> > > + copy_selem = selem_alloc(smap, newsk, NULL, true);
> > > + if (!copy_selem)
> > > + return ERR_PTR(-ENOMEM);
> > nit.
> > may be just return NULL as selem_alloc() does.
> Sounds good.
>
> > > +
> > > + if (map_value_has_spin_lock(&smap->map))
> > > + copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
> > > + SDATA(selem)->data, true);
> > > + else
> > > + copy_map_value(&smap->map, SDATA(copy_selem)->data,
> > > + SDATA(selem)->data);
> > > +
> > > + return copy_selem;
> > > +}
> > > +
> > > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > > +{
> > > + struct bpf_sk_storage *new_sk_storage = NULL;
> > > + struct bpf_sk_storage *sk_storage;
> > > + struct bpf_sk_storage_elem *selem;
> > > + int ret;
> > > +
> > > + RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > +
> > > + rcu_read_lock();
> > > + sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > > +
> > > + if (!sk_storage || hlist_empty(&sk_storage->list))
> > > + goto out;
> > > +
> > > + hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > > + struct bpf_sk_storage_map *smap;
> > > + struct bpf_sk_storage_elem *copy_selem;
> > > +
> > > + if (!selem->clone)
> > > + continue;
> > > +
> > > + smap = rcu_dereference(SDATA(selem)->smap);
> > > + if (!smap)
> > smap should not be NULL.
> I see; you never set it back to NULL and we are guaranteed that the
> map is still around due to rcu. Removed.
>
> > > + continue;
> > > +
> > > + copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > > + if (IS_ERR(copy_selem)) {
> > > + ret = PTR_ERR(copy_selem);
> > > + goto err;
> > > + }
> > > +
> > > + if (!new_sk_storage) {
> > > + ret = sk_storage_alloc(newsk, smap, copy_selem);
> > > + if (ret) {
> > > + kfree(copy_selem);
> > > + atomic_sub(smap->elem_size,
> > > + &newsk->sk_omem_alloc);
> > > + goto err;
> > > + }
> > > +
> > > + new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > > + continue;
> > > + }
> > > +
> > > + raw_spin_lock_bh(&new_sk_storage->lock);
> > > + selem_link_map(smap, copy_selem);
> > Unlike the existing selem-update use-cases in bpf_sk_storage.c,
> > the smap->map.refcnt has not been held here. Reading the smap
> > is fine. However, adding a new selem to a deleting smap is an issue.
> > Hence, I think bpf_map_inc_not_zero() should be done first.
> In this case, I should probably do it after smap = rcu_deref()?
Right.
and bpf_map_put should be called when done. Becasue of bpf_map_put,
it may be a good idea to add a comment to the first synchronize_rcu()
in bpf_sk_storage_map_free() since this new bpf_sk_storage_clone()
also depends on it now,
which makes it different from other bpf maps.
>
> > > + __selem_link_sk(new_sk_storage, copy_selem);
> > > + raw_spin_unlock_bh(&new_sk_storage->lock);
> > > + }
> > > +
> > > +out:
> > > + rcu_read_unlock();
> > > + return 0;
> > > +
> > > +err:
> > > + rcu_read_unlock();
> > > +
> > > + bpf_sk_storage_free(newsk);
> > > + return ret;
> > > +}
> > > +
> > > BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > > void *, value, u64, flags)
> > > {
> > > struct bpf_sk_storage_data *sdata;
> > >
> > > - if (flags > BPF_SK_STORAGE_GET_F_CREATE)
> > > + if (flags & ~BPF_SK_STORAGE_GET_F_MASK)
> > > + return (unsigned long)NULL;
> > > +
> > > + if ((flags & BPF_SK_STORAGE_GET_F_CLONE) &&
> > > + !(flags & BPF_SK_STORAGE_GET_F_CREATE))
> > > return (unsigned long)NULL;
> > >
> > > sdata = sk_storage_lookup(sk, map, true);
> > > if (sdata)
> > > return (unsigned long)sdata->data;
> > >
> > > - if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
> > > + if ((flags & BPF_SK_STORAGE_GET_F_CREATE) &&
> > > /* Cannot add new elem to a going away sk.
> > > * Otherwise, the new elem may become a leak
> > > * (and also other memory issues during map
> > > @@ -762,6 +853,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > > /* sk must be a fullsock (guaranteed by verifier),
> > > * so sock_gen_put() is unnecessary.
> > > */
> > > + if (!IS_ERR(sdata))
> > > + SELEM(sdata)->clone =
> > > + !!(flags & BPF_SK_STORAGE_GET_F_CLONE);
> > > sock_put(sk);
> > > return IS_ERR(sdata) ?
> > > (unsigned long)NULL : (unsigned long)sdata->data;
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index d57b0cc995a0..f5e801a9cea4 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
> > > goto out;
> > > }
> > > RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> > > -#ifdef CONFIG_BPF_SYSCALL
> > > - RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > -#endif
> > > +
> > > + if (bpf_sk_storage_clone(sk, newsk)) {
> > > + sk_free_unlock_clone(newsk);
> > > + newsk = NULL;
> > > + goto out;
> > > + }
> > >
> > > newsk->sk_err = 0;
> > > newsk->sk_err_soft = 0;
> > > --
> > > 2.22.0.770.g0f2c4a37fd-goog
> > >
^ permalink raw reply
* Re: [PATCH 0/4] pull request for net-next: batman-adv 2019-08-08
From: David Miller @ 2019-08-08 18:29 UTC (permalink / raw)
To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190808130619.4481-1-sw@simonwunderlich.de>
From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Thu, 8 Aug 2019 15:06:15 +0200
> here is a small feature/cleanup pull request of batman-adv to go into net-next.
>
> Please pull or let me know of any problem!
Pulled, thanks.
That lockdep annotation in the 4th patch really helped with the review.
^ permalink raw reply
* Re: [PATCH net-next] taprio: remove unused variable 'entry_list_policy'
From: David Miller @ 2019-08-08 18:38 UTC (permalink / raw)
To: yuehaibing
Cc: jhs, xiyou.wangcong, jiri, vinicius.gomes, linux-kernel, netdev
In-Reply-To: <20190808142623.69188-1-yuehaibing@huawei.com>
From: YueHaibing <yuehaibing@huawei.com>
Date: Thu, 8 Aug 2019 22:26:23 +0800
> net/sched/sch_taprio.c:680:32: warning:
> entry_list_policy defined but not used [-Wunused-const-variable=]
>
> It is not used since commit a3d43c0d56f1 ("taprio: Add
> support adding an admin schedule")
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
This is probably unintentional and a bug, we should be using that
policy value to validate that the sched list is indeed a nested
attribute.
I'm not applying this without at least a better and clear commit
message explaining why we shouldn't be using this policy any more.
^ permalink raw reply
* RE: [PATCH v3 1/1] ixgbe: sync the first fragment unconditionally
From: Bowers, AndrewX @ 2019-08-08 18:42 UTC (permalink / raw)
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190808040312.21719-1-firo.yang@suse.com>
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Firo Yang
> Sent: Wednesday, August 7, 2019 9:04 PM
> To: netdev@vger.kernel.org
> Cc: maciejromanfijalkowski@gmail.com; Firo Yang <firo.yang@suse.com>;
> linux-kernel@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
> jian.w.wen@oracle.com; alexander.h.duyck@linux.intel.com;
> davem@davemloft.net
> Subject: [Intel-wired-lan] [PATCH v3 1/1] ixgbe: sync the first fragment
> unconditionally
>
> In Xen environment, if Xen-swiotlb is enabled, ixgbe driver could possibly
> allocate a page, DMA memory buffer, for the first fragment which is not
> suitable for Xen-swiotlb to do DMA operations.
> Xen-swiotlb have to internally allocate another page for doing DMA
> operations. This mechanism requires syncing the data from the internal page
> to the page which ixgbe sends to upper network stack. However, since
> commit f3213d932173 ("ixgbe: Update driver to make use of DMA attributes
> in Rx path"), the unmap operation is performed with
> DMA_ATTR_SKIP_CPU_SYNC. As a result, the sync is not performed.
> Since the sync isn't performed, the upper network stack could receive a
> incomplete network packet. By incomplete, it means the linear data on the
> first fragment(between skb->head and skb->end) is invalid. So we have to
> copy the data from the internal xen-swiotlb page to the page which ixgbe
> sends to upper network stack through the sync operation.
>
> More details from Alexander Duyck:
> Specifically since we are mapping the frame with
> DMA_ATTR_SKIP_CPU_SYNC we have to unmap with that as well. As a result
> a sync is not performed on an unmap and must be done manually as we
> skipped it for the first frag. As such we need to always sync before possibly
> performing a page unmap operation.
>
> Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA attributes in
> Rx path")
> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Signed-off-by: Firo Yang <firo.yang@suse.com>
> ---
> Changes from v2:
> * Added details on the problem caused by skipping the sync.
> * Added more explanation from Alexander Duyck.
>
> Changes from v1:
> * Imporved the patch description.
> * Added Reviewed-by: and Fixes: as suggested by Alexander Duyck.
>
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
^ permalink raw reply
* Re: [PATCH net-next 5/5] r8152: change rx_frag_head_sz and rx_max_agg_num dynamically
From: Jakub Kicinski @ 2019-08-08 18:43 UTC (permalink / raw)
To: Hayes Wang
Cc: Maciej Fijalkowski, netdev@vger.kernel.org, nic_swsd,
linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18D0F3F@RTITMBSVM03.realtek.com.tw>
On Thu, 8 Aug 2019 12:16:50 +0000, Hayes Wang wrote:
> Maciej Fijalkowski [mailto:maciejromanfijalkowski@gmail.com]
> > Sent: Thursday, August 08, 2019 7:50 PM
> > > Excuse me again.
> > > I find the kernel supports the copybreak of Ethtool.
> > > However, I couldn't find a command of Ethtool to use it.
> >
> > Ummm there's set_tunable ops. Amazon's ena driver is making use of it from
> > what
> > I see. Look at ena_set_tunable() in
> > drivers/net/ethernet/amazon/ena/ena_ethtool.c.
>
> The kernel could support it. And I has finished it.
> However, when I want to test it by ethtool, I couldn't find suitable command.
> I couldn't find relative feature in the source code of ethtool, either.
It's possible it's not implemented in the user space tool 🤔
Looks like it got posted here:
https://www.spinics.net/lists/netdev/msg299877.html
But perhaps never finished?
It should be fairly straightforward to implement by looking at how
phy-tunables are handled.
^ permalink raw reply
* Re: [PATCH 1/1] bpf: introduce new helper udp_flow_src_port
From: Andrii Nakryiko @ 2019-08-08 18:48 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Y Song, Alexei Starovoitov, Farid Zakaria, Daniel Borkmann,
netdev, bpf
In-Reply-To: <20190805171036.5a5bf790@cakuba.netronome.com>
On Mon, Aug 5, 2019 at 5:11 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Sat, 3 Aug 2019 23:52:16 -0700, Y Song wrote:
> > > include/uapi/linux/bpf.h | 21 +++++++--
> > > net/core/filter.c | 20 ++++++++
> > > tools/include/uapi/linux/bpf.h | 21 +++++++--
> > > tools/testing/selftests/bpf/bpf_helpers.h | 2 +
> > > .../bpf/prog_tests/udp_flow_src_port.c | 28 +++++++++++
> > > .../bpf/progs/test_udp_flow_src_port_kern.c | 47 +++++++++++++++++++
> > > 6 files changed, 131 insertions(+), 8 deletions(-)
> > > create mode 100644 tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c
> > > create mode 100644 tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c
> >
> > First, for each review, backport and sync with libbpf repo, in the future,
> > could you break the patch to two patches?
> > 1. kernel changes (net/core/filter.c, include/uapi/linux/bpf.h)
> > 2. tools/include/uapi/linux/bpf.h
> > 3. tools/testing/ changes
>
> A lot of people get caught off by this, could explain why this is
> necessary?
We are using script [0] to sync libbpf sources from linux repo to
Github. It does a lot of things to make this happen, given that Github
structure is not a simple copy/move into subdirectory. Instead it does
a bunch of cherry-picking and tree rewrites, so when there are patches
that touched both libbpf sources (including those tools/include/...
files) and some sources that we don't sync (e.g., just include/...),
then script/git gets confused which breaks the flow and requires more
manual work. Which is why we are asking to split those changes. Hope
this helps to clarify.
[0] https://github.com/libbpf/libbpf/blob/master/scripts/sync-kernel.sh
>
> git can deal with this scenario without missing a step, format-patch
> takes paths:
>
> $ git show --oneline -s
> 1002f3e955d7 (HEAD) bpf: introduce new helper udp_flow_src_port
>
> $ git format-patch HEAD~ -- tools/include/uapi/linux/bpf.h
> 0001-bpf-introduce-new-helper-udp_flow_src_port.patch
>
> $ grep -B1 changed 0001-bpf-introduce-new-helper-udp_flow_src_port.patch
> tools/include/uapi/linux/bpf.h | 21 +++++++++++++++++----
> 1 file changed, 17 insertions(+), 4 deletions(-)
>
> $ cd ../libbpf
> $ git am -p2 ../linux/0001-bpf-introduce-new-helper-udp_flow_src_port.patch
> Applying: bpf: introduce new helper udp_flow_src_port
> error: patch failed: include/uapi/linux/bpf.h:2853
> error: include/uapi/linux/bpf.h: patch does not apply
> ...
>
> Well, the patch doesn't apply to libbpf right now, but git finds the
> right paths and all that.
>
> IMO it'd be good to not have this artificial process obstacle and all
> the "sync headers" commits in the tree.
It might be the case that script can be written in some different way
to bypass this limitation, but someone has to dedicate time to write
it and test it. Feel free to contribute.
^ permalink raw reply
* Re: [PATCH] liquidio: Use pcie_flr() instead of reimplementing it
From: Bjorn Helgaas @ 2019-08-08 18:48 UTC (permalink / raw)
To: Denis Efremov
Cc: Bjorn Helgaas, Derek Chickles, Satanand Burla, Felix Manlunas,
netdev, linux-pci, linux-kernel
In-Reply-To: <20190808045753.5474-1-efremov@linux.com>
On Thu, Aug 08, 2019 at 07:57:53AM +0300, Denis Efremov wrote:
> octeon_mbox_process_cmd() directly writes the PCI_EXP_DEVCTL_BCR_FLR
> bit, which bypasses timing requirements imposed by the PCIe spec.
> This patch fixes the function to use the pcie_flr() interface instead.
>
> Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Thanks for doing this, Denis. When possible it's better to use a PCI
core interface than to fiddle with PCI config space directly from a
driver.
> ---
> drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> index 021d99cd1665..614d07be7181 100644
> --- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> @@ -260,9 +260,7 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
> dev_info(&oct->pci_dev->dev,
> "got a request for FLR from VF that owns DPI ring %u\n",
> mbox->q_no);
> - pcie_capability_set_word(
> - oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no],
> - PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
> + pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no]);
> break;
>
> case OCTEON_PF_CHANGED_VF_MACADDR:
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH bpf 2/2] tools: bpftool: add error message on pin failure
From: Andrii Nakryiko @ 2019-08-08 18:54 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, oss-drivers,
Andy Lutomirski, Quentin Monnet
In-Reply-To: <20190807001923.19483-3-jakub.kicinski@netronome.com>
On Tue, Aug 6, 2019 at 5:21 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> No error message is currently printed if the pin syscall
> itself fails. It got lost in the loadall refactoring.
>
> Fixes: 77380998d91d ("bpftool: add loadall command")
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
Acked-by: Andrii Nakryiko <andriin@fb.com>
> CC: luto@kernel.org, sdf@google.com
>
> tools/bpf/bpftool/common.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index c52a6ffb8949..6a71324be628 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -204,7 +204,11 @@ int do_pin_fd(int fd, const char *name)
> if (err)
> return err;
>
> - return bpf_obj_pin(fd, name);
> + err = bpf_obj_pin(fd, name);
> + if (err)
> + p_err("can't pin the object (%s): %s", name, strerror(errno));
> +
> + return err;
> }
>
> int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH net-next v1 1/8] netfilter: inlined four headers files into another one.
From: Jozsef Kadlecsik @ 2019-08-08 18:49 UTC (permalink / raw)
To: Jeremy Sowden
Cc: Pablo Neira Ayuso, Netfilter Devel, Net Dev, Masahiro Yamada
In-Reply-To: <20190807141705.4864-2-jeremy@azazel.net>
Hi Jeremy,
On Wed, 7 Aug 2019, Jeremy Sowden wrote:
> linux/netfilter/ipset/ip_set.h included four other header files:
>
> include/linux/netfilter/ipset/ip_set_comment.h
> include/linux/netfilter/ipset/ip_set_counter.h
> include/linux/netfilter/ipset/ip_set_skbinfo.h
> include/linux/netfilter/ipset/ip_set_timeout.h
>
> Of these the first three were not included anywhere else. The last,
> ip_set_timeout.h, was included in a couple of other places, but defined
> inline functions which call other inline functions defined in ip_set.h,
> so ip_set.h had to be included before it.
>
> Inlined all four into ip_set.h, and updated the other files that
> included ip_set_timeout.h.
>
> Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Also, for the ipset part of patch 2/8, thank!
Best regards,
Jozsef
> ---
> include/linux/netfilter/ipset/ip_set.h | 238 +++++++++++++++++-
> .../linux/netfilter/ipset/ip_set_comment.h | 73 ------
> .../linux/netfilter/ipset/ip_set_counter.h | 84 -------
> .../linux/netfilter/ipset/ip_set_skbinfo.h | 42 ----
> .../linux/netfilter/ipset/ip_set_timeout.h | 77 ------
> net/netfilter/ipset/ip_set_hash_gen.h | 2 +-
> net/netfilter/xt_set.c | 1 -
> 7 files changed, 235 insertions(+), 282 deletions(-)
> delete mode 100644 include/linux/netfilter/ipset/ip_set_comment.h
> delete mode 100644 include/linux/netfilter/ipset/ip_set_counter.h
> delete mode 100644 include/linux/netfilter/ipset/ip_set_skbinfo.h
> delete mode 100644 include/linux/netfilter/ipset/ip_set_timeout.h
>
> diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
> index 12ad9b1853b4..9bc255a8461b 100644
> --- a/include/linux/netfilter/ipset/ip_set.h
> +++ b/include/linux/netfilter/ipset/ip_set.h
> @@ -452,10 +452,240 @@ bitmap_bytes(u32 a, u32 b)
> return 4 * ((((b - a + 8) / 8) + 3) / 4);
> }
>
> -#include <linux/netfilter/ipset/ip_set_timeout.h>
> -#include <linux/netfilter/ipset/ip_set_comment.h>
> -#include <linux/netfilter/ipset/ip_set_counter.h>
> -#include <linux/netfilter/ipset/ip_set_skbinfo.h>
> +/* How often should the gc be run by default */
> +#define IPSET_GC_TIME (3 * 60)
> +
> +/* Timeout period depending on the timeout value of the given set */
> +#define IPSET_GC_PERIOD(timeout) \
> + ((timeout/3) ? min_t(u32, (timeout)/3, IPSET_GC_TIME) : 1)
> +
> +/* Entry is set with no timeout value */
> +#define IPSET_ELEM_PERMANENT 0
> +
> +/* Set is defined with timeout support: timeout value may be 0 */
> +#define IPSET_NO_TIMEOUT UINT_MAX
> +
> +/* Max timeout value, see msecs_to_jiffies() in jiffies.h */
> +#define IPSET_MAX_TIMEOUT (UINT_MAX >> 1)/MSEC_PER_SEC
> +
> +#define ip_set_adt_opt_timeout(opt, set) \
> +((opt)->ext.timeout != IPSET_NO_TIMEOUT ? (opt)->ext.timeout : (set)->timeout)
> +
> +static inline unsigned int
> +ip_set_timeout_uget(struct nlattr *tb)
> +{
> + unsigned int timeout = ip_set_get_h32(tb);
> +
> + /* Normalize to fit into jiffies */
> + if (timeout > IPSET_MAX_TIMEOUT)
> + timeout = IPSET_MAX_TIMEOUT;
> +
> + return timeout;
> +}
> +
> +static inline bool
> +ip_set_timeout_expired(const unsigned long *t)
> +{
> + return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
> +}
> +
> +static inline void
> +ip_set_timeout_set(unsigned long *timeout, u32 value)
> +{
> + unsigned long t;
> +
> + if (!value) {
> + *timeout = IPSET_ELEM_PERMANENT;
> + return;
> + }
> +
> + t = msecs_to_jiffies(value * MSEC_PER_SEC) + jiffies;
> + if (t == IPSET_ELEM_PERMANENT)
> + /* Bingo! :-) */
> + t--;
> + *timeout = t;
> +}
> +
> +static inline u32
> +ip_set_timeout_get(const unsigned long *timeout)
> +{
> + u32 t;
> +
> + if (*timeout == IPSET_ELEM_PERMANENT)
> + return 0;
> +
> + t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
> + /* Zero value in userspace means no timeout */
> + return t == 0 ? 1 : t;
> +}
> +
> +static inline char*
> +ip_set_comment_uget(struct nlattr *tb)
> +{
> + return nla_data(tb);
> +}
> +
> +/* Called from uadd only, protected by the set spinlock.
> + * The kadt functions don't use the comment extensions in any way.
> + */
> +static inline void
> +ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> + const struct ip_set_ext *ext)
> +{
> + struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
> + size_t len = ext->comment ? strlen(ext->comment) : 0;
> +
> + if (unlikely(c)) {
> + set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> + kfree_rcu(c, rcu);
> + rcu_assign_pointer(comment->c, NULL);
> + }
> + if (!len)
> + return;
> + if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
> + len = IPSET_MAX_COMMENT_SIZE;
> + c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
> + if (unlikely(!c))
> + return;
> + strlcpy(c->str, ext->comment, len + 1);
> + set->ext_size += sizeof(*c) + strlen(c->str) + 1;
> + rcu_assign_pointer(comment->c, c);
> +}
> +
> +/* Used only when dumping a set, protected by rcu_read_lock() */
> +static inline int
> +ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
> +{
> + struct ip_set_comment_rcu *c = rcu_dereference(comment->c);
> +
> + if (!c)
> + return 0;
> + return nla_put_string(skb, IPSET_ATTR_COMMENT, c->str);
> +}
> +
> +/* Called from uadd/udel, flush or the garbage collectors protected
> + * by the set spinlock.
> + * Called when the set is destroyed and when there can't be any user
> + * of the set data anymore.
> + */
> +static inline void
> +ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
> +{
> + struct ip_set_comment_rcu *c;
> +
> + c = rcu_dereference_protected(comment->c, 1);
> + if (unlikely(!c))
> + return;
> + set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> + kfree_rcu(c, rcu);
> + rcu_assign_pointer(comment->c, NULL);
> +}
> +
> +static inline void
> +ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
> +{
> + atomic64_add((long long)bytes, &(counter)->bytes);
> +}
> +
> +static inline void
> +ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
> +{
> + atomic64_add((long long)packets, &(counter)->packets);
> +}
> +
> +static inline u64
> +ip_set_get_bytes(const struct ip_set_counter *counter)
> +{
> + return (u64)atomic64_read(&(counter)->bytes);
> +}
> +
> +static inline u64
> +ip_set_get_packets(const struct ip_set_counter *counter)
> +{
> + return (u64)atomic64_read(&(counter)->packets);
> +}
> +
> +static inline bool
> +ip_set_match_counter(u64 counter, u64 match, u8 op)
> +{
> + switch (op) {
> + case IPSET_COUNTER_NONE:
> + return true;
> + case IPSET_COUNTER_EQ:
> + return counter == match;
> + case IPSET_COUNTER_NE:
> + return counter != match;
> + case IPSET_COUNTER_LT:
> + return counter < match;
> + case IPSET_COUNTER_GT:
> + return counter > match;
> + }
> + return false;
> +}
> +
> +static inline void
> +ip_set_update_counter(struct ip_set_counter *counter,
> + const struct ip_set_ext *ext, u32 flags)
> +{
> + if (ext->packets != ULLONG_MAX &&
> + !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
> + ip_set_add_bytes(ext->bytes, counter);
> + ip_set_add_packets(ext->packets, counter);
> + }
> +}
> +
> +static inline bool
> +ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
> +{
> + return nla_put_net64(skb, IPSET_ATTR_BYTES,
> + cpu_to_be64(ip_set_get_bytes(counter)),
> + IPSET_ATTR_PAD) ||
> + nla_put_net64(skb, IPSET_ATTR_PACKETS,
> + cpu_to_be64(ip_set_get_packets(counter)),
> + IPSET_ATTR_PAD);
> +}
> +
> +static inline void
> +ip_set_init_counter(struct ip_set_counter *counter,
> + const struct ip_set_ext *ext)
> +{
> + if (ext->bytes != ULLONG_MAX)
> + atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
> + if (ext->packets != ULLONG_MAX)
> + atomic64_set(&(counter)->packets, (long long)(ext->packets));
> +}
> +
> +static inline void
> +ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
> + const struct ip_set_ext *ext,
> + struct ip_set_ext *mext, u32 flags)
> +{
> + mext->skbinfo = *skbinfo;
> +}
> +
> +static inline bool
> +ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
> +{
> + /* Send nonzero parameters only */
> + return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
> + nla_put_net64(skb, IPSET_ATTR_SKBMARK,
> + cpu_to_be64((u64)skbinfo->skbmark << 32 |
> + skbinfo->skbmarkmask),
> + IPSET_ATTR_PAD)) ||
> + (skbinfo->skbprio &&
> + nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
> + cpu_to_be32(skbinfo->skbprio))) ||
> + (skbinfo->skbqueue &&
> + nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
> + cpu_to_be16(skbinfo->skbqueue)));
> +}
> +
> +static inline void
> +ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
> + const struct ip_set_ext *ext)
> +{
> + *skbinfo = ext->skbinfo;
> +}
>
> #define IP_SET_INIT_KEXT(skb, opt, set) \
> { .bytes = (skb)->len, .packets = 1, \
> diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
> deleted file mode 100644
> index 0b894d81bbf2..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_comment.h
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_COMMENT_H
> -#define _IP_SET_COMMENT_H
> -
> -/* Copyright (C) 2013 Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
> - */
> -
> -#ifdef __KERNEL__
> -
> -static inline char*
> -ip_set_comment_uget(struct nlattr *tb)
> -{
> - return nla_data(tb);
> -}
> -
> -/* Called from uadd only, protected by the set spinlock.
> - * The kadt functions don't use the comment extensions in any way.
> - */
> -static inline void
> -ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> - const struct ip_set_ext *ext)
> -{
> - struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
> - size_t len = ext->comment ? strlen(ext->comment) : 0;
> -
> - if (unlikely(c)) {
> - set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> - kfree_rcu(c, rcu);
> - rcu_assign_pointer(comment->c, NULL);
> - }
> - if (!len)
> - return;
> - if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
> - len = IPSET_MAX_COMMENT_SIZE;
> - c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
> - if (unlikely(!c))
> - return;
> - strlcpy(c->str, ext->comment, len + 1);
> - set->ext_size += sizeof(*c) + strlen(c->str) + 1;
> - rcu_assign_pointer(comment->c, c);
> -}
> -
> -/* Used only when dumping a set, protected by rcu_read_lock() */
> -static inline int
> -ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
> -{
> - struct ip_set_comment_rcu *c = rcu_dereference(comment->c);
> -
> - if (!c)
> - return 0;
> - return nla_put_string(skb, IPSET_ATTR_COMMENT, c->str);
> -}
> -
> -/* Called from uadd/udel, flush or the garbage collectors protected
> - * by the set spinlock.
> - * Called when the set is destroyed and when there can't be any user
> - * of the set data anymore.
> - */
> -static inline void
> -ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
> -{
> - struct ip_set_comment_rcu *c;
> -
> - c = rcu_dereference_protected(comment->c, 1);
> - if (unlikely(!c))
> - return;
> - set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> - kfree_rcu(c, rcu);
> - rcu_assign_pointer(comment->c, NULL);
> -}
> -
> -#endif
> -#endif
> diff --git a/include/linux/netfilter/ipset/ip_set_counter.h b/include/linux/netfilter/ipset/ip_set_counter.h
> deleted file mode 100644
> index 3400958c07be..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_counter.h
> +++ /dev/null
> @@ -1,84 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_COUNTER_H
> -#define _IP_SET_COUNTER_H
> -
> -/* Copyright (C) 2015 Jozsef Kadlecsik <kadlec@netfilter.org> */
> -
> -#ifdef __KERNEL__
> -
> -static inline void
> -ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
> -{
> - atomic64_add((long long)bytes, &(counter)->bytes);
> -}
> -
> -static inline void
> -ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
> -{
> - atomic64_add((long long)packets, &(counter)->packets);
> -}
> -
> -static inline u64
> -ip_set_get_bytes(const struct ip_set_counter *counter)
> -{
> - return (u64)atomic64_read(&(counter)->bytes);
> -}
> -
> -static inline u64
> -ip_set_get_packets(const struct ip_set_counter *counter)
> -{
> - return (u64)atomic64_read(&(counter)->packets);
> -}
> -
> -static inline bool
> -ip_set_match_counter(u64 counter, u64 match, u8 op)
> -{
> - switch (op) {
> - case IPSET_COUNTER_NONE:
> - return true;
> - case IPSET_COUNTER_EQ:
> - return counter == match;
> - case IPSET_COUNTER_NE:
> - return counter != match;
> - case IPSET_COUNTER_LT:
> - return counter < match;
> - case IPSET_COUNTER_GT:
> - return counter > match;
> - }
> - return false;
> -}
> -
> -static inline void
> -ip_set_update_counter(struct ip_set_counter *counter,
> - const struct ip_set_ext *ext, u32 flags)
> -{
> - if (ext->packets != ULLONG_MAX &&
> - !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
> - ip_set_add_bytes(ext->bytes, counter);
> - ip_set_add_packets(ext->packets, counter);
> - }
> -}
> -
> -static inline bool
> -ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
> -{
> - return nla_put_net64(skb, IPSET_ATTR_BYTES,
> - cpu_to_be64(ip_set_get_bytes(counter)),
> - IPSET_ATTR_PAD) ||
> - nla_put_net64(skb, IPSET_ATTR_PACKETS,
> - cpu_to_be64(ip_set_get_packets(counter)),
> - IPSET_ATTR_PAD);
> -}
> -
> -static inline void
> -ip_set_init_counter(struct ip_set_counter *counter,
> - const struct ip_set_ext *ext)
> -{
> - if (ext->bytes != ULLONG_MAX)
> - atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
> - if (ext->packets != ULLONG_MAX)
> - atomic64_set(&(counter)->packets, (long long)(ext->packets));
> -}
> -
> -#endif /* __KERNEL__ */
> -#endif /* _IP_SET_COUNTER_H */
> diff --git a/include/linux/netfilter/ipset/ip_set_skbinfo.h b/include/linux/netfilter/ipset/ip_set_skbinfo.h
> deleted file mode 100644
> index 3a2df02dbd55..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_skbinfo.h
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_SKBINFO_H
> -#define _IP_SET_SKBINFO_H
> -
> -/* Copyright (C) 2015 Jozsef Kadlecsik <kadlec@netfilter.org> */
> -
> -#ifdef __KERNEL__
> -
> -static inline void
> -ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
> - const struct ip_set_ext *ext,
> - struct ip_set_ext *mext, u32 flags)
> -{
> - mext->skbinfo = *skbinfo;
> -}
> -
> -static inline bool
> -ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
> -{
> - /* Send nonzero parameters only */
> - return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
> - nla_put_net64(skb, IPSET_ATTR_SKBMARK,
> - cpu_to_be64((u64)skbinfo->skbmark << 32 |
> - skbinfo->skbmarkmask),
> - IPSET_ATTR_PAD)) ||
> - (skbinfo->skbprio &&
> - nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
> - cpu_to_be32(skbinfo->skbprio))) ||
> - (skbinfo->skbqueue &&
> - nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
> - cpu_to_be16(skbinfo->skbqueue)));
> -}
> -
> -static inline void
> -ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
> - const struct ip_set_ext *ext)
> -{
> - *skbinfo = ext->skbinfo;
> -}
> -
> -#endif /* __KERNEL__ */
> -#endif /* _IP_SET_SKBINFO_H */
> diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h
> deleted file mode 100644
> index 2be60e379ecf..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_timeout.h
> +++ /dev/null
> @@ -1,77 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_TIMEOUT_H
> -#define _IP_SET_TIMEOUT_H
> -
> -/* Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@netfilter.org> */
> -
> -#ifdef __KERNEL__
> -
> -/* How often should the gc be run by default */
> -#define IPSET_GC_TIME (3 * 60)
> -
> -/* Timeout period depending on the timeout value of the given set */
> -#define IPSET_GC_PERIOD(timeout) \
> - ((timeout/3) ? min_t(u32, (timeout)/3, IPSET_GC_TIME) : 1)
> -
> -/* Entry is set with no timeout value */
> -#define IPSET_ELEM_PERMANENT 0
> -
> -/* Set is defined with timeout support: timeout value may be 0 */
> -#define IPSET_NO_TIMEOUT UINT_MAX
> -
> -/* Max timeout value, see msecs_to_jiffies() in jiffies.h */
> -#define IPSET_MAX_TIMEOUT (UINT_MAX >> 1)/MSEC_PER_SEC
> -
> -#define ip_set_adt_opt_timeout(opt, set) \
> -((opt)->ext.timeout != IPSET_NO_TIMEOUT ? (opt)->ext.timeout : (set)->timeout)
> -
> -static inline unsigned int
> -ip_set_timeout_uget(struct nlattr *tb)
> -{
> - unsigned int timeout = ip_set_get_h32(tb);
> -
> - /* Normalize to fit into jiffies */
> - if (timeout > IPSET_MAX_TIMEOUT)
> - timeout = IPSET_MAX_TIMEOUT;
> -
> - return timeout;
> -}
> -
> -static inline bool
> -ip_set_timeout_expired(const unsigned long *t)
> -{
> - return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
> -}
> -
> -static inline void
> -ip_set_timeout_set(unsigned long *timeout, u32 value)
> -{
> - unsigned long t;
> -
> - if (!value) {
> - *timeout = IPSET_ELEM_PERMANENT;
> - return;
> - }
> -
> - t = msecs_to_jiffies(value * MSEC_PER_SEC) + jiffies;
> - if (t == IPSET_ELEM_PERMANENT)
> - /* Bingo! :-) */
> - t--;
> - *timeout = t;
> -}
> -
> -static inline u32
> -ip_set_timeout_get(const unsigned long *timeout)
> -{
> - u32 t;
> -
> - if (*timeout == IPSET_ELEM_PERMANENT)
> - return 0;
> -
> - t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
> - /* Zero value in userspace means no timeout */
> - return t == 0 ? 1 : t;
> -}
> -
> -#endif /* __KERNEL__ */
> -#endif /* _IP_SET_TIMEOUT_H */
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 0feb77fa9edc..2e541cb3b37d 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -7,7 +7,7 @@
> #include <linux/rcupdate.h>
> #include <linux/jhash.h>
> #include <linux/types.h>
> -#include <linux/netfilter/ipset/ip_set_timeout.h>
> +#include <linux/netfilter/ipset/ip_set.h>
>
> #define __ipset_dereference_protected(p, c) rcu_dereference_protected(p, c)
> #define ipset_dereference_protected(p, set) \
> diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
> index ecbfa291fb70..731bc2cafae4 100644
> --- a/net/netfilter/xt_set.c
> +++ b/net/netfilter/xt_set.c
> @@ -14,7 +14,6 @@
>
> #include <linux/netfilter/x_tables.h>
> #include <linux/netfilter/ipset/ip_set.h>
> -#include <linux/netfilter/ipset/ip_set_timeout.h>
> #include <uapi/linux/netfilter/xt_set.h>
>
> MODULE_LICENSE("GPL");
> --
> 2.20.1
>
>
-
E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
H-1525 Budapest 114, POB. 49, Hungary
^ permalink raw reply
* [PATCH net-next 0/3] net: phy: realtek: add support for integrated 2.5Gbps PHY in RTL8125
From: Heiner Kallweit @ 2019-08-08 19:02 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
This series adds support for the integrated 2.5Gbps PHY in RTL8125.
First two patches add necessary functionality to phylib.
Heiner Kallweit (3):
net: phy: prepare phylib to deal with PHY's extending Clause 22
net: phy: add phy_modify_paged_changed
net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125
drivers/net/phy/phy-core.c | 29 ++++++++++++++++++----
drivers/net/phy/phy_device.c | 35 +++++++++++---------------
drivers/net/phy/realtek.c | 48 ++++++++++++++++++++++++++++++++++++
include/linux/phy.h | 10 +++++++-
4 files changed, 96 insertions(+), 26 deletions(-)
--
2.22.0
^ permalink raw reply
* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Tao Ren @ 2019-08-08 19:02 UTC (permalink / raw)
To: Andrew Lunn
Cc: Jakub Kicinski, Samuel Mendoza-Jonas, David S . Miller,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
openbmc@lists.ozlabs.org, William Kennington, Joel Stanley
In-Reply-To: <20190808133209.GB32706@lunn.ch>
Hi Andrew,
On 8/8/19 6:32 AM, Andrew Lunn wrote:
>> Let me prepare patch v2 using device tree. I'm not sure if standard
>> "mac-address" fits this situation because all we need is an offset
>> (integer) and BMC MAC is calculated by adding the offset to NIC's
>> MAC address. Anyways, let me work out v2 patch we can discuss more
>> then.
>
> Hi Tao
>
> I don't know BMC terminology. By NICs MAC address, you are referring
> to the hosts MAC address? The MAC address the big CPU is using for its
> interface? Where does this NIC get its MAC address from? If the BMCs
> bootloader has access to it, it can set the mac-address property in
> the device tree.
Sorry for the confusion and let me clarify more:
The NIC here refers to the Network controller which provide network connectivity for both BMC (via NC-SI) and Host (for example, via PCIe).
On Facebook Yamp BMC, BMC sends NCSI_OEM_GET_MAC command (as an ethernet packet) to the Network Controller while bringing up eth0, and the (Broadcom) Network Controller replies with the Base MAC Address reserved for the platform. As for Yamp, Base-MAC and Base-MAC+1 are used by Host (big CPU) and Base-MAC+2 are assigned to BMC. In my opinion, Base MAC and MAC address assignments are controlled by Network Controller, which is transparent to both BMC and Host.
I'm not sure if I understand your suggestion correctly: do you mean we should move the logic (GET_MAC from Network Controller, adding offset and configuring BMC MAC) from kernel to boot loader?
Sam posted several ncsi patches for u-boot recently. Sam, do we support the work (implemented in this patch) in uboot? Or are we planning to do so?
Thanks,
Tao
^ permalink raw reply
* [PATCH net-next 3/3] net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125
From: Heiner Kallweit @ 2019-08-08 19:05 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <ddbf28b9-f32e-7399-10a6-27b79ca0aaf9@gmail.com>
This adds support for the integrated 2.5Gbps PHY in Realtek RTL8125.
Advertisement of 2.5Gbps mode is done via a vendor-specific register.
Same applies to reading NBase-T link partner advertisement.
Unfortunately this 2.5Gbps PHY shares the PHY ID with the integrated
1Gbps PHY's in other Realtek network chips and so far no method is
known to differentiate them. This means we can't autodetect 2.5Gbps
support and the network driver has to add 2.5Gbps to the supported
and advertised modes.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/phy/realtek.c | 48 +++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index a669945eb..35c981476 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -39,6 +39,11 @@
#define RTL8366RB_POWER_SAVE 0x15
#define RTL8366RB_POWER_SAVE_ON BIT(12)
+#define RTL_ADV_2500FULL BIT(7)
+#define RTL_LPADV_10000FULL BIT(11)
+#define RTL_LPADV_5000FULL BIT(6)
+#define RTL_LPADV_2500FULL BIT(5)
+
MODULE_DESCRIPTION("Realtek PHY driver");
MODULE_AUTHOR("Johnson Leung");
MODULE_LICENSE("GPL");
@@ -256,6 +261,47 @@ static int rtl8366rb_config_init(struct phy_device *phydev)
return ret;
}
+static int rtlgen_config_aneg(struct phy_device *phydev)
+{
+ int ret = 0;
+
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+ phydev->supported) && phydev->autoneg == AUTONEG_ENABLE) {
+ u16 adv2500 = 0;
+
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+ phydev->advertising))
+ adv2500 = RTL_ADV_2500FULL;
+
+ ret = phy_modify_paged_changed(phydev, 0xa5d, 0x12,
+ RTL_ADV_2500FULL, adv2500);
+ if (ret < 0)
+ return ret;
+ }
+
+ return __genphy_config_aneg(phydev, ret);
+}
+
+static int rtlgen_read_status(struct phy_device *phydev)
+{
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+ phydev->supported) && phydev->autoneg == AUTONEG_ENABLE) {
+ int lpadv = phy_read_paged(phydev, 0xa5d, 0x13);
+
+ if (lpadv < 0)
+ return lpadv;
+
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_10000baseT_Full_BIT,
+ phydev->lp_advertising, lpadv & RTL_LPADV_10000FULL);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
+ phydev->lp_advertising, lpadv & RTL_LPADV_5000FULL);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+ phydev->lp_advertising, lpadv & RTL_LPADV_2500FULL);
+ }
+
+ return genphy_read_status(phydev);
+}
+
static struct phy_driver realtek_drvs[] = {
{
PHY_ID_MATCH_EXACT(0x00008201),
@@ -328,6 +374,8 @@ static struct phy_driver realtek_drvs[] = {
}, {
PHY_ID_MATCH_EXACT(0x001cc800),
.name = "Generic Realtek PHY",
+ .config_aneg = rtlgen_config_aneg,
+ .read_status = rtlgen_read_status,
.suspend = genphy_suspend,
.resume = genphy_resume,
.read_page = rtl821x_read_page,
--
2.22.0
^ permalink raw reply related
* [PATCH net-next 1/3] net: phy: prepare phylib to deal with PHY's extending Clause 22
From: Heiner Kallweit @ 2019-08-08 19:03 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <ddbf28b9-f32e-7399-10a6-27b79ca0aaf9@gmail.com>
The integrated PHY in 2.5Gbps chip RTL8125 is the first (known to me)
PHY that uses standard Clause 22 for all modes up to 1Gbps and adds
2.5Gbps control using vendor-specific registers. To use phylib for
the standard part little extensions are needed:
- Move most of genphy_config_aneg to a new function
__genphy_config_aneg that takes a parameter whether restarting
auto-negotiation is needed (depending on whether content of
vendor-specific advertisement register changed).
- Don't clear phydev->lp_advertising in genphy_read_status so that
we can set non-C22 mode flags before.
Basically both changes mimic the behavior of the equivalent Clause 45
functions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/phy/phy_device.c | 35 +++++++++++++++--------------------
include/linux/phy.h | 8 +++++++-
2 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7ddd91df9..bd7e7db8c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1571,11 +1571,9 @@ static int genphy_config_advert(struct phy_device *phydev)
/* Only allow advertising what this PHY supports */
linkmode_and(phydev->advertising, phydev->advertising,
phydev->supported);
- if (!ethtool_convert_link_mode_to_legacy_u32(&advertise,
- phydev->advertising))
- phydev_warn(phydev, "PHY advertising (%*pb) more modes than genphy supports, some modes not advertised.\n",
- __ETHTOOL_LINK_MODE_MASK_NBITS,
- phydev->advertising);
+
+ ethtool_convert_link_mode_to_legacy_u32(&advertise,
+ phydev->advertising);
/* Setup standard advertisement */
err = phy_modify_changed(phydev, MII_ADVERTISE,
@@ -1681,18 +1679,20 @@ int genphy_restart_aneg(struct phy_device *phydev)
EXPORT_SYMBOL(genphy_restart_aneg);
/**
- * genphy_config_aneg - restart auto-negotiation or write BMCR
+ * __genphy_config_aneg - restart auto-negotiation or write BMCR
* @phydev: target phy_device struct
+ * @changed: whether autoneg is requested
*
* Description: If auto-negotiation is enabled, we configure the
* advertising, and then restart auto-negotiation. If it is not
* enabled, then we write the BMCR.
*/
-int genphy_config_aneg(struct phy_device *phydev)
+int __genphy_config_aneg(struct phy_device *phydev, bool changed)
{
- int err, changed;
+ int err;
- changed = genphy_config_eee_advert(phydev);
+ if (genphy_config_eee_advert(phydev))
+ changed = true;
if (AUTONEG_ENABLE != phydev->autoneg)
return genphy_setup_forced(phydev);
@@ -1700,10 +1700,10 @@ int genphy_config_aneg(struct phy_device *phydev)
err = genphy_config_advert(phydev);
if (err < 0) /* error */
return err;
+ else if (err)
+ changed = true;
- changed |= err;
-
- if (changed == 0) {
+ if (!changed) {
/* Advertisement hasn't changed, but maybe aneg was never on to
* begin with? Or maybe phy was isolated?
*/
@@ -1713,18 +1713,15 @@ int genphy_config_aneg(struct phy_device *phydev)
return ctl;
if (!(ctl & BMCR_ANENABLE) || (ctl & BMCR_ISOLATE))
- changed = 1; /* do restart aneg */
+ changed = true; /* do restart aneg */
}
/* Only restart aneg if we are advertising something different
* than we were before.
*/
- if (changed > 0)
- return genphy_restart_aneg(phydev);
-
- return 0;
+ return changed ? genphy_restart_aneg(phydev) : 0;
}
-EXPORT_SYMBOL(genphy_config_aneg);
+EXPORT_SYMBOL(__genphy_config_aneg);
/**
* genphy_aneg_done - return auto-negotiation status
@@ -1811,8 +1808,6 @@ int genphy_read_status(struct phy_device *phydev)
phydev->pause = 0;
phydev->asym_pause = 0;
- linkmode_zero(phydev->lp_advertising);
-
if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
if (phydev->is_gigabit_capable) {
lpagb = phy_read(phydev, MII_STAT1000);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73..7117825ee 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1069,7 +1069,7 @@ int genphy_read_abilities(struct phy_device *phydev);
int genphy_setup_forced(struct phy_device *phydev);
int genphy_restart_aneg(struct phy_device *phydev);
int genphy_config_eee_advert(struct phy_device *phydev);
-int genphy_config_aneg(struct phy_device *phydev);
+int __genphy_config_aneg(struct phy_device *phydev, bool changed);
int genphy_aneg_done(struct phy_device *phydev);
int genphy_update_link(struct phy_device *phydev);
int genphy_read_status(struct phy_device *phydev);
@@ -1077,6 +1077,12 @@ int genphy_suspend(struct phy_device *phydev);
int genphy_resume(struct phy_device *phydev);
int genphy_loopback(struct phy_device *phydev, bool enable);
int genphy_soft_reset(struct phy_device *phydev);
+
+static inline int genphy_config_aneg(struct phy_device *phydev)
+{
+ return __genphy_config_aneg(phydev, false);
+}
+
static inline int genphy_no_soft_reset(struct phy_device *phydev)
{
return 0;
--
2.22.0
^ permalink raw reply related
* [PATCH net-next 2/3] net: phy: add phy_modify_paged_changed
From: Heiner Kallweit @ 2019-08-08 19:04 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <ddbf28b9-f32e-7399-10a6-27b79ca0aaf9@gmail.com>
Add helper function phy_modify_paged_changed, behavior is the same
as for phy_modify_changed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/phy/phy-core.c | 29 ++++++++++++++++++++++++-----
include/linux/phy.h | 2 ++
2 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 16667fbac..9ae3abb2d 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -783,24 +783,43 @@ int phy_write_paged(struct phy_device *phydev, int page, u32 regnum, u16 val)
EXPORT_SYMBOL(phy_write_paged);
/**
- * phy_modify_paged() - Convenience function for modifying a paged register
+ * phy_modify_paged_changed() - Function for modifying a paged register
* @phydev: a pointer to a &struct phy_device
* @page: the page for the phy
* @regnum: register number
* @mask: bit mask of bits to clear
* @set: bit mask of bits to set
*
- * Same rules as for phy_read() and phy_write().
+ * Returns negative errno, 0 if there was no change, and 1 in case of change
*/
-int phy_modify_paged(struct phy_device *phydev, int page, u32 regnum,
- u16 mask, u16 set)
+int phy_modify_paged_changed(struct phy_device *phydev, int page, u32 regnum,
+ u16 mask, u16 set)
{
int ret = 0, oldpage;
oldpage = phy_select_page(phydev, page);
if (oldpage >= 0)
- ret = __phy_modify(phydev, regnum, mask, set);
+ ret = __phy_modify_changed(phydev, regnum, mask, set);
return phy_restore_page(phydev, oldpage, ret);
}
+EXPORT_SYMBOL(phy_modify_paged_changed);
+
+/**
+ * phy_modify_paged() - Convenience function for modifying a paged register
+ * @phydev: a pointer to a &struct phy_device
+ * @page: the page for the phy
+ * @regnum: register number
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ *
+ * Same rules as for phy_read() and phy_write().
+ */
+int phy_modify_paged(struct phy_device *phydev, int page, u32 regnum,
+ u16 mask, u16 set)
+{
+ int ret = phy_modify_paged_changed(phydev, page, regnum, mask, set);
+
+ return ret < 0 ? ret : 0;
+}
EXPORT_SYMBOL(phy_modify_paged);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7117825ee..781f4810c 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -984,6 +984,8 @@ int phy_select_page(struct phy_device *phydev, int page);
int phy_restore_page(struct phy_device *phydev, int oldpage, int ret);
int phy_read_paged(struct phy_device *phydev, int page, u32 regnum);
int phy_write_paged(struct phy_device *phydev, int page, u32 regnum, u16 val);
+int phy_modify_paged_changed(struct phy_device *phydev, int page, u32 regnum,
+ u16 mask, u16 set);
int phy_modify_paged(struct phy_device *phydev, int page, u32 regnum,
u16 mask, u16 set);
--
2.22.0
^ permalink raw reply related
* Re: [bpf-next v3 PATCH 1/3] samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
From: Toke Høiland-Jørgensen @ 2019-08-08 19:16 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
Alexei Starovoitov
Cc: a.s.protopopov, dsahern, ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528105763.22124.16079929264188823516.stgit@firesoul>
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
> which only have a single TX port. Change name to xdp_tx_ports
> to make it more descriptive.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>
> Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
^ permalink raw reply
* Re: [bpf-next v3 PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: Toke Høiland-Jørgensen @ 2019-08-08 19:17 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
Alexei Starovoitov
Cc: a.s.protopopov, dsahern, ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528106270.22124.2563148023961869582.stgit@firesoul>
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
> that the chosen egress index should be checked for existence in the
> devmap. This can now be done via taking advantage of Toke's work in
> commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF").
>
> This change makes xdp_fwd more practically usable, as this allows for
> a mixed environment, where IP-forwarding fallback to network stack, if
> the egress device isn't configured to use XDP.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
^ permalink raw reply
* Re: [bonding][patch] Regarding a bonding lacp issue
From: Jay Vosburgh @ 2019-08-08 19:17 UTC (permalink / raw)
To: Felix; +Cc: vfalico, andy, netdev, linux-kernel
In-Reply-To: <8799b243-36da-4baf-8c67-aeb5f978c34f.fei.feng@linux.alibaba.com>
Felix <fei.feng@linux.alibaba.com> wrote:
>Dear Mainteners,
>
>Recently I hit a packet drop issue in bonding driver on Linux 4.9. Please
>see details below. Please take a look to see if my understanding is
>correct. Many thanks.
>
>What is the problem?
>The bonding driver starts to send packets even if the Partner(Switch)'s
>Collecting bit is not enabled yet. Partner would drop all packets until
>its Collecting bit is enabled.
>
>What is the root cuase?
>According to LACP spec, the Actor need to check Partner's Sync and
>Collecting bits before enable its Distributing bit and Distributing
>function. Please see the PIC below.
The diagram you reference is found in 802.1AX-2014 figure 6-21,
which shows the state diagram for an independent control implementation,
i.e., collecting and distributing are managed independently.
However, Linux bonding implements coupled control, which is
shown in figure 6-22. Here, there is no Partner.Collecting requirement
on the state transition from ATTACHED to COLLECTING_DISTRIBUTING.
To quote 802.1AX-2014 6.4.15:
As independent control is not possible, the coupled control
state machine does not wait for the Partner to signal that
collection has started before enabling both collection and
distribution.
Now, that said, I agree that what you're seeing is likely
explained by this behavior, and your fix should resolve the immediate
problem (that bonding sends packets before the peer has enabled
COLLECTING).
However, your fix does put bonding out of compliance with the
standard, as it does not really implement COLLECTING and DISTRIBUTING as
discrete states. In particular, if the peer in your case were to later
clear Partner.Collecting, bonding will not react to this as a figure
6-21 independent control implementation would (which isn't a change from
current behavior, but currently this isn't expected).
So, in my opinion a patch like this should have a comment
attached noting that we are deliberately not in compliance with the
standard in this specific situation. The proper fix is to implement
figure 6-21 separate state.
Lastly, are you able to test and generate a patch against
current upstream, instead of 4.9?
-J
>How to fix?
>Please see the diff as following. And the patch is attached.
>
>--- ../origin/linux-4.9.188/drivers/net/bonding/bond_3ad.c 2019-08-07
>00:29:42.000000000 +0800
>+++ drivers/net/bonding/bond_3ad.c 2019-08-08 23:13:29.015640197 +0800
>@@ -937,6 +937,7 @@
> */
> if ((port->sm_vars & AD_PORT_SELECTED) &&
> (port->partner_oper.port_state & AD_STATE_SYNCHRONIZATION) &&
>+ (port->partner_oper.port_state & AD_STATE_COLLECTING) &&
> !__check_agg_selection_timer(port)) {
> if (port->aggregator->is_active)
> port->sm_mux_state =
>
>------
>Thanks,
>Felix
---
-Jay Vosburgh, jay.vosburgh@canonical.com
^ permalink raw reply
* Re: [bpf-next v3 PATCH 3/3] samples/bpf: xdp_fwd explain bpf_fib_lookup return codes
From: Toke Høiland-Jørgensen @ 2019-08-08 19:17 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
Alexei Starovoitov
Cc: a.s.protopopov, dsahern, ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528106777.22124.12162740342925045912.stgit@firesoul>
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> Make it clear that this XDP program depend on the network
> stack to do the ARP resolution. This is connected with the
> BPF_FIB_LKUP_RET_NO_NEIGH return code from bpf_fib_lookup().
>
> Another common mistake (seen via XDP-tutorial) is that users
> don't realize that sysctl net.ipv{4,6}.conf.all.forwarding
> setting is honored by bpf_fib_lookup.
>
> Reported-by: Anton Protopopov <a.s.protopopov@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>
> Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox