* configurable block error injection v5
From: Christoph Hellwig @ 2026-06-11 14:06 UTC (permalink / raw)
To: Jens Axboe
Cc: Jonathan Corbet, Damien Le Moal, Hannes Reinecke, Keith Busch,
linux-block, linux-doc
Hi all,
this series adds a new configurable block error injection facility.
We already have a few ways to inject block errors, but unfortunately most
of them are either not very useful or hard to use, or both:
- The fail_make_request failure injection point can't distinguish
different commands, different ranges in the file and can only inject
plain I/O errors.
- the should_fail_bio 'dynamic' failure injection has all the same issues
as fail_make_request
- dm-error can only fail all commands in the table using BLK_STS_IOERR
and requires setting up a new block device
- dm-flakey and dm-dust allow all kinds of configurability, but still
don't have good error selection, no good support for non-read/write
commands and are limited to the dm table alignment requirements,
which for zoned devices enforces setting them up for an entire zone.
They also once again require setting up a stacked block device,
which is really annoying in harnesses like xfstests
This series adds a new debugfs-based block layer error injection
facility that lets you configure which operations and ranges the
injection applies to, and what status to return. It also lets you
configure a failure ratio similar to the xfs errortag injection.
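To make the matching described above concrete, here is a minimal userspace
sketch of a rule that selects an operation, a sector range, a status and a
failure ratio. Every name in it (inj_rule, inj_check, the enum values) is
hypothetical and not taken from the series, and a deterministic one-in-N
counter stands in for whatever ratio mechanism the real code uses.

```c
#include <stdint.h>

/* Hypothetical rule layout; the series' real data structures may differ. */
enum inj_op { INJ_OP_READ, INJ_OP_WRITE, INJ_OP_FLUSH };

struct inj_rule {
	enum inj_op op;		/* operation the rule applies to */
	uint64_t start;		/* first sector covered by the rule */
	uint64_t nr_sectors;	/* length of the covered range */
	int status;		/* status to return when the rule fires */
	unsigned int one_in;	/* fire on every one_in-th match; 1 = always */
	unsigned int hits;	/* number of matches seen so far */
};

/* Return the status to inject, or 0 if the I/O should proceed normally. */
static int inj_check(struct inj_rule *r, enum inj_op op,
		     uint64_t sector, uint64_t nr_sectors)
{
	if (op != r->op)
		return 0;
	/* Half-open interval overlap test between the I/O and the rule. */
	if (sector >= r->start + r->nr_sectors ||
	    sector + nr_sectors <= r->start)
		return 0;
	/* Only count I/Os that matched both the operation and the range. */
	if (r->one_in == 0 || ++r->hits % r->one_in != 0)
		return 0;
	return r->status;
}
```

With one_in set to 2, the first matching write passes and the second one
gets the configured status; I/Os outside the range never advance the counter.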
Changes since v4:
- don't unlock in removeall to avoid a race between removeall and setup
- document why we can't match 0-sized bios
Changes since v3:
- use a static branch to guard the new condition
- split out a new header so that jump_label.h doesn't get pulled into
blk.h
- more checking for impossible conditions in blk_status_to_tag
- more spelling fixes
Changes since v2:
- improve the documentation a bit
- fix a spelling mistake in a comment
Changes since v1:
- drop the should_fail_bio removal and cleanup depending on it, as it's
used by eBPF programs and thus a hidden UABI.
- as a result split the code out to its own Kconfig symbol
- various error handling fixes pointed out by Keith
- documentation spelling fixes pointed out by Randy
Diffstat:
Documentation/block/error-injection.rst | 59 +++++
Documentation/block/index.rst | 1
block/Kconfig | 8
block/Makefile | 1
block/blk-core.c | 87 ++++++--
block/blk-sysfs.c | 5
block/blk.h | 3
block/error-injection.c | 315 ++++++++++++++++++++++++++++++++
block/error-injection.h | 21 ++
block/genhd.c | 4
include/linux/blkdev.h | 6
11 files changed, 490 insertions(+), 20 deletions(-)
* Re: [RFC PATCH 1/3] mm/numa: add exclusive node pool and numa=standby boot parameter
From: Gregory Price @ 2026-06-11 14:04 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-mm, x86, linux-doc, linux-kernel, linux-acpi, driver-core,
kernel-team, corbet, skhan, dave.hansen, luto, peterz, tglx,
mingo, bp, hpa, rafael, lenb, gregkh, dakr, akpm, rdunlap,
feng.tang, dapeng1.mi, elver, kuba, ebiggers, lirongqing, paulmck,
dave.jiang, jic23, xueshuai, kai.huang
In-Reply-To: <aip5IWmxg9CWg8hQ@kernel.org>
On Thu, Jun 11, 2026 at 12:00:17PM +0300, Mike Rapoport wrote:
> > 1) Can we do dynamic addition of nodes?
> >
> > Not Trivially
> >
> > Some services utilize num_possible_nodes() as a static value to
> > calculate the amount of resources to use at runtime (bpf, md/raid5).
> >
> > Example: futex_init uses num_possible_nodes() as part of its
> > hashsize calculation during __init.
>
> AFAIU, we don't add the additional nodes for generic hotplug memory but
> rather for exclusive use by drivers/applications that are aware of these
> nodes.
The intent is to use them for "non-generic" hotplug (see the whole private
node series [1]), which would eventually still use the hotplug mechanism
just not for generic memory.
[1] https://lore.kernel.org/linux-mm/20260222084842.1824063-1-gourry@gourry.net/
> Wouldn't adding them to possible nodes actually skew the calculation of the
> resources by the services utilizing num_possible_nodes()?
>
> With the futex_init() example, won't hashsize be scaled down too much
> because we've added these special nodes to the possible mask?
>
The result is the same as BIOS reserving nodes with PXM entries that
don't get used. The CXL ACPI Tables do this for CXL Fixed Memory
Windows that may never be hotplugged.
So really I think you're pointing out that futex_init() here probably
shouldn't be using num_possible_nodes?
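The scaling concern can be sketched in userspace as follows. This assumes a
per-node hash size of the form 256 * possible CPUs / possible nodes, clamped
to a minimum and rounded up to a power of two; the constants and the function
names are assumptions for illustration, not a copy of futex_init().

```c
/* Round v up to the next power of two (v must be at least 1). */
static unsigned long roundup_pow2(unsigned long v)
{
	unsigned long p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Per-node hash size as a function of possible CPUs and possible nodes. */
static unsigned long per_node_hashsize(unsigned long cpus, unsigned long nodes)
{
	unsigned long size = 256 * cpus / nodes;

	if (size < 4)
		size = 4;
	return roundup_pow2(size);
}
```

Under these assumptions a 64-CPU machine gets 8192 buckets per node with 2
possible nodes, but only 2048 once 6 extra standby nodes raise the possible
count to 8, whether or not those nodes ever come online.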
~Gregory
* Re: [PATCH v3] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
From: Will Deacon @ 2026-06-11 13:34 UTC (permalink / raw)
To: Shanker Donthineni
Cc: Catalin Marinas, Vladimir Murzin, Jason Gunthorpe,
linux-arm-kernel, Mark Rutland, linux-kernel, linux-doc,
Vikram Sethi, Jason Sequeira
In-Reply-To: <20260610164822.4157248-1-sdonthineni@nvidia.com>
On Wed, Jun 10, 2026 at 11:48:22AM -0500, Shanker Donthineni wrote:
> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
> observed by a peripheral before an older, non-overlapping Device-nGnR*
> store to the same peripheral. This breaks the program-order guarantee
> that software expects for Device-nGnR* accesses and can leave a
> peripheral in an incorrect state, as a load is observed before an
> earlier store takes effect.
>
> The erratum can occur only when all of the following apply:
>
> - A PE executes a Device-nGnR* store followed by a younger
> Device-nGnR* load.
> - The store is not a store-release.
> - The accesses target the same peripheral and do not overlap in bytes.
> - There is at most one intervening Device-nGnR* store in program
> order, and there are no intervening Device-nGnR* loads.
> - There is no DSB, and no DMB that orders loads, between the store and
> the load.
> - Specific micro-architectural and timing conditions occur.
>
> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
> to stlr* (Store-Release), which removes the "store is not a
> store-release" condition for every device write the kernel issues.
> Because writel() and writel_relaxed() are both built on __raw_writel()
> in asm-generic/io.h, patching the raw variants covers both the
> non-relaxed and relaxed APIs without touching the higher layers. Note
> that writel()'s own barrier sits before the store, so it does not order
> the store against a subsequent readl(); the store-release promotion is
> what provides that ordering.
>
> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
> the plain str* sequence.
>
> Note: stlr* only supports base-register addressing, so affected CPUs use
> a base-register stlr* path. Unaffected CPUs keep the original
> offset-addressed str* sequence introduced by commit d044d6ba6f02
> ("arm64: io: permit offset addressing").
>
> The __const_memcpy_toio_aligned32() and __const_memcpy_toio_aligned64()
> helpers are left unchanged. These helpers are intended for
> write-combining mappings, which are Normal-NC on arm64. Replacing their
> contiguous str* groups would defeat the write-combining behavior used to
> improve store performance.
>
> Co-developed-by: Vikram Sethi <vsethi@nvidia.com>
> Signed-off-by: Vikram Sethi <vsethi@nvidia.com>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
> Changes since v2:
> - Reworked the raw MMIO write helpers so unaffected CPUs keep the
> existing offset-addressed STR sequence, while affected CPUs use the
> base-register STLR path.
> - Updated the commit message to match the code changes.
> - Rebased on top of the arm64 for-next/errata branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/errata
>
> Changes since v1:
> - Updated the commit message based on feedback from Vladimir Murzin.
>
> Documentation/arch/arm64/silicon-errata.rst | 2 ++
> arch/arm64/Kconfig | 23 ++++++++++++++++
> arch/arm64/include/asm/io.h | 30 +++++++++++++++++++++
> arch/arm64/kernel/cpu_errata.c | 8 ++++++
> arch/arm64/tools/cpucaps | 1 +
> 5 files changed, 64 insertions(+)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index ad09bbb10da80..fc45125dc2f80 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -298,6 +298,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | Olympus core | T410-OLY-1027 | NVIDIA_OLYMPUS_1027_ERRATUM |
> ++----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | Olympus core | T410-OLY-1029 | ARM64_ERRATUM_4118414 |
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c65cef81be86a..d633eb70de1ac 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
>
> If unsure, say Y.
>
> +config NVIDIA_OLYMPUS_1027_ERRATUM
> + bool "NVIDIA Olympus: device store/load ordering erratum"
> + default y
> + help
> + This option adds an alternative code sequence to work around an
> + NVIDIA Olympus core erratum where a Device-nGnR* store can be
> + observed by a peripheral after a younger Device-nGnR* load to the
> + same peripheral. This breaks the program order that drivers rely
> + on for MMIO and can leave a device in an incorrect state.
> +
> + The workaround promotes the raw MMIO store helpers
> + (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
> + required ordering. Because writel() and writel_relaxed() are built
> + on __raw_writel(), both are covered without changes to the higher
> + layers.
> +
> + The fix is applied through the alternatives framework, so enabling
> + this option does not by itself activate the workaround: it is
> + patched in only when an affected CPU is detected, and is a no-op on
> + unaffected CPUs.
> +
> + If unsure, say Y.
> +
> config ARM64_ERRATUM_834220
> bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
> depends on KVM
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 8cbd1e96fd50b..801223e754c90 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -22,10 +22,22 @@
> /*
> * Generic IO read/write. These perform native-endian accesses.
> */
> +static __always_inline bool arm64_needs_device_store_release(void)
> +{
> + return alternative_has_cap_unlikely(
> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE);
> +}
> +
> #define __raw_writeb __raw_writeb
> static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
> {
> volatile u8 __iomem *ptr = addr;
> +
> + if (arm64_needs_device_store_release()) {
> + asm volatile("stlrb %w0, [%1]" : : "rZ" (val), "r" (addr));
> + return;
> + }
> +
> asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> }
Use an 'else' clause instead of the early return? (similarly for the other
changes).
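For illustration, the if/else shape could look like the userspace sketch
below. It is not the kernel helper: the capability check is replaced by a
plain flag, and the inline asm by a compiler builtin (a release store, for
which arm64 compilers typically emit stlrb) and an ordinary volatile store.
The names needs_store_release and raw_writeb_sketch are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the capability check; the patch uses an alternative here. */
static bool needs_store_release;

static inline void raw_writeb_sketch(uint8_t val, volatile uint8_t *addr)
{
	if (needs_store_release)
		/* Release store, standing in for the stlrb path. */
		__atomic_store_n(addr, val, __ATOMIC_RELEASE);
	else
		/* Plain store, standing in for the unaffected-CPU strb path. */
		*addr = val;
}
```

Both branches end at the same point, so no early return is needed.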
I still reckon you should do something with the memcpy-to-io routines.
A simple option could be to make dgh() a dmb on parts with the erratum?
That at least moves the barrier out of the loop.
Will
* Re: [PATCH v2 0/3] f2fs: support encrypted inline data
From: LiaoYuanhong-vivo @ 2026-06-11 12:50 UTC (permalink / raw)
To: ebiggers
Cc: chao, corbet, jaegeuk, linux-doc, linux-f2fs-devel, linux-fscrypt,
linux-kernel, skhan, tytso, liaoyuanhong
In-Reply-To: <20260602134104.348655-1-liaoyuanhong@vivo.com>
Hi,
Gentle ping on this series.
v2 tries to address the previous concerns by avoiding per-file software
tfm growth, preparing the software transform lazily, and explicitly
disabling unsupported key combinations.
The main remaining limitation is hardware-wrapped keys. If this makes
the feature unlikely to be accepted, please let me know. Otherwise, I
would appreciate any review comments on the current direction.
If maintainers have any feasible direction in mind, I would also
appreciate hearing it.
Thanks,
Liao Yuanhong
* Re: [PATCH v5 08/10] ACPI: APEI: share GHES CPER helpers
From: Ahmed Tiba @ 2026-06-11 12:42 UTC (permalink / raw)
To: Jonathan Cameron
Cc: will, xueshuai, saket.dumbre, mchehab, dave, djbw, bp, tony.luck,
guohanjun, lenb, skhan, vishal.l.verma, rafael, corbet, ira.weiny,
dave.jiang, krzk+dt, robh, catalin.marinas, alison.schofield,
conor+dt, linux-arm-kernel, Michael.Zhao2, linux-doc,
linux-kernel, linux-cxl, Dmitry.Lamerov, devicetree, linux-acpi,
linux-edac, acpica-devel
In-Reply-To: <20260529173229.18843384@jic23-huawei>
On 29/05/2026 17:32, Jonathan Cameron wrote:
> On Fri, 29 May 2026 10:50:48 +0100
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
>> Wire GHES up to the helper routines in ghes_cper.c and remove the local
>> copies from ghes.c. This keeps the control flow identical while letting
>> the helpers be shared with other firmware-first providers.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> Mostly looks fine. The one bit that rather makes this exercise of breaking
> out generic code look dodgy is the ifdefs in the generic file.
>
As below.
>
>> ---
>> drivers/acpi/apei/ghes.c | 416 +--------------------------------------
>> drivers/acpi/apei/ghes_cper.c | 438 +++++++++++++++++++++++++++++++++++++++++-
>> include/acpi/ghes_cper.h | 20 ++
>> 3 files changed, 459 insertions(+), 415 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 85be2ebf4d3e..f85b97c4db4c 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>
>>
>> static void __ghes_panic(struct ghes *ghes,
>> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
>> index d7a666a163c3..0ff9d06eb78f 100644
>> --- a/drivers/acpi/apei/ghes_cper.c
>> +++ b/drivers/acpi/apei/ghes_cper.c
>> @@ -13,22 +13,32 @@
>
>>
>> #include "apei-internal.h"
>>
>> +ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
>> +
>> +#ifndef CONFIG_ACPI_APEI
>> +void __weak arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) { }
>> +#endif
> This is non obvious enough that the reasoning for a new weak function should be mentioned in
> the patch description. Why not stub it in include/acpi/apei.h?
>
Agreed. I should have explained that in the changelog.
The weak arch_apei_report_mem_error() fallback was only meant to keep
the shared helper buildable when GHES_CPER_HELPERS is enabled without
ACPI_APEI, while preserving the current GHES behaviour when ACPI_APEI
is enabled.
I kept it local because this fallback is only needed by this helper
split and I did not want to widen the APEI header API for that.
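As a rough userspace illustration of that pattern (GCC/Clang attribute
syntax, hypothetical names), a weak default lets the shared helper link even
when no strong definition exists, while a strong definition in another
object file would silently take precedence:

```c
/* Hypothetical names; only the weak-symbol pattern is the point here. */
static int report_calls;

/* Weak default: used only if no other object defines report_mem_error(). */
__attribute__((weak)) void report_mem_error(int sev)
{
	(void)sev;	/* nothing to report without an arch implementation */
}

/* Shared helper that builds with or without a strong definition. */
static int handle_mem_error(int sev)
{
	report_mem_error(sev);
	return ++report_calls;
}
```

The cost is that a reader of the helper cannot tell from the call site which
definition ends up being used, which is why it deserves a changelog note.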
>> +
>> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>> static atomic_t ghes_estatus_cache_alloced;
>
>> +void __ghes_print_estatus(const char *pfx,
>> + const struct acpi_hest_generic *generic,
>> + const struct acpi_hest_generic_status *estatus)
>> +{
>> + static atomic_t seqno;
>> + unsigned int curr_seqno;
>> + char pfx_seq[64];
>> +
>> + if (!pfx) {
>> + if (ghes_severity(estatus->error_severity) <=
>> + GHES_SEV_CORRECTED)
>> + pfx = KERN_WARNING;
>> + else
>> + pfx = KERN_ERR;
>> + }
>> + curr_seqno = atomic_inc_return(&seqno);
>> + snprintf(pfx_seq, sizeof(pfx_seq), "%s{%u}" HW_ERR, pfx, curr_seqno);
>> + printk("%sHardware error from APEI Generic Hardware Error Source: %d\n",
>> + pfx_seq, generic->header.source_id);
>> + cper_estatus_print(pfx_seq, estatus);
>> +}
>> +
>> +int ghes_print_estatus(const char *pfx,
>> + const struct acpi_hest_generic *generic,
>> + const struct acpi_hest_generic_status *estatus)
>> +{
>> + /* Not more than 2 messages every 5 seconds */
>> + static DEFINE_RATELIMIT_STATE(ratelimit_corrected, 5 * HZ, 2);
>> + static DEFINE_RATELIMIT_STATE(ratelimit_uncorrected, 5 * HZ, 2);
>> + struct ratelimit_state *ratelimit;
>> +
>> + if (ghes_severity(estatus->error_severity) <= GHES_SEV_CORRECTED)
>> + ratelimit = &ratelimit_corrected;
>> + else
>> + ratelimit = &ratelimit_uncorrected;
>> + if (__ratelimit(ratelimit)) {
>> + __ghes_print_estatus(pfx, generic, estatus);
>> + return 1;
>> + }
>> + return 0;
>> +}
>> +
>> +#ifdef CONFIG_ACPI_APEI
>
> So after the effort to break out the generic stuff we end up with non-generic
> bits in the broken out file? Is there no way to avoid this?
>
The intent here was not to create a new generic estatus core
but to keep the existing GHES control flow and lift the CPER helper flow
for reuse by the DT provider.
Splitting the remaining ACPI GHES specific read/clear path back out into
ghes.c would break that flow across files again. The CONFIG_ACPI_APEI
guard keeps that ACPI specific piece local while the DT side reuses the
same CPER parsing and status-dispatch path.
Best regards,
Ahmed
* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
From: Peter Newman @ 2026-06-11 11:44 UTC (permalink / raw)
To: Moger, Babu
Cc: Luck, Tony, Babu Moger, corbet, reinette.chatre, Dave.Martin,
james.morse, tglx, bp, dave.hansen, skhan, x86, mingo, hpa, akpm,
rdunlap, pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver,
lirongqing, paulmck, bhelgaas, seanjc, alexandre.chartre,
yazen.ghannam, peterz, chang.seok.bae, kim.phillips, xin, naveen,
thomas.lendacky, linux-doc, linux-kernel, eranian,
sos-linux-ext-patches
In-Reply-To: <a56f8ecc-cf1e-48a4-836d-7e7723072c38@amd.com>
Hi Babu,
On Thu, May 21, 2026 at 1:09 AM Moger, Babu <bmoger@amd.com> wrote:
>
> Hi Tony,
>
> On 5/20/2026 5:16 PM, Luck, Tony wrote:
> > On Wed, May 20, 2026 at 12:49:25PM -0500, Babu Moger wrote:
> >> Hi Tony,
> >>
> >>
> >> On 5/19/26 15:59, Luck, Tony wrote:
> >>> On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
> >>>> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
> >>>> +{
> >>>> + union msr_pqr_plza_assoc plza = { 0 };
> >>>> +
> >>>> + plza.split.rmid = rmid;
> >>>> + plza.split.rmid_en = 1;
> >>>
> >>> Shouldn't there be a parameter for the value of rmid_en?
> >>
> >>
> >> I realized that behavior is not required—it was actually due to a mistake in
> >> my v2 series implementation.
Really? This is in fact the only behavior we wanted:
https://lore.kernel.org/lkml/CABPqkBSq=cgn-am4qorA_VN0vsbpbfDePSi7gubicpROB1=djw@mail.gmail.com/
-Peter
> >>
> >> Below are the relevant definitions:
> >>
> >>
> >> GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> >> The CLOSID is applied to kernel work, while the RMID used for monitoring is
> >> inherited from the currently running user task.
> >> No separate monitoring group is assigned for kernel work, so kernel
> >> execution naturally inherits the user-space RMID.
> >>
> >>
> >> GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
> >> Both CLOSID and RMID are explicitly assigned to kernel work.
> >> This allows assigning a dedicated monitoring group for kernel execution and
> >> therefore requires a separate RMID.
> >>
> >> Example: For GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> >>
> >> # mount -t resctrl resctrl /sys/fs/resctrl
> >>
> >> # cat /sys/fs/resctrl/info/kernel_mode
> >> [inherit_ctrl_and_mon:group=//]
> >> global_assign_ctrl_inherit_mon_per_cpu:group=none
> >> global_assign_ctrl_assign_mon_per_cpu:group=none
> >>
> >> # mkdir /sys/fs/resctrl/ctrl1 (PQR_ASSOC closid=1 rmid=1)
> >>
> >> This configures all the CPU threads to use closid=1 and rmid=1 for both
> >> allocation and monitoring across user and kernel modes.
> >>
> >>
> >> # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" \
> >> > /sys/fs/resctrl/info/kernel_mode
> >>
> >> # cat /sys/fs/resctrl/info/kernel_mode
> >> inherit_ctrl_and_mon:group=none
> >> [global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//]
> >> global_assign_ctrl_assign_mon_per_cpu:group=none
> >>
> >> This overrides the previous configuration, and PQR_PLZA_ASSOC is written.
> >>
> >> Possible options:
> >>
> >> 1. (closid=1, rmid_en=0, rmid=1)
> >> Here, hardware uses closid=1 for kernel work, but RMID tracking is disabled
> >> for kernel mode.
> >>
> >> As a result, reading RMID 1 reports only user-mode activity
> >> This contradicts the definition of this mode, since kernel work is expected
> >> to inherit the user RMID for monitoring.
> >>
> >> 2. (closid=1, rmid_en=1, rmid=1)
> >> In this case, RMID tracking is enabled for both user and kernel modes.
> >>
> >> Reading RMID 1 reports combined user + kernel activity
> >> This aligns with the expected inherit_monitoring behavior
> >>
> >>
> >> The preferred approach is to separate kernel monitoring by assigning it a
> >> dedicated monitoring group and updating PQR_PLZA_ASSOC to use a different
> >> RMID (e.g., closid=1, rmid_en=1, rmid=2). This is exactly the behavior
> >> implemented by GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.
> >
> > So maybe I'm just confused by the name "global_assign_ctrl_inherit_mon_per_cpu"
> >
> > That sounds like "Use the CLOSID from PLZA, but keep the RMID from
> > legacy PQR_ASSOC."
>
> Yes. That is correct. We need to work on naming this correctly.
>
> >
> > So:
> >
> > # mkdir ctrl1 # maybe gets CLOSID=1, RMID=1
> > # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" > info/kernel_mode
>
> This makes kernel mode run with CLOSID 1 and RMID 1 (use the same RMID as
> user mode). [1]
>
> > # mkdir ctrl2 # maybe gets CLOSID=2, RMID=2
> > # echo $$ > ctrl2/tasks
> >
> > My shell, and all children run with CLOSID=2 and RMID=2 from ctrl2. But
> > when they do system calls, take page faults or there is an interrupt I'd
> > expect the code in the kernel to run with the CLOSID=1, while inheriting
> > RMID=2.
>
> ctrl2 is not a PLZA group. So, RMID 2 is not connected to PLZA.
> >
> > To make that happen, I think the PLZA MSR should have rmid_en = 0. But
> > the only code I see that sets this always sets rmid_en=1.
>
> Setting rmid_en = 0 in [1] disables counting of kernel usage for RMID 1
> (from ctrl1).
>
> The key difference between the two modes is:
>
> In one mode, user and kernel usage are counted together.
> In the other mode, kernel usage is counted separately from user usage.
>
> Please feel free to continue the discussion if anything is still unclear.
>
>
> Thanks,
> Babu
>
* Re: [RFC V2 3/3] mm: Replace pgtable entry prints with new format
From: David Hildenbrand (Arm) @ 2026-06-11 11:32 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky,
Petr Mladek, Steven Rostedt, Jonathan Corbet, Andrew Morton,
linux-kernel, linux-doc, Lorenzo Stoakes
In-Reply-To: <20260610043545.3725735-4-anshuman.khandual@arm.com>
On 6/10/26 06:35, Anshuman Khandual wrote:
> Replace all existing pgtable entry prints with recently added new format in
> __print_bad_page_map_pgtable().
>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
>
> mm/memory.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 86a973119bd4..8a25790f7c24 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -521,7 +521,6 @@ static bool is_bad_page_map_ratelimited(void)
>
> static void __print_bad_page_map_pgtable(struct mm_struct *mm, unsigned long addr)
> {
> - unsigned long long pgdv, p4dv, pudv, pmdv;
> p4d_t p4d, *p4dp;
> pud_t pud, *pudp;
> pmd_t pmd, *pmdp;
> @@ -532,34 +531,30 @@ static void __print_bad_page_map_pgtable(struct mm_struct *mm, unsigned long add
> * see locking requirements for print_bad_page_map().
> */
> pgdp = pgd_offset(mm, addr);
> - pgdv = pgd_val(*pgdp);
>
> if (!pgd_present(*pgdp) || pgd_leaf(*pgdp)) {
> - pr_alert("pgd:%08llx\n", pgdv);
> + pr_alert("pgd:%ppgd\n", pgdp);
> return;
> }
>
> p4dp = p4d_offset(pgdp, addr);
> p4d = p4dp_get(p4dp);
> - p4dv = p4d_val(p4d);
>
> if (!p4d_present(p4d) || p4d_leaf(p4d)) {
> - pr_alert("pgd:%08llx p4d:%08llx\n", pgdv, p4dv);
> + pr_alert("pgd:%ppgd p4d:%pp4d\n", pgdp, p4dp);
> return;
> }
>
> pudp = pud_offset(p4dp, addr);
> pud = pudp_get(pudp);
> - pudv = pud_val(pud);
>
> if (!pud_present(pud) || pud_leaf(pud)) {
> - pr_alert("pgd:%08llx p4d:%08llx pud:%08llx\n", pgdv, p4dv, pudv);
> + pr_alert("pgd:%ppgd p4d:%pp4d pud:%ppud\n", pgdp, p4dp, pudp);
> return;
> }
>
> pmdp = pmd_offset(pudp, addr);
> pmd = pmdp_get(pmdp);
> - pmdv = pmd_val(pmd);
>
> /*
> * Dumping the PTE would be nice, but it's tricky with CONFIG_HIGHPTE,
> @@ -567,8 +562,8 @@ static void __print_bad_page_map_pgtable(struct mm_struct *mm, unsigned long add
> * doing another map would be bad. print_bad_page_map() should
> * already take care of printing the PTE.
> */
> - pr_alert("pgd:%08llx p4d:%08llx pud:%08llx pmd:%08llx\n", pgdv,
> - p4dv, pudv, pmdv);
> + pr_alert("pgd:%ppgd p4d:%pp4d pud:%ppud pmd:%ppmd\n", pgdp,
> + p4dp, pudp, pmdp);
> }
>
> /*
I like that! I guess having per-level format identifiers is the right approach
given that we have per-level types.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH net-next v2 1/2] tls: remove tls_toe and the related driver
From: Eric Dumazet @ 2026-06-11 11:31 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: netdev, Ayush Sawal, John Fastabend, Jakub Kicinski,
David S. Miller, Alexander Gordeev, Andrew Lunn,
Christian Borntraeger, Heiko Carstens, Paolo Abeni, Simon Horman,
Sven Schnelle, Vasily Gorbik, linux-s390, linux-doc,
Jonathan Corbet, Shuah Khan
In-Reply-To: <1f30e73275c07bf879f547589872d0916025a52e.1781165969.git.sd@queasysnail.net>
On Thu, Jun 11, 2026 at 3:21 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> The tls_toe feature and its single user (chelsio chtls) have been
> unmaintained for multiple years. It also hooks into the core of the
> TCP implementation, and bypasses most of the networking stack.
>
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> ---
Reviewed-by: Eric Dumazet <edumazet@google.com>
* Re: [PATCH v2 1/4] mm: use mapping_mapped to simplify the code
From: Pedro Falcato @ 2026-06-11 11:13 UTC (permalink / raw)
To: Huang Shijie
Cc: akpm, viro, brauner, jack, muchun.song, osalvador, david, surenb,
mjguzik, liam, ljs, vbabka, shakeel.butt, rppt, mhocko, corbet,
skhan, linux, dinguyen, schuster.simon, James.Bottomley, deller,
djbw, willy, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
mhiramat, oleg, ziy, baolin.wang, npache, ryan.roberts, dev.jain,
baohua, lance.yang, linmiaohe, nao.horiguchi, jannh, riel, harry,
will, brian.ruley, rmk+kernel, dave.anglin, linux-mm, linux-doc,
linux-kernel, linux-arm-kernel, linux-parisc, linux-fsdevel,
nvdimm, linux-perf-users, linux-trace-kernel, zhongyuan,
fangbaoshun, yingzhiwei
In-Reply-To: <20260611061915.2354307-2-huangsj@hygon.cn>
On Thu, Jun 11, 2026 at 02:18:57PM +0800, Huang Shijie wrote:
> Use mapping_mapped() to simplify the code, make
> the code tidy and clean.
>
> Signed-off-by: Huang Shijie <huangsj@hygon.cn>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
LGTM, thanks! Super uncontroversial so perhaps
could be picked up separately.
--
Pedro
* Re: [PATCH v2 3/4] mm/fs: split the file's i_mmap tree
From: Pedro Falcato @ 2026-06-11 11:11 UTC (permalink / raw)
To: Huang Shijie
Cc: akpm, viro, brauner, jack, muchun.song, osalvador, david, surenb,
mjguzik, liam, ljs, vbabka, shakeel.butt, rppt, mhocko, corbet,
skhan, linux, dinguyen, schuster.simon, James.Bottomley, deller,
djbw, willy, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
mhiramat, oleg, ziy, baolin.wang, npache, ryan.roberts, dev.jain,
baohua, lance.yang, linmiaohe, nao.horiguchi, jannh, riel, harry,
will, brian.ruley, rmk+kernel, dave.anglin, linux-mm, linux-doc,
linux-kernel, linux-arm-kernel, linux-parisc, linux-fsdevel,
nvdimm, linux-perf-users, linux-trace-kernel, zhongyuan,
fangbaoshun, yingzhiwei
In-Reply-To: <20260611061915.2354307-4-huangsj@hygon.cn>
Hi,
On Thu, Jun 11, 2026 at 02:18:59PM +0800, Huang Shijie wrote:
> In the UnixBench tests, there is a test "execl" which tests
> the execve system call.
> For example, a Hygon server has 12 NUMA nodes, and 384 CPUs.
> When we test our server with "./Run -c 384 execl",
> the test result is not good enough. The i_mmap locks contended heavily on
> "libc.so" and "ld.so". The i_mmap tree for "libc.so" can be
> over 6000 VMAs, and the VMAs can be in different NUMA nodes. The insert/remove
> operations do not run quickly enough.
I _really_ would have appreciated some coordination here, because I said I was
going to take a look at it. I have something that I think is much simpler
in practice. These patches are also way too complex to be dropped just before
the merge window.
Some comments:
>
> In order to reduce the competition of the i_mmap lock, this patch does the
> following:
> 1.) Split the single i_mmap tree into several sibling trees:
> Each tree has a lock. The CONFIG_SPLIT_I_MMAP is used to
> turn on/off this feature.
There is no need for a config option. This needs to Just Work.
> 2.) Introduce a new field "tree_idx" for vm_area_struct to save the
> sibling tree index for this VMA.
This is possibly contentious, but there are holes in vm_area_struct.
So I think this is fine.
> 3.) Introduce a new field "vma_count" for address_space.
> The new mapping_mapped() will use it.
> 4.) Rewrite the vma_interval_tree_foreach()
> 5.) Rewrite the lock functions.
>
> After this patch, the VMA insert/remove operations will work faster,
> and we can get over 400% performance improvement with the above test.
>
> Signed-off-by: Huang Shijie <huangsj@hygon.cn>
> ---
> fs/Kconfig | 8 ++
> fs/hugetlbfs/inode.c | 20 ++++-
> fs/inode.c | 75 ++++++++++++++++-
> include/linux/fs.h | 174 ++++++++++++++++++++++++++++++++++++++-
> include/linux/mm.h | 80 ++++++++++++++++++
> include/linux/mm_types.h | 3 +
> mm/internal.h | 3 +-
> mm/mmap.c | 11 ++-
> mm/nommu.c | 23 ++++--
> mm/pagewalk.c | 2 +-
> mm/vma.c | 72 +++++++++++-----
> mm/vma_init.c | 3 +
> 12 files changed, 436 insertions(+), 38 deletions(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 43cb06de297f..e24804f70432 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -9,6 +9,14 @@ menu "File systems"
> config DCACHE_WORD_ACCESS
> bool
>
> +config SPLIT_I_MMAP
> + bool "Split the file's i_mmap to several trees"
> + default n
> + help
> + Split the file's i_mmap to several trees, each tree has a separate
> + lock. This will reduce the lock contention of file's i_mmap tree,
> + but it will cost more memory for per inode.
> +
> config VALIDATE_FS_PARSER
> bool "Validate filesystem parameter description"
> help
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index da5b41ea5bdd..68d8308418dd 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -891,6 +891,23 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
> */
> static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
>
> +#ifdef CONFIG_SPLIT_I_MMAP
> +static void hugetlbfs_lockdep_set_class(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++) {
> + lockdep_set_class(&mapping->i_mmap[i].rwsem,
> + &hugetlbfs_i_mmap_rwsem_key);
> + }
> +}
> +#else
> +static void hugetlbfs_lockdep_set_class(struct address_space *mapping)
> +{
> + lockdep_set_class(&mapping->i_mmap_rwsem, &hugetlbfs_i_mmap_rwsem_key);
> +}
> +#endif
> +
> static struct inode *hugetlbfs_get_inode(struct super_block *sb,
> struct mnt_idmap *idmap,
> struct inode *dir,
> @@ -915,8 +932,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
>
> inode->i_ino = get_next_ino();
> inode_init_owner(idmap, inode, dir, mode);
> - lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
> - &hugetlbfs_i_mmap_rwsem_key);
> + hugetlbfs_lockdep_set_class(inode->i_mapping);
> inode->i_mapping->a_ops = &hugetlbfs_aops;
> simple_inode_init_ts(inode);
> info->resv_map = resv_map;
> diff --git a/fs/inode.c b/fs/inode.c
> index 62c579a0cf7d..cb67ae83f5b3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -214,6 +214,70 @@ static int no_open(struct inode *inode, struct file *file)
> return -ENXIO;
> }
>
> +#ifdef CONFIG_SPLIT_I_MMAP
> +int split_tree_num;
> +static int split_tree_align __maybe_unused = 32;
> +
> +static void __init init_split_tree_num(void)
> +{
> +#ifdef CONFIG_NUMA
> + split_tree_num = nr_node_ids;
> +#else
> + split_tree_num = ALIGN(nr_cpu_ids, split_tree_align);
> +#endif
> +}
Again, too configurable. I think you're too hung up on the NUMA case, which
does not matter for many people, and this may actively harm NUMA users. If I
have a 128-core, 2-NUMA-node system, what should I shard by?
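To make the question concrete, here is a small userspace sketch of what the
quoted init_split_tree_num() would pick on such a machine. ALIGN_UP() is a
local stand-in for the kernel's ALIGN(), and pick_split_tree_num() with its
parameters is a made-up name that replaces the kernel globals for illustration
only.

```c
#include <assert.h>

/* Local stand-in for the kernel's ALIGN(x, a); a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Same value as split_tree_align in the quoted patch. */
static const unsigned int split_tree_align = 32;

/*
 * Mirrors the quoted init_split_tree_num(): one tree per NUMA node when
 * NUMA is enabled, otherwise nr_cpu_ids rounded up to a multiple of 32.
 * The parameters are hypothetical and stand in for the kernel globals.
 */
static unsigned int pick_split_tree_num(int numa_enabled,
					unsigned int nr_node_ids,
					unsigned int nr_cpu_ids)
{
	/* ALIGN_UP() only works for power-of-two alignments. */
	assert((split_tree_align & (split_tree_align - 1)) == 0);

	if (numa_enabled)
		return nr_node_ids;
	return ALIGN_UP(nr_cpu_ids, split_tree_align);
}
```

With 128 CPUs on 2 nodes this returns 2, so 64 CPUs would share each tree
lock, while the same CPU count without CONFIG_NUMA returns 128 trees.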
> +
> +static void free_mapping_i_mmap(struct address_space *mapping)
> +{
> + int i;
> +
> + if (!mapping->i_mmap)
> + return;
> +
> + for (i = 0; i < split_tree_num; i++)
> + kfree(mapping->i_mmap[i]);
> +
> + kfree(mapping->i_mmap);
> + mapping->i_mmap = NULL;
> +}
> +
> +static int init_mapping_i_mmap(struct address_space *mapping, gfp_t gfp)
> +{
> + struct i_mmap_tree *tree;
> + int i;
> +
> + /* The extra one is used as terminator in vma_interval_tree_foreach() */
> + mapping->i_mmap = kzalloc(sizeof(tree) * (split_tree_num + 1), gfp);
> + if (!mapping->i_mmap)
> + return -ENOMEM;
> +
> + for (i = 0; i < split_tree_num; i++) {
> + tree = kzalloc_node(sizeof(*tree), gfp, i);
> + if (!tree)
> + goto nomem;
> +
> + tree->root = RB_ROOT_CACHED;
> + init_rwsem(&tree->rwsem);
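This (as-is) should blow up with lockdep once the locking loops further down
run: every tree's rwsem gets the same lock class from this one init_rwsem()
call site, and the loops take them all nested without any annotation.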
> +
> + mapping->i_mmap[i] = tree;
> + }
> + return 0;
> +nomem:
> + free_mapping_i_mmap(mapping);
> + return -ENOMEM;
> +}
Honestly, it's likely that a simple static array in struct address_space
suffices. I would not go through the trouble of getting everything very
tight and NUMA correct.
> +#else
> +static int init_mapping_i_mmap(struct address_space *mapping, gfp_t gfp)
> +{
> + mapping->i_mmap = RB_ROOT_CACHED;
> + init_rwsem(&mapping->i_mmap_rwsem);
> + return 0;
> +}
> +
> +static void free_mapping_i_mmap(struct address_space *mapping) { }
> +static void __init init_split_tree_num(void) {}
> +#endif
> +
> /**
> * inode_init_always_gfp - perform inode structure initialisation
> * @sb: superblock inode belongs to
> @@ -302,9 +366,14 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
> #endif
> inode->i_flctx = NULL;
>
> - if (unlikely(security_inode_alloc(inode, gfp)))
> + if (init_mapping_i_mmap(mapping, gfp))
> return -ENOMEM;
>
> + if (unlikely(security_inode_alloc(inode, gfp))) {
> + free_mapping_i_mmap(mapping);
> + return -ENOMEM;
> + }
> +
> this_cpu_inc(nr_inodes);
>
> return 0;
> @@ -380,6 +449,7 @@ void __destroy_inode(struct inode *inode)
> if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
> posix_acl_release(inode->i_default_acl);
> #endif
> + free_mapping_i_mmap(&inode->i_data);
> this_cpu_dec(nr_inodes);
> }
> EXPORT_SYMBOL(__destroy_inode);
> @@ -480,9 +550,7 @@ EXPORT_SYMBOL(inc_nlink);
> static void __address_space_init_once(struct address_space *mapping)
> {
> xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
> - init_rwsem(&mapping->i_mmap_rwsem);
> spin_lock_init(&mapping->i_private_lock);
> - mapping->i_mmap = RB_ROOT_CACHED;
> }
>
> void address_space_init_once(struct address_space *mapping)
> @@ -2619,6 +2687,7 @@ void __init inode_init(void)
> &i_hash_mask,
> 0,
> 0);
> + init_split_tree_num();
> }
>
> void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index cd46615b8f53..f4b3645b61df 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -450,6 +450,25 @@ struct mapping_metadata_bhs {
> struct list_head list; /* The list of bhs (b_assoc_buffers) */
> };
>
> +#ifdef CONFIG_SPLIT_I_MMAP
> +/*
> + * struct i_mmap_tree - A single sibling tree of the file's split i_mmap.
> + * @root: The red/black interval tree root.
> + * @rwsem: Protects insert/remove operations on this sibling tree.
> + * @vma_count: Number of VMAs in this sibling tree.
> + *
> + * When CONFIG_SPLIT_I_MMAP is enabled, the file's single i_mmap tree is
> + * split into split_tree_num sibling trees, each with its own lock. This
> + * reduces lock contention by allowing concurrent VMA insert/remove
> + * operations on different sibling trees.
> + */
> +struct i_mmap_tree {
> + struct rb_root_cached root;
> + struct rw_semaphore rwsem;
> + atomic_t vma_count;
I don't see what you need this vma_count for? I get the one in address_space,
but this one does not seem useful.
> +};
> +#endif
> +
> /**
> * struct address_space - Contents of a cacheable, mappable object.
> * @host: Owner, either the inode or the block_device.
> @@ -461,8 +480,13 @@ struct mapping_metadata_bhs {
> * @gfp_mask: Memory allocation flags to use for allocating pages.
> * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
> * @nr_thps: Number of THPs in the pagecache (non-shmem only).
> - * @i_mmap: Tree of private and shared mappings.
> - * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
> + * @i_mmap: Tree of private and shared mappings. When CONFIG_SPLIT_I_MMAP
> + * is enabled, this is an array of split_tree_num struct i_mmap_tree
> + * pointers (plus a NULL terminator).
The NULL terminator wastes more memory, so I would strongly avoid it as
well.
> + * @vma_count: Total number of VMAs across all sibling trees (only when
> + * CONFIG_SPLIT_I_MMAP is enabled). Used by mapping_mapped().
> + * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable (only when
> + * CONFIG_SPLIT_I_MMAP is disabled; otherwise per-tree rwsem is used).
So, there are very good reasons why you still need an i_mmap_rwsem protecting
state, even with split mmap trees, which I'll go into later.
> * @nrpages: Number of page entries, protected by the i_pages lock.
> * @writeback_index: Writeback starts here.
> * @a_ops: Methods.
> @@ -480,14 +504,19 @@ struct address_space {
> /* number of thp, only for non-shmem files */
> atomic_t nr_thps;
> #endif
> +#ifdef CONFIG_SPLIT_I_MMAP
> + struct i_mmap_tree **i_mmap;
> + atomic_t vma_count;
> +#else
> struct rb_root_cached i_mmap;
> + struct rw_semaphore i_mmap_rwsem;
> +#endif
> unsigned long nrpages;
> pgoff_t writeback_index;
> const struct address_space_operations *a_ops;
> unsigned long flags;
> errseq_t wb_err;
> spinlock_t i_private_lock;
> - struct rw_semaphore i_mmap_rwsem;
See d3b1a9a778e1 ("fs/address_space: move i_mmap_rwsem to mitigate a false sharing with i_mmap.")
> } __attribute__((aligned(sizeof(long)))) __randomize_layout;
> /*
> * On most architectures that alignment is already the case; but
> @@ -508,6 +537,133 @@ static inline bool mapping_tagged(const struct address_space *mapping, xa_mark_t
> return xa_marked(&mapping->i_pages, tag);
> }
>
> +#ifdef CONFIG_SPLIT_I_MMAP
> +static inline int mapping_mapped(const struct address_space *mapping)
> +{
> + return atomic_read(&mapping->vma_count);
Now that I think of it, I don't think we need atomic_t; an unsigned long +
READ_ONCE() suffices. Increments can race just fine, we don't expect any
consistency there. If you want consistency you probably hold the i_mmap lock.
> +}
> +
> +static inline void inc_mapping_vma(struct address_space *mapping,
> + struct vm_area_struct *vma)
> +{
> + struct i_mmap_tree *tree = mapping->i_mmap[vma->tree_idx];
> +
> + atomic_inc(&tree->vma_count);
> + atomic_inc(&mapping->vma_count);
> +}
> +
> +static inline void dec_mapping_vma(struct address_space *mapping,
> + struct vm_area_struct *vma)
> +{
> + struct i_mmap_tree *tree = mapping->i_mmap[vma->tree_idx];
> +
> + atomic_dec(&tree->vma_count);
> + atomic_dec(&mapping->vma_count);
> +}
This probably shouldn't be in linux/fs.h.
> +
> +static inline struct rb_root_cached *get_i_mmap_root(struct address_space *mapping)
> +{
> + return (struct rb_root_cached *)mapping->i_mmap;
> +}
> +
> +static inline void i_mmap_tree_lock_write(struct address_space *mapping,
> + struct vm_area_struct *vma)
> +{
> + struct i_mmap_tree *tree = mapping->i_mmap[vma->tree_idx];
> +
> + down_write(&tree->rwsem);
> +}
> +
> +static inline void i_mmap_tree_unlock_write(struct address_space *mapping,
> + struct vm_area_struct *vma)
> +{
> + struct i_mmap_tree *tree = mapping->i_mmap[vma->tree_idx];
> +
> + up_write(&tree->rwsem);
> +}
> +
> +#define i_mmap_lock_write_prepare(mapping)
> +#define i_mmap_unlock_write_complete(mapping)
It's unclear to me why you added write_prepare() and write_complete().
> +
> +extern int split_tree_num;
> +static inline void i_mmap_lock_write(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + down_write(&mapping->i_mmap[i]->rwsem);
Oof, this is an incredibly large hammer. This is basically why I think keeping
i_mmap_rwsem (in a different form) is required. You do not want to take $nr_cpus
locks (read _or_ write). For my design, I keep i_mmap_rwsem, but I invert its
meaning - taking it in write = I'm reading from the tree; taking it in read =
I'm writing to the tree. This provides some lighter-weight exclusion between
rmap walks and rmap tree manipulation.
_Technically_, you shouldn't need to always take a lock when manipulating the
tree. A pattern like mnt_hold_writers()/mnt_get_write_access() can probably
work well. But it may be too complex ATM.
Also, note that you pretty much do not want i_mmap_lock_write() users after
the conversion is done.
> +}
> +
> +static inline int i_mmap_trylock_write(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++) {
> + if (!down_write_trylock(&mapping->i_mmap[i]->rwsem)) {
> + while (i--)
> + up_write(&mapping->i_mmap[i]->rwsem);
> + return 0;
> + }
> + }
> + return 1;
> +}
> +
> +static inline void i_mmap_unlock_write(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + up_write(&mapping->i_mmap[i]->rwsem);
> +}
> +
> +static inline int i_mmap_trylock_read(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++) {
> + if (!down_read_trylock(&mapping->i_mmap[i]->rwsem)) {
> + while (i--)
> + up_read(&mapping->i_mmap[i]->rwsem);
> + return 0;
> + }
> + }
> + return 1;
> +}
> +
> +static inline void i_mmap_lock_read(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + down_read(&mapping->i_mmap[i]->rwsem);
> +}
> +
> +static inline void i_mmap_unlock_read(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + up_read(&mapping->i_mmap[i]->rwsem);
> +}
> +
> +static inline void i_mmap_assert_locked(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + lockdep_assert_held(&mapping->i_mmap[i]->rwsem);
> +}
> +
> +static inline void i_mmap_assert_write_locked(struct address_space *mapping)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + lockdep_assert_held_write(&mapping->i_mmap[i]->rwsem);
> +}
> +
> +#else
> +
> static inline void i_mmap_lock_write(struct address_space *mapping)
> {
> down_write(&mapping->i_mmap_rwsem);
> @@ -561,6 +717,18 @@ static inline struct rb_root_cached *get_i_mmap_root(struct address_space *mappi
> return &mapping->i_mmap;
> }
>
> +static inline void inc_mapping_vma(struct address_space *mapping,
> + struct vm_area_struct *vma) { }
> +static inline void dec_mapping_vma(struct address_space *mapping,
> + struct vm_area_struct *vma) { }
> +
> +#define i_mmap_lock_write_prepare(mapping) i_mmap_lock_write(mapping)
> +#define i_mmap_unlock_write_complete(mapping) i_mmap_unlock_write(mapping)
> +#define i_mmap_tree_lock_write(mapping, vma)
> +#define i_mmap_tree_unlock_write(mapping, vma)
> +
> +#endif
> +
> /*
> * Might pages of this file have been modified in userspace?
> * Note that i_mmap_writable counts all VM_SHARED, VM_MAYWRITE vmas: do_mmap
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0a45c6a8b9f2..9aa8119fa9bf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4041,11 +4041,91 @@ struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
> struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
> unsigned long start, unsigned long last);
>
> +#ifdef CONFIG_SPLIT_I_MMAP
> +extern int split_tree_num;
> +
> +static inline int smallest_tree_idx(struct file *file)
> +{
> + struct address_space *mapping = file->f_mapping;
> + int tmp = INT_MAX, count;
> + int i, j = 0;
> +
> + /*
> + * Since a not 100% accurate value is still okay,
> + * we do not need any lock here.
> + */
> + for (i = 0; i < split_tree_num; i++) {
> + count = atomic_read(&mapping->i_mmap[i]->vma_count);
> + if (count < tmp) {
> + j = i;
> + tmp = count;
> + if (!tmp)
> + break;
> + }
> + }
Ohh, I see why you want the per-subtree vma_count now. But is this a net-win?
I think doing something like vma-pointer-hashing or just smp_processor_id()
would work a-ok.
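For the pointer-hashing variant, a userspace sketch could look like the
following. It assumes a power-of-two shard count given as a bit width, and
shard_for_ptr() is a made-up name; in the kernel, hash_ptr() from linux/hash.h
does this job.

```c
#include <assert.h>
#include <stdint.h>

/* Same multiplier as the kernel's GOLDEN_RATIO_64 in linux/hash.h. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Map an object address to one of (1 << shard_bits) shards with a
 * multiplicative hash, keeping the top shard_bits bits of the product.
 * No per-shard counters are read, so picking a shard touches no shared
 * cachelines.
 */
static unsigned int shard_for_ptr(const void *p, unsigned int shard_bits)
{
	uint64_t v = (uint64_t)(uintptr_t)p;

	assert(shard_bits > 0 && shard_bits < 32);
	return (unsigned int)((v * GOLDEN_RATIO_64) >> (64 - shard_bits));
}
```

The index is stable for the lifetime of the object, so it would not even need
to be stored in the VMA, at the price of giving up any attempt at balancing
the trees by size.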
> + return j;
> +}
> +
> +static inline void vma_set_tree_idx(struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_NUMA
> + vma->tree_idx = numa_node_id();
> +#else
> + vma->tree_idx = smallest_tree_idx(vma->vm_file);
> +#endif
> +}
> +
> +static inline struct rb_root_cached *get_rb_root(struct vm_area_struct *vma,
> + struct address_space *mapping)
> +{
> + return &mapping->i_mmap[vma->tree_idx]->root;
> +}
> +
> +/* Find the first valid VMA in the sibling trees */
> +static inline struct vm_area_struct *first_vma(struct i_mmap_tree ***__r,
> + unsigned long start, unsigned long last)
> +{
> + struct vm_area_struct *vma = NULL;
> + struct i_mmap_tree **tree = *__r;
> + struct rb_root_cached *root;
> +
> + while (*tree) {
> + root = &(*tree)->root;
> + tree++;
> + vma = vma_interval_tree_iter_first(root, start, last);
> + if (vma)
> + break;
> + }
> +
> + /* Save for the next loop */
> + *__r = tree;
> + return vma;
> +}
> +
> +/*
> + * Please use get_i_mmap_root() to get the @root.
> + * @_tmp is referenced to avoid unused variable warning.
> + */
> +#define vma_interval_tree_foreach(vma, root, start, last) \
> + for (struct i_mmap_tree **_r = (struct i_mmap_tree **)(root), \
> + **_tmp = (vma = first_vma(&_r, start, last)) ? _r : NULL;\
> + ((_tmp && vma) || (vma = first_vma(&_r, start, last))); \
> + vma = vma_interval_tree_iter_next(vma, start, last))
> +#else
> /* Please use get_i_mmap_root() to get the @root */
> #define vma_interval_tree_foreach(vma, root, start, last) \
> for (vma = vma_interval_tree_iter_first(root, start, last); \
> vma; vma = vma_interval_tree_iter_next(vma, start, last))
>
> +static inline void vma_set_tree_idx(struct vm_area_struct *vma) { }
> +
> +static inline struct rb_root_cached *get_rb_root(struct vm_area_struct *vma,
> + struct address_space *mapping)
> +{
> + return &mapping->i_mmap;
> +}
> +#endif
> +
> void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
> struct rb_root_cached *root);
> void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index a308e2c23b82..8d6aab3346ce 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1072,6 +1072,9 @@ struct vm_area_struct {
> #ifdef __HAVE_PFNMAP_TRACKING
> struct pfnmap_track_ctx *pfnmap_track_ctx;
> #endif
> +#ifdef CONFIG_SPLIT_I_MMAP
> + int tree_idx; /* The sibling tree index for the VMA */
> +#endif
FTR the struct hole isn't here, but right after vm_lock_seq or vm_refcnt in
most configs.
> } __randomize_layout;
>
> /* Clears all bits in the VMA flags bitmap, non-atomically. */
> diff --git a/mm/internal.h b/mm/internal.h
> index 5a2ddcf68e0b..2d35cacffd19 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1888,7 +1888,8 @@ static inline void maybe_rmap_unlock_action(struct vm_area_struct *vma,
>
> VM_WARN_ON_ONCE(vma_is_anonymous(vma));
> file = vma->vm_file;
> - i_mmap_unlock_write(file->f_mapping);
> + i_mmap_tree_unlock_write(file->f_mapping, vma);
> + i_mmap_unlock_write_complete(file->f_mapping);
> action->hide_from_rmap_until_complete = false;
> }
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d714fdb357e5..70036ec9dcaa 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1825,15 +1825,20 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
> struct address_space *mapping = file->f_mapping;
>
> get_file(file);
> - i_mmap_lock_write(mapping);
> + i_mmap_lock_write_prepare(mapping);
> + i_mmap_tree_lock_write(mapping, mpnt);
> +
> if (vma_is_shared_maywrite(tmp))
> mapping_allow_writable(mapping);
> flush_dcache_mmap_lock(mapping);
> /* insert tmp into the share list, just after mpnt */
> vma_interval_tree_insert_after(tmp, mpnt,
> - get_i_mmap_root(mapping));
> + get_rb_root(mpnt, mapping));
> + inc_mapping_vma(mapping, tmp);
Honestly, I would prefer to hide all of these details from mmap.
> flush_dcache_mmap_unlock(mapping);
> - i_mmap_unlock_write(mapping);
> +
> + i_mmap_tree_unlock_write(mapping, mpnt);
> + i_mmap_unlock_write_complete(mapping);
> }
>
> if (!(tmp->vm_flags & VM_WIPEONFORK))
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 0f18ffc658e9..1f2c60a220f6 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -567,11 +567,16 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
> if (vma->vm_file) {
> struct address_space *mapping = vma->vm_file->f_mapping;
>
> - i_mmap_lock_write(mapping);
> + i_mmap_lock_write_prepare(mapping);
> + i_mmap_tree_lock_write(mapping, vma);
> +
> flush_dcache_mmap_lock(mapping);
> - vma_interval_tree_insert(vma, get_i_mmap_root(mapping));
> + vma_interval_tree_insert(vma, get_rb_root(vma, mapping));
> + inc_mapping_vma(mapping, vma);
> flush_dcache_mmap_unlock(mapping);
> - i_mmap_unlock_write(mapping);
> +
> + i_mmap_tree_unlock_write(mapping, vma);
> + i_mmap_unlock_write_complete(mapping);
> }
> }
>
> @@ -583,11 +588,16 @@ static void cleanup_vma_from_mm(struct vm_area_struct *vma)
> struct address_space *mapping;
> mapping = vma->vm_file->f_mapping;
>
> - i_mmap_lock_write(mapping);
> + i_mmap_lock_write_prepare(mapping);
> + i_mmap_tree_lock_write(mapping, vma);
> +
> flush_dcache_mmap_lock(mapping);
> - vma_interval_tree_remove(vma, get_i_mmap_root(mapping));
> + vma_interval_tree_remove(vma, get_rb_root(vma, mapping));
> + dec_mapping_vma(mapping, vma);
> flush_dcache_mmap_unlock(mapping);
> - i_mmap_unlock_write(mapping);
> +
> + i_mmap_tree_unlock_write(mapping, vma);
> + i_mmap_unlock_write_complete(mapping);
> }
> }
>
> @@ -1063,6 +1073,7 @@ unsigned long do_mmap(struct file *file,
> if (file) {
> region->vm_file = get_file(file);
> vma->vm_file = get_file(file);
> + vma_set_tree_idx(vma);
This is unrelated, shouldn't be done here.
> }
>
> down_write(&nommu_region_sem);
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 8df1b5077951..d5745519d95a 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -809,7 +809,7 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
> if (!check_ops_safe(ops))
> return -EINVAL;
>
> - lockdep_assert_held(&mapping->i_mmap_rwsem);
> + i_mmap_assert_locked(mapping);
This kind of conversion should be done in a separate step.
> vma_interval_tree_foreach(vma, get_i_mmap_root(mapping), first_index,
> first_index + nr - 1) {
> /* Clip to the vma */
> diff --git a/mm/vma.c b/mm/vma.c
> index 6159650c1b42..2055758064a9 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -234,22 +234,23 @@ static void __vma_link_file(struct vm_area_struct *vma,
> mapping_allow_writable(mapping);
>
> flush_dcache_mmap_lock(mapping);
> - vma_interval_tree_insert(vma, get_i_mmap_root(mapping));
> + vma_interval_tree_insert(vma, get_rb_root(vma, mapping));
> + inc_mapping_vma(mapping, vma);
inc_mapping_vma() should probably be done implicitly by insertion?
> flush_dcache_mmap_unlock(mapping);
> }
>
> -/*
> - * Requires inode->i_mapping->i_mmap_rwsem
> - */
> static void __remove_shared_vm_struct(struct vm_area_struct *vma,
> struct address_space *mapping)
> {
> + i_mmap_tree_lock_write(mapping, vma);
> if (vma_is_shared_maywrite(vma))
> mapping_unmap_writable(mapping);
>
> flush_dcache_mmap_lock(mapping);
> - vma_interval_tree_remove(vma, get_i_mmap_root(mapping));
> + vma_interval_tree_remove(vma, get_rb_root(vma, mapping));
> + dec_mapping_vma(mapping, vma);
> flush_dcache_mmap_unlock(mapping);
> + i_mmap_tree_unlock_write(mapping, vma);
> }
>
> /*
> @@ -297,8 +298,9 @@ static void vma_prepare(struct vma_prepare *vp)
> uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
> vp->adj_next->vm_end);
>
> - i_mmap_lock_write(vp->mapping);
> + i_mmap_lock_write_prepare(vp->mapping);
> if (vp->insert && vp->insert->vm_file) {
> + i_mmap_tree_lock_write(vp->mapping, vp->insert);
> /*
> * Put into interval tree now, so instantiated pages
> * are visible to arm/parisc __flush_dcache_page
> @@ -307,6 +309,7 @@ static void vma_prepare(struct vma_prepare *vp)
> */
> __vma_link_file(vp->insert,
> vp->insert->vm_file->f_mapping);
> + i_mmap_tree_unlock_write(vp->mapping, vp->insert);
> }
> }
>
> @@ -318,12 +321,17 @@ static void vma_prepare(struct vma_prepare *vp)
> }
>
> if (vp->file) {
> + i_mmap_tree_lock_write(vp->mapping, vp->vma);
> flush_dcache_mmap_lock(vp->mapping);
> vma_interval_tree_remove(vp->vma,
> - get_i_mmap_root(vp->mapping));
> - if (vp->adj_next)
> + get_rb_root(vp->vma, vp->mapping));
> + dec_mapping_vma(vp->mapping, vp->vma);
> + if (vp->adj_next) {
> + i_mmap_tree_lock_write(vp->mapping, vp->adj_next);
> vma_interval_tree_remove(vp->adj_next,
> - get_i_mmap_root(vp->mapping));
> + get_rb_root(vp->adj_next, vp->mapping));
> + dec_mapping_vma(vp->mapping, vp->adj_next);
> + }
> }
>
> }
> @@ -340,12 +348,17 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> struct mm_struct *mm)
> {
> if (vp->file) {
> - if (vp->adj_next)
> + if (vp->adj_next) {
> vma_interval_tree_insert(vp->adj_next,
> - get_i_mmap_root(vp->mapping));
> + get_rb_root(vp->adj_next, vp->mapping));
> + inc_mapping_vma(vp->mapping, vp->adj_next);
> + i_mmap_tree_unlock_write(vp->mapping, vp->adj_next);
> + }
> vma_interval_tree_insert(vp->vma,
> - get_i_mmap_root(vp->mapping));
> + get_rb_root(vp->vma, vp->mapping));
> + inc_mapping_vma(vp->mapping, vp->vma);
> flush_dcache_mmap_unlock(vp->mapping);
> + i_mmap_tree_unlock_write(vp->mapping, vp->vma);
> }
>
> if (vp->remove && vp->file) {
> @@ -370,7 +383,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> }
>
> if (vp->file) {
> - i_mmap_unlock_write(vp->mapping);
> + i_mmap_unlock_write_complete(vp->mapping);
>
> if (!vp->skip_vma_uprobe) {
> uprobe_mmap(vp->vma);
> @@ -1799,12 +1812,12 @@ static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
> int i;
>
> mapping = vb->vmas[0]->vm_file->f_mapping;
> - i_mmap_lock_write(mapping);
> + i_mmap_lock_write_prepare(mapping);
> for (i = 0; i < vb->count; i++) {
> VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping != mapping);
> __remove_shared_vm_struct(vb->vmas[i], mapping);
> }
> - i_mmap_unlock_write(mapping);
> + i_mmap_unlock_write_complete(mapping);
>
> unlink_file_vma_batch_init(vb);
> }
> @@ -1836,10 +1849,13 @@ static void vma_link_file(struct vm_area_struct *vma, bool hold_rmap_lock)
>
> if (file) {
> mapping = file->f_mapping;
> - i_mmap_lock_write(mapping);
> + i_mmap_lock_write_prepare(mapping);
> + i_mmap_tree_lock_write(mapping, vma);
> __vma_link_file(vma, mapping);
> - if (!hold_rmap_lock)
> - i_mmap_unlock_write(mapping);
> + if (!hold_rmap_lock) {
> + i_mmap_tree_unlock_write(mapping, vma);
> + i_mmap_unlock_write_complete(mapping);
> + }
> }
> }
>
> @@ -2164,6 +2180,23 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
> }
> }
I can but hope that all of the above is quite simplified before we get to the
"making file rmap more complicated" bit.
>
> +#ifdef CONFIG_SPLIT_I_MMAP
> +static inline void i_mmap_nest_lock(struct address_space *mapping,
> + struct rw_semaphore *lock)
> +{
> + int i;
> +
> + for (i = 0; i < split_tree_num; i++)
> + down_write_nest_lock(&mapping->i_mmap[i]->rwsem, lock);
> +}
> +#else
> +static inline void i_mmap_nest_lock(struct address_space *mapping,
> + struct rw_semaphore *lock)
> +{
> + down_write_nest_lock(&mapping->i_mmap_rwsem, lock);
> +}
> +#endif
> +
> static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
> {
> if (!test_bit(AS_MM_ALL_LOCKS, &mapping->flags)) {
> @@ -2178,7 +2211,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
> */
> if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
> BUG();
> - down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_lock);
> + i_mmap_nest_lock(mapping, &mm->mmap_lock);
> }
> }
>
> @@ -2489,6 +2522,7 @@ static int __mmap_new_file_vma(struct mmap_state *map,
> int error;
>
> vma->vm_file = map->file;
> + vma_set_tree_idx(vma);
> if (!map->file_doesnt_need_get)
> get_file(map->file);
>
> diff --git a/mm/vma_init.c b/mm/vma_init.c
> index 3c0b65950510..c115e33d4812 100644
> --- a/mm/vma_init.c
> +++ b/mm/vma_init.c
> @@ -72,6 +72,9 @@ static void vm_area_init_from(const struct vm_area_struct *src,
> #ifdef CONFIG_NUMA
> dest->vm_policy = src->vm_policy;
> #endif
> +#ifdef CONFIG_SPLIT_I_MMAP
> + dest->tree_idx = src->tree_idx;
> +#endif
> #ifdef __HAVE_PFNMAP_TRACKING
> dest->pfnmap_track_ctx = NULL;
> #endif
--
Pedro
* [PATCH net-next v2 1/2] tls: remove tls_toe and the related driver
From: Sabrina Dubroca @ 2026-06-11 10:21 UTC
To: netdev
Cc: Sabrina Dubroca, Ayush Sawal, John Fastabend, Jakub Kicinski,
David S. Miller, Alexander Gordeev, Andrew Lunn,
Christian Borntraeger, Eric Dumazet, Heiko Carstens, Paolo Abeni,
Simon Horman, Sven Schnelle, Vasily Gorbik, linux-s390, linux-doc,
Jonathan Corbet, Shuah Khan
In-Reply-To: <cover.1781165969.git.sd@queasysnail.net>
The tls_toe feature and its single user (chelsio chtls) have been
unmaintained for multiple years. It also hooks into the core of the
TCP implementation, and bypasses most of the networking stack.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
Documentation/networking/tls-offload.rst | 7 +-
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
.../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 3 +-
.../ethernet/chelsio/inline_crypto/Kconfig | 12 -
.../ethernet/chelsio/inline_crypto/Makefile | 1 -
.../chelsio/inline_crypto/chtls/Makefile | 6 -
.../chelsio/inline_crypto/chtls/chtls.h | 584 -----
.../chelsio/inline_crypto/chtls/chtls_cm.c | 2336 -----------------
.../chelsio/inline_crypto/chtls/chtls_cm.h | 218 --
.../chelsio/inline_crypto/chtls/chtls_hw.c | 462 ----
.../chelsio/inline_crypto/chtls/chtls_io.c | 1836 -------------
.../chelsio/inline_crypto/chtls/chtls_main.c | 642 -----
include/linux/netdev_features.h | 3 +-
include/net/tls.h | 1 -
include/net/tls_toe.h | 77 -
include/uapi/linux/tls.h | 2 +-
net/ethtool/common.c | 1 -
net/tls/Kconfig | 10 -
net/tls/Makefile | 1 -
net/tls/tls_main.c | 17 -
net/tls/tls_toe.c | 141 -
22 files changed, 4 insertions(+), 6358 deletions(-)
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/Makefile
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_hw.c
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
delete mode 100644 drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
delete mode 100644 include/net/tls_toe.h
delete mode 100644 net/tls/tls_toe.c
diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
index c173f537bf4d..25ee8d9f12c9 100644
--- a/Documentation/networking/tls-offload.rst
+++ b/Documentation/networking/tls-offload.rst
@@ -13,7 +13,7 @@ Layer Protocol (ULP) and install the cryptographic connection state.
For details regarding the user-facing interface refer to the TLS
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
-``ktls`` can operate in three modes:
+``ktls`` can operate in two modes:
* Software crypto mode (``TLS_SW``) - CPU handles the cryptography.
In most basic cases only crypto operations synchronous with the CPU
@@ -26,11 +26,6 @@ documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
This mode integrates best with the kernel stack and is described in detail
in the remaining part of this document
(``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
- * Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
- NIC driver and firmware replace the kernel networking stack
- with its own TCP handling, it is not usable in production environments
- making use of the Linux networking stack for example any firewalling
- abilities or QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``).
The operation mode is selected automatically based on device configuration,
offload opt-in or opt-out on per-connection basis is not currently supported.
diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig
index 730c90b4a876..fa517117b275 100644
--- a/arch/s390/configs/debug_defconfig
+++ b/arch/s390/configs/debug_defconfig
@@ -125,7 +125,6 @@ CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
-CONFIG_TLS_TOE=y
CONFIG_XFRM_USER=m
CONFIG_NET_KEY=m
CONFIG_SMC=m
diff --git a/arch/s390/configs/defconfig b/arch/s390/configs/defconfig
index dd5fc1426c88..86c19649d6a4 100644
--- a/arch/s390/configs/defconfig
+++ b/arch/s390/configs/defconfig
@@ -116,7 +116,6 @@ CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
-CONFIG_TLS_TOE=y
CONFIG_XFRM_USER=m
CONFIG_NET_KEY=m
CONFIG_SMC=m
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6df98fca932f..9e2c2fa16d7a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6795,8 +6795,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_TSO | NETIF_F_TSO6;
netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
- NETIF_F_GSO_UDP_TUNNEL_CSUM |
- NETIF_F_HW_TLS_RECORD;
+ NETIF_F_GSO_UDP_TUNNEL_CSUM;
if (adapter->rawf_cnt)
netdev->udp_tunnel_nic_info = &cxgb_udp_tunnels;
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/Kconfig b/drivers/net/ethernet/chelsio/inline_crypto/Kconfig
index 521955e1f894..b7d452f4a7f1 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/Kconfig
+++ b/drivers/net/ethernet/chelsio/inline_crypto/Kconfig
@@ -13,18 +13,6 @@ config CHELSIO_INLINE_CRYPTO
if CHELSIO_INLINE_CRYPTO
-config CRYPTO_DEV_CHELSIO_TLS
- tristate "Chelsio Crypto Inline TLS Driver"
- depends on CHELSIO_T4
- depends on TLS
- depends on TLS_TOE
- help
- Support Chelsio Inline TLS with Chelsio crypto accelerator.
- Enable inline TLS support for Tx and Rx.
-
- To compile this driver as a module, choose M here: the module
- will be called chtls.
-
config CHELSIO_IPSEC_INLINE
tristate "Chelsio IPSec XFRM Tx crypto offload"
depends on CHELSIO_T4
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/Makefile b/drivers/net/ethernet/chelsio/inline_crypto/Makefile
index 27e6d7e2f1eb..ca6548adc6a7 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/Makefile
+++ b/drivers/net/ethernet/chelsio/inline_crypto/Makefile
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls/
obj-$(CONFIG_CHELSIO_IPSEC_INLINE) += ch_ipsec/
obj-$(CONFIG_CHELSIO_TLS_DEVICE) += ch_ktls/
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/Makefile b/drivers/net/ethernet/chelsio/inline_crypto/chtls/Makefile
deleted file mode 100644
index bc11495acdb3..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/Makefile
+++ /dev/null
@@ -1,6 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-ccflags-y := -I $(srctree)/drivers/net/ethernet/chelsio/cxgb4 \
- -I $(srctree)/drivers/crypto/chelsio
-
-obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls.o
-chtls-objs := chtls_main.o chtls_cm.o chtls_io.o chtls_hw.o
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
deleted file mode 100644
index 1de5744a49b0..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
+++ /dev/null
@@ -1,584 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (c) 2018 Chelsio Communications, Inc.
- */
-
-#ifndef __CHTLS_H__
-#define __CHTLS_H__
-
-#include <crypto/aes.h>
-#include <crypto/hash.h>
-#include <crypto/sha1.h>
-#include <crypto/sha2.h>
-#include <crypto/authenc.h>
-#include <crypto/ctr.h>
-#include <crypto/gf128mul.h>
-#include <crypto/internal/aead.h>
-#include <crypto/null.h>
-#include <crypto/internal/skcipher.h>
-#include <crypto/aead.h>
-#include <crypto/scatterwalk.h>
-#include <crypto/internal/hash.h>
-#include <linux/tls.h>
-#include <net/tls.h>
-#include <net/tls_prot.h>
-#include <net/tls_toe.h>
-
-#include "t4fw_api.h"
-#include "t4_msg.h"
-#include "cxgb4.h"
-#include "cxgb4_uld.h"
-#include "l2t.h"
-#include "chcr_algo.h"
-#include "chcr_core.h"
-#include "chcr_crypto.h"
-
-#define CHTLS_DRV_VERSION "1.0.0.0-ko"
-
-#define TLS_KEYCTX_RXFLIT_CNT_S 24
-#define TLS_KEYCTX_RXFLIT_CNT_V(x) ((x) << TLS_KEYCTX_RXFLIT_CNT_S)
-
-#define TLS_KEYCTX_RXPROT_VER_S 20
-#define TLS_KEYCTX_RXPROT_VER_M 0xf
-#define TLS_KEYCTX_RXPROT_VER_V(x) ((x) << TLS_KEYCTX_RXPROT_VER_S)
-
-#define TLS_KEYCTX_RXCIPH_MODE_S 16
-#define TLS_KEYCTX_RXCIPH_MODE_M 0xf
-#define TLS_KEYCTX_RXCIPH_MODE_V(x) ((x) << TLS_KEYCTX_RXCIPH_MODE_S)
-
-#define TLS_KEYCTX_RXAUTH_MODE_S 12
-#define TLS_KEYCTX_RXAUTH_MODE_M 0xf
-#define TLS_KEYCTX_RXAUTH_MODE_V(x) ((x) << TLS_KEYCTX_RXAUTH_MODE_S)
-
-#define TLS_KEYCTX_RXCIAU_CTRL_S 11
-#define TLS_KEYCTX_RXCIAU_CTRL_V(x) ((x) << TLS_KEYCTX_RXCIAU_CTRL_S)
-
-#define TLS_KEYCTX_RX_SEQCTR_S 9
-#define TLS_KEYCTX_RX_SEQCTR_M 0x3
-#define TLS_KEYCTX_RX_SEQCTR_V(x) ((x) << TLS_KEYCTX_RX_SEQCTR_S)
-
-#define TLS_KEYCTX_RX_VALID_S 8
-#define TLS_KEYCTX_RX_VALID_V(x) ((x) << TLS_KEYCTX_RX_VALID_S)
-
-#define TLS_KEYCTX_RXCK_SIZE_S 3
-#define TLS_KEYCTX_RXCK_SIZE_M 0x7
-#define TLS_KEYCTX_RXCK_SIZE_V(x) ((x) << TLS_KEYCTX_RXCK_SIZE_S)
-
-#define TLS_KEYCTX_RXMK_SIZE_S 0
-#define TLS_KEYCTX_RXMK_SIZE_M 0x7
-#define TLS_KEYCTX_RXMK_SIZE_V(x) ((x) << TLS_KEYCTX_RXMK_SIZE_S)
-
-#define KEYCTX_TX_WR_IV_S 55
-#define KEYCTX_TX_WR_IV_M 0x1ffULL
-#define KEYCTX_TX_WR_IV_V(x) ((x) << KEYCTX_TX_WR_IV_S)
-#define KEYCTX_TX_WR_IV_G(x) \
- (((x) >> KEYCTX_TX_WR_IV_S) & KEYCTX_TX_WR_IV_M)
-
-#define KEYCTX_TX_WR_AAD_S 47
-#define KEYCTX_TX_WR_AAD_M 0xffULL
-#define KEYCTX_TX_WR_AAD_V(x) ((x) << KEYCTX_TX_WR_AAD_S)
-#define KEYCTX_TX_WR_AAD_G(x) (((x) >> KEYCTX_TX_WR_AAD_S) & \
- KEYCTX_TX_WR_AAD_M)
-
-#define KEYCTX_TX_WR_AADST_S 39
-#define KEYCTX_TX_WR_AADST_M 0xffULL
-#define KEYCTX_TX_WR_AADST_V(x) ((x) << KEYCTX_TX_WR_AADST_S)
-#define KEYCTX_TX_WR_AADST_G(x) \
- (((x) >> KEYCTX_TX_WR_AADST_S) & KEYCTX_TX_WR_AADST_M)
-
-#define KEYCTX_TX_WR_CIPHER_S 30
-#define KEYCTX_TX_WR_CIPHER_M 0x1ffULL
-#define KEYCTX_TX_WR_CIPHER_V(x) ((x) << KEYCTX_TX_WR_CIPHER_S)
-#define KEYCTX_TX_WR_CIPHER_G(x) \
- (((x) >> KEYCTX_TX_WR_CIPHER_S) & KEYCTX_TX_WR_CIPHER_M)
-
-#define KEYCTX_TX_WR_CIPHERST_S 23
-#define KEYCTX_TX_WR_CIPHERST_M 0x7f
-#define KEYCTX_TX_WR_CIPHERST_V(x) ((x) << KEYCTX_TX_WR_CIPHERST_S)
-#define KEYCTX_TX_WR_CIPHERST_G(x) \
- (((x) >> KEYCTX_TX_WR_CIPHERST_S) & KEYCTX_TX_WR_CIPHERST_M)
-
-#define KEYCTX_TX_WR_AUTH_S 14
-#define KEYCTX_TX_WR_AUTH_M 0x1ff
-#define KEYCTX_TX_WR_AUTH_V(x) ((x) << KEYCTX_TX_WR_AUTH_S)
-#define KEYCTX_TX_WR_AUTH_G(x) \
- (((x) >> KEYCTX_TX_WR_AUTH_S) & KEYCTX_TX_WR_AUTH_M)
-
-#define KEYCTX_TX_WR_AUTHST_S 7
-#define KEYCTX_TX_WR_AUTHST_M 0x7f
-#define KEYCTX_TX_WR_AUTHST_V(x) ((x) << KEYCTX_TX_WR_AUTHST_S)
-#define KEYCTX_TX_WR_AUTHST_G(x) \
- (((x) >> KEYCTX_TX_WR_AUTHST_S) & KEYCTX_TX_WR_AUTHST_M)
-
-#define KEYCTX_TX_WR_AUTHIN_S 0
-#define KEYCTX_TX_WR_AUTHIN_M 0x7f
-#define KEYCTX_TX_WR_AUTHIN_V(x) ((x) << KEYCTX_TX_WR_AUTHIN_S)
-#define KEYCTX_TX_WR_AUTHIN_G(x) \
- (((x) >> KEYCTX_TX_WR_AUTHIN_S) & KEYCTX_TX_WR_AUTHIN_M)
-
-struct sge_opaque_hdr {
- void *dev;
- dma_addr_t addr[MAX_SKB_FRAGS + 1];
-};
-
-#define MAX_IVS_PAGE 256
-#define TLS_KEY_CONTEXT_SZ 64
-#define CIPHER_BLOCK_SIZE 16
-#define GCM_TAG_SIZE 16
-#define KEY_ON_MEM_SZ 16
-#define AEAD_EXPLICIT_DATA_SIZE 8
-#define TLS_HEADER_LENGTH 5
-#define SCMD_CIPH_MODE_AES_GCM 2
-/* Any MFS size should work and come from openssl */
-#define TLS_MFS 16384
-
-#define RSS_HDR sizeof(struct rss_header)
-#define TLS_WR_CPL_LEN \
- (sizeof(struct fw_tlstx_data_wr) + sizeof(struct cpl_tx_tls_sfo))
-
-enum {
- CHTLS_KEY_CONTEXT_DSGL,
- CHTLS_KEY_CONTEXT_IMM,
- CHTLS_KEY_CONTEXT_DDR,
-};
-
-enum {
- CHTLS_LISTEN_START,
- CHTLS_LISTEN_STOP,
-};
-
-/* Flags for return value of CPL message handlers */
-enum {
- CPL_RET_BUF_DONE = 1, /* buffer processing done */
- CPL_RET_BAD_MSG = 2, /* bad CPL message */
- CPL_RET_UNKNOWN_TID = 4 /* unexpected unknown TID */
-};
-
-#define LISTEN_INFO_HASH_SIZE 32
-#define RSPQ_HASH_BITS 5
-struct listen_info {
- struct listen_info *next; /* Link to next entry */
- struct sock *sk; /* The listening socket */
- unsigned int stid; /* The server TID */
-};
-
-enum {
- T4_LISTEN_START_PENDING,
- T4_LISTEN_STARTED
-};
-
-enum csk_flags {
- CSK_CALLBACKS_CHKD, /* socket callbacks have been sanitized */
- CSK_ABORT_REQ_RCVD, /* received one ABORT_REQ_RSS message */
- CSK_TX_MORE_DATA, /* sending ULP data; don't set SHOVE bit */
- CSK_TX_WAIT_IDLE, /* suspend Tx until in-flight data is ACKed */
- CSK_ABORT_SHUTDOWN, /* shouldn't send more abort requests */
- CSK_ABORT_RPL_PENDING, /* expecting an abort reply */
- CSK_CLOSE_CON_REQUESTED,/* we've sent a close_conn_req */
- CSK_TX_DATA_SENT, /* sent a TX_DATA WR on this connection */
- CSK_TX_FAILOVER, /* Tx traffic failing over */
- CSK_UPDATE_RCV_WND, /* Need to update rcv window */
- CSK_RST_ABORTED, /* outgoing RST was aborted */
- CSK_TLS_HANDSHK, /* TLS Handshake */
- CSK_CONN_INLINE, /* Connection on HW */
-};
-
-enum chtls_cdev_state {
- CHTLS_CDEV_STATE_UP = 1
-};
-
-struct listen_ctx {
- struct sock *lsk;
- struct chtls_dev *cdev;
- struct sk_buff_head synq;
- u32 state;
-};
-
-struct key_map {
- unsigned long *addr;
- unsigned int start;
- unsigned int available;
- unsigned int size;
- spinlock_t lock; /* lock for key id request from map */
-} __packed;
-
-struct tls_scmd {
- u32 seqno_numivs;
- u32 ivgen_hdrlen;
-};
-
-struct chtls_dev {
- struct tls_toe_device tlsdev;
- struct list_head list;
- struct cxgb4_lld_info *lldi;
- struct pci_dev *pdev;
- struct listen_info *listen_hash_tab[LISTEN_INFO_HASH_SIZE];
- spinlock_t listen_lock; /* lock for listen list */
- struct net_device **ports;
- struct tid_info *tids;
- unsigned int pfvf;
- const unsigned short *mtus;
-
- struct idr hwtid_idr;
- struct idr stid_idr;
-
- spinlock_t idr_lock ____cacheline_aligned_in_smp;
-
- struct net_device *egr_dev[NCHAN * 2];
- struct sk_buff *rspq_skb_cache[1 << RSPQ_HASH_BITS];
- struct sk_buff *askb;
-
- struct sk_buff_head deferq;
- struct work_struct deferq_task;
-
- struct list_head list_node;
- struct list_head rcu_node;
- struct list_head na_node;
- unsigned int send_page_order;
- int max_host_sndbuf;
- u32 round_robin_cnt;
- struct key_map kmap;
- unsigned int cdev_state;
-};
-
-struct chtls_listen {
- struct chtls_dev *cdev;
- struct sock *sk;
-};
-
-struct chtls_hws {
- struct sk_buff_head sk_recv_queue;
- u8 txqid;
- u8 ofld;
- u16 type;
- u16 rstate;
- u16 keyrpl;
- u16 pldlen;
- u16 rcvpld;
- u16 compute;
- u16 expansion;
- u16 keylen;
- u16 pdus;
- u16 adjustlen;
- u16 ivsize;
- u16 txleft;
- u32 mfs;
- s32 txkey;
- s32 rxkey;
- u32 fcplenmax;
- u32 copied_seq;
- u64 tx_seq_no;
- struct tls_scmd scmd;
- union {
- struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
- struct tls12_crypto_info_aes_gcm_256 aes_gcm_256;
- } crypto_info;
-};
-
-struct chtls_sock {
- struct sock *sk;
- struct chtls_dev *cdev;
- struct l2t_entry *l2t_entry; /* pointer to the L2T entry */
- struct net_device *egress_dev; /* TX_CHAN for act open retry */
-
- struct sk_buff_head txq;
- struct sk_buff *wr_skb_head;
- struct sk_buff *wr_skb_tail;
- struct sk_buff *ctrl_skb_cache;
- struct sk_buff *txdata_skb_cache; /* abort path messages */
- struct kref kref;
- unsigned long flags;
- u32 opt2;
- u32 wr_credits;
- u32 wr_unacked;
- u32 wr_max_credits;
- u32 wr_nondata;
- u32 hwtid; /* TCP Control Block ID */
- u32 txq_idx;
- u32 rss_qid;
- u32 tid;
- u32 idr;
- u32 mss;
- u32 ulp_mode;
- u32 tx_chan;
- u32 rx_chan;
- u32 sndbuf;
- u32 txplen_max;
- u32 mtu_idx; /* MTU table index */
- u32 smac_idx;
- u8 port_id;
- u8 tos;
- u16 resv2;
- u32 delack_mode;
- u32 delack_seq;
- u32 snd_win;
- u32 rcv_win;
-
- void *passive_reap_next; /* placeholder for passive */
- struct chtls_hws tlshws;
- struct synq {
- struct sk_buff *next;
- struct sk_buff *prev;
- } synq;
- struct listen_ctx *listen_ctx;
-};
-
-struct tls_hdr {
- u8 type;
- u16 version;
- u16 length;
-} __packed;
-
-struct tlsrx_cmp_hdr {
- u8 type;
- u16 version;
- u16 length;
-
- u64 tls_seq;
- u16 reserved1;
- u8 res_to_mac_error;
-} __packed;
-
-/* res_to_mac_error fields */
-#define TLSRX_HDR_PKT_INT_ERROR_S 4
-#define TLSRX_HDR_PKT_INT_ERROR_M 0x1
-#define TLSRX_HDR_PKT_INT_ERROR_V(x) \
- ((x) << TLSRX_HDR_PKT_INT_ERROR_S)
-#define TLSRX_HDR_PKT_INT_ERROR_G(x) \
- (((x) >> TLSRX_HDR_PKT_INT_ERROR_S) & TLSRX_HDR_PKT_INT_ERROR_M)
-#define TLSRX_HDR_PKT_INT_ERROR_F TLSRX_HDR_PKT_INT_ERROR_V(1U)
-
-#define TLSRX_HDR_PKT_SPP_ERROR_S 3
-#define TLSRX_HDR_PKT_SPP_ERROR_M 0x1
-#define TLSRX_HDR_PKT_SPP_ERROR_V(x) ((x) << TLSRX_HDR_PKT_SPP_ERROR)
-#define TLSRX_HDR_PKT_SPP_ERROR_G(x) \
- (((x) >> TLSRX_HDR_PKT_SPP_ERROR_S) & TLSRX_HDR_PKT_SPP_ERROR_M)
-#define TLSRX_HDR_PKT_SPP_ERROR_F TLSRX_HDR_PKT_SPP_ERROR_V(1U)
-
-#define TLSRX_HDR_PKT_CCDX_ERROR_S 2
-#define TLSRX_HDR_PKT_CCDX_ERROR_M 0x1
-#define TLSRX_HDR_PKT_CCDX_ERROR_V(x) ((x) << TLSRX_HDR_PKT_CCDX_ERROR_S)
-#define TLSRX_HDR_PKT_CCDX_ERROR_G(x) \
- (((x) >> TLSRX_HDR_PKT_CCDX_ERROR_S) & TLSRX_HDR_PKT_CCDX_ERROR_M)
-#define TLSRX_HDR_PKT_CCDX_ERROR_F TLSRX_HDR_PKT_CCDX_ERROR_V(1U)
-
-#define TLSRX_HDR_PKT_PAD_ERROR_S 1
-#define TLSRX_HDR_PKT_PAD_ERROR_M 0x1
-#define TLSRX_HDR_PKT_PAD_ERROR_V(x) ((x) << TLSRX_HDR_PKT_PAD_ERROR_S)
-#define TLSRX_HDR_PKT_PAD_ERROR_G(x) \
- (((x) >> TLSRX_HDR_PKT_PAD_ERROR_S) & TLSRX_HDR_PKT_PAD_ERROR_M)
-#define TLSRX_HDR_PKT_PAD_ERROR_F TLSRX_HDR_PKT_PAD_ERROR_V(1U)
-
-#define TLSRX_HDR_PKT_MAC_ERROR_S 0
-#define TLSRX_HDR_PKT_MAC_ERROR_M 0x1
-#define TLSRX_HDR_PKT_MAC_ERROR_V(x) ((x) << TLSRX_HDR_PKT_MAC_ERROR)
-#define TLSRX_HDR_PKT_MAC_ERROR_G(x) \
- (((x) >> S_TLSRX_HDR_PKT_MAC_ERROR_S) & TLSRX_HDR_PKT_MAC_ERROR_M)
-#define TLSRX_HDR_PKT_MAC_ERROR_F TLSRX_HDR_PKT_MAC_ERROR_V(1U)
-
-#define TLSRX_HDR_PKT_ERROR_M 0x1F
-#define CONTENT_TYPE_ERROR 0x7F
-
-struct ulp_mem_rw {
- __be32 cmd;
- __be32 len16; /* command length */
- __be32 dlen; /* data length in 32-byte units */
- __be32 lock_addr;
-};
-
-struct tls_key_wr {
- __be32 op_to_compl;
- __be32 flowid_len16;
- __be32 ftid;
- u8 reneg_to_write_rx;
- u8 protocol;
- __be16 mfs;
-};
-
-struct tls_key_req {
- struct tls_key_wr wr;
- struct ulp_mem_rw req;
- struct ulptx_idata sc_imm;
-};
-
-/*
- * This lives in skb->cb and is used to chain WRs in a linked list.
- */
-struct wr_skb_cb {
- struct l2t_skb_cb l2t; /* reserve space for l2t CB */
- struct sk_buff *next_wr; /* next write request */
-};
-
-/* Per-skb backlog handler. Run when a socket's backlog is processed. */
-struct blog_skb_cb {
- void (*backlog_rcv)(struct sock *sk, struct sk_buff *skb);
- struct chtls_dev *cdev;
-};
-
-/*
- * Similar to tcp_skb_cb but with ULP elements added to support TLS,
- * etc.
- */
-struct ulp_skb_cb {
- struct wr_skb_cb wr; /* reserve space for write request */
- u16 flags; /* TCP-like flags */
- u8 psh;
- u8 ulp_mode; /* ULP mode/submode of sk_buff */
- u32 seq; /* TCP sequence number */
- union { /* ULP-specific fields */
- struct {
- u8 type;
- u8 ofld;
- u8 iv;
- } tls;
- } ulp;
-};
-
-#define ULP_SKB_CB(skb) ((struct ulp_skb_cb *)&((skb)->cb[0]))
-#define BLOG_SKB_CB(skb) ((struct blog_skb_cb *)(skb)->cb)
-
-/*
- * Flags for ulp_skb_cb.flags.
- */
-enum {
- ULPCB_FLAG_NEED_HDR = 1 << 0, /* packet needs a TX_DATA_WR header */
- ULPCB_FLAG_NO_APPEND = 1 << 1, /* don't grow this skb */
- ULPCB_FLAG_BARRIER = 1 << 2, /* set TX_WAIT_IDLE after sending */
- ULPCB_FLAG_HOLD = 1 << 3, /* skb not ready for Tx yet */
- ULPCB_FLAG_COMPL = 1 << 4, /* request WR completion */
- ULPCB_FLAG_URG = 1 << 5, /* urgent data */
- ULPCB_FLAG_TLS_HDR = 1 << 6, /* payload with tls hdr */
- ULPCB_FLAG_NO_HDR = 1 << 7, /* not a ofld wr */
-};
-
-/* The ULP mode/submode of an skbuff */
-#define skb_ulp_mode(skb) (ULP_SKB_CB(skb)->ulp_mode)
-#define TCP_PAGE(sk) (sk->sk_frag.page)
-#define TCP_OFF(sk) (sk->sk_frag.offset)
-
-static inline struct chtls_dev *to_chtls_dev(struct tls_toe_device *tlsdev)
-{
- return container_of(tlsdev, struct chtls_dev, tlsdev);
-}
-
-static inline void csk_set_flag(struct chtls_sock *csk,
- enum csk_flags flag)
-{
- __set_bit(flag, &csk->flags);
-}
-
-static inline void csk_reset_flag(struct chtls_sock *csk,
- enum csk_flags flag)
-{
- __clear_bit(flag, &csk->flags);
-}
-
-static inline bool csk_conn_inline(const struct chtls_sock *csk)
-{
- return test_bit(CSK_CONN_INLINE, &csk->flags);
-}
-
-static inline int csk_flag(const struct sock *sk, enum csk_flags flag)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- if (!csk_conn_inline(csk))
- return 0;
- return test_bit(flag, &csk->flags);
-}
-
-static inline int csk_flag_nochk(const struct chtls_sock *csk,
- enum csk_flags flag)
-{
- return test_bit(flag, &csk->flags);
-}
-
-static inline void *cplhdr(struct sk_buff *skb)
-{
- return skb->data;
-}
-
-static inline int is_neg_adv(unsigned int status)
-{
- return status == CPL_ERR_RTX_NEG_ADVICE ||
- status == CPL_ERR_KEEPALV_NEG_ADVICE ||
- status == CPL_ERR_PERSIST_NEG_ADVICE;
-}
-
-static inline void process_cpl_msg(void (*fn)(struct sock *, struct sk_buff *),
- struct sock *sk,
- struct sk_buff *skb)
-{
- skb_reset_mac_header(skb);
- skb_reset_network_header(skb);
- skb_reset_transport_header(skb);
-
- bh_lock_sock(sk);
- if (unlikely(sock_owned_by_user(sk))) {
- BLOG_SKB_CB(skb)->backlog_rcv = fn;
- __sk_add_backlog(sk, skb);
- } else {
- fn(sk, skb);
- }
- bh_unlock_sock(sk);
-}
-
-static inline void chtls_sock_free(struct kref *ref)
-{
- struct chtls_sock *csk = container_of(ref, struct chtls_sock,
- kref);
- kfree(csk);
-}
-
-static inline void __chtls_sock_put(const char *fn, struct chtls_sock *csk)
-{
- kref_put(&csk->kref, chtls_sock_free);
-}
-
-static inline void __chtls_sock_get(const char *fn,
- struct chtls_sock *csk)
-{
- kref_get(&csk->kref);
-}
-
-static inline void send_or_defer(struct sock *sk, struct tcp_sock *tp,
- struct sk_buff *skb, int through_l2t)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- if (through_l2t) {
- /* send through L2T */
- cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
- } else {
- /* send directly */
- cxgb4_ofld_send(csk->egress_dev, skb);
- }
-}
-
-typedef int (*chtls_handler_func)(struct chtls_dev *, struct sk_buff *);
-extern chtls_handler_func chtls_handlers[NUM_CPL_CMDS];
-void chtls_install_cpl_ops(struct sock *sk);
-int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi);
-void chtls_listen_stop(struct chtls_dev *cdev, struct sock *sk);
-int chtls_listen_start(struct chtls_dev *cdev, struct sock *sk);
-void chtls_close(struct sock *sk, long timeout);
-int chtls_disconnect(struct sock *sk, int flags);
-void chtls_shutdown(struct sock *sk, int how);
-void chtls_destroy_sock(struct sock *sk);
-int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
-int chtls_recvmsg(struct sock *sk, struct msghdr *msg,
- size_t len, int flags);
-void chtls_splice_eof(struct socket *sock);
-int send_tx_flowc_wr(struct sock *sk, int compl,
- u32 snd_nxt, u32 rcv_nxt);
-void chtls_tcp_push(struct sock *sk, int flags);
-int chtls_push_frames(struct chtls_sock *csk, int comp);
-void chtls_set_tcb_field_rpl_skb(struct sock *sk, u16 word,
- u64 mask, u64 val, u8 cookie,
- int through_l2t);
-int chtls_setkey(struct chtls_sock *csk, u32 keylen, u32 mode, int cipher_type);
-void chtls_set_quiesce_ctrl(struct sock *sk, int val);
-void skb_entail(struct sock *sk, struct sk_buff *skb, int flags);
-unsigned int keyid_to_addr(int start_addr, int keyid);
-void free_tls_keyid(struct sock *sk);
-#endif
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
deleted file mode 100644
index 0e3e5cf52c2c..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
+++ /dev/null
@@ -1,2336 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (c) 2018 Chelsio Communications, Inc.
- *
- * Written by: Atul Gupta (atul.gupta@chelsio.com)
- */
-
-#include <linux/module.h>
-#include <linux/list.h>
-#include <linux/workqueue.h>
-#include <linux/skbuff.h>
-#include <linux/timer.h>
-#include <linux/notifier.h>
-#include <linux/inetdevice.h>
-#include <linux/ip.h>
-#include <linux/tcp.h>
-#include <linux/sched/signal.h>
-#include <linux/kallsyms.h>
-#include <linux/kprobes.h>
-#include <linux/if_vlan.h>
-#include <linux/ipv6.h>
-#include <net/ipv6.h>
-#include <net/transp_v6.h>
-#include <net/ip6_route.h>
-#include <net/inet_common.h>
-#include <net/tcp.h>
-#include <net/dst.h>
-#include <net/tls.h>
-#include <net/addrconf.h>
-#include <net/secure_seq.h>
-
-#include "chtls.h"
-#include "chtls_cm.h"
-#include "clip_tbl.h"
-#include "t4_tcb.h"
-
-/*
- * State transitions and actions for close. Note that if we are in SYN_SENT
- * we remain in that state as we cannot control a connection while it's in
- * SYN_SENT; such connections are allowed to establish and are then aborted.
- */
-static unsigned char new_state[16] = {
- /* current state: new state: action: */
- /* (Invalid) */ TCP_CLOSE,
- /* TCP_ESTABLISHED */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
- /* TCP_SYN_SENT */ TCP_SYN_SENT,
- /* TCP_SYN_RECV */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
- /* TCP_FIN_WAIT1 */ TCP_FIN_WAIT1,
- /* TCP_FIN_WAIT2 */ TCP_FIN_WAIT2,
- /* TCP_TIME_WAIT */ TCP_CLOSE,
- /* TCP_CLOSE */ TCP_CLOSE,
- /* TCP_CLOSE_WAIT */ TCP_LAST_ACK | TCP_ACTION_FIN,
- /* TCP_LAST_ACK */ TCP_LAST_ACK,
- /* TCP_LISTEN */ TCP_CLOSE,
- /* TCP_CLOSING */ TCP_CLOSING,
-};
-
-static struct chtls_sock *chtls_sock_create(struct chtls_dev *cdev)
-{
- struct chtls_sock *csk = kzalloc_obj(*csk, GFP_ATOMIC);
-
- if (!csk)
- return NULL;
-
- csk->txdata_skb_cache = alloc_skb(TXDATA_SKB_LEN, GFP_ATOMIC);
- if (!csk->txdata_skb_cache) {
- kfree(csk);
- return NULL;
- }
-
- kref_init(&csk->kref);
- csk->cdev = cdev;
- skb_queue_head_init(&csk->txq);
- csk->wr_skb_head = NULL;
- csk->wr_skb_tail = NULL;
- csk->mss = MAX_MSS;
- csk->tlshws.ofld = 1;
- csk->tlshws.txkey = -1;
- csk->tlshws.rxkey = -1;
- csk->tlshws.mfs = TLS_MFS;
- skb_queue_head_init(&csk->tlshws.sk_recv_queue);
- return csk;
-}
-
-static void chtls_sock_release(struct kref *ref)
-{
- struct chtls_sock *csk =
- container_of(ref, struct chtls_sock, kref);
-
- kfree(csk);
-}
-
-static struct net_device *chtls_find_netdev(struct chtls_dev *cdev,
- struct sock *sk)
-{
- struct adapter *adap = pci_get_drvdata(cdev->pdev);
- struct net_device *ndev = cdev->ports[0];
-#if IS_ENABLED(CONFIG_IPV6)
- struct net_device *temp;
- int addr_type;
-#endif
- int i;
-
- switch (sk->sk_family) {
- case PF_INET:
- if (likely(!inet_sk(sk)->inet_rcv_saddr))
- return ndev;
- ndev = __ip_dev_find(&init_net, inet_sk(sk)->inet_rcv_saddr, false);
- break;
-#if IS_ENABLED(CONFIG_IPV6)
- case PF_INET6:
- addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr);
- if (likely(addr_type == IPV6_ADDR_ANY))
- return ndev;
-
- for_each_netdev_rcu(&init_net, temp) {
- if (ipv6_chk_addr(&init_net, (struct in6_addr *)
- &sk->sk_v6_rcv_saddr, temp, 1)) {
- ndev = temp;
- break;
- }
- }
- break;
-#endif
- default:
- return NULL;
- }
-
- if (!ndev)
- return NULL;
-
- if (is_vlan_dev(ndev))
- ndev = vlan_dev_real_dev(ndev);
-
- for_each_port(adap, i)
- if (cdev->ports[i] == ndev)
- return ndev;
- return NULL;
-}
-
-static void assign_rxopt(struct sock *sk, unsigned int opt)
-{
- const struct chtls_dev *cdev;
- struct chtls_sock *csk;
- struct tcp_sock *tp;
-
- csk = rcu_dereference_sk_user_data(sk);
- tp = tcp_sk(sk);
-
- cdev = csk->cdev;
- tp->tcp_header_len = sizeof(struct tcphdr);
- tp->rx_opt.mss_clamp = cdev->mtus[TCPOPT_MSS_G(opt)] - 40;
- tp->mss_cache = tp->rx_opt.mss_clamp;
- tp->rx_opt.tstamp_ok = TCPOPT_TSTAMP_G(opt);
- tp->rx_opt.snd_wscale = TCPOPT_SACK_G(opt);
- tp->rx_opt.wscale_ok = TCPOPT_WSCALE_OK_G(opt);
- SND_WSCALE(tp) = TCPOPT_SND_WSCALE_G(opt);
- if (!tp->rx_opt.wscale_ok)
- tp->rx_opt.rcv_wscale = 0;
- if (tp->rx_opt.tstamp_ok) {
- tp->tcp_header_len += TCPOLEN_TSTAMP_ALIGNED;
- tp->rx_opt.mss_clamp -= TCPOLEN_TSTAMP_ALIGNED;
- } else if (csk->opt2 & TSTAMPS_EN_F) {
- csk->opt2 &= ~TSTAMPS_EN_F;
- csk->mtu_idx = TCPOPT_MSS_G(opt);
- }
-}
-
-static void chtls_purge_receive_queue(struct sock *sk)
-{
- struct sk_buff *skb;
-
- while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
- skb_dstref_steal(skb);
- kfree_skb(skb);
- }
-}
-
-static void chtls_purge_write_queue(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct sk_buff *skb;
-
- while ((skb = __skb_dequeue(&csk->txq))) {
- sk->sk_wmem_queued -= skb->truesize;
- __kfree_skb(skb);
- }
-}
-
-static void chtls_purge_recv_queue(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct chtls_hws *tlsk = &csk->tlshws;
- struct sk_buff *skb;
-
- while ((skb = __skb_dequeue(&tlsk->sk_recv_queue)) != NULL) {
- skb_dstref_steal(skb);
- kfree_skb(skb);
- }
-}
-
-static void abort_arp_failure(void *handle, struct sk_buff *skb)
-{
- struct cpl_abort_req *req = cplhdr(skb);
- struct chtls_dev *cdev;
-
- cdev = (struct chtls_dev *)handle;
- req->cmd = CPL_ABORT_NO_RST;
- cxgb4_ofld_send(cdev->lldi->ports[0], skb);
-}
-
-static struct sk_buff *alloc_ctrl_skb(struct sk_buff *skb, int len)
-{
- if (likely(skb && !skb_shared(skb) && !skb_cloned(skb))) {
- __skb_trim(skb, 0);
- refcount_inc(&skb->users);
- } else {
- skb = alloc_skb(len, GFP_KERNEL | __GFP_NOFAIL);
- }
- return skb;
-}
-
-static void chtls_send_abort(struct sock *sk, int mode, struct sk_buff *skb)
-{
- struct cpl_abort_req *req;
- struct chtls_sock *csk;
- struct tcp_sock *tp;
-
- csk = rcu_dereference_sk_user_data(sk);
- tp = tcp_sk(sk);
-
- if (!skb)
- skb = alloc_ctrl_skb(csk->txdata_skb_cache, sizeof(*req));
-
- req = (struct cpl_abort_req *)skb_put(skb, sizeof(*req));
- INIT_TP_WR_CPL(req, CPL_ABORT_REQ, csk->tid);
- skb_set_queue_mapping(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA);
- req->rsvd0 = htonl(tp->snd_nxt);
- req->rsvd1 = !csk_flag_nochk(csk, CSK_TX_DATA_SENT);
- req->cmd = mode;
- t4_set_arp_err_handler(skb, csk->cdev, abort_arp_failure);
- send_or_defer(sk, tp, skb, mode == CPL_ABORT_SEND_RST);
-}
-
-static void chtls_send_reset(struct sock *sk, int mode, struct sk_buff *skb)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- if (unlikely(csk_flag_nochk(csk, CSK_ABORT_SHUTDOWN) ||
- !csk->cdev)) {
- if (sk->sk_state == TCP_SYN_RECV)
- csk_set_flag(csk, CSK_RST_ABORTED);
- goto out;
- }
-
- if (!csk_flag_nochk(csk, CSK_TX_DATA_SENT)) {
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (send_tx_flowc_wr(sk, 0, tp->snd_nxt, tp->rcv_nxt) < 0)
- WARN_ONCE(1, "send tx flowc error");
- csk_set_flag(csk, CSK_TX_DATA_SENT);
- }
-
- csk_set_flag(csk, CSK_ABORT_RPL_PENDING);
- chtls_purge_write_queue(sk);
-
- csk_set_flag(csk, CSK_ABORT_SHUTDOWN);
- if (sk->sk_state != TCP_SYN_RECV)
- chtls_send_abort(sk, mode, skb);
- else
- chtls_set_tcb_field_rpl_skb(sk, TCB_T_FLAGS_W,
- TCB_T_FLAGS_V(TCB_T_FLAGS_M), 0,
- TCB_FIELD_COOKIE_TFLAG, 1);
-
- return;
-out:
- kfree_skb(skb);
-}
-
-static void release_tcp_port(struct sock *sk)
-{
- if (inet_csk(sk)->icsk_bind_hash)
- inet_put_port(sk);
-}
-
-static void tcp_uncork(struct sock *sk)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (tp->nonagle & TCP_NAGLE_CORK) {
- tp->nonagle &= ~TCP_NAGLE_CORK;
- chtls_tcp_push(sk, 0);
- }
-}
-
-static void chtls_close_conn(struct sock *sk)
-{
- struct cpl_close_con_req *req;
- struct chtls_sock *csk;
- struct sk_buff *skb;
- unsigned int tid;
- unsigned int len;
-
- len = roundup(sizeof(struct cpl_close_con_req), 16);
- csk = rcu_dereference_sk_user_data(sk);
- tid = csk->tid;
-
- skb = alloc_skb(len, GFP_KERNEL | __GFP_NOFAIL);
- req = (struct cpl_close_con_req *)__skb_put(skb, len);
- memset(req, 0, len);
- req->wr.wr_hi = htonl(FW_WR_OP_V(FW_TP_WR) |
- FW_WR_IMMDLEN_V(sizeof(*req) -
- sizeof(req->wr)));
- req->wr.wr_mid = htonl(FW_WR_LEN16_V(DIV_ROUND_UP(sizeof(*req), 16)) |
- FW_WR_FLOWID_V(tid));
-
- OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_CON_REQ, tid));
-
- tcp_uncork(sk);
- skb_entail(sk, skb, ULPCB_FLAG_NO_HDR | ULPCB_FLAG_NO_APPEND);
- if (sk->sk_state != TCP_SYN_SENT)
- chtls_push_frames(csk, 1);
-}
-
-/*
- * Perform a state transition during close and return the actions indicated
- * for the transition. Do not make this function inline, the main reason
- * it exists at all is to avoid multiple inlining of tcp_set_state.
- */
-static int make_close_transition(struct sock *sk)
-{
- int next = (int)new_state[sk->sk_state];
-
- tcp_set_state(sk, next & TCP_STATE_MASK);
- return next & TCP_ACTION_FIN;
-}
-
-void chtls_close(struct sock *sk, long timeout)
-{
- int data_lost, prev_state;
- struct chtls_sock *csk;
-
- csk = rcu_dereference_sk_user_data(sk);
-
- lock_sock(sk);
- sk->sk_shutdown |= SHUTDOWN_MASK;
-
- data_lost = skb_queue_len(&sk->sk_receive_queue);
- data_lost |= skb_queue_len(&csk->tlshws.sk_recv_queue);
- chtls_purge_recv_queue(sk);
- chtls_purge_receive_queue(sk);
-
- if (sk->sk_state == TCP_CLOSE) {
- goto wait;
- } else if (data_lost || sk->sk_state == TCP_SYN_SENT) {
- chtls_send_reset(sk, CPL_ABORT_SEND_RST, NULL);
- release_tcp_port(sk);
- goto unlock;
- } else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
- sk->sk_prot->disconnect(sk, 0);
- } else if (make_close_transition(sk)) {
- chtls_close_conn(sk);
- }
-wait:
- if (timeout)
- sk_stream_wait_close(sk, timeout);
-
-unlock:
- prev_state = sk->sk_state;
- sock_hold(sk);
- sock_orphan(sk);
-
- release_sock(sk);
-
- local_bh_disable();
- bh_lock_sock(sk);
-
- if (prev_state != TCP_CLOSE && sk->sk_state == TCP_CLOSE)
- goto out;
-
- if (sk->sk_state == TCP_FIN_WAIT2 && tcp_sk(sk)->linger2 < 0 &&
- !csk_flag(sk, CSK_ABORT_SHUTDOWN)) {
- struct sk_buff *skb;
-
- skb = alloc_skb(sizeof(struct cpl_abort_req), GFP_ATOMIC);
- if (skb)
- chtls_send_reset(sk, CPL_ABORT_SEND_RST, skb);
- }
-
- if (sk->sk_state == TCP_CLOSE)
- inet_csk_destroy_sock(sk);
-
-out:
- bh_unlock_sock(sk);
- local_bh_enable();
- sock_put(sk);
-}
-
-/*
- * Wait until a socket enters on of the given states.
- */
-static int wait_for_states(struct sock *sk, unsigned int states)
-{
- DECLARE_WAITQUEUE(wait, current);
- struct socket_wq _sk_wq;
- long current_timeo;
- int err = 0;
-
- current_timeo = 200;
-
- /*
- * We want this to work even when there's no associated struct socket.
- * In that case we provide a temporary wait_queue_head_t.
- */
- if (!sk->sk_wq) {
- init_waitqueue_head(&_sk_wq.wait);
- _sk_wq.fasync_list = NULL;
- init_rcu_head_on_stack(&_sk_wq.rcu);
- RCU_INIT_POINTER(sk->sk_wq, &_sk_wq);
- }
-
- add_wait_queue(sk_sleep(sk), &wait);
- while (!sk_in_state(sk, states)) {
- if (!current_timeo) {
- err = -EBUSY;
- break;
- }
- if (signal_pending(current)) {
- err = sock_intr_errno(current_timeo);
- break;
- }
- set_current_state(TASK_UNINTERRUPTIBLE);
- release_sock(sk);
- if (!sk_in_state(sk, states))
- current_timeo = schedule_timeout(current_timeo);
- __set_current_state(TASK_RUNNING);
- lock_sock(sk);
- }
- remove_wait_queue(sk_sleep(sk), &wait);
-
- if (rcu_dereference(sk->sk_wq) == &_sk_wq)
- sk->sk_wq = NULL;
- return err;
-}
-
-int chtls_disconnect(struct sock *sk, int flags)
-{
- struct tcp_sock *tp;
- int err;
-
- tp = tcp_sk(sk);
- chtls_purge_recv_queue(sk);
- chtls_purge_receive_queue(sk);
- chtls_purge_write_queue(sk);
-
- if (sk->sk_state != TCP_CLOSE) {
- sk->sk_err = ECONNRESET;
- chtls_send_reset(sk, CPL_ABORT_SEND_RST, NULL);
- err = wait_for_states(sk, TCPF_CLOSE);
- if (err)
- return err;
- }
- chtls_purge_recv_queue(sk);
- chtls_purge_receive_queue(sk);
- tp->max_window = 0xFFFF << (tp->rx_opt.snd_wscale);
- return tcp_disconnect(sk, flags);
-}
-
-#define SHUTDOWN_ELIGIBLE_STATE (TCPF_ESTABLISHED | \
- TCPF_SYN_RECV | TCPF_CLOSE_WAIT)
-void chtls_shutdown(struct sock *sk, int how)
-{
- if ((how & SEND_SHUTDOWN) &&
- sk_in_state(sk, SHUTDOWN_ELIGIBLE_STATE) &&
- make_close_transition(sk))
- chtls_close_conn(sk);
-}
-
-void chtls_destroy_sock(struct sock *sk)
-{
- struct chtls_sock *csk;
-
- csk = rcu_dereference_sk_user_data(sk);
- chtls_purge_recv_queue(sk);
- csk->ulp_mode = ULP_MODE_NONE;
- chtls_purge_write_queue(sk);
- free_tls_keyid(sk);
- kref_put(&csk->kref, chtls_sock_release);
- if (sk->sk_family == AF_INET)
- sk->sk_prot = &tcp_prot;
-#if IS_ENABLED(CONFIG_IPV6)
- else
- sk->sk_prot = &tcpv6_prot;
-#endif
- sk->sk_prot->destroy(sk);
-}
-
-static void reset_listen_child(struct sock *child)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(child);
- struct sk_buff *skb;
-
- skb = alloc_ctrl_skb(csk->txdata_skb_cache,
- sizeof(struct cpl_abort_req));
-
- chtls_send_reset(child, CPL_ABORT_SEND_RST, skb);
- sock_orphan(child);
- tcp_orphan_count_inc();
- if (child->sk_state == TCP_CLOSE)
- inet_csk_destroy_sock(child);
-}
-
-static void chtls_disconnect_acceptq(struct sock *listen_sk)
-{
- struct request_sock **pprev;
-
- pprev = ACCEPT_QUEUE(listen_sk);
- while (*pprev) {
- struct request_sock *req = *pprev;
-
- if (req->rsk_ops == &chtls_rsk_ops ||
- req->rsk_ops == &chtls_rsk_opsv6) {
- struct sock *child = req->sk;
-
- *pprev = req->dl_next;
- sk_acceptq_removed(listen_sk);
- reqsk_put(req);
- sock_hold(child);
- local_bh_disable();
- bh_lock_sock(child);
- release_tcp_port(child);
- reset_listen_child(child);
- bh_unlock_sock(child);
- local_bh_enable();
- sock_put(child);
- } else {
- pprev = &req->dl_next;
- }
- }
-}
-
-static int listen_hashfn(const struct sock *sk)
-{
- return ((unsigned long)sk >> 10) & (LISTEN_INFO_HASH_SIZE - 1);
-}
-
-static struct listen_info *listen_hash_add(struct chtls_dev *cdev,
- struct sock *sk,
- unsigned int stid)
-{
- struct listen_info *p = kmalloc_obj(*p);
-
- if (p) {
- int key = listen_hashfn(sk);
-
- p->sk = sk;
- p->stid = stid;
- spin_lock(&cdev->listen_lock);
- p->next = cdev->listen_hash_tab[key];
- cdev->listen_hash_tab[key] = p;
- spin_unlock(&cdev->listen_lock);
- }
- return p;
-}
-
-static int listen_hash_find(struct chtls_dev *cdev,
- struct sock *sk)
-{
- struct listen_info *p;
- int stid = -1;
- int key;
-
- key = listen_hashfn(sk);
-
- spin_lock(&cdev->listen_lock);
- for (p = cdev->listen_hash_tab[key]; p; p = p->next)
- if (p->sk == sk) {
- stid = p->stid;
- break;
- }
- spin_unlock(&cdev->listen_lock);
- return stid;
-}
-
-static int listen_hash_del(struct chtls_dev *cdev,
- struct sock *sk)
-{
- struct listen_info *p, **prev;
- int stid = -1;
- int key;
-
- key = listen_hashfn(sk);
- prev = &cdev->listen_hash_tab[key];
-
- spin_lock(&cdev->listen_lock);
- for (p = *prev; p; prev = &p->next, p = p->next)
- if (p->sk == sk) {
- stid = p->stid;
- *prev = p->next;
- kfree(p);
- break;
- }
- spin_unlock(&cdev->listen_lock);
- return stid;
-}
-
-static void cleanup_syn_rcv_conn(struct sock *child, struct sock *parent)
-{
- struct request_sock *req;
- struct chtls_sock *csk;
-
- csk = rcu_dereference_sk_user_data(child);
- req = csk->passive_reap_next;
-
- reqsk_queue_removed(&inet_csk(parent)->icsk_accept_queue, req);
- __skb_unlink((struct sk_buff *)&csk->synq, &csk->listen_ctx->synq);
- chtls_reqsk_free(req);
- csk->passive_reap_next = NULL;
-}
-
-static void chtls_reset_synq(struct listen_ctx *listen_ctx)
-{
- struct sock *listen_sk = listen_ctx->lsk;
-
- while (!skb_queue_empty(&listen_ctx->synq)) {
- struct chtls_sock *csk =
- container_of((struct synq *)skb_peek
- (&listen_ctx->synq), struct chtls_sock, synq);
- struct sock *child = csk->sk;
-
- cleanup_syn_rcv_conn(child, listen_sk);
- sock_hold(child);
- local_bh_disable();
- bh_lock_sock(child);
- release_tcp_port(child);
- reset_listen_child(child);
- bh_unlock_sock(child);
- local_bh_enable();
- sock_put(child);
- }
-}
-
-int chtls_listen_start(struct chtls_dev *cdev, struct sock *sk)
-{
- struct net_device *ndev;
-#if IS_ENABLED(CONFIG_IPV6)
- bool clip_valid = false;
-#endif
- struct listen_ctx *ctx;
- struct adapter *adap;
- struct port_info *pi;
- int ret = 0;
- int stid;
-
- rcu_read_lock();
- ndev = chtls_find_netdev(cdev, sk);
- rcu_read_unlock();
- if (!ndev)
- return -EBADF;
-
- pi = netdev_priv(ndev);
- adap = pi->adapter;
- if (!(adap->flags & CXGB4_FULL_INIT_DONE))
- return -EBADF;
-
- if (listen_hash_find(cdev, sk) >= 0) /* already have it */
- return -EADDRINUSE;
-
- ctx = kmalloc_obj(*ctx);
- if (!ctx)
- return -ENOMEM;
-
- __module_get(THIS_MODULE);
- ctx->lsk = sk;
- ctx->cdev = cdev;
- ctx->state = T4_LISTEN_START_PENDING;
- skb_queue_head_init(&ctx->synq);
-
- stid = cxgb4_alloc_stid(cdev->tids, sk->sk_family, ctx);
- if (stid < 0)
- goto free_ctx;
-
- sock_hold(sk);
- if (!listen_hash_add(cdev, sk, stid))
- goto free_stid;
-
- if (sk->sk_family == PF_INET) {
- ret = cxgb4_create_server(ndev, stid,
- inet_sk(sk)->inet_rcv_saddr,
- inet_sk(sk)->inet_sport, 0,
- cdev->lldi->rxq_ids[0]);
-#if IS_ENABLED(CONFIG_IPV6)
- } else {
- int addr_type;
-
- addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr);
- if (addr_type != IPV6_ADDR_ANY) {
- ret = cxgb4_clip_get(ndev, (const u32 *)
- &sk->sk_v6_rcv_saddr, 1);
- if (ret)
- goto del_hash;
- clip_valid = true;
- }
- ret = cxgb4_create_server6(ndev, stid,
- &sk->sk_v6_rcv_saddr,
- inet_sk(sk)->inet_sport,
- cdev->lldi->rxq_ids[0]);
-#endif
- }
- if (ret > 0)
- ret = net_xmit_errno(ret);
- if (ret)
- goto del_hash;
- return 0;
-del_hash:
-#if IS_ENABLED(CONFIG_IPV6)
- if (clip_valid)
- cxgb4_clip_release(ndev, (const u32 *)&sk->sk_v6_rcv_saddr, 1);
-#endif
- listen_hash_del(cdev, sk);
-free_stid:
- cxgb4_free_stid(cdev->tids, stid, sk->sk_family);
- sock_put(sk);
-free_ctx:
- kfree(ctx);
- module_put(THIS_MODULE);
- return -EBADF;
-}
-
-void chtls_listen_stop(struct chtls_dev *cdev, struct sock *sk)
-{
- struct listen_ctx *listen_ctx;
- int stid;
-
- stid = listen_hash_del(cdev, sk);
- if (stid < 0)
- return;
-
- listen_ctx = (struct listen_ctx *)lookup_stid(cdev->tids, stid);
- chtls_reset_synq(listen_ctx);
-
- cxgb4_remove_server(cdev->lldi->ports[0], stid,
- cdev->lldi->rxq_ids[0], sk->sk_family == PF_INET6);
-
-#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == PF_INET6) {
- struct net_device *ndev = chtls_find_netdev(cdev, sk);
- int addr_type = 0;
-
- addr_type = ipv6_addr_type((const struct in6_addr *)
- &sk->sk_v6_rcv_saddr);
- if (addr_type != IPV6_ADDR_ANY)
- cxgb4_clip_release(ndev, (const u32 *)
- &sk->sk_v6_rcv_saddr, 1);
- }
-#endif
- chtls_disconnect_acceptq(sk);
-}
-
-static int chtls_pass_open_rpl(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_pass_open_rpl *rpl = cplhdr(skb) + RSS_HDR;
- unsigned int stid = GET_TID(rpl);
- struct listen_ctx *listen_ctx;
-
- listen_ctx = (struct listen_ctx *)lookup_stid(cdev->tids, stid);
- if (!listen_ctx)
- return CPL_RET_BUF_DONE;
-
- if (listen_ctx->state == T4_LISTEN_START_PENDING) {
- listen_ctx->state = T4_LISTEN_STARTED;
- return CPL_RET_BUF_DONE;
- }
-
- if (rpl->status != CPL_ERR_NONE) {
- pr_info("Unexpected PASS_OPEN_RPL status %u for STID %u\n",
- rpl->status, stid);
- } else {
- cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
- sock_put(listen_ctx->lsk);
- kfree(listen_ctx);
- module_put(THIS_MODULE);
- }
- return CPL_RET_BUF_DONE;
-}
-
-static int chtls_close_listsrv_rpl(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_close_listsvr_rpl *rpl = cplhdr(skb) + RSS_HDR;
- struct listen_ctx *listen_ctx;
- unsigned int stid;
- void *data;
-
- stid = GET_TID(rpl);
- data = lookup_stid(cdev->tids, stid);
- listen_ctx = (struct listen_ctx *)data;
-
- if (rpl->status != CPL_ERR_NONE) {
- pr_info("Unexpected CLOSE_LISTSRV_RPL status %u for STID %u\n",
- rpl->status, stid);
- } else {
- cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
- sock_put(listen_ctx->lsk);
- kfree(listen_ctx);
- module_put(THIS_MODULE);
- }
- return CPL_RET_BUF_DONE;
-}
-
-static void chtls_purge_wr_queue(struct sock *sk)
-{
- struct sk_buff *skb;
-
- while ((skb = dequeue_wr(sk)) != NULL)
- kfree_skb(skb);
-}
-
-static void chtls_release_resources(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct chtls_dev *cdev = csk->cdev;
- unsigned int tid = csk->tid;
- struct tid_info *tids;
-
- if (!cdev)
- return;
-
- tids = cdev->tids;
- kfree_skb(csk->txdata_skb_cache);
- csk->txdata_skb_cache = NULL;
-
- if (csk->wr_credits != csk->wr_max_credits) {
- chtls_purge_wr_queue(sk);
- chtls_reset_wr_list(csk);
- }
-
- if (csk->l2t_entry) {
- cxgb4_l2t_release(csk->l2t_entry);
- csk->l2t_entry = NULL;
- }
-
- if (sk->sk_state != TCP_SYN_SENT) {
- cxgb4_remove_tid(tids, csk->port_id, tid, sk->sk_family);
- sock_put(sk);
- }
-}
-
-static void chtls_conn_done(struct sock *sk)
-{
- if (sock_flag(sk, SOCK_DEAD))
- chtls_purge_receive_queue(sk);
- sk_wakeup_sleepers(sk, 0);
- tcp_done(sk);
-}
-
-static void do_abort_syn_rcv(struct sock *child, struct sock *parent)
-{
- /*
- * If the server is still open we clean up the child connection,
- * otherwise the server already did the clean up as it was purging
- * its SYN queue and the skb was just sitting in its backlog.
- */
- if (likely(parent->sk_state == TCP_LISTEN)) {
- cleanup_syn_rcv_conn(child, parent);
- /* Without the below call to sock_orphan,
- * we leak the socket resource with syn_flood test
- * as inet_csk_destroy_sock will not be called
- * in tcp_done since SOCK_DEAD flag is not set.
- * Kernel handles this differently where new socket is
- * created only after 3 way handshake is done.
- */
- sock_orphan(child);
- tcp_orphan_count_inc();
- chtls_release_resources(child);
- chtls_conn_done(child);
- } else {
- if (csk_flag(child, CSK_RST_ABORTED)) {
- chtls_release_resources(child);
- chtls_conn_done(child);
- }
- }
-}
-
-static void pass_open_abort(struct sock *child, struct sock *parent,
- struct sk_buff *skb)
-{
- do_abort_syn_rcv(child, parent);
- kfree_skb(skb);
-}
-
-static void bl_pass_open_abort(struct sock *lsk, struct sk_buff *skb)
-{
- pass_open_abort(skb->sk, lsk, skb);
-}
-
-static void chtls_pass_open_arp_failure(struct sock *sk,
- struct sk_buff *skb)
-{
- const struct request_sock *oreq;
- struct chtls_sock *csk;
- struct chtls_dev *cdev;
- struct sock *parent;
- void *data;
-
- csk = rcu_dereference_sk_user_data(sk);
- cdev = csk->cdev;
-
- /*
- * If the connection is being aborted due to the parent listening
- * socket going away there's nothing to do, the ABORT_REQ will close
- * the connection.
- */
- if (csk_flag(sk, CSK_ABORT_RPL_PENDING)) {
- kfree_skb(skb);
- return;
- }
-
- oreq = csk->passive_reap_next;
- data = lookup_stid(cdev->tids, oreq->ts_recent);
- parent = ((struct listen_ctx *)data)->lsk;
-
- bh_lock_sock(parent);
- if (!sock_owned_by_user(parent)) {
- pass_open_abort(sk, parent, skb);
- } else {
- BLOG_SKB_CB(skb)->backlog_rcv = bl_pass_open_abort;
- __sk_add_backlog(parent, skb);
- }
- bh_unlock_sock(parent);
-}
-
-static void chtls_accept_rpl_arp_failure(void *handle,
- struct sk_buff *skb)
-{
- struct sock *sk = (struct sock *)handle;
-
- sock_hold(sk);
- process_cpl_msg(chtls_pass_open_arp_failure, sk, skb);
- sock_put(sk);
-}
-
-static unsigned int chtls_select_mss(const struct chtls_sock *csk,
- unsigned int pmtu,
- struct cpl_pass_accept_req *req)
-{
- struct chtls_dev *cdev;
- struct dst_entry *dst;
- unsigned int tcpoptsz;
- unsigned int iphdrsz;
- unsigned int mtu_idx;
- struct tcp_sock *tp;
- unsigned int mss;
- struct sock *sk;
- u16 user_mss;
-
- mss = ntohs(req->tcpopt.mss);
- sk = csk->sk;
- dst = __sk_dst_get(sk);
- cdev = csk->cdev;
- tp = tcp_sk(sk);
- tcpoptsz = 0;
-
-#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == AF_INET6)
- iphdrsz = sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
- else
-#endif
- iphdrsz = sizeof(struct iphdr) + sizeof(struct tcphdr);
- if (req->tcpopt.tstamp)
- tcpoptsz += round_up(TCPOLEN_TIMESTAMP, 4);
-
- tp->advmss = dst_metric_advmss(dst);
- user_mss = USER_MSS(tp);
- if (user_mss && tp->advmss > user_mss)
- tp->advmss = user_mss;
- if (tp->advmss > pmtu - iphdrsz)
- tp->advmss = pmtu - iphdrsz;
- if (mss && tp->advmss > mss)
- tp->advmss = mss;
-
- tp->advmss = cxgb4_best_aligned_mtu(cdev->lldi->mtus,
- iphdrsz + tcpoptsz,
- tp->advmss - tcpoptsz,
- 8, &mtu_idx);
- tp->advmss -= iphdrsz;
-
- inet_csk(sk)->icsk_pmtu_cookie = pmtu;
- return mtu_idx;
-}
-
-static unsigned int select_rcv_wscale(int space, int wscale_ok, int win_clamp)
-{
- int wscale = 0;
-
- if (space > MAX_RCV_WND)
- space = MAX_RCV_WND;
- if (win_clamp && win_clamp < space)
- space = win_clamp;
-
- if (wscale_ok) {
- while (wscale < 14 && (65535 << wscale) < space)
- wscale++;
- }
- return wscale;
-}
-
-static void chtls_pass_accept_rpl(struct sk_buff *skb,
- struct cpl_pass_accept_req *req,
- unsigned int tid)
-
-{
- struct cpl_t5_pass_accept_rpl *rpl5;
- struct cxgb4_lld_info *lldi;
- const struct tcphdr *tcph;
- const struct tcp_sock *tp;
- struct chtls_sock *csk;
- unsigned int len;
- struct sock *sk;
- u32 opt2, hlen;
- u64 opt0;
-
- sk = skb->sk;
- tp = tcp_sk(sk);
- csk = sk->sk_user_data;
- csk->tid = tid;
- lldi = csk->cdev->lldi;
- len = roundup(sizeof(*rpl5), 16);
-
- rpl5 = __skb_put_zero(skb, len);
- INIT_TP_WR(rpl5, tid);
-
- OPCODE_TID(rpl5) = cpu_to_be32(MK_OPCODE_TID(CPL_PASS_ACCEPT_RPL,
- csk->tid));
- csk->mtu_idx = chtls_select_mss(csk, dst_mtu(__sk_dst_get(sk)),
- req);
- opt0 = TCAM_BYPASS_F |
- WND_SCALE_V(RCV_WSCALE(tp)) |
- MSS_IDX_V(csk->mtu_idx) |
- L2T_IDX_V(csk->l2t_entry->idx) |
- NAGLE_V(!(tp->nonagle & TCP_NAGLE_OFF)) |
- TX_CHAN_V(csk->tx_chan) |
- SMAC_SEL_V(csk->smac_idx) |
- DSCP_V(csk->tos >> 2) |
- ULP_MODE_V(ULP_MODE_TLS) |
- RCV_BUFSIZ_V(min(tp->rcv_wnd >> 10, RCV_BUFSIZ_M));
-
- opt2 = RX_CHANNEL_V(0) |
- RSS_QUEUE_VALID_F | RSS_QUEUE_V(csk->rss_qid);
-
- if (!is_t5(lldi->adapter_type))
- opt2 |= RX_FC_DISABLE_F;
- if (req->tcpopt.tstamp)
- opt2 |= TSTAMPS_EN_F;
- if (req->tcpopt.sack)
- opt2 |= SACK_EN_F;
- hlen = ntohl(req->hdr_len);
-
- tcph = (struct tcphdr *)((u8 *)(req + 1) +
- T6_ETH_HDR_LEN_G(hlen) + T6_IP_HDR_LEN_G(hlen));
- if (tcph->ece && tcph->cwr)
- opt2 |= CCTRL_ECN_V(1);
- opt2 |= CONG_CNTRL_V(CONG_ALG_NEWRENO);
- opt2 |= T5_ISS_F;
- opt2 |= T5_OPT_2_VALID_F;
- opt2 |= WND_SCALE_EN_V(WSCALE_OK(tp));
- rpl5->opt0 = cpu_to_be64(opt0);
- rpl5->opt2 = cpu_to_be32(opt2);
- rpl5->iss = cpu_to_be32((get_random_u32() & ~7UL) - 1);
- set_wr_txq(skb, CPL_PRIORITY_SETUP, csk->port_id);
- t4_set_arp_err_handler(skb, sk, chtls_accept_rpl_arp_failure);
- cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
-}
-
-static void inet_inherit_port(struct sock *lsk, struct sock *newsk)
-{
- local_bh_disable();
- __inet_inherit_port(lsk, newsk);
- local_bh_enable();
-}
-
-static int chtls_backlog_rcv(struct sock *sk, struct sk_buff *skb)
-{
- if (skb->protocol) {
- kfree_skb(skb);
- return 0;
- }
- BLOG_SKB_CB(skb)->backlog_rcv(sk, skb);
- return 0;
-}
-
-static void chtls_set_tcp_window(struct chtls_sock *csk)
-{
- struct net_device *ndev = csk->egress_dev;
- struct port_info *pi = netdev_priv(ndev);
- unsigned int linkspeed;
- u8 scale;
-
- linkspeed = pi->link_cfg.speed;
- scale = linkspeed / SPEED_10000;
-#define CHTLS_10G_RCVWIN (256 * 1024)
- csk->rcv_win = CHTLS_10G_RCVWIN;
- if (scale)
- csk->rcv_win *= scale;
-#define CHTLS_10G_SNDWIN (256 * 1024)
- csk->snd_win = CHTLS_10G_SNDWIN;
- if (scale)
- csk->snd_win *= scale;
-}
-
-static struct sock *chtls_recv_sock(struct sock *lsk,
- struct request_sock *oreq,
- void *network_hdr,
- const struct cpl_pass_accept_req *req,
- struct chtls_dev *cdev)
-{
- struct adapter *adap = pci_get_drvdata(cdev->pdev);
- struct neighbour *n = NULL;
- struct inet_sock *newinet;
- const struct iphdr *iph;
- struct tls_context *ctx;
- struct net_device *ndev;
- struct chtls_sock *csk;
- struct dst_entry *dst;
- struct tcp_sock *tp;
- struct sock *newsk;
- bool found = false;
- u16 port_id;
- int rxq_idx;
- int step, i;
-
- iph = (const struct iphdr *)network_hdr;
- newsk = tcp_create_openreq_child(lsk, oreq, cdev->askb);
- if (!newsk)
- goto free_oreq;
-
- if (lsk->sk_family == AF_INET) {
- dst = inet_csk_route_child_sock(lsk, newsk, oreq);
- if (!dst)
- goto free_sk;
-
- n = dst_neigh_lookup(dst, &iph->saddr);
-#if IS_ENABLED(CONFIG_IPV6)
- } else {
- const struct ipv6hdr *ip6h;
- struct flowi6 fl6;
-
- ip6h = (const struct ipv6hdr *)network_hdr;
- memset(&fl6, 0, sizeof(fl6));
- fl6.flowi6_proto = IPPROTO_TCP;
- fl6.saddr = ip6h->daddr;
- fl6.daddr = ip6h->saddr;
- fl6.fl6_dport = inet_rsk(oreq)->ir_rmt_port;
- fl6.fl6_sport = htons(inet_rsk(oreq)->ir_num);
- security_req_classify_flow(oreq, flowi6_to_flowi_common(&fl6));
- dst = ip6_dst_lookup_flow(sock_net(lsk), lsk, &fl6, NULL);
- if (IS_ERR(dst))
- goto free_sk;
- n = dst_neigh_lookup(dst, &ip6h->saddr);
-#endif
- }
- if (!n || !n->dev)
- goto free_dst;
-
- ndev = n->dev;
- if (is_vlan_dev(ndev))
- ndev = vlan_dev_real_dev(ndev);
-
- for_each_port(adap, i)
- if (cdev->ports[i] == ndev)
- found = true;
-
- if (!found)
- goto free_dst;
-
- port_id = cxgb4_port_idx(ndev);
-
- csk = chtls_sock_create(cdev);
- if (!csk)
- goto free_dst;
-
- csk->l2t_entry = cxgb4_l2t_get(cdev->lldi->l2t, n, ndev, 0);
- if (!csk->l2t_entry)
- goto free_csk;
-
- newsk->sk_user_data = csk;
- newsk->sk_backlog_rcv = chtls_backlog_rcv;
-
- tp = tcp_sk(newsk);
- newinet = inet_sk(newsk);
-
- if (iph->version == 0x4) {
- newinet->inet_daddr = iph->saddr;
- newinet->inet_rcv_saddr = iph->daddr;
- newinet->inet_saddr = iph->daddr;
-#if IS_ENABLED(CONFIG_IPV6)
- } else {
- struct tcp6_sock *newtcp6sk = (struct tcp6_sock *)newsk;
- struct inet_request_sock *treq = inet_rsk(oreq);
- struct ipv6_pinfo *newnp = inet6_sk(newsk);
- struct ipv6_pinfo *np = inet6_sk(lsk);
-
- newinet->pinet6 = &newtcp6sk->inet6;
- newinet->ipv6_fl_list = NULL;
- memcpy(newnp, np, sizeof(struct ipv6_pinfo));
- newsk->sk_v6_daddr = treq->ir_v6_rmt_addr;
- newsk->sk_v6_rcv_saddr = treq->ir_v6_loc_addr;
- inet6_sk(newsk)->saddr = treq->ir_v6_loc_addr;
- newnp->pktoptions = NULL;
- newsk->sk_bound_dev_if = treq->ir_iif;
- newinet->inet_opt = NULL;
- newinet->inet_daddr = LOOPBACK4_IPV6;
- newinet->inet_saddr = LOOPBACK4_IPV6;
-#endif
- }
-
- oreq->ts_recent = PASS_OPEN_TID_G(ntohl(req->tos_stid));
- sk_setup_caps(newsk, dst);
- ctx = tls_get_ctx(lsk);
- newsk->sk_destruct = ctx->sk_destruct;
- newsk->sk_prot_creator = lsk->sk_prot_creator;
- csk->sk = newsk;
- csk->passive_reap_next = oreq;
- csk->tx_chan = cxgb4_port_chan(ndev);
- csk->port_id = port_id;
- csk->egress_dev = ndev;
- csk->tos = PASS_OPEN_TOS_G(ntohl(req->tos_stid));
- chtls_set_tcp_window(csk);
- tp->rcv_wnd = csk->rcv_win;
- csk->sndbuf = csk->snd_win;
- csk->ulp_mode = ULP_MODE_TLS;
- step = cdev->lldi->nrxq / cdev->lldi->nchan;
- rxq_idx = port_id * step;
- rxq_idx += cdev->round_robin_cnt++ % step;
- csk->rss_qid = cdev->lldi->rxq_ids[rxq_idx];
- csk->txq_idx = (rxq_idx < cdev->lldi->ntxq) ? rxq_idx :
- port_id * step;
- csk->sndbuf = newsk->sk_sndbuf;
- csk->smac_idx = ((struct port_info *)netdev_priv(ndev))->smt_idx;
- RCV_WSCALE(tp) = select_rcv_wscale(tcp_full_space(newsk),
- READ_ONCE(sock_net(newsk)->
- ipv4.sysctl_tcp_window_scaling),
- tp->window_clamp);
- neigh_release(n);
- inet_inherit_port(lsk, newsk);
- csk_set_flag(csk, CSK_CONN_INLINE);
- bh_unlock_sock(newsk); /* tcp_create_openreq_child ->sk_clone_lock */
-
- return newsk;
-free_csk:
- chtls_sock_release(&csk->kref);
-free_dst:
- if (n)
- neigh_release(n);
- dst_release(dst);
-free_sk:
- inet_csk_prepare_forced_close(newsk);
- tcp_done(newsk);
-free_oreq:
- chtls_reqsk_free(oreq);
- return NULL;
-}
-
-/*
- * Populate a TID_RELEASE WR. The skb must be already propely sized.
- */
-static void mk_tid_release(struct sk_buff *skb,
- unsigned int chan, unsigned int tid)
-{
- struct cpl_tid_release *req;
- unsigned int len;
-
- len = roundup(sizeof(struct cpl_tid_release), 16);
- req = (struct cpl_tid_release *)__skb_put(skb, len);
- memset(req, 0, len);
- set_wr_txq(skb, CPL_PRIORITY_SETUP, chan);
- INIT_TP_WR_CPL(req, CPL_TID_RELEASE, tid);
-}
-
-static int chtls_get_module(struct sock *sk)
-{
- struct inet_connection_sock *icsk = inet_csk(sk);
-
- if (!try_module_get(icsk->icsk_ulp_ops->owner))
- return -1;
-
- return 0;
-}
-
-static void chtls_pass_accept_request(struct sock *sk,
- struct sk_buff *skb)
-{
- struct cpl_t5_pass_accept_rpl *rpl;
- struct cpl_pass_accept_req *req;
- struct listen_ctx *listen_ctx;
- struct vlan_ethhdr *vlan_eh;
- struct request_sock *oreq;
- struct sk_buff *reply_skb;
- struct chtls_sock *csk;
- struct chtls_dev *cdev;
- struct ipv6hdr *ip6h;
- struct tcphdr *tcph;
- struct sock *newsk;
- struct ethhdr *eh;
- struct iphdr *iph;
- void *network_hdr;
- unsigned int stid;
- unsigned int len;
- unsigned int tid;
- bool th_ecn, ect;
- __u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */
- u16 eth_hdr_len;
- bool ecn_ok;
-
- req = cplhdr(skb) + RSS_HDR;
- tid = GET_TID(req);
- cdev = BLOG_SKB_CB(skb)->cdev;
- newsk = lookup_tid(cdev->tids, tid);
- stid = PASS_OPEN_TID_G(ntohl(req->tos_stid));
- if (newsk) {
- pr_info("tid (%d) already in use\n", tid);
- return;
- }
-
- len = roundup(sizeof(*rpl), 16);
- reply_skb = alloc_skb(len, GFP_ATOMIC);
- if (!reply_skb) {
- cxgb4_remove_tid(cdev->tids, 0, tid, sk->sk_family);
- kfree_skb(skb);
- return;
- }
-
- if (sk->sk_state != TCP_LISTEN)
- goto reject;
-
- if (inet_csk_reqsk_queue_is_full(sk))
- goto reject;
-
- if (sk_acceptq_is_full(sk))
- goto reject;
-
-
- eth_hdr_len = T6_ETH_HDR_LEN_G(ntohl(req->hdr_len));
- if (eth_hdr_len == ETH_HLEN) {
- eh = (struct ethhdr *)(req + 1);
- iph = (struct iphdr *)(eh + 1);
- ip6h = (struct ipv6hdr *)(eh + 1);
- network_hdr = (void *)(eh + 1);
- } else {
- vlan_eh = (struct vlan_ethhdr *)(req + 1);
- iph = (struct iphdr *)(vlan_eh + 1);
- ip6h = (struct ipv6hdr *)(vlan_eh + 1);
- network_hdr = (void *)(vlan_eh + 1);
- }
-
- if (iph->version == 0x4) {
- tcph = (struct tcphdr *)(iph + 1);
- skb_set_network_header(skb, (void *)iph - (void *)req);
- oreq = inet_reqsk_alloc(&chtls_rsk_ops, sk, true);
- } else {
- tcph = (struct tcphdr *)(ip6h + 1);
- skb_set_network_header(skb, (void *)ip6h - (void *)req);
- oreq = inet_reqsk_alloc(&chtls_rsk_opsv6, sk, false);
- }
-
- if (!oreq)
- goto reject;
-
- oreq->rsk_rcv_wnd = 0;
- oreq->rsk_window_clamp = 0;
- oreq->syncookie = 0;
- oreq->mss = 0;
- oreq->ts_recent = 0;
-
- tcp_rsk(oreq)->tfo_listener = false;
- tcp_rsk(oreq)->rcv_isn = ntohl(tcph->seq);
- chtls_set_req_port(oreq, tcph->source, tcph->dest);
- if (iph->version == 0x4) {
- chtls_set_req_addr(oreq, iph->daddr, iph->saddr);
- ip_dsfield = ipv4_get_dsfield(iph);
-#if IS_ENABLED(CONFIG_IPV6)
- } else {
- inet_rsk(oreq)->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr;
- inet_rsk(oreq)->ir_v6_loc_addr = ipv6_hdr(skb)->daddr;
- ip_dsfield = ipv6_get_dsfield(ipv6_hdr(skb));
-#endif
- }
- if (req->tcpopt.wsf <= 14 &&
- READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_window_scaling)) {
- inet_rsk(oreq)->wscale_ok = 1;
- inet_rsk(oreq)->snd_wscale = req->tcpopt.wsf;
- }
- inet_rsk(oreq)->ir_iif = sk->sk_bound_dev_if;
- th_ecn = tcph->ece && tcph->cwr;
- if (th_ecn) {
- ect = !INET_ECN_is_not_ect(ip_dsfield);
- ecn_ok = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn);
- if ((!ect && ecn_ok) || tcp_ca_needs_ecn(sk))
- inet_rsk(oreq)->ecn_ok = 1;
- }
-
- newsk = chtls_recv_sock(sk, oreq, network_hdr, req, cdev);
- if (!newsk)
- goto reject;
-
- if (chtls_get_module(newsk))
- goto reject;
- inet_csk_reqsk_queue_added(sk);
- reply_skb->sk = newsk;
- chtls_install_cpl_ops(newsk);
- cxgb4_insert_tid(cdev->tids, newsk, tid, newsk->sk_family);
- csk = rcu_dereference_sk_user_data(newsk);
- listen_ctx = (struct listen_ctx *)lookup_stid(cdev->tids, stid);
- csk->listen_ctx = listen_ctx;
- __skb_queue_tail(&listen_ctx->synq, (struct sk_buff *)&csk->synq);
- chtls_pass_accept_rpl(reply_skb, req, tid);
- kfree_skb(skb);
- return;
-
-reject:
- mk_tid_release(reply_skb, 0, tid);
- cxgb4_ofld_send(cdev->lldi->ports[0], reply_skb);
- kfree_skb(skb);
-}
-
-/*
- * Handle a CPL_PASS_ACCEPT_REQ message.
- */
-static int chtls_pass_accept_req(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_pass_accept_req *req = cplhdr(skb) + RSS_HDR;
- struct listen_ctx *ctx;
- unsigned int stid;
- unsigned int tid;
- struct sock *lsk;
- void *data;
-
- stid = PASS_OPEN_TID_G(ntohl(req->tos_stid));
- tid = GET_TID(req);
-
- data = lookup_stid(cdev->tids, stid);
- if (!data)
- return 1;
-
- ctx = (struct listen_ctx *)data;
- lsk = ctx->lsk;
-
- if (unlikely(tid_out_of_range(cdev->tids, tid))) {
- pr_info("passive open TID %u too large\n", tid);
- return 1;
- }
-
- BLOG_SKB_CB(skb)->cdev = cdev;
- process_cpl_msg(chtls_pass_accept_request, lsk, skb);
- return 0;
-}
-
-/*
- * Completes some final bits of initialization for just established connections
- * and changes their state to TCP_ESTABLISHED.
- *
- * snd_isn here is the ISN after the SYN, i.e., the true ISN + 1.
- */
-static void make_established(struct sock *sk, u32 snd_isn, unsigned int opt)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- tp->pushed_seq = snd_isn;
- tp->write_seq = snd_isn;
- tp->snd_nxt = snd_isn;
- tp->snd_una = snd_isn;
- atomic_set(&inet_sk(sk)->inet_id, get_random_u16());
- assign_rxopt(sk, opt);
-
- if (tp->rcv_wnd > (RCV_BUFSIZ_M << 10))
- tp->rcv_wup -= tp->rcv_wnd - (RCV_BUFSIZ_M << 10);
-
- smp_mb();
- tcp_set_state(sk, TCP_ESTABLISHED);
-}
-
-static void chtls_abort_conn(struct sock *sk, struct sk_buff *skb)
-{
- struct sk_buff *abort_skb;
-
- abort_skb = alloc_skb(sizeof(struct cpl_abort_req), GFP_ATOMIC);
- if (abort_skb)
- chtls_send_reset(sk, CPL_ABORT_SEND_RST, abort_skb);
-}
-
-static struct sock *reap_list;
-static DEFINE_SPINLOCK(reap_list_lock);
-
-/*
- * Process the reap list.
- */
-DECLARE_TASK_FUNC(process_reap_list, task_param)
-{
- spin_lock_bh(&reap_list_lock);
- while (reap_list) {
- struct sock *sk = reap_list;
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- reap_list = csk->passive_reap_next;
- csk->passive_reap_next = NULL;
- spin_unlock(&reap_list_lock);
- sock_hold(sk);
-
- bh_lock_sock(sk);
- chtls_abort_conn(sk, NULL);
- sock_orphan(sk);
- if (sk->sk_state == TCP_CLOSE)
- inet_csk_destroy_sock(sk);
- bh_unlock_sock(sk);
- sock_put(sk);
- spin_lock(&reap_list_lock);
- }
- spin_unlock_bh(&reap_list_lock);
-}
-
-static DECLARE_WORK(reap_task, process_reap_list);
-
-static void add_to_reap_list(struct sock *sk)
-{
- struct chtls_sock *csk = sk->sk_user_data;
-
- local_bh_disable();
- release_tcp_port(sk); /* release the port immediately */
-
- spin_lock(&reap_list_lock);
- csk->passive_reap_next = reap_list;
- reap_list = sk;
- if (!csk->passive_reap_next)
- schedule_work(&reap_task);
- spin_unlock(&reap_list_lock);
- local_bh_enable();
-}
-
-static void add_pass_open_to_parent(struct sock *child, struct sock *lsk,
- struct chtls_dev *cdev)
-{
- struct request_sock *oreq;
- struct chtls_sock *csk;
-
- if (lsk->sk_state != TCP_LISTEN)
- return;
-
- csk = child->sk_user_data;
- oreq = csk->passive_reap_next;
- csk->passive_reap_next = NULL;
-
- reqsk_queue_removed(&inet_csk(lsk)->icsk_accept_queue, oreq);
- __skb_unlink((struct sk_buff *)&csk->synq, &csk->listen_ctx->synq);
-
- if (sk_acceptq_is_full(lsk)) {
- chtls_reqsk_free(oreq);
- add_to_reap_list(child);
- } else {
- refcount_set(&oreq->rsk_refcnt, 1);
- inet_csk_reqsk_queue_add(lsk, oreq, child);
- lsk->sk_data_ready(lsk);
- }
-}
-
-static void bl_add_pass_open_to_parent(struct sock *lsk, struct sk_buff *skb)
-{
- struct sock *child = skb->sk;
-
- skb->sk = NULL;
- add_pass_open_to_parent(child, lsk, BLOG_SKB_CB(skb)->cdev);
- kfree_skb(skb);
-}
-
-static int chtls_pass_establish(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_pass_establish *req = cplhdr(skb) + RSS_HDR;
- struct chtls_sock *csk;
- struct sock *lsk, *sk;
- unsigned int hwtid;
-
- hwtid = GET_TID(req);
- sk = lookup_tid(cdev->tids, hwtid);
- if (!sk)
- return (CPL_RET_UNKNOWN_TID | CPL_RET_BUF_DONE);
-
- bh_lock_sock(sk);
- if (unlikely(sock_owned_by_user(sk))) {
- kfree_skb(skb);
- } else {
- unsigned int stid;
- void *data;
-
- csk = sk->sk_user_data;
- csk->wr_max_credits = 64;
- csk->wr_credits = 64;
- csk->wr_unacked = 0;
- make_established(sk, ntohl(req->snd_isn), ntohs(req->tcp_opt));
- stid = PASS_OPEN_TID_G(ntohl(req->tos_stid));
- sk->sk_state_change(sk);
- if (unlikely(sk->sk_socket))
- sk_wake_async(sk, 0, POLL_OUT);
-
- data = lookup_stid(cdev->tids, stid);
- if (!data) {
- /* listening server close */
- kfree_skb(skb);
- goto unlock;
- }
- lsk = ((struct listen_ctx *)data)->lsk;
-
- bh_lock_sock(lsk);
- if (unlikely(skb_queue_empty(&csk->listen_ctx->synq))) {
- /* removed from synq */
- bh_unlock_sock(lsk);
- kfree_skb(skb);
- goto unlock;
- }
-
- if (likely(!sock_owned_by_user(lsk))) {
- kfree_skb(skb);
- add_pass_open_to_parent(sk, lsk, cdev);
- } else {
- skb->sk = sk;
- BLOG_SKB_CB(skb)->cdev = cdev;
- BLOG_SKB_CB(skb)->backlog_rcv =
- bl_add_pass_open_to_parent;
- __sk_add_backlog(lsk, skb);
- }
- bh_unlock_sock(lsk);
- }
-unlock:
- bh_unlock_sock(sk);
- return 0;
-}
-
-/*
- * Handle receipt of an urgent pointer.
- */
-static void handle_urg_ptr(struct sock *sk, u32 urg_seq)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- urg_seq--;
- if (tp->urg_data && !after(urg_seq, tp->urg_seq))
- return; /* duplicate pointer */
-
- sk_send_sigurg(sk);
- if (tp->urg_seq == tp->copied_seq && tp->urg_data &&
- !sock_flag(sk, SOCK_URGINLINE) &&
- tp->copied_seq != tp->rcv_nxt) {
- struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);
-
- tp->copied_seq++;
- if (skb && tp->copied_seq - ULP_SKB_CB(skb)->seq >= skb->len)
- chtls_free_skb(sk, skb);
- }
-
- tp->urg_data = TCP_URG_NOTYET;
- tp->urg_seq = urg_seq;
-}
-
-static void check_sk_callbacks(struct chtls_sock *csk)
-{
- struct sock *sk = csk->sk;
-
- if (unlikely(sk->sk_user_data &&
- !csk_flag_nochk(csk, CSK_CALLBACKS_CHKD)))
- csk_set_flag(csk, CSK_CALLBACKS_CHKD);
-}
-
-/*
- * Handles Rx data that arrives in a state where the socket isn't accepting
- * new data.
- */
-static void handle_excess_rx(struct sock *sk, struct sk_buff *skb)
-{
- if (!csk_flag(sk, CSK_ABORT_SHUTDOWN))
- chtls_abort_conn(sk, skb);
-
- kfree_skb(skb);
-}
-
-static void chtls_recv_data(struct sock *sk, struct sk_buff *skb)
-{
- struct cpl_rx_data *hdr = cplhdr(skb) + RSS_HDR;
- struct chtls_sock *csk;
- struct tcp_sock *tp;
-
- csk = rcu_dereference_sk_user_data(sk);
- tp = tcp_sk(sk);
-
- if (unlikely(sk->sk_shutdown & RCV_SHUTDOWN)) {
- handle_excess_rx(sk, skb);
- return;
- }
-
- ULP_SKB_CB(skb)->seq = ntohl(hdr->seq);
- ULP_SKB_CB(skb)->psh = hdr->psh;
- skb_ulp_mode(skb) = ULP_MODE_NONE;
-
- skb_reset_transport_header(skb);
- __skb_pull(skb, sizeof(*hdr) + RSS_HDR);
- if (!skb->data_len)
- __skb_trim(skb, ntohs(hdr->len));
-
- if (unlikely(hdr->urg))
- handle_urg_ptr(sk, tp->rcv_nxt + ntohs(hdr->urg));
- if (unlikely(tp->urg_data == TCP_URG_NOTYET &&
- tp->urg_seq - tp->rcv_nxt < skb->len))
- tp->urg_data = TCP_URG_VALID |
- skb->data[tp->urg_seq - tp->rcv_nxt];
-
- if (unlikely(hdr->dack_mode != csk->delack_mode)) {
- csk->delack_mode = hdr->dack_mode;
- csk->delack_seq = tp->rcv_nxt;
- }
-
- tcp_hdr(skb)->fin = 0;
- tp->rcv_nxt += skb->len;
-
- __skb_queue_tail(&sk->sk_receive_queue, skb);
-
- if (!sock_flag(sk, SOCK_DEAD)) {
- check_sk_callbacks(csk);
- sk->sk_data_ready(sk);
- }
-}
-
-static int chtls_rx_data(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_rx_data *req = cplhdr(skb) + RSS_HDR;
- unsigned int hwtid = GET_TID(req);
- struct sock *sk;
-
- sk = lookup_tid(cdev->tids, hwtid);
- if (unlikely(!sk)) {
- pr_err("can't find conn. for hwtid %u.\n", hwtid);
- return -EINVAL;
- }
- skb_dstref_steal(skb);
- process_cpl_msg(chtls_recv_data, sk, skb);
- return 0;
-}
-
-static void chtls_recv_pdu(struct sock *sk, struct sk_buff *skb)
-{
- struct cpl_tls_data *hdr = cplhdr(skb);
- struct chtls_sock *csk;
- struct chtls_hws *tlsk;
- struct tcp_sock *tp;
-
- csk = rcu_dereference_sk_user_data(sk);
- tlsk = &csk->tlshws;
- tp = tcp_sk(sk);
-
- if (unlikely(sk->sk_shutdown & RCV_SHUTDOWN)) {
- handle_excess_rx(sk, skb);
- return;
- }
-
- ULP_SKB_CB(skb)->seq = ntohl(hdr->seq);
- ULP_SKB_CB(skb)->flags = 0;
- skb_ulp_mode(skb) = ULP_MODE_TLS;
-
- skb_reset_transport_header(skb);
- __skb_pull(skb, sizeof(*hdr));
- if (!skb->data_len)
- __skb_trim(skb,
- CPL_TLS_DATA_LENGTH_G(ntohl(hdr->length_pkd)));
-
- if (unlikely(tp->urg_data == TCP_URG_NOTYET && tp->urg_seq -
- tp->rcv_nxt < skb->len))
- tp->urg_data = TCP_URG_VALID |
- skb->data[tp->urg_seq - tp->rcv_nxt];
-
- tcp_hdr(skb)->fin = 0;
- tlsk->pldlen = CPL_TLS_DATA_LENGTH_G(ntohl(hdr->length_pkd));
- __skb_queue_tail(&tlsk->sk_recv_queue, skb);
-}
-
-static int chtls_rx_pdu(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_tls_data *req = cplhdr(skb);
- unsigned int hwtid = GET_TID(req);
- struct sock *sk;
-
- sk = lookup_tid(cdev->tids, hwtid);
- if (unlikely(!sk)) {
- pr_err("can't find conn. for hwtid %u.\n", hwtid);
- return -EINVAL;
- }
- skb_dstref_steal(skb);
- process_cpl_msg(chtls_recv_pdu, sk, skb);
- return 0;
-}
-
-static void chtls_set_hdrlen(struct sk_buff *skb, unsigned int nlen)
-{
- struct tlsrx_cmp_hdr *tls_cmp_hdr = cplhdr(skb);
-
- skb->hdr_len = ntohs((__force __be16)tls_cmp_hdr->length);
- tls_cmp_hdr->length = ntohs((__force __be16)nlen);
-}
-
-static void chtls_rx_hdr(struct sock *sk, struct sk_buff *skb)
-{
- struct tlsrx_cmp_hdr *tls_hdr_pkt;
- struct cpl_rx_tls_cmp *cmp_cpl;
- struct sk_buff *skb_rec;
- struct chtls_sock *csk;
- struct chtls_hws *tlsk;
- struct tcp_sock *tp;
-
- cmp_cpl = cplhdr(skb);
- csk = rcu_dereference_sk_user_data(sk);
- tlsk = &csk->tlshws;
- tp = tcp_sk(sk);
-
- ULP_SKB_CB(skb)->seq = ntohl(cmp_cpl->seq);
- ULP_SKB_CB(skb)->flags = 0;
-
- skb_reset_transport_header(skb);
- __skb_pull(skb, sizeof(*cmp_cpl));
- tls_hdr_pkt = (struct tlsrx_cmp_hdr *)skb->data;
- if (tls_hdr_pkt->res_to_mac_error & TLSRX_HDR_PKT_ERROR_M)
- tls_hdr_pkt->type = CONTENT_TYPE_ERROR;
- if (!skb->data_len)
- __skb_trim(skb, TLS_HEADER_LENGTH);
-
- tp->rcv_nxt +=
- CPL_RX_TLS_CMP_PDULENGTH_G(ntohl(cmp_cpl->pdulength_length));
-
- ULP_SKB_CB(skb)->flags |= ULPCB_FLAG_TLS_HDR;
- skb_rec = __skb_dequeue(&tlsk->sk_recv_queue);
- if (!skb_rec) {
- __skb_queue_tail(&sk->sk_receive_queue, skb);
- } else {
- chtls_set_hdrlen(skb, tlsk->pldlen);
- tlsk->pldlen = 0;
- __skb_queue_tail(&sk->sk_receive_queue, skb);
- __skb_queue_tail(&sk->sk_receive_queue, skb_rec);
- }
-
- if (!sock_flag(sk, SOCK_DEAD)) {
- check_sk_callbacks(csk);
- sk->sk_data_ready(sk);
- }
-}
-
-static int chtls_rx_cmp(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_rx_tls_cmp *req = cplhdr(skb);
- unsigned int hwtid = GET_TID(req);
- struct sock *sk;
-
- sk = lookup_tid(cdev->tids, hwtid);
- if (unlikely(!sk)) {
- pr_err("can't find conn. for hwtid %u.\n", hwtid);
- return -EINVAL;
- }
- skb_dstref_steal(skb);
- process_cpl_msg(chtls_rx_hdr, sk, skb);
-
- return 0;
-}
-
-static void chtls_timewait(struct sock *sk)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- tp->rcv_nxt++;
- tp->rx_opt.ts_recent_stamp = ktime_get_seconds();
- tp->srtt_us = 0;
- tcp_time_wait(sk, TCP_TIME_WAIT, 0);
-}
-
-static void chtls_peer_close(struct sock *sk, struct sk_buff *skb)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- if (csk_flag_nochk(csk, CSK_ABORT_RPL_PENDING))
- goto out;
-
- sk->sk_shutdown |= RCV_SHUTDOWN;
- sock_set_flag(sk, SOCK_DONE);
-
- switch (sk->sk_state) {
- case TCP_SYN_RECV:
- case TCP_ESTABLISHED:
- tcp_set_state(sk, TCP_CLOSE_WAIT);
- break;
- case TCP_FIN_WAIT1:
- tcp_set_state(sk, TCP_CLOSING);
- break;
- case TCP_FIN_WAIT2:
- chtls_release_resources(sk);
- if (csk_flag_nochk(csk, CSK_ABORT_RPL_PENDING))
- chtls_conn_done(sk);
- else
- chtls_timewait(sk);
- break;
- default:
- pr_info("cpl_peer_close in bad state %d\n", sk->sk_state);
- }
-
- if (!sock_flag(sk, SOCK_DEAD)) {
- sk->sk_state_change(sk);
- /* Do not send POLL_HUP for half duplex close. */
-
- if ((sk->sk_shutdown & SEND_SHUTDOWN) ||
- sk->sk_state == TCP_CLOSE)
- sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_HUP);
- else
- sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
- }
-out:
- kfree_skb(skb);
-}
-
-static void chtls_close_con_rpl(struct sock *sk, struct sk_buff *skb)
-{
- struct cpl_close_con_rpl *rpl = cplhdr(skb) + RSS_HDR;
- struct chtls_sock *csk;
- struct tcp_sock *tp;
-
- csk = rcu_dereference_sk_user_data(sk);
-
- if (csk_flag_nochk(csk, CSK_ABORT_RPL_PENDING))
- goto out;
-
- tp = tcp_sk(sk);
-
- tp->snd_una = ntohl(rpl->snd_nxt) - 1; /* exclude FIN */
-
- switch (sk->sk_state) {
- case TCP_CLOSING:
- chtls_release_resources(sk);
- if (csk_flag_nochk(csk, CSK_ABORT_RPL_PENDING))
- chtls_conn_done(sk);
- else
- chtls_timewait(sk);
- break;
- case TCP_LAST_ACK:
- chtls_release_resources(sk);
- chtls_conn_done(sk);
- break;
- case TCP_FIN_WAIT1:
- tcp_set_state(sk, TCP_FIN_WAIT2);
- sk->sk_shutdown |= SEND_SHUTDOWN;
-
- if (!sock_flag(sk, SOCK_DEAD))
- sk->sk_state_change(sk);
- else if (tcp_sk(sk)->linger2 < 0 &&
- !csk_flag_nochk(csk, CSK_ABORT_SHUTDOWN))
- chtls_abort_conn(sk, skb);
- else if (csk_flag_nochk(csk, CSK_TX_DATA_SENT))
- chtls_set_quiesce_ctrl(sk, 0);
- break;
- default:
- pr_info("close_con_rpl in bad state %d\n", sk->sk_state);
- }
-out:
- kfree_skb(skb);
-}
-
-static struct sk_buff *get_cpl_skb(struct sk_buff *skb,
- size_t len, gfp_t gfp)
-{
- if (likely(!skb_is_nonlinear(skb) && !skb_cloned(skb))) {
- WARN_ONCE(skb->len < len, "skb alloc error");
- __skb_trim(skb, len);
- skb_get(skb);
- } else {
- skb = alloc_skb(len, gfp);
- if (skb)
- __skb_put(skb, len);
- }
- return skb;
-}
-
-static void set_abort_rpl_wr(struct sk_buff *skb, unsigned int tid,
- int cmd)
-{
- struct cpl_abort_rpl *rpl = cplhdr(skb);
-
- INIT_TP_WR_CPL(rpl, CPL_ABORT_RPL, tid);
- rpl->cmd = cmd;
-}
-
-static void send_defer_abort_rpl(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_abort_req_rss *req = cplhdr(skb);
- struct sk_buff *reply_skb;
-
- reply_skb = alloc_skb(sizeof(struct cpl_abort_rpl),
- GFP_KERNEL | __GFP_NOFAIL);
- __skb_put(reply_skb, sizeof(struct cpl_abort_rpl));
- set_abort_rpl_wr(reply_skb, GET_TID(req),
- (req->status & CPL_ABORT_NO_RST));
- set_wr_txq(reply_skb, CPL_PRIORITY_DATA, req->status >> 1);
- cxgb4_ofld_send(cdev->lldi->ports[0], reply_skb);
- kfree_skb(skb);
-}
-
-/*
- * Add an skb to the deferred skb queue for processing from process context.
- */
-static void t4_defer_reply(struct sk_buff *skb, struct chtls_dev *cdev,
- defer_handler_t handler)
-{
- DEFERRED_SKB_CB(skb)->handler = handler;
- spin_lock_bh(&cdev->deferq.lock);
- __skb_queue_tail(&cdev->deferq, skb);
- if (skb_queue_len(&cdev->deferq) == 1)
- schedule_work(&cdev->deferq_task);
- spin_unlock_bh(&cdev->deferq.lock);
-}
-
-static void chtls_send_abort_rpl(struct sock *sk, struct sk_buff *skb,
- struct chtls_dev *cdev,
- int status, int queue)
-{
- struct cpl_abort_req_rss *req = cplhdr(skb) + RSS_HDR;
- struct sk_buff *reply_skb;
- struct chtls_sock *csk;
- unsigned int tid;
-
- csk = rcu_dereference_sk_user_data(sk);
- tid = GET_TID(req);
-
- reply_skb = get_cpl_skb(skb, sizeof(struct cpl_abort_rpl), gfp_any());
- if (!reply_skb) {
- req->status = (queue << 1) | status;
- t4_defer_reply(skb, cdev, send_defer_abort_rpl);
- return;
- }
-
- set_abort_rpl_wr(reply_skb, tid, status);
- kfree_skb(skb);
- set_wr_txq(reply_skb, CPL_PRIORITY_DATA, queue);
- if (csk_conn_inline(csk)) {
- struct l2t_entry *e = csk->l2t_entry;
-
- if (e && sk->sk_state != TCP_SYN_RECV) {
- cxgb4_l2t_send(csk->egress_dev, reply_skb, e);
- return;
- }
- }
- cxgb4_ofld_send(cdev->lldi->ports[0], reply_skb);
-}
-
-/*
- * This is run from a listener's backlog to abort a child connection in
- * SYN_RCV state (i.e., one on the listener's SYN queue).
- */
-static void bl_abort_syn_rcv(struct sock *lsk, struct sk_buff *skb)
-{
- struct chtls_sock *csk;
- struct sock *child;
- int queue;
-
- child = skb->sk;
- csk = rcu_dereference_sk_user_data(child);
- queue = csk->txq_idx;
-
- skb->sk = NULL;
- chtls_send_abort_rpl(child, skb, BLOG_SKB_CB(skb)->cdev,
- CPL_ABORT_NO_RST, queue);
- do_abort_syn_rcv(child, lsk);
-}
-
-static int abort_syn_rcv(struct sock *sk, struct sk_buff *skb)
-{
- const struct request_sock *oreq;
- struct listen_ctx *listen_ctx;
- struct chtls_sock *csk;
- struct chtls_dev *cdev;
- struct sock *psk;
- void *ctx;
-
- csk = sk->sk_user_data;
- oreq = csk->passive_reap_next;
- cdev = csk->cdev;
-
- if (!oreq)
- return -1;
-
- ctx = lookup_stid(cdev->tids, oreq->ts_recent);
- if (!ctx)
- return -1;
-
- listen_ctx = (struct listen_ctx *)ctx;
- psk = listen_ctx->lsk;
-
- bh_lock_sock(psk);
- if (!sock_owned_by_user(psk)) {
- int queue = csk->txq_idx;
-
- chtls_send_abort_rpl(sk, skb, cdev, CPL_ABORT_NO_RST, queue);
- do_abort_syn_rcv(sk, psk);
- } else {
- skb->sk = sk;
- BLOG_SKB_CB(skb)->backlog_rcv = bl_abort_syn_rcv;
- __sk_add_backlog(psk, skb);
- }
- bh_unlock_sock(psk);
- return 0;
-}
-
-static void chtls_abort_req_rss(struct sock *sk, struct sk_buff *skb)
-{
- const struct cpl_abort_req_rss *req = cplhdr(skb) + RSS_HDR;
- struct chtls_sock *csk = sk->sk_user_data;
- int rst_status = CPL_ABORT_NO_RST;
- int queue = csk->txq_idx;
-
- if (is_neg_adv(req->status)) {
- kfree_skb(skb);
- return;
- }
-
- csk_reset_flag(csk, CSK_ABORT_REQ_RCVD);
-
- if (!csk_flag_nochk(csk, CSK_ABORT_SHUTDOWN) &&
- !csk_flag_nochk(csk, CSK_TX_DATA_SENT)) {
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (send_tx_flowc_wr(sk, 0, tp->snd_nxt, tp->rcv_nxt) < 0)
- WARN_ONCE(1, "send_tx_flowc error");
- csk_set_flag(csk, CSK_TX_DATA_SENT);
- }
-
- csk_set_flag(csk, CSK_ABORT_SHUTDOWN);
-
- if (!csk_flag_nochk(csk, CSK_ABORT_RPL_PENDING)) {
- sk->sk_err = ETIMEDOUT;
-
- if (!sock_flag(sk, SOCK_DEAD))
- sk_error_report(sk);
-
- if (sk->sk_state == TCP_SYN_RECV && !abort_syn_rcv(sk, skb))
- return;
-
- }
-
- chtls_send_abort_rpl(sk, skb, BLOG_SKB_CB(skb)->cdev,
- rst_status, queue);
- chtls_release_resources(sk);
- chtls_conn_done(sk);
-}
-
-static void chtls_abort_rpl_rss(struct sock *sk, struct sk_buff *skb)
-{
- struct cpl_abort_rpl_rss *rpl = cplhdr(skb) + RSS_HDR;
- struct chtls_sock *csk;
- struct chtls_dev *cdev;
-
- csk = rcu_dereference_sk_user_data(sk);
- cdev = csk->cdev;
-
- if (csk_flag_nochk(csk, CSK_ABORT_RPL_PENDING)) {
- csk_reset_flag(csk, CSK_ABORT_RPL_PENDING);
- if (!csk_flag_nochk(csk, CSK_ABORT_REQ_RCVD)) {
- if (sk->sk_state == TCP_SYN_SENT) {
- cxgb4_remove_tid(cdev->tids,
- csk->port_id,
- GET_TID(rpl),
- sk->sk_family);
- sock_put(sk);
- }
- chtls_release_resources(sk);
- chtls_conn_done(sk);
- }
- }
- kfree_skb(skb);
-}
-
-static int chtls_conn_cpl(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_peer_close *req = cplhdr(skb) + RSS_HDR;
- void (*fn)(struct sock *sk, struct sk_buff *skb);
- unsigned int hwtid = GET_TID(req);
- struct chtls_sock *csk;
- struct sock *sk;
- u8 opcode;
-
- opcode = ((const struct rss_header *)cplhdr(skb))->opcode;
-
- sk = lookup_tid(cdev->tids, hwtid);
- if (!sk)
- goto rel_skb;
-
- csk = sk->sk_user_data;
-
- switch (opcode) {
- case CPL_PEER_CLOSE:
- fn = chtls_peer_close;
- break;
- case CPL_CLOSE_CON_RPL:
- fn = chtls_close_con_rpl;
- break;
- case CPL_ABORT_REQ_RSS:
- /*
- * Save the offload device in the skb, we may process this
- * message after the socket has closed.
- */
- BLOG_SKB_CB(skb)->cdev = csk->cdev;
- fn = chtls_abort_req_rss;
- break;
- case CPL_ABORT_RPL_RSS:
- fn = chtls_abort_rpl_rss;
- break;
- default:
- goto rel_skb;
- }
-
- process_cpl_msg(fn, sk, skb);
- return 0;
-
-rel_skb:
- kfree_skb(skb);
- return 0;
-}
-
-static void chtls_rx_ack(struct sock *sk, struct sk_buff *skb)
-{
- struct cpl_fw4_ack *hdr = cplhdr(skb) + RSS_HDR;
- struct chtls_sock *csk = sk->sk_user_data;
- struct tcp_sock *tp = tcp_sk(sk);
- u32 credits = hdr->credits;
- u32 snd_una;
-
- snd_una = ntohl(hdr->snd_una);
- csk->wr_credits += credits;
-
- if (csk->wr_unacked > csk->wr_max_credits - csk->wr_credits)
- csk->wr_unacked = csk->wr_max_credits - csk->wr_credits;
-
- while (credits) {
- struct sk_buff *pskb = csk->wr_skb_head;
- u32 csum;
-
- if (unlikely(!pskb)) {
- if (csk->wr_nondata)
- csk->wr_nondata -= credits;
- break;
- }
- csum = (__force u32)pskb->csum;
- if (unlikely(credits < csum)) {
- pskb->csum = (__force __wsum)(csum - credits);
- break;
- }
- dequeue_wr(sk);
- credits -= csum;
- kfree_skb(pskb);
- }
- if (hdr->seq_vld & CPL_FW4_ACK_FLAGS_SEQVAL) {
- if (unlikely(before(snd_una, tp->snd_una))) {
- kfree_skb(skb);
- return;
- }
-
- if (tp->snd_una != snd_una) {
- tp->snd_una = snd_una;
- tp->rcv_tstamp = tcp_jiffies32;
- if (tp->snd_una == tp->snd_nxt &&
- !csk_flag_nochk(csk, CSK_TX_FAILOVER))
- csk_reset_flag(csk, CSK_TX_WAIT_IDLE);
- }
- }
-
- if (hdr->seq_vld & CPL_FW4_ACK_FLAGS_CH) {
- unsigned int fclen16 = roundup(failover_flowc_wr_len, 16);
-
- csk->wr_credits -= fclen16;
- csk_reset_flag(csk, CSK_TX_WAIT_IDLE);
- csk_reset_flag(csk, CSK_TX_FAILOVER);
- }
- if (skb_queue_len(&csk->txq) && chtls_push_frames(csk, 0))
- sk->sk_write_space(sk);
-
- kfree_skb(skb);
-}
-
-static int chtls_wr_ack(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_fw4_ack *rpl = cplhdr(skb) + RSS_HDR;
- unsigned int hwtid = GET_TID(rpl);
- struct sock *sk;
-
- sk = lookup_tid(cdev->tids, hwtid);
- if (unlikely(!sk)) {
- pr_err("can't find conn. for hwtid %u.\n", hwtid);
- return -EINVAL;
- }
- process_cpl_msg(chtls_rx_ack, sk, skb);
-
- return 0;
-}
-
-static int chtls_set_tcb_rpl(struct chtls_dev *cdev, struct sk_buff *skb)
-{
- struct cpl_set_tcb_rpl *rpl = cplhdr(skb) + RSS_HDR;
- unsigned int hwtid = GET_TID(rpl);
- struct sock *sk;
-
- sk = lookup_tid(cdev->tids, hwtid);
-
- /* return EINVAL if socket doesn't exist */
- if (!sk)
- return -EINVAL;
-
- /* Reusing the skb as size of cpl_set_tcb_field structure
- * is greater than cpl_abort_req
- */
- if (TCB_COOKIE_G(rpl->cookie) == TCB_FIELD_COOKIE_TFLAG)
- chtls_send_abort(sk, CPL_ABORT_SEND_RST, NULL);
-
- kfree_skb(skb);
- return 0;
-}
-
-chtls_handler_func chtls_handlers[NUM_CPL_CMDS] = {
- [CPL_PASS_OPEN_RPL] = chtls_pass_open_rpl,
- [CPL_CLOSE_LISTSRV_RPL] = chtls_close_listsrv_rpl,
- [CPL_PASS_ACCEPT_REQ] = chtls_pass_accept_req,
- [CPL_PASS_ESTABLISH] = chtls_pass_establish,
- [CPL_RX_DATA] = chtls_rx_data,
- [CPL_TLS_DATA] = chtls_rx_pdu,
- [CPL_RX_TLS_CMP] = chtls_rx_cmp,
- [CPL_PEER_CLOSE] = chtls_conn_cpl,
- [CPL_CLOSE_CON_RPL] = chtls_conn_cpl,
- [CPL_ABORT_REQ_RSS] = chtls_conn_cpl,
- [CPL_ABORT_RPL_RSS] = chtls_conn_cpl,
- [CPL_FW4_ACK] = chtls_wr_ack,
- [CPL_SET_TCB_RPL] = chtls_set_tcb_rpl,
-};
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h
deleted file mode 100644
index 29ceff5a5fcb..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h
+++ /dev/null
@@ -1,218 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (c) 2018 Chelsio Communications, Inc.
- */
-
-#ifndef __CHTLS_CM_H__
-#define __CHTLS_CM_H__
-
-/*
- * TCB settings
- */
-/* 3:0 */
-#define TCB_ULP_TYPE_W 0
-#define TCB_ULP_TYPE_S 0
-#define TCB_ULP_TYPE_M 0xfULL
-#define TCB_ULP_TYPE_V(x) ((x) << TCB_ULP_TYPE_S)
-
-/* 11:4 */
-#define TCB_ULP_RAW_W 0
-#define TCB_ULP_RAW_S 4
-#define TCB_ULP_RAW_M 0xffULL
-#define TCB_ULP_RAW_V(x) ((x) << TCB_ULP_RAW_S)
-
-#define TF_TLS_KEY_SIZE_S 7
-#define TF_TLS_KEY_SIZE_V(x) ((x) << TF_TLS_KEY_SIZE_S)
-
-#define TF_TLS_CONTROL_S 2
-#define TF_TLS_CONTROL_V(x) ((x) << TF_TLS_CONTROL_S)
-
-#define TF_TLS_ACTIVE_S 1
-#define TF_TLS_ACTIVE_V(x) ((x) << TF_TLS_ACTIVE_S)
-
-#define TF_TLS_ENABLE_S 0
-#define TF_TLS_ENABLE_V(x) ((x) << TF_TLS_ENABLE_S)
-
-#define TF_RX_QUIESCE_S 15
-#define TF_RX_QUIESCE_V(x) ((x) << TF_RX_QUIESCE_S)
-
-/*
- * Max receive window supported by HW in bytes. Only a small part of it can
- * be set through option0, the rest needs to be set through RX_DATA_ACK.
- */
-#define MAX_RCV_WND ((1U << 27) - 1)
-#define MAX_MSS 65536
-
-/*
- * Min receive window. We want it to be large enough to accommodate receive
- * coalescing, handle jumbo frames, and not trigger sender SWS avoidance.
- */
-#define MIN_RCV_WND (24 * 1024U)
-#define LOOPBACK(x) (((x) & htonl(0xff000000)) == htonl(0x7f000000))
-
-/* for TX: a skb must have a headroom of at least TX_HEADER_LEN bytes */
-#define TX_HEADER_LEN \
- (sizeof(struct fw_ofld_tx_data_wr) + sizeof(struct sge_opaque_hdr))
-#define TX_TLSHDR_LEN \
- (sizeof(struct fw_tlstx_data_wr) + sizeof(struct cpl_tx_tls_sfo) + \
- sizeof(struct sge_opaque_hdr))
-#define TXDATA_SKB_LEN 128
-
-enum {
- CPL_TX_TLS_SFO_TYPE_CCS,
- CPL_TX_TLS_SFO_TYPE_ALERT,
- CPL_TX_TLS_SFO_TYPE_HANDSHAKE,
- CPL_TX_TLS_SFO_TYPE_DATA,
- CPL_TX_TLS_SFO_TYPE_HEARTBEAT,
-};
-
-enum {
- TLS_HDR_TYPE_CCS = 20,
- TLS_HDR_TYPE_ALERT,
- TLS_HDR_TYPE_HANDSHAKE,
- TLS_HDR_TYPE_RECORD,
- TLS_HDR_TYPE_HEARTBEAT,
-};
-
-typedef void (*defer_handler_t)(struct chtls_dev *dev, struct sk_buff *skb);
-extern struct request_sock_ops chtls_rsk_ops;
-extern struct request_sock_ops chtls_rsk_opsv6;
-
-struct deferred_skb_cb {
- defer_handler_t handler;
- struct chtls_dev *dev;
-};
-
-#define DEFERRED_SKB_CB(skb) ((struct deferred_skb_cb *)(skb)->cb)
-#define failover_flowc_wr_len offsetof(struct fw_flowc_wr, mnemval[3])
-#define WR_SKB_CB(skb) ((struct wr_skb_cb *)(skb)->cb)
-#define ACCEPT_QUEUE(sk) (&inet_csk(sk)->icsk_accept_queue.rskq_accept_head)
-
-#define SND_WSCALE(tp) ((tp)->rx_opt.snd_wscale)
-#define RCV_WSCALE(tp) ((tp)->rx_opt.rcv_wscale)
-#define USER_MSS(tp) (READ_ONCE((tp)->rx_opt.user_mss))
-#define TS_RECENT_STAMP(tp) ((tp)->rx_opt.ts_recent_stamp)
-#define WSCALE_OK(tp) ((tp)->rx_opt.wscale_ok)
-#define TSTAMP_OK(tp) ((tp)->rx_opt.tstamp_ok)
-#define SACK_OK(tp) ((tp)->rx_opt.sack_ok)
-
-/* TLS SKB */
-#define skb_ulp_tls_inline(skb) (ULP_SKB_CB(skb)->ulp.tls.ofld)
-#define skb_ulp_tls_iv_imm(skb) (ULP_SKB_CB(skb)->ulp.tls.iv)
-
-void chtls_defer_reply(struct sk_buff *skb, struct chtls_dev *dev,
- defer_handler_t handler);
-
-/*
- * Returns true if the socket is in one of the supplied states.
- */
-static inline unsigned int sk_in_state(const struct sock *sk,
- unsigned int states)
-{
- return states & (1 << sk->sk_state);
-}
-
-static void chtls_rsk_destructor(struct request_sock *req)
-{
- /* do nothing */
-}
-
-static inline void chtls_init_rsk_ops(struct proto *chtls_tcp_prot,
- struct request_sock_ops *chtls_tcp_ops,
- struct proto *tcp_prot, int family)
-{
- memset(chtls_tcp_ops, 0, sizeof(*chtls_tcp_ops));
- chtls_tcp_ops->family = family;
- chtls_tcp_ops->obj_size = sizeof(struct tcp_request_sock);
- chtls_tcp_ops->destructor = chtls_rsk_destructor;
- chtls_tcp_ops->slab = tcp_prot->rsk_prot->slab;
- chtls_tcp_prot->rsk_prot = chtls_tcp_ops;
-}
-
-static inline void chtls_reqsk_free(struct request_sock *req)
-{
- if (req->rsk_listener)
- sock_put(req->rsk_listener);
- kmem_cache_free(req->rsk_ops->slab, req);
-}
-
-#define DECLARE_TASK_FUNC(task, task_param) \
- static void task(struct work_struct *task_param)
-
-static inline void sk_wakeup_sleepers(struct sock *sk, bool interruptable)
-{
- struct socket_wq *wq;
-
- rcu_read_lock();
- wq = rcu_dereference(sk->sk_wq);
- if (skwq_has_sleeper(wq)) {
- if (interruptable)
- wake_up_interruptible(sk_sleep(sk));
- else
- wake_up_all(sk_sleep(sk));
- }
- rcu_read_unlock();
-}
-
-static inline void chtls_set_req_port(struct request_sock *oreq,
- __be16 source, __be16 dest)
-{
- inet_rsk(oreq)->ir_rmt_port = source;
- inet_rsk(oreq)->ir_num = ntohs(dest);
-}
-
-static inline void chtls_set_req_addr(struct request_sock *oreq,
- __be32 local_ip, __be32 peer_ip)
-{
- inet_rsk(oreq)->ir_loc_addr = local_ip;
- inet_rsk(oreq)->ir_rmt_addr = peer_ip;
-}
-
-static inline void chtls_free_skb(struct sock *sk, struct sk_buff *skb)
-{
- skb_dstref_steal(skb);
- __skb_unlink(skb, &sk->sk_receive_queue);
- __kfree_skb(skb);
-}
-
-static inline void chtls_kfree_skb(struct sock *sk, struct sk_buff *skb)
-{
- skb_dstref_steal(skb);
- __skb_unlink(skb, &sk->sk_receive_queue);
- kfree_skb(skb);
-}
-
-static inline void chtls_reset_wr_list(struct chtls_sock *csk)
-{
- csk->wr_skb_head = NULL;
- csk->wr_skb_tail = NULL;
-}
-
-static inline void enqueue_wr(struct chtls_sock *csk, struct sk_buff *skb)
-{
- WR_SKB_CB(skb)->next_wr = NULL;
-
- skb_get(skb);
-
- if (!csk->wr_skb_head)
- csk->wr_skb_head = skb;
- else
- WR_SKB_CB(csk->wr_skb_tail)->next_wr = skb;
- csk->wr_skb_tail = skb;
-}
-
-static inline struct sk_buff *dequeue_wr(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct sk_buff *skb = NULL;
-
- skb = csk->wr_skb_head;
-
- if (likely(skb)) {
- /* Don't bother clearing the tail */
- csk->wr_skb_head = WR_SKB_CB(skb)->next_wr;
- WR_SKB_CB(skb)->next_wr = NULL;
- }
- return skb;
-}
-#endif
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_hw.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_hw.c
deleted file mode 100644
index d84473ca844d..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_hw.c
+++ /dev/null
@@ -1,462 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (c) 2018 Chelsio Communications, Inc.
- *
- * Written by: Atul Gupta (atul.gupta@chelsio.com)
- */
-
-#include <linux/module.h>
-#include <linux/list.h>
-#include <linux/workqueue.h>
-#include <linux/skbuff.h>
-#include <linux/timer.h>
-#include <linux/notifier.h>
-#include <linux/inetdevice.h>
-#include <linux/ip.h>
-#include <linux/tcp.h>
-#include <linux/tls.h>
-#include <net/tls.h>
-
-#include "chtls.h"
-#include "chtls_cm.h"
-
-static void __set_tcb_field_direct(struct chtls_sock *csk,
- struct cpl_set_tcb_field *req, u16 word,
- u64 mask, u64 val, u8 cookie, int no_reply)
-{
- struct ulptx_idata *sc;
-
- INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, csk->tid);
- req->wr.wr_mid |= htonl(FW_WR_FLOWID_V(csk->tid));
- req->reply_ctrl = htons(NO_REPLY_V(no_reply) |
- QUEUENO_V(csk->rss_qid));
- req->word_cookie = htons(TCB_WORD_V(word) | TCB_COOKIE_V(cookie));
- req->mask = cpu_to_be64(mask);
- req->val = cpu_to_be64(val);
- sc = (struct ulptx_idata *)(req + 1);
- sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
- sc->len = htonl(0);
-}
-
-static void __set_tcb_field(struct sock *sk, struct sk_buff *skb, u16 word,
- u64 mask, u64 val, u8 cookie, int no_reply)
-{
- struct cpl_set_tcb_field *req;
- struct chtls_sock *csk;
- struct ulptx_idata *sc;
- unsigned int wrlen;
-
- wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
- csk = rcu_dereference_sk_user_data(sk);
-
- req = (struct cpl_set_tcb_field *)__skb_put(skb, wrlen);
- __set_tcb_field_direct(csk, req, word, mask, val, cookie, no_reply);
- set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
-}
-
-/*
- * Send control message to HW, message go as immediate data and packet
- * is freed immediately.
- */
-static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
-{
- struct cpl_set_tcb_field *req;
- unsigned int credits_needed;
- struct chtls_sock *csk;
- struct ulptx_idata *sc;
- struct sk_buff *skb;
- unsigned int wrlen;
- int ret;
-
- wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
-
- skb = alloc_skb(wrlen, GFP_ATOMIC);
- if (!skb)
- return -ENOMEM;
-
- credits_needed = DIV_ROUND_UP(wrlen, 16);
- csk = rcu_dereference_sk_user_data(sk);
-
- __set_tcb_field(sk, skb, word, mask, val, 0, 1);
- skb_set_queue_mapping(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA);
- csk->wr_credits -= credits_needed;
- csk->wr_unacked += credits_needed;
- enqueue_wr(csk, skb);
- ret = cxgb4_ofld_send(csk->egress_dev, skb);
- if (ret < 0)
- kfree_skb(skb);
- return ret < 0 ? ret : 0;
-}
-
-void chtls_set_tcb_field_rpl_skb(struct sock *sk, u16 word,
- u64 mask, u64 val, u8 cookie,
- int through_l2t)
-{
- struct sk_buff *skb;
- unsigned int wrlen;
-
- wrlen = sizeof(struct cpl_set_tcb_field) + sizeof(struct ulptx_idata);
- wrlen = roundup(wrlen, 16);
-
- skb = alloc_skb(wrlen, GFP_KERNEL | __GFP_NOFAIL);
- if (!skb)
- return;
-
- __set_tcb_field(sk, skb, word, mask, val, cookie, 0);
- send_or_defer(sk, tcp_sk(sk), skb, through_l2t);
-}
-
-static int chtls_set_tcb_keyid(struct sock *sk, int keyid)
-{
- return chtls_set_tcb_field(sk, 31, 0xFFFFFFFFULL, keyid);
-}
-
-static int chtls_set_tcb_seqno(struct sock *sk)
-{
- return chtls_set_tcb_field(sk, 28, ~0ULL, 0);
-}
-
-static int chtls_set_tcb_quiesce(struct sock *sk, int val)
-{
- return chtls_set_tcb_field(sk, 1, (1ULL << TF_RX_QUIESCE_S),
- TF_RX_QUIESCE_V(val));
-}
-
-void chtls_set_quiesce_ctrl(struct sock *sk, int val)
-{
- struct chtls_sock *csk;
- struct sk_buff *skb;
- unsigned int wrlen;
- int ret;
-
- wrlen = sizeof(struct cpl_set_tcb_field) + sizeof(struct ulptx_idata);
- wrlen = roundup(wrlen, 16);
-
- skb = alloc_skb(wrlen, GFP_ATOMIC);
- if (!skb)
- return;
-
- csk = rcu_dereference_sk_user_data(sk);
-
- __set_tcb_field(sk, skb, 1, TF_RX_QUIESCE_V(1), 0, 0, 1);
- set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
- ret = cxgb4_ofld_send(csk->egress_dev, skb);
- if (ret < 0)
- kfree_skb(skb);
-}
-
-/* TLS Key bitmap processing */
-int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi)
-{
- unsigned int num_key_ctx, bsize;
- int ksize;
-
- num_key_ctx = (lldi->vr->key.size / TLS_KEY_CONTEXT_SZ);
- bsize = BITS_TO_LONGS(num_key_ctx);
-
- cdev->kmap.size = num_key_ctx;
- cdev->kmap.available = bsize;
- ksize = sizeof(*cdev->kmap.addr) * bsize;
- cdev->kmap.addr = kvzalloc(ksize, GFP_KERNEL);
- if (!cdev->kmap.addr)
- return -ENOMEM;
-
- cdev->kmap.start = lldi->vr->key.start;
- spin_lock_init(&cdev->kmap.lock);
- return 0;
-}
-
-static int get_new_keyid(struct chtls_sock *csk, u32 optname)
-{
- struct net_device *dev = csk->egress_dev;
- struct chtls_dev *cdev = csk->cdev;
- struct chtls_hws *hws;
- struct adapter *adap;
- int keyid;
-
- adap = netdev2adap(dev);
- hws = &csk->tlshws;
-
- spin_lock_bh(&cdev->kmap.lock);
- keyid = find_first_zero_bit(cdev->kmap.addr, cdev->kmap.size);
- if (keyid < cdev->kmap.size) {
- __set_bit(keyid, cdev->kmap.addr);
- if (optname == TLS_RX)
- hws->rxkey = keyid;
- else
- hws->txkey = keyid;
- atomic_inc(&adap->chcr_stats.tls_key);
- } else {
- keyid = -1;
- }
- spin_unlock_bh(&cdev->kmap.lock);
- return keyid;
-}
-
-void free_tls_keyid(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct net_device *dev = csk->egress_dev;
- struct chtls_dev *cdev = csk->cdev;
- struct chtls_hws *hws;
- struct adapter *adap;
-
- if (!cdev->kmap.addr)
- return;
-
- adap = netdev2adap(dev);
- hws = &csk->tlshws;
-
- spin_lock_bh(&cdev->kmap.lock);
- if (hws->rxkey >= 0) {
- __clear_bit(hws->rxkey, cdev->kmap.addr);
- atomic_dec(&adap->chcr_stats.tls_key);
- hws->rxkey = -1;
- }
- if (hws->txkey >= 0) {
- __clear_bit(hws->txkey, cdev->kmap.addr);
- atomic_dec(&adap->chcr_stats.tls_key);
- hws->txkey = -1;
- }
- spin_unlock_bh(&cdev->kmap.lock);
-}
-
-unsigned int keyid_to_addr(int start_addr, int keyid)
-{
- return (start_addr + (keyid * TLS_KEY_CONTEXT_SZ)) >> 5;
-}
-
-static void chtls_rxkey_ivauth(struct _key_ctx *kctx)
-{
- kctx->iv_to_auth = cpu_to_be64(KEYCTX_TX_WR_IV_V(6ULL) |
- KEYCTX_TX_WR_AAD_V(1ULL) |
- KEYCTX_TX_WR_AADST_V(5ULL) |
- KEYCTX_TX_WR_CIPHER_V(14ULL) |
- KEYCTX_TX_WR_CIPHERST_V(0ULL) |
- KEYCTX_TX_WR_AUTH_V(14ULL) |
- KEYCTX_TX_WR_AUTHST_V(16ULL) |
- KEYCTX_TX_WR_AUTHIN_V(16ULL));
-}
-
-static int chtls_key_info(struct chtls_sock *csk,
- struct _key_ctx *kctx,
- u32 keylen, u32 optname,
- int cipher_type)
-{
- unsigned char key[AES_MAX_KEY_SIZE];
- unsigned char *key_p, *salt;
- unsigned char ghash_h[AEAD_H_SIZE];
- int ck_size, key_ctx_size, kctx_mackey_size, salt_size;
- struct aes_enckey aes;
- int ret;
-
- key_ctx_size = sizeof(struct _key_ctx) +
- roundup(keylen, 16) + AEAD_H_SIZE;
-
- /* GCM mode of AES supports 128 and 256 bit encryption, so
- * prepare key context base on GCM cipher type
- */
- switch (cipher_type) {
- case TLS_CIPHER_AES_GCM_128: {
- struct tls12_crypto_info_aes_gcm_128 *gcm_ctx_128 =
- (struct tls12_crypto_info_aes_gcm_128 *)
- &csk->tlshws.crypto_info;
- memcpy(key, gcm_ctx_128->key, keylen);
-
- key_p = gcm_ctx_128->key;
- salt = gcm_ctx_128->salt;
- ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_128;
- salt_size = TLS_CIPHER_AES_GCM_128_SALT_SIZE;
- kctx_mackey_size = CHCR_KEYCTX_MAC_KEY_SIZE_128;
- break;
- }
- case TLS_CIPHER_AES_GCM_256: {
- struct tls12_crypto_info_aes_gcm_256 *gcm_ctx_256 =
- (struct tls12_crypto_info_aes_gcm_256 *)
- &csk->tlshws.crypto_info;
- memcpy(key, gcm_ctx_256->key, keylen);
-
- key_p = gcm_ctx_256->key;
- salt = gcm_ctx_256->salt;
- ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_256;
- salt_size = TLS_CIPHER_AES_GCM_256_SALT_SIZE;
- kctx_mackey_size = CHCR_KEYCTX_MAC_KEY_SIZE_256;
- break;
- }
- default:
- pr_err("GCM: Invalid key length %d\n", keylen);
- return -EINVAL;
- }
-
- /* Calculate the H = CIPH(K, 0 repeated 16 times).
- * It will go in key context
- */
- ret = aes_prepareenckey(&aes, key, keylen);
- if (ret)
- return ret;
-
- memset(ghash_h, 0, AEAD_H_SIZE);
- aes_encrypt(&aes, ghash_h, ghash_h);
- memzero_explicit(&aes, sizeof(aes));
- csk->tlshws.keylen = key_ctx_size;
-
- /* Copy the Key context */
- if (optname == TLS_RX) {
- int key_ctx;
-
- key_ctx = ((key_ctx_size >> 4) << 3);
- kctx->ctx_hdr = FILL_KEY_CRX_HDR(ck_size,
- kctx_mackey_size,
- 0, 0, key_ctx);
- chtls_rxkey_ivauth(kctx);
- } else {
- kctx->ctx_hdr = FILL_KEY_CTX_HDR(ck_size,
- kctx_mackey_size,
- 0, 0, key_ctx_size >> 4);
- }
-
- memcpy(kctx->salt, salt, salt_size);
- memcpy(kctx->key, key_p, keylen);
- memcpy(kctx->key + keylen, ghash_h, AEAD_H_SIZE);
- /* erase key info from driver */
- memset(key_p, 0, keylen);
-
- return 0;
-}
-
-static void chtls_set_scmd(struct chtls_sock *csk)
-{
- struct chtls_hws *hws = &csk->tlshws;
-
- hws->scmd.seqno_numivs =
- SCMD_SEQ_NO_CTRL_V(3) |
- SCMD_PROTO_VERSION_V(0) |
- SCMD_ENC_DEC_CTRL_V(0) |
- SCMD_CIPH_AUTH_SEQ_CTRL_V(1) |
- SCMD_CIPH_MODE_V(2) |
- SCMD_AUTH_MODE_V(4) |
- SCMD_HMAC_CTRL_V(0) |
- SCMD_IV_SIZE_V(4) |
- SCMD_NUM_IVS_V(1);
-
- hws->scmd.ivgen_hdrlen =
- SCMD_IV_GEN_CTRL_V(1) |
- SCMD_KEY_CTX_INLINE_V(0) |
- SCMD_TLS_FRAG_ENABLE_V(1);
-}
-
-int chtls_setkey(struct chtls_sock *csk, u32 keylen,
- u32 optname, int cipher_type)
-{
- struct tls_key_req *kwr;
- struct chtls_dev *cdev;
- struct _key_ctx *kctx;
- int wrlen, klen, len;
- struct sk_buff *skb;
- struct sock *sk;
- int keyid;
- int kaddr;
- int ret;
-
- cdev = csk->cdev;
- sk = csk->sk;
-
- klen = roundup((keylen + AEAD_H_SIZE) + sizeof(*kctx), 32);
- wrlen = roundup(sizeof(*kwr), 16);
- len = klen + wrlen;
-
- /* Flush out-standing data before new key takes effect */
- if (optname == TLS_TX) {
- lock_sock(sk);
- if (skb_queue_len(&csk->txq))
- chtls_push_frames(csk, 0);
- release_sock(sk);
- }
-
- skb = alloc_skb(len, GFP_KERNEL);
- if (!skb)
- return -ENOMEM;
-
- keyid = get_new_keyid(csk, optname);
- if (keyid < 0) {
- ret = -ENOSPC;
- goto out_nokey;
- }
-
- kaddr = keyid_to_addr(cdev->kmap.start, keyid);
- kwr = (struct tls_key_req *)__skb_put_zero(skb, len);
- kwr->wr.op_to_compl =
- cpu_to_be32(FW_WR_OP_V(FW_ULPTX_WR) | FW_WR_COMPL_F |
- FW_WR_ATOMIC_V(1U));
- kwr->wr.flowid_len16 =
- cpu_to_be32(FW_WR_LEN16_V(DIV_ROUND_UP(len, 16) |
- FW_WR_FLOWID_V(csk->tid)));
- kwr->wr.protocol = 0;
- kwr->wr.mfs = htons(TLS_MFS);
- kwr->wr.reneg_to_write_rx = optname;
-
- /* ulptx command */
- kwr->req.cmd = cpu_to_be32(ULPTX_CMD_V(ULP_TX_MEM_WRITE) |
- T5_ULP_MEMIO_ORDER_V(1) |
- T5_ULP_MEMIO_IMM_V(1));
- kwr->req.len16 = cpu_to_be32((csk->tid << 8) |
- DIV_ROUND_UP(len - sizeof(kwr->wr), 16));
- kwr->req.dlen = cpu_to_be32(ULP_MEMIO_DATA_LEN_V(klen >> 5));
- kwr->req.lock_addr = cpu_to_be32(ULP_MEMIO_ADDR_V(kaddr));
-
- /* sub command */
- kwr->sc_imm.cmd_more = cpu_to_be32(ULPTX_CMD_V(ULP_TX_SC_IMM));
- kwr->sc_imm.len = cpu_to_be32(klen);
-
- lock_sock(sk);
- /* key info */
- kctx = (struct _key_ctx *)(kwr + 1);
- ret = chtls_key_info(csk, kctx, keylen, optname, cipher_type);
- if (ret)
- goto out_notcb;
-
- if (unlikely(csk_flag(sk, CSK_ABORT_SHUTDOWN)))
- goto out_notcb;
-
- set_wr_txq(skb, CPL_PRIORITY_DATA, csk->tlshws.txqid);
- csk->wr_credits -= DIV_ROUND_UP(len, 16);
- csk->wr_unacked += DIV_ROUND_UP(len, 16);
- enqueue_wr(csk, skb);
- cxgb4_ofld_send(csk->egress_dev, skb);
- skb = NULL;
-
- chtls_set_scmd(csk);
- /* Clear quiesce for Rx key */
- if (optname == TLS_RX) {
- ret = chtls_set_tcb_keyid(sk, keyid);
- if (ret)
- goto out_notcb;
- ret = chtls_set_tcb_field(sk, 0,
- TCB_ULP_RAW_V(TCB_ULP_RAW_M),
- TCB_ULP_RAW_V((TF_TLS_KEY_SIZE_V(1) |
- TF_TLS_CONTROL_V(1) |
- TF_TLS_ACTIVE_V(1) |
- TF_TLS_ENABLE_V(1))));
- if (ret)
- goto out_notcb;
- ret = chtls_set_tcb_seqno(sk);
- if (ret)
- goto out_notcb;
- ret = chtls_set_tcb_quiesce(sk, 0);
- if (ret)
- goto out_notcb;
- csk->tlshws.rxkey = keyid;
- } else {
- csk->tlshws.tx_seq_no = 0;
- csk->tlshws.txkey = keyid;
- }
-
- release_sock(sk);
- return ret;
-out_notcb:
- release_sock(sk);
- free_tls_keyid(sk);
-out_nokey:
- kfree_skb(skb);
- return ret;
-}
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
deleted file mode 100644
index c8e99409a52a..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
+++ /dev/null
@@ -1,1836 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (c) 2018 Chelsio Communications, Inc.
- *
- * Written by: Atul Gupta (atul.gupta@chelsio.com)
- */
-
-#include <linux/module.h>
-#include <linux/list.h>
-#include <linux/workqueue.h>
-#include <linux/skbuff.h>
-#include <linux/timer.h>
-#include <linux/notifier.h>
-#include <linux/inetdevice.h>
-#include <linux/ip.h>
-#include <linux/tcp.h>
-#include <linux/sched/signal.h>
-#include <net/tcp.h>
-#include <net/busy_poll.h>
-#include <crypto/aes.h>
-
-#include "chtls.h"
-#include "chtls_cm.h"
-
-static bool is_tls_tx(struct chtls_sock *csk)
-{
- return csk->tlshws.txkey >= 0;
-}
-
-static bool is_tls_rx(struct chtls_sock *csk)
-{
- return csk->tlshws.rxkey >= 0;
-}
-
-static int data_sgl_len(const struct sk_buff *skb)
-{
- unsigned int cnt;
-
- cnt = skb_shinfo(skb)->nr_frags;
- return sgl_len(cnt) * 8;
-}
-
-static int nos_ivs(struct sock *sk, unsigned int size)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- return DIV_ROUND_UP(size, csk->tlshws.mfs);
-}
-
-static int set_ivs_imm(struct sock *sk, const struct sk_buff *skb)
-{
- int ivs_size = nos_ivs(sk, skb->len) * CIPHER_BLOCK_SIZE;
- int hlen = TLS_WR_CPL_LEN + data_sgl_len(skb);
-
- if ((hlen + KEY_ON_MEM_SZ + ivs_size) <
- MAX_IMM_OFLD_TX_DATA_WR_LEN) {
- ULP_SKB_CB(skb)->ulp.tls.iv = 1;
- return 1;
- }
- ULP_SKB_CB(skb)->ulp.tls.iv = 0;
- return 0;
-}
-
-static int max_ivs_size(struct sock *sk, int size)
-{
- return nos_ivs(sk, size) * CIPHER_BLOCK_SIZE;
-}
-
-static int ivs_size(struct sock *sk, const struct sk_buff *skb)
-{
- return set_ivs_imm(sk, skb) ? (nos_ivs(sk, skb->len) *
- CIPHER_BLOCK_SIZE) : 0;
-}
-
-static int flowc_wr_credits(int nparams, int *flowclenp)
-{
- int flowclen16, flowclen;
-
- flowclen = offsetof(struct fw_flowc_wr, mnemval[nparams]);
- flowclen16 = DIV_ROUND_UP(flowclen, 16);
- flowclen = flowclen16 * 16;
-
- if (flowclenp)
- *flowclenp = flowclen;
-
- return flowclen16;
-}
-
-static struct sk_buff *create_flowc_wr_skb(struct sock *sk,
- struct fw_flowc_wr *flowc,
- int flowclen)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct sk_buff *skb;
-
- skb = alloc_skb(flowclen, GFP_ATOMIC);
- if (!skb)
- return NULL;
-
- __skb_put_data(skb, flowc, flowclen);
- skb_set_queue_mapping(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA);
-
- return skb;
-}
-
-static int send_flowc_wr(struct sock *sk, struct fw_flowc_wr *flowc,
- int flowclen)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct tcp_sock *tp = tcp_sk(sk);
- struct sk_buff *skb;
- int flowclen16;
- int ret;
-
- flowclen16 = flowclen / 16;
-
- if (csk_flag(sk, CSK_TX_DATA_SENT)) {
- skb = create_flowc_wr_skb(sk, flowc, flowclen);
- if (!skb)
- return -ENOMEM;
-
- skb_entail(sk, skb,
- ULPCB_FLAG_NO_HDR | ULPCB_FLAG_NO_APPEND);
- return 0;
- }
-
- ret = cxgb4_immdata_send(csk->egress_dev,
- csk->txq_idx,
- flowc, flowclen);
- if (!ret)
- return flowclen16;
- skb = create_flowc_wr_skb(sk, flowc, flowclen);
- if (!skb)
- return -ENOMEM;
- send_or_defer(sk, tp, skb, 0);
- return flowclen16;
-}
-
-static u8 tcp_state_to_flowc_state(u8 state)
-{
- switch (state) {
- case TCP_ESTABLISHED:
- return FW_FLOWC_MNEM_TCPSTATE_ESTABLISHED;
- case TCP_CLOSE_WAIT:
- return FW_FLOWC_MNEM_TCPSTATE_CLOSEWAIT;
- case TCP_FIN_WAIT1:
- return FW_FLOWC_MNEM_TCPSTATE_FINWAIT1;
- case TCP_CLOSING:
- return FW_FLOWC_MNEM_TCPSTATE_CLOSING;
- case TCP_LAST_ACK:
- return FW_FLOWC_MNEM_TCPSTATE_LASTACK;
- case TCP_FIN_WAIT2:
- return FW_FLOWC_MNEM_TCPSTATE_FINWAIT2;
- }
-
- return FW_FLOWC_MNEM_TCPSTATE_ESTABLISHED;
-}
-
-int send_tx_flowc_wr(struct sock *sk, int compl,
- u32 snd_nxt, u32 rcv_nxt)
-{
- DEFINE_RAW_FLEX(struct fw_flowc_wr, flowc, mnemval, FW_FLOWC_MNEM_MAX);
- int nparams, paramidx, flowclen16, flowclen;
- struct chtls_sock *csk;
- struct tcp_sock *tp;
-
- csk = rcu_dereference_sk_user_data(sk);
- tp = tcp_sk(sk);
-
-#define FLOWC_PARAM(__m, __v) \
- do { \
- flowc->mnemval[paramidx].mnemonic = FW_FLOWC_MNEM_##__m; \
- flowc->mnemval[paramidx].val = cpu_to_be32(__v); \
- paramidx++; \
- } while (0)
-
- paramidx = 0;
-
- FLOWC_PARAM(PFNVFN, FW_PFVF_CMD_PFN_V(csk->cdev->lldi->pf));
- FLOWC_PARAM(CH, csk->tx_chan);
- FLOWC_PARAM(PORT, csk->tx_chan);
- FLOWC_PARAM(IQID, csk->rss_qid);
- FLOWC_PARAM(SNDNXT, tp->snd_nxt);
- FLOWC_PARAM(RCVNXT, tp->rcv_nxt);
- FLOWC_PARAM(SNDBUF, csk->sndbuf);
- FLOWC_PARAM(MSS, tp->mss_cache);
- FLOWC_PARAM(TCPSTATE, tcp_state_to_flowc_state(sk->sk_state));
-
- if (SND_WSCALE(tp))
- FLOWC_PARAM(RCV_SCALE, SND_WSCALE(tp));
-
- if (csk->ulp_mode == ULP_MODE_TLS)
- FLOWC_PARAM(ULD_MODE, ULP_MODE_TLS);
-
- if (csk->tlshws.fcplenmax)
- FLOWC_PARAM(TXDATAPLEN_MAX, csk->tlshws.fcplenmax);
-
- nparams = paramidx;
-#undef FLOWC_PARAM
-
- flowclen16 = flowc_wr_credits(nparams, &flowclen);
- flowc->op_to_nparams =
- cpu_to_be32(FW_WR_OP_V(FW_FLOWC_WR) |
- FW_WR_COMPL_V(compl) |
- FW_FLOWC_WR_NPARAMS_V(nparams));
- flowc->flowid_len16 = cpu_to_be32(FW_WR_LEN16_V(flowclen16) |
- FW_WR_FLOWID_V(csk->tid));
-
- return send_flowc_wr(sk, flowc, flowclen);
-}
-
-/* Copy IVs to WR */
-static int tls_copy_ivs(struct sock *sk, struct sk_buff *skb)
-
-{
- struct chtls_sock *csk;
- unsigned char *iv_loc;
- struct chtls_hws *hws;
- unsigned char *ivs;
- u16 number_of_ivs;
- struct page *page;
- int err = 0;
-
- csk = rcu_dereference_sk_user_data(sk);
- hws = &csk->tlshws;
- number_of_ivs = nos_ivs(sk, skb->len);
-
- if (number_of_ivs > MAX_IVS_PAGE) {
- pr_warn("MAX IVs in PAGE exceeded %d\n", number_of_ivs);
- return -ENOMEM;
- }
-
- /* generate the IVs */
- ivs = kmalloc_array(CIPHER_BLOCK_SIZE, number_of_ivs, GFP_ATOMIC);
- if (!ivs)
- return -ENOMEM;
- get_random_bytes(ivs, number_of_ivs * CIPHER_BLOCK_SIZE);
-
- if (skb_ulp_tls_iv_imm(skb)) {
- /* send the IVs as immediate data in the WR */
- iv_loc = (unsigned char *)__skb_push(skb, number_of_ivs *
- CIPHER_BLOCK_SIZE);
- if (iv_loc)
- memcpy(iv_loc, ivs, number_of_ivs * CIPHER_BLOCK_SIZE);
-
- hws->ivsize = number_of_ivs * CIPHER_BLOCK_SIZE;
- } else {
- /* Send the IVs as sgls */
- /* Already accounted IV DSGL for credits */
- skb_shinfo(skb)->nr_frags--;
- page = alloc_pages(sk->sk_allocation | __GFP_COMP, 0);
- if (!page) {
- pr_info("%s : Page allocation for IVs failed\n",
- __func__);
- err = -ENOMEM;
- goto out;
- }
- memcpy(page_address(page), ivs, number_of_ivs *
- CIPHER_BLOCK_SIZE);
- skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, 0,
- number_of_ivs * CIPHER_BLOCK_SIZE);
- hws->ivsize = 0;
- }
-out:
- kfree(ivs);
- return err;
-}
-
-/* Copy Key to WR */
-static void tls_copy_tx_key(struct sock *sk, struct sk_buff *skb)
-{
- struct ulptx_sc_memrd *sc_memrd;
- struct chtls_sock *csk;
- struct chtls_dev *cdev;
- struct ulptx_idata *sc;
- struct chtls_hws *hws;
- u32 immdlen;
- int kaddr;
-
- csk = rcu_dereference_sk_user_data(sk);
- hws = &csk->tlshws;
- cdev = csk->cdev;
-
- immdlen = sizeof(*sc) + sizeof(*sc_memrd);
- kaddr = keyid_to_addr(cdev->kmap.start, hws->txkey);
- sc = (struct ulptx_idata *)__skb_push(skb, immdlen);
- if (sc) {
- sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
- sc->len = htonl(0);
- sc_memrd = (struct ulptx_sc_memrd *)(sc + 1);
- sc_memrd->cmd_to_len =
- htonl(ULPTX_CMD_V(ULP_TX_SC_MEMRD) |
- ULP_TX_SC_MORE_V(1) |
- ULPTX_LEN16_V(hws->keylen >> 4));
- sc_memrd->addr = htonl(kaddr);
- }
-}
-
-static u64 tlstx_incr_seqnum(struct chtls_hws *hws)
-{
- return hws->tx_seq_no++;
-}
-
-static bool is_sg_request(const struct sk_buff *skb)
-{
- return skb->peeked ||
- (skb->len > MAX_IMM_ULPTX_WR_LEN);
-}
-
-/*
- * Returns true if an sk_buff carries urgent data.
- */
-static bool skb_urgent(struct sk_buff *skb)
-{
- return ULP_SKB_CB(skb)->flags & ULPCB_FLAG_URG;
-}
-
-/* TLS content type for CPL SFO */
-static unsigned char tls_content_type(unsigned char content_type)
-{
- switch (content_type) {
- case TLS_HDR_TYPE_CCS:
- return CPL_TX_TLS_SFO_TYPE_CCS;
- case TLS_HDR_TYPE_ALERT:
- return CPL_TX_TLS_SFO_TYPE_ALERT;
- case TLS_HDR_TYPE_HANDSHAKE:
- return CPL_TX_TLS_SFO_TYPE_HANDSHAKE;
- case TLS_HDR_TYPE_HEARTBEAT:
- return CPL_TX_TLS_SFO_TYPE_HEARTBEAT;
- }
- return CPL_TX_TLS_SFO_TYPE_DATA;
-}
-
-static void tls_tx_data_wr(struct sock *sk, struct sk_buff *skb,
- int dlen, int tls_immd, u32 credits,
- int expn, int pdus)
-{
- struct fw_tlstx_data_wr *req_wr;
- struct cpl_tx_tls_sfo *req_cpl;
- unsigned int wr_ulp_mode_force;
- struct tls_scmd *updated_scmd;
- unsigned char data_type;
- struct chtls_sock *csk;
- struct net_device *dev;
- struct chtls_hws *hws;
- struct tls_scmd *scmd;
- struct adapter *adap;
- unsigned char *req;
- int immd_len;
- int iv_imm;
- int len;
-
- csk = rcu_dereference_sk_user_data(sk);
- iv_imm = skb_ulp_tls_iv_imm(skb);
- dev = csk->egress_dev;
- adap = netdev2adap(dev);
- hws = &csk->tlshws;
- scmd = &hws->scmd;
- len = dlen + expn;
-
- dlen = (dlen < hws->mfs) ? dlen : hws->mfs;
- atomic_inc(&adap->chcr_stats.tls_pdu_tx);
-
- updated_scmd = scmd;
- updated_scmd->seqno_numivs &= 0xffffff80;
- updated_scmd->seqno_numivs |= SCMD_NUM_IVS_V(pdus);
- hws->scmd = *updated_scmd;
-
- req = (unsigned char *)__skb_push(skb, sizeof(struct cpl_tx_tls_sfo));
- req_cpl = (struct cpl_tx_tls_sfo *)req;
- req = (unsigned char *)__skb_push(skb, (sizeof(struct
- fw_tlstx_data_wr)));
-
- req_wr = (struct fw_tlstx_data_wr *)req;
- immd_len = (tls_immd ? dlen : 0);
- req_wr->op_to_immdlen =
- htonl(FW_WR_OP_V(FW_TLSTX_DATA_WR) |
- FW_TLSTX_DATA_WR_COMPL_V(1) |
- FW_TLSTX_DATA_WR_IMMDLEN_V(immd_len));
- req_wr->flowid_len16 = htonl(FW_TLSTX_DATA_WR_FLOWID_V(csk->tid) |
- FW_TLSTX_DATA_WR_LEN16_V(credits));
- wr_ulp_mode_force = TX_ULP_MODE_V(ULP_MODE_TLS);
-
- if (is_sg_request(skb))
- wr_ulp_mode_force |= FW_OFLD_TX_DATA_WR_ALIGNPLD_F |
- ((tcp_sk(sk)->nonagle & TCP_NAGLE_OFF) ? 0 :
- FW_OFLD_TX_DATA_WR_SHOVE_F);
-
- req_wr->lsodisable_to_flags =
- htonl(TX_ULP_MODE_V(ULP_MODE_TLS) |
- TX_URG_V(skb_urgent(skb)) |
- T6_TX_FORCE_F | wr_ulp_mode_force |
- TX_SHOVE_V((!csk_flag(sk, CSK_TX_MORE_DATA)) &&
- skb_queue_empty(&csk->txq)));
-
- req_wr->ctxloc_to_exp =
- htonl(FW_TLSTX_DATA_WR_NUMIVS_V(pdus) |
- FW_TLSTX_DATA_WR_EXP_V(expn) |
- FW_TLSTX_DATA_WR_CTXLOC_V(CHTLS_KEY_CONTEXT_DDR) |
- FW_TLSTX_DATA_WR_IVDSGL_V(!iv_imm) |
- FW_TLSTX_DATA_WR_KEYSIZE_V(hws->keylen >> 4));
-
- /* Fill in the length */
- req_wr->plen = htonl(len);
- req_wr->mfs = htons(hws->mfs);
- req_wr->adjustedplen_pkd =
- htons(FW_TLSTX_DATA_WR_ADJUSTEDPLEN_V(hws->adjustlen));
- req_wr->expinplenmax_pkd =
- htons(FW_TLSTX_DATA_WR_EXPINPLENMAX_V(hws->expansion));
- req_wr->pdusinplenmax_pkd =
- FW_TLSTX_DATA_WR_PDUSINPLENMAX_V(hws->pdus);
- req_wr->r10 = 0;
-
- data_type = tls_content_type(ULP_SKB_CB(skb)->ulp.tls.type);
- req_cpl->op_to_seg_len = htonl(CPL_TX_TLS_SFO_OPCODE_V(CPL_TX_TLS_SFO) |
- CPL_TX_TLS_SFO_DATA_TYPE_V(data_type) |
- CPL_TX_TLS_SFO_CPL_LEN_V(2) |
- CPL_TX_TLS_SFO_SEG_LEN_V(dlen));
- req_cpl->pld_len = htonl(len - expn);
-
- req_cpl->type_protover = htonl(CPL_TX_TLS_SFO_TYPE_V
- ((data_type == CPL_TX_TLS_SFO_TYPE_HEARTBEAT) ?
- TLS_HDR_TYPE_HEARTBEAT : 0) |
- CPL_TX_TLS_SFO_PROTOVER_V(0));
-
- /* create the s-command */
- req_cpl->r1_lo = 0;
- req_cpl->seqno_numivs = cpu_to_be32(hws->scmd.seqno_numivs);
- req_cpl->ivgen_hdrlen = cpu_to_be32(hws->scmd.ivgen_hdrlen);
- req_cpl->scmd1 = cpu_to_be64(tlstx_incr_seqnum(hws));
-}
-
-/*
- * Calculate the TLS data expansion size
- */
-static int chtls_expansion_size(struct sock *sk, int data_len,
- int fullpdu,
- unsigned short *pducnt)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct chtls_hws *hws = &csk->tlshws;
- struct tls_scmd *scmd = &hws->scmd;
- int fragsize = hws->mfs;
- int expnsize = 0;
- int fragleft;
- int fragcnt;
- int expppdu;
-
- if (SCMD_CIPH_MODE_G(scmd->seqno_numivs) ==
- SCMD_CIPH_MODE_AES_GCM) {
- expppdu = GCM_TAG_SIZE + AEAD_EXPLICIT_DATA_SIZE +
- TLS_HEADER_LENGTH;
-
- if (fullpdu) {
- *pducnt = data_len / (expppdu + fragsize);
- if (*pducnt > 32)
- *pducnt = 32;
- else if (!*pducnt)
- *pducnt = 1;
- expnsize = (*pducnt) * expppdu;
- return expnsize;
- }
- fragcnt = (data_len / fragsize);
- expnsize = fragcnt * expppdu;
- fragleft = data_len % fragsize;
- if (fragleft > 0)
- expnsize += expppdu;
- }
- return expnsize;
-}
-
-/* WR with IV, KEY and CPL SFO added */
-static void make_tlstx_data_wr(struct sock *sk, struct sk_buff *skb,
- int tls_tx_imm, int tls_len, u32 credits)
-{
- unsigned short pdus_per_ulp = 0;
- struct chtls_sock *csk;
- struct chtls_hws *hws;
- int expn_sz;
- int pdus;
-
- csk = rcu_dereference_sk_user_data(sk);
- hws = &csk->tlshws;
- pdus = DIV_ROUND_UP(tls_len, hws->mfs);
- expn_sz = chtls_expansion_size(sk, tls_len, 0, NULL);
- if (!hws->compute) {
- hws->expansion = chtls_expansion_size(sk,
- hws->fcplenmax,
- 1, &pdus_per_ulp);
- hws->pdus = pdus_per_ulp;
- hws->adjustlen = hws->pdus *
- ((hws->expansion / hws->pdus) + hws->mfs);
- hws->compute = 1;
- }
- if (tls_copy_ivs(sk, skb))
- return;
- tls_copy_tx_key(sk, skb);
- tls_tx_data_wr(sk, skb, tls_len, tls_tx_imm, credits, expn_sz, pdus);
- hws->tx_seq_no += (pdus - 1);
-}
-
-static void make_tx_data_wr(struct sock *sk, struct sk_buff *skb,
- unsigned int immdlen, int len,
- u32 credits, u32 compl)
-{
- struct fw_ofld_tx_data_wr *req;
- unsigned int wr_ulp_mode_force;
- struct chtls_sock *csk;
- unsigned int opcode;
-
- csk = rcu_dereference_sk_user_data(sk);
- opcode = FW_OFLD_TX_DATA_WR;
-
- req = (struct fw_ofld_tx_data_wr *)__skb_push(skb, sizeof(*req));
- req->op_to_immdlen = htonl(WR_OP_V(opcode) |
- FW_WR_COMPL_V(compl) |
- FW_WR_IMMDLEN_V(immdlen));
- req->flowid_len16 = htonl(FW_WR_FLOWID_V(csk->tid) |
- FW_WR_LEN16_V(credits));
-
- wr_ulp_mode_force = TX_ULP_MODE_V(csk->ulp_mode);
- if (is_sg_request(skb))
- wr_ulp_mode_force |= FW_OFLD_TX_DATA_WR_ALIGNPLD_F |
- ((tcp_sk(sk)->nonagle & TCP_NAGLE_OFF) ? 0 :
- FW_OFLD_TX_DATA_WR_SHOVE_F);
-
- req->tunnel_to_proxy = htonl(wr_ulp_mode_force |
- TX_URG_V(skb_urgent(skb)) |
- TX_SHOVE_V((!csk_flag(sk, CSK_TX_MORE_DATA)) &&
- skb_queue_empty(&csk->txq)));
- req->plen = htonl(len);
-}
-
-static int chtls_wr_size(struct chtls_sock *csk, const struct sk_buff *skb,
- bool size)
-{
- int wr_size;
-
- wr_size = TLS_WR_CPL_LEN;
- wr_size += KEY_ON_MEM_SZ;
- wr_size += ivs_size(csk->sk, skb);
-
- if (size)
- return wr_size;
-
- /* frags counted for IV dsgl */
- if (!skb_ulp_tls_iv_imm(skb))
- skb_shinfo(skb)->nr_frags++;
-
- return wr_size;
-}
-
-static bool is_ofld_imm(struct chtls_sock *csk, const struct sk_buff *skb)
-{
- int length = skb->len;
-
- if (skb->peeked || skb->len > MAX_IMM_ULPTX_WR_LEN)
- return false;
-
- if (likely(ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NEED_HDR)) {
- /* Check TLS header len for Immediate */
- if (csk->ulp_mode == ULP_MODE_TLS &&
- skb_ulp_tls_inline(skb))
- length += chtls_wr_size(csk, skb, true);
- else
- length += sizeof(struct fw_ofld_tx_data_wr);
-
- return length <= MAX_IMM_OFLD_TX_DATA_WR_LEN;
- }
- return true;
-}
-
-static unsigned int calc_tx_flits(const struct sk_buff *skb,
- unsigned int immdlen)
-{
- unsigned int flits, cnt;
-
- flits = immdlen / 8; /* headers */
- cnt = skb_shinfo(skb)->nr_frags;
- if (skb_tail_pointer(skb) != skb_transport_header(skb))
- cnt++;
- return flits + sgl_len(cnt);
-}
-
-static void arp_failure_discard(void *handle, struct sk_buff *skb)
-{
- kfree_skb(skb);
-}
-
-int chtls_push_frames(struct chtls_sock *csk, int comp)
-{
- struct chtls_hws *hws = &csk->tlshws;
- struct tcp_sock *tp;
- struct sk_buff *skb;
- int total_size = 0;
- struct sock *sk;
- int wr_size;
-
- wr_size = sizeof(struct fw_ofld_tx_data_wr);
- sk = csk->sk;
- tp = tcp_sk(sk);
-
- if (unlikely(sk_in_state(sk, TCPF_SYN_SENT | TCPF_CLOSE)))
- return 0;
-
- if (unlikely(csk_flag(sk, CSK_ABORT_SHUTDOWN)))
- return 0;
-
- while (csk->wr_credits && (skb = skb_peek(&csk->txq)) &&
- (!(ULP_SKB_CB(skb)->flags & ULPCB_FLAG_HOLD) ||
- skb_queue_len(&csk->txq) > 1)) {
- unsigned int credit_len = skb->len;
- unsigned int credits_needed;
- unsigned int completion = 0;
- int tls_len = skb->len;/* TLS data len before IV/key */
- unsigned int immdlen;
- int len = skb->len; /* length [ulp bytes] inserted by hw */
- int flowclen16 = 0;
- int tls_tx_imm = 0;
-
- immdlen = skb->len;
- if (!is_ofld_imm(csk, skb)) {
- immdlen = skb_transport_offset(skb);
- if (skb_ulp_tls_inline(skb))
- wr_size = chtls_wr_size(csk, skb, false);
- credit_len = 8 * calc_tx_flits(skb, immdlen);
- } else {
- if (skb_ulp_tls_inline(skb)) {
- wr_size = chtls_wr_size(csk, skb, false);
- tls_tx_imm = 1;
- }
- }
- if (likely(ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NEED_HDR))
- credit_len += wr_size;
- credits_needed = DIV_ROUND_UP(credit_len, 16);
- if (!csk_flag_nochk(csk, CSK_TX_DATA_SENT)) {
- flowclen16 = send_tx_flowc_wr(sk, 1, tp->snd_nxt,
- tp->rcv_nxt);
- if (flowclen16 <= 0)
- break;
- csk->wr_credits -= flowclen16;
- csk->wr_unacked += flowclen16;
- csk->wr_nondata += flowclen16;
- csk_set_flag(csk, CSK_TX_DATA_SENT);
- }
-
- if (csk->wr_credits < credits_needed) {
- if (skb_ulp_tls_inline(skb) &&
- !skb_ulp_tls_iv_imm(skb))
- skb_shinfo(skb)->nr_frags--;
- break;
- }
-
- __skb_unlink(skb, &csk->txq);
- skb_set_queue_mapping(skb, (csk->txq_idx << 1) |
- CPL_PRIORITY_DATA);
- if (hws->ofld)
- hws->txqid = (skb->queue_mapping >> 1);
- skb->csum = (__force __wsum)(credits_needed + csk->wr_nondata);
- csk->wr_credits -= credits_needed;
- csk->wr_unacked += credits_needed;
- csk->wr_nondata = 0;
- enqueue_wr(csk, skb);
-
- if (likely(ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NEED_HDR)) {
- if ((comp && csk->wr_unacked == credits_needed) ||
- (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_COMPL) ||
- csk->wr_unacked >= csk->wr_max_credits / 2) {
- completion = 1;
- csk->wr_unacked = 0;
- }
- if (skb_ulp_tls_inline(skb))
- make_tlstx_data_wr(sk, skb, tls_tx_imm,
- tls_len, credits_needed);
- else
- make_tx_data_wr(sk, skb, immdlen, len,
- credits_needed, completion);
- tp->snd_nxt += len;
- tp->lsndtime = tcp_jiffies32;
- if (completion)
- ULP_SKB_CB(skb)->flags &= ~ULPCB_FLAG_NEED_HDR;
- } else {
- struct cpl_close_con_req *req = cplhdr(skb);
- unsigned int cmd = CPL_OPCODE_G(ntohl
- (OPCODE_TID(req)));
-
- if (cmd == CPL_CLOSE_CON_REQ)
- csk_set_flag(csk,
- CSK_CLOSE_CON_REQUESTED);
-
- if ((ULP_SKB_CB(skb)->flags & ULPCB_FLAG_COMPL) &&
- (csk->wr_unacked >= csk->wr_max_credits / 2)) {
- req->wr.wr_hi |= htonl(FW_WR_COMPL_F);
- csk->wr_unacked = 0;
- }
- }
- total_size += skb->truesize;
- if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_BARRIER)
- csk_set_flag(csk, CSK_TX_WAIT_IDLE);
- t4_set_arp_err_handler(skb, NULL, arp_failure_discard);
- cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
- }
- sk->sk_wmem_queued -= total_size;
- return total_size;
-}
-
-static void mark_urg(struct tcp_sock *tp, int flags,
- struct sk_buff *skb)
-{
- if (unlikely(flags & MSG_OOB)) {
- tp->snd_up = tp->write_seq;
- ULP_SKB_CB(skb)->flags = ULPCB_FLAG_URG |
- ULPCB_FLAG_BARRIER |
- ULPCB_FLAG_NO_APPEND |
- ULPCB_FLAG_NEED_HDR;
- }
-}
-
-/*
- * Returns true if a connection should send more data to TCP engine
- */
-static bool should_push(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct chtls_dev *cdev = csk->cdev;
- struct tcp_sock *tp = tcp_sk(sk);
-
- /*
- * If we've released our offload resources there's nothing to do ...
- */
- if (!cdev)
- return false;
-
- /*
- * If there aren't any work requests in flight, or there isn't enough
- * data in flight, or Nagle is off then send the current TX_DATA
- * otherwise hold it and wait to accumulate more data.
- */
- return csk->wr_credits == csk->wr_max_credits ||
- (tp->nonagle & TCP_NAGLE_OFF);
-}
-
-/*
- * Returns true if a TCP socket is corked.
- */
-static bool corked(const struct tcp_sock *tp, int flags)
-{
- return (flags & MSG_MORE) || (tp->nonagle & TCP_NAGLE_CORK);
-}
-
-/*
- * Returns true if a send should try to push new data.
- */
-static bool send_should_push(struct sock *sk, int flags)
-{
- return should_push(sk) && !corked(tcp_sk(sk), flags);
-}
-
-void chtls_tcp_push(struct sock *sk, int flags)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- int qlen = skb_queue_len(&csk->txq);
-
- if (likely(qlen)) {
- struct sk_buff *skb = skb_peek_tail(&csk->txq);
- struct tcp_sock *tp = tcp_sk(sk);
-
- mark_urg(tp, flags, skb);
-
- if (!(ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NO_APPEND) &&
- corked(tp, flags)) {
- ULP_SKB_CB(skb)->flags |= ULPCB_FLAG_HOLD;
- return;
- }
-
- ULP_SKB_CB(skb)->flags &= ~ULPCB_FLAG_HOLD;
- if (qlen == 1 &&
- ((ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NO_APPEND) ||
- should_push(sk)))
- chtls_push_frames(csk, 1);
- }
-}
-
-/*
- * Calculate the size for a new send sk_buff. It's maximum size so we can
- * pack lots of data into it, unless we plan to send it immediately, in which
- * case we size it more tightly.
- *
- * Note: we don't bother compensating for MSS < PAGE_SIZE because it doesn't
- * arise in normal cases and when it does we are just wasting memory.
- */
-static int select_size(struct sock *sk, int io_len, int flags, int len)
-{
- const int pgbreak = SKB_MAX_HEAD(len);
-
- /*
- * If the data wouldn't fit in the main body anyway, put only the
- * header in the main body so it can use immediate data and place all
- * the payload in page fragments.
- */
- if (io_len > pgbreak)
- return 0;
-
- /*
- * If we will be accumulating payload get a large main body.
- */
- if (!send_should_push(sk, flags))
- return pgbreak;
-
- return io_len;
-}
-
-void skb_entail(struct sock *sk, struct sk_buff *skb, int flags)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct tcp_sock *tp = tcp_sk(sk);
-
- ULP_SKB_CB(skb)->seq = tp->write_seq;
- ULP_SKB_CB(skb)->flags = flags;
- __skb_queue_tail(&csk->txq, skb);
- sk->sk_wmem_queued += skb->truesize;
-
- if (TCP_PAGE(sk) && TCP_OFF(sk)) {
- put_page(TCP_PAGE(sk));
- TCP_PAGE(sk) = NULL;
- TCP_OFF(sk) = 0;
- }
-}
-
-static struct sk_buff *get_tx_skb(struct sock *sk, int size)
-{
- struct sk_buff *skb;
-
- skb = alloc_skb(size + TX_HEADER_LEN, sk->sk_allocation);
- if (likely(skb)) {
- skb_reserve(skb, TX_HEADER_LEN);
- skb_entail(sk, skb, ULPCB_FLAG_NEED_HDR);
- skb_reset_transport_header(skb);
- }
- return skb;
-}
-
-static struct sk_buff *get_record_skb(struct sock *sk, int size, bool zcopy)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct sk_buff *skb;
-
- skb = alloc_skb(((zcopy ? 0 : size) + TX_TLSHDR_LEN +
- KEY_ON_MEM_SZ + max_ivs_size(sk, size)),
- sk->sk_allocation);
- if (likely(skb)) {
- skb_reserve(skb, (TX_TLSHDR_LEN +
- KEY_ON_MEM_SZ + max_ivs_size(sk, size)));
- skb_entail(sk, skb, ULPCB_FLAG_NEED_HDR);
- skb_reset_transport_header(skb);
- ULP_SKB_CB(skb)->ulp.tls.ofld = 1;
- ULP_SKB_CB(skb)->ulp.tls.type = csk->tlshws.type;
- }
- return skb;
-}
-
-static void tx_skb_finalize(struct sk_buff *skb)
-{
- struct ulp_skb_cb *cb = ULP_SKB_CB(skb);
-
- if (!(cb->flags & ULPCB_FLAG_NO_HDR))
- cb->flags = ULPCB_FLAG_NEED_HDR;
- cb->flags |= ULPCB_FLAG_NO_APPEND;
-}
-
-static void push_frames_if_head(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
-
- if (skb_queue_len(&csk->txq) == 1)
- chtls_push_frames(csk, 1);
-}
-
-static int chtls_skb_copy_to_page_nocache(struct sock *sk,
- struct iov_iter *from,
- struct sk_buff *skb,
- struct page *page,
- int off, int copy)
-{
- int err;
-
- err = skb_do_copy_data_nocache(sk, skb, from, page_address(page) +
- off, copy, skb->len);
- if (err)
- return err;
-
- skb->len += copy;
- skb->data_len += copy;
- skb->truesize += copy;
- sk->sk_wmem_queued += copy;
- return 0;
-}
-
-static bool csk_mem_free(struct chtls_dev *cdev, struct sock *sk)
-{
- return (cdev->max_host_sndbuf - sk->sk_wmem_queued > 0);
-}
-
-static int csk_wait_memory(struct chtls_dev *cdev,
- struct sock *sk, long *timeo_p)
-{
- DEFINE_WAIT_FUNC(wait, woken_wake_function);
- int ret, err = 0;
- long current_timeo;
- long vm_wait = 0;
- bool noblock;
-
- current_timeo = *timeo_p;
- noblock = (*timeo_p ? false : true);
- if (csk_mem_free(cdev, sk)) {
- current_timeo = get_random_u32_below(HZ / 5) + 2;
- vm_wait = get_random_u32_below(HZ / 5) + 2;
- }
-
- add_wait_queue(sk_sleep(sk), &wait);
- while (1) {
- sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
- if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
- goto do_error;
- if (!*timeo_p) {
- if (noblock)
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
- goto do_nonblock;
- }
- if (signal_pending(current))
- goto do_interrupted;
- sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
- if (csk_mem_free(cdev, sk) && !vm_wait)
- break;
-
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
- sk->sk_write_pending++;
- ret = sk_wait_event(sk, &current_timeo, sk->sk_err ||
- (sk->sk_shutdown & SEND_SHUTDOWN) ||
- (csk_mem_free(cdev, sk) && !vm_wait),
- &wait);
- sk->sk_write_pending--;
- if (ret < 0)
- goto do_error;
-
- if (vm_wait) {
- vm_wait -= current_timeo;
- current_timeo = *timeo_p;
- if (current_timeo != MAX_SCHEDULE_TIMEOUT) {
- current_timeo -= vm_wait;
- if (current_timeo < 0)
- current_timeo = 0;
- }
- vm_wait = 0;
- }
- *timeo_p = current_timeo;
- }
-do_rm_wq:
- remove_wait_queue(sk_sleep(sk), &wait);
- return err;
-do_error:
- err = -EPIPE;
- goto do_rm_wq;
-do_nonblock:
- err = -EAGAIN;
- goto do_rm_wq;
-do_interrupted:
- err = sock_intr_errno(*timeo_p);
- goto do_rm_wq;
-}
-
-static int chtls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
- unsigned char *record_type)
-{
- struct cmsghdr *cmsg;
- int rc = -EINVAL;
-
- for_each_cmsghdr(cmsg, msg) {
- if (!CMSG_OK(msg, cmsg))
- return -EINVAL;
- if (cmsg->cmsg_level != SOL_TLS)
- continue;
-
- switch (cmsg->cmsg_type) {
- case TLS_SET_RECORD_TYPE:
- if (cmsg->cmsg_len < CMSG_LEN(sizeof(*record_type)))
- return -EINVAL;
-
- if (msg->msg_flags & MSG_MORE)
- return -EINVAL;
-
- *record_type = *(unsigned char *)CMSG_DATA(cmsg);
- rc = 0;
- break;
- default:
- return -EINVAL;
- }
- }
-
- return rc;
-}
-
-int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct chtls_dev *cdev = csk->cdev;
- struct tcp_sock *tp = tcp_sk(sk);
- struct sk_buff *skb;
- int mss, flags, err;
- int recordsz = 0;
- int copied = 0;
- long timeo;
-
- lock_sock(sk);
- flags = msg->msg_flags;
- timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
-
- if (!sk_in_state(sk, TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
- err = sk_stream_wait_connect(sk, &timeo);
- if (err)
- goto out_err;
- }
-
- sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
- err = -EPIPE;
- if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
- goto out_err;
-
- mss = csk->mss;
- csk_set_flag(csk, CSK_TX_MORE_DATA);
-
- while (msg_data_left(msg)) {
- int copy = 0;
-
- skb = skb_peek_tail(&csk->txq);
- if (skb) {
- copy = mss - skb->len;
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- }
- if (!csk_mem_free(cdev, sk))
- goto wait_for_sndbuf;
-
- if (is_tls_tx(csk) && !csk->tlshws.txleft) {
- unsigned char record_type = TLS_RECORD_TYPE_DATA;
-
- if (unlikely(msg->msg_controllen)) {
- err = chtls_proccess_cmsg(sk, msg,
- &record_type);
- if (err)
- goto out_err;
-
- /* Avoid appending tls handshake, alert to tls data */
- if (skb)
- tx_skb_finalize(skb);
- }
-
- recordsz = size;
- csk->tlshws.txleft = recordsz;
- csk->tlshws.type = record_type;
- }
-
- if (!skb || (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NO_APPEND) ||
- copy <= 0) {
-new_buf:
- if (skb) {
- tx_skb_finalize(skb);
- push_frames_if_head(sk);
- }
-
- if (is_tls_tx(csk)) {
- skb = get_record_skb(sk,
- select_size(sk,
- recordsz,
- flags,
- TX_TLSHDR_LEN),
- false);
- } else {
- skb = get_tx_skb(sk,
- select_size(sk, size, flags,
- TX_HEADER_LEN));
- }
- if (unlikely(!skb))
- goto wait_for_memory;
-
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- copy = mss;
- }
- if (copy > size)
- copy = size;
-
- if (msg->msg_flags & MSG_SPLICE_PAGES) {
- err = skb_splice_from_iter(skb, &msg->msg_iter, copy);
- if (err < 0) {
- if (err == -EMSGSIZE)
- goto new_buf;
- goto do_fault;
- }
- copy = err;
- sk_wmem_queued_add(sk, copy);
- } else if (skb_tailroom(skb) > 0) {
- copy = min(copy, skb_tailroom(skb));
- if (is_tls_tx(csk))
- copy = min_t(int, copy, csk->tlshws.txleft);
- err = skb_add_data_nocache(sk, skb,
- &msg->msg_iter, copy);
- if (err)
- goto do_fault;
- } else {
- int i = skb_shinfo(skb)->nr_frags;
- struct page *page = TCP_PAGE(sk);
- int pg_size = PAGE_SIZE;
- int off = TCP_OFF(sk);
- bool merge;
-
- if (page)
- pg_size = page_size(page);
- if (off < pg_size &&
- skb_can_coalesce(skb, i, page, off)) {
- merge = true;
- goto copy;
- }
- merge = false;
- if (i == (is_tls_tx(csk) ? (MAX_SKB_FRAGS - 1) :
- MAX_SKB_FRAGS))
- goto new_buf;
-
- if (page && off == pg_size) {
- put_page(page);
- TCP_PAGE(sk) = page = NULL;
- pg_size = PAGE_SIZE;
- }
-
- if (!page) {
- gfp_t gfp = sk->sk_allocation;
- int order = cdev->send_page_order;
-
- if (order) {
- page = alloc_pages(gfp | __GFP_COMP |
- __GFP_NOWARN |
- __GFP_NORETRY,
- order);
- if (page)
- pg_size <<= order;
- }
- if (!page) {
- page = alloc_page(gfp);
- pg_size = PAGE_SIZE;
- }
- if (!page)
- goto wait_for_memory;
- off = 0;
- }
-copy:
- if (copy > pg_size - off)
- copy = pg_size - off;
- if (is_tls_tx(csk))
- copy = min_t(int, copy, csk->tlshws.txleft);
-
- err = chtls_skb_copy_to_page_nocache(sk, &msg->msg_iter,
- skb, page,
- off, copy);
- if (unlikely(err)) {
- if (!TCP_PAGE(sk)) {
- TCP_PAGE(sk) = page;
- TCP_OFF(sk) = 0;
- }
- goto do_fault;
- }
- /* Update the skb. */
- if (merge) {
- skb_frag_size_add(
- &skb_shinfo(skb)->frags[i - 1],
- copy);
- } else {
- skb_fill_page_desc(skb, i, page, off, copy);
- if (off + copy < pg_size) {
- /* space left keep page */
- get_page(page);
- TCP_PAGE(sk) = page;
- } else {
- TCP_PAGE(sk) = NULL;
- }
- }
- TCP_OFF(sk) = off + copy;
- }
- if (unlikely(skb->len == mss))
- tx_skb_finalize(skb);
- tp->write_seq += copy;
- copied += copy;
- size -= copy;
-
- if (is_tls_tx(csk))
- csk->tlshws.txleft -= copy;
-
- if (corked(tp, flags) &&
- (sk_stream_wspace(sk) < sk_stream_min_wspace(sk)))
- ULP_SKB_CB(skb)->flags |= ULPCB_FLAG_NO_APPEND;
-
- if (size == 0)
- goto out;
-
- if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NO_APPEND)
- push_frames_if_head(sk);
- continue;
-wait_for_sndbuf:
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
- err = csk_wait_memory(cdev, sk, &timeo);
- if (err)
- goto do_error;
- }
-out:
- csk_reset_flag(csk, CSK_TX_MORE_DATA);
- if (copied)
- chtls_tcp_push(sk, flags);
-done:
- release_sock(sk);
- return copied;
-do_fault:
- if (!skb->len) {
- __skb_unlink(skb, &csk->txq);
- sk->sk_wmem_queued -= skb->truesize;
- __kfree_skb(skb);
- }
-do_error:
- if (copied)
- goto out;
-out_err:
- if (csk_conn_inline(csk))
- csk_reset_flag(csk, CSK_TX_MORE_DATA);
- copied = sk_stream_error(sk, flags, err);
- goto done;
-}
-
-void chtls_splice_eof(struct socket *sock)
-{
- struct sock *sk = sock->sk;
-
- lock_sock(sk);
- chtls_tcp_push(sk, 0);
- release_sock(sk);
-}
-
-static void chtls_select_window(struct sock *sk)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct tcp_sock *tp = tcp_sk(sk);
- unsigned int wnd = tp->rcv_wnd;
-
- wnd = max_t(unsigned int, wnd, tcp_full_space(sk));
- wnd = max_t(unsigned int, MIN_RCV_WND, wnd);
-
- if (wnd > MAX_RCV_WND)
- wnd = MAX_RCV_WND;
-
-/*
- * Check if we need to grow the receive window in response to an increase in
- * the socket's receive buffer size. Some applications increase the buffer
- * size dynamically and rely on the window to grow accordingly.
- */
-
- if (wnd > tp->rcv_wnd) {
- tp->rcv_wup -= wnd - tp->rcv_wnd;
- tp->rcv_wnd = wnd;
- /* Mark the receive window as updated */
- csk_reset_flag(csk, CSK_UPDATE_RCV_WND);
- }
-}
-
-/*
- * Send RX credits through an RX_DATA_ACK CPL message. We are permitted
- * to return without sending the message in case we cannot allocate
- * an sk_buff. Returns the number of credits sent.
- */
-static u32 send_rx_credits(struct chtls_sock *csk, u32 credits)
-{
- struct cpl_rx_data_ack *req;
- struct sk_buff *skb;
-
- skb = alloc_skb(sizeof(*req), GFP_ATOMIC);
- if (!skb)
- return 0;
- __skb_put(skb, sizeof(*req));
- req = (struct cpl_rx_data_ack *)skb->head;
-
- set_wr_txq(skb, CPL_PRIORITY_ACK, csk->port_id);
- INIT_TP_WR(req, csk->tid);
- OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK,
- csk->tid));
- req->credit_dack = cpu_to_be32(RX_CREDITS_V(credits) |
- RX_FORCE_ACK_F);
- cxgb4_ofld_send(csk->cdev->ports[csk->port_id], skb);
- return credits;
-}
-
-#define CREDIT_RETURN_STATE (TCPF_ESTABLISHED | \
- TCPF_FIN_WAIT1 | \
- TCPF_FIN_WAIT2)
-
-/*
- * Called after some received data has been read. It returns RX credits
- * to the HW for the amount of data processed.
- */
-static void chtls_cleanup_rbuf(struct sock *sk, int copied)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct tcp_sock *tp;
- int must_send;
- u32 credits;
- u32 thres;
-
- thres = 15 * 1024;
-
- if (!sk_in_state(sk, CREDIT_RETURN_STATE))
- return;
-
- chtls_select_window(sk);
- tp = tcp_sk(sk);
- credits = tp->copied_seq - tp->rcv_wup;
- if (unlikely(!credits))
- return;
-
-/*
- * For coalescing to work effectively ensure the receive window has
- * at least 16KB left.
- */
- must_send = credits + 16384 >= tp->rcv_wnd;
-
- if (must_send || credits >= thres)
- tp->rcv_wup += send_rx_credits(csk, credits);
-}
-
-static int chtls_pt_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
- int flags)
-{
- struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
- struct chtls_hws *hws = &csk->tlshws;
- struct net_device *dev = csk->egress_dev;
- struct adapter *adap = netdev2adap(dev);
- struct tcp_sock *tp = tcp_sk(sk);
- unsigned long avail;
- int buffers_freed;
- int copied = 0;
- int target;
- long timeo;
- int ret;
-
- buffers_freed = 0;
-
- timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
- target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
-
- if (unlikely(csk_flag(sk, CSK_UPDATE_RCV_WND)))
- chtls_cleanup_rbuf(sk, copied);
-
- do {
- struct sk_buff *skb;
- u32 offset = 0;
-
- if (unlikely(tp->urg_data &&
- tp->urg_seq == tp->copied_seq)) {
- if (copied)
- break;
- if (signal_pending(current)) {
- copied = timeo ? sock_intr_errno(timeo) :
- -EAGAIN;
- break;
- }
- }
- skb = skb_peek(&sk->sk_receive_queue);
- if (skb)
- goto found_ok_skb;
- if (csk->wr_credits &&
- skb_queue_len(&csk->txq) &&
- chtls_push_frames(csk, csk->wr_credits ==
- csk->wr_max_credits))
- sk->sk_write_space(sk);
-
- if (copied >= target && !READ_ONCE(sk->sk_backlog.tail))
- break;
-
- if (copied) {
- if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
- (sk->sk_shutdown & RCV_SHUTDOWN) ||
- signal_pending(current))
- break;
-
- if (!timeo)
- break;
- } else {
- if (sock_flag(sk, SOCK_DONE))
- break;
- if (sk->sk_err) {
- copied = sock_error(sk);
- break;
- }
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- break;
- if (sk->sk_state == TCP_CLOSE) {
- copied = -ENOTCONN;
- break;
- }
- if (!timeo) {
- copied = -EAGAIN;
- break;
- }
- if (signal_pending(current)) {
- copied = sock_intr_errno(timeo);
- break;
- }
- }
- if (READ_ONCE(sk->sk_backlog.tail)) {
- release_sock(sk);
- lock_sock(sk);
- chtls_cleanup_rbuf(sk, copied);
- continue;
- }
-
- if (copied >= target)
- break;
- chtls_cleanup_rbuf(sk, copied);
- ret = sk_wait_data(sk, &timeo, NULL);
- if (ret < 0) {
- copied = copied ? : ret;
- goto unlock;
- }
- continue;
-found_ok_skb:
- if (!skb->len) {
- skb_dstref_steal(skb);
- __skb_unlink(skb, &sk->sk_receive_queue);
- kfree_skb(skb);
-
- if (!copied && !timeo) {
- copied = -EAGAIN;
- break;
- }
-
- if (copied < target) {
- release_sock(sk);
- lock_sock(sk);
- continue;
- }
- break;
- }
- offset = hws->copied_seq;
- avail = skb->len - offset;
- if (len < avail)
- avail = len;
-
- if (unlikely(tp->urg_data)) {
- u32 urg_offset = tp->urg_seq - tp->copied_seq;
-
- if (urg_offset < avail) {
- if (urg_offset) {
- avail = urg_offset;
- } else if (!sock_flag(sk, SOCK_URGINLINE)) {
- /* First byte is urgent, skip */
- tp->copied_seq++;
- offset++;
- avail--;
- if (!avail)
- goto skip_copy;
- }
- }
- }
- /* Set record type if not already done. For a non-data record,
- * do not proceed if record type could not be copied.
- */
- if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_TLS_HDR) {
- struct tls_hdr *thdr = (struct tls_hdr *)skb->data;
- int cerr = 0;
-
- cerr = put_cmsg(msg, SOL_TLS, TLS_GET_RECORD_TYPE,
- sizeof(thdr->type), &thdr->type);
-
- if (cerr && thdr->type != TLS_RECORD_TYPE_DATA) {
- copied = -EIO;
- break;
- }
- /* don't send tls header, skip copy */
- goto skip_copy;
- }
-
- if (skb_copy_datagram_msg(skb, offset, msg, avail)) {
- if (!copied) {
- copied = -EFAULT;
- break;
- }
- }
-
- copied += avail;
- len -= avail;
- hws->copied_seq += avail;
-skip_copy:
- if (tp->urg_data && after(tp->copied_seq, tp->urg_seq))
- tp->urg_data = 0;
-
- if ((avail + offset) >= skb->len) {
- struct sk_buff *next_skb;
- if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_TLS_HDR) {
- tp->copied_seq += skb->len;
- hws->rcvpld = skb->hdr_len;
- } else {
- atomic_inc(&adap->chcr_stats.tls_pdu_rx);
- tp->copied_seq += hws->rcvpld;
- }
- chtls_free_skb(sk, skb);
- buffers_freed++;
- hws->copied_seq = 0;
- next_skb = skb_peek(&sk->sk_receive_queue);
- if (copied >= target && !next_skb)
- break;
- if (ULP_SKB_CB(next_skb)->flags & ULPCB_FLAG_TLS_HDR)
- break;
- }
- } while (len > 0);
-
- if (buffers_freed)
- chtls_cleanup_rbuf(sk, copied);
-
-unlock:
- release_sock(sk);
- return copied;
-}
-
-/*
- * Peek at data in a socket's receive buffer.
- */
-static int peekmsg(struct sock *sk, struct msghdr *msg,
- size_t len, int flags)
-{
- struct tcp_sock *tp = tcp_sk(sk);
- u32 peek_seq, offset;
- struct sk_buff *skb;
- int copied = 0;
- size_t avail; /* amount of available data in current skb */
- long timeo;
- int ret;
-
- lock_sock(sk);
- timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
- peek_seq = tp->copied_seq;
-
- do {
- if (unlikely(tp->urg_data && tp->urg_seq == peek_seq)) {
- if (copied)
- break;
- if (signal_pending(current)) {
- copied = timeo ? sock_intr_errno(timeo) :
- -EAGAIN;
- break;
- }
- }
-
- skb_queue_walk(&sk->sk_receive_queue, skb) {
- offset = peek_seq - ULP_SKB_CB(skb)->seq;
- if (offset < skb->len)
- goto found_ok_skb;
- }
-
- /* empty receive queue */
- if (copied)
- break;
- if (sock_flag(sk, SOCK_DONE))
- break;
- if (sk->sk_err) {
- copied = sock_error(sk);
- break;
- }
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- break;
- if (sk->sk_state == TCP_CLOSE) {
- copied = -ENOTCONN;
- break;
- }
- if (!timeo) {
- copied = -EAGAIN;
- break;
- }
- if (signal_pending(current)) {
- copied = sock_intr_errno(timeo);
- break;
- }
-
- if (READ_ONCE(sk->sk_backlog.tail)) {
- /* Do not sleep, just process backlog. */
- release_sock(sk);
- lock_sock(sk);
- } else {
- ret = sk_wait_data(sk, &timeo, NULL);
- if (ret < 0) {
- /* here 'copied' is 0 due to previous checks */
- copied = ret;
- break;
- }
- }
-
- if (unlikely(peek_seq != tp->copied_seq)) {
- if (net_ratelimit())
- pr_info("TCP(%s:%d), race in MSG_PEEK.\n",
- current->comm, current->pid);
- peek_seq = tp->copied_seq;
- }
- continue;
-
-found_ok_skb:
- avail = skb->len - offset;
- if (len < avail)
- avail = len;
- /*
- * Do we have urgent data here? We need to skip over the
- * urgent byte.
- */
- if (unlikely(tp->urg_data)) {
- u32 urg_offset = tp->urg_seq - peek_seq;
-
- if (urg_offset < avail) {
- /*
- * The amount of data we are preparing to copy
- * contains urgent data.
- */
- if (!urg_offset) { /* First byte is urgent */
- if (!sock_flag(sk, SOCK_URGINLINE)) {
- peek_seq++;
- offset++;
- avail--;
- }
- if (!avail)
- continue;
- } else {
- /* stop short of the urgent data */
- avail = urg_offset;
- }
- }
- }
-
- /*
- * If MSG_TRUNC is specified the data is discarded.
- */
- if (likely(!(flags & MSG_TRUNC)))
- if (skb_copy_datagram_msg(skb, offset, msg, len)) {
- if (!copied) {
- copied = -EFAULT;
- break;
- }
- }
- peek_seq += avail;
- copied += avail;
- len -= avail;
- } while (len > 0);
-
- release_sock(sk);
- return copied;
-}
-
-int chtls_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
- int flags)
-{
- struct tcp_sock *tp = tcp_sk(sk);
- struct chtls_sock *csk;
- unsigned long avail; /* amount of available data in current skb */
- int buffers_freed;
- int copied = 0;
- long timeo;
- int target; /* Read at least this many bytes */
- int ret;
-
- buffers_freed = 0;
-
- if (unlikely(flags & MSG_OOB))
- return tcp_prot.recvmsg(sk, msg, len, flags);
-
- if (unlikely(flags & MSG_PEEK))
- return peekmsg(sk, msg, len, flags);
-
- if (sk_can_busy_loop(sk) &&
- skb_queue_empty_lockless(&sk->sk_receive_queue) &&
- sk->sk_state == TCP_ESTABLISHED)
- sk_busy_loop(sk, flags & MSG_DONTWAIT);
-
- lock_sock(sk);
- csk = rcu_dereference_sk_user_data(sk);
-
- if (is_tls_rx(csk))
- return chtls_pt_recvmsg(sk, msg, len, flags);
-
- timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
- target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
-
- if (unlikely(csk_flag(sk, CSK_UPDATE_RCV_WND)))
- chtls_cleanup_rbuf(sk, copied);
-
- do {
- struct sk_buff *skb;
- u32 offset;
-
- if (unlikely(tp->urg_data && tp->urg_seq == tp->copied_seq)) {
- if (copied)
- break;
- if (signal_pending(current)) {
- copied = timeo ? sock_intr_errno(timeo) :
- -EAGAIN;
- break;
- }
- }
-
- skb = skb_peek(&sk->sk_receive_queue);
- if (skb)
- goto found_ok_skb;
-
- if (csk->wr_credits &&
- skb_queue_len(&csk->txq) &&
- chtls_push_frames(csk, csk->wr_credits ==
- csk->wr_max_credits))
- sk->sk_write_space(sk);
-
- if (copied >= target && !READ_ONCE(sk->sk_backlog.tail))
- break;
-
- if (copied) {
- if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
- (sk->sk_shutdown & RCV_SHUTDOWN) ||
- signal_pending(current))
- break;
- } else {
- if (sock_flag(sk, SOCK_DONE))
- break;
- if (sk->sk_err) {
- copied = sock_error(sk);
- break;
- }
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- break;
- if (sk->sk_state == TCP_CLOSE) {
- copied = -ENOTCONN;
- break;
- }
- if (!timeo) {
- copied = -EAGAIN;
- break;
- }
- if (signal_pending(current)) {
- copied = sock_intr_errno(timeo);
- break;
- }
- }
-
- if (READ_ONCE(sk->sk_backlog.tail)) {
- release_sock(sk);
- lock_sock(sk);
- chtls_cleanup_rbuf(sk, copied);
- continue;
- }
-
- if (copied >= target)
- break;
- chtls_cleanup_rbuf(sk, copied);
- ret = sk_wait_data(sk, &timeo, NULL);
- if (ret < 0) {
- copied = copied ? : ret;
- goto unlock;
- }
- continue;
-
-found_ok_skb:
- if (!skb->len) {
- chtls_kfree_skb(sk, skb);
- if (!copied && !timeo) {
- copied = -EAGAIN;
- break;
- }
-
- if (copied < target)
- continue;
-
- break;
- }
-
- offset = tp->copied_seq - ULP_SKB_CB(skb)->seq;
- avail = skb->len - offset;
- if (len < avail)
- avail = len;
-
- if (unlikely(tp->urg_data)) {
- u32 urg_offset = tp->urg_seq - tp->copied_seq;
-
- if (urg_offset < avail) {
- if (urg_offset) {
- avail = urg_offset;
- } else if (!sock_flag(sk, SOCK_URGINLINE)) {
- tp->copied_seq++;
- offset++;
- avail--;
- if (!avail)
- goto skip_copy;
- }
- }
- }
-
- if (likely(!(flags & MSG_TRUNC))) {
- if (skb_copy_datagram_msg(skb, offset,
- msg, avail)) {
- if (!copied) {
- copied = -EFAULT;
- break;
- }
- }
- }
-
- tp->copied_seq += avail;
- copied += avail;
- len -= avail;
-
-skip_copy:
- if (tp->urg_data && after(tp->copied_seq, tp->urg_seq))
- tp->urg_data = 0;
-
- if (avail + offset >= skb->len) {
- chtls_free_skb(sk, skb);
- buffers_freed++;
-
- if (copied >= target &&
- !skb_peek(&sk->sk_receive_queue))
- break;
- }
- } while (len > 0);
-
- if (buffers_freed)
- chtls_cleanup_rbuf(sk, copied);
-
-unlock:
- release_sock(sk);
- return copied;
-}
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
deleted file mode 100644
index 2570575434f9..000000000000
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
+++ /dev/null
@@ -1,642 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (c) 2018 Chelsio Communications, Inc.
- *
- * Written by: Atul Gupta (atul.gupta@chelsio.com)
- */
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/socket.h>
-#include <linux/hash.h>
-#include <linux/in.h>
-#include <linux/net.h>
-#include <linux/ip.h>
-#include <linux/tcp.h>
-#include <net/ipv6.h>
-#include <net/transp_v6.h>
-#include <net/tcp.h>
-#include <net/tls.h>
-
-#include "chtls.h"
-#include "chtls_cm.h"
-
-#define DRV_NAME "chtls"
-
-/*
- * chtls device management
- * maintains a list of the chtls devices
- */
-static LIST_HEAD(cdev_list);
-static DEFINE_MUTEX(cdev_mutex);
-
-static DEFINE_MUTEX(notify_mutex);
-static RAW_NOTIFIER_HEAD(listen_notify_list);
-static struct proto chtls_cpl_prot, chtls_cpl_protv6;
-struct request_sock_ops chtls_rsk_ops, chtls_rsk_opsv6;
-static uint send_page_order = (14 - PAGE_SHIFT < 0) ? 0 : 14 - PAGE_SHIFT;
-
-static void register_listen_notifier(struct notifier_block *nb)
-{
- mutex_lock(&notify_mutex);
- raw_notifier_chain_register(&listen_notify_list, nb);
- mutex_unlock(&notify_mutex);
-}
-
-static void unregister_listen_notifier(struct notifier_block *nb)
-{
- mutex_lock(&notify_mutex);
- raw_notifier_chain_unregister(&listen_notify_list, nb);
- mutex_unlock(&notify_mutex);
-}
-
-static int listen_notify_handler(struct notifier_block *this,
- unsigned long event, void *data)
-{
- struct chtls_listen *clisten;
- int ret = NOTIFY_DONE;
-
- clisten = (struct chtls_listen *)data;
-
- switch (event) {
- case CHTLS_LISTEN_START:
- ret = chtls_listen_start(clisten->cdev, clisten->sk);
- kfree(clisten);
- break;
- case CHTLS_LISTEN_STOP:
- chtls_listen_stop(clisten->cdev, clisten->sk);
- kfree(clisten);
- break;
- }
- return ret;
-}
-
-static struct notifier_block listen_notifier = {
- .notifier_call = listen_notify_handler
-};
-
-static int listen_backlog_rcv(struct sock *sk, struct sk_buff *skb)
-{
- if (likely(skb_transport_header(skb) != skb_network_header(skb)))
- return tcp_v4_do_rcv(sk, skb);
- BLOG_SKB_CB(skb)->backlog_rcv(sk, skb);
- return 0;
-}
-
-static int chtls_start_listen(struct chtls_dev *cdev, struct sock *sk)
-{
- struct chtls_listen *clisten;
-
- if (sk->sk_protocol != IPPROTO_TCP)
- return -EPROTONOSUPPORT;
-
- if (sk->sk_family == PF_INET &&
- LOOPBACK(inet_sk(sk)->inet_rcv_saddr))
- return -EADDRNOTAVAIL;
-
- sk->sk_backlog_rcv = listen_backlog_rcv;
- clisten = kmalloc_obj(*clisten);
- if (!clisten)
- return -ENOMEM;
- clisten->cdev = cdev;
- clisten->sk = sk;
- mutex_lock(&notify_mutex);
- raw_notifier_call_chain(&listen_notify_list,
- CHTLS_LISTEN_START, clisten);
- mutex_unlock(&notify_mutex);
- return 0;
-}
-
-static void chtls_stop_listen(struct chtls_dev *cdev, struct sock *sk)
-{
- struct chtls_listen *clisten;
-
- if (sk->sk_protocol != IPPROTO_TCP)
- return;
-
- clisten = kmalloc_obj(*clisten);
- if (!clisten)
- return;
- clisten->cdev = cdev;
- clisten->sk = sk;
- mutex_lock(&notify_mutex);
- raw_notifier_call_chain(&listen_notify_list,
- CHTLS_LISTEN_STOP, clisten);
- mutex_unlock(&notify_mutex);
-}
-
-static int chtls_inline_feature(struct tls_toe_device *dev)
-{
- struct net_device *netdev;
- struct chtls_dev *cdev;
- int i;
-
- cdev = to_chtls_dev(dev);
-
- for (i = 0; i < cdev->lldi->nports; i++) {
- netdev = cdev->ports[i];
- if (netdev->features & NETIF_F_HW_TLS_RECORD)
- return 1;
- }
- return 0;
-}
-
-static int chtls_create_hash(struct tls_toe_device *dev, struct sock *sk)
-{
- struct chtls_dev *cdev = to_chtls_dev(dev);
-
- if (sk->sk_state == TCP_LISTEN)
- return chtls_start_listen(cdev, sk);
- return 0;
-}
-
-static void chtls_destroy_hash(struct tls_toe_device *dev, struct sock *sk)
-{
- struct chtls_dev *cdev = to_chtls_dev(dev);
-
- if (sk->sk_state == TCP_LISTEN)
- chtls_stop_listen(cdev, sk);
-}
-
-static void chtls_free_uld(struct chtls_dev *cdev)
-{
- int i;
-
- tls_toe_unregister_device(&cdev->tlsdev);
- kvfree(cdev->kmap.addr);
- idr_destroy(&cdev->hwtid_idr);
- for (i = 0; i < (1 << RSPQ_HASH_BITS); i++)
- kfree_skb(cdev->rspq_skb_cache[i]);
- kfree(cdev->lldi);
- kfree_skb(cdev->askb);
- kfree(cdev);
-}
-
-static inline void chtls_dev_release(struct kref *kref)
-{
- struct tls_toe_device *dev;
- struct chtls_dev *cdev;
- struct adapter *adap;
-
- dev = container_of(kref, struct tls_toe_device, kref);
- cdev = to_chtls_dev(dev);
-
- /* Reset tls rx/tx stats */
- adap = pci_get_drvdata(cdev->pdev);
- atomic_set(&adap->chcr_stats.tls_pdu_tx, 0);
- atomic_set(&adap->chcr_stats.tls_pdu_rx, 0);
-
- chtls_free_uld(cdev);
-}
-
-static void chtls_register_dev(struct chtls_dev *cdev)
-{
- struct tls_toe_device *tlsdev = &cdev->tlsdev;
-
- strscpy(tlsdev->name, "chtls", TLS_TOE_DEVICE_NAME_MAX);
- strlcat(tlsdev->name, cdev->lldi->ports[0]->name,
- TLS_TOE_DEVICE_NAME_MAX);
- tlsdev->feature = chtls_inline_feature;
- tlsdev->hash = chtls_create_hash;
- tlsdev->unhash = chtls_destroy_hash;
- tlsdev->release = chtls_dev_release;
- kref_init(&tlsdev->kref);
- tls_toe_register_device(tlsdev);
- cdev->cdev_state = CHTLS_CDEV_STATE_UP;
-}
-
-static void process_deferq(struct work_struct *task_param)
-{
- struct chtls_dev *cdev = container_of(task_param,
- struct chtls_dev, deferq_task);
- struct sk_buff *skb;
-
- spin_lock_bh(&cdev->deferq.lock);
- while ((skb = __skb_dequeue(&cdev->deferq)) != NULL) {
- spin_unlock_bh(&cdev->deferq.lock);
- DEFERRED_SKB_CB(skb)->handler(cdev, skb);
- spin_lock_bh(&cdev->deferq.lock);
- }
- spin_unlock_bh(&cdev->deferq.lock);
-}
-
-static int chtls_get_skb(struct chtls_dev *cdev)
-{
- cdev->askb = alloc_skb(sizeof(struct tcphdr), GFP_KERNEL);
- if (!cdev->askb)
- return -ENOMEM;
-
- skb_put(cdev->askb, sizeof(struct tcphdr));
- skb_reset_transport_header(cdev->askb);
- memset(cdev->askb->data, 0, cdev->askb->len);
- return 0;
-}
-
-static void *chtls_uld_add(const struct cxgb4_lld_info *info)
-{
- struct cxgb4_lld_info *lldi;
- struct chtls_dev *cdev;
- int i, j;
-
- cdev = kzalloc_obj(*cdev);
- if (!cdev)
- goto out;
-
- lldi = kzalloc_obj(*lldi);
- if (!lldi)
- goto out_lldi;
-
- if (chtls_get_skb(cdev))
- goto out_skb;
-
- *lldi = *info;
- cdev->lldi = lldi;
- cdev->pdev = lldi->pdev;
- cdev->tids = lldi->tids;
- cdev->ports = lldi->ports;
- cdev->mtus = lldi->mtus;
- cdev->tids = lldi->tids;
- cdev->pfvf = FW_VIID_PFN_G(cxgb4_port_viid(lldi->ports[0]))
- << FW_VIID_PFN_S;
-
- for (i = 0; i < (1 << RSPQ_HASH_BITS); i++) {
- unsigned int size = 64 - sizeof(struct rsp_ctrl) - 8;
-
- cdev->rspq_skb_cache[i] = __alloc_skb(size,
- gfp_any(), 0,
- lldi->nodeid);
- if (unlikely(!cdev->rspq_skb_cache[i]))
- goto out_rspq_skb;
- }
-
- idr_init(&cdev->hwtid_idr);
- INIT_WORK(&cdev->deferq_task, process_deferq);
- spin_lock_init(&cdev->listen_lock);
- spin_lock_init(&cdev->idr_lock);
- cdev->send_page_order = min_t(uint, get_order(32768),
- send_page_order);
- cdev->max_host_sndbuf = 48 * 1024;
-
- if (lldi->vr->key.size)
- if (chtls_init_kmap(cdev, lldi))
- goto out_rspq_skb;
-
- mutex_lock(&cdev_mutex);
- list_add_tail(&cdev->list, &cdev_list);
- mutex_unlock(&cdev_mutex);
-
- return cdev;
-out_rspq_skb:
- for (j = 0; j < i; j++)
- kfree_skb(cdev->rspq_skb_cache[j]);
- kfree_skb(cdev->askb);
-out_skb:
- kfree(lldi);
-out_lldi:
- kfree(cdev);
-out:
- return NULL;
-}
-
-static void chtls_free_all_uld(void)
-{
- struct chtls_dev *cdev, *tmp;
-
- mutex_lock(&cdev_mutex);
- list_for_each_entry_safe(cdev, tmp, &cdev_list, list) {
- if (cdev->cdev_state == CHTLS_CDEV_STATE_UP) {
- list_del(&cdev->list);
- kref_put(&cdev->tlsdev.kref, cdev->tlsdev.release);
- }
- }
- mutex_unlock(&cdev_mutex);
-}
-
-static int chtls_uld_state_change(void *handle, enum cxgb4_state new_state)
-{
- struct chtls_dev *cdev = handle;
-
- switch (new_state) {
- case CXGB4_STATE_UP:
- chtls_register_dev(cdev);
- break;
- case CXGB4_STATE_DOWN:
- break;
- case CXGB4_STATE_START_RECOVERY:
- break;
- case CXGB4_STATE_DETACH:
- mutex_lock(&cdev_mutex);
- list_del(&cdev->list);
- mutex_unlock(&cdev_mutex);
- kref_put(&cdev->tlsdev.kref, cdev->tlsdev.release);
- break;
- default:
- break;
- }
- return 0;
-}
-
-static struct sk_buff *copy_gl_to_skb_pkt(const struct pkt_gl *gl,
- const __be64 *rsp,
- u32 pktshift)
-{
- struct sk_buff *skb;
-
- /* Allocate space for cpl_pass_accept_req which will be synthesized by
- * driver. Once driver synthesizes cpl_pass_accept_req the skb will go
- * through the regular cpl_pass_accept_req processing in TOM.
- */
- skb = alloc_skb(size_add(gl->tot_len,
- sizeof(struct cpl_pass_accept_req)) -
- pktshift, GFP_ATOMIC);
- if (unlikely(!skb))
- return NULL;
- __skb_put(skb, gl->tot_len + sizeof(struct cpl_pass_accept_req)
- - pktshift);
- /* For now we will copy cpl_rx_pkt in the skb */
- skb_copy_to_linear_data(skb, rsp, sizeof(struct cpl_rx_pkt));
- skb_copy_to_linear_data_offset(skb, sizeof(struct cpl_pass_accept_req)
- , gl->va + pktshift,
- gl->tot_len - pktshift);
-
- return skb;
-}
-
-static int chtls_recv_packet(struct chtls_dev *cdev,
- const struct pkt_gl *gl, const __be64 *rsp)
-{
- unsigned int opcode = *(u8 *)rsp;
- struct sk_buff *skb;
- int ret;
-
- skb = copy_gl_to_skb_pkt(gl, rsp, cdev->lldi->sge_pktshift);
- if (!skb)
- return -ENOMEM;
-
- ret = chtls_handlers[opcode](cdev, skb);
- if (ret & CPL_RET_BUF_DONE)
- kfree_skb(skb);
-
- return 0;
-}
-
-static int chtls_recv_rsp(struct chtls_dev *cdev, const __be64 *rsp)
-{
- unsigned long rspq_bin;
- unsigned int opcode;
- struct sk_buff *skb;
- unsigned int len;
- int ret;
-
- len = 64 - sizeof(struct rsp_ctrl) - 8;
- opcode = *(u8 *)rsp;
-
- rspq_bin = hash_ptr((void *)rsp, RSPQ_HASH_BITS);
- skb = cdev->rspq_skb_cache[rspq_bin];
- if (skb && !skb_is_nonlinear(skb) &&
- !skb_shared(skb) && !skb_cloned(skb)) {
- refcount_inc(&skb->users);
- if (refcount_read(&skb->users) == 2) {
- __skb_trim(skb, 0);
- if (skb_tailroom(skb) >= len)
- goto copy_out;
- }
- refcount_dec(&skb->users);
- }
- skb = alloc_skb(len, GFP_ATOMIC);
- if (unlikely(!skb))
- return -ENOMEM;
-
-copy_out:
- __skb_put(skb, len);
- skb_copy_to_linear_data(skb, rsp, len);
- skb_reset_network_header(skb);
- skb_reset_transport_header(skb);
- ret = chtls_handlers[opcode](cdev, skb);
-
- if (ret & CPL_RET_BUF_DONE)
- kfree_skb(skb);
- return 0;
-}
-
-static void chtls_recv(struct chtls_dev *cdev,
- struct sk_buff **skbs, const __be64 *rsp)
-{
- struct sk_buff *skb = *skbs;
- unsigned int opcode;
- int ret;
-
- opcode = *(u8 *)rsp;
-
- __skb_push(skb, sizeof(struct rss_header));
- skb_copy_to_linear_data(skb, rsp, sizeof(struct rss_header));
-
- ret = chtls_handlers[opcode](cdev, skb);
- if (ret & CPL_RET_BUF_DONE)
- kfree_skb(skb);
-}
-
-static int chtls_uld_rx_handler(void *handle, const __be64 *rsp,
- const struct pkt_gl *gl)
-{
- struct chtls_dev *cdev = handle;
- unsigned int opcode;
- struct sk_buff *skb;
-
- opcode = *(u8 *)rsp;
-
- if (unlikely(opcode == CPL_RX_PKT)) {
- if (chtls_recv_packet(cdev, gl, rsp) < 0)
- goto nomem;
- return 0;
- }
-
- if (!gl)
- return chtls_recv_rsp(cdev, rsp);
-
-#define RX_PULL_LEN 128
- skb = cxgb4_pktgl_to_skb(gl, RX_PULL_LEN, RX_PULL_LEN);
- if (unlikely(!skb))
- goto nomem;
- chtls_recv(cdev, &skb, rsp);
- return 0;
-
-nomem:
- return -ENOMEM;
-}
-
-static int do_chtls_getsockopt(struct sock *sk, char __user *optval,
- int __user *optlen)
-{
- struct tls_crypto_info crypto_info = { 0 };
-
- crypto_info.version = TLS_1_2_VERSION;
- if (copy_to_user(optval, &crypto_info, sizeof(struct tls_crypto_info)))
- return -EFAULT;
- return 0;
-}
-
-static int chtls_getsockopt(struct sock *sk, int level, int optname,
- char __user *optval, int __user *optlen)
-{
- struct tls_context *ctx = tls_get_ctx(sk);
-
- if (level != SOL_TLS)
- return ctx->sk_proto->getsockopt(sk, level,
- optname, optval, optlen);
-
- return do_chtls_getsockopt(sk, optval, optlen);
-}
-
-static int do_chtls_setsockopt(struct sock *sk, int optname,
- sockptr_t optval, unsigned int optlen)
-{
- struct tls_crypto_info *crypto_info, tmp_crypto_info;
- struct chtls_sock *csk;
- int keylen;
- int cipher_type;
- int rc = 0;
-
- csk = rcu_dereference_sk_user_data(sk);
-
- if (sockptr_is_null(optval) || optlen < sizeof(*crypto_info)) {
- rc = -EINVAL;
- goto out;
- }
-
- rc = copy_from_sockptr(&tmp_crypto_info, optval, sizeof(*crypto_info));
- if (rc) {
- rc = -EFAULT;
- goto out;
- }
-
- /* check version */
- if (tmp_crypto_info.version != TLS_1_2_VERSION) {
- rc = -ENOTSUPP;
- goto out;
- }
-
- crypto_info = (struct tls_crypto_info *)&csk->tlshws.crypto_info;
-
- /* GCM mode of AES supports 128 and 256 bit encryption, so
- * copy keys from user based on GCM cipher type.
- */
- switch (tmp_crypto_info.cipher_type) {
- case TLS_CIPHER_AES_GCM_128: {
- /* Obtain version and type from previous copy */
- crypto_info[0] = tmp_crypto_info;
- /* Now copy the following data */
- rc = copy_from_sockptr_offset((char *)crypto_info +
- sizeof(*crypto_info),
- optval, sizeof(*crypto_info),
- sizeof(struct tls12_crypto_info_aes_gcm_128)
- - sizeof(*crypto_info));
-
- if (rc) {
- rc = -EFAULT;
- goto out;
- }
-
- keylen = TLS_CIPHER_AES_GCM_128_KEY_SIZE;
- cipher_type = TLS_CIPHER_AES_GCM_128;
- break;
- }
- case TLS_CIPHER_AES_GCM_256: {
- crypto_info[0] = tmp_crypto_info;
- rc = copy_from_sockptr_offset((char *)crypto_info +
- sizeof(*crypto_info),
- optval, sizeof(*crypto_info),
- sizeof(struct tls12_crypto_info_aes_gcm_256)
- - sizeof(*crypto_info));
-
- if (rc) {
- rc = -EFAULT;
- goto out;
- }
-
- keylen = TLS_CIPHER_AES_GCM_256_KEY_SIZE;
- cipher_type = TLS_CIPHER_AES_GCM_256;
- break;
- }
- default:
- rc = -EINVAL;
- goto out;
- }
- rc = chtls_setkey(csk, keylen, optname, cipher_type);
-out:
- return rc;
-}
-
-static int chtls_setsockopt(struct sock *sk, int level, int optname,
- sockptr_t optval, unsigned int optlen)
-{
- struct tls_context *ctx = tls_get_ctx(sk);
-
- if (level != SOL_TLS)
- return ctx->sk_proto->setsockopt(sk, level,
- optname, optval, optlen);
-
- return do_chtls_setsockopt(sk, optname, optval, optlen);
-}
-
-static struct cxgb4_uld_info chtls_uld_info = {
- .name = DRV_NAME,
- .nrxq = MAX_ULD_QSETS,
- .ntxq = MAX_ULD_QSETS,
- .rxq_size = 1024,
- .add = chtls_uld_add,
- .state_change = chtls_uld_state_change,
- .rx_handler = chtls_uld_rx_handler,
-};
-
-void chtls_install_cpl_ops(struct sock *sk)
-{
- if (sk->sk_family == AF_INET)
- sk->sk_prot = &chtls_cpl_prot;
- else
- sk->sk_prot = &chtls_cpl_protv6;
-}
-
-static void __init chtls_init_ulp_ops(void)
-{
- chtls_cpl_prot = tcp_prot;
- chtls_init_rsk_ops(&chtls_cpl_prot, &chtls_rsk_ops,
- &tcp_prot, PF_INET);
- chtls_cpl_prot.close = chtls_close;
- chtls_cpl_prot.disconnect = chtls_disconnect;
- chtls_cpl_prot.destroy = chtls_destroy_sock;
- chtls_cpl_prot.shutdown = chtls_shutdown;
- chtls_cpl_prot.sendmsg = chtls_sendmsg;
- chtls_cpl_prot.splice_eof = chtls_splice_eof;
- chtls_cpl_prot.recvmsg = chtls_recvmsg;
- chtls_cpl_prot.setsockopt = chtls_setsockopt;
- chtls_cpl_prot.getsockopt = chtls_getsockopt;
-#if IS_ENABLED(CONFIG_IPV6)
- chtls_cpl_protv6 = chtls_cpl_prot;
- chtls_init_rsk_ops(&chtls_cpl_protv6, &chtls_rsk_opsv6,
- &tcpv6_prot, PF_INET6);
-#endif
-}
-
-static int __init chtls_register(void)
-{
- chtls_init_ulp_ops();
- register_listen_notifier(&listen_notifier);
- cxgb4_register_uld(CXGB4_ULD_TLS, &chtls_uld_info);
- return 0;
-}
-
-static void __exit chtls_unregister(void)
-{
- unregister_listen_notifier(&listen_notifier);
- chtls_free_all_uld();
- cxgb4_unregister_uld(CXGB4_ULD_TLS);
-}
-
-module_init(chtls_register);
-module_exit(chtls_unregister);
-
-MODULE_DESCRIPTION("Chelsio TLS Inline driver");
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Chelsio Communications");
-MODULE_VERSION(CHTLS_DRV_VERSION);
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 93e4da7046a1..8eb6b8033606 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -79,7 +79,7 @@ enum {
NETIF_F_HW_TLS_RX_BIT, /* Hardware TLS RX offload */
NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */
- NETIF_F_HW_TLS_RECORD_BIT, /* Offload TLS record */
+ __UNUSED_NETIF_F_56,
NETIF_F_GRO_FRAGLIST_BIT, /* Fraglist GRO */
NETIF_F_HW_MACSEC_BIT, /* Offload MACsec operations */
@@ -153,7 +153,6 @@ enum {
#define NETIF_F_HW_ESP __NETIF_F(HW_ESP)
#define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
#define NETIF_F_RX_UDP_TUNNEL_PORT __NETIF_F(RX_UDP_TUNNEL_PORT)
-#define NETIF_F_HW_TLS_RECORD __NETIF_F(HW_TLS_RECORD)
#define NETIF_F_GSO_UDP_L4 __NETIF_F(GSO_UDP_L4)
#define NETIF_F_HW_TLS_TX __NETIF_F(HW_TLS_TX)
#define NETIF_F_HW_TLS_RX __NETIF_F(HW_TLS_RX)
diff --git a/include/net/tls.h b/include/net/tls.h
index 3811943288b3..e57bef58851e 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -85,7 +85,6 @@ enum {
TLS_BASE,
TLS_SW,
TLS_HW,
- TLS_HW_RECORD,
TLS_NUM_CONFIG,
};
diff --git a/include/net/tls_toe.h b/include/net/tls_toe.h
deleted file mode 100644
index b3aa7593ce2c..000000000000
--- a/include/net/tls_toe.h
+++ /dev/null
@@ -1,77 +0,0 @@
-/*
- * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses. You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * - Redistributions of source code must retain the above
- * copyright notice, this list of conditions and the following
- * disclaimer.
- *
- * - Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials
- * provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include <linux/kref.h>
-#include <linux/list.h>
-
-struct sock;
-
-#define TLS_TOE_DEVICE_NAME_MAX 32
-
-/*
- * This structure defines the routines for Inline TLS driver.
- * The following routines are optional and filled with a
- * null pointer if not defined.
- *
- * @name: Its the name of registered Inline tls device
- * @dev_list: Inline tls device list
- * int (*feature)(struct tls_toe_device *device);
- * Called to return Inline TLS driver capability
- *
- * int (*hash)(struct tls_toe_device *device, struct sock *sk);
- * This function sets Inline driver for listen and program
- * device specific functioanlity as required
- *
- * void (*unhash)(struct tls_toe_device *device, struct sock *sk);
- * This function cleans listen state set by Inline TLS driver
- *
- * void (*release)(struct kref *kref);
- * Release the registered device and allocated resources
- * @kref: Number of reference to tls_toe_device
- */
-struct tls_toe_device {
- char name[TLS_TOE_DEVICE_NAME_MAX];
- struct list_head dev_list;
- int (*feature)(struct tls_toe_device *device);
- int (*hash)(struct tls_toe_device *device, struct sock *sk);
- void (*unhash)(struct tls_toe_device *device, struct sock *sk);
- void (*release)(struct kref *kref);
- struct kref kref;
-};
-
-int tls_toe_bypass(struct sock *sk);
-int tls_toe_hash(struct sock *sk);
-void tls_toe_unhash(struct sock *sk);
-
-void tls_toe_register_device(struct tls_toe_device *device);
-void tls_toe_unregister_device(struct tls_toe_device *device);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index b8b9c42f848c..1245ab38afc1 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -203,6 +203,6 @@ enum {
#define TLS_CONF_BASE 1
#define TLS_CONF_SW 2
#define TLS_CONF_HW 3
-#define TLS_CONF_HW_RECORD 4
+#define TLS_CONF_HW_RECORD 4 /* unused */
#endif /* _UAPI_LINUX_TLS_H */
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 84ec88dee05c..972fe0f02094 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -67,7 +67,6 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
[NETIF_F_HW_ESP_BIT] = "esp-hw-offload",
[NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload",
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload",
- [NETIF_F_HW_TLS_RECORD_BIT] = "tls-hw-record",
[NETIF_F_HW_TLS_TX_BIT] = "tls-hw-tx-offload",
[NETIF_F_HW_TLS_RX_BIT] = "tls-hw-rx-offload",
[NETIF_F_GRO_FRAGLIST_BIT] = "rx-gro-list",
diff --git a/net/tls/Kconfig b/net/tls/Kconfig
index a25bf57f2673..4f4d5973a28f 100644
--- a/net/tls/Kconfig
+++ b/net/tls/Kconfig
@@ -27,13 +27,3 @@ config TLS_DEVICE
Enable kernel support for HW offload of the TLS protocol.
If unsure, say N.
-
-config TLS_TOE
- bool "Transport Layer Security TCP stack bypass"
- depends on TLS
- default n
- help
- Enable kernel support for legacy HW offload of the TLS protocol,
- which is incompatible with the Linux networking stack semantics.
-
- If unsure, say N.
diff --git a/net/tls/Makefile b/net/tls/Makefile
index e41c800489ac..4c7d296081ee 100644
--- a/net/tls/Makefile
+++ b/net/tls/Makefile
@@ -9,5 +9,4 @@ obj-$(CONFIG_TLS) += tls.o
tls-y := tls_main.o tls_sw.o tls_proc.o trace.o tls_strp.o
-tls-$(CONFIG_TLS_TOE) += tls_toe.o
tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index c10a3fd7fc17..13c88a7b8787 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -43,8 +43,6 @@
#include <net/snmp.h>
#include <net/tls.h>
-#include <net/tls_toe.h>
-
#include "tls.h"
MODULE_AUTHOR("Mellanox Technologies");
@@ -963,9 +961,6 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
ops[TLS_HW ][TLS_HW ] = ops[TLS_HW ][TLS_SW ];
#endif
-#ifdef CONFIG_TLS_TOE
- ops[TLS_HW_RECORD][TLS_HW_RECORD] = *base;
-#endif
}
static void tls_build_proto(struct sock *sk)
@@ -1037,11 +1032,6 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
prot[TLS_HW][TLS_HW] = prot[TLS_HW][TLS_SW];
#endif
-#ifdef CONFIG_TLS_TOE
- prot[TLS_HW_RECORD][TLS_HW_RECORD] = *base;
- prot[TLS_HW_RECORD][TLS_HW_RECORD].hash = tls_toe_hash;
- prot[TLS_HW_RECORD][TLS_HW_RECORD].unhash = tls_toe_unhash;
-#endif
}
static int tls_init(struct sock *sk)
@@ -1051,11 +1041,6 @@ static int tls_init(struct sock *sk)
tls_build_proto(sk);
-#ifdef CONFIG_TLS_TOE
- if (tls_toe_bypass(sk))
- return 0;
-#endif
-
/* The TLS ulp is currently supported only for TCP sockets
* in ESTABLISHED state.
* Supporting sockets in LISTEN state will require us
@@ -1111,8 +1096,6 @@ static u16 tls_user_config(struct tls_context *ctx, bool tx)
return TLS_CONF_SW;
case TLS_HW:
return TLS_CONF_HW;
- case TLS_HW_RECORD:
- return TLS_CONF_HW_RECORD;
}
return 0;
}
diff --git a/net/tls/tls_toe.c b/net/tls/tls_toe.c
deleted file mode 100644
index 825669e1ab47..000000000000
--- a/net/tls/tls_toe.c
+++ /dev/null
@@ -1,141 +0,0 @@
-/*
- * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses. You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * - Redistributions of source code must retain the above
- * copyright notice, this list of conditions and the following
- * disclaimer.
- *
- * - Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials
- * provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include <linux/list.h>
-#include <linux/rcupdate.h>
-#include <linux/spinlock.h>
-#include <net/inet_connection_sock.h>
-#include <net/tls.h>
-#include <net/tls_toe.h>
-
-#include "tls.h"
-
-static LIST_HEAD(device_list);
-static DEFINE_SPINLOCK(device_spinlock);
-
-static void tls_toe_sk_destruct(struct sock *sk)
-{
- struct inet_connection_sock *icsk = inet_csk(sk);
- struct tls_context *ctx = tls_get_ctx(sk);
-
- ctx->sk_destruct(sk);
- /* Free ctx */
- rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
- tls_ctx_free(sk, ctx);
-}
-
-int tls_toe_bypass(struct sock *sk)
-{
- struct tls_toe_device *dev;
- struct tls_context *ctx;
- int rc = 0;
-
- spin_lock_bh(&device_spinlock);
- list_for_each_entry(dev, &device_list, dev_list) {
- if (dev->feature && dev->feature(dev)) {
- ctx = tls_ctx_create(sk);
- if (!ctx)
- goto out;
-
- ctx->sk_destruct = sk->sk_destruct;
- sk->sk_destruct = tls_toe_sk_destruct;
- ctx->rx_conf = TLS_HW_RECORD;
- ctx->tx_conf = TLS_HW_RECORD;
- update_sk_prot(sk, ctx);
- rc = 1;
- break;
- }
- }
-out:
- spin_unlock_bh(&device_spinlock);
- return rc;
-}
-
-void tls_toe_unhash(struct sock *sk)
-{
- struct tls_context *ctx = tls_get_ctx(sk);
- struct tls_toe_device *dev;
-
- spin_lock_bh(&device_spinlock);
- list_for_each_entry(dev, &device_list, dev_list) {
- if (dev->unhash) {
- kref_get(&dev->kref);
- spin_unlock_bh(&device_spinlock);
- dev->unhash(dev, sk);
- kref_put(&dev->kref, dev->release);
- spin_lock_bh(&device_spinlock);
- }
- }
- spin_unlock_bh(&device_spinlock);
- ctx->sk_proto->unhash(sk);
-}
-
-int tls_toe_hash(struct sock *sk)
-{
- struct tls_context *ctx = tls_get_ctx(sk);
- struct tls_toe_device *dev;
- int err;
-
- err = ctx->sk_proto->hash(sk);
- spin_lock_bh(&device_spinlock);
- list_for_each_entry(dev, &device_list, dev_list) {
- if (dev->hash) {
- kref_get(&dev->kref);
- spin_unlock_bh(&device_spinlock);
- err |= dev->hash(dev, sk);
- kref_put(&dev->kref, dev->release);
- spin_lock_bh(&device_spinlock);
- }
- }
- spin_unlock_bh(&device_spinlock);
-
- if (err)
- tls_toe_unhash(sk);
- return err;
-}
-
-void tls_toe_register_device(struct tls_toe_device *device)
-{
- spin_lock_bh(&device_spinlock);
- list_add_tail(&device->dev_list, &device_list);
- spin_unlock_bh(&device_spinlock);
-}
-EXPORT_SYMBOL(tls_toe_register_device);
-
-void tls_toe_unregister_device(struct tls_toe_device *device)
-{
- spin_lock_bh(&device_spinlock);
- list_del(&device->dev_list);
- spin_unlock_bh(&device_spinlock);
-}
-EXPORT_SYMBOL(tls_toe_unregister_device);
--
2.54.0
* Re: [PATCH net-next 1/3] docs: net: fix minor issues with XDP metadata docs
From: Jesper Dangaard Brouer @ 2026-06-11 10:06 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet, linux-doc,
bpf, skhan, ast, daniel, john.fastabend, sdf
In-Reply-To: <20260609201224.1191391-2-kuba@kernel.org>
On 09/06/2026 22.12, Jakub Kicinski wrote:
> Minor updates to the XDP metadata documentation:
> - s/union/struct/ for xsk_tx_metadata
> - document nested request and completion metadata fields
> - point capability queries at the xsk-features attribute
> - fix grammar in the XDP RX metadata guide
> - typos
>
> Signed-off-by: Jakub Kicinski<kuba@kernel.org>
> ---
> CC:corbet@lwn.net
> CC:skhan@linuxfoundation.org
> CC:ast@kernel.org
> CC:daniel@iogearbox.net
> CC:hawk@kernel.org
> CC:john.fastabend@gmail.com
> CC:sdf@fomichev.me
> CC:linux-doc@vger.kernel.org
> CC:bpf@vger.kernel.org
> ---
> Documentation/networking/xdp-rx-metadata.rst | 2 +-
> Documentation/networking/xsk-tx-metadata.rst | 30 +++++++++++---------
> 2 files changed, 17 insertions(+), 15 deletions(-)
LGTM
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
* Re: [PATCH net-next v3 0/3] Add standard stats for HSR/PRP
From: MD Danish Anwar @ 2026-06-11 9:51 UTC (permalink / raw)
To: Simon Horman
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jonathan Corbet, Shuah Khan, Roger Quadros, Andrew Lunn,
Jacob Keller, Meghana Malladi, David Carlier, Vadim Fedorenko,
Kevin Hao, Himanshu Mittal, Hangbin Liu, Markus Elfring,
Fernando Fernandez Mancera, Jan Vaclav, netdev, linux-doc,
linux-kernel, linux-arm-kernel, Felix Maurer, Luka Gejak
In-Reply-To: <20260610184737.GO3920875@horms.kernel.org>
Hi Simon,
On 11/06/26 12:17 am, Simon Horman wrote:
> On Mon, Jun 08, 2026 at 03:39:27PM +0530, MD Danish Anwar wrote:
>> Add standard stats for HSR / PRP. This series was initially adding HSR/PRP
>> related stats for ICSSG driver. Based on maintainers' comments on v2 I am
>> now adding support to dump standard stats for HSR/PRP.
>>
>> The drivers which support offload can populate these standard stats.
>>
>> This series only implements offloaded stats. For software-only interfaces
>> Felix Maurer had said he will do it later [1]
>>
>> v2 https://lore.kernel.org/all/20260514075605.850674-1-danishanwar@ti.com/
>> [1] https://lore.kernel.org/all/ag87pBZfOyccPZTc@thinkpad/
>>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Felix Maurer <fmaurer@redhat.com>
>> Cc: Luka Gejak <luka.gejak@linux.dev>
>
> Hi MD,
>
> There is AI-generated review of this patch-set available on both
> https://sashiko.dev and https://netdev-ai.bots.linux.dev/sashiko/
> I would appreciate it if you could look over that with a view
> to addressing any issues that directly affect this patch-set.
I did look at the AI-generated reviews. The reviews on patches 1/3 and
3/3 look like real issues to me, and I have fixed them. The review on
patch 2/3 is not related to the series.
I have posted v4 with these fixes:
https://lore.kernel.org/all/20260611095035.852370-1-danishanwar@ti.com/
--
Thanks and Regards,
Danish
* [PATCH net-next v4 3/3] net: ti: icssg: Add HSR offload statistics support
From: MD Danish Anwar @ 2026-06-11 9:50 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
Roger Quadros, Andrew Lunn, Meghana Malladi, Jacob Keller,
David Carlier, Vadim Fedorenko, Kevin Hao, Markus Elfring,
Hangbin Liu, Fernando Fernandez Mancera, Jan Vaclav
Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel
In-Reply-To: <20260611095035.852370-1-danishanwar@ti.com>
Add support for exposing ICSSG HSR statistics through two interfaces:
ethtool and the standard RTM_GETSTATS / IFLA_STATS_LINK_XSTATS path.
Add a standard_stats flag to struct icssg_pa_stats and extend
icssg_all_pa_stats[] with 10 new entries:
Firmware-specific HSR counters (standard_stats=false, ethtool only):
- FW_HSR_FWD_CHECK_FAIL_DROP
- FW_HSR_HE_CHECK_FAIL_DROP
- FW_HSR_SKIP_HOST_DUP_DISCARD
IEC 62439-3 LRE counters (standard_stats=true, excluded from ethtool):
- FW_LRE_CNT_UNIQUE_RX, FW_LRE_CNT_DUPLICATE_RX, FW_LRE_CNT_MULTIPLE_RX
- FW_LRE_CNT_RX, FW_LRE_CNT_TX, FW_LRE_CNT_OWN_RX
- FW_LRE_CNT_ERRWRONGLAN
The ethtool get_strings/get_ethtool_stats callbacks skip entries with
standard_stats=true so they do not appear as ethtool counters.
ICSSG_NUM_PA_STANDARD_STATS is introduced and accounted for in
ICSSG_NUM_ETHTOOL_STATS so the sset count stays accurate.
ICSSG_NUM_PA_STATS is updated from 32 to 42.
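As a rough check of the counting described above, the arithmetic can be sketched in user space. The macro values are copied from the icssg_prueth.h hunk in this patch; demo_sset_count() is a hypothetical stand-in for emac_get_sset_count(), and the sketch assumes that ICSSG_NUM_STANDARD_STATS counts MIIG entries only.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from icssg_prueth.h as changed by this patch. */
#define ICSSG_NUM_MIIG_STATS		60
#define ICSSG_NUM_PA_STATS		42
#define ICSSG_NUM_STANDARD_STATS	31
#define ICSSG_NUM_PA_STANDARD_STATS	7
#define ICSSG_NUM_STATS	(ICSSG_NUM_MIIG_STATS + ICSSG_NUM_PA_STATS)
#define ICSSG_NUM_ETHTOOL_STATS	(ICSSG_NUM_STATS - ICSSG_NUM_STANDARD_STATS - \
				 ICSSG_NUM_PA_STANDARD_STATS)

/* Hypothetical helper: ethtool string-set size with and without PA stats. */
static int demo_sset_count(bool have_pa_stats)
{
	if (have_pa_stats)
		return ICSSG_NUM_ETHTOOL_STATS;

	/* Only the PA entries that ethtool would have shown are subtracted. */
	return ICSSG_NUM_ETHTOOL_STATS -
	       (ICSSG_NUM_PA_STATS - ICSSG_NUM_PA_STANDARD_STATS);
}

/* 102 total - 31 MIIG standard - 7 PA standard = 64 ethtool counters. */
static_assert(ICSSG_NUM_ETHTOOL_STATS == 64, "unexpected ethtool stat count");
```

Under these assumptions the count is 64 with PA stats and 29 without them, which matches the 35 PA entries that emac_get_strings() still emits after skipping the 7 standard ones.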
Implement ndo_has_offload_stats() and ndo_get_offload_stats() in
emac_netdev_ops to expose the IEC 62439-3 LRE counters via the HSR
stack's RTM_GETSTATS / IFLA_STATS_LINK_XSTATS interface. The HSR stack
calls these NDOs on slave A; the callback reads PA stat registers for
both ports (MAC0 = port A, MAC1 = port B) from the shared prueth
instance and fills struct hsr_lre_stats. Port C counters are not
available in ICSSG hardware and remain at ~0ULL.
Export emac_update_hardware_stats() and emac_get_stat_by_name() as
GPL symbols so they can be called from icssg_prueth.c. Also change
emac_get_stat_by_name() return type from int to u64 and make it return
~0ULL on an unknown stat name instead of -EINVAL, consistent with the
hsr_lre_stats sentinel convention.
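The new return contract can be illustrated with a small user-space sketch. This is not the driver code: struct demo_stat, demo_tbl[] and demo_get_stat_by_name() are hypothetical names, and the table values are made up. It only shows a by-name lookup that returns an all-ones u64 for an unknown name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STAT_UNSUPPORTED	(~0ULL)	/* same sentinel value as hsr_lre_stats */

/* Hypothetical name/value pair standing in for a PA stat entry. */
struct demo_stat {
	const char *name;
	uint64_t value;
};

/* Look a counter up by name; an unknown name yields the sentinel. */
static uint64_t demo_get_stat_by_name(const struct demo_stat *tbl, size_t n,
				      const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (!strcmp(tbl[i].name, name))
			return tbl[i].value;

	return DEMO_STAT_UNSUPPORTED;
}

/* Made-up sample data for illustration only. */
static const struct demo_stat demo_tbl[] = {
	{ "FW_LRE_CNT_TX", 10 },
	{ "FW_LRE_CNT_RX", 12 },
};
```

With an int return type, -EINVAL stored into a u64 field would sign-extend to a large value that is not ~0ULL, so a caller comparing against the sentinel would not recognise it as "unsupported".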
Add FW_HSR_FWD_CHECK_FAIL_DROP and FW_HSR_HE_CHECK_FAIL_DROP to the
rx_dropped sum in ndo_get_stats64, as these represent frames discarded
by the HSR forwarding logic.
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
---
.../ethernet/ti/icssg_prueth.rst | 19 ++++
drivers/net/ethernet/ti/icssg/icssg_common.c | 7 +-
drivers/net/ethernet/ti/icssg/icssg_ethtool.c | 10 +-
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 92 +++++++++++++++++++
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 10 +-
drivers/net/ethernet/ti/icssg/icssg_stats.c | 6 +-
drivers/net/ethernet/ti/icssg/icssg_stats.h | 85 +++++++++--------
.../net/ethernet/ti/icssg/icssg_switch_map.h | 10 ++
8 files changed, 193 insertions(+), 46 deletions(-)
diff --git a/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst b/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst
index da21ddf431bbc..faa1fc18a6737 100644
--- a/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst
+++ b/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst
@@ -54,3 +54,22 @@ These statistics are as follows,
- ``FW_HOST_TX_PKT_CNT``: Number of valid packets copied by RTU0 to Tx queues
- ``FW_HOST_EGRESS_Q_PRE_OVERFLOW``: Host Egress Q (Pre-emptible) Overflow Counter
- ``FW_HOST_EGRESS_Q_EXP_OVERFLOW``: Host Egress Q (Pre-emptible) Overflow Counter
+ - ``FW_HSR_FWD_CHECK_FAIL_DROP``: Packets dropped on the HSR forwarding path due to failed checks
+ - ``FW_HSR_HE_CHECK_FAIL_DROP``: Packets dropped on the host egress path due to failed checks
+ - ``FW_HSR_SKIP_HOST_DUP_DISCARD``: Frames for which the host duplicate discard check was skipped
+
+HSR/LRE Standard Statistics
+============================
+
+When the ICSSG operates in HSR offload mode the driver exposes the IEC 62439-3
+LRE counters through the standard netlink stats interface.
+
+The following per-port (port A and port B) LRE counters are reported:
+
+ - ``lreCntTx``: Number of HSR/PRP tagged frames sent
+ - ``lreCntRx``: Number of HSR/PRP tagged frames received
+ - ``lreCntUnique``: Number of frames received with no duplicate detected
+ - ``lreCntDuplicate``: Number of frames received for which exactly one duplicate was detected
+ - ``lreCntMultiple``: Number of frames received for which more than one duplicate was detected
+ - ``lreCntOwnRx``: Number of HSR/PRP tagged frames received whose source MAC matches the node's own address
+ - ``lreCntErrWrongLan``: Number of frames received with a wrong LAN identifier (PRP only)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index a28a608f9bf4b..1fcb031949535 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -1643,7 +1643,12 @@ void icssg_ndo_get_stats64(struct net_device *ndev,
emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
- emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+ emac_get_stat_by_name(emac,
+ "FW_INF_DROP_NOTMEMBER") +
+ emac_get_stat_by_name(emac,
+ "FW_HSR_FWD_CHECK_FAIL_DROP") +
+ emac_get_stat_by_name(emac,
+ "FW_HSR_HE_CHECK_FAIL_DROP");
stats->tx_errors = ndev->stats.tx_errors;
stats->tx_dropped = ndev->stats.tx_dropped +
emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
diff --git a/drivers/net/ethernet/ti/icssg/icssg_ethtool.c b/drivers/net/ethernet/ti/icssg/icssg_ethtool.c
index b715af21d23ac..7a99c99aab1e8 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_ethtool.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_ethtool.c
@@ -74,7 +74,9 @@ static int emac_get_sset_count(struct net_device *ndev, int stringset)
if (emac->prueth->pa_stats)
return ICSSG_NUM_ETHTOOL_STATS;
else
- return ICSSG_NUM_ETHTOOL_STATS - ICSSG_NUM_PA_STATS;
+ return ICSSG_NUM_ETHTOOL_STATS -
+ (ICSSG_NUM_PA_STATS -
+ ICSSG_NUM_PA_STANDARD_STATS);
default:
return -EOPNOTSUPP;
}
@@ -93,7 +95,8 @@ static void emac_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
ethtool_puts(&p, icssg_all_miig_stats[i].name);
if (emac->prueth->pa_stats)
for (i = 0; i < ARRAY_SIZE(icssg_all_pa_stats); i++)
- ethtool_puts(&p, icssg_all_pa_stats[i].name);
+ if (!icssg_all_pa_stats[i].standard_stats)
+ ethtool_puts(&p, icssg_all_pa_stats[i].name);
break;
default:
break;
@@ -114,7 +117,8 @@ static void emac_get_ethtool_stats(struct net_device *ndev,
if (emac->prueth->pa_stats)
for (i = 0; i < ARRAY_SIZE(icssg_all_pa_stats); i++)
- *(data++) = emac->pa_stats[i];
+ if (!icssg_all_pa_stats[i].standard_stats)
+ *(data++) = emac->pa_stats[i];
}
static int emac_get_ts_info(struct net_device *ndev,
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
index 591be5c8056b4..6df5bb2582928 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
@@ -14,6 +14,7 @@
#include <linux/etherdevice.h>
#include <linux/genalloc.h>
#include <linux/if_hsr.h>
+#include <linux/if_link.h>
#include <linux/if_vlan.h>
#include <linux/interrupt.h>
#include <linux/io-64-nonatomic-hi-lo.h>
@@ -1633,6 +1634,95 @@ int prueth_xsk_wakeup(struct net_device *ndev, u32 qid, u32 flags)
return 0;
}
+/**
+ * prueth_ndo_get_offload_stats - Fill standard LRE counters from ICSSG.
+ * @attr_id: Stats attribute ID; only IFLA_OFFLOAD_XSTATS_LRE_STATS is handled.
+ * @dev: Slave net_device (port A) whose offload stats are requested.
+ * @sp: Output pointer; cast to struct hsr_lre_stats *.
+ *
+ * Called by the HSR stack via ndo_get_offload_stats on the slave A device.
+ * Fetches the per-port PA stat register snapshots for port A and port B,
+ * and fills the IEC-62439-3 per-port LRE counters. Port C (interlink)
+ * counters are not available in ICSSG hardware and remain at ~0ULL.
+ *
+ * Return: 0 on success, -EOPNOTSUPP if the device does not support
+ * HSR offload statistics for the requested attribute.
+ */
+static int prueth_ndo_get_offload_stats(int attr_id,
+ const struct net_device *dev,
+ void *sp)
+{
+ struct prueth_emac *emac = netdev_priv(dev);
+ struct prueth *prueth = emac->prueth;
+ struct hsr_lre_stats *stats = sp;
+ struct prueth_emac *emac0;
+ struct prueth_emac *emac1;
+
+ if (attr_id != IFLA_OFFLOAD_XSTATS_LRE_STATS)
+ return -EOPNOTSUPP;
+
+ if (!prueth->is_hsr_offload_mode)
+ return -EOPNOTSUPP;
+
+ /* Current emac is SlaveA. Other emac is SlaveB */
+ emac0 = emac;
+ emac1 = prueth->emac[1 - prueth_emac_slice(emac)];
+
+ if (!prueth->pa_stats)
+ return -EOPNOTSUPP;
+
+ /* Initialise all fields to ~0ULL ("unsupported"); only port A and B
+ * counters are filled — port C and aggregate counters are not
+ * available in ICSSG hardware.
+ */
+ memset(stats, 0xff, sizeof(*stats));
+
+ emac_update_hardware_stats(emac0);
+ stats->cnt_tx_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_TX");
+ stats->cnt_rx_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_RX");
+ stats->cnt_unique_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_UNIQUE_RX");
+ stats->cnt_duplicate_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_DUPLICATE_RX");
+ stats->cnt_multi_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_MULTIPLE_RX");
+ stats->cnt_own_rx_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_OWN_RX");
+ /* lreCntErrWrongLan is PRP only */
+ stats->cnt_err_wrong_lan_a =
+ emac_get_stat_by_name(emac0, "FW_LRE_CNT_ERRWRONGLAN");
+
+ emac_update_hardware_stats(emac1);
+ stats->cnt_tx_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_TX");
+ stats->cnt_rx_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_RX");
+ stats->cnt_unique_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_UNIQUE_RX");
+ stats->cnt_duplicate_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_DUPLICATE_RX");
+ stats->cnt_multi_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_MULTIPLE_RX");
+ stats->cnt_own_rx_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_OWN_RX");
+ stats->cnt_err_wrong_lan_b =
+ emac_get_stat_by_name(emac1, "FW_LRE_CNT_ERRWRONGLAN");
+
+ return 0;
+}
+
+static bool prueth_ndo_has_offload_stats(const struct net_device *dev,
+ int attr_id)
+{
+ struct prueth_emac *emac = netdev_priv(dev);
+ struct prueth *prueth = emac->prueth;
+
+ return attr_id == IFLA_OFFLOAD_XSTATS_LRE_STATS &&
+ prueth->is_hsr_offload_mode && prueth->pa_stats;
+}
+
static const struct net_device_ops emac_netdev_ops = {
.ndo_open = emac_ndo_open,
.ndo_stop = emac_ndo_stop,
@@ -1652,6 +1742,8 @@ static const struct net_device_ops emac_netdev_ops = {
.ndo_hwtstamp_get = icssg_ndo_get_ts_config,
.ndo_hwtstamp_set = icssg_ndo_set_ts_config,
.ndo_xsk_wakeup = prueth_xsk_wakeup,
+ .ndo_has_offload_stats = prueth_ndo_has_offload_stats,
+ .ndo_get_offload_stats = prueth_ndo_get_offload_stats,
};
static int prueth_netdev_init(struct prueth *prueth,
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
index df93d15c5b786..d6c221e897924 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
@@ -57,12 +57,14 @@
#define ICSSG_MAX_RFLOWS 8 /* per slice */
-#define ICSSG_NUM_PA_STATS 32
+#define ICSSG_NUM_PA_STATS 42
#define ICSSG_NUM_MIIG_STATS 60
/* Number of ICSSG related stats */
#define ICSSG_NUM_STATS (ICSSG_NUM_MIIG_STATS + ICSSG_NUM_PA_STATS)
-#define ICSSG_NUM_STANDARD_STATS 31
-#define ICSSG_NUM_ETHTOOL_STATS (ICSSG_NUM_STATS - ICSSG_NUM_STANDARD_STATS)
+#define ICSSG_NUM_STANDARD_STATS 31
+#define ICSSG_NUM_PA_STANDARD_STATS 7
+#define ICSSG_NUM_ETHTOOL_STATS (ICSSG_NUM_STATS - ICSSG_NUM_STANDARD_STATS - \
+ ICSSG_NUM_PA_STANDARD_STATS)
#define IEP_DEFAULT_CYCLE_TIME_NS 1000000 /* 1 ms */
@@ -458,7 +460,7 @@ int emac_fdb_flow_id_updated(struct prueth_emac *emac);
void icssg_stats_work_handler(struct work_struct *work);
void emac_update_hardware_stats(struct prueth_emac *emac);
-int emac_get_stat_by_name(struct prueth_emac *emac, char *stat_name);
+u64 emac_get_stat_by_name(struct prueth_emac *emac, char *stat_name);
/* Common functions */
void prueth_cleanup_rx_chns(struct prueth_emac *emac,
diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.c b/drivers/net/ethernet/ti/icssg/icssg_stats.c
index 7159baa0155cf..9950d0ba899fa 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_stats.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_stats.c
@@ -62,6 +62,7 @@ void emac_update_hardware_stats(struct prueth_emac *emac)
spin_unlock(&prueth->stats_lock);
}
+EXPORT_SYMBOL_GPL(emac_update_hardware_stats);
void icssg_stats_work_handler(struct work_struct *work)
{
@@ -74,7 +75,7 @@ void icssg_stats_work_handler(struct work_struct *work)
}
EXPORT_SYMBOL_GPL(icssg_stats_work_handler);
-int emac_get_stat_by_name(struct prueth_emac *emac, char *stat_name)
+u64 emac_get_stat_by_name(struct prueth_emac *emac, char *stat_name)
{
int i;
@@ -91,5 +92,6 @@ int emac_get_stat_by_name(struct prueth_emac *emac, char *stat_name)
}
netdev_err(emac->ndev, "Invalid stats %s\n", stat_name);
- return -EINVAL;
+ return ~0ULL;
}
+EXPORT_SYMBOL_GPL(emac_get_stat_by_name);
diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
index 6f4400d8a0f61..373debfb815cc 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
@@ -157,50 +157,63 @@ static const struct icssg_miig_stats icssg_all_miig_stats[] = {
static_assert(ARRAY_SIZE(icssg_all_miig_stats) == ICSSG_NUM_MIIG_STATS);
-#define ICSSG_PA_STATS(field) \
-{ \
- #field, \
- field, \
+#define ICSSG_PA_STATS(field, stats_type) \
+{ \
+ #field, \
+ field, \
+ stats_type \
}
struct icssg_pa_stats {
char name[ETH_GSTRING_LEN];
u32 offset;
+ bool standard_stats;
};
static const struct icssg_pa_stats icssg_all_pa_stats[] = {
- ICSSG_PA_STATS(FW_RTU_PKT_DROP),
- ICSSG_PA_STATS(FW_Q0_OVERFLOW),
- ICSSG_PA_STATS(FW_Q1_OVERFLOW),
- ICSSG_PA_STATS(FW_Q2_OVERFLOW),
- ICSSG_PA_STATS(FW_Q3_OVERFLOW),
- ICSSG_PA_STATS(FW_Q4_OVERFLOW),
- ICSSG_PA_STATS(FW_Q5_OVERFLOW),
- ICSSG_PA_STATS(FW_Q6_OVERFLOW),
- ICSSG_PA_STATS(FW_Q7_OVERFLOW),
- ICSSG_PA_STATS(FW_DROPPED_PKT),
- ICSSG_PA_STATS(FW_RX_ERROR),
- ICSSG_PA_STATS(FW_RX_DS_INVALID),
- ICSSG_PA_STATS(FW_TX_DROPPED_PACKET),
- ICSSG_PA_STATS(FW_TX_TS_DROPPED_PACKET),
- ICSSG_PA_STATS(FW_INF_PORT_DISABLED),
- ICSSG_PA_STATS(FW_INF_SAV),
- ICSSG_PA_STATS(FW_INF_SA_DL),
- ICSSG_PA_STATS(FW_INF_PORT_BLOCKED),
- ICSSG_PA_STATS(FW_INF_DROP_TAGGED),
- ICSSG_PA_STATS(FW_INF_DROP_PRIOTAGGED),
- ICSSG_PA_STATS(FW_INF_DROP_NOTAG),
- ICSSG_PA_STATS(FW_INF_DROP_NOTMEMBER),
- ICSSG_PA_STATS(FW_RX_EOF_SHORT_FRMERR),
- ICSSG_PA_STATS(FW_RX_B0_DROP_EARLY_EOF),
- ICSSG_PA_STATS(FW_TX_JUMBO_FRM_CUTOFF),
- ICSSG_PA_STATS(FW_RX_EXP_FRAG_Q_DROP),
- ICSSG_PA_STATS(FW_RX_FIFO_OVERRUN),
- ICSSG_PA_STATS(FW_CUT_THR_PKT),
- ICSSG_PA_STATS(FW_HOST_RX_PKT_CNT),
- ICSSG_PA_STATS(FW_HOST_TX_PKT_CNT),
- ICSSG_PA_STATS(FW_HOST_EGRESS_Q_PRE_OVERFLOW),
- ICSSG_PA_STATS(FW_HOST_EGRESS_Q_EXP_OVERFLOW),
+ /* Firmware-specific stats: exposed via ethtool -S only */
+ ICSSG_PA_STATS(FW_RTU_PKT_DROP, false),
+ ICSSG_PA_STATS(FW_Q0_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q1_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q2_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q3_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q4_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q5_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q6_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_Q7_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_DROPPED_PKT, false),
+ ICSSG_PA_STATS(FW_RX_ERROR, false),
+ ICSSG_PA_STATS(FW_RX_DS_INVALID, false),
+ ICSSG_PA_STATS(FW_TX_DROPPED_PACKET, false),
+ ICSSG_PA_STATS(FW_TX_TS_DROPPED_PACKET, false),
+ ICSSG_PA_STATS(FW_INF_PORT_DISABLED, false),
+ ICSSG_PA_STATS(FW_INF_SAV, false),
+ ICSSG_PA_STATS(FW_INF_SA_DL, false),
+ ICSSG_PA_STATS(FW_INF_PORT_BLOCKED, false),
+ ICSSG_PA_STATS(FW_INF_DROP_TAGGED, false),
+ ICSSG_PA_STATS(FW_INF_DROP_PRIOTAGGED, false),
+ ICSSG_PA_STATS(FW_INF_DROP_NOTAG, false),
+ ICSSG_PA_STATS(FW_INF_DROP_NOTMEMBER, false),
+ ICSSG_PA_STATS(FW_RX_EOF_SHORT_FRMERR, false),
+ ICSSG_PA_STATS(FW_RX_B0_DROP_EARLY_EOF, false),
+ ICSSG_PA_STATS(FW_TX_JUMBO_FRM_CUTOFF, false),
+ ICSSG_PA_STATS(FW_RX_EXP_FRAG_Q_DROP, false),
+ ICSSG_PA_STATS(FW_RX_FIFO_OVERRUN, false),
+ ICSSG_PA_STATS(FW_CUT_THR_PKT, false),
+ ICSSG_PA_STATS(FW_HOST_RX_PKT_CNT, false),
+ ICSSG_PA_STATS(FW_HOST_TX_PKT_CNT, false),
+ ICSSG_PA_STATS(FW_HOST_EGRESS_Q_PRE_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_HOST_EGRESS_Q_EXP_OVERFLOW, false),
+ ICSSG_PA_STATS(FW_HSR_FWD_CHECK_FAIL_DROP, false),
+ ICSSG_PA_STATS(FW_HSR_HE_CHECK_FAIL_DROP, false),
+ ICSSG_PA_STATS(FW_HSR_SKIP_HOST_DUP_DISCARD, false),
+ ICSSG_PA_STATS(FW_LRE_CNT_UNIQUE_RX, true),
+ ICSSG_PA_STATS(FW_LRE_CNT_DUPLICATE_RX, true),
+ ICSSG_PA_STATS(FW_LRE_CNT_MULTIPLE_RX, true),
+ ICSSG_PA_STATS(FW_LRE_CNT_RX, true),
+ ICSSG_PA_STATS(FW_LRE_CNT_TX, true),
+ ICSSG_PA_STATS(FW_LRE_CNT_OWN_RX, true),
+ ICSSG_PA_STATS(FW_LRE_CNT_ERRWRONGLAN, true),
};
static_assert(ARRAY_SIZE(icssg_all_pa_stats) == ICSSG_NUM_PA_STATS);
diff --git a/drivers/net/ethernet/ti/icssg/icssg_switch_map.h b/drivers/net/ethernet/ti/icssg/icssg_switch_map.h
index 7e053b8af3ece..556facb33e0ce 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_switch_map.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_switch_map.h
@@ -266,5 +266,15 @@
#define FW_HOST_TX_PKT_CNT 0x0250
#define FW_HOST_EGRESS_Q_PRE_OVERFLOW 0x0258
#define FW_HOST_EGRESS_Q_EXP_OVERFLOW 0x0260
+#define FW_HSR_FWD_CHECK_FAIL_DROP 0x0500
+#define FW_HSR_HE_CHECK_FAIL_DROP 0x0508
+#define FW_HSR_SKIP_HOST_DUP_DISCARD 0x0510
+#define FW_LRE_CNT_UNIQUE_RX 0x0518
+#define FW_LRE_CNT_DUPLICATE_RX 0x0520
+#define FW_LRE_CNT_MULTIPLE_RX 0x0528
+#define FW_LRE_CNT_RX 0x0530
+#define FW_LRE_CNT_TX 0x0538
+#define FW_LRE_CNT_OWN_RX 0x0540
+#define FW_LRE_CNT_ERRWRONGLAN 0x0548
#endif /* __NET_TI_ICSSG_SWITCH_MAP_H */
--
2.34.1
* [PATCH net-next v4 2/3] net: ti: icssg: Add static_assert to guard stat array counts
From: MD Danish Anwar @ 2026-06-11 9:50 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
Roger Quadros, Andrew Lunn, Meghana Malladi, Jacob Keller,
David Carlier, Vadim Fedorenko, Kevin Hao, Markus Elfring,
Hangbin Liu, Fernando Fernandez Mancera, Jan Vaclav
Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel
In-Reply-To: <20260611095035.852370-1-danishanwar@ti.com>
Place static_assert() immediately after each of icssg_all_miig_stats[]
and icssg_all_pa_stats[] in icssg_stats.h to verify at build time that
ICSSG_NUM_MIIG_STATS and ICSSG_NUM_PA_STATS stay in sync with the
actual array sizes. This turns a silent miscount into a build error
should either the constant or the array be updated independently.
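The pattern is easy to try outside the kernel. A minimal sketch, assuming a C11 compiler; ARRAY_SIZE() is redefined locally, and demo_stats[] and DEMO_NUM_STATS are hypothetical names that are not part of the driver:

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

#define DEMO_NUM_STATS	3	/* hypothetical count constant */

static const char *const demo_stats[] = {
	"rx_packets",
	"tx_packets",
	"rx_errors",
};

/* Fails the build if an entry is added without bumping DEMO_NUM_STATS. */
static_assert(ARRAY_SIZE(demo_stats) == DEMO_NUM_STATS,
	      "demo_stats[] and DEMO_NUM_STATS are out of sync");
```

Adding a fourth string to demo_stats[] without changing DEMO_NUM_STATS makes the compile fail at the static_assert line rather than producing a wrong count at run time.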
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
---
drivers/net/ethernet/ti/icssg/icssg_stats.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
index 5ec0b38e0c67d..6f4400d8a0f61 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
@@ -155,6 +155,8 @@ static const struct icssg_miig_stats icssg_all_miig_stats[] = {
ICSSG_MIIG_STATS(tx_bytes, true),
};
+static_assert(ARRAY_SIZE(icssg_all_miig_stats) == ICSSG_NUM_MIIG_STATS);
+
#define ICSSG_PA_STATS(field) \
{ \
#field, \
@@ -201,4 +203,6 @@ static const struct icssg_pa_stats icssg_all_pa_stats[] = {
ICSSG_PA_STATS(FW_HOST_EGRESS_Q_EXP_OVERFLOW),
};
+static_assert(ARRAY_SIZE(icssg_all_pa_stats) == ICSSG_NUM_PA_STATS);
+
#endif /* __NET_TI_ICSSG_STATS_H */
--
2.34.1
* [PATCH net-next v4 1/3] net: hsr: Add standard LRE stats via RTM_GETSTATS / IFLA_STATS_LINK_XSTATS
From: MD Danish Anwar @ 2026-06-11 9:50 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
Roger Quadros, Andrew Lunn, Meghana Malladi, Jacob Keller,
David Carlier, Vadim Fedorenko, Kevin Hao, Markus Elfring,
Hangbin Liu, Fernando Fernandez Mancera, Jan Vaclav
Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel
In-Reply-To: <20260611095035.852370-1-danishanwar@ti.com>
Per the IEC-62439-3 specification the Link Redundancy Entity (LRE)
maintains a well-defined set of counters applicable to both software
and offloaded HSR/PRP implementations. Define these counters as
individual netlink attributes inside a LINK_XSTATS_TYPE_HSR nest,
following the approach used by bridge and bond with IFLA_STATS_LINK_XSTATS.
The full IEC-62439-3 MIB counter set is represented, with per-port (A,
B, C) granularity where applicable:
lreCntTx{A,B,C} - sent HSR/PRP tagged frames per port
lreCntRx{A,B,C} - received HSR/PRP tagged frames per port
lreCntErrWrongLan{A,B,C} - received frames with wrong LAN ID (PRP)
lreCntErrors{A,B,C} - received frames with errors per port
lreCntUnique{A,B,C} - frames received without duplicate
lreCntDuplicate{A,B,C} - frames received with exactly one duplicate
lreCntMulti{A,B,C} - frames received with more than one duplicate
lreCntOwnRx{A,B} - own-address frames received (HSR only)
Each counter is encoded as its own HSR_XSTATS_* u64 netlink attribute.
Unsupported counters are initialised to ~0ULL by the kernel and omitted
from the netlink reply; user-space must treat an absent attribute as
"not available".
The UAPI attribute enum (HSR_XSTATS_*) is added to hsr_netlink.h.
LINK_XSTATS_TYPE_HSR is added to the LINK_XSTATS_TYPE_* enum in both
include/uapi/linux/if_link.h and tools/include/uapi/linux/if_link.h.
A kernel-internal struct hsr_lre_stats (in linux/if_hsr.h) is provided
for offload drivers to fill via ndo_get_offload_stats. Unsupported
fields must be left at the ~0ULL value initialised by the HSR layer
before calling the NDO.
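The sentinel convention described above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct demo_lre_stats is a hypothetical three-counter subset of struct hsr_lre_stats, and the demo_*() helpers stand in for the HSR layer and an offload driver; none of them exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UNSUPPORTED	(~0ULL)
#define DEMO_NUM_COUNTERS	3

/* Hypothetical three-counter subset of struct hsr_lre_stats. */
struct demo_lre_stats {
	uint64_t cnt_tx_a;
	uint64_t cnt_tx_b;
	uint64_t cnt_tx_c;
};

static_assert(sizeof(struct demo_lre_stats) ==
	      DEMO_NUM_COUNTERS * sizeof(uint64_t),
	      "struct must consist of u64 counters only");

/* Core side: mark every counter unsupported before asking the driver. */
static void demo_init(struct demo_lre_stats *s)
{
	memset(s, 0xff, sizeof(*s));
}

/* Driver side: fill only what the hardware provides (ports A and B). */
static void demo_driver_fill(struct demo_lre_stats *s)
{
	s->cnt_tx_a = 100;
	s->cnt_tx_b = 101;
}

/* Core side: count the counters that would become netlink attributes. */
static size_t demo_count_emitted(const struct demo_lre_stats *s)
{
	uint64_t v[DEMO_NUM_COUNTERS];
	size_t i, n = 0;

	memcpy(v, s, sizeof(v));
	for (i = 0; i < DEMO_NUM_COUNTERS; i++)
		if (v[i] != DEMO_UNSUPPORTED)
			n++;
	return n;
}
```

Because memset() with 0xff sets every byte, each u64 field reads back as ~0ULL, so a driver that leaves the port C field alone causes it to be skipped without any per-field capability flag.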
The HSR stack calls ndo_get_offload_stats on slave A to collect offload
counters.
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
---
include/linux/if_hsr.h | 48 +++++++++++
include/uapi/linux/hsr_netlink.h | 56 +++++++++++++
include/uapi/linux/if_link.h | 2 +
net/hsr/hsr_netlink.c | 130 +++++++++++++++++++++++++++--
tools/include/uapi/linux/if_link.h | 2 +
5 files changed, 230 insertions(+), 8 deletions(-)
diff --git a/include/linux/if_hsr.h b/include/linux/if_hsr.h
index f4cf2dd36d193..b8c20f0906194 100644
--- a/include/linux/if_hsr.h
+++ b/include/linux/if_hsr.h
@@ -38,6 +38,54 @@ struct hsr_tag {
#define HSR_HLEN 6
+/**
+ * struct hsr_lre_stats - Kernel-internal IEC-62439-3 LRE counter set.
+ *
+ * This is the buffer type written by ndo_get_offload_stats() when called with
+ * attr_id == IFLA_OFFLOAD_XSTATS_LRE_STATS on an HSR slave device. Each
+ * field maps to one HSR_XSTATS_* netlink attribute. Fields that the
+ * offload driver does not support must be left at the initialised value of
+ * ~0ULL; the HSR layer will skip those when building the netlink reply.
+ *
+ * Per-port suffix: _a = port A (slave 1 / LAN-A),
+ * _b = port B (slave 2 / LAN-B),
+ * _c = interlink / application interface.
+ *
+ * @cnt_tx_a: lreCntTxA - sent HSR/PRP tagged frames on port A.
+ * @cnt_tx_b: lreCntTxB - sent HSR/PRP tagged frames on port B.
+ * @cnt_tx_c: lreCntTxC - sent HSR/PRP tagged frames on port C.
+ * @cnt_rx_a: lreCntRxA - received HSR/PRP tagged frames on port A.
+ * @cnt_rx_b: lreCntRxB - received HSR/PRP tagged frames on port B.
+ * @cnt_rx_c: lreCntRxC - received HSR/PRP tagged frames on port C.
+ * @cnt_err_wrong_lan_a: lreCntErrWrongLanA - wrong LAN ID frames on port A.
+ * @cnt_err_wrong_lan_b: lreCntErrWrongLanB - wrong LAN ID frames on port B.
+ * @cnt_err_wrong_lan_c: lreCntErrWrongLanC - wrong LAN ID frames on port C.
+ * @cnt_errors_a: lreCntErrorsA - received frames with errors on port A.
+ * @cnt_errors_b: lreCntErrorsB - received frames with errors on port B.
+ * @cnt_errors_c: lreCntErrorsC - received frames with errors on port C.
+ * @cnt_unique_a: lreCntUniqueA - frames received without duplicate on port A.
+ * @cnt_unique_b: lreCntUniqueB - frames received without duplicate on port B.
+ * @cnt_unique_c: lreCntUniqueC - frames received without duplicate on port C.
+ * @cnt_duplicate_a: lreCntDuplicateA - frames with one duplicate on port A.
+ * @cnt_duplicate_b: lreCntDuplicateB - frames with one duplicate on port B.
+ * @cnt_duplicate_c: lreCntDuplicateC - frames with one duplicate on port C.
+ * @cnt_multi_a: lreCntMultiA - frames with more than one duplicate on port A.
+ * @cnt_multi_b: lreCntMultiB - frames with more than one duplicate on port B.
+ * @cnt_multi_c: lreCntMultiC - frames with more than one duplicate on port C.
+ * @cnt_own_rx_a: lreCntOwnRxA - own-address frames received on port A.
+ * @cnt_own_rx_b: lreCntOwnRxB - own-address frames received on port B.
+ */
+struct hsr_lre_stats {
+ u64 cnt_tx_a, cnt_tx_b, cnt_tx_c;
+ u64 cnt_rx_a, cnt_rx_b, cnt_rx_c;
+ u64 cnt_err_wrong_lan_a, cnt_err_wrong_lan_b, cnt_err_wrong_lan_c;
+ u64 cnt_errors_a, cnt_errors_b, cnt_errors_c;
+ u64 cnt_unique_a, cnt_unique_b, cnt_unique_c;
+ u64 cnt_duplicate_a, cnt_duplicate_b, cnt_duplicate_c;
+ u64 cnt_multi_a, cnt_multi_b, cnt_multi_c;
+ u64 cnt_own_rx_a, cnt_own_rx_b;
+};
+
#if IS_ENABLED(CONFIG_HSR)
extern bool is_hsr_master(struct net_device *dev);
extern int hsr_get_version(struct net_device *dev, enum hsr_version *ver);
diff --git a/include/uapi/linux/hsr_netlink.h b/include/uapi/linux/hsr_netlink.h
index d540ea9bbef4b..c414a2bb93b79 100644
--- a/include/uapi/linux/hsr_netlink.h
+++ b/include/uapi/linux/hsr_netlink.h
@@ -48,4 +48,60 @@ enum {
};
#define HSR_C_MAX (__HSR_C_MAX - 1)
+/* HSR/PRP LRE extended statistics attributes.
+ * Reported inside LINK_XSTATS_TYPE_HSR (RTM_GETSTATS / ip stats show).
+ * Counter definitions follow IEC-62439-3 MIB naming.
+ *
+ * All counters are __u64. Unsupported counters are omitted from the
+ * netlink reply; user-space must treat an absent attribute as "not available".
+ *
+ * Per-port suffix: _A = port A (slave 1), _B = port B (slave 2),
+ * _C = interlink / application interface.
+ */
+enum {
+ /* Sent HSR/PRP tagged frames per port */
+ HSR_XSTATS_CNT_TX_A = 1,
+ HSR_XSTATS_CNT_TX_B,
+ HSR_XSTATS_CNT_TX_C,
+
+ /* Received HSR/PRP tagged frames per port */
+ HSR_XSTATS_CNT_RX_A,
+ HSR_XSTATS_CNT_RX_B,
+ HSR_XSTATS_CNT_RX_C,
+
+ /* Received frames with wrong LAN ID (PRP only) per port */
+ HSR_XSTATS_CNT_ERR_WRONG_LAN_A,
+ HSR_XSTATS_CNT_ERR_WRONG_LAN_B,
+ HSR_XSTATS_CNT_ERR_WRONG_LAN_C,
+
+ /* Received frames with errors per port */
+ HSR_XSTATS_CNT_ERRORS_A,
+ HSR_XSTATS_CNT_ERRORS_B,
+ HSR_XSTATS_CNT_ERRORS_C,
+
+ /* Frames received with no duplicate per port */
+ HSR_XSTATS_CNT_UNIQUE_A,
+ HSR_XSTATS_CNT_UNIQUE_B,
+ HSR_XSTATS_CNT_UNIQUE_C,
+
+ /* Frames received with exactly one duplicate per port */
+ HSR_XSTATS_CNT_DUPLICATE_A,
+ HSR_XSTATS_CNT_DUPLICATE_B,
+ HSR_XSTATS_CNT_DUPLICATE_C,
+
+ /* Frames received with more than one duplicate per port */
+ HSR_XSTATS_CNT_MULTI_A,
+ HSR_XSTATS_CNT_MULTI_B,
+ HSR_XSTATS_CNT_MULTI_C,
+
+ /* Frames received matching this node's own address (HSR only) */
+ HSR_XSTATS_CNT_OWN_RX_A,
+ HSR_XSTATS_CNT_OWN_RX_B,
+
+ HSR_XSTATS_PAD,
+ __HSR_XSTATS_MAX,
+};
+
+#define HSR_XSTATS_MAX (__HSR_XSTATS_MAX - 1)
+
#endif /* __UAPI_HSR_NETLINK_H */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 79ce4bc24cba6..11458f683624a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1905,6 +1905,7 @@ enum {
LINK_XSTATS_TYPE_UNSPEC,
LINK_XSTATS_TYPE_BRIDGE,
LINK_XSTATS_TYPE_BOND,
+ LINK_XSTATS_TYPE_HSR,
__LINK_XSTATS_TYPE_MAX
};
#define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1)
@@ -1915,6 +1916,7 @@ enum {
IFLA_OFFLOAD_XSTATS_CPU_HIT, /* struct rtnl_link_stats64 */
IFLA_OFFLOAD_XSTATS_HW_S_INFO, /* HW stats info. A nest */
IFLA_OFFLOAD_XSTATS_L3_STATS, /* struct rtnl_hw_stats64 */
+ IFLA_OFFLOAD_XSTATS_LRE_STATS, /* struct hsr_lre_stats */
__IFLA_OFFLOAD_XSTATS_MAX
};
#define IFLA_OFFLOAD_XSTATS_MAX (__IFLA_OFFLOAD_XSTATS_MAX - 1)
diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
index db0b0af7a6920..31aedf7460a47 100644
--- a/net/hsr/hsr_netlink.c
+++ b/net/hsr/hsr_netlink.c
@@ -11,6 +11,8 @@
#include <linux/kernel.h>
#include <net/rtnetlink.h>
#include <net/genetlink.h>
+#include <uapi/linux/if_link.h>
+#include <uapi/linux/hsr_netlink.h>
#include "hsr_main.h"
#include "hsr_device.h"
#include "hsr_framereg.h"
@@ -189,15 +191,127 @@ static int hsr_fill_info(struct sk_buff *skb, const struct net_device *dev)
return -EMSGSIZE;
}
+/*
+ * Number of real HSR_XSTATS_* u64 counter attributes.
+ * Real counters run from HSR_XSTATS_CNT_TX_A(1) through
+ * HSR_XSTATS_CNT_OWN_RX_B(23); HSR_XSTATS_PAD is not a counter.
+ */
+#define HSR_XSTATS_CNT_ATTRS (HSR_XSTATS_PAD - 1)
+
+static size_t hsr_get_linkxstats_size(const struct net_device *dev, int attr)
+{
+ if (attr != IFLA_STATS_LINK_XSTATS)
+ return 0;
+
+ /* Nest header (LINK_XSTATS_TYPE_HSR) + one u64 nla per counter */
+ return nla_total_size(0) +
+ HSR_XSTATS_CNT_ATTRS * nla_total_size_64bit(sizeof(u64));
+}
+
+/* Put a u64 counter attribute; skip if value is ~0ULL (unsupported). */
+static int hsr_put_stat(struct sk_buff *skb, int attr_id, u64 val)
+{
+ if (val == ~0ULL)
+ return 0;
+ return nla_put_u64_64bit(skb, attr_id, val, HSR_XSTATS_PAD);
+}
+
+static int hsr_fill_linkxstats(struct sk_buff *skb,
+ const struct net_device *dev,
+ int *prividx, int attr)
+{
+ struct hsr_priv *hsr = netdev_priv(dev);
+ struct hsr_lre_stats stats;
+ int s_prividx = *prividx;
+ struct hsr_port *port;
+ struct nlattr *nest;
+ int err;
+
+ if (attr != IFLA_STATS_LINK_XSTATS)
+ return 0;
+
+ *prividx = 0;
+
+ nest = nla_nest_start_noflag(skb, LINK_XSTATS_TYPE_HSR);
+ if (!nest)
+ return -EMSGSIZE;
+
+ /* Initialise all counters to ~0ULL ("unsupported") */
+ memset(&stats, 0xff, sizeof(stats));
+
+ /* Ask the offload driver (if any) via ndo_get_offload_stats on slave A.
+ * Use IFLA_OFFLOAD_XSTATS_LRE_STATS, which is the correct identifier
+ * from enum ifla_offload_xstats that these NDOs expect.
+ */
+ port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_A);
+ if (port) {
+ const struct net_device_ops *ops = port->dev->netdev_ops;
+
+ if (ops->ndo_has_offload_stats &&
+ ops->ndo_has_offload_stats(port->dev,
+ IFLA_OFFLOAD_XSTATS_LRE_STATS) &&
+ ops->ndo_get_offload_stats) {
+ err = ops->ndo_get_offload_stats(IFLA_OFFLOAD_XSTATS_LRE_STATS,
+ port->dev, &stats);
+ if (err && err != -EOPNOTSUPP) {
+ nla_nest_cancel(skb, nest);
+ return err;
+ }
+ }
+ }
+
+#define PUT_STAT(attr, field) \
+ do { \
+ if (HSR_XSTATS_##attr < s_prividx) \
+ break; \
+ if (hsr_put_stat(skb, HSR_XSTATS_##attr, stats.field)) { \
+ *prividx = HSR_XSTATS_##attr; \
+ nla_nest_end(skb, nest); \
+ return -EMSGSIZE; \
+ } \
+ } while (0)
+
+ PUT_STAT(CNT_TX_A, cnt_tx_a);
+ PUT_STAT(CNT_TX_B, cnt_tx_b);
+ PUT_STAT(CNT_TX_C, cnt_tx_c);
+ PUT_STAT(CNT_RX_A, cnt_rx_a);
+ PUT_STAT(CNT_RX_B, cnt_rx_b);
+ PUT_STAT(CNT_RX_C, cnt_rx_c);
+ PUT_STAT(CNT_ERR_WRONG_LAN_A, cnt_err_wrong_lan_a);
+ PUT_STAT(CNT_ERR_WRONG_LAN_B, cnt_err_wrong_lan_b);
+ PUT_STAT(CNT_ERR_WRONG_LAN_C, cnt_err_wrong_lan_c);
+ PUT_STAT(CNT_ERRORS_A, cnt_errors_a);
+ PUT_STAT(CNT_ERRORS_B, cnt_errors_b);
+ PUT_STAT(CNT_ERRORS_C, cnt_errors_c);
+ PUT_STAT(CNT_UNIQUE_A, cnt_unique_a);
+ PUT_STAT(CNT_UNIQUE_B, cnt_unique_b);
+ PUT_STAT(CNT_UNIQUE_C, cnt_unique_c);
+ PUT_STAT(CNT_DUPLICATE_A, cnt_duplicate_a);
+ PUT_STAT(CNT_DUPLICATE_B, cnt_duplicate_b);
+ PUT_STAT(CNT_DUPLICATE_C, cnt_duplicate_c);
+ PUT_STAT(CNT_MULTI_A, cnt_multi_a);
+ PUT_STAT(CNT_MULTI_B, cnt_multi_b);
+ PUT_STAT(CNT_MULTI_C, cnt_multi_c);
+ PUT_STAT(CNT_OWN_RX_A, cnt_own_rx_a);
+ PUT_STAT(CNT_OWN_RX_B, cnt_own_rx_b);
+
+#undef PUT_STAT
+
+ nla_nest_end(skb, nest);
+ return 0;
+}
+
static struct rtnl_link_ops hsr_link_ops __read_mostly = {
- .kind = "hsr",
- .maxtype = IFLA_HSR_MAX,
- .policy = hsr_policy,
- .priv_size = sizeof(struct hsr_priv),
- .setup = hsr_dev_setup,
- .newlink = hsr_newlink,
- .dellink = hsr_dellink,
- .fill_info = hsr_fill_info,
+ .kind = "hsr",
+ .maxtype = IFLA_HSR_MAX,
+ .policy = hsr_policy,
+ .priv_size = sizeof(struct hsr_priv),
+ .setup = hsr_dev_setup,
+ .newlink = hsr_newlink,
+ .dellink = hsr_dellink,
+ .fill_info = hsr_fill_info,
+ .get_linkxstats_size = hsr_get_linkxstats_size,
+ .fill_linkxstats = hsr_fill_linkxstats,
};
/* attribute policy */
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 7e46ca4cd31bb..ed4e6d53ccbab 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -1844,6 +1844,7 @@ enum {
LINK_XSTATS_TYPE_UNSPEC,
LINK_XSTATS_TYPE_BRIDGE,
LINK_XSTATS_TYPE_BOND,
+ LINK_XSTATS_TYPE_HSR,
__LINK_XSTATS_TYPE_MAX
};
#define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1)
@@ -1854,6 +1855,7 @@ enum {
IFLA_OFFLOAD_XSTATS_CPU_HIT, /* struct rtnl_link_stats64 */
IFLA_OFFLOAD_XSTATS_HW_S_INFO, /* HW stats info. A nest */
IFLA_OFFLOAD_XSTATS_L3_STATS, /* struct rtnl_hw_stats64 */
+ IFLA_OFFLOAD_XSTATS_LRE_STATS, /* struct hsr_lre_stats */
__IFLA_OFFLOAD_XSTATS_MAX
};
#define IFLA_OFFLOAD_XSTATS_MAX (__IFLA_OFFLOAD_XSTATS_MAX - 1)
--
2.34.1
^ permalink raw reply related
* [PATCH net-next v4 0/3] Add standard stats for HSR/PRP
From: MD Danish Anwar @ 2026-06-11 9:50 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
Roger Quadros, Andrew Lunn, Meghana Malladi, Jacob Keller,
David Carlier, Vadim Fedorenko, Kevin Hao, Markus Elfring,
Hangbin Liu, Fernando Fernandez Mancera, Jan Vaclav
Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel, Felix Maurer,
Luka Gejak
Add standard stats for HSR/PRP. This series initially added HSR/PRP-related
stats to the ICSSG driver. Based on maintainers' comments on v2, it now adds
support for dumping standard HSR/PRP stats instead.
Drivers that support offload can populate these standard stats.
This series only implements offloaded stats. For software-only interfaces,
Felix Maurer has said he will add them later [1].
v3 - v4:
*) Address AI review comments on Patch 1/3 and Patch 3/3. AI review comments
on patch 2/3 were not relevant to this series.
v3 https://lore.kernel.org/all/20260608100930.210149-1-danishanwar@ti.com/
v2 https://lore.kernel.org/all/20260514075605.850674-1-danishanwar@ti.com/
[1] https://lore.kernel.org/all/ag87pBZfOyccPZTc@thinkpad/
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Felix Maurer <fmaurer@redhat.com>
Cc: Luka Gejak <luka.gejak@linux.dev>
MD Danish Anwar (3):
net: hsr: Add standard LRE stats via RTM_GETSTATS /
IFLA_STATS_LINK_XSTATS
net: ti: icssg: Add static_assert to guard stat array counts
net: ti: icssg: Add HSR offload statistics support
.../ethernet/ti/icssg_prueth.rst | 19 +++
drivers/net/ethernet/ti/icssg/icssg_common.c | 7 +-
drivers/net/ethernet/ti/icssg/icssg_ethtool.c | 10 +-
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 92 +++++++++++++
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 10 +-
drivers/net/ethernet/ti/icssg/icssg_stats.c | 6 +-
drivers/net/ethernet/ti/icssg/icssg_stats.h | 89 +++++++-----
.../net/ethernet/ti/icssg/icssg_switch_map.h | 10 ++
include/linux/if_hsr.h | 48 +++++++
include/uapi/linux/hsr_netlink.h | 56 ++++++++
include/uapi/linux/if_link.h | 2 +
net/hsr/hsr_netlink.c | 130 ++++++++++++++++--
tools/include/uapi/linux/if_link.h | 2 +
13 files changed, 427 insertions(+), 54 deletions(-)
base-commit: 0068940907d33217ae01217f84910a5cde606c17
--
2.34.1
^ permalink raw reply
* Re: [RFC V2 1/3] lib/vsprintf: Add support for pgtable entries
From: Anshuman Khandual @ 2026-06-11 9:50 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Usama Arif, linux-mm, Rasmus Villemoes, Sergey Senozhatsky,
Petr Mladek, Steven Rostedt, Jonathan Corbet, Andrew Morton,
David Hildenbrand, linux-kernel, linux-doc, David Hildenbrand,
Lorenzo Stoakes, Andy Whitcroft
In-Reply-To: <aiphHAkLnG_L2kY2@ashevche-desk.local>
On 11/06/26 12:47 PM, Andy Shevchenko wrote:
> On Thu, Jun 11, 2026 at 10:45:01AM +0530, Anshuman Khandual wrote:
>> On 10/06/26 4:43 PM, Usama Arif wrote:
>>> On Wed, 10 Jun 2026 05:35:43 +0100 Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>
> ...
>
>>>> + static_assert(sizeof(pte_t) == 4 ||
>>>> + sizeof(pte_t) == 8,
>>>> + "pte_t size must be 4 or 8 bytes");
>
> Besides occupying too many lines, why are these static asserts hidden here and
> not declared in the global space? The broader question is why they are needed at all.
Sure, I will move these static_asserts to just above pxd_pointer().
These asserts ensure
- Platforms have either 32 bit or 64 bit pgtable descriptors
- special_hex_number() can be used to print such descriptors
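
A minimal userspace sketch of such a file-scope assert, for illustration only:
demo_pte_t is a hypothetical stand-in for pte_t, and demo_format_entry() merely
imitates a fixed-width hex print ("0x" plus two digits per byte). It is not the
kernel's special_hex_number().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-bit page table descriptor, standing in for pte_t. */
typedef struct { uint64_t val; } demo_pte_t;

/* File scope: the build fails if the descriptor is neither 32 nor 64 bit. */
static_assert(sizeof(demo_pte_t) == 4 || sizeof(demo_pte_t) == 8,
	      "demo_pte_t size must be 4 or 8 bytes");

/* Print the descriptor as "0x" followed by two hex digits per byte. */
static int demo_format_entry(char *buf, size_t len, demo_pte_t e)
{
	return snprintf(buf, len, "0x%0*llx", (int)(2 * sizeof(e.val)),
			(unsigned long long)e.val);
}
```

With the size pinned to 4 or 8 bytes at compile time, the print width only has
to handle those two cases.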
^ permalink raw reply
* Re: [RFC PATCH 1/3] mm/numa: add exclusive node pool and numa=standby boot parameter
From: Mike Rapoport @ 2026-06-11 9:00 UTC (permalink / raw)
To: Gregory Price
Cc: linux-mm, x86, linux-doc, linux-kernel, linux-acpi, driver-core,
kernel-team, corbet, skhan, dave.hansen, luto, peterz, tglx,
mingo, bp, hpa, rafael, lenb, gregkh, dakr, akpm, rdunlap,
feng.tang, dapeng1.mi, elver, kuba, ebiggers, lirongqing, paulmck,
dave.jiang, jic23, xueshuai, kai.huang
In-Reply-To: <20260610014517.253609-2-gourry@gourry.net>
Hi,
On Tue, Jun 09, 2026 at 09:45:15PM -0400, Gregory Price wrote:
> It can be at times preferential to logically split up hotplug memory
> capacity into more nodes than are described by BIOS at boot time.
>
> However, if nodes are not described at __init time, they are not
> possible to add later on.
...
> 1) Can we do dynamic addition of nodes?
>
> Not Trivially
>
> Some services utilize num_possible_nodes() as a static value to
> calculate the amount of resources to use at runtime (bpf, md/raid5).
>
> Example: futex_init uses num_possible_nodes() as part of its
> hashsize calculation during __init.
AFAIU, we don't add the additional nodes for generic hotplug memory but
rather for exclusive use by drivers/applications that are aware of these
nodes.
Wouldn't adding them to possible nodes actually skew the calculation of the
resources by the services utilizing num_possible_nodes()?
With the futex_init() example, won't hashsize be scaled down too much
because we've added these special nodes to the possible mask?
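
To put rough numbers on that concern, here is a small sketch. It assumes the
hash is sized as 256 * num_possible_cpus() / num_possible_nodes(), clamped to
at least 4 and rounded up to a power of two. That formula is my reading of
futex_init() and may not match every kernel version; both helper names are
made up.

```c
/* Round v up to the next power of two (v >= 1). */
static unsigned long roundup_pow2_sketch(unsigned long v)
{
	unsigned long p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Assumed sizing rule, not copied from the kernel: scale with the CPU
 * count, divide by the number of possible nodes, keep at least 4 buckets.
 */
static unsigned long futex_hashsize_sketch(unsigned int cpus,
					   unsigned int nodes)
{
	unsigned long hashsize = 256UL * cpus;

	hashsize /= nodes;
	if (hashsize < 4)
		hashsize = 4;
	return roundup_pow2_sketch(hashsize);
}
```

Under that assumption, a 64-CPU machine with 2 possible nodes gets 8192
buckets, while the same machine with 6 extra standby nodes in the possible
mask (8 in total) gets 2048.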
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend
From: James Clark @ 2026-06-11 8:37 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Suzuki K Poulose, Mike Leach, Leo Yan, Namhyung Kim, Jiri Olsa,
Ian Rogers, Amir Ayupov, Jonathan Corbet, Shuah Khan,
Paschalis Mpeis, coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc
In-Reply-To: <ainFqtxdLwhbRqrI@x1>
On 10/06/2026 9:14 pm, Arnaldo Carvalho de Melo wrote:
> On Tue, Jun 09, 2026 at 03:40:05PM +0100, James Clark wrote:
>> Fix thread tracking when decoding Coresight trace and add a new test for
>> it.
>
> The issues found by sashiko seem mild and you can address them in follow
> up patches, I think.
>
> So, to have perf-tools-next available for linux-next testing, and since
> the window is closing soon, I've merged this, ok?
>
> - Arnaldo
>
Thanks! I'll send fixes for any that I think need them, but I was going to
mostly ignore them, or I already replied to the same comments from it on
previous versions.
>> The new test is added as a Perf test workload instead of a custom binary
>> with its own build system, but this requires a new feature in Perf test
>> to pass in control pipes which can enable and disable events. This
>> scopes the recording to just the workload and helps to reduce the amount
>> of data recorded in tracing tests.
>>
>> With this new feature we can re-write all of the Coresight tests to make
>> use of it and remove the remaining binaries which fixes the following
>> issues:
>>
>> * They didn't work in out of source builds
>> * A lot of the tests unnecessarily required root and didn't skip
>> without it
>> * They were mainly qualitative tests which didn't look for specific
>> behavior
>>
>> Most importantly, the long build and runtime has been reduced. On a
>> Radxa Orion O6, unroll_loop_thread.c took 37s to compile which is longer
>> than the entire Perf build. Now the build time is negligible and the
>> before and after test runtimes for all the Coresight tests are:
>>
>> | N1SDP | Orion O6
>> -----------------------------------
>> Before | 4m 0s | 14m 49s
>> After | 26s | 56s
>> -----------------------------------
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> Changes in v5:
>> - Forgot to include this change:
>> - Test for actual length of expected raw dump (Leo)
>> - Link to v4: https://lore.kernel.org/r/20260609-james-cs-context-tracking-fix-v4-0-44f9fb9e5c42@linaro.org
>>
>> Changes in v4:
>> - Rename workload-ctl to record-ctl and improve docs (Leo)
>> - Use new packet argument everywhere in
>> cs_etm__synth_instruction_sample() (Sashiko)
>> - Test for actual length of expected raw dump (Leo)
>> - Use -fno-inline instead of keyword (Leo)
>> - Don't test any brace or call lines in deterministic test
>> - Make sure context switch loop test does cleanup on failure (Sashiko)
>> - Remove undef int overflows in workloads (Sashiko)
>> - Link to v3: https://lore.kernel.org/r/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org
>>
>> Changes in v3:
>> - Minor sashiko comments
>> - Close some more pipes
>> - Fix warning messages
>> - Error handling improvements
>> - Pass packet into cs_etm__synth_instruction_sample()
>> - Fixup stale comment (Leo)
>> - Link to v2: https://lore.kernel.org/r/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org
>>
>> Changes in v2:
>> - Add --workload-ctl option to Perf test
>> - Re-write all the Coresight tests and speed them up
>> - Pass packet to memory access function so frontend can use either the
>> previous or current packet's EL
>> - Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd602e18287@linaro.org
>>
>> ---
>> James Clark (19):
>> perf cs-etm: Queue context packets for frontend
>> perf test: Add workload-ctl option
>> perf test: Add a workload that forces context switches
>> perf test cs-etm: Test process attribution
>> perf test: Add deterministic workload
>> perf test cs-etm: Replace unroll loop thread with deterministic decode test
>> perf test cs-etm: Remove asm_pure_loop test
>> perf test cs-etm: Replace memcpy test with raw dump stress test
>> perf test: Add named_threads workload
>> perf test cs-etm: Test decoding for concurrent threads test
>> perf test cs-etm: Remove duplicate branch tests
>> perf test cs-etm: Skip if not root
>> perf test cs-etm: Reduce snapshot size
>> perf test cs-etm: Speed up basic test
>> perf test cs-etm: Remove unused Coresight workloads
>> perf test cs-etm: Make disassembly test use kcore
>> perf test cs-etm: Add all branch instructions to test
>> perf test cs-etm: Speed up disassembly test
>> perf test cs-etm: Move existing tests to coresight folder
>>
>> Documentation/trace/coresight/coresight-perf.rst | 78 +------
>> MAINTAINERS | 2 -
>> tools/perf/Documentation/perf-test.txt | 24 ++-
>> tools/perf/Makefile.perf | 14 +-
>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 20 +-
>> tools/perf/tests/builtin-test.c | 187 +++++++++++++++-
>> tools/perf/tests/shell/coresight/Makefile | 29 ---
>> .../perf/tests/shell/coresight/Makefile.miniconfig | 14 --
>> tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 --
>> .../tests/shell/coresight/asm_pure_loop/.gitignore | 1 -
>> .../tests/shell/coresight/asm_pure_loop/Makefile | 34 ---
>> .../shell/coresight/asm_pure_loop/asm_pure_loop.S | 30 ---
>> .../tests/shell/coresight/concurrent_threads.sh | 45 ++++
>> .../tests/shell/coresight/context_switch_thread.sh | 69 ++++++
>> tools/perf/tests/shell/coresight/deterministic.sh | 72 +++++++
>> .../tests/shell/coresight/memcpy_thread/.gitignore | 1 -
>> .../tests/shell/coresight/memcpy_thread/Makefile | 33 ---
>> .../shell/coresight/memcpy_thread/memcpy_thread.c | 80 -------
>> .../tests/shell/coresight/memcpy_thread_16k_10.sh | 22 --
>> .../perf/tests/shell/coresight/raw_dump_stress.sh | 65 ++++++
>> .../shell/{ => coresight}/test_arm_coresight.sh | 43 ++--
>> .../{ => coresight}/test_arm_coresight_disasm.sh | 23 +-
>> .../tests/shell/coresight/thread_loop/.gitignore | 1 -
>> .../tests/shell/coresight/thread_loop/Makefile | 33 ---
>> .../shell/coresight/thread_loop/thread_loop.c | 85 --------
>> .../shell/coresight/thread_loop_check_tid_10.sh | 23 --
>> .../shell/coresight/thread_loop_check_tid_2.sh | 23 --
>> .../shell/coresight/unroll_loop_thread/.gitignore | 1 -
>> .../shell/coresight/unroll_loop_thread/Makefile | 33 ---
>> .../unroll_loop_thread/unroll_loop_thread.c | 75 -------
>> .../tests/shell/coresight/unroll_loop_thread_10.sh | 22 --
>> tools/perf/tests/shell/lib/coresight.sh | 134 ------------
>> tools/perf/tests/tests.h | 3 +
>> tools/perf/tests/workloads/Build | 4 +
>> tools/perf/tests/workloads/context_switch_loop.c | 110 ++++++++++
>> tools/perf/tests/workloads/deterministic.c | 39 ++++
>> tools/perf/tests/workloads/named_threads.c | 109 ++++++++++
>> tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 +-
>> tools/perf/util/cs-etm.c | 236 ++++++++++++---------
>> tools/perf/util/cs-etm.h | 8 +-
>> 40 files changed, 926 insertions(+), 942 deletions(-)
>> ---
>> base-commit: 351a37f2fda4db668cff8ba12f2992d73dccdaea
>> change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed
>>
>> Best regards,
>> --
>> James Clark <james.clark@linaro.org>
^ permalink raw reply
* Re: [PATCH] docs/{it_it,sp_SP,zh_CN,zh_TW}: update references to removed CONFIG_DEBUG_SLAB
From: Dongliang Mu @ 2026-06-11 8:26 UTC (permalink / raw)
To: Ethan Nelson-Moore, Shuah Khan, Avadhut Naik, linux-doc, Alex Shi
Cc: Federico Vaga, Jonathan Corbet, Carlos Bilbao, Yanteng Si,
Hu Haowen
In-Reply-To: <20260611010014.412841-1-enelsonmoore@gmail.com>
On 6/11/26 9:00 AM, Ethan Nelson-Moore wrote:
> CONFIG_DEBUG_SLAB was removed in commit 2a19be61a651 ("mm/slab: remove
> CONFIG_SLAB from all Kconfig and Makefile"), but references to it
> remained in documentation. The English documentation was updated to
> refer to CONFIG_SLUB_DEBUG in commit 5969fbf30274 ("docs:
> submit-checklist: structure by category"), but these translations were
> never similarly updated. Update them.
>
> Discovered while searching for CONFIG_* symbols referenced in the
> kernel but not defined in any Kconfig file.
>
> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
> ---
> Documentation/translations/it_IT/process/submit-checklist.rst | 2 +-
> Documentation/translations/sp_SP/process/submit-checklist.rst | 2 +-
> Documentation/translations/zh_CN/process/submit-checklist.rst | 2 +-
> Documentation/translations/zh_TW/process/submit-checklist.rst | 2 +-
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
> 4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
> index 5bf1b4adebc1..c58d773fd297 100644
> --- a/Documentation/translations/it_IT/process/submit-checklist.rst
> +++ b/Documentation/translations/it_IT/process/submit-checklist.rst
> @@ -122,7 +122,7 @@ Verificate il vostro codice
>
> 1) La patch è stata verificata con le seguenti opzioni abilitate
> contemporaneamente: ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``,
> - ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> + ``CONFIG_SLUB_DEBUG``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``,
> ``CONFIG_PROVE_RCU`` e ``CONFIG_DEBUG_OBJECTS_RCU_HEAD``.
>
> diff --git a/Documentation/translations/sp_SP/process/submit-checklist.rst b/Documentation/translations/sp_SP/process/submit-checklist.rst
> index e7107cc97001..aedf55eb3b80 100644
> --- a/Documentation/translations/sp_SP/process/submit-checklist.rst
> +++ b/Documentation/translations/sp_SP/process/submit-checklist.rst
> @@ -76,7 +76,7 @@ y en otros lugares con respecto al envío de parches del kernel de Linux.
> cualquier problema.
>
> 12) Ha sido probado con ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``,
> - ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> + ``CONFIG_SLUB_DEBUG``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``
> ``CONFIG_PROVE_RCU`` y ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` todos
> habilitados simultáneamente.
> diff --git a/Documentation/translations/zh_CN/process/submit-checklist.rst b/Documentation/translations/zh_CN/process/submit-checklist.rst
> index 0e524f1c1af5..18411b426122 100644
> --- a/Documentation/translations/zh_CN/process/submit-checklist.rst
> +++ b/Documentation/translations/zh_CN/process/submit-checklist.rst
> @@ -65,7 +65,7 @@ Linux内核补丁提交检查单
> :ref:`kernel-doc <kernel_doc_zh>` 并修复任何问题。
>
> 12) 通过以下选项同时启用的测试: ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``,
> - ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> + ``CONFIG_SLUB_DEBUG``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``,
> ``CONFIG_PROVE_RCU`` 和 ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` 。
>
> diff --git a/Documentation/translations/zh_TW/process/submit-checklist.rst b/Documentation/translations/zh_TW/process/submit-checklist.rst
> index a0cb91a6945f..06aa635a659c 100644
> --- a/Documentation/translations/zh_TW/process/submit-checklist.rst
> +++ b/Documentation/translations/zh_TW/process/submit-checklist.rst
> @@ -68,7 +68,7 @@ Linux內核補丁提交檢查單
> :ref:`kernel-doc <kernel_doc_zh>` 並修復任何問題。
>
> 12) 通過以下選項同時啓用的測試: ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``,
> - ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> + ``CONFIG_SLUB_DEBUG``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
> ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``,
> ``CONFIG_PROVE_RCU`` 和 ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` 。
>
^ permalink raw reply
* Re: [PATCH] docs/zh_CN: fix CONFIG_CONPAT typo for CONFIG_COMPAT
From: Dongliang Mu @ 2026-06-11 8:05 UTC (permalink / raw)
To: Ethan Nelson-Moore, Shuah Khan, Kees Cook, linux-doc
Cc: Alex Shi, Yanteng Si, Jonathan Corbet
In-Reply-To: <20260610231836.186610-1-enelsonmoore@gmail.com>
On 6/11/26 7:18 AM, Ethan Nelson-Moore wrote:
> The Simplified Chinese translation of security/self-protection.rst
> contains a typo CONFIG_CONPAT for CONFIG_COMPAT. Fix it.
Yes, it is a typo in the Chinese translation.
Please strip the following content from the commit message. Enriching the
paragraph above instead would be even better.
Dongliang Mu
>
> I don't speak Chinese, but I verified that CONFIG_COMPAT was what was
> intended via Google Translate.
>
> Discovered while searching for CONFIG_* symbols referenced in code but
> not defined in any Kconfig file.
>
> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
> ---
> Documentation/translations/zh_CN/security/self-protection.rst | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/translations/zh_CN/security/self-protection.rst b/Documentation/translations/zh_CN/security/self-protection.rst
> index 93de9cee5c1a..ad96bb4a4995 100644
> --- a/Documentation/translations/zh_CN/security/self-protection.rst
> +++ b/Documentation/translations/zh_CN/security/self-protection.rst
> @@ -97,7 +97,7 @@ ARCH_OPTIONAL_KERNEL_RWX时的默认设置。
> --------------------
>
> 对于64位系统,一种消除许多系统调用最简单的方法是构建时不启用
> -CONFIG_CONPAT。然而,这种情况通常不可行。
> +CONFIG_COMPAT。然而,这种情况通常不可行。
>
> “seccomp”系统为用户空间提供了一种可选功能,提供了一种减少可供
> 运行中进程使用内核入口点数量的方法。这限制了可以访问内核代码
^ permalink raw reply
* Re: [PATCH] Documentation: process: fix brackets
From: Geert Uytterhoeven @ 2026-06-11 8:01 UTC (permalink / raw)
To: Manuel Ebner
Cc: Jonathan Corbet, Shuah Khan, open list:DOCUMENTATION PROCESS,
open list:DOCUMENTATION, open list, Kees Cook,
Krzysztof Kozlowski
In-Reply-To: <20260611064311.117023-2-manuelebner@mailbox.org>
CC kees, krzk
On Thu, 11 Jun 2026 at 08:43, Manuel Ebner <manuelebner@mailbox.org> wrote:
>
> Fix missing ')' and needless ')'
>
> Signed-off-by: Manuel Ebner <manuelebner@mailbox.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> This is the first patch of a 'series', but I won't send them together
> because I'm still producing the patches and it will take me a couple weeks.
> Documentation/process/deprecated.rst | 2 +-
> Documentation/process/maintainer-soc.rst | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
> index ac75b7ecac47..03de71f654c7 100644
> --- a/Documentation/process/deprecated.rst
> +++ b/Documentation/process/deprecated.rst
> @@ -388,7 +388,7 @@ allocations. For example, these open coded assignments::
> ptr = kmalloc_array(count, sizeof(*ptr), gfp);
> ptr = kcalloc(count, sizeof(*ptr), gfp);
> ptr = kmalloc(struct_size(ptr, flex_member, count), gfp);
> - ptr = kmalloc(sizeof(struct foo, gfp);
> + ptr = kmalloc(sizeof(struct foo), gfp);
>
> become, respectively::
>
> diff --git a/Documentation/process/maintainer-soc.rst b/Documentation/process/maintainer-soc.rst
> index a3a90a7d4c68..fa91dfc53783 100644
> --- a/Documentation/process/maintainer-soc.rst
> +++ b/Documentation/process/maintainer-soc.rst
> @@ -60,7 +60,7 @@ All typical platform related patches should be sent via SoC submaintainers
> shared defconfigs. Note that scripts/get_maintainer.pl might not provide
> correct addresses for the shared defconfig, so ignore its output and manually
> create CC-list based on MAINTAINERS file or use something like
> -``scripts/get_maintainer.pl -f drivers/soc/FOO/``).
> +``scripts/get_maintainer.pl -f drivers/soc/FOO/``.
>
> Submitting Patches to the Main SoC Maintainers
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* [PATCH] Documentation: admin-guide: fix brackets and translation issue
From: Manuel Ebner @ 2026-06-11 7:55 UTC (permalink / raw)
To: Jonathan Corbet, Shuah Khan, Mauro Carvalho Chehab, linux-media,
open list:DOCUMENTATION, open list
Cc: Manuel Ebner
Add missing ] and replace 'neuer Name' with 'new Name'.
Signed-off-by: Manuel Ebner <manuelebner@mailbox.org>
---
Documentation/admin-guide/kernel-parameters.txt | 6 +++---
Documentation/admin-guide/media/bttv.rst | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 00375193bd26..17363d525ae3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6414,9 +6414,9 @@ Kernel parameters
reboot= [KNL]
Format (x86 or x86_64):
[w[arm] | c[old] | h[ard] | s[oft] | g[pio]] | d[efault] \
- [[,]s[mp]#### \
+ [[,]s[mp]####] \
[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
- [[,]f[orce]
+ [[,]f[orce]]
Where reboot_mode is one of warm (soft) or cold (hard) or gpio
(prefix with 'panic_' to set mode for panic
reboot only),
@@ -6917,7 +6917,7 @@ Kernel parameters
apic=verbose is specified.
Example: apic=debug show_lapic=all
- slab_debug[=options[,slabs][;[options[,slabs]]...] [MM]
+ slab_debug[=options[,slabs][;[options[,slabs]]...]] [MM]
Enabling slab_debug allows one to determine the
culprit if slab objects become corrupted. Enabling
slab_debug can create guard zones around objects and
diff --git a/Documentation/admin-guide/media/bttv.rst b/Documentation/admin-guide/media/bttv.rst
index 58cbaf6df694..78c3e560b806 100644
--- a/Documentation/admin-guide/media/bttv.rst
+++ b/Documentation/admin-guide/media/bttv.rst
@@ -1239,7 +1239,7 @@ Models:
- Galaxis DVB Card C CI
- Galaxis DVB Card S
- Galaxis DVB Card C
-- Galaxis plug.in S [neuer Name: Galaxis DVB Card S CI
+- Galaxis plug.in S [new Name: Galaxis DVB Card S CI]
Hauppauge
~~~~~~~~~
--
2.54.0
* [PATCH v3 2/3] hwmon: pmbus: Add support for Silergy SQ24860
From: Ziming Zhu @ 2026-06-11 7:43 UTC (permalink / raw)
To: Guenter Roeck
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
Shuah Khan, linux-hwmon, devicetree, linux-kernel, linux-doc,
Ziming Zhu
In-Reply-To: <20260611074335.4415-1-zmzhu0630@163.com>
From: Ziming Zhu <ziming.zhu@silergycorp.com>
Add PMBus hwmon support for the Silergy SQ24860 eFuse.
The driver reports input voltage, output voltage, auxiliary voltage,
input current, input power, and temperature. It also exposes peak,
average, and minimum history attributes, sample count configuration,
and maps the manufacturer-specific VIREF register to the generic input
over-current fault limit attribute.
The IMON resistor value is read from the silergy,rimon-micro-ohms device
property and used to configure the input current calibration gain.
Signed-off-by: Ziming Zhu <ziming.zhu@silergycorp.com>
---
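The gain calculation described above can be sketched in plain userspace C. This is only an illustration of the arithmetic in sq24860_write_iin_cal_gain() from this patch, not driver code: the helper name sq24860_iin_cal_gain_word() and the -1 error convention are assumptions made for the example, while the constants are copied from the patch.

```c
#include <stdint.h>

/* Copied from the patch: current monitor gain used in the denominator. */
#define SQ24860_GIMON 18180ULL

/*
 * Hypothetical userspace mirror of sq24860_write_iin_cal_gain(): returns the
 * 16-bit word the driver would write to IIN_CAL_GAIN (0x38), or -1 in the
 * cases where the driver returns -EINVAL.
 */
static long sq24860_iin_cal_gain_word(uint32_t rimon_micro_ohms)
{
	/* Same numerator as the patch: 6400 * 10^9 * 1000. */
	const uint64_t num = 6400ULL * 1000000000ULL * 1000ULL;
	uint64_t word;

	if (rimon_micro_ohms == 0)
		return -1;

	/* Truncating 64-bit division, like div64_u64() in the driver. */
	word = num / ((uint64_t)rimon_micro_ohms * SQ24860_GIMON);
	if (word == 0 || word > UINT16_MAX)
		return -1;

	return (long)word;
}
```

With the default of 1600000000 micro-ohms (1.6 kOhm) the helper yields 220. A very small resistor value such as 1000000 micro-ohms pushes the result past 16 bits, which is the case the driver rejects with -EINVAL.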
drivers/hwmon/pmbus/Kconfig | 19 ++
drivers/hwmon/pmbus/Makefile | 1 +
drivers/hwmon/pmbus/sq24860.c | 430 ++++++++++++++++++++++++++++++++++
3 files changed, 450 insertions(+)
create mode 100644 drivers/hwmon/pmbus/sq24860.c
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 8f4bff375ecb..a905b5af137c 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -612,6 +612,25 @@ config SENSORS_STEF48H28
This driver can also be built as a module. If so, the module will
be called stef48h28.
+config SENSORS_SQ24860
+ tristate "Silergy SQ24860"
+ help
+ If you say yes here you get hardware monitoring support for Silergy
+ SQ24860 eFuse.
+
+ This driver can also be built as a module. If so, the module will
+ be called sq24860.
+
+config SENSORS_SQ24860_REGULATOR
+ bool "Regulator support for SQ24860"
+ depends on SENSORS_SQ24860 && REGULATOR
+ default SENSORS_SQ24860
+ help
+ If you say yes here you get regulator support for Silergy SQ24860.
+ The regulator is registered through the PMBus regulator framework and
+ can be used to control the output exposed by the device.
+ This option is only useful if regulator framework support is needed.
+
config SENSORS_STPDDC60
tristate "ST STPDDC60"
help
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index 7129b62bc00f..86bc93c6c091 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_SENSORS_PM6764TR) += pm6764tr.o
obj-$(CONFIG_SENSORS_PXE1610) += pxe1610.o
obj-$(CONFIG_SENSORS_Q54SJ108A2) += q54sj108a2.o
obj-$(CONFIG_SENSORS_STEF48H28) += stef48h28.o
+obj-$(CONFIG_SENSORS_SQ24860) += sq24860.o
obj-$(CONFIG_SENSORS_STPDDC60) += stpddc60.o
obj-$(CONFIG_SENSORS_TDA38640) += tda38640.o
obj-$(CONFIG_SENSORS_TPS25990) += tps25990.o
diff --git a/drivers/hwmon/pmbus/sq24860.c b/drivers/hwmon/pmbus/sq24860.c
new file mode 100644
index 000000000000..f16f650ff7ba
--- /dev/null
+++ b/drivers/hwmon/pmbus/sq24860.c
@@ -0,0 +1,430 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Author: Ziming Zhu <ziming.zhu@silergycorp.com>
+ */
+
+#include <linux/bitfield.h>
+#include <linux/err.h>
+#include <linux/i2c.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/math64.h>
+
+#include "pmbus.h"
+
+#define SQ24860_IIN_CAL_GAIN 0x38
+#define SQ24860_READ_VAUX 0xd0
+#define SQ24860_READ_VIN_MIN 0xd1
+#define SQ24860_READ_VIN_PEAK 0xd2
+#define SQ24860_READ_IIN_PEAK 0xd4
+#define SQ24860_READ_PIN_PEAK 0xd5
+#define SQ24860_READ_TEMP_AVG 0xd6
+#define SQ24860_READ_TEMP_PEAK 0xd7
+#define SQ24860_READ_VOUT_MIN 0xda
+#define SQ24860_READ_VIN_AVG 0xdc
+#define SQ24860_READ_VOUT_AVG 0xdd
+#define SQ24860_READ_IIN_AVG 0xde
+#define SQ24860_READ_PIN_AVG 0xdf
+#define SQ24860_VIREF 0xe0
+#define SQ24860_PK_MIN_AVG 0xea
+#define PK_MIN_AVG_RST_PEAK BIT(7)
+#define PK_MIN_AVG_RST_AVG BIT(6)
+#define PK_MIN_AVG_RST_MIN BIT(5)
+#define PK_MIN_AVG_AVG_CNT GENMASK(2, 0)
+#define SQ24860_MFR_WRITE_PROTECT 0xf8
+#define SQ24860_UNLOCKED BIT(7)
+
+#define SQ24860_8B_SHIFT 2
+#define SQ24860_IIN_OCF_NUM 1000000
+#define SQ24860_IIN_OCF_DIV 129278
+#define SQ24860_IIN_OCF_OFF 165
+
+#define PK_MIN_AVG_RST_MASK (PK_MIN_AVG_RST_PEAK | \
+ PK_MIN_AVG_RST_AVG | \
+ PK_MIN_AVG_RST_MIN)
+#define SQ24860_MAX_SAMPLES BIT(FIELD_MAX(PK_MIN_AVG_AVG_CNT))
+/*
+ * Arbitrary default Rimon value: 1.6kOhm
+ */
+#define SQ24860_DEFAULT_RIMON 1600000000
+#define SQ24860_GIMON 18180
+
+#define SQ24860_VAUX_DIV 20
+
+static int sq24860_write_iin_cal_gain(struct i2c_client *client, u32 rimon)
+{
+ u64 temp = 6400ULL * 1000000000ULL * 1000ULL;
+ u64 denom;
+ u64 word;
+
+ if (!rimon)
+ return -EINVAL;
+
+ denom = (u64)rimon * SQ24860_GIMON;
+ word = div64_u64(temp, denom);
+ if (!word || word > U16_MAX)
+ return -EINVAL;
+
+ return i2c_smbus_write_word_data(client, SQ24860_IIN_CAL_GAIN,
+ (u16)word);
+}
+
+static int sq24860_mfr_write_protect_set(struct i2c_client *client,
+ u8 protect)
+{
+ u8 val;
+
+ switch (protect) {
+ case 0:
+ val = 0xa2;
+ break;
+ case PB_WP_ALL:
+ val = 0x0;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return pmbus_write_byte_data(client, -1, SQ24860_MFR_WRITE_PROTECT,
+ val);
+}
+
+static int sq24860_mfr_write_protect_get(struct i2c_client *client)
+{
+ int ret = pmbus_read_byte_data(client, -1, SQ24860_MFR_WRITE_PROTECT);
+
+ if (ret < 0)
+ return ret;
+
+ return (ret & SQ24860_UNLOCKED) ? 0 : PB_WP_ALL;
+}
+
+static int sq24860_read_word_data(struct i2c_client *client,
+ int page, int phase, int reg)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_VIRT_READ_VIN_MAX:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_VIN_PEAK);
+ break;
+
+ case PMBUS_VIRT_READ_VIN_MIN:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_VIN_MIN);
+ break;
+
+ case PMBUS_VIRT_READ_VIN_AVG:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_VIN_AVG);
+ break;
+
+ case PMBUS_VIRT_READ_VOUT_MIN:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_VOUT_MIN);
+ break;
+
+ case PMBUS_VIRT_READ_VOUT_AVG:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_VOUT_AVG);
+ break;
+
+ case PMBUS_VIRT_READ_IIN_AVG:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_IIN_AVG);
+ break;
+
+ case PMBUS_VIRT_READ_IIN_MAX:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_IIN_PEAK);
+ break;
+
+ case PMBUS_VIRT_READ_TEMP_AVG:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_TEMP_AVG);
+ break;
+
+ case PMBUS_VIRT_READ_TEMP_MAX:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_TEMP_PEAK);
+ break;
+
+ case PMBUS_VIRT_READ_PIN_AVG:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_PIN_AVG);
+ break;
+
+ case PMBUS_VIRT_READ_PIN_MAX:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_PIN_PEAK);
+ break;
+
+ case PMBUS_VIRT_READ_VMON:
+ ret = pmbus_read_word_data(client, page, phase,
+ SQ24860_READ_VAUX);
+ if (ret < 0)
+ break;
+ ret = DIV_ROUND_CLOSEST(ret, SQ24860_VAUX_DIV);
+ break;
+
+ case PMBUS_VIN_UV_WARN_LIMIT:
+ case PMBUS_VIN_UV_FAULT_LIMIT:
+ case PMBUS_VIN_OV_WARN_LIMIT:
+ case PMBUS_VIN_OV_FAULT_LIMIT:
+ case PMBUS_VOUT_UV_WARN_LIMIT:
+ case PMBUS_IIN_OC_WARN_LIMIT:
+ case PMBUS_OT_WARN_LIMIT:
+ case PMBUS_OT_FAULT_LIMIT:
+ case PMBUS_PIN_OP_WARN_LIMIT:
+ /*
+ * These registers provide an 8-bit value instead of a
+ * 10-bit one. Shifting the register value left by two bits
+ * is enough to make the sensor type conversion work, even
+ * if the datasheet provides different m, b and R for
+ * those.
+ */
+ ret = pmbus_read_word_data(client, page, phase, reg);
+ if (ret < 0)
+ break;
+ ret <<= SQ24860_8B_SHIFT;
+ break;
+
+ case PMBUS_IIN_OC_FAULT_LIMIT:
+ /*
+ * VIREF directly sets the over-current limit at which the eFuse
+ * will turn the FET off and trigger a fault. Expose it through
+ * this generic property instead of a manufacturer specific one.
+ */
+ ret = pmbus_read_byte_data(client, page, SQ24860_VIREF);
+ if (ret < 0)
+ break;
+ ret = DIV_ROUND_CLOSEST(ret * SQ24860_IIN_OCF_NUM,
+ SQ24860_IIN_OCF_DIV);
+ ret += SQ24860_IIN_OCF_OFF;
+ break;
+
+ case PMBUS_VIRT_SAMPLES:
+ ret = pmbus_read_byte_data(client, page, SQ24860_PK_MIN_AVG);
+ if (ret < 0)
+ break;
+ ret = BIT(FIELD_GET(PK_MIN_AVG_AVG_CNT, ret));
+ break;
+
+ case PMBUS_VIRT_RESET_TEMP_HISTORY:
+ case PMBUS_VIRT_RESET_VIN_HISTORY:
+ case PMBUS_VIRT_RESET_IIN_HISTORY:
+ case PMBUS_VIRT_RESET_PIN_HISTORY:
+ case PMBUS_VIRT_RESET_VOUT_HISTORY:
+ ret = 0;
+ break;
+
+ default:
+ ret = -ENODATA;
+ break;
+ }
+
+ return ret;
+}
+
+static int sq24860_write_word_data(struct i2c_client *client,
+ int page, int reg, u16 value)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_VIN_UV_WARN_LIMIT:
+ case PMBUS_VIN_UV_FAULT_LIMIT:
+ case PMBUS_VIN_OV_WARN_LIMIT:
+ case PMBUS_VIN_OV_FAULT_LIMIT:
+ case PMBUS_VOUT_UV_WARN_LIMIT:
+ case PMBUS_IIN_OC_WARN_LIMIT:
+ case PMBUS_OT_WARN_LIMIT:
+ case PMBUS_OT_FAULT_LIMIT:
+ case PMBUS_PIN_OP_WARN_LIMIT:
+ value >>= SQ24860_8B_SHIFT;
+ value = clamp_val(value, 0, 0xff);
+ ret = pmbus_write_word_data(client, page, reg, value);
+ break;
+
+ case PMBUS_IIN_OC_FAULT_LIMIT:
+ if (value < SQ24860_IIN_OCF_OFF)
+ return -EINVAL;
+ value -= SQ24860_IIN_OCF_OFF;
+ value = DIV_ROUND_CLOSEST(((unsigned int)value) * SQ24860_IIN_OCF_DIV,
+ SQ24860_IIN_OCF_NUM);
+ value = clamp_val(value, 0, 0x3f);
+ ret = pmbus_write_byte_data(client, page, SQ24860_VIREF, value);
+ break;
+
+ case PMBUS_VIRT_SAMPLES:
+ value = clamp_val(value, 1, SQ24860_MAX_SAMPLES);
+ value = ilog2(value);
+ ret = pmbus_update_byte_data(client, page, SQ24860_PK_MIN_AVG,
+ PK_MIN_AVG_AVG_CNT,
+ FIELD_PREP(PK_MIN_AVG_AVG_CNT, value));
+ break;
+
+ case PMBUS_VIRT_RESET_TEMP_HISTORY:
+ case PMBUS_VIRT_RESET_VIN_HISTORY:
+ case PMBUS_VIRT_RESET_IIN_HISTORY:
+ case PMBUS_VIRT_RESET_PIN_HISTORY:
+ case PMBUS_VIRT_RESET_VOUT_HISTORY:
+ /*
+ * SQ24860 has history resets based on MIN/AVG/PEAK instead of per
+ * sensor type. Exposing this quirk in hwmon is not desirable so
+ * reset MIN, AVG and PEAK together. Even if there is effectively only
+ * one reset, which resets everything, expose the 5 entries so
+ * userspace is not required to map a sensor type to another to trigger
+ * a reset.
+ */
+ ret = pmbus_update_byte_data(client, 0, SQ24860_PK_MIN_AVG,
+ PK_MIN_AVG_RST_MASK,
+ PK_MIN_AVG_RST_MASK);
+ break;
+
+ default:
+ ret = -ENODATA;
+ break;
+ }
+
+ return ret;
+}
+
+static int sq24860_read_byte_data(struct i2c_client *client,
+ int page, int reg)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_WRITE_PROTECT:
+ ret = sq24860_mfr_write_protect_get(client);
+ break;
+
+ default:
+ ret = -ENODATA;
+ break;
+ }
+
+ return ret;
+}
+
+static int sq24860_write_byte_data(struct i2c_client *client,
+ int page, int reg, u8 byte)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_WRITE_PROTECT:
+ ret = sq24860_mfr_write_protect_set(client, byte);
+ break;
+
+ default:
+ ret = -ENODATA;
+ break;
+ }
+
+ return ret;
+}
+
+#if IS_ENABLED(CONFIG_SENSORS_SQ24860_REGULATOR)
+static const struct regulator_desc sq24860_reg_desc[] = {
+ PMBUS_REGULATOR_ONE_NODE("vout"),
+};
+#endif
+
+static const struct pmbus_driver_info sq24860_base_info = {
+ .pages = 1,
+ .format[PSC_VOLTAGE_IN] = direct,
+ .m[PSC_VOLTAGE_IN] = 64,
+ .b[PSC_VOLTAGE_IN] = 0,
+ .R[PSC_VOLTAGE_IN] = 0,
+ .format[PSC_VOLTAGE_OUT] = direct,
+ .m[PSC_VOLTAGE_OUT] = 64,
+ .b[PSC_VOLTAGE_OUT] = 0,
+ .R[PSC_VOLTAGE_OUT] = 0,
+ .format[PSC_TEMPERATURE] = direct,
+ .m[PSC_TEMPERATURE] = 1,
+ .b[PSC_TEMPERATURE] = 0,
+ .R[PSC_TEMPERATURE] = 0,
+/*
+ * Current and power measurements depend on the calibration gain
+ * programmed from the board-specific IMON resistor value.
+ */
+ .format[PSC_CURRENT_IN] = direct,
+ .m[PSC_CURRENT_IN] = 16,
+ .b[PSC_CURRENT_IN] = 0,
+ .R[PSC_CURRENT_IN] = 0,
+ .format[PSC_POWER] = direct,
+ .m[PSC_POWER] = 2,
+ .b[PSC_POWER] = 0,
+ .R[PSC_POWER] = 0,
+ .func[0] = PMBUS_HAVE_VIN |
+ PMBUS_HAVE_VOUT |
+ PMBUS_HAVE_VMON |
+ PMBUS_HAVE_IIN |
+ PMBUS_HAVE_PIN |
+ PMBUS_HAVE_TEMP |
+ PMBUS_HAVE_STATUS_VOUT |
+ PMBUS_HAVE_STATUS_IOUT |
+ PMBUS_HAVE_STATUS_INPUT |
+ PMBUS_HAVE_STATUS_TEMP |
+ PMBUS_HAVE_SAMPLES,
+ .read_word_data = sq24860_read_word_data,
+ .write_word_data = sq24860_write_word_data,
+ .read_byte_data = sq24860_read_byte_data,
+ .write_byte_data = sq24860_write_byte_data,
+
+#if IS_ENABLED(CONFIG_SENSORS_SQ24860_REGULATOR)
+ .reg_desc = sq24860_reg_desc,
+ .num_regulators = ARRAY_SIZE(sq24860_reg_desc),
+#endif
+};
+
+static const struct i2c_device_id sq24860_i2c_id[] = {
+ { "sq24860" },
+ {}
+};
+MODULE_DEVICE_TABLE(i2c, sq24860_i2c_id);
+
+static const struct of_device_id sq24860_of_match[] = {
+ { .compatible = "silergy,sq24860" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, sq24860_of_match);
+
+static int sq24860_probe(struct i2c_client *client)
+{
+ struct device *dev = &client->dev;
+ struct pmbus_driver_info *info;
+ u32 rimon;
+ int ret;
+
+ if (device_property_read_u32(dev, "silergy,rimon-micro-ohms", &rimon))
+ rimon = SQ24860_DEFAULT_RIMON;
+ ret = sq24860_write_iin_cal_gain(client, rimon);
+ if (ret < 0)
+ return dev_err_probe(&client->dev, ret,
+ "Failed to set gain\n");
+ info = devm_kmemdup(dev, &sq24860_base_info, sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+
+ return pmbus_do_probe(client, info);
+}
+
+static struct i2c_driver sq24860_driver = {
+ .driver = {
+ .name = "sq24860",
+ .of_match_table = sq24860_of_match,
+ },
+ .probe = sq24860_probe,
+ .id_table = sq24860_i2c_id,
+};
+module_i2c_driver(sq24860_driver);
+
+MODULE_AUTHOR("Ziming Zhu <ziming.zhu@silergycorp.com>");
+MODULE_DESCRIPTION("PMBUS driver for SQ24860 eFuse");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("PMBUS");
--
2.25.1
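The VIREF handling for PMBUS_IIN_OC_FAULT_LIMIT in the patch above is a pair of inverse scalings. The following userspace sketch mirrors that arithmetic so the round trip can be checked off-target. The helper names viref_to_limit() and limit_to_viref() and the -1 error convention are assumptions made for the example; the constants are copied from the patch, and the values are the raw words the driver callbacks exchange with the PMBus core.

```c
#include <stdint.h>

/* Constants copied from the patch. */
#define SQ24860_IIN_OCF_NUM 1000000ULL
#define SQ24860_IIN_OCF_DIV 129278ULL
#define SQ24860_IIN_OCF_OFF 165U
#define SQ24860_VIREF_MAX   0x3fU

/* Unsigned equivalent of the kernel's DIV_ROUND_CLOSEST(). */
static uint64_t div_round_closest(uint64_t x, uint64_t d)
{
	return (x + d / 2) / d;
}

/* VIREF register code -> word reported for PMBUS_IIN_OC_FAULT_LIMIT. */
static unsigned int viref_to_limit(uint8_t code)
{
	return (unsigned int)div_round_closest(code * SQ24860_IIN_OCF_NUM,
					       SQ24860_IIN_OCF_DIV) +
	       SQ24860_IIN_OCF_OFF;
}

/*
 * Requested limit word -> VIREF code, clamped to 6 bits as in the patch.
 * Returns -1 where the driver returns -EINVAL (value below the offset).
 */
static int limit_to_viref(uint16_t value)
{
	uint64_t code;

	if (value < SQ24860_IIN_OCF_OFF)
		return -1;

	/* Assumption: 64-bit product here, the patch uses unsigned int. */
	code = div_round_closest((uint64_t)(value - SQ24860_IIN_OCF_OFF) *
				 SQ24860_IIN_OCF_DIV, SQ24860_IIN_OCF_NUM);

	return code > SQ24860_VIREF_MAX ? (int)SQ24860_VIREF_MAX : (int)code;
}
```

For example, viref_to_limit(0x3f) gives 652 and limit_to_viref(652) gives 0x3f back. The sketch deliberately widens the multiplication to 64 bits: with a 32-bit unsigned int, the product (value - 165) * 129278 can exceed 2^32 for large inputs, so it may be worth checking whether such values can reach the driver callback.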