* Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier
From: Petr Mladek @ 2022-05-16 14:56 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx, vgoyal,
vkuznets, will
In-Reply-To: <20220427224924.592546-26-gpiccoli@igalia.com>
On Wed 2022-04-27 19:49:19, Guilherme G. Piccoli wrote:
> Currently the parameter "panic_print" relies in a function called
> directly on panic path; one of the flags the users can set for
> panic_print triggers a console replay mechanism, to show the
> entire kernel log buffer (from the beginning) in a panic event.
>
> Two problems with that: the dual nature of the panic_print
> isn't really appropriate, the function was originally meant
> to allow users dumping system information on panic events,
> and was "overridden" to also force a console flush of the full
> kernel log buffer. It also turns the code a bit more complex
> and duplicate than it needs to be.
>
> This patch proposes 2 changes: first, we decouple panic_print
> from the console flushing mechanism, in the form of a new kernel
> core parameter (panic_console_replay); we kept the functionality
> on panic_print to avoid userspace breakage, although we comment
> in both code and documentation that this panic_print usage is
> deprecated.
>
> We converted panic_print function to a panic notifier too, adding
> it on the panic informational notifier list, executed as the final
> callback. This allows a more clear code and makes sense, as
> panic_print_sys_info() is really a panic-time only function.
> We also moved its code to kernel/printk.c, it seems to make more
> sense given it's related to printing stuff.
I really like both changes. Just please split it them into two
patchset. I mean one patch for the new "panic_console_replay"
parameter and 2nd that moves "printk_info" into the notifier.
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
From: Petr Mladek @ 2022-05-16 14:50 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: Steven Rostedt, akpm, bhe, kexec, linux-kernel,
bcm-kernel-feedback-list, linuxppc-dev, linux-alpha, linux-edac,
linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
linux-remoteproc, linux-s390, linux-tegra, linux-um, linux-xtensa,
netdev, openipmi-developer, rcu, sparclinux, xen-devel, x86,
kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
senozhatsky, stern, tglx, vgoyal, vkuznets, will
In-Reply-To: <c8818906-f113-82b6-b58b-d47ae0c16b4f@igalia.com>
On Wed 2022-05-11 17:03:51, Guilherme G. Piccoli wrote:
> On 10/05/2022 14:40, Steven Rostedt wrote:
> > On Wed, 27 Apr 2022 19:49:17 -0300
> > "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> >
> >> Currently we don't have a way to check if there are dumpers set,
> >> except counting the list members maybe. This patch introduces a very
> >> simple helper to provide this information, by just keeping track of
> >> registered/unregistered kmsg dumpers. It's going to be used on the
> >> panic path in the subsequent patch.
> >
> > FYI, it is considered "bad form" to reference in the change log "this
> > patch". We know this is a patch. The change log should just talk about what
> > is being done. So can you reword your change logs (you do this is almost
> > every patch). Here's what I would reword the above to be:
> >
> > Currently we don't have a way to check if there are dumpers set, except
> > perhaps by counting the list members. Introduce a very simple helper to
> > provide this information, by just keeping track of registered/unregistered
> > kmsg dumpers. This will simplify the refactoring of the panic path.
>
> Thanks for the hint, you're right - it's almost in all of my patches.
> I'll reword all of them (except the ones already merged) to remove this
> "bad form".
Shame on me that I do not care that much about the style of the commit
message :-)
Anyway, the code looks good to me. With the better commit message:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH net-next v3 00/10] UDP/IPv6 refactoring
From: David Ahern @ 2022-05-16 14:47 UTC (permalink / raw)
To: Paolo Abeni, Pavel Begunkov, netdev, David S . Miller,
Jakub Kicinski
Cc: Eric Dumazet, linux-kernel
In-Reply-To: <b9025eb4d8a1efefbcd04013cbe8e55e98ef66e1.camel@redhat.com>
On 5/16/22 7:48 AM, Paolo Abeni wrote:
> Hello,
>
> On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
>> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
>> cleaner than it was before and the series also removes a bunch of instructions
>> and other overhead from the hot path positively affecting performance.
>>
>> Testing over dummy netdev with 16 byte packets yields 2240481 tx/s,
>> comparing to 2203417 tx/s previously, which is around +1.6%
>
> I personally feel that some patches in this series have a relevant
> chance of introducing functional regressions and e.g. syzbot will not
> help to catch them. That risk is IMHO relevant considered that the
> performance gain here looks quite limited.
>
> There are a few individual changes that IMHO looks like nice cleanup
> e.g. patch 5, 6, 8, 9 and possibly even patch 1.
>
> I suggest to reduce the patchset scope to them.
>
I agree with that sentiment. The set also needs testcases that captures
the various permutations.
^ permalink raw reply
* [PATCH net v6 2/2] ptp: ptp_clockmatrix: return -EBUSY if phase pull-in is in progress
From: Min Li @ 2022-05-16 14:47 UTC (permalink / raw)
To: richardcochran, lee.jones; +Cc: linux-kernel, netdev, Min Li
In-Reply-To: <1652712427-14703-1-git-send-email-min.li.xe@renesas.com>
Also removes PEROUT_ENABLE_OUTPUT_MASK
Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
---
drivers/ptp/ptp_clockmatrix.c | 32 ++------------------------------
drivers/ptp/ptp_clockmatrix.h | 2 --
2 files changed, 2 insertions(+), 32 deletions(-)
diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c
index 2507576..cb258e1 100644
--- a/drivers/ptp/ptp_clockmatrix.c
+++ b/drivers/ptp/ptp_clockmatrix.c
@@ -1352,43 +1352,15 @@ static int idtcm_output_enable(struct idtcm_channel *channel,
return idtcm_write(idtcm, (u16)base, OUT_CTRL_1, &val, sizeof(val));
}
-static int idtcm_output_mask_enable(struct idtcm_channel *channel,
- bool enable)
-{
- u16 mask;
- int err;
- u8 outn;
-
- mask = channel->output_mask;
- outn = 0;
-
- while (mask) {
- if (mask & 0x1) {
- err = idtcm_output_enable(channel, enable, outn);
- if (err)
- return err;
- }
-
- mask >>= 0x1;
- outn++;
- }
-
- return 0;
-}
-
static int idtcm_perout_enable(struct idtcm_channel *channel,
struct ptp_perout_request *perout,
bool enable)
{
struct idtcm *idtcm = channel->idtcm;
- unsigned int flags = perout->flags;
struct timespec64 ts = {0, 0};
int err;
- if (flags == PEROUT_ENABLE_OUTPUT_MASK)
- err = idtcm_output_mask_enable(channel, enable);
- else
- err = idtcm_output_enable(channel, enable, perout->index);
+ err = idtcm_output_enable(channel, enable, perout->index);
if (err) {
dev_err(idtcm->dev, "Unable to set output enable");
@@ -1892,7 +1864,7 @@ static int idtcm_adjtime(struct ptp_clock_info *ptp, s64 delta)
int err;
if (channel->phase_pull_in == true)
- return 0;
+ return -EBUSY;
mutex_lock(idtcm->lock);
diff --git a/drivers/ptp/ptp_clockmatrix.h b/drivers/ptp/ptp_clockmatrix.h
index 4379650..bf1e49409 100644
--- a/drivers/ptp/ptp_clockmatrix.h
+++ b/drivers/ptp/ptp_clockmatrix.h
@@ -54,8 +54,6 @@
#define LOCK_TIMEOUT_MS (2000)
#define LOCK_POLL_INTERVAL_MS (10)
-#define PEROUT_ENABLE_OUTPUT_MASK (0xdeadbeef)
-
#define IDTCM_MAX_WRITE_COUNT (512)
#define PHASE_PULL_IN_MAX_PPB (144000)
--
2.7.4
^ permalink raw reply related
* [PATCH net v6 1/2] ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS support
From: Min Li @ 2022-05-16 14:47 UTC (permalink / raw)
To: richardcochran, lee.jones; +Cc: linux-kernel, netdev, Min Li
Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY
for gettime and settime exclusively. Before this change,
TOD_READ_PRIMARY was used for both extts and gettime/settime,
which would result in changing TOD read/write triggers between
operations. Using TOD_READ_SECONDARY would make extts
independent of gettime/settime operation
Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
---
-use div helpers to do 64b division
-change comments to comply with kernel-doc format -Fix Jakub comments
-add more description to address the user problem
-improvements suggested by Jakub
drivers/ptp/ptp_clockmatrix.c | 289 ++++++++++++++++++++++++---------------
drivers/ptp/ptp_clockmatrix.h | 5 +
include/linux/mfd/idt8a340_reg.h | 12 +-
3 files changed, 196 insertions(+), 110 deletions(-)
diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c
index 08e429a..2507576 100644
--- a/drivers/ptp/ptp_clockmatrix.c
+++ b/drivers/ptp/ptp_clockmatrix.c
@@ -239,73 +239,97 @@ static int wait_for_boot_status_ready(struct idtcm *idtcm)
return -EBUSY;
}
-static int _idtcm_set_scsr_read_trig(struct idtcm_channel *channel,
- enum scsr_read_trig_sel trig, u8 ref)
+static int arm_tod_read_trig_sel_refclk(struct idtcm_channel *channel, u8 ref)
{
struct idtcm *idtcm = channel->idtcm;
- u16 tod_read_cmd = IDTCM_FW_REG(idtcm->fw_ver, V520, TOD_READ_PRIMARY_CMD);
- u8 val;
+ u16 tod_read_cmd = IDTCM_FW_REG(idtcm->fw_ver, V520, TOD_READ_SECONDARY_CMD);
+ u8 val = 0;
int err;
- if (trig == SCSR_TOD_READ_TRIG_SEL_REFCLK) {
- err = idtcm_read(idtcm, channel->tod_read_primary,
- TOD_READ_PRIMARY_SEL_CFG_0, &val, sizeof(val));
- if (err)
- return err;
-
- val &= ~(WR_REF_INDEX_MASK << WR_REF_INDEX_SHIFT);
- val |= (ref << WR_REF_INDEX_SHIFT);
-
- err = idtcm_write(idtcm, channel->tod_read_primary,
- TOD_READ_PRIMARY_SEL_CFG_0, &val, sizeof(val));
- if (err)
- return err;
- }
+ val &= ~(WR_REF_INDEX_MASK << WR_REF_INDEX_SHIFT);
+ val |= (ref << WR_REF_INDEX_SHIFT);
- err = idtcm_read(idtcm, channel->tod_read_primary,
- tod_read_cmd, &val, sizeof(val));
+ err = idtcm_write(idtcm, channel->tod_read_secondary,
+ TOD_READ_SECONDARY_SEL_CFG_0, &val, sizeof(val));
if (err)
return err;
- val &= ~(TOD_READ_TRIGGER_MASK << TOD_READ_TRIGGER_SHIFT);
- val |= (trig << TOD_READ_TRIGGER_SHIFT);
- val &= ~TOD_READ_TRIGGER_MODE; /* single shot */
+ val = 0 | (SCSR_TOD_READ_TRIG_SEL_REFCLK << TOD_READ_TRIGGER_SHIFT);
+
+ err = idtcm_write(idtcm, channel->tod_read_secondary, tod_read_cmd,
+ &val, sizeof(val));
+ if (err)
+ dev_err(idtcm->dev, "%s: err = %d", __func__, err);
- err = idtcm_write(idtcm, channel->tod_read_primary,
- tod_read_cmd, &val, sizeof(val));
return err;
}
-static int idtcm_enable_extts(struct idtcm_channel *channel, u8 todn, u8 ref,
- bool enable)
+static bool is_single_shot(u8 mask)
{
- struct idtcm *idtcm = channel->idtcm;
- u8 old_mask = idtcm->extts_mask;
- u8 mask = 1 << todn;
+ /* Treat single bit ToD masks as continuous trigger */
+ return mask <= 8 && is_power_of_2(mask);
+}
+
+static int idtcm_extts_enable(struct idtcm_channel *channel,
+ struct ptp_clock_request *rq, int on)
+{
+ u8 index = rq->extts.index;
+ struct idtcm *idtcm;
+ u8 mask = 1 << index;
int err = 0;
+ u8 old_mask;
+ int ref;
+
+ idtcm = channel->idtcm;
+ old_mask = idtcm->extts_mask;
- if (todn >= MAX_TOD)
+ /* Reject requests with unsupported flags */
+ if (rq->extts.flags & ~(PTP_ENABLE_FEATURE |
+ PTP_RISING_EDGE |
+ PTP_FALLING_EDGE |
+ PTP_STRICT_FLAGS))
+ return -EOPNOTSUPP;
+
+ /* Reject requests to enable time stamping on falling edge */
+ if ((rq->extts.flags & PTP_ENABLE_FEATURE) &&
+ (rq->extts.flags & PTP_FALLING_EDGE))
+ return -EOPNOTSUPP;
+
+ if (index >= MAX_TOD)
return -EINVAL;
- if (enable) {
- if (ref > 0xF) /* E_REF_CLK15 */
- return -EINVAL;
- if (idtcm->extts_mask & mask)
- return 0;
- err = _idtcm_set_scsr_read_trig(&idtcm->channel[todn],
- SCSR_TOD_READ_TRIG_SEL_REFCLK,
- ref);
+ if (on) {
+ /* Support triggering more than one TOD_0/1/2/3 by same pin */
+ /* Use the pin configured for the channel */
+ ref = ptp_find_pin(channel->ptp_clock, PTP_PF_EXTTS, channel->tod);
+
+ if (ref < 0) {
+ dev_err(idtcm->dev, "%s: No valid pin found for TOD%d!\n",
+ __func__, channel->tod);
+ return -EBUSY;
+ }
+
+ err = arm_tod_read_trig_sel_refclk(&idtcm->channel[index], ref);
+
if (err == 0) {
idtcm->extts_mask |= mask;
- idtcm->event_channel[todn] = channel;
- idtcm->channel[todn].refn = ref;
+ idtcm->event_channel[index] = channel;
+ idtcm->channel[index].refn = ref;
+ idtcm->extts_single_shot = is_single_shot(idtcm->extts_mask);
+
+ if (old_mask)
+ return 0;
+
+ schedule_delayed_work(&idtcm->extts_work,
+ msecs_to_jiffies(EXTTS_PERIOD_MS));
}
- } else
+ } else {
idtcm->extts_mask &= ~mask;
+ idtcm->extts_single_shot = is_single_shot(idtcm->extts_mask);
- if (old_mask == 0 && idtcm->extts_mask)
- schedule_delayed_work(&idtcm->extts_work,
- msecs_to_jiffies(EXTTS_PERIOD_MS));
+ if (idtcm->extts_mask == 0)
+ cancel_delayed_work(&idtcm->extts_work);
+ }
return err;
}
@@ -371,6 +395,31 @@ static void wait_for_chip_ready(struct idtcm *idtcm)
"Continuing while SYS APLL/DPLL is not locked");
}
+static int _idtcm_gettime_triggered(struct idtcm_channel *channel,
+ struct timespec64 *ts)
+{
+ struct idtcm *idtcm = channel->idtcm;
+ u16 tod_read_cmd = IDTCM_FW_REG(idtcm->fw_ver, V520, TOD_READ_SECONDARY_CMD);
+ u8 buf[TOD_BYTE_COUNT];
+ u8 trigger;
+ int err;
+
+ err = idtcm_read(idtcm, channel->tod_read_secondary,
+ tod_read_cmd, &trigger, sizeof(trigger));
+ if (err)
+ return err;
+
+ if (trigger & TOD_READ_TRIGGER_MASK)
+ return -EBUSY;
+
+ err = idtcm_read(idtcm, channel->tod_read_secondary,
+ TOD_READ_SECONDARY_BASE, buf, sizeof(buf));
+ if (err)
+ return err;
+
+ return char_array_to_timespec(buf, sizeof(buf), ts);
+}
+
static int _idtcm_gettime(struct idtcm_channel *channel,
struct timespec64 *ts, u8 timeout)
{
@@ -396,7 +445,7 @@ static int _idtcm_gettime(struct idtcm_channel *channel,
} while (trigger & TOD_READ_TRIGGER_MASK);
err = idtcm_read(idtcm, channel->tod_read_primary,
- TOD_READ_PRIMARY, buf, sizeof(buf));
+ TOD_READ_PRIMARY_BASE, buf, sizeof(buf));
if (err)
return err;
@@ -415,67 +464,38 @@ static int idtcm_extts_check_channel(struct idtcm *idtcm, u8 todn)
extts_channel = &idtcm->channel[todn];
ptp_channel = idtcm->event_channel[todn];
+
if (extts_channel == ptp_channel)
dco_delay = ptp_channel->dco_delay;
- err = _idtcm_gettime(extts_channel, &ts, 1);
- if (err == 0) {
- event.type = PTP_CLOCK_EXTTS;
- event.index = todn;
- event.timestamp = timespec64_to_ns(&ts) - dco_delay;
- ptp_clock_event(ptp_channel->ptp_clock, &event);
- }
- return err;
-}
-
-static u8 idtcm_enable_extts_mask(struct idtcm_channel *channel,
- u8 extts_mask, bool enable)
-{
- struct idtcm *idtcm = channel->idtcm;
- int i, err;
+ err = _idtcm_gettime_triggered(extts_channel, &ts);
+ if (err)
+ return err;
- for (i = 0; i < MAX_TOD; i++) {
- u8 mask = 1 << i;
- u8 refn = idtcm->channel[i].refn;
-
- if (extts_mask & mask) {
- /* check extts before disabling it */
- if (enable == false) {
- err = idtcm_extts_check_channel(idtcm, i);
- /* trigger happened so we won't re-enable it */
- if (err == 0)
- extts_mask &= ~mask;
- }
- (void)idtcm_enable_extts(channel, i, refn, enable);
- }
- }
+ /* Triggered - save timestamp */
+ event.type = PTP_CLOCK_EXTTS;
+ event.index = todn;
+ event.timestamp = timespec64_to_ns(&ts) - dco_delay;
+ ptp_clock_event(ptp_channel->ptp_clock, &event);
- return extts_mask;
+ return err;
}
static int _idtcm_gettime_immediate(struct idtcm_channel *channel,
struct timespec64 *ts)
{
struct idtcm *idtcm = channel->idtcm;
- u8 extts_mask = 0;
- int err;
- /* Disable extts */
- if (idtcm->extts_mask) {
- extts_mask = idtcm_enable_extts_mask(channel, idtcm->extts_mask,
- false);
- }
-
- err = _idtcm_set_scsr_read_trig(channel,
- SCSR_TOD_READ_TRIG_SEL_IMMEDIATE, 0);
- if (err == 0)
- err = _idtcm_gettime(channel, ts, 10);
+ u16 tod_read_cmd = IDTCM_FW_REG(idtcm->fw_ver, V520, TOD_READ_PRIMARY_CMD);
+ u8 val = (SCSR_TOD_READ_TRIG_SEL_IMMEDIATE << TOD_READ_TRIGGER_SHIFT);
+ int err;
- /* Re-enable extts */
- if (extts_mask)
- idtcm_enable_extts_mask(channel, extts_mask, true);
+ err = idtcm_write(idtcm, channel->tod_read_primary,
+ tod_read_cmd, &val, sizeof(val));
+ if (err)
+ return err;
- return err;
+ return _idtcm_gettime(channel, ts, 10);
}
static int _sync_pll_output(struct idtcm *idtcm,
@@ -1702,6 +1722,9 @@ static int initialize_dco_operating_mode(struct idtcm_channel *channel)
/*
* Maximum absolute value for write phase offset in picoseconds
*
+ * @channel: channel
+ * @delta_ns: delta in nanoseconds
+ *
* Destination signed register is 32-bit register in resolution of 50ps
*
* 0x7fffffff * 50 = 2147483647 * 50 = 107374182350
@@ -1958,8 +1981,7 @@ static int idtcm_enable(struct ptp_clock_info *ptp,
err = idtcm_perout_enable(channel, &rq->perout, true);
break;
case PTP_CLK_REQ_EXTTS:
- err = idtcm_enable_extts(channel, rq->extts.index,
- rq->extts.rsv[0], on);
+ err = idtcm_extts_enable(channel, rq, on);
break;
default:
break;
@@ -1982,13 +2004,6 @@ static int idtcm_enable_tod(struct idtcm_channel *channel)
u8 cfg;
int err;
- /* STEELAI-366 - Temporary workaround for ts2phc compatibility */
- if (0) {
- err = idtcm_output_mask_enable(channel, false);
- if (err)
- return err;
- }
-
/*
* Start the TOD clock ticking.
*/
@@ -2038,17 +2053,35 @@ static void idtcm_set_version_info(struct idtcm *idtcm)
product_id, hw_rev_id, config_select);
}
+static int idtcm_verify_pin(struct ptp_clock_info *ptp, unsigned int pin,
+ enum ptp_pin_function func, unsigned int chan)
+{
+ switch (func) {
+ case PTP_PF_NONE:
+ case PTP_PF_EXTTS:
+ break;
+ case PTP_PF_PEROUT:
+ case PTP_PF_PHYSYNC:
+ return -1;
+ }
+ return 0;
+}
+
+static struct ptp_pin_desc pin_config[MAX_TOD][MAX_REF_CLK];
+
static const struct ptp_clock_info idtcm_caps = {
.owner = THIS_MODULE,
.max_adj = 244000,
.n_per_out = 12,
.n_ext_ts = MAX_TOD,
+ .n_pins = MAX_REF_CLK,
.adjphase = &idtcm_adjphase,
.adjfine = &idtcm_adjfine,
.adjtime = &idtcm_adjtime,
.gettime64 = &idtcm_gettime,
.settime64 = &idtcm_settime,
.enable = &idtcm_enable,
+ .verify = &idtcm_verify_pin,
.do_aux_work = &idtcm_work_handler,
};
@@ -2057,12 +2090,14 @@ static const struct ptp_clock_info idtcm_caps_deprecated = {
.max_adj = 244000,
.n_per_out = 12,
.n_ext_ts = MAX_TOD,
+ .n_pins = MAX_REF_CLK,
.adjphase = &idtcm_adjphase,
.adjfine = &idtcm_adjfine,
.adjtime = &idtcm_adjtime_deprecated,
.gettime64 = &idtcm_gettime,
.settime64 = &idtcm_settime_deprecated,
.enable = &idtcm_enable,
+ .verify = &idtcm_verify_pin,
.do_aux_work = &idtcm_work_handler,
};
@@ -2174,8 +2209,9 @@ static u32 idtcm_get_dco_delay(struct idtcm_channel *channel)
n = 1;
fodFreq = (u32)div_u64(m, n);
+
if (fodFreq >= 500000000)
- return 18 * (u32)div_u64(NSEC_PER_SEC, fodFreq);
+ return (u32)div_u64(18 * (u64)NSEC_PER_SEC, fodFreq);
return 0;
}
@@ -2188,24 +2224,28 @@ static int configure_channel_tod(struct idtcm_channel *channel, u32 index)
switch (index) {
case 0:
channel->tod_read_primary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_PRIMARY_0);
+ channel->tod_read_secondary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_SECONDARY_0);
channel->tod_write = IDTCM_FW_REG(fw_ver, V520, TOD_WRITE_0);
channel->tod_n = IDTCM_FW_REG(fw_ver, V520, TOD_0);
channel->sync_src = SYNC_SOURCE_DPLL0_TOD_PPS;
break;
case 1:
channel->tod_read_primary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_PRIMARY_1);
+ channel->tod_read_secondary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_SECONDARY_1);
channel->tod_write = IDTCM_FW_REG(fw_ver, V520, TOD_WRITE_1);
channel->tod_n = IDTCM_FW_REG(fw_ver, V520, TOD_1);
channel->sync_src = SYNC_SOURCE_DPLL1_TOD_PPS;
break;
case 2:
channel->tod_read_primary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_PRIMARY_2);
+ channel->tod_read_secondary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_SECONDARY_2);
channel->tod_write = IDTCM_FW_REG(fw_ver, V520, TOD_WRITE_2);
channel->tod_n = IDTCM_FW_REG(fw_ver, V520, TOD_2);
channel->sync_src = SYNC_SOURCE_DPLL2_TOD_PPS;
break;
case 3:
channel->tod_read_primary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_PRIMARY_3);
+ channel->tod_read_secondary = IDTCM_FW_REG(fw_ver, V520, TOD_READ_SECONDARY_3);
channel->tod_write = IDTCM_FW_REG(fw_ver, V520, TOD_WRITE_3);
channel->tod_n = IDTCM_FW_REG(fw_ver, V520, TOD_3);
channel->sync_src = SYNC_SOURCE_DPLL3_TOD_PPS;
@@ -2221,6 +2261,7 @@ static int idtcm_enable_channel(struct idtcm *idtcm, u32 index)
{
struct idtcm_channel *channel;
int err;
+ int i;
if (!(index < MAX_TOD))
return -EINVAL;
@@ -2248,6 +2289,17 @@ static int idtcm_enable_channel(struct idtcm *idtcm, u32 index)
snprintf(channel->caps.name, sizeof(channel->caps.name),
"IDT CM TOD%u", index);
+ channel->caps.pin_config = pin_config[index];
+
+ for (i = 0; i < channel->caps.n_pins; ++i) {
+ struct ptp_pin_desc *ppd = &channel->caps.pin_config[i];
+
+ snprintf(ppd->name, sizeof(ppd->name), "input_ref%d", i);
+ ppd->index = i;
+ ppd->func = PTP_PF_NONE;
+ ppd->chan = index;
+ }
+
err = initialize_dco_operating_mode(channel);
if (err)
return err;
@@ -2302,26 +2354,40 @@ static int idtcm_enable_extts_channel(struct idtcm *idtcm, u32 index)
static void idtcm_extts_check(struct work_struct *work)
{
struct idtcm *idtcm = container_of(work, struct idtcm, extts_work.work);
- int err, i;
+ struct idtcm_channel *channel;
+ u8 mask;
+ int err;
+ int i;
if (idtcm->extts_mask == 0)
return;
mutex_lock(idtcm->lock);
+
for (i = 0; i < MAX_TOD; i++) {
- u8 mask = 1 << i;
+ mask = 1 << i;
+
+ if ((idtcm->extts_mask & mask) == 0)
+ continue;
+
+ err = idtcm_extts_check_channel(idtcm, i);
- if (idtcm->extts_mask & mask) {
- err = idtcm_extts_check_channel(idtcm, i);
+ if (err == 0) {
/* trigger clears itself, so clear the mask */
- if (err == 0)
+ if (idtcm->extts_single_shot) {
idtcm->extts_mask &= ~mask;
+ } else {
+ /* Re-arm */
+ channel = &idtcm->channel[i];
+ arm_tod_read_trig_sel_refclk(channel, channel->refn);
+ }
}
}
if (idtcm->extts_mask)
schedule_delayed_work(&idtcm->extts_work,
msecs_to_jiffies(EXTTS_PERIOD_MS));
+
mutex_unlock(idtcm->lock);
}
@@ -2342,6 +2408,11 @@ static void set_default_masks(struct idtcm *idtcm)
idtcm->tod_mask = DEFAULT_TOD_MASK;
idtcm->extts_mask = 0;
+ idtcm->channel[0].tod = 0;
+ idtcm->channel[1].tod = 1;
+ idtcm->channel[2].tod = 2;
+ idtcm->channel[3].tod = 3;
+
idtcm->channel[0].pll = DEFAULT_TOD0_PTP_PLL;
idtcm->channel[1].pll = DEFAULT_TOD1_PTP_PLL;
idtcm->channel[2].pll = DEFAULT_TOD2_PTP_PLL;
@@ -2420,8 +2491,8 @@ static int idtcm_remove(struct platform_device *pdev)
{
struct idtcm *idtcm = platform_get_drvdata(pdev);
+ idtcm->extts_mask = 0;
ptp_clock_unregister_all(idtcm);
-
cancel_delayed_work_sync(&idtcm->extts_work);
return 0;
diff --git a/drivers/ptp/ptp_clockmatrix.h b/drivers/ptp/ptp_clockmatrix.h
index 0f3059a..4379650 100644
--- a/drivers/ptp/ptp_clockmatrix.h
+++ b/drivers/ptp/ptp_clockmatrix.h
@@ -10,11 +10,13 @@
#include <linux/ktime.h>
#include <linux/mfd/idt8a340_reg.h>
+#include <linux/ptp_clock.h>
#include <linux/regmap.h>
#define FW_FILENAME "idtcm.bin"
#define MAX_TOD (4)
#define MAX_PLL (8)
+#define MAX_REF_CLK (16)
#define MAX_ABS_WRITE_PHASE_PICOSECONDS (107374182350LL)
@@ -90,6 +92,7 @@ struct idtcm_channel {
u16 dpll_ctrl_n;
u16 dpll_phase_pull_in;
u16 tod_read_primary;
+ u16 tod_read_secondary;
u16 tod_write;
u16 tod_n;
u16 hw_dpll_n;
@@ -105,6 +108,7 @@ struct idtcm_channel {
/* last input trigger for extts */
u8 refn;
u8 pll;
+ u8 tod;
u16 output_mask;
};
@@ -116,6 +120,7 @@ struct idtcm {
enum fw_version fw_ver;
/* Polls for external time stamps */
u8 extts_mask;
+ bool extts_single_shot;
struct delayed_work extts_work;
/* Remember the ptp channel to report extts */
struct idtcm_channel *event_channel[MAX_TOD];
diff --git a/include/linux/mfd/idt8a340_reg.h b/include/linux/mfd/idt8a340_reg.h
index a18c153..0c70608 100644
--- a/include/linux/mfd/idt8a340_reg.h
+++ b/include/linux/mfd/idt8a340_reg.h
@@ -407,7 +407,7 @@
#define TOD_READ_PRIMARY_0 0xcc40
#define TOD_READ_PRIMARY_0_V520 0xcc50
/* 8-bit subns, 32-bit ns, 48-bit seconds */
-#define TOD_READ_PRIMARY 0x0000
+#define TOD_READ_PRIMARY_BASE 0x0000
/* Counter increments after TOD write is completed */
#define TOD_READ_PRIMARY_COUNTER 0x000b
/* Read trigger configuration */
@@ -424,6 +424,16 @@
#define TOD_READ_SECONDARY_0 0xcc90
#define TOD_READ_SECONDARY_0_V520 0xcca0
+/* 8-bit subns, 32-bit ns, 48-bit seconds */
+#define TOD_READ_SECONDARY_BASE 0x0000
+/* Counter increments after TOD write is completed */
+#define TOD_READ_SECONDARY_COUNTER 0x000b
+/* Read trigger configuration */
+#define TOD_READ_SECONDARY_SEL_CFG_0 0x000c
+/* Read trigger selection */
+#define TOD_READ_SECONDARY_CMD 0x000e
+#define TOD_READ_SECONDARY_CMD_V520 0x000f
+
#define TOD_READ_SECONDARY_1 0xcca0
#define TOD_READ_SECONDARY_1_V520 0xccb0
#define TOD_READ_SECONDARY_2 0xccb0
--
2.7.4
^ permalink raw reply related
* Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list
From: Petr Mladek @ 2022-05-16 14:45 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx, vgoyal,
vkuznets, will, Alexander Gordeev, Christian Borntraeger,
David S. Miller, Heiko Carstens, Sven Schnelle, Vasily Gorbik
In-Reply-To: <20220427224924.592546-23-gpiccoli@igalia.com>
On Wed 2022-04-27 19:49:16, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
>
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
>
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.
The patch looks good to me. I would just suggest two changes.
1. I would rename the list to "panic_loop_list" instead of
"panic_post_reboot_list".
It will be more clear that it includes things that are
needed before panic() enters the infinite loop.
2. I would move all the notifiers that enable blinking here.
The blinking should be done only during the infinite
loop when there is nothing else to do. If we enable
earlier then it might disturb/break more important
functionality (dumping information, reboot).
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
From: Petr Mladek @ 2022-05-16 14:33 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx, vgoyal,
vkuznets, will, Alex Elder, Alexander Gordeev, Anton Ivanov,
Benjamin Herrenschmidt, Bjorn Andersson, Boris Ostrovsky,
Chris Zankel, Christian Borntraeger, Corey Minyard, Dexuan Cui,
H. Peter Anvin, Haiyang Zhang, Heiko Carstens, Helge Deller,
Ivan Kokshaysky, James E.J. Bottomley, James Morse, Johannes Berg,
K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
Stefano Stabellini, Stephen Hemminger, Sven Schnelle, Tony Luck,
Vasily Gorbik, Wei Liu
In-Reply-To: <20220427224924.592546-22-gpiccoli@igalia.com>
On Wed 2022-04-27 19:49:15, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
>
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.
>
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
> int dberror, err_addr;
>
> edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
My understanding is that this notifier first prints info about ECC
errors and then triggers reboot. It might make sense to split it
into two notifiers.
> &edac->panic_notifier);
>
> /* Printout a message if uncorrectable error previously. */
> --- a/drivers/leds/trigger/ledtrig-panic.c
> +++ b/drivers/leds/trigger/ledtrig-panic.c
> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>
> static int __init ledtrig_panic_init(void)
> {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
> &led_trigger_panic_nb);
Blinking => should go to the last "post_reboot/loop" list.
>
> led_trigger_register_simple("panic", &trigger);
> --- a/drivers/misc/ibmasm/heartbeat.c
> +++ b/drivers/misc/ibmasm/heartbeat.c
> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
> static int panic_happened(struct notifier_block *n, unsigned long val, void *v)
> {
> suspend_heartbeats = 1;
> - return 0;
> + return NOTIFY_DONE;
> }
>
> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
> +static struct notifier_block panic_notifier = {
> + .notifier_call = panic_happened,
> +};
>
> void ibmasm_register_panic_notifier(void)
> {
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
> + &panic_notifier);
Same here. Blinking => should go to the last "post_reboot/loop" list.
> }
>
> void ibmasm_unregister_panic_notifier(void)
> {
> - atomic_notifier_chain_unregister(&panic_notifier_list,
> - &panic_notifier);
> + atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> + &panic_notifier);
> }
The rest of the moved notifiers seem to fit well this "pre_reboot"
list.
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH 20/30] panic: Add the panic informational notifier list
From: Guilherme G. Piccoli @ 2022-05-16 14:28 UTC (permalink / raw)
To: Petr Mladek
Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
tglx, vgoyal, vkuznets, will, Benjamin Herrenschmidt,
Catalin Marinas, Florian Fainelli, Frederic Weisbecker,
H. Peter Anvin, Hari Bathini, Joel Fernandes, Jonathan Hunter,
Josh Triplett, Lai Jiangshan, Leo Yan, Mathieu Desnoyers,
Mathieu Poirier, Michael Ellerman, Mike Leach, Mikko Perttunen,
Neeraj Upadhyay, Nicholas Piggin, Paul Mackerras,
Suzuki K Poulose, Thierry Reding, Thomas Bogendoerfer
In-Reply-To: <YoJbeuTNBXOIypSH@alley>
On 16/05/2022 11:11, Petr Mladek wrote:
> [...]
>
> All notifiers moved in this patch seems to fit well the "info"
> notifier list. The patch looks good from this POV.
>
> I still have to review the rest of the patches to see if it
> is complete.
>
> Best Regards,
> Petr
Thanks a bunch for the review =)
^ permalink raw reply
* Re: UDP receive performance drop since 3.10
From: Paolo Abeni @ 2022-05-16 14:29 UTC (permalink / raw)
To: David Laight, netdev@vger.kernel.org
In-Reply-To: <d11a2ce6ed394acd8c6da29d0358f7ce@AcuMS.aculab.com>
On Mon, 2022-05-16 at 12:58 +0000, David Laight wrote:
> I've noticed a doubling in the cpu cost of udp processing
> between a RHEL 3.10 kernel and a 5.18-rc6 one.
>
> This is (probably) all within ip_rcv().
>
> I'm testing very high rate UDP receive of RTP audio.
> (The target is 500000 udp/sec.)
> I've enable RPS so that ip_rcv() runs on different multiple
> cpus from the ethernet code.
> (RSS on the BCM5720 (tg3) doesn't seem to work very well.)
>
> On the 3.10 kernel the 'RPS' cpu show about 5% 'soft int' time.
> With 5.10 this has doubled to 10% for much the same test.
>
> The ftrace for a single packet shows a lot of extra code.
> With a RHEL 3.10 kernel the trace is quite short:
>
> /* netif_receive_skb: dev=em2 skbaddr=ffff99c3ee5ae000 len=200 */
> ip_rcv() {
> ip_rcv_finish() {
> 0.483 us udp_v4_early_demux();
> ip_route_input_noref() {
> ip_route_input_slow() {
> 2.036 us fib_table_lookup();
> fib_validate_source() {
> __fib_validate_source.isra.13() {
> 0.646 us fib_table_lookup();
> 1.589 us }
> 2.610 us }
> 6.820 us }
> 7.755 us }
> ip_local_deliver() {
> ip_local_deliver_finish() {
> 0.250 us raw_local_deliver();
> udp_rcv() {
> __udp4_lib_rcv() {
> __udp4_lib_lookup() {
> 0.063 us compute_score();
> 0.097 us compute_score();
> 1.496 us }
> udp_queue_rcv_skb() {
> sk_filter_trim_cap() {
> security_sock_rcv_skb() {
> 0.066 us cap_socket_sock_rcv_skb();
> 1.024 us }
> 1.836 us }
> 0.093 us ipv4_pktinfo_prepare();
> __udp_enqueue_schedule_skb() {
> 0.066 us _raw_spin_lock();
> sock_def_readable() {
> __wake_up_sync_key() {
> __wake_up_common_lock() {
> 0.194 us _raw_spin_lock_irqsave();
> __wake_up_common() {
> ep_poll_callback() {
> 0.184 us _raw_spin_lock_irqsave();
> 0.084 us _raw_spin_unlock_irqrestore();
> 2.009 us }
> 3.264 us }
> 0.087 us _raw_spin_unlock_irqrestore();
> 5.579 us }
> 6.311 us }
> 7.241 us }
> 8.833 us }
> + 12.948 us }
> + 16.365 us }
> + 17.280 us }
> + 19.900 us }
> + 20.673 us }
> + 31.519 us }
> + 32.534 us }
>
> Whereas 5.18 has a much longer trace:
> ip_rcv() {
> 0.668 us ip_rcv_core();
> ip_rcv_finish_core.constprop.0() {
> 1.155 us udp_v4_early_demux();
> ip_route_input_noref() {
> 0.306 us __rcu_read_lock();
> ip_route_input_rcu() {
> ip_route_input_slow() {
> make_kuid() {
> 0.441 us map_id_range_down();
> 1.231 us }
> 0.307 us __rcu_read_lock();
> fib_table_lookup() {
> 1.268 us fib_lookup_good_nhc();
> 2.736 us }
> 0.307 us __rcu_read_unlock();
> fib_validate_source() {
> __fib_validate_source() {
> make_kuid() {
> 0.304 us map_id_range_down();
> 0.931 us }
> 0.304 us __rcu_read_lock();
> fib_table_lookup() {
> 0.493 us fib_lookup_good_nhc();
> 1.405 us }
> 0.393 us __rcu_read_unlock();
> 0.390 us fib_info_nh_uses_dev();
> 5.457 us }
> 6.327 us }
> + 13.726 us }
> + 14.727 us }
> 0.407 us __rcu_read_unlock();
> + 16.673 us }
> + 19.389 us }
> ip_local_deliver() {
> ip_local_deliver_finish() {
> 0.376 us __rcu_read_lock();
> ip_protocol_deliver_rcu() {
> 0.434 us raw_local_deliver();
> udp_rcv() {
> __udp4_lib_rcv() {
> __udp4_lib_lookup() {
> 0.326 us udp4_lib_lookup2.isra.0();
> 0.928 us udp4_lib_lookup2.isra.0();
> 2.413 us }
> udp_unicast_rcv_skb() {
> udp_queue_rcv_skb() {
> udp_queue_rcv_one_skb() {
> sk_filter_trim_cap() {
> 0.440 us security_sock_rcv_skb();
> sk_filter_trim_cap.part.0() {
> 0.297 us __rcu_read_lock();
> 0.310 us __rcu_read_unlock();
> 1.531 us }
> 3.277 us }
> 0.334 us skb_pull_rcsum();
> 0.310 us ipv4_pktinfo_prepare();
> __udp_enqueue_schedule_skb() {
> _raw_spin_lock() {
> 0.334 us preempt_count_add();
> 0.938 us }
> _raw_spin_unlock() {
> 0.303 us preempt_count_sub();
> 0.908 us }
> sock_def_readable() {
> 0.307 us __rcu_read_lock();
> __wake_up_sync_key() {
> __wake_up_common_lock() {
> _raw_spin_lock_irqsave() {
> 0.326 us preempt_count_add();
> 0.951 us }
> __wake_up_common() {
> ep_poll_callback() {
> _raw_read_lock_irqsave() {
> 0.330 us preempt_count_add();
> 1.152 us }
> 0.614 us __rcu_read_lock();
> 0.323 us __rcu_read_unlock();
> _raw_read_unlock_irqrestore() {
> 0.380 us preempt_count_sub();
> 0.995 us }
> 5.410 us }
> 6.741 us }
> _raw_spin_unlock_irqrestore() {
> 0.317 us preempt_count_sub();
> 1.094 us }
> 9.994 us }
> + 10.806 us }
> 0.327 us __rcu_read_unlock();
> + 12.809 us }
> + 16.182 us }
> + 22.153 us }
> + 22.769 us }
> + 23.528 us }
> + 27.878 us }
> + 28.646 us }
> + 30.476 us }
> 0.324 us __rcu_read_unlock();
> + 32.398 us }
> + 33.168 us }
> + 54.976 us }
>
> Now I know the cost of ftrace is significant (and seems to be
> higher in 5.18) but there also seems to be a lot more code.
> As well as the extra rcu locks (which are probably mostly ftrace
> overhead, a few other things stick out:
>
> 1) The sock_net_uid(net, NULL) calls.
> These are make_kuid(net->user_ns, 0) - so pretty much constant.
> They seem to end up in a loop in map_id_range_down_base().
> All looks expensive in the default network namespace where
> 0 maps to 0.
>
> 2) Extra code in fib_lookup().
>
> 3) A lot more locking in ep_poll_callback().
>
> The 5.18 kernel also seems to have CONFIG_DEBUG_PREEMPT set.
> I can't find the Kconfig entry for it.
> It doesn't exist in the old .config at all.
> So I'm not sure why 'make oldconfig' picked it up.
>
> The other possibility is that the extra code is tick_nohz_idle_exit().
> The 3.10 trace is from a non-RPS config so I can't compare it.
>
> I'm going to disable CONFIG_DEBUG_PREEMPT to see how much
> difference it makes.
> Any idea if any other debug options will have got picked up?
Do you have CONFIG_PREEMPT_DYNAMIC in your config? That was not
available in 3.10 and apparently it pulls quite a bit of stuff, which
in the end should be quite measurable. The preempt count alone adds
~7us to the above sample.
Cheers,
Paolo
^ permalink raw reply
* [PATCH -next] net: ethernet: sunplus: add missing of_node_put() in spl2sw_mdio_init()
From: Yang Yingliang @ 2022-05-16 14:37 UTC (permalink / raw)
To: linux-kernel, netdev; +Cc: wellslutw, andrew, pabeni, davem, kuba
of_get_child_by_name() returns device node pointer with refcount
incremented. The refcount should be decremented before returning
from spl2sw_mdio_init().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
drivers/net/ethernet/sunplus/spl2sw_mdio.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/sunplus/spl2sw_mdio.c b/drivers/net/ethernet/sunplus/spl2sw_mdio.c
index 139ac8f2685e..733ae1704269 100644
--- a/drivers/net/ethernet/sunplus/spl2sw_mdio.c
+++ b/drivers/net/ethernet/sunplus/spl2sw_mdio.c
@@ -97,8 +97,10 @@ u32 spl2sw_mdio_init(struct spl2sw_common *comm)
/* Allocate and register mdio bus. */
mii_bus = devm_mdiobus_alloc(&comm->pdev->dev);
- if (!mii_bus)
- return -ENOMEM;
+ if (!mii_bus) {
+ ret = -ENOMEM;
+ goto out;
+ }
mii_bus->name = "sunplus_mii_bus";
mii_bus->parent = &comm->pdev->dev;
@@ -110,10 +112,13 @@ u32 spl2sw_mdio_init(struct spl2sw_common *comm)
ret = of_mdiobus_register(mii_bus, mdio_np);
if (ret) {
dev_err(&comm->pdev->dev, "Failed to register mdiobus!\n");
- return ret;
+ goto out;
}
comm->mii_bus = mii_bus;
+
+out:
+ of_node_put(mdio_np);
return ret;
}
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v2 4/4] powerpc/52xx: Convert to use fwnode API
From: Andy Shevchenko @ 2022-05-16 14:10 UTC (permalink / raw)
To: Michael Ellerman
Cc: Wolfram Sang, Marc Kleine-Budde, Damien Le Moal, Mark Brown,
chris.packham, Sergey Shtylyov, David S. Miller, Jakub Kicinski,
Greg Kroah-Hartman, Jiri Slaby, linuxppc-dev, linux-kernel,
linux-ide, linux-i2c, linux-can, netdev, linux-spi, linux-serial,
Benjamin Herrenschmidt, Paul Mackerras, Anatolij Gustschin,
Wolfgang Grandegger, Eric Dumazet, Paolo Abeni, Pantelis Antoniou
In-Reply-To: <YoJaGGwfoSYhaT13@smile.fi.intel.com>
On Mon, May 16, 2022 at 05:05:12PM +0300, Andy Shevchenko wrote:
> On Mon, May 16, 2022 at 11:48:05PM +1000, Michael Ellerman wrote:
> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> > > We may convert the GPT driver to use fwnode API for the sake
> > > of consistency of the used APIs inside the driver.
> >
> > I'm not sure about this one.
> >
> > It's more consistent to use fwnode in this driver, but it's very
> > inconsistent with the rest of the powerpc code. We have basically no
> > uses of the fwnode APIs at the moment.
>
> Fair point!
>
> > It seems like a pretty straight-forward conversion, but there could
> > easily be a bug in there, I don't have any way to test it. Do you?
>
> Nope, only compile testing. The important part of this series is to
> clean up of_node from GPIO library, so since here it's a user of
> it I want to do that. This patch is just ad-hoc conversion that I
> noticed is possible. But there is no any requirement to do so.
>
> Lemme drop this from v3.
I just realize that there is no point to send a v3. You can just apply
first 3 patches. Or is your comment against entire series?
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH 20/30] panic: Add the panic informational notifier list
From: Petr Mladek @ 2022-05-16 14:11 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx, vgoyal,
vkuznets, will, Benjamin Herrenschmidt, Catalin Marinas,
Florian Fainelli, Frederic Weisbecker, H. Peter Anvin,
Hari Bathini, Joel Fernandes, Jonathan Hunter, Josh Triplett,
Lai Jiangshan, Leo Yan, Mathieu Desnoyers, Mathieu Poirier,
Michael Ellerman, Mike Leach, Mikko Perttunen, Neeraj Upadhyay,
Nicholas Piggin, Paul Mackerras, Suzuki K Poulose, Thierry Reding,
Thomas Bogendoerfer
In-Reply-To: <20220427224924.592546-21-gpiccoli@igalia.com>
On Wed 2022-04-27 19:49:14, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
>
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
>
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
>
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
All notifiers moved in this patch seems to fit well the "info"
notifier list. The patch looks good from this POV.
I still have to review the rest of the patches to see if it
is complete.
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH net-next v3] bond: add mac filter option for balance-xor
From: Jonathan Toppins @ 2022-05-16 14:06 UTC (permalink / raw)
To: Nikolay Aleksandrov, netdev
Cc: toke, Long Xin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jonathan Corbet, Jay Vosburgh, Veaceslav Falico,
Andy Gospodarek, linux-doc, linux-kernel
In-Reply-To: <da6bbb3b-344c-f032-fe03-5e8c8ac3c388@blackwall.org>
On 5/15/22 02:32, Nikolay Aleksandrov wrote:
> On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
>> On 13/05/2022 20:43, Jonathan Toppins wrote:
>>> Implement a MAC filter that prevents duplicate frame delivery when
>>> handling BUM traffic. This attempts to partially replicate OvS SLB
>>> Bonding[1] like functionality without requiring significant change
>>> in the Linux bridging code.
>>>
>>> A typical network setup for this feature would be:
>>>
>>> .--------------------------------------------.
>>> | .--------------------. |
>>> | | | |
>>> .-------------------. | |
>>> | | Bond 0 | | | |
>>> | .--'---. .---'--. | | |
>>> .----|-| eth0 |-| eth1 |-|----. .-----+----. .----+------.
>>> | | '------' '------' | | | Switch 1 | | Switch 2 |
>>> | '---,---------------' | | +---+ |
>>> | / | '----+-----' '----+------'
>>> | .---'---. .------. | | |
>>> | | br0 |----| VM 1 | | ~~~~~~~~~~~~~~~~~~~~~
>>> | '-------' '------' | ( )
>>> | | .------. | ( Rest of Network )
>>> | '--------| VM # | | (_____________________)
>>> | '------' |
>>> | Host 1 |
>>> '-----------------------------'
>>>
>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>>> bond0 and its associated links, eth0 & eth1, provide ingress/egress. One
>>> can assume bond0, br1, and hosts VM1 to VM# are all contained in a
>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>>> connections to the data center with the requirement to use all bandwidth
>>> when the system is functioning normally. Switch 1 and Switch 2 are
>>> physical switches that do not implement any advanced L2 management
>>> features such as MLAG, Cisco's VPC, or LACP.
>>>
>>> Combining this feature with vlan+srcmac hash policy allows a user to
>>> create an access network without the need to use expensive switches that
>>> support features like Cisco's VCP.
>>>
>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>>
>>> Co-developed-by: Long Xin <lxin@redhat.com>
>>> Signed-off-by: Long Xin <lxin@redhat.com>
>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>> ---
>>>
>>> Notes:
>>> v2:
>>> * dropped needless abstraction functions and put code in module init
>>> * renamed variable "rc" to "ret" to stay consistent with most of the
>>> code
>>> * fixed parameter setting management, when arp-monitor is turned on
>>> this feature will be turned off similar to how miimon and arp-monitor
>>> interact
>>> * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>> clarity
>>> * it appears the implied default return code for any bonding recv probe
>>> must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>> bond_mac_filter_recv to use this return value to not break skb
>>> processing when the skb dev is switched to the bond dev:
>>> `skb->dev = bond->dev`
>>>
>>> v3: Nik's comments
>>> * clarified documentation
>>> * fixed inline and basic reverse Christmas tree formatting
>>> * zero'ed entry in mac_create
>>> * removed read_lock taking in bond_mac_filter_recv
>>> * made has_expired() atomic and removed critical sections
>>> surrounding calls to has_expired(), this also removed the
>>> use-after-free that would have occurred:
>>> spin_lock_irqsave(&entry->lock, flags);
>>> if (has_expired(bond, entry))
>>> mac_delete(bond, entry);
>>> spin_unlock_irqrestore(&entry->lock, flags); <---
>>> * moved init/destroy of mac_filter_tbl to bond_open/bond_close
>>> this removed the complex option dependencies, the only behavioural
>>> change the user will see is if the bond is up and mac_filter is
>>> enabled if they try and set arp_interval they will receive -EBUSY
>>> * in bond_changelink moved processing of mac_filter option just below
>>> mode processing
>>>
>>> Documentation/networking/bonding.rst | 20 +++
>>> drivers/net/bonding/Makefile | 2 +-
>>> drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>> drivers/net/bonding/bond_mac_filter.h | 37 +++++
>>> drivers/net/bonding/bond_main.c | 30 ++++
>>> drivers/net/bonding/bond_netlink.c | 13 ++
>>> drivers/net/bonding/bond_options.c | 81 +++++++++--
>>> drivers/net/bonding/bonding_priv.h | 1 +
>>> include/net/bond_options.h | 1 +
>>> include/net/bonding.h | 3 +
>>> include/uapi/linux/if_link.h | 1 +
>>> 11 files changed, 373 insertions(+), 17 deletions(-)
>>> create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>> create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>>
>>
> [snip]
>
> The same problem solved using a few nftables rules (in case you don't want to load eBPF):
> $ nft 'add table netdev nt'
> $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
> $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
> $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
> $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
> $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'
>
I get the following when trying to apply this on a fedora 35 install.
root@fedora ~]# ip link add bond0 type bond mode balance-xor
xmit_hash_policy vlan+srcmac
[root@fedora ~]# nft 'add table netdev nt'
[root@fedora ~]# nft 'add chain netdev nt bond0EgressFilter { type
filter hook egress device bond0 priority 0; }'
Error: unknown chain hook
add chain netdev nt bond0EgressFilter { type filter hook egress device
bond0 priority 0; }
^^^^^^
[root@fedora ~]# uname -a
Linux fedora 5.17.5-200.fc35.x86_64 #1 SMP PREEMPT Thu Apr 28 15:41:41
UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
^ permalink raw reply
* Re: [PATCH v2 4/4] powerpc/52xx: Convert to use fwnode API
From: Andy Shevchenko @ 2022-05-16 14:05 UTC (permalink / raw)
To: Michael Ellerman
Cc: Wolfram Sang, Marc Kleine-Budde, Damien Le Moal, Mark Brown,
chris.packham, Sergey Shtylyov, David S. Miller, Jakub Kicinski,
Greg Kroah-Hartman, Jiri Slaby, linuxppc-dev, linux-kernel,
linux-ide, linux-i2c, linux-can, netdev, linux-spi, linux-serial,
Benjamin Herrenschmidt, Paul Mackerras, Anatolij Gustschin,
Wolfgang Grandegger, Eric Dumazet, Paolo Abeni, Pantelis Antoniou
In-Reply-To: <877d6l7fmy.fsf@mpe.ellerman.id.au>
On Mon, May 16, 2022 at 11:48:05PM +1000, Michael Ellerman wrote:
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> > We may convert the GPT driver to use fwnode API for the sake
> > of consistency of the used APIs inside the driver.
>
> I'm not sure about this one.
>
> It's more consistent to use fwnode in this driver, but it's very
> inconsistent with the rest of the powerpc code. We have basically no
> uses of the fwnode APIs at the moment.
Fair point!
> It seems like a pretty straight-forward conversion, but there could
> easily be a bug in there, I don't have any way to test it. Do you?
Nope, only compile testing. The important part of this series is to
clean up of_node from GPIO library, so since here it's a user of
it I want to do that. This patch is just ad-hoc conversion that I
noticed is possible. But there is no any requirement to do so.
Lemme drop this from v3.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
From: Petr Mladek @ 2022-05-16 14:01 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
coresight, linuxppc-dev, linux-alpha, linux-arm-kernel,
linux-edac, linux-hyperv, linux-leds, linux-mips, linux-parisc,
linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um,
linux-xtensa, netdev, openipmi-developer, rcu, sparclinux,
xen-devel, x86, kernel-dev, kernel, halves, fabiomirmar,
alejandro.j.jimenez, andriy.shevchenko, arnd, bp, corbet,
d.hatayama, dave.hansen, dyoung, feng.tang, gregkh, mikelley,
hidehiro.kawai.ez, jgross, john.ogness, keescook, luto, mhiramat,
mingo, paulmck, peterz, rostedt, senozhatsky, stern, tglx, vgoyal,
vkuznets, will, Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
Christophe JAILLET, David Gow, David S. Miller, Dexuan Cui,
Doug Berger, Evan Green, Florian Fainelli, Haiyang Zhang,
Hari Bathini, Heiko Carstens, Julius Werner, Justin Chen,
K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
Scott Branden, Sebastian Reichel, Shile Zhang, Stephen Hemminger,
Sven Schnelle, Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik,
Wang ShaoBo, Wei Liu, zhenwei pi
In-Reply-To: <20220427224924.592546-20-gpiccoli@igalia.com>
On Wed 2022-04-27 19:49:13, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to register
> callbacks to run very early in the panic path. This aims hypervisor/FW
> notification mechanisms as well as simple LED functions, and any other
> simple and safe mechanism that should run early in the panic path; more
> dangerous callbacks should execute later.
>
> For now, the patch is almost a no-op (although it changes a bit the
> ordering in which some panic notifiers are executed). In a subsequent
> patch, the panic path will be refactored, then the panic hypervisor
> notifiers will effectively run very early in the panic path.
>
> We also defer documenting it all properly in the subsequent refactor
> patch. While at it, we removed some useless header inclusions and
> fixed some notifiers return too (by using the standard NOTIFY_DONE).
> --- a/arch/mips/sgi-ip22/ip22-reset.c
> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
> }
>
> timer_setup(&blink_timer, blink_timeout, 0);
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> + atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
This notifier enables blinking. It is not much safe. It calls
mod_timer() that takes a lock internally.
This kind of functionality should go into the last list called
before panic() enters the infinite loop. IMHO, all the blinking
stuff should go there.
>
> return 0;
> }
> diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
> index 18d1c115cd53..9ee1302c9d13 100644
> --- a/arch/mips/sgi-ip32/ip32-reset.c
> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
> pm_power_off = ip32_machine_halt;
>
> timer_setup(&blink_timer, blink_timeout, 0);
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> + atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
Same here. Should be done only before the "loop".
>
> return 0;
> }
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
>
> register_reboot_notifier(&gsmi_reboot_notifier);
> register_die_notifier(&gsmi_die_notifier);
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
> &gsmi_panic_notifier);
I am not sure about this one. It looks like some logging or
pre_reboot stuff.
>
> printk(KERN_INFO "gsmi version " DRIVER_VERSION " loaded\n");
> --- a/drivers/leds/trigger/ledtrig-activity.c
> +++ b/drivers/leds/trigger/ledtrig-activity.c
> @@ -247,7 +247,7 @@ static int __init activity_init(void)
> int rc = led_trigger_register(&activity_led_trigger);
>
> if (!rc) {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
> &activity_panic_nb);
The notifier is trivial. It just sets a variable.
But still, it is about blinking and should be done
in the last "loop" list.
> register_reboot_notifier(&activity_reboot_nb);
> }
> --- a/drivers/leds/trigger/ledtrig-heartbeat.c
> +++ b/drivers/leds/trigger/ledtrig-heartbeat.c
> @@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
> int rc = led_trigger_register(&heartbeat_led_trigger);
>
> if (!rc) {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
> &heartbeat_panic_nb);
Same here. Blinking => loop list.
> register_reboot_notifier(&heartbeat_reboot_nb);
> }
> diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c b/drivers/misc/bcm-vk/bcm_vk_dev.c
> index a16b99bdaa13..d9d5199cdb2b 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>
> /* register for panic notifier */
> vk->panic_nb.notifier_call = bcm_vk_on_panic;
> - err = atomic_notifier_chain_register(&panic_notifier_list,
> + err = atomic_notifier_chain_register(&panic_hypervisor_list,
> &vk->panic_nb);
It seems to reset some hardware or so. IMHO, it should go into the
pre-reboot list.
> if (err) {
> dev_err(dev, "Fail to register panic notifier\n");
> --- a/drivers/power/reset/ltc2952-poweroff.c
> +++ b/drivers/power/reset/ltc2952-poweroff.c
> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device *pdev)
> pm_power_off = ltc2952_poweroff_kill;
>
> data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
> &data->panic_notifier);
I looks like this somehow triggers the reboot. IMHO, it should go
into the pre_reboot list.
> dev_info(&pdev->dev, "probe successful\n");
>
> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device *pdev)
> goto out;
> }
>
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
> &brcmstb_pm_panic_nb);
I am not sure about this one. It instruct some HW to preserve DRAM.
IMHO, it better fits into pre_reboot category but I do not have
strong opinion.
>
> pm_power_off = brcmstb_pm_poweroff;
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH net-next v3 00/10] UDP/IPv6 refactoring
From: Paolo Abeni @ 2022-05-16 13:48 UTC (permalink / raw)
To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
Cc: David Ahern, Eric Dumazet, linux-kernel
In-Reply-To: <cover.1652368648.git.asml.silence@gmail.com>
Hello,
On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
> cleaner than it was before and the series also removes a bunch of instructions
> and other overhead from the hot path positively affecting performance.
>
> Testing over dummy netdev with 16 byte packets yields 2240481 tx/s,
> comparing to 2203417 tx/s previously, which is around +1.6%
I personally feel that some patches in this series have a relevant
chance of introducing functional regressions and e.g. syzbot will not
help to catch them. That risk is IMHO relevant considered that the
performance gain here looks quite limited.
There are a few individual changes that IMHO looks like nice cleanup
e.g. patch 5, 6, 8, 9 and possibly even patch 1.
I suggest to reduce the patchset scope to them.
Thanks!
Paolo
^ permalink raw reply
* Re: [PATCH v2 4/4] powerpc/52xx: Convert to use fwnode API
From: Michael Ellerman @ 2022-05-16 13:48 UTC (permalink / raw)
To: Andy Shevchenko, Andy Shevchenko, Wolfram Sang, Marc Kleine-Budde,
Damien Le Moal, Mark Brown, chris.packham, Sergey Shtylyov,
David S. Miller, Jakub Kicinski, Greg Kroah-Hartman, Jiri Slaby,
linuxppc-dev, linux-kernel, linux-ide, linux-i2c, linux-can,
netdev, linux-spi, linux-serial
Cc: Benjamin Herrenschmidt, Paul Mackerras, Anatolij Gustschin,
Wolfgang Grandegger, Eric Dumazet, Paolo Abeni, Pantelis Antoniou
In-Reply-To: <20220507100147.5802-4-andriy.shevchenko@linux.intel.com>
Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> We may convert the GPT driver to use fwnode API for the sake
> of consistency of the used APIs inside the driver.
I'm not sure about this one.
It's more consistent to use fwnode in this driver, but it's very
inconsistent with the rest of the powerpc code. We have basically no
uses of the fwnode APIs at the moment.
It seems like a pretty straight-forward conversion, but there could
easily be a bug in there, I don't have any way to test it. Do you?
cheers
> diff --git a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
> index ae47fdcc8a96..58c3651034bd 100644
> --- a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
> +++ b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
> @@ -53,10 +53,9 @@
> #include <linux/interrupt.h>
> #include <linux/io.h>
> #include <linux/list.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/module.h>
> #include <linux/mutex.h>
> -#include <linux/of.h>
> -#include <linux/of_platform.h>
> -#include <linux/of_gpio.h>
> #include <linux/kernel.h>
> #include <linux/property.h>
> #include <linux/slab.h>
> @@ -64,7 +63,7 @@
> #include <linux/watchdog.h>
> #include <linux/miscdevice.h>
> #include <linux/uaccess.h>
> -#include <linux/module.h>
> +
> #include <asm/div64.h>
> #include <asm/mpc52xx.h>
>
> @@ -235,18 +234,17 @@ static const struct irq_domain_ops mpc52xx_gpt_irq_ops = {
> .xlate = mpc52xx_gpt_irq_xlate,
> };
>
> -static void
> -mpc52xx_gpt_irq_setup(struct mpc52xx_gpt_priv *gpt, struct device_node *node)
> +static void mpc52xx_gpt_irq_setup(struct mpc52xx_gpt_priv *gpt)
> {
> int cascade_virq;
> unsigned long flags;
> u32 mode;
>
> - cascade_virq = irq_of_parse_and_map(node, 0);
> - if (!cascade_virq)
> + cascade_virq = platform_get_irq(to_platform_device(gpt->dev), 0);
> + if (cascade_virq < 0)
> return;
>
> - gpt->irqhost = irq_domain_add_linear(node, 1, &mpc52xx_gpt_irq_ops, gpt);
> + gpt->irqhost = irq_domain_create_linear(dev_fwnode(gpt->dev), 1, &mpc52xx_gpt_irq_ops, gpt);
> if (!gpt->irqhost) {
> dev_err(gpt->dev, "irq_domain_add_linear() failed\n");
> return;
> @@ -670,8 +668,7 @@ static int mpc52xx_gpt_wdt_init(void)
> return err;
> }
>
> -static int mpc52xx_gpt_wdt_setup(struct mpc52xx_gpt_priv *gpt,
> - const u32 *period)
> +static int mpc52xx_gpt_wdt_setup(struct mpc52xx_gpt_priv *gpt, const u32 period)
> {
> u64 real_timeout;
>
> @@ -679,14 +676,14 @@ static int mpc52xx_gpt_wdt_setup(struct mpc52xx_gpt_priv *gpt,
> mpc52xx_gpt_wdt = gpt;
>
> /* configure the wdt if the device tree contained a timeout */
> - if (!period || *period == 0)
> + if (period == 0)
> return 0;
>
> - real_timeout = (u64) *period * 1000000000ULL;
> + real_timeout = (u64)period * 1000000000ULL;
> if (mpc52xx_gpt_do_start(gpt, real_timeout, 0, 1))
> dev_warn(gpt->dev, "starting as wdt failed\n");
> else
> - dev_info(gpt->dev, "watchdog set to %us timeout\n", *period);
> + dev_info(gpt->dev, "watchdog set to %us timeout\n", period);
> return 0;
> }
>
> @@ -697,8 +694,7 @@ static int mpc52xx_gpt_wdt_init(void)
> return 0;
> }
>
> -static inline int mpc52xx_gpt_wdt_setup(struct mpc52xx_gpt_priv *gpt,
> - const u32 *period)
> +static inline int mpc52xx_gpt_wdt_setup(struct mpc52xx_gpt_priv *gpt, const u32 period)
> {
> return 0;
> }
> @@ -726,25 +722,26 @@ static int mpc52xx_gpt_probe(struct platform_device *ofdev)
> dev_set_drvdata(&ofdev->dev, gpt);
>
> mpc52xx_gpt_gpio_setup(gpt);
> - mpc52xx_gpt_irq_setup(gpt, ofdev->dev.of_node);
> + mpc52xx_gpt_irq_setup(gpt);
>
> mutex_lock(&mpc52xx_gpt_list_mutex);
> list_add(&gpt->list, &mpc52xx_gpt_list);
> mutex_unlock(&mpc52xx_gpt_list_mutex);
>
> /* check if this device could be a watchdog */
> - if (of_get_property(ofdev->dev.of_node, "fsl,has-wdt", NULL) ||
> - of_get_property(ofdev->dev.of_node, "has-wdt", NULL)) {
> - const u32 *on_boot_wdt;
> + if (device_property_present(gpt->dev, "fsl,has-wdt") ||
> + device_property_present(gpt->dev, "has-wdt")) {
> + u32 on_boot_wdt = 0;
> + int ret;
>
> gpt->wdt_mode = MPC52xx_GPT_CAN_WDT;
> - on_boot_wdt = of_get_property(ofdev->dev.of_node,
> - "fsl,wdt-on-boot", NULL);
> - if (on_boot_wdt) {
> + ret = device_property_read_u32(gpt->dev, "fsl,wdt-on-boot", &on_boot_wdt);
> + if (ret) {
> + dev_info(gpt->dev, "can function as watchdog\n");
> + } else {
> dev_info(gpt->dev, "used as watchdog\n");
> gpt->wdt_mode |= MPC52xx_GPT_IS_WDT;
> - } else
> - dev_info(gpt->dev, "can function as watchdog\n");
> + }
> mpc52xx_gpt_wdt_setup(gpt, on_boot_wdt);
> }
>
> --
> 2.35.1
^ permalink raw reply
* [PATCH linux-next] net: ath: fix minmax.cocci warnings
From: Guo Zhengkui @ 2022-05-16 13:40 UTC (permalink / raw)
To: Jiri Slaby, Nick Kossifidis, Luis Chamberlain, Kalle Valo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Toke Høiland-Jørgensen,
open list:ATHEROS ATH5K WIRELESS DRIVER,
open list:NETWORKING DRIVERS, open list
Cc: zhengkui_guo, Guo Zhengkui
Fix the following coccicheck warnings:
drivers/net/wireless/ath/ath5k/phy.c:3139:62-63: WARNING
opportunity for min()
drivers/net/wireless/ath/ath9k/dfs.c:249:28-30: WARNING
opportunity for max()
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
---
drivers/net/wireless/ath/ath5k/phy.c | 2 +-
drivers/net/wireless/ath/ath9k/dfs.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath5k/phy.c b/drivers/net/wireless/ath/ath5k/phy.c
index 00f9e347d414..5797ef9c73d7 100644
--- a/drivers/net/wireless/ath/ath5k/phy.c
+++ b/drivers/net/wireless/ath/ath5k/phy.c
@@ -3136,7 +3136,7 @@ ath5k_combine_pwr_to_pdadc_curves(struct ath5k_hw *ah,
pdadc_n = gain_boundaries[pdg] + pd_gain_overlap - pwr_min[pdg];
/* Limit it to be inside pwr range */
table_size = pwr_max[pdg] - pwr_min[pdg];
- max_idx = (pdadc_n < table_size) ? pdadc_n : table_size;
+ max_idx = min(pdadc_n, table_size);
/* Fill pdadc_out table */
while (pdadc_0 < max_idx && pdadc_i < 128)
diff --git a/drivers/net/wireless/ath/ath9k/dfs.c b/drivers/net/wireless/ath/ath9k/dfs.c
index acb9602aa464..11349218bc21 100644
--- a/drivers/net/wireless/ath/ath9k/dfs.c
+++ b/drivers/net/wireless/ath/ath9k/dfs.c
@@ -246,7 +246,7 @@ ath9k_postprocess_radar_event(struct ath_softc *sc,
DFS_STAT_INC(sc, dc_phy_errors);
/* when both are present use stronger one */
- rssi = (ard->rssi < ard->ext_rssi) ? ard->ext_rssi : ard->rssi;
+ rssi = max(ard->rssi, ard->ext_rssi);
break;
default:
/*
--
2.20.1
^ permalink raw reply related
* Re: [PATCH bpf-next 1/2] cpuidle/rcu: Making arch_cpu_idle and rcu_idle_exit noinstr
From: Paul E. McKenney @ 2022-05-16 13:39 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Masami Hiramatsu, netdev, bpf, lkml, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Steven Rostedt
In-Reply-To: <20220516114922.GA349949@lothringen>
On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > in rcu 'not watching' context and if there's tracer attached to
> > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > warning like:
> > >
> > > [ 3.017540] WARNING: suspicious RCU usage
> > > ...
> > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > [ 3.018374] 0xffffffffa00080c8
> > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > [ 3.018399] default_idle_call+0x59/0x90
> > > [ 3.018401] do_idle+0x1c3/0x1d0
> > >
> > > The call path is following:
> > >
> > > default_idle_call
> > > rcu_idle_enter
> > > arch_cpu_idle
> > > rcu_idle_exit
> > >
> > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > path that are traceble and cause this problem on my setup.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> >
> > From an RCU viewpoint:
> >
> > Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > but there is no point given that local_irq_restore() isn't something
> > you instrument anyway. ]
>
> So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> it is instrumentable by the function (graph) tracers and the irqsoff tracer.
>
> Also it calls into lockdep that might make use of RCU.
>
> That's why rcu_idle_exit() is not noinstr yet. See this patch:
>
> https://lore.kernel.org/lkml/20220503100051.2799723-4-frederic@kernel.org/
Ah, I should have looked at the context-tracking series again!
And I have to ask... How much debugging capability are we really losing
by continuing to use the raw versions of local_irq_{save,restore}()?
Thanx, Paul
> Thanks.
>
> >
> > > ---
> > > arch/x86/kernel/process.c | 2 +-
> > > kernel/rcu/tree.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > index b370767f5b19..1345cb0124a6 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > > /*
> > > * Called from the generic idle code.
> > > */
> > > -void arch_cpu_idle(void)
> > > +void noinstr arch_cpu_idle(void)
> > > {
> > > x86_idle();
> > > }
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index a4b8189455d5..20d529722f51 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > > * CONFIG_RCU_EQS_DEBUG=y.
> > > */
> > > -void rcu_idle_exit(void)
> > > +void noinstr rcu_idle_exit(void)
> > > {
> > > unsigned long flags;
> > >
> > > --
> > > 2.35.3
> > >
^ permalink raw reply
* Re: [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks
From: Paolo Abeni @ 2022-05-16 13:14 UTC (permalink / raw)
To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
Cc: David Ahern, Eric Dumazet, linux-kernel
In-Reply-To: <50cca375d8730b5bf74b975d0fede64b1a3744c4.1652368648.git.asml.silence@gmail.com>
On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
> For AF_INET6 sockets we care the most about ipv6 but not ip4 mappings as
> it's requires some extra hops anyway. Take AF_INET6 case from the address
> parsing switch and add an explicit path for it. It removes some extra
> ifs from the path and removes the switch overhead.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> net/ipv6/udp.c | 37 +++++++++++++++++--------------------
> 1 file changed, 17 insertions(+), 20 deletions(-)
>
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 85bff1252f5c..e0b1bea998ce 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1360,30 +1360,27 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>
> /* destination address check */
> if (sin6) {
> - if (addr_len < offsetof(struct sockaddr, sa_data))
> - return -EINVAL;
> + if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
> + if (addr_len < offsetof(struct sockaddr, sa_data))
> + return -EINVAL;
I think you can't access 'sin6->sin6_family' before validating the
socket address len, that is before doing:
if (addr_len < offsetof(struct sockaddr, sa_data))
Paolo
^ permalink raw reply
* Re: [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg
From: Paolo Abeni @ 2022-05-16 13:11 UTC (permalink / raw)
To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
Cc: David Ahern, Eric Dumazet, linux-kernel
In-Reply-To: <a0e7477985ef08c5f08f35b8c7336587c8adce12.1652368648.git.asml.silence@gmail.com>
On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
> Move up->pending section of udpv6_sendmsg() to the beginning of the
> function. Even though it require some code duplication for sin6 parsing,
> it clearly localises the pending handling in one place, removes an extra
> if and more importantly will prepare the code for further patches.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> net/ipv6/udp.c | 70 ++++++++++++++++++++++++++++++--------------------
> 1 file changed, 42 insertions(+), 28 deletions(-)
>
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 11d44ed46953..85bff1252f5c 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1318,6 +1318,46 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> ipc6.sockc.tsflags = sk->sk_tsflags;
> ipc6.sockc.mark = sk->sk_mark;
>
> + /* Rough check on arithmetic overflow,
> + better check is made in ip6_append_data().
> + */
> + if (unlikely(len > INT_MAX - sizeof(struct udphdr)))
> + return -EMSGSIZE;
> +
> + getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
> +
> + /* There are pending frames. */
> + if (up->pending) {
> + if (up->pending == AF_INET)
> + return udp_sendmsg(sk, msg, len);
> +
> + /* Do a quick destination sanity check before corking. */
> + if (sin6) {
> + if (msg->msg_namelen < offsetof(struct sockaddr, sa_data))
> + return -EINVAL;
> + if (sin6->sin6_family == AF_INET6) {
> + if (msg->msg_namelen < SIN6_LEN_RFC2133)
> + return -EINVAL;
> + if (ipv6_addr_any(&sin6->sin6_addr) &&
> + ipv6_addr_v4mapped(&np->saddr))
> + return -EINVAL;
It looks like 'any' destination with ipv4 mapped source is now
rejected, while the existing code accept it.
/P
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Sven Auhagen @ 2022-05-16 13:02 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <YoJG2j0w551KM17k@salvia>
On Mon, May 16, 2022 at 02:43:06PM +0200, Pablo Neira Ayuso wrote:
> On Mon, May 16, 2022 at 02:23:00PM +0200, Sven Auhagen wrote:
> > On Mon, May 16, 2022 at 02:13:03PM +0200, Pablo Neira Ayuso wrote:
> > > On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> > > > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > > > set the flow table teardown flag. The packet path continues to set lower
> > > > > timeout value as per the new TCP state but the offload flag remains set.
> > > > >
> > > > > Hence, the conntrack garbage collector may race to undo the timeout
> > > > > adjustment of the packet path, leaving the conntrack entry in place with
> > > > > the internal offload timeout (one day).
> > > > >
> > > > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > > > connections.
> > > > >
> > > > > On the nftables side we only need to allow established TCP connections to
> > > > > create a flow offload entry. Since we can not guaruantee that
> > > > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > > > and only fixes up established TCP connections.
> > > > [...]
> > > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > > index 0164e5f522e8..324fdb62c08b 100644
> > > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > > > tmp = nf_ct_tuplehash_to_ctrack(h);
> > > > >
> > > > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > > > - nf_ct_offload_timeout(tmp);
> > > >
> > > > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > > > path that triggers the race, ie. nf_ct_is_expired()
> > > >
> > > > The flowtable ct fixup races with conntrack gc collector.
> > > >
> > > > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > > > the closing packets.
> > > >
> > > > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > > > in a TCP state that represent closure?
> > > >
> > > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > > goto out;
> > > >
> > > > this is already the intention in the existing code.
> > >
> > > I'm attaching an incomplete sketch patch. My goal is to avoid the
> > > extra IPS_ bit.
> >
> > You might create a race with ct gc that will remove the ct
> > if it is in close or end of close and before flow offload teardown is running
> > so flow offload teardown might access memory that was freed.
>
> flow object holds a reference to the ct object until it is released,
> no use-after-free can happen.
>
Also if nf_ct_delete is called before flowtable delete?
Can you let me know why?
> > It is not a very likely scenario but never the less it might happen now
> > since the IPS_OFFLOAD_BIT is not set and the state might just time out.
> >
> > If someone sets a very small TCP CLOSE timeout it gets more likely.
> >
> > So Oz and myself were debatting about three possible cases/problems:
> >
> > 1. ct gc sets timeout even though the state is in CLOSE/FIN because the
> > IPS_OFFLOAD is still set but the flow is in teardown
> > 2. ct gc removes the ct because the IPS_OFFLOAD is not set and
> > the CLOSE timeout is reached before the flow offload del
>
> OK.
>
> > 3. tcp ct is always set to ESTABLISHED with a very long timeout
> > in flow offload teardown/delete even though the state is already
> > CLOSED.
> >
> > Also as a remark we can not assume that the FIN or RST packet is hitting
> > flow table teardown as the packet might get bumped to the slow path in
> > nftables.
>
> I assume this remark is related to 3.?
Yes, exactly.
>
> if IPS_OFFLOAD is unset, then conntrack would update the state
> according to this FIN or RST.
>
It will move to a different TCP state anyways only the ct state
will be at IPS_OFFLOAD_BIT and prevent it from beeing garbage collected.
The timeout will be bumped back up as long as IPS_OFFLOAD_BIT is set
even though TCP might already be CLOSED.
> Thanks for the summary.
^ permalink raw reply
* UDP receive performance drop since 3.10
From: David Laight @ 2022-05-16 12:58 UTC (permalink / raw)
To: netdev@vger.kernel.org
I've noticed a doubling in the cpu cost of udp processing
between a RHEL 3.10 kernel and a 5.18-rc6 one.
This is (probably) all within ip_rcv().
I'm testing very high rate UDP receive of RTP audio.
(The target is 500000 udp/sec.)
I've enable RPS so that ip_rcv() runs on different multiple
cpus from the ethernet code.
(RSS on the BCM5720 (tg3) doesn't seem to work very well.)
On the 3.10 kernel the 'RPS' cpu show about 5% 'soft int' time.
With 5.10 this has doubled to 10% for much the same test.
The ftrace for a single packet shows a lot of extra code.
With a RHEL 3.10 kernel the trace is quite short:
/* netif_receive_skb: dev=em2 skbaddr=ffff99c3ee5ae000 len=200 */
ip_rcv() {
ip_rcv_finish() {
0.483 us udp_v4_early_demux();
ip_route_input_noref() {
ip_route_input_slow() {
2.036 us fib_table_lookup();
fib_validate_source() {
__fib_validate_source.isra.13() {
0.646 us fib_table_lookup();
1.589 us }
2.610 us }
6.820 us }
7.755 us }
ip_local_deliver() {
ip_local_deliver_finish() {
0.250 us raw_local_deliver();
udp_rcv() {
__udp4_lib_rcv() {
__udp4_lib_lookup() {
0.063 us compute_score();
0.097 us compute_score();
1.496 us }
udp_queue_rcv_skb() {
sk_filter_trim_cap() {
security_sock_rcv_skb() {
0.066 us cap_socket_sock_rcv_skb();
1.024 us }
1.836 us }
0.093 us ipv4_pktinfo_prepare();
__udp_enqueue_schedule_skb() {
0.066 us _raw_spin_lock();
sock_def_readable() {
__wake_up_sync_key() {
__wake_up_common_lock() {
0.194 us _raw_spin_lock_irqsave();
__wake_up_common() {
ep_poll_callback() {
0.184 us _raw_spin_lock_irqsave();
0.084 us _raw_spin_unlock_irqrestore();
2.009 us }
3.264 us }
0.087 us _raw_spin_unlock_irqrestore();
5.579 us }
6.311 us }
7.241 us }
8.833 us }
+ 12.948 us }
+ 16.365 us }
+ 17.280 us }
+ 19.900 us }
+ 20.673 us }
+ 31.519 us }
+ 32.534 us }
Whereas 5.18 has a much longer trace:
ip_rcv() {
0.668 us ip_rcv_core();
ip_rcv_finish_core.constprop.0() {
1.155 us udp_v4_early_demux();
ip_route_input_noref() {
0.306 us __rcu_read_lock();
ip_route_input_rcu() {
ip_route_input_slow() {
make_kuid() {
0.441 us map_id_range_down();
1.231 us }
0.307 us __rcu_read_lock();
fib_table_lookup() {
1.268 us fib_lookup_good_nhc();
2.736 us }
0.307 us __rcu_read_unlock();
fib_validate_source() {
__fib_validate_source() {
make_kuid() {
0.304 us map_id_range_down();
0.931 us }
0.304 us __rcu_read_lock();
fib_table_lookup() {
0.493 us fib_lookup_good_nhc();
1.405 us }
0.393 us __rcu_read_unlock();
0.390 us fib_info_nh_uses_dev();
5.457 us }
6.327 us }
+ 13.726 us }
+ 14.727 us }
0.407 us __rcu_read_unlock();
+ 16.673 us }
+ 19.389 us }
ip_local_deliver() {
ip_local_deliver_finish() {
0.376 us __rcu_read_lock();
ip_protocol_deliver_rcu() {
0.434 us raw_local_deliver();
udp_rcv() {
__udp4_lib_rcv() {
__udp4_lib_lookup() {
0.326 us udp4_lib_lookup2.isra.0();
0.928 us udp4_lib_lookup2.isra.0();
2.413 us }
udp_unicast_rcv_skb() {
udp_queue_rcv_skb() {
udp_queue_rcv_one_skb() {
sk_filter_trim_cap() {
0.440 us security_sock_rcv_skb();
sk_filter_trim_cap.part.0() {
0.297 us __rcu_read_lock();
0.310 us __rcu_read_unlock();
1.531 us }
3.277 us }
0.334 us skb_pull_rcsum();
0.310 us ipv4_pktinfo_prepare();
__udp_enqueue_schedule_skb() {
_raw_spin_lock() {
0.334 us preempt_count_add();
0.938 us }
_raw_spin_unlock() {
0.303 us preempt_count_sub();
0.908 us }
sock_def_readable() {
0.307 us __rcu_read_lock();
__wake_up_sync_key() {
__wake_up_common_lock() {
_raw_spin_lock_irqsave() {
0.326 us preempt_count_add();
0.951 us }
__wake_up_common() {
ep_poll_callback() {
_raw_read_lock_irqsave() {
0.330 us preempt_count_add();
1.152 us }
0.614 us __rcu_read_lock();
0.323 us __rcu_read_unlock();
_raw_read_unlock_irqrestore() {
0.380 us preempt_count_sub();
0.995 us }
5.410 us }
6.741 us }
_raw_spin_unlock_irqrestore() {
0.317 us preempt_count_sub();
1.094 us }
9.994 us }
+ 10.806 us }
0.327 us __rcu_read_unlock();
+ 12.809 us }
+ 16.182 us }
+ 22.153 us }
+ 22.769 us }
+ 23.528 us }
+ 27.878 us }
+ 28.646 us }
+ 30.476 us }
0.324 us __rcu_read_unlock();
+ 32.398 us }
+ 33.168 us }
+ 54.976 us }
Now I know the cost of ftrace is significant (and seems to be
higher in 5.18) but there also seems to be a lot more code.
As well as the extra rcu locks (which are probably mostly ftrace
overhead, a few other things stick out:
1) The sock_net_uid(net, NULL) calls.
These are make_kuid(net->user_ns, 0) - so pretty much constant.
They seem to end up in a loop in map_id_range_down_base().
All looks expensive in the default network namespace where
0 maps to 0.
2) Extra code in fib_lookup().
3) A lot more locking in ep_poll_callback().
The 5.18 kernel also seems to have CONFIG_DEBUG_PREEMPT set.
I can't find the Kconfig entry for it.
It doesn't exist in the old .config at all.
So I'm not sure why 'make oldconfig' picked it up.
The other possibility is that the extra code is tick_nohz_idle_exit().
The 3.10 trace is from a non-RPS config so I can't compare it.
I'm going to disable CONFIG_DEBUG_PREEMPT to see how much
difference it makes.
Any idea if any other debug options will have got picked up?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* [Patch] net: af_key: check encryption module availability consistency
From: Thomas Bartschies @ 2022-05-16 12:57 UTC (permalink / raw)
Cc: Thomas Bartschies
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel
produces invalid pfkey acquire messages, when these encryption modules are disabled. This
happens because the availability of the algos wasn't checked in all necessary functions.
This patch adds these checks.
Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
diff -uprN a/net/key/af_key.c b/net/key/af_key.c
--- a/net/key/af_key.c 2022-05-09 09:16:33.000000000 +0200
+++ b/net/key/af_key.c 2022-05-13 13:51:58.286250337 +0200
@@ -2898,7 +2898,7 @@ static int count_ah_combs(const struct x
break;
if (!aalg->pfkey_supported)
continue;
- if (aalg_tmpl_set(t, aalg))
+ if (aalg_tmpl_set(t, aalg) && aalg->available)
sz += sizeof(struct sadb_comb);
}
return sz + sizeof(struct sadb_prop);
@@ -2916,7 +2916,7 @@ static int count_esp_combs(const struct
if (!ealg->pfkey_supported)
continue;
- if (!(ealg_tmpl_set(t, ealg)))
+ if (!(ealg_tmpl_set(t, ealg) && ealg->available))
continue;
for (k = 1; ; k++) {
@@ -2927,7 +2927,7 @@ static int count_esp_combs(const struct
if (!aalg->pfkey_supported)
continue;
- if (aalg_tmpl_set(t, aalg))
+ if (aalg_tmpl_set(t, aalg) && aalg->available)
sz += sizeof(struct sadb_comb);
}
}
^ permalink raw reply
* Re: [PATCH net-next 7/9] net: tcp: add skb drop reasons to tcp connect requesting
From: kernel test robot @ 2022-05-16 12:54 UTC (permalink / raw)
To: menglong8.dong, edumazet
Cc: llvm, kbuild-all, rostedt, mingo, davem, yoshfuji, dsahern, kuba,
pabeni, imagedong, kafai, talalahmad, keescook, dongli.zhang,
linux-kernel, netdev
In-Reply-To: <20220516034519.184876-8-imagedong@tencent.com>
Hi,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/menglong8-dong-gmail-com/net-tcp-add-skb-drop-reasons-to-tcp-state-change/20220516-114934
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git d9713088158b23973266e07fdc85ff7d68791a8c
config: mips-mtx1_defconfig (https://download.01.org/0day-ci/archive/20220516/202205162057.owcP29LO-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 853fa8ee225edf2d0de94b0dcbd31bea916e825e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/d93679590223760e685126e344dfddd7d7c08cc3
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review menglong8-dong-gmail-com/net-tcp-add-skb-drop-reasons-to-tcp-state-change/20220516-114934
git checkout d93679590223760e685126e344dfddd7d7c08cc3
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash net/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> net/dccp/ipv4.c:584:5: error: conflicting types for 'dccp_v4_conn_request'
int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb,
^
net/dccp/dccp.h:258:5: note: previous declaration is here
int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
^
>> net/dccp/ipv4.c:921:21: error: incompatible function pointer types initializing 'int (*)(struct sock *, struct sk_buff *, enum skb_drop_reason *)' with an expression of type 'int (struct sock *, struct sk_buff *)' [-Werror,-Wincompatible-function-pointer-types]
.conn_request = dccp_v4_conn_request,
^~~~~~~~~~~~~~~~~~~~
2 errors generated.
vim +/dccp_v4_conn_request +584 net/dccp/ipv4.c
583
> 584 int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb,
585 enum skb_drop_reason *reason)
586 {
587 struct inet_request_sock *ireq;
588 struct request_sock *req;
589 struct dccp_request_sock *dreq;
590 const __be32 service = dccp_hdr_request(skb)->dccph_req_service;
591 struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
592
593 /* Never answer to DCCP_PKT_REQUESTs send to broadcast or multicast */
594 if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
595 return 0; /* discard, don't send a reset here */
596
597 if (dccp_bad_service_code(sk, service)) {
598 dcb->dccpd_reset_code = DCCP_RESET_CODE_BAD_SERVICE_CODE;
599 goto drop;
600 }
601 /*
602 * TW buckets are converted to open requests without
603 * limitations, they conserve resources and peer is
604 * evidently real one.
605 */
606 dcb->dccpd_reset_code = DCCP_RESET_CODE_TOO_BUSY;
607 if (inet_csk_reqsk_queue_is_full(sk))
608 goto drop;
609
610 if (sk_acceptq_is_full(sk))
611 goto drop;
612
613 req = inet_reqsk_alloc(&dccp_request_sock_ops, sk, true);
614 if (req == NULL)
615 goto drop;
616
617 if (dccp_reqsk_init(req, dccp_sk(sk), skb))
618 goto drop_and_free;
619
620 dreq = dccp_rsk(req);
621 if (dccp_parse_options(sk, dreq, skb))
622 goto drop_and_free;
623
624 if (security_inet_conn_request(sk, skb, req))
625 goto drop_and_free;
626
627 ireq = inet_rsk(req);
628 sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr);
629 sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr);
630 ireq->ir_mark = inet_request_mark(sk, skb);
631 ireq->ireq_family = AF_INET;
632 ireq->ir_iif = sk->sk_bound_dev_if;
633
634 /*
635 * Step 3: Process LISTEN state
636 *
637 * Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookie
638 *
639 * Setting S.SWL/S.SWH to is deferred to dccp_create_openreq_child().
640 */
641 dreq->dreq_isr = dcb->dccpd_seq;
642 dreq->dreq_gsr = dreq->dreq_isr;
643 dreq->dreq_iss = dccp_v4_init_sequence(skb);
644 dreq->dreq_gss = dreq->dreq_iss;
645 dreq->dreq_service = service;
646
647 if (dccp_v4_send_response(sk, req))
648 goto drop_and_free;
649
650 inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
651 reqsk_put(req);
652 return 0;
653
654 drop_and_free:
655 reqsk_free(req);
656 drop:
657 __DCCP_INC_STATS(DCCP_MIB_ATTEMPTFAILS);
658 return -1;
659 }
660 EXPORT_SYMBOL_GPL(dccp_v4_conn_request);
661
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox