* [PATCH net-next 2/8] net: sched: pie: change default value of pie_params->target
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
RFC 8033 suggests a default value of 15 milliseconds for
the target queue delay instead of 20 milliseconds.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 56c9e4d..c5d6d6b 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -83,7 +83,7 @@ static void pie_params_init(struct pie_params *params)
params->beta = 20;
params->tupdate = usecs_to_jiffies(30 * USEC_PER_MSEC); /* 30 ms */
params->limit = 1000; /* default of 1000 packets */
- params->target = PSCHED_NS2TICKS(20 * NSEC_PER_MSEC); /* 20 ms */
+ params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC); /* 15 ms */
params->ecn = false;
params->bytemode = false;
}
--
2.7.4
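For patch 2/8, the psched tick values behind the old and new defaults can be checked with a standalone copy of the conversion. This is a minimal sketch that assumes PSCHED_SHIFT = 6 (one psched tick = 64 ns), matching include/net/pkt_sched.h; the helper pie_ms_to_ticks() is hypothetical.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ull
/* Assumed to match include/net/pkt_sched.h: one psched tick is 2^6 ns. */
#define PSCHED_SHIFT 6
#define PSCHED_NS2TICKS(x) ((uint64_t)(x) >> PSCHED_SHIFT)
#define PSCHED_TICKS2NS(x) ((uint64_t)(x) << PSCHED_SHIFT)

/* Hypothetical helper: convert a whole number of milliseconds to ticks. */
static uint64_t pie_ms_to_ticks(uint32_t ms)
{
	return PSCHED_NS2TICKS((uint64_t)ms * NSEC_PER_MSEC);
}
```

Because 1 ms (10^6 ns) is an exact multiple of 64 ns, every whole-millisecond default converts without rounding: 20 ms is 312500 ticks and 15 ms is 234375 ticks.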
* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Guenter Roeck @ 2018-11-01 1:18 UTC
To: Paul Burton, Trond Myklebust
Cc: Ralf Baechle, James Hogan, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, linux-mips@linux-mips.org,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Andrew Morton, Arnd Bergmann, J. Bruce Fields, Jeff Layton,
Anna Schumaker, David S. Miller, linux-nfs@vger.kernel.org,
netdev@vger.kernel.org
In-Reply-To: <20181031233235.qbedw3pinxcuk7me@pburton-laptop>
On 10/31/18 4:32 PM, Paul Burton wrote:
> (Copying SunRPC & net maintainers.)
>
> Hi Guenter,
>
> On Wed, Oct 31, 2018 at 03:02:53PM -0700, Guenter Roeck wrote:
>> The alternatives I can see are
>> - Do not use cmpxchg64() outside architecture code (ie drop its use from
>> the offending driver, and keep doing the same whenever the problem comes
>> up again).
>> or
>> - Introduce something like ARCH_HAS_CMPXCHG64 and use it to determine
>> if cmpxchg64 is supported or not.
>>
>> Any preference ?
>
> My preference would be option 1 - avoiding cmpxchg64() where possible in
> generic code. I wouldn't be opposed to the Kconfig option if there are
> cases where cmpxchg64() can really help performance though.
>
> The last time I'm aware of this coming up the affected driver was
> modified to avoid cmpxchg64() [1].
>
> In this particular case I have no idea why
> net/sunrpc/auth_gss/gss_krb5_seal.c is using cmpxchg64() at all. It's
> essentially reinventing atomic64_fetch_inc() which is already provided
> everywhere via CONFIG_GENERIC_ATOMIC64 & the spinlock approach. At least
> for atomic64_* functions the assumption that all access will be
> performed using those same functions seems somewhat reasonable.
>
> So how does the below look? Trond?
>
For my part I agree that this would be a much better solution. The argument
that it is not always absolutely guaranteed that atomics don't wrap doesn't
really hold for me because it looks like they all do. On top of that, there
are explicit atomic_dec_if_positive() and atomic_fetch_add_unless() functions,
which to me strongly suggests that they _are_ supposed to wrap.
Given the cost of adding a comparison to each atomic operation to
prevent it from wrapping, anything else would not really make sense to me.
So ... please consider my patch abandoned. Thanks for looking into this!
Guenter
> Thanks,
> Paul
>
> [1] https://patchwork.ozlabs.org/cover/891284/
>
> ---
> diff --git a/include/linux/sunrpc/gss_krb5.h b/include/linux/sunrpc/gss_krb5.h
> index 131424cefc6a..02c0412e368c 100644
> --- a/include/linux/sunrpc/gss_krb5.h
> +++ b/include/linux/sunrpc/gss_krb5.h
> @@ -107,8 +107,8 @@ struct krb5_ctx {
> u8 Ksess[GSS_KRB5_MAX_KEYLEN]; /* session key */
> u8 cksum[GSS_KRB5_MAX_KEYLEN];
> s32 endtime;
> - u32 seq_send;
> - u64 seq_send64;
> + atomic_t seq_send;
> + atomic64_t seq_send64;
> struct xdr_netobj mech_used;
> u8 initiator_sign[GSS_KRB5_MAX_KEYLEN];
> u8 acceptor_sign[GSS_KRB5_MAX_KEYLEN];
> @@ -118,9 +118,6 @@ struct krb5_ctx {
> u8 acceptor_integ[GSS_KRB5_MAX_KEYLEN];
> };
>
> -extern u32 gss_seq_send_fetch_and_inc(struct krb5_ctx *ctx);
> -extern u64 gss_seq_send64_fetch_and_inc(struct krb5_ctx *ctx);
> -
> /* The length of the Kerberos GSS token header */
> #define GSS_KRB5_TOK_HDR_LEN (16)
>
> diff --git a/net/sunrpc/auth_gss/gss_krb5_mech.c b/net/sunrpc/auth_gss/gss_krb5_mech.c
> index 7f0424dfa8f6..eab71fc7af3e 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_mech.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_mech.c
> @@ -274,6 +274,7 @@ get_key(const void *p, const void *end,
> static int
> gss_import_v1_context(const void *p, const void *end, struct krb5_ctx *ctx)
> {
> + u32 seq_send;
> int tmp;
>
> p = simple_get_bytes(p, end, &ctx->initiate, sizeof(ctx->initiate));
> @@ -315,9 +316,10 @@ gss_import_v1_context(const void *p, const void *end, struct krb5_ctx *ctx)
> p = simple_get_bytes(p, end, &ctx->endtime, sizeof(ctx->endtime));
> if (IS_ERR(p))
> goto out_err;
> - p = simple_get_bytes(p, end, &ctx->seq_send, sizeof(ctx->seq_send));
> + p = simple_get_bytes(p, end, &seq_send, sizeof(seq_send));
> if (IS_ERR(p))
> goto out_err;
> + atomic_set(&ctx->seq_send, seq_send);
> p = simple_get_netobj(p, end, &ctx->mech_used);
> if (IS_ERR(p))
> goto out_err;
> @@ -607,6 +609,7 @@ static int
> gss_import_v2_context(const void *p, const void *end, struct krb5_ctx *ctx,
> gfp_t gfp_mask)
> {
> + u64 seq_send64;
> int keylen;
>
> p = simple_get_bytes(p, end, &ctx->flags, sizeof(ctx->flags));
> @@ -617,14 +620,15 @@ gss_import_v2_context(const void *p, const void *end, struct krb5_ctx *ctx,
> p = simple_get_bytes(p, end, &ctx->endtime, sizeof(ctx->endtime));
> if (IS_ERR(p))
> goto out_err;
> - p = simple_get_bytes(p, end, &ctx->seq_send64, sizeof(ctx->seq_send64));
> + p = simple_get_bytes(p, end, &seq_send64, sizeof(seq_send64));
> if (IS_ERR(p))
> goto out_err;
> + atomic64_set(&ctx->seq_send64, seq_send64);
> /* set seq_send for use by "older" enctypes */
> - ctx->seq_send = ctx->seq_send64;
> - if (ctx->seq_send64 != ctx->seq_send) {
> - dprintk("%s: seq_send64 %lx, seq_send %x overflow?\n", __func__,
> - (unsigned long)ctx->seq_send64, ctx->seq_send);
> + atomic_set(&ctx->seq_send, seq_send64);
> + if (seq_send64 != atomic_read(&ctx->seq_send)) {
> + dprintk("%s: seq_send64 %llx, seq_send %x overflow?\n", __func__,
> + seq_send64, atomic_read(&ctx->seq_send));
> p = ERR_PTR(-EINVAL);
> goto out_err;
> }
> diff --git a/net/sunrpc/auth_gss/gss_krb5_seal.c b/net/sunrpc/auth_gss/gss_krb5_seal.c
> index b4adeb06660b..48fe4a591b54 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_seal.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_seal.c
> @@ -123,30 +123,6 @@ setup_token_v2(struct krb5_ctx *ctx, struct xdr_netobj *token)
> return krb5_hdr;
> }
>
> -u32
> -gss_seq_send_fetch_and_inc(struct krb5_ctx *ctx)
> -{
> - u32 old, seq_send = READ_ONCE(ctx->seq_send);
> -
> - do {
> - old = seq_send;
> - seq_send = cmpxchg(&ctx->seq_send, old, old + 1);
> - } while (old != seq_send);
> - return seq_send;
> -}
> -
> -u64
> -gss_seq_send64_fetch_and_inc(struct krb5_ctx *ctx)
> -{
> - u64 old, seq_send = READ_ONCE(ctx->seq_send);
> -
> - do {
> - old = seq_send;
> - seq_send = cmpxchg64(&ctx->seq_send64, old, old + 1);
> - } while (old != seq_send);
> - return seq_send;
> -}
> -
> static u32
> gss_get_mic_v1(struct krb5_ctx *ctx, struct xdr_buf *text,
> struct xdr_netobj *token)
> @@ -177,7 +153,7 @@ gss_get_mic_v1(struct krb5_ctx *ctx, struct xdr_buf *text,
>
> memcpy(ptr + GSS_KRB5_TOK_HDR_LEN, md5cksum.data, md5cksum.len);
>
> - seq_send = gss_seq_send_fetch_and_inc(ctx);
> + seq_send = atomic_fetch_inc(&ctx->seq_send);
>
> if (krb5_make_seq_num(ctx, ctx->seq, ctx->initiate ? 0 : 0xff,
> seq_send, ptr + GSS_KRB5_TOK_HDR_LEN, ptr + 8))
> @@ -205,7 +181,7 @@ gss_get_mic_v2(struct krb5_ctx *ctx, struct xdr_buf *text,
>
> /* Set up the sequence number. Now 64-bits in clear
> * text and w/o direction indicator */
> - seq_send_be64 = cpu_to_be64(gss_seq_send64_fetch_and_inc(ctx));
> + seq_send_be64 = cpu_to_be64(atomic64_fetch_inc(&ctx->seq_send64));
> memcpy(krb5_hdr + 8, (char *) &seq_send_be64, 8);
>
> if (ctx->initiate) {
> diff --git a/net/sunrpc/auth_gss/gss_krb5_wrap.c b/net/sunrpc/auth_gss/gss_krb5_wrap.c
> index 962fa84e6db1..5cdde6cb703a 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_wrap.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_wrap.c
> @@ -228,7 +228,7 @@ gss_wrap_kerberos_v1(struct krb5_ctx *kctx, int offset,
>
> memcpy(ptr + GSS_KRB5_TOK_HDR_LEN, md5cksum.data, md5cksum.len);
>
> - seq_send = gss_seq_send_fetch_and_inc(kctx);
> + seq_send = atomic_fetch_inc(&kctx->seq_send);
>
> /* XXX would probably be more efficient to compute checksum
> * and encrypt at the same time: */
> @@ -475,7 +475,7 @@ gss_wrap_kerberos_v2(struct krb5_ctx *kctx, u32 offset,
> *be16ptr++ = 0;
>
> be64ptr = (__be64 *)be16ptr;
> - *be64ptr = cpu_to_be64(gss_seq_send64_fetch_and_inc(kctx));
> + *be64ptr = cpu_to_be64(atomic64_fetch_inc(&kctx->seq_send64));
>
> err = (*kctx->gk5e->encrypt_v2)(kctx, offset, buf, pages);
> if (err)
>
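Paul's point that the removed gss_seq_send*_fetch_and_inc() helpers reinvent a fetch-and-increment can be illustrated in user space. This sketch uses C11 <stdatomic.h> as a stand-in for the kernel's atomic_t/atomic64_t API, which is an assumption: kernel atomics are not C11 atomics, and with CONFIG_GENERIC_ATOMIC64 they may be implemented with spinlocks. One detail visible in the removed code: gss_seq_send64_fetch_and_inc() seeds its loop from READ_ONCE(ctx->seq_send), the 32-bit field, rather than ctx->seq_send64. The loop still converges because a failed cmpxchg64() returns the current value, but it costs an extra iteration whenever the two counters differ.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Fetch-and-increment via a compare-and-swap loop, the pattern used by
 * the removed gss_seq_send64_fetch_and_inc(). Returns the old value. */
static uint64_t fetch_inc_cas(_Atomic uint64_t *ctr)
{
	uint64_t old = atomic_load(ctr);

	/* On failure, 'old' is refreshed with the counter's current value. */
	while (!atomic_compare_exchange_weak(ctr, &old, old + 1))
		;
	return old;
}

/* The same operation as a single primitive, analogous to what
 * atomic64_fetch_inc() provides. Returns the old value. */
static uint64_t fetch_inc(_Atomic uint64_t *ctr)
{
	return atomic_fetch_add(ctr, 1);
}
```

Both functions return the pre-increment value, and an unsigned C11 atomic counter wraps modulo 2^64, which matches the wrap behaviour Guenter describes for the kernel's atomics.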
* [PATCH net-next 3/8] net: sched: pie: change default value of pie_params->tupdate
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
RFC 8033 suggests a default value of 15 milliseconds for
the update interval instead of 30 milliseconds.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index c5d6d6b..9912d9cc 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -81,7 +81,7 @@ static void pie_params_init(struct pie_params *params)
{
params->alpha = 2;
params->beta = 20;
- params->tupdate = usecs_to_jiffies(30 * USEC_PER_MSEC); /* 30 ms */
+ params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC); /* 15 ms */
params->limit = 1000; /* default of 1000 packets */
params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC); /* 15 ms */
params->ecn = false;
--
2.7.4
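For patch 3/8, note that unlike target, tupdate is stored in jiffies, so the interval actually used depends on the kernel's HZ. A minimal sketch, assuming usecs_to_jiffies() rounds up when HZ divides USEC_PER_SEC evenly; the helper name usecs_to_jiffies_sketch() is hypothetical.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000u

/* Round-up microseconds-to-jiffies conversion, modelled on the kernel's
 * usecs_to_jiffies() for HZ values that divide USEC_PER_SEC evenly. */
static uint32_t usecs_to_jiffies_sketch(uint32_t usecs, uint32_t hz)
{
	uint32_t usec_per_jiffy = USEC_PER_SEC / hz;

	return (usecs + usec_per_jiffy - 1) / usec_per_jiffy;
}
```

Under that assumption, the new 15 ms default is exactly 15 jiffies at HZ=1000, but becomes 4 jiffies (16 ms) at HZ=250 and 2 jiffies (20 ms) at HZ=100.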
* [PATCH net-next 4/8] net: sched: pie: change initial value of pie_vars->burst_time
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
RFC 8033 suggests an initial value of 150 milliseconds for
the maximum time allowed for a burst of packets instead of
100 milliseconds.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 9912d9cc..f4e189a 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -92,8 +92,8 @@ static void pie_vars_init(struct pie_vars *vars)
{
vars->dq_count = DQCOUNT_INVALID;
vars->avg_dq_rate = 0;
- /* default of 100 ms in pschedtime */
- vars->burst_time = PSCHED_NS2TICKS(100 * NSEC_PER_MSEC);
+ /* default of 150 ms in pschedtime */
+ vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
}
static bool drop_early(struct Qdisc *sch, u32 packet_size)
--
2.7.4
* [PATCH net-next 5/8] net: sched: pie: add more conditions to auto-tune alpha and beta
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
The update in drop probability depends on the parameters
alpha and beta, which in turn reflect the current congestion
level. However, the previous if-else cases were recommended
when supported bandwidths were at most 12 Mbps, whereas
current data links support much higher bandwidths and the
demand for bandwidth keeps growing. Hence, RFC 8033 suggests
using more if-else cases to fine-tune alpha and beta more
precisely and so control congestion more effectively.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index f4e189a..c84e91e 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -343,10 +343,30 @@ static void calculate_probability(struct Qdisc *sch)
* appropriately 2) scaling down by 16 to come to 0-2 range.
* Please see paper for details.
*
- * We scale alpha and beta differently depending on whether we are in
- * light, medium or high dropping mode.
+ * We scale alpha and beta differently depending on how heavy the
+ * congestion is.
*/
- if (q->vars.prob < MAX_PROB / 100) {
+ if (q->vars.prob < MAX_PROB / 1000000) {
+ alpha =
+ (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 15;
+ beta =
+ (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 15;
+ } else if (q->vars.prob < MAX_PROB / 100000) {
+ alpha =
+ (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 13;
+ beta =
+ (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 13;
+ } else if (q->vars.prob < MAX_PROB / 10000) {
+ alpha =
+ (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 11;
+ beta =
+ (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 11;
+ } else if (q->vars.prob < MAX_PROB / 1000) {
+ alpha =
+ (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 9;
+ beta =
+ (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 9;
+ } else if (q->vars.prob < MAX_PROB / 100) {
alpha =
(q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 7;
beta =
--
2.7.4
* [PATCH net-next 6/8] net: sched: pie: add mechanism to set PIE active/inactive
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
To avoid unnecessary packet drops caused by a spurious
uptick in queuing latency due to fluctuations in the
network, PIE can choose to be active only when the queue
occupancy reaches a certain threshold. RFC 8033 suggests
setting this threshold to 1/3 of the tail drop threshold.
PIE becomes inactive when the congestion ends, i.e., when
the drop probability reaches 0 and the current and previous
latency samples are both below half of the target queue
delay.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index c84e91e..b68b367 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -57,6 +57,7 @@ struct pie_vars {
psched_time_t dq_tstamp; /* drain rate */
u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */
u32 qlen_old; /* in bytes */
+ bool active; /* inactive/active */
};
/* statistics gathering */
@@ -94,6 +95,7 @@ static void pie_vars_init(struct pie_vars *vars)
vars->avg_dq_rate = 0;
/* default of 150 ms in pschedtime */
vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
+ vars->active = true;
}
static bool drop_early(struct Qdisc *sch, u32 packet_size)
@@ -141,12 +143,23 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct pie_sched_data *q = qdisc_priv(sch);
bool enqueue = false;
+ if (!q->vars.active && qdisc_qlen(sch) >= sch->limit / 3) {
+ /* If the queue occupancy is over 1/3 of the tail drop
+ * threshold, turn on PIE.
+ */
+ pie_vars_init(&q->vars);
+ q->vars.prob = 0;
+ q->vars.qdelay_old = 0;
+ q->vars.dq_count = 0;
+ q->vars.dq_tstamp = psched_get_time();
+ }
+
if (unlikely(qdisc_qlen(sch) >= sch->limit)) {
q->stats.overlimit++;
goto out;
}
- if (!drop_early(sch, skb->len)) {
+ if (!q->vars.active || !drop_early(sch, skb->len)) {
enqueue = true;
} else if (q->params.ecn && (q->vars.prob <= MAX_PROB / 10) &&
INET_ECN_set_ce(skb)) {
@@ -431,7 +444,7 @@ static void calculate_probability(struct Qdisc *sch)
q->vars.qdelay = qdelay;
q->vars.qlen_old = qlen;
- /* We restart the measurement cycle if the following conditions are met
+ /* We turn off PIE if the following conditions are met
* 1. If the delay has been low for 2 consecutive Tupdate periods
* 2. Calculated drop probability is zero
* 3. We have atleast one estimate for the avg_dq_rate ie.,
@@ -441,7 +454,7 @@ static void calculate_probability(struct Qdisc *sch)
(q->vars.qdelay_old < q->params.target / 2) &&
q->vars.prob == 0 &&
q->vars.avg_dq_rate > 0)
- pie_vars_init(&q->vars);
+ q->vars.active = false;
}
static void pie_timer(struct timer_list *t)
@@ -451,7 +464,8 @@ static void pie_timer(struct timer_list *t)
spinlock_t *root_lock = qdisc_lock(qdisc_root_sleeping(sch));
spin_lock(root_lock);
- calculate_probability(sch);
+ if (q->vars.active)
+ calculate_probability(sch);
/* reset the timer to fire after 'tupdate'. tupdate is in jiffies. */
if (q->params.tupdate)
--
2.7.4
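The activation rule in patch 6/8 can be sketched as two small predicates. The struct and function names are hypothetical, and this sketch only resets prob and qdelay_old on activation, whereas the patch also calls pie_vars_init() and resets dq_count and dq_tstamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct pie_vars relevant to patch 6/8. */
struct pie_activity {
	bool active;
	uint32_t prob;        /* drop probability, scaled so 0xffffffff == 1 */
	uint64_t qdelay;      /* current queue delay sample, psched ticks */
	uint64_t qdelay_old;  /* previous queue delay sample, psched ticks */
	uint32_t avg_dq_rate; /* 0 until a drain-rate estimate exists */
};

/* Enqueue path: switch PIE on once occupancy reaches 1/3 of the limit. */
static void pie_maybe_activate(struct pie_activity *v, uint32_t qlen,
			       uint32_t limit)
{
	if (!v->active && qlen >= limit / 3) {
		v->active = true;
		v->prob = 0;
		v->qdelay_old = 0;
	}
}

/* Timer path: switch PIE off once congestion has cleared. */
static void pie_maybe_deactivate(struct pie_activity *v, uint64_t target)
{
	if (v->qdelay < target / 2 && v->qdelay_old < target / 2 &&
	    v->prob == 0 && v->avg_dq_rate > 0)
		v->active = false;
}
```

With the default limit of 1000 packets, PIE switches on at 333 queued packets; with a 15 ms target (234375 ticks), it switches off once both delay samples fall below 117187 ticks and the probability has decayed to zero.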
* [PATCH net-next 7/8] net: sched: pie: add derandomization mechanism
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Random dropping of packets to achieve latency control may
introduce outlier situations where packets are dropped too
close to each other or too far from each other. This can
cause the real drop percentage to temporarily deviate from
the intended drop probability. In certain scenarios, such
as a small number of simultaneous TCP flows, these
deviations can cause significant fluctuations in link
utilization and queuing latency. RFC 8033 suggests using a
derandomization mechanism to avoid them.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index b68b367..88e605c 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -58,6 +58,7 @@ struct pie_vars {
u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */
u32 qlen_old; /* in bytes */
bool active; /* inactive/active */
+ u64 accu_prob; /* accumulated drop probability */
};
/* statistics gathering */
@@ -96,6 +97,7 @@ static void pie_vars_init(struct pie_vars *vars)
/* default of 150 ms in pschedtime */
vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
vars->active = true;
+ vars->accu_prob = 0;
}
static bool drop_early(struct Qdisc *sch, u32 packet_size)
@@ -130,9 +132,21 @@ static bool drop_early(struct Qdisc *sch, u32 packet_size)
else
local_prob = q->vars.prob;
+ if (local_prob == 0)
+ q->vars.accu_prob = 0;
+
+ q->vars.accu_prob += local_prob;
+
+ if (q->vars.accu_prob < (MAX_PROB / 100) * 85)
+ return false;
+ if (q->vars.accu_prob >= ((u64)MAX_PROB * 17) / 2)
+ return true;
+
rnd = prandom_u32();
- if (rnd < local_prob)
+ if (rnd < local_prob) {
+ q->vars.accu_prob = 0;
return true;
+ }
return false;
}
@@ -181,6 +195,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
out:
q->stats.dropped++;
+ q->vars.accu_prob = 0;
return qdisc_drop(skb, sch, to_free);
}
--
2.7.4
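The accumulator logic that patch 7/8 adds to drop_early() can be sketched in isolation. The names are hypothetical, MAX_PROB = 0xffffffff is assumed to match sch_pie.c, and the random draw is passed in as a value (standing in for prandom_u32()) so the behaviour is deterministic. The sketch also resets the accumulator on every drop, folding in the reset the patch performs on the drop path of pie_qdisc_enqueue().

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PROB 0xffffffffu  /* assumed: u32 full scale, as in sch_pie.c */

/* Hypothetical per-queue state: only the accumulator from patch 7/8. */
struct derand_state {
	uint64_t accu_prob; /* sum of per-packet drop probabilities since last drop */
};

/* Decide whether to drop one packet. 'rnd' stands in for prandom_u32(). */
static bool derand_drop(struct derand_state *s, uint32_t local_prob,
			uint32_t rnd)
{
	bool drop;

	if (local_prob == 0)
		s->accu_prob = 0;

	s->accu_prob += local_prob;

	/* Accumulated probability below 0.85: never drop. */
	if (s->accu_prob < (uint64_t)(MAX_PROB / 100) * 85)
		return false;

	/* Accumulated probability of 8.5 or more: always drop. */
	if (s->accu_prob >= ((uint64_t)MAX_PROB * 17) / 2)
		drop = true;
	else
		drop = rnd < local_prob; /* ordinary random early drop */

	/* Reset after a drop so the next gap starts from zero. */
	if (drop)
		s->accu_prob = 0;
	return drop;
}
```

With a constant drop probability of 10%, the first 8 packets after a drop can never be dropped (accumulated probability below 0.85), and if no random drop happens, the 86th packet is dropped unconditionally (accumulated probability of at least 8.5).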
* [PATCH net-next 8/8] net: sched: pie: update references
From: Leslie Monis @ 2018-10-31 16:19 UTC
To: jhs
Cc: netdev, tahiliani, dhavaljkhandla26, hrishihiraskar, bmanish15597,
sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
RFC 8033 replaces the IETF draft for PIE; update the references accordingly.
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
---
net/sched/sch_pie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 88e605c..708e8ad 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -17,7 +17,7 @@
* University of Oslo, Norway.
*
* References:
- * IETF draft submission: http://tools.ietf.org/html/draft-pan-aqm-pie-00
+ * RFC 8033: https://tools.ietf.org/html/rfc8033
* IEEE Conference on High Performance Switching and Routing 2013 :
* "PIE: A * Lightweight Control Scheme to Address the Bufferbloat Problem"
*/
--
2.7.4
* Re: [PATCH net-next 0/8] net: sched: pie: align PIE implementation with RFC 8033
From: Stephen Hemminger @ 2018-10-31 16:36 UTC
To: Leslie Monis
Cc: jhs, netdev, tahiliani, dhavaljkhandla26, hrishihiraskar,
bmanish15597, sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
On Wed, 31 Oct 2018 21:49:24 +0530
Leslie Monis <lesliemonis@gmail.com> wrote:
> The current implementation of the PIE queueing discipline is based on an IETF
> draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
> [PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
> However, RFC 8033 proposes many necessary modifications and enhancements
> that have not yet been incorporated into the Linux kernel. This patch
> series incorporates them.
>
> This patch series includes:
>
> 1. Change the value of QUEUE_THRESHOLD
> 2. Change the default value of pie_params->target
> 3. Change the default value of pie_params->tupdate
> 4. Change the initial value of pie_vars->burst_time
> 5. Add more conditions to auto-tune alpha and beta
> 6. Add mechanism to set PIE active/inactive
> 7. Add a derandomization mechanism
> 8. Update references
>
> Mohit P. Tahiliani (8):
> net: sched: pie: change value of QUEUE_THRESHOLD
> net: sched: pie: change default value of pie_params->target
> net: sched: pie: change default value of pie_params->tupdate
> net: sched: pie: change initial value of pie_vars->burst_time
> net: sched: pie: add more conditions to auto-tune alpha and beta
> net: sched: pie: add mechanism to set PIE active/inactive
> net: sched: pie: add derandomization mechanism
> net: sched: pie: update references
>
> net/sched/sch_pie.c | 77 +++++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 63 insertions(+), 14 deletions(-)
>
Did you do performance tests? Often the RFC is out of date and
the actual values are better than those in the standard.
* Re: [PATCH] net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
From: David Miller @ 2018-11-01 1:36 UTC
To: niklas.cassel
Cc: peppe.cavallaro, alexandre.torgue, joabreu, mcoquelin.stm32,
vkoul, netdev, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <20181031150810.16665-1-niklas.cassel@linaro.org>
From: Niklas Cassel <niklas.cassel@linaro.org>
Date: Wed, 31 Oct 2018 16:08:10 +0100
> When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
> or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
> The only exception is CONFIG_STMMAC_PCI.
>
> When calling of_mdiobus_register(), it will call our ->reset()
> callback, which is set to stmmac_mdio_reset().
>
> Most of the code in stmmac_mdio_reset() is protected by a
> "#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
> to false when CONFIG_STMMAC_PLATFORM=m.
>
> Because of this, the phy reset gpio will only be pulled when
> stmmac is built as built-in, but not when built as modules.
>
> Fix this by using "#if IS_ENABLED()" instead of "#if defined()".
>
> Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Applied and queued up for -stable, thanks.
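The difference between the two preprocessor tests in this fix can be reproduced outside the kernel. Kconfig emits `#define CONFIG_FOO 1` for =y and `#define CONFIG_FOO_MODULE 1` for =m, so `defined(CONFIG_FOO)` is false for a module. The macros below are a renamed, simplified copy of the placeholder trick in include/linux/kconfig.h (with an extra trailing 0 argument so the variadic macro is strictly ISO C); CONFIG_DEMO_PLATFORM and both functions are hypothetical.

```c
/* Pretend Kconfig chose =m for a hypothetical option. */
#define CONFIG_DEMO_PLATFORM_MODULE 1

/* Placeholder trick: PLACEHOLDER_1 expands to "0," and shifts a 1 into
 * the second argument slot; any other token leaves the default 0 there. */
#define PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED(x) IS_DEFINED_1(x)
#define IS_DEFINED_1(val) IS_DEFINED_2(PLACEHOLDER_##val)
#define IS_DEFINED_2(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) IS_DEFINED(option)
#define IS_MODULE(option) IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

static int phy_reset_compiled_with_defined(void)
{
#if defined(CONFIG_DEMO_PLATFORM)
	return 1; /* only reached for =y */
#else
	return 0;
#endif
}

static int phy_reset_compiled_with_is_enabled(void)
{
#if IS_ENABLED(CONFIG_DEMO_PLATFORM)
	return 1; /* reached for both =y and =m */
#else
	return 0;
#endif
}
```

With the option set to =m, the `defined()` variant compiles out the reset code, which is the bug described, while the `IS_ENABLED()` variant keeps it.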
* Re: [PATCH net-next 7/8] net: sched: pie: add derandomization mechanism
From: Stephen Hemminger @ 2018-10-31 16:38 UTC
To: Leslie Monis
Cc: jhs, netdev, tahiliani, dhavaljkhandla26, hrishihiraskar,
bmanish15597, sdp.sachin
In-Reply-To: <1541002772-28040-8-git-send-email-lesliemonis@gmail.com>
On Wed, 31 Oct 2018 21:49:31 +0530
Leslie Monis <lesliemonis@gmail.com> wrote:
> diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
> index b68b367..88e605c 100644
> --- a/net/sched/sch_pie.c
> +++ b/net/sched/sch_pie.c
> @@ -58,6 +58,7 @@ struct pie_vars {
> u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */
> u32 qlen_old; /* in bytes */
> bool active; /* inactive/active */
> + u64 accu_prob; /* accumulated drop probability */
> };
>
Although putting it at the end seems like a natural place, it creates
holes in the C structure. My recommendation would be to put the new
field after dq_tstamp.
* Re: [PATCH net-next 5/8] net: sched: pie: add more conditions to auto-tune alpha and beta
From: Stephen Hemminger @ 2018-10-31 16:40 UTC
To: Leslie Monis
Cc: jhs, netdev, tahiliani, dhavaljkhandla26, hrishihiraskar,
bmanish15597, sdp.sachin
In-Reply-To: <1541002772-28040-6-git-send-email-lesliemonis@gmail.com>
On Wed, 31 Oct 2018 21:49:29 +0530
Leslie Monis <lesliemonis@gmail.com> wrote:
> From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
>
> The update in drop probability depends on the parameters
> alpha and beta, which in turn reflect the current congestion
> level. However, the previous if-else cases were recommended
> when supported bandwidths were at most 12 Mbps, whereas
> current data links support much higher bandwidths and the
> demand for bandwidth keeps growing. Hence, RFC 8033 suggests
> using more if-else cases to fine-tune alpha and beta more
> precisely and so control congestion more effectively.
>
> Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
> Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
> ---
> net/sched/sch_pie.c | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
> index f4e189a..c84e91e 100644
> --- a/net/sched/sch_pie.c
> +++ b/net/sched/sch_pie.c
> @@ -343,10 +343,30 @@ static void calculate_probability(struct Qdisc *sch)
> * appropriately 2) scaling down by 16 to come to 0-2 range.
> * Please see paper for details.
> *
> - * We scale alpha and beta differently depending on whether we are in
> - * light, medium or high dropping mode.
> + * We scale alpha and beta differently depending on how heavy the
> + * congestion is.
> */
> - if (q->vars.prob < MAX_PROB / 100) {
> + if (q->vars.prob < MAX_PROB / 1000000) {
> + alpha =
> + (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 15;
> + beta =
> + (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 15;
> + } else if (q->vars.prob < MAX_PROB / 100000) {
> + alpha =
> + (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 13;
> + beta =
> + (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 13;
> + } else if (q->vars.prob < MAX_PROB / 10000) {
> + alpha =
> + (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 11;
> + beta =
> + (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 11;
> + } else if (q->vars.prob < MAX_PROB / 1000) {
> + alpha =
> + (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 9;
> + beta =
> + (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 9;
> + } else if (q->vars.prob < MAX_PROB / 100) {
> alpha =
> (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 7;
> beta =
Seems like the if/else chain is getting long in the tail. Maybe a loop
or table driven approach would be clearer.
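Stephen's table-driven suggestion can be sketched as a lookup of (probability bound, shift) pairs. This is a hypothetical standalone version, not the kernel's code: MAX_PROB = 0xffffffff and PSCHED_TICKS_PER_SEC = 15625000 (NSEC_PER_SEC >> 6) are assumed to match sch_pie.c and pkt_sched.h. The last table row (>> 5 below MAX_PROB / 10) and the fallback shift of 4 are also assumptions, since the quoted hunk cuts the chain off after >> 7; relative to the base scale-down by 16, they correspond to RFC 8033's divide-by-2 and unchanged cases.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PROB 0xffffffffu            /* assumed, as in sch_pie.c */
#define PSCHED_TICKS_PER_SEC 15625000u  /* assumed: NSEC_PER_SEC >> 6 */

/* One row per congestion band: use 'shift' while prob < 'prob_below'. */
struct pie_scale_band {
	uint32_t prob_below;
	unsigned int shift;
};

static const struct pie_scale_band pie_scale_bands[] = {
	{ MAX_PROB / 1000000, 15 },
	{ MAX_PROB / 100000,  13 },
	{ MAX_PROB / 10000,   11 },
	{ MAX_PROB / 1000,     9 },
	{ MAX_PROB / 100,      7 },
	{ MAX_PROB / 10,       5 },  /* assumed: not shown in the hunk */
};

#define PIE_DEFAULT_SHIFT 4  /* assumed: prob >= MAX_PROB / 10 */

static unsigned int pie_scale_shift(uint32_t prob)
{
	size_t i;

	for (i = 0; i < sizeof(pie_scale_bands) / sizeof(pie_scale_bands[0]); i++)
		if (prob < pie_scale_bands[i].prob_below)
			return pie_scale_bands[i].shift;
	return PIE_DEFAULT_SHIFT;
}

/* Replaces the if/else chain: scale alpha and beta by the band's shift. */
static void pie_scale_alpha_beta(uint32_t prob, uint32_t alpha_in,
				 uint32_t beta_in, uint32_t *alpha,
				 uint32_t *beta)
{
	unsigned int shift = pie_scale_shift(prob);

	*alpha = (alpha_in * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> shift;
	*beta = (beta_in * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> shift;
}
```

A table keeps each band on one line, so adding or retuning a band no longer means duplicating the alpha and beta expressions.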
* Re: [PATCH net-next 6/8] net: sched: pie: add mechanism to set PIE active/inactive
From: Stephen Hemminger @ 2018-10-31 16:41 UTC
To: Leslie Monis
Cc: jhs, netdev, tahiliani, dhavaljkhandla26, hrishihiraskar,
bmanish15597, sdp.sachin
In-Reply-To: <1541002772-28040-7-git-send-email-lesliemonis@gmail.com>
On Wed, 31 Oct 2018 21:49:30 +0530
Leslie Monis <lesliemonis@gmail.com> wrote:
> diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
> index c84e91e..b68b367 100644
> --- a/net/sched/sch_pie.c
> +++ b/net/sched/sch_pie.c
> @@ -57,6 +57,7 @@ struct pie_vars {
> psched_time_t dq_tstamp; /* drain rate */
> u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */
> u32 qlen_old; /* in bytes */
> + bool active; /* inactive/active */
> };
Current Linux best practice is to not use bool for true/false values
in a structure. This is because the size of bool is not obvious and
can cause padding.
Recommend using u8 instead.
* [GIT] Networking
From: David Miller @ 2018-11-01 1:44 UTC
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) BPF verifier fixes from Daniel Borkmann.
2) HNS driver fixes from Huazhong Tan.
3) FDB only works for ethernet devices, reject attempts to install FDB
rules for others. From Ido Schimmel.
4) Fix spectre V1 in vhost, from Jason Wang.
5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2 driver,
from Marc Zyngier.
6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric Dumazet.
Please pull, thanks a lot!
The following changes since commit 4b42745211af552f170f38a1b97f4a112b5da6b2:
Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2018-10-29 15:37:33 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
for you to fetch changes up to 46ebe2834ba5b541f28ee72e556a3fed42c47570:
openvswitch: Fix push/pop ethernet validation (2018-10-31 18:37:16 -0700)
----------------------------------------------------------------
Alexei Starovoitov (1):
Merge branch 'verifier-fixes'
Andrey Ignatov (1):
libbpf: Fix compile error in libbpf_attach_type_by_name
Bo YU (2):
net: add an identifier name for 'struct sock *'
net: drop a space before tabs
Colin Ian King (1):
net: hns3: fix spelling mistake "intrerrupt" -> "interrupt"
Daniel Borkmann (4):
bpf: fix partial copy of map_ptr when dst is scalar
bpf: don't set id on after map lookup with ptr_to_map_val return
bpf: add various test cases to test_verifier
bpf: test make sure to run unpriv test cases in test_verifier
David S. Miller (5):
Merge branch 'mlxsw-Couple-of-fixes'
Merge branch 'hns3-fixes'
Merge branch 'mlxsw-Enable-minimum-shaper-on-MC-TCs'
Merge git://git.kernel.org/.../bpf/bpf
Merge branch '10GbE' of git://git.kernel.org/.../jkirsher/net-queue
Eric Dumazet (2):
net/mlx4_en: add a missing <net/ip.h> include
net/mlx5e: fix csum adjustments caused by RXFCS
Hangbin Liu (1):
ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12
Huazhong Tan (11):
net: hns3: add error handler for hns3_nic_init_vector_data()
net: hns3: bugfix for buffer not free problem during resetting
net: hns3: bugfix for reporting unknown vector0 interrupt repeatly problem
net: hns3: bugfix for the initialization of command queue's spin lock
net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
net: hns3: bugfix for is_valid_csq_clean_head()
net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
net: hns3: fix incorrect return value/type of some functions
net: hns3: bugfix for handling mailbox while the command queue reinitialized
net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
Ido Schimmel (1):
rtnetlink: Disallow FDB configuration for non-Ethernet device
Jacob Keller (3):
fm10k: ensure completer aborts are marked as non-fatal after a resume
fm10k: add missing device IDs to the upstream driver
fm10k: bump driver version to match out-of-tree release
Jaime Caamaño Ruiz (1):
openvswitch: Fix push/pop ethernet validation
Jason Wang (1):
vhost: Fix Spectre V1 vulnerability
Jeff Kirsher (1):
ixgbe/ixgbevf: fix XFRM_ALGO dependency
John Fastabend (1):
bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data
Li Zhijian (1):
kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
Lorenzo Colitti (1):
Documentation: ip-sysctl.txt: Document tcp_fwmark_accept
Marc Zyngier (1):
net: mvpp2: Fix affinity hint allocation
Miroslav Lichvar (1):
igb: shorten maximum PHC timecounter update interval
Mitch Williams (1):
i40e: Update status codes
Nathan Chancellor (1):
hinic: Fix l4_type parameter in hinic_task_set_tunnel_l4
Ngai-Mint Kwan (1):
fm10k: fix SM mailbox full condition
Niklas Cassel (1):
net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
Petr Machata (5):
mlxsw: spectrum_switchdev: Don't ignore deletions of learned MACs
mlxsw: reg: QEEC: Add minimum shaper fields
mlxsw: spectrum: Set minimum shaper on MC TCs
selftests: mlxsw: qos_mc_aware: Tweak for min shaper
selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
Radoslaw Tyl (1):
ixgbe: fix MAC anti-spoofing filter after VFLR
Shalom Toledo (1):
mlxsw: core: Fix devlink unregister flow
Tobias Jungel (1):
bonding: fix length of actor system
Xin Long (2):
sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer
sctp: check policy more carefully when getting pr status
Yonghong Song (1):
tools/bpf: add unlimited rlimit for flow_dissector_load
Documentation/networking/ip-sysctl.txt | 11 ++++
drivers/net/bonding/bond_netlink.c | 3 +-
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 6 +-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 117 +++++++++++++++++++++++++++----------
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 2 +-
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 26 +++++----
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 2 +-
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 42 ++++++-------
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 2 +-
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 6 ++
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 4 +-
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 19 +++---
drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c | 2 +-
drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h | 2 +-
drivers/net/ethernet/intel/Kconfig | 18 ++++++
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 51 +++++++++-------
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 +
drivers/net/ethernet/intel/fm10k/fm10k_type.h | 2 +
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 8 ++-
drivers/net/ethernet/intel/ixgbe/Makefile | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 8 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 4 +-
drivers/net/ethernet/intel/ixgbevf/Makefile | 2 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 4 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 1 +
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 18 ++++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 45 +++-----------
drivers/net/ethernet/mellanox/mlxsw/core.c | 24 +++++---
drivers/net/ethernet/mellanox/mlxsw/reg.h | 22 ++++++-
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 25 ++++++++
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 2 -
drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 2 +-
drivers/vhost/vhost.c | 2 +
include/linux/avf/virtchnl.h | 12 +++-
include/linux/bpf_verifier.h | 3 +
include/linux/inetdevice.h | 4 +-
include/net/af_unix.h | 4 +-
kernel/bpf/verifier.c | 21 ++++---
net/core/rtnetlink.c | 10 ++++
net/ipv4/igmp.c | 53 +++++++++++------
net/ipv4/tcp_bpf.c | 1 +
net/openvswitch/flow_netlink.c | 4 +-
net/sctp/associola.c | 10 +++-
net/sctp/socket.c | 8 ++-
net/xfrm/Kconfig | 1 -
tools/lib/bpf/libbpf.c | 13 +++--
tools/testing/selftests/bpf/flow_dissector_load.c | 2 +
tools/testing/selftests/bpf/test_skb_cgroup_id.sh | 3 +-
tools/testing/selftests/bpf/test_sock_addr.sh | 3 +-
tools/testing/selftests/bpf/test_verifier.c | 321 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh | 95 ++++++++++++++++++++++--------
56 files changed, 793 insertions(+), 274 deletions(-)
* RE: [RFC PATCH 4/4] ixgbe: add support for extended PHC gettime
From: Keller, Jacob E @ 2018-10-31 16:55 UTC (permalink / raw)
To: Richard Cochran, Miroslav Lichvar
Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20181031144003.qs235wjmiuwaprps@localhost>
> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Wednesday, October 31, 2018 7:40 AM
> To: Miroslav Lichvar <mlichvar@redhat.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org
> Subject: Re: [RFC PATCH 4/4] ixgbe: add support for extended PHC gettime
>
> On Mon, Oct 29, 2018 at 02:31:09PM +0100, Miroslav Lichvar wrote:
> > I think there could be a flag in ptp_system_timestamp, or a parameter
> > of gettimex64(), which would enable/disable reading of the system
> > clock.
>
> I'm not a fan of functions that change their behavior based on flags
> in their input parameters.
>
> Thanks,
> Richard
Neither am I. I do however want to find a solution that avoids having drivers needlessly duplicate almost the same functionality.
Thanks,
Jake
* Re: [PATCH net] rtnetlink: invoke 'cb->done' destructor before 'cb->args' reset
From: David Ahern @ 2018-10-31 16:55 UTC (permalink / raw)
To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1540968178-18894-1-git-send-email-alexey.kodanev@oracle.com>
On 10/31/18 12:42 AM, Alexey Kodanev wrote:
> cb->args[2] can store the pointer to the struct fib6_walker,
> allocated in inet6_dump_fib(). On the next loop iteration in
> rtnl_dump_all(), 'memset(&cb, 0, sizeof(cb->args))' can reset
> that pointer, leaking the memory [1].
>
> Fix it by calling cb->done, if it is set, before filling 'cb->args'
> with zeros.
>
> Looks like the recent changes in rtnl_dump_all() contributed to
> the appearance of this kmemleak [1], commit c63586dc9b3e ("net:
> rtnl_dump_all needs to propagate error from dumpit function")
> breaks the loop only on an error now.
>
...
It is more efficient to keep going.
I think the simplest fix for 4.20 is to break the loop if ret is non-0 -
restore the previous behavior. For net-next I think the done callback is
not needed for ipv6; I think there is a simpler way to do it.
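A simplified userspace model of the loop shape under discussion may help. It is only a sketch: `struct dump_cb`, `dump_family()` and `dump_all()` are hypothetical stand-ins for the netlink callback, a per-family dumpit handler and rtnl_dump_all(), and the real kernel loop differs in detail. It shows that breaking out as soon as a handler returns non-zero keeps `args[2]` (the walker pointer in the report) intact, instead of the next family's memset wiping it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct dump_cb {
	uintptr_t args[6];  /* per-family dump state, like cb->args */
};

/* Hypothetical dumpit: family 1 keeps allocated walker state in
 * args[2] and returns 1 ("more data, call me again"); the others
 * finish immediately and return 0.
 */
int dump_family(int family, struct dump_cb *cb)
{
	if (family == 1) {
		if (!cb->args[2])
			cb->args[2] = (uintptr_t)malloc(16);
		return 1;
	}
	return 0;
}

int dump_all(struct dump_cb *cb, int nfamilies)
{
	int family, ret = 0;

	for (family = 0; family < nfamilies; family++) {
		/* Moving on to a new family: start with fresh state. */
		if (family > 0)
			memset(cb->args, 0, sizeof(cb->args));
		ret = dump_family(family, cb);
		if (ret)
			break;  /* bail on any non-0, not only on ret < 0 */
	}
	return ret;
}
```

With `if (ret < 0)` instead of `if (ret)`, the loop would reach family 2, clear `args[2]` and lose the only reference to the allocation, which is the leak described in the report.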
* RE: [RFC PATCH 3/4] igb: add support for extended PHC gettime
From: Keller, Jacob E @ 2018-10-31 16:56 UTC (permalink / raw)
To: Miroslav Lichvar, Richard Cochran
Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20181031093935.GL31668@localhost>
> -----Original Message-----
> From: Miroslav Lichvar [mailto:mlichvar@redhat.com]
> Sent: Wednesday, October 31, 2018 2:40 AM
> To: Richard Cochran <richardcochran@gmail.com>
> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Keller, Jacob E
> <jacob.e.keller@intel.com>
> Subject: Re: [RFC PATCH 3/4] igb: add support for extended PHC gettime
>
> On Tue, Oct 30, 2018 at 07:29:16PM -0700, Richard Cochran wrote:
> > On Fri, Oct 26, 2018 at 06:27:41PM +0200, Miroslav Lichvar wrote:
> > > +static int igb_ptp_gettimex(struct ptp_clock_info *ptp,
> > > + struct ptp_system_timestamp *sts)
> > > +{
> > > + struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
> > > + ptp_caps);
> > > + struct e1000_hw *hw = &igb->hw;
> > > + unsigned long flags;
> > > + u32 lo, hi;
> > > + u64 ns;
> > > +
> > > + spin_lock_irqsave(&igb->tmreg_lock, flags);
> > > +
> > > + /* 82576 doesn't have SYSTIMR */
> > > + if (igb->hw.mac.type == e1000_82576) {
> >
> > Instead of if/then/else, can't you follow the pattern of providing
> > different function flavors ...
>
> I can. I was just trying to minimize the amount of triplicated code.
> In the next version I'll add a patch to deprecate the old gettime
> functions, as Jacob suggested, and replace them with the extended
> versions, so the amount of code will not change that much.
>
Excellent.
-Jake
> Thanks,
>
> --
> Miroslav Lichvar
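Richard's suggestion of separate function flavors can be sketched as choosing a per-MAC read routine once at init time, instead of branching on the MAC type inside every call. Everything here is hypothetical and simplified (`struct fake_hw`, `enum mac_type`, the two read functions): one flavor treats the register pair as a single 64-bit counter, the other as seconds plus nanoseconds. It only illustrates the dispatch pattern, not the real igb register sequence or locking.

```c
#include <assert.h>
#include <stdint.h>

enum mac_type { MAC_82576, MAC_I210 };

/* Hypothetical device model: two 32-bit timer registers. */
struct fake_hw {
	enum mac_type type;
	uint32_t lo, hi;
	uint64_t (*read_time)(const struct fake_hw *hw);
};

/* Flavor A: one 64-bit free-running counter split across lo/hi. */
static uint64_t read_time_counter(const struct fake_hw *hw)
{
	return ((uint64_t)hw->hi << 32) | hw->lo;
}

/* Flavor B: hi holds seconds, lo holds nanoseconds. */
static uint64_t read_time_sec_ns(const struct fake_hw *hw)
{
	return (uint64_t)hw->hi * 1000000000ULL + hw->lo;
}

/* Pick the flavor once, e.g. at probe time; callers never branch. */
void fake_hw_init(struct fake_hw *hw, enum mac_type type)
{
	hw->type = type;
	hw->read_time = (type == MAC_82576) ? read_time_counter
					    : read_time_sec_ns;
}
```

The per-call `if (hw->mac.type == ...)` check moves to a single assignment, at the cost of some duplicated code between flavors, which is the trade-off Miroslav mentions.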
* Re: [PATCH net] rtnetlink: invoke 'cb->done' destructor before 'cb->args' reset
From: David Ahern @ 2018-10-31 17:35 UTC (permalink / raw)
To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <104f12e4-866b-b986-cb9d-28c40d5c5e84@gmail.com>
On 10/31/18 10:55 AM, David Ahern wrote:
> I think the simplest fix for 4.20 is to break the loop if ret is non-0 -
> restore the previous behavior.
that is the only recourse. It has to bail if ret is non-0. Do you want
to send a patch with that fix?
* Re: libbpf build failure on debian:9 with clang
From: Andrey Ignatov @ 2018-10-31 17:40 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Daniel Borkmann, Linux Networking Development Mailing List
In-Reply-To: <20181031141208.GI10660@kernel.org>
Arnaldo Carvalho de Melo <acme@kernel.org> [Wed, 2018-10-31 07:12 -0700]:
> 17 40.66 debian:9 : FAIL gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
>
> The failure was with clang tho:
>
> clang version 3.8.1-24 (tags/RELEASE_381/final)
>
> With:
>
> gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
>
> it built without any warnings/errors.
>
> CC /tmp/build/perf/libbpf.o
> libbpf.c:2201:36: error: comparison of constant -22 with expression of type 'const enum bpf_attach_type' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
> if (section_names[i].attach_type == -EINVAL)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~
> 1 error generated.
Hi Arnaldo,
I have a clang version that I can reproduce this error with. Working on a
patch.
Thank you for the report!
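Why clang complains: every `enum bpf_attach_type` enumerator is non-negative, so the compiler may treat the enum's range as non-negative and concludes that comparing such a field against `-EINVAL` (-22) can never be true. One warning-free pattern is to record validity in a separate flag rather than overloading the enum with a negative sentinel. The sketch below is hypothetical (`enum attach_kind`, `struct section_def`, `attach_kind_by_name()`) and is not necessarily the fix that was eventually posted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum attach_kind { ATTACH_INGRESS, ATTACH_EGRESS };

struct section_def {
	const char *prefix;
	bool attachable;        /* explicit flag instead of -EINVAL sentinel */
	enum attach_kind kind;  /* only meaningful when attachable */
};

static const struct section_def sections[] = {
	{ "cgroup_skb/ingress", true,  ATTACH_INGRESS },
	{ "cgroup_skb/egress",  true,  ATTACH_EGRESS },
	{ "socket",             false, ATTACH_INGRESS },
};

/* Returns 0 and fills *kind on success, -EINVAL otherwise. */
int attach_kind_by_name(const char *name, enum attach_kind *kind)
{
	size_t i;

	for (i = 0; i < sizeof(sections) / sizeof(sections[0]); i++) {
		if (strncmp(name, sections[i].prefix,
			    strlen(sections[i].prefix)))
			continue;
		if (!sections[i].attachable)
			return -EINVAL;  /* known section, not attachable */
		*kind = sections[i].kind;
		return 0;
	}
	return -EINVAL;  /* unknown section name */
}
```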
> CC /tmp/build/perf/help.o
> mv: cannot stat '/tmp/build/perf/.libbpf.o.tmp': No such file or directory
> /git/linux/tools/build/Makefile.build:96: recipe for target '/tmp/build/perf/libbpf.o' failed
> make[4]: *** [/tmp/build/perf/libbpf.o] Error 1
>
> This is the cset:
>
> commit 956b620fcf0b64de403cd26a56bc41e6e4826ea6
> Author: Andrey Ignatov <rdna@fb.com>
> Date: Wed Sep 26 15:24:53 2018 -0700
>
> libbpf: Introduce libbpf_attach_type_by_name
>
> ------------------------
>
> Tests are continuing, so far:
>
> 1 43.53 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0
> 2 58.62 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822
> 3 51.62 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0
> 4 51.68 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0
> 5 49.38 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0
> 6 79.07 alpine:edge : Ok gcc (Alpine 6.4.0) 6.4.0
> 7 63.35 amazonlinux:1 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
> 8 59.65 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
> 9 47.39 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
> 10 50.64 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
> 11 28.75 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
> 12 33.26 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
> 13 43.16 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
> 14 73.61 clearlinux:latest : FAIL gcc (Clear Linux OS for Intel Architecture) 8.2.1 20180502
> 15 45.56 debian:7 : Ok gcc (Debian 4.7.2-5) 4.7.2
> 16 45.53 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u1) 4.9.2
> 17 40.66 debian:9 : FAIL gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
> 18 113.19 debian:experimental : Ok gcc (Debian 8.2.0-8) 8.2.0
> 19 41.48 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
> 20 41.51 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
> 21 40.09 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.1.0-12) 8.1.0
> 22 42.17 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
> 23 40.02 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
> 24 45.47 fedora:21 : Ok gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
> 25 41.64 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
> 26 43.60 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
> 27 44.04 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
> 28 37.21 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
>
> The problem with clearlinux is unrelated:
>
> clang-7: error: unknown argument: '-fno-semantic-interposition'
> clang-7: error: unsupported argument '4' to option 'flto='
> clang-7: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
>
--
Andrey Ignatov
* Re: [PATCH net-next 0/8] net: sched: pie: align PIE implementation with RFC 8033
From: David Miller @ 2018-10-31 17:43 UTC (permalink / raw)
To: lesliemonis
Cc: jhs, netdev, tahiliani, dhavaljkhandla26, hrishihiraskar,
bmanish15597, sdp.sachin
In-Reply-To: <1541002772-28040-1-git-send-email-lesliemonis@gmail.com>
net-next is closed, please resubmit this when net-next opens back up.
Thank you.
* [PATCH net] openvswitch: Fix push/pop ethernet validation
From: Jaime Caamaño Ruiz @ 2018-10-31 17:52 UTC (permalink / raw)
To: netdev; +Cc: pshelar, Jaime Caamaño Ruiz
When there are both pop and push ethernet header actions among the
actions to be applied to a packet, an unexpected EINVAL (Invalid
argument) error is obtained. This is due to mac_proto not being reset
correctly when those actions are validated.
Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html
Fixes: 91820da6ae85 ("openvswitch: add Ethernet push and pop actions")
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
---
net/openvswitch/flow_netlink.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index a70097ecf33c..865ecef68196 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -3030,7 +3030,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
* is already present */
if (mac_proto != MAC_PROTO_NONE)
return -EINVAL;
- mac_proto = MAC_PROTO_NONE;
+ mac_proto = MAC_PROTO_ETHERNET;
break;
case OVS_ACTION_ATTR_POP_ETH:
@@ -3038,7 +3038,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
return -EINVAL;
if (vlan_tci & htons(VLAN_TAG_PRESENT))
return -EINVAL;
- mac_proto = MAC_PROTO_ETHERNET;
+ mac_proto = MAC_PROTO_NONE;
break;
case OVS_ACTION_ATTR_PUSH_NSH:
--
2.16.4
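A minimal userspace model of the corrected state tracking, for illustration only. The names `enum mac_kind`, `enum eth_action` and `validate_actions()` are hypothetical simplifications of the logic in __ovs_nla_copy_actions(). The VLAN check on pop is omitted, and the sketch assumes pop requires an Ethernet header to be present (that condition is not visible in the hunk above). After the fix, a push leaves the packet with an L2 header (MAC_PROTO_ETHERNET) and a pop leaves it without one (MAC_PROTO_NONE).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum mac_kind { MAC_NONE, MAC_ETHERNET };
enum eth_action { ACT_PUSH_ETH, ACT_POP_ETH };

/* Walk the action list, tracking whether an L2 header is present. */
int validate_actions(enum mac_kind start, const enum eth_action *acts,
		     size_t n)
{
	enum mac_kind mac = start;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (acts[i]) {
		case ACT_PUSH_ETH:
			if (mac != MAC_NONE)
				return -EINVAL;  /* header already present */
			mac = MAC_ETHERNET;      /* now there is one */
			break;
		case ACT_POP_ETH:
			if (mac != MAC_ETHERNET)
				return -EINVAL;  /* nothing to pop */
			mac = MAC_NONE;          /* header removed */
			break;
		}
	}
	return 0;
}
```

With the pre-fix assignments swapped, a pop followed by a push on an Ethernet packet fails at the push, which matches the unexpected EINVAL in the report.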
* Re: [PATCH iproute2] Use libbsd for strlcpy if available
From: Luca Boccassi @ 2018-10-31 17:54 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, dsahern
In-Reply-To: <20181031080922.2ff123eb@xeon-e3>
On Wed, 2018-10-31 at 08:09 -0700, Stephen Hemminger wrote:
> On Mon, 29 Oct 2018 10:46:50 +0000
> Luca Boccassi <bluca@debian.org> wrote:
>
> > If libc does not provide strlcpy check for libbsd with pkg-config
> > to
> > avoid relying on inline version.
> >
> > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > ---
> > This allows distro maintainers to be able to choose to reduce
> > duplication and let this code be maintained in one place, in the
> > external library.
> >
>
> I like the idea, but it causes warnings on Debian testing, and maybe
> other distros.
>
> ipnetns.c:2: warning: "_ATFILE_SOURCE" redefined
> #define _ATFILE_SOURCE
>
> In file included from /usr/include/x86_64-linux-gnu/bits/libc-header-
> start.h:33,
> from /usr/include/string.h:26,
> from /usr/include/bsd/string.h:30,
> from <command-line>:
> /usr/include/features.h:326: note: this is the location of the
> previous definition
> # define _ATFILE_SOURCE 1
>
>
> Please figure out how to handle this and resubmit. SUSE open build
> service might
> also work to test multiple distro's
Ah, missed that. That happens because features.h defines _ATFILE_SOURCE
to 1, but ip/ipnetns.c defines it without a value. According to the
spec, either form has the same effect.
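To illustrate that point: feature-test macros are checked for being defined, not for their value (glibc's <features.h> tests `_ATFILE_SOURCE` with `#ifdef`). The sketch uses hypothetical macro names to mirror both spellings. It also shows that redefining a macro with an identical replacement list is accepted silently, which is why defining the local macro as 1 would avoid the warning.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two spellings in the warning. */
#define FEATURE_EMPTY          /* like ip/ipnetns.c: defined, no value */
#define FEATURE_ONE 1          /* like <features.h>: defined to 1 */

/* An identical redefinition is allowed and produces no diagnostic. */
#define FEATURE_ONE 1

int feature_empty_enabled(void)
{
#ifdef FEATURE_EMPTY
	return 1;
#else
	return 0;
#endif
}

int feature_one_enabled(void)
{
#ifdef FEATURE_ONE
	return 1;
#else
	return 0;
#endif
}
```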
This happens because of the quick hack of using -include
/usr/include/bsd/string.h, which was, well, a quick hack that avoided
having to add the include manually everywhere strlcpy is used, now and
in the future. But it has side effects like this.
So I'll send v2 with a less hacky fix, which means defining HAVE_LIBBSD
in configure and doing #ifdef HAVE_LIBBSD #include <bsd/string.h> in
every file. It also means that this needs to be done for every future
use of strlcpy, or the build with libbsd will break.
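For context, the NEED_STRLCPY path is the in-tree fallback that libbsd would replace. A portable sketch of the usual BSD strlcpy() semantics follows: it copies at most `size - 1` bytes, always NUL-terminates when `size > 0`, and returns `strlen(src)` so callers can detect truncation. It is named `fallback_strlcpy()` (hypothetical) to avoid clashing with a libc or libbsd declaration, and it is not the iproute2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

size_t fallback_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size) {
		/* Copy what fits, leaving room for the terminator. */
		size_t len = srclen >= size ? size - 1 : srclen;

		memcpy(dst, src, len);
		dst[len] = '\0';
	}
	return srclen;  /* >= size means the result was truncated */
}
```

A caller typically tests `fallback_strlcpy(dst, src, sizeof(dst)) >= sizeof(dst)` to detect truncation.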
If you or David prefer the hacky way, I can instead send a v3 that does
the quick hack, and also changes _ATFILE_SOURCE to 1 so that there is
no complaint from the compiler, as the values will be the same.
--
Kind regards,
Luca Boccassi
* [PATCH iproute2 v2] Use libbsd for strlcpy if available
From: Luca Boccassi @ 2018-10-31 18:00 UTC (permalink / raw)
To: netdev; +Cc: stephen, dsahern
In-Reply-To: <20181029104650.24924-1-bluca@debian.org>
If libc does not provide strlcpy, check for libbsd with pkg-config to
avoid relying on the inline version.
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
Changed from -include /usr/include/bsd/string.h hack to HAVE_LIBBSD
and proper includes in each file that uses strlcpy.
The hack causes a compiler warning as ip/ipnetns.c defines
_ATFILE_SOURCE without a value, but system headers use 1, so there's
a mismatch.
configure | 11 +++++++++--
genl/ctrl.c | 3 +++
ip/iplink.c | 3 +++
ip/ipnetns.c | 3 +++
ip/iproute_lwtunnel.c | 3 +++
ip/ipvrf.c | 3 +++
ip/ipxfrm.c | 3 +++
ip/tunnel.c | 3 +++
ip/xfrm_state.c | 3 +++
lib/bpf.c | 3 +++
lib/fs.c | 3 +++
lib/inet_proto.c | 3 +++
misc/ss.c | 3 +++
tc/em_ipset.c | 3 +++
tc/m_pedit.c | 3 +++
15 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/configure b/configure
index 744d6282..c5655978 100755
--- a/configure
+++ b/configure
@@ -330,8 +330,15 @@ EOF
then
echo "no"
else
- echo 'CFLAGS += -DNEED_STRLCPY' >>$CONFIG
- echo "yes"
+ if ${PKG_CONFIG} libbsd --exists
+ then
+ echo 'CFLAGS += -DHAVE_LIBBSD' `${PKG_CONFIG} libbsd --cflags` >>$CONFIG
+ echo 'LDLIBS +=' `${PKG_CONFIG} libbsd --libs` >> $CONFIG
+ echo "no"
+ else
+ echo 'CFLAGS += -DNEED_STRLCPY' >>$CONFIG
+ echo "yes"
+ fi
fi
rm -f $TMPDIR/strtest.c $TMPDIR/strtest
}
diff --git a/genl/ctrl.c b/genl/ctrl.c
index 6133336a..fef6aaa9 100644
--- a/genl/ctrl.c
+++ b/genl/ctrl.c
@@ -18,6 +18,9 @@
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include "utils.h"
#include "genl_utils.h"
diff --git a/ip/iplink.c b/ip/iplink.c
index b5519201..067f5409 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -24,6 +24,9 @@
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <sys/ioctl.h>
#include <stdbool.h>
#include <linux/mpls.h>
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 0eac18cf..da019d76 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -8,6 +8,9 @@
#include <sys/syscall.h>
#include <stdio.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <sched.h>
#include <fcntl.h>
#include <dirent.h>
diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 8f497015..2285bc1d 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -16,6 +16,9 @@
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <linux/ila.h>
#include <linux/lwtunnel.h>
#include <linux/mpls_iptunnel.h>
diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index 8a6b7f97..8572b4f2 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -21,6 +21,9 @@
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <dirent.h>
#include <errno.h>
#include <limits.h>
diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 17ab4abe..b02f30a6 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -28,6 +28,9 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>
diff --git a/ip/tunnel.c b/ip/tunnel.c
index d0d55f37..73abb2e2 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -24,6 +24,9 @@
#include <stdio.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index e8c01746..18e0c6fa 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -27,6 +27,9 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <netdb.h>
#include "utils.h"
#include "xfrm.h"
diff --git a/lib/bpf.c b/lib/bpf.c
index 45f279fa..35d7c45a 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -15,6 +15,9 @@
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <stdbool.h>
#include <stdint.h>
#include <errno.h>
diff --git a/lib/fs.c b/lib/fs.c
index 86efd4ed..af36bea0 100644
--- a/lib/fs.c
+++ b/lib/fs.c
@@ -20,6 +20,9 @@
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <errno.h>
#include <limits.h>
diff --git a/lib/inet_proto.c b/lib/inet_proto.c
index 0836a4c9..b379d8f8 100644
--- a/lib/inet_proto.c
+++ b/lib/inet_proto.c
@@ -18,6 +18,9 @@
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include "rt_names.h"
#include "utils.h"
diff --git a/misc/ss.c b/misc/ss.c
index 4d12fb5d..c472fbd9 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -19,6 +19,9 @@
#include <sys/sysmacros.h>
#include <netinet/in.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <errno.h>
#include <netdb.h>
#include <arpa/inet.h>
diff --git a/tc/em_ipset.c b/tc/em_ipset.c
index 48b287f5..550b2101 100644
--- a/tc/em_ipset.c
+++ b/tc/em_ipset.c
@@ -20,6 +20,9 @@
#include <netdb.h>
#include <unistd.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <stdlib.h>
#include <getopt.h>
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 2aeb56d9..baacc80d 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -23,6 +23,9 @@
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
+#ifdef HAVE_LIBBSD
+#include <bsd/string.h>
+#endif
#include <dlfcn.h>
#include "utils.h"
#include "tc_util.h"
--
2.19.1
* Re: [PATCH net-next v2 5/6] net/ncsi: Reset channel state in ncsi_start_dev()
From: Samuel Mendoza-Jonas @ 2018-11-01 4:30 UTC (permalink / raw)
To: Justin.Lee1, netdev; +Cc: davem, linux-kernel, openbmc
In-Reply-To: <2be6038ad8cb43559495e6f84e97b8a6@AUSX13MPS306.AMER.DELL.COM>
On Tue, 2018-10-30 at 18:23 +0000, Justin.Lee1@Dell.com wrote:
> > On Fri, 2018-10-26 at 17:25 +0000, Justin.Lee1@Dell.com wrote:
> > > Hi Samuel,
> > >
> > > I noticed a few issues and commented below.
> > >
> > > Thanks,
> > > Justin
> > >
> > >
> > > > /* Resources */
> > > > +int ncsi_reset_dev(struct ncsi_dev *nd);
> > > > void ncsi_start_channel_monitor(struct ncsi_channel *nc);
> > > > void ncsi_stop_channel_monitor(struct ncsi_channel *nc);
> > > > struct ncsi_channel *ncsi_find_channel(struct ncsi_package *np,
> > > > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> > > > index 014321ad31d3..9bad03e3fa5e 100644
> > > > --- a/net/ncsi/ncsi-manage.c
> > > > +++ b/net/ncsi/ncsi-manage.c
> > > > @@ -550,8 +550,10 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
> > > > spin_lock_irqsave(&nc->lock, flags);
> > > > nc->state = NCSI_CHANNEL_INACTIVE;
> > > > spin_unlock_irqrestore(&nc->lock, flags);
> > > > - ncsi_process_next_channel(ndp);
> > > > -
> > > > + if (ndp->flags & NCSI_DEV_RESET)
> > > > + ncsi_reset_dev(nd);
> > > > + else
> > > > + ncsi_process_next_channel(ndp);
> > > > break;
> > > > default:
> > > > netdev_warn(nd->dev, "Wrong NCSI state 0x%x in suspend\n",
> > > > @@ -1554,7 +1556,7 @@ int ncsi_start_dev(struct ncsi_dev *nd)
> > > > return 0;
> > > > }
> > > >
> > > > - return ncsi_choose_active_channel(nd);
> > > > + return ncsi_reset_dev(nd);
> > >
> > > If there is no available channel due to the whitelist, ncsi_start_dev() function will return failed
> > > Status and the network interface may fail to bring up too. It is possible for user to disable all
> > > channels and leave the interface up for checking the LOM status.
> > >
> >
> > I'm not sure that that is a bug, or at least not in the scope of this
> > series. If the whitelist is set such that no channels are valid then
> > there's nothing for NCSI to do. If we want to do something like always
> > monitor all channels then that would be best to do in another patch.
> >
> > > > }
> > > > EXPORT_SYMBOL_GPL(ncsi_start_dev);
> > >
> > > Also, if I send set_package_mask and set_channel_mask commands back to back in a program,
> > > the state machine doesn't work well. If I use command line and wait for it to complete for
> > > each step, then it is fine.
> >
> > Yeah that's not great; probably hitting some corner cases in the NCSI
> > locking. I'll look into the multi-channel related stuff but I have a
> > feeling that if you tried this with the existing set/clear commands you
> > would probably hit something similar, especially on your dual core
> > platform. If so this is probably something to fix separately.
> >
>
> It is possible that it is causing by the following code in ncsi_reset_dev() function.
> The state might be overwritten and the previous operation is interrupted.
>
> spin_lock_irqsave(&ndp->lock, flags);
> ndp->flags |= NCSI_DEV_RESET;
> ndp->active_channel = active;
> ndp->active_package = active->package;
> spin_unlock_irqrestore(&ndp->lock, flags);
>
> nd->state = ncsi_dev_state_suspend;
Yep, we should probably add a check in the netlink code, before calling
ncsi_reset_dev(), for whether we're already in reset, and check in
ncsi_reset_dev() whether we're mid-configuration.
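A userspace sketch of the guard being suggested: test and set the reset flag in one step, and refuse a second reset while one is in flight rather than overwriting the state. The names (`struct dev_model`, `DEV_RESET`, `model_reset_dev()`, `model_reset_done()`) are hypothetical. A C11 atomic fetch-or stands in for checking NCSI_DEV_RESET under ndp->lock, and returning -EBUSY is an assumed policy; a real fix might instead queue or coalesce the request.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define DEV_RESET 0x1u

struct dev_model {
	atomic_uint flags;
	int active_channel;
};

/* Start a reset unless one is already running. */
int model_reset_dev(struct dev_model *d, int channel)
{
	/* Atomically set DEV_RESET and learn whether it was already set. */
	unsigned int old = atomic_fetch_or(&d->flags, DEV_RESET);

	if (old & DEV_RESET)
		return -EBUSY;  /* don't clobber an in-flight reset */
	d->active_channel = channel;
	return 0;
}

/* Called when the state machine has finished reconfiguring. */
void model_reset_done(struct dev_model *d)
{
	atomic_fetch_and(&d->flags, ~DEV_RESET);
}
```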
For your trace below can you share exactly which commands you were
sending? Those messages aren't upstream so it's not 100% clear what's
being sent.
Thanks!
Sam
>
> > > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-package enabled on ifindex 2, mask 0x00000001
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 0
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0400
> > > npcm7xx-emc f0825000.eth eth2: NCSI: pkg 0 ch 0 set as preferred channel
> > > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000003
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 1
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 1 state 0400
> > > npcm7xx-emc f0825000.eth eth2: NCSI: Package 1 set to all channels disabled
> > > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000000
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel()
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 0
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass pkg whitelist
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 0
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 1
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - next pkg
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 1
> > > npcm7xx-emc f0825000.eth eth2: NCSI: No channel found to configure!
> > > npcm7xx-emc f0825000.eth eth2: NCSI interface down
> > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > > npcm7xx-emc f0825000.eth eth2: Wrong NCSI state 0x100 in workqueue
> > >
> > > All masks are set correctly, but you can see the PS column is not right and channel doesn't
> > > configure correctly.
> > >
> > > /sys/kernel/debug/ncsi_protocol# cat ncsi_device_status
> > > IFIDX IFNAME NAME PID CID RX TX MP MC WP WC PC PS LS RU CR NQ HA
> > > ===================================================================
> > > 2 eth2 ncsi0 000 000 1 1 1 1 1 1 1 0 1 1 1 0 1
> > > 2 eth2 ncsi1 000 001 1 0 1 1 1 1 0 0 1 1 1 0 1
> > > 2 eth2 ncsi2 001 000 0 0 1 1 0 0 0 0 1 1 1 0 1
> > > 2 eth2 ncsi3 001 001 0 0 1 1 0 0 0 0 1 1 1 0 1
> > > ===================================================================
> > > MP: Multi-mode Package WP: Whitelist Package
> > > MC: Multi-mode Channel WC: Whitelist Channel
> > > PC: Primary Channel
> > > PS: Poll Status
> > > LS: Link Status
> > > RU: Running
> > > CR: Carrier OK
> > > NQ: Queue Stopped
> > > HA: Hardware Arbitration
> > >
> > > PS column is getting from (int)nc->monitor.enabled.
>
>
* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: David Miller @ 2018-10-31 19:36 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: netdev, edumazet
In-Reply-To: <20181030003515.12075-1-xiyou.wangcong@gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 29 Oct 2018 17:35:15 -0700
> Most callers of pskb_trim_rcsum() simply drops the skb when
> it fails, however, ip_check_defrag() still continues to pass
> the skb up to stack. In that case, we should restore its previous
> csum if __pskb_trim() fails.
>
> Found this during code review.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
I kind of agree with Eric that we should make all callers, including
ip_check_defrag(), fail just as with any memory allocation failure.
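The two positions can be compared with a toy model. `struct pkt`, `trim_data()` and `trim_rcsum()` are hypothetical; the checksum is a plain 32-bit byte sum rather than the kernel's ones'-complement arithmetic, and the trim can be made to fail on demand. Cong's patch corresponds to the restore step on failure. With the approach David prefers, every caller would drop the packet whenever `trim_rcsum()` returns an error, so the restored value would never be used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
	const uint8_t *data;
	size_t len;
	uint32_t csum;   /* running sum over data[0..len) */
	bool fail_trim;  /* test knob: make the trim fail */
};

static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	while (n--)
		s += *p++;
	return s;
}

static int trim_data(struct pkt *p, size_t len)
{
	if (p->fail_trim)
		return -ENOMEM;  /* e.g. an allocation failure */
	p->len = len;
	return 0;
}

/* Trim to @len, keeping csum consistent and restoring it on failure. */
int trim_rcsum(struct pkt *p, size_t len)
{
	uint32_t old = p->csum;
	int err;

	if (len >= p->len)
		return 0;
	p->csum -= sum_bytes(p->data + len, p->len - len);
	err = trim_data(p, len);
	if (err)
		p->csum = old;  /* leave the packet as it was */
	return err;
}
```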