All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>, netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf-next 1/2] netfilter: xtables: use percpu rule counters
Date: Mon, 8 Jun 2015 21:54:22 +0200	[thread overview]
Message-ID: <20150608195422.GE22198@breakpoint.cc> (raw)
In-Reply-To: <1433789954.29864.65.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > The only reason left as to why we need percpu duplication are the rule
> > counters embedded into ipt_entry et al -- since each cpu has its own copy
> > of the rules, all counters can be lockless.
> > 
> > The downside is that the more cpus are supported, the more memory is
> > required.  Rules are not just duplicated per online cpu but for each
> > possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times,
> > not for the e.g. 64 cores present.
> > 
> > To save some memory and also improve utilization of shared caches it
> > would be preferable to only store the rule blob once.
> > 
> > So we first need to separate counters and the rule blob.
> > 
> > Instead of using entry->counters, allocate this percpu and store the
> > percpu address in entry->counters.pcnt on CONFIG_SMP.
> > 
> > This change makes no sense as-is; it is merely an intermediate step to
> > remove the percpu duplication of the rule set in a followup patch.
> > 
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
[..]

> > +++ b/include/linux/netfilter/x_tables.h
> > +/* On SMP, ip(6)t_entry->counters.pcnt holds address of the
> > + * real (percpu) counter.  On !SMP, its just the packet count,
> > + * so nothing needs to be done there.
> > + *
> > + * xt_percpu_counter_alloc returns the address of the percpu
> > + * counter, or 0 on !SMP.
> > + *
> > + * Hence caller must use IS_ERR_VALUE to check for error, this
> > + * allows us to contain CONFIG_SMP kludgery.
> > + */
> > +static inline u64 xt_percpu_counter_alloc(void)
> > +{
> > +#ifdef CONFIG_SMP
> > +	void *res = alloc_percpu(struct xt_counters);
> > +
> > +	if (res == NULL)
> > +		return (u64) -ENOMEM;
> > +
> > +	return (__force u64) res;
> > +#else
> > +	return 0;
> > +#endif
> > +}
> 
> 
> The u64 stuff looks strange on a 32bit kernel ;)

Hmmm, not sure how to avoid that.
The packet and byte counters are always u64, and I don't
think it will help to provide different helpers for 32 vs. 64bit system.

> > +static inline void xt_percpu_counter_free(u64 pcnt)
> > +{
> > +#ifdef CONFIG_SMP
> > +	free_percpu((__force void *) pcnt);
> > +#endif
> > +}
> > +
> > +static inline struct xt_counters *
> > +xt_get_this_cpu_counter(struct xt_counters *cnt)
> > +{
> > +#ifdef CONFIG_SMP
> > +	return this_cpu_ptr((void *) cnt->pcnt);
> > +#else
> > +	return cnt;
> > +#endif
> > +}
> > +
> > +static inline struct xt_counters *
> > +xt_get_per_cpu_counter(struct xt_counters *cnt, unsigned int cpu)
> > +{
> > +#ifdef CONFIG_SMP
> > +	return per_cpu_ptr((void *) cnt->pcnt, cpu);
> > +#else
> > +	return cnt;
> > +#endif
> > +}
> > +
> 
> So I understand struct xt_counters is exported to user space
> (include/uapi/linux/netfilter/x_tables.h) 
>
> But maybe you could use in the kernel 
> 
> union xt_kcounters {
> 	struct xt_counters __percpu *pcpu;
> 	struct xt_counters counters;
> };
> 
> This way, you could avoid a lot of casts ?

I tried that but I did not find out how to use this to improve
readability :-|

static inline bool xt_percpu_counter_alloc(union xt_kcounters *cnt)
{
#ifdef CONFIG_SMP
	cnt->pcpu = alloc_percpu(struct xt_counters);

	return cnt->pcpu != NULL;
#else
	return true;
#endif
}

... looks ok BUT it means call site looks like

if (xt_percpu_counter_alloc((union xt_kcounters *) &e->counters))
	/* err */

which I find worse :-/

I could add anon union in ipt_entry to avoid casts but I remember
there were build failure problems with anon unions & older gcc releases.

Is there any idea you have wrt. not leaking percpu yes|no details
to ip(6)_tables.c while still avoiding the casts outside the static
inline helpers above?

> Please run sparse to make sure you did not miss an annotation.
> 
> include/linux/netfilter/x_tables.h:370:21: warning: incorrect type in initializer (different address spaces)
> include/linux/netfilter/x_tables.h:370:21:    expected void *res
> include/linux/netfilter/x_tables.h:370:21:    got struct xt_counters [noderef] <asn:3>*<noident>

Ugh.  I sprinkled a few more __percpu annotations on x_tables.h and
these are gone now.

Thanks Eric.

      reply	other threads:[~2015-06-08 19:54 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-08 17:27 [PATCH nf-next 1/2] netfilter: xtables: use percpu rule counters Florian Westphal
2015-06-08 17:27 ` [PATCH nf-next 2/2] netfilter: xtables: avoid percpu ruleset duplication Florian Westphal
2015-06-08 18:59 ` [PATCH nf-next 1/2] netfilter: xtables: use percpu rule counters Eric Dumazet
2015-06-08 19:54   ` Florian Westphal [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150608195422.GE22198@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=eric.dumazet@gmail.com \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.