Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
From: kbuild test robot @ 2016-09-14 21:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-3-hannes@cmpxchg.org>

[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]

Hi Johannes,

[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: parisc-b180_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `sk_alloc':
   (.text.sk_alloc+0xb4): undefined reference to `mem_cgroup_sk_alloc'
   net/built-in.o: In function `__sk_destruct':
>> net/core/.tmp_sock.o:(.text.__sk_destruct+0xdc): undefined reference to `mem_cgroup_sk_free'
   net/built-in.o: In function `sk_clone_lock':
   (.text.sk_clone_lock+0x184): undefined reference to `mem_cgroup_sk_alloc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 12799 bytes --]

^ permalink raw reply

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
From: kbuild test robot @ 2016-09-14 21:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-3-hannes@cmpxchg.org>

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

Hi Johannes,

[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: parisc-c3000_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `sk_alloc':
>> (.text.sk_alloc+0x88): undefined reference to `mem_cgroup_sk_alloc'
   net/built-in.o: In function `__sk_destruct':
>> net/core/sock.o:(.text.__sk_destruct+0xbc): undefined reference to `mem_cgroup_sk_free'
   net/built-in.o: In function `sk_clone_lock':
>> (.text.sk_clone_lock+0x164): undefined reference to `mem_cgroup_sk_alloc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 14256 bytes --]

^ permalink raw reply

* Re: [RFC v3 07/22] landlock: Handle file comparisons
From: Alexei Starovoitov @ 2016-09-14 21:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alexei Starovoitov,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Daniel Borkmann,
	Daniel Mack, David Drysdale, David S . Miller, Elena Reshetova,
	Eric W . Biederman, James Morris, Kees Cook, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29Tb/PtFMR13I2A
In-Reply-To: <20160914072415.26021-8-mic-WFhQfpSGs3bR7s880joybQ@public.gmane.org>

On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
> Add eBPF functions to compare file system access with a Landlock file
> system handle:
> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>   This function allows to compare the dentry, inode, device or mount
>   point of the currently accessed file, with a reference handle.
> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>   This function allows an eBPF program to check if the current accessed
>   file is the same or in the hierarchy of a reference handle.
> 
> The goal of file system handle is to abstract kernel objects such as a
> struct file or a struct inode. Userland can create this kind of handle
> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
> landlock_handle containing the handle type (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
> also be any descriptions able to match a struct file or a struct inode
> (e.g. path or glob string).
> 
> Changes since v2:
> * add MNT_INTERNAL check to only add file handle from user-visible FS
>   (e.g. no anonymous inode)
> * replace struct file* with struct path* in map_landlock_handle
> * add BPF protos
> * fix bpf_landlock_cmp_fs_prop_with_struct_file()
> 
> Signed-off-by: Mickaël Salaün <mic-WFhQfpSGs3bR7s880joybQ@public.gmane.org>
> Cc: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> Cc: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> Cc: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: James Morris <james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Cc: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> Cc: Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
> Link: https://lkml.kernel.org/r/CALCETrWwTiz3kZTkEgOW24-DvhQq6LftwEXh77FD2G5o71yD7g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org

thanks for keeping the links to the previous discussion.
Long term it should help, though I worry we already at the point
where there are too many outstanding issues to resolve before we
can proceed with reasonable code review.

> +/*
> + * bpf_landlock_cmp_fs_prop_with_struct_file
> + *
> + * Cf. include/uapi/linux/bpf.h
> + */
> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
> +		u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
> +{
> +	u8 property = (u8) r1_property;
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +	enum bpf_map_array_op map_op = r3_map_op;
> +	struct file *file = (struct file *) (unsigned long) r4_file;

please use just added BPF_CALL_ macros. They will help readability of the above.

> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct path *p1, *p2;
> +	struct map_landlock_handle *handle;
> +	int i;
> +
> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
> +	if (unlikely(!map)) {
> +		WARN_ON(1);
> +		return -EFAULT;
> +	}
> +	if (unlikely(!file))
> +		return -ENOENT;
> +	if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
> +		return -EINVAL;
> +
> +	/* for now, only handle OP_OR */
> +	switch (map_op) {
> +	case BPF_MAP_ARRAY_OP_OR:
> +		break;
> +	case BPF_MAP_ARRAY_OP_UNSPEC:
> +	case BPF_MAP_ARRAY_OP_AND:
> +	case BPF_MAP_ARRAY_OP_XOR:
> +	default:
> +		return -EINVAL;
> +	}
> +	p2 = &file->f_path;
> +
> +	synchronize_rcu();

that is completely broken.
bpf programs are executing under rcu_lock.
please enable CONFIG_PROVE_RCU and retest everything.

I would suggest for the next RFC to do minimal 7 patches up to this point
with simple example that demonstrates the use case.
I would avoid all unpriv stuff and all of seccomp for the next RFC as well,
otherwise I don't think we can realistically make forward progress, since
there are too many issues raised in the subsequent patches.

The common part that is mergeable is prog's subtype extension to
the verifier that can be used for better tracing and is the common
piece of infra needed for both landlock and checmate LSMs
(which must be one LSM anyway)

^ permalink raw reply

* Re: [PATCH 7/9] net: ethernet: ti: cpts: calc mult and shift from refclk freq
From: Richard Cochran @ 2016-09-14 21:03 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok, John Stultz
In-Reply-To: <4add2983-bdfe-8436-de62-4b2fdc2ea750@ti.com>

On Wed, Sep 14, 2016 at 11:47:46PM +0300, Grygorii Strashko wrote:
> As I understand (and tested), clocks_calc_mult_shift() will 
> return max possible mult which can be used without overflow.

Yes, BUT the returned values depends on the @maxsec input.  As the
kerneldec says,

 * Larger ranges may reduce the conversion accuracy by chosing smaller
 * mult and shift factors.

In addition to that, frequency tuning by calculating a percentage of
'mult', and if 'mult' is small, then the tuning resolution is poor.

So we don't want @maxsec as large as possible, but as small as
possible.

> if calculated results do not satisfy end user - the custom values can
> be passed in DT.  

If we calculate automatically, then the result had better well be
optimal or nearly so.  Otherwise, we should leave it as a manual input
via DTS, IMHO, so that someone is forced to check the values.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Grygorii Strashko @ 2016-09-14 20:59 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914205223.GF12195@netboy>

On 09/14/2016 11:52 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 11:37:45PM +0300, Grygorii Strashko wrote:
>> The problem is that if cpts not initialized than pinter on 
>> cpts (in consumer/parent driver - NETCP) will be NULL.
> 
> You made that problem with your "clean up" in this series.
> Previously, cpts was always allocated.

not exactly - it was allocated in CPSW .probe() manually, 
in without this re-work it has to be allocated in NTCP the
same way - manually. So, checks we are discussing here will be still
present, but they will need to be done in CPSW/NETCP drivers -
just one level up and duplicated in both these drivers.

> 
>> So, rx_enable will be unaccessible and cause crash :(
> 
> So just make sure cpts is initialized.

OK. I'll update code this way.

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Richard Cochran @ 2016-09-14 20:52 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <c0aac71d-2033-f210-5494-15399b477e37@ti.com>

On Wed, Sep 14, 2016 at 11:37:45PM +0300, Grygorii Strashko wrote:
> The problem is that if cpts not initialized than pinter on 
> cpts (in consumer/parent driver - NETCP) will be NULL.

You made that problem with your "clean up" in this series.
Previously, cpts was always allocated.

> So, rx_enable will be unaccessible and cause crash :(

So just make sure cpts is initialized.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH v4 00/16] Add Paravirtual RDMA Driver
From: Jason Gunthorpe @ 2016-09-14 20:50 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, pv-drivers,
	netdev@vger.kernel.org, linux-pci@vger.kernel.org,
	Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <DM2PR0501MB844473AE0FCAD0FD748D43DC5F10@DM2PR0501MB844.namprd05.prod.outlook.com>

On Wed, Sep 14, 2016 at 07:44:45PM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 10:37:00 -0700, Jason Gunthorpe wrote:
> > We desire to use this as the vehical for the userspace included with the 4.9
> > kernel.
> > 
> > I anticipate the tree will be running by Oct 1.
> 
> Thanks. So does this mean that the libraries distributed via OFED (openfabrics.org)
> will be now from the rdma-plumbing git tree?

I can't speak to the exact transition schedule each downstream will
follow. OFED is a downstream distribution focused on backporting to
old distros.

I would expect that by the time OFED starts to backport 4.9 they will
need to use rdma plumbing.

At the cutoff point we will immediately tell all downstreams the
latest official upstream source is in rdma-plumbing and they should
stop monitoring the old repos.

> Or is the switch to happen only when distros start shipping with the
> 4.9 kernel by default?

Now that we are going down this path with the 4.9 set as the time
frame, I wouldn't worry about timing, there is no down-side to you for
working within rdma-plumbing, and it guarantees that eventually your
code will be widely distributed.

Jason

^ permalink raw reply

* Re: [PATCH 8/9] net: ethernet: ti: cpts: fix overflow check period
From: Grygorii Strashko @ 2016-09-14 20:48 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914204307.GE12195@netboy>

On 09/14/2016 11:43 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 11:23:43PM +0300, Grygorii Strashko wrote:
>> if yes then those changes are correct as from patch#7 point of
>> view, as from patch#8 because they are separate standalone changes.
>> In patch patch#7 it reasonable to ball out earlier, while in patch#8
>> it required to move forward a bit as I need to know maxsec.
>
> And what about the extra blank line?  AFAICT, placing the test later
> in patch #7 is correct logic and has the advantage of not distracting
> reviews with pointless churn!
>

NP. I'll change it.

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 7/9] net: ethernet: ti: cpts: calc mult and shift from refclk freq
From: Grygorii Strashko @ 2016-09-14 20:47 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok, John Stultz
In-Reply-To: <20160914202618.GC12195@netboy>

On 09/14/2016 11:26 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 04:02:29PM +0300, Grygorii Strashko wrote:
>> +static void cpts_calc_mult_shift(struct cpts *cpts)
>> +{
>> +	u64 maxsec;
>> +	u32 freq;
>> +	u32 mult;
>> +	u32 shift;
>> +	u64 ns;
>> +	u64 frac;
>> +
>> +	if (cpts->cc_mult || cpts->cc.shift)
>> +		return;
>> +
>> +	freq = clk_get_rate(cpts->refclk);
>> +
>> +	/* Calc the maximum number of seconds which we can run before
>> +	 * wrapping around.
>> +	 */
>> +	maxsec = cpts->cc.mask;
>> +	do_div(maxsec, freq);
>> +	if (!maxsec)
>> +		maxsec = 1;
> 
> This doesn't pass the smell test.  If the max counter value is M, you
> are figuring M*1/F which is the time in seconds corresponding to M.
> We set M=2^32-1, and so 'freq' would have to be greater than 4 GHz in
> order for 'maxsec' to be zero.  Do you really expect such high
> frequency input clocks?

no. can drop it.

> 
>> +	else if (maxsec > 600 && cpts->cc.mask > UINT_MAX)
>> +		maxsec = 600;
> 
> What is this all about?  cc.mask is always 2^32 - 1.

Oh. Not sure if we will update CPTS to support this, but on
 KS2 E/L (66AK2E) CPTS counter can work in 64bit mode.

> 
>> +	clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, maxsec);
>> +
>> +	cpts->cc_mult = mult;
>> +	cpts->cc.mult = mult;
> 
> In order to get good resolution on the frequency adjustment, we want
> to keep 'mult' as large as possible.  I don't see your code doing
> this.  We can rely on the watchdog reader (work queue) to prevent
> overflows.

As I understand (and tested), clocks_calc_mult_shift() will 
return max possible mult which can be used without overflow.
if calculated results do not satisfy end user - the custom values can
be passed in DT.  

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
From: Tejun Heo @ 2016-09-14 20:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David S. Miller, Michal Hocko, Vladimir Davydov,
	linux-mm, cgroups, netdev, linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-3-hannes@cmpxchg.org>

On Wed, Sep 14, 2016 at 03:48:46PM -0400, Johannes Weiner wrote:
> The cgroup core and the memory controller need to track socket
> ownership for different purposes, but the tracking sites being
> entirely different is kind of ugly.
> 
> Be a better citizen and rename the memory controller callbacks to
> match the cgroup core callbacks, then move them to the same place.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

For 1-3,

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 8/9] net: ethernet: ti: cpts: fix overflow check period
From: Richard Cochran @ 2016-09-14 20:43 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <33f6f81b-ce41-4eba-c62d-93cdb06daa8f@ti.com>

On Wed, Sep 14, 2016 at 11:23:43PM +0300, Grygorii Strashko wrote:
> if yes then those changes are correct as from patch#7 point of 
> view, as from patch#8 because they are separate standalone changes.
> In patch patch#7 it reasonable to ball out earlier, while in patch#8
> it required to move forward a bit as I need to know maxsec. 

And what about the extra blank line?  AFAICT, placing the test later
in patch #7 is correct logic and has the advantage of not distracting
reviews with pointless churn!

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Grygorii Strashko @ 2016-09-14 20:37 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914203217.GD12195@netboy>

On 09/14/2016 11:32 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 11:10:48PM +0300, Grygorii Strashko wrote:
>> On 09/14/2016 04:52 PM, Richard Cochran wrote:
>>> On Wed, Sep 14, 2016 at 04:02:25PM +0300, Grygorii Strashko wrote:
> 
>>>> -	if (!cpts->rx_enable)
>>>> +	if (!cpts || !cpts->rx_enable)
>>>>  		return;
> 
>> Ok. I can't say I'd like all this checks, but there are internal requirement 
>> to allow CPTS to be disabled though DT on KS2 (even if built in).
>> I'd try to clarify and return back here.
>>
>> But It'll be good to know your position - acceptable/can be discussed/completely unacceptable?
> 
> Look at that code snippet.  We already test for @cpts->rx_enable.  If
> you want to disable cpts at run time, just avoid setting that field.
> There is no need for any new tests or return codes.
> 

The problem is that if cpts not initialized than pinter on 
cpts (in consumer/parent driver - NETCP) will be NULL.
So, rx_enable will be unaccessible and cause crash :(

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Richard Cochran @ 2016-09-14 20:32 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <606f7cfc-aa34-528b-df04-7a60a9695425@ti.com>

On Wed, Sep 14, 2016 at 11:10:48PM +0300, Grygorii Strashko wrote:
> On 09/14/2016 04:52 PM, Richard Cochran wrote:
> > On Wed, Sep 14, 2016 at 04:02:25PM +0300, Grygorii Strashko wrote:

> >> -	if (!cpts->rx_enable)
> >> +	if (!cpts || !cpts->rx_enable)
> >>  		return;

> Ok. I can't say I'd like all this checks, but there are internal requirement 
> to allow CPTS to be disabled though DT on KS2 (even if built in).
> I'd try to clarify and return back here.
> 
> But It'll be good to know your position - acceptable/can be discussed/completely unacceptable?

Look at that code snippet.  We already test for @cpts->rx_enable.  If
you want to disable cpts at run time, just avoid setting that field.
There is no need for any new tests or return codes.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 7/9] net: ethernet: ti: cpts: calc mult and shift from refclk freq
From: Richard Cochran @ 2016-09-14 20:26 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914130231.3035-8-grygorii.strashko@ti.com>

On Wed, Sep 14, 2016 at 04:02:29PM +0300, Grygorii Strashko wrote:
> +static void cpts_calc_mult_shift(struct cpts *cpts)
> +{
> +	u64 maxsec;
> +	u32 freq;
> +	u32 mult;
> +	u32 shift;
> +	u64 ns;
> +	u64 frac;
> +
> +	if (cpts->cc_mult || cpts->cc.shift)
> +		return;
> +
> +	freq = clk_get_rate(cpts->refclk);
> +
> +	/* Calc the maximum number of seconds which we can run before
> +	 * wrapping around.
> +	 */
> +	maxsec = cpts->cc.mask;
> +	do_div(maxsec, freq);
> +	if (!maxsec)
> +		maxsec = 1;

This doesn't pass the smell test.  If the max counter value is M, you
are figuring M*1/F which is the time in seconds corresponding to M.
We set M=2^32-1, and so 'freq' would have to be greater than 4 GHz in
order for 'maxsec' to be zero.  Do you really expect such high
frequency input clocks?

> +	else if (maxsec > 600 && cpts->cc.mask > UINT_MAX)
> +		maxsec = 600;

What is this all about?  cc.mask is always 2^32 - 1.

> +	clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, maxsec);
> +
> +	cpts->cc_mult = mult;
> +	cpts->cc.mult = mult;

In order to get good resolution on the frequency adjustment, we want
to keep 'mult' as large as possible.  I don't see your code doing
this.  We can rely on the watchdog reader (work queue) to prevent
overflows.

Thanks,
Richard

> +	cpts->cc.shift = shift;
> +	/* Check calculations and inform if not precise */
> +	frac = 0;
> +	ns = cyclecounter_cyc2ns(&cpts->cc, freq, cpts->cc.mask, &frac);
> +
> +	dev_info(cpts->dev,
> +		 "CPTS: ref_clk_freq:%u calc_mult:%u calc_shift:%u error:%lld nsec/sec\n",
> +		 freq, cpts->cc_mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
> +}

^ permalink raw reply

* Re: [PATCH 8/9] net: ethernet: ti: cpts: fix overflow check period
From: Grygorii Strashko @ 2016-09-14 20:23 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914200845.GB12195@netboy>

On 09/14/2016 11:08 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 11:03:18PM +0300, Grygorii Strashko wrote:
>> On 09/14/2016 05:25 PM, Richard Cochran wrote:
>>> On Wed, Sep 14, 2016 at 04:02:30PM +0300, Grygorii Strashko wrote:
>>>> @@ -427,9 +427,6 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
>>>>  	u64 ns;
>>>>  	u64 frac;
>>>>  
>>>> -	if (cpts->cc_mult || cpts->cc.shift)
>>>> -		return;
>>>> -
>>>>  	freq = clk_get_rate(cpts->refclk);
>>>>  
>>>>  	/* Calc the maximum number of seconds which we can run before
>>>
>>> This hunk has nothing to do with $subject.
>>
>> Sry, but I did not get your comment here :(
>> I'd happy to update patch according to your request, but could you provide more info here, pls?
> 
> You added that code in patch #7.  Then you moved it in patch #8.  You
> could have made the code correct in patch #7 to begin with.
> 

Do you mean 
 -	if (cpts->cc_mult || cpts->cc.shift)
 -		return;
??

if yes then those changes are correct as from patch#7 point of 
view, as from patch#8 because they are separate standalone changes.
In patch patch#7 it reasonable to ball out earlier, while in patch#8
it required to move forward a bit as I need to know maxsec. 

Sry, that I'm bothering you with stupid questions.


-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Grygorii Strashko @ 2016-09-14 20:10 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914135214.GA28592@localhost.localdomain>

On 09/14/2016 04:52 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 04:02:25PM +0300, Grygorii Strashko wrote:
>> @@ -323,7 +307,7 @@ void cpts_rx_timestamp(struct cpts *cpts, struct sk_buff *skb)
>>  	u64 ns;
>>  	struct skb_shared_hwtstamps *ssh;
>>  
>> -	if (!cpts->rx_enable)
>> +	if (!cpts || !cpts->rx_enable)
>>  		return;
> 
> This function is in the hot path, and you have added a pointless new
> test.  Don't do that.
> 
>>  	ns = cpts_find_ts(cpts, skb, CPTS_EV_RX);
>>  	if (!ns)
>> @@ -338,7 +322,7 @@ void cpts_tx_timestamp(struct cpts *cpts, struct sk_buff *skb)
>>  	u64 ns;
>>  	struct skb_shared_hwtstamps ssh;
>>  
>> -	if (!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
>> +	if (!cpts || !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
>>  		return;
> 
> Same here.
> 
>>  	ns = cpts_find_ts(cpts, skb, CPTS_EV_TX);
>>  	if (!ns)
>> @@ -348,53 +332,102 @@ void cpts_tx_timestamp(struct cpts *cpts, struct sk_buff *skb)
>>  	skb_tstamp_tx(skb, &ssh);
>>  }
>>  
>> -int cpts_register(struct device *dev, struct cpts *cpts,
>> -		  u32 mult, u32 shift)
>> +int cpts_register(struct cpts *cpts)
>>  {
>>  	int err, i;
>> -	unsigned long flags;
>>  
>> -	cpts->info = cpts_info;
>> -	cpts->clock = ptp_clock_register(&cpts->info, dev);
>> -	if (IS_ERR(cpts->clock)) {
>> -		err = PTR_ERR(cpts->clock);
>> -		cpts->clock = NULL;
>> -		return err;
>> -	}
>> -	spin_lock_init(&cpts->lock);
>> -
>> -	cpts->cc.read = cpts_systim_read;
>> -	cpts->cc.mask = CLOCKSOURCE_MASK(32);
>> -	cpts->cc_mult = mult;
>> -	cpts->cc.mult = mult;
>> -	cpts->cc.shift = shift;
>> +	if (!cpts)
>> +		return -EINVAL;
> 
> Not hot path, but still silly.  The caller should never pass NULL.

Ok. I can't say I'd like all this checks, but there are internal requirement 
to allow CPTS to be disabled though DT on KS2 (even if built in).
I'd try to clarify and return back here.

But It'll be good to know your position - acceptable/can be discussed/completely unacceptable?


-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 8/9] net: ethernet: ti: cpts: fix overflow check period
From: Richard Cochran @ 2016-09-14 20:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <d4cde14a-8147-6097-f034-c9cde6cdb718@ti.com>

On Wed, Sep 14, 2016 at 11:03:18PM +0300, Grygorii Strashko wrote:
> On 09/14/2016 05:25 PM, Richard Cochran wrote:
> > On Wed, Sep 14, 2016 at 04:02:30PM +0300, Grygorii Strashko wrote:
> >> @@ -427,9 +427,6 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
> >>  	u64 ns;
> >>  	u64 frac;
> >>  
> >> -	if (cpts->cc_mult || cpts->cc.shift)
> >> -		return;
> >> -
> >>  	freq = clk_get_rate(cpts->refclk);
> >>  
> >>  	/* Calc the maximum number of seconds which we can run before
> > 
> > This hunk has nothing to do with $subject.
> 
> Sry, but I did not get your comment here :(
> I'd happy to update patch according to your request, but could you provide more info here, pls?

You added that code in patch #7.  Then you moved it in patch #8.  You
could have made the code correct in patch #7 to begin with.

> >>  	cpts->cc_mult = mult;
> >>  	cpts->cc.mult = mult;
> >>  	cpts->cc.shift = shift;
> >> +

If you want a blank line here, then put into the original patch #7.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 4/9] net: ethernet: ti: cpts: move dt props parsing to cpts driver
From: Richard Cochran @ 2016-09-14 20:03 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <179724dc-dcc2-6052-84c6-eef9807d867c@ti.com>

On Wed, Sep 14, 2016 at 10:45:54PM +0300, Grygorii Strashko wrote:
> With this change It will not be required to add the same DT parsing code
> in Keystone 2 netcp driver, so overall number of lines will be reduced.

This explanation would make the commit message much more informative.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 8/9] net: ethernet: ti: cpts: fix overflow check period
From: Grygorii Strashko @ 2016-09-14 20:03 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914142503.GF28592@localhost.localdomain>

On 09/14/2016 05:25 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 04:02:30PM +0300, Grygorii Strashko wrote:
>> @@ -427,9 +427,6 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
>>  	u64 ns;
>>  	u64 frac;
>>  
>> -	if (cpts->cc_mult || cpts->cc.shift)
>> -		return;
>> -
>>  	freq = clk_get_rate(cpts->refclk);
>>  
>>  	/* Calc the maximum number of seconds which we can run before
> 
> This hunk has nothing to do with $subject.

Sry, but I did not get your comment here :(
I'd happy to update patch according to your request, but could you provide more info here, pls?



> 
>> @@ -442,11 +439,20 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
>>  	else if (maxsec > 600 && cpts->cc.mask > UINT_MAX)
>>  		maxsec = 600;
>>  
>> +	/* Calc overflow check period (maxsec / 2) */
>> +	cpts->ov_check_period = (HZ * maxsec) / 2;
>> +	dev_info(cpts->dev, "cpts: overflow check period %lu\n",
>> +		 cpts->ov_check_period);
>> +
>> +	if (cpts->cc_mult || cpts->cc.shift)
>> +		return;
>> +
>>  	clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, maxsec);
>>  
>>  	cpts->cc_mult = mult;
>>  	cpts->cc.mult = mult;
>>  	cpts->cc.shift = shift;
>> +
> 
> Nor does this.



-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 7/9] net: ethernet: ti: cpts: calc mult and shift from refclk freq
From: Grygorii Strashko @ 2016-09-14 19:59 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914142210.GE28592@localhost.localdomain>

On 09/14/2016 05:22 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 04:02:29PM +0300, Grygorii Strashko wrote:
>> @@ -35,6 +33,8 @@ Optional properties:
>>  			  For example in dra72x-evm, pcf gpio has to be
>>  			  driven low so that cpsw slave 0 and phy data
>>  			  lines are connected via mux.
>> +- cpts_clock_mult	: Numerator to convert input clock ticks into nanoseconds
>> +- cpts_clock_shift	: Denominator to convert input clock ticks into nanoseconds
>
> You should explain to the reader how these will be calculated when the
> properties are missing.

Not sure how full should it be explained in bindings - I'll try.


>
>> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
>> index ff8bb85..8046a21 100644
>> --- a/drivers/net/ethernet/ti/cpts.c
>> +++ b/drivers/net/ethernet/ti/cpts.c
>> @@ -418,18 +418,60 @@ void cpts_unregister(struct cpts *cpts)
>>  	clk_disable(cpts->refclk);
>>  }
>>
>> +static void cpts_calc_mult_shift(struct cpts *cpts)
>> +{
>> +	u64 maxsec;
>> +	u32 freq;
>> +	u32 mult;
>> +	u32 shift;
>> +	u64 ns;
>> +	u64 frac;
>
> Why so many new lines?  This isn't good style.  Please combine
> variables of the same type into one line and sort the lists
> alphabetically.

Its matter of preferences :), but sure - I'll update.
>
>> +	if (cpts->cc_mult || cpts->cc.shift)
>> +		return;
>> +
>> +	freq = clk_get_rate(cpts->refclk);
>> +
>> +	/* Calc the maximum number of seconds which we can run before
>> +	 * wrapping around.
>> +	 */
>> +	maxsec = cpts->cc.mask;
>> +	do_div(maxsec, freq);
>> +	if (!maxsec)
>> +		maxsec = 1;
>> +	else if (maxsec > 600 && cpts->cc.mask > UINT_MAX)
>> +		maxsec = 600;
>> +
>> +	clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, maxsec);
>> +
>> +	cpts->cc_mult = mult;
>> +	cpts->cc.mult = mult;
>> +	cpts->cc.shift = shift;
>> +	/* Check calculations and inform if not precise */
>
> Contrary to this comment, you are not making any kind of judgment
> about whether the calculations are precise or not.
>
>> +	frac = 0;
>> +	ns = cyclecounter_cyc2ns(&cpts->cc, freq, cpts->cc.mask, &frac);
>> +
>> +	dev_info(cpts->dev,
>> +		 "CPTS: ref_clk_freq:%u calc_mult:%u calc_shift:%u error:%lld nsec/sec\n",
>> +		 freq, cpts->cc_mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
>> +}
>> +
>

Thanks for the review.

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH RFC 5/6] net: Generic resolver backend
From: Tom Herbert @ 2016-09-14 19:56 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team
In-Reply-To: <20160914094911.GE11841@pox.localdomain>

On Wed, Sep 14, 2016 at 2:49 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/09/16 at 04:19pm, Tom Herbert wrote:
>> diff --git a/net/core/resolver.c b/net/core/resolver.c
>> new file mode 100644
>> index 0000000..61b36c5
>> --- /dev/null
>> +++ b/net/core/resolver.c
>> @@ -0,0 +1,267 @@
>> +#include <linux/errno.h>
>> +#include <linux/ip.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/netlink.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/socket.h>
>> +#include <linux/types.h>
>> +#include <linux/vmalloc.h>
>> +#include <net/checksum.h>
>> +#include <net/ip.h>
>> +#include <net/ip6_fib.h>
>> +#include <net/lwtunnel.h>
>> +#include <net/protocol.h>
>> +#include <net/resolver.h>
>> +#include <uapi/linux/ila.h>
>
> This include list could be stripped down a bit. ila, lwt, fib, ...
>
>> +
>> +static struct net_rslv_ent *net_rslv_new_ent(struct net_rslv *nrslv,
>> +                                          void *key)
>
> Comment above that net_rslv_get_lock() must be held?
>
>> +{
>> +     struct net_rslv_ent *nrent;
>> +     int err;
>> +
>> +     nrent = kzalloc(sizeof(*nrent) + nrslv->obj_size, GFP_KERNEL);
>
> GFP_ATOMIC since you typically hold net_rslv_get_lock() spinlock?
>
>> +     if (!nrent)
>> +             return ERR_PTR(-EAGAIN);
>> +
>> +     /* Key is always at beginning of object data */
>> +     memcpy(nrent->object, key, nrslv->params.key_len);
>> +
>> +     /* Initialize user data */
>> +     if (nrslv->rslv_init)
>> +             nrslv->rslv_init(nrslv, nrent);
>> +
>> +     /* Put in hash table */
>> +     err = rhashtable_lookup_insert_fast(&nrslv->rhash_table,
>> +                                         &nrent->node, nrslv->params);
>> +     if (err)
>> +             return ERR_PTR(err);
>> +
>> +     if (nrslv->timeout) {
>> +             /* Schedule timeout for resolver */
>> +             INIT_DELAYED_WORK(&nrent->timeout_work, net_rslv_delayed_work);
>
> Should this be done before inserting into rhashtable?
>
Adding to the table and setting delayed work are done under a lock so
I think it should be okay. I'll add a comment to the function that the
lock is held.

>> +             schedule_delayed_work(&nrent->timeout_work, nrslv->timeout);
>> +     }
>> +
>> +     nrent->nrslv = nrslv;
>
> Same here.  net_rslv_cancel_all_delayed_work() walking the rhashtable could
> see ->nrslv as NULL.

I'll move it up, but net_rslv_cancel_all_delayed_work is only called
when we're destroying the table so it would be a bug in the higher
layer if it is both destroying the table and adding entries at the
same time.

Thanks,
Tom


>

^ permalink raw reply

* Re: [PATCH 6/9] net: ethernet: ti: cpts: clean up event list if event pool is empty
From: Grygorii Strashko @ 2016-09-14 19:54 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914141414.GD28592@localhost.localdomain>

On 09/14/2016 05:14 PM, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 04:02:28PM +0300, Grygorii Strashko wrote:
>> From: WingMan Kwok <w-kwok2@ti.com>
>>
>> When a CPTS user does not exit gracefully by disabling cpts
>> timestamping and leaving a joined multicast group, the system
>> continues to receive and timestamps the ptp packets which eventually
>> occupy all the event list entries.  When this happns, the added code
>> tries to remove some list entries which are expired.
>>
>> Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
>> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>> ---
>>  drivers/net/ethernet/ti/cpts.c | 30 ++++++++++++++++++++++++++++--
>>  1 file changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
>> index 970d4e2..ff8bb85 100644
>> --- a/drivers/net/ethernet/ti/cpts.c
>> +++ b/drivers/net/ethernet/ti/cpts.c
>> @@ -57,22 +57,48 @@ static int cpts_fifo_pop(struct cpts *cpts, u32 *high, u32 *low)
>>  	return -1;
>>  }
>>
>> +static int cpts_event_list_clean_up(struct cpts *cpts)
>
> 5 words, that is quite a mouth full.  How about this instead?
>
> 	static int cpts_purge_events(struct cpts *cpts);
>
>> +{
>> +	struct list_head *this, *next;
>> +	struct cpts_event *event;
>> +	int removed = 0;
>> +
>> +	list_for_each_safe(this, next, &cpts->events) {
>> +		event = list_entry(this, struct cpts_event, list);
>> +		if (event_expired(event)) {
>> +			list_del_init(&event->list);
>> +			list_add(&event->list, &cpts->pool);
>> +			++removed;
>> +		}
>> +	}
>> +	return removed;
>> +}
>> +
>>  /*
>>   * Returns zero if matching event type was found.
>>   */
>>  static int cpts_fifo_read(struct cpts *cpts, int match)
>>  {
>>  	int i, type = -1;
>> +	int removed;
>
> No need for another variable, just change the return code above to
>
> 	return removed ? 0 : -1;
>
> and then you have ...
>
>>  	u32 hi, lo;
>>  	struct cpts_event *event;
>>
>>  	for (i = 0; i < CPTS_FIFO_DEPTH; i++) {
>>  		if (cpts_fifo_pop(cpts, &hi, &lo))
>>  			break;
>> +
>>  		if (list_empty(&cpts->pool)) {
>> -			pr_err("cpts: event pool is empty\n");
>> -			return -1;
>> +			removed = cpts_event_list_clean_up(cpts);
>> +			if (!removed) {
>> +				dev_err(cpts->dev,
>> +					"cpts: event pool is empty\n");
>> +				return -1;
>> +			}
>
> 			if (cpts_purge_events(cpts)) {
> 				dev_err(cpts->dev, "cpts: event pool empty\n");
> 				return -1;
> 			}
>
> Notice how I avoided the ugly line break?
>
>> +			dev_dbg(cpts->dev,
>> +				"cpts: event pool cleaned up %d\n", removed);
>>  		}
>> +
>>  		event = list_first_entry(&cpts->pool, struct cpts_event, list);
>>  		event->tmo = jiffies + 2;
>>  		event->high = hi;
>> --
>> 2.9.3
>>
>

NP here - will update. Thanks for you review.


-- 
regards,
-grygorii

^ permalink raw reply

* [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
From: Johannes Weiner @ 2016-09-14 19:48 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo, David S. Miller
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-1-hannes@cmpxchg.org>

The cgroup core and the memory controller need to track socket
ownership for different purposes, but the tracking sites being
entirely different is kind of ugly.

Be a better citizen and rename the memory controller callbacks to
match the cgroup core callbacks, then move them to the same place.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h |  4 ++--
 mm/memcontrol.c            | 19 +++++++++++--------
 net/core/sock.c            |  6 +++---
 net/ipv4/tcp.c             |  2 --
 net/ipv4/tcp_ipv4.c        |  3 ---
 5 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0710143723bc..ca11b3e6dd65 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -773,8 +773,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-void sock_update_memcg(struct sock *sk);
-void sock_release_memcg(struct sock *sk);
+void mem_cgroup_sk_alloc(struct sock *sk);
+void mem_cgroup_sk_free(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 #ifdef CONFIG_MEMCG
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 60bb830abc34..2caf1ee86e78 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2939,7 +2939,7 @@ static int memcg_update_tcp_limit(struct mem_cgroup *memcg, unsigned long limit)
 		/*
 		 * The active flag needs to be written after the static_key
 		 * update. This is what guarantees that the socket activation
-		 * function is the last one to run. See sock_update_memcg() for
+		 * function is the last one to run. See mem_cgroup_sk_alloc() for
 		 * details, and note that we don't mark any socket as belonging
 		 * to this memcg until that flag is up.
 		 *
@@ -2948,7 +2948,7 @@ static int memcg_update_tcp_limit(struct mem_cgroup *memcg, unsigned long limit)
 		 * as accounted, but the accounting functions are not patched in
 		 * yet, we'll lose accounting.
 		 *
-		 * We never race with the readers in sock_update_memcg(),
+		 * We never race with the readers in mem_cgroup_sk_alloc(),
 		 * because when this value change, the code to process it is not
 		 * patched in yet.
 		 */
@@ -5651,11 +5651,15 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
 EXPORT_SYMBOL(memcg_sockets_enabled_key);
 
-void sock_update_memcg(struct sock *sk)
+void mem_cgroup_sk_alloc(struct sock *sk)
 {
 	struct mem_cgroup *memcg;
 
-	/* Socket cloning can throw us here with sk_cgrp already
+	if (!mem_cgroup_sockets_enabled)
+		return;
+
+	/*
+	 * Socket cloning can throw us here with sk_memcg already
 	 * filled. It won't however, necessarily happen from
 	 * process context. So the test for root memcg given
 	 * the current task's memcg won't help us in this case.
@@ -5680,12 +5684,11 @@ void sock_update_memcg(struct sock *sk)
 out:
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(sock_update_memcg);
 
-void sock_release_memcg(struct sock *sk)
+void mem_cgroup_sk_free(struct sock *sk)
 {
-	WARN_ON(!sk->sk_memcg);
-	css_put(&sk->sk_memcg->css);
+	if (sk->sk_memcg)
+		css_put(&sk->sk_memcg->css);
 }
 
 /**
diff --git a/net/core/sock.c b/net/core/sock.c
index 038e660ef844..c73e28fc9c2a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1363,6 +1363,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 	slab = prot->slab;
 
 	cgroup_sk_free(&sk->sk_cgrp_data);
+	mem_cgroup_sk_free(sk);
 	security_sk_free(sk);
 	if (slab != NULL)
 		kmem_cache_free(slab, sk);
@@ -1399,6 +1400,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_net_set(sk, net);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
+		mem_cgroup_sk_alloc(sk);
 		cgroup_sk_alloc(&sk->sk_cgrp_data);
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
@@ -1545,6 +1547,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
+		mem_cgroup_sk_alloc(newsk);
 		cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
 		/*
@@ -1569,9 +1572,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		sk_set_socket(newsk, NULL);
 		newsk->sk_wq = NULL;
 
-		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
-			sock_update_memcg(newsk);
-
 		if (newsk->sk_prot->sockets_allocated)
 			sk_sockets_allocated_inc(newsk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a13fcb369f52..fc76ef51a5f4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -421,8 +421,6 @@ void tcp_init_sock(struct sock *sk)
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
 	local_bh_disable();
-	if (mem_cgroup_sockets_enabled)
-		sock_update_memcg(sk);
 	sk_sockets_allocated_inc(sk);
 	local_bh_enable();
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 04b989328558..b8fc74a66299 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1872,9 +1872,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	local_bh_disable();
 	sk_sockets_allocated_dec(sk);
 	local_bh_enable();
-
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
-		sock_release_memcg(sk);
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);
 
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets
From: Johannes Weiner @ 2016-09-14 19:48 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo, David S. Miller
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-1-hannes@cmpxchg.org>

From: Johannes Weiner <jweiner@fb.com>

When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/cgroup.c | 6 ++++++
 net/core/sock.c | 5 ++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0c4db7908264..b0d727d26fc7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6297,6 +6297,12 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 	if (cgroup_sk_alloc_disabled)
 		return;
 
+	/* Socket clone path */
+	if (skcd->val) {
+		cgroup_get(sock_cgroup_ptr(skcd));
+		return;
+	}
+
 	rcu_read_lock();
 
 	while (true) {
diff --git a/net/core/sock.c b/net/core/sock.c
index 51a730485649..038e660ef844 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1340,7 +1340,6 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		if (!try_module_get(prot->owner))
 			goto out_free_sec;
 		sk_tx_queue_clear(sk);
-		cgroup_sk_alloc(&sk->sk_cgrp_data);
 	}
 
 	return sk;
@@ -1400,6 +1399,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_net_set(sk, net);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
+		cgroup_sk_alloc(&sk->sk_cgrp_data);
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
 	}
@@ -1544,6 +1544,9 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_priority = 0;
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
+
+		cgroup_sk_alloc(&newsk->sk_cgrp_data);
+
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
 		 * (Documentation/RCU/rculist_nulls.txt for details)
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
From: Johannes Weiner @ 2016-09-14 19:48 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo, David S. Miller
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

From: Johannes Weiner <jweiner@fb.com>

During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.

The problem turns out to be the per-cpu charge cache. Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for
all memory that is being tracked. Those caches are safe against
scheduling preemption, but not against interrupts - such as the newly
added packet receive path. When cache draining is interrupted by
network RX taking pages out of the cache, the resuming drain operation
will put references of in-use pages, thus causing the imbalance.

Disable IRQs during all per-cpu charge cache operations.

Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a8d6624758a..60bb830abc34 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1710,17 +1710,22 @@ static DEFINE_MUTEX(percpu_charge_mutex);
 static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
+	unsigned long flags;
 	bool ret = false;
 
 	if (nr_pages > CHARGE_BATCH)
 		return ret;
 
-	stock = &get_cpu_var(memcg_stock);
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
 		stock->nr_pages -= nr_pages;
 		ret = true;
 	}
-	put_cpu_var(memcg_stock);
+
+	local_irq_restore(flags);
+
 	return ret;
 }
 
@@ -1741,15 +1746,18 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 	stock->cached = NULL;
 }
 
-/*
- * This must be called under preempt disabled or must be called by
- * a thread which is pinned to local cpu.
- */
 static void drain_local_stock(struct work_struct *dummy)
 {
-	struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);
+	struct memcg_stock_pcp *stock;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
+
+	local_irq_restore(flags);
 }
 
 /*
@@ -1758,14 +1766,19 @@ static void drain_local_stock(struct work_struct *dummy)
  */
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
+	struct memcg_stock_pcp *stock;
+	unsigned long flags;
+
+	local_irq_save(flags);
 
+	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached != memcg) { /* reset if necessary */
 		drain_stock(stock);
 		stock->cached = memcg;
 	}
 	stock->nr_pages += nr_pages;
-	put_cpu_var(memcg_stock);
+
+	local_irq_restore(flags);
 }
 
 /*
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox