Linux Documentation
* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: TSUKADA Koutaro @ 2018-05-24  4:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Johannes Weiner, Vladimir Davydov, Jonathan Corbet,
	Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
	David Rientjes, Aneesh Kumar K.V, Naoya Horiguchi,
	Anshuman Khandual, Marc-Andre Lureau, Punit Agrawal, Dan Williams,
	Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
	cgroups
In-Reply-To: <20180522185407.GC20441@dhcp22.suse.cz>

On 2018/05/23 3:54, Michal Hocko wrote:
> On Tue 22-05-18 22:04:23, TSUKADA Koutaro wrote:
>> On 2018/05/22 3:07, Mike Kravetz wrote:
>>> On 05/17/2018 09:27 PM, TSUKADA Koutaro wrote:
>>>> Thanks to Mike Kravetz for comment on the previous version patch.
>>>>
>>>> The purpose of this patch-set is to make it possible to control whether or
>>>> not to charge surplus hugetlb pages obtained by overcommitting to memory
>>>> cgroup. In the future, I am trying to accomplish limiting the memory usage
>>>> of applications that use both normal pages and hugetlb pages by the memory
>>>> cgroup(not use the hugetlb cgroup).
>>>>
>>>> Applications that use shared libraries like libhugetlbfs.so use both normal
>>>> pages and hugetlb pages, but we do not know how much to use each. Please
>>>> suppose you want to manage the memory usage of such applications by cgroup
>>>> How do you set the memory cgroup and hugetlb cgroup limit when you want to
>>>> limit memory usage to 10GB?
>>>>
>>>> If you set a limit of 10GB for each, the user can use a total of 20GB of
>>>> memory and can not limit it well. Since it is difficult to estimate the
>>>> ratio used by user of normal pages and hugetlb pages, setting limits of 2GB
>>>> to memory cgroup and 8GB to hugetlb cgroup is not very good idea. In such a
>>>> case, I thought that by using my patch-set, we could manage resources just
>>>> by setting 10GB as the limit of memory cgroup (there is no limit to hugetlb
>>>> cgroup).
>>>>
>>>> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
>>>> struct hstate. If it is true, it charges to the memory cgroup to which the
>>>> task that obtained surplus hugepages belongs. If it is false, do nothing as
>>>> before, and the default value is false. The charge_surplus_huge_pages can
>>>> be controlled procfs or sysfs interfaces.
>>>>
>>>> Since THP is very effective in environments with kernel page size of 4KB,
>>>> such as x86, there is no reason to positively use HugeTLBfs, so I think
>>>> that there is no situation to enable charge_surplus_huge_pages. However, in
>>>> some distributions such as arm64, the page size of the kernel is 64KB, and
>>>> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
>>>> may support multiple huge page sizes, and in such a special environment
>>>> there is a desire to use HugeTLBfs.
>>>
>>> One of the basic questions/concerns I have is accounting for surplus huge
>>> pages in the default memory resource controller.  The existing hugetlb
>>> resource controller already takes hugetlbfs huge pages into account,
>>> including surplus pages.  This series would allow surplus pages to be
>>> accounted for in the default  memory controller, or the hugetlb controller
>>> or both.
>>>
>>> I understand that current mechanisms do not meet the needs of the above
>>> use case.  The question is whether this is an appropriate way to approach
>>> the issue.
> 
> I do share your view Mike!
> 
>>> My cgroup experience and knowledge is extremely limited, but
>>> it does not appear that any other resource can be controlled by multiple
>>> controllers.  Therefore, I am concerned that this may be going against
>>> basic cgroup design philosophy.
>>
>> Thank you for your feedback.
>> That makes sense, surplus hugepages are charged to both memcg and hugetlb
>> cgroup, which may be contrary to cgroup design philosophy.
>>
>> Based on the above advice, I have considered the following improvements,
>> what do you think about?
>>
>> The 'charge_surplus_hugepages' of v2 patch-set was an option to switch
>> "whether to charge memcg in addition to hugetlb cgroup", but it will be
>> abolished. Instead, change to "switch only to memcg instead of hugetlb
>> cgroup" option. This is called 'surplus_charge_to_memcg'.
> 
> This all looks so hackish and ad-hoc that I would be tempted to give it
> an outright nack, but let's hear more about why do we need this fiddling
> at all. I've asked in other email so I guess I will get an answer there
> but let me just emphasize again that I absolutely detest a possibility
> to put hugetlb pages into the memcg mix. They just do not belong there.
> Try to look at previous discussions why it has been decided to have a
> separate hugetlb pages at all.
> 
> I am also quite confused why you keep distinguishing surplus hugetlb
> pages from regular preallocated ones. Being a surplus page is an
> implementation detail that we use for an internal accounting rather than
> something to exhibit to the userspace even more than we do currently.

I apologize for the confusion.

The hugetlb pages obtained from the pool do not waste the buddy pool. On
the other hand, surplus hugetlb pages waste the buddy pool. Due to this
difference in property, I thought it could be distinguished.
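
The distinction drawn above can be sketched as a toy model. This is only an
illustration under stated assumptions: struct toy_hstate and
toy_alloc_huge_page() are hypothetical names, not the kernel's struct hstate
or its allocation path. The model takes a page from the persistent pool while
one is free, and otherwise falls back to the buddy allocator as long as the
overcommit cap has not been reached.

```c
#include <assert.h>

/* Hypothetical, simplified model of one huge page size; not the kernel's hstate. */
struct toy_hstate {
	unsigned long free_pool_pages;	/* preallocated (persistent) pages still free */
	unsigned long surplus_pages;	/* pages currently taken from the buddy allocator */
	unsigned long nr_overcommit;	/* cap on surplus pages */
};

enum toy_source { TOY_FAIL, TOY_FROM_POOL, TOY_FROM_BUDDY };

/* Allocate one huge page: use the persistent pool first, then overcommit. */
enum toy_source toy_alloc_huge_page(struct toy_hstate *h)
{
	assert(h);

	if (h->free_pool_pages > 0) {
		h->free_pool_pages--;	/* no new memory leaves the buddy allocator */
		return TOY_FROM_POOL;
	}
	if (h->surplus_pages < h->nr_overcommit) {
		h->surplus_pages++;	/* this page is newly taken from the buddy allocator */
		return TOY_FROM_BUDDY;
	}
	return TOY_FAIL;		/* pool empty and overcommit cap reached */
}
```

With one free pool page and nr_overcommit set to 1, the first call is served
from the pool, the second from the buddy allocator, and the third fails.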

Although my memcg knowledge is extremely limited, memcg accounts for
various kinds of pages obtained from the buddy pool by the tasks belonging
to it. I would like to argue that a surplus hugepage is special in that it
is obtained from the buddy pool, and that it therefore meets the
requirements for being charged to memcg.

It may seem very strange to charge a hugetlb page to memcg, but essentially
this only charges the usage of the compound page obtained from the buddy pool;
even if that page is used as a hugetlb page afterwards, memcg is not
interested in that.

I apologize if my way of thinking is wrong. It would be
greatly appreciated if you could explain why we can not charge surplus
hugepages to memcg.

> Just look at what [sw]hould when you need to adjust accounting - e.g.
> due to the pool resize. Are you going to uncharge those surplus pages
> from memcg to reflect their persistence?
> 

I could not understand the intention of this question, sorry. When the
pool is resized, I think that the number of surplus hugepages in use does
not change. Could you explain what you were concerned about?

-- 
Thanks,
Tsukada


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: TSUKADA Koutaro @ 2018-05-24  4:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Jonathan Corbet,
	Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
	David Rientjes, Mike Kravetz, Aneesh Kumar K.V, Naoya Horiguchi,
	Anshuman Khandual, Marc-Andre Lureau, Punit Agrawal, Dan Williams,
	Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
	cgroups
In-Reply-To: <20180522135148.GA20441@dhcp22.suse.cz>

On 2018/05/22 22:51, Michal Hocko wrote:
> On Fri 18-05-18 13:27:27, TSUKADA Koutaro wrote:
>> The purpose of this patch-set is to make it possible to control whether or
>> not to charge surplus hugetlb pages obtained by overcommitting to memory
>> cgroup. In the future, I am trying to accomplish limiting the memory usage
>> of applications that use both normal pages and hugetlb pages by the memory
>> cgroup(not use the hugetlb cgroup).
> 
> There was a deliberate decision to keep hugetlb and "normal" memory
> cgroup controllers separate. Mostly because hugetlb memory is an
> artificial memory subsystem on its own and it doesn't fit into the rest
> of memcg accounted memory very well. I believe we want to keep that
> status quo.
> 
>> Applications that use shared libraries like libhugetlbfs.so use both normal
>> pages and hugetlb pages, but we do not know how much to use each. Please
>> suppose you want to manage the memory usage of such applications by cgroup
>> How do you set the memory cgroup and hugetlb cgroup limit when you want to
>> limit memory usage to 10GB?
> 
> Well such a usecase requires an explicit configuration already. Either
> by using special wrappers or modifying the code. So I would argue that
> you have quite a good knowledge of the setup. If you need a greater
> flexibility then just do not use hugetlb at all and rely on THP.
> [...]
> 
>> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
>> struct hstate. If it is true, it charges to the memory cgroup to which the
>> task that obtained surplus hugepages belongs. If it is false, do nothing as
>> before, and the default value is false. The charge_surplus_huge_pages can
>> be controlled procfs or sysfs interfaces.
> 
> I do not really think this is a good idea. We really do not want to make
> the current hugetlb code more complex than it is already. The current
> hugetlb cgroup controller is simple and works at least somehow. I would
> not add more on top unless there is a _really_ strong usecase behind.
> Please make sure to describe such a usecase in details before we even
> start considering the code.

Thank you for your time.

I do not know if it is really a strong use case, but I will explain my
motive in detail. English is not my native language, so please pardon
my poor English.

I am one of the developers of software that manages the resources used
by user jobs on a Linux HPC cluster. The main resource is memory.
The HPC cluster may be shared by multiple users. Therefore, the
memory used by each user must be strictly controlled; otherwise a
user's runaway job will not only hamper the other users, it may also
crash the entire system through OOM.

Some HPC users are very sensitive to performance. Jobs are executed
while synchronizing through MPI communication across multiple compute nodes.
Since CPU wait time occurs when synchronizing, they want to minimize
the variation in execution time between nodes to reduce waiting times as
much as possible. We call this variation noise.

THP does not guarantee that huge pages are used; it may fall back to
normal pages. This mechanism is one cause of variation (noise).

Users who know this mechanism will be hesitant to use THP. However,
they also know the TLB hit rate benefit of huge pages, so huge pages
remain attractive to them. It seems natural that these users are
interested in HugeTLBfs, although I do not know whether it is the right
approach or not.

At the very least, our HPC system is pursuing high versatility and we
have to consider whether we can provide it if users want to use HugeTLBfs.

In order to use HugeTLBfs we need to create a persistent pool, but in
our use case, where nodes are shared, it would be impossible to create,
delete or resize the pool.

One of the answers I have reached is to use HugeTLBfs by overcommitting
without creating a pool (this is the surplus hugepage).

Surplus hugepages are hugetlb pages, but I think that consuming the
buddy pool is a decisive difference from hugetlb pages of the persistent
pool. If nr_overcommit_hugepages is assumed to be unlimited, allocation of
surplus hugepages from the buddy pool is unlimited as well, even if the
task is limited by memcg. In extreme cases, overcommitment will allow users
to exhaust the entire memory of the system. Of course, this can be
prevented by the hugetlb cgroup, but even if we set limits for memcg and
the hugetlb cgroup separately, as I asked in the first mail (a 10GB
limit), the control will not work.
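
The 10GB example can be written out as arithmetic. The sketch below is only
an illustration; all names in it (TOY_GIB, toy_worst_case_split(),
toy_fits_split(), toy_fits_single()) are hypothetical and do not correspond
to any kernel or cgroup interface. It compares two independent limits with a
single combined limit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_GIB (1024ULL * 1024ULL * 1024ULL)

/* Worst case with two independent limits: a job may fill both of them. */
unsigned long long toy_worst_case_split(unsigned long long memcg_limit,
					unsigned long long hugetlb_limit)
{
	assert(memcg_limit <= ULLONG_MAX - hugetlb_limit);	/* no overflow */
	return memcg_limit + hugetlb_limit;
}

/* Does a job fit when normal and huge pages are limited separately? */
bool toy_fits_split(unsigned long long normal, unsigned long long huge,
		    unsigned long long memcg_limit,
		    unsigned long long hugetlb_limit)
{
	return normal <= memcg_limit && huge <= hugetlb_limit;
}

/* Does a job fit when both kinds of pages count against one limit? */
bool toy_fits_single(unsigned long long normal, unsigned long long huge,
		     unsigned long long total_limit)
{
	assert(normal <= ULLONG_MAX - huge);			/* no overflow */
	return normal + huge <= total_limit;
}
```

Two 10GB limits allow 20GB in the worst case, while a 2GB/8GB split rejects
a job that uses 3GB of normal pages and 5GB of huge pages even though its
total of 8GB is below 10GB. A single 10GB limit accepts that job and still
rejects anything above 10GB in total.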

I thought I could charge surplus hugepages to memcg, but maybe I did not
have enough knowledge about memcg. I will give the details in my reply to
the other mail.

>> Since THP is very effective in environments with kernel page size of 4KB,
>> such as x86, there is no reason to positively use HugeTLBfs, so I think
>> that there is no situation to enable charge_surplus_huge_pages. However, in
>> some distributions such as arm64, the page size of the kernel is 64KB, and
>> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
>> may support multiple huge page sizes, and in such a special environment
>> there is a desire to use HugeTLBfs.
> 
> Well, then I would argue that you shouldn't use 64kB pages for your
> setup or allow THP for smaller sizes. Really hugetlb pages are by no
> means a substitute here.
> 

Actually, I am opposed to the 64KB page size myself, but I expect that a
proposal to change the page size would be rejected.

-- 
Thanks,
Tsukada


* [PATCH RESEND] Documentation: filesystems: update filesystem locking documentation
From: Sean Anderson @ 2018-05-24  2:29 UTC (permalink / raw)
  To: corbet, linux-doc; +Cc: linux-fsdevel, Al Viro, Matthew Wilcox, Jeff Layton


[-- Attachment #1.1: Type: text/plain, Size: 3557 bytes --]

Documentation/filesystems/Locking no longer reflects current locking
semantics. i_mutex is no longer used for locking, and has been superseded
by i_rwsem. Additionally, ->iterate_shared() was not documented.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
---
 Documentation/filesystems/Locking | 43 +++++++++++++++++--------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75d2d57e2c44..15853d522941 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -69,31 +69,31 @@ prototypes:
 
 locking rules:
 	all may block
-		i_mutex(inode)
-lookup:		yes
-create:		yes
-link:		yes (both)
-mknod:		yes
-symlink:	yes
-mkdir:		yes
-unlink:		yes (both)
-rmdir:		yes (both)	(see below)
-rename:	yes (all)	(see below)
+		i_rwsem(inode)
+lookup:		shared
+create:		exclusive
+link:		exclusive (both)
+mknod:		exclusive
+symlink:	exclusive
+mkdir:		exclusive
+unlink:		exclusive (both)
+rmdir:		exclusive (both)(see below)
+rename:		exclusive (all)	(see below)
 readlink:	no
 get_link:	no
-setattr:	yes
+setattr:	exclusive
 permission:	no (may not block if called in rcu-walk mode)
 get_acl:	no
 getattr:	no
 listxattr:	no
 fiemap:		no
 update_time:	no
-atomic_open:	yes
+atomic_open:	exclusive
 tmpfile:	no
 
 
-	Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_mutex on
-victim.
+	Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
+	exclusive on victim.
 	cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
 
 See Documentation/filesystems/directory-locking for more detailed discussion
@@ -111,10 +111,10 @@ prototypes:
 
 locking rules:
 	all may block
-		i_mutex(inode)
+		i_rwsem(inode)
 list:		no
 get:		no
-set:		yes
+set:		exclusive
 
 --------------------------- super_operations ---------------------------
 prototypes:
@@ -217,14 +217,14 @@ prototypes:
 locking rules:
 	All except set_page_dirty and freepage may block
 
-			PageLocked(page)	i_mutex
+			PageLocked(page)	i_rwsem
 writepage:		yes, unlocks (see below)
 readpage:		yes, unlocks
 writepages:
 set_page_dirty		no
 readpages:
-write_begin:		locks the page		yes
-write_end:		yes, unlocks		yes
+write_begin:		locks the page		exclusive
+write_end:		yes, unlocks		exclusive
 bmap:
 invalidatepage:		yes
 releasepage:		yes
@@ -439,6 +439,7 @@ prototypes:
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
+	int (*iterate_shared) (struct file *, struct dir_context *);
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
@@ -480,6 +481,10 @@ mutex or just to use i_size_read() instead.
 Note: this does not protect the file->f_pos against concurrent modifications
 since this is something the userspace has to take care about.
 
+->iterate() is called with i_rwsem exclusive.
+
+->iterate_shared() is called with i_rwsem at least shared.
+
 ->fasync() is responsible for maintaining the FASYNC bit in filp->f_flags.
 Most instances call fasync_helper(), which does that maintenance, so it's
 not normally something one needs to worry about.  Return values > 0 will be
-- 
2.17.0


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 618 bytes --]
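
The "shared" and "exclusive" entries in the tables above can be illustrated
with a toy reader/writer lock. This is a single-threaded sketch for
illustration only: struct toy_rwsem and the toy_* helpers are hypothetical
and are not the kernel's rw_semaphore implementation. It shows only the
rule that several shared holders may coexist while an exclusive holder
excludes everyone else.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reader/writer semaphore (hypothetical, not thread-safe). */
struct toy_rwsem {
	unsigned int readers;	/* number of shared holders */
	bool writer;		/* true while held exclusively */
};

/* Shared acquisition succeeds unless an exclusive holder exists. */
bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

/* Exclusive acquisition succeeds only when nobody holds the lock. */
bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

void toy_up_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}
```

In terms of the table, two concurrent ->lookup() calls on the same directory
correspond to two successful shared acquisitions, while ->create() must wait
until every shared holder has released the lock.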


* Re: [PATCH v6 4/5] arm64: dts: sdm845: Add serial console support
From: Rajendra Nayak @ 2018-05-24  2:23 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Karthikeyan Ramasubramanian, Jonathan Corbet, Andy Gross,
	David Brown, Rob Herring, Mark Rutland, Wolfram Sang, linux-doc,
	linux-arm-msm, devicetree, linux-i2c, Evan Green, acourbot,
	Stephen Boyd, Bjorn Andersson
In-Reply-To: <CAD=FV=WwVjMKDoL29fqdzSmi5SRQOxk_mV2=7br8HUK_hUc+XQ@mail.gmail.com>


On 05/23/2018 08:43 PM, Doug Anderson wrote:
> Rajendra,
> 
> On Tue, May 22, 2018 at 11:30 PM, Rajendra Nayak <rnayak@codeaurora.org> wrote:
>>
>>
>> On 03/30/2018 10:38 PM, Karthikeyan Ramasubramanian wrote:
>>> From: Rajendra Nayak <rnayak@codeaurora.org>
>>>
>>> Add the qup uart node and geni se instance needed to
>>> support the serial console on the MTP.
>>>
>>> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
>>> Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
>>> ---
>>
>> Andy, is it possible to pull this one in for 4.18?
>> Sorry, I only realized we somehow missed this after looking at your pull request.
>>
>> This is the only patch that prevents linux-next from booting up my sdm845 MTP
>> to a minimal console shell.
> 
> It was in Andy's tree but then got dropped.  Unfortunately the clock
> bindings didn't land early enough so it's a bit difficult to land any
> device tree changes that use the clock bindings until the next kernel
> revision...

ah, okay, did not realize that. Thanks for clarifying.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

* Re: [PATCH] docs: update kernel versions and dates in tables
From: Jonathan Corbet @ 2018-05-23 22:26 UTC (permalink / raw)
  To: Tim Bird; +Cc: tim.bird, linux-doc, linux-kernel
In-Reply-To: <1527114014-26240-1-git-send-email-tim.bird@sony.com>

On Wed, 23 May 2018 15:20:14 -0700
Tim Bird <tbird20d@gmail.com> wrote:

> Every once in a while, we should update the examples
> to reflect more recent kernel versions.
> 
> Update the tables describing kernel releases, the merge window,
> and current longterm maintained kernel, from 2.6-era kernels
> to 4.x.

I dunno...it's only been since 2011...aren't you being a little hasty?

:)

Applied, thanks.

jon

* [PATCH] docs: update kernel versions and dates in tables
From: Tim Bird @ 2018-05-23 22:20 UTC (permalink / raw)
  To: corbet; +Cc: tim.bird, linux-doc, linux-kernel

Every once in a while, we should update the examples
to reflect more recent kernel versions.

Update the tables describing kernel releases, the merge window,
and current longterm maintained kernel, from 2.6-era kernels
to 4.x.

Signed-off-by: Tim Bird <tim.bird@sony.com>
---
 Documentation/process/2.Process.rst | 72 +++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 34 deletions(-)

diff --git a/Documentation/process/2.Process.rst b/Documentation/process/2.Process.rst
index ce5561b..a9c46dd 100644
--- a/Documentation/process/2.Process.rst
+++ b/Documentation/process/2.Process.rst
@@ -18,17 +18,17 @@ major kernel release happening every two or three months.  The recent
 release history looks like this:
 
 	======  =================
-	2.6.38	March 14, 2011
-	2.6.37	January 4, 2011
-	2.6.36	October 20, 2010
-	2.6.35	August 1, 2010
-	2.6.34	May 15, 2010
-	2.6.33	February 24, 2010
+	4.11	April 30, 2017
+	4.12	July 2, 2017
+	4.13	September 3, 2017
+	4.14	November 12, 2017
+	4.15	January 28, 2018
+	4.16	April 1, 2018
 	======  =================
 
-Every 2.6.x release is a major kernel release with new features, internal
-API changes, and more.  A typical 2.6 release can contain nearly 10,000
-changesets with changes to several hundred thousand lines of code.  2.6 is
+Every 4.x release is a major kernel release with new features, internal
+API changes, and more.  A typical 4.x release contains about 13,000
+changesets with changes to several hundred thousand lines of code.  4.x is
 thus the leading edge of Linux kernel development; the kernel uses a
 rolling development model which is continually integrating major changes.
 
@@ -70,20 +70,19 @@ will get up to somewhere between -rc6 and -rc9 before the kernel is
 considered to be sufficiently stable and the final 2.6.x release is made.
 At that point the whole process starts over again.
 
-As an example, here is how the 2.6.38 development cycle went (all dates in
-2011):
+As an example, here is how the 4.16 development cycle went (all dates in
+2018):
 
 	==============  ===============================
-	January 4	2.6.37 stable release
-	January 18	2.6.38-rc1, merge window closes
-	January 21	2.6.38-rc2
-	February 1	2.6.38-rc3
-	February 7	2.6.38-rc4
-	February 15	2.6.38-rc5
-	February 21	2.6.38-rc6
-	March 1		2.6.38-rc7
-	March 7		2.6.38-rc8
-	March 14	2.6.38 stable release
+	January 28	4.15 stable release
+	February 11	4.16-rc1, merge window closes
+	February 18	4.16-rc2
+	February 25	4.16-rc3
+	March 4		4.16-rc4
+	March 11	4.16-rc5
+	March 18	4.16-rc6
+	March 25	4.16-rc7
+	April 1		4.16 stable release
 	==============  ===============================
 
 How do the developers decide when to close the development cycle and create
@@ -99,37 +98,42 @@ release is made.  In the real world, this kind of perfection is hard to
 achieve; there are just too many variables in a project of this size.
 There comes a point where delaying the final release just makes the problem
 worse; the pile of changes waiting for the next merge window will grow
-larger, creating even more regressions the next time around.  So most 2.6.x
+larger, creating even more regressions the next time around.  So most 4.x
 kernels go out with a handful of known regressions though, hopefully, none
 of them are serious.
 
 Once a stable release is made, its ongoing maintenance is passed off to the
 "stable team," currently consisting of Greg Kroah-Hartman.  The stable team
-will release occasional updates to the stable release using the 2.6.x.y
+will release occasional updates to the stable release using the 4.x.y
 numbering scheme.  To be considered for an update release, a patch must (1)
 fix a significant bug, and (2) already be merged into the mainline for the
 next development kernel.  Kernels will typically receive stable updates for
 a little more than one development cycle past their initial release.  So,
-for example, the 2.6.36 kernel's history looked like:
+for example, the 4.13 kernel's history looked like:
 
 	==============  ===============================
-	October 10	2.6.36 stable release
-	November 22	2.6.36.1
-	December 9	2.6.36.2
-	January 7	2.6.36.3
-	February 17	2.6.36.4
+	September 3 	4.13 stable release
+	September 13	4.13.1
+	September 20	4.13.2
+	September 27	4.13.3
+	October 5	4.13.4
+	October 12  	4.13.5
+	...		...
+	November 24	4.13.16
 	==============  ===============================
 
-2.6.36.4 was the final stable update for the 2.6.36 release.
+4.13.16 was the final stable update of the 4.13 release.
 
 Some kernels are designated "long term" kernels; they will receive support
 for a longer period.  As of this writing, the current long term kernels
 and their maintainers are:
 
-	======  ======================  ===========================
-	2.6.27	Willy Tarreau		(Deep-frozen stable kernel)
-	2.6.32	Greg Kroah-Hartman
-	2.6.35	Andi Kleen		(Embedded flag kernel)
+	======  ======================  ==============================
+	3.16	Ben Hutchings		(very long-term stable kernel)
+	4.1	Sasha Levin
+	4.4	Greg Kroah-Hartman	(very long-term stable kernel)
+	4.9	Greg Kroah-Hartman
+	4.14	Greg Kroah-Hartman
 	======  ======================  ===========================
 
 The selection of a kernel for long-term support is purely a matter of a
-- 
2.1.4


* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Alexei Starovoitov @ 2018-05-23 22:02 UTC (permalink / raw)
  To: Eugene Syromiatnikov
  Cc: netdev, linux-kernel, linux-doc, Kees Cook, Kai-Heng Feng,
	Daniel Borkmann, Alexei Starovoitov, Jonathan Corbet, Jiri Olsa,
	Jesper Dangaard Brouer
In-Reply-To: <20180523121806.GA27675@asgard.redhat.com>

On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> Some BPF sysctl knobs affect the loading of BPF programs, and during
> system boot/init stages these sysctls are not yet configured.
> A concrete example is systemd, that has implemented loading of BPF
> programs.
> 
> Thus, to allow controlling these setting at early boot, this patch set
> adds the ability to change the default setting of these sysctl knobs
> as well as option to override them via a boot-time kernel parameter
> (in order to avoid rebuilding kernel each time a need of changing these
> defaults arises).
> 
> The sysctl knobs in question are kernel.unprivileged_bpf_disabled,
> net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.

- systemd is root. today it only uses cgroup-bpf progs which require root,
  so disabling unpriv during boot time makes no difference to systemd.
  what is the actual reason to present time?

- say in the future systemd wants to use so_reuseport+bpf for faster
  networking. With unpriv disable during boot, it will force systemd
  to do such networking from root, which will lower its security barrier.
  How does that make sense?

- bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
  Flipping it during the boot or right after or any time after
  is the same thing. Why add such boot flag then?

- jit_harden can be turned on by systemd. so turning it during the boot
  will make systemd progs to be constant blinded.
  Constant blinding protects kernel from unprivileged JIT spraying.
  Are you worried that systemd will attack the kernel with JIT spraying?


* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Waiman Long @ 2018-05-23 20:18 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <20180523173453.GY30654@e110439-lin>

On 05/23/2018 01:34 PM, Patrick Bellasi wrote:
> Hi Waiman,
>
> On 17-May 16:55, Waiman Long wrote:
>
> [...]
>
>> @@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>  	int ndoms = 0;		/* number of sched domains in result */
>>  	int nslot;		/* next empty doms[] struct cpumask slot */
>>  	struct cgroup_subsys_state *pos_css;
>> +	bool root_load_balance = is_sched_load_balance(&top_cpuset);
>>  
>>  	doms = NULL;
>>  	dattr = NULL;
>>  	csa = NULL;
>>  
>>  	/* Special case for the 99% of systems with one, full, sched domain */
>> -	if (is_sched_load_balance(&top_cpuset)) {
>> +	if (root_load_balance && !top_cpuset.isolation_count) {
> Perhaps I'm missing something but, it seems to me that, when the two
> conditions above are true, then we are going to destroy and rebuild
> the exact same scheduling domains.
>
> IOW, on 99% of systems where:
>
>    is_sched_load_balance(&top_cpuset)
>    top_cpuset.isolation_count = 0
>
> since boot time and forever, then every time we update a value for
> cpuset.cpus we keep rebuilding the same SDs.
>
> It's not strictly related to this patch, the same already happens in
> mainline based just on the first condition, but since you are extending
> that optimization, perhaps you can tell me where I'm possibly wrong or
> which cases I'm not considering.
>
> I'm interested mainly because on Android systems those conditions
> are always true and we see SDs rebuilds every time we write
> something in cpuset.cpus, which ultimately accounts for almost all the
> 6-7[ms] time required for the write to return, depending on the CPU
> frequency.
>
> Cheers Patrick
>
Yes, that is true. I will look into how to further optimize this. Thanks
for the suggestion.

-Longman
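
One possible shape for such an optimization, sketched as user-space C under
stated assumptions: struct toy_partition, toy_partition_equal() and
toy_maybe_rebuild() are hypothetical names, and plain unsigned long bitmasks
stand in for cpumasks. This is not a description of what the kernel does or
of what a later patch did; it only shows the idea of comparing the newly
generated domains with the current ones and skipping the rebuild when they
are identical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_DOMS 8

/* Hypothetical snapshot of a sched-domain partition: one CPU bitmask per domain. */
struct toy_partition {
	int ndoms;
	unsigned long masks[TOY_MAX_DOMS];
};

/* True when both partitions describe the same domains in the same order. */
bool toy_partition_equal(const struct toy_partition *a,
			 const struct toy_partition *b)
{
	if (a->ndoms != b->ndoms)
		return false;
	return memcmp(a->masks, b->masks,
		      (size_t)a->ndoms * sizeof(a->masks[0])) == 0;
}

/* Install next as the current partition only if it differs.
 * Returns true when a rebuild was actually performed. */
bool toy_maybe_rebuild(struct toy_partition *cur,
		       const struct toy_partition *next)
{
	assert(next->ndoms >= 0 && next->ndoms <= TOY_MAX_DOMS);

	if (toy_partition_equal(cur, next))
		return false;	/* nothing changed: skip the expensive rebuild */
	*cur = *next;		/* stands in for destroying and rebuilding the domains */
	return true;
}
```

With a single full domain as the current state, writing the same CPU set
again returns false without touching anything, while splitting it into two
domains returns true once and false on every identical request after that.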


* Re: [PATCH RFC V2 2/6] hwmon: Add support for RPi voltage sensor
From: Guenter Roeck @ 2018-05-23 18:12 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Stefan Wahren, Rob Herring, Eric Anholt, Mark Rutland,
	Jonathan Corbet, Jean Delvare, linux-hwmon, devicetree,
	Florian Fainelli, Scott Branden, linux-doc, Ray Jui, Phil Elwell,
	Noralf Trønnes, bcm-kernel-feedback-list, linux-rpi-kernel,
	linux-arm-kernel
In-Reply-To: <cbbdab47-cde3-7a7c-c797-c00d546e00d5@arm.com>

On Wed, May 23, 2018 at 01:12:10PM +0100, Robin Murphy wrote:
> On 22/05/18 20:31, Stefan Wahren wrote:
> [...]
> >>>>>+static int rpi_hwmon_probe(struct platform_device *pdev)
> >>>>>+{
> >>>>>+	struct device *dev = &pdev->dev;
> >>>>>+	struct rpi_hwmon_data *data;
> >>>>>+	int ret;
> >>>>>+
> >>>>>+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> >>>>>+	if (!data)
> >>>>>+		return -ENOMEM;
> >>>>>+
> >>>>>+	data->fw = platform_get_drvdata(to_platform_device(dev->parent));
> >>>>>+	if (!data->fw)
> >>>>>+		return -EPROBE_DEFER;
> >>>>>+
> >>>>
> >>>>I am a bit at loss here (and sorry I didn't bring this up before).
> >>>>How would this ever be possible, given that the driver is registered
> >>>>from the firmware driver ?
> >>>
> >>>Do you refer to the (wrong) return code, the assumption that the parent must be a platform driver or a possible race?
> >>>
> >>
> >>The return code is one thing. My question was how the driver would ever be instantiated
> >>with platform_get_drvdata(to_platform_device(dev->parent)) == NULL (but dev->parent != NULL),
> >>so I referred to the race. But, sure, a second question would be how that would indicate
> >>that the parent is not instantiated yet (which by itself seems like an odd question).
> >
> >This shouldn't happen and worth a log error. In patch #3 the registration is called after the complete private data of the firmware driver is initialized. Did i missed something?
> >
> >But i must confess that i didn't test all builtin/module combinations.
> 
> The point is that, by construction, a "raspberrypi-hwmon" device will only
> ever be created for this driver to bind to if the firmware device is both
> fully initialised and known to support the GET_THROTTLED call already. Thus
> trying to check those again from the hwmon driver is at best pointless, and
> at worst misleading. If somebody *does* manage to bind this driver to some
> random inappropriate device, you've still got no guarantee that dev->parent
> is valid or that dev_get_drvdata(dev->parent) won't return something
> non-NULL that isn't a struct rpi_firmware pointer, at which point you're
> liable to pass the paranoid check yet still crash anyway.
> 
> IOW, you can't reasonably defend against incorrect operation, and under
> correct operation there's nothing to defend against, so either way it's
> pretty futile to waste effort trying.
> 

Well said.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2 4/5] PCI/AER: Add sysfs attributes for rootport cumulative stats
From: Rajat Jain @ 2018-05-23 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Jes Sorensen, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <20180523175808.28030-1-rajatja@google.com>

Add sysfs attributes for rootport statistics (that are cumulative
of all the ERR_* messages seen on this PCI hierarchy).

Signed-off-by: Rajat Jain <rajatja@google.com>
---
v2: same as v1

 drivers/pci/pcie/aer/aerdrv.h       |  2 ++
 drivers/pci/pcie/aer/aerdrv_core.c  |  2 ++
 drivers/pci/pcie/aer/aerdrv_stats.c | 31 +++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 048fbd7c9633..77d8355551d9 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -88,6 +88,8 @@ irqreturn_t aer_irq(int irq, void *context);
 int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
 void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+				 struct aer_err_source *e_src);
 
 extern const char
 *aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 42a6f913069a..0f70e22563f3 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -424,6 +424,8 @@ static void aer_isr_one_error(struct pcie_device *p_device,
 	struct aer_rpc *rpc = get_service_data(p_device);
 	struct aer_err_info *e_info = &rpc->e_info;
 
+	pci_rootport_aer_stats_incr(p_device->port, e_src);
+
 	/*
 	 * There is a possibility that both correctable error and
 	 * uncorrectable error being logged. Report correctable error first.
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index e47321b267f6..898c9bc02ec2 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -57,6 +57,9 @@ static DEVICE_ATTR_RO(field)
 aer_stats_aggregate_attr(dev_total_cor_errs);
 aer_stats_aggregate_attr(dev_total_fatal_errs);
 aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+aer_stats_aggregate_attr(rootport_total_cor_errs);
+aer_stats_aggregate_attr(rootport_total_fatal_errs);
+aer_stats_aggregate_attr(rootport_total_nonfatal_errs);
 
 #define aer_stats_breakdown_attr(field, stats_array, strings_array)	\
 	static ssize_t							\
@@ -90,6 +93,9 @@ static struct attribute *aer_stats_attrs[] __ro_after_init = {
 	&dev_attr_dev_total_nonfatal_errs.attr,
 	&dev_attr_dev_breakdown_correctable.attr,
 	&dev_attr_dev_breakdown_uncorrectable.attr,
+	&dev_attr_rootport_total_cor_errs.attr,
+	&dev_attr_rootport_total_fatal_errs.attr,
+	&dev_attr_rootport_total_nonfatal_errs.attr,
 	NULL
 };
 
@@ -102,6 +108,12 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
 	if (!pdev->aer_stats)
 		return 0;
 
+	if ((a == &dev_attr_rootport_total_cor_errs.attr ||
+	     a == &dev_attr_rootport_total_fatal_errs.attr ||
+	     a == &dev_attr_rootport_total_nonfatal_errs.attr) &&
+	    pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+		return 0;
+
 	return a->mode;
 }
 
@@ -144,6 +156,25 @@ void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
 			counter[i]++;
 }
 
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+				 struct aer_err_source *e_src)
+{
+	struct aer_stats *aer_stats = pdev->aer_stats;
+
+	if (!aer_stats)
+		return;
+
+	if (e_src->status & PCI_ERR_ROOT_COR_RCV)
+		aer_stats->rootport_total_cor_errs++;
+
+	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
+		if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
+			aer_stats->rootport_total_fatal_errs++;
+		else
+			aer_stats->rootport_total_nonfatal_errs++;
+	}
+}
+
 int pci_aer_stats_init(struct pci_dev *pdev)
 {
 	pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related
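
The classification done by pci_rootport_aer_stats_incr() in the patch above
can be tried out in userspace. The following is only a minimal sketch, not
the kernel code: struct rootport_counters and rootport_count() are
hypothetical names, and the bit values are assumed to match the Root Error
Status definitions in include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Root Error Status bits; values assumed to mirror pci_regs.h. */
#define PCI_ERR_ROOT_COR_RCV	0x00000001	/* ERR_COR received */
#define PCI_ERR_ROOT_UNCOR_RCV	0x00000004	/* ERR_FATAL/ERR_NONFATAL received */
#define PCI_ERR_ROOT_FATAL_RCV	0x00000040	/* fatal error message received */

/* Hypothetical stand-in for the rootport_* fields of struct aer_stats. */
struct rootport_counters {
	uint64_t cor;
	uint64_t fatal;
	uint64_t nonfatal;
};

/* Apply the same decision tree as the patch to one status word. */
static void rootport_count(struct rootport_counters *c, uint32_t status)
{
	assert(c);

	if (status & PCI_ERR_ROOT_COR_RCV)
		c->cor++;

	/* An uncorrectable message is counted as fatal or non-fatal, never both. */
	if (status & PCI_ERR_ROOT_UNCOR_RCV) {
		if (status & PCI_ERR_ROOT_FATAL_RCV)
			c->fatal++;
		else
			c->nonfatal++;
	}
}
```

Note that a single status word can bump both the correctable counter and
one of the uncorrectable counters, since the two checks are independent.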

* [PATCH v2 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Rajat Jain @ 2018-05-23 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Jes Sorensen, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <20180523175808.28030-1-rajatja@google.com>

Define a structure to hold the AER statistics. There are 2 groups
of statistics: dev_* counters that are collected for all AER
capable devices and rootport_* counters that are collected for
(AER capable) rootports only. Allocate this structure when the
device is added and free it when the device is released (thus the
counters persist for the lifetime of the device).

Add a new file aerdrv_stats.c to hold the AER stats collection logic.

Signed-off-by: Rajat Jain <rajatja@google.com>
---
v2: Fix the license header as per Greg's suggestions
    (Since there is disagreement with using "//" vs "/* */" for license
     I decided to keep the one preferred by Linus, also used by others
     in this directory)

 drivers/pci/pcie/aer/Makefile       |  2 +-
 drivers/pci/pcie/aer/aerdrv.h       |  6 +++
 drivers/pci/pcie/aer/aerdrv_core.c  |  9 +++++
 drivers/pci/pcie/aer/aerdrv_stats.c | 61 +++++++++++++++++++++++++++++
 drivers/pci/probe.c                 |  1 +
 include/linux/pci.h                 |  3 ++
 6 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c

diff --git a/drivers/pci/pcie/aer/Makefile b/drivers/pci/pcie/aer/Makefile
index 09bd890875a3..a06f9cc2bde5 100644
--- a/drivers/pci/pcie/aer/Makefile
+++ b/drivers/pci/pcie/aer/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_PCIEAER) += aerdriver.o
 
 obj-$(CONFIG_PCIE_ECRC)	+= ecrc.o
 
-aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o
+aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o aerdrv_stats.o
 aerdriver-$(CONFIG_ACPI) += aerdrv_acpi.o
 
 obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b4c950683cc7..d8b9fba536ed 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -33,6 +33,10 @@
 					PCI_ERR_UNC_MALF_TLP)
 
 #define AER_MAX_MULTI_ERR_DEVICES	5	/* Not likely to have more */
+
+#define AER_MAX_TYPEOF_CORRECTABLE_ERRS 16	/* as per PCI_ERR_COR_STATUS */
+#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26	/* as per PCI_ERR_UNCOR_STATUS*/
+
 struct aer_err_info {
 	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
 	int error_dev_num;
@@ -81,6 +85,8 @@ void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
 irqreturn_t aer_irq(int irq, void *context);
+int pci_aer_stats_init(struct pci_dev *pdev);
+void pci_aer_stats_exit(struct pci_dev *pdev);
 
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 36e622d35c48..42a6f913069a 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -95,9 +95,18 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 int pci_aer_init(struct pci_dev *dev)
 {
 	dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
+
+	if (!dev->aer_cap || pci_aer_stats_init(dev))
+		return -EIO;
+
 	return pci_cleanup_aer_error_status_regs(dev);
 }
 
+void pci_aer_exit(struct pci_dev *dev)
+{
+	pci_aer_stats_exit(dev);
+}
+
 /**
  * add_error_device - list device to be handled
  * @e_info: pointer to error info
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
new file mode 100644
index 000000000000..2f48d6bc81f1
--- /dev/null
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Google Inc, All Rights Reserved.
+ *
+ * Rajat Jain (rajatja@google.com)
+ *
+ * AER Statistics - exposed to userspace via sysfs attributes.
+ */
+
+#include <linux/pci.h>
+#include "aerdrv.h"
+
+/* AER stats for the device */
+struct aer_stats {
+
+	/*
+	 * Fields for all AER capable devices. They indicate the errors
+	 * "as seen by this device". Note that this may mean that if an
+	 * end point is causing problems, the AER counters may increment
+	 * at its link partner (e.g. root port) because the errors will be
+	 * "seen" by the link partner and not the problematic end point
+	 * itself (which may report all counters as 0 as it never saw any
+	 * problems).
+	 */
+	/* Individual counters for different types of correctable errors */
+	u64 dev_cor_errs[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+	/* Individual counters for different types of uncorrectable errors */
+	u64 dev_uncor_errs[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+	/* Total number of correctable errors seen by this device */
+	u64 dev_total_cor_errs;
+	/* Total number of fatal uncorrectable errors seen by this device */
+	u64 dev_total_fatal_errs;
+	/* Total number of non-fatal uncorrectable errors seen by this device */
+	u64 dev_total_nonfatal_errs;
+
+	/*
+	 * Fields for Root ports only, these indicate the total number of
+	 * ERR_COR, ERR_FATAL, and ERR_NONFATAL messages received by the
+	 * rootport, INCLUDING the ones that are generated internally (by
+	 * the rootport itself)
+	 */
+	u64 rootport_total_cor_errs;
+	u64 rootport_total_fatal_errs;
+	u64 rootport_total_nonfatal_errs;
+};
+
+int pci_aer_stats_init(struct pci_dev *pdev)
+{
+	pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
+	if (!pdev->aer_stats) {
+		dev_err(&pdev->dev, "No memory for aer_stats\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+void pci_aer_stats_exit(struct pci_dev *pdev)
+{
+	kfree(pdev->aer_stats);
+	pdev->aer_stats = NULL;
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 384020757b81..dd662c241373 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2064,6 +2064,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
 static void pci_release_capabilities(struct pci_dev *dev)
 {
+	pci_aer_exit(dev);
 	pci_vpd_release(dev);
 	pci_iov_release(dev);
 	pci_free_cap_save_buffers(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 21965e0dbe62..5c84b1304de7 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -299,6 +299,7 @@ struct pci_dev {
 	u8		hdr_type;	/* PCI header type (`multi' flag masked out) */
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
+	struct aer_stats *aer_stats;	/* AER stats for this device */
 #endif
 	u8		pcie_cap;	/* PCIe capability offset */
 	u8		msi_cap;	/* MSI capability offset */
@@ -1470,10 +1471,12 @@ static inline bool pcie_aspm_support_enabled(void) { return false; }
 void pci_no_aer(void);
 bool pci_aer_available(void);
 int pci_aer_init(struct pci_dev *dev);
+void pci_aer_exit(struct pci_dev *dev);
 #else
 static inline void pci_no_aer(void) { }
 static inline bool pci_aer_available(void) { return false; }
 static inline int pci_aer_init(struct pci_dev *d) { return -ENODEV; }
+static inline void pci_aer_exit(struct pci_dev *d) { }
 #endif
 
 #ifdef CONFIG_PCIE_ECRC
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related

* [PATCH v2 0/5] Expose PCIe AER stats via sysfs
From: Rajat Jain @ 2018-05-23 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Jes Sorensen, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <20180522222805.80314-1-rajatja@google.com>

This patchset exposes the AER stats via the sysfs attributes.

Patchset v2 has minor changes to v1 based on the review comments,
no functional change.
Primarily:
 * Fix license header
 * Use tabs instead of spaces
 * Remove use of unlikely() etc
 * Move documentation to Documentation/ABI/

Rajat Jain (5):
  PCI/AER: Define and allocate aer_stats structure for AER capable
    devices
  PCI/AER: Add sysfs stats for AER capable devices
  PCI/AER: Add sysfs attributes to provide breakdown of AERs
  PCI/AER: Add sysfs attributes for rootport cumulative stats
  Documentation/ABI: Add details of PCI AER statistics

 .../testing/sysfs-bus-pci-devices-aer_stats   | 103 ++++++++++
 Documentation/PCI/pcieaer-howto.txt           |   5 +
 drivers/pci/pci-sysfs.c                       |   3 +
 drivers/pci/pci.h                             |   4 +-
 drivers/pci/pcie/aer/Makefile                 |   2 +-
 drivers/pci/pcie/aer/aerdrv.h                 |  15 ++
 drivers/pci/pcie/aer/aerdrv_core.c            |  11 +
 drivers/pci/pcie/aer/aerdrv_errprint.c        |   7 +-
 drivers/pci/pcie/aer/aerdrv_stats.c           | 192 ++++++++++++++++++
 drivers/pci/probe.c                           |   1 +
 include/linux/pci.h                           |   3 +
 11 files changed, 342 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
 create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c

-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply

* [PATCH v2 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs
From: Rajat Jain @ 2018-05-23 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Jes Sorensen, Kyle McMartin
  Cc: rajatxjain, Rajat Jain
In-Reply-To: <20180523175808.28030-1-rajatja@google.com>

Add sysfs attributes to provide a breakdown of the AERs seen,
into different types of correctable or uncorrectable errors:

dev_breakdown_correctable
dev_breakdown_uncorrectable

Signed-off-by: Rajat Jain <rajatj@google.com>
---
v2: Use tabs instead of spaces, fix the subject, and print
    all non zero counters.

 drivers/pci/pcie/aer/aerdrv.h          |  6 ++++++
 drivers/pci/pcie/aer/aerdrv_errprint.c |  6 ++++--
 drivers/pci/pcie/aer/aerdrv_stats.c    | 28 ++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b5d5ad6f2c03..048fbd7c9633 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -89,6 +89,12 @@ int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
 void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
 
+extern const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+
+extern const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
 #else
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 5e8b98deda08..5585f309f1a8 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -68,7 +68,8 @@ static const char *aer_error_layer[] = {
 	"Transaction Layer"
 };
 
-static const char *aer_correctable_error_string[] = {
+const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS] = {
 	"Receiver Error",		/* Bit Position 0	*/
 	NULL,
 	NULL,
@@ -87,7 +88,8 @@ static const char *aer_correctable_error_string[] = {
 	"Header Log Overflow",		/* Bit Position 15	*/
 };
 
-static const char *aer_uncorrectable_error_string[] = {
+const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS] = {
 	"Undefined",			/* Bit Position 0	*/
 	NULL,
 	NULL,
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 5555beffef2b..e47321b267f6 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -58,10 +58,38 @@ aer_stats_aggregate_attr(dev_total_cor_errs);
 aer_stats_aggregate_attr(dev_total_fatal_errs);
 aer_stats_aggregate_attr(dev_total_nonfatal_errs);
 
+#define aer_stats_breakdown_attr(field, stats_array, strings_array)	\
+	static ssize_t							\
+	field##_show(struct device *dev, struct device_attribute *attr,	\
+		     char *buf)						\
+{									\
+	unsigned int i;							\
+	char *str = buf;						\
+	struct pci_dev *pdev = to_pci_dev(dev);				\
+	u64 *stats = pdev->aer_stats->stats_array;			\
+	for (i = 0; i < ARRAY_SIZE(strings_array); i++) {		\
+		if (strings_array[i])					\
+			str += sprintf(str, "%s = 0x%llx\n",		\
+				       strings_array[i], stats[i]);	\
+		else if (stats[i])					\
+			str += sprintf(str, #stats_array "_bit[%u] = 0x%llx\n",\
+				       i, stats[i]);			\
+	}								\
+	return str-buf;							\
+}									\
+static DEVICE_ATTR_RO(field)
+
+aer_stats_breakdown_attr(dev_breakdown_correctable, dev_cor_errs,
+			 aer_correctable_error_string);
+aer_stats_breakdown_attr(dev_breakdown_uncorrectable, dev_uncor_errs,
+			 aer_uncorrectable_error_string);
+
 static struct attribute *aer_stats_attrs[] __ro_after_init = {
 	&dev_attr_dev_total_cor_errs.attr,
 	&dev_attr_dev_total_fatal_errs.attr,
 	&dev_attr_dev_total_nonfatal_errs.attr,
+	&dev_attr_dev_breakdown_correctable.attr,
+	&dev_attr_dev_breakdown_uncorrectable.attr,
 	NULL
 };
 
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related
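
The output rule of the breakdown attributes in the patch above (named bits
are always printed, unnamed bits only when their counter is non-zero) can be
modelled in userspace. This is a rough sketch under assumptions, not the
kernel macro: format_breakdown() and its "label" parameter are hypothetical,
the "_bit[...]" separator is a choice made for this sketch, and snprintf()
is used with an explicit buffer length instead of sprintf() into a sysfs
page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format per-bit counters: every named bit is printed, unnamed bits only
 * when non-zero. "label" stands in for the stringified array name used by
 * the macro. Returns the number of bytes written, excluding the final NUL.
 */
static size_t format_breakdown(char *buf, size_t len, const char *label,
			       const char *const *names,
			       const uint64_t *stats, size_t n)
{
	size_t off = 0;
	size_t i;

	assert(buf && label && names && stats && len > 0);

	for (i = 0; i < n; i++) {
		int w = 0;

		if (names[i])
			w = snprintf(buf + off, len - off, "%s = 0x%llx\n",
				     names[i], (unsigned long long)stats[i]);
		else if (stats[i])
			w = snprintf(buf + off, len - off,
				     "%s_bit[%zu] = 0x%llx\n", label, i,
				     (unsigned long long)stats[i]);

		/* On error or truncation, stop after the last complete line. */
		if (w < 0 || (size_t)w >= len - off)
			break;
		off += (size_t)w;
	}
	buf[off] = '\0';
	return off;
}
```

Bounding every write by the remaining space keeps the sketch safe for
arbitrary buffers; a caller gets whole lines only, never a cut-off one.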

* [PATCH v2 5/5] Documentation/ABI: Add details of PCI AER statistics
From: Rajat Jain @ 2018-05-23 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Jes Sorensen, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <20180523175808.28030-1-rajatja@google.com>

Add the PCI AER statistics details to
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
and provide a pointer to it in
Documentation/PCI/pcieaer-howto.txt

Signed-off-by: Rajat Jain <rajatja@google.com>
---
v2: Move the documentation to Documentation/ABI/

 .../testing/sysfs-bus-pci-devices-aer_stats   | 103 ++++++++++++++++++
 Documentation/PCI/pcieaer-howto.txt           |   5 +
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
new file mode 100644
index 000000000000..f55c389290ac
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
@@ -0,0 +1,103 @@
+==========================
+PCIe Device AER statistics
+==========================
+These attributes show up under all the devices that are AER capable. These
+statistical counters indicate the errors "as seen/reported by the device".
+Note that this may mean that if an end point is causing problems, the AER
+counters may increment at its link partner (e.g. root port) because the
+errors will be "seen" / reported by the link partner and not the
+problematic end point itself (which may report all counters as 0 as it never
+saw any problems).
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/dev_total_cor_errs
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of correctable errors seen and reported by this
+		PCI device using ERR_COR.
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/dev_total_fatal_errs
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of uncorrectable fatal errors seen and reported
+		by this PCI device using ERR_FATAL.
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/dev_total_nonfatal_errs
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of uncorrectable non-fatal errors seen and reported
+		by this PCI device using ERR_NONFATAL.
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/dev_breakdown_correctable
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Breakdown of correctable errors seen and reported by this
+		PCI device using ERR_COR. A sample result looks like this:
+-----------------------------------------
+Receiver Error = 0x174
+Bad TLP = 0x19
+Bad DLLP = 0x3
+RELAY_NUM Rollover = 0x0
+Replay Timer Timeout = 0x1
+Advisory Non-Fatal = 0x0
+Corrected Internal Error = 0x0
+Header Log Overflow = 0x0
+-----------------------------------------
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/dev_breakdown_uncorrectable
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Breakdown of uncorrectable errors seen and reported by this
+		PCI device using ERR_FATAL or ERR_NONFATAL. A sample result
+		looks like this:
+-----------------------------------------
+Undefined = 0x0
+Data Link Protocol = 0x0
+Surprise Down Error = 0x0
+Poisoned TLP = 0x0
+Flow Control Protocol = 0x0
+Completion Timeout = 0x0
+Completer Abort = 0x0
+Unexpected Completion = 0x0
+Receiver Overflow = 0x0
+Malformed TLP = 0x0
+ECRC = 0x0
+Unsupported Request = 0x0
+ACS Violation = 0x0
+Uncorrectable Internal Error = 0x0
+MC Blocked TLP = 0x0
+AtomicOp Egress Blocked = 0x0
+TLP Prefix Blocked Error = 0x0
+-----------------------------------------
+
+============================
+PCIe Rootport AER statistics
+============================
+These attributes show up only under the rootports that are AER capable. These
+indicate the number of error messages as "reported to" the rootport. Please note
+that the rootports also transmit (internally) the ERR_* messages for errors seen
+by the internal rootport PCI device, so these counters include them and are
+thus cumulative of all the error messages on the PCI hierarchy originating
+at that root port.
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/rootport_total_cor_errs
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of ERR_COR messages reported to rootport.
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/rootport_total_fatal_errs
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of ERR_FATAL messages reported to rootport.
+
+Where:		/sys/bus/pci/devices/<dev>/aer_stats/rootport_total_nonfatal_errs
+Date:		May 2018
+Kernel Version: 4.17.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of ERR_NONFATAL messages reported to rootport.
diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
index acd0dddd6bb8..91b6e677cb8c 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.txt
@@ -73,6 +73,11 @@ In the example, 'Requester ID' means the ID of the device who sends
 the error message to root port. Pls. refer to pci express specs for
 other fields.
 
+2.4 AER Statistics / Counters
+
+When PCIe AER errors are captured, the counters / statistics are also exposed
+in the form of sysfs attributes which are documented at
+Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
 
 3. Developer Guide
 
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related
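
A userspace consumer of the attributes documented above has to parse two
text shapes: a single hex value for the *_total_* files and "name = value"
lines for the breakdown files. The following is a small sketch of such a
parser, assuming the formats shown in the sample output; parse_aer_total()
and parse_breakdown_line() are hypothetical helper names, not part of any
existing library.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a dev_total_* or rootport_total_* value such as "0x174\n". */
static int parse_aer_total(const char *text, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(text && out);

	errno = 0;
	v = strtoull(text, &end, 16);	/* base 16 accepts the "0x" prefix */
	if (errno || end == text)
		return -1;
	*out = v;
	return 0;
}

/*
 * Parse one breakdown line such as "Bad TLP = 0x19". The name is copied
 * into name[] (truncated to fit) and the counter is stored in *out.
 */
static int parse_breakdown_line(const char *line, char *name, size_t len,
				uint64_t *out)
{
	const char *sep;
	size_t n;

	assert(line && name && len > 0 && out);

	sep = strstr(line, " = ");
	if (!sep)
		return -1;

	n = (size_t)(sep - line);
	if (n >= len)
		n = len - 1;
	memcpy(name, line, n);
	name[n] = '\0';

	return parse_aer_total(sep + 3, out);
}
```

In practice each line would come from fgets() on a file under
/sys/bus/pci/devices/<dev>/aer_stats/; the parsers take plain strings so
they can be exercised without AER capable hardware.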

* [PATCH v2 2/5] PCI/AER: Add sysfs stats for AER capable devices
From: Rajat Jain @ 2018-05-23 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Jes Sorensen, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <20180523175808.28030-1-rajatja@google.com>

Add the following AER sysfs stats to represent the counters for each
kind of error as seen by the device:

dev_total_cor_errs
dev_total_fatal_errs
dev_total_nonfatal_errs

Signed-off-by: Rajat Jain <rajatja@google.com>
---
v2: Use tabs instead of spaces at the end of macro lines, and remove
    the use of unlikely() as per Greg's suggestion.

 drivers/pci/pci-sysfs.c                |  3 ++
 drivers/pci/pci.h                      |  4 +-
 drivers/pci/pcie/aer/aerdrv.h          |  1 +
 drivers/pci/pcie/aer/aerdrv_errprint.c |  1 +
 drivers/pci/pcie/aer/aerdrv_stats.c    | 72 ++++++++++++++++++++++++++
 5 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 366d93af051d..730f985a3dc9 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1743,6 +1743,9 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
 #endif
 	&pci_bridge_attr_group,
 	&pcie_dev_attr_group,
+#ifdef CONFIG_PCIEAER
+	&aer_stats_attr_group,
+#endif
 	NULL,
 };
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a07f3f..9a28ec600225 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -181,7 +181,9 @@ extern const struct attribute_group *pci_dev_groups[];
 extern const struct attribute_group *pcibus_groups[];
 extern const struct device_type pci_dev_type;
 extern const struct attribute_group *pci_bus_groups[];
-
+#ifdef CONFIG_PCIEAER
+extern const struct attribute_group aer_stats_attr_group;
+#endif
 
 /**
  * pci_match_one_device - Tell if a PCI device structure has a matching
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index d8b9fba536ed..b5d5ad6f2c03 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -87,6 +87,7 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
 irqreturn_t aer_irq(int irq, void *context);
 int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
 
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 21ca5e1b0ded..5e8b98deda08 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -155,6 +155,7 @@ static void __aer_print_error(struct pci_dev *dev,
 			pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
 				i, info->first_error == i ? " (First)" : "");
 	}
+	pci_dev_aer_stats_incr(dev, info);
 }
 
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 2f48d6bc81f1..5555beffef2b 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -44,6 +44,78 @@ struct aer_stats {
 	u64 rootport_total_nonfatal_errs;
 };
 
+#define aer_stats_aggregate_attr(field)					\
+	static ssize_t							\
+	field##_show(struct device *dev, struct device_attribute *attr,	\
+		     char *buf)						\
+{									\
+	struct pci_dev *pdev = to_pci_dev(dev);				\
+	return sprintf(buf, "0x%llx\n", pdev->aer_stats->field);	\
+}									\
+static DEVICE_ATTR_RO(field)
+
+aer_stats_aggregate_attr(dev_total_cor_errs);
+aer_stats_aggregate_attr(dev_total_fatal_errs);
+aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+
+static struct attribute *aer_stats_attrs[] __ro_after_init = {
+	&dev_attr_dev_total_cor_errs.attr,
+	&dev_attr_dev_total_fatal_errs.attr,
+	&dev_attr_dev_total_nonfatal_errs.attr,
+	NULL
+};
+
+static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
+					   struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pdev->aer_stats)
+		return 0;
+
+	return a->mode;
+}
+
+const struct attribute_group aer_stats_attr_group = {
+	.name  = "aer_stats",
+	.attrs  = aer_stats_attrs,
+	.is_visible = aer_stats_attrs_are_visible,
+};
+
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
+{
+	int status, i, max = -1;
+	u64 *counter = NULL;
+	struct aer_stats *aer_stats = pdev->aer_stats;
+
+	if (!aer_stats)
+		return;
+
+	switch (info->severity) {
+	case AER_CORRECTABLE:
+		aer_stats->dev_total_cor_errs++;
+		counter = &aer_stats->dev_cor_errs[0];
+		max = AER_MAX_TYPEOF_CORRECTABLE_ERRS;
+		break;
+	case AER_NONFATAL:
+		aer_stats->dev_total_nonfatal_errs++;
+		counter = &aer_stats->dev_uncor_errs[0];
+		max = AER_MAX_TYPEOF_UNCORRECTABLE_ERRS;
+		break;
+	case AER_FATAL:
+		aer_stats->dev_total_fatal_errs++;
+		counter = &aer_stats->dev_uncor_errs[0];
+		max = AER_MAX_TYPEOF_UNCORRECTABLE_ERRS;
+		break;
+	}
+
+	status = (info->status & ~info->mask);
+	for (i = 0; i < max; i++)
+		if (status & (1 << i))
+			counter[i]++;
+}
+
 int pci_aer_stats_init(struct pci_dev *pdev)
 {
 	pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related
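
The per-bit accounting at the end of pci_dev_aer_stats_incr() in the patch
above (one counter per status bit that is set and not masked) can be
sketched in userspace as follows. count_unmasked_bits() and MAX_COR_ERRS are
hypothetical names chosen for this illustration; the array size of 16 is
taken from AER_MAX_TYPEOF_CORRECTABLE_ERRS in patch 1/5.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COR_ERRS 16	/* mirrors AER_MAX_TYPEOF_CORRECTABLE_ERRS */

/* Bump one counter for each status bit that is set and not masked. */
static void count_unmasked_bits(uint64_t *counter, int max,
				uint32_t status, uint32_t mask)
{
	uint32_t unmasked = status & ~mask;
	int i;

	assert(counter && max >= 0 && max <= 32);

	for (i = 0; i < max; i++)
		if (unmasked & (UINT32_C(1) << i))
			counter[i]++;
}
```

The sketch uses an unsigned 32-bit constant for the shift so that testing
bit 31 stays well defined, which is a deliberate difference from the
"1 << i" on a signed int in the patch.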

* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Patrick Bellasi @ 2018-05-23 17:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <1526590545-3350-5-git-send-email-longman@redhat.com>

Hi Waiman,

On 17-May 16:55, Waiman Long wrote:

[...]

> @@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
>  	int ndoms = 0;		/* number of sched domains in result */
>  	int nslot;		/* next empty doms[] struct cpumask slot */
>  	struct cgroup_subsys_state *pos_css;
> +	bool root_load_balance = is_sched_load_balance(&top_cpuset);
>  
>  	doms = NULL;
>  	dattr = NULL;
>  	csa = NULL;
>  
>  	/* Special case for the 99% of systems with one, full, sched domain */
> -	if (is_sched_load_balance(&top_cpuset)) {
> +	if (root_load_balance && !top_cpuset.isolation_count) {

Perhaps I'm missing something, but it seems to me that when the two
conditions above are true, we are going to destroy and rebuild
the exact same scheduling domains.

IOW, on 99% of systems where:

   is_sched_load_balance(&top_cpuset)
   top_cpuset.isolation_count == 0

hold since boot time and forever, every time we update a value for
cpuset.cpus we keep rebuilding the same SDs.

It's not strictly related to this patch, the same already happens in
mainline based just on the first condition, but since you are extending
that optimization, perhaps you can tell me where I'm possibly wrong or
which cases I'm not considering.

I'm interested mainly because on Android systems those conditions
are always true and we see SD rebuilds every time we write
something to cpuset.cpus, which ultimately accounts for almost all of
the 6-7 ms required for the write to return, depending on the CPU
frequency.
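
To make the question concrete, here is a minimal userspace sketch of the kind of short-circuit being asked about: compare the new domain layout with the current one and skip the rebuild when they are identical. It is not the cpuset or scheduler code; struct dom_set, dom_sets_equal() and maybe_rebuild() are hypothetical names, and each domain is modelled as a plain 64-bit CPU mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a set of sched domains: one CPU bitmask each. */
struct dom_set {
	size_t ndoms;
	const uint64_t *masks;
};

/* True when both sets describe exactly the same domains, in the same order. */
bool dom_sets_equal(const struct dom_set *a, const struct dom_set *b)
{
	if (a->ndoms != b->ndoms)
		return false;
	if (a->ndoms == 0)
		return true;
	return memcmp(a->masks, b->masks, a->ndoms * sizeof(*a->masks)) == 0;
}

/*
 * Rebuild only when the new layout differs from the current one.
 * Returns true if a rebuild was (notionally) performed.
 */
bool maybe_rebuild(struct dom_set *cur, const struct dom_set *next,
		   unsigned int *rebuild_count)
{
	if (dom_sets_equal(cur, next))
		return false;	/* identical layout: nothing to do */
	*cur = *next;
	(*rebuild_count)++;
	return true;
}
```

With a single full domain before and after the write, maybe_rebuild() returns false and the counter stays untouched, which is the behaviour one would hope for in the common case described above.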

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

* Re: [RFC PATCH 0/6] net: ethernet: ti: cpsw: add MQPRIO and CBS Qdisc offload
From: Grygorii Strashko @ 2018-05-23 15:43 UTC (permalink / raw)
  To: Ivan Khoronzhuk, davem
  Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
	vinicius.gomes, henrik, jesus.sanchez-palencia, Sekhar Nori,
	Yogesh Siraswar, Schuyler Patton
In-Reply-To: <20180518211510.13341-1-ivan.khoronzhuk@linaro.org>

Hi Ivan,

On 05/18/2018 04:15 PM, Ivan Khoronzhuk wrote:
> This series adds MQPRIO and CBS Qdisc offload for TI cpsw driver.
> It potentially can be used in audio video bridging (AVB) and time
> sensitive networking (TSN).
> 
> Patchset was tested on AM572x EVM and BBB boards. Last patch from this
> series adds detailed description of configuration with examples. For
> consistency reasons, in role of talker and listener, tools from
> patchset "TSN: Add qdisc based config interface for CBS" were used and
> can be seen here: https://www.spinics.net/lists/netdev/msg460869.html
> 
> Based on net-next/master

Thanks a lot, it is great work.

In general I have no comments as of now, but I prefer to wait a bit (a few
weeks) for more comments and possible test reports.

If there are no comments, please feel free to repost as the final series.


> 
> Ivan Khoronzhuk (6):
>    net: ethernet: ti: cpsw: use cpdma channels in backward order for txq
>    net: ethernet: ti: cpdma: fit rated channels in backward order
>    net: ethernet: ti: cpsw: add MQPRIO Qdisc offload
>    net: ethernet: ti: cpsw: add CBS Qdisc offload
>    net: ethernet: ti: cpsw: restore shaper configuration while down/up
>    Documentation: networking: cpsw: add MQPRIO & CBS offload examples
> 
>   Documentation/networking/cpsw.txt       | 540 ++++++++++++++++++++++++
>   drivers/net/ethernet/ti/cpsw.c          | 364 +++++++++++++++-
>   drivers/net/ethernet/ti/davinci_cpdma.c |  31 +-
>   3 files changed, 913 insertions(+), 22 deletions(-)
>   create mode 100644 Documentation/networking/cpsw.txt
> 

-- 
regards,
-grygorii

* Re: [PATCH v6 4/5] arm64: dts: sdm845: Add serial console support
From: Doug Anderson @ 2018-05-23 15:13 UTC (permalink / raw)
  To: Rajendra Nayak
  Cc: Karthikeyan Ramasubramanian, Jonathan Corbet, Andy Gross,
	David Brown, Rob Herring, Mark Rutland, Wolfram Sang, linux-doc,
	linux-arm-msm, devicetree, linux-i2c, Evan Green, acourbot,
	Stephen Boyd, Bjorn Andersson
In-Reply-To: <ebe4515d-a26f-a825-0353-666b0b5ad9f5@codeaurora.org>

Rajendra,

On Tue, May 22, 2018 at 11:30 PM, Rajendra Nayak <rnayak@codeaurora.org> wrote:
>
>
> On 03/30/2018 10:38 PM, Karthikeyan Ramasubramanian wrote:
>> From: Rajendra Nayak <rnayak@codeaurora.org>
>>
>> Add the qup uart node and geni se instance needed to
>> support the serial console on the MTP.
>>
>> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
>> Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
>> ---
>
> Andy, is it possible to pull this one in for 4.18?
> Sorry, I only realized we somehow missed this after looking at your pull request.
>
> This is the only patch that prevents linux-next from booting up my sdm845 MTP
> to a minimal console shell.

It was in Andy's tree but then got dropped.  Unfortunately the clock
bindings didn't land early enough so it's a bit difficult to land any
device tree changes that use the clock bindings until the next kernel
revision...

-Doug

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Steven Rostedt @ 2018-05-23 14:46 UTC (permalink / raw)
  To: Alex G.
  Cc: Jes Sorensen, Matthew Wilcox, Rajat Jain, Bjorn Helgaas,
	Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
	Thomas Gleixner, Greg Kroah-Hartman, Frederick Lawler,
	Oza Pawandeep, Keith Busch, Gabriele Paoloni, Thomas Tai,
	linux-pci, linux-doc, linux-kernel, Kyle McMartin, rajatxjain
In-Reply-To: <ffa4e203-f320-b9be-c0d7-8432e949765b@gmail.com>

On Wed, 23 May 2018 09:33:30 -0500
"Alex G." <mr.nuke.me@gmail.com> wrote:

> > Well I'll agree to disagree with Linus on this one. It's ugly as fsck
> > and allows for ambiguous statements in the code.  
> 
> You misspelled "fuck".

No, Jes is Danish. That's how they spell it.

-- Steve

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Jes Sorensen @ 2018-05-23 14:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Rajat Jain, Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, Greg Kroah-Hartman,
	Frederick Lawler, Oza Pawandeep, Keith Busch, Gabriele Paoloni,
	Alexandru Gagniuc, Thomas Tai, Steven Rostedt (VMware), linux-pci,
	linux-doc, linux-kernel, Kyle McMartin, rajatxjain
In-Reply-To: <20180523142656.GE19987@bombadil.infradead.org>

On 05/23/2018 10:26 AM, Matthew Wilcox wrote:
> On Wed, May 23, 2018 at 10:20:10AM -0400, Jes Sorensen wrote:
>>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>>> @@ -0,0 +1,64 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>
>> Fix the formatting please - that gross // gibberish doesn't belong there.
> 
> Sorry, Jes.  The Chief Penguin has Spoken, and that's the preferred
> syntax:
> 
> 2. Style:
> 
>    The SPDX license identifier is added in form of a comment.  The comment
>    style depends on the file type::
> 
>       C source: // SPDX-License-Identifier: <SPDX License Expression>
> 
> (you can dig up the discussion around this on the mailing list if you
> like.  Linus actually thinks that C++ single-line comments are one of
> the few things that language got right)

Well I'll agree to disagree with Linus on this one. It's ugly as fsck
and allows for ambiguous statements in the code.

Jes

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Alex G. @ 2018-05-23 14:33 UTC (permalink / raw)
  To: Jes Sorensen, Matthew Wilcox
  Cc: Rajat Jain, Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, Greg Kroah-Hartman,
	Frederick Lawler, Oza Pawandeep, Keith Busch, Gabriele Paoloni,
	Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
	linux-kernel, Kyle McMartin, rajatxjain
In-Reply-To: <46f0a2fa-9b48-c6d3-05ac-a75168b1998c@fb.com>

On 05/23/2018 09:32 AM, Jes Sorensen wrote:
> On 05/23/2018 10:26 AM, Matthew Wilcox wrote:
>> On Wed, May 23, 2018 at 10:20:10AM -0400, Jes Sorensen wrote:
>>>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>>>> @@ -0,0 +1,64 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>
>>> Fix the formatting please - that gross // gibberish doesn't belong there.
>>
>> Sorry, Jes.  The Chief Penguin has Spoken, and that's the preferred
>> syntax:
>>
>> 2. Style:
>>
>>    The SPDX license identifier is added in form of a comment.  The comment
>>    style depends on the file type::
>>
>>       C source: // SPDX-License-Identifier: <SPDX License Expression>
>>
>> (you can dig up the discussion around this on the mailing list if you
>> like.  Linus actually thinks that C++ single-line comments are one of
>> the few things that language got right)
> 
> Well I'll agree to disagree with Linus on this one. It's ugly as fsck
> and allows for ambiguous statements in the code.

You misspelled "fuck".

Alex

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Jes Sorensen @ 2018-05-23 14:28 UTC (permalink / raw)
  To: Alex G., Rajat Jain, Bjorn Helgaas, Jonathan Corbet,
	Philippe Ombredanne, Kate Stewart, Thomas Gleixner,
	Greg Kroah-Hartman, Frederick Lawler, Oza Pawandeep, Keith Busch,
	Gabriele Paoloni, Thomas Tai, Steven Rostedt (VMware), linux-pci,
	linux-doc, linux-kernel, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <53789ac0-b571-5d6a-5925-7e83925b5b31@gmail.com>

On 05/23/2018 10:26 AM, Alex G. wrote:
> On 05/23/2018 09:20 AM, Jes Sorensen wrote:
>> On 05/22/2018 06:28 PM, Rajat Jain wrote:
>>> new file mode 100644
>>> index 000000000000..b9f251992209
>>> --- /dev/null
>>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>>> @@ -0,0 +1,64 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>
>> Fix the formatting please - that gross // gibberish doesn't belong there.
> 
> Deep breath in. Deep breath out.
> 
> git grep SPDX
> 
> Although I don't like it, this format is already too common.

So? Just because some people did something wrong doesn't mean you should
continue to do it.

Jes

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Matthew Wilcox @ 2018-05-23 14:26 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Rajat Jain, Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, Greg Kroah-Hartman,
	Frederick Lawler, Oza Pawandeep, Keith Busch, Gabriele Paoloni,
	Alexandru Gagniuc, Thomas Tai, Steven Rostedt (VMware), linux-pci,
	linux-doc, linux-kernel, Kyle McMartin, rajatxjain
In-Reply-To: <62fae8eb-aaed-65b2-19ad-7c57b3a9bfdc@fb.com>

On Wed, May 23, 2018 at 10:20:10AM -0400, Jes Sorensen wrote:
> > +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
> > @@ -0,0 +1,64 @@
> > +// SPDX-License-Identifier: GPL-2.0
> 
> Fix the formatting please - that gross // gibberish doesn't belong there.

Sorry, Jes.  The Chief Penguin has Spoken, and that's the preferred
syntax:

2. Style:

   The SPDX license identifier is added in form of a comment.  The comment
   style depends on the file type::

      C source: // SPDX-License-Identifier: <SPDX License Expression>

(you can dig up the discussion around this on the mailing list if you
like.  Linus actually thinks that C++ single-line comments are one of
the few things that language got right)
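
As a small illustration of the quoted rule, the sketch below checks whether the first line of a C source file carries the tag in the `//` form. The helper name has_c_source_spdx_tag() is hypothetical and exists only for this example; it tests the prefix and that some license expression follows, and does not validate the expression itself.

```c
#include <stdbool.h>
#include <string.h>

/* Prefix that the quoted rule prescribes for C source files. */
#define SPDX_C_SOURCE_PREFIX "// SPDX-License-Identifier: "

/*
 * Hypothetical helper: true when `first_line` starts with the C-source
 * SPDX prefix and is followed by a non-empty license expression.
 */
bool has_c_source_spdx_tag(const char *first_line)
{
	size_t n = strlen(SPDX_C_SOURCE_PREFIX);

	if (strncmp(first_line, SPDX_C_SOURCE_PREFIX, n) != 0)
		return false;
	return first_line[n] != '\0' && first_line[n] != '\n';
}
```

A line such as "// SPDX-License-Identifier: GPL-2.0" passes, while the same tag wrapped in a block comment does not, since this check covers only the C-source form.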

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Alex G. @ 2018-05-23 14:26 UTC (permalink / raw)
  To: Jes Sorensen, Rajat Jain, Bjorn Helgaas, Jonathan Corbet,
	Philippe Ombredanne, Kate Stewart, Thomas Gleixner,
	Greg Kroah-Hartman, Frederick Lawler, Oza Pawandeep, Keith Busch,
	Gabriele Paoloni, Thomas Tai, Steven Rostedt (VMware), linux-pci,
	linux-doc, linux-kernel, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <62fae8eb-aaed-65b2-19ad-7c57b3a9bfdc@fb.com>

On 05/23/2018 09:20 AM, Jes Sorensen wrote:
> On 05/22/2018 06:28 PM, Rajat Jain wrote:
>> new file mode 100644
>> index 000000000000..b9f251992209
>> --- /dev/null
>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>> @@ -0,0 +1,64 @@
>> +// SPDX-License-Identifier: GPL-2.0
> 
> Fix the formatting please - that gross // gibberish doesn't belong there.

Deep breath in. Deep breath out.

git grep SPDX

Although I don't like it, this format is already too common.

Cheers,
Alex

* Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Jes Sorensen @ 2018-05-23 14:20 UTC (permalink / raw)
  To: Rajat Jain, Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne,
	Kate Stewart, Thomas Gleixner, Greg Kroah-Hartman,
	Frederick Lawler, Oza Pawandeep, Keith Busch, Gabriele Paoloni,
	Alexandru Gagniuc, Thomas Tai, Steven Rostedt (VMware), linux-pci,
	linux-doc, linux-kernel, Kyle McMartin
  Cc: rajatxjain
In-Reply-To: <20180522222805.80314-2-rajatja@google.com>

On 05/22/2018 06:28 PM, Rajat Jain wrote:
> Define a structure to hold the AER statistics. There are 2 groups
> of statistics: dev_* counters that are to be collected for all AER
> capable devices and rootport_* counters that are collected for all
> (AER capable) rootports only. Allocate and free this structure when
> device is added or released (thus counters survive the lifetime of the
> device).
> 
> Add a new file aerdrv_stats.c to hold the AER stats collection logic.
> 
> Signed-off-by: Rajat Jain <rajatja@google.com>
> ---
>  drivers/pci/pcie/aer/Makefile       |  2 +-
>  drivers/pci/pcie/aer/aerdrv.h       |  6 +++
>  drivers/pci/pcie/aer/aerdrv_core.c  |  9 ++++
>  drivers/pci/pcie/aer/aerdrv_stats.c | 64 +++++++++++++++++++++++++++++
>  drivers/pci/probe.c                 |  1 +
>  include/linux/pci.h                 |  3 ++
>  6 files changed, 84 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c
> 
> diff --git a/drivers/pci/pcie/aer/Makefile b/drivers/pci/pcie/aer/Makefile
>  
> -aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o
> +aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o aerdrv_stats.o
>  aerdriver-$(CONFIG_ACPI) += aerdrv_acpi.o
>  
>  obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index b4c950683cc7..d8b9fba536ed 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -33,6 +33,10 @@
>  					PCI_ERR_UNC_MALF_TLP)
>  
>  #define AER_MAX_MULTI_ERR_DEVICES	5	/* Not likely to have more */
> +
> +#define AER_MAX_TYPEOF_CORRECTABLE_ERRS 16	/* as per PCI_ERR_COR_STATUS */
> +#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26	/* as per PCI_ERR_UNCOR_STATUS*/
> +
>  struct aer_err_info {
>  	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
>  	int error_dev_num;
> @@ -81,6 +85,8 @@ void aer_isr(struct work_struct *work);
>  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
>  void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
>  irqreturn_t aer_irq(int irq, void *context);
> +int pci_aer_stats_init(struct pci_dev *pdev);
> +void pci_aer_stats_exit(struct pci_dev *pdev);
>  
>  #ifdef CONFIG_ACPI_APEI
>  int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 36e622d35c48..42a6f913069a 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -95,9 +95,18 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
>  int pci_aer_init(struct pci_dev *dev)
>  {
>  	dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> +
> +	if (!dev->aer_cap || pci_aer_stats_init(dev))
> +		return -EIO;
> +
>  	return pci_cleanup_aer_error_status_regs(dev);
>  }
>  
> +void pci_aer_exit(struct pci_dev *dev)
> +{
> +	pci_aer_stats_exit(dev);
> +}
> +
>  /**
>   * add_error_device - list device to be handled
>   * @e_info: pointer to error info
> diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
> new file mode 100644
> index 000000000000..b9f251992209
> --- /dev/null
> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0

Fix the formatting please - that gross // gibberish doesn't belong there.

Jes