* [PATCH 3/3] docs: s390: s390dbf: typos and formatting, update crash command
From: Steffen Maier @ 2019-07-03 10:19 UTC
To: linux-doc
Cc: linux-s390, Mauro Carvalho Chehab, Mauro Carvalho Chehab,
Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
linux-kernel
In-Reply-To: <1562149189-1417-1-git-send-email-maier@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
---
Documentation/s390/s390dbf.rst | 122 +++++++++++++++++++++++------------------
1 file changed, 68 insertions(+), 54 deletions(-)
diff --git a/Documentation/s390/s390dbf.rst b/Documentation/s390/s390dbf.rst
index be42892b159e..cdb36842b898 100644
--- a/Documentation/s390/s390dbf.rst
+++ b/Documentation/s390/s390dbf.rst
@@ -23,7 +23,8 @@ The debug feature may also very useful for kernel and driver development.
Design:
-------
Kernel components (e.g. device drivers) can register themselves at the debug
-feature with the function call debug_register(). This function initializes a
+feature with the function call :c:func:`debug_register()`.
+This function initializes a
debug log for the caller. For each debug log exists a number of debug areas
where exactly one is active at one time. Each debug area consists of contiguous
pages in memory. In the debug areas there are stored debug entries (log records)
@@ -44,8 +45,9 @@ The debug areas themselves are also ordered in form of a ring buffer.
When an exception is thrown in the last debug area, the following debug
entries are then written again in the very first area.
-There are three versions for the event- and exception-calls: One for
-logging raw data, one for text and one for numbers.
+There are four versions for the event- and exception-calls: One for
+logging raw data, one for text, one for numbers (unsigned int and long),
+and one for sprintf-like formatted strings.
Each debug entry contains the following data:
@@ -56,29 +58,29 @@ Each debug entry contains the following data:
- Flag, if entry is an exception or not
The debug logs can be inspected in a live system through entries in
-the debugfs-filesystem. Under the toplevel directory "s390dbf" there is
+the debugfs-filesystem. Under the toplevel directory "``s390dbf``" there is
a directory for each registered component, which is named like the
corresponding component. The debugfs normally should be mounted to
-/sys/kernel/debug therefore the debug feature can be accessed under
-/sys/kernel/debug/s390dbf.
+``/sys/kernel/debug`` therefore the debug feature can be accessed under
+``/sys/kernel/debug/s390dbf``.
The content of the directories are files which represent different views
to the debug log. Each component can decide which views should be
-used through registering them with the function debug_register_view().
+used through registering them with the function :c:func:`debug_register_view()`.
Predefined views for hex/ascii, sprintf and raw binary data are provided.
It is also possible to define other views. The content of
a view can be inspected simply by reading the corresponding debugfs file.
All debug logs have an actual debug level (range from 0 to 6).
-The default level is 3. Event and Exception functions have a 'level'
+The default level is 3. Event and Exception functions have a :c:data:`level`
parameter. Only debug entries with a level that is lower or equal
than the actual level are written to the log. This means, when
writing events, high priority log entries should have a low level
value whereas low priority entries should have a high one.
The actual debug level can be changed with the help of the debugfs-filesystem
-through writing a number string "x" to the 'level' debugfs file which is
+through writing a number string "x" to the ``level`` debugfs file which is
provided for every debug log. Debugging can be switched off completely
-by using "-" on the 'level' debugfs file.
+by using "-" on the ``level`` debugfs file.
Example::
@@ -86,21 +88,21 @@ Example::
It is also possible to deactivate the debug feature globally for every
debug log. You can change the behavior using 2 sysctl parameters in
-/proc/sys/s390dbf:
+``/proc/sys/s390dbf``:
There are currently 2 possible triggers, which stop the debug feature
-globally. The first possibility is to use the "debug_active" sysctl. If
-set to 1 the debug feature is running. If "debug_active" is set to 0 the
+globally. The first possibility is to use the ``debug_active`` sysctl. If
+set to 1 the debug feature is running. If ``debug_active`` is set to 0 the
debug feature is turned off.
The second trigger which stops the debug feature is a kernel oops.
That prevents the debug feature from overwriting debug information that
happened before the oops. After an oops you can reactivate the debug feature
-by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, its not
+by piping 1 to ``/proc/sys/s390dbf/debug_active``. Nevertheless, it's not
suggested to use an oopsed kernel in a production environment.
If you want to disallow the deactivation of the debug feature, you can use
-the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug
+the ``debug_stoppable`` sysctl. If you set ``debug_stoppable`` to 0 the debug
feature cannot be stopped. If the debug feature is already stopped, it
will stay deactivated.
@@ -113,16 +115,18 @@ Kernel Interfaces:
Predefined views:
-----------------
-extern struct debug_view debug_hex_ascii_view;
+.. code-block:: c
-extern struct debug_view debug_raw_view;
+ extern struct debug_view debug_hex_ascii_view;
-extern struct debug_view debug_sprintf_view;
+ extern struct debug_view debug_raw_view;
+
+ extern struct debug_view debug_sprintf_view;
Examples
--------
-::
+.. code-block:: c
/*
* hex_ascii- + raw-view Example
@@ -131,15 +135,15 @@ Examples
#include <linux/init.h>
#include <asm/debug.h>
- static debug_info_t* debug_info;
+ static debug_info_t *debug_info;
static int init(void)
{
/* register 4 debug areas with one page each and 4 byte data field */
- debug_info = debug_register ("test", 1, 4, 4 );
- debug_register_view(debug_info,&debug_hex_ascii_view);
- debug_register_view(debug_info,&debug_raw_view);
+ debug_info = debug_register("test", 1, 4, 4 );
+ debug_register_view(debug_info, &debug_hex_ascii_view);
+ debug_register_view(debug_info, &debug_raw_view);
debug_text_event(debug_info, 4 , "one ");
debug_int_exception(debug_info, 4, 4711);
@@ -150,13 +154,13 @@ Examples
static void cleanup(void)
{
- debug_unregister (debug_info);
+ debug_unregister(debug_info);
}
module_init(init);
module_exit(cleanup);
-::
+.. code-block:: c
/*
* sprintf-view Example
@@ -165,15 +169,15 @@ Examples
#include <linux/init.h>
#include <asm/debug.h>
- static debug_info_t* debug_info;
+ static debug_info_t *debug_info;
static int init(void)
{
/* register 4 debug areas with one page each and data field for */
/* format string pointer + 2 varargs (= 3 * sizeof(long)) */
- debug_info = debug_register ("test", 1, 4, sizeof(long) * 3);
- debug_register_view(debug_info,&debug_sprintf_view);
+ debug_info = debug_register("test", 1, 4, sizeof(long) * 3);
+ debug_register_view(debug_info, &debug_sprintf_view);
debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__);
debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info);
@@ -183,7 +187,7 @@ Examples
static void cleanup(void)
{
- debug_unregister (debug_info);
+ debug_unregister(debug_info);
}
module_init(init);
@@ -252,7 +256,7 @@ Define 4 pages for the debug areas of debug feature "dasd"::
> echo "4" > /sys/kernel/debug/s390dbf/dasd/pages
-Stooping the debug feature
+Stopping the debug feature
--------------------------
Example:
@@ -264,10 +268,11 @@ Example:
> echo 0 > /proc/sys/s390dbf/debug_active
-lcrash Interface
+crash Interface
----------------
-It is planned that the dump analysis tool lcrash gets an additional command
-'s390dbf' to display all the debug logs. With this tool it will be possible
+The ``crash`` tool since v5.1.0 has a built-in command
+``s390dbf`` to display all the debug logs or export them to the file system.
+With this tool it is possible
to investigate the debug logs on a live system and with a memory dump after
a system crash.
@@ -276,8 +281,8 @@ Investigating raw memory
One last possibility to investigate the debug logs at a live
system and after a system crash is to look at the raw memory
under VM or at the Service Element.
-It is possible to find the anker of the debug-logs through
-the 'debug_area_first' symbol in the System map. Then one has
+It is possible to find the anchor of the debug-logs through
+the ``debug_area_first`` symbol in the System map. Then one has
to follow the correct pointers of the data-structures defined
in debug.h and find the debug-areas in memory.
Normally modules which use the debug feature will also have
@@ -286,7 +291,7 @@ this pointer it will also be possible to find the debug logs in
memory.
For this method it is recommended to use '16 * x + 4' byte (x = 0..n)
-for the length of the data field in debug_register() in
+for the length of the data field in :c:func:`debug_register()` in
order to see the debug entries well formatted.
@@ -295,7 +300,7 @@ Predefined Views
There are three predefined views: hex_ascii, raw and sprintf.
The hex_ascii view shows the data field in hex and ascii representation
-(e.g. '45 43 4b 44 | ECKD').
+(e.g. ``45 43 4b 44 | ECKD``).
The raw view returns a bytestream as the debug areas are stored in memory.
The sprintf view formats the debug entries in the same way as the sprintf
@@ -335,18 +340,20 @@ The format of the raw view is:
- datafield
A typical line of the hex_ascii view will look like the following (first line
-is only for explanation and will not be displayed when 'cating' the view):
+is only for explanation and will not be displayed when 'cating' the view)::
-area time level exception cpu caller data (hex + ascii)
---------------------------------------------------------------------------
-00 00964419409:440690 1 - 00 88023fe
+ area time level exception cpu caller data (hex + ascii)
+ --------------------------------------------------------------------------
+ 00 00964419409:440690 1 - 00 88023fe
Defining views
--------------
Views are specified with the 'debug_view' structure. There are defined
-callback functions which are used for reading and writing the debugfs files::
+callback functions which are used for reading and writing the debugfs files:
+
+.. code-block:: c
struct debug_view {
char name[DEBUG_MAX_PROCF_LEN];
@@ -357,7 +364,9 @@ callback functions which are used for reading and writing the debugfs files::
void* private_data;
};
-where::
+where:
+
+.. code-block:: c
typedef int (debug_header_proc_t) (debug_info_t* id,
struct debug_view* view,
@@ -395,10 +404,10 @@ Then 'header_proc' and 'format_proc' are called for each
existing debug entry.
The input_proc can be used to implement functionality when it is written to
-the view (e.g. like with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level).
+the view (e.g. like with ``echo "0" > /sys/kernel/debug/s390dbf/dasd/level``).
For header_proc there can be used the default function
-debug_dflt_header_fn() which is defined in debug.h.
+:c:func:`debug_dflt_header_fn()` which is defined in debug.h.
and which produces the same header output as the predefined views.
E.g::
@@ -407,7 +416,9 @@ E.g::
In order to see how to use the callback functions check the implementation
of the default views!
-Example::
+Example:
+
+.. code-block:: c
#include <asm/debug.h>
@@ -423,21 +434,20 @@ Example::
};
static int debug_test_format_fn(
- debug_info_t * id, struct debug_view *view,
+ debug_info_t *id, struct debug_view *view,
char *out_buf, const char *in_buf
)
{
int i, rc = 0;
- if(id->buf_size >= 4) {
+ if (id->buf_size >= 4) {
int msg_nr = *((int*)in_buf);
- if(msg_nr < sizeof(messages)/sizeof(char*) - 1)
+ if (msg_nr < sizeof(messages) / sizeof(char*) - 1)
rc += sprintf(out_buf, "%s", messages[msg_nr]);
else
rc += sprintf(out_buf, UNKNOWNSTR, msg_nr);
}
- out:
- return rc;
+ return rc;
}
struct debug_view debug_test_view = {
@@ -452,13 +462,17 @@ Example::
test:
=====
-::
+.. code-block:: c
debug_info_t *debug_info;
+ int i;
...
- debug_info = debug_register ("test", 0, 4, 4 ));
+ debug_info = debug_register("test", 0, 4, 4);
debug_register_view(debug_info, &debug_test_view);
- for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i);
+ for (i = 0; i < 10; i ++)
+ debug_int_event(debug_info, 1, i);
+
+::
> cat /sys/kernel/debug/s390dbf/test/myview
00 00964419734:611402 1 - 00 88042ca This error...........
--
1.8.3.1
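
The hex_ascii example touched by this patch (``45 43 4b 44 | ECKD``) can be reproduced with a small userspace sketch. This is only an illustration of the output format, not the code in arch/s390/kernel/debug.c; the helper name hex_ascii_format() is hypothetical, and showing non-printable bytes as '.' is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper, not the kernel's implementation:
 * render a data field as "<hex bytes> | <ascii>".
 * Non-printable bytes appear as '.' in the ASCII part (an assumption).
 * Returns the number of characters written, or -1 if out is too small.
 */
static int hex_ascii_format(const unsigned char *data, size_t len,
                            char *out, size_t out_size)
{
    size_t pos = 0;
    size_t i;

    /* 3 chars of hex plus 1 char of ASCII per byte, plus "| " and NUL */
    if (out_size < len * 4 + 3)
        return -1;

    for (i = 0; i < len; i++)
        pos += (size_t)sprintf(out + pos, "%02x ", data[i]);
    pos += (size_t)sprintf(out + pos, "| ");
    for (i = 0; i < len; i++)
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    out[pos] = '\0';

    return (int)pos;
}
```

Calling hex_ascii_format() on the four bytes "ECKD" with a sufficiently large buffer fills it with the string from the documentation example.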
* [PATCH 0/3] docs: s390: restore content and update s390dbf.rst
From: Steffen Maier @ 2019-07-03 10:19 UTC
To: linux-doc
Cc: linux-s390, Mauro Carvalho Chehab, Mauro Carvalho Chehab,
Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
linux-kernel
This is based on top of the 3 s390 patches Heiko already queued on our
s390 features branch.
[("Re: [PATCH v3 00/33] Convert files to ReST - part 1")
https://www.spinics.net/lists/linux-doc/msg66137.html
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/log/Documentation/s390?h=features]
If I was not mistaken, some documentation was accidentally lost
and patch 1 restores it.
After having looked closer, I came up with patches 2 and 3.
Rendered successfully on a current Fedora 30 and it looks good:
$ make SPHINXDIRS="s390" htmldocs
Steffen Maier (3):
docs: s390: restore important non-kdoc parts of s390dbf.rst
docs: s390: unify and update s390dbf kdocs at debug.c
docs: s390: s390dbf: typos and formatting, update crash command
Documentation/s390/s390dbf.rst | 390 +++++++++++++++++++++++++++++++++++++++--
arch/s390/include/asm/debug.h | 112 ++----------
arch/s390/kernel/debug.c | 105 +++++++++--
3 files changed, 473 insertions(+), 134 deletions(-)
--
1.8.3.1
* Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
From: Sumit Garg @ 2019-07-03 10:03 UTC
To: Ilias Apalodimas, Thirupathaiah Annapureddy
Cc: Jarkko Sakkinen, Sasha Levin, peterhuewe@gmx.de, jgg@ziepe.ca,
corbet@lwn.net, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-integrity@vger.kernel.org,
Microsoft Linux Kernel List, Bryan Kelly (CSI),
tee-dev@lists.linaro.org, rdunlap@infradead.org, Joakim Bech
In-Reply-To: <CAC_iWjK2F13QxjuvqzqNLx00SiGz_FQ5X=MQxJyDev57bo3=LQ@mail.gmail.com>
On Wed, 3 Jul 2019 at 13:42, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Hi Thirupathaiah,
>
> (+Joakim)
>
> On Wed, 3 Jul 2019 at 09:58, Ilias Apalodimas
> <ilias.apalodimas@linaro.org> wrote:
> >
> > Hi Thirupathaiah,
> > >
> > > First of all, Thanks a lot for trying to test the driver.
> > >
> > np
> >
> > [...]
> > > > I managed to do some quick testing in QEMU.
> > > > Everything works fine when i build this as a module (using IBM's TPM 2.0
> > > > TSS)
> > > >
> > > > - As module
> > > > # insmod /lib/modules/5.2.0-rc1/kernel/drivers/char/tpm/tpm_ftpm_tee.ko
> > > > # getrandom -by 8
> > > > randomBytes length 8
> > > > 23 b9 3d c3 90 13 d9 6b
> > > >
> > > > - Built-in
> > > > # dmesg | grep optee
> > > > ftpm-tee firmware:optee: ftpm_tee_probe:tee_client_open_session failed,
> > > > err=ffff0008
> > > This (0xffff0008) translates to TEE_ERROR_ITEM_NOT_FOUND.
> > >
> > > Where is fTPM TA located in your test setup?
> > > Is it stitched into TEE binary as an EARLY_TA or
> > > Is it expected to be loaded during run-time with the help of user mode OP-TEE supplicant?
> > >
> > > My guess is that you are trying to load fTPM TA through user mode OP-TEE supplicant.
> > > Can you confirm?
> > I tried both
> >
>
> Ok apparently there was a failure with my built-in binary which i
> didn't notice. I did a full rebuild and checked the elf this time :)
>
> Built as an earlyTA my error now is:
> ftpm-tee firmware:optee: ftpm_tee_probe:tee_client_open_session
> failed, err=ffff3024 (translates to TEE_ERROR_TARGET_DEAD)
> Since you tested it on real hardware i guess you tried both
> module/built-in. Which TEE version are you using?
>
> > > U-boot and Linux driver stacks work seamlessly without dependency on supplicant.
Is this true?
It looks like this fTPM driver can't work as a built-in driver. The
reason seems to be secure storage access required by OP-TEE fTPM TA
that is provided via OP-TEE supplicant that's not available during
kernel boot.
Snippet from ms-tpm-20-ref/Samples/ARM32-FirmwareTPM/optee_ta/fTPM/fTPM.c +145:
// If we fail to open fTPM storage we cannot continue.
if (_plat__NVEnable(NULL) == 0) {
TEE_Panic(TEE_ERROR_BAD_STATE);
}
So it seems like this module will work as a loadable module only after
OP-TEE supplicant is up.
-Sumit
> Thanks
> /Ilias
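
For anyone matching the err= values from the probe message against their names, here is a minimal sketch of a lookup. It covers only the two codes quoted in this thread; the helper name tee_err_name() is hypothetical and is not part of the fTPM driver or OP-TEE.

```c
#include <stdint.h>

/* Return codes quoted in this thread, as printed in the err= field. */
#define TEE_ERROR_ITEM_NOT_FOUND 0xffff0008u
#define TEE_ERROR_TARGET_DEAD    0xffff3024u

/*
 * Hypothetical helper (not part of the fTPM driver or OP-TEE):
 * map an err= value from the probe message to its symbolic name.
 * Codes not mentioned in the thread fall through to "unknown".
 */
static const char *tee_err_name(uint32_t err)
{
    switch (err) {
    case TEE_ERROR_ITEM_NOT_FOUND:
        return "TEE_ERROR_ITEM_NOT_FOUND";
    case TEE_ERROR_TARGET_DEAD:
        return "TEE_ERROR_TARGET_DEAD";
    default:
        return "unknown";
    }
}
```

Passing 0xffff0008 corresponds to the supplicant-loaded case discussed above, and 0xffff3024 to the early-TA failure.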
* Re: [PATCH 39/39] docs: gpio: add sysfs interface to the admin-guide
From: Linus Walleij @ 2019-07-03 8:44 UTC
To: Mauro Carvalho Chehab
Cc: Linux Doc Mailing List, Mauro Carvalho Chehab,
linux-kernel@vger.kernel.org, Jonathan Corbet,
Bartosz Golaszewski, Rafael J. Wysocki, Len Brown, Harry Wei,
Alex Shi, open list:GPIO SUBSYSTEM, ACPI Devel Maling List
In-Reply-To: <1ecff14ec37c0c434f003d93c4b86b1cd3dac834.1561724493.git.mchehab+samsung@kernel.org>
On Fri, Jun 28, 2019 at 2:30 PM Mauro Carvalho Chehab
<mchehab+samsung@kernel.org> wrote:
> While this is stated as obsoleted, the sysfs interface described
> there is still valid, and belongs to the admin-guide.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
This doesn't apply to my tree because of dependencies in the
index so I guess it's best if you merge it:
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
From: Ilias Apalodimas @ 2019-07-03 8:12 UTC
To: Thirupathaiah Annapureddy
Cc: Jarkko Sakkinen, Sasha Levin, peterhuewe@gmx.de, jgg@ziepe.ca,
corbet@lwn.net, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-integrity@vger.kernel.org,
Microsoft Linux Kernel List, Bryan Kelly (CSI),
tee-dev@lists.linaro.org, sumit.garg@linaro.org,
rdunlap@infradead.org, Joakim Bech
In-Reply-To: <20190703065813.GA12724@apalos>
Hi Thirupathaiah,
(+Joakim)
On Wed, 3 Jul 2019 at 09:58, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Hi Thirupathaiah,
> >
> > First of all, Thanks a lot for trying to test the driver.
> >
> np
>
> [...]
> > > I managed to do some quick testing in QEMU.
> > > Everything works fine when i build this as a module (using IBM's TPM 2.0
> > > TSS)
> > >
> > > - As module
> > > # insmod /lib/modules/5.2.0-rc1/kernel/drivers/char/tpm/tpm_ftpm_tee.ko
> > > # getrandom -by 8
> > > randomBytes length 8
> > > 23 b9 3d c3 90 13 d9 6b
> > >
> > > - Built-in
> > > # dmesg | grep optee
> > > ftpm-tee firmware:optee: ftpm_tee_probe:tee_client_open_session failed,
> > > err=ffff0008
> > This (0xffff0008) translates to TEE_ERROR_ITEM_NOT_FOUND.
> >
> > Where is fTPM TA located in your test setup?
> > Is it stitched into TEE binary as an EARLY_TA or
> > Is it expected to be loaded during run-time with the help of user mode OP-TEE supplicant?
> >
> > My guess is that you are trying to load fTPM TA through user mode OP-TEE supplicant.
> > Can you confirm?
> I tried both
>
Ok apparently there was a failure with my built-in binary which i
didn't notice. I did a full rebuild and checked the elf this time :)
Built as an earlyTA my error now is:
ftpm-tee firmware:optee: ftpm_tee_probe:tee_client_open_session
failed, err=ffff3024 (translates to TEE_ERROR_TARGET_DEAD)
Since you tested it on real hardware i guess you tried both
module/built-in. Which TEE version are you using?
Thanks
/Ilias
* Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
From: Ilias Apalodimas @ 2019-07-03 6:58 UTC
To: Thirupathaiah Annapureddy
Cc: Jarkko Sakkinen, Sasha Levin, peterhuewe@gmx.de, jgg@ziepe.ca,
corbet@lwn.net, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-integrity@vger.kernel.org,
Microsoft Linux Kernel List, Bryan Kelly (CSI),
tee-dev@lists.linaro.org, sumit.garg@linaro.org,
rdunlap@infradead.org
In-Reply-To: <CY4PR21MB0279B99FB0097309ADE83809BCF80@CY4PR21MB0279.namprd21.prod.outlook.com>
Hi Thirupathaiah,
>
> First of all, Thanks a lot for trying to test the driver.
>
np
[...]
> > I managed to do some quick testing in QEMU.
> > Everything works fine when i build this as a module (using IBM's TPM 2.0
> > TSS)
> >
> > - As module
> > # insmod /lib/modules/5.2.0-rc1/kernel/drivers/char/tpm/tpm_ftpm_tee.ko
> > # getrandom -by 8
> > randomBytes length 8
> > 23 b9 3d c3 90 13 d9 6b
> >
> > - Built-in
> > # dmesg | grep optee
> > ftpm-tee firmware:optee: ftpm_tee_probe:tee_client_open_session failed,
> > err=ffff0008
> This (0xffff0008) translates to TEE_ERROR_ITEM_NOT_FOUND.
>
> Where is fTPM TA located in your test setup?
> Is it stitched into TEE binary as an EARLY_TA or
> Is it expected to be loaded during run-time with the help of user mode OP-TEE supplicant?
>
> My guess is that you are trying to load fTPM TA through user mode OP-TEE supplicant.
> Can you confirm?
I tried both
> If that is true,
> - In the case of driver built as a module (CONFIG_TCG_FTPM_TEE=m), this works fine
> as user mode supplicant is ready.
> - In the built-in case (CONFIG_TCG_FTPM_TEE=y),
> This would result in the above error 0xffff0008 as TEE is unable to find fTPM TA.
Maybe i did something wrong and never noticed it wasn't built as an earlyTA
>
> The expectation is that fTPM TA is built as an EARLY_TA (in BL32) so that
> U-boot and Linux driver stacks work seamlessly without dependency on supplicant.
>
You can add my tested-by tag for the module. I'll go back to testing it as
built-in at some point in real hardware and let you know if i have any issues.
If someone is interested in the QEMU testing:
1. compile this https://github.com/jbech-linaro/manifest/blob/ftpm/README.md
2. replace the whole linux kernel on the root-dir with a latest version + fTPM
char driver
3. Apply a hack on kernel and disable dynamic shm (Need for this depends on
kernel + op-tee version)
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index 1854a3db..7aea8a5 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -588,13 +588,15 @@ static struct optee *optee_probe(struct device_node *np)
/*
* Try to use dynamic shared memory if possible
*/
+#if 0
if (sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
pool = optee_config_dyn_shm();
+#endif
/*
* If dynamic shared memory is not available or failed - try static one
*/
- if (IS_ERR(pool) && (sec_caps & OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM))
+ if (sec_caps & OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM)
pool = optee_config_shm_memremap(invoke_fn, &memremaped_shm);
if (IS_ERR(pool))
For the module part:
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Michal Hocko @ 2019-07-03 6:56 UTC
To: Waiman Long
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
Kees Cook, Johannes Weiner, Vladimir Davydov, linux-mm, linux-doc,
linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190702183730.14461-1-longman@redhat.com>
On Tue 02-07-19 14:37:30, Waiman Long wrote:
> Currently, a value of '1' is written to /sys/kernel/slab/<slab>/shrink
> file to shrink the slab by flushing all the per-cpu slabs and free
> slabs in partial lists. This applies only to the root caches, though.
>
> Extends this capability by shrinking all the child memcg caches and
> the root cache when a value of '2' is written to the shrink sysfs file.
Why do we need a new value for this functionality? I would tend to think
that skipping memcg caches is a bug/incomplete implementation. Or is it
a deliberate decision to cover root caches only?
--
Michal Hocko
SUSE Labs
* [PATCH v4] docs: aha152x.txt convert it to ReST
From: Sushma Unnibhavi @ 2019-07-03 6:14 UTC
To: skhan
Cc: Sushma Unnibhavi, corbet, mchehab, linux-kernel-mentees,
linux-doc, linux-kernel
This patch converts aha152x.txt
to ReST format. No content change.
Added aha152x.rst to sh/index.rst
Added SPDX tag in index.rst
Signed-off-by: Sushma Unnibhavi <sushmaunnibhavi425@gmail.com>
---
Documentation/driver-api/index.rst | 1 +
Documentation/scsi/aha152x.rst | 203 ++++++++++++++++++++++++++++
Documentation/scsi/aha152x.txt | 183 -------------------------
Documentation/scsi/source/conf.py | 52 +++++++
Documentation/scsi/source/index.rst | 22 +++
5 files changed, 278 insertions(+), 183 deletions(-)
create mode 100644 Documentation/scsi/aha152x.rst
delete mode 100644 Documentation/scsi/aha152x.txt
create mode 100644 Documentation/scsi/source/conf.py
create mode 100644 Documentation/scsi/source/index.rst
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index d26308af6036..e26809c95c79 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -32,6 +32,7 @@ available subsections can be seen below.
usb/index
firewire
pci/index
+ scsi/index
spi
i2c
i3c/index
diff --git a/Documentation/scsi/aha152x.rst b/Documentation/scsi/aha152x.rst
new file mode 100644
index 000000000000..3c4d558b9daf
--- /dev/null
+++ b/Documentation/scsi/aha152x.rst
@@ -0,0 +1,203 @@
+
+=====================================================
+Adaptec AHA-1520/1522 SCSI driver for Linux (aha152x)
+=====================================================
+
+Copyright 1993-1999 Jürgen Fischer <fischer@norbit.de>
+TC1550 patches by Luuk van Dijk (ldz@xs4all.nl)
+
+
+In Revision 2 the driver was modified a lot (especially the
+bottom-half handler complete()).
+
+The driver is much cleaner now, has support for the new
+error handling code in 2.3, produced less cpu load (much
+less polling loops), has slightly higher throughput (at
+least on my ancient test box; a i486/33Mhz/20MB).
+
+
+========================
+Configuration Arguments
+========================
++-----------+------------------------------------------+---------------------------+
+|IOPORT| | base io address | (0x340/0x140) |
++-----------+------------------------------------------+---------------------------+
+|IRQ | interrupt level | (9-12; default 11)| |
++-----------+------------------------------------------+---------------------------+
+|SCSI_ID | scsi id of controller | (0-7; default 7) |
++-----------+------------------------------------------+---------------------------+
+|RECONNECT | allow targets to disconnect from the bus| (0/1; default 1 [on]) |
++-----------+------------------------------------------+---------------------------+
+|PARITY | enable parity checking | (0/1; default 1 [on]) |
++-----------+------------------------------------------+---------------------------+
+|SYNCHRONOUS| enable synchronous transfers | (0/1; default 1 [on]) |
++-----------+------------------------------------------+---------------------------+
+|DELAY: | bus reset delay | (default 100) |
++-----------+------------------------------------------+---------------------------+
+|EXT_TRANS: | enable extended translation (see NOTES) | (0/1: default 0 [off]) |
++-----------+------------------------------------------+---------------------------+
+
+========================================================================
+Compile Time Configuration (go into AHA152X in drivers/scsi/Makefile)
+========================================================================
+
+-DAUTOCONF
+ use configuration the controller reports (AHA-152x only)
+
+-DSKIP_BIOSTEST
+ Don't test for BIOS signature (AHA-1510 or disabled BIOS)
+
+-DSETUP0="{ IOPORT, IRQ, SCSI_ID, RECONNECT, PARITY, SYNCHRONOUS, DELAY, EXT_TRANS }"
+ override for the first controller
+
+-DSETUP1="{ IOPORT, IRQ, SCSI_ID, RECONNECT, PARITY, SYNCHRONOUS, DELAY, EXT_TRANS }"
+ override for the second controller
+
+-DAHA152X_DEBUG
+ enable debugging output
+
+-DAHA152X_STAT
+ enable some statistics
+
+
+==========================
+Lilo Command Line Options
+==========================
+
+aha152x=<IOPORT>[,<IRQ>[,<SCSI-ID>[,<RECONNECT>[,<PARITY>[,<SYNCHRONOUS>[,<DELAY> [,<EXT_TRANS]]]]]]]
+
+The normal configuration can be overridden by specifying a command
+line.When you do this, the BIOS test is skipped. Entered values
+have to be valid (known). Don't use values that aren't supported
+under normal operation. If you think that you need other values:
+contact me. For two controllers use the aha152x statement twice.
+
+
+=================================
+Symbols For Module Configuration
+=================================
+---------------------------
+Choose From 2 Alternatives
+---------------------------
+1. specify everything (old)
+
+ aha152x=IOPORT,IRQ,SCSI_ID,RECONNECT,PARITY,SYNCHRONOUS,DELAY,EXT_TRANS
+ configuration override for first controller
+
+
+ aha152x1=IOPORT,IRQ,SCSI_ID,RECONNECT,PARITY,SYNCHRONOUS,DELAY,EXT_TRANS
+ configuration override for second controller
+
+2. specify only what you need to (irq or io is required; new)
+
+ io=IOPORT0[,IOPORT1]
+ IOPORT for first and second controller
+
+ irq=IRQ0[,IRQ1]
+ IRQ for first and second controller
+
+ scsiid=SCSIID0[,SCSIID1]
+ SCSIID for first and second controller
+
+ reconnect=RECONNECT0[,RECONNECT1]
+ allow targets to disconnect for first and second controller
+
+ parity=PAR0[PAR1]
+ use parity for first and second controller
+
+ sync=SYNCHRONOUS0[,SYNCHRONOUS1]
+ enable synchronous transfers for first and second controller
+
+ delay=DELAY0[,DELAY1]
+ reset DELAY for first and second controller
+
+ exttrans=EXTTRANS0[,EXTTRANS1]
+ enable extended translation for first and second controller
+
+
+If you use both alternatives the first will be taken.
+
+
+====================
+NOTES ON EXT_TRANS:
+====================
+
+SCSI uses block numbers to address blocks/sectors on a device.
+The BIOS uses a cylinder/head/sector addressing scheme (C/H/S)
+scheme instead. DOS expects a BIOS or driver that understands
+this C/H/S addressing.
+
+The number of cylinders/heads/sectors is called geometry and is
+required as base for requests in C/H/S addressing. SCSI only
+knows about the total capacity of disks in blocks (sectors).
+
+Therefore the SCSI BIOS/DOS driver has to calculate a logical/virtual
+geometry just to be able to support that addressing scheme. The
+geometry returned by the SCSI BIOS is a pure calculation and has
+nothing to do with the real/physical geometry of the disk (which
+is usually irrelevant anyway).
+
+Basically this has no impact at all on Linux, because it also uses block
+instead of C/H/S addressing. Unfortunately C/H/S addressing is also used
+in the partition table and therefore every operating system has to know
+the right geometry to be able to interpret it.
+
+Moreover there are certain limitations to the C/H/S addressing scheme,
+namely the address space is limited to up to 255 heads, up to 63 sectors
+and a maximum of 1023 cylinders.
+
+The AHA-1522 BIOS calculates the geometry by fixing the number of heads
+to 64, the number of sectors to 32 and by calculating the number of
+cylinders by dividing the capacity reported by the disk by 64*32 (1 MB).
+This is considered to be the default translation.
+
+With respect to the limit of 1023 cylinders using C/H/S you can only
+address the first GB of your disk in the partition table. Therefore
+BIOSes of some newer controllers based on the AIC-6260/6360 support
+extended translation. This means that the BIOS uses 255 for heads,
+63 for sectors and then divides the capacity of the disk by 255*63
+(about 8 MB), as soon it sees a disk greater than 1 GB. That results
+in a maximum of about 8 GB addressable diskspace in the partition
+table (but there are already bigger disks out there today).
+
+To make it even more complicated the translation mode might/might
+not be configurable in certain BIOS setups.
+
+This driver does some more or less failsafe guessing to get the
+geometry right in most cases:
+
+- for disks<1GB:
+ - use default translation (C/32/64)
+
+- for disks>1GB:
+ - take current geometry from the partition table (using scsicam_bios_param
+ and accept only `valid` geometries, i.e. either (C/32/64) or (C/63/255)).
+ This can be extended translation even if it's not enabled in the driver.
+
+ - if that fails, take extended translation if enabled by override,
+ kernel or module parameter, otherwise take default translation and
+ ask the user for verification. This might happen on not yet partitioned
+ disks.
+
+
+==================
+REFERENCES USED:
+==================
+ "AIC-6260 SCSI Chip Specification", Adaptec Corporation.
+
+ "SCSI COMPUTER SYSTEM INTERFACE - 2 (SCSI-2)", X3T9.2/86-109 rev. 10h
+
+ "Writing a SCSI device driver for Linux", Rik Faith (faith@cs.unc.edu)
+
+ "Kernel Hacker's Guide", Michael K. Johnson (johnsonm@sunsite.unc.edu)
+
+ "Adaptec 1520/1522 User's Guide", Adaptec Corporation.
+
+ Michael K. Johnson (johnsonm@sunsite.unc.edu)
+
+ Drew Eckhardt (drew@cs.colorado.edu)
+
+ Eric Youngdale (eric@andante.org)
+
+ special thanks to Eric Youngdale for supplying the
+ documentation on the chip for free(!).
diff --git a/Documentation/scsi/aha152x.txt b/Documentation/scsi/aha152x.txt
deleted file mode 100644
index 94848734ac66..000000000000
--- a/Documentation/scsi/aha152x.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-$Id: README.aha152x,v 1.2 1999/12/25 15:32:30 fischer Exp fischer $
-Adaptec AHA-1520/1522 SCSI driver for Linux (aha152x)
-
-Copyright 1993-1999 Jürgen Fischer <fischer@norbit.de>
-TC1550 patches by Luuk van Dijk (ldz@xs4all.nl)
-
-
-In Revision 2 the driver was modified a lot (especially the
-bottom-half handler complete()).
-
-The driver is much cleaner now, has support for the new
-error handling code in 2.3, produced less cpu load (much
-less polling loops), has slightly higher throughput (at
-least on my ancient test box; a i486/33Mhz/20MB).
-
-
-CONFIGURATION ARGUMENTS:
-
-IOPORT base io address (0x340/0x140)
-IRQ interrupt level (9-12; default 11)
-SCSI_ID scsi id of controller (0-7; default 7)
-RECONNECT allow targets to disconnect from the bus (0/1; default 1 [on])
-PARITY enable parity checking (0/1; default 1 [on])
-SYNCHRONOUS enable synchronous transfers (0/1; default 1 [on])
-DELAY: bus reset delay (default 100)
-EXT_TRANS: enable extended translation (0/1: default 0 [off])
- (see NOTES)
-
-COMPILE TIME CONFIGURATION (go into AHA152X in drivers/scsi/Makefile):
-
--DAUTOCONF
- use configuration the controller reports (AHA-152x only)
-
--DSKIP_BIOSTEST
- Don't test for BIOS signature (AHA-1510 or disabled BIOS)
-
--DSETUP0="{ IOPORT, IRQ, SCSI_ID, RECONNECT, PARITY, SYNCHRONOUS, DELAY, EXT_TRANS }"
- override for the first controller
-
--DSETUP1="{ IOPORT, IRQ, SCSI_ID, RECONNECT, PARITY, SYNCHRONOUS, DELAY, EXT_TRANS }"
- override for the second controller
-
--DAHA152X_DEBUG
- enable debugging output
-
--DAHA152X_STAT
- enable some statistics
-
-
-LILO COMMAND LINE OPTIONS:
-
-aha152x=<IOPORT>[,<IRQ>[,<SCSI-ID>[,<RECONNECT>[,<PARITY>[,<SYNCHRONOUS>[,<DELAY> [,<EXT_TRANS]]]]]]]
-
- The normal configuration can be overridden by specifying a command line.
- When you do this, the BIOS test is skipped. Entered values have to be
- valid (known). Don't use values that aren't supported under normal
- operation. If you think that you need other values: contact me.
- For two controllers use the aha152x statement twice.
-
-
-SYMBOLS FOR MODULE CONFIGURATION:
-
-Choose from 2 alternatives:
-
-1. specify everything (old)
-
-aha152x=IOPORT,IRQ,SCSI_ID,RECONNECT,PARITY,SYNCHRONOUS,DELAY,EXT_TRANS
- configuration override for first controller
-
-
-aha152x1=IOPORT,IRQ,SCSI_ID,RECONNECT,PARITY,SYNCHRONOUS,DELAY,EXT_TRANS
- configuration override for second controller
-
-2. specify only what you need to (irq or io is required; new)
-
-io=IOPORT0[,IOPORT1]
- IOPORT for first and second controller
-
-irq=IRQ0[,IRQ1]
- IRQ for first and second controller
-
-scsiid=SCSIID0[,SCSIID1]
- SCSIID for first and second controller
-
-reconnect=RECONNECT0[,RECONNECT1]
- allow targets to disconnect for first and second controller
-
-parity=PAR0[PAR1]
- use parity for first and second controller
-
-sync=SYNCHRONOUS0[,SYNCHRONOUS1]
- enable synchronous transfers for first and second controller
-
-delay=DELAY0[,DELAY1]
- reset DELAY for first and second controller
-
-exttrans=EXTTRANS0[,EXTTRANS1]
- enable extended translation for first and second controller
-
-
-If you use both alternatives the first will be taken.
-
-
-NOTES ON EXT_TRANS:
-
-SCSI uses block numbers to address blocks/sectors on a device.
-The BIOS uses a cylinder/head/sector addressing scheme (C/H/S)
-scheme instead. DOS expects a BIOS or driver that understands this
-C/H/S addressing.
-
-The number of cylinders/heads/sectors is called geometry and is required
-as base for requests in C/H/S addressing. SCSI only knows about the
-total capacity of disks in blocks (sectors).
-
-Therefore the SCSI BIOS/DOS driver has to calculate a logical/virtual
-geometry just to be able to support that addressing scheme. The geometry
-returned by the SCSI BIOS is a pure calculation and has nothing to
-do with the real/physical geometry of the disk (which is usually
-irrelevant anyway).
-
-Basically this has no impact at all on Linux, because it also uses block
-instead of C/H/S addressing. Unfortunately C/H/S addressing is also used
-in the partition table and therefore every operating system has to know
-the right geometry to be able to interpret it.
-
-Moreover there are certain limitations to the C/H/S addressing scheme,
-namely the address space is limited to up to 255 heads, up to 63 sectors
-and a maximum of 1023 cylinders.
-
-The AHA-1522 BIOS calculates the geometry by fixing the number of heads
-to 64, the number of sectors to 32 and by calculating the number of
-cylinders by dividing the capacity reported by the disk by 64*32 (1 MB).
-This is considered to be the default translation.
-
-With respect to the limit of 1023 cylinders using C/H/S you can only
-address the first GB of your disk in the partition table. Therefore
-BIOSes of some newer controllers based on the AIC-6260/6360 support
-extended translation. This means that the BIOS uses 255 for heads,
-63 for sectors and then divides the capacity of the disk by 255*63
-(about 8 MB), as soon it sees a disk greater than 1 GB. That results
-in a maximum of about 8 GB addressable diskspace in the partition table
-(but there are already bigger disks out there today).
-
-To make it even more complicated the translation mode might/might
-not be configurable in certain BIOS setups.
-
-This driver does some more or less failsafe guessing to get the
-geometry right in most cases:
-
-- for disks<1GB: use default translation (C/32/64)
-
-- for disks>1GB:
- - take current geometry from the partition table
- (using scsicam_bios_param and accept only `valid' geometries,
- ie. either (C/32/64) or (C/63/255)). This can be extended translation
- even if it's not enabled in the driver.
-
- - if that fails, take extended translation if enabled by override,
- kernel or module parameter, otherwise take default translation and
- ask the user for verification. This might on not yet partitioned
- disks.
-
-
-REFERENCES USED:
-
- "AIC-6260 SCSI Chip Specification", Adaptec Corporation.
-
- "SCSI COMPUTER SYSTEM INTERFACE - 2 (SCSI-2)", X3T9.2/86-109 rev. 10h
-
- "Writing a SCSI device driver for Linux", Rik Faith (faith@cs.unc.edu)
-
- "Kernel Hacker's Guide", Michael K. Johnson (johnsonm@sunsite.unc.edu)
-
- "Adaptec 1520/1522 User's Guide", Adaptec Corporation.
-
- Michael K. Johnson (johnsonm@sunsite.unc.edu)
-
- Drew Eckhardt (drew@cs.colorado.edu)
-
- Eric Youngdale (eric@andante.org)
-
- special thanks to Eric Youngdale for the free(!) supplying the
- documentation on the chip.
diff --git a/Documentation/scsi/source/conf.py b/Documentation/scsi/source/conf.py
new file mode 100644
index 000000000000..8f60483b49fb
--- /dev/null
+++ b/Documentation/scsi/source/conf.py
@@ -0,0 +1,52 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# http://www.sphinx-doc.org/en/master/config
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'doc'
+copyright = '2019, sushma'
+author = 'sushma'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+]
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'alabaster'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
diff --git a/Documentation/scsi/source/index.rst b/Documentation/scsi/source/index.rst
new file mode 100644
index 000000000000..003259e30a59
--- /dev/null
+++ b/Documentation/scsi/source/index.rst
@@ -0,0 +1,22 @@
+.. doc documentation master file, created by
+ sphinx-quickstart on Mon Jul 1 11:21:20 2019.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+SCSI Subsystem
+===============================
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Contents:
+
+ aha152x
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
--
2.17.1
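
For readers following the EXT_TRANS notes in the aha152x patch above, the
two translations can be sketched in a few lines of C. This is only an
illustration under stated assumptions: 512-byte blocks, and the hypothetical
names chs_geometry and guess_geometry. It is not the driver's code, which
also consults the partition table through scsicam_bios_param, and it does
not clamp the cylinder count to the 1023 limit.

```c
#include <stdint.h>

/* Hypothetical container for a logical C/H/S geometry. */
struct chs_geometry {
	uint32_t cylinders;
	uint32_t heads;
	uint32_t sectors;
};

/* 1 GB expressed in 512-byte blocks: 1024 cylinders * 64 heads * 32 sectors. */
#define BLOCKS_PER_GB (1024u * 64u * 32u)

/*
 * Default translation: 64 heads, 32 sectors (1 MB per cylinder).
 * Extended translation: 255 heads, 63 sectors (about 8 MB per cylinder),
 * used here only when ext_trans is set and the disk is larger than 1 GB.
 */
static inline struct chs_geometry guess_geometry(uint64_t capacity_blocks,
						 int ext_trans)
{
	struct chs_geometry g;

	if (ext_trans && capacity_blocks > BLOCKS_PER_GB) {
		g.heads = 255;
		g.sectors = 63;
	} else {
		g.heads = 64;
		g.sectors = 32;
	}
	g.cylinders = (uint32_t)(capacity_blocks / (g.heads * g.sectors));
	return g;
}
```

With default translation a 4 GB disk would need 4096 cylinders, which is why
the 1023-cylinder limit only covers the first GB; extended translation brings
the same disk down to 522 cylinders.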
^ permalink raw reply related
* Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
From: qi.fuli @ 2019-07-03 2:45 UTC (permalink / raw)
To: Will Deacon, qi.fuli@fujitsu.com
Cc: Will Deacon, indou.takao@fujitsu.com, linux-doc@vger.kernel.org,
peterz@infradead.org, Catalin Marinas, Jonathan Corbet,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190627102724.vif6zh6zfqktpmjx@willie-the-truck>
Hi Will,
Thanks for your comments.
On 6/27/19 7:27 PM, Will Deacon wrote:
> On Mon, Jun 24, 2019 at 10:34:02AM +0000, qi.fuli@fujitsu.com wrote:
>> On 6/18/19 2:03 AM, Will Deacon wrote:
>>> On Mon, Jun 17, 2019 at 11:32:53PM +0900, Takao Indoh wrote:
>>>> From: Takao Indoh <indou.takao@fujitsu.com>
>>>>
>>>> I found a performance issue related to the implementation of Linux's TLB
>>>> flush for arm64.
>>>>
>>>> When I run a single-threaded test program in a moderate environment, it
>>>> usually takes 39ms to finish its work. However, when I put a small
>>>> application, which just calls mprotect() continuously, on one of the sibling
>>>> cores and run it simultaneously, the test program slows down significantly.
>>>> It becomes 49ms(125%) on ThunderX2. I also detected the same problem on
>>>> ThunderX1 and Fujitsu A64FX.
>>> This is a problem for any applications that share hardware resources with
>>> each other, so I don't think it's something we should be too concerned about
>>> addressing unless there is a practical DoS scenario, which there doesn't
>>> appear to be in this case. It may be that the real answer is "don't call
>>> mprotect() in a loop".
>> I think there has been a misunderstanding, please let me explain.
>> This application is just an example used for reproducing the
>> performance issue we found.
>> Our original purpose is reducing OS jitter by this series.
>> The OS jitter on massively parallel processing systems has been known
>> and studied for many years.
>> The 2.5% OS jitter can result in over a factor of 20 slowdown for the
>> same application [1].
> I think it's worth pointing out that the system in question was neither
> ARM-based nor running Linux, so I'd be cautious in applying the conclusions
> of that paper directly to our TLB invalidation code. Furthermore, the noise
> being generated in their experiments uses a timer interrupt, which has a
> /vastly/ different profile to a DVM message in terms of both system impact
> and frequency.
My original purpose in referencing this paper was to explain that OS
jitter is a vital issue for large-scale HPC environments.
Please allow me to describe the issue that occurred in our HPC
environment.
We used FWQ [1] to run an experiment on one node of our HPC environment.
We expected a maximum OS jitter of tens of microseconds, but it was
hundreds of microseconds, which didn't meet our requirement. We tried to
find the cause by using ftrace, but we could not find any process that
would cause the noise; we only saw that the processing time was extended.
Then we checked the CPU instruction count through the CPU PMU, and we
didn't find any changes there either.
However, we found that the noise increased as the number of TLB flush
calls increased. From this we understood that the cause of the issue is
the implementation of Linux's TLB flush for arm64, especially the use of
the TLBI-is instruction, which is broadcast to all processor cores on
the system. Therefore, we made this patch set to fix the issue. After
testing several times, the noise was reduced and our original goal was
achieved, so we do think this patch makes sense.
As I mentioned, OS jitter is a vital issue for large-scale HPC
environments.
We tried a lot of things to reduce OS jitter. One of them is task
separation between the CPUs that are used for computing and the CPUs
that are used for maintenance. All of the daemon processes and I/O
interrupts are bound to the maintenance CPUs. Furthermore, we used
nohz_full to avoid the noise caused by interrupting the computing CPUs,
but all of the CPUs were still affected by the TLBI-is instruction, so
the task separation of CPUs didn't work. Therefore, we would like to use
this patch set so that a TLB flush is done on a minimal set of CPUs,
reducing OS jitter.
[1] https://asc.llnl.gov/sequoia/benchmarks/FTQ_summary_v1.1.pdf
Thanks,
QI Fuli
>> Though it may be an extreme example, reducing the OS jitter has been an
>> issue in HPC environment.
>>
>> [1] Ferreira, Kurt B., Patrick Bridges, and Ron Brightwell.
>> "Characterizing application sensitivity to OS interference using
>> kernel-level noise injection." Proceedings of the 2008 ACM/IEEE
>> conference on Supercomputing. IEEE Press, 2008.
>>
>>>> I suppose the root cause of this issue is the implementation of Linux's TLB
>>>> flush for arm64, especially use of TLBI-is instruction which is a broadcast
>>>> to all processor core on the system. In case of the above situation,
>>>> TLBI-is is called by mprotect().
>>> On the flip side, Linux is providing the hardware with enough information
>>> not to broadcast to cores for which the remote TLBs don't have entries
>>> allocated for the ASID being invalidated. I would say that the root cause
>>> of the issue is that this filtering is not taking place.
>> Do you mean that the filter should be implemented in hardware?
> Yes. If you're building a large system and you care about "jitter", then
> you either need to partition it in such a way that sources of noise are
> contained, or you need to introduce filters to limit their scope. Rewriting
> the low-level memory-management parts of the operating system is a red
> herring and imposes a needless burden on everybody else without solving
> the real problem, which is that contended use of shared resources doesn't
> scale.
>
> Will
^ permalink raw reply
* [PATCH] MAINTAINERS: Update for Intel Speed Select Technology
From: Srinivas Pandruvada @ 2019-07-03 1:53 UTC (permalink / raw)
To: dvhart, andy, andriy.shevchenko, corbet
Cc: rjw, alan, lenb, prarit, darcari, linux-doc, linux-kernel,
platform-driver-x86, Srinivas Pandruvada
Added myself as the maintainer.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
MAINTAINERS | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 5cfbea4ce575..b6ed7958372d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8101,6 +8101,14 @@ S: Supported
F: drivers/infiniband/hw/i40iw/
F: include/uapi/rdma/i40iw-abi.h
+INTEL SPEED SELECT TECHNOLOGY
+M: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+L: platform-driver-x86@vger.kernel.org
+S: Maintained
+F: drivers/platform/x86/intel_speed_select_if/
+F: tools/power/x86/intel-speed-select/
+F: include/uapi/linux/isst_if.h
+
INTEL TELEMETRY DRIVER
M: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
M: "David E. Box" <david.e.box@linux.intel.com>
--
2.17.2
^ permalink raw reply related
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Waiman Long @ 2019-07-02 20:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook,
Johannes Weiner, Michal Hocko, Vladimir Davydov, linux-mm,
linux-doc, linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190702130318.39d187dc27dbdd9267788165@linux-foundation.org>
On 7/2/19 4:03 PM, Andrew Morton wrote:
> On Tue, 2 Jul 2019 14:37:30 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> Currently, a value of '1' is written to /sys/kernel/slab/<slab>/shrink
>> file to shrink the slab by flushing all the per-cpu slabs and free
>> slabs in partial lists. This applies only to the root caches, though.
>>
>> Extends this capability by shrinking all the child memcg caches and
>> the root cache when a value of '2' is written to the shrink sysfs file.
> Why?
>
> Please fully describe the value of the proposed feature to our users.
> Always.
Sure. Essentially, the sysfs shrink interface is not complete. It allows
the root cache to be shrunk, but not any of the memcg caches.
The same can also be said for other slab sysfs files which show current
cache status. I don't think sysfs files are created for the memcg
caches, but I may be wrong. In many cases, information can be available
elsewhere like the slabinfo file. The shrink operation, however, has no
other alternative available.
>> ...
>>
>> --- a/Documentation/ABI/testing/sysfs-kernel-slab
>> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
>> @@ -429,10 +429,12 @@ KernelVersion: 2.6.22
>> Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
>> Christoph Lameter <cl@linux-foundation.org>
>> Description:
>> - The shrink file is written when memory should be reclaimed from
>> - a cache. Empty partial slabs are freed and the partial list is
>> - sorted so the slabs with the fewest available objects are used
>> - first.
>> + A value of '1' is written to the shrink file when memory should
>> + be reclaimed from a cache. Empty partial slabs are freed and
>> + the partial list is sorted so the slabs with the fewest
>> + available objects are used first. When a value of '2' is
>> + written, all the corresponding child memory cgroup caches
>> + should be shrunk as well. All other values are invalid.
> One would expect this to be a bitfield, like /proc/sys/vm/drop_caches.
> So writing 3 does both forms of shrinking.
>
> Yes, it happens to be the case that 2 is a superset of 1, but what
> about if we add "4"?
>
Yes, I can make it into a bitfield of 2 bits, just like
/proc/sys/vm/drop_caches.
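
A minimal userspace sketch of what such a 2-bit bitfield could look like
when parsing the written value. The flag names SHRINK_ROOT and SHRINK_MEMCG
and the helper parse_shrink_flags are hypothetical and not part of the
patch; treating 0 as invalid is also an assumption.

```c
#include <errno.h>
#include <stdlib.h>

#define SHRINK_ROOT	(1u << 0)	/* hypothetical: shrink the root cache */
#define SHRINK_MEMCG	(1u << 1)	/* hypothetical: shrink child memcg caches */
#define SHRINK_ALL	(SHRINK_ROOT | SHRINK_MEMCG)

/*
 * Parse the decimal string written to the shrink file.
 * Returns the flag mask (1, 2 or 3) on success, or -EINVAL if the input
 * is not a number, is 0, or has any bit set outside SHRINK_ALL.
 */
static inline int parse_shrink_flags(const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')	/* tolerate the newline added by echo */
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val == 0 || (val & ~(unsigned long)SHRINK_ALL))
		return -EINVAL;
	return (int)val;
}
```

With this layout, writing 3 requests both forms of shrinking, and a future
"4" would simply claim the next bit without changing the meaning of 1 and 2.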
Cheers,
Longman
^ permalink raw reply
* Re: [PATCH v6 1/6] ARM: Add TTBR operator for kasan_init
From: Linus Walleij @ 2019-07-02 21:03 UTC (permalink / raw)
To: Florian Fainelli, Russell King
Cc: Linux ARM, bcm-kernel-feedback-list, Abbott Liu, Andrey Ryabinin,
Alexander Potapenko, Dmitry Vyukov, Jonathan Corbet, Russell King,
christoffer.dall, Marc Zyngier, Arnd Bergmann, Nicolas Pitre,
Vladimir Murzin, Kees Cook, jinb.park7, Alexandre Belloni,
Ard Biesheuvel, Daniel Lezcano, Philippe Ombredanne, Rob Landley,
Greg KH, Andrew Morton, Mark Rutland, Catalin Marinas,
Masahiro Yamada, Thomas Gleixner, thgarnie, David Howells,
Geert Uytterhoeven, Andre Przywara, julien.thierry, drjones,
philip, mhocko, kirill.shutemov, kasan-dev,
Linux Doc Mailing List, linux-kernel@vger.kernel.org, kvmarm,
Andrey Ryabinin
In-Reply-To: <20190617221134.9930-2-f.fainelli@gmail.com>
Hi Florian!
thanks for your patch!
On Tue, Jun 18, 2019 at 12:11 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
> From: Abbott Liu <liuwenliang@huawei.com>
>
> The purpose of this patch is to provide set_ttbr0/get_ttbr0 to
> kasan_init function. The definitions of cp15 registers should be in
> arch/arm/include/asm/cp15.h rather than arch/arm/include/asm/kvm_hyp.h,
> so move them.
>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Reported-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> +#include <linux/stringify.h>
What is this for? I think it can be dropped.
This stuff adding a whole bunch of accessors:
> +static inline void set_par(u64 val)
> +{
> + if (IS_ENABLED(CONFIG_ARM_LPAE))
> + write_sysreg(val, PAR_64);
> + else
> + write_sysreg(val, PAR_32);
> +}
Can we put that in a separate patch since it is not
adding any users, so this is a pure refactoring patch for
the current code?
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Andrew Morton @ 2019-07-02 21:33 UTC (permalink / raw)
To: Waiman Long
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook,
Johannes Weiner, Michal Hocko, Vladimir Davydov, linux-mm,
linux-doc, linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <78879b79-1b8f-cdfd-d4fa-610afe5e5d48@redhat.com>
On Tue, 2 Jul 2019 16:44:24 -0400 Waiman Long <longman@redhat.com> wrote:
> On 7/2/19 4:03 PM, Andrew Morton wrote:
> > On Tue, 2 Jul 2019 14:37:30 -0400 Waiman Long <longman@redhat.com> wrote:
> >
> >> Currently, a value of '1' is written to /sys/kernel/slab/<slab>/shrink
> >> file to shrink the slab by flushing all the per-cpu slabs and free
> >> slabs in partial lists. This applies only to the root caches, though.
> >>
> >> Extends this capability by shrinking all the child memcg caches and
> >> the root cache when a value of '2' is written to the shrink sysfs file.
> > Why?
> >
> > Please fully describe the value of the proposed feature to our users.
> > Always.
>
> Sure. Essentially, the sysfs shrink interface is not complete. It allows
> the root cache to be shrunk, but not any of the memcg caches.
But that doesn't describe anything of value. Who wants to use this,
and why? How will it be used? What are the use-cases?
^ permalink raw reply
* Re: [PATCH v5 07/18] kunit: test: add initial tests
From: Luis Chamberlain @ 2019-07-02 20:57 UTC (permalink / raw)
To: Brendan Higgins
Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
Peter Zijlstra, Rob Herring, Stephen Boyd, shuah,
Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <CAFd5g46=7OQDREdLDTiMgVWq-Xj2zfOw8cRhPJEihSbO89MDyA@mail.gmail.com>
On Tue, Jul 02, 2019 at 10:52:50AM -0700, Brendan Higgins wrote:
> On Wed, Jun 26, 2019 at 12:53 AM Brendan Higgins
> <brendanhiggins@google.com> wrote:
> >
> > On Tue, Jun 25, 2019 at 4:22 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 01:26:02AM -0700, Brendan Higgins wrote:
> > > > diff --git a/kunit/example-test.c b/kunit/example-test.c
> > > > new file mode 100644
> > > > index 0000000000000..f44b8ece488bb
> > > > --- /dev/null
> > > > +++ b/kunit/example-test.c
> > >
> > > <-- snip -->
> > >
> > > > +/*
> > > > + * This defines a suite or grouping of tests.
> > > > + *
> > > > + * Test cases are defined as belonging to the suite by adding them to
> > > > + * `kunit_cases`.
> > > > + *
> > > > + * Often it is desirable to run some function which will set up things which
> > > > + * will be used by every test; this is accomplished with an `init` function
> > > > + * which runs before each test case is invoked. Similarly, an `exit` function
> > > > > + * may be specified which runs after every test case and can be used for
> > > > + * cleanup. For clarity, running tests in a test module would behave as follows:
> > > > + *
> > >
> > > To be clear this is not the kernel module init, but rather the kunit
> > > module init. I think using kmodule would make this clearer to a reader.
> >
> > Seems reasonable. Will fix in next revision.
> >
> > > > + * module.init(test);
> > > > + * module.test_case[0](test);
> > > > + * module.exit(test);
> > > > + * module.init(test);
> > > > + * module.test_case[1](test);
> > > > + * module.exit(test);
> > > > + * ...;
> > > > + */
>
> Do you think it might be clearer yet to rename `struct kunit_module
> *module;` to `struct kunit_suite *suite;`?
Yes. Definitely. Or struct kunit_test. Up to you.
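
To illustrate the ordering described in the quoted comment with the
suggested `suite` naming, here is a small userspace sketch. It is not the
KUnit API: struct suite, struct test_ctx, run_suite and the demo_*
functions are all hypothetical names used only to show that init and exit
wrap every single test case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-test context that records the order of calls. */
struct test_ctx {
	char log[32];
	size_t len;
};

typedef void (*case_fn)(struct test_ctx *ctx);

/* Hypothetical stand-in for the suite structure discussed above. */
struct suite {
	void (*init)(struct test_ctx *ctx);
	void (*exit)(struct test_ctx *ctx);
	const case_fn *cases;	/* NULL-terminated array of test cases */
};

static inline void record(struct test_ctx *ctx, char c)
{
	if (ctx->len + 1 < sizeof(ctx->log))
		ctx->log[ctx->len++] = c;
}

static inline void demo_init(struct test_ctx *ctx)  { record(ctx, 'I'); }
static inline void demo_exit(struct test_ctx *ctx)  { record(ctx, 'E'); }
static inline void demo_case0(struct test_ctx *ctx) { record(ctx, '0'); }
static inline void demo_case1(struct test_ctx *ctx) { record(ctx, '1'); }

/* Run init, one test case, then exit, and repeat for every case. */
static inline void run_suite(const struct suite *suite, struct test_ctx *ctx)
{
	size_t i;

	for (i = 0; suite->cases[i]; i++) {
		if (suite->init)
			suite->init(ctx);
		suite->cases[i](ctx);
		if (suite->exit)
			suite->exit(ctx);
	}
}

/* Returns the recorded call order for a two-case demo suite. */
static inline const char *demo_order(void)
{
	static struct test_ctx ctx;
	static const case_fn cases[] = { demo_case0, demo_case1, NULL };
	const struct suite suite = { demo_init, demo_exit, cases };

	memset(&ctx, 0, sizeof(ctx));
	run_suite(&suite, &ctx);
	return ctx.log;
}
```

The recorded order is init, case 0, exit, init, case 1, exit, which matches
the sequence listed in the comment.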
Luis
^ permalink raw reply
* Re: [PATCH v6 0/6] KASan for arm
From: Linus Walleij @ 2019-07-02 21:06 UTC (permalink / raw)
To: Florian Fainelli
Cc: Linux ARM, bcm-kernel-feedback-list, Alexander Potapenko,
Dmitry Vyukov, Jonathan Corbet, Russell King, christoffer.dall,
Marc Zyngier, Arnd Bergmann, Nicolas Pitre, Vladimir Murzin,
Kees Cook, jinb.park7, Alexandre Belloni, Ard Biesheuvel,
Daniel Lezcano, Philippe Ombredanne, liuwenliang, Rob Landley,
Greg KH, Andrew Morton, Mark Rutland, Catalin Marinas,
Masahiro Yamada, Thomas Gleixner, thgarnie, David Howells,
Geert Uytterhoeven, Andre Przywara, julien.thierry, drjones,
philip, mhocko, kirill.shutemov, kasan-dev,
Linux Doc Mailing List, linux-kernel@vger.kernel.org, kvmarm,
Andrey Ryabinin
In-Reply-To: <20190617221134.9930-1-f.fainelli@gmail.com>
Hi Florian,
On Tue, Jun 18, 2019 at 12:11 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
> Abbott submitted a v5 about a year ago here:
>
> and the series was not picked up since then, so I rebased it against
> v5.2-rc4 and re-tested it on a Brahma-B53 (ARMv8 running AArch32 mode)
> and Brahma-B15, both LPAE and test-kasan is consistent with the ARM64
> counter part.
>
> We were in a fairly good shape last time with a few different people
> having tested it, so I am hoping we can get that included for 5.4 if
> everything goes well.
Thanks for picking this up. I was trying out KASan in the past,
got sidetracked and honestly lost interest a bit because it was
boring. But I do realize that it is really neat, so I will try to help
out with some review and test on a bunch of hardware I have.
At one point I even had this running on the ARMv4 SA1100
(no joke!) and if I recall correctly, I got stuck because of things
that might very well have been related to using a very fragile
Arm testchip that later broke down completely in the l2cache
when we added the spectre/meltdown fixes.
I will start reviewing and testing.
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Andrew Morton @ 2019-07-02 20:03 UTC (permalink / raw)
To: Waiman Long
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook,
Johannes Weiner, Michal Hocko, Vladimir Davydov, linux-mm,
linux-doc, linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190702183730.14461-1-longman@redhat.com>
On Tue, 2 Jul 2019 14:37:30 -0400 Waiman Long <longman@redhat.com> wrote:
> Currently, a value of '1' is written to /sys/kernel/slab/<slab>/shrink
> file to shrink the slab by flushing all the per-cpu slabs and free
> slabs in partial lists. This applies only to the root caches, though.
>
> Extends this capability by shrinking all the child memcg caches and
> the root cache when a value of '2' is written to the shrink sysfs file.
Why?
Please fully describe the value of the proposed feature to our users.
Always.
>
> ...
>
> --- a/Documentation/ABI/testing/sysfs-kernel-slab
> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
> @@ -429,10 +429,12 @@ KernelVersion: 2.6.22
> Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
> Christoph Lameter <cl@linux-foundation.org>
> Description:
> - The shrink file is written when memory should be reclaimed from
> - a cache. Empty partial slabs are freed and the partial list is
> - sorted so the slabs with the fewest available objects are used
> - first.
> + A value of '1' is written to the shrink file when memory should
> + be reclaimed from a cache. Empty partial slabs are freed and
> + the partial list is sorted so the slabs with the fewest
> + available objects are used first. When a value of '2' is
> + written, all the corresponding child memory cgroup caches
> + should be shrunk as well. All other values are invalid.
One would expect this to be a bitfield, like /proc/sys/vm/drop_caches.
So writing 3 does both forms of shrinking.
Yes, it happens to be the case that 2 is a superset of 1, but what
about if we add "4"?
^ permalink raw reply
* Re: [PATCH v6 2/6] ARM: Disable instrumentation for some code
From: Linus Walleij @ 2019-07-02 21:56 UTC (permalink / raw)
To: Florian Fainelli
Cc: Linux ARM, bcm-kernel-feedback-list, Andrey Ryabinin, Abbott Liu,
Alexander Potapenko, Dmitry Vyukov, Jonathan Corbet, Russell King,
christoffer.dall, Marc Zyngier, Arnd Bergmann, Nicolas Pitre,
Vladimir Murzin, Kees Cook, jinb.park7, Alexandre Belloni,
Ard Biesheuvel, Daniel Lezcano, Philippe Ombredanne, Rob Landley,
Greg KH, Andrew Morton, Mark Rutland, Catalin Marinas,
Masahiro Yamada, Thomas Gleixner, thgarnie, David Howells,
Geert Uytterhoeven, Andre Przywara, julien.thierry, drjones,
philip, mhocko, kirill.shutemov, kasan-dev,
Linux Doc Mailing List, linux-kernel@vger.kernel.org, kvmarm,
Andrey Ryabinin
In-Reply-To: <20190617221134.9930-3-f.fainelli@gmail.com>
On Tue, Jun 18, 2019 at 12:11 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
> @@ -236,7 +236,8 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
> if (*vsp >= (unsigned long *)ctrl->sp_high)
> return -URC_FAILURE;
>
> - ctrl->vrs[reg] = *(*vsp)++;
> + ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> + (*vsp)++;
I would probably even put in a comment here so it is clear why we
do this. Passers-by may not know that READ_ONCE_NOCHECK() is
even related to KASan.
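
Something along these lines, as a userspace sketch only. The macro below is
a stand-in that just performs a volatile read (the real READ_ONCE_NOCHECK()
lives in the kernel headers), pop_word is a hypothetical helper name, and
the wording of the comment is my assumption about the rationale:

```c
/* Userspace stand-in for the kernel macro: only a volatile read here. */
#define READ_ONCE_NOCHECK(x) (*(const volatile unsigned long *)&(x))

/* Pop one word from the virtual stack pointer and advance it. */
static inline unsigned long pop_word(unsigned long **vsp)
{
	/*
	 * The unwinder walks stack memory that belongs to other frames,
	 * so KASan could flag this access. Use an uninstrumented read
	 * and advance the pointer in a separate statement.
	 */
	unsigned long val = READ_ONCE_NOCHECK(*(*vsp));

	(*vsp)++;
	return val;
}
```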
Other than that,
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH] Documentation: misc-devices: mei: Convert mei txt files to reST
From: Shreeya Patel @ 2019-07-02 19:57 UTC (permalink / raw)
To: Winkler, Tomas, skhan@linuxfoundation.org, corbet@lwn.net,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kernel-mentees@lists.linuxfoundation.org
In-Reply-To: <5B8DA87D05A7694D9FA63FD143655C1B9DC547D0@hasmsx108.ger.corp.intel.com>
On Sun, 2019-06-30 at 06:23 +0000, Winkler, Tomas wrote:
> > -----Original Message-----
> > From: Shreeya Patel [mailto:shreeya.patel23498@gmail.com]
> > Sent: Sunday, June 30, 2019 00:32
> > To: skhan@linuxfoundation.org; corbet@lwn.net; Winkler, Tomas
> > <tomas.winkler@intel.com>; linux-doc@vger.kernel.org; linux-
> > kernel@vger.kernel.org;
> > linux-kernel-mentees@lists.linuxfoundation.org
> > Subject: [PATCH] Documentation: misc-devices: mei: Convert mei txt
> > files to
> > reST
> >
> > Convert the MEI misc device's documentation files from .txt to
> > reStructuredText format. Make a minor change of correcting the
> > wrong macro
> > name MEI_CONNECT_CLIENT_IOCTL to IOCTL_MEI_CONNECT_CLIENT.
> > Add an index file in mei as there are two sections for it in the
> > documentation.
> >
> > Signed-off-by: Shreeya Patel <shreeya.patel23498@gmail.com>
> > ---
>
> Sorry you are late, we've already done that, it should be merged via
> Greg's char-misc tree.
> Thanks
> Tomas
>
Oh okay.
Thanks
>
>
> > I am not sure if I have placed the Documentation in the right place
> > so I would
> > like to get some suggestions from the MAINTAINERS on this part.
> >
> > Documentation/misc-devices/index.rst | 1 +
> > Documentation/misc-devices/mei/index.rst | 15 +
> > .../misc-devices/mei/mei-client-bus.rst | 151 +++++++++
> > .../misc-devices/mei/mei-client-bus.txt | 141 ---------
> > Documentation/misc-devices/mei/mei.rst | 289
> > ++++++++++++++++++
> > Documentation/misc-devices/mei/mei.txt | 266 --------------
> > --
> > 6 files changed, 456 insertions(+), 407 deletions(-) create mode
> > 100644
> > Documentation/misc-devices/mei/index.rst
> > create mode 100644 Documentation/misc-devices/mei/mei-client-
> > bus.rst
> > delete mode 100644 Documentation/misc-devices/mei/mei-client-
> > bus.txt
> > create mode 100644 Documentation/misc-devices/mei/mei.rst
> > delete mode 100644 Documentation/misc-devices/mei/mei.txt
> >
> > diff --git a/Documentation/misc-devices/index.rst
> > b/Documentation/misc-
> > devices/index.rst
> > index dfd1f45a3127..e788a12b2b19 100644
> > --- a/Documentation/misc-devices/index.rst
> > +++ b/Documentation/misc-devices/index.rst
> > @@ -15,3 +15,4 @@ fit into other categories.
> > :maxdepth: 2
> >
> > ibmvmc
> > + mei/index
> > diff --git a/Documentation/misc-devices/mei/index.rst
> > b/Documentation/misc-
> > devices/mei/index.rst
> > new file mode 100644
> > index 000000000000..3018098ad075
> > --- /dev/null
> > +++ b/Documentation/misc-devices/mei/index.rst
> > @@ -0,0 +1,15 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=================================================================
> > +Intel(R) Management Engine Interface Kernel Driver (Intel(R) MEI)
> > +=================================================================
> > +
> > +.. class:: toc-title
> > +
> > + Table of contents
> > +
> > +.. toctree::
> > + :maxdepth: 2
> > +
> > + mei
> > + mei-client-bus
> > diff --git a/Documentation/misc-devices/mei/mei-client-bus.rst
> > b/Documentation/misc-devices/mei/mei-client-bus.rst
> > new file mode 100644
> > index 000000000000..82d455afae78
> > --- /dev/null
> > +++ b/Documentation/misc-devices/mei/mei-client-bus.rst
> > @@ -0,0 +1,151 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +==============================================
> > +Intel(R) Management Engine (ME) Client bus API
> > +==============================================
> > +
> > +
> > +Rationale
> > +=========
> > +
> > +MEI misc character device is useful for dedicated applications to
> > send
> > +and receive data to the many FW appliance found in Intel's ME from
> > the user
> > space.
> > +However for some of the ME functionalities it make sense to
> > leverage
> > +existing software stack and expose them through existing kernel
> > subsystems.
> > +
> > +In order to plug seamlessly into the kernel device driver model we
> > add
> > +kernel virtual bus abstraction on top of the MEI driver. This
> > allows
> > +implementing linux kernel drivers for the various MEI features as
> > a stand
> > alone entities found in their respective subsystem.
> > +Existing device drivers can even potentially be re-used by adding
> > an
> > +MEI CL bus layer to the existing code.
> > +
> > +
> > +MEI CL bus API
> > +==============
> > +
> > +A driver implementation for an MEI Client is very similar to
> > existing
> > +bus based device drivers. The driver registers itself as an MEI CL
> > bus
> > +driver through the :c:type:`mei_cl_driver` structure:
> > +
> > +::
> > +
> > + struct mei_cl_driver {
> > + struct device_driver driver;
> > + const char *name;
> > +
> > + const struct mei_cl_device_id *id_table;
> > +
> > + int (*probe)(struct mei_cl_device *dev, const struct
> > mei_cl_id
> > *id);
> > + int (*remove)(struct mei_cl_device *dev);
> > + };
> > +
> > + struct mei_cl_id {
> > + char name[MEI_NAME_SIZE];
> > + kernel_ulong_t driver_info;
> > + };
> > +
> > +
> > +The :c:type:`mei_cl_id` structure allows the driver to bind itself
> > against a
> > device name.
> > +
> > +To actually register a driver on the ME Client bus one must call
> > the
> > +:c:func:`mei_cl_add_driver()` API. This is typically called at
> > module init time.
> > +
> > +Once registered on the ME Client bus, a driver will typically try
> > to do
> > +some I/O on this bus and this should be done through the
> > +:c:func:`mei_cl_send()` and :c:func:`mei_cl_recv()` routines. The
> > latter is
> > synchronous (blocks and sleeps until data shows up).
> > +In order for drivers to be notified of pending events waiting for
> > them (e.g.
> > +an Rx event) they can register an event handler through the
> > +:c:func:`mei_cl_register_event_cb()` routine. Currently only the
> > +:c:macro:`MEI_EVENT_RX` event will trigger an event handler call
> > and
> > +the driver implementation is supposed to call :c:func:`mei_recv()`
> > from
> > +the event handler in order to fetch the pending received buffers.
> > +
> > +
> > +Example
> > +=======
> > +
> > +As a theoretical example let's pretend the ME comes with a
> > "contact" NFC IP.
> > +The driver init and exit routines for this device would look like:
> > +
> > +::
> > +
> > + #define CONTACT_DRIVER_NAME "contact"
> > +
> > + static struct mei_cl_device_id contact_mei_cl_tbl[] = {
> > + { CONTACT_DRIVER_NAME, },
> > + /* required last entry */
> > + { }
> > + };
> > + MODULE_DEVICE_TABLE(mei_cl, contact_mei_cl_tbl);
> > +
> > + static struct mei_cl_driver contact_driver = {
> > + .id_table = contact_mei_cl_tbl,
> > + .name = CONTACT_DRIVER_NAME,
> > + .probe = contact_probe,
> > + .remove = contact_remove,
> > + };
> > +
> > + static int contact_init(void)
> > + {
> > + int r;
> > +
> > + r = mei_cl_driver_register(&contact_driver);
> > + if (r) {
> > + pr_err(CONTACT_DRIVER_NAME ": driver
> > registration
> > failed\n");
> > + return r;
> > + }
> > +
> > + return 0;
> > + }
> > +
> > + static void __exit contact_exit(void)
> > + {
> > + mei_cl_driver_unregister(&contact_driver);
> > + }
> > +
> > + module_init(contact_init);
> > + module_exit(contact_exit);
> > +
> > +And the driver's simplified probe routine would look like that:
> > +
> > +::
> > +
> > + int contact_probe(struct mei_cl_device *dev, struct
> > mei_cl_device_id
> > *id)
> > + {
> > + struct contact_driver *contact;
> > +
> > + [...]
> > + mei_cl_enable_device(dev);
> > +
> > + mei_cl_register_event_cb(dev, contact_event_cb,
> > contact);
> > +
> > + return 0;
> > + }
> > +
> > +In the probe routine the driver first enables the MEI device and
> > then
> > +registers an ME bus event handler which is as close as it can get
> > to
> > +registering a threaded IRQ handler.
> > +The handler implementation will typically call some I/O routine
> > +depending on the pending events:
> > +
> > +::
> > +
> > + #define MAX_NFC_PAYLOAD 128
> > +
> > + static void contact_event_cb(struct mei_cl_device *dev,
> > u32 events,
> > + void *context)
> > + {
> > + struct contact_driver *contact = context;
> > +
> > + if (events & BIT(MEI_EVENT_RX)) {
> > + u8 payload[MAX_NFC_PAYLOAD];
> > + int payload_size;
> > +
> > + payload_size = mei_recv(dev, payload,
> > MAX_NFC_PAYLOAD);
> > + if (payload_size <= 0)
> > + return;
> > +
> > + /* Hook to the NFC subsystem */
> > + nfc_hci_recv_frame(contact->hdev, payload,
> > payload_size);
> > + }
> > + }
> > diff --git a/Documentation/misc-devices/mei/mei-client-bus.txt
> > b/Documentation/misc-devices/mei/mei-client-bus.txt
> > deleted file mode 100644
> > index 743be4ec8989..000000000000
> > --- a/Documentation/misc-devices/mei/mei-client-bus.txt
> > +++ /dev/null
> > @@ -1,141 +0,0 @@
> > -Intel(R) Management Engine (ME) Client bus API -
> > ==============================================
> > -
> > -
> > -Rationale
> > -=========
> > -
> > -MEI misc character device is useful for dedicated applications to
> > send and
> > receive -data to the many FW appliance found in Intel's ME from the
> > user
> > space.
> > -However for some of the ME functionalities it make sense to
> > leverage existing
> > software -stack and expose them through existing kernel subsystems.
> > -
> > -In order to plug seamlessly into the kernel device driver model we
> > add kernel
> > virtual -bus abstraction on top of the MEI driver. This allows
> > implementing linux
> > kernel drivers -for the various MEI features as a stand alone
> > entities found in
> > their respective subsystem.
> > -Existing device drivers can even potentially be re-used by adding
> > an MEI CL
> > bus layer to -the existing code.
> > -
> > -
> > -MEI CL bus API
> > -==============
> > -
> > -A driver implementation for an MEI Client is very similar to
> > existing bus -based
> > device drivers. The driver registers itself as an MEI CL bus driver
> > through -the
> > mei_cl_driver structure:
> > -
> > -struct mei_cl_driver {
> > - struct device_driver driver;
> > - const char *name;
> > -
> > - const struct mei_cl_device_id *id_table;
> > -
> > - int (*probe)(struct mei_cl_device *dev, const struct mei_cl_id
> > *id);
> > - int (*remove)(struct mei_cl_device *dev);
> > -};
> > -
> > -struct mei_cl_id {
> > - char name[MEI_NAME_SIZE];
> > - kernel_ulong_t driver_info;
> > -};
> > -
> > -The mei_cl_id structure allows the driver to bind itself against a
> > device name.
> > -
> > -To actually register a driver on the ME Client bus one must call
> > the
> > mei_cl_add_driver() -API. This is typically called at module init
> > time.
> > -
> > -Once registered on the ME Client bus, a driver will typically try
> > to do some I/O
> > on -this bus and this should be done through the mei_cl_send() and
> > mei_cl_recv() -routines. The latter is synchronous (blocks and
> > sleeps until data
> > shows up).
> > -In order for drivers to be notified of pending events waiting for
> > them (e.g.
> > -an Rx event) they can register an event handler through the
> > -mei_cl_register_event_cb() routine. Currently only the
> > MEI_EVENT_RX event -
> > will trigger an event handler call and the driver implementation is
> > supposed -to
> > call mei_recv() from the event handler in order to fetch the
> > pending -received
> > buffers.
> > -
> > -
> > -Example
> > -=======
> > -
> > -As a theoretical example let's pretend the ME comes with a
> > "contact" NFC IP.
> > -The driver init and exit routines for this device would look like:
> > -
> > -#define CONTACT_DRIVER_NAME "contact"
> > -
> > -static struct mei_cl_device_id contact_mei_cl_tbl[] = {
> > - { CONTACT_DRIVER_NAME, },
> > -
> > - /* required last entry */
> > - { }
> > -};
> > -MODULE_DEVICE_TABLE(mei_cl, contact_mei_cl_tbl);
> > -
> > -static struct mei_cl_driver contact_driver = {
> > - .id_table = contact_mei_tbl,
> > - .name = CONTACT_DRIVER_NAME,
> > -
> > - .probe = contact_probe,
> > - .remove = contact_remove,
> > -};
> > -
> > -static int contact_init(void)
> > -{
> > - int r;
> > -
> > - r = mei_cl_driver_register(&contact_driver);
> > - if (r) {
> > - pr_err(CONTACT_DRIVER_NAME ": driver registration
> > failed\n");
> > - return r;
> > - }
> > -
> > - return 0;
> > -}
> > -
> > -static void __exit contact_exit(void)
> > -{
> > - mei_cl_driver_unregister(&contact_driver);
> > -}
> > -
> > -module_init(contact_init);
> > -module_exit(contact_exit);
> > -
> > -And the driver's simplified probe routine would look like that:
> > -
> > -int contact_probe(struct mei_cl_device *dev, struct
> > mei_cl_device_id *id) -{
> > - struct contact_driver *contact;
> > -
> > - [...]
> > - mei_cl_enable_device(dev);
> > -
> > - mei_cl_register_event_cb(dev, contact_event_cb, contact);
> > -
> > - return 0;
> > -}
> > -
> > -In the probe routine the driver first enable the MEI device and
> > then registers -
> > an ME bus event handler which is as close as it can get to
> > registering a -
> > threaded IRQ handler.
> > -The handler implementation will typically call some I/O routine
> > depending on -
> > the pending events:
> > -
> > -#define MAX_NFC_PAYLOAD 128
> > -
> > -static void contact_event_cb(struct mei_cl_device *dev, u32
> > events,
> > - void *context)
> > -{
> > - struct contact_driver *contact = context;
> > -
> > - if (events & BIT(MEI_EVENT_RX)) {
> > - u8 payload[MAX_NFC_PAYLOAD];
> > - int payload_size;
> > -
> > - payload_size = mei_recv(dev, payload, MAX_NFC_PAYLOAD);
> > - if (payload_size <= 0)
> > - return;
> > -
> > - /* Hook to the NFC subsystem */
> > - nfc_hci_recv_frame(contact->hdev, payload,
> > payload_size);
> > - }
> > -}
> > diff --git a/Documentation/misc-devices/mei/mei.rst
> > b/Documentation/misc-
> > devices/mei/mei.rst
> > new file mode 100644
> > index 000000000000..e91ac2570b4d
> > --- /dev/null
> > +++ b/Documentation/misc-devices/mei/mei.rst
> > @@ -0,0 +1,289 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +====================================
> > +Intel(R) Management Engine Interface
> > +====================================
> > +
> > +Introduction
> > +============
> > +
> > +The Intel Management Engine (Intel ME) is an isolated and
> > protected
> > +computing resource (Co-processor) residing inside certain Intel
> > +chipsets. The Intel ME provides support for computer/IT management
> > +features. The feature set depends on the Intel chipset SKU.
> > +
> > +The Intel Management Engine Interface (Intel MEI, previously known
> > as
> > +HECI) is the interface between the Host and Intel ME. This
> > interface is
> > +exposed to the host as a PCI device. The Intel MEI Driver is in
> > charge
> > +of the communication channel between a host application and the
> > Intel ME
> > feature.
> > +
> > +Each Intel ME feature (Intel ME Client) is addressed by a
> > GUID/UUID and
> > +each client has its own protocol. The protocol is message-based
> > with a
> > +header and payload up to 512 bytes.
> > +
> > +Prominent usage of the Intel ME Interface is to communicate with
> > +Intel(R) Active Management Technology (Intel AMT) implemented in
> > +firmware running on the Intel ME.
> > +
> > +Intel AMT provides the ability to manage a host remotely out-of-
> > band
> > +(OOB) even when the operating system running on the host processor
> > has
> > +crashed or is in a sleep state.
> > +
> > +Some examples of Intel AMT usage are:
> > + * Monitoring hardware state and platform components
> > + * Remote power off/on (useful for green computing or
> > overnight IT
> > + maintenance)
> > + * OS updates
> > + * Storage of useful platform information such as software
> > assets
> > + * Built-in hardware KVM
> > + * Selective network isolation of Ethernet and IP protocol
> > flows based
> > + on policies set by a remote management console
> > + * IDE device redirection from remote management console
> > +
> > +Intel AMT (OOB) communication is based on SOAP (deprecated
> > starting
> > +with Release 6.0) over HTTP/S or WS-Management protocol over
> > HTTP/S
> > +that are received from a remote management console application.
> > +
> > +For more information about Intel AMT:
> > +`<
> > http://software.intel.com/sites/manageability/AMT_Implementation_and_
> > +Reference_Guide>`_
> > +
> > +
> > +Intel MEI Driver
> > +================
> > +
> > +The driver exposes a misc device called :file:`/dev/mei`.
> > +
> > +An application maintains communication with an Intel ME feature
> > while
> > +:file:`/dev/mei` is open. The binding to a specific feature is
> > +performed by calling :c:macro:`IOCTL_MEI_CONNECT_CLIENT`, which
> > passes
> > the desired UUID.
> > +The number of instances of an Intel ME feature that can be opened
> > at
> > +the same time depends on the Intel ME feature, but most of the
> > features
> > +allow only a single instance.
> > +
> > +The Intel AMT Host Interface (Intel AMTHI) feature supports
> > multiple
> > +simultaneous user connected applications. The Intel MEI driver
> > handles
> > +this internally by maintaining request queues for the
> > applications.
> > +
> > +The driver is transparent to data that are passed between firmware
> > +feature and host application.
> > +
> > +Because some of the Intel ME features can change the system
> > +configuration, the driver by default allows only a privileged user
> > to
> > +access it.
> > +
> > +A code snippet for an application communicating with Intel AMTHI
> > client:
> > +
> > +::
> > +
> > + struct mei_connect_client_data data;
> > + fd = open(MEI_DEVICE);
> > +
> > + data.d.in_client_uuid = AMTHI_UUID;
> > +
> > + ioctl(fd, IOCTL_MEI_CONNECT_CLIENT, &data);
> > +
> > + printf("Ver=%d, MaxLen=%ld\n",
> > + data.d.in_client_uuid.protocol_version,
> > + data.d.in_client_uuid.max_msg_length);
> > +
> > + [...]
> > +
> > + write(fd, amthi_req_data, amthi_req_data_len);
> > +
> > + [...]
> > +
> > + read(fd, &amthi_res_data, amthi_res_data_len);
> > +
> > + [...]
> > +
> > + close(fd);
> > +
> > +
> > +IOCTL
> > +=====
> > +
> > +The Intel MEI Driver supports the following IOCTL commands:
> > +
> > +
> > +:c:macro:`IOCTL_MEI_CONNECT_CLIENT`
> > +-------------------------------------
> > +Connect to firmware Feature (client)
> > +
> > +**Usage:**
> > +
> > +::
> > +
> > + struct mei_connect_client_data clientData;
> > + ioctl(fd, IOCTL_MEI_CONNECT_CLIENT, &clientData);
> > +
> > +**Inputs:**
> > + :c:type:`mei_connect_client_data` - structure contain the
> > following
> > + input field.
> > +
> > + :c:data:`in_client_uuid` - UUID of the FW Feature that
> > needs to connect
> > to.
> > +
> > +**Outputs:**
> > + :c:data:`out_client_properties` - Client Properties: MTU
> > and Protocol
> > Version.
> > +
> > +**Error returns:**
> > + | :c:macro:`EINVAL` - Wrong IOCTL Number.
> > + | :c:macro:`ENODEV` - Device or Connection is not
> > initialized or ready.
> > (e.g. Wrong UUID).
> > + | :c:macro:`ENOMEM` - Unable to allocate memory to client
> > internal
> > data.
> > + | :c:macro:`EFAULT` - Fatal Error (e.g. Unable to access
> > user input data).
> > + | :c:macro:`EBUSY` - Connection Already Open.
> > +
> > +**Notes:**
> > + :c:data:`max_msg_length` (MTU) in client properties
> > describes the
> > maximum
> > + data that can be sent or received (e.g. if MTU=2K, can send
> > + requests up to 2k bytes and receive responses up to 2k bytes).
> > +
> > +
> > +:c:macro:`IOCTL_MEI_NOTIFY_SET`
> > +-------------------------------
> > +Enable or disable event notifications
> > +
> > +**Usage:**
> > +
> > +::
> > +
> > + uint32_t enable;
> > + ioctl(fd, IOCTL_MEI_NOTIFY_SET, &enable);
> > +
> > +**Inputs:**
> > + | :c:data:`uint32_t enable = 1;`
> > + | or
> > + | :c:data:`uint32_t enable[disable] = 0;`
> > +
> > +**Error returns:**
> > + | :c:macro:`EINVAL` - Wrong IOCTL Number.
> > + | :c:macro:`ENODEV` - Device is not initialized or the
> > client not
> > connected.
> > + | :c:macro:`ENOMEM` - Unable to allocate memory to client
> > internal
> > data.
> > + | :c:macro:`EFAULT` - Fatal Error (e.g. Unable to access
> > user input data).
> > + | :c:macro:`EOPNOTSUPP` - if the device doesn't support
> > the feature.
> > +
> > +**Notes:**
> > + The client must be connected in order to enable
> > notification events.
> > +
> > +
> > +:c:macro:`IOCTL_MEI_NOTIFY_GET`
> > +-------------------------------
> > +Retrieve event
> > +
> > +**Usage:**
> > +
> > +::
> > +
> > + uint32_t event;
> > + ioctl(fd, IOCTL_MEI_NOTIFY_GET, &event);
> > +
> > +**Outputs:**
> > + | 1 - if an event is pending.
> > + | 0 - if there is no event pending.
> > +
> > +**Error returns:**
> > + | :c:macro:`EINVAL` - Wrong IOCTL Number.
> > + | :c:macro:`ENODEV` - Device is not initialized or the
> > client not
> > connected.
> > + | :c:macro:`ENOMEM` - Unable to allocate memory to client
> > internal
> > data.
> > + | :c:macro:`EFAULT` - Fatal Error (e.g. Unable to access
> > user input data).
> > + | :c:macro:`EOPNOTSUPP` - if the device doesn't support
> > the feature.
> > +
> > +**Notes:**
> > + The client must be connected and event notification has to
> > be enabled
> > + in order to receive an event.
> > +
> > +
> > +Intel ME Applications
> > +=====================
> > +
> > +1) Intel Local Management Service (Intel LMS)
> > +
> > + Applications running locally on the platform communicate
> > with Intel AMT
> > Release
> > + 2.0 and later releases in the same way that network
> > applications do via
> > SOAP
> > + over HTTP (deprecated starting with Release 6.0) or with
> > WS-
> > Management over
> > + SOAP over HTTP. This means that some Intel AMT features
> > can be
> > accessed from a
> > + local application using the same network interface as a
> > remote
> > application
> > + communicating with Intel AMT over the network.
> > +
> > + When a local application sends a message addressed to the
> > local Intel
> > AMT host
> > + name, the Intel LMS, which listens for traffic directed to
> > the host name,
> > + intercepts the message and routes it to the Intel MEI.
> > + For more information:
> > +
> > `<
> > http://software.intel.com/sites/manageability/AMT_Implementation_and_Re
> > ference_Guide>`_
> > + Under "About Intel AMT" => "Local Access"
> > +
> > + For downloading Intel LMS:
> > +
> > + `<
> > http://software.intel.com/en-us/articles/download-the-latest-intel-a
> > + mt-open-source-drivers/>`_
> > +
> > + The Intel LMS opens a connection using the Intel MEI
> > driver to the Intel
> > LMS
> > + firmware feature using a defined UUID and then
> > communicates with the
> > feature
> > + using a protocol called Intel AMT Port Forwarding Protocol
> > (Intel APF
> > protocol).
> > + The protocol is used to maintain multiple sessions with
> > Intel AMT from a
> > + single application.
> > +
> > + See the protocol specification in the `Intel AMT Software
> > Development
> > Kit (SDK)
> > +
> > <
> > http://software.intel.com/sites/manageability/AMT_Implementation_and_Ref
> > erence_Guide>`_
> > + Under "SDK Resources" => "Intel(R) vPro(TM) Gateway (MPS)"
> > + => "Information for Intel(R) vPro(TM) Gateway Developers"
> > + => "Description of the Intel AMT Port Forwarding (APF)
> > Protocol"
> > +
> > +2) Intel AMT Remote configuration using a Local Agent
> > +
> > + A Local Agent enables IT personnel to configure Intel AMT
> > out-of-the-box
> > + without requiring installing additional data to enable
> > setup. The remote
> > + configuration process may involve an ISV-developed remote
> > configuration
> > + agent that runs on the host.
> > + For more information:
> > +
> > `<
> > http://software.intel.com/sites/manageability/AMT_Implementation_and_Re
> > ference_Guide>`_
> > + Under "Setup and Configuration of Intel AMT" =>
> > + "SDK Tools Supporting Setup and Configuration" =>
> > + "Using the Local Agent Sample"
> > +
> > + An open source Intel AMT configuration utility, implementing a local agent
> > + that accesses the Intel MEI driver, can be found here:
> > +
> > + `<
> > http://software.intel.com/en-us/articles/download-the-latest-intel-a
> > + mt-open-source-drivers/>`
> > +
> > +
> > +Intel AMT OS Health Watchdog
> > +============================
> > +
> > +The Intel AMT Watchdog is an OS Health (Hang/Crash) watchdog.
> > +Whenever the OS hangs or crashes, Intel AMT will send an event to
> > any
> > +subscriber to this event. This mechanism means that IT knows when
> > a
> > +platform crashes even when there is a hard failure on the host.
> > +
> > +The Intel AMT Watchdog is composed of two parts:
> > + 1) Firmware feature - receives the heartbeats
> > + and sends an event when the heartbeats stop.
> > + 2) Intel MEI iAMT watchdog driver - connects to the
> > watchdog feature,
> > + configures the watchdog and sends the heartbeats.
> > +
> > +The Intel iAMT watchdog MEI driver uses the kernel watchdog API to
> > +configure the Intel AMT Watchdog and to send heartbeats to it. The
> > +default timeout of the watchdog is 120 seconds.
> > +
> > +If the Intel AMT is not enabled in the firmware then the watchdog
> > +client won't enumerate on the me client bus and watchdog devices
> > won't be
> > exposed.
> > +
> > +
> > +Supported Chipsets
> > +==================
> > +
> > +| 7 Series Chipset Family
> > +| 6 Series Chipset Family
> > +| 5 Series Chipset Family
> > +| 4 Series Chipset Family
> > +| Mobile 4 Series Chipset Family
> > +| ICH9
> > +| 82946GZ/GL
> > +| 82G35 Express
> > +| 82Q963/Q965
> > +| 82P965/G965
> > +| Mobile PM965/GM965
> > +| Mobile GME965/GLE960
> > +| 82Q35 Express
> > +| 82G33/G31/P35/P31 Express
> > +| 82Q33 Express
> > +| 82X38/X48 Express
> > +
> > +---
> > +linux-mei@linux.intel.com
> > diff --git a/Documentation/misc-devices/mei/mei.txt
> > b/Documentation/misc-
> > devices/mei/mei.txt
> > deleted file mode 100644
> > index 2b80a0cd621f..000000000000
> > --- a/Documentation/misc-devices/mei/mei.txt
> > +++ /dev/null
> > @@ -1,266 +0,0 @@
> > -Intel(R) Management Engine Interface (Intel(R) MEI) -
> > ===================================================
> > -
> > -Introduction
> > -============
> > -
> > -The Intel Management Engine (Intel ME) is an isolated and
> > protected
> > computing -resource (Co-processor) residing inside certain Intel
> > chipsets. The
> > Intel ME -provides support for computer/IT management features. The
> > feature
> > set -depends on the Intel chipset SKU.
> > -
> > -The Intel Management Engine Interface (Intel MEI, previously known
> > as HECI)
> > -is the interface between the Host and Intel ME. This interface is
> > exposed -to
> > the host as a PCI device. The Intel MEI Driver is in charge of the
> > -
> > communication channel between a host application and the Intel ME
> > feature.
> > -
> > -Each Intel ME feature (Intel ME Client) is addressed by a
> > GUID/UUID and -each
> > client has its own protocol. The protocol is message-based with a
> > -header and
> > payload up to 512 bytes.
> > -
> > -Prominent usage of the Intel ME Interface is to communicate with
> > Intel(R) -
> > Active Management Technology (Intel AMT) implemented in firmware
> > running
> > on -the Intel ME.
> > -
> > -Intel AMT provides the ability to manage a host remotely out-of-
> > band (OOB) -
> > even when the operating system running on the host processor has
> > crashed or -
> > is in a sleep state.
> > -
> > -Some examples of Intel AMT usage are:
> > - - Monitoring hardware state and platform components
> > - - Remote power off/on (useful for green computing or overnight
> > IT
> > - maintenance)
> > - - OS updates
> > - - Storage of useful platform information such as software
> > assets
> > - - Built-in hardware KVM
> > - - Selective network isolation of Ethernet and IP protocol flows
> > based
> > - on policies set by a remote management console
> > - - IDE device redirection from remote management console
> > -
> > -Intel AMT (OOB) communication is based on SOAP (deprecated
> > -starting with
> > Release 6.0) over HTTP/S or WS-Management protocol over -HTTP/S
> > that are
> > received from a remote management console application.
> > -
> > -For more information about Intel AMT:
> > -http://software.intel.com/sites/manageability/AMT_Implementation_and_Reference_Guide
> > -
> > -
> > -Intel MEI Driver
> > -================
> > -
> > -The driver exposes a misc device called /dev/mei.
> > -
> > -An application maintains communication with an Intel ME feature
> > while -
> > /dev/mei is open. The binding to a specific feature is performed by
> > calling -
> > MEI_CONNECT_CLIENT_IOCTL, which passes the desired UUID.
> > -The number of instances of an Intel ME feature that can be opened
> > -at the
> > same time depends on the Intel ME feature, but most of the
> > -features allow
> > only a single instance.
> > -
> > -The Intel AMT Host Interface (Intel AMTHI) feature supports
> > multiple -
> > simultaneous user connected applications. The Intel MEI driver
> > -handles this
> > internally by maintaining request queues for the applications.
> > -
> > -The driver is transparent to data that are passed between firmware
> > feature -
> > and host application.
> > -
> > -Because some of the Intel ME features can change the system
> > -configuration,
> > the driver by default allows only a privileged -user to access it.
> > -
> > -A code snippet for an application communicating with Intel AMTHI
> > client:
> > -
> > - struct mei_connect_client_data data;
> > - fd = open(MEI_DEVICE);
> > -
> > - data.d.in_client_uuid = AMTHI_UUID;
> > -
> > - ioctl(fd, IOCTL_MEI_CONNECT_CLIENT, &data);
> > -
> > - printf("Ver=%d, MaxLen=%ld\n",
> > - data.d.in_client_uuid.protocol_version,
> > - data.d.in_client_uuid.max_msg_length);
> > -
> > - [...]
> > -
> > - write(fd, amthi_req_data, amthi_req_data_len);
> > -
> > - [...]
> > -
> > - read(fd, &amthi_res_data, amthi_res_data_len);
> > -
> > - [...]
> > - close(fd);
> > -
> > -
> > -IOCTL
> > -=====
> > -
> > -The Intel MEI Driver supports the following IOCTL commands:
> > - IOCTL_MEI_CONNECT_CLIENT Connect to firmware Feature
> > (client).
> > -
> > - usage:
> > - struct mei_connect_client_data clientData;
> > - ioctl(fd, IOCTL_MEI_CONNECT_CLIENT, &clientData);
> > -
> > - inputs:
> > - mei_connect_client_data struct contain the following
> > - input field:
> > -
> > - in_client_uuid - UUID of the FW Feature that needs
> > - to connect to.
> > - outputs:
> > - out_client_properties - Client Properties: MTU and
> > Protocol
> > Version.
> > -
> > - error returns:
> > - EINVAL Wrong IOCTL Number
> > - ENODEV Device or Connection is not initialized or
> > ready.
> > - (e.g. Wrong UUID)
> > - ENOMEM Unable to allocate memory to client
> > internal
> > data.
> > - EFAULT Fatal Error (e.g. Unable to access user
> > input data)
> > - EBUSY Connection Already Open
> > -
> > - Notes:
> > - max_msg_length (MTU) in client properties describes the
> > maximum
> > - data that can be sent or received. (e.g. if MTU=2K, can
> > send
> > - requests up to bytes 2k and received responses up to 2k
> > bytes).
> > -
> > - IOCTL_MEI_NOTIFY_SET: enable or disable event notifications
> > -
> > - Usage:
> > - uint32_t enable;
> > - ioctl(fd, IOCTL_MEI_NOTIFY_SET, &enable);
> > -
> > - Inputs:
> > - uint32_t enable = 1;
> > - or
> > - uint32_t enable[disable] = 0;
> > -
> > - Error returns:
> > - EINVAL Wrong IOCTL Number
> > - ENODEV Device is not initialized or the client
> > not
> > connected
> > - ENOMEM Unable to allocate memory to client
> > internal
> > data.
> > - EFAULT Fatal Error (e.g. Unable to access user
> > input data)
> > - EOPNOTSUPP if the device doesn't support the feature
> > -
> > - Notes:
> > - The client must be connected in order to enable notification
> > events
> > -
> > -
> > - IOCTL_MEI_NOTIFY_GET : retrieve event
> > -
> > - Usage:
> > - uint32_t event;
> > - ioctl(fd, IOCTL_MEI_NOTIFY_GET, &event);
> > -
> > - Outputs:
> > - 1 - if an event is pending
> > - 0 - if there is no event pending
> > -
> > - Error returns:
> > - EINVAL Wrong IOCTL Number
> > - ENODEV Device is not initialized or the client not connected
> > - ENOMEM Unable to allocate memory to client internal data.
> > - EFAULT Fatal Error (e.g. Unable to access user input data)
> > - EOPNOTSUPP if the device doesn't support the feature
> > -
> > - Notes:
> > - The client must be connected and event notification has to be enabled
> > - in order to receive an event
> > -
> > -
> > -Intel ME Applications
> > -=====================
> > -
> > - 1) Intel Local Management Service (Intel LMS)
> > -
> > - Applications running locally on the platform communicate with Intel AMT Release
> > - 2.0 and later releases in the same way that network applications do via SOAP
> > - over HTTP (deprecated starting with Release 6.0) or with WS-Management over
> > - SOAP over HTTP. This means that some Intel AMT features can be accessed from a
> > - local application using the same network interface as a remote application
> > - communicating with Intel AMT over the network.
> > -
> > - When a local application sends a message addressed to the local Intel AMT host
> > - name, the Intel LMS, which listens for traffic directed to the host name,
> > - intercepts the message and routes it to the Intel MEI.
> > - For more information:
> > - http://software.intel.com/sites/manageability/AMT_Implementation_and_Reference_Guide
> > - Under "About Intel AMT" => "Local Access"
> > -
> > - For downloading Intel LMS:
> > - http://software.intel.com/en-us/articles/download-the-latest-intel-amt-open-source-drivers/
> > -
> > - The Intel LMS opens a connection using the Intel MEI driver to the Intel LMS
> > - firmware feature using a defined UUID and then communicates with the feature
> > - using a protocol called Intel AMT Port Forwarding Protocol (Intel APF protocol).
> > - The protocol is used to maintain multiple sessions with Intel AMT from a
> > - single application.
> > -
> > - See the protocol specification in the Intel AMT Software Development Kit (SDK)
> > - http://software.intel.com/sites/manageability/AMT_Implementation_and_Reference_Guide
> > - Under "SDK Resources" => "Intel(R) vPro(TM) Gateway (MPS)"
> > - => "Information for Intel(R) vPro(TM) Gateway Developers"
> > - => "Description of the Intel AMT Port Forwarding (APF) Protocol"
> > -
> > - 2) Intel AMT Remote configuration using a Local Agent
> > -
> > - A Local Agent enables IT personnel to configure Intel AMT out-of-the-box
> > - without requiring installing additional data to enable setup. The remote
> > - configuration process may involve an ISV-developed remote configuration
> > - agent that runs on the host.
> > - For more information:
> > - http://software.intel.com/sites/manageability/AMT_Implementation_and_Reference_Guide
> > - Under "Setup and Configuration of Intel AMT" =>
> > - "SDK Tools Supporting Setup and Configuration" =>
> > - "Using the Local Agent Sample"
> > -
> > - An open source Intel AMT configuration utility, implementing a local agent
> > - that accesses the Intel MEI driver, can be found here:
> > - http://software.intel.com/en-us/articles/download-the-latest-intel-amt-open-source-drivers/
> > -
> > -
> > -Intel AMT OS Health Watchdog
> > -============================
> > -
> > -The Intel AMT Watchdog is an OS Health (Hang/Crash) watchdog.
> > -Whenever the OS hangs or crashes, Intel AMT will send an event
> > -to any subscriber to this event. This mechanism means that
> > -IT knows when a platform crashes even when there is a hard failure on the host.
> > -
> > -The Intel AMT Watchdog is composed of two parts:
> > - 1) Firmware feature - receives the heartbeats
> > - and sends an event when the heartbeats stop.
> > - 2) Intel MEI iAMT watchdog driver - connects to the watchdog feature,
> > - configures the watchdog and sends the heartbeats.
> > -
> > -The Intel iAMT watchdog MEI driver uses the kernel watchdog API to configure
> > -the Intel AMT Watchdog and to send heartbeats to it. The default timeout of the
> > -watchdog is 120 seconds.
> > -
> > -If the Intel AMT is not enabled in the firmware then the watchdog client won't enumerate
> > -on the me client bus and watchdog devices won't be exposed.
> > -
> > -
> > -Supported Chipsets
> > -==================
> > -
> > -7 Series Chipset Family
> > -6 Series Chipset Family
> > -5 Series Chipset Family
> > -4 Series Chipset Family
> > -Mobile 4 Series Chipset Family
> > -ICH9
> > -82946GZ/GL
> > -82G35 Express
> > -82Q963/Q965
> > -82P965/G965
> > -Mobile PM965/GM965
> > -Mobile GME965/GLE960
> > -82Q35 Express
> > -82G33/G31/P35/P31 Express
> > -82Q33 Express
> > -82X38/X48 Express
> > -
> > ----
> > -linux-mei@linux.intel.com
> > --
> > 2.17.1
>
>
* Re: [PATCH v1 1/2] Documentation/filesystems: add binderfs
From: Christian Brauner @ 2019-07-02 19:51 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Jonathan Corbet, linux-doc, linux-kernel
In-Reply-To: <20190702175729.GF1729@bombadil.infradead.org>
On Tue, Jul 02, 2019 at 10:57:29AM -0700, Matthew Wilcox wrote:
> On Mon, Jan 14, 2019 at 05:24:01PM -0700, Jonathan Corbet wrote:
> > On Fri, 11 Jan 2019 14:40:59 +0100
> > Christian Brauner <christian@brauner.io> wrote:
> > > This documents the Android binderfs filesystem used to dynamically add and
> > > remove binder devices that are private to each instance.
> >
> > You didn't add it to index.rst, so it won't actually become part of the
> > docs build.
>
> I think you added it in the wrong place.
>
> From 8167b80c950834da09a9204b6236f238197c197b Mon Sep 17 00:00:00 2001
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Date: Tue, 2 Jul 2019 13:54:38 -0400
> Subject: [PATCH] docs: Move binderfs to admin-guide
>
> The documentation is more appropriate for the administrator than for
> the internal kernel API section it is currently in.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
I don't feel very strongly about where this ends up. :)
Acked-by: Christian Brauner <christian@brauner.io>
> ---
> .../{filesystems => admin-guide}/binderfs.rst | 0
> Documentation/admin-guide/index.rst | 1 +
> Documentation/filesystems/index.rst | 10 ----------
> 3 files changed, 1 insertion(+), 10 deletions(-)
> rename Documentation/{filesystems => admin-guide}/binderfs.rst (100%)
>
> diff --git a/Documentation/filesystems/binderfs.rst b/Documentation/admin-guide/binderfs.rst
> similarity index 100%
> rename from Documentation/filesystems/binderfs.rst
> rename to Documentation/admin-guide/binderfs.rst
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 8001917ee012..24fbe0568eff 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -70,6 +70,7 @@ configure specific aspects of kernel behavior to your liking.
> ras
> bcache
> ext4
> + binderfs
> pm/index
> thunderbolt
> LSM/index
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index 1131c34d77f6..970c0a3ec377 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -31,13 +31,3 @@ filesystem implementations.
>
> journalling
> fscrypt
> -
> -Filesystem-specific documentation
> -=================================
> -
> -Documentation for individual filesystem types can be found here.
> -
> -.. toctree::
> - :maxdepth: 2
> -
> - binderfs.rst
> --
> 2.20.1
>
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Roman Gushchin @ 2019-07-02 19:30 UTC (permalink / raw)
To: Waiman Long
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
Kees Cook, Johannes Weiner, Michal Hocko, Vladimir Davydov,
linux-mm@kvack.org, linux-doc@vger.kernel.org,
linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190702183730.14461-1-longman@redhat.com>
On Tue, Jul 02, 2019 at 02:37:30PM -0400, Waiman Long wrote:
> Currently, a value of '1' is written to /sys/kernel/slab/<slab>/shrink
> file to shrink the slab by flushing all the per-cpu slabs and free
> slabs in partial lists. This applies only to the root caches, though.
>
> Extend this capability by shrinking all the child memcg caches and
> the root cache when a value of '2' is written to the shrink sysfs file.
>
> On a 4-socket 112-core 224-thread x86-64 system after a parallel kernel
> build, the amount of memory occupied by slabs before shrinking
> slabs was:
>
> # grep task_struct /proc/slabinfo
> task_struct 7114 7296 7744 4 8 : tunables 0 0 0 : slabdata 1824 1824 0
> # grep "^S[lRU]" /proc/meminfo
> Slab: 1310444 kB
> SReclaimable: 377604 kB
> SUnreclaim: 932840 kB
>
> After shrinking slabs:
>
> # grep "^S[lRU]" /proc/meminfo
> Slab: 695652 kB
> SReclaimable: 322796 kB
> SUnreclaim: 372856 kB
> # grep task_struct /proc/slabinfo
> task_struct 2262 2572 7744 4 8 : tunables 0 0 0 : slabdata 643 643 0
>
> Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Roman Gushchin <guro@fb.com>
Thanks, Waiman!
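
For anyone who wants to check the numbers in the quoted commit message, the arithmetic can be sketched in a few lines of userspace C. This is only an illustration: `meminfo_kb()` and `kb_freed()` are hypothetical helper names, not anything in the patch, and the parser assumes the "Key: value kB" layout shown by the grep output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/meminfo-style line such as "Slab: 1310444 kB".
 * Returns the value in kB, or -1 if the line does not start with
 * "<key>:" or carries no number. (Hypothetical helper.)
 */
static long meminfo_kb(const char *line, const char *key)
{
	size_t klen;
	long kb;

	assert(line != NULL && key != NULL);
	klen = strlen(key);
	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return -1;
	if (sscanf(line + klen + 1, "%ld", &kb) != 1)
		return -1;
	return kb;
}

/* kB released between a "before" and an "after" sample. */
static long kb_freed(long before_kb, long after_kb)
{
	return before_kb - after_kb;
}
```

With the figures quoted above, Slab drops by 614792 kB (roughly 47%), and 559984 kB of that comes out of SUnreclaim.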
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Waiman Long @ 2019-07-02 19:15 UTC (permalink / raw)
To: David Rientjes
Cc: Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton,
Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook,
Johannes Weiner, Michal Hocko, Vladimir Davydov, linux-mm,
linux-doc, linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <alpine.DEB.2.21.1907021206000.67286@chino.kir.corp.google.com>
On 7/2/19 3:09 PM, David Rientjes wrote:
> On Tue, 2 Jul 2019, Waiman Long wrote:
>
>> diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
>> index 29601d93a1c2..2a3d0fc4b4ac 100644
>> --- a/Documentation/ABI/testing/sysfs-kernel-slab
>> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
>> @@ -429,10 +429,12 @@ KernelVersion: 2.6.22
>> Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
>> Christoph Lameter <cl@linux-foundation.org>
>> Description:
>> - The shrink file is written when memory should be reclaimed from
>> - a cache. Empty partial slabs are freed and the partial list is
>> - sorted so the slabs with the fewest available objects are used
>> - first.
>> + A value of '1' is written to the shrink file when memory should
>> + be reclaimed from a cache. Empty partial slabs are freed and
>> + the partial list is sorted so the slabs with the fewest
>> + available objects are used first. When a value of '2' is
>> + written, all the corresponding child memory cgroup caches
>> + should be shrunk as well. All other values are invalid.
>>
> This should likely call out that '2' also does '1', that might not be
> clear enough.
You are right. I will reword the text to make it clearer.
>> What: /sys/kernel/slab/cache/slab_size
>> Date: May 2007
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 3b22931bb557..a16b2c7ff4dd 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -174,6 +174,7 @@ int __kmem_cache_shrink(struct kmem_cache *);
>> void __kmemcg_cache_deactivate(struct kmem_cache *s);
>> void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
>> void slab_kmem_cache_release(struct kmem_cache *);
>> +int kmem_cache_shrink_all(struct kmem_cache *s);
>>
>> struct seq_file;
>> struct file;
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 464faaa9fd81..493697ba1da5 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -981,6 +981,49 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
>> }
>> EXPORT_SYMBOL(kmem_cache_shrink);
>>
>> +/**
>> + * kmem_cache_shrink_all - shrink a cache and all its memcg children
>> + * @s: The root cache to shrink.
>> + *
>> + * Return: 0 if successful, -EINVAL if not a root cache
>> + */
>> +int kmem_cache_shrink_all(struct kmem_cache *s)
>> +{
>> + struct kmem_cache *c;
>> +
>> + if (!IS_ENABLED(CONFIG_MEMCG_KMEM)) {
>> + kmem_cache_shrink(s);
>> + return 0;
>> + }
>> + if (!is_root_cache(s))
>> + return -EINVAL;
>> +
>> + /*
>> + * The caller should have a reference to the root cache and so
>> + * we don't need to take the slab_mutex. We have to take the
>> + * slab_mutex, however, to iterate the memcg caches.
>> + */
>> + get_online_cpus();
>> + get_online_mems();
>> + kasan_cache_shrink(s);
>> + __kmem_cache_shrink(s);
>> +
>> + mutex_lock(&slab_mutex);
>> + for_each_memcg_cache(c, s) {
>> + /*
>> + * Don't need to shrink deactivated memcg caches.
>> + */
>> + if (c->flags & SLAB_DEACTIVATED)
>> + continue;
>> + kasan_cache_shrink(c);
>> + __kmem_cache_shrink(c);
>> + }
>> + mutex_unlock(&slab_mutex);
>> + put_online_mems();
>> + put_online_cpus();
>> + return 0;
>> +}
>> +
>> bool slab_is_available(void)
>> {
>> return slab_state >= UP;
> I'm wondering how long this could take, i.e. how long we hold slab_mutex
> while we traverse each cache and shrink it.
It will depend on how many memcg caches there are. Actually, I have
been thinking about using the show method to show the time spent in the
last shrink operation. I am just not sure if it is worth doing. What do
you think?
-Longman
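
Independent of whether the show method grows a timing field, the cost can already be estimated from userspace by timing the write itself. The following is a rough sketch under that assumption, not part of the patch; `timed_write()` is a hypothetical helper, and the wall time it reports covers the whole write() call rather than only the time slab_mutex is held.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/*
 * Write `len` bytes of `val` to `path` and report the elapsed wall time
 * of the write() call in nanoseconds.
 * Returns 0 on success or a negative errno value. (Hypothetical helper.)
 */
static int timed_write(const char *path, const char *val, size_t len,
		       int64_t *elapsed_ns)
{
	struct timespec t0, t1;
	ssize_t n;
	int fd, err = 0;

	assert(path != NULL && val != NULL && elapsed_ns != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	n = write(fd, val, len);
	if (n < 0)
		err = -errno;		/* save errno before further calls */
	else if ((size_t)n != len)
		err = -EIO;		/* short write */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	*elapsed_ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
		      (t1.tv_nsec - t0.tv_nsec);
	return err;
}
```

Calling it with a path such as /sys/kernel/slab/<slab>/shrink and the value "2" (as root) would give an upper bound on how long one shrink pass takes on a given machine.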
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: David Rientjes @ 2019-07-02 19:09 UTC (permalink / raw)
To: Waiman Long
Cc: Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton,
Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook,
Johannes Weiner, Michal Hocko, Vladimir Davydov, linux-mm,
linux-doc, linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190702183730.14461-1-longman@redhat.com>
On Tue, 2 Jul 2019, Waiman Long wrote:
> diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
> index 29601d93a1c2..2a3d0fc4b4ac 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-slab
> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
> @@ -429,10 +429,12 @@ KernelVersion: 2.6.22
> Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
> Christoph Lameter <cl@linux-foundation.org>
> Description:
> - The shrink file is written when memory should be reclaimed from
> - a cache. Empty partial slabs are freed and the partial list is
> - sorted so the slabs with the fewest available objects are used
> - first.
> + A value of '1' is written to the shrink file when memory should
> + be reclaimed from a cache. Empty partial slabs are freed and
> + the partial list is sorted so the slabs with the fewest
> + available objects are used first. When a value of '2' is
> + written, all the corresponding child memory cgroup caches
> + should be shrunk as well. All other values are invalid.
>
This should likely call out that '2' also does '1', that might not be
clear enough.
> What: /sys/kernel/slab/cache/slab_size
> Date: May 2007
> diff --git a/mm/slab.h b/mm/slab.h
> index 3b22931bb557..a16b2c7ff4dd 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -174,6 +174,7 @@ int __kmem_cache_shrink(struct kmem_cache *);
> void __kmemcg_cache_deactivate(struct kmem_cache *s);
> void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
> void slab_kmem_cache_release(struct kmem_cache *);
> +int kmem_cache_shrink_all(struct kmem_cache *s);
>
> struct seq_file;
> struct file;
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 464faaa9fd81..493697ba1da5 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -981,6 +981,49 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
> }
> EXPORT_SYMBOL(kmem_cache_shrink);
>
> +/**
> + * kmem_cache_shrink_all - shrink a cache and all its memcg children
> + * @s: The root cache to shrink.
> + *
> + * Return: 0 if successful, -EINVAL if not a root cache
> + */
> +int kmem_cache_shrink_all(struct kmem_cache *s)
> +{
> + struct kmem_cache *c;
> +
> + if (!IS_ENABLED(CONFIG_MEMCG_KMEM)) {
> + kmem_cache_shrink(s);
> + return 0;
> + }
> + if (!is_root_cache(s))
> + return -EINVAL;
> +
> + /*
> + * The caller should have a reference to the root cache and so
> + * we don't need to take the slab_mutex. We have to take the
> + * slab_mutex, however, to iterate the memcg caches.
> + */
> + get_online_cpus();
> + get_online_mems();
> + kasan_cache_shrink(s);
> + __kmem_cache_shrink(s);
> +
> + mutex_lock(&slab_mutex);
> + for_each_memcg_cache(c, s) {
> + /*
> + * Don't need to shrink deactivated memcg caches.
> + */
> + if (c->flags & SLAB_DEACTIVATED)
> + continue;
> + kasan_cache_shrink(c);
> + __kmem_cache_shrink(c);
> + }
> + mutex_unlock(&slab_mutex);
> + put_online_mems();
> + put_online_cpus();
> + return 0;
> +}
> +
> bool slab_is_available(void)
> {
> return slab_state >= UP;
I'm wondering how long this could take, i.e. how long we hold slab_mutex
while we traverse each cache and shrink it.
Acked-by: David Rientjes <rientjes@google.com>
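
To make the shape of the loop easier to reason about, here is a toy userspace model of the traversal. It is a sketch only: `struct toy_cache` and `toy_shrink_all()` are made-up stand-ins for the kernel types, no real locking is done, and the comments merely mark where the kernel code takes and drops slab_mutex. The model tests the deactivated flag on each child, which is what the comment in the patch describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache; not the kernel type. */
struct toy_cache {
	bool deactivated;	/* models SLAB_DEACTIVATED */
	int shrink_calls;	/* how many times this cache was shrunk */
};

/*
 * Shrink the root first, then walk the children and shrink every child
 * that is still active. Returns the number of children visited, i.e. the
 * amount of work done in the section the kernel runs under slab_mutex.
 */
static size_t toy_shrink_all(struct toy_cache *root,
			     struct toy_cache *children, size_t nr)
{
	size_t i;

	assert(root != NULL && (children != NULL || nr == 0));
	root->shrink_calls++;		/* done before the lock is taken */

	/* mutex_lock(&slab_mutex) would go here */
	for (i = 0; i < nr; i++) {
		if (children[i].deactivated)
			continue;	/* skip deactivated children */
		children[i].shrink_calls++;
	}
	/* mutex_unlock(&slab_mutex) would go here */
	return nr;
}
```

The lock hold time scales with the number of children visited, so a root cache with many memcg children is the case to measure.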
* Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
From: Waiman Long @ 2019-07-02 18:41 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
Kees Cook, Johannes Weiner, Vladimir Davydov, linux-mm, linux-doc,
linux-fsdevel, cgroups, linux-kernel, Roman Gushchin,
Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190628073128.GC2751@dhcp22.suse.cz>
On 6/28/19 3:31 AM, Michal Hocko wrote:
> On Thu 27-06-19 17:16:04, Waiman Long wrote:
>> On 6/27/19 11:15 AM, Michal Hocko wrote:
>>> On Mon 24-06-19 13:42:19, Waiman Long wrote:
>>>> With the slub memory allocator, the numbers of active slab objects
>>>> reported in /proc/slabinfo are not real because they include objects
>>>> that are held by the per-cpu slab structures whether they are actually
>>>> used or not. The problem gets worse the more CPUs a system has. For
>>>> instance, looking at the reported number of active task_struct objects,
>>>> one will wonder where all the missing tasks have gone.
>>>>
>>>> I know it is hard and costly to get a real count of active objects.
>>> What exactly is expensive? Why cannot slabinfo reduce the number of
>>> active objects by per-cpu cached objects?
>>>
>> The number of cachelines that needs to be accessed in order to get an
>> accurate count will be much higher if we need to iterate through all the
>> per-cpu structures. In addition, accessing the per-cpu partial list will
>> be racy.
> Why is all that a problem for a root only interface that should be used
> quite rarely (it is not something that you should be reading hundreds
> of times per second, right)?
That can be true. Anyway, I have posted a new patch to use the existing
<slab>/shrink sysfs file to perform memcg cache shrinking as well. So I
am not going to pursue this patch.
Thanks,
Longman
* Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches
From: Waiman Long @ 2019-07-02 18:39 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Alexander Viro, Jonathan Corbet, Luis Chamberlain,
Kees Cook, Johannes Weiner, Michal Hocko, Vladimir Davydov
Cc: linux-mm, linux-doc, linux-fsdevel, cgroups, linux-kernel,
Roman Gushchin, Shakeel Butt, Andrea Arcangeli
In-Reply-To: <20190702183730.14461-1-longman@redhat.com>
On 7/2/19 2:37 PM, Waiman Long wrote:
> Currently, a value of '1' is written to /sys/kernel/slab/<slab>/shrink
> file to shrink the slab by flushing all the per-cpu slabs and free
> slabs in partial lists. This applies only to the root caches, though.
>
> Extend this capability by shrinking all the child memcg caches and
> the root cache when a value of '2' is written to the shrink sysfs file.
>
> On a 4-socket 112-core 224-thread x86-64 system after a parallel kernel
> build, the amount of memory occupied by slabs before shrinking
> slabs was:
>
> # grep task_struct /proc/slabinfo
> task_struct 7114 7296 7744 4 8 : tunables 0 0 0 : slabdata 1824 1824 0
> # grep "^S[lRU]" /proc/meminfo
> Slab: 1310444 kB
> SReclaimable: 377604 kB
> SUnreclaim: 932840 kB
>
> After shrinking slabs:
>
> # grep "^S[lRU]" /proc/meminfo
> Slab: 695652 kB
> SReclaimable: 322796 kB
> SUnreclaim: 372856 kB
> # grep task_struct /proc/slabinfo
> task_struct 2262 2572 7744 4 8 : tunables 0 0 0 : slabdata 643 643 0
>
> Signed-off-by: Waiman Long <longman@redhat.com>
This is a follow-up of my previous patch "mm, slab: Extend
vm/drop_caches to shrink kmem slabs". It is based on the linux-next tree.
-Longman
> ---
> Documentation/ABI/testing/sysfs-kernel-slab | 10 +++--
> mm/slab.h | 1 +
> mm/slab_common.c | 43 +++++++++++++++++++++
> mm/slub.c | 2 +
> 4 files changed, 52 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
> index 29601d93a1c2..2a3d0fc4b4ac 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-slab
> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
> @@ -429,10 +429,12 @@ KernelVersion: 2.6.22
> Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
> Christoph Lameter <cl@linux-foundation.org>
> Description:
> - The shrink file is written when memory should be reclaimed from
> - a cache. Empty partial slabs are freed and the partial list is
> - sorted so the slabs with the fewest available objects are used
> - first.
> + A value of '1' is written to the shrink file when memory should
> + be reclaimed from a cache. Empty partial slabs are freed and
> + the partial list is sorted so the slabs with the fewest
> + available objects are used first. When a value of '2' is
> + written, all the corresponding child memory cgroup caches
> + should be shrunk as well. All other values are invalid.
>
> What: /sys/kernel/slab/cache/slab_size
> Date: May 2007
> diff --git a/mm/slab.h b/mm/slab.h
> index 3b22931bb557..a16b2c7ff4dd 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -174,6 +174,7 @@ int __kmem_cache_shrink(struct kmem_cache *);
> void __kmemcg_cache_deactivate(struct kmem_cache *s);
> void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
> void slab_kmem_cache_release(struct kmem_cache *);
> +int kmem_cache_shrink_all(struct kmem_cache *s);
>
> struct seq_file;
> struct file;
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 464faaa9fd81..493697ba1da5 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -981,6 +981,49 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
> }
> EXPORT_SYMBOL(kmem_cache_shrink);
>
> +/**
> + * kmem_cache_shrink_all - shrink a cache and all its memcg children
> + * @s: The root cache to shrink.
> + *
> + * Return: 0 if successful, -EINVAL if not a root cache
> + */
> +int kmem_cache_shrink_all(struct kmem_cache *s)
> +{
> + struct kmem_cache *c;
> +
> + if (!IS_ENABLED(CONFIG_MEMCG_KMEM)) {
> + kmem_cache_shrink(s);
> + return 0;
> + }
> + if (!is_root_cache(s))
> + return -EINVAL;
> +
> + /*
> + * The caller should have a reference to the root cache and so
> + * we don't need to take the slab_mutex. We have to take the
> + * slab_mutex, however, to iterate the memcg caches.
> + */
> + get_online_cpus();
> + get_online_mems();
> + kasan_cache_shrink(s);
> + __kmem_cache_shrink(s);
> +
> + mutex_lock(&slab_mutex);
> + for_each_memcg_cache(c, s) {
> + /*
> + * Don't need to shrink deactivated memcg caches.
> + */
> + if (c->flags & SLAB_DEACTIVATED)
> + continue;
> + kasan_cache_shrink(c);
> + __kmem_cache_shrink(c);
> + }
> + mutex_unlock(&slab_mutex);
> + put_online_mems();
> + put_online_cpus();
> + return 0;
> +}
> +
> bool slab_is_available(void)
> {
> return slab_state >= UP;
> diff --git a/mm/slub.c b/mm/slub.c
> index a384228ff6d3..5d7b0004c51f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5298,6 +5298,8 @@ static ssize_t shrink_store(struct kmem_cache *s,
> {
> if (buf[0] == '1')
> kmem_cache_shrink(s);
> + else if (buf[0] == '2')
> + kmem_cache_shrink_all(s);
> else
> return -EINVAL;
> return length;
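
The value handling in the shrink_store() hunk above can be modelled in userspace as follows. This is a sketch for discussion, not kernel code: `parse_shrink_request()` and `enum shrink_action` are hypothetical names, and the zero-length guard is an extra that the patch does not have. It also spells out that '2' covers the root cache as well as its memcg children.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum shrink_action {
	SHRINK_ROOT_ONLY,	/* '1': the pre-existing behaviour */
	SHRINK_ROOT_AND_MEMCG,	/* '2': root cache plus its memcg children */
};

/*
 * Model of the dispatch on the first byte written to the shrink file.
 * Returns `length` on success (as a sysfs store method would) or -EINVAL
 * for any other value.
 */
static long parse_shrink_request(const char *buf, size_t length,
				 enum shrink_action *action)
{
	assert(buf != NULL && action != NULL);

	if (length == 0)
		return -EINVAL;	/* extra guard, not part of the patch */
	if (buf[0] == '1')
		*action = SHRINK_ROOT_ONLY;
	else if (buf[0] == '2')
		*action = SHRINK_ROOT_AND_MEMCG;
	else
		return -EINVAL;
	return (long)length;
}
```

Only the first byte is inspected, so a trailing newline from echo is accepted, as in the original code.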