* Re: [PATCH bpf-next v3 3/8] bpf: add documentation for eBPF helpers (12-22)
From: Daniel Borkmann @ 2018-04-19 10:29 UTC
To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-4-quentin.monnet@netronome.com>
On 04/17/2018 04:34 PM, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
>
> - bpf_get_current_pid_tgid()
> - bpf_get_current_uid_gid()
> - bpf_get_current_comm()
> - bpf_skb_vlan_push()
> - bpf_skb_vlan_pop()
> - bpf_skb_get_tunnel_key()
> - bpf_skb_set_tunnel_key()
> - bpf_redirect()
> - bpf_perf_event_output()
> - bpf_get_stackid()
> - bpf_get_current_task()
>
> v3:
> - bpf_skb_get_tunnel_key(): Change and improve description and example.
> - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
> - bpf_perf_event_output(): Fix first sentence of description. Delete
> wrong statement on context being evaluated as a struct pt_reg. Remove
> the long yet incomplete example.
> - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
> configurable.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
> include/uapi/linux/bpf.h | 225 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 225 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 02b7d522b3c0..c59bf5b28164 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -591,6 +591,231 @@ union bpf_attr {
> * performed again.
> * Return
> * 0 on success, or a negative error in case of failure.
> + *
> + * u64 bpf_get_current_pid_tgid(void)
> + * Return
> + * A 64-bit integer containing the current tgid and pid, and
> + * created as such:
> + * *current_task*\ **->tgid << 32 \|**
> + * *current_task*\ **->pid**.
> + *
> + * u64 bpf_get_current_uid_gid(void)
> + * Return
> + * A 64-bit integer containing the current GID and UID, and
> + * created as such: *current_gid* **<< 32 \|** *current_uid*.
> + *
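To make the layout of the two return values above concrete, here is a minimal sketch in plain C. The helper names (pack_pid_tgid(), tgid_of(), pid_of()) are hypothetical and not part of any kernel API; they only mirror the "tgid << 32 | pid" composition described in the text. The same split would apply to bpf_get_current_uid_gid(), with the GID in the upper half and the UID in the lower half.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only: they reproduce the layout
 * described above (tgid in the upper 32 bits, pid in the lower 32 bits). */
static inline uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: the thread group id (what user space calls the PID). */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the kernel pid (what user space calls the thread id). */
static inline uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}
```

In an eBPF program, one would presumably apply the same shift and truncation directly to the value returned by the helper.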
> + * int bpf_get_current_comm(char *buf, u32 size_of_buf)
> + * Description
> + * Copy the **comm** attribute of the current task into *buf* of
> + * *size_of_buf*. The **comm** attribute contains the name of
> + * the executable (excluding the path) for the current task. The
> + * *size_of_buf* must be strictly positive. On success, the
> + * helper makes sure that the *buf* is NUL-terminated. On failure,
> + * it is filled with zeroes.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
> + * Description
> + * Push a *vlan_tci* (VLAN tag control information) of protocol
> + * *vlan_proto* to the packet associated to *skb*, then update
> + * the checksum. Note that if *vlan_proto* is different from
> + * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
> + * be **ETH_P_8021Q**.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_vlan_pop(struct sk_buff *skb)
> + * Description
> + * Pop a VLAN header from the packet associated to *skb*.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
> + * Description
> + * Get tunnel metadata. This helper takes a pointer *key* to an
> + * empty **struct bpf_tunnel_key** of **size**, that will be
> + * filled with tunnel metadata for the packet associated to *skb*.
> + * The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which
> + * indicates that the tunnel is based on IPv6 protocol instead of
> + * IPv4.
> + *
> + * The **struct bpf_tunnel_key** is an object that generalizes the
> + * principal parameters used by various tunneling protocols into a
> + * single struct. This way, it can be used to easily make a
> + * decision based on the contents of the encapsulation header,
> + * "summarized" in this struct. In particular, it holds the IP
> + * address of the remote end (IPv4 or IPv6, depending on the case)
> + * in *key*\ **->remote_ipv4** or *key*\ **->remote_ipv6**.
I would also mention the tunnel_id, which is typically mapped to a VNI. Together
with the bpf_skb_set_tunnel_key() helper, this makes the id programmable.
> + * Let's imagine that the following code is part of a program
> + * attached to the TC ingress interface, on one end of a GRE
> + * tunnel, and is supposed to filter out all messages coming from
> + * remote ends with IPv4 address other than 10.0.0.1:
> + *
> + * ::
> + *
> + * int ret;
> + * struct bpf_tunnel_key key = {};
> + *
> + * ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
> + * if (ret < 0)
> + * return TC_ACT_SHOT; // drop packet
> + *
> + * if (key.remote_ipv4 != 0x0a000001)
> + * return TC_ACT_SHOT; // drop packet
> + *
> + * return TC_ACT_OK; // accept packet
Let's also add a short sentence saying that this interface can be used with all
encap devices that can operate in 'collect metadata' mode: instead of having one
netdevice per specific configuration, 'collect metadata' mode only requires a
single device, and the configuration can be extracted through those BPF helpers.
It could also be mentioned that this can be used together with vxlan, geneve,
gre and ipip tunnels.
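A side note on the quoted example: the constant 0x0a000001 is 10.0.0.1 written as a 32-bit value with the first octet in the most significant byte. The sketch below assumes that key.remote_ipv4 is reported in host byte order, which is what the comparison in the example relies on. ipv4_host_order() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: build a host-byte-order IPv4
 * value from its dotted-quad octets, most significant octet first. */
static inline uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}
```

With this, the check in the example could be read as a comparison against ipv4_host_order(10, 0, 0, 1), which may be easier to follow than the raw hexadecimal constant.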
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
> + * Description
> + * Populate tunnel metadata for packet associated to *skb.* The
> + * tunnel metadata is set to the contents of *key*, of *size*. The
> + * *flags* can be set to a combination of the following values:
> + *
> + * **BPF_F_TUNINFO_IPV6**
> + * Indicate that the tunnel is based on IPv6 protocol
> + * instead of IPv4.
> + * **BPF_F_ZERO_CSUM_TX**
> + * For IPv4 packets, add a flag to tunnel metadata
> + * indicating that checksum computation should be skipped
> + * and checksum set to zeroes.
> + * **BPF_F_DONT_FRAGMENT**
> + * Add a flag to tunnel metadata indicating that the
> + * packet should not be fragmented.
> + * **BPF_F_SEQ_NUMBER**
> + * Add a flag to tunnel metadata indicating that a
> + * sequence number should be added to tunnel header before
> + * sending the packet. This flag was added for GRE
> + * encapsulation, but might be used with other protocols
> + * as well in the future.
> + *
> + * Here is a typical usage on the transmit path:
> + *
> + * ::
> + *
> + * struct bpf_tunnel_key key;
> + * populate key ...
> + * bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
> + * bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
See above, maybe this can just reference bpf_skb_get_tunnel_key() from here.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_redirect(u32 ifindex, u64 flags)
> + * Description
> + * Redirect the packet to another net device of index *ifindex*.
> + * This helper is somewhat similar to **bpf_clone_redirect**\
> + * (), except that the packet is not cloned, which provides
> + * increased performance.
> + *
> + * Save for XDP, both ingress and egress interfaces can be used
s/Save/Same/ ?
> + * for redirection. The **BPF_F_INGRESS** value in *flags* is used
(In XDP case, BPF_F_INGRESS cannot be used.)
> + * to make the distinction (ingress path is selected if the flag
> + * is present, egress path otherwise). Currently, XDP only
> + * supports redirection to the egress interface, and accepts no
> + * flag at all.
> + * Return
> + * For XDP, the helper returns **XDP_REDIRECT** on success or
> + * **XDP_ABORT** on error. For other program types, the values
> + * are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
> + * error.
> + *
> + * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
> + * Description
> + * Write raw *data* blob into a special BPF perf event held by
> + * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf
> + * event must have the following attributes: **PERF_SAMPLE_RAW**
> + * as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and
> + * **PERF_COUNT_SW_BPF_OUTPUT** as **config**.
> + *
> + * The *flags* are used to indicate the index in *map* for which
> + * the value must be put, masked with **BPF_F_INDEX_MASK**.
> + * Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
> + * to indicate that the index of the current CPU core should be
> + * used.
> + *
> + * The value to write, of *size*, is passed through eBPF stack and
> + * pointed by *data*.
> + *
> + * The context of the program *ctx* needs also be passed to the
> + * helper.
> + *
> + * On user space, a program willing to read the values needs to
> + * call **perf_event_open**\ () on the perf event (either for
> + * one or for all CPUs) and to store the file descriptor into the
> + * *map*. This must be done before the eBPF program can send data
> + * into it. An example is available in file
> + * *samples/bpf/trace_output_user.c* in the Linux kernel source
> + * tree (the eBPF program counterpart is in
> + * *samples/bpf/trace_output_kern.c*).
> + *
> + * **bpf_perf_event_output**\ () achieves better performance
> + * than **bpf_trace_printk**\ () for sharing data with user
> + * space, and is much better suitable for streaming data from eBPF
> + * programs.
I would also mention that this helper can be used from tc and XDP BPF programs
as well, and that it allows passing i) only custom structs, ii) only packet
payload, or iii) a combination of both to user space listeners.
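To illustrate how *flags* might be composed for these cases, here is a small sketch in plain C. The macro values are copied from my reading of include/uapi/linux/bpf.h and should be treated as an assumption to check against the header. perf_output_flags() is a hypothetical helper that puts the map index (or BPF_F_CURRENT_CPU) in the lower 32 bits and, for tc and XDP programs, the number of packet bytes to append in the bits covered by BPF_F_CTXLEN_MASK.

```c
#include <stdint.h>

/* Assumed values, repeated here only so that the sketch compiles on its own;
 * the authoritative definitions live in include/uapi/linux/bpf.h. */
#define BPF_F_INDEX_MASK	0xffffffffULL
#define BPF_F_CURRENT_CPU	BPF_F_INDEX_MASK
#define BPF_F_CTXLEN_MASK	(0xfffffULL << 32)

/* Hypothetical helper: combine a map index (or BPF_F_CURRENT_CPU) with the
 * number of packet bytes to pass along with the custom data. */
static inline uint64_t perf_output_flags(uint64_t index, uint32_t payload_len)
{
	return (index & BPF_F_INDEX_MASK) |
	       (((uint64_t)payload_len << 32) & BPF_F_CTXLEN_MASK);
}
```

With payload_len set to 0 only the custom struct would be passed; a non-zero payload_len would correspond to the "combination of both" case.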
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
> + * Description
> + * Walk a user or a kernel stack and return its id. To achieve
> + * this, the helper needs *ctx*, which is a pointer to the context
> + * on which the tracing program is executed, and a pointer to a
> + * *map* of type **BPF_MAP_TYPE_STACK_TRACE**.
> + *
> + * The last argument, *flags*, holds the number of stack frames to
> + * skip (from 0 to 255), masked with
> + * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
> + * a combination of the following flags:
> + *
> + * **BPF_F_USER_STACK**
> + * Collect a user space stack instead of a kernel stack.
> + * **BPF_F_FAST_STACK_CMP**
> + * Compare stacks by hash only.
> + * **BPF_F_REUSE_STACKID**
> + * If two different stacks hash into the same *stackid*,
> + * discard the old one.
> + *
> + * The stack id retrieved is a 32 bit long integer handle which
> + * can be further combined with other data (including other stack
> + * ids) and used as a key into maps. This can be useful for
> + * generating a variety of graphs (such as flame graphs or off-cpu
> + * graphs).
> + *
> + * For walking a stack, this helper is an improvement over
> + * **bpf_probe_read**\ (), which can be used with unrolled loops
> + * but is not efficient and consumes a lot of eBPF instructions.
> + * Instead, **bpf_get_stackid**\ () can collect up to
> + * **PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
> + * this limit can be controlled with the **sysctl** program, and
> + * that it should be manually increased in order to profile long
> + * user stacks (such as stacks for Java programs). To do so, use:
> + *
> + * ::
> + *
> + * # sysctl kernel.perf_event_max_stack=<new value>
> + *
> + * Return
> + * The positive or null stack id on success, or a negative error
> + * in case of failure.
> + *
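The *flags* layout described for bpf_get_stackid() can be sketched in plain C as follows. The macro values are what I believe include/uapi/linux/bpf.h defines; treat them as an assumption. stackid_flags() is a hypothetical helper that places the number of frames to skip in the low 8 bits and ORs in the option bits above them.

```c
#include <stdint.h>

/* Assumed values, repeated here only so that the sketch compiles on its own;
 * the authoritative definitions live in include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK	0xffULL
#define BPF_F_USER_STACK	(1ULL << 8)
#define BPF_F_FAST_STACK_CMP	(1ULL << 9)
#define BPF_F_REUSE_STACKID	(1ULL << 10)

/* Hypothetical helper: low 8 bits carry the number of frames to skip,
 * the remaining bits carry the option flags. */
static inline uint64_t stackid_flags(uint8_t skip, uint64_t options)
{
	return ((uint64_t)skip & BPF_F_SKIP_FIELD_MASK) |
	       (options & ~BPF_F_SKIP_FIELD_MASK);
}
```

For instance, skipping two frames of a user stack while reusing colliding ids would be stackid_flags(2, BPF_F_USER_STACK | BPF_F_REUSE_STACKID).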
> + * u64 bpf_get_current_task(void)
> + * Return
> + * A pointer to the current task struct.
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH v2] doc: dev-tools: kselftest.rst: update contributing new tests
From: Anders Roxell @ 2018-04-19 10:28 UTC
To: shuah, corbet; +Cc: linux-kselftest, linux-doc, linux-kernel, Anders Roxell
In-Reply-To: <20180417084631.11242-1-anders.roxell@linaro.org>
Add a note that the kernel headers should be used as far as possible, falling
back to the system headers.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
Documentation/dev-tools/kselftest.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index e80850eefe13..3bf371a938d0 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -151,6 +151,11 @@ Contributing new tests (details)
TEST_FILES, TEST_GEN_FILES mean it is the file which is used by
test.
+ * First use the headers inside the kernel source and/or git repo, and then the
+ system headers. Headers for the kernel release as opposed to headers
+ installed by the distro on the system should be the primary focus to be able
+ to find regressions.
+
Test Harness
============
--
2.11.0
* Re: [PATCH bpf-next v3 2/8] bpf: add documentation for eBPF helpers (01-11)
From: Daniel Borkmann @ 2018-04-19 10:02 UTC
To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-3-quentin.monnet@netronome.com>
On 04/17/2018 04:34 PM, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
>
> - bpf_map_lookup_elem()
> - bpf_map_update_elem()
> - bpf_map_delete_elem()
> - bpf_probe_read()
> - bpf_ktime_get_ns()
> - bpf_trace_printk()
> - bpf_skb_store_bytes()
> - bpf_l3_csum_replace()
> - bpf_l4_csum_replace()
> - bpf_tail_call()
> - bpf_clone_redirect()
>
> v3:
> - bpf_map_lookup_elem(): Fix description of restrictions for flags
> related to the existence of the entry.
> - bpf_trace_printk(): State that trace_pipe can be configured. Fix
> return value in case an unknown format specifier is met. Add a note on
> kernel log notice when the helper is used. Edit example.
> - bpf_tail_call(): Improve comment on stack inheritance.
> - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Thanks for doing all this work, Quentin!
Just some small improvements while reading over it:
> ---
> include/uapi/linux/bpf.h | 210 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 210 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 45f77f01e672..02b7d522b3c0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -381,6 +381,216 @@ union bpf_attr {
> * intentional, removing them would break paragraphs for rst2man.
> *
> * Start of BPF helper function descriptions:
> + *
> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
const void *key
> + * Description
> + * Perform a lookup in *map* for an entry associated to *key*.
> + * Return
> + * Map value associated to *key*, or **NULL** if no entry was
> + * found.
> + *
> + * int bpf_map_update_elem(struct bpf_map *map, void *key, void *value, u64 flags)
const void *key, const void *value
> + * Description
> + * Add or update the value of the entry associated to *key* in
> + * *map* with *value*. *flags* is one of:
> + *
> + * **BPF_NOEXIST**
> + * The entry for *key* must not exist in the map.
> + * **BPF_EXIST**
> + * The entry for *key* must already exist in the map.
> + * **BPF_ANY**
> + * No condition on the existence of the entry for *key*.
> + *
> + * Flag value **BPF_NOEXIST** cannot be used for maps of types
> + * **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all
> + * elements always exist), the helper would return an error.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_map_delete_elem(struct bpf_map *map, void *key)
const void *key
> + * Description
> + * Delete entry with *key* from *map*.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_probe_read(void *dst, u32 size, const void *src)
> + * Description
> + * For tracing programs, safely attempt to read *size* bytes from
> + * address *src* and store the data in *dst*.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * u64 bpf_ktime_get_ns(void)
> + * Description
> + * Return the time elapsed since system boot, in nanoseconds.
> + * Return
> + * Current *ktime*.
> + *
> + * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
> + * Description
> + * This helper is a "printk()-like" facility for debugging. It
> + * prints a message defined by format *fmt* (of size *fmt_size*)
> + * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
> + * available. It can take up to three additional **u64**
> + * arguments (as an eBPF helpers, the total number of arguments is
> + * limited to five).
> + *
> + * Each time the helper is called, it appends a line to the trace.
> + * The format of the trace is customizable, and the exact output
> + * one will get depends on the options set in
> + * *\/sys/kernel/debug/tracing/trace_options* (see also the
> + * *README* file under the same directory). However, it usually
> + * defaults to something like:
> + *
> + * ::
> + *
> + * telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
> + *
> + * In the above:
> + *
> + * * ``telnet`` is the name of the current task.
> + * * ``470`` is the PID of the current task.
> + * * ``001`` is the CPU number on which the task is
> + * running.
> + * * In ``.N..``, each character refers to a set of
> + * options (whether irqs are enabled, scheduling
> + * options, whether hard/softirqs are running, level of
> + * preempt_disabled respectively). **N** means that
> + * **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
> + * are set.
> + * * ``419421.045894`` is a timestamp.
> + * * ``0x00000001`` is a fake value used by BPF for the
> + * instruction pointer register.
> + * * ``<formatted msg>`` is the message formatted with
> + * *fmt*.
> + *
> + * The conversion specifiers supported by *fmt* are similar, but
> + * more limited than for printk(). They are **%d**, **%i**,
> + * **%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, **%lld**,
> + * **%lli**, **%llu**, **%llx**, **%p**, **%s**. No modifier (size
> + * of field, padding with zeroes, etc.) is available, and the
> + * helper will return **-EINVAL** (but print nothing) if it
> + * encounters an unknown specifier.
> + *
> + * Also, note that **bpf_trace_printk**\ () is slow, and should
> + * only be used for debugging purposes. For this reason, a notice
> + * bloc (spanning several lines) is printed to kernel logs and
> + * states that the helper should not be used "for production use"
> + * the first time this helper is used (or more precisely, when
> + * **trace_printk**\ () buffers are allocated). For passing values
> + * to user space, perf events should be preferred.
> + * Return
> + * The number of bytes written to the buffer, or a negative error
> + * in case of failure.
> + *
> + * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
> + * Description
> + * Store *len* bytes from address *from* into the packet
> + * associated to *skb*, at *offset*. *flags* are a combination of
> + * **BPF_F_RECOMPUTE_CSUM** (automatically recompute the
> + * checksum for the packet after storing the bytes) and
> + * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\
> + * **->swhash** and *skb*\ **->l4hash** to 0).
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
Perhaps: to change the underlying packet buffer. (It's not the data itself, but a
potential reallocation, e.g. to unclone the skb, which would otherwise lead to a
use-after-free if the verifier did not force the pointers to be invalidated.)
> + * previously done by the verifier are invalidated and must be
> + * performed again.
I would add something like "if used in combination with direct packet access",
since that is the only case where this comment is relevant.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 size)
> + * Description
> + * Recompute the IP checksum for the packet associated to *skb*.
Perhaps we could say something like 'L3 (e.g. IP)'. The helper itself has zero
knowledge of the underlying protocol used. It's solely replacing the csum and
updating skb->csum when necessary.
> + * Computation is incremental, so the helper must know the former
> + * value of the header field that was modified (*from*), the new
> + * value of this field (*to*), and the number of bytes (2 or 4)
> + * for this field, stored in *size*. Alternatively, it is possible
> + * to store the difference between the previous and the new values
> + * of the header field in *to*, by setting *from* and *size* to 0.
I would add that this works in combination with the csum_diff() helper, which
allows for more flexibility and for handling sizes larger than 2 or 4.
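As background for why the helper needs both the old and the new field value, here is a minimal sketch of an incremental ones'-complement checksum update for a single 16-bit field, following the formula from RFC 1624. This is not the kernel's implementation, and all function names below are hypothetical; it only shows that updating from (*from*, *to*) gives the same result as recomputing the checksum over the whole header.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over n 16-bit words; the checksum field itself must be 0.
 * Intended for short headers only, so the accumulator cannot overflow. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~csum_fold32(sum);
}

/* Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'),
 * where m is the old field value and m' the new one. */
static uint16_t csum_replace16(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return (uint16_t)~csum_fold32(sum);
}
```

For a header whose TTL/protocol word changes from 0x4001 to 0x3f01, csum_replace16(old_check, 0x4001, 0x3f01) should match a full recomputation over the modified words, without touching the rest of the header.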
> + * For both methods, *offset* indicates the location of the IP
> + * checksum within the packet.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 flags)
> + * Description
> + * Recompute the TCP or UDP checksum for the packet associated to
See prior comment, L4 (e.g. TCP, UDP or ICMP).
> + * *skb*. Computation is incremental, so the helper must know the
> + * former value of the header field that was modified (*from*),
> + * the new value of this field (*to*), and the number of bytes (2
> + * or 4) for this field, stored on the lowest four bits of
> + * *flags*. Alternatively, it is possible to store the difference
> + * between the previous and the new values of the header field in
> + * *to*, by setting *from* and the four lowest bits of *flags* to
Same reference for csum_diff().
> + * 0. For both methods, *offset* indicates the location of the IP
> + * checksum within the packet. In addition to the size of the
> + * field, *flags* can be added (bitwise OR) actual flags. With
> + * **BPF_F_MARK_MANGLED_0**, a null checksum is left untouched
> + * (unless **BPF_F_MARK_ENFORCE** is added as well), and for
> + * updates resulting in a null checksum the value is set to
> + * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR**
> + * indicates the checksum is to be computed against a
> + * pseudo-header.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
> + * Description
> + * This special helper is used to trigger a "tail call", or in
> + * other words, to jump into another eBPF program. The same stack
> + * frame is used (but values on stack and in registers for the
> + * caller are not accessible to the callee). This mechanism allows
> + * for program chaining, either for raising the maximum number of
> + * available eBPF instructions, or to execute given programs in
> + * conditional blocks. For security reasons, there is an upper
> + * limit to the number of successive tail calls that can be
> + * performed.
> + *
> + * Upon call of this helper, the program attempts to jump into a
> + * program referenced at index *index* in *prog_array_map*, a
> + * special map of type **BPF_MAP_TYPE_PROG_ARRAY**, and passes
> + * *ctx*, a pointer to the context.
> + *
> + * If the call succeeds, the kernel immediately runs the first
> + * instruction of the new program. This is not a function call,
> + * and it never goes back to the previous program. If the call
Nit: s/goes back/returns/
> + * fails, then the helper has no effect, and the caller continues
> + * to run its own instructions. A call can fail if the destination
Maybe: s/to run its own/to run its subsequent/
> + * program for the jump does not exist (i.e. *index* is superior
> + * to the number of entries in *prog_array_map*), or if the
> + * maximum number of tail calls has been reached for this chain of
> + * programs. This limit is defined in the kernel by the macro
> + * **MAX_TAIL_CALL_CNT** (not accessible to user space), which
> + * is currently set to 32.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
> + * Description
> + * Clone and redirect the packet associated to *skb* to another
> + * net device of index *ifindex*. Both ingress and egress
> + * interfaces can be used for redirection. The **BPF_F_INGRESS**
> + * value in *flags* is used to make the distinction (ingress path
> + * is selected if the flag is present, egress path otherwise).
> + * This is the only flag supported for now.
It would probably make sense to describe the relation to the bpf_redirect()
helper in one sentence: bpf_clone_redirect() has the associated cost of
duplicating the skb, but the redirect is performed from within the BPF prog,
whereas this is not the case with bpf_redirect(), which is therefore more
efficient but is handled through an action code, with the redirect happening
after the BPF prog has returned.
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
>
* [PATCH 2/2] atomic_ops.rst: Use `warning` rst directive
From: SeongJae Park @ 2018-04-19 8:42 UTC
To: paulmck, corbet; +Cc: linux-kernel, linux-doc, SeongJae Park
In-Reply-To: <20180419084245.17096-1-sj38.park@gmail.com>
One warning message in 'atomic_ops.rst' does not use the 'warning' rst
directive while the others do. This commit modifies the message to use the
'warning' rst directive.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
Documentation/core-api/atomic_ops.rst | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/Documentation/core-api/atomic_ops.rst b/Documentation/core-api/atomic_ops.rst
index 4ea4af71e68a..2e7165f86f55 100644
--- a/Documentation/core-api/atomic_ops.rst
+++ b/Documentation/core-api/atomic_ops.rst
@@ -466,10 +466,12 @@ Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.
-WARNING! It is incredibly important that the value be a boolean,
-ie. "0" or "1". Do not try to be fancy and save a few instructions by
-declaring the above to return "long" and just returning something like
-"old_val & mask" because that will not work.
+
+.. warning::
+ It is incredibly important that the value be a boolean, ie. "0" or "1".
+ Do not try to be fancy and save a few instructions by declaring the
+ above to return "long" and just returning something like "old_val &
+ mask" because that will not work.
For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
--
2.13.0
* [PATCH 1/2] atomic_ops.rst: Fix wrong example code
From: SeongJae Park @ 2018-04-19 8:42 UTC
To: paulmck, corbet; +Cc: linux-kernel, linux-doc, SeongJae Park
The example code snippets showing why READ_ONCE() and WRITE_ONCE() are
necessary have an unnecessary line of code and a wrong condition. This commit
fixes them.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
Documentation/core-api/atomic_ops.rst | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/Documentation/core-api/atomic_ops.rst b/Documentation/core-api/atomic_ops.rst
index fce929144ccd..4ea4af71e68a 100644
--- a/Documentation/core-api/atomic_ops.rst
+++ b/Documentation/core-api/atomic_ops.rst
@@ -111,7 +111,6 @@ If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following::
- tmp = a;
if (a > 0)
for (;;)
do_something();
@@ -119,7 +118,7 @@ the following::
If you don't want the compiler to do this (and you probably don't), then
you should use something like the following::
- while (READ_ONCE(a) < 0)
+ while (READ_ONCE(a) > 0)
do_something();
Alternatively, you could place a barrier() call in the loop.
--
2.13.0
* Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification
From: gengdongjiu @ 2018-04-19 4:09 UTC
To: James Morse
Cc: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
catalin.marinas, rjw, bp, lenb, kvm, linux-doc, linux-kernel,
linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
zhengxiang9
In-Reply-To: <eb65baf4-abb0-570f-f3af-61f505c9de10@arm.com>
James,
>
>> I do not know when it is merge-window. About the apply version, it does not have limited.
>
> 'git fetch' Linus' tree and look at the tags. 'v4.16' lost its '-rc' suffixes,
> and there isn't a 'v4.17-rc1' yet, so we are still in the merge window.
>
> Linus sends a message to LKML. eg:
> https://lkml.org/lkml/2018/4/1/175
>
> net-next closes shortly before the merge window, and re-opens afterwards. There
> is a handy web page:
> http://vger.kernel.org/~davem/net-next.html
Thanks for this information, got it.
* Re: [RESEND PATCH] x86/boot/KASLR: Extend movable_node option for KASLR
From: Dou Liyang @ 2018-04-19 2:09 UTC (permalink / raw)
To: linux-kernel, x86, linux-doc
Cc: tglx, mingo, hpa, keescook, bhe, fanc.fnst, indou.takao
In-Reply-To: <20180403033612.19925-1-douly.fnst@cn.fujitsu.com>
Hi Ingo,
Any comments about that?
Currently, when users want to support node hotplug with KASLR, they use
'mem=' to restrict the boot-up memory to the size of the first node.
But then, if we want to boot up with some hotpluggable nodes, their memory
can't be shown.
IMO, only a few machines support physical NUMA node hotplug, and
we can't get memory hotplug info from ACPI SRAT early enough for now (if we
could do that, we could even remove the 'movable_node' option).
So, IMO, we should extend movable_node to replace the misuse of the 'mem'
option.
Thoughts?
Thanks,
dou
At 04/03/2018 11:36 AM, Dou Liyang wrote:
> The movable_node option is a boot-time switch to make sure the physical
> NUMA nodes can be hot-added/removed when ACPI table can't be parsed to
> provide the memory hotplug information.
>
> As we all know, there is always one node, called "home node", which
> can't be moved and in which the kernel image resides. With the movable_node
> option, Linux allocates new early memory near the kernel image to avoid
> using the other movable nodes.
>
> But, because KASLR also can't get the memory hotplug information, it may
> randomize the kernel image into a movable node, which breaks the rule of the
> movable_node option and makes the physical hot-add/remove operation fail.
>
> The perfect solution is providing the memory hotplug information to KASLR.
> But, it needs the efforts from hardware engineers and software engineers.
>
> Here is an alternative method. Extend movable_node option to restrict kernel
> to be randomized in the home node by adding a parameter. this parameter sets
> up the boundaries between the home nodes and other nodes.
>
> Reported-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> ---
> Changelog:
> -Rewrite the commit log and document.
>
> Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++--
> arch/x86/boot/compressed/kaslr.c | 19 ++++++++++++++++---
> 2 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1d1d53f85ddd..0cfc0b10a117 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2353,7 +2353,8 @@
> mousedev.yres= [MOUSE] Vertical screen resolution, used for devices
> reporting absolute coordinates, such as tablets
>
> - movablecore=nn[KMG] [KNL,X86,IA-64,PPC] This parameter
> + movablecore=nn[KMG]
> + [KNL,X86,IA-64,PPC] This parameter
> is similar to kernelcore except it specifies the
> amount of memory used for migratable allocations.
> If both kernelcore and movablecore is specified,
> @@ -2363,12 +2364,19 @@
> that the amount of memory usable for all allocations
> is not too small.
>
> - movable_node [KNL] Boot-time switch to make hotplugable memory
> + movable_node [KNL] Boot-time switch to make hot-pluggable memory
> NUMA nodes to be movable. This means that the memory
> of such nodes will be usable only for movable
> allocations which rules out almost all kernel
> allocations. Use with caution!
>
> + movable_node=nn[KMG]
> + [KNL] Extend movable_node to make it work well with KASLR.
> + This parameter is the boundaries between the "home node" and
> + the other nodes. The "home node" is an immovable node and is
> + defined by BIOS. Set the 'nn' to the memory size of "home
> + node", the kernel image will be extracted in immovable nodes.
> +
> MTD_Partition= [MTD]
> Format: <name>,<region-number>,<size>,<offset>
>
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 8199a6187251..f906d7890e69 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -92,7 +92,10 @@ struct mem_vector {
> static bool memmap_too_large;
>
>
> -/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
> +/*
> + * Store memory limit specified by the following situations:
> + * "mem=nn[KMG]" or "memmap=nn[KMG]" or "movable_node=nn[KMG]"
> + */
> unsigned long long mem_limit = ULLONG_MAX;
>
>
> @@ -214,7 +217,8 @@ static int handle_mem_memmap(void)
> char *param, *val;
> u64 mem_size;
>
> - if (!strstr(args, "memmap=") && !strstr(args, "mem="))
> + if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
> + !strstr(args, "movable_node="))
> return 0;
>
> tmp_cmdline = malloc(len + 1);
> @@ -249,7 +253,16 @@ static int handle_mem_memmap(void)
> free(tmp_cmdline);
> return -EINVAL;
> }
> - mem_limit = mem_size;
> + mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
> + } else if (!strcmp(param, "movable_node")) {
> + char *p = val;
> +
> + mem_size = memparse(p, &p);
> + if (mem_size == 0) {
> + free(tmp_cmdline);
> + return -EINVAL;
> + }
> + mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
> }
> }
>
>
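
The clamping behaviour introduced by the quoted hunk (each of "mem=", "memmap=" and "movable_node=" may only lower mem_limit, never raise it) can be sketched in user space. This is a minimal illustration, not the kernel code: parse_size() is a hypothetical stand-in for memparse() built on strtoull(), and clamp_limit() and NO_LIMIT are names invented for this example.

```c
#include <stdlib.h>
#include <limits.h>

/* Initial value meaning "no limit", as in the quoted patch. */
#define NO_LIMIT ULLONG_MAX

/* Hypothetical user-space stand-in for the kernel's memparse():
 * parses "nn[KMG]" and returns the size in bytes. */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Lower '*limit' to the parsed size if that size is smaller.
 * Returns 0 on success, -1 if the value parses to zero. */
static int clamp_limit(unsigned long long *limit, const char *val)
{
	unsigned long long size = parse_size(val);

	if (size == 0)
		return -1;
	if (size < *limit)
		*limit = size;
	return 0;
}
```

Starting from NO_LIMIT and feeding "4G", then "512M", then "8G" leaves the limit at 512 MiB, which is the point of replacing the plain assignment with a minimum.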
* Re: [PATCH bpf-next v3 6/8] bpf: add documentation for eBPF helpers (42-50)
From: Martin KaFai Lau @ 2018-04-18 23:42 UTC (permalink / raw)
To: Quentin Monnet
Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man, Kaixu Xia,
Sargun Dhillon, Thomas Graf, Gianluca Borello, Chenbo Feng
In-Reply-To: <20180417143438.7018-7-quentin.monnet@netronome.com>
On Tue, Apr 17, 2018 at 03:34:36PM +0100, Quentin Monnet wrote:
[...]
> @@ -965,6 +984,17 @@ union bpf_attr {
> * Return
> * 0 on success, or a negative error in case of failure.
> *
> + * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 index)
> + * Description
> + * Check whether *skb* is a descendant of the cgroup2 held by
> + * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
> + * Return
> + * The return value depends on the result of the test, and can be:
> + *
> + * * 0, if the *skb* failed the cgroup2 descendant test.
> + * * 1, if the *skb* succeeded the cgroup2 descendant test.
> + * * A negative error code, if an error occurred.
> + *
[...]
> + * int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
> + * Description
> + * Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
> + * it is possible to use a negative value for *delta*. This helper
> + * can be used to prepare the packet for pushing or popping
> + * headers.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
LGTM. Thanks!
Acked-by: Martin KaFai Lau <kafai@fb.com>
* Re: [PATCH bpf-next v3 6/8] bpf: add documentation for eBPF helpers (42-50)
From: Alexei Starovoitov @ 2018-04-18 23:29 UTC (permalink / raw)
To: Quentin Monnet
Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man, Kaixu Xia,
Martin KaFai Lau, Sargun Dhillon, Thomas Graf, Gianluca Borello,
Chenbo Feng
In-Reply-To: <20180417143438.7018-7-quentin.monnet@netronome.com>
On Tue, Apr 17, 2018 at 03:34:36PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions:
>
> Helper from Kaixu:
> - bpf_perf_event_read()
>
> Helpers from Martin:
> - bpf_skb_under_cgroup()
> - bpf_xdp_adjust_head()
>
> Helpers from Sargun:
> - bpf_probe_write_user()
> - bpf_current_task_under_cgroup()
>
> Helper from Thomas:
> - bpf_skb_change_head()
>
> Helper from Gianluca:
> - bpf_probe_read_str()
>
> Helpers from Chenbo:
> - bpf_get_socket_cookie()
> - bpf_get_socket_uid()
>
> v3:
> - bpf_perf_event_read(): Fix time of selection for perf event type in
> description. Remove occurrences of "cores" to avoid confusion with
> "CPU".
>
> Cc: Kaixu Xia <xiakaixu@huawei.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Sargun Dhillon <sargun@sargun.me>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Gianluca Borello <g.borello@gmail.com>
> Cc: Chenbo Feng <fengc@google.com>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
...
> + *
> + * u64 bpf_get_socket_cookie(struct sk_buff *skb)
> + * Description
> + * Retrieve the socket cookie generated by the kernel from a
> + * **struct sk_buff** with a known socket. If none has been set
this bit could use some improvement, since it reads as if the cookie is
generated from the sk_buff, whereas it has nothing to do with this particular
sk_buff. The cookie belongs to the socket and is generated for the socket.
It would be good to explain that the cookie is stable for the life of the
socket.
For the rest:
Acked-by: Alexei Starovoitov <ast@kernel.org>
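
The distinction made in the review comment above (the cookie belongs to the socket, not to any one sk_buff, and stays stable for the socket's lifetime) can be modelled in a few lines of user-space C. Everything here is hypothetical and for illustration only: struct model_sock, struct model_skb and model_get_socket_cookie() are invented names, the lazy generation from a global counter is an assumption of the model, and none of this is the kernel's implementation.

```c
#include <stdint.h>

/* Hypothetical model of a socket: the cookie is stored in the socket. */
struct model_sock {
	uint64_t cookie;   /* 0 means "not generated yet" */
};

/* Hypothetical model of a packet buffer that may reference a socket. */
struct model_skb {
	struct model_sock *sk;
};

static uint64_t cookie_gen;   /* global generator for new cookies */

/* Return the cookie of the socket behind 'skb', generating it on first
 * use. In this model, 0 is returned when the buffer has no socket. */
static uint64_t model_get_socket_cookie(struct model_skb *skb)
{
	if (!skb->sk)
		return 0;
	if (skb->sk->cookie == 0)
		skb->sk->cookie = ++cookie_gen;
	return skb->sk->cookie;
}
```

Two buffers that point at the same model socket yield the same cookie on every call, while a buffer attached to a different socket yields a different one.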
* Re: [PATCH bpf-next v3 5/8] bpf: add documentation for eBPF helpers (33-41)
From: Alexei Starovoitov @ 2018-04-18 22:23 UTC (permalink / raw)
To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-6-quentin.monnet@netronome.com>
On Tue, Apr 17, 2018 at 03:34:35PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Daniel:
>
> - bpf_get_hash_recalc()
> - bpf_skb_change_tail()
> - bpf_skb_pull_data()
> - bpf_csum_update()
> - bpf_set_hash_invalid()
> - bpf_get_numa_node_id()
> - bpf_set_hash()
> - bpf_skb_adjust_room()
> - bpf_xdp_adjust_meta()
>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [PATCH bpf-next v3 4/8] bpf: add documentation for eBPF helpers (23-32)
From: Alexei Starovoitov @ 2018-04-18 22:11 UTC (permalink / raw)
To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-5-quentin.monnet@netronome.com>
On Tue, Apr 17, 2018 at 03:34:34PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Daniel:
>
> - bpf_get_prandom_u32()
> - bpf_get_smp_processor_id()
> - bpf_get_cgroup_classid()
> - bpf_get_route_realm()
> - bpf_skb_load_bytes()
> - bpf_csum_diff()
> - bpf_skb_get_tunnel_opt()
> - bpf_skb_set_tunnel_opt()
> - bpf_skb_change_proto()
> - bpf_skb_change_type()
>
> v3:
> - bpf_get_prandom_u32(): Fix helper name :(. Add description, including
> a note on the internal random state.
> - bpf_get_smp_processor_id(): Add description, including a note on the
> processor id remaining stable during program run.
> - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
> required to use the helper. Add a reference to related documentation.
> State that placing a task in net_cls controller disables cgroup-bpf.
> - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
> required to use this helper.
> - bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [PATCH bpf-next v3 3/8] bpf: add documentation for eBPF helpers (12-22)
From: Alexei Starovoitov @ 2018-04-18 22:10 UTC (permalink / raw)
To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-4-quentin.monnet@netronome.com>
On Tue, Apr 17, 2018 at 03:34:33PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
>
> - bpf_get_current_pid_tgid()
> - bpf_get_current_uid_gid()
> - bpf_get_current_comm()
> - bpf_skb_vlan_push()
> - bpf_skb_vlan_pop()
> - bpf_skb_get_tunnel_key()
> - bpf_skb_set_tunnel_key()
> - bpf_redirect()
> - bpf_perf_event_output()
> - bpf_get_stackid()
> - bpf_get_current_task()
>
> v3:
> - bpf_skb_get_tunnel_key(): Change and improve description and example.
> - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
> - bpf_perf_event_output(): Fix first sentence of description. Delete
> wrong statement on context being evaluated as a struct pt_reg. Remove
> the long yet incomplete example.
> - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
> configurable.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
looks great.
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [PATCH v3 07/10] Documentation: dt-bindings: Add documents for PECI hwmon client drivers
From: Jae Hyun Yoo @ 2018-04-18 21:57 UTC (permalink / raw)
To: Rob Herring
Cc: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
Guenter Roeck, Haiyue Wang, James Feist, Jason M Biils,
Jean Delvare, Joel Stanley, Julia Cartwright, Miguel Ojeda,
Milton Miller II, Pavel Machek, Randy Dunlap, Stef van Os,
Sumeet R Pawnikar, Vernon Mauery, linux-kernel@vger.kernel.org,
linux-doc, devicetree, Linux HWMON List,
moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
OpenBMC Maillist
In-Reply-To: <CAL_JsqLpuNj4kQ8oXB0kxOsS7ww9Jk-oq4tJrroDKkLRTPrjSA@mail.gmail.com>
On 4/18/2018 2:28 PM, Rob Herring wrote:
> On Wed, Apr 18, 2018 at 3:28 PM, Jae Hyun Yoo
> <jae.hyun.yoo@linux.intel.com> wrote:
>> On 4/18/2018 7:32 AM, Rob Herring wrote:
>>>
>>> On Tue, Apr 17, 2018 at 3:40 PM, Jae Hyun Yoo
>>> <jae.hyun.yoo@linux.intel.com> wrote:
>>>>
>>>> On 4/16/2018 4:51 PM, Jae Hyun Yoo wrote:
>>>>>
>>>>>
>>>>> On 4/16/2018 4:22 PM, Jae Hyun Yoo wrote:
>>>>>>
>>>>>>
>>>>>> On 4/16/2018 11:14 AM, Rob Herring wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 10, 2018 at 11:32:09AM -0700, Jae Hyun Yoo wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> This commit adds dt-bindings documents for PECI cputemp and dimmtemp
>>>>>>>> client
>>>>>>>> drivers.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>>> +Example:
>>>>>>>> + peci-bus@0 {
>>>>>>>> + #address-cells = <1>;
>>>>>>>> + #size-cells = <0>;
>>>>>>>> + < more properties >
>>>>>>>> +
>>>>>>>> + peci-dimmtemp@cpu0 {
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> unit-address is wrong.
>>>>>>>
>>>>>>
>>>>>> Will fix it using the reg value.
>>>>>>
>>>>>>> It is a different bus from cputemp? Otherwise, you have conflicting
>>>>>>> addresses. If that's the case, probably should make it clear by
>>>>>>> showing
>>>>>>> different host adapters for each example.
>>>>>>>
>>>>>>
>>>>>> It could be the same bus with cputemp. Also, client address sharing is
>>>>>> possible by PECI core if the functionality is different. I mean,
>>>>>> cputemp and
>>>>>> dimmtemp targeting the same client is possible case like this.
>>>>>> peci-cputemp@30
>>>>>> peci-dimmtemp@30
>>>>>>
>>>>>
>>>>> Oh, I got your point. Probably, I should change these separate settings
>>>>> into one like
>>>>>
>>>>> peci-client@30 {
>>>>> compatible = "intel,peci-client";
>>>>> reg = <0x30>;
>>>>> };
>>>>>
>>>>> Then cputemp and dimmtemp drivers could refer the same compatible
>>>>> string.
>>>>> Will rewrite it.
>>>>>
>>>>
>>>> I've checked it again and realized that it should use function based node
>>>> name like:
>>>>
>>>> peci-cputemp@30
>>>> peci-dimmtemp@30
>>>>
>>>> If it use the same string like 'peci-client@30', the drivers cannot be
>>>> selectively enabled. The client address sharing way is well handled in
>>>> PECI
>>>> core and this way would be better for the future implementations of other
>>>> PECI functional drivers such as crash dump driver and so on. So I'm going
>>>> change the unit-address only.
>>>
>>>
>>> 2 nodes at the same address is wrong (and soon dtc will warn you on
>>> this). You have 2 potential options. The first is you need additional
>>> address information in the DT if these are in fact 2 independent
>>> devices. This could be something like a function number to use
>>> something from PCI addressing. From what I found on PECI, it doesn't
>>> seem to have anything like that. The 2nd option is you have a single
>>> DT node which registers multiple hwmon devices. DT nodes and drivers
>>> don't have to be 1-1. Don't design your DT nodes from how you want to
>>> partition drivers in some OS.
>>>
>>> Rob
>>>
>>
>> Please correct me if I'm wrong but I'm still thinking that it is
>> possible. Also, I did compile it but dtc doesn't make a warning. Let me
>> show an another use case which is similar to this case:
>
> I did say *soon*. It's in dtc repo, but not the kernel copy yet.
>
>> In arch/arm/boot/dts/aspeed-g5.dtsi
>> [...]
>> lpc_host: lpc-host@80 {
>> compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon";
>> reg = <0x80 0x1e0>;
>> reg-io-width = <4>;
>>
>> #address-cells = <1>;
>> #size-cells = <1>;
>> ranges = <0x0 0x80 0x1e0>;
>>
>> lpc_ctrl: lpc-ctrl@0 {
>> compatible = "aspeed,ast2500-lpc-ctrl";
>> reg = <0x0 0x80>;
>> clocks = <&syscon ASPEED_CLK_GATE_LCLK>;
>> status = "disabled";
>> };
>>
>> lpc_snoop: lpc-snoop@0 {
>> compatible = "aspeed,ast2500-lpc-snoop";
>> reg = <0x0 0x80>;
>> interrupts = <8>;
>> status = "disabled";
>> };
>> }
>> [...]
>>
>> This is device tree setting for LPC interface and its child nodes.
>> LPC interface can be used as a multi-functional interface such as
>> snoop 80, KCS, SIO and so on. In this use case, lpc-ctrl@0 and
>> lpc-snoop@0 are sharing their address range from their individual
>> driver modules and they can be registered quite well through both
>> static dt or dynamic dtoverlay. PECI is also a multi-functional
>> interface which is similar to the above case, I think.
>
> This case too is poor design and should be fixed as well. Simply put,
> you can't have 2 devices on a bus at the same address without some sort
> of mux or arbitration device in the middle. If you have a device/block
> with multiple functions provided to the OS, then it is the OS's
> problem to arbitrate access. It is not a DT problem because OSes can
> vary in how they handle that both from OS to OS and over time.
>
> Rob
>
If I change it to a single DT node which registers 2 hwmon devices using
the 2nd option above, then I still have 2 devices on a bus at the same
address. Doesn't that also cause a problem for the OS?
Jae
* Re: [PATCH v3 07/10] Documentation: dt-bindings: Add documents for PECI hwmon client drivers
From: Rob Herring @ 2018-04-18 21:28 UTC (permalink / raw)
To: Jae Hyun Yoo
Cc: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
Guenter Roeck, Haiyue Wang, James Feist, Jason M Biils,
Jean Delvare, Joel Stanley, Julia Cartwright, Miguel Ojeda,
Milton Miller II, Pavel Machek, Randy Dunlap, Stef van Os,
Sumeet R Pawnikar, Vernon Mauery, linux-kernel@vger.kernel.org,
linux-doc, devicetree, Linux HWMON List,
moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
OpenBMC Maillist
In-Reply-To: <1f2a86ff-b902-1d1d-488a-807ac1dd20cc@linux.intel.com>
On Wed, Apr 18, 2018 at 3:28 PM, Jae Hyun Yoo
<jae.hyun.yoo@linux.intel.com> wrote:
> On 4/18/2018 7:32 AM, Rob Herring wrote:
>>
>> On Tue, Apr 17, 2018 at 3:40 PM, Jae Hyun Yoo
>> <jae.hyun.yoo@linux.intel.com> wrote:
>>>
>>> On 4/16/2018 4:51 PM, Jae Hyun Yoo wrote:
>>>>
>>>>
>>>> On 4/16/2018 4:22 PM, Jae Hyun Yoo wrote:
>>>>>
>>>>>
>>>>> On 4/16/2018 11:14 AM, Rob Herring wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 10, 2018 at 11:32:09AM -0700, Jae Hyun Yoo wrote:
>>>>>>>
>>>>>>>
>>>>>>> This commit adds dt-bindings documents for PECI cputemp and dimmtemp
>>>>>>> client
>>>>>>> drivers.
>>>>>>
>>>>>>
>>>>>>
>>>
>>> [...]
>>>
>>>>>>> +Example:
>>>>>>> + peci-bus@0 {
>>>>>>> + #address-cells = <1>;
>>>>>>> + #size-cells = <0>;
>>>>>>> + < more properties >
>>>>>>> +
>>>>>>> + peci-dimmtemp@cpu0 {
>>>>>>
>>>>>>
>>>>>>
>>>>>> unit-address is wrong.
>>>>>>
>>>>>
>>>>> Will fix it using the reg value.
>>>>>
>>>>>> It is a different bus from cputemp? Otherwise, you have conflicting
>>>>>> addresses. If that's the case, probably should make it clear by
>>>>>> showing
>>>>>> different host adapters for each example.
>>>>>>
>>>>>
>>>>> It could be the same bus with cputemp. Also, client address sharing is
>>>>> possible by PECI core if the functionality is different. I mean,
>>>>> cputemp and
>>>>> dimmtemp targeting the same client is possible case like this.
>>>>> peci-cputemp@30
>>>>> peci-dimmtemp@30
>>>>>
>>>>
>>>> Oh, I got your point. Probably, I should change these separate settings
>>>> into one like
>>>>
>>>> peci-client@30 {
>>>> compatible = "intel,peci-client";
>>>> reg = <0x30>;
>>>> };
>>>>
>>>> Then cputemp and dimmtemp drivers could refer the same compatible
>>>> string.
>>>> Will rewrite it.
>>>>
>>>
>>> I've checked it again and realized that it should use function based node
>>> name like:
>>>
>>> peci-cputemp@30
>>> peci-dimmtemp@30
>>>
>>> If it use the same string like 'peci-client@30', the drivers cannot be
>>> selectively enabled. The client address sharing way is well handled in
>>> PECI
>>> core and this way would be better for the future implementations of other
>>> PECI functional drivers such as crash dump driver and so on. So I'm going
>>> change the unit-address only.
>>
>>
>> 2 nodes at the same address is wrong (and soon dtc will warn you on
>> this). You have 2 potential options. The first is you need additional
>> address information in the DT if these are in fact 2 independent
>> devices. This could be something like a function number to use
>> something from PCI addressing. From what I found on PECI, it doesn't
>> seem to have anything like that. The 2nd option is you have a single
>> DT node which registers multiple hwmon devices. DT nodes and drivers
>> don't have to be 1-1. Don't design your DT nodes from how you want to
>> partition drivers in some OS.
>>
>> Rob
>>
>
> Please correct me if I'm wrong but I'm still thinking that it is
> possible. Also, I did compile it but dtc doesn't make a warning. Let me
> show an another use case which is similar to this case:
I did say *soon*. It's in dtc repo, but not the kernel copy yet.
> In arch/arm/boot/dts/aspeed-g5.dtsi
> [...]
> lpc_host: lpc-host@80 {
> compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon";
> reg = <0x80 0x1e0>;
> reg-io-width = <4>;
>
> #address-cells = <1>;
> #size-cells = <1>;
> ranges = <0x0 0x80 0x1e0>;
>
> lpc_ctrl: lpc-ctrl@0 {
> compatible = "aspeed,ast2500-lpc-ctrl";
> reg = <0x0 0x80>;
> clocks = <&syscon ASPEED_CLK_GATE_LCLK>;
> status = "disabled";
> };
>
> lpc_snoop: lpc-snoop@0 {
> compatible = "aspeed,ast2500-lpc-snoop";
> reg = <0x0 0x80>;
> interrupts = <8>;
> status = "disabled";
> };
> }
> [...]
>
> This is device tree setting for LPC interface and its child nodes.
> LPC interface can be used as a multi-functional interface such as
> snoop 80, KCS, SIO and so on. In this use case, lpc-ctrl@0 and
> lpc-snoop@0 are sharing their address range from their individual
> driver modules and they can be registered quite well through both
> static dt or dynamic dtoverlay. PECI is also a multi-functional
> interface which is similar to the above case, I think.
This case too is poor design and should be fixed as well. Simply put,
you can't have 2 devices on a bus at the same address without some sort
of mux or arbitration device in the middle. If you have a device/block
with multiple functions provided to the OS, then it is the OS's
problem to arbitrate access. It is not a DT problem because OSes can
vary in how they handle that both from OS to OS and over time.
Rob
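
The kind of check discussed in this thread (two sibling nodes must not share a unit-address) can be sketched as a small C routine. This is an illustration under stated assumptions, not dtc's implementation: unit_address() and has_duplicate_unit_address() are hypothetical names, the comparison is a plain string match on the text after '@', and the node name "peci-client@31" used in the usage note is made up for the example.

```c
#include <string.h>

/* Return the unit-address part of a node name ("name@addr"), or NULL
 * if the name contains no '@'. */
static const char *unit_address(const char *node_name)
{
	const char *at = strchr(node_name, '@');

	return at ? at + 1 : NULL;
}

/* Return 1 if any two sibling node names share a unit-address,
 * 0 otherwise. Names without a unit-address are ignored. */
static int has_duplicate_unit_address(const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const char *a = unit_address(names[i]);

		if (!a)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			const char *b = unit_address(names[j]);

			if (b && strcmp(a, b) == 0)
				return 1;
		}
	}
	return 0;
}
```

With the siblings "peci-cputemp@30" and "peci-dimmtemp@30" the function reports a duplicate; with "peci-client@30" and "peci-client@31" it does not.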
* Re: [PATCH v3 07/10] Documentation: dt-bindings: Add documents for PECI hwmon client drivers
From: Jae Hyun Yoo @ 2018-04-18 20:28 UTC (permalink / raw)
To: Rob Herring
Cc: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
Guenter Roeck, Haiyue Wang, James Feist, Jason M Biils,
Jean Delvare, Joel Stanley, Julia Cartwright, Miguel Ojeda,
Milton Miller II, Pavel Machek, Randy Dunlap, Stef van Os,
Sumeet R Pawnikar, Vernon Mauery, linux-kernel@vger.kernel.org,
linux-doc, devicetree, Linux HWMON List,
moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
OpenBMC Maillist
In-Reply-To: <CAL_JsqLcxEnCOywTtU14G4=6A5m6G=KH3oi8m1VxKSEdxuv-ww@mail.gmail.com>
On 4/18/2018 7:32 AM, Rob Herring wrote:
> On Tue, Apr 17, 2018 at 3:40 PM, Jae Hyun Yoo
> <jae.hyun.yoo@linux.intel.com> wrote:
>> On 4/16/2018 4:51 PM, Jae Hyun Yoo wrote:
>>>
>>> On 4/16/2018 4:22 PM, Jae Hyun Yoo wrote:
>>>>
>>>> On 4/16/2018 11:14 AM, Rob Herring wrote:
>>>>>
>>>>> On Tue, Apr 10, 2018 at 11:32:09AM -0700, Jae Hyun Yoo wrote:
>>>>>>
>>>>>> This commit adds dt-bindings documents for PECI cputemp and dimmtemp
>>>>>> client
>>>>>> drivers.
>>>>>
>>>>>
>>
>> [...]
>>
>>>>>> +Example:
>>>>>> + peci-bus@0 {
>>>>>> + #address-cells = <1>;
>>>>>> + #size-cells = <0>;
>>>>>> + < more properties >
>>>>>> +
>>>>>> + peci-dimmtemp@cpu0 {
>>>>>
>>>>>
>>>>> unit-address is wrong.
>>>>>
>>>>
>>>> Will fix it using the reg value.
>>>>
>>>>> It is a different bus from cputemp? Otherwise, you have conflicting
>>>>> addresses. If that's the case, probably should make it clear by showing
>>>>> different host adapters for each example.
>>>>>
>>>>
>>>> It could be the same bus with cputemp. Also, client address sharing is
>>>> possible by PECI core if the functionality is different. I mean, cputemp and
>>>> dimmtemp targeting the same client is possible case like this.
>>>> peci-cputemp@30
>>>> peci-dimmtemp@30
>>>>
>>>
>>> Oh, I got your point. Probably, I should change these separate settings
>>> into one like
>>>
>>> peci-client@30 {
>>> compatible = "intel,peci-client";
>>> reg = <0x30>;
>>> };
>>>
>>> Then cputemp and dimmtemp drivers could refer the same compatible string.
>>> Will rewrite it.
>>>
>>
>> I've checked it again and realized that it should use function based node
>> name like:
>>
>> peci-cputemp@30
>> peci-dimmtemp@30
>>
>> If it use the same string like 'peci-client@30', the drivers cannot be
>> selectively enabled. The client address sharing way is well handled in PECI
>> core and this way would be better for the future implementations of other
>> PECI functional drivers such as crash dump driver and so on. So I'm going
>> change the unit-address only.
>
> 2 nodes at the same address is wrong (and soon dtc will warn you on
> this). You have 2 potential options. The first is you need additional
> address information in the DT if these are in fact 2 independent
> devices. This could be something like a function number to use
> something from PCI addressing. From what I found on PECI, it doesn't
> seem to have anything like that. The 2nd option is you have a single
> DT node which registers multiple hwmon devices. DT nodes and drivers
> don't have to be 1-1. Don't design your DT nodes from how you want to
> partition drivers in some OS.
>
> Rob
>
Please correct me if I'm wrong, but I still think that it is
possible. Also, I did compile it and dtc doesn't emit a warning. Let me
show another use case which is similar to this one:
In arch/arm/boot/dts/aspeed-g5.dtsi
[...]
lpc_host: lpc-host@80 {
	compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon";
	reg = <0x80 0x1e0>;
	reg-io-width = <4>;

	#address-cells = <1>;
	#size-cells = <1>;
	ranges = <0x0 0x80 0x1e0>;

	lpc_ctrl: lpc-ctrl@0 {
		compatible = "aspeed,ast2500-lpc-ctrl";
		reg = <0x0 0x80>;
		clocks = <&syscon ASPEED_CLK_GATE_LCLK>;
		status = "disabled";
	};

	lpc_snoop: lpc-snoop@0 {
		compatible = "aspeed,ast2500-lpc-snoop";
		reg = <0x0 0x80>;
		interrupts = <8>;
		status = "disabled";
	};
};
[...]
This is the device tree setting for the LPC interface and its child nodes.
The LPC interface can be used as a multi-functional interface for functions
such as snoop 80, KCS, SIO and so on. In this use case, lpc-ctrl@0 and
lpc-snoop@0 share their address range between their individual
driver modules, and they can be registered quite well through either
static DT or a dynamic DT overlay. PECI is also a multi-functional
interface similar to the above case, I think.
Thanks,
Jae
* Re: [PATCH v3] docs: kernel-parameters.txt: Fix whitespace
From: Randy Dunlap @ 2018-04-18 20:09 UTC (permalink / raw)
To: Thymo van Beers, corbet; +Cc: linux-doc, linux-kernel
In-Reply-To: <20180418185136.GA27678@thinkpad>
On 04/18/18 11:51, Thymo van Beers wrote:
> Some lines used spaces instead of tabs at line start.
> This can cause mangled lines in editors due to inconsistency.
>
> Replace spaces with tabs where appropriate.
>
> Signed-off-by: Thymo van Beers <thymovanbeers@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
> ---
> Changes in v3:
> - Change indentation in intel_pstate to reduce overrunning 80-column
> mark
> - Indent "nohrst, nosrst, norst:" like "rstonce"
intel_pstate, isolcpus, maxcpus, onenand.bdry, reboot (and a few others)
do still go beyond 80 columns. Those could be fixed later (or not).
Thanks.
> Changes in v2:
> - Rebase against docs-next
> - Fix indentation modifications
>
> Documentation/admin-guide/kernel-parameters.txt | 136 ++++++++++++------------
> 1 file changed, 68 insertions(+), 68 deletions(-)
--
~Randy
* [PATCH 0/6] arm64: untag user pointers passed to the kernel
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
Hi!
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
This patchset makes a few of the kernel interfaces accept tagged user
pointers. The kernel is already able to handle user faults with tagged
pointers and has the untagged_addr macro, which this patchset reuses.
We're not trying to cover all possible ways the kernel accepts user
pointers in one patchset, so this one should be considered a start.
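As an illustration of the top-byte layout described above, here is a
minimal userspace sketch in C. The helper names (set_tag, get_tag,
clear_tag) are hypothetical and are not part of this patchset; the sketch
only assumes that the tag occupies bits 63..56 of a 64-bit user address.
The kernel's untagged_addr macro uses sign extension from bit 55 rather
than the plain mask shown here.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56				/* the tag lives in bits 63..56 */
#define TAG_MASK  (0xffULL << TAG_SHIFT)

/* Hypothetical helper: place an 8-bit tag in the top byte of an address. */
static inline uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Hypothetical helper: read the tag back from the top byte. */
static inline uint8_t get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/* Hypothetical helper: clear the tag of a user (lower-half) address. */
static inline uint64_t clear_tag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* Round trip: tagging and then clearing yields the original address. */
static inline void tag_selfcheck(void)
{
	uint64_t addr = 0x0000007fdeadbeefULL;

	assert(get_tag(set_tag(addr, 0x2a)) == 0x2a);
	assert(clear_tag(set_tag(addr, 0x2a)) == addr);
}
```

With Top Byte Ignore enabled, the hardware dereferences the tagged and the
untagged value identically; the kernel, however, compares addresses as
plain integers in several places, which is where the tag gets in the way.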
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped “mm, arm64: untag user addresses in memory syscalls”.
- Rebased onto 3eb2ce82 (4.16-rc7).
Andrey Konovalov (6):
arm64: add type casts to untagged_addr macro
uaccess: add untagged_addr definition for other arches
arm64: untag user addresses in copy_from_user and others
mm, arm64: untag user addresses in mm/gup.c
lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user
arm64: update Documentation/arm64/tagged-pointers.txt
Documentation/arm64/tagged-pointers.txt | 5 +++--
arch/arm64/include/asm/uaccess.h | 9 +++++++--
include/linux/uaccess.h | 4 ++++
lib/strncpy_from_user.c | 2 ++
lib/strnlen_user.c | 2 ++
mm/gup.c | 12 ++++++++++++
6 files changed, 30 insertions(+), 4 deletions(-)
--
2.17.0.484.g0c8726318c-goog
* [PATCH 2/6] uaccess: add untagged_addr definition for other arches
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
In-Reply-To: <cover.1524077494.git.andreyknvl@google.com>
To allow arm64 syscalls to accept tagged pointers from userspace, we must
untag them when they are passed to the kernel. Since untagging is done in
generic parts of the kernel (like the mm subsystem), the untagged_addr
macro should be defined for all architectures.
Define it as a no-op for architectures other than arm64.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
include/linux/uaccess.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index efe79c1cdd47..c045b4eff95e 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -13,6 +13,10 @@
#include <asm/uaccess.h>
+#ifndef untagged_addr
+#define untagged_addr(addr) addr
+#endif
+
/*
* Architectures should provide two primitives (raw_copy_{to,from}_user())
* and get rid of their private instances of copy_{to,from}_user() and
--
2.17.0.484.g0c8726318c-goog
* [PATCH 1/6] arm64: add type casts to untagged_addr macro
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
In-Reply-To: <cover.1524077494.git.andreyknvl@google.com>
This patch makes the untagged_addr macro accept all kinds of address types
(void *, unsigned long, etc.), so callers no longer need to add a type cast
at each place where it is used. This is done by using __typeof__.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
arch/arm64/include/asm/uaccess.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e66b0fca99c2..2d6451cbaa86 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -102,7 +102,8 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
* up with a tagged userland pointer. Clear the tag to get a sane pointer to
* pass on to access_ok(), for instance.
*/
-#define untagged_addr(addr) sign_extend64(addr, 55)
+#define untagged_addr(addr) \
+ ((__typeof__(addr))sign_extend64((__u64)(addr), 55))
#define access_ok(type, addr, size) __range_ok(addr, size)
#define user_addr_max get_fs
--
2.17.0.484.g0c8726318c-goog
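The effect of the macro change above can be tried outside the kernel with
a small sketch. This is only an approximation under stated assumptions:
sign_extend64 is re-implemented locally (modelled on the kernel helper of
the same name), uint64_t stands in for __u64, a 64-bit target is assumed
so that pointers and uint64_t have the same size, and the compiler is
assumed to implement right shifts of negative values arithmetically, as
GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel helper: sign-extend from bit 'index'. */
static inline int64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}

/* Same shape as the patched macro: the result keeps the argument's type. */
#define untagged_addr(addr) \
	((__typeof__(addr))sign_extend64((uint64_t)(addr), 55))

/* The macro accepts integers and pointers without a cast at the call site. */
static inline void untag_selfcheck(void)
{
	uint64_t ul = 0x2a00007fdeadbeefULL;
	void *p = (void *)(uintptr_t)0x2a00007fdeadbeefULL;

	assert(untagged_addr(ul) == 0x0000007fdeadbeefULL);
	assert(untagged_addr(p) == (void *)(uintptr_t)0x0000007fdeadbeefULL);
}
```

Because bit 55 is copied into bits 63..56, a user address (bit 55 clear)
ends up with a zero top byte, while an address with bit 55 set ends up
with 0xff in the top byte.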
* [PATCH 3/6] arm64: untag user addresses in copy_from_user and others
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
In-Reply-To: <cover.1524077494.git.andreyknvl@google.com>
copy_from_user (and a few other similar functions) are used to copy data
from user memory into kernel memory or vice versa. Since a user can
provide a tagged pointer to one of the syscalls that use copy_from_user,
we need to correctly handle such pointers.
Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
arch/arm64/include/asm/uaccess.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 2d6451cbaa86..24a221678fe3 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -105,7 +105,8 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
#define untagged_addr(addr) \
((__typeof__(addr))sign_extend64((__u64)(addr), 55))
-#define access_ok(type, addr, size) __range_ok(addr, size)
+#define access_ok(type, addr, size) \
+ __range_ok(untagged_addr(addr), size)
#define user_addr_max get_fs
#define _ASM_EXTABLE(from, to) \
@@ -238,12 +239,15 @@ static inline void uaccess_enable_not_uao(void)
/*
* Sanitise a uaccess pointer such that it becomes NULL if above the
* current addr_limit.
+ * Also untag user pointers that have the top byte tag set.
*/
#define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr)
static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
{
void __user *safe_ptr;
+ ptr = untagged_addr(ptr);
+
asm volatile(
" bics xzr, %1, %2\n"
" csel %0, %1, xzr, eq\n"
--
2.17.0.484.g0c8726318c-goog
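To see why the range check needs the untagged value, consider the
following simplified model. It is a sketch only: USER_ADDR_LIMIT, untag,
range_ok and access_ok_untagged are hypothetical names, a 48-bit user
address space is assumed, and the real __range_ok is implemented
differently (in assembly against the current addr_limit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: user addresses lie below 2^48. */
#define USER_ADDR_LIMIT (1ULL << 48)

/* Copy bit 55 into bits 63..56 (assumes an arithmetic right shift). */
static inline uint64_t untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/* Simplified stand-in for a range check: is [addr, addr + size) in range? */
static inline bool range_ok(uint64_t addr, uint64_t size)
{
	return size <= USER_ADDR_LIMIT && addr <= USER_ADDR_LIMIT - size;
}

/* Mirrors the patched access_ok: untag first, then check the range. */
static inline bool access_ok_untagged(uint64_t addr, uint64_t size)
{
	return range_ok(untag(addr), size);
}

static inline void access_selfcheck(void)
{
	uint64_t tagged = 0x2a00007fdeadbeefULL;

	assert(!range_ok(tagged, 16));		/* the tag pushes it past the limit */
	assert(access_ok_untagged(tagged, 16));	/* passes once the tag is cleared */
}
```

Note that a kernel address still fails the check after untagging, because
sign extension from bit 55 leaves its upper bits set.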
* [PATCH 6/6] arm64: update Documentation/arm64/tagged-pointers.txt
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
In-Reply-To: <cover.1524077494.git.andreyknvl@google.com>
Add a note that work on passing tagged user pointers to the kernel via
syscalls has started, but might not be complete yet.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
Documentation/arm64/tagged-pointers.txt | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..361481283f00 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -35,8 +35,9 @@ Using non-zero address tags in any of these locations may result in an
error code being returned, a (fatal) signal being raised, or other modes
of failure.
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
+Some initial work for supporting non-zero address tags passed to the
+kernel via system calls has been done, but the kernel doesn't provide
+any guarantees at this point. Using a non-zero address tag for sp is
strongly discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
--
2.17.0.484.g0c8726318c-goog
* [PATCH 4/6] mm, arm64: untag user addresses in mm/gup.c
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
In-Reply-To: <cover.1524077494.git.andreyknvl@google.com>
mm/gup.c provides a kernel interface that accepts user addresses and
manipulates user pages directly (for example get_user_pages, which is used
by the futex syscall). Here we also need to handle the case of tagged user
pointers.
Untag addresses passed to this interface.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
mm/gup.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/mm/gup.c b/mm/gup.c
index 76af4cfeaf68..fb375de7d40d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -386,6 +386,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
struct page *page;
struct mm_struct *mm = vma->vm_mm;
+ address = untagged_addr(address);
+
*page_mask = 0;
/* make this handle hugepd */
@@ -647,6 +649,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!nr_pages)
return 0;
+ start = untagged_addr(start);
+
VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
/*
@@ -801,6 +805,8 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
int ret, major = 0;
+ address = untagged_addr(address);
+
if (unlocked)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
@@ -854,6 +860,8 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
long ret, pages_done;
bool lock_dropped;
+ start = untagged_addr(start);
+
if (locked) {
/* if VM_FAULT_RETRY can be returned, vmas become invalid */
BUG_ON(vmas);
@@ -1751,6 +1759,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
unsigned long flags;
int nr = 0;
+ start = untagged_addr(start);
+
start &= PAGE_MASK;
addr = start;
len = (unsigned long) nr_pages << PAGE_SHIFT;
@@ -1803,6 +1813,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
unsigned long addr, len, end;
int nr = 0, ret = 0;
+ start = untagged_addr(start);
+
start &= PAGE_MASK;
addr = start;
len = (unsigned long) nr_pages << PAGE_SHIFT;
--
2.17.0.484.g0c8726318c-goog
* [PATCH 5/6] lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user
From: Andrey Konovalov @ 2018-04-18 18:53 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
Robin Murphy, Al Viro, Andrey Konovalov, James Morse, Kees Cook,
Bart Van Assche, Kate Stewart, Greg Kroah-Hartman,
Thomas Gleixner, Philippe Ombredanne, Andrew Morton, Ingo Molnar,
Kirill A . Shutemov, Dan Williams, Aneesh Kumar K . V, Zi Yan,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm
Cc: Dmitry Vyukov, Kostya Serebryany, Evgeniy Stepanov, Lee Smith,
Ramana Radhakrishnan, Jacob Bramley, Ruben Ayrapetyan
In-Reply-To: <cover.1524077494.git.andreyknvl@google.com>
strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to separately handle the case of tagged user addresses as well.
Untag user pointers passed to these functions.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
lib/strncpy_from_user.c | 2 ++
lib/strnlen_user.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index b53e1b5d80f4..97467cd2bc59 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -106,6 +106,8 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
if (unlikely(count <= 0))
return 0;
+ src = untagged_addr(src);
+
max_addr = user_addr_max();
src_addr = (unsigned long)src;
if (likely(src_addr < max_addr)) {
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 60d0bbda8f5e..8b5f56466e00 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -108,6 +108,8 @@ long strnlen_user(const char __user *str, long count)
if (unlikely(count <= 0))
return 0;
+ str = untagged_addr(str);
+
max_addr = user_addr_max();
src_addr = (unsigned long)str;
if (likely(src_addr < max_addr)) {
--
2.17.0.484.g0c8726318c-goog
* [PATCH v3] docs: kernel-parameters.txt: Fix whitespace
From: Thymo van Beers @ 2018-04-18 18:51 UTC (permalink / raw)
To: corbet; +Cc: linux-doc, linux-kernel
Some lines used spaces instead of tabs at line start.
This can cause mangled lines in editors due to inconsistency.
Replace spaces with tabs where appropriate.
Signed-off-by: Thymo van Beers <thymovanbeers@gmail.com>
---
Changes in v3:
- Change indentation in intel_pstate to reduce overrunning 80-column
mark
- Indent "nohrst, nosrst, norst:" like "rstonce"
Changes in v2:
- Rebase against docs-next
- Fix indentation modifications
Documentation/admin-guide/kernel-parameters.txt | 136 ++++++++++++------------
1 file changed, 68 insertions(+), 68 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3487be79847c..865a24e4d516 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -106,11 +106,11 @@
use by PCI
Format: <irq>,<irq>...
- acpi_mask_gpe= [HW,ACPI]
+ acpi_mask_gpe= [HW,ACPI]
Due to the existence of _Lxx/_Exx, some GPEs triggered
by unsupported hardware/firmware features can result in
- GPE floodings that cannot be automatically disabled by
- the GPE dispatcher.
+ GPE floodings that cannot be automatically disabled by
+ the GPE dispatcher.
This facility can be used to prevent such uncontrolled
GPE floodings.
Format: <int>
@@ -472,10 +472,10 @@
for platform specific values (SB1, Loongson3 and
others).
- ccw_timeout_log [S390]
+ ccw_timeout_log [S390]
See Documentation/s390/CommonIO for details.
- cgroup_disable= [KNL] Disable a particular controller
+ cgroup_disable= [KNL] Disable a particular controller
Format: {name of the controller(s) to disable}
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in
@@ -641,8 +641,8 @@
hvc<n> Use the hypervisor console device <n>. This is for
both Xen and PowerPC hypervisors.
- If the device connected to the port is not a TTY but a braille
- device, prepend "brl," before the device type, for instance
+ If the device connected to the port is not a TTY but a braille
+ device, prepend "brl," before the device type, for instance
console=brl,ttyS0
For now, only VisioBraille is supported.
@@ -662,7 +662,7 @@
consoleblank= [KNL] The console blank (screen saver) timeout in
seconds. A value of 0 disables the blank timer.
- Defaults to 0.
+ Defaults to 0.
coredump_filter=
[KNL] Change the default value for
@@ -730,7 +730,7 @@
or memory reserved is below 4G.
cryptomgr.notests
- [KNL] Disable crypto self-tests
+ [KNL] Disable crypto self-tests
cs89x0_dma= [HW,NET]
Format: <dma>
@@ -746,7 +746,7 @@
Format: <port#>,<type>
See also Documentation/input/devices/joystick-parport.rst
- ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
+ ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
time. See
Documentation/admin-guide/dynamic-debug-howto.rst for
details. Deprecated, see dyndbg.
@@ -833,7 +833,7 @@
causing system reset or hang due to sending
INIT from AP to BSP.
- disable_ddw [PPC/PSERIES]
+ disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
to workaround buggy firmware.
@@ -1188,7 +1188,7 @@
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
- forcepae [X86-32]
+ forcepae [X86-32]
Forcefully enable Physical Address Extension (PAE).
Many Pentium M systems disable PAE but may have a
functionally usable PAE implementation.
@@ -1247,7 +1247,7 @@
gamma= [HW,DRM]
- gart_fix_e820= [X86_64] disable the fix e820 for K8 GART
+ gart_fix_e820= [X86_64] disable the fix e820 for K8 GART
Format: off | on
default: on
@@ -1341,11 +1341,11 @@
x86-64 are 2M (when the CPU supports "pse") and 1G
(when the CPU supports the "pdpe1gb" cpuinfo flag).
- hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC)
- terminal devices. Valid values: 0..8
- hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs.
- If specified, z/VM IUCV HVC accepts connections
- from listed z/VM user IDs only.
+ hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC)
+ terminal devices. Valid values: 0..8
+ hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs.
+ If specified, z/VM IUCV HVC accepts connections
+ from listed z/VM user IDs only.
keep_bootcon [KNL]
Do not unregister boot console at start. This is only
@@ -1353,11 +1353,11 @@
between unregistering the boot console and initializing
the real console.
- i2c_bus= [HW] Override the default board specific I2C bus speed
- or register an additional I2C bus that is not
- registered from board initialization code.
- Format:
- <bus_id>,<clkrate>
+ i2c_bus= [HW] Override the default board specific I2C bus speed
+ or register an additional I2C bus that is not
+ registered from board initialization code.
+ Format:
+ <bus_id>,<clkrate>
i8042.debug [HW] Toggle i8042 debug mode
i8042.unmask_kbd_data
@@ -1386,7 +1386,7 @@
Default: only on s2r transitions on x86; most other
architectures force reset to be always executed
i8042.unlock [HW] Unlock (ignore) the keylock
- i8042.kbdreset [HW] Reset device connected to KBD port
+ i8042.kbdreset [HW] Reset device connected to KBD port
i810= [HW,DRM]
@@ -1548,13 +1548,13 @@
programs exec'd, files mmap'd for exec, and all files
opened for read by uid=0.
- ima_template= [IMA]
+ ima_template= [IMA]
Select one of defined IMA measurements template formats.
Formats: { "ima" | "ima-ng" | "ima-sig" }
Default: "ima-ng"
ima_template_fmt=
- [IMA] Define a custom template format.
+ [IMA] Define a custom template format.
Format: { "field1|...|fieldN" }
ima.ahash_minsize= [IMA] Minimum file size for asynchronous hash usage
@@ -1597,7 +1597,7 @@
inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver
Format: <irq>
- int_pln_enable [x86] Enable power limit notification interrupt
+ int_pln_enable [x86] Enable power limit notification interrupt
integrity_audit=[IMA]
Format: { "0" | "1" }
@@ -1650,39 +1650,39 @@
0 disables intel_idle and fall back on acpi_idle.
1 to 9 specify maximum depth of C-state.
- intel_pstate= [X86]
- disable
- Do not enable intel_pstate as the default
- scaling driver for the supported processors
- passive
- Use intel_pstate as a scaling driver, but configure it
- to work with generic cpufreq governors (instead of
- enabling its internal governor). This mode cannot be
- used along with the hardware-managed P-states (HWP)
- feature.
- force
- Enable intel_pstate on systems that prohibit it by default
- in favor of acpi-cpufreq. Forcing the intel_pstate driver
- instead of acpi-cpufreq may disable platform features, such
- as thermal controls and power capping, that rely on ACPI
- P-States information being indicated to OSPM and therefore
- should be used with caution. This option does not work with
- processors that aren't supported by the intel_pstate driver
- or on platforms that use pcc-cpufreq instead of acpi-cpufreq.
- no_hwp
- Do not enable hardware P state control (HWP)
- if available.
- hwp_only
- Only load intel_pstate on systems which support
- hardware P state control (HWP) if available.
- support_acpi_ppc
- Enforce ACPI _PPC performance limits. If the Fixed ACPI
- Description Table, specifies preferred power management
- profile as "Enterprise Server" or "Performance Server",
- then this feature is turned on by default.
- per_cpu_perf_limits
- Allow per-logical-CPU P-State performance control limits using
- cpufreq sysfs interface
+ intel_pstate= [X86]
+ disable
+ Do not enable intel_pstate as the default
+ scaling driver for the supported processors
+ passive
+ Use intel_pstate as a scaling driver, but configure it
+ to work with generic cpufreq governors (instead of
+ enabling its internal governor). This mode cannot be
+ used along with the hardware-managed P-states (HWP)
+ feature.
+ force
+ Enable intel_pstate on systems that prohibit it by default
+ in favor of acpi-cpufreq. Forcing the intel_pstate driver
+ instead of acpi-cpufreq may disable platform features, such
+ as thermal controls and power capping, that rely on ACPI
+ P-States information being indicated to OSPM and therefore
+ should be used with caution. This option does not work with
+ processors that aren't supported by the intel_pstate driver
+ or on platforms that use pcc-cpufreq instead of acpi-cpufreq.
+ no_hwp
+ Do not enable hardware P state control (HWP)
+ if available.
+ hwp_only
+ Only load intel_pstate on systems which support
+ hardware P state control (HWP) if available.
+ support_acpi_ppc
+ Enforce ACPI _PPC performance limits. If the Fixed ACPI
+ Description Table, specifies preferred power management
+ profile as "Enterprise Server" or "Performance Server",
+ then this feature is turned on by default.
+ per_cpu_perf_limits
+ Allow per-logical-CPU P-State performance control limits using
+ cpufreq sysfs interface
intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
@@ -2027,7 +2027,7 @@
* [no]ncqtrim: Turn off queued DSM TRIM.
* nohrst, nosrst, norst: suppress hard, soft
- and both resets.
+ and both resets.
* rstonce: only attempt one reset during
hot-unplug link recovery
@@ -2215,7 +2215,7 @@
[KNL,SH] Allow user to override the default size for
per-device physically contiguous DMA buffers.
- memhp_default_state=online/offline
+ memhp_default_state=online/offline
[KNL] Set the initial state for the memory hotplug
onlining policy. If not specified, the default value is
set according to the
@@ -2762,7 +2762,7 @@
[X86,PV_OPS] Disable paravirtualized VMware scheduler
clock and use the default one.
- no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting.
+ no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting.
steal time is computed, but won't influence scheduler
behaviour
@@ -2823,7 +2823,7 @@
notsc [BUGS=X86-32] Disable Time Stamp Counter
nowatchdog [KNL] Disable both lockup detectors, i.e.
- soft-lockup and NMI watchdog (hard-lockup).
+ soft-lockup and NMI watchdog (hard-lockup).
nowb [ARM]
@@ -2843,7 +2843,7 @@
If the dependencies are under your control, you can
turn on cpu0_hotplug.
- nps_mtm_hs_ctr= [KNL,ARC]
+ nps_mtm_hs_ctr= [KNL,ARC]
This parameter sets the maximum duration, in
cycles, each HW thread of the CTOP can run
without interruptions, before HW switches it.
@@ -2984,7 +2984,7 @@
pci=option[,option...] [PCI] various PCI subsystem options:
earlydump [X86] dump PCI config space before the kernel
- changes anything
+ changes anything
off [X86] don't probe for the PCI bus
bios [X86-32] force use of PCI BIOS, don't access
the hardware directly. Use this if your machine
@@ -3072,7 +3072,7 @@
is enabled by default. If you need to use this,
please report a bug.
nocrs [X86] Ignore PCI host bridge windows from ACPI.
- If you need to use this, please report a bug.
+ If you need to use this, please report a bug.
routeirq Do IRQ routing for all PCI devices.
This is normally done in pci_enable_device(),
so this option is a temporary workaround
@@ -4391,7 +4391,7 @@
usbcore.initial_descriptor_timeout=
[USB] Specifies timeout for the initial 64-byte
- USB_REQ_GET_DESCRIPTOR request in milliseconds
+ USB_REQ_GET_DESCRIPTOR request in milliseconds
(default 5000 = 5.0 seconds).
usbcore.nousb [USB] Disable the USB subsystem
--
2.16.1
* Re: [PATCH v2] docs: kernel-parameters.txt: Fix whitespace
From: Randy Dunlap @ 2018-04-18 18:32 UTC (permalink / raw)
To: Thymo van Beers; +Cc: linux-doc, linux-kernel
In-Reply-To: <20180418181635.GA24031@thinkpad>
On 04/18/18 11:16, Thymo van Beers wrote:
> On Mon, Apr 16, 2018 at 03:03:47PM -0700, Randy Dunlap wrote:
>> On 04/16/18 14:49, Thymo van Beers wrote:
>>> Some lines used spaces instead of tabs at line start.
>>> This can cause mangled lines in editors due to inconsistency.
>>>
>>> Replace spaces with tabs where appropriate.
>>>
>>> Signed-off-by: Thymo van Beers <thymovanbeers@gmail.com>
>>> ---
>>> Changes in v2:
>>> - Rebase against docs-next
>>> - Fix indentation modifications
>>>
>>> Documentation/admin-guide/kernel-parameters.txt | 136 ++++++++++++------------
>>> 1 file changed, 68 insertions(+), 68 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 3487be79847c..f625f65c286f 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>
>> Most of the patch is OK IMO, but not the intel_pstate part:
>> The 2-space extra indents work fine here, while the extra tab makes a lot of the
>> lines go beyond the 80-column mark.
>>
>>> @@ -1650,39 +1650,39 @@
>>> 0 disables intel_idle and fall back on acpi_idle.
>>> 1 to 9 specify maximum depth of C-state.
>>>
>>> - intel_pstate= [X86]
>>> - disable
>>> - Do not enable intel_pstate as the default
>>> - scaling driver for the supported processors
>>> - passive
>>> - Use intel_pstate as a scaling driver, but configure it
>>> - to work with generic cpufreq governors (instead of
>>> - enabling its internal governor). This mode cannot be
>>> - used along with the hardware-managed P-states (HWP)
>>> - feature.
>>> - force
>>> - Enable intel_pstate on systems that prohibit it by default
>>> - in favor of acpi-cpufreq. Forcing the intel_pstate driver
>>> - instead of acpi-cpufreq may disable platform features, such
>>> - as thermal controls and power capping, that rely on ACPI
>>> - P-States information being indicated to OSPM and therefore
>>> - should be used with caution. This option does not work with
>>> - processors that aren't supported by the intel_pstate driver
>>> - or on platforms that use pcc-cpufreq instead of acpi-cpufreq.
>>> - no_hwp
>>> - Do not enable hardware P state control (HWP)
>>> - if available.
>>> - hwp_only
>>> - Only load intel_pstate on systems which support
>>> - hardware P state control (HWP) if available.
>>> - support_acpi_ppc
>>> - Enforce ACPI _PPC performance limits. If the Fixed ACPI
>>> - Description Table, specifies preferred power management
>>> - profile as "Enterprise Server" or "Performance Server",
>>> - then this feature is turned on by default.
>>> - per_cpu_perf_limits
>>> - Allow per-logical-CPU P-State performance control limits using
>>> - cpufreq sysfs interface
>>> + intel_pstate= [X86]
>>> + disable
>>> + Do not enable intel_pstate as the default
>>> + scaling driver for the supported processors
>>> + passive
>>> + Use intel_pstate as a scaling driver, but configure it
>>> + to work with generic cpufreq governors (instead of
>>> + enabling its internal governor). This mode cannot be
>>> + used along with the hardware-managed P-states (HWP)
>>> + feature.
>>> + force
>>> + Enable intel_pstate on systems that prohibit it by default
>>> + in favor of acpi-cpufreq. Forcing the intel_pstate driver
>>> + instead of acpi-cpufreq may disable platform features, such
>>> + as thermal controls and power capping, that rely on ACPI
>>> + P-States information being indicated to OSPM and therefore
>>> + should be used with caution. This option does not work with
>>> + processors that aren't supported by the intel_pstate driver
>>> + or on platforms that use pcc-cpufreq instead of acpi-cpufreq.
>>> + no_hwp
>>> + Do not enable hardware P state control (HWP)
>>> + if available.
>>> + hwp_only
>>> + Only load intel_pstate on systems which support
>>> + hardware P state control (HWP) if available.
>>> + support_acpi_ppc
>>> + Enforce ACPI _PPC performance limits. If the Fixed ACPI
>>> + Description Table, specifies preferred power management
>>> + profile as "Enterprise Server" or "Performance Server",
>>> + then this feature is turned on by default.
>>> + per_cpu_perf_limits
>>> + Allow per-logical-CPU P-State performance control limits using
>>> + cpufreq sysfs interface
>>>
>>> intremap= [X86-64, Intel-IOMMU]
>>> on enable Interrupt Remapping (default)
>>> @@ -2027,7 +2027,7 @@
>>> * [no]ncqtrim: Turn off queued DSM TRIM.
>>>
>>> * nohrst, nosrst, norst: suppress hard, soft
>>> - and both resets.
>>> + and both resets.
>>
>> I would leave that line above indented like the one after "rstonce" below.
>>
>>>
>>> * rstonce: only attempt one reset during
>>> hot-unplug link recovery
>>
>>
>> --
>> ~Randy
>
> Okay, thanks for your feedback.
>
> I reindented intel_pstate as you said and I can still see the whole
> description for the 'advanced' option is going past the 80-column mark.
>
> I'll leave it indented with two spaces for this patch.
> If you wish I can make a separate patch that addresses 80-column overrun
> for intel_pstate.
>
> I'll indent the nohrst,... section like rstonce.
>
> Does that sound good to you?
Yes, it does. Go for it!
thanks,
--
~Randy