* [PATCH] Documentation: filesystem: fix "Removed Sysctls" table
From: Sheriff Esseson @ 2019-07-23 11:48 UTC (permalink / raw)
To: skhan
Cc: linux-kernel-mentees, Darrick J. Wong, linux-xfs, Jonathan Corbet,
open list:DOCUMENTATION, open list
the "Removed Sysctls" section is a table - bring it alive with ReST.
Signed-off-by: Sheriff Esseson <sheriffesseson@gmail.com>
---
Documentation/admin-guide/xfs.rst | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index e76665a8f2f2..fb5b39f73059 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -337,11 +337,12 @@ None at present.
Removed Sysctls
===============
+============================= =======
Name Removed
- ---- -------
+============================= =======
fs.xfs.xfsbufd_centisec v4.0
fs.xfs.age_buffer_centisecs v4.0
-
+============================= =======
Error handling
==============
--
2.22.0
^ permalink raw reply related
* Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
From: Konstantin Khlebnikov @ 2019-07-23 10:10 UTC (permalink / raw)
To: Joel Fernandes (Google), linux-kernel
Cc: vdavydov.dev, Brendan Gregg, kernel-team, Alexey Dobriyan,
Al Viro, Andrew Morton, carmenjackson, Christian Hansen,
Colin Ian King, dancol, David Howells, fmayer, joaodias, joelaf,
Jonathan Corbet, Kees Cook, Kirill Tkhai, linux-doc,
linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport, minchan,
minchan, namhyung, sspatil, surenb, Thomas Gleixner, timmurray,
tkjos, Vlastimil Babka, wvw
In-Reply-To: <01568524-ed97-36c9-61f7-e95084658f5b@yandex-team.ru>
On 23.07.2019 11:43, Konstantin Khlebnikov wrote:
> On 23.07.2019 0:32, Joel Fernandes (Google) wrote:
>> The page_idle tracking feature currently requires looking up the pagemap
>> for a process followed by interacting with /sys/kernel/mm/page_idle.
>> This is quite cumbersome and can be error-prone too. If between
>> accessing the per-PID pagemap and the global page_idle bitmap, if
>> something changes with the page then the information is not accurate.
>> More over looking up PFN from pagemap in Android devices is not
>> supported by unprivileged process and requires SYS_ADMIN and gives 0 for
>> the PFN.
>>
>> This patch adds support to directly interact with page_idle tracking at
>> the PID level by introducing a /proc/<pid>/page_idle file. This
>> eliminates the need for userspace to calculate the mapping of the page.
>> It follows the exact same semantics as the global
>> /sys/kernel/mm/page_idle, however it is easier to use for some usecases
>> where looking up PFN is not needed and also does not require SYS_ADMIN.
>> It ended up simplifying userspace code, solving the security issue
>> mentioned and works quite well. SELinux does not need to be turned off
>> since no pagemap look up is needed.
>>
>> In Android, we are using this for the heap profiler (heapprofd) which
>> profiles and pin points code paths which allocates and leaves memory
>> idle for long periods of time.
>>
>> Documentation material:
>> The idle page tracking API for virtual address indexing using virtual page
>> frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
>> that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
>> except that it uses virtual instead of physical frame numbers.
>>
>> This idle page tracking API can be simpler to use than physical address
>> indexing, since the pagemap for a process does not need to be looked up
>> to mark or read a page's idle bit. It is also more accurate than
>> physical address indexing since in physical address indexing, address
>> space changes can occur between reading the pagemap and reading the
>> bitmap. In virtual address indexing, the process's mmap_sem is held for
>> the duration of the access.
>
> Maybe integrate this into existing interface: /proc/pid/clear_refs and
> /proc/pid/pagemap ?
>
> I.e. echo X > /proc/pid/clear_refs clears reference bits in ptes and
> marks pages idle only for pages mapped in this process.
> And idle bit in /proc/pid/pagemap tells that page is still idle in this process.
> This is faster - we don't need to walk whole rmap for that.
Moreover, this is so cheap so could be counted and shown in smaps.
Unlike to clearing real access bits this does not disrupt memory reclaimer.
Killer feature.
>
>>
>> Cc: vdavydov.dev@gmail.com
>> Cc: Brendan Gregg <bgregg@netflix.com>
>> Cc: kernel-team@android.com
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>
>> ---
>> Internal review -> v1:
>> Fixes from Suren.
>> Corrections to change log, docs (Florian, Sandeep)
>>
>> fs/proc/base.c | 3 +
>> fs/proc/internal.h | 1 +
>> fs/proc/task_mmu.c | 57 +++++++
>> include/linux/page_idle.h | 4 +
>> mm/page_idle.c | 305 +++++++++++++++++++++++++++++++++-----
>> 5 files changed, 330 insertions(+), 40 deletions(-)
>>
>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> index 77eb628ecc7f..a58dd74606e9 100644
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -3021,6 +3021,9 @@ static const struct pid_entry tgid_base_stuff[] = {
>> REG("smaps", S_IRUGO, proc_pid_smaps_operations),
>> REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
>> REG("pagemap", S_IRUSR, proc_pagemap_operations),
>> +#ifdef CONFIG_IDLE_PAGE_TRACKING
>> + REG("page_idle", S_IRUSR|S_IWUSR, proc_page_idle_operations),
>> +#endif
>> #endif
>> #ifdef CONFIG_SECURITY
>> DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations),
>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
>> index cd0c8d5ce9a1..bc9371880c63 100644
>> --- a/fs/proc/internal.h
>> +++ b/fs/proc/internal.h
>> @@ -293,6 +293,7 @@ extern const struct file_operations proc_pid_smaps_operations;
>> extern const struct file_operations proc_pid_smaps_rollup_operations;
>> extern const struct file_operations proc_clear_refs_operations;
>> extern const struct file_operations proc_pagemap_operations;
>> +extern const struct file_operations proc_page_idle_operations;
>> extern unsigned long task_vsize(struct mm_struct *);
>> extern unsigned long task_statm(struct mm_struct *,
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 4d2b860dbc3f..11ccc53da38e 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1642,6 +1642,63 @@ const struct file_operations proc_pagemap_operations = {
>> .open = pagemap_open,
>> .release = pagemap_release,
>> };
>> +
>> +#ifdef CONFIG_IDLE_PAGE_TRACKING
>> +static ssize_t proc_page_idle_read(struct file *file, char __user *buf,
>> + size_t count, loff_t *ppos)
>> +{
>> + int ret;
>> + struct task_struct *tsk = get_proc_task(file_inode(file));
>> +
>> + if (!tsk)
>> + return -EINVAL;
>> + ret = page_idle_proc_read(file, buf, count, ppos, tsk);
>> + put_task_struct(tsk);
>> + return ret;
>> +}
>> +
>> +static ssize_t proc_page_idle_write(struct file *file, const char __user *buf,
>> + size_t count, loff_t *ppos)
>> +{
>> + int ret;
>> + struct task_struct *tsk = get_proc_task(file_inode(file));
>> +
>> + if (!tsk)
>> + return -EINVAL;
>> + ret = page_idle_proc_write(file, (char __user *)buf, count, ppos, tsk);
>> + put_task_struct(tsk);
>> + return ret;
>> +}
>> +
>> +static int proc_page_idle_open(struct inode *inode, struct file *file)
>> +{
>> + struct mm_struct *mm;
>> +
>> + mm = proc_mem_open(inode, PTRACE_MODE_READ);
>> + if (IS_ERR(mm))
>> + return PTR_ERR(mm);
>> + file->private_data = mm;
>> + return 0;
>> +}
>> +
>> +static int proc_page_idle_release(struct inode *inode, struct file *file)
>> +{
>> + struct mm_struct *mm = file->private_data;
>> +
>> + if (mm)
>> + mmdrop(mm);
>> + return 0;
>> +}
>> +
>> +const struct file_operations proc_page_idle_operations = {
>> + .llseek = mem_lseek, /* borrow this */
>> + .read = proc_page_idle_read,
>> + .write = proc_page_idle_write,
>> + .open = proc_page_idle_open,
>> + .release = proc_page_idle_release,
>> +};
>> +#endif /* CONFIG_IDLE_PAGE_TRACKING */
>> +
>> #endif /* CONFIG_PROC_PAGE_MONITOR */
>> #ifdef CONFIG_NUMA
>> diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
>> index 1e894d34bdce..f1bc2640d85e 100644
>> --- a/include/linux/page_idle.h
>> +++ b/include/linux/page_idle.h
>> @@ -106,6 +106,10 @@ static inline void clear_page_idle(struct page *page)
>> }
>> #endif /* CONFIG_64BIT */
>> +ssize_t page_idle_proc_write(struct file *file,
>> + char __user *buf, size_t count, loff_t *ppos, struct task_struct *tsk);
>> +ssize_t page_idle_proc_read(struct file *file,
>> + char __user *buf, size_t count, loff_t *ppos, struct task_struct *tsk);
>> #else /* !CONFIG_IDLE_PAGE_TRACKING */
>> static inline bool page_is_young(struct page *page)
>> diff --git a/mm/page_idle.c b/mm/page_idle.c
>> index 295512465065..874a60c41fef 100644
>> --- a/mm/page_idle.c
>> +++ b/mm/page_idle.c
>> @@ -11,6 +11,7 @@
>> #include <linux/mmu_notifier.h>
>> #include <linux/page_ext.h>
>> #include <linux/page_idle.h>
>> +#include <linux/sched/mm.h>
>> #define BITMAP_CHUNK_SIZE sizeof(u64)
>> #define BITMAP_CHUNK_BITS (BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
>> @@ -28,15 +29,12 @@
>> *
>> * This function tries to get a user memory page by pfn as described above.
>> */
>> -static struct page *page_idle_get_page(unsigned long pfn)
>> +static struct page *page_idle_get_page(struct page *page_in)
>> {
>> struct page *page;
>> pg_data_t *pgdat;
>> - if (!pfn_valid(pfn))
>> - return NULL;
>> -
>> - page = pfn_to_page(pfn);
>> + page = page_in;
>> if (!page || !PageLRU(page) ||
>> !get_page_unless_zero(page))
>> return NULL;
>> @@ -51,6 +49,15 @@ static struct page *page_idle_get_page(unsigned long pfn)
>> return page;
>> }
>> +static struct page *page_idle_get_page_pfn(unsigned long pfn)
>> +{
>> +
>> + if (!pfn_valid(pfn))
>> + return NULL;
>> +
>> + return page_idle_get_page(pfn_to_page(pfn));
>> +}
>> +
>> static bool page_idle_clear_pte_refs_one(struct page *page,
>> struct vm_area_struct *vma,
>> unsigned long addr, void *arg)
>> @@ -118,6 +125,47 @@ static void page_idle_clear_pte_refs(struct page *page)
>> unlock_page(page);
>> }
>> +/* Helper to get the start and end frame given a pos and count */
>> +static int page_idle_get_frames(loff_t pos, size_t count, struct mm_struct *mm,
>> + unsigned long *start, unsigned long *end)
>> +{
>> + unsigned long max_frame;
>> +
>> + /* If an mm is not given, assume we want physical frames */
>> + max_frame = mm ? (mm->task_size >> PAGE_SHIFT) : max_pfn;
>> +
>> + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
>> + return -EINVAL;
>> +
>> + *start = pos * BITS_PER_BYTE;
>> + if (*start >= max_frame)
>> + return -ENXIO;
>> +
>> + *end = *start + count * BITS_PER_BYTE;
>> + if (*end > max_frame)
>> + *end = max_frame;
>> + return 0;
>> +}
>> +
>> +static bool page_really_idle(struct page *page)
>> +{
>> + if (!page)
>> + return false;
>> +
>> + if (page_is_idle(page)) {
>> + /*
>> + * The page might have been referenced via a
>> + * pte, in which case it is not idle. Clear
>> + * refs and recheck.
>> + */
>> + page_idle_clear_pte_refs(page);
>> + if (page_is_idle(page))
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
>> struct bin_attribute *attr, char *buf,
>> loff_t pos, size_t count)
>> @@ -125,35 +173,21 @@ static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
>> u64 *out = (u64 *)buf;
>> struct page *page;
>> unsigned long pfn, end_pfn;
>> - int bit;
>> -
>> - if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
>> - return -EINVAL;
>> -
>> - pfn = pos * BITS_PER_BYTE;
>> - if (pfn >= max_pfn)
>> - return 0;
>> + int bit, ret;
>> - end_pfn = pfn + count * BITS_PER_BYTE;
>> - if (end_pfn > max_pfn)
>> - end_pfn = max_pfn;
>> + ret = page_idle_get_frames(pos, count, NULL, &pfn, &end_pfn);
>> + if (ret == -ENXIO)
>> + return 0; /* Reads beyond max_pfn do nothing */
>> + else if (ret)
>> + return ret;
>> for (; pfn < end_pfn; pfn++) {
>> bit = pfn % BITMAP_CHUNK_BITS;
>> if (!bit)
>> *out = 0ULL;
>> - page = page_idle_get_page(pfn);
>> - if (page) {
>> - if (page_is_idle(page)) {
>> - /*
>> - * The page might have been referenced via a
>> - * pte, in which case it is not idle. Clear
>> - * refs and recheck.
>> - */
>> - page_idle_clear_pte_refs(page);
>> - if (page_is_idle(page))
>> - *out |= 1ULL << bit;
>> - }
>> + page = page_idle_get_page_pfn(pfn);
>> + if (page && page_really_idle(page)) {
>> + *out |= 1ULL << bit;
>> put_page(page);
>> }
>> if (bit == BITMAP_CHUNK_BITS - 1)
>> @@ -170,23 +204,16 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj,
>> const u64 *in = (u64 *)buf;
>> struct page *page;
>> unsigned long pfn, end_pfn;
>> - int bit;
>> + int bit, ret;
>> - if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
>> - return -EINVAL;
>> -
>> - pfn = pos * BITS_PER_BYTE;
>> - if (pfn >= max_pfn)
>> - return -ENXIO;
>> -
>> - end_pfn = pfn + count * BITS_PER_BYTE;
>> - if (end_pfn > max_pfn)
>> - end_pfn = max_pfn;
>> + ret = page_idle_get_frames(pos, count, NULL, &pfn, &end_pfn);
>> + if (ret)
>> + return ret;
>> for (; pfn < end_pfn; pfn++) {
>> bit = pfn % BITMAP_CHUNK_BITS;
>> if ((*in >> bit) & 1) {
>> - page = page_idle_get_page(pfn);
>> + page = page_idle_get_page_pfn(pfn);
>> if (page) {
>> page_idle_clear_pte_refs(page);
>> set_page_idle(page);
>> @@ -224,10 +251,208 @@ struct page_ext_operations page_idle_ops = {
>> };
>> #endif
>> +/* page_idle tracking for /proc/<pid>/page_idle */
>> +
>> +static DEFINE_SPINLOCK(idle_page_list_lock);
>> +struct list_head idle_page_list;
>> +
>> +struct page_node {
>> + struct page *page;
>> + unsigned long addr;
>> + struct list_head list;
>> +};
>> +
>> +struct page_idle_proc_priv {
>> + unsigned long start_addr;
>> + char *buffer;
>> + int write;
>> +};
>> +
>> +static void add_page_idle_list(struct page *page,
>> + unsigned long addr, struct mm_walk *walk)
>> +{
>> + struct page *page_get;
>> + struct page_node *pn;
>> + int bit;
>> + unsigned long frames;
>> + struct page_idle_proc_priv *priv = walk->private;
>> + u64 *chunk = (u64 *)priv->buffer;
>> +
>> + if (priv->write) {
>> + /* Find whether this page was asked to be marked */
>> + frames = (addr - priv->start_addr) >> PAGE_SHIFT;
>> + bit = frames % BITMAP_CHUNK_BITS;
>> + chunk = &chunk[frames / BITMAP_CHUNK_BITS];
>> + if (((*chunk >> bit) & 1) == 0)
>> + return;
>> + }
>> +
>> + page_get = page_idle_get_page(page);
>> + if (!page_get)
>> + return;
>> +
>> + pn = kmalloc(sizeof(*pn), GFP_ATOMIC);
>> + if (!pn)
>> + return;
>> +
>> + pn->page = page_get;
>> + pn->addr = addr;
>> + list_add(&pn->list, &idle_page_list);
>> +}
>> +
>> +static int pte_page_idle_proc_range(pmd_t *pmd, unsigned long addr,
>> + unsigned long end,
>> + struct mm_walk *walk)
>> +{
>> + struct vm_area_struct *vma = walk->vma;
>> + pte_t *pte;
>> + spinlock_t *ptl;
>> + struct page *page;
>> +
>> + ptl = pmd_trans_huge_lock(pmd, vma);
>> + if (ptl) {
>> + if (pmd_present(*pmd)) {
>> + page = follow_trans_huge_pmd(vma, addr, pmd,
>> + FOLL_DUMP|FOLL_WRITE);
>> + if (!IS_ERR_OR_NULL(page))
>> + add_page_idle_list(page, addr, walk);
>> + }
>> + spin_unlock(ptl);
>> + return 0;
>> + }
>> +
>> + if (pmd_trans_unstable(pmd))
>> + return 0;
>> +
>> + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>> + for (; addr != end; pte++, addr += PAGE_SIZE) {
>> + if (!pte_present(*pte))
>> + continue;
>> +
>> + page = vm_normal_page(vma, addr, *pte);
>> + if (page)
>> + add_page_idle_list(page, addr, walk);
>> + }
>> +
>> + pte_unmap_unlock(pte - 1, ptl);
>> + return 0;
>> +}
>> +
>> +ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
>> + size_t count, loff_t *pos,
>> + struct task_struct *tsk, int write)
>> +{
>> + int ret;
>> + char *buffer;
>> + u64 *out;
>> + unsigned long start_addr, end_addr, start_frame, end_frame;
>> + struct mm_struct *mm = file->private_data;
>> + struct mm_walk walk = { .pmd_entry = pte_page_idle_proc_range, };
>> + struct page_node *cur, *next;
>> + struct page_idle_proc_priv priv;
>> + bool walk_error = false;
>> +
>> + if (!mm || !mmget_not_zero(mm))
>> + return -EINVAL;
>> +
>> + if (count > PAGE_SIZE)
>> + count = PAGE_SIZE;
>> +
>> + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
>> + if (!buffer) {
>> + ret = -ENOMEM;
>> + goto out_mmput;
>> + }
>> + out = (u64 *)buffer;
>> +
>> + if (write && copy_from_user(buffer, ubuff, count)) {
>> + ret = -EFAULT;
>> + goto out;
>> + }
>> +
>> + ret = page_idle_get_frames(*pos, count, mm, &start_frame, &end_frame);
>> + if (ret)
>> + goto out;
>> +
>> + start_addr = (start_frame << PAGE_SHIFT);
>> + end_addr = (end_frame << PAGE_SHIFT);
>> + priv.buffer = buffer;
>> + priv.start_addr = start_addr;
>> + priv.write = write;
>> + walk.private = &priv;
>> + walk.mm = mm;
>> +
>> + down_read(&mm->mmap_sem);
>> +
>> + /*
>> + * Protects the idle_page_list which is needed because
>> + * walk_page_vma() holds ptlock which deadlocks with
>> + * page_idle_clear_pte_refs(). So we have to collect all
>> + * pages first, and then call page_idle_clear_pte_refs().
>> + */
>> + spin_lock(&idle_page_list_lock);
>> + ret = walk_page_range(start_addr, end_addr, &walk);
>> + if (ret)
>> + walk_error = true;
>> +
>> + list_for_each_entry_safe(cur, next, &idle_page_list, list) {
>> + int bit, index;
>> + unsigned long off;
>> + struct page *page = cur->page;
>> +
>> + if (unlikely(walk_error))
>> + goto remove_page;
>> +
>> + if (write) {
>> + page_idle_clear_pte_refs(page);
>> + set_page_idle(page);
>> + } else {
>> + if (page_really_idle(page)) {
>> + off = ((cur->addr) >> PAGE_SHIFT) - start_frame;
>> + bit = off % BITMAP_CHUNK_BITS;
>> + index = off / BITMAP_CHUNK_BITS;
>> + out[index] |= 1ULL << bit;
>> + }
>> + }
>> +remove_page:
>> + put_page(page);
>> + list_del(&cur->list);
>> + kfree(cur);
>> + }
>> + spin_unlock(&idle_page_list_lock);
>> +
>> + if (!write && !walk_error)
>> + ret = copy_to_user(ubuff, buffer, count);
>> +
>> + up_read(&mm->mmap_sem);
>> +out:
>> + kfree(buffer);
>> +out_mmput:
>> + mmput(mm);
>> + if (!ret)
>> + ret = count;
>> + return ret;
>> +
>> +}
>> +
>> +ssize_t page_idle_proc_read(struct file *file, char __user *ubuff,
>> + size_t count, loff_t *pos, struct task_struct *tsk)
>> +{
>> + return page_idle_proc_generic(file, ubuff, count, pos, tsk, 0);
>> +}
>> +
>> +ssize_t page_idle_proc_write(struct file *file, char __user *ubuff,
>> + size_t count, loff_t *pos, struct task_struct *tsk)
>> +{
>> + return page_idle_proc_generic(file, ubuff, count, pos, tsk, 1);
>> +}
>> +
>> static int __init page_idle_init(void)
>> {
>> int err;
>> + INIT_LIST_HEAD(&idle_page_list);
>> +
>> err = sysfs_create_group(mm_kobj, &page_idle_attr_group);
>> if (err) {
>> pr_err("page_idle: register sysfs failed\n");
>>
^ permalink raw reply
* RE: [PATCH v5] Documentation/checkpatch: Prefer strscpy/strscpy_pad over strcpy/strlcpy/strncpy
From: Gote, Nitin R @ 2019-07-23 9:26 UTC (permalink / raw)
To: Joe Perches, Kees Cook
Cc: corbet@lwn.net, akpm@linux-foundation.org, apw@canonical.com,
linux-doc@vger.kernel.org, kernel-hardening@lists.openwall.com
In-Reply-To: <28404b52d58efa0a3e85ce05ce0b210049ed6050.camel@perches.com>
> -----Original Message-----
> From: Joe Perches [mailto:joe@perches.com]
> Sent: Monday, July 22, 2019 11:11 PM
> To: Kees Cook <keescook@chromium.org>; Gote, Nitin R
> <nitin.r.gote@intel.com>
> Cc: corbet@lwn.net; akpm@linux-foundation.org; apw@canonical.com;
> linux-doc@vger.kernel.org; kernel-hardening@lists.openwall.com
> Subject: Re: [PATCH v5] Documentation/checkpatch: Prefer
> strscpy/strscpy_pad over strcpy/strlcpy/strncpy
>
> On Mon, 2019-07-22 at 10:30 -0700, Kees Cook wrote:
> > On Wed, Jul 17, 2019 at 10:00:05AM +0530, NitinGote wrote:
> > > From: Nitin Gote <nitin.r.gote@intel.com>
> > >
> > > Added check in checkpatch.pl to
> > > 1. Deprecate strcpy() in favor of strscpy().
> > > 2. Deprecate strlcpy() in favor of strscpy().
> > > 3. Deprecate strncpy() in favor of strscpy() or strscpy_pad().
> > >
> > > Updated strncpy() section in Documentation/process/deprecated.rst
> > > to cover strscpy_pad() case.
> > >
> > > Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
> >
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> >
> > Joe, does this address your checkpatch concerns?
>
> Well, kinda.
>
> strscpy_pad isn't used anywhere in the kernel.
>
> And
>
> + "strncpy" => "strscpy, strscpy_pad or for non-
> NUL-terminated strings, strncpy() can still be used, but destinations should
> be marked with __nonstring",
>
> is a bit verbose. This could be simply:
>
> + "strncpy" => "strscpy - for non-NUL-terminated uses, strncpy() dst
> should be __nonstring",
>
But, if the destination buffer needs extra NUL-padding for remaining size of destination,
then safe replacement is strscpy_pad(). Right? If yes, then what is your opinion on below change :
"strncpy" => "strscpy, strcpy_pad - for non-NUL-terminated uses, strncpy() dst
should be __nonstring",
> And I still prefer adding stracpy as it
> reduces code verbosity and eliminates defects.
>
-Nitin
^ permalink raw reply
* [PATCH v3 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
From: Ganapatrao Kulkarni @ 2019-07-23 9:16 UTC (permalink / raw)
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Cc: will@kernel.org, mark.rutland@arm.com, corbet@lwn.net,
Jayachandran Chandrasekharan Nair, Robert Richter, Jan Glauber,
gklkml16@gmail.com
In-Reply-To: <1563873380-2003-1-git-send-email-gkulkarni@marvell.com>
CCPI2 is a low-latency high-bandwidth serial interface for connecting
ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
---
drivers/perf/thunderx2_pmu.c | 248 ++++++++++++++++++++++++++++++-----
1 file changed, 214 insertions(+), 34 deletions(-)
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 43d76c85da56..a4e1273eafa3 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -17,22 +17,31 @@
*/
#define TX2_PMU_MAX_COUNTERS 4
+
#define TX2_PMU_DMC_CHANNELS 8
#define TX2_PMU_L3_TILES 16
#define TX2_PMU_HRTIMER_INTERVAL (2 * NSEC_PER_SEC)
-#define GET_EVENTID(ev) ((ev->hw.config) & 0x1f)
-#define GET_COUNTERID(ev) ((ev->hw.idx) & 0x3)
+#define GET_EVENTID(ev, mask) ((ev->hw.config) & mask)
+#define GET_COUNTERID(ev, mask) ((ev->hw.idx) & mask)
/* 1 byte per counter(4 counters).
* Event id is encoded in bits [5:1] of a byte,
*/
#define DMC_EVENT_CFG(idx, val) ((val) << (((idx) * 8) + 1))
+/* bits[3:0] to select counters, are indexed from 8 to 15. */
+#define CCPI2_COUNTER_OFFSET 8
+
#define L3C_COUNTER_CTL 0xA8
#define L3C_COUNTER_DATA 0xAC
#define DMC_COUNTER_CTL 0x234
#define DMC_COUNTER_DATA 0x240
+#define CCPI2_PERF_CTL 0x108
+#define CCPI2_COUNTER_CTL 0x10C
+#define CCPI2_COUNTER_SEL 0x12c
+#define CCPI2_COUNTER_DATA_L 0x130
+
/* L3C event IDs */
#define L3_EVENT_READ_REQ 0xD
#define L3_EVENT_WRITEBACK_REQ 0xE
@@ -51,15 +60,23 @@
#define DMC_EVENT_READ_TXNS 0xF
#define DMC_EVENT_MAX 0x10
+#define CCPI2_EVENT_REQ_PKT_SENT 0x3D
+#define CCPI2_EVENT_SNOOP_PKT_SENT 0x65
+#define CCPI2_EVENT_DATA_PKT_SENT 0x105
+#define CCPI2_EVENT_GIC_PKT_SENT 0x12D
+#define CCPI2_EVENT_MAX 0x200
+
enum tx2_uncore_type {
PMU_TYPE_L3C,
PMU_TYPE_DMC,
+ PMU_TYPE_CCPI2,
PMU_TYPE_INVALID,
};
+
/*
- * pmu on each socket has 2 uncore devices(dmc and l3c),
- * each device has 4 counters.
+ * pmu on each socket has 3 uncore devices(dmc, l3ci and ccpi2),
+ * dmc and l3c has 4 counters and ccpi2 8.
*/
struct tx2_uncore_pmu {
struct hlist_node hpnode;
@@ -69,12 +86,14 @@ struct tx2_uncore_pmu {
int node;
int cpu;
u32 max_counters;
+ u32 counters_mask;
u32 prorate_factor;
u32 max_events;
+ u32 events_mask;
u64 hrtimer_interval;
void __iomem *base;
DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
- struct perf_event *events[TX2_PMU_MAX_COUNTERS];
+ struct perf_event **events;
struct device *dev;
struct hrtimer hrtimer;
const struct attribute_group **attr_groups;
@@ -92,7 +111,21 @@ static inline struct tx2_uncore_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
return container_of(pmu, struct tx2_uncore_pmu, pmu);
}
-PMU_FORMAT_ATTR(event, "config:0-4");
+#define TX2_PMU_FORMAT_ATTR(_var, _name, _format) \
+static ssize_t \
+__tx2_pmu_##_var##_show(struct device *dev, \
+ struct device_attribute *attr, \
+ char *page) \
+{ \
+ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \
+ return sprintf(page, _format "\n"); \
+} \
+ \
+static struct device_attribute format_attr_##_var = \
+ __ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
+
+TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
+TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");
static struct attribute *l3c_pmu_format_attrs[] = {
&format_attr_event.attr,
@@ -104,6 +137,11 @@ static struct attribute *dmc_pmu_format_attrs[] = {
NULL,
};
+static struct attribute *ccpi2_pmu_format_attrs[] = {
+ &format_attr_event_ccpi2.attr,
+ NULL,
+};
+
static const struct attribute_group l3c_pmu_format_attr_group = {
.name = "format",
.attrs = l3c_pmu_format_attrs,
@@ -114,6 +152,11 @@ static const struct attribute_group dmc_pmu_format_attr_group = {
.attrs = dmc_pmu_format_attrs,
};
+static const struct attribute_group ccpi2_pmu_format_attr_group = {
+ .name = "format",
+ .attrs = ccpi2_pmu_format_attrs,
+};
+
/*
* sysfs event attributes
*/
@@ -164,6 +207,19 @@ static struct attribute *dmc_pmu_events_attrs[] = {
NULL,
};
+TX2_EVENT_ATTR(req_pktsent, CCPI2_EVENT_REQ_PKT_SENT);
+TX2_EVENT_ATTR(snoop_pktsent, CCPI2_EVENT_SNOOP_PKT_SENT);
+TX2_EVENT_ATTR(data_pktsent, CCPI2_EVENT_DATA_PKT_SENT);
+TX2_EVENT_ATTR(gic_pktsent, CCPI2_EVENT_GIC_PKT_SENT);
+
+static struct attribute *ccpi2_pmu_events_attrs[] = {
+ &tx2_pmu_event_attr_req_pktsent.attr.attr,
+ &tx2_pmu_event_attr_snoop_pktsent.attr.attr,
+ &tx2_pmu_event_attr_data_pktsent.attr.attr,
+ &tx2_pmu_event_attr_gic_pktsent.attr.attr,
+ NULL,
+};
+
static const struct attribute_group l3c_pmu_events_attr_group = {
.name = "events",
.attrs = l3c_pmu_events_attrs,
@@ -174,6 +230,11 @@ static const struct attribute_group dmc_pmu_events_attr_group = {
.attrs = dmc_pmu_events_attrs,
};
+static const struct attribute_group ccpi2_pmu_events_attr_group = {
+ .name = "events",
+ .attrs = ccpi2_pmu_events_attrs,
+};
+
/*
* sysfs cpumask attributes
*/
@@ -213,11 +274,23 @@ static const struct attribute_group *dmc_pmu_attr_groups[] = {
NULL
};
+static const struct attribute_group *ccpi2_pmu_attr_groups[] = {
+ &ccpi2_pmu_format_attr_group,
+ &pmu_cpumask_attr_group,
+ &ccpi2_pmu_events_attr_group,
+ NULL
+};
+
static inline u32 reg_readl(unsigned long addr)
{
return readl((void __iomem *)addr);
}
+static inline u32 reg_readq(unsigned long addr)
+{
+ return readq((void __iomem *)addr);
+}
+
static inline void reg_writel(u32 val, unsigned long addr)
{
writel(val, (void __iomem *)addr);
@@ -245,33 +318,58 @@ static void init_cntr_base_l3c(struct perf_event *event,
struct tx2_uncore_pmu *tx2_pmu)
{
struct hw_perf_event *hwc = &event->hw;
+ u32 cmask;
+
+ tx2_pmu = pmu_to_tx2_pmu(event->pmu);
+ cmask = tx2_pmu->counters_mask;
/* counter ctrl/data reg offset at 8 */
hwc->config_base = (unsigned long)tx2_pmu->base
- + L3C_COUNTER_CTL + (8 * GET_COUNTERID(event));
+ + L3C_COUNTER_CTL + (8 * GET_COUNTERID(event, cmask));
hwc->event_base = (unsigned long)tx2_pmu->base
- + L3C_COUNTER_DATA + (8 * GET_COUNTERID(event));
+ + L3C_COUNTER_DATA + (8 * GET_COUNTERID(event, cmask));
}
static void init_cntr_base_dmc(struct perf_event *event,
struct tx2_uncore_pmu *tx2_pmu)
{
struct hw_perf_event *hwc = &event->hw;
+ u32 cmask;
+
+ tx2_pmu = pmu_to_tx2_pmu(event->pmu);
+ cmask = tx2_pmu->counters_mask;
hwc->config_base = (unsigned long)tx2_pmu->base
+ DMC_COUNTER_CTL;
/* counter data reg offset at 0xc */
hwc->event_base = (unsigned long)tx2_pmu->base
- + DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
+ + DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event, cmask));
+}
+
+static void init_cntr_base_ccpi2(struct perf_event *event,
+ struct tx2_uncore_pmu *tx2_pmu)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ u32 cmask;
+
+ cmask = tx2_pmu->counters_mask;
+
+ hwc->config_base = (unsigned long)tx2_pmu->base
+ + CCPI2_COUNTER_CTL + (4 * GET_COUNTERID(event, cmask));
+ hwc->event_base = (unsigned long)tx2_pmu->base;
}
static void uncore_start_event_l3c(struct perf_event *event, int flags)
{
- u32 val;
+ u32 val, emask;
struct hw_perf_event *hwc = &event->hw;
+ struct tx2_uncore_pmu *tx2_pmu;
+
+ tx2_pmu = pmu_to_tx2_pmu(event->pmu);
+ emask = tx2_pmu->events_mask;
/* event id encoded in bits [07:03] */
- val = GET_EVENTID(event) << 3;
+ val = GET_EVENTID(event, emask) << 3;
reg_writel(val, hwc->config_base);
local64_set(&hwc->prev_count, 0);
reg_writel(0, hwc->event_base);
@@ -284,10 +382,17 @@ static inline void uncore_stop_event_l3c(struct perf_event *event)
static void uncore_start_event_dmc(struct perf_event *event, int flags)
{
- u32 val;
+ u32 val, cmask, emask;
struct hw_perf_event *hwc = &event->hw;
- int idx = GET_COUNTERID(event);
- int event_id = GET_EVENTID(event);
+ struct tx2_uncore_pmu *tx2_pmu;
+ int idx, event_id;
+
+ tx2_pmu = pmu_to_tx2_pmu(event->pmu);
+ cmask = tx2_pmu->counters_mask;
+ emask = tx2_pmu->events_mask;
+
+ idx = GET_COUNTERID(event, cmask);
+ event_id = GET_EVENTID(event, emask);
/* enable and start counters.
* 8 bits for each counter, bits[05:01] of a counter to set event type.
@@ -302,9 +407,14 @@ static void uncore_start_event_dmc(struct perf_event *event, int flags)
static void uncore_stop_event_dmc(struct perf_event *event)
{
- u32 val;
+ u32 val, cmask;
struct hw_perf_event *hwc = &event->hw;
- int idx = GET_COUNTERID(event);
+ struct tx2_uncore_pmu *tx2_pmu;
+ int idx;
+
+ tx2_pmu = pmu_to_tx2_pmu(event->pmu);
+ cmask = tx2_pmu->counters_mask;
+ idx = GET_COUNTERID(event, cmask);
/* clear event type(bits[05:01]) to stop counter */
val = reg_readl(hwc->config_base);
@@ -312,6 +422,31 @@ static void uncore_stop_event_dmc(struct perf_event *event)
reg_writel(val, hwc->config_base);
}
+static void uncore_start_event_ccpi2(struct perf_event *event, int flags)
+{
+ u32 emask;
+ struct hw_perf_event *hwc = &event->hw;
+ struct tx2_uncore_pmu *tx2_pmu;
+
+ tx2_pmu = pmu_to_tx2_pmu(event->pmu);
+ emask = tx2_pmu->events_mask;
+
+ /* Bit [09:00] to set event id, set level and type to 1 */
+ reg_writel((3 << 10) |
+ GET_EVENTID(event, emask), hwc->config_base);
+ /* reset[4], enable[0] and start[1] counters */
+ reg_writel(0x13, hwc->event_base + CCPI2_PERF_CTL);
+ local64_set(&event->hw.prev_count, 0ULL);
+}
+
+static void uncore_stop_event_ccpi2(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ /* disable and stop counter */
+ reg_writel(0, hwc->event_base + CCPI2_PERF_CTL);
+}
+
static void tx2_uncore_event_update(struct perf_event *event)
{
s64 prev, delta, new = 0;
@@ -319,20 +454,30 @@ static void tx2_uncore_event_update(struct perf_event *event)
struct tx2_uncore_pmu *tx2_pmu;
enum tx2_uncore_type type;
u32 prorate_factor;
+ u32 cmask, emask;
tx2_pmu = pmu_to_tx2_pmu(event->pmu);
type = tx2_pmu->type;
+ cmask = tx2_pmu->counters_mask;
+ emask = tx2_pmu->events_mask;
prorate_factor = tx2_pmu->prorate_factor;
-
- new = reg_readl(hwc->event_base);
- prev = local64_xchg(&hwc->prev_count, new);
-
- /* handles rollover of 32 bit counter */
- delta = (u32)(((1UL << 32) - prev) + new);
+ if (type == PMU_TYPE_CCPI2) {
+ reg_writel(CCPI2_COUNTER_OFFSET +
+ GET_COUNTERID(event, cmask),
+ hwc->event_base + CCPI2_COUNTER_SEL);
+ new = reg_readq(hwc->event_base + CCPI2_COUNTER_DATA_L);
+ prev = local64_xchg(&hwc->prev_count, new);
+ delta = new - prev;
+ } else {
+ new = reg_readl(hwc->event_base);
+ prev = local64_xchg(&hwc->prev_count, new);
+ /* handles rollover of 32 bit counter */
+ delta = (u32)(((1UL << 32) - prev) + new);
+ }
/* DMC event data_transfers granularity is 16 Bytes, convert it to 64 */
if (type == PMU_TYPE_DMC &&
- GET_EVENTID(event) == DMC_EVENT_DATA_TRANSFERS)
+ GET_EVENTID(event, emask) == DMC_EVENT_DATA_TRANSFERS)
delta = delta/4;
/* L3C and DMC has 16 and 8 interleave channels respectively.
@@ -351,6 +496,7 @@ static enum tx2_uncore_type get_tx2_pmu_type(struct acpi_device *adev)
} devices[] = {
{"CAV901D", PMU_TYPE_L3C},
{"CAV901F", PMU_TYPE_DMC},
+ {"CAV901E", PMU_TYPE_CCPI2},
{"", PMU_TYPE_INVALID}
};
@@ -380,7 +526,8 @@ static bool tx2_uncore_validate_event(struct pmu *pmu,
* Make sure the group of events can be scheduled at once
* on the PMU.
*/
-static bool tx2_uncore_validate_event_group(struct perf_event *event)
+static bool tx2_uncore_validate_event_group(struct perf_event *event,
+ int max_counters)
{
struct perf_event *sibling, *leader = event->group_leader;
int counters = 0;
@@ -403,7 +550,7 @@ static bool tx2_uncore_validate_event_group(struct perf_event *event)
* If the group requires more counters than the HW has,
* it cannot ever be scheduled.
*/
- return counters <= TX2_PMU_MAX_COUNTERS;
+ return counters <= max_counters;
}
@@ -439,7 +586,7 @@ static int tx2_uncore_event_init(struct perf_event *event)
hwc->config = event->attr.config;
/* Validate the group */
- if (!tx2_uncore_validate_event_group(event))
+ if (!tx2_uncore_validate_event_group(event, tx2_pmu->max_counters))
return -EINVAL;
return 0;
@@ -456,8 +603,9 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
tx2_pmu->start_event(event, flags);
perf_event_update_userpage(event);
- /* Start timer for first event */
- if (bitmap_weight(tx2_pmu->active_counters,
+ /* Start timer for first non ccpi2 event */
+ if (tx2_pmu->type != PMU_TYPE_CCPI2 &&
+ bitmap_weight(tx2_pmu->active_counters,
tx2_pmu->max_counters) == 1) {
hrtimer_start(&tx2_pmu->hrtimer,
ns_to_ktime(tx2_pmu->hrtimer_interval),
@@ -495,7 +643,8 @@ static int tx2_uncore_event_add(struct perf_event *event, int flags)
if (hwc->idx < 0)
return -EAGAIN;
- tx2_pmu->events[hwc->idx] = event;
+ if (tx2_pmu->events)
+ tx2_pmu->events[hwc->idx] = event;
/* set counter control and data registers base address */
tx2_pmu->init_cntr_base(event, tx2_pmu);
@@ -510,14 +659,17 @@ static void tx2_uncore_event_del(struct perf_event *event, int flags)
{
struct tx2_uncore_pmu *tx2_pmu = pmu_to_tx2_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
+ u32 cmask;
+ cmask = tx2_pmu->counters_mask;
tx2_uncore_event_stop(event, PERF_EF_UPDATE);
/* clear the assigned counter */
- free_counter(tx2_pmu, GET_COUNTERID(event));
+ free_counter(tx2_pmu, GET_COUNTERID(event, cmask));
perf_event_update_userpage(event);
- tx2_pmu->events[hwc->idx] = NULL;
+ if (tx2_pmu->events)
+ tx2_pmu->events[hwc->idx] = NULL;
hwc->idx = -1;
}
@@ -580,8 +732,12 @@ static int tx2_uncore_pmu_add_dev(struct tx2_uncore_pmu *tx2_pmu)
cpu_online_mask);
tx2_pmu->cpu = cpu;
- hrtimer_init(&tx2_pmu->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
- tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
+ /* CCPI2 counters are 64 bit counters, no overflow */
+ if (tx2_pmu->type != PMU_TYPE_CCPI2) {
+ hrtimer_init(&tx2_pmu->hrtimer,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
+ }
ret = tx2_uncore_pmu_register(tx2_pmu);
if (ret) {
@@ -654,8 +810,10 @@ static struct tx2_uncore_pmu *tx2_uncore_pmu_init_dev(struct device *dev,
switch (tx2_pmu->type) {
case PMU_TYPE_L3C:
tx2_pmu->max_counters = TX2_PMU_MAX_COUNTERS;
+ tx2_pmu->counters_mask = 0x3;
tx2_pmu->prorate_factor = TX2_PMU_L3_TILES;
tx2_pmu->max_events = L3_EVENT_MAX;
+ tx2_pmu->events_mask = 0x1f;
tx2_pmu->hrtimer_interval = TX2_PMU_HRTIMER_INTERVAL;
tx2_pmu->attr_groups = l3c_pmu_attr_groups;
tx2_pmu->name = devm_kasprintf(dev, GFP_KERNEL,
@@ -663,11 +821,15 @@ static struct tx2_uncore_pmu *tx2_uncore_pmu_init_dev(struct device *dev,
tx2_pmu->init_cntr_base = init_cntr_base_l3c;
tx2_pmu->start_event = uncore_start_event_l3c;
tx2_pmu->stop_event = uncore_stop_event_l3c;
+ tx2_pmu->events = devm_kzalloc(dev, tx2_pmu->max_counters *
+ sizeof(*tx2_pmu->events), GFP_KERNEL);
break;
case PMU_TYPE_DMC:
tx2_pmu->max_counters = TX2_PMU_MAX_COUNTERS;
+ tx2_pmu->counters_mask = 0x3;
tx2_pmu->prorate_factor = TX2_PMU_DMC_CHANNELS;
tx2_pmu->max_events = DMC_EVENT_MAX;
+ tx2_pmu->events_mask = 0x1f;
tx2_pmu->hrtimer_interval = TX2_PMU_HRTIMER_INTERVAL;
tx2_pmu->attr_groups = dmc_pmu_attr_groups;
tx2_pmu->name = devm_kasprintf(dev, GFP_KERNEL,
@@ -675,6 +837,23 @@ static struct tx2_uncore_pmu *tx2_uncore_pmu_init_dev(struct device *dev,
tx2_pmu->init_cntr_base = init_cntr_base_dmc;
tx2_pmu->start_event = uncore_start_event_dmc;
tx2_pmu->stop_event = uncore_stop_event_dmc;
+ tx2_pmu->events = devm_kzalloc(dev, tx2_pmu->max_counters *
+ sizeof(*tx2_pmu->events), GFP_KERNEL);
+ break;
+ case PMU_TYPE_CCPI2:
+ /* CCPI2 has 8 counters */
+ tx2_pmu->max_counters = 8;
+ tx2_pmu->counters_mask = 0x7;
+ tx2_pmu->prorate_factor = 1;
+ tx2_pmu->max_events = CCPI2_EVENT_MAX;
+ tx2_pmu->events_mask = 0x1ff;
+ tx2_pmu->attr_groups = ccpi2_pmu_attr_groups;
+ tx2_pmu->name = devm_kasprintf(dev, GFP_KERNEL,
+ "uncore_ccpi2_%d", tx2_pmu->node);
+ tx2_pmu->init_cntr_base = init_cntr_base_ccpi2;
+ tx2_pmu->start_event = uncore_start_event_ccpi2;
+ tx2_pmu->stop_event = uncore_stop_event_ccpi2;
+ tx2_pmu->events = NULL;
break;
case PMU_TYPE_INVALID:
devm_kfree(dev, tx2_pmu);
@@ -744,7 +923,8 @@ static int tx2_uncore_pmu_offline_cpu(unsigned int cpu,
if (cpu != tx2_pmu->cpu)
return 0;
- hrtimer_cancel(&tx2_pmu->hrtimer);
+ if (tx2_pmu->type != PMU_TYPE_CCPI2)
+ hrtimer_cancel(&tx2_pmu->hrtimer);
cpumask_copy(&cpu_online_mask_temp, cpu_online_mask);
cpumask_clear_cpu(cpu, &cpu_online_mask_temp);
new_cpu = cpumask_any_and(
--
2.17.1
^ permalink raw reply related
* [PATCH v3 0/2] Add CCPI2 PMU support
From: Ganapatrao Kulkarni @ 2019-07-23 9:16 UTC (permalink / raw)
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Cc: will@kernel.org, mark.rutland@arm.com, corbet@lwn.net,
Jayachandran Chandrasekharan Nair, Robert Richter, Jan Glauber,
gklkml16@gmail.com
Add Cavium Coherent Processor Interconnect (CCPI2) PMU
support in ThunderX2 Uncore driver.
v3: Rebased to 5.3-rc1
v2: Updated with review comments [1]
[1] https://lkml.org/lkml/2019/6/14/965
v1: initial patch
Ganapatrao Kulkarni (2):
Documentation: perf: Update documentation for ThunderX2 PMU uncore
driver
drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
.../admin-guide/perf/thunderx2-pmu.rst | 20 +-
drivers/perf/thunderx2_pmu.c | 248 +++++++++++++++---
2 files changed, 225 insertions(+), 43 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH v3 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
From: Ganapatrao Kulkarni @ 2019-07-23 9:16 UTC (permalink / raw)
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Cc: will@kernel.org, mark.rutland@arm.com, corbet@lwn.net,
Jayachandran Chandrasekharan Nair, Robert Richter, Jan Glauber,
gklkml16@gmail.com
In-Reply-To: <1563873380-2003-1-git-send-email-gkulkarni@marvell.com>
From: Ganapatrao Kulkarni <ganapatrao.kulkarni@marvell.com>
Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
---
.../admin-guide/perf/thunderx2-pmu.rst | 20 ++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/Documentation/admin-guide/perf/thunderx2-pmu.rst b/Documentation/admin-guide/perf/thunderx2-pmu.rst
index 08e33675853a..01f158238ae1 100644
--- a/Documentation/admin-guide/perf/thunderx2-pmu.rst
+++ b/Documentation/admin-guide/perf/thunderx2-pmu.rst
@@ -3,24 +3,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
=============================================================
The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
-PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
+PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
+Cavium Coherent Processor Interconnect (CCPI2).
The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
Events are counted for the default channel (i.e. channel 0) and prorated
to the total number of channels/tiles.
-The DMC and L3C support up to 4 counters. Counters are independently
-programmable and can be started and stopped individually. Each counter
-can be set to a different event. Counters are 32-bit and do not support
-an overflow interrupt; they are read every 2 seconds.
+The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
+counters. Counters are independently programmable to different events and
+can be started and stopped individually. None of the counters support an
+overflow interrupt. DMC and L3C counters are 32-bit and read every 2 seconds.
+The CCPI2 counters are 64-bit and assumed not to overflow in normal operation.
PMU UNCORE (perf) driver:
The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
-L3C devices. Each PMU can be used to count up to 4 events
-simultaneously. The PMUs provide a description of their available events
-and configuration options under sysfs, see
-/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
+L3C devices. Each PMU can be used to count up to 4 (DMC/L3C) or up to 8
+(CCPI2) events simultaneously. The PMUs provide a description of their
+available events and configuration options under sysfs, see
+/sys/devices/uncore_<l3c_S/dmc_S/ccpi2_S/>; S is the socket id.
The driver does not support sampling, therefore "perf record" will not
work. Per-task perf sessions are also not supported.
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
From: Konstantin Khlebnikov @ 2019-07-23 8:43 UTC (permalink / raw)
To: Joel Fernandes (Google), linux-kernel
Cc: vdavydov.dev, Brendan Gregg, kernel-team, Alexey Dobriyan,
Al Viro, Andrew Morton, carmenjackson, Christian Hansen,
Colin Ian King, dancol, David Howells, fmayer, joaodias, joelaf,
Jonathan Corbet, Kees Cook, Kirill Tkhai, linux-doc,
linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport, minchan,
minchan, namhyung, sspatil, surenb, Thomas Gleixner, timmurray,
tkjos, Vlastimil Babka, wvw
In-Reply-To: <20190722213205.140845-1-joel@joelfernandes.org>
On 23.07.2019 0:32, Joel Fernandes (Google) wrote:
> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> This is quite cumbersome and can be error-prone too. If between
> accessing the per-PID pagemap and the global page_idle bitmap, if
> something changes with the page then the information is not accurate.
> More over looking up PFN from pagemap in Android devices is not
> supported by unprivileged process and requires SYS_ADMIN and gives 0 for
> the PFN.
>
> This patch adds support to directly interact with page_idle tracking at
> the PID level by introducing a /proc/<pid>/page_idle file. This
> eliminates the need for userspace to calculate the mapping of the page.
> It follows the exact same semantics as the global
> /sys/kernel/mm/page_idle, however it is easier to use for some usecases
> where looking up PFN is not needed and also does not require SYS_ADMIN.
> It ended up simplifying userspace code, solving the security issue
> mentioned and works quite well. SELinux does not need to be turned off
> since no pagemap look up is needed.
>
> In Android, we are using this for the heap profiler (heapprofd) which
> profiles and pin points code paths which allocates and leaves memory
> idle for long periods of time.
>
> Documentation material:
> The idle page tracking API for virtual address indexing using virtual page
> frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> except that it uses virtual instead of physical frame numbers.
>
> This idle page tracking API can be simpler to use than physical address
> indexing, since the pagemap for a process does not need to be looked up
> to mark or read a page's idle bit. It is also more accurate than
> physical address indexing since in physical address indexing, address
> space changes can occur between reading the pagemap and reading the
> bitmap. In virtual address indexing, the process's mmap_sem is held for
> the duration of the access.
Maybe integrate this into existing interface: /proc/pid/clear_refs and
/proc/pid/pagemap ?
I.e. echo X > /proc/pid/clear_refs clears reference bits in ptes and
marks pages idle only for pages mapped in this process.
And idle bit in /proc/pid/pagemap tells that page is still idle in this process.
This is faster - we don't need to walk whole rmap for that.
>
> Cc: vdavydov.dev@gmail.com
> Cc: Brendan Gregg <bgregg@netflix.com>
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> ---
> Internal review -> v1:
> Fixes from Suren.
> Corrections to change log, docs (Florian, Sandeep)
>
> fs/proc/base.c | 3 +
> fs/proc/internal.h | 1 +
> fs/proc/task_mmu.c | 57 +++++++
> include/linux/page_idle.h | 4 +
> mm/page_idle.c | 305 +++++++++++++++++++++++++++++++++-----
> 5 files changed, 330 insertions(+), 40 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 77eb628ecc7f..a58dd74606e9 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3021,6 +3021,9 @@ static const struct pid_entry tgid_base_stuff[] = {
> REG("smaps", S_IRUGO, proc_pid_smaps_operations),
> REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
> REG("pagemap", S_IRUSR, proc_pagemap_operations),
> +#ifdef CONFIG_IDLE_PAGE_TRACKING
> + REG("page_idle", S_IRUSR|S_IWUSR, proc_page_idle_operations),
> +#endif
> #endif
> #ifdef CONFIG_SECURITY
> DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations),
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index cd0c8d5ce9a1..bc9371880c63 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -293,6 +293,7 @@ extern const struct file_operations proc_pid_smaps_operations;
> extern const struct file_operations proc_pid_smaps_rollup_operations;
> extern const struct file_operations proc_clear_refs_operations;
> extern const struct file_operations proc_pagemap_operations;
> +extern const struct file_operations proc_page_idle_operations;
>
> extern unsigned long task_vsize(struct mm_struct *);
> extern unsigned long task_statm(struct mm_struct *,
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 4d2b860dbc3f..11ccc53da38e 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1642,6 +1642,63 @@ const struct file_operations proc_pagemap_operations = {
> .open = pagemap_open,
> .release = pagemap_release,
> };
> +
> +#ifdef CONFIG_IDLE_PAGE_TRACKING
> +static ssize_t proc_page_idle_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + int ret;
> + struct task_struct *tsk = get_proc_task(file_inode(file));
> +
> + if (!tsk)
> + return -EINVAL;
> + ret = page_idle_proc_read(file, buf, count, ppos, tsk);
> + put_task_struct(tsk);
> + return ret;
> +}
> +
> +static ssize_t proc_page_idle_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + int ret;
> + struct task_struct *tsk = get_proc_task(file_inode(file));
> +
> + if (!tsk)
> + return -EINVAL;
> + ret = page_idle_proc_write(file, (char __user *)buf, count, ppos, tsk);
> + put_task_struct(tsk);
> + return ret;
> +}
> +
> +static int proc_page_idle_open(struct inode *inode, struct file *file)
> +{
> + struct mm_struct *mm;
> +
> + mm = proc_mem_open(inode, PTRACE_MODE_READ);
> + if (IS_ERR(mm))
> + return PTR_ERR(mm);
> + file->private_data = mm;
> + return 0;
> +}
> +
> +static int proc_page_idle_release(struct inode *inode, struct file *file)
> +{
> + struct mm_struct *mm = file->private_data;
> +
> + if (mm)
> + mmdrop(mm);
> + return 0;
> +}
> +
> +const struct file_operations proc_page_idle_operations = {
> + .llseek = mem_lseek, /* borrow this */
> + .read = proc_page_idle_read,
> + .write = proc_page_idle_write,
> + .open = proc_page_idle_open,
> + .release = proc_page_idle_release,
> +};
> +#endif /* CONFIG_IDLE_PAGE_TRACKING */
> +
> #endif /* CONFIG_PROC_PAGE_MONITOR */
>
> #ifdef CONFIG_NUMA
> diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
> index 1e894d34bdce..f1bc2640d85e 100644
> --- a/include/linux/page_idle.h
> +++ b/include/linux/page_idle.h
> @@ -106,6 +106,10 @@ static inline void clear_page_idle(struct page *page)
> }
> #endif /* CONFIG_64BIT */
>
> +ssize_t page_idle_proc_write(struct file *file,
> + char __user *buf, size_t count, loff_t *ppos, struct task_struct *tsk);
> +ssize_t page_idle_proc_read(struct file *file,
> + char __user *buf, size_t count, loff_t *ppos, struct task_struct *tsk);
> #else /* !CONFIG_IDLE_PAGE_TRACKING */
>
> static inline bool page_is_young(struct page *page)
> diff --git a/mm/page_idle.c b/mm/page_idle.c
> index 295512465065..874a60c41fef 100644
> --- a/mm/page_idle.c
> +++ b/mm/page_idle.c
> @@ -11,6 +11,7 @@
> #include <linux/mmu_notifier.h>
> #include <linux/page_ext.h>
> #include <linux/page_idle.h>
> +#include <linux/sched/mm.h>
>
> #define BITMAP_CHUNK_SIZE sizeof(u64)
> #define BITMAP_CHUNK_BITS (BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
> @@ -28,15 +29,12 @@
> *
> * This function tries to get a user memory page by pfn as described above.
> */
> -static struct page *page_idle_get_page(unsigned long pfn)
> +static struct page *page_idle_get_page(struct page *page_in)
> {
> struct page *page;
> pg_data_t *pgdat;
>
> - if (!pfn_valid(pfn))
> - return NULL;
> -
> - page = pfn_to_page(pfn);
> + page = page_in;
> if (!page || !PageLRU(page) ||
> !get_page_unless_zero(page))
> return NULL;
> @@ -51,6 +49,15 @@ static struct page *page_idle_get_page(unsigned long pfn)
> return page;
> }
>
> +static struct page *page_idle_get_page_pfn(unsigned long pfn)
> +{
> +
> + if (!pfn_valid(pfn))
> + return NULL;
> +
> + return page_idle_get_page(pfn_to_page(pfn));
> +}
> +
> static bool page_idle_clear_pte_refs_one(struct page *page,
> struct vm_area_struct *vma,
> unsigned long addr, void *arg)
> @@ -118,6 +125,47 @@ static void page_idle_clear_pte_refs(struct page *page)
> unlock_page(page);
> }
>
> +/* Helper to get the start and end frame given a pos and count */
> +static int page_idle_get_frames(loff_t pos, size_t count, struct mm_struct *mm,
> + unsigned long *start, unsigned long *end)
> +{
> + unsigned long max_frame;
> +
> + /* If an mm is not given, assume we want physical frames */
> + max_frame = mm ? (mm->task_size >> PAGE_SHIFT) : max_pfn;
> +
> + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> + return -EINVAL;
> +
> + *start = pos * BITS_PER_BYTE;
> + if (*start >= max_frame)
> + return -ENXIO;
> +
> + *end = *start + count * BITS_PER_BYTE;
> + if (*end > max_frame)
> + *end = max_frame;
> + return 0;
> +}
> +
> +static bool page_really_idle(struct page *page)
> +{
> + if (!page)
> + return false;
> +
> + if (page_is_idle(page)) {
> + /*
> + * The page might have been referenced via a
> + * pte, in which case it is not idle. Clear
> + * refs and recheck.
> + */
> + page_idle_clear_pte_refs(page);
> + if (page_is_idle(page))
> + return true;
> + }
> +
> + return false;
> +}
> +
> static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
> struct bin_attribute *attr, char *buf,
> loff_t pos, size_t count)
> @@ -125,35 +173,21 @@ static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
> u64 *out = (u64 *)buf;
> struct page *page;
> unsigned long pfn, end_pfn;
> - int bit;
> -
> - if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> - return -EINVAL;
> -
> - pfn = pos * BITS_PER_BYTE;
> - if (pfn >= max_pfn)
> - return 0;
> + int bit, ret;
>
> - end_pfn = pfn + count * BITS_PER_BYTE;
> - if (end_pfn > max_pfn)
> - end_pfn = max_pfn;
> + ret = page_idle_get_frames(pos, count, NULL, &pfn, &end_pfn);
> + if (ret == -ENXIO)
> + return 0; /* Reads beyond max_pfn do nothing */
> + else if (ret)
> + return ret;
>
> for (; pfn < end_pfn; pfn++) {
> bit = pfn % BITMAP_CHUNK_BITS;
> if (!bit)
> *out = 0ULL;
> - page = page_idle_get_page(pfn);
> - if (page) {
> - if (page_is_idle(page)) {
> - /*
> - * The page might have been referenced via a
> - * pte, in which case it is not idle. Clear
> - * refs and recheck.
> - */
> - page_idle_clear_pte_refs(page);
> - if (page_is_idle(page))
> - *out |= 1ULL << bit;
> - }
> + page = page_idle_get_page_pfn(pfn);
> + if (page && page_really_idle(page)) {
> + *out |= 1ULL << bit;
> put_page(page);
> }
> if (bit == BITMAP_CHUNK_BITS - 1)
> @@ -170,23 +204,16 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj,
> const u64 *in = (u64 *)buf;
> struct page *page;
> unsigned long pfn, end_pfn;
> - int bit;
> + int bit, ret;
>
> - if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> - return -EINVAL;
> -
> - pfn = pos * BITS_PER_BYTE;
> - if (pfn >= max_pfn)
> - return -ENXIO;
> -
> - end_pfn = pfn + count * BITS_PER_BYTE;
> - if (end_pfn > max_pfn)
> - end_pfn = max_pfn;
> + ret = page_idle_get_frames(pos, count, NULL, &pfn, &end_pfn);
> + if (ret)
> + return ret;
>
> for (; pfn < end_pfn; pfn++) {
> bit = pfn % BITMAP_CHUNK_BITS;
> if ((*in >> bit) & 1) {
> - page = page_idle_get_page(pfn);
> + page = page_idle_get_page_pfn(pfn);
> if (page) {
> page_idle_clear_pte_refs(page);
> set_page_idle(page);
> @@ -224,10 +251,208 @@ struct page_ext_operations page_idle_ops = {
> };
> #endif
>
> +/* page_idle tracking for /proc/<pid>/page_idle */
> +
> +static DEFINE_SPINLOCK(idle_page_list_lock);
> +struct list_head idle_page_list;
> +
> +struct page_node {
> + struct page *page;
> + unsigned long addr;
> + struct list_head list;
> +};
> +
> +struct page_idle_proc_priv {
> + unsigned long start_addr;
> + char *buffer;
> + int write;
> +};
> +
> +static void add_page_idle_list(struct page *page,
> + unsigned long addr, struct mm_walk *walk)
> +{
> + struct page *page_get;
> + struct page_node *pn;
> + int bit;
> + unsigned long frames;
> + struct page_idle_proc_priv *priv = walk->private;
> + u64 *chunk = (u64 *)priv->buffer;
> +
> + if (priv->write) {
> + /* Find whether this page was asked to be marked */
> + frames = (addr - priv->start_addr) >> PAGE_SHIFT;
> + bit = frames % BITMAP_CHUNK_BITS;
> + chunk = &chunk[frames / BITMAP_CHUNK_BITS];
> + if (((*chunk >> bit) & 1) == 0)
> + return;
> + }
> +
> + page_get = page_idle_get_page(page);
> + if (!page_get)
> + return;
> +
> + pn = kmalloc(sizeof(*pn), GFP_ATOMIC);
> + if (!pn)
> + return;
> +
> + pn->page = page_get;
> + pn->addr = addr;
> + list_add(&pn->list, &idle_page_list);
> +}
> +
> +static int pte_page_idle_proc_range(pmd_t *pmd, unsigned long addr,
> + unsigned long end,
> + struct mm_walk *walk)
> +{
> + struct vm_area_struct *vma = walk->vma;
> + pte_t *pte;
> + spinlock_t *ptl;
> + struct page *page;
> +
> + ptl = pmd_trans_huge_lock(pmd, vma);
> + if (ptl) {
> + if (pmd_present(*pmd)) {
> + page = follow_trans_huge_pmd(vma, addr, pmd,
> + FOLL_DUMP|FOLL_WRITE);
> + if (!IS_ERR_OR_NULL(page))
> + add_page_idle_list(page, addr, walk);
> + }
> + spin_unlock(ptl);
> + return 0;
> + }
> +
> + if (pmd_trans_unstable(pmd))
> + return 0;
> +
> + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> + for (; addr != end; pte++, addr += PAGE_SIZE) {
> + if (!pte_present(*pte))
> + continue;
> +
> + page = vm_normal_page(vma, addr, *pte);
> + if (page)
> + add_page_idle_list(page, addr, walk);
> + }
> +
> + pte_unmap_unlock(pte - 1, ptl);
> + return 0;
> +}
> +
> +ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
> + size_t count, loff_t *pos,
> + struct task_struct *tsk, int write)
> +{
> + int ret;
> + char *buffer;
> + u64 *out;
> + unsigned long start_addr, end_addr, start_frame, end_frame;
> + struct mm_struct *mm = file->private_data;
> + struct mm_walk walk = { .pmd_entry = pte_page_idle_proc_range, };
> + struct page_node *cur, *next;
> + struct page_idle_proc_priv priv;
> + bool walk_error = false;
> +
> + if (!mm || !mmget_not_zero(mm))
> + return -EINVAL;
> +
> + if (count > PAGE_SIZE)
> + count = PAGE_SIZE;
> +
> + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (!buffer) {
> + ret = -ENOMEM;
> + goto out_mmput;
> + }
> + out = (u64 *)buffer;
> +
> + if (write && copy_from_user(buffer, ubuff, count)) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + ret = page_idle_get_frames(*pos, count, mm, &start_frame, &end_frame);
> + if (ret)
> + goto out;
> +
> + start_addr = (start_frame << PAGE_SHIFT);
> + end_addr = (end_frame << PAGE_SHIFT);
> + priv.buffer = buffer;
> + priv.start_addr = start_addr;
> + priv.write = write;
> + walk.private = &priv;
> + walk.mm = mm;
> +
> + down_read(&mm->mmap_sem);
> +
> + /*
> + * Protects the idle_page_list which is needed because
> + * walk_page_vma() holds ptlock which deadlocks with
> + * page_idle_clear_pte_refs(). So we have to collect all
> + * pages first, and then call page_idle_clear_pte_refs().
> + */
> + spin_lock(&idle_page_list_lock);
> + ret = walk_page_range(start_addr, end_addr, &walk);
> + if (ret)
> + walk_error = true;
> +
> + list_for_each_entry_safe(cur, next, &idle_page_list, list) {
> + int bit, index;
> + unsigned long off;
> + struct page *page = cur->page;
> +
> + if (unlikely(walk_error))
> + goto remove_page;
> +
> + if (write) {
> + page_idle_clear_pte_refs(page);
> + set_page_idle(page);
> + } else {
> + if (page_really_idle(page)) {
> + off = ((cur->addr) >> PAGE_SHIFT) - start_frame;
> + bit = off % BITMAP_CHUNK_BITS;
> + index = off / BITMAP_CHUNK_BITS;
> + out[index] |= 1ULL << bit;
> + }
> + }
> +remove_page:
> + put_page(page);
> + list_del(&cur->list);
> + kfree(cur);
> + }
> + spin_unlock(&idle_page_list_lock);
> +
> + if (!write && !walk_error)
> + ret = copy_to_user(ubuff, buffer, count);
> +
> + up_read(&mm->mmap_sem);
> +out:
> + kfree(buffer);
> +out_mmput:
> + mmput(mm);
> + if (!ret)
> + ret = count;
> + return ret;
> +
> +}
> +
> +ssize_t page_idle_proc_read(struct file *file, char __user *ubuff,
> + size_t count, loff_t *pos, struct task_struct *tsk)
> +{
> + return page_idle_proc_generic(file, ubuff, count, pos, tsk, 0);
> +}
> +
> +ssize_t page_idle_proc_write(struct file *file, char __user *ubuff,
> + size_t count, loff_t *pos, struct task_struct *tsk)
> +{
> + return page_idle_proc_generic(file, ubuff, count, pos, tsk, 1);
> +}
> +
> static int __init page_idle_init(void)
> {
> int err;
>
> + INIT_LIST_HEAD(&idle_page_list);
> +
> err = sysfs_create_group(mm_kobj, &page_idle_attr_group);
> if (err) {
> pr_err("page_idle: register sysfs failed\n");
>
^ permalink raw reply
* [PATCH V2 10/10] Documentation: cpufreq: Update policy notifier documentation
From: Viresh Kumar @ 2019-07-23 6:14 UTC (permalink / raw)
To: Rafael Wysocki, Viresh Kumar, Jonathan Corbet
Cc: linux-pm, Vincent Guittot, linux-doc, linux-kernel
In-Reply-To: <cover.1563862014.git.viresh.kumar@linaro.org>
Update documentation with the recent policy notifier updates.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Documentation/cpu-freq/core.txt | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt
index 55193e680250..ed577d9c154b 100644
--- a/Documentation/cpu-freq/core.txt
+++ b/Documentation/cpu-freq/core.txt
@@ -57,19 +57,11 @@ transition notifiers.
2.1 CPUFreq policy notifiers
----------------------------
-These are notified when a new policy is intended to be set. Each
-CPUFreq policy notifier is called twice for a policy transition:
+These are notified when a new policy is created or removed.
-1.) During CPUFREQ_ADJUST all CPUFreq notifiers may change the limit if
- they see a need for this - may it be thermal considerations or
- hardware limitations.
-
-2.) And during CPUFREQ_NOTIFY all notifiers are informed of the new policy
- - if two hardware drivers failed to agree on a new policy before this
- stage, the incompatible hardware shall be shut down, and the user
- informed of this.
-
-The phase is specified in the second argument to the notifier.
+The phase is specified in the second argument to the notifier. The phase is
+CPUFREQ_CREATE_POLICY when the policy is first created and it is
+CPUFREQ_REMOVE_POLICY when the policy is removed.
The third argument, a void *pointer, points to a struct cpufreq_policy
consisting of several values, including min, max (the lower and upper
--
2.21.0.rc0.269.g1a574e7a288b
^ permalink raw reply related
* [PATCH V2 00/10] cpufreq: Migrate users of policy notifiers to QoS requests
From: Viresh Kumar @ 2019-07-23 6:14 UTC (permalink / raw)
To: Rafael Wysocki, Amit Daniel Kachhap, Bartlomiej Zolnierkiewicz,
Benjamin Herrenschmidt, Daniel Lezcano, Eduardo Valentin,
Erik Schmauss, Greg Kroah-Hartman, Javi Merino, Jonathan Corbet,
Len Brown, Rafael J. Wysocki, Robert Moore, Viresh Kumar,
Zhang Rui
Cc: linux-pm, Vincent Guittot, devel, dri-devel, linux-acpi,
linux-doc, linux-fbdev, linux-kernel, linuxppc-dev
Hello,
Now that cpufreq core supports taking QoS requests for min/max cpu
frequencies, lets migrate rest of the users to using them instead of the
policy notifiers.
The CPUFREQ_NOTIFY and CPUFREQ_ADJUST events of the policy notifiers are
removed as a result, but we have to add CPUFREQ_CREATE_POLICY and
CPUFREQ_REMOVE_POLICY events to it for the acpi stuff specifically,
though they are also used by arch_topology stuff now. So the policy
notifiers aren't completely removed.
Boot tested on my x86 PC and ARM hikey board.
This has already gone through build bot for a few days now.
V1->V2:
- Added Acked-by tags
- Reordered to keep cleanups at the bottom
- Rebased over 5.3-rc1
--
viresh
Viresh Kumar (10):
cpufreq: Add policy create/remove notifiers
thermal: cpu_cooling: Switch to QoS requests instead of cpufreq
notifier
powerpc: macintosh: Switch to QoS requests instead of cpufreq notifier
cpufreq: powerpc_cbe: Switch to QoS requests instead of cpufreq
notifier
ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier
arch_topology: Use CPUFREQ_CREATE_POLICY instead of CPUFREQ_NOTIFY
video: sa1100fb: Remove cpufreq policy notifier
video: pxafb: Remove cpufreq policy notifier
cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier
events
Documentation: cpufreq: Update policy notifier documentation
Documentation/cpu-freq/core.txt | 16 +--
drivers/acpi/processor_driver.c | 44 ++++++++-
drivers/acpi/processor_perflib.c | 106 +++++++++-----------
drivers/acpi/processor_thermal.c | 81 ++++++++-------
drivers/base/arch_topology.c | 2 +-
drivers/cpufreq/cpufreq.c | 51 ++++------
drivers/cpufreq/ppc_cbe_cpufreq.c | 19 +++-
drivers/cpufreq/ppc_cbe_cpufreq.h | 8 ++
drivers/cpufreq/ppc_cbe_cpufreq_pmi.c | 96 +++++++++++-------
drivers/macintosh/windfarm_cpufreq_clamp.c | 77 ++++++++++-----
drivers/thermal/cpu_cooling.c | 110 +++++----------------
drivers/video/fbdev/pxafb.c | 21 ----
drivers/video/fbdev/pxafb.h | 1 -
drivers/video/fbdev/sa1100fb.c | 27 -----
drivers/video/fbdev/sa1100fb.h | 1 -
include/acpi/processor.h | 22 +++--
include/linux/cpufreq.h | 4 +-
17 files changed, 327 insertions(+), 359 deletions(-)
--
2.21.0.rc0.269.g1a574e7a288b
^ permalink raw reply
* Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
From: Minchan Kim @ 2019-07-23 6:13 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, vdavydov.dev, Brendan Gregg, kernel-team,
Alexey Dobriyan, Al Viro, Andrew Morton, carmenjackson,
Christian Hansen, Colin Ian King, dancol, David Howells, fmayer,
joaodias, joelaf, Jonathan Corbet, Kees Cook, Kirill Tkhai,
Konstantin Khlebnikov, linux-doc, linux-fsdevel, linux-mm,
Michal Hocko, Mike Rapoport, namhyung, sspatil, surenb,
Thomas Gleixner, timmurray, tkjos, Vlastimil Babka, wvw
In-Reply-To: <20190722213205.140845-1-joel@joelfernandes.org>
Hi Joel,
On Mon, Jul 22, 2019 at 05:32:04PM -0400, Joel Fernandes (Google) wrote:
> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> This is quite cumbersome and can be error-prone too. If between
cumbersome: That's the fair tradeoff between idle page tracking and
clear_refs because idle page tracking could check even though the page
is not mapped.
error-prone: What's the error?
> accessing the per-PID pagemap and the global page_idle bitmap, if
> something changes with the page then the information is not accurate.
What you mean with error is this timing issue?
Why do you need to be accurate? IOW, accurate is always good but what's
the scale of the accuracy?
> More over looking up PFN from pagemap in Android devices is not
> supported by unprivileged process and requires SYS_ADMIN and gives 0 for
> the PFN.
>
> This patch adds support to directly interact with page_idle tracking at
> the PID level by introducing a /proc/<pid>/page_idle file. This
> eliminates the need for userspace to calculate the mapping of the page.
> It follows the exact same semantics as the global
> /sys/kernel/mm/page_idle, however it is easier to use for some usecases
> where looking up PFN is not needed and also does not require SYS_ADMIN.
Ah, so the primary goal is to provide convinience interface and it would
help accurary, too. IOW, accuracy is not your main goal?
> It ended up simplifying userspace code, solving the security issue
> mentioned and works quite well. SELinux does not need to be turned off
> since no pagemap look up is needed.
I'm not sure how it is painful to check it via pagemap for your goal
but not sure it's a good idea to create new ABI for just convinience.
I think that's library we have.
>
> In Android, we are using this for the heap profiler (heapprofd) which
> profiles and pin points code paths which allocates and leaves memory
> idle for long periods of time.
So the goal is to detect idle pages with idle memory tracking?
It couldn't work well because such idle pages could finally swap out and
lose every flags of the page descriptor which is working mechanism of
idle page tracking. It should have named "workingset page tracking",
not "idle page tracking".
>
> Documentation material:
> The idle page tracking API for virtual address indexing using virtual page
> frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> except that it uses virtual instead of physical frame numbers.
>
> This idle page tracking API can be simpler to use than physical address
> indexing, since the pagemap for a process does not need to be looked up
> to mark or read a page's idle bit. It is also more accurate than
> physical address indexing since in physical address indexing, address
> space changes can occur between reading the pagemap and reading the
> bitmap. In virtual address indexing, the process's mmap_sem is held for
> the duration of the access.
>
> Cc: vdavydov.dev@gmail.com
> Cc: Brendan Gregg <bgregg@netflix.com>
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> ---
> Internal review -> v1:
> Fixes from Suren.
> Corrections to change log, docs (Florian, Sandeep)
>
> fs/proc/base.c | 3 +
> fs/proc/internal.h | 1 +
> fs/proc/task_mmu.c | 57 +++++++
> include/linux/page_idle.h | 4 +
> mm/page_idle.c | 305 +++++++++++++++++++++++++++++++++-----
> 5 files changed, 330 insertions(+), 40 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 77eb628ecc7f..a58dd74606e9 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3021,6 +3021,9 @@ static const struct pid_entry tgid_base_stuff[] = {
> REG("smaps", S_IRUGO, proc_pid_smaps_operations),
> REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
> REG("pagemap", S_IRUSR, proc_pagemap_operations),
> +#ifdef CONFIG_IDLE_PAGE_TRACKING
> + REG("page_idle", S_IRUSR|S_IWUSR, proc_page_idle_operations),
> +#endif
> #endif
> #ifdef CONFIG_SECURITY
> DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations),
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index cd0c8d5ce9a1..bc9371880c63 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -293,6 +293,7 @@ extern const struct file_operations proc_pid_smaps_operations;
> extern const struct file_operations proc_pid_smaps_rollup_operations;
> extern const struct file_operations proc_clear_refs_operations;
> extern const struct file_operations proc_pagemap_operations;
> +extern const struct file_operations proc_page_idle_operations;
>
> extern unsigned long task_vsize(struct mm_struct *);
> extern unsigned long task_statm(struct mm_struct *,
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 4d2b860dbc3f..11ccc53da38e 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1642,6 +1642,63 @@ const struct file_operations proc_pagemap_operations = {
> .open = pagemap_open,
> .release = pagemap_release,
> };
> +
> +#ifdef CONFIG_IDLE_PAGE_TRACKING
> +static ssize_t proc_page_idle_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + int ret;
> + struct task_struct *tsk = get_proc_task(file_inode(file));
> +
> + if (!tsk)
> + return -EINVAL;
> + ret = page_idle_proc_read(file, buf, count, ppos, tsk);
> + put_task_struct(tsk);
> + return ret;
> +}
> +
> +static ssize_t proc_page_idle_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + int ret;
> + struct task_struct *tsk = get_proc_task(file_inode(file));
> +
> + if (!tsk)
> + return -EINVAL;
> + ret = page_idle_proc_write(file, (char __user *)buf, count, ppos, tsk);
> + put_task_struct(tsk);
> + return ret;
> +}
> +
> +static int proc_page_idle_open(struct inode *inode, struct file *file)
> +{
> + struct mm_struct *mm;
> +
> + mm = proc_mem_open(inode, PTRACE_MODE_READ);
> + if (IS_ERR(mm))
> + return PTR_ERR(mm);
> + file->private_data = mm;
> + return 0;
> +}
> +
> +static int proc_page_idle_release(struct inode *inode, struct file *file)
> +{
> + struct mm_struct *mm = file->private_data;
> +
> + if (mm)
> + mmdrop(mm);
> + return 0;
> +}
> +
> +const struct file_operations proc_page_idle_operations = {
> + .llseek = mem_lseek, /* borrow this */
> + .read = proc_page_idle_read,
> + .write = proc_page_idle_write,
> + .open = proc_page_idle_open,
> + .release = proc_page_idle_release,
> +};
> +#endif /* CONFIG_IDLE_PAGE_TRACKING */
> +
> #endif /* CONFIG_PROC_PAGE_MONITOR */
>
> #ifdef CONFIG_NUMA
> diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
> index 1e894d34bdce..f1bc2640d85e 100644
> --- a/include/linux/page_idle.h
> +++ b/include/linux/page_idle.h
> @@ -106,6 +106,10 @@ static inline void clear_page_idle(struct page *page)
> }
> #endif /* CONFIG_64BIT */
>
> +ssize_t page_idle_proc_write(struct file *file,
> + char __user *buf, size_t count, loff_t *ppos, struct task_struct *tsk);
> +ssize_t page_idle_proc_read(struct file *file,
> + char __user *buf, size_t count, loff_t *ppos, struct task_struct *tsk);
> #else /* !CONFIG_IDLE_PAGE_TRACKING */
>
> static inline bool page_is_young(struct page *page)
> diff --git a/mm/page_idle.c b/mm/page_idle.c
> index 295512465065..874a60c41fef 100644
> --- a/mm/page_idle.c
> +++ b/mm/page_idle.c
> @@ -11,6 +11,7 @@
> #include <linux/mmu_notifier.h>
> #include <linux/page_ext.h>
> #include <linux/page_idle.h>
> +#include <linux/sched/mm.h>
>
> #define BITMAP_CHUNK_SIZE sizeof(u64)
> #define BITMAP_CHUNK_BITS (BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
> @@ -28,15 +29,12 @@
> *
> * This function tries to get a user memory page by pfn as described above.
> */
> -static struct page *page_idle_get_page(unsigned long pfn)
> +static struct page *page_idle_get_page(struct page *page_in)
> {
> struct page *page;
> pg_data_t *pgdat;
>
> - if (!pfn_valid(pfn))
> - return NULL;
> -
> - page = pfn_to_page(pfn);
> + page = page_in;
> if (!page || !PageLRU(page) ||
> !get_page_unless_zero(page))
> return NULL;
> @@ -51,6 +49,15 @@ static struct page *page_idle_get_page(unsigned long pfn)
> return page;
> }
>
> +static struct page *page_idle_get_page_pfn(unsigned long pfn)
> +{
> +
> + if (!pfn_valid(pfn))
> + return NULL;
> +
> + return page_idle_get_page(pfn_to_page(pfn));
> +}
> +
> static bool page_idle_clear_pte_refs_one(struct page *page,
> struct vm_area_struct *vma,
> unsigned long addr, void *arg)
> @@ -118,6 +125,47 @@ static void page_idle_clear_pte_refs(struct page *page)
> unlock_page(page);
> }
>
> +/* Helper to get the start and end frame given a pos and count */
> +static int page_idle_get_frames(loff_t pos, size_t count, struct mm_struct *mm,
> + unsigned long *start, unsigned long *end)
> +{
> + unsigned long max_frame;
> +
> + /* If an mm is not given, assume we want physical frames */
> + max_frame = mm ? (mm->task_size >> PAGE_SHIFT) : max_pfn;
> +
> + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> + return -EINVAL;
> +
> + *start = pos * BITS_PER_BYTE;
> + if (*start >= max_frame)
> + return -ENXIO;
> +
> + *end = *start + count * BITS_PER_BYTE;
> + if (*end > max_frame)
> + *end = max_frame;
> + return 0;
> +}
> +
> +static bool page_really_idle(struct page *page)
> +{
> + if (!page)
> + return false;
> +
> + if (page_is_idle(page)) {
> + /*
> + * The page might have been referenced via a
> + * pte, in which case it is not idle. Clear
> + * refs and recheck.
> + */
> + page_idle_clear_pte_refs(page);
> + if (page_is_idle(page))
> + return true;
> + }
> +
> + return false;
> +}
> +
> static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
> struct bin_attribute *attr, char *buf,
> loff_t pos, size_t count)
> @@ -125,35 +173,21 @@ static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
> u64 *out = (u64 *)buf;
> struct page *page;
> unsigned long pfn, end_pfn;
> - int bit;
> -
> - if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> - return -EINVAL;
> -
> - pfn = pos * BITS_PER_BYTE;
> - if (pfn >= max_pfn)
> - return 0;
> + int bit, ret;
>
> - end_pfn = pfn + count * BITS_PER_BYTE;
> - if (end_pfn > max_pfn)
> - end_pfn = max_pfn;
> + ret = page_idle_get_frames(pos, count, NULL, &pfn, &end_pfn);
> + if (ret == -ENXIO)
> + return 0; /* Reads beyond max_pfn do nothing */
> + else if (ret)
> + return ret;
>
> for (; pfn < end_pfn; pfn++) {
> bit = pfn % BITMAP_CHUNK_BITS;
> if (!bit)
> *out = 0ULL;
> - page = page_idle_get_page(pfn);
> - if (page) {
> - if (page_is_idle(page)) {
> - /*
> - * The page might have been referenced via a
> - * pte, in which case it is not idle. Clear
> - * refs and recheck.
> - */
> - page_idle_clear_pte_refs(page);
> - if (page_is_idle(page))
> - *out |= 1ULL << bit;
> - }
> + page = page_idle_get_page_pfn(pfn);
> + if (page && page_really_idle(page)) {
> + *out |= 1ULL << bit;
> put_page(page);
> }
> if (bit == BITMAP_CHUNK_BITS - 1)
> @@ -170,23 +204,16 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj,
> const u64 *in = (u64 *)buf;
> struct page *page;
> unsigned long pfn, end_pfn;
> - int bit;
> + int bit, ret;
>
> - if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
> - return -EINVAL;
> -
> - pfn = pos * BITS_PER_BYTE;
> - if (pfn >= max_pfn)
> - return -ENXIO;
> -
> - end_pfn = pfn + count * BITS_PER_BYTE;
> - if (end_pfn > max_pfn)
> - end_pfn = max_pfn;
> + ret = page_idle_get_frames(pos, count, NULL, &pfn, &end_pfn);
> + if (ret)
> + return ret;
>
> for (; pfn < end_pfn; pfn++) {
> bit = pfn % BITMAP_CHUNK_BITS;
> if ((*in >> bit) & 1) {
> - page = page_idle_get_page(pfn);
> + page = page_idle_get_page_pfn(pfn);
> if (page) {
> page_idle_clear_pte_refs(page);
> set_page_idle(page);
> @@ -224,10 +251,208 @@ struct page_ext_operations page_idle_ops = {
> };
> #endif
>
> +/* page_idle tracking for /proc/<pid>/page_idle */
> +
> +static DEFINE_SPINLOCK(idle_page_list_lock);
> +struct list_head idle_page_list;
> +
> +struct page_node {
> + struct page *page;
> + unsigned long addr;
> + struct list_head list;
> +};
> +
> +struct page_idle_proc_priv {
> + unsigned long start_addr;
> + char *buffer;
> + int write;
> +};
> +
> +static void add_page_idle_list(struct page *page,
> + unsigned long addr, struct mm_walk *walk)
> +{
> + struct page *page_get;
> + struct page_node *pn;
> + int bit;
> + unsigned long frames;
> + struct page_idle_proc_priv *priv = walk->private;
> + u64 *chunk = (u64 *)priv->buffer;
> +
> + if (priv->write) {
> + /* Find whether this page was asked to be marked */
> + frames = (addr - priv->start_addr) >> PAGE_SHIFT;
> + bit = frames % BITMAP_CHUNK_BITS;
> + chunk = &chunk[frames / BITMAP_CHUNK_BITS];
> + if (((*chunk >> bit) & 1) == 0)
> + return;
> + }
> +
> + page_get = page_idle_get_page(page);
> + if (!page_get)
> + return;
> +
> + pn = kmalloc(sizeof(*pn), GFP_ATOMIC);
> + if (!pn)
> + return;
> +
> + pn->page = page_get;
> + pn->addr = addr;
> + list_add(&pn->list, &idle_page_list);
> +}
> +
> +static int pte_page_idle_proc_range(pmd_t *pmd, unsigned long addr,
> + unsigned long end,
> + struct mm_walk *walk)
> +{
> + struct vm_area_struct *vma = walk->vma;
> + pte_t *pte;
> + spinlock_t *ptl;
> + struct page *page;
> +
> + ptl = pmd_trans_huge_lock(pmd, vma);
> + if (ptl) {
> + if (pmd_present(*pmd)) {
> + page = follow_trans_huge_pmd(vma, addr, pmd,
> + FOLL_DUMP|FOLL_WRITE);
> + if (!IS_ERR_OR_NULL(page))
> + add_page_idle_list(page, addr, walk);
> + }
> + spin_unlock(ptl);
> + return 0;
> + }
> +
> + if (pmd_trans_unstable(pmd))
> + return 0;
> +
> + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> + for (; addr != end; pte++, addr += PAGE_SIZE) {
> + if (!pte_present(*pte))
> + continue;
> +
> + page = vm_normal_page(vma, addr, *pte);
> + if (page)
> + add_page_idle_list(page, addr, walk);
> + }
> +
> + pte_unmap_unlock(pte - 1, ptl);
> + return 0;
> +}
> +
> +ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
> + size_t count, loff_t *pos,
> + struct task_struct *tsk, int write)
> +{
> + int ret;
> + char *buffer;
> + u64 *out;
> + unsigned long start_addr, end_addr, start_frame, end_frame;
> + struct mm_struct *mm = file->private_data;
> + struct mm_walk walk = { .pmd_entry = pte_page_idle_proc_range, };
> + struct page_node *cur, *next;
> + struct page_idle_proc_priv priv;
> + bool walk_error = false;
> +
> + if (!mm || !mmget_not_zero(mm))
> + return -EINVAL;
> +
> + if (count > PAGE_SIZE)
> + count = PAGE_SIZE;
> +
> + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (!buffer) {
> + ret = -ENOMEM;
> + goto out_mmput;
> + }
> + out = (u64 *)buffer;
> +
> + if (write && copy_from_user(buffer, ubuff, count)) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + ret = page_idle_get_frames(*pos, count, mm, &start_frame, &end_frame);
> + if (ret)
> + goto out;
> +
> + start_addr = (start_frame << PAGE_SHIFT);
> + end_addr = (end_frame << PAGE_SHIFT);
> + priv.buffer = buffer;
> + priv.start_addr = start_addr;
> + priv.write = write;
> + walk.private = &priv;
> + walk.mm = mm;
> +
> + down_read(&mm->mmap_sem);
> +
> + /*
> + * Protects the idle_page_list which is needed because
> + * walk_page_vma() holds ptlock which deadlocks with
> + * page_idle_clear_pte_refs(). So we have to collect all
> + * pages first, and then call page_idle_clear_pte_refs().
> + */
> + spin_lock(&idle_page_list_lock);
> + ret = walk_page_range(start_addr, end_addr, &walk);
> + if (ret)
> + walk_error = true;
> +
> + list_for_each_entry_safe(cur, next, &idle_page_list, list) {
> + int bit, index;
> + unsigned long off;
> + struct page *page = cur->page;
> +
> + if (unlikely(walk_error))
> + goto remove_page;
> +
> + if (write) {
> + page_idle_clear_pte_refs(page);
> + set_page_idle(page);
> + } else {
> + if (page_really_idle(page)) {
> + off = ((cur->addr) >> PAGE_SHIFT) - start_frame;
> + bit = off % BITMAP_CHUNK_BITS;
> + index = off / BITMAP_CHUNK_BITS;
> + out[index] |= 1ULL << bit;
> + }
> + }
> +remove_page:
> + put_page(page);
> + list_del(&cur->list);
> + kfree(cur);
> + }
> + spin_unlock(&idle_page_list_lock);
> +
> + if (!write && !walk_error)
> + ret = copy_to_user(ubuff, buffer, count);
> +
> + up_read(&mm->mmap_sem);
> +out:
> + kfree(buffer);
> +out_mmput:
> + mmput(mm);
> + if (!ret)
> + ret = count;
> + return ret;
> +
> +}
> +
> +ssize_t page_idle_proc_read(struct file *file, char __user *ubuff,
> + size_t count, loff_t *pos, struct task_struct *tsk)
> +{
> + return page_idle_proc_generic(file, ubuff, count, pos, tsk, 0);
> +}
> +
> +ssize_t page_idle_proc_write(struct file *file, char __user *ubuff,
> + size_t count, loff_t *pos, struct task_struct *tsk)
> +{
> + return page_idle_proc_generic(file, ubuff, count, pos, tsk, 1);
> +}
> +
> static int __init page_idle_init(void)
> {
> int err;
>
> + INIT_LIST_HEAD(&idle_page_list);
> +
> err = sysfs_create_group(mm_kobj, &page_idle_attr_group);
> if (err) {
> pr_err("page_idle: register sysfs failed\n");
> --
> 2.22.0.657.g960e92d24f-goog
^ permalink raw reply
* Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
From: Michal Hocko @ 2019-07-23 6:05 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, vdavydov.dev, Brendan Gregg, kernel-team,
Alexey Dobriyan, Al Viro, Andrew Morton, carmenjackson,
Christian Hansen, Colin Ian King, dancol, David Howells, fmayer,
joaodias, joelaf, Jonathan Corbet, Kees Cook, Kirill Tkhai,
Konstantin Khlebnikov, linux-doc, linux-fsdevel, linux-mm,
Mike Rapoport, minchan, minchan, namhyung, sspatil, surenb,
Thomas Gleixner, timmurray, tkjos, Vlastimil Babka, wvw,
linux-api
In-Reply-To: <20190722213205.140845-1-joel@joelfernandes.org>
[Cc linux-api - please always do CC this list when introducing a user
visible API]
On Mon 22-07-19 17:32:04, Joel Fernandes (Google) wrote:
> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> This is quite cumbersome and can be error-prone too. If between
> accessing the per-PID pagemap and the global page_idle bitmap, if
> something changes with the page then the information is not accurate.
> More over looking up PFN from pagemap in Android devices is not
> supported by unprivileged process and requires SYS_ADMIN and gives 0 for
> the PFN.
>
> This patch adds support to directly interact with page_idle tracking at
> the PID level by introducing a /proc/<pid>/page_idle file. This
> eliminates the need for userspace to calculate the mapping of the page.
> It follows the exact same semantics as the global
> /sys/kernel/mm/page_idle, however it is easier to use for some usecases
> where looking up PFN is not needed and also does not require SYS_ADMIN.
> It ended up simplifying userspace code, solving the security issue
> mentioned and works quite well. SELinux does not need to be turned off
> since no pagemap look up is needed.
>
> In Android, we are using this for the heap profiler (heapprofd) which
> profiles and pin points code paths which allocates and leaves memory
> idle for long periods of time.
>
> Documentation material:
> The idle page tracking API for virtual address indexing using virtual page
> frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> except that it uses virtual instead of physical frame numbers.
>
> This idle page tracking API can be simpler to use than physical address
> indexing, since the pagemap for a process does not need to be looked up
> to mark or read a page's idle bit. It is also more accurate than
> physical address indexing since in physical address indexing, address
> space changes can occur between reading the pagemap and reading the
> bitmap. In virtual address indexing, the process's mmap_sem is held for
> the duration of the access.
I didn't get to read the actual code but the overall idea makes sense to
me. I can see this being useful for userspace memory management (along
with remote MADV_PAGEOUT, MADV_COLD).
Normally I would object that a cumbersome nature of the existing
interface can be hidden in a userspace but I do agree that rowhammer has
made this one close to unusable for anything but a privileged process.
I do not think you can make any argument about accuracy because
the information will never be accurate. Sure the race window is smaller
in principle but you can hardly say anything about how much or whether
at all.
Thanks.
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* [PATCH v3 01/12] fpga: dfl: fme: support 512bit data width PR
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Ananda Ravuri,
Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
In early partial reconfiguration private feature, it only
supports 32bit data width when writing data to hardware for
PR. 512bit data width PR support is an important optimization
for some specific solutions (e.g. XEON with FPGA integrated),
it allows driver to use AVX512 instruction to improve the
performance of partial reconfiguration. e.g. programming one
100MB bitstream image via this 512bit data width PR hardware
only takes ~300ms, but 32bit revision requires ~3s per test
result.
Please note now this optimization is only done on revision 2
of this PR private feature which is only used in integrated
solution that AVX512 is always supported. This revision 2
hardware doesn't support 32bit PR.
Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: remove DRV/MODULE_VERSION modifications
---
drivers/fpga/dfl-fme-mgr.c | 110 ++++++++++++++++++++++++++++++++++++++-------
drivers/fpga/dfl-fme-pr.c | 43 +++++++++++-------
drivers/fpga/dfl-fme.h | 2 +
drivers/fpga/dfl.h | 5 +++
4 files changed, 129 insertions(+), 31 deletions(-)
diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index b3f7eee..46e17f0 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -22,6 +22,7 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/fpga/fpga-mgr.h>
+#include "dfl.h"
#include "dfl-fme-pr.h"
/* FME Partial Reconfiguration Sub Feature Register Set */
@@ -30,6 +31,7 @@
#define FME_PR_STS 0x10
#define FME_PR_DATA 0x18
#define FME_PR_ERR 0x20
+#define FME_PR_512_DATA 0x40 /* Data Register for 512bit datawidth PR */
#define FME_PR_INTFC_ID_L 0xA8
#define FME_PR_INTFC_ID_H 0xB0
@@ -67,8 +69,43 @@
#define PR_WAIT_TIMEOUT 8000000
#define PR_HOST_STATUS_IDLE 0
+#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
+
+#include <linux/cpufeature.h>
+#include <asm/fpu/api.h>
+
+static inline int is_cpu_avx512_enabled(void)
+{
+ return cpu_feature_enabled(X86_FEATURE_AVX512F);
+}
+
+static inline void copy512(const void *src, void __iomem *dst)
+{
+ kernel_fpu_begin();
+
+ asm volatile("vmovdqu64 (%0), %%zmm0;"
+ "vmovntdq %%zmm0, (%1);"
+ :
+ : "r"(src), "r"(dst)
+ : "memory");
+
+ kernel_fpu_end();
+}
+#else
+static inline int is_cpu_avx512_enabled(void)
+{
+ return 0;
+}
+
+static inline void copy512(const void *src, void __iomem *dst)
+{
+ WARN_ON_ONCE(1);
+}
+#endif
+
struct fme_mgr_priv {
void __iomem *ioaddr;
+ unsigned int pr_datawidth;
u64 pr_error;
};
@@ -169,7 +206,7 @@ static int fme_mgr_write(struct fpga_manager *mgr,
struct fme_mgr_priv *priv = mgr->priv;
void __iomem *fme_pr = priv->ioaddr;
u64 pr_ctrl, pr_status, pr_data;
- int delay = 0, pr_credit, i = 0;
+ int ret = 0, delay = 0, pr_credit;
dev_dbg(dev, "start request\n");
@@ -181,9 +218,9 @@ static int fme_mgr_write(struct fpga_manager *mgr,
/*
* driver can push data to PR hardware using PR_DATA register once HW
- * has enough pr_credit (> 1), pr_credit reduces one for every 32bit
- * pr data write to PR_DATA register. If pr_credit <= 1, driver needs
- * to wait for enough pr_credit from hardware by polling.
+ * has enough pr_credit (> 1), pr_credit reduces one for every pr data
+ * width write to PR_DATA register. If pr_credit <= 1, driver needs to
+ * wait for enough pr_credit from hardware by polling.
*/
pr_status = readq(fme_pr + FME_PR_STS);
pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
@@ -192,7 +229,8 @@ static int fme_mgr_write(struct fpga_manager *mgr,
while (pr_credit <= 1) {
if (delay++ > PR_WAIT_TIMEOUT) {
dev_err(dev, "PR_CREDIT timeout\n");
- return -ETIMEDOUT;
+ ret = -ETIMEDOUT;
+ goto done;
}
udelay(1);
@@ -200,21 +238,27 @@ static int fme_mgr_write(struct fpga_manager *mgr,
pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
}
- if (count < 4) {
- dev_err(dev, "Invalid PR bitstream size\n");
- return -EINVAL;
+ WARN_ON(count < priv->pr_datawidth);
+
+ switch (priv->pr_datawidth) {
+ case 4:
+ pr_data = FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
+ *(u32 *)buf);
+ writeq(pr_data, fme_pr + FME_PR_DATA);
+ break;
+ case 64:
+ copy512(buf, fme_pr + FME_PR_512_DATA);
+ break;
+ default:
+ WARN_ON_ONCE(1);
}
-
- pr_data = 0;
- pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
- *(((u32 *)buf) + i));
- writeq(pr_data, fme_pr + FME_PR_DATA);
- count -= 4;
+ buf += priv->pr_datawidth;
+ count -= priv->pr_datawidth;
pr_credit--;
- i++;
}
- return 0;
+done:
+ return ret;
}
static int fme_mgr_write_complete(struct fpga_manager *mgr,
@@ -279,6 +323,36 @@ static void fme_mgr_get_compat_id(void __iomem *fme_pr,
id->id_h = readq(fme_pr + FME_PR_INTFC_ID_H);
}
+static u8 fme_mgr_get_pr_datawidth(struct device *dev, void __iomem *fme_pr)
+{
+ u8 revision = dfl_feature_revision(fme_pr);
+
+ if (revision < 2) {
+ /*
+ * revision 0 and 1 only support 32bit data width partial
+ * reconfiguration, so pr_datawidth is 4 (Byte).
+ */
+ return 4;
+ } else if (revision == 2) {
+ /*
+ * revision 2 hardware has optimization to support 512bit data
+ * width partial reconfiguration with AVX512 instructions. So
+ * pr_datawidth is 64 (Byte). As revision 2 hardware is only
+ * used in integrated solution, CPU supports AVX512 instructions
+ * for sure, but it still needs to check here as AVX512 could be
+ * disabled in kernel (e.g. using clearcpuid boot option).
+ */
+ if (is_cpu_avx512_enabled())
+ return 64;
+
+ dev_err(dev, "revision 2: AVX512 is disabled\n");
+ return 0;
+ }
+
+ dev_err(dev, "revision %d is not supported yet\n", revision);
+ return 0;
+}
+
static int fme_mgr_probe(struct platform_device *pdev)
{
struct dfl_fme_mgr_pdata *pdata = dev_get_platdata(&pdev->dev);
@@ -302,6 +376,10 @@ static int fme_mgr_probe(struct platform_device *pdev)
return PTR_ERR(priv->ioaddr);
}
+ priv->pr_datawidth = fme_mgr_get_pr_datawidth(dev, priv->ioaddr);
+ if (!priv->pr_datawidth)
+ return -ENODEV;
+
compat_id = devm_kzalloc(dev, sizeof(*compat_id), GFP_KERNEL);
if (!compat_id)
return -ENOMEM;
diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index 3c71dc3..cd94ba8 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -83,7 +83,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
if (copy_from_user(&port_pr, argp, minsz))
return -EFAULT;
- if (port_pr.argsz < minsz || port_pr.flags)
+ if (port_pr.argsz < minsz || port_pr.flags || !port_pr.buffer_size)
return -EINVAL;
/* get fme header region */
@@ -101,15 +101,25 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
port_pr.buffer_size))
return -EFAULT;
+ mutex_lock(&pdata->lock);
+ fme = dfl_fpga_pdata_get_private(pdata);
+ /* fme device has been unregistered. */
+ if (!fme) {
+ ret = -EINVAL;
+ goto unlock_exit;
+ }
+
/*
* align PR buffer per PR bandwidth, as HW ignores the extra padding
* data automatically.
*/
- length = ALIGN(port_pr.buffer_size, 4);
+ length = ALIGN(port_pr.buffer_size, fme->pr_datawidth);
buf = vmalloc(length);
- if (!buf)
- return -ENOMEM;
+ if (!buf) {
+ ret = -ENOMEM;
+ goto unlock_exit;
+ }
if (copy_from_user(buf,
(void __user *)(unsigned long)port_pr.buffer_address,
@@ -127,18 +137,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
info->flags |= FPGA_MGR_PARTIAL_RECONFIG;
- mutex_lock(&pdata->lock);
- fme = dfl_fpga_pdata_get_private(pdata);
- /* fme device has been unregistered. */
- if (!fme) {
- ret = -EINVAL;
- goto unlock_exit;
- }
-
region = dfl_fme_region_find(fme, port_pr.port_id);
if (!region) {
ret = -EINVAL;
- goto unlock_exit;
+ goto free_exit;
}
fpga_image_info_free(region->info);
@@ -159,10 +161,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
fpga_bridges_put(®ion->bridge_list);
put_device(®ion->dev);
-unlock_exit:
- mutex_unlock(&pdata->lock);
free_exit:
vfree(buf);
+unlock_exit:
+ mutex_unlock(&pdata->lock);
return ret;
}
@@ -388,6 +390,17 @@ static int pr_mgmt_init(struct platform_device *pdev,
mutex_lock(&pdata->lock);
priv = dfl_fpga_pdata_get_private(pdata);
+ /*
+ * Initialize PR data width.
+ * Only revision 2 supports 512bit datawidth for better performance,
+ * other revisions use default 32bit datawidth. This is used for
+ * buffer alignment.
+ */
+ if (dfl_feature_revision(feature->ioaddr) == 2)
+ priv->pr_datawidth = 64;
+ else
+ priv->pr_datawidth = 4;
+
/* Initialize the region and bridge sub device list */
INIT_LIST_HEAD(&priv->region_list);
INIT_LIST_HEAD(&priv->bridge_list);
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index 5394a21..de20755 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -21,12 +21,14 @@
/**
* struct dfl_fme - dfl fme private data
*
+ * @pr_datawidth: data width for partial reconfiguration.
* @mgr: FME's FPGA manager platform device.
* @region_list: linked list of FME's FPGA regions.
* @bridge_list: linked list of FME's FPGA bridges.
* @pdata: fme platform device's pdata.
*/
struct dfl_fme {
+ int pr_datawidth;
struct platform_device *mgr;
struct list_head region_list;
struct list_head bridge_list;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index a8b869e..8851c6c 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -331,6 +331,11 @@ static inline bool dfl_feature_is_port(void __iomem *base)
(FIELD_GET(DFH_ID, v) == DFH_ID_FIU_PORT);
}
+static inline u8 dfl_feature_revision(void __iomem *base)
+{
+ return (u8)FIELD_GET(DFH_REVISION, readq(base + DFH));
+}
+
/**
* struct dfl_fpga_enum_info - DFL FPGA enumeration information
*
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 02/12] fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Zhang Yi Z,
Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
In order to support virtualization usage via PCIe SRIOV, this patch
adds two ioctls under FPGA Management Engine (FME) to release and
assign back the port device. In order to safely turn Port from PF
into VF and enable PCIe SRIOV, it requires user to invoke this
PORT_RELEASE ioctl to release port firstly to remove userspace
interfaces, and then configure the PF/VF access register in FME.
After disable SRIOV, it requires user to invoke this PORT_ASSIGN
ioctl to attach the port back to PF.
Ioctl interfaces:
* DFL_FPGA_FME_PORT_RELEASE
Release platform device of given port, it deletes port platform
device to remove related userspace interfaces on PF, then
configures PF/VF access mode to VF.
* DFL_FPGA_FME_PORT_ASSIGN
Assign platform device of given port back to PF, it configures
PF/VF access mode to PF, then adds port platform device back to
re-enable related userspace interfaces on PF.
Signed-off-by: Zhang Yi Z <yi.z.zhang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Acked-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: remove argsz from ioctls.
---
drivers/fpga/dfl-fme-main.c | 30 ++++++++++++
drivers/fpga/dfl.c | 107 +++++++++++++++++++++++++++++++++++++-----
drivers/fpga/dfl.h | 10 ++++
include/uapi/linux/fpga-dfl.h | 19 ++++++++
4 files changed, 154 insertions(+), 12 deletions(-)
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 0be4635..e61e0fe 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -16,6 +16,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
+#include <linux/uaccess.h>
#include <linux/fpga-dfl.h>
#include "dfl.h"
@@ -104,9 +105,38 @@ static void fme_hdr_uinit(struct platform_device *pdev,
device_remove_groups(&pdev->dev, fme_hdr_groups);
}
+static long fme_hdr_ioctl_config_port(struct dfl_feature_platform_data *pdata,
+ unsigned long arg, bool release)
+{
+ struct dfl_fpga_cdev *cdev = pdata->dfl_cdev;
+ int port_id;
+
+ if (get_user(port_id, (int __user *)arg))
+ return -EFAULT;
+
+ return dfl_fpga_cdev_config_port(cdev, port_id, release);
+}
+
+static long fme_hdr_ioctl(struct platform_device *pdev,
+ struct dfl_feature *feature,
+ unsigned int cmd, unsigned long arg)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+
+ switch (cmd) {
+ case DFL_FPGA_FME_PORT_RELEASE:
+ return fme_hdr_ioctl_config_port(pdata, arg, true);
+ case DFL_FPGA_FME_PORT_ASSIGN:
+ return fme_hdr_ioctl_config_port(pdata, arg, false);
+ }
+
+ return -ENODEV;
+}
+
static const struct dfl_feature_ops fme_hdr_ops = {
.init = fme_hdr_init,
.uinit = fme_hdr_uinit,
+ .ioctl = fme_hdr_ioctl,
};
static struct dfl_feature_driver fme_feature_drvs[] = {
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 4b66aaa..e04ed45 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -231,16 +231,20 @@ void dfl_fpga_port_ops_del(struct dfl_fpga_port_ops *ops)
*/
int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id)
{
- struct dfl_fpga_port_ops *port_ops = dfl_fpga_port_ops_get(pdev);
- int port_id;
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+ struct dfl_fpga_port_ops *port_ops;
+
+ if (pdata->id != FEATURE_DEV_ID_UNUSED)
+ return pdata->id == *(int *)pport_id;
+ port_ops = dfl_fpga_port_ops_get(pdev);
if (!port_ops || !port_ops->get_id)
return 0;
- port_id = port_ops->get_id(pdev);
+ pdata->id = port_ops->get_id(pdev);
dfl_fpga_port_ops_put(port_ops);
- return port_id == *(int *)pport_id;
+ return pdata->id == *(int *)pport_id;
}
EXPORT_SYMBOL_GPL(dfl_fpga_check_port_id);
@@ -474,6 +478,7 @@ static int build_info_commit_dev(struct build_feature_devs_info *binfo)
pdata->dev = fdev;
pdata->num = binfo->feature_num;
pdata->dfl_cdev = binfo->cdev;
+ pdata->id = FEATURE_DEV_ID_UNUSED;
mutex_init(&pdata->lock);
lockdep_set_class_and_name(&pdata->lock, &dfl_pdata_keys[type],
dfl_pdata_key_strings[type]);
@@ -973,25 +978,27 @@ void dfl_fpga_feature_devs_remove(struct dfl_fpga_cdev *cdev)
{
struct dfl_feature_platform_data *pdata, *ptmp;
- remove_feature_devs(cdev);
-
mutex_lock(&cdev->lock);
- if (cdev->fme_dev) {
- /* the fme should be unregistered. */
- WARN_ON(device_is_registered(cdev->fme_dev));
+ if (cdev->fme_dev)
put_device(cdev->fme_dev);
- }
list_for_each_entry_safe(pdata, ptmp, &cdev->port_dev_list, node) {
struct platform_device *port_dev = pdata->dev;
- /* the port should be unregistered. */
- WARN_ON(device_is_registered(&port_dev->dev));
+ /* remove released ports */
+ if (!device_is_registered(&port_dev->dev)) {
+ dfl_id_free(feature_dev_id_type(port_dev),
+ port_dev->id);
+ platform_device_put(port_dev);
+ }
+
list_del(&pdata->node);
put_device(&port_dev->dev);
}
mutex_unlock(&cdev->lock);
+ remove_feature_devs(cdev);
+
fpga_region_unregister(cdev->region);
devm_kfree(cdev->parent, cdev);
}
@@ -1029,6 +1036,82 @@ struct platform_device *
}
EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_find_port);
+static int attach_port_dev(struct dfl_fpga_cdev *cdev, int port_id)
+{
+ struct platform_device *port_pdev;
+ int ret = -ENODEV;
+
+ mutex_lock(&cdev->lock);
+ port_pdev = __dfl_fpga_cdev_find_port(cdev, &port_id,
+ dfl_fpga_check_port_id);
+ if (!port_pdev)
+ goto unlock_exit;
+
+ if (device_is_registered(&port_pdev->dev)) {
+ ret = -EBUSY;
+ goto put_dev_exit;
+ }
+
+ ret = platform_device_add(port_pdev);
+ if (ret)
+ goto put_dev_exit;
+
+ dfl_feature_dev_use_end(dev_get_platdata(&port_pdev->dev));
+ cdev->released_port_num--;
+put_dev_exit:
+ put_device(&port_pdev->dev);
+unlock_exit:
+ mutex_unlock(&cdev->lock);
+ return ret;
+}
+
+static int detach_port_dev(struct dfl_fpga_cdev *cdev, int port_id)
+{
+ struct platform_device *port_pdev;
+ int ret = -ENODEV;
+
+ mutex_lock(&cdev->lock);
+ port_pdev = __dfl_fpga_cdev_find_port(cdev, &port_id,
+ dfl_fpga_check_port_id);
+ if (!port_pdev)
+ goto unlock_exit;
+
+ if (!device_is_registered(&port_pdev->dev)) {
+ ret = -EBUSY;
+ goto put_dev_exit;
+ }
+
+ ret = dfl_feature_dev_use_begin(dev_get_platdata(&port_pdev->dev));
+ if (ret)
+ goto put_dev_exit;
+
+ platform_device_del(port_pdev);
+ cdev->released_port_num++;
+put_dev_exit:
+ put_device(&port_pdev->dev);
+unlock_exit:
+ mutex_unlock(&cdev->lock);
+ return ret;
+}
+
+/**
+ * dfl_fpga_cdev_config_port - configure a port feature dev
+ * @cdev: parent container device.
+ * @port_id: id of the port feature device.
+ * @release: release port or assign port back.
+ *
+ * This function allows user to release port platform device or assign it back.
+ * e.g. to safely turn one port from PF into VF for PCI device SRIOV support,
+ * release port platform device is one necessary step.
+ */
+int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, int port_id,
+ bool release)
+{
+ return release ? detach_port_dev(cdev, port_id) :
+ attach_port_dev(cdev, port_id);
+}
+EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
+
static int __init dfl_fpga_init(void)
{
int ret;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 8851c6c..d700ee9 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -183,6 +183,8 @@ struct dfl_feature {
#define DEV_STATUS_IN_USE 0
+#define FEATURE_DEV_ID_UNUSED (-1)
+
/**
* struct dfl_feature_platform_data - platform data for feature devices
*
@@ -191,6 +193,7 @@ struct dfl_feature {
* @cdev: cdev of feature dev.
* @dev: ptr to platform device linked with this platform data.
* @dfl_cdev: ptr to container device.
+ * @id: id used for this feature device.
* @disable_count: count for port disable.
* @num: number for sub features.
* @dev_status: dev status (e.g. DEV_STATUS_IN_USE).
@@ -203,6 +206,7 @@ struct dfl_feature_platform_data {
struct cdev cdev;
struct platform_device *dev;
struct dfl_fpga_cdev *dfl_cdev;
+ int id;
unsigned int disable_count;
unsigned long dev_status;
void *private;
@@ -378,6 +382,7 @@ int dfl_fpga_enum_info_add_dfl(struct dfl_fpga_enum_info *info,
* @fme_dev: FME feature device under this container device.
* @lock: mutex lock to protect the port device list.
* @port_dev_list: list of all port feature devices under this container device.
+ * @released_port_num: released port number under this container device.
*/
struct dfl_fpga_cdev {
struct device *parent;
@@ -385,6 +390,7 @@ struct dfl_fpga_cdev {
struct device *fme_dev;
struct mutex lock;
struct list_head port_dev_list;
+ int released_port_num;
};
struct dfl_fpga_cdev *
@@ -412,4 +418,8 @@ struct platform_device *
return pdev;
}
+
+int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
+ int port_id, bool release);
+
#endif /* __FPGA_DFL_H */
diff --git a/include/uapi/linux/fpga-dfl.h b/include/uapi/linux/fpga-dfl.h
index 2e324e5..72f11fd 100644
--- a/include/uapi/linux/fpga-dfl.h
+++ b/include/uapi/linux/fpga-dfl.h
@@ -176,4 +176,23 @@ struct dfl_fpga_fme_port_pr {
#define DFL_FPGA_FME_PORT_PR _IO(DFL_FPGA_MAGIC, DFL_FME_BASE + 0)
+/**
+ * DFL_FPGA_FME_PORT_RELEASE - _IOW(DFL_FPGA_MAGIC, DFL_FME_BASE + 1,
+ * int port_id)
+ *
+ * Driver releases the port per Port ID provided by caller.
+ * Return: 0 on success, -errno on failure.
+ */
+#define DFL_FPGA_FME_PORT_RELEASE _IOW(DFL_FPGA_MAGIC, DFL_FME_BASE + 1, int)
+
+/**
+ * DFL_FPGA_FME_PORT_ASSIGN - _IOW(DFL_FPGA_MAGIC, DFL_FME_BASE + 2,
+ * int port_id)
+ *
+ * Driver assigns the port back per Port ID provided by caller.
+ * Return: 0 on success, -errno on failure.
+ */
+
+#define DFL_FPGA_FME_PORT_ASSIGN _IOW(DFL_FPGA_MAGIC, DFL_FME_BASE + 2, int)
+
#endif /* _UAPI_LINUX_FPGA_DFL_H */
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 05/12] fpga: dfl: afu: add userclock sysfs interfaces.
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Ananda Ravuri,
Russ Weight, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch introduces userclock sysfs interfaces for AFU, user
could use these interfaces for clock setting to AFU.
Please note that, this is only working for port header feature
with revision 0, for later revisions, userclock setting is moved
to a separated private feature, so one revision sysfs interface
is exposed to userspace application for this purpose too.
Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: rebased, and switched to use device_add/remove_groups for sysfs
v3: update kernel version and date in sysfs doc
---
Documentation/ABI/testing/sysfs-platform-dfl-port | 35 +++++++
drivers/fpga/dfl-afu-main.c | 114 +++++++++++++++++++++-
drivers/fpga/dfl.h | 4 +
3 files changed, 152 insertions(+), 1 deletion(-)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
index 5961fb2..f641807 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -44,3 +44,38 @@ Contact: Wu Hao <hao.wu@intel.com>
Description: Read-write. Read and set AFU latency tolerance reporting value.
Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
to 0 if it is latency sensitive.
+
+What: /sys/bus/platform/devices/dfl-port.0/revision
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the revision of port header
+ feature.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcmd
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Write-only. User writes command to this interface to set
+ userclock to AFU.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqsts
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the status of issued command
+ to userclck_freqcmd.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Write-only. User writes command to this interface to set
+ userclock counter.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the status of issued command
+ to userclck_freqcntrcmd.
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index cb3f73e..9025314 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -142,6 +142,17 @@ static int port_get_id(struct platform_device *pdev)
static DEVICE_ATTR_RO(id);
static ssize_t
+revision_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ return sprintf(buf, "%x\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
+static ssize_t
ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
{
struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
@@ -276,6 +287,7 @@ static int port_get_id(struct platform_device *pdev)
static struct attribute *port_hdr_attrs[] = {
&dev_attr_id.attr,
+ &dev_attr_revision.attr,
&dev_attr_ltr.attr,
&dev_attr_ap1_event.attr,
&dev_attr_ap2_event.attr,
@@ -284,14 +296,113 @@ static int port_get_id(struct platform_device *pdev)
};
ATTRIBUTE_GROUPS(port_hdr);
+static ssize_t
+userclk_freqcmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ u64 userclk_freq_cmd;
+ void __iomem *base;
+
+ if (kstrtou64(buf, 0, &userclk_freq_cmd))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(userclk_freq_cmd, base + PORT_HDR_USRCLK_CMD0);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_WO(userclk_freqcmd);
+
+static ssize_t
+userclk_freqcntrcmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ u64 userclk_freqcntr_cmd;
+ void __iomem *base;
+
+ if (kstrtou64(buf, 0, &userclk_freqcntr_cmd))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(userclk_freqcntr_cmd, base + PORT_HDR_USRCLK_CMD1);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_WO(userclk_freqcntrcmd);
+
+static ssize_t
+userclk_freqsts_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ u64 userclk_freqsts;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ userclk_freqsts = readq(base + PORT_HDR_USRCLK_STS0);
+
+ return sprintf(buf, "0x%llx\n", (unsigned long long)userclk_freqsts);
+}
+static DEVICE_ATTR_RO(userclk_freqsts);
+
+static ssize_t
+userclk_freqcntrsts_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ u64 userclk_freqcntrsts;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ userclk_freqcntrsts = readq(base + PORT_HDR_USRCLK_STS1);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)userclk_freqcntrsts);
+}
+static DEVICE_ATTR_RO(userclk_freqcntrsts);
+
+static struct attribute *port_hdr_userclk_attrs[] = {
+ &dev_attr_userclk_freqcmd.attr,
+ &dev_attr_userclk_freqcntrcmd.attr,
+ &dev_attr_userclk_freqsts.attr,
+ &dev_attr_userclk_freqcntrsts.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(port_hdr_userclk);
+
static int port_hdr_init(struct platform_device *pdev,
struct dfl_feature *feature)
{
+ int ret;
+
dev_dbg(&pdev->dev, "PORT HDR Init.\n");
port_reset(pdev);
- return device_add_groups(&pdev->dev, port_hdr_groups);
+ ret = device_add_groups(&pdev->dev, port_hdr_groups);
+ if (ret)
+ return ret;
+
+ /*
+ * if revision > 0, the userclock will be moved from port hdr register
+ * region to a separated private feature.
+ */
+ if (dfl_feature_revision(feature->ioaddr) > 0)
+ return 0;
+
+ ret = device_add_groups(&pdev->dev, port_hdr_userclk_groups);
+ if (ret)
+ device_remove_groups(&pdev->dev, port_hdr_groups);
+
+ return ret;
}
static void port_hdr_uinit(struct platform_device *pdev,
@@ -299,6 +410,7 @@ static void port_hdr_uinit(struct platform_device *pdev,
{
dev_dbg(&pdev->dev, "PORT HDR UInit.\n");
+ device_remove_groups(&pdev->dev, port_hdr_userclk_groups);
device_remove_groups(&pdev->dev, port_hdr_groups);
}
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index fe7bca4..77b5137 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -120,6 +120,10 @@
#define PORT_HDR_CAP 0x30
#define PORT_HDR_CTRL 0x38
#define PORT_HDR_STS 0x40
+#define PORT_HDR_USRCLK_CMD0 0x50
+#define PORT_HDR_USRCLK_CMD1 0x58
+#define PORT_HDR_USRCLK_STS0 0x60
+#define PORT_HDR_USRCLK_STS1 0x68
/* Port Capability Register Bitfield */
#define PORT_CAP_PORT_NUM GENMASK_ULL(1, 0) /* ID of this port */
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 04/12] fpga: dfl: afu: add AFU state related sysfs interfaces
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Ananda Ravuri,
Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch introduces more sysfs interfaces for Accelerated
Function Unit (AFU). These interfaces allow users to read
current AFU Power State (APx), read / clear AFU Power (APx)
events which are sticky to identify transient APx state,
and manage AFU's LTR (latency tolerance reporting).
Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: rebased, and remove DRV/MODULE_VERSION modifications
v3: update kernel version and date in sysfs doc
---
Documentation/ABI/testing/sysfs-platform-dfl-port | 30 +++++
drivers/fpga/dfl-afu-main.c | 137 ++++++++++++++++++++++
drivers/fpga/dfl.h | 11 ++
3 files changed, 178 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
index 6a92dda..5961fb2 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -14,3 +14,33 @@ Description: Read-only. User can program different PR bitstreams to FPGA
Accelerator Function Unit (AFU) for different functions. It
returns uuid which could be used to identify which PR bitstream
is programmed in this AFU.
+
+What: /sys/bus/platform/devices/dfl-port.0/power_state
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. It reports the APx (AFU Power) state, different APx
+ means different throttling level. When reading this file, it
+ returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6.
+
+What: /sys/bus/platform/devices/dfl-port.0/ap1_event
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-write. Read or set 1 to clear AP1 (AFU Power State 1)
+ event. It's used to indicate transient AP1 state.
+
+What: /sys/bus/platform/devices/dfl-port.0/ap2_event
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-write. Read or set 1 to clear AP2 (AFU Power State 2)
+ event. It's used to indicate transient AP2 state.
+
+What: /sys/bus/platform/devices/dfl-port.0/ltr
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-write. Read and set AFU latency tolerance reporting value.
+ Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
+ to 0 if it is latency sensitive.
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 68b4d08..cb3f73e 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -141,8 +141,145 @@ static int port_get_id(struct platform_device *pdev)
}
static DEVICE_ATTR_RO(id);
+static ssize_t
+ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_CTRL_LATENCY, v));
+}
+
+static ssize_t
+ltr_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 ltr;
+ u64 v;
+
+ if (kstrtou8(buf, 0, <r) || ltr > 1)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_CTRL);
+ v &= ~PORT_CTRL_LATENCY;
+ v |= FIELD_PREP(PORT_CTRL_LATENCY, ltr);
+ writeq(v, base + PORT_HDR_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(ltr);
+
+static ssize_t
+ap1_event_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_STS_AP1_EVT, v));
+}
+
+static ssize_t
+ap1_event_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 ap1_event;
+
+ if (kstrtou8(buf, 0, &ap1_event) || ap1_event != 1)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(PORT_STS_AP1_EVT, base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(ap1_event);
+
+static ssize_t
+ap2_event_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_STS_AP2_EVT, v));
+}
+
+static ssize_t
+ap2_event_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 ap2_event;
+
+ if (kstrtou8(buf, 0, &ap2_event) || ap2_event != 1)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(PORT_STS_AP2_EVT, base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(ap2_event);
+
+static ssize_t
+power_state_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "0x%x\n", (u8)FIELD_GET(PORT_STS_PWR_STATE, v));
+}
+static DEVICE_ATTR_RO(power_state);
+
static struct attribute *port_hdr_attrs[] = {
&dev_attr_id.attr,
+ &dev_attr_ltr.attr,
+ &dev_attr_ap1_event.attr,
+ &dev_attr_ap2_event.attr,
+ &dev_attr_power_state.attr,
NULL,
};
ATTRIBUTE_GROUPS(port_hdr);
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 061ccd4..fe7bca4 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -119,6 +119,7 @@
#define PORT_HDR_NEXT_AFU NEXT_AFU
#define PORT_HDR_CAP 0x30
#define PORT_HDR_CTRL 0x38
+#define PORT_HDR_STS 0x40
/* Port Capability Register Bitfield */
#define PORT_CAP_PORT_NUM GENMASK_ULL(1, 0) /* ID of this port */
@@ -130,6 +131,16 @@
/* Latency tolerance reporting. '1' >= 40us, '0' < 40us.*/
#define PORT_CTRL_LATENCY BIT_ULL(2)
#define PORT_CTRL_SFTRST_ACK BIT_ULL(4) /* HW ack for reset */
+
+/* Port Status Register Bitfield */
+#define PORT_STS_AP2_EVT BIT_ULL(13) /* AP2 event detected */
+#define PORT_STS_AP1_EVT BIT_ULL(12) /* AP1 event detected */
+#define PORT_STS_PWR_STATE GENMASK_ULL(11, 8) /* AFU power states */
+#define PORT_STS_PWR_STATE_NORM 0
+#define PORT_STS_PWR_STATE_AP1 1 /* 50% throttling */
+#define PORT_STS_PWR_STATE_AP2 2 /* 90% throttling */
+#define PORT_STS_PWR_STATE_AP6 6 /* 100% throttling */
+
/**
* struct dfl_fpga_port_ops - port ops
*
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 07/12] fpga: dfl: afu: export __port_enable/disable function.
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
As these two functions are used by other private features. e.g.
in error reporting private feature, it requires to check port status
and reset port for error clearing.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Moritz Fischer <mdf@kernel.org>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: rebased.
---
drivers/fpga/dfl-afu-main.c | 25 ++++++++++++++-----------
drivers/fpga/dfl-afu.h | 3 +++
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index fbd9553..22d294b 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -22,14 +22,16 @@
#include "dfl-afu.h"
/**
- * port_enable - enable a port
+ * __port_enable - enable a port
* @pdev: port platform device.
*
* Enable Port by clear the port soft reset bit, which is set by default.
* The AFU is unable to respond to any MMIO access while in reset.
- * port_enable function should only be used after port_disable function.
+ * __port_enable function should only be used after __port_disable function.
+ *
+ * The caller needs to hold lock for protection.
*/
-static void port_enable(struct platform_device *pdev)
+void __port_enable(struct platform_device *pdev)
{
struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
void __iomem *base;
@@ -52,13 +54,14 @@ static void port_enable(struct platform_device *pdev)
#define RST_POLL_TIMEOUT 1000 /* us */
/**
- * port_disable - disable a port
+ * __port_disable - disable a port
* @pdev: port platform device.
*
- * Disable Port by setting the port soft reset bit, it puts the port into
- * reset.
+ * Disable Port by setting the port soft reset bit, it puts the port into reset.
+ *
+ * The caller needs to hold lock for protection.
*/
-static int port_disable(struct platform_device *pdev)
+int __port_disable(struct platform_device *pdev)
{
struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
void __iomem *base;
@@ -104,9 +107,9 @@ static int __port_reset(struct platform_device *pdev)
{
int ret;
- ret = port_disable(pdev);
+ ret = __port_disable(pdev);
if (!ret)
- port_enable(pdev);
+ __port_enable(pdev);
return ret;
}
@@ -806,9 +809,9 @@ static int port_enable_set(struct platform_device *pdev, bool enable)
mutex_lock(&pdata->lock);
if (enable)
- port_enable(pdev);
+ __port_enable(pdev);
else
- ret = port_disable(pdev);
+ ret = __port_disable(pdev);
mutex_unlock(&pdata->lock);
return ret;
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 0c7630a..35e60c5 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -79,6 +79,9 @@ struct dfl_afu {
struct dfl_feature_platform_data *pdata;
};
+void __port_enable(struct platform_device *pdev);
+int __port_disable(struct platform_device *pdev);
+
void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
int afu_mmio_region_add(struct dfl_feature_platform_data *pdata,
u32 region_index, u64 region_size, u64 phys, u32 flags);
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 08/12] fpga: dfl: afu: add error reporting support.
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
Error reporting is one important private feature, it reports error
detected on port and accelerated function unit (AFU). It introduces
several sysfs interfaces to allow userspace to check and clear
errors detected by hardware.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: switch to device_add/remove_group for sysfs.
v3: update kernel version and date in sysfs doc
---
Documentation/ABI/testing/sysfs-platform-dfl-port | 39 ++++
drivers/fpga/Makefile | 1 +
drivers/fpga/dfl-afu-error.c | 225 ++++++++++++++++++++++
drivers/fpga/dfl-afu-main.c | 4 +
drivers/fpga/dfl-afu.h | 4 +
5 files changed, 273 insertions(+)
create mode 100644 drivers/fpga/dfl-afu-error.c
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
index f641807..03740c2 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -79,3 +79,42 @@ KernelVersion: 5.4
Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. Read this file to get the status of issued command
to userclck_freqcntrcmd.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/revision
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the revision of this error
+ reporting private feature.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/errors
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get errors detected on port and
+ Accelerated Function Unit (AFU).
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/first_error
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the first error detected by
+ hardware.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the first malformed request
+ captured by hardware.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/clear
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Write-only. Write error code to this file to clear errors.
+ Write fails with -EINVAL if input parsing fails or input error
+ code doesn't match.
+ Write fails with -EBUSY or -ETIMEDOUT if error can't be cleared
+ as hardware is in low power state (-EBUSY) or not responding
+ (-ETIMEDOUT).
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 312b937..7255891 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
+dfl-afu-objs += dfl-afu-error.o
# Drivers for FPGAs which implement DFL
obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o
diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
new file mode 100644
index 0000000..9649da8
--- /dev/null
+++ b/drivers/fpga/dfl-afu-error.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting
+ *
+ * Copyright 2019 Intel Corporation, Inc.
+ *
+ * Authors:
+ * Wu Hao <hao.wu@linux.intel.com>
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ * Joseph Grecco <joe.grecco@intel.com>
+ * Enno Luebbers <enno.luebbers@intel.com>
+ * Tim Whisonant <tim.whisonant@intel.com>
+ * Ananda Ravuri <ananda.ravuri@intel.com>
+ * Mitchel Henry <henry.mitchel@intel.com>
+ */
+
+#include <linux/uaccess.h>
+
+#include "dfl-afu.h"
+
+#define PORT_ERROR_MASK 0x8
+#define PORT_ERROR 0x10
+#define PORT_FIRST_ERROR 0x18
+#define PORT_MALFORMED_REQ0 0x20
+#define PORT_MALFORMED_REQ1 0x28
+
+#define ERROR_MASK GENMASK_ULL(63, 0)
+
+/* mask or unmask port errors by the error mask register. */
+static void __port_err_mask(struct device *dev, bool mask)
+{
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK);
+}
+
+/* clear port errors. */
+static int __port_err_clear(struct device *dev, u64 err)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ void __iomem *base_err, *base_hdr;
+ int ret;
+ u64 v;
+
+ base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+ base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ /*
+ * clear Port Errors
+ *
+ * - Check for AP6 State
+ * - Halt Port by keeping Port in reset
+ * - Set PORT Error mask to all 1 to mask errors
+ * - Clear all errors
+ * - Set Port mask to all 0 to enable errors
+ * - All errors start capturing new errors
+ * - Enable Port by pulling the port out of reset
+ */
+
+ /* if device is still in AP6 power state, can not clear any error. */
+ v = readq(base_hdr + PORT_HDR_STS);
+ if (FIELD_GET(PORT_STS_PWR_STATE, v) == PORT_STS_PWR_STATE_AP6) {
+ dev_err(dev, "Could not clear errors, device in AP6 state.\n");
+ return -EBUSY;
+ }
+
+ /* Halt Port by keeping Port in reset */
+ ret = __port_disable(pdev);
+ if (ret)
+ return ret;
+
+ /* Mask all errors */
+ __port_err_mask(dev, true);
+
+ /* Clear errors if err input matches with current port errors.*/
+ v = readq(base_err + PORT_ERROR);
+
+ if (v == err) {
+ writeq(v, base_err + PORT_ERROR);
+
+ v = readq(base_err + PORT_FIRST_ERROR);
+ writeq(v, base_err + PORT_FIRST_ERROR);
+ } else {
+ ret = -EINVAL;
+ }
+
+ /* Clear mask */
+ __port_err_mask(dev, false);
+
+ /* Enable the Port by clear the reset */
+ __port_enable(pdev);
+
+ return ret;
+}
+
+static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ return sprintf(buf, "%u\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
+static ssize_t errors_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 error;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ mutex_lock(&pdata->lock);
+ error = readq(base + PORT_ERROR);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "0x%llx\n", (unsigned long long)error);
+}
+static DEVICE_ATTR_RO(errors);
+
+static ssize_t first_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 error;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ mutex_lock(&pdata->lock);
+ error = readq(base + PORT_FIRST_ERROR);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "0x%llx\n", (unsigned long long)error);
+}
+static DEVICE_ATTR_RO(first_error);
+
+static ssize_t first_malformed_req_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 req0, req1;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ mutex_lock(&pdata->lock);
+ req0 = readq(base + PORT_MALFORMED_REQ0);
+ req1 = readq(base + PORT_MALFORMED_REQ1);
+ mutex_unlock(&pdata->lock);
+
+ return sprintf(buf, "0x%016llx%016llx\n",
+ (unsigned long long)req1, (unsigned long long)req0);
+}
+static DEVICE_ATTR_RO(first_malformed_req);
+
+static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
+ const char *buff, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ u64 value;
+ int ret;
+
+ if (kstrtou64(buff, 0, &value))
+ return -EINVAL;
+
+ mutex_lock(&pdata->lock);
+ ret = __port_err_clear(dev, value);
+ mutex_unlock(&pdata->lock);
+
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_WO(clear);
+
+static struct attribute *port_err_attrs[] = {
+ &dev_attr_revision.attr,
+ &dev_attr_errors.attr,
+ &dev_attr_first_error.attr,
+ &dev_attr_first_malformed_req.attr,
+ &dev_attr_clear.attr,
+ NULL,
+};
+
+static struct attribute_group port_err_attr_group = {
+ .attrs = port_err_attrs,
+ .name = "errors",
+};
+
+static int port_err_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+
+ dev_dbg(&pdev->dev, "PORT ERR Init.\n");
+
+ mutex_lock(&pdata->lock);
+ __port_err_mask(&pdev->dev, false);
+ mutex_unlock(&pdata->lock);
+
+ return device_add_group(&pdev->dev, &port_err_attr_group);
+}
+
+static void port_err_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ dev_dbg(&pdev->dev, "PORT ERR UInit.\n");
+
+ device_remove_group(&pdev->dev, &port_err_attr_group);
+}
+
+const struct dfl_feature_id port_err_id_table[] = {
+ {.id = PORT_FEATURE_ID_ERROR,},
+ {0,}
+};
+
+const struct dfl_feature_ops port_err_ops = {
+ .init = port_err_init,
+ .uinit = port_err_uinit,
+};
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 22d294b..15dd4cb 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -524,6 +524,10 @@ static void port_afu_uinit(struct platform_device *pdev,
.ops = &port_afu_ops,
},
{
+ .id_table = port_err_id_table,
+ .ops = &port_err_ops,
+ },
+ {
.ops = NULL,
}
};
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 35e60c5..c3182a2 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -100,4 +100,8 @@ int afu_dma_map_region(struct dfl_feature_platform_data *pdata,
struct dfl_afu_dma_region *
afu_dma_region_find(struct dfl_feature_platform_data *pdata,
u64 iova, u64 size);
+
+extern const struct dfl_feature_ops port_err_ops;
+extern const struct dfl_feature_id port_err_id_table[];
+
#endif /* __DFL_AFU_H */
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 09/12] fpga: dfl: afu: add STP (SignalTap) support
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
STP (SignalTap) is one of the private features under the port for
debugging. This patch adds private feature driver support for it
to allow userspace applications to mmap related mmio region and
provide STP service.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Moritz Fischer <mdf@kernel.org>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
drivers/fpga/dfl-afu-main.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 15dd4cb..395f96e 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -514,6 +514,36 @@ static void port_afu_uinit(struct platform_device *pdev,
.uinit = port_afu_uinit,
};
+static int port_stp_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct resource *res = &pdev->resource[feature->resource_index];
+
+ dev_dbg(&pdev->dev, "PORT STP Init.\n");
+
+ return afu_mmio_region_add(dev_get_platdata(&pdev->dev),
+ DFL_PORT_REGION_INDEX_STP,
+ resource_size(res), res->start,
+ DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ |
+ DFL_PORT_REGION_WRITE);
+}
+
+static void port_stp_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ dev_dbg(&pdev->dev, "PORT STP UInit.\n");
+}
+
+static const struct dfl_feature_id port_stp_id_table[] = {
+ {.id = PORT_FEATURE_ID_STP,},
+ {0,}
+};
+
+static const struct dfl_feature_ops port_stp_ops = {
+ .init = port_stp_init,
+ .uinit = port_stp_uinit,
+};
+
static struct dfl_feature_driver port_feature_drvs[] = {
{
.id_table = port_hdr_id_table,
@@ -528,6 +558,10 @@ static void port_afu_uinit(struct platform_device *pdev,
.ops = &port_err_ops,
},
{
+ .id_table = port_stp_id_table,
+ .ops = &port_stp_ops,
+ },
+ {
.ops = NULL,
}
};
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 12/12] Documentation: fpga: dfl: add descriptions for virtualization and new interfaces.
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch adds virtualization support description for DFL based
FPGA devices (based on PCIe SRIOV), and introductions to new
interfaces added by new dfl private feature drivers.
[mdf@kernel.org: Fixed up to make it work with new reStructuredText docs]
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
Documentation/fpga/dfl.rst | 105 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)
diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
index 2f125ab..6fa483f 100644
--- a/Documentation/fpga/dfl.rst
+++ b/Documentation/fpga/dfl.rst
@@ -87,6 +87,8 @@ The following functions are exposed through ioctls:
- Get driver API version (DFL_FPGA_GET_API_VERSION)
- Check for extensions (DFL_FPGA_CHECK_EXTENSION)
- Program bitstream (DFL_FPGA_FME_PORT_PR)
+- Assign port to PF (DFL_FPGA_FME_PORT_ASSIGN)
+- Release port from PF (DFL_FPGA_FME_PORT_RELEASE)
More functions are exposed through sysfs
(/sys/class/fpga_region/regionX/dfl-fme.n/):
@@ -102,6 +104,10 @@ More functions are exposed through sysfs
one FPGA device may have more than one port, this sysfs interface indicates
how many ports the FPGA device has.
+ Global error reporting management (errors/)
+ error reporting sysfs interfaces allow user to read errors detected by the
+ hardware, and clear the logged errors.
+
FIU - PORT
==========
@@ -143,6 +149,10 @@ More functions are exposed through sysfs:
Read Accelerator GUID (afu_id)
afu_id indicates which PR bitstream is programmed to this AFU.
+ Error reporting (errors/)
+ error reporting sysfs interfaces allow user to read port/afu errors
+ detected by the hardware, and clear the logged errors.
+
DFL Framework Overview
======================
@@ -218,6 +228,101 @@ the compat_id exposed by the target FPGA region. This check is usually done by
userspace before calling the reconfiguration IOCTL.
+FPGA virtualization - PCIe SRIOV
+================================
+This section describes the virtualization support on DFL based FPGA device to
+enable accessing an accelerator from applications running in a virtual machine
+(VM). This section only describes the PCIe based FPGA device with SRIOV support.
+
+Features supported by the particular FPGA device are exposed through Device
+Feature Lists, as illustrated below:
+
+::
+
+ +-------------------------------+ +-------------+
+ | PF | | VF |
+ +-------------------------------+ +-------------+
+ ^ ^ ^ ^
+ | | | |
+ +-----|------------|---------|--------------|-------+
+ | | | | | |
+ | +-----+ +-------+ +-------+ +-------+ |
+ | | FME | | Port0 | | Port1 | | Port2 | |
+ | +-----+ +-------+ +-------+ +-------+ |
+ | ^ ^ ^ |
+ | | | | |
+ | +-------+ +------+ +-------+ |
+ | | AFU | | AFU | | AFU | |
+ | +-------+ +------+ +-------+ |
+ | |
+ | DFL based FPGA PCIe Device |
+ +---------------------------------------------------+
+
+FME is always accessed through the physical function (PF).
+
+Ports (and related AFUs) are accessed via PF by default, but could be exposed
+through virtual function (VF) devices via PCIe SRIOV. Each VF only contains
+1 Port and 1 AFU for isolation. Users could assign individual VFs (accelerators)
+created via PCIe SRIOV interface, to virtual machines.
+
+The driver organization in virtualization case is illustrated below:
+::
+
+ +-------++------++------+ |
+ | FME || FME || FME | |
+ | FPGA || FPGA || FPGA | |
+ |Manager||Bridge||Region| |
+ +-------++------++------+ |
+ +-----------------------+ +--------+ | +--------+
+ | FME | | AFU | | | AFU |
+ | Module | | Module | | | Module |
+ +-----------------------+ +--------+ | +--------+
+ +-----------------------+ | +-----------------------+
+ | FPGA Container Device | | | FPGA Container Device |
+ | (FPGA Base Region) | | | (FPGA Base Region) |
+ +-----------------------+ | +-----------------------+
+ +------------------+ | +------------------+
+ | FPGA PCIE Module | | Virtual | FPGA PCIE Module |
+ +------------------+ Host | Machine +------------------+
+ -------------------------------------- | ------------------------------
+ +---------------+ | +---------------+
+ | PCI PF Device | | | PCI VF Device |
+ +---------------+ | +---------------+
+
+FPGA PCIe device driver is always loaded first once a FPGA PCIe PF or VF device
+is detected. It:
+
+* Finishes enumeration on both FPGA PCIe PF and VF device using common
+ interfaces from DFL framework.
+* Supports SRIOV.
+
+The FME device driver plays a management role in this driver architecture, it
+provides ioctls to release Port from PF and assign Port to PF. After release
+a port from PF, then it's safe to expose this port through a VF via PCIe SRIOV
+sysfs interface.
+
+To enable accessing an accelerator from applications running in a VM, the
+respective AFU's port needs to be assigned to a VF using the following steps:
+
+#. The PF owns all AFU ports by default. Any port that needs to be
+ reassigned to a VF must first be released through the
+ DFL_FPGA_FME_PORT_RELEASE ioctl on the FME device.
+
+#. Once N ports are released from PF, then user can use command below
+ to enable SRIOV and VFs. Each VF owns only one Port with AFU.
+
+ ::
+
+ echo N > $PCI_DEVICE_PATH/sriov_numvfs
+
+#. Pass through the VFs to VMs
+
+#. The AFU under VF is accessible from applications in VM (using the
+ same driver inside the VF).
+
+Note that an FME can't be assigned to a VF, thus PR and other management
+functions are only available via the PF.
+
Device enumeration
==================
This section introduces how applications enumerate the fpga device from
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 11/12] fpga: dfl: fme: add global error reporting support
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Luwei Kang,
Ananda Ravuri, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch adds support for global error reporting for FPGA
Management Engine (FME), it introduces sysfs interfaces to
report different error detected by the hardware, and allow
user to clear errors or inject error for testing purpose.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: switch to device_add/remove_groups for sysfs.
v3: update kernel version and date in sysfs doc
---
Documentation/ABI/testing/sysfs-platform-dfl-fme | 75 +++++
drivers/fpga/Makefile | 2 +-
drivers/fpga/dfl-fme-error.c | 385 +++++++++++++++++++++++
drivers/fpga/dfl-fme-main.c | 4 +
drivers/fpga/dfl-fme.h | 2 +
drivers/fpga/dfl.h | 2 +
6 files changed, 469 insertions(+), 1 deletion(-)
create mode 100644 drivers/fpga/dfl-fme-error.c
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 269e807..ad36b79 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -44,3 +44,78 @@ Description: Read-only. It returns socket_id to indicate which socket
this FPGA belongs to, only valid for integrated solution.
User only needs this information, in case standard numa node
can't provide correct information.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/revision
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the revision of this global
+ error reporting private feature.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-Write. Read this file for errors detected on pcie0 link.
+ Write this file to clear errors logged in pcie0_errors. Write
+ fails with -EINVAL if input parsing fails or input error code
+ doesn't match.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-Write. Read this file for errors detected on pcie1 link.
+ Write this file to clear errors logged in pcie1_errors. Write
+ fails with -EINVAL if input parsing fails or input error code
+ doesn't match.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. It returns non-fatal errors detected.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. It returns catastrophic and fatal errors detected.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_error
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-Write. Read this file to check errors injected. Write this
+ file to inject errors for testing purpose. Write fails with
+ -EINVAL if input parsing fails or input inject error code isn't
+ supported.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get errors detected by hardware.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the first error detected by
+ hardware.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. Read this file to get the second error detected by
+ hardware.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Write-only. Write error code to this file to clear all errors
+ logged in errors, first_error and next_error. Write fails with
+ -EINVAL if input parsing fails or input error code doesn't
+ match.
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 7255891..4865b74 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -39,7 +39,7 @@ obj-$(CONFIG_FPGA_DFL_FME_BRIDGE) += dfl-fme-br.o
obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o
obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
-dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
+dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o
dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
dfl-afu-objs += dfl-afu-error.o
diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
new file mode 100644
index 0000000..6b5605d
--- /dev/null
+++ b/drivers/fpga/dfl-fme-error.c
@@ -0,0 +1,385 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for FPGA Management Engine Error Management
+ *
+ * Copyright 2019 Intel Corporation, Inc.
+ *
+ * Authors:
+ * Kang Luwei <luwei.kang@intel.com>
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ * Wu Hao <hao.wu@intel.com>
+ * Joseph Grecco <joe.grecco@intel.com>
+ * Enno Luebbers <enno.luebbers@intel.com>
+ * Tim Whisonant <tim.whisonant@intel.com>
+ * Ananda Ravuri <ananda.ravuri@intel.com>
+ * Mitchel, Henry <henry.mitchel@intel.com>
+ */
+
+#include <linux/uaccess.h>
+
+#include "dfl.h"
+#include "dfl-fme.h"
+
+#define FME_ERROR_MASK 0x8
+#define FME_ERROR 0x10
+#define MBP_ERROR BIT_ULL(6)
+#define PCIE0_ERROR_MASK 0x18
+#define PCIE0_ERROR 0x20
+#define PCIE1_ERROR_MASK 0x28
+#define PCIE1_ERROR 0x30
+#define FME_FIRST_ERROR 0x38
+#define FME_NEXT_ERROR 0x40
+#define RAS_NONFAT_ERROR_MASK 0x48
+#define RAS_NONFAT_ERROR 0x50
+#define RAS_CATFAT_ERROR_MASK 0x58
+#define RAS_CATFAT_ERROR 0x60
+#define RAS_ERROR_INJECT 0x68
+#define INJECT_ERROR_MASK GENMASK_ULL(2, 0)
+
+static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "%u\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
+static ssize_t pcie0_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + PCIE0_ERROR));
+}
+
+static ssize_t pcie0_errors_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ int ret = 0;
+ u64 v, val;
+
+ if (kstrtou64(buf, 0, &val))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK);
+
+ v = readq(base + PCIE0_ERROR);
+ if (val == v)
+ writeq(v, base + PCIE0_ERROR);
+ else
+ ret = -EINVAL;
+
+ writeq(0ULL, base + PCIE0_ERROR_MASK);
+ mutex_unlock(&pdata->lock);
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_RW(pcie0_errors);
+
+static ssize_t pcie1_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + PCIE1_ERROR));
+}
+
+static ssize_t pcie1_errors_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ int ret = 0;
+ u64 v, val;
+
+ if (kstrtou64(buf, 0, &val))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK);
+
+ v = readq(base + PCIE1_ERROR);
+ if (val == v)
+ writeq(v, base + PCIE1_ERROR);
+ else
+ ret = -EINVAL;
+
+ writeq(0ULL, base + PCIE1_ERROR_MASK);
+ mutex_unlock(&pdata->lock);
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_RW(pcie1_errors);
+
+static ssize_t nonfatal_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + RAS_NONFAT_ERROR));
+}
+static DEVICE_ATTR_RO(nonfatal_errors);
+
+static ssize_t catfatal_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + RAS_CATFAT_ERROR));
+}
+static DEVICE_ATTR_RO(catfatal_errors);
+
+static ssize_t inject_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ v = readq(base + RAS_ERROR_INJECT);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
+}
+
+static ssize_t inject_error_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u8 inject_error;
+ u64 v;
+
+ if (kstrtou8(buf, 0, &inject_error))
+ return -EINVAL;
+
+ if (inject_error & ~INJECT_ERROR_MASK)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + RAS_ERROR_INJECT);
+ v &= ~INJECT_ERROR_MASK;
+ v |= FIELD_PREP(INJECT_ERROR_MASK, inject_error);
+ writeq(v, base + RAS_ERROR_INJECT);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(inject_error);
+
+static struct attribute *errors_attrs[] = {
+ &dev_attr_revision.attr,
+ &dev_attr_pcie0_errors.attr,
+ &dev_attr_pcie1_errors.attr,
+ &dev_attr_nonfatal_errors.attr,
+ &dev_attr_catfatal_errors.attr,
+ &dev_attr_inject_error.attr,
+ NULL,
+};
+
+static struct attribute_group errors_attr_group = {
+ .attrs = errors_attrs,
+};
+
+static ssize_t errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + FME_ERROR));
+}
+static DEVICE_ATTR_RO(errors);
+
+static ssize_t first_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + FME_FIRST_ERROR));
+}
+static DEVICE_ATTR_RO(first_error);
+
+static ssize_t next_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return sprintf(buf, "0x%llx\n",
+ (unsigned long long)readq(base + FME_NEXT_ERROR));
+}
+static DEVICE_ATTR_RO(next_error);
+
+static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u64 v, val;
+ int ret = 0;
+
+ if (kstrtou64(buf, 0, &val))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK);
+
+ v = readq(base + FME_ERROR);
+ if (val == v) {
+ writeq(v, base + FME_ERROR);
+ v = readq(base + FME_FIRST_ERROR);
+ writeq(v, base + FME_FIRST_ERROR);
+ v = readq(base + FME_NEXT_ERROR);
+ writeq(v, base + FME_NEXT_ERROR);
+ } else {
+ ret = -EINVAL;
+ }
+
+ /* Workaround: disable MBP_ERROR if feature revision is 0 */
+ writeq(dfl_feature_revision(base) ? 0ULL : MBP_ERROR,
+ base + FME_ERROR_MASK);
+ mutex_unlock(&pdata->lock);
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_WO(clear);
+
+static struct attribute *fme_errors_attrs[] = {
+ &dev_attr_errors.attr,
+ &dev_attr_first_error.attr,
+ &dev_attr_next_error.attr,
+ &dev_attr_clear.attr,
+ NULL,
+};
+
+static struct attribute_group fme_errors_attr_group = {
+ .attrs = fme_errors_attrs,
+ .name = "fme-errors",
+};
+
+static const struct attribute_group *error_groups[] = {
+ &fme_errors_attr_group,
+ &errors_attr_group,
+ NULL
+};
+
+static void fme_error_enable(struct dfl_feature *feature)
+{
+ void __iomem *base = feature->ioaddr;
+
+ /* Workaround: disable MBP_ERROR if revision is 0 */
+ writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
+ base + FME_ERROR_MASK);
+ writeq(0ULL, base + PCIE0_ERROR_MASK);
+ writeq(0ULL, base + PCIE1_ERROR_MASK);
+ writeq(0ULL, base + RAS_NONFAT_ERROR_MASK);
+ writeq(0ULL, base + RAS_CATFAT_ERROR_MASK);
+}
+
+static void err_dev_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static int fme_global_err_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct device *dev;
+ int ret = 0;
+
+ dev_dbg(&pdev->dev, "FME Global Error Reporting Init.\n");
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+ return -ENOMEM;
+
+ dev->parent = &pdev->dev;
+ dev->release = err_dev_release;
+ dev_set_name(dev, "errors");
+
+ fme_error_enable(feature);
+
+ ret = device_register(dev);
+ if (ret) {
+ put_device(dev);
+ return ret;
+ }
+
+ ret = device_add_groups(dev, error_groups);
+ if (ret) {
+ device_unregister(dev);
+ return ret;
+ }
+
+ feature->priv = dev;
+
+ return ret;
+}
+
+static void fme_global_err_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct device *dev = feature->priv;
+
+ dev_dbg(&pdev->dev, "FME Global Error Reporting UInit.\n");
+
+ device_remove_groups(dev, error_groups);
+ device_unregister(dev);
+}
+
+const struct dfl_feature_id fme_global_err_id_table[] = {
+ {.id = FME_FEATURE_ID_GLOBAL_ERR,},
+ {0,}
+};
+
+const struct dfl_feature_ops fme_global_err_ops = {
+ .init = fme_global_err_init,
+ .uinit = fme_global_err_uinit,
+};
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index c8703c4..a09f75b 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -202,6 +202,10 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
.ops = &fme_pr_mgmt_ops,
},
{
+ .id_table = fme_global_err_id_table,
+ .ops = &fme_global_err_ops,
+ },
+ {
.ops = NULL,
},
};
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index 7a021c4..5fbe3f5 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -37,5 +37,7 @@ struct dfl_fme {
extern const struct dfl_feature_ops fme_pr_mgmt_ops;
extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
+extern const struct dfl_feature_ops fme_global_err_ops;
+extern const struct dfl_feature_id fme_global_err_id_table[];
#endif /* __DFL_FME_H */
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index f44fda1..64eae77 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -197,12 +197,14 @@ struct dfl_feature_driver {
* feature dev (platform device)'s reources.
* @ioaddr: mapped mmio resource address.
* @ops: ops of this sub feature.
+ * @priv: priv data of this feature.
*/
struct dfl_feature {
u64 id;
int resource_index;
void __iomem *ioaddr;
const struct dfl_feature_ops *ops;
+ void *priv;
};
#define DEV_STATUS_IN_USE 0
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 10/12] fpga: dfl: fme: add capability sysfs interfaces
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Luwei Kang,
Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch adds 3 read-only sysfs interfaces for FPGA Management Engine
(FME) block for capabilities including cache_size, fabric_version and
socket_id.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: rebased.
v3: update kernel version and date in sysfs doc
---
Documentation/ABI/testing/sysfs-platform-dfl-fme | 23 ++++++++++++
drivers/fpga/dfl-fme-main.c | 48 ++++++++++++++++++++++++
2 files changed, 71 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 8fa4feb..269e807 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -21,3 +21,26 @@ Contact: Wu Hao <hao.wu@intel.com>
Description: Read-only. It returns Bitstream (static FPGA region) meta
data, which includes the synthesis date, seed and other
information of this static FPGA region.
+
+What: /sys/bus/platform/devices/dfl-fme.0/cache_size
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. It returns cache size of this FPGA device.
+
+What: /sys/bus/platform/devices/dfl-fme.0/fabric_version
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. It returns fabric version of this FPGA device.
+ Userspace applications need this information to select
+ best data channels per different fabric design.
+
+What: /sys/bus/platform/devices/dfl-fme.0/socket_id
+Date: July 2019
+KernelVersion: 5.4
+Contact: Wu Hao <hao.wu@intel.com>
+Description: Read-only. It returns socket_id to indicate which socket
+ this FPGA belongs to, only valid for integrated solution.
+ User only needs this information, in case standard numa node
+ can't provide correct information.
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index e333f19..c8703c4 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -73,10 +73,58 @@ static ssize_t bitstream_metadata_show(struct device *dev,
}
static DEVICE_ATTR_RO(bitstream_metadata);
+static ssize_t cache_size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_CAP);
+
+ return sprintf(buf, "%u\n",
+ (unsigned int)FIELD_GET(FME_CAP_CACHE_SIZE, v));
+}
+static DEVICE_ATTR_RO(cache_size);
+
+static ssize_t fabric_version_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_CAP);
+
+ return sprintf(buf, "%u\n",
+ (unsigned int)FIELD_GET(FME_CAP_FABRIC_VERID, v));
+}
+static DEVICE_ATTR_RO(fabric_version);
+
+static ssize_t socket_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_CAP);
+
+ return sprintf(buf, "%u\n",
+ (unsigned int)FIELD_GET(FME_CAP_SOCKET_ID, v));
+}
+static DEVICE_ATTR_RO(socket_id);
+
static struct attribute *fme_hdr_attrs[] = {
&dev_attr_ports_num.attr,
&dev_attr_bitstream_id.attr,
&dev_attr_bitstream_metadata.attr,
+ &dev_attr_cache_size.attr,
+ &dev_attr_fabric_version.attr,
+ &dev_attr_socket_id.attr,
NULL,
};
ATTRIBUTE_GROUPS(fme_hdr);
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 06/12] fpga: dfl: add id_table for dfl private feature driver
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch adds id_table for each dfl private feature driver,
it allows to reuse same private feature driver to match and support
multiple dfl private features.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Moritz Fischer <mdf@kernel.org>
Acked-by: Alan Tull <atull@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: rebased, remove DRV/MODULE_VERSION modifications
---
drivers/fpga/dfl-afu-main.c | 14 ++++++++++++--
drivers/fpga/dfl-fme-main.c | 11 ++++++++---
drivers/fpga/dfl-fme-pr.c | 7 ++++++-
drivers/fpga/dfl-fme.h | 3 ++-
drivers/fpga/dfl.c | 18 ++++++++++++++++--
drivers/fpga/dfl.h | 21 +++++++++++++++------
6 files changed, 59 insertions(+), 15 deletions(-)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 9025314..fbd9553 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -435,6 +435,11 @@ static void port_hdr_uinit(struct platform_device *pdev,
return ret;
}
+static const struct dfl_feature_id port_hdr_id_table[] = {
+ {.id = PORT_FEATURE_ID_HEADER,},
+ {0,}
+};
+
static const struct dfl_feature_ops port_hdr_ops = {
.init = port_hdr_init,
.uinit = port_hdr_uinit,
@@ -496,6 +501,11 @@ static void port_afu_uinit(struct platform_device *pdev,
device_remove_groups(&pdev->dev, port_afu_groups);
}
+static const struct dfl_feature_id port_afu_id_table[] = {
+ {.id = PORT_FEATURE_ID_AFU,},
+ {0,}
+};
+
static const struct dfl_feature_ops port_afu_ops = {
.init = port_afu_init,
.uinit = port_afu_uinit,
@@ -503,11 +513,11 @@ static void port_afu_uinit(struct platform_device *pdev,
static struct dfl_feature_driver port_feature_drvs[] = {
{
- .id = PORT_FEATURE_ID_HEADER,
+ .id_table = port_hdr_id_table,
.ops = &port_hdr_ops,
},
{
- .id = PORT_FEATURE_ID_AFU,
+ .id_table = port_afu_id_table,
.ops = &port_afu_ops,
},
{
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index e61e0fe..e333f19 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -133,6 +133,11 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
return -ENODEV;
}
+static const struct dfl_feature_id fme_hdr_id_table[] = {
+ {.id = FME_FEATURE_ID_HEADER,},
+ {0,}
+};
+
static const struct dfl_feature_ops fme_hdr_ops = {
.init = fme_hdr_init,
.uinit = fme_hdr_uinit,
@@ -141,12 +146,12 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
static struct dfl_feature_driver fme_feature_drvs[] = {
{
- .id = FME_FEATURE_ID_HEADER,
+ .id_table = fme_hdr_id_table,
.ops = &fme_hdr_ops,
},
{
- .id = FME_FEATURE_ID_PR_MGMT,
- .ops = &pr_mgmt_ops,
+ .id_table = fme_pr_mgmt_id_table,
+ .ops = &fme_pr_mgmt_ops,
},
{
.ops = NULL,
diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index cd94ba8..52f1745 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -483,7 +483,12 @@ static long fme_pr_ioctl(struct platform_device *pdev,
return ret;
}
-const struct dfl_feature_ops pr_mgmt_ops = {
+const struct dfl_feature_id fme_pr_mgmt_id_table[] = {
+ {.id = FME_FEATURE_ID_PR_MGMT,},
+ {0}
+};
+
+const struct dfl_feature_ops fme_pr_mgmt_ops = {
.init = pr_mgmt_init,
.uinit = pr_mgmt_uinit,
.ioctl = fme_pr_ioctl,
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index de20755..7a021c4 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -35,6 +35,7 @@ struct dfl_fme {
struct dfl_feature_platform_data *pdata;
};
-extern const struct dfl_feature_ops pr_mgmt_ops;
+extern const struct dfl_feature_ops fme_pr_mgmt_ops;
+extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
#endif /* __DFL_FME_H */
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index c3a8e1d..3eb67ab 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -281,6 +281,21 @@ static int dfl_feature_instance_init(struct platform_device *pdev,
return ret;
}
+static bool dfl_feature_drv_match(struct dfl_feature *feature,
+ struct dfl_feature_driver *driver)
+{
+ const struct dfl_feature_id *ids = driver->id_table;
+
+ if (ids) {
+ while (ids->id) {
+ if (ids->id == feature->id)
+ return true;
+ ids++;
+ }
+ }
+ return false;
+}
+
/**
* dfl_fpga_dev_feature_init - init for sub features of dfl feature device
* @pdev: feature device.
@@ -301,8 +316,7 @@ int dfl_fpga_dev_feature_init(struct platform_device *pdev,
while (drv->ops) {
dfl_fpga_dev_for_each_feature(pdata, feature) {
- /* match feature and drv using id */
- if (feature->id == drv->id) {
+ if (dfl_feature_drv_match(feature, drv)) {
ret = dfl_feature_instance_init(pdev, pdata,
feature, drv);
if (ret)
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 77b5137..f44fda1 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -30,8 +30,8 @@
/* plus one for fme device */
#define MAX_DFL_FEATURE_DEV_NUM (MAX_DFL_FPGA_PORT_NUM + 1)
-/* Reserved 0x0 for Header Group Register and 0xff for AFU */
-#define FEATURE_ID_FIU_HEADER 0x0
+/* Reserved 0xfe for Header Group Register and 0xff for AFU */
+#define FEATURE_ID_FIU_HEADER 0xfe
#define FEATURE_ID_AFU 0xff
#define FME_FEATURE_ID_HEADER FEATURE_ID_FIU_HEADER
@@ -169,13 +169,22 @@ struct dfl_fpga_port_ops {
int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id);
/**
- * struct dfl_feature_driver - sub feature's driver
+ * struct dfl_feature_id - dfl private feature id
*
- * @id: sub feature id.
- * @ops: ops of this sub feature.
+ * @id: unique dfl private feature id.
*/
-struct dfl_feature_driver {
+struct dfl_feature_id {
u64 id;
+};
+
+/**
+ * struct dfl_feature_driver - dfl private feature driver
+ *
+ * @id_table: id_table for dfl private features supported by this driver.
+ * @ops: ops of this dfl private feature driver.
+ */
+struct dfl_feature_driver {
+ const struct dfl_feature_id *id_table;
const struct dfl_feature_ops *ops;
};
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 03/12] fpga: dfl: pci: enable SRIOV support.
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga
Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao, Zhang Yi Z,
Xu Yilun
In-Reply-To: <1563857495-26692-1-git-send-email-hao.wu@intel.com>
This patch enables the standard sriov support. It allows user to
enable SRIOV (and VFs), then user could pass through accelerators
(VFs) into virtual machine or use VFs directly in host.
Signed-off-by: Zhang Yi Z <yi.z.zhang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Alan Tull <atull@kernel.org>
Acked-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
v2: remove DRV/MODULE_VERSION modifications.
---
drivers/fpga/dfl-pci.c | 39 +++++++++++++++++++++++++++++++++++++++
drivers/fpga/dfl.c | 41 +++++++++++++++++++++++++++++++++++++++++
drivers/fpga/dfl.h | 1 +
3 files changed, 81 insertions(+)
diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c
index 66b5720..0e65d81 100644
--- a/drivers/fpga/dfl-pci.c
+++ b/drivers/fpga/dfl-pci.c
@@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pcidevid)
return ret;
}
+static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs)
+{
+ struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
+ struct dfl_fpga_cdev *cdev = drvdata->cdev;
+ int ret = 0;
+
+ mutex_lock(&cdev->lock);
+
+ if (!num_vfs) {
+ /*
+ * disable SRIOV and then put released ports back to default
+ * PF access mode.
+ */
+ pci_disable_sriov(pcidev);
+
+ __dfl_fpga_cdev_config_port_vf(cdev, false);
+
+ } else if (cdev->released_port_num == num_vfs) {
+ /*
+ * only enable SRIOV if cdev has matched released ports, put
+ * released ports into VF access mode firstly.
+ */
+ __dfl_fpga_cdev_config_port_vf(cdev, true);
+
+ ret = pci_enable_sriov(pcidev, num_vfs);
+ if (ret)
+ __dfl_fpga_cdev_config_port_vf(cdev, false);
+ } else {
+ ret = -EINVAL;
+ }
+
+ mutex_unlock(&cdev->lock);
+ return ret;
+}
+
static void cci_pci_remove(struct pci_dev *pcidev)
{
+ if (dev_is_pf(&pcidev->dev))
+ cci_pci_sriov_configure(pcidev, 0);
+
cci_remove_feature_devs(pcidev);
pci_disable_pcie_error_reporting(pcidev);
}
@@ -234,6 +272,7 @@ static void cci_pci_remove(struct pci_dev *pcidev)
.id_table = cci_pcie_id_tbl,
.probe = cci_pci_probe,
.remove = cci_pci_remove,
+ .sriov_configure = cci_pci_sriov_configure,
};
module_pci_driver(cci_pci_driver);
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index e04ed45..c3a8e1d 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -1112,6 +1112,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, int port_id,
}
EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
+static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_PORT_OFST(port_id));
+
+ v &= ~FME_PORT_OFST_ACC_CTRL;
+ v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL,
+ is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF);
+
+ writeq(v, base + FME_HDR_PORT_OFST(port_id));
+}
+
+/**
+ * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode
+ *
+ * @cdev: parent container device.
+ * @if_vf: true for VF access mode, and false for PF access mode
+ *
+ * Return: 0 on success, negative error code otherwise.
+ *
+ * This function is needed in sriov configuration routine. It could be used to
+ * configures the released ports access mode to VF or PF.
+ * The caller needs to hold lock for protection.
+ */
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf)
+{
+ struct dfl_feature_platform_data *pdata;
+
+ list_for_each_entry(pdata, &cdev->port_dev_list, node) {
+ if (device_is_registered(&pdata->dev->dev))
+ continue;
+
+ config_port_vf(cdev->fme_dev, pdata->id, is_vf);
+ }
+}
+EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf);
+
static int __init dfl_fpga_init(void)
{
int ret;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index d700ee9..061ccd4 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -421,5 +421,6 @@ struct platform_device *
int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
int port_id, bool release);
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf);
#endif /* __FPGA_DFL_H */
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3 00/12] FPGA DFL updates
From: Wu Hao @ 2019-07-23 4:51 UTC (permalink / raw)
To: gregkh, mdf, linux-fpga; +Cc: linux-kernel, linux-api, linux-doc, atull, Wu Hao
Hi Greg
This is v3 patchset which adds more features to FPGA DFL. This patchset
is made on top of patch[1] and 5.3-rc1 tree. Documentation patch for DFL
is added back to this patchset
Main changes from v2:
- update kernel version/date in sysfs doc (patch #4, #5, #8, #10, #11).
- add back Documentation patch (patch #12).
Main changes from v1:
- remove DRV/MODULE_VERSION modifications. (patch #1, #3, #4, #6)
- remove argsz from new ioctls. (patch #2)
- replace sysfs_create/remove_* with device_add/remove_* for sysfs entries.
(patch #5, #8, #11)
[1] [PATCH] fpga: dfl: use driver core functions, not sysfs ones.
https://lkml.org/lkml/2019/7/4/36
Wu Hao (12):
fpga: dfl: fme: support 512bit data width PR
fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
fpga: dfl: pci: enable SRIOV support.
fpga: dfl: afu: add AFU state related sysfs interfaces
fpga: dfl: afu: add userclock sysfs interfaces.
fpga: dfl: add id_table for dfl private feature driver
fpga: dfl: afu: export __port_enable/disable function.
fpga: dfl: afu: add error reporting support.
fpga: dfl: afu: add STP (SignalTap) support
fpga: dfl: fme: add capability sysfs interfaces
fpga: dfl: fme: add global error reporting support
Documentation: fpga: dfl: add descriptions for virtualization and new
interfaces.
Documentation/ABI/testing/sysfs-platform-dfl-fme | 98 ++++++
Documentation/ABI/testing/sysfs-platform-dfl-port | 104 ++++++
Documentation/fpga/dfl.rst | 105 ++++++
drivers/fpga/Makefile | 3 +-
drivers/fpga/dfl-afu-error.c | 225 +++++++++++++
drivers/fpga/dfl-afu-main.c | 328 +++++++++++++++++-
drivers/fpga/dfl-afu.h | 7 +
drivers/fpga/dfl-fme-error.c | 385 ++++++++++++++++++++++
drivers/fpga/dfl-fme-main.c | 93 +++++-
drivers/fpga/dfl-fme-mgr.c | 110 ++++++-
drivers/fpga/dfl-fme-pr.c | 50 ++-
drivers/fpga/dfl-fme.h | 7 +-
drivers/fpga/dfl-pci.c | 39 +++
drivers/fpga/dfl.c | 166 +++++++++-
drivers/fpga/dfl.h | 54 ++-
include/uapi/linux/fpga-dfl.h | 19 ++
16 files changed, 1722 insertions(+), 71 deletions(-)
create mode 100644 drivers/fpga/dfl-afu-error.c
create mode 100644 drivers/fpga/dfl-fme-error.c
--
1.8.3.1
^ permalink raw reply
* Re: [PATCH v2 0/2] Add CCPI2 PMU support
From: Ganapatrao Kulkarni @ 2019-07-23 4:33 UTC (permalink / raw)
To: Ganapatrao Kulkarni
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, Will.Deacon@arm.com,
mark.rutland@arm.com, corbet@lwn.net,
Jayachandran Chandrasekharan Nair, rrichter@marvell.coma,
Jan Glauber
In-Reply-To: <1562317967-16329-1-git-send-email-gkulkarni@marvell.com>
Hi Will,
Any further comments for this patch?
On Fri, Jul 5, 2019 at 2:43 PM Ganapatrao Kulkarni
<gkulkarni@marvell.com> wrote:
>
> Add Cavium Coherent Processor Interconnect (CCPI2) PMU
> support in ThunderX2 Uncore driver.
>
> v2: Updated with review comments [1]
>
> [1] https://lkml.org/lkml/2019/6/14/965
>
> v1: initial patch
>
> Ganapatrao Kulkarni (2):
> Documentation: perf: Update documentation for ThunderX2 PMU uncore
> driver
> drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
>
> Documentation/perf/thunderx2-pmu.txt | 20 ++-
> drivers/perf/thunderx2_pmu.c | 248 +++++++++++++++++++++++----
> 2 files changed, 225 insertions(+), 43 deletions(-)
>
> --
> 2.17.1
>
Thanks,
Ganapat
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox