* [PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices
From: Rajat Jain @ 2018-05-22 22:28 UTC
To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
linux-kernel, Jes Sorensen, Kyle McMartin
Cc: rajatxjain
In-Reply-To: <20180522222805.80314-1-rajatja@google.com>
Add the following AER sysfs stats to represent the counters for each
kind of error as seen by the device:
dev_total_cor_errs
dev_total_fatal_errs
dev_total_nonfatal_errs
Signed-off-by: Rajat Jain <rajatja@google.com>
---
drivers/pci/pci-sysfs.c | 3 ++
drivers/pci/pci.h | 4 +-
drivers/pci/pcie/aer/aerdrv.h | 1 +
drivers/pci/pcie/aer/aerdrv_errprint.c | 1 +
drivers/pci/pcie/aer/aerdrv_stats.c | 72 ++++++++++++++++++++++++++
5 files changed, 80 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 366d93af051d..730f985a3dc9 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1743,6 +1743,9 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
#endif
&pci_bridge_attr_group,
&pcie_dev_attr_group,
+#ifdef CONFIG_PCIEAER
+ &aer_stats_attr_group,
+#endif
NULL,
};
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a07f3f..9a28ec600225 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -181,7 +181,9 @@ extern const struct attribute_group *pci_dev_groups[];
extern const struct attribute_group *pcibus_groups[];
extern const struct device_type pci_dev_type;
extern const struct attribute_group *pci_bus_groups[];
-
+#ifdef CONFIG_PCIEAER
+extern const struct attribute_group aer_stats_attr_group;
+#endif
/**
* pci_match_one_device - Tell if a PCI device structure has a matching
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index d8b9fba536ed..b5d5ad6f2c03 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -87,6 +87,7 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
irqreturn_t aer_irq(int irq, void *context);
int pci_aer_stats_init(struct pci_dev *pdev);
void pci_aer_stats_exit(struct pci_dev *pdev);
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
#ifdef CONFIG_ACPI_APEI
int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 21ca5e1b0ded..5e8b98deda08 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -155,6 +155,7 @@ static void __aer_print_error(struct pci_dev *dev,
pci_err(dev, " [%2d] Unknown Error Bit%s\n",
i, info->first_error == i ? " (First)" : "");
}
+ pci_dev_aer_stats_incr(dev, info);
}
void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index b9f251992209..87b7119d0a86 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -47,6 +47,78 @@ struct aer_stats {
u64 rootport_total_nonfatal_errs;
};
+#define aer_stats_aggregate_attr(field) \
+ static ssize_t \
+ field##_show(struct device *dev, struct device_attribute *attr, \
+ char *buf) \
+{ \
+ struct pci_dev *pdev = to_pci_dev(dev); \
+ return sprintf(buf, "0x%llx\n", pdev->aer_stats->field); \
+} \
+static DEVICE_ATTR_RO(field)
+
+aer_stats_aggregate_attr(dev_total_cor_errs);
+aer_stats_aggregate_attr(dev_total_fatal_errs);
+aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+
+static struct attribute *aer_stats_attrs[] __ro_after_init = {
+ &dev_attr_dev_total_cor_errs.attr,
+ &dev_attr_dev_total_fatal_errs.attr,
+ &dev_attr_dev_total_nonfatal_errs.attr,
+ NULL
+};
+
+static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pdev->aer_stats)
+ return 0;
+
+ return a->mode;
+}
+
+const struct attribute_group aer_stats_attr_group = {
+ .name = "aer_stats",
+ .attrs = aer_stats_attrs,
+ .is_visible = aer_stats_attrs_are_visible,
+};
+
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
+{
+ int status, i, max = -1;
+ u64 *counter = NULL;
+ struct aer_stats *aer_stats = pdev->aer_stats;
+
+ if (unlikely(!aer_stats))
+ return;
+
+ switch (info->severity) {
+ case AER_CORRECTABLE:
+ aer_stats->dev_total_cor_errs++;
+ counter = &aer_stats->dev_cor_errs[0];
+ max = AER_MAX_TYPEOF_CORRECTABLE_ERRS;
+ break;
+ case AER_NONFATAL:
+ aer_stats->dev_total_nonfatal_errs++;
+ counter = &aer_stats->dev_uncor_errs[0];
+ max = AER_MAX_TYPEOF_UNCORRECTABLE_ERRS;
+ break;
+ case AER_FATAL:
+ aer_stats->dev_total_fatal_errs++;
+ counter = &aer_stats->dev_uncor_errs[0];
+ max = AER_MAX_TYPEOF_UNCORRECTABLE_ERRS;
+ break;
+ }
+
+ status = (info->status & ~info->mask);
+ for (i = 0; i < max; i++)
+ if (status & (1 << i))
+ counter[i]++;
+}
+
int pci_aer_stats_init(struct pci_dev *pdev)
{
pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
--
2.17.0.441.gb46fe60e1d-goog
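The counting rule in pci_dev_aer_stats_incr() can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code: the model_* names and the severity enum are hypothetical stand-ins, and only the two array sizes are copied from the patch. It assumes the same rule as the patch: bump the per-severity total once, then bump one per-bit counter for every status bit that is not masked.

```c
#include <stddef.h>
#include <stdint.h>

/* Array sizes copied from the patch (aerdrv.h). */
#define AER_MAX_TYPEOF_CORRECTABLE_ERRS   16
#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26

/* Hypothetical stand-ins for the kernel's AER severity constants. */
enum model_severity { MODEL_CORRECTABLE, MODEL_NONFATAL, MODEL_FATAL };

/* Userspace model of the dev_* part of struct aer_stats. */
struct model_aer_stats {
    uint64_t dev_cor_errs[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
    uint64_t dev_uncor_errs[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
    uint64_t dev_total_cor_errs;
    uint64_t dev_total_fatal_errs;
    uint64_t dev_total_nonfatal_errs;
};

/* Count one reported error: one total, plus one counter per unmasked bit. */
void model_stats_incr(struct model_aer_stats *s, enum model_severity sev,
                      uint32_t status, uint32_t mask)
{
    uint64_t *counter = NULL;
    uint32_t unmasked = status & ~mask;
    int i, max = 0;

    switch (sev) {
    case MODEL_CORRECTABLE:
        s->dev_total_cor_errs++;
        counter = s->dev_cor_errs;
        max = AER_MAX_TYPEOF_CORRECTABLE_ERRS;
        break;
    case MODEL_NONFATAL:
        s->dev_total_nonfatal_errs++;
        counter = s->dev_uncor_errs;
        max = AER_MAX_TYPEOF_UNCORRECTABLE_ERRS;
        break;
    case MODEL_FATAL:
        s->dev_total_fatal_errs++;
        counter = s->dev_uncor_errs;
        max = AER_MAX_TYPEOF_UNCORRECTABLE_ERRS;
        break;
    }

    /* An unknown severity leaves max at 0, so the loop is skipped. */
    for (i = 0; i < max; i++)
        if (unmasked & (UINT32_C(1) << i))
            counter[i]++;
}
```

For example, calling model_stats_incr() on a zeroed structure with MODEL_CORRECTABLE, status 0x41 and mask 0x40 raises dev_total_cor_errs and dev_cor_errs[0] by one each and leaves dev_cor_errs[6] untouched, because bit 6 is masked.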
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
From: Rajat Jain @ 2018-05-22 22:28 UTC
To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
linux-kernel, Jes Sorensen, Kyle McMartin
Cc: rajatxjain
In-Reply-To: <20180522222805.80314-1-rajatja@google.com>
Define a structure to hold the AER statistics. There are two groups
of statistics: dev_* counters, which are collected for all AER-capable
devices, and rootport_* counters, which are collected only for
(AER-capable) root ports. Allocate this structure when the device is
added and free it when the device is released, so the counters persist
for the lifetime of the device.
Add a new file aerdrv_stats.c to hold the AER stats collection logic.
Signed-off-by: Rajat Jain <rajatja@google.com>
---
drivers/pci/pcie/aer/Makefile | 2 +-
drivers/pci/pcie/aer/aerdrv.h | 6 +++
drivers/pci/pcie/aer/aerdrv_core.c | 9 ++++
drivers/pci/pcie/aer/aerdrv_stats.c | 64 +++++++++++++++++++++++++++++
drivers/pci/probe.c | 1 +
include/linux/pci.h | 3 ++
6 files changed, 84 insertions(+), 1 deletion(-)
create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c
diff --git a/drivers/pci/pcie/aer/Makefile b/drivers/pci/pcie/aer/Makefile
index 09bd890875a3..a06f9cc2bde5 100644
--- a/drivers/pci/pcie/aer/Makefile
+++ b/drivers/pci/pcie/aer/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_PCIEAER) += aerdriver.o
obj-$(CONFIG_PCIE_ECRC) += ecrc.o
-aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o
+aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o aerdrv_stats.o
aerdriver-$(CONFIG_ACPI) += aerdrv_acpi.o
obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b4c950683cc7..d8b9fba536ed 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -33,6 +33,10 @@
PCI_ERR_UNC_MALF_TLP)
#define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
+
+#define AER_MAX_TYPEOF_CORRECTABLE_ERRS 16 /* as per PCI_ERR_COR_STATUS */
+#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26 /* as per PCI_ERR_UNCOR_STATUS*/
+
struct aer_err_info {
struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
@@ -81,6 +85,8 @@ void aer_isr(struct work_struct *work);
void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
irqreturn_t aer_irq(int irq, void *context);
+int pci_aer_stats_init(struct pci_dev *pdev);
+void pci_aer_stats_exit(struct pci_dev *pdev);
#ifdef CONFIG_ACPI_APEI
int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 36e622d35c48..42a6f913069a 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -95,9 +95,18 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
int pci_aer_init(struct pci_dev *dev)
{
dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
+
+ if (!dev->aer_cap || pci_aer_stats_init(dev))
+ return -EIO;
+
return pci_cleanup_aer_error_status_regs(dev);
}
+void pci_aer_exit(struct pci_dev *dev)
+{
+ pci_aer_stats_exit(dev);
+}
+
/**
* add_error_device - list device to be handled
* @e_info: pointer to error info
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
new file mode 100644
index 000000000000..b9f251992209
--- /dev/null
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Google Inc, All Rights Reserved.
+ * Rajat Jain (rajatja@google.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * AER Statistics - exposed to userspace via /sysfs attributes.
+ */
+
+#include <linux/pci.h>
+#include "aerdrv.h"
+
+/* AER stats for the device */
+struct aer_stats {
+
+ /*
+ * Fields for all AER capable devices. They indicate the errors
+ * "as seen by this device". Note that this may mean that if an
+ * end point is causing problems, the AER counters may increment
+ * at its link partner (e.g. root port) because the errors will be
+ * "seen" by the link partner and not the problematic end point
+ * itself (which may report all counters as 0 as it never saw any
+ * problems).
+ */
+ /* Individual counters for different types of correctable errors */
+ u64 dev_cor_errs[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+ /* Individual counters for different types of uncorrectable errors */
+ u64 dev_uncor_errs[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+ /* Total number of correctable errors seen by this device */
+ u64 dev_total_cor_errs;
+ /* Total number of fatal uncorrectable errors seen by this device */
+ u64 dev_total_fatal_errs;
+ /* Total number of nonfatal uncorrectable errors seen by this device */
+ u64 dev_total_nonfatal_errs;
+
+ /*
+ * Fields for Root ports only, these indicate the total number of
+ * ERR_COR, ERR_FATAL, and ERR_NONFATAL messages received by the
+ * rootport, INCLUDING the ones that are generated internally (by
+ * the rootport itself)
+ */
+ u64 rootport_total_cor_errs;
+ u64 rootport_total_fatal_errs;
+ u64 rootport_total_nonfatal_errs;
+};
+
+int pci_aer_stats_init(struct pci_dev *pdev)
+{
+ pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
+ if (!pdev->aer_stats) {
+ dev_err(&pdev->dev, "No memory for aer_stats\n");
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+void pci_aer_stats_exit(struct pci_dev *pdev)
+{
+ kfree(pdev->aer_stats);
+ pdev->aer_stats = NULL;
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 384020757b81..dd662c241373 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2064,6 +2064,7 @@ static void pci_configure_device(struct pci_dev *dev)
static void pci_release_capabilities(struct pci_dev *dev)
{
+ pci_aer_exit(dev);
pci_vpd_release(dev);
pci_iov_release(dev);
pci_free_cap_save_buffers(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 21965e0dbe62..5c84b1304de7 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -299,6 +299,7 @@ struct pci_dev {
u8 hdr_type; /* PCI header type (`multi' flag masked out) */
#ifdef CONFIG_PCIEAER
u16 aer_cap; /* AER capability offset */
+ struct aer_stats *aer_stats; /* AER stats for this device */
#endif
u8 pcie_cap; /* PCIe capability offset */
u8 msi_cap; /* MSI capability offset */
@@ -1470,10 +1471,12 @@ static inline bool pcie_aspm_support_enabled(void) { return false; }
void pci_no_aer(void);
bool pci_aer_available(void);
int pci_aer_init(struct pci_dev *dev);
+void pci_aer_exit(struct pci_dev *dev);
#else
static inline void pci_no_aer(void) { }
static inline bool pci_aer_available(void) { return false; }
static inline int pci_aer_init(struct pci_dev *d) { return -ENODEV; }
+static inline void pci_aer_exit(struct pci_dev *d) { }
#endif
#ifdef CONFIG_PCIE_ECRC
--
2.17.0.441.gb46fe60e1d-goog
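The lifetime rule described in the commit message (allocate zeroed counters when the device is set up, free them on release, and clear the pointer so later users see that no stats exist) can be sketched in userspace as follows. This is only an illustration under stated assumptions: the model_* names are hypothetical, and calloc()/free() stand in for the kzalloc()/kfree() calls used by the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down model of struct aer_stats. */
struct model_aer_stats {
    uint64_t dev_total_cor_errs;
    uint64_t dev_total_fatal_errs;
    uint64_t dev_total_nonfatal_errs;
};

/* Hypothetical stand-in for the one struct pci_dev field the patch adds. */
struct model_pci_dev {
    struct model_aer_stats *aer_stats;
};

/* Allocate zeroed counters; returns 0 on success, -1 if out of memory. */
int model_aer_stats_init(struct model_pci_dev *pdev)
{
    pdev->aer_stats = calloc(1, sizeof(*pdev->aer_stats));
    return pdev->aer_stats ? 0 : -1;
}

/* Free the counters and reset the pointer; safe to call more than once. */
void model_aer_stats_exit(struct model_pci_dev *pdev)
{
    free(pdev->aer_stats);
    pdev->aer_stats = NULL;
}
```

Resetting the pointer in the exit path is what lets the increment and visibility helpers elsewhere in the series use a simple NULL check to decide whether a device has statistics.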
* [PATCH 4/5] PCI/AER: Add sysfs attributes for rootport cumulative stats
From: Rajat Jain @ 2018-05-22 22:28 UTC
To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
linux-kernel, Jes Sorensen, Kyle McMartin
Cc: rajatxjain
In-Reply-To: <20180522222805.80314-1-rajatja@google.com>
Add sysfs attributes for root port statistics, which are cumulative
counts of all the ERR_* messages seen on this PCI hierarchy.
Signed-off-by: Rajat Jain <rajatja@google.com>
---
drivers/pci/pcie/aer/aerdrv.h | 2 ++
drivers/pci/pcie/aer/aerdrv_core.c | 2 ++
drivers/pci/pcie/aer/aerdrv_stats.c | 31 +++++++++++++++++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 048fbd7c9633..77d8355551d9 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -88,6 +88,8 @@ irqreturn_t aer_irq(int irq, void *context);
int pci_aer_stats_init(struct pci_dev *pdev);
void pci_aer_stats_exit(struct pci_dev *pdev);
void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+ struct aer_err_source *e_src);
extern const char
*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 42a6f913069a..0f70e22563f3 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -424,6 +424,8 @@ static void aer_isr_one_error(struct pcie_device *p_device,
struct aer_rpc *rpc = get_service_data(p_device);
struct aer_err_info *e_info = &rpc->e_info;
+ pci_rootport_aer_stats_incr(p_device->port, e_src);
+
/*
* There is a possibility that both correctable error and
* uncorrectable error being logged. Report correctable error first.
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 5f0a6e144f56..a526e26c8683 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -60,6 +60,9 @@ static DEVICE_ATTR_RO(field)
aer_stats_aggregate_attr(dev_total_cor_errs);
aer_stats_aggregate_attr(dev_total_fatal_errs);
aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+aer_stats_aggregate_attr(rootport_total_cor_errs);
+aer_stats_aggregate_attr(rootport_total_fatal_errs);
+aer_stats_aggregate_attr(rootport_total_nonfatal_errs);
#define aer_stats_breakdown_attr(field, stats_array, strings_array) \
static ssize_t \
@@ -90,6 +93,9 @@ static struct attribute *aer_stats_attrs[] __ro_after_init = {
&dev_attr_dev_total_nonfatal_errs.attr,
&dev_attr_dev_breakdown_correctable.attr,
&dev_attr_dev_breakdown_uncorrectable.attr,
+ &dev_attr_rootport_total_cor_errs.attr,
+ &dev_attr_rootport_total_fatal_errs.attr,
+ &dev_attr_rootport_total_nonfatal_errs.attr,
NULL
};
@@ -102,6 +108,12 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
if (!pdev->aer_stats)
return 0;
+ if ((a == &dev_attr_rootport_total_cor_errs.attr ||
+ a == &dev_attr_rootport_total_fatal_errs.attr ||
+ a == &dev_attr_rootport_total_nonfatal_errs.attr) &&
+ pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+ return 0;
+
return a->mode;
}
@@ -144,6 +156,25 @@ void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
counter[i]++;
}
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+ struct aer_err_source *e_src)
+{
+ struct aer_stats *aer_stats = pdev->aer_stats;
+
+ if (unlikely(!aer_stats))
+ return;
+
+ if (e_src->status & PCI_ERR_ROOT_COR_RCV)
+ aer_stats->rootport_total_cor_errs++;
+
+ if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
+ if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
+ aer_stats->rootport_total_fatal_errs++;
+ else
+ aer_stats->rootport_total_nonfatal_errs++;
+ }
+}
+
int pci_aer_stats_init(struct pci_dev *pdev)
{
pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
--
2.17.0.441.gb46fe60e1d-goog
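The classification done by pci_rootport_aer_stats_incr() can be modelled in userspace as below. This is a sketch, not the driver code: the model_* names are hypothetical, and the three bit values are, to my knowledge, the Root Error Status definitions from include/uapi/linux/pci_regs.h, so they should be checked against the tree being built.

```c
#include <stdint.h>

/* Root Error Status bits, believed to match include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_COR_RCV   0x00000001 /* ERR_COR received */
#define PCI_ERR_ROOT_UNCOR_RCV 0x00000004 /* ERR_FATAL/ERR_NONFATAL received */
#define PCI_ERR_ROOT_FATAL_RCV 0x00000040 /* at least one was ERR_FATAL */

/* Hypothetical model of the rootport_* counters in struct aer_stats. */
struct model_rootport_stats {
    uint64_t total_cor_errs;
    uint64_t total_fatal_errs;
    uint64_t total_nonfatal_errs;
};

/* Classify one Root Error Status snapshot, following the patch's rule. */
void model_rootport_stats_incr(struct model_rootport_stats *s, uint32_t status)
{
    if (status & PCI_ERR_ROOT_COR_RCV)
        s->total_cor_errs++;

    if (status & PCI_ERR_ROOT_UNCOR_RCV) {
        /* An uncorrectable message counts as fatal only if FATAL_RCV is set. */
        if (status & PCI_ERR_ROOT_FATAL_RCV)
            s->total_fatal_errs++;
        else
            s->total_nonfatal_errs++;
    }
}
```

Note that one snapshot can raise both a correctable and an uncorrectable counter, since the two received bits are tested independently.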
* [PATCH 5/5] Documentation/PCI: Add details of PCI AER statistics
From: Rajat Jain @ 2018-05-22 22:28 UTC
To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
linux-kernel, Jes Sorensen, Kyle McMartin
Cc: rajatxjain
In-Reply-To: <20180522222805.80314-1-rajatja@google.com>
Add the PCI AER statistics details to
Documentation/PCI/pcieaer-howto.txt
Signed-off-by: Rajat Jain <rajatja@google.com>
---
Documentation/PCI/pcieaer-howto.txt | 35 +++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
index acd0dddd6bb8..86ee9f9ff5e1 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.txt
@@ -73,6 +73,41 @@ In the example, 'Requester ID' means the ID of the device who sends
the error message to root port. Pls. refer to pci express specs for
other fields.
+2.4 AER statistics
+
+When AER messages are captured, the statistics are exposed via the following
+sysfs attributes under the "aer_stats" folder for the device:
+
+2.4.1 Device sysfs Attributes
+
+These attributes show up under all the devices that are AER capable. These
+indicate the errors "as seen by the device". Note that this may mean that if
+an end point is causing problems, the AER counters may increment at its link
+partner (e.g. root port) because the errors will be "seen" by the link partner
+and not the problematic end point itself (which may report all counters
+as 0 as it never saw any problems).
+
+ * dev_total_cor_errs: number of correctable errors seen by the device.
+ * dev_total_fatal_errs: number of fatal uncorrectable errors seen by the device.
+ * dev_total_nonfatal_errs: number of nonfatal uncorrectable errors seen by the device.
+ * dev_breakdown_correctable: Provides a breakdown of the different types of
+ correctable errors seen.
+ * dev_breakdown_uncorrectable: Provides a breakdown of the different types of
+ uncorrectable errors seen.
+
+2.4.2 Rootport sysfs Attributes
+
+These attributes show up only under the rootports that are AER capable. These
+indicate the number of error messages as "reported to" the rootport. Please note
+that the rootports also transmit (internally) the ERR_* messages for errors seen
+by the internal rootport PCI device, so these counters include them and are
+thus cumulative of all the error messages on the PCI hierarchy originating
+at that root port.
+
+ * rootport_total_cor_errs: number of ERR_COR messages reported to rootport.
+ * rootport_total_fatal_errs: number of ERR_FATAL messages reported to rootport.
+ * rootport_total_nonfatal_errs: number of ERR_NONFATAL messages reported to
+ rootport.
3. Developer Guide
--
2.17.0.441.gb46fe60e1d-goog
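A userspace consumer of the aggregate attributes documented above might look like the sketch below. It assumes the value format used by this series ("0x" followed by hex digits and a newline) and that the attribute group appears as an "aer_stats" directory under the PCI device in sysfs; the function names and any device address used in a path are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse one attribute value such as "0x1f\n"; returns 0 on success. */
int model_parse_aer_counter(const char *text, unsigned long long *out)
{
    char *end = NULL;
    unsigned long long value;

    errno = 0;
    value = strtoull(text, &end, 16); /* base 16 accepts an optional "0x" */
    if (errno != 0 || end == text)
        return -1;

    *out = value;
    return 0;
}

/* Read and parse one attribute file; returns 0 on success, -1 otherwise. */
int model_read_aer_counter(const char *path, unsigned long long *out)
{
    char line[64];
    int ret = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(line, sizeof(line), f))
        ret = model_parse_aer_counter(line, out);
    fclose(f);
    return ret;
}
```

A caller would pass a path of the form /sys/bus/pci/devices/<address>/aer_stats/dev_total_cor_errs, with <address> replaced by the device of interest. The breakdown attributes use a multi-line "name = value" format and would need a different parser.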
* [PATCH 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs
From: Rajat Jain @ 2018-05-22 22:28 UTC
To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
linux-kernel, Jes Sorensen, Kyle McMartin
Cc: rajatxjain, Rajat Jain
In-Reply-To: <20180522222805.80314-1-rajatja@google.com>
Add sysfs attributes that break down the AER errors seen by type of
correctable or uncorrectable error:
dev_breakdown_correctable
dev_breakdown_uncorrectable
Signed-off-by: Rajat Jain <rajatja@google.com>
---
drivers/pci/pcie/aer/aerdrv.h | 6 ++++++
drivers/pci/pcie/aer/aerdrv_errprint.c | 6 ++++--
drivers/pci/pcie/aer/aerdrv_stats.c | 25 +++++++++++++++++++++++++
3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b5d5ad6f2c03..048fbd7c9633 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -89,6 +89,12 @@ int pci_aer_stats_init(struct pci_dev *pdev);
void pci_aer_stats_exit(struct pci_dev *pdev);
void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
+extern const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+
+extern const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+
#ifdef CONFIG_ACPI_APEI
int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
#else
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 5e8b98deda08..5585f309f1a8 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -68,7 +68,8 @@ static const char *aer_error_layer[] = {
"Transaction Layer"
};
-static const char *aer_correctable_error_string[] = {
+const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS] = {
"Receiver Error", /* Bit Position 0 */
NULL,
NULL,
@@ -87,7 +88,8 @@ static const char *aer_correctable_error_string[] = {
"Header Log Overflow", /* Bit Position 15 */
};
-static const char *aer_uncorrectable_error_string[] = {
+const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS] = {
"Undefined", /* Bit Position 0 */
NULL,
NULL,
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 87b7119d0a86..5f0a6e144f56 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -61,10 +61,35 @@ aer_stats_aggregate_attr(dev_total_cor_errs);
aer_stats_aggregate_attr(dev_total_fatal_errs);
aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+#define aer_stats_breakdown_attr(field, stats_array, strings_array) \
+ static ssize_t \
+ field##_show(struct device *dev, struct device_attribute *attr, \
+ char *buf) \
+{ \
+ unsigned int i; \
+ char *str = buf; \
+ struct pci_dev *pdev = to_pci_dev(dev); \
+ u64 *stats = pdev->aer_stats->stats_array; \
+ for (i = 0; i < ARRAY_SIZE(strings_array); i++) { \
+ if (strings_array[i]) \
+ str += sprintf(str, "%s = 0x%llx\n", \
+ strings_array[i], stats[i]); \
+ } \
+ return str-buf; \
+} \
+static DEVICE_ATTR_RO(field)
+
+aer_stats_breakdown_attr(dev_breakdown_correctable, dev_cor_errs,
+ aer_correctable_error_string);
+aer_stats_breakdown_attr(dev_breakdown_uncorrectable, dev_uncor_errs,
+ aer_uncorrectable_error_string);
+
static struct attribute *aer_stats_attrs[] __ro_after_init = {
&dev_attr_dev_total_cor_errs.attr,
&dev_attr_dev_total_fatal_errs.attr,
&dev_attr_dev_total_nonfatal_errs.attr,
+ &dev_attr_dev_breakdown_correctable.attr,
+ &dev_attr_dev_breakdown_uncorrectable.attr,
NULL
};
--
2.17.0.441.gb46fe60e1d-goog
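The output format produced by the aer_stats_breakdown_attr() show routine (one "name = 0xvalue" line per named bit, with unnamed bit positions skipped) can be reproduced in userspace as sketched below. The function name and its length parameter are hypothetical; unlike the patch, which uses sprintf() into the sysfs buffer, this sketch bounds every write with snprintf().

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one line per named counter into buf (at most len bytes including
 * the terminating NUL). Entries whose name is NULL are skipped, as in the
 * patch. Returns the number of characters written, excluding the NUL.
 */
size_t model_breakdown_show(char *buf, size_t len,
                            const char *const *names,
                            const uint64_t *stats, size_t n)
{
    size_t off = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int w;

        if (!names[i])
            continue;

        w = snprintf(buf + off, len - off, "%s = 0x%" PRIx64 "\n",
                     names[i], stats[i]);
        if (w < 0 || (size_t)w >= len - off) {
            /* Out of room: drop the partial line and stop. */
            if (off < len)
                buf[off] = '\0';
            break;
        }
        off += (size_t)w;
    }
    return off;
}
```

With names { "Receiver Error", NULL, "Bad TLP" } and counters { 2, 9, 10 }, the buffer ends up holding "Receiver Error = 0x2" and "Bad TLP = 0xa", each followed by a newline; the middle counter is not printed because its position has no name. The bounded write is a design choice for the sketch: a sysfs show buffer is a single page, so an unbounded sprintf() is safe only while the total output is known to stay below that size.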
* [PATCH 0/5] Expose PCIe AER stats via sysfs
From: Rajat Jain @ 2018-05-22 22:28 UTC
To: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
Thomas Gleixner, Greg Kroah-Hartman, Rajat Jain, Frederick Lawler,
Oza Pawandeep, Keith Busch, Gabriele Paoloni, Alexandru Gagniuc,
Thomas Tai, Steven Rostedt (VMware), linux-pci, linux-doc,
linux-kernel, Jes Sorensen, Kyle McMartin
Cc: rajatxjain
This patchset exposes the AER stats via the sysfs attributes.
Rajat Jain (5):
PCI/AER: Define and allocate aer_stats structure for AER capable
devices
PCI/AER: Add sysfs stats for AER capable devices
PCI/AER: Add sysfs attributes to provide breakdown of AERs
PCI/AER: Add sysfs attributes for rootport cumulative stats
Documentation/PCI: Add details of PCI AER statistics
Documentation/PCI/pcieaer-howto.txt | 35 +++++
drivers/pci/pci-sysfs.c | 3 +
drivers/pci/pci.h | 4 +-
drivers/pci/pcie/aer/Makefile | 2 +-
drivers/pci/pcie/aer/aerdrv.h | 15 ++
drivers/pci/pcie/aer/aerdrv_core.c | 11 ++
drivers/pci/pcie/aer/aerdrv_errprint.c | 7 +-
drivers/pci/pcie/aer/aerdrv_stats.c | 192 +++++++++++++++++++++++++
drivers/pci/probe.c | 1 +
include/linux/pci.h | 3 +
10 files changed, 269 insertions(+), 4 deletions(-)
create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c
--
2.17.0.441.gb46fe60e1d-goog
* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: Mike Kravetz @ 2018-05-22 20:28 UTC
To: TSUKADA Koutaro
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Jonathan Corbet,
Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
David Rientjes, Aneesh Kumar K.V, Naoya Horiguchi,
Anshuman Khandual, Marc-Andre Lureau, Punit Agrawal, Dan Williams,
Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
cgroups
In-Reply-To: <8711fed5-fc35-a11a-3a17-740a9dca1f2a@ascade.co.jp>
On 05/22/2018 06:04 AM, TSUKADA Koutaro wrote:
>
> I stared at the commit log of mm/hugetlb_cgroup.c, but it did not seem to
> have specially considered of surplus hugepages. Later, I will send a mail
> to hugetlb cgroup's committer to ask about surplus hugepages charge
> specifications.
>
I went back and looked at surplus huge page allocation. Previously, I made
a statement that the hugetlb controller accounts for surplus huge pages.
Turns out that may not be 100% correct.
Thanks to Michal, all surplus huge page allocation is performed via the
alloc_surplus_huge_page() routine. This will ultimately call into the
buddy allocator without any cgroup charges. Calls to alloc_surplus_huge_page
are made from:
- alloc_huge_page() when allocating a huge page to a mapping/file. In this
case, appropriate calls to the hugetlb controller are in place. So, any
limits are enforced here.
- gather_surplus_pages() when allocating and setting aside 'reserved' huge
pages. No accounting is performed here. Do note that in this case the
allocated huge pages are not assigned to the mapping/file. Even though
'reserved', they are deposited into the global pool and also counted as
'free'. When these reserved pages are ultimately used to populate a
file/mapping, the code path goes through alloc_huge_page() where appropriate
calls to the hugetlb controller are in place.
So, the bottom line is that surplus huge pages are not accounted for when
they are allocated as 'reserves'. It is not until these reserves are actually
used that accounting limits are checked. This 'seems' to align with general
allocation of huge pages within the pool. No accounting is done until they
are actually allocated to a mapping/file.
--
Mike Kravetz
* Re: [PATCH RFC V2 2/6] hwmon: Add support for RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 19:31 UTC
To: Rob Herring, Guenter Roeck, Eric Anholt, Mark Rutland,
Jonathan Corbet, Jean Delvare
Cc: Scott Branden, Florian Fainelli, linux-rpi-kernel, Phil Elwell,
bcm-kernel-feedback-list, linux-doc, devicetree, linux-hwmon,
Noralf Trønnes, Ray Jui, linux-arm-kernel
In-Reply-To: <bea3d6c7-a713-c4aa-8ae8-7fec80c5da5d@roeck-us.net>
> Guenter Roeck <linux@roeck-us.net> wrote on 22 May 2018 at 16:10:
>
>
> On 05/22/2018 06:51 AM, Stefan Wahren wrote:
> > Hi Guenter,
> >
> >> Guenter Roeck <linux@roeck-us.net> wrote on 22 May 2018 at 15:41:
> >>
> >>
> >> On 05/22/2018 04:21 AM, Stefan Wahren wrote:
> >>> Currently there is no easy way to detect undervoltage conditions on a
> >>> remote Raspberry Pi. This hwmon driver retrieves the state of the
> >>> undervoltage sensor via mailbox interface. The handling is based on
> >>> Noralf's modifications to the downstream firmware driver. In case of
> >>> an undervoltage condition only an entry is written to the kernel log.
> >>>
> >>> CC: "Noralf Trønnes" <noralf@tronnes.org>
> >>> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
> >>> ---
> >>> Documentation/hwmon/raspberrypi-hwmon | 22 +++++
> >>> drivers/hwmon/Kconfig | 10 ++
> >>> drivers/hwmon/Makefile | 1 +
> >>> drivers/hwmon/raspberrypi-hwmon.c | 168 ++++++++++++++++++++++++++++++++++
> >>> 4 files changed, 201 insertions(+)
> >>> create mode 100644 Documentation/hwmon/raspberrypi-hwmon
> >>> create mode 100644 drivers/hwmon/raspberrypi-hwmon.c
> >>>
> >>> diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
> >>> new file mode 100644
> >>> index 0000000..3c92e2c
> >>> --- /dev/null
> >>> +++ b/Documentation/hwmon/raspberrypi-hwmon
> >>> @@ -0,0 +1,22 @@
> >>> +Kernel driver raspberrypi-hwmon
> >>> +===============================
> >>> +
> >>> +Supported boards:
> >>> + * Raspberry Pi A+ (via GPIO on SoC)
> >>> + * Raspberry Pi B+ (via GPIO on SoC)
> >>> + * Raspberry Pi 2 B (via GPIO on SoC)
> >>> + * Raspberry Pi 3 B (via GPIO on port expander)
> >>> + * Raspberry Pi 3 B+ (via PMIC)
> >>> +
> >>> +Author: Stefan Wahren <stefan.wahren@i2se.com>
> >>> +
> >>> +Description
> >>> +-----------
> >>> +
> >>> +This driver periodically polls a mailbox property of the VC4 firmware to detect
> >>> +undervoltage conditions.
> >>> +
> >>> +Sysfs entries
> >>> +-------------
> >>> +
> >>> +in0_lcrit_alarm Undervoltage alarm
> >>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> >>> index 768aed5..9a5bdb0 100644
> >>> --- a/drivers/hwmon/Kconfig
> >>> +++ b/drivers/hwmon/Kconfig
> >>> @@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
> >>> This driver can also be built as a module. If so, the module
> >>> will be called pwm-fan.
> >>>
> >>> +config SENSORS_RASPBERRYPI_HWMON
> >>> + tristate "Raspberry Pi voltage monitor"
> >>> + depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
> >>> + help
> >>> + If you say yes here you get support for voltage sensor on the
> >>> + Raspberry Pi.
> >>> +
> >>> + This driver can also be built as a module. If so, the module
> >>> + will be called raspberrypi-hwmon.
> >>> +
> >>> config SENSORS_SHT15
> >>> tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
> >>> depends on GPIOLIB || COMPILE_TEST
> >>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> >>> index e7d52a3..a929770 100644
> >>> --- a/drivers/hwmon/Makefile
> >>> +++ b/drivers/hwmon/Makefile
> >>> @@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427) += pc87427.o
> >>> obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o
> >>> obj-$(CONFIG_SENSORS_POWR1220) += powr1220.o
> >>> obj-$(CONFIG_SENSORS_PWM_FAN) += pwm-fan.o
> >>> +obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON) += raspberrypi-hwmon.o
> >>> obj-$(CONFIG_SENSORS_S3C) += s3c-hwmon.o
> >>> obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
> >>> obj-$(CONFIG_SENSORS_SCH5627) += sch5627.o
> >>> diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
> >>> new file mode 100644
> >>> index 0000000..6233e84
> >>> --- /dev/null
> >>> +++ b/drivers/hwmon/raspberrypi-hwmon.c
> >>> @@ -0,0 +1,168 @@
> >>> +// SPDX-License-Identifier: GPL-2.0+
> >>> +/*
> >>> + * Raspberry Pi voltage sensor driver
> >>> + *
> >>> + * Based on firmware/raspberrypi.c by Noralf Trønnes
> >>> + *
> >>> + * Copyright (C) 2018 Stefan Wahren <stefan.wahren@i2se.com>
> >>> + */
> >>> +#include <linux/device.h>
> >>> +#include <linux/err.h>
> >>> +#include <linux/hwmon.h>
> >>> +#include <linux/module.h>
> >>> +#include <linux/platform_device.h>
> >>> +#include <linux/slab.h>
> >>> +#include <linux/workqueue.h>
> >>> +#include <soc/bcm2835/raspberrypi-firmware.h>
> >>> +
> >>> +#define UNDERVOLTAGE_STICKY_BIT BIT(16)
> >>> +
> >>> +struct rpi_hwmon_data {
> >>> + struct device *hwmon_dev;
> >>> + struct rpi_firmware *fw;
> >>> + u32 last_throttled;
> >>> + struct delayed_work get_values_poll_work;
> >>> +};
> >>> +
> >>> +static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
> >>> +{
> >>> + u32 new_uv, old_uv, value;
> >>> + int ret;
> >>> +
> >>> + /* Request firmware to clear sticky bits */
> >>> + value = 0xffff;
> >>> +
> >>> + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
> >>> + &value, sizeof(value));
> >>> + if (ret) {
> >>> + dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
> >>> + ret);
> >>> + return;
> >>> + }
> >>> +
> >>> + new_uv = value & UNDERVOLTAGE_STICKY_BIT;
> >>> + old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
> >>> + data->last_throttled = value;
> >>> +
> >>> + if (new_uv == old_uv)
> >>> + return;
> >>> +
> >>> + if (new_uv)
> >>> + dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
> >>> + else
> >>> + dev_info(data->hwmon_dev, "Voltage normalised\n");
> >>> +
> >>> + sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
> >>> +}
> >>> +
> >>> +static void get_values_poll(struct work_struct *work)
> >>> +{
> >>> + struct rpi_hwmon_data *data;
> >>> +
> >>> + data = container_of(work, struct rpi_hwmon_data,
> >>> + get_values_poll_work.work);
> >>> +
> >>> + rpi_firmware_get_throttled(data);
> >>> +
> >>> + /*
> >>> + * We can't run faster than the sticky shift (100ms) since we get
> >>> + * flipping in the sticky bits that are cleared.
> >>> + */
> >>> + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
> >>> +}
> >>> +
> >>> +static int rpi_read(struct device *dev, enum hwmon_sensor_types type,
> >>> + u32 attr, int channel, long *val)
> >>> +{
> >>> + struct rpi_hwmon_data *data = dev_get_drvdata(dev);
> >>> +
> >>> + *val = !!(data->last_throttled & UNDERVOLTAGE_STICKY_BIT);
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static umode_t rpi_is_visible(const void *_data, enum hwmon_sensor_types type,
> >>> + u32 attr, int channel)
> >>> +{
> >>> + return 0444;
> >>> +}
> >>> +
> >>> +static const u32 rpi_in_config[] = {
> >>> + HWMON_I_LCRIT_ALARM,
> >>> + 0
> >>> +};
> >>> +
> >>> +static const struct hwmon_channel_info rpi_in = {
> >>> + .type = hwmon_in,
> >>> + .config = rpi_in_config,
> >>> +};
> >>> +
> >>> +static const struct hwmon_channel_info *rpi_info[] = {
> >>> + &rpi_in,
> >>> + NULL
> >>> +};
> >>> +
> >>> +static const struct hwmon_ops rpi_hwmon_ops = {
> >>> + .is_visible = rpi_is_visible,
> >>> + .read = rpi_read,
> >>> +};
> >>> +
> >>> +static const struct hwmon_chip_info rpi_chip_info = {
> >>> + .ops = &rpi_hwmon_ops,
> >>> + .info = rpi_info,
> >>> +};
> >>> +
> >>> +static int rpi_hwmon_probe(struct platform_device *pdev)
> >>> +{
> >>> + struct device *dev = &pdev->dev;
> >>> + struct rpi_hwmon_data *data;
> >>> + int ret;
> >>> +
> >>> + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> >>> + if (!data)
> >>> + return -ENOMEM;
> >>> +
> >>> + data->fw = platform_get_drvdata(to_platform_device(dev->parent));
> >>> + if (!data->fw)
> >>> + return -EPROBE_DEFER;
> >>> +
> >>
> >> I am a bit at loss here (and sorry I didn't bring this up before).
> >> How would this ever be possible, given that the driver is registered
> >> from the firmware driver ?
> >
> > Do you refer to the (wrong) return code, the assumption that the parent must be a platform driver or a possible race?
> >
>
> The return code is one thing. My question was how the driver would ever be instantiated
> with platform_get_drvdata(to_platform_device(dev->parent)) == NULL (but dev->parent != NULL),
> so I referred to the race. But, sure, a second question would be how that would indicate
> that the parent is not instantiated yet (which by itself seems like an odd question).
This shouldn't happen, and it is worth logging an error if it does. In patch #3 the registration is called after the private data of the firmware driver is completely initialized. Did I miss something?
But I must confess that I didn't test all builtin/module combinations.
>
> Yet another question, as you point out, is why to use platform_get_drvdata(to_platform_device(dev->parent))
> instead of dev_get_drvdata(dev->parent).
Sure, this is much simpler.
Thanks
Stefan
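
For reference, the equivalence behind that simplification can be sketched in
userspace. The struct definitions below are minimal stand-ins for the kernel's
struct device and struct platform_device, and parent_drvdata() is a
hypothetical helper that is not part of the patch. The point is only that
platform_get_drvdata() is a thin wrapper around dev_get_drvdata(), so the
detour through to_platform_device() should add nothing:

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct device {
	struct device *parent;
	void *driver_data;
};

struct platform_device {
	const char *name;
	struct device dev;	/* embedded, as in the kernel */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define to_platform_device(d) container_of(d, struct platform_device, dev)

static void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

/* Mirrors the kernel wrapper: it only forwards to the embedded device. */
static void *platform_get_drvdata(const struct platform_device *pdev)
{
	return dev_get_drvdata(&pdev->dev);
}

/* Hypothetical helper: fetch the parent's driver data the short way. */
static void *parent_drvdata(const struct device *dev)
{
	return dev->parent ? dev_get_drvdata(dev->parent) : NULL;
}
```

Both spellings return the same pointer; the short one simply does not assume
that the parent is embedded in a platform_device.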
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: Michal Hocko @ 2018-05-22 18:54 UTC (permalink / raw)
To: TSUKADA Koutaro
Cc: Mike Kravetz, Johannes Weiner, Vladimir Davydov, Jonathan Corbet,
Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
David Rientjes, Aneesh Kumar K.V, Naoya Horiguchi,
Anshuman Khandual, Marc-Andre Lureau, Punit Agrawal, Dan Williams,
Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
cgroups
In-Reply-To: <8711fed5-fc35-a11a-3a17-740a9dca1f2a@ascade.co.jp>
On Tue 22-05-18 22:04:23, TSUKADA Koutaro wrote:
> On 2018/05/22 3:07, Mike Kravetz wrote:
> > On 05/17/2018 09:27 PM, TSUKADA Koutaro wrote:
> >> Thanks to Mike Kravetz for comment on the previous version patch.
> >>
> >> The purpose of this patch-set is to make it possible to control whether or
> >> not to charge surplus hugetlb pages obtained by overcommitting to memory
> >> cgroup. In the future, I am trying to accomplish limiting the memory usage
> >> of applications that use both normal pages and hugetlb pages by the memory
> >> cgroup(not use the hugetlb cgroup).
> >>
> >> Applications that use shared libraries like libhugetlbfs.so use both normal
> >> pages and hugetlb pages, but we do not know how much to use each. Please
> >> suppose you want to manage the memory usage of such applications by cgroup.
> >> How do you set the memory cgroup and hugetlb cgroup limit when you want to
> >> limit memory usage to 10GB?
> >>
> >> If you set a limit of 10GB for each, the user can use a total of 20GB of
> >> memory and can not limit it well. Since it is difficult to estimate the
> >> ratio used by user of normal pages and hugetlb pages, setting limits of 2GB
> >> to memory cgroup and 8GB to hugetlb cgroup is not a very good idea. In such a
> >> case, I thought that by using my patch-set, we could manage resources just
> >> by setting 10GB as the limit of memory cgroup (there is no limit to hugetlb
> >> cgroup).
> >>
> >> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
> >> struct hstate. If it is true, it charges to the memory cgroup to which the
> >> task that obtained surplus hugepages belongs. If it is false, do nothing as
> >> before, and the default value is false. The charge_surplus_huge_pages can
> >> be controlled via procfs or sysfs interfaces.
> >>
> >> Since THP is very effective in environments with kernel page size of 4KB,
> >> such as x86, there is no reason to positively use HugeTLBfs, so I think
> >> that there is no situation to enable charge_surplus_huge_pages. However, in
> >> some distributions such as arm64, the page size of the kernel is 64KB, and
> >> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
> >> may support multiple huge page sizes, and in such a special environment
> >> there is a desire to use HugeTLBfs.
> >
> > One of the basic questions/concerns I have is accounting for surplus huge
> > pages in the default memory resource controller. The existing hugetlb
> > resource controller already takes hugetlbfs huge pages into account,
> > including surplus pages. This series would allow surplus pages to be
> > accounted for in the default memory controller, or the hugetlb controller
> > or both.
> >
> > I understand that current mechanisms do not meet the needs of the above
> > use case. The question is whether this is an appropriate way to approach
> > the issue.
I do share your view Mike!
> > My cgroup experience and knowledge is extremely limited, but
> > it does not appear that any other resource can be controlled by multiple
> > controllers. Therefore, I am concerned that this may be going against
> > basic cgroup design philosophy.
>
> Thank you for your feedback.
> That makes sense, surplus hugepages are charged to both memcg and hugetlb
> cgroup, which may be contrary to cgroup design philosophy.
>
> Based on the above advice, I have considered the following improvements.
> What do you think of them?
>
> The 'charge_surplus_hugepages' of v2 patch-set was an option to switch
> "whether to charge memcg in addition to hugetlb cgroup", but it will be
> abolished. Instead, change to "switch only to memcg instead of hugetlb
> cgroup" option. This is called 'surplus_charge_to_memcg'.
This all looks so hackish and ad-hoc that I would be tempted to give it
an outright nack, but let's hear more about why we need this fiddling
at all. I've asked in another email so I guess I will get an answer there
but let me just emphasize again that I absolutely detest a possibility
to put hugetlb pages into the memcg mix. They just do not belong there.
Try to look at previous discussions why it has been decided to have a
separate hugetlb pages at all.
I am also quite confused why you keep distinguishing surplus hugetlb
pages from regular preallocated ones. Being a surplus page is an
implementation detail that we use for an internal accounting rather than
something to exhibit to the userspace even more than we do currently.
Just look at what should (or would) happen when you need to adjust
accounting - e.g. due to the pool resize. Are you going to uncharge those
surplus pages from memcg to reflect their persistence?
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 11/11] docs: fix broken references with multiple hints
From: Rob Herring @ 2018-05-22 17:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Linux Doc Mailing List, Mauro Carvalho Chehab, linux-kernel,
Jonathan Corbet, Linus Walleij, David Airlie, Mark Rutland,
Harry Wei, Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
Roy Pledge, Greg Kroah-Hartman, Bartlomiej Zolnierkiewicz,
Steven Rostedt, Ingo Molnar, James Morris, Serge E. Hallyn,
linux-gpio, dri-devel, devicetree, linux-kernel, linux-usb,
linux-input, devel, linux-fbdev, linux-security-module
In-Reply-To: <63a4f8a93f9115475bc184d0f37d076c9b9c75ff.1525870886.git.mchehab+samsung@kernel.org>
On Wed, May 09, 2018 at 10:18:54AM -0300, Mauro Carvalho Chehab wrote:
> The script:
> ./scripts/documentation-file-ref-check --fix-rst
>
> Gives multiple hints for broken references on some files.
> Manually use the one that applies for some files.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> ---
> Documentation/ABI/obsolete/sysfs-gpio | 2 +-
> .../devicetree/bindings/display/bridge/tda998x.txt | 2 +-
Acked-by: Rob Herring <robh@kernel.org>
> Documentation/trace/events.rst | 2 +-
> Documentation/trace/tracepoint-analysis.rst | 2 +-
> Documentation/translations/zh_CN/SubmittingDrivers | 2 +-
> Documentation/translations/zh_CN/gpio.txt | 4 ++--
> MAINTAINERS | 2 +-
> drivers/hid/usbhid/Kconfig | 2 +-
> drivers/input/Kconfig | 4 ++--
> drivers/input/joystick/Kconfig | 4 ++--
> drivers/input/joystick/iforce/Kconfig | 2 +-
> drivers/input/serio/Kconfig | 4 ++--
> drivers/staging/fsl-mc/bus/dpio/dpio-driver.txt | 2 +-
> drivers/video/fbdev/skeletonfb.c | 8 ++++----
> include/linux/tracepoint.h | 2 +-
> security/device_cgroup.c | 2 +-
> 16 files changed, 23 insertions(+), 23 deletions(-)
* Re: [PATCH v5 3/5] i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller
From: Karthik Ramasubramanian @ 2018-05-22 14:52 UTC (permalink / raw)
To: Wolfram Sang
Cc: corbet, andy.gross, david.brown, robh+dt, mark.rutland, linux-doc,
linux-arm-msm, devicetree, linux-i2c, evgreen, acourbot, swboyd,
dianders, Sagar Dharia, Girish Mahadevan
In-Reply-To: <20180521204924.4ldbke6udjshdz2k@katana>
On 5/21/2018 2:49 PM, Wolfram Sang wrote:
> Hi,
>
> On Fri, Mar 23, 2018 at 02:20:59PM -0600, Karthikeyan Ramasubramanian wrote:
>> This bus driver supports the GENI based i2c hardware controller in the
>> Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable
>> module supporting a wide range of serial interfaces including I2C. The
>> driver supports FIFO mode and DMA mode of transfer and switches modes
>> dynamically depending on the size of the transfer.
>>
>> Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
>> Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
>> Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
>
> Is one of these people interested in maintaining this driver? Then, an
> entry for MAINTAINERS would be needed, too. (Same goes for
> drivers/soc/qcom/ IMHO, but this is not my realm, so just saying)
One of us will maintain this driver and I will update the MAINTAINERS
appropriately.
>
>> +static const struct geni_i2c_err_log gi2c_log[] = {
>> + [GP_IRQ0] = {-EINVAL, "Unknown I2C err GP_IRQ0"},
>> + [NACK] = {-ENOTCONN, "NACK: slv unresponsive, check its power/reset-ln"},
>> + [GP_IRQ2] = {-EINVAL, "Unknown I2C err GP IRQ2"},
>> + [BUS_PROTO] = {-EPROTO, "Bus proto err, noisy/unepxected start/stop"},
>> + [ARB_LOST] = {-EBUSY, "Bus arbitration lost, clock line undriveable"},
>> + [GP_IRQ5] = {-EINVAL, "Unknown I2C err GP IRQ5"},
>> + [GENI_OVERRUN] = {-EIO, "Cmd overrun, check GENI cmd-state machine"},
>> + [GENI_ILLEGAL_CMD] = {-EILSEQ, "Illegal cmd, check GENI cmd-state machine"},
>> + [GENI_ABORT_DONE] = {-ETIMEDOUT, "Abort after timeout successful"},
>> + [GENI_TIMEOUT] = {-ETIMEDOUT, "I2C TXN timed out"},
>> +};
>
> Please check Documentation/i2c/fault-codes for better -ERRNO values,
> especially for NACK and ARB_LOST.
I will check the fault-codes and fix the error codes here.
>
> Rest looks good from a glimpse.
>
> Thanks,
>
> Wolfram
>
Regards,
Karthik.
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
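
A possible shape for the reworked table is sketched below in plain C. It
assumes that Documentation/i2c/fault-codes recommends -ENXIO when the address
phase is not acknowledged and -EAGAIN when arbitration is lost; please verify
against the file itself. The enum values follow the order of the quoted table
and are otherwise an assumption, and geni_i2c_err_to_errno() is a hypothetical
lookup helper that is not in the patch:

```c
#include <errno.h>
#include <stddef.h>

/* Error indices; numeric values are assumed from the order of the table. */
enum geni_i2c_err_code {
	GP_IRQ0, NACK, GP_IRQ2, BUS_PROTO, ARB_LOST, GP_IRQ5,
	GENI_OVERRUN, GENI_ILLEGAL_CMD, GENI_ABORT_DONE, GENI_TIMEOUT,
};

struct geni_i2c_err_log {
	int err;
	const char *msg;
};

static const struct geni_i2c_err_log gi2c_log[] = {
	/* Assumed fault-codes mapping: no ACK in the address phase. */
	[NACK] = {-ENXIO, "NACK: slv unresponsive, check its power/reset-ln"},
	[BUS_PROTO] = {-EPROTO, "Bus proto err, noisy/unexpected start/stop"},
	/* Assumed fault-codes mapping: arbitration lost, caller may retry. */
	[ARB_LOST] = {-EAGAIN, "Bus arbitration lost, clock line undriveable"},
	[GENI_TIMEOUT] = {-ETIMEDOUT, "I2C TXN timed out"},
};

/* Hypothetical helper: unknown or unlisted codes fall back to -EINVAL. */
static int geni_i2c_err_to_errno(unsigned int code)
{
	if (code >= sizeof(gi2c_log) / sizeof(gi2c_log[0]) || !gi2c_log[code].msg)
		return -EINVAL;
	return gi2c_log[code].err;
}
```

Returning -EAGAIN for lost arbitration lets callers tell a retryable condition
apart from a device that is simply absent.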
* [PATCH] cgroup, docs: add a note about returning EBUSY in some cases
From: Roman Gushchin @ 2018-05-22 14:36 UTC (permalink / raw)
To: Tejun Heo; +Cc: Roman Gushchin, kernel-team, cgroups, linux-doc, linux-kernel
Explicitly document EBUSY returned by writing into cgroup.procs
if controllers are enabled; and writing into cgroup.subtree_control
if there are attached processes.
The return code might be slightly surprising, and because there is
nothing obviously better, let's document it at least.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Documentation/cgroup-v2.txt | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeaed9f7a..57302f88a4ad 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -799,6 +799,9 @@ All cgroup core files are prefixed with "cgroup."
When delegating a sub-hierarchy, write access to this file
should be granted along with the containing directory.
+ If the target cgroup has enabled controllers, writing to this
+ file will fail with EBUSY.
+
In a threaded cgroup, reading this file fails with EOPNOTSUPP
as all the processes belong to the thread root. Writing is
supported and moves every thread of the process to the cgroup.
@@ -850,6 +853,9 @@ All cgroup core files are prefixed with "cgroup."
the last one is effective. When multiple enable and disable
operations are specified, either all succeed or all fail.
+ If the cgroup has attached tasks, writing to this file will
+ fail with EBUSY.
+
cgroup.events
A read-only flat-keyed file which exists on non-root cgroups.
The following entries are defined. Unless specified
--
2.14.3
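
From the userspace side, the documented behaviour could be handled roughly as
follows. This is only a sketch: cgroup_write() and cgroup_write_is_busy() are
hypothetical helpers, and the path passed in is whatever cgroup.procs or
cgroup.subtree_control file the caller wants to modify.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a string to a cgroup interface file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int cgroup_write(const char *path, const char *buf)
{
	size_t len = strlen(buf);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, len);
	if (n < 0)
		err = -errno;		/* e.g. -EBUSY as documented above */
	else if ((size_t)n != len)
		err = -EIO;		/* short write, treated as an error */

	close(fd);
	return err;
}

/* Hypothetical helper: true if the write failed with EBUSY. */
static int cgroup_write_is_busy(int ret)
{
	return ret == -EBUSY;
}
```

A caller that sees EBUSY from cgroup.procs would have to disable the
controllers in the target cgroup first; one that sees it from
cgroup.subtree_control would have to move the attached processes out first.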
* Re: [PATCH RFC V2 2/6] hwmon: Add support for RPi voltage sensor
From: Guenter Roeck @ 2018-05-22 14:10 UTC (permalink / raw)
To: Stefan Wahren, Rob Herring, Eric Anholt, Mark Rutland,
Jonathan Corbet, Jean Delvare
Cc: Scott Branden, Florian Fainelli, linux-rpi-kernel, Phil Elwell,
bcm-kernel-feedback-list, linux-doc, devicetree, linux-hwmon,
Noralf Trønnes, Ray Jui, linux-arm-kernel
In-Reply-To: <659923372.11518.1526997096662@email.1und1.de>
On 05/22/2018 06:51 AM, Stefan Wahren wrote:
> Hi Guenter,
>
>> Guenter Roeck <linux@roeck-us.net> wrote on 22 May 2018 at 15:41:
>>
>>
>> On 05/22/2018 04:21 AM, Stefan Wahren wrote:
>>> Currently there is no easy way to detect undervoltage conditions on a
>>> remote Raspberry Pi. This hwmon driver retrieves the state of the
>>> undervoltage sensor via mailbox interface. The handling is based on
>>> Noralf's modifications to the downstream firmware driver. In case of
>>> an undervoltage condition only an entry is written to the kernel log.
>>>
>>> CC: "Noralf Trønnes" <noralf@tronnes.org>
>>> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
>>> ---
>>> Documentation/hwmon/raspberrypi-hwmon | 22 +++++
>>> drivers/hwmon/Kconfig | 10 ++
>>> drivers/hwmon/Makefile | 1 +
>>> drivers/hwmon/raspberrypi-hwmon.c | 168 ++++++++++++++++++++++++++++++++++
>>> 4 files changed, 201 insertions(+)
>>> create mode 100644 Documentation/hwmon/raspberrypi-hwmon
>>> create mode 100644 drivers/hwmon/raspberrypi-hwmon.c
>>>
>>> diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
>>> new file mode 100644
>>> index 0000000..3c92e2c
>>> --- /dev/null
>>> +++ b/Documentation/hwmon/raspberrypi-hwmon
>>> @@ -0,0 +1,22 @@
>>> +Kernel driver raspberrypi-hwmon
>>> +===============================
>>> +
>>> +Supported boards:
>>> + * Raspberry Pi A+ (via GPIO on SoC)
>>> + * Raspberry Pi B+ (via GPIO on SoC)
>>> + * Raspberry Pi 2 B (via GPIO on SoC)
>>> + * Raspberry Pi 3 B (via GPIO on port expander)
>>> + * Raspberry Pi 3 B+ (via PMIC)
>>> +
>>> +Author: Stefan Wahren <stefan.wahren@i2se.com>
>>> +
>>> +Description
>>> +-----------
>>> +
>>> +This driver periodically polls a mailbox property of the VC4 firmware to detect
>>> +undervoltage conditions.
>>> +
>>> +Sysfs entries
>>> +-------------
>>> +
>>> +in0_lcrit_alarm Undervoltage alarm
>>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>>> index 768aed5..9a5bdb0 100644
>>> --- a/drivers/hwmon/Kconfig
>>> +++ b/drivers/hwmon/Kconfig
>>> @@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
>>> This driver can also be built as a module. If so, the module
>>> will be called pwm-fan.
>>>
>>> +config SENSORS_RASPBERRYPI_HWMON
>>> + tristate "Raspberry Pi voltage monitor"
>>> + depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
>>> + help
>>> + If you say yes here you get support for voltage sensor on the
>>> + Raspberry Pi.
>>> +
>>> + This driver can also be built as a module. If so, the module
>>> + will be called raspberrypi-hwmon.
>>> +
>>> config SENSORS_SHT15
>>> tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
>>> depends on GPIOLIB || COMPILE_TEST
>>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>>> index e7d52a3..a929770 100644
>>> --- a/drivers/hwmon/Makefile
>>> +++ b/drivers/hwmon/Makefile
>>> @@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427) += pc87427.o
>>> obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o
>>> obj-$(CONFIG_SENSORS_POWR1220) += powr1220.o
>>> obj-$(CONFIG_SENSORS_PWM_FAN) += pwm-fan.o
>>> +obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON) += raspberrypi-hwmon.o
>>> obj-$(CONFIG_SENSORS_S3C) += s3c-hwmon.o
>>> obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
>>> obj-$(CONFIG_SENSORS_SCH5627) += sch5627.o
>>> diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
>>> new file mode 100644
>>> index 0000000..6233e84
>>> --- /dev/null
>>> +++ b/drivers/hwmon/raspberrypi-hwmon.c
>>> @@ -0,0 +1,168 @@
>>> +// SPDX-License-Identifier: GPL-2.0+
>>> +/*
>>> + * Raspberry Pi voltage sensor driver
>>> + *
>>> + * Based on firmware/raspberrypi.c by Noralf Trønnes
>>> + *
>>> + * Copyright (C) 2018 Stefan Wahren <stefan.wahren@i2se.com>
>>> + */
>>> +#include <linux/device.h>
>>> +#include <linux/err.h>
>>> +#include <linux/hwmon.h>
>>> +#include <linux/module.h>
>>> +#include <linux/platform_device.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/workqueue.h>
>>> +#include <soc/bcm2835/raspberrypi-firmware.h>
>>> +
>>> +#define UNDERVOLTAGE_STICKY_BIT BIT(16)
>>> +
>>> +struct rpi_hwmon_data {
>>> + struct device *hwmon_dev;
>>> + struct rpi_firmware *fw;
>>> + u32 last_throttled;
>>> + struct delayed_work get_values_poll_work;
>>> +};
>>> +
>>> +static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
>>> +{
>>> + u32 new_uv, old_uv, value;
>>> + int ret;
>>> +
>>> + /* Request firmware to clear sticky bits */
>>> + value = 0xffff;
>>> +
>>> + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
>>> + &value, sizeof(value));
>>> + if (ret) {
>>> + dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
>>> + ret);
>>> + return;
>>> + }
>>> +
>>> + new_uv = value & UNDERVOLTAGE_STICKY_BIT;
>>> + old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
>>> + data->last_throttled = value;
>>> +
>>> + if (new_uv == old_uv)
>>> + return;
>>> +
>>> + if (new_uv)
>>> + dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
>>> + else
>>> + dev_info(data->hwmon_dev, "Voltage normalised\n");
>>> +
>>> + sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
>>> +}
>>> +
>>> +static void get_values_poll(struct work_struct *work)
>>> +{
>>> + struct rpi_hwmon_data *data;
>>> +
>>> + data = container_of(work, struct rpi_hwmon_data,
>>> + get_values_poll_work.work);
>>> +
>>> + rpi_firmware_get_throttled(data);
>>> +
>>> + /*
>>> + * We can't run faster than the sticky shift (100ms) since we get
>>> + * flipping in the sticky bits that are cleared.
>>> + */
>>> + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
>>> +}
>>> +
>>> +static int rpi_read(struct device *dev, enum hwmon_sensor_types type,
>>> + u32 attr, int channel, long *val)
>>> +{
>>> + struct rpi_hwmon_data *data = dev_get_drvdata(dev);
>>> +
>>> + *val = !!(data->last_throttled & UNDERVOLTAGE_STICKY_BIT);
>>> + return 0;
>>> +}
>>> +
>>> +static umode_t rpi_is_visible(const void *_data, enum hwmon_sensor_types type,
>>> + u32 attr, int channel)
>>> +{
>>> + return 0444;
>>> +}
>>> +
>>> +static const u32 rpi_in_config[] = {
>>> + HWMON_I_LCRIT_ALARM,
>>> + 0
>>> +};
>>> +
>>> +static const struct hwmon_channel_info rpi_in = {
>>> + .type = hwmon_in,
>>> + .config = rpi_in_config,
>>> +};
>>> +
>>> +static const struct hwmon_channel_info *rpi_info[] = {
>>> + &rpi_in,
>>> + NULL
>>> +};
>>> +
>>> +static const struct hwmon_ops rpi_hwmon_ops = {
>>> + .is_visible = rpi_is_visible,
>>> + .read = rpi_read,
>>> +};
>>> +
>>> +static const struct hwmon_chip_info rpi_chip_info = {
>>> + .ops = &rpi_hwmon_ops,
>>> + .info = rpi_info,
>>> +};
>>> +
>>> +static int rpi_hwmon_probe(struct platform_device *pdev)
>>> +{
>>> + struct device *dev = &pdev->dev;
>>> + struct rpi_hwmon_data *data;
>>> + int ret;
>>> +
>>> + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>> + if (!data)
>>> + return -ENOMEM;
>>> +
>>> + data->fw = platform_get_drvdata(to_platform_device(dev->parent));
>>> + if (!data->fw)
>>> + return -EPROBE_DEFER;
>>> +
>>
>> I am a bit at loss here (and sorry I didn't bring this up before).
>> How would this ever be possible, given that the driver is registered
>> from the firmware driver ?
>
> Do you refer to the (wrong) return code, the assumption that the parent must be a platform driver or a possible race?
>
The return code is one thing. My question was how the driver would ever be instantiated
with platform_get_drvdata(to_platform_device(dev->parent)) == NULL (but dev->parent != NULL),
so I referred to the race. But, sure, a second question would be how that would indicate
that the parent is not instantiated yet (which by itself seems like an odd question).
Yet another question, as you point out, is why to use platform_get_drvdata(to_platform_device(dev->parent))
instead of dev_get_drvdata(dev->parent).
Guenter
>>
>>> + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
>>> + &data->last_throttled,
>>> + sizeof(data->last_throttled));
>>> + if (ret)
>>> + return -ENODEV;
>>> +
>>> + data->hwmon_dev = devm_hwmon_device_register_with_info(dev, "rpi_volt",
>>> + data,
>>> + &rpi_chip_info,
>>> + NULL);
>>> +
>>> + INIT_DELAYED_WORK(&data->get_values_poll_work, get_values_poll);
>>> + platform_set_drvdata(pdev, data);
>>> +
>>> + if (!PTR_ERR_OR_ZERO(data->hwmon_dev))
>>> + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
>>> +
>>> + return PTR_ERR_OR_ZERO(data->hwmon_dev);
>>> +}
>>> +
>>> +static int rpi_hwmon_remove(struct platform_device *pdev)
>>> +{
>>> + struct rpi_hwmon_data *data = platform_get_drvdata(pdev);
>>> +
>>> + cancel_delayed_work_sync(&data->get_values_poll_work);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static struct platform_driver rpi_hwmon_driver = {
>>> + .probe = rpi_hwmon_probe,
>>> + .remove = rpi_hwmon_remove,
>>> + .driver = {
>>> + .name = "raspberrypi-hwmon",
>>> + },
>>> +};
>>> +module_platform_driver(rpi_hwmon_driver);
>>> +
>>> +MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
>>> +MODULE_DESCRIPTION("Raspberry Pi voltage sensor driver");
>>> +MODULE_LICENSE("GPL v2");
>>>
>>
>
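
The edge detection in rpi_firmware_get_throttled() can be tried outside the
kernel with a small model. The sketch below keeps the bit position from the
patch but replaces the firmware call with a plain argument; uv_update() and
enum uv_event are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define UNDERVOLTAGE_STICKY_BIT (UINT32_C(1) << 16)

enum uv_event {
	UV_NONE,	/* sticky bit unchanged since the last poll */
	UV_DETECTED,	/* bit went 0 -> 1: "Undervoltage detected!" */
	UV_NORMALISED,	/* bit went 1 -> 0: "Voltage normalised" */
};

/*
 * Hypothetical model of one poll: compare the sticky bit of the new
 * value with the stored one, store the new value, report the change.
 */
static enum uv_event uv_update(uint32_t *last_throttled, uint32_t value)
{
	uint32_t new_uv = value & UNDERVOLTAGE_STICKY_BIT;
	uint32_t old_uv = *last_throttled & UNDERVOLTAGE_STICKY_BIT;

	*last_throttled = value;

	if (new_uv == old_uv)
		return UV_NONE;

	return new_uv ? UV_DETECTED : UV_NORMALISED;
}
```

Only a change of the sticky bit produces an event, so a log entry and a sysfs
notification are emitted once per transition rather than on every poll.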
* Re: [PATCH v8 6/6] cpuset: Allow reporting of sched domain generation info
From: Juri Lelli @ 2018-05-22 13:53 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <1526590545-3350-7-git-send-email-longman@redhat.com>
Hi,
On 17/05/18 16:55, Waiman Long wrote:
> This patch enables us to report sched domain generation information.
>
> If DYNAMIC_DEBUG is enabled, issuing the following command
>
> echo "file cpuset.c +p" > /sys/kernel/debug/dynamic_debug/control
>
> and setting loglevel to 8 will allow the kernel to show what scheduling
> domain changes are being made.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index fb8aa82b..8f586e8 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -820,6 +820,12 @@ static int generate_sched_domains(cpumask_var_t **domains,
> }
> BUG_ON(nslot != ndoms);
>
> +#ifdef CONFIG_DEBUG_KERNEL
> + for (i = 0; i < ndoms; i++)
> + pr_debug("generate_sched_domains dom %d: %*pbl\n", i,
> + cpumask_pr_args(doms[i]));
> +#endif
> +
While I'm always in favor of adding debug output, in this case I'm not
sure it's adding much to what we already print when booting with
sched_debug kernel command-line param, e.g.
--->8---
Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-rc5+ ... sched_debug
[...]
smp: Bringing up secondary CPUs ...
x86: Booting SMP configuration:
.... node #0, CPUs: #1 #2 #3 #4 #5
.... node #1, CPUs: #6 #7 #8 #9 #10 #11
smp: Brought up 2 nodes, 12 CPUs
smpboot: Max logical packages: 2
smpboot: Total of 12 processors activated (45636.50 BogoMIPS)
CPU0 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }
domain-1: span=0-11 level=NUMA
groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
CPU1 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }
domain-1: span=0-11 level=NUMA
groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
CPU2 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }
domain-1: span=0-11 level=NUMA
groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
CPU3 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }
domain-1: span=0-11 level=NUMA
groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
CPU4 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }
domain-1: span=0-11 level=NUMA
groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
CPU5 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }
domain-1: span=0-11 level=NUMA
groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
CPU6 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }
domain-1: span=0-11 level=NUMA
groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
CPU7 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }
domain-1: span=0-11 level=NUMA
groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
CPU8 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }
domain-1: span=0-11 level=NUMA
groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
CPU9 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }
domain-1: span=0-11 level=NUMA
groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
CPU10 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }
domain-1: span=0-11 level=NUMA
groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
CPU11 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }
domain-1: span=0-11 level=NUMA
groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
span: 0-11 (max cpu_capacity = 1024)
[...]
generate_sched_domains dom 0: 6-11 <-- this and the one below are what
generate_sched_domains dom 1: 0-5 you are adding
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU2 attaching NULL sched-domain.
CPU3 attaching NULL sched-domain.
CPU4 attaching NULL sched-domain.
CPU5 attaching NULL sched-domain.
CPU6 attaching NULL sched-domain.
CPU7 attaching NULL sched-domain.
CPU8 attaching NULL sched-domain.
CPU9 attaching NULL sched-domain.
CPU10 attaching NULL sched-domain.
CPU11 attaching NULL sched-domain.
CPU6 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }
CPU7 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }
CPU8 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }
CPU9 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }
CPU10 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }
CPU11 attaching sched-domain(s):
domain-0: span=6-11 level=MC
groups: 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }
span: 6-11 (max cpu_capacity = 1024)
CPU0 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }
CPU1 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }
CPU2 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }
CPU3 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }
CPU4 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }
CPU5 attaching sched-domain(s):
domain-0: span=0-5 level=MC
groups: 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }
span: 0-5 (max cpu_capacity = 1024)
--->8---
Do you think there is still a benefit in printing out what
generate_sched_domains does?
Best,
- Juri
* Re: [PATCH RFC V2 2/6] hwmon: Add support for RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 13:51 UTC (permalink / raw)
To: Rob Herring, Guenter Roeck, Eric Anholt, Mark Rutland,
Jonathan Corbet, Jean Delvare
Cc: Scott Branden, Florian Fainelli, linux-rpi-kernel, Phil Elwell,
bcm-kernel-feedback-list, linux-doc, devicetree, linux-hwmon,
Noralf Trønnes, Ray Jui, linux-arm-kernel
In-Reply-To: <90a768aa-ee8c-1050-cf15-60637069dbdb@roeck-us.net>
Hi Guenter,
> Guenter Roeck <linux@roeck-us.net> wrote on 22 May 2018 at 15:41:
>
>
> On 05/22/2018 04:21 AM, Stefan Wahren wrote:
> > Currently there is no easy way to detect undervoltage conditions on a
> > remote Raspberry Pi. This hwmon driver retrieves the state of the
> > undervoltage sensor via mailbox interface. The handling is based on
> > Noralf's modifications to the downstream firmware driver. In case of
> > an undervoltage condition only an entry is written to the kernel log.
> >
> > CC: "Noralf Trønnes" <noralf@tronnes.org>
> > Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
> > ---
> > Documentation/hwmon/raspberrypi-hwmon | 22 +++++
> > drivers/hwmon/Kconfig | 10 ++
> > drivers/hwmon/Makefile | 1 +
> > drivers/hwmon/raspberrypi-hwmon.c | 168 ++++++++++++++++++++++++++++++++++
> > 4 files changed, 201 insertions(+)
> > create mode 100644 Documentation/hwmon/raspberrypi-hwmon
> > create mode 100644 drivers/hwmon/raspberrypi-hwmon.c
> >
> > diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
> > new file mode 100644
> > index 0000000..3c92e2c
> > --- /dev/null
> > +++ b/Documentation/hwmon/raspberrypi-hwmon
> > @@ -0,0 +1,22 @@
> > +Kernel driver raspberrypi-hwmon
> > +===============================
> > +
> > +Supported boards:
> > + * Raspberry Pi A+ (via GPIO on SoC)
> > + * Raspberry Pi B+ (via GPIO on SoC)
> > + * Raspberry Pi 2 B (via GPIO on SoC)
> > + * Raspberry Pi 3 B (via GPIO on port expander)
> > + * Raspberry Pi 3 B+ (via PMIC)
> > +
> > +Author: Stefan Wahren <stefan.wahren@i2se.com>
> > +
> > +Description
> > +-----------
> > +
> > +This driver periodically polls a mailbox property of the VC4 firmware to detect
> > +undervoltage conditions.
> > +
> > +Sysfs entries
> > +-------------
> > +
> > +in0_lcrit_alarm Undervoltage alarm
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index 768aed5..9a5bdb0 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
> > This driver can also be built as a module. If so, the module
> > will be called pwm-fan.
> >
> > +config SENSORS_RASPBERRYPI_HWMON
> > + tristate "Raspberry Pi voltage monitor"
> > + depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
> > + help
> > + If you say yes here you get support for voltage sensor on the
> > + Raspberry Pi.
> > +
> > + This driver can also be built as a module. If so, the module
> > + will be called raspberrypi-hwmon.
> > +
> > config SENSORS_SHT15
> > tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
> > depends on GPIOLIB || COMPILE_TEST
> > diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> > index e7d52a3..a929770 100644
> > --- a/drivers/hwmon/Makefile
> > +++ b/drivers/hwmon/Makefile
> > @@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427) += pc87427.o
> > obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o
> > obj-$(CONFIG_SENSORS_POWR1220) += powr1220.o
> > obj-$(CONFIG_SENSORS_PWM_FAN) += pwm-fan.o
> > +obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON) += raspberrypi-hwmon.o
> > obj-$(CONFIG_SENSORS_S3C) += s3c-hwmon.o
> > obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
> > obj-$(CONFIG_SENSORS_SCH5627) += sch5627.o
> > diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
> > new file mode 100644
> > index 0000000..6233e84
> > --- /dev/null
> > +++ b/drivers/hwmon/raspberrypi-hwmon.c
> > @@ -0,0 +1,168 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Raspberry Pi voltage sensor driver
> > + *
> > + * Based on firmware/raspberrypi.c by Noralf Trønnes
> > + *
> > + * Copyright (C) 2018 Stefan Wahren <stefan.wahren@i2se.com>
> > + */
> > +#include <linux/device.h>
> > +#include <linux/err.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/module.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/slab.h>
> > +#include <linux/workqueue.h>
> > +#include <soc/bcm2835/raspberrypi-firmware.h>
> > +
> > +#define UNDERVOLTAGE_STICKY_BIT BIT(16)
> > +
> > +struct rpi_hwmon_data {
> > + struct device *hwmon_dev;
> > + struct rpi_firmware *fw;
> > + u32 last_throttled;
> > + struct delayed_work get_values_poll_work;
> > +};
> > +
> > +static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
> > +{
> > + u32 new_uv, old_uv, value;
> > + int ret;
> > +
> > + /* Request firmware to clear sticky bits */
> > + value = 0xffff;
> > +
> > + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
> > + &value, sizeof(value));
> > + if (ret) {
> > + dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
> > + ret);
> > + return;
> > + }
> > +
> > + new_uv = value & UNDERVOLTAGE_STICKY_BIT;
> > + old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
> > + data->last_throttled = value;
> > +
> > + if (new_uv == old_uv)
> > + return;
> > +
> > + if (new_uv)
> > + dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
> > + else
> > + dev_info(data->hwmon_dev, "Voltage normalised\n");
> > +
> > + sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
> > +}
> > +
> > +static void get_values_poll(struct work_struct *work)
> > +{
> > + struct rpi_hwmon_data *data;
> > +
> > + data = container_of(work, struct rpi_hwmon_data,
> > + get_values_poll_work.work);
> > +
> > + rpi_firmware_get_throttled(data);
> > +
> > + /*
> > + * We can't run faster than the sticky shift (100ms) since we get
> > + * flipping in the sticky bits that are cleared.
> > + */
> > + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
> > +}
> > +
> > +static int rpi_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct rpi_hwmon_data *data = dev_get_drvdata(dev);
> > +
> > + *val = !!(data->last_throttled & UNDERVOLTAGE_STICKY_BIT);
> > + return 0;
> > +}
> > +
> > +static umode_t rpi_is_visible(const void *_data, enum hwmon_sensor_types type,
> > + u32 attr, int channel)
> > +{
> > + return 0444;
> > +}
> > +
> > +static const u32 rpi_in_config[] = {
> > + HWMON_I_LCRIT_ALARM,
> > + 0
> > +};
> > +
> > +static const struct hwmon_channel_info rpi_in = {
> > + .type = hwmon_in,
> > + .config = rpi_in_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *rpi_info[] = {
> > + &rpi_in,
> > + NULL
> > +};
> > +
> > +static const struct hwmon_ops rpi_hwmon_ops = {
> > + .is_visible = rpi_is_visible,
> > + .read = rpi_read,
> > +};
> > +
> > +static const struct hwmon_chip_info rpi_chip_info = {
> > + .ops = &rpi_hwmon_ops,
> > + .info = rpi_info,
> > +};
> > +
> > +static int rpi_hwmon_probe(struct platform_device *pdev)
> > +{
> > + struct device *dev = &pdev->dev;
> > + struct rpi_hwmon_data *data;
> > + int ret;
> > +
> > + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > + if (!data)
> > + return -ENOMEM;
> > +
> > + data->fw = platform_get_drvdata(to_platform_device(dev->parent));
> > + if (!data->fw)
> > + return -EPROBE_DEFER;
> > +
>
> I am a bit at a loss here (and sorry I didn't bring this up before).
> How would this ever be possible, given that the driver is registered
> from the firmware driver?
Are you referring to the (wrong) return code, the assumption that the parent must be a platform driver, or a possible race?
>
> > + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
> > + &data->last_throttled,
> > + sizeof(data->last_throttled));
> > + if (ret)
> > + return -ENODEV;
> > +
> > + data->hwmon_dev = devm_hwmon_device_register_with_info(dev, "rpi_volt",
> > + data,
> > + &rpi_chip_info,
> > + NULL);
> > +
> > + INIT_DELAYED_WORK(&data->get_values_poll_work, get_values_poll);
> > + platform_set_drvdata(pdev, data);
> > +
> > + if (!PTR_ERR_OR_ZERO(data->hwmon_dev))
> > + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
> > +
> > + return PTR_ERR_OR_ZERO(data->hwmon_dev);
> > +}
> > +
> > +static int rpi_hwmon_remove(struct platform_device *pdev)
> > +{
> > + struct rpi_hwmon_data *data = platform_get_drvdata(pdev);
> > +
> > + cancel_delayed_work_sync(&data->get_values_poll_work);
> > +
> > + return 0;
> > +}
> > +
> > +static struct platform_driver rpi_hwmon_driver = {
> > + .probe = rpi_hwmon_probe,
> > + .remove = rpi_hwmon_remove,
> > + .driver = {
> > + .name = "raspberrypi-hwmon",
> > + },
> > +};
> > +module_platform_driver(rpi_hwmon_driver);
> > +
> > +MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
> > +MODULE_DESCRIPTION("Raspberry Pi voltage sensor driver");
> > +MODULE_LICENSE("GPL v2");
> >
>
* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: Michal Hocko @ 2018-05-22 13:51 UTC (permalink / raw)
To: TSUKADA Koutaro
Cc: Johannes Weiner, Vladimir Davydov, Jonathan Corbet,
Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
David Rientjes, Mike Kravetz, Aneesh Kumar K.V, Naoya Horiguchi,
Anshuman Khandual, Marc-Andre Lureau, Punit Agrawal, Dan Williams,
Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
cgroups
In-Reply-To: <e863529b-7ce5-4fbe-8cff-581b5789a5f9@ascade.co.jp>
On Fri 18-05-18 13:27:27, TSUKADA Koutaro wrote:
> Thanks to Mike Kravetz for comment on the previous version patch.
I am sorry that I didn't join the discussion for the previous version
but time just didn't allow that. So sorry if I am repeating something
already sorted out.
> The purpose of this patch-set is to make it possible to control whether or
> not to charge surplus hugetlb pages obtained by overcommitting to memory
> cgroup. In the future, I am trying to accomplish limiting the memory usage
> of applications that use both normal pages and hugetlb pages by the memory
> cgroup(not use the hugetlb cgroup).
There was a deliberate decision to keep hugetlb and "normal" memory
cgroup controllers separate. Mostly because hugetlb memory is an
artificial memory subsystem on its own and it doesn't fit into the rest
of memcg accounted memory very well. I believe we want to keep that
status quo.
> Applications that use shared libraries like libhugetlbfs.so use both normal
> pages and hugetlb pages, but we do not know how much to use each. Please
> suppose you want to manage the memory usage of such applications by cgroup
> How do you set the memory cgroup and hugetlb cgroup limit when you want to
> limit memory usage to 10GB?
Well, such a use case already requires explicit configuration, either
by using special wrappers or by modifying the code. So I would argue that
you have quite good knowledge of the setup. If you need greater
flexibility then just do not use hugetlb at all and rely on THP.
[...]
> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
> struct hstate. If it is true, it charges to the memory cgroup to which the
> task that obtained surplus hugepages belongs. If it is false, do nothing as
> before, and the default value is false. The charge_surplus_huge_pages can
> be controlled via procfs or sysfs interfaces.
I do not really think this is a good idea. We really do not want to make
the current hugetlb code more complex than it already is. The current
hugetlb cgroup controller is simple and works at least somehow. I would
not add more on top unless there is a _really_ strong use case behind it.
Please make sure to describe such a use case in detail before we even
start considering the code.
> Since THP is very effective in environments with kernel page size of 4KB,
> such as x86, there is no reason to positively use HugeTLBfs, so I think
> that there is no situation to enable charge_surplus_huge_pages. However, in
> some distributions such as arm64, the page size of the kernel is 64KB, and
> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
> may support multiple huge page sizes, and in such a special environment
> there is a desire to use HugeTLBfs.
Well, then I would argue that you shouldn't use 64kB pages for your
setup or allow THP for smaller sizes. Really, hugetlb pages are by no
means a substitute here.
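
For reference, the 512MB figure quoted above follows from simple page-table
arithmetic. A minimal sketch, assuming 8-byte page-table entries (as on
x86-64 and arm64) and a PMD-level huge page; pmd_huge_page_size() is a
hypothetical helper for illustration, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every page-table entry is 8 bytes wide. */
#define PTE_SIZE 8ULL

/*
 * Size in bytes of a PMD-level huge page.
 *
 * One last-level page table occupies exactly one base page, so it holds
 * base_page_size / PTE_SIZE entries. A PMD-level mapping replaces one such
 * table and therefore covers that many base pages.
 */
static uint64_t pmd_huge_page_size(uint64_t base_page_size)
{
	assert(base_page_size != 0 && base_page_size % PTE_SIZE == 0);
	return base_page_size * (base_page_size / PTE_SIZE);
}
```

Under these assumptions a 4KB base page gives 4096 * 512 = 2MB, while a 64KB
base page gives 65536 * 8192 = 512MB, which matches the THP size mentioned
in the quoted text.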
--
Michal Hocko
SUSE Labs
* Re: [PATCH RFC V2 2/6] hwmon: Add support for RPi voltage sensor
From: Guenter Roeck @ 2018-05-22 13:41 UTC (permalink / raw)
To: Stefan Wahren, Rob Herring, Mark Rutland, Jean Delvare,
Jonathan Corbet, Eric Anholt
Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
linux-rpi-kernel, linux-hwmon, linux-doc, Noralf Trønnes
In-Reply-To: <1526988112-4021-3-git-send-email-stefan.wahren@i2se.com>
On 05/22/2018 04:21 AM, Stefan Wahren wrote:
> Currently there is no easy way to detect undervoltage conditions on a
> remote Raspberry Pi. This hwmon driver retrieves the state of the
> undervoltage sensor via mailbox interface. The handling is based on
> Noralf's modifications to the downstream firmware driver. In case of
> an undervoltage condition only an entry is written to the kernel log.
>
> CC: "Noralf Trønnes" <noralf@tronnes.org>
> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
> ---
> Documentation/hwmon/raspberrypi-hwmon | 22 +++++
> drivers/hwmon/Kconfig | 10 ++
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/raspberrypi-hwmon.c | 168 ++++++++++++++++++++++++++++++++++
> 4 files changed, 201 insertions(+)
> create mode 100644 Documentation/hwmon/raspberrypi-hwmon
> create mode 100644 drivers/hwmon/raspberrypi-hwmon.c
>
> diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
> new file mode 100644
> index 0000000..3c92e2c
> --- /dev/null
> +++ b/Documentation/hwmon/raspberrypi-hwmon
> @@ -0,0 +1,22 @@
> +Kernel driver raspberrypi-hwmon
> +===============================
> +
> +Supported boards:
> + * Raspberry Pi A+ (via GPIO on SoC)
> + * Raspberry Pi B+ (via GPIO on SoC)
> + * Raspberry Pi 2 B (via GPIO on SoC)
> + * Raspberry Pi 3 B (via GPIO on port expander)
> + * Raspberry Pi 3 B+ (via PMIC)
> +
> +Author: Stefan Wahren <stefan.wahren@i2se.com>
> +
> +Description
> +-----------
> +
> +This driver periodically polls a mailbox property of the VC4 firmware to detect
> +undervoltage conditions.
> +
> +Sysfs entries
> +-------------
> +
> +in0_lcrit_alarm Undervoltage alarm
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 768aed5..9a5bdb0 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
> This driver can also be built as a module. If so, the module
> will be called pwm-fan.
>
> +config SENSORS_RASPBERRYPI_HWMON
> + tristate "Raspberry Pi voltage monitor"
> + depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
> + help
> + If you say yes here you get support for voltage sensor on the
> + Raspberry Pi.
> +
> + This driver can also be built as a module. If so, the module
> + will be called raspberrypi-hwmon.
> +
> config SENSORS_SHT15
> tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
> depends on GPIOLIB || COMPILE_TEST
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index e7d52a3..a929770 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427) += pc87427.o
> obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o
> obj-$(CONFIG_SENSORS_POWR1220) += powr1220.o
> obj-$(CONFIG_SENSORS_PWM_FAN) += pwm-fan.o
> +obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON) += raspberrypi-hwmon.o
> obj-$(CONFIG_SENSORS_S3C) += s3c-hwmon.o
> obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
> obj-$(CONFIG_SENSORS_SCH5627) += sch5627.o
> diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
> new file mode 100644
> index 0000000..6233e84
> --- /dev/null
> +++ b/drivers/hwmon/raspberrypi-hwmon.c
> @@ -0,0 +1,168 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Raspberry Pi voltage sensor driver
> + *
> + * Based on firmware/raspberrypi.c by Noralf Trønnes
> + *
> + * Copyright (C) 2018 Stefan Wahren <stefan.wahren@i2se.com>
> + */
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/hwmon.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +#include <soc/bcm2835/raspberrypi-firmware.h>
> +
> +#define UNDERVOLTAGE_STICKY_BIT BIT(16)
> +
> +struct rpi_hwmon_data {
> + struct device *hwmon_dev;
> + struct rpi_firmware *fw;
> + u32 last_throttled;
> + struct delayed_work get_values_poll_work;
> +};
> +
> +static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
> +{
> + u32 new_uv, old_uv, value;
> + int ret;
> +
> + /* Request firmware to clear sticky bits */
> + value = 0xffff;
> +
> + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
> + &value, sizeof(value));
> + if (ret) {
> + dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
> + ret);
> + return;
> + }
> +
> + new_uv = value & UNDERVOLTAGE_STICKY_BIT;
> + old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
> + data->last_throttled = value;
> +
> + if (new_uv == old_uv)
> + return;
> +
> + if (new_uv)
> + dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
> + else
> + dev_info(data->hwmon_dev, "Voltage normalised\n");
> +
> + sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
> +}
> +
> +static void get_values_poll(struct work_struct *work)
> +{
> + struct rpi_hwmon_data *data;
> +
> + data = container_of(work, struct rpi_hwmon_data,
> + get_values_poll_work.work);
> +
> + rpi_firmware_get_throttled(data);
> +
> + /*
> + * We can't run faster than the sticky shift (100ms) since we get
> + * flipping in the sticky bits that are cleared.
> + */
> + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
> +}
> +
> +static int rpi_read(struct device *dev, enum hwmon_sensor_types type,
> + u32 attr, int channel, long *val)
> +{
> + struct rpi_hwmon_data *data = dev_get_drvdata(dev);
> +
> + *val = !!(data->last_throttled & UNDERVOLTAGE_STICKY_BIT);
> + return 0;
> +}
> +
> +static umode_t rpi_is_visible(const void *_data, enum hwmon_sensor_types type,
> + u32 attr, int channel)
> +{
> + return 0444;
> +}
> +
> +static const u32 rpi_in_config[] = {
> + HWMON_I_LCRIT_ALARM,
> + 0
> +};
> +
> +static const struct hwmon_channel_info rpi_in = {
> + .type = hwmon_in,
> + .config = rpi_in_config,
> +};
> +
> +static const struct hwmon_channel_info *rpi_info[] = {
> + &rpi_in,
> + NULL
> +};
> +
> +static const struct hwmon_ops rpi_hwmon_ops = {
> + .is_visible = rpi_is_visible,
> + .read = rpi_read,
> +};
> +
> +static const struct hwmon_chip_info rpi_chip_info = {
> + .ops = &rpi_hwmon_ops,
> + .info = rpi_info,
> +};
> +
> +static int rpi_hwmon_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct rpi_hwmon_data *data;
> + int ret;
> +
> + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + data->fw = platform_get_drvdata(to_platform_device(dev->parent));
> + if (!data->fw)
> + return -EPROBE_DEFER;
> +
I am a bit at a loss here (and sorry I didn't bring this up before).
How would this ever be possible, given that the driver is registered
from the firmware driver?
> + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
> + &data->last_throttled,
> + sizeof(data->last_throttled));
> + if (ret)
> + return -ENODEV;
> +
> + data->hwmon_dev = devm_hwmon_device_register_with_info(dev, "rpi_volt",
> + data,
> + &rpi_chip_info,
> + NULL);
> +
> + INIT_DELAYED_WORK(&data->get_values_poll_work, get_values_poll);
> + platform_set_drvdata(pdev, data);
> +
> + if (!PTR_ERR_OR_ZERO(data->hwmon_dev))
> + schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
> +
> + return PTR_ERR_OR_ZERO(data->hwmon_dev);
> +}
> +
> +static int rpi_hwmon_remove(struct platform_device *pdev)
> +{
> + struct rpi_hwmon_data *data = platform_get_drvdata(pdev);
> +
> + cancel_delayed_work_sync(&data->get_values_poll_work);
> +
> + return 0;
> +}
> +
> +static struct platform_driver rpi_hwmon_driver = {
> + .probe = rpi_hwmon_probe,
> + .remove = rpi_hwmon_remove,
> + .driver = {
> + .name = "raspberrypi-hwmon",
> + },
> +};
> +module_platform_driver(rpi_hwmon_driver);
> +
> +MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
> +MODULE_DESCRIPTION("Raspberry Pi voltage sensor driver");
> +MODULE_LICENSE("GPL v2");
>
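
The change detection in rpi_firmware_get_throttled() above can be tried out
in userspace. A minimal sketch, assuming only the bit layout shown in the
patch (bit 16 is the sticky undervoltage flag); uv_update(), struct uv_state
and enum uv_event are hypothetical names used for illustration, and the
firmware mailbox call is replaced by a plain function argument:

```c
#include <assert.h>
#include <stdint.h>

/* Same bit as UNDERVOLTAGE_STICKY_BIT in the patch: BIT(16). */
#define UNDERVOLTAGE_STICKY_BIT (1u << 16)

/* Hypothetical event type standing in for the dev_crit()/dev_info() calls. */
enum uv_event { UV_NONE, UV_DETECTED, UV_NORMALISED };

/* Hypothetical state holder; mirrors the last_throttled field of the patch. */
struct uv_state {
	uint32_t last_throttled;
};

/*
 * Compare the sticky undervoltage bit of the new firmware value against the
 * previously stored one and report an event only when the bit changes.
 */
static enum uv_event uv_update(struct uv_state *st, uint32_t value)
{
	uint32_t new_uv, old_uv;

	assert(st);

	new_uv = value & UNDERVOLTAGE_STICKY_BIT;
	old_uv = st->last_throttled & UNDERVOLTAGE_STICKY_BIT;
	st->last_throttled = value;

	if (new_uv == old_uv)
		return UV_NONE;

	return new_uv ? UV_DETECTED : UV_NORMALISED;
}
```

Feeding it 0x10000, 0x10005, 0x5 and 0 in that order yields UV_DETECTED,
UV_NONE, UV_NORMALISED and UV_NONE: only a change of bit 16 produces an
event, the other throttling bits are ignored.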
* Re: [PATCH v8 2/6] cpuset: Add new v2 cpuset.sched.domain flag
From: Waiman Long @ 2018-05-22 13:20 UTC (permalink / raw)
To: Juri Lelli
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <20180522125750.GA31040@localhost.localdomain>
On 05/22/2018 08:57 AM, Juri Lelli wrote:
> Hi,
>
> On 17/05/18 16:55, Waiman Long wrote:
>
> [...]
>
>> /**
>> + * update_isolated_cpumask - update the isolated_cpus mask of parent cpuset
>> + * @cpuset: The cpuset that requests CPU isolation
>> + * @oldmask: The old isolated cpumask to be removed from the parent
>> + * @newmask: The new isolated cpumask to be added to the parent
>> + * Return: 0 if successful, an error code otherwise
>> + *
>> + * Changes to the isolated CPUs are not allowed if any of CPUs changing
>> + * state are in any of the child cpusets of the parent except the requesting
>> + * child.
>> + *
>> + * If the sched_domain flag changes, either the oldmask (0=>1) or the
>> + * newmask (1=>0) will be NULL.
>> + *
>> + * Called with cpuset_mutex held.
>> + */
>> +static int update_isolated_cpumask(struct cpuset *cpuset,
>> + struct cpumask *oldmask, struct cpumask *newmask)
>> +{
>> + int retval;
>> + int adding, deleting;
>> + cpumask_var_t addmask, delmask;
>> + struct cpuset *parent = parent_cs(cpuset);
>> + struct cpuset *sibling;
>> + struct cgroup_subsys_state *pos_css;
>> + int old_count = parent->isolation_count;
>> + bool dying = cpuset->css.flags & CSS_DYING;
>> +
>> + /*
>> + * Parent must be a scheduling domain with non-empty cpus_allowed.
>> + */
>> + if (!is_sched_domain(parent) || cpumask_empty(parent->cpus_allowed))
>> + return -EINVAL;
>> +
>> + /*
>> + * The oldmask, if present, must be a subset of parent's isolated
>> + * CPUs.
>> + */
>> + if (oldmask && !cpumask_empty(oldmask) && (!parent->isolation_count ||
>> + !cpumask_subset(oldmask, parent->isolated_cpus))) {
>> + WARN_ON_ONCE(1);
>> + return -EINVAL;
>> + }
>> +
>> + /*
>> + * A sched_domain state change is not allowed if there are
>> + * online children and the cpuset is not dying.
>> + */
>> + if (!dying && (!oldmask || !newmask) &&
>> + css_has_online_children(&cpuset->css))
>> + return -EBUSY;
>> +
>> + if (!zalloc_cpumask_var(&addmask, GFP_KERNEL))
>> + return -ENOMEM;
>> + if (!zalloc_cpumask_var(&delmask, GFP_KERNEL)) {
>> + free_cpumask_var(addmask);
>> + return -ENOMEM;
>> + }
>> +
>> + if (!old_count) {
>> + if (!zalloc_cpumask_var(&parent->isolated_cpus, GFP_KERNEL)) {
>> + retval = -ENOMEM;
>> + goto out;
>> + }
>> + old_count = 1;
>> + }
>> +
>> + retval = -EBUSY;
>> + adding = deleting = false;
>> + if (newmask)
>> + cpumask_copy(addmask, newmask);
>> + if (oldmask)
>> + deleting = cpumask_andnot(delmask, oldmask, addmask);
>> + if (newmask)
>> + adding = cpumask_andnot(addmask, newmask, delmask);
>> +
>> + if (!adding && !deleting)
>> + goto out_ok;
>> +
>> + /*
>> + * The cpus to be added must be in the parent's effective_cpus mask
>> + * but not in the isolated_cpus mask.
>> + */
>> + if (!cpumask_subset(addmask, parent->effective_cpus))
>> + goto out;
>> + if (parent->isolation_count &&
>> + cpumask_intersects(parent->isolated_cpus, addmask))
>> + goto out;
>> +
>> + /*
>> + * Check if any CPUs in addmask or delmask are in a sibling cpuset.
>> + * An empty sibling cpus_allowed means it is the same as parent's
>> + * effective_cpus. This checking is skipped if the cpuset is dying.
>> + */
>> + if (dying)
>> + goto updated_isolated_cpus;
>> +
>> + cpuset_for_each_child(sibling, pos_css, parent) {
>> + if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
>> + continue;
>> + if (cpumask_empty(sibling->cpus_allowed))
>> + goto out;
>> + if (adding &&
>> + cpumask_intersects(sibling->cpus_allowed, addmask))
>> + goto out;
>> + if (deleting &&
>> + cpumask_intersects(sibling->cpus_allowed, delmask))
>> + goto out;
>> + }
> Just got the below by echoing 1 into cpuset.sched.domain of a sibling with
> "isolated" cpuset.cpus. Guess you are missing proper locking about here
> above.
>
> --->8---
> [ 7509.905005] =============================
> [ 7509.905009] WARNING: suspicious RCU usage
> [ 7509.905014] 4.17.0-rc5+ #11 Not tainted
> [ 7509.905017] -----------------------------
> [ 7509.905023] /home/juri/work/kernel/linux/kernel/cgroup/cgroup.c:3826 cgroup_mutex or RCU read lock required!
> [ 7509.905026]
> other info that might help us debug this:
>
> [ 7509.905031]
> rcu_scheduler_active = 2, debug_locks = 1
> [ 7509.905036] 4 locks held by bash/1480:
> [ 7509.905039] #0: 00000000bf288709 (sb_writers#6){.+.+}, at: vfs_write+0x18a/0x1b0
> [ 7509.905072] #1: 00000000ebf23fc9 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
> [ 7509.905098] #2: 00000000de7c626e (kn->count#302){.+.+}, at: kernfs_fop_write+0xeb/0x1a0
> [ 7509.905124] #3: 00000000a6a2bd9f (cpuset_mutex){+.+.}, at: cpuset_write_u64+0x23/0x140
> [ 7509.905149]
> stack backtrace:
> [ 7509.905156] CPU: 6 PID: 1480 Comm: bash Not tainted 4.17.0-rc5+ #11
> [ 7509.905160] Hardware name: LENOVO 30B6S2F900/1030, BIOS S01KT56A 01/15/2018
> [ 7509.905164] Call Trace:
> [ 7509.905176] dump_stack+0x85/0xcb
> [ 7509.905187] css_next_child+0x90/0xd0
> [ 7509.905195] update_isolated_cpumask+0x18f/0x2e0
> [ 7509.905208] update_flag+0x1f3/0x210
> [ 7509.905220] cpuset_write_u64+0xff/0x140
> [ 7509.905230] cgroup_file_write+0x178/0x230
> [ 7509.905244] kernfs_fop_write+0x113/0x1a0
> [ 7509.905254] __vfs_write+0x36/0x180
> [ 7509.905264] ? rcu_read_lock_sched_held+0x6b/0x80
> [ 7509.905270] ? rcu_sync_lockdep_assert+0x2e/0x60
> [ 7509.905278] ? __sb_start_write+0x13e/0x1a0
> [ 7509.905283] ? vfs_write+0x18a/0x1b0
> [ 7509.905293] vfs_write+0xc1/0x1b0
> [ 7509.905302] ksys_write+0x55/0xc0
> [ 7509.905317] do_syscall_64+0x60/0x200
> [ 7509.905327] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 7509.905333] RIP: 0033:0x7fee4fdfe414
> [ 7509.905338] RSP: 002b:00007fff364a80a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 7509.905346] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fee4fdfe414
> [ 7509.905350] RDX: 0000000000000002 RSI: 000055eb12f93740 RDI: 0000000000000001
> [ 7509.905354] RBP: 000055eb12f93740 R08: 000000000000000a R09: 00007fff364a7c30
> [ 7509.905358] R10: 000000000000000a R11: 0000000000000246 R12: 00007fee500cd760
> [ 7509.905361] R13: 0000000000000002 R14: 00007fee500c8760 R15: 0000000000000002
> --->8---
>
> Best,
>
> - Juri
>
Thanks for the testing. I will fix that in the next version.
-Longman
* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: TSUKADA Koutaro @ 2018-05-22 13:04 UTC (permalink / raw)
To: Mike Kravetz
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Jonathan Corbet,
Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
David Rientjes, Aneesh Kumar K.V, Naoya Horiguchi,
Anshuman Khandual, Marc-Andre Lureau, Punit Agrawal, Dan Williams,
Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
cgroups
In-Reply-To: <240f1b14-ed7d-4983-6c52-be4899d4caa5@oracle.com>
On 2018/05/22 3:07, Mike Kravetz wrote:
> On 05/17/2018 09:27 PM, TSUKADA Koutaro wrote:
>> Thanks to Mike Kravetz for comment on the previous version patch.
>>
>> The purpose of this patch-set is to make it possible to control whether or
>> not to charge surplus hugetlb pages obtained by overcommitting to memory
>> cgroup. In the future, I am trying to accomplish limiting the memory usage
>> of applications that use both normal pages and hugetlb pages by the memory
>> cgroup(not use the hugetlb cgroup).
>>
>> Applications that use shared libraries like libhugetlbfs.so use both normal
>> pages and hugetlb pages, but we do not know how much to use each. Please
>> suppose you want to manage the memory usage of such applications by cgroup
>> How do you set the memory cgroup and hugetlb cgroup limit when you want to
>> limit memory usage to 10GB?
>>
>> If you set a limit of 10GB for each, the user can use a total of 20GB of
>> memory and can not limit it well. Since it is difficult to estimate the
>> ratio used by user of normal pages and hugetlb pages, setting limits of 2GB
>> to memory cgroup and 8GB to hugetlb cgroup is not very good idea. In such a
>> case, I thought that by using my patch-set, we could manage resources just
>> by setting 10GB as the limit of memory cgroup (there is no limit to hugetlb
>> cgroup).
>>
>> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
>> struct hstate. If it is true, it charges to the memory cgroup to which the
>> task that obtained surplus hugepages belongs. If it is false, do nothing as
>> before, and the default value is false. The charge_surplus_huge_pages can
>> be controlled procfs or sysfs interfaces.
>>
>> Since THP is very effective in environments with kernel page size of 4KB,
>> such as x86, there is no reason to positively use HugeTLBfs, so I think
>> that there is no situation to enable charge_surplus_huge_pages. However, in
>> some distributions such as arm64, the page size of the kernel is 64KB, and
>> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
>> may support multiple huge page sizes, and in such a special environment
>> there is a desire to use HugeTLBfs.
>
> One of the basic questions/concerns I have is accounting for surplus huge
> pages in the default memory resource controller. The existing hugetlb
> resource controller already takes hugetlbfs huge pages into account,
> including surplus pages. This series would allow surplus pages to be
> accounted for in the default memory controller, or the hugetlb controller
> or both.
>
> I understand that current mechanisms do not meet the needs of the above
> use case. The question is whether this is an appropriate way to approach
> the issue. My cgroup experience and knowledge is extremely limited, but
> it does not appear that any other resource can be controlled by multiple
> controllers. Therefore, I am concerned that this may be going against
> basic cgroup design philosophy.
Thank you for your feedback.
That makes sense: in v2, surplus hugepages are charged to both memcg and the
hugetlb cgroup, which may be contrary to the cgroup design philosophy.
Based on the above advice, I have considered the following improvement.
What do you think of it?
The 'charge_surplus_hugepages' option of the v2 patch-set switched
"whether to charge memcg in addition to the hugetlb cgroup", and it will be
dropped. Instead, I will add an option that switches the charge "to memcg
only, instead of the hugetlb cgroup". It is called 'surplus_charge_to_memcg'.
The surplus_charge_to_memcg option is created per hugetlb cgroup.
If it is false (the default), the cgroup charged for each page type
is the same as in the current kernel. If it is set to true, the hugetlb
cgroup stops accounting for surplus hugepages, and memcg starts accounting
for them instead.
A table showing which cgroups are charged:
page types          | current  v2(off)  v2(on)  v3(off)  v3(on)
-----------------------------------------------------------------
normal + THP        | m        m        m       m        m
hugetlb(persistent) | h        h        h       h        h
hugetlb(surplus)    | h        h        m+h     h        m
-----------------------------------------------------------------
v2: charge_surplus_hugepages option
v3: next version, surplus_charge_to_memcg option
m: memory cgroup
h: hugetlb cgroup
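The table can also be read as a small decision function. The following is
only a userspace sketch for illustration, not code from the patch-set: the
enum, macro and function names are hypothetical, and only the option names
charge_surplus_hugepages and surplus_charge_to_memcg come from this thread.

```c
#include <stdbool.h>

/* Page categories, one per row of the table. */
enum page_type {
	PAGE_NORMAL_OR_THP,
	PAGE_HUGETLB_PERSISTENT,
	PAGE_HUGETLB_SURPLUS,
};

/* Controllers that may be charged; the values can be OR-ed together. */
#define CHARGE_MEMCG	(1u << 0)	/* "m" in the table */
#define CHARGE_HUGETLB	(1u << 1)	/* "h" in the table */

/* v2: the option adds a memcg charge on top of the hugetlb cgroup charge. */
static unsigned int charge_targets_v2(enum page_type type,
				      bool charge_surplus_hugepages)
{
	if (type == PAGE_NORMAL_OR_THP)
		return CHARGE_MEMCG;
	if (type == PAGE_HUGETLB_SURPLUS && charge_surplus_hugepages)
		return CHARGE_MEMCG | CHARGE_HUGETLB;
	return CHARGE_HUGETLB;
}

/* v3 proposal: the option moves the surplus charge from hugetlb cgroup to memcg. */
static unsigned int charge_targets_v3(enum page_type type,
				      bool surplus_charge_to_memcg)
{
	if (type == PAGE_NORMAL_OR_THP)
		return CHARGE_MEMCG;
	if (type == PAGE_HUGETLB_SURPLUS && surplus_charge_to_memcg)
		return CHARGE_MEMCG;
	return CHARGE_HUGETLB;
}
```

With the option off, both functions return the same result as the "current"
column, so only the surplus row with the option on differs between v2 and v3.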
>
> It would be good to get comments from people more cgroup knowledgeable,
> and especially from those involved in the decision to do separate hugetlb
> control.
>
I looked through the commit log of mm/hugetlb_cgroup.c, but surplus
hugepages do not seem to have been given special consideration. Later, I
will send a mail to the hugetlb cgroup's committer to ask how charging of
surplus hugepages was intended to work.
--
Thanks,
Tsukada
* Re: [PATCH v8 2/6] cpuset: Add new v2 cpuset.sched.domain flag
From: Juri Lelli @ 2018-05-22 12:57 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <1526590545-3350-3-git-send-email-longman@redhat.com>
Hi,
On 17/05/18 16:55, Waiman Long wrote:
[...]
> /**
> + * update_isolated_cpumask - update the isolated_cpus mask of parent cpuset
> + * @cpuset: The cpuset that requests CPU isolation
> + * @oldmask: The old isolated cpumask to be removed from the parent
> + * @newmask: The new isolated cpumask to be added to the parent
> + * Return: 0 if successful, an error code otherwise
> + *
> + * Changes to the isolated CPUs are not allowed if any of CPUs changing
> + * state are in any of the child cpusets of the parent except the requesting
> + * child.
> + *
> + * If the sched_domain flag changes, either the oldmask (0=>1) or the
> + * newmask (1=>0) will be NULL.
> + *
> + * Called with cpuset_mutex held.
> + */
> +static int update_isolated_cpumask(struct cpuset *cpuset,
> + struct cpumask *oldmask, struct cpumask *newmask)
> +{
> + int retval;
> + int adding, deleting;
> + cpumask_var_t addmask, delmask;
> + struct cpuset *parent = parent_cs(cpuset);
> + struct cpuset *sibling;
> + struct cgroup_subsys_state *pos_css;
> + int old_count = parent->isolation_count;
> + bool dying = cpuset->css.flags & CSS_DYING;
> +
> + /*
> + * Parent must be a scheduling domain with non-empty cpus_allowed.
> + */
> + if (!is_sched_domain(parent) || cpumask_empty(parent->cpus_allowed))
> + return -EINVAL;
> +
> + /*
> + * The oldmask, if present, must be a subset of parent's isolated
> + * CPUs.
> + */
> + if (oldmask && !cpumask_empty(oldmask) && (!parent->isolation_count ||
> + !cpumask_subset(oldmask, parent->isolated_cpus))) {
> + WARN_ON_ONCE(1);
> + return -EINVAL;
> + }
> +
> + /*
> + * A sched_domain state change is not allowed if there are
> + * online children and the cpuset is not dying.
> + */
> + if (!dying && (!oldmask || !newmask) &&
> + css_has_online_children(&cpuset->css))
> + return -EBUSY;
> +
> + if (!zalloc_cpumask_var(&addmask, GFP_KERNEL))
> + return -ENOMEM;
> + if (!zalloc_cpumask_var(&delmask, GFP_KERNEL)) {
> + free_cpumask_var(addmask);
> + return -ENOMEM;
> + }
> +
> + if (!old_count) {
> + if (!zalloc_cpumask_var(&parent->isolated_cpus, GFP_KERNEL)) {
> + retval = -ENOMEM;
> + goto out;
> + }
> + old_count = 1;
> + }
> +
> + retval = -EBUSY;
> + adding = deleting = false;
> + if (newmask)
> + cpumask_copy(addmask, newmask);
> + if (oldmask)
> + deleting = cpumask_andnot(delmask, oldmask, addmask);
> + if (newmask)
> + adding = cpumask_andnot(addmask, newmask, delmask);
> +
> + if (!adding && !deleting)
> + goto out_ok;
> +
> + /*
> + * The cpus to be added must be in the parent's effective_cpus mask
> + * but not in the isolated_cpus mask.
> + */
> + if (!cpumask_subset(addmask, parent->effective_cpus))
> + goto out;
> + if (parent->isolation_count &&
> + cpumask_intersects(parent->isolated_cpus, addmask))
> + goto out;
> +
> + /*
> + * Check if any CPUs in addmask or delmask are in a sibling cpuset.
> + * An empty sibling cpus_allowed means it is the same as parent's
> + * effective_cpus. This checking is skipped if the cpuset is dying.
> + */
> + if (dying)
> + goto updated_isolated_cpus;
> +
> + cpuset_for_each_child(sibling, pos_css, parent) {
> + if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
> + continue;
> + if (cpumask_empty(sibling->cpus_allowed))
> + goto out;
> + if (adding &&
> + cpumask_intersects(sibling->cpus_allowed, addmask))
> + goto out;
> + if (deleting &&
> + cpumask_intersects(sibling->cpus_allowed, delmask))
> + goto out;
> + }
I just got the splat below by echoing 1 into cpuset.sched.domain of a sibling
with "isolated" cpuset.cpus. I guess proper locking is missing around the
child iteration above.
--->8---
[ 7509.905005] =============================
[ 7509.905009] WARNING: suspicious RCU usage
[ 7509.905014] 4.17.0-rc5+ #11 Not tainted
[ 7509.905017] -----------------------------
[ 7509.905023] /home/juri/work/kernel/linux/kernel/cgroup/cgroup.c:3826 cgroup_mutex or RCU read lock required!
[ 7509.905026]
other info that might help us debug this:
[ 7509.905031]
rcu_scheduler_active = 2, debug_locks = 1
[ 7509.905036] 4 locks held by bash/1480:
[ 7509.905039] #0: 00000000bf288709 (sb_writers#6){.+.+}, at: vfs_write+0x18a/0x1b0
[ 7509.905072] #1: 00000000ebf23fc9 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
[ 7509.905098] #2: 00000000de7c626e (kn->count#302){.+.+}, at: kernfs_fop_write+0xeb/0x1a0
[ 7509.905124] #3: 00000000a6a2bd9f (cpuset_mutex){+.+.}, at: cpuset_write_u64+0x23/0x140
[ 7509.905149]
stack backtrace:
[ 7509.905156] CPU: 6 PID: 1480 Comm: bash Not tainted 4.17.0-rc5+ #11
[ 7509.905160] Hardware name: LENOVO 30B6S2F900/1030, BIOS S01KT56A 01/15/2018
[ 7509.905164] Call Trace:
[ 7509.905176] dump_stack+0x85/0xcb
[ 7509.905187] css_next_child+0x90/0xd0
[ 7509.905195] update_isolated_cpumask+0x18f/0x2e0
[ 7509.905208] update_flag+0x1f3/0x210
[ 7509.905220] cpuset_write_u64+0xff/0x140
[ 7509.905230] cgroup_file_write+0x178/0x230
[ 7509.905244] kernfs_fop_write+0x113/0x1a0
[ 7509.905254] __vfs_write+0x36/0x180
[ 7509.905264] ? rcu_read_lock_sched_held+0x6b/0x80
[ 7509.905270] ? rcu_sync_lockdep_assert+0x2e/0x60
[ 7509.905278] ? __sb_start_write+0x13e/0x1a0
[ 7509.905283] ? vfs_write+0x18a/0x1b0
[ 7509.905293] vfs_write+0xc1/0x1b0
[ 7509.905302] ksys_write+0x55/0xc0
[ 7509.905317] do_syscall_64+0x60/0x200
[ 7509.905327] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 7509.905333] RIP: 0033:0x7fee4fdfe414
[ 7509.905338] RSP: 002b:00007fff364a80a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7509.905346] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fee4fdfe414
[ 7509.905350] RDX: 0000000000000002 RSI: 000055eb12f93740 RDI: 0000000000000001
[ 7509.905354] RBP: 000055eb12f93740 R08: 000000000000000a R09: 00007fff364a7c30
[ 7509.905358] R10: 000000000000000a R11: 0000000000000246 R12: 00007fee500cd760
[ 7509.905361] R13: 0000000000000002 R14: 00007fee500c8760 R15: 0000000000000002
--->8---
Best,
- Juri
* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: TSUKADA Koutaro @ 2018-05-22 12:56 UTC (permalink / raw)
To: Punit Agrawal
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Jonathan Corbet,
Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
David Rientjes, Mike Kravetz, Aneesh Kumar K.V, Naoya Horiguchi,
Anshuman Khandual, Marc-Andre Lureau, Dan Williams,
Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
cgroups
In-Reply-To: <871se5ysbg.fsf@e105922-lin.cambridge.arm.com>
Hi Punit,
On 2018/05/21 23:52, Punit Agrawal wrote:
> Hi Tsukada,
>
> I was staring at memcg code to better understand your changes and had
> the below thought.
>
> TSUKADA Koutaro <tsukada@ascade.co.jp> writes:
>
> [...]
>
>> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
>> struct hstate. If it is true, it charges to the memory cgroup to which the
>> task that obtained surplus hugepages belongs. If it is false, do nothing as
>> before, and the default value is false. The charge_surplus_huge_pages can
>> be controlled procfs or sysfs interfaces.
>
> Instead of tying the surplus huge page charging control per-hstate,
> could the control be made per-memcg?
>
> This can be done by introducing a per-memory controller file in sysfs
> (memory.charge_surplus_hugepages?) that indicates whether surplus
> hugepages are to be charged to the controller and forms part of the
> total limit. IIUC, the limit already accounts for page and swap cache
> pages.
>
> This would allow the control to be enabled per-cgroup and also keep the
> userspace control interface in one place.
>
> As said earlier, I'm not familiar with memcg so the above might not be a
> feasible but think it'll lead to a more coherent user
> interface. Hopefully, more knowledgeable folks on the thread can chime
> in.
>
Thank you for the good advice.
As you mentioned, it is better to be able to control this per-memcg. After
organizing my thoughts, I will develop the next version of the patch-set to
address these issues and try again.
Thanks,
Tsukada
> Thanks,
> Punit
>
>> Since THP is very effective in environments with kernel page size of 4KB,
>> such as x86, there is no reason to positively use HugeTLBfs, so I think
>> that there is no situation to enable charge_surplus_huge_pages. However, in
>> some distributions such as arm64, the page size of the kernel is 64KB, and
>> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
>> may support multiple huge page sizes, and in such a special environment
>> there is a desire to use HugeTLBfs.
>>
>> The patch set is for 4.17.0-rc3+. I don't know whether patch-set are
>> acceptable or not, so I just done a simple test.
>>
>> Thanks,
>> Tsukada
>>
>> TSUKADA Koutaro (7):
>> hugetlb: introduce charge_surplus_huge_pages to struct hstate
>> hugetlb: supports migrate charging for surplus hugepages
>> memcg: use compound_order rather than hpage_nr_pages
>> mm, sysctl: make charging surplus hugepages controllable
>> hugetlb: add charge_surplus_hugepages attribute
>> Documentation, hugetlb: describe about charge_surplus_hugepages
>> memcg: supports movement of surplus hugepages statistics
>>
>> Documentation/vm/hugetlbpage.txt | 6 +
>> include/linux/hugetlb.h | 4 +
>> kernel/sysctl.c | 7 +
>> mm/hugetlb.c | 148 +++++++++++++++++++++++++++++++++++++++
>> mm/memcontrol.c | 109 +++++++++++++++++++++++++++-
>> 5 files changed, 269 insertions(+), 5 deletions(-)
* [PATCH RFC V2 5/6] ARM: multi_v7_defconfig: Enable RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 11:21 UTC (permalink / raw)
To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
Jonathan Corbet, Eric Anholt
Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>
The patch enables the hwmon driver for the Raspberry Pi.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
arch/arm/configs/multi_v7_defconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index 720461b..5c9dc00 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -477,6 +477,7 @@ CONFIG_SENSORS_LM90=y
CONFIG_SENSORS_LM95245=y
CONFIG_SENSORS_NTC_THERMISTOR=m
CONFIG_SENSORS_PWM_FAN=m
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
CONFIG_SENSORS_INA2XX=m
CONFIG_CPU_THERMAL=y
CONFIG_BCM2835_THERMAL=m
--
2.7.4
* [PATCH RFC V2 0/6] hwmon: Add support for Raspberry Pi voltage sensor
From: Stefan Wahren @ 2018-05-22 11:21 UTC (permalink / raw)
To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
Jonathan Corbet, Eric Anholt
Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
A common issue for the Raspberry Pi is an inadequate power supply.
Noralf Trønnes started a discussion [1] about writing such undervoltage
conditions into the kernel log.
This series is a draft to upstream the resulting kernel patch and is not
intended for 4.18.
Changes in V2:
- simplify Kconfig dependency as suggested by Robin Murphy
- replace dt-binding by probing from firmware driver
- add hwmon documentation
- minor improvements suggested by Guenter Roeck
[1] - https://github.com/raspberrypi/linux/issues/2367
Stefan Wahren (6):
ARM: bcm2835: Add GET_THROTTLED firmware property
hwmon: Add support for RPi voltage sensor
firmware: raspberrypi: Register hwmon driver
ARM: bcm2835_defconfig: Enable RPi voltage sensor
ARM: multi_v7_defconfig: Enable RPi voltage sensor
arm64: defconfig: Enable RPi voltage sensor
Documentation/hwmon/raspberrypi-hwmon | 22 ++++
arch/arm/configs/bcm2835_defconfig | 2 +-
arch/arm/configs/multi_v7_defconfig | 1 +
arch/arm64/configs/defconfig | 1 +
drivers/firmware/raspberrypi.c | 19 ++++
drivers/hwmon/Kconfig | 10 ++
drivers/hwmon/Makefile | 1 +
drivers/hwmon/raspberrypi-hwmon.c | 168 +++++++++++++++++++++++++++++
include/soc/bcm2835/raspberrypi-firmware.h | 1 +
9 files changed, 224 insertions(+), 1 deletion(-)
create mode 100644 Documentation/hwmon/raspberrypi-hwmon
create mode 100644 drivers/hwmon/raspberrypi-hwmon.c
--
2.7.4
* [PATCH RFC V2 1/6] ARM: bcm2835: Add GET_THROTTLED firmware property
From: Stefan Wahren @ 2018-05-22 11:21 UTC (permalink / raw)
To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
Jonathan Corbet, Eric Anholt
Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>
Recent Raspberry Pi firmware provides a mailbox property to detect
under-voltage conditions. Here is the current definition.
The u32 value returned by the firmware is divided into 2 parts:
- lower 16 bits are the live value
- upper 16 bits are the history or sticky value
Bits:
0: undervoltage
1: arm frequency capped
2: currently throttled
16: undervoltage has occurred
17: arm frequency capped has occurred
18: throttling has occurred
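
As an illustration of the layout described above, here is a minimal sketch of
decoding the returned value. The struct, macro and function names are
hypothetical and are not taken from the driver in this series; only the bit
positions come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit meanings within each 16-bit half (hypothetical macro names). */
#define RPI_BIT_UNDERVOLTAGE	(1u << 0)
#define RPI_BIT_FREQ_CAPPED	(1u << 1)
#define RPI_BIT_THROTTLED	(1u << 2)
#define RPI_STICKY_SHIFT	16

struct rpi_throttled_state {
	bool undervoltage_now;	/* bit 0 */
	bool freq_capped_now;	/* bit 1 */
	bool throttled_now;	/* bit 2 */
	bool undervoltage_seen;	/* bit 16 */
	bool freq_capped_seen;	/* bit 17 */
	bool throttled_seen;	/* bit 18 */
};

/* Split the u32 into its live (lower) and sticky (upper) halves. */
static struct rpi_throttled_state rpi_decode_throttled(uint32_t value)
{
	uint32_t live = value & 0xffffu;
	uint32_t sticky = value >> RPI_STICKY_SHIFT;
	struct rpi_throttled_state s = {
		.undervoltage_now  = live & RPI_BIT_UNDERVOLTAGE,
		.freq_capped_now   = live & RPI_BIT_FREQ_CAPPED,
		.throttled_now     = live & RPI_BIT_THROTTLED,
		.undervoltage_seen = sticky & RPI_BIT_UNDERVOLTAGE,
		.freq_capped_seen  = sticky & RPI_BIT_FREQ_CAPPED,
		.throttled_seen    = sticky & RPI_BIT_THROTTLED,
	};

	return s;
}
```

For example, a value of 0x00050000 would mean that undervoltage and throttling
have occurred in the past, while no condition is active at the moment.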
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
include/soc/bcm2835/raspberrypi-firmware.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/soc/bcm2835/raspberrypi-firmware.h b/include/soc/bcm2835/raspberrypi-firmware.h
index 8ee8991..c4a5c9e 100644
--- a/include/soc/bcm2835/raspberrypi-firmware.h
+++ b/include/soc/bcm2835/raspberrypi-firmware.h
@@ -75,6 +75,7 @@ enum rpi_firmware_property_tag {
RPI_FIRMWARE_GET_EDID_BLOCK = 0x00030020,
RPI_FIRMWARE_GET_CUSTOMER_OTP = 0x00030021,
RPI_FIRMWARE_GET_DOMAIN_STATE = 0x00030030,
+ RPI_FIRMWARE_GET_THROTTLED = 0x00030046,
RPI_FIRMWARE_SET_CLOCK_STATE = 0x00038001,
RPI_FIRMWARE_SET_CLOCK_RATE = 0x00038002,
RPI_FIRMWARE_SET_VOLTAGE = 0x00038003,
--
2.7.4
* [PATCH RFC V2 4/6] ARM: bcm2835_defconfig: Enable RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 11:21 UTC (permalink / raw)
To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
Jonathan Corbet, Eric Anholt
Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>
The patch enables the hwmon driver for the Raspberry Pi.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
arch/arm/configs/bcm2835_defconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/configs/bcm2835_defconfig b/arch/arm/configs/bcm2835_defconfig
index e4d188f..e9bc889 100644
--- a/arch/arm/configs/bcm2835_defconfig
+++ b/arch/arm/configs/bcm2835_defconfig
@@ -86,7 +86,7 @@ CONFIG_SPI=y
CONFIG_SPI_BCM2835=y
CONFIG_SPI_BCM2835AUX=y
CONFIG_GPIO_SYSFS=y
-# CONFIG_HWMON is not set
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
CONFIG_THERMAL=y
CONFIG_BCM2835_THERMAL=y
CONFIG_WATCHDOG=y
--
2.7.4