* [PATCH 1/3] Make /dev/urandom scalable
@ 2015-10-06 22:05 Andi Kleen
2015-10-06 22:05 ` [PATCH 2/3] random: Make input to output pool balancing per cpu Andi Kleen
2015-10-06 22:05 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2015-10-06 22:05 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
We had a case where a 4 socket system spent >80% of its total CPU time
contending on the global urandom nonblocking pool spinlock. While the
application could probably have used an own PRNG, it may have valid
reasons to use the best possible key for different session keys.
The application still ran acceptable under 2S, but just fell over
the locking cliff on 4S.
Implementation
==============
The non blocking pool is used widely these days, from every execve() (to
set up AT_RANDOM for ld.so randomization), to getrandom(3) and to frequent
/dev/urandom users in user space. Clearly having such a popular resource
under a global lock is bad thing.
This patch changes the random driver to use distributed per NUMA node
nonblocking pools. The basic structure is not changed: entropy is
first fed into the input pool and later from there distributed
round-robin into the blocking and non blocking pools. This patch extends
this to use an dedicated non blocking pool for each node, and distribute
evenly from the input pool into these distributed pools, in
addition to the blocking pool.
Then every urandom/getrandom user fetches data from its node local
pool. At boot time when users may be still waiting for the non
blocking pool initialization we use the node 0 non blocking pool,
to avoid the need for different wake up queues.
For single node systems (like the vast majority of non server systems)
nothing changes. There is still only a single non blocking pool.
For other systems the original nonblocking pool is used until this
pool has 128 bits worth of entropy. After that it is used
to initialize the other pools. This already gives them
different init states, so they don't run in lock-step
to avoid "replay" attacks.
Since we still have a global input pool there are no problems
with load balancing entropy data between nodes. Any node that never
runs any interrupts would still get the same amount of entropy as
other nodes.
Entropy is fed preferably to nodes that need it more using
the existing 75% threshold.
For saving/restoring /dev/urandom, there is currently no mechanism
to access the non local node pool (short of setting task affinity).
This implies that currently the standard init/exit random save/restore
scripts would only save node 0. On restore all pools are updates.
So the entropy of non 0 gets lost over reboot. That seems acceptable
to me for now (fixing this would need a new separate save/restore interface)
Scalability
===========
I tested the patch with a simple will-it-scale test banging
on get_random() in parallel on more and more CPUs. Of course
that is not a realistic scenario, as real programs should
do some work between getting random numbers. But it's a worst
case for the random scalability.
On a 4S Xeon v3 system _without_ the patchkit the benchmark
maxes out when using all the threads on one node. After
that it quickly settles to about half the throughput of
one node with 2-4 nodes.
(all throughput factors, bigger is better)
Without patchkit:
1 node: 1x
2 nodes: 0.75x
3 nodes: 0.55x
4 nodes: 0.42x
With the patchkit applied:
1 node: 1x
2 nodes: 2x
3 nodes: 2.4x
4 nodes: 3x
So it's not quite linear scalability, but 3x maximum throughput
is already a lot better.
A node can still have a large number of CPUs: on my test system 36
logical software threads (18C * 2T). In principle it may make
sense to split it up further. Per logical CPU would be clearly
overkill. But that would also add more pressure on the input
pools. For now per node seems like a acceptable compromise.
/dev/random still uses a single global lock. For now that seems
acceptable as it normally cannot be used for real high volume
accesses anyways.
The input pool also still uses a global lock. The existing per CPU
fast pool and "give up when busy" mechanism seems to scale well enough
even on larger systems.
v2: Fix name of pool 0. Fix race with interrupts. Make
iteration loops slightly more efficient. Add ifdefs to avoid
any extra code on non-NUMA. Delay other pool use to when
the original pool initialized and initialize the pools from
pool 0. Add comments on memory allocation.
v3: Minor changes from review. Consistent pool names.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 187 ++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 160 insertions(+), 27 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d0da5d8..e7e02c0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -156,6 +156,17 @@
* particular randomness source. They do this by keeping track of the
* first and second order deltas of the event timings.
*
+ * Distributed pools
+ * =================
+ *
+ * On larger systems the locking on the single non blocking pool can
+ * become a bottleneck. To avoid this, we use an own non blocking pool
+ * for each NUMA node. The distributed pools are fed round robin from
+ * the input pool. Each user then only reads entropy from their local
+ * pool.
+ *
+ * For the blocking pool there is still only a single instance.
+ *
* Ensuring unpredictability at system startup
* ============================================
*
@@ -467,7 +478,7 @@ static struct entropy_store blocking_pool = {
static struct entropy_store nonblocking_pool = {
.poolinfo = &poolinfo_table[1],
- .name = "nonblocking",
+ .name = "nonblocking 0",
.pull = &input_pool,
.lock = __SPIN_LOCK_UNLOCKED(nonblocking_pool.lock),
.pool = nonblocking_pool_data,
@@ -475,6 +486,41 @@ static struct entropy_store nonblocking_pool = {
push_to_pool),
};
+/*
+ * Per NUMA node nonblocking pool. This avoids lock contention
+ * when many processes extract data from /dev/urandom in parallel.
+ * /dev/random stays single instance for now.
+ *
+ * This variable is only set up once all non node 0 pools
+ * are initialized.
+ */
+#ifdef CONFIG_NUMA
+static struct entropy_store **nonblocking_node_pool __read_mostly;
+static struct entropy_store **early_nonblocking_node_pool __read_mostly;
+#else
+#define nonblocking_node_pool ((struct entropy_store **)NULL)
+#endif
+
+/* User must check for zero nonblocking_node_pool */
+#define for_each_nb_pool(i, pool) \
+ { int num_nodes = num_possible_nodes(); \
+ for (i = 0; i < num_nodes; i++) { \
+ pool = nonblocking_node_pool[i];
+#define end_for_each_nb() } }
+
+static inline struct entropy_store *get_nonblocking_pool(void)
+{
+ struct entropy_store *pool = &nonblocking_pool;
+
+ /*
+ * Non node 0 pools may take longer to initialize. Keep using
+ * the boot nonblocking pool while this happens.
+ */
+ if (nonblocking_node_pool)
+ pool = nonblocking_node_pool[numa_node_id()];
+ return pool;
+}
+
static __u32 const twist_table[8] = {
0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
@@ -608,6 +654,27 @@ static void process_random_ready_list(void)
spin_unlock_irqrestore(&random_ready_list_lock, flags);
}
+#define POOL_INIT_BYTES (128 / 8)
+
+void init_node_pools(void)
+{
+#ifdef CONFIG_NUMA
+ int i;
+ int num_nodes = num_possible_nodes();
+
+ for (i = 0; i < num_nodes; i++) {
+ struct entropy_store *pool = early_nonblocking_node_pool[i];
+ char buf[POOL_INIT_BYTES];
+
+ get_random_bytes(buf, sizeof buf);
+ mix_pool_bytes(pool, buf, sizeof buf);
+ }
+ /* Initialize first before setting variable */
+ mb();
+ nonblocking_node_pool = early_nonblocking_node_pool;
+#endif
+}
+
/*
* Credit (or debit) the entropy store with n bits of entropy.
* Use credit_entropy_bits_safe() if the value comes from userspace
@@ -681,6 +748,7 @@ retry:
prandom_reseed_late();
process_random_ready_list();
wake_up_all(&urandom_init_wait);
+ init_node_pools();
pr_notice("random: %s pool is initialized\n", r->name);
}
}
@@ -698,18 +766,23 @@ retry:
kill_fasync(&fasync, SIGIO, POLL_IN);
}
/* If the input pool is getting full, send some
- * entropy to the two output pools, flipping back and
- * forth between them, until the output pools are 75%
+ * entropy to the other output pools, cycling
+ * between them, until the output pools are 75%
* full.
+ * Feed into all pools round robin.
*/
if (entropy_bits > random_write_wakeup_bits &&
r->initialized &&
r->entropy_total >= 2*random_read_wakeup_bits) {
static struct entropy_store *last = &blocking_pool;
+ static int next_pool = -1;
struct entropy_store *other = &blocking_pool;
- if (last == &blocking_pool)
- other = &nonblocking_pool;
+ /* -1: use blocking pool, 0<=max_node: node nb pool */
+ if (next_pool > -1)
+ other = nonblocking_node_pool[next_pool];
+ if (++next_pool >= num_possible_nodes())
+ next_pool = -1;
if (other->entropy_count <=
3 * other->poolinfo->poolfracbits / 4)
last = other;
@@ -748,6 +821,17 @@ struct timer_rand_state {
#define INIT_TIMER_RAND_STATE { INITIAL_JIFFIES, };
+static void pool_add_buf(struct entropy_store *pool,
+ const void *buf, unsigned int size)
+{
+ unsigned long flags;
+ unsigned long time = random_get_entropy() ^ jiffies;
+ spin_lock_irqsave(&pool->lock, flags);
+ _mix_pool_bytes(pool, buf, size);
+ _mix_pool_bytes(pool, &time, sizeof(time));
+ spin_unlock_irqrestore(&pool->lock, flags);
+}
+
/*
* Add device- or boot-specific data to the input and nonblocking
* pools to help initialize them to unique values.
@@ -758,19 +842,18 @@ struct timer_rand_state {
*/
void add_device_randomness(const void *buf, unsigned int size)
{
- unsigned long time = random_get_entropy() ^ jiffies;
- unsigned long flags;
+ struct entropy_store *pool;
+ int i;
trace_add_device_randomness(size, _RET_IP_);
- spin_lock_irqsave(&input_pool.lock, flags);
- _mix_pool_bytes(&input_pool, buf, size);
- _mix_pool_bytes(&input_pool, &time, sizeof(time));
- spin_unlock_irqrestore(&input_pool.lock, flags);
-
- spin_lock_irqsave(&nonblocking_pool.lock, flags);
- _mix_pool_bytes(&nonblocking_pool, buf, size);
- _mix_pool_bytes(&nonblocking_pool, &time, sizeof(time));
- spin_unlock_irqrestore(&nonblocking_pool.lock, flags);
+ pool_add_buf(&input_pool, buf, size);
+ if (!nonblocking_node_pool) {
+ pool_add_buf(&nonblocking_pool, buf, size);
+ } else {
+ for_each_nb_pool (i, pool) {
+ pool_add_buf(pool, buf, size);
+ } end_for_each_nb()
+ }
}
EXPORT_SYMBOL(add_device_randomness);
@@ -1252,15 +1335,16 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
*/
void get_random_bytes(void *buf, int nbytes)
{
+ struct entropy_store *pool = get_nonblocking_pool();
#if DEBUG_RANDOM_BOOT > 0
- if (unlikely(nonblocking_pool.initialized == 0))
+ if (unlikely(pool->initialized == 0))
printk(KERN_NOTICE "random: %pF get_random_bytes called "
"with %d bits of entropy available\n",
(void *) _RET_IP_,
- nonblocking_pool.entropy_total);
+ pool->entropy_total);
#endif
trace_get_random_bytes(nbytes, _RET_IP_);
- extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+ extract_entropy(pool, buf, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes);
@@ -1350,7 +1434,7 @@ void get_random_bytes_arch(void *buf, int nbytes)
}
if (nbytes)
- extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
+ extract_entropy(get_nonblocking_pool(), p, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes_arch);
@@ -1390,12 +1474,43 @@ static void init_std_data(struct entropy_store *r)
* initializations complete at compile time. We should also
* take care not to overwrite the precious per platform data
* we were given.
+ *
+ * This is early initialization, if memory allocations of a few
+ * bytes fails the kernel is doomed anyways. So use __GFP_NOFAIL.
*/
static int rand_initialize(void)
{
+#ifdef CONFIG_NUMA
+ int i;
+ int num_nodes = num_possible_nodes();
+ struct entropy_store **pool_list;
+#endif
+
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
+#ifdef CONFIG_NUMA
+ pool_list = kmalloc(num_nodes * sizeof(void *),
+ GFP_KERNEL|__GFP_NOFAIL);
+ pool_list[0] = &nonblocking_pool;
+ for (i = 1; i < num_nodes; i++) {
+ struct entropy_store *pool;
+
+ pool = kzalloc(sizeof(struct entropy_store),
+ GFP_KERNEL|__GFP_NOFAIL);
+ pool_list[i] = pool;
+ pool->poolinfo = &poolinfo_table[1];
+ pool->pull = &input_pool;
+ spin_lock_init(&pool->lock);
+ /* pool data not cleared intentionally */
+ pool->pool = kmalloc(sizeof(nonblocking_pool_data),
+ GFP_KERNEL|__GFP_NOFAIL);
+ INIT_WORK(&pool->push_work, push_to_pool);
+ pool->name = kasprintf(GFP_KERNEL|__GFP_NOFAIL,
+ "nonblocking %d", i);
+ }
+ early_nonblocking_node_pool = pool_list;
+#endif
return 0;
}
early_initcall(rand_initialize);
@@ -1458,17 +1573,19 @@ static ssize_t
urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
int ret;
+ struct entropy_store *pool;
- if (unlikely(nonblocking_pool.initialized == 0))
+ pool = get_nonblocking_pool();
+ if (unlikely(pool->initialized == 0))
printk_once(KERN_NOTICE "random: %s urandom read "
"with %d bits of entropy available\n",
current->comm, nonblocking_pool.entropy_total);
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
- ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
+ ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),
- ENTROPY_BITS(&input_pool));
+ trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ ENTROPY_BITS(pool));
return ret;
}
@@ -1513,22 +1630,34 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
size_t ret;
+ struct entropy_store *pool;
+ int i;
ret = write_pool(&blocking_pool, buffer, count);
if (ret)
return ret;
- ret = write_pool(&nonblocking_pool, buffer, count);
- if (ret)
- return ret;
+ if (nonblocking_node_pool) {
+ for_each_nb_pool (i, pool) {
+ ret = write_pool(pool, buffer, count);
+ if (ret)
+ return ret;
+ } end_for_each_nb()
+ } else {
+ ret = write_pool(&nonblocking_pool, buffer, count);
+ if (ret)
+ return ret;
+ }
return (ssize_t)count;
}
static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
+ int i;
int size, ent_count;
int __user *p = (int __user *)arg;
int retval;
+ struct entropy_store *pool;
switch (cmd) {
case RNDGETENTCNT:
@@ -1569,6 +1698,10 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
return -EPERM;
input_pool.entropy_count = 0;
nonblocking_pool.entropy_count = 0;
+ if (nonblocking_node_pool)
+ for_each_nb_pool (i, pool) {
+ pool->entropy_count = 0;
+ } end_for_each_nb()
blocking_pool.entropy_count = 0;
return 0;
default:
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] random: Make input to output pool balancing per cpu
2015-10-06 22:05 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
@ 2015-10-06 22:05 ` Andi Kleen
2015-10-06 22:18 ` kbuild test robot
2015-10-06 22:05 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2015-10-06 22:05 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
The load balancing from input pool to output pools was
essentially unlocked. Before it didn't matter much because
there were only two choices (blocking and non blocking).
But now with the distributed non blocking pools we have
a lot more pools, and unlocked access of the counters
may systematically deprive some nodes from their deserved
entropy.
Turn the round-robin state into per CPU variables
to avoid any possibility of races. This code already
runs with preemption disabled.
v2: Check for non initialized pools.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index e7e02c0..a395f783 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -774,15 +774,20 @@ retry:
if (entropy_bits > random_write_wakeup_bits &&
r->initialized &&
r->entropy_total >= 2*random_read_wakeup_bits) {
- static struct entropy_store *last = &blocking_pool;
- static int next_pool = -1;
- struct entropy_store *other = &blocking_pool;
+ static DEFINE_PER_CPU(struct entropy_store *, lastp) =
+ &blocking_pool;
+ static DEFINE_PER_CPU(int, next_pool);
+ struct entropy_store *other = &blocking_pool, *last;
+ int np;
/* -1: use blocking pool, 0<=max_node: node nb pool */
- if (next_pool > -1)
- other = nonblocking_node_pool[next_pool];
- if (++next_pool >= num_possible_nodes())
- next_pool = -1;
+ np = __this_cpu_read(next_pool);
+ if (np > -1 && nonblocking_node_pool)
+ other = nonblocking_node_pool[np];
+ if (++np >= num_possible_nodes())
+ np = -1;
+ __this_cpu_write(next_pool, np);
+ last = __this_cpu_read(lastp);
if (other->entropy_count <=
3 * other->poolinfo->poolfracbits / 4)
last = other;
@@ -791,6 +796,7 @@ retry:
schedule_work(&last->push_work);
r->entropy_total = 0;
}
+ __this_cpu_write(lastp, last);
}
}
}
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] random: Make input to output pool balancing per cpu
2015-10-06 22:05 ` [PATCH 2/3] random: Make input to output pool balancing per cpu Andi Kleen
@ 2015-10-06 22:18 ` kbuild test robot
0 siblings, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2015-10-06 22:18 UTC (permalink / raw)
To: Andi Kleen; +Cc: kbuild-all, tytso, linux-kernel, Andi Kleen
[-- Attachment #1: Type: text/plain, Size: 10635 bytes --]
Hi Andi,
[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]
config: i386-randconfig-x009-201540 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
In file included from include/asm-generic/percpu.h:6:0,
from arch/x86/include/asm/percpu.h:551,
from arch/x86/include/asm/preempt.h:5,
from include/linux/preempt.h:64,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from include/linux/utsname.h:5,
from drivers/char/random.c:249:
drivers/char/random.c: In function 'credit_entropy_bits':
>> include/linux/percpu-defs.h:91:33: error: section attribute cannot be specified for local variables
extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:92:26: error: section attribute cannot be specified for local variables
__PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
>> include/linux/percpu-defs.h:92:26: error: declaration of '__pcpu_unique_lastp' with no linkage follows extern declaration
__PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:91:33: note: previous declaration of '__pcpu_unique_lastp' was here
extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
>> drivers/char/random.c:777:50: error: section attribute cannot be specified for local variables
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:93:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name; \
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
>> drivers/char/random.c:777:50: error: section attribute cannot be specified for local variables
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:95:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
__typeof__(type) name
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
>> drivers/char/random.c:777:50: error: weak declaration of 'lastp' must be public
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:95:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
__typeof__(type) name
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
>> drivers/char/random.c:777:50: error: declaration of 'lastp' with no linkage follows extern declaration
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:95:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
__typeof__(type) name
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
drivers/char/random.c:777:50: note: previous declaration of 'lastp' was here
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
include/linux/percpu-defs.h:93:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name; \
^
drivers/char/random.c:777:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(struct entropy_store *, lastp) =
^
>> include/linux/percpu-defs.h:91:33: error: section attribute cannot be specified for local variables
extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:92:26: error: section attribute cannot be specified for local variables
__PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
>> include/linux/percpu-defs.h:92:26: error: declaration of '__pcpu_unique_next_pool' with no linkage follows extern declaration
__PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:91:33: note: previous declaration of '__pcpu_unique_next_pool' was here
extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
^
include/linux/percpu-defs.h:116:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
DEFINE_PER_CPU_SECTION(type, name, "")
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
drivers/char/random.c:779:31: error: section attribute cannot be specified for local variables
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:93:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name; \
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
drivers/char/random.c:779:31: error: section attribute cannot be specified for local variables
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:95:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
__typeof__(type) name
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
>> drivers/char/random.c:779:31: error: weak declaration of 'next_pool' must be public
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:95:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
__typeof__(type) name
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
>> drivers/char/random.c:779:31: error: declaration of 'next_pool' with no linkage follows extern declaration
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:95:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
__typeof__(type) name
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
drivers/char/random.c:779:31: note: previous declaration of 'next_pool' was here
static DEFINE_PER_CPU(int, next_pool);
^
include/linux/percpu-defs.h:93:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name; \
^
drivers/char/random.c:779:11: note: in expansion of macro 'DEFINE_PER_CPU'
static DEFINE_PER_CPU(int, next_pool);
^
vim +777 drivers/char/random.c
771 * full.
772 * Feed into all pools round robin.
773 */
774 if (entropy_bits > random_write_wakeup_bits &&
775 r->initialized &&
776 r->entropy_total >= 2*random_read_wakeup_bits) {
> 777 static DEFINE_PER_CPU(struct entropy_store *, lastp) =
778 &blocking_pool;
> 779 static DEFINE_PER_CPU(int, next_pool);
780 struct entropy_store *other = &blocking_pool, *last;
781 int np;
782
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24334 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] random: Add pool name to urandom_read trace point
2015-10-06 22:05 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
2015-10-06 22:05 ` [PATCH 2/3] random: Make input to output pool balancing per cpu Andi Kleen
@ 2015-10-06 22:05 ` Andi Kleen
1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2015-10-06 22:05 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
Now that we have multiple nonblocking_pools it makes sense to report
the name of the pool in the urandom_read trace point. Extend
the trace point to report the name too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 2 +-
include/trace/events/random.h | 10 ++++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index a395f783..de6dca8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1590,7 +1590,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ trace_urandom_read(pool->name, 8 * nbytes, ENTROPY_BITS(pool),
ENTROPY_BITS(pool));
return ret;
}
diff --git a/include/trace/events/random.h b/include/trace/events/random.h
index 4684de3..3b5ab5a 100644
--- a/include/trace/events/random.h
+++ b/include/trace/events/random.h
@@ -288,25 +288,27 @@ TRACE_EVENT(random_read,
);
TRACE_EVENT(urandom_read,
- TP_PROTO(int got_bits, int pool_left, int input_left),
+ TP_PROTO(const char *pool, int got_bits, int pool_left, int input_left),
- TP_ARGS(got_bits, pool_left, input_left),
+ TP_ARGS(pool, got_bits, pool_left, input_left),
TP_STRUCT__entry(
+ __field( const char *, pool )
__field( int, got_bits )
__field( int, pool_left )
__field( int, input_left )
),
TP_fast_assign(
+ __entry->pool = pool;
__entry->got_bits = got_bits;
__entry->pool_left = pool_left;
__entry->input_left = input_left;
),
TP_printk("got_bits %d nonblocking_pool_entropy_left %d "
- "input_entropy_left %d", __entry->got_bits,
- __entry->pool_left, __entry->input_left)
+ "input_entropy_left %d from %s", __entry->got_bits,
+ __entry->pool_left, __entry->input_left, __entry->pool)
);
#endif /* _TRACE_RANDOM_H */
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/3] Make /dev/urandom scalable
@ 2016-03-01 5:17 Andi Kleen
2016-03-01 5:17 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2016-03-01 5:17 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
We had a case where a 4 socket system spent >80% of its total CPU time
contending on the global urandom nonblocking pool spinlock. While the
application could probably have used an own PRNG, it may have valid
reasons to use the best possible key for different session keys.
The application still ran acceptable under 2S, but just fell over
the locking cliff on 4S.
Implementation
==============
The non blocking pool is used widely these days, from every execve() (to
set up AT_RANDOM for ld.so randomization), to getrandom(3) and to frequent
/dev/urandom users in user space. Clearly having such a popular resource
under a global lock is bad thing.
This patch changes the random driver to use distributed per NUMA node
nonblocking pools. The basic structure is not changed: entropy is
first fed into the input pool and later from there distributed
round-robin into the blocking and non blocking pools. This patch extends
this to use an dedicated non blocking pool for each node, and distribute
evenly from the input pool into these distributed pools, in
addition to the blocking pool.
Then every urandom/getrandom user fetches data from its node local
pool. At boot time when users may be still waiting for the non
blocking pool initialization we use the node 0 non blocking pool,
to avoid the need for different wake up queues.
For single node systems (like the vast majority of non server systems)
nothing changes. There is still only a single non blocking pool.
For other systems the original nonblocking pool is used until this
pool has 128 bits worth of entropy. After that it is used
to initialize the other pools. This already gives them
different init states, so they don't run in lock-step
to avoid "replay" attacks.
Since we still have a global input pool there are no problems
with load balancing entropy data between nodes. Any node that never
runs any interrupts would still get the same amount of entropy as
other nodes.
Entropy is fed preferably to nodes that need it more using
the existing 75% threshold.
For saving/restoring /dev/urandom, there is currently no mechanism
to access the non local node pool (short of setting task affinity).
This implies that currently the standard init/exit random save/restore
scripts would only save node 0. On restore all pools are updates.
So the entropy of non 0 gets lost over reboot. That seems acceptable
to me for now (fixing this would need a new separate save/restore interface)
Scalability
===========
I tested the patch with a simple will-it-scale test banging
on get_random() in parallel on more and more CPUs. Of course
that is not a realistic scenario, as real programs should
do some work between getting random numbers. But it's a worst
case for the random scalability.
On a 4S Xeon v3 system _without_ the patchkit the benchmark
maxes out when using all the threads on one node. After
that it quickly settles to about half the throughput of
one node with 2-4 nodes.
(all throughput factors, bigger is better)
Without patchkit:
1 node: 1x
2 nodes: 0.75x
3 nodes: 0.55x
4 nodes: 0.42x
With the patchkit applied:
1 node: 1x
2 nodes: 2x
3 nodes: 2.4x
4 nodes: 3x
So it's not quite linear scalability, but 3x maximum throughput
is already a lot better.
A node can still have a large number of CPUs: on my test system 36
logical software threads (18C * 2T). In principle it may make
sense to split it up further. Per logical CPU would be clearly
overkill. But that would also add more pressure on the input
pools. For now per node seems like a acceptable compromise.
/dev/random still uses a single global lock. For now that seems
acceptable as it normally cannot be used for real high volume
accesses anyways.
The input pool also still uses a global lock. The existing per CPU
fast pool and "give up when busy" mechanism seems to scale well enough
even on larger systems.
v2: Fix name of pool 0. Fix race with interrupts. Make
iteration loops slightly more efficient. Add ifdefs to avoid
any extra code on non-NUMA. Delay other pool use to when
the original pool initialized and initialize the pools from
pool 0. Add comments on memory allocation.
v3: Minor changes from review. Consistent pool names.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 187 ++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 160 insertions(+), 27 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d0da5d8..e7e02c0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -156,6 +156,17 @@
* particular randomness source. They do this by keeping track of the
* first and second order deltas of the event timings.
*
+ * Distributed pools
+ * =================
+ *
+ * On larger systems the locking on the single non blocking pool can
+ * become a bottleneck. To avoid this, we use an own non blocking pool
+ * for each NUMA node. The distributed pools are fed round robin from
+ * the input pool. Each user then only reads entropy from their local
+ * pool.
+ *
+ * For the blocking pool there is still only a single instance.
+ *
* Ensuring unpredictability at system startup
* ============================================
*
@@ -467,7 +478,7 @@ static struct entropy_store blocking_pool = {
static struct entropy_store nonblocking_pool = {
.poolinfo = &poolinfo_table[1],
- .name = "nonblocking",
+ .name = "nonblocking 0",
.pull = &input_pool,
.lock = __SPIN_LOCK_UNLOCKED(nonblocking_pool.lock),
.pool = nonblocking_pool_data,
@@ -475,6 +486,41 @@ static struct entropy_store nonblocking_pool = {
push_to_pool),
};
+/*
+ * Per NUMA node nonblocking pool. This avoids lock contention
+ * when many processes extract data from /dev/urandom in parallel.
+ * /dev/random stays single instance for now.
+ *
+ * This variable is only set up once all non node 0 pools
+ * are initialized.
+ */
+#ifdef CONFIG_NUMA
+static struct entropy_store **nonblocking_node_pool __read_mostly;
+static struct entropy_store **early_nonblocking_node_pool __read_mostly;
+#else
+#define nonblocking_node_pool ((struct entropy_store **)NULL)
+#endif
+
+/* User must check for zero nonblocking_node_pool */
+#define for_each_nb_pool(i, pool) \
+ { int num_nodes = num_possible_nodes(); \
+ for (i = 0; i < num_nodes; i++) { \
+ pool = nonblocking_node_pool[i];
+#define end_for_each_nb() } }
+
+static inline struct entropy_store *get_nonblocking_pool(void)
+{
+ struct entropy_store *pool = &nonblocking_pool;
+
+ /*
+ * Non node 0 pools may take longer to initialize. Keep using
+ * the boot nonblocking pool while this happens.
+ */
+ if (nonblocking_node_pool)
+ pool = nonblocking_node_pool[numa_node_id()];
+ return pool;
+}
+
static __u32 const twist_table[8] = {
0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
@@ -608,6 +654,27 @@ static void process_random_ready_list(void)
spin_unlock_irqrestore(&random_ready_list_lock, flags);
}
+#define POOL_INIT_BYTES (128 / 8)
+
+void init_node_pools(void)
+{
+#ifdef CONFIG_NUMA
+ int i;
+ int num_nodes = num_possible_nodes();
+
+ for (i = 0; i < num_nodes; i++) {
+ struct entropy_store *pool = early_nonblocking_node_pool[i];
+ char buf[POOL_INIT_BYTES];
+
+ get_random_bytes(buf, sizeof buf);
+ mix_pool_bytes(pool, buf, sizeof buf);
+ }
+ /* Initialize first before setting variable */
+ mb();
+ nonblocking_node_pool = early_nonblocking_node_pool;
+#endif
+}
+
/*
* Credit (or debit) the entropy store with n bits of entropy.
* Use credit_entropy_bits_safe() if the value comes from userspace
@@ -681,6 +748,7 @@ retry:
prandom_reseed_late();
process_random_ready_list();
wake_up_all(&urandom_init_wait);
+ init_node_pools();
pr_notice("random: %s pool is initialized\n", r->name);
}
}
@@ -698,18 +766,23 @@ retry:
kill_fasync(&fasync, SIGIO, POLL_IN);
}
/* If the input pool is getting full, send some
- * entropy to the two output pools, flipping back and
- * forth between them, until the output pools are 75%
+ * entropy to the other output pools, cycling
+ * between them, until the output pools are 75%
* full.
+ * Feed into all pools round robin.
*/
if (entropy_bits > random_write_wakeup_bits &&
r->initialized &&
r->entropy_total >= 2*random_read_wakeup_bits) {
static struct entropy_store *last = &blocking_pool;
+ static int next_pool = -1;
struct entropy_store *other = &blocking_pool;
- if (last == &blocking_pool)
- other = &nonblocking_pool;
+ /* -1: use blocking pool, 0<=max_node: node nb pool */
+ if (next_pool > -1)
+ other = nonblocking_node_pool[next_pool];
+ if (++next_pool >= num_possible_nodes())
+ next_pool = -1;
if (other->entropy_count <=
3 * other->poolinfo->poolfracbits / 4)
last = other;
@@ -748,6 +821,17 @@ struct timer_rand_state {
#define INIT_TIMER_RAND_STATE { INITIAL_JIFFIES, };
+static void pool_add_buf(struct entropy_store *pool,
+ const void *buf, unsigned int size)
+{
+ unsigned long flags;
+ unsigned long time = random_get_entropy() ^ jiffies;
+ spin_lock_irqsave(&pool->lock, flags);
+ _mix_pool_bytes(pool, buf, size);
+ _mix_pool_bytes(pool, &time, sizeof(time));
+ spin_unlock_irqrestore(&pool->lock, flags);
+}
+
/*
* Add device- or boot-specific data to the input and nonblocking
* pools to help initialize them to unique values.
@@ -758,19 +842,18 @@ struct timer_rand_state {
*/
void add_device_randomness(const void *buf, unsigned int size)
{
- unsigned long time = random_get_entropy() ^ jiffies;
- unsigned long flags;
+ struct entropy_store *pool;
+ int i;
trace_add_device_randomness(size, _RET_IP_);
- spin_lock_irqsave(&input_pool.lock, flags);
- _mix_pool_bytes(&input_pool, buf, size);
- _mix_pool_bytes(&input_pool, &time, sizeof(time));
- spin_unlock_irqrestore(&input_pool.lock, flags);
-
- spin_lock_irqsave(&nonblocking_pool.lock, flags);
- _mix_pool_bytes(&nonblocking_pool, buf, size);
- _mix_pool_bytes(&nonblocking_pool, &time, sizeof(time));
- spin_unlock_irqrestore(&nonblocking_pool.lock, flags);
+ pool_add_buf(&input_pool, buf, size);
+ if (!nonblocking_node_pool) {
+ pool_add_buf(&nonblocking_pool, buf, size);
+ } else {
+ for_each_nb_pool (i, pool) {
+ pool_add_buf(pool, buf, size);
+ } end_for_each_nb()
+ }
}
EXPORT_SYMBOL(add_device_randomness);
@@ -1252,15 +1335,16 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
*/
void get_random_bytes(void *buf, int nbytes)
{
+ struct entropy_store *pool = get_nonblocking_pool();
#if DEBUG_RANDOM_BOOT > 0
- if (unlikely(nonblocking_pool.initialized == 0))
+ if (unlikely(pool->initialized == 0))
printk(KERN_NOTICE "random: %pF get_random_bytes called "
"with %d bits of entropy available\n",
(void *) _RET_IP_,
- nonblocking_pool.entropy_total);
+ pool->entropy_total);
#endif
trace_get_random_bytes(nbytes, _RET_IP_);
- extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+ extract_entropy(pool, buf, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes);
@@ -1350,7 +1434,7 @@ void get_random_bytes_arch(void *buf, int nbytes)
}
if (nbytes)
- extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
+ extract_entropy(get_nonblocking_pool(), p, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes_arch);
@@ -1390,12 +1474,43 @@ static void init_std_data(struct entropy_store *r)
* initializations complete at compile time. We should also
* take care not to overwrite the precious per platform data
* we were given.
+ *
+ * This is early initialization, if memory allocations of a few
+ * bytes fails the kernel is doomed anyways. So use __GFP_NOFAIL.
*/
static int rand_initialize(void)
{
+#ifdef CONFIG_NUMA
+ int i;
+ int num_nodes = num_possible_nodes();
+ struct entropy_store **pool_list;
+#endif
+
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
+#ifdef CONFIG_NUMA
+ pool_list = kmalloc(num_nodes * sizeof(void *),
+ GFP_KERNEL|__GFP_NOFAIL);
+ pool_list[0] = &nonblocking_pool;
+ for (i = 1; i < num_nodes; i++) {
+ struct entropy_store *pool;
+
+ pool = kzalloc(sizeof(struct entropy_store),
+ GFP_KERNEL|__GFP_NOFAIL);
+ pool_list[i] = pool;
+ pool->poolinfo = &poolinfo_table[1];
+ pool->pull = &input_pool;
+ spin_lock_init(&pool->lock);
+ /* pool data not cleared intentionally */
+ pool->pool = kmalloc(sizeof(nonblocking_pool_data),
+ GFP_KERNEL|__GFP_NOFAIL);
+ INIT_WORK(&pool->push_work, push_to_pool);
+ pool->name = kasprintf(GFP_KERNEL|__GFP_NOFAIL,
+ "nonblocking %d", i);
+ }
+ early_nonblocking_node_pool = pool_list;
+#endif
return 0;
}
early_initcall(rand_initialize);
@@ -1458,17 +1573,19 @@ static ssize_t
urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
int ret;
+ struct entropy_store *pool;
- if (unlikely(nonblocking_pool.initialized == 0))
+ pool = get_nonblocking_pool();
+ if (unlikely(pool->initialized == 0))
printk_once(KERN_NOTICE "random: %s urandom read "
"with %d bits of entropy available\n",
current->comm, nonblocking_pool.entropy_total);
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
- ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
+ ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),
- ENTROPY_BITS(&input_pool));
+ trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ ENTROPY_BITS(pool));
return ret;
}
@@ -1513,22 +1630,34 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
size_t ret;
+ struct entropy_store *pool;
+ int i;
ret = write_pool(&blocking_pool, buffer, count);
if (ret)
return ret;
- ret = write_pool(&nonblocking_pool, buffer, count);
- if (ret)
- return ret;
+ if (nonblocking_node_pool) {
+ for_each_nb_pool (i, pool) {
+ ret = write_pool(pool, buffer, count);
+ if (ret)
+ return ret;
+ } end_for_each_nb()
+ } else {
+ ret = write_pool(&nonblocking_pool, buffer, count);
+ if (ret)
+ return ret;
+ }
return (ssize_t)count;
}
static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
+ int i;
int size, ent_count;
int __user *p = (int __user *)arg;
int retval;
+ struct entropy_store *pool;
switch (cmd) {
case RNDGETENTCNT:
@@ -1569,6 +1698,10 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
return -EPERM;
input_pool.entropy_count = 0;
nonblocking_pool.entropy_count = 0;
+ if (nonblocking_node_pool)
+ for_each_nb_pool (i, pool) {
+ pool->entropy_count = 0;
+ } end_for_each_nb()
blocking_pool.entropy_count = 0;
return 0;
default:
--
2.5.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] random: Add pool name to urandom_read trace point
2016-03-01 5:17 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
@ 2016-03-01 5:17 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2016-03-01 5:17 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
Now that we have multiple nonblocking_pools it makes sense to report
the name of the pool in the urandom_read trace point. Extend
the trace point to report the name too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 2 +-
include/trace/events/random.h | 10 ++++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 21ae44b..00c7be2 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1590,7 +1590,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ trace_urandom_read(pool->name, 8 * nbytes, ENTROPY_BITS(pool),
ENTROPY_BITS(pool));
return ret;
}
diff --git a/include/trace/events/random.h b/include/trace/events/random.h
index 4684de3..3b5ab5a 100644
--- a/include/trace/events/random.h
+++ b/include/trace/events/random.h
@@ -288,25 +288,27 @@ TRACE_EVENT(random_read,
);
TRACE_EVENT(urandom_read,
- TP_PROTO(int got_bits, int pool_left, int input_left),
+ TP_PROTO(const char *pool, int got_bits, int pool_left, int input_left),
- TP_ARGS(got_bits, pool_left, input_left),
+ TP_ARGS(pool, got_bits, pool_left, input_left),
TP_STRUCT__entry(
+ __field( const char *, pool )
__field( int, got_bits )
__field( int, pool_left )
__field( int, input_left )
),
TP_fast_assign(
+ __entry->pool = pool;
__entry->got_bits = got_bits;
__entry->pool_left = pool_left;
__entry->input_left = input_left;
),
TP_printk("got_bits %d nonblocking_pool_entropy_left %d "
- "input_entropy_left %d", __entry->got_bits,
- __entry->pool_left, __entry->input_left)
+ "input_entropy_left %d from %s", __entry->got_bits,
+ __entry->pool_left, __entry->input_left, __entry->pool)
);
#endif /* _TRACE_RANDOM_H */
--
2.5.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Scalable random patchkit revisited
@ 2016-02-10 23:01 Andi Kleen
2016-02-10 23:01 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2016-02-10 23:01 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel
Hi Ted,
So far noone has come with a better solution, and the /dev/*random scaling
problem on large systems is still there and affecting workloads.
So I would like to resubmit the /dev/*random patchkit I posted some time
ago. Back then it was successfully reviewed, but then some alternative
proposals came up that nobody was interested in working on.
If new proposals are implemented they could be done on top of my patchkit
(even removing the extra pools if they are not needed anymore).
I forward ported the patches to the current master (4.5-rc*) and retested
it and it works just fine.
Please consider applying.
Thanks,
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] random: Add pool name to urandom_read trace point
2016-02-10 23:01 Scalable random patchkit revisited Andi Kleen
@ 2016-02-10 23:01 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2016-02-10 23:01 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
Now that we have multiple nonblocking_pools it makes sense to report
the name of the pool in the urandom_read trace point. Extend
the trace point to report the name too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 2 +-
include/trace/events/random.h | 10 ++++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index a395f783..de6dca8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1590,7 +1590,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ trace_urandom_read(pool->name, 8 * nbytes, ENTROPY_BITS(pool),
ENTROPY_BITS(pool));
return ret;
}
diff --git a/include/trace/events/random.h b/include/trace/events/random.h
index 4684de3..3b5ab5a 100644
--- a/include/trace/events/random.h
+++ b/include/trace/events/random.h
@@ -288,25 +288,27 @@ TRACE_EVENT(random_read,
);
TRACE_EVENT(urandom_read,
- TP_PROTO(int got_bits, int pool_left, int input_left),
+ TP_PROTO(const char *pool, int got_bits, int pool_left, int input_left),
- TP_ARGS(got_bits, pool_left, input_left),
+ TP_ARGS(pool, got_bits, pool_left, input_left),
TP_STRUCT__entry(
+ __field( const char *, pool )
__field( int, got_bits )
__field( int, pool_left )
__field( int, input_left )
),
TP_fast_assign(
+ __entry->pool = pool;
__entry->got_bits = got_bits;
__entry->pool_left = pool_left;
__entry->input_left = input_left;
),
TP_printk("got_bits %d nonblocking_pool_entropy_left %d "
- "input_entropy_left %d", __entry->got_bits,
- __entry->pool_left, __entry->input_left)
+ "input_entropy_left %d from %s", __entry->got_bits,
+ __entry->pool_left, __entry->input_left, __entry->pool)
);
#endif /* _TRACE_RANDOM_H */
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Updated scalable urandom patchkit
@ 2015-09-24 17:19 Andi Kleen
2015-09-24 17:19 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2015-09-24 17:19 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, linux
Updated patchkit to make /dev/urandom scale on larger systesm.
I addressed all the review comments. See ChangeLogs in the individual
patches.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] random: Add pool name to urandom_read trace point
2015-09-24 17:19 Updated scalable urandom patchkit Andi Kleen
@ 2015-09-24 17:19 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2015-09-24 17:19 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, linux, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
Now that we have multiple nonblocking_pools it makes sense to report
the name of the pool in the urandom_read trace point. Extend
the trace point to report the name too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 2 +-
include/trace/events/random.h | 10 ++++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 31141b4..c44831d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1588,7 +1588,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ trace_urandom_read(pool->name, 8 * nbytes, ENTROPY_BITS(pool),
ENTROPY_BITS(pool));
return ret;
}
diff --git a/include/trace/events/random.h b/include/trace/events/random.h
index 4684de3..3b5ab5a 100644
--- a/include/trace/events/random.h
+++ b/include/trace/events/random.h
@@ -288,25 +288,27 @@ TRACE_EVENT(random_read,
);
TRACE_EVENT(urandom_read,
- TP_PROTO(int got_bits, int pool_left, int input_left),
+ TP_PROTO(const char *pool, int got_bits, int pool_left, int input_left),
- TP_ARGS(got_bits, pool_left, input_left),
+ TP_ARGS(pool, got_bits, pool_left, input_left),
TP_STRUCT__entry(
+ __field( const char *, pool )
__field( int, got_bits )
__field( int, pool_left )
__field( int, input_left )
),
TP_fast_assign(
+ __entry->pool = pool;
__entry->got_bits = got_bits;
__entry->pool_left = pool_left;
__entry->input_left = input_left;
),
TP_printk("got_bits %d nonblocking_pool_entropy_left %d "
- "input_entropy_left %d", __entry->got_bits,
- __entry->pool_left, __entry->input_left)
+ "input_entropy_left %d from %s", __entry->got_bits,
+ __entry->pool_left, __entry->input_left, __entry->pool)
);
#endif /* _TRACE_RANDOM_H */
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/3] Make /dev/urandom scalable
@ 2015-09-22 23:16 Andi Kleen
2015-09-22 23:16 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2015-09-22 23:16 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, kirill.shutemov, herbert, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
We had a case where a 4 socket system spent >80% of its total CPU time
contending on the global urandom nonblocking pool spinlock. While the
application could probably have used an own PRNG, it may have valid
reasons to use the best possible key for different session keys.
The application still ran acceptable under 2S, but just fell over
the locking cliff on 4S.
Implementation
==============
The non blocking pool is used widely these days, from every execve() (to
set up AT_RANDOM for ld.so randomization), to getrandom(3) and to frequent
/dev/urandom users in user space. Clearly having such a popular resource
under a global lock is bad thing.
This patch changes the random driver to use distributed per NUMA node
nonblocking pools. The basic structure is not changed: entropy is
first fed into the input pool and later from there distributed
round-robin into the blocking and non blocking pools. This patch extends
this to use an dedicated non blocking pool for each node, and distribute
evenly from the input pool into these distributed pools, in
addition to the blocking pool.
Then every urandom/getrandom user fetches data from its node local
pool. At boot time when users may be still waiting for the non
blocking pool initialization we use the node 0 non blocking pool,
to avoid the need for different wake up queues.
For single node systems (like the vast majority of non server systems)
nothing changes. There is still only a single non blocking pool.
The different per-node pools also start with different start
states and diverge more and more over time, as they get
feed different input data. So "replay" attacks are
difficult after some time.
Without hardware random number seed support the start states
(until enough real entropy is collected) are not very random, but
that's not worse than before
Since we still have a global input pool there are no problems
with load balancing entropy data between nodes. Any node that never
runs any interrupts would still get the same amount of entropy as
other nodes.
Entropy is fed preferably to nodes that need it more using
the existing 75% threshold.
For saving/restoring /dev/urandom, there is currently no mechanism
to access the non local node pool (short of setting task affinity).
This implies that currently the standard init/exit random save/restore
scripts would only save node 0. On restore all pools are updates.
So the entropy of non 0 gets lost over reboot. That seems acceptable
to me for now (fixing this would need a new separate save/restore interface)
Scalability
===========
I tested the patch with a simple will-it-scale test banging
on get_random() in parallel on more and more CPUs. Of course
that is not a realistic scenario, as real programs should
do some work between getting random numbers. But it's a worst
case for the random scalability.
On a 4S Xeon v3 system _without_ the patchkit the benchmark
maxes out when using all the threads on one node. After
that it quickly settles to about half the throughput of
one node with 2-4 nodes.
(all throughput factors, bigger is better)
Without patchkit:
1 node: 1x
2 nodes: 0.75x
3 nodes: 0.55x
4 nodes: 0.42x
With the patchkit applied:
1 node: 1x
2 nodes: 2x
3 nodes: 3.4x
4 nodes: 6x
So it's not quite linear scalability, but 6x maximum throughput
is already a lot better.
A node can still have a large number of CPUs: on my test system 36
logical software threads (18C * 2T). In principle it may make
sense to split it up further. Per logical CPU would be clearly
overkill. But that would also add more pressure on the input
pools. For now per node seems like a acceptable compromise.
/dev/random still uses a single global lock. For now that seems
acceptable as it normally cannot be used for real high volume
accesses anyways.
The input pool also still uses a global lock. The existing per CPU
fast pool and "give up when busy" mechanism seems to scale well enough
even on larger systems.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 142 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 114 insertions(+), 28 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d0da5d8..d0302be 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -156,6 +156,17 @@
* particular randomness source. They do this by keeping track of the
* first and second order deltas of the event timings.
*
+ * Distributed pools
+ * =================
+ *
+ * On larger systems the locking on the single non blocking pool can
+ * become a bottleneck. To avoid this, we use an own non blocking pool
+ * for each NUMA node. The distributed pools are fed round robin from
+ * the input pool. Each user then only reads entropy from their local
+ * pool.
+ *
+ * For the blocking pool there is still only a single instance.
+ *
* Ensuring unpredictability at system startup
* ============================================
*
@@ -467,7 +478,7 @@ static struct entropy_store blocking_pool = {
static struct entropy_store nonblocking_pool = {
.poolinfo = &poolinfo_table[1],
- .name = "nonblocking",
+ .name = "nonblocking 0",
.pull = &input_pool,
.lock = __SPIN_LOCK_UNLOCKED(nonblocking_pool.lock),
.pool = nonblocking_pool_data,
@@ -475,6 +486,32 @@ static struct entropy_store nonblocking_pool = {
push_to_pool),
};
+/*
+ * Per NUMA node nonblocking pool. This avoids lock contention
+ * when many processes extract data from /dev/urandom in parallel.
+ * /dev/random stays single instance for now.
+ */
+static struct entropy_store **nonblocking_node_pool __read_mostly;
+
+#define for_each_nb_pool(i, pool) for (i = 0; i < num_possible_nodes(); i++) { \
+ pool = nonblocking_node_pool[i];
+#define end_for_each_nb() }
+
+static inline struct entropy_store *get_nonblocking_pool(void)
+{
+ struct entropy_store *pool = &nonblocking_pool;
+
+ /*
+ * Non node 0 pools may take longer to initialize. Keep using
+ * the boot nonblocking pool while this happens.
+ */
+ if (nonblocking_node_pool)
+ pool = nonblocking_node_pool[numa_node_id()];
+ if (!pool->initialized)
+ pool = &nonblocking_pool;
+ return pool;
+}
+
static __u32 const twist_table[8] = {
0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
@@ -698,18 +735,23 @@ retry:
kill_fasync(&fasync, SIGIO, POLL_IN);
}
/* If the input pool is getting full, send some
- * entropy to the two output pools, flipping back and
- * forth between them, until the output pools are 75%
+ * entropy to the other output pools, cycling
+ * between them, until the output pools are 75%
* full.
+ * Feed into all pools round robin.
*/
if (entropy_bits > random_write_wakeup_bits &&
r->initialized &&
r->entropy_total >= 2*random_read_wakeup_bits) {
static struct entropy_store *last = &blocking_pool;
+ static int next_pool = -1;
struct entropy_store *other = &blocking_pool;
- if (last == &blocking_pool)
- other = &nonblocking_pool;
+ /* -1: use blocking pool, 0<=max_node: node nb pool */
+ if (next_pool > -1)
+ other = nonblocking_node_pool[next_pool];
+ if (++next_pool >= num_possible_nodes())
+ next_pool = -1;
if (other->entropy_count <=
3 * other->poolinfo->poolfracbits / 4)
last = other;
@@ -748,6 +790,17 @@ struct timer_rand_state {
#define INIT_TIMER_RAND_STATE { INITIAL_JIFFIES, };
+static void pool_add_buf(struct entropy_store *pool,
+ const void *buf, unsigned int size)
+{
+ unsigned long flags;
+ unsigned long time = random_get_entropy() ^ jiffies;
+ spin_lock_irqsave(&pool->lock, flags);
+ _mix_pool_bytes(pool, buf, size);
+ _mix_pool_bytes(pool, &time, sizeof(time));
+ spin_unlock_irqrestore(&pool->lock, flags);
+}
+
/*
* Add device- or boot-specific data to the input and nonblocking
* pools to help initialize them to unique values.
@@ -758,19 +811,18 @@ struct timer_rand_state {
*/
void add_device_randomness(const void *buf, unsigned int size)
{
- unsigned long time = random_get_entropy() ^ jiffies;
- unsigned long flags;
+ struct entropy_store *pool;
+ int i;
trace_add_device_randomness(size, _RET_IP_);
- spin_lock_irqsave(&input_pool.lock, flags);
- _mix_pool_bytes(&input_pool, buf, size);
- _mix_pool_bytes(&input_pool, &time, sizeof(time));
- spin_unlock_irqrestore(&input_pool.lock, flags);
-
- spin_lock_irqsave(&nonblocking_pool.lock, flags);
- _mix_pool_bytes(&nonblocking_pool, buf, size);
- _mix_pool_bytes(&nonblocking_pool, &time, sizeof(time));
- spin_unlock_irqrestore(&nonblocking_pool.lock, flags);
+ pool_add_buf(&input_pool, buf, size);
+ if (!nonblocking_node_pool) {
+ pool_add_buf(&nonblocking_pool, buf, size);
+ } else {
+ for_each_nb_pool (i, pool) {
+ pool_add_buf(pool, buf, size);
+ } end_for_each_nb()
+ }
}
EXPORT_SYMBOL(add_device_randomness);
@@ -1252,15 +1304,16 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
*/
void get_random_bytes(void *buf, int nbytes)
{
+ struct entropy_store *pool = get_nonblocking_pool();
#if DEBUG_RANDOM_BOOT > 0
- if (unlikely(nonblocking_pool.initialized == 0))
+ if (unlikely(pool->initialized == 0))
printk(KERN_NOTICE "random: %pF get_random_bytes called "
"with %d bits of entropy available\n",
(void *) _RET_IP_,
- nonblocking_pool.entropy_total);
+ pool->entropy_total);
#endif
trace_get_random_bytes(nbytes, _RET_IP_);
- extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+ extract_entropy(pool, buf, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes);
@@ -1350,7 +1403,7 @@ void get_random_bytes_arch(void *buf, int nbytes)
}
if (nbytes)
- extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
+ extract_entropy(get_nonblocking_pool(), p, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes_arch);
@@ -1393,9 +1446,32 @@ static void init_std_data(struct entropy_store *r)
*/
static int rand_initialize(void)
{
+ int i;
+ int num_nodes = num_possible_nodes();
+ char name[40];
+
+ nonblocking_node_pool = kzalloc(num_nodes * sizeof(void *),
+ GFP_KERNEL|__GFP_NOFAIL);
+
init_std_data(&input_pool);
init_std_data(&blocking_pool);
+ nonblocking_node_pool[0] = &nonblocking_pool;
init_std_data(&nonblocking_pool);
+ for (i = 1; i < num_nodes; i++) {
+ struct entropy_store *pool = kzalloc(sizeof(struct entropy_store),
+ GFP_KERNEL|__GFP_NOFAIL);
+ nonblocking_node_pool[i] = pool;
+ pool->poolinfo = &poolinfo_table[1];
+ pool->pull = &input_pool;
+ spin_lock_init(&pool->lock);
+ /* pool data not cleared intentionally */
+ pool->pool = kmalloc(sizeof(nonblocking_pool_data),
+ GFP_KERNEL|__GFP_NOFAIL);
+ INIT_WORK(&pool->push_work, push_to_pool);
+ snprintf(name, sizeof name, "nonblocking pool %d", i);
+ pool->name = kstrdup(name, GFP_KERNEL|__GFP_NOFAIL);
+ init_std_data(pool);
+ }
return 0;
}
early_initcall(rand_initialize);
@@ -1458,17 +1534,19 @@ static ssize_t
urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
int ret;
+ struct entropy_store *pool;
- if (unlikely(nonblocking_pool.initialized == 0))
+ pool = get_nonblocking_pool();
+ if (unlikely(pool->initialized == 0))
printk_once(KERN_NOTICE "random: %s urandom read "
"with %d bits of entropy available\n",
current->comm, nonblocking_pool.entropy_total);
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
- ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
+ ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),
- ENTROPY_BITS(&input_pool));
+ trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ ENTROPY_BITS(pool));
return ret;
}
@@ -1513,22 +1591,28 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
size_t ret;
+ struct entropy_store *pool;
+ int i;
ret = write_pool(&blocking_pool, buffer, count);
if (ret)
return ret;
- ret = write_pool(&nonblocking_pool, buffer, count);
- if (ret)
- return ret;
+ for_each_nb_pool (i, pool) {
+ ret = write_pool(pool, buffer, count);
+ if (ret)
+ return ret;
+ } end_for_each_nb()
return (ssize_t)count;
}
static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
+ int i;
int size, ent_count;
int __user *p = (int __user *)arg;
int retval;
+ struct entropy_store *pool;
switch (cmd) {
case RNDGETENTCNT:
@@ -1568,7 +1652,9 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
input_pool.entropy_count = 0;
- nonblocking_pool.entropy_count = 0;
+ for_each_nb_pool (i, pool) {
+ pool->entropy_count = 0;
+ } end_for_each_nb();
blocking_pool.entropy_count = 0;
return 0;
default:
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] random: Add pool name to urandom_read trace point
2015-09-22 23:16 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
@ 2015-09-22 23:16 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2015-09-22 23:16 UTC (permalink / raw)
To: tytso; +Cc: linux-kernel, kirill.shutemov, herbert, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
Now that we have multiple nonblocking_pools it makes sense to report
the name of the pool in the urandom_read trace point. Extend
the trace point to report the name too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/char/random.c | 2 +-
include/trace/events/random.h | 10 ++++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index b74919a..23ec90e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1551,7 +1551,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
ret = extract_entropy_user(pool, buf, nbytes);
- trace_urandom_read(8 * nbytes, ENTROPY_BITS(pool),
+ trace_urandom_read(pool->name, 8 * nbytes, ENTROPY_BITS(pool),
ENTROPY_BITS(pool));
return ret;
}
diff --git a/include/trace/events/random.h b/include/trace/events/random.h
index 4684de3..3b5ab5a 100644
--- a/include/trace/events/random.h
+++ b/include/trace/events/random.h
@@ -288,25 +288,27 @@ TRACE_EVENT(random_read,
);
TRACE_EVENT(urandom_read,
- TP_PROTO(int got_bits, int pool_left, int input_left),
+ TP_PROTO(const char *pool, int got_bits, int pool_left, int input_left),
- TP_ARGS(got_bits, pool_left, input_left),
+ TP_ARGS(pool, got_bits, pool_left, input_left),
TP_STRUCT__entry(
+ __field( const char *, pool )
__field( int, got_bits )
__field( int, pool_left )
__field( int, input_left )
),
TP_fast_assign(
+ __entry->pool = pool;
__entry->got_bits = got_bits;
__entry->pool_left = pool_left;
__entry->input_left = input_left;
),
TP_printk("got_bits %d nonblocking_pool_entropy_left %d "
- "input_entropy_left %d", __entry->got_bits,
- __entry->pool_left, __entry->input_left)
+ "input_entropy_left %d from %s", __entry->got_bits,
+ __entry->pool_left, __entry->input_left, __entry->pool)
);
#endif /* _TRACE_RANDOM_H */
--
2.4.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-03-01 5:17 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-06 22:05 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
2015-10-06 22:05 ` [PATCH 2/3] random: Make input to output pool balancing per cpu Andi Kleen
2015-10-06 22:18 ` kbuild test robot
2015-10-06 22:05 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
-- strict thread matches above, loose matches on Subject: below --
2016-03-01 5:17 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
2016-03-01 5:17 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
2016-02-10 23:01 Scalable random patchkit revisited Andi Kleen
2016-02-10 23:01 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
2015-09-24 17:19 Updated scalable urandom patchkit Andi Kleen
2015-09-24 17:19 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
2015-09-22 23:16 [PATCH 1/3] Make /dev/urandom scalable Andi Kleen
2015-09-22 23:16 ` [PATCH 3/3] random: Add pool name to urandom_read trace point Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).