* [PATCH 07/13] netvm: Allow the use of __GFP_MEMALLOC by specific sockets
From: Mel Gorman @ 2011-04-27 16:08 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
Allow specific sockets to be tagged SOCK_MEMALLOC and use __GFP_MEMALLOC
for their allocations. These sockets will be able to go below watermarks
and allocate from the emergency reserve. Such sockets are to be used
to service the VM (i.e. to swap over the network). They must be handled
kernel-side; exposing such a socket to user space is a bug.
There is a risk that the reserves will be depleted, so for now the
administrator is responsible for increasing min_free_kbytes as necessary
to prevent deadlock for their workloads.
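A minimal userspace sketch of the flag flow this patch sets up, for readers who want to see it in isolation. It is not kernel code and assumes simplified stand-ins: gfp_t is an unsigned int, struct sock carries only the two relevant fields, and the MODEL_GFP_ATOMIC/MODEL_GFP_KERNEL values are placeholders. Only the __GFP_MEMALLOC value is taken from the gfp.h hunk in patch 03.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: not the kernel's types or flag values. */
typedef unsigned int gfp_t;

#define __GFP_MEMALLOC   0x2000u  /* value from the gfp.h hunk in patch 03 */
#define MODEL_GFP_ATOMIC 0x20u    /* hypothetical placeholder value */
#define MODEL_GFP_KERNEL 0xd0u    /* hypothetical placeholder value */

struct sock {
	bool  memalloc;       /* stands in for SOCK_MEMALLOC in sk_flags */
	gfp_t sk_allocation;  /* per-socket default allocation flags */
};

/* Mirrors sk_set_memalloc(): tag the socket and its default flags. */
static void sk_set_memalloc(struct sock *sk)
{
	sk->memalloc = true;
	sk->sk_allocation |= __GFP_MEMALLOC;
}

/* Mirrors sk_clear_memalloc(): undo both changes. */
static void sk_clear_memalloc(struct sock *sk)
{
	sk->memalloc = false;
	sk->sk_allocation &= ~__GFP_MEMALLOC;
}

/* Mirrors sk_allocation(): only the __GFP_MEMALLOC bit of the socket's
 * flags is added; the caller's own mask is otherwise left untouched. */
static gfp_t sk_allocation(const struct sock *sk, gfp_t gfp_mask)
{
	return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC);
}
```

Because sk_allocation() copies only the __GFP_MEMALLOC bit, a call site that asks for an atomic allocation still gets an atomic allocation, just with reserve access added for tagged sockets.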
[a.p.zijlstra@chello.nl: Original patches]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/net/sock.h | 5 ++++-
net/core/sock.c | 22 ++++++++++++++++++++++
2 files changed, 26 insertions(+), 1 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index e89c38f..046bc97 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -554,6 +554,7 @@ enum sock_flags {
SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+ SOCK_MEMALLOC, /* VM depends on this socket for swapping */
SOCK_TIMESTAMPING_TX_HARDWARE, /* %SOF_TIMESTAMPING_TX_HARDWARE */
SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SOF_TIMESTAMPING_TX_SOFTWARE */
SOCK_TIMESTAMPING_RX_HARDWARE, /* %SOF_TIMESTAMPING_RX_HARDWARE */
@@ -587,7 +588,7 @@ static inline int sock_flag(struct sock *sk, enum sock_flags flag)
static inline gfp_t sk_allocation(struct sock *sk, gfp_t gfp_mask)
{
- return gfp_mask;
+ return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC);
}
static inline void sk_acceptq_removed(struct sock *sk)
@@ -717,6 +718,8 @@ extern int sk_stream_wait_memory(struct sock *sk, long *timeo_p);
extern void sk_stream_wait_close(struct sock *sk, long timeo_p);
extern int sk_stream_error(struct sock *sk, int flags, int err);
extern void sk_stream_kill_queues(struct sock *sk);
+extern void sk_set_memalloc(struct sock *sk);
+extern void sk_clear_memalloc(struct sock *sk);
extern int sk_wait_data(struct sock *sk, long *timeo);
diff --git a/net/core/sock.c b/net/core/sock.c
index 6e81978..c685eda 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -219,6 +219,28 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
EXPORT_SYMBOL(sysctl_optmem_max);
+/**
+ * sk_set_memalloc - sets %SOCK_MEMALLOC
+ * @sk: socket to set it on
+ *
+ * Set %SOCK_MEMALLOC on a socket for access to emergency reserves.
+ * It's the responsibility of the admin to adjust min_free_kbytes
+ * to meet the requirements
+ */
+void sk_set_memalloc(struct sock *sk)
+{
+ sock_set_flag(sk, SOCK_MEMALLOC);
+ sk->sk_allocation |= __GFP_MEMALLOC;
+}
+EXPORT_SYMBOL_GPL(sk_set_memalloc);
+
+void sk_clear_memalloc(struct sock *sk)
+{
+ sock_reset_flag(sk, SOCK_MEMALLOC);
+ sk->sk_allocation &= ~__GFP_MEMALLOC;
+}
+EXPORT_SYMBOL_GPL(sk_clear_memalloc);
+
#if defined(CONFIG_CGROUPS) && !defined(CONFIG_NET_CLS_CGROUP)
int net_cls_subsys_id = -1;
EXPORT_SYMBOL_GPL(net_cls_subsys_id);
--
1.7.3.4
* [PATCH 06/13] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket
From: Mel Gorman @ 2011-04-27 16:08 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
Introduce sk_allocation(), which allows socket-specific flags to be
injected into each socket-related allocation. It is only used on
allocation paths that may be required for writing pages back to network
storage.
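As a rough illustration, the sketch below models the identity hook added here together with the NULL-socket pattern used in tcp_v6_send_reset(), where a reset may be sent without any socket. The struct, the MODEL_GFP_ATOMIC value and the helper name reset_gfp_mask() are hypothetical stand-ins for a userspace build.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model; gfp_t, struct sock and the flag value are stand-ins. */
typedef unsigned int gfp_t;
#define MODEL_GFP_ATOMIC 0x20u  /* hypothetical placeholder value */

struct sock { gfp_t sk_allocation; };

/* In this patch sk_allocation() is an identity hook: it returns the
 * caller's mask unchanged, but gives one place to add per-socket
 * flags later. */
static inline gfp_t sk_allocation(const struct sock *sk, gfp_t gfp_mask)
{
	(void)sk;
	return gfp_mask;
}

/* Pattern from tcp_v6_send_reset(): start from GFP_ATOMIC and only
 * consult the hook when a socket is actually present. */
static gfp_t reset_gfp_mask(const struct sock *sk)
{
	gfp_t gfp_mask = MODEL_GFP_ATOMIC;

	if (sk)
		gfp_mask = sk_allocation(sk, gfp_mask);
	return gfp_mask;
}
```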
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/net/sock.h | 5 +++++
net/ipv4/tcp.c | 3 ++-
net/ipv4/tcp_output.c | 13 +++++++------
net/ipv6/tcp_ipv6.c | 12 +++++++++---
4 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index f2046e4..e89c38f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -585,6 +585,11 @@ static inline int sock_flag(struct sock *sk, enum sock_flags flag)
return test_bit(flag, &sk->sk_flags);
}
+static inline gfp_t sk_allocation(struct sock *sk, gfp_t gfp_mask)
+{
+ return gfp_mask;
+}
+
static inline void sk_acceptq_removed(struct sock *sk)
{
sk->sk_ack_backlog--;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 054a59d..8c1a9d5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -698,7 +698,8 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp)
/* The TCP header must be at least 32-bit aligned. */
size = ALIGN(size, 4);
- skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
+ skb = alloc_skb_fclone(size + sk->sk_prot->max_header,
+ sk_allocation(sk, gfp));
if (skb) {
if (sk_wmem_schedule(sk, skb->truesize)) {
/*
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 17388c7..6157ced 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2324,7 +2324,7 @@ void tcp_send_fin(struct sock *sk)
/* Socket is locked, keep trying until memory is available. */
for (;;) {
skb = alloc_skb_fclone(MAX_TCP_HEADER,
- sk->sk_allocation);
+ sk_allocation(sk, GFP_KERNEL));
if (skb)
break;
yield();
@@ -2350,7 +2350,7 @@ void tcp_send_active_reset(struct sock *sk, gfp_t priority)
struct sk_buff *skb;
/* NOTE: No TCP options attached and we never retransmit this. */
- skb = alloc_skb(MAX_TCP_HEADER, priority);
+ skb = alloc_skb(MAX_TCP_HEADER, sk_allocation(sk, priority));
if (!skb) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTFAILED);
return;
@@ -2423,7 +2423,8 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
if (cvp != NULL && cvp->s_data_constant && cvp->s_data_desired)
s_data_desired = cvp->s_data_desired;
- skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1, GFP_ATOMIC);
+ skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1,
+ sk_allocation(sk, GFP_ATOMIC));
if (skb == NULL)
return NULL;
@@ -2719,7 +2720,7 @@ void tcp_send_ack(struct sock *sk)
* tcp_transmit_skb() will set the ownership to this
* sock.
*/
- buff = alloc_skb(MAX_TCP_HEADER, GFP_ATOMIC);
+ buff = alloc_skb(MAX_TCP_HEADER, sk_allocation(sk, GFP_ATOMIC));
if (buff == NULL) {
inet_csk_schedule_ack(sk);
inet_csk(sk)->icsk_ack.ato = TCP_ATO_MIN;
@@ -2734,7 +2735,7 @@ void tcp_send_ack(struct sock *sk)
/* Send it off, this clears delayed acks for us. */
TCP_SKB_CB(buff)->when = tcp_time_stamp;
- tcp_transmit_skb(sk, buff, 0, GFP_ATOMIC);
+ tcp_transmit_skb(sk, buff, 0, sk_allocation(sk, GFP_ATOMIC));
}
/* This routine sends a packet with an out of date sequence
@@ -2754,7 +2755,7 @@ static int tcp_xmit_probe_skb(struct sock *sk, int urgent)
struct sk_buff *skb;
/* We don't queue it, tcp_transmit_skb() sets ownership. */
- skb = alloc_skb(MAX_TCP_HEADER, GFP_ATOMIC);
+ skb = alloc_skb(MAX_TCP_HEADER, sk_allocation(sk, GFP_ATOMIC));
if (skb == NULL)
return -1;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2f6c671..4c11638 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -597,7 +597,8 @@ static int tcp_v6_md5_do_add(struct sock *sk, struct in6_addr *peer,
} else {
/* reallocate new list if current one is full. */
if (!tp->md5sig_info) {
- tp->md5sig_info = kzalloc(sizeof(*tp->md5sig_info), GFP_ATOMIC);
+ tp->md5sig_info = kzalloc(sizeof(*tp->md5sig_info),
+ sk_allocation(sk, GFP_ATOMIC));
if (!tp->md5sig_info) {
kfree(newkey);
return -ENOMEM;
@@ -610,7 +611,8 @@ static int tcp_v6_md5_do_add(struct sock *sk, struct in6_addr *peer,
}
if (tp->md5sig_info->alloced6 == tp->md5sig_info->entries6) {
keys = kmalloc((sizeof (tp->md5sig_info->keys6[0]) *
- (tp->md5sig_info->entries6 + 1)), GFP_ATOMIC);
+ (tp->md5sig_info->entries6 + 1)),
+ sk_allocation(sk, GFP_ATOMIC));
if (!keys) {
tcp_free_md5sig_pool();
@@ -734,7 +736,8 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_md5sig_info *p;
- p = kzalloc(sizeof(struct tcp_md5sig_info), GFP_KERNEL);
+ p = kzalloc(sizeof(struct tcp_md5sig_info),
+ sk_allocation(sk, GFP_KERNEL));
if (!p)
return -ENOMEM;
@@ -1084,6 +1087,7 @@ static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
struct tcphdr *th = tcp_hdr(skb);
u32 seq = 0, ack_seq = 0;
struct tcp_md5sig_key *key = NULL;
+ gfp_t gfp_mask = GFP_ATOMIC;
if (th->rst)
return;
@@ -1095,6 +1099,8 @@ static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
if (sk)
key = tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr);
#endif
+ if (sk)
+ gfp_mask = sk_allocation(sk, gfp_mask);
if (th->ack)
seq = ntohl(th->ack_seq);
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 05/13] mm: Ignore mempolicies when using ALLOC_NO_WATERMARK
From: Mel Gorman @ 2011-04-27 16:08 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
The reserve is proportionally distributed over all !highmem zones in the
system, so an emergency allocation needs access to all zones. To provide
that, the allocation must break out of any mempolicy boundaries it might
have.
In my opinion this does not break mempolicies, as those are user-oriented
rather than system-oriented. That is, system allocations are not guaranteed
to be within mempolicy boundaries; for instance, IRQs don't even have a
mempolicy. So breaking out of mempolicy boundaries for 'rare' emergency
allocations, which are always system allocations (as opposed to user
allocations), is OK.
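A simplified userspace sketch of the zonelist choice this patch makes. It assumes node ids can stand in for zonelists and uses an illustrative ALLOC_NO_WATERMARKS value; struct alloc_ctx and pick_zonelist_node() are hypothetical names. Note that in the actual hunk only the zonelist is replaced, with node_zonelist(numa_node_id(), gfp_mask); the nodemask is still passed through unchanged.

```c
#include <assert.h>

/* Userspace model: node ids stand in for zonelists. The flag value is
 * illustrative; struct alloc_ctx and pick_zonelist_node() are hypothetical. */
#define ALLOC_NO_WATERMARKS 0x04

struct alloc_ctx {
	int policy_node;  /* node whose zonelist the mempolicy selected */
	int local_node;   /* numa_node_id() of the allocating CPU */
};

/* Returns the node whose zonelist the allocator will walk. */
static int pick_zonelist_node(const struct alloc_ctx *ctx, int alloc_flags)
{
	/* Emergency allocations are system-oriented: ignore the mempolicy
	 * and start from the local node's zonelist instead. */
	if (alloc_flags & ALLOC_NO_WATERMARKS)
		return ctx->local_node;
	return ctx->policy_node;
}
```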
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/page_alloc.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7e7d9ce..5ff1f71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2085,6 +2085,13 @@ restart:
rebalance:
/* Allocate without watermarks if the context allows */
if (alloc_flags & ALLOC_NO_WATERMARKS) {
+ /*
+ * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
+ * the allocation is high priority and these type of
+ * allocations are system rather than user orientated
+ */
+ zonelist = node_zonelist(numa_node_id(), gfp_mask);
+
page = __alloc_pages_high_priority(gfp_mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, migratetype);
--
1.7.3.4
* [PATCH 04/13] mm: allow PF_MEMALLOC from softirq context
From: Mel Gorman @ 2011-04-27 16:08 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
This is needed to allow network softirq packet processing to make use
of PF_MEMALLOC.
Currently softirq context cannot use PF_MEMALLOC because it is not
associated with a task, and therefore has no task flags to fiddle with;
thus the gfp to alloc flag mapping ignores the task flags when in
interrupt (hard or soft) context.
Allowing softirqs to make use of PF_MEMALLOC therefore requires some trickery.
We basically borrow the task flags from whatever process happens to be
preempted by the softirq.
So we modify the gfp to alloc flags mapping to not exclude task flags in
softirq context, and modify the softirq code to save, clear and restore
the PF_MEMALLOC flag.
The save and clear ensures the preempted task's PF_MEMALLOC flag doesn't
leak into the softirq. The restore ensures a softirq's PF_MEMALLOC flag
cannot leak back into the preempted process.
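The save/clear/restore sequence can be sketched in userspace as follows. tsk_restore_flags() mirrors the helper this patch adds to include/linux/sched.h; struct task, the PF_MEMALLOC value, model_do_softirq() and the example handler are assumptions made for illustration, not kernel interfaces.

```c
#include <assert.h>

/* Userspace model: struct task stands in for task_struct and the
 * PF_MEMALLOC value is illustrative. */
#define PF_MEMALLOC 0x00000800UL

struct task { unsigned long flags; };

/* Same logic as the tsk_restore_flags() helper in the sched.h hunk. */
static inline void tsk_restore_flags(struct task *p,
				     unsigned long pflags, unsigned long mask)
{
	p->flags &= ~mask;          /* drop whatever the softirq left set */
	p->flags |= pflags & mask;  /* put back the preempted task's bits */
}

typedef void (*softirq_fn)(struct task *t);

/* Hypothetical stand-in for __do_softirq(): save, clear, run, restore. */
static void model_do_softirq(struct task *current_task, softirq_fn fn)
{
	unsigned long pflags = current_task->flags;     /* save */

	current_task->flags &= ~PF_MEMALLOC;            /* clear */
	fn(current_task);
	tsk_restore_flags(current_task, pflags, PF_MEMALLOC);  /* restore */
}

static int saw_memalloc_on_entry;

/* Example handler: records what it inherited, then sets PF_MEMALLOC. */
static void memalloc_softirq(struct task *t)
{
	saw_memalloc_on_entry = (t->flags & PF_MEMALLOC) != 0;
	t->flags |= PF_MEMALLOC;
}
```

The mask argument keeps the restore narrow: only PF_MEMALLOC is reset to its saved value, so any other flag changes made in the meantime are left alone.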
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/sched.h | 7 +++++++
kernel/softirq.c | 3 +++
mm/page_alloc.c | 5 ++++-
3 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3f7d3f9..e87bb68 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1822,6 +1822,13 @@ static inline void rcu_copy_process(struct task_struct *p)
#endif
+static inline void tsk_restore_flags(struct task_struct *p,
+ unsigned long pflags, unsigned long mask)
+{
+ p->flags &= ~mask;
+ p->flags |= pflags & mask;
+}
+
#ifdef CONFIG_SMP
extern int set_cpus_allowed_ptr(struct task_struct *p,
const struct cpumask *new_mask);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 1396017..2817c27 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -210,6 +210,8 @@ asmlinkage void __do_softirq(void)
__u32 pending;
int max_restart = MAX_SOFTIRQ_RESTART;
int cpu;
+ unsigned long pflags = current->flags;
+ current->flags &= ~PF_MEMALLOC;
pending = local_softirq_pending();
account_system_vtime(current);
@@ -265,6 +267,7 @@ restart:
account_system_vtime(current);
__local_bh_enable(SOFTIRQ_OFFSET);
+ tsk_restore_flags(current, pflags, PF_MEMALLOC);
}
#ifndef __ARCH_HAS_DO_SOFTIRQ
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0f04b7b..7e7d9ce 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2005,7 +2005,10 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
if (gfp_mask & __GFP_MEMALLOC)
alloc_flags |= ALLOC_NO_WATERMARKS;
- else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt())
+ else if (!in_irq() && (current->flags & PF_MEMALLOC))
+ alloc_flags |= ALLOC_NO_WATERMARKS;
+ else if (!in_interrupt() &&
+ unlikely(test_thread_flag(TIF_MEMDIE)))
alloc_flags |= ALLOC_NO_WATERMARKS;
}
--
1.7.3.4
* [PATCH 03/13] mm: Introduce __GFP_MEMALLOC to allow access to emergency reserves
From: Mel Gorman @ 2011-04-27 16:08 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
__GFP_MEMALLOC will allow the allocation to disregard the watermarks,
much like PF_MEMALLOC. It allows the memalloc state to be passed along in
object-related allocation flags, such as sk->sk_allocation, as opposed to
task-related flags. This removes the need for ALLOC_PFMEMALLOC, as callers
using __GFP_MEMALLOC can get the ALLOC_NO_WATERMARKS flag, which is now
enough to identify allocations related to page reclaim.
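The flag precedence can be modelled as a small pure function. This sketch follows the ALLOC_NO_WATERMARKS decision in gfp_to_alloc_flags() as it reads once patch 04 is applied on top of this one. The __GFP_* values come from the gfp.h hunk; PF_MEMALLOC, ALLOC_NO_WATERMARKS and struct ctx are illustrative assumptions, and in_interrupt() is simplified to "hard or soft IRQ context".

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the ALLOC_NO_WATERMARKS decision. */
typedef unsigned int gfp_t;
#define __GFP_MEMALLOC      0x2000u   /* from the gfp.h hunk */
#define __GFP_NOMEMALLOC    0x10000u  /* from the gfp.h hunk */
#define PF_MEMALLOC         0x800UL   /* illustrative */
#define ALLOC_NO_WATERMARKS 0x04      /* illustrative */

struct ctx {
	unsigned long task_flags;  /* current->flags */
	bool in_hardirq;           /* in_irq() */
	bool in_softirq;           /* softirq part of in_interrupt() */
	bool memdie;               /* TIF_MEMDIE: task is being OOM killed */
};

static int no_watermarks(gfp_t gfp_mask, const struct ctx *c)
{
	bool in_interrupt = c->in_hardirq || c->in_softirq;

	if (gfp_mask & __GFP_NOMEMALLOC)   /* takes precedence over all */
		return 0;
	if (gfp_mask & __GFP_MEMALLOC)     /* object-based: any context */
		return ALLOC_NO_WATERMARKS;
	if (!c->in_hardirq && (c->task_flags & PF_MEMALLOC))
		return ALLOC_NO_WATERMARKS;    /* task-based, softirq allowed */
	if (!in_interrupt && c->memdie)
		return ALLOC_NO_WATERMARKS;
	return 0;
}
```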
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/gfp.h | 10 ++++++++--
include/linux/mm_types.h | 2 +-
mm/page_alloc.c | 14 ++++++--------
mm/slab.c | 2 +-
4 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index bfb8f93..7aa80f0 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -23,6 +23,7 @@ struct vm_area_struct;
#define ___GFP_REPEAT 0x400u
#define ___GFP_NOFAIL 0x800u
#define ___GFP_NORETRY 0x1000u
+#define ___GFP_MEMALLOC 0x2000u
#define ___GFP_COMP 0x4000u
#define ___GFP_ZERO 0x8000u
#define ___GFP_NOMEMALLOC 0x10000u
@@ -75,9 +76,14 @@ struct vm_area_struct;
#define __GFP_REPEAT ((__force gfp_t)___GFP_REPEAT) /* See above */
#define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL) /* See above */
#define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY) /* See above */
+#define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC)/* Allow access to emergency reserves */
#define __GFP_COMP ((__force gfp_t)___GFP_COMP) /* Add compound page metadata */
#define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) /* Return zeroed page on success */
-#define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) /* Don't use emergency reserves */
+#define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) /* Don't use emergency reserves.
+ * This takes precedence over the
+ * __GFP_MEMALLOC flag if both are
+ * set
+ */
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) /* Enforce hardwall cpuset memory allocs */
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)/* No fallback, no policies */
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */
@@ -127,7 +133,7 @@ struct vm_area_struct;
/* Control page allocator reclaim behavior */
#define GFP_RECLAIM_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS|\
__GFP_NOWARN|__GFP_REPEAT|__GFP_NOFAIL|\
- __GFP_NORETRY|__GFP_NOMEMALLOC)
+ __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC)
/* Control slab gfp mask during early boot */
#define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_WAIT|__GFP_IO|__GFP_FS))
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5630d27..b890a0b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -72,7 +72,7 @@ struct page {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* SLUB: freelist req. slab lock */
bool pfmemalloc; /* If set by the page allocator,
- * ALLOC_PFMEMALLOC was set and the
+ * ALLOC_NO_WATERMARKS was set and the
* low watermark was not met implying
* that the system is under some
* pressure. The caller should try
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 178d792..0f04b7b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1367,7 +1367,6 @@ failed:
#define ALLOC_HARDER 0x10 /* try to alloc harder */
#define ALLOC_HIGH 0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET 0x40 /* check for correct cpuset */
-#define ALLOC_PFMEMALLOC 0x80 /* Caller has PF_MEMALLOC set */
#ifdef CONFIG_FAIL_PAGE_ALLOC
@@ -2003,11 +2002,10 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
- if ((current->flags & PF_MEMALLOC) ||
- unlikely(test_thread_flag(TIF_MEMDIE))) {
- alloc_flags |= ALLOC_PFMEMALLOC;
-
- if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt())
+ if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
+ if (gfp_mask & __GFP_MEMALLOC)
+ alloc_flags |= ALLOC_NO_WATERMARKS;
+ else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt())
alloc_flags |= ALLOC_NO_WATERMARKS;
}
@@ -2016,7 +2014,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
{
- return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_PFMEMALLOC);
+ return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
}
static inline struct page *
@@ -2218,7 +2216,7 @@ got_pg:
* steps that will free more memory. The caller should avoid the
* page being used for !PFMEMALLOC purposes.
*/
- page->pfmemalloc = !!(alloc_flags & ALLOC_PFMEMALLOC);
+ page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
return page;
}
diff --git a/mm/slab.c b/mm/slab.c
index d0161f2..342c7c7 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2978,7 +2978,7 @@ static int cache_grow(struct kmem_cache *cachep,
if (!slabp)
goto opps1;
- /* Record if ALLOC_PFMEMALLOC was set when allocating the slab */
+ /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
if (pfmemalloc) {
struct array_cache *ac = cpu_cache_get(cachep);
slabp->pfmemalloc = true;
--
1.7.3.4
* [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: Mel Gorman @ 2011-04-27 16:08 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
Allocations of pages below the min watermark run a risk of the machine
hanging due to lack of memory. To prevent this, only callers who have
PF_MEMALLOC or TIF_MEMDIE set and are not processing an interrupt are
allowed to allocate with ALLOC_NO_WATERMARKS. Once these pages are allocated
to a slab, though, nothing prevents other callers from consuming free objects
within those slabs. This patch limits access to slab pages that were
allocated from the PFMEMALLOC reserves.
Pages allocated from the reserve are returned with page->pfmemalloc
set and it's up to the caller to determine how the page should be
protected. SLAB restricts access to any page with page->pfmemalloc set
to callers which are known to be able to access the PFMEMALLOC reserve. If
a !PFMEMALLOC object is not available, an attempt is made to allocate a new
page rather than use a reserve. SLUB is a bit more relaxed in that it only
records whether the current per-CPU page was allocated from the PFMEMALLOC
reserve and uses another partial slab if the caller does not have the
necessary GFP or process flags. In tests this was found to be sufficient to
avoid hangs, because SLUB generally maintains smaller lists than SLAB.
In low-memory conditions it does mean that !PFMEMALLOC allocators
can fail a slab allocation even though free objects are available,
because they are being preserved for callers that are freeing pages.
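The SLAB side relies on tagging bit 0 of cached object pointers. Below is a minimal userspace sketch of that tagging and of the "find a non-reserve object and swap it" search in ac_get_obj(). It assumes objects are at least 2-byte aligned so bit 0 is always clear; struct obj_cache, cache_put() and cache_get() are hypothetical simplifications, the force_refill path is omitted, and this sketch scans every remaining entry starting at index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of pfmemalloc tagging in an array_cache-like stack. */
#define SLAB_OBJ_PFMEMALLOC ((uintptr_t)1)
#define CACHE_SIZE 8

struct obj_cache {
	unsigned int avail;
	void *entry[CACHE_SIZE];
};

static bool is_obj_pfmemalloc(void *objp)
{
	return ((uintptr_t)objp & SLAB_OBJ_PFMEMALLOC) != 0;
}

static void *tag(void *objp)
{
	return (void *)((uintptr_t)objp | SLAB_OBJ_PFMEMALLOC);
}

static void *untag(void *objp)
{
	return (void *)((uintptr_t)objp & ~SLAB_OBJ_PFMEMALLOC);
}

/* Like ac_put_obj(): objects from a reserve slab are stored tagged. */
static void cache_put(struct obj_cache *ac, void *objp, bool from_reserve)
{
	assert(ac->avail < CACHE_SIZE);
	ac->entry[ac->avail++] = from_reserve ? tag(objp) : objp;
}

/* Pop an object; callers without reserve access only get untagged ones. */
static void *cache_get(struct obj_cache *ac, bool pfmemalloc_allowed)
{
	void *objp;
	unsigned int i;

	if (ac->avail == 0)
		return NULL;
	objp = ac->entry[--ac->avail];
	if (!is_obj_pfmemalloc(objp))
		return objp;
	if (pfmemalloc_allowed)
		return untag(objp);

	/* Swap an untagged object out; the tagged one stays cached. */
	for (i = 0; i < ac->avail; i++) {
		if (!is_obj_pfmemalloc(ac->entry[i])) {
			void *found = ac->entry[i];
			ac->entry[i] = objp;
			return found;
		}
	}

	ac->avail++;  /* nothing suitable: leave the tagged object in place */
	return NULL;
}
```

In the patch, a NULL from this path is meant to lead to a new page being allocated (see the force_refill parameter of cache_alloc_refill()) rather than the reserve object being handed out.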
[a.p.zijlstra@chello.nl: Original implementation]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mm_types.h | 8 ++
include/linux/slub_def.h | 1 +
mm/internal.h | 3 +
mm/page_alloc.c | 27 +++++-
mm/slab.c | 216 +++++++++++++++++++++++++++++++++++++++-------
mm/slub.c | 35 ++++++--
6 files changed, 246 insertions(+), 44 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ca01ab2..5630d27 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -71,6 +71,14 @@ struct page {
union {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* SLUB: freelist req. slab lock */
+ bool pfmemalloc; /* If set by the page allocator,
+ * ALLOC_PFMEMALLOC was set and the
+ * low watermark was not met implying
+ * that the system is under some
+ * pressure. The caller should try
+ * ensure this page is only used to
+ * free other pages.
+ */
};
struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 45ca123..639aace 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -42,6 +42,7 @@ struct kmem_cache_cpu {
#endif
struct page *page; /* The slab from which we are allocating */
int node; /* The node of the page (or -1 for debug) */
+ bool pfmemalloc; /* Slab page had pfmemalloc set */
#ifdef CONFIG_SLUB_STATS
unsigned stat[NR_SLUB_STAT_ITEMS];
#endif
diff --git a/mm/internal.h b/mm/internal.h
index d071d380..a520f3b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -193,6 +193,9 @@ static inline struct page *mem_map_next(struct page *iter,
#define __paginginit __init
#endif
+/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
+bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
+
/* Memory initialisation debug and verification */
enum mminit_level {
MMINIT_WARNING,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a93013a..178d792 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -654,6 +654,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
trace_mm_page_free_direct(page, order);
kmemcheck_free_shadow(page, order);
+ page->pfmemalloc = false;
if (PageAnon(page))
page->mapping = NULL;
for (i = 0; i < (1 << order); i++)
@@ -1172,6 +1173,7 @@ void free_hot_cold_page(struct page *page, int cold)
migratetype = get_pageblock_migratetype(page);
set_page_private(page, migratetype);
+ page->pfmemalloc = false;
local_irq_save(flags);
if (unlikely(wasMlocked))
free_page_mlock(page);
@@ -1365,6 +1367,7 @@ failed:
#define ALLOC_HARDER 0x10 /* try to alloc harder */
#define ALLOC_HIGH 0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET 0x40 /* check for correct cpuset */
+#define ALLOC_PFMEMALLOC 0x80 /* Caller has PF_MEMALLOC set */
#ifdef CONFIG_FAIL_PAGE_ALLOC
@@ -2000,16 +2003,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
- if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
- if (!in_interrupt() &&
- ((current->flags & PF_MEMALLOC) ||
- unlikely(test_thread_flag(TIF_MEMDIE))))
+ if ((current->flags & PF_MEMALLOC) ||
+ unlikely(test_thread_flag(TIF_MEMDIE))) {
+ alloc_flags |= ALLOC_PFMEMALLOC;
+
+ if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt())
alloc_flags |= ALLOC_NO_WATERMARKS;
}
return alloc_flags;
}
+bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
+{
+ return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_PFMEMALLOC);
+}
+
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -2202,8 +2211,16 @@ nopage:
got_pg:
if (kmemcheck_enabled)
kmemcheck_pagealloc_alloc(page, order, gfp_mask);
- return page;
+ /*
+ * page->pfmemalloc is set when the caller had PFMEMALLOC set or is
+ * been OOM killed. The expectation is that the caller is taking
+ * steps that will free more memory. The caller should avoid the
+ * page being used for !PFMEMALLOC purposes.
+ */
+ page->pfmemalloc = !!(alloc_flags & ALLOC_PFMEMALLOC);
+
+ return page;
}
/*
diff --git a/mm/slab.c b/mm/slab.c
index 46a9c16..d0161f2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -120,6 +120,8 @@
#include <asm/tlbflush.h>
#include <asm/page.h>
+#include "internal.h"
+
/*
* DEBUG - 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & SLAB_POISON.
* 0 for faster, smaller code (especially in the critical paths).
@@ -226,6 +228,7 @@ struct slab {
unsigned int inuse; /* num of objs active in slab */
kmem_bufctl_t free;
unsigned short nodeid;
+ bool pfmemalloc; /* Slab had pfmemalloc set */
};
struct slab_rcu __slab_cover_slab_rcu;
};
@@ -247,15 +250,37 @@ struct array_cache {
unsigned int avail;
unsigned int limit;
unsigned int batchcount;
- unsigned int touched;
+ bool touched;
+ bool pfmemalloc;
spinlock_t lock;
void *entry[]; /*
* Must have this definition in here for the proper
* alignment of array_cache. Also simplifies accessing
* the entries.
+ *
+ * Entries should not be directly dereferenced as
+ * entries belonging to slabs marked pfmemalloc will
+ * have the lower bits set SLAB_OBJ_PFMEMALLOC
*/
};
+#define SLAB_OBJ_PFMEMALLOC 1
+static inline bool is_obj_pfmemalloc(void *objp)
+{
+ return (unsigned long)objp & SLAB_OBJ_PFMEMALLOC;
+}
+
+static inline void set_obj_pfmemalloc(void **objp)
+{
+ *objp = (void *)((unsigned long)*objp | SLAB_OBJ_PFMEMALLOC);
+ return;
+}
+
+static inline void clear_obj_pfmemalloc(void **objp)
+{
+ *objp = (void *)((unsigned long)*objp & ~SLAB_OBJ_PFMEMALLOC);
+}
+
/*
* bootstrap: The caches do not work without cpuarrays anymore, but the
* cpuarrays are allocated from the generic caches...
@@ -888,12 +913,100 @@ static struct array_cache *alloc_arraycache(int node, int entries,
nc->avail = 0;
nc->limit = entries;
nc->batchcount = batchcount;
- nc->touched = 0;
+ nc->touched = false;
spin_lock_init(&nc->lock);
}
return nc;
}
+/* Clears ac->pfmemalloc if no slabs have pfmalloc set */
+static void check_ac_pfmemalloc(struct kmem_cache *cachep,
+ struct array_cache *ac)
+{
+ struct kmem_list3 *l3 = cachep->nodelists[numa_mem_id()];
+ struct slab *slabp;
+
+ if (!ac->pfmemalloc)
+ return;
+
+ list_for_each_entry(slabp, &l3->slabs_full, list)
+ if (slabp->pfmemalloc)
+ return;
+
+ list_for_each_entry(slabp, &l3->slabs_partial, list)
+ if (slabp->pfmemalloc)
+ return;
+
+ list_for_each_entry(slabp, &l3->slabs_free, list)
+ if (slabp->pfmemalloc)
+ return;
+
+ ac->pfmemalloc = false;
+}
+
+static void *ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
+ gfp_t flags, bool force_refill)
+{
+ int i;
+ void *objp = ac->entry[--ac->avail];
+
+ /* Ensure the caller is allowed to use objects from PFMEMALLOC slab */
+ if (unlikely(is_obj_pfmemalloc(objp))) {
+ struct kmem_list3 *l3;
+
+ if (gfp_pfmemalloc_allowed(flags)) {
+ clear_obj_pfmemalloc(&objp);
+ return objp;
+ }
+
+ /* The caller cannot use PFMEMALLOC objects, find another one */
+ for (i = 1; i < ac->avail; i++) {
+ /* If a !PFMEMALLOC object is found, swap them */
+ if (!is_obj_pfmemalloc(ac->entry[i])) {
+ objp = ac->entry[i];
+ ac->entry[i] = ac->entry[ac->avail];
+ ac->entry[ac->avail] = objp;
+ return objp;
+ }
+ }
+
+ /*
+ * If there are empty slabs on the slabs_free list and we are
+ * being forced to refill the cache, mark this one !pfmemalloc.
+ */
+ l3 = cachep->nodelists[numa_mem_id()];
+ if (!list_empty(&l3->slabs_free) && force_refill) {
+ struct slab *slabp = virt_to_slab(objp);
+ slabp->pfmemalloc = false;
+ clear_obj_pfmemalloc(&objp);
+ check_ac_pfmemalloc(cachep, ac);
+ return objp;
+ }
+
+ /* No !PFMEMALLOC objects available */
+ ac->avail++;
+ objp = NULL;
+ }
+
+ return objp;
+}
+
+static void ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
+ void *objp)
+{
+ struct slab *slabp;
+
+ /* If there are pfmemalloc slabs, check if the object is part of one */
+ if (unlikely(ac->pfmemalloc)) {
+ slabp = virt_to_slab(objp);
+
+ if (slabp->pfmemalloc)
+ set_obj_pfmemalloc(&objp);
+ }
+
+ ac->entry[ac->avail++] = objp;
+}
+
/*
* Transfer objects in one arraycache to another.
* Locking must be handled by the caller.
@@ -1070,7 +1183,7 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
STATS_INC_ACOVERFLOW(cachep);
__drain_alien_cache(cachep, alien, nodeid);
}
- alien->entry[alien->avail++] = objp;
+ ac_put_obj(cachep, alien, objp);
spin_unlock(&alien->lock);
} else {
spin_lock(&(cachep->nodelists[nodeid])->list_lock);
@@ -1677,7 +1790,8 @@ __initcall(cpucache_init);
* did not request dmaable memory, we might get it, but that
* would be relatively rare and ignorable.
*/
-static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid,
+ bool *pfmemalloc)
{
struct page *page;
int nr_pages;
@@ -1698,6 +1812,7 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
if (!page)
return NULL;
+ *pfmemalloc = page->pfmemalloc;
nr_pages = (1 << cachep->gfporder);
if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
@@ -2130,7 +2245,7 @@ static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
cpu_cache_get(cachep)->avail = 0;
cpu_cache_get(cachep)->limit = BOOT_CPUCACHE_ENTRIES;
cpu_cache_get(cachep)->batchcount = 1;
- cpu_cache_get(cachep)->touched = 0;
+ cpu_cache_get(cachep)->touched = false;
cachep->batchcount = 1;
cachep->limit = BOOT_CPUCACHE_ENTRIES;
return 0;
@@ -2677,6 +2792,7 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
slabp->s_mem = objp + colour_off;
slabp->nodeid = nodeid;
slabp->free = 0;
+ slabp->pfmemalloc = false;
return slabp;
}
@@ -2808,7 +2924,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
* kmem_cache_alloc() when there are no active objs left in a cache.
*/
static int cache_grow(struct kmem_cache *cachep,
- gfp_t flags, int nodeid, void *objp)
+ gfp_t flags, int nodeid, void *objp, bool pfmemalloc)
{
struct slab *slabp;
size_t offset;
@@ -2852,7 +2968,7 @@ static int cache_grow(struct kmem_cache *cachep,
* 'nodeid'.
*/
if (!objp)
- objp = kmem_getpages(cachep, local_flags, nodeid);
+ objp = kmem_getpages(cachep, local_flags, nodeid, &pfmemalloc);
if (!objp)
goto failed;
@@ -2862,6 +2978,13 @@ static int cache_grow(struct kmem_cache *cachep,
if (!slabp)
goto opps1;
+ /* Record if ALLOC_PFMEMALLOC was set when allocating the slab */
+ if (pfmemalloc) {
+ struct array_cache *ac = cpu_cache_get(cachep);
+ slabp->pfmemalloc = true;
+ ac->pfmemalloc = 1;
+ }
+
slab_map_pages(cachep, slabp, objp);
cache_init_objs(cachep, slabp);
@@ -3003,16 +3126,19 @@ bad:
#define check_slabp(x,y) do { } while(0)
#endif
-static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
+static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
+ bool force_refill)
{
int batchcount;
struct kmem_list3 *l3;
struct array_cache *ac;
int node;
-retry:
check_irq_off();
node = numa_mem_id();
+ if (unlikely(force_refill))
+ goto force_grow;
+retry:
ac = cpu_cache_get(cachep);
batchcount = ac->batchcount;
if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
@@ -3030,7 +3156,7 @@ retry:
/* See if we can refill from the shared array */
if (l3->shared && transfer_objects(ac, l3->shared, batchcount)) {
- l3->shared->touched = 1;
+ l3->shared->touched = true;
goto alloc_done;
}
@@ -3062,8 +3188,8 @@ retry:
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);
- ac->entry[ac->avail++] = slab_get_obj(cachep, slabp,
- node);
+ ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
+ node));
}
check_slabp(cachep, slabp);
@@ -3082,18 +3208,25 @@ alloc_done:
if (unlikely(!ac->avail)) {
int x;
- x = cache_grow(cachep, flags | GFP_THISNODE, node, NULL);
+force_grow:
+ x = cache_grow(cachep, flags | GFP_THISNODE, node, NULL, false);
/* cache_grow can reenable interrupts, then ac could change. */
ac = cpu_cache_get(cachep);
- if (!x && ac->avail == 0) /* no objects in sight? abort */
+
+ /* no objects in sight? abort */
+ if (!x && (ac->avail == 0 || force_refill))
return NULL;
- if (!ac->avail) /* objects refilled by interrupt? */
+ /* objects refilled by interrupt? */
+ if (!ac->avail) {
+ node = numa_node_id();
goto retry;
+ }
}
- ac->touched = 1;
- return ac->entry[--ac->avail];
+ ac->touched = true;
+
+ return ac_get_obj(cachep, ac, flags, force_refill);
}
static inline void cache_alloc_debugcheck_before(struct kmem_cache *cachep,
@@ -3176,23 +3309,35 @@ static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *objp;
struct array_cache *ac;
+ bool force_refill = false;
check_irq_off();
ac = cpu_cache_get(cachep);
if (likely(ac->avail)) {
- STATS_INC_ALLOCHIT(cachep);
- ac->touched = 1;
- objp = ac->entry[--ac->avail];
- } else {
- STATS_INC_ALLOCMISS(cachep);
- objp = cache_alloc_refill(cachep, flags);
+ ac->touched = true;
+ objp = ac_get_obj(cachep, ac, flags, false);
+
/*
- * the 'ac' may be updated by cache_alloc_refill(),
- * and kmemleak_erase() requires its correct value.
+ * Allow for the possibility all avail objects are not allowed
+ * by the current flags
*/
- ac = cpu_cache_get(cachep);
+ if (objp) {
+ STATS_INC_ALLOCHIT(cachep);
+ goto out;
+ }
+ force_refill = true;
}
+
+ STATS_INC_ALLOCMISS(cachep);
+ objp = cache_alloc_refill(cachep, flags, force_refill);
+ /*
+ * the 'ac' may be updated by cache_alloc_refill(),
+ * and kmemleak_erase() requires its correct value.
+ */
+ ac = cpu_cache_get(cachep);
+
+out:
/*
* To avoid a false negative, if an object that is in one of the
* per-CPU caches is leaked, we need to make sure kmemleak doesn't
@@ -3245,6 +3390,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
enum zone_type high_zoneidx = gfp_zone(flags);
void *obj = NULL;
int nid;
+ bool pfmemalloc;
if (flags & __GFP_THISNODE)
return NULL;
@@ -3281,7 +3427,8 @@ retry:
if (local_flags & __GFP_WAIT)
local_irq_enable();
kmem_flagcheck(cache, flags);
- obj = kmem_getpages(cache, local_flags, numa_mem_id());
+ obj = kmem_getpages(cache, local_flags, numa_mem_id(),
+ &pfmemalloc);
if (local_flags & __GFP_WAIT)
local_irq_disable();
if (obj) {
@@ -3289,7 +3436,7 @@ retry:
* Insert into the appropriate per node queues
*/
nid = page_to_nid(virt_to_page(obj));
- if (cache_grow(cache, flags, nid, obj)) {
+ if (cache_grow(cache, flags, nid, obj, pfmemalloc)) {
obj = ____cache_alloc_node(cache,
flags | GFP_THISNODE, nid);
if (!obj)
@@ -3361,7 +3508,7 @@ retry:
must_grow:
spin_unlock(&l3->list_lock);
- x = cache_grow(cachep, flags | GFP_THISNODE, nodeid, NULL);
+ x = cache_grow(cachep, flags | GFP_THISNODE, nodeid, NULL, false);
if (x)
goto retry;
@@ -3511,9 +3658,12 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
struct kmem_list3 *l3;
for (i = 0; i < nr_objects; i++) {
- void *objp = objpp[i];
+ void *objp;
struct slab *slabp;
+ clear_obj_pfmemalloc(&objpp[i]);
+ objp = objpp[i];
+
slabp = virt_to_slab(objp);
l3 = cachep->nodelists[node];
list_del(&slabp->list);
@@ -3625,12 +3775,12 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp)
if (likely(ac->avail < ac->limit)) {
STATS_INC_FREEHIT(cachep);
- ac->entry[ac->avail++] = objp;
+ ac_put_obj(cachep, ac, objp);
return;
} else {
STATS_INC_FREEMISS(cachep);
cache_flusharray(cachep, ac);
- ac->entry[ac->avail++] = objp;
+ ac_put_obj(cachep, ac, objp);
}
}
@@ -4056,7 +4206,7 @@ static void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
if (!ac || !ac->avail)
return;
if (ac->touched && !force) {
- ac->touched = 0;
+ ac->touched = false;
} else {
spin_lock_irq(&l3->list_lock);
if (ac->avail) {
diff --git a/mm/slub.c b/mm/slub.c
index df77f78..6707d2e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -30,6 +30,8 @@
#include <trace/events/kmem.h>
+#include "internal.h"
+
/*
* Lock order:
* 1. slab_lock(page)
@@ -1219,7 +1221,8 @@ static void setup_object(struct kmem_cache *s, struct page *page,
s->ctor(object);
}
-static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node,
+ bool *pfmemalloc)
{
struct page *page;
void *start;
@@ -1234,6 +1237,7 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
goto out;
inc_slabs_node(s, page_to_nid(page), page->objects);
+ *pfmemalloc = page->pfmemalloc;
page->slab = s;
page->flags |= 1 << PG_slab;
@@ -1757,6 +1761,16 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
}
}
+#define SLAB_PAGE_PFMEMALLOC 1
+
+static inline bool pfmemalloc_match(struct kmem_cache_cpu *c, gfp_t gfpflags)
+{
+ if (unlikely(c->pfmemalloc))
+ return gfp_pfmemalloc_allowed(gfpflags);
+
+ return true;
+}
+
/*
* Slow path. The lockless freelist is empty or we need to perform
* debugging duties.
@@ -1780,6 +1794,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
{
void **object;
struct page *new;
+ bool pfmemalloc = false;
#ifdef CONFIG_CMPXCHG_LOCAL
unsigned long flags;
@@ -1801,7 +1816,13 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
goto new_slab;
slab_lock(c->page);
- if (unlikely(!node_match(c, node)))
+
+ /*
+ * By rights, we should be searching for a slab page that was
+ * PFMEMALLOC but right now, we are losing the pfmemalloc
+ * information when the page leaves the per-cpu allocator
+ */
+ if (unlikely(!pfmemalloc_match(c, gfpflags) || !node_match(c, node)))
goto another_slab;
stat(s, ALLOC_REFILL);
@@ -1841,7 +1862,7 @@ new_slab:
if (gfpflags & __GFP_WAIT)
local_irq_enable();
- new = new_slab(s, gfpflags, node);
+ new = new_slab(s, gfpflags, node, &pfmemalloc);
if (gfpflags & __GFP_WAIT)
local_irq_disable();
@@ -1854,6 +1875,7 @@ new_slab:
slab_lock(new);
__SetPageSlubFrozen(new);
c->page = new;
+ c->pfmemalloc = pfmemalloc;
goto load_freelist;
}
if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
@@ -1922,8 +1944,8 @@ redo:
#endif
object = c->freelist;
- if (unlikely(!object || !node_match(c, node)))
-
+ if (unlikely(!object || !node_match(c, node) ||
+ !pfmemalloc_match(c, gfpflags)))
object = __slab_alloc(s, gfpflags, node, addr, c);
else {
@@ -2389,10 +2411,11 @@ static void early_kmem_cache_node_alloc(int node)
struct page *page;
struct kmem_cache_node *n;
unsigned long flags;
+ bool pfmemalloc; /* Ignore this early in boot */
BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));
- page = new_slab(kmem_cache_node, GFP_NOWAIT, node);
+ page = new_slab(kmem_cache_node, GFP_NOWAIT, node, &pfmemalloc);
BUG_ON(!page);
if (page_to_nid(page) != node) {
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* [PATCH 01/13] mm: Serialize access to min_free_kbytes
From: Mel Gorman @ 2011-04-27 16:07 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
In-Reply-To: <1303920491-25302-1-git-send-email-mgorman@suse.de>
There is a race between the min_free_kbytes sysctl, memory hotplug
and transparent hugepage support enablement. Memory hotplug uses a
zonelists_mutex to avoid a race when building zonelists. Reuse it to
serialise watermark updates.
[a.p.zijlstra@chello.nl: Older patch fixed the race with spinlock]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
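A minimal userspace sketch of the locking pattern described above: an unlocked worker plus a wrapper that takes the mutex, so concurrent recalculations cannot interleave. It assumes pthreads in place of the kernel mutex API; zonelists_mutex, min_free_kbytes, wmark_min_pages and the 4 KiB page size are stand-ins, not the real mm/page_alloc.c state.

```c
#include <pthread.h>

/* Stand-ins for the kernel's zonelists_mutex and watermark state. */
static pthread_mutex_t zonelists_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long min_free_kbytes = 1024;
static unsigned long wmark_min_pages;

/* Unlocked worker: callers must hold zonelists_mutex. */
static void setup_wmarks_unlocked(void)
{
	/* kbytes >> (PAGE_SHIFT - 10), assuming 4 KiB pages (PAGE_SHIFT == 12). */
	wmark_min_pages = min_free_kbytes >> 2;
}

/* Locked entry point used by the sysctl and hotplug paths alike. */
static void setup_wmarks(void)
{
	pthread_mutex_lock(&zonelists_mutex);
	setup_wmarks_unlocked();
	pthread_mutex_unlock(&zonelists_mutex);
}
```

Splitting the function this way lets a caller that already holds the mutex (as memory hotplug does while rebuilding zonelists) call the unlocked worker directly without deadlocking.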
mm/page_alloc.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1d5c189..a93013a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4990,14 +4990,7 @@ static void setup_per_zone_lowmem_reserve(void)
calculate_totalreserve_pages();
}
-/**
- * setup_per_zone_wmarks - called when min_free_kbytes changes
- * or when memory is hot-{added|removed}
- *
- * Ensures that the watermark[min,low,high] values for each zone are set
- * correctly with respect to min_free_kbytes.
- */
-void setup_per_zone_wmarks(void)
+static void __setup_per_zone_wmarks(void)
{
unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;
@@ -5052,6 +5045,20 @@ void setup_per_zone_wmarks(void)
calculate_totalreserve_pages();
}
+/**
+ * setup_per_zone_wmarks - called when min_free_kbytes changes
+ * or when memory is hot-{added|removed}
+ *
+ * Ensures that the watermark[min,low,high] values for each zone are set
+ * correctly with respect to min_free_kbytes.
+ */
+void setup_per_zone_wmarks(void)
+{
+ mutex_lock(&zonelists_mutex);
+ __setup_per_zone_wmarks();
+ mutex_unlock(&zonelists_mutex);
+}
+
/*
* The inactive anon list should be small enough that the VM never has to
* do too much work, but large enough that each inactive page has a chance
* [PATCH 00/13] Swap-over-NBD without deadlocking v3
From: Mel Gorman @ 2011-04-27 16:07 UTC
To: Linux-MM, Linux-Netdev
Cc: LKML, David Miller, Neil Brown, Peter Zijlstra, Mel Gorman
Changelog since V2
o Document that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC (Neil)
o Use wait_event_interruptible (Neil)
o Use !! when casting to bool to avoid any possibility of type
truncation (Neil)
o Nicer logic when using skb_pfmemalloc_protocol (Neil)
Changelog since V1
o Rebase on top of mmotm
o Use atomic_t for memalloc_socks (David Miller)
o Remove use of sk_memalloc_socks in vmscan (Neil Brown)
o Check throttle within prepare_to_wait (Neil Brown)
o Add statistics on throttling instead of printk
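The first V2 changelog entry states that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC. A minimal sketch of that precedence rule, using hypothetical flag names and values (MY_GFP_MEMALLOC, MY_GFP_NOMEMALLOC) rather than the real gfp.h definitions, and ignoring the other ways (such as the PF_MEMALLOC task flag) a caller can gain reserve access:

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's gfp.h values. */
#define MY_GFP_MEMALLOC   0x1u
#define MY_GFP_NOMEMALLOC 0x2u

/* May an allocation with these flags dip into the emergency reserves? */
static bool may_use_reserves(unsigned int gfp)
{
	if (gfp & MY_GFP_NOMEMALLOC)
		return false;	/* an explicit opt-out always wins */
	return (gfp & MY_GFP_MEMALLOC) != 0;
}
```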
Swapping over NBD is something that is technically possible but not
often advised. While there are a number of guides on the internet
on how to configure it, and nbd-client supports a -swap switch to
"prevent deadlocks", the fact of the matter is that a machine using NBD
for swap can be locked up within minutes if swap is used intensively.
The problem is that network block devices do not use mempools like
normal block devices do. As the host cannot control where it receives
packets from, it cannot reliably work out in advance how much memory
it might need.
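To illustrate the mempool point: a mempool keeps a small number of preallocated elements so that an allocation can still succeed when the normal allocator fails, which guarantees forward progress only if the reserve is sized for the worst case the caller can predict. The toy version below is a sketch with hypothetical names (struct toy_mempool and its functions) and a simulate_oom switch standing in for allocator failure; it is not the kernel's mempool_t API.

```c
#include <stddef.h>
#include <stdlib.h>

/* A fixed reserve of preallocated elements, used only when malloc() fails. */
struct toy_mempool {
	void **reserve;
	size_t nr_free;
	size_t capacity;
	size_t elem_size;
};

static int toy_mempool_init(struct toy_mempool *p, size_t min_nr, size_t elem_size)
{
	p->reserve = calloc(min_nr, sizeof(void *));
	if (!p->reserve)
		return -1;
	p->capacity = min_nr;
	p->elem_size = elem_size;
	for (p->nr_free = 0; p->nr_free < min_nr; p->nr_free++) {
		void *elem = malloc(elem_size);

		if (!elem) {	/* undo partial setup */
			while (p->nr_free > 0)
				free(p->reserve[--p->nr_free]);
			free(p->reserve);
			return -1;
		}
		p->reserve[p->nr_free] = elem;
	}
	return 0;
}

/* Try the normal allocator first; fall back to the reserve on failure. */
static void *toy_mempool_alloc(struct toy_mempool *p, int simulate_oom)
{
	void *obj = simulate_oom ? NULL : malloc(p->elem_size);

	if (obj)
		return obj;
	if (p->nr_free == 0)
		return NULL;	/* reserve exhausted */
	return p->reserve[--p->nr_free];
}

/* Freed elements refill the reserve before going back to the allocator. */
static void toy_mempool_free(struct toy_mempool *p, void *obj)
{
	if (p->nr_free < p->capacity)
		p->reserve[p->nr_free++] = obj;
	else
		free(obj);
}
```

The reserve only helps if min_nr covers what the caller needs to complete one unit of work; a network receiver cannot choose that number, because it cannot bound what will arrive before the packet it actually needs.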
Some years ago, Peter Zijlstra developed a series of patches that
supported swap over NFS, which some distributions are carrying in
their kernels. This patch series borrows very heavily from Peter's work
to support swapping over NBD (the relatively straightforward case)
and uses throttling instead of dynamically resized memory reserves
so the series is not too unwieldy for review.
Patch 1 serialises access to min_free_kbytes. It's not strictly needed
by this series but as the series cares about watermarks in
general, it's a harmless fix. It could be merged independently.
Patch 2 adds knowledge of the PFMEMALLOC reserves to SLAB and SLUB so
that objects on pages allocated in low-memory situations are only
handed out to callers that are freeing memory.
Patch 3 introduces __GFP_MEMALLOC to allow access to the PFMEMALLOC
reserves without setting PFMEMALLOC.
Patch 4 opens the possibility for softirqs to use PFMEMALLOC reserves
for later use by network packet processing.
Patch 5 ignores memory policies when ALLOC_NO_WATERMARKS is set.
Patches 6-9 allow network processing to use PFMEMALLOC reserves when
the socket has been marked as being used by the VM to clean
pages. If packets are received and stored in pages that were
allocated in low-memory situations and are unrelated to
the VM, the packets are dropped.
Patch 10 is a micro-optimisation to avoid a function call in the
common case.
Patch 11 tags NBD sockets as being SOCK_MEMALLOC so they can use
PFMEMALLOC if necessary.
Patch 12 notes that it is still possible for the PFMEMALLOC reserve
to be depleted. To prevent this, direct reclaimers get
throttled on a waitqueue if 50% of the PFMEMALLOC reserves are
depleted. It is expected that kswapd and the direct reclaimers
already running will clean enough pages for the low watermark
to be reached, at which point the throttled processes are woken up.
Patch 13 adds a statistic to track how often processes get throttled.
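Patch 12's throttling rule can be sketched as two predicates. This assumes, as one reading of the description, that the PFMEMALLOC reserve is the memory below the min watermark, so "50% depleted" means free pages have dropped below half of it; struct zone_sample and both function names are hypothetical, not the vmscan.c implementation.

```c
#include <stdbool.h>

/* Hypothetical per-zone snapshot, all values in pages. */
struct zone_sample {
	unsigned long free_pages;
	unsigned long min_wmark;	/* reserve assumed to lie below this */
	unsigned long low_wmark;
};

/* Throttle a direct reclaimer once over half the reserve has been consumed. */
static bool should_throttle_reclaimer(const struct zone_sample *z)
{
	return z->free_pages < z->min_wmark / 2;
}

/* Wake throttled tasks once kswapd has restored the low watermark. */
static bool may_wake_throttled(const struct zone_sample *z)
{
	return z->free_pages >= z->low_wmark;
}
```

The gap between the two thresholds gives hysteresis: a throttled task is not released the moment free memory ticks back above the throttling point.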
Some basic performance testing was run using kernel builds, netperf
on loopback for UDP and TCP, hackbench (pipes and sockets), iozone
and sysbench. Each of them was expected to use the sl*b allocators
reasonably heavily, but there did not appear to be significant
performance variance. Here are the results from netperf using
slab as an example:
NETPERF UDP
netperf-udp udp-swapnbd
vanilla-slab v1r17-slab
64 178.06 ( 0.00%)* 189.46 ( 6.02%)
1.02% 1.00%
128 355.06 ( 0.00%) 370.75 ( 4.23%)
256 662.47 ( 0.00%) 721.62 ( 8.20%)
1024 2229.39 ( 0.00%) 2567.04 (13.15%)
2048 3974.20 ( 0.00%) 4114.70 ( 3.41%)
3312 5619.89 ( 0.00%) 5800.09 ( 3.11%)
4096 6460.45 ( 0.00%) 6702.45 ( 3.61%)
8192 9580.24 ( 0.00%) 9927.97 ( 3.50%)
16384 13259.14 ( 0.00%) 13493.88 ( 1.74%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 2960.17 2540.14
Total Elapsed Time (seconds) 3554.10 3050.10
NETPERF TCP
netperf-tcp tcp-swapnbd
vanilla-slab v1r17-slab
64 1230.29 ( 0.00%) 1273.17 ( 3.37%)
128 2309.97 ( 0.00%) 2375.22 ( 2.75%)
256 3659.32 ( 0.00%) 3704.87 ( 1.23%)
1024 7267.80 ( 0.00%) 7251.02 (-0.23%)
2048 8358.26 ( 0.00%) 8204.74 (-1.87%)
3312 8631.07 ( 0.00%) 8637.62 ( 0.08%)
4096 8770.95 ( 0.00%) 8704.08 (-0.77%)
8192 9749.33 ( 0.00%) 9769.06 ( 0.20%)
16384 11151.71 ( 0.00%) 11135.32 (-0.15%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 1245.04 1619.89
Total Elapsed Time (seconds) 1250.66 1622.18
Here is the equivalent test for SLUB:
NETPERF UDP
netperf-udp udp-swapnbd
vanilla-slub v1r17-slub
64 180.83 ( 0.00%) 183.68 ( 1.55%)
128 357.29 ( 0.00%) 367.11 ( 2.67%)
256 679.64 ( 0.00%)* 724.03 ( 6.13%)
1.15% 1.00%
1024 2343.40 ( 0.00%)* 2610.63 (10.24%)
1.68% 1.00%
2048 3971.53 ( 0.00%) 4102.21 ( 3.19%)*
1.00% 1.40%
3312 5677.04 ( 0.00%) 5748.69 ( 1.25%)
4096 6436.75 ( 0.00%) 6549.41 ( 1.72%)
8192 9698.56 ( 0.00%) 9808.84 ( 1.12%)
16384 13337.06 ( 0.00%) 13404.38 ( 0.50%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 2880.15 2180.13
Total Elapsed Time (seconds) 3458.10 2618.09
NETPERF TCP
netperf-tcp tcp-swapnbd
vanilla-slub v1r17-slub
64 1256.79 ( 0.00%) 1287.32 ( 2.37%)
128 2308.71 ( 0.00%) 2371.09 ( 2.63%)
256 3672.03 ( 0.00%) 3771.05 ( 2.63%)
1024 7245.08 ( 0.00%) 7261.60 ( 0.23%)
2048 8315.17 ( 0.00%) 8244.14 (-0.86%)
3312 8611.43 ( 0.00%) 8616.90 ( 0.06%)
4096 8711.64 ( 0.00%) 8695.97 (-0.18%)
8192 9795.71 ( 0.00%) 9774.11 (-0.22%)
16384 11145.48 ( 0.00%) 11225.70 ( 0.71%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 1345.05 1425.06
Total Elapsed Time (seconds) 1350.61 1430.66
Time to completion varied a lot, but this can happen with netperf as
it repeats runs until the results fall within a sufficiently high
confidence interval. I wouldn't read too much into the performance
gains for netperf-udp, as it can sometimes be affected by code simply
shuffling around for whatever reason.
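Checking a few rows, the percentages in the tables above appear to be computed relative to the patched result rather than the vanilla baseline: for 64-byte UDP on slab, (189.46 - 178.06) / 189.46 = 6.02%, whereas the baseline-relative gain would be about 6.40%. A small sketch of both formulas; the function names are illustrative only, and the first formula is inferred from the numbers, not taken from MMTests documentation.

```c
/* Gain as the tables above appear to report it: relative to the patched value. */
static double gain_vs_patched(double vanilla, double patched)
{
	return (patched - vanilla) / patched * 100.0;
}

/* Conventional gain relative to the vanilla baseline, for comparison. */
static double gain_vs_baseline(double vanilla, double patched)
{
	return (patched - vanilla) / vanilla * 100.0;
}
```

The two conventions agree in sign and differ only slightly for small deltas, so the conclusions drawn above are unaffected either way.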
For testing swap-over-NBD, a machine was booted with 2G of RAM and a
swapfile backed by NBD. 16*NUM_CPU processes were started that create
anonymous memory mappings and read them linearly in a loop. The total
size of the mappings was 4*PHYSICAL_MEMORY, to use swap heavily under
memory pressure. Without the patches, the machine locks up within
minutes; with them applied, the test runs to completion.
drivers/block/nbd.c | 7 +-
include/linux/gfp.h | 13 ++-
include/linux/mm_types.h | 8 ++
include/linux/mmzone.h | 1 +
include/linux/sched.h | 7 ++
include/linux/skbuff.h | 19 +++-
include/linux/slub_def.h | 1 +
include/linux/vm_event_item.h | 1 +
include/net/sock.h | 19 ++++
kernel/softirq.c | 3 +
mm/page_alloc.c | 57 ++++++++--
mm/slab.c | 240 +++++++++++++++++++++++++++++++++++------
mm/slub.c | 35 +++++-
mm/vmscan.c | 55 ++++++++++
mm/vmstat.c | 1 +
net/core/dev.c | 48 ++++++++-
net/core/filter.c | 8 ++
net/core/skbuff.c | 95 ++++++++++++++---
net/core/sock.c | 42 +++++++
net/ipv4/tcp.c | 3 +-
net/ipv4/tcp_output.c | 13 ++-
net/ipv6/tcp_ipv6.c | 12 ++-
22 files changed, 601 insertions(+), 87 deletions(-)
* Re: [ethtool PATCH 4/6] Add support for __be64 and bitops to ethtool
From: Ben Hutchings @ 2011-04-27 15:54 UTC
To: Alexander Duyck; +Cc: davem, jeffrey.t.kirsher, netdev
In-Reply-To: <20110421204035.23054.6918.stgit@gitlad.jf.intel.com>
On Thu, 2011-04-21 at 13:40 -0700, Alexander Duyck wrote:
> This change is meant to add support for __be64 values and bitops to
> ethtool. These changes will be needed in order to support network flow
> classifier rule configuration.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>
> ethtool-bitops.h | 25 +++++++++++++++++++++++++
> ethtool-util.h | 30 ++++++++++++++++++++++++++----
> ethtool.c | 7 -------
> 3 files changed, 51 insertions(+), 11 deletions(-)
> create mode 100644 ethtool-bitops.h
>
> diff --git a/ethtool-bitops.h b/ethtool-bitops.h
> new file mode 100644
> index 0000000..0ff14f1
> --- /dev/null
> +++ b/ethtool-bitops.h
> @@ -0,0 +1,25 @@
> +#ifndef ETHTOOL_BITOPS_H__
> +#define ETHTOOL_BITOPS_H__
> +
> +#define BITS_PER_BYTE 8
> +#define BITS_PER_LONG (BITS_PER_BYTE * sizeof(long))
> +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
> +#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_LONG)
> +
> +static inline void set_bit(int nr, unsigned long *addr)
> +{
> + addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
> +}
> +
> +static inline void clear_bit(int nr, unsigned long *addr)
> +{
> + addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
> +}
> +
> +static __always_inline int test_bit(unsigned int nr, const unsigned long *addr)
> +{
> + return !!((1UL << (nr % BITS_PER_LONG)) &
> + (((unsigned long *)addr)[nr / BITS_PER_LONG]));
> +}
Where is __always_inline supposed to be defined?
> +#endif
> diff --git a/ethtool-util.h b/ethtool-util.h
> index f053028..3d46faf 100644
> --- a/ethtool-util.h
> +++ b/ethtool-util.h
> @@ -5,15 +5,18 @@
>
> #include <sys/types.h>
> #include <endian.h>
> +#include <sys/ioctl.h>
> +#include <net/if.h>
> +#include "ethtool-config.h"
> +#include "ethtool-copy.h"
>
> /* ethtool.h expects these to be defined by <linux/types.h> */
> #ifndef HAVE_BE_TYPES
> typedef __uint16_t __be16;
> typedef __uint32_t __be32;
> +typedef unsigned long long __be64;
> #endif
>
> -#include "ethtool-copy.h"
> -
You can't move the inclusion of ethtool-copy.h; that defeats the whole
purpose of the HAVE_BE_TYPES feature test.
[...]
> +#ifndef ARRAY_SIZE
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +#endif
> +
> +#ifndef SIOCETHTOOL
> +#define SIOCETHTOOL 0x8946
> #endif
>
> /* National Semiconductor DP83815, DP83816 */
> diff --git a/ethtool.c b/ethtool.c
> index 9ad7000..15af86a 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -45,16 +45,9 @@
> #include <linux/sockios.h>
> #include "ethtool-util.h"
>
> -
> -#ifndef SIOCETHTOOL
> -#define SIOCETHTOOL 0x8946
> -#endif
> #ifndef MAX_ADDR_LEN
> #define MAX_ADDR_LEN 32
> #endif
> -#ifndef ARRAY_SIZE
> -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> -#endif
>
> #ifndef HAVE_NETIF_MSG
> enum {
>
Presumably this is needed by the next patch, but it has nothing to do
with what the commit message says.
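On the __always_inline question above, one option is a guarded fallback in ethtool-bitops.h itself, so the header compiles whether or not some earlier include already defined the macro. A sketch assuming GCC or Clang (for __attribute__((always_inline))); it also takes the bit width from CHAR_BIT in <limits.h> and drops the cast that discarded const in test_bit().

```c
#include <limits.h>

/* Fallback only if no earlier header has defined it (GCC/Clang syntax assumed). */
#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

#define BITS_PER_LONG (CHAR_BIT * sizeof(long))
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static inline void set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static inline void clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

/* No cast needed: reading through a const pointer is fine. */
static __always_inline int test_bit(unsigned int nr, const unsigned long *addr)
{
	return !!(addr[nr / BITS_PER_LONG] & (1UL << (nr % BITS_PER_LONG)));
}
```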
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [ethtool PATCH 2/6] Add support for NFC flow classifier extensions
From: Ben Hutchings @ 2011-04-27 15:48 UTC
To: Alexander Duyck; +Cc: davem, jeffrey.t.kirsher, netdev
In-Reply-To: <20110421204025.23054.3310.stgit@gitlad.jf.intel.com>
On Thu, 2011-04-21 at 13:40 -0700, Alexander Duyck wrote:
> This change makes it so that we can add VLAN Ethertype, TCI, and 64bits of
> driver defined data to a network flow classifier allowing us to handle a
> n-tuple flow contained within a network flow classifier.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
[...]
This does not correspond to any version of ethtool.h in kernel history.
I sync'd from current net-next-2.6 instead.
Ben.
* Re: [ethtool PATCH 1/6] Add support for ESP as a separate protocol from AH
From: Ben Hutchings @ 2011-04-27 15:47 UTC
To: Alexander Duyck; +Cc: davem, jeffrey.t.kirsher, netdev
In-Reply-To: <20110421204020.23054.60822.stgit@gitlad.jf.intel.com>
On Thu, 2011-04-21 at 13:40 -0700, Alexander Duyck wrote:
> This change is mostly cosmetic. NIU had supported AH and ESP seperately.
> As such it is possible that a return value of ESP or AH may be returned
> for a has request instead of the AH_ESP combined value. To resolve that
> the inputs are combined for AH and ESP into the AH_ESP value and return
> values for AH and ESP will display the combined string info.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Applied, but I moved this into a separate commit:
[...]
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -32,7 +32,6 @@
> #include <sys/ioctl.h>
> #include <sys/stat.h>
> #include <stdio.h>
> -#include <string.h>
> #include <errno.h>
> #include <net/if.h>
> #include <sys/utsname.h>
[...]
Ben.
* Re: [RFC PATCH] netlink: Increase netlink dump skb message size
From: Steve Hodgson @ 2011-04-27 15:46 UTC
To: Eric Dumazet
Cc: Rose, Gregory V, David Miller, netdev@vger.kernel.org,
bhutchings@solarflare.com
In-Reply-To: <1303834864.3358.58.camel@edumazet-laptop>
On 04/27/2011 04:24 PM, Eric Dumazet wrote:
> Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit :
>
>> I'm fine with however you folks want to approach this, just give me some direction.
>
> I would just try following patch :
>
This allows the sfc driver to use 102 VFs, up from the current limit of
45 VFs.
It's unfortunate that this patch isn't sufficient to allow all 127 VFs
to be used, but whilst we wait for a new netlink API this is an
improvement worth having.
I'd like to see this patch committed.
Thanks Eric,
- Steve
* RE: [PATCH] igb: restore EEPROM 16kB access limit
From: Wyborny, Carolyn @ 2011-04-27 15:01 UTC
To: Andy Gospodarek
Cc: Stefan Assmann, netdev@vger.kernel.org,
e1000-devel@lists.sourceforge.net, Kirsher, Jeffrey T,
Pieper, Jeffrey E, Ronciak, John
In-Reply-To: <20110427141542.GB21309@gospo.rdu.redhat.com>
>-----Original Message-----
>From: Andy Gospodarek [mailto:andy@greyhouse.net]
>Sent: Wednesday, April 27, 2011 7:16 AM
>To: Wyborny, Carolyn
>Cc: Andy Gospodarek; Stefan Assmann; netdev@vger.kernel.org; e1000-
>devel@lists.sourceforge.net; Kirsher, Jeffrey T; Pieper, Jeffrey E;
>Ronciak, John
>Subject: Re: [PATCH] igb: restore EEPROM 16kB access limit
>
>On Tue, Apr 26, 2011 at 08:12:20AM -0700, Wyborny, Carolyn wrote:
>[...]
>> Part of the problem you are seeing is an apparently widespread EEPROM
>problem where the size word in the EEPROM is invalid. Since we didn't
>really check it before it didn't cause a problem. I have a patch coming
>that addresses this by messaging the user that the size is invalid but
>setting it to a default and continuing.
>>
>
>It wasn't really a problem for me until the commit Stefan mentioned
>4322e561a93ec7ee034b603a6c610e7be90d4e8a was applied.
>
>I'm glad you are planning a fix for it, but I hope it can be out soon
>and not held up for too long by other patches planned for the next
>update.
Yes, the problem wasn't there before because of a bug in the code. The bad EEPROMs have apparently been out there a while and are only now being exposed, now that the code is fixed. We didn't see one in our original testing of the fix, or know they were out there, until the reports started coming in.
I'm pushing the fix through as soon as possible. It's in test now. I apologize for the delay.
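The fix described earlier in the thread (tell the user the EEPROM size word is invalid, fall back to a default, and continue) might look roughly like the sketch below. The constants and the function name are hypothetical and are not igb's actual values or code.

```c
#include <stdio.h>

/* Hypothetical limits for illustration only; not igb's real values. */
#define EEPROM_WORDS_DEFAULT 0x4000u
#define EEPROM_WORDS_MAX     0x8000u

/* Validate the size reported by the EEPROM; warn and use a default if bogus. */
static unsigned int eeprom_words_or_default(unsigned int reported)
{
	if (reported == 0 || reported > EEPROM_WORDS_MAX) {
		fprintf(stderr, "EEPROM size word invalid (%u), using default %u\n",
			reported, EEPROM_WORDS_DEFAULT);
		return EEPROM_WORDS_DEFAULT;
	}
	return reported;
}
```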
Thanks,
Carolyn
Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation
* Re: [PATCH net-next-2.6] ipv4,ipv6,bonding: Restore control over number of peer notifications
From: Brian Haley @ 2011-04-27 14:21 UTC
To: Ben Hutchings
Cc: Jay Vosburgh, Andy Gospodarek, David Miller, Patrick McHardy,
netdev
In-Reply-To: <1303870446.3032.399.camel@localhost>
On 04/26/2011 10:14 PM, Ben Hutchings wrote:
> On Tue, 2011-04-26 at 22:09 -0400, Brian Haley wrote:
>> On 04/26/2011 09:25 PM, Ben Hutchings wrote:
>>> For backward compatibility, we should retain the module parameters and
>>> sysfs attributes to control the number of peer notifications
>>> (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
>>> Also, it is possible for failover to take place even though the new
>>> active slave does not have link up, and in that case the peer
>>> notification should be deferred until it does.
>>>
>>> Change ipv4 and ipv6 so they do not automatically send peer
>>> notifications on bonding failover.
>>>
>>> Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
>>> notifications when the link is up, as many times as requested. Since
>>> it does not directly control which protocols send notifications, make
>>> num_grat_arp and num_unsol_na aliases for a single parameter. Bump
>>> the bonding version number and update its documentation.
>>>
>>> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>
> I'm not sure what you mean by this. You didn't write any of it and
> you're not a maintainer with your own repository. Did you mean to say
> 'Reviewed-by' or 'Acked-by'?
Sorry, yes:
Acked-by: Brian Haley <brian.haley@hp.com>
* Re: [PATCH] igb: restore EEPROM 16kB access limit
From: Andy Gospodarek @ 2011-04-27 14:15 UTC
To: Wyborny, Carolyn
Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
Stefan Assmann, Ronciak, John, Andy Gospodarek
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F016AF39010@orsmsx508.amr.corp.intel.com>
On Tue, Apr 26, 2011 at 08:12:20AM -0700, Wyborny, Carolyn wrote:
[...]
> Part of the problem you are seeing is an apparently widespread EEPROM problem where the size word in the EEPROM is invalid. Since we didn't really check it before it didn't cause a problem. I have a patch coming that addresses this by messaging the user that the size is invalid but setting it to a default and continuing.
>
It wasn't really a problem for me until the commit Stefan mentioned
4322e561a93ec7ee034b603a6c610e7be90d4e8a was applied.
I'm glad you are planning a fix for it, but I hope it can be out soon
and not held up for too long by other patches planned for the next
update.
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Lifeng Sun @ 2011-04-27 13:54 UTC
To: Alan Cox; +Cc: linux-kernel, netdev
In-Reply-To: <20110427130904.47331a9f@lxorguk.ukuu.org.uk>
On 13:09 Wed 04/27/11 Apr, Alan Cox wrote:
> > diff --git a/drivers/char/applicom.c b/drivers/char/applicom.c
> > index 25373df..50c09e4 100644
> > --- a/drivers/char/applicom.c
> > +++ b/drivers/char/applicom.c
> > @@ -838,6 +838,6 @@ static long ac_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > Dummy = readb(apbs[IndexCard].RamIO + VERS);
> > kfree(adgl);
> > mutex_unlock(&ac_mutex);
> > - return 0;
> > + return ret;
> > }
>
> This one in fact is a bug fix where 0 gets returned not an error code it
> ought to be submitted separately and described as such.
Will do.
>
> > diff --git a/drivers/char/dtlk.c b/drivers/char/dtlk.c
> > index 85156dd..2d116d5 100644
> > --- a/drivers/char/dtlk.c
> > +++ b/drivers/char/dtlk.c
> > @@ -289,7 +289,7 @@ static long dtlk_ioctl(struct file *file,
> > return put_user(portval, argp);
> >
> > default:
> > - return -EINVAL;
> > + return -ENOTTY;
> > }
> > }
>
> This one looks good (and the driver has another error in the ioctl
> handler too that wants fixing where it returnds -EINVAL not -EFAULT)
Will fix.
> > diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
> > index d72433f..4ba9b9f 100644
> > --- a/drivers/char/i8k.c
> > +++ b/drivers/char/i8k.c
> > @@ -370,7 +370,7 @@ i8k_ioctl_unlocked(struct file *fp, unsigned int cmd, unsigned long arg)
> > break;
> >
> > default:
> > - return -EINVAL;
> > + return -ENOTTY;
>
> This one is incomplete - the driver also has a bogus check for arg being
> non zero. That means ioctl(fd, BOGUS, 0) will return the wrong error code
> still.
Will fix.
> > diff --git a/drivers/char/ipmi/ipmi_devintf.c b/drivers/char/ipmi/ipmi_devintf.c
> > index 2aa3977..bc8af5a 100644
> > --- a/drivers/char/ipmi/ipmi_devintf.c
> > +++ b/drivers/char/ipmi/ipmi_devintf.c
> > @@ -232,7 +232,7 @@ static int ipmi_ioctl(struct file *file,
> > unsigned int cmd,
> > unsigned long data)
> > {
> > - int rv = -EINVAL;
> > + int rv = -ENOTTY;
> > struct ipmi_file_private *priv = file->private_data;
> > void __user *arg = (void __user *)data;
>
> No: there are cases that should return -EINVAL that this will break. A
> default case needs adding.
All execution paths overwrite the return value except those that should
return -ENOTTY, but it is clearer to add a default case.
Will do.
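Why only changing the initial `rv` can be risky: any case that breaks out of the switch without assigning `rv` silently picks up the new value. The sketch below uses a hypothetical command and limit (`IPMI_DEMO_*`); whether the real ipmi_ioctl() has such a path is exactly what Alan and Lifeng disagree on, so this only illustrates the hazard and the explicit-default fix.

```c
#include <errno.h>

#define IPMI_DEMO_SET_ADDR 0x3001u	/* hypothetical command */
#define IPMI_DEMO_MAX_ADDR 0x7fu	/* hypothetical limit */

/* Initial value changed as proposed: a bad argument now yields -ENOTTY. */
int ipmi_demo_ioctl_before(unsigned int cmd, unsigned long arg)
{
	int rv = -ENOTTY;

	switch (cmd) {
	case IPMI_DEMO_SET_ADDR:
		if (arg > IPMI_DEMO_MAX_ADDR)
			break;		/* silently inherits the initial rv */
		rv = 0;
		break;
	}
	return rv;
}

/* Initial -EINVAL kept; unknown commands handled by an explicit default. */
int ipmi_demo_ioctl_after(unsigned int cmd, unsigned long arg)
{
	int rv = -EINVAL;

	switch (cmd) {
	case IPMI_DEMO_SET_ADDR:
		if (arg > IPMI_DEMO_MAX_ADDR)
			break;		/* still -EINVAL: bad argument */
		rv = 0;
		break;
	default:
		rv = -ENOTTY;
		break;
	}
	return rv;
}
```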
> > diff --git a/drivers/char/viotape.c b/drivers/char/viotape.c
> > index ad6e64a..a427d40 100644
> > --- a/drivers/char/viotape.c
> > +++ b/drivers/char/viotape.c
> > @@ -529,7 +529,7 @@ static int viotap_ioctl(struct inode *inode, struct file *file,
> >
> > down(&reqSem);
> >
> > - ret = -EINVAL;
> > + ret = -ENOTTY;
>
> Again this messes up the returns because code assumes the initial
> default.
Likewise, except for the unsupported MTIOCPOS command. SUSv4 has two
appropriate errno values for this unsupported case, and the code below
returns EOPNOTSUPP:
[ENOTSUP]
Not supported (may be the same value as [EOPNOTSUPP]).
[EOPNOTSUPP]
Operation not supported on socket (may be the same value as
[ENOTSUP]).
but the ioctl man page disagrees. I am wondering how to handle
unsupported ioctl operations. Maybe following the man page is the better
choice, though it is not exact.
> The original code has bugs too (the wrong error returned from
> copy_*_user() again).
Will fix.
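On the ENOTSUP/EOPNOTSUPP question: SUSv4 allows the two to share a value, and on Linux with glibc they do (glibc defines ENOTSUP as EOPNOTSUPP), so user space cannot tell them apart there; the practical choice is between that value, ENOTTY and EINVAL. A small sketch, with hypothetical helper names, for checking a given C library and seeing the message strerror() produces for each candidate:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* 1 if this C library gives ENOTSUP and EOPNOTSUPP the same value. */
int enotsup_same_as_eopnotsupp(void)
{
	return ENOTSUP == EOPNOTSUPP;
}

/* Print each candidate errno with the text strerror() gives for it. */
void print_unsupported_errnos(void)
{
	static const struct {
		const char *name;
		int code;
	} candidates[] = {
		{ "ENOTTY", ENOTTY },
		{ "ENOTSUP", ENOTSUP },
		{ "EOPNOTSUPP", EOPNOTSUPP },
		{ "EINVAL", EINVAL },
	};
	size_t i;

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		printf("%-10s %3d  %s\n", candidates[i].name, candidates[i].code,
		       strerror(candidates[i].code));
}
```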
> > diff --git a/fs/pipe.c b/fs/pipe.c
> > index da42f7d..fe7ffe4 100644
> > --- a/fs/pipe.c
> > +++ b/fs/pipe.c
> > @@ -665,7 +665,7 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >
> > return put_user(count, (int __user *)arg);
> > default:
> > - return -EINVAL;
> > + return -ENOTTY;
> > }
>
> Looks good - but this one really does want to be a patch on its own as if
> anything causes compatibility funnies it will be this, and we need to be
> sure we can bisect nicely to it should this occur.
Will submit separately.
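Alan's bisection concern is easy to see from user space, because the pipe errno is visible to any program that probes a pipe with an ioctl it does not implement. A minimal sketch assuming a Linux host; `pipe_bogus_ioctl_errno()` is a hypothetical helper and `0xdeadbeef` an arbitrary request number assumed to be unused. A kernel with the -EINVAL default shown in the hunk above reports EINVAL, while more recent kernels are expected to report ENOTTY, which is why the check accepts either.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Issue an ioctl request the pipe code does not implement and return the
 * resulting errno (0 if the call unexpectedly succeeded, -1 if pipe() failed).
 */
int pipe_bogus_ioctl_errno(void)
{
	int fds[2];
	int rc, err;

	if (pipe(fds) != 0)
		return -1;

	errno = 0;
	rc = ioctl(fds[0], 0xdeadbeef, 0);	/* arbitrary, assumed-unused request */
	err = (rc == -1) ? errno : 0;		/* capture before close() can clobber it */

	close(fds[0]);
	close(fds[1]);
	return err;
}
```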
> > @@ -5041,7 +5041,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
> > /* Set the per device memory buffer space.
> > * Not applicable in our case */
> > case SIOCSIFLINK:
> > - return -EINVAL;
> > + return -EOPNOTSUPP;
>
> This change seems unrelated to anything in your description and outside
> of anything SuS cares about or demands.
As stated above, I will submit it separately.
- Lifeng
* Re: Question for canutils
From: Marc Kleine-Budde @ 2011-04-27 13:50 UTC (permalink / raw)
To: Tomoya MORINAGA
Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
netdev-u79uwXL29TY76Z2rM5mHXA,
toshiharu-linux-ECg8zkTtlr0C6LszWs/t0g,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
'Wolfgang Grandegger'
In-Reply-To: <02D6556ECE7A41859777EB7300D12394-c0cKtqp5df7I9507bXv2FdBPR1lH4CV8@public.gmane.org>
On 04/27/2011 10:36 AM, Tomoya MORINAGA wrote:
> I have two questions about canutils.
>
> 1) Build issue
> I downloaded the latest canutils and libsocketcan.
> - canutils-4.0.6
> - libsocketcan-0.0.8
>
> Firstly, I installed libsocketcan-0.0.8
Into which location?
Please send me the output of:
ls -l /usr/local/lib
ls -l /usr/local/lib/pkgconfig
> Secondly, I tried to install canutils-4.0.6,
> but it failed as shown below.
>
> [root@localhost canutils-4.0.6]# ./configure
> ...snip...
> checking whether lstat correctly handles trailing slash... yes
> checking whether stat accepts an empty string... no
> checking for socket... yes
> checking for strchr... yes
> checking for strtoul... yes
> checking for pkg-config... /usr/bin/pkg-config
> checking pkg-config is at least version 0.9.0... yes
> checking for libsocketcan... no
> configure: error: *** libsocketcan version above 0.0.8 not found on your system
> [root@localhost canutils-4.0.6]#
>
> Do you have any information about this failure?
> BTW, using canutils-3.0.2, I could confirm that the build succeeds.
3.x doesn't work with the mainline configuration interface.
> 2) How to use
> When I execute candump as follows, I see this message (pch_can is, of course, already installed):
> [root@localhost morinaga]# candump can0
> interface = can0, family = 29, type = 3, proto = 1
> read: Network is down
> [root@localhost morinaga]#
>
> Could you tell me how to activate the can0 interface?
First build the canutils, then configure the bitrate, e.g. 250 kbit/s:
$ canconfig can0 bitrate 250000
$ ifconfig can0 up
regards, Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
_______________________________________________
Socketcan-core mailing list
Socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org
https://lists.berlios.de/mailman/listinfo/socketcan-core
* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Wolfgang Grandegger @ 2011-04-27 13:34 UTC (permalink / raw)
To: Subhasish Ghosh
Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
Netdev-u79uwXL29TY76Z2rM5mHXA, nsekhar-l0cyMroinI0, open list,
CAN NETWORK DRIVERS, Marc Kleine-Budde, m-watkins-l0cyMroinI0,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <4DB81A12.1000006-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
On 04/27/2011 03:28 PM, Wolfgang Grandegger wrote:
> On 04/27/2011 03:08 PM, Subhasish Ghosh wrote:
>>>
>>> - Use just *one* value per sysfs file
>>
>> SG - I felt that adding an entry for each mbx_id would clutter sysfs.
>> Is it OK to do that?
>
> No, see:
>
> http://lxr.linux.no/#linux+v2.6.38/Documentation/filesystems/sysfs.txt#L56
s/No/Yes/
Sorry for the confusion.
Wolfgang.
* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Wolfgang Grandegger @ 2011-04-27 13:28 UTC (permalink / raw)
To: Subhasish Ghosh
Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
Netdev-u79uwXL29TY76Z2rM5mHXA, nsekhar-l0cyMroinI0, open list,
CAN NETWORK DRIVERS, Marc Kleine-Budde, m-watkins-l0cyMroinI0,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <46D523E49EFF489F9B088AE7B9CD7286@subhasishg>
On 04/27/2011 03:08 PM, Subhasish Ghosh wrote:
>>
>> - Use just *one* value per sysfs file
>
> SG - I felt that adding an entry for each mbx_id would clutter sysfs.
> Is it OK to do that?
No, see:
http://lxr.linux.no/#linux+v2.6.38/Documentation/filesystems/sysfs.txt#L56
>>>> +static u32 pruss_intc_init[19][3] = {
>>>> + {PRUSS_INTC_POLARITY0, PRU_INTC_REGMAP_MASK, 0xFFFFFFFF},
>>>> + {PRUSS_INTC_POLARITY1, PRU_INTC_REGMAP_MASK, 0xFFFFFFFF},
>>>> + {PRUSS_INTC_TYPE0, PRU_INTC_REGMAP_MASK, 0x1C000000},
>>>> + {PRUSS_INTC_TYPE1, PRU_INTC_REGMAP_MASK, 0},
>>>> + {PRUSS_INTC_GLBLEN, 0, 1},
>>>> + {PRUSS_INTC_HOSTMAP0, PRU_INTC_REGMAP_MASK, 0x03020100},
>>>> + {PRUSS_INTC_HOSTMAP1, PRU_INTC_REGMAP_MASK, 0x07060504},
>>>> + {PRUSS_INTC_HOSTMAP2, PRU_INTC_REGMAP_MASK, 0x0000908},
>>>> + {PRUSS_INTC_CHANMAP0, PRU_INTC_REGMAP_MASK, 0},
>>>> + {PRUSS_INTC_CHANMAP8, PRU_INTC_REGMAP_MASK, 0x00020200},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 32},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 19},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 19},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 18},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 18},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 34},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 34},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 32},
>>>> + {PRUSS_INTC_HOSTINTEN, 0, 5}
>>>
>>> please add ","
>>
>> Also a struct to describe each entry would improve readability.
>> Then you could also use ARRAY_SIZE.
>
> SG - I could not follow this. Are you recommending that I create a
> structure with three variables and then create an array of them,
> something like:
>
> const static struct [] = {
> {
> unsigned int reg_base;
> unsigned int reg_mask;
> unsigned int reg_val;
> },
> ...
> };
Yes:
struct s_name {
unsigned int base;
unsigned int mask;
unsigned int val;
};
static const struct s_name array[] = {
...
};
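A compilable sketch of the struct-based table described here, with a user-space stand-in for the register write. The register offsets, the `DEMO_*` constants, `demo_regmap_write()` and its "mask 0 means plain write" rule are all hypothetical; only the table shape and the ARRAY_SIZE() idiom are the point (ARRAY_SIZE() is defined locally because <linux/kernel.h> is not available outside the kernel).

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register offsets and mask, for illustration only. */
#define DEMO_INTC_POLARITY0 0x0d00u
#define DEMO_INTC_GLBLEN    0x0010u
#define DEMO_INTC_HOSTINTEN 0x1500u
#define DEMO_REGMAP_MASK    0xffffffffu

struct pruss_intc_init {
	uint32_t reg;	/* register offset */
	uint32_t mask;	/* bits to update; 0 = plain write (assumed) */
	uint32_t val;	/* value to program */
};

static const struct pruss_intc_init demo_intc_init[] = {
	{ .reg = DEMO_INTC_POLARITY0, .mask = DEMO_REGMAP_MASK, .val = 0xffffffff },
	{ .reg = DEMO_INTC_GLBLEN,    .mask = 0,                .val = 1 },
	{ .reg = DEMO_INTC_HOSTINTEN, .mask = 0,                .val = 5 },
};

/* Fake register file standing in for the hardware. */
static uint32_t demo_regs[0x2000 / 4];

static void demo_regmap_write(uint32_t reg, uint32_t mask, uint32_t val)
{
	uint32_t *r = &demo_regs[reg / 4];

	if (mask)
		*r = (*r & ~mask) | (val & mask);
	else
		*r = val;
}

/* Walk the table; ARRAY_SIZE() keeps the loop bound in sync with the data. */
size_t demo_intc_setup(void)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(demo_intc_init); i++)
		demo_regmap_write(demo_intc_init[i].reg, demo_intc_init[i].mask,
				  demo_intc_init[i].val);
	return i;
}

uint32_t demo_reg_read(uint32_t reg)
{
	return demo_regs[reg / 4];
}
```

Adding an entry now only means adding a line to the table (the trailing comma keeps that diff to one line); neither a hard-coded [19][3] size nor the loop bound has to change.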
>
>>>> + value = (PRUSS_CAN_GPIO_SETUP_DELAY *
>>>> + (priv->clk_freq_pru / 1000000) / 1000) /
>>>> + PRUSS_CAN_DELAY_LOOP_LENGTH;
>>
>> This calculation looks delicate. 64-bit math would be safer.
>
> SG - This one works fine. I am dividing it twice to avoid the problem.
Yes, but what if the frequency increases with the next generation of the
hardware?
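To make the concern concrete: dividing `clk_freq_pru` by 1000000 first keeps the 32-bit product small, but it throws away the sub-MHz part of the clock, and the remaining product can still wrap for larger delays or clocks. A user-space sketch comparing the two, assuming the delay constant is in nanoseconds; the parameter names stand in for PRUSS_CAN_GPIO_SETUP_DELAY and PRUSS_CAN_DELAY_LOOP_LENGTH, and the values used below are hypothetical, not the driver's. In the kernel, the 64-bit division would go through div_u64() or div64_u64() from <linux/math64.h> rather than a plain `/`, so that it also builds on 32-bit ARM.

```c
#include <stdint.h>

/*
 * Approach from the patch, in 32-bit arithmetic: scale the clock down
 * to MHz first so that delay_ns * MHz usually fits in 32 bits.
 */
uint32_t delay_loops_split(uint32_t delay_ns, uint32_t clk_hz, uint32_t loop_len)
{
	return (delay_ns * (clk_hz / 1000000u) / 1000u) / loop_len;
}

/*
 * 64-bit variant: keep the full product delay_ns * clk_hz (which cannot
 * wrap for any 32-bit inputs) and divide once at the end, so no precision
 * is lost early. loop_len must be non-zero.
 */
uint32_t delay_loops_64(uint32_t delay_ns, uint32_t clk_hz, uint32_t loop_len)
{
	uint64_t product = (uint64_t)delay_ns * clk_hz;

	return (uint32_t)(product / (1000000000ull * loop_len));
}
```

For a hypothetical 10000 ns delay at 166666666 Hz and a two-cycle loop, the split version gives 830 iterations and the 64-bit one 833. With a 30 ms delay at 150 MHz, the 32-bit product wraps and the split version no longer returns the correct 4500000.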
Wolfgang.
* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Arnd Bergmann @ 2011-04-27 13:25 UTC (permalink / raw)
To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
Subhasish Ghosh, nsekhar-l0cyMroinI0, open list,
CAN NETWORK DRIVERS, Marc Kleine-Budde,
Netdev-u79uwXL29TY76Z2rM5mHXA, m-watkins-l0cyMroinI0,
Wolfgang Grandegger
In-Reply-To: <46D523E49EFF489F9B088AE7B9CD7286@subhasishg>
On Wednesday 27 April 2011, Subhasish Ghosh wrote:
> >
> > - Use just one value per sysfs file
>
> SG - I felt that adding an entry for each mbx_id would clutter sysfs.
> Is it OK to do that?
That is probably not much better either.
Note also that every sysfs file needs to come with associated
documentation in Documentation/ABI/*/ to make sure that users
will know exactly how the file is meant to work.
Why do you need to export these values in the first place? Is
it just for debugging or do you expect all CAN user space
to look at this?
If it's for debugging, please don't export the files through sysfs.
Depending on how useful the data is to regular users, you can
still export it through a debugfs file in that case, which has
much less strict rules.
If the file is instead meant as part of the regular operation of
the device, it should not be in debugfs but probably be integrated
into the CAN socket interface, so that users don't need to work
with two different ways of getting to the device (socket and sysfs).
Arnd
* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Marc Kleine-Budde @ 2011-04-27 13:21 UTC (permalink / raw)
To: Subhasish Ghosh
Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
Netdev-u79uwXL29TY76Z2rM5mHXA, nsekhar-l0cyMroinI0, open list,
CAN NETWORK DRIVERS,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
m-watkins-l0cyMroinI0, Wolfgang Grandegger
In-Reply-To: <46D523E49EFF489F9B088AE7B9CD7286@subhasishg>
On 04/27/2011 03:08 PM, Subhasish Ghosh wrote:
>>>> +static u32 pruss_intc_init[19][3] = {
>>>> + {PRUSS_INTC_POLARITY0, PRU_INTC_REGMAP_MASK, 0xFFFFFFFF},
>>>> + {PRUSS_INTC_POLARITY1, PRU_INTC_REGMAP_MASK, 0xFFFFFFFF},
>>>> + {PRUSS_INTC_TYPE0, PRU_INTC_REGMAP_MASK, 0x1C000000},
>>>> + {PRUSS_INTC_TYPE1, PRU_INTC_REGMAP_MASK, 0},
>>>> + {PRUSS_INTC_GLBLEN, 0, 1},
>>>> + {PRUSS_INTC_HOSTMAP0, PRU_INTC_REGMAP_MASK, 0x03020100},
>>>> + {PRUSS_INTC_HOSTMAP1, PRU_INTC_REGMAP_MASK, 0x07060504},
>>>> + {PRUSS_INTC_HOSTMAP2, PRU_INTC_REGMAP_MASK, 0x0000908},
>>>> + {PRUSS_INTC_CHANMAP0, PRU_INTC_REGMAP_MASK, 0},
>>>> + {PRUSS_INTC_CHANMAP8, PRU_INTC_REGMAP_MASK, 0x00020200},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 32},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 19},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 19},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 18},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 18},
>>>> + {PRUSS_INTC_STATIDXCLR, 0, 34},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 34},
>>>> + {PRUSS_INTC_ENIDXSET, 0, 32},
>>>> + {PRUSS_INTC_HOSTINTEN, 0, 5}
>>>
>>> please add ","
>>
>> Also a struct to describe each entry would improve readability.
>> Then you could also use ARRAY_SIZE.
>
> SG - I could not follow this. Are you recommending that I create a
> structure with three variables and then create an array of them,
> something like:
>
> const static struct [] = {
> {
> unsigned int reg_base;
> unsigned int reg_mask;
> unsigned int reg_val;
> },
> ...
> };
I think this isn't valid C. It should look like this:
struct pruss_intc_init {
unsigned long reg_base;
u32 reg_mask;
u32 reg_val;
};
static const struct pruss_intc_init pruss_intc_init[] = {
{ .reg_base = 0xdeadbeef, .reg_mask = 0xaa, .reg_val = 0x55 },
...
};
I'm not sure about the datatype of reg_base. I haven't looked at the
code that uses this array.
cheers, Marc
* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Subhasish Ghosh @ 2011-04-27 13:08 UTC (permalink / raw)
To: Wolfgang Grandegger, Marc Kleine-Budde
Cc: sachi-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/,
davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
Netdev-u79uwXL29TY76Z2rM5mHXA, nsekhar-l0cyMroinI0, open list,
CAN NETWORK DRIVERS, m-watkins-l0cyMroinI0,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <4DB5D452.9050500-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
>
> - Use just *one* value per sysfs file
SG - I felt that adding an entry for each mbx_id would clutter sysfs.
Is it OK to do that?
>>> +static u32 pruss_intc_init[19][3] = {
>>> + {PRUSS_INTC_POLARITY0, PRU_INTC_REGMAP_MASK, 0xFFFFFFFF},
>>> + {PRUSS_INTC_POLARITY1, PRU_INTC_REGMAP_MASK, 0xFFFFFFFF},
>>> + {PRUSS_INTC_TYPE0, PRU_INTC_REGMAP_MASK, 0x1C000000},
>>> + {PRUSS_INTC_TYPE1, PRU_INTC_REGMAP_MASK, 0},
>>> + {PRUSS_INTC_GLBLEN, 0, 1},
>>> + {PRUSS_INTC_HOSTMAP0, PRU_INTC_REGMAP_MASK, 0x03020100},
>>> + {PRUSS_INTC_HOSTMAP1, PRU_INTC_REGMAP_MASK, 0x07060504},
>>> + {PRUSS_INTC_HOSTMAP2, PRU_INTC_REGMAP_MASK, 0x0000908},
>>> + {PRUSS_INTC_CHANMAP0, PRU_INTC_REGMAP_MASK, 0},
>>> + {PRUSS_INTC_CHANMAP8, PRU_INTC_REGMAP_MASK, 0x00020200},
>>> + {PRUSS_INTC_STATIDXCLR, 0, 32},
>>> + {PRUSS_INTC_STATIDXCLR, 0, 19},
>>> + {PRUSS_INTC_ENIDXSET, 0, 19},
>>> + {PRUSS_INTC_STATIDXCLR, 0, 18},
>>> + {PRUSS_INTC_ENIDXSET, 0, 18},
>>> + {PRUSS_INTC_STATIDXCLR, 0, 34},
>>> + {PRUSS_INTC_ENIDXSET, 0, 34},
>>> + {PRUSS_INTC_ENIDXSET, 0, 32},
>>> + {PRUSS_INTC_HOSTINTEN, 0, 5}
>>
>> please add ","
>
> Also a struct to describe each entry would improve readability.
> Then you could also use ARRAY_SIZE.
SG - I could not follow this. Are you recommending that I create a structure
with three variables and then create an array of them,
something like:
const static struct [] = {
{
unsigned int reg_base;
unsigned int reg_mask;
unsigned int reg_val;
},
...
};
>>> + value = (PRUSS_CAN_GPIO_SETUP_DELAY *
>>> + (priv->clk_freq_pru / 1000000) / 1000) /
>>> + PRUSS_CAN_DELAY_LOOP_LENGTH;
>
> This calculation looks delicate. 64-bit math would be safer.
SG - This one works fine. I am dividing it twice to avoid the problem.
>>> + pru_can_mask_ints(priv->dev, PRUSS_CAN_TX_PRU_1, false);
>>> + pru_can_mask_ints(priv->dev, PRUSS_CAN_RX_PRU_0, false);
>>> + can_bus_off(ndev);
>>> + dev_dbg(priv->ndev->dev.parent, "Bus off mode\n");
>>> + }
>>> +
>>> + netif_rx(skb);
>
> You should use netif_receive_skb(skb) here as well.
SG - Ok, Will do.
>
> if (PRUSS_CAN_ISR_BIT_ESI &
> priv->can_rx_cntx.intr_stat) {
>
> Is more readable.
SG - Ok, Will do.
>
>
>>> + pru_can_gbl_stat_get(priv->dev,
>>> + &priv->can_rx_cntx);
>>> + pru_can_err(ndev,
>>> + priv->can_rx_cntx.intr_stat,
>>> + priv->can_rx_cntx.gbl_stat);
>
> Please fix the bogus indentation.
SG - Ok, Will do.
>>> +
>>> + pdata = dev->platform_data;
>>> + if (!pdata) {
>>> + dev_err(&pdev->dev, "platform data not found\n");
>>> + return -EINVAL;
>>> + }
>>> + (pdata->setup)();
>>
>> no need for the ( )
SG - Ok, Will do.
>>> + }
>>> +
>>> + priv->ndev = ndev;
>>> + priv->dev = dev;
>>> +
>>> + priv->can.bittiming_const = &pru_can_bittiming_const;
>>> + priv->can.do_set_bittiming = pru_can_set_bittiming;
>>> + priv->can.do_set_mode = pru_can_set_mode;
>>> + priv->can.do_get_state = pru_can_get_state;
>
> Please remove that callback. It's not needed as state changes are
> handled properly.
>
SG -- Ok, Will do
* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Neil Horman @ 2011-04-27 13:01 UTC (permalink / raw)
To: Ajit; +Cc: netdev
In-Reply-To: <loom.20110427T140651-957@post.gmane.org>
On Wed, Apr 27, 2011 at 12:11:35PM +0000, Ajit wrote:
> Guys,
>
> I have developed code which uses raw sockets to transfer files. My code skips
> all the upper-layer protocols; I have designed a small protocol of my own.
>
> Now the problem is that whenever I transfer a large file, something goes wrong. If
> I transfer a file of, say, 100 KB or more, only 97.9 KB is received, unlike in
> the case of files smaller than 97.9 KB.
>
> What can the problem be? Does continuously sending and receiving frames
> create a problem?
>
> If anyone is interested, I can give you my code.
>
> Thank you.
>
What transport layer are you using (UDP/TCP/SCTP/etc)? Does a similar transfer
work if you use an actual TCP/UDP socket, rather than your raw one? If you're
consistently getting 97.9k transferred, that almost sounds like some sort of
firewall-type connection limitation. Take a tcpdump from the peer to get a
better idea of what's going on when the connection fails.
Neil
* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Ajit @ 2011-04-27 12:27 UTC (permalink / raw)
To: netdev
In-Reply-To: <1303906910.3166.50.camel@edumazet-laptop>
Eric Dumazet <eric.dumazet <at> gmail.com> writes:
> Sure, check your syscall returns values, and search for SO_RCVBUF &
> SO_SNDBUF (man 7 socket)
Okay, I don't know exactly how to use those, but I will google it and try.
I will post my result after some time.
Thank you
* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Eric Dumazet @ 2011-04-27 12:21 UTC (permalink / raw)
To: Ajit; +Cc: netdev
In-Reply-To: <loom.20110427T140651-957@post.gmane.org>
Le mercredi 27 avril 2011 à 12:11 +0000, Ajit a écrit :
> Guys,
>
> I have developed code which uses raw sockets to transfer files. My code skips
> all the upper-layer protocols; I have designed a small protocol of my own.
>
> Now the problem is that whenever I transfer a large file, something goes wrong. If
> I transfer a file of, say, 100 KB or more, only 97.9 KB is received, unlike in
> the case of files smaller than 97.9 KB.
>
> What can the problem be? Does continuously sending and receiving frames
> create a problem?
Sure, check your syscall returns values, and search for SO_RCVBUF &
SO_SNDBUF (man 7 socket)
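A minimal sketch of what Eric suggests: enlarge the socket buffers with setsockopt(), read back the size the kernel actually granted with getsockopt(), and check every return value. It is shown on a UDP socket so that it runs without CAP_NET_RAW; the same calls apply to a raw socket. `set_sock_buffers()` is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>
#include <sys/socket.h>

/*
 * Request `bytes` for both SO_SNDBUF and SO_RCVBUF on fd, then report the
 * sizes the kernel actually applied. Returns 0 on success, -1 on error
 * (errno is left as set by the failing call, after perror()).
 */
int set_sock_buffers(int fd, int bytes, int *snd_out, int *rcv_out)
{
	socklen_t len;

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0) {
		perror("setsockopt(SO_SNDBUF)");
		return -1;
	}
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
		perror("setsockopt(SO_RCVBUF)");
		return -1;
	}

	/* Read back: the effective size is rarely the requested one. */
	len = sizeof(*snd_out);
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd_out, &len) < 0) {
		perror("getsockopt(SO_SNDBUF)");
		return -1;
	}
	len = sizeof(*rcv_out);
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv_out, &len) < 0) {
		perror("getsockopt(SO_RCVBUF)");
		return -1;
	}
	return 0;
}
```

Reading the values back matters because Linux clamps the request to net.core.wmem_max / rmem_max and then doubles it (see socket(7)). Datagram-style sockets also drop incoming frames once the receive queue is full rather than slowing the sender down, so an unchecked sendto()/recvfrom() loop plus a small receive buffer is one plausible way to lose the tail of a transfer.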