Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] net/sched: CAN Filter/Classifier
From: Eric Dumazet @ 2012-06-05  8:09 UTC (permalink / raw)
  To: Rostislav Lisovy; +Cc: netdev, linux-can, lartc, pisa, sojkam1
In-Reply-To: <1338826176-11646-2-git-send-email-lisovy@gmail.com>

On Mon, 2012-06-04 at 18:09 +0200, Rostislav Lisovy wrote:
> This classifier classifies CAN frames (AF_CAN) according to their
> identifiers. This functionality can not be easily achieved with
> existing classifiers, such as u32. This classifier can be used
> with any available qdisc and it is able to classify both SFF
> or EFF frames.
> 
> The filtering rules for EFF frames are stored in an array, which
> is traversed during classification. A bitmap is used to store SFF
> rules -- one bit for each ID.
> 
> More info about the project:
> http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
> 
> Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>
> ---
>  net/sched/Kconfig   |   10 +
>  net/sched/Makefile  |    1 +
>  net/sched/cls_can.c |  572 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 583 insertions(+)
>  create mode 100644 net/sched/cls_can.c

It seems a huge amount of code, and before reviewing it I would like to
ask :

1) Did you try to extend cls_flow somehow ?

2) Adding a cls_filter (or extend cls_flow to be able to use a bpf),
could be more generic, and thanks to bpf jit could be way faster.

3) sfq/fq_codel could be CAN aware if you adapt skb_flow_dissect() ?





^ permalink raw reply

* Re: [ovs-dev] [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
From: Simon Horman @ 2012-06-05  8:12 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev, netdev
In-Reply-To: <CAEP_g=8ahniEzTQQ8a8Kfu2cudxyM_+DA5YLyxM0EdY43NkpBg@mail.gmail.com>

On Tue, Jun 05, 2012 at 12:33:11PM +0900, Jesse Gross wrote:
> On Tue, Jun 5, 2012 at 7:34 AM, Simon Horman <horms@verge.net.au> wrote:
> > On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote:
> >> On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms@verge.net.au> wrote:
> >> > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> >> >  int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> >> >  {
> >> >        struct ovs_key_ethernet *eth_key;
> >> > +       struct ovs_key_ipv4_tunnel *tun_key;
> >> >        struct nlattr *nla, *encap;
> >> >
> >> >        if (swkey->phy.priority &&
> >> >            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
> >> >                goto nla_put_failure;
> >> >
> >> > -       if (swkey->phy.tun_id != cpu_to_be64(0) &&
> >> > -           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> >> > -               goto nla_put_failure;
> >> > +       if (swkey->phy.tun_key.ipv4_dst) {
> >>
> >> It's probably OK to use DIP equal to zero as a not present marker but
> >> we need to enforce that it's always true - for example we shouldn't
> >> allow somebody to setup a flow that way or receive packets with a zero
> >> address.  Alternately, we may be able to find a spare bit to indicate
> >> this, like is done with vlans.
> >
> > When I originally wrote this there didn't seem to be any obvious
> > place in ovs_key_ipv4_tunnel to have an active/inactive bit, which
> > is in part why the code relies on checking DIP.
> >
> > However, more recent versions of ovs_key_ipv4_tunnel have a flags field of
> > which only one bit is currently used. We could use one of the unused flag
> > bits.
> 
> I guess it depends on what we end up doing with the lookup struct.  If
> it stays as it is today, there's plenty of space if you include those
> padding bytes.  If we shrink it down and there isn't a place then I do
> think it is fine to use DIP (since this is traversing an IP stack and
> DIP = 0 is an invalid value it's not like an L2 switch not allowing
> invalid IP packet).  In that case, we just need to do more validation
> in other places to make sure that this is the only situation that the
> condition can arise.

Ok, my suspicion is that there will be space.

> >> In any case, I think we need to do some additional validation when
> >> setting up flows to check reserved space, for example, as otherwise
> >> that will never match.
> >
> > Sure. My testing seems to indicate that matching does occur,
> > though I am quite happy to tighten things up.
> 
> I don't think it causes a problem as long as userspace is well
> behaved, I was think it's best to detect problems early.

Ok, good point. I'll make sure the data is clean.
> 
> >> In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
> >> that I think can never apply for lookup such as the flags so it would
> >> be nice if we could remove that for lookup.
> >
> > I think they need to be there to be passed around, so I'm not
> > sure if they can easily be removed from ovs_key_ipv4_tunnel if that
> > is what you are asking.
> 
> My guess is that we'll probably want to separate out the struct that
> is used for lookup from the one that is used for communication with
> userspace, which is what we do for most things so that we have more
> freedom to optimize in the kernel.

Understood, I'll look into doing that.

> >> >  bool ovs_tnl_frag_needed(struct vport *vport,
> >> >                         const struct tnl_mutable_config *mutable,
> >> > -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> >> > +                        struct sk_buff *skb, unsigned int mtu,
> >> > +                        struct ovs_key_ipv4_tunnel *tun_key)
> >> >  {
> >> >        unsigned int eth_hdr_len = ETH_HLEN;
> >> >        unsigned int total_length = 0, header_length = 0, payload_length;
> >> >        struct ethhdr *eh, *old_eh = eth_hdr(skb);
> >> >        struct sk_buff *nskb;
> >> > +       struct ovs_key_ipv4_tunnel ntun_key;
> >> >
> >> >        /* Sanity check */
> >> >        if (skb->protocol == htons(ETH_P_IP)) {
> >> > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> >> >         * any way of synthesizing packets.
> >> >         */
> >> >        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> >> > -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> >> > -               OVS_CB(nskb)->tun_id = flow_key;
> >> > +           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> >> > +               ntun_key = *tun_key;
> >> > +               OVS_CB(nskb)->tun_key = &ntun_key;
> >> > +       }
> >>
> >> I guess this is probably where you were going to use the function to
> >> reverse IP addresses.  The logic doesn't really work but it's moot
> >> since this is going away anyways.
> >
> > My latest series includes a clean up to ovs_tnl_frag_needed() to allow
> > it to work in some circumstances - i.e. those found in my test environment.
> > That series removes knowledge of tun_key from ovs_tnl_frag_needed().
> >
> > I am however happy to remove ovs_tnl_frag_needed() completely if you think
> > that is appropriate.
> 
> I think, in retrospect, that the path MTU discovery that I implemented
> here was probably not the right choice and MSS clamping is the correct
> way to do things.  It was better when it wasn't possible to do any
> kind of flow-based manipulation of tunnels but the model is breaking
> down more and more over time.  Given that I would be hesitant to
> submit it upstream and since that's the goal of this work, removing it
> completely is probably the right thing to do.

Sure. My latest series also includes an implementation of MSS clamping,
so assuming it isn't doomed in some way we should be safe to remove
MTU discovery. I'll see about doing that for my next post.

^ permalink raw reply

* Re: [net-next PATCH 1/3] Added kernel support in EEE Ethtool commands
From: Yuval Mintz @ 2012-06-05  8:11 UTC (permalink / raw)
  To: Giuseppe CAVALLARO
  Cc: netdev@vger.kernel.org, davem@davemloft.net, Eilon Greenstein
In-Reply-To: <4FCDBCBA.8040306@st.com>

On 06/05/2012 11:00 AM, Giuseppe CAVALLARO wrote:

> Hello Yuval,
> 
> On 6/5/2012 8:39 AM, Yuval Mintz wrote:
>> This patch extends the kernel's ethtool interface by adding support
>> for 2 new EEE commands - get_eee and set_eee.
> 
>> +	__u32	tx_lpi_enabled;
>> +	__u32	tx_lpi_timer;
> 
> Is the tx_lpi_timer field for the MAC driver as we discussed in the
> past? If yes, so I can use it for the stmmac too.
> 
> Peppe
> 



Hi Peppe,

The short answer is yes.

The API does not specify whether the delay should be made in the MAC
or controller levels, but it should guarantee that if the nic is idle
it should wait that many microseconds prior to asserting its Tx lpi.

Thanks,
Yuval

^ permalink raw reply

* Re: [net-next PATCH 1/3] Added kernel support in EEE Ethtool commands
From: Giuseppe CAVALLARO @ 2012-06-05  8:16 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: netdev@vger.kernel.org, davem@davemloft.net, Eilon Greenstein
In-Reply-To: <4FCDBF2E.6040705@broadcom.com>

On 6/5/2012 10:11 AM, Yuval Mintz wrote:
> On 06/05/2012 11:00 AM, Giuseppe CAVALLARO wrote:
> 
>> Hello Yuval,
>>
>> On 6/5/2012 8:39 AM, Yuval Mintz wrote:
>>> This patch extends the kernel's ethtool interface by adding support
>>> for 2 new EEE commands - get_eee and set_eee.
>>
>>> +	__u32	tx_lpi_enabled;
>>> +	__u32	tx_lpi_timer;
>>
>> Is the tx_lpi_timer field for the MAC driver as we discussed in the
>> past? If yes, so I can use it for the stmmac too.
>>
>> Peppe
>>
> 
> 
> 
> Hi Peppe,
> 
> The short answer is yes.
> 
> The API does not specify whether the delay should be made in the MAC
> or controller levels, but it should guarantee that if the nic is idle
> it should wait that many microseconds prior to asserting its Tx lpi.

Perfect! So I'll give you the patches for phy and stmmac asap.

Peppe

> 
> Thanks,
> Yuval
> 
> 

^ permalink raw reply

* [net-next RFC PATCH] virtio_net: collect satistics and export through ethtool
From: Jason Wang @ 2012-06-05  8:38 UTC (permalink / raw)
  To: netdev, rusty, virtualization, linux-kernel, mst

Satistics counters is useful for debugging and performance optimization, so this
patch lets virtio_net driver collect following and export them to userspace
through "ethtool -S":

- number of packets sent/received
- number of bytes sent/received
- number of callbacks for tx/rx
- number of kick for tx/rx
- number of bytes/packets queued for tx

As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were exposed
like:

NIC statistics:
     tx_bytes[0]: 2551
     tx_packets[0]: 12
     tx_kick[0]: 12
     tx_callbacks[0]: 1
     tx_queued_packets[0]: 12
     tx_queued_bytes[0]: 3055
     rx_bytes[0]: 0
     rx_packets[0]: 0
     rx_kick[0]: 0
     rx_callbacks[0]: 0
     tx_bytes[1]: 5575
     tx_packets[1]: 37
     tx_kick[1]: 38
     tx_callbacks[1]: 0
     tx_queued_packets[1]: 38
     tx_queued_bytes[1]: 5217
     rx_bytes[1]: 4175
     rx_packets[1]: 25
     rx_kick[1]: 1
     rx_callbacks[1]: 16
     tx_bytes: 8126
     tx_packets: 49
     tx_kick: 50
     tx_callbacks: 1
     tx_queued_packets: 50
     tx_queued_bytes: 8272
     rx_bytes: 4175
     rx_packets: 25
     rx_kick: 1
     rx_callbacks: 16

TODO:

- more satistics
- unitfy the ndo_get_stats64 and get_ethtool_stats
- calculate the pending bytes/pkts

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |  130 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..7ab0cc1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,6 +41,10 @@ module_param(gso, bool, 0444);
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
 
+#define VIRTNET_STAT_OFF(m) offsetof(struct virtnet_stats, m)
+#define VIRTNET_STAT(stat, i) (*((u64 *)((char *)stat + \
+			       virtnet_stats_str_attr[i].stat_offset)))
+
 struct virtnet_stats {
 	struct u64_stats_sync syncp;
 	u64 tx_bytes;
@@ -48,8 +52,33 @@ struct virtnet_stats {
 
 	u64 rx_bytes;
 	u64 rx_packets;
+
+	u64 tx_kick;
+	u64 rx_kick;
+	u64 tx_callbacks;
+	u64 rx_callbacks;
+	u64 tx_queued_packets;
+	u64 tx_queued_bytes;
+};
+
+static struct {
+	char string[ETH_GSTRING_LEN];
+	int stat_offset;
+} virtnet_stats_str_attr[] = {
+	{ "tx_bytes", VIRTNET_STAT_OFF(tx_bytes)},
+	{ "tx_packets", VIRTNET_STAT_OFF(tx_packets)},
+	{ "tx_kick", VIRTNET_STAT_OFF(tx_kick)},
+	{ "tx_callbacks", VIRTNET_STAT_OFF(tx_callbacks)},
+	{ "tx_queued_packets", VIRTNET_STAT_OFF(tx_queued_packets)},
+	{ "tx_queued_bytes", VIRTNET_STAT_OFF(tx_queued_bytes)},
+	{ "rx_bytes" , VIRTNET_STAT_OFF(rx_bytes)},
+	{ "rx_packets", VIRTNET_STAT_OFF(rx_packets)},
+	{ "rx_kick", VIRTNET_STAT_OFF(rx_kick)},
+	{ "rx_callbacks", VIRTNET_STAT_OFF(rx_callbacks)},
 };
 
+#define VIRTNET_NUM_STATS ARRAY_SIZE(virtnet_stats_str_attr)
+
 struct virtnet_info {
 	struct virtio_device *vdev;
 	struct virtqueue *rvq, *svq, *cvq;
@@ -142,6 +171,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
 static void skb_xmit_done(struct virtqueue *svq)
 {
 	struct virtnet_info *vi = svq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_callbacks++;
+	u64_stats_update_end(&stats->syncp);
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(svq);
@@ -461,6 +495,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 {
 	int err;
 	bool oom;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 
 	do {
 		if (vi->mergeable_rx_bufs)
@@ -477,13 +512,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 	} while (err > 0);
 	if (unlikely(vi->num > vi->max))
 		vi->max = vi->num;
-	virtqueue_kick(vi->rvq);
+	if (virtqueue_kick_prepare(vi->rvq)) {
+		virtqueue_notify(vi->rvq);
+		u64_stats_update_begin(&stats->syncp);
+		stats->rx_kick++;
+		u64_stats_update_end(&stats->syncp);
+	}
 	return !oom;
 }
 
 static void skb_recv_done(struct virtqueue *rvq)
 {
 	struct virtnet_info *vi = rvq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_callbacks++;
+	u64_stats_update_end(&stats->syncp);
+
 	/* Schedule NAPI, Suppress further interrupts if successful. */
 	if (napi_schedule_prep(&vi->napi)) {
 		virtqueue_disable_cb(rvq);
@@ -626,7 +672,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	int capacity;
+	bool kick;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(vi);
@@ -651,7 +699,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
-	virtqueue_kick(vi->svq);
+
+	kick = virtqueue_kick_prepare(vi->svq);
+	if (kick)
+		virtqueue_notify(vi->svq);
+
+	u64_stats_update_begin(&stats->syncp);
+	if (kick)
+		stats->tx_kick++;
+	stats->tx_queued_bytes += skb->len;
+	stats->tx_queued_packets++;
+	u64_stats_update_end(&stats->syncp);
 
 	/* Don't wait up for transmitted skbs to be freed. */
 	skb_orphan(skb);
@@ -926,7 +984,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
 
 }
 
-
 static void virtnet_get_drvinfo(struct net_device *dev,
 				struct ethtool_drvinfo *info)
 {
@@ -939,10 +996,77 @@ static void virtnet_get_drvinfo(struct net_device *dev,
 
 }
 
+static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+	int i, cpu;
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for_each_possible_cpu(cpu)
+			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+				sprintf(buf, "%s[%u]",
+					virtnet_stats_str_attr[i].string, cpu);
+				buf += ETH_GSTRING_LEN;
+			}
+		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+			memcpy(buf, virtnet_stats_str_attr[i].string,
+				ETH_GSTRING_LEN);
+			buf += ETH_GSTRING_LEN;
+		}
+		break;
+	}
+}
+
+static int virtnet_get_sset_count(struct net_device *dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return VIRTNET_NUM_STATS * (num_online_cpus() + 1);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void virtnet_get_ethtool_stats(struct net_device *dev,
+				struct ethtool_stats *stats, u64 *buf)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int cpu, i;
+	unsigned int start;
+	struct virtnet_stats sample, total;
+
+	memset(&total, 0, sizeof(total));
+	memset(&sample, 0, sizeof(sample));
+
+	for_each_possible_cpu(cpu) {
+		struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
+		do {
+			start = u64_stats_fetch_begin(&stats->syncp);
+			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+				VIRTNET_STAT(&sample, i) =
+					VIRTNET_STAT(stats, i);
+
+			}
+		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+			*buf = VIRTNET_STAT(&sample, i);
+			VIRTNET_STAT(&total, i) += VIRTNET_STAT(stats, i);
+			buf++;
+		}
+	}
+
+	for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+		*buf = VIRTNET_STAT(&total, i);
+		buf++;
+	}
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = virtnet_get_ringparam,
+	.get_ethtool_stats = virtnet_get_ethtool_stats,
+	.get_strings = virtnet_get_strings,
+	.get_sset_count = virtnet_get_sset_count,
 };
 
 #define MIN_MTU 68

^ permalink raw reply related

* Server Rental Service in Hong Kong
From: borislamsv2 @ 2012-06-04  8:45 UTC (permalink / raw)


Dear All,

We have our own datacenter in Hong Kong & provide email/application/web rental service to clients.We are APNIC member & provide clean IP to clients.

Dell? PowerEdge? EnterpriseRack Mount Server
-Intel(R) Xeon(R) E3-1240 Processor (3.3GHz, 8M Cache, Turbo, 4C/8T, 80W)
-8GB RAM, 2x4GB, 1333MHz, DDR-3, Dual Ranked UDIMMs
-500GB, 3.5", 6Gbps SAS x 2
-Raid 1 Mirroring Protection
-Remote KVM (iDRAC6 Enterprise)

Dell(TM) PowerEdge(TM) R410 Rack Mount Server
-Intel(R) Quad Core E5606 Xeon(R) CPU, 2.13GHz, 4M Cache, 4.86 GT/s QPI
-4GB Memory (2x2GB), 1333MHz Dual Ranked RDIMMs Fully-Buffered
-500GB 7.2K RPM SATAII 3.5" Hard Drive x 2
-iDRAC6 Enterprise or Express (Remote KVM Management)

Every Dedicated Server Hosting Solution Also Includes:

Software Specification
- CentOS / Fedora / Debian / FreeBSD / Ubuntu / Redhat Linux
- Full root-level access
- Data Center Facilities
- Shared Local & International Bandwidth
- 2 IP Addresses Allocation
- Un-interruptible Power Supply (UPS) backed up by private diesel generator
- FM200¡§based fire suppression system
- 24x7 CRAC Air Conditioning and Humidity Control
- 24x7 Security Control
- 24x7 Remote Hand Service

Pls send us email for further information.Thanks,

Boris
boris@dedicatedserver.com.hk

If you do not wish to further receive this event message, email "borislamsv2@gmail.com" to unsubscribe this message or remove your email from the list.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] inetpeer: add namespace support for inetpeer
From: Eric Dumazet @ 2012-06-05  8:57 UTC (permalink / raw)
  To: Gao feng
  Cc: serge.hallyn, ebiederm, herbert, steffen.klassert, davem, netdev,
	containers
In-Reply-To: <1338882737-11914-1-git-send-email-gaofeng@cn.fujitsu.com>

On Tue, 2012-06-05 at 15:52 +0800, Gao feng wrote:

> +static void __net_exit inetpeer_net_exit(struct net *net)
> +{
> +	inetpeer_invalidate_tree(net, AF_INET);
> +	kfree(net->ipv4.peers);
> +
> +	inetpeer_invalidate_tree(net, AF_INET6);
> +	kfree(net->ipv6.peers);
> +}
> +

Are we 1000% sure no code ever run in inetpeer land after this call ?

I would add
	net->ipv4.peers = NULL;
	net->ipv6.peers = NULL;

to catch NULL deref instead of strange errors, just in case.

By the way, I think we have a bug in inetpeer_gc_worker()

Steffen ?

We have no rcu grace period to make sure the following is safe :

if (!atomic_read(&p->refcnt)) {
	list_del(&p->gc_list);
	kmem_cache_free(peer_cachep, p);
}

I'll post a fix like :

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index d4d61b6..07731b5 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -137,7 +137,7 @@ static void inetpeer_gc_worker(struct work_struct *work)
 
 		n = list_entry(p->gc_list.next, struct inet_peer, gc_list);
 
-		if (!atomic_read(&p->refcnt)) {
+		if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {
 			list_del(&p->gc_list);
 			kmem_cache_free(peer_cachep, p);
 		}

^ permalink raw reply related

* Re: [PATCH net-next 5/5] gianfar_ethtool: coding style and whitespace cleanups
From: David Miller @ 2012-06-05  9:14 UTC (permalink / raw)
  To: jan.ceuleers; +Cc: b06378, joe, netdev
In-Reply-To: <4FCD9F15.2050605@computer.org>

From: Jan Ceuleers <jan.ceuleers@computer.org>
Date: Tue, 05 Jun 2012 07:54:29 +0200

> So your build environment happens to be one of powerpc, alpha or mips,
> does it?

No, I get those errors you posted too.

But like I said, you simply IGNORE THEM, and look for newly introduced
errors and warnings.

^ permalink raw reply

* [PATCH] inetpeer: fix a race in inetpeer_gc_worker()
From: Eric Dumazet @ 2012-06-05  9:28 UTC (permalink / raw)
  To: David Miller; +Cc: Steffen Klassert, netdev

From: Eric Dumazet <edumazet@google.com>

commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
the routing cache) added a race :

Before freeing an inetpeer, we must respect a RCU grace period, and make
sure no user will attempt to increase refcnt.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/inetpeer.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index d4d61b6..f936e95 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -108,6 +108,11 @@ int inet_peer_threshold __read_mostly = 65536 + 128;	/* start to throw entries m
 int inet_peer_minttl __read_mostly = 120 * HZ;	/* TTL under high load: 120 sec */
 int inet_peer_maxttl __read_mostly = 10 * 60 * HZ;	/* usual time to live: 10 min */
 
+static void inetpeer_free_rcu(struct rcu_head *head)
+{
+	kmem_cache_free(peer_cachep, container_of(head, struct inet_peer, rcu));
+}
+
 static void inetpeer_gc_worker(struct work_struct *work)
 {
 	struct inet_peer *p, *n;
@@ -137,9 +142,9 @@ static void inetpeer_gc_worker(struct work_struct *work)
 
 		n = list_entry(p->gc_list.next, struct inet_peer, gc_list);
 
-		if (!atomic_read(&p->refcnt)) {
+		if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {
 			list_del(&p->gc_list);
-			kmem_cache_free(peer_cachep, p);
+			call_rcu(&p->rcu, inetpeer_free_rcu);
 		}
 	}
 
@@ -364,11 +369,6 @@ do {								\
 	peer_avl_rebalance(stack, stackptr, base);		\
 } while (0)
 
-static void inetpeer_free_rcu(struct rcu_head *head)
-{
-	kmem_cache_free(peer_cachep, container_of(head, struct inet_peer, rcu));
-}
-
 static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base,
 			     struct inet_peer __rcu **stack[PEER_MAXDEPTH])
 {

^ permalink raw reply related

* Re: [net-next RFC PATCH] virtio_net: collect satistics and export through ethtool
From: Michael S. Tsirkin @ 2012-06-05 10:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120605083841.11878.69056.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Tue, Jun 05, 2012 at 04:38:41PM +0800, Jason Wang wrote:
> Satistics counters is useful for debugging and performance optimization, so this
> patch lets virtio_net driver collect following and export them to userspace
> through "ethtool -S":
> 
> - number of packets sent/received
> - number of bytes sent/received
> - number of callbacks for tx/rx
> - number of kick for tx/rx
> - number of bytes/packets queued for tx
> 
> As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were exposed
> like:
> 
> NIC statistics:
>      tx_bytes[0]: 2551
>      tx_packets[0]: 12
>      tx_kick[0]: 12
>      tx_callbacks[0]: 1
>      tx_queued_packets[0]: 12
>      tx_queued_bytes[0]: 3055
>      rx_bytes[0]: 0
>      rx_packets[0]: 0
>      rx_kick[0]: 0
>      rx_callbacks[0]: 0
>      tx_bytes[1]: 5575
>      tx_packets[1]: 37
>      tx_kick[1]: 38
>      tx_callbacks[1]: 0
>      tx_queued_packets[1]: 38
>      tx_queued_bytes[1]: 5217
>      rx_bytes[1]: 4175
>      rx_packets[1]: 25
>      rx_kick[1]: 1
>      rx_callbacks[1]: 16
>      tx_bytes: 8126
>      tx_packets: 49
>      tx_kick: 50
>      tx_callbacks: 1
>      tx_queued_packets: 50
>      tx_queued_bytes: 8272
>      rx_bytes: 4175
>      rx_packets: 25
>      rx_kick: 1
>      rx_callbacks: 16
> 
> TODO:
> 
> - more satistics
> - unitfy the ndo_get_stats64 and get_ethtool_stats
> - calculate the pending bytes/pkts
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |  130 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 127 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..7ab0cc1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -41,6 +41,10 @@ module_param(gso, bool, 0444);
>  #define VIRTNET_SEND_COMMAND_SG_MAX    2
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +#define VIRTNET_STAT_OFF(m) offsetof(struct virtnet_stats, m)
> +#define VIRTNET_STAT(stat, i) (*((u64 *)((char *)stat + \

What's going on? Why cast to char *?

> +			       virtnet_stats_str_attr[i].stat_offset)))

These are confusing unless you see what virtnet_stats_str_attr
is so please move them near that definition.

> +
>  struct virtnet_stats {
>  	struct u64_stats_sync syncp;
>  	u64 tx_bytes;
> @@ -48,8 +52,33 @@ struct virtnet_stats {
>  
>  	u64 rx_bytes;
>  	u64 rx_packets;
> +
> +	u64 tx_kick;
> +	u64 rx_kick;
> +	u64 tx_callbacks;
> +	u64 rx_callbacks;
> +	u64 tx_queued_packets;
> +	u64 tx_queued_bytes;
> +};

I have an idea (not a must): why don't we simply create an enum
enum virtnet_stats {
	VIRTNET_TX_KICK,
	VIRTNET_RX_KICK,
	....
	VIRTNET_MAX_STAT,
}


now stats can just do
	stats->data[VIRTNET_RX_KICK] instead of stats->rx_kick
which is not a big problem, but copying them in bulk
becomes straight-forward, no need for macros at all.

If we decide to do this, needs to be a separate patch,
then this one on top.

> +
> +static struct {

static const.

> +	char string[ETH_GSTRING_LEN];
> +	int stat_offset;
> +} virtnet_stats_str_attr[] = {
> +	{ "tx_bytes", VIRTNET_STAT_OFF(tx_bytes)},
> +	{ "tx_packets", VIRTNET_STAT_OFF(tx_packets)},
> +	{ "tx_kick", VIRTNET_STAT_OFF(tx_kick)},
> +	{ "tx_callbacks", VIRTNET_STAT_OFF(tx_callbacks)},
> +	{ "tx_queued_packets", VIRTNET_STAT_OFF(tx_queued_packets)},
> +	{ "tx_queued_bytes", VIRTNET_STAT_OFF(tx_queued_bytes)},
> +	{ "rx_bytes" , VIRTNET_STAT_OFF(rx_bytes)},
> +	{ "rx_packets", VIRTNET_STAT_OFF(rx_packets)},
> +	{ "rx_kick", VIRTNET_STAT_OFF(rx_kick)},
> +	{ "rx_callbacks", VIRTNET_STAT_OFF(rx_callbacks)},

VIRTNET_STAT_OFF does not save much here, but if you are after
saving characters then make the macro instanciate the string
as well.

>  };
>  
> +#define VIRTNET_NUM_STATS ARRAY_SIZE(virtnet_stats_str_attr)
> +

if you pass virtnet_stats_str_attr to VIRTNET_STAT macro,
then it's explicit and VIRTNET_NUM_STATS won't be needed either.

>  struct virtnet_info {
>  	struct virtio_device *vdev;
>  	struct virtqueue *rvq, *svq, *cvq;
> @@ -142,6 +171,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>  static void skb_xmit_done(struct virtqueue *svq)
>  {
>  	struct virtnet_info *vi = svq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->tx_callbacks++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Suppress further interrupts. */
>  	virtqueue_disable_cb(svq);
> @@ -461,6 +495,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
>  	bool oom;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
>  	do {
>  		if (vi->mergeable_rx_bufs)
> @@ -477,13 +512,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  	} while (err > 0);
>  	if (unlikely(vi->num > vi->max))
>  		vi->max = vi->num;
> -	virtqueue_kick(vi->rvq);
> +	if (virtqueue_kick_prepare(vi->rvq)) {
> +		virtqueue_notify(vi->rvq);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->rx_kick++;
> +		u64_stats_update_end(&stats->syncp);
> +	}
>  	return !oom;
>  }
>  
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->rx_callbacks++;
> +	u64_stats_update_end(&stats->syncp);
> +
>  	/* Schedule NAPI, Suppress further interrupts if successful. */
>  	if (napi_schedule_prep(&vi->napi)) {
>  		virtqueue_disable_cb(rvq);
> @@ -626,7 +672,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	int capacity;
> +	bool kick;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(vi);
> @@ -651,7 +699,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> -	virtqueue_kick(vi->svq);
> +
> +	kick = virtqueue_kick_prepare(vi->svq);
> +	if (kick)

probably
	 if (unlikely(kick))

> +		virtqueue_notify(vi->svq);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	if (kick)

this too

> +		stats->tx_kick++;
> +	stats->tx_queued_bytes += skb->len;
> +	stats->tx_queued_packets++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Don't wait up for transmitted skbs to be freed. */
>  	skb_orphan(skb);
> @@ -926,7 +984,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
>  
>  }
>  
> -
>  static void virtnet_get_drvinfo(struct net_device *dev,
>  				struct ethtool_drvinfo *info)
>  {
> @@ -939,10 +996,77 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> +{
> +	int i, cpu;
> +	switch (stringset) {
> +	case ETH_SS_STATS:
> +		for_each_possible_cpu(cpu)
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				sprintf(buf, "%s[%u]",
> +					virtnet_stats_str_attr[i].string, cpu);
> +				buf += ETH_GSTRING_LEN;
> +			}
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			memcpy(buf, virtnet_stats_str_attr[i].string,
> +				ETH_GSTRING_LEN);
> +			buf += ETH_GSTRING_LEN;
> +		}
> +		break;
> +	}
> +}
> +
> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return VIRTNET_NUM_STATS * (num_online_cpus() + 1);

This will allocate buffers for online cpus only, but the above
will fill them in for all possible cpus.
Will this overrun some buffer?

> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void virtnet_get_ethtool_stats(struct net_device *dev,
> +				struct ethtool_stats *stats, u64 *buf)

The coding style says
	Descendants are always substantially shorter than the parent and
	are placed substantially to the right.

you can't call it substantially to the right if it's to the left of
the opening '('  :), so please indent it aligning on the opening.

> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int cpu, i;
> +	unsigned int start;
> +	struct virtnet_stats sample, total;
> +
> +	memset(&total, 0, sizeof(total));
> +	memset(&sample, 0, sizeof(sample));
> +
> +	for_each_possible_cpu(cpu) {
> +		struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&stats->syncp);
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				VIRTNET_STAT(&sample, i) =
> +					VIRTNET_STAT(stats, i);

when you feel the need to break lines like this - don't :)
use an inline function instead.

> +

kill empty line here
> +			}

don't put {} around single statements pls.

> +		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			*buf = VIRTNET_STAT(&sample, i);
> +			VIRTNET_STAT(&total, i) += VIRTNET_STAT(stats, i);
> +			buf++;
> +		}
> +	}
> +
> +	for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +		*buf = VIRTNET_STAT(&total, i);
> +		buf++;
> +	}
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
> +	.get_strings = virtnet_get_strings,
> +	.get_sset_count = virtnet_get_sset_count,
>  };
>  
>  #define MIN_MTU 68

^ permalink raw reply

* [PATCH] macvtap: use set_curren_state to ensure mb
From: Hong zhi guo @ 2012-06-05 10:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, arnd, zhiguo.hong, vikifang

replace raw assignment of current->state with
set_current_state(TASK_...) to ensure memory barrier.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 drivers/net/macvtap.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 2ee56de..62754f6 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -853,7 +853,7 @@ static ssize_t macvtap_do_read(struct
macvtap_queue *q, struct kiocb *iocb,

 	add_wait_queue(sk_sleep(&q->sk), &wait);
 	while (len) {
-		current->state = TASK_INTERRUPTIBLE;
+		set_current_state(TASK_INTERRUPTIBLE);

 		/* Read frames from the queue */
 		skb = skb_dequeue(&q->sk.sk_receive_queue);
@@ -875,7 +875,7 @@ static ssize_t macvtap_do_read(struct
macvtap_queue *q, struct kiocb *iocb,
 		break;
 	}

-	current->state = TASK_RUNNING;
+	set_current_state(TASK_RUNNING);
 	remove_wait_queue(sk_sleep(&q->sk), &wait);
 	return ret;
 }
-- 
1.7.4.1

^ permalink raw reply related

* Re: [PATCH net-next 5/5] gianfar_ethtool: coding style and whitespace cleanups
From: Jan Ceuleers @ 2012-06-05 10:49 UTC (permalink / raw)
  To: David Miller; +Cc: b06378, joe, netdev
In-Reply-To: <20120605.051437.1378262604943296071.davem@davemloft.net>

On 06/05/2012 11:14 AM, David Miller wrote:
> From: Jan Ceuleers <jan.ceuleers@computer.org>
> Date: Tue, 05 Jun 2012 07:54:29 +0200
> 
>> So your build environment happens to be one of powerpc, alpha or mips,
>> does it?
> 
> No, I get those errors you posted too.
> 
> But like I said, you simply IGNORE THEM, and look for newly introduced
> errors and warnings.

David,

The error I quoted was a fatal error, meaning that compilation did not
proceed beyond the point to which the error pertained (inclusion of a
header that does not exist in my arch). So I cannot test-compile the
driver on my arch and draw any conclusions from that beyond line 91.

I really wanted to make a modest contribution with this patch set, and
it seems I've failed miserably due to the amount of other people's
bandwidth that I'm consuming...

Jan

^ permalink raw reply

* Re: [PATCH net-next 5/5] gianfar_ethtool: coding style and whitespace cleanups
From: Florian Fainelli @ 2012-06-05 10:54 UTC (permalink / raw)
  To: Jan Ceuleers; +Cc: David Miller, b06378, joe, netdev
In-Reply-To: <4FCDE447.3030103@computer.org>

Hi,

On Tuesday 05 June 2012 12:49:43 Jan Ceuleers wrote:
> On 06/05/2012 11:14 AM, David Miller wrote:
> > From: Jan Ceuleers <jan.ceuleers@computer.org>
> > Date: Tue, 05 Jun 2012 07:54:29 +0200
> > 
> >> So your build environment happens to be one of powerpc, alpha or mips,
> >> does it?
> > 
> > No, I get those errors you posted too.
> > 
> > But like I said, you simply IGNORE THEM, and look for newly introduced
> > errors and warnings.
> 
> David,
> 
> The error I quoted was a fatal error, meaning that compilation did not
> proceed beyond the point to which the error pertained (inclusion of a
> header that does not exist in my arch). So I cannot test-compile the
> driver on my arch and draw any conclusions from that beyond line 91.

What about you setup a cross-compiler and build for one of these architectures 
where the driver is enabled?

> 
> I really wanted to make a modest contribution with this patch set, and
> it seems I've failed miserably due to the amount of other people's
> bandwidth that I'm consuming...
> 
> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] macvtap: use set_curren_state to ensure mb
From: Hong zhi guo @ 2012-06-05 10:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, arnd, zhiguo.hong, vikifang
In-Reply-To: <CAA7+ByVzL2kA=uN_yUVFpSnDbLgwecAi9f__QPzDoXHpt_vyaQ@mail.gmail.com>

On Tue, Jun 5, 2012 at 6:46 PM, Hong zhi guo <honkiko@gmail.com> wrote:
> replace raw assignment of current->state with
> set_current_state(TASK_...) to ensure memory barrier.
>
> Signed-off-by: Hong Zhiguo <honkiko@gmail.com>

Sorry for the wrong format in former mail. Corrected.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 drivers/net/macvtap.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 2ee56de..62754f6 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -853,7 +853,7 @@ static ssize_t macvtap_do_read(struct
macvtap_queue *q, struct kiocb *iocb,

 	add_wait_queue(sk_sleep(&q->sk), &wait);
 	while (len) {
-		current->state = TASK_INTERRUPTIBLE;
+		set_current_state(TASK_INTERRUPTIBLE);

 		/* Read frames from the queue */
 		skb = skb_dequeue(&q->sk.sk_receive_queue);
@@ -875,7 +875,7 @@ static ssize_t macvtap_do_read(struct
macvtap_queue *q, struct kiocb *iocb,
 		break;
 	}

-	current->state = TASK_RUNNING;
+	set_current_state(TASK_RUNNING);
 	remove_wait_queue(sk_sleep(&q->sk), &wait);
 	return ret;
 }
-- 
1.7.4.1

^ permalink raw reply related

* Re: [PATCH] macvtap: use set_curren_state to ensure mb
From: Eric Dumazet @ 2012-06-05 11:05 UTC (permalink / raw)
  To: Hong zhi guo; +Cc: davem, netdev, arnd, zhiguo.hong, vikifang
In-Reply-To: <CAA7+ByX7Kp+Zh3ZQ2D6rSuhu6WCeqL-1MgDi9gy-0hHizVXncg@mail.gmail.com>

On Tue, 2012-06-05 at 18:46 +0800, Hong zhi guo wrote:
> replace raw assignment of current->state with
> set_current_state(TASK_...) to ensure memory barrier.
> 
> Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
> ---
>  drivers/net/macvtap.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 2ee56de..62754f6 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -853,7 +853,7 @@ static ssize_t macvtap_do_read(struct
> macvtap_queue *q, struct kiocb *iocb,
> 
>  	add_wait_queue(sk_sleep(&q->sk), &wait);
>  	while (len) {
> -		current->state = TASK_INTERRUPTIBLE;
> +		set_current_state(TASK_INTERRUPTIBLE);
> 
>  		/* Read frames from the queue */
>  		skb = skb_dequeue(&q->sk.sk_receive_queue);
> @@ -875,7 +875,7 @@ static ssize_t macvtap_do_read(struct
> macvtap_queue *q, struct kiocb *iocb,
>  		break;
>  	}
> 
> -	current->state = TASK_RUNNING;
> +	set_current_state(TASK_RUNNING);
>  	remove_wait_queue(sk_sleep(&q->sk), &wait);
>  	return ret;
>  }


Why not using the more standard :

prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);

and

finish_wait(sk_sleep(sk), &wait);

^ permalink raw reply

* Re: [PATCH net-next 5/5] gianfar_ethtool: coding style and whitespace cleanups
From: Eric Dumazet @ 2012-06-05 11:10 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Jan Ceuleers, David Miller, b06378, joe, netdev
In-Reply-To: <4550590.BhQA539plh@flexo>

On Tue, 2012-06-05 at 12:54 +0200, Florian Fainelli wrote:
> Hi,
> 
> On Tuesday 05 June 2012 12:49:43 Jan Ceuleers wrote:

> > The error I quoted was a fatal error, meaning that compilation did not
> > proceed beyond the point to which the error pertained (inclusion of a
> > header that does not exist in my arch). So I cannot test-compile the
> > driver on my arch and draw any conclusions from that beyond line 91.
> 
> What about you setup a cross-compiler and build for one of these architectures 
> where the driver is enabled?

That pretty easy IMHO : http://www.kernel.org/pub/tools/crosstool/

^ permalink raw reply

* Re: [PATCH RFC] c_can_pci: generic module for c_can on PCI
From: Federico Vaga @ 2012-06-05 11:19 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: Alan Cox, Wolfgang Grandegger, Marc Kleine-Budde,
	Giancarlo ASNAGHI, Alan Cox, Alessandro Rubini,
	linux-can@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <D5ECB3C7A6F99444980976A8C6D896384FA5EC4B37@EAPEX1MAIL1.st.com>

> In some of these SoC's the C_CAN registers which are essentially
> 16-bit or 32-bit registers are aligned always to a 32-bit boundary
> (i.e. even a 16-bit register is aligned to 32-bit boundary).
> 
> So, I had to implement two variants of the read/write reg routines. I
> am not sure your SoC implementation needs them. If it does, I will
> categorize it as flaky as well :)

My implementation is align to 32, but I'm trying to make a generic PCI 
wrapper (some other could be aligned to 16)
 
> See above. There was a reason for keeping these routines in
> c_can_platform.c Simply put, every platform having a Bosch C_CAN
> module can have it's own implementation of the bus (for example you
> use PCI) and register bank layout (16-bit or 32-bit aligned).

I don't understand the reason to keep these functions in 
c_can_platform.c . Two generic read/write functions could be written 
into c_can.c by using a shift value (0 if aligned to 16, 1 if aligned to 
32) as I showed in the previous mail:

> > static u16 c_can_read_reg(struct c_can_priv *priv, enum reg index)
> > {
> > 
> > 	return readw(priv->base + (priv->regs[index] << priv->offset));
> > 
> > }
> > static void c_can_write_reg(struct c_can_priv *priv, enum reg index,
> > 
> > 						u16 val)
> > 
> > {
> > 
> > 	writew(val, priv->base + (priv->regs[index] << priv->offset));
> > 
> > }

Every platform having a Bosch C_CAN/D_CAN can specify its shift value (0 
or 1) and it's done.

-- 
Federico Vaga

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] inetpeer: add namespace support for inetpeer
From: Steffen Klassert @ 2012-06-05 11:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <1338886626.2760.2109.camel@edumazet-glaptop>

On Tue, Jun 05, 2012 at 10:57:06AM +0200, Eric Dumazet wrote:
> On Tue, 2012-06-05 at 15:52 +0800, Gao feng wrote:
> 
> > +static void __net_exit inetpeer_net_exit(struct net *net)
> > +{
> > +	inetpeer_invalidate_tree(net, AF_INET);
> > +	kfree(net->ipv4.peers);
> > +
> > +	inetpeer_invalidate_tree(net, AF_INET6);
> > +	kfree(net->ipv6.peers);
> > +}
> > +
> 
> Are we 1000% sure no code ever run in inetpeer land after this call ?
> 
> I would add
> 	net->ipv4.peers = NULL;
> 	net->ipv6.peers = NULL;
> 
> to catch NULL deref instead of strange errors, just in case.

I thought about that too, and I'm not absolutely sure.
The rest of this patch looks ok to me.

> 
> By the way, I think we have a bug in inetpeer_gc_worker()
> 
> Steffen ?
> 
> We have no rcu grace period to make sure the following is safe :
> 
> if (!atomic_read(&p->refcnt)) {
> 	list_del(&p->gc_list);
> 	kmem_cache_free(peer_cachep, p);
> }

I think this is ok as it is. inetpeer_invalidate_tree()
unlinks the whole inetpeer tree from the inetpeer base and
adds it to a gc_list. These intetpeer entries are stale,
they can't be looked up again. So noone should increment the
refcount, they just wait until the refcount get zero.

^ permalink raw reply

* Re: [PATCH] inetpeer: fix a race in inetpeer_gc_worker()
From: Steffen Klassert @ 2012-06-05 11:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1338888507.2760.2146.camel@edumazet-glaptop>

On Tue, Jun 05, 2012 at 11:28:27AM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
> the routing cache) added a race :
> 
> Before freeing an inetpeer, we must respect a RCU grace period, and make
> sure no user will attempt to increase refcnt.
> 

As already mentioned in the other mail. In this case, I think
we can just delete the inetpeer once the refcount got zero.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] inetpeer: add namespace support for inetpeer
From: Eric Dumazet @ 2012-06-05 12:00 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Gao feng, serge.hallyn, ebiederm, herbert, davem, netdev,
	containers
In-Reply-To: <20120605112728.GB27795@secunet.com>

On Tue, 2012-06-05 at 13:27 +0200, Steffen Klassert wrote:

> > 
> > By the way, I think we have a bug in inetpeer_gc_worker()
> > 
> > Steffen ?
> > 
> > We have no rcu grace period to make sure the following is safe :
> > 
> > if (!atomic_read(&p->refcnt)) {
> > 	list_del(&p->gc_list);
> > 	kmem_cache_free(peer_cachep, p);
> > }
> 
> I think this is ok as it is. inetpeer_invalidate_tree()
> unlinks the whole inetpeer tree from the inetpeer base and
> adds it to a gc_list. These intetpeer entries are stale,
> they can't be looked up again. So noone should increment the
> refcount, they just wait until the refcount get zero.
> 

Its not OK, lookups are done under rcu.

Since there is no RCU grace period, the worker free the entries while
another cpus are doing their lookups.

Alternative would be to wait a RCU grace period before feeding them to
worker.

^ permalink raw reply

* Re: [PATCH] macvtap: use set_curren_state to ensure mb
From: Hong zhi guo @ 2012-06-05 12:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, arnd, zhiguo.hong, vikifang
In-Reply-To: <1338894329.2760.2318.camel@edumazet-glaptop>

On Tue, Jun 5, 2012 at 7:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Why not using the more standard :
>
> prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>
> and
>
> finish_wait(sk_sleep(sk), &wait);
>

Thanks for your comments.  The difference is that prepare_to_wait
inside loop introduces extra "list_empty" judgement than original
patch. But personally I prefer the newer version too:

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 drivers/net/macvtap.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 2ee56de..0737bd4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -847,13 +847,12 @@ static ssize_t macvtap_do_read(struct
macvtap_queue *q, struct kiocb *iocb,
 			       const struct iovec *iv, unsigned long len,
 			       int noblock)
 {
-	DECLARE_WAITQUEUE(wait, current);
+	DEFINE_WAIT(wait);
 	struct sk_buff *skb;
 	ssize_t ret = 0;

-	add_wait_queue(sk_sleep(&q->sk), &wait);
 	while (len) {
-		current->state = TASK_INTERRUPTIBLE;
+		prepare_to_wait(sk_sleep(&q->sk), &wait, TASK_INTERRUPTIBLE);

 		/* Read frames from the queue */
 		skb = skb_dequeue(&q->sk.sk_receive_queue);
@@ -875,8 +874,7 @@ static ssize_t macvtap_do_read(struct
macvtap_queue *q, struct kiocb *iocb,
 		break;
 	}

-	current->state = TASK_RUNNING;
-	remove_wait_queue(sk_sleep(&q->sk), &wait);
+	finish_wait(sk_sleep(&q->sk), &wait);
 	return ret;
 }

-- 
1.7.4.1

^ permalink raw reply related

* Re: [PATCH net-next v2 1/2] inetpeer: add namespace support for inetpeer
From: Steffen Klassert @ 2012-06-05 12:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <1338897630.2760.2433.camel@edumazet-glaptop>

On Tue, Jun 05, 2012 at 02:00:30PM +0200, Eric Dumazet wrote:
> On Tue, 2012-06-05 at 13:27 +0200, Steffen Klassert wrote:
> 
> > > 
> > > By the way, I think we have a bug in inetpeer_gc_worker()
> > > 
> > > Steffen ?
> > > 
> > > We have no rcu grace period to make sure the following is safe :
> > > 
> > > if (!atomic_read(&p->refcnt)) {
> > > 	list_del(&p->gc_list);
> > > 	kmem_cache_free(peer_cachep, p);
> > > }
> > 
> > I think this is ok as it is. inetpeer_invalidate_tree()
> > unlinks the whole inetpeer tree from the inetpeer base and
> > adds it to a gc_list. These intetpeer entries are stale,
> > they can't be looked up again. So noone should increment the
> > refcount, they just wait until the refcount get zero.
> > 
> 
> Its not OK, lookups are done under rcu.
> 
> Since there is no RCU grace period, the worker free the entries while
> another cpus are doing their lookups.
> 

Lookups are done under rcu, yes. But a lookup will not find
these stale entries because the whole interpeer tree is removed
from the inetpeer base before the worker is scheduled. A lookup
would have to cerate a new inetpeer entry in this case.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] inetpeer: add namespace support for inetpeer
From: Eric Dumazet @ 2012-06-05 12:18 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <20120605121510.GD27795-opNxpl+3fjRBDgjK7y7TUQ@public.gmane.org>

On Tue, 2012-06-05 at 14:15 +0200, Steffen Klassert wrote:

> Lookups are done under rcu, yes. But a lookup will not find
> these stale entries because the whole interpeer tree is removed
> from the inetpeer base before the worker is scheduled. A lookup
> would have to cerate a new inetpeer entry in this case.
> 


Thats simply not true. You don't respect basic RCU rules.

Please read them.

^ permalink raw reply

* Re: [PATCH] inetpeer: fix a race in inetpeer_gc_worker()
From: Eric Dumazet @ 2012-06-05 12:19 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20120605115640.GC27795@secunet.com>

On Tue, 2012-06-05 at 13:56 +0200, Steffen Klassert wrote:
> On Tue, Jun 05, 2012 at 11:28:27AM +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
> > the routing cache) added a race :
> > 
> > Before freeing an inetpeer, we must respect a RCU grace period, and make
> > sure no user will attempt to increase refcnt.
> > 
> 
> As already mentioned in the other mail. In this case, I think
> we can just delete the inetpeer once the refcount got zero.
> 

Nope, a concurrent lookup can find an entry about to be freed.

We must prevent it to increase the refcount from 0 to 1.

And we must wait a RCU grace period before freeing inetpeer.

Alternative would be the following patch :


diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index d4d61b6..6df9951 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -560,6 +560,17 @@ bool inet_peer_xrlim_allow(struct inet_peer *peer, int timeout)
 }
 EXPORT_SYMBOL(inet_peer_xrlim_allow);
 
+static void inetpeer_inval_rcu(struct rcu_head *head)
+{
+	struct inet_peer *p = container_of(head, struct inet_peer, rcu);
+
+	spin_lock(&gc_lock);
+	list_add_tail(&p->gc_list, &gc_list);
+	spin_unlock(&gc_lock);
+
+	schedule_delayed_work(&gc_work, gc_delay);
+}
+
 void inetpeer_invalidate_tree(int family)
 {
 	struct inet_peer *old, *new, *prev;
@@ -576,10 +587,7 @@ void inetpeer_invalidate_tree(int family)
 	prev = cmpxchg(&base->root, old, new);
 	if (prev == old) {
 		base->total = 0;
-		spin_lock(&gc_lock);
-		list_add_tail(&prev->gc_list, &gc_list);
-		spin_unlock(&gc_lock);
-		schedule_delayed_work(&gc_work, gc_delay);
+		call_rcu(&prev->rcu, inetpeer_inval_rcu);
 	}
 
 out:

^ permalink raw reply related

* Re: [PATCH net-next v2 1/2] inetpeer: add namespace support for inetpeer
From: Gao feng @ 2012-06-05 12:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1338886626.2760.2109.camel@edumazet-glaptop>

于 2012年06月05日 16:57, Eric Dumazet 写道:
> On Tue, 2012-06-05 at 15:52 +0800, Gao feng wrote:
> 
>> +static void __net_exit inetpeer_net_exit(struct net *net)
>> +{
>> +	inetpeer_invalidate_tree(net, AF_INET);
>> +	kfree(net->ipv4.peers);
>> +
>> +	inetpeer_invalidate_tree(net, AF_INET6);
>> +	kfree(net->ipv6.peers);
>> +}
>> +
> 
> Are we 1000% sure no code ever run in inetpeer land after this call ?

I am not sure,I need more time to research it.
I just do kfree peers here without set NULL pointer is
beacuse there is the same code with fib6_main_tbl in fib6_net_exit
and it seems work well.

Anyway, I will research it.

> 
> I would add
> 	net->ipv4.peers = NULL;
> 	net->ipv6.peers = NULL;
> 
> to catch NULL deref instead of strange errors, just in case.
> 
> By the way, I think we have a bug in inetpeer_gc_worker()
> 
> Steffen ?
> 
> We have no rcu grace period to make sure the following is safe :
> 
> if (!atomic_read(&p->refcnt)) {
> 	list_del(&p->gc_list);
> 	kmem_cache_free(peer_cachep, p);
> }
> 
> I'll post a fix like :
> 
> diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
> index d4d61b6..07731b5 100644
> --- a/net/ipv4/inetpeer.c
> +++ b/net/ipv4/inetpeer.c
> @@ -137,7 +137,7 @@ static void inetpeer_gc_worker(struct work_struct *work)
>  
>  		n = list_entry(p->gc_list.next, struct inet_peer, gc_list);
>  
> -		if (!atomic_read(&p->refcnt)) {
> +		if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {
>  			list_del(&p->gc_list);
>  			kmem_cache_free(peer_cachep, p);
>  		}
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox