Netdev List


* Re: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
From: David Miller @ 2010-10-19 15:24 UTC (permalink / raw)
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, herbert, jdike
In-Reply-To: <33f51c5eddec843260451eccbfa3c5fdea13b479.1287132437.git.xiaohui.xin@intel.com>

From: xiaohui.xin@intel.com
Date: Fri, 15 Oct 2010 17:12:11 +0800

> @@ -2891,6 +2922,11 @@ static int __netif_receive_skb(struct sk_buff *skb)
>  ncls:
>  #endif
>  
> +	/* To intercept mediate passthru(zero-copy) packets here */
> +	skb = handle_mpassthru(skb, &pt_prev, &ret, orig_dev);
> +	if (!skb)
> +		goto out;
> +
>  	/* Handle special case of bridge or macvlan */
>  	rx_handler = rcu_dereference(skb->dev->rx_handler);
>  	if (rx_handler) {

If you consume the packet here, devices in passthru mode cannot
be used with bonding.

But there is nothing that prevents a bond being created with such
a device.

So we have to either prevent such configurations (bad) or make
it work somehow (good) :-)


* Re: [PATCH] Fixed race condition at ip_vs.ko module init.
From: Simon Horman @ 2010-10-19 15:23 UTC (permalink / raw)
  To: Eduardo Blanco
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <AANLkTinVcSK0Dqx4Pkp1dZskR8YmkGYkkrSNgDqX6psW@mail.gmail.com>

On Tue, Oct 19, 2010 at 10:26:47AM +0100, Eduardo Blanco wrote:
> Lists were initialized after the module was registered.  Multiple ipvsadm
> processes at module load triggered a race condition that resulted in a null
> pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex
> was left locked preventing all further ipvsadm commands.
> 
> Signed-off-by: Eduardo J. Blanco <ejblanco@google.com>

Thanks Eduardo.

Patrick, please consider pulling

git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git for-patrick

to get this change.

> ---
>  net/netfilter/ipvs/ip_vs_ctl.c |   19 ++++++++++---------
>  1 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 0f0c079..68624dc 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -3385,6 +3385,16 @@ int __init ip_vs_control_init(void)
> 
>  	EnterFunction(2);
> 
> +	/* Initialize ip_vs_svc_table, ip_vs_svc_fwm_table, ip_vs_rtable */
> +	for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++)  {
> +		INIT_LIST_HEAD(&ip_vs_svc_table[idx]);
> +		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
> +	}
> +	for(idx = 0; idx < IP_VS_RTAB_SIZE; idx++)  {
> +		INIT_LIST_HEAD(&ip_vs_rtable[idx]);
> +	}
> +	smp_wmb();
> +
>  	ret = nf_register_sockopt(&ip_vs_sockopts);
>  	if (ret) {
>  		pr_err("cannot register sockopt.\n");
> @@ -3403,15 +3413,6 @@ int __init ip_vs_control_init(void)
> 
>  	sysctl_header = register_sysctl_paths(net_vs_ctl_path, vs_vars);
> 
> -	/* Initialize ip_vs_svc_table, ip_vs_svc_fwm_table, ip_vs_rtable */
> -	for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++)  {
> -		INIT_LIST_HEAD(&ip_vs_svc_table[idx]);
> -		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
> -	}
> -	for(idx = 0; idx < IP_VS_RTAB_SIZE; idx++)  {
> -		INIT_LIST_HEAD(&ip_vs_rtable[idx]);
> -	}
> -
>  	ip_vs_new_estimator(&ip_vs_stats);
> 
>  	/* Hook the defense timer */
> -- 
> 1.7.1
> 
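
The ordering problem described in the patch can be illustrated outside the kernel. The following is a minimal userspace sketch, not the ip_vs code: the names `table`, `handler_registered`, `control_init` and `get_ctl` are hypothetical, and a C11 atomic flag stands in for the sockopt registration. Every list head is initialized before the flag that exposes the handler is set, so a caller that observes the flag never walks an uninitialized bucket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TAB_SIZE 8	/* hypothetical table size for this sketch */

struct list_head { struct list_head *next, *prev; };

static struct list_head table[TAB_SIZE];
static atomic_int handler_registered;	/* stands in for handler registration */

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Initialize every bucket first, then publish the handler. */
static void control_init(void)
{
	for (size_t i = 0; i < TAB_SIZE; i++)
		init_list_head(&table[i]);
	/* Release ordering: the bucket writes become visible before the flag. */
	atomic_store_explicit(&handler_registered, 1, memory_order_release);
}

/* Returns the number of entries in a bucket, or -1 if not yet registered. */
static int get_ctl(size_t idx)
{
	int n = 0;

	if (!atomic_load_explicit(&handler_registered, memory_order_acquire))
		return -1;
	for (struct list_head *p = table[idx].next; p != &table[idx]; p = p->next)
		n++;
	return n;
}
```

If the two steps in `control_init()` were swapped, a concurrent `get_ctl()` could see the flag while a bucket still holds NULL pointers, which is the kind of dereference the report describes.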


* [GIT PULL net-2.6] vhost-net: access_ok fix
From: Michael S. Tsirkin @ 2010-10-19 14:59 UTC (permalink / raw)
  To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel

David,
Not sure if it's too late for 2.6.36 - in case it's not, the following tree
includes a last minute bugfix for vhost-net, found by code inspection.
It is on top of net-2.6.
Thanks!

The following changes since commit b0057c51db66c5f0f38059f242c57d61c4741d89:

  tg3: restore rx_dropped accounting (2010-10-11 16:06:24 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

Dan Carpenter (1):
      vhost: fix return code for log_access_ok()

 drivers/vhost/vhost.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

-- 
MST


* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
From: Simon Horman @ 2010-10-19 14:56 UTC (permalink / raw)
  To: jamal; +Cc: Jesse Gross, Ben Pfaff, netdev, ovs-team
In-Reply-To: <1287483768.26671.2.camel@bigi>

On Tue, Oct 19, 2010 at 06:22:48AM -0400, jamal wrote:
> On Mon, 2010-10-18 at 17:20 +0200, Simon Horman wrote:
> 
> > As I understand things, the packet goes from the kernel to userspace
> > and then (typically) comes back again.
> 
> Injection back is trivial.
> 
> > I guess that it would be possible to send a copy of the headers
> to user-space while the packet is quarantined in the kernel pending
> > a response from user-space. I say only the headers, as typically
> > that is all user-space needs to make a decision, though I guess it
> > may need the body to make some types of decisions. I have no idea
> > if such a scheme would be desirable in any circumstances.
> 
> Quarantining the packet in the kernel would be trickier than sending the
> whole thing up. For a sample of how it is done, I believe the netfilter
> approach (ipq?) as well as ipsec would be good examples to look at.

Ok, let's forget my quarantine idea - I was just thinking aloud.



* Re: [PATCH] bridge: make br_parse_ip_options  static
From: Stephen Hemminger @ 2010-10-19 14:55 UTC (permalink / raw)
  To: Bandan Das; +Cc: David Miller, netdev
In-Reply-To: <20101019112234.GB12005@stratus.com>

On Tue, 19 Oct 2010 07:22:34 -0400
Bandan Das <bandan.das@stratus.com> wrote:

> On  0, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > --- a/net/bridge/br_netfilter.c	2010-10-18 17:01:36.903364885 -0700
> > +++ b/net/bridge/br_netfilter.c	2010-10-18 17:01:48.106569141 -0700
> > @@ -213,7 +213,7 @@ static inline void nf_bridge_update_prot
> >   * expected format
> >   */
> >  
> > -int br_parse_ip_options(struct sk_buff *skb)
> > +static int br_parse_ip_options(struct sk_buff *skb)
> >  {
> >  	struct ip_options *opt;
> >  	struct iphdr *iph;
> > 
> 
> My main motivation behind not making this static was that
> there would be possibly other places in the bridge code 
> (besides br_netfilter.c) where we enter the IP stack and might 
> want to call this. Not sure if it's indeed the case though..
> 

I checked by building with make allmodconfig as well as searching with
git grep 'br_parse_ip_options'

-- 


* [RFC] Make the ip6_tunnel reflect the true mtu.
From: Anders.Franzen @ 2010-10-19 14:38 UTC (permalink / raw)
  To: eric.dumazet, netdev; +Cc: Anders Franzen

From: Anders Franzen <anders.franzen@ericsson.com>

The ip6_tunnel always assumes it consumes 40 bytes (ip6 hdr) of the mtu of the
underlying device. So for a normal ethernet bearer, the mtu of the ip6_tunnel is
1460.
However, when creating a tunnel, the encap limit option is enabled by default and
consumes 8 more bytes, so the true mtu should be 1452.

I don't really know if this breaks some statement in some RFC, so this is a request for
comments.
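
The arithmetic above can be written out as a small standalone sketch. This is not the kernel code: `ip6_tnl_mtu` and its `ign_encap_limit` parameter are hypothetical stand-ins for the `IP6_TNL_F_IGN_ENCAP_LIMIT` check in the patch, and the 8-byte figure is taken from the description above.

```c
#include <assert.h>

#define IPV6_HDR_LEN		40	/* sizeof(struct ipv6hdr) */
#define ENCAP_LIMIT_OPT_LEN	8	/* extra bytes used by the encap limit option */
#define IPV6_MIN_MTU		1280

/* Compute the tunnel MTU from the MTU of the underlying link. */
static int ip6_tnl_mtu(int link_mtu, int ign_encap_limit)
{
	int mtu = link_mtu - IPV6_HDR_LEN;

	if (!ign_encap_limit)
		mtu -= ENCAP_LIMIT_OPT_LEN;
	if (mtu < IPV6_MIN_MTU)
		mtu = IPV6_MIN_MTU;	/* same clamp as ip6_tnl_link_config() */
	return mtu;
}
```

With a 1500-byte ethernet link this gives 1460 when the encap limit is ignored and 1452 when it is in use.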
---
 net/ipv6/ip6_tunnel.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c2c0f89..1f4c3cc 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1175,6 +1175,8 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
 				sizeof (struct ipv6hdr);
 
 			dev->mtu = rt->rt6i_dev->mtu - sizeof (struct ipv6hdr);
+			if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
+				dev->mtu-=8;
 
 			if (dev->mtu < IPV6_MIN_MTU)
 				dev->mtu = IPV6_MIN_MTU;
@@ -1362,12 +1364,17 @@ static const struct net_device_ops ip6_tnl_netdev_ops = {
 
 static void ip6_tnl_dev_setup(struct net_device *dev)
 {
+	struct ip6_tnl *t = NULL;
+
 	dev->netdev_ops = &ip6_tnl_netdev_ops;
 	dev->destructor = ip6_dev_free;
 
 	dev->type = ARPHRD_TUNNEL6;
 	dev->hard_header_len = LL_MAX_HEADER + sizeof (struct ipv6hdr);
 	dev->mtu = ETH_DATA_LEN - sizeof (struct ipv6hdr);
+	t = netdev_priv(dev);
+	if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
+		dev->mtu-=8;
 	dev->flags |= IFF_NOARP;
 	dev->addr_len = sizeof(struct in6_addr);
 	dev->features |= NETIF_F_NETNS_LOCAL;
-- 
1.7.2.3



* Re: [PATCH] ip6_tunnel dont update the mtu on the route.
From: Eric Dumazet @ 2010-10-19 14:32 UTC (permalink / raw)
  To: Anders.Franzen; +Cc: netdev
In-Reply-To: <1287496247-24127-1-git-send-email-Anders.Franzen@ericsson.com>

On Tuesday, 19 October 2010 at 15:50 +0200, Anders.Franzen@ericsson.com
wrote:
> From: Anders Franzen <anders.franzen@ericsson.com>
> 
> The ip6_tunnel device did not unset the
> IFF_XMIT_DST_RELEASE flag. This makes the dev layer
> release the dst before calling the tunnel.
> The tunnel then cannot update any mtu/pmtu info, since
> there is no dst on the skb.
> ---
>  net/ipv6/ip6_tunnel.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index c2c0f89..38b9a56 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -1371,6 +1371,7 @@ static void ip6_tnl_dev_setup(struct net_device *dev)
>  	dev->flags |= IFF_NOARP;
>  	dev->addr_len = sizeof(struct in6_addr);
>  	dev->features |= NETIF_F_NETNS_LOCAL;
> +	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
>  }
>  
> 

Thanks for catching this, Anders.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>





* [RFC 3/3] MPEG2/TS drop analyzer file: libxt_mp2t.c
From: Jesper Dangaard Brouer @ 2010-10-19 14:27 UTC (permalink / raw)
  To: Netfilter Developers; +Cc: paulmck, Eric Dumazet, netdev
In-Reply-To: <Pine.LNX.4.64.1010191608080.18708@ask.diku.dk>

/*
  * Userspace interface for MPEG2 TS match extension "mp2t" for Xtables.
  *
  * Copyright (c) Jesper Dangaard Brouer <jdb@comx.dk>, 2009+
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License; either
  * version 2 of the License, or any later version, as published by the
  * Free Software Foundation.
  *
  */

#include <getopt.h>
#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

#include <xtables.h>
#include "xt_mp2t.h"

/*
  * Userspace iptables/xtables interface for mp2t module.
  */

/* FIXME: I don't think this compat check covers all versions */
#ifndef XTABLES_VERSION
#define xtables_error exit_error
#endif

static const struct option mp2t_mt_opts[] = {
 	{.name = "name",	.has_arg = true,  .val = 'n'},
 	{.name = "drop",	.has_arg = false, .val = 'd'},
 	{.name = "drop-detect",	.has_arg = false, .val = 'd'},
 	{.name = "max",		.has_arg = true,  .val = 'x'},
 	{.name = "max-streams",	.has_arg = true,  .val = 'x'},
 	{NULL},
};

static void mp2t_mt_help(void)
{
 	printf(
"mp2t (MPEG2 Transport Stream) match options:\n"
"VERSION %s\n"
"   [--name <name>]        Name for proc file /proc/net/xt_mp2t/rule_NAME\n"
"   [--drop-detect]        Match lost TS frames (occurred before this packet)\n"
"   [--max-streams <num>]  Track 'max' number of streams (per rule)\n",
 		version
 		);
}

static void mp2t_mt_init(struct xt_entry_match *match)
{
 	struct xt_mp2t_mtinfo *info = (void *)match->data;
 	/* Enable drop detection per default */
 	info->flags = XT_MP2T_DETECT_DROP;
}

static int mp2t_mt_parse(int c, char **argv, int invert, unsigned int *flags,
 			 const void *entry, struct xt_entry_match **match)
{
 	struct xt_mp2t_mtinfo *info = (void *)(*match)->data;
 	u_int32_t num;

 	switch (c) {
 	case 'n': /* --name */
 		xtables_param_act(XTF_ONLY_ONCE, "mp2t", "--name",
 				  *flags & XT_MP2T_PARAM_NAME);
 		if (invert)
 			xtables_error(PARAMETER_PROBLEM, "Inverting name?");
 		if (strlen(optarg) == 0)
 			xtables_error(PARAMETER_PROBLEM, "Zero-length name?");
 		if (strchr(optarg, '"') != NULL)
 			xtables_error(PARAMETER_PROBLEM,
 				      "Illegal character in name (\")!");
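 		/* strncpy() does not terminate on truncation, so do it here */
 		strncpy(info->rule_name, optarg, sizeof(info->rule_name) - 1);
 		info->rule_name[sizeof(info->rule_name) - 1] = '\0';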
 		info->flags |= XT_MP2T_PARAM_NAME;
 		*flags |= XT_MP2T_PARAM_NAME;
 		break;

 	case 'd': /* --drop-detect */
 		if (*flags & XT_MP2T_DETECT_DROP)
 			xtables_error(PARAMETER_PROBLEM,
 				      "Can't specify --drop option twice");
 		*flags |= XT_MP2T_DETECT_DROP;

 		if (invert)
 			info->flags &= ~XT_MP2T_DETECT_DROP;
 		else
 			info->flags |= XT_MP2T_DETECT_DROP;

 		break;

 	case 'x': /* --max-streams */
 		if (*flags & XT_MP2T_MAX_STREAMS)
 			xtables_error(PARAMETER_PROBLEM,
 				"Can't specify --max-streams option twice");
 		*flags |= XT_MP2T_MAX_STREAMS;

 		if (invert) {
 			info->cfg.max = 0;
 			/* printf("inverted\n"); */
 			break;
 		}

 		/* OLD iptables style
 		if (string_to_number(optarg, 0, 0xffffffff, &num) == -1)
 			xtables_error(PARAMETER_PROBLEM,
 				      "bad --max-stream: `%s'", optarg);
 		*/

 		/* C-style
 		char *end;
 		num = strtoul(optarg, &end, 0);
 		*/

 		/* New xtables style */
 		if (!xtables_strtoui(optarg, NULL, &num, 0, UINT32_MAX))
 			xtables_error(PARAMETER_PROBLEM,
 				      "bad --max-streams: `%s'", optarg);

 		/* DEBUG: printf("--max-stream=%lu\n", num); */
 		info->flags |= XT_MP2T_MAX_STREAMS;
 		info->cfg.max = num;

 		break;

 	default:
 		return false;
 	}

 	return true;
}

static void mp2t_mt_print(const void *entry,
 			  const struct xt_entry_match *match, int numeric)
{
 	const struct xt_mp2t_mtinfo *info = (const void *)(match->data);

 	/* Always indicate this is a mp2t match rule */
 	printf("mp2t match");

 	if (info->flags & XT_MP2T_PARAM_NAME)
 		printf(" name:\"%s\"", info->rule_name);

 	if (!(info->flags & XT_MP2T_DETECT_DROP))
 		printf(" !drop-detect");

 	if (info->flags & XT_MP2T_MAX_STREAMS)
 		printf(" max-streams:%u ", info->cfg.max);
}

static void mp2t_mt_save(const void *entry,
 			 const struct xt_entry_match *match)
{
 	const struct xt_mp2t_mtinfo *info = (const void *)(match->data);

 	/* We need to handle --name, --drop-detect, and --max-streams. */
 	if (info->flags & XT_MP2T_PARAM_NAME)
 		printf("--name \"%s\" ",  info->rule_name);

 	if (!(info->flags & XT_MP2T_DETECT_DROP))
 		printf("! --drop-detect ");

 	if (info->flags & XT_MP2T_MAX_STREAMS)
 		printf("--max-streams %u ", info->cfg.max);

}

static struct xtables_match mp2t_mt_reg = {
 	.version        = XTABLES_VERSION,
 	.name           = "mp2t",
 	.revision       = 0,
 	.family         = PF_UNSPEC,
 	.size           = XT_ALIGN(sizeof(struct xt_mp2t_mtinfo)),
 	.userspacesize  = offsetof(struct xt_mp2t_mtinfo, hinfo),
 	.init           = mp2t_mt_init,
 	.help           = mp2t_mt_help,
 	.parse          = mp2t_mt_parse,
/*	.final_check    = mp2t_mt_check,*/
 	.print          = mp2t_mt_print,
 	.save           = mp2t_mt_save,
 	.extra_opts     = mp2t_mt_opts,
};

static void _init(void)
{
 	xtables_register_match(&mp2t_mt_reg);
}
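
The commented-out "C-style" variant in the `--max-streams` case calls `strtoul()` without checking the result. As a hedged sketch of what a checked version could look like, the helper below (`parse_u32` is a hypothetical name, not part of xtables) rejects empty strings, signs, trailing garbage and values that do not fit in 32 bits. It is an illustration only; the posted code uses `xtables_strtoui()` for this.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the whole string as an unsigned 32-bit value. */
static bool parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long v;

	/* strtoul() silently accepts a minus sign, so reject it up front */
	if (s == NULL || *s == '\0' || strchr(s, '-') != NULL)
		return false;

	errno = 0;
	v = strtoul(s, &end, 0);	/* base 0: decimal, 0x hex, 0 octal */
	if (errno != 0 || end == s || *end != '\0' || v > UINT32_MAX)
		return false;

	*out = (uint32_t)v;
	return true;
}
```

A caller would report a parameter problem whenever `parse_u32()` returns false, in the same place where the existing code checks the return value of `xtables_strtoui()`.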


* [RFC 2/3] MPEG2/TS drop analyzer file: xt_mp2t.c
From: Jesper Dangaard Brouer @ 2010-10-19 14:26 UTC (permalink / raw)
  To: Netfilter Developers; +Cc: paulmck, Eric Dumazet, netdev
In-Reply-To: <Pine.LNX.4.64.1010191608080.18708@ask.diku.dk>

/*
  * MPEG2 TS match extension "mp2t" for Xtables.
  *
  * This module analyses the contents of MPEG2 Transport Stream (TS)
  * packets, and can detect TS/CC packet drops.
  *
  * Copyright (c) Jesper Dangaard Brouer <jdb@comx.dk>, 2009+
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License; either
  * version 2 of the License, or any later version, as published by the
  * Free Software Foundation.
  *
  */

#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/version.h>
#include <linux/netfilter/x_tables.h>

#include <linux/rculist.h>

#include "xt_mp2t.h"
#include "compat_xtables.h"

#include <linux/netdevice.h> /* msg levels */

/* Proc file related */
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* Timestamp related */
#include <linux/time.h>

MODULE_AUTHOR("Jesper Dangaard Brouer <jdb@comx.dk>");
MODULE_DESCRIPTION("Detecting packet drops in MPEG2 Transport Streams (TS)");
MODULE_LICENSE("GPL");
MODULE_VERSION(XT_MODULE_VERSION);
MODULE_ALIAS("ipt_mp2t");
MODULE_ALIAS("ipt_mpeg2ts");

/* Proc related */
static struct proc_dir_entry *mp2t_procdir;
static const struct file_operations dl_file_ops;

/* Message level instrumentation based upon the device driver message
  * levels see include/linux/netdevice.h.
  *
  * Note that "msg_level" is runtime adjustable via:
  *  /sys/module/xt_mp2t/parameters/msg_level
  *
  */
#define NETIF_MSG_DEBUG  0x10000

/* Performance tuning instrumentation that can be compiled out */
/* #define PERFTUNE 1 */
#define PERFTUNE 0

#if 1
#define MP2T_MSG_DEFAULT						\
 	(NETIF_MSG_DRV   | NETIF_MSG_PROBE  | NETIF_MSG_LINK |		\
 	 NETIF_MSG_IFUP  | NETIF_MSG_IFDOWN |				\
 	 NETIF_MSG_DEBUG | NETIF_MSG_RX_ERR | NETIF_MSG_RX_STATUS	\
 	)
#else
#define MP2T_MSG_DEFAULT						\
 	(NETIF_MSG_DRV    | NETIF_MSG_PROBE  | NETIF_MSG_LINK |		\
 	 NETIF_MSG_IFUP   | NETIF_MSG_IFDOWN |				\
 	 NETIF_MSG_RX_ERR						\
 	)
#endif

static int debug  = -1;
static int msg_level;
module_param(debug, int, 0);
module_param(msg_level, int, 0664);
MODULE_PARM_DESC(debug, "Set low N bits of message level");
MODULE_PARM_DESC(msg_level, "Message level bit mask");

/* Possibility to compile out print statements, this was used when
  * profiling the code.
  */
/* #define NO_MSG_CODE 1 */
/* #undef DEBUG */
/* #define DEBUG 1 */

#ifdef NO_MSG_CODE
#undef DEBUG
#endif

#ifdef DEBUG
#define msg_dbg(TYPE, f, a...)						\
 	do {	if (msg_level & NETIF_MSG_##TYPE)			\
 			if (net_ratelimit())				\
 				printk(KERN_DEBUG PFX f "\n", ## a);	\
 	} while (0)
#else
#define msg_dbg(TYPE, f, a...)
#endif

#ifdef NO_MSG_CODE
#define msg_info(TYPE, f, a...)
#else
#define msg_info(TYPE, f, a...)						\
 	do {	if (msg_level & NETIF_MSG_##TYPE)			\
 			if (net_ratelimit())				\
 				printk(KERN_INFO PFX f "\n", ## a);	\
 	} while (0)
#endif

#ifdef NO_MSG_CODE
#define msg_notice(TYPE, f, a...)
#else
#define msg_notice(TYPE, f, a...)					\
 	do {	if (msg_level & NETIF_MSG_##TYPE)			\
 			if (net_ratelimit())				\
 				printk(KERN_NOTICE PFX f "\n", ## a);	\
 	} while (0)
#endif

#ifdef NO_MSG_CODE
#define msg_warn(TYPE, f, a...)
#else
#define msg_warn(TYPE, f, a...)						\
 	do {	if (msg_level & NETIF_MSG_##TYPE)			\
 			if (net_ratelimit())				\
 				printk(KERN_WARNING PFX f "\n", ## a);	\
 	} while (0)
#endif


#ifdef NO_MSG_CODE
#define msg_err(TYPE, f, a...)
#else
#define msg_err(TYPE, f, a...)						\
 	do {	if (msg_level & NETIF_MSG_##TYPE)			\
 			if (net_ratelimit())				\
 				printk(KERN_ERR PFX f "\n", ## a);	\
 	} while (0)
#endif


/*** Defines from Wireshark packet-mp2t.c ***/
#define MP2T_PACKET_SIZE 188
#define MP2T_SYNC_BYTE 0x47

#define MP2T_SYNC_BYTE_MASK	0xFF000000
#define MP2T_TEI_MASK		0x00800000
#define MP2T_PUSI_MASK		0x00400000
#define MP2T_TP_MASK		0x00200000
#define MP2T_PID_MASK		0x001FFF00
#define MP2T_TSC_MASK		0x000000C0
#define MP2T_AFC_MASK		0x00000030
#define MP2T_CC_MASK		0x0000000F

#define MP2T_SYNC_BYTE_SHIFT	24
#define MP2T_TEI_SHIFT		23
#define MP2T_PUSI_SHIFT		22
#define MP2T_TP_SHIFT		21
#define MP2T_PID_SHIFT		8
#define MP2T_TSC_SHIFT		6
#define MP2T_AFC_SHIFT		4
#define MP2T_CC_SHIFT		0
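
The masks and shifts above operate on the first four bytes of a TS packet read as one big-endian 32-bit word. The sketch below shows how they could be applied in userspace; `struct ts_hdr` and `ts_parse_header` are hypothetical names for illustration and do not appear in the module.

```c
#include <assert.h>
#include <stdint.h>

#define MP2T_SYNC_BYTE		0x47
#define MP2T_SYNC_BYTE_MASK	0xFF000000
#define MP2T_PUSI_MASK		0x00400000
#define MP2T_PID_MASK		0x001FFF00
#define MP2T_AFC_MASK		0x00000030
#define MP2T_CC_MASK		0x0000000F

#define MP2T_SYNC_BYTE_SHIFT	24
#define MP2T_PUSI_SHIFT		22
#define MP2T_PID_SHIFT		8
#define MP2T_AFC_SHIFT		4
#define MP2T_CC_SHIFT		0

/* Hypothetical container for the decoded header fields. */
struct ts_hdr {
	uint8_t  sync;
	uint8_t  pusi;
	uint16_t pid;
	uint8_t  afc;
	uint8_t  cc;
};

/* Decode the 4-byte TS header; returns 0 on success, -1 on a bad sync byte. */
static int ts_parse_header(const uint8_t *p, struct ts_hdr *h)
{
	/* Assemble the header as a big-endian word, independent of host order */
	uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		     ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

	h->sync = (w & MP2T_SYNC_BYTE_MASK) >> MP2T_SYNC_BYTE_SHIFT;
	h->pusi = (w & MP2T_PUSI_MASK) >> MP2T_PUSI_SHIFT;
	h->pid  = (w & MP2T_PID_MASK) >> MP2T_PID_SHIFT;
	h->afc  = (w & MP2T_AFC_MASK) >> MP2T_AFC_SHIFT;
	h->cc   = (w & MP2T_CC_MASK) >> MP2T_CC_SHIFT;

	return h->sync == MP2T_SYNC_BYTE ? 0 : -1;
}
```

For the header bytes `47 41 00 1A` this yields PUSI 1, PID 0x100, AFC 1 (payload only) and CC 10.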

/** WIRESHARK CODE COPY-PASTE
  *
  * Wireshark value_string structures
  * typedef struct _value_string {
  *	u32	   value;
  *	const char *strptr;
  * } value_string;
  *
  * Adaptation field values "doc" taken from Wireshark
  * static const value_string mp2t_afc_vals[] = {
  *	{ 0, "Reserved" },
  *	{ 1, "Payload only" },
  *	{ 2, "Adaptation Field only" },
  *	{ 3, "Adaptation Field and Payload" },
  *	{ 0, NULL }
  * };
  *
  * WIRESHARK Data structure used for detecting CC drops
  *
  *  conversation
  *    |
  *    +-> mp2t_analysis_data
  *          |
  *          +-> pid_table (RB tree)
  *          |     |
  *          |     +-> pid_analysis_data (per pid)
  *          |     +-> pid_analysis_data
  *          |     +-> pid_analysis_data
  *          |
  *          +-> frame_table (RB tree)
  *                |
  *                +-> frame_analysis_data (only created if drop detected)
  *                      |
  *                      +-> ts_table (RB tree)
  *                            |
  *                            +-> pid_analysis_data (per TS subframe)
  *                            +-> pid_analysis_data

  * Datastructures:
  * ---------------
  *
  * xt_rule_mp2t_conn_htable (per iptables rule)
  *    metadata
  *    locking: RCU
  *    hash[metadata.cfg.size]
  *          |
  *          +-> lists of type mp2t_stream elements
  *
  *
  * mp2t_stream (per multicast/mpeg2-ts stream)
  *     stats (about skips and discontinuities)
  *     locking: Spinlock
  *     pid_cc_table (normal list)
  *       |
  *       +-> list of type pid_data_t
  *           One per PID representing the last TS frames CC value
  *
  *
  **/

/*** Global defines ***/
static DEFINE_SPINLOCK(mp2t_lock); /* Protects conn_htables list */
static LIST_HEAD(conn_htables);    /* List of xt_rule_mp2t_conn_htable's */
static u_int32_t GLOBAL_ID;	   /* Used for assigning rule_id's */
/* TODO/FIXME: xt_hashlimit has this extra mutex, do I need it?
static DEFINE_MUTEX(mp2t_mutex);*/ /* Additional checkentry protection */


/* This is sort of the last TS frames info per pid */
struct pid_data_t {
 	struct list_head list;
 	int16_t pid;
 	int16_t cc_prev;
};
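
Since `cc_prev` holds the last continuity counter (CC) seen for a PID, a drop shows up as a gap in the 4-bit counter. The function below is a simplified sketch of that idea, under stated assumptions: `cc_skips` is a hypothetical name, and the handling of duplicate packets and adaptation-field-only packets that a real analyzer needs is deliberately left out.

```c
#include <assert.h>

/*
 * Number of TS packets missing between two CC values of the same PID.
 * cc_prev < 0 means no packet has been seen yet (as with cc_prev = -1).
 * Assumption: every packet carries payload, so CC advances by one each time.
 */
static unsigned int cc_skips(int cc_prev, int cc_now)
{
	if (cc_prev < 0)
		return 0;
	/* The counter wraps modulo 16; the unsigned cast keeps the wrap defined */
	return (unsigned int)(cc_now - cc_prev - 1) & 0x0Fu;
}
```

Because the counter is only 4 bits wide, a loss of 16 consecutive packets of one PID is indistinguishable from no loss at all with this method.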

#define MAX_PID 0x1FFF

/** Hash table stuff **/

/* Data to match a stream / connection */
struct mp2t_stream_match { /* Like xt_hashlimit: dsthash_dst */
 	union {
 		struct {
 			__be32 dst; /* MC addr first */
 			__be32 src;
 		} ip;
 	};
 	__be16 dst_port;
 	__be16 src_port;
};

/* Hash entry with info about the mp2t stream / connection */
struct mp2t_stream { /* Like xt_hashlimit: dsthash_ent */
 	/* Place static / read-only parts in the beginning */
 	struct hlist_node node;
 	struct mp2t_stream_match match;

 	/* Place modified structure members in the end */
 	/* FIXME: Add spacing in struct for cache alignment */

 	/* Per stream total skips and discontinuity */
 	/* TODO: Explain difference between skips and discontinuity */
 	u64 skips;
 	u64 discontinuity;

 	/* lock for writing/changing/updating */
 	spinlock_t lock;

 	/* Usage counter to protect against dealloc/kfree */
 	atomic_t use;

 	/* PID list with last CC value */
 	struct list_head pid_list;
 	int pid_list_len;

 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
};


/* This is basically our "stream" connection tracking.
  *
  * Keeping track of the MPEG2 streams per iptables rule.
  * There is one hash-table per iptables rule.
  * (Based on xt_hashlimit).
  */
struct xt_rule_mp2t_conn_htable {

 	/* A global list containing these elements is needed: (1) to
 	 * avoid realloc of our data structures when other rules get
 	 * inserted, and (2) to provide stats via /proc/, as data must
 	 * not be deallocated while a process reads data from /proc.
 	 */
 	struct list_head list;		/* global list of all htables */
 	atomic_t use;			/* reference counting  */
 	u_int32_t id;			/* id corresponding to rule_id */
 	/* u_int8_t family; */ /* needed for IPv6 support */

 	/* "cfg" is also defined here as the real hash array size might
 	 * differ from the user defined size, and changing the
 	 * userspace defined rule data is not allowed as userspace
 	 * then cannot match the rule again for deletion */
 	struct mp2t_cfg cfg;		/* config */

 	/* Used internally */
 	spinlock_t lock;		/* write lock for hlist_head */
 	u_int32_t rnd;			/* random seed for hash */
 	int rnd_initialized;
 	unsigned int count;		/* number entries in table */
 	u_int16_t warn_condition;	/* limiting warn printouts */

 	/* Rule creation time can be used by userspace to 1) determine
 	 * the running period and 2) to detect if the rule has been
 	 * flushed between two reads.
 	 */
 	struct timespec time_created;

 	/*TODO: Implement timer GC cleanup, to detect streams disappearing
 	  struct timer_list timer;*/	/* timer for gc */

 	/* Instrumentation for perf tuning */
 	int32_t max_list_search;	/* Longest search in a hash list */
 	atomic_t concurrency_cnt;	/* Trying to detect concurrency */
 	int32_t stream_not_found;	/* Number of streams created */

 	/* Proc seq_file entry */
 	struct proc_dir_entry *pde;

 	struct hlist_head stream_hash[0];/* conn/stream hashtable
 					  * struct mp2t_stream elements */
};

/* Inspired by xt_hashlimit.c : htable_create() */
static bool
mp2t_htable_create(struct xt_mp2t_mtinfo *minfo)
{
 	struct xt_rule_mp2t_conn_htable *hinfo;
 	unsigned int hash_buckets;
 	unsigned int hash_struct_sz;
 	char rule_name[IFNAMSIZ+5];
 	unsigned int i;
 	u_int32_t id;
 	size_t size;

 	/* Q: is lock with mp2t_lock necessary */
 	spin_lock(&mp2t_lock);
 	id = GLOBAL_ID++;
 	spin_unlock(&mp2t_lock);

 	if (minfo->cfg.size)
 		hash_buckets = minfo->cfg.size;
 	else
 		hash_buckets = 100;

 	hash_struct_sz = sizeof(*minfo->hinfo); /* metadata struct size */
 	size = hash_struct_sz + sizeof(struct hlist_head) * hash_buckets;

 	msg_info(IFUP, "Alloc htable(%d) %d bytes elems:%d metadata:%d bytes",
 		 id, (int)size, hash_buckets, hash_struct_sz);

 	hinfo = kzalloc(size, GFP_ATOMIC);
 	if (hinfo == NULL) {
 		msg_err(DRV, "unable to create hashtable(%d), out of memory!",
 			id);
 		return false;
 	}
 	minfo->hinfo = hinfo;

 	/* Copy match config into hashtable config */
 	memcpy(&hinfo->cfg, &minfo->cfg, sizeof(hinfo->cfg));
 	hinfo->cfg.size = hash_buckets;

 	/* Max number of connection we want to track */
 	/* TODO: REMOVE code
 	if (minfo->cfg.max == 0)
 		hinfo->cfg.max = 8 * hinfo->cfg.size;
 	else if (hinfo->cfg.max < hinfo->cfg.size)
 		hinfo->cfg.max = hinfo->cfg.size;
 	*/

 	if (hinfo->cfg.max_list == 0)
 		hinfo->cfg.max_list = 20;

 	/* Init the hash buckets */
 	for (i = 0; i < hinfo->cfg.size; i++)
 		INIT_HLIST_HEAD(&hinfo->stream_hash[i]);

 	/* Refcnt to allow alloc data to survive between rule updates*/
 	atomic_set(&hinfo->use, 1);
 	hinfo->id = id;

 	INIT_LIST_HEAD(&hinfo->list);
 	/*
 	spin_lock(&mp2t_lock);
 	list_add_tail(&conn_htables, &hinfo->list);
 	spin_unlock(&mp2t_lock);
 	*/

 	hinfo->count = 0;
 	hinfo->rnd_initialized = 0;
 	hinfo->max_list_search = 0;
 	atomic_set(&hinfo->concurrency_cnt, 0);
 	hinfo->stream_not_found = 0;

 	getnstimeofday(&hinfo->time_created);

 	/* Generate a rule_name for proc if none given */
 	if (!minfo->rule_name || !strlen(minfo->rule_name))
 		snprintf(rule_name, IFNAMSIZ+5, "rule_%d", hinfo->id);
 	else
 		/* FIXME: Check for duplicate names! */
 		snprintf(rule_name, IFNAMSIZ+5, "rule_%s", minfo->rule_name);

 	/* Create proc entry */
 	hinfo->pde = proc_create_data(rule_name, 0, mp2t_procdir,
 				      &dl_file_ops, hinfo);

#ifdef CONFIG_PROC_FS
 	if (!hinfo->pde) {
 		msg_err(PROBE, "Cannot create proc file named: %s",
 			minfo->rule_name);
 		kfree(hinfo);
 		return false;
 	}
#endif

 	spin_lock_init(&hinfo->lock);

 	return true;
}

static u_int32_t
hash_match(const struct xt_rule_mp2t_conn_htable *ht,
 	   const struct mp2t_stream_match *match)
{
 	u_int32_t hash = jhash2((const u32 *)match,
 				sizeof(*match)/sizeof(u32),
 				ht->rnd);
 	/*
 	 * Instead of returning hash % ht->cfg.size (implying a divide)
 	 * we return the high 32 bits of the (hash * ht->cfg.size) that will
 	 * give results between [0 and cfg.size-1] and same hash distribution,
 	 * but using a multiply, less expensive than a divide
 	 */
 	return ((u64)hash * ht->cfg.size) >> 32;
}
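
The multiply-and-shift trick described in the comment of `hash_match()` can be checked in isolation. The sketch below uses a hypothetical helper name, `reduce_range`, and maps a 32-bit hash onto `[0, size - 1]` without a division.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash onto [0, size - 1] using a multiply instead of a modulo. */
static uint32_t reduce_range(uint32_t hash, uint32_t size)
{
	/* The 64-bit product is below size * 2^32, so the high word is < size */
	return (uint32_t)(((uint64_t)hash * size) >> 32);
}
```

Unlike `hash % size`, the result is driven by the high bits of the hash, so the two methods place a given hash in different buckets even though both spread uniformly distributed hashes evenly.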

static inline
bool match_cmp(const struct mp2t_stream *ent,
 			     const struct mp2t_stream_match *b)
{
 	return !memcmp(&ent->match, b, sizeof(ent->match));
}

static struct mp2t_stream *
mp2t_stream_find(struct xt_rule_mp2t_conn_htable *ht,
 		 const struct mp2t_stream_match *match)
{
 	struct mp2t_stream *entry;
 	struct hlist_node  *pos;
 	u_int32_t hash;
 	int cnt = 0;

#if PERFTUNE
 	int parallel = 0;
 	static int limit;

 	/* rcu_read_lock(); // Taken earlier */
 	parallel = atomic_inc_return(&ht->concurrency_cnt);
#endif
 	hash = hash_match(ht, match);

 	if (!hlist_empty(&ht->stream_hash[hash])) {
 		/* The hlist_for_each_entry_rcu macro uses the
 		 * appropriate rcu_dereference() to access the
 		 * mp2t_stream pointer */
 		hlist_for_each_entry_rcu(entry, pos,
 				     &ht->stream_hash[hash], node) {
 			cnt++;
 			if (match_cmp(entry, match))
 				goto found;
 		}
 	}

 	/* rcu_read_unlock(); // Released later */
#if PERFTUNE
 	atomic_dec(&ht->concurrency_cnt);
#endif
 	ht->stream_not_found++; /* This is racy, but it's only a debug var */
 	return NULL;

found:
 	if (unlikely(cnt > ht->cfg.max_list) &&
 	    unlikely(cnt > ht->max_list_search)) {
 		ht->max_list_search = cnt;
 		msg_warn(PROBE, "Perf: Long list search %d in stream_hash[%u]",
 			 cnt, hash);
 	}

#if PERFTUNE
 	atomic_dec(&ht->concurrency_cnt);

 	if (parallel > 2 && (limit++ % 100 == 0))
 		msg_info(PROBE, "Did it in parallel, concurrency count:%d",
 			 parallel);
#endif

 	return entry;
}

static struct pid_data_t *
mp2t_pid_find(struct mp2t_stream *stream, const int16_t pid)
{
 	struct pid_data_t *entry;

 	list_for_each_entry(entry, &stream->pid_list, list) {
 		if (entry->pid == pid)
 			return entry;
 	}
 	return NULL;
}

static struct pid_data_t *
mp2t_pid_create(struct mp2t_stream *stream, const int16_t pid)
{
 	struct pid_data_t *entry;

 	entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
 	if (!entry) {
 		msg_err(DRV, "can't allocate new pid list entry");
 		return NULL;
 	}
 	entry->pid     = pid;
 	entry->cc_prev = -1;

 	stream->pid_list_len++;

 	list_add_tail(&entry->list, &stream->pid_list);

 	return entry;
}

static int
mp2t_pid_destroy_list(struct mp2t_stream *stream)
{
 	struct pid_data_t *entry, *n;

 	msg_dbg(PROBE, "Cleaning up pid list with %d elements",
 		stream->pid_list_len);

 	list_for_each_entry_safe(entry, n, &stream->pid_list, list) {
 		stream->pid_list_len--;
 		kfree(entry);
 	}
 	WARN_ON(stream->pid_list_len != 0);
 	return stream->pid_list_len;
}

static struct mp2t_stream *
mp2t_stream_alloc_init(struct xt_rule_mp2t_conn_htable *ht,
 		       const struct mp2t_stream_match *match)
{
 	struct mp2t_stream *entry; /* hashtable entry */
 	unsigned int entry_sz;
 	size_t size;
 	u_int32_t hash;

 	/* initialize hash with random val at the time we allocate
 	 * the first hashtable entry */
 	if (unlikely(!ht->rnd_initialized)) {
 		spin_lock_bh(&ht->lock);
 		if (unlikely(!ht->rnd_initialized)) {
 			get_random_bytes(&ht->rnd, 4);
 			ht->rnd_initialized = 1;
 		}
 		spin_unlock_bh(&ht->lock);
 	}

 	/* DoS protection / embedded feature: bound the size of the
 	 * hash table lists by limiting the number of streams the
 	 * module is willing to track.  This limit is configurable
 	 * from userspace.  Can also be useful on small CPU/memory
 	 * systems. */
 	if (ht->cfg.max && ht->count >= ht->cfg.max) {
 		if (unlikely(ht->warn_condition < 10)) {
 			ht->warn_condition++;
 			msg_warn(RX_ERR,
 			 "Rule[%d]: "
 			 "Stopped tracking streams, max %u exceeded (%u) "
 			 "(Max can be adjusted via --max-streams param)",
 			 ht->id, ht->cfg.max, ht->count);
 		}
 		return NULL;
 	}

 	/* Calc the hash value */
 	hash = hash_match(ht, match);

 	/* Allocate new stream element */
 	/* entry = kmem_cache_alloc(hashlimit_cachep, GFP_ATOMIC); */
 	size = entry_sz = sizeof(*entry);
 	/* msg_info(IFUP, "Alloc new stream entry (%d bytes)", entry_sz); */

 	entry = kzalloc(entry_sz, GFP_ATOMIC);
 	if (!entry) {
 		msg_err(DRV, "can't allocate new stream elem");
 		return NULL;
 	}
 	memcpy(&entry->match, match, sizeof(entry->match));

 	spin_lock_init(&entry->lock);
 	atomic_set(&entry->use, 1);

 	/* Init the pid table list */
 	INIT_LIST_HEAD(&entry->pid_list);
 	entry->pid_list_len = 0;

 	/* init the RCU callback structure needed by call_rcu() */
 	INIT_RCU_HEAD(&entry->rcu_head);

 	/* Q Locking: Adding and deleting elements from the
 	 * stream_hash[] lists is protected by the spinlock ht->lock.
 	 * Should we only use try lock and exit if we cannot get it???
 	 * I'm worried about what happens if we are waiting for the
 	 * lock held by xt_mp2t_mt_destroy() which will dealloc ht
 	 */
 	spin_lock_bh(&ht->lock);
 	hlist_add_head_rcu(&entry->node, &ht->stream_hash[hash]);
 	ht->count++; /* Convert to atomic? It's write protected by ht->lock */
 	spin_unlock_bh(&ht->lock);

 	return entry;
}

/*
  * The return type of xt_mp2t_mt_check() changed between kernel
  *  versions, which is quite confusing as the return logic gets
  *  turned around.
  *
  *  TODO: Think the change happened in 2.6.35, need to check the
  *  exact kernel version this changed in!
  */
#if LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 34)
enum RETURNVALS { error = 0 /*false*/, success = 1 /*true*/, };
#endif
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 35)
enum RETURNVALS { error = -EINVAL, success = 0, };
#endif

static int
xt_mp2t_mt_check(const struct xt_mtchk_param *par)
{
 	struct xt_mp2t_mtinfo *info = par->matchinfo;

 	/*
 	if (info->flags & ~XT_MP2T_DETECT_DROP)
 		return false;
 	*/

 	/* Debugging, this should not be possible */
 	if (!info) {
 		msg_err(DRV, "ERROR info is NULL");
 		return error;
 	}

 	/* Debugging, this should not be possible */
 	if (IS_ERR_VALUE((unsigned long)(info->hinfo))) {
 		msg_err(DRV, "ERROR info->hinfo is an invalid pointer!!!");
 		return error;
 	}

 	/* TODO/FIXME: Add a check to NOT allow proc files with same
 	 * name in /proc/net/xt_mp2t/rule_%s */


 	/* TODO: Write about how, this preserves htable memory by
 	 * reuse of hinfo pointer and incrementing 'use' refcounter
 	 * assures that xt_mp2t_mt_destroy() will not call
 	 * conn_htable_destroy() thus not deallocating our memory */
 	if (info->hinfo != NULL) {
 		atomic_inc(&info->hinfo->use);
 		msg_info(DEBUG, "ReUsing info->hinfo ptr:[%p] htable id:%d",
 			 info->hinfo, info->hinfo->id);
 		return success;
 	}

 	if (mp2t_htable_create(info) == false) {
 		msg_err(DRV, "Error creating hash table");
 		return error;
 	}

 	return success;
}

static void
mp2t_stream_free(struct rcu_head *head)
{
 	struct mp2t_stream *stream;

 	stream = container_of(head, struct mp2t_stream, rcu_head);

 	/* Debugging check */
 	if (unlikely(!stream))
 		printk(KERN_CRIT PFX
 		       "Free BUG: Stream ptr is NULL (tell:jdb@comx.dk)\n");

 	/* Deallocate the PID list */
 	spin_lock_bh(&stream->lock);
 	mp2t_pid_destroy_list(stream);
 	spin_unlock_bh(&stream->lock);

 	/* Before free, check the 'use' reference counter */
 	if (atomic_dec_and_test(&stream->use)) {
 		kfree(stream);
 	} else {
 		/* If this can occur, we should schedule something
 		 * that can clean up */
 		printk(KERN_CRIT PFX
 		       "Free BUG: Stream still in use! (tell:jdb@comx.dk)\n");
 	}
}

static void
conn_htable_destroy(struct xt_rule_mp2t_conn_htable *ht)
{
 	unsigned int i;

 	/* Remove proc entry */
 	remove_proc_entry(ht->pde->name, mp2t_procdir);

 	msg_info(IFDOWN, "Destroy stream elements (%d count) in htable(%d)",
 		 ht->count, ht->id);
 	msg_dbg(IFDOWN, "Find stream, not found %d times",
 		ht->stream_not_found);

 	/* lock hash table and iterate over it to release all elements */
 	spin_lock(&ht->lock);
 	for (i = 0; i < ht->cfg.size; i++) {
 		struct mp2t_stream *stream;
 		struct hlist_node *pos, *n;
 		hlist_for_each_entry_safe(stream, pos, n,
 					  &ht->stream_hash[i], node) {

 			hlist_del_rcu(&stream->node);
 			ht->count--;

 			/* Have to use call_rcu(), because we cannot
 			   use synchronize_rcu() here, because we are
 			   holding a spinlock, or else we will get a
 			   "scheduling while atomic" bug.
 			*/
 			call_rcu_bh(&stream->rcu_head, mp2t_stream_free);
 		}
 	}
 	spin_unlock(&ht->lock);

 	msg_info(IFDOWN,
 		 "Free htable(%d) (%d buckets) longest list search %d",
 		 ht->id, ht->cfg.size, ht->max_list_search);

 	if (ht->count != 0)
 		printk(KERN_CRIT PFX
 		       "Free BUG: ht->count != 0 (tell:jdb@comx.dk)\n");

 	kfree(ht);
}


/*
  * Keeping dynamically allocated memory when the rulesets are swapped.
  *
  * Iptables rule updates work by replacing the entire ruleset.  Our
  * dynamically allocated data (per rule) needs to survive this update,
  * BUT only if our rule has not been removed.  This is achieved by
  * having a reference counter.  The reason it works is that during
  * swapping of rulesets, the checkentry function (xt_mp2t_mt_check)
  * is called on the new ruleset _before_ calling the destroy function
  * (xt_mp2t_mt_destroy) on the old ruleset.  During checkentry, we
  * increment the reference counter on data if we can find the data
  * associated with this rule.
  *
  * Functions used to achieve this are:
  *   conn_htable_get() - Find data and increment refcnt
  *   conn_htable_put() - Finished using data, delete if last user
  *   conn_htable_add() - Add data to the global searchable list
  */

static struct xt_rule_mp2t_conn_htable*
conn_htable_get(u32 rule_id)
{
 	struct xt_rule_mp2t_conn_htable *hinfo;

 	spin_lock_bh(&mp2t_lock);
 	list_for_each_entry(hinfo, &conn_htables, list) {
 		if (hinfo->id == rule_id) {
 			atomic_inc(&hinfo->use);
 			spin_unlock_bh(&mp2t_lock);
 			return hinfo;
 		}
 	}
 	spin_unlock_bh(&mp2t_lock);
 	return NULL;
}

static void
conn_htable_put(struct xt_rule_mp2t_conn_htable *hinfo)
{
 	/* Finished using element, delete if last user */
 	if (atomic_dec_and_test(&hinfo->use)) {
 		spin_lock_bh(&mp2t_lock);
 		list_del(&hinfo->list);
 		spin_unlock_bh(&mp2t_lock);
 		conn_htable_destroy(hinfo);
 	}
}

static void
conn_htable_add(struct xt_rule_mp2t_conn_htable *hinfo)
{
 	spin_lock_bh(&mp2t_lock);
 	/* list_add_tail(new, head): the new entry comes first */
 	list_add_tail(&hinfo->list, &conn_htables);
 	spin_unlock_bh(&mp2t_lock);
}

static void
xt_mp2t_mt_destroy(const struct xt_mtdtor_param *par)
{
 	const struct xt_mp2t_mtinfo *info = par->matchinfo;
 	struct xt_rule_mp2t_conn_htable *hinfo;
 	hinfo = info->hinfo;

 	/* Calls only destroy if refcnt is zero */
 	if (atomic_dec_and_test(&hinfo->use))
 		conn_htable_destroy(hinfo);
}


/* Calc the number of skipped CC numbers. Note that this can easily
  * overflow, and a value above 7 indicates that several network
  * packets could be lost.
  */
static inline unsigned int
calc_skips(unsigned int curr, unsigned int prev)
{
 	int res = 0;

 	/* Only count the missing TS frames in between prev and curr.
 	 * The "prev" frame CC number seen is confirmed received; it's
 	 * the next frame's CC counter which is the first known missing
 	 * TS frame.
 	 */
 	prev += 1;

 	/* Calc missing TS frame 'skips' */
 	res = curr - prev;

 	/* Handle wrap around */
 	if (res < 0)
 		res += 16;

 	return res;
}

/* Return the number of skipped CC numbers */
static int
detect_cc_drops(struct pid_data_t *pid_data, int8_t cc_curr,
 		const struct sk_buff *skb)
{
 	int8_t cc_prev;
 	int skips = 0;

 	cc_prev           = pid_data->cc_prev;
 	pid_data->cc_prev = cc_curr;

 	/* Null packets always have a CC value equal to 0 */
 	if (pid_data->pid == 0x1fff)
 		return 0;

 	/* FIXME: Handle adaptation fields and remove this code */
 	/* It's allowed that (cc_prev == cc_curr) if it's an adaptation
 	 * field.
 	 */
 	if (cc_prev == cc_curr)
 		return 0;

 	/* Have not seen this pid before */
 	if (cc_prev == -1)
 		return 0;

 	/* Detect if CC is not increasing by one all the time */
 	if (cc_curr != ((cc_prev+1) & MP2T_CC_MASK)) {
 		skips = calc_skips(cc_curr, cc_prev);

 		msg_info(RX_STATUS,
 			 "Detected drop pid:%d CC curr:%d prev:%d skips:%d",
 			 pid_data->pid, cc_curr, cc_prev, skips);

 		/* TODO: Do accounting per PID ?
 		pid_data->cc_skips += skips;
 		pid_data->cc_err++;
 		*/
 	}

 	return skips;
}


static int
dissect_tsp(unsigned char *payload_ptr, u16 payload_len,
 	    const struct sk_buff *skb, struct mp2t_stream *stream)
{
 	u32 header;
 	u16 pid;
 	u8 afc;
 	int8_t cc_curr;
 	int skips = 0;
 	struct pid_data_t *pid_data;

 	/* Process header: ntohl() takes big-endian wire data and
 	 * returns a host-order value */
 	header  = ntohl(*(__be32 *)payload_ptr);
 	pid     = (header & MP2T_PID_MASK) >> MP2T_PID_SHIFT;
 	afc     = (header & MP2T_AFC_MASK) >> MP2T_AFC_SHIFT;
 	cc_curr = (header & MP2T_CC_MASK)  >> MP2T_CC_SHIFT;

 	msg_dbg(PKTDATA, "TS header:0x%X pid:%d cc:%d afc:%d",
 		header, pid, cc_curr, afc);

 	/* Adaptation Field Control header */
 	if (unlikely(afc == 2)) {
 		/* An 'adaptation field only' packet will have the
 		 * same CC value as the previous payload packet. */
 		return 0;
 		/* TODO: Add parsing of Adaptation headers. The PCR
 		 * counter is hidden here...*/
 	}

 	pid_data = mp2t_pid_find(stream, pid);
 	if (!pid_data) {
 		pid_data = mp2t_pid_create(stream, pid);
 		if (!pid_data)
 			return 0;
 	}


 	skips = detect_cc_drops(pid_data, cc_curr, skb);

 	return skips;
}


static int
dissect_mp2t(unsigned char *payload_ptr, u16 payload_len,
 	     const struct sk_buff *skb, const struct udphdr *uh,
 	     const struct xt_mp2t_mtinfo *info)
{
 	u16 offset = 0;
 	int skips  = 0;
 	int skips_total = 0;
 	int discontinuity = 0;
 	const struct iphdr *iph = ip_hdr(skb);

 	struct mp2t_stream     *stream; /* "Connection" */
 	struct mp2t_stream_match match;

 	struct xt_rule_mp2t_conn_htable *hinfo;
 	hinfo = info->hinfo;

 	/** Lookup stream data structures **/

 	/* Fill in the match struct */
 	memset(&match, 0, sizeof(match)); /* Worried about struct padding */
 	match.ip.src = iph->saddr;
 	match.ip.dst = iph->daddr;
 	match.src_port = uh->source;
 	match.dst_port = uh->dest;

 	/* spin_lock_bh(&hinfo->lock); // Replaced by RCU */
 	rcu_read_lock_bh();

 	stream = mp2t_stream_find(hinfo, &match);
 	if (!stream) {
 		stream = mp2t_stream_alloc_init(hinfo, &match);
 		if (!stream) {
 			/* spin_unlock_bh(&hinfo->lock); // Replaced by RCU */
 			rcu_read_unlock_bh();
 			return 0;
 		}
 		/* msg_info(RX_STATUS, */
 		printk(KERN_INFO
 		       "Rule:%d New stream (%pI4 -> %pI4)\n",
 		       hinfo->id, &iph->saddr, &iph->daddr);
 	}

 	/** Process payload **/

 	spin_lock_bh(&stream->lock); /* Update lock for the stream */

 	/* Protect against dealloc (via atomic counter stream->use) */
 	if (!atomic_inc_not_zero(&stream->use)) {
 		/* If "use" is zero, then we are about to be freed */
 		spin_unlock_bh(&stream->lock); /* Update lock for the stream */
 		rcu_read_unlock_bh();
 		printk(KERN_CRIT PFX "Error atomic stream->use is zero\n");
 		return 0;
 	}

 	while ((payload_len - offset) >= MP2T_PACKET_SIZE) {

 		skips = dissect_tsp(payload_ptr, payload_len, skb, stream);

 		if (skips > 0)
 			discontinuity++;
 		/* TODO: if (skips > 7) signal_loss++; */
 		skips_total += skips;

 		offset +=  MP2T_PACKET_SIZE;
 		payload_ptr += MP2T_PACKET_SIZE;
 	}

 	if (discontinuity > 0) {
 		stream->skips         += skips_total;
 		stream->discontinuity += discontinuity;
 	}

 	atomic_dec(&stream->use); /* Protect against dealloc */
 	spin_unlock_bh(&stream->lock); /* Update lock for the stream */
 	rcu_read_unlock_bh();
 	/* spin_unlock_bh(&hinfo->lock); // Replaced by RCU */

 	/* Place print statement after the unlock section */
 	if (discontinuity > 0) {
 		msg_notice(RX_STATUS,
 			   "Detected discontinuity "
 			   "%pI4 -> %pI4 (CCerr:%d skips:%d)",
 			   &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr,
 			   discontinuity, skips_total);
 	}

 	return skips_total;
}


static bool
is_mp2t_packet(unsigned char *payload_ptr, u16 payload_len)
{
 	u16 offset = 0;

 	/* IDEA/TODO: Detect wrong/changing TS mappings */

 	/* Basic payload Transport Stream check */
 	if (payload_len % MP2T_PACKET_SIZE > 0) {
 		msg_dbg(PKTDATA, "Not a MPEG2 TS packet, wrong size");
 		return false;
 	}

 	/* Check for a sync byte in all TS frames */
 	while ((payload_len - offset) >= MP2T_PACKET_SIZE) {

 		if (payload_ptr[0] != MP2T_SYNC_BYTE) {
 			msg_dbg(PKTDATA, "Invalid MP2T packet skip!");
 			return false;
 		}
 		offset +=  MP2T_PACKET_SIZE;
 		payload_ptr += MP2T_PACKET_SIZE;
 	}
 	/* msg_dbg(PKTDATA, "True MP2T packet"); */

 	return true;
}


static bool
xt_mp2t_match(const struct sk_buff *skb, struct xt_action_param *par)
{
 	const struct xt_mp2t_mtinfo *info = par->matchinfo;
 	const struct iphdr *iph = ip_hdr(skb);
 	const struct udphdr *uh;
 	struct udphdr _udph;
 	__be32 saddr, daddr;
 	u16 ulen;
 	u16 hdr_size;
 	u16 payload_len;
 	unsigned char *payload_ptr;

 	bool res = false;
 	int skips = 0;

 	if (!(info->flags & XT_MP2T_DETECT_DROP)) {
 		msg_err(RX_ERR, "You told me to do nothing...?!");
 		return false;
 	}

 	/*
 	if (!pskb_may_pull((struct sk_buff *)skb, sizeof(struct udphdr)))
 		return false;
 	*/

 	saddr = iph->saddr;
 	daddr = iph->daddr;

 	/* Must not be a fragment. */
 	if (par->fragoff != 0) {
 		msg_warn(RX_ERR, "Skip cannot handle fragments "
 			 "(pkt from:%pI4 to:%pI4) len:%u datalen:%u"
 			 , &saddr, &daddr, skb->len, skb->data_len);
 		return false;
 	}

 	/* We need to walk through the payload data, and I don't want
 	 * to handle fragmented SKBs, the SKB has to be linearized */
 	if (skb_is_nonlinear(skb)) {
 		if (skb_linearize((struct sk_buff *)skb) != 0) {
 			msg_err(RX_ERR, "SKB linearization failed "
 				"(pkt from:%pI4 to:%pI4) len:%u datalen:%u",
 				&saddr, &daddr, skb->len, skb->data_len);
 			/* TODO: Should we just hotdrop it?
 			   *par->hotdrop = true;
 			*/
 			return false;
 		}
 	}

 	uh = skb_header_pointer(skb, par->thoff, sizeof(_udph), &_udph);
 	if (unlikely(uh == NULL)) {
 		/* Something is wrong, cannot even access the UDP
 		 * header, no choice but to drop. */
 		msg_err(RX_ERR, "Dropping evil UDP tinygram "
 			"(pkt from:%pI4 to:%pI4)", &saddr, &daddr);
 		par->hotdrop = true;
 		return false;
 	}
 	ulen = ntohs(uh->len);

 	/* How much do we need to skip to access payload data */
 	hdr_size    = par->thoff + sizeof(struct udphdr);
 	payload_ptr = skb_network_header(skb) + hdr_size;
 	/* payload_ptr = skb->data + hdr_size; */
 	BUG_ON(payload_ptr != (skb->data + hdr_size));

 	/* Different ways to determine the payload_len.  Think the
 	 * safest is to use the skb->len, as we really cannot trust
 	 * the contents of the packet.
 	  payload_len = ntohs(iph->tot_len)- hdr_size;
 	  payload_len = ulen - sizeof(struct udphdr);
 	*/
 	payload_len = skb->len - hdr_size;

/* Not sure if we need to clone packets
 	if (skb_shared(skb))
 		msg_dbg(RX_STATUS, "skb(0x%p) shared", skb);

 	if (!skb_cloned(skb))
 		msg_dbg(RX_STATUS, "skb(0x%p) NOT cloned", skb);
*/

 	if (is_mp2t_packet(payload_ptr, payload_len)) {
 		msg_dbg(PKTDATA, "Jubii - its a MP2T packet");
 		skips = dissect_mp2t(payload_ptr, payload_len, skb, uh, info);
 	} else {
 		msg_dbg(PKTDATA, "Not a MPEG2 TS packet "
 			"(pkt from:%pI4 to:%pI4)", &saddr, &daddr);
 		return false;
 	}

 	if (info->flags & XT_MP2T_DETECT_DROP)
 		res = !!(skips); /* Convert to a bool */

 	return res;
}

static struct xt_match mp2t_mt_reg[] __read_mostly = {
 	{
 		.name           = "mp2t",
 		.revision       = 0,
 		.family         = NFPROTO_IPV4,
 		.match          = xt_mp2t_match,
 		.checkentry     = xt_mp2t_mt_check,
 		.destroy        = xt_mp2t_mt_destroy,
 		.proto		= IPPROTO_UDP,
 		.matchsize      = sizeof(struct xt_mp2t_mtinfo),
 		.me             = THIS_MODULE,
 	},
};


/*** Proc seq_file functionality ***/

static void *mp2t_seq_start(struct seq_file *s, loff_t *pos)
{
 	struct proc_dir_entry *pde = s->private;
 	struct xt_rule_mp2t_conn_htable *htable = pde->data;
 	unsigned int *bucket;

 	if (*pos >= htable->cfg.size)
 		return NULL;

 	if (!*pos)
 		return SEQ_START_TOKEN;

 	bucket = kmalloc(sizeof(unsigned int), GFP_ATOMIC);
 	if (!bucket)
 		return ERR_PTR(-ENOMEM);

 	*bucket = *pos;
 	return bucket;
}

static void *mp2t_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
 	struct proc_dir_entry *pde = s->private;
 	struct xt_rule_mp2t_conn_htable *htable = pde->data;
 	unsigned int *bucket = (unsigned int *)v;

 	if (v == SEQ_START_TOKEN) {
 		bucket = kmalloc(sizeof(unsigned int), GFP_ATOMIC);
 		if (!bucket)
 			return ERR_PTR(-ENOMEM);
 		*bucket = 0;
 		*pos    = 0;
 		v = bucket;
 		return bucket;
 	}

 	*pos = ++(*bucket);
 	if (*pos >= htable->cfg.size) {
 		kfree(v);
 		return NULL;
 	}
 	return bucket;
}

static void mp2t_seq_stop(struct seq_file *s, void *v)
{
 	unsigned int *bucket = (unsigned int *)v;
 	kfree(bucket);
}

static int mp2t_seq_show_real(struct mp2t_stream *stream, struct seq_file *s,
 			      unsigned int bucket)
{
 	int res;

 	if (!atomic_inc_not_zero(&stream->use)) {
 		/* If "use" is zero, then we are about to be freed */
 		return 0;
 	}

 	res = seq_printf(s, "bucket:%d dst:%pI4 src:%pI4 dport:%u sport:%u "
 			    "pids:%d skips:%llu discontinuity:%llu\n",
 			 bucket,
 			 &stream->match.ip.dst,
 			 &stream->match.ip.src,
 			 ntohs(stream->match.dst_port),
 			 ntohs(stream->match.src_port),
 			 stream->pid_list_len,
 			 stream->skips,
 			 stream->discontinuity
 		);

 	atomic_dec(&stream->use);

 	return res;
}

static int mp2t_seq_show(struct seq_file *s, void *v)
{
 	struct proc_dir_entry *pde = s->private;
 	struct xt_rule_mp2t_conn_htable *htable = pde->data;
 	unsigned int *bucket = (unsigned int *)v;
 	struct mp2t_stream *stream;
 	struct hlist_node *pos;
 	struct timespec delta;
 	struct timespec now;

 	/*
 	  The syntax for the proc output is "key:value" constructs,
 	  separated by a space.  This is done to ease machine/script
 	  parsing while still keeping it human readable.
 	*/

 	if (v == SEQ_START_TOKEN) {
 		getnstimeofday(&now);
 		delta = timespec_sub(now, htable->time_created);

 		/* version info */
 		seq_printf(s, "# info:version module:%s version:%s\n",
 			   XT_MODULE_NAME, XT_MODULE_VERSION);

 		/* time info */
 		seq_printf(s, "# info:time created:%ld.%09lu"
 			      " now:%ld.%09lu delta:%ld.%09lu\n",
 			   (long)htable->time_created.tv_sec,
 			   htable->time_created.tv_nsec,
 			   (long)now.tv_sec, now.tv_nsec,
 			   (long)delta.tv_sec, delta.tv_nsec);

 		/* dynamic info */
 		seq_puts(s, "# info:dynamic");
 		seq_printf(s, " rule_id:%d", htable->id);
 		seq_printf(s, " streams:%d", htable->count);
 		seq_printf(s, " streams_check:%d", htable->stream_not_found);
 		seq_printf(s, " max_list_search:%d",  htable->max_list_search);
 		seq_printf(s, " rnd:%u", htable->rnd);
 		seq_puts(s, "\n");

 		/* config info */
 		seq_puts(s, "# info:config");
 		seq_printf(s, " htable_size:%u", htable->cfg.size);
 		seq_printf(s, " max-streams:%u", htable->cfg.max);
 		seq_printf(s, " list_search_warn_level:%d",
 			   htable->cfg.max_list);
 		seq_puts(s, "\n");

 	} else {
 		/* Streams are freed via call_rcu_bh(), so readers must
 		 * use the matching _bh read-side primitives */
 		rcu_read_lock_bh();
 		if (!hlist_empty(&htable->stream_hash[*bucket])) {
 			hlist_for_each_entry_rcu(stream, pos,
 						 &htable->stream_hash[*bucket],
 						 node) {
 				if (mp2t_seq_show_real(stream, s, *bucket)) {
 					rcu_read_unlock_bh();
 					return -1;
 				}
 			}
 		}
 		rcu_read_unlock_bh();
 	}
 	return 0;
}

static const struct seq_operations dl_seq_ops = {
 	.start = mp2t_seq_start,
 	.next  = mp2t_seq_next,
 	.stop  = mp2t_seq_stop,
 	.show  = mp2t_seq_show
};

static int mp2t_proc_open(struct inode *inode, struct file *file)
{
 	int ret = seq_open(file, &dl_seq_ops);

 	if (!ret) {
 		struct seq_file *sf = file->private_data;
 		sf->private = PDE(inode);
 	}
 	return ret;
}

static const struct file_operations dl_file_ops = {
 	.owner   = THIS_MODULE,
 	.open    = mp2t_proc_open,
 	.read    = seq_read,
 	.llseek  = seq_lseek,
 	.release = seq_release
};

/*** Module init & exit ***/

static int __init mp2t_mt_init(void)
{
 	int err;
 	GLOBAL_ID = 1; /* Module counter for rule_id assignments */

 	/* The list conn_htables contains references to dynamically
 	 * allocated memory (via xt_rule_mp2t_conn_htable ptr) that
 	 * needs to survive between rule updates.
 	 */
 	INIT_LIST_HEAD(&conn_htables);

 	msg_level = netif_msg_init(debug, MP2T_MSG_DEFAULT);
 	msg_info(DRV, "Loading: %s", version);
 	msg_dbg(DRV, "Message level (msg_level): 0x%X", msg_level);

 	/* Register the mp2t matches */
 	err = xt_register_matches(mp2t_mt_reg, ARRAY_SIZE(mp2t_mt_reg));
 	if (err) {
 		msg_err(DRV, "unable to register matches");
 		return err;
 	}

#ifdef CONFIG_PROC_FS
 	/* Create proc directory shared by all rules */
 	mp2t_procdir = proc_mkdir(XT_MODULE_NAME, init_net.proc_net);
 	if (!mp2t_procdir) {
 		msg_err(DRV, "unable to create proc dir entry");
 		/* In case of error unregister the mp2t matches */
 		xt_unregister_matches(mp2t_mt_reg, ARRAY_SIZE(mp2t_mt_reg));
 		err = -ENOMEM;
 	}
#endif

 	return err;
}

static void __exit mp2t_mt_exit(void)
{
 	msg_info(DRV, "Unloading: %s", version);

 	remove_proc_entry(XT_MODULE_NAME, init_net.proc_net);

 	xt_unregister_matches(mp2t_mt_reg, ARRAY_SIZE(mp2t_mt_reg));

 	/* It's important to wait for all call_rcu_bh() callbacks to
 	 * finish before this module is deallocated, as the code in
 	 * mp2t_stream_free() is used by these callbacks.
 	 *
 	 * Notice that doing a synchronize_rcu() is NOT enough. We need
 	 * to invoke rcu_barrier_bh() to wait for completion of
 	 * call_rcu_bh() callbacks on all CPUs.
 	 */
 	rcu_barrier_bh();
}

module_init(mp2t_mt_init);
module_exit(mp2t_mt_exit);

^ permalink raw reply

* [RFC 1/3] MPEG2/TS drop analyzer file: xt_mp2t.h
From: Jesper Dangaard Brouer @ 2010-10-19 14:25 UTC (permalink / raw)
  To: Netfilter Developers; +Cc: paulmck, Eric Dumazet, netdev
In-Reply-To: <Pine.LNX.4.64.1010191608080.18708@ask.diku.dk>

/*
  * Header file for MPEG2 TS match extension "mp2t" for Xtables.
  *
  * Copyright (c) Jesper Dangaard Brouer <jdb@comx.dk>, 2009+
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License; either
  * version 2 of the License, or any later version, as published by the
  * Free Software Foundation.
  *
  */
#ifndef _LINUX_NETFILTER_XT_MP2T_MATCH_H
#define _LINUX_NETFILTER_XT_MP2T_MATCH_H 1

#define XT_MODULE_NAME		"xt_mp2t"
#define XT_MODULE_VERSION	"0.2.1-devel"
#define XT_MODULE_RELDATE	"Sep 15, 2010"
#define PFX			XT_MODULE_NAME ": "

static char version[] =
 	XT_MODULE_NAME ".c:v" XT_MODULE_VERSION " (" XT_MODULE_RELDATE ")";

enum {
 	XT_MP2T_DETECT_DROP = 1 << 0,
 	XT_MP2T_MAX_STREAMS = 1 << 1,
 	XT_MP2T_PARAM_NAME  = 1 << 2,
};

/* Details of this hash structure are hidden in kernel space xt_mp2t.c */
struct xt_rule_mp2t_conn_htable;

struct mp2t_cfg {

 	/* Hash table setup */
 	u_int32_t size;		/* how many hash buckets */
 	u_int32_t max;		/* max number of entries */
 	u_int32_t max_list;	/* warn if list searches exceed this number */
};

struct xt_mp2t_mtinfo {
 	__u16 flags;

 	/* FIXME:

 	   I need to fix the problem where I have to reallocate data
 	   each time a single rule change occurs.

 	   The idea with rule_name and rule_id is that the name is
 	   optional, simply to provide a name in /proc/; the rule_id
 	   is the real lookup key in the internal kernel list of the
 	   rule's associated dynamically allocated data.

 	 */
 	char rule_name[IFNAMSIZ];

 	struct mp2t_cfg cfg;

 	/** Below used internally by the kernel **/
 	__u32 rule_id;

 	/* Hash table pointer */
 	struct xt_rule_mp2t_conn_htable *hinfo __attribute__((aligned(8)));
};

#endif /* _LINUX_NETFILTER_XT_MP2T_MATCH_H */

^ permalink raw reply

* [RFC 0/3] MPEG2/TS drop analyzer iptables match extension
From: Jesper Dangaard Brouer @ 2010-10-19 14:21 UTC (permalink / raw)
  To: Netfilter Developers; +Cc: paulmck, Eric Dumazet, netdev

This is my iptables match module for analyzing IPTV MPEG2/TS streams.
Currently it only detects dropped packets, but I want to extend it for
analyzing jitter and bursts.

Jan Engelhardt convinced me that I should just send the module as-is
for review on the list.  I wrote the code in 2009, and have only done
some minor changes to make it work on kernel 2.6.35 since.

The code is running in our production environment, where it handles
approx 120 TV channels, which amounts to approx 600 Mbit/s of
bandwidth.  The production hardware uses an Atom 330 CPU (1.6 GHz)
and a PCI Express attached Intel 82576 NIC (igb driver).

I take advantage of the multiqueue features in the 82576 Intel NIC.
This, together with using RCU locking, makes the module scale very well
to several CPUs (the Atom 330 has two cores with Hyper-Threading).
The total CPU load in production is only 10% (6-7% in softirq).

I do "deep-packet-inspection" of the IP packet in order to process the
7 internal TS (Transport Stream) packets.
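
To illustrate the inspection step, here is a minimal userspace sketch of
walking the TS packets in one UDP payload and counting continuity counter
(CC) gaps.  It is not the module's code: the names ts_cc_skips() and
ts_count_skips() are made up for this example, it tracks a single PID
where the module keeps a per-PID list, and it reads the header byte by
byte instead of through the module's mask/shift macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188    /* fixed MPEG2-TS packet length */
#define TS_SYNC_BYTE   0x47   /* first byte of every TS packet */
#define TS_NULL_PID    0x1fff /* null packets carry no usable CC */

/* Number of TS packets missing between two 4-bit CC values (mod 16). */
static unsigned int ts_cc_skips(unsigned int prev, unsigned int curr)
{
	assert(prev < 16 && curr < 16);
	return (curr - prev - 1) & 0x0f;
}

/*
 * Walk a payload made of whole TS packets and count skipped packets for
 * one PID.  *cc_prev holds the last CC seen, or -1 before the first one.
 * Returns the number of skips, or -1 if the payload is not MPEG2-TS.
 */
static int ts_count_skips(const uint8_t *payload, size_t len,
			  uint16_t want_pid, int *cc_prev)
{
	int total = 0;
	size_t off;

	if (len % TS_PACKET_SIZE != 0)
		return -1;

	for (off = 0; off < len; off += TS_PACKET_SIZE) {
		const uint8_t *p = payload + off;
		uint16_t pid;
		unsigned int afc, cc;

		if (p[0] != TS_SYNC_BYTE)
			return -1;

		pid = (uint16_t)(((p[1] & 0x1f) << 8) | p[2]); /* 13 bits */
		afc = (p[3] >> 4) & 0x3; /* adaptation field control */
		cc  = p[3] & 0x0f;       /* continuity counter */

		/* 'Adaptation field only' packets do not advance the CC */
		if (pid != want_pid || pid == TS_NULL_PID || afc == 2)
			continue;

		/* A repeated CC is tolerated, as in the module */
		if (*cc_prev >= 0 && (unsigned int)*cc_prev != cc)
			total += (int)ts_cc_skips((unsigned int)*cc_prev, cc);
		*cc_prev = (int)cc;
	}
	return total;
}
```

With 7 TS packets per UDP datagram, a result above 7 for one stream
suggests that more than one network packet was lost.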

I have placed the presentation I'm giving at Netfilter Workshop 2010 on 
the Wiki list of presentations:
  http://workshop.netfilter.org/2010/wiki/index.php/List_of_presentations
and a direct link here:
  http://workshop.netfilter.org/2010/presentations/IPTV-burst-issues-NFWS2010-JesperBrouer.odp

Cheers,
   Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

^ permalink raw reply

* Re: [Ksummit-2010-discuss] [v2] Remaining BKL users, what to do
From: Paul Mundt @ 2010-10-19 13:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Arnd Bergmann, Dave Airlie, Theodore Kilgore, Greg KH, codalist,
	autofs, Samuel Ortiz, Jan Kara, Mikulas Patocka, Jan Harkes,
	netdev, Anders Larsen, linux-kernel, dri-devel, Bryan Schumaker,
	Christoph Hellwig, ksummit-2010-discuss, Petr Vandrovec,
	Arnaldo Carvalho de Melo, linux-fsdevel, Evgeniy Dushistov,
	Ingo Molnar, Andrew Hendry, linux-media
In-Reply-To: <1287491998.16971.360.camel@gandalf.stny.rr.com>

On Tue, Oct 19, 2010 at 08:39:58AM -0400, Steven Rostedt wrote:
> On Tue, 2010-10-19 at 09:26 +0200, Arnd Bergmann wrote:
> > On Tuesday 19 October 2010 06:52:32 Dave Airlie wrote:
> > > > I might be able to find some hardware still lying around here that uses an
> > > > i810. Not sure unless I go hunting it. But I get the impression that if
> > > > the kernel is a single-CPU kernel there is not any problem anyway? Don't
> > > > distros offer a non-smp kernel as an installation option in case the user
> > > > needs it? So in reality how big a problem is this?
> > > 
> > > Not anymore, which is my old point of making a fuss. Nowadays in the
> > > modern distro world, we supply a single kernel that can at runtime
> > > decide if its running on SMP or UP and rewrite the text section
> > > appropriately with locks etc. Its like magic, and something like
> > > marking drivers as BROKEN_ON_SMP at compile time is really wrong when
> > > what you want now is a runtime warning if someone tries to hotplug a
> > > CPU with a known iffy driver loaded or if someone tries to load the
> > > driver when we are already in SMP mode.
> > 
> > We could make the driver run-time non-SMP by adding
> > 
> > 	if (num_present_cpus() > 1) {
> > 		pr_err("i810 no longer supports SMP\n");
> > 		return -EINVAL;
> > 	}
> > 
> > to the init function. That would cover the vast majority of the
> > users of i810 hardware, I guess.
> 
> I think we also need to cover the PREEMPT case too. But that could be a
> compile time check, since you can't boot a preempt kernel and make it
> non preempt.
> 
There are enough nameless embedded vendors that have turned a preempt
kernel into a non-preempt one at run-time by leaking the preempt count,
whether by design or not, so it's certainly possible :-)

^ permalink raw reply

* Re: [PATCH 1/1] ARC vmac ethernet driver.
From: David Miller @ 2010-10-19 13:53 UTC (permalink / raw)
  To: andreas.fenkart; +Cc: netdev
In-Reply-To: <1287129254-18078-2-git-send-email-andreas.fenkart@streamunlimited.com>

From: Andreas Fenkart <andreas.fenkart@streamunlimited.com>
Date: Fri, 15 Oct 2010 09:54:14 +0200

> +
> +#undef DEBUG
> +

Please remove this.

> +#if 0
> +	/* FIXME: what is it used for? */
> +	platform_set_drvdata(ap->dev, ap->mii_bus);
> +#endif

Resolve this one way or another: either figure out what it's used
for and keep it, or remove it if it is unneeded.

> +		/* IP header Alignment (14 byte Ethernet header) */
> +		skb_reserve(skb, 2);

Use "NET_IP_ALIGN", not "2"; different architectures want to
use different values.

> +	skb_reserve(merge_skb, 2);

Same thing here, use NET_IP_ALIGN.

> +/* arcvmac private data structures */
> +struct vmac_buffer_desc {
> +	unsigned int info;
> +	dma_addr_t data;
> +};

If this is the actual descriptor used by the hardware you
cannot define it this way.

dma_addr_t is a variable type, on some platforms it is a
"u32", on others it is a "u64" but you cannot assume one
way or another.

Also, are these values big or little endian?  You must use
the appropriate endian types such as __be32 et al. and then
access the members using the proper conversion functions.
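
A minimal sketch of the descriptor point, written as userspace C so it
compiles standalone.  It assumes a little-endian device with 32-bit bus
addresses; neither is confirmed in this thread, and all sketch_* names
are hypothetical.  In the driver itself the fields would be declared
__le32 (or __be32) and accessed with cpu_to_le32()/le32_to_cpu() (or the
be32 equivalents) rather than with the hand-rolled helpers below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's __le32: 32 bits stored little-endian. */
typedef uint32_t sketch_le32;

/* Device-visible descriptor: fixed width and fixed byte order per field,
 * independent of the CPU and of the width of dma_addr_t. */
struct sketch_vmac_desc {
	sketch_le32 info;
	sketch_le32 data; /* 32-bit bus address (assumed) */
};

static_assert(sizeof(struct sketch_vmac_desc) == 8,
	      "descriptor layout must not depend on the platform");

/* Portable stand-in for cpu_to_le32(). */
static sketch_le32 sketch_cpu_to_le32(uint32_t v)
{
	uint8_t b[4];
	sketch_le32 out;

	b[0] = (uint8_t)(v & 0xff);
	b[1] = (uint8_t)((v >> 8) & 0xff);
	b[2] = (uint8_t)((v >> 16) & 0xff);
	b[3] = (uint8_t)((v >> 24) & 0xff);
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Portable stand-in for le32_to_cpu(). */
static uint32_t sketch_le32_to_cpu(sketch_le32 v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Fill a descriptor; returns -1 if the bus address needs more than
 * 32 bits, which this (assumed) descriptor format cannot express. */
static int sketch_desc_fill(struct sketch_vmac_desc *d, uint64_t bus_addr,
			    uint32_t info)
{
	if (bus_addr > UINT32_MAX)
		return -1;
	d->data = sketch_cpu_to_le32((uint32_t)bus_addr);
	d->info = sketch_cpu_to_le32(info);
	return 0;
}
```

The explicit range check is where a platform with a 64-bit dma_addr_t
would otherwise silently truncate the address.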

^ permalink raw reply

* [PATCH] ip6_tunnel dont update the mtu on the route.
From: Anders.Franzen @ 2010-10-19 13:50 UTC (permalink / raw)
  To: eric.dumazet, netdev; +Cc: Anders Franzen

From: Anders Franzen <anders.franzen@ericsson.com>

The ip6_tunnel device did not unset the IFF_XMIT_DST_RELEASE
flag. This makes the dev layer release the dst before calling
the tunnel. The tunnel then cannot update any mtu/pmtu info,
since there is no dst on the skb.
---
 net/ipv6/ip6_tunnel.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c2c0f89..38b9a56 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1371,6 +1371,7 @@ static void ip6_tnl_dev_setup(struct net_device *dev)
 	dev->flags |= IFF_NOARP;
 	dev->addr_len = sizeof(struct in6_addr);
 	dev->features |= NETIF_F_NETNS_LOCAL;
+	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
 }
 
 
-- 
1.7.2.3
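
The effect described in the commit message can be modelled with a small
userspace toy.  This is only a sketch under simplifying assumptions: the
sketch_* types and the flag value are invented for illustration and are
not the kernel's struct net_device, struct sk_buff or the real value of
IFF_XMIT_DST_RELEASE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_XMIT_DST_RELEASE 0x1u /* stand-in for IFF_XMIT_DST_RELEASE */

struct sketch_dst { unsigned int pmtu; };
struct sketch_skb { struct sketch_dst *dst; };
struct sketch_dev { unsigned int priv_flags; };

/* Models the device layer: if the flag is set, the dst is dropped
 * before the packet is handed to the device's transmit function. */
static void sketch_dev_xmit_prepare(const struct sketch_dev *dev,
				    struct sketch_skb *skb)
{
	assert(dev != NULL && skb != NULL);
	if (dev->priv_flags & SKETCH_XMIT_DST_RELEASE)
		skb->dst = NULL;
}

/* Models the tunnel's transmit path: the path MTU can only be updated
 * while a dst is still attached to the packet. */
static bool sketch_tunnel_update_pmtu(struct sketch_skb *skb,
				      unsigned int mtu)
{
	assert(skb != NULL);
	if (!skb->dst)
		return false;
	skb->dst->pmtu = mtu;
	return true;
}
```

Clearing the flag in the setup function, as the patch does, corresponds
to the case where sketch_tunnel_update_pmtu() still finds a dst.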


^ permalink raw reply related

* [PATCH 4/4] crypto: algif_skcipher - User-space interface for skcipher operations
From: Herbert Xu @ 2010-10-19 13:46 UTC (permalink / raw)
  To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20100907084213.GA4610@gondor.apana.org.au>

crypto: algif_skcipher - User-space interface for skcipher operations

This patch adds the af_alg plugin for symmetric key ciphers,
corresponding to the ablkcipher kernel operation type.

Keys can optionally be set through the setsockopt interface.

Once a sendmsg call occurs without MSG_MORE no further writes
may be made to the socket until all previous data has been read.

IVs and whether encryption/decryption is performed can be
set through the setsockopt interface or as a control message
to sendmsg.

The interface is completely synchronous, all operations are
carried out in recvmsg(2) and will complete prior to the system
call returning.

The splice(2) interface supports reading the user-space data directly
without copying (except that the Crypto API itself may copy the data
if alignment is off).

The recvmsg(2) interface supports directly writing to user-space
without additional copying, i.e., the kernel crypto interface will
receive the user-space address as its output SG list.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 crypto/Kconfig          |    8 
 crypto/Makefile         |    1 
 crypto/algif_skcipher.c |  664 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 673 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 6db27d7..69437e2 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -852,6 +852,14 @@ config CRYPTO_USER_API_HASH
 	  This option enables the user-spaces interface for hash
 	  algorithms.
 
+config CRYPTO_USER_API_SKCIPHER
+	tristate "User-space interface for symmetric key cipher algorithms"
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_USER_API
+	help
+	  This option enables the user-spaces interface for symmetric
+	  key cipher algorithms.
+
 source "drivers/crypto/Kconfig"
 
 endif	# if CRYPTO
diff --git a/crypto/Makefile b/crypto/Makefile
index 14ab405..efc0f18 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o
 obj-$(CONFIG_CRYPTO_GHASH) += ghash-generic.o
 obj-$(CONFIG_CRYPTO_USER_API) += af_alg.o
 obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
+obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
 
 #
 # generic algorithms and the async_tx api
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
new file mode 100644
index 0000000..af26040
--- /dev/null
+++ b/crypto/algif_skcipher.c
@@ -0,0 +1,664 @@
+/*
+ * algif_skcipher: User-space interface for skcipher algorithms
+ *
+ * This file provides the user-space API for symmetric key ciphers.
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <crypto/skcipher.h>
+#include <crypto/if_alg.h>
+#include <linux/completion.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/net.h>
+#include <net/sock.h>
+
+struct skcipher_sg_list {
+	struct list_head list;
+
+	int cur;
+
+	struct scatterlist sg[0];
+};
+
+struct skcipher_ctx {
+	struct list_head tsgl;
+	struct af_alg_sgl rsgl;
+
+	struct completion completion;
+
+	void *iv;
+
+	unsigned used;
+
+	unsigned int len;
+	int err;
+	bool more;
+	bool merge;
+	bool enc;
+
+	struct ablkcipher_request req;
+};
+
+#define MAX_SGL_ENTS ((PAGE_SIZE - sizeof(struct skcipher_sg_list)) / \
+		      sizeof(struct scatterlist) - 1)
+
+static inline bool skcipher_writable(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+
+	return ctx->used + PAGE_SIZE <= max_t(int, sk->sk_sndbuf, PAGE_SIZE);
+}
+
+static void skcipher_done(struct crypto_async_request *req, int err)
+{
+	struct sock *sk = req->data;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+
+	ctx->err = err;
+	complete(&ctx->completion);
+}
+
+static int skcipher_alloc_sgl(struct sock *sk, int size)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct skcipher_sg_list *sgl;
+	struct scatterlist *sg = NULL;
+
+	sgl = list_first_entry(&ctx->tsgl, struct skcipher_sg_list, list);
+	if (!list_empty(&ctx->tsgl))
+		sg = sgl->sg;
+
+	if (!sg || sgl->cur >= MAX_SGL_ENTS) {
+		sgl = sock_kmalloc(sk, sizeof(*sgl) +
+				       sizeof(sgl->sg[0]) * (MAX_SGL_ENTS + 1),
+				   GFP_KERNEL);
+		if (!sgl)
+			return -ENOMEM;
+
+		sg_init_table(sgl->sg, MAX_SGL_ENTS + 1);
+		sgl->cur = 0;
+
+		if (sg)
+			sg_chain(sg, MAX_SGL_ENTS + 1, sgl->sg);
+
+		list_add_tail(&sgl->list, &ctx->tsgl);
+	}
+
+	return 0;
+}
+
+static void skcipher_pull_sgl(struct sock *sk, int used)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct skcipher_sg_list *sgl;
+	struct scatterlist *sg;
+	int i;
+
+	while (!list_empty(&ctx->tsgl)) {
+		sgl = list_first_entry(&ctx->tsgl, struct skcipher_sg_list,
+				       list);
+		sg = sgl->sg;
+
+		for (i = 0; i < sgl->cur; i++) {
+			int plen = min_t(int, used, sg[i].length);
+
+			if (!sg_page(sg + i))
+				continue;
+
+			if (!used)
+				return;
+
+			sg[i].length -= plen;
+			sg[i].offset += plen;
+
+			if (!sg[i].length) {
+				put_page(sg_page(sg + i));
+				sg_assign_page(sg + i, NULL);
+			}
+
+			used -= plen;
+			ctx->used -= plen;
+		}
+
+		list_del(&sgl->list);
+		sock_kfree_s(sk, sgl,
+			     sizeof(*sgl) + sizeof(sgl->sg[0]) *
+					    (MAX_SGL_ENTS + 1));
+	}
+
+	if (!ctx->used)
+		ctx->merge = 0;
+}
+
+static void skcipher_free_sgl(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+
+	skcipher_pull_sgl(sk, ctx->used);
+}
+
+static int skcipher_wait_for_wmem(struct sock *sk, unsigned flags)
+{
+	DEFINE_WAIT(wait);
+	int err = -ERESTARTSYS;
+
+	if (flags & MSG_DONTWAIT)
+		return -EAGAIN;
+
+	set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
+
+	for (;;) {
+		if (signal_pending(current))
+			break;
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		if (skcipher_writable(sk)) {
+			err = 0;
+			break;
+		}
+		schedule();
+	}
+	finish_wait(sk_sleep(sk), &wait);
+
+	return err;
+}
+
+static void skcipher_wmem_wakeup(struct sock *sk)
+{
+	struct socket_wq *wq;
+
+	if (!skcipher_writable(sk))
+		return;
+
+	rcu_read_lock();
+	wq = rcu_dereference(sk->sk_wq);
+	if (wq_has_sleeper(wq))
+		wake_up_interruptible_sync_poll(&wq->wait, POLLIN |
+							   POLLRDNORM |
+							   POLLRDBAND);
+	sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
+	rcu_read_unlock();
+}
+
+static int skcipher_wait_for_data(struct sock *sk, unsigned flags)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	DEFINE_WAIT(wait);
+	int err = -ERESTARTSYS;
+
+	if (flags & MSG_DONTWAIT) {
+		return -EAGAIN;
+	}
+
+	set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
+
+	for (;;) {
+		if (signal_pending(current))
+			break;
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		if (ctx->used) {
+			err = 0;
+			break;
+		}
+		schedule();
+	}
+	finish_wait(sk_sleep(sk), &wait);
+
+	clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
+
+	return err;
+}
+
+static void skcipher_data_wakeup(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct socket_wq *wq;
+
+	if (!ctx->used)
+		return;
+
+	rcu_read_lock();
+	wq = rcu_dereference(sk->sk_wq);
+	if (wq_has_sleeper(wq))
+		wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
+							   POLLRDNORM |
+							   POLLRDBAND);
+	sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
+	rcu_read_unlock();
+}
+
+static int skcipher_sendmsg(struct kiocb *unused, struct socket *sock,
+			    struct msghdr *msg, size_t size)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(&ctx->req);
+	unsigned ivsize = crypto_ablkcipher_ivsize(tfm);
+	struct skcipher_sg_list *sgl;
+	struct af_alg_control con = {};
+	long copied = 0;
+	bool enc = 0;
+	int limit;
+	int err;
+	int i;
+
+	if (msg->msg_controllen) {
+		err = af_alg_cmsg_send(msg, &con);
+		if (err)
+			return err;
+
+		switch (con.op) {
+		case ALG_OP_ENCRYPT:
+			enc = 1;
+			break;
+		case ALG_OP_DECRYPT:
+			enc = 0;
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (con.iv && con.iv->ivlen != ivsize)
+			return -EINVAL;
+	}
+
+	err = -EINVAL;
+
+	lock_sock(sk);
+	if (!ctx->more && ctx->used)
+		goto unlock;
+
+	if (!ctx->used) {
+		ctx->enc = enc;
+		if (con.iv)
+			memcpy(ctx->iv, con.iv->iv, ivsize);
+	}
+
+	limit = max_t(int, sk->sk_sndbuf, PAGE_SIZE);
+	limit -= ctx->used;
+
+	while (size) {
+		struct scatterlist *sg;
+		unsigned long len = size;
+		int plen;
+
+		if (ctx->merge) {
+			sgl = list_entry(ctx->tsgl.prev,
+					 struct skcipher_sg_list, list);
+			sg = sgl->sg + sgl->cur - 1;
+			len = min_t(unsigned long, len, PAGE_SIZE - sg->length);
+
+			err = memcpy_fromiovec(page_address(sg_page(sg)) +
+					       sg->length, msg->msg_iov, len);
+			if (err)
+				goto unlock;
+
+			sg->length += len;
+			ctx->merge = sg->length & (PAGE_SIZE - 1);
+
+			size -= len;
+			copied += len;
+			continue;
+		}
+
+		if (limit < PAGE_SIZE) {
+			release_sock(sk);
+			err = skcipher_wait_for_wmem(sk, msg->msg_flags);
+			lock_sock(sk);
+			if (err)
+				goto unlock;
+
+			limit = max_t(int, sk->sk_sndbuf, PAGE_SIZE);
+			limit -= ctx->used;
+		}
+
+		len = min_t(unsigned long, len, limit);
+
+		err = skcipher_alloc_sgl(sk, len);
+		if (err)
+			goto unlock;
+
+		sgl = list_first_entry(&ctx->tsgl, struct skcipher_sg_list,
+				       list);
+		sg = sgl->sg;
+		do {
+			i = sgl->cur;
+			plen = min_t(int, len, PAGE_SIZE);
+
+			sg_assign_page(sg + i, alloc_page(GFP_KERNEL));
+			err = -ENOMEM;
+			if (!sg_page(sg + i))
+				goto unlock;
+
+			err = memcpy_fromiovec(page_address(sg_page(sg + i)),
+					       msg->msg_iov, plen);
+			if (err) {
+				__free_page(sg_page(sg + i));
+				sg_assign_page(sg + i, NULL);
+				goto unlock;
+			}
+
+			sg[i].length = plen;
+			len -= plen;
+			ctx->used += plen;
+			copied += plen;
+			size -= plen;
+			limit -= plen;
+			sgl->cur++;
+		} while (sg_page(sg + ++i));
+
+		ctx->merge = plen & (PAGE_SIZE - 1);
+		if (ctx->merge)
+			sgl->cur--;
+	}
+
+	err = 0;
+
+	ctx->more = msg->msg_flags & MSG_MORE;
+	if (!ctx->more && !list_empty(&ctx->tsgl)) {
+		sgl = list_entry(ctx->tsgl.prev, struct skcipher_sg_list, list);
+		sg_mark_end(sgl->sg + sgl->cur - 1);
+	}
+
+unlock:
+	skcipher_data_wakeup(sk);
+	release_sock(sk);
+
+	return copied ?: err;
+}
+
+static ssize_t skcipher_sendpage(struct socket *sock, struct page *page,
+				 int offset, size_t size, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct skcipher_sg_list *sgl;
+	int err = -EINVAL;
+	int limit;
+
+	lock_sock(sk);
+	if (!ctx->more && ctx->used)
+		goto unlock;
+
+	if (!size)
+		goto done;
+
+	limit = max_t(int, sk->sk_sndbuf, PAGE_SIZE);
+	limit -= ctx->used;
+
+	if (limit < PAGE_SIZE) {
+		release_sock(sk);
+		err = skcipher_wait_for_wmem(sk, flags);
+		lock_sock(sk);
+		if (err)
+			goto unlock;
+
+		limit = max_t(int, sk->sk_sndbuf, PAGE_SIZE);
+		limit -= ctx->used;
+	}
+
+	err = skcipher_alloc_sgl(sk, 0);
+	if (err)
+		goto unlock;
+
+	ctx->merge = 0;
+	sgl = list_entry(ctx->tsgl.prev, struct skcipher_sg_list, list);
+
+	sg_set_page(sgl->sg + sgl->cur, page, size, offset);
+	sgl->cur++;
+
+done:
+	ctx->more = flags & MSG_MORE;
+	if (!ctx->more && !list_empty(&ctx->tsgl)) {
+		sgl = list_entry(ctx->tsgl.prev, struct skcipher_sg_list, list);
+		sg_mark_end(sgl->sg + sgl->cur - 1);
+	}
+
+unlock:
+	skcipher_data_wakeup(sk);
+	release_sock(sk);
+
+	return err ?: size;
+}
+
+static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock,
+			    struct msghdr *msg, size_t ignored, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	unsigned bs = crypto_ablkcipher_blocksize(crypto_ablkcipher_reqtfm(
+		&ctx->req));
+	struct skcipher_sg_list *sgl;
+	struct scatterlist *sg;
+	unsigned long iovlen;
+	struct iovec *iov;
+	int err = -EAGAIN;
+	int used;
+	long copied = 0;
+
+	lock_sock(sk);
+	for (iov = msg->msg_iov, iovlen = msg->msg_iovlen; iovlen > 0;
+	     iovlen--, iov++) {
+		unsigned long seglen = iov->iov_len;
+		char *from = iov->iov_base;
+
+		sgl = list_first_entry(&ctx->tsgl, struct skcipher_sg_list,
+				       list);
+		sg = sgl->sg;
+		while (!sg->length)
+			sg++;
+
+		while (seglen) {
+			used = ctx->used;
+			if (!used) {
+				release_sock(sk);
+				err = skcipher_wait_for_data(sk, flags);
+				lock_sock(sk);
+				if (err)
+					goto unlock;
+			}
+
+			used = min_t(unsigned long, used, seglen);
+
+			if (ctx->more || used < ctx->used)
+				used -= used % bs;
+
+			err = -EINVAL;
+			if (!used)
+				goto unlock;
+
+			used = af_alg_make_sg(&ctx->rsgl, from, used, 1);
+			if (used < 0)
+				goto unlock;
+
+			ablkcipher_request_set_crypt(&ctx->req, sg,
+						     ctx->rsgl.sg, used,
+						     ctx->iv);
+
+			err = ctx->enc ?
+			      crypto_ablkcipher_encrypt(&ctx->req) :
+			      crypto_ablkcipher_decrypt(&ctx->req);
+
+			switch (err) {
+			case -EINPROGRESS:
+			case -EBUSY:
+				wait_for_completion(&ctx->completion);
+				INIT_COMPLETION(ctx->completion);
+				err = ctx->err;
+				break;
+			}
+
+			af_alg_free_sg(&ctx->rsgl);
+
+			if (err)
+				goto unlock;
+
+			copied += used;
+			from += used;
+			seglen -= used;
+			skcipher_pull_sgl(sk, used);
+		}
+	}
+
+	err = 0;
+
+unlock:
+	skcipher_wmem_wakeup(sk);
+	release_sock(sk);
+
+	return copied ?: err;
+}
+
+
+static unsigned int skcipher_poll(struct file *file, struct socket *sock,
+				  poll_table *wait)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	unsigned int mask;
+
+	sock_poll_wait(file, sk_sleep(sk), wait);
+	mask = 0;
+
+	if (ctx->used)
+		mask |= POLLIN | POLLRDNORM;
+
+	if (skcipher_writable(sk))
+		mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
+
+	return mask;
+}
+
+static struct proto_ops algif_skcipher_ops = {
+	.family		=	PF_ALG,
+
+	.connect	=	sock_no_connect,
+	.socketpair	=	sock_no_socketpair,
+	.getname	=	sock_no_getname,
+	.ioctl		=	sock_no_ioctl,
+	.listen		=	sock_no_listen,
+	.shutdown	=	sock_no_shutdown,
+	.getsockopt	=	sock_no_getsockopt,
+	.mmap		=	sock_no_mmap,
+	.bind		=	sock_no_bind,
+	.accept		=	sock_no_accept,
+	.setsockopt	=	sock_no_setsockopt,
+
+	.release	=	af_alg_release,
+	.sendmsg	=	skcipher_sendmsg,
+	.sendpage	=	skcipher_sendpage,
+	.recvmsg	=	skcipher_recvmsg,
+	.poll		=	skcipher_poll,
+};
+
+static void *skcipher_bind(const char *name, u32 type, u32 mask)
+{
+	return crypto_alloc_ablkcipher(name, type, mask);
+}
+
+static void skcipher_release(void *private)
+{
+	crypto_free_ablkcipher(private);
+}
+
+static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen)
+{
+	return crypto_ablkcipher_setkey(private, key, keylen);
+}
+
+static void skcipher_sock_destruct(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(&ctx->req);
+
+	skcipher_free_sgl(sk);
+	sock_kfree_s(sk, ctx->iv, crypto_ablkcipher_ivsize(tfm));
+	sock_kfree_s(sk, ctx, ctx->len);
+	af_alg_release_parent(sk);
+}
+
+static int skcipher_accept_parent(void *private, struct sock *sk)
+{
+	struct skcipher_ctx *ctx;
+	struct alg_sock *ask = alg_sk(sk);
+	unsigned int len = sizeof(*ctx) + crypto_ablkcipher_reqsize(private);
+
+	ctx = sock_kmalloc(sk, len, GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->iv = sock_kmalloc(sk, crypto_ablkcipher_ivsize(private),
+			       GFP_KERNEL);
+	if (!ctx->iv) {
+		sock_kfree_s(sk, ctx, len);
+		return -ENOMEM;
+	}
+
+	memset(ctx->iv, 0, crypto_ablkcipher_ivsize(private));
+
+	INIT_LIST_HEAD(&ctx->tsgl);
+	ctx->len = len;
+	ctx->used = 0;
+	ctx->more = 0;
+	ctx->merge = 0;
+	ctx->enc = 0;
+	init_completion(&ctx->completion);
+
+	ask->private = ctx;
+
+	ablkcipher_request_set_tfm(&ctx->req, private);
+	ablkcipher_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+					skcipher_done, sk);
+
+	sk->sk_destruct = skcipher_sock_destruct;
+
+	return 0;
+}
+
+static const struct af_alg_type algif_type_skcipher = {
+	.bind		=	skcipher_bind,
+	.release	=	skcipher_release,
+	.setkey		=	skcipher_setkey,
+	.accept		=	skcipher_accept_parent,
+	.ops		=	&algif_skcipher_ops,
+	.name		=	"skcipher",
+	.owner		=	THIS_MODULE
+};
+
+static int __init algif_skcipher_init(void)
+{
+	return af_alg_register_type(&algif_type_skcipher);
+}
+
+static void __exit algif_skcipher_exit(void)
+{
+	int err = af_alg_unregister_type(&algif_type_skcipher);
+	BUG_ON(err);
+}
+
+module_init(algif_skcipher_init);
+module_exit(algif_skcipher_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETPROTO(AF_ALG);

* [PATCH 3/4] crypto: algif_hash - User-space interface for hash operations
From: Herbert Xu @ 2010-10-19 13:46 UTC (permalink / raw)
  To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20100907084213.GA4610@gondor.apana.org.au>

crypto: algif_hash - User-space interface for hash operations

This patch adds the af_alg plugin for hash, corresponding to
the ahash kernel operation type.

Keys can optionally be set through the setsockopt interface.

Each sendmsg call will finalise the hash unless sent with a MSG_MORE
flag.
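
A user-space sketch of the MSG_MORE behaviour, assuming the mainline
<linux/if_alg.h> definitions (the header in this series may differ).
alg_hash2() is a hypothetical helper that hashes two buffers as one
message: the first send keeps the hash open with MSG_MORE, the second
finalises it, and the digest is then read back.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38	/* assumption: mainline Linux value */
#endif

/*
 * Hypothetical helper: hash a followed by b with the named algorithm.
 * Returns the number of digest bytes stored in out, or -1 on failure
 * (including kernels without AF_ALG support).
 */
ssize_t alg_hash2(const char *name, const void *a, size_t alen,
		  const void *b, size_t blen,
		  unsigned char *out, size_t outlen)
{
	struct sockaddr_alg sa;
	ssize_t n = -1;
	int tfmfd, opfd;

	assert(name && a && b && out);

	memset(&sa, 0, sizeof(sa));
	sa.salg_family = AF_ALG;
	strncpy((char *)sa.salg_type, "hash", sizeof(sa.salg_type) - 1);
	strncpy((char *)sa.salg_name, name, sizeof(sa.salg_name) - 1);

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfmfd < 0)
		return -1;

	if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(tfmfd);
		return -1;
	}

	opfd = accept(tfmfd, NULL, NULL);
	close(tfmfd);
	if (opfd < 0)
		return -1;

	/* MSG_MORE keeps the hash open; the last send finalises it. */
	if (send(opfd, a, alen, MSG_MORE) == (ssize_t)alen &&
	    send(opfd, b, blen, 0) == (ssize_t)blen)
		n = read(opfd, out, outlen);

	close(opfd);
	return n;
}
```

For example, alg_hash2("sha1", "ab", 2, "c", 1, digest, 20) should
produce the same digest as hashing "abc" in a single send.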

Partial hash states can be cloned using accept(2).

The interface is completely synchronous: all operations will
complete prior to the system call returning.

Both sendmsg(2) and splice(2) support reading the user-space
data directly without copying (except that the Crypto API itself
may copy the data if alignment is off).

For now only the splice(2) interface supports performing digest
instead of init/update/final.  In future the sendmsg(2) interface
will also be modified to use digest/finup where possible so that
hardware that cannot return a partial hash state can still benefit
from this interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 crypto/Kconfig      |    8 +
 crypto/Makefile     |    1 
 crypto/algif_hash.c |  345 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 354 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 357e3ca..6db27d7 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -844,6 +844,14 @@ config CRYPTO_ANSI_CPRNG
 config CRYPTO_USER_API
 	tristate
 
+config CRYPTO_USER_API_HASH
+	tristate "User-space interface for hash algorithms"
+	select CRYPTO_HASH
+	select CRYPTO_USER_API
+	help
+	  This option enables the user-spaces interface for hash
+	  algorithms.
+
 source "drivers/crypto/Kconfig"
 
 endif	# if CRYPTO
diff --git a/crypto/Makefile b/crypto/Makefile
index 0b13197..14ab405 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_CRYPTO_ANSI_CPRNG) += ansi_cprng.o
 obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o
 obj-$(CONFIG_CRYPTO_GHASH) += ghash-generic.o
 obj-$(CONFIG_CRYPTO_USER_API) += af_alg.o
+obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
 
 #
 # generic algorithms and the async_tx api
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
new file mode 100644
index 0000000..28f29aa
--- /dev/null
+++ b/crypto/algif_hash.c
@@ -0,0 +1,345 @@
+/*
+ * algif_hash: User-space interface for hash algorithms
+ *
+ * This file provides the user-space API for hash algorithms.
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <crypto/hash.h>
+#include <crypto/if_alg.h>
+#include <linux/completion.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/net.h>
+#include <net/sock.h>
+
+struct hash_ctx {
+	struct af_alg_sgl sgl;
+
+	struct completion completion;
+	u8 *result;
+
+	unsigned int len;
+	int err;
+	bool more;
+
+	struct ahash_request req;
+};
+
+static void hash_done(struct crypto_async_request *req, int err)
+{
+	struct sock *sk = req->data;
+	struct alg_sock *ask = alg_sk(sk);
+	struct hash_ctx *ctx = ask->private;
+
+	ctx->err = err;
+	complete(&ctx->completion);
+}
+
+static int hash_sendmsg(struct kiocb *unused, struct socket *sock,
+			struct msghdr *msg, size_t ignored)
+{
+	int limit = ALG_MAX_PAGES * PAGE_SIZE;
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct hash_ctx *ctx = ask->private;
+	unsigned long iovlen;
+	struct iovec *iov;
+	long copied = 0;
+	int err;
+
+	if (limit > sk->sk_sndbuf)
+		limit = sk->sk_sndbuf;
+
+	lock_sock(sk);
+	if (!ctx->more) {
+		err = crypto_ahash_init(&ctx->req);
+		if (err)
+			goto unlock;
+	}
+
+	ctx->more = 0;
+
+	for (iov = msg->msg_iov, iovlen = msg->msg_iovlen; iovlen > 0;
+	     iovlen--, iov++) {
+		unsigned long seglen = iov->iov_len;
+		char *from = iov->iov_base;
+
+		while (seglen) {
+			int len = min_t(unsigned long, seglen, limit);
+			int newlen;
+
+			newlen = af_alg_make_sg(&ctx->sgl, from, len, 0);
+			if (newlen < 0)
+				goto unlock;
+
+			ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL,
+						newlen);
+
+			err = crypto_ahash_update(&ctx->req);
+			switch (err) {
+			case -EINPROGRESS:
+			case -EBUSY:
+				wait_for_completion(&ctx->completion);
+				INIT_COMPLETION(ctx->completion);
+				err = ctx->err;
+				break;
+			}
+
+			af_alg_free_sg(&ctx->sgl);
+
+			if (err)
+				goto unlock;
+
+			seglen -= newlen;
+			from += newlen;
+			copied += newlen;
+		}
+	}
+
+	err = 0;
+
+	ctx->more = msg->msg_flags & MSG_MORE;
+	if (!ctx->more) {
+		ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
+		err = crypto_ahash_final(&ctx->req);
+
+		switch (err) {
+		case -EINPROGRESS:
+		case -EBUSY:
+			wait_for_completion(&ctx->completion);
+			INIT_COMPLETION(ctx->completion);
+			err = ctx->err;
+			break;
+		}
+	}
+
+unlock:
+	release_sock(sk);
+
+	return err ?: copied;
+}
+
+static ssize_t hash_sendpage(struct socket *sock, struct page *page,
+			     int offset, size_t size, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct hash_ctx *ctx = ask->private;
+	int err;
+
+	lock_sock(sk);
+	sg_init_table(ctx->sgl.sg, 1);
+	sg_set_page(ctx->sgl.sg, page, size, offset);
+
+	ahash_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   hash_done, sk);
+	ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, ctx->result, size);
+
+	if (!(flags & MSG_MORE)) {
+		if (ctx->more)
+			err = crypto_ahash_finup(&ctx->req);
+		else
+			err = crypto_ahash_digest(&ctx->req);
+	} else {
+		if (!ctx->more) {
+			err = crypto_ahash_init(&ctx->req);
+			if (err)
+				goto unlock;
+		}
+
+		err = crypto_ahash_update(&ctx->req);
+	}
+
+	switch (err) {
+	case -EINPROGRESS:
+	case -EBUSY:
+		wait_for_completion(&ctx->completion);
+		INIT_COMPLETION(ctx->completion);
+		err = ctx->err;
+		if (err)
+			goto unlock;
+		break;
+	}
+
+	err = 0;
+
+	ctx->more = flags & MSG_MORE;
+
+unlock:
+	release_sock(sk);
+
+	return err ?: size;
+}
+
+static int hash_recvmsg(struct kiocb *unused, struct socket *sock,
+			struct msghdr *msg, size_t len, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct hash_ctx *ctx = ask->private;
+	unsigned ds = crypto_ahash_digestsize(crypto_ahash_reqtfm(&ctx->req));
+	int err = -EINVAL;
+
+	if (len > ds)
+		len = ds;
+	else if (len < ds)
+		msg->msg_flags |= MSG_TRUNC;
+
+	lock_sock(sk);
+	if (!ctx->more)
+		err = memcpy_toiovec(msg->msg_iov, ctx->result, len);
+	release_sock(sk);
+
+	return err ?: len;
+}
+
+static int hash_accept(struct socket *sock, struct socket *newsock, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct hash_ctx *ctx = ask->private;
+	struct ahash_request *req = &ctx->req;
+	char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];
+	struct sock *sk2;
+	struct alg_sock *ask2;
+	struct hash_ctx *ctx2;
+	int err;
+
+	err = crypto_ahash_export(req, state);
+	if (err)
+		return err;
+
+	err = af_alg_accept(sk, newsock);
+	if (err)
+		return err;
+
+	sk2 = newsock->sk;
+	ask2 = alg_sk(sk2);
+	ctx2 = ask2->private;
+
+	err = crypto_ahash_import(&ctx2->req, state);
+	if (err) {
+		sock_orphan(sk2);
+		sock_put(sk2);
+	}
+
+	return err;
+}
+
+static struct proto_ops algif_hash_ops = {
+	.family		=	PF_ALG,
+
+	.connect	=	sock_no_connect,
+	.socketpair	=	sock_no_socketpair,
+	.getname	=	sock_no_getname,
+	.ioctl		=	sock_no_ioctl,
+	.listen		=	sock_no_listen,
+	.shutdown	=	sock_no_shutdown,
+	.getsockopt	=	sock_no_getsockopt,
+	.mmap		=	sock_no_mmap,
+	.bind		=	sock_no_bind,
+	.setsockopt	=	sock_no_setsockopt,
+	.poll		=	sock_no_poll,
+
+	.release	=	af_alg_release,
+	.sendmsg	=	hash_sendmsg,
+	.sendpage	=	hash_sendpage,
+	.recvmsg	=	hash_recvmsg,
+	.accept		=	hash_accept,
+};
+
+static void *hash_bind(const char *name, u32 type, u32 mask)
+{
+	return crypto_alloc_ahash(name, type, mask);
+}
+
+static void hash_release(void *private)
+{
+	crypto_free_ahash(private);
+}
+
+static int hash_setkey(void *private, const u8 *key, unsigned int keylen)
+{
+	return crypto_ahash_setkey(private, key, keylen);
+}
+
+static void hash_sock_destruct(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct hash_ctx *ctx = ask->private;
+
+	sock_kfree_s(sk, ctx->result,
+		     crypto_ahash_digestsize(crypto_ahash_reqtfm(&ctx->req)));
+	sock_kfree_s(sk, ctx, ctx->len);
+	af_alg_release_parent(sk);
+}
+
+static int hash_accept_parent(void *private, struct sock *sk)
+{
+	struct hash_ctx *ctx;
+	struct alg_sock *ask = alg_sk(sk);
+	unsigned len = sizeof(*ctx) + crypto_ahash_reqsize(private);
+	unsigned ds = crypto_ahash_digestsize(private);
+
+	ctx = sock_kmalloc(sk, len, GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->result = sock_kmalloc(sk, ds, GFP_KERNEL);
+	if (!ctx->result) {
+		sock_kfree_s(sk, ctx, len);
+		return -ENOMEM;
+	}
+
+	memset(ctx->result, 0, ds);
+
+	ctx->len = len;
+	ctx->more = 0;
+	init_completion(&ctx->completion);
+
+	ask->private = ctx;
+
+	ahash_request_set_tfm(&ctx->req, private);
+	ahash_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   hash_done, sk);
+
+	sk->sk_destruct = hash_sock_destruct;
+
+	return 0;
+}
+
+static const struct af_alg_type algif_type_hash = {
+	.bind		=	hash_bind,
+	.release	=	hash_release,
+	.setkey		=	hash_setkey,
+	.accept		=	hash_accept_parent,
+	.ops		=	&algif_hash_ops,
+	.name		=	"hash",
+	.owner		=	THIS_MODULE
+};
+
+static int __init algif_hash_init(void)
+{
+	return af_alg_register_type(&algif_type_hash);
+}
+
+static void __exit algif_hash_exit(void)
+{
+	int err = af_alg_unregister_type(&algif_type_hash);
+	BUG_ON(err);
+}
+
+module_init(algif_hash_init);
+module_exit(algif_hash_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETPROTO(AF_ALG);

* [PATCH 2/4] crypto: af_alg - User-space interface for Crypto API
From: Herbert Xu @ 2010-10-19 13:46 UTC (permalink / raw)
  To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20100907084213.GA4610@gondor.apana.org.au>

crypto: af_alg - User-space interface for Crypto API

This patch creates the backbone of the user-space interface for
the Crypto API, through a new socket family AF_ALG.

Each session corresponds to one or more connections obtained from
that socket.  The number depends on the number of inputs/outputs
of that particular type of operation.  For most types there will
be a single connection/file descriptor that is used for both input
and output.  AEAD is one of the few that require two inputs.

Each algorithm type will provide its own implementation that plugs
into af_alg.  They're keyed using a string such as "skcipher" or
"hash".

In other words, this patch contains only the boring bits that are
required to hold everything together.
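
To make the type string concrete, here is a small sketch of how user
space might fill in the bind address, assuming the struct sockaddr_alg
layout from mainline <linux/if_alg.h> (salg_family, salg_type,
salg_name); the header added by this series may differ.
alg_fill_addr() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38	/* assumption: mainline Linux value */
#endif

/*
 * Hypothetical helper: fill sa for bind(2) on an AF_ALG socket.
 * type selects the plugin ("skcipher", "hash"), name the algorithm.
 * Returns 0, or -1 if either string does not fit with its NUL byte.
 */
int alg_fill_addr(struct sockaddr_alg *sa, const char *type,
		  const char *name)
{
	assert(sa && type && name);

	/* The kernel would silently truncate; refuse instead. */
	if (strlen(type) >= sizeof(sa->salg_type) ||
	    strlen(name) >= sizeof(sa->salg_name))
		return -1;

	memset(sa, 0, sizeof(*sa));
	sa->salg_family = AF_ALG;
	strcpy((char *)sa->salg_type, type);
	strcpy((char *)sa->salg_name, name);

	return 0;
}
```

The filled address is then passed to bind(2) on a socket created with
socket(AF_ALG, SOCK_SEQPACKET, 0), and accept(2) on that socket
returns the descriptor that carries the data.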

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 crypto/Kconfig          |    3 
 crypto/Makefile         |    1 
 crypto/af_alg.c         |  433 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/crypto/if_alg.h |   75 ++++++++
 include/linux/if_alg.h  |   40 ++++
 5 files changed, 552 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index e4bac29..357e3ca 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -841,6 +841,9 @@ config CRYPTO_ANSI_CPRNG
 	  ANSI X9.31 A.2.4. Note that this option must be enabled if
 	  CRYPTO_FIPS is selected
 
+config CRYPTO_USER_API
+	tristate
+
 source "drivers/crypto/Kconfig"
 
 endif	# if CRYPTO
diff --git a/crypto/Makefile b/crypto/Makefile
index 423b7de..0b13197 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_CRYPTO_RNG2) += krng.o
 obj-$(CONFIG_CRYPTO_ANSI_CPRNG) += ansi_cprng.o
 obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o
 obj-$(CONFIG_CRYPTO_GHASH) += ghash-generic.o
+obj-$(CONFIG_CRYPTO_USER_API) += af_alg.o
 
 #
 # generic algorithms and the async_tx api
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
new file mode 100644
index 0000000..f816850
--- /dev/null
+++ b/crypto/af_alg.c
@@ -0,0 +1,433 @@
+/*
+ * af_alg: User-space algorithm interface
+ *
+ * This file provides the user-space API for algorithms.
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <asm/atomic.h>
+#include <crypto/if_alg.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/net.h>
+#include <linux/rwsem.h>
+
+struct alg_type_list {
+	const struct af_alg_type *type;
+	struct list_head list;
+};
+
+static atomic_t alg_memory_allocated;
+
+static struct proto alg_proto = {
+	.name			= "ALG",
+	.owner			= THIS_MODULE,
+	.memory_allocated	= &alg_memory_allocated,
+	.obj_size		= sizeof(struct alg_sock),
+};
+
+static LIST_HEAD(alg_types);
+static DECLARE_RWSEM(alg_types_sem);
+
+static const struct af_alg_type *alg_get_type(const char *name)
+{
+	const struct af_alg_type *type = ERR_PTR(-ENOENT);
+	struct alg_type_list *node;
+
+	down_read(&alg_types_sem);
+	list_for_each_entry(node, &alg_types, list) {
+		if (strcmp(node->type->name, name))
+			continue;
+
+		if (try_module_get(node->type->owner))
+			type = node->type;
+		break;
+	}
+	up_read(&alg_types_sem);
+
+	return type;
+}
+
+int af_alg_register_type(const struct af_alg_type *type)
+{
+	struct alg_type_list *node;
+	int err = -EEXIST;
+
+	down_write(&alg_types_sem);
+	list_for_each_entry(node, &alg_types, list) {
+		if (!strcmp(node->type->name, type->name))
+			goto unlock;
+	}
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	err = -ENOMEM;
+	if (!node)
+		goto unlock;
+
+	type->ops->owner = THIS_MODULE;
+	node->type = type;
+	list_add(&node->list, &alg_types);
+	err = 0;
+
+unlock:
+	up_write(&alg_types_sem);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(af_alg_register_type);
+
+int af_alg_unregister_type(const struct af_alg_type *type)
+{
+	struct alg_type_list *node;
+	int err = -ENOENT;
+
+	down_write(&alg_types_sem);
+	list_for_each_entry(node, &alg_types, list) {
+		if (strcmp(node->type->name, type->name))
+			continue;
+
+		list_del(&node->list);
+		kfree(node);
+		err = 0;
+		break;
+	}
+	up_write(&alg_types_sem);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(af_alg_unregister_type);
+
+static void alg_do_release(const struct af_alg_type *type, void *private)
+{
+	if (!type)
+		return;
+
+	type->release(private);
+	module_put(type->owner);
+}
+
+int af_alg_release(struct socket *sock)
+{
+	sock_put(sock->sk);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(af_alg_release);
+
+static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct sockaddr_alg *sa = (void *)uaddr;
+	const struct af_alg_type *type;
+	void *private;
+
+	if (sock->state == SS_CONNECTED)
+		return -EINVAL;
+
+	if (addr_len != sizeof(*sa))
+		return -EINVAL;
+
+	sa->salg_type[sizeof(sa->salg_type) - 1] = 0;
+	sa->salg_name[sizeof(sa->salg_name) - 1] = 0;
+
+	type = alg_get_type(sa->salg_type);
+	if (IS_ERR(type) && PTR_ERR(type) == -ENOENT) {
+		request_module("algif-%s", sa->salg_type);
+		type = alg_get_type(sa->salg_type);
+	}
+
+	if (IS_ERR(type))
+		return PTR_ERR(type);
+
+	private = type->bind(sa->salg_name, sa->salg_feat, sa->salg_mask);
+	if (IS_ERR(private)) {
+		module_put(type->owner);
+		return PTR_ERR(private);
+	}
+
+	lock_sock(sk);
+
+	swap(ask->type, type);
+	swap(ask->private, private);
+
+	release_sock(sk);
+
+	alg_do_release(type, private);
+
+	return 0;
+}
+
+static int alg_setkey(struct sock *sk, char __user *ukey,
+		      unsigned int keylen)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	const struct af_alg_type *type = ask->type;
+	u8 *key;
+	int err;
+
+	key = sock_kmalloc(sk, keylen, GFP_KERNEL);
+	if (!key)
+		return -ENOMEM;
+
+	/* Free the key buffer on every path, including a failed copy. */
+	err = -EFAULT;
+	if (!copy_from_user(key, ukey, keylen))
+		err = type->setkey(ask->private, key, keylen);
+
+	sock_kfree_s(sk, key, keylen);
+
+	return err;
+}
+
+static int alg_setsockopt(struct socket *sock, int level, int optname,
+			  char __user *optval, unsigned int optlen)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	const struct af_alg_type *type = ask->type;
+
+	if (level != SOL_ALG || !type)
+		return -ENOPROTOOPT;
+
+	switch (optname) {
+	case ALG_SET_KEY:
+		if (sock->state == SS_CONNECTED)
+			return -ENOPROTOOPT;
+		if (!type->setkey)
+			return -ENOPROTOOPT;
+
+		return alg_setkey(sk, optval, optlen);
+	}
+
+	return -ENOPROTOOPT;
+}
+
+int af_alg_accept(struct sock *sk, struct socket *newsock)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	const struct af_alg_type *type = ask->type;
+	struct sock *sk2;
+	int err;
+
+	if (!type)
+		return -EINVAL;
+
+	sk2 = sk_alloc(sock_net(sk), PF_ALG, GFP_KERNEL, &alg_proto);
+	if (!sk2)
+		return -ENOMEM;
+
+	sock_init_data(newsock, sk2);
+
+	err = type->accept(ask->private, sk2);
+	if (err) {
+		sk_free(sk2);
+		return err;
+	}
+
+	sk2->sk_family = PF_ALG;
+
+	sock_hold(sk);
+	alg_sk(sk2)->parent = sk;
+	alg_sk(sk2)->type = type;
+
+	newsock->ops = type->ops;
+	newsock->state = SS_CONNECTED;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(af_alg_accept);
+
+static int alg_accept(struct socket *sock, struct socket *newsock, int flags)
+{
+	return af_alg_accept(sock->sk, newsock);
+}
+
+static const struct proto_ops alg_proto_ops = {
+	.family		=	PF_ALG,
+	.owner		=	THIS_MODULE,
+
+	.connect	=	sock_no_connect,
+	.socketpair	=	sock_no_socketpair,
+	.getname	=	sock_no_getname,
+	.ioctl		=	sock_no_ioctl,
+	.listen		=	sock_no_listen,
+	.shutdown	=	sock_no_shutdown,
+	.getsockopt	=	sock_no_getsockopt,
+	.mmap		=	sock_no_mmap,
+	.sendpage	=	sock_no_sendpage,
+	.sendmsg	=	sock_no_sendmsg,
+	.recvmsg	=	sock_no_recvmsg,
+	.poll		=	sock_no_poll,
+
+	.bind		=	alg_bind,
+	.release	=	af_alg_release,
+	.setsockopt	=	alg_setsockopt,
+	.accept		=	alg_accept,
+};
+
+static void alg_sock_destruct(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+
+	alg_do_release(ask->type, ask->private);
+}
+
+static int alg_create(struct net *net, struct socket *sock, int protocol,
+		      int kern)
+{
+	struct sock *sk;
+	int err;
+
+	if (sock->type != SOCK_SEQPACKET)
+		return -ESOCKTNOSUPPORT;
+	if (protocol != 0)
+		return -EPROTONOSUPPORT;
+
+	err = -ENOMEM;
+	sk = sk_alloc(net, PF_ALG, GFP_KERNEL, &alg_proto);
+	if (!sk)
+		goto out;
+
+	sock->ops = &alg_proto_ops;
+	sock_init_data(sock, sk);
+
+	sk->sk_family = PF_ALG;
+	sk->sk_destruct = alg_sock_destruct;
+
+	return 0;
+out:
+	return err;
+}
+
+static const struct net_proto_family alg_family = {
+	.family	=	PF_ALG,
+	.create	=	alg_create,
+	.owner	=	THIS_MODULE,
+};
+
+int af_alg_make_sg(struct af_alg_sgl *sgl, void *addr, int len, int write)
+{
+	unsigned long from = (unsigned long)addr;
+	unsigned long npages;
+	unsigned off;
+	int err;
+	int i;
+
+	err = -EFAULT;
+	if (!access_ok(write ? VERIFY_READ : VERIFY_WRITE, from, len))
+		goto out;
+
+	off = from & ~PAGE_MASK;
+	npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	err = get_user_pages_fast(from, npages, write, sgl->pages);
+	if (err < 0)
+		goto out;
+
+	npages = err;
+	err = -EINVAL;
+	if (WARN_ON(npages == 0))
+		goto out;
+
+	err = 0;
+
+	sg_init_table(sgl->sg, npages);
+
+	for (i = 0; i < npages; i++) {
+		int plen = min_t(int, len, PAGE_SIZE - off);
+
+		sg_set_page(sgl->sg + i, sgl->pages[i], plen, off);
+
+		off = 0;
+		len -= plen;
+		err += plen;
+	}
+
+out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(af_alg_make_sg);
+
+void af_alg_free_sg(struct af_alg_sgl *sgl)
+{
+	int i;
+
+	i = 0;
+	do {
+		put_page(sgl->pages[i]);
+	} while (!sg_is_last(sgl->sg + (i++)));
+}
+EXPORT_SYMBOL_GPL(af_alg_free_sg);
+
+int af_alg_cmsg_send(struct msghdr *msg, struct af_alg_control *con)
+{
+	struct cmsghdr *cmsg;
+
+	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+		if (!CMSG_OK(msg, cmsg))
+			return -EINVAL;
+		if (cmsg->cmsg_level != SOL_ALG)
+			continue;
+
+		switch (cmsg->cmsg_type) {
+		case ALG_SET_IV:
+			if (cmsg->cmsg_len < sizeof(*con->iv))
+				return -EINVAL;
+			con->iv = (void *)CMSG_DATA(cmsg);
+			if (cmsg->cmsg_len < con->iv->ivlen +
+					     sizeof(con->iv->ivlen))
+				return -EINVAL;
+			break;
+
+		case ALG_SET_OP:
+			if (cmsg->cmsg_len < sizeof(u32))
+				return -EINVAL;
+			con->op = *(u32 *)CMSG_DATA(cmsg);
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(af_alg_cmsg_send);
+
+static int __init af_alg_init(void)
+{
+	int err = proto_register(&alg_proto, 0);
+
+	if (err)
+		goto out;
+
+	err = sock_register(&alg_family);
+	if (err != 0)
+		goto out_unregister_proto;
+
+out:
+	return err;
+
+out_unregister_proto:
+	proto_unregister(&alg_proto);
+	goto out;
+}
+
+static void __exit af_alg_exit(void)
+{
+	sock_unregister(PF_ALG);
+	proto_unregister(&alg_proto);
+}
+
+module_init(af_alg_init);
+module_exit(af_alg_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETPROTO(AF_ALG);
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
new file mode 100644
index 0000000..e303910
--- /dev/null
+++ b/include/crypto/if_alg.h
@@ -0,0 +1,75 @@
+/*
+ * if_alg: User-space algorithm interface
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#ifndef _CRYPTO_IF_ALG_H
+#define _CRYPTO_IF_ALG_H
+
+#include <linux/compiler.h>
+#include <linux/if_alg.h>
+#include <linux/types.h>
+#include <net/sock.h>
+
+#define ALG_MAX_PAGES			16
+
+struct alg_sock {
+	/* struct sock must be the first member of struct alg_sock */
+	struct sock sk;
+
+	struct sock *parent;
+
+	const struct af_alg_type *type;
+	void *private;
+};
+
+struct af_alg_control {
+	struct af_alg_iv *iv;
+	int op;
+};
+
+struct af_alg_type {
+	void *(*bind)(const char *name, u32 type, u32 mask);
+	void (*release)(void *private);
+	int (*setkey)(void *private, const u8 *key, unsigned int keylen);
+	int (*accept)(void *private, struct sock *sk);
+
+	struct proto_ops *ops;
+	struct module *owner;
+	char name[14];
+};
+
+struct af_alg_sgl {
+	struct scatterlist sg[ALG_MAX_PAGES];
+	struct page *pages[ALG_MAX_PAGES];
+};
+
+int af_alg_register_type(const struct af_alg_type *type);
+int af_alg_unregister_type(const struct af_alg_type *type);
+
+int af_alg_release(struct socket *sock);
+int af_alg_accept(struct sock *sk, struct socket *newsock);
+
+int af_alg_make_sg(struct af_alg_sgl *sgl, void *addr, int len, int write);
+void af_alg_free_sg(struct af_alg_sgl *sgl);
+
+int af_alg_cmsg_send(struct msghdr *msg, struct af_alg_control *con);
+
+static inline struct alg_sock *alg_sk(struct sock *sk)
+{
+	return (struct alg_sock *)sk;
+}
+
+static inline void af_alg_release_parent(struct sock *sk)
+{
+	sock_put(alg_sk(sk)->parent);
+}
+
+#endif	/* _CRYPTO_IF_ALG_H */
diff --git a/include/linux/if_alg.h b/include/linux/if_alg.h
new file mode 100644
index 0000000..0f9acce
--- /dev/null
+++ b/include/linux/if_alg.h
@@ -0,0 +1,40 @@
+/*
+ * if_alg: User-space algorithm interface
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#ifndef _LINUX_IF_ALG_H
+#define _LINUX_IF_ALG_H
+
+#include <linux/types.h>
+
+struct sockaddr_alg {
+	__u16	salg_family;
+	__u8	salg_type[14];
+	__u32	salg_feat;
+	__u32	salg_mask;
+	__u8	salg_name[64];
+};
+
+struct af_alg_iv {
+	__u32	ivlen;
+	__u8	iv[0];
+};
+
+/* Socket options */
+#define ALG_SET_KEY			1
+#define ALG_SET_IV			2
+#define ALG_SET_OP			3
+
+/* Operations */
+#define ALG_OP_DECRYPT			0
+#define ALG_OP_ENCRYPT			1
+
+#endif	/* _LINUX_IF_ALG_H */

^ permalink raw reply related
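
For readers following af_alg_make_sg() in the patch above, here is a
small user-space sketch of its page arithmetic: how a byte range that
starts at an arbitrary offset is counted in pages and then cut into
per-page segments.  The names split_into_pages() and EX_PAGE_*, and
the 4 KiB page size, are assumptions made for illustration only.  The
max argument is also an addition of this sketch, so that the segment
array cannot be overrun; it is not something the posted code does.

```c
#include <stddef.h>

/* Assumed page geometry; the kernel uses the architecture's own values. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))

/*
 * Split the byte range [addr, addr + len) into per-page segment lengths.
 * Stores at most max lengths in seg[] and returns how many were stored.
 */
unsigned long split_into_pages(unsigned long addr, unsigned long len,
			       unsigned long *seg, unsigned long max)
{
	/* Offset of the first byte within its page. */
	unsigned long off = addr & ~EX_PAGE_MASK;
	/* Round up so that a partial last page is still counted. */
	unsigned long npages = (off + len + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT;
	unsigned long i;

	if (npages > max)
		npages = max;

	for (i = 0; i < npages; i++) {
		unsigned long room = EX_PAGE_SIZE - off;
		unsigned long plen = len < room ? len : room;

		seg[i] = plen;
		off = 0;	/* only the first page starts mid-page */
		len -= plen;
	}

	return npages;
}
```

With these assumptions, a 32-byte range starting 16 bytes before a
page boundary yields two segments of 16 bytes each, while a
page-aligned 4096-byte range yields a single segment.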

* [PATCH 1/4] net - Add AF_ALG macros
From: Herbert Xu @ 2010-10-19 13:46 UTC (permalink / raw)
  To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20100907084213.GA4610@gondor.apana.org.au>

net - Add AF_ALG macros

This patch adds the socket family/level macros for the yet-to-be-born
AF_ALG family.  The AF_ALG family provides the user-space interface
for the kernel crypto API.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/socket.h |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index a2fada9..3dd2472 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -193,7 +193,8 @@ struct ucred {
 #define AF_PHONET	35	/* Phonet sockets		*/
 #define AF_IEEE802154	36	/* IEEE802154 sockets		*/
 #define AF_CAIF		37	/* CAIF sockets			*/
-#define AF_MAX		38	/* For now.. */
+#define AF_ALG		38	/* Algorithm sockets		*/
+#define AF_MAX		39	/* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -234,6 +235,7 @@ struct ucred {
 #define PF_PHONET	AF_PHONET
 #define PF_IEEE802154	AF_IEEE802154
 #define PF_CAIF		AF_CAIF
+#define PF_ALG		AF_ALG
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
@@ -307,6 +309,7 @@ struct ucred {
 #define SOL_RDS		276
 #define SOL_IUCV	277
 #define SOL_CAIF	278
+#define SOL_ALG		279
 
 /* IPX options */
 #define IPX_TYPE	1

^ permalink raw reply related

* Re: RFC: Crypto API User-interface
From: Herbert Xu @ 2010-10-19 13:44 UTC (permalink / raw)
  To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20100907084213.GA4610@gondor.apana.org.au>

On Tue, Sep 07, 2010 at 04:42:13PM +0800, Herbert Xu wrote:
> 
> This is what I am proposing for the Crypto API user-interface.
> 
> Note that this is the interface for operations.  There will be
> a separate interface (most likely netlink) for configuring crypto
> algorithms, e.g., picking a specific AES implementation as the
> system default.

OK I've gone ahead and implemented the user-space API for hashes
and ciphers.

To recap, this interface is designed to allow user-space programs
to access hardware cryptographic accelerators that we have added
to the kernel.

The intended usage scenario is one where a large amount of data needs
to be processed and the benefits of hardware acceleration that is
normally unavailable in user-space (as opposed to accelerators such
as the Intel AES instructions, which may be used directly from
user-space) outweigh the overhead of going through the kernel.

In order to further minimise the overhead in these cases, this
interface offers the option of avoiding copying data between
user-space and the kernel where possible and appropriate.  For
ciphers this means the use of the splice(2) interface instead of
sendmsg(2).

Here is a sample hash program (note that these samples only illustrate
what the interface looks like and are not meant to be good examples
of coding :)

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main(void)
{
	int opfd;
	int tfmfd;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "hash",
		.salg_name = "sha1"
	};
	char buf[20];
	int i;

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);

	bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));

	opfd = accept(tfmfd, NULL, 0);

	write(opfd, "abc", 3);
	read(opfd, buf, 20);

	for (i = 0; i < 20; i++) {
		printf("%02x", (unsigned char)buf[i]);
	}
	printf("\n");

	close(opfd);
	close(tfmfd);

	return 0;
}

And here is one for ciphers:

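#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main(void)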
{
	int opfd;
	int tfmfd;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "skcipher",
		.salg_name = "cbc(aes)"
	};
	struct msghdr msg = {};
	struct cmsghdr *cmsg;
	/* zeroed so that CMSG_NXTHDR() never sees a stale cmsg_len */
	char cbuf[CMSG_SPACE(4) + CMSG_SPACE(20)] = {};
	char buf[16];
	struct af_alg_iv *iv;
	struct iovec iov;
	int i;

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);

	bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));

	setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY,
		   "\x06\xa9\x21\x40\x36\xb8\xa1\x5b"
		   "\x51\x2e\x03\xd5\x34\x12\x00\x06", 16);

	opfd = accept(tfmfd, NULL, 0);

	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(4);
	*(__u32 *)CMSG_DATA(cmsg) = ALG_OP_ENCRYPT;

	cmsg = CMSG_NXTHDR(&msg, cmsg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_IV;
	cmsg->cmsg_len = CMSG_LEN(20);
	iv = (void *)CMSG_DATA(cmsg);
	iv->ivlen = 16;
	memcpy(iv->iv, "\x3d\xaf\xba\x42\x9d\x9e\xb4\x30"
		       "\xb4\x22\xda\x80\x2c\x9f\xac\x41", 16);

	iov.iov_base = "Single block msg";
	iov.iov_len = 16;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	sendmsg(opfd, &msg, 0);
	read(opfd, buf, 16);

	for (i = 0; i < 16; i++) {
		printf("%02x", (unsigned char)buf[i]);
	}
	printf("\n");

	close(opfd);
	close(tfmfd);

	return 0;
}
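
The control-message part of the cipher sample can also be looked at
on its own.  What follows is only a sketch, not part of the patches:
build_alg_control() and parse_alg_control() are hypothetical helper
names, the constants are copied from the posted headers in case the
installed libc predates AF_ALG, and the parser merely imitates the
kind of length checks af_alg_cmsg_send() performs, without claiming
to match it exactly.  It shows how CMSG_SPACE() sizes the buffer and
CMSG_LEN() sets each header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Values from the posted headers, defined here only if missing. */
#ifndef SOL_ALG
#define SOL_ALG 279
#endif
#ifndef ALG_SET_IV
#define ALG_SET_IV 2
#endif
#ifndef ALG_SET_OP
#define ALG_SET_OP 3
#endif
#ifndef ALG_OP_ENCRYPT
#define ALG_OP_ENCRYPT 1
#endif

/*
 * Fill cbuf with ALG_SET_OP (a 32-bit op code) followed by ALG_SET_IV
 * (ivlen, then the IV bytes).  Returns the number of control bytes
 * used, or 0 if cbuf is too small.
 */
size_t build_alg_control(struct msghdr *msg, void *cbuf, size_t cbuflen,
			 uint32_t op, const void *iv, uint32_t ivlen)
{
	size_t need = CMSG_SPACE(sizeof(op)) +
		      CMSG_SPACE(sizeof(ivlen) + ivlen);
	struct cmsghdr *cmsg;

	if (cbuflen < need)
		return 0;

	/* Zero first so CMSG_NXTHDR() does not trip over stale lengths. */
	memset(cbuf, 0, need);
	msg->msg_control = cbuf;
	msg->msg_controllen = need;

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(sizeof(op));
	memcpy(CMSG_DATA(cmsg), &op, sizeof(op));

	cmsg = CMSG_NXTHDR(msg, cmsg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_IV;
	cmsg->cmsg_len = CMSG_LEN(sizeof(ivlen) + ivlen);
	memcpy(CMSG_DATA(cmsg), &ivlen, sizeof(ivlen));
	memcpy(CMSG_DATA(cmsg) + sizeof(ivlen), iv, ivlen);

	return need;
}

/* Read back op and ivlen; returns 0 on success, -1 if malformed. */
int parse_alg_control(struct msghdr *msg, uint32_t *op, uint32_t *ivlen)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_ALG)
			continue;

		switch (cmsg->cmsg_type) {
		case ALG_SET_OP:
			if (cmsg->cmsg_len < CMSG_LEN(sizeof(*op)))
				return -1;
			memcpy(op, CMSG_DATA(cmsg), sizeof(*op));
			break;
		case ALG_SET_IV:
			if (cmsg->cmsg_len < CMSG_LEN(sizeof(*ivlen)))
				return -1;
			memcpy(ivlen, CMSG_DATA(cmsg), sizeof(*ivlen));
			if (cmsg->cmsg_len < CMSG_LEN(sizeof(*ivlen) + *ivlen))
				return -1;
			break;
		default:
			return -1;
		}
	}

	return 0;
}
```

A caller would pass a buffer of CMSG_SPACE(4) + CMSG_SPACE(20) bytes
for a 16-byte IV, as in the sample above, and then hand the msghdr to
sendmsg().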

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Linux 2.6.35/TIPC 2.0 ABI breaking changes
From: Neil Horman @ 2010-10-19 13:28 UTC (permalink / raw)
  To: Leandro Lucarella
  Cc: jon.maloy, netdev, linux-kernel, paul.gortmaker, tipc-discussion,
	David Miller
In-Reply-To: <20101019131936.GB8781@llucax.com.ar>

On Tue, Oct 19, 2010 at 10:19:36AM -0300, Leandro Lucarella wrote:
> Neil Horman, on October 19 at 07:04 you wrote:
> > On Tue, Oct 19, 2010 at 01:16:49AM -0700, David Miller wrote:
> > > From: Leandro Lucarella <luca@llucax.com.ar>
> > > Date: Mon, 18 Oct 2010 23:16:57 -0300
> > > 
> > > > 
> > > > The problem is not between the tipc stacks in different hosts, it is
> > > > between the tipc stack and the applications using it (well, maybe
> > > > there is a problem somewhere else too).
> > > > 
> > > > This was a deliberate API change, not a subtle bug...
> > > 
> > > Neil et al., if these packets live only between the kernel stack
> > > and the userspace API layer, we should not be byte-swapping this
> > > stuff and we need to fix this fast.
> > > 
> > Copy that Dave.  I think I see the problem.  The subscription code handles
> > messages both off the wire and from local user space.  The off the wire case
> > should work because the subscription code assumes that all the incoming data is
> > in network byte order, but user space is an exception to that rule as it's in
> > local byte order.  I'll have a patch together for Leandro to test soon.
> > Neil
> 
> Thank you very much. Bear in mind that the byte order is just one of the
> problems, the other problem is the change in the value of
> TIPC_SUB_SERVICE from 2 to 0. That too is breaking the API/ABI, as
> a message with a filter value of 2 is rejected by TIPC 2.0/2.6.35+.
> 
Yeah, that was the format change.  We might just need to revert that.  Let's see
about getting the endianness issue straight, and we'll go from there.
Neil
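
To make the mismatch concrete, here is a minimal user-space sketch.
It is an illustration only and not the TIPC code: struct ex_subscr and
the helper names are hypothetical, and the real subscription layout
has more fields.  It shows what a receiver gets when it applies
ntohl() to a field that a local application left in host byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-in for one field of a subscription request. */
struct ex_subscr {
	uint32_t filter;
};

/* True on hosts where network order and host order coincide. */
int is_big_endian_host(void)
{
	return htonl(1) == 1;
}

/*
 * The value a receiver computes if it assumes network byte order
 * for a request that was actually written in host byte order.
 */
uint32_t misread_filter(const struct ex_subscr *s)
{
	return ntohl(s->filter);
}

/* The value it computes when the sender did convert with htonl(). */
uint32_t read_converted_filter(uint32_t host_value)
{
	struct ex_subscr s = { .filter = htonl(host_value) };

	return ntohl(s.filter);
}
```

On a little-endian host a filter of 2 written in host order comes out
as 0x02000000 after ntohl(), whereas on a big-endian host it is
unchanged, which is why this kind of problem can go unnoticed on some
machines.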

> -- 
> Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
> ----------------------------------------------------------------------
> GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
> ----------------------------------------------------------------------
> In 30 years Argentina will be one big supermarket with 15
> shopping carts, because that is how many people will be able to
> afford to buy anything.
> 	-- Sidharta Wiki
> 

^ permalink raw reply

* Re: [Ksummit-2010-discuss] [v2] Remaining BKL users, what to do
From: Arnd Bergmann @ 2010-10-19 13:26 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Theodore Kilgore, Steven Rostedt, Greg KH, codalist, autofs,
	Samuel Ortiz, Jan Kara, Mikulas Patocka, Jan Harkes, netdev,
	Anders Larsen, linux-kernel, dri-devel, Bryan Schumaker,
	Christoph Hellwig, ksummit-2010-discuss, Petr Vandrovec,
	Arnaldo Carvalho de Melo, linux-fsdevel, Evgeniy Dushistov,
	Ingo Molnar, Andrew Hendry, linux-media
In-Reply-To: <201010190926.54635.arnd@arndb.de>

On Tuesday 19 October 2010, Arnd Bergmann wrote:
> On Tuesday 19 October 2010 06:52:32 Dave Airlie wrote:
> > > I might be able to find some hardware still lying around here that uses an
> > > i810. Not sure unless I go hunting it. But I get the impression that if
> > > the kernel is a single-CPU kernel there is not any problem anyway? Don't
> > > distros offer a non-smp kernel as an installation option in case the user
> > > needs it? So in reality how big a problem is this?
> > 
> > Not anymore, which is my old point of making a fuss. Nowadays in the
> > modern distro world, we supply a single kernel that can at runtime
> > decide if it's running on SMP or UP and rewrite the text section
> > appropriately with locks etc. It's like magic, and something like
> > marking drivers as BROKEN_ON_SMP at compile time is really wrong when
> > what you want now is a runtime warning if someone tries to hotplug a
> > CPU with a known iffy driver loaded or if someone tries to load the
> > driver when we are already in SMP mode.
> 
> We could make the driver run-time non-SMP by adding
> 
> 	if (num_present_cpus() > 1) {
> 		pr_err("i810 no longer supports SMP\n");
> 		return -EINVAL;
> 	}
> 
> to the init function. That would cover the vast majority of the
> users of i810 hardware, I guess.

Some research showed that Intel never supported i810/i815 SMP setups,
but there was indeed one company (http://www.acorpusa.com at the time,
now owned by a domain squatter) that made i815E based dual Pentium-III
boards like this one: http://cgi.ebay.com/280319795096

The first person that can send me an authentic log file showing the
use of X.org with DRM on a 2.6.35 kernel with two processors on that
mainboard dated today or earlier gets a free upgrade to an AGP graphics
card of comparable or better 3D performance from me. Please include
the story of why you are running this machine with a new kernel.

i830 is harder, apparently some i865G boards support Pentium 4 with HT
and even later dual-core processors.

	Arnd

^ permalink raw reply

* Re: Linux 2.6.35/TIPC 2.0 ABI breaking changes
From: Leandro Lucarella @ 2010-10-19 13:19 UTC (permalink / raw)
  To: Neil Horman
  Cc: jon.maloy, netdev, linux-kernel, paul.gortmaker, tipc-discussion,
	David Miller
In-Reply-To: <20101019110452.GA14410@hmsreliant.think-freely.org>

Neil Horman, on October 19 at 07:04 you wrote:
> On Tue, Oct 19, 2010 at 01:16:49AM -0700, David Miller wrote:
> > From: Leandro Lucarella <luca@llucax.com.ar>
> > Date: Mon, 18 Oct 2010 23:16:57 -0300
> > 
> > > 
> > > The problem is not between the tipc stacks in different hosts, it is
> > > between the tipc stack and the applications using it (well, maybe
> > > there is a problem somewhere else too).
> > > 
> > > This was a deliberate API change, not a subtle bug...
> > 
> > Neil et al., if these packets live only between the kernel stack
> > and the userspace API layer, we should not be byte-swapping this
> > stuff and we need to fix this fast.
> > 
> Copy that Dave.  I think I see the problem.  The subscription code handles
> messages both off the wire and from local user space.  The off the wire case
> should work because the subscription code assumes that all the incoming data is
> in network byte order, but user space is an exception to that rule as it's in
> local byte order.  I'll have a patch together for Leandro to test soon.
> Neil

Thank you very much. Bear in mind that the byte order is just one of the
problems, the other problem is the change in the value of
TIPC_SUB_SERVICE from 2 to 0. That too is breaking the API/ABI, as
a message with a filter value of 2 is rejected by TIPC 2.0/2.6.35+.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
In 30 years Argentina will be one big supermarket with 15
shopping carts, because that is how many people will be able to
afford to buy anything.
	-- Sidharta Wiki

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

^ permalink raw reply

* Re: [Ksummit-2010-discuss] [v2] Remaining BKL users, what to do
From: Steven Rostedt @ 2010-10-19 12:39 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dave Airlie, Theodore Kilgore, Greg KH, codalist, autofs,
	Samuel Ortiz, Jan Kara, Mikulas Patocka, Jan Harkes, netdev,
	Anders Larsen, linux-kernel, dri-devel, Bryan Schumaker,
	Christoph Hellwig, ksummit-2010-discuss, Petr Vandrovec,
	Arnaldo Carvalho de Melo, linux-fsdevel, Evgeniy Dushistov,
	Ingo Molnar, Andrew Hendry, linux-media
In-Reply-To: <201010190926.54635.arnd@arndb.de>

On Tue, 2010-10-19 at 09:26 +0200, Arnd Bergmann wrote:
> On Tuesday 19 October 2010 06:52:32 Dave Airlie wrote:
> > > I might be able to find some hardware still lying around here that uses an
> > > i810. Not sure unless I go hunting it. But I get the impression that if
> > > the kernel is a single-CPU kernel there is not any problem anyway? Don't
> > > distros offer a non-smp kernel as an installation option in case the user
> > > needs it? So in reality how big a problem is this?
> > 
> > Not anymore, which is my old point of making a fuss. Nowadays in the
> > modern distro world, we supply a single kernel that can at runtime
> > decide if it's running on SMP or UP and rewrite the text section
> > appropriately with locks etc. It's like magic, and something like
> > marking drivers as BROKEN_ON_SMP at compile time is really wrong when
> > what you want now is a runtime warning if someone tries to hotplug a
> > CPU with a known iffy driver loaded or if someone tries to load the
> > driver when we are already in SMP mode.
> 
> We could make the driver run-time non-SMP by adding
> 
> 	if (num_present_cpus() > 1) {
> 		pr_err("i810 no longer supports SMP\n");
> 		return -EINVAL;
> 	}
> 
> to the init function. That would cover the vast majority of the
> users of i810 hardware, I guess.

I think we need to cover the PREEMPT case too. But that could be a
compile time check, since you can't boot a preempt kernel and make it
non preempt.

-- Steve

^ permalink raw reply

* Re: [patch 1/1] phy/marvell: fix 88e1121 support
From: Arnaud Patard @ 2010-10-19 11:49 UTC (permalink / raw)
  To: cyril; +Cc: netdev@vger.kernel.org, David S. Miller
In-Reply-To: <4CBD785E.8040704@ti.com>

Cyril Chemparathy <cyril@ti.com> writes:

> Hi Arnaud,

Hi,

>
> On 10/18/2010 06:29 PM, Arnaud Patard wrote:
>> Commit c477d0447db08068a497e7beb892b2b2a7bff64b added support for RGMII
>> rx/tx delays except that it ends up clearing the rx/tx delay bits for modes
>> other than RGMII*ID. Due to this, ethernet is not working anymore
>> on my guruplug server +. This patch is fixing that.
>> 
>> Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
>> Index: linux-2.6/drivers/net/phy/marvell.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/net/phy/marvell.c	2010-10-18 22:46:09.000000000 +0200
>> +++ linux-2.6/drivers/net/phy/marvell.c	2010-10-19 00:20:22.000000000 +0200
>> @@ -196,20 +196,27 @@
>>  			MII_88E1121_PHY_MSCR_PAGE);
>>  	if (err < 0)
>>  		return err;
>> -	mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
>> -		MII_88E1121_PHY_MSCR_DELAY_MASK;
>>  
>> -	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
>> -		mscr |= (MII_88E1121_PHY_MSCR_RX_DELAY |
>> -			 MII_88E1121_PHY_MSCR_TX_DELAY);
>> -	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
>> -		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
>> -	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
>> -		mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
>> +	if ((phydev->interface == PHY_INTERFACE_MODE_RGMII) ||
>> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) ||
>> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) ||
>> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)) {
>>  
>> -	err = phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
>> -	if (err < 0)
>> -		return err;
>> +		mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
>> +			MII_88E1121_PHY_MSCR_DELAY_MASK;
>> +
>> +		if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
>> +			mscr |= (MII_88E1121_PHY_MSCR_RX_DELAY |
>> +				 MII_88E1121_PHY_MSCR_TX_DELAY);
>> +		else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
>> +			mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
>> +		else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
>> +			mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
>> +
>> +		err = phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
>> +		if (err < 0)
>> +			return err;
>> +	}
>>  
>>  	phy_write(phydev, MII_88E1121_PHY_PAGE, oldpage);
>
> That looks more correct.  Just out of curiosity, what is the interface
> mode on your platform?

The interface is in GMII mode iirc. The device is the ethernet embedded
in the Kirkwood SoC with a 88e1121 phy connected. It's handled by the
mv643xx_eth driver and the marvell phy driver.

Arnaud

^ permalink raw reply
