* Re: vlan tagged packets and libpcap breakage
From: Guy Harris @ 2012-12-17 10:35 UTC (permalink / raw)
To: David Laight
Cc: Michael Richardson, netdev, Francesco Ruggeri, Daniel Borkmann,
tcpdump-workers
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70EF@saturn3.aculab.com>
On Dec 17, 2012, at 1:50 AM, "David Laight" <David.Laight@ACULAB.COM> wrote:
> How are you going to tell whether a feature is present in a non-Linux
> kernel ?
The Linux memory-mapped capture mechanism is not present in a non-Linux kernel, so all the libpcap work involved here would, if necessary on other platforms, have to be done differently on those platforms. Those platforms would have to have their own mechanisms to indicate whether any changes to filter code, processing of VLAN tags supplied out of band, etc. would need to be done.
The same would apply to other additional features of the Linux memory-mapped capture mechanism that require changes in libpcap. (Ideally, those changes would only require changes in order to use them, and would not break existing userland code, including but not limited to libpcap - your reply was to Daniel Borkmann, who is, I believe, the originator of netsniff-ng:
http://netsniff-ng.org
which has its own code using PF_PACKET sockets.)
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
^ permalink raw reply
* RE: [PATCH] netlink: align attributes on 64-bits
From: David Laight @ 2012-12-17 9:59 UTC (permalink / raw)
To: Nicolas Dichtel, tgraf; +Cc: netdev, davem
In-Reply-To: <1355491002-3931-1-git-send-email-nicolas.dichtel@6wind.com>
> - if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
> + int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), sizeof(void *)) ? 0 : 4;
> +
> + if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen) + align))
> return -EMSGSIZE;
>
> + if (align) {
> + /* Goal is to add an attribute with size 4. We know that
> + * NLA_HDRLEN is 4, hence payload is 0.
> + */
> + __nla_reserve(skb, 0, 0);
> + }
> +
Shouldn't the size of the dummy parameter be based on the value
of 'align' - and that be based on the amount of padding needed?
That aligns the write pointer; what guarantees the alignment of
the start of the buffer, so that the reader will find aligned data?
What guarantees that the reader will read the data into an
8-byte aligned buffer?
There is also the lurking issue of items that require more
than 8-byte alignment.
(x86/amd64 requires 16-byte alignment for 16-byte SSE2 regs and
32-byte alignment for the AVX regs.)
Will anyone ever want to put such items into a netlink message?
David
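The point about sizing the dummy attribute from the required padding can be sketched as follows. This is only an illustration, not the patch's code: it assumes a 4-byte attribute header (as the quoted comment states for NLA_HDRLEN), that offsets are measured from a buffer start that is itself suitably aligned (which is exactly what the questions above ask about), and that the goal is an aligned payload for the next attribute. The names pad_to_align, dummy_attr_total and SKETCH_NLA_HDRLEN are hypothetical.

```c
#include <stddef.h>

/* Assumed attribute header size; the quoted patch comment says NLA_HDRLEN is 4. */
#define SKETCH_NLA_HDRLEN 4u

/* Bytes needed so that (offset + pad) is a multiple of align.
 * align must be a power of two. */
static size_t pad_to_align(size_t offset, size_t align)
{
    return (align - (offset & (align - 1))) & (align - 1);
}

/* Total size (header included) of a dummy attribute to emit at tail_offset
 * so that the payload of the following attribute, which starts one header
 * further on, is aligned to align. 0 means no dummy attribute is needed. */
static size_t dummy_attr_total(size_t tail_offset, size_t align)
{
    return pad_to_align(tail_offset + SKETCH_NLA_HDRLEN, align);
}
```

With 4-byte-aligned offsets and align = 8 the result is always 0 or 4, which matches a dummy attribute with an empty payload. With align = 16 the result can be 4, 8 or 12, so the dummy payload would have to vary with the amount of padding needed.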
^ permalink raw reply
* RE: [tcpdump-workers] vlan tagged packets and libpcap breakage
From: David Laight @ 2012-12-17 9:50 UTC (permalink / raw)
To: Daniel Borkmann, Ani Sinha
Cc: Michael Richardson, netdev, tcpdump-workers, Francesco Ruggeri
In-Reply-To: <CAD6jFUTht82HOjGjDU7hFCEWyE3TOx_W4_j=SZK-DrcGfrio-A@mail.gmail.com>
> > I do agree that instead of a /proc entry, we should check for a kernel
> > version >= X where X is the upstream version that first started
> > supporting all the features needed by libpcap for vlan filtering. This
> > is not a compile time check but a run time one. Does anyone see any
> > issues with this? Are there any long-term implications of this, like if
> > you backport patches to an older long-term supported kernel? Are there
> > other better ways to do this, like maybe returning feature bits from
> > an ioctl call? This is something we need to deal with on a continuous
> > basis as we keep supporting newer AUX fields and libpcap and other
> > user land code needs to make use of it. At the same time, they need to
> > handle backward compatibility issues with older kernels.
>
> As Eric mentioned earlier, for now there seems not to be a reliable
> way to get to know which ops are present and which not. It's not
> really nice, but if you want to make use of those new (ANC*) features,
> probably checking kernel version might be the only way if I'm not
> missing something. Now net-next is closed, but if it reopens, I'll
> submit a version 2 of my patch where you've been CC'd to. If it gets
> in, then at least it's for sure that since kernel <xyz> this kind of
> feature test is present.
How are you going to tell whether a feature is present in a non-Linux
kernel ?
Testing kernel versions is somewhat suboptimal as support
could be patched into a much older kernel (maybe not for
this but ...)
David
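For reference, a run-time version check of the kind being discussed might look like the sketch below. It assumes the release string reported by uname() starts with "major.minor"; the helper names (parse_kver, kver_at_least, running_kernel_at_least) are hypothetical and are not part of libpcap.

```c
#include <stdio.h>
#include <sys/utsname.h>

struct kver {
    int major;
    int minor;
};

/* Parse a release string such as "3.8.0-rc1". Returns 0 on success. */
static int parse_kver(const char *release, struct kver *out)
{
    if (sscanf(release, "%d.%d", &out->major, &out->minor) != 2)
        return -1;
    return 0;
}

/* Non-zero if version v is at least major.minor. */
static int kver_at_least(const struct kver *v, int major, int minor)
{
    return v->major > major || (v->major == major && v->minor >= minor);
}

/* 1 if the running kernel is at least major.minor, 0 if it is older,
 * -1 if the version could not be determined. */
static int running_kernel_at_least(int major, int minor)
{
    struct utsname u;
    struct kver v;

    if (uname(&u) != 0 || parse_kver(u.release, &v) != 0)
        return -1;
    return kver_at_least(&v, major, minor);
}
```

As noted above, such a check reports a feature as missing on an older kernel that has had the support patched in, and it says nothing about non-Linux kernels.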
^ permalink raw reply
* Re: [PATCH v3] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: Pablo Neira Ayuso @ 2012-12-17 9:55 UTC (permalink / raw)
To: Kevin Cernekee
Cc: David Woodhouse, Eric Dumazet, Patrick McHardy, David S. Miller,
Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
Hideaki YOSHIFUJI, netfilter-devel, netfilter, coreteam,
linux-kernel, netdev
In-Reply-To: <CAJiQ=7BBquMQmQWp3=aD_s3-rSYr4Y+gke0GJKCkJV-mq5buug@mail.gmail.com>
On Sun, Dec 16, 2012 at 11:26:31PM -0800, Kevin Cernekee wrote:
> On Sun, Dec 16, 2012 at 4:44 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >> What happened to this? OpenWRT is still carrying it, and it broke in
> >> 3.7. Here's a completely untested update...
> >
> > I requested Kevin to resend a new version based on the current kernel
> > tree while spinning on old pending patches since I have no access to
> > that hardware, but no luck.
> >
> > So I'll review this and, since OpenWRT is carrying, I guess we can get
> > this into net-next merge window.
>
> Sorry, been putting it off since the OpenWRT version has worked flawlessly...
>
> I just reassembled my test rig and I'll get you a working patch this week.
>
> Is it OK to use
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git as the
> baseline?
That's fine in this case because no recent changes went into that
code, but better if you use the netfilter next tree:
git://1984.lsi.us.es/nf-next
Thanks Kevin.
^ permalink raw reply
* Re: [PATCH 4/4] FEC: Add time stamping code and a PTP hardware clock
From: Sascha Hauer @ 2012-12-17 9:13 UTC (permalink / raw)
To: Frank Li
Cc: lznua, richardcochran, shawn.guo, linux-arm-kernel, netdev, davem
In-Reply-To: <1351657531-25989-1-git-send-email-Frank.Li@freescale.com>
On Wed, Oct 31, 2012 at 12:25:31PM +0800, Frank Li wrote:
> This patch adds a driver for the FEC(MX6) that offers time
> stamping and a PTP hardware clock. Because FEC\ENET(MX6)
> hardware frequency adjustment is complex, we have implemented
> this in software by changing the multiplication factor of the
> timecounter.
>
> Signed-off-by: Frank Li <Frank.Li@freescale.com>
> ---
> drivers/net/ethernet/freescale/Kconfig | 9 +
> drivers/net/ethernet/freescale/Makefile | 1 +
> drivers/net/ethernet/freescale/fec.c | 88 +++++++-
> drivers/net/ethernet/freescale/fec.h | 38 +++
> drivers/net/ethernet/freescale/fec_ptp.c | 386 ++++++++++++++++++++++++++++++
> 5 files changed, 521 insertions(+), 1 deletions(-)
> create mode 100644 drivers/net/ethernet/freescale/fec_ptp.c
>
> diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
> index feff516..ff3be53 100644
> --- a/drivers/net/ethernet/freescale/Kconfig
> +++ b/drivers/net/ethernet/freescale/Kconfig
> @@ -92,4 +92,13 @@ config GIANFAR
> This driver supports the Gigabit TSEC on the MPC83xx, MPC85xx,
> and MPC86xx family of chips, and the FEC on the 8540.
>
> +config FEC_PTP
> + bool "PTP Hardware Clock (PHC)"
> + depends on FEC
> + select PPS
> + select PTP_1588_CLOCK
> + --help---
> + Say Y here if you want to use PTP Hardware Clock (PHC) in the
> + driver. Only the basic clock operations have been implemented.
> +
> endif # NET_VENDOR_FREESCALE
> diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
> index 3d1839a..d4d19b3 100644
> --- a/drivers/net/ethernet/freescale/Makefile
> +++ b/drivers/net/ethernet/freescale/Makefile
> @@ -3,6 +3,7 @@
> #
>
> obj-$(CONFIG_FEC) += fec.o
> +obj-$(CONFIG_FEC_PTP) += fec_ptp.o
> obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx.o
> ifeq ($(CONFIG_FEC_MPC52xx_MDIO),y)
> obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx_phy.o
> diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
> index d0e1b33..2665162 100644
> --- a/drivers/net/ethernet/freescale/fec.c
> +++ b/drivers/net/ethernet/freescale/fec.c
> @@ -280,6 +280,17 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> | BD_ENET_TX_LAST | BD_ENET_TX_TC);
> bdp->cbd_sc = status;
>
> +#ifdef CONFIG_FEC_PTP
This ifdef desert in the fec driver currently breaks all SoCs except
i.MX6 in the imx_v6_v7_defconfig.
Most of these could be fixed with something like if (fec_use_ptp(fep)),
> #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
> defined(CONFIG_M520x) || defined(CONFIG_M532x) || \
> defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
> @@ -88,6 +94,13 @@ struct bufdesc {
> unsigned short cbd_datlen; /* Data length */
> unsigned short cbd_sc; /* Control and status info */
> unsigned long cbd_bufaddr; /* Buffer address */
> +#ifdef CONFIG_FEC_PTP
> + unsigned long cbd_esc;
> + unsigned long cbd_prot;
> + unsigned long cbd_bdu;
> + unsigned long ts;
> + unsigned short res0[4];
> +#endif
> };
This one changes the layout of the hardware buffer description which is
not so easy to fix.
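One possible shape for a run-time approach is sketched below, purely as an illustration. Only the name fec_use_ptp() comes from the suggestion above; the struct names, the bufdesc_ex flag and the helpers are hypothetical, and fixed-width types stand in for the driver's unsigned long fields (32 bits on the SoCs in question). The idea is to keep the legacy descriptor layout unchanged and select the descriptor stride per device instead of per kernel config.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy descriptor: the fields present before the PTP patch. */
struct bufdesc_sketch {
    uint16_t cbd_datlen;   /* Data length */
    uint16_t cbd_sc;       /* Control and status info */
    uint32_t cbd_bufaddr;  /* Buffer address */
};

/* Extended descriptor: legacy fields followed by the fields the patch adds. */
struct bufdesc_ex_sketch {
    struct bufdesc_sketch desc;
    uint32_t cbd_esc;
    uint32_t cbd_prot;
    uint32_t cbd_bdu;
    uint32_t ts;
    uint16_t res0[4];
};

/* Hypothetical per-device state: set at probe time for hardware that
 * uses the extended layout. */
struct fec_priv_sketch {
    bool bufdesc_ex;
};

static bool fec_use_ptp(const struct fec_priv_sketch *fep)
{
    return fep->bufdesc_ex;
}

/* Descriptor stride chosen at run time rather than by #ifdef. */
static size_t fec_bd_size(const struct fec_priv_sketch *fep)
{
    return fec_use_ptp(fep) ? sizeof(struct bufdesc_ex_sketch)
                            : sizeof(struct bufdesc_sketch);
}

/* Address of descriptor number index in a ring starting at ring. */
static struct bufdesc_sketch *fec_bd_at(const struct fec_priv_sketch *fep,
                                        void *ring, size_t index)
{
    return (struct bufdesc_sketch *)((unsigned char *)ring +
                                     index * fec_bd_size(fep));
}
```

Every place that walks the ring would have to go through a helper like fec_bd_at() instead of plain pointer arithmetic on struct bufdesc, which is why this part is harder to fix than the simple #ifdef blocks.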
I don't know how to continue from here. Since the whole patch doesn't
seem to have been reviewed very much, I tend to say we should revert it
for now and let Frank redo it for the next merge window.
Other opinions?
Sascha
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply
* Re: net/usb device additions for -stable
From: Bjørn Mork @ 2012-12-17 9:10 UTC (permalink / raw)
To: David Miller
Cc: netdev, jspurohit, valdis.kletnieks, jan.ceuleers, olof.ermis,
tommy7765
In-Reply-To: <20121214.181241.739657750054924669.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> From: Bjørn Mork <bjorn@mork.no>
> Date: Tue, 13 Nov 2012 21:25:58 +0100
>
>> I looked quickly through the list of added devices in the range
>> v3.6..net/master and tried to cherry-pick them into the current 3.0,
>> 3.2, 3.4 and 3.6 stable trees. There weren't really that many. The
>> result was:
>>
>> # for stable-3.6:
>> af1b85e usb/ipheth: Add iPhone 5 support
>> c6846ee net: qmi_wwan: adding more ZTE devices
>> bbc8d92 net: cdc_ncm: add Huawei devices
>>
>> # for stable-3.4:
>> af1b85e usb/ipheth: Add iPhone 5 support
>>
>> # for stable-3.2:
>> af1b85e usb/ipheth: Add iPhone 5 support
>>
>> # for stable-3.0:
>> af1b85e usb/ipheth: Add iPhone 5 support
>
> The iPhone 5 change applied cleanly in all cases so I added that one.
> The others did not.
Sorry about that. They did when I tested them, but the conditions
must have been different.
> Could you respin them for me and I'll queue them up for the next batch
> I send out?
I don't think there is much point anymore, as Greg has announced the
last 3.6 stable: https://lkml.org/lkml/2012/12/14/441
I'll come back with a set for 3.7 stable when there is something to add
instead.
Thanks,
Bjørn
^ permalink raw reply
* Re: [PATCH iproute2 6/6] ip/link_iptnl: fix indentation
From: Nicolas Dichtel @ 2012-12-17 8:44 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20121214100212.00297856@nehalam.linuxnetplumber.net>
Le 14/12/2012 19:02, Stephen Hemminger a écrit :
> On Thu, 13 Dec 2012 14:42:54 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> Use tabs instead of space when possible.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
> Thanks applied all these.
>
Two patches are missing in your tree:
1/6 ip: update man pages and usage() for 'ip monitor'
2/6 ip: add man pages for netconf
Should I resend them?
^ permalink raw reply
* Re: [PATCH iproute2 v2] ip: use rtnelink to manage mroute
From: Nicolas Dichtel @ 2012-12-17 8:41 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20121214101004.56a1fb62@nehalam.linuxnetplumber.net>
Le 14/12/2012 19:10, Stephen Hemminger a écrit :
> On Thu, 13 Dec 2012 10:16:42 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> mroute was using /proc/net/ip_mr_[vif|cache] to display mroute entries. Hence,
>> only RT_TABLE_DEFAULT was displayed and only IPv4.
>> With rtnetlink, it is possible to display all tables for IPv4 and IPv6. The output
>> format is kept. Also, as before the patch, statistics are displayed when the user specifies
>> the '-s' argument.
>>
>> The patch also adds the support of 'ip monitor mroute', which is now possible.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
> Applied. I had to clean up some merge conflicts because of applying your
> patches out of order. It would help if you would recheck the version
> that I just pushed to git.
>
Your version is ok.
^ permalink raw reply
* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: Kirill A. Shutemov @ 2012-12-17 8:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, maxk, netdev, dwmw2
In-Reply-To: <1355719589.10504.13.camel@edumazet-glaptop>
On Sun, Dec 16, 2012 at 08:46:29PM -0800, Eric Dumazet wrote:
> On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote:
> > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
> > >
> > > Already fixed in Linus's tree by:
> > >
> > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
> >
> > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
> > still see the issue.
> >
>
> Could you try the following one-liner?
Works for me. So far no problems.
Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 255a9f5..173acf5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> }
>
> + skb_reset_network_header(skb);
> rxhash = skb_get_rxhash(skb);
> netif_rx_ni(skb);
>
>
>
--
Kirill A. Shutemov
^ permalink raw reply
* Re: [PATCH v3] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: Kevin Cernekee @ 2012-12-17 7:26 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: David Woodhouse, Eric Dumazet, Patrick McHardy, David S. Miller,
Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
Hideaki YOSHIFUJI, netfilter-devel, netfilter, coreteam,
linux-kernel, netdev
In-Reply-To: <20121217004457.GA12234@1984>
On Sun, Dec 16, 2012 at 4:44 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> What happened to this? OpenWRT is still carrying it, and it broke in
>> 3.7. Here's a completely untested update...
>
> I requested Kevin to resend a new version based on the current kernel
> tree while spinning on old pending patches since I have no access to
> that hardware, but no luck.
>
> So I'll review this and, since OpenWRT is carrying, I guess we can get
> this into net-next merge window.
Sorry, been putting it off since the OpenWRT version has worked flawlessly...
I just reassembled my test rig and I'll get you a working patch this week.
Is it OK to use
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git as the
baseline?
^ permalink raw reply
* Re: XFRM: Could we change ESP padding?
From: Steffen Klassert @ 2012-12-17 7:25 UTC (permalink / raw)
To: RongQing Li; +Cc: netdev
In-Reply-To: <CAJFZqHzCEJyvxc2NNh3_U8oT9Uh94N6EXLb4UA=twvVcVnEd5w@mail.gmail.com>
On Mon, Dec 17, 2012 at 02:56:47PM +0800, RongQing Li wrote:
> 2012/12/17 Steffen Klassert <steffen.klassert@secunet.com>:
> >
> > RFC 4303 says that the receiver should inspect the padding field,
> > so we are free to do it or not. You can find a comment that explains
> > why we don't do it in the esp_input_done2() function ;-)
> >
> Thanks.
>
> But I see BSD has implemented it, and Cisco devices have a similar implementation.
>
The comment at the place where the padding field inspection should be done
is rather old. I always respected this when I came across this code, but
I would not mind having it implemented. Not sure if anybody still
remembers exactly why it was not implemented.
^ permalink raw reply
* Re: XFRM: Could we change ESP padding?
From: RongQing Li @ 2012-12-17 6:56 UTC (permalink / raw)
To: Steffen Klassert; +Cc: netdev
In-Reply-To: <20121217064302.GK18940@secunet.com>
2012/12/17 Steffen Klassert <steffen.klassert@secunet.com>:
> On Mon, Dec 17, 2012 at 11:28:05AM +0800, RongQing Li wrote:
>> Hi:
>>
>> setkey has the parameter below, but it does not seem to be
>> implemented in either the kernel or userspace:
>>
>> -f pad_option defines the content of the ESP padding.
>> pad_option is one of following:
>> zero-pad All the paddings are zero.
>> random-pad A series of randomized values are used.
>> seq-pad A series of sequential increasing numbers
>> started from 1 are used.
>>
>
> We can not implement this. As you already mentioned, RFC 4303
> makes strong statements on how the padding bytes are initialized.
> An IPsec implementation that checks the padding bytes would drop our
> packets if we don't use the padding method described in RFC 4303.
>
>>
>> The kernel does not seem to inspect the ESP padding content either; as a
>> result, packets are not dropped even if they carry wrong pad
>> content (not a monotonically increasing sequence).
>>
>>
>> Could anyone tell me why? Is it a bad description in the RFC, performance,
>> lack of time, or some other reason? Thanks very much!
>>
>
> RFC 4303 says that the receiver should inspect the padding field,
> so we are free to do it or not. You can find a comment that explains
> why we don't do it in the esp_input_done2() function ;-)
>
Thanks.
But I see BSD has implemented it, and Cisco devices have a similar implementation.
http://fxr.watson.org/fxr/source/netipsec/xform_esp.c
-RongQing
^ permalink raw reply
* Re: [PATCH] tuntap: fix ambigious multiqueue API
From: Jason Wang @ 2012-12-17 6:46 UTC (permalink / raw)
To: mst, davem, netdev, linux-kernel, pmoore; +Cc: wkevils, mprivozn
In-Reply-To: <1355478810-10144-1-git-send-email-jasowang@redhat.com>
----- Original Message -----
> The current multiqueue API is ambiguous, which may confuse both users and
> LSMs:
>
> - Both TUNSETIFF and TUNSETQUEUE could be used to create the queues of a
>   tuntap device.
> - TUNSETQUEUE was used to disable and enable a specific queue of the
>   device. But since the state of the tuntap device was completely removed
>   from the queue, it could be used to attach to another device (there is
>   no such requirement currently, and it would need a new kind of LSM
>   policy).
> - TUNSETQUEUE could be used to attach to a persistent device without any
>   queues. This kind of attaching bypasses the necessary checks during
>   TUNSETIFF and may lead to unexpected results.
>
> So this patch tries to make a cleaner and simpler API by:
>
> - Only allowing TUNSETIFF to create queues.
> - Allowing TUNSETQUEUE only to disable and enable the queues of a device.
>   The state of the tuntap device is not detached from a queue when it is
>   disabled, so TUNSETQUEUE can only be used after TUNSETIFF and with the
>   same device.
>
> This is done by introducing a list which keeps track of all queues which
> were disabled. A queue is moved between this list and the tfiles[] array
> when it is enabled/disabled. A pointer to the tun_struct is also introduced
> to track the device a queue belongs to while it is disabled.
>
> After the change, the isolation between management and application can be
> done as follows: TUNSETIFF is only called by management software and
> TUNSETQUEUE is only called by the application. For LSM/SELinux, what is
> left is to do a proper check during tun_set_queue() if needed.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/tun.c | 86 ++++++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 63 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2ac2164..6f2053d 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -138,6 +138,8 @@ struct tun_file {
> /* only used for fasnyc */
> unsigned int flags;
> u16 queue_index;
> + struct list_head next;
> + struct tun_struct *detached;
> };
>
> struct tun_flow_entry {
> @@ -182,6 +184,8 @@ struct tun_struct {
> struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
> struct timer_list flow_gc_timer;
> unsigned long ageing_time;
> + unsigned int numdisabled;
> + struct list_head disabled;
> };
>
> static inline u32 tun_hashfn(u32 rxhash)
> @@ -386,6 +390,23 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
> netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
> }
>
> +static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
> +{
> + tfile->detached = tun;
> + list_add_tail(&tfile->next, &tun->disabled);
> + ++tun->numdisabled;
> +}
> +
> +struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> +{
> + struct tun_struct *tun = tfile->detached;
> +
> + tfile->detached = NULL;
> + list_del_init(&tfile->next);
> + --tun->numdisabled;
> + return tun;
> +}
> +
> static void __tun_detach(struct tun_file *tfile, bool clean)
> {
> struct tun_file *ntfile;
> @@ -407,20 +428,25 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
> ntfile->queue_index = index;
>
> --tun->numqueues;
> - sock_put(&tfile->sk);
> + if (clean)
> + sock_put(&tfile->sk);
> + else
> + tun_disable_queue(tun, tfile);
>
> synchronize_net();
> tun_flow_delete_by_queue(tun, tun->numqueues + 1);
> /* Drop read queue */
> skb_queue_purge(&tfile->sk.sk_receive_queue);
> tun_set_real_num_queues(tun);
> -
> - if (tun->numqueues == 0 && !(tun->flags & TUN_PERSIST))
> - if (dev->reg_state == NETREG_REGISTERED)
> - unregister_netdevice(dev);
> - }
> + } else if (tfile->detached && clean)
> + tun = tun_enable_queue(tfile);
>
> if (clean) {
> + if (tun && tun->numqueues == 0 && tun->numdisabled == 0 &&
> + !(tun->flags & TUN_PERSIST))
> + if (tun->dev->reg_state == NETREG_REGISTERED)
> + unregister_netdevice(tun->dev);
> +
> BUG_ON(!test_bit(SOCK_EXTERNALLY_ALLOCATED,
> &tfile->socket.flags));
> sk_release_kernel(&tfile->sk);
> @@ -437,7 +463,7 @@ static void tun_detach(struct tun_file *tfile, bool clean)
> static void tun_detach_all(struct net_device *dev)
> {
> struct tun_struct *tun = netdev_priv(dev);
> - struct tun_file *tfile;
> + struct tun_file *tfile, *tmp;
> int i, n = tun->numqueues;
>
> for (i = 0; i < n; i++) {
> @@ -458,6 +484,12 @@ static void tun_detach_all(struct net_device *dev)
> skb_queue_purge(&tfile->sk.sk_receive_queue);
> sock_put(&tfile->sk);
> }
> + list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
> + tun_enable_queue(tfile);
> + skb_queue_purge(&tfile->sk.sk_receive_queue);
> + sock_put(&tfile->sk);
> + }
> + BUG_ON(tun->numdisabled != 0);
> }
>
> static int tun_attach(struct tun_struct *tun, struct file *file)
> @@ -474,7 +506,8 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> goto out;
>
> err = -E2BIG;
> - if (tun->numqueues == MAX_TAP_QUEUES)
> + if (!tfile->detached &&
> + tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
> goto out;
>
> err = 0;
> @@ -488,9 +521,13 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> tfile->queue_index = tun->numqueues;
> rcu_assign_pointer(tfile->tun, tun);
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> - sock_hold(&tfile->sk);
> tun->numqueues++;
>
> + if (tfile->detached)
> + tun_enable_queue(tfile);
> + else
> + sock_hold(&tfile->sk);
> +
> tun_set_real_num_queues(tun);
>
> /* device is allowed to go away first, so no need to hold extra
> @@ -1348,6 +1385,7 @@ static void tun_free_netdev(struct net_device *dev)
> {
> struct tun_struct *tun = netdev_priv(dev);
>
> + BUG_ON(!(list_empty(&tun->disabled)));
> tun_flow_uninit(tun);
> free_netdev(dev);
> }
> @@ -1542,6 +1580,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> err = tun_attach(tun, file);
> if (err < 0)
> return err;
> +
> + if (tun->flags & TUN_TAP_MQ &&
> + (tun->numqueues + tun->numdisabled > 1))
> + return err;
> }
> else {
> char *name;
> @@ -1600,6 +1642,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> TUN_USER_FEATURES;
> dev->features = dev->hw_features;
>
> + INIT_LIST_HEAD(&tun->disabled);
> err = tun_attach(tun, file);
> if (err < 0)
> goto err_free_dev;
> @@ -1754,32 +1797,28 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> {
> struct tun_file *tfile = file->private_data;
> struct tun_struct *tun;
> - struct net_device *dev;
> int ret = 0;
>
> rtnl_lock();
>
> if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
> - dev = __dev_get_by_name(tfile->net, ifr->ifr_name);
> - if (!dev) {
> - ret = -EINVAL;
> - goto unlock;
> - }
> -
> - tun = netdev_priv(dev);
> - if (dev->netdev_ops != &tap_netdev_ops &&
> - dev->netdev_ops != &tun_netdev_ops)
> + tun = tfile->detached;
> + if (!tun)
> ret = -EINVAL;
> else if (tun_not_capable(tun))
> ret = -EPERM;
> else
> ret = tun_attach(tun, file);
> - } else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
> - __tun_detach(tfile, false);
> - else
> + } else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
> + tun = rcu_dereference_protected(tfile->tun,
> + lockdep_rtnl_is_held());
> + if (!tun || !(tun->flags & TUN_TAP_MQ))
> + ret = -EINVAL;
> + else
> + __tun_detach(tfile, false);
> + } else
> ret = -EINVAL;
>
> -unlock:
> rtnl_unlock();
> return ret;
> }
> @@ -2091,6 +2130,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>
> file->private_data = tfile;
> set_bit(SOCK_EXTERNALLY_ALLOCATED, &tfile->socket.flags);
> + INIT_LIST_HEAD(&tfile->next);
>
> return 0;
> }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
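The queue life cycle described in the quoted commit message can be modelled in user space roughly as below. This is a simplified sketch, not the driver code: the struct and function names are hypothetical, error codes are collapsed to -1, the queue limit is an arbitrary constant, and locking, sockets and the tfiles[] array are left out. It only shows the bookkeeping: TUNSETIFF binds a queue to a device, and TUNSETQUEUE can then only disable that queue or re-enable it on the same device.

```c
#include <stddef.h>

#define SKETCH_MAX_QUEUES 4  /* arbitrary limit for the model */

struct tun_model {
    unsigned int numqueues;    /* enabled queues */
    unsigned int numdisabled;  /* disabled queues still owned by the device */
};

struct queue_model {
    struct tun_model *tun;       /* device while enabled, else NULL */
    struct tun_model *detached;  /* device while disabled, else NULL */
};

/* Models TUNSETIFF: the only way to create a queue on a device. */
static int model_set_iff(struct tun_model *tun, struct queue_model *q)
{
    if (q->tun || q->detached)
        return -1;  /* queue already belongs to a device */
    if (tun->numqueues + tun->numdisabled == SKETCH_MAX_QUEUES)
        return -1;  /* disabled queues still count against the limit */
    q->tun = tun;
    tun->numqueues++;
    return 0;
}

/* Models TUNSETQUEUE with IFF_DETACH_QUEUE: disable, but remember the device. */
static int model_disable(struct queue_model *q)
{
    struct tun_model *tun = q->tun;

    if (!tun)
        return -1;
    q->tun = NULL;
    q->detached = tun;
    tun->numqueues--;
    tun->numdisabled++;
    return 0;
}

/* Models TUNSETQUEUE with IFF_ATTACH_QUEUE: re-enable on the remembered
 * device only; a queue that never went through TUNSETIFF is rejected. */
static int model_enable(struct queue_model *q)
{
    struct tun_model *tun = q->detached;

    if (!tun)
        return -1;
    q->detached = NULL;
    q->tun = tun;
    tun->numdisabled--;
    tun->numqueues++;
    return 0;
}
```

Because model_enable() takes no device argument, a disabled queue cannot be moved to a different device, which is the property the commit message relies on for the LSM check.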
^ permalink raw reply
* Re: XFRM: Could we change ESP padding?
From: Steffen Klassert @ 2012-12-17 6:43 UTC (permalink / raw)
To: RongQing Li; +Cc: netdev
In-Reply-To: <CAJFZqHzDDtUacnQzd-gcS8JBvPdgspozWkUFOogS4nDmvZz7rg@mail.gmail.com>
On Mon, Dec 17, 2012 at 11:28:05AM +0800, RongQing Li wrote:
> Hi:
>
> setkey has the parameter below, but it does not seem to be
> implemented in either the kernel or userspace:
>
> -f pad_option defines the content of the ESP padding.
> pad_option is one of following:
> zero-pad All the paddings are zero.
> random-pad A series of randomized values are used.
> seq-pad A series of sequential increasing numbers
> started from 1 are used.
>
We can not implement this. As you already mentioned, RFC 4303
makes strong statements on how the padding bytes are initialized.
An IPsec implementation that checks the padding bytes would drop our
packets if we don't use the padding method described in RFC 4303.
>
> The kernel does not seem to inspect the ESP padding content either; as a
> result, packets are not dropped even if they carry wrong pad
> content (not a monotonically increasing sequence).
>
>
> Could anyone tell me why? Is it a bad description in the RFC, performance,
> lack of time, or some other reason? Thanks very much!
>
RFC 4303 says that the receiver should inspect the padding field,
so we are free to do it or not. You can find a comment that explains
why we don't do it in the esp_input_done2() function ;-)
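For illustration, the receiver-side check that RFC 4303 recommends for the default padding scheme could be as small as the sketch below. The function name is hypothetical and this is not the kernel's esp_input_done2() code; it assumes pad points at the first padding byte of the decrypted payload and padlen is the value of the Pad Length field.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the padding bytes are 1, 2, 3, ... as in the RFC 4303
 * default scheme, 0 otherwise. An empty padding field is accepted. */
static int esp_default_padding_ok(const uint8_t *pad, size_t padlen)
{
    for (size_t i = 0; i < padlen; i++) {
        if (pad[i] != (uint8_t)(i + 1))
            return 0;
    }
    return 1;
}
```

A peer running such a check would reject packets whose padding was filled with zeros or random bytes, which is the interoperability concern raised above.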
^ permalink raw reply
* Re: [PATCH 2/2] bridge: add flags to distinguish permanent mdb entires
From: Cong Wang @ 2012-12-17 5:46 UTC (permalink / raw)
To: David Miller; +Cc: netdev, bridge, herbert, shemminger
In-Reply-To: <20121215.171656.1197452765852503859.davem@davemloft.net>
On Sat, 2012-12-15 at 17:16 -0800, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Sat, 15 Dec 2012 16:09:51 +0800
>
> > This patch adds a flag to each mdb entry, so that we can distinguish
> > permanent entries from temporary entries.
> >
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Stephen Hemminger <shemminger@vyatta.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Signed-off-by: Cong Wang <amwang@redhat.com>
>
> Applied, but you _really_ need to lock down the interface and
> stop making changes to the user visible side of this _now_.
>
OK. I think it is okay to break the ABI at this point, since the merge
window is not closed yet; who would develop applications against an
unstable kernel anyway? :-/
^ permalink raw reply
* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: Eric Dumazet @ 2012-12-17 4:46 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: David Miller, maxk, netdev, dwmw2
In-Reply-To: <20121217014631.GA23101@shutemov.name>
On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote:
> On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
> >
> > Already fixed in Linus's tree by:
> >
> > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
>
> No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
> still see the issue.
>
Could you try the following one-liner?
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 255a9f5..173acf5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
+ skb_reset_network_header(skb);
rxhash = skb_get_rxhash(skb);
netif_rx_ni(skb);
^ permalink raw reply related
* XFRM: Could we change ESP padding?
From: RongQing Li @ 2012-12-17 3:28 UTC (permalink / raw)
To: netdev
Hi:
setkey has the parameter below, but it does not seem to be
implemented in either the kernel or userspace:
-f pad_option defines the content of the ESP padding.
pad_option is one of following:
zero-pad All the paddings are zero.
random-pad A series of randomized values are used.
seq-pad A series of sequential increasing numbers
started from 1 are used.
The kernel does not seem to inspect the ESP padding content either; as a
result, packets are not dropped even if they carry wrong pad
content (not a monotonically increasing sequence).
Could anyone tell me why? Is it a bad description in the RFC, performance,
lack of time, or some other reason? Thanks very much!
RFC4303:
If Padding bytes are needed but the encryption algorithm does not
specify the padding contents, then the following default processing
MUST be used. The Padding bytes are initialized with a series of
(unsigned, 1-byte) integer values. The first padding byte appended
to the plaintext is numbered 1, with subsequent padding bytes making
up a monotonically increasing sequence: 1, 2, 3, .... When this
padding scheme is employed, the receiver SHOULD inspect the Padding
field. (This scheme was selected because of its relative simplicity,
ease of implementation in hardware, and because it offers limited
protection against certain forms of "cut and paste" attacks in the
absence of other integrity measures, if the receiver checks the
padding values upon decryption.)
Thanks
-RongQing
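To make the three setkey pad options concrete, the sender-side fill could be sketched as follows. The enum and function names are hypothetical, and rand() is used only as a stand-in for whatever random source a real implementation would choose. Only the seq-pad case produces the 1, 2, 3, ... sequence that the RFC text quoted above describes as the default.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the zero-pad, random-pad and seq-pad options listed by setkey. */
enum pad_option { PAD_ZERO, PAD_RANDOM, PAD_SEQ };

/* Fill padlen padding bytes at pad according to opt. */
static void esp_fill_padding(uint8_t *pad, size_t padlen, enum pad_option opt)
{
    for (size_t i = 0; i < padlen; i++) {
        switch (opt) {
        case PAD_ZERO:
            pad[i] = 0;
            break;
        case PAD_RANDOM:
            pad[i] = (uint8_t)(rand() & 0xff);  /* placeholder random source */
            break;
        case PAD_SEQ:
            pad[i] = (uint8_t)(i + 1);          /* 1, 2, 3, ... */
            break;
        }
    }
}
```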
^ permalink raw reply
* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: Kirill A. Shutemov @ 2012-12-17 1:46 UTC (permalink / raw)
To: David Miller; +Cc: maxk, netdev, dwmw2
In-Reply-To: <20121216.172214.687979484434537200.davem@davemloft.net>
On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
>
> Already fixed in Linus's tree by:
>
> From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
still see the issue.
--
Kirill A. Shutemov
^ permalink raw reply
* Re: request to queue patches for stable
From: David Miller @ 2012-12-17 1:22 UTC (permalink / raw)
To: caiqian; +Cc: netdev
In-Reply-To: <1687740004.1489340.1355706322262.JavaMail.root@redhat.com>
From: CAI Qian <caiqian@redhat.com>
Date: Sun, 16 Dec 2012 20:05:22 -0500 (EST)
> it was empty
It's empty because you didn't change the filter to not
filter patches already applied upstream.
^ permalink raw reply
* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: David Miller @ 2012-12-17 1:22 UTC (permalink / raw)
To: kirill; +Cc: maxk, netdev, dwmw2
In-Reply-To: <20121217005616.GA23029@shutemov.name>
Already fixed in Linus's tree by:
From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 12 Dec 2012 19:22:57 +0000
Subject: [PATCH 18/19] tuntap: dont use skb after netif_rx_ni(skb)
On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote:
> Since today's net merge, I see this when I start openvpn..
>
> general protection fault: 0000 [#1] PREEMPT SMP
> Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
> CPU 0
> Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14 /D975XBX
> RIP: 0010:[<ffffffff815b54a4>] [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
> RSP: 0018:ffff88007d0d9c48 EFLAGS: 00010206
> RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
> RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
> RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
> R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
> FS: 00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
> Stack:
> ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
> ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
> ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
> Call Trace:
> [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
> [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
> [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
> [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
> [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
> [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
> [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
> [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
> [<ffffffff811db917>] do_sync_write+0xa7/0xe0
> [<ffffffff811dc01f>] vfs_write+0xaf/0x190
> [<ffffffff811dc375>] sys_write+0x55/0xa0
> [<ffffffff81705540>] tracesys+0xdd/0xe2
> Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48
> RIP [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
> RSP <ffff88007d0d9c48>
> ---[ end trace 6d42c834c72c002e ]---
>
>
> Faulting instruction is
>
> 0: 8b 03 mov (%rbx),%eax
>
> rbx is slab poison (-20) so this looks like a use-after-free here...
>
> flow->ports = *ports;
> 314: 8b 03 mov (%rbx),%eax
> 316: 41 89 46 08 mov %eax,0x8(%r14)
>
> in the inlined skb_header_pointer in skb_flow_dissect
>
> Dave
>
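The "slab poison (-20)" reading in the quoted report can be checked mechanically. The sketch below assumes the kernel's freed-object poison byte 0x6b (POISON_FREE) and treats a register as suspicious when it is within a small offset of the all-0x6b pattern; the function name looks_like_slab_poison() and the max_off parameter are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/*
 * Return 1 if 'reg' lies within 'max_off' bytes of the 64-bit pattern
 * 0x6b6b6b6b6b6b6b6b, i.e. it looks like a pointer loaded from freed,
 * poisoned slab memory and then adjusted by a small field offset.
 */
int looks_like_slab_poison(uint64_t reg, uint64_t max_off)
{
	const uint64_t poison = 0x6b6b6b6b6b6b6b6bULL;
	uint64_t diff = reg > poison ? reg - poison : poison - reg;

	return diff <= max_off;
}
```

With the RBX value from the oops, 0x6b6b6b6b6b6b6b4b, the difference from the pattern is 0x20, so the check succeeds for any max_off of 0x20 or more, while an ordinary kernel pointer such as the RDI value 0xffff8800ba83fa80 fails it.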
commit 96442e4242 (tuntap: choose the txq based on rxq) added
a use after free.
Cache rxhash in a temp variable before calling netif_rx_ni()
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/tun.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2ac2164..40b426e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -297,13 +297,12 @@ static void tun_flow_cleanup(unsigned long data)
spin_unlock_bh(&tun->lock);
}
-static void tun_flow_update(struct tun_struct *tun, struct sk_buff *skb,
+static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
u16 queue_index)
{
struct hlist_head *head;
struct tun_flow_entry *e;
unsigned long delay = tun->ageing_time;
- u32 rxhash = skb_get_rxhash(skb);
if (!rxhash)
return;
@@ -1010,6 +1009,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
int copylen;
bool zerocopy = false;
int err;
+ u32 rxhash;
if (!(tun->flags & TUN_NO_PI)) {
if ((len -= sizeof(pi)) > total_len)
@@ -1162,12 +1162,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
+ rxhash = skb_get_rxhash(skb);
netif_rx_ni(skb);
tun->dev->stats.rx_packets++;
tun->dev->stats.rx_bytes += len;
- tun_flow_update(tun, skb, tfile->queue_index);
+ tun_flow_update(tun, rxhash, tfile->queue_index);
return total_len;
}
--
1.7.11.7
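The pattern behind this fix can be shown outside the kernel. What follows is a userspace sketch under stated assumptions, not the tun driver: struct pkt, consume_pkt(), flow_update() and deliver() are hypothetical stand-ins for the skb, netif_rx_ni(), tun_flow_update() and tun_get_user() respectively, and free() models the consumer releasing the buffer.

```c
#include <stdint.h>
#include <stdlib.h>

struct pkt {
	uint32_t hash;
};

/* Last non-zero hash recorded by flow_update(), for demonstration only. */
static uint32_t last_hash_seen;

/* Stand-in for netif_rx_ni(): takes ownership and may free immediately. */
static void consume_pkt(struct pkt *p)
{
	free(p);
}

/* Stand-in for tun_flow_update(): needs only the hash, not the packet. */
static void flow_update(uint32_t hash)
{
	if (!hash)
		return;
	last_hash_seen = hash;
}

/* Returns 0 on success, -1 if the packet could not be allocated. */
static int deliver(uint32_t hash)
{
	struct pkt *p = malloc(sizeof(*p));
	uint32_t cached;

	if (!p)
		return -1;
	p->hash = hash;

	cached = p->hash;	/* read what we need while we still own p */
	consume_pkt(p);		/* p must not be dereferenced after this */
	flow_update(cached);	/* works on the cached copy only */
	return 0;
}
```

Passing the value rather than the pointer to the later step, as the patch does by changing the tun_flow_update() signature, makes it harder to reintroduce the bug, because the callee no longer has anything to dereference.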
^ permalink raw reply related
* Re: request to queue patches for stable
From: CAI Qian @ 2012-12-17 1:05 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20121214.153825.1087482329989146130.davem@davemloft.net>
----- Original Message -----
> From: "David Miller" <davem@davemloft.net>
> To: caiqian@redhat.com
> Cc: netdev@vger.kernel.org
> Sent: Saturday, December 15, 2012 4:38:25 AM
> Subject: Re: request to queue patches for stable
>
> From: CAI Qian <caiqian@redhat.com>
> Date: Mon, 10 Dec 2012 21:03:15 -0500 (EST)
>
> >
> >
> > ----- Original Message -----
> >> From: "David Miller" <davem@davemloft.net>
> >> To: caiqian@redhat.com
> >> Cc: greg@kroah.com, stable@vger.kernel.org, mbizon@freebox.fr,
> >> ja@ssi.bg
> >> Sent: Friday, December 7, 2012 11:23:21 AM
> >> Subject: Re: [PATCH] ipv4: do not cache looped multicasts
> >>
> >> From: CAI Qian <caiqian@redhat.com>
> >> Date: Thu, 6 Dec 2012 21:56:35 -0500 (EST)
> >>
> >> > OK, I have a few network patches in the queue that look applicable
> >> > to stable as well. I think I'll send them out here too to seek
> >> > their ACKs. David, please let me know if I should stop doing this.
> >>
> >> Please stop doing this.
> >>
> >> If you want networking patches to reach stable, first
> >> consult:
> >>
> >> http://patchwork.ozlabs.org/bundle/davem/stable/
> >>
> >> to see if the patch you want isn't queued up already.
> >>
> >> If it is not, ask me to queue it up on netdev@vger.kernel.org
> >>
> >> But note that I like to let networking patches "cook" upstream
> >> in Linus's tree for a certain amount of time before I submit
> >> them to -stable. This can take up to a week or two.
> > Dave, the following patches look applicable for the stable
> > releases. Please queue them up if you agree.
> >
> > 0e376bd0b791ac6ac6bdb051492df0769c840848 (for 3.0.x, 3.4.x and 3.6.x)
> > e196c0e579902f42cf72414461fb034e5a1ffbf7 (for 3.0.x, 3.4.x and 3.6.x)
> > 6e51fe7572590d8d86e93b547fab6693d305fd0d (for 3.0.x, 3.4.x and 3.6.x)
> > e1a676424c290b1c8d757e3860170ac7ecd89af4 (for 3.6.x)
> > 636174219b52b5a8bc51bc23bbcba97cd30a65e3 (for 3.6.x)
>
> What is the point of my publishing the pending networking -stable
> queue if you're not even going to check it? Those last two patches
> were already queued up.
Dave, yes, I did check the link you gave me. However, it was empty
(no patches there) when I emailed you, so I thought none of those had been
queued up yet. It is also empty now. When I clicked the "patches" link,
it pointed me to http://patchwork.ozlabs.org/project/netdev/list/
which does have patches, but I believe it is not for stable. Please
let me know if I am missing anything.
>
> Furthermore, it is erroneous to suggest the -ENOMEM SCTP fix without
> the memory leak fix that happens in the commit right before it.
Thanks for the review.
>
> I've queued things up appropriately, but I really don't appreciate
> how you've handled this at all. It makes a lot more work for me than
> necessary.
OK, thanks for letting me know.
CAI Qian
>
>
^ permalink raw reply
* openconnect triggers soft lockup in __skb_get_rxhash
From: Kirill A. Shutemov @ 2012-12-17 0:56 UTC (permalink / raw)
To: Maxim Krasnyansky, David S. Miller; +Cc: netdev, David Woodhouse
Hi,
Within a few minutes of starting openconnect, it starts consuming 100% CPU
and I can see a soft lockup report in dmesg:
[ 231.684591] BUG: soft lockup - CPU#3 stuck for 22s! [openconnect:3537]
[ 231.684595] Modules linked in: rfcomm bnep iwldvm btusb iwlwifi bluetooth acpi_cpufreq container mperf thermal battery ac processor
[ 231.684607] CPU 3
[ 231.684610] Pid: 3537, comm: openconnect Not tainted 3.7.0-08585-g2a74dbb #165 Hewlett-Packard HP EliteBook 8440p/172A
[ 231.684612] RIP: 0010:[<ffffffff815f3f62>] [<ffffffff815f3f62>] skb_flow_dissect+0x52/0x3b0
[ 231.684621] RSP: 0018:ffff88012ebbdc48 EFLAGS: 00000246
[ 231.684622] RAX: 0000000000000004 RBX: ffffffff8173b360 RCX: ff040404ff040404
[ 231.684623] RDX: 0000000000000000 RSI: ffff88012ebbdcc8 RDI: ffff880122770e00
[ 231.684624] RBP: ffff88012ebbdcb8 R08: 00000000000004f2 R09: ffff88012faa6000
[ 231.684626] R10: ffff880122770e00 R11: 0000000000000000 R12: ffffffff8173b476
[ 231.684627] R13: 0000000010000002 R14: ffff88012ebbc000 R15: 0000000000000004
[ 231.684628] FS: 00007f31249d9740(0000) GS:ffff880132e00000(0000) knlGS:0000000000000000
[ 231.684630] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 231.684631] CR2: 00007fed86cca000 CR3: 0000000122489000 CR4: 00000000000007e0
[ 231.684632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 231.684633] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 231.684635] Process openconnect (pid: 3537, threadinfo ffff88012ebbc000, task ffff8801305be4a0)
[ 231.684636] Stack:
[ 231.684637] ffff88012ebbdce8 ffffffff815f0072 ffff8801305be4a0 00000000000004f2
[ 231.684640] 0000000000000000 00000000000004f2 00000000000004f2 0000000000000000
[ 231.684643] ffff88012ebbdce8 00000000748aa9b5 ffff880122770e00 ffff88012ebbddf0
[ 231.684646] Call Trace:
[ 231.684651] [<ffffffff815f0072>] ? memcpy_fromiovecend+0x72/0xc0
[ 231.684654] [<ffffffff815f71fa>] __skb_get_rxhash+0x1a/0xd0
[ 231.684659] [<ffffffff814a04a8>] tun_get_user+0x5b8/0x7b0
[ 231.684662] [<ffffffff8149ea79>] ? __tun_get+0x59/0x80
[ 231.684664] [<ffffffff814a07b1>] tun_chr_aio_write+0x81/0xb0
[ 231.684670] [<ffffffff810e342e>] ? put_lock_stats.isra.15+0xe/0x40
[ 231.684675] [<ffffffff811ac857>] do_sync_write+0xa7/0xe0
[ 231.684678] [<ffffffff811acf7b>] vfs_write+0xab/0x190
[ 231.684681] [<ffffffff811ad2f5>] sys_write+0x55/0xb0
[ 231.684684] [<ffffffff81742906>] system_call_fastpath+0x1a/0x1f
[ 231.684685] Code: 31 c0 44 8b a7 a0 00 00 00 44 0f b7 6f 76 4c 03 a7 b0 00 00 00 44 2b a7 b8 00 00 00 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 <66> 41 81 fd 81 00 0f 84 72 01 00 00 77 70 66 41 83 fd 08 74 29
--
Kirill A. Shutemov
^ permalink raw reply
* Re: [PATCH v3] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: Pablo Neira Ayuso @ 2012-12-17 0:44 UTC (permalink / raw)
To: David Woodhouse
Cc: Eric Dumazet, Kevin Cernekee, Patrick McHardy, David S. Miller,
Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
Hideaki YOSHIFUJI, netfilter-devel, netfilter, coreteam,
linux-kernel, netdev
In-Reply-To: <1355703441.18919.6.camel@shinybook.infradead.org>
Hi David,
On Mon, Dec 17, 2012 at 12:17:21AM +0000, David Woodhouse wrote:
> On Mon, 2010-11-22 at 08:52 +0100, Eric Dumazet wrote:
> > On Sunday, 21 November 2010 at 18:40 -0800, Kevin Cernekee wrote:
> > > [v3:
> > > Only activate the new forced_dport logic if the IP matches, but the
> > > port does not. ]
> > >
> > > Most SIP devices use a source port of 5060/udp on SIP requests, so the
> > > response automatically comes back to port 5060:
> > >
> > > phone_ip:5060 -> proxy_ip:5060 REGISTER
> > > proxy_ip:5060 -> phone_ip:5060 100 Trying
> > >
> > > The newer Cisco IP phones, however, use a randomly chosen high source
> > > port for the SIP request but expect the response on port 5060:
> > >
> > > phone_ip:49173 -> proxy_ip:5060 REGISTER
> > > proxy_ip:5060 -> phone_ip:5060 100 Trying
> > >
> > > Standard Linux NAT, with or without nf_nat_sip, will send the reply back
> > > to port 49173, not 5060:
> > >
> > > phone_ip:49173 -> proxy_ip:5060 REGISTER
> > > proxy_ip:5060 -> phone_ip:49173 100 Trying
> > >
> > > But the phone is not listening on 49173, so it will never see the reply.
> > >
> > > This patch modifies nf_*_sip to work around this quirk by extracting
> > > the SIP response port from the Via: header, iff the source IP in the
> > > packet header matches the source IP in the SIP request.
> > >
> > > Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
> > > ---
> >
> > Thanks for doing this work, Kevin!
> >
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> What happened to this? OpenWRT is still carrying it, and it broke in
> 3.7. Here's a completely untested update...
I asked Kevin to resend a new version based on the current kernel
tree while I was going through old pending patches, since I have no access
to that hardware, but no luck.
So I'll review this and, since OpenWRT is carrying it, I guess we can get
this into the net-next merge window.
Thanks for the reminder.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH v3] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: David Woodhouse @ 2012-12-17 0:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: Kevin Cernekee, Patrick McHardy, David S. Miller,
Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
Hideaki YOSHIFUJI, netfilter-devel, netfilter, coreteam,
linux-kernel, netdev
In-Reply-To: <1290412334.2756.141.camel@edumazet-laptop>
On Mon, 2010-11-22 at 08:52 +0100, Eric Dumazet wrote:
> On Sunday, 21 November 2010 at 18:40 -0800, Kevin Cernekee wrote:
> > [v3:
> > Only activate the new forced_dport logic if the IP matches, but the
> > port does not. ]
> >
> > Most SIP devices use a source port of 5060/udp on SIP requests, so the
> > response automatically comes back to port 5060:
> >
> > phone_ip:5060 -> proxy_ip:5060 REGISTER
> > proxy_ip:5060 -> phone_ip:5060 100 Trying
> >
> > The newer Cisco IP phones, however, use a randomly chosen high source
> > port for the SIP request but expect the response on port 5060:
> >
> > phone_ip:49173 -> proxy_ip:5060 REGISTER
> > proxy_ip:5060 -> phone_ip:5060 100 Trying
> >
> > Standard Linux NAT, with or without nf_nat_sip, will send the reply back
> > to port 49173, not 5060:
> >
> > phone_ip:49173 -> proxy_ip:5060 REGISTER
> > proxy_ip:5060 -> phone_ip:49173 100 Trying
> >
> > But the phone is not listening on 49173, so it will never see the reply.
> >
> > This patch modifies nf_*_sip to work around this quirk by extracting
> > the SIP response port from the Via: header, iff the source IP in the
> > packet header matches the source IP in the SIP request.
> >
> > Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
> > ---
>
> Thanks for doing this work, Kevin!
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
What happened to this? OpenWRT is still carrying it, and it broke in
3.7. Here's a completely untested update...
+ if (!skb_make_writable(skb, skb->len))
+ return NF_DROP;
+
-+ uh = (struct udphdr *)(skb->data + ip_hdrlen(skb));
++ uh = (void *)skb->data + protoff;
+ uh->dest = ct_sip_info->forced_dport;
+
-+ if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, 0, 0, NULL, 0))
++ if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, protoff, 0, 0, NU
+ return NF_DROP;
+ }
+
return NF_ACCEPT;
}
diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 387bdd0..ba7f571 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -4,12 +4,15 @@
#include <net/netfilter/nf_conntrack_expect.h>
+#include <linux/types.h>
+
#define SIP_PORT 5060
#define SIP_TIMEOUT 3600
struct nf_ct_sip_master {
unsigned int register_cseq;
unsigned int invite_cseq;
+ __be16 forced_dport;
};
enum sip_expectation_classes {
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index df8f4f2..72a67bb 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1440,8 +1440,25 @@ static int process_sip_request(struct sk_buff *skb, unsigned int protoff,
{
enum ip_conntrack_info ctinfo;
struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+ struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
+ enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
unsigned int matchoff, matchlen;
unsigned int cseq, i;
+ union nf_inet_addr addr;
+ __be16 port;
+
+ /* Many Cisco IP phones use a high source port for SIP requests, but
+ * listen for the response on port 5060. If we are the local
+ * router for one of these phones, save the port number from the
+ * Via: header so that nf_nat_sip can redirect the responses to
+ * the correct port.
+ */
+ if (ct_sip_parse_header_uri(ct, *dptr, NULL, *datalen,
+ SIP_HDR_VIA_UDP, NULL, &matchoff,
+ &matchlen, &addr, &port) > 0 &&
+ port != ct->tuplehash[dir].tuple.src.u.udp.port &&
+ nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.src.u3))
+ ct_sip_info->forced_dport = port;
for (i = 0; i < ARRAY_SIZE(sip_handlers); i++) {
const struct sip_handler *handler;
diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c
index 16303c7..552e270 100644
--- a/net/netfilter/nf_nat_sip.c
+++ b/net/netfilter/nf_nat_sip.c
@@ -95,6 +95,7 @@ static int map_addr(struct sk_buff *skb, unsigned int protoff,
enum ip_conntrack_info ctinfo;
struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+ struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
unsigned int buflen;
union nf_inet_addr newaddr;
@@ -107,7 +108,8 @@ static int map_addr(struct sk_buff *skb, unsigned int protoff,
} else if (nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3, addr) &&
ct->tuplehash[dir].tuple.dst.u.udp.port == port) {
newaddr = ct->tuplehash[!dir].tuple.src.u3;
- newport = ct->tuplehash[!dir].tuple.src.u.udp.port;
+ newport = ct_sip_info->forced_dport ? ct_sip_info->forced_dport :
+ ct->tuplehash[!dir].tuple.src.u.udp.port;
} else
return 1;
@@ -144,6 +146,7 @@ static unsigned int nf_nat_sip(struct sk_buff *skb, unsigned int protoff,
enum ip_conntrack_info ctinfo;
struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+ struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
unsigned int coff, matchoff, matchlen;
enum sip_header_types hdr;
union nf_inet_addr addr;
@@ -258,6 +261,20 @@ next:
!map_sip_addr(skb, protoff, dataoff, dptr, datalen, SIP_HDR_TO))
return NF_DROP;
+ /* Mangle destination port for Cisco phones, then fix up checksums */
+ if (dir == IP_CT_DIR_REPLY && ct_sip_info->forced_dport) {
+ struct udphdr *uh;
+
+ if (!skb_make_writable(skb, skb->len))
+ return NF_DROP;
+
+ uh = (void *)skb->data + protoff;
+ uh->dest = ct_sip_info->forced_dport;
+
+ if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, protoff, 0, 0, NULL, 0))
+ return NF_DROP;
+ }
+
return NF_ACCEPT;
}
@@ -311,8 +328,10 @@ static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
enum ip_conntrack_info ctinfo;
struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+ struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
union nf_inet_addr newaddr;
u_int16_t port;
+ __be16 srcport;
char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
unsigned int buflen;
@@ -326,8 +345,9 @@ static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
/* If the signalling port matches the connection's source port in the
* original direction, try to use the destination port in the opposite
* direction. */
- if (exp->tuple.dst.u.udp.port ==
- ct->tuplehash[dir].tuple.src.u.udp.port)
+ srcport = ct_sip_info->forced_dport ? ct_sip_info->forced_dport :
+ ct->tuplehash[dir].tuple.src.u.udp.port;
+ if (exp->tuple.dst.u.udp.port == srcport)
port = ntohs(ct->tuplehash[!dir].tuple.dst.u.udp.port);
else
port = ntohs(exp->tuple.dst.u.udp.port);
--
dwmw2
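The port-selection rule that the patch threads through nf_conntrack_sip and nf_nat_sip can be reduced to two small functions. This is a simplified sketch, not the netfilter code: struct sip_conn, sip_note_via() and sip_reply_dport() are hypothetical names, ports are plain host-order integers rather than __be16, and the result of the IP comparison is passed in as a flag instead of calling nf_inet_addr_cmp().

```c
#include <stdint.h>

struct sip_conn {
	uint16_t src_port;	/* UDP source port of the SIP request */
	uint16_t forced_dport;	/* port taken from Via:, 0 if none recorded */
};

/*
 * Request path: remember the Via: port only when the Via: address
 * matches the packet's source IP but the port differs from the UDP
 * source port (the Cisco 7941/7945 quirk).
 */
void sip_note_via(struct sip_conn *c, int via_ip_matches_src,
		  uint16_t via_port)
{
	if (via_ip_matches_src && via_port != c->src_port)
		c->forced_dport = via_port;
}

/*
 * Reply path: send to the forced port if one was recorded, otherwise
 * back to the request's source port as ordinary NAT would.
 */
uint16_t sip_reply_dport(const struct sip_conn *c)
{
	return c->forced_dport ? c->forced_dport : c->src_port;
}
```

For the phone in the commit message (request from port 49173 with a Via: port of 5060) the reply goes to 5060; for a device that already sends from 5060 nothing is recorded and the reply path is unchanged.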
^ permalink raw reply related
* Re: [PATCH] Fix comment for packets without data
From: Pablo Neira Ayuso @ 2012-12-16 22:38 UTC (permalink / raw)
To: Rick Jones; +Cc: Florent Fourcot, yoshfuji, netdev, netfilter-devel
In-Reply-To: <50CB9A16.1090006@hp.com>
On Fri, Dec 14, 2012 at 01:28:54PM -0800, Rick Jones wrote:
> On 12/14/2012 02:53 AM, Florent Fourcot wrote:
> >Remove ambiguity of double negation
> >
> >Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
> >---
> > net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> >index 00ee17c..137e245 100644
> >--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> >+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> >@@ -81,8 +81,8 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
> > }
> > protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
> > /*
> >- * (protoff == skb->len) mean that the packet doesn't have no data
> >- * except of IPv6 & ext headers. but it's tracked anyway. - YK
> >+ * (protoff == skb->len) means the packet has not data, just
> >+ * IPv6 and possibly extensions headers, but it is tracked anyway
> > */
> > if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
> > pr_debug("ip6_conntrack_core: can't find proto in pkt\n");
> >
>
> Acked-by: Rick Jones <rick.jones2@hp.com>
Applied, thanks.
That was a lot of discussion for a patch to fix a comment, nice indeed :-)
^ permalink raw reply