* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Deng-Cheng Zhu @ 2012-05-07 8:01 UTC
To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <1336376282.3752.2252.camel@edumazet-glaptop>
On 05/07/2012 03:38 PM, Eric Dumazet wrote:
> On Mon, 2012-05-07 at 14:48 +0800, Deng-Cheng Zhu wrote:
>> On 05/04/2012 11:31 PM, Tom Herbert wrote:
>>>> I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
>>>> patch) are different: The former works along with rps_sock_flow_table
>>>> whose CPU info is based on recvmsg by the application. But for the tests
>>>> like what I did, there's no application involved.
>>>>
>>> While rps_sock_flow_table is currently only managed by recvmsg, it
>>> still is the general mechanism that maps flows to CPUs for steering.
>>> There should be nothing preventing you from populating and managing
>>> entries in other ways.
>>
>> Well, even using rps_sock_flow_table to map the sparse flows to CPUs,
>> we still need a data structure to describe a single flow -- that's what
>> struct cpu_flow is doing. Besides, rps_sock_flow_table, by its meaning,
>> does not seem to make sense for our purpose. How about keeping the patch
>> as is but renaming struct cpu_flow to struct rps_sparse_flow? It's like:
>>
>
> sock_flow_table is about mapping a flow (by its rxhash) to a CPU.
>
> If you feel 'sock' is a bad name, you can rename it.
>
> You don't need to add a new data structure and code in the fast path.
>
> Only the first packet of a new flow might be handled by 'the wrong CPU'.
>
> If you add code in the forward path to update the flow table for subsequent
> packets, the added cost in the fast path is zero.
Did you really read my patch and understand my comments? When I was
talking about using rps_sparse_flow (initially cpu_flow), neither
rps_sock_flow_table nor rps_dev_flow_table is activated (number of
entries: 0).
FYI below:
On 05/04/2012 11:39 AM, Deng-Cheng Zhu wrote:
> On 05/04/2012 11:22 AM, Tom Herbert wrote:
>>> +struct cpu_flow {
>>> + struct net_device *dev;
>>> + u32 rxhash;
>>> + unsigned long ts;
>>> +};
>>
>> This seems like overkill, we already have the rps_flow_table and this
>> used in accelerated RFS so the device can also take advantage of
>> steering. Maybe somehow program that table for your sparse flows?
>
> In fact I did try different values for rps_flow_cnt (apart from
> rps_cpus, the only RPS tunable in sysfs, or am I missing something?)
> and saw no effect in my tests (iperf between 2 PCs via a Malta board
> acting as a router and using iptables/NAT+RPS)...
Deng-Cheng
* Re: [PATCH RESEND 0/5] Adopt pinctrl support for a few outstanding imx drivers
From: Dong Aisheng @ 2012-05-07 7:53 UTC
To: Shawn Guo
Cc: Dong Aisheng-B29396, linux-arm-kernel@lists.infradead.org,
Arnd Bergmann, netdev@vger.kernel.org, Sascha Hauer, Wolfram Sang,
linux-can@vger.kernel.org, Grant Likely, Marc Kleine-Budde,
linux-i2c@vger.kernel.org, linux-serial@vger.kernel.org,
Greg Kroah-Hartman, Olof Johansson,
spi-devel-general@lists.sourceforge.net, Dong Aisheng,
David S. Miller
In-Reply-To: <20120507073403.GG19389@S2101-09.ap.freescale.net>
On Mon, May 07, 2012 at 03:34:06PM +0800, Shawn Guo wrote:
> On Mon, May 07, 2012 at 02:50:02PM +0800, Dong Aisheng wrote:
> > Shouldn't we add the pinctrl states in dts file at the same time
> > with this patch series or using another separate patch to add them
> > before this series to avoid breaking the existing mx6q platforms?
> >
> Ah, I just noticed that your patch "ARM: imx: enable pinctrl dummy
> states" did not cover imx6q. I think we should do the same for imx6q,
Yes, that was intended to force people to add pinctrl states in the dts file
rather than use the dummy state, since mx6 has a pinctrl driver.
> so that we can separate dts update from the driver change. When all
> imx6q boards' dts files get updated to have pins defined for the
> devices, we can then remove dummy state for imx6q. Doing so will ease
> the pinctrl migration for those imx6q boards.
>
Well, considering we have several mx6 boards, I think I am also fine
with this approach to ease the mx6q pinctrl migration.
> Will update your patch on my branch to have dummy state enabled for
> imx6q.
>
Then go ahead.
Regards
Dong Aisheng
* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Eric Dumazet @ 2012-05-07 7:38 UTC
To: Deng-Cheng Zhu; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <4FA77051.20804@mips.com>
On Mon, 2012-05-07 at 14:48 +0800, Deng-Cheng Zhu wrote:
> On 05/04/2012 11:31 PM, Tom Herbert wrote:
> >> I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
> >> patch) are different: The former works along with rps_sock_flow_table
> >> whose CPU info is based on recvmsg by the application. But for the tests
> >> like what I did, there's no application involved.
> >>
> > While rps_sock_flow_table is currently only managed by recvmsg, it
> > still is the general mechanism that maps flows to CPUs for steering.
> > There should be nothing preventing you from populating and managing
> > entries in other ways.
>
> Well, even using rps_sock_flow_table to map the sparse flows to CPUs,
> we still need a data structure to describe a single flow -- that's what
> struct cpu_flow is doing. Besides, rps_sock_flow_table, by its meaning,
> does not seem to make sense for our purpose. How about keeping the patch
> as is but renaming struct cpu_flow to struct rps_sparse_flow? It's like:
>
sock_flow_table is about mapping a flow (by its rxhash) to a CPU.
If you feel 'sock' is a bad name, you can rename it.
You don't need to add a new data structure and code in the fast path.
Only the first packet of a new flow might be handled by 'the wrong CPU'.
If you add code in the forward path to update the flow table for subsequent
packets, the added cost in the fast path is zero.
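
[Editor's illustration] A minimal userspace sketch of the idea in this
message, assuming the flow table is a plain array indexed by the masked
rxhash. The names flow_table, flow_lookup_cpu and flow_record_cpu, the
table size and the "no CPU" marker are all hypothetical; this is not the
kernel's rps_sock_flow_table code. It only shows that the receive-side
lookup can be a single array read, while the update happens where the
forwarding decision is made:

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_TABLE_SIZE 256u    /* hypothetical size; must be a power of two */
#define FLOW_NO_CPU     0xffffu /* marks an entry that has not been set yet */

/* Hypothetical stand-in for a hash-indexed flow table. */
struct flow_table {
    uint16_t cpu[FLOW_TABLE_SIZE];
};

static void flow_table_init(struct flow_table *t)
{
    for (uint32_t i = 0; i < FLOW_TABLE_SIZE; i++)
        t->cpu[i] = FLOW_NO_CPU;
}

/* Receive path: one masked array read; fall back to the current CPU on a miss. */
static uint16_t flow_lookup_cpu(const struct flow_table *t, uint32_t rxhash,
                                uint16_t this_cpu)
{
    uint16_t cpu = t->cpu[rxhash & (FLOW_TABLE_SIZE - 1)];

    return cpu == FLOW_NO_CPU ? this_cpu : cpu;
}

/* Forward path: record which CPU should handle later packets of this flow. */
static void flow_record_cpu(struct flow_table *t, uint32_t rxhash, uint16_t cpu)
{
    assert(cpu != FLOW_NO_CPU); /* the marker value is reserved */
    t->cpu[rxhash & (FLOW_TABLE_SIZE - 1)] = cpu;
}
```

In this model only the first packet of a flow misses the table and stays
on the receiving CPU; once flow_record_cpu() has run, later packets with
the same rxhash are steered to the recorded CPU.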
* Re: [PATCH RESEND 0/5] Adopt pinctrl support for a few outstanding imx drivers
From: Shawn Guo @ 2012-05-07 7:34 UTC
To: Dong Aisheng
Cc: linux-arm-kernel, Arnd Bergmann, netdev, Sascha Hauer,
Wolfram Sang, linux-can, Grant Likely, Marc Kleine-Budde,
linux-i2c, linux-serial, Greg Kroah-Hartman, Olof Johansson,
spi-devel-general, Dong Aisheng, David S. Miller
In-Reply-To: <20120507065001.GA23607@shlinux2.ap.freescale.net>
On Mon, May 07, 2012 at 02:50:02PM +0800, Dong Aisheng wrote:
> Shouldn't we add the pinctrl states in dts file at the same time
> with this patch series or using another separate patch to add them
> before this series to avoid breaking the existing mx6q platforms?
>
Ah, I just noticed that your patch "ARM: imx: enable pinctrl dummy
states" did not cover imx6q. I think we should do the same for imx6q,
so that we can separate dts update from the driver change. When all
imx6q boards' dts files get updated to have pins defined for the
devices, we can then remove dummy state for imx6q. Doing so will ease
the pinctrl migration for those imx6q boards.
Will update your patch on my branch to have dummy state enabled for
imx6q.
> > net: fec: adopt pinctrl support
> > can: flexcan: adopt pinctrl support
> These two also depend on another patch you sent.
> [PATCH RESEND 1/9] ARM: mxs: enable pinctrl dummy states
> http://www.spinics.net/lists/arm-kernel/msg173341.html
>
> Maybe you can put these two in the mxs conversion series to avoid breaking
> mxs platforms.
> [PATCH 0/9] Enable pinctrl support for mach-mxs
> http://www.spinics.net/lists/arm-kernel/msg173312.html
>
Right. I'm going to merge these two series into one since there are
device drivers shared between imx and mxs.
--
Regards,
Shawn
* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Eric Dumazet @ 2012-05-07 7:22 UTC
To: Deng-Cheng Zhu; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <4FA770DA.6000104@mips.com>
On Mon, 2012-05-07 at 14:51 +0800, Deng-Cheng Zhu wrote:
> This is not about arch, it's about system design. Even with IRQ affinity
> working, for single queue NICs, RPS still has its value in this test.
>
Post your numbers when proper affinities are done.
Having several CPUs fighting over the qdisc/device lock on the output path
is a killer. The extra IPIs are not worth the pain.
* Re: [PATCH] 9p: disconnect channel when PCI device is removed
From: Aneesh Kumar K.V @ 2012-05-07 7:14 UTC
To: Rusty Russell, Sasha Levin, davem, ericvh, jvrao
Cc: netdev, linux-kernel, davej, Sasha Levin
In-Reply-To: <87397chebf.fsf@rustcorp.com.au>
Rusty Russell <rusty@rustcorp.com.au> writes:
> On Fri, 13 Apr 2012 17:48:36 -0400, Sasha Levin <levinsasha928@gmail.com> wrote:
>> When a virtio_9p pci device is being removed, we should close down any
>> active channels and free up resources, we're not supposed to BUG() if there's
>> still an open channel since it's a valid case when removing the PCI device.
>>
>> Otherwise, removing the PCI device with an open channel would cause the
>> following BUG():
>
> (Damn changed notmuch.el bindings! Previous reply went only to Sasha).
>
> Applied thanks,
> Rusty.
I am not sure the patch is sufficient. p9_virtio_remove does a
kfree(chan), and since we are not doing anything at the file system
level, we would still allow new 9p client requests. That means
p9_virtio_request would be dereferencing freed memory.
-aneesh
* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-05-07 7:12 UTC
To: David Miller; +Cc: netdev, gospo, sassmann
In-Reply-To: <20120506.132513.36895742709809565.davem@davemloft.net>
On Sun, 2012-05-06 at 13:25 -0400, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Sat, 5 May 2012 05:38:09 -0700
>
> > This series of patches contains updates for e1000e and ixgbe.
> >
> > NOTE- The ixgbe patch can and probably should be applied to
> > David Miller's net tree as well.
> >
> > The following are changes since commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24:
> > tcp: be more strict before accepting ECN negociation
> > and are available in the git repository at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>
> No new changes there?
>
> [davem@drr net-next]$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
> From git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
> * branch master -> FETCH_HEAD
> Already up-to-date.
Sorry Dave, I thought I had pushed the changes but it appears I did not.
I have rectified that and now my net-next tree contains the four
patches.
* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Deng-Cheng Zhu @ 2012-05-07 6:51 UTC
To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <1336117669.3752.49.camel@edumazet-glaptop>
On 05/04/2012 03:47 PM, Eric Dumazet wrote:
> On Fri, 2012-05-04 at 12:25 +0800, Deng-Cheng Zhu wrote:
>> On 05/04/2012 11:22 AM, Tom Herbert wrote:
>>>> +struct cpu_flow {
>>>> + struct net_device *dev;
>>>> + u32 rxhash;
>>>> + unsigned long ts;
>>>> +};
>>>
>>> This seems like overkill, we already have the rps_flow_table and this
>>> used in accelerated RFS so the device can also take advantage of
>>> steering.
>>
>> I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
>> patch) are different: The former works along with rps_sock_flow_table
>> whose CPU info is based on recvmsg by the application. But for the tests
>> like what I did, there's no application involved.
>>
>>
>> Deng-Cheng
>
> I really suggest you speak with MIPS arch maintainers about these IRQ
> being all serviced by CPU0.
This is not about arch, it's about system design. Even with IRQ affinity
working, for single queue NICs, RPS still has its value in this test.
>
> Adding tweaks in network stack to lower the impact of this huge problem
> is a no go.
This is merely an option for those who care about sparse flow throughput.
It is certainly a tradeoff, and it is not selected by default.
Deng-Cheng
* Re: [PATCH RESEND 0/5] Adopt pinctrl support for a few outstanding imx drivers
From: Dong Aisheng @ 2012-05-07 6:50 UTC
To: Shawn Guo
Cc: linux-arm-kernel, Arnd Bergmann, netdev, Sascha Hauer,
Wolfram Sang, linux-can, Grant Likely, Marc Kleine-Budde,
linux-i2c, linux-serial, Greg Kroah-Hartman, Olof Johansson,
spi-devel-general, Dong Aisheng, David S. Miller
In-Reply-To: <1336352040-28447-1-git-send-email-shawn.guo@linaro.org>
On Mon, May 07, 2012 at 08:53:55AM +0800, Shawn Guo wrote:
> With patch 5b3aa5f (pinctrl: add pinctrl_provide_dummies interface for
> platforms to use) applied on pinctrl tree, and patch "ARM: imx: enable
> pinctrl dummy states" [1] being there, we are ready to adopt pinctrl
> API for imx drivers. So let's start from a few outstanding ones.
>
> I would expect to ask Arnd and Olof to pull pinctrl tree into arm-soc
> as a dependency and then have series [1] and this patch set go through
> arm-soc tree to ease the merge process.
>
Shouldn't we add the pinctrl states in dts file at the same time
with this patch series or using another separate patch to add them
before this series to avoid breaking the existing mx6q platforms?
> Resend to have subsystem lists Cc-ed.
>
> Regards,
> Shawn
>
> [1] http://thread.gmane.org/gmane.linux.kernel.mmc/14180
>
> Shawn Guo (5):
> tty: serial: imx: adopt pinctrl support
...
> net: fec: adopt pinctrl support
> can: flexcan: adopt pinctrl support
These two also depend on another patch you sent.
[PATCH RESEND 1/9] ARM: mxs: enable pinctrl dummy states
http://www.spinics.net/lists/arm-kernel/msg173341.html
Maybe you can put these two in the mxs conversion series to avoid breaking
mxs platforms.
[PATCH 0/9] Enable pinctrl support for mach-mxs
http://www.spinics.net/lists/arm-kernel/msg173312.html
Regards
Dong Aisheng
> i2c: imx: adopt pinctrl support
> spi/imx: adopt pinctrl support
>
> drivers/i2c/busses/i2c-imx.c | 8 ++++++++
> drivers/net/can/flexcan.c | 6 ++++++
> drivers/net/ethernet/freescale/fec.c | 9 +++++++++
> drivers/spi/spi-imx.c | 8 ++++++++
> drivers/tty/serial/imx.c | 8 ++++++++
> 5 files changed, 39 insertions(+), 0 deletions(-)
>
> --
> 1.7.5.4
>
* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Deng-Cheng Zhu @ 2012-05-07 6:48 UTC
To: Tom Herbert; +Cc: davem, netdev, eric.dumazet
In-Reply-To: <CA+mtBx-CErRU=3ewkAjVrGN3dGzjsTz8Q-E8J+Xa+529OVEvwA@mail.gmail.com>
On 05/04/2012 11:31 PM, Tom Herbert wrote:
>> I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
>> patch) are different: The former works along with rps_sock_flow_table
>> whose CPU info is based on recvmsg by the application. But for the tests
>> like what I did, there's no application involved.
>>
> While rps_sock_flow_table is currently only managed by recvmsg, it
> still is the general mechanism that maps flows to CPUs for steering.
> There should be nothing preventing you from populating and managing
> entries in other ways.
Well, even using rps_sock_flow_table to map the sparse flows to CPUs,
we still need a data structure to describe a single flow -- that's what
struct cpu_flow is doing. Besides, rps_sock_flow_table, by its meaning,
does not seem to make sense for our purpose. How about keeping the patch
as is but renaming struct cpu_flow to struct rps_sparse_flow? It's like:
---------------------------------------------
In include/linux/netdevice.h:
struct rps_sparse_flow {
struct net_device *dev;
u32 rxhash;
unsigned long ts;
};
In net/core/dev.c:
static DEFINE_PER_CPU(struct rps_sparse_flow [CONFIG_NR_RPS_MAP_LOOPS],
rps_sparse_flow_table);
---------------------------------------------
The above looks similar to rps_dev_flow/rps_dev_flow_table, and we do
not necessarily go with rps_sock_flow_table.
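
[Editor's illustration] The thread does not show how the patch uses
rps_sparse_flow_table, so the following is only a userspace sketch of
one way a per-CPU table of {dev, rxhash, ts} entries could be searched
and aged. Every name, the CPU count, the timeout and the "replace the
oldest entry" policy are assumptions, not the patch's logic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS       4     /* hypothetical CPU count */
#define SKETCH_FLOWS_PER_CPU 2     /* stands in for CONFIG_NR_RPS_MAP_LOOPS */
#define SKETCH_FLOW_TIMEOUT  100ul /* hypothetical maximum age, in ticks */

struct sparse_flow_sketch {
    const void   *dev;    /* stands in for struct net_device * */
    uint32_t      rxhash;
    unsigned long ts;     /* tick at which the flow was last recorded */
};

static struct sparse_flow_sketch sparse_table[SKETCH_NR_CPUS][SKETCH_FLOWS_PER_CPU];

/* Return the CPU holding a live entry for (dev, rxhash), or -1 if none. */
static int sparse_flow_find_cpu(const void *dev, uint32_t rxhash,
                                unsigned long now)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        for (int i = 0; i < SKETCH_FLOWS_PER_CPU; i++) {
            const struct sparse_flow_sketch *f = &sparse_table[cpu][i];

            if (f->dev != NULL && f->dev == dev && f->rxhash == rxhash &&
                now - f->ts <= SKETCH_FLOW_TIMEOUT)
                return cpu;
        }
    }
    return -1;
}

/* Record the flow on a CPU, overwriting that CPU's oldest entry. */
static void sparse_flow_record(int cpu, const void *dev, uint32_t rxhash,
                               unsigned long now)
{
    struct sparse_flow_sketch *oldest;

    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(dev != NULL);

    oldest = &sparse_table[cpu][0];
    for (int i = 1; i < SKETCH_FLOWS_PER_CPU; i++)
        if (sparse_table[cpu][i].ts < oldest->ts)
            oldest = &sparse_table[cpu][i];

    oldest->dev = dev;
    oldest->rxhash = rxhash;
    oldest->ts = now;
}
```

Note that the lookup walks every CPU's entries for each packet, which is
the kind of per-packet work the reviewers in this thread object to.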
Thanks,
Deng-Cheng
* Re: ipctl - new tool for efficient read/write of net related sysctl
From: Thomas Graf @ 2012-05-07 6:14 UTC
To: Oskar Berggren; +Cc: Stephen Hemminger, netdev
In-Reply-To: <CAHOuc7PjmcF=EhEUDEqfF_RShezxDqzc+53frNwWqhww8YQ+mA@mail.gmail.com>
On Sun, May 06, 2012 at 02:46:01PM +0200, Oskar Berggren wrote:
> 2012/5/6 Stephen Hemminger <stephen.hemminger@vyatta.com>:
> >
> >>
> >> In a project of mine I need to read (and possibly set) many of the
> >> properties
> >> found under /proc/sys/net/ipv4/conf/. This is simple enough, except
> >> that
> >> when you have hundreds of interfaces, it is really slow. In my tests
> >> it takes
> >> about 4 seconds to read a single variable for 700 interfaces. For a
> >> while I
> >> worked around this using the binary sysctl() interface, but this is
> >> deprecated.
> >>
> >
> > What about exposing these as NETLINK attributes? That would be faster
> > and you could do bulk updates.
>
>
> This is my first attempt at using NETLINK, so could you please elaborate?
> Below is the generic netlink interface I implemented so far. Any pointers
> on how I should do this differently?
What Stephen means is to use the existing message types RTM_SETLINK
and RTM_GETLINK in the NETLINK_ROUTE family.
This is already partially implemented. See the IFLA_AF_SPEC attribute
carrying IPV4_DEVCONF_ and DEVCONF_ (IPv6). Grep for rtnl_af_register()
and you will find the corresponding implementations.
Feel free to complete these existing interfaces, such as adding write
support to IPv6 or adding support to iproute2 which is currently
lacking.
src/nl-link-list.c in the libnl sources allows you to display the
configurations:
$ src/nl-link-list --details --name virbr0-nic
virbr0-nic ether 52:54:00:cb:da:db master virbr0 <broadcast,multicast>
mtu 1500 txqlen 500 weight 0 qdisc noop index 7
brd ff:ff:ff:ff:ff:ff state down mode default
ipv4 devconf:
forwarding 1 mc_forwarding 0 proxy_arp 0
accept_redirects 1 secure_redirects 1 send_redirects 1
shared_media 1 rp_filter 1 accept_source_route 0
bootp_relay 0 log_martians 0 tag 0
arpfilter 0 medium_id 0 noxfrm 0
nopolicy 0 force_igmp_version 0 arp_announce 0
arp_ignore 0 promote_secondaries 0 arp_accept 0
arp_notify 0 accept_local 0 src_vmark 0
proxy_arp_pvlan 0
ipv6 max-reasm-len 64KiB <>
create-stamp 13.35s reachable-time 40s 898msec retrans-time 1s
devconf:
forwarding 1 hoplimit 64 mtu6 1500
accept_ra 1 accept_redirects 1 autoconf 1
dad_transmits 1 rtr_solicits 3 rtr_solicit_interval 4s
rtr_solicit_delay 1s use_tempaddr 0 temp_valid_lft 7d
temp_prefered_lft 1d regen_max_retry 3 max_desync_factor 600
max_addresses 16 force_mld_version 0 accept_ra_defrtr 1
accept_ra_pinfo 1 accept_ra_rtr_pref 1 rtr_probe_interval 1m
accept_ra_rt_info 0 proxy_ndp 0 optimistic_dad 0
accept_source_route 0 mc_forwarding 0 disable_ipv6 0
accept_dad 1 force_tllao 0
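
[Editor's illustration] A hedged sketch of reading one IPv4 devconf
value out of the attributes of an RTM_GETLINK reply. It assumes the
layout described above as I understand it: IFLA_AF_SPEC nests one
attribute per address family, the AF_INET one nests IFLA_INET_CONF, and
that payload is an array of 32-bit values in which entry
(IPV4_DEVCONF_x - 1) holds setting x. The helper names find_attr and
ipv4_devconf_get are hypothetical; the macros and constants come from
the Linux UAPI headers. Sending the request and receiving the reply are
left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>
#include <linux/ip.h>

/* Walk a run of attributes and return the first one of the given type. */
static struct rtattr *find_attr(struct rtattr *rta, int len, unsigned short type)
{
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if ((rta->rta_type & NLA_TYPE_MASK) == type)
            return rta;
    return NULL;
}

/*
 * Look up one IPv4 devconf entry (for example IPV4_DEVCONF_FORWARDING)
 * in the link attributes. Returns 0 and stores the value on success,
 * -1 if the attribute chain or the entry is missing.
 */
static int ipv4_devconf_get(struct rtattr *link_attrs, int len,
                            int devconf_id, uint32_t *value)
{
    struct rtattr *af_spec, *af_inet, *conf;

    assert(value != NULL);

    af_spec = find_attr(link_attrs, len, IFLA_AF_SPEC);
    if (!af_spec)
        return -1;
    af_inet = find_attr(RTA_DATA(af_spec), RTA_PAYLOAD(af_spec), AF_INET);
    if (!af_inet)
        return -1;
    conf = find_attr(RTA_DATA(af_inet), RTA_PAYLOAD(af_inet), IFLA_INET_CONF);
    if (!conf)
        return -1;

    /* Entry N lives at array index N - 1; reject ids beyond the payload. */
    if (devconf_id < 1 ||
        (size_t)devconf_id * sizeof(uint32_t) > (size_t)RTA_PAYLOAD(conf))
        return -1;

    memcpy(value,
           (char *)RTA_DATA(conf) + (size_t)(devconf_id - 1) * sizeof(uint32_t),
           sizeof(*value));
    return 0;
}
```

With a dump request, one reply per interface arrives in bulk, so a
caller would run ipv4_devconf_get() once per RTM_NEWLINK message instead
of opening one /proc file per interface and setting.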
* Re: [net-next 1/4 (V3)] net: ethtool: add the EEE support
From: Giuseppe CAVALLARO @ 2012-05-07 5:25 UTC
To: Ben Hutchings; +Cc: Yuval Mintz, netdev, davem
In-Reply-To: <1335736615.2424.42.camel@bwh-desktop.uk.solarflarecom.com>
Hello Ben
On 4/29/2012 11:56 PM, Ben Hutchings wrote:
> On Sun, 2012-04-29 at 12:20 +0300, Yuval Mintz wrote:
>> On 04/27/2012 05:11 PM, Giuseppe CAVALLARO wrote:
>>
>>> On 4/26/2012 7:17 PM, Ben Hutchings wrote:
>>>> On Thu, 2012-04-26 at 09:48 +0200, Giuseppe CAVALLARO wrote:
>>>>> Hello Ben
>>>>>
>>>>> On 4/19/2012 5:30 PM, Ben Hutchings wrote:
>>>>> [snip]
>>>>>>> I'm changing the code for getting/setting the EEE capability and trying
>>>>>>> to follow your suggestions.
>>>>>>>
>>>>>>> The "get" will show the following things; this is a bit different of the
>>>>>>> points "a" "b" and "c" we had discussed. Maybe, this could also be a
>>>>>>> more complete (*) .
>>>>>>> The ethtool (see output below as example) could report the phy
>>>>>>> (supported/advertised/lp_advertised) and mac eee capabilities separately.
>>>>>> Sounds reasonable.
>>>>>>
>>>>>>> The "set" will be useful for some eth devices (like the stmmac) that can
>>>>>>> stop/enable internally the eee capability (at mac level).
>>>>>> I don't know much about EEE, but shouldn't the driver take care of
>>>>>> configuring the MAC for this whenever the PHY is set to advertise EEE
>>>>>> capability?
>>>>> Yes indeed this can be done at driver level. So could I definitely
>>>>> remove it from ethtool? What do you suggest?
>>>>>
>>>>> In case of the stmmac I could add a specific driver option via sys to
>>>>> enable/disable the eee and set timer.
>>>> Generally, ethtool doesn't distinguish MAC and PHY settings because they
>>>> have to be configured consistently for the device to do anything useful.
>>>> If there is some good use for enabling EEE in the MAC and not the PHY,
>>>> or vice versa, then this should be exposed in the ethtool interface.
>>>> But if not then I don't believe it needs to be in either an ethtool or a
>>>> driver-specific interface.
>>> Thanks Ben for this clarification: in case of the stmmac the option is
>>> useful to stop a timer to enter in lpi state for the tx.
>>> So it's worth having that and from ethtool.
>
> I think I finally get it. If we negotiate a 100BASE-TX link (or one of
> the various backplane modes) with EEE enabled, we allow the link partner
> to assert LPI but we might still not want to assert it in the transmit
> direction. Right? (Whereas for 1000BASE-T and 10GBASE-T this would be
> useless, since both sides must assert LPI before any transition can
> happen.)
>
>> How will a user turn off EEE support using this implementation?
>
> At the ethtool API level this would be done by clearing the EEE
> advertising mask. At the command-line level there could be a shortcut
> for this, just as you can use 'autoneg on' and 'autoneg off' rather than
> specifying a mask of link modes.
>
>> Are you suggesting a "set" that works similarly to the control of the pause
>> parameters - that is, a user could either shutdown EEE or only Tx, which
>> will mean to the driver "don't enter Tx LPI mode"?
>>
>> Keep in mind that if later an interface controlling the LPI timers would be
>> added (as a measure of user control to the power saving vs. latency issue),
>> it could make this 'partial' closure interface redundant.
>>
>> Perhaps "set" should only turn the EEE feature on/off entirely (adv. them or
>> not, since clearly the link will have to be re-established afterwards), and
>> we should have a different function that prevents entry into LPI mode in Tx
>> - one whose functionality could later on be extended.
>
> It sounds like this might as well be included, even if not all
> drivers/hardware would allow the values to be changed. So the command
> structure would have at least:
>
> 1. EEE link mode supported flags (get-only)
> 2. EEE link mode advertising flags (get/set)
> 3. Ditto for link partner (get-only)
> 4. TX LPI enable flag (get/set)
> 5. TX LPI timer values (get/set but driver may reject changes)
Ok I'll try to rework all following the points above. Just a note for
the timer and point 5 below.
> But if it's not yet clear exactly what timer parameters will be useful,
> we could leave some reserved space and then later define them along with
> flags to indicate whether the driver understands them.
I can use and test the LPI timer parameters I have in mind; in the case of
the stmmac device driver, these are the values programmed into a MAC core
register. There are two timers:
1) the minimum time for which the link status from the PHY
should be up. The default value is 1 sec, as defined in the IEEE
standard.
2) the minimum time for which the MAC waits after it has
stopped transmitting the LPI pattern to the PHY
Peppe
>
> Ben.
>
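
[Editor's illustration] The five-point command structure proposed in
this message could be sketched as below. The struct name, field names
and field widths are hypothetical and are not the ethtool ABI; the
validation only encodes what the thread states: the supported and
link-partner masks are get-only, clearing the advertising mask turns EEE
off, and a driver may reject timer changes:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout following points 1-5; not a kernel structure. */
struct eee_cmd_sketch {
    uint32_t supported;      /* 1. EEE link modes supported (get-only) */
    uint32_t advertised;     /* 2. EEE link modes advertised (get/set) */
    uint32_t lp_advertised;  /* 3. link partner's advertisement (get-only) */
    uint32_t tx_lpi_enabled; /* 4. TX LPI enable flag (get/set) */
    uint32_t tx_lpi_timer;   /* 5. TX LPI timer (get/set, may be rejected) */
    uint32_t reserved[2];    /* room for timer parameters defined later */
};

/*
 * Apply a "set" request to the current state. Only the get/set fields
 * are copied; timer_is_fixed models a driver that cannot change its timer.
 */
static int eee_apply_set(struct eee_cmd_sketch *cur,
                         const struct eee_cmd_sketch *req, int timer_is_fixed)
{
    assert(cur != NULL && req != NULL);

    /* Cannot advertise a mode the hardware does not support. */
    if (req->advertised & ~cur->supported)
        return -EINVAL;
    if (timer_is_fixed && req->tx_lpi_timer != cur->tx_lpi_timer)
        return -EOPNOTSUPP;

    cur->advertised = req->advertised; /* zero means EEE is turned off */
    cur->tx_lpi_enabled = req->tx_lpi_enabled ? 1 : 0;
    cur->tx_lpi_timer = req->tx_lpi_timer;
    return 0;
}
```

A command-line shortcut for switching EEE off would then map to a
request with advertised set to 0, in the same way 'autoneg off' avoids
spelling out a link mode mask.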
* Re: [PATCH] 9p: disconnect channel when PCI device is removed
From: Rusty Russell @ 2012-05-07 3:15 UTC
To: Sasha Levin, davem, ericvh, aneesh.kumar, jvrao
Cc: netdev, linux-kernel, davej, Sasha Levin
In-Reply-To: <1334353716-19483-1-git-send-email-levinsasha928@gmail.com>
On Fri, 13 Apr 2012 17:48:36 -0400, Sasha Levin <levinsasha928@gmail.com> wrote:
> When a virtio_9p pci device is being removed, we should close down any
> active channels and free up resources, we're not supposed to BUG() if there's
> still an open channel since it's a valid case when removing the PCI device.
>
> Otherwise, removing the PCI device with an open channel would cause the
> following BUG():
(Damn changed notmuch.el bindings! Previous reply went only to Sasha).
Applied thanks,
Rusty.
* [PATCH RESEND 2/5] net: fec: adopt pinctrl support
From: Shawn Guo @ 2012-05-07 0:53 UTC
To: linux-arm-kernel
Cc: Arnd Bergmann, Olof Johansson, Sascha Hauer, Dong Aisheng,
Shawn Guo, netdev, David S. Miller
In-Reply-To: <1336352040-28447-1-git-send-email-shawn.guo@linaro.org>
Cc: netdev@vger.kernel.org
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
drivers/net/ethernet/freescale/fec.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index a12b3f5..500c106 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -48,6 +48,7 @@
#include <linux/of_device.h>
#include <linux/of_gpio.h>
#include <linux/of_net.h>
+#include <linux/pinctrl/consumer.h>
#include <asm/cacheflush.h>
@@ -1542,6 +1543,7 @@ fec_probe(struct platform_device *pdev)
struct resource *r;
const struct of_device_id *of_id;
static int dev_id;
+ struct pinctrl *pinctrl;
of_id = of_match_device(fec_dt_ids, &pdev->dev);
if (of_id)
@@ -1609,6 +1611,12 @@ fec_probe(struct platform_device *pdev)
}
}
+ pinctrl = devm_pinctrl_get_select_default(&pdev->dev);
+ if (IS_ERR(pinctrl)) {
+ ret = PTR_ERR(pinctrl);
+ goto failed_pin;
+ }
+
fep->clk = clk_get(&pdev->dev, NULL);
if (IS_ERR(fep->clk)) {
ret = PTR_ERR(fep->clk);
@@ -1639,6 +1647,7 @@ failed_mii_init:
failed_init:
clk_disable_unprepare(fep->clk);
clk_put(fep->clk);
+failed_pin:
failed_clk:
for (i = 0; i < FEC_IRQ_NUM; i++) {
irq = platform_get_irq(pdev, i);
--
1.7.5.4
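
[Editor's illustration] The label placement in the last hunk can be
modelled in userspace as below. This is only a sketch of the unwinding
order, under the assumption (taken from the hunk context) that the IRQs
are requested before the pinctrl lookup and the clock is acquired after
it; probe_sketch and struct probe_log are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Records which cleanup steps ran, so the unwinding order can be checked. */
struct probe_log {
    int irqs_freed;
    int clk_released;
};

static int probe_sketch(int pin_ok, int clk_ok, int init_ok,
                        struct probe_log *log)
{
    int ret;

    assert(log != NULL);

    /* IRQs are assumed to have been requested before this point. */
    if (!pin_ok) {
        ret = -1;
        goto failed_pin;
    }
    if (!clk_ok) {
        ret = -2;
        goto failed_clk;
    }
    if (!init_ok) {
        ret = -3;
        goto failed_init;
    }
    return 0;

failed_init:
    log->clk_released = 1; /* the clock was acquired, so undo it */
failed_pin:
failed_clk:
    log->irqs_freed = 1;   /* pin and clock failures only undo the IRQs */
    return ret;
}
```

Placing failed_pin next to failed_clk means a pinctrl failure skips the
clock cleanup, which is correct because the clock has not been acquired
at that point.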
* [PATCH RESEND 0/5] Adopt pinctrl support for a few outstanding imx drivers
From: Shawn Guo @ 2012-05-07 0:53 UTC
To: linux-arm-kernel
Cc: Arnd Bergmann, Olof Johansson, Sascha Hauer, Dong Aisheng,
Shawn Guo, spi-devel-general, Grant Likely, linux-i2c,
Wolfram Sang, linux-can, Marc Kleine-Budde, netdev,
David S. Miller, linux-serial, Greg Kroah-Hartman
With patch 5b3aa5f (pinctrl: add pinctrl_provide_dummies interface for
platforms to use) applied on pinctrl tree, and patch "ARM: imx: enable
pinctrl dummy states" [1] being there, we are ready to adopt pinctrl
API for imx drivers. So let's start from a few outstanding ones.
I would expect to ask Arnd and Olof to pull pinctrl tree into arm-soc
as a dependency and then have series [1] and this patch set go through
arm-soc tree to ease the merge process.
Resend to have subsystem lists Cc-ed.
Regards,
Shawn
[1] http://thread.gmane.org/gmane.linux.kernel.mmc/14180
Shawn Guo (5):
tty: serial: imx: adopt pinctrl support
net: fec: adopt pinctrl support
can: flexcan: adopt pinctrl support
i2c: imx: adopt pinctrl support
spi/imx: adopt pinctrl support
drivers/i2c/busses/i2c-imx.c | 8 ++++++++
drivers/net/can/flexcan.c | 6 ++++++
drivers/net/ethernet/freescale/fec.c | 9 +++++++++
drivers/spi/spi-imx.c | 8 ++++++++
drivers/tty/serial/imx.c | 8 ++++++++
5 files changed, 39 insertions(+), 0 deletions(-)
--
1.7.5.4
* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-06 22:57 UTC
To: Hans Schillstrom
Cc: kaber@trash.net, jengelh@medozas.de,
netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
hans@schillstrom.com
In-Reply-To: <201205021949.48741.hans.schillstrom@ericsson.com>
Hi Hans,
[...]
> > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > hashing as well. Thus, we handle ICMP like other protocols.
> > >
> > > Yes why not, I can give it a try.
> > >
>
> I think we wait with this one..
I see. This is easy to add for the conntrack side, but it will require
some extra code for the packet-based solution.
Not directly related to this, but I know that your intention is to
make this as flexible as possible. However, I still don't see how I
would use the port mask feature in any of my setups. Basically, I
can't come up with any useful example for this situation.
I'm also saying this because I think that ICMP support will be
easier to add if port masking is removed.
[...]
> This is what I have done.
>
> - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> (it's not set in the rtuple)
Good one, this made the code even smaller.
> - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
Not really, you don't need it for the conntrack part. The original tuple
is always the same, no matter where the packet is coming from. I have
removed this again so it only affects packet-based hashing.
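
[Editor's illustration] A userspace sketch of why the swap matters for
packet-based hashing: ordering the two addresses before hashing makes
both directions of a flow produce the same mark. The mixing function
below is a generic 32-bit finalizer used as a stand-in for the kernel's
jhash_3words(), and tuple_sketch and symmetric_mark are hypothetical
names; ports are left out to keep the example short:

```c
#include <assert.h>
#include <stdint.h>

struct tuple_sketch {
    uint32_t src;
    uint32_t dst;
    uint8_t  proto;
};

/* Generic 32-bit mixer; a stand-in for jhash, not the kernel function. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Map a tuple to a mark in [offset, offset + modulus), ignoring direction. */
static uint32_t symmetric_mark(struct tuple_sketch t, uint32_t seed,
                               uint32_t modulus, uint32_t offset)
{
    uint32_t hash;

    assert(modulus != 0);

    /* Order the addresses so that A->B and B->A hash identically. */
    if (t.dst < t.src) {
        uint32_t tmp = t.src;

        t.src = t.dst;
        t.dst = tmp;
    }

    hash = mix32(t.src ^ seed);
    hash = mix32(hash ^ t.dst);
    hash ^= t.proto;

    return (hash % modulus) + offset;
}
```

For the conntrack case the original-direction tuple is the same for
packets in both directions, so the ordering step adds nothing there.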
> - Moved the L3 check a little bit earlier.
good.
> - changed return values for fragments.
With this, you're giving up on trying to classify fragments. Do you
really want this?
From my point of view, if your firewalls (assuming they are the HMARK
classification) are stateless, it still makes sense to me to classify
fragments using the XT_HMARK_METHOD_L3_4.
> - Added nhoffs to: hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
> to get icmp working
good catch.
Below, some minor changes that I made to your patch (you can find a
new version attached to this email).
[...]
> +#ifndef XT_HMARK_H_
> +#define XT_HMARK_H_
> +
> +#include <linux/types.h>
> +
> +enum {
> + XT_HMARK_NONE,
> + XT_HMARK_SADR_AND,
> + XT_HMARK_DADR_AND,
> + XT_HMARK_SPI_AND,
> + XT_HMARK_SPI_OR,
> + XT_HMARK_SPORT_AND,
> + XT_HMARK_DPORT_AND,
> + XT_HMARK_SPORT_OR,
> + XT_HMARK_DPORT_OR,
> + XT_HMARK_PROTO_AND,
> + XT_HMARK_RND,
> + XT_HMARK_MODULUS,
> + XT_HMARK_OFFSET,
> + XT_HMARK_CT,
> + XT_HMARK_METHOD_L3,
> + XT_HMARK_METHOD_L3_4,
> + XT_F_HMARK_SADR_AND = 1 << XT_HMARK_SADR_AND,
> + XT_F_HMARK_DADR_AND = 1 << XT_HMARK_DADR_AND,
> + XT_F_HMARK_SPI_AND = 1 << XT_HMARK_SPI_AND,
> + XT_F_HMARK_SPI_OR = 1 << XT_HMARK_SPI_OR,
> + XT_F_HMARK_SPORT_AND = 1 << XT_HMARK_SPORT_AND,
> + XT_F_HMARK_DPORT_AND = 1 << XT_HMARK_DPORT_AND,
> + XT_F_HMARK_SPORT_OR = 1 << XT_HMARK_SPORT_OR,
> + XT_F_HMARK_DPORT_OR = 1 << XT_HMARK_DPORT_OR,
> + XT_F_HMARK_PROTO_AND = 1 << XT_HMARK_PROTO_AND,
> + XT_F_HMARK_RND = 1 << XT_HMARK_RND,
> + XT_F_HMARK_MODULUS = 1 << XT_HMARK_MODULUS,
> + XT_F_HMARK_OFFSET = 1 << XT_HMARK_OFFSET,
> + XT_F_HMARK_CT = 1 << XT_HMARK_CT,
> + XT_F_HMARK_METHOD_L3 = 1 << XT_HMARK_METHOD_L3,
> + XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
I've defined:
#define XT_HMARK_FLAG(flag) (1 << flag)
So we save all those extra _F_ definitions, they look redundant.
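
[Editor's illustration] A small sketch of how a single flag macro can
replace the per-option XT_F_ constants. The enum repeats the option
order quoted above; the macro here parenthesizes its argument and uses
an unsigned constant, which is a defensive variation on the spelling in
this message, and hmark_has is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

enum {
    XT_HMARK_NONE,
    XT_HMARK_SADR_AND,
    XT_HMARK_DADR_AND,
    XT_HMARK_SPI_AND,
    XT_HMARK_SPI_OR,
    XT_HMARK_SPORT_AND,
    XT_HMARK_DPORT_AND,
    XT_HMARK_SPORT_OR,
    XT_HMARK_DPORT_OR,
    XT_HMARK_PROTO_AND,
    XT_HMARK_RND,
    XT_HMARK_MODULUS,
    XT_HMARK_OFFSET,
    XT_HMARK_CT,
    XT_HMARK_METHOD_L3,
    XT_HMARK_METHOD_L3_4,
};

/* One macro derives the bit from the option index. */
#define XT_HMARK_FLAG(flag) (1u << (flag))

/* Hypothetical helper: test whether an option bit is set in a flags word. */
static int hmark_has(uint32_t flags, int opt)
{
    assert(opt >= 0 && opt < 32);
    return (flags & XT_HMARK_FLAG(opt)) != 0;
}
```

A rule configured with conntrack hashing and a modulus would then carry
XT_HMARK_FLAG(XT_HMARK_CT) | XT_HMARK_FLAG(XT_HMARK_MODULUS) in its
flags word.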
[...]
> diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
> new file mode 100644
> index 0000000..76a3fa7
> --- /dev/null
> +++ b/net/netfilter/xt_HMARK.c
> +/*
> + * xt_HMARK - Netfilter module to set mark as hash value
> + *
> + * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
> + * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * Description:
> + *
> + * This module calculates a hash value that can be modified by modulus and an
> + * offset, i.e. it is possible to produce a skb->mark within a range The hash
> + * value is based on a direction independent five tuple: src & dst addr src &
> + * dst ports and protocol.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/icmp.h>
> +
> +#include <linux/netfilter/x_tables.h>
> +#include <linux/netfilter/xt_HMARK.h>
> +
> +#include <net/ip.h>
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +#include <net/netfilter/nf_conntrack.h>
> +#endif
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +#include <net/ipv6.h>
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +#endif
> +
> +
I removed this extra blank line above.
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
> +MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
> +MODULE_ALIAS("ipt_HMARK");
> +MODULE_ALIAS("ip6t_HMARK");
> +
> +struct hmark_tuple {
> + u32 src;
> + u32 dst;
> + union hmark_ports uports;
> + uint8_t proto;
> +};
> +
> +static int
> +hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
> + const struct xt_hmark_info *info);
> +static inline u32
> +hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
> +{
> + u32 hash;
> +
> + if (t->dst < t->src)
> + swap(t->src, t->dst);
> +
> + hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
> + hash = hash ^ (t->proto & info->proto_mask);
> +
> + return (hash % info->hmodulus) + info->hoffset;
> +}
> +
> +static void
> +hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
> + struct hmark_tuple *t, const struct xt_hmark_info *info)
> +{
> + int protoff;
> +
> + protoff = proto_ports_offset(t->proto);
> + if (protoff < 0)
> + return;
> +
> + nhoff += protoff;
> + if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
> + return;
> +
> + if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
> + t->uports.v32 = (t->uports.v32 & info->spi_mask) |
> + info->spi_set;
> + else {
> + t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
> + info->port_set.v32;
> +
> + if (t->uports.p16.dst < t->uports.p16.src)
> + swap(t->uports.p16.dst, t->uports.p16.src);
> + }
> +}
> +
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
> +{
> + struct icmp6hdr *icmp6h, _ih6;
> +
> + icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
> + if (icmp6h == NULL)
> + return 0;
> +
> + if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
> + *offset += sizeof(struct icmp6hdr);
> + return 1;
> + }
> + return 0;
> +}
> +
> +static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
> +{
> + return (addr32[0] & mask[0]) ^
> + (addr32[1] & mask[1]) ^
> + (addr32[2] & mask[2]) ^
> + (addr32[3] & mask[3]);
> +}
> +
> +static int
> +hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
> + const struct xt_hmark_info *info)
> +{
> + struct ipv6hdr *ip6, _ip6;
> + int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
> + unsigned int nhoff = 0;
> + u16 fragoff = 0;
> + int nexthdr;
> +
> + ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
> + nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
> + if (nexthdr < 0)
> + return 0;
> + /* No need to check for icmp errors on fragments */
> + if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
> + goto noicmp;
> + /* if an icmp error, use the inner header */
> + if (get_inner6_hdr(skb, &nhoff)) {
> + ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
> + if (ip6 == NULL)
> + return -1;
> + /* Treat AH as ESP, use SPI nothing else. */
> + flag = IP6T_FH_F_AUTH;
> + nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
> + if (nexthdr < 0)
> + return -1;
> + }
> +noicmp:
> + t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
> + t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
> +
> + if (info->flags & XT_F_HMARK_METHOD_L3)
> + return 0;
> +
> + t->proto = nexthdr;
> +
> + if (t->proto == IPPROTO_ICMPV6)
> + return 0;
> +
> + if (flag & IP6T_FH_F_FRAG)
> + return -1;
> +
> + hmark_set_tuple_ports(skb, nhoff, t, info);
> +
> + return 0;
> +}
> +
> +static unsigned int
> +hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> + const struct xt_hmark_info *info = par->targinfo;
> + struct hmark_tuple t;
> +
> + memset(&t, 0, sizeof(struct hmark_tuple));
> +
> + if (info->flags & XT_F_HMARK_CT) {
> + if (hmark_ct_set_htuple(skb, &t, info) < 0)
> + return XT_CONTINUE;
> + } else {
> + if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
> + return XT_CONTINUE;
> + }
> +
> + skb->mark = hmark_hash(&t, info);
> + return XT_CONTINUE;
> +}
> +
> +static inline u32
> +hmark_addr_any_mask(int l3num, const __u32 *addr32, const __u32 *mask)
> +{
> + if (l3num == AF_INET)
> + return *addr32 & *mask;
> +
> + return hmark_addr6_mask(addr32, mask);
> +}
> +#else
> +static inline u32
> +hmark_addr_any_mask(int l3num, const __u32 *addr32, const __u32 *mask)
> +{
> + return *addr32 & *mask;
> +}
> +
> +#endif
This is ugly. I don't think you will find anything similar in any
section of the Netfilter code. I have declared this function outside
the #ifdef section: these are static inline, so the compiler will
discard them if unused, with no further complaint.
Please find a new takeover patch enclosed.
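On the static inline point above, a minimal user-space sketch, assuming GCC's behaviour (its -Wunused-function warning does not cover unused static inline functions; other compilers may differ). FAMILY_V4, FAMILY_V6, addr6_mask() and addr_mask() are illustrative names, not the ones in the patch.

```c
#include <stdint.h>

#define FAMILY_V4 2	/* illustrative stand-ins for AF_INET / AF_INET6 */
#define FAMILY_V6 10

/* Declared unconditionally: harmless if no caller ends up compiled in. */
static inline uint32_t addr6_mask(const uint32_t *addr32, const uint32_t *mask)
{
	return (addr32[0] & mask[0]) ^
	       (addr32[1] & mask[1]) ^
	       (addr32[2] & mask[2]) ^
	       (addr32[3] & mask[3]);
}

/* One definition for both families, so no #if/#else pair is needed. */
static inline uint32_t addr_mask(int family, const uint32_t *addr32,
				 const uint32_t *mask)
{
	switch (family) {
	case FAMILY_V4:
		return addr32[0] & mask[0];
	case FAMILY_V6:
		return addr6_mask(addr32, mask);
	}
	return 0;
}
```

The trade-off is a run-time switch on the family instead of a compile-time choice, in exchange for a single definition to maintain.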
[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-diff, Size: 13655 bytes --]
From d5065af3988cc7561a02f30bae8342e1a89126a4 Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Wed, 2 May 2012 07:49:47 +0000
Subject: netfilter: add xt_hmark target for hash-based skb
marking
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.
Some examples:
* Default rule handles all TCP, UDP, SCTP, ESP & AH
iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
-j HMARK --hmark-offset 10000 --hmark-mod 10
* Handle SCTP and hash dest port only and produce a nfmark between 100-119.
iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
--sp-mask 0 --offset 100 --mod 20
* Fragment-safe, Layer 3 only, which keeps a class C network flow together
iptables -t mangle -A PREROUTING -j HMARK --method L3 \
--src-mask 24 --mod 20 --offset 100
[ A big part of this patch has been refactored by Pablo Neira Ayuso ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
include/linux/netfilter/xt_HMARK.h | 48 +++++
net/netfilter/Kconfig | 15 ++
net/netfilter/Makefile | 1 +
net/netfilter/xt_HMARK.c | 358 ++++++++++++++++++++++++++++++++++++
4 files changed, 422 insertions(+)
create mode 100644 include/linux/netfilter/xt_HMARK.h
create mode 100644 net/netfilter/xt_HMARK.c
diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+ XT_HMARK_NONE,
+ XT_HMARK_SADR_AND,
+ XT_HMARK_DADR_AND,
+ XT_HMARK_SPI_AND,
+ XT_HMARK_SPI_OR,
+ XT_HMARK_SPORT_AND,
+ XT_HMARK_DPORT_AND,
+ XT_HMARK_SPORT_OR,
+ XT_HMARK_DPORT_OR,
+ XT_HMARK_PROTO_AND,
+ XT_HMARK_RND,
+ XT_HMARK_MODULUS,
+ XT_HMARK_OFFSET,
+ XT_HMARK_CT,
+ XT_HMARK_METHOD_L3,
+ XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag) (1 << flag)
+
+union hmark_ports {
+ struct {
+ __u16 src;
+ __u16 dst;
+ } p16;
+ __u32 v32;
+};
+
+struct xt_hmark_info {
+ union nf_inet_addr src_mask; /* Source address mask */
+ union nf_inet_addr dst_mask; /* Dest address mask */
+ union hmark_ports port_mask;
+ union hmark_ports port_set;
+ __u32 spi_mask;
+ __u32 spi_set;
+ __u32 flags; /* Print out only */
+ __u16 proto_mask; /* L4 Proto mask */
+ __u32 hashrnd;
+ __u32 hmodulus; /* Modulus */
+ __u32 hoffset; /* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
since you can easily create immortal packets that loop
forever on the network.
+config NETFILTER_XT_TARGET_HMARK
+ tristate '"HMARK" target support'
+ depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+ depends on NETFILTER_ADVANCED
+ ---help---
+ This option adds the "HMARK" target.
+
+ The target allows you to create rules in the "raw" and "mangle" tables
+ which set the skbuff mark by means of hash calculation within a given
+ range. The nfmark can influence the routing method (see "Use netfilter
+ MARK value as routing key") and can also be used by other subsystems to
+ change their behaviour.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_TARGET_IDLETIMER
tristate "IDLETIMER target support"
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..b4aa912
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,358 @@
+/*
+ * xt_HMARK - Netfilter module to set mark by means of hashing
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+ u32 src;
+ u32 dst;
+ union hmark_ports uports;
+ uint8_t proto;
+};
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+ return (addr32[0] & mask[0]) ^
+ (addr32[1] & mask[1]) ^
+ (addr32[2] & mask[2]) ^
+ (addr32[3] & mask[3]);
+}
+
+static inline u32
+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+ switch (l3num) {
+ case AF_INET:
+ return *addr32 & *mask;
+ case AF_INET6:
+ return hmark_addr6_mask(addr32, mask);
+ }
+ return 0;
+}
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+ const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+ struct nf_conntrack_tuple *otuple;
+ struct nf_conntrack_tuple *rtuple;
+
+ if (ct == NULL || nf_ct_is_untracked(ct))
+ return -1;
+
+ otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+ rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+ t->src = hmark_addr_mask(otuple->src.l3num, otuple->src.u3.all,
+ info->src_mask.all);
+ t->dst = hmark_addr_mask(otuple->src.l3num, rtuple->src.u3.all,
+ info->dst_mask.all);
+
+ if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+ return 0;
+
+ t->proto = nf_ct_protonum(ct);
+ if (t->proto != IPPROTO_ICMP) {
+ t->uports.p16.src = otuple->src.u.all;
+ t->uports.p16.dst = rtuple->src.u.all;
+ t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+ info->port_set.v32;
+ }
+
+ return 0;
+#else
+ return -1;
+#endif
+}
+
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+ u32 hash;
+
+ hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+ hash = hash ^ (t->proto & info->proto_mask);
+
+ return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+ struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+ int protoff;
+
+ protoff = proto_ports_offset(t->proto);
+ if (protoff < 0)
+ return;
+
+ nhoff += protoff;
+ if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+ return;
+
+ if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
+ t->uports.v32 = (t->uports.v32 & info->spi_mask) |
+ info->spi_set;
+ else {
+ t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+ info->port_set.v32;
+
+ if (t->uports.p16.dst < t->uports.p16.src)
+ swap(t->uports.p16.dst, t->uports.p16.src);
+ }
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+ struct icmp6hdr *icmp6h, _ih6;
+
+ icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+ if (icmp6h == NULL)
+ return 0;
+
+ if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+ *offset += sizeof(struct icmp6hdr);
+ return 1;
+ }
+ return 0;
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+ const struct xt_hmark_info *info)
+{
+ struct ipv6hdr *ip6, _ip6;
+ int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+ unsigned int nhoff = 0;
+ u16 fragoff = 0;
+ int nexthdr;
+
+ ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+ nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+ if (nexthdr < 0)
+ return 0;
+ /* No need to check for icmp errors on fragments */
+ if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+ goto noicmp;
+ /* if an icmp error, use the inner header */
+ if (get_inner6_hdr(skb, &nhoff)) {
+ ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+ if (ip6 == NULL)
+ return -1;
+ /* Treat AH as ESP, use SPI nothing else. */
+ flag = IP6T_FH_F_AUTH;
+ nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+ if (nexthdr < 0)
+ return -1;
+ }
+noicmp:
+ t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+ t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+ if (t->dst < t->src)
+ swap(t->src, t->dst);
+
+ if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+ return 0;
+
+ t->proto = nexthdr;
+
+ if (t->proto == IPPROTO_ICMPV6)
+ return 0;
+
+ if (flag & IP6T_FH_F_FRAG)
+ return -1;
+
+ hmark_set_tuple_ports(skb, nhoff, t, info);
+
+ return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+ const struct xt_hmark_info *info = par->targinfo;
+ struct hmark_tuple t;
+
+ memset(&t, 0, sizeof(struct hmark_tuple));
+
+ if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+ if (hmark_ct_set_htuple(skb, &t, info) < 0)
+ return XT_CONTINUE;
+ } else {
+ if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+ return XT_CONTINUE;
+ }
+
+ skb->mark = hmark_hash(&t, info);
+ return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+ const struct icmphdr *icmph;
+ struct icmphdr _ih;
+
+ /* Not enough header? */
+ icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+ if (icmph == NULL || icmph->type > NR_ICMP_TYPES)
+ return 0;
+
+ /* Error message? */
+ if (icmph->type != ICMP_DEST_UNREACH &&
+ icmph->type != ICMP_SOURCE_QUENCH &&
+ icmph->type != ICMP_TIME_EXCEEDED &&
+ icmph->type != ICMP_PARAMETERPROB &&
+ icmph->type != ICMP_REDIRECT)
+ return 0;
+
+ *nhoff += iphsz + sizeof(_ih);
+ return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+ const struct xt_hmark_info *info)
+{
+ struct iphdr *ip, _ip;
+ int nhoff = skb_network_offset(skb);
+
+ ip = (struct iphdr *) (skb->data + nhoff);
+ if (ip->protocol == IPPROTO_ICMP) {
+ /* use inner header in case of ICMP errors */
+ if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+ ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+ if (ip == NULL)
+ return -1;
+ }
+ }
+
+ t->src = (__force u32) ip->saddr;
+ t->dst = (__force u32) ip->daddr;
+
+ t->src &= info->src_mask.ip;
+ t->dst &= info->dst_mask.ip;
+
+ if (t->dst < t->src)
+ swap(t->src, t->dst);
+
+ if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+ return 0;
+
+ t->proto = ip->protocol;
+
+ /* ICMP has no ports, skip */
+ if (t->proto == IPPROTO_ICMP)
+ return 0;
+
+ /* follow-up fragments don't contain ports, skip */
+ if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+ return -1;
+
+ hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+ return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+ const struct xt_hmark_info *info = par->targinfo;
+ struct hmark_tuple t;
+
+ memset(&t, 0, sizeof(struct hmark_tuple));
+
+ if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+ if (hmark_ct_set_htuple(skb, &t, info) < 0)
+ return XT_CONTINUE;
+ } else {
+ if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+ return XT_CONTINUE;
+ }
+
+ skb->mark = hmark_hash(&t, info);
+ return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+ const struct xt_hmark_info *info = par->targinfo;
+
+ if (!info->hmodulus) {
+ pr_info("xt_HMARK: hash modulus can't be zero\n");
+ return -EINVAL;
+ }
+ if (info->proto_mask &&
+ (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
+ pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+ {
+ .name = "HMARK",
+ .family = NFPROTO_IPV4,
+ .target = hmark_tg_v4,
+ .targetsize = sizeof(struct xt_hmark_info),
+ .checkentry = hmark_tg_check,
+ .me = THIS_MODULE,
+ },
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+ {
+ .name = "HMARK",
+ .family = NFPROTO_IPV6,
+ .target = hmark_tg_v6,
+ .targetsize = sizeof(struct xt_hmark_info),
+ .checkentry = hmark_tg_check,
+ .me = THIS_MODULE,
+ },
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+ return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+ xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
--
1.7.9.5
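For readers following the patch above, the two properties it relies on can be modelled in user space: ordering source and destination before hashing makes the result independent of direction, and (hash % modulus) + offset confines the mark to the range offset .. offset + modulus - 1. This is only a sketch under assumptions: mix32() is a simple stand-in, not the kernel's jhash_3words(), and struct tuple and mark_for() are illustrative names.

```c
#include <stdint.h>

struct tuple {
	uint32_t src;
	uint32_t dst;
	uint16_t sport;
	uint16_t dport;
};

/* Stand-in mixer (NOT jhash): any deterministic 32-bit hash works here. */
static uint32_t mix32(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed ^ 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 16);
}

/* Caller must pass modulus != 0, as hmark_tg_check() enforces in the patch. */
static uint32_t mark_for(struct tuple t, uint32_t seed,
			 uint32_t modulus, uint32_t offset)
{
	uint32_t tmp32;
	uint16_t tmp16;

	/* Order each pair so that A->B and B->A give the same hash input. */
	if (t.dst < t.src) {
		tmp32 = t.src;
		t.src = t.dst;
		t.dst = tmp32;
	}
	if (t.dport < t.sport) {
		tmp16 = t.sport;
		t.sport = t.dport;
		t.dport = tmp16;
	}

	return mix32(t.src, t.dst, ((uint32_t)t.sport << 16) | t.dport, seed)
		% modulus + offset;
}
```

With a modulus of 20 and an offset of 100, both directions of a flow get the same mark, and that mark lies between 100 and 119.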
* Re: [PATCH] sky2: override for PCI legacy power management
From: Jonathan Nieder @ 2012-05-06 22:26 UTC (permalink / raw)
To: Knut Petersen
Cc: Bjorn Helgaas, Stephen Hemminger, David S. Miller, Linus Torvalds,
arekm, Jared, dilieto, linux-kernel, netdev
In-Reply-To: <4F6BA960.6020003@t-online.de>
Knut Petersen wrote, a few months ago:
> It's easy to dmi_match() known broken systems - I have dmidecode
> outputs of four systems that definitely need the patch.
>
> Two ASUSTek P5* mainboards with AMI BIOSes, two AOpen i915G*
> mainboards with Award/Phoenix BIOSes.
Yes, please. Could you attach those to [1]?
Thanks,
Jonathan
[1] https://bugzilla.kernel.org/show_bug.cgi?id=19492
* RE: [PATCH net-next 1/3] Add capability to retrieve plug-in module EEPROM
From: Ben Hutchings @ 2012-05-06 22:05 UTC (permalink / raw)
To: Yaniv Rosner
Cc: smhodgson@solarflare.com, netdev@vger.kernel.org,
bruce.w.allan@intel.com, decot@google.com,
alexander.h.duyck@intel.com, linux-kernel@vger.kernel.org,
David Miller
In-Reply-To: <A6AA2EF896BED345B0F23520D832A4B106AD8E@SJEXCHMB05.corp.ad.broadcom.com>
On Sun, 2012-05-06 at 08:02 +0000, Yaniv Rosner wrote:
> > On Mon, 2012-04-23 at 17:28 -0400, David Miller wrote:
> > > You can't just submit three separate patches each with the same exact
> > > Subject line.
> > >
> > > Otherwise someone scanning the commit headers can't figure out what
> > > is different in each of these changes.
> > >
> > > There also is no signoff from Ben for patches #2 or #3, did he review
> > > them? If so, why didn't he ACK or sign off on it? If not, why not?
> >
> > Sorry, I've been busy with another project. I'll reply to Stuart's
> > patches faster next round.
> >
> > Ben.
>
> Hi Stuart,
> It's been 3 weeks since you sent the initial patch series which were not
> accepted. Please reply if you're going to resubmit fixed patches soon,
> otherwise I'll take over, and complete this task from where you left it.
I'll be making a pull request including Stuart's patches in the next few
days.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [GIT PULL net-next v2] IPVS
From: Pablo Neira Ayuso @ 2012-05-06 21:56 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer
In-Reply-To: <1335939762-1912-1-git-send-email-horms@verge.net.au>
Hi Simon,
On Wed, May 02, 2012 at 03:22:24PM +0900, Simon Horman wrote:
> Hi Pablo,
>
> please consider the following 18 changes for 3.5.
>
> This replaces a 17 patch pull request that I sent earlier today by adding a
> patch from Hans to the head of the request. You should be able to
> just pull again if you have already pulled the original series and not
> applied anything else on top.
>
> Please note that there will be a conflict with the following when merged
> with net-next due to the following change which is already present in
> David's tree. It should be a trivial merge but please let me know if you
> want to handle this another way.
>
> 4a17fd5 sock: Introduce named constants for sk_reuse
>
> The following changes since commit c3dc836d807a9b9855eefe535fdcbcf7cbb7a574:
>
> netfilter: bridge: optionally set indev to vlan (2012-04-24 01:22:44 +0200)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git master
Pulled, thanks.
I have fixed some minor glitches, see below.
> for you to fetch changes up to 3de4b8ca4ffa3dafaeef985fabe152a1717a8ea0:
>
> export sysctl symbols needed by ip_vs_sync (2012-05-02 14:48:55 +0900)
>
> ----------------------------------------------------------------
> H Hartley Sweeten (2):
> IPVS: ip_vs_ftp.c: local functions should not be exposed globally
> IPVS: ip_vs_proto.c: local functions should not be exposed globally
^^^^
Please, use consistent tagging across patches (sometimes this is in
uppercase, sometimes lowercase).
I think using the same tagging is good for grepping for changes.
I have also removed the CC tag to David Miller in those two patches
above from H Hartley; they are no longer useful.
> Hans Schillstrom (1):
> export sysctl symbols needed by ip_vs_sync
^^^
This one was missing a "net: " or "sock: " tag at the beginning. I added
"net: " here.
I have manually fixed this, but you'll have to rebase your tree, sorry.
* Re: [PATCH 18/18] export sysctl symbols needed by ip_vs_sync
From: David Miller @ 2012-05-06 21:19 UTC (permalink / raw)
To: pablo
Cc: horms, lvs-devel, netdev, netfilter-devel, wensong, ja,
hans.schillstrom, brouer
In-Reply-To: <20120506205412.GA22406@1984>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 6 May 2012 22:54:12 +0200
> Hi David,
>
> The IPVS people need this patch for net-next so that they can tune the
> socket buffer for ipvs_sync (the state synchronization that they do
> from kernel-space). This exports sysctl_wmem_max and sysctl_rmem_max
> living in net/core/sock.c. So far, they've been using global socket
> tuning to make them bigger (this avoids overrunning the socket under
> high peaks of state-change synchronization).
>
> I think this is out of my scope (since it's out of the netfilter
> tree).
>
> Would you acknowledge it, please?
Feel free to merge it via your tree:
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [PATCH 18/18] export sysctl symbols needed by ip_vs_sync
From: Pablo Neira Ayuso @ 2012-05-06 20:54 UTC (permalink / raw)
To: davem
Cc: Simon Horman, lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer
In-Reply-To: <1335939762-1912-19-git-send-email-horms@verge.net.au>
Hi David,
The IPVS people need this patch for net-next so that they can tune the
socket buffer for ipvs_sync (the state synchronization that they do
from kernel-space). This exports sysctl_wmem_max and sysctl_rmem_max
living in net/core/sock.c. So far, they've been using global socket
tuning to make them bigger (this avoids overrunning the socket under
high peaks of state-change synchronization).
I think this is out of my scope (since it's out of the netfilter
tree).
Would you acknowledge it, please?
On Wed, May 02, 2012 at 03:22:42PM +0900, Simon Horman wrote:
> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
>
> To build ip_vs as a module sysctl_rmem_max and sysctl_wmem_max
> needs to be exported.
> The dependency was added by "ipvs: wakeup master thread" patch
>
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
> net/core/sock.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index c7e60ea..ac3131a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -258,7 +258,9 @@ static struct lock_class_key af_callback_keys[AF_MAX];
>
> /* Run time adjustable parameters. */
> __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
> +EXPORT_SYMBOL(sysctl_wmem_max);
> __u32 sysctl_rmem_max __read_mostly = SK_RMEM_MAX;
> +EXPORT_SYMBOL(sysctl_rmem_max);
> __u32 sysctl_wmem_default __read_mostly = SK_WMEM_MAX;
> __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
>
> --
> 1.7.10
>
* Re: [PATCH 01/13 v4] usb/net: rndis: inline the cpu_to_le32() macro
From: Jussi Kivilinna @ 2012-05-06 18:23 UTC (permalink / raw)
To: Linus Walleij
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
Greg Kroah-Hartman, David S. Miller, Felipe Balbi, Haiyang Zhang,
Wei Yongjun, Ben Hutchings
In-Reply-To: <CACRpkdY0t2UvU9vX4kPxV+BPOaerOWw4UKzTe_RjLOpYr2akDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Quoting Linus Walleij <linus.walleij-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>:
> On Wed, May 2, 2012 at 5:29 PM, Jussi Kivilinna
> <jussi.kivilinna-E01nCVcF24I@public.gmane.org> wrote:
>
>> Quoting Linus Walleij <linus.walleij-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>:
>>
>>> The header file <linux/usb/rndis_host.h> used a number of #defines
>>> that included the cpu_to_le32() macro to assure the result will be
>>> in LE endianness. Inlining this into the code instead of using it
>>> in the code definitions yields consolidation opportunities later
>>> on as you will see in the following patches. The individual
>>> drivers also used local defines - all are switched over to the
>>> pattern of doing the conversion at the call sites instead.
>>>
>>
>> After this patch, endianness checks with sparse output:
> (...)
>> Patch fixing this attached.
>
> Thanks! Folded this into patch 1 and added your Signed-off-by.
>
>> Patch-set to clean-up ugliness caused by this patch at:
>> http://koti.mbnet.fi/axh/kernel/rndis_wlan/
>
> This seems like a good middle-ground as compared to the
> other suggestion to force all defines to be cpu_to_le32().
>
> Do you want me to rebase this on top of my series (there was
> a number of conflicts later in the series) and carry it as part
> of this patch set?
Please, do.
>
> Yours,
> Linus Walleij
>
>
* Re: [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors
From: Michael S. Tsirkin @ 2012-05-06 17:01 UTC (permalink / raw)
To: David Miller; +Cc: ian.campbell, netdev, eric.dumazet
In-Reply-To: <20120504.025433.1474691040952890731.davem@davemloft.net>
On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Fri, 4 May 2012 00:10:24 +0300
>
> > Hmm we orphan skbs when we loop them back so how about reusing the
> > skb->destructor for this?
>
> That's one possibility.
>
> But I fear we're about to toss Ian into yet another rabbit hole. :-)
>
> Let's try to converge on something quickly as I think integration of
> his work has been delayed enough as-is.
OK I tried doing this and I recalled why we
do the copy with ubufs before clone:
the problem is that shinfo is shared between skbs,
so modifying frags like skb_orphan_frags does is racy.
Stuck for now.
So I have a question: how about reusing the TX_DEV_ZEROCOPY
machinery for this, instead of frag destructors?
Thanks,
--
MST
* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-05-06 17:25 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1336221493-913-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Sat, 5 May 2012 05:38:09 -0700
> This series of patches contains updates for e1000e and ixgbe.
>
> NOTE- The ixgbe patch can and probably should be applied to
> David Miller's net tree as well.
>
> The following are changes since commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24:
> tcp: be more strict before accepting ECN negociation
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
No new changes there?
[davem@drr net-next]$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
From git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
* branch master -> FETCH_HEAD
Already up-to-date.
* Re: [PATCH 3/3] skb: Add inline helper for getting the skb end offset from head
From: David Miller @ 2012-05-06 17:13 UTC (permalink / raw)
To: eric.dumazet; +Cc: alexander.h.duyck, netdev, jeffrey.t.kirsher
In-Reply-To: <1336196361.3752.485.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 05 May 2012 07:39:21 +0200
> On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
>> With the recent changes for how we compute the skb truesize it occurs to me
>> we are probably going to have a lot of calls to skb_end_pointer -
>> skb->head. Instead of running all over the place doing that it would make
>> more sense to just make it a separate inline skb_end_offset(skb) that way
>> we can return the correct value without having gcc having to do all the
>> optimization to cancel out skb->head - skb->head.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
...
> Acked-by: Eric Dumazet <edumazet@google.com>
Applied.
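A user-space model of the helper described in the quoted commit message, assuming a simplified buffer rather than the real struct sk_buff: depending on the build, the end of the buffer is stored either as a pointer or as an offset from head, and in the offset case the helper returns the stored value directly instead of computing head + end - head. struct buf, END_IS_OFFSET, buf_end_pointer() and buf_end_offset() are hypothetical names.

```c
#define END_IS_OFFSET 1	/* hypothetical build switch */

struct buf {
	unsigned char *head;
#if END_IS_OFFSET
	unsigned int end;	/* offset from head */
#else
	unsigned char *end;	/* absolute pointer */
#endif
};

#if END_IS_OFFSET
static inline unsigned char *buf_end_pointer(const struct buf *b)
{
	return b->head + b->end;
}

static inline unsigned int buf_end_offset(const struct buf *b)
{
	return b->end;	/* no pointer arithmetic left to cancel out */
}
#else
static inline unsigned char *buf_end_pointer(const struct buf *b)
{
	return b->end;
}

static inline unsigned int buf_end_offset(const struct buf *b)
{
	return (unsigned int)(b->end - b->head);
}
#endif
```

Callers that only need the size of the buffer use buf_end_offset() in both configurations and never spell out the subtraction themselves.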