Netdev List


* Re: bonding and IPv6 "doesn't work"?
From: David Lamparter @ 2011-07-12 16:36 UTC
  To: Tomasz Chmielewski; +Cc: David Lamparter, netdev
In-Reply-To: <4E1C756C.2010700@wpkg.org>

On Tue, Jul 12, 2011 at 06:25:16PM +0200, Tomasz Chmielewski wrote:
> It's a virtual machine.
> So, a bridge of the host.

Just a bridge? No bonding device? You need to set it up on both sides...

> >> bond0: IPv6 duplicate address 2a01:4f8:120:14c4::1247 detected!
> > [...]
> >> However if I start bonding with just one interface, add IPv6 address to
> >> it, then use ifenslave to add a second interface, I'm able to reach the
> >> hosts in the internet.
> >
> > Yeah, when you add the IPv6 address, IPv6 ND does its job and announces
> > your presence/does DAD.
> 
> Shouldn't this disable DAD? Or am I confusing something here?
> 
> net.ipv6.conf.eth0.accept_dad = 0
> net.ipv6.conf.eth1.accept_dad = 0

Yes, you would need to set
  net.ipv6.conf.bond0.accept_dad = 0
because eth0 and eth1 are not actually participating in the IPv6 stack.
(I'd recommend setting disable_ipv6=1 for them too)
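
A minimal sketch of those settings, assuming the device names from this thread (bond0 with slaves eth0 and eth1). The helper name bond_dad_sysctls is made up for illustration; it only prints sysctl.conf-style lines and changes nothing by itself.

```shell
# Print sysctl.conf-style lines for a bond and its slaves.
# Hypothetical helper: first argument is the bond, the rest are slaves.
bond_dad_sysctls() {
    bond=$1
    shift
    # DAD is handled on the bond device, which owns the IPv6 address.
    echo "net.ipv6.conf.$bond.accept_dad = 0"
    # The slaves do not take part in the IPv6 stack, so switch it off there.
    for slave in "$@"; do
        echo "net.ipv6.conf.$slave.disable_ipv6 = 1"
    done
}
```

The output of `bond_dad_sysctls bond0 eth0 eth1` can be saved to a file and loaded with `sysctl -p FILE`. Note that this only silences the DAD symptom; it does not fix packets being looped back between the links.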

However, you should not be getting that DAD error at all; it indicates
that broadcast/multicast packets that go out on eth0 are looped back on
eth1 (and vice versa). Your bonding setups on the host and the VM don't
seem to match each other.

-David

* RE: Bridging behavior apparently changed around the Fedora 14 time
From: Greg Scott @ 2011-07-12 16:28 UTC
  To: David Lamparter; +Cc: netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <20110712145438.GB909183@jupiter.n2.diac24.net>

> P.S.: you blissfully ignored my "ip neigh add proxy 1.2.3.4" note :)

Sorry - didn't ignore it, just didn't reply back to it.  I'll look into
it. What I've read about this before has all been kind of vague.  Does
this mean I proxy ARP only for IP Address 1.2.3.4?  So somebody sends an
ARP who-has 1.2.3.4, I'll answer with 1.2.3.4 is-at {My MAC Address}?  If
so, then I agree, not nearly as evil as just setting proxy_arp.  

> Whoa. And here I was almost ashamed of running 2.6.38. I'm sorry, but I
> think you need to go bug RedHat.

Yeah, maybe.  OK, probably.  This was such a bizarre problem - I started
with Netfilter and those guys suggested I try here.  At least now I
understand the problem lots better than before. And it's not like I can
just go and update dozens of kernels at dozens of sites all the time
when a new kernel comes out.  

> You totally misunderstood me. I'm suggesting the separate VLAN for your
> servers which have private IPs but which have services exposed to the
> internet (and your clients) on public IPs through NAT.

Ahh - OK.  The challenge with many small sites is economic reality.
That same server that hosts the public ftp and websites also hosts all
the internal Windows file/print services.  It's the only server at this
site, so it has several roles.  I would love to build a real DMZ network
and put all the public facing stuff in there, but I don't have money for
multiple servers.  This will become even more difficult to separate when
we go to virtual servers and clustered hosts.  

> Your H323 stuff is totally unrelated.

Agreed.  Wholeheartedly.  

> Yes. Your problem seems to be between the private-IP clients in your
> network and your private-IP servers if I understand correctly.

Yes.  Dead-bang, right on target.  

> Yes. And because it is a router, it has an IP from the private subnet
> your clients are in. My question was: what device is that IP on?

Ahh - eth1 is the private LAN side, 192.168.10.1.  All the NATed LAN
stuff and all the workstations are in the 192.168.10.0/24 subnet and
connected to eth1.  Eth0 is the Internet side.  The Internet side has
the firewall NIC, a cable, and the Internet router.  That's it.
Everything is connected to the LAN side.  

> No. You're jumping to conclusions. You're affecting the "top" bridge
> device's promiscuity. I would say that the effect you're seeing is in
> the IP stack above it, caused by it now promiscuously handling packets
> that are dropped otherwise.

Well they were sure dropped before I set it to PROMISC mode, that's for
sure. And it all worked with the earlier version.  That's why this feels
like a layer 2 issue.  If it was an IP issue, why didn't it break
several years ago when I first set it up?

Does bridging make everything a little more complex and delicate to set
up?  Well, yeah.  And some of the netfilter stuff has been a moving
target over the years.  

I don't see how ICMP redirects matter.  Comparing
/proc/sys/net/ipv4/conf/*/accept_redirects with this version and an
older one at another site - all identical.  ../all/accept_redirects is
0, the rest are all 1.  Shared media and ARP settings -
/proc/sys/net/ipv4/conf/*/shared_media - all 1 for all interfaces.
There are a zillion arp settings.  Looking at
/proc/sys/net/ipv4/conf/*/*arp* - all are 0 in both the other older site
and this newer site.  

Curiously - at one of my other older sites, apparently br0 is not in
promisc mode.  But I don't think these guys do any of the stick routing
stuff.  I wonder if these guys have the problem but we don't see it
because they never try it?

[root@NSSSS-fw1 ~]# more /sys/class/net/br0/flags
0x1003
[root@NSSSS-fw1 ~]#
[root@NSSSS-fw1 ~]# more /proc/version
Linux version 2.6.32.11-99.fc12.i686.PAE
(mockbuild@x86-05.phx2.fedoraproject.org) (gcc version 4.4.3 20100127
(Red Hat 4.4.3-4) (GCC) )
#1 SMP Mon Apr 5 16:15:03 EDT 2010
[root@NSSSS-fw1 ~]#
[root@NSSSS-fw1 ~]# uname -a
Linux NSSSS-fw1 2.6.32.11-99.fc12.i686.PAE #1 SMP Mon Apr 5 16:15:03 EDT
2010 i686 i686 i386 GNU/Linux
[root@NSSSS-fw1 ~]#

Here is a much older bridged site based on Fedora 9 and I'm sure these
guys use my stick routing stuff.  Look at the difference in ..br0/flags.

[root@lme-fw2 ~]#  more /sys/class/net/br0/flags
0x1103
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# more /proc/version
Linux version 2.6.25-14.fc9.i686 (mockbuild@) (gcc version 4.3.0
20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Thu May 1 06:28:41 EDT 2008
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# uname -a
Linux lme-fw2 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686
i686 i386 GNU/Linux

I can still get my hands on the old box at the site in question.  I
guess it couldn't hurt to fire it up and look at its br0 flags.  
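
The two br0 flags values above differ in a single bit. A small sketch for decoding them, assuming the IFF_* bit values from <linux/if.h> (only a subset is listed here). The function name decode_if_flags is made up for illustration.

```shell
# Decode the hex value read from /sys/class/net/<dev>/flags into names.
# Bit values are the IFF_* constants from <linux/if.h> (subset only).
decode_if_flags() {
    flags=$(( $1 ))
    out=""
    for pair in 0x1:UP 0x2:BROADCAST 0x8:LOOPBACK 0x40:RUNNING \
                0x100:PROMISC 0x200:ALLMULTI 0x400:MASTER 0x800:SLAVE \
                0x1000:MULTICAST
    do
        bit=${pair%%:*}     # text before the colon, e.g. 0x100
        name=${pair#*:}     # text after the colon, e.g. PROMISC
        if [ $(( flags & $bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}
```

With the values shown above, 0x1003 decodes to UP BROADCAST MULTICAST and 0x1103 to UP BROADCAST PROMISC MULTICAST, so the only difference is the promiscuous bit (0x100).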

- Greg

* Re: [PATCH net-next v2 3/7] r8169: adjust the settings about RxConfig
From: Francois Romieu @ 2011-07-12 16:12 UTC
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309939088-31994-3-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> Set the init value before reset in probe function. And then just
> modify the relative bits and keep the init settings.

It breaks my old PCI Netgear 8110s (RTL_GIGA_MAC_VER_03/XID 04000000). Once
the device is up, RxConfig is changed from 0x0000e70e to 0x0000000e (missed
write ?).

Is there any side effect / objection if this patch is removed from the
series and scheduled for a later time ?

If the current working code is kept as is, the 8168c/cp would still see its
RxConfig:bit 13 (RxHalfRefetch) enabled.
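
To see which RxConfig bits were lost between the two values reported above, the XOR of the two words can be listed bit by bit. A minimal sketch; the function name changed_bits is made up for illustration and the bit numbers are not mapped to register field names here.

```shell
# List the bit positions that differ between two register values.
changed_bits() {
    diff=$(( $1 ^ $2 ))
    bit=0
    out=""
    while [ "$diff" -ne 0 ]; do
        if [ $(( diff & 1 )) -ne 0 ]; then
            out="${out:+$out }$bit"
        fi
        diff=$(( diff >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}
```

For 0x0000e70e and 0x0000000e this prints 8 9 10 13 14 15, i.e. the whole 0xe700 part of the register was cleared.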

-- 
Ueimor

* Re: bonding and IPv6 "doesn't work"?
From: Tomasz Chmielewski @ 2011-07-12 16:25 UTC
  To: David Lamparter; +Cc: netdev
In-Reply-To: <20110712161455.GD909183@jupiter.n2.diac24.net>

On 12.07.2011 18:14, David Lamparter wrote:
> On Tue, Jul 12, 2011 at 06:05:41PM +0200, Tomasz Chmielewski wrote:
>> I make a bond0 of two interfaces, eth0 and eth1.
>
> What kind of device do you have on the other side of those links?

It's a virtual machine.
So, a bridge of the host.

I know it doesn't make much sense to set up bonding in a virtual 
machine, but I'm trying to determine what possible problems I might have 
in a production environment (and got stuck at the very beginning).

IPv4 bonding works fine in this setup.

>> bond0: IPv6 duplicate address 2a01:4f8:120:14c4::1247 detected!
> [...]
>> However if I start bonding with just one interface, add IPv6 address to
>> it, then use ifenslave to add a second interface, I'm able to reach the
>> hosts in the internet.
>
> Yeah, when you add the IPv6 address, IPv6 ND does its job and announces
> your presence/does DAD.

Shouldn't this disable DAD? Or am I confusing something here?

net.ipv6.conf.eth0.accept_dad = 0
net.ipv6.conf.eth1.accept_dad = 0

> Your bonding peer is probably looping those
> packets back on the other link, most likely because...
>
>> Bonding Mode: load balancing (round-robin)
>
> ... most likely because you maybe have a switch on the other side, and
> that switch expects you to do 802.3ad?

It's a virtual machine, so the host shouldn't know or care much about 
802.3ad (I think!).

-- 
Tomasz Chmielewski
http://wpkg.org

* Re: [RFC v2 2/2] e100: Support RXFCS feature flag.
From: Ben Hutchings @ 2011-07-12 16:23 UTC
  To: Stephen Hemminger; +Cc: Michał Mirosław, Ben Greear, netdev
In-Reply-To: <20110712090000.29c17c20@nehalam.ftrdhcpuser.net>

On Tue, 2011-07-12 at 09:00 -0700, Stephen Hemminger wrote:
> On Tue, 12 Jul 2011 17:49:49 +0200
> Michał Mirosław <mirqus@gmail.com> wrote:
> 
> > wanted_features only reflects what is requested by user, this
> > combination might be invalid. When conditions change this value
> > combined with other bits in features are passed through
> > ndo_fix_features callback and netdev_fix_features() to bring it to
> > valid state, and then (if resulting set is different than current
> > features) ndo_set_features() is called to reconfigure the device for that
> > new state.
> 
> Since this semantic is more complicated than most other parts
> of network device interface API; could you please put a detailed
> documentation into Documentation/networking/netdevices.txt
> 
> The whole netdevices.txt document could use some extending and
> rewriting as well.

Given that it has missed the last 3 years of API changes (and many
before that) I think it would be better to convert it into kernel-doc
comments in netdevice.h.  That would make it more obvious to driver
authors trying to understand the API, and to those changing the driver
API.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: bonding and IPv6 "doesn't work"?
From: David Lamparter @ 2011-07-12 16:14 UTC
  To: Tomasz Chmielewski; +Cc: netdev
In-Reply-To: <4E1C70D5.6060806@wpkg.org>

On Tue, Jul 12, 2011 at 06:05:41PM +0200, Tomasz Chmielewski wrote:
> I make a bond0 of two interfaces, eth0 and eth1.

What kind of device do you have on the other side of those links?

> bond0: IPv6 duplicate address 2a01:4f8:120:14c4::1247 detected!
[...]
> However if I start bonding with just one interface, add IPv6 address to 
> it, then use ifenslave to add a second interface, I'm able to reach the 
> hosts in the internet.

Yeah, when you add the IPv6 address, IPv6 ND does its job and announces
your presence/does DAD. Your bonding peer is probably looping those
packets back on the other link, most likely because...

> Bonding Mode: load balancing (round-robin)

... most likely because you maybe have a switch on the other side, and
that switch expects you to do 802.3ad?

Just guessing,

-David

* bonding and IPv6 "doesn't work"?
From: Tomasz Chmielewski @ 2011-07-12 16:05 UTC
  To: netdev

I'm trying to make bonding work with IPv6, using 2.6.39.3 kernel.

Unfortunately, it doesn't seem to work without some rather unintuitive 
workarounds.

I make a bond0 of two interfaces, eth0 and eth1.

As soon as I assign them an IPv6 address, I can see the following message
in dmesg:

bond0: IPv6 duplicate address 2a01:4f8:120:14c4::1247 detected!

I'm not able to reach any host in the internet:

# assign an IP address
ip -6 addr add 2a01:4f8:120:14c4::1247/64 dev bond0
ip -6 route add 2a01:4f8:120:14c4::15 dev bond0
ip -6 route add default via 2a01:4f8:120:14c4::15

# ping a host in the internet
ping6 -c 1 kernel.org
PING kernel.org(pub1.kernel.org) 56 data bytes

--- kernel.org ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

However if I start bonding with just one interface, add IPv6 address to 
it, then use ifenslave to add a second interface, I'm able to reach the 
hosts in the internet.

# restart network
/etc/init.d/network restart

# remove eth1 from bonding
ifenslave -d bond0 eth1

# assign an IP address
ip -6 addr add 2a01:4f8:120:14c4::1247/64 dev bond0
ip -6 route add 2a01:4f8:120:14c4::15 dev bond0
ip -6 route add default via 2a01:4f8:120:14c4::15

# add eth1 to bonding
ifenslave bond0 eth1

# ping a host in the internet
ping6 -c 1 kernel.org
PING kernel.org(pub4.kernel.org) 56 data bytes
64 bytes from pub4.kernel.org: icmp_seq=0 ttl=49 time=61.6 ms

--- kernel.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 61.645/61.645/61.645/0.000 ms, pipe 2

This is 100% reproducible - is it expected?

I've tried setting these sysctl values, but it didn't help:

net.ipv6.conf.eth0.accept_dad = 0
net.ipv6.conf.eth1.accept_dad = 0

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.0 (June 2, 2010)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 72:d2:6e:8e:07:4d
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 6a:f2:e9:64:01:76
Slave queue ID: 0

-- 
Tomasz Chmielewski
http://wpkg.org

* Re: [ath9k-devel] [PATCH v2 07/46] net/wireless: ath9k: fix DMA API usage
From: Felix Fietkau @ 2011-07-12 16:04 UTC
  To: Michał Mirosław
  Cc: Felix Fietkau, netdev@vger.kernel.org,
	linux-wireless@vger.kernel.org,
	Jouni Malinen, Senthil Balasubramanian,
	ath9k-devel@lists.ath9k.org,
	Vasanthakumar Thiagarajan, Ralf Baechle,
	linux-mips@linux-mips.org
In-Reply-To: <20110712155849.GB10651-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

On 12.07.2011, at 23:58, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:

> On Tue, Jul 12, 2011 at 10:21:05PM +0800, Felix Fietkau wrote:
>> On 12.07.2011, at 21:03, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
>> 
>>> On Tue, Jul 12, 2011 at 08:54:32PM +0800, Felix Fietkau wrote:
>>>> On 12.07.2011, at 17:55, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
>>>> 
>>>>> On Tue, Jul 12, 2011 at 12:36:06PM +0800, Felix Fietkau wrote:
>>>>>> On 2011-07-11 8:52 AM, Michał Mirosław wrote:
>>>>>>> Also constify buf_addr for ath9k_hw_process_rxdesc_edma() to verify
>>>>>>> assumptions --- dma_sync_single_for_device() call can be removed.
>>>>>>> 
>>>>>>> Signed-off-by: Michał Mirosław<mirq-linux@rere.qmqm.pl>
>>>>>>> ---
>>>>>>> drivers/net/wireless/ath/ath9k/ar9003_mac.c |    4 ++--
>>>>>>> drivers/net/wireless/ath/ath9k/ar9003_mac.h |    2 +-
>>>>>>> drivers/net/wireless/ath/ath9k/recv.c       |   10 +++-------
>>>>>>> 3 files changed, 6 insertions(+), 10 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> index 70dc8ec..c5f46d5 100644
>>>>>>> --- a/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> +++ b/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> @@ -684,15 +684,11 @@ static bool ath_edma_get_buffers(struct ath_softc *sc,
>>>>>>>  BUG_ON(!bf);
>>>>>>> 
>>>>>>>  dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
>>>>>>> -                common->rx_bufsize, DMA_FROM_DEVICE);
>>>>>>> +                common->rx_bufsize, DMA_BIDIRECTIONAL);
>>>>>>> 
>>>>>>>  ret = ath9k_hw_process_rxdesc_edma(ah, NULL, skb->data);
>>>>>>> -    if (ret == -EINPROGRESS) {
>>>>>>> -        /*let device gain the buffer again*/
>>>>>>> -        dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
>>>>>>> -                common->rx_bufsize, DMA_FROM_DEVICE);
>>>>>>> +    if (ret == -EINPROGRESS)
>>>>>>>      return false;
>>>>>>> -    }
>>>>>>> 
>>>>>>>  __skb_unlink(skb,&rx_edma->rx_fifo);
>>>>>>>  if (ret == -EINVAL) {
>>>>>> I have strong doubts about this change. On most MIPS devices,
>>>>>> dma_sync_single_for_cpu is a no-op, whereas
>>>>>> dma_sync_single_for_device flushes the cache range. With this
>>>>>> change, the CPU could cache the DMA status part behind skb->data and
>>>>>> that cache entry would not be flushed in between calls to this
>>>>>> function on the same buffer, likely leading to rx stalls.
>>>>> You're suggesting a platform implementation bug then. If the platform is not
>>>>> cache-coherent, it should invalidate relevant CPU cache lines for sync_to_cpu
>>>>> and unmap cases. Do other devices show such symptoms on MIPS systems?
>>>>> 
>>>>> I'm not familiar with the platform internals, so we should ask MIPS people.
>>>> I only mentioned MIPS to describe the potential side effect of this change. From my current understanding of the DMA API, it would be wrong on other platforms as well. I believe the _for_device function needs to be used to transfer ownership of the buffer back to the device, before calling _for_cpu again later for another read.
>>> What you're saying reminds me of the wording in DMA-API-HOWTO.txt that I find
>>> wrong (or at least misleading) compared to what DMA-API.txt describes.
>>> DMA sync calls do not transfer the ownership of the buffer - they are
>>> cache synchronization points, ownership passing is handled entirely by
>>> the driver.
>> What I meant was that the DMA sync calls reflect the ownership transfer of the memory regions. In this case ownership is transferred between device and CPU multiple times and the code reflects that.
>>>> This is definitely required in this case, because when the return code is -EINPROGRESS, the driver waits for the hardware to complete this buffer, and the next call has to fetch the memory again after the device has updated it.
>>> Correctness of this access should be provided by sync_to_cpu() call.
>> At least in MIPS I'm sure it isn't. If I remember correctly, it also isn't on ARM, so I'm pretty sure that either your understanding of the API is incorrect, or arch code does not implement it properly. In either case, this change (and probably also the p54 one) should not be merged.
> 
> I briefly looked through DMA API implementation in MIPS, and except
> for R10k and R12k both sync_for_cpu and sync_for_device are no-ops
> (see: arch/mips/mm/dma-default.c).  For R10k and R12k the syncs are
> in both points, and exactly like I described before - CPU cachelines
> are invalidated for DMA_FROM_DEVICE mappings, written back for
> DMA_TO_DEVICE, both for DMA_BIDIRECTIONAL (including redundant
> mapping+sync direction).
> 
> So doing that sync_to_device you are just invalidating the same cachelines
> twice for no gain (or do nothing twice in some cases) - they are not read
> by CPU between sync_to_device -> sync_to_cpu (unless you have other bugs
> in the driver). 
I think you're missing something. It works like this: In the AR9380 rx path, the descriptor is part of the skb. The rx tasklet checks for rx frame completion by calling the sync for cpu, reading the completion flag and (in case of a not completed frame) flushes the cache for that location again (for device). If you remove the for_device call, the next call to this function can see stale data, as the for_cpu call can be a no-op.
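
The scenario described above can be illustrated with a toy model. This is not driver or kernel code, and it bakes in the assumptions stated in the paragraph above: the for_cpu sync is a no-op on the platform and the for_device sync drops the CPU's cached copy of the descriptor. Whether real platforms behave this way is exactly what this thread disputes. All names here are made up for illustration.

```shell
# Toy model of a CPU polling a completion flag that a device writes.
mem_done=0        # completion flag in memory, written by the device
cache_valid=0     # 1 while the CPU holds a cached copy of the flag
cache_done=0      # the cached copy

sync_for_cpu()    { :; }               # assumed no-op on this platform
sync_for_device() { cache_valid=0; }   # assumed to drop the cached copy

# CPU read: served from the cache whenever a cached copy exists.
read_flag() {
    if [ "$cache_valid" -eq 0 ]; then
        cache_done=$mem_done
        cache_valid=1
    fi
    flag=$cache_done
}

# Poll once before and once after the device completes the frame.
# Argument: "with" or "without" the sync_for_device call in between.
poll_twice() {
    mem_done=0; cache_valid=0
    sync_for_cpu; read_flag; first=$flag
    if [ "$1" = with ]; then
        sync_for_device
    fi
    mem_done=1                         # device finishes the frame
    sync_for_cpu; read_flag; second=$flag
    echo "$first $second"
}
```

Under these assumptions, `poll_twice with` prints "0 1" (the second poll sees the completed frame) and `poll_twice without` prints "0 0" (the second poll reads the stale cached flag).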

- Felix

* Re: [patch net-next-2.6] net: allow multiple rx_handler registration
From: David Lamparter @ 2011-07-12 16:03 UTC
  To: Jiri Pirko
  Cc: David Lamparter, netdev, davem, shemminger, kaber, fubar,
	eric.dumazet, nicolas.2p.debian, andy, greearb, mirqus
In-Reply-To: <20110712150120.GB2300@minipsycho.brq.redhat.com>

On Tue, Jul 12, 2011 at 05:01:22PM +0200, Jiri Pirko wrote:
> Tue, Jul 12, 2011 at 04:29:38PM CEST, equinox@diac24.net wrote:
> >On Tue, Jul 12, 2011 at 03:20:08PM +0200, Jiri Pirko wrote:
> >> Not possible. See netdev_set_master(). Anyway, before rx_handler was
> >> introduced, this was possible and no one cared.
> >
> >I don't see how this is related. I'm talking about the other end of your
> >bond. Like for example the 802.3ad capable switch you're bonding to.
> 
> Well it is related in a way: you cannot have one device in a bridge and
> a bond at the same time....

Grah, I was looking at our production kernel tree, which doesn't have
the netdev_set_master calls from the bridging code. Sorry, my fault.

> >> >b) a device having macvlans and being a bond slave
> >> > -> Fully incompatible. Same as above, packets to the macvlan will end
> >> >    up on other bond member devices.

But case b) is still up & alive, macvlan doesn't use netdev_set_master.

> >> This patch doesn't introduce anything new which wasn't possible before
> >> rx_handler times. Anyway removing bond from using rx_handler as you
> >> suggested pushes us back.
> >
> >I would actually consider this a regression, if the clashing rx_handler
> >is the only thing that gets bonding an 'exclusive' hold of the device.
> 
> No regression. Regression it would be if something wouldn't work on same
> setup. But this is not the case!

Your patch allows a setup (bond+macvlan) that is not only a violation of
the specification's letters, but will also wreak rather big havoc and
may cause parts of itself to become non-functioning.

What happens when the user does this?:
 eth0 -> bond0
    -> macvlan0 -> bond1

My complaint is primarily centered on the inclusion of bonding code into
this. There might be bonding modes where this is acceptable, but in
802.3ad mode this royally breaks things.

> >> And to your idea about multi-bridge support, br code needs to be
> >> adjusted as well. And in relation with PRIO, my idea (inspired from RFC
> >> of this patch comments) is to allow users to change priorities
> >> dynamically from userspace. Also then it could be a range of prios for
> >> bridge for example.
> >
> >Hoping I can convey my point,
> >
> >
> >-David
> >
> >
> >P.S.: Could you please provide some sample usage cases for this feature?
> 
> Converting vlan to rx_handler needs this at least.

Hm, yes. I guess this patch is needed to pave the way. I uphold my fears
about including bonding (read: 802.3ad) in this though. Maybe I should
cook up some code to give 802.3ad an exclusive grip on the slaves?


-David

* Re: [patch net-next-2.6] net: allow multiple rx_handler registration
From: Michał Mirosław @ 2011-07-12 16:02 UTC
  To: Jiri Pirko
  Cc: David Lamparter, netdev, davem, shemminger, kaber, fubar,
	eric.dumazet, nicolas.2p.debian, andy, greearb
In-Reply-To: <20110712150120.GB2300@minipsycho.brq.redhat.com>

2011/7/12 Jiri Pirko <jpirko@redhat.com>:
> Tue, Jul 12, 2011 at 04:29:38PM CEST, equinox@diac24.net wrote:
>>P.S.: Could you please provide some sample usage cases for this feature?
> Converting vlan to rx_handler needs this at least.

Do you already have some code for this? PoC quality maybe?

Best Regards,
Michał Mirosław

* Re: [RFC v2 2/2] e100: Support RXFCS feature flag.
From: Stephen Hemminger @ 2011-07-12 16:00 UTC
  To: Michał Mirosław; +Cc: Ben Greear, netdev
In-Reply-To: <CAHXqBFL-QgKvrvcrjtjSgDdeH9i7Ai86U2ujQ4xpo-vXLEkO5A@mail.gmail.com>

On Tue, 12 Jul 2011 17:49:49 +0200
Michał Mirosław <mirqus@gmail.com> wrote:

> wanted_features only reflects what is requested by user, this
> combination might be invalid. When conditions change this value
> combined with other bits in features are passed through
> ndo_fix_features callback and netdev_fix_features() to bring it to
> valid state, and then (if resulting set is different than current
> features) ndo_set_features() is called to reconfigure the device for that
> new state.

Since this semantic is more complicated than most other parts
of network device interface API; could you please put a detailed
documentation into Documentation/networking/netdevices.txt

The whole netdevices.txt document could use some extending and
rewriting as well.
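
Until such documentation exists, the flow quoted above can be sketched as a toy model. This is not kernel code: the feature bits and function names below are made up for illustration, and the real code also combines the wanted set with other bits in features before fixing it. The single fixup rule modelled here is that TSO is dropped when scatter-gather is off.

```shell
# Hypothetical feature bits, for illustration only.
F_SG=1      # scatter-gather
F_TSO=2     # TCP segmentation offload
F_RXFCS=4   # keep the received frame check sequence

# Stand-in for ndo_fix_features()/netdev_fix_features(): turn a requested
# set into a valid one.
fix_features() {
    f=$1
    if [ $(( f & F_SG )) -eq 0 ]; then
        f=$(( f & ~F_TSO ))
    fi
    echo "$f"
}

# Stand-in for the update step: the device is reconfigured only when the
# fixed-up set differs from the current features.
# Arguments: current features, wanted features.
update_features() {
    fixed=$(fix_features "$2")
    if [ "$fixed" -ne "$1" ]; then
        echo "set_features $fixed"
    else
        echo "unchanged $1"
    fi
}
```

For example, `update_features 0 2` (user wants TSO alone) prints "unchanged 0" because the request is invalid without SG, while `update_features 0 3` prints "set_features 3".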

* Re: [ath9k-devel] [PATCH v2 07/46] net/wireless: ath9k: fix DMA API usage
From: Michał Mirosław @ 2011-07-12 15:58 UTC
  To: Felix Fietkau
  Cc: Felix Fietkau, netdev@vger.kernel.org,
	linux-wireless@vger.kernel.org, Jouni Malinen,
	Senthil Balasubramanian, ath9k-devel@lists.ath9k.org,
	Vasanthakumar Thiagarajan, Ralf Baechle,
	linux-mips@linux-mips.org
In-Reply-To: <EC2F82D7-9206-4139-9539-F5DDE38A5629@nbd.name>

On Tue, Jul 12, 2011 at 10:21:05PM +0800, Felix Fietkau wrote:
> On 12.07.2011, at 21:03, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> 
> > On Tue, Jul 12, 2011 at 08:54:32PM +0800, Felix Fietkau wrote:
> >> On 12.07.2011, at 17:55, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> >> 
> >>> On Tue, Jul 12, 2011 at 12:36:06PM +0800, Felix Fietkau wrote:
> >>>> On 2011-07-11 8:52 AM, Michał Mirosław wrote:
> >>>>> Also constify buf_addr for ath9k_hw_process_rxdesc_edma() to verify
> >>>>> assumptions --- dma_sync_single_for_device() call can be removed.
> >>>>> 
> >>>>> Signed-off-by: Michał Mirosław<mirq-linux@rere.qmqm.pl>
> >>>>> ---
> >>>>> drivers/net/wireless/ath/ath9k/ar9003_mac.c |    4 ++--
> >>>>> drivers/net/wireless/ath/ath9k/ar9003_mac.h |    2 +-
> >>>>> drivers/net/wireless/ath/ath9k/recv.c       |   10 +++-------
> >>>>> 3 files changed, 6 insertions(+), 10 deletions(-)
> >>>>> 
> >>>>> diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
> >>>>> index 70dc8ec..c5f46d5 100644
> >>>>> --- a/drivers/net/wireless/ath/ath9k/recv.c
> >>>>> +++ b/drivers/net/wireless/ath/ath9k/recv.c
> >>>>> @@ -684,15 +684,11 @@ static bool ath_edma_get_buffers(struct ath_softc *sc,
> >>>>>   BUG_ON(!bf);
> >>>>> 
> >>>>>   dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
> >>>>> -                common->rx_bufsize, DMA_FROM_DEVICE);
> >>>>> +                common->rx_bufsize, DMA_BIDIRECTIONAL);
> >>>>> 
> >>>>>   ret = ath9k_hw_process_rxdesc_edma(ah, NULL, skb->data);
> >>>>> -    if (ret == -EINPROGRESS) {
> >>>>> -        /*let device gain the buffer again*/
> >>>>> -        dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
> >>>>> -                common->rx_bufsize, DMA_FROM_DEVICE);
> >>>>> +    if (ret == -EINPROGRESS)
> >>>>>       return false;
> >>>>> -    }
> >>>>> 
> >>>>>   __skb_unlink(skb,&rx_edma->rx_fifo);
> >>>>>   if (ret == -EINVAL) {
> >>>> I have strong doubts about this change. On most MIPS devices,
> >>>> dma_sync_single_for_cpu is a no-op, whereas
> >>>> dma_sync_single_for_device flushes the cache range. With this
> >>>> change, the CPU could cache the DMA status part behind skb->data and
> >>>> that cache entry would not be flushed in between calls to this
> >>>> function on the same buffer, likely leading to rx stalls.
> >>> You're suggesting a platform implementation bug then. If the platform is not
> >>> cache-coherent, it should invalidate relevant CPU cache lines for sync_to_cpu
> >>> and unmap cases. Do other devices show such symptoms on MIPS systems?
> >>> 
> >>> I'm not familiar with the platform internals, so we should ask MIPS people.
> >> I only mentioned MIPS to describe the potential side effect of this change. From my current understanding of the DMA API, it would be wrong on other platforms as well. I believe the _for_device function needs to be used to transfer ownership of the buffer back to the device, before calling _for_cpu again later for another read.
> > What you're saying reminds me of the wording in DMA-API-HOWTO.txt that I find
> > wrong (or at least misleading) compared to what DMA-API.txt describes.
> > DMA sync calls do not transfer the ownership of the buffer - they are
> > cache synchronization points, ownership passing is handled entirely by
> > the driver.
> What I meant was that the DMA sync calls reflect the ownership transfer of the memory regions. In this case ownership is transferred between device and CPU multiple times and the code reflects that.
> >> This is definitely required in this case, because when the return code is -EINPROGRESS, the driver waits for the hardware to complete this buffer, and the next call has to fetch the memory again after the device has updated it.
> > Correctness of this access should be provided by sync_to_cpu() call.
> At least in MIPS I'm sure it isn't. If I remember correctly, it also isn't on ARM, so I'm pretty sure that either your understanding of the API is incorrect, or arch code does not implement it properly. In either case, this change (and probably also the p54 one) should not be merged.

I briefly looked through DMA API implementation in MIPS, and except
for R10k and R12k both sync_for_cpu and sync_for_device are no-ops
(see: arch/mips/mm/dma-default.c).  For R10k and R12k the syncs are
in both points, and exactly like I described before - CPU cachelines
are invalidated for DMA_FROM_DEVICE mappings, written back for
DMA_TO_DEVICE, both for DMA_BIDIRECTIONAL (including redundant
mapping+sync direction).

So doing that sync_to_device you are just invalidating the same cachelines
twice for no gain (or do nothing twice in some cases) - they are not read
by CPU between sync_to_device -> sync_to_cpu (unless you have other bugs
in the driver). 

Best Regards,
Michał Mirosław

* Re: [PATCH net-next-2.6] net: introduce build_skb()
From: Michał Mirosław @ 2011-07-12 15:54 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1310485216.2871.18.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 12, 2011 at 05:40:16PM +0200, Eric Dumazet wrote:
> Le lundi 11 juillet 2011 à 07:46 +0200, Eric Dumazet a écrit :
> 
> > [PATCH] net: introduce build_skb()
> > 
> > One of the things we discussed during the netdev 2011 conference was the idea
> > to change network drivers to allocate/populate their skb at RX
> > completion time, right before feeding the skb to network stack.
> > 
> > Right now, we allocate skbs when populating the RX ring, and that's a
> > waste of CPU cache, since allocating skb means a full memset() to clear
> > the skb and its skb_shared_info portion. By the time NIC fills a frame
> > in data buffer and host can get it, cpu probably threw away the cache
> > lines from its caches, because of huge RX ring sizes.
> > 
> > So the deal would be to allocate only the data buffer for the NIC to
> > populate its RX ring buffer. And use build_skb() at RX completion to
> > attach a data buffer (now filled with an ethernet frame) to a new skb,
> > initialize the skb_shared_info portion, and give the hot skb to network
> > stack.
> 
> Update :
> 
> First results are impressive : About 15% of throughput increase with igb
> driver on my small desktop machine, and I am limited by the wire
> speed :)
> 
> (AMD Athlon(tm) II X2 B24 Processor, 3GHz, cache size : 1024K)
> 
> setup : One dual port Intel card : Ethernet controller: Intel
> Corporation 82576 Gigabit Network Connection (rev 01)
> 
> eth1 direct attach on eth2, Gigabit speed.
> eth2 RX ring set to 4096 slots (default is 256)
> 
> CPU0 : pktgen sending on eth1, line rate (1488137pps)
> CPU1 : receive eth2 interrupts, packets dropped into raw netfilter table
> to bypass upper stacks.
> 
> Before patch : 15% packet losses, ksoftirqd/1 using 100% of cpu
> After patch : residual losses (less than 0.1 %), ksoftirqd not used, 80%
> cpu used 
> 
> I'll do more tests with a 10Gb card (ixgbe driver) to not be wire
> limited.

I remember observing a similar increase after switching from allocating
skbs to allocating pages and using napi_get_frags() + napi_gro_frags().
That was with the sl351x driver posted for review some time ago.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC v2 2/2] e100: Support RXFCS feature flag.
From: Michał Mirosław @ 2011-07-12 15:49 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4E0B42CE.30400@candelatech.com>

On 29 June 2011 17:20, Ben Greear <greearb@candelatech.com> wrote:
> On 06/29/2011 08:06 AM, Michał Mirosław wrote:
>> On 29 June 2011 16:35, Ben Greear <greearb@candelatech.com> wrote:
>>> On 06/29/2011 07:33 AM, Michał Mirosław wrote:
>>>>
>>>> On 29 June 2011 16:22, Ben Greear <greearb@candelatech.com> wrote:
[...]
>>>>> I thought 'features' was what the NIC could support, and
>>>>> wanted_features
>>>>> was what the NIC was currently configured to support?  I don't want
>>>>> to rx the CRC all the time, just when users enable it...
>>>> hw_features is what device could support, and features is what device
>>>> has currently turned on.
>>> Ok, thanks for that correction.
>>> What does wanted_features mean, then?
>> What the user wants to be active. It should be clearer to you if you
>> read the implementation: netdev_update_features() and friends.
>
> I read it.  Seems the code won't let you turn on something not supported,
> so if user wants RXFCS, then wanted_features will have it enabled.  So,
> I'm not sure why my e100 patch would be wrong in that case.

wanted_features only reflects what the user requested; that combination
might be invalid. When conditions change, this value, combined with
other bits in features, is passed through the ndo_fix_features callback
and netdev_fix_features() to bring it to a valid state, and then (if the
resulting set differs from the current features) ndo_set_features() is
called to reconfigure the device for the new state.
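
A minimal user-space sketch of that flow in C, for illustration only.
The struct, the FEAT_* bits and the model_* functions are hypothetical
stand-ins, not the kernel's definitions, and the "TSO needs SG" rule is
just an assumed example of an invalid combination being fixed up.

```c
#include <stdint.h>

/* Hypothetical feature bits; not the kernel's NETIF_F_* values. */
#define FEAT_SG    (1u << 0)
#define FEAT_TSO   (1u << 1)	/* assumed here to depend on FEAT_SG */
#define FEAT_RXFCS (1u << 2)

struct model_dev {
	uint32_t hw_features;		/* what the user may toggle */
	uint32_t features;		/* what is currently active */
	uint32_t wanted_features;	/* what the user asked for */
	int set_calls;			/* how often the device was reconfigured */
};

/* Stand-in for the fix-up step: drop combinations that cannot work. */
static uint32_t model_fix_features(uint32_t f)
{
	if ((f & FEAT_TSO) && !(f & FEAT_SG))
		f &= ~FEAT_TSO;
	return f;
}

/* Stand-in for the update step described in the text. */
static void model_update_features(struct model_dev *dev)
{
	/* keep always-on bits, take the toggleable bits from wanted_features */
	uint32_t f = (dev->features & ~dev->hw_features) |
		     (dev->wanted_features & dev->hw_features);

	f = model_fix_features(f);
	if (f == dev->features)
		return;			/* nothing changed, no reconfiguration */
	dev->features = f;		/* stand-in for the set_features callback */
	dev->set_calls++;
}
```

The point of the model: wanted_features can hold a request that never
becomes active, and the device is only reconfigured when the fixed-up
result actually differs from what is active now.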

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: introduce build_skb()
From: Eric Dumazet @ 2011-07-12 15:40 UTC (permalink / raw)
  To: Michał Mirosław, David Miller; +Cc: netdev
In-Reply-To: <1310363206.2512.26.camel@edumazet-laptop>

On Monday 11 July 2011 at 07:46 +0200, Eric Dumazet wrote:

> [PATCH] net: introduce build_skb()
> 
> One of the things we discussed during the netdev 2011 conference was the idea
> to change network drivers to allocate/populate their skb at RX
> completion time, right before feeding the skb to network stack.
> 
> Right now, we allocate skbs when populating the RX ring, and that's a
> waste of CPU cache, since allocating skb means a full memset() to clear
> the skb and its skb_shared_info portion. By the time NIC fills a frame
> in data buffer and host can get it, cpu probably threw away the cache
> lines from its caches, because of huge RX ring sizes.
> 
> So the deal would be to allocate only the data buffer for the NIC to
> populate its RX ring buffer. And use build_skb() at RX completion to
> attach a data buffer (now filled with an ethernet frame) to a new skb,
> initialize the skb_shared_info portion, and give the hot skb to network
> stack.
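
The split described in the quoted text can be pictured with a small
user-space model in C. This is a rough sketch only: struct pkt_meta and
the helper names are hypothetical, it is not the proposed build_skb()
code, and malloc() merely stands in for the kernel allocators.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for packet metadata; not the kernel's struct sk_buff. */
struct pkt_meta {
	unsigned char *data;	/* buffer the NIC already filled */
	size_t len;		/* bytes of frame data in the buffer */
	unsigned int flags;	/* example field that must start out zeroed */
};

/* Ring refill time: allocate only the raw data buffer, no metadata yet. */
static unsigned char *rx_ring_alloc_buffer(size_t size)
{
	return malloc(size);
}

/* RX completion time: build the metadata around the filled buffer. */
static struct pkt_meta *build_meta(unsigned char *buf, size_t frame_len)
{
	struct pkt_meta *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	memset(m, 0, sizeof(*m));	/* cleared right before it is used */
	m->data = buf;
	m->len = frame_len;
	return m;
}

/* Release both the metadata and the buffer it took ownership of. */
static void free_pkt(struct pkt_meta *m)
{
	if (m) {
		free(m->data);
		free(m);
	}
}
```

In this model the memset() of the metadata happens at completion time,
immediately before the packet is handed on, rather than when the ring
slot is filled.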

Update :

First results are impressive : about a 15% throughput increase with the
igb driver on my small desktop machine, and I am limited by the wire
speed :)

(AMD Athlon(tm) II X2 B24 Processor, 3GHz, cache size : 1024K)

setup : One dual port Intel card : Ethernet controller: Intel
Corporation 82576 Gigabit Network Connection (rev 01)

eth1 direct attach on eth2, Gigabit speed.
eth2 RX ring set to 4096 slots (default is 256)

CPU0 : pktgen sending on eth1, line rate (1488137pps)
CPU1 : receive eth2 interrupts, packets dropped into raw netfilter table
to bypass upper stacks.
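
For reference, the pktgen rate above sits very close to the theoretical
packet rate of Gigabit Ethernet for minimum-size frames. The arithmetic
below is a sketch that assumes 64-byte frames (the message does not
state the packet size) plus the standard 8 bytes of preamble/SFD and 12
bytes of inter-frame gap per frame; the function name is made up.

```c
#include <stdint.h>

/* Per-frame overhead on the wire in addition to the frame itself. */
#define PREAMBLE_SFD_BYTES   8
#define INTERFRAME_GAP_BYTES 12

/* Upper bound on frames per second for a given link speed and frame size. */
static uint64_t max_frames_per_second(uint64_t link_bps, unsigned int frame_bytes)
{
	uint64_t wire_bits = (uint64_t)(frame_bytes + PREAMBLE_SFD_BYTES +
					INTERFRAME_GAP_BYTES) * 8;

	return link_bps / wire_bits;
}
```

With 1000000000 bit/s and 64-byte frames this gives 1488095 frames per
second, i.e. 84 bytes (672 bits) on the wire per frame.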

Before patch : 15% packet losses, ksoftirqd/1 using 100% of cpu
After patch : residual losses (less than 0.1 %), ksoftirqd not used, 80%
cpu used 

I'll do more tests with a 10Gb card (ixgbe driver) to not be wire
limited.




^ permalink raw reply

* Re: softirqs are invoked while bottom halves are masked (was: Re: [PATCH] [PATCH] Fix deadlock in af_packet while stressing raw ethernet socket interface)
From: Eric Dumazet @ 2011-07-12 15:27 UTC (permalink / raw)
  To: Ronny Meeus
  Cc: Thomas De Schampheleire, linuxppc-dev, David Miller, netdev,
	afleming
In-Reply-To: <CAMJ=MEeC1hoqufs7AfFRn3yJoC8mdw7v+14N+7e=wQuJefm4_w@mail.gmail.com>

On Tuesday 12 July 2011 at 14:03 +0200, Ronny Meeus wrote:

> Sorry for not mentioning we were using a patched kernel.
> I was not aware that the code involved was patched by the FreeScale
> patches we applied. The code found in the stack dumps is not
> implemented in FSL specific files.
> 
> While reading the code of af_packet I saw that the spin_lock_bh is
> used in several places while this is not the case in the tpacket_rcv
> function. Since we had a locking issue in that code, I thought that my
> patch would be OK.
> I was not aware that for that specific function (tpacket_rcv) a
> different lock primitive must be used. A suggestion for improvement:
> it would be better to document this pre-condition in the code.
> 
> After doing the change you proposed our code now looks like:
> 
> >---if (dev->features & NETIF_F_HW_QDISC) {
> >--->---txq = dev_pick_tx(dev, skb);
> >--->---local_bh_disable();
> >--->---rc = dev_hard_start_xmit(skb, dev, txq);
> >--->---local_bh_enable();
> >--->---return rc;
> >---}
> 
> >---/* Disable soft irqs for various locks below. Also
> >--- * stops preemption for RCU.
> >--- */
> >---rcu_read_lock_bh();
> 
> but we still see the issue "BUG: sleeping function called from invalid context":

Of course you do, if this is the only change you made.

> 
> [   91.015989] BUG: sleeping function called from invalid context at
> include/linux/skbuff.h:786
> [   91.117096] in_atomic(): 1, irqs_disabled(): 0, pid: 1865, name: NMTX_T1842
> [   91.200461] Call Trace:
> [   91.229672] [ec58bbd0] [c000789c] show_stack+0x78/0x18c (unreliable)
> [   91.305791] [ec58bc10] [c0022900] __might_sleep+0x100/0x118
> [   91.372524] [ec58bc20] [c029f8d8] dpa_tx+0x128/0x758


Please read my mail again :

I said : "doing GFP_KERNEL allocations in dpa_tx() is wrong, for sure."

I don't have this code, but I suspect it's using : skb_copy(skb,
GFP_KERNEL)

Just say no, use GFP_ATOMIC instead.

The real question is : why is skb_copy() done at all, since it's slow as hell.
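
The rule behind that advice can be modelled in a few lines of
user-space C. This is only an illustration: the enum, struct and
function names are hypothetical, and the model reduces "atomic context"
to a bottom-half disable counter, which is an assumption made to match
the local_bh_disable() section quoted above.

```c
#include <stdbool.h>

/* Hypothetical allocation modes; not the kernel's gfp_t flags. */
enum alloc_mode {
	ALLOC_MAY_SLEEP,	/* plays the role of GFP_KERNEL */
	ALLOC_NO_SLEEP,		/* plays the role of GFP_ATOMIC */
};

struct model_ctx {
	int bh_disable_depth;	/* > 0 while bottom halves are disabled */
};

static bool ctx_is_atomic(const struct model_ctx *c)
{
	return c->bh_disable_depth > 0;
}

/* Returns true if this kind of allocation is legal in the given context. */
static bool alloc_allowed(const struct model_ctx *c, enum alloc_mode mode)
{
	/* a sleeping allocation inside an atomic section is the reported bug */
	if (mode == ALLOC_MAY_SLEEP && ctx_is_atomic(c))
		return false;
	return true;
}
```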

> [   91.431957] [ec58bc80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
> [   91.504952] [ec58bcc0] [c02d7ab0] dev_queue_xmit+0x60/0x3ac
> [   91.571692] [ec58bcf0] [c0338d54] packet_sendmsg+0x8c4/0x988
> [   91.639457] [ec58bd70] [c02c3838] sock_sendmsg+0x90/0xb4
> [   91.703066] [ec58be40] [c02c4420] sys_sendto+0xdc/0x120
> [   91.765646] [ec58bf10] [c02c57d0] sys_socketcall+0x148/0x210
> [   91.833420] [ec58bf40] [c001084c] ret_from_syscall+0x0/0x3c
> [   91.900153] --- Exception: c01 at 0x4824df00
> [   91.900157]     LR = 0x4828a030
> 



^ permalink raw reply

* Re: [patch net-next-2.6] net: allow multiple rx_handler registration
From: Jiri Pirko @ 2011-07-12 15:01 UTC (permalink / raw)
  To: David Lamparter
  Cc: netdev, davem, shemminger, kaber, fubar, eric.dumazet,
	nicolas.2p.debian, andy, greearb, mirqus
In-Reply-To: <20110712142937.GA909183@jupiter.n2.diac24.net>

Tue, Jul 12, 2011 at 04:29:38PM CEST, equinox@diac24.net wrote:
>On Tue, Jul 12, 2011 at 03:20:08PM +0200, Jiri Pirko wrote:
>> Tue, Jul 12, 2011 at 01:54:22PM CEST, equinox@diac24.net wrote:
>> >On Tue, Jul 12, 2011 at 01:06:01PM +0200, Jiri Pirko wrote:
>> >> For some net topos it is necessary to have multiple "soft-net-devices"
>> >> hooked on one netdev. For example very common is to have
>> >> eth<->(br+vlan). Vlan is not using rx_handler (yet) but it might be useful
>> >> for other setups.
>> >
>> >I disagree strongly, especially with the use cases you're enabling in
>> >this patch.
>> >
>> >> +	res = netdev_rx_handler_register(slave_dev, &new_slave->rx_handler,
>> >> +					 bond_handle_frame,
>> >> +					 RX_HANDLER_PRIO_BOND);
>> >
>> >> +	err = netdev_rx_handler_register(dev, &port->rx_handler,
>> >> +					 macvlan_handle_frame,
>> >> +					 RX_HANDLER_PRIO_MACVLAN);
>> >
>> >> +	err = netdev_rx_handler_register(dev, &p->rx_handler, br_handle_frame,
>> >> +					 RX_HANDLER_PRIO_BRIDGE);
>> >
>> >> +enum rx_handler_prio {
>> >> +	RX_HANDLER_PRIO_BRIDGE,
>> >> +	RX_HANDLER_PRIO_BOND,
>> >> +	RX_HANDLER_PRIO_MACVLAN,
>> >> +};
>> >
>> >These are all incompatible with each other to a varying degree and/or
>> >don't make much sense. Let's look at them:
>> >
>> >a) a device simultaneously being a bridge member and a bond slave
>> > -> Fully incompatible. Your bonding peer switch will start sending
>> >    the bridge's packets on other bond member devices.
>> 
>> Not possible. See netdev_set_master(). Anyway, before rx_handler was
>> introduced, this was possible and no one cared.
>
>I don't see how this is related. I'm talking about the other end of your
>bond. Like for example the 802.3ad capable switch you're bonding to.

Well, it is related in the way that you cannot have one device in a
bridge and a bond at the same time...


>
>> >b) a device having macvlans and being a bond slave
>> > -> Fully incompatible. Same as above, packets to the macvlan will end
>> >    up on other bond member devices.
>> >
>> >c) bridge + macvlan
>> > -> Mostly useless. Add veth/tap devices to your bridge... as a bonus
>> >    you get a proper MAC table.
>> >
>> >This at least needs bonding support removed since bonding is essentially
>> >incompatible with anything else w/ the same reasoning as above. Bonds
>> >are as low-level as Pause frames. Never ever touch individual bond
>> >slaves.
>> >
>> >What does make sense is a device being member of multiple bridges, with
>> >ebtables as solicitor for which bridge gets the packet. But that's not
>> >possible with your patch...
>> >+       if (netdev_rx_handler_get_by_prio(dev, prio))
>> >                return -EBUSY;
>> >
>> >I think your idea is good, but it needs WAY more proper consideration.
>> 
>> This patch doesn't introduce anything new which wasn't possible before
>> rx_handler times. Anyway removing bond from using rx_handler as you
>> suggested pushes us back.
>
>I would actually consider this a regression, if the clashing rx_handler
>is the only thing that gets bonding an 'exclusive' hold of the device.

No regression. It would be a regression if something stopped working on
the same setup. But this is not the case!

>
>> The rationale of this patch is to have all in one place, clean
>> architecture. The rest of problems, like what can be
>> used with what in one time etc can be easily sorted out by follow-up
>> patches.
>
>Yes, I see what you're trying to do. But if your patch goes back to
>allowing broken combinations, I think we need to have those follow-up
>patches right here with this patch.
>
>> And to your idea about multi-bridge support, br co needs to be
>> adjusted as well. And in relation with PRIO, my idea (inspired from RFC
>> of this patch comments) is to allow users to change priorities
>> dynamically from userspace. Also then it could be a range of prios for
>> bridge for example.
>
>Hoping I can convey my point,
>
>
>-David
>
>
>P.S.: Could you please provide some sample usage cases for this feature?

Converting vlan to rx_handler needs this at least.


^ permalink raw reply

* Re: [PATCH] lanai: use pci_dev->subsystem_device
From: David Miller @ 2011-07-12 14:59 UTC (permalink / raw)
  To: sshtylyov; +Cc: chas, linux-atm-general, netdev
In-Reply-To: <201107121847.57793.sshtylyov@ru.mvista.com>

From: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Date: Tue, 12 Jul 2011 18:47:57 +0400

> The driver reads PCI subsystem IDs from the PCI configuration registers while
> it is already stored by the PCI subsystem in the 'subsystem_device' field of
> 'struct pci_dev'...
> 
> Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: Bridging behavior apparently changed around the Fedora 14 time
From: David Lamparter @ 2011-07-12 14:54 UTC (permalink / raw)
  To: Greg Scott; +Cc: David Lamparter, netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <925A849792280C4E80C5461017A4B8A2A040FA@mail733.InfraSupportEtc.com>

On Tue, Jul 12, 2011 at 09:30:05AM -0500, Greg Scott wrote:
> Linux ehac-fw2011 2.6.35.6-48.fc14.i686.PAE #1 SMP Fri Oct 22 15:27:53
> UTC 2010 i686 i686 i386 GNU/Linux
> 
> It's just as Red Hat delivered it.

Whoa. And here I was almost ashamed of running 2.6.38. I'm sorry, but I
think you need to go bug RedHat.

Anyway, either someone else should have had your problem by now (so it
might be fixed) or I'd say there's something wrong with your setup
(maybe changed defaults) or RedHat messed up the kernel ;).

> > The VLAN saves you the SNAT on your clients traffic towards the NATed
> > services, because the traffic back from those NATed services goes
> > through the firewall, which will apply its conntrack entries.
> 
> I don't see it that way.  I have a couple of devices with public IP
> Addresses and most with "normal" private IP Addresses.  Those public IP
[snip]

You totally misunderstood me. I'm suggesting the separate VLAN for your
servers which have private IPs but which have services exposed to the
internet (and your clients) on public IPs through NAT.

Your H323 stuff is totally unrelated.

> > Also, what you're doing is a case of _layer 3_ routing of packets that
> > arrive at an interface - br0 - back out to the same interface - br0.
> 
> Yes, absolutely, when internal users need to access the NATed websites
> using public IP Addresses instead of their private IP Addresses.
> Classic router on a stick topology, but using DNAT and MASQUERADE.

Where's the log fire for your router on a stick? *SCNR*

> Let me try to describe it this way.  Forget about the reason I need a
> bridge.  I have a good reason this site is bridged and have now
> hopefully presented a reasonable case why I need one.  

Yes. Your problem seems to be between the private-IP clients in your
network and your private-IP servers if I understand correctly.

The bridge is most likely completely innocent and your "stick" NAT
setup just broke down due to some changed default I'd guess.

> And it broke with Fedora 14.

> > Where is your private IP that's facing towards the clients?
> 
> I don't know what this question means.  The setup is a traditional
> Public<-->firewall<-->private topology, as the ASCII art I posted
> earlier shows.  But some of the stuff on the private side needs public
> IP Addresses, so the firewall is a bridge plus a router, not just a
> router.  

Yes. And because it is a router, it has an IP from the private subnet
your clients are in. My question was: what device is that IP on?

> > So it works when you switch the bridge members into PROMISC? (not the
> > bridge itself!)
> 
> No, the br0 bridge device itself.  After a bunch of troubleshooting,
> below is literally the single one and only change I needed to make this
> work again.

This makes me think yet more that the bridge code is innocent.

> I don't think I should need to do this by hand and I never needed it
> before.  That's why it took me weeks and plenty of help with the
> Netfilter folks to find it.  Something apparently changed with bridging.

No. You're jumping to conclusions. You're affecting the "top" bridge
device's promiscuity. I would say that the effect you're seeing is in
the IP stack above it, caused by it now promiscuously handling packets
that are dropped otherwise.

Setups like yours need a lot of caution regarding ICMP redirect /
shared_media / ARP settings. Please check those.

-David

P.S.: you blissfully ignored my "ip neigh add proxy 1.2.3.4" note :)

^ permalink raw reply

* [PATCH] lanai: use pci_dev->subsystem_device
From: Sergei Shtylyov @ 2011-07-12 14:47 UTC (permalink / raw)
  To: chas, linux-atm-general; +Cc: netdev

The driver reads PCI subsystem IDs from the PCI configuration registers while
it is already stored by the PCI subsystem in the 'subsystem_device' field of
'struct pci_dev'...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

---
The patch is against the recent Linus' tree.

 drivers/atm/lanai.c |    9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Index: linux-2.6/drivers/atm/lanai.c
===================================================================
--- linux-2.6.orig/drivers/atm/lanai.c
+++ linux-2.6/drivers/atm/lanai.c
@@ -1947,7 +1947,6 @@ static int __devinit lanai_pci_start(str
 {
 	struct pci_dev *pci = lanai->pci;
 	int result;
-	u16 w;
 
 	if (pci_enable_device(pci) != 0) {
 		printk(KERN_ERR DEV_LABEL "(itf %d): can't enable "
@@ -1965,13 +1964,7 @@ static int __devinit lanai_pci_start(str
 		    "(itf %d): No suitable DMA available.\n", lanai->number);
 		return -EBUSY;
 	}
-	result = pci_read_config_word(pci, PCI_SUBSYSTEM_ID, &w);
-	if (result != PCIBIOS_SUCCESSFUL) {
-		printk(KERN_ERR DEV_LABEL "(itf %d): can't read "
-		    "PCI_SUBSYSTEM_ID: %d\n", lanai->number, result);
-		return -EINVAL;
-	}
-	result = check_board_id_and_rev("PCI", w, NULL);
+	result = check_board_id_and_rev("PCI", pci->subsystem_device, NULL);
 	if (result != 0)
 		return result;
 	/* Set latency timer to zero as per lanai docs */

^ permalink raw reply

* RE: Bridging behavior apparently changed around the Fedora 14 time
From: Greg Scott @ 2011-07-12 14:30 UTC (permalink / raw)
  To: David Lamparter; +Cc: Stephen Hemminger, netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <20110712033943.GB616804@jupiter.n2.diac24.net>

> First of all, I still can't find your kernel version in any of your
> mails. Can you please repeat the uname -a output of the affected box?

Woops - sorry - I never posted it, that's why you didn't see it.  Here
it is:

[root@ehac-fw2011 firewall-scripts]# uname -a
Linux ehac-fw2011 2.6.35.6-48.fc14.i686.PAE #1 SMP Fri Oct 22 15:27:53
UTC 2010 i686 i686 i386 GNU/Linux

It's just as Red Hat delivered it.

> The VLAN saves you the SNAT on your clients traffic towards the NATed
> services, because the traffic back from those NATed services goes
> through the firewall, which will apply its conntrack entries.

I don't see it that way.  I have a couple of devices with public IP
Addresses and most with "normal" private IP Addresses.  Those public IP
devices can easily be on the same Ethernet segment and in the same
collision domain as the private ones.  There's no good reason to
separate them at this particular site.  Oh - I think I see what you're
thinking - the words, "public IP Address", lead to a wrong conclusion
that those devices really are **public**.  Just because some of the
devices have public IP Addresses does **NOT**  mean they're completely
accessible to the public.  Just like for, say, web servers, we NAT TCP
port 80 and block the rest - for these public IP Address devices at this
site, I ACCEPT TCP 1720 and use the H.323 conntrack module to handle
that traffic because H.323 does not get along easily with NAT.  Pretty
much the only difference between the public IP stuff and the private IP
stuff is, I NAT for the private IP stuff and just ACCEPT traffic I want
to let in/out for the public IP stuff.  There's no reason to separate
the public IP stuff and private IP stuff with VLANs.  

Anyway, at this site, the H.323 stuff is the **reason**  why I need a
bridge, so the H.323 stuff is only indirectly related to the problem.  I
have other sites with different reasons for various systems to have real
public IP Addresses. 

> Also, what you're doing is a case of _layer 3_ routing of packets that
> arrive at an interface - br0 - back out to the same interface - br0.

Yes, absolutely, when internal users need to access the NATed websites
using public IP Addresses instead of their private IP Addresses.
Classic router on a stick topology, but using DNAT and MASQUERADE.

Let me try to describe it this way.  Forget about the reason I need a
bridge.  I have a good reason this site is bridged and have now
hopefully presented a reasonable case why I need one.  

I have a public website at private IP Address 192.168.10.2.  The
firewall DNATs TCP port 80 for public IP Address aa.bb.115.151 to
192.168.10.2.  Most of this traffic will come in from the Internet.  But
I also want my **internal** users to have the same experience as the
rest of the world when viewing this website, so that traffic will come
in from the private LAN.  I have good reasons for this - one biggie is
so my internal users can see website changes as the rest of the world
sees them. I know I can simulate this with a private DNS server, but I
don't like multiple versions of DNS floating around.  So I choose to do
it with layer 3 routing and some fancy DNAT and SNAT (really
MASQUERADEing) at the firewall.
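
Why the extra MASQUERADE is needed for internal clients can be shown
with a tiny packet model in C. This is a sketch under stated
assumptions: 203.0.113.151 is a placeholder for the elided public
address, the firewall and client LAN addresses are invented, and the
functions only mimic what the DNAT and MASQUERADE targets do to the
addresses.

```c
#include <stdint.h>

#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
			 ((uint32_t)(c) << 8) | (uint32_t)(d))

#define PUBLIC_IP   IP4(203, 0, 113, 151)	/* placeholder for the public address */
#define SERVER_IP   IP4(192, 168, 10, 2)	/* web server, from the message */
#define FIREWALL_IP IP4(192, 168, 10, 1)	/* assumed LAN address of the firewall */
#define CLIENT_IP   IP4(192, 168, 10, 50)	/* assumed internal client */

struct model_pkt {
	uint32_t src;
	uint32_t dst;
};

/* DNAT: rewrite the destination from the public address to the real server. */
static struct model_pkt dnat(struct model_pkt p)
{
	if (p.dst == PUBLIC_IP)
		p.dst = SERVER_IP;
	return p;
}

/* MASQUERADE: rewrite the source to the firewall's own address. */
static struct model_pkt masquerade(struct model_pkt p)
{
	p.src = FIREWALL_IP;
	return p;
}

/* The server replies to whatever source address it saw. */
static uint32_t reply_destination(struct model_pkt seen_by_server)
{
	return seen_by_server.src;
}
```

With DNAT alone the server would answer the client directly on the LAN,
bypassing the firewall, so the reply would arrive from 192.168.10.2
rather than from the public address the client connected to. With the
source rewritten as well, the reply goes back to the firewall, which
can undo both translations.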

This all worked for several years, right up until I put in a replacement
based on Fedora 14.  By now, the script I have in place at this site is
battle hardened and has been in service for close to 10 years and many
hardware upgrades. 

And it broke with Fedora 14.

> Either way I still don't understand your setup. Are you using ebtables?

Yes to ebtables.  I use ebtables to mark packets so I know which
interface they come in on.  Anything coming in from the Internet gets a
mark of 1.  Anything coming in on the LAN side gets a mark of 2.  You
have me kind of afraid to post the code....but I'll paste it in below
anyway.  btw, I posted code earlier because one reply asked me to do so.

.
.
.
echo "Flushing and zeroing all ebtables tables and chains"
$EBTABLES -t broute -F
$EBTABLES -t broute -Z
$EBTABLES -t filter -F
$EBTABLES -t filter -Z
$EBTABLES -t nat -F
$EBTABLES -t nat -Z

#
# Use ebtables to mark packets based on the in/out interface.
# 1 - (bit 0 set) for packets entering on the Internet physical interface
# 2 - (bit 1 set) for packets entering on the trusted physical interface
# 3 - (bits 0 and 1) for packets exiting via the Internet physical interface
# (Kernel 2.6.23 or so changed the order of iptables/ebtables going out, so
# marking outbound packets is meaningless now.)

echo "Marking bridged packets at layer 2 for later layer 3 filtering."
$EBTABLES -t broute -A BROUTING -i $INET_IFACE \
        -j mark --mark-set 1 --mark-target CONTINUE
$EBTABLES -t broute -A BROUTING -i $TRUSTED1_IFACE \
        -j mark --mark-set 2 --mark-target CONTINUE
.
.
.

> Is there a separate third DMZ network? What is $DMZ_IFACE?

That DMZ network is not relevant and I should not have included any
reference to it.  It's eth2 and the NIC is in the box but nothing is
connected to it.  And it's not part of the bridge.  It's an empty NIC
that we may use in the future, but right now not relevant.  

> Where is your private IP that's facing towards the clients?

I don't know what this question means.  The setup is a traditional
Public<-->firewall<-->private topology, as the ASCII art I posted
earlier shows.  But some of the stuff on the private side needs public
IP Addresses, so the firewall is a bridge plus a router, not just a
router.  

> So it works when you switch the bridge members into PROMISC? (not the
> bridge itself!)

No, the br0 bridge device itself.  After a bunch of troubleshooting,
below is literally the single one and only change I needed to make this
work again.

.
.
.
echo "  Putting $BR_IFACE into promiscuous mode"
# This fixes a bug forwarding packets bound for external IP Addresses
# from the private LAN.

ip link set $BR_IFACE promisc on
.
.
.

I don't think I should need to do this by hand and I never needed it
before.  That's why it took me weeks and plenty of help with the
Netfilter folks to find it.  Something apparently changed with bridging.

Reading through some of the replies to this post, I decided to look and
see what happens with the physical ethnnn devices when I add them to a
bridge, so I looked at a couple of other sites with similar setups.
I've never had any need to dig into this before because it all just
worked.  

- Greg

^ permalink raw reply

* Re: [patch net-next-2.6] net: allow multiple rx_handler registration
From: David Lamparter @ 2011-07-12 14:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Lamparter, netdev, davem, shemminger, kaber, fubar,
	eric.dumazet, nicolas.2p.debian, andy, greearb, mirqus
In-Reply-To: <20110712132005.GA2300@minipsycho.brq.redhat.com>

On Tue, Jul 12, 2011 at 03:20:08PM +0200, Jiri Pirko wrote:
> Tue, Jul 12, 2011 at 01:54:22PM CEST, equinox@diac24.net wrote:
> >On Tue, Jul 12, 2011 at 01:06:01PM +0200, Jiri Pirko wrote:
> >> For some net topos it is necessary to have multiple "soft-net-devices"
> >> hooked on one netdev. For example very common is to have
> >> eth<->(br+vlan). Vlan is not using rx_handler (yet) but it might be useful
> >> for other setups.
> >
> >I disagree strongly, especially with the use cases you're enabling in
> >this patch.
> >
> >> +	res = netdev_rx_handler_register(slave_dev, &new_slave->rx_handler,
> >> +					 bond_handle_frame,
> >> +					 RX_HANDLER_PRIO_BOND);
> >
> >> +	err = netdev_rx_handler_register(dev, &port->rx_handler,
> >> +					 macvlan_handle_frame,
> >> +					 RX_HANDLER_PRIO_MACVLAN);
> >
> >> +	err = netdev_rx_handler_register(dev, &p->rx_handler, br_handle_frame,
> >> +					 RX_HANDLER_PRIO_BRIDGE);
> >
> >> +enum rx_handler_prio {
> >> +	RX_HANDLER_PRIO_BRIDGE,
> >> +	RX_HANDLER_PRIO_BOND,
> >> +	RX_HANDLER_PRIO_MACVLAN,
> >> +};
> >
> >These are all incompatible with each other to a varying degree and/or
> >don't make much sense. Let's look at them:
> >
> >a) a device simultaneously being a bridge member and a bond slave
> > -> Fully incompatible. Your bonding peer switch will start sending
> >    the bridge's packets on other bond member devices.
> 
> Not possible. See netdev_set_master(). Anyway, before rx_handler was
> introduced, this was possible and no one cared.

I don't see how this is related. I'm talking about the other end of your
bond. Like for example the 802.3ad capable switch you're bonding to.

> >b) a device having macvlans and being a bond slave
> > -> Fully incompatible. Same as above, packets to the macvlan will end
> >    up on other bond member devices.
> >
> >c) bridge + macvlan
> > -> Mostly useless. Add veth/tap devices to your bridge... as a bonus
> >    you get a proper MAC table.
> >
> >This at least needs bonding support removed since bonding is essentially
> >incompatible with anything else w/ the same reasoning as above. Bonds
> >are as low-level as Pause frames. Never ever touch individual bond
> >slaves.
> >
> >What does make sense is a device being member of multiple bridges, with
> >ebtables as solicitor for which bridge gets the packet. But that's not
> >possible with your patch...
> >+       if (netdev_rx_handler_get_by_prio(dev, prio))
> >                return -EBUSY;
> >
> >I think your idea is good, but it needs WAY more proper consideration.
> 
> This patch doesn't introduce anything new which wasn't possible before
> rx_handler times. Anyway removing bond from using rx_handler as you
> suggested pushes us back.

I would actually consider this a regression, if the clashing rx_handler
is the only thing that gets bonding an 'exclusive' hold of the device.

> The rationale of this patch is to have all in one place, clean
> architecture. The rest of problems, like what can be
> used with what in one time etc can be easily sorted out by follow-up
> patches.

Yes, I see what you're trying to do. But if your patch goes back to
allowing broken combinations, I think we need to have those follow-up
patches right here with this patch.
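
To illustrate what the quoted -EBUSY check allows and forbids, here is
a small user-space model in C. It is a sketch, not the patch: the
model_* names are hypothetical, and it assumes one handler slot per
priority, which is how I read the quoted
netdev_rx_handler_get_by_prio() test.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical priorities mirroring the order of the quoted enum. */
enum model_prio { PRIO_BRIDGE, PRIO_BOND, PRIO_MACVLAN, PRIO_COUNT };

typedef int (*model_handler_t)(void *frame);

struct model_netdev {
	model_handler_t handlers[PRIO_COUNT];	/* one slot per priority */
};

/* Register a handler; refuse if that priority slot is already taken. */
static int model_rx_handler_register(struct model_netdev *dev,
				     model_handler_t h, enum model_prio prio)
{
	if (dev->handlers[prio])
		return -EBUSY;
	dev->handlers[prio] = h;
	return 0;
}

/* Dummy handlers standing in for the bridge and bonding receive paths. */
static int model_bridge_rx(void *frame) { (void)frame; return 0; }
static int model_bond_rx(void *frame) { (void)frame; return 0; }
```

In this model a second bridge on the same device is rejected, while
bridge plus bond on the same device is accepted, which is exactly the
combination I object to above.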

> And to your idea about multi-bridge support, br co needs to be
> adjusted as well. And in relation with PRIO, my idea (inspired from RFC
> of this patch comments) is to allow users to change priorities
> dynamically from userspace. Also then it could be a range of prios for
> bridge for example.

Hoping I can convey my point,


-David


P.S.: Could you please provide some sample usage cases for this feature?

^ permalink raw reply
