* Re: Process Hang in __read_seqcount_begin
  [not found] ` <1351270087.30380.9.camel@edumazet-glaptop>
@ 2012-10-26 18:51 ` Peter LaDow
  2012-10-26 20:53   ` Thomas Gleixner
  2012-10-26 21:05   ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Peter LaDow @ 2012-10-26 18:51 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel, linux-rt-users, netfilter

(I've added netfilter and linux-rt-users to try to pull in more help.)

On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Upstream kernel is fine, there is no race, as long as :
>
> local_bh_disable() disables BH and preemption.

Looking at the unpatched code in net/ipv4/netfilter/ip_tables.c, it
doesn't appear that any of the code checks the return value of
xt_write_recseq_begin() to determine whether it is safe to write, and
neither does the newly patched code. How does the mainline code prevent
corruption of the tables it is updating?

Why isn't there something like:

	while ((addend = xt_write_recseq_begin()) == 0)
		;

to make sure that only one writer has access to the tables? Better yet,
why not use a seqlock_t instead?

> Apparently RT changes this, so RT needs to change the code.

The RT patch only touches local_bh_disable/enable, not the code in
ip_tables.c. Does local_bh_disable/enable in the mainline code protect
against multiple writers?

> cmpxchg() has strong guarantees (and is also slower than upstream code),
> and seems a reasonable way to fix RT/iptables

I see this now. And I agree that your patch would prevent corruption of
the sequence counter.

Thanks,
Pete

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: Process Hang in __read_seqcount_begin
  2012-10-26 18:51 ` Process Hang in __read_seqcount_begin Peter LaDow
@ 2012-10-26 20:53 ` Thomas Gleixner
  2012-10-26 21:05 ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2012-10-26 20:53 UTC (permalink / raw)
To: Peter LaDow; +Cc: Eric Dumazet, linux-kernel, linux-rt-users, netfilter

On Fri, 26 Oct 2012, Peter LaDow wrote:
> (I've added netfilter and linux-rt-users to try to pull in more help.)
>
> Looking at the unpatched code in net/ipv4/netfilter/ip_tables.c, it
> doesn't appear that any of the code checks the return value of
> xt_write_recseq_begin() to determine whether it is safe to write, and
> neither does the newly patched code.
> [...]

Thanks for the reminder. I'll have a look.

	tglx
* Re: Process Hang in __read_seqcount_begin
  2012-10-26 18:51 ` Process Hang in __read_seqcount_begin Peter LaDow
  2012-10-26 20:53 ` Thomas Gleixner
@ 2012-10-26 21:05 ` Eric Dumazet
  2012-10-26 21:25   ` Peter LaDow
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-10-26 21:05 UTC (permalink / raw)
To: Peter LaDow; +Cc: linux-kernel, linux-rt-users, netfilter

On Fri, 2012-10-26 at 11:51 -0700, Peter LaDow wrote:
> Looking at the unpatched code in net/ipv4/netfilter/ip_tables.c, it
> doesn't appear that any of the code checks the return value of
> xt_write_recseq_begin() to determine whether it is safe to write.
> [...] How does the mainline code prevent corruption of the tables it
> is updating?

Do you know what per-cpu data is in the Linux kernel?

> Why isn't there something like:
>
>	while ((addend = xt_write_recseq_begin()) == 0)
>		;
>
> to make sure that only one writer has access to the tables? Better yet,
> why not use a seqlock_t instead?

Because it's not needed. Really, I don't know why you want that.

Once you are sure a thread cannot be interrupted by a softirq, and
cannot migrate to another cpu, access to per-cpu data doesn't need any
other synchronization at all.

The following sequence is safe:

	addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
	/*
	 * This is kind of a write_seqcount_begin(), but addend is 0 or 1
	 * We dont check addend value to avoid a test and conditional jump,
	 * since addend is most likely 1
	 */
	__this_cpu_add(xt_recseq.sequence, addend);

because any other thread will use different per-cpu data.
* Re: Process Hang in __read_seqcount_begin
  2012-10-26 21:05 ` Eric Dumazet
@ 2012-10-26 21:25 ` Peter LaDow
  2012-10-26 21:54   ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Peter LaDow @ 2012-10-26 21:25 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel, linux-rt-users, netfilter

On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Do you know what per-cpu data is in the Linux kernel?

I sort of did. But since your response I've done more reading, and now I
see what you mean. I don't think this is a per-cpu issue, though. More
below.

> Because it's not needed. Really, I don't know why you want that.
>
> Once you are sure a thread cannot be interrupted by a softirq, and
> cannot migrate to another cpu, access to per-cpu data doesn't need any
> other synchronization at all.

Because there are multiple entry points on the same CPU. In
net/ipv4/netfilter/ip_tables.c there are two call sites of
xt_write_recseq_begin(): one in ipt_do_table() and the other in
get_counters(). Where we see the lockup is with a getsockopt syscall
leading to get_counters(). The other path is through ipt_do_table(),
which is installed as a hook. I'm not sure from what context the hooks
are called, but it is certainly a different context than the syscall.

> The following sequence is safe:
>
>	addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
>	[...]
>	__this_cpu_add(xt_recseq.sequence, addend);

If that sequence were safe here, we wouldn't be seeing this lockup and
your patch wouldn't be needed. So it seems your patch doesn't really
address the issue: on RT we are not "sure a thread cannot be interrupted
by a softirq, and cannot migrate to another cpu". We know it cannot
migrate to another CPU, because there isn't another CPU. So apparently
it can be interrupted by a softirq, which means local_bh_disable() isn't
doing anything useful in the RT patches with regard to this.

As I mentioned earlier, I think what your patch does is ensure an atomic
update of the sequence counter. But it does nothing to synchronize two
writers. If they were already synchronized (such as via the call to
local_bh_disable), we wouldn't see sequence counter corruption, no?

Pete
* Re: Process Hang in __read_seqcount_begin
  2012-10-26 21:25 ` Peter LaDow
@ 2012-10-26 21:54 ` Thomas Gleixner
  2012-10-30 23:09   ` Peter LaDow
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2012-10-26 21:54 UTC (permalink / raw)
To: Peter LaDow; +Cc: Eric Dumazet, linux-kernel, linux-rt-users, netfilter

On Fri, 26 Oct 2012, Peter LaDow wrote:
> If this were safe, we wouldn't be seeing this lockup and your patch
> wouldn't be needed. So it seems that your patch doesn't really address
> the issue that we are not "sure a thread cannot be interrupted by a
> softirq, and cannot migrate to another cpu". Well, we know it cannot
> migrate to another CPU, because there isn't another CPU. So apparently
> it can be interrupted by a softirq, and local_bh_disable isn't doing
> anything useful in the RT patches with regard to this.

RT changes the semantics slightly. And yes, it's not prepared for stuff
which relies on some of the magic implicit semantics of mainline. Let me
have a look at the whole scenario once I'm more awake.

Thanks,

	tglx
* Re: Process Hang in __read_seqcount_begin
  2012-10-26 21:54 ` Thomas Gleixner
@ 2012-10-30 23:09 ` Peter LaDow
  2012-10-31  0:33   ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Peter LaDow @ 2012-10-30 23:09 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Eric Dumazet, linux-kernel, linux-rt-users, netfilter

Ok, more of an update. We've managed to create a scenario that exhibits
the problem much earlier. We can now cause the lockup within a few hours
(rather than the 12 to 24 hours in our other scenario).

Our setup has a lot of traffic constantly being processed by the
netfilter code. After about 2 hours, any external attempt to read the
table entries (such as with getsockopt and IPT_SO_GET_ENTRIES) triggers
the lockup. What is strange is that this does not appear until after a
couple of hours of heavy traffic. We cannot trigger the problem in the
first hour, rarely in the second hour, and always after the second hour.

Our original setup did not have as much traffic, but based on a quick
back-of-the-napkin computation, the lockup seems to occur after a
certain amount of traffic. I can try to get firmer numbers. This kind of
behavior hints less at a race condition between two writers and instead
seems to depend somehow on the amount of traffic. Indeed, my test
program only uses IPT_SO_GET_ENTRIES, which does not trigger the second
path through do_add_counters(). So I no longer think the path through
setsockopt is a cause of the problem.

Instead, it seems the only way there could be multiple writers (assuming
that is the problem) is if there are multiple contexts in which
ipt_do_table() is called. So far, my perusal of the code shows it is
called only through the hooks in each of the iptables modules, and it
isn't clear to me how those are invoked. But it does seem that even with
the patch Eric provided (which fixes the seqcount update), there is
still a potential problem.

If we do have multiple contexts executing ipt_do_table(), it is possible
for more than just the seqcount to be corrupted; any updates to the
internal structures could cause problems. It isn't clear to me whether
anything other than the counters is modified here, so I'm not sure if
there are other issues. But regardless, if the counters can become
corrupted, then it is possible to break any rules that use them.

Anyway, based on the earlier discussion, is there any reason not to use
a lock (presuming any solution properly takes into account possible
recursion)? I understand that mainline is protected, but perhaps in the
RT version we could use a seqlock (and prevent a recursive lock)?

Thanks,
Pete LaDow
* Re: Process Hang in __read_seqcount_begin
  2012-10-30 23:09 ` Peter LaDow
@ 2012-10-31  0:33 ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2012-10-31  0:33 UTC (permalink / raw)
To: Peter LaDow; +Cc: Eric Dumazet, linux-kernel, linux-rt-users, netfilter

On Tue, 30 Oct 2012, Peter LaDow wrote:
> Anyway, based on the earlier discussion, is there any reason not to use
> a lock (presuming any solution properly takes into account possible
> recursion)? I understand that mainline is protected, but perhaps in the
> RT version we could use a seqlock (and prevent a recursive lock)?

Have you tried 3.6.3-rt9?