* [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())
@ 2016-03-16 4:17 Al Viro
2016-03-16 4:34 ` Linus Torvalds
2016-03-16 15:46 ` Doug Ledford
0 siblings, 2 replies; 7+ messages in thread
From: Al Viro @ 2016-03-16 4:17 UTC (permalink / raw)
To: Mike Marciniszyn; +Cc: linux-rdma, linux-fsdevel, linux-kernel, Linus Torvalds
Folks, we'd discussed that kind of crap already; why, in name of
everything unholy, is that kind of garbage brought back in a new driver?
Having both ->write() and ->write_iter() *AND* having entirely
unrelated interpretation of user input on those two on the same device
is bogus, with the capital Stop That Shit Right Now.
As it is, write(fd, p, len) and writev(fd, &(struct iovec){p,len}, 1)
are interpreted in absolutely unrelated ways. As in, entirely different
set of commands. Moreover, aio IOCB_CMD_PWRITE matches writev(), not write().
What's going on? Sure, ipathfs is a precious snowflake with the same
kind of braindamage. Back when it had been brought to your attention
(along with the fact that this piece of idiocy happens to be a file on
filesystem of your own, under your full control and no need whatsofuckingever
to multiplex completely unrelated sets of commands on the same file with
"was it write(2) or writev(2)?" used as a selector) the answer had been
basically "it doesn't have to make sense, it's Special". Now it turns out
that it has grown an equally special sibling. With the idiocy directly
exposed as userland ABI. Could we please fix that thing before it's cast in
stone?
Take a look at drivers/staging/rdma/hfi1/file_ops.c in -next and
compare hfi1_write_iter() with hfi1_file_write(). Folks, this ABI is too
ugly to live, let alone to be allowed breeding.
It's also brittle as hell - trivial massage around fs/read_write.c
and fs/aio.c is quite capable of breaking that shit. Arguably, IOCB_CMD_PWRITE
and IOCB_CMD_PWRITEV both triggering your writev() semantics is an example of
just such breakage. Sigh...
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) 2016-03-16 4:17 [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) Al Viro @ 2016-03-16 4:34 ` Linus Torvalds 2016-03-16 15:46 ` Doug Ledford 1 sibling, 0 replies; 7+ messages in thread From: Linus Torvalds @ 2016-03-16 4:34 UTC (permalink / raw) To: Al Viro Cc: Mike Marciniszyn, linux-rdma@vger.kernel.org, linux-fsdevel, Linux Kernel Mailing List On Tue, Mar 15, 2016 at 9:17 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Take a look at drivers/staging/rdma/hfi1/file_ops.c in -next and > compare hfi1_write_iter() with hfi1_file_write(). Folks, this ABI is too > ugly to live, let alone to be allowed breeding. > > It's also brittle as hell - trivial massage around fs/read_write.c > and fs/aio.c is quite capable of breaking that shit. Arguably, IOCB_CMD_PWRITE > and IOCB_CMD_PWRITEV both triggering your writev() semantics is an example of > just such breakage. Sigh... We could just decide that if a file descriptor has both ->write and ->write_iter entities, we always pick ->write_iter in the vfs layer. That way it's always consistent. Simple ordering change in __vfs_write().. We can switch is back later, but make sure it hits a release or two. Or at least a few rc's, to flush out any problems. Anybody who thinks that they can have different semantics for write() and writev() is just completely broken. Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) 2016-03-16 4:17 [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) Al Viro 2016-03-16 4:34 ` Linus Torvalds @ 2016-03-16 15:46 ` Doug Ledford 2016-03-16 16:02 ` Theodore Ts'o ` (2 more replies) 1 sibling, 3 replies; 7+ messages in thread From: Doug Ledford @ 2016-03-16 15:46 UTC (permalink / raw) To: Al Viro, Mike Marciniszyn Cc: linux-rdma, linux-fsdevel, linux-kernel, Linus Torvalds, Dennis Dalessandro, ira weiny, Jubin John [-- Attachment #1: Type: text/plain, Size: 4123 bytes --] On 03/16/2016 12:17 AM, Al Viro wrote: > Folks, we'd discussed that kind of crap already; why, in name of > everything unholy, is that kind of garbage brought back in a new driver? No doubt because in large part the hfi1 driver copy and pastes significant portions of the qib driver. Likewise, libpsm2 (which works with the hfi1 devices) is largely a copy and paste job from open-psm (which works with qib/ipath devices). > Having both ->write() and ->write_iter() *AND* having entirely > unrelated interpretation of user input on those two on the same device > is bogus, with the capital Stop That Shit Right Now. > > As it is, write(fd, p, len) and writev(fd, &(struct iovec){p,len}, 1) > are interpreted in absolutely unrelated ways. As in, entirely different > set of commands. Moreover, aio IOCB_CMD_PWRITE matches writev(), not write(). > > What's going on? Sure, ipathfs is a precious snowflake with the same > kind of braindamage. Back when it had been brought to your attention > (along with the fact that this piece of idiocy happens to be a file on > filesystem of your own, under your full control and no need whatsofuckingever > to multiplex completely unrelated sets of commands on the same file with > "was it write(2) or writev(2)?" used as a selector) the answer had been > basically "it doesn't have to make sense, it's Special". I can't speak for Mike, but I never said "It's Special". I said it's a driver internal thing with only one consumer and the kernel driver and the user space consumer are a matched pair. If this were a general API for use by any old program I would agree with you, but since it isn't, I wasn't that concerned about whether it got fixed. If it broke, Intel had both pieces and could fix it. And with that in mind I said "ince this is an internal driver interface that only Intel uses, I'm not inclined to force them to rewrite their driver and their library just because their particular usage took you off guard." I should point out that the fragility is not so rampant as you make it sound. The qib driver was added in 2010 and had this interface then (modulo your changes in 4961772560d2). It was a copy from the ipath driver at the time, so in truth had been around much longer even than 2010. > Now it turns out > that it has grown an equally special sibling. With the idiocy directly > exposed as userland ABI. Could we please fix that thing before it's cast in > stone? If we want to maintain back compatibility, then the qib driver has to maintain this interface. We could possibly do a new one as well, but we can't remove this one. For the hfi1 driver (and OPA in general), we do have the ability to do a new API. But, going back to what I said before, I just don't care that much. It's Intel internal stuff as far as I'm concerned. If they do something fragile and it breaks, then that's all on their hands. They've gotten away with it for over 10 years so I can see why they probably aren't that concerned about it themselves. > Take a look at drivers/staging/rdma/hfi1/file_ops.c in -next and > compare hfi1_write_iter() with hfi1_file_write(). Folks, this ABI is too > ugly to live, let alone to be allowed breeding. > > It's also brittle as hell - trivial massage around fs/read_write.c > and fs/aio.c is quite capable of breaking that shit. Arguably, IOCB_CMD_PWRITE > and IOCB_CMD_PWRITEV both triggering your writev() semantics is an example of > just such breakage. Sigh... I'm sure you could break it intentionally if you wanted to. Not that I think you *should*, but I'm sure you *could*. Intel: For your own sake's you might want to consider doing something simple such as using two files, one for regular commands and one for SDMA commands, and modifying both the hfi1 and libpsm2 code bases. I don't care, and I wouldn't force you to do it, but I've made my argument to Al and he appears to be running it up the tree, so it might be easiest to capitulate. -- Doug Ledford <dledford@redhat.com> GPG KeyID: 0E572FDD [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 884 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) 2016-03-16 15:46 ` Doug Ledford @ 2016-03-16 16:02 ` Theodore Ts'o 2016-03-16 16:36 ` Linus Torvalds [not found] ` <20160316220029.GA5213@scvm10.sc.intel.com> 2 siblings, 0 replies; 7+ messages in thread From: Theodore Ts'o @ 2016-03-16 16:02 UTC (permalink / raw) To: Doug Ledford Cc: Al Viro, Mike Marciniszyn, linux-rdma, linux-fsdevel, linux-kernel, Linus Torvalds, Dennis Dalessandro, ira weiny, Jubin John On Wed, Mar 16, 2016 at 11:46:31AM -0400, Doug Ledford wrote: > I can't speak for Mike, but I never said "It's Special". I said it's a > driver internal thing with only one consumer and the kernel driver and > the user space consumer are a matched pair. If this were a general API > for use by any old program I would agree with you, but since it isn't, I > wasn't that concerned about whether it got fixed. If it broke, Intel > had both pieces and could fix it. And with that in mind I said "ince > this is an internal driver interface that only Intel uses, I'm not > inclined to force them to rewrite their driver and their library just > because their particular usage took you off guard." The thing which is really scary about the "we own both pieces, so we can be sloppy with the interface design" is there is the question of whether there are any security issues with such an interface. Maybe the assumption is that only root will get to access the interface, but as we're seeing with user namespaces, very often these assumptions are getting upended. So certainly if I were a bad guy working at the NSA^H^H^H^H KGB trying to find a zero-day that I would keep in my agency's back pocket, interfaces designed with this kind of attitude would be the first place I would look. > For the hfi1 driver (and OPA in general), we do have the ability to do a > new API. But, going back to what I said before, I just don't care that > much. It's Intel internal stuff as far as I'm concerned. If they do > something fragile and it breaks, then that's all on their hands. Except if there's a security vulernability, then it's on our collective heads as kernel developers. Unless we want to tell people, "don't use intel hardware, it's written by device driver authors who are sloppy".... - Ted ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) 2016-03-16 15:46 ` Doug Ledford 2016-03-16 16:02 ` Theodore Ts'o @ 2016-03-16 16:36 ` Linus Torvalds 2016-03-16 17:06 ` Al Viro [not found] ` <20160316220029.GA5213@scvm10.sc.intel.com> 2 siblings, 1 reply; 7+ messages in thread From: Linus Torvalds @ 2016-03-16 16:36 UTC (permalink / raw) To: Doug Ledford Cc: Al Viro, Mike Marciniszyn, linux-rdma@vger.kernel.org, linux-fsdevel, Linux Kernel Mailing List, Dennis Dalessandro, ira weiny, Jubin John On Wed, Mar 16, 2016 at 8:46 AM, Doug Ledford <dledford@redhat.com> wrote: > > If we want to maintain back compatibility, then the qib driver has to > maintain this interface. We could possibly do a new one as well, but we > can't remove this one. We've broken more important driver ABI's before - all the nasty X stuff. Now, the X people did learn their lesson, and it hasn't happened lately (thank Gods!), but quite frankly, some shit-for-brains hardware-specific config interface for a rdma device that basically nobody uses is a _lot_ less important than X ever was. So I don't care one whit if we break it, and it's not the kind of backwards compatibility the kernel should worry about. There are exactly zero regular users of this interface. I assume that people who use this thing are *so* deeply technical that they can take care of themselves. And it really is a completely broken interface. I might be proven wrong, and somebody's dear old grandma ends up complaining about a new kernel breaking her configuration, and in that case we'd have to revert anything that causes that breakage. But I suspect I'm not wrong. Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) 2016-03-16 16:36 ` Linus Torvalds @ 2016-03-16 17:06 ` Al Viro 0 siblings, 0 replies; 7+ messages in thread From: Al Viro @ 2016-03-16 17:06 UTC (permalink / raw) To: Linus Torvalds Cc: Doug Ledford, Mike Marciniszyn, linux-rdma@vger.kernel.org, linux-fsdevel, Linux Kernel Mailing List, Dennis Dalessandro, ira weiny, Jubin John On Wed, Mar 16, 2016 at 09:36:46AM -0700, Linus Torvalds wrote: > On Wed, Mar 16, 2016 at 8:46 AM, Doug Ledford <dledford@redhat.com> wrote: > > > > If we want to maintain back compatibility, then the qib driver has to > > maintain this interface. We could possibly do a new one as well, but we > > can't remove this one. > > We've broken more important driver ABI's before - all the nasty X stuff. > > Now, the X people did learn their lesson, and it hasn't happened > lately (thank Gods!), but quite frankly, some shit-for-brains > hardware-specific config interface for a rdma device that basically > nobody uses is a _lot_ less important than X ever was. > > So I don't care one whit if we break it, and it's not the kind of > backwards compatibility the kernel should worry about. There are > exactly zero regular users of this interface. I assume that people who > use this thing are *so* deeply technical that they can take care of > themselves. And it really is a completely broken interface. > > I might be proven wrong, and somebody's dear old grandma ends up > complaining about a new kernel breaking her configuration, and in that > case we'd have to revert anything that causes that breakage. But I > suspect I'm not wrong. Let's start with "do saner interface for hfi1 if you want it in", then duplicate it for ipathfs (using the saner userland side already tested on fixed hfi1), then we make sure nobody is even tempted to repeat that ugliness more or less along the lines of what you'd proposed in fs/read_write.c, but might as well add a sanity check in do_dentry_open(), where we already have if ((f->f_mode & FMODE_WRITE) && likely(f->f_op->write || f->f_op->write_iter)) f->f_mode |= FMODE_CAN_WRITE; - the check for both methods being present could go there conveniently enough; "use new_sync_write() as ->write" is spelled as "have NULL ->write and non-NULL ->write_iter" these days, so having both is very rare; it's only /dev/zero, /dev/null and these two perversions. And while writes to /dev/null are very important (for email handling, if nothing else), I suspect that we won't be seriously hurt by having it do the extra work new_sync_write() would involve... Or we could special-case that in the check (move write_null(), aka return the third argument into fs/libfs.c and in unlikely case of having both ->write and ->write_iter check if ->write is that one). ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20160316220029.GA5213@scvm10.sc.intel.com>]
* RE: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) [not found] ` <20160316220029.GA5213@scvm10.sc.intel.com> @ 2016-03-16 22:52 ` Dalessandro, Dennis 0 siblings, 0 replies; 7+ messages in thread From: Dalessandro, Dennis @ 2016-03-16 22:52 UTC (permalink / raw) To: Doug Ledford Cc: Al Viro, infinipath, linux-rdma@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Linus Torvalds, Weiny, Ira, John, Jubin Just wanted to re-send so it gets picked up by the lists. I was using a different server than usual and vger did not like it. Hopefully this attempt fares better. Sorry for the extra noise. -- On Wed, Mar 16, 2016 at 11:46:31AM -0400, Doug Ledford wrote: >Intel: For your own sake's you might want to consider doing something >simple such as using two files, one for regular commands and one for >SDMA commands, and modifying both the hfi1 and libpsm2 code bases. I >don't care, and I wouldn't force you to do it, but I've made my argument >to Al and he appears to be running it up the tree, so it might be >easiest to capitulate. This should be doable. We could have multiple files and have each implement only the appropriate handler, write() for control and writev() for data. Another option we are considering is combine the struct used for regular commands and the one for SDMA commands into a union in a structure that includes some magic value that tells us how to interpret the payload. This way whether it's write() or writev() won't matter we'll know how to handle it. We'll start with fixing hfi1 first and go from there. -Denny ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-03-16 22:52 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-16 4:17 [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter()) Al Viro
2016-03-16 4:34 ` Linus Torvalds
2016-03-16 15:46 ` Doug Ledford
2016-03-16 16:02 ` Theodore Ts'o
2016-03-16 16:36 ` Linus Torvalds
2016-03-16 17:06 ` Al Viro
[not found] ` <20160316220029.GA5213@scvm10.sc.intel.com>
2016-03-16 22:52 ` Dalessandro, Dennis
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).