public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Paul Turner <pjt@google.com>
Cc: linux-kernel@vger.kernel.org, adobriyan@gmail.com
Subject: Re: Migration of kernel interfaces to seq_files breaks pread() consumers
Date: Sat, 24 Jan 2009 18:19:24 -0800	[thread overview]
Message-ID: <20090124181924.d633523c.akpm@linux-foundation.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0901162229420.12858@kitami.corp.google.com>

On Fri, 16 Jan 2009 23:51:35 -0800 (PST) Paul Turner <pjt@google.com> wrote:

> 
> (Specifically) Several interfaces under /proc have been migrated to use 
> seq_files.  This was previously observed to be a problem with VMware's 
> reading of /proc/uptime.  We're now running into the same problem on  
> /proc/<pid>/stat; we have many consumers performing preads on this 
> interface which break under new kernels.
> 
> Reverting these migrations presents other problems and doesn't scale with 
> everyones' pet dependencies over an abi that's been
> broken :(

We changed userspace-visible behaviour and broke real applications. 
This is a serious matter.  So serious in fact that your report has
languished without reply for a week.

Reverting those changes until we have a suitable reimplementation which
doesn't bust userspace is 100% justifiable.

In which kernel versions is this regression present?

What would a revert look like?  Big and ugly or small and simple?  Do
the original commits (which were they?) still revert OK?

> Part of the problem in implementing pread in seq_files is that we don't  
> know know whether the read was issued by pread(2) or read(2).  It's not 
> nice to shoehorn this information down the stack.  I've attached a 
> skeleton patch which shows one way we could push it up (although something 
> like a second f_pos would be necessary to make it maintain pread 
> semantics against reads).
> 
> One advantage of this style of approach is that it doesn't break on 
> partial record reads.  But it's a little gross at the same time.
> 

Yes, that is a bit gross.

Does this patch actually 100% solve the problem, or is it a precursor
to some other fix or what?  It's hard to comment sensibly if it's a
partial thing with no sign how it will be used.

> diff --git a/fs/read_write.c b/fs/read_write.c
> index 2fc2980..744094a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -407,6 +407,16 @@ asmlinkage ssize_t sys_pread64(unsigned int fd, char __user *buf,
>  		ret = -ESPIPE;
>  		if (file->f_mode & FMODE_PREAD)
>  			ret = vfs_read(file, buf, count, &pos);
> +		else if (file->f_mode & FMODE_SEQ_FILE) {
> +			/*
> +			 * We break the pread semantic and actually make it
> +			 * seek, this prevents inconsistent record reads across
> +			 * boundaries.
> +			 */
> +			vfs_llseek(file, pos, SEEK_SET);
> +			ret = vfs_read(file, buf, count, &pos);
> +			file_pos_write(file, pos);
> +		}

Well yes, that's a userspace-visible wrong change too.

>  		fput_light(file, fput_needed);
>  	}
>  
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 3f54dbd..f8c5379 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -50,6 +50,8 @@ int seq_open(struct file *file, const struct seq_operations *op)
>  
>  	/* SEQ files support lseek, but not pread/pwrite */
>  	file->f_mode &= ~(FMODE_PREAD | FMODE_PWRITE);
> +	file->f_mode |= FMODE_SEQ_FILE;
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(seq_open);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5f7b912..c3b5916 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -76,6 +76,8 @@ extern int dir_notify_enable;
>     behavior for cross-node execution/opening_for_writing of files */
>  #define FMODE_EXEC	16
>  
> +#define FMODE_SEQ_FILE_PREAD	32

-EWONTCOMPILE, btw.

  reply	other threads:[~2009-01-25  2:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-17  7:51 Migration of kernel interfaces to seq_files breaks pread() consumers Paul Turner
2009-01-25  2:19 ` Andrew Morton [this message]
2009-01-25  3:40   ` Paul Turner
2009-01-25 12:08   ` Alexey Dobriyan
2009-01-27 21:47     ` Eric W. Biederman
2009-01-27 21:48       ` [PATCH 1/2] seq_file: Move traverse so it can be used from seq_read Eric W. Biederman
2009-01-27 21:49         ` [PATCH 2/2] seq_file: Properly cope with pread Eric W. Biederman
2009-01-30  1:26       ` Migration of kernel interfaces to seq_files breaks pread() consumers Paul Turner
2009-01-30  3:01         ` Eric W. Biederman
2009-01-30  6:09           ` Paul Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090124181924.d633523c.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=adobriyan@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pjt@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox