* Re: [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc
From: Jamal Hadi Salim @ 2022-01-05 14:57 UTC (permalink / raw)
  To: Paul Blakey, dev, netdev, Cong Wang, Pravin B Shelar, davem,
	Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, Oz Shlomo, Vlad Buslov, Roi Dayan
In-Reply-To: <20220104082821.22487-1-paulb@nvidia.com>

On 2022-01-04 03:28, Paul Blakey wrote:
[..]
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -287,7 +287,9 @@ struct tc_skb_ext {
>   	__u32 chain;
>   	__u16 mru;
>   	__u16 zone;
> -	bool post_ct;
> +	bool post_ct:1;
> +	bool post_ct_snat:1;
> +	bool post_ct_dnat:1;
>   };


Is skb_ext intended only for ovs? If yes, why does it belong
in the core code? Ex: Looking at tcf_classify(), which is a core
function in the fast path for any packet going via tc, it
is now encumbered with checking for the presence of skb_ext.
I know passing around metadata is a paramount requirement
for programmability, but this is getting messier with special
use cases for ovs and/or offload...

cheers,
jamal
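
For readers following the quoted hunk, a small userspace sketch of the
layout change may help. It mirrors the fields of struct tc_skb_ext with
fixed-width types; the names tc_skb_ext_old, tc_skb_ext_new and
ext_size_delta() are made up for illustration and are not kernel code.
On common ABIs the three one-bit flags would be expected to fit in the
storage the single bool used, but bit-field layout is
implementation-defined, so treat the result as something to check
rather than a guarantee.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: one whole bool for post_ct. */
struct tc_skb_ext_old {
	uint32_t chain;
	uint16_t mru;
	uint16_t zone;
	bool post_ct;
};

/* Layout after the patch: three one-bit flags. */
struct tc_skb_ext_new {
	uint32_t chain;
	uint16_t mru;
	uint16_t zone;
	bool post_ct:1;
	bool post_ct_snat:1;
	bool post_ct_dnat:1;
};

/* Hypothetical helper: bytes added by the new layout (0 if none). */
static inline long ext_size_delta(void)
{
	return (long)sizeof(struct tc_skb_ext_new) -
	       (long)sizeof(struct tc_skb_ext_old);
}
```

The sketch says nothing about the cost of looking the extension up in
tcf_classify(); it only covers the size of the structure itself.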


* Re: [PATCH 0/7] y2038: cond_wait_prologue64 and related fixes
From: Jan Kiszka @ 2022-01-05 14:58 UTC (permalink / raw)
  To: Bezdeka, Florian (T CED SES-DE), xenomai@xenomai.org
In-Reply-To: <c53158602ef5d92c35dade7e66e34fcbafaa333c.camel@siemens.com>

On 05.01.22 15:56, Bezdeka, Florian (T CED SES-DE) wrote:
> On Wed, 2022-01-05 at 15:43 +0100, Jan Kiszka wrote:
>> On 05.01.22 15:06, Florian Bezdeka wrote:
>>> Hi all,
>>>
>>> this is the last missing POSIX related y2038 affected syscall in
>>> Xenomai. With this applied we have two Xenomai specific syscalls
>>> missing:
>>>
>>>   - sc_cobalt_thread_setschedparam_ex
>>>   - sc_cobalt_thread_getschedparam_ex
>>>
>>> While adding tests for the introduced cond_wait_prologue64 I hit a
>>> kernel OOPS due to insufficient validation of user-provided pointers.
>>> That has been addressed as well.
>>
>> Thanks for both! Is it possible to move the fixes to the front? That would
>> also ensure that I can easily pick them into stable.
> 
> Yes. Patches 4 and 7 could be moved to the front easily. Do you want me
> to split patch 2 into the y2038 and non y2038 part, or does that not
> qualify for stable at all?

Can I reorder things myself, or does patch 4 break (patch 7 does not,
already checked)? Then I just change the application order while doing
git am.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: Backport request: commit 0dc54bd4d6e0 ("fscache_cookie_enabled: check cookie is valid before accessing it")
From: Greg KH @ 2022-01-05 14:58 UTC (permalink / raw)
  To: Jeffrey E Altman; +Cc: stable, linux-afs
In-Reply-To: <8b47354f-ff8f-4dfe-6c1e-813ffefbcf79@auristor.com>

On Tue, Jan 04, 2022 at 05:29:34PM -0500, Jeffrey E Altman wrote:
> Please backport commit 0dc54bd4d6e03be1f0b678c4297170b79f1a44ab
> ("fscache_cookie_enabled: check cookie is valid before accessing it") to
> the 5.13, 5.14, and 5.15 kernel series.

Only 5.15 is still alive; see the front page of kernel.org for the
active kernel versions.

> Commit 0dc54bd4d6e03be1f0b678c4297170b79f1a44ab fixes a bug introduced
> by 3003bbd0697b659944237f3459489cb596ba196c ("afs: Use the
> netfs_write_begin() helper") that results in a NULL pointer dereference
> observed in Fedora 35 when accessing afs volumes from Kubernetes.
> 
> [ 3627.403829] BUG: kernel NULL pointer dereference, address: 0000000000000068
> [ 3627.411649] RIP: 0010:afs_is_cache_enabled+0xc/0x30 [kafs]
> [ 3627.419900] Call Trace:
> [ 3627.420432]  <TASK>
> [ 3627.420957]  netfs_write_begin+0x1ff/0x810 [netfs]
> [ 3627.421498]  ? lock_timer_base+0x61/0x80
> [ 3627.422124]  afs_write_begin+0x58/0x240 [kafs]
> [ 3627.422738]  generic_perform_write+0xae/0x1d0
> [ 3627.423325]  ? file_update_time+0xd2/0x120
> [ 3627.423806]  __generic_file_write_iter+0x101/0x1d0
> [ 3627.424275]  generic_file_write_iter+0x5d/0xb0
> [ 3627.424741]  afs_file_write+0x73/0xa0 [kafs]
> [ 3627.425270]  new_sync_write+0x10b/0x180
> [ 3627.425708]  vfs_write+0x1ce/0x260
> [ 3627.426160]  ksys_write+0x4f/0xc0
> [ 3627.426606]  do_syscall_64+0x3b/0x90
> [ 3627.427086]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> The defect was introduced in v5.13-rc1 and fixed in v5.16-rc1.


Now queued up, thanks.

greg k-h
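
The class of fix requested here, checking a pointer before reading
through it, can be sketched in plain C. Everything below is
hypothetical (struct demo_cookie, DEMO_COOKIE_ENABLED and
demo_cookie_enabled() are invented names); it is not the fscache code
and only illustrates the pattern named in the commit subject.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_COOKIE_ENABLED 0x1UL	/* hypothetical flag bit */

/* Hypothetical stand-in for a cache cookie; not the kernel structure. */
struct demo_cookie {
	unsigned long flags;
};

/*
 * Checked form: a NULL cookie is treated as "not enabled" instead of
 * being dereferenced.
 */
static inline bool demo_cookie_enabled(const struct demo_cookie *cookie)
{
	return cookie != NULL &&
	       (cookie->flags & DEMO_COOKIE_ENABLED) != 0;
}
```

A caller can then pass whatever pointer it holds, including NULL,
without first having to know whether a cookie was ever attached.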


* Re: [PATCH v2] hw/arm/virt: KVM: Enable PAuth when supported by the host
From: Andrew Jones @ 2022-01-05 14:58 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Richard Henderson, qemu-devel, kernel-team, kvmarm
In-Reply-To: <20220103180507.2190429-1-maz@kernel.org>

On Mon, Jan 03, 2022 at 06:05:07PM +0000, Marc Zyngier wrote:
> Add basic support for Pointer Authentication when running a KVM
> guest on a host that supports it, loosely based on the SVE
> support.
> 
> Although the feature is enabled by default when the host advertises
> it, it is possible to disable it by setting the 'pauth=off' CPU
> property. The 'pauth' comment is removed from cpu-features.rst,
> as it is now common to both TCG and KVM.
> 
> Tested on an Apple M1 running 5.16-rc6.
> 
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Andrew Jones <drjones@redhat.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> * From v1:
>   - Drop 'pauth' documentation
>   - Make the TCG path common to both TCG and KVM
>   - Some tidying up
> 
>  docs/system/arm/cpu-features.rst |  4 ----
>  target/arm/cpu.c                 | 14 ++++----------
>  target/arm/cpu.h                 |  1 +
>  target/arm/cpu64.c               | 33 ++++++++++++++++++++++++++++----
>  target/arm/kvm64.c               | 21 ++++++++++++++++++++
>  5 files changed, 55 insertions(+), 18 deletions(-)
>

 
Reviewed-by: Andrew Jones <drjones@redhat.com>

Thanks,
drew

* nfsd v4 server can crash in COPY_NOTIFY
From: rtm @ 2022-01-05 14:59 UTC (permalink / raw)
  To: J. Bruce Fields, Chuck Lever; +Cc: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

If the special ONE stateid is passed to nfs4_preprocess_stateid_op(),
it returns status=0 but does not set *cstid. nfsd4_copy_notify()
depends on stid being set if status=0, and thus can crash if the
client sends the right COPY_NOTIFY RPC.

I've attached a demo.

# uname -a
Linux (none) 5.16.0-rc7-00108-g800829388818-dirty #28 SMP Wed Jan 5 14:40:37 UTC 2022 riscv64 riscv64 riscv64 GNU/Linux
# cc nfsd_5.c
# ./a.out
...
[   35.583265] Unable to handle kernel paging request at virtual address ffffffff00000008
[   35.596916] status: 0000000200000121 badaddr: ffffffff00000008 cause: 000000000000000d
[   35.597781] [<ffffffff80640cc6>] nfs4_alloc_init_cpntf_state+0x94/0xdc
[   35.598576] [<ffffffff80274c98>] nfsd4_copy_notify+0xf8/0x28e
[   35.599386] [<ffffffff80275c86>] nfsd4_proc_compound+0x2b6/0x4ee
[   35.600166] [<ffffffff8025f7f4>] nfsd_dispatch+0x118/0x174
[   35.600840] [<ffffffff8061a2e8>] svc_process_common+0x2f4/0x56c
[   35.601630] [<ffffffff8061a624>] svc_process+0xc4/0x102
[   35.602302] [<ffffffff8025f25a>] nfsd+0xfa/0x162
[   35.602979] [<ffffffff80027010>] kthread+0x124/0x136
[   35.603668] [<ffffffff8000303e>] ret_from_exception+0x0/0xc
[   35.604667] ---[ end trace 69f12ad62072e251 ]---
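
The failure mode described above, a lookup that reports success without
filling in its out-parameter, can be illustrated outside the kernel.
The sketch below uses made-up names (demo_stid, demo_lookup_stateid(),
demo_copy_notify(), DEMO_ERR_BAD_STATEID) and an all-ones id as a
stand-in for the special stateid. It is not the nfsd code and not a
proposed patch; it only shows a caller that initializes the pointer and
refuses to continue while it is still NULL.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_OK			0
#define DEMO_ERR_BAD_STATEID	1	/* hypothetical error code */

struct demo_stid {
	int refcount;
};

static struct demo_stid demo_table_entry = { 1 };

/*
 * Hypothetical lookup. For the all-ones id it returns DEMO_OK but
 * leaves *out untouched, mirroring the behaviour in the report.
 */
static int demo_lookup_stateid(const unsigned char id[12],
			       struct demo_stid **out)
{
	static const unsigned char ones[12] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	};

	if (memcmp(id, ones, sizeof(ones)) == 0)
		return DEMO_OK;		/* success, but no object */
	*out = &demo_table_entry;
	return DEMO_OK;
}

/* Caller that does not assume success implies a valid pointer. */
static int demo_copy_notify(const unsigned char id[12])
{
	struct demo_stid *stid = NULL;	/* start from a known value */
	int status = demo_lookup_stateid(id, &stid);

	if (status != DEMO_OK)
		return status;
	if (stid == NULL)		/* special id: nothing to reference */
		return DEMO_ERR_BAD_STATEID;
	stid->refcount++;
	return DEMO_OK;
}
```

Whether the real fix belongs in the lookup (always set the pointer or
fail) or in the caller (reject the special stateid) is for the
maintainers to decide; the sketch only shows the caller-side variant.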


[-- Attachment #2: nfsd_5.c --]
[-- Type: application/octet-stream, Size: 41067 bytes --]

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <netinet/in.h>
#include <sys/wait.h>
#include <sys/resource.h>
#include <arpa/inet.h>
#include <assert.h>

#define NAA 128
unsigned long long aa[NAA] = {
0xc2ffffffull,
0x0ull,
0xfcffffff00000000ull,
0xfaffffffull,
0xc6ffffff00000000ull,
/* entries 5..127 are all zero; C zero-initializes the rest of the array */
};
int aai = 0;
int symstart = -1;

char obuf[10240];
int oi = 0;

int s; // socket fd
int xid = 1;
unsigned long long clientid; // server tells us in exchange_id reply
unsigned int sequenceid;
unsigned int slot0sequenceid = 1;
char sessionid[16];
int stateid_seqid; // from last received stateid4
char stateid_other[12]; // from last received stateid4
int tmp_fh_len;
char tmp_fh[256];

void
sys(const char *cmd)
{
  volatile int junk = system(cmd);
  (void) junk;
}

void put_fattr4_one();
void put_fattr4_many();

void
put32(unsigned int x)
{
  assert((oi % 4) == 0);
  *(int*)(obuf+oi) = htonl(x);
  oi += 4;
}

void
put64(unsigned long long x)
{
  put32(x >> 32);
  put32(x);
}

void
put_opaque(int n, const char *buf)
{
  put32(n);
  for(int i = 0; i < n; i++)
    obuf[oi++] = (buf ? buf[i] : 0);
  while(n & 3){
    obuf[oi++] = 0;
    n++;
  }
}

void
put_opaque_repeat(int n, char c)
{
  put32(n);
  for(int i = 0; i < n; i++)
    obuf[oi++] = c;
  while((n%4)!=0){
    obuf[oi++] = 0;
    n++;
  }
}

void
put_sessionid(const char *sid)
{
  for(int i = 0; i < 16; i++){
    obuf[oi++] = (sid ? sid[i] : 0);
  }
}

void
put_reset()
{
  oi = 4; // leave room for packet length
}

void
send_send()
{
  assert(oi >= 4);
  assert((oi % 4) == 0);
  assert(oi <= sizeof(obuf));
  assert(aai <= NAA);
  for(int i = 0; i < 16; i++)
    put32(0xffffffff);
  if(symstart != -1){
    for(int i = symstart; i < oi && aai < NAA; i += 8)
      *(long long *)(obuf + i) ^= aa[aai++];
  }
  *(int*)(obuf+0) = htonl((oi - 4) | 0x80000000);
  printf("writing %d xid %d\n", oi, ntohl(*(int*)(obuf+4)));
  if(write(s, obuf, oi) <= 0) perror("write");
  oi = 0;
  symstart = -1;
}

void
put_rpc_header(int proc)
{
  put_reset();
  put32(xid++);
  put32(0); // mtype=CALL
  put32(2); // rpc version
  put32(100003); // prog #
  put32(4); // prog vers
  put32(proc); // proc
  if(proc == 0){
    put32(0); // cred type
    put32(0); // cred len
  } else {
    put32(1); // cred type AUTH_SYS / AUTH_UNIX
    put32(32); // cred length
    put32(0); // stamp
    put_opaque(9, "localhost");
    put32(65534); // uid
    put32(65534); // gid
    put32(0); // # gids
  }
  put32(0); // verf type
  put32(0); // verf len
}

void
put_compound(int n)
{
  put_rpc_header(1);

  // compound header
  put_opaque(0, ""); // tag
  put32(2); // minor version
  put32(n); // # operations in the compound
}

// most COMPOUNDs are required to start with a SEQUENCE.
void
put_sequence()
{
  put32(53); // SEQUENCE
  put_sessionid(sessionid); // sessionid (16 bytes)
  put32(slot0sequenceid++); // sequenceid ???
  put32(0); // slotid
  put32(0); // highest_slotid
  put32(0); // cachethis
}

void
put_reclaim_complete()
{
  put32(58); // RECLAIM_COMPLETE
  put32(0); // 0 means global, 1 means just current fh
}


char ibuf[10240];
int ii;
int ilen;

int
readn(int fd, void *xbuf, int n)
{
  char *buf = (char *) xbuf;
  int orig = n;
  while(n > 0){
    int cc = read(fd, buf, n);
    if(cc <= 0) { perror("read"); return -1; }
    n -= cc;
    buf += cc;
  }
  return orig;
}

unsigned int
parse32()
{
  if(ii >= ilen){
    printf("parsed beyond the end of the input\n");
    return 0;
  }
  unsigned int x = *(int*)(ibuf+ii);
  ii += 4;
  return ntohl(x);
}

unsigned long long
parse64()
{
  unsigned long long hi = parse32();
  unsigned long long lo = parse32();
  return (hi << 32) | lo;
}

// sessionid4 -- 16 bytes
void
parse_sessionid(char *sid)
{
  for(int i = 0; i < 16; i++){
    if(sid)
      sid[i] = ibuf[ii];
    ii++;
  }
}

void
put_sid(char *sid)
{
  for(int i = 0; i < 16; i++){
    obuf[oi++] = (sid ? sid[i] : 0);
  }
}

// sessionid4 -- 16 bytes
void
parse_sid(char *sid)
{
  for(int i = 0; i < 16; i++){
    if(sid)
      sid[i] = ibuf[ii];
    ii++;
  }
}

unsigned int
parse_opaque(char *buf)
{
  if(buf)
    buf[0] = 0;
  int nominal_n = parse32();
  if(nominal_n > 4096){
    printf("crazy opaque length %d\n", nominal_n);
    return 0;
  }
  int real_n = nominal_n;
  while((real_n%4) != 0) real_n += 1;
  for(int i = 0; i < real_n; i++){
    if(buf && i < real_n)
      buf[i] = ibuf[ii];
    ii++;
  }
  return nominal_n;
}

void
parse_exchange_id_reply()
{
  int status = parse32();
  if(status != 0)
    printf("exchange_id reply status %d, not 0\n", status);
  clientid = parse64();
  sequenceid = parse32();
  printf("exchange_id clientid 0x%llx sequenceid 0x%x\n", clientid, sequenceid);
}

void
parse_create_session_reply()
{
  int status = parse32();
  if(status != 0)
    printf("create_session reply status %d, not 0\n", status);
  parse_sessionid(sessionid);
}

void
parse_sequence_reply()
{
  int status = parse32();
  if(status != 0)
    printf("sequence reply status %d, not 0\n", status);
  parse_sessionid(0);
  parse32(); // sequenceid
  parse32(); // slotid
  parse32(); // highest_slotid
  parse32(); // target_highest_slotid
  parse32(); // status_flags
}

void
parse_putrootfh_reply()
{
  int status = parse32();
  if(status != 0)
    printf("putrootfh_reply status %d\n", status);
}

void
parse_lookup_reply()
{
  int status = parse32();
  if(status != 0)
    printf("lookup_reply status %d\n", status);
}

void
parse_stateid()
{
  stateid_seqid = parse32();
  for(int i = 0; i < 12; i++)
    stateid_other[i] = ibuf[ii++];
}

void
parse_open_reply()
{
  int status = parse32();
  if(status != 0){
    printf("open status %d\n", status);
    return;
  }
  parse_stateid();
  parse32(); // change_info atomic
  parse64(); // change_info before
  parse64(); // change_info after
  parse32(); // rflags
  unsigned int bitwords = parse32(); // attrset
  for(int i = 0; i < bitwords; i++)
    parse32();
  int delegation_type = parse32(); // open_delegation4
  if(delegation_type == 0){
    // OPEN_DELEGATE_NONE
  } else if(delegation_type == 1){
    // OPEN_DELEGATE_READ
    // open_read_delegation4
    parse32(); // stateid seqid
    parse32(); // other
    parse32(); // other
    parse32(); // other
    parse32(); // recall
    // nfsace4
    parse32(); // nfsace4 type
    parse32(); // nfsace4 flag
    parse32(); // nfsace4 access_mark
    parse_opaque(0); // nfsace4 who
  } else if(delegation_type == 2){
    // OPEN_DELEGATE_WRITE
    parse32(); // stateid seqid
    parse32(); // other
    parse32(); // other
    parse32(); // other
    parse32(); // recall
    // nfs_space_limit4
    int limitby = parse32();
    if(limitby == 1){
      // NFS_LIMIT_SIZE
      parse64(); // filesize
    } else if(limitby == 2){
      // NFS_LIMIT_BLOCKS
      parse32(); // num_blocks
      parse32(); // bytes_per_block
    } else {
      printf("open reply, unknown limitby %d\n", limitby);
    }
    // nfsace4
    parse32(); // nfsace4 type
    parse32(); // nfsace4 flag
    parse32(); // nfsace4 access_mark
    parse_opaque(0); // nfsace4 who
  } else {
    printf("DID NOT understand delegation_type %d\n", delegation_type);
  }
}

void
parse_compound_reply()
{
  int stat = parse32(); // OK
  parse_opaque(0);
  int nops = parse32();
  printf("compound reply, nops %d, stat %d", nops, stat);
  if(stat > 0 && stat < 200){
    printf(" %s", strerror(stat));
  }
  printf("\n");
  for(int opi = 0; opi < nops && ii < ilen; opi++){
    int op = parse32();
    printf("reply for op %d\n", op);
    if(op == 53){
      parse_sequence_reply();
    } else if(op == 42){
      parse_exchange_id_reply();
    } else if(op == 43){
      parse_create_session_reply();
    } else if(op == 24){
      parse_putrootfh_reply();
    } else if(op == 15){
      parse_lookup_reply();
    } else if(op == 18){
      parse_open_reply();
    } else if(op == 26){
      int status = parse32();
      printf("readdir status %d\n", status);
      if(status == 0){
        long long verf = parse64();
        int nentries = parse32();
        long long cookie = parse64();
        char name[1024];
        memset(name, 0, sizeof(name));
        parse_opaque(name);
        printf("verf %llx *entries %d cookie %llx name %s\n", verf, nentries, cookie, name);
      }
      break;
    } else if(op == 34){
      int status = parse32();
      printf("setattr status %d\n", status);
      break;
    } else if(op == 22){
      // putfh
      int status = parse32();
      printf("putfh status %d\n", status);
    } else if(op == 10){
      // getfh
      int status = parse32();
      if(status == 0){
        int tmp_fh_len = parse_opaque(tmp_fh);
        printf("getfh fh_len %d\n", tmp_fh_len);
      } else {
        printf("getfh status %d\n", status);
      }
    } else {
      break;
    }
  }
}

void
parse_callback(int xid)
{
  parse32(); // rpc version
  parse32(); // prog #
  parse32(); // prog vers
  int proc = parse32(); // proc
  parse32(); // auth flavor
  parse_opaque(0); // auth
  parse32(); // verf flavor
  parse_opaque(0); // verf
  printf("callback proc=%d\n", proc);

  oi = 0;
  put32(0); // placeholder
  put32(xid);
  put32(1); // REPLY
  put32(0); // MSG_ACCEPTED
  put32(0); // opaque_auth flavor = AUTH_NULL
  put32(0); // opaque_auth length
  put32(0); // SUCCESS
  int xoi = oi;
  if(proc == 0){
    // nop
  } else if(proc == 1){
    // compound
    parse_opaque(0); // tag
    parse32(); // minorversion
    parse32(); // callback_ident
    int nops = parse32();
    put32(0); // status
    put_opaque(0, ""); // tag
    put32(nops);
    for(int opi = 0; opi < nops; opi++){
      int op = parse32();
      xoi = oi;
      put32(op);
      if(op == 11){
        // CB_SEQUENCE
        char sid[16];
        parse_sid(sid);
        int seq = parse32(); // sequenceid
        int slot = parse32(); // slotid
        int hislot = parse32(); // highest_slotid
        parse32(); // cachethis
        int nrcl = parse32(); // csa_referring_call_lists<>
        for(int rci = 0; rci < nrcl; rci++){
          parse32(); // sessionid
          parse32();
          parse32();
          parse32();
          int nxxx = parse32(); // rcl_referring_calls<>
          for(int xi = 0; xi < nxxx; xi++){
            parse32(); // sequenceid
            parse32(); // slotid
          }
        }
        put_sid(sid);
        put32(seq); // sequenceid
        put32(slot); // slotid
        put32(hislot); // highest_slotid
        put32(hislot); // target_highest_slotid
      } else if(op == 4){
        printf("CB_RECALL\n");
        // stateid4
        parse32(); // seqid
        parse32(); // other
        parse32();
        parse32();
        parse32(); // truncate
        parse_opaque(0); // fh
        put32(0); // OK
      } else {
        printf("callback unknown op %d\n", op);
        break;
      }
    }
  } else {
    printf("callback: unknown proc %d\n", proc);
  }
  send_send();
}

void
parse_reply(int proc)
{
  int desired_xid = xid - 1;
 again:
  if(readn(s, &ilen, 4) < 0)
    return;
  ilen = ntohl(ilen);
  if((ilen & 0x80000000) == 0)
    printf("ilen is missing 0x80000000\n");
  ilen &= 0x7fffffff;
  if(ilen > sizeof(ibuf)){
    printf("huge packet %d\n", ilen);
    return;
  }
  if(readn(s, ibuf, ilen) < 0)
    return;
  ii = 0;
  int xxid = parse32(); // xid
  int mtype = parse32(); // 1 = REPLY
  if(mtype == 0){
    // CALL -- a callback
    parse_callback(xxid);
    goto again;
  }
  if(xxid != desired_xid){
    printf("xid mismatch, wanted 0x%x, got 0x%x, ilen %d, ii %d\n", desired_xid, xxid, ilen, ii);
  }
  if(mtype != 1)
    printf("unexpected mtype %d, expected 1 / REPLY\n", mtype);
  int stat = parse32(); // MSG_ACCEPTED
  if(stat != 0)
    printf("unexpected reply stat %d, expected 0 / MSG_ACCEPTED\n", stat);
  int flavor = parse32(); // auth flavor
  if(flavor != 0)
    printf("unexpected auth_flavor %d, expecting 0 / AUTH_NONE\n", flavor);
  parse_opaque(0); // verf
  stat = parse32(); // SUCCESS
  if(stat != 0)
    printf("unexpected stat %d, expected 0 / SUCCESS\n", stat);

  if(proc == 0){
    printf("got reply for proc %d\n", proc);
  } else if(proc == 1){
    parse_compound_reply();
  } else {
    printf("got unexpected reply for proc %d xid %d\n", proc, xxid);
  }
}

void
send_nop()
{
  put_rpc_header(0);
  send_send();
  parse_reply(0);
}

void
send_exchange_id(int dosym)
{
  put_compound(1);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(42); // operation 42: EXCHANGE_ID
  int co_verifier = 1;
#if !SYM
  co_verifier = getpid(); // needs to be unique
#endif
  put64(co_verifier); // verifier4
  put_opaque(22, "Linux NFSv4.2 xyzzy"); // co_ownerid
  put32(0x103);  // flags
  put32(0); // SP4_NONE
  put32(1); // length of client_impl_id
  put_opaque(10, "kernel.org"); // nii_domain
  put_opaque(4, "blah"); // nii_name
  put64(0); // nfstime4
  put32(0); // nfstime4
  send_send();
  parse_reply(1);
}

void
send_exchange_id_sym()
{
  put_compound(1);
  put32(42); // operation 42: EXCHANGE_ID
  put64(1); // verifier4
  put_opaque(22, "Linux NFSv4.2 xyzzy"); // co_ownerid
  put32(0x103 ^ aa[aai++]);  // flags
  unsigned int how = aa[aai++];
  put32(how);
  int xoi = oi;
  if(how == 0){ // SP4_NONE
  } else if(how == 1){ // SP4_MACH_CRED
    put32(3);
    put32(0xffffffff);
    put32(0xffffffff);
    put32(0xffffffff);
    put32(3);
    put32(0xffffffff);
    put32(0xffffffff);
    put32(0xffffffff);
  } else if(how == 2){ // SP4_SSV
    // ssp_ops
    put32(3);
    put32(0xffffffff);
    put32(0xffffffff);
    put32(0xffffffff);
    // ssp_hash_algs<>
    put32(2);
    put_opaque(8, "12345678");
    put_opaque(8, "1bcdefgh");
    // ssp_encr_algs<>
    put32(2);
    put_opaque(8, "12345678");
    put_opaque(8, "1bcdefgh");
    put32(99); // ssp_window
    put32(99); // ssp_num_gss_handles
  }
  unsigned int n = aa[aai++];
  if(n > 20) n = 20;
  put32(n); // length of client_impl_id
  for(int i = xoi; i < oi && aai < NAA; i += 8)
    *(long long *)(obuf+i) ^= aa[aai++];
  for(int i = 0; i < n; i++){
    put_opaque_repeat(aa[aai++] & 0xff, 'x'); // nii_domain
    put_opaque_repeat(aa[aai++] & 0xff, 'y'); // nii_name
    put64(0); // nfstime4
    put32(0); // nfstime4
  }
  send_send();
  parse_reply(1);
}

void
send_create_session(int dosym)
{
  put_compound(1);
  put32(43); // CREATE_SESSION
  put64(clientid);
  put32(sequenceid++);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(3); // flags, 1=FLAG_PERSIST, 2=CONN_BACK_CHAN
  // csa_fore_chan_attrs, csa_back_chan_attrs
  for(int i = 0; i < 2; i++){
    put32(0); // headerpadsize
    put32(4096); // maxrequestsize
    put32(4096); // maxresponsesize
    put32(4096); // maxresponsesize_cached
    put32(8); // maxoperations
    put32(16); // maxrequests
    put32(0); // ca_rdma_ird<>
  }
  put32(0x40000000); // csa_cb_program
  put32(1); // length of csa_sec_parms
#if 0
  put32(0); // AUTH_NONE
#else
  put32(1); // flavor AUTH_SYS
  put32(0); // stamp
  put_opaque(9, "localhost");
  put32(65534); // uid
  put32(65534); // gid
  put32(0); // # gids
#endif
  send_send();
  parse_reply(1);
}

void
send_sequence()
{
  put_compound(1);
  put_sequence();
  send_send();
  parse_reply(1);
}

void
send_reclaim_complete()
{
  put_compound(2);
  put_sequence();
  put_reclaim_complete();
  send_send();
  parse_reply(1);
}

void
put_rootfh()
{
  put32(24);
}

void
put_open_existing(const char *filename, int share)
{
  put32(18);
  put32(0); // seqid
  put32(share); // share_access 1=READ 3=BOTH
  put32(0); // share_deny
  put64(clientid); // owner
  put_opaque(22, "Linux NFSv4.2 xyzzy"); // owner
  put32(0); // openhow OPEN4_NOCREATE
  put32(0); // CLAIM_NULL
  put_opaque(strlen(filename), filename);
}

void
put_open_create(char *name, int dosym)
{
  put32(18);
  put32(0); // seqid
  put32(3); // share_access BOTH
  put32(0); // share_deny
  put64(clientid); // owner
  put_opaque(22, "Linux NFSv4.2 xyzzy"); // owner
  put32(1); // openhow OPEN4_CREATE
  unsigned int mode = 0; // UNCHECKED4
  if(dosym)
    mode ^= aa[aai++];
  put32(mode);
  if(mode == 2 || mode == 3){
    put64(0); // verifier
  }
  if(mode != 2){
    if(dosym){
      //put_fattr4_one();
      put32(2);
      put64(aa[aai++]);
      put32(16*8);
      for(int i = 0; i < 16; i++)
        put64(aa[aai++]);
    } else {
      put32(2); // attr bitmap length
      put32(16); // attr bits
      put32(2); // attr bits
      put32(12); // attr len
      put32(0);
      put32(0);
      put32(420);
    }
  }
  unsigned int claim_type = 0; // CLAIM_NULL
  if(dosym)
    claim_type ^= aa[aai++];
  put32(claim_type);
  {
    char ff[256];
    sprintf(ff, "/tmp/%s", name);
    unlink(ff);
  }
  if(claim_type == 0 || claim_type == 3){
    put_opaque(strlen(name), name);
  } else if(claim_type == 1){ // CLAIM_PREVIOUS
    put32(aa[aai++]);
  } else if(claim_type == 2){ // CLAIM_DELEGATE_CUR
    // stateid4
    put32(stateid_seqid); // open_stateid, from previous OPEN
    for(int i = 0; i < 12; i++)
      obuf[oi++] = stateid_other[i];
    put_opaque(strlen(name), name);
  } else if(claim_type == 4){ // CLAIM_FH
  } else if(claim_type == 5){ // CUR_FH
    put32(stateid_seqid); // open_stateid, from previous OPEN
    for(int i = 0; i < 12; i++)
      obuf[oi++] = stateid_other[i];
  } else if(claim_type == 6){ // PREV_FH
  }
}

void
put_readdir(int dosym)
{
  put32(26);
  if(dosym && symstart == -1)
    symstart = oi;
  put64(0); // cookie
  put64(0); // cookieverf
  put32(512); // dircount (bytes)
  put32(512); // maxcount (bytes)
  // bitmap
  put32(4);
  put32(0x0018091a);
  put32(0x00b0a23a);
  put32(0);
  put32(0);
}

void
put_lookup(const char *name)
{
  put32(15);
  put_opaque(strlen(name), name);
}

//
// generates a fattr4 (bitmap4 then attrlist4).
//
void
put_fattr4(int xwords[], int fh)
{
  int words[3];
  for(int i = 0; i < 3; i++){
    words[i] = xwords[i];
  }
  int bitwords = 3;
  put32(bitwords);
  int word0i = oi;
  for(int i = 0; i < bitwords; i++)
    put32(words[i]);
  int leni = oi;
  put32(0); // placeholder for total length of attrs
  for(int a = 0; a < bitwords*32; a++){
    if(words[a/32] & (1 << (a % 32))){
      int xoi = oi;
      if(a == 0){
        put32(2); // # bitmap words of supported attrs
        put32(0xffffffff);
        put32(0xffffffff);
      } else if(a == 1){
        int type = 1;
        if(fh == 0 || fh == 1)
          type = 2;
        put32(type); // NF4DIR=2 or NF4REG=1
      } else if(a == 2){
        put32(0); // fh_expire_type
      } else if(a == 3){
        put64(0); // change
      } else if(a == 4){
        put64(4096*10); // size
      } else if(a == 5){
        put32(1); // link support
      } else if(a == 6){
        put32(1); // symlink support
      } else if(a == 8){
        put64(1); // fsid major
        put64(1); // fsid minor
      } else if(a == 10){
        put32(1); // lease time
      } else if(a == 11){
        put32(0); // rdattr_error
      } else if(a == 12){
        // ACL
        int n = 2;
        put32(n);
        for(int i = 0; i < n; i++){
          put32(0); // type
          put32(0); // flag
          put32(0); // mask
          char who[9];
          memset(who, 0, sizeof(who));
          //strcpy(who, "1");
          strcpy(who, "OWNER@");
          //strcpy(who, "GROUP@");
          put_opaque(strlen(who), who);
        }
      } else if(a == 13){
        put32(0xf); // aclsupport
      } else if(a == 19){
        // filehandle
        int xfh = fh;
        put_opaque(4, (char*)&xfh); // fh
      } else if(a == 20){
        put64(fh); // fileid
      } else if(a == 24){
        // fs_locations
        put32(1);
        put_opaque(10, "abcde12345"); // pathname4
        put32(1); // locations<>
        put_opaque(10, "abcde12345"); // server
        put32(1);
        put_opaque(10, "abcde12345"); // rootpath
      } else if(a == 27){
        put64(0xffffffffffff); // max file size
      } else if(a == 28){
        put32(0xffff); // max link
      } else if(a == 29){
        put32(256); // max name
      } else if(a == 30){
        put64(10*4096); // max read
      } else if(a == 31){
        put64(10*4096); // max write
      } else if(a == 33){
        put32(0777); // mode
      } else if(a == 35){
        put32(3); // numlinks
      } else if(a == 36){
        put_opaque(6, "other"); // owner
      } else if(a == 37){
        put_opaque(6, "other"); // owner_group
      } else if(a == 41){
        put32(1); // rawdev major
        put32(1); // rawdev minor
      } else if(a == 45){
        put64(4096*10); // space used
      } else if(a == 47){
        put64(0); // time access seconds
        put32(0); // nseconds
      } else if(a == 51){
        put64(0); // time delta seconds
        put32(0); // nseconds
      } else if(a == 52){
        put64(0); // time metadata seconds
        put32(0); // nseconds
      } else if(a == 53){
        put64(0); // time modify seconds
        put32(0); // nseconds
      } else if(a == 55){
        put64(0); // mounted_on_fileid ???
      } else if(a == 62){
        // fs_layout_types
        put32(1);
        put32(1); // LAYOUT4_NFSV4_1_FILES
      } else if(a == 75){
        // FATTR4_SUPPATTR_EXCLCREAT
        put32(2); // bitmap length
        put32(0xffffffff);
        put32(0xffffffff);
      } else {
        // unknown attr, delete from bitmap.
        words[a/32] &= ~(1 << (a % 32));
        *(int*)(obuf + word0i + 4*(a/32)) = htonl(words[a/32]);
      }
    }
  }
  for(int i = 0; i < 16; i++)
    put32(0xffffffff);
  *(int*)(obuf+leni) = htonl(oi - leni - 4);
}

void
put_fattr4_inner(int words[])
{
  int bitwords = 3;
  put32(bitwords);
  int word0i = oi;
  for(int i = 0; i < bitwords; i++)
    put32(words[i]);
  int leni = oi;
  put32(0); // placeholder for total length of attrs
  for(int a = 0; a < bitwords*32; a++){
    if(words[a/32] & (1 << (a % 32))){
      if(a == 0){
        int n = 3 ^ (aa[aai++] & 0xf);
        put32(n); // # bitmap words of supported attrs
        for(int i = 0; i < n; i++){
          put32(0xffffffff ^ aa[aai++]);
        }
      } else if(a == 1){
        put32(1 ^ aa[aai++]); // NF4DIR=2 or NF4REG=1
      } else if(a == 2){
        put32(aa[aai++]); // fh_expire_type
      } else if(a == 3){
        put64(aa[aai++]); // change
      } else if(a == 4){
        put64(103 ^ aa[aai++]); // size
      } else if(a == 5){
        put32(aa[aai++]); // link support
      } else if(a == 6){
        put32(aa[aai++]); // symlink support
      } else if(a == 8){
        put64(aa[aai++]); // fsid major
        put64(aa[aai++]); // fsid minor
      } else if(a == 10){
        put32(aa[aai++]); // lease time
      } else if(a == 11){
        put32(aa[aai++]); // rdattr_error
      } else if(a == 12){
        // ACL
        int n = 1 ^ aa[aai++];
        put32(n);
        for(int i = 0; i < n && i < 2; i++){
          put32(aa[aai++]); // type
          put32(aa[aai++]); // flag
          put32(aa[aai++]); // mask
          char who[9];
          memset(who, 0, sizeof(who));
          // strcpy(who, "65534");
          strcpy(who, "OWNER@");
          *(long long*)who ^= aa[aai++];
          put_opaque(strlen(who), who);
        }
      } else if(a == 13){
        put32(0xf ^ aa[aai++]); // aclsupport
      } else if(a == 19){
        // filehandle
        int n = aa[aai++] & 0xff;
        put_opaque_repeat(n, 'x');
      } else if(a == 20){
        put64(aa[aai++] & 0x3); // fileid
      } else if(a == 24){
        // fs_locations
        put_opaque(10, "abcde12345"); // pathname4
        int n = aa[aai++] & 0x1f;
        put32(n); // locations<>
        for(int i = 0; i < n; i++){
          put_opaque_repeat(aa[aai++] & 0x1ff, 'x'); // server
          put_opaque_repeat(aa[aai++] & 0x1ff, 'y'); // rootpath
        }
      } else if(a == 27){
        put64(aa[aai++]); // max file size
      } else if(a == 28){
        put32(aa[aai++]); // max link
      } else if(a == 29){
        put32(aa[aai++]); // max name
      } else if(a == 30){
        put64(aa[aai++]); // max read
      } else if(a == 31){
        put64(aa[aai++]); // max write
      } else if(a == 33){
        put32(aa[aai++]); // mode
      } else if(a == 35){
        put32(aa[aai++]); // numlinks
      } else if(a == 36){
        put_opaque_repeat(aa[aai++] & 0x1ff, 'z'); // owner
      } else if(a == 37){
        put_opaque_repeat(aa[aai++] & 0x1ff, 'z'); // owner_group
      } else if(a == 41){
        put32(aa[aai++]); // rawdev major
        put32(aa[aai++]); // rawdev minor
      } else if(a == 45){
        put64(aa[aai++]); // space used
      } else if(a == 47){
        put64(0); // time access seconds
        put32(0); // nseconds
      } else if(a == 51){
        put64(aa[aai++]); // time delta seconds
        put32(aa[aai++]); // nseconds
      } else if(a == 52){
        put64(0); // time metadata seconds
        put32(0); // nseconds
      } else if(a == 53){
        put64(0); // time modify seconds
        put32(0); // nseconds
      } else if(a == 55){
        put64(aa[aai++]); // mounted_on_fileid ???
      } else if(a == 62){
        // fs_layout_types
        put32(aa[aai++]);
        put32(aa[aai++]); // LAYOUT4_NFSV4_1_FILES
      } else if(a == 75){
        // FATTR4_SUPPATTR_EXCLCREAT
        int n = aa[aai++] & 0xf;
        put32(n); // # bitmap words of supported attrs
        for(int i = 0; i < n; i++){
          put32(aa[aai++]);
        }
      } else {
        // unknown attr, delete from bitmap.
        words[a/32] &= ~(1 << (a % 32));
        *(int*)(obuf + word0i + 4*(a/32)) = htonl(words[a/32]);
      }
    }
  }
  for(int i = 0; i < 16; i++)
    put32(0xffffffff);
  *(int*)(obuf+leni) = htonl(oi - leni - 4);
}

//
// generate a symbolic fattr4, with multiple elements.
// tries to avoid generating illegal XDR.
//
void
put_fattr4_many()
{
  int bitwords = 3;
  int words[4];
  memset(words, 0, sizeof(words));
  int setme[] = { -1 };
  for(int i = 0; setme[i] >= 0; i++){
    int a = setme[i];
    words[a/32] |= 1 << (a % 32);
  }
  for(int i = 0; i < bitwords; i++){
    words[i] ^= aa[aai++];
  }
  put_fattr4_inner(words);
}

//
// a symbolic fattr4 with just one item set.
//
void
put_fattr4_one()
{
  int bitwords = 3;
  int words[4];
  memset(words, 0, sizeof(words));
  unsigned int bit = aa[aai++];
  if(bit >= 3*32)
    bit = 4;
  words[bit/32] |= 1 << (bit % 32);
  put_fattr4_inner(words);
}

void
put_setattr()
{
  put32(34);
  put32(stateid_seqid); // open_stateid, from previous OPEN
  for(int i = 0; i < 12; i++)
    obuf[oi++] = stateid_other[i];
  put_fattr4_one();
}

void
send_open_existing(const char *filename, int sharing)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_open_existing(filename, sharing);
  send_send();
  parse_reply(1);
}

void
send_open_create(char *name, int dosym)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_open_create(name, dosym);
  send_send();
  parse_reply(1);
}

void
put_create()
{
  put32(6);
  int type = 5; // NFS4LNK
  type ^= aa[aai++];
  put32(type);
  if(type == 5){
    put_opaque(16, "abcdefgh12345678");
  } else if(type == 4 || type == 3){ // CHR, BLK
    put32(aa[aai++]); // major
    put32(aa[aai++]); // minor
  }
  unlink("/tmp/newlink");
  put_opaque(7, "newlink");
  put_fattr4_one();
  if(0){
    put32(3); // fattr4 bitmap size
    put32(0);
    put32(0);
    put32(0);
    put32(0); // opaque fattr4 size
  }
}

void
send_create()
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_create();
  send_send();
  parse_reply(1);
}

void
send_setattr(char *name, int dosym)
{
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_lookup(name);
  put_setattr();
  send_send();
  parse_reply(1);
}

void
put_set_acl(int dosym)
{
  put32(34);
  put32(stateid_seqid); // open_stateid, from previous OPEN
  for(int i = 0; i < 12; i++)
    obuf[oi++] = stateid_other[i];
  int words[3];
  words[0] = words[1] = words[2] = 0;
  words[0] |= (1 << 12); // acl
  if(dosym && symstart == -1)
    symstart = oi + 4*4; // skip over FATTR4 bitmap
  put_fattr4(words, 1);
}

void
put_getattr()
{
  put32(9);
  put32(3);
  put32(1 << 12);
  put32(0);
  put32(0);
}

void
send_set_acl(char *name, int dosym)
{
  put_compound(6);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_lookup(name);
  put_set_acl(dosym);
  put_getattr();
  send_send();
  parse_reply(1);
}

void
put_putfh(int dosym)
{
  put32(22);
  if(dosym && symstart == -1)
    symstart = oi;
  put_opaque(28, tmp_fh);
}

void
send_putfh()
{
  put_compound(2);
  put_sequence();
  put_putfh(0);
  send_send();
  parse_reply(1);
}

void
put_getfh()
{
  put32(10);
}

void
send_lookup()
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_getfh();
  send_send();
  parse_reply(1);
}

void
send_readdir(int dosym)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_readdir(dosym);
  send_send();
  parse_reply(1);
}

void
put_listxattrs()
{
  put32(74);
  put64(0); // cookie
  put32(16384); // maxcount
}

void
send_listxattrs()
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_listxattrs();
  send_send();
  parse_reply(1);
}

void
put_unlock(int dosym)
{
  put32(14);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(1); // READ_LT
  put32(0); // open_seqid
  put32(stateid_seqid); // open_stateid, from previous OPEN
  for(int i = 0; i < 12; i++)
    obuf[oi++] = stateid_other[i];
  put64(0); // offset
  put64(1); // length
}

void
send_unlock(int dosym)
{
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_lookup("lockfile");
  put_unlock(dosym);
  send_send();
  parse_reply(1);
}

void
put_lock(int dosym)
{
  put32(12);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(1); // READ_LT
  put32(0); // reclaim
  put64(0); // offset
  put64(2); // length
  put32(1); // new_lock_owner
  put32(0); // open_seqid
  put32(stateid_seqid); // open_stateid, from previous OPEN
  for(int i = 0; i < 12; i++)
    obuf[oi++] = stateid_other[i];
  put32(0); // lock_seqid
  put64(clientid); // clientid
  put_opaque(19, "Linux NFSv4.2 xyzzy"); // owner
}

void
send_lock(int dosym)
{
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_lookup("lockfile");
  put_lock(dosym);
  send_send();
  parse_reply(1);
}

void
put_dir_delegation(int dosym)
{
  put32(46);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(0); // signal_deleg_avail
  put32(3); // notification_types bitmap length
  put32(0xffffffff);
  put32(0xffffffff);
  put32(0xffffffff);
  put64(0); // child_attr_delay
  put32(0); // child_attr_delay
  put64(0); // dir_attr_delay
  put32(0); // dir_attr_delay
  put32(3); // child_attributes bitmap length
  put32(0xffffffff);
  put32(0xffffffff);
  put32(0xffffffff);
  put32(3); // dir_attributes bitmap length
  put32(0xffffffff);
  put32(0xffffffff);
  put32(0xffffffff);
}

void
send_dir_delegation(int dosym)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_dir_delegation(dosym);
  send_send();
  parse_reply(1);
}

void
put_verify(int dosym)
{
  put32(37);
  put32(2);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(0);
  put32(0);
  put32(1024);
  for(int i = 0; i < 1024/4; i++)
    put32(0xffffffff);
}

void
send_verify()
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_verify(1);
  send_send();
  parse_reply(1);
}

void
put_setxattr(int dosym)
{
  put32(73);
  if(dosym == 2){
    put32(0); // SETXATTR4_EITHER
    unsigned int klen = aa[aai++] & 0xfff;
    unsigned int vlen = aa[aai++] & 0xfff;
    for(int i = oi; i+8 <= sizeof(obuf) && aai < NAA; i += 8)
      *(unsigned long long *)(obuf + i) = 0x4444444444444444ll ^ aa[aai++];
    if(klen < 1) klen = 1;
    put32(klen);
    oi += klen;
    while((oi % 4) != 0) oi++;
    put32(vlen);
    oi += vlen;
    while((oi % 4) != 0) oi++;
  } else if(dosym == 3) {
    int xoi = oi;
    put32(0); // SETXATTR4_EITHER
    put32(24); // klen
    for(int i = 0; i < 3; i++){
      *(long long *)(obuf + oi) = 0x4444444444444444ll ^ aa[aai++];
      oi += 8;
    }
    put32(24); // vlen
    for(int i = 0; i < 3; i++){
      *(long long *)(obuf + oi) = 0x4444444444444444ll ^ aa[aai++];
      oi += 8;
    }
  } else {
    int xoi = oi;
    put32(0); // SETXATTR4_EITHER
    put_opaque(24, "12345678abcdefgh12345678"); // key
    put_opaque(16, "abcdefgh12345678"); // value
    if(dosym)
      for(int i = xoi; i+8 <= oi; i += 8)
        *(long long *)(obuf+i) ^= aa[aai++];
  }
}

void
send_setxattr(int dosym)
{
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  sys("echo hi > /tmp/xfile ; chown nobody /tmp/xfile");
  put_lookup("xfile");
  put_setxattr(dosym);
  send_send();
  parse_reply(1);
}

void
put_remove(char *name)
{
  put32(28);
  put_opaque(strlen(name), name);
}

void
send_remove(char *name)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_remove(name);
  send_send();
  parse_reply(1);
}

void
put_junk(int op, int words)
{
  put32(op);
  if(symstart != -1)
    symstart = oi;
  for(int i = 0; i < words; i++)
    put32(0);
}

void
send_junk(int op, int words)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_junk(op, words);
  send_send();
  parse_reply(1);
}

void
put_layoutget(int dosym)
{
  put32(50);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(1); // signal_layout_avail
  put32(4); // layout_type, FLEXFILE
  put32(3); // iomode, ANY
  put64(0); // offset
  put64(8); // length
  put64(8); // minlength
  put32(1); // stateid seq -- special current stateid
  for(int i = 0; i < 12; i++)
    obuf[oi++] = 0;
  put32(4096); // maxcount
}

void
send_layoutget(int dosym)
{
  sys("echo 1234567890123456 > /tmp/out");
  sys("chown nobody /tmp/out");
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_open_existing("out", 3);
  put_layoutget(dosym);
  send_send();
  parse_reply(1);
}

void
put_layoutreturn(int dosym)
{
  put32(51);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(1); // reclaim
  put32(4); // layout_type, FLEXFILE
  put32(3); // iomode, ANY
  put32(1); // returntype, FILE
  put64(0); // offset
  put64(8); // length
  put32(1); // stateid seq -- special current stateid
  for(int i = 0; i < 12; i++)
    obuf[oi++] = 0;
  put_opaque_repeat(64, 'x'); // lrf_body<>
}

void
send_layoutreturn(int dosym)
{
  sys("echo 1234567890123456 > /tmp/out");
  sys("chown nobody /tmp/out");
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_open_existing("out", 3);
  put_layoutreturn(dosym);
  send_send();
  parse_reply(1);
}

void
put_secinfo_no_name(int dosym)
{
  put32(52);
  put32(1); // 0=CURRENT_FH, 1=PARENT
}

void
send_secinfo_no_name(int dosym)
{
  sys("echo 1234567890123456 > /tmp/out");
  sys("chown nobody /tmp/out");
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_open_existing("out", 3);
  put_secinfo_no_name(dosym);
  send_send();
  parse_reply(1);
}

void
put_get_dir_delegation(int dosym)
{
  put32(46);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(1); // signal_deleg_avail
  put32(3); // notification_types bitmap length
  put32(0xffffffff);
  put32(0xffffffff);
  put32(0xffffffff);
  put64(0); // child_attr_delay
  put64(0);
  put64(0); // dir_attr_delay
  put64(0);
  put32(3); // child_attributes bitmap length
  put32(0xffffffff);
  put32(0xffffffff);
  put32(0xffffffff);
  put32(3); // dir_attributes bitmap length
  put32(0xffffffff);
  put32(0xffffffff);
  put32(0xffffffff);
}

void
send_get_dir_delegation(int dosym)
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  put_get_dir_delegation(dosym);
  send_send();
  parse_reply(1);
}

void
send_blah()
{
  put_compound(4);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  symstart = oi;
  for(int i = 0; i < 32; i++)
    put32(0xffffffff);
  send_send();
  parse_reply(1);
}

void
put_read(int dosym)
{
  put32(25);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(stateid_seqid); // open_stateid, from previous OPEN
  for(int i = 0; i < 12; i++)
    obuf[oi++] = stateid_other[i];
  put64(0); // offset
  put32(512); // count
}

void
send_read(int dosym)
{
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  sys("rm -f /tmp/foof; echo hello > /tmp/foof; chown nobody /tmp/foof");
  put_open_existing("foof", 1);
  put_read(dosym);
  send_send();
  parse_reply(1);
}

void
put_write(int dosym)
{
  put32(38);
  if(dosym && symstart == -1)
    symstart = oi;
  put32(stateid_seqid); // open_stateid, from previous OPEN
  for(int i = 0; i < 12; i++)
    obuf[oi++] = stateid_other[i];
  put64(-9999); // offset
  put32(0); // stable_how
  put_opaque(6, "hello\n"); // data
}

void
send_write(int dosym)
{
  put_compound(5);
  put_sequence();
  put_rootfh();
  put_lookup("tmp");
  unlink("/tmp/newnew");
  put_open_create("newnew", 0);
  put_write(dosym);
  send_send();
  parse_reply(1);
}

int
main(){
  setlinebuf(stdout);
  struct rlimit r;
  r.rlim_cur = r.rlim_max = 0;
  setrlimit(RLIMIT_CORE, &r);

  // /etc/exports
  // /tmp 127.0.0.1(rw,subtree_check)

  sys("/etc/init.d/rpcbind start");
  sys("/usr/sbin/rpc.idmapd");
  sys("mount -t nfsd nfsd /proc/fs/nfsd");
  sleep(2);
  sys("/usr/sbin/rpc.nfsd --lease-time 10 --grace-time 10 1");
  sleep(2);
  sys("/usr/sbin/rpc.mountd --manage-gids");
  sleep(2);
  sys("exportfs -au");
  sys("exportfs -f");
  sys("exportfs -r");
  //sys("exportfs -v");
  // sys("rpcdebug -m nfsd -s all");
  // sys("rpcdebug -m nfs -s all");
  // sys("rpcdebug -m rpc -s all");
  //sys("cat /proc/fs/nfsd/exports");

  s = socket(AF_INET, SOCK_STREAM, 0);
  int yes = 1;
  if(setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0)
    perror("SO_REUSEADDR");
  struct sockaddr_in sin;
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = inet_addr("127.0.0.1");
  for(int i = 100; i < 1024; i++){
    sin.sin_port = htons(i);
    if(bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0){
      printf("bound to port %d\n", i);
      break;
    }
  }
  sin.sin_port = htons(2049);

  sync();
  sleep(11); // grace period

  if(connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
    perror("connect");
    exit(1);
  }

  int pid = fork();
  if(pid == 0){
    send_nop();
    
    // send_exchange_id_sym();
    send_exchange_id(0);
    
    send_create_session(0);

    send_reclaim_complete();
    
    // server may miss the first time, yielding 10008 NFS4ERR_DELAY.
    // so trigger the upcall to rpc.mountd and wait a bit.
    send_lookup();
    setpriority(PRIO_PROCESS, 0, 15);
    sleep(2);

    send_blah();
    // send_write(0);
    //send_read(0);

    // send_secinfo_no_name(0);

    // send_layoutget(0);
    // send_layoutreturn(0);

    //send_verify();

    //send_junk(51, 20); // 51 is LAYOUTRETURN

    //send_listxattrs();

    //send_putfh();
    
    //send_open_read();
    
    // send_open_create("newfile", 0);
    // send_remove("newfile");
    
    //send_readdir(0);
    //send_readdir(1);

    //sys("rm -f /tmp/frobozz ; touch /tmp/frobozz ; chown nobody /tmp/frobozz");
    //send_open_existing("frobozz", 3); // fetch stateid, for lock
    // send_setattr(1);
    //send_set_acl(1);

    //send_create();

    //sys("echo xxxxxxxxxx > /tmp/lockfile ; chmod ogu+rwx /tmp/lockfile");
    //sys("chown nobody /tmp/lockfile");
    //send_open_existing("lockfile", 3); // fetch stateid, for lock
    //send_lock(0);
    //send_lock(1);
    //send_unlock(1);

    //send_dir_delegation(1);

    // send_setxattr(2);

    sleep(2);
    close(s);
    sleep(1);

    exit(0);
  }
  close(s);

  for(int i = 0; i < 60; i++){
    sleep(1);
    int st;
    int ret = waitpid(pid, &st, WNOHANG);
    if(ret == pid)
      break;
  }

}
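
The fattr4 encoders above share one pattern: emit the attribute bitmap, reserve a length word, emit the body, clear any bit that had no encoder, then go back and fill in the length. Below is a minimal standalone sketch of that pattern, not the program's own code. The names buf, pos, emit32, patch32, peek32 and encode_attrs are hypothetical stand-ins for obuf, oi and put32, and only attributes 1 (type) and 4 (size) are treated as known.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* Hypothetical stand-ins for the program's obuf/oi globals. */
static unsigned char buf[256];
static size_t pos;

/* Append one 32-bit word in network byte order. */
static void emit32(uint32_t v)
{
  uint32_t be = htonl(v);
  assert(pos + 4 <= sizeof(buf));
  memcpy(buf + pos, &be, 4);
  pos += 4;
}

/* Overwrite a previously emitted 32-bit word at byte offset at. */
static void patch32(size_t at, uint32_t v)
{
  uint32_t be = htonl(v);
  assert(at + 4 <= pos);
  memcpy(buf + at, &be, 4);
}

/* Read back a 32-bit word in host byte order, for checking the result. */
static uint32_t peek32(size_t at)
{
  uint32_t be;
  memcpy(&be, buf + at, 4);
  return ntohl(be);
}

/*
 * Encode a one-word attribute bitmap followed by a length-prefixed body.
 * Bits without an encoder are cleared from the already emitted bitmap.
 * Returns the byte offset of the length word.
 */
static size_t encode_attrs(uint32_t word)
{
  emit32(1);                       /* number of bitmap words */
  size_t word_at = pos;
  emit32(word);
  size_t len_at = pos;
  emit32(0);                       /* placeholder for body length */
  for (int a = 0; a < 32; a++) {
    if (!(word & (UINT32_C(1) << a)))
      continue;
    if (a == 1) {
      emit32(1);                   /* type, 4 bytes */
    } else if (a == 4) {
      emit32(0);                   /* size, high word */
      emit32(103);                 /* size, low word */
    } else {
      word &= ~(UINT32_C(1) << a); /* unknown attr: drop from bitmap */
      patch32(word_at, word);
    }
  }
  patch32(len_at, (uint32_t)(pos - len_at - 4));
  return len_at;
}
```

The sketch copies words with memcpy() rather than storing through a cast pointer, which avoids unaligned stores, and it shifts an unsigned 1 so that bit 31 is well defined.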

^ permalink raw reply

* Re: [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it.
From: Michal Koutný @ 2022-01-05 14:59 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: cgroups, linux-mm, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Thomas Gleixner, Waiman Long,
	Peter Zijlstra
In-Reply-To: <20211222114111.2206248-1-bigeasy@linutronix.de>

On Wed, Dec 22, 2021 at 12:41:08PM +0100, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> - lockdep complaints were triggered by test_core and test_freezer (both
>   had to run):

This doesn't happen on the patched kernel, correct?

Thanks,
Michal


^ permalink raw reply

* [PATCH 1/4] drm/i915: don't call free_mmap_offset when purging
From: Matthew Auld @ 2022-01-05 14:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: Thomas Hellström, dri-devel

The TTM backend is in theory the only user here (also, purge should only
be called once we have dropped the pages), where it is set up at object
creation and is only removed once the object is destroyed. Also,
resetting the node here might be iffy since the ttm fault handler
uses the stored fake offset to determine the page offset within the pages
array.
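
To illustrate the concern about the fake offset, here is a small user-space model of how a fault handler can turn a faulting address into an index in the pages array. It is a sketch based on my reading of the TTM fault path, not the driver code. struct model_vma, MODEL_PAGE_SHIFT and model_page_offset are hypothetical names, and a reset node is modeled as a start of 0, which is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Hypothetical, simplified stand-in for the VMA fields used here. */
struct model_vma {
  uint64_t vm_start;  /* first user address of the mapping */
  uint64_t vm_pgoff;  /* fake offset passed to mmap(), in pages */
};

/*
 * Index of the faulting page within the object's pages array.
 * node_start models the fake offset (in pages) stored in the vma_node.
 */
static uint64_t
model_page_offset(const struct model_vma *vma, uint64_t addr,
                  uint64_t node_start)
{
  assert(addr >= vma->vm_start);
  return ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT)
         + vma->vm_pgoff - node_start;
}
```

With the stored offset intact, a fault three pages into the mapping yields index 3. If the node has been reset while the mapping still carries the old vm_pgoff, the same fault yields an index far outside the array.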

This also blows up in the dontneed-before-mmap test, since the
expectation is that the vma_node will live on, until the object is
destroyed:

<2> [749.062902] kernel BUG at drivers/gpu/drm/i915/gem/i915_gem_ttm.c:943!
<4> [749.062923] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4> [749.062928] CPU: 0 PID: 1643 Comm: gem_madvise Tainted: G     U  W         5.16.0-rc8-CI-CI_DRM_11046+ #1
<4> [749.062933] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390 Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
<4> [749.062937] RIP: 0010:i915_ttm_mmap_offset.cold.35+0x5b/0x5d [i915]
<4> [749.063044] Code: 00 48 c7 c2 a0 23 4e a0 48 c7 c7 26 df 4a a0 e8 95 1d d0 e0 bf 01 00 00 00 e8 8b ec cf e0 31 f6 bf 09 00 00 00 e8 5f 30 c0 e0 <0f> 0b 48 c7 c1 24 4b 56 a0 ba 5b 03 00 00 48 c7 c6 c0 23 4e a0 48
<4> [749.063052] RSP: 0018:ffffc90002ab7d38 EFLAGS: 00010246
<4> [749.063056] RAX: 0000000000000240 RBX: ffff88811f2e61c0 RCX: 0000000000000006
<4> [749.063060] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
<4> [749.063063] RBP: ffffc90002ab7e58 R08: 0000000000000001 R09: 0000000000000001
<4> [749.063067] R10: 000000000123d0f8 R11: ffffc90002ab7b20 R12: ffff888112a1a000
<4> [749.063071] R13: 0000000000000004 R14: ffff88811f2e61c0 R15: ffff888112a1a000
<4> [749.063074] FS:  00007f6e5fcad500(0000) GS:ffff8884ad600000(0000) knlGS:0000000000000000
<4> [749.063078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [749.063081] CR2: 00007efd264e39f0 CR3: 0000000115fd6005 CR4: 00000000003706f0
<4> [749.063085] Call Trace:
<4> [749.063087]  <TASK>
<4> [749.063089]  __assign_mmap_offset+0x41/0x300 [i915]
<4> [749.063171]  __assign_mmap_offset_handle+0x159/0x270 [i915]
<4> [749.063248]  ? i915_gem_dumb_mmap_offset+0x70/0x70 [i915]
<4> [749.063325]  drm_ioctl_kernel+0xae/0x140
<4> [749.063330]  drm_ioctl+0x201/0x3d0
<4> [749.063333]  ? i915_gem_dumb_mmap_offset+0x70/0x70 [i915]
<4> [749.063409]  ? do_user_addr_fault+0x200/0x670
<4> [749.063415]  __x64_sys_ioctl+0x6d/0xa0
<4> [749.063419]  do_syscall_64+0x3a/0xb0
<4> [749.063423]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4> [749.063428] RIP: 0033:0x7f6e5f100317

Testcase: igt@gem_madvise@dontneed-before-mmap
Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 89b70f5cde7a..9f429ed6e78a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -161,7 +161,6 @@ int i915_gem_object_pin_pages_unlocked(struct drm_i915_gem_object *obj)
 /* Immediately discard the backing storage */
 int i915_gem_object_truncate(struct drm_i915_gem_object *obj)
 {
-	drm_gem_free_mmap_offset(&obj->base);
 	if (obj->ops->truncate)
 		return obj->ops->truncate(obj);
 
-- 
2.31.1


^ permalink raw reply related

* [Intel-gfx] [PATCH 2/4] drm/i915/ttm: only fault WILLNEED objects
From: Matthew Auld @ 2022-01-05 14:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: Thomas Hellström, dri-devel
In-Reply-To: <20220105145835.142950-1-matthew.auld@intel.com>

Don't attempt to fault and re-populate purged objects. By some fluke
this passes the dontneed-after-mmap IGT, but for the wrong reasons.

Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 923cc7ad8d70..8d61d4538a64 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -883,6 +883,11 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
 	if (ret)
 		return ret;
 
+	if (obj->mm.madv != I915_MADV_WILLNEED) {
+		dma_resv_unlock(bo->base.resv);
+		return VM_FAULT_SIGBUS;
+	}
+
 	if (drm_dev_enter(dev, &idx)) {
 		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
 					       TTM_BO_VM_NUM_PREFAULT);
-- 
2.31.1


^ permalink raw reply related

* [Intel-gfx] [PATCH 3/4] drm/i915/ttm: ensure we unmap when purging
From: Matthew Auld @ 2022-01-05 14:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: Thomas Hellström, dri-devel
In-Reply-To: <20220105145835.142950-1-matthew.auld@intel.com>

Purging can happen during swapping out, or directly invoked with the
madvise ioctl. In such cases this doesn't involve a ttm move, which
skips unmapping the object.

Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 8d61d4538a64..f148e7e48f86 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -399,6 +399,8 @@ int i915_ttm_purge(struct drm_i915_gem_object *obj)
 	if (obj->mm.madv == __I915_MADV_PURGED)
 		return 0;
 
+	ttm_bo_unmap_virtual(bo);
+
 	ret = ttm_bo_validate(bo, &place, &ctx);
 	if (ret)
 		return ret;
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH] ethtool: use phydev variable
From: Andrew Lunn @ 2022-01-05 15:00 UTC (permalink / raw)
  To: trix
  Cc: davem, kuba, leon, arnd, danieller, gustavoars, hkallweit1,
	netdev, linux-kernel
In-Reply-To: <20220105141020.3793409-1-trix@redhat.com>

On Wed, Jan 05, 2022 at 06:10:20AM -0800, trix@redhat.com wrote:
> From: Tom Rix <trix@redhat.com>
> 
> In ethtool_get_phy_stats(), the phydev variable is set to
> dev->phydev but dev->phydev is still used.  Replace
> dev->phydev uses with phydev.
> 
> Signed-off-by: Tom Rix <trix@redhat.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH 4/4] drm/i915/ttm: ensure we unmap when shrinking
From: Matthew Auld @ 2022-01-05 14:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: Thomas Hellström, dri-devel
In-Reply-To: <20220105145835.142950-1-matthew.auld@intel.com>

If we don't purge the pages but instead swap them out, then we
need to ensure we also unmap the object.

Fixes: 7ae034590cea ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index f148e7e48f86..adbbd57bb9bf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -462,6 +462,8 @@ static int i915_ttm_shrinker_release_pages(struct drm_i915_gem_object *obj,
 	if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED)
 		return 0;
 
+	ttm_bo_unmap_virtual(bo);
+
 	bo->ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
 	ret = ttm_bo_validate(bo, &place, &ctx);
 	if (ret) {
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH] linux-user/syscall.c: fix missed flag for shared memory in open_self_maps
From: Alex Bennée @ 2022-01-05 14:54 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel
In-Reply-To: <18882253-9e57-0654-1eb2-870a451a50ce@vivier.eu>


Laurent Vivier <laurent@vivier.eu> writes:

> Le 27/12/2021 à 13:50, Andrey Kazmin a écrit :
>> The possible variants for region type in /proc/self/maps are either
>> private "p" or shared "s". In the current implementation,
>> we mark shared regions as "-". It could break memory mapping parsers
>> such as those included in the ASan/HWASan sanitizers.
>>
>> Signed-off-by: Andrey Kazmin <a.kazmin@partner.samsung.com>

Acked-by: Alex Bennée <alex.bennee@linaro.org>
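
For reference, the fourth character of the perms field is the one at
issue here. A minimal sketch of how such a field can be built, assuming
the mapping is described by plain mmap() prot/flags values (the helper
names format_maps_perms() and perms_equal() are made up for this
illustration and are not QEMU or kernel functions):

```c
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Illustrative only: build the four-character perms field of a
 * /proc/self/maps line ("r-xp", "rw-s", ...) from mmap-style values.
 * The last character is 's' for a shared mapping and 'p' for a private
 * one; it is never '-'.
 */
void format_maps_perms(int prot, int flags, char out[5])
{
    out[0] = (prot & PROT_READ)  ? 'r' : '-';
    out[1] = (prot & PROT_WRITE) ? 'w' : '-';
    out[2] = (prot & PROT_EXEC)  ? 'x' : '-';
    out[3] = (flags & MAP_SHARED) ? 's' : 'p';
    out[4] = '\0';
}

/* Convenience wrapper: returns 1 if the formatted field equals expected. */
int perms_equal(int prot, int flags, const char *expected)
{
    char buf[5];

    format_maps_perms(prot, flags, buf);
    return strcmp(buf, expected) == 0;
}
```

A parser that only accepts 'p' or 's' in that position would reject a
line carrying '-' there, which is the breakage described above.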

-- 
Alex Bennée


^ permalink raw reply

* Re: [PATCH 03/13] kprobe: Add support to register multiple ftrace kprobes
From: Masami Hiramatsu @ 2022-01-05 15:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev, bpf,
	lkml, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Steven Rostedt, Naveen N. Rao, Anil S Keshavamurthy,
	David S. Miller
In-Reply-To: <20220104080943.113249-4-jolsa@kernel.org>

On Tue,  4 Jan 2022 09:09:33 +0100
Jiri Olsa <jolsa@redhat.com> wrote:

> Adding support to register kprobe on multiple addresses within
> single kprobe object instance.
> 
> It uses the CONFIG_KPROBES_ON_FTRACE feature (so it's only
> available for function entry addresses) and separated ftrace_ops
> object placed directly in the kprobe object.
> 
> There's new CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE config option,
> enabled for archs that support multi kprobes.
> 
> It registers all the provided addresses in the ftrace_ops object
> filter and adds separate ftrace_ops callback.
> 
> To register multi kprobe user provides array of addresses or
> symbols with their count, like:
> 
>   struct kprobe kp = {};
> 
>   kp.multi.symbols = (const char **) symbols;
>   kp.multi.cnt = cnt;
>   ...
> 
>   err = register_kprobe(&kp);

I would like to keep the kprobe itself as simple as possible, which
also should provide a consistent probe handling model.

I understand that you are concerned about the overhead of having
multiple probes, but as far as I can see, this implementation design
does not smell good, sorry.
The worst point is that the multi kprobe only supports function
entry (and is only available if FTRACE is enabled). Then, that is
FTRACE, not kprobes.

Yes, kprobes supports probing on FTRACE by using FTRACE, but that does
not mean kprobes wraps FTRACE. That is just for "avoidance" of the
limitation (because we can not put a breakpoint on self-modified code.)

If you need a probe which supports multiple addresses but only
on function entry, that should be something like 'fprobe', not
kprobes.
IMHO, that should be implemented as similar but different APIs
because those are simply different.

So, can't we use ftrace directly from bpf? I don't think there is
any reason for bpf to stick to the kprobes APIs.
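
To make the "similar but different API" idea concrete, here is a rough
user-space sketch of what a function-entry-only, multi-address probe
object could look like. Everything in it (struct demo_fprobe,
demo_fprobe_dispatch(), the addresses) is hypothetical and only models
the shape of such an interface; it is not an existing kernel API and
does not touch ftrace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry handler: gets the hit address and private data. */
typedef void (*demo_entry_handler)(unsigned long ip, void *data);

/* Hypothetical probe object: one handler, many function-entry addresses. */
struct demo_fprobe {
    const unsigned long *addrs;   /* function entry addresses */
    size_t cnt;                   /* number of entries in addrs */
    demo_entry_handler handler;   /* called on every matching entry */
    void *data;                   /* passed through to the handler */
    unsigned long nhit;           /* how many times the probe fired */
};

/*
 * Simulated function-entry hook. Returns 1 if ip belongs to the probe
 * (and the handler ran), 0 otherwise. There is no per-address probe
 * object and no breakpoint handling, only a filter and a callback.
 */
int demo_fprobe_dispatch(struct demo_fprobe *fp, unsigned long ip)
{
    for (size_t i = 0; i < fp->cnt; i++) {
        if (fp->addrs[i] == ip) {
            fp->nhit++;
            if (fp->handler)
                fp->handler(ip, fp->data);
            return 1;
        }
    }
    return 0;
}

/* Handler used by the self test: remember the last address seen. */
static void demo_record_ip(unsigned long ip, void *data)
{
    *(unsigned long *)data = ip;
}

/* Fires the hook three times (two matches, one miss); returns the hits. */
unsigned long demo_fprobe_selftest(void)
{
    static const unsigned long addrs[] = { 0x1000, 0x2000, 0x3000 };
    unsigned long last = 0;
    struct demo_fprobe fp = {
        .addrs = addrs, .cnt = 3,
        .handler = demo_record_ip, .data = &last,
    };

    demo_fprobe_dispatch(&fp, 0x1000);   /* match */
    demo_fprobe_dispatch(&fp, 0x1800);   /* not a probed entry */
    demo_fprobe_dispatch(&fp, 0x3000);   /* match */
    return (last == 0x3000) ? fp.nhit : 0;
}
```

The point of the sketch is only that such an object needs an address
filter and an entry callback, which is a different model from a kprobe
tied to a single instruction address.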

Thank you,

> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/Kconfig                     |   3 +
>  arch/x86/Kconfig                 |   1 +
>  arch/x86/kernel/kprobes/ftrace.c |  48 ++++++--
>  include/linux/kprobes.h          |  25 ++++
>  kernel/kprobes.c                 | 204 +++++++++++++++++++++++++------
>  5 files changed, 235 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c..0131636e1ef8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -191,6 +191,9 @@ config HAVE_OPTPROBES
>  config HAVE_KPROBES_ON_FTRACE
>  	bool
>  
> +config HAVE_KPROBES_MULTI_ON_FTRACE
> +	bool
> +
>  config ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
>  	bool
>  	help
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5c2ccb85f2ef..0c870238016a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -217,6 +217,7 @@ config X86
>  	select HAVE_KERNEL_ZSTD
>  	select HAVE_KPROBES
>  	select HAVE_KPROBES_ON_FTRACE
> +	select HAVE_KPROBES_MULTI_ON_FTRACE
>  	select HAVE_FUNCTION_ERROR_INJECTION
>  	select HAVE_KRETPROBES
>  	select HAVE_KVM
> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> index dd2ec14adb77..ac4d256b89c6 100644
> --- a/arch/x86/kernel/kprobes/ftrace.c
> +++ b/arch/x86/kernel/kprobes/ftrace.c
> @@ -12,22 +12,14 @@
>  
>  #include "common.h"
>  
> -/* Ftrace callback handler for kprobes -- called under preempt disabled */
> -void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> -			   struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +static void ftrace_handler(struct kprobe *p, unsigned long ip,
> +			   struct ftrace_regs *fregs)
>  {
>  	struct pt_regs *regs = ftrace_get_regs(fregs);
> -	struct kprobe *p;
>  	struct kprobe_ctlblk *kcb;
> -	int bit;
>  
> -	bit = ftrace_test_recursion_trylock(ip, parent_ip);
> -	if (bit < 0)
> -		return;
> -
> -	p = get_kprobe((kprobe_opcode_t *)ip);
>  	if (unlikely(!p) || kprobe_disabled(p))
> -		goto out;
> +		return;
>  
>  	kcb = get_kprobe_ctlblk();
>  	if (kprobe_running()) {
> @@ -57,11 +49,43 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>  		 */
>  		__this_cpu_write(current_kprobe, NULL);
>  	}
> -out:
> +}
> +
> +/* Ftrace callback handler for kprobes -- called under preempt disabled */
> +void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> +			   struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +{
> +	struct kprobe *p;
> +	int bit;
> +
> +	bit = ftrace_test_recursion_trylock(ip, parent_ip);
> +	if (bit < 0)
> +		return;
> +
> +	p = get_kprobe((kprobe_opcode_t *)ip);
> +	ftrace_handler(p, ip, fregs);
> +
>  	ftrace_test_recursion_unlock(bit);
>  }
>  NOKPROBE_SYMBOL(kprobe_ftrace_handler);
>  
> +void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
> +				 struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +{
> +	struct kprobe *p;
> +	int bit;
> +
> +	bit = ftrace_test_recursion_trylock(ip, parent_ip);
> +	if (bit < 0)
> +		return;
> +
> +	p = container_of(ops, struct kprobe, multi.ops);
> +	ftrace_handler(p, ip, fregs);
> +
> +	ftrace_test_recursion_unlock(bit);
> +}
> +NOKPROBE_SYMBOL(kprobe_ftrace_multi_handler);
> +
>  int arch_prepare_kprobe_ftrace(struct kprobe *p)
>  {
>  	p->ainsn.insn = NULL;
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index a204df4fef96..03fd86ef69cb 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -68,6 +68,16 @@ struct kprobe {
>  	/* location of the probe point */
>  	kprobe_opcode_t *addr;
>  
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +	/* location of the multi probe points */
> +	struct {
> +		const char **symbols;
> +		kprobe_opcode_t **addrs;
> +		unsigned int cnt;
> +		struct ftrace_ops ops;
> +	} multi;
> +#endif
> +
>  	/* Allow user to indicate symbol name of the probe point */
>  	const char *symbol_name;
>  
> @@ -105,6 +115,7 @@ struct kprobe {
>  				   * this flag is only for optimized_kprobe.
>  				   */
>  #define KPROBE_FLAG_FTRACE	8 /* probe is using ftrace */
> +#define KPROBE_FLAG_MULTI      16 /* probe multi addresses */
>  
>  /* Has this kprobe gone ? */
>  static inline bool kprobe_gone(struct kprobe *p)
> @@ -130,6 +141,18 @@ static inline bool kprobe_ftrace(struct kprobe *p)
>  	return p->flags & KPROBE_FLAG_FTRACE;
>  }
>  
> +/* Is this ftrace multi kprobe ? */
> +static inline bool kprobe_ftrace_multi(struct kprobe *p)
> +{
> +	return kprobe_ftrace(p) && (p->flags & KPROBE_FLAG_MULTI);
> +}
> +
> +/* Is this single kprobe ? */
> +static inline bool kprobe_single(struct kprobe *p)
> +{
> +	return !(p->flags & KPROBE_FLAG_MULTI);
> +}
> +
>  /*
>   * Function-return probe -
>   * Note:
> @@ -365,6 +388,8 @@ static inline void wait_for_kprobe_optimizer(void) { }
>  #ifdef CONFIG_KPROBES_ON_FTRACE
>  extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>  				  struct ftrace_ops *ops, struct ftrace_regs *fregs);
> +extern void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
> +					struct ftrace_ops *ops, struct ftrace_regs *fregs);
>  extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
>  #else
>  static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index c4060a8da050..e7729e20d85c 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -44,6 +44,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/errno.h>
>  #include <linux/uaccess.h>
> +#include <linux/ftrace.h>
>  
>  #define KPROBE_HASH_BITS 6
>  #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
> @@ -1022,6 +1023,35 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
>  }
>  #endif /* CONFIG_OPTPROBES */
>  
> +static int check_kprobe_address(unsigned long addr)
> +{
> +	/* Ensure it is not in reserved area nor out of text */
> +	return !kernel_text_address(addr) ||
> +		within_kprobe_blacklist(addr) ||
> +		jump_label_text_reserved((void *) addr, (void *) addr) ||
> +		static_call_text_reserved((void *) addr, (void *) addr) ||
> +		find_bug(addr);
> +}
> +
> +static int check_ftrace_location(unsigned long addr, struct kprobe *p)
> +{
> +	unsigned long ftrace_addr;
> +
> +	ftrace_addr = ftrace_location(addr);
> +	if (ftrace_addr) {
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> +		/* Given address is not on the instruction boundary */
> +		if (addr != ftrace_addr)
> +			return -EILSEQ;
> +		if (p)
> +			p->flags |= KPROBE_FLAG_FTRACE;
> +#else	/* !CONFIG_KPROBES_ON_FTRACE */
> +		return -EINVAL;
> +#endif
> +	}
> +	return 0;
> +}
> +
>  #ifdef CONFIG_KPROBES_ON_FTRACE
>  static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
>  	.func = kprobe_ftrace_handler,
> @@ -1043,6 +1073,13 @@ static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
>  
>  	lockdep_assert_held(&kprobe_mutex);
>  
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +	if (kprobe_ftrace_multi(p)) {
> +		ret = register_ftrace_function(&p->multi.ops);
> +		WARN(ret < 0, "Failed to register kprobe-multi-ftrace (error %d)\n", ret);
> +		return ret;
> +	}
> +#endif
>  	ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0);
>  	if (WARN_ONCE(ret < 0, "Failed to arm kprobe-ftrace at %pS (error %d)\n", p->addr, ret))
>  		return ret;
> @@ -1081,6 +1118,13 @@ static int __disarm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
>  
>  	lockdep_assert_held(&kprobe_mutex);
>  
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +	if (kprobe_ftrace_multi(p)) {
> +		ret = unregister_ftrace_function(&p->multi.ops);
> +		WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret);
> +		return ret;
> +	}
> +#endif
>  	if (*cnt == 1) {
>  		ret = unregister_ftrace_function(ops);
>  		if (WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret))
> @@ -1103,6 +1147,94 @@ static int disarm_kprobe_ftrace(struct kprobe *p)
>  		ipmodify ? &kprobe_ipmodify_ops : &kprobe_ftrace_ops,
>  		ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled);
>  }
> +
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +/*
> + * In addition to standard kprobe address check for multi
> + * ftrace kprobes we also allow only:
> + * - ftrace managed function entry address
> + * - kernel core only address
> + */
> +static unsigned long check_ftrace_addr(unsigned long addr)
> +{
> +	int err;
> +
> +	if (!addr)
> +		return -EINVAL;
> +	err = check_ftrace_location(addr, NULL);
> +	if (err)
> +		return err;
> +	if (check_kprobe_address(addr))
> +		return -EINVAL;
> +	if (__module_text_address(addr))
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int check_ftrace_multi(struct kprobe *p)
> +{
> +	kprobe_opcode_t **addrs = p->multi.addrs;
> +	const char **symbols = p->multi.symbols;
> +	unsigned int i, cnt = p->multi.cnt;
> +	unsigned long addr, *ips;
> +	int err;
> +
> +	if ((symbols && addrs) || (!symbols && !addrs))
> +		return -EINVAL;
> +
> +	/* do we want sysctl for this? */
> +	if (cnt >= 20000)
> +		return -E2BIG;
> +
> +	ips = kmalloc(sizeof(*ips) * cnt, GFP_KERNEL);
> +	if (!ips)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < cnt; i++) {
> +		if (symbols)
> +			addr = (unsigned long) kprobe_lookup_name(symbols[i], 0);
> +		else
> +			addr = (unsigned long) addrs[i];
> +		ips[i] = addr;
> +	}
> +
> +	jump_label_lock();
> +	preempt_disable();
> +
> +	for (i = 0; i < cnt; i++) {
> +		err = check_ftrace_addr(ips[i]);
> +		if (err)
> +			break;
> +	}
> +
> +	preempt_enable();
> +	jump_label_unlock();
> +
> +	if (err)
> +		goto out;
> +
> +	err = ftrace_set_filter_ips(&p->multi.ops, ips, cnt, 0, 0);
> +	if (err)
> +		goto out;
> +
> +	p->multi.ops.func = kprobe_ftrace_multi_handler;
> +	p->multi.ops.flags = FTRACE_OPS_FL_SAVE_REGS|FTRACE_OPS_FL_DYNAMIC;
> +
> +	p->flags |= KPROBE_FLAG_MULTI|KPROBE_FLAG_FTRACE;
> +	if (p->post_handler)
> +		p->multi.ops.flags |= FTRACE_OPS_FL_IPMODIFY;
> +
> +out:
> +	kfree(ips);
> +	return err;
> +}
> +
> +static void free_ftrace_multi(struct kprobe *p)
> +{
> +	ftrace_free_filter(&p->multi.ops);
> +}
> +#endif
> +
>  #else	/* !CONFIG_KPROBES_ON_FTRACE */
>  static inline int arm_kprobe_ftrace(struct kprobe *p)
>  {
> @@ -1489,6 +1621,9 @@ static struct kprobe *__get_valid_kprobe(struct kprobe *p)
>  
>  	lockdep_assert_held(&kprobe_mutex);
>  
> +	if (kprobe_ftrace_multi(p))
> +		return p;
> +
>  	ap = get_kprobe(p->addr);
>  	if (unlikely(!ap))
>  		return NULL;
> @@ -1520,41 +1655,18 @@ static inline int warn_kprobe_rereg(struct kprobe *p)
>  	return ret;
>  }
>  
> -static int check_ftrace_location(struct kprobe *p)
> -{
> -	unsigned long ftrace_addr;
> -
> -	ftrace_addr = ftrace_location((unsigned long)p->addr);
> -	if (ftrace_addr) {
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> -		/* Given address is not on the instruction boundary */
> -		if ((unsigned long)p->addr != ftrace_addr)
> -			return -EILSEQ;
> -		p->flags |= KPROBE_FLAG_FTRACE;
> -#else	/* !CONFIG_KPROBES_ON_FTRACE */
> -		return -EINVAL;
> -#endif
> -	}
> -	return 0;
> -}
> -
>  static int check_kprobe_address_safe(struct kprobe *p,
>  				     struct module **probed_mod)
>  {
>  	int ret;
>  
> -	ret = check_ftrace_location(p);
> +	ret = check_ftrace_location((unsigned long) p->addr, p);
>  	if (ret)
>  		return ret;
>  	jump_label_lock();
>  	preempt_disable();
>  
> -	/* Ensure it is not in reserved area nor out of text */
> -	if (!kernel_text_address((unsigned long) p->addr) ||
> -	    within_kprobe_blacklist((unsigned long) p->addr) ||
> -	    jump_label_text_reserved(p->addr, p->addr) ||
> -	    static_call_text_reserved(p->addr, p->addr) ||
> -	    find_bug((unsigned long)p->addr)) {
> +	if (check_kprobe_address((unsigned long) p->addr)) {
>  		ret = -EINVAL;
>  		goto out;
>  	}
> @@ -1599,13 +1711,16 @@ static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
>  	return 0;
>  }
>  
> -int register_kprobe(struct kprobe *p)
> +static int check_addr(struct kprobe *p, struct module **probed_mod)
>  {
>  	int ret;
> -	struct kprobe *old_p;
> -	struct module *probed_mod;
>  	kprobe_opcode_t *addr;
>  
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +	if (p->multi.cnt)
> +		return check_ftrace_multi(p);
> +#endif
> +
>  	/* Adjust probe address from symbol */
>  	addr = kprobe_addr(p);
>  	if (IS_ERR(addr))
> @@ -1616,13 +1731,21 @@ int register_kprobe(struct kprobe *p)
>  	ret = warn_kprobe_rereg(p);
>  	if (ret)
>  		return ret;
> +	return check_kprobe_address_safe(p, probed_mod);
> +}
> +
> +int register_kprobe(struct kprobe *p)
> +{
> +	struct module *probed_mod = NULL;
> +	struct kprobe *old_p;
> +	int ret;
>  
>  	/* User can pass only KPROBE_FLAG_DISABLED to register_kprobe */
>  	p->flags &= KPROBE_FLAG_DISABLED;
>  	p->nmissed = 0;
>  	INIT_LIST_HEAD(&p->list);
>  
> -	ret = check_kprobe_address_safe(p, &probed_mod);
> +	ret = check_addr(p, &probed_mod);
>  	if (ret)
>  		return ret;
>  
> @@ -1644,14 +1767,21 @@ int register_kprobe(struct kprobe *p)
>  	if (ret)
>  		goto out;
>  
> -	INIT_HLIST_NODE(&p->hlist);
> -	hlist_add_head_rcu(&p->hlist,
> -		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
> +	/*
> +	 * Multi ftrace kprobes do not have single address,
> +	 * so they are not stored in the kprobe_table hash.
> +	 */
> +	if (kprobe_single(p)) {
> +		INIT_HLIST_NODE(&p->hlist);
> +		hlist_add_head_rcu(&p->hlist,
> +			       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
> +	}
>  
>  	if (!kprobes_all_disarmed && !kprobe_disabled(p)) {
>  		ret = arm_kprobe(p);
>  		if (ret) {
> -			hlist_del_rcu(&p->hlist);
> +			if (kprobe_single(p))
> +				hlist_del_rcu(&p->hlist);
>  			synchronize_rcu();
>  			goto out;
>  		}
> @@ -1778,7 +1908,13 @@ static int __unregister_kprobe_top(struct kprobe *p)
>  	return 0;
>  
>  disarmed:
> -	hlist_del_rcu(&ap->hlist);
> +	if (kprobe_single(ap))
> +		hlist_del_rcu(&ap->hlist);
> +
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +	if (kprobe_ftrace_multi(ap))
> +		free_ftrace_multi(ap);
> +#endif
>  	return 0;
>  }
>  
> -- 
> 2.33.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* [PATCH] mtd: nand: pxa3xx: set mtd->dev
From: Robert Marko @ 2022-01-05 15:01 UTC (permalink / raw)
  To: u-boot, marek.behun, sr, pali; +Cc: Robert Marko

Currently the pxa3xx driver does not set the udevice in the mtd_info
struct, which prevents the mtd from parsing the partitions via DTS
as is done for SPI-NOR.

So simply set mtd->dev to the driver's udevice.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
---
 drivers/mtd/nand/raw/pxa3xx_nand.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
index 8ff58a7038..eb739bb3b9 100644
--- a/drivers/mtd/nand/raw/pxa3xx_nand.c
+++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
@@ -1913,6 +1913,7 @@ static int pxa3xx_nand_probe(struct udevice *dev)
 		 * user's mtd partitions configuration would get broken.
 		 */
 		mtd->name = "pxa3xx_nand-0";
+		mtd->dev = dev;
 		info->cs = cs;
 		ret = pxa3xx_nand_scan(mtd);
 		if (ret) {
-- 
2.34.1


^ permalink raw reply related

* [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, Wen Congyang,
	Xie Changlong, Emanuele Giuseppe Esposito, Markus Armbruster,
	qemu-devel, Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini,
	John Snow
In-Reply-To: <20220105140208.365608-1-eesposit@redhat.com>

Introduce the job locking mechanism through the whole job API,
following the comments and requirements of job-monitor (assume
lock is held) and job-driver (lock is not held).

job_{lock/unlock} is independent from real_job_{lock/unlock}.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
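
For readers unfamiliar with the guard style used below, here is a
minimal stand-alone sketch of a scoped lock guard, assuming GCC or
Clang (it relies on the cleanup variable attribute) and a plain
pthread mutex. All demo_* names and WITH_DEMO_LOCK_GUARD() are invented
for this illustration; this is not the definition of job_lock(),
job_unlock() or the job lock guard macros in this series.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_lock_held;   /* demo bookkeeping: 1 while the mutex is held */
static int demo_counter;     /* state protected by demo_mutex */

static void demo_lock(void)
{
    pthread_mutex_lock(&demo_mutex);
    demo_lock_held = 1;
}

static void demo_unlock(void)
{
    demo_lock_held = 0;
    pthread_mutex_unlock(&demo_mutex);
}

/* Cleanup handler: runs whenever the guard variable leaves its scope. */
static void demo_guard_release(int *guard)
{
    (void)guard;
    demo_unlock();
}

/*
 * Take the lock, run the following block exactly once, and drop the
 * lock when the block is left, including via an early return.
 */
#define WITH_DEMO_LOCK_GUARD()                                          \
    for (int demo_guard_ __attribute__((cleanup(demo_guard_release))) = \
             (demo_lock(), 1);                                          \
         demo_guard_; demo_guard_ = 0)

/* Increment the counter if it is below limit; -1 means "not bumped". */
int demo_bump_below(int limit)
{
    int value = -1;

    WITH_DEMO_LOCK_GUARD() {
        if (demo_counter >= limit) {
            return -1;              /* the guard still unlocks here */
        }
        value = ++demo_counter;
    }
    return value;
}

/* Returns 1 when no guard currently holds the demo lock. */
int demo_lock_is_free(void)
{
    return !demo_lock_held;
}
```

With a guard of this kind, a return inside the guarded block does not
leak the lock, which is why hunks such as the one in block_job_create()
can return from inside the guarded region.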

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c             |  18 +++---
 block/replication.c |   8 ++-
 blockdev.c          |  17 +++++-
 blockjob.c          |  64 ++++++++++++++-------
 job-qmp.c           |   2 +
 job.c               | 132 +++++++++++++++++++++++++++++++-------------
 monitor/qmp-cmds.c  |   6 +-
 qemu-img.c          |  41 ++++++++------
 8 files changed, 199 insertions(+), 89 deletions(-)

diff --git a/block.c b/block.c
index 8fcd525fa0..fac0759422 100644
--- a/block.c
+++ b/block.c
@@ -4976,7 +4976,9 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
-    assert(job_next_locked(NULL) == NULL);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job_next_locked(NULL) == NULL);
+    }
     assert(qemu_in_main_thread());
 
     /* Drop references from requests still in flight, such as canceled block
@@ -6154,13 +6156,15 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
         }
     }
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-        GSList *el;
+    WITH_JOB_LOCK_GUARD() {
+        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+            GSList *el;
 
-        xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
-                           job->job.id);
-        for (el = job->nodes; el; el = el->next) {
-            xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
+            xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
+                                job->job.id);
+            for (el = job->nodes; el; el = el->next) {
+                xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
+            }
         }
     }
 
diff --git a/block/replication.c b/block/replication.c
index 5215c328c1..50ea778937 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
-        job_cancel_sync_locked(commit_job, false);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync_locked(commit_job, false);
+        }
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +728,9 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
-            job_cancel_sync_locked(&s->backup_job->job, true);
+            WITH_JOB_LOCK_GUARD() {
+                job_cancel_sync_locked(&s->backup_job->job, true);
+            }
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index ee35aff13a..099d57e0d2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -155,6 +155,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
         return;
     }
 
+    JOB_LOCK_GUARD();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
             AioContext *aio_context = job->job.aio_context;
@@ -1832,7 +1834,9 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync_locked(&state->job->job, true);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync_locked(&state->job->job, true);
+        }
 
         aio_context_release(aio_context);
     }
@@ -1933,7 +1937,9 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync_locked(&state->job->job, true);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync_locked(&state->job->job, true);
+        }
 
         aio_context_release(aio_context);
     }
@@ -2382,7 +2388,10 @@ exit:
     if (!has_props) {
         qapi_free_TransactionProperties(props);
     }
-    job_txn_unref_locked(block_job_txn);
+
+    WITH_JOB_LOCK_GUARD() {
+        job_txn_unref_locked(block_job_txn);
+    }
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
@@ -3705,6 +3714,8 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
     BlockJobInfoList *head = NULL, **tail = &head;
     BlockJob *job;
 
+    JOB_LOCK_GUARD();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfo *value;
 
diff --git a/blockjob.c b/blockjob.c
index ce356be51e..e00c8d31d5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -88,7 +88,9 @@ static char *child_job_get_parent_desc(BdrvChild *c)
 static void child_job_drained_begin(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
-    job_pause_locked(&job->job);
+    WITH_JOB_LOCK_GUARD() {
+        job_pause_locked(&job->job);
+    }
 }
 
 static bool child_job_drained_poll(BdrvChild *c)
@@ -100,8 +102,10 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed_locked(job)) {
-        return false;
+    WITH_JOB_LOCK_GUARD() {
+        if (!job->busy || job_is_completed_locked(job)) {
+            return false;
+        }
     }
 
     /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
@@ -116,7 +120,9 @@ static bool child_job_drained_poll(BdrvChild *c)
 static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
-    job_resume_locked(&job->job);
+    WITH_JOB_LOCK_GUARD() {
+        job_resume_locked(&job->job);
+    }
 }
 
 static bool child_job_can_set_aio_ctx(BdrvChild *c, AioContext *ctx,
@@ -238,7 +244,13 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 
 static void block_job_on_idle(Notifier *n, void *opaque)
 {
+    /*
+     * we can't kick with job_mutex held, but we also want
+     * to protect the notifier list.
+     */
+    job_unlock();
     aio_wait_kick();
+    job_lock();
 }
 
 bool block_job_is_internal(BlockJob *job)
@@ -278,7 +290,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 
     if (drv->set_speed) {
+        job_unlock();
         drv->set_speed(job, speed);
+        job_lock();
     }
 
     if (speed && speed <= old_speed) {
@@ -458,13 +472,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->ready_notifier.notify = block_job_event_ready;
     job->idle_notifier.notify = block_job_on_idle;
 
-    notifier_list_add(&job->job.on_finalize_cancelled,
-                      &job->finalize_cancelled_notifier);
-    notifier_list_add(&job->job.on_finalize_completed,
-                      &job->finalize_completed_notifier);
-    notifier_list_add(&job->job.on_pending, &job->pending_notifier);
-    notifier_list_add(&job->job.on_ready, &job->ready_notifier);
-    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    WITH_JOB_LOCK_GUARD() {
+        notifier_list_add(&job->job.on_finalize_cancelled,
+                          &job->finalize_cancelled_notifier);
+        notifier_list_add(&job->job.on_finalize_completed,
+                          &job->finalize_completed_notifier);
+        notifier_list_add(&job->job.on_pending, &job->pending_notifier);
+        notifier_list_add(&job->job.on_ready, &job->ready_notifier);
+        notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    }
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                job_type_str(&job->job));
@@ -477,11 +493,14 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     blk_set_disable_request_queuing(blk, true);
     blk_set_allow_aio_context_change(blk, true);
 
-    if (!block_job_set_speed(job, speed, errp)) {
-        job_early_fail(&job->job);
-        return NULL;
+    WITH_JOB_LOCK_GUARD() {
+        if (!block_job_set_speed(job, speed, errp)) {
+            job_early_fail_locked(&job->job);
+            return NULL;
+        }
     }
 
+
     return job;
 }
 
@@ -499,7 +518,9 @@ void block_job_user_resume(Job *job)
 {
     BlockJob *bjob = container_of(job, BlockJob, job);
     assert(qemu_in_main_thread());
-    block_job_iostatus_reset(bjob);
+    WITH_JOB_LOCK_GUARD() {
+        block_job_iostatus_reset(bjob);
+    }
 }
 
 BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
@@ -532,10 +553,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
                                         action);
     }
     if (action == BLOCK_ERROR_ACTION_STOP) {
-        if (!job->job.user_paused) {
-            job_pause_locked(&job->job);
-            /* make the pause user visible, which will be resumed from QMP. */
-            job->job.user_paused = true;
+        WITH_JOB_LOCK_GUARD() {
+            if (!job->job.user_paused) {
+                job_pause_locked(&job->job);
+                /*
+                 * make the pause user visible, which will be
+                 * resumed from QMP.
+                 */
+                job->job.user_paused = true;
+            }
         }
         block_job_iostatus_set_err(job, error);
     }
diff --git a/job-qmp.c b/job-qmp.c
index f6f9840436..9fa14bf761 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -171,6 +171,8 @@ JobInfoList *qmp_query_jobs(Error **errp)
     JobInfoList *head = NULL, **tail = &head;
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
         JobInfo *value;
 
diff --git a/job.c b/job.c
index 2ee7233763..56722a5043 100644
--- a/job.c
+++ b/job.c
@@ -394,6 +394,8 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
 {
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     if (job_id) {
         if (flags & JOB_INTERNAL) {
             error_setg(errp, "Cannot specify job ID for internal job");
@@ -467,7 +469,9 @@ void job_unref_locked(Job *job)
         assert(!job->txn);
 
         if (job->driver->free) {
+            job_unlock();
             job->driver->free(job);
+            job_lock();
         }
 
         QLIST_REMOVE(job, job_list);
@@ -551,11 +555,14 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
     timer_del(&job->sleep_timer);
     job->busy = true;
     real_job_unlock();
+    job_unlock();
     aio_co_enter(job->aio_context, job->co);
+    job_lock();
 }
 
 void job_enter(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_enter_cond_locked(job, NULL);
 }
 
@@ -574,7 +581,9 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
     job->busy = false;
     job_event_idle(job);
     real_job_unlock();
+    job_unlock();
     qemu_coroutine_yield();
+    job_lock();
 
     /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
@@ -584,18 +593,23 @@ void coroutine_fn job_pause_point(Job *job)
 {
     assert(job && job_started(job));
 
+    job_lock();
     if (!job_should_pause(job)) {
+        job_unlock();
         return;
     }
-    if (job_is_cancelled(job)) {
+    if (job_is_cancelled_locked(job)) {
+        job_unlock();
         return;
     }
 
     if (job->driver->pause) {
+        job_unlock();
         job->driver->pause(job);
+        job_lock();
     }
 
-    if (job_should_pause(job) && !job_is_cancelled(job)) {
+    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
         JobStatus status = job->status;
         job_state_transition(job, status == JOB_STATUS_READY
                                   ? JOB_STATUS_STANDBY
@@ -605,6 +619,7 @@ void coroutine_fn job_pause_point(Job *job)
         job->paused = false;
         job_state_transition(job, status);
     }
+    job_unlock();
 
     if (job->driver->resume) {
         job->driver->resume(job);
@@ -613,15 +628,17 @@ void coroutine_fn job_pause_point(Job *job)
 
 void job_yield(Job *job)
 {
-    assert(job->busy);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
-        return;
-    }
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, -1);
+        if (!job_should_pause(job)) {
+            job_do_yield(job, -1);
+        }
     }
 
     job_pause_point(job);
@@ -629,21 +646,23 @@ void job_yield(Job *job)
 
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
 {
-    assert(job->busy);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
-        return;
-    }
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        if (!job_should_pause(job)) {
+            job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        }
     }
 
     job_pause_point(job);
 }
 
-/* Assumes the block_job_mutex is held */
+/* Assumes the job_mutex is held */
 static bool job_timer_not_pending(Job *job)
 {
     return !timer_pending(&job->sleep_timer);
@@ -653,7 +672,7 @@ void job_pause_locked(Job *job)
 {
     job->pause_count++;
     if (!job->paused) {
-        job_enter(job);
+        job_enter_cond_locked(job, NULL);
     }
 }
 
@@ -699,7 +718,9 @@ void job_user_resume_locked(Job *job, Error **errp)
         return;
     }
     if (job->driver->user_resume) {
+        job_unlock();
         job->driver->user_resume(job);
+        job_lock();
     }
     job->user_paused = false;
     job_resume_locked(job);
@@ -753,7 +774,7 @@ static void job_conclude(Job *job)
 
 static void job_update_rc(Job *job)
 {
-    if (!job->ret && job_is_cancelled(job)) {
+    if (!job->ret && job_is_cancelled_locked(job)) {
         job->ret = -ECANCELED;
     }
     if (job->ret) {
@@ -769,7 +790,9 @@ static void job_commit(Job *job)
     assert(!job->ret);
     assert(qemu_in_main_thread());
     if (job->driver->commit) {
+        job_unlock();
         job->driver->commit(job);
+        job_lock();
     }
 }
 
@@ -778,7 +801,9 @@ static void job_abort(Job *job)
     assert(job->ret);
     assert(qemu_in_main_thread());
     if (job->driver->abort) {
+        job_unlock();
         job->driver->abort(job);
+        job_lock();
     }
 }
 
@@ -786,12 +811,15 @@ static void job_clean(Job *job)
 {
     assert(qemu_in_main_thread());
     if (job->driver->clean) {
+        job_unlock();
         job->driver->clean(job);
+        job_lock();
     }
 }
 
 static int job_finalize_single(Job *job)
 {
+    int job_ret;
     AioContext *ctx = job->aio_context;
 
     assert(job_is_completed_locked(job));
@@ -811,12 +839,15 @@ static int job_finalize_single(Job *job)
     aio_context_release(ctx);
 
     if (job->cb) {
-        job->cb(job->opaque, job->ret);
+        job_ret = job->ret;
+        job_unlock();
+        job->cb(job->opaque, job_ret);
+        job_lock();
     }
 
     /* Emit events only if we actually started */
     if (job_started(job)) {
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_event_cancelled(job);
         } else {
             job_event_completed(job);
@@ -832,7 +863,9 @@ static void job_cancel_async(Job *job, bool force)
 {
     assert(qemu_in_main_thread());
     if (job->driver->cancel) {
+        job_unlock();
         force = job->driver->cancel(job, force);
+        job_lock();
     } else {
         /* No .cancel() means the job will behave as if force-cancelled */
         force = true;
@@ -841,7 +874,9 @@ static void job_cancel_async(Job *job, bool force)
     if (job->user_paused) {
         /* Do not call job_enter here, the caller will handle it.  */
         if (job->driver->user_resume) {
+            job_unlock();
             job->driver->user_resume(job);
+            job_lock();
         }
         job->user_paused = false;
         assert(job->pause_count > 0);
@@ -911,7 +946,7 @@ static void job_completed_txn_abort(Job *job)
         ctx = other_job->aio_context;
         aio_context_acquire(ctx);
         if (!job_is_completed_locked(other_job)) {
-            assert(job_cancel_requested(other_job));
+            assert(job_cancel_requested_locked(other_job));
             job_finish_sync_locked(other_job, NULL, NULL);
         }
         job_finalize_single(other_job);
@@ -930,13 +965,17 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+    int ret;
     AioContext *ctx = job->aio_context;
     assert(qemu_in_main_thread());
 
     if (job->ret == 0 && job->driver->prepare) {
+        job_unlock();
         aio_context_acquire(ctx);
-        job->ret = job->driver->prepare(job);
+        ret = job->driver->prepare(job);
         aio_context_release(ctx);
+        job_lock();
+        job->ret = ret;
         job_update_rc(job);
     }
 
@@ -982,6 +1021,7 @@ static int job_transition_to_pending(Job *job)
 
 void job_transition_to_ready(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_state_transition(job, JOB_STATUS_READY);
     job_event_ready(job);
 }
@@ -1031,6 +1071,7 @@ static void job_exit(void *opaque)
     Job *job = (Job *)opaque;
     AioContext *ctx;
 
+    JOB_LOCK_GUARD();
     job_ref_locked(job);
     aio_context_acquire(job->aio_context);
 
@@ -1061,13 +1102,17 @@ static void job_exit(void *opaque)
 static void coroutine_fn job_co_entry(void *opaque)
 {
     Job *job = opaque;
+    int ret;
 
     assert(job->aio_context == qemu_get_current_aio_context());
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->ret = job->driver->run(job, &job->err);
-    job->deferred_to_main_loop = true;
-    job->busy = true;
+    ret = job->driver->run(job, &job->err);
+    WITH_JOB_LOCK_GUARD() {
+        job->ret = ret;
+        job->deferred_to_main_loop = true;
+        job->busy = true;
+    }
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
@@ -1083,16 +1128,20 @@ static int job_pre_run(Job *job)
 
 void job_start(Job *job)
 {
-    assert(job && !job_started(job) && job->paused &&
-           job->driver && job->driver->run);
-    job->co = qemu_coroutine_create(job_co_entry, job);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job && !job_started(job) && job->paused &&
+            job->driver && job->driver->run);
+        job->co = qemu_coroutine_create(job_co_entry, job);
+    }
     if (job_pre_run(job)) {
         return;
     }
-    job->pause_count--;
-    job->busy = true;
-    job->paused = false;
-    job_state_transition(job, JOB_STATUS_RUNNING);
+    WITH_JOB_LOCK_GUARD() {
+        job->pause_count--;
+        job->busy = true;
+        job->paused = false;
+        job_state_transition(job, JOB_STATUS_RUNNING);
+    }
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -1116,11 +1165,11 @@ void job_cancel_locked(Job *job, bool force)
          * choose to call job_is_cancelled() to show that we invoke
          * job_completed_txn_abort() only for force-cancelled jobs.)
          */
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_completed_txn_abort(job);
         }
     } else {
-        job_enter(job);
+        job_enter_cond_locked(job, NULL);
     }
 }
 
@@ -1164,6 +1213,7 @@ void job_cancel_sync_all(void)
     Job *job;
     AioContext *aio_context;
 
+    JOB_LOCK_GUARD();
     while ((job = job_next_locked(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
@@ -1185,13 +1235,15 @@ void job_complete_locked(Job *job, Error **errp)
     if (job_apply_verb_locked(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
-    if (job_cancel_requested(job) || !job->driver->complete) {
+    if (job_cancel_requested_locked(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
     }
 
+    job_unlock();
     job->driver->complete(job, errp);
+    job_lock();
 }
 
 int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
@@ -1211,10 +1263,12 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
         return -EBUSY;
     }
 
-    AIO_WAIT_WHILE(job->aio_context,
-                   (job_enter(job), !job_is_completed_locked(job)));
+    job_unlock();
+    AIO_WAIT_WHILE(job->aio_context, (job_enter(job), !job_is_completed(job)));
+    job_lock();
 
-    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
+    ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
+           -ECANCELED : job->ret;
     job_unref_locked(job);
     return ret;
 }
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 343353e27a..2f11d086a6 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -133,8 +133,10 @@ void qmp_cont(Error **errp)
         blk_iostatus_reset(blk);
     }
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-        block_job_iostatus_reset(job);
+    WITH_JOB_LOCK_GUARD() {
+        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+            block_job_iostatus_reset(job);
+        }
     }
 
     /* Continuing after completed migration. Images have been inactivated to
diff --git a/qemu-img.c b/qemu-img.c
index 09f3b11eab..95e2e33e61 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -906,25 +906,30 @@ static void run_block_job(BlockJob *job, Error **errp)
     int ret = 0;
 
     aio_context_acquire(aio_context);
-    job_ref_locked(&job->job);
-    do {
-        float progress = 0.0f;
-        aio_poll(aio_context, true);
+    WITH_JOB_LOCK_GUARD() {
+        job_ref_locked(&job->job);
+        do {
+            float progress = 0.0f;
+            job_unlock();
+            aio_poll(aio_context, true);
+
+            progress_get_snapshot(&job->job.progress, &progress_current,
+                                &progress_total);
+            if (progress_total) {
+                progress = (float)progress_current / progress_total * 100.f;
+            }
+            qemu_progress_print(progress, 0);
+            job_lock();
+        } while (!job_is_ready_locked(&job->job) &&
+                !job_is_completed_locked(&job->job));
 
-        progress_get_snapshot(&job->job.progress, &progress_current,
-                              &progress_total);
-        if (progress_total) {
-            progress = (float)progress_current / progress_total * 100.f;
+        if (!job_is_completed_locked(&job->job)) {
+            ret = job_complete_sync_locked(&job->job, errp);
+        } else {
+            ret = job->job.ret;
         }
-        qemu_progress_print(progress, 0);
-    } while (!job_is_ready(&job->job) && !job_is_completed_locked(&job->job));
-
-    if (!job_is_completed_locked(&job->job)) {
-        ret = job_complete_sync_locked(&job->job, errp);
-    } else {
-        ret = job->job.ret;
+        job_unref_locked(&job->job);
     }
-    job_unref_locked(&job->job);
     aio_context_release(aio_context);
 
     /* publish completion progress only when success */
@@ -1077,7 +1082,9 @@ static int img_commit(int argc, char **argv)
         bdrv_ref(bs);
     }
 
-    job = block_job_get("commit");
+    WITH_JOB_LOCK_GUARD() {
+        job = block_job_get("commit");
+    }
     assert(job);
     run_block_job(job, &local_err);
     if (local_err) {
-- 
2.31.1
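
Reader's note: many hunks in the patch above drop the job lock around a
driver callback and take it again afterwards. A minimal userspace sketch
of that pattern follows, assuming a plain pthread mutex in place of the
job mutex. All toy_* names are hypothetical and are not QEMU API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the global job mutex; not QEMU code. */
static pthread_mutex_t toy_job_mutex = PTHREAD_MUTEX_INITIALIZER;

struct toy_job {
    int ret;                            /* protected by toy_job_mutex */
    void (*cb)(void *opaque, int ret);  /* completion callback */
    void *opaque;
};

/* Example callback that takes the mutex itself, as driver code might. */
static void toy_record_cb(void *opaque, int ret)
{
    pthread_mutex_lock(&toy_job_mutex);
    *(int *)opaque = ret;
    pthread_mutex_unlock(&toy_job_mutex);
}

/*
 * Caller must hold toy_job_mutex; it is held again on return.
 * The protected field is copied to a local first, then the lock is
 * dropped for the duration of the callback.
 */
static void toy_job_finalize_locked(struct toy_job *job)
{
    if (job->cb) {
        int job_ret = job->ret;         /* snapshot while still locked */

        pthread_mutex_unlock(&toy_job_mutex);
        job->cb(job->opaque, job_ret);  /* runs without the lock held */
        pthread_mutex_lock(&toy_job_mutex);
    }
}
```

With a non-recursive mutex, calling toy_record_cb() while still holding
the lock would deadlock, which is the situation the unlock/lock pairs
are meant to avoid.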



^ permalink raw reply related

* Re: [PATCH] f2fs: quota: fix potential deadlock
From: Greg KH @ 2022-01-05 15:01 UTC (permalink / raw)
  To: Chao Yu; +Cc: stable, jaegeuk, linux-f2fs-devel, Yi Zhuang
In-Reply-To: <f07cbfa3-29f8-c671-98cf-45b664000f95@kernel.org>

On Tue, Jan 04, 2022 at 11:48:25PM +0800, Chao Yu wrote:
> On 2022/1/4 23:17, Greg KH wrote:
> > On Tue, Jan 04, 2022 at 11:05:36PM +0800, Chao Yu wrote:
> > > On 2022/1/4 21:18, Greg KH wrote:
> > > > On Tue, Jan 04, 2022 at 09:05:13PM +0800, Chao Yu wrote:
> > > > > commit a5c0042200b28fff3bde6fa128ddeaef97990f8d upstream.
> > > > > 
> > > > > As Yi Zhuang reported in bugzilla:
> > > > > 
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=214299
> > > > > 
> > > > > There is potential deadlock during quota data flush as below:
> > > > > 
> > > > > Thread A:			Thread B:
> > > > > f2fs_dquot_acquire
> > > > > down_read(&sbi->quota_sem)
> > > > > 				f2fs_write_checkpoint
> > > > > 				block_operations
> > > > > 				f2fs_look_all
> > > > > 				down_write(&sbi->cp_rwsem)
> > > > > f2fs_quota_write
> > > > > f2fs_write_begin
> > > > > __do_map_lock
> > > > > f2fs_lock_op
> > > > > down_read(&sbi->cp_rwsem)
> > > > > 				__need_flush_qutoa
> > > > > 				down_write(&sbi->quota_sem)
> > > > > 
> > > > > > This patch changes block_operations() to use trylock. If the trylock
> > > > > > fails, there is a potential quota data updater; in that case, flush
> > > > > > quota data first and then trylock again to check the dirty status of
> > > > > > quota data.
> > > > > > 
> > > > > > The side effect is: under heavy contention (e.g. multiple quota data
> > > > > > updaters vs. a quota data flusher), it may decrease the probability of
> > > > > > synchronizing quota data successfully in checkpoint() due to limited
> > > > > > retries of quota flush.
> > > > > 
> > > > > Fixes: db6ec53b7e03 ("f2fs: add a rw_sem to cover quota flag changes")
> > > > > Cc: stable@vger.kernel.org # v5.3+
> > > > > Reported-by: Yi Zhuang <zhuangyi1@huawei.com>
> > > > > Signed-off-by: Chao Yu <chao@kernel.org>
> > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > ---
> > > > >    fs/f2fs/checkpoint.c | 3 ++-
> > > > >    1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > > > > index 83e9bc0f91ff..7b0282724231 100644
> > > > > --- a/fs/f2fs/checkpoint.c
> > > > > +++ b/fs/f2fs/checkpoint.c
> > > > > @@ -1162,7 +1162,8 @@ static bool __need_flush_quota(struct f2fs_sb_info *sbi)
> > > > >    	if (!is_journalled_quota(sbi))
> > > > >    		return false;
> > > > > -	down_write(&sbi->quota_sem);
> > > > > +	if (!down_write_trylock(&sbi->quota_sem))
> > > > > +		return true;
> > > > >    	if (is_sbi_flag_set(sbi, SBI_QUOTA_SKIP_FLUSH)) {
> > > > >    		ret = false;
> > > > >    	} else if (is_sbi_flag_set(sbi, SBI_QUOTA_NEED_REPAIR)) {
> > > > > -- 
> > > > > 2.32.0
> > > > > 
> > > > 
> > > > What stable tree(s) is this for?
> > > 
> > > Oh, please help to try applying to 5.4, 5.10, and 5.15 stable trees, thanks!
> > 
> > This is already in the 5.15.6 kernel release, do you need it applied
> > there again?  :)
> 
> Oops, no, so 5.4 and 5.10 are enough. ;)
> We can skip 5.15 since this patch was first merged in 5.15-rc1.

It was merged in 5.16-rc1, and then backported to 5.15.6.  You might
want to check your git scripts.

Anyway, now queued up, thanks.

greg k-h

^ permalink raw reply

* Re: [PATCH v1 21/34] hw/arm: add control knob to disable kaslr_seed via DTB
From: Andrew Jones @ 2022-01-05 14:49 UTC (permalink / raw)
  To: Alex Bennée
  Cc: fam, Peter Maydell, berrange, Heinrich Schuchardt,
	Ilias Apalodimas, qemu-devel, f4bug, pbonzini, aurelien, stefanha,
	crosa, Jerome Forissier, open list:Virt
In-Reply-To: <20220105135009.1584676-22-alex.bennee@linaro.org>

On Wed, Jan 05, 2022 at 01:49:56PM +0000, Alex Bennée wrote:
> Generally a guest needs an external source of randomness to properly
> enable things like address space randomisation. However in a trusted
> boot environment where the firmware will cryptographically verify
> components having random data in the DTB will cause verification to
> fail. Add a control knob so we can prevent this being added to the
> system DTB.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Acked-by: Jerome Forissier <jerome@forissier.org>
> Message-Id: <20211215120926.1696302-1-alex.bennee@linaro.org>
> ---
>  docs/system/arm/virt.rst |  7 +++++++
>  include/hw/arm/virt.h    |  1 +
>  hw/arm/virt.c            | 32 ++++++++++++++++++++++++++++++--
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
> index 850787495b..c86a4808df 100644
> --- a/docs/system/arm/virt.rst
> +++ b/docs/system/arm/virt.rst
> @@ -121,6 +121,13 @@ ras
>    Set ``on``/``off`` to enable/disable reporting host memory errors to a guest
>    using ACPI and guest external abort exceptions. The default is off.
>  
> +kaslr-dtb-seed
> +  Set ``on``/``off`` to pass a random seed via the guest dtb to use for features
> +  like address space randomisation. The default is ``on``. You will want
> +  to disable it if your trusted boot chain will verify the DTB it is
> +  passed. It would be the responsibility of the firmware to come up
> +  with a seed and pass it on if it wants to.
> +
>  Linux guest kernel configuration
>  """"""""""""""""""""""""""""""""
>  
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index dc6b66ffc8..acd0665fe7 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -148,6 +148,7 @@ struct VirtMachineState {
>      bool virt;
>      bool ras;
>      bool mte;
> +    bool kaslr_dtb_seed;
>      OnOffAuto acpi;
>      VirtGICType gic_version;
>      VirtIOMMUType iommu;
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 6bce595aba..1781e47c76 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -247,11 +247,15 @@ static void create_fdt(VirtMachineState *vms)
>  
>      /* /chosen must exist for load_dtb to fill in necessary properties later */
>      qemu_fdt_add_subnode(fdt, "/chosen");
> -    create_kaslr_seed(ms, "/chosen");
> +    if (vms->kaslr_dtb_seed) {
> +        create_kaslr_seed(ms, "/chosen");
> +    }
>  
>      if (vms->secure) {
>          qemu_fdt_add_subnode(fdt, "/secure-chosen");
> -        create_kaslr_seed(ms, "/secure-chosen");
> +        if (vms->kaslr_dtb_seed) {
> +            create_kaslr_seed(ms, "/secure-chosen");
> +        }
>      }
>  
>      /* Clock node, for the benefit of the UART. The kernel device tree
> @@ -2235,6 +2239,20 @@ static void virt_set_its(Object *obj, bool value, Error **errp)
>      vms->its = value;
>  }
>  
> +static bool virt_get_kaslr_dtb_seed(Object *obj, Error **errp)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +    return vms->kaslr_dtb_seed;
> +}
> +
> +static void virt_set_kaslr_dtb_seed(Object *obj, bool value, Error **errp)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +    vms->kaslr_dtb_seed = value;
> +}
> +
>  static char *virt_get_oem_id(Object *obj, Error **errp)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(obj);
> @@ -2764,6 +2782,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>                                            "Set on/off to enable/disable "
>                                            "ITS instantiation");
>  
> +    object_class_property_add_bool(oc, "kaslr-dtb-seed",
> +                                   virt_get_kaslr_dtb_seed,
> +                                   virt_set_kaslr_dtb_seed);
> +    object_class_property_set_description(oc, "kaslr-dtb-seed",
> +                                          "Set off to disable passing of kaslr "
> +                                          "dtb node to guest");
> +
>      object_class_property_add_str(oc, "x-oem-id",
>                                    virt_get_oem_id,
>                                    virt_set_oem_id);
> @@ -2828,6 +2853,9 @@ static void virt_instance_init(Object *obj)
>      /* MTE is disabled by default.  */
>      vms->mte = false;
>  
> +    /* Supply a kaslr-seed by default */
> +    vms->kaslr_dtb_seed = true;
> +
>      vms->irqmap = a15irqmap;
>  
>      virt_flash_create(vms);
> -- 
> 2.30.2
> 
>

Reviewed-by: Andrew Jones <drjones@redhat.com>



^ permalink raw reply

* Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
From: Trond Myklebust @ 2022-01-05 15:02 UTC (permalink / raw)
  To: jgg@nvidia.com, trondmy@kernel.org
  Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <20220105143705.GS2328285@nvidia.com>

On Wed, 2022-01-05 at 10:37 -0400, Jason Gunthorpe wrote:
> On Wed, Jan 05, 2022 at 09:18:41AM -0500, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > When doing RPC/RDMA, we're seeing a kernel panic when
> > __ib_umem_release() iterates over the scatter gather list and hits
> > NULL pages.
> > 
> > It turns out that commit 79fbd3e1241c ended up changing the iteration
> > from being over only the mapped entries to being over the original
> > list size.
> 
> You mean this?
> 
> -       for_each_sg(umem->sg_head.sgl, sg, umem->sg_nents, i)
> +       for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> 
> I don't see what changed there? The invariant should be that
> 
>   umem->sg_nents == sgt->orig_nents
> 
> > @@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> >                 ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
> >                                            DMA_BIDIRECTIONAL, 0);
> >  
> > -       for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> > +       for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i)
> >                 unpin_user_page_range_dirty_lock(sg_page(sg),
> 
> Calling sg_page() from under a dma_sg iterator is unconditionally wrong.
> 
> More likely something went wrong in your case when the sgtable was
> created, and it has the wrong value in orig_nents.

Can you define "wrong value" in this case? Chuck's RPC/RDMA code
appears to call ib_alloc_mr() with an 'expected maximum number of
entries' (depth) in net/sunrpc/xprtrdma/frwr_ops.c:frwr_mr_init().

It then fills that table with a set of n <= depth pages in
net/sunrpc/xprtrdma/frwr_ops.c:frwr_map() and calls ib_dma_map_sg() to
map them, and then adjusts the sgtable with a call to ib_map_mr_sg().


What part of that sequence of operations is incorrect?
> 
> Jason

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
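
Reader's note: for_each_sgtable_sg() walks orig_nents entries, while
for_each_sgtable_dma_sg() walks nents, the count left after DMA mapping,
which may be smaller when entries are merged. The toy model below is one
reading of why page-level cleanup under the DMA count is a problem. It is
a sketch only: toy_sg_table and toy_unpin() are hypothetical and do not
reproduce the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ENTS 8

/* Toy model only; the real struct sg_table is in <linux/scatterlist.h>. */
struct toy_sg_table {
    bool pinned[TOY_MAX_ENTS]; /* one flag per CPU-side entry (page) */
    unsigned int orig_nents;   /* number of CPU-side entries */
    unsigned int nents;        /* number of DMA-mapped entries, may be fewer */
};

/* Release the first `count` entries; returns how many were still pinned. */
static unsigned int toy_unpin(struct toy_sg_table *sgt, unsigned int count)
{
    unsigned int i, released = 0;

    for (i = 0; i < count && i < TOY_MAX_ENTS; i++) {
        if (sgt->pinned[i]) {
            sgt->pinned[i] = false;
            released++;
        }
    }
    return released;
}
```

With four pinned entries merged into two DMA segments, toy_unpin(&sgt,
sgt.nents) leaves two entries pinned, whereas iterating to orig_nents
releases all of them.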



^ permalink raw reply

