    bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
    John Fastabend authored
    A sockmap program that redirects through a kTLS ULP-enabled socket
    does not work correctly because the ULP layer is skipped. Fix this
    by calling through the ULP layer on redirect so that any operations
    the ULP layer must perform on the data stream continue to be
    applied.
    
    To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
    calling the BPF layer on a redirected message. This is
    required to avoid calling the BPF layer multiple times (possibly
    recursively) which is not the current/expected behavior without
    ULPs. In the future we may add a redirect flag if users _do_
    want the policy applied again but this would need to work for both
    ULP and non-ULP sockets and be opt-in to avoid breaking existing
    programs.
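
    The intended behavior can be pictured with a small user-space toy
    model. This is only a sketch, not the kernel code: struct demo_sock,
    demo_sendpage() and demo_redirect() are hypothetical names invented
    for illustration, and the flag value is assumed to match the kernel
    definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's internal flag value; redefined locally. */
#define DEMO_MSG_SENDPAGE_NOPOLICY 0x10000u

/* Hypothetical stand-in for a socket; counts which layers saw the data. */
struct demo_sock {
	bool has_ulp;     /* e.g. kTLS attached */
	int  ulp_runs;    /* times the ULP layer processed data */
	int  policy_runs; /* times the BPF policy was evaluated */
};

/* Toy sendpage: always goes through the ULP, optionally skips policy. */
static void demo_sendpage(struct demo_sock *sk, unsigned int flags)
{
	assert(sk);
	if (sk->has_ulp)
		sk->ulp_runs++;
	if (!(flags & DEMO_MSG_SENDPAGE_NOPOLICY))
		sk->policy_runs++;
}

/* Toy redirect: enter through the ULP but do not re-run the policy. */
static void demo_redirect(struct demo_sock *dst)
{
	demo_sendpage(dst, DEMO_MSG_SENDPAGE_NOPOLICY);
}
```

    In this model a normal send followed by a redirect into the same
    socket passes through the ULP twice but evaluates the policy once.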
    
    Also to avoid polluting the flag space with an internal flag we
    reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with
    MSG_WAITFORONE. Here WAITFORONE is specific to recv path and
    SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
    to verify is that the user space API is masked correctly so the flag
    cannot be set by the user. (Note this needs to be true regardless,
    because we already have internal flags in use that user space
    should not be able to set.) For completeness, there are two UAPI
    paths into sendpage: sendfile and splice.
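
    The flag-space reuse described above can be sketched as follows.
    The DEMO_ constants are local redefinitions assumed to mirror the
    values in include/linux/socket.h, and the two helpers and the
    demo_path enum are hypothetical, used only to show that one bit can
    carry two meanings when each is confined to its own path.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies; values assumed to mirror include/linux/socket.h. */
#define DEMO_MSG_WAITFORONE        0x10000u /* recv path only */
#define DEMO_MSG_SENDPAGE_NOPOLICY 0x10000u /* sendpage path only, internal */

/* The two flags deliberately share one bit. */
static_assert(DEMO_MSG_WAITFORONE == DEMO_MSG_SENDPAGE_NOPOLICY,
	      "flags are expected to overlap");

/* Hypothetical tag for which path is interpreting the flags word. */
enum demo_path { DEMO_PATH_RECV, DEMO_PATH_SENDPAGE };

/* On the sendpage path the shared bit means "skip the BPF policy". */
static bool demo_skip_policy(enum demo_path path, unsigned int flags)
{
	return path == DEMO_PATH_SENDPAGE &&
	       (flags & DEMO_MSG_SENDPAGE_NOPOLICY) != 0;
}

/* On the recv path the same bit means "wait for at least one message". */
static bool demo_wait_for_one(enum demo_path path, unsigned int flags)
{
	return path == DEMO_PATH_RECV &&
	       (flags & DEMO_MSG_WAITFORONE) != 0;
}
```

    Because each interpretation is tied to its path, the shared bit is
    never ambiguous as long as user space cannot set it on sendpage.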
    
    In the sendfile case the function do_sendfile() zeroes flags:
    
    ./fs/read_write.c:
     static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
    		   	    size_t count, loff_t max)
     {
       ...
       fl = 0;
    #if 0
       /*
        * We need to debate whether we can enable this or not. The
        * man page documents EAGAIN return for the output at least,
        * and the application is arguably buggy if it doesn't expect
        * EAGAIN on a non-blocking file descriptor.
        */
        if (in.file->f_flags & O_NONBLOCK)
    	fl = SPLICE_F_NONBLOCK;
    #endif
        file_start_write(out.file);
        retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
     }
    
    In the splice case the pipe_to_sendpage "actor" is used which
    masks flags with SPLICE_F_MORE.
    
    ./fs/splice.c:
     static int pipe_to_sendpage(struct pipe_inode_info *pipe,
    			    struct pipe_buffer *buf, struct splice_desc *sd)
     {
       ...
       more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
       ...
     }
    
    This confirms what we expect: internal flags are in fact internal
    to the socket side.
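
    The masking argument for the splice path can be checked with a
    small user-space sketch. The DEMO_ constants are local
    redefinitions assumed to match the kernel values, and
    demo_splice_to_msg_flags() is a hypothetical helper that copies
    only the flag translation quoted above from pipe_to_sendpage(), not
    the rest of that function.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies; values assumed to match the kernel definitions. */
#define DEMO_SPLICE_F_MORE         0x04u
#define DEMO_MSG_MORE              0x8000u
#define DEMO_MSG_SENDPAGE_NOPOLICY 0x10000u

/* The only bit the translation can emit must not be the internal one. */
static_assert((DEMO_MSG_MORE & DEMO_MSG_SENDPAGE_NOPOLICY) == 0,
	      "MSG_MORE must not alias the internal flag");

/* Same translation as the quoted pipe_to_sendpage() line. */
static unsigned int demo_splice_to_msg_flags(unsigned int splice_flags)
{
	return (splice_flags & DEMO_SPLICE_F_MORE) ? DEMO_MSG_MORE : 0;
}

/* Try a range of user-supplied splice flag words, including the 0x10000 bit. */
static bool demo_user_can_set_nopolicy(void)
{
	for (unsigned int f = 0; f <= 0xFFFFFu; f++) {
		if (demo_splice_to_msg_flags(f) & DEMO_MSG_SENDPAGE_NOPOLICY)
			return true;
	}
	return false;
}
```

    Since the translation yields either MSG_MORE or zero, no
    user-supplied splice flag word can produce the internal bit in this
    model.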
    
    Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>