1. 16 Nov, 2019 1 commit
  2. 24 Aug, 2019 1 commit
    • Zhu Yanjun's avatar
      net: rds: add service level support in rds-info · e0e6d062
      Zhu Yanjun authored
      >From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
      is used to identify different flows within an IBA subnet.
      It is carried in the local route header of the packet.
      
      Before this commit, run "rds-info -I". The outputs are as
      below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      After this commit, the output is as below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      
      The commit fe3475af ("net: rds: add per rds connection cache
      statistics") adds cache_allocs in struct rds_info_rdma_connection
      as below:
      struct rds_info_rdma_connection {
      ...
              __u32           rdma_mr_max;
              __u32           rdma_mr_size;
              __u8            tos;
              __u32           cache_allocs;
       };
      The peer struct in rds-tools of struct rds_info_rdma_connection is as
      below:
      struct rds_info_rdma_connection {
      ...
              uint32_t        rdma_mr_max;
              uint32_t        rdma_mr_size;
              uint8_t         tos;
              uint8_t         sl;
              uint32_t        cache_allocs;
      };
      The difference between userspace and kernel is the member variable sl.
      In the kernel struct, the member variable sl is missing. This will
      introduce risks. So it is necessary to use this commit to avoid this risk.
      
      Fixes: fe3475af
      
       ("net: rds: add per rds connection cache statistics")
      CC: Joe Jin <joe.jin@oracle.com>
      CC: JUNXIAO_BI <junxiao.bi@oracle.com>
      Suggested-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0e6d062
  3. 17 Jul, 2019 2 commits
  4. 10 Jul, 2019 1 commit
    • Gerd Rausch's avatar
      Revert "RDS: IB: split the mr registration and invalidation path" · a5520788
      Gerd Rausch authored
      This reverts commit 56012459.
      
      RDS kept spinning inside function "rds_ib_post_reg_frmr", waiting for
      "i_fastreg_wrs" to become incremented:
               while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
                       atomic_inc(&ibmr->ic->i_fastreg_wrs);
                       cpu_relax();
               }
      
      Looking at the original commit:
      
      commit 56012459
      
       ("RDS: IB: split the mr registration and
      invalidation path")
      
      In there, the "rds_ib_mr_cqe_handler" was changed in the following
      way:
      
       void rds_ib_mr_cqe_handler(struct
       rds_ib_connection *ic,
       struct ib_wc *wc)
              if (frmr->fr_inv) {
                        frmr->fr_state = FRMR_IS_FREE;
                        frmr->fr_inv = false;
                      atomic_inc(&ic->i_fastreg_wrs);
              } else {
                      atomic_inc(&ic->i_fastunreg_wrs);
              }
      
      It looks like it's got it exactly backwards:
      
      Function "rds_ib_post_reg_frmr" keeps track of the outstanding
      requests via "i_fastreg_wrs".
      
      Function "rds_ib_post_inv" keeps track of the outstanding requests
      via "i_fastunreg_wrs" (post original commit). It also sets:
               frmr->fr_inv = true;
      
      However the completion handler "rds_ib_mr_cqe_handler" adjusts
      "i_fastreg_wrs" when "fr_inv" had been true, and adjusts
      "i_fastunreg_wrs" otherwise.
      
      The original commit was done in the name of performance:
      to remove the performance bottleneck
      
      No performance benefit could be observed with a fixed-up version
      of the original commit measured between two Oracle X7 servers,
      both equipped with Mellanox Connect-X5 HCAs.
      
      The prudent course of action is to revert this commit.
      
      Signed-off-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      a5520788
  5. 21 May, 2019 1 commit
  6. 04 Feb, 2019 4 commits
  7. 01 Aug, 2018 2 commits
  8. 24 Jul, 2018 2 commits
    • Ka-Cheong Poon's avatar
      rds: Enable RDS IPv6 support · 1e2b44e7
      Ka-Cheong Poon authored
      
      
      This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
      listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
      connection requests.  RDS/RDMA/IB uses a private data (struct
      rds_ib_connect_private) exchange between endpoints at RDS connection
      establishment time to support RDMA. This private data exchange uses a
      32 bit integer to represent an IP address. This needs to be changed in
      order to support IPv6. A new private data struct
      rds6_ib_connect_private is introduced to handle this. To ensure
      backward compatibility, an IPv6 capable RDS stack uses another RDMA
      listener port (RDS_CM_PORT) to accept IPv6 connection. And it
      continues to use the original RDS_PORT for IPv4 RDS connections. When
      it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
      send the connection set up request.
      
      v5: Fixed syntax problem (David Miller).
      
      v4: Changed port history comments in rds.h (Sowmini Varadhan).
      
      v3: Added support to set up IPv4 connection using mapped address
          (David Miller).
          Added support to set up connection between link local and non-link
          addresses.
          Various review comments from Santosh Shilimkar and Sowmini Varadhan.
      
      v2: Fixed bound and peer address scope mismatched issue.
          Added back rds_connect() IPv6 changes.
      
      Signed-off-by: default avatarKa-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e2b44e7
    • Ka-Cheong Poon's avatar
      rds: Changing IP address internal representation to struct in6_addr · eee2fa6a
      Ka-Cheong Poon authored
      
      
      This patch changes the internal representation of an IP address to use
      struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
      All the functions which take an IP address as argument are also
      changed to use struct in6_addr.  But RDS socket layer is not modified
      such that it still does not accept IPv6 address from an application.
      And RDS layer does not accept nor initiate IPv6 connections.
      
      v2: Fixed sparse warnings.
      
      Signed-off-by: default avatarKa-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eee2fa6a
  9. 12 Jun, 2018 1 commit
    • Kees Cook's avatar
      treewide: Use array_size() in vzalloc_node() · fd7beced
      Kees Cook authored
      
      
      The vzalloc_node() function has no 2-factor argument form, so
      multiplication factors need to be wrapped in array_size(). This patch
      replaces cases of:
      
              vzalloc_node(a * b, node)
      
      with:
              vzalloc_node(array_size(a, b), node)
      
      as well as handling cases of:
      
              vzalloc_node(a * b * c, node)
      
      with:
      
              vzalloc_node(array3_size(a, b, c), node)
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc_node(4 * 1024, node)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc_node(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc_node(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc_node(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc_node(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc_node(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc_node(C1 * C2 * C3, ...)
      |
        vzalloc_node(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc_node(C1 * C2, ...)
      |
        vzalloc_node(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      fd7beced
  10. 25 Apr, 2018 1 commit
  11. 08 Feb, 2018 1 commit
    • Sowmini Varadhan's avatar
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and... · ebeeb1ad
      Sowmini Varadhan authored
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss out the rds_conn_destroy()
      loop (that cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger the use-after-free.
      
      A similar race-window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1
      
       ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
      
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ebeeb1ad
  12. 14 Mar, 2017 1 commit
  13. 09 Mar, 2017 1 commit
  14. 02 Jan, 2017 5 commits
  15. 01 Jul, 2016 2 commits
  16. 19 Jun, 2016 1 commit
  17. 15 Jun, 2016 2 commits
  18. 16 Apr, 2016 1 commit
  19. 02 Mar, 2016 3 commits
  20. 28 Oct, 2015 1 commit
  21. 05 Oct, 2015 3 commits
  22. 30 Aug, 2015 1 commit
  23. 25 Aug, 2015 2 commits