1. 06 Feb, 2015 2 commits
    • Jon Paul Maloy's avatar
      tipc: resolve race problem at unicast message reception · c637c103
      Jon Paul Maloy authored
      
      
      TIPC handles message cardinality and sequencing at the link layer,
      before passing messages upwards to the destination sockets. During the
      upcall from link to socket no locks are held. It is therefore possible,
      and we see it happen occasionally, that messages arriving in different
      threads and delivered in sequence still bypass each other before they
      reach the destination socket. This must not happen, since it violates
      the sequentiality guarantee.
      
      We solve this by adding a new input buffer queue to the link structure.
      Arriving messages are added safely to the tail of that queue by the
      link, while the head of the queue is consumed, also safely, by the
      receiving socket. Sequentiality is secured per socket by only allowing
      buffers to be dequeued inside the socket lock. Since there may be multiple
      simultaneous readers of the queue, we use a 'filter' parameter to reduce
      the risk that they peek the same buffer from the queue, hence also
      reducing the risk of contention on the receiving socket locks.
      
      This solves the sequentiality problem, and seems to cause no measurable
      performance degradation.
      
      A nice side effect of this change is that lock handling in the functions
      tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
      will enable future simplifications of those functions.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c637c103
    • Jon Paul Maloy's avatar
      tipc: reduce usage of context info in socket and link · c5898636
      Jon Paul Maloy authored
      
      
      The most common usage of namespace information is when we fetch the
      own node addess from the net structure. This leads to a lot of
      passing around of a parameter of type 'struct net *' between
      functions just to make them able to obtain this address.
      
      However, in many cases this is unnecessary. The own node address
      is readily available as a member of both struct tipc_sock and
      tipc_link, and can be fetched from there instead.
      The fact that the vast majority of functions in socket.c and link.c
      anyway are maintaining a pointer to their respective base structures
      makes this option even more compelling.
      
      In this commit, we introduce the inline functions tsk_own_node()
      and link_own_node() to make it easy for functions to fetch the node
      address from those structs instead of having to pass along and
      dereference the namespace struct.
      
      In particular, we make calls to the msg_xx() functions in msg.{h,c}
      context independent by directly passing them the own node address
      as parameter when needed. Those functions should be regarded as
      leaves in the code dependency tree, and it is hence desirable to
      keep them namspace unaware.
      
      Apart from a potential positive effect on cache behavior, these
      changes make it easier to introduce the changes that will follow
      later in this series.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5898636
  2. 12 Jan, 2015 3 commits
  3. 09 Dec, 2014 3 commits
  4. 26 Nov, 2014 2 commits
  5. 10 Sep, 2014 1 commit
  6. 02 Sep, 2014 2 commits
    • Erik Hugne's avatar
      tipc: add name distributor resiliency queue · a5325ae5
      Erik Hugne authored
      
      
      TIPC name table updates are distributed asynchronously in a cluster,
      entailing a risk of certain race conditions. E.g., if two nodes
      simultaneously issue conflicting (overlapping) publications, this may
      not be detected until both publications have reached a third node, in
      which case one of the publications will be silently dropped on that
      node. Hence, we end up with an inconsistent name table.
      
      In most cases this conflict is just a temporary race, e.g., one
      node is issuing a publication under the assumption that a previous,
      conflicting, publication has already been withdrawn by the other node.
      However, because of the (rtt related) distributed update delay, this
      may not yet hold true on all nodes. The symptom of this failure is a
      syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".
      
      In this commit we add a resiliency queue at the receiving end of
      the name table distributor. When insertion of an arriving publication
      fails, we retain it in this queue for a short amount of time, assuming
      that another update will arrive very soon and clear the conflict. If so
      happens, we insert the publication, otherwise we drop it.
      
      The (configurable) retention value defaults to 2000 ms. Knowing from
      experience that the situation described above is extremely rare, there
      is no risk that the queue will accumulate any large number of items.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5325ae5
    • Erik Hugne's avatar
      tipc: refactor name table updates out of named packet receive routine · f4ad8a4b
      Erik Hugne authored
      
      
      We need to perform the same actions when processing deferred name
      table updates, so this functionality is moved to a separate
      function.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4ad8a4b
  7. 17 Jul, 2014 2 commits
  8. 05 May, 2014 2 commits
  9. 28 Apr, 2014 1 commit
    • Ying Xue's avatar
      tipc: move the delivery of named messages out of nametbl lock · eab8c045
      Ying Xue authored
      Commit a89778d8
      
       ("tipc: add support
      for link state subscriptions") introduced below possible deadlock
      scenario:
      
             CPU0                          CPU1
      T0:   tipc_publish()                 link_timeout()
      T1:   tipc_nametbl_publish()         [grab node lock]*
      T2:   [grab nametbl write lock]*     link_state_event()
      T3:   named_cluster_distribute()     link_activate()
      T4:   [grab node lock]*              tipc_node_link_up()
      T5:                                  tipc_nametbl_publish()
      T6:                                  [grab nametble write lock]*
      
      The opposite order of holding nametbl write lock and node lock on
      above two different paths may result in a deadlock. If we move the
      the delivery of named messages via link out of name nametbl lock,
      the reverse order of holding locks will be eliminated, as a result,
      the deadlock will be killed as well.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eab8c045
  10. 23 Apr, 2014 1 commit
    • Ying Xue's avatar
      tipc: purge tipc_net_lock lock · 7216cd94
      Ying Xue authored
      
      
      Now tipc routing hierarchy comprises the structures 'node', 'link'and
      'bearer'. The whole hierarchy is protected by a big read/write lock,
      tipc_net_lock, to ensure that nothing is added or removed while code
      is accessing any of these structures. Obviously the locking policy
      makes node, link and bearer components closely bound together so that
      their relationship becomes unnecessarily complex. In the worst case,
      such locking policy not only has a negative influence on performance,
      but also it's prone to lead to deadlock occasionally.
      
      In order o decouple the complex relationship between bearer and node
      as well as link, the locking policy is adjusted as follows:
      
      - Bearer level
        RTNL lock is used on update side, and RCU is used on read side.
        Meanwhile, all bearer instances including broadcast bearer are
        saved into bearer_list array.
      
      - Node and link level
        All node instances are saved into two tipc_node_list and node_htable
        lists. The two lists are protected by node_list_lock on write side,
        and they are guarded with RCU lock on read side. All members in node
        structure including link instances are protected by node spin lock.
      
      - The relationship between bearer and node
        When link accesses bearer, it first needs to find the bearer with
        its bearer identity from the bearer_list array. When bearer accesses
        node, it can iterate the node_htable hash list with the node
        address to find the corresponding node.
      
      In the new locking policy, every component has its private locking
      solution and the relationship between bearer and node is very simple,
      that is, they can find each other with node address or bearer identity
      from node_htable hash list or bearer_list array.
      
      Until now above all changes have been done, so tipc_net_lock can be
      removed safely.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Tested-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7216cd94
  11. 27 Mar, 2014 2 commits
  12. 18 Feb, 2014 1 commit
    • Ying Xue's avatar
      tipc: align tipc function names with common naming practice in the network · 247f0f3c
      Ying Xue authored
      
      
      Rename the following functions, which are shorter and more in line
      with common naming practice in the network subsystem.
      
      tipc_bclink_send_msg->tipc_bclink_xmit
      tipc_bclink_recv_pkt->tipc_bclink_rcv
      tipc_disc_recv_msg->tipc_disc_rcv
      tipc_link_send_proto_msg->tipc_link_proto_xmit
      link_recv_proto_msg->tipc_link_proto_rcv
      link_send_sections_long->tipc_link_iovec_long_xmit
      tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
      tipc_link_send_sync->tipc_link_sync_xmit
      tipc_link_recv_sync->tipc_link_sync_rcv
      tipc_link_send_buf->__tipc_link_xmit
      tipc_link_send->tipc_link_xmit
      tipc_link_send_names->tipc_link_names_xmit
      tipc_named_recv->tipc_named_rcv
      tipc_link_recv_bundle->tipc_link_bundle_rcv
      tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
      link_send_long_buf->tipc_link_frag_xmit
      
      tipc_multicast->tipc_port_mcast_xmit
      tipc_port_recv_mcast->tipc_port_mcast_rcv
      tipc_port_reject_sections->tipc_port_iovec_reject
      tipc_port_recv_proto_msg->tipc_port_proto_rcv
      tipc_connect->tipc_port_connect
      __tipc_connect->__tipc_port_connect
      __tipc_disconnect->__tipc_port_disconnect
      tipc_disconnect->tipc_port_disconnect
      tipc_shutdown->tipc_port_shutdown
      tipc_port_recv_msg->tipc_port_rcv
      tipc_port_recv_sections->tipc_port_iovec_rcv
      
      release->tipc_release
      accept->tipc_accept
      bind->tipc_bind
      get_name->tipc_getname
      poll->tipc_poll
      send_msg->tipc_sendmsg
      send_packet->tipc_send_packet
      send_stream->tipc_send_stream
      recv_msg->tipc_recvmsg
      recv_stream->tipc_recv_stream
      connect->tipc_connect
      listen->tipc_listen
      shutdown->tipc_shutdown
      setsockopt->tipc_setsockopt
      getsockopt->tipc_getsockopt
      
      Above changes have no impact on current users of the functions.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      247f0f3c
  13. 22 Nov, 2012 1 commit
  14. 13 Jul, 2012 1 commit
  15. 30 Apr, 2012 1 commit
    • Paul Gortmaker's avatar
      tipc: compress out gratuitous extra carriage returns · 617d3c7a
      Paul Gortmaker authored
      
      
      Some of the comment blocks are floating in limbo between two
      functions, or between blocks of code.  Delete the extra line
      feeds between any comment and its associated following block
      of code, to be consistent with the majority of the rest of
      the kernel.  Also delete trailing newlines at EOF and fix
      a couple trivial typos in existing comments.
      
      This is a 100% cosmetic change with no runtime impact.  We get
      rid of over 500 lines of non-code, and being blank line deletes,
      they won't even show up as noise in git blame.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      617d3c7a
  16. 19 Apr, 2012 2 commits
  17. 18 Apr, 2012 2 commits
  18. 24 Feb, 2012 1 commit
  19. 06 Feb, 2012 1 commit
  20. 30 Dec, 2011 1 commit
  21. 27 Dec, 2011 1 commit
  22. 18 Sep, 2011 3 commits
  23. 24 Jun, 2011 1 commit
  24. 31 Mar, 2011 1 commit
  25. 13 Mar, 2011 2 commits
    • Paul Gortmaker's avatar
      tipc: cosmetic - function names are not to be full sentences · 8f19afb2
      Paul Gortmaker authored
      
      
      Function names like "tipc_node_has_redundant_links" are unweildy
      and result in long lines even for simple lines.  The "has" doesn't
      contribute any value add, so dropping that is a slight step in the
      right direction.   This is a cosmetic change, basic result of:
      
      for i in `grep -l tipc_node_has_ *` ; do sed -i s/tipc_node_has_/tipc_node_/ $i ; done
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      8f19afb2
    • Allan Stephens's avatar
      tipc: Convert node object array to a hash table · 672d99e1
      Allan Stephens authored
      
      
      Replaces the dynamically allocated array of pointers to the cluster's
      node objects with a static hash table. Hash collisions are resolved
      using chaining, with a typical hash chain having only a single node,
      to avoid degrading performance during processing of incoming packets.
      The conversion to a hash table reduces the memory requirements for
      TIPC's node table to approximately the same size it had prior to
      the previous commit.
      
      In addition to the hash table itself, TIPC now also maintains a
      linked list for the node objects, sorted by ascending network address.
      This list allows TIPC to continue sending responses to user space
      applications that request node and link information in sorted order.
      The list also improves performance when name table update messages are
      sent by making it easier to identify the nodes that must be notified.
      Signed-off-by: default avatarAllan Stephens <allan.stephens@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      672d99e1