1. 21 Jul, 2015 1 commit
  2. 08 Jul, 2015 1 commit
  3. 28 Jun, 2015 1 commit
    • Jon Paul Maloy's avatar
      tipc: purge backlog queue counters when broadcast link is reset · 7d967b67
      Jon Paul Maloy authored
      In commit 1f66d161
      
      
      ("tipc: introduce starvation free send algorithm")
      we introduced a counter per priority level for buffers
      in the link backlog queue. We also introduced a new
      function tipc_link_purge_backlog(), to reset these
      counters to zero when the link is reset.
      
      Unfortunately, we missed to call this function when
      the broadcast link is reset, with the result that the
      values of these counters might be permanently skewed
      when new nodes are attached. This may in the worst case
      lead to permananent, but spurious, broadcast link
      congestion, where no broadcast packets can be sent at
      all.
      
      We fix this bug with this commit.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d967b67
  4. 11 Jun, 2015 1 commit
  5. 31 May, 2015 1 commit
  6. 26 May, 2015 1 commit
    • Jon Paul Maloy's avatar
      tipc: fix bug in link protocol message create function · f3903bcc
      Jon Paul Maloy authored
      In commit dd3f9e70
      
      
      ("tipc: add packet sequence number at instant of transmission") we
      made a change with the consequence that packets in the link backlog
      queue don't contain valid sequence numbers.
      
      However, when we create a link protocol message, we still use the
      sequence number of the first packet in the backlog, if there is any,
      as "next_sent" indicator in the message. This may entail unnecessary
      retransissions or stale packet transmission when there is very low
      traffic on the link.
      
      This commit fixes this issue by only using the current value of
      tipc_link::snd_nxt as indicator.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3903bcc
  7. 14 May, 2015 9 commits
    • Ying Xue's avatar
      tipc: use sock_create_kern interface to create kernel socket · fa787ae0
      Ying Xue authored
      After commit eeb1bd5c
      
       ("net: Add a struct net parameter to
      sock_create_kern"), we should use sock_create_kern() to create kernel
      socket as the interface doesn't reference count struct net any more.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fa787ae0
    • Jon Paul Maloy's avatar
      tipc: add packet sequence number at instant of transmission · dd3f9e70
      Jon Paul Maloy authored
      
      
      Currently, the packet sequence number is updated and added to each
      packet at the moment a packet is added to the link backlog queue.
      This is wasteful, since it forces the code to traverse the send
      packet list packet by packet when adding them to the backlog queue.
      It would be better to just splice the whole packet list into the
      backlog queue when that is the right action to do.
      
      In this commit, we do this change. Also, since the sequence numbers
      cannot now be assigned to the packets at the moment they are added
      the backlog queue, we do instead calculate and add them at the moment
      of transmission, when the backlog queue has to be traversed anyway.
      We do this in the function tipc_link_push_packet().
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd3f9e70
    • Jon Paul Maloy's avatar
      tipc: improve link congestion algorithm · f21e897e
      Jon Paul Maloy authored
      
      
      The link congestion algorithm used until now implies two problems.
      
      - It is too generous towards lower-level messages in situations of high
        load by giving "absolute" bandwidth guarantees to the different
        priority levels. LOW traffic is guaranteed 10%, MEDIUM is guaranted
        20%, HIGH is guaranteed 30%, and CRITICAL is guaranteed 40% of the
        available bandwidth. But, in the absence of higher level traffic, the
        ratio between two distinct levels becomes unreasonable. E.g. if there
        is only LOW and MEDIUM traffic on a system, the former is guaranteed
        1/3 of the bandwidth, and the latter 2/3. This again means that if
        there is e.g. one LOW user and 10 MEDIUM users, the  former will have
        33.3% of the bandwidth, and the others will have to compete for the
        remainder, i.e. each will end up with 6.7% of the capacity.
      
      - Packets of type MSG_BUNDLER are created at SYSTEM importance level,
        but only after the packets bundled into it have passed the congestion
        test for their own respective levels. Since bundled packets don't
        result in incrementing the level counter for their own importance,
        only occasionally for the SYSTEM level counter, they do in practice
        obtain SYSTEM level importance. Hence, the current implementation
        provides a gap in the congestion algorithm that in the worst case
        may lead to a link reset.
      
      We now refine the congestion algorithm as follows:
      
      - A message is accepted to the link backlog only if its own level
        counter, and all superior level counters, permit it.
      
      - The importance of a created bundle packet is set according to its
        contents. A bundle packet created from messges at levels LOW to
        CRITICAL is given importance level CRITICAL, while a bundle created
        from a SYSTEM level message is given importance SYSTEM. In the latter
        case only subsequent SYSTEM level messages are allowed to be bundled
        into it.
      
      This solves the first problem described above, by making the bandwidth
      guarantee relative to the total number of users at all levels; only
      the upper limit for each level remains absolute. In the example
      described above, the single LOW user would use 1/11th of the bandwidth,
      the same as each of the ten MEDIUM users, but he still has the same
      guarantee against starvation as the latter ones.
      
      The fix also solves the second problem. If the CRITICAL level is filled
      up by bundle packets of that level, no lower level packets will be
      accepted any more.
      Suggested-by: default avatarGergely Kiss <gergely.kiss@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f21e897e
    • Jon Paul Maloy's avatar
      tipc: simplify link supervision checkpointing · cd4eee3c
      Jon Paul Maloy authored
      
      
      We change the sequence number checkpointing that is performed
      by the timer in order to discover if the peer is active. Currently,
      we store a checkpoint of the next expected sequence number "rcv_nxt"
      at each timer expiration, and compare it to the current expected
      number at next timeout expiration. Instead, we now use the already
      existing field "silent_intv_cnt" for this task. We step the counter
      at each timeout expiration, and zero it at each valid received packet.
      If no valid packet has been received from the peer after "abort_limit"
      number of silent timer intervals, the link is declared faulty and reset.
      
      We also remove the multiple instances of timer activation from inside
      the FSM function "link_state_event()", and now do it at only one place;
      at the end of the timer function itself.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cd4eee3c
    • Jon Paul Maloy's avatar
      tipc: rename fields in struct tipc_link · a97b9d3f
      Jon Paul Maloy authored
      
      
      We rename some fields in struct tipc_link, in order to give them more
      descriptive names:
      
      next_in_no -> rcv_nxt
      next_out_no-> snd_nxt
      fsm_msg_cnt-> silent_intv_cnt
      cont_intv  -> keepalive_intv
      last_retransmitted -> last_retransm
      
      There are no functional changes in this commit.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a97b9d3f
    • Jon Paul Maloy's avatar
      tipc: simplify packet sequence number handling · e4bf4f76
      Jon Paul Maloy authored
      
      
      Although the sequence number in the TIPC protocol is 16 bits, we have
      until now stored it internally as an unsigned 32 bits integer.
      We got around this by always doing explicit modulo-65535 operations
      whenever we need to access a sequence number.
      
      We now make the incoming and outgoing sequence numbers to unsigned
      16-bit integers, and remove the modulo operations where applicable.
      
      We also move the arithmetic inline functions for 16 bit integers
      to core.h, and the function buf_seqno() to msg.h, so they can easily
      be accessed from anywhere in the code.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4bf4f76
    • Jon Paul Maloy's avatar
      tipc: simplify include dependencies · a6bf70f7
      Jon Paul Maloy authored
      
      
      When we try to add new inline functions in the code, we sometimes
      run into circular include dependencies.
      
      The main problem is that the file core.h, which really should be at
      the root of the dependency chain, instead is a leaf. I.e., core.h
      includes a number of header files that themselves should be allowed
      to include core.h. In reality this is unnecessary, because core.h does
      not need to know the full signature of any of the structs it refers to,
      only their type declaration.
      
      In this commit, we remove all dependencies from core.h towards any
      other tipc header file.
      
      As a consequence of this change, we can now move the function
      tipc_own_addr(net) from addr.c to addr.h, and make it inline.
      
      There are no functional changes in this commit.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6bf70f7
    • Jon Paul Maloy's avatar
      tipc: simplify link timer handling · 75b44b01
      Jon Paul Maloy authored
      
      
      Prior to this commit, the link timer has been running at a "continuity
      interval" of configured link tolerance/4. When a timer wakes up and
      discovers that there has been no sign of life from the peer during the
      previous interval, it divides its own timer interval by another factor
      four, and starts sending one probe per new interval. When the configured
      link tolerance time has passed without answer, i.e. after 16 unacked
      probes, the link is declared faulty and reset.
      
      This is unnecessary complex. It is sufficient to continue with the
      original continuity interval, and instead reset the link after four
      missed probe responses. This makes the timer handling in the link
      simpler, and opens up for some planned later changes in this area.
      This commit implements this change.
      Reviewed-by: default avatarRichard Alpe <richard.alpe@ericsson.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      75b44b01
    • Jon Paul Maloy's avatar
      tipc: simplify resetting and disabling of bearers · b1c29f6b
      Jon Paul Maloy authored
      
      
      Since commit 4b475e3f2f8e4e241de101c8240f1d74d0470494
      ("tipc: eliminate delayed link deletion at link failover") the extra
      boolean parameter "shutting_down" is not any longer needed for the
      functions bearer_disable() and tipc_link_delete_list().
      
      Furhermore, the function tipc_link_reset_links(), called from
      bearer_reset()  is now unnecessary. We can just as well delete
      all the links, as we do in bearer_disable(), and start over with
      creating new links.
      
      This commit introduces those changes.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1c29f6b
  8. 11 May, 2015 1 commit
  9. 09 May, 2015 3 commits
  10. 04 May, 2015 5 commits
    • Ying Xue's avatar
      tipc: deal with return value of tipc_conn_new callback · 90bdfcb7
      Ying Xue authored
      
      
      Once tipc_conn_new() returns NULL, the connection should be shut
      down immediately, otherwise, oops may happen due to the NULL pointer.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90bdfcb7
    • Ying Xue's avatar
      tipc: adjust locking policy of subscription · a13683f2
      Ying Xue authored
      
      
      Currently subscriber's lock protects not only subscriber's subscription
      list but also all subscriptions linked into the list. However, as all
      members of subscription are never changed after they are initialized,
      it's unnecessary for subscription to be protected under subscriber's
      lock. If the lock is used to only protect subscriber's subscription
      list, the adjustment not only makes the locking policy simpler, but
      also helps to avoid a deadlock which may happen once creating a
      subscription is failed.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a13683f2
    • Ying Xue's avatar
      tipc: involve reference counter for subscriber · 00bc00a9
      Ying Xue authored
      
      
      At present subscriber's lock is used to protect the subscription list
      of subscriber as well as subscriptions linked into the list. While one
      or all subscriptions are deleted through iterating the list, the
      subscriber's lock must be held. Meanwhile, as deletion of subscription
      may happen in subscription timer's handler, the lock must be grabbed
      in the function as well. When subscription's timer is terminated with
      del_timer_sync() during above iteration, subscriber's lock has to be
      temporarily released, otherwise, deadlock may occur. However, the
      temporary release may cause the double free of a subscription as the
      subscription is not disconnected from the subscription list.
      
      Now if a reference counter is introduced to subscriber, subscription's
      timer can be asynchronously stopped with del_timer(). As a result, the
      issue is not only able to be fixed, but also relevant code is pretty
      readable and understandable.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00bc00a9
    • Ying Xue's avatar
      tipc: introduce tipc_subscrb_create routine · 1b764828
      Ying Xue authored
      
      
      Introducing a new function makes the purpose of tipc_subscrb_connect_cb
      callback routine more clear.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b764828
    • Ying Xue's avatar
      tipc: rename functions defined in subscr.c · 57f1d186
      Ying Xue authored
      
      
      When a topology server accepts a connection request from its client,
      it allocates a connection instance and a tipc_subscriber structure
      object. The former is used to communicate with client, and the latter
      is often treated as a subscriber which manages all subscription events
      requested from a same client. When a topology server receives a request
      of subscribing name services from a client through the connection, it
      creates a tipc_subscription structure instance which is seen as a
      subscription recording what name services are subscribed. In order to
      manage all subscriptions from a same client, topology server links
      them into the subscrp_list of the subscriber. So subscriber and
      subscription completely represents different meanings respectively,
      but function names associated with them make us so confused that we
      are unable to easily tell which function is against subscriber and
      which is to subscription. So we want to eliminate the confusion by
      renaming them.
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      57f1d186
  11. 29 Apr, 2015 2 commits
    • Jon Paul Maloy's avatar
      tipc: fix problem with parallel link synchronization mechanism · 0d699f28
      Jon Paul Maloy authored
      
      
      Currently, we try to accumulate arrived packets in the links's
      'deferred' queue during the parallel link syncronization phase.
      
      This entails two problems:
      
      - With an unlucky combination of arriving packets the algorithm
        may go into a lockstep with the out-of-sequence handling function,
        where the synch mechanism is adding a packet to the deferred queue,
        while the out-of-sequence handling is retrieving it again, thus
        ending up in a loop inside the node_lock scope.
      
      - Even if this is avoided, the link will very often send out
        unnecessary protocol messages, in the worst case leading to
        redundant retransmissions.
      
      We fix this by just dropping arriving packets on the upcoming link
      during the synchronization phase, thus relying on the retransmission
      protocol to resolve the situation once the two links have arrived to
      a synchronized state.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d699f28
    • Nicolas Dichtel's avatar
      tipc: remove wrong use of NLM_F_MULTI · f2f67390
      Nicolas Dichtel authored
      NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
      it is sent only at the end of a dump.
      
      Libraries like libnl will wait forever for NLMSG_DONE.
      
      Fixes: 35b9dd76 ("tipc: add bearer get/dump to new netlink api")
      Fixes: 7be57fc6 ("tipc: add link get/dump to new netlink api")
      Fixes: 46f15c67
      
       ("tipc: add media get/dump to new netlink api")
      CC: Richard Alpe <richard.alpe@ericsson.com>
      CC: Jon Maloy <jon.maloy@ericsson.com>
      CC: Ying Xue <ying.xue@windriver.com>
      CC: tipc-discussion@lists.sourceforge.net
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2f67390
  12. 23 Apr, 2015 3 commits
    • Erik Hugne's avatar
      tipc: fix node refcount issue · 73a31737
      Erik Hugne authored
      When link statistics is dumped over netlink, we iterate over
      the list of peer nodes and append each links statistics to
      the netlink msg. In the case where the dump is resumed after
      filling up a nlmsg, the node refcnt is decremented without
      having been incremented previously which may cause the node
      reference to be freed. When this happens, the following
      info/stacktrace will be generated, followed by a crash or
      undefined behavior.
      We fix this by removing the erroneous call to tipc_node_put
      inside the loop that iterates over nodes.
      
      [  384.312303] INFO: trying to register non-static key.
      [  384.313110] the code is fine but needs lockdep annotation.
      [  384.313290] turning off the locking correctness validator.
      [  384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
      [  384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  384.313290]  ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
      [  384.313290]  ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
      [  384.313290]  ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
      [  384.313290] Call Trace:
      [  384.313290]  <IRQ>  [<ffffffff8170adf1>] dump_stack+0x4c/0x65
      [  384.313290]  [<ffffffff810a6a6d>] __lock_acquire+0xf3d/0xf50
      [  384.313290]  [<ffffffff810a7375>] lock_acquire+0xd5/0x290
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff81712890>] _raw_spin_lock_bh+0x40/0x80
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e8c>] link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffff810c4698>] call_timer_fn+0xb8/0x490
      [  384.313290]  [<ffffffff810c45e0>] ? process_timeout+0x10/0x10
      [  384.313290]  [<ffffffff810c5a2c>] run_timer_softirq+0x21c/0x420
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff8105a954>] __do_softirq+0xf4/0x630
      [  384.313290]  [<ffffffff8105afdd>] irq_exit+0x5d/0x60
      [  384.313290]  [<ffffffff8103ade1>] smp_apic_timer_interrupt+0x41/0x50
      [  384.313290]  [<ffffffff817144a0>] apic_timer_interrupt+0x70/0x80
      [  384.313290]  <EOI>  [<ffffffff8100db10>] ? default_idle+0x20/0x210
      [  384.313290]  [<ffffffff8100db0e>] ? default_idle+0x1e/0x210
      [  384.313290]  [<ffffffff8100e61a>] arch_cpu_idle+0xa/0x10
      [  384.313290]  [<ffffffff81099803>] cpu_startup_entry+0x2c3/0x530
      [  384.313290]  [<ffffffff810d2893>] ? clockevents_register_device+0x113/0x200
      [  384.313290]  [<ffffffff81038b0f>] start_secondary+0x13f/0x170
      
      Fixes: 8a0f6ebe
      
       ("tipc: involve reference counter for node structure")
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73a31737
    • Erik Hugne's avatar
      tipc: fix random link reset problem · 9871b27f
      Erik Hugne authored
      
      
      In the function tipc_sk_rcv(), the stack variable 'err'
      is only initialized to TIPC_ERR_NO_PORT for the first
      iteration over the link input queue. If a chain of messages
      are received from a link, failure to lookup the socket for
      any but the first message will cause the message to bounce back
      out on a random link.
      We fix this by properly initializing err.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9871b27f
    • Ying Xue's avatar
      tipc: fix topology server broken issue · def81f69
      Ying Xue authored
      When a new topology server is launched in a new namespace, its
      listening socket is inserted into the "init ns" namespace's socket
      hash table rather than the one owned by the new namespace. Although
      the socket's namespace is forcedly changed to the new namespace later,
      the socket is still stored in the socket hash table of "init ns"
      namespace. When a client created in the new namespace connects
      its own topology server, the connection is failed as its server's
      socket could not be found from its own namespace's socket table.
      
      If __sock_create() instead of original sock_create_kern() is used
      to create the server's socket through specifying an expected namesapce,
      the socket will be inserted into the specified namespace's socket
      table, thereby avoiding to the topology server broken issue.
      
      Fixes: 76100a8a
      
       ("tipc: fix netns refcnt leak")
      Reported-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      def81f69
  13. 07 Apr, 2015 1 commit
  14. 02 Apr, 2015 3 commits
    • Jon Paul Maloy's avatar
      tipc: simplify link mtu negotiation · ed193ece
      Jon Paul Maloy authored
      
      
      When a link is being established, the two endpoints advertise their
      respective interface MTU in the transmitted RESET and ACTIVATE messages.
      If there is any difference, the lower of the two MTUs will be selected
      for use by both endpoints.
      
      However, as a remnant of earlier attempts to introduce TIPC level
      routing. there also exists an MTU discovery mechanism. If an intermediate
      node has a lower MTU than the two endpoints, they will discover this
      through a bisectional approach, and finally adopt this MTU for common use.
      
      Since there is no TIPC level routing, and probably never will be,
      this mechanism doesn't make any sense, and only serves to make the
      link level protocol unecessarily complex.
      
      In this commit, we eliminate the MTU discovery algorithm,and fall back
      to the simple MTU advertising approach. This change is fully backwards
      compatible.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ed193ece
    • Jon Paul Maloy's avatar
      tipc: eliminate delayed link deletion at link failover · dff29b1a
      Jon Paul Maloy authored
      
      
      When a bearer is disabled manually, all its links have to be reset
      and deleted. However, if there is a remaining, parallel link ready
      to take over a deleted link's traffic, we currently delay the delete
      of the removed link until the failover procedure is finished. This
      is because the remaining link needs to access state from the reset
      link, such as the last received packet number, and any partially
      reassembled buffer, in order to perform a successful failover.
      
      In this commit, we do instead move the state data over to the new
      link, so that it can fulfill the procedure autonomously, without
      accessing any data on the old link. This means that we can now
      proceed and delete all pertaining links immediately when a bearer
      is disabled. This saves us from some unnecessary complexity in such
      situations.
      
      We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
      ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
      FAILOVER_MSG and SYNCH_MSG respectively.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dff29b1a
    • Jon Paul Maloy's avatar
      tipc: drop tunneled packet duplicates at reception · 2da71425
      Jon Paul Maloy authored
      In commit 8b4ed863
      
      
      ("tipc: eliminate race condition at dual link establishment")
      we introduced a parallel link synchronization mechanism that
      guarentees sequential delivery even for users switching from
      an old to a newly established link. The new mechanism makes it
      unnecessary to deliver the tunneled duplicate packets back to
      the old link, as we are currently doing. It is now sufficient
      to use the last tunneled packet's inner sequence number as
      synchronization point between the two parallel links, whereafter
      it can be dropped.
      
      In this commit, we drop the duplicate packets arriving on the new
      link, after updating the synchronization point at each new arrival.
      
      Although it would now have been sufficient for the other endpoint
      to only tunnel the last packet in its send queue, and not the
      entire queue, we must still do this to maintain compatibility
      with older nodes.
      
      This commit makes it possible to get rid if some complex
      interaction between the two parallel links.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2da71425
  15. 01 Apr, 2015 1 commit
    • Ying Xue's avatar
      tipc: fix a slab object leak · 7e436905
      Ying Xue authored
      When remove TIPC module, there is a warning to remind us that a slab
      object is leaked like:
      
      root@localhost:~# rmmod tipc
      [   19.056226] =============================================================================
      [   19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
      [   19.058736] -----------------------------------------------------------------------------
      [   19.058736]
      [   19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
      [   19.061915] INFO: Object 0xffff880014668000 @offset=0
      [   19.062717] kmem_cache_destroy TIPC: Slab cache still has objects
      
      This is because the listening socket of TIPC topology server is not
      closed before TIPC proto handler is unregistered with proto_unregister().
      However, as the socket is closed in tipc_exit_net() which is called by
      unregister_pernet_subsys() during unregistering TIPC namespace operation,
      the warning can be eliminated if calling unregister_pernet_subsys() is
      moved before calling proto_unregister().
      
      Fixes: e05b31f4
      
       ("tipc: make tipc socket support net namespace")
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e436905
  16. 29 Mar, 2015 3 commits
    • Jon Paul Maloy's avatar
      tipc: fix two bugs in secondary destination lookup · d482994f
      Jon Paul Maloy authored
      
      
      A message sent to a node after a successful name table lookup may still
      find that the destination socket has disappeared, because distribution
      of name table updates is non-atomic. If so, the message will be rejected
      back to the sender with error code TIPC_ERR_NO_PORT. If the source
      socket of the message has disappeared in the meantime, the message
      should be dropped.
      
      However, in the currrent code, the message will instead be subject to an
      unwanted tertiary lookup, because the function tipc_msg_lookup_dest()
      doesn't check if there is an error code present in the message before
      performing the lookup. In the worst case, the message may now find the
      old destination again, and be redirected once more, instead of being
      dropped directly as it should be.
      
      A second bug in this function is that the "prev_node" field in the message
      is not updated after successful lookup, something that may have
      unpredictable consequences.
      
      The problems arising from those bugs occur very infrequently.
      
      The third change in this function; the test on msg_reroute_msg_cnt() is
      purely cosmetic, reflecting that the returned value never can be negative.
      
      This commit corrects the two bugs described above.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d482994f
    • Ying Xue's avatar
      tipc: involve reference counter for node structure · 8a0f6ebe
      Ying Xue authored
      
      
      TIPC node hash node table is protected with rcu lock on read side.
      tipc_node_find() is used to look for a node object with node address
      through iterating the hash node table. As the entire process of what
      tipc_node_find() traverses the table is guarded with rcu read lock,
      it's safe for us. However, when callers use the node object returned
      by tipc_node_find(), there is no rcu read lock applied. Therefore,
      this is absolutely unsafe for callers of tipc_node_find().
      
      Now we introduce a reference counter for node structure. Before
      tipc_node_find() returns node object to its caller, it first increases
      the reference counter. Accordingly, after its caller used it up,
      it decreases the counter again. This can prevent a node being used by
      one thread from being freed by another thread.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a0f6ebe
    • Ying Xue's avatar
      tipc: fix potential deadlock when all links are reset · b952b2be
      Ying Xue authored
      
      
      [   60.988363] ======================================================
      [   60.988754] [ INFO: possible circular locking dependency detected ]
      [   60.989152] 3.19.0+ #194 Not tainted
      [   60.989377] -------------------------------------------------------
      [   60.989781] swapper/3/0 is trying to acquire lock:
      [   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.990743]
      [   60.990743] but task is already holding lock:
      [   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.991738]
      [   60.991738] which lock already depends on the new lock.
      [   60.991738]
      [   60.992174]
      [   60.992174] the existing dependency chain (in reverse order) is:
      [   60.992174]
      -> #1 (&(&bclink->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
      [   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
      [   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
      [   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
      [   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
      [   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      [   60.992174] other info that might help us debug this:
      [   60.992174]
      [   60.992174]  Possible unsafe locking scenario:
      [   60.992174]
      [   60.992174]        CPU0                    CPU1
      [   60.992174]        ----                    ----
      [   60.992174]   lock(&(&bclink->lock)->rlock);
      [   60.992174]                                lock(&(&n_ptr->lock)->rlock);
      [   60.992174]                                lock(&(&bclink->lock)->rlock);
      [   60.992174]   lock(&(&n_ptr->lock)->rlock);
      [   60.992174]
      [   60.992174]  *** DEADLOCK ***
      [   60.992174]
      [   60.992174] 3 locks held by swapper/3/0:
      [   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
      [   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
      [   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]
      
      The correct the sequence of grabbing n_ptr->lock and bclink->lock
      should be that the former is first held and the latter is then taken,
      which exactly happened on CPU1. But especially when the retransmission
      of broadcast link is failed, bclink->lock is first held in
      tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
      called by tipc_link_retransmit() subsequently, which is demonstrated on
      CPU0. As a result, deadlock occurs.
      
      If the order of holding the two locks happening on CPU0 is reversed, the
      deadlock risk will be relieved. Therefore, the node lock taken in
      link_retransmit_failure() originally is moved to tipc_bclink_rcv()
      so that it's obtained before bclink lock. But the precondition of
      the adjustment of node lock is that responding to bclink reset event
      must be moved from tipc_bclink_unlock() to tipc_node_unlock().
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b952b2be
  17. 25 Mar, 2015 3 commits
    • Jon Paul Maloy's avatar
      tipc: eliminate race condition at dual link establishment · 8b4ed863
      Jon Paul Maloy authored
      
      
      Despite recent improvements, the establishment of dual parallel
      links still has a small glitch where messages can bypass each
      other. When the second link in a dual-link configuration is
      established, part of the first link's traffic will be steered over
      to the new link. Although we do have a mechanism to ensure that
      packets sent before and after the establishment of the new link
      arrive in sequence to the destination node, this is not enough.
      The arriving messages will still be delivered upwards in different
      threads, something entailing a risk of message disordering during
      the transition phase.
      
      To fix this, we introduce a synchronization mechanism between the
      two parallel links, so that traffic arriving on the new link cannot
      be added to its input queue until we are guaranteed that all
      pre-establishment messages have been delivered on the old, parallel
      link.
      
      This problem seems to always have been around, but its occurrence is
      so rare that it has not been noticed until recent intensive testing.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b4ed863
    • Jon Paul Maloy's avatar
      tipc: clean up handling of link congestion · 3127a020
      Jon Paul Maloy authored
      
      
      After the recent changes in message importance handling it becomes
      possible to simplify handling of messages and sockets when we
      encounter link congestion.
      
      We merge the function tipc_link_cong() into link_schedule_user(),
      and simplify the code of the latter. The code should now be
      easier to follow, especially regarding return codes and handling
      of the message that caused the situation.
      
      In case the scheduling function is unable to pre-allocate a wakeup
      message buffer, it now returns -ENOBUFS, which is a more correct
      code than the previously used -EHOSTUNREACH.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3127a020
    • Jon Paul Maloy's avatar
      tipc: introduce starvation free send algorithm · 1f66d161
      Jon Paul Maloy authored
      
      
      Currently, we only use a single counter; the length of the backlog
      queue, to determine whether a message should be accepted to the queue
      or not. Each time a message is being sent, the queue length is compared
      to a threshold value for the message's importance priority. If the queue
      length is beyond this threshold, the message is rejected. This algorithm
      implies a risk of starvation of low importance senders during very high
      load, because it may take a long time before the backlog queue has
      decreased enough to accept a lower level message.
      
      We now eliminate this risk by introducing a counter for each importance
      priority. When a message is sent, we check only the queue level for that
      particular message's priority. If that is ok, the message can be added
      to the backlog, irrespective of the queue level for other priorities.
      This way, each level is guaranteed a certain portion of the total
      bandwidth, and any risk of starvation is eliminated.
      Reviewed-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f66d161