Forum Discussion

andy_12_5042 (Nimbostratus)
Sep 04, 2013

LTM TCP Connection management

I am trying to understand why the F5 always shows 2-3 times more active connections for a pool member than are actually in the physical server's state table. In addition, I am seeing a problem with having Linux (Ubuntu) and Solaris servers in the same pool: the Solaris servers get almost all of the connections, while the Ubuntu servers, which are on better hardware, sit mostly idle. The distribution method we use is Least Connections (node), with either a Performance Layer 4 or a Standard TCP virtual server depending on location.

 

So I guess two questions come from this: 1) My understanding of LTM is that TCP connections which are closed normally via a 3-way/4-way close should be removed from the F5 immediately. The server always initiates the active close and hence goes into TIME_WAIT. Why does the pool member's active connection count always show so much more than the server really has active? (On the server side I can see this via netstat; on the F5 I can use b pool | grep cur.)
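
For reference, the two checks I am comparing look roughly like this (the pool name is a placeholder, and the exact bigpipe output fields may differ on other 9.x builds):

    # On a pool member: count sockets in ESTABLISHED state (filter by service port as needed)
    netstat -an | grep -c ESTABLISHED

    # On the BIG-IP (v9 bigpipe): current connection counts for the pool and its members
    b pool my_pool show | grep cur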

 

2) Ubuntu has a hard-coded 60-second TIME_WAIT in the kernel, but on Solaris it is a tunable parameter, which we have set to 10 seconds for performance reasons. (These connections are very short/fast, so there is no issue with the lower value.) Why would the F5 send almost everything to the Solaris servers on poorer hardware, which translates into slower response times? (We are not using OneConnect.)
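
For what it's worth, the Solaris tuning is just the standard ndd knob (the value is in milliseconds); on Linux the 60-second TIME_WAIT is the compile-time TCP_TIMEWAIT_LEN constant, so there is no equivalent runtime setting:

    # Solaris: read and set the TIME_WAIT interval (milliseconds)
    ndd /dev/tcp tcp_time_wait_interval
    ndd -set /dev/tcp tcp_time_wait_interval 10000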

 

I can't seem to find any data that would explain this behaviour, and it does not make any technical sense. We are on archaic code (9.25) which I have no control over, but I have not seen this issue with a mix of OSes before. I have also tried a Round Robin balancing method on the pool, which did not work either; same behaviour. Does anyone have any idea what the problem is here?

 

Thanks Andy

 

18 Replies

  • OK, so could you drop the idle timeout? I'm clutching at straws, but you could also enable loose close.

     

    OneConnect would also help reduce the number of server-side connections and reduce the load somewhat on the servers.
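
    From memory, the v9 bigpipe commands would look something like the lines below (the profile names are placeholders and the exact syntax may differ on your build, so check b profile help before applying anything):

        # Lower the idle timeout on the TCP profile used by the standard virtuals
        b profile tcp my_tcp idle timeout 60

        # Loose close is a FastL4 profile setting (for the Performance Layer 4 virtuals)
        b profile fastl4 my_fastl4 loose close enable

        # OneConnect profile; then add it to the standard (TCP) virtual's profile list
        b profile oneconnect my_oneconnect source mask 0.0.0.0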

     

  • I have also tried to use a Round Robin pool balance method which also did not work and same behaviour

     

    I think round robin should work. How did you test? Can you reproduce the issue?

     

  • I have tried Round Robin, and it works for a while, but then under heavy load we start to see the same issue, with much more traffic going to the Solaris servers.

     

    I have reduced the idle time to as low as 10 seconds, but that does not help since these connections are active. The F5 sees these connections as ESTABLISHED rather than persistent, but this is not reflected in the server's session state table. There appears to be a difference in how long it holds connections to each of these servers, and I just can't understand why.

     

  • I have tried Round Robin, and it works for a while, but then under heavy load we start to see the same issue, with much more traffic going to the Solaris servers.

     

    How did you measure the traffic to each server? Was it from the statistics on the BIG-IP?

     

    Are you using any settings which may affect load distribution?

     

    sol10430: Causes of uneven traffic distribution across BIG-IP pool members

     

    http://support.f5.com/kb/en-us/solutions/public/10000/400/sol10430.html

     

  • Traffic was measured from both the F5 pool member statistics and the server-side session table. The server will always reflect the most accurate count of sockets that are in ESTABLISHED or TIME_WAIT state, for example.

     

    None of the things mentioned in that article apply here. Since I have seen this with Round Robin, that rules out it being an issue only with Least Connections. This is one of those issues where I would need to get at the internals, which I can't do without support. For example, on some other vendors' devices I can turn on specific types of debugging and observe the decision logic for where a request is sent based on the current configuration, which is very helpful in these cases. It would at least provide some insight into why more traffic is getting sent to the same set of servers.

     

    • What_Lies_Bene1 (Cirrostratus)
      I doubt very much if we'll get to the root cause of this, particularly with such an old version of code. However (Nitass gave me this idea in response to another post) perhaps it can be overcome using a more 'intelligent' load balancing method. Candidates would be Weighted Least Connections, Dynamic Ratio, Observed or Predictive.
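
      If it helps, switching methods on a v9 pool should just be a one-liner, something along these lines (the pool name and method keyword are illustrative; b pool help will list the exact method names your build supports):

          # e.g. switch the pool to Observed (member)
          b pool my_pool lb method observed_member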
  • Yeah I agree and was starting to think that is the only possible solution at this point. I will have to test some different types of balancing methods and see what I can do.

     

    Thanks for the comments, guys! I don't know how I ended up with another gig that is using such old software with no support :)

     

    Andy

     

  • You're welcome, it's always the way. Please do post back if this does the trick. Here's a quick rundown of the methods I mentioned:

     

    Weighted Least Connections – Member & Node - This method load balances new connections to whichever Pool Member or Node has the least number of active connections; however, you define a Connection Limit (Weight) for each Pool Member or Node based on your knowledge of its abilities. The Connection Limits are used along with the active connection count to distribute connections unequally, in a Least Connections fashion.

     

    This method is suitable where the real servers have differing capabilities.

     

    As each connection can carry a different overhead (one could relate to a request for an HTML page, another to a 20MB PDF document that needs to be generated and downloaded), this is not a reliable way of distributing bandwidth and processing load between servers.

     

    Member method: The weights and connection count for each Pool Member is calculated only in relation to connections specific to the Pool in question.

     

    Node method: The weights and connection count for each Node is calculated in relation to all the Pools the Node is a Member of.

     

    If all Pool Members have the same Connection Limit then this method acts just like Least Connections.
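
    To get any benefit from Weighted Least Connections here, you would give each member a Connection Limit that reflects its capacity, e.g. higher limits on the Ubuntu members than the Solaris ones. Roughly like the below (the pool name, addresses and limits are made up, and I haven't checked this exact syntax on 9.x, so verify with b pool help):

        # Give the stronger Ubuntu members a higher connection limit than the Solaris members
        b pool my_pool member 10.0.0.10:80 limit 200
        b pool my_pool member 10.0.0.20:80 limit 100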

     

    Dynamic Ratio – Member & Node - Also known as Dynamic Round Robin, this method is similar to Ratio but dynamic; real-time server performance analysis (such as the current number of connections and response time) is used to distribute connections unequally in a circular (Round Robin) fashion. This may sound like Observed, but keep in mind that connections are still distributed in a circular way.

     

    This method is suitable where the real servers have differing capabilities.

     

    Member method: The performance of each Pool Member is calculated only in relation to the Pool in question.

     

    Node method: The performance of each Node is calculated in relation to all the Pools the Node is a Member of.

     

    Observed – Member & Node - This method load balances connections using a ranking derived from the number of Layer Four connections to each real server and each server’s response time to the last request. This is effectively a combination of the Least Connections and Fastest methods.

     

    Not recommended except in specific circumstances and not at all for large Pools. Connections to each Pool Member are only considered in relation to the specific Pool in question.

     

    Member method: The weights and connection count for each Pool Member is calculated only in relation to connections specific to the Pool in question.

     

    Node method: The weights and connection count for each Node is calculated in relation to all the Pools the Node is a Member of.

     

    Predictive – Member & Node - Similar to Observed but more aggressive, as the resulting Pool Member rankings are analysed over time; if a Pool Member's ranking is improving, it will receive a higher proportion of connections than one whose ranking is declining.

     

    Not recommended except in specific circumstances and not at all for large Pools.

     

    Member method: The ranking and analysis for each Pool Member is calculated only in relation to connections and response times specific to the Pool in question.

     

    Node method: The ranking and analysis for each Node is calculated in relation to connections and response times for all the Pools the Node is a Member of.