I was basically in the same position as you about a year ago. Through a lot of TCP trace analysis, I was able to conclude that the LTM was throttling down all of our application traffic - much the same way yours is. It took a lot of studying and analysis, but I eventually came up with a TCP profile that no longer throttles the connections. From my perspective, here are the most important settings:
Proxy Buffer Low: 98304
Proxy Buffer High: 131072
Send Buffer: 65535
Receive Window: 65535
Slow Start: disabled
Nagle's Algorithm: disabled
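If it helps, here's roughly what creating a profile with those settings looks like in tmsh. Note this is a sketch from memory: the profile name my-wan-tcp is made up, and the exact attribute names can differ between TMOS versions, so check your version's tmsh reference before applying it:

```
# Custom TCP profile with larger buffers, Nagle and slow start disabled.
# Attribute names are approximate - verify against your TMOS version.
create ltm profile tcp my-wan-tcp {
    proxy-buffer-low 98304
    proxy-buffer-high 131072
    send-buffer-size 65535
    receive-window-size 65535
    slow-start disabled
    nagle disabled
}
```

You'd then attach the profile to the virtual server as both the client-side and server-side TCP profile.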
This TCP profile is used for both our client-side and server-side connections. First I adjusted the four buffer settings. Next, I disabled Nagle. Finally, I disabled Slow Start. Each individual step in that process significantly improved our TCP performance.
Like you, I was also concerned about CPU/memory utilization when adjusting these parameters (our platform is also 6400s). One of our pairs carries a maximum of 1.5 million concurrent connections and pushes over 500 Mbps at times, and I haven't seen any noticeable change in CPU or memory utilization after these TCP profile adjustments. But to be honest, it was a leap of faith to get there - I couldn't come up with any way to measure the potential impact beforehand. The good news is that the changes don't disrupt traffic processing, and they are very easy and quick to back out.
What I concluded in a general sense is that, with the default TCP profile settings, the LTM sends a limited number of data packets to the client and then sits and waits for an acknowledgement. On a circuit with an 80ms round-trip delay, that waiting time adds up quickly. By changing the TCP parameters to the ones I noted above, the LTM now tries to fill the entire client receive window, resulting in much higher throughput.
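To put a number on why the window size matters so much on a high-latency circuit: a TCP sender can have at most one receive window of unacknowledged data in flight per round trip, which caps throughput at window / RTT. A quick back-of-the-envelope calculation (plain Python, nothing F5-specific):

```python
# Window-limited TCP throughput ceiling: at most one receive window of
# data can be in flight per round trip, so throughput <= window / RTT.

def max_throughput_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Return the window-limited throughput ceiling in megabits/second."""
    return window_bytes * 8 / rtt_seconds / 1e6

# A 65535-byte window on the 80 ms circuit described above:
print(round(max_throughput_mbps(65535, 0.080), 2))  # ~6.55 Mbps per connection
```

So even the 65535-byte window only buys you about 6.5 Mbps per connection at 80ms - which is exactly why leaving the sender idle between small bursts (the default behavior) hurts so badly, and why keeping the window full matters.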