node.js - Linux Server Benchmarking - Stuck at 31449 requests
I apologize in advance for the length of this question; I wanted to make clear everything I have already attempted.
Setup:
- 4 t1.micro EC2 instances (clients)
- 1 c1.medium EC2 instance (server) in a VPC, behind an Amazon Elastic Load Balancer (ELB)
- 1 simple node.js server running on the c1.medium (listening on HTTP port 3000, returning a simple "hello" HTML string; a minimal sketch appears after this list)
- 4 node.js processes (1 on each t1.micro) running a custom benchmarking suite to do distributed load testing against the c1.medium
*Both the clients and the server run Ubuntu and have their file descriptor limits raised to 102400.
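The exact server code is not included in the question; a minimal sketch of what such a server might look like, using the port and response body described above and applying the hard response delay Y from the run case below via setTimeout, would be roughly:

```js
// Minimal sketch of the "hello" server under test (assumed implementation).
const http = require('http');

// Hypothetical value of the hard response wait time Y (500, 1000, 2000, or 3000 ms in the tests).
const RESPONSE_DELAY_MS = 1000;

http.createServer((req, res) => {
  // Hold the response for Y milliseconds before replying.
  setTimeout(() => {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('hello');
  }, RESPONSE_DELAY_MS);
}).listen(3000);
```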
Run case:
The 4 clients try to make N connections (simple HTTP requests) per second, with N ranging from 400 to 1000, until 80,000 requests have been made. The server has a hard response wait time, Y, tested at 500, 1000, 2000, and 3000 milliseconds, before it responds with "hello".
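The benchmarking suite itself is not shown; a rough sketch of one client that fires N requests per second until its share of the total has been sent might look like this (the target hostname, the per-client request budget, and the overall structure are assumptions):

```js
// Hypothetical sketch of one load-test client firing N requests per second.
const http = require('http');

const TARGET = { host: 'my-elb.example.com', port: 3000, path: '/' }; // placeholder target
const REQUESTS_PER_SECOND = 500;   // N, varied between 400 and 1000
const TOTAL_REQUESTS = 80000 / 4;  // assuming the 80,000 total is split across the 4 clients

let started = 0;
const timer = setInterval(() => {
  for (let i = 0; i < REQUESTS_PER_SECOND && started < TOTAL_REQUESTS; i++, started++) {
    http.get(TARGET, (res) => {
      res.resume(); // drain the "hello" body so the socket can be released
    }).on('error', (err) => console.error(err.message));
  }
  if (started >= TOTAL_REQUESTS) clearInterval(timer);
}, 1000);
```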
Problem:
At more than 500 connections/second there is a halt of several seconds (up to 10 or 15) during which the server no longer responds to any of the clients, and the clients sit idle waiting for responses. This happens consistently at 31449 requests. The clients show the appropriate number of established connections (using netstat) holding steady during that time. Meanwhile, the server shows around 31550 connections in TIME_WAIT. After a few seconds that number begins to drop, and the server starts responding to the clients again. Then the same issue occurs at a later total request count, e.g. 62198 (though that one is not consistent). The file descriptor count for the port drops to 0.
Attempted resolutions:
Increasing the ephemeral port range. The default is 32768-61000, i.e. roughly 30k ports. Note that despite coming from 4 different physical clients, the traffic is routed through the local IP of the ELB, and ports are assigned against that IP. Effectively, the 4 clients are treated as 1 instead of each of them being able to use the full port range as expected: instead of 30k x 4 total ports, the 4 of them are limited to 30k combined. I increased the port range to 1024-65535 with net.ipv4.ip_local_port_range, restarted the server, and observed the following:
- The new port range is used. Ports as low as the 1000s and as high as the 65000s were observed in use.
- The connections still get stuck at 31449.
- The total number of ports in the TIME_WAIT state was observed going as high as 50000, but only after being stuck around 31550 for 10-15 seconds.
Other TCP settings were changed, both independently of each other and in conjunction with each other, such as tcp_fin_timeout, tcp_tw_recycle, and tcp_tw_reuse, plus several others, without any sizable improvement (a rough sketch of these sysctl changes is shown below). tcp_tw_recycle seems to help the most, but it makes the status results on the clients print out strangely and in the wrong order, and it still doesn't guarantee that connections don't get stuck. I also understand that it is a dangerous option to enable.
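For reference, the sysctl changes described above would be applied roughly as follows; the concrete values shown here are illustrative assumptions rather than the exact ones tested:

```
# /etc/sysctl.conf (illustrative values; reload with `sysctl -p`)
net.ipv4.ip_local_port_range = 1024 65535   # widened ephemeral port range
net.ipv4.tcp_fin_timeout = 30               # shorter FIN-WAIT-2 timeout (default 60)
net.ipv4.tcp_tw_reuse = 1                   # reuse TIME_WAIT sockets for new outgoing connections
# net.ipv4.tcp_tw_recycle = 1               # also tried, but known to be dangerous behind NAT/an ELB
```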
Question:
I want to support as many connections as possible, because the real server that will run on the c1.medium needs a high baseline when benchmarked. What else can I do to avoid hitting the 31449-request wall, short of recompiling the kernel or making the server unstable? I feel I should be able to go higher than 500/s, and I thought that increasing the port range alone would have shown some improvement, so I must be missing something else.
Thanks!