連接失效問題
例子
其中,Redis常見的報(bào)錯(cuò)就是:
配置項(xiàng):timeout
報(bào)錯(cuò)信息:Error while reading line from the server
Redis可以配置如果客戶端經(jīng)過多少秒還不給Redis服務(wù)器發(fā)送數(shù)據(jù),那么就會(huì)把連接close掉。
MySQL常見的報(bào)錯(cuò):
配置項(xiàng):wait_timeout & interactive_timeout
報(bào)錯(cuò)信息:has gone away
和Redis服務(wù)器一樣,MySQL也會(huì)定時(shí)的去清理掉沒用的連接。
如何解決
1、用的時(shí)候進(jìn)行重連
2、定時(shí)發(fā)送心跳維持連接
用的時(shí)候進(jìn)行重連
優(yōu)點(diǎn)是簡(jiǎn)單,缺點(diǎn)是面臨短連接的問題。
定時(shí)發(fā)送心跳維持連接
推薦。
如何維持長(zhǎng)連接
tcp協(xié)議中實(shí)現(xiàn)的tcp_keepalive
?
操作系統(tǒng)底層提供了一組tcp的keepalive配置:
1 tcp_keepalive_time (integer; default: 7200; since Linux 2.2) 2 The number
of seconds a connection needs to be idle before TCP 3 begins sending out
keep-alive probes. Keep-alives are sent only 4 when the SO_KEEPALIVE socket
option is enabled. Thedefault 5 value is 7200 seconds (2 hours). An idle
connection is 6 terminated after approximately an additional 11 minutes (9 7
probes an interval of 75 seconds apart) when keep-alive is 8 enabled. 9 10
Note that underlying connection tracking mechanisms and11 application timeouts
may be much shorter.12 13 tcp_keepalive_intvl (integer; default: 75; since
Linux 2.4) 14 The number of seconds between TCP keep-alive probes. 15 16
tcp_keepalive_probes (integer; default: 9; since Linux 2.2) 17 The maximum
number of TCP keep-alive probes to send before 18 giving up and killing the
connectionif no response is obtained 19 from the other end. 20 8
Swoole底層把這些配置開放出來了,例如:
1 ?php 2 3 $server = new \Swoole\Server('127.0.0.1', 6666, SWOOLE_PROCESS);
4 5 $server->set([ 6 'worker_num' => 1, 7 'open_tcp_keepalive' => 1, 8
'tcp_keepidle' => 4,// 對(duì)應(yīng)tcp_keepalive_time 9 'tcp_keepinterval' => 1, //
對(duì)應(yīng)tcp_keepalive_intvl 10 'tcp_keepcount' => 5, // 對(duì)應(yīng)tcp_keepalive_probes 11 ]);
其中:
1 'open_tcp_keepalive' => 1, // 總開關(guān),用來開啟tcp_keepalive 2 'tcp_keepidle' => 4, //
4s沒有數(shù)據(jù)傳輸就進(jìn)行檢測(cè)3 // 檢測(cè)的策略如下: 4 'tcp_keepinterval' => 1, //
1s探測(cè)一次,即每隔1s給客戶端發(fā)一個(gè)包(然后客戶端可能會(huì)回一個(gè)ack的包,如果服務(wù)端收到了這個(gè)ack包,那么說明這個(gè)連接是活著的) 5
'tcp_keepcount' => 5,// 探測(cè)的次數(shù),超過5次后客戶端還沒有回ack包,那么close此連接
?
我們來實(shí)戰(zhàn)測(cè)試體驗(yàn)一下,服務(wù)端腳本如下:
1 <?php 2 3 $server = new \Swoole\Server('127.0.0.1', 6666, SWOOLE_PROCESS);
4 5 $server->set([ 6 'worker_num' => 1, 7 'open_tcp_keepalive' => 1, //
開啟tcp_keepalive 8 'tcp_keepidle' => 4, // 4s沒有數(shù)據(jù)傳輸就進(jìn)行檢測(cè) 9 'tcp_keepinterval'
=> 1,// 1s探測(cè)一次 10 'tcp_keepcount' => 5, // 探測(cè)的次數(shù),超過5次后還沒有回包c(diǎn)lose此連接 11 ]); 12 13
$server->on('connect', function ($server, $fd) { 14 var_dump("Client: Connect
$fd"); 15 }); 16 17 $server->on('receive', function ($server, $fd, $reactor_id,
$data) { 18 var_dump($data); 19 }); 20 21 $server->on('close', function ($server
,$fd) { 22 var_dump("close fd $fd"); 23 }); 24 25 $server->start();
我們啟動(dòng)這個(gè)服務(wù)器:
1 ~/codeDir/phpCode/hyperf-skeleton # php server.php
然后通過tcpdump進(jìn)行抓包:
~/codeDir/phpCode/hyperf-skeleton # tcpdump -i lo port 6666 tcpdump: verbose
output suppressed,use -v or -vv for full protocol decode listening on lo,
link-type EN10MB (Ethernet), capture size 262144 bytes
我們此時(shí)正在監(jiān)聽lo上的6666端口的數(shù)據(jù)包。
然后我們用客戶端去連接它:
1 ~/codeDir/phpCode/hyperf-skeleton # nc 127.0.0.1 6666
此時(shí)服務(wù)端會(huì)打印出消息:
~/codeDir/phpCode/hyperf-skeleton # php server.php string(17) "Client: Connect
1"
tcpdump的輸出信息如下:
1 01:48:40.178439 IP localhost.33933 > localhost.6666: Flags [S], seq
43162537, win 43690, options [mss 65495,sackOK,TS val 9833698 ecr 0,nop,wscale
7], length 02 01:48:40.178484 IP localhost.6666 > localhost.33933: Flags [S.],
seq 1327460565, ack 43162538, win 43690, options [mss 65495,sackOK,TS val
9833698 ecr 9833698,nop,wscale 7], length 03 01:48:40.178519 IP localhost.33933
> localhost.6666: Flags [.], ack 1, win 342, options [nop,nop,TS val 9833698
ecr 9833698], length 04 01:48:44.229926 IP localhost.6666 > localhost.33933:
Flags [.], ack 1, win 342, options [nop,nop,TS val 9834104 ecr 9833698], length
05 01:48:44.229951 IP localhost.33933 > localhost.6666: Flags [.], ack 1, win
342, options [nop,nop,TS val 9834104 ecr 9833698], length 06 01:48:44.229926 IP
localhost.6666 > localhost.33933: Flags [.], ack 1, win 342, options
[nop,nop,TS val 9834104 ecr 9833698], length 07 01:48:44.229951 IP
localhost.33933 > localhost.6666: Flags [.], ack 1, win 342, options
[nop,nop,TS val 9834104 ecr 9833698], length 08 01:48:44.229926 IP
localhost.6666 > localhost.33933: Flags [.], ack 1, win 342, options
[nop,nop,TS val 9834104 ecr 9833698], length 09 // 省略了其他的輸出
我們會(huì)發(fā)現(xiàn)最開始的時(shí)候,會(huì)打印三次握手的包:
01:48:40.178439 IP localhost.33933 > localhost.6666: Flags [S], seq 43162537,
win 43690, options [mss 65495,sackOK,TS val 9833698 ecr 0,nop,wscale 7], length
0 01:48:40.178484 IP localhost.6666 > localhost.33933: Flags [S.], seq
1327460565, ack 43162538, win 43690, options [mss 65495,sackOK,TS val 9833698
ecr 9833698,nop,wscale 7], length 0 01:48:40.178519 IP localhost.33933 >
localhost.6666: Flags [.], ack 1, win 342, options [nop,nop,TS val 9833698 ecr
9833698], length 0
然后,停留了4s沒有任何包的輸出。
之后,每隔1s左右就會(huì)打印出一組:
1 01:52:54.359341 IP localhost.6666 > localhost.43101: Flags [.], ack 1, win
342, options [nop,nop,TS val 9859144 ecr 9858736], length 02 01:52:54.359377 IP
localhost.43101 > localhost.6666: Flags [.], ack 1, win 342, options
[nop,nop,TS val 9859144 ecr 9855887], length 0
其實(shí)這就是我們配置的策略:
1 'tcp_keepinterval' => 1, // 1s探測(cè)一次 2 'tcp_keepcount' => 5, //
探測(cè)的次數(shù),超過5次后還沒有回包c(diǎn)lose此連接
因?yàn)槲覀儾僮飨到y(tǒng)底層會(huì)自動(dòng)的給客戶端回ack,所以這個(gè)連接不會(huì)在5次探測(cè)后被關(guān)閉。操作系統(tǒng)底層會(huì)持續(xù)不斷的發(fā)送這樣的一組包:
1 01:52:54.359341 IP localhost.6666 > localhost.43101: Flags [.], ack 1, win
342, options [nop,nop,TS val 9859144 ecr 9858736], length 02 01:52:54.359377 IP
localhost.43101 > localhost.6666: Flags [.], ack 1, win 342, options
[nop,nop,TS val 9859144 ecr 9855887], length 0
如果我們要測(cè)試5次探測(cè)后關(guān)閉這個(gè)連接,可以禁掉6666端口的包:
1 ~/codeDir/phpCode/hyperf-skeleton # iptables -A INPUT -p tcp --dport 6666 -j
DROP
這樣會(huì)把所有從6666端口進(jìn)來的包給禁掉,自然,服務(wù)器就接收不到從客戶端那一邊發(fā)來的ack包了。
然后服務(wù)器過5秒就會(huì)打印出close(服務(wù)端主動(dòng)的調(diào)用了close方法,給客戶端發(fā)送了FIN包):
1 ~/codeDir/phpCode/hyperf-skeleton # php server.php 2 string(17) "Client:
Connect 1"3 string(10) "close fd 1"
我們恢復(fù)一下iptables的規(guī)則:
1 ~/codeDir/phpCode # iptables -D INPUT -p tcp -m tcp --dport 6666 -j DROP
即把我們?cè)O(shè)置的規(guī)則給刪除了。
通過tcp_keepalive的方式實(shí)現(xiàn)心跳的功能,優(yōu)點(diǎn)是簡(jiǎn)單,不要寫代碼就可以完成這個(gè)功能,并且發(fā)送的心跳包小。缺點(diǎn)是依賴于系統(tǒng)的網(wǎng)絡(luò)環(huán)境,必須保證服務(wù)器和客戶端都實(shí)現(xiàn)了這樣的功能,需要客戶端配合發(fā)心跳包。還有一個(gè)更為嚴(yán)重的缺點(diǎn)是如果客戶端和服務(wù)器不是直連的,而是通過代理來進(jìn)行連接的,例如socks5代理,它只會(huì)轉(zhuǎn)發(fā)應(yīng)用層的包,不會(huì)轉(zhuǎn)發(fā)更為底層的tcp探測(cè)包,那這個(gè)心跳功能就失效了。
所以,Swoole就提供了其他的解決方案,一組檢測(cè)死連接的配置。
1 'heartbeat_check_interval' => 1, // 1s探測(cè)一次 2 'heartbeat_idle_time' => 5, //
5s未發(fā)送數(shù)據(jù)包就close此連接
swoole實(shí)現(xiàn)的heartbeat
我們來測(cè)試一下:
1 <?php 2 3 $server = new \Swoole\Server('127.0.0.1', 6666, SWOOLE_PROCESS);
4 5 $server->set([ 6 'worker_num' => 1, 7 'heartbeat_check_interval' => 1,
// 1s探測(cè)一次 8 'heartbeat_idle_time' => 5, // 5s未發(fā)送數(shù)據(jù)包就close此連接 9 ]); 10 11
$server->on('connect', function ($server, $fd) { 12 var_dump("Client: Connect
$fd"); 13 }); 14 15 $server->on('receive', function ($server, $fd, $reactor_id,
$data) { 16 var_dump($data); 17 }); 18 19 $server->on('close', function ($server
,$fd) { 20 var_dump("close fd $fd"); 21 }); 22 23 $server->start();
然后啟動(dòng)服務(wù)器:
1 ~/codeDir/phpCode/hyperf-skeleton # php server.php
然后啟動(dòng)tcpdump:
?
1 ~/codeDir/phpCode # tcpdump -i lo port 6666 2 tcpdump: verbose output
suppressed,use -v or -vv for full protocol decode 3 listening on lo, link-type
EN10MB (Ethernet), capture size 262144 bytes
?
然后再啟動(dòng)客戶端:
1 ~/codeDir/phpCode/hyperf-skeleton # nc 127.0.0.1 6666
?
此時(shí)服務(wù)器端打印:
1 ~/codeDir/phpCode/hyperf-skeleton # php server.php 2 string(17) "Client:
Connect 1"
?
然后tcpdump打?。?br>
?
1 02:48:32.516093 IP localhost.42123 > localhost.6666: Flags [S], seq
1088388248, win 43690, options [mss 65495,sackOK,TS val 10193342 ecr
0,nop,wscale 7], length 02 02:48:32.516133 IP localhost.6666 > localhost.42123:
Flags [S.], seq 80508236, ack 1088388249, win 43690, options [mss
65495,sackOK,TS val 10193342 ecr 10193342,nop,wscale 7], length 03
02:48:32.516156 IP localhost.42123 > localhost.6666: Flags [.], ack 1, win 342,
options [nop,nop,TS val 10193342 ecr 10193342], length 0
?
這是三次握手信息。
然后過了5s后,tcpdump會(huì)打印出:
1 02:48:36.985027 IP localhost.6666 > localhost.42123: Flags [F.], seq 1, ack
1, win 342, options [nop,nop,TS val 10193789 ecr 10193342], length 02
02:48:36.992172 IP localhost.42123 > localhost.6666: Flags [.], ack 2, win 342,
options [nop,nop,TS val 10193790 ecr 10193789], length 0
也就是服務(wù)端發(fā)送了FIN包。因?yàn)榭蛻舳藳]有發(fā)送數(shù)據(jù),所以Swoole關(guān)閉了連接。
然后服務(wù)器端會(huì)打?。?br>1 ~/codeDir/phpCode/hyperf-skeleton # php server.php 2 string(17) "Client:
Connect 1"3 string(10) "close fd 1"
?
所以,heartbeat和tcp keepalive還是有一定的區(qū)別的,tcp
keepalive有?;钸B接的功能,但是heartbeat存粹是檢測(cè)沒有數(shù)據(jù)的連接,然后關(guān)閉它,并且只可以在服務(wù)端這邊配置,如果需要?;睿部梢宰尶蛻舳伺浜习l(fā)送心跳。
如果我們不想讓服務(wù)端close掉連接,那么就得在應(yīng)用層里面不斷的發(fā)送數(shù)據(jù)包來進(jìn)行?;?,例如我在nc客戶端里面不斷的發(fā)送包:
1 ~/codeDir/phpCode/hyperf-skeleton # nc 127.0.0.1 6666 2 ping 3 ping 4
ping 5 ping 6 ping 7 ping 8 ping 9 ping 10 ping
?
我發(fā)送了9個(gè)ping包給服務(wù)器,tcpdump的輸出如下:
1 // 省略了三次握手的包 2 02:57:53.697363 IP localhost.44195 > localhost.6666: Flags
[P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 10249525 ecr 10249307],
length 5 3 02:57:53.697390 IP localhost.6666 > localhost.44195: Flags [.], ack
6, win 342, options [nop,nop,TS val 10249525 ecr 10249525], length 0 4
02:57:55.309532 IP localhost.44195 > localhost.6666: Flags [P.], seq 6:11, ack
1, win 342, options [nop,nop,TS val 10249686 ecr 10249525], length 5 5
02:57:55.309576 IP localhost.6666 > localhost.44195: Flags [.], ack 11, win
342, options [nop,nop,TS val 10249686 ecr 10249686], length 0 6 02:57:58.395206
IP localhost.44195 > localhost.6666: Flags [P.], seq 11:16, ack 1, win 342,
options [nop,nop,TS val 10249994 ecr 10249686], length 5 7 02:57:58.395239 IP
localhost.6666 > localhost.44195: Flags [.], ack 16, win 342, options
[nop,nop,TS val 10249994 ecr 10249994], length 0 8 02:58:01.858094 IP
localhost.44195 > localhost.6666: Flags [P.], seq 16:21, ack 1, win 342,
options [nop,nop,TS val 10250341 ecr 10249994], length 5 9 02:58:01.858126 IP
localhost.6666 > localhost.44195: Flags [.], ack 21, win 342, options
[nop,nop,TS val 10250341 ecr 10250341], length 010 02:58:04.132584 IP
localhost.44195 > localhost.6666: Flags [P.], seq 21:26, ack 1, win 342,
options [nop,nop,TS val 10250568 ecr 10250341], length 511 02:58:04.132609 IP
localhost.6666 > localhost.44195: Flags [.], ack 26, win 342, options
[nop,nop,TS val 10250568 ecr 10250568], length 012 02:58:05.895704 IP
localhost.44195 > localhost.6666: Flags [P.], seq 26:31, ack 1, win 342,
options [nop,nop,TS val 10250744 ecr 10250568], length 513 02:58:05.895728 IP
localhost.6666 > localhost.44195: Flags [.], ack 31, win 342, options
[nop,nop,TS val 10250744 ecr 10250744], length 014 02:58:07.150265 IP
localhost.44195 > localhost.6666: Flags [P.], seq 31:36, ack 1, win 342,
options [nop,nop,TS val 10250870 ecr 10250744], length 515 02:58:07.150288 IP
localhost.6666 > localhost.44195: Flags [.], ack 36, win 342, options
[nop,nop,TS val 10250870 ecr 10250870], length 016 02:58:08.349124 IP
localhost.44195 > localhost.6666: Flags [P.], seq 36:41, ack 1, win 342,
options [nop,nop,TS val 10250990 ecr 10250870], length 517 02:58:08.349156 IP
localhost.6666 > localhost.44195: Flags [.], ack 41, win 342, options
[nop,nop,TS val 10250990 ecr 10250990], length 018 02:58:09.906223 IP
localhost.44195 > localhost.6666: Flags [P.], seq 41:46, ack 1, win 342,
options [nop,nop,TS val 10251145 ecr 10250990], length 519 02:58:09.906247 IP
localhost.6666 > localhost.44195: Flags [.], ack 46, win 342, options
[nop,nop,TS val 10251145 ecr 10251145], length 0
?
有9組數(shù)據(jù)包的發(fā)送。(這里的Flags [P.]代表Push的含義)
此時(shí)服務(wù)器還沒有close掉連接,實(shí)現(xiàn)了客戶端?;钸B接的功能。然后我們停止發(fā)送ping,過了5秒后tcpdump就會(huì)輸出一組:
02:58:14.811761 IP localhost.6666 > localhost.44195: Flags [F.], seq 1, ack
46, win 342, options [nop,nop,TS val 10251636 ecr 10251145], length 0
02:58:14.816420 IP localhost.44195 > localhost.6666: Flags [.], ack 2, win
342, options [nop,nop,TS val 10251637 ecr 10251636], length 0
服務(wù)端那邊發(fā)送了FIN包,說明服務(wù)端close掉了連接。服務(wù)端的輸出如下:
1 ~/codeDir/phpCode/hyperf-skeleton # php server.php 2 string(17) "Client:
Connect 1" 3 string(5) "ping 4 " 5 string(5) "ping 6 " 7 string(5) "ping 8
" 9 string(5) "ping 10 " 11 string(5) "ping 12 " 13 string(5) "ping 14 " 15
string(5) "ping 16 " 17 string(5) "ping 18 " 19 string(5) "ping 20 " 21 string
(10) "close fd 1"
?
然后我們?cè)诳蛻舳四沁卌trl + c來關(guān)閉連接:
1 ~/codeDir/phpCode/hyperf-skeleton # nc 127.0.0.1 6666 2 ping 3 ping 4
ping 5 ping 6 ping 7 ping 8 ping 9 ping 10 ping 11 ^Cpunt! 12 13
~/codeDir/phpCode/hyperf-skeleton#
?
此時(shí),tcpdump的輸出如下:
1 03:03:02.257667 IP localhost.44195 > localhost.6666: Flags [F.], seq 46, ack
2, win 342, options [nop,nop,TS val 10280414 ecr 10251636], length 02
03:03:02.257734 IP localhost.6666 > localhost.44195: Flags [R], seq 2678621620,
win 0, length 0
?
應(yīng)用層心跳
1、制定ping/pong協(xié)議(mysql等自帶ping協(xié)議)
2、客戶端靈活的發(fā)送ping心跳包
3、服務(wù)端OnRecive檢查可用性回復(fù)pong
例如:
1 $server->on('receive', function (\Swoole\Server $server, $fd, $reactor_id,
$data) 2 { 3 if ($data == 'ping') 4 { 5 checkDB(); 6 checkServiceA(); 7
checkRedis(); 8 $server->send('pong'); 9 } 10 });
?
結(jié)論
1、tcp的keepalive最簡(jiǎn)單,但是有兼容性問題,不夠靈活
2、swoole提供的keepalive最實(shí)用,但是需要客戶端配合,復(fù)雜度適中
3、應(yīng)用層的keepalive最靈活但是最麻煩
熱門工具 換一換