A Brief Look at Docker Container Orchestration
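Overview
As a container engine, Docker provides an open standard for containerized applications, letting developers manage infrastructure the same way they manage applications and so deliver, test, and deploy code quickly. As containers came into heavy use, the question of how to coordinate, schedule, and manage them followed, and container orchestration for Docker emerged in response. Put simply, container orchestration can be understood as cluster management.
There are many orchestration tools for Docker. The best known are Compose, Machine, and Swarm, nicknamed the "three musketeers" of Docker. Compose and Machine are shipped as separate tools, whereas Swarm is Docker's own container orchestration tool and is already built into Docker itself.
Swarm consists of three main parts:
swarm: cluster management
node: node management
service: service management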
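Cluster and Node Management
The docker swarm command creates a cluster or joins an existing one. Nodes in a Docker cluster are of two kinds, manager and worker. Both kinds can run Docker containers, but only manager nodes have management capabilities.
A cluster works normally even if it contains only manager nodes.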
2.1 Creating a cluster
My test environment has two machines, with IP addresses 192.168.1.220 and 192.168.1.116. The cluster is created on 192.168.1.220:
# docker swarm init
Swarm initialized: current node (ppmurem8j7mdbmgpdhssjh0h9) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-3e4l8crbt04xlqfxwyw6nuf9gtcwpw72zggtayzy8clyqmvb5h-7o6ww4ftwm38dz7ydbolsz3kd 192.168.1.220:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Once docker swarm init has run, the cluster exists. The current machine automatically becomes the cluster's manager node, and the output shows how other machines can join, namely:
docker swarm join --token SWMTKN-1-3e4l8crbt04xlqfxwyw6nuf9gtcwpw72zggtayzy8clyqmvb5h-7o6ww4ftwm38dz7ydbolsz3kd 192.168.1.220:2377
A node that joins with this token becomes a worker. To add a new manager instead, run docker swarm join-token manager; it prints a similar command, and running that command on the new machine joins it as a manager. If you forget the join command, docker swarm join-token worker shows it again.
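The join command is just a token plus the manager's address, so it can be assembled in a script rather than copied by hand. The following is a minimal sketch under stated assumptions: build_join_cmd is a hypothetical helper name, the token value is a placeholder, and 2377 is taken from the docker swarm init output shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the join command for a given token and
# manager address. The port defaults to 2377, as in the init output.
build_join_cmd() {
    token=$1
    manager_ip=$2
    port=${3:-2377}
    printf 'docker swarm join --token %s %s:%s\n' "$token" "$manager_ip" "$port"
}

# On a real manager the token could be read with:
#   token=$(docker swarm join-token -q worker)
# A placeholder keeps this sketch runnable without Docker.
build_join_cmd "SWMTKN-1-placeholder" "192.168.1.220"
```

The -q flag makes docker swarm join-token print only the token, which is convenient when the value is passed on to another machine.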
2.2 Joining the cluster
Now run the join command on 192.168.1.116:
# docker swarm join --token SWMTKN-1-12dlq70adr3z38mlkltc288rdzevtjn73xse7d0qndnjmx45zs-b1kwenzmrsqb4o5nvni5rafcr 192.168.1.220:2377
This node joined a swarm as a worker.
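There was a small hiccup here. The two machines I was building the cluster from had different time zone settings, and joining the worker failed with:
Error response from daemon: error while validating Root CA Certificate: x509: certificate has expired or is not yet valid
After correcting the time zone on 192.168.1.220, the join still failed. I then deleted the cluster and created it again, and the join succeeded (presumably this is why the token in the join command above differs from the one printed in section 2.1). I did not test whether docker swarm update would have worked as well.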
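The x509 error concerns the certificate's validity window, which is checked against absolute (UTC) time. What matters is therefore whether the two machines agree on the current instant, not which zone label they display. A minimal sketch for measuring the difference follows. It assumes the other machine's epoch time can be obtained somehow (the ssh command in the comment is only one hypothetical way), and clock_skew is an illustrative name.

```shell
#!/bin/sh
# Absolute difference between two Unix timestamps, in seconds.
clock_skew() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

local_now=$(date -u +%s)
# Placeholder: in practice this would be read from the other machine,
# for example with: ssh 192.168.1.116 date -u +%s
remote_now=$local_now
echo "clock skew: $(clock_skew "$local_now" "$remote_now")s"
```

If the skew is large, synchronizing both clocks before running docker swarm init is the safer order. The certificate is issued when the cluster is created, which may be why fixing the clock afterwards did not help until the cluster was recreated.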
Once the node has joined, the cluster's nodes can be listed from the manager:
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
hz50cnwrbk4vxa7h0g23ccil9     zhangmh-virtual-machine   Ready     Active                          20.10.1
2.3 Leaving the cluster
Running the following command on 192.168.1.116 takes it out of the cluster:
# docker swarm leave
Node left the swarm.
Listing the nodes again:
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
hz50cnwrbk4vxa7h0g23ccil9     zhangmh-virtual-machine   Down      Active                          20.10.1
The node that just left is still listed; only its status has changed to Down. It has to be removed on the manager:
# docker node rm hz50cnwrbk4vxa7h0g23ccil9
hz50cnwrbk4vxa7h0g23ccil9
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
xby86ffkqw3axyfkwd4s7nubz     zhangmh-virtual-machine   Ready     Active                          20.10.1
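Only then is the node really gone. (In the last listing, 192.168.1.116 appears to have rejoined the cluster already, which would explain why the same host shows up under a new node ID.)
If the node leaving is a manager, it has to be forced out: docker swarm leave -f.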
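The cleanup described above (find the nodes left in the Down state, then remove them) can be scripted. This sketch assumes the input is one "ID STATUS" pair per line; down_nodes is a hypothetical helper name, and the sample lines are copied from the listing taken right after the node left.

```shell
#!/bin/sh
# Print the IDs of nodes whose status is Down.
# Expected input: one "ID STATUS" pair per line, e.g. from
#   docker node ls --format '{{.ID}} {{.Status}}'
down_nodes() {
    awk '$2 == "Down" { print $1 }'
}

# Sample input taken from the node listing above, so the sketch
# runs without a swarm.
printf '%s\n' \
    '9b4cmakc4hpc9ra4rruy5x5yo Ready' \
    'hz50cnwrbk4vxa7h0g23ccil9 Down' | down_nodes
```

On a manager, each printed ID could then be passed to docker node rm. It is worth reviewing the list first, since a node that is only temporarily unreachable also shows as Down.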
2.4 Promoting a node to manager
A cluster with a single manager is fragile: if that manager crashes, the whole cluster is left leaderless. Docker's guidance is that a cluster should have at least three manager nodes, and that more than half of the managers must be reachable for the cluster to keep operating normally. With only two managers, a problem on either one still leaves the whole cluster unavailable.
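The "more than half" rule is simple integer arithmetic, and working it through shows why two managers tolerate no more failures than one. A small sketch follows; quorum and fault_tolerance are illustrative names, not Docker commands.

```shell
#!/bin/sh
# Smallest number of managers that is more than half of n.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many managers may fail while a majority is still reachable.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For two managers the quorum is 2 and the tolerated failure count is 0. Three managers tolerate one failure and five tolerate two, which is why odd manager counts are the usual choice.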
For our test that is not necessary, of course. Two managers are enough to check whether leadership can switch over. The following command promotes a worker node directly to manager:
# docker node promote xby86ffkqw3axyfkwd4s7nubz
Node xby86ffkqw3axyfkwd4s7nubz promoted to a manager in the swarm.
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
xby86ffkqw3axyfkwd4s7nubz     zhangmh-virtual-machine   Ready     Active         Reachable        20.10.1
OK, there are now two manager nodes. 192.168.1.220 has the status Leader, meaning it currently leads the cluster, and 192.168.1.116 has the status Reachable. Next, stop the Docker service on 192.168.1.220:
# systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
Stopping it printed a warning: the Docker service has been stopped, but docker.socket can still activate it. Checking the node status again:
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Reachable        20.10.3
xby86ffkqw3axyfkwd4s7nubz     zhangmh-virtual-machine   Ready     Active         Leader           20.10.1
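192.168.1.116 has now become the Leader, and 192.168.1.220 has been woken up again. (The asterisk marks the node the command ran on, so this docker node ls was issued on 192.168.1.220, which is presumably what reactivated the daemon through docker.socket.)
So the Docker cluster holds up quite well.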
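Service Management
With the nodes in place, services can be created. A Docker service is essentially a way of starting containers that also gives them replicas and load balancing. Taking the ws:1.0 image built earlier as the example, create five replicas: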
# docker service create --replicas 5 --name ws -p 80:8000 ws:1.0
image ws:1.0 could not be accessed on a registry to record
its digest. Each node will access ws:1.0 independently,
possibly leading to different nodes running different
versions of the image.
1nj3o38slbo2zwt5p69l1qi5t
overall progress: 5 out of 5 tasks
1/5: running   [==================================================>]
2/5: running   [==================================================>]
3/5: running   [==================================================>]
4/5: running   [==================================================>]
5/5: running   [==================================================>]
verify: Service converged
The service has been created and is running. In a browser, port 80 answers on both 192.168.1.220 and 192.168.1.116.
The docker service ls command lists the ws service:
# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE     PORTS
1nj3o38slbo2   ws        replicated   5/5        ws:1.0    *:80->8000/tcp
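The REPLICAS column reads running/desired, so a script can tell whether a service has converged by comparing the two halves. The following is a minimal sketch, assuming the value has the plain N/M form seen above (anything after a space is ignored); replicas_ready is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds when a REPLICAS value such as "5/5" shows that the number
# of running tasks equals the desired number.
replicas_ready() {
    value=${1%% *}         # ignore anything after the first space
    running=${value%%/*}   # part before the slash
    desired=${value##*/}   # part after the slash
    [ -n "$desired" ] && [ "$running" = "$desired" ]
}

# On a manager the value could be read with, for example:
#   docker service ls --filter name=ws --format '{{.Replicas}}'
if replicas_ready "5/5"; then
    echo "ws has converged"
fi
```

A value such as 3/5 makes the function fail, which a deployment script could use as the condition for waiting and checking again.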
The docker service ps ws command lists the processes of the ws service (Swarm calls them tasks):
# docker service ps ws
ID             NAME      IMAGE     NODE                      DESIRED STATE   CURRENT STATE           ERROR     PORTS
jpckj0mn24ae   ws.1      ws:1.0    zhangmh-virtual-machine   Running         Running 6 minutes ago
yrrdn4ntb089   ws.2      ws:1.0    localhost.localdomain     Running         Running 6 minutes ago
mdjxadbmlmhs   ws.3      ws:1.0    zhangmh-virtual-machine   Running         Running 6 minutes ago
kqdwfrddbaxd   ws.4      ws:1.0    localhost.localdomain     Running         Running 6 minutes ago
is2iimz1v4eb   ws.5      ws:1.0    zhangmh-virtual-machine   Running         Running 6 minutes ago
Two processes are running on 192.168.1.220 and three on 192.168.1.116. After visiting the page a few times in a browser, I checked the service logs with docker service logs ws:
# docker service logs ws
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [I 210219 01:57:23 web:2239] 200 GET / (10.0.0.2) 3.56ms
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [W 210219 01:57:23 web:2239] 404 GET /favicon.ico (10.0.0.2) 0.97ms
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [I 210219 01:57:28 web:2239] 200 GET / (10.0.0.4) 0.82ms
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [W 210219 01:57:28 web:2239] 404 GET /favicon.ico (10.0.0.4) 0.79ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:01:45 web:2239] 304 GET / (10.0.0.2) 1.82ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:01:59 web:2239] 304 GET / (10.0.0.2) 0.49ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:01 web:2239] 304 GET / (10.0.0.2) 2.05ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:02 web:2239] 304 GET / (10.0.0.2) 0.89ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:02 web:2239] 304 GET / (10.0.0.2) 1.13ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:03 web:2239] 304 GET / (10.0.0.2) 0.92ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:03 web:2239] 304 GET / (10.0.0.2) 2.19ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:20 web:2239] 304 GET / (10.0.0.2) 1.00ms
Even though I was visiting 192.168.1.220, the requests were actually served by processes on 192.168.1.116.
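This is the effect of Swarm's routing mesh: a published port is open on every node, and a request arriving at any node may be handed to a task on another one. To see how requests were spread, the log lines can be tallied by their first field, which names the task and its host. The following is a minimal sketch: requests_per_task is a hypothetical helper name, and the three sample lines are copied from the log above.

```shell
#!/bin/sh
# Count log lines per task. The first field of each line looks like
# "ws.5.is2iimz1v4eb@zhangmh-virtual-machine".
requests_per_task() {
    awk '{ count[$1]++ } END { for (t in count) print count[t], t }' | sort -rn
}

# Three sample lines from the log, so the sketch runs without Docker.
# With a live service the input would be: docker service logs ws
printf '%s\n' \
    'ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [I 210219 01:57:23 web:2239] 200 GET / (10.0.0.2) 3.56ms' \
    'ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:01:45 web:2239] 304 GET / (10.0.0.2) 1.82ms' \
    'ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:01:59 web:2239] 304 GET / (10.0.0.2) 0.49ms' \
    | requests_per_task
```

For this sample the tally puts ws.1 first with two lines, followed by ws.5 with one, both on zhangmh-virtual-machine.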
If 192.168.1.116 is shut down, the processes running on it move automatically to 192.168.1.220. But 192.168.1.116 is currently a manager, and stopping it would leave the cluster unavailable, so it first has to be demoted to a worker:
# docker node demote xby86ffkqw3axyfkwd4s7nubz
Manager xby86ffkqw3axyfkwd4s7nubz demoted in the swarm.
Then shut down 192.168.1.116.
# docker service ps ws
ID             NAME       IMAGE     NODE                      DESIRED STATE   CURRENT STATE                ERROR  PORTS
jrj9ben9vr5c   ws.1       ws:1.0    localhost.localdomain     Running         Running 57 minutes ago
yrrdn4ntb089   ws.2       ws:1.0    localhost.localdomain     Running         Running about an hour ago
opig9zrmp261   ws.3       ws:1.0    localhost.localdomain     Running         Running 57 minutes ago
kqdwfrddbaxd   ws.4       ws:1.0    localhost.localdomain     Running         Running about an hour ago
hiz8730pl3je   ws.5       ws:1.0    localhost.localdomain     Running         Running 57 minutes ago
All five processes have moved to 192.168.1.220.
# docker ps
CONTAINER ID   IMAGE     COMMAND                  CREATED       STATUS       PORTS     NAMES
bc4c457ce769   ws:1.0    "/bin/sh -c 'python …"   3 hours ago   Up 3 hours             ws.5.hiz8730pl3je7qvo2lv6k554b
c846ac1c4d91   ws:1.0    "/bin/sh -c 'python …"   3 hours ago   Up 3 hours             ws.3.opig9zrmp2619t4e1o3ntnj2w
214daa36c138   ws:1.0    "/bin/sh -c 'python …"   3 hours ago   Up 3 hours             ws.1.jrj9ben9vr5c3biuc90xtoffh
17842db9dc47   ws:1.0    "/bin/sh -c 'python …"   3 hours ago   Up 3 hours             ws.4.kqdwfrddbaxd5z78uo3zsy5sd
47185ba9a4fd   ws:1.0    "/bin/sh -c 'python …"   3 hours ago   Up 3 hours             ws.2.yrrdn4ntb089t6i66w8xvq8r9
# docker kill bc4c457ce769
bc4c457ce769
After killing the fifth process, wait a few seconds and list the containers again:
# docker ps
CONTAINER ID   IMAGE     COMMAND                  CREATED              STATUS              PORTS     NAMES
416b55e8d174   ws:1.0    "/bin/sh -c 'python …"   About a minute ago   Up About a minute             ws.5.fvpm334t2zqbj5l50tyx5glr6
c846ac1c4d91   ws:1.0    "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                    ws.3.opig9zrmp2619t4e1o3ntnj2w
214daa36c138   ws:1.0    "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                    ws.1.jrj9ben9vr5c3biuc90xtoffh
17842db9dc47   ws:1.0    "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                    ws.4.kqdwfrddbaxd5z78uo3zsy5sd
47185ba9a4fd   ws:1.0    "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                    ws.2.yrrdn4ntb089t6i66w8xvq8r9
The fifth process has been started again.
The number of replicas of a Docker service can be adjusted on the fly. When the system load gets too high and more replicas are needed, for example, just run:
# docker service scale ws=6
ws scaled to 6
overall progress: 6 out of 6 tasks
1/6: running   [==================================================>]
2/6: running   [==================================================>]
3/6: running   [==================================================>]
4/6: running   [==================================================>]
5/6: running   [==================================================>]
6/6: running   [==================================================>]
verify: Service converged
That adds one replica.
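The restarted fifth process and the replica added by scaling can both be read as the same rule: compare the desired replica count with the number actually running, then close the gap. The sketch below is only a conceptual model of that comparison, not Swarm's actual scheduler, and reconcile is an illustrative name.

```shell
#!/bin/sh
# Conceptual model: given the desired and the running replica counts,
# say what would be needed to make them match.
reconcile() {
    desired=$1
    running=$2
    if [ "$running" -lt "$desired" ]; then
        echo "start $((desired - running))"
    elif [ "$running" -gt "$desired" ]; then
        echo "stop $((running - desired))"
    else
        echo "ok"
    fi
}

reconcile 5 4   # one task was killed: start 1
reconcile 6 5   # scaled from 5 to 6: start 1
reconcile 5 5   # nothing to do: ok
```

Seen this way, docker service scale only changes the desired number; starting or stopping containers follows from the comparison.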
Once created, a service starts along with the Docker system service. Just run:
# systemctl enable docker
The cluster and services created above will then start at boot, so there is no need to worry about a reboot leaving the application down.
Shared Data Volumes
First, create a data volume with the docker volume create command:
# docker volume create ws_volume
ws_volume
Once it is created, docker volume ls lists the existing volumes:
# docker volume ls
DRIVER    VOLUME NAME
local     ws_volume
The docker inspect command shows the volume's details:
# docker inspect ws_volume
[
    {
        "CreatedAt": "2021-02-19T14:09:58+08:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/ws_volume/_data",
        "Name": "ws_volume",
        "Options": {},
        "Scope": "local"
    }
]
When creating a service, the --mount option mounts the volume into it:
# docker service create --replicas 2 --name ws -p 80:8000 --mount type=volume,src=ws_volume,dst=/volume ws:1.0
image ws:1.0 could not be accessed on a registry to record
its digest. Each node will access ws:1.0 independently,
possibly leading to different nodes running different
versions of the image.
iiiit9slq9qqwcdwwi0w0mcz5
overall progress: 2 out of 2 tasks
1/2: running   [==================================================>]
2/2: running   [==================================================>]
verify: Service converged
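--mount takes many sub-options; write them as key=value pairs separated by commas. In the simplest case only type, src, and dst need to be set. Note that the volume above uses the local driver ("Scope": "local"), so each node that runs a replica gets its own ws_volume; replicas on different nodes share data only if the volume is backed by shared storage.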
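Because the --mount value is a comma-separated list of key=value pairs, it is easy to build or take apart in a script. A minimal sketch follows, assuming only the three keys used above; mount_spec and mount_get are hypothetical helper names.

```shell
#!/bin/sh
# Build a --mount value from type, source, and destination.
mount_spec() {
    printf 'type=%s,src=%s,dst=%s\n' "$1" "$2" "$3"
}

# Read one key back out of a --mount value.
mount_get() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F= -v key="$2" '$1 == key { print $2 }'
}

spec=$(mount_spec volume ws_volume /volume)
echo "$spec"                  # type=volume,src=ws_volume,dst=/volume
mount_get "$spec" dst         # /volume
```

The resulting string is what follows --mount in the docker service create command shown above.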
Author | 天元浪子
Source | CSDN blog

