GAN-Based Semantic Segmentation for Self-Driving Cars

Semantic segmentation is one of the key concepts in computer vision: it lets a computer understand a scene by color-coding the objects in an image according to their class. GANs are built on the idea of replicating and generating original content from real examples, which makes them well suited to semantic segmentation of street-view images. Segmenting the different parts of a scene allows an agent navigating the environment to act appropriately.
The data comes from a Kaggle dataset in which each street-view image and its segmented counterpart are paired side by side in a single file. To build the dataset, every image therefore has to be split in two, separating the semantic map from the street-view photo for each example.
from PIL import Image
from IPython.display import clear_output
import numpy as np

semantic = []
real = []
semantic_imgs = []
real_imgs = []
counter = 0
for img in img_paths:
    if 'jpg' in img:
        im = Image.open(img)
        # Left half of the file is the street-view photo
        left, top, right, bottom = 0, 0, 256, 256
        real_img = im.crop((left, top, right, bottom))
        real_imgs.append(real_img)
        real.append(np.array(real_img.getdata()).reshape(256, 256, 3))
        # Right half is the paired semantic map
        left, top, right, bottom = 256, 0, 512, 256
        semantic_img = im.crop((left, top, right, bottom))
        semantic_imgs.append(semantic_img)
        semantic.append(np.array(semantic_img.getdata()).reshape(256, 256, 3))
        counter += 1
        print(counter)
        if counter % 10 == 0:
            clear_output()
    else:
        print(img)
This script crops each image into two halves and records the pixel values. The PIL image objects are also kept, so the results can be displayed later without reprocessing.
import numpy as np

semantic = np.array(semantic)
real = np.array(real)
X = real
y = semantic
With both lists converted to numpy arrays, the x and y values can be defined directly. Depending on your goal, you can swap the x and y values to control what the model outputs. Here we want to translate real images into semantic maps; later, we will also try training the GAN to translate semantic maps back into real images.
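As a minimal sketch (using the real and semantic arrays built above), switching the direction of the translation is just a matter of swapping which array plays each role:

# Real -> semantic: the direction trained first in this article
X, y = real, semantic

# Semantic -> real: swap the roles to train the reverse mapping later
X, y = semantic, real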
from numpy import expand_dims, zeros, ones, vstack
from numpy.random import randn, randint
from keras.utils import plot_model
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Flatten, Dropout
from keras.layers.convolutional import Conv2D, Conv2DTranspose
from keras.layers.pooling import MaxPooling2D
from keras.layers.merge import concatenate
from keras.initializers import RandomNormal
from keras.layers import LeakyReLU, BatchNormalization, Activation, Reshape, Concatenate
from keras.optimizers import Adam
from IPython.display import clear_output
Since we use the Keras framework to build the generator and discriminator, we need to import all the layer types required to construct the models. These include the main convolutional and transposed-convolutional layers, as well as batch normalization and leaky ReLU layers. The concatenate layer is used to build the U-Net architecture, since it lets us wire certain layers together as skip connections.
def define_discriminator(image_shape=(256,256,3)):
    init = RandomNormal(stddev=0.02)
    # Source and target images are concatenated channel-wise
    in_src_image = Input(shape=image_shape)
    in_target_image = Input(shape=image_shape)
    merged = Concatenate()([in_src_image, in_target_image])
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # Patch output: each unit classifies one patch of the input pair as real/fake
    d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
    patch_out = Activation('sigmoid')(d)
    model = Model([in_src_image, in_target_image], patch_out)
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])
    return model
The discriminator is a Keras implementation of the model used in the pix2pix GAN paper. Leaky ReLU is used instead of a standard ReLU so that negative activations still contribute, which speeds up convergence. The discriminator performs binary classification, so a sigmoid is used in the final layer and binary cross-entropy is used as the loss function.
def define_encoder_block(layer_in, n_filters, batchnorm=True):
    init = RandomNormal(stddev=0.02)
    # Downsample by a factor of 2 with a strided convolution
    g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
    if batchnorm:
        g = BatchNormalization()(g, training=True)
    g = LeakyReLU(alpha=0.2)(g)
    return g

def decoder_block(layer_in, skip_in, n_filters, dropout=True):
    init = RandomNormal(stddev=0.02)
    # Upsample by a factor of 2 with a transposed convolution
    g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
    g = BatchNormalization()(g, training=True)
    if dropout:
        g = Dropout(0.5)(g, training=True)
    # U-Net skip connection from the matching encoder block
    g = Concatenate()([g, skip_in])
    g = Activation('relu')(g)
    return g
The generator works by repeatedly encoding the input until a compact feature map of the original image is obtained, then decoding that feature map back up to a full-resolution image. This means most of the generator's layers are just encoder and decoder blocks, so once those blocks are carefully defined, there is little work left to build the generator itself.
def define_generator(image_shape=(256,256,3)):
    init = RandomNormal(stddev=0.02)
    in_image = Input(shape=image_shape)
    # Encoder: 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2
    e1 = define_encoder_block(in_image, 64, batchnorm=False)
    e2 = define_encoder_block(e1, 128)
    e3 = define_encoder_block(e2, 256)
    e4 = define_encoder_block(e3, 512)
    e5 = define_encoder_block(e4, 512)
    e6 = define_encoder_block(e5, 512)
    e7 = define_encoder_block(e6, 512)
    # Bottleneck: 2 -> 1
    b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
    b = Activation('relu')(b)
    # Decoder with skip connections back to the matching encoder blocks
    d1 = decoder_block(b, e7, 512)
    d2 = decoder_block(d1, e6, 512)
    d3 = decoder_block(d2, e5, 512)
    d4 = decoder_block(d3, e4, 512, dropout=False)
    d5 = decoder_block(d4, e3, 256, dropout=False)
    d6 = decoder_block(d5, e2, 128, dropout=False)
    d7 = decoder_block(d6, e1, 64, dropout=False)
    g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7)
    # tanh keeps the generated pixels in the (-1, 1) range used in training
    out_image = Activation('tanh')(g)
    model = Model(in_image, out_image)
    return model
Stacking these encoders and decoders gives us the generator. The hyperbolic tangent in the output layer means the generated pixels lie in (-1, 1) rather than (0, 255), so we must remember to scale the training data into (-1, 1) as well; only then can the generator's output be compared correctly against the y values.
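As a minimal sketch of this bookkeeping (the helper names to_tanh_range and to_pixel_range are our own, not from the original code), the scaling and its inverse are just linear maps around the midpoint 127.5:

import numpy as np

def to_tanh_range(pixels):
    # (0, 255) -> (-1, 1): the range produced by the tanh output layer
    return (pixels.astype(np.float32) - 127.5) / 127.5

def to_pixel_range(outputs):
    # (-1, 1) -> (0, 255): e.g. for displaying generated images
    return ((outputs + 1.0) * 127.5).astype(np.uint8)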
def define_gan(g_model, d_model, image_shape):
    # Freeze the discriminator while the combined model updates the generator
    d_model.trainable = False
    in_src = Input(shape=image_shape)
    gen_out = g_model(in_src)
    dis_out = d_model([in_src, gen_out])
    model = Model(in_src, [dis_out, gen_out])
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'mse'], optimizer=opt)
    return model
Connecting the two models together gives us the complete GAN: the generator's output is fed directly into the discriminator.
def generate_real_samples(dataset, n_samples, patch_shape):
    trainA, trainB = dataset
    # Pick a random batch of paired images
    ix = randint(0, trainA.shape[0], n_samples)
    X1, X2 = trainA[ix], trainB[ix]
    # Real patches are labelled 1
    y = ones((n_samples, patch_shape, patch_shape, 1))
    # Normalize pixel values from (0, 255) into (-1, 1)
    X1 = (X1 - 127.5) / 127.5
    X2 = (X2 - 127.5) / 127.5
    return [X1, X2], y

def generate_fake_samples(g_model, samples, patch_shape):
    X = g_model.predict(samples)
    # Generated patches are labelled 0
    y = zeros((len(X), patch_shape, patch_shape, 1))
    return X, y
For the discriminator to work, it must be fed both real samples and computer-generated samples. The process is not quite that simple, however: the values need to be normalized. Since pixel values range from 0 to 255, the equation X1 = (X1 - 127.5) / 127.5 rescales everything into the (-1, 1) range.
def train(d_model, g_model, gan_model, dataset, n_epochs=100, n_batch=10):
    # Size of the discriminator's square patch output
    n_patch = d_model.output_shape[1]
    trainA, trainB = dataset
    bat_per_epo = int(len(trainA) / n_batch)
    n_steps = bat_per_epo * n_epochs
    for i in range(n_steps):
        # A real source/target pair and a generated target for the same sources
        [X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)
        X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)
        # Update the discriminator on real and fake pairs
        d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
        d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)
        # Update the generator through the combined model
        g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])
        print('>%d, d1[%.3f] d2[%.3f] g[%.3f]' % (i+1, d_loss1, d_loss2, g_loss))
        if (i+1) % 100 == 0:
            clear_output()
This function trains the GAN. The key thing to note here is the batch size. The paper recommends a mini-batch of one (n_batch = 1), but after some testing we found that a batch size of 10 produced better results.
image_shape = (256,256,3)
d_model = define_discriminator()
g_model = define_generator()
gan_model = define_gan(g_model, d_model, image_shape)
train(d_model, g_model, gan_model, [X, y])
This script defines the image shape and calls the functions that construct the different parts of the GAN. It then calls the train function to train the model.
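After training, the generator can be sanity-checked with a short sketch like the one below (the matplotlib plotting is our addition, not part of the original script); remember that the generator consumes and produces values in (-1, 1):

import matplotlib.pyplot as plt

# Normalize one real street-view image into (-1, 1) and translate it
sample = (X[0:1] - 127.5) / 127.5
generated = g_model.predict(sample)

# Map both back to (0, 1) for display
plt.subplot(1, 2, 1)
plt.imshow((sample[0] + 1) / 2)
plt.title('input')
plt.subplot(1, 2, 2)
plt.imshow((generated[0] + 1) / 2)
plt.title('generated')
plt.show()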
Real to semantic:
Although the computer-generated image is blurry, everything in the image is color-coded correctly. Keep in mind that the computer never sees the actual semantic representation of the real image! We believe the image is blurry because a true 256 x 256 image is not very complex, and there are many colors that can throw the machine off. The generated image (on the right) can be divided up into small squares; if you count those squares, the number matches the filter count of the convolutional layers!
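One related structure worth probing (a small check we add here, not in the original post) is the discriminator's patch output: the four stride-2 convolutions halve the 256-pixel input four times, so the discriminator scores a 16 x 16 grid of patches.

# Each output unit scores one patch of the input pair as real/fake:
# 256 / 2**4 = 16, so the patch grid is 16 x 16
print(d_model.output_shape)  # expected: (None, 16, 16, 1)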

Semantic to real:
When translating semantic maps into real street-view images, we worried it might be impossible, because so much information is lost in the conversion to semantic data. For example, a red car and a green car both become blue, since cars are classified with blue pixels. This is an obvious problem: objects that could have had different colors simply do not reappear, so the generated image may end up only loosely resembling the original. Take a look at the images below:

Considering that the network was trained for only 10 epochs, we consider the project a success, and the results look promising. We hope readers will experiment with the model architecture and hyperparameters to improve the quality of the images the GAN creates.