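Original: https://ww2.mathworks.cn/help/deeplearning/ug/train-a-siamese-network-for-dimensionality-reduction.html
This example shows how to train a Siamese network to compare handwritten digits using dimensionality reduction.
A Siamese network is a type of deep learning network that uses two or more identical subnetworks that have the same architecture and share the same parameters and weights. Siamese networks are typically used in tasks that involve finding the relationship between two comparable things. Common applications include facial recognition, signature verification [1], and paraphrase identification [2]. Siamese networks perform well in these tasks because their shared weights mean there are fewer parameters to learn during training, and they can produce good results with a relatively small amount of training data.
Siamese networks are particularly useful when there are many classes with only a few observations of each. In such cases, there is not enough data to train a deep convolutional neural network to classify images into these classes. Instead, the Siamese network can determine whether two images are in the same class. The network does this by reducing the dimensionality of the training data and using a distance-based cost function to differentiate between the classes.
This example uses a Siamese network for dimensionality reduction of a collection of images of handwritten digits. The Siamese architecture reduces the dimensionality by mapping images with the same class to nearby points in a low-dimensional space. The reduced-feature representation is then used to extract the images from a dataset that are most similar to a test image. The training data in this example are images of size 28-by-28-by-1, giving an initial feature dimension of 784. The Siamese network reduces the dimensionality of the input images to two features and is trained to output similar reduced features for images with the same label.
You can also use Siamese networks to identify similar images by directly comparing them. For an example, see Train a Siamese Network to Compare Images.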

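Load and Preprocess Training Data

Load the training data, which consists of images of handwritten digits. The function digitTrain4DArrayData loads the digit images and their labels.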

[XTrain,YTrain] = digitTrain4DArrayData;

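XTrain is a 28-by-28-by-1-by-5000 array containing 5000 single-channel images, each of size 28-by-28. The values of each pixel are between 0 and 1. YTrain is a categorical vector containing the label for each observation; the labels are the numbers from 0 to 9 corresponding to the value of the written digit.
Display a random selection of the images.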

perm = randperm(numel(YTrain), 9);
imshow(imtile(XTrain(:,:,:,perm),"ThumbnailSize",[100 100]));


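Create Pairs of Similar and Dissimilar Images

To train the network, the data must be grouped into pairs of images that are either similar or dissimilar. Here, similar images are defined as having the same label, while dissimilar images have different labels. The function getSiameseBatch (defined in the Supporting Functions section of this example) creates randomized pairs of similar or dissimilar images, pairImage1 and pairImage2. The function also returns the label pairLabel, which identifies whether the pair of images is similar or dissimilar. Similar pairs of images have pairLabel = 1, while dissimilar pairs have pairLabel = 0.
As an example, create a small representative set of ten pairs of images.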

batchSize = 10;
[pairImage1,pairImage2,pairLabel] = getSiameseBatch(XTrain,YTrain,batchSize);

Display the generated pairs of images.

for i = 1:batchSize
    subplot(2,5,i)
    imshow([pairImage1(:,:,:,i) pairImage2(:,:,:,i)]);
    if pairLabel(i) == 1
        s = "similar";
    else
        s = "dissimilar";
    end
    title(s)
end

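In this example, a new batch of 180 paired images is created for each iteration of the training loop. This ensures that the network is trained on a large number of random pairs of images with approximately equal proportions of similar and dissimilar pairs.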
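The claim about approximately equal proportions can be checked on toy data. The following is a minimal sketch, assuming a hypothetical label vector with three classes and the same coin-flip sampling scheme that getSiameseBatch uses; the variable names (labels, numPairs, sameClass, fractionSimilar) are illustrative and not part of the original example.

```matlab
% Toy labels: 3 classes, 4 observations each (hypothetical data).
rng(0);                                  % fixed seed, for repeatability only
labels = categorical([0 0 0 0 1 1 1 1 2 2 2 2]);
classes = unique(labels);
numPairs = 2000;
pairLabel = zeros(1,numPairs);
sameClass = false(1,numPairs);

for i = 1:numPairs
    if rand < 0.5
        % Similar pair: two different observations from one random class.
        idxs = find(labels == classes(randi(numel(classes))));
        pick = idxs(randperm(numel(idxs),2));
        pairLabel(i) = 1;
    else
        % Dissimilar pair: one observation from each of two random classes.
        c = classes(randperm(numel(classes),2));
        idxs1 = find(labels == c(1));
        idxs2 = find(labels == c(2));
        pick = [idxs1(randi(numel(idxs1))) idxs2(randi(numel(idxs2)))];
        pairLabel(i) = 0;
    end
    % Record whether the two chosen observations really share a label.
    sameClass(i) = labels(pick(1)) == labels(pick(2));
end

% Fraction of pairs labeled as similar; expected to be close to 0.5.
fractionSimilar = mean(pairLabel);
fprintf('Fraction of similar pairs: %.3f\n', fractionSimilar);
```

Because each pair is drawn independently with probability 0.5 of being similar, the balance holds only on average; an individual mini-batch of 180 pairs can deviate from an exact 50/50 split.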

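Define Network Architecture

The Siamese network architecture is illustrated in the following diagram.
[Figure: Siamese network architecture]
In this example, the two identical subnetworks are defined as a series of fully connected layers with ReLU layers. Create a network that accepts 28-by-28-by-1 images and outputs a vector of two features used for the reduced-feature representation. The network reduces the dimensionality of the input images to two, a value that is easier to plot and visualize than the initial dimensionality of 784.
For the first two fully connected layers, specify an output size of 1024 and use the He weight initializer.
For the final fully connected layer, specify an output size of 2 and use the He weight initializer.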

layers = [
    imageInputLayer([28 28],'Name','input1','Normalization','none')
    fullyConnectedLayer(1024,'Name','fc1','WeightsInitializer','he')
    reluLayer('Name','relu1')
    fullyConnectedLayer(1024,'Name','fc2','WeightsInitializer','he')
    reluLayer('Name','relu2')
    fullyConnectedLayer(2,'Name','fc3','WeightsInitializer','he')];

lgraph = layerGraph(layers);
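To get a feel for the size of this subnetwork, the number of learnable parameters can be worked out by hand. The sketch below assumes that each fully connected layer has a bias term (the default for fullyConnectedLayer) and that the ReLU layers have no learnable parameters; the variable names are illustrative.

```matlab
% Layer sizes: 784 inputs -> 1024 -> 1024 -> 2 outputs.
layerSizes = [28*28*1, 1024, 1024, 2];

% A fully connected layer with nIn inputs and nOut outputs has
% nIn*nOut weights plus nOut biases.
nIn  = layerSizes(1:end-1);
nOut = layerSizes(2:end);
paramsPerLayer = nIn.*nOut + nOut;
totalParams = sum(paramsPerLayer);

fprintf('Parameters per layer: %s\n', mat2str(paramsPerLayer));
fprintf('Total learnable parameters: %d\n', totalParams);
```

Under these assumptions the three layers hold 803840, 1049600, and 2050 parameters, for a total of 1855490. Because both branches of the Siamese network share the same weights, this is the parameter count for the whole model, not for one half of it.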

To train the network with a custom training loop and enable automatic differentiation, convert the layer graph to a dlnetwork object.

dlnet = dlnetwork(lgraph);

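Define Model Gradients Function

Create the function modelGradients (defined in the Supporting Functions section of this example). The modelGradients function takes the Siamese dlnetwork object dlnet and a mini-batch of input data dlX1 and dlX2 with their labels pairLabels. The function returns the gradients of the loss with respect to the learnable parameters of the network, together with the loss value.
The objective of the Siamese network is to output a feature vector for each image such that the feature vectors are similar for similar images and notably different for dissimilar images. In this way, the network can discriminate between the two inputs.
Find the contrastive loss between the outputs of the last fully connected layer, that is, the feature vectors features1 and features2 from pairImage1 and pairImage2, respectively. The contrastive loss for a pair is given by [3]: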
$$\text{loss} = \frac{1}{2}\, y\, d^2 + \frac{1}{2}\,(1 - y)\,\max(\text{margin} - d,\ 0)^2,$$
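where y is the value of the pair label (y = 1 for similar images; y = 0 for dissimilar images), and d is the Euclidean distance between the two feature vectors f1 and f2: $d = \lVert f_1 - f_2 \rVert_2$.
The margin parameter acts as a constraint: if the two images in a pair are dissimilar, then their distance should be at least margin, or a loss is incurred.
The contrastive loss has two terms, but only one is ever non-zero for a given image pair. For similar images, the first term can be non-zero and is minimized by reducing the distance between the image features f1 and f2. For dissimilar images, the second term can be non-zero and is minimized by increasing the distance between the image features to at least margin. The smaller the value of margin, the closer a dissimilar pair can be before a loss is incurred.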
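A small worked example may make the two terms concrete. The sketch below evaluates the contrastive loss for three hypothetical pairs of 2-D feature vectors with margin = 0.3; the feature values are made up for illustration, and the small constant that the supporting function adds inside the square root is omitted here.

```matlab
margin = 0.3;

% Each column is a 2-D feature vector (hypothetical values).
f1 = [0.10 0.10 0.10;
      0.20 0.20 0.20];
f2 = [0.10 0.50 0.20;
      0.30 0.20 0.20];
y  = [1 0 0];              % 1 = similar pair, 0 = dissimilar pair

% Euclidean distance between the paired feature vectors, one per column.
d = vecnorm(f1 - f2);

% Per-pair contrastive loss: only one of the two terms is non-zero.
pairLoss = 0.5*y.*d.^2 + 0.5*(1 - y).*max(margin - d, 0).^2;

disp(d)
disp(pairLoss)
```

The first pair is similar with d = 0.1, so only the first term contributes (0.5 × 0.1² = 0.005). The second pair is dissimilar with d = 0.4, which already exceeds the margin, so its loss is 0. The third pair is dissimilar but only 0.1 apart, so the second term penalizes it (0.5 × 0.2² = 0.02).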

Specify Training Options

Specify the value of margin to use during training.

margin = 0.3;

Specify the options to use during training. Train for 3000 iterations with a mini-batch size of 180.

numIterations = 3000;
miniBatchSize = 180;

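Specify the options for Adam optimization:
Set the learning rate to 0.0001.
Initialize the trailing average of the gradient and the trailing average of the squared gradient with [].
Set the gradient decay factor to 0.9 and the squared gradient decay factor to 0.99.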

learningRate = 1e-4;
trailingAvg = [];
trailingAvgSq = [];
gradDecay = 0.9;
gradDecaySq = 0.99;
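To illustrate what these options control, here is a minimal sketch of a single textbook Adam step on a hypothetical two-element parameter vector. It is not the internal implementation of adamupdate; epsilon, theta, and grad are assumed values chosen for illustration, and zeros stand in for the empty [] initial state.

```matlab
learningRate = 1e-4;
gradDecay = 0.9;
gradDecaySq = 0.99;
epsilon = 1e-8;            % assumed small constant for numerical stability

theta = [0.5; -0.2];       % hypothetical parameters
grad  = [0.1; -0.3];       % hypothetical gradient of the loss
trailingAvg   = zeros(size(theta));   % stands in for [] at iteration 1
trailingAvgSq = zeros(size(theta));
iteration = 1;

% Exponentially decaying averages of the gradient and squared gradient.
trailingAvg   = gradDecay*trailingAvg + (1 - gradDecay)*grad;
trailingAvgSq = gradDecaySq*trailingAvgSq + (1 - gradDecaySq)*grad.^2;

% Bias correction compensates for the zero initialization.
avgHat   = trailingAvg/(1 - gradDecay^iteration);
avgSqHat = trailingAvgSq/(1 - gradDecaySq^iteration);

% Parameter update scaled by the learning rate.
thetaNew = theta - learningRate*avgHat./(sqrt(avgSqHat) + epsilon);
disp(thetaNew)
```

At the first iteration the bias-corrected averages reduce to the gradient and its square, so each parameter moves by almost exactly learningRate in the direction opposite to the sign of its gradient.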

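Train on a GPU, if one is available. Using a GPU requires Parallel Computing Toolbox and a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox). To automatically detect whether a GPU is available and place the relevant data on the GPU, set the value of executionEnvironment to "auto". If you do not have a GPU, or do not want to use one for training, set the value of executionEnvironment to "cpu". To ensure that you use a GPU for training, set the value of executionEnvironment to "gpu".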

executionEnvironment = "auto";

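To monitor the progress of the training, you can plot the training loss after each iteration. Create the variable plots that contains "training-progress". If you do not want to plot the training progress, set this value to "none".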

plots = "training-progress";

Initialize the plot parameters for the training loss progress plot.

plotRatio = 16/9;
if plots == "training-progress"
    trainingPlot = figure;
    trainingPlot.Position(3) = plotRatio*trainingPlot.Position(4);
    trainingPlot.Visible = 'on';
    trainingPlotAxes = gca;
    lineLossTrain = animatedline(trainingPlotAxes);
    xlabel(trainingPlotAxes,"Iteration")
    ylabel(trainingPlotAxes,"Loss")
    title(trainingPlotAxes,"Loss During Training")
end

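To evaluate how well the network performs dimensionality reduction, compute and plot the reduced features of a set of test data after each iteration. Load the test data, which consists of images of handwritten digits similar to the training data. Convert the test data to a dlarray and specify the dimension labels 'SSCB' (spatial, spatial, channel, batch). If you are using a GPU, convert the test data to a gpuArray.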

[XTest,YTest] = digitTest4DArrayData;
dlXTest = dlarray(single(XTest),'SSCB');
% If training on a GPU, then convert data to gpuArray.
if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
    dlXTest = gpuArray(dlXTest);
end

Initialize the plot parameters for the reduced-feature plot of the test data.

dimensionPlot = figure;
dimensionPlot.Position(3) = plotRatio*dimensionPlot.Position(4);
dimensionPlot.Visible = 'on';
dimensionPlotAxes = gca;
uniqueGroups = unique(YTest);
colors = hsv(length(uniqueGroups));

Initialize a counter to keep track of the total number of iterations.

iteration = 1;

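Train the Model

Train the model using a custom training loop. At each iteration, draw a new mini-batch of image pairs from the training data and update the network parameters.
For each iteration:
Extract a batch of image pairs and labels using the getSiameseBatch function defined in the Create Batches of Image Pairs section.
Convert the image data to dlarray objects with underlying type single and specify the dimension labels 'SSCB' (spatial, spatial, channel, batch).
For GPU training, convert the image data to gpuArray objects.
Evaluate the model gradients using dlfeval and the modelGradients function.
Update the network parameters using the adamupdate function.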

% Loop over mini-batches.
for iteration = 1:numIterations
    % Extract mini-batch of image pairs and pair labels
    [X1,X2,pairLabels] = getSiameseBatch(XTrain,YTrain,miniBatchSize);

    % Convert mini-batch of data to dlarray. Specify the dimension labels
    % 'SSCB' (spatial, spatial, channel, batch) for image data
    dlX1 = dlarray(single(X1),'SSCB');
    dlX2 = dlarray(single(X2),'SSCB');

    % If training on a GPU, then convert data to gpuArray.
    if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
        dlX1 = gpuArray(dlX1);
        dlX2 = gpuArray(dlX2);
    end

    % Evaluate the loss and the model gradients using dlfeval and the
    % modelGradients function listed at the end of the example.
    [gradients,loss] = dlfeval(@modelGradients,dlnet,dlX1,dlX2,pairLabels,margin);
    lossValue = double(gather(extractdata(loss)));

    % Update the Siamese network parameters.
    [dlnet.Learnables,trailingAvg,trailingAvgSq] = ...
        adamupdate(dlnet.Learnables,gradients, ...
        trailingAvg,trailingAvgSq,iteration,learningRate,gradDecay,gradDecaySq);

    % Update the training loss progress plot.
    if plots == "training-progress"
        addpoints(lineLossTrain,iteration,lossValue);
    end

    % Update the reduced-feature plot of the test data.
    % Compute reduced features of the test data:
    dlFTest = predict(dlnet,dlXTest);
    FTest = extractdata(dlFTest);

    figure(dimensionPlot);
    for k = 1:length(uniqueGroups)
        % Get indices of each image in test data with the same numeric
        % label (defined by the unique group):
        ind = YTest==uniqueGroups(k);
        % Plot this group:
        plot(dimensionPlotAxes,gather(FTest(1,ind)'),gather(FTest(2,ind)'),'.','color',...
            colors(k,:));
        hold on
    end

    legend(uniqueGroups)

    % Update title of reduced-feature plot with training progress information.
    title(dimensionPlotAxes,"2-D Feature Representation of Digits Images. Iteration = " +...
        iteration);
    legend(dimensionPlotAxes,'Location','eastoutside');
    xlabel(dimensionPlotAxes,"Feature 1")
    ylabel(dimensionPlotAxes,"Feature 2")

    hold off
    drawnow
end

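The network has now learned to represent each image as a 2-D vector. As the reduced-feature plot of the test data shows, images of the same digit are clustered close to each other in this 2-D representation.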

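Use the Trained Network to Find Similar Images

You can use the trained network to find a selection of images that are similar to each other out of a group. In this case, use the test data as the group of images. Convert the group of images to a dlarray object and, if you are using a GPU, to a gpuArray object.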

groupX = XTest;
dlGroupX = dlarray(single(groupX),'SSCB');
if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
    dlGroupX = gpuArray(dlGroupX);
end

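Extract a single test image from the group and display it. Remove the test image from the group so that it does not appear in the set of similar images.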

testIdx = randi(5000);
testImg = dlGroupX(:,:,:,testIdx);
trialImgDisp = extractdata(testImg);
figure
imshow(trialImgDisp, 'InitialMagnification', 500);


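dlGroupX(:,:,:,testIdx) = [];
% Also remove the image from groupX so that indices computed from
% dlGroupX refer to the same images in groupX.
groupX(:,:,:,testIdx) = [];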

Find the reduced features of the test image using predict.

trialF = predict(dlnet,testImg);

Find the 2-D reduced-feature representation of each image in the group using the trained network.

FGroupX = predict(dlnet,dlGroupX);

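Use the reduced-feature representation and the Euclidean distance metric to find the nine images in the group that are closest to the test image. Display the images.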

distances = vecnorm(extractdata(trialF - FGroupX));
[~,idx] = sort(distances);
sortedImages = groupX(:,:,:,idx);
figure
imshow(imtile(sortedImages(:,:,:,1:9)), 'InitialMagnification', 500);

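By reducing the images to a lower dimensionality, the network is able to identify images that are similar to the test image. The reduced-feature representation allows the network to discriminate between similar and dissimilar images. Siamese networks are often used for face or signature recognition. For example, you can train a Siamese network to accept an image of a face as input and return a set of the most similar faces from a database.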
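The distance-based lookup used in this section can be tried on toy data without a trained network. The sketch below assumes a hypothetical query feature and five hypothetical group features in the 2-D reduced space; queryF, groupF, and k are illustrative names and values.

```matlab
queryF = [0; 0];                       % hypothetical query feature (2-by-1)

% Hypothetical group features, one column per image (2-by-5).
groupF = [3 0.1 -2 0.5 1;
          4 0.0  0 0.5 1];

% Euclidean distance from the query to every column of groupF.
distances = vecnorm(queryF - groupF);

% Sort ascending and keep the indices of the k closest images.
[sortedDist, idx] = sort(distances);
k = 3;
nearestIdx = idx(1:k);

disp(nearestIdx)
```

With these values the distances are 5, 0.1, 2, about 0.71, and about 1.41, so the three nearest columns are 2, 4, and 5. The returned indices refer to columns of groupF, which is why any array used to display the images must contain the same images in the same order as the array the features were computed from.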

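Supporting Functions

Model Gradients Function

The function modelGradients takes the Siamese dlnetwork object net, a pair of mini-batch input data X1 and X2, the label pairLabel indicating whether each pair is similar or dissimilar, and the margin value. The function returns the gradients of the loss with respect to the learnable parameters in the network, as well as the contrastive loss between the reduced-dimensionality features of the paired images. Within this example, the function modelGradients is introduced in the section Define Model Gradients Function.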

function [gradients, loss] = modelGradients(net,X1,X2,pairLabel,margin)
% The modelGradients function calculates the contrastive loss between the
% paired images and returns the loss and the gradients of the loss with
% respect to the network learnable parameters

    % Pass first half of image pairs forward through the network
    F1 = forward(net,X1);
    % Pass second set of image pairs forward through the network
    F2 = forward(net,X2);

    % Calculate contrastive loss
    loss = contrastiveLoss(F1,F2,pairLabel,margin);

    % Calculate gradients of the loss with respect to the network learnable
    % parameters
    gradients = dlgradient(loss, net.Learnables);
end

function loss = contrastiveLoss(F1,F2,pairLabel,margin)
% The contrastiveLoss function calculates the contrastive loss between
% the reduced features of the paired images

    % Define small value to prevent taking square root of 0
    delta = 1e-6;

    % Find Euclidean distance metric
    distances = sqrt(sum((F1 - F2).^2,1) + delta);

    % label(i) = 1 if features1(:,i) and features2(:,i) are features
    % for similar images, and 0 otherwise
    lossSimilar = pairLabel.*(distances.^2);
    lossDissimilar = (1 - pairLabel).*(max(margin - distances, 0).^2);

    loss = 0.5*sum(lossSimilar + lossDissimilar,'all');
end

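Create Batches of Image Pairs

The following functions create randomized pairs of images that are similar or dissimilar, based on their labels. Within this example, the function getSiameseBatch is introduced in the section Create Pairs of Similar and Dissimilar Images.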

function [X1,X2,pairLabels] = getSiameseBatch(X,Y,miniBatchSize)
% getSiameseBatch returns a randomly selected batch of paired images.
% On average, this function produces a balanced set of similar and
% dissimilar pairs.
    pairLabels = zeros(1, miniBatchSize);
    imgSize = size(X(:,:,:,1));
    X1 = zeros([imgSize 1 miniBatchSize]);
    X2 = zeros([imgSize 1 miniBatchSize]);

    for i = 1:miniBatchSize
        choice = rand(1);
        if choice < 0.5
            [pairIdx1, pairIdx2, pairLabels(i)] = getSimilarPair(Y);
        else
            [pairIdx1, pairIdx2, pairLabels(i)] = getDissimilarPair(Y);
        end
        X1(:,:,:,i) = X(:,:,:,pairIdx1);
        X2(:,:,:,i) = X(:,:,:,pairIdx2);
    end
end

function [pairIdx1,pairIdx2,pairLabel] = getSimilarPair(classLabel)
% getSimilarPair returns a random pair of indices for images
% that are in the same class and the similar pair label = 1.

    % Find all unique classes.
    classes = unique(classLabel);

    % Choose a class randomly which will be used to get a similar pair.
    classChoice = randi(numel(classes));

    % Find the indices of all the observations from the chosen class.
    idxs = find(classLabel==classes(classChoice));

    % Randomly choose two different images from the chosen class.
    pairIdxChoice = randperm(numel(idxs),2);
    pairIdx1 = idxs(pairIdxChoice(1));
    pairIdx2 = idxs(pairIdxChoice(2));
    pairLabel = 1;
end

function [pairIdx1,pairIdx2,pairLabel] = getDissimilarPair(classLabel)
% getDissimilarPair returns a random pair of indices for images
% that are in different classes and the dissimilar pair label = 0.

    % Find all unique classes.
    classes = unique(classLabel);

    % Choose two different classes randomly which will be used to get a dissimilar pair.
    classesChoice = randperm(numel(classes), 2);

    % Find the indices of all the observations from the first and second classes.
    idxs1 = find(classLabel==classes(classesChoice(1)));
    idxs2 = find(classLabel==classes(classesChoice(2)));

    % Randomly choose one image from each class.
    pairIdx1Choice = randi(numel(idxs1));
    pairIdx2Choice = randi(numel(idxs2));
    pairIdx1 = idxs1(pairIdx1Choice);
    pairIdx2 = idxs2(pairIdx2Choice);
    pairLabel = 0;
end

References

[1] Bromley, J., I. Guyon, Y. LeCun, E. Säckinger, and R. Shah. "Signature Verification using a "Siamese" Time Delay Neural Network." In Proceedings of the 6th International Conference on Neural Information Processing Systems (NIPS 1993), 1994, pp. 737-744. Available on the NIPS Proceedings website.

[2] Yin, W., and H. Schütze. "Convolutional Neural Network for Paraphrase Identification." In Proceedings of the 2015 Conference of the North American Chapter of the ACL, 2015, pp. 901-911. Available on the ACL Anthology website.

[3] Hadsell, R., S. Chopra, and Y. LeCun. "Dimensionality Reduction by Learning an Invariant Mapping." In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006, pp. 1735-1742.