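An Example of Face Detection and Grouping in .NET

At amusement parks, glass skywalks, ski resorts and similar venues you often see photographers at work. One headache for the operators is the sheer number of photos: for a customer, finding themselves among thousands of shots is no easy task. The same goes for a day trip or a family gathering, where too many photos make picking out the right ones painful.

Fortunately there is .NET: with a small amount of code you can find the faces and sort the photos by person.

This article uses the Cognitive Services API from the Microsoft Azure cloud to detect faces and group them. It can be used for free; sign up at https://portal.azure.com. After registering you get two keys, and either one is enough to run all the code in this article. A key looks like this (not a real key):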

fa3a7bfd807ccd6b17cf559ad584cbaa

Usage

First install the NuGet package Microsoft.Azure.CognitiveServices.Vision.Face (the latest version at the time of writing is 2.5.0-preview.1), then create a FaceClient:

string key = "fa3a7bfd807ccd6b17cf559ad584cbaa"; // replace with your own key
using var fc = new FaceClient(new ApiKeyServiceClientCredentials(key))
{
  Endpoint = "https://southeastasia.api.cognitive.microsoft.com",
};

Then run detection on a photo:

using var file = File.OpenRead(@"c:\photos\dsc_996icu.jpg");
IList<DetectedFace> faces = await fc.Face.DetectWithStreamAsync(file);

The returned faces is an IList, so a single call can clearly detect several faces. One sample result looks like this (converted to JSON):

[
  {
    "faceId": "9997b64e-6e62-4424-88b5-f4780d3767c6",
    "recognitionModel": null,
    "faceRectangle": {
      "width": 174,
      "height": 174,
      "left": 62,
      "top": 559
    },
    "faceLandmarks": null,
    "faceAttributes": null
  },
  {
    "faceId": "8793b251-8cc8-45c5-ab68-e7c9064c4cfd",
    "recognitionModel": null,
    "faceRectangle": {
      "width": 152,
      "height": 152,
      "left": 775,
      "top": 580
    },
    "faceLandmarks": null,
    "faceAttributes": null
  }
]

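As you can see, this photo produced two DetectedFace objects. Each one stores its ID in faceId, which is used in the later grouping step, and the position of the face in faceRectangle, which can be used for further processing. recognitionModel, faceLandmarks and faceAttributes are optional extras, covering things such as gender, age and emotion. They are not returned by default but, as the API screenshot below shows, can be switched on through various parameters. They are great fun, so give them a try if you are interested:

Finally, call .GroupAsync to group the faceIds detected earlier:

var faceIds = faces.Select(x => x.FaceId.Value).ToList();
GroupResult result = await fc.Face.GroupAsync(faceIds);

It returns a GroupResult, which is defined as follows: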

public class GroupResult
{
  public IList<IList<Guid>> Groups
  {
    get;
    set;
  }

  public IList<Guid> MessyGroup
  {
    get;
    set;
  }

  // ...
}

It contains a Groups property and a MessyGroup property. Groups is a list of lists that holds the face groups, and MessyGroup holds the faceIds that could not be placed in any group.

With this, a short piece of code can copy each group of faces into its own folder:

void CopyGroup(string outputPath, GroupResult result, Dictionary<Guid, (string file, DetectedFace face)> faces)
{
  // Pair every faceId with its 1-based group number, then look up the source file.
  foreach (var item in result.Groups
    .SelectMany((group, index) => group.Select(v => (faceId: v, index)))
    .Select(x => (info: faces[x.faceId], i: x.index + 1)).Dump()) // Dump() is LINQPad's output helper
  {
    string dir = Path.Combine(outputPath, item.i.ToString());
    Directory.CreateDirectory(dir);
    File.Copy(item.info.file, Path.Combine(dir, Path.GetFileName(item.info.file)), overwrite: true);
  }

  // Faces that matched no group go into a separate "messy" folder.
  string messyFolder = Path.Combine(outputPath, "messy");
  Directory.CreateDirectory(messyFolder);
  foreach (var file in result.MessyGroup.Select(x => faces[x].file).Distinct())
  {
    File.Copy(file, Path.Combine(messyFolder, Path.GetFileName(file)), overwrite: true);
  }
}

Running it produces the result shown in the figure: I fed in 102 photos and got 15 groups plus one group of faces that "found no teammates":

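What could still go wrong?

Just two API calls and a quick burst of code: does that feel too easy? It is not quite that simple, and quite a few problems remain.

Images are too large and need compressing

The images have to be uploaded to a cloud service, after all. If your upload speed is poor the traffic adds up, and today's phones, DSLRs and mirrorless cameras easily reach tens of megapixels, with JPG files easily exceeding 10 MB. Uploading without compression is, first of all, more than your bandwidth and patience can take.

Secondly... Azure does not support it anyway. The documentation (https://docs.microsoft.com/en-us/rest/api/cognitiveservices/face/face/detectwithstream) states that images of at most 6 MB are supported, and that the minimum detectable face size is specified for images no larger than 1920x1080: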

JPEG, PNG, GIF (the first frame), and BMP format are supported. The allowed image file size is from 1KB to 6MB.
The minimum detectable face size is 36×36 pixels in an image no larger than 1920×1080 pixels. Images with dimensions higher than 1920×1080 pixels will need a proportionally larger minimum face size.

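So an image that is too large must be scaled down (and one that is already small obviously needs no scaling). Using .NET's Bitmap together with the C# 8.0 switch expression, the check and the compression can be written in one go:

byte[] CompressImage(string image, int edgeLimit = 1920)
{
  using var bmp = Bitmap.FromFile(image);

  // Ratio of the longer edge to the limit; a value above 1 means the image must shrink.
  using var resized = (1.0 * Math.Max(bmp.Width, bmp.Height) / edgeLimit) switch
  {
    var x when x > 1 => new Bitmap(bmp, new Size((int)(bmp.Size.Width / x), (int)(bmp.Size.Height / x))),
    _ => bmp,
  };

  // Re-encode as JPEG in memory and return the bytes for upload.
  using var ms = new MemoryStream();
  resized.Save(ms, ImageFormat.Jpeg);
  return ms.ToArray();
}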
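The resizing rule is easier to check when it is separated from System.Drawing. The following is a minimal sketch of just the arithmetic; the names ImageSizing and FitWithin are made up for illustration and do not appear in the original code.

```csharp
using System;

public static class ImageSizing
{
    // Returns the size an image should be scaled to so that its longer edge
    // does not exceed edgeLimit. Images already within the limit are unchanged.
    public static (int Width, int Height) FitWithin(int width, int height, int edgeLimit = 1920)
    {
        double ratio = (double)Math.Max(width, height) / edgeLimit;
        if (ratio <= 1) return (width, height);

        // Same truncating division as in CompressImage above.
        return ((int)(width / ratio), (int)(height / ratio));
    }
}
```

For a 6000x4000 photo the ratio is 3.125, so FitWithin(6000, 4000) gives 1920x1280, while an 800x600 image is returned untouched.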

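Portrait photos

Cameras generally have 3:2 sensors, so most photos come out in landscape orientation. Occasionally, though, we go for a vertical composition. Many APIs now cope with faces tilted by up to 30 degrees either way, but almost none support a face turned fully on its side, as in the picture below (I really could not find a model who would license a photo):

Fortunately, photos keep their EXIF information after shooting, so all we need to do is read the EXIF data and rotate the photo accordingly: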

void HandleOrientation(Image image, PropertyItem[] propertyItems)
{
  const int exifOrientationId = 0x112; // EXIF "Orientation" tag
  PropertyItem orientationProp = propertyItems.FirstOrDefault(i => i.Id == exifOrientationId);

  if (orientationProp == null) return;

  // Map the EXIF orientation value (1..8) to the matching rotate/flip operation.
  int val = BitConverter.ToUInt16(orientationProp.Value, 0);
  RotateFlipType rotateFlipType = val switch
  {
    2 => RotateFlipType.RotateNoneFlipX,
    3 => RotateFlipType.Rotate180FlipNone,
    4 => RotateFlipType.Rotate180FlipX,
    5 => RotateFlipType.Rotate90FlipX,
    6 => RotateFlipType.Rotate90FlipNone,
    7 => RotateFlipType.Rotate270FlipX,
    8 => RotateFlipType.Rotate270FlipNone,
    _ => RotateFlipType.RotateNoneFlipNone,
  };

  if (rotateFlipType != RotateFlipType.RotateNoneFlipNone)
  {
    image.RotateFlip(rotateFlipType);
  }
}

After rotation, my photo looks like this:

Now portrait shots can be recognized as well.

Speeding things up with parallelism

As mentioned earlier, a folder may hold thousands of files. Uploading and detecting them one by one can be rather slow; the code might look like this:

Dictionary<Guid, (string file, DetectedFace face)> faces = GetFiles(inFolder)
 .Select(file =>
 {
  byte[] bytes = CompressImage(file);
  var result = (file, faces: fc.Face.DetectWithStreamAsync(new MemoryStream(bytes)).GetAwaiter().GetResult());
  (result.faces.Count == 0 ? $"{file} not detect any face!!!" : $"{file} detected {result.faces.Count}.").Dump();
  return (file, faces: result.faces.ToList());
 })
 .SelectMany(x => x.faces.Select(face => (x.file, face)))
 .ToDictionary(x => x.face.FaceId.Value, x => (file: x.file, face: x.face));

To make it faster, you can upload in parallel. Thanks to LINQ in C#/.NET, adding a single .AsParallel() line is all it takes:

Dictionary<Guid, (string file, DetectedFace face)> faces = GetFiles(inFolder)
 .AsParallel() // this is the line that was added
 .Select(file =>
 {
  byte[] bytes = CompressImage(file);
  var result = (file, faces: fc.Face.DetectWithStreamAsync(new MemoryStream(bytes)).GetAwaiter().GetResult());
  (result.faces.Count == 0 ? $"{file} not detect any face!!!" : $"{file} detected {result.faces.Count}.").Dump();
  return (file, faces: result.faces.ToList());
 })
 .SelectMany(x => x.faces.Select(face => (x.file, face)))
 .ToDictionary(x => x.face.FaceId.Value, x => (file: x.file, face: x.face));
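One thing to be aware of: .AsParallel() combined with .GetAwaiter().GetResult() keeps a thread blocked for every request in flight. As an alternative, and not what the code above does, here is a minimal sketch that awaits the calls and caps how many run at once with a SemaphoreSlim. The names Throttled and SelectAsync are hypothetical helpers written for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs work(item) for every item with at most maxConcurrency calls in flight
    // and returns the results in the same order as the input.
    public static async Task<TResult[]> SelectAsync<T, TResult>(
        IEnumerable<T> items, Func<T, Task<TResult>> work, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        // ToList() starts every task right away; each one waits at the gate.
        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();
            try { return await work(item); }
            finally { gate.Release(); }
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

In the pipeline above, work would be a lambda that calls CompressImage and awaits DetectWithStreamAsync for one file. The concurrency limit is something to tune against your own bandwidth and the service's rate limits; no particular value is implied here.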

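Resuming where you left off

Also as mentioned above, there are thousands of photos. If the network fails mid-transfer, or someone knocks the coffee over on the desk (who knows?)... or everything goes perfectly and you simply want to run some further analysis, the whole job has to start over. We can add the kind of "resume" mechanism familiar from download tools.

It is really just a cache: record the detection result for each file, and on the next run read from the cache first. The cache is stored in a JSON file.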


Note the lock keyword toward the bottom of the cache code: it keeps the cache thread-safe when several threads are working at once.
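The Cache<T> class itself is not reproduced in this text. What follows is a minimal sketch of what such a class could look like, not the author's implementation. It assumes a GetOrCreate(key, factory) method (the shape used by the calling code in this article), a default file name of cache.json, and System.Text.Json for serialization. System.Text.Json is chosen only to keep the sketch free of extra packages; whether it round-trips the SDK's DetectedFace type faithfully has not been checked here, and Json.NET may be the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical reconstruction: a string-keyed cache persisted to a JSON file.
public class Cache<T>
{
    private readonly string _path;
    private readonly Dictionary<string, T> _items;
    private readonly object _sync = new object();

    public Cache(string path = "cache.json") // file name is an assumption
    {
        _path = path;
        _items = File.Exists(path)
            ? JsonSerializer.Deserialize<Dictionary<string, T>>(File.ReadAllText(path))
            : new Dictionary<string, T>();
    }

    public T GetOrCreate(string key, Func<T> factory)
    {
        lock (_sync)
        {
            if (_items.TryGetValue(key, out T cached)) return cached;
        }

        // The slow call (the upload) runs outside the lock so threads do not queue up.
        T value = factory();

        lock (_sync)
        {
            _items[key] = value;
            // Rewrite the whole file on every new entry: simple, and survives a crash.
            File.WriteAllText(_path, JsonSerializer.Serialize(_items));
        }
        return value;
    }
}
```

Because the factory runs outside the lock, two threads asking for the same key at the same moment would both call it. That is harmless here since each file is processed only once per run. Rewriting the file on every insert is wasteful for very large runs, but it means an interrupted run loses at most the requests that were still in flight.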

To use it, just add one line inside the Select:

var cache = new Cache<List<DetectedFace>>(); // the important part
Dictionary<Guid, (string file, DetectedFace face)> faces = GetFiles(inFolder)
 .AsParallel()
 .Select(file => (file: file, faces: cache.GetOrCreate(file, () => // the important part
 {
  byte[] bytes = CompressImage(file);
  var result = (file, faces: fc.Face.DetectWithStreamAsync(new MemoryStream(bytes)).GetAwaiter().GetResult());
  (result.faces.Count == 0 ? $"{file} not detect any face!!!" : $"{file} detected {result.faces.Count}.").Dump();
  return result.faces.ToList();
 })))
 .SelectMany(x => x.faces.Select(face => (x.file, face)))
 .ToDictionary(x => x.face.FaceId.Value, x => (file: x.file, face: x.face));

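Drawing a box around each face

With so many photos, if the event is large or a group shot has dozens of people in it, the resulting groups look like this:

You have no idea where your own face is, so the detected faces need to be boxed.

The boxing step takes some care too. Recall that the photo was scaled down and rotated before upload, so the values in the returned DetectedFace refer to the scaled, rotated image. Unless the same rotation and scaling are applied when drawing, the boxes end up in completely the wrong place, so the earlier calculation has to be replayed: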

using var bmp = Bitmap.FromFile(item.info.file);
HandleOrientation(bmp, bmp.PropertyItems);
using (var g = Graphics.FromImage(bmp))
{
 using var brush = new SolidBrush(Color.Red);
 using var pen = new Pen(brush, 5.0f);
 var rect = item.info.face.FaceRectangle;
 // Factor by which the uploaded copy was shrunk (1 if it was not resized).
 float scale = Math.Max(1.0f, (float)(1.0 * Math.Max(bmp.Width, bmp.Height) / 1920.0));
 g.ScaleTransform(scale, scale);
 g.DrawRectangle(pen, new Rectangle(rect.Left, rect.Top, rect.Width, rect.Height));
}
bmp.Save(Path.Combine(dir, Path.GetFileName(item.info.file)));
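The coordinate mapping can also be written out explicitly instead of going through ScaleTransform. This is a minimal sketch of the arithmetic only, assuming the same 1920-pixel edge limit as in the compression step; FaceBox and ToOriginal are hypothetical names used for illustration.

```csharp
using System;

public static class FaceBox
{
    // Maps a rectangle reported for the uploaded (possibly downscaled) image
    // back to pixel coordinates in the full-size, already-rotated original.
    public static (int Left, int Top, int Width, int Height) ToOriginal(
        int left, int top, int width, int height,
        int originalWidth, int originalHeight, int edgeLimit = 1920)
    {
        // Same factor as in the drawing code: 1 when the image was not resized.
        double scale = Math.Max(1.0, (double)Math.Max(originalWidth, originalHeight) / edgeLimit);

        return ((int)Math.Round(left * scale), (int)Math.Round(top * scale),
                (int)Math.Round(width * scale), (int)Math.Round(height * scale));
    }
}
```

One practical difference between the two approaches: ScaleTransform also scales the pen, so the 5-pixel outline grows with the image, whereas drawing with mapped coordinates keeps the pen width fixed unless you scale it yourself.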

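Using my photo from above, the detection result looks like this (a bit like a camera's face detection while focusing):

The 1000-face limit

The .GroupAsync method accepts at most 1000 faceIds per call, while the 800-odd photos from my last event contained more than 2000 faceIds, so some batching is needed.

The simplest way to batch is the System.Interactive package. It offers convenient Rx.NET-style APIs (which LINQ itself does not provide) without pulling in something as heavyweight as IObservable<T>, so it is very handy.

Here I use the .Buffer(int) function, which splits an IEnumerable<T> into batches of a given size (such as 1000). The code is: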

foreach (var buffer in faces
 .Buffer(1000)
 .Select((list, groupId) => (list, groupId)))
{
 GroupResult group = await fc.Face.GroupAsync(buffer.list.Select(x => x.Key).ToList());
 var folder = outFolder + @"\gid-" + buffer.groupId;
 CopyGroup(folder, group, faces);
}
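If you would rather not add a package just for this, the batching itself is only a few lines. Below is a minimal sketch of the idea behind Buffer, not the System.Interactive implementation; Batching and InBatches are hypothetical names. (.NET 6 later added Enumerable.Chunk for the same purpose.)

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive lists of at most `size` items each.
    // Being an iterator, it validates `size` only once enumeration starts.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var batch = new List<T>(size);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

With 2500 items and a size of 1000 this yields batches of 1000, 1000 and 500. Keep in mind that each batch is sent to GroupAsync on its own, so faces of the same person that land in different batches end up in different gid folders.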

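Summary

The complete code used in this article has been uploaded to my blog-data GitHub repository; given a folder of images and a key, it can be run as is:

This month I attended .NET Conf in Shanghai. I ran the code above over the 800-odd photos from the conference and it detected more than 2000 faces. I picked out the first three photos of myself; the results are below:

……

Overall the results are quite good: it found faces even in photos with terrible resolution.

Note that you do not have to use Azure Cognitive Services for face recognition. In China, vendors such as Alibaba Cloud also offer face recognition services with .NET interfaces. It all comes down to calling an API and minding its limits, and the code ends up much the same.

Also, if you need offline face recognition, Luxand offers an offline SDK called Luxand FaceSDK, which likewise provides a .NET interface. With no network calls involved it is faster, with matching speeds of up to 50 million faces per second, and the accuracy is very high too. I have tried it and it works well. The latest version at present is v7.1.0; the license is expensive (though a Baidu search may hold a pleasant surprise).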
