MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 Forensic face recognition based on KDE and evidence theory Wen Xiao1, * 1JiangXi Police College, NanChang, JiangXi, China Abstract. Forensic face recognition (FFR) has been studied in recent years in forensic science. Given an automatic face recognition system, output scores of the system are used to describe the similarity of face image pairs, but not suitable for forensics. In this study, a score-mapping model based on kernel density estimation (KDE) and evidence theory is proposed. First, KDE was used to generate probability density function (PDF) for each dimensional feature vector of face image pairs. Then, the PDFs could be utilized to determine separately the basic probability assignment (BPA) of supporting the prosecution hypothesis and the defence hypothesis. Finally, the BPAs of each feature were combined by Dempster’s rule to get the final BPA, which reflects the strength of evidence support. The experimental results demonstrate that compared with the classic KDE-based likelihood ratio method, the proposed method has a better performance in terms of accuracy, sensitivity and specificity. 1 Introduction In the context of forensic science, face recognition approaches have fallen into two main categories: Subjective-based and objective-based methods. The subjective-based methods are the traditional forensic means in the past few decades, and face biometric features have been commonly used to inspect and compare static images for such methods [1,2]. Mainly four subjective-based methods can be used during the analysis and comparison phase [1]: Holistic Comparison, Morphological Analysis, Photo-anthropometry, and Superimposition. Facial Identification Scientific Working Group (FISWG) recommends morphological analysis by trained examiners as the primary method of comparison [3]. Moreover, in recent years, soft Biometrics, such as gender, age, race, skin colour, spots and other characteristics, have been considered into face recognition procedure so as to improve recognition results [4,5]. But these methods need to be manually carried out by forensic experts, so they heavily depend on the experience and knowledge of the forensic experts. On the other hand, the objective- based methods attempt to identify faces using automatic face recognition [6-10]. Using automatic recognition system to verify faces can not only improve the efficiency of forensic work, but also promote the standardization of the forensic process. In commercial face recognition systems, the similarity or distance between two faces is usually reported in terms of one or several score values, which is so called “Score-based procedures” [11]. In order to * Corresponding author: shauven@126.com © The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/). MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 take the differences and typicality of the population into account and allow for comparisons between facial scores from different face recognition systems, there is necessary to convert the score to Likelihood Ratio (LR) value [12]. Some organizations such as the European Network of Forensic Science Institutes (ENFSI), report the certainty of the statement match/nonmatch via a quantifiable amount, that is, verify whether it is the same person/different person or not [13]. To that end, ENFSI enforces the use of a LR value to evaluate the strength of evidence as the degree of supporting the appraisal conclusion, namely, it can be regarded as the mensurable method to express the confidence in the match/nonmatch decision [10]. A suitable approach to achieve this is to append such a score-to-LR mapping in a post-processing step to an existing score-producing facial recognition system [14]. Once a model for score-to-LR mapping has been set up, the strength of evidences can be obtained by plugging the scores into the model. 2 Evidence evaluation and evidence theory Evidence evaluation has been proposed in recent years as a logical and appropriate way to report evidence to a court of law using a Bayesian probabilistic framework. LR is based on Bayes’ rule, it is defined as the ratio of the probabilities of two hypotheses [15]: the null hypothesis of the prosecution (Hp), and the alternative hypothesis of the defense (Hd). The hypothesis of the prosecution Hp means that the evidences are from the same source, and the hypothesis of the defense Hd means that the evidences are from the different source. Then LR is obtained from two conditional probabilities, that is, the conditional probability of the prosecution hypothesis divided by the conditional probability of the defense hypothesis. So, the LR is defined as follow: Pr�𝐸𝐸𝐸𝐸|𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 � 𝐿𝐿𝐿𝐿�𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 , 𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 , 𝐸𝐸𝐸𝐸� = (1) Pr(𝐸𝐸𝐸𝐸|𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 ) In order to express the strength of evidence support, especially for convenient to communicate evidence values in the courtroom, it would be useful to translate the numerical expression to a verbal counterpart. One of the current frameworks to relate verbal and numerical likelihood ratios is defined as follows [16]: Table 1. Relation of verbal and numerical LR. LR range Evidence to support Hp 1-2 no assistance 2-10 slightly more probable 10-100 more probable 100-10,000 much more probable 10,000 -1,000,000 far more probable >1,000,0000 exceedingly more probable 2.1 Score-to-LR conversion model When an automated system is used to calculate the similarity or the distance between the two faces to be compared, it returns a score. This score itself has no forensic relevance and needs to be converted to LR. Four score-to-LR conversion models have been proposed [17]: Kernel Density Estimation (KDE), Linear Logistic Regression (LLR), Histogram Binning and Pool Adjacent Violators (PAV), where KDE is a commonly used method which is easy to explain. In KDE, a kernel distribution is a non-parametric representation of the probability density function (PDF) of a 2 MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 random variable. It is used when a parameter distribution cannot properly describe the data, or to avoid making assumptions about the data distribution. A kernel distribution is defined by a smoothing function and a bandwidth value h, which controls the smoothness of the resulting density curve. In other words, it is a technique that lets you create a smooth curve given a set of data. It is given by the following equation: 𝑛𝑛𝑛𝑛 𝑛𝑛𝑛𝑛 1 1 𝑥𝑥𝑥𝑥 − 𝑥𝑥𝑥𝑥𝑖𝑖𝑖𝑖 𝑓𝑓𝑓𝑓𝑘𝑘𝑘𝑘 (𝑥𝑥𝑥𝑥; ℎ, 𝐾𝐾𝐾𝐾) = � 𝐾𝐾𝐾𝐾ℎ (𝑥𝑥𝑥𝑥 − 𝑥𝑥𝑥𝑥𝑖𝑖𝑖𝑖 ) = � 𝐾𝐾𝐾𝐾 � � (2) 𝑛𝑛𝑛𝑛 𝑛𝑛𝑛𝑛ℎ ℎ 𝑖𝑖𝑖𝑖=1 𝑖𝑖𝑖𝑖=1 where K is the kernel and h is the bandwidth. The kernel smoothing function K defines the shape of the curve used to generate the probability distribution function, and the bandwidth h steers the smoothness of the resulting approximation. Gaussian distribution is usually used as kernel function, and the bandwidth could be adaptively generated from the sample data. Unlike a histogram, which places the values into discrete bins, a kernel distribution sums the component smoothing functions for each data value to produce a smooth, continuous probability curve. Use the kernel function to estimate the data from the same source (the prosecution hypothesis Hp) and the data from different sources (the prosecution hypothesis Hd) respectively, LR is calculated with the following: Pr�𝑠𝑠𝑠𝑠�𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 � 𝑓𝑓𝑓𝑓𝑝𝑝𝑝𝑝 (𝑠𝑠𝑠𝑠; ℎ, 𝑘𝑘𝑘𝑘) 𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿(𝑠𝑠𝑠𝑠) = = (3) Pr(𝑠𝑠𝑠𝑠|𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 ) 𝑓𝑓𝑓𝑓𝑑𝑑𝑑𝑑 (𝑠𝑠𝑠𝑠; ℎ, 𝑘𝑘𝑘𝑘) 2.2 Evidence theory Evidence theory is the generalization of probability theory [18,19], which can handle uncertainty, impreciseness, and unknown information and fuse multi-source information without depending on prior information. A set of finite mutually exclusive hypotheses or propositions Ω is called the frame of discernment, and the Basic Probability Assignment (BPA) function under the frame of discernment is a function 𝑚𝑚𝑚𝑚: 2Ω ↦ [0,1], and satisfies with ∑𝐴𝐴𝐴𝐴⊆Ω 𝑚𝑚𝑚𝑚(𝐴𝐴𝐴𝐴) = 1 and 𝑚𝑚𝑚𝑚(Φ) = 0. The BPA m is also called the mass function, and 𝑚𝑚𝑚𝑚(𝐴𝐴𝐴𝐴) expresses the proportion of all relevant and available evidence that supports the claim that a particular element of 2Ω belongs to the set A but to no particular subset of A. Suppose H1 and H2 are two independent evidence with two mass functions m1 and m2 in the same frame of discernment Ω; the Dempster’s rule of combination is defined as follow: 𝑚𝑚𝑚𝑚1 ⨁ 2 (𝐴𝐴𝐴𝐴) = 𝐾𝐾𝐾𝐾 −1 ∙ � 𝑚𝑚𝑚𝑚1 (𝐵𝐵𝐵𝐵𝑖𝑖𝑖𝑖 )𝑚𝑚𝑚𝑚2 �𝐶𝐶𝐶𝐶𝑗𝑗𝑗𝑗 � (4) 𝐵𝐵𝐵𝐵𝑖𝑖𝑖𝑖 ∩𝐶𝐶𝐶𝐶𝑗𝑗𝑗𝑗 =𝐴𝐴𝐴𝐴 where, 𝐾𝐾𝐾𝐾 = ∑𝐵𝐵𝐵𝐵𝑖𝑖𝑖𝑖 ∩𝐶𝐶𝐶𝐶𝑗𝑗𝑗𝑗 ≠Φ 𝑚𝑚𝑚𝑚1 (𝐵𝐵𝐵𝐵𝑖𝑖𝑖𝑖 )𝑚𝑚𝑚𝑚2 �𝐶𝐶𝐶𝐶𝑗𝑗𝑗𝑗 � is referred to as the degree of conflict between the two BPAs. If K is close to 0, the Dempster’s rule of combination becomes invalid. One of the methods to solve this problem is to set discount coefficient, which usually represents the unreliability or dependence of the evidences. If m is a BPA and (1-α) is its corresponding discount coefficient, then the BPA after discounting is: 𝛼𝛼𝛼𝛼 ⋅ 𝑚𝑚𝑚𝑚(𝐴𝐴𝐴𝐴) 𝐴𝐴𝐴𝐴 ≠ Ω 𝑚𝑚𝑚𝑚𝛼𝛼𝛼𝛼 (𝐴𝐴𝐴𝐴) = � (5) 𝛼𝛼𝛼𝛼 ⋅ 𝑚𝑚𝑚𝑚(Ω) + (1 − 𝛼𝛼𝛼𝛼) 𝐴𝐴𝐴𝐴 = Ω 3 Proposed method Recently, a new non-parametric method based on KDE has been proposed to determine BPA [20,21]. Inspire by the idea, a score-mapping method for forensic is presented in this paper, in which KDE and evidence theory are used to determine the confidence of whether two face 3 MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 image pairs to compare are from the same source or from the difference source. The flowchart of the proposed score-mapping method, named KDE-DS, is shown as figure 1. Np pairs Compute from the 128 dim KDE per same vectors dim Reference source Database Nd pairs Automatic Compute from the Face 128 dim KDE per difference Recognition vectors dim source System BPA1 Test face BPA2 DS Output images to 128 dim score ratio … Combina- compare vectors compute Plug-in … tion rule BPA128 Fig. 1. The flowchart of the proposed score-mapping method. The steps are described as follows: Step 1: First, the reference database can be divided into two parts. One is training sample which constructs the model of each feature with its PDF curves. Another one is test sample which is used to verify the constructed model. Each sample includes two types of images: face-pairs from the same source and face-pairs from the difference sources. Step 2: Automatic face recognition system is used to generate n-dimensional feature vectors (e.g. 128-dim). For each dimensional feature, the similarity/distances of all image pair are calculated and probability density function is generated using those similarity/distances via KDE, which can be regarded as the probability model for the related feature using the training sample. Step 3: If there are new face image pairs that need to be compared (which could be regard as evidence in a case), the automatic face recognition system is also utilized to generate n- dimensional feature vectors, then the value of each dimensional feature would be plugged in the corresponding PDFs. Thus, we obtain two values k1 and k2, one for the prosecution hypothesis Hp and another for the defense hypothesis Hd, as shown in Figure 2. Fig. 2. Generate two pdfs for each feature. Given a hypothesis H0, the probability of H0 for each PDF model is proportional to the specific intersection point value f(x0) [20]. According to this, a frame of discernment Ω = �𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 , 𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 � could be constructed, and the rules about how a membership is assigned to the focal element are as follows: 𝑘𝑘𝑘𝑘1 𝑚𝑚𝑚𝑚��𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 �� = 𝑐𝑐𝑐𝑐1 = 𝑘𝑘𝑘𝑘1 + 𝑘𝑘𝑘𝑘2 (6) 𝑘𝑘𝑘𝑘2 𝑚𝑚𝑚𝑚({𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 }) = 𝑐𝑐𝑐𝑐2 = 𝑘𝑘𝑘𝑘1 + 𝑘𝑘𝑘𝑘2 4 MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 Consider that the features of images may not be completely independent of each other, and the source may also be unreliable, the discount efficient is considered to reflect such effects. 𝑚𝑚𝑚𝑚��𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 �� = 𝛼𝛼𝛼𝛼 ∙ 𝑐𝑐𝑐𝑐1 𝑚𝑚𝑚𝑚({𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 }) = 𝛼𝛼𝛼𝛼 ∙ 𝑐𝑐𝑐𝑐2 (7) 𝑚𝑚𝑚𝑚��𝐻𝐻𝐻𝐻𝑝𝑝𝑝𝑝 , 𝐻𝐻𝐻𝐻𝑑𝑑𝑑𝑑 �� = 1 − 𝛼𝛼𝛼𝛼 Generally, (1-α) could be set to 0.3. Step 4: Finally, Dempster’s combination rule is used to combine multiple BPAs to get the final BPA, and the mass function m({Hp}) reflects the strength of evidence support of the prosecution. 4 Experimental results To Verify the effectiveness of the proposed method, the LFW (Labeled Faces in the Wild) face database is used, which is a popular test set for face recognition [22]. The face images provided are all from natural scenes in life, so the difficulty of recognition will increase. Here we select 1100 pairs of same person and different person respectively as training set, and 500 pairs of same person and different person respectively as test set. After the aligning images step, the available numbers of image-pairs are shown as Table 2. Table 2. Numbers of face image pairs. Same source Difference source Training set 1090 1087 Test set 495 495 In forensics, transparency in the methods is essential [10]. Since Openface is an easily available open-source toolkit, which based on the FaceNet algorithm for automatic facial identification that was created by Google [23], it is used in the experiment, and we utilize the commonly known matching indexes of accuracy, sensitivity and specificity, defined as: 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 + 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑎𝑎𝑎𝑎𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 = (8) 𝑇𝑇𝑇𝑇 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑛𝑛𝑛𝑛𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑎𝑎𝑎𝑎 = (9) 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 + 𝐹𝐹𝐹𝐹𝑇𝑇𝑇𝑇 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑐𝑐𝑐𝑐𝑠𝑠𝑠𝑠𝑓𝑓𝑓𝑓𝑠𝑠𝑠𝑠𝑐𝑐𝑐𝑐𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑎𝑎𝑎𝑎 = (10) 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 + 𝐹𝐹𝐹𝐹𝑇𝑇𝑇𝑇 With N denoting the number of scores, TP the number of True Positives, TN the number of True Negatives, FN the number of False Negatives, and FP the number of False Positives. When calculating the accuracy, we first convert the LR values to a grade according to the conclusion scale in Table 2. A LR value counts as a match prediction if it receives a grade +2 or higher, and as a no-match prediction if it receives a grade +0.5 or lower. Similarly, if the ratio of the final BPA supporting the Hp to supporting the Hd is greater than 2, which counts as a match prediction, vice versa. The comparative experimental results are shown as Table 3: Table 3. Experimental results. Accuracy Sensitivity Specificity KDE 91.01% 90.71% 91.31% KDE-DS 93.84% 94.14% 93.54% From the experimental results, the proposed method has a better performance in terms of accuracy, sensitivity and specificity than the classic KDE-based likelihood ratio method. 5 MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 After analysis, the main cause of the error is that multiple face images are extracted incorrectly, or there is a face occluded image in the face image pair, such as wearing glasses, so we should assure that before using automatic face recognition system for face verification, the face image pairs to compare should be carefully checked. 5 Conclusions The KDE method is a common method for likelihood ratio calculation in the field of face forensics, which utilizes the similarity/distance of face image pairs to generate PDF. N- dimensional feature vectors are integrated into a distance measure, which lose some local information. In this study, each dimensional feature is treated as a random variable, and the distances of each feature between image-pairs are accumulated as PDF via KDE, then the PDFs can be mapped in BPA of the prosecution and the defense, finally the BPAs are combined by Dempster’s combination rule. The experiments are verified that the proposed method has a better performance than KDE method. It would be a powerful supplement to traditional likelihood ratio calculation method. This research was financially supported by Science and Technology Research Project of Jiangxi Provincial Department of Education (NO. GJJ151196) and Collaborative Innovation Center for Economics crime investigation and prevention technology, Jiangxi Province (No. JXJZXTCX-019). References 1. C. G. Zeinstra, D. Meuwly, A. C. Ruifro, R. N. Veldhuis, L. J. Spreeuwers, Forensic face recognition as a means to determine strength of evidence: a survey. Forensic Sci Rev, 30, 1, 21-32 (2018). 2. P. Tome, J. Fierrez, R. Vera‐Rodriguez, J. Ortega‐Garcia, Combination of face regions in forensic scenarios. Journal of forensic sciences, 60, 4, 1046-1051 (2015). 3. Facial Identification Scientific Working Group: Facial comparison overview and methodology guidelines, https://fiswg.org/fiswg_facial_comparison_overview_and_methodology_guidelines_V 1.0_20191025.pdf. 4. M. S. Nixon, P. L. Correia, K. Nasrollahi, T. B. Moeslund, A. Hadid, M. Tistarelli, On soft biometrics. Pattern Recognition Letters, 68, 218-230 (2015). 5. P. Tome, R. Vera-Rodriguez, J. Fierrez, J. Ortega-Garcia, Facial soft biometric features for forensic face recognition. Forensic science international, 257, 271-284 (2015). 6. M. Jacquet, C. Champod, Automated face recognition in forensic science: Review and perspectives. Forensic Science International, 307, 110124 (2020). 7. A. L. Mölder, I. E. Åström, E. Leitet, Development of a score-to-likelihood ratio model for facial recognition using authentic criminalistic data. In 2020 8th International Workshop on Biometrics and Forensics (IWBF), IEEE, 1-6 (2020). 8. D. Meuwly, D. Ramos, R. Haraksim, A guideline for the validation of likelihood ratio methods used for forensic evidence evaluation. Forensic science international, 276, 142- 153 (2017). 9. N. Suki, N. Poh, F. M. Senan, N. A. Zamani, M. Z. A. Darus, On the reproducibility and repeatability of likelihood ratio in forensics: A case study using face biometrics. In 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 1-8 (2016). 6 MATEC Web of Conferences 336, 06008 (2021) https://doi.org/10.1051/matecconf/202133606008 CSCNS2020 10. A. Macarulla Rodriguez, Z. Geradts, M. Worring, Likelihood Ratios for Deep Neural Networks in Face Comparison. Journal of Forensic Sciences, 65, 4, 1169-1183 (2020). 11. G. S. Morrison, E. Enzinger, Score based procedures for the calculation of forensic likelihood ratios–Scores should take account of both similarity and typicality. Science & Justice, 58, 1, 47-58 (2018). 12. N. Garton, D. Ommen, J. Niemi, A. Carriquiry, Score-based likelihood ratios to evaluate forensic pattern evidence. arXiv preprint arXiv:2002.09470, 1-22 (2020). 13. L. McKenna, S. McDermott, G. O’Donell, ENFSI Guideline for Evaluative Reporting in Forensic Science: Strengthening the evaluation of forensic results across Europe (STEOFRAE). Wiesbaden, Germany: European Network of Forensic Science Institutes, 30–41 (2015). 14. T. Ali, L. Spreeuwers, R. Veldhuis, D. Meuwly, Effect of calibration data on forensic likelihood ratio from a face recognition system. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 1-8 (2013). 15. D. Ramos, R. P. Krish, J. Fierrez, D. Meuwly, From biometric scores to forensic likelihood ratios. In Handbook of biometrics for forensic science, Springer, Cham, 305- 327, (2017). 16. F. S. Kool, Feature-based models for forensic likelihood ratio calculation: Supporting research for the ENFSI-LR project, (2016). 17. T. Ali, Biometric Score Calibration for Forensic Face Recognition. Ph.D. Thesis Series, Centre for Telematics and Information Technology, 14–336 (2014). 18. A. Dempster, Upper and lower probabilities induced by multivalued mapping. Annals of Mathematical Statistics, 38, 2, 325-339 (1967). 19. G. Shafer, A mathematical theory of evidence. Princeton University Press, 1976. 20. P. Xu, X. Su, S. Mahadevan, C. Li, Y. Deng, A non-parametric method to determine basic probability assignment for classification problems. Applied intelligence, 41, 3, 681-693 (2014). 21. B. Qin, F. Xiao, A non-parametric method to determine basic probability assignment based on kernel density estimation. IEEE Access, 6: 73509-73519 (2018). 22. G. B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments. University of Massachusetts, Amherst, Technical Report, 07-49 (2007). 23. A. Fydanaki, Z. Geradts, Evaluating OpenFace: an open-source automatic facial comparison algorithm for forensics. Forensic sciences research, 3, 3, 202-209 (2018). 7