6.0.0-git
2019-03-24

[#10420] Face similarity indexing is incorrect/broken
Summary Face similarity indexing is incorrect/broken
Queue Ansel
Queue Version Git master
Type Bug
State Resolved
Priority 1. Low
Owners mrubinsk (at) horde (dot) org
Requester thpo+horde (at) dotrc (dot) de
Created 2011-08-10 (2783 days ago)
Due
Updated 2011-10-05 (2727 days ago)
Assigned 2011-09-03 (2759 days ago)
Resolved 2011-10-05 (2727 days ago)
Milestone 2
Patch No

History
2011-10-05 17:24:52 Michael Rubinsky State ⇒ Resolved
 
2011-10-05 00:10:37 Git Commit Comment #11 Reply to this comment
Changes have been made in Git for this ticket:

Need to include image_id in GROUP BY clause.
Bug: 10420
Signed-off-by: Michael J Rubinsky <mrubinsk@horde.org>

  1 files changed, 1 insertions(+), 1 deletions(-)
http://git.horde.org/horde-git/-/commit/912b151ce197158a757a3fc231cf9f0828ff7cda
2011-10-04 23:24:14 Michael Rubinsky Comment #10 Reply to this comment
I've seen you have started to implement the correct face indexing in 
git. Great!
Yeah, it's indexed properly now (still have some tweaks to make), but 
the downside is that the image has to be just about identical to get a 
hit. This indexing technique is really designed for quickly detecting 
duplicates/similar images, not facial recognition...but at least it's 
better than nothing I guess.

The alternative, loading each know face image for the user, then 
comparing it with a known face image using the puzzle routines (which 
would give a similarity ranking between the two) is not practical. 
Plus, that, too, is really designed to detect similar, entire, images, 
not facial recognition. i.e, If the same person is in two images, but 
has a different facial expression, or is looking in a slightly 
different direction, there will be no match.

Just a hint: when you create the face indexes you will
miss out the last one (str_len - word_len == 0).
Yeah, have that fixed locally already, along with the method typo 
thanks for pointing it out. I didn't have the query fixed yet, so 
thanks for that patch :)
2011-10-04 07:53:04 thpo+horde (at) dotrc (dot) de Comment #9 Reply to this comment
I've seen you have started to implement the correct face indexing in 
git. Great! Just a hint: when you create the face indexes you will 
miss out the last one (str_len - word_len == 0).

Furthermore, there ist a call to a function "updatec" that does not exist.
And in getSignatureMatches() the f.image_id has to be part of the "GROUP BY".

Here is my diff for these issues:
--- a/ansel/lib/Faces/Base.php
+++ b/ansel/lib/Faces/Base.php
@@ -555,7 +555,7 @@ class Ansel_Faces_Base
          try {
              $GLOBALS['ansel_db']->update('UPDATE ansel_images SET 
image_faces = '
                  . count($fids) . ' WHERE image_id = ' . $image->id);
-            $GLOBALS['ansel_db']->updatec('UPDATE ansel_shares '
+            $GLOBALS['ansel_db']->update('UPDATE ansel_shares '
                  . 'SET attribute_faces = attribute_faces + ' . count($fids)
                  . ' WHERE share_id = ' . $image->gallery);
          } catch (Horde_Db_Exception $e) {
@@ -654,7 +654,7 @@ class Ansel_Faces_Base
          $str_len = strlen($signature);
          $GLOBALS['ansel_db']->delete('DELETE FROM ansel_faces_index 
WHERE face_id = ' . $face_id);
          $q = 'INSERT INTO ansel_faces_index (face_id, 
index_position, index_part) VALUES (?, ?, ?)';
-        for ($i = 0; $i < $str_len - $word_len; $i++) {
+        for ($i = 0; $i <= $str_len - $word_len; $i++) {
              $data = array(
                  $face_id,
                  $i,
@@ -807,7 +807,7 @@ class Ansel_Faces_Base
          $word_len = $GLOBALS['conf']['faces']['search'];
          $str_len = strlen($signature);
          $indexes = array();
-        for ($i = 0; $i < $str_len - $word_len; $i++) {
+        for ($i = 0; $i <= $str_len - $word_len; $i++) {
              $sig = new Horde_Db_Value_Binary(substr($signature, $i, 
$word_len));
              $indexes[] = '(index_position = ' . $i . ' AND 
index_part = ' . $sig->quote($GLOBALS['ansel_db']) . ')';
          }
@@ -822,7 +822,7 @@ class Ansel_Faces_Base
          if ($indexes) {
              $sql .= ' AND (' . implode(' OR ', $indexes) . ')';
          }
-        $sql .= ' GROUP BY i.face_id, f.face_name HAVING 
count(i.face_id) > 0 '
+        $sql .= ' GROUP BY i.face_id, f.face_name, f.image_id HAVING 
count(i.face_id) > 0 '
              . 'ORDER BY count(i.face_id) DESC';
          $sql = $GLOBALS['ansel_db']->addLimitOffset(
              $sql,

2011-09-03 05:43:30 Michael Rubinsky Comment #8
Assigned to Michael Rubinsky
Summary ⇒ Face similarity indexing is incorrect/broken
State ⇒ Assigned
Milestone ⇒ 2
Reply to this comment
I've fixed most of the issues you have brought up in this ticket. 
However, the query dealing with the vector indexes I have left alone 
for now. The person that originally wrote this part of the code has 
been gone for a while now, but it looks like the way the index is 
created is completely incorrect e.g.,

Given a vector such as [abcdefgh] with a word length of 2, the 
*correct* words should be:
[ab] [bc] [cd] [de] [ef] [fg] [gh]

not

[ab] [cd] [ef] [gh]

Also, these words need to be stored composed with the index position, 
in the same field (See 
http://download.pureftpd.org/pub/pure-ftpd/misc/libpuzzle/doc/README 
for more technical detail if interested).

Combine this with the fact that the images that are being compared are 
very small (just the face portion), and the current query really has 
very little chance of finding a similar face, unless it almost the 
exact same image - after that, it's just pure luck.

This should be fixed before Ansel 2.0
2011-09-03 05:32:17 Git Commit Comment #7 Reply to this comment
Changes have been made in Git for this ticket:

Some clean up of Ansel_Faces_Base
The face signature index stuff still looks broken. The index that
is calculated is done so incorrectly (The table should contain a single,
compound index field composed of the word and the position - not to
mention the words that are calculated are not correct).

See Bug: 10420

  2 files changed, 24 insertions(+), 15 deletions(-)
http://git.horde.org/horde-git/-/commit/49c02e64b89fcf72cce97246fd76b274ffc2c31b
2011-09-03 05:32:11 Git Commit Comment #6 Reply to this comment
Changes have been made in Git for this ticket:

Ensure this binary value is quoted correctly by Horde_Db.
Partially fixes Bug: 10420

  1 files changed, 1 insertions(+), 1 deletions(-)
http://git.horde.org/horde-git/-/commit/3b6afef1f9e6a90247acd865e9012a6a1d8fa327
2011-08-11 07:13:43 thpo+horde (at) dotrc (dot) de Comment #5 Reply to this comment
Also, your changes around line 810 are not correct.

The fields you removed are needed further down in the method. Of 
course, that's another bug - the variable is misnamed (the query 
results should be assigned to $results, not $faces).
Ahh, that explains a lot. I really puzzled a few minutes what you are 
actually trying to get from the query, as the result was overwritten 
just a few steps later.
2011-08-11 07:09:53 thpo+horde (at) dotrc (dot) de Comment #4 Reply to this comment
Can you provide the error that is thrown from not quoting the 
signature? It's a bound parameter, so the quoting should be taken 
care of by the Horde_Db library. If it's not, something is wrong in 
that library.
ERR: HORDE [ansel] SQL QUERY FAILED: SQLSTATE[22021]: Character not in 
repertoire: 7 ERROR:  invalid byte sequence for encoding "UTF8": 0x80
TIP:  This error can also happen if the byte sequence does not match 
the encoding expected by the server, which is controlled by 
"client_encoding".
         UPDATE ansel_faces SET face_signature = 
'<80>6srh{Z^_^_Eyu1|^Bxz ^Z^T' WHERE
           face_id = 1 [pid 14759 on line 808 of 
"/usr/share/php/Horde/Db/Adapter/Base.php"]
2011-08-11 05:41:57 Michael Rubinsky Comment #3 Reply to this comment
Also, your changes around line 810 are not correct.

The fields you removed are needed further down in the method. Of 
course, that's another bug - the variable is misnamed (the query 
results should be assigned to $results, not $faces).
2011-08-11 05:37:13 Michael Rubinsky Comment #2
State ⇒ Feedback
Reply to this comment
Can you provide the error that is thrown from not quoting the 
signature? It's a bound parameter, so the quoting should be taken care 
of by the Horde_Db library. If it's not, something is wrong in that 
library.
2011-08-10 06:41:28 thpo+horde (at) dotrc (dot) de Comment #1
Type ⇒ Bug
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Summary ⇒ wrong column names and non-standard compliant queries
Queue ⇒ Ansel
Milestone ⇒
Patch ⇒ No
New Attachment: ansel.diff Download
Reply to this comment
- column gallery_id in ansel_shares is now share_id
- unquoted signature from libpuzzle creates error in update statement
- GROUP BY and HAVING have to create an unambiguous result set

not yet fixed in the patch:
- ansel gallery object ids in _fetchFaces and _countFaces can no 
longer be accessed with array_keys()

Saved Queries