Loading...
 
Features / Usability

Features / Usability


Need some help with DB collation woe

posts: 210

Hi, one of my Tiki 18.8 has stopped to search... After login in as Admin, I could see the whole error message. Much to my astonishment it was

Copy to clipboard
Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '=' Die Abfrage war: SELECT DISTINCT c.`title` AS name, LEFT(c.`data`, 240) AS data, p.`hits` AS hits, c.`commentDate` AS lastModif, CONCAT(p.`pageName`,': ',c.`title`) AS pageName,outputType ,p.`pageName` AS id1,c.`threadId` AS id2, MATCH(c.`title`,c.`data`) AGAINST ('Kämpfer' IN BOOLEAN MODE) AS relevance FROM `tiki_comments` c, `tiki_pages` p left join `tiki_output` on `tiki_output`.`entityId` = p.`pageName` WHERE c.`objectType` = 'wiki page' AND p.`pageName`=c.`object` AND MATCH(c.`title`,c.`data`) AGAINST ('Kämpfer' IN BOOLEAN MODE) ORDER BY relevance desc, p.`hits` The built query was likely: SELECT DISTINCT c.`title` AS name, LEFT(c.`data`, 240) AS data, p.`hits` AS hits, c.`commentDate` AS lastModif, CONCAT(p.`pageName`,': ',c.`title`) AS pageName,outputType ,p.`pageName` AS id1,c.`threadId` AS id2, MATCH(c.`title`,c.`data`) AGAINST ('Kämpfer' IN BOOLEAN MODE) AS relevance FROM `tiki_comments` c, `tiki_pages` p left join `tiki_output` on `tiki_output`.`entityId` = p.`pageName` WHERE c.`objectType` = 'wiki page' AND p.`pageName`=c.`object` AND MATCH(c.`title`,c.`data`) AGAINST ('Kämpfer' IN BOOLEAN MODE) ORDER BY relevance desc, p.`hits`


This particular Tiki has data from very old days, migrated lots of times. So far no problems with the DB.

When looking deeper I found that for reasons unknown search was set to Lucene, which I do not recall ever activating. Deactivating it, and rebuilding the index (successful) did not help. I had the suspicion it could be related to PHP 7.3, so I downgraded that to 7.2, on which another Tiki 18.8 runs, including search. To no avail.

A look at my backups showed me that a discrepancy between the default

Copy to clipboard
CREATE DATABASE IF NOT EXISTS `MYDB` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;


and many tables like

Copy to clipboard
CREATE TABLE `galaxia_activities` ( `activityId` int(14) NOT NULL, `name` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL, `normalized_name` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL, ...


exist. And have existed for looong. So why is this causing problems now?

And how can I rectify this?
Thanks
hman

posts: 210
Maybe I should add that the contents of the DB look absolutely okay, nothing is visibly damaged. The Tiki contains mostly text, written in German. Besides text, there are some images in the DB and even some MPEG movies, all look okay.
posts: 126886 United Kingdom
hman wrote:
Maybe I should add that the contents of the DB look absolutely okay, nothing is visibly damaged. The Tiki contains mostly text, written in German. Besides text, there are some images in the DB and even some MPEG movies, all look okay.


Hi @hman

Glad your data is ok, as far as i understand it, the collation only affects sorting, and the benefits for utf8_unicode_ci over utf8_general_ci is it's better at sorting non-ascii characters, especially capitalised ones, so a probable fix (after taking a restorable backup ;) would be to change the collation on the utf8_general_ci tables and fields to utf8_unicode_ci.

Which version of MySQL are you on? Maybe a minor update made it more sensitive to this?

posts: 210
Jonny Bradley wrote:
Which version of MySQL are you on? Maybe a minor update made it more sensitive to this?


I'll have to check. But I have another Tiki, also 18.8., running on the same PHP 7.23 at the same ISP, so I guess MySQL versions are synchronized as well. This other Tiki searches perfectly okay...

I am puzzled as to why search was set to Lucene. I do not recall ever activating that. If I remember correctly, Lucene has to be installed separately in order to use it, und I have never installed Lucene anywhere. Maybe this setting damaged something?

I just checked some collations from the SELECT in the the error message. Most collations are utf8_unicode_ci, but tiki_output isn't. It's utf8_general_ci. No idea why this is differing. And also no idea what tiki_output is used for? This table is completely empty (0 records)...

posts: 210
Jonny Bradley wrote:

Hi @hman
Which version of MySQL are you on? Maybe a minor update made it more sensitive to this?


5.7. On both Tikis...

posts: 210
hman wrote:
5.7. On both Tikis...


Just checked my other Tiki. There tiki_output is also completely empty, but collation ist uft8_unicode_ci. So maybe I'll just switch collation. If I recall correctly (will check with a collegue who is SQL expert) collation only affects sorting, but nothing else. And tiki_output is empty already :-)


posts: 210

Looking into my backup, things get even stranger. The following is true only for tiki_output. In my last backup from March, tiki_output was the only table, as far as I can tell, that did not have any collation...

Copy to clipboard
CREATE TABLE `tiki_output` ( `entityId` varchar(160) NOT NULL DEFAULT '', `objectType` varchar(32) NOT NULL DEFAULT '', `outputType` varchar(32) NOT NULL DEFAULT '', `version` int(8) NOT NULL DEFAULT '0', `outputId` int(11) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8;


Other tables do have utf8_unicode_ci... And tiki_output in my backup of the other Tiki ist defined as

Copy to clipboard
CREATE TABLE `tiki_output` ( `entityId` varchar(160) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `objectType` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `outputType` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `version` int(8) NOT NULL DEFAULT '0', `outputId` int(11) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


That backup is just one week older, so it's comparable.

This raises the question: What changed the collation of tiki_output after March? Is this some Tiki feature to "repair" things collationwise? If so, why isn't the correct collation applied?


posts: 210

There is more than one table on utf8_general_ci. Actually, that is -for whatever reason- the database default.

galaxia_workitems,index_628bfa7e573fa,index_pref_de,tiki_activity_stream_rules,tiki_addon_profiles,tiki_credits_usage,tiki_galleries_scales,tiki_goal_events,tiki_h5p_contents,tiki_h5p_contents_libraries,tiki_hp5_libraries,tiki_h5_libaries_cachedassets,tiki_h5_libraries_hub_cache,tiki_h5_libraries_languages,tiki_h5_libraries_libraries (!),tiki_hp5_results,tiki_h5p_tmpfiles,tiki_object_scores,tiki_output (here it is),tiki_scheduler,tiki_scheduler_run,tiki_search_queries (!),tiki_stats,tiki_tabular_formats,tiki_user_monitors.

NOW this looks like some form of bug in one of the updater scripts, doesn't it? And possibly I did not re-test searching after I did an upgrade.


posts: 210

On my newer Tiki 18.8 the default is also utf8_general_ci. But there all tables are utf8_unicode_ci, expect for ONE:

index_pref_de is utf8_general_ci. So the bug is not limited to one of the upgrader scripts, but is within initial creation script as well, just smaller...


posts: 210
Now that there is proof that Tiki initializes at least SOME tables non-conforming to the database's default, suspicion is sustantial, that the same non-conforming status could be found on some row status' as well. Oh my, I'll have to check them ALL...

Upcoming Events

1)  18 Apr 2024 14:00 GMT-0000
Tiki Roundtable Meeting
2)  16 May 2024 14:00 GMT-0000
Tiki Roundtable Meeting
3)  20 Jun 2024 14:00 GMT-0000
Tiki Roundtable Meeting
4)  18 Jul 2024 14:00 GMT-0000
Tiki Roundtable Meeting
5)  15 Aug 2024 14:00 GMT-0000
Tiki Roundtable Meeting
6)  19 Sep 2024 14:00 GMT-0000
Tiki Roundtable Meeting
7) 
Tiki birthday
8)  17 Oct 2024 14:00 GMT-0000
Tiki Roundtable Meeting
9)  21 Nov 2024 14:00 GMT-0000
Tiki Roundtable Meeting
10)  19 Dec 2024 14:00 GMT-0000
Tiki Roundtable Meeting