Handling UTF-8 multi-byte characters with a MySQL database
Updated Jan 29, 2021

    Summary

    The standard UTF-8 character set used by MySQL databases (called utf8, currently an alias for utf8mb3) does not support all UTF-8 characters: it stores a maximum of 3 bytes per character. This leaves out every character that needs 4 bytes, including most emoji (😕 for example).
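
    To make the 3-byte limit concrete, here is a minimal Java sketch that reports how many bytes a few sample characters occupy in UTF-8. The class and method names are hypothetical and exist only for illustration; they are not part of Jira, JEMH, or MySQL.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8ByteLengths {

    // Number of bytes the UTF-8 encoding of a single code point occupies.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Supplementary code points (U+10000 and above) are the ones that
    // need 4 bytes in UTF-8 and so do not fit in MySQL's utf8.
    static boolean needsFourBytes(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F615}; // A, é, €, 😕
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s), needs 4 bytes: %b%n",
                    cp, utf8Length(cp), needsFourBytes(cp));
        }
    }
}
```

    Only the last sample (U+1F615, the 😕 emoji) needs 4 bytes, so it is the only one of the four that MySQL's utf8 cannot store.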

    Both Atlassian and we recommend using PostgreSQL, an SQL database that fully supports all UTF-8 characters. According to Atlassian, Jira 7.3 and later also support MySQL 5.7, which should work with the utf8mb4 character set. However, if upgrading Jira and MySQL is not possible, JEMH offers some ways to alleviate the problem.

    Cleaning email subjects using the "MySQL Subject Cleaner" pre-processing task

    One of JEMH's great features is its modular pre-processing task system.  Particular email processing problems can be overcome by enabling specific tasks to run before the main email processing begins.

    The MySQL Subject Cleaner pre-processing task has been added to JEMH. See JEMH-5291 for the versions it was added in. This task filters 4-byte characters out of email subjects, so that Jira should have no problem storing the resulting issue summary.
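
    As a rough illustration of what filtering 4-byte characters out of a subject involves, here is a minimal Java sketch. It is an assumption about the general technique, not JEMH's actual implementation, and the class and method names are hypothetical.

```java
// Hypothetical sketch; NOT the code JEMH uses for its MySQL Subject Cleaner task.
final class SubjectCleanerSketch {

    // Returns the subject with every supplementary code point
    // (the characters that need 4 bytes in UTF-8) dropped.
    static String stripFourByteCharacters(String subject) {
        StringBuilder cleaned = new StringBuilder(subject.length());
        subject.codePoints()
               .filter(cp -> !Character.isSupplementaryCodePoint(cp))
               .forEach(cleaned::appendCodePoint);
        return cleaned.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F615)); // 😕
        String subject = "Printer on fire " + emoji + " - urgent";
        System.out.println(stripFourByteCharacters(subject));
    }
}
```

    Characters of up to 3 bytes, including the hyphen and accented letters, pass through unchanged; only the emoji is dropped.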

    To enable the pre-processing task:

    1. Go to the "Auditing" tab in JEMH and expand the "Auditing Enablement" section if it is not already expanded

    2. Click "Enable" to enable JEMH auditing

    3. Go to your JEMH profile and edit the "Email" configuration section

    4. Under the "Pre-processing" section, enable "Use Reprocessed Message"

    5. Select the "MySQL Subject Cleaner" task from the "Pre Processing Tasks" list. If you need more than one task enabled, Ctrl+click to select multiple. Note that pre-processing tasks should only be enabled if you are sure they are needed!

    6. Save changes by clicking "Submit" at the bottom of the page

    Cleaning email body content using a Body Cleanup Regular Expression

    If your Jira is running on MySQL, unsupported 4-byte characters in the email body could also be a problem.  Jira will try to save the content as the description or a comment, and may fail if such characters are present and unsupported.  If you suspect this to be the case, you can use the Body Cleanup Regexps setting found under Profile>Email to cut out these characters, allowing successful processing.

    Java represents strings in UTF-16. In short, this means that characters needing 4 bytes in UTF-8 (those outside the Basic Multilingual Plane) are represented by a pair of "surrogate" values. We can use a regular expression to match these characters:

    [\x{10000}-\x{10FFFF}]+
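
    The following Java sketch shows the surrogate-pair representation and checks what the expression above matches. The class name and constants are hypothetical; note that the backslashes are doubled only because the expression sits inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class SurrogatePairDemo {

    // The expression from the article, escaped for a Java string literal.
    static final Pattern FOUR_BYTE = Pattern.compile("[\\x{10000}-\\x{10FFFF}]+");

    static final String EMOJI = new String(Character.toChars(0x1F615)); // 😕

    public static void main(String[] args) {
        // One visible character, but two UTF-16 code units (a surrogate pair).
        System.out.println(EMOJI.length());                             // 2
        System.out.println(EMOJI.codePointCount(0, EMOJI.length()));    // 1
        System.out.println(Character.isHighSurrogate(EMOJI.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(EMOJI.charAt(1)));  // true

        // The pattern matches the pair as a single code point...
        System.out.println(FOUR_BYTE.matcher(EMOJI).matches());         // true
        // ...and leaves hyphens and 3-byte characters alone.
        System.out.println(FOUR_BYTE.matcher("-").find());              // false
        System.out.println(FOUR_BYTE.matcher("\u20AC").find());         // false
    }
}
```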

    Note: the regular expression previously suggested here ([\uD800-\uDBFF\uDC00-\uDFFF]+) is not correct for Java's regular expression implementation, because it also matches hyphens ("-"). Please use [\x{10000}-\x{10FFFF}]+ instead.

    See the following blog post for more information: Using Body Cleanup Regexps to remove 4 byte characters?

    Entering [\x{10000}-\x{10FFFF}]+ as a regular expression in the "Body Cleanup Regexps" setting mentioned above will remove these pairs, leaving only characters that your database should be able to store.
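
    To preview the effect of the expression on a sample body before saving it in the profile, a minimal Java sketch such as the following can be used. It assumes that each match is simply replaced with an empty string; how JEMH applies the setting internally is not described here, and the class and method names are hypothetical.

```java
// Hypothetical sketch; assumes a match is replaced with the empty string.
final class BodyCleanupSketch {

    // In the JEMH setting the expression is entered with single backslashes;
    // they are doubled here only because this is a Java string literal.
    static final String FOUR_BYTE_REGEX = "[\\x{10000}-\\x{10FFFF}]+";

    static String cleanBody(String body) {
        return body.replaceAll(FOUR_BYTE_REGEX, "");
    }

    // True when no remaining character would need 4 bytes in UTF-8.
    static boolean fitsInThreeBytes(String text) {
        return text.codePoints().noneMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F615)); // 😕
        String body = "Thanks" + emoji + emoji + " - see you at 10-11, cost \u20AC5";
        String cleaned = cleanBody(body);
        System.out.println(cleaned);
        System.out.println(fitsInThreeBytes(cleaned));
    }
}
```

    In this sample the hyphens and the 3-byte euro sign are kept, and only the two emoji are removed.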
