MOCKSTACKS
EN
Questions And Answers

More Tutorials




How to convert UTF-8 byte[] to string in Javascript?


Here's a function I wrote a while back for encoding a JavaScript UTF-16 string in UTF-8:

function toUTF8Array(str) {
    var utf8 = [];
    for (var i=0; i < str.length; i++) {
        var charcode = str.charCodeAt(i);
        if (charcode < 0x80) utf8.push(charcode);
        else if (charcode < 0x800) {
            utf8.push(0xc0 | (charcode >> 6), 
                      0x80 | (charcode & 0x3f));
        }
        else if (charcode < 0xd800 || charcode >= 0xe000) {
            utf8.push(0xe0 | (charcode >> 12), 
                      0x80 | ((charcode>>6) & 0x3f), 
                      0x80 | (charcode & 0x3f));
        }
        // surrogate pair
        else {
            i++;
            // UTF-16 encodes 0x10000-0x10FFFF by
            // subtracting 0x10000 and splitting the
            // 20 bits of 0x0-0xFFFFF into two halves
            charcode = 0x10000 + (((charcode & 0x3ff)<<10)
                      | (str.charCodeAt(i) & 0x3ff));
            utf8.push(0xf0 | (charcode >>18), 
                      0x80 | ((charcode>>12) & 0x3f), 
                      0x80 | ((charcode>>6) & 0x3f), 
                      0x80 | (charcode & 0x3f));
        }
    }
    return utf8;
}

toUTF8Array: Javascript function for encoding a string in UTF8

function toUTF8Array(str) {
    var utf8 = [];
    for (var i=0; i < str.length; i++) {
        var charcode = str.charCodeAt(i);
        if (charcode < 0x80) utf8.push(charcode);
        else if (charcode < 0x800) {
            utf8.push(0xc0 | (charcode >> 6), 
                      0x80 | (charcode & 0x3f));
        }
        else if (charcode < 0xd800 || charcode >= 0xe000) {
            utf8.push(0xe0 | (charcode >> 12), 
                      0x80 | ((charcode>>6) & 0x3f), 
                      0x80 | (charcode & 0x3f));
        }
        // surrogate pair
        else {
            i++;
            // UTF-16 encodes 0x10000-0x10FFFF by
            // subtracting 0x10000 and splitting the
            // 20 bits of 0x0-0xFFFFF into two halves
            charcode = 0x10000 + (((charcode & 0x3ff)<<10)
                      | (str.charCodeAt(i) & 0x3ff))
            utf8.push(0xf0 | (charcode >>18), 
                      0x80 | ((charcode>>12) & 0x3f), 
                      0x80 | ((charcode>>6) & 0x3f), 
                      0x80 | (charcode & 0x3f));
        }
    }
    return utf8;
}


Conclusion

In this page (written and validated by ) you learned about . What's Next? If you are interested in completing Javascript tutorial, we encourage you simply to start here: Javascript Tutorial.



Incorrect info or code snippet? We take very seriously the accuracy of the information provided on our website. We also make sure to test all snippets and examples provided for each section. If you find any incorrect information, please send us an email about the issue: mockstacks@gmail.com.


Share On:


Mockstacks was launched to help beginners learn programming languages; the site is optimized with no Ads as, Ads might slow down the performance. We also don't track any personal information; we also don't collect any kind of data unless the user provided us a corrected information. Almost all examples have been tested. Tutorials, references, and examples are constantly reviewed to avoid errors, but we cannot warrant full correctness of all content. By using Mockstacks.com, you agree to have read and accepted our terms of use, cookies and privacy policy.