🔰Unicode

🚧 under construction -> tidy this page



• Script (a writing system): Cyrillic, Greek, Arabic, Han (Chinese), ... (👉 full list)

const {log} = console;
const convert = require('./Converter.js');                

[
    convert.stringToCodeUnits("🍎"),    // [ 55356=a, 57166=b ]

    /🍎{3}/.test("🍎🍎🍎"),              // false❗️
    // assume 🍎 = ab (2 code units)
    // 🍎{3} = ab{3} = abbb ≠ ababab = 🍎🍎🍎

    convert.stringToCodeUnits("🌹"),    // [ 55356=a, 57145=c ]

    /<.>/.test("<🌹>"),                 // false❗️
    // assume 🌹 = ac (2 code units)
    // <.> ≠ <ac> (without /u, . matches a single code unit)

    /<.>/u.test("<🌹>"),                // true ⭐️
    // ✅ enable the /u flag

    // search for Han characters (漢字)
    `Hello Привет 你好`.match(/\p{sc=Han}/gu),  // [ '你', '好' ]

    // Script
    /\p{Script=Greek}/u.test("α"),      // → true
    /\p{Script=Arabic}/u.test("α"),     // → false

    // Alphabetic
    /\p{Alphabetic}/u.test("α"),        // → true
    /\p{Alphabetic}/u.test("!"),        // → false
    /\p{Alphabetic}/u.test("漢"),       // → true

].forEach(x => log(x));
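
The snippet above relies on `./Converter.js`, which is not shown on this page. The sketch below is only a guess at what `stringToCodeUnits` might do, written as a standalone function; `stringToCodePoints` is a hypothetical companion added for comparison. It also shows two ways to make the `{3}` quantifier apply to the whole emoji.

```javascript
// Hypothetical stand-in for convert.stringToCodeUnits (the real module is not shown).
function stringToCodeUnits(str) {
    const units = [];
    for (let i = 0; i < str.length; i++) {
        units.push(str.charCodeAt(i));   // one UTF-16 code unit per index
    }
    return units;
}

// Hypothetical companion: string iteration walks code points, not code units.
function stringToCodePoints(str) {
    return Array.from(str, ch => ch.codePointAt(0));
}

console.log(stringToCodeUnits("🍎"));          // [ 55356, 57166 ]
console.log(stringToCodePoints("🍎"));         // [ 127822 ]
console.log("🍎".length, [..."🍎"].length);    // 2 1

// Without /u, group the two code units so {3} repeats both of them.
console.log(/^(?:🍎){3}$/.test("🍎🍎🍎"));     // true
// With /u, the emoji is a single pattern character.
console.log(/^🍎{3}$/u.test("🍎🍎🍎"));        // true
```

The /u flag is usually the simpler choice, since it also makes `.` and character classes work per code point.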

Main categories and subcategories (General_Category values usable in \p{…}):

  • Letter L:

    • lowercase Ll,

    • modifier Lm,

    • titlecase Lt,

    • uppercase Lu,

    • other Lo.

  • Number N:

    • decimal digit Nd,

    • letter number Nl,

    • other No.

  • Punctuation P:

    • connector Pc,

    • dash Pd,

    • initial quote Pi,

    • final quote Pf,

    • open Ps,

    • close Pe,

    • other Po.

  • Mark M (accents etc):

    • spacing combining Mc,

    • enclosing Me,

    • non-spacing Mn.

  • Symbol S:

    • currency Sc,

    • modifier Sk,

    • math Sm,

    • other So.

  • Separator Z:

    • line Zl,

    • paragraph Zp,

    • space Zs.

  • Other C:

    • control Cc,

    • format Cf,

    • not assigned Cn,

    • private use Co,

    • surrogate Cs.
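
The category letters listed above can be used directly inside `\p{…}` with the /u flag. The following is a minimal sketch of that; the `mainCategories` table and the `mainCategoryOf` helper are names invented for this example, not part of any library.

```javascript
// Map a label (chosen for this sketch) to the one-letter main category.
const mainCategories = [
    ["Letter",      /\p{L}/u],
    ["Number",      /\p{N}/u],
    ["Punctuation", /\p{P}/u],
    ["Mark",        /\p{M}/u],
    ["Symbol",      /\p{S}/u],
    ["Separator",   /\p{Z}/u],
    ["Other",       /\p{C}/u],
];

// Hypothetical helper: return the main category label of a single character.
function mainCategoryOf(ch) {
    const hit = mainCategories.find(([, re]) => re.test(ch));
    return hit ? hit[0] : "Unknown";
}

console.log(mainCategoryOf("α"));   // Letter
console.log(mainCategoryOf("5"));   // Number
console.log(mainCategoryOf("!"));   // Punctuation
console.log(mainCategoryOf("$"));   // Symbol
console.log(mainCategoryOf(" "));   // Separator

// Subcategories use the two-letter value; the long form is equivalent.
console.log(/\p{Lu}/u.test("A"));                               // true
console.log(/\p{Lu}/u.test("a"));                               // false
console.log(/\p{General_Category=Decimal_Number}/u.test("९"));  // true (Devanagari digit, Nd)
console.log(/\p{Sc}/u.test("€"));                               // true

// \P{…} (capital P) negates the property: strip everything that is not a letter.
console.log("a1,b2".replace(/\P{L}/gu, ""));                    // ab
```

Note that `\p{Sc}` is the currency-symbol subcategory, while `\p{sc=Han}` uses `sc` as the short name of the Script property; the two are unrelated.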
