    fix(parser): support Unicode characters in tags · 64e9d82d
    Steven authored
    Fixes #5264
    
    Chinese, Japanese, Korean, and other Unicode characters are now
    properly recognized in hashtags, following the standard hashtag
    parsing conventions used by Twitter, Instagram, and GitHub.
    
    Changes:
    - Updated tag parser to allow Unicode letters and digits
    - Tags stop at whitespace and punctuation (both ASCII and CJK)
    - Allowed dash, underscore, and forward slash in tags
    - Added comprehensive tests for CJK characters and emoji
    
    Examples:
    - #测试 → recognized as tag '测试'
    - #日本語 → recognized as tag '日本語'
    - #한국어 → recognized as tag '한국어'
    - #测试。 → recognized as tag '测试' (stops at punctuation)
    - #work/测试/项目 → hierarchical tag with Unicode
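
The rules listed under "Changes" can be illustrated with a minimal Go sketch. This is not the code from the commit: `scanTag` and `isTagRune` are hypothetical names, and the sketch assumes that a tag is a run of Unicode letters, Unicode digits, `-`, `_`, and `/` following a leading `#`, with any other rune (whitespace, ASCII punctuation, CJK punctuation such as `。`) ending the tag. How the real parser treats emoji and combining marks is not stated in the commit message, so this sketch simply stops at them.

```go
package main

import (
	"fmt"
	"unicode"
)

// isTagRune reports whether r may appear inside a tag.
// Assumption: letters and digits from any script, plus '-', '_' and '/'.
func isTagRune(r rune) bool {
	switch r {
	case '-', '_', '/':
		return true
	}
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

// scanTag is a hypothetical helper. It returns the tag name that follows
// a leading '#', or "" when s does not start with '#' or no tag rune follows.
func scanTag(s string) string {
	runes := []rune(s)
	if len(runes) == 0 || runes[0] != '#' {
		return ""
	}
	end := 1
	// Advance until the first rune that is not allowed in a tag.
	for end < len(runes) && isTagRune(runes[end]) {
		end++
	}
	return string(runes[1:end])
}

func main() {
	// Inputs taken from the examples in the commit message.
	inputs := []string{"#测试", "#日本語", "#한국어", "#测试。", "#work/测试/项目"}
	for _, in := range inputs {
		fmt.Printf("%s -> %s\n", in, scanTag(in))
	}
}
```

Under these assumptions `scanTag("#测试。")` returns `测试`, because `。` is neither a letter nor a digit, and `scanTag("#work/测试/项目")` returns the full hierarchical name `work/测试/项目`. Iterating over runes rather than bytes matters here: each CJK character occupies several bytes in UTF-8, so a byte-wise ASCII check would end the tag at its first byte.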
Name
ast
extensions
parser
renderer
markdown.go
markdown_test.go