    fix(parser): support Unicode characters in tags · 64e9d82d
    Steven authored
    Fixes #5264
    
    Chinese, Japanese, Korean, and other Unicode characters are now
    properly recognized in hashtags, following the standard hashtag
    parsing conventions used by Twitter, Instagram, and GitHub.
    
    Changes:
    - Updated tag parser to allow Unicode letters and digits
    - Tags stop at whitespace and punctuation (both ASCII and CJK)
    - Allowed dashes, underscores, and forward slashes in tags
    - Added comprehensive tests for CJK characters and emoji
    
    Examples:
    - #测试 → recognized as tag '测试'
    - #日本語 → recognized as tag '日本語'
    - #한국어 → recognized as tag '한국어'
    - #测试。 → recognized as tag '测试' (stops at punctuation)
    - #work/测试/项目 → hierarchical tag with Unicode
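
The rules listed in the commit message can be sketched in Go roughly as follows. This is a minimal illustration and not the parser code from the commit: `isTagRune` and `parseTag` are hypothetical names, and the sketch assumes a tag consists only of Unicode letters, Unicode decimal digits, `-`, `_`, and `/`, so any other rune (whitespace, ASCII or CJK punctuation, and, in this sketch, emoji) ends the tag.

```go
package main

import (
	"fmt"
	"unicode"
)

// isTagRune reports whether r may appear inside a tag.
// Assumption: only Unicode letters, decimal digits, '-', '_' and '/' qualify.
func isTagRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_' || r == '/'
}

// parseTag is a hypothetical helper. Given text that starts with '#',
// it returns the tag name that follows and true, or "" and false when
// the text does not start with '#' or no tag rune follows it.
func parseTag(s string) (string, bool) {
	if len(s) == 0 || s[0] != '#' {
		return "", false
	}
	rest := s[1:]
	// Ranging over a string yields byte offsets and decoded runes,
	// so multi-byte characters such as CJK are handled correctly.
	for i, r := range rest {
		if !isTagRune(r) {
			return rest[:i], i > 0
		}
	}
	return rest, len(rest) > 0
}

func main() {
	inputs := []string{"#测试", "#日本語", "#한국어", "#测试。", "#work/测试/项目"}
	for _, in := range inputs {
		tag, ok := parseTag(in)
		fmt.Printf("%q -> tag=%q ok=%v\n", in, tag, ok)
	}
}
```

The full-width period `。` (U+3002) is classified as punctuation rather than a letter, so `#测试。` yields `测试` without any CJK-specific special case.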
Name
.github
cmd/memos
internal
plugin
proto
scripts
server
store
web
.dockerignore
.gitignore
.golangci.yaml
.goreleaser.yaml
CLAUDE.md
CODEOWNERS
LICENSE
README.md
SECURITY.md
go.mod
go.sum