Scouting report
Built a 452★ Chinese sci-fi NLP corpus from scratch
assessed from open-source footprint
Guhhhhaa's standout is 4675-scifi (452★), a Chinese-language NLP corpus of about 4,675 science-fiction novels, backed by a companion corpus, wula-scifi (129★). Across all projects the total is 695 stars, most of it genuinely useful ML dataset work. The work is solo, niche, and dataset-heavy rather than production engineering, with only 3 commits in the last year and 86% of repos stale. A strong fit for data/NLP teams that value curated corpora; less so for hands-on app delivery.
Authorship & open source
What they build
Industry experience
- Data, ML & AI
- Education & EdTech
- Fintech & Payments
Signal breakdown
- 695 stars (top repo 452)
- 36 (45% forks)
- 42 followers
- 9.5 yr
- 2 live demos
- Active (86% stale)
Strengths
- Verified author — wrote 100% of commits on 4675-scifi
- 695 stars earned across projects
- A standout project with 452 stars
- Ships to production — 2 live demos
- Data / ML focus, with some frontend work
- Domain experience in Data, ML & AI and Education & EdTech
- Core stack: HTML, CSS, Jupyter, Python
About
Software engineer from the Cayman Islands, active on GitHub with 66 public repositories and 42 followers.
Skills
- Science Fiction
- Scifi
- Chinese NLP
- Corpus
- Corpus Data
- Datasets
- NLP
- NLP Datasets
Featured work
4675-scifi
Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science-fiction novels…
- Chinese NLP
- Corpus
- Corpus Data
- Datasets
- NLP
by Guhhhhaa
wula-scifi
Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan (方舟计划) of the Wula science-fiction website (乌拉科幻小说网)…
- Chinese NLP
- Corpus
- Corpus Data
- Datasets
- NLP
by Guhhhhaa