Text classification in Ruby. Five algorithms, native performance, streaming support.
Why This Library?
| | This Gem | Other Forks |
|---|---|---|
| Algorithms | ✅ 5 classifiers | ❌ 2 only |
| Incremental LSI | ✅ Brand's algorithm (no rebuild) | ❌ Full SVD rebuild on every add |
| LSI Performance | ✅ Native C extension (5-50x faster) | ❌ Pure Ruby or requires GSL |
| Streaming | ✅ Train on multi-GB datasets | ❌ Must load all data in memory |
| Persistence | ✅ Pluggable (file, Redis, S3, SQL, Custom) | ❌ Marshal only |
Installation
For CLI-only usage, install via Homebrew:
brew install cardmagic/tap/classifier
Command Line
Classify text instantly with pre-trained models—no coding required:
    # Detect spam
    classifier -r sms-spam-filter "You won a free iPhone" # => spam
    # Analyze sentiment
    classifier -r imdb-sentiment "This movie was absolutely amazing" # => positive
    # Detect emotions
    classifier -r emotion-detection "I am so happy today" # => joy
    # List all available models
    classifier models
Train your own model:
    # Train from files
    classifier train positive reviews/good/*.txt
    classifier train negative reviews/bad/*.txt
    # Classify new text
    classifier "Great product, highly recommend" # => positive
Claude Code Plugin
Install as a plugin to get skills (auto-invoked) and slash commands:
    # Add the marketplace
    claude plugin marketplace add cardmagic/ai-marketplace
    # Install the plugin
    claude plugin install classifier@cardmagic
This gives you:
- Skill: Claude automatically classifies text when you ask about spam, sentiment, or emotions
- Slash commands: /classifier:classify, /classifier:train, /classifier:models
Quick Start
Bayesian
    classifier = Classifier::Bayes.new(:spam, :ham)
    classifier.train(spam: "Buy viagra cheap pills now")
    classifier.train(spam: "You won million dollars prize")
    classifier.train(ham: ["Meeting tomorrow at 3pm", "Quarterly report attached"])
    classifier.classify("Cheap pills!") # => "Spam"
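To give a feel for what a Bayesian classifier does with those training calls, here is a minimal pure-Ruby sketch of multinomial naive Bayes with add-one smoothing. It is an illustration only: `TinyBayes` is a hypothetical name, the tokenizer is a plain regex with no stemming, and nothing here is claimed to match the gem's internals.

```ruby
# Illustrative sketch, not the gem's implementation.
# TinyBayes is a hypothetical class name chosen for this example.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |category| [category, Hash.new(0)] }
    @vocabulary = {}
  end

  # Count each word of the text under the given category.
  def train(category, text)
    tokenize(text).each do |word|
      @word_counts.fetch(category)[word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the category with the highest log-likelihood (uniform prior assumed).
  def classify(text)
    words = tokenize(text)
    @word_counts.max_by { |_category, counts| log_likelihood(words, counts) }.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end

  # Add-one (Laplace) smoothing keeps unseen words from zeroing out a category.
  def log_likelihood(words, counts)
    total = counts.values.sum + @vocabulary.size
    words.sum { |word| Math.log((counts[word] + 1).to_f / total) }
  end
end

bayes = TinyBayes.new(:spam, :ham)
bayes.train(:spam, "Buy viagra cheap pills now")
bayes.train(:spam, "You won million dollars prize")
bayes.train(:ham, "Meeting tomorrow at 3pm")
bayes.train(:ham, "Quarterly report attached")
puts bayes.classify("Cheap pills!")
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied together.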
Logistic Regression
    classifier = Classifier::LogisticRegression.new(:positive, :negative)
    classifier.train(positive: "love amazing great wonderful")
    classifier.train(negative: "hate terrible awful bad")
    classifier.classify("I love it!") # => "Positive"
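As a rough sketch of the underlying technique, the following pure-Ruby code trains a binary logistic regression over word counts with stochastic gradient descent. The function names, the epoch count, and the learning rate are assumptions made for this example and are not taken from the gem.

```ruby
# Illustrative sketch, not the gem's implementation.
# All names and hyperparameters below are hypothetical.
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Bag-of-words features: word => count.
def features(text)
  text.downcase.scan(/[a-z]+/).tally
end

# examples: array of [text, label] pairs, where label is 1 or 0.
def train_logistic(examples, epochs: 200, learning_rate: 0.5)
  weights = Hash.new(0.0)
  bias = 0.0
  epochs.times do
    examples.each do |text, label|
      x = features(text)
      prediction = sigmoid(bias + x.sum { |term, count| weights[term] * count })
      error = prediction - label
      # Gradient step on the log-loss for this single example.
      x.each { |term, count| weights[term] -= learning_rate * error * count }
      bias -= learning_rate * error
    end
  end
  [weights, bias]
end

def positive_probability(weights, bias, text)
  sigmoid(bias + features(text).sum { |term, count| weights[term] * count })
end

weights, bias = train_logistic([
  ["love amazing great wonderful", 1],
  ["hate terrible awful bad", 0]
])
puts positive_probability(weights, bias, "I love it!") > 0.5 ? "positive" : "negative"
```

Unlike naive Bayes, the weights are learned jointly, so the model outputs a probability that can be thresholded at a value other than 0.5 if needed.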
LSI (Latent Semantic Indexing)
    lsi = Classifier::LSI.new
    lsi.add(dog: "dog puppy canine bark fetch", cat: "cat kitten feline meow purr")
    lsi.classify("My puppy barks") # => "dog"
k-Nearest Neighbors
    knn = Classifier::KNN.new(k: 3)
    %w[laptop coding software developer programming].each { |w| knn.add(tech: w) }
    %w[football basketball soccer goal team].each { |w| knn.add(sports: w) }
    knn.classify("programming code") # => "tech"
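The idea behind k-nearest neighbors can be sketched in a few lines of plain Ruby: represent each document as a word-count vector, rank stored examples by cosine similarity to the query, and take a majority vote among the top k. The helper names are hypothetical, and the choice of raw word counts with cosine similarity is an assumption for this example; the gem may measure similarity differently.

```ruby
# Illustrative sketch, not the gem's implementation.
# Helper names and the similarity measure are assumptions.
def term_vector(text)
  text.downcase.scan(/[a-z]+/).tally
end

def cosine_similarity(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm = Math.sqrt(a.values.sum { |w| w * w }) * Math.sqrt(b.values.sum { |w| w * w })
  norm.zero? ? 0.0 : dot / norm
end

# examples: array of [label, text] pairs.
# Returns the majority label among the k most similar examples.
def knn_classify(examples, query, k: 3)
  query_vector = term_vector(query)
  nearest = examples.max_by(k) do |_label, text|
    cosine_similarity(query_vector, term_vector(text))
  end
  nearest.map(&:first).tally.max_by { |_label, votes| votes }.first
end

examples = [
  [:tech, "ruby programming code"],
  [:tech, "programming software developer"],
  [:tech, "code review software"],
  [:sports, "football team goal"],
  [:sports, "basketball team"],
  [:sports, "soccer goal"]
]
puts knn_classify(examples, "programming code", k: 3)
```

Because every stored example is compared against the query, this brute-force form costs time linear in the number of examples per classification.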
TF-IDF
    tfidf = Classifier::TFIDF.new
    tfidf.fit(["Ruby is great", "Python is great", "Ruby on Rails"])
    tfidf.transform("Ruby programming") # => {:rubi => 1.0}
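The fit/transform split can be illustrated with a small pure-Ruby sketch: `fit` records how many documents contain each term, and `transform` weights a new text by term frequency times inverse document frequency, then L2-normalizes the result. `TinyTfidf` is a hypothetical class, the smoothed IDF formula is an assumption, and unlike the output shown above this sketch does no stemming, so its keys are plain lowercase strings.

```ruby
# Illustrative sketch, not the gem's implementation.
# TinyTfidf and its smoothed IDF formula are assumptions for this example.
class TinyTfidf
  # Record the number of documents each term appears in.
  def fit(documents)
    @doc_count = documents.size
    @doc_freq = Hash.new(0)
    documents.each do |doc|
      tokenize(doc).uniq.each { |term| @doc_freq[term] += 1 }
    end
    self
  end

  # Weight known terms by tf * idf and scale the vector to unit length.
  # Terms that were not seen during fit are ignored.
  def transform(text)
    counts = tokenize(text).select { |term| @doc_freq.key?(term) }.tally
    weights = counts.to_h { |term, tf| [term, tf * idf(term)] }
    norm = Math.sqrt(weights.values.sum { |w| w * w })
    norm.zero? ? {} : weights.transform_values { |w| w / norm }
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end

  # Smoothed inverse document frequency; rarer terms get larger weights.
  def idf(term)
    Math.log((1.0 + @doc_count) / (1 + @doc_freq[term])) + 1.0
  end
end

sketch = TinyTfidf.new.fit(["Ruby is great", "Python is great", "Ruby on Rails"])
p sketch.transform("Ruby programming")
```

In this sketch "programming" never appeared during `fit`, so "ruby" is the only surviving term and carries the whole unit-length vector.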
Key Features
Incremental LSI
Add documents without rebuilding the entire index—400x faster for streaming data:
    lsi = Classifier::LSI.new(incremental: true)
    lsi.add(tech: ["Ruby is elegant", "Python is popular"])
    lsi.build_index
    # These use Brand's algorithm (no full rebuild)
    lsi.add(tech: "Go is fast")
    lsi.add(tech: "Rust is safe")
Persistence
    classifier.storage = Classifier::Storage::File.new(path: "model.json")
    classifier.save
    loaded = Classifier::Bayes.load(storage: classifier.storage)
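The pluggable-storage pattern behind this can be sketched as follows: the model only asks its backend to write and read a serialized string, so any object offering those two methods can stand in for the file backend. Everything in this sketch is hypothetical, including the `MemoryStorage` and `WordCounter` classes and the assumption that a backend needs only `write` and `read`; consult the gem's documentation for the actual interface a custom backend must implement.

```ruby
require "json"

# Illustrative sketch, not the gem's implementation.
# MemoryStorage is a hypothetical backend; the #write/#read interface is an assumption.
class MemoryStorage
  def write(data)
    @data = data
  end

  def read
    @data
  end
end

# Hypothetical toy model that persists itself through whatever backend it is given.
class WordCounter
  attr_reader :counts
  attr_accessor :storage

  def initialize(counts = {})
    @counts = counts
  end

  # Serialize to JSON and hand the string to the backend.
  def save
    storage.write(JSON.generate(counts))
  end

  # Rebuild a model from the backend's stored string.
  def self.load(storage:)
    model = new(JSON.parse(storage.read))
    model.storage = storage
    model
  end
end

model = WordCounter.new("ruby" => 2, "rails" => 1)
model.storage = MemoryStorage.new
model.save
restored = WordCounter.load(storage: model.storage)
p restored.counts
```

Keeping serialization in the model and transport in the backend is what lets one model format work with a file, a key-value store, or an in-memory stub used in tests.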
Streaming Training
classifier.train_from_stream(:spam, File.open("spam_corpus.txt"))
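The reason streaming keeps memory flat can be shown with a short sketch: read the IO one line at a time and hand the trainer small batches, so memory use is bounded by the batch size instead of the corpus size. `each_batch` and its `batch_size` parameter are hypothetical names for this example, and the word counter stands in for a real training step; `StringIO` is used so the sketch runs without a corpus file.

```ruby
require "stringio"

# Illustrative sketch, not the gem's implementation.
# each_batch and batch_size are hypothetical names.
# Reads any IO line by line and yields arrays of at most batch_size lines.
def each_batch(io, batch_size: 1000)
  io.each_line(chomp: true).each_slice(batch_size) { |lines| yield lines }
end

counts = Hash.new(0)
corpus = StringIO.new("buy cheap pills\nyou won a prize\ncheap prize inside\n")

# Stand-in for a training step: update word counts one batch at a time.
each_batch(corpus, batch_size: 2) do |lines|
  lines.each { |line| line.split.each { |word| counts[word] += 1 } }
end

p counts
```

Swapping `StringIO.new(...)` for `File.open("spam_corpus.txt")` would work the same way, since both respond to `each_line`.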
Performance
Native C extension provides 5-50x speedup for LSI operations:
| Documents | Speedup |
|---|---|
| 10 | 25x |
| 20 | 50x |
    rake benchmark:compare # Run your own comparison
Development
    bundle install
    rake compile # Build native extension
    rake test # Run tests
Authors
- Lucas Carlson - lucas@rufy.com
- David Fayram II - dfayram@gmail.com
- Cameron McBride - cameron.mcbride@gmail.com
- Ivan Acosta-Rubio - ivan@softwarecriollo.com