🙂 State-of-the-art transformers for Ruby
For fast inference, check out Informers 🔥
Installation
First, install Torch.rb.
Then add this line to your application’s Gemfile:
Getting Started
Models
Embedding
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/multi-qa-MiniLM-L6-cos-v1
- sentence-transformers/all-mpnet-base-v2
- sentence-transformers/paraphrase-MiniLM-L6-v2
- mixedbread-ai/mxbai-embed-large-v1
- thenlper/gte-small
- intfloat/e5-base-v2
- BAAI/bge-base-en-v1.5
- Snowflake/snowflake-arctic-embed-m-v1.5
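Sparse embedding
- opensearch-project/opensearch-neural-sparse-encoding-v1
Reranking
- mixedbread-ai/mxbai-rerank-base-v1
- BAAI/bge-reranker-base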
sentence-transformers/all-MiniLM-L6-v2
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)
sentence-transformers/multi-qa-MiniLM-L6-cos-v1
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers.pipeline("embedding", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
query_embedding = model.(query)
doc_embeddings = model.(docs)
scores = doc_embeddings.map { |e| e.zip(query_embedding).sum { |d, q| d * q } }
doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
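The scoring line in this example is a plain dot product, which equals cosine similarity only when both vectors have unit length. A minimal plain-Ruby sketch of cosine-based ranking follows, assuming embeddings are arrays of Floats as in the example; `cosine_similarity` and `rank_docs` are hypothetical helpers, not part of the library, and the toy vectors stand in for real model output.

```ruby
# Hypothetical helper: cosine similarity between two equal-length Float arrays.
def cosine_similarity(a, b)
  raise ArgumentError, "length mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical helper: pair each doc with its score, highest score first.
def rank_docs(docs, doc_embeddings, query_embedding)
  scores = doc_embeddings.map { |e| cosine_similarity(e, query_embedding) }
  docs.zip(scores).sort_by { |_, s| -s }
end

# Toy vectors standing in for real model output.
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
doc_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]
query_embedding = [1.0, 0.0, 0.0]

rank_docs(docs, doc_embeddings, query_embedding).each do |doc, score|
  puts format("%.3f  %s", score, doc)
end
```

If the pipeline already returns unit-normalized vectors, the plain dot product in the example yields the same ranking; the explicit normalization only matters when that is not guaranteed.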
sentence-transformers/all-mpnet-base-v2
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers.pipeline("embedding", "sentence-transformers/all-mpnet-base-v2")
embeddings = model.(sentences)
sentence-transformers/paraphrase-MiniLM-L6-v2
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers.pipeline("embedding", "sentence-transformers/paraphrase-MiniLM-L6-v2")
embeddings = model.(sentences)
mixedbread-ai/mxbai-embed-large-v1
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]
model = Transformers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.(input)
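Several models in this list expect a prefix on queries (and e5 also on documents), with documents and queries embedded in one call. A small plain-Ruby sketch of building that input and splitting the resulting embeddings back apart, assuming documents come first and queries last as in these examples; `build_input` and `split_embeddings` are hypothetical helpers, not library API.

```ruby
# Prefix used in the mxbai, BGE, and Snowflake examples.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

# Hypothetical helper: prefix queries (and optionally documents),
# placing documents first and queries last.
def build_input(docs, queries, doc_prefix: "", query_prefix: QUERY_PREFIX)
  docs.map { |d| doc_prefix + d } + queries.map { |q| query_prefix + q }
end

# Hypothetical helper: split embeddings computed for build_input's output
# back into [document_embeddings, query_embeddings].
def split_embeddings(embeddings, doc_count)
  [embeddings[0...doc_count], embeddings[doc_count..]]
end

docs = ["The dog is barking", "The cat is purring"]
input = build_input(docs, ["puppy"])

# The e5 example prefixes both sides instead:
e5_input = build_input(
  ["Ruby is a programming language created by Matz"],
  ["Ruby creator"],
  doc_prefix: "passage: ",
  query_prefix: "query: "
)
```

With real output, the split would look like `doc_embeddings, query_embeddings = split_embeddings(model.(input), docs.size)`.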
thenlper/gte-small
sentences = ["That is a happy person", "That is a very happy person"]
model = Transformers.pipeline("embedding", "thenlper/gte-small")
embeddings = model.(sentences)
intfloat/e5-base-v2
doc_prefix = "passage: "
query_prefix = "query: "
input = [
  doc_prefix + "Ruby is a programming language created by Matz",
  query_prefix + "Ruby creator"
]
model = Transformers.pipeline("embedding", "intfloat/e5-base-v2")
embeddings = model.(input)
BAAI/bge-base-en-v1.5
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]
model = Transformers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
embeddings = model.(input)
Snowflake/snowflake-arctic-embed-m-v1.5
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]
model = Transformers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
embeddings = model.(input, pooling: "cls")
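The `pooling: "cls"` option controls how per-token vectors are reduced to a single vector per input. How the library implements this internally isn't covered here; as an illustration only, a plain-Ruby sketch of the two common strategies, assuming token embeddings are an array of per-token Float arrays and the attention mask holds 1 for real tokens and 0 for padding (`cls_pool` and `mean_pool` are hypothetical names).

```ruby
# CLS pooling: take the vector of the first token ([CLS]).
def cls_pool(token_embeddings)
  token_embeddings.first
end

# Mean pooling: average only the tokens the attention mask marks as real,
# so padding tokens do not dilute the result.
def mean_pool(token_embeddings, attention_mask)
  kept = token_embeddings.zip(attention_mask).select { |_, m| m == 1 }.map(&:first)
  dim = kept.first.size
  (0...dim).map { |i| kept.sum { |v| v[i] } / kept.size.to_f }
end

# Toy per-token vectors; the last row is padding.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

p cls_pool(tokens)
p mean_pool(tokens, mask)
```

The two strategies give different vectors for the same input, which is why a model trained with CLS pooling should be queried with CLS pooling.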
opensearch-project/opensearch-neural-sparse-encoding-v1
docs = ["The dog is barking", "The cat is purring", "The bear is growling"]
model_id = "opensearch-project/opensearch-neural-sparse-encoding-v1"
model = Transformers::AutoModelForMaskedLM.from_pretrained(model_id)
tokenizer = Transformers::AutoTokenizer.from_pretrained(model_id)
special_token_ids = tokenizer.special_tokens_map.map { |_, token| tokenizer.vocab[token] }
feature = tokenizer.(docs, padding: true, truncation: true, return_tensors: "pt", return_token_type_ids: false)
output = model.(**feature)[0]
values, _ = Torch.max(output * feature[:attention_mask].unsqueeze(-1), dim: 1)
values = Torch.log(1 + Torch.relu(values))
values[0.., special_token_ids] = 0
embeddings = values.to_a
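For each vocabulary entry, the tensor code takes the maximum of the mask-weighted logits over the input's tokens, applies log(1 + relu(x)), and zeroes the special tokens. The sketch below does the same arithmetic for one input on plain Ruby arrays, with toy logits standing in for model output. `sparse_weights` is a hypothetical name, and returning only the nonzero entries as a Hash of vocabulary id to weight is an illustrative choice; the example above returns dense arrays via `values.to_a`.

```ruby
# logits: one row per token, one column per vocabulary id.
# attention_mask: 1 for real tokens, 0 for padding.
def sparse_weights(logits, attention_mask, special_token_ids)
  vocab_size = logits.first.size
  weights = (0...vocab_size).map do |j|
    # Max over tokens of the mask-weighted logit (padding rows contribute 0).
    max_logit = logits.each_with_index.map { |row, t| row[j] * attention_mask[t] }.max
    # log(1 + relu(x))
    Math.log(1 + [max_logit, 0.0].max)
  end
  special_token_ids.each { |id| weights[id] = 0.0 }
  # Keep only nonzero weights, keyed by vocabulary id.
  weights.each_with_index.reject { |w, _| w.zero? }.to_h { |w, id| [id, w] }
end

logits = [
  [0.0, 2.0, -1.0, 5.0],  # token 0
  [0.0, 1.0, 3.0, -2.0],  # token 1
  [9.0, 9.0, 9.0, 9.0]    # padding, masked out
]
mask = [1, 1, 0]

p sparse_weights(logits, mask, [0])
```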
mixedbread-ai/mxbai-rerank-base-v1
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
result = model.(query, docs)
BAAI/bge-reranker-base
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers.pipeline("reranking", "BAAI/bge-reranker-base")
result = model.(query, docs)
Pipelines
Text
Embedding
embed = Transformers.pipeline("embedding")
embed.("We are very happy to show you the 🤗 Transformers library.")
Reranking
rerank = Transformers.pipeline("reranking")
rerank.("Who created Ruby?", ["Matz created Ruby", "Another doc"])
Named-entity recognition
ner = Transformers.pipeline("ner")
ner.("Ruby is a programming language created by Matz")
Sentiment analysis
classifier = Transformers.pipeline("sentiment-analysis")
classifier.("We are very happy to show you the 🤗 Transformers library.")
Question answering
qa = Transformers.pipeline("question-answering")
qa.(question: "Who invented Ruby?", context: "Ruby is a programming language created by Matz")
Feature extraction
extractor = Transformers.pipeline("feature-extraction")
extractor.("We are very happy to show you the 🤗 Transformers library.")
Vision
Image classification
classifier = Transformers.pipeline("image-classification")
classifier.("image.jpg")
Image feature extraction
extractor = Transformers.pipeline("image-feature-extraction")
extractor.("image.jpg")
API
This library follows the Hugging Face Transformers Python API. The following model architectures are currently supported:
- BERT
- DeBERTa-v2
- DistilBERT
- MPNet
- ViT
- XLM-RoBERTa
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/transformers-ruby.git
cd transformers-ruby
bundle install
bundle exec rake download:files
bundle exec rake test