Expose Encoder and Decoder in TiktokenTokenizer by razshare · Pull Request #7314 · dotnet/machinelearning

Skip to content

Navigation Menu

Sign in

Appearance settings

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

Appearance settings

Conversation

@razshare

Copy link

Fixes #7313

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

@codecov

Copy link

codecov bot commented

Nov 15, 2024

edited

Loading

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.10%. Comparing base (5090327) to head (140aa42).
Report is 44 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7314      +/-   ##
==========================================
+ Coverage   68.87%   69.10%   +0.23%     
==========================================
  Files        1470     1475       +5     
  Lines      274005   277261    +3256     
  Branches    28403    29043     +640     
==========================================
+ Hits       188717   191606    +2889     
- Misses      77970    78374     +404     
+ Partials     7318     7281      -37     
Flag Coverage Δ
Debug 62.64% <100.00%> (+0.01%) ⬆️
production 62.64% <100.00%> (+0.01%) ⬆️
test ∅ <100.00%> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs 78.28% <100.00%> (ø)
...est/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs 99.39% <100.00%> (+0.40%) ⬆️

... and 56 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tarekgh

Copy link

Member

tarekgh commented

Nov 18, 2024

edited

Loading

@razshare I replied on the issue. Let's discuss it there first before we continue here. Thanks a lot for your submission. I converted this PR to be draft for now till we finish the discussion.

@tarekgh tarekgh self-assigned this

Nov 18, 2024

@tarekgh tarekgh marked this pull request as draft

November 18, 2024 20:46

@razshare

Copy link

Author

@dotnet-policy-service agree

@dotnet-policy-service

Copy link

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators

May 23, 2025

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Reviewers

No reviews

Assignees

@tarekgh tarekgh

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Expose Encoder in TiktokenTokenizer

2 participants

@razshare @tarekgh