Comparing oluka007:main...JamePeng:main · oluka007/llama-cpp-python
Commits on Apr 7, 2026
- ci: restrict cudaarch to Volta-Hopper to fix GitHub Actions timeout
Using the `all` option for `cudaarch` with CUDA 12.4-12.6 causes compilation to exceed the 6-hour maximum job execution limit on GitHub Actions, leading to cancelled jobs. To keep build times within that limit, the target architectures are now restricted to compute capabilities 7.0 through 9.0 (`70-real` to `90-real`). This retains support for all NVIDIA GPUs with Tensor Cores, from the Volta through Hopper architectures, while keeping the build safely within CI constraints.

Signed-off-by: JamePeng <jame_peng@sina.com>
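As a rough sketch of what such a restriction looks like when building llama-cpp-python from source (the exact architecture list and CI variables used in this commit may differ), the CUDA targets can be pinned through CMake's `CMAKE_CUDA_ARCHITECTURES` variable:

```shell
# Hypothetical build-config fragment: restrict CUDA codegen to specific
# compute capabilities. The "-real" suffix tells CMake/NVCC to emit SASS
# for each listed architecture only, without bundling forward-compatible
# PTX, which shortens compile time considerably compared to "all".
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=70-real;75-real;80-real;86-real;89-real;90-real"
pip install llama-cpp-python --no-cache-dir
```

The trade-off is that wheels built this way will not run on architectures outside the listed range (e.g. pre-Volta or post-Hopper GPUs).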
- Update CI Action runner versions
- microsoft/setup-msbuild@v2 -> v3
- actions/checkout@v5 -> v6
- actions/upload-artifact@v4 -> v6
- actions/download-artifact@v4 -> v6

Signed-off-by: JamePeng <jame_peng@sina.com>
- feat(types): align with latest OpenAI API spec and fix type issues
- Expand `CompletionUsage` with `PromptTokensDetails` and `CompletionTokensDetails` for granular token tracking.
- Add `usage` to `CreateChatCompletionStreamResponse` to support usage reporting in streaming mode.
- Fix the duplicate `object` field in `CreateCompletionResponse`.
- Update `ChatCompletionRequestAssistantMessage` to accept `None` for `content` and introduce the new `refusal` field.
- Clean up the `ChatCompletionRequestMessage` Union by removing the duplicate user message type.
- Broaden `ChatCompletionToolChoiceOption` to fully support the `allowed_tools` and `custom` tool choice behaviors.

Signed-off-by: JamePeng <jame_peng@sina.com>
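A minimal sketch of what the expanded usage types might look like as TypedDicts. The nested field names (`prompt_tokens_details`, `completion_tokens_details`, `cached_tokens`, `reasoning_tokens`) follow the public OpenAI API spec; the exact definitions in this fork's `llama_types` module may differ:

```python
from typing import TypedDict


class PromptTokensDetails(TypedDict, total=False):
    # Breakdown of the prompt-side token count per the OpenAI usage object.
    audio_tokens: int
    cached_tokens: int


class CompletionTokensDetails(TypedDict, total=False):
    # Breakdown of the completion-side token count.
    audio_tokens: int
    reasoning_tokens: int
    accepted_prediction_tokens: int
    rejected_prediction_tokens: int


class _UsageBase(TypedDict):
    # Required aggregate counts, unchanged from the original type.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


class CompletionUsage(_UsageBase, total=False):
    # New optional detail objects for granular token tracking.
    prompt_tokens_details: PromptTokensDetails
    completion_tokens_details: CompletionTokensDetails


# Example value: a usage payload with reasoning-token details, as a
# streaming response's final chunk might report it.
usage: CompletionUsage = {
    "prompt_tokens": 12,
    "completion_tokens": 30,
    "total_tokens": 42,
    "completion_tokens_details": {"reasoning_tokens": 8},
}
```

Splitting the required aggregates from the `total=False` detail fields keeps older payloads (which lack the detail objects) type-valid while allowing newer servers to report them.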