Comparing JamePeng:main...LSDJesus:main · JamePeng/llama-cpp-python
Commits on Apr 6, 2026
-
feat: add semantic memory injection research and layer capture/skip C API extensions
- Add copilot instructions for development workflow and architecture overview
- Document semantic memory injection methodology and K/V injection findings
- Add synthetic token architecture brainstorm and KV pruning specifications
- Implement layer capture/skip C API extensions (llama_set_layer_capture, llama_set_layer_skip, llama_get_embeddings_layer_ith, etc.)
- Add activation analysis and KV injection/compression/merge experiment scripts
- Reorganize original docs into docs/Original_repo/ subdirectory
- Update .gitignore for generated artifacts and analysis outputs
- Extend llama_cpp.py with new ctypes bindings for layer manipulation
- Extend _internals.py and llama.py with high-level Python APIs for layer operations
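The commit message names the layer capture/skip functions but not their signatures, so the following is only a conceptual sketch of what "capture" and "skip" mean for a stack of layers. It is a pure-Python toy and does not call the fork's API: `run_layers`, the `capture`/`skip` parameters, and the three toy layers are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_layers(
    layers: List[Layer],
    x: Vector,
    capture: Iterable[int] = (),
    skip: Iterable[int] = (),
) -> Tuple[Vector, Dict[int, Vector]]:
    """Toy forward pass: record outputs of `capture` layers, bypass `skip` layers.

    Hypothetical helper for illustration only; not the fork's C or Python API.
    """
    capture_set, skip_set = set(capture), set(skip)
    captured: Dict[int, Vector] = {}
    for i, layer in enumerate(layers):
        # A skipped layer behaves as the identity: x passes through unchanged.
        if i not in skip_set:
            x = layer(x)
        # Copy the hidden state so later layers cannot mutate the snapshot.
        if i in capture_set:
            captured[i] = list(x)
    return x, captured


# Three toy "layers": add 1, double, negate.
toy_layers: List[Layer] = [
    lambda v: [a + 1.0 for a in v],
    lambda v: [a * 2.0 for a in v],
    lambda v: [-a for a in v],
]

# Capture the output of layer 1 and skip layer 2 entirely.
final, captured = run_layers(toy_layers, [1.0, 2.0], capture=[1], skip=[2])
```

In this toy, skipping a layer simply passes the hidden state through unchanged; whether the real extension treats a skipped layer the same way is not stated in the commit message.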
-
ci: fix release tag_name for manual dispatch
softprops/action-gh-release requires an explicit tag_name when triggered via workflow_dispatch (github.ref is refs/heads/main, not a tag). Generate tag from wheel version + short SHA for manual runs.
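The tag selection described above can be sketched as follows. This is an illustration rather than the workflow's actual step: the function names, the `v<version>-<sha>` tag format, the 7-character SHA length, and the version string in the usage lines are all assumptions, since the commit message only says the tag is derived from the wheel version and the short SHA.

```python
import re


def manual_release_tag(wheel_version: str, commit_sha: str, sha_len: int = 7) -> str:
    """Build a tag from the wheel version plus a short SHA.

    The "v<version>-<sha>" layout is an assumed format for illustration.
    """
    if not re.fullmatch(r"[0-9a-fA-F]{7,40}", commit_sha):
        raise ValueError(f"not a commit SHA: {commit_sha!r}")
    return f"v{wheel_version}-{commit_sha[:sha_len].lower()}"


def resolve_tag_name(github_ref: str, wheel_version: str, commit_sha: str) -> str:
    """Use the pushed tag when there is one; otherwise generate a tag."""
    prefix = "refs/tags/"
    if github_ref.startswith(prefix):
        # Tag-triggered run: the ref already names the release tag.
        return github_ref[len(prefix):]
    # Manual dispatch: the ref is a branch (e.g. refs/heads/main), so derive a tag.
    return manual_release_tag(wheel_version, commit_sha)


# Illustrative inputs only; the version and SHA below are made up.
sha = "0123456789abcdef0123456789abcdef01234567"
tag_from_branch = resolve_tag_name("refs/heads/main", "0.3.16", sha)
tag_from_tag = resolve_tag_name("refs/tags/v0.3.16", "0.3.16", sha)
```

Including the short SHA keeps tags from repeated manual runs at the same wheel version distinct, as long as the commits differ.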
-
docs: add qwen3-vl hacking guide and penultimate layer extraction
- Add comprehensive hacking guide for llama.cpp qwen3-vl model modifications
- Add penultimate hidden states PR with patch for transformer layer capture
- Add activation steering script for semantic memory research
- Add KV fragment testing utilities
- Add GGUF dequantization test harness
- Update per-layer embeddings results
- Add steering delta coefficients for layer injection
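For readers unfamiliar with the terms above, activation steering is commonly described as adding a scaled direction vector to a layer's hidden state, and the penultimate hidden state is the output of the second-to-last layer. The sketch below illustrates both on plain Python lists. It is not the repository's steering script: `steer`, `penultimate_state`, and the example numbers are hypothetical, and it assumes the common formulation h' = h + coeff * delta.

```python
from typing import List, Sequence

Vector = List[float]


def steer(hidden: Sequence[float], delta: Sequence[float], coeff: float) -> Vector:
    """Return hidden + coeff * delta, element by element (assumed formulation)."""
    if len(hidden) != len(delta):
        raise ValueError("hidden state and steering delta must have the same size")
    return [h + coeff * d for h, d in zip(hidden, delta)]


def penultimate_state(per_layer_states: Sequence[Vector]) -> Vector:
    """Pick the second-to-last entry from a list of per-layer hidden states."""
    if len(per_layer_states) < 2:
        raise ValueError("need at least two layers to have a penultimate one")
    return list(per_layer_states[-2])


# Toy per-layer hidden states for a 3-layer model with hidden size 2.
states = [[0.0, 1.0], [1.0, 2.0], [3.0, 4.0]]
h = penultimate_state(states)        # output of the second-to-last layer
steered = steer(h, [0.5, -1.0], 2.0) # push h along the steering direction
```

The coefficient controls how strongly the hidden state is pushed along the delta direction; a coefficient of zero leaves it unchanged.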