# Multi-Language Prime Number Benchmark

## Key Discoveries
- Safety vs Performance Trade-off: Rust's memory safety costs ~50% in tight loops
- Unsafe Rust can match C performance when maximum speed is needed
- Optimization flags provide 4-10x performance gains
- Go's strong defaults keep it competitive with highly optimized safe Rust
- Memory efficiency ≠ execution speed
- Loop unrolling + unchecked access = massive gains
This run extends the benchmark to measure the safety vs performance trade-off in Rust.

## Quick Results

| Language | Time (seconds) | Memory (KB) | Optimization | Performance Rank |
|---|---|---|---|---|
| Rust (unsafe) | 21.95 | 2,900 | opt-level=3 + unsafe | 1st |
| C | 22.70 | 2,650 | -O3 | 2nd |
| Java | 39.53 | ~4,000 | JIT | 3rd |
| Rust (safe) | 44.10 | 2,900 | opt-level=3 | 4th |
| Go | 45.61 | ~3,000 | Default | 5th |
| C++ | 51.39 | ~3,500 | -O3 | 6th |
| Node.js | 114.35 | ~6,000 | V8 JIT | 7th |
| PHP | 1,020.72 | ~4,000 | Interpreted | 8th |
| Python | 1,612.73 | ~5,000 | Interpreted | 9th |
**Task**: Find all prime numbers up to 10^10 using a segmented sieve algorithm

## Major Discovery: Unsafe Rust Performance

**Breakthrough finding**: Unsafe Rust achieves near-C performance by selectively removing safety guarantees:
- Rust (safe): 44.10s - Full memory safety
- Rust (unsafe): 21.95s - Selective safety removal
- Speedup: ~2.0x faster (~50% safety overhead)
- Performance: Matches C for compute-intensive algorithms
### Key Unsafe Optimizations

```rust
// 1. Remove bounds checking
unsafe { *segment.get_unchecked_mut(index) = false; }

// 2. Direct memory operations
unsafe { std::ptr::write_bytes(segment.as_mut_ptr(), 1, len); }

// 3. Loop unrolling (4x)
while j + p * 4 <= high {
    // Process 4 elements per iteration
}
```
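Put together, the three techniques look roughly like the following self-contained sketch on a tiny segment. Function and variable names here are illustrative, not the benchmark's actual code:

```rust
// Illustrative sketch combining unchecked access, bulk memory init, and
// 4x unrolling. Not the benchmark's real implementation.

/// Mark multiples of `p` (from max(p*p, first multiple >= low)) as
/// composite (0) in `segment`, which covers numbers [low, low + len).
fn mark_multiples(segment: &mut [u8], low: u64, p: u64) {
    let high = low + segment.len() as u64;
    let mut j = ((low + p - 1) / p).max(p) * p; // first value to clear
    // 3. 4x loop unrolling: clear four multiples per iteration.
    while j + 3 * p < high {
        unsafe {
            // 1. Bounds checks removed: j + 3*p < high guarantees
            // all four indices are inside the segment.
            *segment.get_unchecked_mut((j - low) as usize) = 0;
            *segment.get_unchecked_mut((j + p - low) as usize) = 0;
            *segment.get_unchecked_mut((j + 2 * p - low) as usize) = 0;
            *segment.get_unchecked_mut((j + 3 * p - low) as usize) = 0;
        }
        j += 4 * p;
    }
    // Tail loop: remaining multiples, with normal bounds checks.
    while j < high {
        segment[(j - low) as usize] = 0;
        j += p;
    }
}

fn survivors_up_to_30() -> Vec<usize> {
    let mut seg = vec![0u8; 30]; // covers numbers 0..30
    // 2. Direct memory initialization: mark everything "prime" in one call.
    unsafe { std::ptr::write_bytes(seg.as_mut_ptr(), 1, seg.len()) };
    for p in [2u64, 3, 5] {
        mark_multiples(&mut seg, 0, p);
    }
    (7..30).filter(|&i| seg[i] == 1).collect()
}

fn main() {
    println!("{:?}", survivors_up_to_30()); // primes in 7..30
}
```

The unrolled body gives the optimizer straight-line stores to vectorize, while the checked tail loop keeps the function correct for any segment length.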
# Prime Number Benchmark

A comprehensive performance comparison of prime number algorithms across 9 programming languages, exploring compilation optimization, memory usage patterns, and real-world performance characteristics.
## Quick Results

| Language | Time (seconds) | Memory (KB) | Optimization | Performance Rank |
|---|---|---|---|---|
| C | 14.92 | 2,652 | -O3 | 1st |
| Go | 28.51 | ~3,000 | Default | 2nd |
| Java | 29.11 | ~4,000 | JIT | 3rd |
| Rust | 38.68 | 2,976 | opt-level=3 | 4th |
| C++ | 51.39 | ~3,500 | -O3 | 5th |
| Node.js | 114.35 | ~6,000 | V8 JIT | 6th |
| PHP | 1,020.72 | ~4,000 | Interpreted | 7th |
| Python | 1,612.73 | ~5,000 | Interpreted | 8th |
**Task**: Find all prime numbers up to 10^10 using a segmented sieve algorithm

## Key Discoveries
- Optimization flags provide 4-10x performance gains
- Go's defaults explain why it beats highly optimized Rust
- Memory efficiency ≠ execution speed
- Context switches reveal compiler optimization quality
- Clarified runtime vs compile-time trade-offs (Go vs Rust)
## Quick Start

### Using Docker (Recommended)

```bash
# Clone the repository
git clone <your-repo-url>
cd prime_benchmark

# Run the complete benchmark
docker build -t prime-benchmark .
docker run --rm prime-benchmark

# Run specific optimization analysis
docker run --rm prime-benchmark ./benchmark_optimizations.sh
```
### Manual Setup

```bash
# Install dependencies (Ubuntu/Debian)
sudo apt-get update && sudo apt-get install -y \
    build-essential gcc g++ default-jdk \
    golang-go rustc python3 nodejs php-cli \
    erlang elixir mono-mcs bc util-linux

# Compile all programs
./compile.sh

# Run benchmarks
./benchmark.sh

# Analyze memory usage
./analyze_memory.sh
```
## Project Structure

```
prime_benchmark/
├── README.md                  # This file
├── BLOG_POST.md               # Detailed learning journey
├── SUMMARY.md                 # Quick reference guide
├── Dockerfile                 # Multi-language container
├── compile.sh                 # Standard compilation
├── compile_optimized.sh       # Multi-optimization builds
├── benchmark.sh               # Performance testing
├── benchmark_rust_focus.sh    # Rust variant comparison
├── analyze_memory.sh          # Memory usage analysis
├── Source Code/
│   ├── prime_finder.c             # C implementation
│   ├── prime_finder.cpp           # C++ implementation
│   ├── prime_finder.rs            # Rust (safe) implementation
│   ├── prime_finder_unsaferus.rs  # Rust (unsafe) implementation
│   ├── prime_finder.go            # Go implementation
│   ├── PrimeFinder.java           # Java implementation
│   ├── prime_finder.py            # Python implementation
│   ├── prime_finder.js            # Node.js implementation
│   ├── prime_finder.php           # PHP implementation
│   └── prime_finder.exs           # Elixir implementation
└── results.csv                # Benchmark output
```
## Algorithm Details
All implementations use the segmented sieve of Eratosthenes algorithm:
- Input: Find primes up to 10^10 (10 billion)
- Method: Memory-efficient segmented approach
- Segments: 1 million number chunks
- Output: Count of prime numbers found
### Why This Algorithm?
- Memory efficient: Processes large ranges without excessive RAM
- CPU intensive: Good test of computational performance
- Comparable: Same algorithmic complexity across languages
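As a concrete reference, the segmented approach described above can be sketched in safe Rust as follows. This is an illustrative miniature, not the project's actual source; the real benchmark runs the same structure with `limit = 10^10` and 10^6-element segments:

```rust
// Compact segmented sieve of Eratosthenes (safe Rust), illustrative only.
fn count_primes(limit: u64, segment_size: u64) -> u64 {
    if limit < 2 {
        return 0;
    }
    // Step 1: simple sieve for the "base" primes up to sqrt(limit).
    let root = (limit as f64).sqrt() as u64;
    let mut small = vec![true; (root + 1).max(2) as usize];
    small[0] = false;
    small[1] = false;
    let mut base = Vec::new();
    for i in 2..=root {
        if small[i as usize] {
            base.push(i);
            let mut j = i * i;
            while j <= root {
                small[j as usize] = false;
                j += i;
            }
        }
    }
    // Step 2: sweep [2, limit] in fixed-size segments, crossing off
    // multiples of each base prime, then count the survivors.
    let mut count = 0;
    let mut low = 2u64;
    while low <= limit {
        let high = (low + segment_size - 1).min(limit); // inclusive
        let mut seg = vec![true; (high - low + 1) as usize];
        for &p in &base {
            // First multiple of p in the segment, but never below p*p
            // (so p itself is never crossed off).
            let mut j = ((low + p - 1) / p * p).max(p * p);
            while j <= high {
                seg[(j - low) as usize] = false;
                j += p;
            }
        }
        count += seg.iter().filter(|&&is_prime| is_prime).count() as u64;
        low = high + 1;
    }
    count
}

fn main() {
    println!("primes below 100: {}", count_primes(100, 10));
}
```

Only one segment plus the base-prime table is resident at a time, which is why memory stays flat (~3 MB in the results above) even as the limit grows.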
## Optimization Analysis

### C Compiler Flags Impact

| Flag | Time (s) | Speedup | Use Case |
|---|---|---|---|
| `-O0` | 60+ | Baseline | Debug only |
| `-O2` | 17.27 | 3.5x | Production standard |
| `-O3` | 14.92 | 4x+ | Performance critical |
| `-Ofast` | 18.06 | 3.3x | Use with caution |
### Rust Safety vs Performance Analysis

| Implementation | Time (s) | Speedup | Safety Level |
|---|---|---|---|
| Rust (safe) | 44.10 | Baseline | Full memory safety |
| Rust (unsafe) | 21.95 | ~2.0x | Selective unsafe blocks |

**Key Insight**: Memory safety in tight loops costs ~50% of runtime in compute-intensive algorithms.
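One way to probe this overhead on your own machine is an A/B loop over the same buffer, once with normal indexing and once with `get_unchecked_mut`. This is a rough sketch, not the project's benchmark harness; compile with `--release` (opt-level=3), or the bounds checks won't be the variable you're measuring:

```rust
use std::time::Instant;

// Strided clear with a bounds check on every write.
fn clear_checked(buf: &mut [u8], step: usize) {
    let mut i = 0;
    while i < buf.len() {
        buf[i] = 0; // would panic if i were ever out of range
        i += step;
    }
}

// The same loop with the check removed.
fn clear_unchecked(buf: &mut [u8], step: usize) {
    let mut i = 0;
    while i < buf.len() {
        // SAFETY: the loop condition guarantees i < buf.len().
        unsafe { *buf.get_unchecked_mut(i) = 0 };
        i += step;
    }
}

fn main() {
    let mut a = vec![1u8; 1 << 24];
    let mut b = a.clone();

    let t = Instant::now();
    for step in 1..64 {
        clear_checked(&mut a, step);
    }
    let checked = t.elapsed();

    let t = Instant::now();
    for step in 1..64 {
        clear_unchecked(&mut b, step);
    }
    let unchecked = t.elapsed();

    assert_eq!(a, b); // both variants must do identical work
    println!("checked: {:?}  unchecked: {:?}", checked, unchecked);
}
```

The measured gap varies with CPU, allocator, and how aggressively LLVM elides the checks, so treat any single run as indicative rather than definitive.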
## Key Learnings

### Performance Insights
- Go's excellent defaults explain competitive performance
- Rust's safety guarantees have measurable overhead
- JIT compilation (Java, Node.js) is surprisingly effective
- Memory patterns don't always correlate with speed
### Development Insights
- Always benchmark with optimizations enabled
- Profile memory usage alongside execution time
- Language ergonomics matter for maintainability
- Compilation model affects real-world performance
## Detailed Analysis
For a complete deep-dive into the methodology, discoveries, and technical details, see:
- BLOG_POST.md - Full learning journey tutorial
- SUMMARY.md - Quick reference tables
## Blog Post
This project is documented in detail on my blog: ["Benchmarking 9 Programming Languages: A Journey Through Performance Optimization"](//Add url later here)
## Contributing
Found an optimization or want to add another language? Contributions welcome!
- Fork the repository
- Add your implementation following existing patterns
- Update compilation and benchmark scripts
- Submit a pull request with performance results
## License
MIT License - Feel free to use this for your own learning and benchmarking needs.
## 🔗 Connect
"The best way to learn about performance is to measure it."