AI-powered automation framework for Web and Android with natural language-driven UI operations - Java version
Project Overview
Midscene Java is an AI-powered framework for UI automation on Web and Android platforms. It is the Java implementation of Midscene Python and inherits its core philosophy: making automation as simple as speaking.
Core Features
- Natural Language Operations - Describe the intended operation in everyday language, and the AI interprets and executes it
- Intelligent Element Locating - Combines multiple locating strategies, selects the most suitable one automatically, and adapts to page changes
- Structured Data Extraction - Use natural language to extract complex structured data
- Intelligent Assertion Verification - Describe the expected condition in natural language, and the AI judges whether it holds
- Multi-Platform Support - Unified interface supports Web and Android platforms
- Visual Debugging - Detailed execution screenshots and decision process recording
- Code Optimization and Refactoring - Systematically refactored for more modular and maintainable code
Project Structure

    midscene-java/
    ├── packages/
    │   ├── core/           # Core module, providing Agent and AI engine
    │   ├── web/            # Web automation module
    │   │   ├── playwright/ # Playwright implementation
    │   │   └── selenium/   # Selenium implementation
    │   ├── android/        # Android automation module
    │   ├── cli/            # Command line tool
    │   ├── examples/       # Example code
    │   ├── playground/     # Development testing environment
    │   └── tests/          # Test cases
    ├── apps/               # Application examples
    ├── docs/               # Project documentation and optimization plans
    └── wiki/               # Project wiki documentation
Quick Start
Prerequisites
- Java 17+
- Maven 3.6+ or Gradle 7.0+
- Browser (Chrome/Firefox/Edge, for Web automation)
- AI model API Key (Choose one from OpenAI, Claude, Qwen, or Gemini)
Installation
Add Midscene Java dependencies to your pom.xml file:
    <dependencies>
        <!-- Core module -->
        <dependency>
            <groupId>com.midscene</groupId>
            <artifactId>midscene-core</artifactId>
            <version>0.1.1</version>
        </dependency>

        <!-- Web automation modules (choose as needed) -->
        <dependency>
            <groupId>com.midscene</groupId>
            <artifactId>midscene-web-playwright</artifactId>
            <version>0.1.1</version>
        </dependency>
        <dependency>
            <groupId>com.midscene</groupId>
            <artifactId>midscene-web-selenium</artifactId>
            <version>0.1.1</version>
        </dependency>

        <!-- Android automation module (choose as needed) -->
        <dependency>
            <groupId>com.midscene</groupId>
            <artifactId>midscene-android</artifactId>
            <version>0.1.1</version>
        </dependency>
    </dependencies>
Configure AI Model
Create an application.properties or application.yml file to configure the AI model:
    # application.properties
    midscene.ai.provider=openai
    midscene.ai.model=gpt-4-vision-preview
    midscene.ai.api-key=your_openai_api_key_here
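Because a missing or blank key usually surfaces only when the first AI call fails, it can help to check the configuration up front. The following is a minimal sketch, not part of Midscene Java: the class `AiConfigCheck` and its methods are hypothetical, and it only assumes the three key names shown in the example above. How the framework itself loads the file is not covered here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: checks that the three keys from the example above are set. */
class AiConfigCheck {

    // Key names copied from the application.properties example in this README.
    static final List<String> REQUIRED_KEYS = List.of(
            "midscene.ai.provider",
            "midscene.ai.model",
            "midscene.ai.api-key");

    /** Returns the required keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Parses properties text; a real project would read the file from disk or the classpath. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse properties", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The API key is deliberately left blank to show what the check reports.
        Properties props = parse(String.join("\n",
                "midscene.ai.provider=openai",
                "midscene.ai.model=gpt-4-vision-preview",
                "midscene.ai.api-key="));
        System.out.println("Missing keys: " + missingKeys(props));
    }
}
```

Running `main` reports `midscene.ai.api-key` as missing, since its value is blank. Calling `missingKeys` before creating an Agent gives a clearer failure than an authentication error in the middle of a test.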
Example Code
Web Automation Example
    package com.example;

    import com.midscene.core.Agent;
    import com.midscene.web.playwright.PlaywrightPage;
    import com.midscene.web.playwright.PlaywrightUIContextProvider;
    import com.microsoft.playwright.Playwright;
    import com.microsoft.playwright.Browser;
    import com.microsoft.playwright.Page;

    public class SearchExample {
        public static void main(String[] args) {
            try (Playwright playwright = Playwright.create()) {
                // Create browser instance
                Browser browser = playwright.chromium().launch();
                Page page = browser.newPage();

                // Create PlaywrightPage wrapper
                PlaywrightPage playwrightPage = new PlaywrightPage(page);

                // Create Agent
                Agent agent = new Agent(new PlaywrightUIContextProvider(playwrightPage));

                // Navigate to website
                page.navigate("https://www.baidu.com");

                // Use natural language for search
                agent.aiAction("Type 'Java tutorial' in the search box");
                agent.aiAction("Click the search button");

                // Verify search results
                agent.aiAssert("The page displays search results for Java tutorials");

                System.out.println("Search operation completed!");

                // Close browser
                browser.close();
            }
        }
    }
Data Extraction Example
    package com.example;

    import com.midscene.core.Agent;
    import com.midscene.web.playwright.PlaywrightPage;
    import com.midscene.web.playwright.PlaywrightUIContextProvider;
    import com.microsoft.playwright.Playwright;
    import com.microsoft.playwright.Browser;
    import com.microsoft.playwright.Page;

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ExtractExample {
        public static void main(String[] args) {
            try (Playwright playwright = Playwright.create()) {
                Browser browser = playwright.chromium().launch();
                Page page = browser.newPage();
                PlaywrightPage playwrightPage = new PlaywrightPage(page);
                Agent agent = new Agent(new PlaywrightUIContextProvider(playwrightPage));

                // Visit news website
                page.navigate("https://news.example.com");

                // Extract structured data
                Map<String, Object> schema = new HashMap<>();
                schema.put("articles", List.of(
                    Map.of(
                        "title", "News title",
                        "time", "Publish time",
                        "summary", "News summary"
                    )
                ));

                Map<String, Object> newsData = agent.aiExtract(schema);

                // Output results
                List<Map<String, String>> articles = (List<Map<String, String>>) newsData.get("articles");
                for (Map<String, String> article : articles) {
                    System.out.println("Title: " + article.get("title"));
                    System.out.println("Time: " + article.get("time"));
                    System.out.println("Summary: " + article.get("summary") + "\n");
                }

                browser.close();
            }
        }
    }
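The example above casts the extracted value straight to `List<Map<String, String>>`, which is an unchecked cast and fails late if the model returns an unexpected shape. One possible alternative is to convert the raw result into typed records defensively. The sketch below is an illustration only: `ArticleMapper` and `Article` are hypothetical names, and the input map in `main` is a hand-written stand-in for whatever `agent.aiExtract(schema)` actually returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns the raw "articles" value into typed records. */
class ArticleMapper {

    /** Typed view of one extracted article; field names mirror the schema keys above. */
    record Article(String title, String time, String summary) {}

    /** Converts the raw value into records, skipping entries that are not maps. */
    static List<Article> toArticles(Object rawArticles) {
        List<Article> result = new ArrayList<>();
        if (!(rawArticles instanceof List<?> items)) {
            return result; // missing key or unexpected shape: return an empty list
        }
        for (Object item : items) {
            if (item instanceof Map<?, ?> fields) {
                result.add(new Article(
                        text(fields.get("title")),
                        text(fields.get("time")),
                        text(fields.get("summary"))));
            }
        }
        return result;
    }

    /** Null-safe string conversion, so a missing field becomes an empty string. */
    private static String text(Object value) {
        return value == null ? "" : value.toString();
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the map returned by agent.aiExtract(schema).
        Map<String, Object> newsData = Map.of("articles", List.of(
                Map.of("title", "Sample headline", "time", "2024-01-01", "summary", "Sample summary")));

        for (Article article : toArticles(newsData.get("articles"))) {
            System.out.println(article.title() + " (" + article.time() + "): " + article.summary());
        }
    }
}
```

The design choice here is to tolerate bad input (empty list, empty strings) rather than throw; a test suite that should fail loudly on malformed extraction results could throw an exception in those branches instead.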
Android Automation Example
    package com.example;

    import com.midscene.core.Agent;
    import com.midscene.android.AndroidDevice;
    import com.midscene.android.AndroidUIContextProvider;

    import java.util.concurrent.CompletableFuture;

    public class AndroidExample {
        public static void main(String[] args) {
            // Connect to Android device
            AndroidDevice device = new AndroidDevice();
            CompletableFuture<Void> connectFuture = device.connect();
            connectFuture.join(); // Wait for connection to complete

            try {
                // Create Agent
                Agent agent = new Agent(new AndroidUIContextProvider(device));

                // Launch application
                agent.aiAction("Launch the settings app");

                // Perform operations
                agent.aiAction("Tap on the Wi-Fi option");
                agent.aiAssert("The Wi-Fi settings page is open");

                System.out.println("Android automation operation completed!");
            } finally {
                device.disconnect();
            }
        }
    }
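In the Android example, `connectFuture.join()` waits indefinitely if the device never responds. A bounded wait can be sketched with the standard `CompletableFuture.orTimeout` method (Java 9+). The class `ConnectWithTimeout` and its method are hypothetical, and the never-completing future in `main` merely stands in for `device.connect()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: waits for a connection future with an upper time limit. */
class ConnectWithTimeout {

    /**
     * Blocks until the future completes or the timeout elapses.
     * Note that orTimeout completes the passed-in future exceptionally on timeout.
     */
    static void awaitConnection(CompletableFuture<Void> connectFuture, long timeout, TimeUnit unit) {
        try {
            connectFuture.orTimeout(timeout, unit).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                throw new IllegalStateException(
                        "Device did not connect within " + timeout + " " + unit, e);
            }
            throw e; // any other connection failure is passed through unchanged
        }
    }

    public static void main(String[] args) {
        // Stand-in for device.connect(): a future that never completes.
        CompletableFuture<Void> neverConnects = new CompletableFuture<>();
        try {
            awaitConnection(neverConnects, 200, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the example above, `connectFuture.join()` could then be replaced by a call such as `awaitConnection(connectFuture, 30, TimeUnit.SECONDS)`; the 30-second value is an arbitrary illustration, not a recommended setting.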
Documentation
- Project Overview - Chinese only
- Installation and Configuration - Chinese only
- Quick Start - Chinese only
- API Reference
- Example Code
- Frequently Asked Questions - Chinese only
- Core Concepts - Chinese only
- Platform Integration
Comparison with Traditional Tools
| Feature | Traditional Automation Tools | Midscene Java |
|---|---|---|
| Learning Curve | Steep, requires learning complex APIs | Gentle, natural language driven |
| Code Readability | Obscure and hard to understand | Intuitive and easy to understand |
| Maintenance Cost | High, requires extensive modifications for page changes | Low, AI automatically adapts to changes |
| Element Locating | Manual selector writing | AI intelligent locating |
| Error Handling | Manual handling of various exceptions | AI automatic retry and recovery |
| Cross-Platform | Requires learning different tools | Unified interface |
| Code Quality | Varies by project | Systematically refactored, modular design |
Contribution Guidelines
We welcome all forms of contribution, including bug reports, feature requests, documentation improvements, and code.
How to Contribute
- Fork this repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Create a Pull Request
Development Environment Setup
    # Clone the repository
    git clone https://github.com/Master-Frank/midscene-java.git
    cd midscene-java

    # Build the project
    mvn clean install

    # Run tests
    mvn test
Code Standards
- Follow commit message conventions from Conventional Commits
- Add corresponding test cases for new features
- Add JavaDoc documentation for public APIs
- Keep code modular, avoid overly long methods
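To make the commit message convention concrete, here is a rough sketch of a header check in the Conventional Commits shape `type(scope)!: description`. The class `CommitMessageCheck` is hypothetical and not part of this repository's tooling. The list of allowed types is an assumption based on the commonly used Angular-style set; the specification itself only requires `feat` and `fix`.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks the first line of a commit message. */
class CommitMessageCheck {

    // type, optional "(scope)", optional "!" for breaking changes, then ": description".
    // The type list is an assumed, commonly used set; adjust it to the project's rules.
    private static final Pattern HEADER = Pattern.compile(
            "^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
            + "(\\([\\w.-]+\\))?!?: \\S.*$");

    /** Returns true if the first line of a commit message matches the pattern above. */
    static boolean isValidHeader(String firstLine) {
        return firstLine != null && HEADER.matcher(firstLine).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "feat(core): add retry helper",
            "fix!: handle null page",
            "Add some AmazingFeature"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValidHeader(sample));
        }
    }
}
```

Under this check, a message such as 'Add some AmazingFeature' is rejected because it has no type prefix, whereas `feat: add some amazing feature` would pass.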
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
Thanks to the Midscene project (https://github.com/web-infra-dev/midscene) for inspiration and technical references.
Contact Us
- GitHub: Master-Frank/midscene-java
- Issue Reporting: GitHub Issues
- Discussions: GitHub Discussions
If this project helps you, please give us a star!