New Top-Down Pose Estimation Model - RTMPose by n-poulsen · Pull Request #2795 · DeepLabCut/DeepLabCut

Expand Up @@ -420,16 +420,20 @@ train(
### Running Video Analysis outside a DeepLabCut Project
DeepLabCut provides high-level APIs (via the GUI or the python package) to analyze your data. The usage of this API assumes the existance of a DLC project (with `config.yaml` file, etc.). DeepLabCut provides high-level APIs (via the GUI or the python package) to analyze your data. The usage of this API assumes the existance of a DLC project (with `config.yaml` file, etc.).
Sometimes it might be more convenient to just run a model on your data via a low-level API. We also use this API under the hood, in particular for the Model Zoo. Check out the example below: Sometimes it might be more convenient to just run a model on your data via a low-level API. We also use this API under the hood, in particular for the Model Zoo. Check out the example below:
```python from pathlib import Path
from deeplabcut.pose_estimation_pytorch import Task from deeplabcut.pose_estimation_pytorch.apis.analyze_videos import video_inference from deeplabcut.pose_estimation_pytorch.config import read_config_as_dict from deeplabcut.pose_estimation_pytorch.task import Task from deeplabcut.pose_estimation_pytorch.apis.utils import get_inference_runners
train_dir = Path("/Users/Jaylen/my-dlc-models/train") Expand All @@ -447,30 +451,76 @@ detector_batch_size = 8
# read model configuration model_cfg = read_config_as_dict(pytorch_config_path) bodyparts = model_cfg["metadata"]["bodyparts"] unique_bodyparts = model_cfg["metadata"]["unique_bodyparts"] with_identity = model_cfg["metadata"].get("with_identity", False)
pose_task = Task(model_cfg["method"]) pose_runner, detector_runner = get_inference_runners( model_config=model_cfg, snapshot_path=snapshot_path, max_individuals=max_num_animals, num_bodyparts=len(bodyparts), num_unique_bodyparts=len(unique_bodyparts), batch_size=batch_size, with_identity=with_identity, transform=None, detector_batch_size=detector_batch_size, detector_path=detector_snapshot_path, detector_transform=None, )
predictions = video_inference( video=video_path, task=pose_task, pose_runner=pose_runner, detector_runner=detector_runner, with_identity=False, ) ```

### Running Top-Down Video Analysis with Existing Bounding Boxes
When `deeplabcut.pose_estimation_pytorch.apis.analyze_videos.video_inference` is called with a top-down model, it is assumed that a detector snapshot is given as well to obtain bounding boxes with which to run pose estimation. It's possible that you've already obtained bounding boxes for your video (with another object detector or through some other means), and you want to re-use those bounding boxes instead of running an object detector again.
You can easily do so by writing a bit of custom code, as shown in the example below:
```python from pathlib import Path
import numpy as np from deeplabcut.pose_estimation_pytorch import get_inference_runners from deeplabcut.pose_estimation_pytorch.apis import VideoIterator from deeplabcut.pose_estimation_pytorch.config import read_config_as_dict from tqdm import tqdm
# create an iterator for your video video = VideoIterator("/Users/Jayson/my-cool-video.mp4")
# dummy bboxes - you can load yours from a file or in another way # the bboxes should be in `xywh` format, i.e. (x_top_left, y_top_left, width, height) bounding_boxes = [ dict( # frame 0 bounding boxes bboxes=np.array([[12, 37, 120, 78]]), ), dict( # frame 1 bounding boxes bboxes=np.array([[17, 45, 128, 73], [532, 34, 117, 87]]), ), # ... dict( # frame N bboxes -> must be equal to the number of frames in the video! bboxes=np.array([[17, 45, 128, 73], [532, 34, 117, 87]]), ), ] video.set_context(bounding_boxes) max_individuals = np.max([len(context["bboxes"]) for context in bounding_boxes])
# run inference! model_cfg = read_config_as_dict("/Users/Jayson/pytorch_config.yaml") pose_runner, _ = get_inference_runners( model_config=model_cfg, snapshot_path=Path("/Users/Jayson/model-snapshot.pt"), max_individuals=max_individuals, batch_size=32, )
# your predictions will be a list, containing the predictions made for each frame # as a dict (with keys for "bodyparts" but also "bboxes")! predictions = pose_runner.inference(images=tqdm(video)) ```