Affine transform not working
What did you do?
I replaced OpenCV with Pillow. In the process, the affine transformation math stopped working for one specific case. Yes, I know that PIL expects the inverse transform (output to input) while OpenCV expects the forward one.
What did you expect to happen?
The same affine transform.
What actually happened?
A bad affine transform: for one image, the Pillow output does not match the OpenCV output.
What are your OS, Python and Pillow versions?
- OS: macOS
- Python: 3.13.1
- Pillow: 11.1.0
--------------------------------------------------------------------
Pillow 11.1.0
Python 3.13.1 (v3.13.1:06714517797, Dec 3 2024, 14:00:22) [Clang 15.0.0 (clang-1500.3.9.4)]
--------------------------------------------------------------------
Python executable is /Library/Frameworks/Python.framework/Versions/3.13/bin/python3
System Python files loaded from /Library/Frameworks/Python.framework/Versions/3.13
--------------------------------------------------------------------
Python Pillow modules loaded from /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/PIL
Binary Pillow modules loaded from /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/PIL
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 11.1.0
--- TKINTER support ok, loaded 8.6
--- FREETYPE2 support ok, loaded 2.13.2
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.5.0
--- JPEG support ok, compiled for libjpeg-turbo 3.1.0
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.3
--- ZLIB (PNG/ZIP) support ok, loaded 1.3.1.zlib-ng, compiled for zlib-ng 2.2.2
--- LIBTIFF support ok, loaded 4.6.0
*** RAQM (Bidirectional Text) support not installed
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------
I'm using the minimum rectangle approach to determine the bounding box of OCR results. I was first using OpenCV, but I decided to move to Pillow to reduce size. In the process, I found one image for which Pillow and OpenCV produce different results. The OpenCV output is correct, while the Pillow output is not. I've included the example image and the bounding box that was found.
import os

import cv2
import numpy as np
import requests
from PIL import Image


def invert_affine(a, b, c, d, e, f):
    """
    Inverts the 2x3 affine transform:

        [ a  b  c ]
        [ d  e  f ]
        [ 0  0  1 ]

    Returns the 6-tuple (a_inv, b_inv, c_inv, d_inv, e_inv, f_inv)
    for the inverse transform, provided the determinant is not zero.
    """
    det = a * e - b * d
    if abs(det) < 1e-14:
        raise ValueError("Singular transform cannot be inverted.")
    a_inv = e / det
    b_inv = -b / det
    c_inv = (b * f - c * e) / det
    d_inv = -d / det
    e_inv = a / det
    f_inv = (c * d - a * f) / det
    return (a_inv, b_inv, c_inv, d_inv, e_inv, f_inv)


# Download file from CDN
cdn_url = "https://dev.tylernorlund.com/assets/2608fbeb-dd25-4ab8-8034-5795282b6cd6.png"
local_file = "2608fbeb-dd25-4ab8-8034-5795282b6cd6.png"
r = requests.get(cdn_url)
with open(local_file, "wb") as f:
    f.write(r.content)

bbox = np.array([
    [136.86524540105756, 612.9459893206688],
    [869.5855437615264, 279.14297123098027],
    [2067.888499216709, 2909.499353483462],
    [1335.1682008562402, 3243.3023715731506]
], dtype="float32")


# Optional: Order the points in a consistent order
# (top-left, top-right, bottom-right, bottom-left)
def order_points(pts):
    # initialize an array of coordinates that will be ordered
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]     # top-left: smallest sum
    rect[2] = pts[np.argmax(s)]     # bottom-right: largest sum
    diff = np.diff(pts, axis=1)     # y - x for each point
    rect[1] = pts[np.argmin(diff)]  # top-right: smallest difference
    rect[3] = pts[np.argmax(diff)]  # bottom-left: largest difference
    return rect


# Order the bounding box points (if your points are already in the right
# order, you can skip this)
rect = order_points(bbox)

# Compute the width and height for the output (destination) rectangle.
# For an affine transform we can define these as the distances between:
#   - top-left and top-right (for width) and
#   - top-left and bottom-left (for height)
width = int(np.linalg.norm(rect[0] - rect[1]))
height = int(np.linalg.norm(rect[0] - rect[3]))

# Choose three source points for the affine transform.
# Here we take: top-left, top-right, and bottom-left.
src_tri = np.float32([rect[0], rect[1], rect[3]])

# Define the destination points: we want the region to become an upright rectangle.
dst_tri = np.float32([
    [0, 0],           # top-left maps to (0, 0)
    [width - 1, 0],   # top-right maps to (width - 1, 0)
    [0, height - 1]   # bottom-left maps to (0, height - 1)
])

# Get the affine transformation matrix (2x3) that maps src_tri to dst_tri
M = cv2.getAffineTransform(src_tri, dst_tri)
# print(width, height)

# Load the image
image = cv2.imread(local_file)

# Apply the affine transformation.
# Note: warpAffine uses the size (width, height) of the destination image.
warped = cv2.warpAffine(image, M, (width, height))

# (Optional) Save the result
cv2.imwrite("warped_cv.png", warped)

# Open the image using PIL
image = Image.open(local_file)

# Convert the OpenCV M matrix (forward, source -> destination) to the
# coefficients PIL expects (inverse, destination -> source)
a_f = M[0, 0]
b_f = M[0, 1]
c_f = M[0, 2]
d_f = M[1, 0]
e_f = M[1, 1]
f_f = M[1, 2]
a_i, b_i, c_i, d_i, e_i, f_i = invert_affine(a_f, b_f, c_f, d_f, e_f, f_f)

affine_img = image.transform(
    (805, 2890),
    Image.AFFINE,
    (a_i, b_i, c_i, d_i, e_i, f_i),
    fill=1,
    fillcolor=(255, 255, 0),
    # resample=PIL_Image.NEAREST,
)
affine_img.save("warped_pil.png")

# delete the file
os.remove(local_file)
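One way to rule out the inversion step itself is to check it numerically without OpenCV or the image. The following is a minimal sketch, not part of the script above: `solve_affine` and `apply_affine` are hypothetical helper names introduced here, the three source corners are copied from `bbox`, and the destination corners assume the 805 x 2890 output size used in the `image.transform` call. It computes the forward coefficients (the direction `cv2.getAffineTransform` returns), inverts them with `invert_affine`, and compares the result with coefficients solved directly in the destination-to-source direction that `Image.transform` expects.

```python
import numpy as np


def invert_affine(a, b, c, d, e, f):
    """Invert the map x' = a*x + b*y + c, y' = d*x + e*y + f."""
    det = a * e - b * d
    if abs(det) < 1e-14:
        raise ValueError("Singular transform cannot be inverted.")
    return (
        e / det, -b / det, (b * f - c * e) / det,
        -d / det, a / det, (c * d - a * f) / det,
    )


def solve_affine(src, dst):
    """Hypothetical helper: coefficients (a..f) mapping 3 src points onto 3 dst points."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])  # each row is [x, y, 1]
    coeffs = np.linalg.solve(A, dst)       # column 0 is (a, b, c), column 1 is (d, e, f)
    return tuple(float(v) for v in coeffs.T.ravel())


def apply_affine(coeffs, point):
    """Hypothetical helper: apply (a, b, c, d, e, f) to a single (x, y) point."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Top-left, top-right and bottom-left corners, copied from bbox above.
src = [
    (136.86524540105756, 612.9459893206688),
    (869.5855437615264, 279.14297123098027),
    (1335.1682008562402, 3243.3023715731506),
]
# Assumed destination corners for an 805 x 2890 output.
dst = [(0, 0), (804, 0), (0, 2889)]

forward = solve_affine(src, dst)   # source -> destination
inverse = invert_affine(*forward)  # destination -> source, via inversion
direct = solve_affine(dst, src)    # destination -> source, solved directly

print("max coefficient difference:",
      max(abs(i - j) for i, j in zip(inverse, direct)))
for s, d in zip(src, dst):
    print(d, "->", apply_affine(inverse, d), "expected", s)
```

If the two coefficient sets agree and every destination corner maps back onto its source corner, the algebra in `invert_affine` is probably not the cause, and the difference is more likely to come from how the two libraries load or sample the image.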