Do not convert BC1 LUT to UINT32 by radarhere · Pull Request #8837 · python-pillow/Pillow

#define LOAD32(p) (p)[0] | ((p)[1] << 8) | ((p)[2] << 16) | ((p)[3] << 24)

static void
bc1_color_load(bc1_color *dst, const UINT8 *src) {
    dst->c0 = LOAD16(src);
    dst->c1 = LOAD16(src + 2);
    dst->lut = LOAD32(src + 4);
}

for (n = 0; n < 16; n++) {
    cw = 3 & (col.lut >> (2 * n));
    dst[n] = p[cw];
}

With a little maths, and by replacing the 16-iteration loop with two nested loops of 4 iterations each, this code can be changed to avoid the UINT32 altogether. Splitting the loop does not misrepresent the image data: as https://learn.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-programming-guide-resources-block-compression#bc1 shows, the LUT actually represents a 4x4 block.
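
As a rough sketch of the maths (not the actual diff; the helper names and the use of `uint8_t` rather than Pillow's `UINT8` are assumptions made for illustration): pixel `n = 4 * y + x` occupies bits `2n` and `2n + 1` of the little-endian 32-bit value, which start at bit `8 * y + 2 * x`. That is bit `2 * x` of byte `y`, so each LUT byte can be read directly as one row of the block.

```c
#include <stdint.h>

/* Hypothetical helper: the 16-iteration approach, with the four LUT bytes
 * packed into one 32-bit value. The casts keep the shifts unsigned. */
static void
indices_from_u32(const uint8_t *src, uint8_t out[16]) {
    uint32_t lut = (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
                   ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
    int n;
    for (n = 0; n < 16; n++) {
        out[n] = 3 & (lut >> (2 * n));
    }
}

/* Hypothetical helper: the byte-wise approach. Row y of the 4x4 block is
 * stored in src[y], and column x uses bits 2x and 2x + 1 of that byte. */
static void
indices_from_bytes(const uint8_t *src, uint8_t out[16]) {
    int x, y;
    for (y = 0; y < 4; y++) {
        for (x = 0; x < 4; x++) {
            out[y * 4 + x] = 3 & (src[y] >> (2 * x));
        }
    }
}

/* Returns 1 when both helpers produce the same 16 indices, 0 otherwise. */
static int
indices_agree(const uint8_t *src) {
    uint8_t a[16], b[16];
    int n;
    indices_from_u32(src, a);
    indices_from_bytes(src, b);
    for (n = 0; n < 16; n++) {
        if (a[n] != b[n]) {
            return 0;
        }
    }
    return 1;
}
```

Calling `indices_agree` on any four LUT bytes should return 1, since the two loops read the same bit pairs in the same order.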