add check if libmagic fails by aadland6 · Pull Request #4273 · Unstructured-IO/unstructured

@aadland6

badGarnet

# -- on some environments libmagic can return a generic/unhelpful MIME-type
# -- (like "octet-stream") for files that the `filetype` package can identify.
# -- when that happens we retry using `filetype` for `FileType.UNK` results.
if LIBMAGIC_AVAILABLE:

we only need to call this when file_type == FileType.UNK and LIBMAGIC_AVAILABLE, right?

Yes, only in cases where libmagic is available but failed to identify the right file type. The retry runs only when:

  1. mime_type is not None (otherwise the function already returned None), and

  2. FileType.from_mime_type(mime_type) returned FileType.UNK, and

  3. LIBMAGIC_AVAILABLE is True.
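
The three conditions above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `FileType` enum here is a simplified stand-in, and `detect_with_fallback`, `libmagic_available`, and `fallback_guess` are hypothetical names used only to show the control flow. The behavior when the retry also fails (returning `FileType.UNK`) is an assumption.

```python
from enum import Enum
from typing import Callable, Optional


class FileType(Enum):
    """Simplified stand-in for the library's FileType enum (assumption)."""

    PDF = "application/pdf"
    UNK = "unknown"

    @classmethod
    def from_mime_type(cls, mime_type: str) -> "FileType":
        # Map a MIME type to a known member, or UNK when nothing matches.
        for member in cls:
            if member is not cls.UNK and member.value == mime_type:
                return member
        return cls.UNK


def detect_with_fallback(
    mime_type: Optional[str],
    libmagic_available: bool,
    fallback_guess: Callable[[], Optional[str]],
) -> Optional[FileType]:
    """Hypothetical helper: retry with a second detector only when needed."""
    # Condition 1: without a MIME type there is nothing to map.
    if mime_type is None:
        return None

    file_type = FileType.from_mime_type(mime_type)

    # Conditions 2 and 3: retry only if libmagic produced the MIME type
    # and that MIME type mapped to UNK.
    if file_type is FileType.UNK and libmagic_available:
        retry_mime = fallback_guess()
        if retry_mime is not None:
            file_type = FileType.from_mime_type(retry_mime)

    return file_type
```

Passing the second detector as a callable keeps the retry lazy, so it is never invoked when libmagic already produced a usable result.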

qued approved these changes Mar 3, 2026