Feature/pdfinfo win by Falven · Pull Request #367 · Unstructured-IO/unstructured-inference

layout.py

```python
def process_data_with_model(
    data: BinaryIO,
    model_name: Optional[str],
    suffix: Optional[str] = ".pdf",
    **kwargs,
) -> DocumentLayout:
    """Processes pdf file in the form of a file handler (supporting a read method) into a
    DocumentLayout by using a model identified by model_name."""
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp_file:
        tmp_file.write(data.read())
        tmp_file.flush()  # Make sure the file is written out
        # tmp_file is still open here, so on Windows other processes cannot open tmp_file.name
        layout = process_file_with_model(
            tmp_file.name,
            model_name,
            **kwargs,
        )

    return layout
```

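This function creates a NamedTemporaryFile and passes its name on while the file is still open. On Windows, a NamedTemporaryFile created with the default delete=True is opened in delete-on-close mode, and while that handle is open the file generally cannot be opened again by name, by this process or any other, unless the second open also shares delete access. In practice this acts like an exclusive lock: no other process can open the file while the process that created it is still holding it.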
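The behavior can be checked with a small script. This is a minimal sketch, not code from the PR: it holds a NamedTemporaryFile open the same way process_data_with_model does and asks a child Python process (standing in for an external tool such as poppler) to read the file by name. The helper names `child_process_can_read` and `reopen_while_held` are hypothetical.

```python
import subprocess
import sys
import tempfile


def child_process_can_read(path: str) -> bool:
    """Return True if a separate Python process can open and read ``path``."""
    # The child plays the role of an external program (e.g. poppler):
    # it opens the file by name from a different process.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; open(sys.argv[1], 'rb').read()", path],
        capture_output=True,
    )
    return result.returncode == 0


def reopen_while_held() -> bool:
    """Try to read a still-open NamedTemporaryFile from another process."""
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp_file:
        tmp_file.write(b"%PDF-1.4 placeholder bytes")
        tmp_file.flush()
        # The handle is still open here, as in process_data_with_model.
        return child_process_can_read(tmp_file.name)


if __name__ == "__main__":
    # On POSIX the second value is True; on Windows the child's open is
    # expected to fail (per the tempfile docs), so it would be False.
    print(sys.platform, reopen_while_held())
```

On Linux and macOS the child reads the file without trouble. On Windows the child's open is expected to fail with a permission error, which is the same kind of failure an external poppler process runs into.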

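Further down the call chain, inside process_file_with_model, pdf2image's convert_from_path is invoked. It shells out to the system poppler installation (starting with the pdfinfo utility) to gather information about the PDF. Because poppler runs as a separate process, it cannot open the still-open temporary file, and the call fails with an error on Windows.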
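One common workaround, sketched here as an assumption rather than as the exact change in this PR, is to create the file with delete=False, close it before handing its name to other code, and delete it manually afterwards. The context manager name `closed_temp_copy` is hypothetical.

```python
import contextlib
import os
import tempfile
from typing import BinaryIO, Iterator, Optional


@contextlib.contextmanager
def closed_temp_copy(data: BinaryIO, suffix: Optional[str] = ".pdf") -> Iterator[str]:
    """Copy ``data`` to a temporary file, close it, and yield its path.

    Because the handle is closed before the path is handed out, external
    processes (such as poppler's pdfinfo) can open the file on Windows too.
    """
    # delete=False: we take over deletion so the file can be closed early.
    tmp_file = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp_file:
            tmp_file.write(data.read())
        # The file is now closed but still exists on disk.
        yield tmp_file.name
    finally:
        # Remove the file even if the caller raised; ignore if already gone.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_file.name)
```

Inside process_data_with_model, the existing `with tempfile.NamedTemporaryFile(...)` block would become `with closed_temp_copy(data, suffix) as path:` followed by `process_file_with_model(path, model_name, **kwargs)`. On Python 3.12 and later, passing `delete_on_close=False` to tempfile.NamedTemporaryFile is an alternative: the file can be closed inside the `with` block and is still removed when the block exits.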