Add SINGLEFILE_ARGS to control single-file arguments by ntevenhere · Pull Request #1021 · ArchiveBox/ArchiveBox

@ntevenhere

Making a pull request of @renaisun's work for them, since it looks good to me but has been cold for two months.

Summary

#981

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

@renaisun

@renaisun

@notevenaperson Hi, I found this fix is not well rounded. SingleFile need a browser to run which could be set via --browser-executable-path. But for archivebox, the browser path is set and passed to singlefile-cli via CHROME_BINARY, which seems ambiguous and redundant. See this
Should it be removed?

@ntevenhere

@notevenaperson Hi, I found this fix is not well rounded. SingleFile need a browser to run which could be set via --browser-executable-path. But for archivebox, the browser path is set and passed to singlefile-cli via CHROME_BINARY, which seems ambiguous redundant. See this Should it be removed?

IMO that's not an issue. For example COOKIES_FILE affects wget, but user can still redundantly set WGET_ARGS=["--load-cookies=cookies.txt"]. If WGET_ARGS is okay SINGLEFILE_ARGS also is. It's up to the user to sort these redundancies in his or her own config.

@renaisun

@notevenaperson Hi, I found this fix is not well rounded. SingleFile need a browser to run which could be set via --browser-executable-path. But for archivebox, the browser path is set and passed to singlefile-cli via CHROME_BINARY, which seems ambiguous redundant. See this Should it be removed?

IMO that's not an issue. For example COOKIES_FILE affects wget, but user can still redundantly set WGET_ARGS=["--load-cookies=cookies.txt"]. If WGET_ARGS is okay SINGLEFILE_ARGS also is. It's up to the user to sort these redundancies in his or her own config.

Sorry for the confusion. I think there shouldn't be --browser-executable-path outside the SINGLEFILE_ARGS.

Otherwise, the --browser-executable-path in SINGLEFILE_ARGS will be overwritten once the user set CHROME_BINARY. I'm not sure if there is any guarantee that the arguments are read sequentially.

@ntevenhere

Oh I understand now. If the user sets SINGLEFILE_ARGS=["--browser-executable-path=foobar"], single-file would be given the --browser-executable-path two times. I tested it and it does cause an error:

single-file --browser-executable-path=foobar --browser-executable-path=/bin/chromium https://example.com test.html
The "file" argument must be of type string. Received an instance of Array

This pull request should also have code to eliminate conflicting (duplicated) options in the final command.

@ntevenhere

@renaisun

@renaisun I sent a pull request to fix the issue. Thanks for bringing it up. If you accept the pull request, it'll be added here.

renaisun/ArchiveBox/pull/1

Thanks for your work. Let's see if the PR will be merged🤣

@pirate

Thanks for your work here! Will review this soon.

@tzwm

@pirate please review this issue. I really want to add more args to SingleFile. Thanks!