parser: add IANA time_zone support for naive timestamps by arnikola · Pull Request #11639 · fluent/fluent-bit

Conversation

@arnikola

Add optional parser setting time_zone (IANA name, e.g. from tzdata) so naive timestamps from time_format are converted with local rules and DST, instead of only fixed offset or system timezone.

[N/A]


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@coderabbitai

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 350dd3b9-7a64-4202-bef7-8e0c58c7a54a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

chatgpt-codex-connector[bot]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e01d163f6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +71 to +74

if (parser->time_zone) {
char *old_tz = NULL;
const char *prev;
struct tm tmp;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip time_zone conversion when timestamp already has offset

The new time_zone branch always uses mktime in the configured zone, even when time_format includes %z/%Z and the parsed record already carries an explicit offset. In that case flb_parser_time_lookup() has already parsed the source zone, but this path ignores it and reinterprets the wall clock in parser->time_zone, producing shifted epochs (for example, a +0000 input gets converted as local New York time). Please either reject time_zone with %z/%Z formats in flb_parser_create() or bypass this branch when parser->time_with_tz is true.

Useful? React with 👍 / 👎.

Comment on lines +86 to +90

if (setenv("TZ", parser->time_zone, 1) != 0) {
flb_free(old_tz);
pthread_mutex_unlock(&flb_time_zone_mutex);
return (time_t) -1;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid mutating process TZ during per-record parsing

This code changes TZ and calls tzset() for every parsed record, which is process-global state. The mutex only serializes this function, but other paths still call mktime without this lock (e.g. flb_parser_tm2time_simple for time_system_timezone), so concurrent parsing can observe the temporary timezone and compute wrong epochs. This creates cross-parser timestamp corruption when one thread uses time_zone and another thread parses local/system timestamps at the same time.

Useful? React with 👍 / 👎.

@arnikola

Example config files:
config.conf:

[SERVICE]
    Parsers_File  parsers.conf

[INPUT]
    Name          tail
    Path          tz-example.log
    Tag           tz.test
    Parser        tz_naive
    Read_from_Head true
    Exit_On_Eof   true

[OUTPUT]
    Name          stdout
    Match         tz.test
    Format        json_lines

parsers.conf

[PARSER]
    Name         tz_naive
    Format       regex
    Regex        ^(?<ts>[^ ]+) (?<msg>.+)$
    Time_Key     ts
    Time_Format  %Y-%m-%dT%H:%M:%S
    Time_Keep    On
    Time_Strict  On
    Time_Zone    Australia/Sydney

@cosmo0920

The patch direction looks great but we need to add Signed-off line on every commit. So, could you add this line?
This can be achieved by:

To add your Signed-off-by line to every commit in this branch:

    Ensure you have a local copy of your branch by [checking out the pull request locally via command line](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/checking-out-pull-requests-locally).
    In your local branch, run: git rebase HEAD~1 --signoff
    Force push your changes to overwrite the branch: git push --force-with-lease origin arnikola/parser-timezones-dst

ref: https://github.com/fluent/fluent-bit/pull/11639/checks?check_run_id=69300784903

Labels

2 participants

@arnikola @cosmo0920