resolve call failed: DNSSEC validation failed: failed-auxiliary · Issue #9867 · systemd/systemd

It would appear that most resolvers will happily resolve FQDNs such as savannah.gnu.org, but simply don't consider them authenticated (i.e., they omit the ad flag in the response):

$ dig @1.1.1.1 savannah.gnu.org. +dnssec  | grep ';; flags'
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
$ dig @8.8.8.8 savannah.gnu.org. +dnssec  | grep ';; flags'
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

I think systemd-resolved ought to behave in the same way.
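
For anyone who wants to script the comparison above across several resolvers, here is a minimal sketch that checks a dig `;; flags:` line for the ad flag. The helper names `parse_dig_flags` and `is_authenticated` are made up for illustration, and it assumes the flags line has the layout shown in the output above.

```python
import re


def parse_dig_flags(line):
    """Return the set of header flags found on a dig ';; flags:' line."""
    match = re.search(r";; flags:\s*([a-z ]*);", line)
    if match is None:
        raise ValueError("not a dig flags line: %r" % line)
    return set(match.group(1).split())


def is_authenticated(line):
    """True if the resolver set the ad (Authenticated Data) flag."""
    return "ad" in parse_dig_flags(line)


if __name__ == "__main__":
    # The flags line returned by 1.1.1.1 and 8.8.8.8 in the output above.
    line = ";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
    print(sorted(parse_dig_flags(line)), is_authenticated(line))
```

Feeding it the two flags lines above gives a set without `ad`, i.e. an answer that was resolved but not marked as authenticated.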

A practical consequence of the current behaviour is that it becomes impossible for a domain owner to gracefully deploy (or remove) DNSSEC signatures without interruption, as he has no control over when the parent (gTLD/ccTLD) zone is reloaded and the DS record becomes visible. It is impossible to transition in an «atomic»/instantaneous manner from a state with no DS in the parent zone and no signatures in the zone itself to a state where the zone is signed and DS records exist in the parent zone (or vice versa) - even before taking TTL and caching into account.

The normal procedure when deploying DNSSEC is to first sign the zone, wait for all slaves to pick it up and for TTLs to expire, and only then push the DS records to the parent zone via the registrar. This procedure could take hours or even days, during which systemd-resolved with DNSSEC enabled will fail to resolve any hostname in the zone (as I understand it).

RFC 4033 section 5 contains some guidance on how to behave in this situation:

   Insecure: The validating resolver has a trust anchor, a chain of
      trust, and, at some delegation point, signed proof of the
      non-existence of a DS record.  This indicates that subsequent
      branches in the tree are provably insecure.  A validating resolver
      may have a local policy to mark parts of the domain space as
      insecure.

   Bogus: The validating resolver has a trust anchor and a secure
      delegation indicating that subsidiary data is signed, but the
      response fails to validate for some reason: missing signatures,
      expired signatures, signatures with unsupported algorithms, data
      missing that the relevant NSEC RR says should be present, and so
      forth.

   Indeterminate: There is no trust anchor that would indicate that a
      specific portion of the tree is secure.  This is the default
      operation mode.

As I understand it, savannah.gnu.org matches the Insecure definition, as there's no DS record for gnu.org in the org gTLD zone. It is not Bogus.
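
To make my reading of those definitions concrete, here is a rough sketch of the classification as a decision function. It is a deliberate simplification and not how systemd-resolved or any real validator is structured: the three boolean inputs and the names `Status` and `classify` are assumptions for illustration only.

```python
from enum import Enum


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"
    INDETERMINATE = "indeterminate"


def classify(has_trust_anchor, ds_proven_absent, signatures_valid):
    """Simplified reading of the RFC 4033 section 5 security states.

    has_trust_anchor: a trust anchor covers this part of the tree.
    ds_proven_absent: some delegation point has signed proof that no DS exists.
    signatures_valid: the answer validates along the chain of trust.
    """
    if not has_trust_anchor:
        return Status.INDETERMINATE
    if ds_proven_absent:
        # Provably insecure below this delegation, whether or not the
        # child zone happens to carry signatures already.
        return Status.INSECURE
    return Status.SECURE if signatures_valid else Status.BOGUS


if __name__ == "__main__":
    # Zone signed, DS not yet published in the parent: still Insecure.
    print(classify(True, True, False))
    # DS published and signatures validate: Secure.
    print(classify(True, False, True))
    # DS published but signatures missing or broken: Bogus.
    print(classify(True, False, False))
```

Under this reading, a zone that is signed but has no DS in the parent stays Insecure for the whole rollout window, and only becomes Secure (or Bogus) once the DS record is visible.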

The RFC goes on to say:

   This specification only defines how security-aware name servers can
   signal non-validating stub resolvers that data was found to be bogus
   (using RCODE=2, "Server Failure"; see [RFC4035]).

And also:

   This specification does not define a format for communicating why
   responses were found to be bogus or marked as insecure.  The current
   signaling mechanism does not distinguish between indeterminate and
   insecure states.

However, even though savannah.gnu.org is not Bogus, systemd-resolved responds with SERVFAIL when queried for it:

$ dig @127.0.0.53 savannah.gnu.org. SOA | grep status:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41084

This seems improper to me. The correct thing to do, as I understand it, would be to answer the query with the ad bit absent - i.e., the same as what it would do with Insecure answers.
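
The behaviour I would expect can be sketched as a small mapping from validation state to what the stub client sees. This is only an illustration of the RFC text quoted above, not a description of systemd-resolved internals; `stub_response` is a hypothetical name, and it assumes the upstream answer itself came back with NOERROR.

```python
def stub_response(status):
    """Return (rcode, ad_bit) for a validation status string.

    Only "bogus" is signalled as a failure (RCODE=2, "Server Failure").
    "insecure" and "indeterminate" are answered normally without the
    ad bit, since the signalling does not distinguish between them.
    """
    if status == "bogus":
        return ("SERVFAIL", False)
    if status == "secure":
        return ("NOERROR", True)
    if status in ("insecure", "indeterminate"):
        return ("NOERROR", False)
    raise ValueError("unknown validation status: %r" % status)


if __name__ == "__main__":
    for status in ("secure", "insecure", "indeterminate", "bogus"):
        print(status, stub_response(status))
```

With this mapping, a query for an Insecure name would get a normal answer with the ad bit cleared rather than SERVFAIL.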