Enable `f16` and `f128` in assembly on platforms that support them

`f16` should be okay to pass in general-purpose registers (GPRs). `f16` and `f128` can likely be passed in vector or FP registers as well, but we should only allow this where an ABI is specified for the type.
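
To illustrate the GPR case, here is a minimal sketch that moves a 16-bit pattern through a general-purpose register with `asm!`. It uses `u16` as a stand-in for the bits of an `f16`, since `f16` itself is still unstable; the function name `roundtrip_via_gpr` is hypothetical, and only x86_64 and AArch64 are covered (other targets fall back to returning the input unchanged).

```rust
/// Moves a 16-bit pattern through a general-purpose register (x86_64).
#[cfg(target_arch = "x86_64")]
pub fn roundtrip_via_gpr(bits: u16) -> u16 {
    let result: u16;
    // SAFETY: a register-to-register move touches no memory and has no side effects.
    unsafe {
        std::arch::asm!(
            // The `x` modifier selects the 16-bit name of the register (e.g. `ax`).
            "mov {dst:x}, {src:x}",
            dst = out(reg) result,
            src = in(reg) bits,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    result
}

/// Moves a 16-bit pattern through a general-purpose register (AArch64).
#[cfg(target_arch = "aarch64")]
pub fn roundtrip_via_gpr(bits: u16) -> u16 {
    let result: u16;
    // SAFETY: a register-to-register move touches no memory and has no side effects.
    unsafe {
        std::arch::asm!(
            // The `w` modifier selects the 32-bit view; only the low 16 bits are read back.
            "mov {dst:w}, {src:w}",
            dst = out(reg) result,
            src = in(reg) bits,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    result
}

/// Fallback for targets this sketch does not cover: no assembly involved.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn roundtrip_via_gpr(bits: u16) -> u16 {
    bits
}
```

Calling `roundtrip_via_gpr(0x3C00)` passes the binary16 encoding of 1.0 through a GPR and returns the same bits. Nothing here interprets the value as a float, which is why the GPR case does not depend on a floating-point ABI.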

Loose list of platforms that specify an ABI:

Additionally, for f128:

  • s390x supports `f128`, referred to as "BFP Extended Format" in https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf. I am not sure whether it comes with any dedicated instructions.
  • PowerPC with `-Ctarget-cpu=pwr9` seems to have `f128` support via instructions like `xsaddqp`.
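
For reference, a small sketch of the 128-bit layout in question, assuming `f128` means IEEE 754 binary128 (1 sign bit, 15 exponent bits, 112 fraction bits). The `Binary128Parts` type and `split_binary128` function are hypothetical names for illustration, and `u128` stands in for the raw bits because `f128` is still unstable.

```rust
/// Fields of an IEEE 754 binary128 value, extracted from its raw bits.
#[derive(Debug, PartialEq, Eq)]
pub struct Binary128Parts {
    /// True when the sign bit (bit 127) is set.
    pub sign: bool,
    /// 15-bit exponent field, biased by 16383.
    pub biased_exponent: u16,
    /// 112-bit fraction field (the implicit leading bit is not stored).
    pub fraction: u128,
}

/// Splits a raw 128-bit pattern into sign, biased exponent, and fraction.
pub fn split_binary128(bits: u128) -> Binary128Parts {
    const FRACTION_BITS: u32 = 112;
    const EXPONENT_MASK: u128 = 0x7FFF;
    Binary128Parts {
        sign: (bits >> 127) != 0,
        biased_exponent: ((bits >> FRACTION_BITS) & EXPONENT_MASK) as u16,
        fraction: bits & ((1u128 << FRACTION_BITS) - 1),
    }
}
```

A value this wide cannot fit in a single 64-bit GPR, which is one reason the register class used for `f128` operands needs to follow whatever the platform ABI specifies.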

Tracking issue: #116909