codec package - github.com/ugorji/go/codec - Go Packages
Package codec provides a High Performance, Feature-Rich Idiomatic Go codec/encoding library for binc, msgpack, cbor, json.
Supported Serialization formats are:
- msgpack: https://github.com/msgpack/msgpack
- binc: http://github.com/ugorji/binc
- cbor: http://cbor.io http://tools.ietf.org/html/rfc7049
- json: http://json.org http://tools.ietf.org/html/rfc7159
- simple: (unpublished)
For detailed usage information, read the primer at http://ugorji.net/blog/go-codec-primer .
The idiomatic Go support is as seen in other encoding packages in the standard library (ie json, xml, gob, etc).
Rich Feature Set includes:
- Simple but extremely powerful and feature-rich API
- Support for go 1.21 and above, selectively using newer APIs for later releases
- Excellent code coverage ( ~ 85-90% )
- Very High Performance, significantly outperforming libraries for Gob, Json, Bson, etc
- Carefully selected use of 'unsafe' for targeted performance gains.
- 100% safe mode supported, where 'unsafe' is not used at all.
- Lock-free (sans mutex) concurrency for scaling to 100s of cores
- In-place updates during decode, with option to zero value in maps and slices prior to decode
- Coerce types where appropriate e.g. decode an int in the stream into a float, decode numbers from formatted strings, etc
- Corner Cases: Overflows, nil maps/slices, nil values in streams are handled correctly
- Standard field renaming via tags
- Support for omitting empty fields during an encoding
- Encoding from any value and decoding into pointer to any value (struct, slice, map, primitives, pointers, interface{}, etc)
- Extensions to support efficient encoding/decoding of any named types
- Support encoding.(Binary|Text)(M|Unm)arshaler interfaces
- Support using existence of `IsZero() bool` to determine if a value is a zero value
- Decoding without a schema (into an interface{}). Includes Options to configure what specific map or slice type to use when decoding an encoded list or map into a nil interface{}
- Mapping a non-interface type to an interface, so we can decode appropriately into any interface type with a correctly configured non-interface value.
- Encode a struct as an array, and decode struct from an array in the data stream
- Option to encode struct keys as numbers (instead of strings) (to support structured streams with fields encoded as numeric codes)
- Comprehensive support for anonymous fields
- Fast (no-reflection) encoding/decoding of common maps and slices
- Code-generation for faster performance, supported in go 1.6+
- Support binary (e.g. messagepack, cbor) and text (e.g. json) formats
- Support indefinite-length formats to enable true streaming (for formats which support it e.g. json, cbor)
- Support canonical encoding, where a value is ALWAYS encoded as same sequence of bytes. This mostly applies to maps, where iteration order is non-deterministic.
- NIL in data stream decoded as zero value
- Never silently skip data when decoding. User decides whether to return an error or silently skip data when keys or indexes in the data stream do not map to fields in the struct.
- Detect and error when encoding a cyclic reference (instead of stack overflow shutdown)
- Encode/Decode from/to chan types (for iterative streaming support)
- Drop-in replacement for encoding/json. `json:` key in struct tag supported.
- Provides a RPC Server and Client Codec for net/rpc communication protocol.
- Handle unique idiosyncrasies of codecs e.g. For messagepack, configure how ambiguities in handling raw bytes are resolved and provide rpc server/client codec to support msgpack-rpc protocol defined at: https://github.com/msgpack-rpc/msgpack-rpc/blob/master/spec.md
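The canonical-encoding item above exists because Go map iteration order is not deterministic. The following is a minimal standard-library sketch of the underlying idea (sort the keys before writing them out), not this package's implementation; `canonicalString` is a hypothetical helper name introduced for illustration. In the package itself, the behaviour is requested through the Canonical field of EncodeOptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonicalString renders a map with its keys in sorted order, so the same
// map contents always produce the same output regardless of iteration order.
// Hypothetical helper: it only illustrates the idea behind canonical encoding.
func canonicalString(m map[string]int) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	sb.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, "%q:%d", k, m[k])
	}
	sb.WriteByte('}')
	return sb.String()
}

func main() {
	m := map[string]int{"b": 2, "a": 1, "c": 3}
	fmt.Println(canonicalString(m)) // {"a":1,"b":2,"c":3}
}
```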
Supported build tags ¶
We gain performance by code-generating fast-paths for slices and maps of built-in types, and monomorphizing generic code explicitly so we gain inlining and de-virtualization benefits.
The results are 20-50% performance improvements over v1.2.
Building and running is configured using build tags as below.
At runtime:
- codec.safe: run in safe mode (not using unsafe optimizations)
- codec.notmono: use generics code (bypassing performance-boosting monomorphized code)
- codec.notfastpath: skip fast path code for slices and maps of built-in types (number, bool, string, bytes)
Each of these "runtime" tags has a convenience synonym, i.e. safe, notmono, notfastpath. Please use these synonyms mostly during development; use codec.XXX in your go files.
Build only:
- codec.build: used to generate fastpath and monomorphization code
Test only:
- codec.notmammoth: skip the mammoth generated tests
Extension Support ¶
Users can register a function to handle the encoding or decoding of their custom types.
There are no restrictions on what the custom type can be. Some examples:
type BitSet []int
type BitSet64 uint64
type UUID string
type MyStructWithUnexportedFields struct { a int; b bool; c []int; }
type GifImage struct { ... }
As an illustration, MyStructWithUnexportedFields would normally be encoded as an empty map because it has no exported fields, while UUID would be encoded as a string. However, with extension support, you can encode any of these however you like.
There is also seamless support provided for registering an extension (with a tag) but letting the encoding mechanism default to the standard way.
Custom Encoding and Decoding ¶
This package maintains symmetry in the encoding and decoding halves. We determine how to encode or decode by walking this decision tree:
- is there an extension registered for the type?
- is type a codec.Selfer?
- is format binary, and is type an encoding.BinaryMarshaler and BinaryUnmarshaler?
- is format specifically json, and is type an encoding/json.Marshaler and Unmarshaler?
- is format text-based, and type an encoding.TextMarshaler and TextUnmarshaler?
- else we use a pair of functions based on the "kind" of the type e.g. map, slice, int64, etc
This symmetry is important to reduce chances of issues happening because the encoding and decoding sides are out of sync e.g. decoded via very specific encoding.TextUnmarshaler but encoded via kind-specific generalized mode.
Consequently, if a type only defines one-half of the symmetry (e.g. it implements UnmarshalJSON() but not MarshalJSON() ), then that type doesn't satisfy the check and we will continue walking down the decision tree.
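The symmetry requirement can be illustrated with a small reflection check. This is a standard-library sketch under stated assumptions, not the package's own logic: `hasTextSymmetry`, `both` and `onlyDecode` are hypothetical names, and the check treats a method on either the value or its pointer as sufficient.

```go
package main

import (
	"encoding"
	"fmt"
	"reflect"
)

var (
	textMarshalerType   = reflect.TypeOf((*encoding.TextMarshaler)(nil)).Elem()
	textUnmarshalerType = reflect.TypeOf((*encoding.TextUnmarshaler)(nil)).Elem()
)

// hasTextSymmetry reports whether the type of v (or a pointer to it)
// implements BOTH encoding.TextMarshaler and encoding.TextUnmarshaler.
// v must be non-nil.
func hasTextSymmetry(v interface{}) bool {
	t := reflect.TypeOf(v)
	pt := reflect.PointerTo(t)
	implements := func(iface reflect.Type) bool {
		return t.Implements(iface) || pt.Implements(iface)
	}
	return implements(textMarshalerType) && implements(textUnmarshalerType)
}

// both defines the full pair, so it passes the check.
type both struct{ s string }

func (b both) MarshalText() ([]byte, error) { return []byte(b.s), nil }
func (b *both) UnmarshalText(p []byte) error { b.s = string(p); return nil }

// onlyDecode defines one half only, so the check fails and a decision
// tree like the one above would keep walking.
type onlyDecode struct{ s string }

func (o *onlyDecode) UnmarshalText(p []byte) error { o.s = string(p); return nil }

func main() {
	fmt.Println(hasTextSymmetry(both{}))       // true
	fmt.Println(hasTextSymmetry(onlyDecode{})) // false
}
```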
RPC ¶
RPC Client and Server Codecs are implemented, so the codecs can be used with the standard net/rpc package.
Usage ¶
The Handle is SAFE for concurrent READ, but NOT SAFE for concurrent modification.
The Encoder and Decoder are NOT safe for concurrent use.
Consequently, the usage model is basically:
- Create and initialize the Handle before any use. Once created, DO NOT modify it.
- Multiple Encoders or Decoders can now use the Handle concurrently. They only read information off the Handle (never write).
- However, each Encoder or Decoder MUST NOT be used concurrently
- To re-use an Encoder/Decoder, call Reset(...) on it first. This allows you to reuse state maintained on the Encoder/Decoder.
Sample usage model:
// create and configure Handle
var (
bh codec.BincHandle
mh codec.MsgpackHandle
ch codec.CborHandle
)
mh.MapType = reflect.TypeOf(map[string]interface{}(nil))
// configure extensions
// e.g. for msgpack, define functions and enable Time support for tag 1
// mh.SetExt(reflect.TypeOf(time.Time{}), 1, myExt)
// create and use decoder/encoder
var (
r io.Reader
w io.Writer
b []byte
h = &bh // or mh to use msgpack
)
var v interface{} // value to decode into, then encode from
dec := codec.NewDecoder(r, h)
// or: dec := codec.NewDecoderBytes(b, h)
err := dec.Decode(&v)
enc := codec.NewEncoder(w, h)
// or: enc := codec.NewEncoderBytes(&b, h)
err = enc.Encode(v)
//RPC Server
go func() {
for {
conn, err := listener.Accept()
rpcCodec := codec.GoRpc.ServerCodec(conn, h)
//OR rpcCodec := codec.MsgpackSpecRpc.ServerCodec(conn, h)
rpc.ServeCodec(rpcCodec)
}
}()
//RPC Communication (client side)
conn, err := net.Dial("tcp", "localhost:5555")
rpcCodec := codec.GoRpc.ClientCodec(conn, h)
//OR rpcCodec := codec.MsgpackSpecRpc.ClientCodec(conn, h)
client := rpc.NewClientWithCodec(rpcCodec)
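The concurrency rules in this section (configure once, share read-only, one encoder per goroutine) can be sketched with the standard library alone. In the sketch below, encoding/json stands in for this package, and `config` and `encodeAll` are hypothetical names; it illustrates the discipline rather than the package's API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// config plays the role of the Handle: it is fully initialized before use
// and only read afterwards.
type config struct {
	EscapeHTML bool
}

// encodeAll encodes each value in its own goroutine. All goroutines share cfg
// read-only; each one owns a private buffer and encoder, so no encoder is
// ever used concurrently.
func encodeAll(cfg *config, values []interface{}) ([]string, error) {
	out := make([]string, len(values))
	errs := make([]error, len(values))
	var wg sync.WaitGroup
	for i, v := range values {
		wg.Add(1)
		go func(i int, v interface{}) {
			defer wg.Done()
			var buf bytes.Buffer
			enc := json.NewEncoder(&buf) // one encoder per goroutine
			enc.SetEscapeHTML(cfg.EscapeHTML)
			errs[i] = enc.Encode(v)
			out[i] = buf.String()
		}(i, v)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	cfg := &config{EscapeHTML: false} // configure once, then never modify
	out, err := encodeAll(cfg, []interface{}{1, "a<b"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Printf("%q\n", out)
}
```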
Running Tests ¶
To run tests, use the following:
go test
To run the full suite of tests, use the following:
go test -tags codec.alltests -run Suite
You can use the tag 'codec.safe' to run tests or build in safe mode, e.g.
go test -tags codec.safe -run Json
go test -tags "codec.alltests codec.safe" -run Suite
You can use the tag 'codec.notmono' to build bypassing the monomorphized code, e.g.
go test -tags codec.notmono -run Json
Running Benchmarks
cd bench
go test -bench . -benchmem -benchtime 1s
Please see http://github.com/ugorji/go-codec-bench .
Caveats ¶
Struct fields matching the following are ignored during encoding and decoding
- struct tag value set to -
- func, complex numbers, unsafe pointers
- unexported and not embedded
- unexported and embedded and not struct kind
- unexported and embedded pointers (from go1.10)
Every other field in a struct will be encoded/decoded.
Embedded fields are encoded as if they exist in the top-level struct, with some caveats. See Encode documentation.
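The field-skipping rules above can be approximated with reflection. This is a standard-library sketch that applies the listed rules literally, assuming only the `json` tag key is consulted; `ignored`, `keptFields` and the sample types are hypothetical names, and the package's real field resolution may differ in details not covered here.

```go
package main

import (
	"fmt"
	"reflect"
)

type inner struct{ X int }
type label string
type box struct{ B []byte }

// sample exercises each rule in the list above.
type sample struct {
	Name   string
	Secret string `json:"-"` // tag value set to -
	Hook   func()            // func kind
	Ratio  complex128        // complex number
	hidden int               // unexported and not embedded
	inner                    // unexported, embedded, struct kind: kept
	label                    // unexported, embedded, not struct kind
	*box                     // unexported, embedded pointer
}

// ignored applies the listed rules to a single struct field.
func ignored(f reflect.StructField) bool {
	if f.Tag.Get("json") == "-" {
		return true
	}
	switch f.Type.Kind() {
	case reflect.Func, reflect.Complex64, reflect.Complex128, reflect.UnsafePointer:
		return true
	}
	if f.PkgPath == "" { // exported field
		return false
	}
	if !f.Anonymous { // unexported and not embedded
		return true
	}
	// Unexported and embedded: only struct kinds survive, which also
	// excludes embedded pointers.
	return f.Type.Kind() != reflect.Struct
}

// keptFields returns the names of the fields of struct value v that the
// rules above would not ignore.
func keptFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if f := t.Field(i); !ignored(f) {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(keptFields(sample{})) // [Name inner]
}
```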
- Constants
- Variables
- type BasicHandle (deprecated)
- type BincHandle
- type BytesExt
- type CborHandle
- type DecodeOptions
- type Decoder
- type EncodeOptions
- type Encoder
- type Ext
- type Handle
- type InterfaceExt
- type JsonHandle
- type MapBySlice
- type MissingFielder
- type MsgpackHandle
- type MsgpackSpecRpcMultiArgs
- type RPCOptions
- type Raw
- type RawExt
- type Rpc
- type Selfer
- type SimpleHandle
- type TypeInfos
const (
	CborStreamBytes  byte = 0x5f
	CborStreamString byte = 0x7f
	CborStreamArray  byte = 0x9f
	CborStreamMap    byte = 0xbf
	CborStreamBreak  byte = 0xff
)
These define some in-stream descriptors for manual encoding e.g. when doing explicit indefinite-length encoding.
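As an illustration of what these descriptors mean on the wire, the sketch below hand-writes an indefinite-length CBOR array using only the standard library. The two constants mirror the values listed above; `indefiniteArray` is a hypothetical helper, and it handles only unsigned integers 0..23, which CBOR encodes as a single byte.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// Local copies of the descriptor values listed above.
const (
	cborStreamArray byte = 0x9f // start of an indefinite-length array
	cborStreamBreak byte = 0xff // "break" marker that closes it
)

// indefiniteArray writes small unsigned integers (0..23) as an
// indefinite-length CBOR array and returns the bytes as a hex string.
func indefiniteArray(items []uint8) (string, error) {
	var buf bytes.Buffer
	buf.WriteByte(cborStreamArray)
	for _, n := range items {
		if n > 23 {
			return "", fmt.Errorf("value %d needs a multi-byte encoding, not handled in this sketch", n)
		}
		buf.WriteByte(n) // major type 0, value stored directly in the byte
	}
	buf.WriteByte(cborStreamBreak)
	return hex.EncodeToString(buf.Bytes()), nil
}

func main() {
	s, err := indefiniteArray([]uint8{1, 2, 3})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s) // 9f010203ff
}
```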
GoRpc implements Rpc using the communication protocol defined in net/rpc package.
Note: network connection (from net.Dial, of type io.ReadWriteCloser) is not buffered.
For performance, you should configure WriterBufferSize and ReaderBufferSize on the handle. This ensures we use an adequate buffer during reading and writing. If not configured, we will internally initialize and use a buffer during reads and writes. This can be turned off via the RPCNoBuffer option on the Handle.
var handle codec.JsonHandle
handle.RPCNoBuffer = true // turns off attempt by rpc module to initialize a buffer
Example 1: one way of configuring buffering explicitly:
var handle codec.JsonHandle // codec handle
handle.ReaderBufferSize = 1024
handle.WriterBufferSize = 1024
var conn io.ReadWriteCloser // connection got from a socket
var serverCodec = GoRpc.ServerCodec(conn, &handle)
var clientCodec = GoRpc.ClientCodec(conn, &handle)
Example 2: you can also explicitly create a buffered connection yourself, and not worry about configuring the buffer sizes in the Handle.
var handle codec.Handle // codec handle
var conn io.ReadWriteCloser // connection got from a socket
var bufconn = struct { // bufconn here is a buffered io.ReadWriteCloser
io.Closer
*bufio.Reader
*bufio.Writer
}{conn, bufio.NewReader(conn), bufio.NewWriter(conn)}
var serverCodec = GoRpc.ServerCodec(bufconn, handle)
var clientCodec = GoRpc.ClientCodec(bufconn, handle)
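The anonymous struct in Example 2 satisfies io.ReadWriteCloser purely through embedding. The sketch below shows that with the standard library only, using a hypothetical in-memory `memConn` in place of a socket. It also shows a general property of bufio.Writer: written bytes stay in the buffer until Flush is called. How the rpc codecs handle flushing is not covered by this sketch.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// memConn is a hypothetical in-memory stand-in for a network connection.
type memConn struct{ bytes.Buffer }

func (m *memConn) Close() error { return nil }

// roundTrip writes msg through a buffered wrapper shaped like bufconn above.
// It reports how many bytes had reached the underlying connection before and
// after Flush, and what reading back through the wrapper yields.
func roundTrip(msg string) (before, after int, got string, err error) {
	conn := &memConn{}
	w := bufio.NewWriter(conn)
	var bufconn io.ReadWriteCloser = struct {
		io.Closer
		*bufio.Reader
		*bufio.Writer
	}{conn, bufio.NewReader(conn), w}

	if _, err = bufconn.Write([]byte(msg)); err != nil {
		return
	}
	before = conn.Len() // still held in the bufio.Writer
	if err = w.Flush(); err != nil {
		return
	}
	after = conn.Len()

	p := make([]byte, len(msg))
	if _, err = io.ReadFull(bufconn, p); err != nil {
		return
	}
	got = string(p)
	return
}

func main() {
	before, after, got, err := roundTrip("ping")
	fmt.Println(before, after, got, err) // 0 4 ping <nil>
}
```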
MsgpackSpecRpc implements Rpc using the communication protocol defined in the msgpack spec at https://github.com/msgpack-rpc/msgpack-rpc/blob/master/spec.md .
See GoRpc documentation, for information on buffering for better performance.
SelfExt is a sentinel extension signifying that types registered with it SHOULD be encoded and decoded based on the native mode of the format.
This allows users to define a tag for an extension, but signify that the types should be encoded/decoded as the native encoding. This way, users need not also define how to encode or decode the extension.
type BasicHandle
deprecated
type BasicHandle struct {
TypeInfos *TypeInfos
DecodeOptions
EncodeOptions
RPCOptions
TimeNotBuiltin bool
ExplicitRelease bool
}
BasicHandle encapsulates the common options and extension functions.
Deprecated: DO NOT USE DIRECTLY. EXPORTED FOR GODOC BENEFIT. WILL BE REMOVED.
func (*BasicHandle) AddExt
deprecated
AddExt registers an encode and decode function for a reflect.Type. To deregister an Ext, call AddExt with nil encfn and/or nil decfn.
Deprecated: Use SetBytesExt or SetInterfaceExt on the Handle instead.
func (*BasicHandle) SetExt
deprecated
SetExt will set the extension for a tag and reflect.Type. Note that the type must be a named type, and specifically not a pointer or Interface. An error is returned if that is not honored. To Deregister an ext, call SetExt with nil Ext.
It will throw an error if called after the Handle has been initialized.
Deprecated: Use SetBytesExt or SetInterfaceExt on the Handle instead (which *may* internally call this)
type BincHandle ¶
type BincHandle struct {
BasicHandle
AsSymbols uint8
}
BincHandle is a Handle for the Binc Schema-Free Encoding Format defined at https://github.com/ugorji/binc .
BincHandle currently supports all Binc features with the following EXCEPTIONS:
- Only integers up to 64 bits of precision are supported. Big integers are unsupported.
- Only IEEE 754 binary32 and binary64 floats are supported (i.e. Go float32 and float64 types). Extended precision and decimal IEEE 754 floats are unsupported.
- Only UTF-8 strings are supported. Unicode_Other Binc types (UTF16, UTF32) are currently unsupported.
Note that these EXCEPTIONS are temporary and full support is possible and may happen soon.
func (*BincHandle) SetBytesExt ¶
SetBytesExt sets an extension
BytesExt handles custom (de)serialization of types to/from []byte. It is used by codecs (e.g. binc, msgpack, simple) which do custom serialization of the types.
type CborHandle ¶
type CborHandle struct {
BasicHandle
IndefiniteLength bool
TimeRFC3339 bool
SkipUnexpectedTags bool
}
CborHandle is a Handle for the CBOR encoding format, defined at http://tools.ietf.org/html/rfc7049 and documented further at http://cbor.io .
CBOR is comprehensively supported, including support for:
- indefinite-length arrays/maps/bytes/strings
- (extension) tags in range 0..0xffff (0 .. 65535)
- half, single and double-precision floats
- all numbers (1, 2, 4 and 8-byte signed and unsigned integers)
- nil, true, false, ...
- arrays and maps, bytes and text strings
None of the optional extensions (with tags) defined in the spec are supported out-of-the-box. Users can implement them as needed (using SetExt), including spec-documented ones:
- timestamp, BigNum, BigFloat, Decimals,
- Encoded Text (e.g. URL, regexp, base64, MIME Message), etc.
DecodeOptions captures configuration options during decode.
NewDecoder returns a Decoder for decoding a stream of bytes from an io.Reader.
For efficiency, Users are encouraged to configure ReaderBufferSize on the handle OR pass in a memory buffered reader (eg bufio.Reader, bytes.Buffer).
NewDecoderBytes returns a Decoder which efficiently decodes directly from a byte slice with zero copying.
NewDecoderString returns a Decoder which efficiently decodes directly from a string with zero copying.
It is a convenience function that calls NewDecoderBytes with a []byte view into the string.
This can be an efficient zero-copy if using default mode i.e. without codec.safe tag.
type EncodeOptions struct {
WriterBufferSize int
ChanRecvTimeout time.Duration
StructToArray bool
Canonical bool
CheckCircularRef bool
RecursiveEmptyCheck bool
Raw bool
StringToRaw bool
OptimumSize bool
NoAddressableReadonly bool
NilCollectionToZeroLength bool
}
EncodeOptions captures configuration options during encode.
NewEncoder returns an Encoder for encoding into an io.Writer.
For efficiency, Users are encouraged to configure WriterBufferSize on the handle OR pass in a memory buffered writer (eg bufio.Writer, bytes.Buffer).
NewEncoderBytes returns an encoder for encoding directly and efficiently into a byte slice, using zero-copying to temporary slices.
It will potentially replace the output byte slice pointed to. After encoding, the out parameter contains the encoded contents.
type Ext interface {
BytesExt
InterfaceExt
}
Ext handles custom (de)serialization of custom types / extensions.
type Handle ¶
type Handle interface {
Name() string
}
Handle defines a specific encoding format. It also stores any runtime state used during an Encoding or Decoding session e.g. stored state about Types, etc.
Once a handle is configured, it can be shared across multiple Encoders and Decoders.
Note that a Handle is NOT safe for concurrent modification.
A Handle also should not be modified after it is configured and has been used at least once. This is because stored state may be out of sync with the new configuration, and a data race can occur when multiple goroutines access it. i.e. multiple Encoders or Decoders in different goroutines.
Consequently, the typical usage model is that a Handle is pre-configured before first time use, and not modified while in use. Such a pre-configured Handle is safe for concurrent access.
type InterfaceExt interface {
ConvertExt(v interface{}) interface{}
UpdateExt(dst interface{}, src interface{})
}
InterfaceExt handles custom (de)serialization of types to/from another interface{} value. The Encoder or Decoder will then handle the further (de)serialization of that known type.
It is used by codecs (e.g. cbor, json) which use the format to do custom serialization of types.
type JsonHandle ¶
JsonHandle is a handle for JSON encoding format.
Json is comprehensively supported:
- decodes numbers into interface{} as int, uint or float64 based on how the number looks and some config parameters e.g. PreferFloat, SignedInt, etc.
- decode integers from float formatted numbers e.g. 1.27e+8
- decode any json value (numbers, bool, etc) from quoted strings
- configurable way to encode/decode []byte . by default, encodes and decodes []byte using base64 Std Encoding
- UTF-8 support for encoding and decoding
It has better performance than the json library in the standard library, by leveraging the performance improvements of the codec library.
In addition, it doesn't read more bytes than necessary during a decode, which allows reading multiple values from a stream containing json and non-json content. For example, a user can read a json value, then a cbor value, then a msgpack value, all from the same stream in sequence.
Note that, when decoding quoted strings, invalid UTF-8 or invalid UTF-16 surrogate pairs are not treated as an error. Instead, they are replaced by the Unicode replacement character U+FFFD.
Note also that the float values for NaN, +Inf or -Inf are encoded as null, as suggested by NOTE 4 of the ECMA-262 ECMAScript Language Specification 5.1 edition. see http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf .
Note the following behaviour differences vs std-library encoding/json package:
- struct field names matched in case-sensitive manner
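For contrast, the sketch below shows the standard library side of this difference: encoding/json accepts a case-insensitive match between a JSON key and a struct field. It uses only encoding/json; `user` and `stdlibDecodeName` are hypothetical names. Per the note above, the same input decoded through this package's JSON handle would not be expected to match the field, though that is not exercised here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type user struct {
	Name string `json:"name"`
}

// stdlibDecodeName decodes doc with encoding/json and returns the Name field.
func stdlibDecodeName(doc string) (string, error) {
	var u user
	err := json.Unmarshal([]byte(doc), &u)
	return u.Name, err
}

func main() {
	// The key is upper case while the tag is lower case; encoding/json still
	// fills the field because it falls back to a case-insensitive match.
	name, err := stdlibDecodeName(`{"NAME":"ada"}`)
	fmt.Printf("%q %v\n", name, err) // "ada" <nil>
}
```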
type MapBySlice interface {
MapBySlice()
}
MapBySlice is a tag interface that denotes the slice or array value should encode as a map in the stream, and can be decoded from a map in the stream.
The slice or array must contain a sequence of key-value pairs. The length of the slice or array must be even (fully divisible by 2).
This affords storing a map in a specific sequence in the stream.
Example usage:
type T1 []string // or []int or []Point or any other "slice" type
func (_ T1) MapBySlice() {} // T1 now implements MapBySlice, and will be encoded as a map
type T2 struct { KeyValues T1 }
var kvs = []string{"one", "1", "two", "2", "three", "3"}
var v2 = T2{ KeyValues: T1(kvs) }
// v2 will be encoded like the map: {"KeyValues": {"one": "1", "two": "2", "three": "3"} }
The support of MapBySlice affords the following:
- A slice or array type which implements MapBySlice will be encoded as a map
- A slice can be decoded from a map in the stream
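To make the pairing rule concrete, the sketch below renders a flat key-value slice as a JSON object while preserving slice order, and rejects odd-length input. It uses only the standard library; `pairsAsJSONObject` is a hypothetical helper that illustrates the shape of the output, not how this package implements MapBySlice.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// pairsAsJSONObject renders a flat [k1, v1, k2, v2, ...] slice as a JSON
// object, keeping the pairs in slice order.
func pairsAsJSONObject(kvs []string) (string, error) {
	if len(kvs)%2 != 0 {
		return "", errors.New("key-value slice must have an even length")
	}
	var sb strings.Builder
	sb.WriteByte('{')
	for i := 0; i < len(kvs); i += 2 {
		if i > 0 {
			sb.WriteByte(',')
		}
		k, err := json.Marshal(kvs[i])
		if err != nil {
			return "", err
		}
		v, err := json.Marshal(kvs[i+1])
		if err != nil {
			return "", err
		}
		sb.Write(k)
		sb.WriteByte(':')
		sb.Write(v)
	}
	sb.WriteByte('}')
	return sb.String(), nil
}

func main() {
	kvs := []string{"one", "1", "two", "2", "three", "3"}
	s, err := pairsAsJSONObject(kvs)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s) // {"one":"1","two":"2","three":"3"}
}
```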
type MissingFielder interface {
CodecMissingField(field []byte, value interface{}) bool
CodecMissingFields() map[string]interface{}
}
MissingFielder defines the interface allowing structs to internally decode or encode values which do not map to struct fields.
We expect that this interface is bound to a pointer type (so the mutation function works).
A use-case is if a version of a type unexports a field, but you want compatibility between both versions during encoding and decoding.
Note that the interface is completely ignored during codecgen.
type MsgpackHandle ¶
type MsgpackHandle struct {
BasicHandle
NoFixedNum bool
WriteExt bool
PositiveIntUnsigned bool
}
MsgpackHandle is a Handle for the Msgpack Schema-Free Encoding Format.
func (*MsgpackHandle) SetBytesExt ¶
SetBytesExt sets an extension
type MsgpackSpecRpcMultiArgs []interface{}
MsgpackSpecRpcMultiArgs is a special type which signifies to the MsgpackSpecRpcCodec that the backend RPC service takes multiple arguments, which have been arranged in sequence in the slice.
The Codec then passes it AS-IS to the rpc service (without wrapping it in an array of 1 element).
type RPCOptions struct {
RPCNoBuffer bool
}
RPCOptions holds options specific to rpc functionality
Raw represents raw formatted bytes. We "blindly" store it during encode and retrieve the raw bytes during decode. Note: it is dangerous during encode, so we may gate the behaviour behind an Encode flag which must be explicitly set.
RawExt represents raw unprocessed extension data.
Some codecs will decode extension data as a *RawExt if there is no registered extension for the tag.
On encode, encode the Data. If nil, then try to encode the Value.
On decode: store tag, then store bytes and/or decode into Value.
Rpc provides a rpc Server or Client Codec for rpc communication.
Selfer defines methods by which a value can encode or decode itself.
Any type which implements Selfer will be able to encode or decode itself. Consequently, during (en|de)code, this takes precedence over (text|binary)(M|Unm)arshal or extension support.
By definition, it is not allowed for a Selfer to directly call Encode or Decode on itself. If that is done, Encode/Decode will rightfully fail with a Stack Overflow style error. For example, the snippet below will cause such an error.
type testSelferRecur struct{}
func (s *testSelferRecur) CodecEncodeSelf(e *Encoder) { e.MustEncode(s) }
func (s *testSelferRecur) CodecDecodeSelf(d *Decoder) { d.MustDecode(s) }
Note: *the first set of bytes of any value MUST NOT represent nil in the format*. This is because, during each decode, we first check whether the next set of bytes represents nil, and if so, we just set the value to nil.
type SimpleHandle ¶
type SimpleHandle struct {
BasicHandle
EncZeroValuesAsNil bool
}
SimpleHandle is a Handle for a very simple encoding format.
simple is a simplistic codec similar to binc, but not as compact.
- Encoding of a value is always preceded by the descriptor byte (bd)
- True, false, nil are encoded fully in 1 byte (the descriptor)
- Integers (intXXX, uintXXX) are encoded in 1, 2, 4 or 8 bytes (plus a descriptor byte). There are positive (uintXXX and intXXX >= 0) and negative (intXXX < 0) integers.
- Floats are encoded in 4 or 8 bytes (plus a descriptor byte)
- Lengths of containers (strings, bytes, array, map, extensions) are encoded in 0, 1, 2, 4 or 8 bytes. Zero-length containers have no length encoded. For others, the number of bytes is given by pow(2, bd%3)
- maps are encoded as [bd] [length] [[key][value]]...
- arrays are encoded as [bd] [length] [value]...
- extensions are encoded as [bd] [length] [tag] [byte]...
- strings/bytearrays are encoded as [bd] [length] [byte]...
- time.Time are encoded as [bd] [length] [byte]...
The full spec will be published soon.
func (*SimpleHandle) SetBytesExt ¶
SetBytesExt sets an extension
type TypeInfos struct {
}
TypeInfos caches typeInfo for each type on first inspection.
It is configured with a set of tag keys, which are used to get configuration for the type.
NewTypeInfos creates a TypeInfos given a set of struct tags keys.
This allows users to customize the struct tag keys which contain configuration of their types.