functions (Spark 4.2.0 JavaDoc)

Method Details

  • countDistinct

    Aggregate function: returns the number of distinct items in a group.

    An alias of count_distinct; using count_distinct directly is encouraged.

    Parameters:
    expr - (undocumented)
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • countDistinct

    public static Column countDistinct(String columnName, String... columnNames)

    Aggregate function: returns the number of distinct items in a group.

    An alias of count_distinct; using count_distinct directly is encouraged.

    Parameters:
    columnName - (undocumented)
    columnNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • count_distinct

    Aggregate function: returns the number of distinct items in a group.

    Parameters:
    expr - (undocumented)
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
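    For illustration, a minimal Scala sketch (the DataFrame df and its column names are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Distinct users overall, and per category.
    df.select(count_distinct(col("user_id")).as("users"))
    df.groupBy(col("category"))
      .agg(count_distinct(col("user_id")).as("users_per_category"))
    ```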
  • grouping_id

    Aggregate function: returns the level of grouping, equal to

    
       (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
     
    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).
  • grouping_id

    Aggregate function: returns the level of grouping, equal to

    
       (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
     
    Parameters:
    colName - (undocumented)
    colNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The list of columns must match the grouping columns exactly.
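    A hedged sketch of how grouping_id pairs with cube (df and the columns dept and role are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // With cube, each output row aggregates some subset of the grouping columns;
    // grouping_id encodes that subset as a bitmask.
    df.cube(col("dept"), col("role"))
      .agg(grouping_id().as("gid"), count(lit(1)).as("cnt"))
    ```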
  • array

    Creates a new array column. The input columns must all have the same data type.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • array

    Creates a new array column. The input columns must all have the same data type.

    Parameters:
    colName - (undocumented)
    colNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • map

    Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
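    A minimal sketch (df and the key/value columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Arguments alternate key, value, key, value, ...
    df.select(map(lit("x"), col("x"), lit("y"), col("y")).as("m"))
    ```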
  • named_struct

    public static Column named_struct(Column... cols)

    Creates a struct with the given field names and values.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • coalesce

    Returns the first column that is not null, or null if all inputs are null.

    For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
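    A minimal sketch (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // First non-null among phone and email, with a literal fallback.
    df.select(coalesce(col("phone"), col("email"), lit("unknown")).as("contact"))
    ```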
  • struct

    Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name; otherwise the newly generated StructField's name is auto-generated as col plus a 1-based index, i.e. col1, col2, col3, ...

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • struct

    Creates a new struct column that composes multiple input columns.

    Parameters:
    colName - (undocumented)
    colNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
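    A sketch of the naming behavior described above (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // "id" keeps its name as a field; the aliased expression becomes field "total".
    df.select(struct(col("id"), (col("a") + col("b")).as("total")).as("s"))
    ```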
  • greatest

    Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • greatest

    Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    columnName - (undocumented)
    columnNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • least

    Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • least

    Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    columnName - (undocumented)
    columnNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
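    A minimal sketch of both functions (df and the q1/q2/q3 columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Row-wise maximum and minimum across three columns, skipping nulls.
    df.select(
      greatest(col("q1"), col("q2"), col("q3")).as("best"),
      least(col("q1"), col("q2"), col("q3")).as("worst"))
    ```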
  • hash

    Calculates the hash code of given columns, and returns the result as an int column.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • xxhash64

    Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column. The hash computation uses an initial seed of 42.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
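    A minimal sketch of both hash variants (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // A 32-bit and a 64-bit hash over the same input columns.
    df.select(
      hash(col("a"), col("b")).as("h32"),
      xxhash64(col("a"), col("b")).as("h64"))
    ```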
  • reflect

    Calls a method with reflection.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • java_method

    Calls a method with reflection.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_reflect

    This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • stack

    Separates col1, ..., colk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
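    A hedged sketch of a typical unpivot (df and the q1/q2/q3 columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Unpivots three (name, value) pairs into three rows; the first
    // argument is the number of rows to produce.
    df.select(stack(lit(3),
      lit("q1"), col("q1"),
      lit("q2"), col("q2"),
      lit("q3"), col("q3")))
    ```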
  • concat_ws

    Concatenates multiple input string columns together into a single string column, using the given separator.

    Parameters:
    sep - (undocumented)
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    Input strings which are null are skipped.
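    A minimal sketch (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Joins the parts with "-"; null columns are skipped rather than rendered.
    df.select(concat_ws("-", col("year"), col("month"), col("day")).as("ymd"))
    ```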
  • format_string

    Formats the arguments in printf-style and returns the result as a string column.

    Parameters:
    format - (undocumented)
    arguments - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • printf

    Formats the arguments in printf-style and returns the result as a string column.

    Parameters:
    format - (undocumented)
    arguments - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • elt

    Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.

    Parameters:
    inputs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
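    A minimal sketch (df and the idx column are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // 1-based selection: rows where idx is 2 yield "b".
    df.select(elt(col("idx"), lit("a"), lit("b"), lit("c")))
    ```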
  • concat

    Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.

    Parameters:
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    Returns null if any of the input columns are null.
  • json_tuple

    Creates a new row for a json column according to the given field names.

    Parameters:
    json - (undocumented)
    fields - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
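    A minimal sketch (df and the payload column are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Extracts the top-level "name" and "age" fields from a JSON string column.
    df.select(json_tuple(col("payload"), "name", "age"))
    ```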
  • arrays_zip

    Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • map_concat

    Returns the union of all the given maps.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
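    A minimal sketch covering both array/map helpers above (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // arrays_zip pairs element i of xs with element i of ys;
    // map_concat unions two map columns.
    df.select(
      arrays_zip(col("xs"), col("ys")).as("pairs"),
      map_concat(col("m1"), col("m2")).as("merged"))
    ```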
  • callUDF

    Calls a user-defined function.

    Parameters:
    udfName - (undocumented)
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • call_udf

    Calls a user-defined function. Example:

    
      import org.apache.spark.sql._
    
      val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
      val spark = df.sparkSession
      spark.udf.register("simpleUDF", (v: Int) => v * v)
      df.select($"id", call_udf("simpleUDF", $"value"))
     
    Parameters:
    udfName - (undocumented)
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • call_function

    Calls a SQL function.

    Parameters:
    funcName - function name that follows the SQL identifier syntax (can be quoted, can be qualified)
    cols - the expression parameters of function
    Returns:
    (undocumented)
    Since:
    3.5.0
  • col

    Returns a Column based on the given column name.

    Parameters:
    colName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • column

    Returns a Column based on the given column name. Alias of col(java.lang.String).

    Parameters:
    colName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • lit

    Creates a Column of literal value.

    The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value.

    Parameters:
    literal - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • typedLit

    public static <T> Column typedLit(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$1)

    Creates a Column of literal value.

    An alias of typedlit; using typedlit directly is encouraged.

    Parameters:
    literal - (undocumented)
    evidence$1 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.2.0
  • typedlit

    public static <T> Column typedlit(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$2)

    Creates a Column of literal value.

    The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value. The difference between this function and lit(java.lang.Object) is that this function can handle parameterized scala types e.g.: List, Seq and Map.

    Parameters:
    literal - (undocumented)
    evidence$2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
    Note:
    typedlit will call expensive Scala reflection APIs. lit is preferred if parameterized Scala types are not used.
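    A minimal sketch of the difference from lit (df is hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // lit cannot encode parameterized Scala types; typedlit can.
    df.select(
      typedlit(Seq(1, 2, 3)).as("nums"),
      typedlit(Map("a" -> 1, "b" -> 2)).as("lookup"))
    ```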
  • asc

    Returns a sort expression based on ascending order of the column.

    
       df.sort(asc("dept"), desc("age"))
     
    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • asc_nulls_first

    public static Column asc_nulls_first(String columnName)

    Returns a sort expression based on ascending order of the column, and null values appear before non-null values.

    
       df.sort(asc_nulls_first("dept"), desc("age"))
     
    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • asc_nulls_last

    public static Column asc_nulls_last(String columnName)

    Returns a sort expression based on ascending order of the column, and null values appear after non-null values.

    
       df.sort(asc_nulls_last("dept"), desc("age"))
     
    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • desc

    Returns a sort expression based on the descending order of the column.

    
       df.sort(asc("dept"), desc("age"))
     
    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • desc_nulls_first

    public static Column desc_nulls_first(String columnName)

    Returns a sort expression based on the descending order of the column, and null values appear before non-null values.

    
       df.sort(asc("dept"), desc_nulls_first("age"))
     
    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • desc_nulls_last

    public static Column desc_nulls_last(String columnName)

    Returns a sort expression based on the descending order of the column, and null values appear after non-null values.

    
       df.sort(asc("dept"), desc_nulls_last("age"))
     
    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
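    A minimal sketch combining the orderings above (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Mixed orderings with explicit null placement.
    df.sort(asc_nulls_last("dept"), desc_nulls_first("age"))
    ```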
  • approxCountDistinct

    public static Column approxCountDistinct(Column e)

    Deprecated. Use approx_count_distinct instead.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • approxCountDistinct

    public static Column approxCountDistinct(String columnName)

    Deprecated. Use approx_count_distinct instead.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • approxCountDistinct

    public static Column approxCountDistinct(Column e, double rsd)

    Deprecated. Use approx_count_distinct instead.

    Parameters:
    e - (undocumented)
    rsd - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • approxCountDistinct

    public static Column approxCountDistinct(String columnName, double rsd)

    Deprecated. Use approx_count_distinct instead.

    Parameters:
    columnName - (undocumented)
    rsd - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • approx_count_distinct

    public static Column approx_count_distinct(Column e)

    Aggregate function: returns the approximate number of distinct items in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • approx_count_distinct

    public static Column approx_count_distinct(String columnName)

    Aggregate function: returns the approximate number of distinct items in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • approx_count_distinct

    public static Column approx_count_distinct(Column e, double rsd)

    Aggregate function: returns the approximate number of distinct items in a group.

    Parameters:
    e - (undocumented)
    rsd - maximum relative standard deviation allowed (default = 0.05)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • approx_count_distinct

    public static Column approx_count_distinct(String columnName, double rsd)

    Aggregate function: returns the approximate number of distinct items in a group.

    Parameters:
    columnName - (undocumented)
    rsd - maximum relative standard deviation allowed (default = 0.05)
    Returns:
    (undocumented)
    Since:
    2.1.0
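    A minimal sketch (df and the user_id column are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Tighter rsd gives a more accurate estimate at higher memory cost.
    df.select(approx_count_distinct(col("user_id"), 0.01).as("approx_users"))
    ```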
  • avg

    Aggregate function: returns the average of the values in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • avg

    Aggregate function: returns the average of the values in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • collect_list

    Aggregate function: returns a list of objects with duplicates.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
  • collect_list

    public static Column collect_list(String columnName)

    Aggregate function: returns a list of objects with duplicates.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
  • collect_set

    Aggregate function: returns a set of objects with duplicate elements eliminated.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
  • collect_set

    public static Column collect_set(String columnName)

    Aggregate function: returns a set of objects with duplicate elements eliminated.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
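    A minimal sketch of both collectors (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // Per-group lists (duplicates kept) and sets (duplicates removed).
    df.groupBy(col("dept")).agg(
      collect_list(col("name")).as("all_names"),
      collect_set(col("name")).as("distinct_names"))
    ```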
  • count_min_sketch

    Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

    Parameters:
    e - (undocumented)
    eps - (undocumented)
    confidence - (undocumented)
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • count_min_sketch

    Returns a count-min sketch of a column with the given eps and confidence. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

    Parameters:
    e - (undocumented)
    eps - (undocumented)
    confidence - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • corr

    Aggregate function: returns the Pearson Correlation Coefficient for two columns.

    Parameters:
    column1 - (undocumented)
    column2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • corr

    Aggregate function: returns the Pearson Correlation Coefficient for two columns.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • count

    Aggregate function: returns the number of items in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • count

    Aggregate function: returns the number of items in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • countDistinct

    public static Column countDistinct(Column expr, scala.collection.immutable.Seq<Column> exprs)

    Aggregate function: returns the number of distinct items in a group.

    An alias of count_distinct; using count_distinct directly is encouraged.

    Parameters:
    expr - (undocumented)
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • countDistinct

    public static Column countDistinct(String columnName, scala.collection.immutable.Seq<String> columnNames)

    Aggregate function: returns the number of distinct items in a group.

    An alias of count_distinct; using count_distinct directly is encouraged.

    Parameters:
    columnName - (undocumented)
    columnNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • count_distinct

    public static Column count_distinct(Column expr, scala.collection.immutable.Seq<Column> exprs)

    Aggregate function: returns the number of distinct items in a group.

    Parameters:
    expr - (undocumented)
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • covar_pop

    Aggregate function: returns the population covariance for two columns.

    Parameters:
    column1 - (undocumented)
    column2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • covar_pop

    Aggregate function: returns the population covariance for two columns.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • covar_samp

    Aggregate function: returns the sample covariance for two columns.

    Parameters:
    column1 - (undocumented)
    column2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • covar_samp

    Aggregate function: returns the sample covariance for two columns.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • first

    public static Column first(Column e, boolean ignoreNulls)

    Aggregate function: returns the first value in a group.

    The function by default returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    e - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • first

    public static Column first(String columnName, boolean ignoreNulls)

    Aggregate function: returns the first value of a column in a group.

    The function by default returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    columnName - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • first

    Aggregate function: returns the first value in a group.

    The function by default returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • first

    Aggregate function: returns the first value of a column in a group.

    The function by default returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • first_value

    Aggregate function: returns the first value in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • first_value

    Aggregate function: returns the first value in a group.

    The function by default returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    e - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
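    A minimal sketch of the ignoreNulls behavior described above (df and its columns are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // First non-null manager per department; order-sensitive, so results
    // can differ run to run unless the input order is pinned down.
    df.groupBy(col("dept")).agg(first(col("manager"), true).as("manager"))
    ```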
  • grouping

    Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated: returns 1 in the result set if the column is aggregated, 0 if it is not.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • grouping

    Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated: returns 1 in the result set if the column is aggregated, 0 if it is not.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • grouping_id

    public static Column grouping_id(scala.collection.immutable.Seq<Column> cols)

    Aggregate function: returns the level of grouping, equal to

    
       (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
     
    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).
  • grouping_id

    public static Column grouping_id(String colName, scala.collection.immutable.Seq<String> colNames)

    Aggregate function: returns the level of grouping, equal to

    
       (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
     
    Parameters:
    colName - (undocumented)
    colNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The list of columns must match the grouping columns exactly.
  • hll_sketch_agg

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with the lgConfigK argument.

    Parameters:
    e - (undocumented)
    lgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_sketch_agg

    public static Column hll_sketch_agg(Column e, int lgConfigK)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with the lgConfigK argument.

    Parameters:
    e - (undocumented)
    lgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_sketch_agg

    public static Column hll_sketch_agg(String columnName, int lgConfigK)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with the lgConfigK argument.

    Parameters:
    columnName - (undocumented)
    lgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_sketch_agg

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_sketch_agg

    public static Column hll_sketch_agg(String columnName)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union_agg

    public static Column hll_union_agg(Column e, Column allowDifferentLgConfigK)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.

    Parameters:
    e - (undocumented)
    allowDifferentLgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union_agg

    public static Column hll_union_agg(Column e, boolean allowDifferentLgConfigK)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.

    Parameters:
    e - (undocumented)
    allowDifferentLgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union_agg

    public static Column hll_union_agg(String columnName, boolean allowDifferentLgConfigK)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.

    Parameters:
    columnName - (undocumented)
    allowDifferentLgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union_agg

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union_agg

    public static Column hll_union_agg(String columnName)

    Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • kurtosis

    Aggregate function: returns the kurtosis of the values in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • kurtosis

    Aggregate function: returns the kurtosis of the values in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • last

    public static Column last(Column e, boolean ignoreNulls)

    Aggregate function: returns the last value in a group.

    The function by default returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    e - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • last

    public static Column last(String columnName, boolean ignoreNulls)

    Aggregate function: returns the last value of the column in a group.

    The function by default returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    columnName - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • last

    Aggregate function: returns the last value in a group.

    The function by default returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • last

    Aggregate function: returns the last value of the column in a group.

    The function by default returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
    Note:
    The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
  • last_value

    Aggregate function: returns the last value in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    Note:
    The function is non-deterministic because its results depend on the order of the rows, which may be non-deterministic after a shuffle.
  • last_value

    Aggregate function: returns the last value in a group.

    The function by default returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Parameters:
    e - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    Note:
    The function is non-deterministic because its results depend on the order of the rows, which may be non-deterministic after a shuffle.
  • make_time

    Creates a time from hour, minute, and second fields. Throws an error for invalid inputs.

    Parameters:
    hour - the hour to represent, from 0 to 23
    minute - the minute to represent, from 0 to 59
    second - the second to represent, from 0 to 59.999999
    Returns:
    (undocumented)
    Since:
    4.1.0
  • mode

    Aggregate function: returns the most frequent value in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • mode

    public static Column mode(Column e, boolean deterministic)

    Aggregate function: returns the most frequent value in a group.

    When multiple values share the greatest frequency, an arbitrary one of them is returned if deterministic is false (the default), and the lowest value is returned if deterministic is true.

    Parameters:
    e - (undocumented)
    deterministic - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
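The tie-breaking rule above can be illustrated with a plain-Python sketch (mode_agg is a hypothetical helper, not a Spark API; like the aggregate, it ignores nulls):

```python
from collections import Counter

def mode_agg(values, deterministic=False):
    """Mimic Spark's mode(): most frequent non-null value; with
    deterministic=True, break frequency ties by the lowest value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    best = max(counts.values())
    candidates = [v for v, c in counts.items() if c == best]
    return min(candidates) if deterministic else candidates[0]

print(mode_agg([2, 1, 1, 2, 3], deterministic=True))  # 1 (1 and 2 tie; lowest wins)
```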
  • max

    Aggregate function: returns the maximum value of the expression in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • max

    Aggregate function: returns the maximum value of the column in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • max_by

    Aggregate function: returns the value associated with the maximum value of ord.

    Parameters:
    e - (undocumented)
    ord - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
    Note:
    The function is non-deterministic: when multiple rows share the same value of ord, any of their associated values of e may be returned.
  • max_by

    Aggregate function: returns an array of values associated with the top k values of ord.

    The result array contains values in descending order by their associated ordering values. Returns null if there are no non-null ordering values.

    Parameters:
    e - (undocumented)
    ord - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows, which may be non-deterministic after a shuffle when there are ties in the ordering expression. The maximum value of k is 100000.
  • max_by

    Aggregate function: returns an array of values associated with the top k values of ord.

    The result array contains values in descending order by their associated ordering values. Returns null if there are no non-null ordering values.

    Parameters:
    e - (undocumented)
    ord - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows, which may be non-deterministic after a shuffle when there are ties in the ordering expression. The maximum value of k is 100000.
  • mean

    Aggregate function: returns the average of the values in a group. Alias for avg.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • mean

    Aggregate function: returns the average of the values in a group. Alias for avg.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • median

    Aggregate function: returns the median of the values in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • min

    Aggregate function: returns the minimum value of the expression in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • min

    Aggregate function: returns the minimum value of the column in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • min_by

    Aggregate function: returns the value associated with the minimum value of ord.

    Parameters:
    e - (undocumented)
    ord - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
    Note:
    The function is non-deterministic: when multiple rows share the same value of ord, any of their associated values of e may be returned.
  • min_by

    Aggregate function: returns an array of values associated with the bottom k values of ord.

    The result array contains values in ascending order by their associated ordering values. Returns null if there are no non-null ordering values.

    Parameters:
    e - (undocumented)
    ord - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows, which may be non-deterministic after a shuffle when there are ties in the ordering expression. The maximum value of k is 100000.
  • min_by

    Aggregate function: returns an array of values associated with the bottom k values of ord.

    The result array contains values in ascending order by their associated ordering values. Returns null if there are no non-null ordering values.

    Parameters:
    e - (undocumented)
    ord - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows, which may be non-deterministic after a shuffle when there are ties in the ordering expression. The maximum value of k is 100000.
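The top-k variants of max_by and min_by above both order the result array by the ordering value and drop null orderings. A plain-Python sketch of the max_by form (max_by_k is a hypothetical name, not a Spark API):

```python
def max_by_k(pairs, k):
    """Mimic top-k max_by(e, ord, k): values of e in descending
    order of ord, skipping null ordering values; None if none."""
    ranked = sorted((p for p in pairs if p[1] is not None),
                    key=lambda p: p[1], reverse=True)
    return [e for e, _ in ranked[:k]] or None

pairs = [("a", 3), ("b", None), ("c", 9), ("d", 5)]
print(max_by_k(pairs, 2))  # ['c', 'd']
```

The min_by form is identical with ascending order (reverse=False).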
  • percentile

    Aggregate function: returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].

    Parameters:
    e - (undocumented)
    percentage - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • percentile

    Aggregate function: returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].

    Parameters:
    e - (undocumented)
    percentage - (undocumented)
    frequency - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • percentile_approx

    Aggregate function: returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value.

    If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.

    The accuracy parameter is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation.

    Parameters:
    e - (undocumented)
    percentage - (undocumented)
    accuracy - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
  • approx_percentile

    Aggregate function: returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value.

    If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.

    The accuracy parameter is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation.

    Parameters:
    e - (undocumented)
    percentage - (undocumented)
    accuracy - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
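One plain reading of the defining property above (the smallest element v such that at least the given fraction of values falls at or below v) can be computed exactly in plain Python; the Spark function only approximates this within 1.0/accuracy relative error. exact_percentile is a hypothetical illustration, not a Spark API:

```python
def exact_percentile(values, percentage):
    """Smallest element v of the non-null values such that at least
    `percentage` of the values are <= v (exact version of the
    property approx_percentile approximates)."""
    xs = sorted(v for v in values if v is not None)
    for i, v in enumerate(xs, start=1):
        if i / len(xs) >= percentage:
            return v
    return xs[-1]

print(exact_percentile(range(1, 11), 0.5))  # 5
```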
  • product

    Aggregate function: returns the product of all numerical elements in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • skewness

    Aggregate function: returns the skewness of the values in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • skewness

    Aggregate function: returns the skewness of the values in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • std

    Aggregate function: alias for stddev_samp.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • stddev

    Aggregate function: alias for stddev_samp.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • stddev

    Aggregate function: alias for stddev_samp.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • stddev_samp

    Aggregate function: returns the sample standard deviation of the expression in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • stddev_samp

    public static Column stddev_samp(String columnName)

    Aggregate function: returns the sample standard deviation of the expression in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • stddev_pop

    Aggregate function: returns the population standard deviation of the expression in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • stddev_pop

    public static Column stddev_pop(String columnName)

    Aggregate function: returns the population standard deviation of the expression in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • sum

    Aggregate function: returns the sum of all values in the expression.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • sum

    Aggregate function: returns the sum of all values in the given column.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • sumDistinct

    Aggregate function: returns the sum of distinct values in the expression.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • sumDistinct

    public static Column sumDistinct(String columnName)

    Aggregate function: returns the sum of distinct values in the expression.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • sum_distinct

    Aggregate function: returns the sum of distinct values in the expression.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • theta_intersection_agg

    public static Column theta_intersection_agg(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by intersecting the Datasketches ThetaSketch instances in the input column via a Datasketches Intersection instance.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_intersection_agg

    public static Column theta_intersection_agg(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by intersecting the Datasketches ThetaSketch instances in the input column via a Datasketches Intersection instance.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_agg

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch built with the values in the input column and configured with the lgNomEntries nominal entries.

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_agg

    public static Column theta_sketch_agg(Column e, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch built with the values in the input column and configured with the lgNomEntries nominal entries.

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_agg

    public static Column theta_sketch_agg(String columnName, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch built with the values in the input column and configured with the lgNomEntries nominal entries.

    Parameters:
    columnName - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_agg

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch built with the values in the input column and configured with the default value of 12 for lgNomEntries.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_agg

    public static Column theta_sketch_agg(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch built with the values in the input column and configured with the default value of 12 for lgNomEntries.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union_agg

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by the union of Datasketches ThetaSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries, the log-base-2 number of nominal entries, for the union buffer.

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union_agg

    public static Column theta_union_agg(Column e, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by the union of Datasketches ThetaSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries, the log-base-2 number of nominal entries, for the union buffer.

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union_agg

    public static Column theta_union_agg(String columnName, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by the union of Datasketches ThetaSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries, the log-base-2 number of nominal entries, for the union buffer.

    Parameters:
    columnName - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union_agg

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by the union of Datasketches ThetaSketch instances in the input column via a Datasketches Union instance. It is configured with the default value of 12 for lgNomEntries.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union_agg

    public static Column theta_union_agg(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches ThetaSketch, generated by the union of Datasketches ThetaSketch instances in the input column via a Datasketches Union instance. It is configured with the default value of 12 for lgNomEntries.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
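ThetaSketch binaries approximate sets of distinct values, and the sketch/union/intersection aggregates above behave, semantically, like building, unioning, and intersecting those sets. A plain-Python analogy using exact sets in place of sketches (no DataSketches dependency; this deliberately ignores the approximation and the lgNomEntries sizing):

```python
# Exact-set stand-ins for theta_sketch_agg / theta_union_agg /
# theta_intersection_agg. Real sketches are approximate and bounded
# in size by lgNomEntries, but the aggregate semantics match.
def sketch_agg(values):
    return set(values)

def union_agg(sketches):
    out = set()
    for s in sketches:
        out |= s
    return out

def intersection_agg(sketches):
    it = iter(sketches)
    out = set(next(it))
    for s in it:
        out &= s
    return out

a = sketch_agg([1, 2, 3])
b = sketch_agg([2, 3, 4])
print(len(union_agg([a, b])))         # 4 distinct values
print(len(intersection_agg([a, b])))  # 2 distinct values
```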
  • tuple_intersection_agg_double

    public static Column tuple_intersection_agg_double(Column e, Column mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_double

    public static Column tuple_intersection_agg_double(Column e, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_double

    public static Column tuple_intersection_agg_double(String columnName, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    columnName - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_double

    public static Column tuple_intersection_agg_double(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. It is configured with the default mode of 'sum'.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_double

    public static Column tuple_intersection_agg_double(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. It is configured with the default mode of 'sum'.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_integer

    public static Column tuple_intersection_agg_integer(Column e, Column mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_integer

    public static Column tuple_intersection_agg_integer(Column e, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_integer

    public static Column tuple_intersection_agg_integer(String columnName, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    columnName - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_integer

    public static Column tuple_intersection_agg_integer(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. It is configured with the default mode of 'sum'.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_agg_integer

    public static Column tuple_intersection_agg_integer(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by intersecting the Datasketches TupleSketch instances in the input column via a Datasketches Intersection instance. It is configured with the default mode of 'sum'.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries and aggregation mode. The mode parameter specifies the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    public static Column tuple_sketch_agg_double(Column key, Column summary, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries and aggregation mode. The mode parameter specifies the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    public static Column tuple_sketch_agg_double(String keyColumnName, String summaryColumnName, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries and aggregation mode. The mode parameter specifies the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    keyColumnName - (undocumented)
    summaryColumnName - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    public static Column tuple_sketch_agg_double(Column key, Column summary, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries. It uses the default mode of 'sum'.

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    public static Column tuple_sketch_agg_double(String keyColumnName, String summaryColumnName, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries. It uses the default mode of 'sum'.

    Parameters:
    keyColumnName - (undocumented)
    summaryColumnName - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns. It uses the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_double

    public static Column tuple_sketch_agg_double(String keyColumnName, String summaryColumnName)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary built with the key and summary values in the input columns. It uses the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    keyColumnName - (undocumented)
    summaryColumnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries and aggregation mode. The mode parameter specifies the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    public static Column tuple_sketch_agg_integer(Column key, Column summary, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries and aggregation mode. The mode parameter specifies the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    public static Column tuple_sketch_agg_integer(String keyColumnName, String summaryColumnName, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries and aggregation mode. The mode parameter specifies the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    keyColumnName - (undocumented)
    summaryColumnName - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    public static Column tuple_sketch_agg_integer(Column key, Column summary, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries. It uses the default mode of 'sum'.

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    public static Column tuple_sketch_agg_integer(String keyColumnName, String summaryColumnName, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns and configured with the lgNomEntries nominal entries. It uses the default mode of 'sum'.

    Parameters:
    keyColumnName - (undocumented)
    summaryColumnName - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    public static Column tuple_sketch_agg_integer(Column key, Column summary)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns. It uses the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    key - (undocumented)
    summary - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_agg_integer

    public static Column tuple_sketch_agg_integer(String keyColumnName, String summaryColumnName)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary built with the key and summary values in the input columns. It uses the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    keyColumnName - (undocumented)
    summaryColumnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
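The key/summary/mode interaction described for the tuple_sketch_agg_* family can be mimicked with a plain dict holding one summary per distinct key (tuple_sketch_agg is a hypothetical stand-in; real tuple sketches are approximate and bounded by lgNomEntries):

```python
def tuple_sketch_agg(keys, summaries, mode="sum"):
    """Exact-dict analogy of tuple_sketch_agg_*: keep one summary
    per distinct key, combining duplicates per the aggregation
    mode (sum, min, max, alwaysone)."""
    combine = {"sum": lambda a, b: a + b,
               "min": min,
               "max": max,
               "alwaysone": lambda a, b: 1}[mode]
    out = {}
    for k, s in zip(keys, summaries):
        if k in out:
            out[k] = combine(out[k], s)
        else:
            out[k] = 1 if mode == "alwaysone" else s
    return out

print(tuple_sketch_agg(["x", "y", "x"], [2, 5, 3]))         # {'x': 5, 'y': 5}
print(tuple_sketch_agg(["x", "y", "x"], [2, 5, 3], "max"))  # {'x': 3, 'y': 5}
```

The union and intersection aggregates then merge these per-key summaries across sketches under the same mode.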
  • tuple_union_agg_double

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries, the log-base-2 number of nominal entries for the union buffer, and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_double

    public static Column tuple_union_agg_double(Column e, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries, the log-base-2 number of nominal entries for the union buffer, and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_double

    public static Column tuple_union_agg_double(String columnName, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries, the log-base-2 number of nominal entries for the union buffer, and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    columnName - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_double

    public static Column tuple_union_agg_double(Column e, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and uses the default mode of 'sum'.

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_double

    public static Column tuple_union_agg_double(String columnName, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and uses the default mode of 'sum'.

    Parameters:
    columnName - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_double

    public static Column tuple_union_agg_double(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_double

    public static Column tuple_union_agg_double(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with a double type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    public static Column tuple_union_agg_integer(Column e, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    public static Column tuple_union_agg_integer(String columnName, int lgNomEntries, String mode)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    columnName - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    public static Column tuple_union_agg_integer(Column e, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and uses the default mode of 'sum'.

    Parameters:
    e - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    public static Column tuple_union_agg_integer(String columnName, int lgNomEntries)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It allows configuring lgNomEntries (the log2 of nominal entries) for the union buffer and uses the default mode of 'sum'.

    Parameters:
    columnName - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    public static Column tuple_union_agg_integer(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_agg_integer

    public static Column tuple_union_agg_integer(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches TupleSketch with an integer type summary, generated by the union of Datasketches TupleSketch instances in the input column via a Datasketches Union instance. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
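The union semantics shared by the tuple_union_agg_* overloads above can be modeled in a few lines. The following is a pure-Python sketch of those semantics, not Spark or Datasketches code; `tuple_union` is a hypothetical helper, and real sketches additionally bound the number of retained keys via lgNomEntries:

```python
# Model of Tuple sketch union semantics: distinct keys are kept, and their
# numeric summaries are combined according to the aggregation mode.

def tuple_union(sketches, mode="sum"):
    """Union key -> summary mappings, combining summaries per `mode`."""
    combiners = {"sum": lambda a, b: a + b, "min": min, "max": max}
    merged = {}
    for sketch in sketches:
        for key, summary in sketch.items():
            if mode == "alwaysone":
                merged[key] = 1  # the summary is forced to 1 for every key
            elif key in merged:
                merged[key] = combiners[mode](merged[key], summary)
            else:
                merged[key] = summary
    return merged
```

With mode 'sum' (the default), `tuple_union([{"a": 2}, {"a": 3, "b": 1}])` yields one entry per distinct key with summed summaries.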
  • kll_sketch_agg_bigint

    Aggregate function: returns the compact binary representation of the Datasketches KllLongsSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_bigint

    public static Column kll_sketch_agg_bigint(Column e, int k)

    Aggregate function: returns the compact binary representation of the Datasketches KllLongsSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_bigint

    public static Column kll_sketch_agg_bigint(String columnName, int k)

    Aggregate function: returns the compact binary representation of the Datasketches KllLongsSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    columnName - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_bigint

    public static Column kll_sketch_agg_bigint(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches KllLongsSketch built with the values in the input column, using the default k value of 200.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_bigint

    public static Column kll_sketch_agg_bigint(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches KllLongsSketch built with the values in the input column, using the default k value of 200.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_float

    Aggregate function: returns the compact binary representation of the Datasketches KllFloatsSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_float

    public static Column kll_sketch_agg_float(Column e, int k)

    Aggregate function: returns the compact binary representation of the Datasketches KllFloatsSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_float

    public static Column kll_sketch_agg_float(String columnName, int k)

    Aggregate function: returns the compact binary representation of the Datasketches KllFloatsSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    columnName - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_float

    public static Column kll_sketch_agg_float(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches KllFloatsSketch built with the values in the input column, using the default k value of 200.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_float

    public static Column kll_sketch_agg_float(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches KllFloatsSketch built with the values in the input column, using the default k value of 200.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_double

    Aggregate function: returns the compact binary representation of the Datasketches KllDoublesSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_double

    public static Column kll_sketch_agg_double(Column e, int k)

    Aggregate function: returns the compact binary representation of the Datasketches KllDoublesSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_double

    public static Column kll_sketch_agg_double(String columnName, int k)

    Aggregate function: returns the compact binary representation of the Datasketches KllDoublesSketch built with the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).

    Parameters:
    columnName - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_double

    public static Column kll_sketch_agg_double(Column e)

    Aggregate function: returns the compact binary representation of the Datasketches KllDoublesSketch built with the values in the input column, using the default k value of 200.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_agg_double

    public static Column kll_sketch_agg_double(String columnName)

    Aggregate function: returns the compact binary representation of the Datasketches KllDoublesSketch built with the values in the input column, using the default k value of 200.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
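All the kll_sketch_agg_* variants above build a quantile sketch over the input column. As a point of reference, the query a KLL sketch answers is shown below computed exactly in pure Python (the function name and rank convention are illustrative, not part of the Spark API; the sketch returns an approximation of this, with rank error shrinking as k grows):

```python
# Exact quantile by full sort: a KLL sketch answers the same rank/quantile
# query from a compact summary instead of the raw data.

def exact_quantile(values, q):
    """Return the value at rank fraction q (0 <= q <= 1) of the input."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]
```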
  • kll_merge_agg_bigint

    Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_bigint

    public static Column kll_merge_agg_bigint(Column e, int k)

    Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_bigint

    public static Column kll_merge_agg_bigint(String columnName, int k)

    Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    columnName - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_bigint

    public static Column kll_merge_agg_bigint(Column e)

    Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch. If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_bigint

    public static Column kll_merge_agg_bigint(String columnName)

    Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch. If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_float

    Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_float

    public static Column kll_merge_agg_float(Column e, int k)

    Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_float

    public static Column kll_merge_agg_float(String columnName, int k)

    Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    columnName - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_float

    public static Column kll_merge_agg_float(Column e)

    Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch. If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_float

    public static Column kll_merge_agg_float(String columnName)

    Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch. If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_double

    Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_double

    public static Column kll_merge_agg_double(Column e, int k)

    Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_double

    public static Column kll_merge_agg_double(String columnName, int k)

    Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    columnName - (undocumented)
    k - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_double

    public static Column kll_merge_agg_double(Column e)

    Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch. If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_merge_agg_double

    public static Column kll_merge_agg_double(String columnName)

    Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch. If k is not specified, the merged sketch adopts the k value from the first input sketch.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
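The k-adoption contract shared by the kll_merge_agg_* overloads can be summarized with a toy model. This is not the real sketch (a real merge compresses the data); each "sketch" below is just a `(k, values)` pair, and `kll_merge` is a hypothetical helper:

```python
# Toy model of the KLL merge contract: merging combines the inputs, and
# when no explicit k is given the result adopts k from the first sketch.

def kll_merge(sketches, k=None):
    """Merge (k, values) pairs into one (k, values) pair."""
    merged_k = k if k is not None else sketches[0][0]
    merged_values = [v for _, values in sketches for v in values]
    return (merged_k, merged_values)
```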
  • listagg

    Aggregate function: returns the concatenation of non-null input values.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • listagg

    Aggregate function: returns the concatenation of non-null input values, separated by the delimiter.

    Parameters:
    e - (undocumented)
    delimiter - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • listagg_distinct

    Aggregate function: returns the concatenation of distinct non-null input values.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • listagg_distinct

    Aggregate function: returns the concatenation of distinct non-null input values, separated by the delimiter.

    Parameters:
    e - (undocumented)
    delimiter - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • string_agg

    Aggregate function: returns the concatenation of non-null input values. Alias for listagg.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • string_agg

    Aggregate function: returns the concatenation of non-null input values, separated by the delimiter. Alias for listagg.

    Parameters:
    e - (undocumented)
    delimiter - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • string_agg_distinct

    public static Column string_agg_distinct(Column e)

    Aggregate function: returns the concatenation of distinct non-null input values. Alias for listagg.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • string_agg_distinct

    Aggregate function: returns the concatenation of distinct non-null input values, separated by the delimiter. Alias for listagg.

    Parameters:
    e - (undocumented)
    delimiter - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
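The four concatenation aggregates above differ only in deduplication and delimiter handling. A pure-Python model of their semantics (not Spark code; note that Spark does not guarantee the concatenation order unless an ordering is specified, whereas this model simply uses input order):

```python
# listagg / string_agg: drop nulls, then concatenate; the _distinct
# variants deduplicate (keeping first occurrence) before concatenating.

def listagg(values, delimiter=""):
    return delimiter.join(v for v in values if v is not None)

def listagg_distinct(values, delimiter=""):
    seen, out = set(), []
    for v in values:
        if v is not None and v not in seen:
            seen.add(v)
            out.append(v)
    return delimiter.join(out)
```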
  • variance

    Aggregate function: alias for var_samp.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • variance

    Aggregate function: alias for var_samp.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • var_samp

    Aggregate function: returns the unbiased variance of the values in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • var_samp

    Aggregate function: returns the unbiased variance of the values in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • var_pop

    Aggregate function: returns the population variance of the values in a group.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • var_pop

    Aggregate function: returns the population variance of the values in a group.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
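The difference between the variance aggregates above is only the divisor: var_samp (and its alias variance) divides by n-1 for an unbiased estimate, while var_pop divides by n. Computed directly from the definitions in pure Python (not Spark code):

```python
# var_samp: unbiased sample variance (divides by n-1).
# var_pop: population variance (divides by n).

def var_samp(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def var_pop(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n
```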
  • regr_avgx

    Aggregate function: returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_avgy

    Aggregate function: returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_count

    Aggregate function: returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_intercept

    Aggregate function: returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_r2

    Aggregate function: returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_slope

    Aggregate function: returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_sxx

    Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_sxy

    Aggregate function: returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regr_syy

    Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

    Parameters:
    y - (undocumented)
    x - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
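The regr_* family above is a set of standard least-squares aggregates over non-null (y, x) pairs. A pure-Python sketch computing them from their definitions (`regr_stats` is a hypothetical helper, not a Spark function; note regr_sxx equals REGR_COUNT * VAR_POP(x), which simplifies to the raw sum of squared deviations):

```python
# regr_* aggregates from their definitions:
#   slope = COVAR_POP(y, x) / VAR_POP(x), intercept = AVG(y) - slope * AVG(x)

def regr_stats(pairs):
    pairs = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(pairs)                                        # regr_count
    avgy = sum(y for y, _ in pairs) / n                   # regr_avgy
    avgx = sum(x for _, x in pairs) / n                   # regr_avgx
    sxx = sum((x - avgx) ** 2 for _, x in pairs)          # regr_sxx
    syy = sum((y - avgy) ** 2 for y, _ in pairs)          # regr_syy
    sxy = sum((y - avgy) * (x - avgx) for y, x in pairs)  # regr_sxy
    slope = sxy / sxx                                     # regr_slope
    intercept = avgy - slope * avgx                       # regr_intercept
    r2 = sxy * sxy / (sxx * syy)                          # regr_r2
    return {"count": n, "avgx": avgx, "avgy": avgy, "sxx": sxx, "syy": syy,
            "sxy": sxy, "slope": slope, "intercept": intercept, "r2": r2}
```

For the perfect line y = 2x + 1 this recovers slope 2, intercept 1, and r2 of 1.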
  • any_value

    Aggregate function: returns some value of e for a group of rows.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • any_value

    Aggregate function: returns some value of e for a group of rows. If ignoreNulls is true, returns only non-null values.

    Parameters:
    e - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • count_if

    Aggregate function: returns the number of TRUE values for the expression.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
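The semantics of any_value and count_if can be modeled in a few lines of pure Python (not Spark code; `any_value` here returns the first qualifying value, whereas Spark may return any value from the group):

```python
# any_value: some value from the group, skipping nulls when requested.
# count_if: the number of rows where the boolean expression is TRUE.

def any_value(values, ignore_nulls=False):
    for v in values:
        if not ignore_nulls or v is not None:
            return v
    return None

def count_if(conditions):
    return sum(1 for c in conditions if c)  # None and False are not counted
```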
  • current_time

    public static Column current_time()

    Returns the current time at the start of query evaluation. Note that the result will contain 6 fractional digits of seconds.

    Returns:
    A time.
    Since:
    4.1.0
  • current_time

    public static Column current_time(int precision)

    Returns the current time at the start of query evaluation.

    Parameters:
    precision - An integer literal in the range [0..6], indicating how many fractional digits of seconds to include in the result.
    Returns:
    A time.
    Since:
    4.1.0
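The precision argument keeps only that many fractional digits of seconds. A pure-Python model of the truncation (`truncate_time` is an illustrative helper, not the Spark API; in Spark, current_time itself is evaluated once at the start of the query):

```python
from datetime import time

def truncate_time(t, precision):
    """Keep `precision` fractional digits of seconds, zeroing the rest."""
    assert 0 <= precision <= 6
    keep = 10 ** (6 - precision)
    return t.replace(microsecond=(t.microsecond // keep) * keep)
```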
  • histogram_numeric

    Aggregate function: computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets.

    Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice it is comparable to the histograms produced by the R/S-Plus statistical computing packages. The output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function.

    Parameters:
    e - (undocumented)
    nBins - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
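The non-uniform bins described above come from a streaming centroid-merging scheme. The following is a minimal pure-Python sketch of that general idea, not Spark's actual implementation (which differs in merge heuristics and performance): each value enters as its own (center, count) bin, and the two closest bins are merged whenever the limit nb is exceeded.

```python
def histogram_numeric(values, nb):
    """Minimal centroid-merging histogram: at most nb (x, y) bins."""
    bins = []  # sorted list of [center, count]
    for v in values:
        bins.append([v, 1])
        bins.sort(key=lambda b: b[0])
        while len(bins) > nb:
            # merge the adjacent pair of bins with the smallest gap
            i = min(range(len(bins) - 1),
                    key=lambda j: bins[j + 1][0] - bins[j][0])
            (x1, y1), (x2, y2) = bins[i], bins[i + 1]
            bins[i:i + 2] = [[(x1 * y1 + x2 * y2) / (y1 + y2), y1 + y2]]
    return [(x, y) for x, y in bins]
```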
  • every

    Aggregate function: returns true if all values of e are true.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bool_and

    Aggregate function: returns true if all values of e are true.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • some

    Aggregate function: returns true if at least one value of e is true.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • any

    Aggregate function: returns true if at least one value of e is true.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bool_or

    Aggregate function: returns true if at least one value of e is true.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bit_and

    Aggregate function: returns the bitwise AND of all non-null input values, or null if none.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bit_or

    Aggregate function: returns the bitwise OR of all non-null input values, or null if none.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bit_xor

    Aggregate function: returns the bitwise XOR of all non-null input values, or null if none.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
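The boolean and bitwise aggregates above (every/bool_and, some/any/bool_or, bit_and, bit_or, bit_xor) all reduce the non-null inputs with a single binary operator, returning null when there are no inputs. A pure-Python model (`bit_agg` is a hypothetical helper, not a Spark function):

```python
from functools import reduce
import operator

def bit_agg(values, op):
    """Reduce non-null inputs with op; None when there are no inputs."""
    values = [v for v in values if v is not None]
    return reduce(op, values) if values else None
```

For example, `bit_agg(bools, operator.and_)` models bool_and, and `bit_agg(ints, operator.xor)` models bit_xor.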
  • cume_dist

    public static Column cume_dist()

    Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.

    
       N = total number of rows in the partition
       cumeDist(x) = number of values before (and including) x / N
     
    Returns:
    (undocumented)
    Since:
    1.6.0
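The cumeDist formula above translates directly into pure Python (a model of the semantics, not Spark code; ties all receive the same value because "before and including" counts every row at or below the current value):

```python
# cume_dist per row: fraction of partition rows with a value <= this row's.

def cume_dist(values):
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]
```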
  • dense_rank

    public static Column dense_rank()

    Window function: returns the rank of rows within a window partition, without any gaps.

    The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. By contrast, rank assigns sequential numbers, so the person who came in after the ties would register as coming in fifth.

    This is equivalent to the DENSE_RANK function in SQL.

    Returns:
    (undocumented)
    Since:
    1.6.0
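The tie-handling contrast described above can be shown concretely. A pure-Python model over an already-ordered partition (not Spark code; `rank` and `dense_rank` here are illustrative list-based helpers):

```python
# rank leaves gaps after ties; dense_rank does not.

def rank(values):
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]   # rank of first tied row

def dense_rank(values):
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]
```

With three rows tied for second, rank jumps to 5 afterwards while dense_rank continues with 3, matching the competition example above.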
  • lag

    Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lag

    public static Column lag(String columnName, int offset)

    Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Parameters:
    columnName - (undocumented)
    offset - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lag

    public static Column lag(String columnName, int offset, Object defaultValue)

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Parameters:
    columnName - (undocumented)
    offset - (undocumented)
    defaultValue - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lag

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    defaultValue - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lag

    public static Column lag(Column e, int offset, Object defaultValue, boolean ignoreNulls)

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. ignoreNulls determines whether null values of the input are included in or eliminated from the calculation. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    defaultValue - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • lead

    public static Column lead(String columnName, int offset)

    Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Parameters:
    columnName - (undocumented)
    offset - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lead

    public static Column lead(Column e, int offset)

    Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lead

    public static Column lead(String columnName, int offset, Object defaultValue)

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Parameters:
    columnName - (undocumented)
    offset - (undocumented)
    defaultValue - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lead

    public static Column lead(Column e, int offset, Object defaultValue)

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    defaultValue - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • lead

    public static Column lead(Column e, int offset, Object defaultValue, boolean ignoreNulls)

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. ignoreNulls determines whether rows with null values are included in or eliminated from the calculation. The default value of ignoreNulls is false. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    defaultValue - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
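    The effect of ignoreNulls can be sketched in plain Python (an illustration of the documented behavior, not Spark itself): with ignore_nulls set, the offset counts only non-null rows after the current one.

    ```python
    def lead(rows, offset, default=None, ignore_nulls=False):
        out = []
        for i in range(len(rows)):
            if ignore_nulls:
                # count only non-null rows strictly after the current row
                later = [v for v in rows[i + 1:] if v is not None]
                out.append(later[offset - 1] if len(later) >= offset else default)
            else:
                j = i + offset
                out.append(rows[j] if j < len(rows) else default)
        return out

    rows = [1, None, 3, None, 5]
    print(lead(rows, 1))                     # [None, 3, None, 5, None]
    print(lead(rows, 1, ignore_nulls=True))  # [3, 3, 5, 5, None]
    ```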
  • nth_value

    public static Column nth_value(Column e, int offset, boolean ignoreNulls)

    Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.

    When ignoreNulls is set to true, it returns the offset-th non-null value it sees. If all values are null, then null is returned.

    This is equivalent to the nth_value function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    ignoreNulls - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
  • nth_value

    public static Column nth_value(Column e, int offset)

    Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.

    This is equivalent to the nth_value function in SQL.

    Parameters:
    e - (undocumented)
    offset - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
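    The nth_value rule can be sketched in plain Python over one window frame (an illustration of the documented behavior, not Spark itself):

    ```python
    def nth_value(frame, offset, ignore_nulls=False):
        # offset counts from 1; with ignore_nulls, nulls are skipped entirely
        vals = [v for v in frame if v is not None] if ignore_nulls else frame
        return vals[offset - 1] if len(vals) >= offset else None

    frame = [None, "a", "b"]
    print(nth_value(frame, 2))        # 'a'
    print(nth_value(frame, 2, True))  # 'b'
    print(nth_value(frame, 5))        # None (frame smaller than offset)
    ```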
  • ntile

    public static Column ntile(int n)

    Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.

    This is equivalent to the NTILE function in SQL.

    Parameters:
    n - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
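    The bucket assignment can be sketched in plain Python (an illustration, not Spark itself; it assumes the standard NTILE rule that any remainder rows go to the earlier buckets):

    ```python
    def ntile_ids(num_rows, n):
        # split num_rows ordered rows into n buckets as evenly as possible
        base, extra = divmod(num_rows, n)
        ids = []
        for bucket in range(1, n + 1):
            size = base + (1 if bucket <= extra else 0)
            ids.extend([bucket] * size)
        return ids

    print(ntile_ids(8, 4))  # [1, 1, 2, 2, 3, 3, 4, 4]
    print(ntile_ids(5, 4))  # [1, 1, 2, 3, 4]
    ```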
  • percent_rank

    public static Column percent_rank()

    Window function: returns the relative rank (i.e. percentile) of rows within a window partition.

    This is computed by:

    
       (rank of row in its partition - 1) / (number of rows in the partition - 1)
     

    This is equivalent to the PERCENT_RANK function in SQL.

    Returns:
    (undocumented)
    Since:
    1.6.0
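    The formula above can be checked in plain Python (an illustration, not Spark itself), using the RANK() convention that tied rows share the rank of their first member:

    ```python
    def percent_rank(values):
        n = len(values)
        # rank = 1 + number of rows strictly smaller; ties share a rank
        ranks = [1 + sum(w < v for w in values) for v in values]
        return [(r - 1) / (n - 1) for r in ranks]

    print(percent_rank([10, 20, 20, 30]))  # [0.0, 0.333..., 0.333..., 1.0]
    ```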
  • rank

    public static Column rank()

    Window function: returns the rank of rows within a window partition.

    The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, in contrast, would assign sequential numbers, so the person that came in third place (after the ties) would register as coming in fifth.

    This is equivalent to the RANK function in SQL.

    Returns:
    (undocumented)
    Since:
    1.4.0
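    The competition example above can be sketched in plain Python (an illustration, not Spark itself), ranking from best score to worst:

    ```python
    def rank(scores):
        # ties share a rank; the next distinct value gets a gapped rank
        return [1 + sum(s > x for s in scores) for x in scores]

    def dense_rank(scores):
        # ties share a rank; ranks stay consecutive with no gaps
        order = sorted(set(scores), reverse=True)
        return [1 + order.index(x) for x in scores]

    scores = [100, 90, 90, 90, 80]  # three people tie for second place
    print(rank(scores))        # [1, 2, 2, 2, 5]
    print(dense_rank(scores))  # [1, 2, 2, 2, 3]
    ```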
  • row_number

    public static Column row_number()

    Window function: returns a sequential number starting at 1 within a window partition.

    Returns:
    (undocumented)
    Since:
    1.6.0
  • array

    public static Column array(scala.collection.immutable.Seq<Column> cols)

    Creates a new array column. The input columns must all have the same data type.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • array

    public static Column array(String colName, scala.collection.immutable.Seq<String> colNames)

    Creates a new array column. The input columns must all have the same data type.

    Parameters:
    colName - (undocumented)
    colNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • map

    public static Column map(scala.collection.immutable.Seq<Column> cols)

    Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0
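    The key/value pairing rule can be sketched in plain Python (a hypothetical helper illustrating the contract, not Spark's implementation):

    ```python
    def make_map(*cols):
        # arguments alternate key1, value1, key2, value2, ...
        if len(cols) % 2 != 0:
            raise ValueError("columns must come in key/value pairs")
        keys, vals = cols[0::2], cols[1::2]
        if any(k is None for k in keys):
            raise ValueError("map keys cannot be null")
        return dict(zip(keys, vals))

    print(make_map("a", 1, "b", 2))  # {'a': 1, 'b': 2}
    ```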
  • named_struct

    public static Column named_struct(scala.collection.immutable.Seq<Column> cols)

    Creates a struct with the given field names and values.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • map_from_arrays

    Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. Elements in the key array must not be null.

    Parameters:
    keys - (undocumented)
    values - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4
  • str_to_map

    Creates a map after splitting the text into key/value pairs using delimiters. Both pairDelim and keyValueDelim are treated as regular expressions.

    Parameters:
    text - (undocumented)
    pairDelim - (undocumented)
    keyValueDelim - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • str_to_map

    Creates a map after splitting the text into key/value pairs using delimiters. pairDelim is treated as a regular expression.

    Parameters:
    text - (undocumented)
    pairDelim - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • str_to_map

    Creates a map after splitting the text into key/value pairs using delimiters.

    Parameters:
    text - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
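    The splitting behavior can be sketched in plain Python (an illustration, not Spark itself; the defaults of ',' for pairDelim and ':' for keyValueDelim mirror the SQL function's documented defaults):

    ```python
    import re

    def str_to_map(text, pair_delim=",", kv_delim=":"):
        # both delimiters are applied as regular expressions
        pairs = re.split(pair_delim, text)
        return dict(re.split(kv_delim, p, maxsplit=1) for p in pairs)

    print(str_to_map("a:1,b:2"))            # {'a': '1', 'b': '2'}
    print(str_to_map("a=1;b=2", ";", "="))  # {'a': '1', 'b': '2'}
    ```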
  • broadcast

    Marks a DataFrame as small enough for use in broadcast joins.

    The following example marks the right DataFrame for broadcast hash join using joinKey.

    
       // left and right are DataFrames
       left.join(broadcast(right), "joinKey")
     
    Parameters:
    df - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • coalesce

    public static Column coalesce(scala.collection.immutable.Seq<Column> e)

    Returns the first column that is not null, or null if all inputs are null.

    For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
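    The per-row rule described above can be sketched in plain Python (an illustration, not Spark itself):

    ```python
    def coalesce(*args):
        # return the first non-null argument, or null if all are null
        return next((a for a in args if a is not None), None)

    print(coalesce(None, None, "c"))  # 'c'
    print(coalesce(None, "b", "c"))   # 'b'
    print(coalesce(None, None))       # None
    ```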
  • input_file_name

    public static Column input_file_name()

    Creates a string column for the file name of the current Spark task.

    Returns:
    (undocumented)
    Since:
    1.6.0
  • isnan

    Return true iff the column is NaN.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • isnull

    Return true iff the column is null.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • monotonicallyIncreasingId

    public static Column monotonicallyIncreasingId()

    A column expression that generates monotonically increasing 64-bit integers.

    The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the DataFrame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.

    As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:

    
     0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
     
    Returns:
    (undocumented)
    Since:
    1.4.0
  • monotonically_increasing_id

    public static Column monotonically_increasing_id()

    A column expression that generates monotonically increasing 64-bit integers.

    The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the DataFrame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.

    As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:

    
     0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
     
    Returns:
    (undocumented)
    Since:
    1.6.0
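    The documented bit layout reproduces the example IDs above (an illustration in plain Python, not Spark itself):

    ```python
    def monotonic_id(partition_id, record_number):
        # partition ID in the upper 31 bits, record number in the lower 33 bits
        return (partition_id << 33) + record_number

    # two partitions, each with 3 records
    ids = [monotonic_id(p, r) for p in range(2) for r in range(3)]
    print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
    ```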
  • nanvl

    Returns col1 if it is not NaN, or col2 if col1 is NaN.

    Both inputs should be floating point columns (DoubleType or FloatType).

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • negate

    Unary minus, i.e. negate the expression.

    
       // Select the amount column and negate all values.
       // Scala:
       df.select( -df("amount") )
    
       // Java:
       df.select( negate(df.col("amount")) );
     
    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • not

    Inversion of boolean expression, i.e. NOT.

    
       // Scala: select rows that are not active (isActive === false)
       df.filter( !df("isActive") )
    
       // Java:
       df.filter( not(df.col("isActive")) );
     
    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • rand

    public static Column rand(long seed)

    Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).

    Parameters:
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
    Note:
    The function is non-deterministic in general.
  • rand

    public static Column rand()

    Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).

    Returns:
    (undocumented)
    Since:
    1.4.0
    Note:
    The function is non-deterministic in general.
  • randn

    public static Column randn(long seed)

    Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

    Parameters:
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
    Note:
    The function is non-deterministic in general.
  • randn

    public static Column randn()

    Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

    Returns:
    (undocumented)
    Since:
    1.4.0
    Note:
    The function is non-deterministic in general.
  • randstr

    Returns a string of the specified length whose characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z. The string length must be a constant two-byte or four-byte integer (SMALLINT or INT, respectively).

    Parameters:
    length - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • randstr

    Returns a string of the specified length whose characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z, with the chosen random seed. The string length must be a constant two-byte or four-byte integer (SMALLINT or INT, respectively).

    Parameters:
    length - (undocumented)
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • spark_partition_id

    public static Column spark_partition_id()

    Returns the partition ID of the current Spark task.

    Returns:
    (undocumented)
    Since:
    1.6.0
    Note:
    This is non-deterministic because it depends on data partitioning and task scheduling.
  • sqrt

    Computes the square root of the specified float value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • sqrt

    Computes the square root of the specified float value.

    Parameters:
    colName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • try_add

    Returns the sum of left and right, or null on overflow. The acceptable input types are the same as those of the + operator.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_avg

    Returns the mean calculated from the values of a group; the result is null on overflow.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_divide

    Returns dividend/divisor. It always performs floating point division. Its result is always null if divisor is 0.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_mod

    Returns the remainder of dividend/divisor. Its result is always null if divisor is 0.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_multiply

    Returns left * right, or null on overflow. The acceptable input types are the same as those of the * operator.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_subtract

    Returns left - right, or null on overflow. The acceptable input types are the same as those of the - operator.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_sum

    Returns the sum calculated from the values of a group; the result is null on overflow.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
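    The common try_* contract (return null instead of failing) can be sketched in plain Python with hypothetical helpers (an illustration, not Spark's implementation):

    ```python
    def try_divide(left, right):
        # null (None) instead of a division-by-zero error
        return None if right == 0 else left / right

    def try_mod(left, right):
        return None if right == 0 else left % right

    print(try_divide(6, 3))  # 2.0
    print(try_divide(1, 0))  # None
    print(try_mod(7, 0))     # None
    ```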
  • struct

    public static Column struct(scala.collection.immutable.Seq<Column> cols)

    Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name; otherwise, the newly generated StructField's name is auto-generated as col plus a 1-based index, i.e. col1, col2, col3, ...

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • struct

    public static Column struct(String colName, scala.collection.immutable.Seq<String> colNames)

    Creates a new struct column that composes multiple input columns.

    Parameters:
    colName - (undocumented)
    colNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • when

    Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

    
       // Example: encoding gender string column into integer.
    
       // Scala:
       people.select(when(people("gender") === "male", 0)
         .when(people("gender") === "female", 1)
         .otherwise(2))
    
       // Java:
       people.select(when(col("gender").equalTo("male"), 0)
         .when(col("gender").equalTo("female"), 1)
         .otherwise(2))
     
    Parameters:
    condition - (undocumented)
    value - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • bitwiseNOT

    Computes bitwise NOT (~) of a number.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • bitwise_not

    Computes bitwise NOT (~) of a number.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • bit_count

    Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bit_get

    Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.

    Parameters:
    e - (undocumented)
    pos - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • getbit

    Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.

    Parameters:
    e - (undocumented)
    pos - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
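    The bit-position convention for bit_get/getbit can be sketched in plain Python (an illustration, not Spark itself):

    ```python
    def bit_get(value, pos):
        # positions are numbered right to left, starting at zero
        if pos < 0:
            raise ValueError("position cannot be negative")
        return (value >> pos) & 1

    print(bit_get(11, 0))  # 11 is 0b1011, so bit 0 is 1
    print(bit_get(11, 2))  # bit 2 is 0
    ```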
  • expr

    Parses the expression string into the column that it represents, similar to Dataset.selectExpr(java.lang.String...).

    
       // get the number of words of each length
       df.groupBy(expr("length(word)")).count()
     
    Parameters:
    expr - (undocumented)
    Returns:
    (undocumented)
  • abs

    Computes the absolute value of a numeric value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • acos

    Parameters:
    e - (undocumented)
    Returns:
    inverse cosine of e in radians, as if computed by java.lang.Math.acos
    Since:
    1.4.0
  • acos

    Parameters:
    columnName - (undocumented)
    Returns:
    inverse cosine of columnName, as if computed by java.lang.Math.acos
    Since:
    1.4.0
  • acosh

    Parameters:
    e - (undocumented)
    Returns:
    inverse hyperbolic cosine of e
    Since:
    3.1.0
  • acosh

    Parameters:
    columnName - (undocumented)
    Returns:
    inverse hyperbolic cosine of columnName
    Since:
    3.1.0
  • asin

    Parameters:
    e - (undocumented)
    Returns:
    inverse sine of e in radians, as if computed by java.lang.Math.asin
    Since:
    1.4.0
  • asin

    Parameters:
    columnName - (undocumented)
    Returns:
    inverse sine of columnName, as if computed by java.lang.Math.asin
    Since:
    1.4.0
  • asinh

    Parameters:
    e - (undocumented)
    Returns:
    inverse hyperbolic sine of e
    Since:
    3.1.0
  • asinh

    Parameters:
    columnName - (undocumented)
    Returns:
    inverse hyperbolic sine of columnName
    Since:
    3.1.0
  • atan

    Parameters:
    e - (undocumented)
    Returns:
    inverse tangent of e as if computed by java.lang.Math.atan
    Since:
    1.4.0
  • atan

    Parameters:
    columnName - (undocumented)
    Returns:
    inverse tangent of columnName, as if computed by java.lang.Math.atan
    Since:
    1.4.0
  • atan2

    Parameters:
    y - coordinate on y-axis
    x - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    Parameters:
    y - coordinate on y-axis
    xName - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    Parameters:
    yName - coordinate on y-axis
    x - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    Parameters:
    yName - coordinate on y-axis
    xName - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    public static Column atan2(Column y, double xValue)

    Parameters:
    y - coordinate on y-axis
    xValue - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    public static Column atan2(String yName, double xValue)

    Parameters:
    yName - coordinate on y-axis
    xValue - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    public static Column atan2(double yValue, Column x)

    Parameters:
    yValue - coordinate on y-axis
    x - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atan2

    public static Column atan2(double yValue, String xName)

    Parameters:
    yValue - coordinate on y-axis
    xName - coordinate on x-axis
    Returns:
    the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
    Since:
    1.4.0
  • atanh

    Parameters:
    e - (undocumented)
    Returns:
    inverse hyperbolic tangent of e
    Since:
    3.1.0
  • atanh

    Parameters:
    columnName - (undocumented)
    Returns:
    inverse hyperbolic tangent of columnName
    Since:
    3.1.0
  • bin

    An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • bin

    An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • cbrt

    Computes the cube-root of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • cbrt

    Computes the cube-root of the given column.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • ceil

    Computes the ceiling of the given value of e to scale decimal places.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • ceil

    Computes the ceiling of the given value of e to 0 decimal places.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • ceil

    Computes the ceiling of the given value of e to 0 decimal places.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • ceiling

    Computes the ceiling of the given value of e to scale decimal places.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • ceiling

    Computes the ceiling of the given value of e to 0 decimal places.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
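    Ceiling to scale decimal places can be sketched in plain Python (an illustration of the rounding rule only; Spark operates on its own numeric types):

    ```python
    import math

    def ceil_scale(x, scale):
        # shift by 10^scale, take the ceiling, then shift back
        factor = 10 ** scale
        return math.ceil(x * factor) / factor

    print(ceil_scale(3.1412, 2))  # 3.15
    print(ceil_scale(3.1412, 0))  # 4.0
    ```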
  • conv

    public static Column conv(Column num, int fromBase, int toBase)

    Convert a number in a string column from one base to another.

    Parameters:
    num - (undocumented)
    fromBase - (undocumented)
    toBase - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
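    Base conversion of a string-encoded number can be sketched in plain Python (an illustration only; Spark's conv also handles negative values and 64-bit limits):

    ```python
    def conv(num, from_base, to_base):
        n = int(num, from_base)       # parse in the source base
        if n == 0:
            return "0"
        digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        out = []
        while n:                      # emit digits in the target base
            n, r = divmod(n, to_base)
            out.append(digits[r])
        return "".join(reversed(out))

    print(conv("100", 2, 10))   # '4'
    print(conv("255", 10, 16))  # 'FF'
    ```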
  • cos

    Parameters:
    e - angle in radians
    Returns:
    cosine of the angle, as if computed by java.lang.Math.cos
    Since:
    1.4.0
  • cos

    Parameters:
    columnName - angle in radians
    Returns:
    cosine of the angle, as if computed by java.lang.Math.cos
    Since:
    1.4.0
  • cosh

    Parameters:
    e - hyperbolic angle
    Returns:
    hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
    Since:
    1.4.0
  • cosh

    Parameters:
    columnName - hyperbolic angle
    Returns:
    hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
    Since:
    1.4.0
  • cot

    Parameters:
    e - angle in radians
    Returns:
    cotangent of the angle
    Since:
    3.3.0
  • csc

    Parameters:
    e - angle in radians
    Returns:
    cosecant of the angle
    Since:
    3.3.0
  • e

    Returns Euler's number e.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • exp

    Computes the exponential of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • exp

    Computes the exponential of the given column.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • expm1

    Computes the exponential of the given value minus one.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • expm1

    Computes the exponential of the given column minus one.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • factorial

    Computes the factorial of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • floor

    Computes the floor of the given value of e to scale decimal places.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • floor

    Computes the floor of the given value of e to 0 decimal places.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • floor

    Computes the floor of the given column value to 0 decimal places.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • greatest

    public static Column greatest(scala.collection.immutable.Seq<Column> exprs)

    Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • greatest

    public static Column greatest(String columnName, scala.collection.immutable.Seq<String> columnNames)

    Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    columnName - (undocumented)
    columnNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • hex

    Computes hex value of the given column.

    Parameters:
    column - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • unhex

    Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.

    Parameters:
    column - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • hypot

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    l - (undocumented)
    rightName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    leftName - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    leftName - (undocumented)
    rightName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    public static Column hypot(String leftName, double r)

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    leftName - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • hypot

    public static Column hypot(double l, String rightName)

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Parameters:
    l - (undocumented)
    rightName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
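    "Without intermediate overflow" matters because squaring large doubles directly produces infinity. The standard technique scales by the larger magnitude first (a plain-Python sketch of that technique, not Spark's exact implementation):

    ```python
    def safe_hypot(a, b):
        a, b = abs(a), abs(b)
        if a < b:
            a, b = b, a           # ensure a is the larger magnitude
        if a == 0.0:
            return 0.0
        r = b / a                 # r <= 1, so r*r cannot overflow
        return a * (1.0 + r * r) ** 0.5

    print(safe_hypot(3.0, 4.0))      # 5.0
    print(safe_hypot(3e200, 4e200))  # ~5e200; naive a*a + b*b would overflow
    ```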
  • least

    public static Column least(scala.collection.immutable.Seq<Column> exprs)

    Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • least

    public static Column least(String columnName, scala.collection.immutable.Seq<String> columnNames)

    Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Parameters:
    columnName - (undocumented)
    columnNames - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • ln

    Computes the natural logarithm of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • log

    Computes the natural logarithm of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log

    Computes the natural logarithm of the given column.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log

    Returns the first argument-base logarithm of the second argument.

    Parameters:
    base - (undocumented)
    a - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log

    public static Column log(double base, String columnName)

    Returns the first argument-base logarithm of the second argument.

    Parameters:
    base - (undocumented)
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
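The two-argument log overloads follow the usual change-of-base identity, log(base, a) = ln(a) / ln(base); a plain-Java sketch:

```java
public class LogDemo {
    // Change-of-base identity underlying log(base, a).
    static double logBase(double base, double a) {
        return Math.log(a) / Math.log(base);
    }

    public static void main(String[] args) {
        System.out.println(logBase(2.0, 8.0));  // ~3.0, since 2^3 = 8
        System.out.println(Math.log10(1000.0)); // ~3.0, the log10() variant
    }
}
```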
  • log10

    Computes the logarithm of the given value in base 10.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log10

    Computes the logarithm of the given value in base 10.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log1p

    Computes the natural logarithm of the given value plus one.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log1p

    Computes the natural logarithm of the given column plus one.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • log2

    Computes the logarithm of the given column in base 2.

    Parameters:
    expr - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • log2

    Computes the logarithm of the given value in base 2.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • negative

    Returns the negated value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • pi

    public static Column pi()

    Returns Pi.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • positive

    Returns the value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • pow

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    l - (undocumented)
    rightName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    leftName - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    leftName - (undocumented)
    rightName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    public static Column pow(String leftName, double r)

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    leftName - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • pow

    public static Column pow(double l, String rightName)

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    l - (undocumented)
    rightName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • power

    Returns the value of the first argument raised to the power of the second argument.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • pmod

    Returns the positive value of dividend mod divisor.

    Parameters:
    dividend - (undocumented)
    divisor - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
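pmod differs from the JVM's % operator, whose sign follows the dividend; a common way to express the positive modulus in plain Java:

```java
public class PmodDemo {
    // pmod keeps the result non-negative, unlike Java's % operator.
    static int pmod(int dividend, int divisor) {
        return ((dividend % divisor) + divisor) % divisor;
    }

    public static void main(String[] args) {
        System.out.println(-7 % 3);      // -1: Java's remainder follows the dividend's sign
        System.out.println(pmod(-7, 3)); // 2: the positive modulus
    }
}
```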
  • rint

    Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • rint

    Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • round

    Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • round

    Round the value of e to scale decimal places with HALF_UP round mode if scale is greater than or equal to 0, or round the integral part when scale is less than 0.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • round

    Round the value of e to scale decimal places with HALF_UP round mode if scale is greater than or equal to 0, or round the integral part when scale is less than 0.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • bround

    Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • bround

    Round the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than or equal to 0, or round the integral part when scale is less than 0.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • bround

    Round the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than or equal to 0, or round the integral part when scale is less than 0.

    Parameters:
    e - (undocumented)
    scale - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
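The HALF_UP / HALF_EVEN distinction between round and bround, and the negative-scale behavior, mirror java.math rounding modes; a plain-Java illustration:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundDemo {
    public static void main(String[] args) {
        BigDecimal half = new BigDecimal("2.5");
        // round() semantics: HALF_UP rounds ties away from zero.
        System.out.println(half.setScale(0, RoundingMode.HALF_UP));   // 3
        // bround() semantics: HALF_EVEN rounds ties to the even neighbor.
        System.out.println(half.setScale(0, RoundingMode.HALF_EVEN)); // 2
        // A negative scale rounds the integral part instead.
        System.out.println(new BigDecimal("25")
                .setScale(-1, RoundingMode.HALF_UP).intValueExact()); // 30
    }
}
```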
  • sec

    Parameters:
    e - angle in radians
    Returns:
    secant of the angle
    Since:
    3.3.0
  • shiftLeft

    public static Column shiftLeft(Column e, int numBits)

    Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.

    Parameters:
    e - (undocumented)
    numBits - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • shiftleft

    public static Column shiftleft(Column e, int numBits)

    Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.

    Parameters:
    e - (undocumented)
    numBits - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • shiftRight

    public static Column shiftRight(Column e, int numBits)

    (Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.

    Parameters:
    e - (undocumented)
    numBits - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • shiftright

    public static Column shiftright(Column e, int numBits)

    (Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.

    Parameters:
    e - (undocumented)
    numBits - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • shiftRightUnsigned

    public static Column shiftRightUnsigned(Column e, int numBits)

    Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.

    Parameters:
    e - (undocumented)
    numBits - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • shiftrightunsigned

    public static Column shiftrightunsigned(Column e, int numBits)

    Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.

    Parameters:
    e - (undocumented)
    numBits - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
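These shift functions correspond to the JVM's <<, >> and >>> operators on int and long inputs; a quick illustration of the signed vs. unsigned difference:

```java
public class ShiftDemo {
    public static void main(String[] args) {
        // shiftleft: each bit of shift multiplies by two.
        System.out.println(3 << 2);   // 12
        // shiftright: signed (arithmetic) shift preserves the sign bit.
        System.out.println(-8 >> 1);  // -4
        // shiftrightunsigned: logical shift fills the high bits with zeros.
        System.out.println(-8 >>> 1); // 2147483644
    }
}
```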
  • sign

    Computes the signum of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • signum

    Computes the signum of the given value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • signum

    Computes the signum of the given column.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • sin

    Parameters:
    e - angle in radians
    Returns:
    sine of the angle, as if computed by java.lang.Math.sin
    Since:
    1.4.0
  • sin

    Parameters:
    columnName - angle in radians
    Returns:
    sine of the angle, as if computed by java.lang.Math.sin
    Since:
    1.4.0
  • sinh

    Parameters:
    e - hyperbolic angle
    Returns:
    hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
    Since:
    1.4.0
  • sinh

    Parameters:
    columnName - hyperbolic angle
    Returns:
    hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
    Since:
    1.4.0
  • tan

    Parameters:
    e - angle in radians
    Returns:
    tangent of the given value, as if computed by java.lang.Math.tan
    Since:
    1.4.0
  • tan

    Parameters:
    columnName - angle in radians
    Returns:
    tangent of the given value, as if computed by java.lang.Math.tan
    Since:
    1.4.0
  • tanh

    Parameters:
    e - hyperbolic angle
    Returns:
    hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
    Since:
    1.4.0
  • tanh

    Parameters:
    columnName - hyperbolic angle
    Returns:
    hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
    Since:
    1.4.0
  • toDegrees

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • toDegrees

    public static Column toDegrees(String columnName)

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • degrees

    Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

    Parameters:
    e - angle in radians
    Returns:
    angle in degrees, as if computed by java.lang.Math.toDegrees
    Since:
    2.1.0
  • degrees

    Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

    Parameters:
    columnName - angle in radians
    Returns:
    angle in degrees, as if computed by java.lang.Math.toDegrees
    Since:
    2.1.0
  • toRadians

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • toRadians

    public static Column toRadians(String columnName)

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.4.0
  • radians

    Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

    Parameters:
    e - angle in degrees
    Returns:
    angle in radians, as if computed by java.lang.Math.toRadians
    Since:
    2.1.0
  • radians

    Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

    Parameters:
    columnName - angle in degrees
    Returns:
    angle in radians, as if computed by java.lang.Math.toRadians
    Since:
    2.1.0
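As the parameter docs note, degrees and radians behave as if computed by java.lang.Math.toDegrees and java.lang.Math.toRadians:

```java
public class AngleDemo {
    public static void main(String[] args) {
        // degrees(): radians -> degrees, as java.lang.Math.toDegrees.
        System.out.println(Math.toDegrees(Math.PI)); // ~180.0
        // radians(): degrees -> radians, as java.lang.Math.toRadians.
        System.out.println(Math.toRadians(180.0));   // ~3.14159...
    }
}
```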
  • width_bucket

    Returns the bucket number into which the value of this expression would fall after being evaluated. Note that input arguments must follow conditions listed below; otherwise, the method will return null.

    Parameters:
    v - value to compute a bucket number in the histogram
    min - minimum value of the histogram
    max - maximum value of the histogram
    numBucket - the number of buckets
    Returns:
    the bucket number into which the value would fall after being evaluated
    Since:
    3.5.0
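A simplified plain-Java sketch of the equal-width bucketing arithmetic behind width_bucket (it ignores the null-returning edge conditions the full function checks, such as a non-positive numBucket or min equal to max):

```java
public class WidthBucketDemo {
    // Equal-width histogram bucketing: bucket 0 below min, numBucket + 1 at or above max.
    static long widthBucket(double v, double min, double max, long numBucket) {
        if (v < min) return 0;               // below the histogram
        if (v >= max) return numBucket + 1;  // above the histogram
        double width = (max - min) / numBucket;
        return (long) ((v - min) / width) + 1;
    }

    public static void main(String[] args) {
        // 5 buckets of width 2.08 over [0.2, 10.6); 5.3 lands in the third.
        System.out.println(widthBucket(5.3, 0.2, 10.6, 5)); // 3
    }
}
```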
  • current_catalog

    public static Column current_catalog()

    Returns the current catalog.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • current_database

    public static Column current_database()

    Returns the current database.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • current_schema

    public static Column current_schema()

    Returns the current schema.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • current_user

    public static Column current_user()

    Returns the user name of current execution context.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • md5

    Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • sha1

    Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • sha2

    Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.

    Parameters:
    e - column to compute SHA-2 on.
    numBits - one of 224, 256, 384, or 512.
    Returns:
    (undocumented)
    Since:
    1.5.0
  • crc32

    Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
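The md5, sha1, sha2 and crc32 functions correspond to standard JDK digest algorithms; a plain-Java sketch producing the same hex-string and bigint shapes (helper names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.zip.CRC32;

public class DigestDemo {
    // Hex digest via the JDK algorithm names: "MD5", "SHA-1", "SHA-256", ...
    static String hexDigest(String algorithm, byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance(algorithm).digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // crc32() returns a bigint; java.util.zip.CRC32 likewise yields a long.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] abc = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("MD5", abc));     // 32 hex characters
        System.out.println(hexDigest("SHA-1", abc));   // 40 hex characters
        System.out.println(hexDigest("SHA-256", abc)); // 64 hex characters
        System.out.println(crc32("123456789".getBytes(StandardCharsets.UTF_8)));
    }
}
```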
  • hash

    public static Column hash(scala.collection.immutable.Seq<Column> cols)

    Calculates the hash code of given columns, and returns the result as an int column.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.0.0
  • xxhash64

    public static Column xxhash64(scala.collection.immutable.Seq<Column> cols)

    Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column. The hash computation uses an initial seed of 42.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • assert_true

    Returns null if the condition is true, and throws an exception otherwise.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
  • assert_true

    Returns null if the condition is true; throws an exception with the error message otherwise.

    Parameters:
    c - (undocumented)
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
  • raise_error

    Throws an exception with the provided error message.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
  • user

    public static Column user()

    Returns the user name of current execution context.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • session_user

    public static Column session_user()

    Returns the user name of current execution context.

    Returns:
    (undocumented)
    Since:
    4.0.0
  • uuid

    public static Column uuid()

    Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • uuid

    Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.

    Parameters:
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • aes_encrypt

    Returns an encrypted value of input using AES in given mode with the specified padding. Key lengths of 16, 24 and 32 bytes (128, 192 and 256 bits) are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.

    Parameters:
    input - The binary value to encrypt.
    key - The passphrase to use to encrypt the data.
    mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC.
    padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
    iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or "". 16-byte array for CBC mode. 12-byte array for GCM mode.
    aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • aes_encrypt

    Returns an encrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    padding - (undocumented)
    iv - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
  • aes_encrypt

    Returns an encrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    padding - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
  • aes_encrypt

    Returns an encrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
  • aes_encrypt

    Returns an encrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
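A plain-JDK sketch of the default GCM mode described above, using javax.crypto rather than the Spark API. One difference to note: when no IV is supplied, aes_encrypt generates a random IV and prepends it to its output, whereas this sketch carries the IV separately:

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesGcmDemo {
    // One AES/GCM pass; mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    static byte[] gcm(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            // 128-bit authentication tag with the 12-byte IV GCM expects.
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
            return cipher.doFinal(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static boolean roundTrips(String message) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // 16 bytes = AES-128
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] plain = message.getBytes(StandardCharsets.UTF_8);
        byte[] cipherText = gcm(Cipher.ENCRYPT_MODE, key, iv, plain);
        return Arrays.equals(gcm(Cipher.DECRYPT_MODE, key, iv, cipherText), plain);
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("Spark SQL")); // true
    }
}
```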
  • aes_decrypt

    Returns a decrypted value of input using AES in mode with padding. Key lengths of 16, 24 and 32 bytes (128, 192 and 256 bits) are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.

    Parameters:
    input - The binary value to decrypt.
    key - The passphrase to use to decrypt the data.
    mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
    padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
    aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • aes_decrypt

    Returns a decrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    padding - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
  • aes_decrypt

    Returns a decrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
  • aes_decrypt

    Returns a decrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
  • try_aes_decrypt

    This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.

    Parameters:
    input - The binary value to decrypt.
    key - The passphrase to use to decrypt the data.
    mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
    padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
    aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_aes_decrypt

    Returns a decrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    padding - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
  • try_aes_decrypt

    Returns a decrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
  • try_aes_decrypt

    Returns a decrypted value of input.

    Parameters:
    input - (undocumented)
    key - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    See Also:
    • org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
  • sha

    Returns the SHA-1 hash of col as a hex string.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • input_file_block_length

    public static Column input_file_block_length()

    Returns the length of the block being read, or -1 if not available.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • input_file_block_start

    public static Column input_file_block_start()

    Returns the start offset of the block being read, or -1 if not available.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • reflect

    public static Column reflect(scala.collection.immutable.Seq<Column> cols)

    Calls a method with reflection.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • java_method

    public static Column java_method(scala.collection.immutable.Seq<Column> cols)

    Calls a method with reflection.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_reflect

    public static Column try_reflect(scala.collection.immutable.Seq<Column> cols)

    This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • version

    public static Column version()

    Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • typeof

    Returns a DDL-formatted type string for the data type of the input.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • stack

    public static Column stack(scala.collection.immutable.Seq<Column> cols)

    Separates col1, ..., colk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • uniform

    Returns a random value, independently and identically distributed (i.i.d.), drawn uniformly from the specified range of numbers. The provided numbers specifying the minimum and maximum values of the range must be constant. If both of these numbers are integers, then the result will also be an integer. Otherwise, if one or both of these are floating-point numbers, then the result will also be a floating-point number.

    Parameters:
    min - (undocumented)
    max - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • uniform

    Returns a random value, independently and identically distributed (i.i.d.), drawn uniformly from the specified range of numbers, using the chosen random seed. The provided numbers specifying the minimum and maximum values of the range must be constant. If both of these numbers are integers, then the result will also be an integer. Otherwise, if one or both of these are floating-point numbers, then the result will also be a floating-point number.

    Parameters:
    min - (undocumented)
    max - (undocumented)
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • random

    Returns an independent and identically distributed (i.i.d.) random value, uniformly distributed in [0, 1).

    Parameters:
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • random

    public static Column random()

    Returns an independent and identically distributed (i.i.d.) random value, uniformly distributed in [0, 1).

    Returns:
    (undocumented)
    Since:
    3.5.0
  • bitmap_bit_position

    public static Column bitmap_bit_position(Column col)

    Returns the bit position for the given input column.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bitmap_bucket_number

    public static Column bitmap_bucket_number(Column col)

    Returns the bucket number for the given input column.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bitmap_construct_agg

    public static Column bitmap_construct_agg(Column col)

    Returns a bitmap with the positions of the bits set from all the values from the input column. The input column will most likely be bitmap_bit_position().

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bitmap_count

    Returns the number of set bits in the input bitmap.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bitmap_or_agg

    Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column. The input column should be bitmaps created from bitmap_construct_agg().

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • bitmap_and_agg

    Returns a bitmap that is the bitwise AND of all of the bitmaps from the input column. The input column should be bitmaps created from bitmap_construct_agg().

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • ascii

    Computes the numeric value of the first character of the string column, and returns the result as an int column.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • base64

    Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
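base64 and its inverse unbase64 use the standard Base64 alphabet, as in java.util.Base64:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Demo {
    public static void main(String[] args) {
        byte[] raw = "Spark".getBytes(StandardCharsets.UTF_8);
        String encoded = Base64.getEncoder().encodeToString(raw); // base64() analogue
        byte[] decoded = Base64.getDecoder().decode(encoded);     // unbase64() analogue
        System.out.println(encoded);                              // U3Bhcms=
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // Spark
    }
}
```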
  • bit_length

    Calculates the bit length for the specified string column.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • concat_ws

    public static Column concat_ws(String sep, scala.collection.immutable.Seq<Column> exprs)

    Concatenates multiple input string columns together into a single string column, using the given separator.

    Parameters:
    sep - (undocumented)
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    Input strings which are null are skipped.
  • decode

    Decodes the first argument from a binary value into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either argument is null, the result will also be null.

    Parameters:
    value - (undocumented)
    charset - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • encode

    Encodes the first argument from a string into a binary value using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either argument is null, the result will also be null.

    Parameters:
    value - (undocumented)
    charset - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
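encode and decode are inverses for a given charset; the JDK equivalents make the string/binary round trip explicit:

```java
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    public static void main(String[] args) {
        // encode(): string -> binary in the chosen character set.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5: the accented e takes two bytes in UTF-8

        // decode(): binary -> string in the same character set.
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // café
    }
}
```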
  • is_valid_utf8

    Returns true if the input is a valid UTF-8 string, otherwise returns false.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_valid_utf8

    Returns a new string in which all invalid UTF-8 byte sequences, if any, are replaced by the Unicode replacement character (U+FFFD).

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • validate_utf8

    Returns the input value if it corresponds to a valid UTF-8 string, or emits a SparkIllegalArgumentException exception otherwise.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_validate_utf8

    public static Column try_validate_utf8(Column str)

    Returns the input value if it corresponds to a valid UTF-8 string, or NULL otherwise.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • format_number

    public static Column format_number(Column x, int d)

    Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.

    If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.

    Parameters:
    x - (undocumented)
    d - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • format_string

    public static Column format_string(String format, scala.collection.immutable.Seq<Column> arguments)

    Formats the arguments in printf-style and returns the result as a string column.

    Parameters:
    format - (undocumented)
    arguments - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • initcap

    Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.

    For example, "hello world" will become "Hello World".

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • instr

    Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments is null.

    Parameters:
    str - (undocumented)
    substring - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
  • instr

    Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments is null.

    Parameters:
    str - (undocumented)
    substring - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
    Note:
    The position is 1-based, not 0-based. Returns 0 if substr could not be found in str.
  • length

    Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • len

    Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • lower

    Converts a string column to lower case.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • levenshtein

    Computes the Levenshtein distance of the two given string columns if it's less than or equal to a given threshold.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    threshold - (undocumented)
    Returns:
    result distance, or -1
    Since:
    3.5.0
  • levenshtein

    Computes the Levenshtein distance of the two given string columns.

    Parameters:
    l - (undocumented)
    r - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
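The distance this function computes is the classic edit distance (minimum number of single-character insertions, deletions, and substitutions). A minimal two-row dynamic-programming sketch, illustrative only (Spark's internal implementation differs):

```java
public class LevenshteinDemo {
    public static int levenshtein(String l, String r) {
        int[] prev = new int[r.length() + 1];
        int[] curr = new int[r.length() + 1];
        for (int j = 0; j <= r.length(); j++) prev[j] = j;   // distance from ""
        for (int i = 1; i <= l.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= r.length(); j++) {
                int cost = l.charAt(i - 1) == r.charAt(j - 1) ? 0 : 1;
                // insertion, deletion, or substitution
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[r.length()];
    }
}
```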
  • locate

    Locate the position of the first occurrence of substr.

    Parameters:
    substr - (undocumented)
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    The position is 1-based, not 0-based. Returns 0 if substr could not be found in str.
  • locate

    Locate the position of the first occurrence of substr in a string column, after position pos.

    Parameters:
    substr - (undocumented)
    str - (undocumented)
    pos - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    The position is 1-based, not 0-based. Returns 0 if substr could not be found in str.
  • lpad

    Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    pad - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
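The pad-or-truncate behavior can be sketched in plain Java. The class name is hypothetical, and the handling of an empty pad string (returning str unchanged) is an assumption made here for safety, not taken from the Spark docs:

```java
public class LpadDemo {
    // Sketch of lpad semantics: pad on the left to length len, repeating
    // pad as needed; truncate to len if str is already longer.
    public static String lpad(String str, int len, String pad) {
        if (str.length() >= len) return str.substring(0, len);
        if (pad.isEmpty()) return str;   // assumption: nothing to pad with
        StringBuilder sb = new StringBuilder();
        while (sb.length() < len - str.length()) sb.append(pad);
        return sb.substring(0, len - str.length()) + str;
    }
}
```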
  • lpad

    public static Column lpad(Column str, int len, byte[] pad)

    Left-pad the binary column with pad to a byte length of len. If the binary column is longer than len, the return value is shortened to len bytes.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    pad - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • lpad

    Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    pad - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • ltrim

    Trim the spaces from left end for the specified string value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • ltrim

    Trim the specified character string from left end for the specified string column.

    Parameters:
    e - (undocumented)
    trimString - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • ltrim

    Trim the specified character string from left end for the specified string column.

    Parameters:
    e - (undocumented)
    trim - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • octet_length

    Calculates the byte length for the specified string column.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • collate

    Marks a given column with specified collation.

    Parameters:
    e - (undocumented)
    collation - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • collation

    Returns the collation name of a given column.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • rlike

    Returns true if str matches regexp, or false otherwise.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp

    Returns true if str matches regexp, or false otherwise.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_like

    Returns true if str matches regexp, or false otherwise.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_count

    Returns a count of the number of times that the regular expression pattern regexp is matched in the string str.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_extract

    Extract a specific group matched by a Java regex from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned. If the specified group index exceeds the group count of the regex, an IllegalArgumentException will be thrown.

    Parameters:
    e - (undocumented)
    exp - (undocumented)
    groupIdx - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
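The contract above maps directly onto `java.util.regex`. A hypothetical sketch (not Spark's implementation) showing all three cases — group extracted, no match yields "", and an out-of-range group index throws:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractDemo {
    public static String regexpExtract(String input, String regex, int groupIdx) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) return "";                  // no match: empty string
        if (groupIdx > m.groupCount()) {
            throw new IllegalArgumentException("group index exceeds group count");
        }
        String g = m.group(groupIdx);              // null if group didn't match
        return g == null ? "" : g;
    }
}
```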
  • regexp_extract_all

    Extract all strings in str that match the regexp expression, corresponding to the first regex group index.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_extract_all

    Extract all strings in str that match the regexp expression, corresponding to the given regex group index.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    idx - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_replace

    Replace all substrings of the specified string value that match regexp with rep.

    Parameters:
    e - (undocumented)
    pattern - (undocumented)
    replacement - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • regexp_replace

    Replace all substrings of the specified string value that match regexp with rep.

    Parameters:
    e - (undocumented)
    pattern - (undocumented)
    replacement - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • regexp_substr

    Returns the substring that matches the regular expression regexp within the string str. If the regular expression is not found, the result is null.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_instr

    Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • regexp_instr

    Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.

    Parameters:
    str - (undocumented)
    regexp - (undocumented)
    idx - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • unbase64

    Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • rpad

    Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    pad - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • rpad

    public static Column rpad(Column str, int len, byte[] pad)

    Right-pad the binary column with pad to a byte length of len. If the binary column is longer than len, the return value is shortened to len bytes.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    pad - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • rpad

    Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    pad - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • repeat

    Repeats a string column n times, and returns it as a new string column.

    Parameters:
    str - (undocumented)
    n - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • repeat

    Repeats a string column n times, and returns it as a new string column.

    Parameters:
    str - (undocumented)
    n - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • rtrim

    Trim the spaces from right end for the specified string value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • rtrim

    Trim the specified character string from right end for the specified string column.

    Parameters:
    e - (undocumented)
    trimString - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • rtrim

    Trim the specified character string from right end for the specified string column.

    Parameters:
    e - (undocumented)
    trim - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • soundex

    Returns the soundex code for the specified expression.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • split

    Splits str around matches of the given pattern.

    Parameters:
    str - a string expression to split
    pattern - a string representing a regular expression. The regex string should be a Java regular expression.
    Returns:
    (undocumented)
    Since:
    1.5.0
  • split

    Splits str around matches of the given pattern.

    Parameters:
    str - a string expression to split
    pattern - a column of string representing a regular expression. The regex string should be a Java regular expression.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • split

    Splits str around matches of the given pattern.

    Parameters:
    str - a string expression to split
    pattern - a string representing a regular expression. The regex string should be a Java regular expression.
    limit - an integer expression which controls the number of times the regex is applied.
    • limit greater than 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
    • limit less than or equal to 0: regex will be applied as many times as possible, and the resulting array can be of any size.
    Returns:
    (undocumented)
    Since:
    3.0.0
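Java's own `String.split(regex, limit)` closely parallels the limit contract described above (a positive limit caps the array length, a negative limit applies the regex as many times as possible), so it serves as a direct stdlib illustration:

```java
public class SplitDemo {
    // Thin wrapper to show the limit semantics; not the Spark Column API.
    public static String[] split(String str, String pattern, int limit) {
        return str.split(pattern, limit);
    }
}
```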
  • split

    Splits str around matches of the given pattern.

    Parameters:
    str - a string expression to split
    pattern - a column of string representing a regular expression. The regex string should be a Java regular expression.
    limit - a column of integer expression which controls the number of times the regex is applied.
    • limit greater than 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
    • limit less than or equal to 0: regex will be applied as many times as possible, and the resulting array can be of any size.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • substring

    public static Column substring(Column str, int pos, int len)

    Returns the substring that starts at pos and is of length len when str is of String type, or the slice of the byte array that starts at pos (in bytes) and is of length len when str is of Binary type.

    Parameters:
    str - (undocumented)
    pos - (undocumented)
    len - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    The position is 1-based, not 0-based.
  • substring

    Returns the substring that starts at pos and is of length len when str is of String type, or the slice of the byte array that starts at pos (in bytes) and is of length len when str is of Binary type.

    Parameters:
    str - (undocumented)
    pos - (undocumented)
    len - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
    Note:
    The position is 1-based, not 0-based.
  • substring_index

    public static Column substring_index(Column str, String delim, int count)

    Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.

    Parameters:
    str - (undocumented)
    delim - (undocumented)
    count - (undocumented)
    Returns:
    (undocumented)
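The counting rules can be made concrete with a plain-Java re-implementation (illustrative only; the class name is hypothetical and this is not Spark's code). A positive count walks delimiters from the left, a negative count from the right, and the whole string is returned when there are fewer than |count| occurrences:

```java
public class SubstringIndexDemo {
    public static String substringIndex(String str, String delim, int count) {
        if (count == 0) return "";
        if (count > 0) {
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) return str;          // fewer than count occurrences
            }
            return str.substring(0, idx);         // everything left of it
        } else {
            int idx = str.length();
            for (int i = 0; i < -count; i++) {
                idx = str.lastIndexOf(delim, idx - 1);
                if (idx < 0) return str;
            }
            return str.substring(idx + delim.length()); // everything right of it
        }
    }
}
```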
  • overlay

    Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.

    Parameters:
    src - (undocumented)
    replace - (undocumented)
    pos - (undocumented)
    len - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • overlay

    Overlay the specified portion of src with replace, starting from byte position pos of src.

    Parameters:
    src - (undocumented)
    replace - (undocumented)
    pos - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • sentences

    Splits a string into arrays of sentences, where each sentence is an array of words.

    Parameters:
    string - (undocumented)
    language - (undocumented)
    country - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • sentences

    Splits a string into arrays of sentences, where each sentence is an array of words. The default country('') is used.

    Parameters:
    string - (undocumented)
    language - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • sentences

    Splits a string into arrays of sentences, where each sentence is an array of words. The default locale is used.

    Parameters:
    string - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • translate

    Translates each character in src that appears in matchingString to the character at the same position in replaceString. The translation happens whenever a character in the string matches a character in matchingString.

    Parameters:
    src - (undocumented)
    matchingString - (undocumented)
    replaceString - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
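A hypothetical plain-Java sketch of this per-character mapping (not Spark's implementation). When replaceString is shorter than matchingString, characters without a counterpart are dropped:

```java
public class TranslateDemo {
    public static String translate(String src, String matching, String replace) {
        StringBuilder out = new StringBuilder();
        for (char c : src.toCharArray()) {
            int i = matching.indexOf(c);
            if (i < 0) out.append(c);                         // not matched: keep
            else if (i < replace.length()) out.append(replace.charAt(i));
            // else: no replacement character at this position, so drop it
        }
        return out.toString();
    }
}
```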
  • trim

    Trim the spaces from both ends for the specified string column.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • trim

    Trim the specified character from both ends for the specified string column.

    Parameters:
    e - (undocumented)
    trimString - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • trim

    Trim the specified character from both ends for the specified string column.

    Parameters:
    e - (undocumented)
    trim - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • upper

    Converts a string column to upper case.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • to_binary

    Converts the input e to a binary value based on the supplied format. The format can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64". By default, the binary format for conversion is "hex" if format is omitted. The function returns NULL if at least one of the input parameters is NULL.

    Parameters:
    e - (undocumented)
    f - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_binary

    Converts the input e to a binary value based on the default format "hex". The function returns NULL if at least one of the input parameters is NULL.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_char

    Convert e to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
    • '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
    • '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
    • ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
    • '$': Specifies the location of the $ currency sign. This character may only be specified once.
    • 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
    • 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative.

    If e is a datetime, format shall be a valid datetime pattern, see Datetime Patterns. If e is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string.

    Parameters:
    e - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_varchar

    Convert e to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
    • '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
    • '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
    • ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
    • '$': Specifies the location of the $ currency sign. This character may only be specified once.
    • 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
    • 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative.

    If e is a datetime, format shall be a valid datetime pattern, see Datetime Patterns. If e is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string.

    Parameters:
    e - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_number

    Convert string 'e' to a number based on the string format 'format'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
    • '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size.
    • '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
    • ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number.
    • '$': Specifies the location of the $ currency sign. This character may only be specified once.
    • 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.
    • 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets.

    Parameters:
    e - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • replace

    Replaces all occurrences of search with replace.

    Parameters:
    src - A column of string to be replaced
    search - A column of string. If search is not found in src, src is returned unchanged.
    replace - A column of string. If replace is not specified or is an empty string, nothing replaces the string that is removed from src.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • replace

    Replaces all occurrences of search with replace.

    Parameters:
    src - A column of string to be replaced
    search - A column of string. If search is not found in src, src is returned unchanged.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • split_part

    Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, str is not split.

    Parameters:
    str - (undocumented)
    delimiter - (undocumented)
    partNum - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
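The indexing rules above can be sketched in plain Java (illustrative only; the class name is hypothetical). The delimiter is treated literally, negative partNum counts from the end, and out-of-range yields an empty string:

```java
import java.util.regex.Pattern;

public class SplitPartDemo {
    public static String splitPart(String str, String delimiter, int partNum) {
        if (partNum == 0) throw new IllegalArgumentException("partNum must not be 0");
        if (delimiter.isEmpty()) {
            // empty delimiter: str is not split, so only part 1 (or -1) exists
            return (partNum == 1 || partNum == -1) ? str : "";
        }
        String[] parts = str.split(Pattern.quote(delimiter), -1);
        int idx = partNum > 0 ? partNum - 1 : parts.length + partNum;
        return (idx >= 0 && idx < parts.length) ? parts[idx] : "";
    }
}
```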
  • substr

    Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.

    Parameters:
    str - (undocumented)
    pos - (undocumented)
    len - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • substr

    Returns the substring of str that starts at pos, or the slice of byte array that starts at pos.

    Parameters:
    str - (undocumented)
    pos - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_parse_url

    Extracts a part from a URL.

    Parameters:
    url - (undocumented)
    partToExtract - (undocumented)
    key - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_parse_url

    Extracts a part from a URL.

    Parameters:
    url - (undocumented)
    partToExtract - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • parse_url

    Extracts a part from a URL.

    Parameters:
    url - (undocumented)
    partToExtract - (undocumented)
    key - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • parse_url

    Extracts a part from a URL.

    Parameters:
    url - (undocumented)
    partToExtract - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • printf

    public static Column printf(Column format, scala.collection.immutable.Seq<Column> arguments)

    Formats the arguments in printf-style and returns the result as a string column.

    Parameters:
    format - (undocumented)
    arguments - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • url_decode

    Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_url_decode

    This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • url_encode

    Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • position

    Returns the position of the first occurrence of substr in str after position start. The given start and return value are 1-based.

    Parameters:
    substr - (undocumented)
    str - (undocumented)
    start - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • position

    Returns the position of the first occurrence of substr in str after position 1. The return value is 1-based.

    Parameters:
    substr - (undocumented)
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • endswith

    Returns a boolean. The value is True if str ends with suffix. Returns NULL if either input expression is NULL. Otherwise, returns False. Both str and suffix must be of STRING or BINARY type.

    Parameters:
    str - (undocumented)
    suffix - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • startswith

    Returns a boolean. The value is True if str starts with prefix. Returns NULL if either input expression is NULL. Otherwise, returns False. Both str and prefix must be of STRING or BINARY type.

    Parameters:
    str - (undocumented)
    prefix - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • btrim

    Removes the leading and trailing space characters from str.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • btrim

    Remove the leading and trailing trim characters from str.

    Parameters:
    str - (undocumented)
    trim - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_to_binary

    This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.

    Parameters:
    e - (undocumented)
    f - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_to_binary

    This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_to_number

    Convert string e to a number based on the string format format. Returns NULL if the string e does not match the expected format. The format follows the same semantics as the to_number function.

    Parameters:
    e - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • char_length

    Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • character_length

    public static Column character_length(Column str)

    Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • chr

    Returns the ASCII character having the binary equivalent to n. If n is larger than 256, the result is equivalent to chr(n % 256).

    Parameters:
    n - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • contains

    Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • elt

    public static Column elt(scala.collection.immutable.Seq<Column> inputs)

    Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.

    Parameters:
    inputs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • find_in_set

    Returns the index (1-based) of the given string (str) in the comma-delimited list (strArray). Returns 0 if the string was not found or if the given string (str) contains a comma.

    Parameters:
    str - (undocumented)
    strArray - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
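A plain-Java sketch of this lookup (the class name is hypothetical, not Spark API): items are compared exactly, the index is 1-based, and a str containing a comma can never match a list element, so it returns 0:

```java
public class FindInSetDemo {
    public static int findInSet(String str, String strArray) {
        if (str.contains(",")) return 0;          // str with a comma never matches
        String[] items = strArray.split(",", -1); // keep empty items too
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(str)) return i + 1;
        }
        return 0;
    }
}
```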
  • like

    Returns true if str matches pattern with escapeChar, null if any arguments are null, false otherwise.

    Parameters:
    str - (undocumented)
    pattern - (undocumented)
    escapeChar - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • like

    Returns true if str matches pattern with escapeChar('\'), null if any arguments are null, false otherwise.

    Parameters:
    str - (undocumented)
    pattern - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • ilike

    Returns true if str matches pattern with escapeChar case-insensitively, null if any arguments are null, false otherwise.

    Parameters:
    str - (undocumented)
    pattern - (undocumented)
    escapeChar - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • ilike

    Returns true if str matches pattern with escapeChar('\') case-insensitively, null if any arguments are null, false otherwise.

    Parameters:
    str - (undocumented)
    pattern - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • lcase

    Returns str with all characters changed to lowercase.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • ucase

    Returns str with all characters changed to uppercase.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • left

    Returns the leftmost len characters from the string str (len can be of string type). If len is less than or equal to 0, the result is an empty string.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • right

    Returns the rightmost len characters from the string str (len can be of string type). If len is less than or equal to 0, the result is an empty string.

    Parameters:
    str - (undocumented)
    len - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
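The left/right semantics can be sketched together in plain Java. The class name is hypothetical, and returning the whole string when len exceeds its length is an assumption made here (the docs above only specify the len <= 0 case):

```java
public class LeftRightDemo {
    public static String left(String str, int len) {
        if (len <= 0) return "";                              // empty result
        return str.substring(0, Math.min(len, str.length())); // clamp to length
    }
    public static String right(String str, int len) {
        if (len <= 0) return "";
        return str.substring(Math.max(0, str.length() - len));
    }
}
```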
  • quote

    Returns str enclosed in single quotes, with each instance of a single quote in it preceded by a backslash.

    Parameters:
    str - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • hll_sketch_estimate

    public static Column hll_sketch_estimate(Column c)

    Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_sketch_estimate

    public static Column hll_sketch_estimate(String columnName)

    Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union

    Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union

    Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union

    public static Column hll_union(Column c1, Column c2, boolean allowDifferentLgConfigK)

    Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    allowDifferentLgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • hll_union

    public static Column hll_union(String columnName1, String columnName2, boolean allowDifferentLgConfigK)

    Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    allowDifferentLgConfigK - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • theta_difference

    Subtracts two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches AnotB object.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_difference

    public static Column theta_difference(String columnName1, String columnName2)

    Subtracts two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches AnotB object.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_intersection

    Intersects two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Intersection object.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_intersection

    public static Column theta_intersection(String columnName1, String columnName2)

    Intersects two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Intersection object.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_estimate

    public static Column theta_sketch_estimate(Column c)

    Returns the estimated number of unique values given the binary representation of a Datasketches ThetaSketch.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_sketch_estimate

    public static Column theta_sketch_estimate(String columnName)

    Returns the estimated number of unique values given the binary representation of a Datasketches ThetaSketch.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union

    Unions two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Union object. It is configured with the default value of 12 for lgNomEntries.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union

    Unions two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Union object. It is configured with the default value of 12 for lgNomEntries.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union

    Unions two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union

    public static Column theta_union(String columnName1, String columnName2, int lgNomEntries)

    Unions two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • theta_union

    Unions two binary representations of Datasketches ThetaSketch objects in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
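    The set semantics that theta_union, theta_intersection, and theta_difference approximate can be illustrated with exact java.util.Set operations; the sketch functions return compact binary sketches whose estimated cardinalities approximate these exact counts:

    ```java
    import java.util.HashSet;
    import java.util.Set;

    public class ThetaSemanticsSketch {
        public static void main(String[] args) {
            Set<String> a = new HashSet<>(Set.of("u1", "u2", "u3"));
            Set<String> b = new HashSet<>(Set.of("u2", "u3", "u4"));

            Set<String> union = new HashSet<>(a);
            union.addAll(b);      // theta_union estimates |A ∪ B|
            Set<String> inter = new HashSet<>(a);
            inter.retainAll(b);   // theta_intersection estimates |A ∩ B|
            Set<String> diff = new HashSet<>(a);
            diff.removeAll(b);    // theta_difference (AnotB) estimates |A \ B|

            System.out.println(union.size()); // 4
            System.out.println(inter.size()); // 2
            System.out.println(diff.size());  // 1
        }
    }
    ```
    
    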
  • tuple_difference_double

    Subtracts two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches AnotB object. Returns elements in the first sketch that are not in the second sketch.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_double

    public static Column tuple_difference_double(String columnName1, String columnName2)

    Subtracts two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches AnotB object. Returns elements in the first sketch that are not in the second sketch.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_integer

    Subtracts two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches AnotB object. Returns elements in the first sketch that are not in the second sketch.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_integer

    public static Column tuple_difference_integer(String columnName1, String columnName2)

    Subtracts two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches AnotB object. Returns elements in the first sketch that are not in the second sketch.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_double

    Intersects two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_double

    public static Column tuple_intersection_double(String columnName1, String columnName2)

    Intersects two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_double

    Intersects two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_double

    public static Column tuple_intersection_double(String columnName1, String columnName2, String mode)

    Intersects two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_double

    Intersects two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_integer

    Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_integer

    public static Column tuple_intersection_integer(String columnName1, String columnName2)

    Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_integer

    Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_integer

    public static Column tuple_intersection_integer(String columnName1, String columnName2, String mode)

    Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_integer

    Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
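    The four aggregation modes (sum, min, max, alwaysone) referenced by the tuple intersection and summary functions can be illustrated on a single pair of summary values in plain Java; this is a sketch of the mode semantics only, not of the Datasketches combining machinery:

    ```java
    public class TupleModeSketch {
        // Combine two summary values under a given aggregation mode.
        static long combine(String mode, long x, long y) {
            switch (mode) {
                case "sum":       return x + y;
                case "min":       return Math.min(x, y);
                case "max":       return Math.max(x, y);
                case "alwaysone": return 1L;
                default: throw new IllegalArgumentException("unknown mode: " + mode);
            }
        }

        public static void main(String[] args) {
            System.out.println(combine("sum", 3, 5));       // 8
            System.out.println(combine("min", 3, 5));       // 3
            System.out.println(combine("max", 3, 5));       // 5
            System.out.println(combine("alwaysone", 3, 5)); // 1
        }
    }
    ```
    
    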
  • tuple_sketch_estimate_double

    public static Column tuple_sketch_estimate_double(Column c)

    Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch with double summary data type.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_estimate_double

    public static Column tuple_sketch_estimate_double(String columnName)

    Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch with double summary data type.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_estimate_integer

    public static Column tuple_sketch_estimate_integer(Column c)

    Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch with integer summary data type.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_estimate_integer

    public static Column tuple_sketch_estimate_integer(String columnName)

    Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch with integer summary data type.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_double

    public static Column tuple_sketch_summary_double(Column c)

    Aggregates the summary values from a Datasketches TupleSketch with double summary data type, using the default aggregation mode of 'sum'.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_double

    public static Column tuple_sketch_summary_double(String columnName)

    Aggregates the summary values from a Datasketches TupleSketch with double summary data type, using the default aggregation mode of 'sum'.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_double

    Aggregates the summary values from a Datasketches TupleSketch with double summary data type. The mode parameter specifies the aggregation mode (sum, min, max, alwaysone).

    Parameters:
    c - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_double

    public static Column tuple_sketch_summary_double(String columnName, String mode)

    Aggregates the summary values from a Datasketches TupleSketch with double summary data type. The mode parameter specifies the aggregation mode (sum, min, max, alwaysone).

    Parameters:
    columnName - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_double

    Aggregates the summary values from a Datasketches TupleSketch with double summary data type. The mode parameter specifies the aggregation mode (sum, min, max, alwaysone).

    Parameters:
    c - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_integer

    public static Column tuple_sketch_summary_integer(Column c)

    Aggregates the summary values from a Datasketches TupleSketch with integer summary data type, using the default aggregation mode of 'sum'.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_integer

    public static Column tuple_sketch_summary_integer(String columnName)

    Aggregates the summary values from a Datasketches TupleSketch with integer summary data type, using the default aggregation mode of 'sum'.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_integer

    Aggregates the summary values from a Datasketches TupleSketch with integer summary data type. The mode parameter specifies the aggregation mode (sum, min, max, alwaysone).

    Parameters:
    c - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_integer

    public static Column tuple_sketch_summary_integer(String columnName, String mode)

    Aggregates the summary values from a Datasketches TupleSketch with integer summary data type. The mode parameter specifies the aggregation mode (sum, min, max, alwaysone).

    Parameters:
    columnName - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_summary_integer

    Aggregates the summary values from a Datasketches TupleSketch with integer summary data type. The mode parameter specifies the aggregation mode (sum, min, max, alwaysone).

    Parameters:
    c - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_theta_double

    public static Column tuple_sketch_theta_double(Column c)

    Returns the theta value (sampling rate) from a Datasketches TupleSketch with double summary data type. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_theta_double

    public static Column tuple_sketch_theta_double(String columnName)

    Returns the theta value (sampling rate) from a Datasketches TupleSketch with double summary data type. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_theta_integer

    public static Column tuple_sketch_theta_integer(Column c)

    Returns the theta value (sampling rate) from a Datasketches TupleSketch with integer summary data type. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0.

    Parameters:
    c - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_sketch_theta_integer

    public static Column tuple_sketch_theta_integer(String columnName)

    Returns the theta value (sampling rate) from a Datasketches TupleSketch with integer summary data type. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0.

    Parameters:
    columnName - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
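    The relationship between the theta value and the unique-count estimate can be sketched in plain Java: with theta as the effective sampling rate, the estimate is the number of retained entries divided by theta (an illustration of the standard theta-sketch estimator, not Spark's implementation):

    ```java
    public class ThetaEstimateSketch {
        // Unique-count estimate from retained entries and the sampling
        // rate theta (in (0.0, 1.0]); theta == 1.0 means exact mode.
        static double estimate(int retainedEntries, double theta) {
            return retainedEntries / theta;
        }

        public static void main(String[] args) {
            System.out.println(estimate(1000, 1.0));  // 1000.0 (exact mode)
            System.out.println(estimate(1000, 0.25)); // 4000.0
        }
    }
    ```
    
    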
  • tuple_union_double

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_double

    public static Column tuple_union_double(String columnName1, String columnName2)

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_double

    public static Column tuple_union_double(Column c1, Column c2, int lgNomEntries)

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_double

    public static Column tuple_union_double(String columnName1, String columnName2, int lgNomEntries)

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_double

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer and of the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_double

    public static Column tuple_union_double(String columnName1, String columnName2, int lgNomEntries, String mode)

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer and of the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_double

    Unions two binary representations of Datasketches TupleSketch objects with double summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer and of the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    public static Column tuple_union_integer(String columnName1, String columnName2)

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    public static Column tuple_union_integer(Column c1, Column c2, int lgNomEntries)

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    public static Column tuple_union_integer(String columnName1, String columnName2, int lgNomEntries)

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer and of the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    public static Column tuple_union_integer(String columnName1, String columnName2, int lgNomEntries, String mode)

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer and of the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_integer

    Unions two binary representations of Datasketches TupleSketch objects with integer summary data type in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries (log nominal entries) for the union buffer and of the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_theta_double

    Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with double summary data type in the input columns using a Datasketches AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_theta_double

    public static Column tuple_difference_theta_double(String columnName1, String columnName2)

    Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with double summary data type in the input columns using a Datasketches AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_theta_integer

    public static Column tuple_difference_theta_integer(Column c1, Column c2)

    Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with integer summary data type in the input columns using a Datasketches AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_difference_theta_integer

    public static Column tuple_difference_theta_integer(String columnName1, String columnName2)

    Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with integer summary data type in the input columns using a Datasketches AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_double

    public static Column tuple_intersection_theta_double(Column c1, Column c2)

    Intersects the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_double

    public static Column tuple_intersection_theta_double(String columnName1, String columnName2)

    Intersects the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_double

    Intersects the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_double

    public static Column tuple_intersection_theta_double(String columnName1, String columnName2, String mode)

    Intersects the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_double

    Intersects the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_integer

    public static Column tuple_intersection_theta_integer(Column c1, Column c2)

    Intersects the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. Numeric summaries are aggregated during intersection using the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_integer

    public static Column tuple_intersection_theta_integer(String columnName1, String columnName2)

    Intersects the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). It is configured with the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_integer

    public static Column tuple_intersection_theta_integer(Column c1, Column c2, String mode)

    Intersects the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_intersection_theta_integer

    public static Column tuple_intersection_theta_integer(String columnName1, String columnName2, String mode)

    Intersects the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Intersection object. The mode parameter specifies the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_double

    public static Column tuple_union_theta_double(Column c1, Column c2)

    Unions the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_double

    public static Column tuple_union_theta_double(String columnName1, String columnName2)

    Unions the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_double

    public static Column tuple_union_theta_double(Column c1, Column c2, int lgNomEntries)

    Unions the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_double

    public static Column tuple_union_theta_double(String columnName1, String columnName2, int lgNomEntries)

    Unions the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_double

    public static Column tuple_union_theta_double(Column c1, Column c2, int lgNomEntries, String mode)

    Unions the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_double

    public static Column tuple_union_theta_double(String columnName1, String columnName2, int lgNomEntries, String mode)

    Unions the binary representation of a Datasketches TupleSketch with double summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_integer

    public static Column tuple_union_theta_integer(Column c1, Column c2)

    Unions the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_integer

    public static Column tuple_union_theta_integer(String columnName1, String columnName2)

    Unions the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It is configured with the default values of 12 for lgNomEntries and 'sum' for mode.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_integer

    public static Column tuple_union_theta_integer(Column c1, Column c2, int lgNomEntries)

    Unions the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_integer

    public static Column tuple_union_theta_integer(String columnName1, String columnName2, int lgNomEntries)

    Unions the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer. It uses the default mode of 'sum'.

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_integer

    public static Column tuple_union_theta_integer(Column c1, Column c2, int lgNomEntries, String mode)

    Unions the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    c1 - (undocumented)
    c2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • tuple_union_theta_integer

    public static Column tuple_union_theta_integer(String columnName1, String columnName2, int lgNomEntries, String mode)

    Unions the binary representation of a Datasketches TupleSketch with integer summary data type with a Datasketches ThetaSketch in the input columns using a Datasketches Union object. It allows the configuration of lgNomEntries log nominal entries for the union buffer and the aggregation mode for numeric summaries (sum, min, max, alwaysone).

    Parameters:
    columnName1 - (undocumented)
    columnName2 - (undocumented)
    lgNomEntries - (undocumented)
    mode - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
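    The mode values govern how two numeric summaries are combined when the same hashed key appears in both inputs. A hypothetical pure-Python sketch of that combination step (the function and map names below are illustrative, not the Datasketches API):

    ```python
    # Illustration only: how the four summary aggregation modes
    # (sum, min, max, alwaysone) could resolve a key collision when
    # two tuple-sketch summary maps are unioned. Not the real API.

    def combine_summaries(a: int, b: int, mode: str) -> int:
        """Combine two numeric summaries for the same key under a mode."""
        if mode == "sum":
            return a + b
        if mode == "min":
            return min(a, b)
        if mode == "max":
            return max(a, b)
        if mode == "alwaysone":
            return 1
        raise ValueError(f"unknown mode: {mode}")

    def union_summary_maps(left: dict, right: dict, mode: str) -> dict:
        """Union two {key: summary} maps, combining summaries on collision."""
        out = dict(left)
        for key, value in right.items():
            out[key] = combine_summaries(out[key], value, mode) if key in out else value
        return out

    merged = union_summary_maps({"a": 3, "b": 5}, {"b": 2, "c": 7}, "sum")
    # key "b" collides, so its summaries combine: 5 + 2 = 7 under "sum"
    ```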
  • kll_sketch_to_string_bigint

    public static Column kll_sketch_to_string_bigint(Column e)

    Returns a string with human readable summary information about the KLL bigint sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_to_string_float

    public static Column kll_sketch_to_string_float(Column e)

    Returns a string with human readable summary information about the KLL float sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_to_string_double

    public static Column kll_sketch_to_string_double(Column e)

    Returns a string with human readable summary information about the KLL double sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_n_bigint

    public static Column kll_sketch_get_n_bigint(Column e)

    Returns the number of items collected in the KLL bigint sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_n_float

    public static Column kll_sketch_get_n_float(Column e)

    Returns the number of items collected in the KLL float sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_n_double

    public static Column kll_sketch_get_n_double(Column e)

    Returns the number of items collected in the KLL double sketch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_merge_bigint

    public static Column kll_sketch_merge_bigint(Column left, Column right)

    Merges two KLL bigint sketch buffers together into one.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_merge_float

    public static Column kll_sketch_merge_float(Column left, Column right)

    Merges two KLL float sketch buffers together into one.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_merge_double

    public static Column kll_sketch_merge_double(Column left, Column right)

    Merges two KLL double sketch buffers together into one.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_quantile_bigint

    public static Column kll_sketch_get_quantile_bigint(Column sketch, Column rank)

    Extracts a quantile value from a KLL bigint sketch given an input rank value. The rank can be a single value or an array.

    Parameters:
    sketch - (undocumented)
    rank - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_quantile_float

    public static Column kll_sketch_get_quantile_float(Column sketch, Column rank)

    Extracts a quantile value from a KLL float sketch given an input rank value. The rank can be a single value or an array.

    Parameters:
    sketch - (undocumented)
    rank - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_quantile_double

    public static Column kll_sketch_get_quantile_double(Column sketch, Column rank)

    Extracts a quantile value from a KLL double sketch given an input rank value. The rank can be a single value or an array.

    Parameters:
    sketch - (undocumented)
    rank - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_rank_bigint

    public static Column kll_sketch_get_rank_bigint(Column sketch, Column quantile)

    Extracts a rank value from a KLL bigint sketch given an input quantile value. The quantile can be a single value or an array.

    Parameters:
    sketch - (undocumented)
    quantile - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_rank_float

    public static Column kll_sketch_get_rank_float(Column sketch, Column quantile)

    Extracts a rank value from a KLL float sketch given an input quantile value. The quantile can be a single value or an array.

    Parameters:
    sketch - (undocumented)
    quantile - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • kll_sketch_get_rank_double

    public static Column kll_sketch_get_rank_double(Column sketch, Column quantile)

    Extracts a rank value from a KLL double sketch given an input quantile value. The quantile can be a single value or an array.

    Parameters:
    sketch - (undocumented)
    quantile - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
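    The quantile and rank getters are inverses of each other. A pure-Python sketch computing both exactly on a small sorted sample (a KLL sketch returns approximations of these values within error bounds; the helper names below are illustrative):

    ```python
    # Exact rank/quantile computation on a small sorted list, to
    # illustrate the relationship the KLL getters expose: rank is the
    # fraction of items at or below a value, and quantile inverts it.
    from bisect import bisect_right

    def get_rank(sorted_items: list, quantile_value: float) -> float:
        """Normalized rank in [0, 1]: fraction of items <= quantile_value."""
        return bisect_right(sorted_items, quantile_value) / len(sorted_items)

    def get_quantile(sorted_items: list, rank: float) -> float:
        """Item at the requested normalized rank (inclusive convention)."""
        index = min(int(rank * len(sorted_items)), len(sorted_items) - 1)
        return sorted_items[index]

    data = sorted(range(1, 101))      # the values 1..100
    get_rank(data, 50)                # half of the items are <= 50
    get_quantile(data, 0.5)           # the median under this convention
    ```

    Note that rank conventions (inclusive vs. exclusive) vary between sketch libraries; this sketch uses an inclusive rank.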
  • add_months

    public static Column add_months(Column startDate, int numMonths)

    Returns the date that is numMonths after startDate.

    Parameters:
    startDate - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    numMonths - The number of months to add to startDate, can be negative to subtract months
    Returns:
    A date, or null if startDate was a string that could not be cast to a date
    Since:
    1.5.0
  • add_months

    public static Column add_months(Column startDate, Column numMonths)

    Returns the date that is numMonths after startDate.

    Parameters:
    startDate - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    numMonths - A column of the number of months to add to startDate, can be negative to subtract months
    Returns:
    A date, or null if startDate was a string that could not be cast to a date
    Since:
    3.0.0
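    add_months clamps the day of month to the last valid day of the target month. A pure-Python sketch of that behavior (not Spark itself, and without the string/timestamp casting described above):

    ```python
    # Month arithmetic with day-of-month clamping: adding one month to
    # Jan 31 lands on Feb 28 because February has no 31st.
    import calendar
    from datetime import date

    def add_months(start: date, num_months: int) -> date:
        """Add num_months (possibly negative) to start, clamping the day."""
        total = start.year * 12 + (start.month - 1) + num_months
        year, month = divmod(total, 12)
        month += 1
        day = min(start.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)

    add_months(date(2015, 1, 31), 1)    # -> 2015-02-28 (day clamped)
    add_months(date(2015, 3, 31), -1)   # -> 2015-02-28 (negative subtracts)
    ```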
  • curdate

    public static Column curdate()

    Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • current_date

    public static Column current_date()

    Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.

    Returns:
    (undocumented)
    Since:
    1.5.0
  • current_timezone

    public static Column current_timezone()

    Returns the current session local timezone.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • current_timestamp

    public static Column current_timestamp()

    Returns the current timestamp at the start of query evaluation as a timestamp column. All calls of current_timestamp within the same query return the same value.

    Returns:
    (undocumented)
    Since:
    1.5.0
  • now

    public static Column now()

    Returns the current timestamp at the start of query evaluation.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • localtimestamp

    public static Column localtimestamp()

    Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column. All calls of localtimestamp within the same query return the same value.

    Returns:
    (undocumented)
    Since:
    3.3.0
  • date_format

    public static Column date_format(Column dateExpr, String format)

    Converts a date/timestamp/string to a string value in the format specified by the date format given by the second argument.

    See Datetime Patterns for valid date and time format patterns

    Parameters:
    dateExpr - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    format - A pattern dd.MM.yyyy would return a string like 18.03.1993
    Returns:
    A string, or null if dateExpr was a string that could not be cast to a timestamp
    Throws:
    IllegalArgumentException - if the format pattern is invalid
    Since:
    1.5.0
    Note:
    Use specialized functions like year(org.apache.spark.sql.Column) whenever possible as they benefit from a specialized implementation.
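    For comparison, the dd.MM.yyyy example can be reproduced with Python's strftime; note that Spark uses Java DateTimeFormatter-style patterns, so dd.MM.yyyy corresponds to %d.%m.%Y below:

    ```python
    # Reproducing the documented dd.MM.yyyy -> 18.03.1993 example with
    # the Python standard library (pattern syntax differs from Spark's).
    from datetime import date

    date(1993, 3, 18).strftime("%d.%m.%Y")   # -> '18.03.1993'
    ```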
  • date_add

    public static Column date_add(Column start, int days)

    Returns the date that is days days after start

    Parameters:
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    days - The number of days to add to start, can be negative to subtract days
    Returns:
    A date, or null if start was a string that could not be cast to a date
    Since:
    1.5.0
  • date_add

    public static Column date_add(Column start, Column days)

    Returns the date that is days days after start

    Parameters:
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    days - A column of the number of days to add to start, can be negative to subtract days
    Returns:
    A date, or null if start was a string that could not be cast to a date
    Since:
    3.0.0
  • dateadd

    public static Column dateadd(Column start, Column days)

    Returns the date that is days days after start

    Parameters:
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    days - A column of the number of days to add to start, can be negative to subtract days
    Returns:
    A date, or null if start was a string that could not be cast to a date
    Since:
    3.5.0
  • date_sub

    public static Column date_sub(Column start, int days)

    Returns the date that is days days before start

    Parameters:
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    days - The number of days to subtract from start, can be negative to add days
    Returns:
    A date, or null if start was a string that could not be cast to a date
    Since:
    1.5.0
  • date_sub

    public static Column date_sub(Column start, Column days)

    Returns the date that is days days before start

    Parameters:
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    days - A column of the number of days to subtract from start, can be negative to add days
    Returns:
    A date, or null if start was a string that could not be cast to a date
    Since:
    3.0.0
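    The day arithmetic of date_add and date_sub can be sketched with the standard library (casting of string inputs is omitted here):

    ```python
    # date_add / date_sub semantics: plain day offsets, where a
    # negative count flips the direction of the shift.
    from datetime import date, timedelta

    def date_add(start: date, days: int) -> date:
        return start + timedelta(days=days)

    def date_sub(start: date, days: int) -> date:
        return start - timedelta(days=days)

    date_add(date(2015, 7, 27), 5)    # -> 2015-08-01
    date_sub(date(2015, 7, 27), -5)   # -> 2015-08-01 (negative adds days)
    ```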
  • datediff

    public static Column datediff(Column end, Column start)

    Returns the number of days from start to end.

    Only considers the date part of the input. For example:

    
     datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59")
     // returns 1
     
    Parameters:
    end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    Returns:
    An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
    Since:
    1.5.0
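    The date-part-only behavior can be sketched in pure Python: one second short of a full day still counts as one day, because only the date components are compared:

    ```python
    # Reproducing the example above: the time-of-day parts are
    # discarded before the day difference is computed.
    from datetime import datetime

    def datediff(end: datetime, start: datetime) -> int:
        return (end.date() - start.date()).days

    datediff(datetime(2018, 1, 10, 0, 0, 0),
             datetime(2018, 1, 9, 23, 59, 59))   # -> 1
    ```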
  • date_diff

    public static Column date_diff(Column end, Column start)

    Returns the number of days from start to end.

    Only considers the date part of the input. For example:

    
     datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59")
     // returns 1
     
    Parameters:
    end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    Returns:
    An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
    Since:
    3.5.0
  • date_from_unix_date

    public static Column date_from_unix_date(Column days)

    Create date from the number of days since 1970-01-01.

    Parameters:
    days - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • year

    public static Column year(Column e)

    Extracts the year as an integer from a given date/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • quarter

    public static Column quarter(Column e)

    Extracts the quarter as an integer from a given date/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • month

    public static Column month(Column e)

    Extracts the month as an integer from a given date/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • dayofweek

    public static Column dayofweek(Column e)

    Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    2.3.0
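    The 1 = Sunday through 7 = Saturday numbering can be derived from Python's isoweekday() (1 = Monday through 7 = Sunday) as a cross-check:

    ```python
    # Mapping ISO day numbering (Mon=1..Sun=7) onto dayofweek's
    # Sun=1..Sat=7 convention.
    from datetime import date

    def dayofweek(d: date) -> int:
        return d.isoweekday() % 7 + 1

    dayofweek(date(2018, 1, 7))   # a Sunday   -> 1
    dayofweek(date(2018, 1, 6))   # a Saturday -> 7
    ```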
  • dayofmonth

    public static Column dayofmonth(Column e)

    Extracts the day of the month as an integer from a given date/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • day

    public static Column day(Column e)

    Extracts the day of the month as an integer from a given date/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    3.5.0
  • dayofyear

    public static Column dayofyear(Column e)

    Extracts the day of the year as an integer from a given date/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • hour

    public static Column hour(Column e)

    Extracts the hours as an integer from a given date/time/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • extract

    public static Column extract(Column field, Column source)

    Extracts a part of the date/timestamp or interval source.

    Parameters:
    field - selects which part of the source should be extracted.
    source - a date/timestamp or interval column from where field should be extracted.
    Returns:
    a part of the date/timestamp or interval source
    Since:
    3.5.0
  • date_part

    public static Column date_part(Column field, Column source)

    Extracts a part of the date/timestamp or interval source.

    Parameters:
    field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function extract.
    source - a date/timestamp or interval column from where field should be extracted.
    Returns:
    a part of the date/timestamp or interval source
    Since:
    3.5.0
  • datepart

    public static Column datepart(Column field, Column source)

    Extracts a part of the date/timestamp or interval source.

    Parameters:
    field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function EXTRACT.
    source - a date/timestamp or interval column from where field should be extracted.
    Returns:
    a part of the date/timestamp or interval source
    Since:
    3.5.0
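    A minimal sketch of field extraction from a timestamp; Spark's extract/date_part/datepart accept many more field names than shown here:

    ```python
    # Dispatching a case-insensitive field name to the matching
    # timestamp component. Illustration only, not Spark's field list.
    from datetime import datetime

    def date_part(field: str, source: datetime):
        getters = {
            "year": lambda d: d.year,
            "month": lambda d: d.month,
            "day": lambda d: d.day,
            "hour": lambda d: d.hour,
            "minute": lambda d: d.minute,
            "second": lambda d: d.second,
        }
        return getters[field.lower()](source)

    date_part("YEAR", datetime(2019, 5, 3, 14, 30, 0))   # -> 2019
    date_part("hour", datetime(2019, 5, 3, 14, 30, 0))   # -> 14
    ```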
  • last_day

    public static Column last_day(Column e)

    Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of July 2015.

    Parameters:
    e - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    Returns:
    A date, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
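    The month-end lookup can be sketched with calendar.monthrange, reproducing the 2015-07-27 example:

    ```python
    # last_day: replace the day component with the month's final day.
    import calendar
    from datetime import date

    def last_day(d: date) -> date:
        return date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])

    last_day(date(2015, 7, 27))   # -> 2015-07-31
    ```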
  • minute

    public static Column minute(Column e)

    Extracts the minutes as an integer from a given date/time/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
  • weekday

    public static Column weekday(Column e)

    Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_date

    public static Column make_date(Column year, Column month, Column day)

    Parameters:
    year - (undocumented)
    month - (undocumented)
    day - (undocumented)
    Returns:
    A date created from year, month and day fields.
    Since:
    3.3.0
  • months_between

    public static Column months_between(Column end, Column start)

    Returns the number of months between dates start and end.

    A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.

    For example:

    
     months_between("2017-11-14", "2017-07-14")  // returns 4.0
     months_between("2017-01-01", "2017-01-10")  // returns -0.29032258
     months_between("2017-06-01", "2017-06-16 12:00:00")  // returns -0.5
     
    Parameters:
    end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    Returns:
    A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start
    Since:
    1.5.0
  • months_between

    public static Column months_between(Column end, Column start, boolean roundOff)

    Returns number of months between dates end and start. If roundOff is set to true, the result is rounded off to 8 digits; it is not rounded otherwise.

    Parameters:
    end - (undocumented)
    start - (undocumented)
    roundOff - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
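    The date-only part of the months_between rule can be sketched in Python: a whole number when the days of month match (or both dates are month-ends), otherwise the day difference divided by an assumed 31-day month and rounded to 8 digits. Time-of-day handling (as in the -0.5 example) is omitted:

    ```python
    # Date-only sketch of the months_between rule described above.
    import calendar
    from datetime import date

    def months_between(end: date, start: date) -> float:
        whole = (end.year - start.year) * 12 + (end.month - start.month)
        end_is_last = end.day == calendar.monthrange(end.year, end.month)[1]
        start_is_last = start.day == calendar.monthrange(start.year, start.month)[1]
        if end.day == start.day or (end_is_last and start_is_last):
            return float(whole)          # same day of month: whole number
        return round(whole + (end.day - start.day) / 31, 8)

    months_between(date(2017, 11, 14), date(2017, 7, 14))   # -> 4.0
    months_between(date(2017, 1, 1), date(2017, 1, 10))     # -> -0.29032258
    ```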
  • next_day

    public static Column next_day(Column date, String dayOfWeek)

    Returns the first date which is later than the value of the date column that is on the specified day of the week.

    For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.

    Parameters:
    date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    dayOfWeek - Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
    Returns:
    A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
    Since:
    1.5.0
  • next_day

    public static Column next_day(Column date, Column dayOfWeek)

    Returns the first date which is later than the value of the date column that is on the specified day of the week.

    For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.

    Parameters:
    date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    dayOfWeek - A column of the day of week. Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
    Returns:
    A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
    Since:
    3.2.0
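    The "first strictly later date on the requested weekday" computation can be sketched as follows, reproducing the 2015-07-27 example (only the three-letter day prefixes are matched here):

    ```python
    # next_day: 2015-07-27 is a Monday, so the next Sunday is
    # 2015-08-02. The "- 1 ... + 1" keeps the result strictly later,
    # so asking for the same weekday jumps a full week ahead.
    from datetime import date, timedelta

    DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

    def next_day(d: date, day_of_week: str) -> date:
        target = DAYS.index(day_of_week.lower()[:3])   # 0=Mon .. 6=Sun
        delta = (target - d.weekday() - 1) % 7 + 1     # always in 1..7
        return d + timedelta(days=delta)

    next_day(date(2015, 7, 27), "Sunday")   # -> 2015-08-02
    ```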
  • second

    public static Column second(Column e)

    Extracts the seconds as an integer from a given date/time/timestamp/string.

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a timestamp
    Since:
    1.5.0
  • weekofyear

    public static Column weekofyear(Column e)

    Extracts the week number as an integer from a given date/timestamp/string.

    A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601

    Parameters:
    e - (undocumented)
    Returns:
    An integer, or null if the input was a string that could not be cast to a date
    Since:
    1.5.0
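    This ISO 8601 numbering matches Python's isocalendar(), which can serve as a quick cross-check:

    ```python
    # ISO week numbering: weeks start on Monday, and week 1 is the
    # first week containing at least four days of the year.
    from datetime import date

    date(2016, 1, 1).isocalendar()[1]   # a Friday -> 53 (still 2015's last ISO week)
    date(2016, 1, 4).isocalendar()[1]   # the first Monday of 2016 -> 1
    ```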
  • from_unixtime

    public static Column from_unixtime(Column ut)

    Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.

    Parameters:
    ut - A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
    Returns:
    A string, or null if the input was a string that could not be cast to a long
    Since:
    1.5.0
  • from_unixtime

    public static Column from_unixtime(Column ut, String f)

    Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.

    See Datetime Patterns for valid date and time format patterns

    Parameters:
    ut - A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
    f - A date time pattern that the input will be formatted to
    Returns:
    A string, or null if ut was a string that could not be cast to a long or f was an invalid date time pattern
    Since:
    1.5.0
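    A pure-Python sketch, pinned to UTC for a deterministic result (Spark uses the current session time zone, and Java-style patterns rather than strftime codes):

    ```python
    # from_unixtime: seconds since the epoch -> formatted string.
    # Negative inputs address moments before 1970-01-01.
    from datetime import datetime, timezone

    def from_unixtime(ut: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
        return datetime.fromtimestamp(ut, tz=timezone.utc).strftime(fmt)

    from_unixtime(0)          # -> '1970-01-01 00:00:00'
    from_unixtime(-86400)     # -> '1969-12-31 00:00:00' (before the epoch)
    ```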
  • unix_timestamp

    public static Column unix_timestamp()

    Returns the current Unix timestamp (in seconds) as a long.

    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
  • unix_timestamp

    public static Column unix_timestamp(Column s)

    Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default timezone and the default locale.

    Parameters:
    s - A date, timestamp or string. If a string, the data must be in the yyyy-MM-dd HH:mm:ss format
    Returns:
    A long, or null if the input was a string not of the correct format
    Since:
    1.5.0
  • unix_timestamp

    public static Column unix_timestamp(Column s, String p)

    Converts a time string with the given pattern to a Unix timestamp (in seconds).

    See Datetime Patterns for valid date and time format patterns

    Parameters:
    s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    p - A date time pattern detailing the format of s when s is a string
    Returns:
    A long, or null if s was a string that could not be cast to a date or p was an invalid format
    Since:
    1.5.0
  • to_time

    Parses a string value to a time value.

    Parameters:
    str - A string to be parsed to time.
    Returns:
    A time, or raises an error if the input is malformed.
    Since:
    4.1.0
  • to_time

    Parses a string value to a time value.

    See Datetime Patterns for valid time format patterns.

    Parameters:
    str - A string to be parsed to time.
    format - A time format pattern to follow.
    Returns:
    A time, or raises an error if the input is malformed.
    Since:
    4.1.0
  • to_timestamp

    Converts to a timestamp by casting rules to TimestampType.

    Parameters:
    s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    Returns:
    A timestamp, or null if the input was a string that could not be cast to a timestamp
    Since:
    2.2.0
  • to_timestamp

    Converts time string with the given pattern to timestamp.

    See Datetime Patterns for valid date and time format patterns

    Parameters:
    s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    fmt - A date time pattern detailing the format of s when s is a string
    Returns:
    A timestamp, or null if s was a string that could not be cast to a timestamp or fmt was an invalid format
    Since:
    2.2.0
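    The parse-or-null behavior can be sketched with plain java.time (hypothetical helper; Spark's datetime pattern dialect differs slightly from DateTimeFormatter's, so this is only an approximation):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class ToTimestampSketch {
    // Parses s with fmt; returns null on malformed input, as to_timestamp does.
    static LocalDateTime toTimestamp(String s, String fmt) {
        try {
            return LocalDateTime.parse(s, DateTimeFormatter.ofPattern(fmt));
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```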
  • try_to_time

    Parses a string value to a time value.

    Parameters:
    str - A string to be parsed to time.
    Returns:
    A time, or null if the input is malformed.
    Since:
    4.1.0
  • try_to_time

    Parses a string value to a time value.

    See Datetime Patterns for valid time format patterns.

    Parameters:
    str - A string to be parsed to time.
    format - A time format pattern to follow.
    Returns:
    A time, or null if the input is malformed.
    Since:
    4.1.0
  • try_to_timestamp

    Parses s with the given format to a timestamp. The function always returns null on invalid input, whether or not ANSI SQL mode is enabled. The result data type is consistent with the value of the configuration spark.sql.timestampType.

    Parameters:
    s - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_to_timestamp

    Parses s to a timestamp. The function always returns null on invalid input, whether or not ANSI SQL mode is enabled. It follows casting rules to a timestamp. The result data type is consistent with the value of the configuration spark.sql.timestampType.

    Parameters:
    s - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_date

    Converts the column into DateType by casting rules to DateType.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • to_date

    Converts the column into a DateType with a specified format

    See Datetime Patterns for valid date and time format patterns

    Parameters:
    e - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    fmt - A date time pattern detailing the format of e when e is a string
    Returns:
    A date, or null if e was a string that could not be cast to a date or fmt was an invalid format
    Since:
    2.2.0
  • try_to_date

    This is a special version of to_date that performs the same operation, but returns a NULL value instead of raising an error if date cannot be created.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_to_date

    This is a special version of to_date that performs the same operation, but returns a NULL value instead of raising an error if date cannot be created.

    Parameters:
    e - (undocumented)
    fmt - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • unix_date

    Returns the number of days since 1970-01-01.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • unix_micros

    Returns the number of microseconds since 1970-01-01 00:00:00 UTC.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • unix_millis

    Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • unix_seconds

    Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • trunc

    Returns date truncated to the unit specified by the format.

    For example, trunc("2018-11-19 12:01:19", "year") returns 2018-01-01

    Parameters:
    date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    format - 'year', 'yyyy', 'yy' to truncate by year, or 'month', 'mon', 'mm' to truncate by month. Other options are: 'week', 'quarter'
    Returns:
    A date, or null if date was a string that could not be cast to a date or format was an invalid value
    Since:
    1.5.0
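    A minimal sketch of the year/month truncation rules on plain LocalDate values (the helper name is hypothetical):

```java
import java.time.LocalDate;

class TruncSketch {
    // Truncates a date by the year/month format aliases; null for bad formats.
    static LocalDate trunc(LocalDate date, String format) {
        switch (format.toLowerCase()) {
            case "year": case "yyyy": case "yy":
                return date.withDayOfYear(1);
            case "month": case "mon": case "mm":
                return date.withDayOfMonth(1);
            default:
                return null; // Spark likewise yields null for an invalid format
        }
    }
}
```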
  • date_trunc

    Returns timestamp truncated to the unit specified by the format.

    For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00

    Parameters:
    format - 'year', 'yyyy', 'yy' to truncate by year; 'month', 'mon', 'mm' to truncate by month; 'day', 'dd' to truncate by day. Other options are: 'microsecond', 'millisecond', 'second', 'minute', 'hour', 'week', 'quarter'
    timestamp - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    Returns:
    A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value
    Since:
    2.3.0
  • from_utc_timestamp

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

    Parameters:
    ts - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    tz - A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
    Returns:
    A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
    Since:
    1.5.0
  • from_utc_timestamp

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

    Parameters:
    ts - (undocumented)
    tz - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • to_utc_timestamp

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

    Parameters:
    ts - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
    tz - A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
    Returns:
    A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
    Since:
    1.5.0
  • to_utc_timestamp

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

    Parameters:
    ts - (undocumented)
    tz - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • window

    Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:

    
       val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
       df.groupBy(window($"timestamp", "1 minute", "10 seconds", "5 seconds"), $"stockId")
         .agg(mean("price"))
     

    The windows will look like:

    
       09:00:05-09:01:05
       09:00:15-09:01:15
       09:00:25-09:01:25 ...
     

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    Parameters:
    timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
    windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
    slideDuration - A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
    startTime - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
    Returns:
    (undocumented)
    Since:
    2.0.0
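    The bucketing arithmetic can be sketched as follows: a row at time t falls into every window whose start lies in (t - windowDuration, t], and the latest such start is aligned to slideDuration with the given startTime offset. The helper below is hypothetical, with all values in seconds:

```java
class SlidingWindowSketch {
    // Start of the most recent sliding window containing time t.
    // Math.floorMod keeps the result correct for pre-epoch (negative) times.
    static long lastWindowStart(long t, long slideSeconds, long startSeconds) {
        return t - Math.floorMod(t - startSeconds, slideSeconds);
    }
}
```

    For the example above (slide 10 seconds, startTime 5 seconds), a row at 09:00:37 lands most recently in the window beginning at 09:00:35.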
  • window

    Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The windows start beginning at 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute window every 10 seconds:

    
       val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
       df.groupBy(window($"timestamp", "1 minute", "10 seconds"), $"stockId")
         .agg(mean("price"))
     

    The windows will look like:

    
       09:00:00-09:01:00
       09:00:10-09:01:10
       09:00:20-09:01:20 ...
     

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    Parameters:
    timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
    windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
    slideDuration - A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
    Returns:
    (undocumented)
    Since:
    2.0.0
  • window

    Generates tumbling time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The windows start beginning at 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute tumbling window:

    
       val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
       df.groupBy(window($"timestamp", "1 minute"), $"stockId")
         .agg(mean("price"))
     

    The windows will look like:

    
       09:00:00-09:01:00
       09:01:00-09:02:00
       09:02:00-09:03:00 ...
     

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    Parameters:
    timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
    windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.
    Returns:
    (undocumented)
    Since:
    2.0.0
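    For the tumbling case the arithmetic reduces to epoch-aligned buckets; a hypothetical sketch in seconds:

```java
class TumblingWindowSketch {
    // Epoch-aligned tumbling bucket start for time t, window width in seconds.
    static long tumblingStart(long t, long width) {
        return t - Math.floorMod(t, width);
    }
}
```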
  • window_time

    public static Column window_time(Column windowColumn)

    Extracts the event time from the window column.

    The window column is of StructType { start: Timestamp, end: Timestamp } where start is inclusive and end is exclusive. Since event time can support microsecond precision, window_time(window) = window.end - 1 microsecond.

    Parameters:
    windowColumn - The window column (typically produced by window aggregation) of type StructType { start: Timestamp, end: Timestamp }
    Returns:
    (undocumented)
    Since:
    3.4.0
  • session_window

    public static Column session_window(Column timeColumn, String gapDuration)

    Generates session window given a timestamp specifying column.

    A session window is a dynamic window: its length varies according to the given inputs. The length of a session window is defined as "the timestamp of the latest input of the session + gap duration", so when new inputs are bound to the current session window, the end time of the session window can be extended accordingly.

    Windows can support microsecond precision. A gapDuration on the order of months is not supported.

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    Parameters:
    timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
    gapDuration - A string specifying the timeout of the session, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.
    Returns:
    (undocumented)
    Since:
    3.2.0
  • session_window

    public static Column session_window(Column timeColumn, Column gapDuration)

    Generates session window given a timestamp specifying column.

    A session window is a dynamic window: its length varies according to the given inputs. For a static gap duration, the length of a session window is defined as "the timestamp of the latest input of the session + gap duration", so when new inputs are bound to the current session window, the end time of the session window can be extended accordingly.

    Besides a static gap duration value, users can also provide an expression that specifies the gap duration dynamically based on the input row. With a dynamic gap duration, the closing of a session window no longer depends only on the latest input. A session window's range is the union of all events' ranges, which are determined by the event start time and the gap duration evaluated during query execution. Note that rows with a negative or zero gap duration are filtered out of the aggregation.

    Windows can support microsecond precision. A gapDuration on the order of months is not supported.

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    Parameters:
    timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
    gapDuration - A column specifying the timeout of the session. It could be static value, e.g. 10 minutes, 1 second, or an expression/UDF that specifies gap duration dynamically based on the input row.
    Returns:
    (undocumented)
    Since:
    3.2.0
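    The static-gap assignment rule described above can be sketched on plain event times (hypothetical helper; times and gap in seconds):

```java
import java.util.ArrayList;
import java.util.List;

class SessionWindowSketch {
    // Assigns sorted event times to sessions with a static gap: an event at or
    // before the current session's end extends the session, otherwise it opens
    // a new one ending gap seconds after the event.
    static List<long[]> sessionize(long[] sortedTimes, long gap) {
        List<long[]> sessions = new ArrayList<>();
        for (long t : sortedTimes) {
            if (!sessions.isEmpty() && t <= sessions.get(sessions.size() - 1)[1]) {
                long[] cur = sessions.get(sessions.size() - 1);
                cur[1] = Math.max(cur[1], t + gap);
            } else {
                sessions.add(new long[] { t, t + gap });
            }
        }
        return sessions;
    }
}
```

    Events at 0, 5, and 30 seconds with a 10-second gap produce two sessions, [0, 15) and [30, 40).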
  • timestamp_seconds

    Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.1.0
  • timestamp_millis

    Creates timestamp from the number of milliseconds since UTC epoch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • timestamp_micros

    Creates timestamp from the number of microseconds since UTC epoch.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • timestamp_diff

    Returns the difference between the two timestamps, measured in the specified unit, truncating the fractional part.

    Parameters:
    unit - (undocumented)
    start - (undocumented)
    end - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • timestamp_add

    Adds the specified number of units to the given timestamp.

    Parameters:
    unit - (undocumented)
    quantity - (undocumented)
    ts - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • time_diff

    Returns the difference between two times, measured in the specified unit. Throws a SparkIllegalArgumentException if the specified unit is not supported.

    Parameters:
    unit - A STRING representing the unit of the time difference. Supported units are: "HOUR", "MINUTE", "SECOND", "MILLISECOND", and "MICROSECOND". The unit is case-insensitive.
    start - A starting TIME.
    end - An ending TIME.
    Returns:
    The difference between end and start times, measured in specified units.
    Since:
    4.1.0
    Note:
    If any of the inputs is NULL, the result is NULL.
  • time_trunc

    Returns time truncated to the unit.

    Parameters:
    unit - A STRING representing the unit to truncate the time to. Supported units are: "HOUR", "MINUTE", "SECOND", "MILLISECOND", and "MICROSECOND". The unit is case-insensitive.
    time - A TIME to truncate.
    Returns:
    A TIME truncated to the specified unit.
    Throws:
    IllegalArgumentException - If the unit is not supported.
    Since:
    4.1.0
    Note:
    If any of the inputs is NULL, the result is NULL.
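    The truncation can be sketched on a time-of-day expressed as microseconds since midnight (hypothetical helper; Spark's TIME type is modeled here as a plain long):

```java
class TimeTruncSketch {
    // Truncates a time-of-day in microseconds since midnight to the unit.
    static long timeTrunc(String unit, long micros) {
        long step;
        switch (unit.toUpperCase()) {
            case "HOUR":        step = 3_600_000_000L; break;
            case "MINUTE":      step = 60_000_000L; break;
            case "SECOND":      step = 1_000_000L; break;
            case "MILLISECOND": step = 1_000L; break;
            case "MICROSECOND": step = 1L; break;
            default: throw new IllegalArgumentException("unsupported unit: " + unit);
        }
        return micros - micros % step;
    }
}
```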
  • time_from_seconds

    Creates a TIME from the number of seconds since midnight.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • time_from_millis

    Creates a TIME from the number of milliseconds since midnight.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • time_from_micros

    Creates a TIME from the number of microseconds since midnight.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • time_to_seconds

    Extracts the number of seconds (including fractional seconds) from a TIME value. Returns a DECIMAL(14,6) to preserve microsecond precision.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • time_to_millis

    Extracts the number of milliseconds since midnight from a TIME value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • time_to_micros

    Extracts the number of microseconds since midnight from a TIME value.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • to_timestamp_ltz

    Parses the timestamp expression with the format expression to a timestamp with local time zone. Returns null with invalid input.

    Parameters:
    timestamp - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_timestamp_ltz

    public static Column to_timestamp_ltz(Column timestamp)

    Parses the timestamp expression with the default format to a timestamp with local time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.

    Parameters:
    timestamp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_timestamp_ntz

    Parses the timestamp expression with the format expression to a timestamp without time zone. Returns null with invalid input.

    Parameters:
    timestamp - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_timestamp_ntz

    public static Column to_timestamp_ntz(Column timestamp)

    Parses the timestamp expression with the default format to a timestamp without time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.

    Parameters:
    timestamp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_unix_timestamp

    Returns the UNIX timestamp of the given time.

    Parameters:
    timeExp - (undocumented)
    format - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_unix_timestamp

    public static Column to_unix_timestamp(Column timeExp)

    Returns the UNIX timestamp of the given time.

    Parameters:
    timeExp - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • monthname

    Extracts the three-letter abbreviated month name from a given date/timestamp/string.

    Parameters:
    timeExp - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • dayname

    Extracts the three-letter abbreviated day name from a given date/timestamp/string.

    Parameters:
    timeExp - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • array_contains

    Returns null if the array is null, true if the array contains value, and false otherwise.

    Parameters:
    column - (undocumented)
    value - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • array_append

    Returns an ARRAY containing all elements from the source ARRAY as well as the new element. The new element/column is positioned at the end of the ARRAY.

    Parameters:
    column - (undocumented)
    element - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • arrays_overlap

    Returns true if a1 and a2 have at least one non-null element in common. If not, and both arrays are non-empty and either of them contains a null element, it returns null. Otherwise, it returns false.

    Parameters:
    a1 - (undocumented)
    a2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
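    The three-valued result described above can be sketched as follows (hypothetical helper; a null Boolean models SQL NULL):

```java
class ArraysOverlapSketch {
    // Three-valued overlap check mirroring arrays_overlap's semantics:
    // true if a common non-null element exists; otherwise null when both
    // arrays are non-empty and either contains a null; otherwise false.
    static Boolean arraysOverlap(Object[] a1, Object[] a2) {
        boolean sawNull = false;
        for (Object x : a1) {
            if (x == null) { sawNull = true; continue; }
            for (Object y : a2) {
                if (x.equals(y)) return Boolean.TRUE;
            }
        }
        for (Object y : a2) {
            if (y == null) sawNull = true;
        }
        return (sawNull && a1.length > 0 && a2.length > 0) ? null : Boolean.FALSE;
    }
}
```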
  • slice

    public static Column slice(Column x, int start, int length)

    Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.

    Parameters:
    x - the array column to be sliced
    start - the starting index
    length - the length of the slice
    Returns:
    (undocumented)
    Since:
    2.4.0
  • slice

    Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.

    Parameters:
    x - the array column to be sliced
    start - the starting index
    length - the length of the slice
    Returns:
    (undocumented)
    Since:
    3.1.0
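    The 1-based and negative-start indexing can be sketched on plain arrays (hypothetical helper):

```java
import java.util.Arrays;

class SliceSketch {
    // 1-based slice with a negative start counting from the end of the array.
    static int[] slice(int[] x, int start, int length) {
        if (start == 0) throw new IllegalArgumentException("start must not be 0");
        int from = start > 0 ? start - 1 : x.length + start;
        int to = Math.min(from + length, x.length);
        from = Math.max(from, 0);
        return from >= to ? new int[0] : Arrays.copyOfRange(x, from, to);
    }
}
```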
  • array_join

    Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.

    Parameters:
    column - (undocumented)
    delimiter - (undocumented)
    nullReplacement - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_join

    Concatenates the elements of column using the delimiter.

    Parameters:
    column - (undocumented)
    delimiter - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • concat

    public static Column concat(scala.collection.immutable.Seq<Column> exprs)

    Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.

    Parameters:
    exprs - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
    Note:
    Returns null if any of the input columns are null.
  • array_position

    Locates the position of the first occurrence of the value in the given array, as a long. Returns null if either of the arguments is null.

    Parameters:
    column - (undocumented)
    value - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
    Note:
    The position is not zero based, but 1 based index. Returns 0 if value could not be found in array.
  • element_at

    If column is an array, returns the element of the array at index value. If column is a map, returns the value for key value in the map.

    Parameters:
    column - (undocumented)
    value - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • try_element_at

    (array, index) - Returns the element of the array at the given (1-based) index. If the index is 0, Spark throws an error. If the index < 0, it accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array.

    (map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map.

    Parameters:
    column - (undocumented)
    value - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
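    The array variant's indexing rules can be sketched as (hypothetical helper):

```java
class TryElementAtSketch {
    // 1-based lookup; a negative index counts from the end; out of range -> null.
    static <T> T tryElementAt(T[] arr, int index) {
        if (index == 0) throw new IllegalArgumentException("index must not be 0");
        int i = index > 0 ? index - 1 : arr.length + index;
        return (i >= 0 && i < arr.length) ? arr[i] : null;
    }
}
```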
  • get

    Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL.

    Parameters:
    column - (undocumented)
    index - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • array_sort

    Sorts the input array in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_sort

    Sorts the input array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error.

    Parameters:
    e - (undocumented)
    comparator - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • array_remove

    Removes all elements equal to element from the given array.

    Parameters:
    column - (undocumented)
    element - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_compact

    public static Column array_compact(Column column)

    Removes all null elements from the given array.

    Parameters:
    column - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • array_prepend

    Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.

    Parameters:
    column - (undocumented)
    element - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • array_distinct

    Removes duplicate values from the array.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_intersect

    Returns an array of the elements in the intersection of the given two arrays, without duplicates.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_insert

    Adds an item into the given array at the specified position.

    Parameters:
    arr - (undocumented)
    pos - (undocumented)
    value - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • array_union

    Returns an array of the elements in the union of the given two arrays, without duplicates.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_except

    Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • transform

    Returns an array of elements after applying a transformation to each element in the input array.

    
       df.select(transform(col("i"), x => x + 1))
     
    Parameters:
    column - the input array column
    f - col => transformed_col, the lambda function to transform the input column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • transform

    Returns an array of elements after applying a transformation to each element in the input array.

    
       df.select(transform(col("i"), (x, i) => x + i))
     
    Parameters:
    column - the input array column
    f - (col, index) => transformed_col, the lambda function to transform the input column given the index. Indices start at 0.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • exists

    Returns whether a predicate holds for one or more elements in the array.

    
       df.select(exists(col("i"), _ % 2 === 0))
     
    Parameters:
    column - the input array column
    f - col => predicate, the Boolean predicate to check the input column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • forall

    Returns whether a predicate holds for every element in the array.

    
       df.select(forall(col("i"), x => x % 2 === 0))
     
    Parameters:
    column - the input array column
    f - col => predicate, the Boolean predicate to check the input column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • filter

    Returns an array of elements for which a predicate holds in a given array.

    
       df.select(filter(col("s"), x => x % 2 === 0))
     
    Parameters:
    column - the input array column
    f - col => predicate, the Boolean predicate to filter the input column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • filter

    Returns an array of elements for which a predicate holds in a given array.

    
       df.select(filter(col("s"), (x, i) => i % 2 === 0))
     
    Parameters:
    column - the input array column
    f - (col, index) => predicate, the Boolean predicate to filter the input column given the index. Indices start at 0.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • aggregate

    Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.

    
       df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
     
    Parameters:
    expr - the input array column
    initialValue - the initial value
    merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
    finish - combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
    Returns:
    (undocumented)
    Since:
    3.0.0
  • aggregate

    Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.

    
       df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x))
     
    Parameters:
    expr - the input array column
    initialValue - the initial value
    merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
    Returns:
    (undocumented)
    Since:
    3.0.0
  • reduce

    Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.

    
       df.select(reduce(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
     
    Parameters:
    expr - the input array column
    initialValue - the initial value
    merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
    finish - combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
    Returns:
    (undocumented)
    Since:
    3.5.0
  • reduce

    Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.

    
       df.select(reduce(col("i"), lit(0), (acc, x) => acc + x))
     
    Parameters:
    expr - the input array column
    initialValue - the initial value
    merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
    Returns:
    (undocumented)
    Since:
    3.5.0
  • zip_with

    Merges two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.

    
       df.select(zip_with(df1("val1"), df1("val2"), (x, y) => x + y))
     
    Parameters:
    left - the left input array column
    right - the right input array column
    f - (lCol, rCol) => col, the lambda function to merge two input columns into one column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • transform_keys

    Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.

    
       df.select(transform_keys(col("i"), (k, v) => k + v))
     
    Parameters:
    expr - the input map column
    f - (key, value) => new_key, the lambda function to transform the key of input map column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • transform_values

    Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.

    
       df.select(transform_values(col("i"), (k, v) => k + v))
     
    Parameters:
    expr - the input map column
    f - (key, value) => new_value, the lambda function to transform the value of input map column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • map_filter

    Returns a map whose key-value pairs satisfy a predicate.

    
       df.select(map_filter(col("m"), (k, v) => k * 10 === v))
     
    Parameters:
    expr - the input map column
    f - (key, value) => predicate, the Boolean predicate to filter the input map column
    Returns:
    (undocumented)
    Since:
    3.0.0
  • map_zip_with

    Merges two given maps, key-wise, into a single map using a function.

    
       df.select(map_zip_with(df("m1"), df("m2"), (k, v1, v2) => k === v1 + v2))
     
    Parameters:
    left - the left input map column
    right - the right input map column
    f - (key, value1, value2) => new_value, the lambda function to merge the map values
    Returns:
    (undocumented)
    Since:
    3.0.0
  • explode

    Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
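
    A minimal usage sketch (the df DataFrame and its column names are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // df has an array column "tags", e.g. Seq((1, Seq("a", "b"))).toDF("id", "tags")
    df.select(col("id"), explode(col("tags")))
    // each array element becomes its own row; the new column is named "col"
    ```
    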
  • explode_outer

    Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike explode, if the array/map is null or empty then null is produced.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.2.0
  • posexplode

    Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.1.0
  • posexplode_outer

    Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike posexplode, if the array/map is null or empty then the row (null, null) is produced.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.2.0
  • inline

    Creates a new row for each element in the given array of structs.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • inline_outer

    Creates a new row for each element in the given array of structs. Unlike inline, if the array is null or empty then null is produced for each nested column.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0
  • get_json_object

    Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid.

    Parameters:
    e - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
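
    For example, extracting a nested field (the "payload" column is hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // payload holds strings such as {"user": {"name": "Alice"}}
    df.select(get_json_object(col("payload"), "$.user.name"))
    // yields the extracted value, or null if the JSON is invalid or the path is absent
    ```
    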
  • json_tuple

    public static Column json_tuple(Column json, scala.collection.immutable.Seq<String> fields)

    Creates a new row for a json column according to the given field names.

    Parameters:
    json - (undocumented)
    fields - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.6.0
  • from_json

    (Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.1.0
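
    A sketch of this Scala-specific overload (the schema and the "json" column are illustrative):

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("a", IntegerType),
      StructField("b", StringType)))

    // json holds strings such as {"a": 1, "b": "x"}; unparseable rows yield null
    df.select(from_json(col("json"), schema, Map.empty[String, String]))
    ```
    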
  • from_json

    (Scala-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.2.0
  • from_json

    (Java-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.1.0
  • from_json

    (Java-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.2.0
  • from_json

    Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    Returns:
    (undocumented)
    Since:
    2.1.0
  • from_json

    Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    Returns:
    (undocumented)
    Since:
    2.2.0
  • from_json

    (Java-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema as a DDL-formatted string.
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.1.0
  • from_json

    (Scala-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema as a DDL-formatted string.
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.3.0
  • from_json

    (Scala-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    Returns:
    (undocumented)
    Since:
    2.4.0
  • from_json

    (Java-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing JSON data.
    schema - the schema to use when parsing the json string
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    2.4.0
  • try_parse_json

    Parses a JSON string and constructs a Variant value. Returns null if the input string is not a valid JSON value.

    Parameters:
    json - a string column that contains JSON data.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • parse_json

    Parses a JSON string and constructs a Variant value.

    Parameters:
    json - a string column that contains JSON data.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • to_variant_object

    public static Column to_variant_object(Column col)

    Converts a column containing nested inputs (array/map/struct) into a variant, where maps and structs are converted to variant objects, which are unordered unlike SQL structs. Input maps can only have string keys.

    Parameters:
    col - a column with a nested schema or column name.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • is_variant_null

    Checks whether a variant value is a variant null. Returns true if and only if the input is a variant null and false otherwise (including in the case of SQL NULL).

    Parameters:
    v - a variant column.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • variant_get

    Extracts a sub-variant from v according to the path string, and then casts the sub-variant to targetType. Returns null if the path does not exist. Throws an exception if the cast fails.

    Parameters:
    v - a variant column.
    path - the extraction path. A valid path should start with $ and is followed by zero or more segments like [123], .name, ['name'], or ["name"].
    targetType - the target data type to cast into, in a DDL-formatted string.
    Returns:
    (undocumented)
    Since:
    4.0.0
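
    For example (the "json" column is hypothetical; "int" is the DDL target type):

    ```scala
    import org.apache.spark.sql.functions._

    // parse the string into a variant, then extract and cast a nested field
    df.select(variant_get(parse_json(col("json")), "$.user.id", "int"))
    // null if $.user.id does not exist; throws if the cast to int fails
    ```
    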
  • variant_get

    Extracts a sub-variant from v according to the path column, and then casts the sub-variant to targetType. Returns null if the path does not exist. Throws an exception if the cast fails.

    Parameters:
    v - a variant column.
    path - the column containing the extraction path strings. A valid path string should start with $ and is followed by zero or more segments like [123], .name, ['name'], or ["name"].
    targetType - the target data type to cast into, in a DDL-formatted string.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_variant_get

    Extracts a sub-variant from v according to the path string, and then casts the sub-variant to targetType. Returns null if the path does not exist or the cast fails.

    Parameters:
    v - a variant column.
    path - the extraction path. A valid path should start with $ and is followed by zero or more segments like [123], .name, ['name'], or ["name"].
    targetType - the target data type to cast into, in a DDL-formatted string.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_variant_get

    Extracts a sub-variant from v according to the path column, and then casts the sub-variant to targetType. Returns null if the path does not exist or the cast fails.

    Parameters:
    v - a variant column.
    path - the column containing the extraction path strings. A valid path string should start with $ and is followed by zero or more segments like [123], .name, ['name'], or ["name"].
    targetType - the target data type to cast into, in a DDL-formatted string.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • schema_of_variant

    Returns schema in the SQL format of a variant.

    Parameters:
    v - a variant column.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • schema_of_variant_agg

    public static Column schema_of_variant_agg(Column v)

    Returns the merged schema in the SQL format of a variant column.

    Parameters:
    v - a variant column.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • schema_of_json

    Parses a JSON string and infers its schema in DDL format.

    Parameters:
    json - a JSON string.
    Returns:
    (undocumented)
    Since:
    2.4.0
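
    A sketch of schema inference from a literal (the inferred DDL shown is indicative, not exact):

    ```scala
    import org.apache.spark.sql.functions._

    df.select(schema_of_json(lit("""{"a": 1, "b": [1, 2]}""")))
    // yields a DDL string along the lines of STRUCT<a: BIGINT, b: ARRAY<BIGINT>>
    ```
    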
  • schema_of_json

    Parses a JSON string and infers its schema in DDL format.

    Parameters:
    json - a foldable string column containing a JSON string.
    Returns:
    (undocumented)
    Since:
    2.4.0
  • schema_of_json

    Parses a JSON string and infers its schema in DDL format using options.

    Parameters:
    json - a foldable string column containing JSON data.
    options - options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
    Returns:
    a column with string literal containing schema in DDL format.
    Since:
    3.0.0
  • json_array_length

    Returns the number of elements in the outermost JSON array. Returns NULL for any other valid JSON string, for NULL input, or for invalid JSON.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • json_object_keys

    Returns all the keys of the outermost JSON object as an array. If a valid JSON object is given, all the keys of the outermost object will be returned as an array. If it is any other valid JSON string, an invalid JSON string or an empty string, the function returns null.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • to_json

    (Scala-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.

    Parameters:
    e - a column containing a struct, an array or a map.
    options - options to control how the struct column is converted into a json string. Accepts the same options as the json data source. See Data Source Option in the version you use. Additionally, the function supports the pretty option which enables pretty JSON generation.
    Returns:
    (undocumented)
    Since:
    2.1.0
  • to_json

    (Java-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.

    Parameters:
    e - a column containing a struct, an array or a map.
    options - options to control how the struct column is converted into a json string. Accepts the same options as the json data source. See Data Source Option in the version you use. Additionally, the function supports the pretty option which enables pretty JSON generation.
    Returns:
    (undocumented)
    Since:
    2.1.0
  • to_json

    Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.

    Parameters:
    e - a column containing a struct, an array or a map.
    Returns:
    (undocumented)
    Since:
    2.1.0
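
    For example, serializing an ad hoc struct (column names are hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    df.select(to_json(struct(col("a"), col("b"))))
    // produces JSON strings such as {"a":1,"b":"x"}
    ```
    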
  • mask

    Masks the given string value. The function replaces upper-case characters with 'X', lower-case characters with 'x', and digits with 'n'. This can be useful for creating copies of tables with sensitive information removed.

    Parameters:
    input - string value to mask. Supported types: STRING, VARCHAR, CHAR
    Returns:
    (undocumented)
    Since:
    3.5.0
  • mask

    Masks the given string value. The function replaces upper-case characters with the specified character, lower-case characters with 'x', and digits with 'n'. This can be useful for creating copies of tables with sensitive information removed.

    Parameters:
    input - string value to mask. Supported types: STRING, VARCHAR, CHAR
    upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • mask

    Masks the given string value. The function replaces upper-case and lower-case characters with the characters specified respectively, and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.

    Parameters:
    input - string value to mask. Supported types: STRING, VARCHAR, CHAR
    upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
    lowerChar - character to replace lower-case characters with. Specify NULL to retain original character.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • mask

    Masks the given string value. The function replaces upper-case, lower-case characters and numbers with the characters specified respectively. This can be useful for creating copies of tables with sensitive information removed.

    Parameters:
    input - string value to mask. Supported types: STRING, VARCHAR, CHAR
    upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
    lowerChar - character to replace lower-case characters with. Specify NULL to retain original character.
    digitChar - character to replace digit characters with. Specify NULL to retain original character.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • mask

    Masks the given string value. This can be useful for creating copies of tables with sensitive information removed.

    Parameters:
    input - string value to mask. Supported types: STRING, VARCHAR, CHAR
    upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
    lowerChar - character to replace lower-case characters with. Specify NULL to retain original character.
    digitChar - character to replace digit characters with. Specify NULL to retain original character.
    otherChar - character to replace all other characters with. Specify NULL to retain original character.
    Returns:
    (undocumented)
    Since:
    3.5.0
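
    A sketch of two of the overloads (the "card" column is hypothetical; the default replacement characters are assumed to be 'X', 'x', and 'n', with other characters retained):

    ```scala
    import org.apache.spark.sql.functions._

    df.select(mask(col("card")))
    // with the defaults, letters and digits are masked and other characters kept

    df.select(mask(col("card"), lit("*"), lit("*"), lit("#")))
    // masks letters with '*' and digits with '#'
    ```
    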
  • size

    Returns length of array or map.

    This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • cardinality

    Returns length of array or map. This is an alias of size function.

    This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • sort_array

    Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • sort_array

    public static Column sort_array(Column e, boolean asc)

    Sorts the input array for the given column in ascending or descending order, according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.

    Parameters:
    e - (undocumented)
    asc - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
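
    For example (the "scores" array column is hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    df.select(sort_array(col("scores")))               // ascending, nulls first
    df.select(sort_array(col("scores"), asc = false))  // descending, nulls last
    ```
    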
  • array_min

    Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_max

    Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_size

    Returns the total number of elements in the array. The function returns null for null input.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • array_agg

    Aggregate function: returns a list of objects with duplicates.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
    Note:
    The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
  • shuffle

    Returns a random permutation of the given array.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
    Note:
    The function is non-deterministic.
  • shuffle

    Returns a random permutation of the given array.

    Parameters:
    e - (undocumented)
    seed - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
    Note:
    The function is non-deterministic.
  • reverse

    Returns a reversed string or an array with reverse order of elements.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • flatten

    Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • sequence

    Generates a sequence of integers from start to stop, incrementing by step.

    Parameters:
    start - (undocumented)
    stop - (undocumented)
    step - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • sequence

    Generates a sequence of integers from start to stop, incrementing by 1 if start is less than or equal to stop, otherwise -1.

    Parameters:
    start - (undocumented)
    stop - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
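
    Both overloads in a sketch:

    ```scala
    import org.apache.spark.sql.functions._

    df.select(sequence(lit(1), lit(9), lit(2)))  // [1, 3, 5, 7, 9]
    df.select(sequence(lit(5), lit(1)))          // step defaults to -1: [5, 4, 3, 2, 1]
    ```
    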
  • array_repeat

    Creates an array containing the left argument repeated the number of times given by the right argument.

    Parameters:
    left - (undocumented)
    right - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • array_repeat

    public static Column array_repeat(Column e, int count)

    Creates an array containing the left argument repeated the number of times given by the right argument.

    Parameters:
    e - (undocumented)
    count - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
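
    For example (the "name" column is hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    df.select(array_repeat(col("name"), 3))    // repeats each value: "x" -> ["x", "x", "x"]
    df.select(array_repeat(lit("x"), lit(3)))  // the count may also be given as a column
    ```
    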
  • map_contains_key

    Returns true if the map contains the key.

    Parameters:
    column - (undocumented)
    key - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.3.0
  • map_keys

    Returns an unordered array containing the keys of the map.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • map_values

    Returns an unordered array containing the values of the map.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • map_entries

    Returns an unordered array of all entries in the given map.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • map_from_entries

    Returns a map created from the given array of entries.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
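
    For example, round-tripping a map through its entries (the "m" map column is hypothetical):

    ```scala
    import org.apache.spark.sql.functions._

    // map_entries yields an array of (key, value) structs; map_from_entries rebuilds the map
    df.select(map_from_entries(map_entries(col("m"))))
    ```
    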
  • arrays_zip

    public static Column arrays_zip(scala.collection.immutable.Seq<Column> e)

    Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • map_concat

    public static Column map_concat(scala.collection.immutable.Seq<Column> cols)

    Returns the union of all the given maps.

    Parameters:
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.4.0
  • from_csv

    Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing CSV data.
    schema - the schema to use when parsing the CSV string
    options - options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    3.0.0
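
    A sketch (the schema and the "line" column are illustrative):

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", IntegerType)))

    // line holds strings such as "Alice,30"; unparseable rows yield null
    df.select(from_csv(col("line"), schema, Map("sep" -> ",")))
    ```
    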
  • from_csv

    (Java-specific) Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing CSV data.
    schema - the schema to use when parsing the CSV string
    options - options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • schema_of_csv

    Parses a CSV string and infers its schema in DDL format.

    Parameters:
    csv - a CSV string.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • schema_of_csv

    Parses a CSV string and infers its schema in DDL format.

    Parameters:
    csv - a foldable string column containing a CSV string.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • schema_of_csv

    Parses a CSV string and infers its schema in DDL format using options.

    Parameters:
    csv - a foldable string column containing a CSV string.
    options - options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
    Returns:
    a column with string literal containing schema in DDL format.
    Since:
    3.0.0
  • to_csv

    (Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception, in the case of an unsupported type.

    Parameters:
    e - a column containing a struct.
    options - options to control how the struct column is converted into a CSV string. It accepts the same options as the CSV data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • to_csv

    Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception, in the case of an unsupported type.

    Parameters:
    e - a column containing a struct.
    Returns:
    (undocumented)
    Since:
    3.0.0
  • from_xml

    Parses a column containing an XML string into the data type corresponding to the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing XML data.
    schema - the schema to use when parsing the XML string
    options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • from_xml

    (Java-specific) Parses a column containing an XML string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing XML data.
    schema - the schema as a DDL-formatted string.
    options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • from_xml

    (Java-specific) Parses a column containing an XML string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing XML data.
    schema - the schema to use when parsing the XML string
    Returns:
    (undocumented)
    Since:
    4.0.0
  • from_xml

    (Java-specific) Parses a column containing an XML string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing XML data.
    schema - the schema to use when parsing the XML string
    options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • from_xml

    Parses a column containing an XML string into the data type corresponding to the specified schema. Returns null, in the case of an unparseable string.

    Parameters:
    e - a string column containing XML data.
    schema - the schema to use when parsing the XML string
    Returns:
    (undocumented)
    Since:
    4.0.0
  • schema_of_xml

    Parses an XML string and infers its schema in DDL format.

    Parameters:
    xml - an XML string.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • schema_of_xml

    Parses an XML string and infers its schema in DDL format.

    Parameters:
    xml - a foldable string column containing an XML string.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • schema_of_xml

    Parses an XML string and infers its schema in DDL format using options.

    Parameters:
    xml - a foldable string column containing XML data.
    options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
    Returns:
    a column with string literal containing schema in DDL format.
    Since:
    4.0.0
  • to_xml

    (Java-specific) Converts a column containing a StructType into an XML string with the specified schema. Throws an exception in the case of an unsupported type.

    Parameters:
    e - a column containing a struct.
    options - options to control how the struct column is converted into an XML string. It accepts the same options as the XML data source. See Data Source Option in the version you use.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • to_xml

    Converts a column containing a StructType into an XML string with the specified schema. Throws an exception in the case of an unsupported type.

    Parameters:
    e - a column containing a struct.
    Returns:
    (undocumented)
    Since:
    4.0.0
  • years

    (Java-specific) A transform for timestamps and dates to partition data into years.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • months

    (Java-specific) A transform for timestamps and dates to partition data into months.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • days

    (Java-specific) A transform for timestamps and dates to partition data into days.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • xpath

    Returns a string array of values within the nodes of xml that match the XPath expression.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_boolean

    Returns true if the XPath expression evaluates to true, or if a matching node is found.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_double

    Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_number

    Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_float

    Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_int

    Returns an integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_long

    Returns a long integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_short

    Returns a short integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • xpath_string

    Returns the text contents of the first XML node that matches the XPath expression.

    Parameters:
    xml - (undocumented)
    path - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
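    The xpath family follows standard XPath 1.0 evaluation over each XML string in the column. As a driver-side sketch only (not Spark's implementation), the same semantics can be reproduced with the JDK's built-in XPath engine; the class and helper names below are hypothetical:

    ```java
    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class XPathDemo {

        // xpath: string values of all nodes matching the expression
        static List<String> xpathAll(String xml, String path) {
            try {
                XPath xp = XPathFactory.newInstance().newXPath();
                NodeList nodes = (NodeList) xp.evaluate(path, parse(xml), XPathConstants.NODESET);
                List<String> out = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    out.add(nodes.item(i).getTextContent());
                }
                return out;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        // xpath_string: text contents of the first matching node
        static String xpathString(String xml, String path) {
            try {
                return XPathFactory.newInstance().newXPath().evaluate(path, parse(xml));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        private static Document parse(String xml) throws Exception {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        }

        public static void main(String[] args) {
            String xml = "<a><b>b1</b><b>b2</b></a>";
            System.out.println(xpathAll(xml, "a/b"));    // [b1, b2]
            System.out.println(xpathString(xml, "a/b")); // b1
        }
    }
    ```

    The numeric variants (xpath_int, xpath_double, etc.) additionally coerce the first match to the target type, falling back to zero or NaN as documented above.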
  • hours

    (Java-specific) A transform for timestamps to partition data into hours.

    Parameters:
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • convert_timezone

    Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz.

    Parameters:
    sourceTz - the time zone for the input timestamp. If it is omitted, the current session time zone is used as the source time zone.
    targetTz - the time zone to which the input timestamp should be converted.
    sourceTs - a timestamp without time zone.
    Returns:
    (undocumented)
    Since:
    3.5.0
  • convert_timezone

    Converts the timestamp without time zone sourceTs from the current time zone to targetTz.

    Parameters:
    targetTz - the time zone to which the input timestamp should be converted.
    sourceTs - a timestamp without time zone.
    Returns:
    (undocumented)
    Since:
    3.5.0
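    The conversion can be sketched on the driver with java.time, under the assumption that a "timestamp without time zone" maps to LocalDateTime. The helper below is illustrative, not Spark's implementation:

    ```java
    import java.time.LocalDateTime;
    import java.time.ZoneId;

    public class ConvertTimezoneDemo {
        // Interpret the timestamp in sourceTz, then express the same
        // instant as a local timestamp in targetTz.
        static LocalDateTime convert(String sourceTz, String targetTz, LocalDateTime ts) {
            return ts.atZone(ZoneId.of(sourceTz))
                     .withZoneSameInstant(ZoneId.of(targetTz))
                     .toLocalDateTime();
        }

        public static void main(String[] args) {
            LocalDateTime out = convert("America/Los_Angeles", "UTC",
                    LocalDateTime.of(2021, 12, 6, 0, 0));
            System.out.println(out); // 2021-12-06T08:00
        }
    }
    ```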
  • make_dt_interval

    Make DayTimeIntervalType duration from days, hours, mins and secs.

    Parameters:
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_dt_interval

    Make DayTimeIntervalType duration from days, hours and mins.

    Parameters:
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_dt_interval

    Make DayTimeIntervalType duration from days and hours.

    Parameters:
    days - (undocumented)
    hours - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_dt_interval

    public static Column make_dt_interval(Column days)

    Make DayTimeIntervalType duration from days.

    Parameters:
    days - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_dt_interval

    public static Column make_dt_interval()

    Make DayTimeIntervalType duration.

    Returns:
    (undocumented)
    Since:
    3.5.0
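    A day-time interval is essentially a duration composed from the supplied fields, with omitted fields defaulting to zero. A minimal sketch using java.time.Duration (illustrative only; Spark's DayTimeIntervalType has its own range limits):

    ```java
    import java.time.Duration;

    public class DtIntervalDemo {
        // Compose a duration from days, hours, mins and fractional secs.
        static Duration dtInterval(long days, long hours, long mins, double secs) {
            return Duration.ofDays(days)
                    .plusHours(hours)
                    .plusMinutes(mins)
                    .plusNanos((long) (secs * 1_000_000_000L));
        }

        public static void main(String[] args) {
            System.out.println(dtInterval(1, 2, 30, 0)); // PT26H30M
        }
    }
    ```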
  • try_make_interval

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years, months, weeks, days, hours, mins and secs.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_interval

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years, months, weeks, days, hours and mins.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_interval

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years, months, weeks, days and hours.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_interval

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years, months, weeks and days.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    days - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_interval

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years, months and weeks.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    weeks - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_interval

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years and months.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_interval

    public static Column try_make_interval(Column years)

    This is a special version of make_interval that performs the same operation, but returns NULL instead of raising an error if the interval cannot be created.

    Parameters:
    years - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_interval

    Make interval from years.

    Parameters:
    years - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_interval

    public static Column make_interval()

    Returns:
    (undocumented)
    Since:
    3.5.0
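    The only difference between make_interval and try_make_interval is error handling: an interval that cannot be created raises an error in one and yields NULL in the other. A hypothetical sketch of the years-and-months case with java.time.Period:

    ```java
    import java.time.Period;

    public class TryMakeIntervalDemo {
        // make_interval semantics would raise on overflow;
        // try_make_interval returns null instead.
        static Period tryMakeInterval(int years, int months) {
            try {
                // Total months must fit in an int, analogous to
                // Spark's bounded interval representation.
                int total = Math.addExact(Math.multiplyExact(years, 12), months);
                return Period.ofMonths(total).normalized();
            } catch (ArithmeticException e) {
                return null; // NULL instead of an error
            }
        }

        public static void main(String[] args) {
            System.out.println(tryMakeInterval(2, 3));                 // P2Y3M
            System.out.println(tryMakeInterval(Integer.MAX_VALUE, 0)); // null
        }
    }
    ```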
  • make_timestamp

    Create timestamp from years, months, days, hours, mins, secs and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    timezone - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_timestamp

    Create timestamp from years, months, days, hours, mins and secs fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_timestamp

    Create a local date-time from date, time, and timezone fields.

    Parameters:
    date - (undocumented)
    time - (undocumented)
    timezone - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • make_timestamp

    Create a local date-time from date and time fields.

    Parameters:
    date - (undocumented)
    time - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • try_make_timestamp

    Try to create a timestamp from years, months, days, hours, mins, secs and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. The function returns NULL on invalid inputs.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    timezone - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_make_timestamp

    Try to create a timestamp from years, months, days, hours, mins, and secs fields. The result data type is consistent with the value of configuration spark.sql.timestampType. The function returns NULL on invalid inputs.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
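    The try_ variants' NULL-on-invalid behavior can be sketched with LocalDateTime, which validates field ranges on construction. This is an illustration of the semantics, not the Spark code path:

    ```java
    import java.time.DateTimeException;
    import java.time.LocalDateTime;

    public class MakeTimestampDemo {
        // try_make_timestamp semantics: null on invalid field values
        // rather than an error.
        static LocalDateTime tryMakeTimestamp(int y, int mo, int d, int h, int mi, int s) {
            try {
                return LocalDateTime.of(y, mo, d, h, mi, s);
            } catch (DateTimeException e) {
                return null;
            }
        }

        public static void main(String[] args) {
            System.out.println(tryMakeTimestamp(2024, 2, 29, 12, 0, 0)); // 2024-02-29T12:00
            System.out.println(tryMakeTimestamp(2023, 2, 29, 12, 0, 0)); // null (not a leap year)
        }
    }
    ```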
  • try_make_timestamp

    Try to create a local date-time from date, time, and timezone fields.

    Parameters:
    date - (undocumented)
    time - (undocumented)
    timezone - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • try_make_timestamp

    Try to create a local date-time from date and time fields.

    Parameters:
    date - (undocumented)
    time - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • make_timestamp_ltz

    Create a timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    timezone - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_timestamp_ltz

    Create a timestamp with local time zone from years, months, days, hours, mins and secs fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • try_make_timestamp_ltz

    Try to create a timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields. The function returns NULL on invalid inputs.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    timezone - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_make_timestamp_ltz

    Try to create a timestamp with local time zone from years, months, days, hours, mins and secs fields. The function returns NULL on invalid inputs.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • make_timestamp_ntz

    Create a local date-time from years, months, days, hours, mins and secs fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_timestamp_ntz

    Create a local date-time from date and time fields.

    Parameters:
    date - (undocumented)
    time - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • try_make_timestamp_ntz

    Try to create a local date-time from years, months, days, hours, mins and secs fields. The function returns NULL on invalid inputs.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    days - (undocumented)
    hours - (undocumented)
    mins - (undocumented)
    secs - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • try_make_timestamp_ntz

    Try to create a local date-time from date and time fields.

    Parameters:
    date - (undocumented)
    time - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • make_ym_interval

    Make year-month interval from years and months.

    Parameters:
    years - (undocumented)
    months - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_ym_interval

    public static Column make_ym_interval(Column years)

    Make year-month interval from years.

    Parameters:
    years - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • make_ym_interval

    public static Column make_ym_interval()

    Make year-month interval.

    Returns:
    (undocumented)
    Since:
    3.5.0
  • bucket

    (Java-specific) A transform for any type that partitions by a hash of the input column.

    Parameters:
    numBuckets - (undocumented)
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
  • bucket

    public static Column bucket(int numBuckets, Column e)

    (Java-specific) A transform for any type that partitions by a hash of the input column.

    Parameters:
    numBuckets - (undocumented)
    e - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.0.0
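    Hash bucketing assigns each input value to one of numBuckets partitions by hashing it. The sketch below uses Object.hashCode purely for illustration; Spark uses its own hash function, so the bucket assignments will differ:

    ```java
    public class BucketDemo {
        // Map any value to a bucket index in [0, numBuckets).
        static int bucket(int numBuckets, Object value) {
            return Math.floorMod(value.hashCode(), numBuckets);
        }

        public static void main(String[] args) {
            // Prints some value in [0, 4) for this key.
            System.out.println(bucket(4, "some-key"));
        }
    }
    ```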
  • ifnull

    Returns col2 if col1 is null, or col1 otherwise.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • isnotnull

    Returns true if col is not null, or false otherwise.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • equal_null

    Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, and false if one of them is null.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • nullif

    Returns null if col1 equals col2, or col1 otherwise.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • nullifzero

    Returns null if col is equal to zero, or col otherwise.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
  • nvl

    Returns col2 if col1 is null, or col1 otherwise.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • nvl2

    Returns col2 if col1 is not null, or col3 otherwise.

    Parameters:
    col1 - (undocumented)
    col2 - (undocumented)
    col3 - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.5.0
  • zeroifnull

    Returns zero if col is null, or col otherwise.

    Parameters:
    col - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.0.0
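    The null-handling functions above all reduce to simple conditionals on their operands. A driver-side sketch of their semantics (names are illustrative, not the Spark implementation):

    ```java
    public class NullFunctionsDemo {
        // ifnull / nvl: col2 if col1 is null, col1 otherwise
        static <T> T ifnull(T col1, T col2) {
            return col1 == null ? col2 : col1;
        }

        // nullif: null if col1 equals col2, col1 otherwise
        static <T> T nullif(T col1, T col2) {
            return col1 != null && col1.equals(col2) ? null : col1;
        }

        // nvl2: col2 if col1 is not null, col3 otherwise
        static <T> T nvl2(T col1, T col2, T col3) {
            return col1 != null ? col2 : col3;
        }

        // equal_null: null-safe equality
        static boolean equalNull(Object col1, Object col2) {
            return col1 == null ? col2 == null : col1.equals(col2);
        }

        public static void main(String[] args) {
            System.out.println(ifnull(null, "fallback"));       // fallback
            System.out.println(nullif(5, 5));                   // null
            System.out.println(nvl2("x", "notnull", "isnull")); // notnull
            System.out.println(equalNull(null, null));          // true
        }
    }
    ```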
  • st_asbinary

    Returns the input GEOGRAPHY or GEOMETRY value in WKB format.

    Parameters:
    geo - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • st_geogfromwkb

    Parses the WKB description of a geography and returns the corresponding GEOGRAPHY value.

    Parameters:
    wkb - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • st_geomfromwkb

    Parses the WKB description of a geometry and returns the corresponding GEOMETRY value.

    Parameters:
    wkb - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • st_geomfromwkb

    Parses the WKB description of a geometry and returns the corresponding GEOMETRY value.

    Parameters:
    wkb - (undocumented)
    srid - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • st_geomfromwkb

    public static Column st_geomfromwkb(Column wkb, int srid)

    Parses the WKB description of a geometry and returns the corresponding GEOMETRY value.

    Parameters:
    wkb - (undocumented)
    srid - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.2.0
  • st_setsrid

    Returns a new GEOGRAPHY or GEOMETRY value whose SRID is the specified SRID value.

    Parameters:
    geo - (undocumented)
    srid - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • st_setsrid

    public static Column st_setsrid(Column geo, int srid)

    Returns a new GEOGRAPHY or GEOMETRY value whose SRID is the specified SRID value.

    Parameters:
    geo - (undocumented)
    srid - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • st_srid

    Returns the SRID of the input GEOGRAPHY or GEOMETRY value.

    Parameters:
    geo - (undocumented)
    Returns:
    (undocumented)
    Since:
    4.1.0
  • udaf

    public static <IN, BUF, OUT> UserDefinedFunction udaf(Aggregator<IN,BUF,OUT> agg, scala.reflect.api.TypeTags.TypeTag<IN> evidence$3)

    Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped DataFrames.

    
       val agg = // Aggregator[IN, BUF, OUT]
    
       // declare a UDF based on agg
       val aggUDF = udaf(agg)
       val aggData = df.agg(aggUDF($"colname"))
    
       // register agg as a named function
       spark.udf.register("myAggName", udaf(agg))
     
    Parameters:
    agg - the typed Aggregator
    evidence$3 - (undocumented)
    Returns:
    a UserDefinedFunction that can be used as an aggregating expression.
    Note:
    The input encoder is inferred from the input type IN.
  • udaf

    Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped DataFrames.

    
       Aggregator<IN, BUF, OUT> agg = // custom Aggregator
       Encoder<IN> enc = // input encoder
    
       // declare a UDF based on agg
       UserDefinedFunction aggUDF = udaf(agg, enc)
       DataFrame aggData = df.agg(aggUDF($"colname"))
    
       // register agg as a named function
       spark.udf.register("myAggName", udaf(agg, enc))
     
    Parameters:
    agg - the typed Aggregator
    inputEncoder - a specific input encoder to use
    Returns:
    a UserDefinedFunction that can be used as an aggregating expression
    Note:
    This overloading takes an explicit input encoder, to support UDAF declarations in Java.
  • udf

    public static <RT> UserDefinedFunction udf(scala.Function0<RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$4)

    Defines a Scala closure of 0 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$4 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1> UserDefinedFunction udf(scala.Function1<A1,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$5, scala.reflect.api.TypeTags.TypeTag<A1> evidence$6)

    Defines a Scala closure of 1 argument as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$5 - (undocumented)
    evidence$6 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2> UserDefinedFunction udf(scala.Function2<A1,A2,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$7, scala.reflect.api.TypeTags.TypeTag<A1> evidence$8, scala.reflect.api.TypeTags.TypeTag<A2> evidence$9)

    Defines a Scala closure of 2 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$7 - (undocumented)
    evidence$8 - (undocumented)
    evidence$9 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3> UserDefinedFunction udf(scala.Function3<A1,A2,A3,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$10, scala.reflect.api.TypeTags.TypeTag<A1> evidence$11, scala.reflect.api.TypeTags.TypeTag<A2> evidence$12, scala.reflect.api.TypeTags.TypeTag<A3> evidence$13)

    Defines a Scala closure of 3 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$10 - (undocumented)
    evidence$11 - (undocumented)
    evidence$12 - (undocumented)
    evidence$13 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4> UserDefinedFunction udf(scala.Function4<A1,A2,A3,A4,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$14, scala.reflect.api.TypeTags.TypeTag<A1> evidence$15, scala.reflect.api.TypeTags.TypeTag<A2> evidence$16, scala.reflect.api.TypeTags.TypeTag<A3> evidence$17, scala.reflect.api.TypeTags.TypeTag<A4> evidence$18)

    Defines a Scala closure of 4 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$14 - (undocumented)
    evidence$15 - (undocumented)
    evidence$16 - (undocumented)
    evidence$17 - (undocumented)
    evidence$18 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4, A5> UserDefinedFunction udf(scala.Function5<A1,A2,A3,A4,A5,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$19, scala.reflect.api.TypeTags.TypeTag<A1> evidence$20, scala.reflect.api.TypeTags.TypeTag<A2> evidence$21, scala.reflect.api.TypeTags.TypeTag<A3> evidence$22, scala.reflect.api.TypeTags.TypeTag<A4> evidence$23, scala.reflect.api.TypeTags.TypeTag<A5> evidence$24)

    Defines a Scala closure of 5 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$19 - (undocumented)
    evidence$20 - (undocumented)
    evidence$21 - (undocumented)
    evidence$22 - (undocumented)
    evidence$23 - (undocumented)
    evidence$24 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4, A5, A6> UserDefinedFunction udf(scala.Function6<A1,A2,A3,A4,A5,A6,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$25, scala.reflect.api.TypeTags.TypeTag<A1> evidence$26, scala.reflect.api.TypeTags.TypeTag<A2> evidence$27, scala.reflect.api.TypeTags.TypeTag<A3> evidence$28, scala.reflect.api.TypeTags.TypeTag<A4> evidence$29, scala.reflect.api.TypeTags.TypeTag<A5> evidence$30, scala.reflect.api.TypeTags.TypeTag<A6> evidence$31)

    Defines a Scala closure of 6 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$25 - (undocumented)
    evidence$26 - (undocumented)
    evidence$27 - (undocumented)
    evidence$28 - (undocumented)
    evidence$29 - (undocumented)
    evidence$30 - (undocumented)
    evidence$31 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4, A5, A6, A7> UserDefinedFunction udf(scala.Function7<A1,A2,A3,A4,A5,A6,A7,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$32, scala.reflect.api.TypeTags.TypeTag<A1> evidence$33, scala.reflect.api.TypeTags.TypeTag<A2> evidence$34, scala.reflect.api.TypeTags.TypeTag<A3> evidence$35, scala.reflect.api.TypeTags.TypeTag<A4> evidence$36, scala.reflect.api.TypeTags.TypeTag<A5> evidence$37, scala.reflect.api.TypeTags.TypeTag<A6> evidence$38, scala.reflect.api.TypeTags.TypeTag<A7> evidence$39)

    Defines a Scala closure of 7 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$32 - (undocumented)
    evidence$33 - (undocumented)
    evidence$34 - (undocumented)
    evidence$35 - (undocumented)
    evidence$36 - (undocumented)
    evidence$37 - (undocumented)
    evidence$38 - (undocumented)
    evidence$39 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4, A5, A6, A7, A8> UserDefinedFunction udf(scala.Function8<A1,A2,A3,A4,A5,A6,A7,A8,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$40, scala.reflect.api.TypeTags.TypeTag<A1> evidence$41, scala.reflect.api.TypeTags.TypeTag<A2> evidence$42, scala.reflect.api.TypeTags.TypeTag<A3> evidence$43, scala.reflect.api.TypeTags.TypeTag<A4> evidence$44, scala.reflect.api.TypeTags.TypeTag<A5> evidence$45, scala.reflect.api.TypeTags.TypeTag<A6> evidence$46, scala.reflect.api.TypeTags.TypeTag<A7> evidence$47, scala.reflect.api.TypeTags.TypeTag<A8> evidence$48)

    Defines a Scala closure of 8 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$40 - (undocumented)
    evidence$41 - (undocumented)
    evidence$42 - (undocumented)
    evidence$43 - (undocumented)
    evidence$44 - (undocumented)
    evidence$45 - (undocumented)
    evidence$46 - (undocumented)
    evidence$47 - (undocumented)
    evidence$48 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4, A5, A6, A7, A8, A9> UserDefinedFunction udf(scala.Function9<A1,A2,A3,A4,A5,A6,A7,A8,A9,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$49, scala.reflect.api.TypeTags.TypeTag<A1> evidence$50, scala.reflect.api.TypeTags.TypeTag<A2> evidence$51, scala.reflect.api.TypeTags.TypeTag<A3> evidence$52, scala.reflect.api.TypeTags.TypeTag<A4> evidence$53, scala.reflect.api.TypeTags.TypeTag<A5> evidence$54, scala.reflect.api.TypeTags.TypeTag<A6> evidence$55, scala.reflect.api.TypeTags.TypeTag<A7> evidence$56, scala.reflect.api.TypeTags.TypeTag<A8> evidence$57, scala.reflect.api.TypeTags.TypeTag<A9> evidence$58)

    Defines a Scala closure of 9 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$49 - (undocumented)
    evidence$50 - (undocumented)
    evidence$51 - (undocumented)
    evidence$52 - (undocumented)
    evidence$53 - (undocumented)
    evidence$54 - (undocumented)
    evidence$55 - (undocumented)
    evidence$56 - (undocumented)
    evidence$57 - (undocumented)
    evidence$58 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
  • udf

    public static <RT, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10> UserDefinedFunction udf(scala.Function10<A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$59, scala.reflect.api.TypeTags.TypeTag<A1> evidence$60, scala.reflect.api.TypeTags.TypeTag<A2> evidence$61, scala.reflect.api.TypeTags.TypeTag<A3> evidence$62, scala.reflect.api.TypeTags.TypeTag<A4> evidence$63, scala.reflect.api.TypeTags.TypeTag<A5> evidence$64, scala.reflect.api.TypeTags.TypeTag<A6> evidence$65, scala.reflect.api.TypeTags.TypeTag<A7> evidence$66, scala.reflect.api.TypeTags.TypeTag<A8> evidence$67, scala.reflect.api.TypeTags.TypeTag<A9> evidence$68, scala.reflect.api.TypeTags.TypeTag<A10> evidence$69)

    Defines a Scala closure of 10 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    evidence$59 - (undocumented)
    evidence$60 - (undocumented)
    evidence$61 - (undocumented)
    evidence$62 - (undocumented)
    evidence$63 - (undocumented)
    evidence$64 - (undocumented)
    evidence$65 - (undocumented)
    evidence$66 - (undocumented)
    evidence$67 - (undocumented)
    evidence$68 - (undocumented)
    evidence$69 - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.3.0
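    As a minimal sketch of these typed overloads (shown with two arguments for brevity; the 8-, 9-, and 10-argument variants above work the same way), assuming spark-sql is on the classpath and a local SparkSession is acceptable:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // The argument and return types are inferred from the closure's
    // signature, so no explicit DataType is needed for this typed variant.
    val fullName = (first: String, last: String) => s"$first $last"
    val fullNameUdf = udf(fullName)

    val df = Seq(("Ada", "Lovelace"), ("Alan", "Turing")).toDF("first", "last")
    df.select(fullNameUdf($"first", $"last").as("full")).show()

    spark.stop()
    ```

    The returned UserDefinedFunction is deterministic by default; call .asNondeterministic() on it if the closure's result can vary between invocations on the same input.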
  • udf

    Defines a Java UDF0 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF1 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF2 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF3 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF4 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF5 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF6 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF7 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    Defines a Java UDF8 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    public static UserDefinedFunction udf(UDF9<?,?,?,?,?,?,?,?,?,?> f, DataType returnType)

    Defines a Java UDF9 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
  • udf

    public static UserDefinedFunction udf(UDF10<?,?,?,?,?,?,?,?,?,?,?> f, DataType returnType)

    Defines a Java UDF10 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Parameters:
    f - (undocumented)
    returnType - (undocumented)
    Returns:
    (undocumented)
    Since:
    2.3.0
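    The Java-interface overloads above can also be used from Scala via SAM conversion. Because Java generics are erased at runtime, nothing can be inferred: the caller supplies the output DataType explicitly, and inputs are not coerced. A small sketch using UDF2 (column usage shown in a comment, assuming a DataFrame df with integer columns a and b):

    ```scala
    import org.apache.spark.sql.api.java.UDF2
    import org.apache.spark.sql.functions.udf
    import org.apache.spark.sql.types.IntegerType

    // Boxed java.lang.Integer is used so that null inputs stay observable.
    val addTwo: UDF2[Integer, Integer, Integer] =
      (a: Integer, b: Integer) => Integer.valueOf(a.intValue + b.intValue)

    // The explicit return type stands in for the erased generic parameter.
    val addUdf = udf(addTwo, IntegerType)

    // df.select(addUdf($"a", $"b").as("sum"))
    ```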
  • udf

    Defines a deterministic user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().

    Note that, although the Scala closure can take primitive-type arguments, it does not handle null values well. Because the closure is passed in as the Any type, Spark has no type information for its arguments and may blindly pass null to a primitive-typed argument; the closure then sees the default value of the corresponding Java type instead of the null. For example, with udf((x: Int) => x, IntegerType) the result is 0 for a null input.

    Parameters:
    f - A closure in Scala
    dataType - The output data type of the UDF
    Returns:
    (undocumented)
    Since:
    2.0.0
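    The pitfall in the note above is a consequence of JVM unboxing rules, not of Spark itself; a plain-Scala sketch (no Spark needed) shows the default value a primitive-typed closure would observe for a null input:

    ```scala
    // Spark passes arguments to this untyped variant as Any. When a null
    // reference is then unboxed into a primitive slot, the JVM substitutes
    // the type's default value instead of throwing.
    val nullRef: Any = null

    val asInt: Int = nullRef.asInstanceOf[Int]              // 0
    val asLong: Long = nullRef.asInstanceOf[Long]           // 0L
    val asBoolean: Boolean = nullRef.asInstanceOf[Boolean]  // false

    // This is why udf((x: Int) => x, IntegerType) yields 0 for a null input.
    println(s"$asInt $asLong $asBoolean")
    ```

    When null inputs are possible, prefer boxed types (e.g. java.lang.Integer) in the closure signature, or use the typed udf overloads, so that nulls remain distinguishable from real values.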
  • callUDF

    public static Column callUDF(String udfName, scala.collection.immutable.Seq<Column> cols)

    Call a user-defined function.

    Parameters:
    udfName - (undocumented)
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    1.5.0
  • call_udf

    public static Column call_udf(String udfName, scala.collection.immutable.Seq<Column> cols)

    Call a user-defined function. Example:

    
      import org.apache.spark.sql._
    
      val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
      val spark = df.sparkSession
      spark.udf.register("simpleUDF", (v: Int) => v * v)
      df.select($"id", call_udf("simpleUDF", $"value"))
     
    Parameters:
    udfName - (undocumented)
    cols - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.2.0
  • call_function

    public static Column call_function(String funcName, scala.collection.immutable.Seq<Column> cols)

    Call a SQL function.

    Parameters:
    funcName - function name that follows the SQL identifier syntax (can be quoted, can be qualified)
    cols - the expression parameters of function
    Returns:
    (undocumented)
    Since:
    3.5.0
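    A short sketch of call_function, assuming spark-sql is on the classpath; a built-in SQL function is resolved by name here, and a qualified catalog name (e.g. "my_db.my_fn") would be passed the same way:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{call_function, col}

    val spark = SparkSession.builder().master("local[*]").appName("call-function-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(-1, -2, 3).toDF("value")

    // Resolve the built-in "abs" by name at analysis time.
    val absValues = df.select(call_function("abs", col("value")).as("abs")).as[Int].collect()

    spark.stop()
    ```

    Unlike call_udf, which targets registered user-defined functions, call_function accepts any name that follows SQL identifier syntax, including quoted and qualified names.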
  • unwrap_udt

    Unwrap a UDT data type column into its underlying type.

    Parameters:
    column - (undocumented)
    Returns:
    (undocumented)
    Since:
    3.4.0