DataFrameNaFunctions (Spark 4.2.0 JavaDoc)
org.apache.spark.sql.DataFrameNaFunctions
public abstract class DataFrameNaFunctions extends Object
Functionality for working with missing data in DataFrames.
- Since:
- 1.3.1
-
Constructor Summary
Constructors
DataFrameNaFunctions()
-
Method Summary
drop()
    Returns a new DataFrame that drops rows containing any null or NaN values.
drop(int minNonNulls)
    Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.
drop(int minNonNulls, String[] cols)
    Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
drop(int minNonNulls, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
drop(String how)
    Returns a new DataFrame that drops rows containing null or NaN values.
drop(String[] cols)
    Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
drop(String how, String[] cols)
    Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
drop(String how, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
drop(scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
fill(boolean value)
    Returns a new DataFrame that replaces null values in boolean columns with value.
fill(boolean value, String[] cols)
    Returns a new DataFrame that replaces null values in specified boolean columns.
fill(boolean value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null values in specified boolean columns.
fill(double value)
    Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
fill(double value, String[] cols)
    Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
fill(double value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
fill(long value)
    Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
fill(long value, String[] cols)
    Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
fill(long value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
fill(String value)
    Returns a new DataFrame that replaces null values in string columns with value.
fill(String value, String[] cols)
    Returns a new DataFrame that replaces null values in specified string columns.
fill(String value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null values in specified string columns.
fill(java.util.Map<String, Object> valueMap)
    Returns a new DataFrame that replaces null values.
fill(scala.collection.immutable.Map<String, Object> valueMap)
    (Scala-specific) Returns a new DataFrame that replaces null values.
replace(String col, java.util.Map<T, T> replacement)
    Replaces values matching keys in replacement map with the corresponding values.
replace(String[] cols, java.util.Map<T, T> replacement)
    Replaces values matching keys in replacement map with the corresponding values.
replace(String col, scala.collection.immutable.Map<T, T> replacement)
    (Scala-specific) Replaces values matching keys in replacement map.
replace(scala.collection.immutable.Seq<String> cols, scala.collection.immutable.Map<T, T> replacement)
    (Scala-specific) Replaces values matching keys in replacement map.
-
Constructor Details
-
DataFrameNaFunctions
public DataFrameNaFunctions()
-
-
Method Details
-
drop
drop()
Returns a new DataFrame that drops rows containing any null or NaN values.
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
drop(String how)
Returns a new DataFrame that drops rows containing null or NaN values.
If how is "any", then drop rows containing any null or NaN values. If how is "all", then drop rows only if every column is null or NaN for that row.
- Parameters:
- how - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
drop(String[] cols)
Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
- Parameters:
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
public Dataset<Row> drop(scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
- Parameters:
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
drop(String how, String[] cols)
Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
If how is "any", then drop rows containing any null or NaN values in the specified columns. If how is "all", then drop rows only if every specified column is null or NaN for that row.
- Parameters:
- how - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
drop(String how, scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
If how is "any", then drop rows containing any null or NaN values in the specified columns. If how is "all", then drop rows only if every specified column is null or NaN for that row.
- Parameters:
- how - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
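The "any" and "all" modes described above can be illustrated with a small plain-Java model. This is only a sketch of the documented rule, not Spark code: the class `DropHowModel`, its methods, and the representation of a row as a `Map<String, Object>` are all hypothetical, and it assumes a value is "missing" when it is null or a floating-point NaN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the "any"/"all" rule; hypothetical names, not Spark code. */
final class DropHowModel {

    /** Assumption: a value is missing when it is null or a floating-point NaN. */
    static boolean isMissing(Object v) {
        if (v == null) return true;
        if (v instanceof Double) return ((Double) v).isNaN();
        if (v instanceof Float) return ((Float) v).isNaN();
        return false;
    }

    /** Decides whether one row is dropped, looking only at the given columns. */
    static boolean shouldDrop(Map<String, Object> row, String how, List<String> cols) {
        int missing = 0;
        for (String c : cols) {
            if (isMissing(row.get(c))) missing++;
        }
        if ("any".equals(how)) return missing > 0;            // at least one missing value
        if ("all".equals(how)) return missing == cols.size(); // every listed column missing
        throw new IllegalArgumentException("how must be \"any\" or \"all\", got: " + how);
    }

    /** Returns the rows that survive; the input list is left untouched. */
    static List<Map<String, Object>> drop(List<Map<String, Object>> rows,
                                          String how, List<String> cols) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (!shouldDrop(row, how, cols)) kept.add(row);
        }
        return kept;
    }

    /** Helper for the demo: builds a two-column row that may hold nulls. */
    static Map<String, Object> row(Object name, Object age) {
        Map<String, Object> r = new HashMap<>();
        r.put("name", name);
        r.put("age", age);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
            row("Ada", 36.0), row(null, 41.0), row(null, Double.NaN));
        List<String> cols = Arrays.asList("name", "age");
        System.out.println(drop(rows, "any", cols).size()); // prints 1
        System.out.println(drop(rows, "all", cols).size()); // prints 2
    }
}
```

In this model, "any" keeps only the fully populated row, while "all" removes only the row in which both listed columns are missing.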
drop
public Dataset<Row> drop(int minNonNulls)
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.
- Parameters:
- minNonNulls - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
drop(int minNonNulls, String[] cols)
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
- Parameters:
- minNonNulls - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
drop
public Dataset<Row> drop(int minNonNulls, scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
- Parameters:
- minNonNulls - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
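The minNonNulls threshold can be pictured as counting the usable values in each row and keeping the row only when the count reaches the threshold. The following plain-Java sketch models that reading; `MinNonNullsModel` and its methods are hypothetical names, rows are modelled as `Map<String, Object>`, and nothing here calls Spark. Read this way, the wording of the "any" mode corresponds to a threshold equal to the number of columns considered, and "all" to a threshold of 1; that correspondence is an inference from the descriptions, not a statement about Spark's internals.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the minNonNulls threshold; hypothetical names, not Spark code. */
final class MinNonNullsModel {

    /** Counts values in the given columns that are neither null nor NaN. */
    static int countPresent(Map<String, Object> row, List<String> cols) {
        int present = 0;
        for (String c : cols) {
            Object v = row.get(c);
            boolean nan = (v instanceof Double && ((Double) v).isNaN())
                       || (v instanceof Float && ((Float) v).isNaN());
            if (v != null && !nan) present++;
        }
        return present;
    }

    /** A row is kept when it has at least minNonNulls usable values. */
    static boolean keep(Map<String, Object> row, int minNonNulls, List<String> cols) {
        return countPresent(row, cols) >= minNonNulls;
    }

    /** Helper for the demo: builds a three-column row that may hold nulls. */
    static Map<String, Object> row(Object a, Object b, Object c) {
        Map<String, Object> r = new HashMap<>();
        r.put("a", a);
        r.put("b", b);
        r.put("c", c);
        return r;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("a", "b", "c");
        Map<String, Object> r = row(1, null, Double.NaN); // exactly one usable value
        System.out.println(keep(r, 1, cols)); // prints true
        System.out.println(keep(r, 2, cols)); // prints false
    }
}
```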
fill
public abstract Dataset<Row> fill(long value)
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.2.0
-
fill
public abstract Dataset<Row> fill(double value)
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
fill
fill(String value)
Returns a new DataFrame that replaces null values in string columns with value.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
fill
fill(long value, String[] cols)
Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.2.0
-
fill
fill(double value, String[] cols)
Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
fill
public abstract Dataset<Row> fill(long value, scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.2.0
-
fill
public abstract Dataset<Row> fill(double value, scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
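The numeric fill variants above replace only null or NaN cells, and only in numeric columns; other listed columns are skipped. A rough plain-Java model of that rule follows. Everything in it is hypothetical: `FillNumericModel`, the use of a `Map<String, Class<?>>` as a stand-in schema, and the choice to skip column names that are absent from that schema. It stores the fill value as a `Double` and does not model how Spark handles other numeric column types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of fill(double, cols); hypothetical names, not Spark code. */
final class FillNumericModel {

    /**
     * Returns a copy of the row with null or NaN cells replaced by value,
     * but only in listed columns whose declared type is numeric.
     */
    static Map<String, Object> fill(Map<String, Object> row, Map<String, Class<?>> schema,
                                    double value, List<String> cols) {
        Map<String, Object> out = new HashMap<>(row); // never mutate the input row
        for (String c : cols) {
            Class<?> type = schema.get(c);
            // Non-numeric columns are ignored; skipping unknown names is a model simplification.
            if (type == null || !Number.class.isAssignableFrom(type)) continue;
            Object v = out.get(c);
            boolean nan = v instanceof Double && ((Double) v).isNaN();
            if (v == null || nan) out.put(c, value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = new HashMap<>();
        schema.put("height", Double.class);
        schema.put("weight", Double.class);
        schema.put("name", String.class);

        Map<String, Object> row = new HashMap<>();
        row.put("height", null);
        row.put("weight", Double.NaN);
        row.put("name", null);

        Map<String, Object> filled =
            fill(row, schema, 0.0, Arrays.asList("height", "weight", "name"));
        System.out.println(filled.get("height")); // prints 0.0
        System.out.println(filled.get("weight")); // prints 0.0
        System.out.println(filled.get("name"));   // prints null (string column ignored)
    }
}
```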
fill
fill(String value, String[] cols)
Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
fill
public abstract Dataset<Row> fill(String value, scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
fill
public abstract Dataset<Row> fill(boolean value)
Returns a new DataFrame that replaces null values in boolean columns with value.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
fill
public abstract Dataset<Row> fill(boolean value, scala.collection.immutable.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in specified boolean columns. If a specified column is not a boolean column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
fill
fill(boolean value, String[] cols)
Returns a new DataFrame that replaces null values in specified boolean columns. If a specified column is not a boolean column, it is ignored.
- Parameters:
- value - (undocumented)
- cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
fill
fill(java.util.Map<String, Object> valueMap)
Returns a new DataFrame that replaces null values.
The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Integer, Long, Float, Double, String, Boolean. Replacement values are cast to the column data type.
For example, the following replaces null values in column "A" with string "unknown", and null values in column "B" with numeric value 1.0.
    import com.google.common.collect.ImmutableMap;

    // In Java, na is a method, so it is called as na().
    df.na().fill(ImmutableMap.of("A", "unknown", "B", 1.0));
- Parameters:
- valueMap - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
fill
fill(scala.collection.immutable.Map<String, Object> valueMap)
(Scala-specific) Returns a new DataFrame that replaces null values.
The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Int, Long, Float, Double, String, Boolean. Replacement values are cast to the column data type.
For example, the following replaces null values in column "A" with string "unknown", and null values in column "B" with numeric value 1.0.
    df.na.fill(Map(
      "A" -> "unknown",
      "B" -> 1.0
    ))
- Parameters:
- valueMap - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
replace
replace(String col, java.util.Map<T, T> replacement)
Replaces values matching keys in replacement map with the corresponding values.
    import com.google.common.collect.ImmutableMap;

    // Replaces all occurrences of 1.0 with 2.0 in column "height".
    df.na().replace("height", ImmutableMap.of(1.0, 2.0));

    // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
    df.na().replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));

    // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
    df.na().replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));
- Parameters:
- col - name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
- replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
replace
replace(String[] cols, java.util.Map<T, T> replacement)
Replaces values matching keys in replacement map with the corresponding values.
    import com.google.common.collect.ImmutableMap;

    // Replaces all occurrences of 1.0 with 2.0 in column "height" and "weight".
    df.na().replace(new String[] {"height", "weight"}, ImmutableMap.of(1.0, 2.0));

    // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "firstname" and "lastname".
    df.na().replace(new String[] {"firstname", "lastname"}, ImmutableMap.of("UNKNOWN", "unnamed"));
- Parameters:
- cols - list of columns to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
- replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
replace
public abstract <T> Dataset<Row> replace(String col, scala.collection.immutable.Map<T, T> replacement)
(Scala-specific) Replaces values matching keys in replacement map.
    // Replaces all occurrences of 1.0 with 2.0 in column "height".
    df.na.replace("height", Map(1.0 -> 2.0));

    // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
    df.na.replace("name", Map("UNKNOWN" -> "unnamed"));

    // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
    df.na.replace("*", Map("UNKNOWN" -> "unnamed"));
- Parameters:
- col - name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
- replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
replace
public abstract <T> Dataset<Row> replace(scala.collection.immutable.Seq<String> cols, scala.collection.immutable.Map<T, T> replacement)
(Scala-specific) Replaces values matching keys in replacement map.
    // Replaces all occurrences of 1.0 with 2.0 in column "height" and "weight".
    df.na.replace("height" :: "weight" :: Nil, Map(1.0 -> 2.0));

    // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "firstname" and "lastname".
    df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed"));
- Parameters:
- cols - list of columns to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
- replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.
- Returns:
- (undocumented)
- Since:
- 1.3.1
-
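The key-to-value substitution performed by the replace variants can be modelled in a few lines of plain Java. This is a sketch under stated assumptions rather than Spark's implementation: `ReplaceModel` is a hypothetical class, a row is modelled as a `Map<String, Object>`, null cells are simply left alone, and the "*" column wildcard is not modelled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of replace(cols, replacement); hypothetical names, not Spark code. */
final class ReplaceModel {

    /**
     * Returns a copy of the row in which, for each listed column, a cell equal to a key
     * of the replacement map is swapped for the mapped value. Other cells are unchanged.
     */
    static <T> Map<String, Object> replace(Map<String, Object> row, List<String> cols,
                                           Map<T, T> replacement) {
        Map<String, Object> out = new HashMap<>(row); // never mutate the input row
        for (String c : cols) {
            Object v = out.get(c);
            // Model assumption: null cells never match a key.
            if (v != null && replacement.containsKey(v)) {
                out.put(c, replacement.get(v)); // the mapped value may itself be null
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("firstname", "UNKNOWN");
        row.put("lastname", "UNKNOWN");
        row.put("height", 1.0);

        Map<String, String> names = new HashMap<>();
        names.put("UNKNOWN", "unnamed");

        Map<String, Object> out = replace(row, Arrays.asList("firstname"), names);
        System.out.println(out.get("firstname")); // prints unnamed
        System.out.println(out.get("lastname"));  // prints UNKNOWN (column not listed)
        System.out.println(out.get("height"));    // prints 1.0
    }
}
```

A `HashMap` is used for the replacement map in the demo because it accepts null values, which matches the note that the map value can be null.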