Spark Streaming (Legacy) — PySpark In-Progress documentation

DStream.cache()

Persist the RDDs of this DStream with the default storage level (MEMORY_ONLY).

DStream.checkpoint(interval)

Enable periodic checkpointing of RDDs of this DStream

DStream.cogroup(other[, numPartitions])

Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.

DStream.combineByKey(createCombiner, ...[, ...])

Return a new DStream by applying combineByKey to each RDD.

DStream.context()

Return the StreamingContext associated with this DStream

DStream.count()

Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream.

DStream.countByValue()

Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream.

DStream.countByValueAndWindow(...[, ...])

Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.

DStream.countByWindow(windowDuration, ...)

Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream.

DStream.filter(f)

Return a new DStream containing only the elements that satisfy predicate.

DStream.flatMap(f[, preservesPartitioning])

Return a new DStream by applying a function to all elements of this DStream, and then flattening the results

DStream.flatMapValues(f)

Return a new DStream by applying a flatmap function to the value of each key-value pairs in this DStream without changing the key.

DStream.foreachRDD(func)

Apply a function to each RDD in this DStream.

DStream.fullOuterJoin(other[, numPartitions])

Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.

DStream.glom()

Return a new DStream in which RDD is generated by applying glom() to RDD of this DStream.

DStream.groupByKey([numPartitions])

Return a new DStream by applying groupByKey on each RDD.

DStream.groupByKeyAndWindow(windowDuration, ...)

Return a new DStream by applying groupByKey over a sliding window.

DStream.join(other[, numPartitions])

Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.

DStream.leftOuterJoin(other[, numPartitions])

Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.

DStream.map(f[, preservesPartitioning])

Return a new DStream by applying a function to each element of DStream.

DStream.mapPartitions(f[, preservesPartitioning])

Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDDs of this DStream.

DStream.mapPartitionsWithIndex(f[, ...])

Return a new DStream in which each RDD is generated by applying mapPartitionsWithIndex() to each RDDs of this DStream.

DStream.mapValues(f)

Return a new DStream by applying a map function to the value of each key-value pairs in this DStream without changing the key.

DStream.partitionBy(numPartitions[, ...])

Return a copy of the DStream in which each RDD are partitioned using the specified partitioner.

DStream.persist(storageLevel)

Persist the RDDs of this DStream with the given storage level

DStream.reduce(func)

Return a new DStream in which each RDD has a single element generated by reducing each RDD of this DStream.

DStream.reduceByKey(func[, numPartitions])

Return a new DStream by applying reduceByKey to each RDD.

DStream.reduceByKeyAndWindow(func, invFunc, ...)

Return a new DStream by applying incremental reduceByKey over a sliding window.

DStream.reduceByWindow(reduceFunc, ...)

Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.

DStream.repartition(numPartitions)

Return a new DStream with an increased or decreased level of parallelism.

DStream.rightOuterJoin(other[, numPartitions])

Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.

DStream.slice(begin, end)

Return all the RDDs between 'begin' to 'end' (both included)

DStream.transform(func)

Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream.

DStream.transformWith(func, other[, ...])

Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream and 'other' DStream.

DStream.union(other)

Return a new DStream by unifying data of another DStream with this DStream.

DStream.updateStateByKey(updateFunc[, ...])

Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key.

DStream.window(windowDuration[, slideDuration])

Return a new DStream in which each RDD contains all the elements in seen in a sliding window of time over this DStream.