What’s new in Spark 3.3.1?

Sonu Singh
Jan 22, 2023

Apache Spark 3.3.1 is a maintenance release in the Spark 3.3 line, so it consists mostly of bug fixes and stability improvements rather than major new features. Some of the notable areas it covers include:

  • Support for running Spark on Kubernetes: Spark 3.3.1 carries the Kubernetes scheduler backend of the 3.3 line, including dynamic allocation (available on Kubernetes since Spark 3.0 through shuffle tracking) and handling of executor pod failures.
  • SQL support: Spark 3.3.1 includes fixes to the SQL engine. Features such as the SQL standard EXCEPT operator and window functions are available, but they predate this release.
  • Vectorized ORC reader: Spark 3.3.1 ships the vectorized ORC reader, which can significantly improve the performance of reading data from ORC-formatted files. The reader itself is not new in 3.3.1; it was introduced in Spark 2.3.
  • Support for Python: Spark 3.3.1 is expected to include fixes to the PySpark API. Check the release notes for specifics on data types and pandas UDFs before relying on a particular change.
  • Support for Java: Spark 3.3.1 runs on Java 8, 11, and 17. Java 11 support dates back to Spark 3.0 and Java 17 support to Spark 3.3.0, so neither is new in 3.3.1.
  • Performance: Spark 3.3.1 may include performance-related fixes, but check the release notes before assuming speedups for specific operations such as groupByKey, reduceByKey, or sortBy.
  • Bug fixes: Spark 3.3.1 includes several bug fixes and improvements to the stability of the framework.
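
Two of the items above, dynamic allocation on Kubernetes and the vectorized ORC reader, are controlled through ordinary Spark configuration keys. The sketch below is a minimal, plain-Python illustration of how those keys could be turned into spark-submit arguments. The keys are standard Spark 3.x settings and are not specific to 3.3.1. The names SPARK_SETTINGS and to_submit_args are hypothetical helpers made up for this example, and the chosen values are assumptions rather than recommendations.

```python
# Illustrative only: configuration keys related to the list above.
# These keys exist across Spark 3.x; none of them is specific to 3.3.1.
SPARK_SETTINGS = {
    # Dynamic allocation; on Kubernetes it is commonly paired with
    # shuffle tracking so executors holding shuffle data are kept alive.
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    # Native ORC implementation with the vectorized reader switched on.
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.enableVectorizedReader": "true",
}


def to_submit_args(settings):
    """Render a dict of settings as spark-submit `--conf key=value` arguments.

    Hypothetical helper: keys are sorted so the output order is stable.
    """
    args = []
    for key in sorted(settings):
        args.extend(["--conf", f"{key}={settings[key]}"])
    return args


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_args(SPARK_SETTINGS)))
```

The same key/value pairs could also go into spark-defaults.conf or be set on a SparkSession builder. Keeping them in one dictionary is just a convenient way to see them side by side.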

This is not an exhaustive list. Refer to the official Spark 3.3.1 release notes for the full set of changes.
